Non-structured formatted data searching method

A method for searching unstructured-format data, applied in the field of data search. It addresses the problem of search results whose text is irrelevant to, or far removed from, the user's intent, and improves search efficiency and accuracy.

Status: Inactive
Publication date: 2009-05-06
SHANGHAI SECOND POLYTECHNIC UNIVERSITY

AI Technical Summary

Problems solved by technology

However, unstructured data accounts for more than 85% of all data on the Internet, and the word-segmentation method has a drawback: the search engine does not know what the content of these words is; it knows only the structured word segments (keywords).
When a user uses "buy a leathe

Examples

Detailed Description of the Embodiments

[0016] The unstructured-format data search method of the present invention comprises the following steps:

[0017] 1) Collect texts of different types, classify them, and build a corresponding pattern library for each type;

[0018] 2) Decompose the file being searched into a number of keywords;

[0019] 3) Pattern-match the keywords of the unknown article against the pattern libraries;

[0020] 4) When the degree of match reaches a certain value, classify the searched article.
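The four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the pattern libraries are hand-written keyword sets, the "degree of match" is taken to be the fraction of a document's words found in a library, and the threshold of 0.2 is arbitrary. The names `PATTERN_LIBRARIES`, `extract_keywords`, `matching_degree`, and `classify` are hypothetical.

```python
import re

# Step 1 (assumed form): one set of characteristic keywords per text type.
PATTERN_LIBRARIES = {
    "sports": {"shooting", "passing", "goal", "game"},
    "finance": {"stock", "market", "economy", "banking"},
}

def extract_keywords(text):
    """Step 2: split a document into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def matching_degree(keywords, library):
    """Step 3: fraction of the document's tokens found in one pattern library."""
    if not keywords:
        return 0.0
    hits = sum(1 for word in keywords if word in library)
    return hits / len(keywords)

def classify(text, libraries=PATTERN_LIBRARIES, threshold=0.2):
    """Step 4: return the best-matching type, or None if below the threshold."""
    keywords = extract_keywords(text)
    scores = {label: matching_degree(keywords, lib)
              for label, lib in libraries.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

With these assumptions, `classify("A late goal after quick passing decided the game")` returns `"sports"`, while a text that matches no library well enough returns `None` and stays unclassified.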

[0021] For example, an article describing a football match will contain more phrases such as "shooting", "passing", "goal", and "game" than a passage of financial news; conversely, phrases such as "stock market", "economy", and "banking" appear more often in financial news than in football articles. If it is classified according to the type of text (sports, politics, finance, entertainment, etc.), the keywords with the above-mentioned characteristic categories in each ...
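One possible way to derive such category-specific keywords from labeled sample texts is shown below. It is a sketch under assumptions of my own: a word counts as characteristic of a type if it occurs at least `min_count` times in that type's samples and at least `ratio` times as often as in all other types combined. The selection rule, the parameter values, and the sample sentences in `SAMPLES` are hypothetical and are not taken from the patent.

```python
import re
from collections import Counter

# Hypothetical labeled sample texts, one list per text type.
SAMPLES = {
    "sports": [
        "The goal came from a passing move",
        "One more goal ended the game",
        "The game had little passing",
    ],
    "finance": [
        "The stock market fell",
        "Banking shares lifted the stock market",
        "The economy and banking",
    ],
}

def tokenize(text):
    """Split a text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_pattern_libraries(samples, min_count=2, ratio=2.0):
    """Pick, for each type, the words that are frequent in it and rare elsewhere."""
    counts = {label: Counter(w for text in texts for w in tokenize(text))
              for label, texts in samples.items()}
    libraries = {}
    for label, own in counts.items():
        # Word counts summed over every other type.
        others = Counter()
        for other_label, other_counts in counts.items():
            if other_label != label:
                others.update(other_counts)
        libraries[label] = {
            word for word, n in own.items()
            if n >= min_count and n >= ratio * others[word]
        }
    return libraries

libraries = build_pattern_libraries(SAMPLES)
```

On these samples the common word "the" is rejected because it is equally frequent in both types, leaving only words such as "goal" or "stock" that separate the two.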


Abstract

The invention relates to a data search method applied mainly to Internet content identification, text classification on a corporate intranet, and the like. The unstructured-format data search method comprises the following steps: (1) collecting texts of different types, classifying them, and building a corresponding pattern library for each type; (2) splitting the searched file into a number of keywords; (3) pattern-matching the keywords of the unknown text against the pattern libraries; and (4) classifying the searched text once the degree of match reaches a certain level. The method can distinguish different texts by how often words or phrases occur in them. When a user enters keywords, the method returns texts that contain the keywords as well as texts related to them, which greatly improves search efficiency and accuracy.

Description

Technical field

[0001] The invention relates to a data search method, used mainly for Internet content identification, text classification on an enterprise's internal local area network, and the like.

Background

[0002] Current web search engines, including Google, all search data in structured formats. These engines mainly divide documents into words (for example, the text "I am in Huaihai Road today" is divided into "I", "Today", and "Huaihai Road") and store these segments as keywords on the server. Once the user enters a keyword such as "Huaihai Road", the above passage is returned to the user as a search result. However, unstructured data accounts for more than 85% of all data on the Internet, and this word-segmentation method has a drawback: the search engine does not know what the content of these words is; it knows only the structured word segments (keyword...
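The keyword lookup described in the background can be illustrated with a small inverted index. This is a minimal sketch: word segmentation is assumed to have been done already, using the segments from the passage's own example, and the function name `build_index` and the query "shopping street" are hypothetical.

```python
from collections import defaultdict

def build_index(segmented_docs):
    """Map each keyword to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, words in segmented_docs.items():
        for word in words:
            index[word].add(doc_id)
    return index

# Document 1 is "I am in Huaihai Road today", already segmented.
docs = {1: ["I", "Today", "Huaihai Road"]}
index = build_index(docs)

exact_hit = index.get("Huaihai Road", set())      # {1}
topic_miss = index.get("shopping street", set())  # set()
```

An exact keyword finds the document, but a query that names a related topic without using a stored keyword finds nothing, because the index holds only the segments and no notion of what the text is about.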


Application Information

IPC(8): G06F17/30
Inventor: 陈建
Owner: SHANGHAI SECOND POLYTECHNIC UNIVERSITY