Forward word segmentation method and device based on Chinese retrieval
A word segmentation method and device using word segmentation technology, applied in the field of Chinese web page retrieval in search engines; it addresses problems such as blindly selecting the maximum matching length.
Status: Inactive; Publication Date: 2014-01-29
JIANGSU XINRUIFENG INFORMATION TECH
Cites: 2; Cited by: 5
AI Technical Summary
Problems solved by technology
The present invention addresses the shortcomings of the standalone forward maximum matching algorithm. In a search system, and especially in a vertical search system, it makes full use of the specialized domain by building a professional thesaurus into the machine dictionary. First, the value of MAX_Length is set from the maximum length of the proper nouns in the thesaurus. This removes the need to choose the maximum length blindly in the matching algorithm. The resulting bound is then combined with forward maximum matching to form the forward matching algorithm, which greatly improves retrieval accuracy.
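The rule for choosing MAX_Length can be illustrated with a minimal sketch. It assumes that MAX_Length is simply the character length of the longest proper noun in the professional thesaurus. The function names `derive_max_length` and `unmatchable` are hypothetical, as are the sample thesaurus entries; none of them come from the patent. The second function shows why a blindly chosen bound is harmful: any thesaurus entry longer than the bound can never be returned whole by a forward matcher.

```python
from typing import Iterable


def derive_max_length(proper_nouns: Iterable[str]) -> int:
    """Hypothetical helper: MAX_Length = length (in characters) of the longest
    proper noun in the professional thesaurus."""
    lengths = [len(word) for word in proper_nouns]
    if not lengths:
        raise ValueError("professional thesaurus is empty")
    return max(lengths)


def unmatchable(proper_nouns: Iterable[str], max_length: int) -> list[str]:
    """Hypothetical helper: entries a forward matcher bounded by max_length
    could never match as a single word (they are longer than the window)."""
    return [word for word in proper_nouns if len(word) > max_length]


if __name__ == "__main__":
    # Illustrative vertical-search thesaurus; the entries are assumptions.
    thesaurus = ["搜索引擎", "垂直搜索", "正向最大匹配算法", "中文分词"]
    print(derive_max_length(thesaurus))   # 8
    print(unmatchable(thesaurus, 4))      # ['正向最大匹配算法']
```

A fixed guess of 4 silently excludes the eight-character term. Deriving the bound from the thesaurus guarantees that every entry fits inside the matching window.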
Abstract
The invention discloses a forward word segmentation method and device based on Chinese retrieval, relating to the field of processing the text information of web pages in computer networks. Word segmentation is performed on a Chinese character string S = C1C2C3C4...Cn by a device composed of a central processing unit, input/output equipment, a register, a machine dictionary, a window counter and a memory. The method and device aim to solve the problems of the standalone forward maximum matching algorithm. In a search system, and particularly in a vertical search system, the professional environment is fully exploited by building professional thesauri into the machine dictionary. The value of MAX_Length is then determined from the maximum length of the proper nouns in these thesauri, so the maximum length no longer has to be chosen blindly for the matching algorithm. Combining this bound with the forward maximum matching algorithm yields the forward matching algorithm, which greatly improves retrieval accuracy.
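The scan over S = C1C2...Cn can be sketched as classic forward maximum matching. This is a minimal sketch, not the patented device. It assumes the window counter starts at MAX_Length and shrinks by one character until the candidate is a dictionary word or a single character, and that results are accumulated in a list standing in for the memory. The CPU, register and I/O components are not modeled. The function name `forward_max_match` and the sample dictionary are hypothetical. For brevity, the demo derives MAX_Length from every dictionary entry rather than from proper nouns only.

```python
def forward_max_match(text: str, dictionary: set[str], max_length: int) -> list[str]:
    """Segment S = C1 C2 ... Cn by forward maximum matching (illustrative sketch)."""
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    words: list[str] = []   # segmented output (stands in for the device's memory)
    i = 0                   # position of the next unsegmented character
    n = len(text)
    while i < n:
        # Window counter: start at MAX_Length, or at the remaining length if shorter.
        window = min(max_length, n - i)
        # Shrink the window from the right until the candidate is a dictionary word;
        # a single character is accepted as a word even if it is not in the dictionary.
        while window > 1 and text[i:i + window] not in dictionary:
            window -= 1
        words.append(text[i:i + window])
        i += window         # resume scanning right after the matched word
    return words


if __name__ == "__main__":
    # Hypothetical machine dictionary containing a few domain terms.
    dictionary = {"中文", "分词", "中文分词", "搜索", "引擎", "搜索引擎"}
    max_length = max(len(word) for word in dictionary)  # 4
    print(forward_max_match("中文分词搜索引擎", dictionary, max_length))
    # ['中文分词', '搜索引擎']
    print(forward_max_match("中文分词搜索引擎", dictionary, 2))
    # ['中文', '分词', '搜索', '引擎']
```

The second call shows the problem the abstract describes. When the window bound is shorter than the longest dictionary term, the four-character terms are split into fragments.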
Description
Technical Field
The invention relates to the field of processing the text information of web pages in computer networks, and in particular to a method and device for retrieving Chinese web pages in search engines.

Background
With the continuous development of the Internet, the number of web pages has grown dramatically, and web pages have become people's largest and broadest source of information. Much useful information is buried in this vast number of pages, and faced with such a volume of information, people can no longer rely on processing all of it manually. Text search is one of the important application technologies in large-scale information processing and an important research direction in the field. As research into text classification and search technology deepens, text search is used ever more widely in information technology. Word segmentation technology is the "...
Application Information
IPC(8): G06F17/27; G06F17/30
Inventors: 刘迎春, 魏华峰, 方筠捷
Owner JIANGSU XINRUIFENG INFORMATION TECH



