Backward word segmentation method and device based on Chinese retrieval

A word segmentation method based on word segmentation technology, applied in the field of Chinese web page retrieval in search engines.

Status: Inactive; Publication Date: 2014-01-29
JIANGSU XINRUIFENG INFORMATION TECH
Cites: 4; Cited by: 3

AI Technical Summary

Problems solved by technology

[0017] The present invention addresses the problems of using the reverse maximum matching algorithm on its own. In a search system, and especially in a vertical search system, it makes full use of the professional domain to build a professional thesaurus in the machine dictionary. The maximum length of the terms in that thesaurus determines the value of MAX_Length, which removes the need to choose the maximum length blindly in the matching algorithm. Combined with the reverse maximum matching algorithm, this forms a reverse matching algorithm that greatly improves retrieval accuracy.
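The core idea of [0017], deriving MAX_Length from the longest entry of a professional thesaurus and then running reverse maximum matching, can be sketched as below. This is a minimal sketch under our own assumptions: the function names, the word lists, and the textbook sentence 南京市长江大桥 ("Nanjing Yangtze River Bridge") are illustrative and not taken from the patent, and the patent's further improvements for ambiguity and incomplete matching are not modeled.

```python
# Sketch (hypothetical names and data): reverse maximum matching (RMM) whose
# window size MAX_Length is taken from the longest proper noun in a
# professional thesaurus. Not the patent's own implementation.

def max_length_from_thesaurus(thesaurus):
    """MAX_Length = length of the longest term in the thesaurus (at least 1)."""
    return max((len(term) for term in thesaurus), default=1)


def reverse_max_match(text, dictionary, max_len):
    """Segment `text` from right to left, at each step taking the longest
    dictionary word (up to max_len characters) that ends at the current
    position; characters covered by no word become single-character tokens."""
    max_len = max(1, max_len)  # guard against an empty window
    tokens = []
    end = len(text)
    while end > 0:
        # Widest window first, then shrink it by one character at a time.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                end -= size
                break
    tokens.reverse()  # tokens were collected right to left
    return tokens


if __name__ == "__main__":
    general = {"南京", "市长", "长江", "大桥"}   # hypothetical general words
    thesaurus = {"南京市", "长江大桥"}           # hypothetical proper nouns
    dictionary = general | thesaurus
    text = "南京市长江大桥"

    print(reverse_max_match(text, dictionary, max_length_from_thesaurus(thesaurus)))
    # ['南京市', '长江大桥']
    print(reverse_max_match(text, dictionary, 2))
    # ['南京', '市', '长江', '大桥']
```

With MAX_Length set from the thesaurus (4 here), both proper nouns survive intact; with a blindly chosen window of 2, neither can ever be matched, which is the problem [0017] describes.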

Embodiment Construction

[0040] The following uses the word segmentation of the sentence "focus on strengthening the management of natural gas transportation engineering" as a specific example of how the improved reverse maximum length matching improves word segmentation. The ordinary reverse maximum length matching algorithm segments it as "focus / strengthen / natural gas / transportation / engineering / management". In the design of gas field surface engineering, however, "natural gas transportation engineering" is itself a professional term and a focus of petroleum industry research. If it is split into "natural gas / transportation / engineering" and each part is matched separately, the search results will not meet users' expectations. Clearly, in this case the ordinary reverse maximum length matching algorithm cannot produce the best segmentation. Then the "window" matching met...
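The contrast in this example can be reproduced, under stated assumptions, with plain reverse maximum matching. The Chinese sentence 重点加强天然气输送工程管理 and both word lists are our hypothetical reconstruction of the translated example, not text from the patent, and the truncated "window" matching method is not implemented. The sketch only shows the effect of adding a professional term to the dictionary and deriving MAX_Length from it, as described in the summary.

```python
# Hypothetical illustration of the [0040] example. The sentence and both
# dictionaries are our own reconstruction; the patent's "window" matching
# step is NOT implemented here.

def reverse_max_match(text, dictionary, max_len):
    """Plain reverse (backward) maximum matching, scanning right to left."""
    max_len = max(1, max_len)
    tokens, end = [], len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # Fall back to a single character when nothing longer matches.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                end -= size
                break
    return tokens[::-1]


# General-purpose words (hypothetical).
general = {"重点", "加强", "天然气", "输送", "工程", "管理"}
# Professional thesaurus for gas field surface engineering (hypothetical).
professional = {"天然气输送工程"}

sentence = "重点加强天然气输送工程管理"

# Ordinary RMM: general dictionary only; MAX_Length = its longest word (3).
ordinary = reverse_max_match(sentence, general, max(map(len, general)))

# Thesaurus-driven RMM: MAX_Length = longest professional term (7).
combined = general | professional
with_thesaurus = reverse_max_match(sentence, combined, max(map(len, professional)))

print(" / ".join(ordinary))        # 重点 / 加强 / 天然气 / 输送 / 工程 / 管理
print(" / ".join(with_thesaurus))  # 重点 / 加强 / 天然气输送工程 / 管理
```

The only differences between the two runs are the professional entry and the resulting MAX_Length of 7. Whether the patent's window step behaves the same way on this input cannot be confirmed from the truncated text.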

Abstract

The invention provides a backward word segmentation method and device based on Chinese retrieval, relating to the processing of web page text information in computer networks. Professional word banks are established in a machine dictionary, and the value of MAX_Length is first determined from the maximum length of the proper nouns in those word banks. A backward matching algorithm is then built on the backward maximum matching algorithm, and the maximum length matching algorithm is improved to resolve word segmentation ambiguity and incomplete matching during backward matching. Word segmentation is performed on a Chinese character string S = C1C2C3C4...Cn by a device composed of a central processing unit, input/output equipment, a register, a machine dictionary, a window counter and a memory. Chinese character strings can thus be segmented accurately without losing their semantics, the segmentation remains accurate even for long sentences, and search accuracy is improved. The method and device can be applied to automatic abstracting and sorting systems in the field of information retrieval.

Description

Technical field
[0001] The invention relates to text information processing of web pages in computer networks, in particular to a method and device for Chinese web page retrieval in search engines.
Background art
[0002] With the continuous development of the Internet, the number of web pages has grown dramatically, and web pages have become people's largest and broadest source of information. Much useful information is buried in this vast number of pages, and people can no longer rely on processing it all manually. Text search is one of the key application technologies in large-scale information processing and an important research direction in the field. As research on text classification and search technology deepens, text search is used ever more widely in information technology. The word segmentation tec...

Application Information

IPC(8): G06F17/30
CPC: G06F16/353
Inventors: 刘迎春, 魏华峰, 方筠捷
Owner: JIANGSU XINRUIFENG INFORMATION TECH