Recursive and multilevel Chinese word segmentation method

A multi-level Chinese word segmentation technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It can solve problems such as segmentation granularity that is too large or too small and long cycles, achieving the effects of improved accuracy, eliminated ambiguity, and ensured segmentation granularity.

Status: Inactive | Publication Date: 2015-02-18
SHANGHAI LAISEEK CO LTD +1
Cites: 3 | Cited by: 0

AI Technical Summary

Problems solved by technology

The Chinese word segmentation method based on string matching is efficient, flexible to update and maintain, can carry type information, and has some ability to resolve ambiguity, but it recognizes unregistered (out-of-vocabulary) words poorly.
Coarse-grained segmentation can eliminate ambiguity, but its segmentation granularity is large. Fine-grained segmentation yields a small segmentation granularity, but it cannot eliminate ambiguity.
The statistics-based word segmentation method learns from a corpus the statistical information on how characters combine into words, and thereby discovers word-formation rules. It recognizes unregistered words well, but it is inefficient, troublesome to update and maintain, and has a long cycle. Its segmentation granularity is also biased toward fine-grained segmentation.
Many current natural language processing and search engine applications, weighing the dual requirements of segmentation granularity and segmentation efficiency, adopt a string-matching word segmentation method with fine segmentation granularity. The resulting segmentation contains multiple levels, but it provides no disambiguation.
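
As a rough illustration of the last point, the sketch below lists every dictionary word that occurs in a text, which is one common way to obtain fine-grained, multi-level output from string matching. The function name all_dictionary_matches, the toy dictionary, and the sample sentence are assumptions made for illustration and do not come from the patent. The output contains overlapping candidates, and nothing in the procedure chooses between them.

```python
def all_dictionary_matches(text, dictionary):
    """Return every (start, word) pair where a multi-character dictionary word occurs."""
    max_len = max((len(w) for w in dictionary), default=0)
    matches = []
    for start in range(len(text)):
        for length in range(2, max_len + 1):
            candidate = text[start:start + length]
            # Skip slices that ran off the end of the text.
            if len(candidate) == length and candidate in dictionary:
                matches.append((start, candidate))
    return matches


# Hypothetical toy dictionary and sentence, chosen only to show an overlap.
dictionary = {"研究", "研究生", "生命", "起源"}
matches = all_dictionary_matches("研究生命起源", dictionary)
for start, word in matches:
    print(start, word)
```

Here "研究生" (positions 0 to 2) and "生命" (positions 2 to 3) both claim the character at position 2, so the two candidates are ambiguous and the enumeration alone does not resolve the conflict.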

Embodiment Construction

[0015] Embodiments of the present invention will be specifically described below in conjunction with the accompanying drawings.

[0016] A recursive multi-level Chinese word segmentation method comprises the following steps:

[0017] Step 1: using the current dictionary tree (trie), apply the maximum matching algorithm to segment the input Chinese text, generating the current segmented words and the current segmentation level;

[0018] Step 2: selectively mask, in the current dictionary tree, the segmented words generated in Step 1;

[0019] Step 3: take the dictionary tree selectively masked in Step 2 as the current dictionary tree;

[0020] Step 4: determine whether each segmented word generated in Step 1 has a non-single-character prefix word in the current dictionary tree. If any segmented word has a non-single-character prefix word, repeat Steps 1 to 3; if no segmented word has a non-single-character prefix word...
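
The loop formed by Steps 1 to 4 can be sketched roughly as follows. This is a minimal sketch, not the patented implementation. The excerpt does not say which words are "selectively" masked, so the sketch assumes that every multi-character word produced at a level is masked. It also assumes forward maximum matching and treats a "prefix word" as a proper prefix of at least two characters that is an unmasked dictionary entry. The names Trie, max_match_with_trie, and recursive_segment and the toy dictionary are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False
        self.masked = False


class Trie:
    def __init__(self, words):
        self.root = TrieNode()
        for word in words:
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.is_word = True

    def longest_match(self, text, start):
        """Length of the longest unmasked word at text[start:], or 1 if none."""
        node, best = self.root, 1
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word and not node.masked:
                best = i - start + 1
        return best

    def has_multichar_prefix_word(self, word):
        """True if a proper prefix of length >= 2 is an unmasked word."""
        node = self.root
        for i, ch in enumerate(word[:-1], start=1):
            node = node.children.get(ch)
            if node is None:
                return False
            if i >= 2 and node.is_word and not node.masked:
                return True
        return False

    def set_mask(self, word, masked):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return
        if node.is_word:
            node.masked = masked


def max_match_with_trie(trie, text):
    """Step 1: forward maximum matching against the current (masked) trie."""
    words, i = [], 0
    while i < len(text):
        n = trie.longest_match(text, i)
        words.append(text[i:i + n])
        i += n
    return words


def recursive_segment(trie, text):
    """Return one word list per level; the trie is restored before returning."""
    levels, masked = [], []
    try:
        while True:
            words = max_match_with_trie(trie, text)      # Step 1
            levels.append(words)
            for w in words:                               # Step 2 (assumed rule)
                if len(w) > 1:
                    trie.set_mask(w, True)
                    masked.append(w)
            # Step 3 is implicit: the masked trie is now the current trie.
            if not any(trie.has_multichar_prefix_word(w) for w in words):  # Step 4
                break
    finally:
        for w in masked:                                  # undo all masking
            trie.set_mask(w, False)
    return levels


# Hypothetical toy dictionary, for illustration only.
trie = Trie(["中华人民共和国", "中华人民", "中华", "人民", "共和国", "共和"])
levels = recursive_segment(trie, "中华人民共和国")
for depth, words in enumerate(levels, start=1):
    print(depth, "/".join(words))
```

Under these assumptions the loop terminates, because each repeated round masks at least one previously unmasked dictionary word. Restoring the masks in a finally block mirrors the abstract's requirement that the dictionary tree be recovered after segmentation ends.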

Abstract

The invention discloses a recursive multi-level Chinese word segmentation method comprising the following steps: performing Chinese word segmentation on an input Chinese text using a maximum matching algorithm; selectively masking the generated segmented words in the current dictionary tree; repeatedly segmenting the input Chinese text using the masked dictionary tree and the maximum matching algorithm and selectively masking the generated segmented words, until no generated segmented word has a non-single-character prefix word in the current dictionary tree; then ending the segmentation process, outputting the segmentation result, and restoring the dictionary tree to its state before masking. The method combines recursive, multi-level segmentation with a fine-grained Chinese word segmentation algorithm based on string matching, so that segmentation granularity is ensured, ambiguity is eliminated at each level of segmentation, and segmentation accuracy is improved.

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a recursive multi-level Chinese word segmentation method.

Background art

[0002] Chinese word segmentation refers to dividing a sequence of Chinese characters into individual words; that is, it is the process of recombining a continuous sequence of Chinese characters into a sequence of words according to certain specifications. Existing Chinese word segmentation algorithms can be roughly divided into word segmentation methods based on string matching and word segmentation methods based on statistics. A method based on string matching matches the Chinese character sequence against the entries of a "sufficiently large" dictionary. If a string is found in the dictionary, the match succeeds, that is, a word is recognized. According to different scanning directions, string matching word segmentation methods can be divide...
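
To make the dictionary-matching idea and the role of scanning direction concrete, here is a minimal sketch of forward and backward maximum matching over a plain set-based dictionary. The excerpt is cut off before it names the directions, so the choice of these two variants, the function names, the toy dictionary, and the sample sentence are all assumptions for illustration.

```python
def forward_maximum_matching(text, dictionary, max_len):
    """Scan left to right, taking the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback.
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


def backward_maximum_matching(text, dictionary, max_len):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for length in range(min(max_len, j), 0, -1):
            candidate = text[j - length:j]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                j -= length
                break
    words.reverse()
    return words


# Hypothetical toy dictionary and sentence.
dictionary = {"研究", "研究生", "生命", "起源"}
text = "研究生命起源"
forward = forward_maximum_matching(text, dictionary, max_len=3)
backward = backward_maximum_matching(text, dictionary, max_len=3)
print("forward: ", "/".join(forward))
print("backward:", "/".join(backward))
```

With this dictionary the forward scan yields 研究生/命/起源 while the backward scan yields 研究/生命/起源, which shows that the scanning direction alone can change how an ambiguous string is segmented.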

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
Inventor: 吕强陶导方强
Owner: SHANGHAI LAISEEK CO LTD