
Method and device for Chinese word segmentation

A Chinese word segmentation method and device, applied in special data processing applications, instruments, electronic digital data processing, and similar fields. It addresses the problems of complex grammar rules, reduced segmentation efficiency, and degraded segmentation accuracy, with the effect of improving segmentation speed, contextual relevance, and accuracy.

Active Publication Date: 2018-04-17
彩讯科技股份有限公司
Cites: 9 | Cited by: 13

AI Technical Summary

Problems solved by technology

Complex word segmentation algorithms involve a large amount of computation and relatively complex grammatical rules, so they tend to lose segmentation efficiency in exchange for segmentation accuracy.
However, if only one or two of these word segmentation algorithms are used, the amount of computation drops and the operation speed improves, but the contextual factors considered during segmentation are incomplete.
Segmentation accuracy therefore suffers.
As a result, the word segmentation methods provided by the existing technology cannot deliver both accuracy and speed.



Examples


Embodiment 1

[0026] Figure 1 is a schematic flowchart of a Chinese word segmentation method provided by Embodiment 1 of the present invention. The method of this embodiment can be performed by a Chinese word segmentation device, which can be implemented in software and/or hardware and can generally be integrated into a search engine with a word segmentation function. As shown in Figure 1, the method may include:

[0027] S110: Obtain feature information of the text to be segmented.

[0028] The text to be segmented may be a single sentence, a paragraph composed of multiple sentences, or an article composed of multiple paragraphs. For Chinese, there is at least one criterion for the preliminary division of text; for example, the text can be divided according to one or more control characters such as paragraph breaks, punctuation marks, and spaces. These control characters can be used as feature information of the text, a...
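As a rough illustration of using paragraph breaks, punctuation marks, and spaces as feature information, the following minimal Python sketch splits a text into natural intervals. The delimiter set `DELIMITERS` and the function name `natural_intervals` are assumptions made for this example; the excerpt does not list the exact characters the method uses.

```python
import re

# Assumed delimiter set: any whitespace (including line breaks) plus a few
# common Chinese and ASCII punctuation marks. Not taken from the patent text.
DELIMITERS = re.compile(r"[\s，。！？；：、,.!?;:]+")


def natural_intervals(text: str) -> list[str]:
    """Split `text` at runs of delimiters and drop the empty pieces."""
    return [piece for piece in DELIMITERS.split(text) if piece]
```

For example, `natural_intervals("今天天气很好，我们去公园。")` yields the two intervals `今天天气很好` and `我们去公园`. Each interval still has to be segmented into words; the split only narrows where word boundaries can fall.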

Embodiment 2

[0058] Figure 2 is a schematic flowchart of a Chinese word segmentation method provided by Embodiment 2 of the present invention. Embodiment 2 optimizes the above embodiment. With reference to Figure 2, Embodiment 2 specifically includes:

[0059] S210: Obtain feature information of the text to be segmented.

[0060] S220: Determine all natural intervals in the text to be segmented according to the feature information.

[0061] S230: Divide the natural intervals into ambiguous intervals and non-ambiguous intervals.

[0062] S240: Determine the candidate words in the ambiguous interval, and judge whether the candidate words match the text in the non-ambiguous interval. If yes, execute step S250; otherwise, execute step S270.

[0063] For example, after the candidate words in the ambiguous interval are determined, they can be stored sequentially in an ambiguous-interval linked list, so as to provide convenienc...
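Steps S230 and S240 can be sketched in Python under several stated assumptions, since the excerpt does not say how ambiguity is detected or what steps S250 and S270 do. Here an interval is treated as ambiguous when a toy dictionary (`LEXICON`, hypothetical) allows more than one segmentation; a candidate word "matches" when it is one of the words of a non-ambiguous interval; and the final choice simply prefers the segmentation with the most matched words. A plain Python list stands in for the linked list, and all function names are illustrative.

```python
# Hypothetical toy dictionary; a real system would load a full lexicon.
LEXICON = {"研究", "研究生", "生命", "命", "起源"}


def segmentations(text: str, lexicon: set[str]) -> list[list[str]]:
    """Enumerate every way to cover `text` with dictionary words.

    Exponential in the worst case; acceptable only for short intervals.
    """
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


def is_ambiguous(interval: str, lexicon: set[str]) -> bool:
    """Assumed criterion: more than one dictionary segmentation exists."""
    return len(segmentations(interval, lexicon)) > 1


def candidate_words(interval: str, lexicon: set[str]) -> list[str]:
    """Dictionary words found in the interval, ordered by start position."""
    found = []
    for start in range(len(interval)):
        for end in range(start + 1, len(interval) + 1):
            word = interval[start:end]
            if word in lexicon and word not in found:
                found.append(word)
    return found


def context_words(non_ambiguous: list[str], lexicon: set[str]) -> set[str]:
    """Words of every non-ambiguous interval that has exactly one segmentation."""
    words = set()
    for interval in non_ambiguous:
        options = segmentations(interval, lexicon)
        if len(options) == 1:
            words.update(options[0])
    return words


def match_candidates(candidates: list[str], context: set[str]) -> dict[str, bool]:
    """Record, for each candidate, whether it also occurs in the context."""
    return {word: word in context for word in candidates}


def choose_segmentation(interval: str, non_ambiguous: list[str],
                        lexicon: set[str]) -> list[str]:
    """Assumed rule: keep the segmentation with the most context-matched words."""
    options = segmentations(interval, lexicon)
    if not options:
        return [interval]  # no dictionary cover; leave the interval whole
    context = context_words(non_ambiguous, lexicon)
    return max(options, key=lambda seg: sum(word in context for word in seg))
```

With the toy dictionary, the interval `研究生命起源` has two segmentations (`研究 / 生命 / 起源` and `研究生 / 命 / 起源`). If the surrounding text contains the non-ambiguous interval `生命`, the candidate `生命` matches and the first segmentation is preferred.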

Embodiment 3

[0081] Figure 3 is a structural block diagram of a Chinese word segmentation device provided by Embodiment 3 of the present invention. The device can be implemented in software and/or hardware, and can generally be integrated into a system with a word segmentation function. As shown in Figure 3, the device includes: a feature information acquisition module 310, a natural interval determination module 320, an interval division module 330, a candidate word matching module 340, and a word segmentation processing module 350.

[0082] The feature information acquisition module 310 is used to acquire feature information of the text to be segmented; the natural interval determination module 320 is used to determine all natural intervals in the text to be segmented according to the feature information, wherein the feature information includes at least one of paragraph division, punctuation marks, or blanks; the interval division module 330 is used to divi...
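One way to picture how the five modules fit together is a small Python container that wires them in order. This is only a structural sketch: the class name `SegmentationDevice`, the field names, and the callable signatures are assumptions, and the behaviour of each module is supplied by the caller because the excerpt does not describe it in full.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SegmentationDevice:
    """Hypothetical wiring of modules 310-350; each field is one module."""
    acquire_features: Callable[[str], list[str]]                           # 310
    determine_intervals: Callable[[str, list[str]], list[str]]             # 320
    divide_intervals: Callable[[list[str]], tuple[list[str], list[str]]]   # 330
    match_candidates: Callable[[str, list[str]], dict[str, bool]]          # 340
    segment: Callable[[str, dict[str, bool]], list[str]]                   # 350

    def run(self, text: str) -> dict[str, list[str]]:
        """Pass the text through the modules in numerical order."""
        features = self.acquire_features(text)
        intervals = self.determine_intervals(text, features)
        ambiguous, non_ambiguous = self.divide_intervals(intervals)
        return {
            interval: self.segment(
                interval, self.match_candidates(interval, non_ambiguous)
            )
            for interval in ambiguous
        }
```

Keeping each module as an injected callable mirrors the statement that the device can be realized in software and/or hardware: any component that honours the assumed signature can be swapped in without changing the flow in `run`.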



Abstract

The embodiments of the invention disclose a method and a device for Chinese word segmentation. The method comprises: acquiring feature information of a text to be segmented, wherein the feature information includes at least one of a paragraph division, a punctuation mark, or a space character; determining all natural intervals in the text to be segmented according to the feature information; dividing the natural intervals into ambiguous intervals and non-ambiguous intervals; determining candidate words in an ambiguous interval and matching them against the text in the non-ambiguous intervals; determining a word segmentation rule for the candidate words according to the matching result; and segmenting the text in the ambiguous interval according to that rule. This technical scheme effectively improves the relevance between the segmentation result and the context of the text to be segmented, and thereby improves segmentation accuracy. Compared with the word segmentation schemes of the prior art, it requires less computation and can speed up word segmentation to a certain extent.

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of word segmentation, and in particular to a Chinese word segmentation method and device.

Background technique

[0002] With the rapid development of the Internet, network applications have diversified and the amount of information on the Internet has increased dramatically. Word segmentation is the basis of information processing and information retrieval, and all such work is performed after word segmentation. Word segmentation errors therefore propagate into subsequent processing and are difficult to eliminate. For this reason, the pursuit of word segmentation accuracy is a continuous process.

[0003] Under normal circumstances, in English writing, spaces are used as natural delimiters between words. Words, sentences, and paragraphs in Chinese can also be demarcated simply by obvious delimite...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/289
Inventor: 杨良志, 汪志新, 丁德平, 王向军
Owner: 彩讯科技股份有限公司