Method and device for Chinese word segmentation
A Chinese word segmentation method and device, applicable to special-purpose data processing applications, instruments, and electronic digital data processing. It addresses the problems of complex grammar rules, reduced word segmentation efficiency, and impaired word segmentation accuracy, with the effect of improving segmentation speed, relevance, and accuracy.
Examples
Embodiment 1
[0026] Figure 1 is a schematic flowchart of a Chinese word segmentation method provided by Embodiment 1 of the present invention. The method of this embodiment can be performed by a Chinese word segmentation device, which can be implemented in software and/or hardware and can generally be integrated into a search engine with a word segmentation function. As shown in Figure 1, the method may include:
[0027] S110: Obtain feature information of the text to be segmented.
[0028] The text to be segmented may be a single sentence, a paragraph composed of multiple sentences, or an article composed of multiple paragraphs. For Chinese, there is at least one criterion for preliminary division of the text; for example, the text can be divided according to one or more control characters such as paragraph breaks, punctuation marks, and spaces. The above-mentioned control characters can be used as feature information of the text, a...
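The preliminary division described in this paragraph can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `split_on_features` and the exact delimiter set are assumptions, since the text names only the categories (paragraph division, punctuation marks, spaces) and not specific characters.

```python
import re
from typing import List

# Assumed delimiter set: common Chinese and ASCII punctuation plus any
# whitespace (spaces and line breaks). The source lists only the categories.
DELIMITERS = re.compile(r"[，。！？；：、,.!?;:\s]+")


def split_on_features(text: str) -> List[str]:
    """Split text at runs of delimiter characters and drop empty pieces."""
    return [piece for piece in DELIMITERS.split(text) if piece]


if __name__ == "__main__":
    sample = "今天天气很好，我们去公园。\n明天 再说"
    for piece in split_on_features(sample):
        print(piece)
```

Treating a run of consecutive delimiters as a single boundary avoids producing empty pieces when, for example, a full stop is followed by a line break.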
Embodiment 2
[0058] Figure 2 is a schematic flowchart of a Chinese word segmentation method provided by Embodiment 2 of the present invention. Embodiment 2 is an optimization of the above embodiment. With reference to Figure 2, Embodiment 2 of the present invention specifically includes:
[0059] S210: Obtain feature information of the text to be segmented.
[0060] S220: Determine all natural intervals in the text to be segmented according to the feature information.
[0061] S230: Divide the natural interval into ambiguous intervals and non-ambiguous intervals.
[0062] S240: Determine the candidate words in the ambiguous interval, and judge whether the candidate words match the text in the non-ambiguous interval; if yes, execute step S250; otherwise, execute step S270.
[0063] For example, after the candidate words in the ambiguous interval are determined, they can be stored sequentially in an ambiguous-interval linked list, so as to provide convenienc...
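One way to picture the candidate words of an ambiguous interval being collected in order is sketched below. The text above does not say how candidate words are determined, so the dictionary lookup, the `lexicon` set, the `max_len` limit, and the function name `find_candidates` are all assumptions made for illustration. A `collections.deque` stands in for the linked list. The matching test of S240 and the later steps are not modeled.

```python
from collections import deque
from typing import Deque, Set, Tuple


def find_candidates(
    interval: str, lexicon: Set[str], max_len: int = 4
) -> Deque[Tuple[int, str]]:
    """Collect (start offset, word) pairs for every lexicon word in the interval.

    Candidates are appended in order of start offset, then increasing length,
    so the resulting sequence can be traversed front to back.
    """
    candidates: Deque[Tuple[int, str]] = deque()
    for start in range(len(interval)):
        longest_end = min(start + max_len, len(interval))
        for end in range(start + 1, longest_end + 1):
            word = interval[start:end]
            if word in lexicon:  # assumed dictionary lookup
                candidates.append((start, word))
    return candidates


if __name__ == "__main__":
    # Hypothetical lexicon; the overlapping entries create the ambiguity.
    lexicon = {"研究", "研究生", "生命"}
    for offset, word in find_candidates("研究生命", lexicon):
        print(offset, word)
```

In this example the words 研究生 and 生命 overlap on the character 生, which is the kind of overlap that makes an interval ambiguous under the dictionary-based assumption used here.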
Embodiment 3
[0081] Figure 3 is a structural block diagram of a Chinese word segmentation device provided by Embodiment 3 of the present invention. The device can be implemented in software and/or hardware, and can generally be integrated into a word segmentation system with a word segmentation function. As shown in Figure 3, the device includes: a feature information acquisition module 310, a natural interval determination module 320, an interval division module 330, a candidate word matching module 340, and a word segmentation processing module 350.
[0082] The feature information acquisition module 310 is used to acquire feature information of the text to be segmented; the natural interval determination module 320 is used to determine all natural intervals in the text to be segmented according to the feature information, wherein the feature information includes at least one of paragraph division, punctuation marks, or blanks; the interval division module 330 is used to divi...