Chinese word cutting method and device

A Chinese word segmentation technology, applied in special data processing applications, instruments, electronic digital data processing, etc., that addresses the problem of word segmentation processing being complex and inconvenient to implement, and achieves a simpler word segmentation process with reduced complexity.

Status: Inactive; Publication Date: 2007-11-14
TENCENT TECH (SHENZHEN) CO LTD
Cites: 0; Cited by: 9

AI Technical Summary

Problems solved by technology

[0018] From the above analysis, the existing shortest-path word segmentation method must convert the original word segmentation set obtained from the dictionary into a directed acyclic graph and then invoke a shortest-path algorithm, which makes the word segmentation process complicated and inconvenient to implement.

Examples

Detailed Description of Embodiments

[0051] The Chinese word segmentation method provided by the embodiment of the present invention selects m words (m ≤ n) from a word segmentation set containing n words, such that the m words, connected end to end, form the complete sentence with no redundant characters. The word segmentation set in question is therefore usually a fine-grained set that contains redundancy. Each word in the word segmentation set may also be called a segment.

[0052] Here, a fine-grained word segmentation set with redundancy usually means the set obtained by applying the full segmentation method to a sentence. The full segmentation method extracts every possible word in the sentence that matches a dictionary entry.
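The full segmentation step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `full_segmentation`, the `(start, end, word)` tuple layout, and the toy dictionary are assumptions introduced here.

```python
def full_segmentation(sentence, dictionary):
    """Return every substring of `sentence` that is a dictionary entry.

    Each hit is a (start, end, word) tuple with `end` exclusive, so overlapping
    and nested words are all kept (the set is redundant on purpose).
    Hypothetical helper; the name and tuple layout are not from the source.
    """
    # No dictionary word is longer than max_len, so longer spans can be skipped.
    max_len = max((len(w) for w in dictionary), default=0)
    candidates = []
    for start in range(len(sentence)):
        last_end = min(len(sentence), start + max_len)
        for end in range(start + 1, last_end + 1):
            word = sentence[start:end]
            if word in dictionary:
                candidates.append((start, end, word))
    return candidates


# Toy dictionary chosen for illustration only.
toy_dictionary = {"研究", "研究生", "生", "生命", "命"}
candidates = full_segmentation("研究生命", toy_dictionary)
# candidates == [(0, 2, "研究"), (0, 3, "研究生"), (2, 3, "生"), (2, 4, "生命"), (3, 4, "命")]
```

The result is already ordered by position in the sentence because the outer loop walks start positions from left to right.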

[0053] For example, for the sentence "what he said is indeed reasonable", the word segmentation set obtained by using the full segmentation and s...

Abstract

The invention discloses a Chinese word segmentation method, including: assigning a weight to each word, and sorting the words in the word segmentation set by their position in the sentence; starting from the last word in the set, recording the sum of the current word's weight and the distance from the previous word to the end of the sentence as the distance from the current word to the end of the sentence, and marking the splicing relationship between the current word and the previous word, until the distance from the first word in the set to the end of the sentence is obtained, where the previous word is a word that is spliced end to end with the current word; then selecting from the set the sentence-initial word with the shortest distance to the end of the sentence as the first word of the sentence and, starting from that word, following the marked splicing relationships to obtain each subsequent word in turn until the end of the sentence is reached. The invention also discloses a Chinese word segmentation device. The above method and device reduce the complexity of word segmentation.
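One possible reading of the procedure in the abstract is sketched below, under stated assumptions. The function name `shortest_segmentation`, the `(start, end, word)` candidate layout, the default unit weight, and the rule of keeping the smallest distance when several words can follow the current one are all introduced here for illustration; the abstract does not specify them.

```python
def shortest_segmentation(sentence_len, candidates, weight=lambda word: 1.0):
    """Pick words that tile the sentence with the smallest total weight.

    candidates: (start, end, word) tuples with `end` exclusive and end > start.
    Hypothetical sketch; names, weights, and tie-breaking are assumptions.
    """
    ordered = sorted(candidates)  # sort by position in the sentence
    starts_at = {}
    for i, (start, _end, _word) in enumerate(ordered):
        starts_at.setdefault(start, []).append(i)

    inf = float("inf")
    dist = [inf] * len(ordered)       # distance from each word to the sentence end
    follower = [None] * len(ordered)  # marked splicing relationship

    # Walk from the last word back to the first. Any word that can follow
    # word i starts at a later position, so it has already been processed.
    for i in range(len(ordered) - 1, -1, -1):
        _start, end, word = ordered[i]
        if end == sentence_len:
            dist[i] = weight(word)  # the word itself reaches the sentence end
            continue
        for j in starts_at.get(end, []):
            if weight(word) + dist[j] < dist[i]:
                dist[i] = weight(word) + dist[j]
                follower[i] = j

    # Choose the sentence-initial word closest to the end, then follow the marks.
    heads = [i for i in starts_at.get(0, []) if dist[i] < inf]
    if not heads:
        return []  # no combination of candidates covers the whole sentence
    i = min(heads, key=lambda k: dist[k])
    words = []
    while i is not None:
        words.append(ordered[i][2])
        i = follower[i]
    return words


# Toy candidate set for the sentence "研究生命" (illustrative only).
toy_candidates = [(0, 2, "研究"), (0, 3, "研究生"), (2, 3, "生"), (2, 4, "生命"), (3, 4, "命")]
segmented = shortest_segmentation(4, toy_candidates)
# segmented == ["研究", "生命"]
```

With unit weights, both sentence-initial words in the toy set have distance 2, and the sketch keeps the first one in sorted order. A single backward pass over the sorted set fills in the distances and splicing marks, so no explicit graph structure or separate shortest-path routine is built, which appears to be the simplification the abstract is claiming.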

Description

Technical field

[0001] The invention relates to the field of Chinese information processing, in particular to a Chinese word segmentation method and device used in that field.

Background art

[0002] In Chinese, the smallest meaningful language component that can be used independently is the word. A word consists of one or more characters. Two-character words are generally the most common, followed by single-character words, and there are also some multi-character words (such as idioms and proper nouns). However, Chinese uses the character as its basic unit of writing, and there is no symbol between words, like the space in English, to mark word boundaries. Therefore, segmenting each sentence in a Chinese text, that is, having the machine automatically recognize the word boundaries in the sentence, is the first problem to be solved in Chinese text analysis and processing.

[0003] At present, commonly used word segmentation...

Application Information

IPC(8): G06F17/28
Inventor: 王启明
Owner: TENCENT TECH (SHENZHEN) CO LTD