Chinese word segmentation method based on hash table dictionary structure

A Chinese word segmentation method using a hash table dictionary, classified under special data processing applications, instruments, electrical digital data processing, etc. It addresses problems such as the organization of language information, and it improves overall efficiency, dictionary matching efficiency, and the speed of comparing scan results.
CN103646018A · Active · Publication Date: 2014-03-19 · DALIAN UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
DALIAN UNIV
Publication Date
2014-03-19


Abstract

The invention discloses a Chinese word segmentation method based on a hash table dictionary structure. The method comprises the following steps: A, preprocessing the document to be processed; B, performing forward maximum matching scanning segmentation and reverse maximum matching scanning segmentation on each processing block; C, comparing the results of the two scans of each processing block: if the two segmentation results are the same, outputting the forward segmentation result; if they differ, calculating, for the forward and the reverse maximum matching results respectively, the number of segments S, the number of single-character dictionary words D, the number of non-dictionary words N, and the maximum word length L; D, comparing and analyzing the data produced in step C according to the method and then outputting the correct result. The benefits of the method are that matching efficiency during segmentation is improved, the comparison of the forward and reverse scan results is faster, and the efficiency of the bidirectional maximum matching algorithm is fundamentally improved.
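Steps A to D can be illustrated with a minimal Python sketch of bidirectional maximum matching, covering the forward (positive) and reverse (negative) scans. Several parts are assumptions, since the abstract does not specify them: a plain Python `set` stands in for the hash table dictionary, processing blocks are assumed to be delimited by punctuation and whitespace, and the tie-breaking order in `segment` (fewer N, then fewer D, then fewer S, then larger L) is a hypothetical rule, not the patent's. All function names are illustrative.

```python
import re


def split_blocks(document):
    """Step A (assumed): split the document into blocks at punctuation/whitespace."""
    return [b for b in re.split(r"[，。！？；：、\s]+", document) if b]


def forward_max_match(text, dictionary, max_len):
    """Scan left to right, taking the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan can advance.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text, dictionary, max_len):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


def stats(words, dictionary):
    """Step C: the four counts named in the abstract."""
    return {
        "S": len(words),                                                  # segments
        "D": sum(1 for w in words if len(w) == 1 and w in dictionary),    # single-char dictionary words
        "N": sum(1 for w in words if w not in dictionary),                # non-dictionary words
        "L": max((len(w) for w in words), default=0),                     # maximum word length
    }


def segment(text, dictionary):
    """Steps B to D for one block; the tie-breaking order is an assumption."""
    max_len = max((len(w) for w in dictionary), default=1)
    fwd = forward_max_match(text, dictionary, max_len)
    rev = reverse_max_match(text, dictionary, max_len)
    if fwd == rev:
        return fwd
    f, r = stats(fwd, dictionary), stats(rev, dictionary)
    # Hypothetical rule: prefer fewer N, then fewer D, then fewer S, then larger L.
    key_f = (f["N"], f["D"], f["S"], -f["L"])
    key_r = (r["N"], r["D"], r["S"], -r["L"])
    return fwd if key_f < key_r else rev


if __name__ == "__main__":
    demo_dict = {"研究", "研究生", "生命", "命", "的", "起源"}
    for block in split_blocks("研究生命的起源。"):
        print(segment(block, demo_dict))
```

With this toy dictionary the two scans disagree on 研究生命的起源: the forward scan yields 研究生 / 命 / 的 / 起源 and the reverse scan yields 研究 / 生命 / 的 / 起源. Under the assumed rule the reverse result is chosen because it contains fewer single-character dictionary words.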

Description

Technical field

[0001] The invention relates to the technical field of Chinese information processing, in particular to a Chinese word segmentation method based on a hash table dictionary structure.

Background

[0002] Chinese word segmentation is the most basic and important problem in Chinese information processing. It is a key step in the automatic annotation of Chinese text, search engines, machine translation, speech recognition, etc., and the quality of word segmentation directly affects the accuracy of their results. Chinese word segmentation differs from English word segmentation: there is no formal delimiter between Chinese words, so a continuous sequence of Chinese characters can only be recombined into words according to certain Chinese norms. The complexity and variability of Chinese sentence composition have always made word segmentation a difficult point in Chinese information processing. The discovery of unregistered words and the resolution of ambigui...
