Chinese Pinyin fast word segmentation method based on a word search tree

A word search tree and Chinese Pinyin technology, applied in special-purpose data processing applications, instruments, electrical digital data processing, etc. It addresses problems such as low query performance, missed words, and low efficiency, with the effects of improving memory usage efficiency, ensuring accuracy, and improving search efficiency.

Active Publication Date: 2013-01-09
康威通信技术股份有限公司
Cites: 7; cited by: 14

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to solve the above problems by providing a fast Chinese Pinyin word segmentation method based on a word search tree. The method combines a search tree with hash tables, using a variant of the hash tree to segment Chinese Pinyin quickly. It avoids the problems of low query performance, low efficiency, and missed words, improves search efficiency, and achieves fast word segmentation.


Detailed Description of the Embodiments

[0040] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0041] As shown in Figure 1, a hash tree combining a lookup tree and hash tables is first built from the existing table of Chinese single-character Pinyin. A given string of continuous Chinese Pinyin is then segmented and the analysis result is output. Finally, the search tree is destroyed to release resources and reclaim memory.
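The three phases in [0041] (build, segment, destroy) can be sketched end to end as follows. This is a minimal Python sketch under stated assumptions, not the patent's implementation: the names `Node`, `build_tree`, `syllable_ends`, `segment`, and `destroy` are hypothetical; the syllable list is a tiny illustrative subset rather than the full single-character Pinyin table; and because the visible text does not say how ambiguous prefixes such as "ban" versus "ba" are resolved, the segmentation step assumes longest match first with backtracking.

```python
from functools import lru_cache
from typing import List, Optional, Tuple

ALPHABET_SIZE = 26  # one table slot per letter a..z


class Node:
    __slots__ = ("children", "is_syllable")

    def __init__(self) -> None:
        self.children: Optional[List[Optional["Node"]]] = None  # leaves get no table
        self.is_syllable = False  # assumed end-of-syllable marker


def build_tree(syllables: List[str]) -> Node:
    """Phase 1: insert every single-character Pinyin syllable (lowercase a..z)."""
    root = Node()  # the root holds no character
    for syl in syllables:
        node = root
        for ch in syl:
            if not "a" <= ch <= "z":
                raise ValueError(f"not a lowercase Pinyin letter: {ch!r}")
            if node.children is None:
                node.children = [None] * ALPHABET_SIZE
            i = ord(ch) - ord("a")
            if node.children[i] is None:
                node.children[i] = Node()
            node = node.children[i]
        node.is_syllable = True
    return root


def syllable_ends(root: Node, text: str, start: int) -> List[int]:
    """Follow one path down the tree; record each index where a syllable ends."""
    ends: List[int] = []
    node = root
    i = start
    while i < len(text) and node.children is not None and "a" <= text[i] <= "z":
        child = node.children[ord(text[i]) - ord("a")]
        if child is None:
            break
        node = child
        i += 1
        if node.is_syllable:
            ends.append(i)
    return ends


def segment(root: Node, text: str) -> Optional[List[str]]:
    """Phase 2: longest match first, backtracking on dead ends (assumed strategy)."""

    @lru_cache(maxsize=None)  # memoize each start position to avoid rework
    def solve(start: int) -> Optional[Tuple[str, ...]]:
        if start == len(text):
            return ()
        for end in reversed(syllable_ends(root, text, start)):  # longest first
            rest = solve(end)
            if rest is not None:
                return (text[start:end],) + rest
        return None  # no valid segmentation from this position

    result = solve(0)
    return list(result) if result is not None else None


def destroy(root: Node) -> None:
    """Phase 3: unlink every node so Python can reclaim it (C would free() each)."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children is not None:
            stack.extend(c for c in node.children if c is not None)
            node.children = None


root = build_tree(["ba", "ban", "ni", "xi", "xian", "zhong", "guo", "ren"])
print(segment(root, "zhongguoren"))  # ['zhong', 'guo', 'ren']
print(segment(root, "bani"))  # ['ba', 'ni'] after backtracking from "ban"
destroy(root)
```

Walking the tree from a start position visits each character once and reports every syllable ending along that single path, so candidate lengths come out of one traversal instead of separate substring lookups. The memoized backtracking is an added assumption: pure greedy matching would take "ban" in "bani" and then fail on the leftover "i".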

[0042] Building the hash tree: a word lookup tree is built from the table of all known Chinese single-character Pinyin. The root node of the search tree contains no character, and every node other than the root contains exactly one character. The child nodes of any given node all contain different characters. Every node except the leaf nodes has a hash table of length 26, indexed by the 26 English letters in ascending order. Each element stores a chi...
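The node layout in [0042] maps naturally onto a 26-slot array per internal node. The sketch below assumes each slot holds a reference to a child node (the source text is truncated at this point) and adds a hypothetical `is_syllable` flag, because a syllable such as "xi" is also a prefix of "xian" and therefore need not end at a leaf. `TrieNode`, `slot`, `insert`, and `contains` are illustrative names only.

```python
from typing import List, Optional

ALPHABET_SIZE = 26  # one slot per English letter, in ascending order a..z


class TrieNode:
    __slots__ = ("children", "is_syllable")

    def __init__(self) -> None:
        # None until the first child is added, so leaf nodes carry no table.
        self.children: Optional[List[Optional["TrieNode"]]] = None
        # Hypothetical end marker: True if the root-to-here path spells a syllable.
        self.is_syllable = False


def slot(ch: str) -> int:
    """Map 'a'..'z' to table index 0..25."""
    if not "a" <= ch <= "z":
        raise ValueError(f"not a lowercase Pinyin letter: {ch!r}")
    return ord(ch) - ord("a")


def insert(root: TrieNode, syllable: str) -> None:
    node = root  # the root itself holds no character
    for ch in syllable:
        if node.children is None:
            node.children = [None] * ALPHABET_SIZE
        i = slot(ch)
        if node.children[i] is None:  # one child per letter, so siblings differ
            node.children[i] = TrieNode()
        node = node.children[i]
    node.is_syllable = True


def contains(root: TrieNode, syllable: str) -> bool:
    node = root
    for ch in syllable:
        if node.children is None or not "a" <= ch <= "z":
            return False
        child = node.children[slot(ch)]
        if child is None:
            return False
        node = child
    return node.is_syllable


root = TrieNode()
for s in ["xi", "xian", "xiang", "xiao"]:
    insert(root, s)
print(contains(root, "xian"), contains(root, "xia"))  # True False
```

Allocating the table lazily on the first child means leaf nodes never carry one, which matches the statement that every node except the leaves has a hash table.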

Abstract

The invention discloses a fast Chinese Pinyin word segmentation method based on a word search tree. The method is implemented on a computer or an embedded mobile device and comprises the following steps: 1, building a Chinese character Pinyin search tree from the complete list of known Chinese character Pinyin; 2, using the built word search tree combined with hash tables to segment a given string of Chinese Pinyin; 3, outputting the word segmentation result; and 4, destroying the search tree and releasing resources. Because character strings share common prefixes in the tree, construction space is saved and unnecessary string comparisons are greatly reduced; the redundant indexed hash tables improve search efficiency; and the time complexity of the algorithm is reduced to a minimum.

Description

Technical field

[0001] The invention belongs to the technical field of Chinese information processing on computers and hand-held embedded mobile devices, and in particular relates to a fast Chinese Pinyin word segmentation method based on a word search tree.

Background

[0002] Automatically recognizing the Pinyin of each individual character in a continuous string of Chinese Pinyin is an essential technology for Pinyin input methods and for search engines that associate Chinese sentences with Pinyin keywords. One existing approach uses all Chinese single-character Pinyin as keys to build a hash table and segments a continuous Pinyin string by repeatedly looking up and matching candidates in that table, but this method suffers from low efficiency. [0003] In order to improve efficiency, the above-mentioned hash table is improved as follows in the prior a...
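For contrast, the prior-art approach in [0002] can be sketched with a plain hash set. The patent does not spell out its probing order, so this sketch assumes longest-first probing (up to six letters, the length of the longest Pinyin syllables such as "zhuang") with backtracking. `segment_with_hash_table` is a hypothetical name, and the returned counter simply records how many substrings were probed.

```python
from functools import lru_cache
from typing import AbstractSet, List, Optional, Tuple

MAX_SYLLABLE_LEN = 6  # longest Pinyin syllables, e.g. "zhuang", have 6 letters


def segment_with_hash_table(
    syllables: AbstractSet[str], text: str
) -> Tuple[Optional[List[str]], int]:
    """Return (segmentation or None, number of hash-table probes)."""
    probes = 0

    @lru_cache(maxsize=None)
    def solve(start: int) -> Optional[Tuple[str, ...]]:
        nonlocal probes
        if start == len(text):
            return ()
        for length in range(min(MAX_SYLLABLE_LEN, len(text) - start), 0, -1):
            probes += 1
            piece = text[start:start + length]  # a fresh substring on every probe
            if piece in syllables:
                rest = solve(start + length)
                if rest is not None:
                    return (piece,) + rest
        return None

    result = solve(0)
    return (list(result) if result is not None else None), probes


words, probes = segment_with_hash_table(frozenset({"zhong", "guo", "ren"}), "zhongguoren")
print(words, probes)  # ['zhong', 'guo', 'ren'] 7
```

Each probe builds and hashes a new substring, and most probes miss; this repeated matching is the inefficiency the tree-based method sets out to avoid.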


Application Information

Patent type and authority: Application (China)
IPC(8): G06F17/30
Inventors: 于少飞, 袁美英, 杨震威
Owner: 康威通信技术股份有限公司