Method for managing and searching a dictionary with a perfect double-array TRIE tree

A double-array dictionary technique, applied to perfect double-array TRIE tree dictionary management and retrieval, that addresses the wasted space of a Trie index tree by improving space utilization and reducing sparsity.

Active Publication Date: 2006-06-14
灵玖中科软件(北京)有限公司

AI Technical Summary

Problems solved by technology

Because words branch heavily, a Trie index tree wastes a large amount of space.

Embodiment Construction

[0068] Figure 1 is a schematic diagram of converting the Trie tree structure into a double-array Trie tree structure.

[0069] Figure 2 shows all the words in the dictionary represented with a Trie tree structure.

[0070] Figure 3 is the flowchart of the perfect double-array TRIE tree dictionary management and retrieval method of the present invention. Its steps are as follows:

[0071] Step 1: take two bytes as a coding unit for the entries in the dictionary, count the frequency of occurrence of each coding unit in the dictionary, and assign different sequence codes to different coding units according to frequency: the more frequent the coding unit, the smaller the sequence code assigned. Finally, save all the coding units and their corresponding sequence codes in the sequence code file.
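Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes GBK-encoded entries (so each Chinese character occupies one two-byte coding unit), numbers sequence codes from 1, and breaks frequency ties by byte value. The function names `coding_units` and `assign_sequence_codes` are hypothetical, and writing the mapping to the sequence code file is omitted.

```python
from collections import Counter


def coding_units(entry: bytes, width: int = 2):
    """Split an encoded entry into fixed-width coding units.

    The last unit may be shorter than `width` if the length is not a multiple.
    """
    return [entry[i:i + width] for i in range(0, len(entry), width)]


def assign_sequence_codes(entries, width: int = 2):
    """Map each coding unit to a sequence code; frequent units get small codes."""
    counts = Counter(unit for e in entries for unit in coding_units(e, width))
    # Descending frequency; ties broken by byte value (an assumption made
    # here only so that the result is deterministic).
    ranked = sorted(counts, key=lambda unit: (-counts[unit], unit))
    # Codes start at 1 (assumption), leaving 0 free for other uses.
    return {unit: code for code, unit in enumerate(ranked, start=1)}


# Hypothetical four-entry dictionary, GBK-encoded (two bytes per character).
entries = [w.encode("gbk") for w in ["中国", "中文", "中间", "国家"]]
codes = assign_sequence_codes(entries)
```

In this toy dictionary the unit for 中 occurs three times and the unit for 国 twice, so they receive the two smallest codes. Giving small codes to frequent units keeps the offsets used later in the array layout small for the most common transitions.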

[0072] Step 2: read in all the entries of the dictionary and represent them with a Trie tree; read in the sequence code file; then process all Trie tree nodes in turn according to the strategy of prioritizing the nodes with more br...

Abstract

The invention relates to the fields of natural language processing and information retrieval, in particular to a perfect double-array TRIE tree dictionary management and retrieval method. It represents the Trie tree structure with two linear arrays and proposes an optimization strategy and a self-adapting coding scheme that automatically codes characters using a byte as the coding unit. The method comprises: (1) representing a dictionary with the Trie tree structure; (2) converting the Trie tree into two linear arrays; (3) using the two arrays to search the dictionary according to the user input. Its concrete steps are: 1. automatically coding the dictionary, using a byte as the unit, to generate a sequence code file; 2. representing the dictionary with the Trie tree and using the sequence code file to convert the Trie tree into two linear arrays; 3. searching for the user-submitted words in the two linear arrays.
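The three stages in the abstract (Trie, two linear arrays, lookup) can be illustrated with the textbook double-array layout, in which a transition from state `s` on code `c` lands at `t = base[s] + c` and is valid only when `check[t] == s`. The sketch below is an assumption-laden illustration rather than the patented procedure: it reserves code 0 as an end-of-entry marker, uses state 1 as the root, places nodes in plain first-in-first-out order instead of the patent's optimization strategy, and all function names are hypothetical.

```python
from collections import deque

END = 0  # reserved code marking the end of an entry (assumption)


def build_trie(words, codes):
    """Ordinary Trie as nested dicts keyed by sequence code."""
    root = {}
    for word in words:
        node = root
        for unit in word:
            node = node.setdefault(codes[unit], {})
        node[END] = {}
    return root


def build_double_array(root):
    """Flatten the Trie into `base` and `check`; the root is state 1."""
    base = [0, 0]
    check = [0, -1]  # 0 = free slot, -1 = root (occupied, no parent)
    queue = deque([(1, root)])
    while queue:
        state, node = queue.popleft()
        if not node:
            continue  # leaf: nothing to place
        cs = sorted(node)
        b = 1
        while True:
            need = b + cs[-1] + 1  # arrays must cover index b + max(cs)
            if need > len(check):
                base.extend([0] * (need - len(base)))
                check.extend([0] * (need - len(check)))
            if all(check[b + c] == 0 for c in cs):
                break  # every child slot is free at this offset
            b += 1
        base[state] = b
        for c in cs:
            check[b + c] = state  # claim all child slots before descending
            queue.append((b + c, node[c]))
    return base, check


def contains(word, codes, base, check):
    """Return True if `word` is an entry of the dictionary."""
    state = 1
    for unit in list(word) + [None]:  # None stands for the END marker
        c = END if unit is None else codes.get(unit)
        if c is None:
            return False  # unit never seen in the dictionary
        t = base[state] + c
        if t >= len(check) or check[t] != state:
            return False
        state = t
    return True


# Hypothetical toy data: single characters stand in for coding units.
codes = {"a": 1, "b": 2, "c": 3}
base, check = build_double_array(build_trie(["ab", "ac", "b"], codes))
```

Each lookup costs one addition and one comparison per coding unit, independent of dictionary size. How densely the two arrays are packed depends on the order in which nodes are placed, which is the part this sketch leaves unoptimized.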

Description

Technical field

[0001] The invention relates to the fields of natural language processing and information retrieval, in particular to a perfect double-array TRIE tree dictionary management and retrieval method.

Background technique

[0002] The storage and search of large amounts of data are currently usually implemented with an index structure. Commonly used index structures include linear index tables, inverted lists, hash tables, and various search trees.

[0003] The linear index is a static index structure that is not conducive to updating: the position of each index item in the index table needs to be changed every time a sequential update is performed. The inverted list is also a static index. Like the linear index table, its data can only be searched sequentially or by binary search.

[0004] The hash method establishes a definite functional relationship Hash() between the storage location of...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventors: 张华平, 王思力
Owner: 灵玖中科软件(北京)有限公司