Method for constructing a perfect hash function for processing large-scale dictionaries

A large-scale dictionary technology, applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses problems such as large working space and the lack of a guarantee that a perfect hash function can be constructed, with the effect of reducing working space and construction time.

Active Publication Date: 2007-07-11
INST OF COMPUTING TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0010] The biggest disadvantage of Fox's method is that it requires a large amount of space to store the random number set. In addition, the number of vertices in its correlation graph is 0.6 times the number of words in the dictionary, so the working space of Fox's method (the memory required to store the hash function) is large.
Moreover, the random number strategy cannot guarantee that a perfect hash function can be constructed for every dictionary.



Examples


Embodiment

[0057] In this embodiment, the method for constructing a perfect hash function for processing large-scale dictionaries is divided into three stages: the character smoothing stage, the multi-level correlation graph construction stage, and the vertex assignment stage. Each is described in detail below:

[0058] 1. Character smoothing stage

[0059] In a correlation graph, the higher the degree of a vertex, the more difficult it is to associate an integer with that vertex, and the smaller the fill factor of the hash function is likely to be. Therefore, the present invention smooths all the words in the dictionary before constructing the multi-level correlation graph, so that the vertex degrees of each constructed correlation graph are small and evenly distributed. In the present invention, every character of a word except the first is smoothed into two characters by two different smoothing functions; for example, for the word k = c_1 c_2 … c_n, the shape after smooth...
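The shape of the smoothing step can be illustrated with a minimal Python sketch. The two smoothing functions here (`smooth_a`, `smooth_b`) and the modulus 251 are hypothetical placeholders chosen only for illustration; the source text is truncated before it gives the actual functions, so this shows only the structure "keep the first character, expand every later character into two values".

```python
from typing import Callable, List

# ASSUMPTION: placeholder smoothing functions. The patent's real
# smoothing functions are not visible in the truncated source.
def smooth_a(code: int) -> int:
    """Hypothetical first smoothing function: low part of the character code."""
    return code % 251

def smooth_b(code: int) -> int:
    """Hypothetical second smoothing function: high part of the character code."""
    return (code // 251) % 251

def smooth_word(
    word: str,
    f1: Callable[[int], int] = smooth_a,
    f2: Callable[[int], int] = smooth_b,
) -> List[int]:
    """Keep the first character as-is and replace every later character
    with two values, one from each smoothing function."""
    if not word:
        return []
    smoothed = [ord(word[0])]
    for ch in word[1:]:
        code = ord(ch)
        smoothed.append(f1(code))
        smoothed.append(f2(code))
    return smoothed
```

Under these assumptions a word of n characters becomes a sequence of 2n - 1 values, each later value drawn from a much smaller range than the original character set, which is the kind of redistribution a large character set such as Chinese would need.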


Abstract

The invention relates to a method for constructing a perfect hash function for processing large-scale dictionaries. The method smooths the words in the dictionary; divides the smoothed dictionary into n sub-dictionaries (n being a natural number), each with its own correlation graph; orders the vertices of these correlation graphs; and maps each word to a distinct address, yielding a perfect hash function for the dictionary. It can construct a perfect hash function for a dictionary containing millions of words and can handle large character sets such as Chinese, with reduced construction time and working space.
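The property the construction is meant to guarantee, that every word maps to its own address, can be checked with a short Python sketch. The helper names (`is_perfect`, `fill_factor`) are hypothetical, and the fill factor is assumed here to mean the number of words divided by the number of available addresses; the toy lookup-table hash is a stand-in, not the patent's construction.

```python
from typing import Callable, Iterable

def is_perfect(hash_fn: Callable[[str], int],
               words: Iterable[str],
               table_size: int) -> bool:
    """Return True if hash_fn maps every word to a distinct address
    inside [0, table_size)."""
    seen = set()
    for word in words:
        addr = hash_fn(word)
        if not (0 <= addr < table_size) or addr in seen:
            return False  # out of range, or two words share an address
        seen.add(addr)
    return True

def fill_factor(num_words: int, table_size: int) -> float:
    """ASSUMPTION: fill factor = words stored / addresses available."""
    return num_words / table_size

# Toy stand-in for a constructed hash function: an explicit address table.
toy_addresses = {"cat": 0, "dog": 2, "bird": 3}
toy_words = list(toy_addresses)

def toy_hash(word: str) -> int:
    return toy_addresses[word]
```

With this toy data, `is_perfect(toy_hash, toy_words, 4)` holds and the fill factor is 3 / 4; a fill factor of 1 would correspond to a hash function that wastes no addresses.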

Description

Technical field

[0001] The invention relates to the technical fields of information retrieval and natural language processing, in particular to a method for constructing a perfect hash function for processing large-scale dictionaries.

Background technique

[0002] Many applications in the fields of information retrieval and natural language processing involve dictionary lookup, and the speed of dictionary lookup largely determines the overall performance of the system. For example, the recognition of reserved words in an integrated development environment (IDE), spell checking in an editor, result verification in optical character recognition (OCR), Chinese word segmentation in text processing, and posting-list positioning in an inverted index all have very high speed requirements and therefore need fast dictionary lookup. A perfect hash function can quickly map the words of a dictionary to unique integers without conflicts, so it is an ideal ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 龚才春
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI