An Orderly Construction and Retrieval Method of String Data Dictionary

A string data dictionary technology, applied to unstructured text data retrieval, electronic digital data processing, text database querying, and related fields. It addresses high time overhead, poor processing results, and poor hash performance, and it preserves flexibility while reducing time consumption and memory usage.

Active Publication Date: 2017-02-01
DALIAN UNIV OF TECH

AI Technical Summary

Problems solved by technology

Among published designs for perfect hash functions, the time needed to generate the function is usually very high. Even gperf, the best-known perfect hash function generator on Linux, cannot guarantee that it will still produce a perfect hash function for large-scale data. When the dictionary exceeds 15,000 entries, its hash performance is poor, especially for languages with large multi-byte character sets such as Chinese.
On the other hand, because hash functions are designed to distribute keys uniformly, they usually do not preserve the order of the dictionary entries. An ordered traversal must first sort all dictionary entries, which is too inefficient for most practical applications.
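
The ordering problem described above can be illustrated with a small sketch. It uses a Python set as a stand-in for a hash-based dictionary; the entry list and the helper names `ordered_traversal` and `prefix_lookup` are hypothetical and chosen only for illustration.

```python
# Illustration only: a Python set is a hash table, so it keeps no key order.
entries = {"dog", "cats", "a", "dom", "cat"}

def ordered_traversal(hash_dict):
    # Each ordered traversal has to sort every entry first: O(n log n).
    return sorted(hash_dict)

def prefix_lookup(hash_dict, prefix):
    # Without an ordering, a prefix query has to scan all entries.
    return sorted(w for w in hash_dict if w.startswith(prefix))

print(ordered_traversal(entries))    # ['a', 'cat', 'cats', 'dog', 'dom']
print(prefix_lookup(entries, "do"))  # ['dog', 'dom']
```

Exact-match lookup in such a structure is fast, but both ordered operations touch the whole dictionary, which is the cost the text refers to.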

Examples

Embodiment

[0095] To make the purpose, technical solutions, and beneficial effects of the present invention clearer and easier to implement, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

[0096] Assume that there are 2 documents to be processed, the contents of which are shown in Table 1:

[0097] Table 1. Contents of the documents to be processed

[0098]
Document number    Document content
1                  a, cat, cats, dog
2                  cats, dog, dom

[0099] Step 1: construct a temporary burst tree. The specific process is as follows:

[0100] First, create an empty temporary burst tree and an empty final burst tree; that is, create an empty temporary burst node. Read in document 1 and add or update its string data in the temporary burst tree; the result is shown in Figure 2. Input a, cat, cats, and dog into the temporary burst tree one by one, and generate corres...
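
As a rough illustration of Step 1, the sketch below inserts the strings of document 1 into a generic burst trie. The source text is truncated before it describes the node layout, so everything here is an assumption: the class names `TrieNode` and `Container`, the `BURST_LIMIT` value, and the reading of "add or update" as incrementing a per-string count are all hypothetical and not taken from the patent.

```python
BURST_LIMIT = 2  # hypothetical: max distinct suffixes a container may hold

class Container:
    """Leaf bucket mapping the remaining suffix of a string to its count."""
    def __init__(self):
        self.items = {}

class TrieNode:
    """Internal node: one child per next character."""
    def __init__(self):
        self.children = {}  # char -> TrieNode or Container
        self.count = 0      # occurrences of the string ending exactly here

def insert(root, word, count=1):
    """Add `word`, or update its count if it is already present."""
    node, i = root, 0
    while i < len(word):
        ch = word[i]
        child = node.children.get(ch)
        if child is None:
            child = node.children[ch] = Container()
        if isinstance(child, Container):
            suffix = word[i + 1:]
            child.items[suffix] = child.items.get(suffix, 0) + count
            if len(child.items) > BURST_LIMIT:
                # Burst: replace the full container with a trie node and
                # redistribute its suffixes one level deeper.
                replacement = TrieNode()
                for s, c in child.items.items():
                    insert(replacement, s, c)
                node.children[ch] = replacement
            return
        node, i = child, i + 1
    node.count += count

def traverse(node, prefix=""):
    """Yield (string, count) pairs in lexicographic order."""
    if node.count:
        yield prefix, node.count
    for ch in sorted(node.children):
        child = node.children[ch]
        if isinstance(child, Container):
            for suffix in sorted(child.items):
                yield prefix + ch + suffix, child.items[suffix]
        else:
            yield from traverse(child, prefix + ch)

root = TrieNode()
for word in ["a", "cat", "cats", "dog"]:  # document 1 from Table 1
    insert(root, word)
print(list(traverse(root)))
# [('a', 1), ('cat', 1), ('cats', 1), ('dog', 1)]
```

Because children are visited in sorted character order, the entries come out in dictionary order without a separate sort over the whole dictionary.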

Abstract

The invention discloses a method for the ordered construction and retrieval of a string data dictionary. The method includes: S1, inputting string data into a temporary burst tree one item at a time; S2, when the amount of data in the temporary burst tree reaches a threshold, merging the data into a final burst tree; S3, converting the final burst tree into a finite-state transducer with a six-tuple structure; S4, compiling the six-tuple finite-state transducer into a three-array form; S5, according to application needs, using the compiled three-array finite-state transducer to perform retrieval or ordered traversal of the data dictionary. The method can efficiently construct dictionaries with millions of entries and can meet retrieval needs in different environments and applications.
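
To give a feel for what retrieval and ordered traversal over an array-compiled structure (S4 and S5) can look like, here is a minimal sketch. It is not the patent's layout: the abstract does not say what its three arrays hold, so this example assumes a simple scheme with `offsets`, `labels`, and `targets` arrays plus a list of accepting flags, and it flattens a plain trie rather than a minimized transducer. All function names are hypothetical.

```python
from bisect import bisect_left

def compile_to_arrays(words):
    """Build a trie, then flatten its transitions into parallel arrays."""
    children = [{}]    # per state: char -> target state; state 0 is the root
    final = [False]    # per state: does a dictionary entry end here?
    for w in words:
        s = 0
        for ch in w:
            if ch not in children[s]:
                children.append({})
                final.append(False)
                children[s][ch] = len(children) - 1
            s = children[s][ch]
        final[s] = True
    offsets, labels, targets = [0], [], []
    for s in range(len(children)):
        for ch in sorted(children[s]):   # keep transitions sorted by label
            labels.append(ch)
            targets.append(children[s][ch])
        offsets.append(len(labels))      # state s owns offsets[s]:offsets[s+1]
    return offsets, labels, targets, final

def contains(arrays, word):
    """Exact-match retrieval: binary search within each state's label range."""
    offsets, labels, targets, final = arrays
    s = 0
    for ch in word:
        lo, hi = offsets[s], offsets[s + 1]
        i = bisect_left(labels, ch, lo, hi)
        if i == hi or labels[i] != ch:
            return False
        s = targets[i]
    return final[s]

def ordered(arrays, s=0, prefix=""):
    """Ordered traversal: depth-first walk over the sorted transitions."""
    offsets, labels, targets, final = arrays
    if final[s]:
        yield prefix
    for i in range(offsets[s], offsets[s + 1]):
        yield from ordered(arrays, targets[i], prefix + labels[i])

# Strings from both documents in Table 1
arrays = compile_to_arrays(["a", "cat", "cats", "dog", "cats", "dog", "dom"])
print(contains(arrays, "dom"), contains(arrays, "do"))  # True False
print(list(ordered(arrays)))  # ['a', 'cat', 'cats', 'dog', 'dom']
```

Storing transitions in flat arrays avoids per-node objects, which is one plausible reason for compiling to an array form; whether the patented structure works this way is not confirmed by the text available here.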

Description

Technical field

[0001] The invention relates to the fields of information retrieval, natural language processing, and pattern recognition and matching, and in particular to an ordered construction and retrieval method for dictionaries of string data of any scale.

Background art

[0002] Dictionary construction and retrieval for string data has always been an important technical component of many applications in information retrieval, natural language processing, and pattern recognition and matching. The speed of dictionary construction and retrieval largely determines the overall performance of the application system. For example, locating inverted index entries in search engines, word segmentation and synonym replacement in text processing, spell checking in text editors, and text suggestion in input methods all place very high demands on the construction and retrieval performance of the corresponding dictionaries.

[0003] Due to the key positi...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/33, G06F16/374
Inventors: 马云龙, 林鸿飞
Owner: DALIAN UNIV OF TECH