Term matching method based on a cedar double-array trie algorithm

A double-array trie technique, applied in the field of computer communication, that addresses slow database queries, inefficient word lookup, and slow term indexing, and improves the efficiency of term matching.

Active Publication Date: 2017-03-22
TRANSN IOL TECH CO LTD
5 Cites, 12 Cited by

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is that the current database-backed term matching engine is relatively slow at looking up words, and the way to improve this proble



Examples


Embodiment Construction

[0049] The technical solutions of the present invention will be further specifically described below in conjunction with the accompanying drawings and specific embodiments.

[0050] To solve the above technical problem, as shown in Figure 1, the present invention provides a method for term matching based on the cedar double-array trie algorithm. The method comprises a step of building an index and a step of performing term query matching with the index.

[0051] Wherein:

[0052] 1. The step of building an index is to traverse the database, obtain the term set, and call the cedar double-array trie to insert each term, so as to form the index of the term set;
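The index-building step can be illustrated with a minimal sketch. It assumes the database traversal yields records with hypothetical `term` and `id` fields, and it uses a plain nested-dictionary trie as a stand-in for the cedar double-array trie, so it shows only the insert-then-look-up flow and not cedar's actual storage layout or API.

```python
from typing import Iterable, Optional

# Sentinel key marking "a term ends at this node". None can never collide
# with a character key. (An assumption of this sketch, not part of cedar.)
END = None


def build_index(records: Iterable[dict]) -> dict:
    """Insert the term of every record into a nested-dict trie."""
    root: dict = {}
    for rec in records:
        node = root
        for ch in rec["term"]:
            node = node.setdefault(ch, {})
        node[END] = rec["id"]  # store the record id at the terminal node
    return root


def exact_match(index: dict, term: str) -> Optional[int]:
    """Return the id stored for `term`, or None if it was never inserted."""
    node = index
    for ch in term:
        node = node.get(ch)
        if node is None:
            return None
    return node.get(END)


# Hypothetical records standing in for the result of traversing the database.
records = [
    {"id": 1, "term": "double array"},
    {"id": 2, "term": "double array trie"},
    {"id": 3, "term": "术语"},
]
index = build_index(records)
```

With this index, `exact_match(index, "double array trie")` returns the stored id, while a string that is only a prefix of a term, such as `"double"`, returns `None`.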

[0053] The cedar double-array trie includes a structure array array[n] whose members are a reference value and a check value (as shown in Figure 3), a circular queue queue[n] of the same size as the structure array, and a binar...
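To make the role of the two array members concrete, the following is a minimal sketch of a generic, statically built double array. It assumes that the "reference value" corresponds to what the double-array literature usually calls the base value. All names (`DoubleArray`, `END_CODE`, `lookup`, and so on) are hypothetical, and the circular queue and the auxiliary structure mentioned above, which support dynamic updates, are deliberately omitted.

```python
from typing import Dict, List, Optional

FREE = -1      # check value of an unused slot
END_CODE = 0   # transition code meaning "a term ends here"


class DoubleArray:
    """Static double array: moving from state s on code c lands on
    t = base[s] + c, and the move is valid only if check[t] == s."""

    def __init__(self) -> None:
        self.base: List[int] = [0]
        self.check: List[int] = [-2]     # slot 0 is the root (occupied, no parent)
        self.value: Dict[int, int] = {}  # terminal slot -> term id

    def _grow(self, size: int) -> None:
        while len(self.base) < size:
            self.base.append(0)
            self.check.append(FREE)

    def _find_base(self, codes: List[int]) -> int:
        # Naive linear scan for a base where every child slot is free.
        b = 1
        while True:
            self._grow(b + max(codes) + 1)
            if all(self.check[b + c] == FREE for c in codes):
                return b
            b += 1

    def build(self, terms: Dict[str, int]) -> None:
        # Intermediate trie keyed by transition code (UTF-8 byte + 1).
        root: dict = {}
        for term, term_id in terms.items():
            node = root
            for byte in term.encode("utf-8"):
                node = node.setdefault(byte + 1, {})
            node[END_CODE] = term_id
        stack = [(0, root)]
        while stack:
            state, node = stack.pop()
            codes = sorted(node)
            b = self._find_base(codes)
            self.base[state] = b
            for c in codes:              # claim all child slots first
                self.check[b + c] = state
            for c in codes:
                if c == END_CODE:
                    self.value[b + c] = node[c]
                else:
                    stack.append((b + c, node[c]))

    def lookup(self, term: str) -> Optional[int]:
        state = 0
        for code in [b + 1 for b in term.encode("utf-8")] + [END_CODE]:
            nxt = self.base[state] + code
            if nxt >= len(self.check) or self.check[nxt] != state:
                return None
            state = nxt
        return self.value[state]


da = DoubleArray()
da.build({"trie": 1, "tree": 2, "术语": 3})
```

Each lookup step costs one addition and one comparison, which is why a double array answers queries quickly. The linear scan in `_find_base` is the slow part when many terms are inserted, and it is that cost which dynamic implementations try to reduce with extra bookkeeping.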



Abstract

The invention discloses a term matching method based on a cedar double-array trie algorithm. The method comprises a step of building an index and a step of performing term query matching through the index. The index-building step comprises traversing a database to obtain a term set and calling the cedar double-array trie to insert the terms, thereby forming the index of the term set. The cedar double-array trie comprises a structure array whose members are a reference value and a check value, and a circular queue with the same capacity as the structure array. Applying the cedar double-array algorithm to index building in a term matching engine, and searching for terms through that index, greatly improves the efficiency of the engine. The algorithm also avoids a shortcoming of the classic double-array implementation libdatrie, which is very slow when building an index for a large number of terms and therefore hinders rapid data reconstruction. A binary tree serves as an auxiliary structure, so the whole double-array trie can be restored quickly.
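The query-matching step can be sketched in the same spirit. The example below assumes that matching means scanning a piece of text and reporting the longest indexed term starting at each position. That policy, the function names, and the nested-dictionary trie used in place of the cedar double-array trie are all assumptions of this sketch rather than details stated in the source.

```python
from typing import Dict, List, Tuple

END = None  # sentinel key for "a term ends here" (assumption of this sketch)


def build_trie(terms: Dict[str, int]) -> dict:
    """Build a nested-dict trie mapping each term to its id."""
    root: dict = {}
    for term, term_id in terms.items():
        node = root
        for ch in term:
            node = node.setdefault(ch, {})
        node[END] = term_id
    return root


def match_terms(trie: dict, text: str) -> List[Tuple[int, str, int]]:
    """Return (start offset, matched term, term id) for each hit,
    taking the longest term that starts at each position."""
    hits: List[Tuple[int, str, int]] = []
    i = 0
    while i < len(text):
        node = trie
        best_end, best_id = -1, -1
        j = i
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:            # remember the longest term seen so far
                best_end, best_id = j, node[END]
        if best_end > 0:
            hits.append((i, text[i:best_end], best_id))
            i = best_end               # continue after the matched term
        else:
            i += 1
    return hits


terms = {"double array": 1, "double array trie": 2, "index": 3}
trie = build_trie(terms)
hits = match_terms(trie, "build a double array trie index")
```

Because the scan walks the trie directly, it never queries the database per candidate substring, which is the efficiency gain the abstract attributes to matching through the index.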

Description

Technical field

[0001] The invention belongs to the field of computer communication, and in particular relates to a method for term matching based on a cedar double-array trie algorithm.

Background

[0002] The translation industry continues to expand, and corpora and terminology are growing quickly and are already large in volume. Terminology is a cornerstone of translation and must be managed with effective information technology. At present, the source text, the translation, and other detailed terminology information within the company are stored in a MongoDB database. Querying the database directly to obtain the source text or translation is very slow, and the source text or translation may be too long to serve conveniently as an index field. An existing term matching engine implementation uses a double-array algorithm to create a peripheral index for the terms, and...


Application Information

IPC(8): G06F17/30, G06F17/28
CPC: G06F16/316, G06F40/58
Inventor: 冯泽康
Owner: TRANSN IOL TECH CO LTD