
Method and device for prefix indexing during word segmentation

A prefix indexing technology applied in text database indexing, digital data information retrieval, unstructured text data retrieval and similar fields; it addresses the problem of reduced retrieval efficiency.

Active Publication Date: 2020-10-30
IOL WUHAN INFORMATION TECH CO LTD
Cites: 6; Cited by: 0

AI Technical Summary

Problems solved by technology

However, when the dictionary is split across several double-array Trie trees, prefix indexing during word segmentation requires looking the word up in each double-array Trie tree in turn, which greatly reduces retrieval efficiency.
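To make that cost concrete, here is a minimal Python sketch of the baseline, assuming the dictionary has already been split into several tries. A plain nested-dict trie stands in for each double-array Trie tree (a simplification), and `build_trie`, `words_with_prefix` and `search_every_trie` are hypothetical names. The point is the `probes` count: it always equals the number of tries, no matter how many of them actually hold matching words.

```python
from typing import Dict, List, Tuple

END = ""  # key marking end of a word; never collides with a 1-character key


def build_trie(words: List[str]) -> Dict:
    """Pointer-based trie used here as a stand-in for a double-array Trie tree."""
    root: Dict = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def words_with_prefix(trie: Dict, prefix: str) -> List[str]:
    """Prefix index on one trie: every stored word that starts with `prefix`."""
    node = trie
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    found: List[str] = []
    stack = [(node, prefix)]
    while stack:
        cur, acc = stack.pop()
        for key, child in cur.items():
            if key == END:
                found.append(acc)
            else:
                stack.append((child, acc + key))
    return found


def search_every_trie(tries: List[Dict], prefix: str) -> Tuple[List[str], int]:
    """Baseline: with no routing, every trie is probed in turn."""
    results: List[str] = []
    probes = 0
    for trie in tries:
        probes += 1
        results.extend(words_with_prefix(trie, prefix))
    return sorted(results), probes
```

With three tries, a query for `"te"` still costs three probes even if only one trie holds `tea` and `ten`.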

Method used

The dictionary data is split across multiple double-array Trie trees using an improved SDBMHash algorithm; the word to be retrieved is hashed with the same algorithm to locate its tree, and prefix indexing is then performed in that tree only.



Embodiment Construction

[0024] To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0025] Figure 1 is a schematic flowchart of a method for performing prefix indexing during word segmentation according to an embodiment of the present invention. As shown in the figure, the method includes:

[0026] Step 100: split the dictionary data based on the improved hash algorithm SDBMHash and store it in multiple double-array Trie trees;

[...
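Step 100 above can be sketched in Python as follows. The patent names an "improved" SDBMHash but this excerpt does not say what the improvement is, so the code uses the classic 32-bit SDBM recurrence `h = c + (h << 6) + (h << 16) - h` over UTF-8 bytes. It also makes one loud assumption, in `routing_key`: only the first character of a word is hashed. If the whole word were hashed, words sharing a prefix could be scattered across trees and a single-tree prefix query would miss some of them. `routing_key`, `shard_of` and `split_dictionary` are hypothetical names.

```python
from typing import List


def sdbm_hash(key: str) -> int:
    """Classic SDBM hash over the UTF-8 bytes of `key`, truncated to 32 bits.
    This is the standard form, not the patent's unspecified improved variant."""
    h = 0
    for byte in key.encode("utf-8"):
        h = (byte + (h << 6) + (h << 16) - h) & 0xFFFFFFFF
    return h


def routing_key(word: str) -> str:
    """ASSUMPTION: hash only the first character, so that all words sharing a
    first character (and therefore any longer shared prefix) go to one tree."""
    return word[:1]


def shard_of(word: str, num_tries: int) -> int:
    """Index of the tree that stores (or would store) `word`."""
    return sdbm_hash(routing_key(word)) % num_tries


def split_dictionary(words: List[str], num_tries: int) -> List[List[str]]:
    """Distribute dictionary words into `num_tries` buckets; each bucket would
    then be compiled into its own double-array Trie tree."""
    buckets: List[List[str]] = [[] for _ in range(num_tries)]
    for w in words:
        buckets[shard_of(w, num_tries)].append(w)
    return buckets
```

For example, with four trees the words `in`, `inn` and `inside` share the key `i` and therefore land in the same bucket.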


Abstract

The embodiments of the invention provide a method and a device for performing prefix indexing in a word segmentation process. The method comprises: splitting dictionary data based on an improved hash algorithm, SDBMHash, and storing the dictionary data in a plurality of double-array Trie trees; performing a hash calculation on the word to be retrieved using the improved SDBMHash algorithm, and determining from the hash result the double-array Trie tree in which the word to be retrieved is located; and performing prefix indexing on the word to be retrieved in that double-array Trie tree. The embodiments ensure efficient prefix indexing during word segmentation in application scenarios where the dictionary is split into a plurality of double-array Trie trees.
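The three steps of the abstract fit together as in the following minimal Python sketch. It rests on two assumptions the abstract does not state: the classic SDBM hash stands in for the unspecified "improved" SDBMHash, and routing uses only a word's first character, which is what lets one tree answer a prefix query completely. A nested-dict trie stands in for each double-array Trie tree, and `ShardedDictionary` and `prefix_search` are hypothetical names.

```python
from typing import Dict, List

END = ""  # key marking end of a word in the stand-in trie


def sdbm_hash(key: str) -> int:
    """Classic 32-bit SDBM hash over UTF-8 bytes (stand-in for the improved variant)."""
    h = 0
    for byte in key.encode("utf-8"):
        h = (byte + (h << 6) + (h << 16) - h) & 0xFFFFFFFF
    return h


class ShardedDictionary:
    """Dictionary split across several tries; each query is routed to one trie."""

    def __init__(self, words: List[str], num_tries: int) -> None:
        self.num_tries = num_tries
        self.tries: List[Dict] = [{} for _ in range(num_tries)]
        for w in words:  # step 1: split the dictionary and store each word
            node = self.tries[self._route(w)]
            for ch in w:
                node = node.setdefault(ch, {})
            node[END] = {}

    def _route(self, word: str) -> int:
        # ASSUMPTION: hash only the first character so shared prefixes stay together.
        return sdbm_hash(word[:1]) % self.num_tries

    def prefix_search(self, prefix: str) -> List[str]:
        if not prefix:
            raise ValueError("prefix must be non-empty")
        node = self.tries[self._route(prefix)]  # step 2: hash to pick one trie
        for ch in prefix:  # step 3: prefix index inside that trie only
            if ch not in node:
                return []
            node = node[ch]
        found: List[str] = []
        stack = [(node, prefix)]
        while stack:
            cur, acc = stack.pop()
            for key, child in cur.items():
                if key == END:
                    found.append(acc)
                else:
                    stack.append((child, acc + key))
        return sorted(found)
```

Whatever `num_tries` is, `prefix_search` walks exactly one trie, which is the efficiency gain the abstract describes compared with probing each tree in turn.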

Description

Technical field

[0001] Embodiments of the present invention relate to the technical field of natural language processing, and more specifically to a method and device for performing prefix indexing during word segmentation.

Background

[0002] A double-array Trie tree (DoubleArrayTrie) is a Trie tree with low space complexity, mainly used in information retrieval to build word segmentation dictionaries. It combines the speed of array access with the compactness of linked storage. The double-array Trie tree supports prefix indexing, that is, it can determine whether the tree contains other words that have a given word as their prefix.

[0003] Word segmentation breaks a sentence into multiple words. When a double-array Trie tree is used, the sentence is decomposed into multiple words that exist in the tree. In the word segmentation process, it is necessary to perform prefix query on t...
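The base/check mechanics behind the double-array Trie tree can be illustrated with a minimal Python sketch. It assumes the common textbook formulation (the child of state s on code c sits at t = base[s] + c and exists only if check[t] == s), with code 0 reserved for an end-of-word arc, characters numbered densely from 1, and a greedy first-fit search for each base value. None of these choices come from the patent, and `DoubleArrayTrie`, `has_prefix` and `common_prefix_search` are hypothetical names. `has_prefix` corresponds to the prefix index described in [0002]; `common_prefix_search` is the kind of lookup a dictionary-based segmenter runs at each position of a sentence.

```python
from collections import deque
from typing import Dict, List

END = ""  # key marking end of word in the temporary pointer trie


class DoubleArrayTrie:
    """Minimal double-array trie: child of state s on code c is t = base[s] + c,
    valid only if check[t] == s. Code 0 is the end-of-word arc."""

    def __init__(self, words: List[str]) -> None:
        alphabet = sorted({ch for w in words for ch in w})
        self.code: Dict[str, int] = {ch: i + 1 for i, ch in enumerate(alphabet)}
        self.base: List[int] = [0]
        self.check: List[int] = [-2]  # slot 0 is the root; -1 marks a free slot
        # Build an ordinary pointer-based trie first, then pack it breadth-first.
        root: Dict = {}
        for w in words:
            node = root
            for ch in w:
                node = node.setdefault(ch, {})
            node[END] = {}
        queue = deque([(root, 0)])
        while queue:
            node, s = queue.popleft()
            arcs = sorted((0 if k == END else self.code[k], k) for k in node)
            if not arcs:
                continue
            b = self._find_base([c for c, _ in arcs])
            self.base[s] = b
            for c, k in arcs:
                self.check[b + c] = s  # claim the slot for this child
                if c != 0:
                    queue.append((node[k], b + c))

    def _find_base(self, codes: List[int]) -> int:
        """First-fit search for the smallest base >= 1 whose target slots are free."""
        b = 1
        while True:
            need = b + codes[-1] + 1  # codes are sorted, so codes[-1] is the largest
            if need > len(self.check):
                grow = need - len(self.check)
                self.base.extend([0] * grow)
                self.check.extend([-1] * grow)
            if all(self.check[b + c] == -1 for c in codes):
                return b
            b += 1

    def _step(self, s: int, c: int) -> int:
        t = self.base[s] + c
        return t if t < len(self.check) and self.check[t] == s else -1

    def _walk(self, text: str) -> int:
        s = 0
        for ch in text:
            c = self.code.get(ch)
            if c is None:
                return -1
            s = self._step(s, c)
            if s < 0:
                return -1
        return s

    def contains(self, word: str) -> bool:
        s = self._walk(word)
        return s >= 0 and self._step(s, 0) >= 0

    def has_prefix(self, prefix: str) -> bool:
        """Prefix index: does any stored word start with `prefix`?"""
        return self._walk(prefix) >= 0

    def common_prefix_search(self, text: str) -> List[str]:
        """All stored words that are prefixes of `text`."""
        found: List[str] = []
        s = 0
        for i, ch in enumerate(text):
            c = self.code.get(ch)
            s = self._step(s, c) if c is not None else -1
            if s < 0:
                break
            if self._step(s, 0) >= 0:
                found.append(text[: i + 1])
        return found
```

For example, building it from 中, 中国, 中国人 and 人民 and calling `common_prefix_search("中国人民")` yields 中, 中国 and 中国人, the candidate words starting at the first character. The first-fit base search keeps the sketch short but rescans from 1 for every node, so it is slow on large dictionaries.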


Application Information

Patent type and authority: Patent (China)
IPC(8): G06F16/31; G06F16/332; G06F16/36
Inventor: 谭峰 (Tan Feng)
Owner: IOL WUHAN INFORMATION TECH CO LTD