Method for optimizing word segmentation of search engine through precomputation and word segmenting device of search engine

A search engine precomputation technique, applied in the field of computer science, that addresses problems such as large word segmentation dictionaries, low space utilization, and lookup times that are difficult to bring down to O(1).

Active Publication Date: 2012-08-29
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

[0004] (1) The lookup time of static search depends on the size of the word segmentation dictionary. When the dictionary is large, the response time is long and O(1) lookup is difficult to achieve; in addition, the static index must be changed whenever the dictionary changes.
[0005] (2) Hash lookup is fast, and its time complexity can be optimized to O(1), but space utilization is low, memory usage is high, and the query time also depends on the design of the hash conflict ha...

Examples

Embodiment Construction

[0050] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention.

[0051] A method for optimizing search engine word segmentation through precomputation according to an embodiment of the present invention is described below with reference to Figures 1 to 3.

[0052] In one embodiment of the present invention, the word segmentation dictionary contains the entries: Ah, Argentina, Ejiao, Arabia, Arabs, and Egypt. The Trie tree generated from this dictionary is shown in Figure 2. It can be understood that the above word segmentation dictionary and Trie tree are for example ...
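
As an illustration of the step described above, the following is a minimal Python sketch that builds a character-level Trie from the six dictionary entries exactly as they are written here and derives a sequence code from child-node counts. The names `Node`, `build_trie`, `child_counts`, and `sequence_codes` are hypothetical, and the counting rule (for each character, sum the child counts of every Trie node reached by that character, then give the smallest codes to the largest totals) is one assumed reading of "coded according to the number of child nodes", not the patent's confirmed procedure.

```python
from collections import defaultdict


class Node:
    """One Trie node: outgoing edges keyed by character, plus a word flag."""

    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    """Insert every dictionary entry character by character."""
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_word = True
    return root


def child_counts(root):
    """For each character, total the children of all nodes reached by it."""
    counts = defaultdict(int)
    stack = [root]
    while stack:
        node = stack.pop()
        for ch, child in node.children.items():
            counts[ch] += len(child.children)
            stack.append(child)
    return counts


def sequence_codes(root):
    """Assumed rule: more child nodes -> smaller code; ties broken by character."""
    counts = child_counts(root)
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    return {ch: i + 1 for i, ch in enumerate(ordered)}


words = ["Ah", "Argentina", "Ejiao", "Arabia", "Arabs", "Egypt"]
root = build_trie(words)
codes = sequence_codes(root)
print(sorted(root.children))  # ['A', 'E']
print(codes["i"])             # 1: 'i' has the largest child-node total here
```

Under this assumed rule the characters that branch most get the smallest codes, which is what allows their transitions to be packed near the front of the arrays built in the later precomputation step.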

Abstract

The invention provides a method for optimizing the word segmentation of a search engine through precomputation. The method comprises the following steps: encoding the characters in a word segmentation dictionary according to their number of child nodes in a Trie tree to generate a sequence code, wherein characters with more child nodes in the Trie tree are encoded first; performing precomputation according to the sequence code to generate the first array and the second array of a double-array Trie tree; and performing word segmentation queries against the word segmentation dictionary according to the sequence code, the first array, and the second array. The method improves the space utilization of search engine word segmentation, speeds up loading of the word segmentation module, and enhances the stability of the online service. The invention further discloses a word segmentation device for a search engine.
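
The three steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the first and second arrays are taken to be the conventional `base` and `check` arrays of a double-array Trie, the end of a word is marked by a hypothetical terminator character `END`, the sequence code uses the same assumed child-count rule as in the abstract's wording, and free slots are found by a simple first-fit scan. All function names are hypothetical.

```python
from collections import Counter, deque

END = "\0"  # hypothetical end-of-word marker appended to every entry


def build_trie(words):
    """Nested-dict Trie; each entry is stored with the END marker."""
    root = {}
    for word in words:
        node = root
        for ch in word + END:
            node = node.setdefault(ch, {})
    return root


def sequence_codes(root):
    """Assumed rule: characters with more child nodes get smaller codes."""
    counts = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        for ch, child in node.items():
            counts[ch] += len(child)
            stack.append(child)
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    return {ch: i + 1 for i, ch in enumerate(ordered)}


def build_double_array(root, codes):
    """Precompute base/check so that child = base[state] + code."""
    base, check = [0], [0]  # slot 0 is the root state; -1 marks a free slot
    queue = deque([(root, 0)])
    while queue:
        node, state = queue.popleft()
        if not node:
            continue  # leaf state: nothing to place
        child_codes = [codes[ch] for ch in node]
        b = 1
        while True:
            needed = b + max(child_codes) + 1
            while len(base) < needed:  # grow both arrays together
                base.append(0)
                check.append(-1)
            if all(check[b + c] == -1 for c in child_codes):
                break  # first offset where every child slot is free
            b += 1
        base[state] = b
        for ch, child in node.items():
            t = b + codes[ch]
            check[t] = state  # slot t now belongs to this parent state
            queue.append((child, t))
    return base, check


def contains(word, codes, base, check):
    """Dictionary query: one array probe per character."""
    state = 0
    for ch in word + END:
        code = codes.get(ch)
        if code is None:
            return False
        t = base[state] + code
        if t >= len(check) or check[t] != state:
            return False
        state = t
    return True


words = ["Ah", "Argentina", "Ejiao", "Arabia", "Arabs", "Egypt"]
root = build_trie(words)
codes = sequence_codes(root)
base, check = build_double_array(root, codes)
print(all(contains(w, codes, base, check) for w in words))  # True
print(contains("Arab", codes, base, check))                 # False
```

Because `codes`, `base`, and `check` depend only on the dictionary, they could be computed offline and loaded as flat arrays, which is consistent with the abstract's claim of faster loading; each query then costs a constant amount of work per character.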

Description

Technical field

[0001] The invention relates to the technical field of computer science, in particular to a method for optimizing search engine word segmentation through precomputation and to a search engine word segmentation device.

Background

[0002] Word segmentation is the most basic function of a search engine: it is the technique by which a search engine applies various matching methods to the keyword strings submitted by users. To achieve better segmentation results, existing search engines generally use large word segmentation dictionaries, stored in plaintext, and generate their internal data structures from them through online computation. Existing search engines generally adopt data structures such as linear index tables, inverted lists, hash tables, and search trees. Both the linear index table and the inverted list are static index structures. A hash maps characters to a certain storage location t...

Application Information

IPC(8): G06F17/30
Inventors: 阮星华, 张敏
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD