Retrieval method and device based on separate character indexing system

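An indexing system and indexing technique, applied in the field of search engines, that addresses problems such as increased retrieval time and reduced retrieval performance of single-character index systems, achieving improved overall performance, better retrieval system performance, and fewer intersection operations.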

Active Publication Date: 2012-12-05
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

This retrieval method performs an intersection operation on the index documents of every word in the phrase to obtain the phrase's document set. The number of index documents corresponding to each word is usually huge, so the intersection operation has a very large number of operands. To obtain the retrieval results, the index system must complete all of the intersection operations, which greatly increases retrieval time and reduces the retrieval performance of the single-character index system.
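
The baseline behaviour described above can be illustrated with a minimal sketch. The toy `index` mapping and the function name `intersect_all` are assumptions made for illustration only; they do not come from the patent text.

```python
from functools import reduce

# Hypothetical toy index: each index character maps to the set of IDs of
# the documents that contain it.
index = {
    "a": {1, 2, 3, 4, 5, 6},
    "b": {2, 3, 4, 7},
    "c": {3, 4, 8},
}

def intersect_all(phrase, index):
    """Baseline: intersect the index document set of every character."""
    doc_sets = [index.get(ch, set()) for ch in phrase]
    if not doc_sets:
        return set()
    # Every set takes part in the intersection, however large it is.
    return reduce(set.intersection, doc_sets)

print(intersect_all("abc", index))
```

Every character in the phrase contributes one operand to the intersection, which is the cost the method aims to reduce.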

Examples

Embodiment 1

[0035] Figure 3 is a flow chart of Embodiment 1 of the method of the present invention. The retrieval method based on a single-character index system described in this embodiment comprises the following steps:

[0036] Step 101: receiving a search sentence input by a user;

[0037] The search sentence is the search object input by the user. It generally consists of a series of natural-language morphemes and logical words describing the logical relationships between those morphemes. A morpheme is a meaningful character that makes up the search sentence, and a logical word is the basis for a logical operation. This embodiment applies to "AND" logical retrieval: the sequence of logical words is an "AND" sequence, and the corresponding logical operation is an intersection operation.

[0038] Step 102: extracting the character sequence of the search statement, and splitting the character sequence to obtain a retrieval unit, the retrieval unit inclu...

Embodiment 2

[0048] Step 103 of Embodiment 1 states that the index document set undergoes processing that includes a selection operation. This processing can be implemented in many ways; any method that includes the selection step falls within the scope of the present invention and achieves its purpose. For example, the multiple index document sets of the retrieval unit may first be compared to find the one containing the fewest index documents, and that set is then selected as the retrieval result of the retrieval unit. This constitutes another embodiment of the present invention. Embodiment 2 is identical to Embodiment 1 except for step 103. Referring to Figure 4, step 103 of Embodiment 1 is replaced in this embodiment by:

[0049] Step 203: comparing the number of index docu...
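
The selection described in paragraph [0048] can be sketched as follows. This is only an illustration under stated assumptions: a retrieval unit is taken to have one index document set per retrieval character, each set is modelled as a Python `set` of document IDs, and the name `select_smallest` is hypothetical.

```python
def select_smallest(doc_sets):
    """Return the index document set with the fewest documents.

    doc_sets: non-empty list of sets of document IDs, assumed to be one
    set per retrieval character of a single retrieval unit.
    """
    if not doc_sets:
        raise ValueError("a retrieval unit needs at least one document set")
    # Only the sizes are compared; no intersection is computed here.
    return min(doc_sets, key=len)

# Hypothetical index document sets for one retrieval unit.
unit_sets = [{1, 2, 3, 4, 5}, {2, 4}, {2, 3, 4, 9}]
print(select_smallest(unit_sets))
```

The selected set is a superset of the intersection of all the unit's sets, so it can contain documents that lack some of the unit's characters; the scanning step described in the abstract is what filters those out.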

Embodiment 3

[0053] Step 103 of Embodiment 1 states that the index document set undergoes processing that includes a selection operation. Besides the method described in Embodiment 2, the processing can also proceed as follows: the index document sets of the retrieval unit are divided into two groups; an intersection operation is performed on the index document sets within each group; the numbers of documents in the two resulting document sets are compared; and the result set with fewer documents is selected as the retrieval result of the retrieval unit. This constitutes another embodiment of the present invention (see Figure 5). This embodiment differs from Embodiment 1 in that step 103 is replaced by:

[0054] Step 3031: Divide the index document set of the retrieval unit into two groups;

[0055] Here, as an embodiment, the index doc...
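
The grouping variant in paragraph [0053] might look like the sketch below. How the sets are divided into two groups is not recoverable from the truncated text, so splitting the list at its midpoint is purely an assumption, as is the name `select_by_two_groups`.

```python
from functools import reduce

def select_by_two_groups(doc_sets):
    """Split the sets into two groups, intersect within each group, and
    return the smaller of the two intersection results."""
    if not doc_sets:
        raise ValueError("a retrieval unit needs at least one document set")
    mid = len(doc_sets) // 2  # assumption: split by position at the midpoint
    groups = [doc_sets[:mid], doc_sets[mid:]]
    # Intersect within each non-empty group.
    results = [reduce(set.intersection, g) for g in groups if g]
    # Compare result sizes and keep the result with fewer documents.
    return min(results, key=len)

# Hypothetical index document sets for one retrieval unit.
unit_sets = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 9}, {4, 9, 10}]
print(select_by_two_groups(unit_sets))
```

With four sets, this performs one intersection per group rather than intersecting all four sets together.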

Abstract

The invention provides a retrieval method based on a single-character index system, comprising the following steps: the character sequence of a retrieval statement is split into retrieval units; for each retrieval unit, the index document sets corresponding to its retrieval characters are obtained from an index table; the index document sets are processed, the processing including a selection operation, and the processing result is taken as the retrieval result of the retrieval unit; the retrieval results of the retrieval units are combined into a result document set, which is returned; the result document set is scanned with the retrieval units to determine whether any document contains all of the retrieval units at the same time; and if so, that document is returned. The invention also provides a device based on a single-character index system. The method and device apply processing that includes a selection operation to the index document sets of the retrieval units, rather than intersecting all of the index document sets, so that the number of intersection operands is reduced and the retrieval performance of the index system is improved.
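
The overall flow in the abstract can be sketched end to end. This is an illustration under several assumptions that the text does not confirm: retrieval units are fixed-size chunks of two characters, the selection operation picks the smallest index document set, the per-unit results are merged by intersection (consistent with the "AND" retrieval case of Embodiment 1), and the final scan is a substring check. All names and the toy corpus are hypothetical.

```python
from functools import reduce

# Hypothetical toy corpus: document ID -> text.
docs = {
    1: "fast search engine",
    2: "search index table",
    3: "engine index",
}

def build_index(docs):
    """Build a single-character index: character -> set of document IDs."""
    index = {}
    for doc_id, text in docs.items():
        for ch in set(text):
            index.setdefault(ch, set()).add(doc_id)
    return index

def split_units(query, size=2):
    """Assumption: retrieval units are consecutive chunks of `size` chars."""
    return [query[i:i + size] for i in range(0, len(query), size)]

def unit_result(unit, index):
    """Selection operation: keep the smallest set among the unit's chars."""
    return min((index.get(ch, set()) for ch in unit), key=len)

def search(query, docs, index):
    units = split_units(query)
    if not units:
        return []
    # One operand per retrieval unit instead of one per character.
    candidates = reduce(set.intersection,
                        (unit_result(u, index) for u in units))
    # Scan the candidates and keep documents containing every unit.
    return sorted(d for d in candidates
                  if all(u in docs[d] for u in units))

index = build_index(docs)
print(search("index", docs, index))
print(search("nige", docs, index))
```

The second query shows why the scan is needed: documents 1 and 3 survive the candidate stage because they contain all four characters, but neither contains the unit "ni", so nothing is returned.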

Description

Technical field

[0001] The invention relates to search engine technology, and in particular to a retrieval method and device based on a single-character index system.

Background

[0002] With the rapid spread of the Internet, the amount of information has grown massively. Search engine technology enables people to find the information they need in this mass of information conveniently and quickly.

[0003] Single-character index systems have been widely used as a solution for retrieving target information. A single-character index system includes numerous pre-built index tables (as shown in Figure 1). Each index table mainly contains three columns of data. The first column holds index characters; most are single characters, and a small number are natural-language phrases, idioms, or even short sentences. The search statement queries the index table by these index characters; the second column ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F17/30, G06F17/30678, G06F17/30622, G06F16/3341, G06F16/319
Inventor: 杨栋
Owner: ALIBABA GRP HLDG LTD