Distributed retrieval method and device

A distributed information retrieval technology, applied in the field of information retrieval, that solves problems such as unstable counting, yields comprehensive retrieval results, speeds up processing, and reduces network traffic.

Status: Inactive; Publication date: 2017-09-01
ALIBABA GRP HLDG LTD
Cites: 4; Cited by: 4

AI Technical Summary

Problems solved by technology

[0009] This application provides a distributed retrieval method and device that can solve the problem of unstable counting.



Examples


Embodiment 1

[0072] Embodiment 1: a distributed retrieval method applied to a server. As shown in Figure 1, the method includes steps S110 to S130.

[0073] S110. After a retrieval request is received, search the stored fingerprints for fingerprints similar to the fingerprint of the information to be retrieved, which is carried in the retrieval request;

[0074] S120. For each similar fingerprint found, perform the following operations separately: compare the segments of the similar fingerprint with the corresponding segments of the fingerprint of the information to be retrieved in a predetermined order, and stop comparing once the first identical segment is found; then compare the identifier of that identical segment with the identifier carried in the retrieval request, and if they are the same, include the similar fingerprint in the counting result; wherein the division into segments and the identifier of each segment are determined according to a f...
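Steps S110 and S120 can be sketched as follows. This is a minimal illustration under assumptions the excerpt does not state: fingerprints are 64-bit integers; the "first predetermined rule" (elided above) is taken to split them into 4 contiguous 16-bit segments with identifiers 0 to 3, most significant first; and a stored fingerprint counts as "similar" when its Hamming distance to the query is at most 3. The names `NUM_SEGMENTS`, `MAX_DISTANCE`, `segments`, `first_identical_segment`, and `count_similar` are hypothetical. For simplicity, the server scans a plain list of stored fingerprints.

```python
from typing import Iterable, List, Optional

# Hypothetical parameters; the source leaves the segmentation rule elided.
NUM_SEGMENTS = 4
SEGMENT_BITS = 64 // NUM_SEGMENTS
MAX_DISTANCE = 3  # assumed similarity threshold


def segments(fp: int) -> List[int]:
    """Split a 64-bit fingerprint into segments, identifier 0 = most significant."""
    mask = (1 << SEGMENT_BITS) - 1
    return [(fp >> (SEGMENT_BITS * (NUM_SEGMENTS - 1 - i))) & mask
            for i in range(NUM_SEGMENTS)]


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def first_identical_segment(a: int, b: int) -> Optional[int]:
    """Compare segments in the predetermined order; stop at the first match."""
    for seg_id, (sa, sb) in enumerate(zip(segments(a), segments(b))):
        if sa == sb:
            return seg_id
    return None


def count_similar(query_fp: int, request_seg_id: int, stored: Iterable[int]) -> int:
    # S110: find stored fingerprints similar to the query fingerprint.
    similar = [fp for fp in stored if hamming(fp, query_fp) <= MAX_DISTANCE]
    # S120: count a similar fingerprint only if its first identical segment
    # is the one identified in this retrieval request.
    return sum(1 for fp in similar
               if first_identical_segment(fp, query_fp) == request_seg_id)
```

Under these assumptions, a fingerprint within distance 3 of the query differs in at most 3 of the 4 segments, so at least one segment is identical (pigeonhole). Each similar fingerprint therefore has exactly one "first identical segment". If the per-segment counts are summed over identifiers 0 to 3, every similar fingerprint is counted once, even when it matches on several segments.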

Other embodiments

[0106] In other implementations, the correspondence between segments and fingerprints may be established and saved in other ways; for example, each segment may be stored in one-to-one correspondence with its fingerprint.

[0107] In one implementation of this alternative, the server may also classify and save segments according to their identifiers, that is, use each segment's identifier as its index when saving it. For example, segments whose identifier is "1" are saved together under index "1", segments whose identifier is "2" are saved together under index "2", and so on;

[0108] In this embodiment, searching the saved segments for a segment exactly identical to the acquired segment may include:

[0109] Searching, among the stored segments indexed by the identifier carried in the retrieval request, for a segment exactly identical to the acquired segment.
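The identifier-indexed storage in [0107] to [0109] can be pictured as a two-level map: identifier, then segment value, then the fingerprints containing that segment. The sketch below is hypothetical. `SegmentIndex` and its methods are illustrative names, and the 4 × 16-bit segmentation is an assumption, since the actual "first predetermined rule" is elided in the source. A lookup searches only the bucket for the identifier carried in the request. As a result, the same 16-bit value appearing at a different position in another fingerprint does not match.

```python
from collections import defaultdict
from typing import Dict, List, Set

NUM_SEGMENTS = 4   # assumed segmentation: 4 x 16-bit segments
SEGMENT_BITS = 16


def segments(fp: int) -> List[int]:
    """Split a 64-bit fingerprint into segments, identifier 0 = most significant."""
    mask = (1 << SEGMENT_BITS) - 1
    return [(fp >> (SEGMENT_BITS * (NUM_SEGMENTS - 1 - i))) & mask
            for i in range(NUM_SEGMENTS)]


class SegmentIndex:
    """Hypothetical store: segments saved under their identifier as the index."""

    def __init__(self) -> None:
        # identifier -> segment value -> fingerprints that contain that segment
        self._by_id: Dict[int, Dict[int, Set[int]]] = defaultdict(lambda: defaultdict(set))

    def add(self, fp: int) -> None:
        for seg_id, seg in enumerate(segments(fp)):
            self._by_id[seg_id][seg].add(fp)

    def lookup(self, seg_id: int, seg: int) -> Set[int]:
        # Search only among segments indexed by the requested identifier.
        return set(self._by_id.get(seg_id, {}).get(seg, set()))
```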

[0110] This embodiment c...

Embodiment 2

[0113] Embodiment 2: a distributed retrieval method applied to the client. As shown in Figure 2, the method includes steps S210 to S230.

[0114] S210. Determine the server corresponding to each segment according to the second predetermined rule;

[0115] S220. Send a retrieval request to the server corresponding to each segment; the retrieval request carries the fingerprint of the information to be retrieved and the identifier of the segment corresponding to that server; the identifier of the segment is determined according to a first predetermined rule;

[0116] S230. Add up the counting results returned by the servers in response to the retrieval requests to obtain the retrieval result.
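The client side (S210 to S230) can be sketched as a fan-out followed by a sum. Several pieces are hypothetical and not taken from the patent. Servers are simulated in-process by `Server` objects that each hold a full replica of the stored fingerprints. The "second predetermined rule" is modelled as `seg_id % len(servers)`. Segmentation and the similarity threshold follow the same assumed 4 × 16-bit / distance-3 scheme as the server sketch. Because each server applies the first-identical-segment rule from S120, the client can add the counts directly, with no separate deduplication step.

```python
from typing import Iterable, List, Optional

NUM_SEGMENTS = 4   # assumed: 4 x 16-bit segments, identifiers 0..3
SEGMENT_BITS = 16
MAX_DISTANCE = 3   # assumed similarity threshold


def segments(fp: int) -> List[int]:
    mask = (1 << SEGMENT_BITS) - 1
    return [(fp >> (SEGMENT_BITS * (NUM_SEGMENTS - 1 - i))) & mask
            for i in range(NUM_SEGMENTS)]


def first_identical_segment(a: int, b: int) -> Optional[int]:
    for seg_id, (sa, sb) in enumerate(zip(segments(a), segments(b))):
        if sa == sb:
            return seg_id
    return None


class Server:
    """In-process stand-in for a retrieval server (hypothetical)."""

    def __init__(self, stored: Iterable[int]) -> None:
        self.stored = list(stored)

    def handle(self, query_fp: int, seg_id: int) -> int:
        # Server-side counting rule from S110/S120.
        return sum(1 for fp in self.stored
                   if bin(fp ^ query_fp).count("1") <= MAX_DISTANCE
                   and first_identical_segment(fp, query_fp) == seg_id)


def server_for_segment(seg_id: int, servers: List[Server]) -> Server:
    # Hypothetical "second predetermined rule": round-robin by identifier.
    return servers[seg_id % len(servers)]


def retrieve(query_fp: int, servers: List[Server]) -> int:
    total = 0
    for seg_id in range(NUM_SEGMENTS):
        server = server_for_segment(seg_id, servers)  # S210
        total += server.handle(query_fp, seg_id)      # S220
    return total                                      # S230: summed counts
```

With this rule, the total is the same whether one server or several handle the segments. A naive scheme in which every segment's server counts all similar fingerprints would instead report up to one count per segment for each fingerprint.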

[0117] In this embodiment, the same type of fingerprint is stored on multiple servers; in that case, the servers corresponding to different segments of the fingerprint can also be determined according to the first predetermined rule, and the server...



Abstract

The invention discloses a distributed retrieval method and device. The method comprises the following steps:

1. After a retrieval request is received, search the stored fingerprints for fingerprints similar to the fingerprint of the information to be retrieved, which is carried in the retrieval request.
2. For each similar fingerprint found, compare the segments of the similar fingerprint with the corresponding segments of the fingerprint of the information to be retrieved in a preset order, stopping once the first identical segment is found.
3. Compare the identifier of that identical segment with the identifier carried in the retrieval request. If they are the same, include the similar fingerprint in the counting result. The division into segments and the identifier of each segment are determined according to a first rule.
4. Return the counting result.

The method solves two problems that arise when similar information is identified in a distributed system: counting is unstable, and an additional deduplication step on the counts is required.

Description

Technical field

[0001] The invention relates to the field of information retrieval, and in particular to a distributed retrieval method and device.

Background

[0002] Similar-information recognition is now widely used. One typical application is detecting whether similar information exists in a large body of information, for example deduplicating web pages in a search-engine crawler system. Another is detecting how often similar information occurs, for example counting similar emails in an anti-spam system.

[0003] SIMHASH is a fairly common algorithm for identifying duplicate information. SIMHASH can convert information such as a document into a 64-bit value, which is called a fingerprint in this document; … (this n generally takes a value of 3), and the two pieces of information are considered similar; here, the Hamming distance refers to the number of bits whose corresponding bits of...
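A 64-bit SIMHASH fingerprint and the Hamming distance between two fingerprints can be sketched as below. This is a simplified, illustrative version rather than the patent's implementation:

- Every token is given weight 1 instead of a term weight.
- Tokens are hashed to 64 bits with `hashlib.blake2b(..., digest_size=8)`, which is an arbitrary choice of hash.
- The helper names `token_hash`, `simhash`, and `hamming_distance` are hypothetical.

Following [0003], two pieces of information would be treated as similar when the Hamming distance between their fingerprints is within the threshold n, typically 3. The exact comparison (for example, at most n or less than n) falls in a gap in the text and is left open here.

```python
import hashlib
from typing import Iterable

BITS = 64


def token_hash(token: str) -> int:
    """64-bit hash of a token (blake2b chosen only for illustration)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(tokens: Iterable[str]) -> int:
    """Unweighted SimHash: each bit is the majority vote of the token hashes."""
    votes = [0] * BITS
    for tok in tokens:
        h = token_hash(tok)
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    fp = 0
    for i, v in enumerate(votes):
        if v > 0:  # set the bit where more tokens had a 1 than a 0
            fp |= 1 << i
    return fp


def hamming_distance(a: int, b: int) -> int:
    """Number of positions at which the two fingerprints' bits differ."""
    return bin(a ^ b).count("1")
```

With a single token, every vote is +1 or -1, so the fingerprint equals that token's hash. With many tokens, changing a few of them shifts only the bits whose votes were close, which is why near-duplicate texts tend to have fingerprints a small Hamming distance apart.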

Claims


Application Information

IPC(8): G06F17/30
CPC: G06F16/2471
Inventors: 林治晖, 沈朝阳
Owner: ALIBABA GRP HLDG LTD