A fingerprint-based corpus full-text retrieval method and system
A technology involving corpora and fingerprint databases, applied in the field of fingerprint-based corpus full-text retrieval methods and systems. It addresses the problem that retrieval results cannot be generated quickly and accurately, and its effects are that it is easy to promote, improves accuracy and has strong applicability.
Examples
Embodiment 1
[0078] A fingerprint-based corpus full-text retrieval method, as shown in Figure 1, includes:
[0079] Step 1: Construct fingerprints for the documents to be checked in parallel, using the distance map method;
[0080] Step 2: Using the fingerprint of each document to be checked, search the pre-built fingerprint library in parallel for the one or more fingerprints most similar to it;
[0081] Step 3: Take the documents corresponding to those fingerprints as the retrieval result for the document to be checked.
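The three steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the distance map method is only partly defined in this excerpt, so `fingerprint` is a hypothetical stand-in that collects directed word pairs lying at most `k` positions apart (in the spirit of the order-k edges defined in Embodiment 3), and Jaccard similarity is an assumed stand-in for the unspecified similarity measure. Threads are used for brevity; a CPU-bound deployment would more likely use processes or a compiled backend.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import nlargest

def fingerprint(text, k=2):
    """Hypothetical fingerprint: all directed word pairs (w_i, w_j) where
    w_i occurs at most k positions before w_j (cf. order-k edges)."""
    words = text.lower().split()
    return frozenset(
        (words[i], words[j])
        for j in range(len(words))
        for i in range(max(0, j - k), j)
    )

def jaccard(a, b):
    """Assumed similarity measure; the source does not name one here."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def search(query_fp, library, top_n=1, workers=4):
    """Step 2: score every library fingerprint in parallel, keep the top_n."""
    def score(doc_id):
        return jaccard(query_fp, library[doc_id]), doc_id
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(score, library))
    return [doc_id for _, doc_id in nlargest(top_n, scored)]

def retrieve(query_texts, library, top_n=1, workers=4):
    """Steps 1-3: fingerprint the documents to be checked in parallel,
    search the library for each one, and return the matching doc IDs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        query_fps = list(pool.map(fingerprint, query_texts))
    return [search(fp, library, top_n, workers) for fp in query_fps]
```

For example, with a library holding `fingerprint("the cat sat on the mat")` under ID `a` and an unrelated sentence under ID `b`, `retrieve(["the cat sat on a mat"], library)` returns `[["a"]]`: the two cat sentences share 6 of 12 distinct pairs (Jaccard 0.5), while the unrelated one shares none.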
[0082] Step 1: Construct fingerprints for the documents to be checked in parallel, using the distance map method.
[0083] Specifically, the construction of the fingerprint database includes:
[0084] The distance map method is applied to the full text of every document in the corpus to construct a fingerprint for each document, and a fingerprint index is generated.
[0085] Specifically, said using the dis...
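One way to read "a fingerprint index is generated" is as an inverted index from fingerprint elements to document IDs, so that a search only needs to score documents sharing at least one element with the query. The sketch below assumes that reading; `build_library`, `candidates` and the pair-based `fingerprint` are hypothetical names, and the fingerprint is a stand-in for the full distance map method.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def fingerprint(text, k=2):
    """Hypothetical stand-in: directed word pairs at most k positions apart."""
    words = text.lower().split()
    return frozenset(
        (words[i], words[j])
        for j in range(len(words))
        for i in range(max(0, j - k), j)
    )

def build_library(corpus, k=2, workers=4):
    """corpus maps doc_id -> full text; returns (fingerprints, inverted index)."""
    doc_ids = list(corpus)
    # Fingerprint every document in the corpus in parallel
    with ThreadPoolExecutor(max_workers=workers) as pool:
        fps = list(pool.map(lambda d: fingerprint(corpus[d], k), doc_ids))
    fingerprints = dict(zip(doc_ids, fps))
    # Inverted index: each fingerprint element -> IDs of documents containing it
    index = defaultdict(set)
    for doc_id, fp in fingerprints.items():
        for element in fp:
            index[element].add(doc_id)
    return fingerprints, dict(index)

def candidates(query_fp, index):
    """IDs of documents sharing at least one element with the query fingerprint."""
    found = set()
    for element in query_fp:
        found |= index.get(element, set())
    return found
```

The index trades memory for speed: documents with no element in common with the query are never scored.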
Embodiment 2
[0125] Based on the same inventive concept, the present invention also provides a fingerprint-based corpus full-text retrieval system, as shown in Figure 2, including a fingerprint module, a similarity module and a retrieval module:
[0126] Fingerprint module: used to construct fingerprints for the documents to be checked in parallel, using the distance map method;
[0127] Similarity module: used to search the pre-built fingerprint database in parallel, based on the fingerprint of each document to be checked, for the one or more fingerprints most similar to it;
[0128] Retrieval module: used to take the documents corresponding to the matched fingerprints as the retrieval result for the document to be checked.
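The three-module structure can be sketched as plain Python classes wired together. The class names mirror the module names above, but their methods, the pluggable `fp_fn` and `sim_fn` parameters, and the word-set placeholder fingerprint `words` are assumptions made so the wiring can run; a real system would plug in the distance map fingerprint instead.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import nlargest

class FingerprintModule:
    """Builds fingerprints in parallel with a pluggable fingerprint function."""
    def __init__(self, fp_fn, workers=4):
        self.fp_fn = fp_fn
        self.workers = workers

    def build(self, texts):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(self.fp_fn, texts))

class SimilarityModule:
    """Scores a query fingerprint against the pre-built library in parallel."""
    def __init__(self, library, sim_fn, workers=4):
        self.library = library  # doc_id -> fingerprint
        self.sim_fn = sim_fn
        self.workers = workers

    def most_similar(self, fp, top_n=1):
        def score(doc_id):
            return self.sim_fn(fp, self.library[doc_id]), doc_id
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            scored = list(pool.map(score, self.library))
        return [doc_id for _, doc_id in nlargest(top_n, scored)]

class RetrievalModule:
    """Maps matched fingerprint IDs back to their documents."""
    def __init__(self, corpus):
        self.corpus = corpus  # doc_id -> full text

    def documents_for(self, doc_ids):
        return [self.corpus[d] for d in doc_ids]

def words(text):
    """Placeholder fingerprint (set of words), NOT the distance map method."""
    return frozenset(text.lower().split())

def jaccard(a, b):
    """Assumed similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Usage: build the library with `FingerprintModule(words).build(corpus.values())`, pass it to `SimilarityModule` with `jaccard`, and hand the IDs returned by `most_similar` to `RetrievalModule.documents_for`. Keeping the fingerprint and similarity functions pluggable lets each module be swapped without touching the others.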
[0129] In the fingerprint module, the construction of the fingerprint library includes:
[0130] Based on the full text of all documents in the corpus, the distance map method is used to construct fingerprints for each document,...
Embodiment 3
[0170] The fingerprint-based corpus full-text retrieval method can be divided into two phases: index generation and index-based search. Index generation is generally a one-time process: as long as the main content and structure of a document do not change, its index generally does not need to be updated.
[0171] The concepts and notation used in the present invention are defined as follows:
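The "build once, update only on change" behaviour of the index-generation phase could be realized by caching each document's fingerprint alongside a hash of its normalized text. The sketch assumes that collapsing whitespace is an acceptable proxy for "the main content and structure do not change"; a real system might need a stricter or looser notion depending on how documents are parsed. `FingerprintIndex` and `content_key` are hypothetical names.

```python
import hashlib

def content_key(text):
    """Hash of the text with whitespace collapsed (assumed change criterion)."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class FingerprintIndex:
    def __init__(self, fp_fn):
        self.fp_fn = fp_fn
        self.entries = {}  # doc_id -> (content_key, fingerprint)
        self.builds = 0    # number of fingerprints actually computed

    def sync(self, corpus):
        """Index generation: (re)fingerprint only new or changed documents."""
        for doc_id, text in corpus.items():
            key = content_key(text)
            cached = self.entries.get(doc_id)
            if cached is None or cached[0] != key:
                self.entries[doc_id] = (key, self.fp_fn(text))
                self.builds += 1
        # Drop documents that have left the corpus
        for doc_id in set(self.entries) - set(corpus):
            del self.entries[doc_id]

    def fingerprints(self):
        """Index-based search reads from this mapping: doc_id -> fingerprint."""
        return {d: fp for d, (_, fp) in self.entries.items()}
```

Calling `sync` repeatedly on an unchanged corpus computes no new fingerprints, so the cost of index generation is paid once per document version.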
[0172] $k$-order distance: For a given document $D$, its word sequence is denoted $\mathrm{seq}(D)$ and its word set $N(D)$. Words are also called nodes and are denoted by $n$. If, in $\mathrm{seq}(D)$, word $n_i$ appears at least once at most $k$ positions before word $n_j$, where $n_i, n_j \in N(D)$, then the distance from $n_i$ to $n_j$ is said to be a $k$-order distance, $k \ge 0$.
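The $k$-order distance test can be written directly from the definition. The sketch reads "$n_i$ appears at least once at most $k$ positions before $n_j$" as applying to any occurrence of $n_j$ in the sequence; that reading and the name `is_k_order` are assumptions. Under this definition, a pair at $k$-order distance also satisfies every higher order.

```python
def is_k_order(seq, n_i, n_j, k):
    """True if n_i occurs within the k positions immediately preceding
    some occurrence of n_j in the word sequence seq (a list of words)."""
    for j, word in enumerate(seq):
        # seq[max(0, j - k):j] holds the (at most) k words before position j
        if word == n_j and n_i in seq[max(0, j - k):j]:
            return True
    return False
```

For `seq = ["a", "b", "c", "d"]`, the distance from `"a"` to `"c"` is 2-order, but the distance from `"a"` to `"d"` only becomes a $k$-order distance once $k \ge 3$. With $k = 0$ the window is empty, so no pair qualifies.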
[0173] Edge of order $k$: if, in document $D$, the distance from node $n_i$ to node $n_j$ is a $k$-order distance, then the directed edge $e_{i,j}$ from $n_i$ to $n_j$ is called an edge of order $k$, ...
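Because the edge definition is cut off in this excerpt, the following assumes that a document's distance map is the directed graph whose edges are all of its order-$k$ edges. A single pass that looks back over the $k$ preceding positions enumerates these edges in $O(|\mathrm{seq}(D)| \cdot k)$ time, rather than testing every pair of words separately. `order_k_edges` and `distance_map` are hypothetical names.

```python
from collections import defaultdict

def order_k_edges(seq, k):
    """All directed edges (n_i, n_j) such that n_i occurs at most k
    positions before some occurrence of n_j in the word sequence seq."""
    edges = set()
    for j in range(len(seq)):
        for i in range(max(0, j - k), j):
            edges.add((seq[i], seq[j]))
    return edges

def distance_map(seq, k):
    """Adjacency-list view of the order-k edges: node -> set of successors."""
    graph = defaultdict(set)
    for n_i, n_j in order_k_edges(seq, k):
        graph[n_i].add(n_j)
    return dict(graph)
```

For `seq = ["a", "b", "a", "c"]` and $k = 1$, the edges are `("a", "b")`, `("b", "a")` and `("a", "c")`, so the map is `{"a": {"b", "c"}, "b": {"a"}}`. Repeated words collapse into one node, which is what makes this a graph over $N(D)$ rather than over positions.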
© 2024 PatSnap. All rights reserved.