MapReduce-based document retrieval method and system

A document retrieval technology in the field of information processing that addresses problems such as low retrieval efficiency, with the stated effects of reducing costs, improving query efficiency, and improving accuracy.

Active Publication Date: 2017-07-04
AEROSPACE INFORMATION
3 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

[0005] To address the low efficiency of document retrieval over massive data in the prior art, a solution based on the MapReduce programming model is needed for massive-data retrieval in the field of document information retrieval. Such a solution should integrate and improve techniques such as the inverted index, and it should be suited to parallel computation of document retrieval.
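The page does not say how the MapReduce jobs are organized. As a minimal sketch, assuming an in-memory Python simulation rather than a Hadoop cluster, and whitespace splitting in place of real word segmentation, the map, shuffle, and reduce phases of a job that counts document frequencies (a statistic any TF-IDF step needs) might look like this. All function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_document(doc: Tuple[str, str]) -> List[Tuple[str, int]]:
    """Map phase: emit (term, 1) once for each distinct term in one document."""
    _doc_id, text = doc
    # Hypothetical: whitespace splitting stands in for real word segmentation.
    return [(term, 1) for term in set(text.lower().split())]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group every emitted value under its key (the term)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for term, value in pairs:
        groups[term].append(value)
    return groups


def reduce_counts(term: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: summing the 1s gives the term's document frequency."""
    return term, sum(values)


def document_frequencies(corpus: Dict[str, str]) -> Dict[str, int]:
    # Map tasks run concurrently, one per document, the way a MapReduce
    # framework would schedule them across workers.
    with ThreadPoolExecutor() as pool:
        emitted = [pair for pairs in pool.map(map_document, corpus.items()) for pair in pairs]
    grouped = shuffle(emitted)
    return dict(reduce_counts(term, values) for term, values in grouped.items())


if __name__ == "__main__":
    corpus = {
        "d1": "inverted index speeds up retrieval",
        "d2": "MapReduce parallel retrieval",
        "d3": "inverted index on MapReduce",
    }
    df = document_frequencies(corpus)
    print(df["retrieval"], df["mapreduce"], df["parallel"])  # 2 2 1
```

On a real cluster each map task would read one input split and the framework itself would perform the shuffle, so only the map and reduce functions would be user code.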

Embodiment Construction

[0039] Exemplary embodiments of the present invention will now be described with reference to the drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein; these embodiments are provided so that the disclosure of the present invention is thorough and complete and fully conveys the scope of the invention to those skilled in the art. The terms used in the exemplary embodiments shown in the drawings do not limit the present invention. In the figures, the same units/elements are given the same reference numerals.

[0040] Unless otherwise specified, the terms used herein (including scientific and technical terms) have the meanings commonly understood by those skilled in the art. In addition, terms defined in commonly used dictionaries should be understood to have meanings consistent with their context in the related field, and should not be understood as idealized or over...

Abstract

The invention provides a document retrieval method. The method comprises the steps of: obtaining a query statement; performing word segmentation on the query statement to obtain a plurality of mutually distinct terms; calculating the term frequency-inverse document frequency (TF-IDF) of each term; converting each term into a term vector (V); for each document in a document library, calculating a plurality of matching degrees between the document and each term of the query statement using the TF-IDF and V matching modes; taking the maximum of these matching degrees as the maximum matching degree between the document and that term; and summing the maximum matching degrees over all terms of the query statement to obtain the matching degree between the document and the query statement.
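The abstract names the two matching modes but not their formulas. The sketch below assumes one plausible reading: the TF-IDF mode scores a term by its TF-IDF weight in the document, and the V mode scores it by the best cosine similarity between the term vector and the vectors of the document's words. The smoothed IDF variant, the toy two-dimensional vectors, and all function names are hypothetical; only the max-per-term, sum-over-terms aggregation follows the abstract.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Sequence

Vector = Sequence[float]


def idf(term: str, docs: Dict[str, List[str]]) -> float:
    # Hypothetical smoothed IDF; the patent page does not give its IDF variant.
    df = sum(1 for words in docs.values() if term in words)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


def tfidf_match(term: str, words: List[str], docs: Dict[str, List[str]]) -> float:
    """Assumed TF-IDF mode: relative term frequency in the document times IDF."""
    tf = Counter(words)[term] / len(words) if words else 0.0
    return tf * idf(term, docs)


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_match(term: str, words: List[str], vectors: Dict[str, Vector]) -> float:
    """Assumed V mode: best cosine similarity between the term and any document word."""
    if term not in vectors:
        return 0.0
    return max((cosine(vectors[term], vectors[w]) for w in words if w in vectors), default=0.0)


def match_degree(query_terms: Iterable[str], words: List[str],
                 docs: Dict[str, List[str]], vectors: Dict[str, Vector]) -> float:
    total = 0.0
    for term in set(query_terms):  # the abstract's terms are mutually distinct
        # Maximum matching degree of this term over the two matching modes ...
        total += max(tfidf_match(term, words, docs), vector_match(term, words, vectors))
    return total  # ... summed over all query terms.


if __name__ == "__main__":
    docs = {
        "d1": ["car", "engine", "repair"],
        "d2": ["automobile", "engine"],
        "d3": ["recipe", "bread"],
    }
    # Toy 2-D term vectors (hypothetical); real ones would come from a trained model.
    vectors = {
        "car": [1.0, 0.1], "automobile": [0.95, 0.15], "engine": [0.5, 0.8],
        "repair": [0.2, 0.9], "recipe": [-0.7, 0.3], "bread": [-0.8, 0.2],
    }
    query = ["car", "engine"]
    scores = {d: match_degree(query, w, docs, vectors) for d, w in docs.items()}
    print(sorted(scores, key=scores.get, reverse=True))  # ['d1', 'd2', 'd3']
```

Taking the per-term maximum lets the V mode credit d2 for "automobile" even though it never contains "car", so d2 ranks just below d1. The two modes produce values on different scales; a real system would presumably normalize them before taking the maximum.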

Description

Technical Field

[0001] The invention relates to the field of information processing, and in particular to a MapReduce-based document retrieval method and system.

Background Art

[0002] Document retrieval is the process of finding, for a given query statement, the documents in a document database that best match it. The inverted index is currently a common technique in this field. An inverted index is similar to the index at the back of a book: it records the position of every occurrence of each word, so querying a word quickly locates all the positions where it appears. However, as the number of documents and words keeps growing, sequential search over such massive data becomes inefficient. Moreover, when the query statement contains multiple words, determining the weight of each word during matching becomes another important problem to be s...
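To make the book-index analogy concrete, here is a minimal positional inverted index. It assumes whitespace tokenization and in-memory dictionaries; the function names are illustrative and not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Set

Index = Dict[str, Dict[str, List[int]]]


def build_positional_index(docs: Dict[str, str]) -> Index:
    """Map each word to {doc_id: [positions where the word occurs]}."""
    index: Dict[str, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    # Convert to plain dicts so lookups of unknown words never insert keys.
    return {word: dict(postings) for word, postings in index.items()}


def lookup(index: Index, word: str) -> Dict[str, List[int]]:
    # One dictionary access instead of a sequential scan over every document.
    return index.get(word.lower(), {})


def docs_containing_all(index: Index, words: List[str]) -> Set[str]:
    # Boolean AND for a multi-word query: intersect the posting lists' document IDs.
    postings = [set(lookup(index, w)) for w in words]
    return set.intersection(*postings) if postings else set()


if __name__ == "__main__":
    docs = {
        "d1": "the index maps words to positions",
        "d2": "a book index lists words",
    }
    index = build_positional_index(docs)
    print(lookup(index, "index"))  # {'d1': [1], 'd2': [2]}
    print(docs_containing_all(index, ["index", "maps"]))  # {'d1'}
```

The plain intersection in `docs_containing_all` treats every query word equally, which is exactly the weighting problem the background raises for multi-word queries.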

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/93
Inventor: 胡懋地 (Hu Maodi)
Owner: AEROSPACE INFORMATION