Document retrieval method and device

A document retrieval technology, applied in the fields of instruments, computing, and electrical digital data processing, that solves the problem of being unable to sort retrieval results and achieves accurate sorting that meets user needs.

Active Publication Date: 2012-07-11
NEW FOUNDER HLDG DEV LLC +1

AI Technical Summary

Problems solved by technology

[0007] The embodiment of the present invention provides a document retrieval method and device that solve the problem that retrieval results cannot be sorted according to the position at which a search word appears in the document and the data length of the document.

Examples

Embodiment 1

[0072] This embodiment describes the index building process, as follows:

[0073] Step 01: Segment each document field that requires precise retrieval by word to obtain one or more search tokens, and create an index entry for each token;

[0074] Step 02: Add an extra term (Term) to the index to mark the end of the field. The text of this Term is a predefined character, END, which is an illegal character in the character encoding set, so it can never collide with normal text;

[0075] Step 03: Record and store the field length of each document, that is, the number of search tokens the field contains. Lengths greater than 255 are treated as 255 to simplify storage and calculation.
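Steps 01 to 03 can be sketched as a small positional index. This is a minimal illustration, not the patented implementation: the function name build_index, the per-character segmentation (borrowed from Embodiment 2), and the use of the Unicode noncharacter U+FFFF as the END marker are all assumptions.

```python
from collections import defaultdict

END = "\uffff"   # assumption: a Unicode noncharacter stands in for the END marker
MAX_LEN = 255    # field lengths above this value are stored as 255 (Step 03)


def build_index(docs):
    """Build a positional index for one field.

    docs maps a document id to the text of the field.
    Returns (postings, lengths), where postings[token][doc_id] is the list of
    positions of that token and lengths[doc_id] is the capped field length.
    """
    postings = defaultdict(lambda: defaultdict(list))
    lengths = {}
    for doc_id, text in docs.items():
        tokens = list(text)  # Step 01, assuming one token per character
        for pos, tok in enumerate(tokens):
            postings[tok][doc_id].append(pos)
        # Step 02: the END marker sits one position past the last token
        postings[END][doc_id].append(len(tokens))
        # Step 03: store the token count, capped at 255
        lengths[doc_id] = min(len(tokens), MAX_LEN)
    return postings, lengths
```

Because END is indexed like any other token, a query can later constrain the distance between its last token and the end of the field using the same position lists.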

Embodiment 2

[0077] This embodiment describes the document retrieval process, as follows:

[0078] Step 11: Segment the search keyword in the search request by character to obtain N search tokens. If the request involves a positional relationship with the end of the field, add END as the (N+1)th search token;

[0079] Step 12: Analyze the search keyword and the wildcards in it, then obtain and record the positional relationship between the search tokens, including:

[0080] The positional relationship between the first search token and the beginning of the document, the positional relationship between the second search token and the first search token, ..., the positional relationship between the Nth search token and the end of the document;

[0081] Each positional relationship can be represented by a pair of minimum and maximum position values, denoted (min, max). The smallest value min can take is 0, meaning the positions coincide, and the largest value max can take is MAX, ...
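One way to picture Steps 11 and 12 is a parser that turns a keyword into tokens plus one (min, max) pair per token. The sketch below rests on several assumptions that the text above does not confirm: the name parse_query, the value MAX = 255, the END stand-in, the wildcard meanings taken from Embodiment 3 ("?" is 0 or 1 character, "*" is zero or more), and an offset convention in which the pair for the first token is measured from position 0 and every later pair is the position difference from the previous token, so adjacent tokens have offset 1.

```python
MAX = 255        # assumption: the text names an upper bound MAX without giving its value
END = "\uffff"   # assumption: stand-in for the END marker


def parse_query(query, anchor_end=True):
    """Split a keyword into tokens and (min, max) offset constraints.

    constraints[i] bounds the offset of tokens[i] from the previous token,
    or from the field start (position 0) when i == 0.
    """
    tokens, constraints = [], []
    lo, hi = 0, 0  # with no leading wildcard the first token must sit at position 0
    for ch in query:
        if ch == "?":
            hi = min(hi + 1, MAX)   # allows one optional extra character
        elif ch == "*":
            hi = MAX                # allows any number of extra characters
        else:
            tokens.append(ch)
            constraints.append((lo, hi))
            lo, hi = 1, 1           # the next token is adjacent unless wildcards follow
    if anchor_end:
        # Step 11: END becomes the (N+1)th token so the end of the field is constrained
        tokens.append(END)
        constraints.append((lo, hi))
    return tokens, constraints
```

For example, under these assumptions the keyword "a?b" yields the tokens a, b, END with constraints (0, 0), (1, 2), (1, 1): "a" starts the field, "b" follows directly or after one character, and the field ends right after "b".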

Embodiment 3

[0086] This embodiment illustrates a specific implementation: searching the entry field of "Ci Hai" in an enterprise search application.

[0087] Searching the entry field of "Ci Hai" requires finding the documents that contain the search word at a specific position, and sorting them by the above rules according to the hit position and the length of the hit document.
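The sorting rules themselves are not reproduced in this excerpt, so the following is only a hypothetical ordering that uses the two signals named above. It assumes that an earlier hit position ranks first and that, for equal positions, a shorter field ranks first; the name rank_hits and the tuple layout are illustrative.

```python
def rank_hits(hits):
    """Order hits by hit position, then by field length.

    hits is an iterable of (doc_id, hit_position, field_length) tuples.
    Assumed ordering: smaller hit position first, then shorter field first,
    with doc_id as a final tie-breaker so the result is deterministic.
    """
    ordered = sorted(hits, key=lambda h: (h[1], h[2], h[0]))
    return [doc_id for doc_id, _pos, _length in ordered]
```

With this assumed ordering, an entry that begins with the search word and is short would be returned ahead of a long entry that merely contains it somewhere in the middle.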

[0088] The wildcards "?" and "*" are supported in the search request, where "?" represents 0 or 1 character and "*" represents zero or more characters.
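The wildcard semantics can be checked against a reference matcher built on regular expressions. This is not the index-based method of the embodiments, only a brute-force sketch for reasoning about what a pattern should match; it assumes the whole field must match the pattern, and the helper names are illustrative.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regular expression.

    "?" becomes ".?" (0 or 1 character) and "*" becomes ".*" (zero or more
    characters); every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".?")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, field):
    """Return True if the entire field text matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(field) is not None
```

For instance, matches("a?b", "ab") and matches("a?b", "axb") both hold, while matches("a?b", "axxb") does not, because "?" admits at most one character.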

[0089] The following is a detailed explanation of the use of various types of wildcards:

[0090]

[0091] During retrieval, it is necessary to match not only the positional relationships between the search words, but also the positional relationships between the search words and the beginning and end of the document.
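A minimal sketch of such a positional match for a single document follows. It assumes each search token already has a sorted list of positions in the field, that the end of the field is represented by a final token whose only position is the field length, and that each (min, max) pair bounds the position difference from the previous token (or from position 0 for the first token). The function name find_match and this offset convention are illustrative assumptions.

```python
def find_match(positions, constraints):
    """Find the earliest position chain that satisfies every constraint.

    positions[i] is the sorted list of positions of token i in one field.
    constraints[i] is the (min, max) offset of token i from the previous
    token, or from the field start (position 0) when i == 0.
    Returns the position of the first token in the earliest match, or None.
    """
    if not positions:
        return None

    def extend(i, prev):
        # True if tokens i.. can be placed given that token i-1 sits at prev
        if i == len(positions):
            return True
        lo, hi = constraints[i]
        return any(lo <= p - prev <= hi and extend(i + 1, p)
                   for p in positions[i])

    lo, hi = constraints[0]
    for p in positions[0]:
        if lo <= p <= hi and extend(1, p):
            return p
    return None
```

As an example, for the field "abab" and a pattern meaning "ends with ab", the inputs are positions [[0, 2], [1, 3], [4]] and constraints [(0, 255), (1, 1), (1, 1)]; the first "ab" fails the end constraint, so the function returns 2.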

[0092] Before retrieval, an index building process is required, as...

Abstract

The embodiment of the invention discloses a document retrieval method and device, relating to the field of computer information processing. The method and device solve the problem that retrieval results cannot be sorted according to the positions of the search tokens in the documents and the data length of the documents. The method comprises the following steps: after retrieving a plurality of documents that contain the search tokens of the search keyword, sorting the retrieved documents according to the positions of those search tokens in the retrieved documents and the data length of the retrieved documents; and returning the retrieved documents as the retrieval results in the sorted order. The method and device thus allow retrieval results to be sorted according to the positions of the search tokens in the documents and the data length of the documents.

Description

Technical field

[0001] The invention relates to the field of computer information processing, in particular to a document retrieval method and device.

Background

[0002] In full-text retrieval, the retrieval system scans every word in a document and creates an index entry for each word, recording how many times and at which positions the word occurs in the document. At query time, the system searches the resulting index file and returns the search results to the user in a certain sort order. In practical applications, a document processed by a full-text retrieval system may contain multiple fields, such as title, author, and body text.

[0003] Specifically, after the user submits a search request, the full-text retrieval system analyzes the search keyword in the request and determines the search tokens it contains. Search tokens are the segments formed by dividing the search keyword into characters. Character divi...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventors: 童征宇, 徐剑波
Owner: NEW FOUNDER HLDG DEV LLC