A method and system for document retrieval

A document retrieval technology, applied in the field of massive data processing, that addresses the low efficiency and poor targeting of existing retrieval methods and improves the hit rate and usage value of the retrieved documents.

Inactive Publication Date: 2018-06-29

AGRI INFORMATION INST OF CAS

5 Cites, 2 Cited by


AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is that the retrieval methods in the prior art are inefficient and poorly targeted. The invention therefore proposes an efficient document retrieval method and system.



Examples


Embodiment 1

[0053] The invention provides a document retrieval method, which can be used to retrieve scientific and technological documents. Its flow chart is shown in Figure 1, and it includes the following steps:

[0054] The first stage is document preprocessing, which comprises steps S1 and S2.

[0055] S1. Select multiple documents and determine the core data of each document.

[0056] The documents are selected, as needed, from the field related to the content to be retrieved. They may be a subset of the documents in that field, or documents drawn from authoritative periodicals and other databases. Because the full text of a document is large, it does not readily reflect the document's core content; analyzing only the core content makes the retrieval more targeted. Here, the core content of the retrieved documents, such as the subject of the document (or the t...

Embodiment 2

[0096] This embodiment also provides a specific application example, in which the retrieval information is provided by the user and the rest of the process is completed on the background server.

[0097] S1. First, select a data source. In this example, more than 30,000 documents from 18 English-language core periodicals on rice, covering 20 years (1995-2012), were used as the data source. The specific periodical list is shown in Table 2.

[0098] Table 2 List of Journal Data Sources

[0099]

[0100]

[0101] Then, the core content of each document, such as the subject (or title), the search terms (generally the keywords), and the abstract, is extracted from the above documents to form the core data set.

[0102] S2. Perform phrase extraction and statistics on the core data of each of the above documents, and map phrases with similar meanings into the same concept to obtain a concept set, which includes concept, source and concept frequency.
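Step S2 can be illustrated with a minimal Python sketch. The patent text available here does not say how phrases are extracted or how similar phrases are recognized, so the sketch assumes the phrases are already extracted per document and uses a small hand-written synonym table; SYNONYMS, normalize and build_concept_set are hypothetical names introduced only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical synonym table (phrase -> canonical concept). This is an
# assumption for illustration; the patent does not specify the mapping.
SYNONYMS = {
    "rice blast": "rice blast disease",
    "blast disease": "rice blast disease",
    "oryza sativa": "rice",
}


def normalize(phrase):
    """Map a phrase to its concept; unknown phrases map to themselves."""
    key = phrase.strip().lower()
    return SYNONYMS.get(key, key)


def build_concept_set(core_data):
    """Build a concept set from {document id: [extracted phrases]}.

    Each entry records the concept, the documents it came from (source)
    and how many times it occurred (concept frequency).
    """
    frequency = Counter()
    sources = defaultdict(set)
    for doc_id, phrases in core_data.items():
        for phrase in phrases:
            concept = normalize(phrase)
            frequency[concept] += 1
            sources[concept].add(doc_id)
    return [
        {
            "concept": concept,
            "sources": sorted(sources[concept]),
            "frequency": frequency[concept],
        }
        for concept in sorted(frequency)
    ]


# Made-up core data for two documents.
core_data = {
    "doc1": ["Rice blast", "Oryza sativa"],
    "doc2": ["blast disease", "yield"],
}
concept_set = build_concept_set(core_data)
for entry in concept_set:
    print(entry)
```

In this toy input, "Rice blast" and "blast disease" collapse into one concept whose sources are both documents. A real system would presumably replace the synonym table with a thesaurus or a similarity measure.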

[0103] When performing knowledge extraction, a total ...

Embodiment 3

[0164] This embodiment provides a document retrieval system, whose structural block diagram is shown in Figure 3, including:

[0165] The core data extraction unit 01 is used to select multiple documents and determine the core data of each document;

[0166] The concept set generation unit 02 is used to extract and count the phrases of the core data of each document, and map the phrases with similar meanings to the same concept to obtain a concept set, which includes concept, source and concept frequency;

[0167] The search information acquisition unit 03 is used to obtain the search information input by the user, and the search information includes search words, search time period and time slice length;

[0168] Retrieval unit 04, performing a pre-retrieval of search term matching in the core data of the document according to the search term, and obtaining the document matching the search term and the publication time and concept set of th...
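The inputs handled by units 03 and 04 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: time is measured in whole years, term matching is a case-insensitive substring test, and matched documents are grouped by time slice. SearchInfo, time_slices, pre_retrieve and the document fields "id", "year" and "core_text" are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SearchInfo:
    terms: list        # search words entered by the user
    start_year: int    # first year of the search time period (inclusive)
    end_year: int      # last year of the search time period (inclusive)
    slice_length: int  # time slice length in years (the unit is an assumption)


def time_slices(info):
    """Split the search time period into consecutive (first, last) year slices."""
    if info.slice_length <= 0:
        raise ValueError("slice_length must be positive")
    slices = []
    year = info.start_year
    while year <= info.end_year:
        last = min(year + info.slice_length - 1, info.end_year)
        slices.append((year, last))
        year += info.slice_length
    return slices


def pre_retrieve(info, documents):
    """Match search terms against each document's core data.

    Returns {time slice: [ids of matching documents published in it]}.
    """
    hits = {s: [] for s in time_slices(info)}
    terms = [t.lower() for t in info.terms]
    for doc in documents:
        text = doc["core_text"].lower()
        if not any(term in text for term in terms):
            continue
        for first, last in hits:
            if first <= doc["year"] <= last:
                hits[(first, last)].append(doc["id"])
                break
    return hits


# Made-up documents for illustration only.
documents = [
    {"id": "doc1", "year": 1996, "core_text": "Resistance to rice blast in field trials"},
    {"id": "doc2", "year": 2003, "core_text": "Drought tolerance in wheat"},
    {"id": "doc3", "year": 2011, "core_text": "Rice Blast fungus genomics"},
]
info = SearchInfo(terms=["rice blast"], start_year=1995, end_year=2012, slice_length=5)
hits = pre_retrieve(info, documents)
print(hits)
```

The last slice is shortened when the period is not a multiple of the slice length, which is one possible convention among several.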


Abstract

The invention relates to a document retrieval method and a document retrieval system. The method comprises the following steps: performing a pre-retrieval in the core data of the selected documents according to an index word input by the user; performing concept clustering to obtain theme classes; identifying the core nodes in each theme class; obtaining the development modes of the theme from the core nodes; then obtaining the core nodes belonging to each theme development mode; and finally using the documents corresponding to those core nodes as the retrieval result. The method thus further narrows the pre-retrieval result obtained from the index word. Because the information across all theme classes is voluminous and does not by itself reflect how a theme develops, the method first obtains the core nodes in each theme class and then uses them to obtain the theme development modes. Once the theme development modes are obtained, the core nodes belonging to them are the documents of important value in this retrieval, so using them as the retrieval result gives the retrieved documents higher value, thereby improving their hit ratio and usage value.

Description

Technical field

[0001] The invention relates to a massive data processing method, in particular to a document retrieval method and system.

Background art

[0002] With the rapid development of Internet technology, the number of electronic documents is increasing. How to help users, especially scientific researchers, quickly and effectively find the relevant documents they need among massive numbers of electronic documents has become an urgent problem. Personalized recommendation technology can effectively address information overload. It is a multidisciplinary field drawing on information retrieval, human-computer interaction, data mining and user modeling. It has produced rich research results over many years and has been applied with good results in e-commerce in particular, for example in recommendations based on personal preferences and product evaluations. At present, a relatively rich method and technical system...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G06F17/30

CPC: G06F16/3334, G06F16/334, G06F16/35

Inventor: 孙巍, 张学福, 郝心宁, 谢能付

Owner: AGRI INFORMATION INST OF CAS
