Document retrieval method and system

A document retrieval technology in the field of massive data processing. It addresses the low efficiency and poor pertinence of existing retrieval methods and improves the hit rate and use value of the retrieved documents.

Status: Inactive. Publication date: 2015-11-18
AGRI INFORMATION INST OF CAS
Cites: 5. Cited by: 15

AI Technical Summary

Problems solved by technology

[0005] For this reason, the technical problem to be solved by the present invention is that the retrieval methods ...

Method used


Examples

Embodiment 1

[0053] The invention provides a document retrieval method that can be used to retrieve scientific and technological documents. As shown in the flow chart of Figure 1, it comprises the following steps:

[0054] The first stage is document preprocessing, comprising steps S1 and S2.

[0055] S1. Select multiple documents and determine the core data of each document.

[0056] When selecting the documents, choose as needed those that belong to fields related to the content to be retrieved. They may be documents from the field related to the retrieval information, or documents from authoritative periodicals and other databases. Because the full text of a document is voluminous and does not readily reveal its core content, analyzing only the core content makes the analysis more targeted. Here, the core content of the retrieved documents, such as the subject of the document (or the t...

Embodiment 2

[0095] This embodiment also provides a specific application example in which the user supplies the search information and the rest of the process is completed on a background server.

[0096] S1. First, select a data source. In this example, more than 30,000 articles published over 20 years (1995-2012) in 18 English-language core journals on rice were used as the data source. The journals are listed in Table 2.

[0097] Table 2 List of Journal Data Sources

[0100] Then the core content of each document, namely its subject (or title), search terms (generally the keywords), and abstract, is extracted to form the core data set.
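A minimal sketch of building the core data set, assuming each source record is a Python dict with hypothetical keys `id`, `title`, `keywords` (one `;`-separated string), `abstract`, and `year`; the patent does not specify a storage format. Everything except the core fields, such as the full text, is dropped.

```python
from dataclasses import dataclass


@dataclass
class CoreData:
    """Core data of one document: title, keywords and abstract, plus year."""
    doc_id: str
    title: str
    keywords: list[str]
    abstract: str
    year: int


def extract_core_data(raw_docs: list[dict]) -> list[CoreData]:
    """Keep only the core fields of each record and drop everything else.

    The input keys ('id', 'title', 'keywords', 'abstract', 'year') are
    hypothetical; keywords are assumed to be one ';'-separated string.
    """
    core = []
    for doc in raw_docs:
        keywords = [k.strip().lower() for k in doc.get("keywords", "").split(";")]
        core.append(
            CoreData(
                doc_id=doc["id"],
                title=doc.get("title", "").strip(),
                keywords=[k for k in keywords if k],  # drop empty entries
                abstract=doc.get("abstract", "").strip(),
                year=int(doc["year"]),
            )
        )
    return core
```

For example, a record with `"keywords": "Rice blast; Resistance gene"` yields `['rice blast', 'resistance gene']`. Lowercasing keywords here is an assumed normalization so that later phrase statistics are case-insensitive.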

[0101] S2. Extract phrases from the core data of each document and compute their statistics, then map phrases with similar meanings to the same concept to obtain a concept set whose entries record the concept, its source, and the concept frequency.
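The concept set of step S2 can be sketched as a mapping from concept to its sources and frequency. This assumes phrases have already been extracted per document and that "similar meaning" is decided by a hypothetical synonym table mapping each variant phrase to a canonical concept; the patent's own phrase extraction and similarity method are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptEntry:
    concept: str
    sources: set[str] = field(default_factory=set)  # ids of documents containing the concept
    frequency: int = 0  # number of phrase occurrences mapped to the concept


def build_concept_set(
    doc_phrases: dict[str, list[str]],
    synonyms: dict[str, str],
) -> dict[str, ConceptEntry]:
    """Map phrases with similar meanings to one concept and count them.

    doc_phrases: document id -> phrases extracted from its core data.
    synonyms: hypothetical lookup from a lowercased phrase variant to its
      canonical concept; phrases absent from it become their own concept.
    """
    concepts: dict[str, ConceptEntry] = {}
    for doc_id, phrases in doc_phrases.items():
        for phrase in phrases:
            key = phrase.strip().lower()
            concept = synonyms.get(key, key)
            entry = concepts.setdefault(concept, ConceptEntry(concept))
            entry.sources.add(doc_id)
            entry.frequency += 1
    return concepts
```

With `synonyms = {"rice blast disease": "rice blast", "blast of rice": "rice blast"}`, the phrases "Rice blast", "rice blast disease", and "blast of rice" all count toward one `rice blast` concept, and its `sources` records every document they came from.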

[0102] When performing knowledge extraction, a total ...

Embodiment 3

[0162] As a further embodiment, a document retrieval system is provided. Its structural block diagram is shown in Figure 3, and it comprises:

[0163] Core data extraction unit 01, used to select multiple documents and determine the core data of each document;

[0164] Concept set generation unit 02, used to extract and count the phrases in the core data of each document and to map phrases with similar meanings to the same concept, obtaining a concept set that comprises concept, source, and concept frequency;

[0165] Search information acquisition unit 03, used to obtain the search information input by the user, which includes the search terms, the search time period, and the time slice length;

[0166] Retrieval unit 04, used to pre-retrieve documents by matching the search terms against the core data of the documents, obtaining the documents that match the search terms together with the publication time and concept set of th...
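Units 03 and 04 can be sketched together: the search information carries the terms, a period, and a slice length, and pre-retrieval returns matching documents with their publication year and concepts attached. Several choices here are assumptions, not taken from the patent: time is measured in whole years, the last slice is shortened when the period is not a multiple of the slice length, matching is a case-insensitive substring test over title, keywords, and abstract, and documents outside the search period are filtered out at this stage. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SearchInfo:
    """Search information from unit 03 (field names are hypothetical)."""
    terms: list[str]
    start_year: int
    end_year: int  # inclusive
    slice_length: int  # years per time slice


@dataclass
class Doc:
    """Core data of one document plus its mapped concepts (hypothetical)."""
    doc_id: str
    title: str
    keywords: list[str]
    abstract: str
    year: int
    concepts: set[str]


def time_slices(info: SearchInfo) -> list[tuple[int, int]]:
    """Split [start_year, end_year] into consecutive inclusive year slices."""
    if info.slice_length <= 0:
        raise ValueError("slice_length must be positive")
    if info.end_year < info.start_year:
        raise ValueError("end_year must not precede start_year")
    slices = []
    year = info.start_year
    while year <= info.end_year:
        last = min(year + info.slice_length - 1, info.end_year)  # shorten final slice
        slices.append((year, last))
        year = last + 1
    return slices


def pre_retrieve(docs: list[Doc], info: SearchInfo) -> list[Doc]:
    """Return documents whose core data contains any search term.

    Each hit keeps its year and concept set for the later clustering step.
    """
    needles = [t.strip().lower() for t in info.terms if t.strip()]
    hits = []
    for doc in docs:
        # Restricting to the search period at this stage is an assumption.
        if not info.start_year <= doc.year <= info.end_year:
            continue
        text = " ".join([doc.title, *doc.keywords, doc.abstract]).lower()
        if any(n in text for n in needles):
            hits.append(doc)
    return hits
```

For instance, `time_slices(SearchInfo(["rice blast"], 1995, 2012, 5))` gives `[(1995, 1999), (2000, 2004), (2005, 2009), (2010, 2012)]`; the 5-year slice length is only illustrative.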

Abstract

The invention relates to a document retrieval method and a document retrieval system. The method comprises the following steps: performing a pre-retrieval in the core data of the selected documents according to the search term input by the user; performing concept clustering to obtain theme classes; identifying the core nodes in each theme class and obtaining the development mode of the theme from those core nodes; then obtaining the core nodes belonging to each theme development mode; and finally using the documents corresponding to those core nodes as the retrieval result. Because the pre-retrieval result obtained from the search term needs to be narrowed further, and the information in all theme classes is vast and cannot by itself reflect how a theme develops, the method first obtains the core nodes in each theme class and then uses them to obtain the theme development modes. Once the theme development modes in the retrieval result are obtained, the core nodes belonging to them are the documents of greatest value for this retrieval. Using these core nodes as the retrieval result therefore gives the retrieved documents higher value and improves their hit rate and use value.
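The abstract does not state how concept clustering or core-node identification is carried out. The sketch below swaps in common techniques as assumptions: nodes are pre-retrieved documents, two documents are linked when the Jaccard similarity of their concept sets reaches a hypothetical threshold, each connected component is treated as a theme class, and the core node of a class is its highest-degree document. The theme development mode step is not modeled.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two concept sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(doc_concepts: dict[str, set[str]], threshold: float) -> dict[str, set[str]]:
    """Link documents whose concept sets have Jaccard similarity >= threshold."""
    adj: dict[str, set[str]] = {d: set() for d in doc_concepts}
    for x, y in combinations(sorted(doc_concepts), 2):
        if jaccard(doc_concepts[x], doc_concepts[y]) >= threshold:
            adj[x].add(y)
            adj[y].add(x)
    return adj


def theme_classes(adj: dict[str, set[str]]) -> list[set[str]]:
    """Connected components of the similarity graph, one per theme class."""
    seen: set[str] = set()
    classes = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        classes.append(component)
    return classes


def core_nodes(adj: dict[str, set[str]], classes: list[set[str]]) -> list[str]:
    """Pick the highest-degree document in each class (ties broken by id)."""
    return [min(c, key=lambda d: (-len(adj[d]), d)) for c in classes]
```

With a threshold of 0.3, documents sharing most of their concepts (for example `{"rice blast", "resistance"}` and `{"rice blast", "resistance", "gene"}`) fall into one class, and the document linked to the most others in that class is returned as its core node.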

Description

Technical field

[0001] The invention relates to massive data processing methods, and in particular to a document retrieval method and system.

Background art

[0002] With the rapid development of Internet technology, the number of electronic documents keeps growing. Helping users, especially scientific researchers, find the relevant documents they need quickly and effectively among massive numbers of electronic documents has become an urgent problem. Personalized recommendation technology can effectively address information overload. It is a multidisciplinary field drawing on information retrieval, human-computer interaction, data mining, and user modeling. Years of research have produced rich results, with particularly good applications in e-commerce, such as recommendations based on personal preferences and product reviews. At present, a relatively rich method and technical system...

Claims

Application Information

IPC(8): G06F17/30
CPC: G06F16/3334, G06F16/334, G06F16/35
Inventors: 孙巍 (Sun Wei), 张学福 (Zhang Xuefu), 郝心宁 (Hao Xinning), 谢能付 (Xie Nengfu)
Owner AGRI INFORMATION INST OF CAS