Literature content retrieval and recognition method and device

A document content technology, applied to document content retrieval and recognition methods and devices. It addresses the problems that an article's value is hard to judge before reading, that reading consumes a lot of time, and that abstracts and keywords carry little effective information. Its effects are avoiding repeated reading, reducing workload, and enabling similarity judgment.

Pending Publication Date: 2019-04-16
TRAFFIC CONTROL TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, none of the above methods acquires content automatically. They consume a lot of manpower, and reading the entire text takes a lot of time. Horizontal comparison across a large number of documents is difficult, document similarity cannot be distinguished, and the same content ends up being read repeatedly. When only the abstract and keywords are read, the article yields too little effective information. Automated methods such as crawlers do not distinguish article content from article type, so the content obtained is large in volume but low in quality. Keyword frequency in the article cannot be counted, so it is difficult to judge an article's value to readers before reading it.
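The keyword-frequency gap named in [0004] can be illustrated with a few lines of Python. This is only a sketch: the stop-word set, the function name `keyword_frequencies`, and the sample text are assumptions made for illustration and do not come from the patent.

```python
import re
from collections import Counter

# Hypothetical stop-word set; the source does not list its stop-word library.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}


def keyword_frequencies(text, keywords):
    """Count how often each keyword occurs in the text, ignoring case and stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return {k: counts[k.lower()] for k in keywords}


if __name__ == "__main__":
    sample = "Signal control. The signal fails; control of the signal."
    print(keyword_frequencies(sample, ["signal", "control", "brake"]))
```

A count like this gives a reader a rough signal of relevance before opening the article, which is the judgment [0004] says existing methods cannot support.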


Examples


Embodiment Construction

[0025] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0026] Figure 1 shows a schematic flow chart of a document content retrieval and recognition method provided by an embodiment of the present invention. As shown in Figure 1, the method in this embodiment includes:

[0027] S1. Use the distributed computing engine architecture to st...


Abstract

Embodiments of the invention provide a literature content retrieval and recognition method and device. The method comprises: storing and reading a target article using a distributed computing engine architecture, and splitting the target article to obtain its sentences and words; removing invalid words from those sentences and words using a stop-word library; and inputting the remaining sentences and words in sequence into a pre-generated similarity analysis model, which extracts all sentences in the target article that are similar to the content of the target sentence library and all words that are similar to the content of the target word library. The target word library, the stop-word library, and the target sentence library are obtained in advance by dividing word libraries according to the expected specific content. The similarity analysis model is generated with a logistic regression algorithm from the contents of the target sentence library and the target word library. The method acquires literature content automatically, can handle a large volume of literature reading, avoids repeated reading, and reduces workload.
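The steps in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the library contents, the negative training examples (`OTHER_SENTENCES`), the bag-of-words features, the 0.5 threshold, and all function names are hypothetical. The distributed computing engine is left out, and only sentence extraction is shown; word extraction would follow the same pattern.

```python
import math
import re
from collections import Counter

# Hypothetical libraries; the source does not list their contents.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in", "for", "we"}
TARGET_SENTENCES = [
    "The train control system uses moving block signalling",
    "The interlocking system checks route safety",
]
# Assumed negative examples: logistic regression needs both classes to train.
OTHER_SENTENCES = [
    "The authors thank the reviewers",
    "This paper is organized as follows",
]


def tokenize(sentence):
    """Lowercase a sentence, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def split_article(text):
    """Split an article into sentences on end-of-sentence punctuation."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def featurize(words, vocab):
    """Bag-of-words counts over a fixed vocabulary (an assumed feature choice)."""
    counts = Counter(words)
    return [float(counts[v]) for v in vocab]


def predict(x, weights, bias):
    """Logistic regression probability that x resembles the target library."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            err = predict(x, weights, bias) - label
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def build_model():
    """Pre-generate the similarity model from the libraries above."""
    samples = [tokenize(s) for s in TARGET_SENTENCES + OTHER_SENTENCES]
    labels = [1.0] * len(TARGET_SENTENCES) + [0.0] * len(OTHER_SENTENCES)
    vocab = sorted({w for words in samples for w in words})
    weights, bias = train([featurize(ws, vocab) for ws in samples], labels)
    return vocab, weights, bias


def extract_similar_sentences(article, model, threshold=0.5):
    """Return the article sentences the model scores at or above the threshold."""
    vocab, weights, bias = model
    return [
        s for s in split_article(article)
        if predict(featurize(tokenize(s), vocab), weights, bias) >= threshold
    ]


ARTICLE = (
    "We thank the reviewers for comments. "
    "The interlocking system checks route safety before train movement. "
    "The paper is organized as follows."
)

if __name__ == "__main__":
    for sentence in extract_similar_sentences(ARTICLE, build_model()):
        print(sentence)
```

In a real deployment the splitting and stop-word steps would presumably run on the distributed engine the abstract mentions, and the model would be trained on far larger libraries than the toy lists used here.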

Description

Technical field

[0001] The embodiments of the present invention relate to the field of computer technology, and in particular to a method and device for document content retrieval and recognition.

Background art

[0002] As the volume of technical literature in every industry grows, reading the literature of a given technical direction and acquiring its core knowledge has become one of the important tasks of professional and technical personnel.

[0003] In document reading, the content of the document is the most important part and is the reader's ultimate goal. At present, faced with a large amount of literature, common practice is to use the fine search method, the reverse reading method, the catalog method, and other automated methods. The fine search method adds multiple keywords to the search to reduce the number of hits and so filters the documents to be read; th...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F16/33
CPC: G06F40/279, G06F40/30
Inventors: 罗铭, 刘波
Owner: TRAFFIC CONTROL TECH CO LTD