Intelligent information extraction method and system

An intelligent information extraction technology, applied in information extraction (a sub-field of language processing), that can solve the problems of machine reading comprehension, target transfer of referring entities, and high sample complexity.

Active Publication Date: 2020-06-16
阿基米德(上海)传媒有限公司
Cites: 4; Cited by: 5

AI Technical Summary

Problems solved by technology

However, the entity that a reference points to may shift within the text, so that the recalled content does not refer to what the question asks about. This is a hard problem for machine reading comprehension.
[0011] Detailed, large-scale manually labeled data is required: as with entity recognition and entity relationship classification, detailed manual labeling is needed, and the task is often a multi-class classification task. Although adding a probabilistic graphical model (such as a conditional random field) can improve the accuracy of multi-class classification, such tasks inherently face problems such as class imbalance and high sample complexity, and the cost of data at scale is also an important reason limiting the use of this method for information extraction.

Examples

Embodiment Construction

[0031] In order to make the technical problems solved by the present invention, its technical solutions and its beneficial effects clearer, the present invention is described in further detail below in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it.

[0032] The steps of the intelligent information extraction method provided by the present invention are shown in Figures 2 and 3. The method includes the following steps:

[0033] S1. Document tensorization: use document tensor extraction technology to tensorize documents and question texts, and extract original document tensors and question text tensors;

[0034] S2. Use the topic model to perform topic aggregation and filtering: decompose the original document tensor through non-negative matrix factorization (NMF) to obtain N clustering topics, and perform all sentences ...
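The topic aggregation idea in S2 can be illustrated with a minimal sketch. The source does not specify how the document tensor is built or which NMF solver is used, so the following are assumptions: the "document tensor" is stood in for by a small sentence-by-term count matrix `V`, the factorization uses the standard multiplicative-update rule for the Frobenius objective, and each sentence is assigned to the topic with the largest weight in its row of `W`. The names `nmf`, `V`, `W`, `H` and `labels` are hypothetical.

```python
import numpy as np

def nmf(V, n_topics, n_iter=300, seed=0, eps=1e-9):
    """Approximate non-negative V (sentences x terms) as W @ H using
    multiplicative updates (an assumed solver, not the patent's)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    # Strictly positive random initialisation keeps all updates well defined.
    W = rng.random((n_rows, n_topics)) + eps
    H = rng.random((n_topics, n_cols)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update topic-term weights
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update sentence-topic weights
    return W, H

# Toy sentence-by-term counts: sentences 0-1 share one vocabulary,
# sentences 2-3 share another (illustrative data only).
V = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [0, 0, 2, 3],
              [0, 0, 3, 2]], dtype=float)

W, H = nmf(V, n_topics=2)
labels = W.argmax(axis=1)  # dominant topic per sentence
print(labels)
```

Sentences that share a label form one topic sentence set; which integer each topic receives depends on the random initialisation.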

Abstract

The invention discloses an intelligent information extraction method and system. In the method, an NMF method is used to decompose a document feature matrix into K clustering topics, and the sentences in the document are clustered by topic to obtain a plurality of sentence sets. The k topic sentence sets with the highest semantic similarity to the question text are taken; the question text is then used again to retrieve from each of the k topic sentence sets, returning the m most relevant sentences from each set to form a corresponding document, and the k resulting documents are combined into a long document. Answer extraction is performed on the long document and the question text using an MRC model that combines a bidirectional attention flow model with a Point Net model initialized with the question text tensor. The scheme adopts an improved MRC algorithm and uses question text information in the stages of document topic aggregation and filtering, sentence retrieval recall and ranking, and answer extraction; it effectively solves the OOV problem existing in the prior art, has a low data labeling cost, and achieves both computational efficiency and accuracy.
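The two-stage selection described in the abstract (the top k topic sentence sets, then the top m sentences from each set, concatenated into one long document) can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how semantic similarity or retrieval relevance is computed, so bag-of-words cosine similarity is used as a stand-in for both, and the names `bow`, `cosine` and `build_long_document`, as well as the sample sentences, are hypothetical.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts (a stand-in for the patent's tensors)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_long_document(question, topic_sets, k, m):
    q = bow(question)
    # Stage 1: keep the k topic sentence sets most similar to the question.
    top_sets = sorted(topic_sets,
                      key=lambda s: cosine(q, bow(" ".join(s))),
                      reverse=True)[:k]
    # Stage 2: from each kept set, retrieve the m most relevant sentences.
    docs = []
    for sentences in top_sets:
        best = sorted(sentences, key=lambda s: cosine(q, bow(s)),
                      reverse=True)[:m]
        docs.append(" ".join(best))
    # Combine the k short documents into one long document.
    return " ".join(docs)

topic_sets = [
    ["the radio station broadcasts news daily",
     "the station was founded in shanghai"],
    ["the weather today is sunny and warm",
     "rain is expected tomorrow"],
    ["the founder of the station is a journalist",
     "listeners enjoy music programs"],
]
long_doc = build_long_document("where was the station founded",
                               topic_sets, k=2, m=1)
print(long_doc)
```

In the method as described, the resulting long document and the question text would then be passed to the MRC model for answer extraction; that stage is not sketched here.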

Description

Technical field

[0001] The present invention relates to information extraction, a sub-field of language processing, and in particular to an intelligent information extraction method and system combining traditional text retrieval and machine reading comprehension.

Background technique

[0002] Document-based retrieval technology has been extensively studied owing to the continuous development of search services in the Internet industry. Algorithms such as BM25 and PageRank, which are based on empirical formulas, obtain relatively good accuracy and recall rates without relying on trained models. When processing large documents, however, they tend to recall a large amount of redundant information, so their results cannot be used directly for information or knowledge extraction. They often need to be paired with a matching model, such as rule- and pattern-based first-order predicate logic or subject-verb-object triplets. The design of information extraction, pattern and defini...

Application Information

IPC(8): G06F16/35, G06F16/335, G06F40/205, G06N20/00
CPC: G06F16/355, G06F16/335, G06N20/00, Y02D10/00
Inventor: 胡家新
Owner: 阿基米德(上海)传媒有限公司