
Hybrid information extraction method and system for open domain

An open-domain information extraction technology, applied to hybrid information extraction, that addresses the problems of missing extractions (for example, no result being obtained for a sentence) and incompletely extracted relations, and that better meets atomicity requirements and improves information extraction capability.

Pending Publication Date: 2022-07-12
NANJING UNIV OF AERONAUTICS & ASTRONAUTICS

AI Technical Summary

Problems solved by technology

More seriously, Stanford OIE completely ignores this relationship, so it produces no extraction results for the second example sentence.
In addition, in output (13), although M^OIE can extract some cross-clause relations (such as "reopened"), it still cannot extract all such relations (for example, "redesigned").



Examples


Embodiment Construction

[0036] In order to facilitate the understanding of those skilled in the art, the present invention will be further described below with reference to the embodiments and the accompanying drawings, and the contents mentioned in the embodiments are not intended to limit the present invention.

[0037] Figure 1 is a general flow chart of the method according to an embodiment of the invention, where the input is natural language text and the output is the generated relation triples.

[0038] As shown in Figure 1, the open-domain hybrid information extraction method disclosed by the embodiment of the present invention is a pipeline method with three main stages: preprocessing, entity recognition, and relation extraction. The preprocessing stage consists of two steps: contextual clause decomposition and NLP preprocessing. Contextual clause decomposition is used to simplify compound sentences, and then NLP preprocesses ...
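The three-stage pipeline described in [0038] can be pictured with a minimal Python sketch. This is not the patented method: every function name here (`decompose_clauses`, `preprocess`, `recognize_entities`, `extract_relation`, `extract`) is hypothetical, and each stage is a deliberately naive stand-in (splitting on ", and", whitespace tokenization, treating capitalized tokens as entities) for the dependency-analysis and rule-based steps the patent refers to. It only shows how text flows through the stages to produce relation triples.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Triple:
    """A relation triple: (subject, relation phrase, object)."""
    subject: str
    relation: str
    obj: str


def decompose_clauses(text: str) -> List[str]:
    # Toy stand-in for contextual clause decomposition (assumption):
    # split a compound sentence on ';' or ', and '.
    parts = re.split(r"\s*;\s*|,\s*and\s+", text.strip().rstrip("."))
    return [p for p in parts if p]


def preprocess(clause: str) -> List[str]:
    # Toy stand-in for NLP preprocessing (assumption): whitespace tokens only.
    return clause.split()


def recognize_entities(tokens: List[str]) -> List[int]:
    # Toy entity recognition (assumption): capitalized tokens are entities.
    return [i for i, tok in enumerate(tokens) if tok[0].isupper()]


def extract_relation(tokens: List[str], entity_idx: List[int]) -> Optional[Triple]:
    # Toy relation extraction (assumption): the words between the first
    # and last entity in the clause form the relation phrase.
    if len(entity_idx) < 2:
        return None
    first, last = entity_idx[0], entity_idx[-1]
    relation = " ".join(tokens[first + 1:last])
    if not relation:
        return None
    return Triple(tokens[first], relation, tokens[last])


def extract(text: str) -> List[Triple]:
    # Pipeline: preprocessing -> entity recognition -> relation extraction.
    triples = []
    for clause in decompose_clauses(text):
        tokens = preprocess(clause)
        triple = extract_relation(tokens, recognize_entities(tokens))
        if triple is not None:
            triples.append(triple)
    return triples


if __name__ == "__main__":
    for t in extract("Alice founded Acme, and Bob joined Acme."):
        print(t)
```

Splitting the compound sentence first means each simplified clause yields its own triple, which is the motivation the patent gives for putting clause decomposition ahead of relation extraction.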


Abstract

The invention discloses an open-domain hybrid information extraction method and system. The method first simplifies compound sentences through contextual clause decomposition and NLP preprocessing and obtains the linguistic attributes of each sentence. It then identifies explicit phrases in the sentences and identifies implicit phrases using two defined extension rules. Finally, based on six defined language scene rules or combinations of them, it extracts the relations between the identified entities and generates relation triples. To address non-atomic extraction, implicit phrases in the text are expanded using dependency analysis and heuristic rules, so that long noun phrases and relation phrases can be further decomposed into meaningful, more compact phrases that better meet the atomicity requirement of information extraction. To address cross-clause extraction, coreference resolution and contextual clause decomposition are applied to compound sentences, and relations are extracted from the simplified clauses in combination with dependency analysis, which improves extraction on compound sentence patterns.

Description

Technical field

[0001] The invention belongs to the technical field of open information extraction in natural language processing, and in particular relates to an open-domain hybrid information extraction method and system.

Background

[0002] Open Information Extraction (OIE) in Natural Language Processing (NLP) is the task of transforming unstructured information expressed in natural language text into a structured representation in an unsupervised and domain-independent manner. Unlike traditional closed-domain information extraction, in which a set of relation categories is given in advance and the corpus domain is limited, open information extraction does not specify relation categories and extracts structured information from multi-domain texts. In this way, it has facilitated the development of extracting information from text and has been extended to corpora such as the Web.

[0003] The results of open information extraction ...


Application Information

IPC(8): G06F40/295, G06F40/211, G06F16/33, G06N5/02
CPC: G06F40/295, G06F40/211, G06F16/3334, G06F16/3338, G06N5/025
Inventors: 沈国华, 李锐, 黄志球, 蔡茂东, 杨思恩, 李广龙
Owner NANJING UNIV OF AERONAUTICS & ASTRONAUTICS