Information extraction method based on deep semantic comprehension

An information extraction and semantic understanding technology, applied in semantic analysis, natural language data processing, electronic digital data processing, and related fields. It addresses the low precision, low recall, and poor generality of long-sentence extraction, improving recall and precision while reducing workload.

Pending Publication Date: 2020-03-17
鼎复数据科技(北京)有限公司

AI Technical Summary

Problems solved by technology

[0005] 1. Supervised methods require a large amount of manually labeled data, and a model trained for one field cannot be used directly in other fields without substantial further labeling, so generality is poor;
[0006] 2. The recall of results obtained with semi-supervised methods is too low, and a large amount of sample data is usually required for model optimizat...


Examples


Embodiment 1

[0130] Step 1: Convert the collected domain information into text format and store it in the corpus, and use the domain information to construct the ontology and its basic relationships.

[0131] When the ontology is executive resignation, the basic relationships constructed are: name, gender, resignation time, resignation reason, and resignation position.
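As an illustration only, the ontology and its basic relationships from [0131] could be held in a small data structure. The sketch below is not taken from the patent: the class name `Ontology` and the helper `empty_record` are hypothetical, and only the relationship names come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Ontology:
    """Hypothetical container for an ontology and its basic relationships."""
    name: str
    relations: List[str] = field(default_factory=list)


# Relationship names are the ones listed in paragraph [0131].
executive_resignation = Ontology(
    name="executive resignation",
    relations=[
        "name",
        "gender",
        "resignation time",
        "resignation reason",
        "resignation position",
    ],
)


def empty_record(ontology: Ontology) -> Dict[str, Optional[str]]:
    """Return one unfilled extraction record: every relationship maps to None."""
    return {relation: None for relation in ontology.relations}


record = empty_record(executive_resignation)
```

Each extracted event would then fill one such record, leaving a slot as `None` when the text does not state it.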

[0132] Use an annotation tool to manually annotate part of the corpus, recording each annotated span and its exact position in the text. The manually annotated corpus is as follows (the underlined parts indicate the annotated content):

[0133] Recently, on September 25, 2008, the board of directors of Shenzhen *** Information Technology Co., Ltd. (hereinafter referred to as "the company") received written resignation reports from director Mr. Cui ** and director Mr. He **. For reasons of age, Mr. Cui ** and Mr. He ** separately applied to resign from the company's positions of vice president and general manage...
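A minimal sketch of how an annotation and its position in the text, as described in [0132], might be recorded. The `Annotation` class, the `annotate` helper, and the shortened example sentence are all hypothetical, and the mapping of the marked spans to "resignation reason" and "resignation position" is an assumption, since the underlining did not survive in the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record of one manually marked span."""
    relation: str  # which basic relationship the span fills
    text: str      # the marked content
    start: int     # character offset where the span begins
    end: int       # character offset just past the span


def annotate(sentence: str, relation: str, marked_text: str) -> Annotation:
    """Locate the first occurrence of marked_text and record its offsets."""
    start = sentence.find(marked_text)
    if start == -1:
        raise ValueError(f"{marked_text!r} does not occur in the sentence")
    return Annotation(relation, marked_text, start, start + len(marked_text))


# Shortened, hypothetical sentence modelled on the example in [0133].
sentence = (
    "For reasons of age, Mr. Cui ** applied to resign "
    "from the position of vice president."
)
annotations = [
    annotate(sentence, "resignation reason", "age"),
    annotate(sentence, "resignation position", "vice president"),
]
```

Storing offsets rather than only the marked string lets the same word be annotated differently at different places in a sentence.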


Abstract

The invention provides an information extraction method based on deep semantic comprehension, comprising the following steps: constructing the ontology and basic relationships of a field and manually annotating part of the corpus; processing the manually annotated corpus, identifying the entity types that correspond to specific relationships, and mining new words and synonyms of the field at the same time; merging the synonyms recognized in the sentences, abstracting the original sentences, and performing syntactic analysis; clustering the abstracted sentences into sentence templates and performing template learning; evaluating the sentence templates; and using the sentence templates to extract new relationships from the corpus that was not manually annotated, then evaluating and filtering the new relationships. The method makes better use of syntactic analysis results, so that the automatically mined templates have higher-level abstraction and generalization capabilities.
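The abstraction and template-matching idea in the abstract can be sketched in a heavily simplified form. The code below is an assumption-laden illustration, not the patented method: it skips syntactic analysis, clustering, and template evaluation, and substitutes plain regular-expression matching for the learned sentence templates. The synonym table, function names, and example sentences are all hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical synonym table; the patent mines synonyms from the corpus.
SYNONYMS = {"step down": "resign", "quit": "resign"}


def merge_synonyms(sentence: str) -> str:
    """Rewrite each known variant to its canonical form."""
    for variant, canonical in SYNONYMS.items():
        sentence = sentence.replace(variant, canonical)
    return sentence


def abstract(sentence: str, annotations: Dict[str, str]) -> str:
    """Replace each annotated span with a <relation> slot."""
    for relation, text in annotations.items():
        sentence = sentence.replace(text, f"<{relation}>")
    return sentence


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Turn slots into named groups and escape the literal text between them."""
    pattern = ""
    for part in re.split(r"(<[^>]+>)", template):
        if part.startswith("<") and part.endswith(">"):
            group = part[1:-1].replace(" ", "_")
            pattern += f"(?P<{group}>.+?)"
        else:
            pattern += re.escape(part)
    return re.compile("^" + pattern + "$")


def extract(template: str, sentence: str) -> Optional[Dict[str, str]]:
    """Apply a template to an unlabeled sentence; None if it does not fit."""
    match = template_to_regex(template).match(merge_synonyms(sentence))
    return match.groupdict() if match else None


labeled = "Mr. Cui applied to resign as vice president for reasons of age."
marks = {
    "name": "Mr. Cui",
    "resignation position": "vice president",
    "resignation reason": "age",
}
template = abstract(merge_synonyms(labeled), marks)

unlabeled = "Ms. Li applied to step down as general manager for reasons of health."
result = extract(template, unlabeled)
```

Here one labeled sentence yields one template, which then fills the same slots in an unlabeled sentence that uses a synonym. In the method described, templates are instead obtained by clustering many abstracted sentences and are evaluated before use.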

Description

Technical field

[0001] The invention relates to an information extraction method, in particular to an information extraction method based on deep semantic understanding.

Background

[0002] With the spread of digital and Internet technology, text information has grown explosively. How to organize text information sensibly and find important information quickly and conveniently has become an urgent problem.

[0003] One common way to organize text information is the structured chart, and information extraction is a common method for bringing structured, semi-structured, and unstructured data into such a structured form.

[0004] Among existing information extraction methods, many attempts have been made on structured and semi-structured data, with good results. For unstructured data, however, current extraction methods have the following problems:

[0005] 1. Supervised methods require a large amount of manual labeling ...


Application Information

IPC(8): G06F40/211, G06F40/295, G06F40/30, G06F40/247, G06F16/35
Inventor: 徐祯琦, 李超, 吴雪军
Owner: 鼎复数据科技(北京)有限公司