Remote supervision relation extraction method based on multiple tasks and multiple examples

This relation extraction technology based on remote (distant) supervision is applied in the fields of natural language processing and information extraction. It addresses problems such as insufficient training and class imbalance. The method is simple, reduces the impact of noise, and increases the contribution of correctly labelled sentences to classification.

Active Publication Date: 2021-02-23
EAST CHINA NORMAL UNIV +1

AI Technical Summary

Problems solved by technology

PCNN effectively alleviates the vanishing-gradient problem on long-text tasks, and because convolutional neural networks can be computed in parallel, time consumption is further reduced. The commonly used graph convolutional network is applied at this stage to capture the lexical structure and semantic-level information of the sentence, which stays closer to the sentence's original form of expression. Combined with multi-task and multi-...


Examples


Embodiment 1

[0039] Referring to Figure 1, the present invention performs remote supervision relation extraction according to the following steps:

[0040] (1) Data preprocessing

[0041] Select a large-scale dataset labelled by remote supervision heuristics, and combine the sentences aligned to the same entity pair into a package (a "bag" in multi-instance learning terms). Then tokenize each sentence in the package and pre-train word vectors with the CBOW model of Word2vec. Each sentence then corresponds to a matrix of word vectors.
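The packaging step described above can be illustrated with a minimal Python sketch. It assumes each record is a hypothetical `(head, tail, relation, sentence)` tuple and that an entity pair carries a single relation label; neither the record layout nor the function name `build_bags` comes from the patent, and the sample sentences are invented for illustration.

```python
from collections import defaultdict


def build_bags(records):
    """Group remotely labelled sentences into packages keyed by entity pair.

    `records` is an iterable of (head, tail, relation, sentence) tuples.
    This layout is an assumption made for illustration only.
    """
    bags = defaultdict(lambda: {"relation": None, "sentences": []})
    for head, tail, relation, sentence in records:
        bag = bags[(head, tail)]
        # The label comes from the knowledge base alignment, not from the sentence.
        bag["relation"] = relation
        # English text is tokenized on whitespace, one word per token.
        bag["sentences"].append(sentence.split())
    return dict(bags)


# Invented sample data.
records = [
    ("Obama", "Hawaii", "place_of_birth", "Obama was born in Hawaii ."),
    ("Obama", "Hawaii", "place_of_birth", "Obama visited Hawaii last week ."),
    ("Apple", "Cupertino", "headquartered_in", "Apple is based in Cupertino ."),
]
bags = build_bags(records)
```

Note that the second sentence in the first package does not actually express the birthplace relation. This is the kind of noisy label that remote supervision produces and that package-level learning is meant to tolerate.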

[0042] 1) Word2vec word vector

[0043] Because the dataset is in English, and English words are naturally separated by spaces, each word is used as a token. Next, the CBOW model in Word2vec is used to pre-train the word vectors. Specifically, Word2vec is a pre-training method based on the bag-of-words model. For each sentence, a window of an appropriate size is selected. In each window, the CBOW model predicts other unknown words based on the word in the ce...
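The windowing idea can be sketched as follows. The paragraph above is truncated in the source, so this sketch follows the standard formulation of CBOW, in which the surrounding context words in a window are used to predict the center word. The function name `cbow_pairs`, the window size, and the sample sentence are illustrative assumptions, not details from the patent.

```python
def cbow_pairs(tokens, window=2):
    """Build (context_words, center_word) training pairs for one sentence.

    `window` is the number of words taken on each side of the center word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip one-word sentences, which have no context
            pairs.append((context, center))
    return pairs


# Invented sample sentence, tokenized on whitespace.
tokens = "Obama was born in Hawaii".split()
pairs = cbow_pairs(tokens, window=1)
```

In practice these pairs are not built by hand. A library such as gensim trains the vectors directly, and its `Word2vec` class selects CBOW when `sg=0`.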


Abstract

The invention discloses a remote supervision relation extraction method based on multiple tasks and multiple instances. Remote supervision relation extraction is carried out using a multi-task, multi-instance learning architecture, Word2vec word vector pre-training, and a multi-instance sentence-level attention mechanism. The method comprises steps including data preprocessing, input representation, abstract semantic representation, entity type representation, and multi-task multi-instance relation extraction. Compared with the prior art, the method is simple and convenient. It effectively addresses the problems of noise, insufficient training, and data class imbalance, reduces the influence of noise on classification, increases the contribution of correctly labelled sentences to classification, and has practical value for relieving the influence of noise and NA on classification.
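The abstract does not give the attention formula. As a rough sketch of how a sentence-level attention mechanism over one package of sentences can down-weight noisy sentences, the example below assumes dot-product scoring against a relation query vector followed by a softmax. The scoring function, the names `bag_representation` and `relation_query`, and the toy vectors are all assumptions for illustration, not the patent's method.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bag_representation(sentence_vecs, relation_query):
    """Return attention weights and the weighted sum of sentence vectors.

    Each sentence is scored by its dot product with the relation query
    (an assumed scoring function), so sentences that match the relation
    receive larger weights and noisy ones contribute less.
    """
    scores = [sum(a * b for a, b in zip(vec, relation_query))
              for vec in sentence_vecs]
    weights = softmax(scores)
    dim = len(relation_query)
    bag_vec = [sum(w * vec[d] for w, vec in zip(weights, sentence_vecs))
               for d in range(dim)]
    return weights, bag_vec


# Toy 2-dimensional sentence encodings for one package (invented values).
sentence_vecs = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
relation_query = [1.0, 0.0]
weights, bag_vec = bag_representation(sentence_vecs, relation_query)
```

With these toy values the first sentence, which aligns best with the query, receives the largest weight, and the third receives the smallest.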

Description

Technical field

[0001] The invention relates to the technical field of natural language processing and information extraction for knowledge graphs, in particular to a multi-task and multi-instance-based remote supervision relation extraction method.

Background technique

[0002] At present, fields including text summarization, machine translation, question answering, and recommendation all depend on structured knowledge bases constructed by information extraction. Information extraction, as a natural language processing technology within artificial intelligence, has become a necessary step in knowledge graph construction because it efficiently extracts structured knowledge from unstructured data. With the rapid development of the Internet and the popularization of mobile terminals, unstructured data such as daily chat messages, news pushes, and website logs is increasing rapidly. These unstructured data constitute a large and sma...


Application Information

IPC(8): G06F16/33, G06F16/35, G06F40/289, G06F40/30, G06N3/04, G06N3/08
CPC: G06F16/3344, G06F16/35, G06F40/289, G06F40/30, G06N3/08, G06N3/048, G06N3/045
Inventor: 高明, 王嘉宁, 蔡文渊, 徐林昊, 周傲英
Owner: EAST CHINA NORMAL UNIV