
A Distant Supervised Relation Extraction Method Based on Multi-instance Cooperative Adversarial Training

A distant supervision relation extraction technology, applied to neural learning methods, instruments, biological neural network models, etc. It addresses problems such as sentences with low attention scores not being fully exploited, information going unused by the multi-instance learning framework, and sacrificed data utilization.

Active Publication Date: 2021-01-19
ZHEJIANG UNIV
Cites: 5 · Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] Although multi-instance learning alleviates the data-noise problem to some extent, it does so at the cost of data utilization. Specifically, to obtain a more reliable bag-level representation, multi-instance learning focuses only on the sentences with high attention scores and fails to exploit the large number of sentences with low scores. In fact, the attention scores of the sentences in a bag follow a long-tailed distribution: most sentences receive relatively low scores, which means a large amount of potential information goes unused by the multi-instance learning framework.
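The bag aggregation described above can be sketched as selective attention, a common choice in multi-instance relation extraction; the tensor shapes, the toy bag, and the relation query vector below are illustrative assumptions, not values from the patent:

```python
import numpy as np

def selective_attention(sentence_reprs, relation_query):
    """Aggregate the sentence representations of one bag into a single
    bag-level representation, weighting each sentence by its attention
    score against a relation query vector.

    sentence_reprs: (n_sentences, dim) array of sentence encodings
    relation_query: (dim,) query vector for the target relation
    """
    scores = sentence_reprs @ relation_query          # (n_sentences,)
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax attention
    bag_repr = weights @ sentence_reprs               # (dim,)
    return bag_repr, weights

# Toy bag: 4 sentences with 3-dim encodings. Most of the weight lands on
# the sentences aligned with the query; the rest form the low-score
# long tail that contributes little to the bag representation.
bag = np.array([[1.0, 0.0, 0.0],
                [0.1, 0.2, 0.0],
                [0.0, 0.1, 0.1],
                [0.9, 0.1, 0.0]])
query = np.array([1.0, 0.0, 0.0])
bag_repr, weights = selective_attention(bag, query)
```

Because the softmax concentrates mass on the best-matching sentences, the low-score sentences are effectively ignored, which is exactly the under-utilization the patent targets.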



Examples


Embodiment

[0103] The steps in this embodiment are the same as steps S1-S6 described above and are not repeated here. Part of the implementation process and the results are shown below:

[0104] This embodiment uses the NYT10 dataset, a widely used benchmark in distant supervised relation extraction. The dataset was built by distant supervision, aligning New York Times text with a knowledge base: the 2005-2006 New York Times text is labeled as the training set and the 2007 text as the test set. The training set contains 522,611 sentences, 281,270 entity-relation pairs and 18,252 relational triples; correspondingly, the test set contains 172,448 sentences, 96,678 entity-relation pairs and 1,950 relational triples. In this embodiment, the hyperparameters are set as follows: the score threshold T_α is 0.1, the first neighborhood radius is 0.02, the second neighborhood radius is 10⁻⁶, the weight coefficient ...
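The hyperparameters stated in this embodiment can be collected as a config dict; the key names are my own, and the weight coefficient is truncated in the source text, so it is left unset rather than guessed:

```python
# Hyperparameters from paragraph [0104], keyed by descriptive names
# (the names themselves are assumptions, not the patent's notation).
CONFIG = {
    "score_threshold_T_alpha": 0.1,    # attention-score threshold T_α
    "first_neighborhood_radius": 0.02,
    "second_neighborhood_radius": 1e-6,
    "weight_coefficient": None,        # value truncated in the source
}
```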



Abstract

The invention discloses a distant supervised relation extraction method based on multi-instance cooperative adversarial training, addressing the low data utilization of the traditional multi-instance learning framework in the distant supervised relation extraction task. The low utilization arises because multi-instance learning frameworks tend to focus only on the high-quality sentences within a bag while ignoring the large number of potentially noisy sentences. The method combines virtual adversarial training and adversarial training to constrain, respectively, the noisy samples within the bag and the precise bag-level features, further strengthening model performance while solving the data-utilization problem. The method outperforms several mainstream related algorithms from recent years.
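The adversarial half of the cooperation can be sketched as a generic fast-gradient-method (FGM) perturbation on input embeddings, a standard formulation of adversarial training for NLP; this is not the patent's exact scheme, and the gradient values and epsilon below are illustrative:

```python
import numpy as np

def fgm_perturbation(grad, epsilon):
    """FGM-style adversarial perturbation: a step of size epsilon along
    the loss gradient w.r.t. the input embedding, L2-normalized. The
    perturbed embedding (embedding + r_adv) is then fed back through the
    model, and the loss on it is added as an adversarial regularizer.
    """
    norm = np.linalg.norm(grad)
    if norm == 0:
        return np.zeros_like(grad)   # no gradient signal, no perturbation
    return epsilon * grad / norm

# Toy example: perturb a 3-dim embedding along its (pretend) loss gradient.
grad = np.array([3.0, 4.0, 0.0])     # pretend dL/d(embedding)
r_adv = fgm_perturbation(grad, epsilon=0.1)
```

Virtual adversarial training follows the same perturbation idea but replaces the label-based loss with a divergence between the model's predictions on clean and perturbed inputs, so it needs no labels; that is why it suits the noisy in-bag samples, while label-based adversarial training is applied at the bag level.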

Description

Technical field

[0001] The invention relates to natural language processing, and in particular to a distant supervised relation extraction method based on multi-instance cooperative adversarial training.

Background technique

[0002] Natural Language Processing (NLP) is an interdisciplinary subject integrating linguistics and computer science. Relation Extraction (RE) is a key subtask of information extraction and plays a vital role in building automated knowledge bases. Its main goal is to judge the relation category between an entity pair, given the context sentences and the specified entity (Entity) pair. Non-relational entity pairs are assigned a special relation category (NA).

[0003] Traditional relation extraction models rely on a large amount of manually labeled data, and obtaining such data is extremely time-consuming and laborious. Therefore, the distant supervision meth...
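The distant-supervision labeling described above can be sketched as follows; the toy knowledge base and sentences are invented for illustration, and real systems use proper entity linking rather than exact string matching:

```python
# Distant supervision heuristic: any sentence mentioning both entities
# of a known KB triple is labeled with that triple's relation; all such
# sentences for one entity pair form a "bag". Pairs with no KB relation
# fall into the special NA class. The heuristic is noisy: the second
# sentence below mentions both entities but does not express the relation.
KB = {("Steve Jobs", "Apple"): "founder_of"}

sentences = [
    ("Steve Jobs", "Apple", "Steve Jobs co-founded Apple in 1976."),
    ("Steve Jobs", "Apple", "Steve Jobs held an Apple product on stage."),
    ("Paris", "France", "Paris is the capital of France."),
]

bags = {}
for head, tail, text in sentences:
    relation = KB.get((head, tail), "NA")   # NA for non-relational pairs
    bags.setdefault((head, tail, relation), []).append(text)
```

The noisy second sentence is exactly the kind of in-bag sample that multi-instance learning down-weights, and that the patent's virtual adversarial training seeks to exploit rather than discard.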

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F40/20, G06F40/295, G06N3/08
CPC: G06N3/08, G06F40/20, G06F40/295
Inventor: 庄越挺, 汤斯亮, 肖俊, 陈涛, 吴飞, 李晓林, 谭炽烈, 蒋韬
Owner: ZHEJIANG UNIV