Sequence labeling-based text relation extraction method

A relation extraction technique based on sequence labeling, applied in the field of data processing. It addresses problems such as exposure bias, poor fit to practical applications, and failure to consider the correlation between subtasks, and it improves extraction accuracy and recall.

Active Publication Date: 2021-07-30
SHANDONG COMP SCI CENTNAT SUPERCOMP CENT IN JINAN

AI Technical Summary

Problems solved by technology

[0004] Traditional methods have the following three shortcomings: 1) Some relation extraction methods treat the task as relation classification: given the entities already marked in the text, they classify the relationship between those entities. This approach is not suitable for practical application.
2) Some methods simplify the problem and only handle single-relation extraction, which does not reflect real conditions; in practice, multiple-relation extraction is the most common case.
3) The pipeline method divides relation extraction into two separate tasks, entity recognition and relation classification. It does not take into account the correlation between the two tasks, which leads to exposure bias: because the final result is produced in a fixed order, the result of a later step is affected by the result of the previous step.


Examples

Embodiment 1

[0038] The sequence labeling-based text relation extraction method of this embodiment includes the following steps:

[0039] Step 1. Preset all possible entity relation categories and establish a relation set R: for a given business scenario, determine the key knowledge in the field through expert review, or determine the important attributes of the service objects in that scenario. For example, when analyzing the attributes of medicines in the medical field, the important attributes include indications, methods of administration, adverse reactions, and precautions; from these, four or more preset medicine relations can be obtained and added to the relation set R.
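A minimal sketch of how the relation set R from Step 1 might be represented for the medicine example. The identifier names (`RELATION_SET`, `RELATION_TO_ID`, `is_known_relation`) and the English relation labels are assumptions made for illustration; the patent does not specify a data format.

```python
# Hypothetical relation set R for the medicine example in Step 1.
# The label strings are illustrative; the patent only lists the attributes.
RELATION_SET = (
    "indication",
    "administration_method",
    "adverse_reaction",
    "precaution",
)

# Map each relation to an integer id, as a classification layer typically needs.
RELATION_TO_ID = {name: idx for idx, name in enumerate(RELATION_SET)}


def is_known_relation(name: str) -> bool:
    """Return True if the relation belongs to the preset relation set R."""
    return name in RELATION_TO_ID
```

Keeping R as a fixed, ordered collection means that adding a relation later only appends a new id and leaves the existing ids unchanged.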

[0040] Step 2. Construct a training data set applicable to the business field: First, we define a complete training data set, which must contain the original sentence S and all triples contained in S whose relation belongs to the relation set R. Secondly, given a paragraph or...
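One way to picture a training sample as defined at the start of Step 2 is a sentence S paired with its triples. The sketch below is an assumption about the layout, not the patent's format: the `TrainingSample` class, the `validate` helper, the example sentence, and the rule that subject and object must appear verbatim in the sentence are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# A triple is (subject, relation, object); this ordering is an assumption.
Triple = Tuple[str, str, str]


@dataclass
class TrainingSample:
    sentence: str          # the original sentence S
    triples: List[Triple]  # triples in S whose relation is in R


def validate(sample: TrainingSample, relation_set: Iterable[str]) -> bool:
    """Check that every triple uses a preset relation and that its subject
    and object both occur in the sentence (an assumed sanity rule)."""
    allowed = set(relation_set)
    return all(
        rel in allowed and subj in sample.sentence and obj in sample.sentence
        for subj, rel, obj in sample.triples
    )


# Illustrative sample; the sentence and labels are made up for this sketch.
sample = TrainingSample(
    sentence="Aspirin may cause nausea.",
    triples=[("Aspirin", "adverse_reaction", "nausea")],
)
```

Calling `validate(sample, ["adverse_reaction"])` accepts the sample, while a triple whose relation is outside R would be rejected.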


Abstract

The invention relates to the technical field of data processing, in particular to a sequence labeling-based text relation extraction method, which comprises the following steps: constructing a training data set similar to the prediction data, and presetting all possible bidirectional entity relations and three fixed dependency relations; segmenting the input sentence into a word sequence, and inputting the word sequence into a pre-training model to obtain a representation vector for each word in the sentence; forming a unique word pair sequence from the word vector sequence in a handshake-like manner; inputting the resulting vector pair sequence into a neural network classification layer; calculating the loss and carrying out back propagation; judging the category of each word pair, that is, whether the word pair has the relation corresponding to its position; and decoding the final result according to the corresponding relation, using the pseudo-code shown in the drawing, to obtain all extracted triples. The method completes the two tasks of entity recognition and relation classification at the same time, and extraction accuracy and recall are significantly improved.
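The handshake-like word pair sequence mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: it assumes each unordered pair of positions appears exactly once, that a word is also paired with itself, and that a pair is represented by concatenating the two word vectors. The function names are hypothetical.

```python
from typing import List, Tuple


def handshake_pairs(n: int) -> List[Tuple[int, int]]:
    """Return every index pair (i, j) with 0 <= i <= j < n, row by row.

    Each unordered pair appears once, like handshakes among n people,
    plus the self-pairs (i, i). Including self-pairs is an assumption.
    """
    return [(i, j) for i in range(n) for j in range(i, n)]


def pair_vectors(vectors: List[List[float]]) -> List[List[float]]:
    """Build one vector per word pair by concatenating the two word vectors.

    Concatenation is an assumed way to combine the vectors; the abstract
    does not say how a pair is represented.
    """
    return [vectors[i] + vectors[j] for i, j in handshake_pairs(len(vectors))]
```

For a sentence of n words this yields n(n+1)/2 pairs, so a 4-word sentence produces 10 pair vectors for the classification layer to label.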

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a text relation extraction method based on sequence labeling.

Background technique

[0002] As a knowledge representation that can be stored in a structured way, triples have been widely used in the field of natural language understanding and have played an important role. In natural language text, a piece of knowledge can always be represented by one or more triples. Extracting structured triples from one or more paragraphs of text, that is, transforming the knowledge expressed in unstructured text into a form that a machine can understand or store, is the task called relation extraction.

[0003] Relation extraction methods, from early statistical methods to recent neural network methods, have been developed for many years, and relation extraction tasks have also evolved from simple single relation extraction to overlapping...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/117, G06F40/211, G06F40/242, G06K9/62, G06N3/04, G06N3/08
CPC: G06F40/117, G06F40/211, G06F40/242, G06N3/084, G06N3/045, G06F18/241, Y02D10/00
Inventor: 展一鸣, 李钊, 吴士伟, 李慧娟, 辛国茂, 陈通, 胡传会, 张超, 赵秀浩
Owner SHANDONG COMP SCI CENTNAT SUPERCOMP CENT IN JINAN