Domain relation extraction-oriented tagged corpus generation method

A technology involving domain relations and corpora, applied in the field of natural language processing, that addresses an insufficient initial corpus and reduces the cost of manual labeling.

Active Publication Date: 2021-09-10
UNIV OF ELECTRONICS SCI & TECH OF CHINA
Cites: 17; Cited by: 4

AI Technical Summary

Problems solved by technology

[0005] In view of the problems of the prior art described above, the purpose of the present invention is to provide a method for generating a tagged corpus oriented to domain relation extraction. The method generates the annotated corpus required by the relation extraction task and thereby greatly reduces the cost of manual annotation.




Embodiment Construction

[0045] The specific embodiments of the present invention are described below so that those skilled in the art can understand the present invention, but it should be clear that the present invention is not limited to the scope of these specific embodiments. For those of ordinary skill in the art, various changes are obvious as long as they fall within the spirit and scope of the present invention as defined and determined by the appended claims, and all inventions and creations that use the concept of the present invention fall within the scope of protection.

[0046] This embodiment provides a method for generating a tagged corpus oriented to domain relation extraction. As shown in Figure 1, the process comprises two stages: model training and corpus generation. This embodiment targets the banking domain and illustrates the tagged corpus generation method applied to the relation extraction task in that domain. The specific steps are as follows:

...
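The abstract states that synonymous sentences are produced by back-translation and used together with the original sentences to train the sequence generation model. The Python sketch below illustrates that augmentation idea only. The `Translator` type, the helper names, the pairing of sentences in both directions, and the rule of dropping unchanged round-trips are hypothetical choices made for illustration, not details taken from the patent; a real system would plug a machine translation model into `to_pivot` and `from_pivot`.

```python
from typing import Callable, List, Tuple

# A translator maps a sentence to its translation. In practice this would wrap
# a machine translation model; here it is any str -> str function (assumption).
Translator = Callable[[str], str]


def back_translate(sentence: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Round-trip a sentence through a pivot language to obtain a paraphrase."""
    return from_pivot(to_pivot(sentence))


def build_training_pairs(
    sentences: List[str], to_pivot: Translator, from_pivot: Translator
) -> List[Tuple[str, str]]:
    """Pair each original sentence with its back-translation.

    Hypothetical pairing scheme: both (original, paraphrase) and
    (paraphrase, original) are kept, and round-trips that return the
    sentence unchanged are dropped because they add no new variation.
    """
    pairs: List[Tuple[str, str]] = []
    for sentence in sentences:
        paraphrase = back_translate(sentence, to_pivot, from_pivot)
        if paraphrase != sentence:
            pairs.append((sentence, paraphrase))
            pairs.append((paraphrase, sentence))
    return pairs
```

With toy stand-in translators `lambda s: s.replace("issues", "X")` and `lambda s: s.replace("X", "grants")`, the sentence "The bank issues loans" yields the pair ("The bank issues loans", "The bank grants loans") and its reverse, while a sentence the round-trip leaves unchanged contributes nothing.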



Abstract

The invention belongs to the technical field of natural language processing and relates to domain relation extraction. In particular, it provides a domain relation extraction-oriented tagged corpus generation method that addresses the lack of corpora and the high labor cost faced by domain relation extraction. In the method, synonymous sentences are generated by back-translation, and the synonymous sentences together with the original sentences serve as the training corpus of a sequence generation model, which solves the problem of insufficient domain corpora. Meanwhile, through dependency parsing and voice (active/passive) judgment, the entities in the training corpus are replaced with specific active-voice and passive-voice mask symbols, so that the sequence generation model directly generates a corpus carrying the tags required by the relation extraction task. Moreover, masking the entities makes the sequence generation model focus on learning the relationship between the entities, which effectively improves the accuracy of relation extraction. In summary, even when the initial corpus is lacking, the method can generate the tagged corpus required by the relation extraction task and greatly reduces the cost of manual tagging.
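The masking step described in the abstract can be sketched as follows. This Python example assumes a dependency parse is already available as tokens carrying spaCy-style English labels (`nsubj`, `nsubjpass`, `auxpass`, `dobj`, `agent`, `pobj`) and that each entity has been merged into a single token. The `Token` type, the mask strings `[SUBJ_A]`, `[OBJ_A]`, `[SUBJ_P]`, `[OBJ_P]`, and the helpers `mask_entities` and `to_labeled_sample` are hypothetical, since the patent's exact symbols and parser are not given here. The sketch shows why voice-specific masks help: in a passive clause the grammatical subject is the semantic object, so the mask type tells the converter which entity is the head of the relation.

```python
from typing import Dict, List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str
    dep: str   # dependency label, spaCy-style English scheme (assumed)
    head: int  # index of this token's syntactic head


# Hypothetical mask symbols: one pair for active voice, one for passive voice.
MASKS: Dict[str, Tuple[str, str]] = {
    "active": ("[SUBJ_A]", "[OBJ_A]"),
    "passive": ("[SUBJ_P]", "[OBJ_P]"),
}


def detect_voice(tokens: List[Token]) -> str:
    """Treat the clause as passive if it has a passive subject or passive auxiliary."""
    if any(t.dep in ("nsubjpass", "auxpass") for t in tokens):
        return "passive"
    return "active"


def mask_entities(tokens: List[Token]) -> Tuple[str, str]:
    """Replace subject/object entity tokens with voice-specific masks.

    Returns (masked_sentence, voice).
    """
    voice = detect_voice(tokens)
    subj_mask, obj_mask = MASKS[voice]
    out: List[str] = []
    for t in tokens:
        if t.dep in ("nsubj", "nsubjpass"):
            out.append(subj_mask)
        elif t.dep == "dobj" or (t.dep == "pobj" and tokens[t.head].dep == "agent"):
            # Direct object (active) or the noun inside a "by ..." agent phrase (passive).
            out.append(obj_mask)
        else:
            out.append(t.text)
    return " ".join(out), voice


def to_labeled_sample(masked: str, agent: str, patient: str, relation: str) -> Dict[str, str]:
    """Turn a generated masked sentence into a relation-extraction sample.

    Active: subject slot = agent, object slot = patient.
    Passive: subject slot = patient, object slot = agent.
    The labeled direction is agent (head) -> patient (tail) in both cases.
    """
    voice = "passive" if MASKS["passive"][0] in masked else "active"
    subj_mask, obj_mask = MASKS[voice]
    subj, obj = (agent, patient) if voice == "active" else (patient, agent)
    text = masked.replace(subj_mask, subj).replace(obj_mask, obj)
    return {"text": text, "head": agent, "relation": relation, "tail": patient}
```

For example, the parse of "The bank issues loans" masks to `The [SUBJ_A] issues [OBJ_A]`, while "Loans are issued by the bank" masks to `[SUBJ_P] are issued by the [OBJ_P]`. Treating the noun under an `agent` phrase as the object slot is one design choice; a parser using Universal Dependencies labels would expose the same roles as `nsubj:pass` and `obl:agent` instead.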

Description

Technical field [0001] The invention belongs to the technical field of natural language processing, and in particular relates to a method for generating a tagged corpus oriented to domain relation extraction. Background art [0002] A knowledge graph describes the concepts and entities of the objective world, and the relationships among them, in structured form. It expresses Internet information in a form closer to human cognition and provides a better ability to organize, manage, and understand the vast amount of information on the Internet. Different domains often need to build their own knowledge graphs, and the primary task in building a domain knowledge graph is domain knowledge extraction, i.e., the process of extracting domain-specific knowledge from different sources and different data and storing the resulting knowledge in the knowledge graph. Domain knowledge extraction can be divided into t...


Application Information

IPC(8): G06F40/117, G06F40/169, G06F40/216, G06F40/284, G06F40/289, G06F40/247, G06F40/253, G06F40/211, G06F40/30
CPC: G06F40/117, G06F40/169, G06F40/216, G06F40/284, G06F40/289, G06F40/247, G06F40/253, G06F40/211, G06F40/30
Inventors: 甘涛 (Gan Tao), 张恒 (Zhang Heng), 何艳敏 (He Yanmin), 王志阳 (Wang Zhiyang)
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA