A Data Augmentation Algorithm for Chinese Named Entity Recognition Based on Sequence Generative Adversarial Networks

A named entity recognition and sequence generation technology, applied in the Internet field, that addresses problems such as the lack of large amounts of labeled data and the heavy manpower and time cost of collecting external resources.

Active Publication Date: 2021-04-13

BEIJING UNIV OF POSTS & TELECOMM

12 Cites, 0 Cited by


AI Technical Summary

Problems solved by technology

[0015] 1. Although modifying the structure of the deep model can enhance the semantic representation of the text, it does not solve the lack of large amounts of labeled data.

[0016] 2. Introducing external resources requires considerable manpower and time to collect them, and effective rules must be designed to incorporate them into the model.

Examples

Embodiment 1

[0061] As shown in Figures 1 and 2, the present invention provides a method for applying a data augmentation algorithm based on a sequence generative adversarial network to the named entity recognition task. Specifically, during training, the method includes:

[0062] Step 1: Process the sentences in the corpus. Divide each sentence into entity and non-entity parts according to its entity label information, and add both kinds of parts to the dictionary. For example, suppose a text sequence {c_1, c_2, c_3, c_4, c_5, c_6} has the labels {O, O, B-PER, I-PER, O, O}. Then c_1 c_2 and c_5 c_6 are classified as non-entity parts and c_3 c_4 as an entity part, and each part is added to the dictionary together with its corresponding labels.

[0063] Step 2: Using the dictionary built from the entity and non-entity parts, map the parts of each sentence to their corresponding indexes in the dictionary to form an index sequence.

[0064] Step ...
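Steps 1 and 2 above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names (`split_chunks`, `build_dictionary`, `to_index_sequence`) are hypothetical, the BIO boundary rule is an assumption inferred from the {O, O, B-PER, I-PER, O, O} example, and assigning indexes in order of first appearance is likewise an assumption.

```python
from typing import Dict, List, Sequence, Tuple

# A chunk is a text span plus the labels of its characters (hypothetical type).
Chunk = Tuple[str, Tuple[str, ...]]


def split_chunks(chars: Sequence[str], labels: Sequence[str]) -> List[Chunk]:
    """Step 1 (sketch): split a BIO-labelled sentence into entity and
    non-entity chunks, keeping each chunk's labels."""
    if len(chars) != len(labels):
        raise ValueError("chars and labels must have the same length")
    chunks: List[Chunk] = []
    start = 0
    for i in range(1, len(chars) + 1):
        # Assumed boundary rule: a chunk ends at the sentence end, where a
        # new entity begins (B-), or where "O" and entity labels meet.
        boundary = (
            i == len(chars)
            or labels[i].startswith("B-")
            or (labels[i] == "O") != (labels[i - 1] == "O")
        )
        if boundary:
            chunks.append(("".join(chars[start:i]), tuple(labels[start:i])))
            start = i
    return chunks


def build_dictionary(
    corpus: Sequence[Tuple[Sequence[str], Sequence[str]]]
) -> Dict[Chunk, int]:
    """Step 1 (sketch): add every chunk and its labels to the dictionary.
    Indexes follow first appearance, which is an assumption."""
    vocab: Dict[Chunk, int] = {}
    for chars, labels in corpus:
        for chunk in split_chunks(chars, labels):
            vocab.setdefault(chunk, len(vocab))
    return vocab


def to_index_sequence(
    chars: Sequence[str], labels: Sequence[str], vocab: Dict[Chunk, int]
) -> List[int]:
    """Step 2 (sketch): map each chunk of a sentence to its dictionary index."""
    return [vocab[chunk] for chunk in split_chunks(chars, labels)]


# The example from paragraph [0062], with placeholder characters c1..c6.
chars = ["c1", "c2", "c3", "c4", "c5", "c6"]
labels = ["O", "O", "B-PER", "I-PER", "O", "O"]
vocab = build_dictionary([(chars, labels)])
index_sequence = to_index_sequence(chars, labels, vocab)  # [0, 1, 2]
```

Keying the dictionary on both the text span and its labels keeps an entity and an identically spelled non-entity span as separate entries, which matches the instruction to store the parts "and their corresponding labels".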

Abstract

The present invention provides a method that selects positive sample data from the source-domain data to expand the training data of the target domain by fusing the semantic differences and label differences between sentences in the source domain and the target domain, thereby improving named entity recognition performance in the target domain. Building on the earlier Bi-LSTM+CRF model, the semantic and label differences are introduced through the state representation and reward setting of reinforcement learning, so that the trained decision network can select, from the source-domain data, sentences that have a positive impact on named entity recognition performance in the target domain. This expands the training data of the target domain, addresses the shortage of training data in the target domain, and improves its named entity recognition performance.

Description

Technical field

[0001] The invention relates to the technical field of the Internet, in particular to a method that uses a sequence generative adversarial network to augment data and improve the performance of Chinese named entity recognition.

Background

[0002] In recent years, deep learning has made great progress in image, speech, and natural language processing. As an emerging branch of machine learning, deep learning is motivated by building neural networks that simulate the human brain for analysis and learning. In the image field, deep neural networks are used for object detection, for example by combining convolutional neural networks with candidate windows to detect pedestrians in images; in the speech field, deep learning is used for speech synthesis and recognition, providing intelligent voice systems; in the field of natural language processing, deep learning is applied to various everyday scenarios, ...

Application Information

Patent Type & Authority: Patents (China)

IPC (8): G06F40/295, G06F40/216, G06F16/31, G06F16/35, G06F16/36, G06N3/04, G06N3/08

CPC: G06F40/295, G06F40/216, G06F16/316, G06F16/35, G06F16/36, G06N3/049, G06N3/084, G06N3/045

Inventor: 李思, 王蓬辉, 李明正, 孙忆南

Owner: BEIJING UNIV OF POSTS & TELECOMM
