Semantic and label difference fused named entity identification field self-adaption method

A technology for named entity recognition and labeling, applied in the Internet field, that addresses three problems: large differences between sentence semantic vectors are not fully considered, the influence of label differences is not considered, and label sets differ between domains.

Active Publication Date: 2020-02-07
BEIJING UNIV OF POSTS & TELECOMM
7 Cites, 21 Cited by

AI Technical Summary

Problems solved by technology

[0007] Domain transfer for Chinese named entity recognition faces two problems: the semantics of sentences in the corpora differ greatly, and the label sets of sentences in the corpora differ because of different labeling rules.
[0016] 1. The traditional approach judges whether a source domain sentence benefits the training of the target domain named entity recognition model based on the semantic similarity between the source domain and the target domain, and does not take into account the impact of entity label differences.
[0017] 2. When the label transfer relationship between the source domain and the target domain is used for transfer, the case where the semantic vectors of sentences in the source domain and the target domain differ too much is not fully considered.

Examples

Example 1

[0069] As shown in Figures 1 and 2, the present invention provides a method that fuses semantic and label differences to perform domain transfer on named entity recognition tasks. Specifically, during training, the method includes:

[0070] Step 1: Preprocess the sentences in the source domain and target domain corpora: remove URLs and special symbols, and convert traditional Chinese characters to simplified ones so that all sentences in the corpora are in simplified Chinese.
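
The preprocessing in Step 1 could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `preprocess`, the regular expressions, and the tiny `T2S` table are hypothetical, the source does not say which characters count as special symbols, and a real system would use a complete traditional-to-simplified dictionary rather than five hand-picked characters.

```python
import re

# Illustrative traditional-to-simplified table (assumption: a real system
# would use a complete conversion dictionary, not these few characters).
T2S = {"這": "这", "們": "们", "學": "学", "機": "机", "場": "场"}

# A URL is taken to be a scheme or "www." followed by printable ASCII, so
# Chinese text written right after a URL is not swallowed.
URL_RE = re.compile(r"(?:https?://|www\.)[\x21-\x7e]+")

# Assumption: everything except CJK ideographs, ASCII letters and digits,
# spaces, and common punctuation counts as a "special symbol".
SPECIAL_RE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9，。！？、：；,.!?:; ]")


def preprocess(sentence: str) -> str:
    """Remove URLs and special symbols, then convert to simplified Chinese."""
    text = URL_RE.sub("", sentence)
    text = SPECIAL_RE.sub("", text)
    text = "".join(T2S.get(ch, ch) for ch in text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


cleaned = preprocess("我們在學校 http://t.cn/abc #機場#")
```

Removing URLs before special symbols matters here: stripping symbols such as `/` and `#` first would leave URL fragments that no longer match the URL pattern.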

[0071] Step 2: Process the labels of the sentences in the source domain corpus to unify the entity label sets of the source domain and the target domain. Specifically, the PER tag in the source domain is changed to PER.NAM, the LOC tag is changed to LOC.NAM, and the ORG tag is changed to ORG.NAM, while the O tag remains unchanged.
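
The label unification in Step 2 amounts to a fixed mapping over the source domain tags, sketched below. The mapping itself comes from the text; the handling of a `B-`/`I-` prefix is an assumption, since the source does not state which tagging scheme the corpora use, and the names `LABEL_MAP`, `unify_label`, and `unify_labels` are hypothetical.

```python
# Mapping stated in Step 2: source domain tags on the left, unified tags on the right.
LABEL_MAP = {"PER": "PER.NAM", "LOC": "LOC.NAM", "ORG": "ORG.NAM"}


def unify_label(tag: str) -> str:
    """Rewrite one source domain tag; the O tag is left unchanged."""
    if tag == "O":
        return tag
    # Assumption: tags may carry a position prefix such as "B-" or "I-".
    prefix, sep, entity = tag.rpartition("-")
    return f"{prefix}{sep}{LABEL_MAP.get(entity, entity)}"


def unify_labels(tags):
    """Apply the mapping to every tag of a sentence."""
    return [unify_label(tag) for tag in tags]


unified = unify_labels(["B-PER", "I-PER", "O", "B-LOC", "ORG"])
```

Tags that are not in the mapping pass through unchanged, so labels already in the target domain's form are not altered.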

[0072] Step 3: The sentences in the source domain and the sentences in the target domain are mapped into vector representations according to the same dictionary, and are u...
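
Because the text of Step 3 is cut off, the following is only a guess at what mapping both domains "according to the same dictionary" could look like: a single character-level vocabulary built from both corpora, so that a given character receives the same index in either domain. The function names, the `<PAD>` and `<UNK>` entries, and the choice of character-level units are all assumptions.

```python
def build_vocab(*corpora):
    """Build one shared character-to-index dictionary from several corpora."""
    vocab = {"<PAD>": 0, "<UNK>": 1}  # assumed special entries
    for corpus in corpora:
        for sentence in corpus:
            for ch in sentence:
                # The first occurrence of a character fixes its index.
                vocab.setdefault(ch, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Map a sentence to a list of indices; unseen characters map to <UNK>."""
    return [vocab.get(ch, vocab["<UNK>"]) for ch in sentence]


source_corpus = ["北京大学"]
target_corpus = ["北京很大"]
vocab = build_vocab(source_corpus, target_corpus)
source_ids = encode(source_corpus[0], vocab)
target_ids = encode(target_corpus[0], vocab)
```

The index sequences would typically then be looked up in a shared embedding table, which is what makes sentence vectors from the two domains comparable.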

Abstract

The invention provides a method for selecting positive samples from source domain data to extend the training data of a target domain by fusing the semantic differences and label differences between sentences in the source domain and the target domain, so as to enhance named entity recognition performance in the target domain. Building on a conventional Bi-LSTM+CRF model, the method introduces semantic difference and label difference through the state representation and reward setting in reinforcement learning. The trained decision network can therefore select sentences in the source domain data that have a positive influence on the named entity recognition performance of the target domain, expand the training data of the target domain, solve the problem of insufficient target domain training data, and improve the named entity recognition performance of the target domain at the same time.
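
One way to picture the decision network described in the abstract is a policy that sees a state containing the two differences and decides whether to keep a source domain sentence. The sketch below is not the patented network: it swaps in a single logistic unit trained with a plain REINFORCE update, omits the Bi-LSTM+CRF tagger entirely, and treats the semantic difference, label difference, and reward as numbers computed elsewhere. All names and values are hypothetical.

```python
import math
import random


def keep_probability(weights, state):
    """Logistic policy: probability of keeping the sentence in this state."""
    z = sum(w * x for w, x in zip(weights, state))
    return 1.0 / (1.0 + math.exp(-z))


def select(weights, state, rng):
    """Sample the action: 1 keeps the source sentence, 0 discards it."""
    prob = keep_probability(weights, state)
    action = 1 if rng.random() < prob else 0
    return action, prob


def reinforce_update(weights, state, action, prob, reward, lr=0.1):
    """One REINFORCE step; (action - prob) * x is the gradient of the
    log-probability of the sampled action for a logistic policy."""
    return [w + lr * reward * (action - prob) * x
            for w, x in zip(weights, state)]


# Assumed state layout: [semantic difference, label difference, bias term].
state = [0.2, 0.1, 1.0]
weights = [0.0, 0.0, 0.0]
rng = random.Random(0)

action, prob = select(weights, state, rng)
# Assumed reward: for example, the change in target domain F1 after
# training with the selected sentences; here it is just a placeholder.
weights = reinforce_update(weights, state, action, prob, reward=1.0)
```

With a positive reward, the update raises the probability of repeating the sampled action in similar states, which is how a selector of this kind would come to favor source sentences that help the target domain.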

Description

Technical field

[0001] The invention relates to the field of Internet technology, in particular to a method that fuses semantic differences and label differences between domains to perform domain transfer on named entity recognition tasks.

Background technique

[0002] In recent years, deep learning and machine learning have made great progress in computer vision and natural language processing. In computer vision, deep neural networks are used to classify images, for example convolutional neural networks that recognize handwritten digits, where accuracy exceeds that of human recognition. In natural language processing, deep learning is more widely applied in everyday scenarios, such as using neural networks to analyze users' browsing records and consumption behaviors, recommending products that users may like, and using a large number of parallel corpora to train translation systems, so that machines can achieve high-level translation c...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06F40/30, G06N3/08, G06N3/04
CPC: G06N3/084, G06N3/044, G06N3/045
Inventor: 李思, 王蓬辉, 徐雅静, 李明正, 孙忆南
Owner: BEIJING UNIV OF POSTS & TELECOMM