Vietnamese Event Entity Recognition Method Fusing a Dictionary and Adversarial Transfer

An entity recognition and dictionary technology, applied in neural learning methods, semantic analysis, natural language translation, etc. It addresses problems such as polysemy of words in bilingual translation, poor extraction of sequence features, and the single semantic representation of target language words, and thereby improves entity recognition performance.

Active Publication Date: 2022-07-29
KUNMING UNIV OF SCI & TECH
7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] In multi-task learning, all tasks share an encoding layer, and knowledge is transferred through that shared layer. However, because different languages have different sequence structures, an encoder that encodes two languages with different resource levels at the same time cannot be guaranteed to extract language-independent information, so the sequence information and label information of the high-resource language cannot be transferred well. Word-level adversarial training realizes a bilingual word embedding representation, but it only performs adversarial training on the pre-trained word vectors of the two languages to map them into the same semantic space; it ignores the sequence feature information of the two languages, so the sequence features of the source language cannot be fully used to assist entity recognition in the target language. Bilingual-dictionary methods realize bilingual word embedding representations by using a large-scale bilingual dictionary to align the word vector spaces of the source language and the target language, thereby transferring the source language annotation information into the target language space; however, manually constructing a large-scale bilingual dictionary is relatively difficult, and this approach does not consider polysemy in bilingual translation. Two-layer adversarial transfer is based on the BiLSTM-CRF network: it uses word-level adversarial transfer to bring the two languages into the same semantic space and sentence-level adversarial transfer to extract language-independent sequence features, but the semantic representation of target language words is single and the language-independent sequence features are extracted poorly.



Examples


Embodiment 1

[0026] Embodiment 1: as shown in Figure 1, a Vietnamese event entity recognition method integrating a dictionary and adversarial transfer includes the following steps:

[0027] Step 1. During word-level adversarial transfer training, the linear mapping layer and the word-level discriminator are trained against each other, each trying to confuse the other, so that the linear mapping layer is continuously optimized;

[0028] Step 2. For target language sentences, extract and fuse the target language word-level features, the target language character-level features, and the corresponding source language word-level features found through the bilingual dictionary. For source language sentences, extract and fuse the source language word-level features, the source language character-level features, and the source language word-level features obtained after the sentence has passed through the optimized linear mapping layer;

[0029] Step 3. In the process of sentence-level adversarial transfer training, the multi-head attenti...
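
The target-language fusion in Step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the function names, the toy vectors, the use of concatenation to fuse the three features, and the averaging of the source vectors of all dictionary translations are hypothetical choices made here for clarity.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector], dim: int) -> Vector:
    """Average a list of vectors; return a zero vector when the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fuse_target_token(
    word: str,
    word_vecs: Dict[str, Vector],           # target language word embeddings (toy)
    char_vecs: Dict[str, Vector],           # target language character embeddings (toy)
    bilingual_dict: Dict[str, List[str]],   # target word -> source translations
    source_vecs: Dict[str, Vector],         # source language word embeddings (toy)
    dim: int,
) -> Vector:
    """Concatenate word-level, character-level, and dictionary-based features."""
    word_feat = word_vecs.get(word, [0.0] * dim)
    char_feat = mean_vector([char_vecs[c] for c in word if c in char_vecs], dim)
    translations = bilingual_dict.get(word, [])
    # Assumption: several translations (polysemy) are combined by a plain average.
    dict_feat = mean_vector(
        [source_vecs[t] for t in translations if t in source_vecs], dim
    )
    return word_feat + char_feat + dict_feat  # list concatenation


# Toy data, invented for illustration only.
word_vecs = {"sông": [1.0, 0.0]}
char_vecs = {"s": [0.2, 0.0], "g": [0.0, 0.4]}
bilingual_dict = {"sông": ["river", "stream"]}
source_vecs = {"river": [1.0, 1.0], "stream": [0.0, 1.0]}

fused = fuse_target_token("sông", word_vecs, char_vecs, bilingual_dict, source_vecs, dim=2)
unknown = fuse_target_token("xyz", word_vecs, char_vecs, bilingual_dict, source_vecs, dim=2)
print(len(fused), len(unknown))
```

A word missing from the dictionary simply contributes a zero block, so every token still gets a vector of the same length.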

Embodiment 2

[0041] Embodiment 2: as shown in Figure 1, the specific steps of the Vietnamese event entity recognition method integrating a dictionary and adversarial transfer are as follows:

[0042] Step 1. First, obtain monolingual corpora of English, Chinese, and Vietnamese, and train pre-trained monolingual word vectors for each with the fastText tool. English and Chinese are each used as the source language, with Vietnamese as the target language. This yields the pre-trained target language word vectors and the pre-trained source language word vectors.

[0043] Here the entries are the vector representations of the target language words and the source language words, N and M are the numbers of words contained in the word vectors, and d_t and d_s are the dimensions of the target language word vectors and the source language word vectors, respectively.

[0044] Then use a linear mapping fun...
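
The interplay between a linear mapping and a word-level discriminator can be illustrated with a small pure-Python sketch. It is a hedged illustration only: the logistic-regression discriminator, the cross-entropy losses, the learning rate, and all names (`update_mapping`, `update_discriminator`, `disc_score`) are assumptions made here, since the source text is truncated before it specifies them. The mapping matrix `W` has shape d_t x d_s and maps a source vector into the target space.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(W: Matrix, x: Vector) -> Vector:
    """Apply the linear mapping W to vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def disc_score(disc_w: Vector, disc_b: float, x: Vector) -> float:
    """Probability that x is a genuine target language vector."""
    return sigmoid(sum(w * v for w, v in zip(disc_w, x)) + disc_b)


def update_discriminator(disc_w: Vector, disc_b: float, x: Vector,
                         label: float, lr: float) -> float:
    """One cross-entropy gradient step; updates disc_w in place, returns new bias."""
    err = disc_score(disc_w, disc_b, x) - label
    for i in range(len(disc_w)):
        disc_w[i] -= lr * err * x[i]
    return disc_b - lr * err


def update_mapping(W: Matrix, disc_w: Vector, disc_b: float,
                   src: Vector, lr: float) -> None:
    """One gradient step that pushes D(W src) toward 1 to confuse the discriminator."""
    err = disc_score(disc_w, disc_b, matvec(W, src)) - 1.0
    for i, row in enumerate(W):
        for j in range(len(row)):
            row[j] -= lr * err * disc_w[i] * src[j]


# Toy setup, invented for illustration: d_s = 3, d_t = 2.
random.seed(0)
d_s, d_t = 3, 2
W = [[random.uniform(-0.1, 0.1) for _ in range(d_s)] for _ in range(d_t)]
disc_w, disc_b = [0.5, -0.3], 0.0
src, tgt = [1.0, 0.5, -0.5], [0.2, 0.8]

before = disc_score(disc_w, disc_b, matvec(W, src))
update_mapping(W, disc_w, disc_b, src, lr=0.1)
after = disc_score(disc_w, disc_b, matvec(W, src))

# The discriminator then learns: target vectors are 1, mapped source vectors are 0.
disc_b = update_discriminator(disc_w, disc_b, tgt, 1.0, lr=0.1)
disc_b = update_discriminator(disc_w, disc_b, matvec(W, src), 0.0, lr=0.1)
print(after > before)
```

Alternating the two updates is what makes the training adversarial: the mapping is rewarded when mapped source vectors look like target vectors, while the discriminator is rewarded for telling them apart.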



Abstract

The present invention relates to a Vietnamese event entity recognition method integrating a dictionary and adversarial transfer. The invention takes Vietnamese as the target language and English and Chinese, respectively, as the source languages, and uses the entity annotation information of the source language together with a bilingual dictionary to improve entity recognition in the target language. The method first uses word-level adversarial transfer to share a semantic space between the source language and the target language, then integrates the bilingual dictionary for multi-granularity feature embedding to enrich the semantic representation of target language words, then uses sentence-level adversarial transfer to extract language-independent sequence features, and finally labels the entity recognition results with a CRF. Experimental results on a Vietnamese news dataset show that, with English and Chinese as source languages, the proposed model improves entity recognition performance over both the monolingual entity recognition model and current mainstream transfer learning models. Compared with the monolingual entity recognition model, the F1 value increases by 19.61 and 18.73, respectively.
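
The final CRF labeling step can be illustrated with Viterbi decoding, the standard way to pick the best tag sequence in a linear-chain CRF. The sketch below is an assumption-laden illustration: the tag set, the emission and transition scores, and the function name `viterbi_decode` are invented here, and the text above does not state which decoding procedure the invention uses.

```python
from typing import List


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backptr: List[List[int]] = []
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)
    # Trace the best path backwards from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backptr):
        best = ptr[best]
        path.append(best)
    path.reverse()
    return path


# Toy scores, invented for illustration only.
tags = ["O", "B-LOC", "I-LOC"]
transitions = [
    [0.0, 0.0, -10.0],  # O -> I-LOC is strongly penalized
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
emissions = [
    [1.0, 0.9, 0.0],
    [0.0, 0.0, 2.0],
    [1.5, 0.0, 0.0],
]
decoded = [tags[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)  # ['B-LOC', 'I-LOC', 'O']
```

In this toy case a greedy per-token choice would pick "O" for the first token, but the transition penalty makes the decoder prefer "B-LOC" so that the following "I-LOC" is valid.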

Description

Technical field

[0001] The invention relates to a Vietnamese event entity recognition method integrating a dictionary and adversarial transfer, and belongs to the technical field of natural language processing.

Background technique

[0002] The goal of Vietnamese event entity recognition is to identify entities in Vietnamese news texts, such as names of people, places, organizations, and specific political concepts, and to assign them labels of specific types. At present, most event entity recognition systems use BiLSTM-CRF, a combination of a bidirectional long short-term memory (BiLSTM) network and a conditional random field (CRF), for entity recognition, but its performance is very low on entity recognition tasks for low-resource languages. The methods that currently work better on low-resource language event entity recognition use the idea of transfer learning, that is, they use the annotation information of high-resource languages to improve the labeling effect o...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/295, G06F40/216, G06F40/242, G06F40/47, G06F40/30, G06N3/04, G06N3/08, G06F16/35
CPC: G06F40/295, G06F40/216, G06F40/242, G06F40/47, G06F40/30, G06N3/08, G06F16/35, G06N3/045
Inventor: 余正涛, 薛振宇, 线岩团, 相艳, 王红斌
Owner: KUNMING UNIV OF SCI & TECH