Vietnamese event entity recognition method fusing dictionary and adversarial migration

An entity recognition technique that combines a bilingual dictionary with adversarial transfer, applicable to neural learning methods, semantic analysis, natural language translation, and related fields. It addresses three problems in existing approaches: polysemy in bilingual translation is not considered, sequence features are extracted poorly, and the encoder cannot guarantee language-independent extraction and transfer. The result is improved entity recognition.

Active Publication Date: 2021-06-08
KUNMING UNIV OF SCI & TECH
11 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

[0003] Multi-task learning: all tasks share one encoding layer, and knowledge is transferred through that shared layer. However, because different languages have different sequence structures, an encoder that encodes two languages with different resource levels at the same time cannot be guaranteed to extract language-independent information; the sequence information of the high-resource language is better transferred to the label information of the high-resource language.

Word-level adversarial training for bilingual word embeddings: adversarial training is applied only to the pre-trained word vectors of the two languages, mapping them into the same semantic space. The sequence feature information of the two languages is ignored, so the sequence features of the source language cannot be fully used to assist entity recognition in the target language.

Bilingual dictionaries for bilingual word embeddings: a large-scale bilingual dictionary is used to align the word vector spaces of the source language and the target language, thereby transferring the source language annotation information to the target language space. However, manually building a large-scale bilingual dictionary is relatively difficult, and the method does not consider polysemy in bilingual translation.

Two-layer adversarial transfer: built on the BiLSTM-CRF network, it uses word-level adversarial transfer to bring the two languages into the same semantic space and sentence-level adversarial transfer to extract language-independent sequence features. However, the semantic representation of target language words is limited, and the language-independent sequence features are extracted poorly.

Examples

Embodiment 1

[0026] Embodiment 1: as shown in Figure 1, the Vietnamese event entity recognition method fusing a dictionary and adversarial transfer includes the following steps:

[0027] Step1. During word-level adversarial transfer training, the linear mapping layer and the word-level discriminator are trained against each other, each trying to confuse the other, so that the linear mapping layer is continuously optimized;

[0028] Step2. For target language sentences, extract the target language word-level features and character-level features and fuse them with the corresponding source language word-level features found through the bilingual dictionary; for source language sentences, extract and fuse the source language word-level features, the source language character-level features, and the source language word-level features obtained after the sentence passes through the optimized linear mapping layer;

[0029] Step3. During the sentence-level adversarial transfer training process, the multi-head attention feature...
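
The fusion in Step2 for a single target language token can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper names (`char_level_feature`, `mean_vector`, `fuse_target_token`), the toy character feature, the choice of concatenation as the fusion operator, and the averaging over several dictionary translations are all assumptions made here, and the vectors and dictionary entries are made-up toy data.

```python
from typing import Dict, List

Vector = List[float]


def char_level_feature(word: str) -> Vector:
    # Hypothetical stand-in for a learned character-level encoder:
    # word length and number of distinct characters.
    return [float(len(word)), float(len(set(word)))]


def mean_vector(vectors: List[Vector], dim: int) -> Vector:
    # Average a list of vectors; fall back to zeros when the list is empty.
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fuse_target_token(
    word: str,
    target_vecs: Dict[str, Vector],
    bilingual_dict: Dict[str, List[str]],
    source_vecs: Dict[str, Vector],
    dim: int,
) -> Vector:
    # Word-level feature of the target language word (zeros if unknown).
    word_feat = target_vecs.get(word, [0.0] * dim)
    # Character-level feature of the same word.
    char_feat = char_level_feature(word)
    # Source language word-level features found through the dictionary.
    # Assumption: several translations are simply averaged.
    translations = bilingual_dict.get(word, [])
    found = [source_vecs[t] for t in translations if t in source_vecs]
    dict_feat = mean_vector(found, dim)
    # Assumption: fusion is plain concatenation of the three features.
    return word_feat + char_feat + dict_feat


# Toy data: one target word with two dictionary translations.
target_vecs = {"ban": [0.5, 0.25]}
source_vecs = {"committee": [1.0, 0.0], "board": [0.0, 1.0]}
bilingual_dict = {"ban": ["committee", "board"]}

fused = fuse_target_token("ban", target_vecs, bilingual_dict, source_vecs, dim=2)
print(fused)  # [0.5, 0.25, 3.0, 3.0, 0.5, 0.5]
```

A word with no dictionary entry still gets a fused vector of the same length, with zeros in the dictionary slot, so downstream layers see a fixed input size.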

Embodiment 2

[0041] Embodiment 2: as shown in Figure 1, the specific steps of the Vietnamese event entity recognition method fusing a dictionary and adversarial transfer are as follows:

[0042] Step1. First obtain monolingual corpora for English, Chinese, and Vietnamese, and train pre-trained monolingual word vectors for each with the fastText tool. English and Chinese are used as the source languages, and Vietnamese is used as the target language. This yields the pre-trained target language word vectors and the pre-trained source language word vectors.

[0043] These hold the vector representations of the target language words and the source language words; N and M are the numbers of words contained in the word vector sets, and d_t and d_s are the dimensions of the target language word vector and the source language word vector, respectively.

[0044] Then use a linear m...
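
Step1 of Embodiment 1 describes a linear mapping layer trained against a word-level discriminator. The sketch below shows one generic adversarial update of that kind in plain Python. It is not the patent's formulation, which is cut off in this text. The assumptions are: the mapping `W` has shape d_t × d_s and carries source vectors into the target space, the discriminator is a single logistic unit with weights `u` and bias `b`, mapped source vectors carry label 1 and target vectors label 0, and one vector pair is used per step. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def apply_mapping(W: Matrix, x: Vector) -> Vector:
    # Linear mapping layer: project a d_s-dimensional source vector to d_t dims.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def discriminate(u: Vector, b: float, v: Vector) -> float:
    # Probability that v is a mapped source vector (label 1), not a target vector.
    return sigmoid(sum(u_i * v_i for u_i, v_i in zip(u, v)) + b)


def adversarial_step(
    W: Matrix, u: Vector, b: float, x: Vector, y: Vector, lr: float = 0.1
) -> Tuple[Matrix, Vector, float]:
    # 1) Discriminator update: minimise -log D(Wx) - log(1 - D(y)).
    mapped = apply_mapping(W, x)
    p_src = discriminate(u, b, mapped)
    p_tgt = discriminate(u, b, y)
    grad_u = [(p_src - 1.0) * m + p_tgt * t for m, t in zip(mapped, y)]
    grad_b = (p_src - 1.0) + p_tgt
    u = [u_i - lr * g for u_i, g in zip(u, grad_u)]
    b = b - lr * grad_b

    # 2) Mapping update: minimise -log(1 - D(Wx)) so that mapped source
    #    vectors look like target vectors to the updated discriminator.
    p_src = discriminate(u, b, mapped)
    W = [
        [w_ij - lr * p_src * u_i * x_j for w_ij, x_j in zip(row, x)]
        for row, u_i in zip(W, u)
    ]
    return W, u, b


# Toy example with d_s = 3 and d_t = 2.
W0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
u0, b0 = [0.1, -0.2], 0.0
x = [1.0, 2.0, 3.0]   # one source language word vector
y = [0.5, -0.5]       # one target language word vector
W1, u1, b1 = adversarial_step(W0, u0, b0, x, y)
```

After the step, the updated discriminator scores the old mapping `W0` as more source-like than before, while the updated mapping `W1` lowers that score again; repeating the two updates is what drives the mapping toward a shared space.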

Abstract

The invention relates to a Vietnamese event entity recognition method fusing a dictionary and adversarial transfer. Vietnamese is the target language, and English and Chinese each serve as a source language; entity recognition in the target language is improved by using the entity annotation information of the source languages together with a bilingual dictionary. First, word-level adversarial transfer places the source language and the target language in a shared semantic space. Next, multi-granularity feature embedding that fuses a bilingual dictionary enriches the semantic representation of target language words. Then, sentence-level adversarial transfer extracts language-independent sequence features. Finally, a CRF labels the entity recognition result. Experimental results on a Vietnamese news data set show that, with English and Chinese as source languages, the proposed model improves entity recognition over both a monolingual entity recognition model and current mainstream transfer learning models; compared with the monolingual entity recognition model, the F1 values increase by 19.61 and 18.73, respectively.
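
The final CRF labeling step can be illustrated with Viterbi decoding, the standard way to pick the highest-scoring label sequence from per-token scores and label-to-label transition scores. This is a sketch under stated assumptions, not the patent's code: the label set, the emission scores, and the transition penalty are hand-set toy values, whereas in a trained model they would come from the encoder and from learned CRF parameters. The function name `viterbi_decode` is hypothetical.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    # best[i][lab]: best score of any label path ending in `lab` at token i.
    best = [{lab: emissions[0][lab] for lab in labels}]
    back: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in labels:
            # Best previous label for `cur`; missing transitions score 0.
            prev = max(
                labels,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emit[cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


labels = ["O", "B-LOC", "I-LOC"]
# Toy constraint: an entity may not continue (I-LOC) right after O.
transitions = {("O", "I-LOC"): -10.0}
emissions = [
    {"O": 1.0, "B-LOC": 0.8, "I-LOC": 0.0},
    {"O": 0.2, "B-LOC": 0.1, "I-LOC": 1.5},
    {"O": 1.0, "B-LOC": 0.0, "I-LOC": 0.1},
]
print(viterbi_decode(emissions, transitions, labels))  # ['B-LOC', 'I-LOC', 'O']
```

Picking the top label per token independently would give O, I-LOC, O here, which breaks the toy constraint; decoding over the whole sequence trades a slightly lower first-token score for a consistent entity span.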

Description

Technical field

[0001] The invention relates to a Vietnamese event entity recognition method that fuses a dictionary with adversarial transfer, and belongs to the technical field of natural language processing.

Background technique

[0002] The goal of Vietnamese event entity recognition is to identify entities in Vietnamese news texts and assign them specific types of labels, such as names of people, places, organizations, and specific political concepts. At present, most event entity recognition systems use BiLSTM-CRF, which combines a bidirectional long short-term memory (BiLSTM) network with a conditional random field (CRF), but this approach needs a large annotated training corpus to perform well and achieves very low performance on low-resource language entity recognition tasks. At present, the method with good effect on the task of entity recognition of low-resource language events is to use the idea of transfer learning, th...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06F40/216, G06F40/242, G06F40/47, G06F40/30, G06N3/04, G06N3/08, G06F16/35
CPC: G06F40/295, G06F40/216, G06F40/242, G06F40/47, G06F40/30, G06N3/08, G06F16/35, G06N3/045
Inventor: 余正涛, 薛振宇, 线岩团, 相艳, 王红斌
Owner: KUNMING UNIV OF SCI & TECH