Dependency syntactic analysis method fusing multi-strategy data enhancement under low-resource condition

A technology for dependency syntactic analysis, applied in neural learning methods, electrical digital data processing, natural language data processing, etc. It improves generalization ability and alleviates overfitting.

Active Publication Date: 2022-01-07
KUNMING UNIV OF SCI & TECH
Cites: 6; Cited by: 2

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a dependency syntactic analysis method fusing multi-strategy data enhancement under low-resource conditions. It targets dependency syntactic analysis in low-resource settings such as Thai, Vietnamese, and small-scale English, where problems such as an excessively high proportion of unknown words and model overfitting lead to poor parsing results.

Examples

Embodiment 1

[0050] Embodiment 1: As shown in Figure 1, Figure 2 and Figure 3, the specific steps of the dependency syntactic analysis method fusing multi-strategy data enhancement under low-resource conditions are as follows:

[0051] Step 1. Process the dependency syntactic analysis data obtained from the Thai, Vietnamese, and small-scale English corpora of the UD dataset, obtain synonym information for the words of the three languages from the BabelNet website, and construct a synonym dictionary from this information.

[0052] Step 1.1. Count the words in the Thai, Vietnamese, and English training data, and collect the corresponding synonym information for these words from the BabelNet website, including the synonyms and their parts of speech.
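Step 1.1 starts from the vocabulary of the training data. Below is a minimal sketch of the counting part only, assuming the UD training files are read in the standard CoNLL-U format (word ID and FORM in the first two tab-separated columns). The function name `count_training_words` is hypothetical, and the BabelNet lookup is not shown because the source does not say how BabelNet was queried.

```python
from collections import Counter


def count_training_words(conllu_lines):
    """Count word forms (column 2, FORM) in CoNLL-U formatted training data."""
    counts = Counter()
    for line in conllu_lines:
        line = line.strip()
        # Skip blank sentence separators and comment lines such as "# text = ..."
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("8.1")
        if "-" in cols[0] or "." in cols[0]:
            continue
        counts[cols[1]] += 1
    return counts


# Tiny hypothetical CoNLL-U sentence (10 columns per token line)
sample = """# text = I read books
1\tI\tI\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tread\tread\tVERB\t_\t_\t0\troot\t_\t_
3\tbooks\tbook\tNOUN\t_\t_\t2\tobj\t_\t_
""".splitlines()

vocab = count_training_words(sample)
print(sorted(vocab))  # ['I', 'books', 'read']
```

Each distinct form in the resulting counter would then be looked up on BabelNet to collect its synonyms and parts of speech.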

[0053] Step 1.2. Filter the synonym information as follows: (1) Classify the obtained synonyms by part of speech, so as to ensure that when the subsequent replacem...
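Filtering criterion (1) of Step 1.2 can be illustrated by keying each synonym list on the pair (word, part of speech). This is a hedged sketch: it assumes the BabelNet parts of speech have already been mapped to UD UPOS tags, the entries and the name `build_pos_synonym_dict` are hypothetical, and the remaining filtering criteria (cut off in the source) are not reconstructed.

```python
from collections import defaultdict


def build_pos_synonym_dict(entries):
    """Group (word, synonym, pos) entries into a dict keyed by (word, pos).

    Because the key includes the part of speech, a later replacement step can
    only swap a word for a synonym that shares that part of speech.
    """
    table = defaultdict(set)
    for word, synonym, pos in entries:
        table[(word, pos)].add(synonym)
    # Sorted lists give a deterministic order for downstream sampling
    return {key: sorted(syns) for key, syns in table.items()}


# Hypothetical entries, assumed to be already tagged with UD UPOS labels
raw = [
    ("big", "large", "ADJ"),
    ("big", "huge", "ADJ"),
    ("book", "reserve", "VERB"),
    ("book", "volume", "NOUN"),
]
synonyms = build_pos_synonym_dict(raw)
print(synonyms[("book", "NOUN")])  # ['volume']
print(synonyms[("big", "ADJ")])    # ['huge', 'large']
```

With this keying, the noun "book" can only be replaced by "volume", never by the verb sense "reserve".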

Abstract

The invention relates to a dependency syntactic analysis method fusing multi-strategy data enhancement under low-resource conditions, and belongs to the field of natural language processing. The method comprises the following steps: constructing a homomorphic synonym dictionary for Thai, Vietnamese, and English; performing synonym replacement on the small-scale UD (Universal Dependencies treebanks) datasets of the three languages using the synonym dictionary, so as to expand the training data; and, at different stages of model training, applying several mixup data enhancement strategies to mix original words with their synonyms in the training data, generating virtual new words for subsequent training. The method thus provides multiple data enhancement strategies for low-resource dependency syntactic analysis. Synonym replacement effectively expands the training data and alleviates the unknown-word problem, while the mixup strategies effectively alleviate model overfitting and improve the generalization ability of the model.
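The two augmentation ideas in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation. Synonym replacement is treated as a one-for-one swap of word forms using a (word, UPOS)-keyed dictionary, so the original head indices stay valid. The mixup step uses standard mixup, interpolating a word embedding with its synonym's embedding using λ ~ Beta(α, α). The replacement rate, α, the placement at the embedding level, and all function names are hypothetical. The source does not say at which layers or training stages its several mixup strategies are applied.

```python
import random


def synonym_replace(tokens, upos, synonyms, rate, rng):
    """Swap some tokens for same-POS synonyms and return a new token list.

    Replacement is one word for one word, so the sentence length and every
    head index stay the same and the original tree can be reused as the label.
    """
    out = list(tokens)
    for i, (word, pos) in enumerate(zip(tokens, upos)):
        candidates = synonyms.get((word, pos))
        if candidates and rng.random() < rate:
            out[i] = rng.choice(candidates)
    return out


def mixup_embeddings(e_word, e_syn, alpha, rng):
    """Standard mixup: lam * e_word + (1 - lam) * e_syn with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(e_word, e_syn)]
    return mixed, lam


rng = random.Random(0)
synonyms = {("big", "ADJ"): ["huge", "large"]}  # hypothetical dictionary entry
tokens = ["a", "big", "house"]
upos = ["DET", "ADJ", "NOUN"]
heads = [3, 3, 0]  # "house" is the root

# rate=1.0 forces replacement of every word that has a synonym
augmented = synonym_replace(tokens, upos, synonyms, rate=1.0, rng=rng)
print(list(zip(augmented, heads)))

# Toy 2-d embeddings standing in for the vectors of "big" and its synonym
mixed, lam = mixup_embeddings([1.0, 0.0], [0.0, 1.0], alpha=0.4, rng=rng)
print(round(lam, 3), mixed)
```

Reusing `heads` unchanged is what lets synonym replacement enlarge a treebank without new annotation. The mixed vector is a "virtual" word that lies between the original word and its synonym in embedding space.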

Description

Technical field

[0001] The invention relates to a dependency syntactic analysis method fusing multi-strategy data enhancement under low-resource conditions, and belongs to the field of natural language processing.

Background art

[0002] In natural language processing, dependency parsing aims to identify the syntactic dependencies between the words of a sentence. Dependency syntax can supply syntactic features to tasks such as information extraction, automatic question answering, and machine translation, improving model performance.

[0003] Existing dependency parsing methods have been studied extensively with respect to feature encoding, dependency scoring, and decoding, and have effectively improved parsing performance. Under low-resource conditions, however, existing models and methods struggle to achieve good parsing results. This problem is especially evident in low-resource languages such as Thai and Vietnamese. ...
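To make the syntactic dependencies of [0002] concrete: a dependency parse can be stored as one head index per word, as UD treebanks do in CoNLL-U (1-based indices, 0 for the root). The sketch below, with hypothetical function names and a toy sentence, converts head indices to labeled arcs and checks that they form a single-rooted tree without cycles.

```python
def arcs_from_heads(tokens, heads, labels):
    """Turn 1-based head indices (0 = root) into (head, dependent, label) arcs."""
    arcs = []
    for i, (head, label) in enumerate(zip(heads, labels), start=1):
        head_word = "ROOT" if head == 0 else tokens[head - 1]
        arcs.append((head_word, tokens[i - 1], label))
    return arcs


def is_single_rooted_tree(heads):
    """True if exactly one word attaches to ROOT and every word reaches ROOT acyclically."""
    if sum(1 for h in heads if h == 0) != 1:
        return False
    n = len(heads)
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            # Revisiting a node means a cycle; an out-of-range head is malformed
            if node in seen or not 1 <= node <= n:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


tokens = ["She", "reads", "books"]
heads = [2, 0, 2]
labels = ["nsubj", "root", "obj"]
arcs = arcs_from_heads(tokens, heads, labels)
print(arcs)  # [('reads', 'She', 'nsubj'), ('ROOT', 'reads', 'root'), ('reads', 'books', 'obj')]
print(is_single_rooted_tree(heads))  # True
```

A parser's task is to predict `heads` and `labels` for each word; the augmentation steps above only change the word forms, not these annotations.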

Application Information

IPC(8): G06F40/211; G06F40/242; G06F40/247; G06N3/04; G06N3/08
CPC: G06F40/211; G06F40/242; G06F40/247; G06N3/08; G06N3/044
Inventors: Xian Yantuan (线岩团), Gao Fanya (高凡雅), Yu Zhengtao (余正涛), Xiang Yan (相艳)
Owner KUNMING UNIV OF SCI & TECH