A Method of Constructing Vietnamese Dependency Treebank Based on Improved Nivre Algorithm

The technology combines a treebank with a parsing algorithm in the field of natural language processing. It addresses problems such as the difficulty of labeling Vietnamese sentence dependencies, with the stated effects of ensuring diversity and improving quality.

Active Publication Date: 2019-04-09
KUNMING UNIV OF SCI & TECH
7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] The present invention provides a method for constructing a Vietnamese dependency treebank based on the improved Nivre algorithm, to solve the problem that Vietnamese sentence dependencies are difficult to label. The method makes effective use of a large amount of unlabeled Vietnamese sentence-level corpus for treebank construction, which overcomes the difficulties caused by a small initial training corpus. It avoids the cumbersome process of manually labeling the dependencies of Vietnamese sentences, saving time, manpower, and material resources. It improves the accuracy of Vietnamese dependency analysis, and it provides strong support for upper-level applications such as syntactic analysis, machine translation, and information acquisition for Vietnamese.

Examples

Embodiment 1

[0065] Embodiment 1: As shown in Figures 1-2, the specific steps of the method for constructing a Vietnamese dependency treebank based on the improved Nivre algorithm are as follows:

[0067] Step1: First, construct the initial training corpus, the extended corpus, and the test corpus.

[0068] Step2: Then, based on the improved Nivre algorithm, use the initial training corpus to train two weak dependency-analysis learners, S1 and S2, which serve as two fully redundant views.

[0069] Step3: Next, use the two trained weak learners S1 and S2 to perform dependency analysis on the extended corpus and build a Vietnamese dependency treebank model.

[0070] Step4: Finally, use the constructed Vietnamese dependency treebank model to conduct ...
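
Steps 1 to 3 follow the general shape of co-training: two learners trained on a small labeled seed are used to label a larger unlabeled pool. The Python sketch below illustrates that loop only. The callable interfaces (`Trainer`, `Parser`), the rule that a sentence is accepted when both learners produce identical heads, and the toy stand-in learners are all assumptions made for illustration; the visible text does not state how S1 and S2 are implemented or how their outputs are selected.

```python
from typing import Callable, List, Tuple

Sentence = Tuple[str, ...]   # tokens of one sentence
Heads = Tuple[int, ...]      # head position per token (1-based), 0 = root
Tree = Tuple[Sentence, Heads]
Parser = Callable[[Sentence], Heads]
Trainer = Callable[[List[Tree]], Parser]


def co_train(train_s1: Trainer, train_s2: Trainer,
             labeled: List[Tree], unlabeled: List[Sentence],
             rounds: int = 3) -> Tuple[List[Tree], List[Sentence]]:
    """Grow a treebank from a labeled seed plus unlabeled sentences.

    Assumed selection rule: keep a sentence only when S1 and S2 agree.
    """
    treebank = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        s1, s2 = train_s1(treebank), train_s2(treebank)
        remaining: List[Sentence] = []
        added = 0
        for sent in pool:
            h1, h2 = s1(sent), s2(sent)
            if h1 == h2:
                treebank.append((sent, h1))
                added += 1
            else:
                remaining.append(sent)
        pool = remaining
        if added == 0:   # no new agreement, so further rounds cannot help
            break
    return treebank, pool


# Hypothetical toy learners that ignore their training data.
def train_chain(_trees: List[Tree]) -> Parser:
    # Attach every token to the token before it (first token to root).
    return lambda sent: tuple(range(len(sent)))


def train_star(_trees: List[Tree]) -> Parser:
    # Attach the first token to root and all others to the first token.
    return lambda sent: tuple(0 if i == 0 else 1 for i in range(len(sent)))


seed = [(("Tôi", "ngủ"), (2, 0))]
treebank, leftover = co_train(train_chain, train_star, seed,
                              [("a", "b"), ("a", "b", "c")])
```

With these toy learners the two-token sentence is accepted because both outputs match, while the three-token sentence stays in the leftover pool. A real setup would replace the toy trainers with two genuinely different parsers and would likely use a confidence score rather than exact agreement.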

Embodiment 2

[0091] Embodiment 2: As shown in Figures 1-2, the specific steps of the method for constructing a Vietnamese dependency treebank based on the improved Nivre algorithm are as follows:

[0092] Step1: First, construct the initial training corpus, the extended corpus, and the test corpus.

[0093] As a preferred solution of the present invention, the specific sub-steps of Step1 are:

[0094] Step1.1: First, use a crawler program to collect raw news text from the Voice of Vietnam radio station and obtain Vietnamese text-level corpus samples. The news covers politics, economy, military affairs, sports, entertainment, and other topics, which ensures the diversity of the experimental data. The corpus is a central concept in natural language processing research: it is both the object of annotation and the object of experiment, so the selection of the corpus is very...
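
Step1 calls for three corpora built from crawled text. A minimal Python sketch of one way to get from text-level samples to three sentence-level sets is given below. The punctuation-based sentence splitter, the function names, the split fractions, and the fixed random seed are all assumptions for illustration; the visible text does not give the segmentation method or the corpus sizes.

```python
import random
import re
from typing import List, Sequence, Tuple

# Split after sentence-final punctuation followed by whitespace.
_SENT_END = re.compile(r"(?<=[.!?…])\s+")


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter; real news text needs more careful rules."""
    return [s.strip() for s in _SENT_END.split(text) if s.strip()]


def split_corpus(sentences: Sequence[str],
                 train_frac: float = 0.1,
                 test_frac: float = 0.1,
                 seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Return (initial training, extended, test) sentence lists.

    The fractions are placeholders, not values from the patent.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    extended = shuffled[n_train + n_test:]   # stays unlabeled
    return train, extended, test


sample = split_sentences("Xin chào. Bạn khỏe không? Tôi khỏe!")
train_set, extended_set, test_set = split_corpus(
    [f"s{i}" for i in range(10)], train_frac=0.2, test_frac=0.2)
```

Only the initial training and test sets would need manual dependency annotation in this arrangement; the extended set is the unlabeled pool that the two learners later parse.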

Abstract

The invention relates to a method for constructing a Vietnamese dependency treebank based on an improved Nivre algorithm, and belongs to the technical field of natural language processing. The method comprises the following steps: first, an initial training corpus, an expansion corpus, and a test corpus are constructed; second, the initial training corpus is used to train two weak dependency-parsing learners, S1 and S2, based on the improved Nivre algorithm, to serve as two fully redundant views; third, the two trained weak learners S1 and S2 perform dependency parsing on the expansion corpus, and a Vietnamese dependency treebank model is built; finally, dependency parsing is tested on the test corpus and the Vietnamese dependency treebank is established. The method provides strong support for upper-level applications such as syntactic analysis, machine translation, and information acquisition for Vietnamese. It avoids the process of manually labeling the dependency relations of Vietnamese sentences, saving time, manpower, and material resources. It also makes effective use of a large amount of unlabeled Vietnamese sentence-level corpora to improve the accuracy of dependency parsing.
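
For orientation, the sketch below shows the standard arc-eager transition system usually associated with Nivre's algorithm, written in Python. It is not the improved variant claimed by the patent, whose modifications are not described in the visible text. The function name, the transition labels, and the example sentence are illustrative assumptions, and arcs are left unlabeled for brevity.

```python
from typing import Dict, List, Sequence

SHIFT, LEFT_ARC, RIGHT_ARC, REDUCE = "SH", "LA", "RA", "RE"


def run_arc_eager(n_tokens: int, transitions: Sequence[str]) -> Dict[int, int]:
    """Apply arc-eager transitions and return {dependent: head}.

    Tokens are numbered 1..n_tokens; 0 is the artificial root.
    """
    stack: List[int] = [0]
    buffer: List[int] = list(range(1, n_tokens + 1))
    heads: Dict[int, int] = {}
    for t in transitions:
        if t == SHIFT:
            if not buffer:
                raise ValueError("SHIFT needs a non-empty buffer")
            stack.append(buffer.pop(0))
        elif t == LEFT_ARC:
            # Stack top becomes a dependent of the buffer front, then is popped.
            if not buffer or stack[-1] == 0 or stack[-1] in heads:
                raise ValueError("LEFT_ARC precondition violated")
            heads[stack.pop()] = buffer[0]
        elif t == RIGHT_ARC:
            # Buffer front becomes a dependent of the stack top, then is pushed.
            if not buffer or not stack:
                raise ValueError("RIGHT_ARC precondition violated")
            heads[buffer[0]] = stack[-1]
            stack.append(buffer.pop(0))
        elif t == REDUCE:
            # Only a token that already has a head may be popped.
            if not stack or stack[-1] not in heads:
                raise ValueError("REDUCE precondition violated")
            stack.pop()
        else:
            raise ValueError(f"unknown transition: {t}")
    return heads


# "Tôi yêu Việt_Nam": token 2 (yêu) is the root, tokens 1 and 3 depend on it.
heads = run_arc_eager(3, [SHIFT, LEFT_ARC, RIGHT_ARC, RIGHT_ARC])
```

In a trained parser the transition sequence is not given in advance: a classifier chooses the next transition from features of the current stack and buffer, which is the role a weak learner such as S1 or S2 would play.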

Description

Technical field

[0001] The invention relates to a method for constructing a Vietnamese dependency treebank based on an improved Nivre algorithm, and belongs to the technical field of natural language processing.

Background technique

[0002] In the field of Vietnamese information processing research, some achievements have been made in lexical and bilingual alignment methods, but there is still little work on dependency syntax analysis and dependency treebank construction. With the rapid development of statistical learning, using statistical learning to study language information processing has become the mainstream. For example, in 2001 Lai et al. addressed Chinese dependency analysis with a statistical learning method based on the idea of span; in 2003 Yamada et al. converted the English sentences in the Penn Treebank into dependency structures and then used statistical learning methods to model and analyze sentences ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27, G06F16/36
CPC: G06F16/36, G06F40/211
Inventor: 余正涛, 邱国柯, 郭剑毅, 文永华, 王红斌, 陈玮
Owner: KUNMING UNIV OF SCI & TECH