Method for constructing Vietnamese dependency tree bank on basis of Chinese-Vietnamese vocabulary alignment corpora

A word alignment and corpus technology, applied in fields such as special data processing applications, instruments, and electronic digital data processing. It addresses the scarcity of Vietnamese dependency tree banks and the resulting difficulty of Vietnamese dependency syntax analysis, saving the manpower and time needed to build a tree bank and improving accuracy.

Inactive Publication Date: 2015-10-21
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0003] The present invention provides a method for constructing a Vietnamese dependency tree bank based on a Chinese-Vietnamese word alignment corpus. It addresses two problems in the prior art: Vietnamese dependency syntax analysis is difficult to study, and Vietnamese dependency tree banks are scarce. The Vietnamese dependency tree bank constructed by the invention can provide strong support for upper-level applications such as syntactic analysis, machine translation, and information acquisition for Vietnamese.


Examples

Embodiment 1

[0025] Embodiment 1: As shown in Figures 1-3, a method for constructing a Vietnamese dependency tree bank based on a Chinese-Vietnamese word alignment corpus comprises the following specific steps:

[0026] Step1. First, build a Chinese-Vietnamese word alignment parallel sentence pair database;

[0027] Step1.1. Collect Chinese-Vietnamese parallel sentence pairs;

[0028] Step1.2. Construct the word-aligned Chinese-Vietnamese parallel sentence pair library: run GIZA++ word alignment training on the Chinese-Vietnamese parallel sentence pairs, then manually adjust the result to obtain the word-aligned Chinese-Vietnamese parallel sentence pair library;

[0029] Step2. Build a Chinese dependency tree corpus;

[0030] Step2.1. Perform Chinese sentence segmentation processing on the Chinese-Vietnamese word alignment parallel sentence pair library; ...
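As an illustration of what Step1.2 yields, the following minimal sketch reads word alignment links for one sentence pair. It assumes the links are stored in the common `i-j` notation (Chinese word index, then Vietnamese word index); the patent text does not state the storage format, and the function names `parse_alignment` and `one_to_one` are hypothetical.

```python
from collections import defaultdict


def parse_alignment(line):
    """Parse 'i-j' links (assumed format) into {chinese_index: [vietnamese_indices]}."""
    links = defaultdict(list)
    for token in line.split():
        src, tgt = token.split("-")
        links[int(src)].append(int(tgt))
    return {s: sorted(t) for s, t in links.items()}


def one_to_one(links):
    """Keep only Chinese words linked to exactly one Vietnamese word
    that no other Chinese word is linked to."""
    target_counts = defaultdict(int)
    for tgts in links.values():
        for t in tgts:
            target_counts[t] += 1
    return {
        s: tgts[0]
        for s, tgts in links.items()
        if len(tgts) == 1 and target_counts[tgts[0]] == 1
    }


links = parse_alignment("0-0 1-2 1-3 2-1")
print(links)
print(one_to_one(links))
```

Links that are not one-to-one are the natural candidates for the manual adjustment that Step1.2 mentions; filtering them out here is only one possible way to flag them.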

Embodiment 2

[0036] Embodiment 2: As shown in Figures 1-3, a method for constructing a Vietnamese dependency tree bank based on a Chinese-Vietnamese word alignment corpus comprises the following specific steps:

[0037] Step1. First, build a Chinese-Vietnamese word alignment parallel sentence pair database;

[0038] Step1.1. Collect Chinese-Vietnamese parallel sentence pairs;

[0039] Step1.2. Construct the word-aligned Chinese-Vietnamese parallel sentence pair library: run GIZA++ word alignment training on the Chinese-Vietnamese parallel sentence pairs, then manually adjust the result to obtain the word-aligned Chinese-Vietnamese parallel sentence pair library;

[0040] Step2. Build a Chinese dependency tree corpus;

[0041] Step2.1. Perform Chinese sentence segmentation processing on the Chinese-Vietnamese word alignment parallel sentence pair library; ...

Abstract

The present invention relates to a method for constructing a Vietnamese dependency tree bank on the basis of Chinese-Vietnamese vocabulary alignment corpora and belongs to the technical field of natural language processing. First, a Chinese-Vietnamese vocabulary alignment sentence pair library is constructed; then a Chinese dependency tree corpus is constructed; finally, a Vietnamese dependency tree corpus is constructed from the sentence pair library and the Chinese dependency tree corpus. The resulting Vietnamese dependency tree bank can provide strong support for upper-layer applications such as syntactic analysis, machine translation, and information acquisition, and together with the Chinese trees it forms a bilingual parallel dependency tree corpus. The method simplifies the manual collection and labeling of a Vietnamese dependency tree bank, saving the labor and time of constructing it, and according to the invention its accuracy is markedly higher than that of machine learning methods.
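The idea of building Vietnamese trees from Chinese trees plus word alignments can be sketched as a simple dependency projection. This is a minimal illustration under stated assumptions, not the patented procedure: it assumes one-to-one alignment links, a head-index encoding with `-1` for the root, and illustrative relation labels (`SBV`, `HED`, `VOB`); the function name `project_dependencies` is hypothetical.

```python
def project_dependencies(zh_heads, zh_labels, zh_to_vi, vi_length):
    """Project Chinese dependency arcs onto Vietnamese words.

    zh_heads[i]  : index of the head of Chinese word i, or -1 for the root
    zh_labels[i] : relation label of Chinese word i
    zh_to_vi     : one-to-one alignment {chinese_index: vietnamese_index}
    vi_length    : number of words in the Vietnamese sentence

    Vietnamese words that receive no arc keep None and would need
    manual annotation or a further heuristic.
    """
    vi_heads = [None] * vi_length
    vi_labels = [None] * vi_length
    for dep, head in enumerate(zh_heads):
        if dep not in zh_to_vi:
            continue  # unaligned Chinese word: nothing to project
        vi_dep = zh_to_vi[dep]
        if head == -1:
            # The aligned counterpart of the Chinese root becomes the root.
            vi_heads[vi_dep] = -1
            vi_labels[vi_dep] = zh_labels[dep]
        elif head in zh_to_vi:
            # Both ends of the arc are aligned, so the arc carries over.
            vi_heads[vi_dep] = zh_to_vi[head]
            vi_labels[vi_dep] = zh_labels[dep]
    return vi_heads, vi_labels


# Chinese: 我 爱 你 / Vietnamese: Tôi yêu bạn
heads, labels = project_dependencies(
    zh_heads=[1, -1, 1],
    zh_labels=["SBV", "HED", "VOB"],
    zh_to_vi={0: 0, 1: 1, 2: 2},
    vi_length=3,
)
print(heads, labels)
```

Leaving unprojected words as `None` rather than guessing a head keeps the gaps visible, which suits a workflow where the projected trees are subsequently checked by hand.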

Description

Technical field

[0001] The invention relates to a method for constructing a Vietnamese dependency tree bank based on a Chinese-Vietnamese word alignment corpus, and belongs to the technical field of natural language processing.

Background technique

[0002] The China-ASEAN Free Trade Area is the most populous free trade area in the world. The "Bridgehead Strategy" is a strategic need to promote the development of China's southwest and to realize good-neighborliness and friendship with ASEAN countries, and Yunnan is an important bridgehead for China's opening to the southwest. Language communication is the prerequisite for political, cultural and economic exchanges between China and ASEAN countries. Vietnam, a member of ASEAN, is connected to Yunnan by mountains and rivers, and the people of the two countries have a long history of exchanges. Language communication has played a very important role in the friendly coexistence and mutual learning of the people on the border between the two s...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27, G06F17/28
Inventor: 余正涛, 李发杰, 郭剑毅
Owner: KUNMING UNIV OF SCI & TECH