
Chunk-based Vietnamese phrase tree construction method

A construction method and phrase tree technology, applied in natural language translation, natural language data processing, and special data processing applications. It addresses the inconvenience and difficulty of building a Vietnamese phrase treebank and the low accuracy of existing construction methods, and achieves improved accuracy and higher treebank quality.

Active Publication Date: 2016-12-07
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0006] The present invention provides a chunk-based method for constructing Vietnamese phrase trees. It addresses three problems: the need to label a Vietnamese phrase treebank manually, the inconvenience of constructing a large-scale Vietnamese phrase treebank, and the low accuracy and long construction time of traditional treebank construction methods.



Examples


Embodiment 1

[0028] Embodiment 1: As shown in Figure 1, the specific steps of the chunk-based Vietnamese phrase tree construction method are as follows:

[0029] Step1. First, the Vietnamese phrase tree annotation set is labeled with upper-level chunks and base-level chunks, and the labeled phrase trees are used as the training corpus. A training corpus obtained this way has relatively high accuracy, so the feature set derived from it is also more accurate;

[0030] Step2. Select the feature sets of the upper-level chunks and the base-level chunks, adjust the CRF model according to the training corpus, and train the improved CRF model; use the improved CRF model to build the upper-level chunk model and the base-level chunk model; after combining the upper-level chunk model and the base-level chunk model, convert them into a chunk-based Vietnamese phrase treeb...
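The combination of the two chunk layers described in Step2 can be illustrated with a minimal sketch. It assumes that both layers are encoded as BIO tags, that the base-level layer has one tag per word, and that the upper-level layer has one tag per base-level node. The function names, the root label `S`, and the labels `NP`, `VP`, `P`, `V`, `N` are hypothetical and are not taken from the patent's annotation set.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group BIO tags into (start, end_exclusive, label) spans.

    Items tagged "O" become single-item spans labeled "O".
    """
    spans = []
    i = 0
    while i < len(tags):
        if tags[i] == "O":
            spans.append((i, i + 1, "O"))
            i += 1
            continue
        label = tags[i][2:]  # strip the "B-" or "I-" prefix
        j = i + 1
        while j < len(tags) and tags[j] == "I-" + label:
            j += 1
        spans.append((i, j, label))
        i = j
    return spans


def build_tree(words, pos, base_tags, upper_tags, root="S"):
    """Combine base-level and upper-level chunk tags into a bracketed tree.

    base_tags has one tag per word; upper_tags has one tag per base-level
    node (a base chunk, or a single word left outside any base chunk).
    """
    base_nodes = []
    for start, end, label in bio_to_spans(base_tags):
        leaves = " ".join(
            f"({p} {w})" for w, p in zip(words[start:end], pos[start:end])
        )
        base_nodes.append(leaves if label == "O" else f"({label} {leaves})")

    upper_nodes = []
    for start, end, label in bio_to_spans(upper_tags):
        inner = " ".join(base_nodes[start:end])
        upper_nodes.append(inner if label == "O" else f"({label} {inner})")

    return f"({root} " + " ".join(upper_nodes) + ")"


if __name__ == "__main__":
    # Illustrative word-segmented sentence; tags are made up for the demo.
    print(build_tree(
        ["Tôi", "đọc", "sách"],
        ["P", "V", "N"],
        ["B-NP", "O", "B-NP"],   # base-level layer, one tag per word
        ["O", "B-VP", "I-VP"],   # upper-level layer, one tag per base node
    ))
```

In this sketch the two layers are predicted independently and merged afterwards, which mirrors the "combine, then convert" order of Step2; how the patent's own conversion handles deeper nesting is not stated in this excerpt.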

Embodiment 2

[0033] Embodiment 2: As shown in Figure 1, this embodiment of the chunk-based Vietnamese phrase tree construction method is the same as Embodiment 1, wherein, as a preferred solution of the present invention, the specific steps in Step1 for labeling the manually annotated Vietnamese phrase trees with upper-level chunks and base-level chunks are as follows:

[0034] Step1.1. According to the language characteristics of Vietnamese, and with reference to the annotation scheme of the Penn Chinese Treebank (CTB), formulate the annotation set for Vietnamese phrase trees;

[0035] Step1.2. Following the definitions of the upper-level chunk and base-level chunk labels, label the annotation set of the Vietnamese phrase tree with upper-level chunks and base-level chunks;

[0036] Step1.3. Use the annotated Vietnamese phrase trees composed of upper-level chunks and base-level chunks as the training corpus.
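As an illustration of how an annotated phrase tree might be turned into chunk-labeled training data, the following sketch derives base-level (lower-level) chunk tags from a bracketed tree. It assumes, as a working definition not stated in this excerpt, that a base-level chunk is a phrase whose children are all part-of-speech-tagged words. The bracketed input format, the function names, and the labels are hypothetical.

```python
def parse(s):
    """Parse a bracketed tree string into nested (label, children) tuples.

    A leaf word is a plain str; a preterminal is (POS, [word]).
    Assumes a well-formed tree in which words contain no spaces or brackets.
    """
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return node()


def is_preterminal(n):
    """True for a (POS, [word]) node."""
    _, children = n
    return len(children) == 1 and isinstance(children[0], str)


def base_chunk_tags(n, out=None):
    """Return (word, POS, BIO tag) triples for the base-level chunk layer."""
    if out is None:
        out = []
    label, children = n
    if is_preterminal(n):
        # A word that is not inside any base-level chunk.
        out.append((children[0], label, "O"))
    elif all(is_preterminal(c) for c in children):
        # Every child is a tagged word, so this phrase is a base chunk.
        for k, (child_pos, child_words) in enumerate(children):
            prefix = "B-" if k == 0 else "I-"
            out.append((child_words[0], child_pos, prefix + label))
    else:
        for c in children:
            base_chunk_tags(c, out)
    return out


if __name__ == "__main__":
    tree = parse("(S (NP (N Sinh_viên)) (VP (V đọc) (NP (N sách) (A mới))))")
    for row in base_chunk_tags(tree):
        print(row)
```

An upper-level layer could be produced the same way by treating each base chunk as a single unit and repeating the procedure one level up; that extension is left out here to keep the sketch short.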

Embodiment 3

[0037] Embodiment 3: As shown in Figure 1, this embodiment of the chunk-based Vietnamese phrase tree construction method is the same as Embodiment 2, wherein, as a preferred solution of the present invention, the specific steps of Step2 are as follows:

[0038] Step2.1. Adjust the CRF model according to the training corpus, and train the improved CRF model;

[0039] Step2.2. Select and set the feature sets of the upper-level chunks and the base-level chunks;

[0040] Step2.3. Use the feature sets of the upper-level chunks and the base-level chunks, together with the improved CRF model, to construct the upper-level chunk model and the base-level chunk model, and then convert the upper-level chunk model and the base-level chunk model into a chunk-based Vietnamese phrase treebank construction model;
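The feature-set selection in Step2.2 is not spelled out in this excerpt. As a hedged illustration only, the sketch below shows a common kind of CRF feature template for chunking: the current word and part-of-speech tag, their neighbors in a window of two positions on each side, and a tag bigram. The window size, the feature names, and the functions `token_features` and `sentence_features` are assumptions, not the patent's actual feature set.

```python
def token_features(words, pos, i):
    """Build a feature dict for the token at index i.

    Uses the current word and POS tag plus a +/-2 context window.
    All feature names here are illustrative.
    """
    feats = {"bias": 1.0, "w0": words[i].lower(), "p0": pos[i]}
    for off in (-2, -1, 1, 2):
        j = i + off
        if 0 <= j < len(words):
            feats[f"w{off:+d}"] = words[j].lower()
            feats[f"p{off:+d}"] = pos[j]
        else:
            # Mark positions that fall outside the sentence.
            feats[f"pad{off:+d}"] = True
    if i > 0:
        # POS bigram of the previous and current token.
        feats["p-1|p0"] = pos[i - 1] + "|" + pos[i]
    return feats


def sentence_features(words, pos):
    """Return one feature dict per token of a word-segmented sentence."""
    return [token_features(words, pos, i) for i in range(len(words))]


if __name__ == "__main__":
    for f in sentence_features(["Tôi", "đọc", "sách"], ["P", "V", "N"]):
        print(f)
```

A list of per-token feature dicts, paired with the BIO tags of one chunk layer, is a typical training input for linear-chain CRF toolkits. Under this sketch, the upper-level and base-level chunk models would be trained separately, each with its own tag sequence.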


Abstract

The invention relates to a chunk-based Vietnamese phrase tree construction method and belongs to the technical field of natural language processing. The method comprises the following steps: first, labeling a Vietnamese phrase tree annotation set with upper-level chunks and base-level chunks; selecting the feature sets of the upper-level chunks and base-level chunks, and building a chunk-based Vietnamese phrase treebank construction model; performing chunk analysis on word-segmented Vietnamese sentences with a chunk analysis tool to obtain a preliminary chunk-based Vietnamese phrase treebank; and correcting the preliminary treebank with a phrase treebank corrector to obtain the final, corrected Vietnamese phrase treebank. The method avoids manually collecting and labeling the Vietnamese phrase treebank, which saves the labor and time needed to construct it. Compared with constructing Vietnamese phrase treebanks using context-free grammars and maximum entropy models, the disclosed method significantly improves accuracy.

Description

Technical field

[0001] The invention relates to a chunk-based method for building Vietnamese phrase trees and belongs to the technical field of natural language processing.

Background

[0002] The analysis and construction of phrase treebanks plays a very important role in linguistic research, for example in extracting syntactic patterns and investigating linguistic phenomena. Treebanks are also commonly used to train systems such as word segmentation tools, syntactic parsers, and semantic role labelers, and these systems are the basis for applications such as information extraction, machine translation, question answering, and text classification. In recent years, with the rapid development of machine learning methods and artificial intelligence, the automatic construction of phrase treebanks has become increasingly important.

[0003] Phrase structure parsing is to automatically deduce the grammatical structure of a sentence accordi...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/28
CPC: G06F40/42, G06F40/253, G06F40/205, G06F40/289
Inventor: 郭剑毅, 李英, 余正涛, 线岩团, 毛存礼, 陈玮
Owner: KUNMING UNIV OF SCI & TECH