Method and system for transforming a treebank

A treebank and structure transformation technology, applied in the fields of instrumentation, computing, and electrical digital data processing. It addresses two problems: existing converted dependency treebanks follow inconsistent annotation systems, and the conversion process does not account for the part-of-speech tag set. Its effects are a larger treebank and improved parser performance.

Status: Inactive. Publication date: 2010-12-08
Applicant: BEIJING KINGSOFT SOFTWARE and 2 others

AI Technical Summary

Problems solved by technology

[0015] Existing converted dependency treebanks follow inconsistent annotation systems, and their conversion process does not consider converting the part-of-speech tag set. To solve these problems, the present invention provides a treebank transformation method and system that convert the Penn Chinese Treebank into HIT-IR-CDT format. The converted treebank can easily be merged with the original HIT-IR-CDT, thereby increasing the size of the treebank and effectively improving the performance of the syntactic parser.

Method used




Embodiment Construction

[0077] The invention provides a treebank conversion method that converts the Penn Chinese Treebank into HIT-IR-CDT format. The converted treebank can easily be merged with the original HIT-IR-CDT, thereby increasing the scale of the treebank and, in turn, effectively improving the performance of the syntactic parser.

[0078] Referring to Figures 2 and 3: Figure 2 is a flow chart of the first embodiment of the treebank transformation method according to the present invention, and Figure 3 is a flow chart of establishing (training) the dependency relation mapping model according to the present invention.

[0079] The treebank transformation method of the first embodiment of the present invention comprises the following steps:

[0080] S100. Transform the phrase structure of the Penn Chinese Treebank into a dependency structure.

[0081] According to the pre-established head (core node) mapping table, determine the head node of each grammatical derivation in the phrase structur...
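Step S100 follows the general head-rule approach to converting phrase structures into dependencies: a table picks the head child of each phrase, and the lexical heads of the remaining children become dependents of the head child's lexical head. The Python sketch below illustrates that idea under stated assumptions: the `Node` class, the `HEAD_TABLE` entries, and the left-most/right-most fallback rule are hypothetical and much smaller than a real head table for the Penn Chinese Treebank; this is not the patent's actual mapping table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    label: str                                  # phrase label (e.g. "VP") or POS tag of a word
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                  # set only on preterminal (word) nodes
    index: int = 0                              # 1-based word position, used only on word nodes


# Hypothetical head table: phrase label -> (search direction, child labels in priority order).
# Illustrative entries only; not the patent's pre-established head mapping table.
HEAD_TABLE: Dict[str, Tuple[str, List[str]]] = {
    "IP": ("right", ["VP", "IP"]),
    "VP": ("left", ["VV", "VC", "VE", "VP"]),
    "NP": ("right", ["NN", "NR", "PN", "NP"]),
}


def head_child_position(node: Node) -> int:
    """Return the position of the head child, scanning children in the table's direction."""
    direction, priorities = HEAD_TABLE.get(node.label, ("left", []))
    order = list(range(len(node.children)))
    if direction == "right":
        order.reverse()
    for wanted in priorities:
        for pos in order:
            if node.children[pos].label == wanted:
                return pos
    return order[0]  # assumed fallback: first child in the search direction


def attach(node: Node, heads: Dict[int, int]) -> int:
    """Return the lexical head of `node`, recording head links of the other words in `heads`."""
    if node.word is not None:
        return node.index
    child_heads = [attach(kid, heads) for kid in node.children]
    head = child_heads[head_child_position(node)]
    for h in child_heads:
        if h != head:
            heads[h] = head  # non-head children's lexical heads depend on the phrase head
    return head


def convert(tree: Node) -> List[int]:
    """Return the head index of each word 1..n, with 0 marking the root."""
    heads: Dict[int, int] = {}
    root = attach(tree, heads)
    heads[root] = 0
    return [heads[i] for i in sorted(heads)]


if __name__ == "__main__":
    # (IP (NP (PN 我)) (VP (VV 喜欢) (NP (NN 音乐))))  "I like music"
    tree = Node("IP", [
        Node("NP", [Node("PN", word="我", index=1)]),
        Node("VP", [Node("VV", word="喜欢", index=2),
                    Node("NP", [Node("NN", word="音乐", index=3)])]),
    ])
    print(convert(tree))  # [2, 0, 2]
```

Here 喜欢 becomes the root and both 我 and 音乐 attach to it. Real Penn Chinese Treebank trees also carry empty categories (`-NONE-`) and function tags such as `NP-SBJ`, which a full converter would typically strip before head finding.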



Abstract

The invention discloses a treebank transformation method comprising: transforming the phrase structure of the Penn Chinese Treebank into a dependency structure; transforming the part-of-speech tag set of the Penn Chinese Treebank into the 863 part-of-speech tag set; analyzing the dependency relations within the flat phrase structures of the Penn Chinese Treebank with a HIT-IR-CDT syntactic parser; and training a dependency relation mapping model on a pre-established HIT-IR-CDT treebank and using it to transform the dependency relations of the Penn Chinese Treebank, thus forming a transformed dependency structure tree. The invention also discloses a treebank transformation system. With this method and system, the transformed treebank can be merged with the original HIT-IR-CDT, thereby increasing the scale of the treebank and improving the performance of the syntactic parser.
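Two of the steps listed above, tag-set conversion and relation mapping, can be sketched in Python as a lookup table plus a simple learned mapping from arc features to relation labels. Everything here is an assumption: the partial `CTB_TO_863` table, the three-part arc feature, the majority-vote `RelationMapper`, and the toy training arcs are hypothetical stand-ins, not the model described in the patent. SBV, VOB, and ATT are used as HIT-IR-CDT-style relation names (subject, object, attribute).

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

# Hypothetical partial CTB -> 863 tag mapping; illustrative entries only.
# Context-dependent tags (e.g. NR, which may name a person, place, or organization) need rules, not a lookup.
CTB_TO_863: Dict[str, str] = {
    "NN": "n", "VV": "v", "AD": "d", "P": "p",
    "PN": "r", "CD": "m", "M": "q", "PU": "wp",
}

Feature = Tuple[str, str, str]  # (head tag, dependent tag, side of the dependent: "L" or "R")


def map_tags(tags: List[str]) -> List[str]:
    """Map CTB tags to 863 tags, leaving unmapped tags unchanged for later handling."""
    return [CTB_TO_863.get(t, t) for t in tags]


def arc_feature(tags: List[str], head: int, dep: int) -> Feature:
    """Build the (assumed) feature for the arc head -> dep, using 1-based word positions."""
    return (tags[head - 1], tags[dep - 1], "L" if dep < head else "R")


class RelationMapper:
    """Majority-vote mapping from arc features to relation labels, with a global backoff."""

    def __init__(self) -> None:
        self.counts: DefaultDict[Feature, Counter] = defaultdict(Counter)
        self.backoff: Counter = Counter()

    def train(self, labelled_arcs: Iterable[Tuple[Feature, str]]) -> None:
        for feat, rel in labelled_arcs:
            self.counts[feat][rel] += 1
            self.backoff[rel] += 1

    def predict(self, feat: Feature) -> str:
        table = self.counts.get(feat) or self.backoff  # unseen feature -> most frequent label overall
        if not table:
            raise ValueError("mapper has not been trained")
        return table.most_common(1)[0][0]


if __name__ == "__main__":
    mapper = RelationMapper()
    # Toy labelled arcs standing in for a pre-built HIT-IR-CDT treebank (invented for illustration).
    mapper.train([
        (("v", "r", "L"), "SBV"),
        (("v", "n", "R"), "VOB"),
        (("v", "n", "R"), "VOB"),
        (("n", "n", "L"), "ATT"),
    ])
    tags = map_tags(["PN", "VV", "NN"])  # 我 喜欢 音乐 -> ['r', 'v', 'n']
    heads = [2, 0, 2]                    # heads as produced by a phrase-to-dependency step
    for dep, head in enumerate(heads, start=1):
        if head:
            print(dep, head, mapper.predict(arc_feature(tags, head, dep)))
```

Majority voting with a backoff is about the simplest possible mapping model; a classifier with richer context features could replace `RelationMapper` without changing the surrounding steps.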

Description

Technical field

[0001] The invention relates to treebank transformation, in particular to a method and system for transforming a Chinese phrase-structure treebank.

Background

[0002] Syntactic analysis is a very important research direction in natural language processing. Statistical syntactic analysis methods can be divided into supervised and unsupervised methods according to the corpus they use. Supervised methods require manually annotating sentences as training data according to certain grammatical conventions, and then acquiring the knowledge needed for syntactic analysis from that training data through probabilistic or machine-learning methods. Unsupervised methods train on unannotated data and learn grammatical rules automatically through some mechanism.

[0003] Supervised syntactic analysis is now the mainstream approach, and it has achieved high accuracy for English and other languages. In ...

Claims


Application Information

IPC(8): G06F17/27
Inventors: 李正华 (Li Zhenghua), 高立琦 (Gao Liqi), 刘挺 (Liu Ting), 王海洲 (Wang Haizhou)
Owner: BEIJING KINGSOFT SOFTWARE