
Method and system for automatic treebank transformation based on pattern embedding

A pattern-embedding treebank conversion technology, applied in the fields of text database indexing, natural language data processing, and unstructured text data retrieval. It addresses the inability to effectively learn the correspondences between annotation specifications, the lack of double-tree alignment data, and the insufficient utilization of the source-side treebank.

Active Publication Date: 2021-06-22
SUZHOU UNIV

AI Technical Summary

Problems solved by technology

The main problem of the indirect method is that the source-side treebank is not fully utilized, so the method cannot effectively describe the correspondences between annotation specifications. The direct, conversion-based method is limited by the lack of double-tree alignment data and cannot effectively learn those correspondences, so its conversion quality is mediocre.



Examples


Embodiment 1

[0051] The automatic treebank conversion method based on pattern embedding of this embodiment comprises:

[0052] Obtaining a double-tree alignment database, which stores sentences annotated under two annotation specifications;

[0053] Calculating, for each sentence, the dependency arc score in the target-side tree for every pair of words, where the two words are denoted w_i and w_j, and w_i and w_j are presupposed to be the modifier and the core (head) word, respectively, in the target-side tree. The calculation of the dependency arc score of w_i and w_j in the target-side tree includes:

[0054] Determining the pattern of w_i and w_j according to their syntactic relationship in the source-side tree d^src;

[0055] Based on the correspondence table between patterns and embedding vectors, the pattern of w_i and w_j is transformed...
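
As an illustration of paragraphs [0054] and [0055], the following Python sketch classifies the source-side relationship of a word pair into a pattern and looks up its embedding vector. The pattern inventory (consistent, reverse, sibling, grand, other), the function names, and the embedding dimension are assumptions made for illustration; the text above does not enumerate the patterns, and in a trained model the table entries would be learned parameters rather than random values.

```python
import random

# Hypothetical pattern inventory; the source text does not list the patterns.
PATTERNS = ("consistent", "reverse", "sibling", "grand", "other")


def source_pattern(src_heads, i, j):
    """Return the pattern of (w_i, w_j) in the source-side tree.

    src_heads[k] is the source-side head of word k, where index 0 is the
    root pseudo-word and src_heads[0] is -1. w_i (i >= 1) is the presumed
    modifier and w_j the presumed head in the target-side tree.
    """
    if src_heads[i] == j:
        return "consistent"  # w_j is also the head of w_i in the source tree
    if src_heads[j] == i:
        return "reverse"     # the source arc points the other way
    if src_heads[i] == src_heads[j]:
        return "sibling"     # both words share the same source-side head
    if src_heads[src_heads[i]] == j:
        return "grand"       # w_j is the head of w_i's head
    return "other"


def build_pattern_table(dim, seed=0):
    """Pattern -> embedding vector table (randomly initialised stand-in)."""
    rng = random.Random(seed)
    return {p: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for p in PATTERNS}


# Source-side tree of a four-word sentence: w2 is the root word,
# w1 and w3 depend on w2, and w4 depends on w3.
src_heads = [-1, 2, 0, 2, 3]
table = build_pattern_table(dim=4)

pattern = source_pattern(src_heads, 4, 2)  # w_i = w4, w_j = w2
pattern_vector = table[pattern]            # the pattern embedding vector
```

Because the pattern depends only on the source-side tree, it can be computed once for every word pair before any target-side scoring takes place.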

Embodiment 2

[0071] The automatic treebank conversion method based on pattern embedding of this embodiment, on the basis of Embodiment 1, further includes: training on the target-side dependency arc scores of the words w_i and w_j of each sentence in the double-tree alignment database to obtain a supervised conversion model, where a global CRF loss is used to define the loss function for each sentence.

[0072] The Biaffine Parser defines a local softmax loss for each word. Since the annotated training data in this embodiment is usually partially annotated, the disadvantage of a local loss function is that words without an annotated head cannot participate in training at all. Under partial annotation, the existing dependency arcs affect the probability distribution of the other dependency arcs and thus provide guidance information. Therefore, the Biaffine Parser is extended to use a global CRF loss to define a loss function for each sentence, so as to make ...
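
The contrast between the two losses can be sketched in Python for a toy sentence. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the global loss is taken to be the log-partition over all dependency trees minus the log-partition over the trees consistent with the partial annotation, and brute-force enumeration of all (not necessarily projective) trees stands in for whatever efficient algorithm a practical system would use.

```python
import itertools
import math


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def is_tree(heads):
    """heads[m - 1] is the head of word m (0 = root); reject any cycle."""
    for m in range(1, len(heads) + 1):
        seen, cur = set(), m
        while cur != 0:
            if cur in seen:
                return False
            seen.add(cur)
            cur = heads[cur - 1]
    return True


def local_softmax_loss(scores, partial):
    """Per-word softmax loss; words missing from `partial` contribute nothing."""
    n = len(scores) - 1
    loss = 0.0
    for m, gold in partial.items():
        candidates = [scores[h][m] for h in range(n + 1) if h != m]
        loss += logsumexp(candidates) - scores[gold][m]
    return loss


def global_crf_loss(scores, partial):
    """Sentence-level loss: log Z(all trees) - log Z(trees matching `partial`).

    scores[h][m] is the score of the arc from head h to modifier m;
    `partial` maps an annotated word to its head.
    """
    n = len(scores) - 1
    all_trees, matching = [], []
    for heads in itertools.product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue
        total = sum(scores[h][m] for m, h in enumerate(heads, start=1))
        all_trees.append(total)
        if all(heads[m - 1] == h for m, h in partial.items()):
            matching.append(total)
    if not matching:
        raise ValueError("partial annotation is not part of any tree")
    return logsumexp(all_trees) - logsumexp(matching)


# Two-word sentence with uniform scores; only word 1 is annotated (head = root).
scores = [[0.0] * 3 for _ in range(3)]
partial = {1: 0}
local = local_softmax_loss(scores, partial)
global_ = global_crf_loss(scores, partial)
```

In the global loss, the unannotated word 2 still takes part: its arc scores appear in both partition sums, so gradients reach them even though the word has no annotated head.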

Embodiment 3

[0074] The automatic treebank conversion system based on pattern embedding of this embodiment, which runs the method described in Embodiment 1 above, includes: a double-tree alignment database, which stores sentences annotated under two annotation specifications;

[0075] A target-side tree dependency arc score prediction unit, which includes:

[0078] A pattern determination module: with w_i presupposed to be the modifier and w_j the core (head) word in the target-side tree, it determines the pattern of w_i and w_j according to their syntactic relationship in the source-side tree d^src;

[0079] A pattern embedding vector generation module: based on the correspondence table between patterns and embedding vectors, the pattern of w_i and w_j is transfo...


Abstract

The invention relates to an automatic treebank conversion method and system based on pattern embedding, designed to obtain an accurate supervised conversion model. The method determines the pattern of the words w_i and w_j; transforms the pattern into the corresponding pattern embedding vector; transforms the dependency labels of w_i, w_j, and their lowest common ancestor node w_a in the source-side tree into three dependency embedding vectors; and concatenates the pattern embedding vector with the three dependency embedding vectors as the representation vector of the structural information of w_i and w_j in the source-side tree. The top-level outputs of the recurrent neural network are each concatenated with this representation vector and used as the input of the MLP perceptrons, and the target-side dependency arc score of w_i and w_j is obtained by biaffine calculation. The invention makes full use of the source-side syntax tree to describe the correspondences between the two annotation specifications, and thereby achieves high-quality treebank conversion.
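
The scoring pipeline summarized in the abstract can be sketched in plain Python as follows. Everything concrete here is an assumption made for illustration: the helper names, the dimensions, the label names, the ReLU activation, the single-layer perceptrons, and the random values that stand in for trained parameters and for the recurrent network's top-level outputs h_i and h_j.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_vec(n):
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def relu(vec):
    return [max(0.0, x) for x in vec]


def lowest_common_ancestor(src_heads, i, j):
    """src_heads[k] is the source-side head of word k; src_heads[0] is -1."""
    ancestors, k = set(), i
    while k != -1:
        ancestors.add(k)
        k = src_heads[k]
    k = j
    while k not in ancestors:
        k = src_heads[k]
    return k


def source_representation(pattern, labels, pattern_emb, label_emb):
    """Concatenate the pattern embedding with the three label embeddings."""
    vec = list(pattern_emb[pattern])
    for label in labels:
        vec += label_emb[label]
    return vec


def biaffine(r_dep, r_head, U):
    """[r_dep; 1]^T U r_head, where U has len(r_dep) + 1 rows."""
    return sum(a * b for a, b in zip(r_dep + [1.0], matvec(U, r_head)))


def arc_score(h_i, h_j, src_repr, W_dep, W_head, U):
    r_dep = relu(matvec(W_dep, h_i + src_repr))    # modifier-side perceptron
    r_head = relu(matvec(W_head, h_j + src_repr))  # head-side perceptron
    return biaffine(r_dep, r_head, U)


# Toy setup: three words, w2 is the source-side root word.
PATTERN_DIM, LABEL_DIM, HIDDEN_DIM, MLP_DIM = 4, 3, 5, 2
src_heads = [-1, 2, 0, 2]
src_labels = ["<root>", "subj", "root", "obj"]  # hypothetical label names
pattern_emb = {"sibling": rand_vec(PATTERN_DIM)}
label_emb = {name: rand_vec(LABEL_DIM) for name in src_labels}

i, j = 1, 3
a = lowest_common_ancestor(src_heads, i, j)
src_repr = source_representation(
    "sibling", [src_labels[i], src_labels[j], src_labels[a]],
    pattern_emb, label_emb)

in_dim = HIDDEN_DIM + len(src_repr)
W_dep = [rand_vec(in_dim) for _ in range(MLP_DIM)]
W_head = [rand_vec(in_dim) for _ in range(MLP_DIM)]
U = [rand_vec(MLP_DIM) for _ in range(MLP_DIM + 1)]
h_i, h_j = rand_vec(HIDDEN_DIM), rand_vec(HIDDEN_DIM)
score = arc_score(h_i, h_j, src_repr, W_dep, W_head, U)
```

Appending a constant 1 to the modifier-side vector folds the head-only bias term of the biaffine form into the single matrix U.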

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, and in particular relates to an automatic treebank conversion method and system based on pattern embedding.

Background

[0002] Researchers have carried out a great deal of work on treebanks and have achieved considerable results. The annotation systems used by these treebanks differ widely and can be roughly divided into two types according to how they describe syntax: phrase structure trees and dependency trees.

[0003] For dependency trees, if the dependency annotations of two treebanks follow different annotation specifications, the two treebanks are said to be heterogeneous. Many of the world's major languages have multiple large-scale heterogeneous treebanks. Since the construction of treebanks requires a very high labor cost, how to use different heterogeneous tree...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/31, G06F40/284
CPC: G06F40/284
Inventor: 李正华, 章波, 江心舟, 张民, 陈文亮
Owner: SUZHOU UNIV