Method and system for automatic treebank transformation based on pattern embedding
A pattern-embedding and treebank technology, applied in the fields of text database indexing, natural language data processing, and unstructured text data retrieval. It addresses the problems that the correspondence rules between annotation specifications cannot be learned effectively, that double-tree aligned data is lacking, and that the source-side treebank is insufficiently utilized.
Examples
Embodiment 1
[0051] This embodiment provides an automatic treebank conversion method based on pattern embedding, comprising:
[0052] Obtaining a double-tree aligned database, which stores sentences annotated under two annotation specifications;
[0053] For each sentence, computing the dependency arc score in the target-side tree for every pair of words, denoted w_i and w_j, where w_i is presumed to be the modifier and w_j the head word in the target-side tree. Computing the dependency arc score of w_i and w_j in the target-side tree includes:
[0054] Determining the pattern of w_i and w_j according to their syntactic relationship in the source-side tree d^src;
[0055] Based on a correspondence table between patterns and embedding vectors, the pattern of w_i and w_j is transformed...
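The pattern determination and table lookup in [0054] and [0055] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the pattern names, the `pattern` and `pattern_embedding` functions, and the one-hot placeholder vectors are all hypothetical, since the text above does not enumerate the patterns or describe how the embedding vectors are obtained.

```python
# Hypothetical pattern inventory; the source text does not list the patterns.
PATTERNS = ["consistent", "reverse", "sibling", "grandparent", "other"]

# Hypothetical pattern -> embedding vector correspondence table.
# One-hot placeholders stand in for vectors that a real model would learn.
EMBEDDING_TABLE = {
    name: [1.0 if k == idx else 0.0 for k in range(len(PATTERNS))]
    for idx, name in enumerate(PATTERNS)
}


def pattern(src_heads, i, j):
    """Classify how w_i (modifier) and w_j (head) relate in the source tree.

    src_heads[k] is the head of word k in the source-side tree, with 0
    meaning the pseudo-root. src_heads[0] is an unused placeholder.
    i and j are distinct word indices, both >= 1.
    """
    if src_heads[i] == j:
        return "consistent"      # w_j already heads w_i in the source tree
    if src_heads[j] == i:
        return "reverse"         # the arc points the other way
    if src_heads[i] == src_heads[j]:
        return "sibling"         # both words share a head
    parent = src_heads[i]
    if parent > 0 and src_heads[parent] == j:
        return "grandparent"     # w_j heads the head of w_i
    return "other"


def pattern_embedding(src_heads, i, j):
    """Look up the embedding vector for the pattern of (w_i, w_j)."""
    return EMBEDDING_TABLE[pattern(src_heads, i, j)]


# Source tree for a 3-word sentence: word 2 is the root, words 1 and 3 modify it.
heads_a = [-1, 2, 0, 2]
# Source tree forming a chain: 1 -> 2 -> 3 -> root.
heads_b = [-1, 2, 3, 0]
```

With `heads_a`, the pair (1, 2) falls into the "consistent" pattern and the pair (1, 3) into "sibling"; with `heads_b`, the pair (1, 3) is "grandparent". Keeping the table as a plain mapping makes it clear that the pattern, not the word pair itself, selects the vector.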
Embodiment 2
[0071] This embodiment provides an automatic treebank conversion method based on pattern embedding which, in addition to the steps of Embodiment 1, further includes: training on the target-side dependency arc scores of the words w_i and w_j of each sentence in the double-tree aligned database to obtain a supervised conversion model, where a global CRF loss is used to define one loss function per sentence.
[0072] The Biaffine Parser defines a local softmax loss for each word. Because the annotated training data in this embodiment is usually partially annotated, the drawback of the local loss function is that words without an annotated head cannot participate in training at all. Under partial annotation, the existing dependency arcs influence the probability distribution of the other dependency arcs and thus provide guidance information. Therefore, the Biaffine Parser is extended to use a global CRF loss that defines one loss function per sentence, so as to make ...
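A sentence-level CRF loss under partial annotation can be illustrated with a brute-force sketch. Everything here is an assumption made for illustration: the function names are hypothetical, trees are enumerated exhaustively (feasible only for very short sentences, whereas a real parser would use dynamic programming), projectivity is not enforced, and exactly one word is required to attach to the root. The loss is the log of the total score mass over all trees minus the log of the mass over trees compatible with the annotated arcs, so unannotated words still contribute through the normalization.

```python
import itertools
import math


def is_tree(heads):
    """heads[k-1] is the head of word k (0 = pseudo-root).

    True if exactly one word attaches to the root (an assumption of this
    sketch) and every word reaches the root without entering a cycle.
    """
    if sum(1 for h in heads if h == 0) != 1:
        return False
    for k in range(1, len(heads) + 1):
        seen = set()
        node = k
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


def all_trees(n):
    """Enumerate every dependency tree over n words (exponential cost)."""
    for heads in itertools.product(range(n + 1), repeat=n):
        if is_tree(heads):
            yield heads


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def tree_score(scores, heads):
    # scores[h][m] is the arc score for head h -> modifier m.
    return sum(scores[h][m] for m, h in enumerate(heads, start=1))


def global_crf_loss(scores, partial):
    """Sentence-level loss for a partially annotated sentence.

    partial maps a modifier index to its annotated head; words without
    an annotation are simply absent. Assumes at least one tree is
    compatible with the annotation.
    """
    n = len(scores) - 1
    trees = list(all_trees(n))
    log_z = log_sum_exp([tree_score(scores, t) for t in trees])
    compatible = [
        t for t in trees if all(t[m - 1] == h for m, h in partial.items())
    ]
    log_z_gold = log_sum_exp([tree_score(scores, t) for t in compatible])
    return log_z - log_z_gold


# Uniform (all-zero) arc scores for a 3-word sentence.
zero_scores = [[0.0] * 4 for _ in range(4)]
loss_one_arc = global_crf_loss(zero_scores, {1: 2})
```

With uniform scores, the loss reduces to the log of the ratio between the number of all trees and the number of compatible trees, which makes the effect of each additional annotated arc easy to inspect.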
Embodiment 3
[0074] In this embodiment, an automatic treebank conversion system based on pattern embedding, which runs the method described in Embodiment 1 above, includes: a double-tree aligned database, which stores sentences annotated under two annotation specifications;
[0075] A target-side tree dependency arc score prediction unit, which includes:
[0078] A pattern determination module, which presumes that in the target-side tree the word w_i is the modifier and the word w_j is the head word, and determines the pattern of w_i and w_j according to their syntactic relationship in the source-side tree d^src;
[0079] A pattern embedding vector generation module, in which, based on the correspondence table between patterns and embedding vectors, the pattern of w_i and w_j is transformed...