A Method of Constructing Vietnamese Dependency Treebank Based on Improved Nivre Algorithm
This treebank construction method, applied in the field of natural language processing, addresses problems such as the difficulty of annotating dependency relations in Vietnamese sentences, and achieves the effect of ensuring data diversity and improving quality.
Examples
Embodiment 1
[0065] Embodiment 1: As shown in Figures 1-2, the specific steps of the method for constructing a Vietnamese dependency treebank based on the improved Nivre algorithm are as follows:
[0067] Step1: First, construct the initial training corpus, the extended corpus, and the test corpus;
[0068] Step2: Then, using the initial training corpus, train two weak dependency-parsing learners, S1 and S2, based on the improved Nivre algorithm, to serve as two fully redundant views;
[0069] Step3: Then, use the two trained weak learners S1 and S2 to parse the dependencies in the extended corpus and build a Vietnamese dependency treebank model;
[0070] Step4: Finally, use the constructed Vietnamese dependency treebank model to conduct ...
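Step2 builds its learners on a Nivre-style transition parser. This excerpt does not describe what the patent's improvements are, so the following is only a minimal sketch of the standard arc-eager transition system (SHIFT, LEFT-ARC, RIGHT-ARC, REDUCE), not the improved algorithm. The function name `arc_eager_oracle`, the head-index encoding, and the example sentence are assumptions made for illustration. The sketch replays the transitions that rebuild a given projective tree, which is the kind of transition sequence a learner such as S1 or S2 could be trained to predict.

```python
from typing import List, Tuple


def arc_eager_oracle(heads: List[int]) -> Tuple[List[str], List[int]]:
    """Replay standard arc-eager transitions for a projective gold tree.

    heads[i] is the index of the head of token i. Index 0 is an artificial
    ROOT token with heads[0] == -1. Returns the transition sequence and the
    head array rebuilt by those transitions. (Illustrative helper, not the
    patent's improved algorithm.)
    """
    n = len(heads)
    stack = [0]                    # ROOT starts on the stack
    buffer = list(range(1, n))     # remaining input tokens, left to right
    built = [-1] * n               # heads assigned so far
    transitions: List[str] = []

    while buffer:
        s, b = stack[-1], buffer[0]
        if s != 0 and heads[s] == b:
            # LEFT-ARC: the buffer front becomes the head of the stack top.
            built[s] = b
            stack.pop()
            transitions.append("LEFT-ARC")
        elif heads[b] == s:
            # RIGHT-ARC: the stack top becomes the head of the buffer front,
            # which is then pushed so it can still take its own dependents.
            built[b] = s
            stack.append(buffer.pop(0))
            transitions.append("RIGHT-ARC")
        elif built[s] != -1 and any(
            heads[k] == b or heads[b] == k for k in stack[:-1]
        ):
            # REDUCE: the stack top already has a head, and the buffer front
            # still needs an arc with a token deeper in the stack.
            stack.pop()
            transitions.append("REDUCE")
        else:
            # SHIFT: no arc is possible yet; move the buffer front to the stack.
            stack.append(buffer.pop(0))
            transitions.append("SHIFT")
    return transitions, built


if __name__ == "__main__":
    # Illustrative sentence: ROOT, "Tôi", "đọc", "sách" ("I read books").
    # "đọc" attaches to ROOT; "Tôi" and "sách" both attach to "đọc".
    gold = [-1, 2, 0, 2]
    seq, rebuilt = arc_eager_oracle(gold)
    print(seq)      # ['SHIFT', 'LEFT-ARC', 'RIGHT-ARC', 'RIGHT-ARC']
    print(rebuilt == gold)
```

The oracle assumes the input tree is projective; a non-projective tree would leave some tokens without a rebuilt head.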
Embodiment 2
[0091] Embodiment 2: As shown in Figures 1-2, the specific steps of the method for constructing a Vietnamese dependency treebank based on the improved Nivre algorithm are as follows:
[0092] Step1: First, construct the initial training corpus, the extended corpus, and the test corpus;
[0093] As a preferred solution of the present invention, the specific sub-steps of Step1 are:
[0094] Step1.1: First, use a crawler program to collect raw news text from the Voice of Vietnam radio station, obtaining document-level Vietnamese corpus samples. The news covers politics, economy, military affairs, sports, entertainment, and other topics, which ensures the diversity of the experimental data. The corpus is a central concept in natural language processing research: it is both the object of annotation and the object of experiment, so the selection of the corpus is very...
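Step1 names three corpora, but this excerpt does not say how the crawled text is divided among them. The following is a minimal sketch under the assumption that the three sets are disjoint random subsets of one pool of crawled sentences. The function name `split_corpus`, its parameters, and the sizes in the usage example are hypothetical and do not come from the patent.

```python
import random
from typing import List, Sequence, Tuple


def split_corpus(
    sentences: Sequence[str], n_train: int, n_test: int, seed: int = 0
) -> Tuple[List[str], List[str], List[str]]:
    """Split sentences into (initial training, extended, test) corpora.

    Assumption: the sets are disjoint and drawn at random from one pool.
    Everything not used for training or testing becomes the extended corpus.
    """
    if n_train < 0 or n_test < 0 or n_train + n_test > len(sentences):
        raise ValueError("n_train + n_test must fit within the corpus size")
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    initial = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    extended = shuffled[n_train + n_test:]
    return initial, extended, test


if __name__ == "__main__":
    # Placeholder sentences standing in for crawled news text.
    pool = [f"sentence {i}" for i in range(10)]
    initial, extended, test = split_corpus(pool, n_train=3, n_test=2)
    print(len(initial), len(extended), len(test))
```

Fixing the seed keeps the split stable across runs, so the test corpus cannot leak into later training rounds.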