Chunk-based Vietnamese phrase tree construction method
A construction method and phrase tree technology, applied in natural language translation, natural language data processing, special data processing applications, etc. It addresses the inconvenience and difficulty of building a Vietnamese phrase treebank and the low accuracy rate of the result, and achieves improved accuracy and a higher-quality treebank.
Examples
Embodiment 1
[0028] Embodiment 1: As shown in Figure 1, the specific steps of the chunk-based Vietnamese phrase tree construction method are as follows:
[0029] Step1. First, label the Vietnamese phrase tree annotation set with upper-level chunks and base-level chunks, and use the labeled phrase trees as the training corpus. A training corpus obtained this way is relatively accurate, so the feature set derived from it is more accurate as well;
[0030] Step2. Select the feature sets of the upper-level chunks and the base-level chunks, adjust the CRF model according to the training corpus, and train the improved CRF model. Use the improved CRF model to build the upper-level chunk model and the base-level chunk model; after the upper-level and base-level chunk models are combined, the result is converted into a chunk-based Vietnamese phrase treeb...
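The feature selection in Step2 can be illustrated with a small sketch. The patent text here does not list the actual features, so the template below (word and POS tag in a two-token window, plus POS bigrams) is an assumption, as are the names `token_features` and `sentence_features` and the example POS tags. It only shows the per-token feature dictionaries that a CRF sequence labeler would typically be trained on.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag); the tag set here is illustrative


def token_features(sent: List[Token], i: int, window: int = 2) -> Dict[str, str]:
    """Hypothetical feature template: word and POS in a +/-window context."""
    feats = {"bias": "1"}
    for off in range(-window, window + 1):
        j = i + off
        if 0 <= j < len(sent):
            word, pos = sent[j]
            feats[f"w[{off}]"] = word.lower()
            feats[f"p[{off}]"] = pos
        else:
            # Positions before the first or after the last token are padded.
            feats[f"w[{off}]"] = "<PAD>"
            feats[f"p[{off}]"] = "<PAD>"
    if window >= 1:
        # POS bigrams around the current token.
        feats["p[-1]|p[0]"] = feats["p[-1]"] + "|" + feats["p[0]"]
        feats["p[0]|p[1]"] = feats["p[0]"] + "|" + feats["p[1]"]
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, str]]:
    """One feature dict per token, in sentence order."""
    return [token_features(sent, i) for i in range(len(sent))]


sent = [("Tôi", "P"), ("đọc", "V"), ("sách", "N")]
X = sentence_features(sent)
```

The same template could be reused for both chunk layers; one plausible variation, again an assumption, is to add the predicted base-level tag as an extra feature when training the upper-level model.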
Embodiment 2
[0033] Embodiment 2: As shown in Figure 1, this embodiment of the chunk-based Vietnamese phrase tree construction method is the same as Embodiment 1, wherein, as a preferred solution of the present invention, the specific steps in Step1 for labeling the manually annotated Vietnamese phrase trees with upper-level and base-level chunks are as follows:
[0034] Step1.1. Based on the linguistic characteristics of Vietnamese and with reference to the annotation scheme of the Penn Chinese Treebank (CTB), formulate the annotation set for Vietnamese phrase trees;
[0035] Step1.2. Following the definitions of the upper-level and base-level chunk labels, label the upper-level chunks and base-level chunks of the Vietnamese phrase tree annotation set;
[0036] Step1.3. Use the annotated Vietnamese phrase trees, composed of upper-level chunks and base-level chunks, as the training corpus.
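Steps 1.1 to 1.3 can be sketched as deriving two layers of BIO chunk tags from a bracketed phrase tree. The chunk definitions are not spelled out in this text, so the sketch assumes that a base-level chunk is a phrase node whose children are all POS-tagged words, and that an upper-level chunk is an immediate child of the root. The function names, the example sentence, and its labels are hypothetical.

```python
import re


def parse(s):
    """Parse a bracketed tree into (label, children); leaves are plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def is_preterminal(node):
    return len(node[1]) == 1 and isinstance(node[1][0], str)


def leaves(node):
    if is_preterminal(node):
        return [node[1][0]]
    return [w for child in node[1] for w in leaves(child)]


def bio(label, n):
    return [f"B-{label}"] + [f"I-{label}"] * (n - 1)


def base_tags(node):
    """Assumed definition: lowest phrase nodes (all children are POS nodes)."""
    if is_preterminal(node):
        return ["O"]
    label, children = node
    if all(is_preterminal(c) for c in children):
        return bio(label, len(children))
    return [t for c in children for t in base_tags(c)]


def upper_tags(root):
    """Assumed definition: each phrase directly under the root is one chunk."""
    tags = []
    for child in root[1]:
        if is_preterminal(child):
            tags.append("O")
        else:
            tags.extend(bio(child[0], len(leaves(child))))
    return tags


tree = parse("(S (NP (N Sinh_viên) (A mới)) (VP (V đọc) (NP (N sách))))")
words = leaves(tree)
base = base_tags(tree)
upper = upper_tags(tree)
for row in zip(words, base, upper):
    print(*row, sep="\t")
```

Each printed row (word, base-level tag, upper-level tag) is one training instance position, which is the column layout sequence labelers commonly consume.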
Embodiment 3
[0037] Embodiment 3: As shown in Figure 1, this embodiment of the chunk-based Vietnamese phrase tree construction method is the same as Embodiment 2, wherein, as a preferred solution of the present invention, the specific steps of Step2 are as follows:
[0038] Step2.1. Adjust the CRF model according to the training corpus, and train the improved CRF model;
[0039] Step2.2. Select and set the feature sets of the upper-level chunks and the base-level chunks;
[0040] Step2.3. Use the feature sets of the upper-level and base-level chunks together with the improved CRF model to construct the upper-level chunk model and the base-level chunk model, and then convert the two models into the chunk-based Vietnamese phrase treebank construction model.
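The final conversion in Step2.3 can be pictured as merging the two predicted tag layers back into a bracketed tree. The sketch below is not the patent's procedure, which is not detailed here. It assumes that base-level chunks always nest inside upper-level chunks, that the root label is `S`, and that the BIO tags shown are hand-written stand-ins for the output of the two CRF models. `spans` and `combine` are hypothetical names.

```python
def spans(tags):
    """Turn BIO tags into (start, end, label) spans; end is exclusive."""
    out, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):   # sentinel closes the last span
        if start is not None and not tag.startswith("I-"):
            out.append((start, i, label))
            start = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return out


def combine(tokens, base, upper, root="S"):
    """Nest base-level chunks inside upper-level chunks and emit a bracketed tree."""
    leaf = [f"({pos} {word})" for word, pos in tokens]
    base_spans = spans(base)

    def render(lo, hi):
        # Bracket tokens lo..hi, wrapping every base chunk that fits inside.
        parts, i = [], lo
        while i < hi:
            hit = next(((s, e, l) for s, e, l in base_spans
                        if s == i and e <= hi), None)
            if hit:
                s, e, l = hit
                parts.append(f"({l} {' '.join(leaf[s:e])})")
                i = e
            else:
                parts.append(leaf[i])
                i += 1
        return parts

    out, i = [], 0
    for s, e, l in spans(upper):
        out.extend(render(i, s))                   # tokens outside upper chunks
        inner = render(s, e)
        if (s, e, l) in base_spans:
            out.append(inner[0])                   # identical chunk: wrap once
        else:
            out.append(f"({l} {' '.join(inner)})")
        i = e
    out.extend(render(i, len(tokens)))
    return f"({root} {' '.join(out)})"


tokens = [("Sinh_viên", "N"), ("mới", "A"), ("đọc", "V"), ("sách", "N")]
base = ["B-NP", "I-NP", "O", "B-NP"]         # stand-in for base-level model output
upper = ["B-NP", "I-NP", "B-VP", "I-VP"]     # stand-in for upper-level model output
result = combine(tokens, base, upper)
print(result)
```

When an upper-level chunk covers exactly the same tokens as a base-level chunk with the same label, the sketch emits a single node rather than a doubled bracket; whether the original method does the same is not stated in the text.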