Training-corpus quality evaluation and selection method oriented to statistical machine translation
A technique for statistical machine translation and quality evaluation, applied to the quality evaluation and selection of training corpora for statistical machine translation. It addresses problems such as unavailability and time-consuming, labor-intensive work.
Embodiment Construction
[0070] The present invention is described in further detail below with reference to the accompanying drawings.
[0071] The training-corpus quality evaluation and selection method for statistical machine translation according to the present invention comprises the following steps:
[0072] Automatic weight acquisition: a small-scale corpus is used to train the automatic weight acquisition model, yielding the weight of each feature in the linear quality-evaluation model together with the classification threshold;
[0073] Sentence-pair quality evaluation: the weights and classification threshold obtained above, together with the original large-scale parallel corpus, are taken as input; the linear sentence-pair quality-evaluation model classifies the large-scale parallel corpus and generates the corpus subsets;
[0074] Selection of high-quality corpus subsets: from the corpus subsets above, high-quality corpora are selected as the training data of the statisti...
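The sentence-pair evaluation and selection steps can be illustrated with a minimal Python sketch. It is not the patented implementation: the excerpt does not name the features, the number of subsets, or the weight acquisition model, so the two features (`length_ratio`, `non_empty`), the two-way split into "high" and "low", and the example weights and threshold are all hypothetical. The sketch only shows how a weighted linear score compared against a classification threshold can partition a parallel corpus.

```python
from typing import Callable, Dict, List, Sequence, Tuple

SentencePair = Tuple[str, str]          # (source sentence, target sentence)
Feature = Callable[[SentencePair], float]


def length_ratio(pair: SentencePair) -> float:
    """Hypothetical feature: shorter token count divided by longer one, in [0, 1]."""
    src_len = len(pair[0].split())
    tgt_len = len(pair[1].split())
    if src_len == 0 or tgt_len == 0:
        return 0.0
    return min(src_len, tgt_len) / max(src_len, tgt_len)


def non_empty(pair: SentencePair) -> float:
    """Hypothetical feature: 1.0 if both sides contain text, else 0.0."""
    return 1.0 if pair[0].strip() and pair[1].strip() else 0.0


def quality_score(pair: SentencePair,
                  features: Sequence[Feature],
                  weights: Sequence[float]) -> float:
    """Linear model: weighted sum of the feature values for one sentence pair."""
    return sum(w * f(pair) for w, f in zip(weights, features))


def partition_corpus(corpus: Sequence[SentencePair],
                     features: Sequence[Feature],
                     weights: Sequence[float],
                     threshold: float) -> Dict[str, List[SentencePair]]:
    """Classify every pair by comparing its score with the threshold."""
    subsets: Dict[str, List[SentencePair]] = {"high": [], "low": []}
    for pair in corpus:
        score = quality_score(pair, features, weights)
        label = "high" if score >= threshold else "low"
        subsets[label].append(pair)
    return subsets


# Assumed inputs: in the method described above, the weights and threshold
# would come from the automatic weight acquisition step, not be set by hand.
corpus = [
    ("the cat sits", "le chat est assis"),
    ("hello", ""),
    ("a b c d e f g h", "x"),
]
subsets = partition_corpus(corpus, [length_ratio, non_empty], [0.7, 0.3], 0.8)
training_data = subsets["high"]   # selected high-quality subset
```

Here only the first pair reaches the threshold; the pair with an empty target side and the pair with a very unbalanced length fall into the "low" subset and would be excluded from the training data.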