Graph-based machine translation data selection method and machine translation data selection system

A data selection and machine translation technology, applied in the field of data processing, that addresses the following problems of existing methods: they cannot give a probability distribution over all fields, and they ignore the commonality between fields instead of incorporating it into data selection.

Active Publication Date: 2017-11-28
GLOBAL TONE COMM TECH
3 Cites, 5 Cited by

AI Technical Summary

Problems solved by technology

[0005] Existing data selection technology uses the data of a specific field to train a model and then scores the data of the field to be divided. The main defect of this approach is that it assigns each sentence pair a single value representing the probability that the pair belongs to one specific field, and so ignores the commonality between fields. In fact, some sentence pairs can be assigned to several fields at the same time: for example, a sentence in a news report that describes sports can be classified into both the news field and the sports field. Because the existing method assumes from the outset that the output is the probability of a single field, it cannot give the probability distribution over all fields based on the given number of fields and some labeled field data, and it cannot take the commonality between fields into account in data selection.



Examples


Embodiment Construction

[0032] To make the object, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the examples. The specific embodiments described here only explain the present invention and do not limit it.

[0033] Existing data selection technology uses the data of a specific field to train a model and then scores the data of the field to be divided. The main defect of this approach is that it assigns each sentence pair a single value representing the probability that the pair belongs to a certain field, and so ignores the commonality between fields. In fact, some sentence pairs can be assigned to several fields at the same time. For example, if a sentence in the news describes information about sports, then this sentence can be classified into the news field, and ...



Abstract

The invention belongs to the technical field of data processing and discloses a graph-based machine translation data selection method and a machine translation data selection system. The method first builds an undirected graph, then performs label propagation, and finally performs data selection according to the probability distribution over fields obtained for each node after label propagation. This improves on existing machine translation data selection methods, which can only select data for one field and ignore the commonality between fields: for the data to be divided, the method gives the probability distribution over all fields based on the given number of fields and some labeled field data, and brings the commonality between fields into the scope of data selection.
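The three steps named in the abstract (build an undirected graph, propagate labels, select by per-node field distribution) can be illustrated with a minimal sketch. The text available here does not specify the node features, the edge weights, the propagation rule, or the selection threshold, so everything concrete below is an assumption made for illustration: bag-of-words cosine similarity as the edge weight, a simple clamped iterative averaging as the propagation rule, and the hypothetical names `build_graph`, `propagate`, `select`, `threshold`, and `min_prob`.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_graph(sentences, threshold=0.1):
    """Step 1 (assumed form): undirected weighted graph as a symmetric matrix."""
    bows = [Counter(s.lower().split()) for s in sentences]
    n = len(bows)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = cosine(bows[i], bows[j])
            if w >= threshold:
                weights[i][j] = weights[j][i] = w  # same weight in both directions
    return weights


def propagate(weights, seeds, num_fields, iterations=50):
    """Step 2 (assumed rule): each unlabeled node repeatedly takes the
    weighted average of its neighbours' field distributions.
    `seeds` maps node index -> field index; seed nodes stay clamped."""
    n = len(weights)
    dist = [[1.0 / num_fields] * num_fields for _ in range(n)]
    for i, f in seeds.items():
        dist[i] = [1.0 if k == f else 0.0 for k in range(num_fields)]
    for _ in range(iterations):
        new = []
        for i in range(n):
            total = sum(weights[i])
            if i in seeds or total == 0.0:
                new.append(dist[i])  # labeled or isolated node: unchanged
                continue
            new.append([
                sum(weights[i][j] * dist[j][k] for j in range(n)) / total
                for k in range(num_fields)
            ])
        dist = new
    return dist


def select(dist, field, min_prob=0.3):
    """Step 3 (assumed rule): keep every node whose probability for `field`
    reaches `min_prob`; one node may be selected for several fields."""
    return [i for i, p in enumerate(dist) if p[field] >= min_prob]


# Toy data: field 0 = sports, field 1 = news. Sentences 0 and 1 are labeled.
sentences = [
    "football match score goal",
    "parliament election vote law",
    "football goal election vote",
    "match score goal",
]
graph = build_graph(sentences)
dist = propagate(graph, seeds={0: 0, 1: 1}, num_fields=2)
sports = select(dist, field=0)
news = select(dist, field=1)
```

In this toy run, sentence 2 shares vocabulary with both labeled sentences and ends up selected for both fields, which is the kind of cross-field commonality the abstract says single-field scoring ignores. Lowering or raising the assumed `min_prob` trades coverage for purity of the selected data.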

Description

Technical field

[0001] The invention belongs to the technical field of data processing, and in particular relates to a graph-based machine translation data selection method.

Background art

[0002] Machine translation is a process of translating one natural language into another natural language using machine learning techniques. As an important branch of computational linguistics, it involves cognitive science, linguistics and other disciplines, and is one of the ultimate goals of artificial intelligence.

[0003] Existing machine translation uses data-driven techniques. Therefore, in theory, as the amount of data increases, the performance of the machine translation system can also be improved. However, when the training data and the source of the corpus to be translated are very different, the translation performance will often be seriously degraded. For example, the translation system trained with the corpus in the news field is obviously not suitable for transla...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/28, G06K9/62
CPC: G06F40/58, G06F18/23
Inventor: 汪一鸣, 程国艮, 宗浩
Owner: GLOBAL TONE COMM TECH