Phrase mining method and device

A phrase mining technique for corpora, applied in the field of machine translation. It addresses problems such as inaccurate translation results, difficult quality control, and failure to meet users' application needs, with the aim of achieving high translation quality and improving coverage.

Active Publication Date: 2017-12-12
阿里巴巴(中国)网络技术有限公司

AI Technical Summary

Problems solved by technology

[0003] However, practice has shown that even with automatic learning from large-scale data, the quality of statistical machine translation output remains hard to control. In particular, even for data that already has an accurate translation, the output of statistical machine translation may be inaccurate, and it therefore cannot meet users' actual application needs.




Embodiment Construction

[0019] Because of these problems with statistical machine translation, the related art further proposes a machine translation method based on translation memory. A translation memory is a language database that stores original texts together with their translations. By storing accurate translations of terms in advance, it lets users search directly for an existing accurate translation.
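As a rough illustration of the store-and-search idea in [0019], the following minimal sketch models a translation memory as an exact-match lookup table. The class name `TranslationMemory`, the normalization rule, and the sample entry are assumptions made for illustration only; the source does not describe how entries are stored or matched.

```python
from typing import Dict, Optional


class TranslationMemory:
    """Toy translation memory: exact-match lookup of stored translations."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        # Assumption: matching ignores case and extra whitespace.
        return " ".join(text.lower().split())

    def add(self, source: str, target: str) -> None:
        """Store a vetted translation for a source term or sentence."""
        self._entries[self._normalize(source)] = target

    def lookup(self, source: str) -> Optional[str]:
        """Return the stored translation, or None if nothing matches."""
        return self._entries.get(self._normalize(source))


tm = TranslationMemory()
tm.add("power bank", "移动电源")  # illustrative entry, not from the source

hit = tm.lookup("  Power   Bank ")  # matches after normalization
miss = tm.lookup("phone case")      # not stored, so no result
```

A production system would typically add fuzzy matching and fall back to machine translation on a miss; this sketch leaves both out.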

[0020] The translation memory function described above can be realized by establishing a translation memory system. Such a system may include a translation template database, a terminology database, and recurring sentences. The terminology database stores a large number of words, phrases, and other terms used to describe products, services, or industry concepts. Improving the terminology database helps ensure more accurate and more consistent translations.

[0021] Therefore, how to obtain valuable and high-quality phrases is an important factor in creating ...



Abstract

The invention provides a phrase mining method and device. The method comprises: extracting a candidate phrase set from an original corpus through a pre-configured combined strategy, wherein the candidate phrase set comprises a plurality of candidate phrases and each candidate phrase corresponds to at least one sub-strategy in the combined strategy; and screening, from the candidate phrase set, the phrases that satisfy a preset quality condition. The method and device expand the coverage of the candidate phrase set, which avoids losing potential high-quality phrases and allows high-quality phrases to be mined correctly.
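The two steps in the abstract, candidate extraction through a combined strategy followed by quality screening, can be sketched as below. This is a minimal sketch under stated assumptions: the two sub-strategies (`frequent_bigrams`, `stopword_chunks`), the stopword list, the sample corpus, and the frequency-based quality condition are all hypothetical stand-ins, since the abstract does not name the actual sub-strategies or the quality condition.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Set

Sentences = List[List[str]]
STOPWORDS = {"the", "a", "of", "for", "and", "with"}  # illustrative only


def frequent_bigrams(sentences: Sentences, min_count: int = 2) -> Set[str]:
    """Hypothetical sub-strategy: adjacent word pairs seen at least min_count times."""
    counts = Counter(
        " ".join(s[i:i + 2]) for s in sentences for i in range(len(s) - 1)
    )
    return {phrase for phrase, c in counts.items() if c >= min_count}


def stopword_chunks(sentences: Sentences) -> Set[str]:
    """Hypothetical sub-strategy: multi-word runs delimited by stopwords."""
    phrases: Set[str] = set()
    for s in sentences:
        run: List[str] = []
        for tok in s + [None]:  # None is a sentinel that flushes the last run
            if tok is None or tok in STOPWORDS:
                if len(run) >= 2:
                    phrases.add(" ".join(run))
                run = []
            else:
                run.append(tok)
    return phrases


def extract_candidates(
    sentences: Sentences, strategies: Dict[str, Callable[[Sentences], Set[str]]]
) -> Dict[str, Set[str]]:
    """Union the sub-strategy outputs, recording which ones produced each phrase."""
    candidates: Dict[str, Set[str]] = defaultdict(set)
    for name, strategy in strategies.items():
        for phrase in strategy(sentences):
            candidates[phrase].add(name)
    return dict(candidates)


def phrase_frequency(phrase: str, sentences: Sentences) -> int:
    """Count occurrences of the phrase as a contiguous token sequence."""
    toks = phrase.split()
    n = len(toks)
    return sum(
        1 for s in sentences for i in range(len(s) - n + 1) if s[i:i + n] == toks
    )


def screen(candidates: Dict[str, Set[str]], sentences: Sentences, min_freq: int) -> Set[str]:
    """Assumed quality condition: the phrase occurs at least min_freq times."""
    return {p for p in candidates if phrase_frequency(p, sentences) >= min_freq}


corpus = [
    "wireless phone charger for the car",
    "fast wireless phone charger with cable",
    "leather phone case",
]
sentences = [line.split() for line in corpus]
candidates = extract_candidates(
    sentences,
    {"frequent_bigrams": frequent_bigrams, "stopword_chunks": stopword_chunks},
)
selected = screen(candidates, sentences, min_freq=2)
```

In this toy corpus, "wireless phone charger" is produced only by the chunk-based sub-strategy, so a bigram-only extractor would never have offered it to the screening step. That is the coverage benefit the abstract attributes to combining sub-strategies.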

Description

Technical field

[0001] The present application relates to the technical field of machine translation, and in particular to a phrase mining method and device.

Background

[0002] Machine translation (also known as automatic translation) is the process of using a computer to convert text in one natural language (the source language) into another natural language (the target language). The related art proposes statistical machine translation, in which a large parallel corpus is analyzed by statistical methods and translation is performed by a machine translation model built from that analysis.

[0003] However, practice has shown that even with automatic learning from large-scale data, the quality of statistical machine translation output remains hard to control. In particular, even for data that already has an accurate translation, the output of statistical machine translation may be inaccurate, so that it cannot meet the ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/28
CPC: G06F40/289, G06F40/47
Inventor: 史黎鑫, 张海波, 赵宇, 骆卫华, 林锋, 卞华明, 管陶然, 刘禹
Owner: 阿里巴巴(中国)网络技术有限公司