
BPE encoding method and system based on Chinese subword unit, and machine translation system

An encoding method and encoding system, applied in the field of computer software, that address the low readability of translated text and the resulting degradation of translation quality.

Status: Inactive. Publication Date: 2018-12-18
GLOBAL TONE COMM TECH

AI Technical Summary

Problems solved by technology

[0003] To sum up, the problem with the existing technology is as follows: when a sentence to be translated contains words that are not in the vocabulary of the encoder-decoder framework, UNK tokens are generated in the translation, which lowers the readability of the translation and degrades translation quality.
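The UNK problem can be illustrated with a minimal sketch. The vocabulary, the example sentence, and the names `UNK` and `encode` are assumptions made for illustration; they do not come from the patent text.

```python
UNK = "<unk>"  # hypothetical placeholder symbol for out-of-vocabulary words


def encode(tokens, vocab):
    """Replace every token that is missing from the vocabulary with UNK."""
    return [tok if tok in vocab else UNK for tok in tokens]


# Toy word-level vocabulary (assumed); "机器" is deliberately left out.
vocab = {"我", "喜欢", "翻译"}
sentence = ["我", "喜欢", "机器", "翻译"]

encoded = encode(sentence, vocab)
print(encoded)  # ['我', '喜欢', '<unk>', '翻译']
```

With a fixed word-level vocabulary, the information carried by the missing word is lost before translation starts, which is why the output then contains UNK.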



Embodiment Construction

[0019] To make the object, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the examples. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.

[0020] The invention improves the existing BPE encoding method. It provides a better and more theoretically grounded method for generating Chinese BPE codes, so that Chinese BPE codes can solve the problem of Chinese unregistered (out-of-vocabulary) words. It makes full use of Chinese word information while avoiding the shortcomings of the traditional BPE encoding method. Because the Wubi input method builds Chinese characters from character roots, a Chinese character can easily be converted into the English letters that correspond to its Wubi roots, thus solving the problem of Chinese splitting and effectively reducing the occurrence of Chines...
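The character-to-letter conversion described above might look like the following sketch. The two-entry table, the name `to_wubi`, and the choice to pass unknown characters through unchanged are all assumptions for illustration; a real system would load a complete Wubi dictionary, and the source does not specify how character boundaries are marked.

```python
# Toy table: Chinese character -> letters of its Wubi roots.
# Illustrative assumption only; replace with a full Wubi dictionary in practice.
WUBI_TABLE = {
    "明": "je",  # roots 日 (j) + 月 (e)
    "好": "vb",  # roots 女 (v) + 子 (b)
}


def to_wubi(text, table=WUBI_TABLE):
    """Map each character to its root letters; keep unknown characters as-is."""
    return [table.get(ch, ch) for ch in text]


codes = to_wubi("明好")
print(codes)  # ['je', 'vb']
```

Once characters are expressed as letter sequences, they can be segmented with the same subword machinery that is applied to alphabetic languages.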


Abstract

The invention belongs to the technical field of computer software and discloses a BPE encoding method and system based on Chinese sub-character units, and a machine translation system. The method comprises the steps of: splitting Chinese characters according to the Wubi (five-stroke) character-root input method; breaking the words in a sentence down into smaller, more common subword units; and translating an unknown word by translating its subword parts. The invention processes Chinese characters in an additional step before the BPE encoding step. As a result, in practical application, the parameter scale and running time of the invention are comparable to those of existing BPE, and the practical complexity is not increased excessively.
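The subword step can be sketched with a generic byte-pair-encoding learner applied to letter strings. This is standard BPE, not the patent's exact procedure; the function names and the toy inputs (letter strings standing in for Wubi-encoded words) are assumptions.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Merge every adjacent occurrence of `pair` in a tuple of symbols."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merges, most frequent adjacent pair first."""
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = Counter()
        for symbols, freq in corpus.items():
            merged[merge_pair(symbols, best)] += freq
        corpus = merged
    return merges


def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy letter strings standing in for Wubi-encoded words (assumed data).
merges = learn_bpe(["jevb", "jevb", "jeq"], num_merges=1)
print(merges)                    # [('j', 'e')]
print(apply_bpe("jex", merges))  # ['je', 'x']
```

The string "jex" never appears in the training data, yet it is still segmented into the known unit "je" plus a single letter, so no UNK symbol is needed.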

Description

Technical field

[0001] The invention belongs to the technical field of computer software, and in particular relates to a BPE encoding method and system based on Chinese subword units, and a machine translation system.

Background

[0002] The technology currently in common use in the industry is as follows. Machine translation is the process of using computer algorithms to automatically translate a sentence in a source language into a sentence in a target language. It is a research direction of artificial intelligence with significant scientific and practical value. As globalization deepens and the Internet develops rapidly, machine translation technology plays an increasingly important role in political, economic, social, and cultural exchanges at home and abroad. With the improvement of computer computing power and the application of big data, deep learning ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/22, G06F17/28
CPC: G06F40/126, G06F40/58
Inventors: 汪一鸣, 谭新, 熊德意, 程国艮
Owner: GLOBAL TONE COMM TECH