An ancient Chinese automatic translation method based on multi-feature fusion

An automatic translation technology based on multi-feature fusion, applied in natural language translation and special data processing applications, among other areas. It addresses problems that affect translation quality and improves translation performance, model performance, and sentence alignment accuracy.

Active Publication Date: 2019-04-26
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

Statistical machine translation requires preprocessing steps such as word alignment, phrase extraction, and syntactic analysis. Errors introduced at each stage accumulate and degrade the quality of the final translation.

Examples

Embodiment

[0054] Original text in classical Chinese (literal rendering): And to go hunting for ten days without returning, how can the feelings of those inside and outside the court bear it?

[0055] Modern translation: Moreover, if you go hunting for ten days without returning, how can the people inside and outside the court bear it?

[0056] Segment the sentence with the open-source Chinese word segmentation tool Jieba, initializing its user dictionary with a classical Chinese vocabulary list. The segmentation result is as follows:

[0057] And | hunting | ten days | not returning | , | inside and outside the court | feelings | how to bear | ?
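As a rough illustration of how a user dictionary changes a segmentation result, the sketch below uses forward maximum matching with the standard library only. It is not Jieba's algorithm or API. The function `segment`, the vocabulary entries, and the example sentence (a line from the Analects) are assumptions chosen for illustration and do not come from the patent.

```python
def segment(sentence, vocabulary, max_len=4):
    """Forward maximum matching: at each position take the longest
    vocabulary entry that matches, else fall back to one character."""
    tokens = []
    i = 0
    while i < len(sentence):
        longest = min(max_len, len(sentence) - i)
        for size in range(longest, 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical inputs, not taken from the patent.
SENTENCE = "学而时习之，不亦说乎？"
CLASSICAL_VOCAB = {"不亦", "说乎"}

without_dict = segment(SENTENCE, set())
with_dict = segment(SENTENCE, CLASSICAL_VOCAB)

print(" | ".join(without_dict))
print(" | ".join(with_dict))
```

With an empty vocabulary every character becomes its own token; adding the two classical entries keeps them together as units, which is the effect a user dictionary is meant to have.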

[0058] Using the word-topic conditional probability distribution generated by the LDA topic model, map the segmented classical Chinese word sequence to its corresponding topic sequence, as follows:

[0059] And/23 | hunting/25 | ten days/10 | not returning/11 | ,/26 | inside and outside the court/25 | feelings/19 | how to bear/39 | ?/24
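One plausible way to obtain such a topic sequence is sketched below. It assumes the word-topic conditional probability P(topic | word) is derived from the topic-word distribution P(word | topic) by Bayes' rule, and that each token is tagged with its most probable topic. The patent text shown here does not state the selection rule, so the argmax step is an assumption. All probabilities and function names are hypothetical, and the tokens are English glosses rather than the original classical Chinese words.

```python
def word_topic_conditional(topic_word, topic_prior):
    """P(topic | word) = P(word | topic) P(topic) / sum over topics."""
    joint = {}
    for topic, word_probs in topic_word.items():
        for word, prob in word_probs.items():
            joint.setdefault(word, {})[topic] = prob * topic_prior[topic]
    conditional = {}
    for word, by_topic in joint.items():
        total = sum(by_topic.values())
        conditional[word] = {t: v / total for t, v in by_topic.items()}
    return conditional


def topic_sequence(tokens, conditional, unknown_topic=-1):
    """Tag each token with its most probable topic (assumed rule)."""
    tagged = []
    for token in tokens:
        dist = conditional.get(token)
        if not dist:
            tagged.append((token, unknown_topic))  # word unseen by the model
        else:
            tagged.append((token, max(dist, key=dist.get)))
    return tagged


# Hypothetical topic-word probabilities for two of the topic ids above.
topic_word = {
    10: {"ten days": 0.5, "hunting": 0.1},
    25: {"hunting": 0.4, "ten days": 0.1},
}
topic_prior = {10: 0.5, 25: 0.5}

conditional = word_topic_conditional(topic_word, topic_prior)
sequence = topic_sequence(["hunting", "ten days", "unseen"], conditional)
print(" | ".join(f"{word}/{topic}" for word, topic in sequence))
```

Tokens that the topic model never saw get a placeholder topic id here; how the patent handles such tokens is not stated in this excerpt.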

[0060] Take "...

Abstract

The invention discloses an ancient Chinese automatic translation method based on multi-feature fusion. The method comprises the following steps:
1) collecting classical Chinese texts, modern Chinese translations of those texts, a classical Chinese vocabulary list, and modern Chinese monolingual corpus data;
2) cleaning the data and constructing an ancient-modern Chinese parallel corpus using a sentence alignment method;
3) segmenting the modern and classical texts with a Chinese word segmentation tool;
4) performing topic modeling on the classical Chinese corpus to generate the topic-word distribution and the word-topic conditional probability distribution;
5) training a modern Chinese language model on the modern Chinese monolingual corpus, and obtaining an alignment dictionary from the parallel corpus;
6) on the basis of an attention-based recurrent neural network translation model, fusing statistical machine translation features such as the language model and the alignment dictionary, and training the model on the parallel sentence pairs and word-topic sequences;
7) translating a classical Chinese text input by the user into modern Chinese with the model trained in step 6).
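The abstract does not say how the language model and alignment dictionary are fused with the neural translation model in step 6). One common approach, shown here purely as an assumed sketch, is a log-linear combination that rescores candidate translations. The feature definitions, weights, scores, and all names below are hypothetical, and the tokens are English glosses.

```python
def dictionary_coverage(source_tokens, candidate_tokens, align_dict):
    """Fraction of source tokens that have at least one of their
    dictionary translations present in the candidate."""
    if not source_tokens:
        return 0.0
    candidate = set(candidate_tokens)
    hits = sum(
        1 for token in source_tokens
        if candidate & align_dict.get(token, set())
    )
    return hits / len(source_tokens)


def fused_score(nmt_logprob, lm_logprob, coverage, weights):
    """Log-linear fusion: weighted sum of the feature values."""
    return (weights["nmt"] * nmt_logprob
            + weights["lm"] * lm_logprob
            + weights["dict"] * coverage)


# Hypothetical data for illustration only.
source = ["hunting", "ten days"]
align_dict = {"hunting": {"go hunting"}, "ten days": {"for ten days"}}
weights = {"nmt": 1.0, "lm": 0.5, "dict": 1.0}
candidates = [
    {"tokens": ["go hunting", "for ten days"], "nmt": -2.0, "lm": -3.0},
    {"tokens": ["go hunting", "for a while"], "nmt": -1.8, "lm": -3.0},
]

for cand in candidates:
    coverage = dictionary_coverage(source, cand["tokens"], align_dict)
    cand["score"] = fused_score(cand["nmt"], cand["lm"], coverage, weights)

best = max(candidates, key=lambda cand: cand["score"])
print(best["tokens"], best["score"])
```

In this toy case the second candidate has the higher translation-model score alone, but the first wins after fusion because it covers both source words according to the alignment dictionary.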

Description

Technical field

[0001] The invention relates to topic models, language models, and machine translation within natural language processing, and in particular to a multi-feature fusion method for automatic translation from ancient to modern Chinese.

Background art

[0002] China has a long history and has left behind a vast body of ancient books. These books bear witness to the history of Chinese civilization and record and pass on a rich historical and cultural heritage. However, ancient books are generally written in classical Chinese, which is concise and quite different from the vernacular used today, making them difficult for ordinary readers to understand. For this reason, scholars of ancient Chinese began translating the classics, but a small number of scholars alone cannot translate all ancient books.

[0003] Machine translation (MT) is the use of computers to...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/28
CPC: G06F40/58
Inventors: 张引, 陈琴菲
Owner: ZHEJIANG UNIV