Syntactic-structure-fused Tibetan-Chinese neural machine translation method

A machine translation and syntactic structure technology, applied to natural language translation, neural architectures, biological neural network models, etc., which addresses the problems of syntactic-structure differences between the two languages and the training difficulties they cause.

Active Publication Date: 2021-04-06
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to address the problems and deficiencies in the prior art: in Tibetan-Chinese neural machine translation, differences in the syntactic structures of the two languages make training difficult. A new Tibetan-Chinese neural machine translation method fusing syntactic structure is therefore proposed.

Method used



Examples


Embodiment 1

[0152] As shown in Figure 1, a Tibetan-Chinese neural machine translation method fusing syntactic structure includes the following steps:

[0153] Step 1: Perform the conversion from the Tibetan phrase tree to the Tibetan dependency tree.

[0154] Specifically, this includes: tagging the Tibetan phrase tree, designing the Tibetan phrase table and the dependency table, setting the priorities of the dependency relations, and automatically completing the rule-based conversion from the Tibetan phrase tree to the dependency tree, as sketched below. The conversion process is the same as in step 1.1.
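To make the rule-based conversion concrete, the following Python sketch shows one way a head-priority table can drive a phrase-tree to dependency-tree conversion. It is a minimal illustration only: the phrase labels and the HEAD_PRIORITY entries are hypothetical placeholders, not the Tibetan phrase table, dependency table, or priority settings defined by the invention.

```python
# Minimal sketch: rule-based phrase-tree -> dependency-tree conversion.
# Labels and priority rules below are illustrative placeholders only.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PhraseNode:
    label: str                      # phrase label, e.g. "NP", "VP" (placeholder labels)
    word: Optional[str] = None      # set on leaf (token) nodes
    children: List["PhraseNode"] = field(default_factory=list)

# Hypothetical head-priority rules: for each phrase label, candidate head
# child labels are listed in priority order.
HEAD_PRIORITY = {
    "S":  ["VP", "NP"],
    "VP": ["V", "VP"],
    "NP": ["N", "NP"],
}

def head_of(node: PhraseNode) -> PhraseNode:
    """Return the lexical head (a leaf) of a phrase node."""
    if node.word is not None:
        return node
    for label in HEAD_PRIORITY.get(node.label, []):
        for child in node.children:
            if child.label == label:
                return head_of(child)
    # fall back to the last child if no rule matches
    return head_of(node.children[-1])

def to_dependencies(node: PhraseNode, deps=None):
    """Attach the head of every non-head child to the head of its parent phrase."""
    if deps is None:
        deps = []
    if node.word is not None:
        return deps
    h = head_of(node)
    for child in node.children:
        ch = head_of(child)
        if ch is not h:
            deps.append((h.word, ch.word))   # (head token, dependent token)
        to_dependencies(child, deps)
    return deps
```

In this sketch the priority table plays the role of the patent's dependency-priority settings: it decides which child of each phrase supplies the head word, and every other child's head is attached to it as a dependent.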

[0155] Step 2: Relative position encoding.

[0156] Specifically, this includes: training the dependency analysis model and generating the Tibetan dependency tree corresponding to the Tibetan corpus, so as to obtain the position representations of the Tibetan corpus and the dependency relations, wherein generating the Tibetan dependency tree corresponding to the Tibetan corpus is the ...
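The structural position representation can be illustrated with a small sketch that derives pairwise relative positions from a dependency tree by measuring tree distance (the number of dependency arcs on the path between two tokens). Whether the invention uses exactly this distance, or a signed or clipped variant of it, is an assumption of the example; the heads array and toy sentence are hypothetical.

```python
# Minimal sketch: pairwise relative positions as dependency-tree distances.

from collections import deque

def tree_distances(heads):
    """heads[i] = index of the head of token i, or -1 for the root.
    Returns an n x n matrix of shortest-path distances in the dependency tree."""
    n = len(heads)
    # build an undirected adjacency list over the dependency arcs
    adj = [[] for _ in range(n)]
    for i, h in enumerate(heads):
        if h >= 0:
            adj[i].append(h)
            adj[h].append(i)
    dist = [[0] * n for _ in range(n)]
    for src in range(n):
        seen = {src}
        q = deque([(src, 0)])
        while q:
            node, d = q.popleft()
            dist[src][node] = d
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    q.append((nxt, d + 1))
    return dist

# Toy usage: a 4-token sentence whose root is token 2 and whose other tokens
# all attach directly to the root.
print(tree_distances([2, 2, -1, 2]))
```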

Embodiment 2

[0161] This embodiment uses a specific example to describe in detail the operation steps of the Tibetan-Chinese neural machine translation method fusing syntactic structure according to the present invention.

[0162] The processing flow of the Tibetan-Chinese neural machine translation method fusing syntactic structure is shown in Figure 1. As can be seen from Figure 1, the method includes the following steps:

[0163] Step 1: Convert the Tibetan phrase tree to a dependency tree, as shown in Figure 2. The corresponding Chinese translation of the example sentence is "the peasants saw many Han people when they were in Beijing". The Chinese gloss and label of each token, from left to right, are: farmers [agentive case] Beijing ['and' particle] when sitting, Han people see many [tense-aspect marker] [tense-aspect marker] [punctuation mark].

[0164] In the process of traversing the binary phrase tr...


Abstract

The invention relates to a Tibetan-Chinese neural machine translation method fusing syntactic structure, and belongs to the technical field of machine translation and feature fusion applications. The method aims to introduce the syntactic structures of the languages into a neural machine translation framework to help improve translation quality; by improving the Transformer's position encoding, a relative position encoding method is proposed to fuse syntactic structure information. The Transformer is optimized with a structural position encoding method based on dependency relations, ultimately improving the quality of Tibetan-Chinese neural machine translation. The method can effectively improve the efficiency with which the self-attention network associates the two languages, alleviate the problems caused by their differing syntactic structures, reduce the time complexity of the algorithm, overcome the loss of context information caused by the absolute position encoding used in traditional models, and reduce mistranslation and omission in low-resource neural machine translation.
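As a hedged illustration of how dependency-based relative positions can be fused into Transformer self-attention, the following NumPy sketch adds a learned bias, indexed by clipped tree distance, to the scaled dot-product attention scores. The single-head setup, the additive bias, and the clipping threshold are assumptions of this sketch, not the exact formulation claimed by the patent.

```python
# Minimal sketch: self-attention biased by a structural relative-position matrix.

import numpy as np

def relative_self_attention(X, W_q, W_k, W_v, rel_pos, rel_embed, max_dist=8):
    """X: (n, d) token representations; rel_pos: (n, n) integer relative positions
    (e.g. dependency-tree distances); rel_embed: (max_dist + 1,) learned scalar
    bias per clipped distance."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # standard scaled dot-product
    clipped = np.clip(rel_pos, 0, max_dist)          # clip long tree distances
    scores = scores + rel_embed[clipped]             # structural relative-position bias
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V

# Toy usage: 4 tokens, model width 8; rel_pos reuses the tree distances
# from the earlier dependency-distance sketch.
rng = np.random.default_rng(0)
n, d = 4, 8
X = rng.normal(size=(n, d))
W_q = W_k = W_v = rng.normal(size=(d, d)) * 0.1
rel_pos = np.array([[0, 2, 1, 2],
                    [2, 0, 1, 2],
                    [1, 1, 0, 1],
                    [2, 2, 1, 0]])
rel_embed = rng.normal(size=(9,)) * 0.1
out = relative_self_attention(X, W_q, W_k, W_v, rel_pos, rel_embed)
print(out.shape)    # (4, 8)
```

Because the bias depends only on the clipped tree distance rather than on absolute token indices, tokens that are syntactically close attend to each other consistently regardless of how far apart they sit in the surface word order.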

Description

Technical field

[0001] The present invention relates to a Tibetan-Chinese neural machine translation method fusing syntactic structure, in particular to a self-attention Tibetan-Chinese neural machine translation method based on relative position encoding of dependency syntactic structures, and belongs to the technical field of machine translation and feature fusion applications.

Background technique

[0002] In recent years, neural machine translation has achieved the best performance on multiple translation tasks, and translation models trained on large-scale corpora can be comparable to human translation. Neural machine translation differs from traditional statistical machine translation in that it no longer uses a rule-based approach but instead uses deep learning. Specifically, on a bilingual corpus and within the end-to-end "encoder-decoder" framework, the correspondence between the chunks obtained after word segmentation is calculated. ...

Claims


Application Information

IPC(8): G06F40/58; G06F40/211; G06N3/04
CPC: G06F40/58; G06F40/211; G06N3/044
Inventors: 史树敏, 罗丹, 武星, 苏超, 黄河燕
Owner: BEIJING INSTITUTE OF TECHNOLOGY