Parallelization word alignment method based on bilingual word embedding technology

A word alignment method based on bilingual word embeddings, applied in natural language translation, natural language data processing, special data processing applications, and similar fields, that addresses problems such as reduced overall word alignment efficiency.

Active Publication Date: 2017-12-19
NANJING UNIV
6 Cites, 15 Cited by

AI Technical Summary

Problems solved by technology

Traditional word alignment algorithms must generate a large-scale word translation probability table. This data...

Examples

Embodiment Construction

[0052] The present invention is further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that these embodiments are intended only to illustrate the invention and not to limit its scope. After reading this disclosure, modifications in various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of this application.

[0053] Deep learning is a machine learning method that parses data and extracts features without supervision by using a computer to simulate the neural network structure of the human brain. In recent years, with the wide application of deep learning in natural language processing, word embedding technology based on deep learning has emerged. Word embedding converts words into low-dimensional word vectors through neural network training, and uses word ve...

Abstract

The invention discloses a parallel word alignment method based on bilingual word embedding. The method obtains a bilingual word vector table on the Spark platform using the MPS-Neg bilingual word embedding technique, derives a word alignment model from that table, and then runs the word alignment task in a distributed fashion. The alignment result and MPS-Neg are then used to update the bilingual word vector table, and the word alignment and table update steps are repeated until a specified number of iterations is reached. The method addresses the problem that existing word alignment methods do not adapt well to word alignment tasks on large-scale corpora.
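The loop described in the abstract (align, update the vector table, repeat) can be sketched in plain Python. This is only an illustration under stated assumptions: the abstract does not specify how alignments are scored or how MPS-Neg updates vectors, so the cosine-similarity linking rule and the `update_vectors` function below are hypothetical stand-ins, and all function names are invented for this example. In the patented method the per-sentence alignment step would run as a distributed task on Spark rather than in a local list comprehension.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def align_sentence(src, tgt, src_vecs, tgt_vecs):
    """Link each source word to the most similar target position (assumed rule)."""
    return [
        max(range(len(tgt)), key=lambda j: cosine(src_vecs[s], tgt_vecs[tgt[j]]))
        for s in src
    ]

def update_vectors(corpus, alignments, src_vecs, tgt_vecs, rate=0.1):
    """Hypothetical stand-in for the MPS-Neg update: pull aligned pairs together."""
    new_src = {w: list(v) for w, v in src_vecs.items()}
    new_tgt = {w: list(v) for w, v in tgt_vecs.items()}
    for (src, tgt), links in zip(corpus, alignments):
        for i, j in enumerate(links):
            s, t = src[i], tgt[j]
            for k in range(len(new_src[s])):
                delta = rate * (tgt_vecs[t][k] - src_vecs[s][k])
                new_src[s][k] += delta
                new_tgt[t][k] -= delta
    return new_src, new_tgt

def iterate_alignment(corpus, src_vecs, tgt_vecs, iterations=3):
    """Alternate alignment and vector-table updates for a fixed number of rounds."""
    alignments = []
    for _ in range(iterations):
        # Each sentence pair is aligned independently, so this step parallelizes.
        alignments = [align_sentence(s, t, src_vecs, tgt_vecs) for s, t in corpus]
        src_vecs, tgt_vecs = update_vectors(corpus, alignments, src_vecs, tgt_vecs)
    return alignments, src_vecs, tgt_vecs

# Toy data: one sentence pair with hand-picked 2-dimensional vectors.
corpus = [(["chat", "noir"], ["black", "cat"])]
src_vecs = {"chat": [1.0, 0.1], "noir": [0.1, 1.0]}
tgt_vecs = {"cat": [0.9, 0.2], "black": [0.2, 0.9]}
alignments, _, _ = iterate_alignment(corpus, src_vecs, tgt_vecs)
print(alignments)  # [[1, 0]]: "chat" -> "cat", "noir" -> "black"
```

Because alignment only needs the vector table, whose size depends on the vocabulary and the vector dimension, each worker could hold a copy of it while processing its own share of sentence pairs.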

Description

Technical field

[0001] The invention belongs to the field of computer natural language processing and parallel computing, and specifically relates to a word alignment method based on bilingual word embedding technology, implemented on the Spark platform.

Background

[0002] As one of the key technologies in machine translation, word alignment plays an important role in many natural language processing tasks. In 1993, Brown et al. decomposed the fundamental equation of machine translation into a language model and a translation model, and proposed five translation models, IBM Models 1 to 5. Since then, the IBM models have become the de facto standard for word alignment, and most subsequent word alignment research builds on them. Among these, the HMM-based word alignment method improves on IBM Model 2, and in practical applications the HMM word alignment model is commonly used in place of IBM Model 2....
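For context on the word translation probability table that the traditional approach relies on, the following is a minimal sketch of IBM Model 1 training with expectation-maximization. It is the textbook formulation, simplified by omitting the NULL word, and is not taken from the patent; the function name `ibm_model1` and the toy corpus are chosen for this example. The table `t` holds one entry per co-occurring (source word, target word) pair, which is why it becomes large on a big corpus.

```python
from collections import defaultdict

def ibm_model1(corpus, iterations=10):
    """Estimate t(f | e) with EM; corpus is a list of (source_words, target_words)."""
    src_vocab = {f for src, _ in corpus for f in src}
    # Uniform initialization over the source vocabulary.
    t = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned at all
        for src, tgt in corpus:
            for f in src:
                # E-step: distribute one unit of mass for f over the target words.
                norm = sum(t[(f, e)] for e in tgt)
                for e in tgt:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize expected counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)

corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
table = ibm_model1(corpus)
print(len(table), "entries in the translation table")
print(round(table[("das", "the")], 3))
```

Every EM iteration has to read and rewrite this whole table, so its size drives both the memory footprint and the data exchanged between iterations in a distributed setting.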

Application Information

IPC(8): G06F17/28, G06F17/27
CPC: G06F40/284, G06F40/58
Inventors: 袁春风, 黄宜华, 黄堃
Owner: NANJING UNIV