English-Burmese bilingual parallel sentence pair extraction method and device based on BiLSTM-CNN

A BiLSTM-CNN-based bilingual technique applied to English-Burmese parallel sentence pair extraction. It addresses the loss of syntactic structure information and the resulting difficulty of learning sentence representations, and it improves extraction accuracy.

Active Publication Date: 2019-11-05
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

Traditional methods often consider only the semantic information of a single language. In fact, different languages contain corresponding functional structures. Since the semantic expression of sentences is closely related to the s ...

Examples

Embodiment 1

[0043] Embodiment 1: As shown in Figures 1-7, where Figure 7 is the flowchart of the invention, the BiLSTM-CNN-based English-Burmese bilingual parallel sentence pair extraction method includes the following steps:

Step A: Pre-train an English-Burmese cross-lingual shared word embedding space, so that words with similar semantics in the two languages lie close together in the word vector space and the semantic vectors of sentence representations are related within the shared English-Burmese semantic space.

Step B: Apply function marking to each sentence and splice (concatenate) the syntactic structure information of each word onto its word vector, so that the syntactic differences between English and Burmese can be captured.

Step C: Use a BiLSTM to pass each word's information through the sentence in both the forward and reverse directions, obtaining the feature states produced at different time steps that contain context information, and then use the characteristics of CNN ...
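The splicing in Step B can be pictured as appending a small tag vector to each word vector. The following is a minimal sketch under stated assumptions, not the patent's implementation: the tag inventory `FUNCTION_TAGS`, the one-hot encoding, and the helper `splice_function_tags` are all hypothetical, since the source does not specify how the syntactic structure information is encoded.

```python
from typing import Dict, List

# Hypothetical function-tag inventory; the source does not list its tag set.
FUNCTION_TAGS: List[str] = ["SUBJ", "PRED", "OBJ", "OTHER"]
TAG_INDEX: Dict[str, int] = {tag: i for i, tag in enumerate(FUNCTION_TAGS)}


def one_hot(tag: str) -> List[float]:
    """Return a one-hot vector for a function tag (unknown tags map to OTHER)."""
    vec = [0.0] * len(FUNCTION_TAGS)
    vec[TAG_INDEX.get(tag, TAG_INDEX["OTHER"])] = 1.0
    return vec


def splice_function_tags(
    word_vectors: List[List[float]], tags: List[str]
) -> List[List[float]]:
    """Concatenate each word vector with the one-hot vector of its function tag."""
    if len(word_vectors) != len(tags):
        raise ValueError("each word needs exactly one function tag")
    return [list(vec) + one_hot(tag) for vec, tag in zip(word_vectors, tags)]


# Toy 3-dimensional embeddings for a three-word sentence (illustrative values).
sentence_vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
sentence_tags = ["SUBJ", "OBJ", "PRED"]
spliced = splice_function_tags(sentence_vectors, sentence_tags)
```

Each spliced vector has the original embedding dimensions followed by the tag dimensions, so a downstream encoder sees both the word's meaning and its assumed syntactic role at every time step.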

Abstract

The invention relates to an English-Burmese bilingual parallel sentence pair extraction method and device based on BiLSTM-CNN, and belongs to the technical field of natural language processing. The method comprises the following steps: first, bilingual word vectors are pre-trained with the MUSE tool; second, sentences are function-marked by exploiting the Burmese function words and particles that identify the subject, predicate, and object of a Burmese sentence, and the syntactic structure information of each word is spliced into its word vector; the sentences are then encoded with BiLSTM-CNN, and the output probability is used as the criterion for deciding whether a sentence pair is parallel. Following these steps, the BiLSTM-CNN-based English-Burmese bilingual parallel sentence pair extraction device is built through functional modularization. Compared with traditional bilingual parallel sentence pair recognition systems, the method and device are simpler. Experimental results show that the method and device outperform a baseline system on accuracy, recall, and other metrics.
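The final decision step described above, using an output probability as the criterion for keeping a pair, can be sketched as a threshold filter. This is an illustrative sketch only: `stand_in_probability` is a hypothetical placeholder that squashes the cosine similarity of mean word vectors through a logistic function, and it is not the BiLSTM-CNN encoder of the invention. The toy vectors, the `sharpness` value, and the `threshold` of 0.9 are likewise assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Average a sentence's word vectors into a single sentence vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def stand_in_probability(
    en_vectors: Sequence[Vector], my_vectors: Sequence[Vector], sharpness: float = 5.0
) -> float:
    """Placeholder scorer (NOT the BiLSTM-CNN): logistic of scaled cosine similarity."""
    sim = cosine(mean_vector(en_vectors), mean_vector(my_vectors))
    return 1.0 / (1.0 + math.exp(-sharpness * sim))


def extract_parallel_pairs(
    candidates: Sequence[Tuple[str, Sequence[Vector], Sequence[Vector]]],
    threshold: float = 0.9,
) -> List[Tuple[str, float]]:
    """Keep candidate pairs whose probability meets the (assumed) threshold."""
    kept = []
    for pair_id, en_vectors, my_vectors in candidates:
        probability = stand_in_probability(en_vectors, my_vectors)
        if probability >= threshold:
            kept.append((pair_id, probability))
    return kept


# Toy candidates: word vectors assumed to live in one shared embedding space.
candidates = [
    ("pair-1", [[1.0, 0.0], [0.8, 0.2]], [[0.9, 0.1], [1.0, 0.0]]),
    ("pair-2", [[1.0, 0.0], [0.8, 0.2]], [[-1.0, 0.0], [-0.9, 0.1]]),
]
selected = extract_parallel_pairs(candidates)
```

In this toy data only `pair-1` passes the filter, because its two sentence vectors point in nearly the same direction. In a real system the placeholder scorer would be replaced by the trained encoder's output probability, and the threshold would be tuned on held-out data.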

Description

Technical field

[0001] The invention relates to a BiLSTM-CNN-based English-Burmese bilingual parallel sentence pair extraction method and device, and belongs to the technical field of natural language processing.

Background

[0002] In natural language processing, the scale of parallel corpora plays an important role in improving machine translation performance. Burmese is a low-resource language: English-Burmese parallel corpora are severely scarce, and machine translation quality has not yet reached a practical level. The traditional ways of obtaining parallel corpora are manual translation and machine translation. The former is costly and inefficient, while the latter depends on the performance of the machine translation system and yields poor-quality output. Parallel corpora available on the Internet are relatively small, whereas comparable corpora are relatively large. How to use the large amount of English-Burmese comparable corpus on ...

Application Information

IPC(8): G06F17/27, G06N3/04, G06N3/08
CPC: G06N3/084, G06N3/045
Inventor: 毛存礼, 梁昊远, 余正涛, 张少宁, 张亚飞, 朱浩东
Owner: KUNMING UNIV OF SCI & TECH