Method and device for extracting English-Myanmar bilingual parallel sentence pairs based on BiLSTM-CNN

A bilingual parallel sentence pair extraction technology, applied in neural learning methods, semantic analysis, natural language data processing, etc. It addresses the unavoidable loss of syntactic structure information and the resulting difficulty of learning sentence representations, so as to improve accuracy.

Active Publication Date: 2021-02-05
KUNMING UNIV OF SCI & TECH
Cites: 7; Cited by: 0

AI Technical Summary

Problems solved by technology

Traditional methods often consider only the semantic information of a single language, although different languages contain corresponding functional structures. Because the semantic expression of a sentence is closely related to its syntactic structure, existing representation methods can retain word order information but cannot avoid losing syntactic structure information, which makes it difficult to learn sentence representations accurately.

Examples

Embodiment 1

[0043] Embodiment 1: As shown in Figures 1-7, the BiLSTM-CNN-based English-Myanmar bilingual parallel sentence pair extraction method (Figure 7 is a flowchart of the present invention) includes the following steps:
Step A: Pre-train an English-Myanmar cross-language shared word embedding space, so that words with similar semantics in the two languages are close in the word vector space and the semantic vectors of sentence representations are correlated within the shared English-Burmese semantic space.
Step B: Functionally mark the sentence by splicing the syntactic structure information of each word into its word vector, so that the syntactic differences between English and Burmese can be captured.
Step C: Use BiLSTM to pass the information of each word in the sentence in both the forward and reverse directions, obtaining feature states at different time steps that contain context information, and then use the characteristics of CNN ...
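
The splicing in Step B can be illustrated with a minimal Python sketch. It assumes the functional mark of each word is encoded as a one-hot vector and appended to the word vector; the tag set FUNCTION_TAGS, the helper names, and the toy 3-dimensional vectors are hypothetical and do not come from the patent text, which may encode the syntactic information differently.

```python
# Hypothetical functional tag set; the source does not list its tags.
FUNCTION_TAGS = ["SUBJ", "PRED", "OBJ", "OTHER"]


def one_hot(tag):
    """Return a one-hot list for a tag; raises ValueError for an unknown tag."""
    index = FUNCTION_TAGS.index(tag)
    return [1.0 if i == index else 0.0 for i in range(len(FUNCTION_TAGS))]


def splice_function_tags(word_vectors, tags):
    """Append the one-hot functional tag of each word to its word vector."""
    if len(word_vectors) != len(tags):
        raise ValueError("need exactly one tag per word")
    return [list(vec) + one_hot(tag) for vec, tag in zip(word_vectors, tags)]


# Toy vectors standing in for pre-trained shared embeddings (made-up values).
sentence_vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
sentence_tags = ["SUBJ", "OBJ", "PRED"]

spliced = splice_function_tags(sentence_vectors, sentence_tags)
for row in spliced:
    print(row)
```

Each spliced vector has the embedding dimension plus the number of tags, so an encoder placed after this step would take inputs of that combined width.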

Abstract

The invention relates to an English-Myanmar bilingual parallel sentence pair extraction method and device based on BiLSTM-CNN, and belongs to the technical field of natural language processing. The invention first pre-trains bilingual word vectors with the MUSE tool. It then uses Burmese function words and auxiliary words, which identify the subject, predicate, and object in Burmese, to functionally mark the sentence, splicing the syntactic structure information of each word into its word vector. Finally, it uses BiLSTM-CNN to encode the sentence and uses the output probability as the criterion for deciding whether two sentences form a parallel pair. A BiLSTM-CNN-based bilingual parallel sentence pair extraction device is built according to these steps. Compared with traditional bilingual parallel sentence pair recognition systems, the present invention is simpler. Experimental results show that the method and device outperform the baseline system on precision, recall, and other indicators, and that accuracy is generally improved.
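
The final decision step, using the output probability as the criterion for a parallel pair, might look like the following Python sketch. The function extract_parallel_pairs, the 0.5 threshold, the sentence identifiers, and the lookup-table scorer are all assumptions for illustration; in the described method the probability would come from the trained BiLSTM-CNN model, and the source does not state which threshold it uses.

```python
def extract_parallel_pairs(candidates, score_fn, threshold=0.5):
    """Keep candidate (english, burmese) pairs whose probability reaches threshold."""
    kept = []
    for english, burmese in candidates:
        probability = score_fn(english, burmese)
        if not 0.0 <= probability <= 1.0:
            raise ValueError("score_fn must return a probability in [0, 1]")
        if probability >= threshold:
            kept.append((english, burmese, probability))
    return kept


# Stand-in scorer: made-up probabilities keyed by placeholder sentence IDs.
# A real system would call the trained encoder and classifier here.
toy_scores = {
    ("en-1", "my-1"): 0.92,
    ("en-1", "my-2"): 0.08,
    ("en-2", "my-2"): 0.61,
}


def toy_score(english, burmese):
    return toy_scores[(english, burmese)]


pairs = extract_parallel_pairs(list(toy_scores), toy_score, threshold=0.5)
print(pairs)
```

Raising the threshold trades recall for precision: fewer candidate pairs are kept, and those kept carry higher model confidence.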

Description

Technical field

[0001] The invention relates to a BiLSTM-CNN-based English-Myanmar bilingual parallel sentence pair extraction method and device, and belongs to the technical field of natural language processing.

Background

[0002] In the field of natural language processing, the scale of parallel corpora plays an important role in improving the performance of machine translation. For Burmese, a low-resource language, English-Myanmar parallel corpora are severely scarce, and the quality of machine translation has not yet reached a practical level. Traditional methods of obtaining parallel corpora include manual translation and machine translation. The former is costly and inefficient, while the latter depends on machine translation performance and yields poor quality. Parallel corpora available on the Internet are relatively small, while comparable corpora are relatively large. How to use the large amount of English-Myanmar comparable corpus on ...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F40/211, G06F40/242, G06F40/30, G06F40/284, G06N3/04, G06N3/08
CPC: G06N3/084, G06N3/045
Inventors: 毛存礼, 梁昊远, 余正涛, 张少宁, 张亚飞, 朱浩东
Owner: KUNMING UNIV OF SCI & TECH