Chinese-Myanmar Bilingual Parallel Sentence Pair Extraction Method and Device Based on Pivot Language

A pivot-language and parallel-sentence-pair technology, applied in neural learning methods, natural language data processing, semantic analysis, etc., that can solve problems such as parallel data extraction.

Active Publication Date: 2020-09-08
KUNMING UNIV OF SCI & TECH
3 Cites, 0 Cited by
AI Technical Summary

Problems solved by technology

[0003] The present invention provides a pivot-language-based method and device for extracting Chinese-Burmese bilingual parallel sentence pairs. It addresses the problem of Chinese-Burmese parallel data extraction by using English as the pivot language, and the extracted sentence pairs provide a data foundation for subsequent natural language processing work.

Examples

Embodiment 1

[0041] Embodiment 1: As shown in Figures 1-4, the pivot-language-based Chinese-Burmese bilingual parallel sentence pair extraction method proceeds through the following steps:

[0042] Step1. Use a denoising autoencoder (DAE) to obtain representation vectors for sentences in the three languages: Chinese, English, and Burmese;

[0043] Step1.1. Noise is added to the input by a noise function, and the model is trained to reconstruct the noise-free input, so that it learns representations reflecting the basic characteristics of the input data. The noise function either deletes part of the sentence or disturbs the order of its words. It is written N(S|P_0, P_x), where S is a sentence and P_0 and P_x are both probabilities in [0,1].

[0044] Step1.1.1. For each word in each sentence, the noise function N(S|P_0, P_x) deletes the word with probability P_0;

[0045] Step1.1.2. For each pair of non-overlapping bigrams in th...
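The noise function of Step1.1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `add_noise` and its parameters are hypothetical, and because the text of Step1.1.2 is truncated, the bigram-swap rule (swap each non-overlapping bigram with probability P_x) is an assumption based on the statement that the noise function can disturb word order.

```python
import random
from typing import List, Optional


def add_noise(words: List[str], p_delete: float, p_swap: float,
              rng: Optional[random.Random] = None) -> List[str]:
    """Return a noisy copy of `words` (hypothetical helper, not from the patent).

    p_delete plays the role of P_0 and p_swap the role of P_x.
    """
    rng = rng or random.Random()

    # Step1.1.1: drop each word independently with probability p_delete.
    kept = [w for w in words if rng.random() >= p_delete]

    # Assumed reading of Step1.1.2: walk over non-overlapping bigrams
    # (positions 0-1, 2-3, ...) and swap each with probability p_swap.
    noisy = list(kept)
    for i in range(0, len(noisy) - 1, 2):
        if rng.random() < p_swap:
            noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy


if __name__ == "__main__":
    sentence = "english is used as the pivot language".split()
    print(add_noise(sentence, p_delete=0.1, p_swap=0.2, rng=random.Random(0)))
```

A denoising autoencoder would take the output of `add_noise` as its input and be trained to reproduce the original, noise-free sentence.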

Abstract

The invention relates to a Chinese-Burmese bilingual parallel sentence pair extraction method and device based on a pivot language, and belongs to the technical field of natural language processing. The invention first uses a denoising autoencoder (DAE) to obtain representation vectors for sentences in Chinese, English, and Burmese. It then takes the existing Chinese-English and English-Burmese parallel corpora as constraints and uses CorrNet to project the sentence representations of the two languages in each pair into a common semantic space. Through joint training, with English as the pivot language, it learns a common representation for the Chinese-English-Burmese trilingual data, calculates the distance between Chinese and Burmese sentences, and judges whether a Chinese-Burmese sentence pair is parallel. The invention addresses the problem of Chinese-Burmese parallel data extraction, uses English as the pivot language to extract Chinese-Burmese parallel sentence pairs, provides a data basis for subsequent natural language processing work, and has theoretical and practical value for the construction of Chinese-Burmese bilingual parallel corpora.
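The final judgment step, deciding from the distance between two sentence vectors whether a Chinese-Burmese pair is parallel, can be sketched as below. The abstract does not state which distance or decision rule is used, so cosine distance, the threshold value, and the names `cosine_distance` and `is_parallel` are all assumptions made for illustration; the input vectors stand in for representations already projected into the common semantic space.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine distance (1 - cosine similarity); an assumed choice of metric."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def is_parallel(zh_vec: Sequence[float], my_vec: Sequence[float],
                threshold: float = 0.3) -> bool:
    """Judge a pair parallel when its distance is at most `threshold`.

    The threshold is a hypothetical value that would need tuning on held-out data.
    """
    return cosine_distance(zh_vec, my_vec) <= threshold


if __name__ == "__main__":
    zh = [0.2, 0.7, 0.1]   # toy vector for a Chinese sentence
    my = [0.25, 0.65, 0.1]  # toy vector for a Burmese sentence
    print(cosine_distance(zh, my), is_parallel(zh, my))
```

In practice, candidate pairs would be ranked by distance and only those below the tuned threshold kept as extracted parallel sentences.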

Description

Technical field

[0001] The invention relates to a Chinese-Burmese bilingual parallel sentence pair extraction method and device based on a pivot language, and belongs to the technical field of natural language processing.

Background

[0002] Distributed representations based on a pivot language can connect objects that lie in different but similar semantic spaces, such as multilingual data (words, phrases, sentences, etc.), and are widely used in natural language processing. Chinese-Burmese parallel sentence pair extraction is the basis for Chinese-Burmese machine translation. However, Burmese resources are scarce, while English is widely used in Myanmar and Chinese resources are relatively rich, so English is used as the pivot language to study the extraction of Chinese-Burmese sentence pairs. Extracting a large number of Chinese-Burmese sentence pairs provides data support for natural language processing tasks such as machine translation.

Contents of the inventio...

Application Information

IPC(8): G06F40/211, G06F40/284, G06F40/289, G06F40/30, G06N3/04, G06N3/08
CPC: G06N3/084, G06N3/088, G06N3/045
Inventor: 毛存礼, 吴霞, 余正涛, 张少宁, 张亚飞, 朱浩东
Owner: KUNMING UNIV OF SCI & TECH