Sentence-level bilingual alignment method and device, and computer-readable storage medium

A sentence-level bilingual alignment technology, applied in the field of natural language processing, that addresses the low efficiency and the heavy labor and time cost of manual sentence alignment.

Active Publication Date: 2019-10-08
龙马智芯(珠海横琴)科技有限公司

AI Technical Summary

Problems solved by technology

[0002] Parallel corpora are important data for translation algorithms based on natural language processing. A parallel (corresponding) corpus is a bilingual or multilingual corpus composed of source texts and their corresponding target-language texts. Alignment can be at the word level, sentence level, paragraph level, or article level. Of these, the sentence-level parallel corpus is the most commonly used, so paragraph-level and article-level parallel corpora are often converted into sentence-level parallel corpora. However, the original text and the translated text in a corpus do not necessarily correspond one to one. For example, because of differences in article structure and in the author's writing habits, 15 Chinese sentences may correspond to 22 English sentences, or 16 Chinese sentences may correspond to 50 English sentences, so complex and diverse sentence pairings must be considered. At present, paragraph-level and chapter-level corpora are mainly split and grouped into one-to-one corresponding sentences by hand. This approach requires a great deal of manpower and time, which hinders any improvement in sentence alignment efficiency.


Examples


Embodiment Construction

[0066] The present invention is described below based on examples, but the present invention is not limited to these examples. In the following detailed description of the present invention, some specific details are described in detail. To avoid obscuring the essence of the present invention, known methods, procedures, and components are not described in detail.

[0067] Additionally, those of ordinary skill in the art will appreciate that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.

[0068] Unless the context clearly requires otherwise, throughout the specification and claims, "comprises", "comprising" and similar words should be interpreted in an inclusive sense rather than an exclusive or exhaustive sense; that is, in the sense of "including but not limited to".

[0069] In the description of the present invention, it should be understood that the terms "first", "second" and so on are used for descriptive purp...


Abstract

The invention discloses a sentence-level bilingual alignment method and device and a computer-readable storage medium. The method comprises the following steps: S1, obtaining Z trained convolution kernels, where Z is an integer greater than or equal to 1; S2, performing sentence segmentation on each of the two texts to be aligned, and establishing a text similarity matrix U of the two texts; S3, convolving the text similarity matrix U with each of the Z trained convolution kernels to obtain Z optimized text similarity matrices; and S4, obtaining the sentence alignment result of the two texts from the Z optimized text similarity matrices. The method and the device help to improve the efficiency of sentence alignment.
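
Steps S1 to S4 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several parts are assumptions that the text above does not specify: the kernels are hand-picked rather than trained, the similarity score `length_ratio` is a toy placeholder, the Z optimized matrices are combined by averaging, and each source sentence is paired with its best-scoring target sentence (a greedy one-to-one choice that ignores many-to-many pairings). All function names are hypothetical.

```python
import re

def split_sentences(text):
    """S2 (part 1): split on common Chinese and English sentence-ending marks."""
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [p.strip() for p in parts if p.strip()]

def length_ratio(a, b):
    """Toy similarity score (assumption): ratio of shorter to longer length."""
    return min(len(a), len(b)) / max(len(a), len(b))

def similarity_matrix(src, tgt, score):
    """S2 (part 2): U[i][j] is the similarity of source i and target j."""
    return [[score(s, t) for t in tgt] for s in src]

def convolve_same(U, K):
    """S3: slide kernel K over U with zero padding, keeping U's shape.

    The kernel is not flipped (cross-correlation), as is usual in
    convolutional neural network libraries.
    """
    rows, cols = len(U), len(U[0])
    kr, kc = len(K), len(K[0])
    pr, pc = kr // 2, kc // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for a in range(kr):
                for b in range(kc):
                    y, x = i + a - pr, j + b - pc
                    if 0 <= y < rows and 0 <= x < cols:
                        acc += K[a][b] * U[y][x]
            out[i][j] = acc
    return out

def align(U, kernels):
    """S3 + S4: convolve U with each kernel, average, pick best column per row."""
    rows, cols = len(U), len(U[0])
    optimized = [convolve_same(U, K) for K in kernels]  # Z optimized matrices
    z = len(optimized)
    mean = [[sum(M[i][j] for M in optimized) / z for j in range(cols)]
            for i in range(rows)]
    # Assumption: greedy row-wise argmax stands in for the alignment step.
    return [(i, max(range(cols), key=lambda j: mean[i][j])) for i in range(rows)]

if __name__ == "__main__":
    # S1 (assumption): two hand-picked 3x3 kernels instead of trained ones.
    kernels = [
        [[0, 0, 0], [0, 1, 0], [0, 0, 0]],  # identity: keeps U unchanged
        [[1, 0, 0], [0, 1, 0], [0, 0, 1]],  # rewards runs along the diagonal
    ]
    U = [[0.9, 0.2, 0.1],
         [0.3, 0.8, 0.2],
         [0.1, 0.4, 0.7]]
    print(align(U, kernels))
```

The diagonal kernel reflects one plausible reason for convolving a similarity matrix: neighboring sentence pairs tend to be aligned in order, so a cell is reinforced when its diagonal neighbors also score highly. Trained kernels could presumably capture other pairing patterns, which this sketch does not attempt.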

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, in particular to a sentence-level bilingual alignment method and device, and a computer-readable storage medium.

Background

[0002] Parallel corpora are important data for translation algorithms based on natural language processing. A parallel (corresponding) corpus is a bilingual or multilingual corpus composed of source texts and their corresponding target-language texts. Alignment can be at the word level, sentence level, paragraph level, or article level. Of these, the sentence-level parallel corpus is the most commonly used, so paragraph-level and article-level parallel corpora are often converted into sentence-level parallel corpora. However, the original text and the translated text in a corpus do not necessarily correspond one to one. For example, due to the difference between the structure of th...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27, G06F17/22
CPC: G06F40/194, G06F40/205
Inventor: 聂镭, 李睿, 聂颖, 郑权, 张峰
Owner: 龙马智芯(珠海横琴)科技有限公司