Bilingual sentence automatic alignment method and device

An automatic sentence alignment technology, applied in the information field, that addresses the shortage of equivalent (parallel) sentences for training translation models and improves alignment precision and accuracy.
CN112668307A · Active · Publication Date: 2021-04-16 · TSINGHUA UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
TSINGHUA UNIV
Publication Date
2021-04-16

Figures

  • Figure 1
  • Figure 2
  • Figure 3

Abstract

The invention discloses a bilingual sentence automatic alignment method and device. The method comprises: obtaining a set of article pairs, each article pair comprising a source-language article S and a target-language article T; splitting each article into sentences and computing the relative length of each sentence and its relative position in the article; using a word vector model to determine the word similarity between sentences s_i in the source-language article S and sentences t_j in the target-language article T; calculating the distance between each sentence of S and each sentence of T from the inter-sentence word similarity, the difference in relative sentence length and the difference in relative position within the article; and, taking the relative length of each sentence as its information amount, constructing an information transfer optimization model that minimizes the sum of the products of distance and information amount, then solving the model to establish the alignment relationship. The invention thus converts sentence alignment into the search for an optimal transport strategy that transfers all of the information of the source-language article into the target-language article with minimum work.
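The pipeline summarized in the abstract can be illustrated with a small sketch. Read as a transport problem, it chooses a plan P that minimizes Σ_i Σ_j d(s_i, t_j) · P_ij subject to Σ_j P_ij = len(s_i), Σ_i P_ij = len(t_j) and P_ij ≥ 0, where len(·) is a sentence's relative length; since relative lengths sum to 1 in each article, the problem is balanced. Everything concrete below is an assumption rather than the patented method: the sentence splitter and tokenizer, the way word similarity is aggregated (best cosine match per source word, averaged), the equal weights `w_sim`, `w_len`, `w_pos` in a linear distance, and entropy-regularized Sinkhorn iterations, swapped in for an exact transport solver. The `vectors` dictionary is a hypothetical stand-in for cross-lingual word vectors that place both languages in one shared space.

```python
import math
import re

def split_sentences(text):
    # Hypothetical splitter: break after . ! ? or the full-width Chinese stops U+3002, U+FF01, U+FF1F.
    parts = re.split(r"(?<=[.!?\u3002\uff01\uff1f])\s*", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    # Hypothetical tokenizer: lowercase runs of word characters.
    return re.findall(r"\w+", sentence.lower())

def sentence_stats(sentences):
    # Relative length: the sentence's share of the article's tokens (its "information amount").
    # Relative position: the centre of the sentence's slot on a 0..1 axis.
    lengths = [len(tokenize(s)) for s in sentences]
    total = sum(lengths) or 1
    n = len(sentences)
    return [length / total for length in lengths], [(i + 0.5) / n for i in range(n)]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def word_similarity(s, t, vectors):
    # Hypothetical score: average over source words of their best cosine match in t.
    src = [vectors[w] for w in tokenize(s) if w in vectors]
    tgt = [vectors[w] for w in tokenize(t) if w in vectors]
    if not src or not tgt:
        return 0.0
    return sum(max(cosine(u, v) for v in tgt) for u in src) / len(src)

def distance_matrix(S, T, vectors, w_sim=1.0, w_len=1.0, w_pos=1.0):
    # Hypothetical linear combination of the three cues named in the abstract.
    len_s, pos_s = sentence_stats(S)
    len_t, pos_t = sentence_stats(T)
    D = [[w_sim * (1.0 - word_similarity(s, t, vectors))
          + w_len * abs(len_s[i] - len_t[j])
          + w_pos * abs(pos_s[i] - pos_t[j])
          for j, t in enumerate(T)]
         for i, s in enumerate(S)]
    return D, len_s, len_t

def sinkhorn(D, a, b, eps=0.05, iters=500):
    # Entropy-regularized transport: returns a plan P with row sums ~a and
    # column sums ~b that approximately minimizes sum(D[i][j] * P[i][j]).
    K = [[math.exp(-d / eps) for d in row] for row in D]
    u, v = [1.0] * len(a), [1.0] * len(b)
    for _ in range(iters):
        u = [a[i] / (sum(K[i][j] * v[j] for j in range(len(b))) or 1e-300)
             for i in range(len(a))]
        v = [b[j] / (sum(K[i][j] * u[i] for i in range(len(a))) or 1e-300)
             for j in range(len(b))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(b))] for i in range(len(a))]

def align(source_text, target_text, vectors):
    S, T = split_sentences(source_text), split_sentences(target_text)
    D, a, b = distance_matrix(S, T, vectors)
    P = sinkhorn(D, a, b)
    # Link each source sentence to the target sentence receiving most of its information.
    return [(i, max(range(len(T)), key=lambda j: P[i][j])) for i in range(len(S))]

# Toy shared-space vectors (hypothetical); words missing from the table are ignored.
vectors = {"cat": [1.0, 0.0], "chat": [1.0, 0.0], "dog": [0.0, 1.0], "chien": [0.0, 1.0]}
print(align("The cat sleeps. The dog runs.", "Le chat dort. Le chien court.", vectors))
# prints [(0, 0), (1, 1)]
```

Because the plan P may split one source sentence's information across several target sentences, keeping every sufficiently large entry of a row, rather than only the largest, would yield one-to-many alignments; the argmax rule above is a simplification.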

Description

Technical Field

[0001] The invention relates to the field of information technology, and in particular to a bilingual sentence automatic alignment method and device.

Background Art

[0002] Existing bilingual word alignment techniques fall mainly into three categories: rule-based, supervised and unsupervised. Rule-based word alignment relies on hand-crafted rules and depends heavily on the characteristics of the languages involved. Supervised word alignment relies on existing dictionaries or aligned sentences in the relevant domain, but such dictionaries and large collections of sentence pairs do not exist for many specialized domains or for less mainstream languages. Unsupervised word alignment obtains a word vector space for each of the two languages and derives aligned word vectors by aligning the two spaces.
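The background says only that unsupervised methods align the two languages' vector spaces. One common building block, shown here in a deliberately tiny two-dimensional form, is an orthogonal map (here restricted to a pure rotation) fitted to anchor word pairs by the closed-form Procrustes solution; fully unsupervised systems first induce such anchor pairs automatically, a step this sketch does not attempt. The function names, toy vectors and anchor pairs are hypothetical.

```python
import math

def fit_rotation(pairs):
    # 2-D orthogonal Procrustes restricted to rotations: choose theta maximizing
    # sum_k y_k . R(theta) x_k, which minimizes sum_k |R(theta) x_k - y_k|^2.
    a = sum(x[0] * y[0] + x[1] * y[1] for x, y in pairs)
    b = sum(x[0] * y[1] - x[1] * y[0] for x, y in pairs)
    return math.atan2(b, a)

def rotate(x, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]

def cosine(u, v):
    return (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))

def translate(word, src_vecs, tgt_vecs, theta):
    # Map the source vector into the target space, then return the nearest target word.
    q = rotate(src_vecs[word], theta)
    return max(tgt_vecs, key=lambda w: cosine(q, tgt_vecs[w]))

# Toy spaces (hypothetical): the target space is the source space rotated by 90 degrees.
src = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "house": [1.0, 1.0]}
tgt = {"chat": [0.0, 1.0], "chien": [-1.0, 0.0], "maison": [-1.0, 1.0]}
theta = fit_rotation([(src["cat"], tgt["chat"]), (src["dog"], tgt["chien"])])
print(round(math.degrees(theta)), translate("house", src, tgt, theta))
# prints: 90 maison
```

Restricting the map to be orthogonal preserves distances and angles within the source space, which is why "house", never used as an anchor, still lands next to its counterpart.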

[0003] The existing sentence...

Claims
