Corpus alignment method, related device and computer program product

A corpus alignment technology, applied in the computer field, that addresses the long and complex manual work of aligning corpora and improves alignment quality.

Pending Publication Date: 2022-07-29
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0002] In a translation system, bilingual dictionaries provide a word-level mapping between two languages, which helps users quickly learn other languages.



Embodiment Construction

[0020] Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding and should be considered as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness. It should be noted that the embodiments of the present disclosure and the features of the embodiments may be combined with each other under the condition of no conflict.

[0021] In addition, in the technical solutions involved in this disclosure, the acquisition, storage, use, processing, transportation, provision, and disclosure of the user's personal information in...


Abstract

The invention provides a corpus alignment method and device, electronic equipment, a computer readable storage medium and a computer program product, and relates to the technical field of artificial intelligence such as machine translation, natural language processing and deep learning. One specific embodiment of the method comprises the following steps: after obtaining an initial corpus pair composed of a first corpus and a second corpus which are different in language, determining the initial corpus pair of which the forward conditional probability and the reverse conditional probability are greater than a first threshold value as a target corpus pair; and determining the semantic matching probability of the target corpus pair by using a preset translation model, and finally, performing corpus alignment processing on the first corpus and the second corpus in the target corpus pair of which the semantic matching probability is greater than a second threshold. According to the embodiment, the probability threshold requirement when the matching pairs are screened based on the statistical probability can be reduced, so that more word pairs which are not significant on the statistical level are recalled on the premise of ensuring the accuracy of the semantic correspondence between the first corpus and the second corpus, and the alignment quality of the corpus is improved.
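The two-stage screening described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the forward and reverse conditional probabilities are estimated from simple co-occurrence counts, and `semantic_match` is a hypothetical stand-in for the preset translation model. All function names, parameter names, and the sample data are assumptions made for the example.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (first-language item, second-language item)


def conditional_probabilities(
    cooccurrences: Iterable[Pair],
) -> Tuple[Dict[Pair, float], Dict[Pair, float]]:
    """Estimate P(second | first) and P(first | second) from co-occurrence counts.

    Assumption: count-based estimates; the source does not say how the
    probabilities are obtained.
    """
    pairs = list(cooccurrences)
    pair_counts = Counter(pairs)
    first_counts = Counter(first for first, _ in pairs)
    second_counts = Counter(second for _, second in pairs)
    forward = {p: c / first_counts[p[0]] for p, c in pair_counts.items()}
    reverse = {p: c / second_counts[p[1]] for p, c in pair_counts.items()}
    return forward, reverse


def align(
    cooccurrences: Iterable[Pair],
    semantic_match: Callable[[str, str], float],  # hypothetical model scorer
    first_threshold: float,
    second_threshold: float,
) -> List[Pair]:
    """Return the pairs that pass both the statistical and the semantic screen."""
    forward, reverse = conditional_probabilities(cooccurrences)
    # Stage 1: keep pairs whose forward AND reverse probabilities exceed
    # the first threshold (the "target corpus pairs").
    targets = [
        p for p in forward
        if forward[p] > first_threshold and reverse[p] > first_threshold
    ]
    # Stage 2: keep target pairs whose semantic matching probability
    # exceeds the second threshold.
    return [p for p in targets if semantic_match(*p) > second_threshold]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    data = [("chat", "cat")] * 3 + [("chat", "dog")] + [("chien", "dog")] * 3
    toy_scores = {("chat", "cat"): 0.9, ("chien", "dog"): 0.8}
    scorer = lambda a, b: toy_scores.get((a, b), 0.0)
    print(align(data, scorer, first_threshold=0.5, second_threshold=0.6))
```

Because the semantic check acts as a second filter, the first threshold can be set lower than it would be in a purely statistical screen, which is how the abstract describes recalling pairs that are not statistically significant.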

Description

Technical field

[0001] The present disclosure relates to the field of computer technologies, in particular to artificial intelligence technologies such as machine translation, natural language processing, and deep learning, and more specifically to corpus alignment methods, devices, electronic devices, computer-readable storage media, and computer program products.

Background

[0002] In a translation system, a bilingual dictionary provides a word-level mapping between two languages, which helps users quickly learn other languages. Building a bilingual dictionary usually requires a large number of language experts to align corpora in different languages, which is long-term and complex work.

Summary of the invention

[0003] Embodiments of the present disclosure provide a corpus alignment method, apparatus, electronic device, computer-readable storage medium, and computer program product. ...


Application Information

IPC(8): G06F40/58, G06F40/44, G06F40/242, G06F40/289, G06F40/30
CPC: G06F40/58, G06F40/44, G06F40/242, G06F40/289, G06F40/30
Inventor: 王曦阳, 张睿卿, 何中军, 李芝, 吴华
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD