
109 results about "Bilingual corpus" patented technology

Method and system for determining intertranslation relationship of bilingual sentence pairs

The invention discloses a method and system for determining the intertranslation relationship of bilingual sentence pairs, that is, whether the two sentences of a pair are translations of each other. The method determines matching feature values for each bilingual sentence pair, then filters and classifies the pairs with a pre-trained classification model that weights each matching feature by its contribution to the intertranslation relationship, and so decides whether a pair satisfies the requirements of that relationship. The method can therefore process a very large bilingual corpus quickly and conveniently. Because the problem is recast as a binary classification problem, the weights of the matching features are learned by the classification model rather than set by hand. Compared with existing experience-based methods, the approach generalizes better and improves both accuracy and recall.
Owner:BEIJING KINGSOFT OFFICE SOFTWARE INC +1
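
The binary-classification idea in the abstract above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the three matching features (length ratio, lexicon coverage, number agreement), the toy English-French lexicon, the four training pairs, and the choice of logistic regression as the classifier.

```python
import math

# Hypothetical toy lexicon (source word -> target word); not from the patent.
LEXICON = {"the": "le", "cat": "chat", "sleeps": "dort", "dogs": "chiens", "2": "2"}

def match_features(src, tgt, lexicon):
    """Return illustrative matching features for one sentence pair."""
    s, t = src.lower().split(), tgt.lower().split()
    if not s or not t:
        return [0.0, 0.0, 0.0]
    length_ratio = min(len(s), len(t)) / max(len(s), len(t))
    # Fraction of source tokens whose lexicon translation occurs in the target.
    coverage = sum(1 for w in s if lexicon.get(w) in t) / len(s)
    # 1.0 when both sides contain exactly the same set of digit tokens.
    same_numbers = {w for w in s if w.isdigit()} == {w for w in t if w.isdigit()}
    return [length_ratio, coverage, 1.0 if same_numbers else 0.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(x, weights, bias):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(samples, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression feature weights with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = score(x, weights, bias) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def is_translation(x, weights, bias, threshold=0.5):
    return score(x, weights, bias) >= threshold

# Toy labeled pairs: 1 = mutual translation, 0 = not.
TRAIN_PAIRS = [
    ("the cat sleeps", "le chat dort", 1),
    ("i have 2 dogs", "j'ai 2 chiens", 1),
    ("the cat sleeps", "il pleut beaucoup aujourd'hui ici", 0),
    ("i have 2 dogs", "nous avons 7 maisons rouges", 0),
]
features = [match_features(s, t, LEXICON) for s, t, _ in TRAIN_PAIRS]
weights, bias = train(features, [y for _, _, y in TRAIN_PAIRS])
```

After training, `is_translation(match_features(src, tgt, LEXICON), weights, bias)` gives the keep-or-discard decision for a candidate pair. The point of the design is that the learned `weights` replace hand-tuned feature thresholds.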

A Chinese-to-Braille automatic conversion method and system based on a deep neural network

The invention relates to a Chinese-to-Braille automatic conversion method and system based on a deep neural network. The method includes: obtaining a Chinese-Braille bilingual corpus aligned at the sentence and word level; training a deep neural network on this corpus to obtain a word segmentation model for segmenting Chinese character strings; and using the same corpus to obtain a tone-marking model for marking tones on Chinese characters. The Chinese text to be converted is then segmented according to Braille rules by the word segmentation model, yielding a plurality of characters and words; the tone-marking model marks tones on these characters and words; and the tone-marked characters and words are converted into Braille. Because the trained model segments Chinese character strings directly according to Braille rules, the information carried by the Chinese characters is fully used. This avoids the problem that arises when a Braille string is segmented instead, where the Chinese character information is lost and homophones are confused with each other, which degrades segmentation. By using the deep neural network model and the calibration model, higher conversion accuracy can be obtained.
Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI

Method and system for cleaning parallel corpus based on language model and translation model

The present invention belongs to the technical field of computer software and discloses a method and a system for cleaning parallel corpora based on a language model and a translation model. Corpus preprocessing is first applied to bilingual parallel corpora covering multiple directions within the same language family; the parallel corpus is then screened with language models of the source language and the target language, and screened again with the translation model. Cleaning a parallel corpus with heuristic rules has high time and labor costs: a rule can be written only after a specific problem has been found, and problems such as disfluent sentences and inaccurate translations cannot be solved on a large scale. The language model and the translation model handle, in a short time, the cases that rules cannot. This saves time and labor, cleans the large-scale bilingual corpus, improves corpus quality, and can effectively improve machine translation quality.
Owner:GLOBAL TONE COMM TECH
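
The two-stage screening described in the abstract above (language models first, then a translation model) might look roughly like the sketch below. The abstract does not say which model types or thresholds are used, so the add-one-smoothed unigram language model, the lexical translation table, the scoring formulas, and the values of `lm_min` and `tm_min` are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter

class UnigramLM:
    """Toy add-one-smoothed unigram language model (stand-in for a real LM)."""
    def __init__(self, sentences):
        self.counts = Counter(w for s in sentences for w in s.split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unknown words

    def avg_logprob(self, sentence):
        words = sentence.split()
        logprob = sum(
            math.log((self.counts[w] + 1) / (self.total + self.vocab)) for w in words
        )
        return logprob / len(words)

def translation_score(src, tgt, table, floor=1e-6):
    """Average log of the best lexical probability for each target word."""
    s, t = src.split(), tgt.split()
    total = 0.0
    for tw in t:
        best = max(table.get((sw, tw), 0.0) for sw in s)
        total += math.log(max(best, floor))
    return total / len(t)

def clean(pairs, src_lm, tgt_lm, table, lm_min, tm_min):
    """Keep only pairs that pass both the LM screen and the TM screen."""
    kept = []
    for src, tgt in pairs:
        if not src.split() or not tgt.split():
            continue  # drop empty sides
        if src_lm.avg_logprob(src) < lm_min or tgt_lm.avg_logprob(tgt) < lm_min:
            continue  # disfluent or out-of-language on either side
        if translation_score(src, tgt, table) < tm_min:
            continue  # fluent, but the two sides do not match
        kept.append((src, tgt))
    return kept

# Hypothetical toy models and data.
src_lm = UnigramLM(["the cat sleeps", "the dog eats"])
tgt_lm = UnigramLM(["le chat dort", "le chien mange"])
table = {("the", "le"): 0.9, ("cat", "chat"): 0.8, ("sleeps", "dort"): 0.8,
         ("dog", "chien"): 0.8, ("eats", "mange"): 0.8}
pairs = [
    ("the cat sleeps", "le chat dort"),   # good pair
    ("the dog eats", "le chat dort"),     # fluent but mistranslated
    ("xq zz kk", "le chien mange"),       # garbage source side
]
cleaned = clean(pairs, src_lm, tgt_lm, table, lm_min=-2.0, tm_min=-2.0)
```

In this toy run the language-model screen rejects the garbage source sentence and the translation-model screen rejects the mismatched pair, so neither case needs a hand-written rule.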