
160 results about "Parallel corpora" patented technology

Parallel Corpora. The term parallel corpora is typically used in linguistic circles to refer to texts that are translations of each other; the term comparable corpora, by contrast, refers to texts in two languages that are similar in content but are not translations of each other.

Word prediction method and system based on a neural machine translation system

Active | CN106844352A | Accurately obtain predicted probabilities | Natural language translation | Neural architectures | Sentence pair | Prediction probability
The invention relates to a word prediction method and system based on a neural machine translation system. The method comprises the following steps: parallel corpora are trained and a phrase translation table is extracted from the training result; the source-language sentence of any parallel sentence pair is match-searched to determine all source-language phrases it contains; the target-phrase translation candidate set corresponding to each source-language phrase is looked up in the phrase translation table; partial translations are produced from the candidate sets and the neural machine translation system, yielding the set of target words to be encouraged; the encouragement value of each target word in the set is determined from the attention probabilities of the neural machine translation system and the candidate sets; and the prediction probability of each target word is obtained from its encouragement value. By introducing the phrase translation table into the neural translation model to derive encouragement values for target words, the prediction probabilities of those target words can be increased.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI
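The core idea of the abstract above — raising the prediction probability of target words licensed by a phrase translation table — can be sketched as a bonus added to the decoder's logits before the softmax. Everything below (the toy vocabulary, the fixed `bonus` standing in for the attention-derived encouragement value) is a hypothetical illustration, not the patented method itself.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a {word: logit} dict."""
    m = max(logits.values())
    exps = {w: math.exp(v - m) for w, v in logits.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

def encourage(logits, encouraged, bonus=2.0):
    """Add a bonus to the logits of target words found in the phrase
    translation table, then renormalise.  The patent derives the bonus
    from attention probabilities; a constant stands in for it here."""
    boosted = {w: v + (bonus if w in encouraged else 0.0)
               for w, v in logits.items()}
    return softmax(boosted)

# Toy decoder logits over a three-word target vocabulary; "maison" is
# licensed by the (hypothetical) phrase table for the current source span.
logits = {"maison": 1.0, "chat": 1.0, "chien": 1.0}
probs = encourage(logits, encouraged={"maison"})
```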

Cross-language multi-source vertical domain knowledge graph construction method

Active | CN112199511A | Implement automatic translation | Implement pre-labeled training | Natural language translation | Special data processing applications | Data set | Theoretical computer science
The invention discloses a cross-language multi-source vertical-domain knowledge graph construction method, and relates to the technical field of knowledge engineering. According to the technical scheme, vertical-domain translation completes parallel corpus construction through content and link analysis of the input cross-language texts, domain dictionaries, domain term libraries, and domain materials and data, and automatic translation of foreign-language texts is achieved with a trained translation model after preprocessing; domain knowledge pre-annotation training realizes active-learning annotation based on text word segmentation and text clustering, completes screening of the corpus to be annotated according to an analysis topic, and generates a confirmed service annotation data set; an optimal algorithm is selected, and semantic feature extraction and entity relationship extraction are completed with deep learning in combination with the vertical-domain translation data and the actual scene; and domain knowledge from different sources is fused and disambiguated through combination of network-equivalent entities, so that the cross-language multi-source vertical-domain knowledge graph is obtained.
Owner:10TH RES INST OF CETC
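The final fusion step — merging equivalent entities discovered in different sources — is naturally modelled with a union-find (disjoint-set) structure. The sketch below is a generic illustration of that idea, not the patent's "network equivalent entity combination" algorithm; the entity identifiers are invented.

```python
class UnionFind:
    """Minimal disjoint-set structure with path halving."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

# Hypothetical equivalences found by cross-source entity matching.
equivalent_pairs = [
    ("en:AlphaCorp", "zh:阿尔法公司"),
    ("zh:阿尔法公司", "de:Alpha GmbH"),
]
uf = UnionFind()
for a, b in equivalent_pairs:
    uf.union(a, b)
```

After the unions, all three source-specific mentions share one representative, i.e. one fused graph node.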

An ancient Chinese automatic translation method based on multi-feature fusion

Active | CN109684648A | Increased accuracy | Solve the problem of unregistered words | Natural language translation | Special data processing applications | Word list | Sentence pair
The invention discloses an ancient-Chinese automatic translation method based on multi-feature fusion. The method comprises the following steps: 1) collecting ancient texts, their modern-Chinese translations, an ancient-text word list, and modern-Chinese monolingual corpus data; 2) cleaning the data and constructing an ancient-Chinese parallel corpus with a sentence-alignment method; 3) segmenting both the modern and the ancient texts with a Chinese word-segmentation tool; 4) performing topic modeling on the ancient-text corpus to generate the topic-word distribution and the word-topic conditional probability distribution; 5) training a modern-Chinese language model on the monolingual corpus, and obtaining an alignment dictionary from the ancient-Chinese parallel corpus; 6) on the basis of an attention-based recurrent neural network translation model, fusing statistical machine translation features such as the language model and the alignment dictionary, and training the model with ancient-Chinese parallel sentence pairs and word-topic sequences; and 7) for a to-be-translated text input by the user, obtaining the modern-Chinese translation with the model trained in step 6).
Owner:ZHEJIANG UNIV
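Step 6's fusion of statistical features (language-model score, alignment-dictionary coverage) with the neural score is commonly realised as a log-linear combination. The sketch below illustrates that generic technique with invented feature values and weights; it is not the patent's exact scoring function.

```python
def fused_score(features, weights):
    """Log-linear feature fusion: score = sum_k w_k * f_k."""
    return sum(weights[k] * v for k, v in features.items())

# Hypothetical scores for two candidate modern-Chinese translations:
# NMT log-probability, language-model log-probability, and the fraction
# of source words covered by the alignment dictionary.
candidates = {
    "candidate_a": {"nmt": -2.0, "lm": -3.0, "dict_coverage": 0.9},
    "candidate_b": {"nmt": -1.8, "lm": -6.0, "dict_coverage": 0.2},
}
weights = {"nmt": 1.0, "lm": 0.5, "dict_coverage": 2.0}
best = max(candidates, key=lambda c: fused_score(candidates[c], weights))
```

Here the candidate with better fluency and dictionary coverage wins even though its raw NMT score is slightly lower.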

Method for constructing Mongolian-Chinese parallel corpora by utilizing a generative adversarial network to improve Mongolian-Chinese translation quality

The invention discloses a method for constructing Mongolian-Chinese parallel corpora with a generative adversarial network to improve Mongolian-Chinese translation quality. The generative adversarial network comprises a generator and a discriminator. The generator uses a hybrid encoder to encode a Mongolian source-language sentence into a vector representation and converts the representation into a Chinese target-language sentence with a decoder based on a bidirectional Transformer combined with a sparse attention mechanism, thereby generating Chinese sentences, and further Mongolian-Chinese parallel corpora, that are closer to human translation. The discriminator judges the difference between the Chinese sentence generated by the generator and a human translation. The generator and the discriminator are trained adversarially until the discriminator considers the generated Chinese sentences very similar to human translations; a high-quality Mongolian-Chinese machine translation system and a large Mongolian-Chinese parallel data set are thereby acquired, and Mongolian-Chinese translation is performed with that system. The method addresses the severe scarcity of Mongolian-Chinese parallel data and the difficulty NMT has in guaranteeing the naturalness, adequacy, and accuracy of translation results.
Owner:INNER MONGOLIA UNIV OF TECH
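The adversarial training loop described above — generator and discriminator updated in alternation until the discriminator can no longer tell generated output from human output — can be illustrated on a one-dimensional toy problem. The sketch below trains a scalar "generator" to approach real data at 5.0 against a logistic "discriminator"; it shows only the alternating-update pattern, not the Transformer-based generator of the patent, and all values are invented.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

real = 5.0          # stand-in for a human-translated sentence
theta = 0.0         # generator "output" (stand-in for a generated sentence)
w, b = 0.0, 0.0     # discriminator parameters: D(x) = sigmoid(w*x + b)
lr = 0.05

for _ in range(2000):
    # Discriminator step: maximise log D(real) + log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * theta + b)
    grad_w = -(1 - d_real) * real + d_fake * theta
    grad_b = -(1 - d_real) + d_fake
    w -= lr * grad_w
    b -= lr * grad_b
    # Generator step: maximise log D(fake), i.e. fool the discriminator.
    d_fake = sigmoid(w * theta + b)
    theta -= lr * (-(1 - d_fake) * w)
```

As training proceeds, the generator's output drifts toward the real data until the discriminator can barely separate them.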

Neural machine translation decoding acceleration method based on non-autoregressive decoding

Active | CN111382582A | Alleviate the multimodal problem | Does not slow down inference | Natural language translation | Energy efficient computing | Pattern recognition | Hidden layer
The invention discloses a non-autoregressive neural machine translation decoding acceleration method. The method comprises the steps of: constructing autoregressive neural machine translation models with a Transformer model based on the self-attention mechanism; constructing training parallel corpora, generating a machine translation word list, and training left-to-right and right-to-left models until convergence; constructing a non-autoregressive machine translation model; acquiring the encoder-decoder attention and hidden-layer states of the left-to-right and right-to-left autoregressive translation models; calculating the difference between the non-autoregressive model's outputs and the corresponding autoregressive outputs, and taking the difference as an additional loss for model training; extracting source-language sentence information and predicting the corresponding target-language sentence with a decoder; and calculating the loss between the predicted distribution and the real data distribution, decoding translation results of different lengths, and selecting the optimal translation result. The method makes full use of the knowledge in the autoregressive models and achieves an 8.6-fold speed-up at a low cost in translation quality.
Owner:SHENYANG YATRANS NETWORK TECH CO LTD
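The extra training signal described above — penalising the difference between the non-autoregressive model's hidden states and those of the autoregressive teachers — is a form of knowledge distillation. A minimal sketch of such a combined loss, with invented toy vectors and an invented weighting factor `alpha`:

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def combined_loss(ce, nat_hidden, l2r_hidden, r2l_hidden, alpha=0.5):
    """Cross-entropy on the data plus a hidden-state regression term
    against the left-to-right and right-to-left teacher models."""
    distill = mse(nat_hidden, l2r_hidden) + mse(nat_hidden, r2l_hidden)
    return ce + alpha * distill

# Toy hidden states for one target position.
l2r = [0.2, -0.1, 0.4]
r2l = [0.1, 0.0, 0.5]
good_student = [0.15, -0.05, 0.45]   # close to both teachers
bad_student = [5.0, 5.0, 5.0]        # far from both
loss_good = combined_loss(1.0, good_student, l2r, r2l)
loss_bad = combined_loss(1.0, bad_student, l2r, r2l)
```

A student whose hidden states track the teachers incurs almost no extra loss, so gradient descent pulls the non-autoregressive model toward the autoregressive models' internal representations.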

Multi-translation parallel corpus construction system

The present invention provides a multi-translation parallel corpus construction system. The system comprises: a deep semantic similarity calculator, used for separately calculating the deep semantic similarity between a source-language text sentence and the to-be-matched sentence of each of multiple translations; a representative-dictionary similarity and other statistical-information similarity calculator; a fusion matching-degree calculator, used for calculating the fusion matching degree between the source-language text sentence and the to-be-matched sentence of each translation; a sentence matching apparatus, used for matching sentences of the source-language text with each translation according to the fusion matching degree, wherein the fusion matching degrees between the source-language text and the other translations are consulted during matching; and a multi-translation parallel corpus construction apparatus, used for constructing the multi-translation parallel corpus from the matching result. The scheme implements construction of a multi-translation parallel corpus, improves the precision of corpus alignment, and makes the constructed corpus robust.
Owner:BEIJING LANGUAGE AND CULTURE UNIVERSITY
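The fusion matching degree can be pictured as a weighted combination of the individual similarity scores, with each source sentence matched to the candidate that maximises it. The weights and scores below are invented for illustration; the patent does not disclose its exact fusion formula here.

```python
def fusion_degree(semantic, dictionary, statistical, w=(0.6, 0.3, 0.1)):
    """Weighted fusion of deep-semantic, dictionary, and other
    statistical similarity scores (all assumed to lie in [0, 1])."""
    return w[0] * semantic + w[1] * dictionary + w[2] * statistical

def match(source_sentences, candidates, scores):
    """Match every source sentence to its best-scoring candidate.
    scores[(s, c)] holds the three similarity components."""
    return {
        s: max(candidates, key=lambda c: fusion_degree(*scores[(s, c)]))
        for s in source_sentences
    }

# Two source sentences, two candidate translation sentences.
scores = {
    ("s1", "c1"): (0.9, 0.8, 0.7),
    ("s1", "c2"): (0.2, 0.3, 0.9),
    ("s2", "c1"): (0.1, 0.2, 0.3),
    ("s2", "c2"): (0.8, 0.9, 0.4),
}
alignment = match(["s1", "s2"], ["c1", "c2"], scores)
```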

Neural machine translation method based on pre-trained bilingual word vectors

The invention discloses a neural machine translation method based on pre-trained bilingual word vectors. The method comprises the following steps. Pre-training: labeled, aligned parallel corpora are spliced into "source language-target language" pairs and used as input for pre-training an XLM (cross-lingual language model). Training: the bilingual word-vector matrix obtained by pre-training initializes the translation model; the source language is input into an encoder, the vector representation of the encoded source language and the corresponding target language are input into a decoder, a prediction sequence is output and compared with the corresponding target sequence, a loss value is calculated, and the loss value is passed to an optimizer to update the translation-model parameters. Prediction: at each time step, the source language is input into the optimized encoder, the encoder outputs the corresponding vector representation, the representation and the target-language word translated at the previous time step are input into the decoder, the decoder outputs the target word of the current time step, and the target words of the successive time steps are concatenated in order to obtain the translation of the source language. The method improves machine translation quality for low-resource languages.
Owner:HARBIN INST OF TECH
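Initialising the translation model from the pre-trained bilingual word vectors amounts to copying the pre-trained vector for every vocabulary word it covers and randomly initialising the rest. A framework-free sketch (the vectors and vocabulary are invented; a real system would copy the XLM embedding matrix into a framework embedding layer):

```python
import random

def init_embedding_table(vocab, pretrained, dim, seed=0):
    """Copy pre-trained vectors where available; otherwise draw a small
    random vector, as is typical for words unseen in pre-training."""
    rng = random.Random(seed)
    table = {}
    for word in vocab:
        if word in pretrained:
            table[word] = list(pretrained[word])
        else:
            table[word] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return table

pretrained = {"hello": [0.1, 0.2], "bonjour": [0.3, 0.4]}  # toy XLM output
vocab = ["hello", "bonjour", "rareword"]
table = init_embedding_table(vocab, pretrained, dim=2)
```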

Scarce resource neural machine translation training method based on pre-training

Active | CN111178094A | Avoid polysemy problems that embedding cannot resolve | Simplify the training process | Natural language translation | Neural architectures | Hidden layer | Algorithm
The invention discloses a pre-training-based neural machine translation training method for scarce-resource languages, which comprises the following steps: constructing massive monolingual corpora and performing the word-segmentation and sub-word-segmentation preprocessing flow to obtain converged model parameters; constructing parallel corpora, randomly initializing the parameters of a neural machine translation model, and making the sizes of its word-embedding layer and hidden layer the same as those of the pre-trained language model; integrating the pre-trained model into the neural machine translation model; training the neural machine translation model on the parallel corpora so that the generated target sentences become closer to real translation results, completing the training process; and feeding the source sentence input by the user into the neural machine translation model, which generates a translation result through greedy search or beam search. The method makes full use of the knowledge in the monolingual data and clearly improves translation performance compared with a randomly initialized neural machine translation model.
Owner:SHENYANG YATRANS NETWORK TECH CO LTD
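The final decoding step mentions greedy search and beam search: greedy search keeps only the single best word at each step, while beam search keeps the `beam_size` best partial translations. A self-contained sketch over a toy next-word distribution (the distribution is invented; a real system would query the trained decoder):

```python
import math

def toy_step(prefix):
    """Stand-in for the decoder: next-word distribution given a prefix."""
    if len(prefix) >= 3:
        return {"</s>": 1.0}
    return {"good": 0.6, "bad": 0.4}

def beam_search(step_fn, beam_size=2, max_len=4):
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, sequence)
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == "</s>":        # finished hypotheses are kept as-is
                candidates.append((logp, seq))
                continue
            for word, p in step_fn(seq).items():
                candidates.append((logp + math.log(p), seq + [word]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
    return beams[0][1]

translation = beam_search(toy_step)
```

Setting `beam_size=1` reduces this to greedy search.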

Mongolian-Chinese machine translation system based on byte-pair encoding

Inactive | CN110674646A | Reduce the number of unregistered words | Solve the problem of unregistered words | Natural language translation | Neural architectures | Algorithm | Machine translation system
The invention discloses a Mongolian-Chinese machine translation system based on byte-pair encoding (BPE). Firstly, English-Chinese parallel corpora and Mongolian-Chinese parallel corpora are preprocessed with BPE: English, Mongolian, and Chinese words are all split into single characters, the occurrence frequencies of character pairs are counted within word boundaries, and the most frequent character pair is merged and stored in each iteration until the preset number of merge iterations is reached. Secondly, the preprocessed English-Chinese parallel corpus is used for training within a neural machine translation framework; the parameter weights of the translation model trained on the preprocessed English-Chinese parallel corpus are then transferred into a Mongolian-Chinese neural machine translation framework, and a neural machine translation model is trained on the preprocessed Mongolian-Chinese parallel corpus to obtain a prototype Mongolian-Chinese neural machine translation system based on byte-pair encoding. Finally, the BLEU score of the system's translations is compared with that of a statistical machine translation system to evaluate the final improvement in Mongolian-Chinese translation performance.
Owner:INNER MONGOLIA UNIV OF TECH
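The BPE preprocessing works as the abstract describes: start from characters, repeatedly count adjacent symbol pairs inside word boundaries, and merge the most frequent pair. A minimal sketch in the spirit of the original BPE algorithm (the toy vocabulary is invented; `str.replace` is a simplification that assumes symbols never collide across boundaries):

```python
from collections import Counter

def pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every occurrence of the pair into a single symbol."""
    old, new = " ".join(pair), "".join(pair)
    return {word.replace(old, new): freq for word, freq in vocab.items()}

# Words as space-separated characters, with corpus frequencies.
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6}
merges = []
for _ in range(3):               # the preset number of merge iterations
    counts = pair_counts(vocab)
    best = max(counts, key=counts.get)
    merges.append(best)
    vocab = merge_pair(best, vocab)
```

The learned merge list is then replayed on new text, so frequent subwords survive as units while rare words decompose into known pieces, shrinking the unregistered-word problem.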

Method and system for cleaning parallel corpora based on a language model and a translation model

The present invention belongs to the technical field of computer software, and discloses a method and a system for cleaning parallel corpora based on a language model and a translation model. Corpus preprocessing mainly handles bilingual parallel corpora for multiple directions within the same language family; the parallel corpus is screened with language models of the source and target languages, and further screened with the translation model. Cleaning a large-scale bilingual corpus with heuristic rules carries high time and labor costs: a rule can only be written for a problem after that problem has been noticed, and problems such as unnatural wording and inaccurate translation cannot be addressed at scale. The language model and the translation model, by contrast, can handle in a short time what rules cannot, saving time and labor while cleaning the corpus, improving corpus quality, and effectively improving machine translation quality.
Owner:GLOBAL TONE COMM TECH
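The cleaning idea — score each sentence pair with source and target language models and drop outliers — can be sketched with a smoothed unigram model and a length-ratio check. The thresholds, corpora, and unigram model are all invented stand-ins for the real language and translation models:

```python
import math

def unigram_lm(corpus):
    """Add-one-smoothed unigram model returning the average
    per-token log-probability of a sentence."""
    counts, total = {}, 0
    for sent in corpus:
        for tok in sent.split():
            counts[tok] = counts.get(tok, 0) + 1
            total += 1
    vocab = len(counts)

    def score(sent):
        toks = sent.split()
        return sum(math.log((counts.get(t, 0) + 1) / (total + vocab + 1))
                   for t in toks) / max(len(toks), 1)
    return score

def clean(pairs, src_lm, tgt_lm, threshold=-2.0, max_ratio=2.0):
    kept = []
    for src, tgt in pairs:
        ls, lt = len(src.split()), len(tgt.split())
        if max(ls, lt) / max(min(ls, lt), 1) > max_ratio:
            continue  # implausible length ratio
        if src_lm(src) < threshold or tgt_lm(tgt) < threshold:
            continue  # disfluent on either side
        kept.append((src, tgt))
    return kept

src_lm = unigram_lm(["the cat sat", "the dog ran"])
tgt_lm = unigram_lm(["le chat", "le chien"])
pairs = [
    ("the cat sat", "le chat"),                            # fluent, kept
    ("zzz qqq", "le chat"),                                # disfluent source
    ("the dog ran", "le chien le chat le chien le chat"),  # bad length ratio
]
kept = clean(pairs, src_lm, tgt_lm)
```

A real system would replace the unigram score with a proper language-model score per side and add a translation-model adequacy score over the pair, but the filtering loop keeps this shape.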