388 results about "Sentence pair" patented technology

Semantic logic processing method and system

The invention discloses a semantic logic processing method and system. The method comprises the steps of obtaining information to be subjected to semantic analysis; recognizing the information to be subjected to the semantic analysis, and converting the information to be subjected to the semantic analysis into target text information; preprocessing the target text information, generating entity tags corresponding to entity words in the target text information, and adding the entity tags to the target text information to generate first text information; segmenting the first text information to obtain at least one sentence; processing the sentences obtained after segmentation to obtain the intention type, the intention logic relation and the semantic slot value of each sentence; analyzing the semantics of the information to be subjected to the semantic analysis based on the intention type, the intention logic relation and the semantic slot value. The semantic logic processing method and system can improve the precision of semantic understanding and user requirement understanding.
Owner:BEIJING QIYI CENTURY SCI & TECH CO LTD

Post-processing system and method for correcting machine recognized text

A method of post-processing character data from an optical character recognition (OCR) engine and apparatus to perform the method. This exemplary method includes segmenting the character data into a set of initial words. The set of initial words is word level processed to determine at least one candidate word corresponding to each initial word. The set of initial words is segmented into a set of sentences. Each sentence in the set of sentences includes a plurality of initial words and candidate words corresponding to the initial words. A sentence is selected from the set of sentences. The selected sentence is word disambiguity processed to determine a plurality of final words. A final word is selected from the at least one candidate word corresponding to a matching initial word. The plurality of final words is then assembled as post-processed OCR data.
Owner:PANASONIC CORP

Deep text matching method and device based on word migration learning

The invention provides a deep text matching method and device based on word migration learning. The method comprises the following steps: first, a BERT model is fused into the deep matching model and pre-trained during training of the deep matching model; second, the pre-trained BERT model is used to represent each sentence in the input sentence pairs with initial word vectors, and similarity weighting is then applied to the sentences so represented to obtain weighted sentence vectors; finally, model parameters are adjusted according to the loss value corresponding to the similarity value of the sentence vectors, and text matching is performed on input sentences using the deep matching model obtained through parameter adjustment. The parameters of the pre-trained BERT model are no longer randomly initialized, and part-of-speech prediction is added to the pre-training of the BERT model, so that the semantic information of the word vectors is enriched. As a result, the trained BERT model represents the semantics of the sentences in the sentence pairs more accurately through its word vectors, which improves the matching accuracy of the trained model.
Owner:ULTRAPOWER SOFTWARE

Method and apparatus for aligning bilingual corpora

A method is provided for aligning sentences in a first corpus to sentences in a second corpus. The method includes applying a length-based alignment model to align sentence boundaries of a sentence in the first corpus with sentence boundaries of a sentence in the second corpus to form an aligned sentence pair. The aligned sentence pair is then used to train a translation model. Once trained, the translation model is used to align sentences in the first corpus to sentences in the second corpus. Under aspects of the invention, pruning is used to reduce the number of sentence boundary alignments considered by the length-based alignment model and by the translation model. In further aspects of the invention, the length-based model utilizes a Poisson distribution.
Owner:MICROSOFT TECH LICENSING LLC
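
The abstract above says only that the length-based model uses a Poisson distribution. As a minimal sketch of that idea, and not the patented method itself, the Python below scores a candidate sentence pair by the Poisson probability of the target length given the source length. The function names and the `ratio` parameter (assumed mean number of target words per source word) are hypothetical, and source sentences are assumed to be non-empty.

```python
import math


def poisson_log_prob(k: int, mean: float) -> float:
    """Log of the Poisson probability of seeing k when `mean` is expected."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def length_score(src_len: int, tgt_len: int, ratio: float) -> float:
    """Log-probability that a source sentence of src_len words pairs with a
    target sentence of tgt_len words.

    `ratio` is an assumed corpus-level constant (mean target words per
    source word); it is not taken from the patent abstract.
    """
    return poisson_log_prob(tgt_len, src_len * ratio)


def best_match(src_len: int, candidate_tgt_lens: list, ratio: float = 1.0) -> int:
    """Index of the candidate target length that fits the source length best."""
    return max(
        range(len(candidate_tgt_lens)),
        key=lambda i: length_score(src_len, candidate_tgt_lens[i], ratio),
    )
```

In a fuller aligner, scores like these would feed a dynamic-programming search over sentence boundaries, and the resulting pairs would then train the translation model used for the second alignment pass.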

Predicting the quality of automatic translation of an entire document

A system and method predict the translation quality of a translated input document. The method includes receiving an input document pair composed of a plurality of sentence pairs, each sentence pair including a source sentence in a source language and a machine translation of the source language sentence to a target language sentence. For each of the sentence pairs, a representation of the sentence pair is generated, based on a set of features extracted for the sentence pair. Using a generative model, a representation of the input document pair is generated, based on the sentence pair representations. A translation quality of the translated input document is computed, based on the representation of the input document pair.
Owner:XEROX CORP

Code static analysis-based data race detecting method and system thereof

The invention discloses a code static analysis-based data race detecting method and a system of the detecting method. The method comprises the following steps of: reading software to be detected, statically analyzing a source program of the software to be detected, and generating an abstract syntax tree, a control flow graph and a global function call graph of the software to be detected; on that basis, computing alias information in each function, outlet alias information among functions, lock assembly information, an access link of an access escapable variable quantity in each function and a thread building relational graph; computing a plurality of initialized sentence pair sets of a plurality of access nodes in every two threads; and gradually eliminating the sets according to the alias information, the lock assembly information and a concurrency relation to obtain a sentence pair which can finally have the data race. The detecting method and the detecting system can effectively detect the data race of a multi-thread program compiled by C/C++, thereby having the characteristics of high test precision and high automaticity, and being applied to detecting the data race caused by two threads or multiple threads.
Owner:BEIJING UNIV OF POSTS & TELECOMM

Multilingual Translation Database System and An Establishing Method Therefor

A method for building a multilingual translation database system includes the steps of: providing a plurality of multilingual sentence pairs or multilingual sentence-fragment pairs in a translation database, each of the multilingual sentence pairs or the multilingual sentence-fragment pairs formed with a source language and a target language; selecting repeated sentence structures or repeated sentence fragments from the multilingual sentence pairs or the multilingual sentence-fragment pairs; and defining or qualifying at least one repeated key sentence or key sentence fragment of the repeated sentence structures or the repeated key sentence-fragment structures with a predetermined degree of repeat frequency.
Owner:NAT KAOHSIUNG UNIV OF SCI & TECH

Methods for Using Manual Phrase Alignment Data to Generate Translation Models for Statistical Machine Translation

Active; US20090177460A1; High quality word alignment; Improve automatic translation quality; Natural language translation; Special data processing applications; Graphics; Graphical user interface
The present invention adopts the fundamental architecture of a statistical machine translation system which utilizes statistical models learned from the training data and does not require expert knowledge for rule-based machine translation systems. Out of the training parallel data, a certain amount of sentence pairs are selected for manual alignment. These sentences are aligned at the phrase level instead of at the word level. Depending on the size of the training data, the optimal amount for manual alignment may vary. The alignment is done using an alignment tool with a graphical user interface which is convenient and intuitive to the users. Manually aligned data are then utilized to improve the automatic word alignment component. Model combination methods are also introduced to improve the accuracy and the coverage of statistical models for the task of statistical machine translation.
Owner:NANT HLDG IP LLC

Generation method of single document summaries

The invention discloses a method for generating single-document summaries. The method includes the steps of clustering the paragraphs of a document to be summarized and defining each cluster as a semantic block; calculating the similarity of each sentence pair within a semantic block so that each sentence is scored against the others, and defining the highest-scoring sentence as the sentence expressing the core content of that semantic block; and connecting the core sentences in their order of appearance to generate a summary. Word similarity and named entity recognition are introduced into single-document summarization, which raises the precision of summary extraction, and single-pass clustering increases clustering speed. The method is particularly accurate on news and announcement documents.
Owner:NINGBO CHENGDIAN TAIKE ELECTRONICS INFORMATION TECH DEV
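
The selection step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented scoring: word-overlap (Jaccard) similarity stands in for the word-similarity and named-entity features, the semantic blocks are assumed to be already clustered, and the names `jaccard`, `core_sentence` and `summarize` are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two sentences (a stand-in measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def core_sentence(block):
    """Pick the sentence most similar to the rest of its semantic block.

    `block` is a list of (position_in_document, sentence) tuples. Each
    sentence is scored by summing its similarity to every other sentence.
    """
    def score(i):
        return sum(
            jaccard(block[i][1], block[j][1])
            for j in range(len(block))
            if j != i
        )
    return block[max(range(len(block)), key=score)]


def summarize(blocks):
    """Join the core sentence of each block in document order."""
    picks = sorted(core_sentence(block) for block in blocks)
    return " ".join(sentence for _, sentence in picks)
```

Sorting the picked `(position, sentence)` tuples restores the order in which the core sentences appear in the document, regardless of the order in which the blocks were produced by clustering.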

Training-corpus quality evaluation and selection method orienting to statistical-machine translation

Active; CN102945232A; Enriching Sentence Pair Quality Evaluation Features; Realize automatic learning; Special data processing applications; Sentence pair; Machine translation system
The invention relates to a training-corpus quality evaluation and selection method for statistical machine translation. The method comprises the following steps: automatic weight acquisition, in which a small-scale corpus is used to train an automatic weight acquisition model so as to obtain feature weights and a classification threshold; sentence-pair quality evaluation, in which the weights, the classification threshold and the original large-scale parallel corpus are taken as input, and a linear sentence-pair quality evaluation model classifies the large-scale parallel corpus and generates the corpus subsets; and high-quality corpus subset selection, in which, on the basis of the corpus subsets and taking coverage into account, high-quality corpora are selected as training data for a statistical machine translation system. The method provides richer sentence-pair quality evaluation features and learns the feature weights automatically; when the scale of the selected subset reaches 30%, performance can reach 100% or even better; and any input sentence pair can be classified, which helps tasks such as selecting high-quality corpus data.
Owner:沈阳雅译网络技术有限公司

Efficient Online Domain Adaptation

Systems and methods for efficient online domain adaptation are provided herein. Methods may include receiving a post-edited machine translated sentence pair, updating a machine translation model by adjusting translation weights for a translation memory and a language model while generating test machine translations of the machine translated sentence pair until one of the test machine translations approximately matches the post-edits for the machine translated sentence pair, and retranslating the remaining machine translation sentence pairs that have yet to be post-edited using the updated machine translation model.
Owner:SDL INK

Question sentence recommendation method and system

The invention provides a question sentence recommendation method. The method includes the following steps: S1, receiving corpus data, wherein the corpus data are multi-round question and answer data; S2, transforming the corpus data to generate positive example pairs, and generating counter example pairs through random sampling and combination with the corpus data; S3, carrying out word vectorization on the positive example pairs and the counter example pairs through a word2vec model to respectively acquire sentence vector matrices; S4, inputting the sentence vector matrices into a hidden layer, and carrying out a dot product operation on the sentence vector matrices and a weight matrix to obtain new sentence vector matrices; S5, inputting the sentence vector matrices into a convolutional neural network, and carrying out convolution and pooling sampling operations to obtain semantic vectors of the sentences; and S6, carrying out non-linear transformation on the semantic vectors of the sentences, calculating the cosine similarity of the semantic vectors of the positive example sentence pairs and the cosine similarity of the counter example sentence pairs, and finally acquiring a prediction model. The invention also provides a question sentence recommendation system used for realizing the above-mentioned method.
Owner:GUANGZHOU DUOYI NETWORK TECH +2
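
Steps S2 and S6 above lend themselves to a short illustration. The sketch below is an assumption-laden reading of the abstract, not the patented procedure: positive pairs are taken to be consecutive turns of one dialogue, counter examples pair a turn with a randomly sampled utterance from a different dialogue, and the word2vec and convolutional layers (S3 to S5) are left out. The names `build_pairs` and `cosine` are hypothetical, and at least two dialogues are assumed.

```python
import math
import random


def build_pairs(dialogues, rng):
    """Build positive and counter-example sentence pairs (step S2).

    `dialogues` is a list of dialogues, each a list of utterances.
    Assumes at least two dialogues so that a counter example can be drawn.
    """
    positives, negatives = [], []
    for i, turns in enumerate(dialogues):
        # Utterances from every other dialogue serve as the sampling pool.
        others = [u for j, d in enumerate(dialogues) if j != i for u in d]
        for first, second in zip(turns, turns[1:]):
            positives.append((first, second))
            negatives.append((first, rng.choice(others)))
    return positives, negatives


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (step S6)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

A training objective built on these pieces would typically push the cosine similarity of positive pairs above that of counter-example pairs; the abstract does not state which loss is used.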

Word prediction method and system based on a neural machine translation system

Active; CN106844352A; Accurately Obtain Predicted Probabilities; Natural language translation; Neural architectures; Sentence pair; Prediction probability
The invention relates to a word prediction method and system based on a neural machine translation system. The word prediction method includes the following steps: training on a parallel corpus and extracting a phrase translation table from the training result; searching the source language sentence of any parallel sentence pair for matches and determining all source language phrases it contains; looking up, in the phrase translation table, the target phrase translation candidate set corresponding to each source language phrase; obtaining the set of target words that need to be encouraged from the target phrase translation candidate sets and the partial translation already produced by the neural machine translation system; determining the encouragement value of each target word in the set according to the attention probability of the neural machine translation system and the target phrase translation candidate sets; and obtaining the prediction probability of each target word according to its encouragement value. Because the phrase translation table is introduced and the resulting encouragement values are added into the neural translation model, the prediction probability of the target words can be increased.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI

Neural machine translation method by introducing source language block information to encode

The present invention relates to a neural machine translation method for introducing source language block information to encode. The method comprises: inputting bilingual sentence-level parallel data, and carrying out word segmentation on the source language and the target language respectively to obtain bilingual parallel sentence pairs after being subject to word segmentation; encoding the source sentence in the bilingual parallel sentence pairs after being subject to word segmentation according to the time sequence, obtaining the state of each time sequence on the hidden layer of the last layer, and segmenting the input source sentence by blocks; according to the state of each time sequence of the source sentence and the segmentation information of the source sentence, obtaining the block encoding information of the source sentence; combining the time sequence encoding information with the block encoding information to obtain final source sentence memory information; and by dynamically querying the source sentence memory information, using attention mechanism to generate a context vector at each moment through a decoder network, and extracting feature vectors for word prediction. According to the method provided by the present invention, block segmentation is automatically carried out on the source sentence without the need of any pre-divided sentence to participate in the training, and the method can capture the latest and the best block segmentation manner of the source sentence.
Owner:沈阳雅译网络技术有限公司

Method and system for filtering bilingualism corpora

The invention discloses a method for filtering a bilingual corpus, comprising the following steps: A. determining the sentence-length ratio feature value of an English-Chinese bilingual sentence pair; B. counting the number of words of each part of speech in the English-Chinese bilingual sentence pair, calculating how many of those words have matching entries in a bilingual translation dictionary, and determining the translation feature value from the part-of-speech counts and the match counts; C. filtering and classifying the sentence pair by its sentence-length ratio feature value and translation feature value, using a classification model built in advance from a training set. The invention also discloses a corresponding bilingual corpus filtering system. The method and system improve the generality, accuracy and recall of corpus filtering.
Owner:BEIJING KINGSOFT SOFTWARE +2
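
The two features named in this abstract can be illustrated with a small sketch. It makes several simplifying assumptions that are not in the source: fixed thresholds replace the trained classification model, part-of-speech grouping is omitted, sentences arrive already tokenized, and `lexicon` is a hypothetical bilingual dictionary mapping each English word to a set of Chinese translations.

```python
def length_ratio(en_tokens, zh_tokens):
    """Ratio of English to Chinese sentence length (0.0 if Chinese is empty)."""
    return len(en_tokens) / len(zh_tokens) if zh_tokens else 0.0


def dictionary_match_rate(en_tokens, zh_tokens, lexicon):
    """Fraction of English tokens with a dictionary translation that
    actually appears in the Chinese sentence."""
    if not en_tokens:
        return 0.0
    zh = set(zh_tokens)
    hits = sum(1 for word in en_tokens if lexicon.get(word, set()) & zh)
    return hits / len(en_tokens)


def keep_pair(en_tokens, zh_tokens, lexicon,
              ratio_range=(0.5, 2.0), min_match=0.5):
    """Threshold rule standing in for the trained classifier.

    The default thresholds are illustrative values, not taken from the patent.
    """
    ratio = length_ratio(en_tokens, zh_tokens)
    in_range = ratio_range[0] <= ratio <= ratio_range[1]
    return in_range and dictionary_match_rate(en_tokens, zh_tokens, lexicon) >= min_match
```

In the method as described, the two feature values would instead be passed to a classifier trained on labeled sentence pairs, which removes the need to hand-pick thresholds.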

Related word mining method, search method and search system

The invention discloses a related word mining method. The method comprises: obtaining parallel sentence pairs expressing the same meaning with different expression forms based on large-scale user search behavior data; performing word segmentation processing on each set of parallel sentence pairs; performing word alignment processing on the parallel sentence pairs subjected to the word segmentation processing to obtain first aligned word pairs; calculating co-occurrence frequency of the first aligned word pairs; and determining the first aligned word pairs with the co-occurrence frequency higher than a predetermined threshold as related words. With the related word mining method, related words with higher relevancy can be mined, the retrieval word search range also can be expanded, and the probability of finding a better search result is increased. At the same time, the invention furthermore discloses a search method and a search system.
Owner:ALIBABA (CHINA) CO LTD
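
The last two steps of this abstract, counting co-occurrences of aligned word pairs and keeping those above a threshold, can be sketched as below. Word segmentation and word alignment are assumed to have been done already, so the input is a list of aligned word pairs per parallel sentence pair. The function name is hypothetical, and skipping pairs whose two words are identical is an added assumption.

```python
from collections import Counter


def mine_related_words(aligned_pairs_per_sentence, threshold):
    """Return aligned word pairs whose co-occurrence count exceeds `threshold`.

    `aligned_pairs_per_sentence` holds one list of (word_a, word_b) tuples
    per parallel sentence pair, as produced by some word aligner.
    """
    counts = Counter()
    for pairs in aligned_pairs_per_sentence:
        for a, b in pairs:
            if a != b:  # identical words carry no "related word" signal
                # Sort so that (a, b) and (b, a) count as the same pair.
                counts[tuple(sorted((a, b)))] += 1
    return {pair: n for pair, n in counts.items() if n > threshold}
```

For example, with alignments from three query pairs in which "cheap" aligns to "inexpensive" twice and to "budget" once, a threshold of 1 keeps only the ("cheap", "inexpensive") pair.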

An ancient Chinese automatic translation method based on multi-feature fusion

Active; CN109684648A; Increased accuracy; Solve the problem of unregistered words; Natural language translation; Special data processing applications; Word list; Sentence pair
The invention discloses an ancient Chinese automatic translation method based on multi-feature fusion. The method comprises the following steps: 1) collecting a text, modern text translation data of the text, a text word list and modern Chinese monolingual corpus data; 2) cleaning the data and constructing an ancient Chinese parallel corpus by using a sentence alignment method; 3) carrying out word segmentation on the modern text and the ancient text by using a Chinese word segmentation tool; 4) performing topic modeling on the ancient text corpus to generate a topic-word distribution and a word-topic conditional probability distribution; 5) using the modern Chinese monolingual corpus to train a modern Chinese language model, and obtaining an alignment dictionary from the ancient Chinese parallel corpus; 6) on the basis of an attention-based recurrent neural network translation model, fusing statistical machine translation features such as the language model and the alignment dictionary, and training the model with ancient Chinese parallel sentence pairs and word topic sequences; and 7) receiving a text to be translated from a user and obtaining a modern text translation by using the model trained in step 6).
Owner:ZHEJIANG UNIV

Method for constructing Vietnamese dependency tree bank on basis of Chinese-Vietnamese vocabulary alignment corpora

The present invention relates to a method for constructing a Vietnamese dependency tree bank on the basis of Chinese-Vietnamese vocabulary alignment corpora and belongs to the technical field of natural language processing. According to the present invention, firstly, a Chinese-Vietnamese vocabulary alignment sentence pair library is constructed; then a Chinese dependency tree corpus is constructed; and according to the constructed Chinese-Vietnamese vocabulary alignment sentence pair library and Chinese dependency tree corpus, a Vietnamese dependency tree corpus is constructed. The Vietnamese dependency tree bank constructed by the method can provide powerful support for upper layer applications of syntactic analysis, machine translation, information acquisition and the like; a bilingual parallel dependency tree corpus is constructed; according to the method for constructing a dependency tree, which is disclosed by the present invention, the process of manually collecting and labeling the Vietnamese dependency tree bank is simplified and labor and time of constructing the tree bank are saved; and compared with a method adopting a machine to carry out learning, the method for constructing a dependency tree, which is disclosed by the present invention, is obviously improved in accuracy.
Owner:KUNMING UNIV OF SCI & TECH

Seq2seq model-based structured argument generation method and system

The invention discloses a seq2seq model-based structured argument generation method and system. The method comprises the following steps of preprocessing data to establish sentence pairs for an input of a seq2seq model; receiving an input topic and converting words through vector conversion to obtain word vectors; inputting the word vectors to an encoder of the seq2seq model to perform calculation and encoding to obtain a context vector, and adopting an attention mechanism for the context vector; and decoding the context vector processed through the attention mechanism with a decoder of the seq2seq model to generate main and sub arguments related to the input topic, and according to the main and sub arguments, obtaining multiple sub-sub arguments in the main and sub arguments. According to the method, randomly initialized word vectors of an original model can be replaced with pre-trained word vectors; and the attention mechanism is added to the seq2seq model, so that the model reliability is improved and the model accuracy and consistency are effectively ensured.
Owner:CAPITAL NORMAL UNIVERSITY

Error word correction method and device, computer device and storage medium

Active; CN110110041A; Solve the problem of not being able to accurately predict proper words; Natural language data processing; Speech recognition; Data set; Sentence pair
The invention provides an error word correction method and device, a computer device and a storage medium. The error word correction method comprises the steps of obtaining a general natural language data set; converting each sentence contained in the general natural language data set into a Pinyin sequence to obtain Pinyin-sentence pairs of the general natural language data set; performing Pinyin replacement on part of the Pinyin-sentence pairs of the general natural language data set to obtain a first sample set; pre-training the neural network model by using the first sample set to obtain a pre-trained neural network model; acquiring Pinyin-sentence pairs that contain similar Pinyin and are related to a specific field as a second sample set; performing fine tuning on the pre-trained neural network model by using the second sample set to obtain a fine-tuned neural network model; and inputting the Pinyin sequence of the sentence to be corrected into the fine-tuned neural network model for correction to obtain the corrected sentence. According to the invention, error correction can be carried out on special words identified as common words in language identification.
Owner:PING AN TECH (SHENZHEN) CO LTD

Hierarchical structure-based neural-network machine translation model

The invention discloses a hierarchical structure-based neural-network machine translation model, and relates to natural language processing based on deep learning. A word alignment tool GIZA++ is used to carry out word alignment on parallel training sentence pairs, and then source language sentences are divided into clauses of monotonous translation according to punctuation and word alignment information; the above-mentioned obtained clause data are used to train a clause classifier; hierarchical structure modeling is carried out on the source language sentences of the parallel sentence pairs; and hierarchical structure decoding is carried out on target language sentences of the parallel sentence pairs. The sentences are divided into the clauses of monotonous translation, and then word-clauses-sentence hierarchical-modeling, attention mechanisms and decoding are carried out: and a bottom-layer recurrent neural network (GRU) encodes semantic representations of the clauses, an upper-layer recurrent neural network encodes information of the sentences, the bottom-layer attention mechanism is devoted to word-level alignment inside the clauses, and the upper-layer attention mechanism is devoted to clause-level alignment.
Owner:XIAMEN UNIV

Sentence similarity calculating method and device as well as system

The invention discloses a sentence similarity calculating method, device and system. The sentence similarity calculating method comprises the following steps: acquiring a sentence pair whose similarity is to be calculated; building a dependency syntax tree for each sentence in the sentence pair; and calculating the similarity between the sentences in the sentence pair according to a pre-built sentence similarity calculation model and the dependency syntax tree of each sentence. The method can improve the accuracy of sentence similarity calculation.
Owner:IFLYTEK CO LTD

Carapace bone script explanation machine translation method based on example

Inactive; CN102693222A; Realize vernacular interpretation; Lowering the Barriers to Research; Special data processing applications; Sentence pair; Display device
The invention discloses a carapace bone script explanation machine translation method based on an example, which comprises the steps of: (a) building to finish a carapace bone script explanation-modern Chinese language bilingual corpus; (b) finishing the sentence alignment, the phrase alignment and the word alignment of the bilingual corpus, and building a translation example library; (c) inputting a carapace bone script explanation to be translated; (d) on the basis of the translation example library built in step (b), carrying out full-example matching or parts-of-example matching retrieval to the input carapace bone script explanation to be translated; (e) displaying a final translation result to a user via a display; and (f) evaluating the translation result, and adding a bilingual sentence pair which satisfies a paraphrasing requirement into the translation example library. With the carapace bone script explanation machine translation method based on an example, which utilizes the storage and inquiry advantages of the computer, burden of carapace bone script experts is lightened, and a carapace bone script research threshold is lowered.
Owner:熊晶 +4

Method and system for determining intertranslation relationship of bilingual sentence pairs

The invention discloses a method and system for determining an intertranslation relationship of bilingual sentence pairs. The method comprises a step of determining matching feature values of the bilingual sentence pairs, performing filtering and classification on the bilingual sentence pairs according to the weights of the matching feature values in the intertranslation relationship according to a pre-established training classification model, and determining whether the bilingual sentence pairs are bilingual sentence pairs satisfying the requirements of the intertranslation relationship. Therefore, by adoption of the method for determining the intertranslation relationship of the bilingual sentence pairs provided by the embodiment of the invention, a bilingual corpus with a huge data size can be processed quickly and conveniently. The problem of determining the intertranslation relationship of the bilingual sentence pairs is converted into a binary classification problem by using the classification idea of the training classification model, so that the weights of the matching features of the bilingual corpus can be determined more scientifically and reasonably, and compared with the existing experience method, the universality is better, and the accuracy and the recall rate are improved accordingly.
Owner:BEIJING KINGSOFT OFFICE SOFTWARE INC +1

System and method for automatically expanding input text

Provided is a method of automatically expanding input text. The method includes receiving input text composed of a plurality of documents, extracting a sentence pair that is present in different documents among the plurality of documents, setting the extracted sentence pair as an input of an encoder of a sequence-to-sequence model, setting an output of the encoder as an output of a decoder of the sequence-to-sequence model and generating a sentence corresponding to the input, and generating expanded text based on the generated sentence.
Owner:ELECTRONICS & TELECOMM RES INST

Dependency coherence constraint-based automatic alignment method for bilingual words

The invention discloses a dependency coherence constraint-based automatic alignment method for bilingual words. The method comprises the following steps of: performing dependency parsing on a training sentence pair; in a training stage, training a word alignment model based on a dependency coherence constraint between a source language end and a target language end by utilizing the training sentence pair and a dependency syntax tree; and in a test stage, generating word alignment results in line with the dependency coherence constraint between the source language end and the target language end for a test sentence pair by utilizing the word alignment model based on the dependency coherence constraint between the source language end and the target language end, and combining the two word alignment results to generate a word alignment result in line with a bilingual dependency coherence constraint, wherein the word alignment result combines accuracy and a recalling rate. Compared with the prior art, the method is low in word alignment error rate.
Owner:北京中科凡语科技有限公司

Method for improving bilingual corpus, device for improving bilingual corpus, machine translation method and machine translation device

According to one aspect, there is provided an apparatus for improving a bilingual corpus including a plurality of sentence pairs of a first language and a second language and word alignment information of each of the sentence pairs, the apparatus comprises: an extracting unit for extracting a split candidate from word alignment information of a given sentence pair; a calculating unit for calculating split confidence of said split candidate; a comparing unit for comparing said split confidence and a pre-set threshold; and a splitting unit for splitting said given sentence pair at said split candidate in a case that said split confidence is larger than said pre-set threshold.
Owner:KK TOSHIBA

A Chinese semantic matching system and method

The invention relates to a Chinese semantic matching system and a Chinese semantic matching method. The method comprises the following steps of: collecting a public Quora English data set and crawling the required Chinese data set from a network; processing the data; converting the data into input data that can be recognized by a network; constructing a sentence pair semantic feature extraction model based on an attention mechanism and BiLSTM, and processing the input data with the semantic feature extraction model to obtain the semantic features of the input data; and fusing and calculating the extracted semantic features and outputting the predicted results. Compared with the prior art, the invention can better capture more semantic information between two sentence pairs, thereby improving the accuracy of judging problems.
Owner:GUILIN UNIV OF ELECTRONIC TECH