62 results about "Bilingual dictionary" patented technology

A bilingual dictionary or translation dictionary is a specialized dictionary used to translate words or phrases from one language to another. Bilingual dictionaries can be unidirectional, meaning that they list the meanings of words of one language in another, or can be bidirectional, allowing translation to and from both languages. Bidirectional bilingual dictionaries usually consist of two sections, each listing words and phrases of one language alphabetically along with their translation. In addition to the translation, a bilingual dictionary usually indicates the part of speech, gender, verb type, declension model and other grammatical clues to help a non-native speaker use the word. Other features sometimes present in bilingual dictionaries are lists of phrases, usage and style guides, verb tables, maps and grammar references. In contrast to the bilingual dictionary, a monolingual dictionary defines words and phrases instead of translating them.

Intelligent dictionary input method

Status: Inactive | Publication: US20020077808A1 | Tags: Reduce in quantity; Convenient and fast input; Alphabetical characters entering; Substation equipment; Bilingual dictionary; Advisory committee

An intelligent dictionary input method used on a standard keypad device of the International Telegraph and Telephone Consultative Committee (CCITT) for realizing the input of bilingual dictionaries and European letters and implementing intelligent dictionary character input by searching and predicting according to the currently entered characters and the dictionary data. The method comprises the following steps: first, key messages are received from a CCITT keypad. When the pressed keys are the number keys [2] through [9], the keys are then converted into the corresponding letters and numbers according to the CCITT keypad mapping rules. When a letter is entered, the system performs searching in the dictionary database according to the searching range of the converted letters. The searching result is then displayed, completing the process of letter input.

Owner:INVENTEC APPLIANCES CORP
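
The key-to-letter conversion and dictionary search described in the abstract above can be illustrated with a minimal sketch. The letter groups follow the common telephone keypad layout; the toy English-Spanish entries and the function names `word_to_keys` and `predict` are assumptions made for illustration and do not come from the patent.

```python
# Hypothetical sketch: common telephone keypad letter groups for keys 2-9.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Toy bilingual dictionary (English -> Spanish); entries are illustrative only.
DICTIONARY = {
    "cat": "gato",
    "bat": "murciélago",
    "act": "acto",
    "dog": "perro",
    "good": "bueno",
    "home": "casa",
}

# Reverse lookup: letter -> key digit.
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def word_to_keys(word: str) -> str:
    """Return the digit sequence that would be pressed to type `word`."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def predict(keys: str) -> list[tuple[str, str]]:
    """Return (headword, translation) entries whose key sequence starts with `keys`."""
    return sorted(
        (word, translation)
        for word, translation in DICTIONARY.items()
        if word_to_keys(word).startswith(keys)
    )


if __name__ == "__main__":
    # Keys 2-2-8 are ambiguous between "act", "bat" and "cat".
    print(predict("228"))
```

Each additional key press narrows the candidate list, which is the prediction behaviour the abstract describes; a real device would presumably index the dictionary by key sequence rather than scan it.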

System and method for cross-language knowledge searching

Status: Active | Publication: US20070094006A1 | Tags: Easy construction; Natural language translation; Digital data information retrieval; Semantic analyzer; Knowledge extraction

A system and method for cross-language knowledge searching. The system has a Semantic Analyzer, a natural language user request / document search pattern / semantic index Generator, a user request search pattern Translator and a Knowledge Base Searcher. The system also provides automatic semantic analysis and semantic indexing of natural language user requests / documents on knowledge recognition and cross-language relevant to user request knowledge extraction / searching. System functionality is ensured by Linguistic Knowledge Base as well as by a number of unique bilingual dictionaries of concepts / objects and actions.

Owner:ALLIUM US HLDG LLC

Bilingual authoring assistant for the "tip of the tongue" problem

Status: Inactive | Publication: US20060136223A1 | Tags: Natural language translation; Speech analysis; Bilingual dictionary; Context sensitivity

A bilingual authoring apparatus includes a user interface (20) for inputting partially translated text including a text portion in a source language and surrounding or adjacent text in a target language. A bilingual dictionary (34) associates words and phrases in the target language and words and phrases in a source language. A context sensitive translation tool (30, 32, 38) communicates with the user interface, receives the partially translated text, and provides at least one proposed translation in the target language of the text portion in the source language. The at least one proposed translation in the target language is derived from the bilingual dictionary based on contextual analysis of at least a portion of the partially translated text.

Owner:XEROX CORP

Machine translation using learned word associations without referring to a multi-lingual human authored dictionary of content words

Status: Active | Publication: US7356457B2 | Tags: Natural language data processing; Special data processing applications; Bilingual dictionary; Machine translation

A method and computer-readable medium are provided that perform a series of steps associated with machine translation. These steps include using a first text in a first language and a second text in a second language, to produce an association list where words in the first language are associated with words in the second language. A first syntactic structure for a sentence from the first text is aligned with a second syntactic structure for a sentence in the second text based on the association list without referring to a bilingual dictionary of content words. The association list is also used during translations. Specifically, a word in the first language is translated into a word in the second language based on an entry in the association list without referring to a bilingual dictionary that contains content words. Thus, training and translation are performed without the need for a bilingual dictionary of content words.

Owner:MICROSOFT TECH LICENSING LLC
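
To illustrate how an association list might be learned from two texts without a human-authored dictionary, here is a minimal sketch. It assumes sentence-aligned input and uses the Dice coefficient over sentence-level co-occurrence as the association score; the abstract above does not name a scoring function, so both choices, the toy sentences, and the function name `association_list` are assumptions.

```python
from collections import Counter
from itertools import product


def association_list(sentence_pairs):
    """Map each source word to its best-scoring target word and the Dice score.

    `sentence_pairs` is a list of (source_sentence, target_sentence) strings.
    The Dice coefficient is an assumed stand-in for the learned association.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)          # sentences containing each source word
        tgt_count.update(tgt_words)          # sentences containing each target word
        pair_count.update(product(src_words, tgt_words))  # co-occurrences

    best = {}
    for (s, t), together in pair_count.items():
        dice = 2 * together / (src_count[s] + tgt_count[t])
        if s not in best or dice > best[s][1]:
            best[s] = (t, dice)
    return best


# Illustrative English-French sentence pairs.
PAIRS = [
    ("the cat", "le chat"),
    ("the dog", "le chien"),
    ("a cat", "un chat"),
]

if __name__ == "__main__":
    for word, (translation, score) in sorted(association_list(PAIRS).items()):
        print(word, translation, round(score, 2))
```

Translating a word then reduces to a lookup in the returned mapping, which mirrors the abstract's point that the same list serves both training and translation.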

System and method for cross-language knowledge searching

Status: Active | Publication: US7672831B2 | Tags: Easy construction; Natural language translation; Digital data information retrieval; Knowledge extraction; Bilingual dictionary

A system and method for cross-language knowledge searching. The system has a Semantic Analyzer, a natural language user request / document search pattern / semantic index Generator, a user request search pattern Translator and a Knowledge Base Searcher. The system also provides automatic semantic analysis and semantic indexing of natural language user requests / documents on knowledge recognition and cross-language relevant to user request knowledge extraction / searching. System functionality is ensured by Linguistic Knowledge Base as well as by a number of unique bilingual dictionaries of concepts / objects and actions.

Owner:ALLIUM US HLDG LLC

Word alignment apparatus, example sentence bilingual dictionary, word alignment method, and program product for word alignment

Status: Active | Publication: US20070174040A1 | Tags: Natural language translation; Special data processing applications; Bilingual dictionary; Bipartite graph matching

A word alignment apparatus includes a word extracting portion that extracts each word from an example sentence and from a translation sentence thereof, an alignment calculator that calculates at least one of a similarity degree and an association degree between a word in a first language and that in a second language to perform an alignment between words respectively included in the example sentence in the first language and those included in the translation sentence thereof in the second language on the basis of a calculated value, and an optimization portion that optimizes the alignment by performing a bipartite graph matching.

Owner:FUJIFILM BUSINESS INNOVATION CORP
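
The optimization step in the abstract above, choosing a one-to-one word alignment by bipartite graph matching, can be sketched as follows. This is a brute-force search over assignments, adequate only for short sentences; the abstract does not say which matching algorithm is used, and the score matrix below is invented for illustration.

```python
from itertools import permutations


def best_alignment(scores):
    """Return (pairs, total) for the one-to-one alignment with the highest total score.

    scores[i][j] is the similarity or association degree between source word i
    and target word j. Brute force: tries every assignment of source words to
    distinct target words, so it is only practical for short sentences.
    """
    n_src, n_tgt = len(scores), len(scores[0])
    if n_src > n_tgt:
        raise ValueError("this sketch assumes len(source) <= len(target)")
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n_tgt), n_src):
        total = sum(scores[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    return list(enumerate(best_perm)), best_total


SOURCE = ["the", "black", "cat"]
TARGET = ["le", "chat", "noir"]
# Hypothetical association degrees; rows follow SOURCE, columns follow TARGET.
SCORES = [
    [0.9, 0.1, 0.1],  # the
    [0.1, 0.6, 0.5],  # black
    [0.1, 0.8, 0.3],  # cat
]

if __name__ == "__main__":
    pairs, total = best_alignment(SCORES)
    for i, j in pairs:
        print(SOURCE[i], "->", TARGET[j])
```

Picking the best target word for each source word independently would link both "black" and "cat" to "chat"; the matching step resolves that conflict globally. For longer sentences a polynomial-time method such as the Hungarian algorithm would be the usual replacement for the brute-force loop.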

Unilingual translator

Status: Inactive | Publication: US7319949B2 | Tags: Natural language translation; Door/window protective devices; Programming language; Text entry

A machine translator trained with textual inputs generated by other machine translators is disclosed. A textual input in a first language is provided by a user or other source. This textual input is then translated by a first machine translator to generate a translated version of the textual input in a second language. The textual input and the translated version are parsed and passed through a training architecture to develop a transfer mapping, and a bilingual dictionary. These components are then used by a second machine translator when translating other textual inputs.

Owner:MICROSOFT TECH LICENSING LLC

Translation support program and word association program

Status: Inactive | Publication: US20050267734A1 | Tags: Improve translation quality; Easy to associate; Natural language translation; Special data processing applications; Bilingual dictionary; Human language

A translation support program that makes association between words or phrases in an original sentence and a corresponding translation sentence easy. A similar translation example retrieval section extracts similar translation examples similar to an input sentence and rearranges the similar translation examples in order according to their similarity degrees. When a search result display section displays search results, first a word association section performs association between words or phrases in a first language sentence in a first language and a second language sentence in a second language included in each similar translation example by the use of a bilingual dictionary. The similar translation examples are displayed in order of similarity degree. Combinations of three sentences, that is to say, of the input sentence and a first language sentence in the first language and a second language sentence in the second language included in each similar translation example are displayed. Corresponding words or phrases in the input sentence and a first language sentence in the first language and a second language sentence in the second language included in each similar translation example are highlighted.

Owner:FUJITSU LTD

Word alignment apparatus, method, and program product, and example sentence bilingual dictionary

Status: Active | Publication: US8069027B2 | Tags: Natural language translation; Special data processing applications; Bilingual dictionary; Bipartite graph matching

A word alignment apparatus includes a word extracting portion that extracts each word from an example sentence and from a translation sentence thereof, an alignment calculator that calculates at least one of a similarity degree and an association degree between a word in a first language and that in a second language to perform an alignment between words respectively included in the example sentence in the first language and those included in the translation sentence thereof in the second language on the basis of a calculated value, and an optimization portion that optimizes the alignment by performing a bipartite graph matching.

Owner:FUJIFILM BUSINESS INNOVATION CORP

Method and device for aligning sentences in bilingual corpus

Status: Inactive | Publication: CN102855263A | Tags: Simplify the alignment process; Quick fix; Special data processing applications; Bilingual dictionary; Human language

The embodiment of the invention discloses a method and a device for aligning sentences in a bilingual corpus. A source language corpus and a target language corpus in the bilingual corpus are in block alignment. The method comprises the following steps of: aiming at each alignment block in a source language and a target language, generating a candidate translation pair list according to a source keyword list and a target keyword list which are extracted from a source block and a target block respectively; generating a bilingual dictionary according to the translation probability of each translation pair in the candidate translation pair list; expanding the bilingual dictionary by taking a source-target keyword pair in each item in the bilingual dictionary as a seed translation pair in reference to contents of a text of the seed translation pair; translating a source sentence in the source block into a target language, and calculating the similarity between a translation result and a target sentence in the target block; and aligning the source sentence to the target sentence according to the similarity. By the embodiment of the invention, the flow of aligning the sentences can be simplified and the sentence alignment efficiency is improved.

Owner:FUJITSU LTD
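
The final two steps of the sentence alignment method above (translate each source sentence with the bilingual dictionary, then align it to the most similar target sentence) can be sketched as follows. Word-by-word lookup and Jaccard similarity are assumptions chosen for brevity; the abstract does not specify the translation procedure or the similarity measure, and the Spanish-English data is invented.

```python
def translate(sentence, dictionary):
    """Word-by-word translation; words missing from the dictionary are dropped."""
    return {dictionary[w] for w in sentence.lower().split() if w in dictionary}


def jaccard(a, b):
    """Jaccard similarity between two sets of words."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def align(source_block, target_block, dictionary):
    """Return (source_index, target_index, similarity) for each source sentence."""
    alignments = []
    for s_idx, sentence in enumerate(source_block):
        translated = translate(sentence, dictionary)
        scored = [
            (jaccard(translated, set(target.lower().split())), t_idx)
            for t_idx, target in enumerate(target_block)
        ]
        score, t_idx = max(scored)  # most similar target sentence in the block
        alignments.append((s_idx, t_idx, score))
    return alignments


# Illustrative Spanish -> English dictionary and one aligned block.
DICT = {"el": "the", "gato": "cat", "duerme": "sleeps", "perro": "dog", "ladra": "barks"}
SOURCE_BLOCK = ["el gato duerme", "el perro ladra"]
TARGET_BLOCK = ["the dog barks", "the cat sleeps"]

if __name__ == "__main__":
    for s_idx, t_idx, score in align(SOURCE_BLOCK, TARGET_BLOCK, DICT):
        print(SOURCE_BLOCK[s_idx], "<->", TARGET_BLOCK[t_idx], score)
```

Because the corpus is already block-aligned, each source sentence is compared only against the sentences of its own target block, which keeps the search small.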

Multiclass emotion analyzing method and system facing bilingual microblog text

Status: Inactive | Publication: CN104331506A | Tags: Easy to classify; Natural language data processing; Special data processing applications; Class model; Classification methods

The invention relates to a multiclass emotion analysis method and system for bilingual microblog text and belongs to the technical field of microblog text emotion analysis. The method comprises the following steps: (1) bilingual dictionary construction: a corpus of a certain size with emotional inclination is first collected, high-frequency words with emotional inclination are extracted from the corpus, the emotion dictionary is then expanded using an existing knowledge database and a vocabulary similarity calculation model, and finally network language and emotion symbols are added to the emotion dictionary; (2) text preprocessing: the text to be identified is segmented into words, stop words are removed, and English word forms are normalized; (3) text feature space representation: the bilingual emotion dictionary is used to vectorize the text; (4) emotion identification of the corpus text is performed by a multi-class emotion model. The accuracy and F1 value of the method are higher than those of a traditional classification method; in particular, the classification effect of a semi-supervised Gaussian mixture model classification algorithm on a small-scale training set is clearly better than that of the other methods.

Owner:BEIJING INSTITUTE OF TECHNOLOGY

Multilingual text data sorting treatment method

Status: Inactive | Publication: CN103488623A | Tags: Reduce losses; Small resource dependencies; Special data processing applications; Word list; Classification methods

The invention discloses a self-learning sorting method for multilingual data processing. Candidate emotion words are extracted using a first seed word, 'very' in Chinese or in the foreign language; stop words are filtered, with the stop word list obtained automatically from a language database. The emotion words and emotion texts are simultaneously clustered as supporting or opposing using a second seed word 'good' and a third seed word 'bad' (or their foreign-language equivalents). An emotion classifier is built by semi-supervised learning: the initial classifier is trained on confident samples selected from the clustering result, and new samples are added to the training set by fusing the emotion scores of the texts with the posterior probability of the classifier. The method for multilingual opinion analysis is language-independent: it needs neither a machine translation system nor a large-scale bilingual dictionary, the emotion classifier is learned directly on the target language, resource dependence is minimal, and each target language requires only three seed words and no other prior knowledge.

Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI

Statistical machine translation apparatus and method

Status: Inactive | Publication: US20100088085A1 | Tags: Natural language translation; Special data processing applications; Morpheme; Bilingual dictionary

A statistical machine translation apparatus and method reflecting linguistic information are provided. In the process of generating a translation model based on statistical information on source language sentences and target language sentences during word alignment, the translation model is generated using word alignment results that are amended based on a bilingual dictionary. Further, instead of using the source language sentence and the target language sentence (i.e., their bilingual corpora) as materials to generate the translation model, it is determined whether or not the morphemes are meaningful content words in the source and target language sentences. Based on the determination, pre-processing is performed on the source language sentence and the target language sentence.

Owner:SAMSUNG ELECTRONICS CO LTD

Method for matching of bilingual texts and increasing accuracy in translation systems

Status: Inactive | Publication: US20080126074A1 | Tags: Improve accuracy; Accurate similarity; Natural language translation; Special data processing applications; Bilingual dictionary; Matching methods

A method is disclosed for translation of an input sentence in a source language to an output sentence in a target language using a store comprising a plurality of example sentences in the source language each paired with its translation in the target language. The method provides for improved matching of the input text against the store of example sentences by analysing both the sentences in the store and the input sentence using a bilingual resource combining aspects of a bilingual dictionary and thesaurus in order to determine the senses and translations of the words in the input sentence and the examples.

Owner:SHARP KK

Methods and apparatuses for identifying bilingual lexicons in comparable corpora using geometric processing

Status: Active | Publication: US7620539B2 | Tags: Easy to use; Improve usability; Natural language translation; Semantic analysis; Pattern recognition; Bilingual dictionary

Various methods formulated using a geometric interpretation for identifying bilingual pairs in comparable corpora using a bilingual dictionary are disclosed. The methods may be used separately or in combination to compute the similarity between bilingual pairs.

Owner:CONDUENT BUSINESS SERVICES LLC

Bilingual authoring assistant for the “tip of the tongue” problem

Status: Inactive | Publication: US7827026B2 | Tags: Natural language translation; Speech analysis; Software engineering; Bilingual dictionary

A bilingual authoring apparatus includes a user interface (20) for inputting partially translated text including a text portion in a source language and surrounding or adjacent text in a target language. A bilingual dictionary (34) associates words and phrases in the target language and words and phrases in a source language. A context sensitive translation tool (30, 32, 38) communicates with the user interface, receives the partially translated text, and provides at least one proposed translation in the target language of the text portion in the source language. The at least one proposed translation in the target language is derived from the bilingual dictionary based on contextual analysis of at least a portion of the partially translated text.

Owner:XEROX CORP

Method for extracting chapter-level parallel phrase pair of comparable corpus based on parallel corpus training

Status: Active | Publication: CN104391885A | Tags: Solve the scarcity; Improve performance; Natural language translation; Special data processing applications; Support vector machine; Bilingual dictionary

The invention discloses a method for extracting a chapter-level parallel phrase pair of a comparable corpus based on parallel corpus training and relates to a method for extracting the parallel phrase pair of the comparable corpus. The method solves the problems that acquisition of a parallel corpus needs high expenditure, and when two most similar contextual words or fragments are mutually translated and applied to the comparable corpus, serious dependency to a bilingual dictionary is caused. The method comprises the following steps of 1, providing a source language sentence set S and a target language sentence set T; 2, obtaining a phrase pair set of the parallel corpus; 3, obtaining a parallel phrase pair of the parallel corpus; 4, obtaining a non-parallel phrase pair of the parallel corpus; 5, obtaining a binary classifier of a support vector machine; 6, extracting a candidate parallel phrase pair <s, t>; 7, obtaining the parallel phrase pair containing a noise in the comparable corpus; 8, obtaining the parallel phrase pair of the comparable corpus; 9, obtaining an extension decoder. The method is applied to the field of extraction of the parallel phrase pair of the comparable corpus.

Owner:Harbin Institute of Technology High-Tech Development Corporation (哈尔滨工业大学高新技术开发总公司)

Establishing device and method for multilingual dictionary

Status: Inactive | Publication: CN102789461A | Tags: Save manpower and material resources; Guaranteed accuracy; Special data processing applications; Paraphrase; Material resources

The invention provides a multilingual dictionary establishing device which may comprise a monolingual dictionary module, a keyword extraction module, a bilingual dictionary module and a translation confirmation module. The monolingual dictionary module selects words from a preset monolingual dictionary and obtains the paraphrase of each sense of the words; the keyword extraction module extracts key words from the paraphrases; the bilingual dictionary module looks up translation words of the words in a preset bilingual dictionary, wherein one language of the bilingual dictionary is the same as the language of the monolingual dictionary; and the translation confirmation module calculates the similarities of the translation words to the words and the key words, so as to select from the translation words the final translation words corresponding to each sense of the words and generate the multilingual dictionary. The invention further provides a multilingual dictionary establishing method. According to the device and the method, the multilingual dictionary can be established automatically, manpower and material resources for dictionary establishment are saved, the accuracy of the generated multilingual dictionary is guaranteed, and the compilation of the multilingual dictionary can be completed based on ordinary monolingual and bilingual dictionaries.

Owner:FUJITSU LTD

Human body behavior identification method based on thematic knowledge transfer

Status: Active | Publication: CN103500340A | Tags: Low cost; Improve recognition rate; Character and pattern recognition; Human body; Video monitoring

The invention discloses a human body behavior identification method based on thematic knowledge transfer. The method comprises the following steps: a bilingual dictionary for a training visual angle and a test visual angle is built, wherein the bilingual dictionary is used for transforming low-layer features of the same action at the two visual angles to the same representation, and building it includes three steps of low-layer feature extraction, middle-layer feature extraction and bilingual dictionary obtaining; all action videos at the training visual angle are taken, the low-layer features of the different actions at the training visual angle are transformed to representations through the bilingual dictionary, and classification models recognizing the different actions are trained; test action videos at the test visual angle are taken, the low-layer features of the actions at the test visual angle are transformed to representations through the bilingual dictionary, and recognition results for the actions are obtained through the classification models. The method significantly improves the recognition rate of human body behaviors across visual angles, is highly robust to changes of visual angle, and has significant value in video monitoring.

Owner:NANJING UNIV OF POSTS & TELECOMM

Method and device for generating version and machine translation

Status: Inactive | Publication: CN101271452A | Tags: Natural language translation; Special data processing applications; Bilingual dictionary; Optimal combination

The present invention provides a translating method, a machine translating method, a translating device and a machine translating device. According to one aspect, the present invention provides the translating method; wherein, a first-language sentence to be translated is divided into a plurality of sections; a bilingual dictionary for alignment includes pairs of corresponding first-language and second-language example sentences and the alignment information of each pair of example sentences, as well as at least a translation section of each corresponding second-language translation of first-language sections. The method includes that the combination of second-language translation sections is optimally selected from the combination of a plurality of corresponding second-language translation sections to the first-language sentence according to the comprehensive score of the combination of the translation sections based on a plurality of feature functions, and the second-language translation is generated according to the optimal combination of the translation sections.

Owner:KK TOSHIBA

Bilingual dictionary construction method and device

Status: Active | Publication: CN107315741A | Tags: Natural language translation; Neural architectures; Discriminator; Network model

The invention provides a bilingual dictionary construction method and device, which are used for solving the problem of how to automatically construct a bilingual dictionary without depending on a seed bilingual dictionary. The bilingual dictionary construction method comprises the steps of S101, inputting a monolingual corpus A of a language a, inputting a monolingual corpus B of a language b, and representing words in the monolingual corpus A and words in the monolingual corpus B as word vectors; S102, performing training to obtain a mapping relationship between the word vectors of the monolingual corpus A and the word vectors of the monolingual corpus B; and S103, constructing the bilingual dictionary according to the mapping relationship. According to the method and the device, a neural network model consisting of a generator and a discriminator is built from the monolingual word vectors obtained by training the monolingual corpora; and by designing a proper loss function and training technology, the mapping relationship between the word vectors of the two languages is directly obtained, and the bilingual dictionary is constructed, so that the bilingual dictionary can be finished without depending on the seed bilingual dictionary.

Owner:TSINGHUA UNIV
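
Step S103 of the construction method above, building the dictionary from a learned mapping between word vectors, can be sketched as a nearest-neighbour lookup. The sketch assumes the mapping is a linear matrix `W` that has already been obtained (the generator/discriminator training of S102 is not shown), and it uses cosine similarity to pick translations; the abstract specifies neither, and the two-dimensional vectors are invented.

```python
import math


def apply_mapping(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_dictionary(src_vecs, tgt_vecs, W):
    """Map each source word vector with W and pair it with the nearest target word."""
    dictionary = {}
    for word, vec in src_vecs.items():
        mapped = apply_mapping(W, vec)
        dictionary[word] = max(tgt_vecs, key=lambda t: cosine(mapped, tgt_vecs[t]))
    return dictionary


# Hypothetical 2-D embeddings; W is an assumed, already-learned 90-degree rotation.
SRC_VECS = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
TGT_VECS = {"gato": [0.1, 0.9], "perro": [-0.9, 0.1]}
W = [[0.0, -1.0], [1.0, 0.0]]

if __name__ == "__main__":
    print(build_dictionary(SRC_VECS, TGT_VECS, W))
```

With real embeddings the vocabulary is large, so the lookup would normally be vectorised or served by an approximate nearest-neighbour index instead of the plain loop used here.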

Chinese-Vietnamese unsupervised neural machine translation method fusing EMD minimized bilingual dictionary

Status: Active | Publication: CN111753557A | Tags: Improve translation quality; Improve performance; Natural language translation; Web data indexing; Pattern recognition; Single sentence

The invention relates to a Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD minimized bilingual dictionary, and belongs to the technical field of machine translation. The method comprises the steps of collecting corpora by crawling Chinese and Vietnamese monolingual sentences with a web crawler; training monolingual word embeddings for Chinese and Vietnamese respectively, and obtaining a Chinese-Vietnamese bilingual dictionary through training that minimizes the EMD between the word embedding distributions; using the dictionary as a seed dictionary for training to obtain Chinese-Vietnamese bilingual word embeddings; and finally, applying the bilingual word embeddings to an unsupervised machine translation model with a shared encoder to construct the Chinese-Vietnamese neural machine translation method fusing the EMD minimized bilingual dictionary. The method can effectively improve the performance of Chinese-Vietnamese unsupervised neural machine translation.

Owner:KUNMING UNIV OF SCI & TECH

Weak supervision Chinese-Vietnamese bilingual dictionary construction method based on English pivot

Active | CN111310480A | Reduce language differences | Avoid dependence | Natural language translation | Web data indexing | Bilingual dictionary | Chinese word

The invention relates to a weak supervision Chinese-Vietnamese bilingual dictionary construction method based on an English pivot, and belongs to the technical field of natural language processing. The method comprises the following steps: collecting monolingual corpora of Chinese, English and Vietnamese and preprocessing them; aligning the Chinese and Vietnamese word vectors to a shared English word vector space using a seed dictionary method; learning a mapping relationship between the Chinese and Vietnamese word vectors through an adversarial network in the shared English word vector space; and extracting the Chinese-Vietnamese dictionary with different extraction strategies. The method greatly improves the accuracy of automatically constructed Chinese-Vietnamese dictionaries. It addresses two problems of existing Chinese-Vietnamese bilingual dictionary construction methods: parallel corpora and seed dictionaries are very scarce and difficult to label, and the construction results are poor.

Owner:KUNMING UNIV OF SCI & TECH

Method for interactively extracting comparable corpus and bilingual dictionary and device thereof

Active | CN104572634A | Reduce dependency | Improve accuracy | Special data processing applications | Part of speech | Bilingual dictionary

The invention relates to a method for interactively extracting a comparable corpus and a bilingual dictionary and a device thereof. It aims to overcome two difficulties: identifying a comparable corpus when the domain seed bilingual dictionary is too small, and extracting inter-translation vocabulary when documents differ in their degree of comparability. The method comprises the following steps: performing word characteristic reduction, word segmentation and stop word removal on the documents to obtain a preprocessed document set and a vocabulary set; constructing relations between source language documents and target language documents, between source language vocabulary and target language vocabulary, and between bilingual vocabulary pairs and bilingual document pairs; iteratively reinforcing and calculating the weights of the bilingual document pairs and the bilingual vocabulary pairs; and selecting the bilingual document pairs with the largest weights to construct the comparable corpus and the bilingual vocabulary pairs with the largest weights to construct the bilingual dictionary. Similarity between documents in different languages helps judge the similarity between words in different languages, and similarity between words in turn strengthens the similarity between documents, so the comparable corpus and the bilingual dictionary are extracted synchronously through interactive iteration and reinforcement.

Owner:HEFEI INSTITUTES OF PHYSICAL SCIENCE - CHINESE ACAD OF SCI
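
The interactive reinforcement described in the entry above can be sketched as a small mutual-reinforcement loop. The abstract gives no formulas, so the update rules here are assumptions chosen for brevity: a document pair is scored by the average weight of the word pairs it contains, and a word pair is scored by the total weight of the document pairs in which it co-occurs, with both sets of weights rescaled to a maximum of 1 in each round. All names and the toy data are hypothetical.

```python
from itertools import product

def normalise(weights):
    """Rescale weights so the largest is 1.0 (all zeros stay zero)."""
    top = max(weights.values(), default=0.0)
    return {k: (v / top if top else 0.0) for k, v in weights.items()}

def reinforce(src_docs, tgt_docs, seed_pairs, rounds=3):
    """Alternately update document-pair and word-pair weights.

    src_docs / tgt_docs map a document id to a non-empty set of words;
    seed_pairs maps (source word, target word) to an initial weight.
    """
    word_w = dict(seed_pairs)
    doc_w = {}
    for _ in range(rounds):
        # Document pairs are scored by the word pairs they contain.
        doc_w = normalise({
            (d, e): sum(word_w.get((s, t), 0.0)
                        for s, t in product(src_docs[d], tgt_docs[e]))
                    / (len(src_docs[d]) * len(tgt_docs[e]))
            for d, e in product(src_docs, tgt_docs)
        })
        # Word pairs are scored by the document pairs they co-occur in.
        new_w = {}
        for (d, e), w in doc_w.items():
            for s, t in product(src_docs[d], tgt_docs[e]):
                new_w[(s, t)] = new_w.get((s, t), 0.0) + w
        word_w = normalise(new_w)
    return doc_w, word_w

src_docs = {"d1": {"a", "b"}, "d2": {"c"}}
tgt_docs = {"e1": {"x", "y"}, "e2": {"z"}}
doc_w, word_w = reinforce(src_docs, tgt_docs, {("a", "x"): 1.0})
best_doc_pair = max(doc_w, key=doc_w.get)
print(best_doc_pair, word_w[("b", "y")])
```

In this toy run the single seed pair is enough to surface `d1`/`e1` as the most comparable document pair, and that document pair then raises the weight of word pairs the seed dictionary did not contain.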

Vietnamese event entity recognition method fusing dictionary and adversarial migration

Active | CN112926324A | Improve entity recognition | Rich semantic representation | Natural language translation | Semantic analysis | Semantic representation | Data set

The invention relates to a Vietnamese event entity recognition method fusing a dictionary and adversarial migration. Vietnamese is taken as the target language, with English and Chinese each taken as a source language, and entity recognition in the target language is improved by using the entity labeling information of the source languages together with a bilingual dictionary. The method first achieves a shared semantic space for the source and target languages through word-level adversarial migration, then performs multi-granularity feature embedding fused with the bilingual dictionary to enrich the semantic representation of target language words, then extracts language-independent sequence features through sentence-level adversarial migration, and finally labels the entity recognition result with a CRF. Experimental results on a Vietnamese news data set show that, with English and Chinese as source languages, the proposed model recognizes entities better than both a monolingual entity recognition model and current mainstream transfer learning models; compared with the monolingual model, F1 values increase by 19.61 and 18.73 respectively.

Owner:KUNMING UNIV OF SCI & TECH

Lao-Chinese bilingual corpus construction method and device with Thai language as pivot

Active | CN110717341A | Solve the scarcity | Natural language translation | Semantic analysis | Pivot language | Bilingual dictionary

The invention relates to a Lao-Chinese bilingual corpus construction method and device taking Thai as a pivot, and belongs to the field of natural language processing. The method comprises the steps of performing Thai word segmentation on Chinese-Thai parallel corpus data; constructing a Lao-Thai bilingual dictionary and using it to translate Thai sentences word by word into Lao sentence sequences, giving candidate Lao-Thai parallel sentence pairs; constructing a Lao-Thai parallel sentence pair classification model based on a bidirectional LSTM and classifying the candidates to obtain Lao-Thai bilingual parallel sentence pairs; and using Thai as the pivot language to match Lao with Chinese, so that a Lao-Chinese bilingual parallel corpus is constructed. The Lao-Chinese bilingual parallel corpus construction device with Thai as the pivot language addresses the scarcity of Lao-Chinese corpora, and has theoretical significance and practical value for building Lao-Chinese bilingual corpora.

Owner:KUNMING UNIV OF SCI & TECH
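
The pivot pipeline in the entry above (word-by-word dictionary translation, candidate filtering, then pairing through Thai) can be sketched as follows. This is a simplified illustration, not the patented implementation: the trained bidirectional LSTM classifier is replaced by a hypothetical stand-in, `looks_parallel`, that merely rejects candidates containing untranslated words, and the tokens are placeholders rather than real Thai, Lao or Chinese text.

```python
def translate_word_by_word(thai_tokens, thai_to_lao):
    """Look up each Thai token in the dictionary; unknown tokens become None."""
    return [thai_to_lao.get(tok) for tok in thai_tokens]

def looks_parallel(lao_tokens):
    """Stand-in for the sentence-pair classifier: accept only fully translated candidates."""
    return None not in lao_tokens

def build_lao_chinese_corpus(zh_th_pairs, thai_to_lao):
    """Pair each accepted Lao candidate with the Chinese side of its Thai source."""
    corpus = []
    for zh_sentence, thai_tokens in zh_th_pairs:
        lao_tokens = translate_word_by_word(thai_tokens, thai_to_lao)
        if looks_parallel(lao_tokens):
            # Thai is the pivot: its Chinese partner becomes the Lao sentence's partner.
            corpus.append((lao_tokens, zh_sentence))
    return corpus

# Placeholder data: a segmented Chinese-Thai parallel corpus and a Lao-Thai dictionary.
zh_th_pairs = [
    ("zh_sentence_1", ["th_w1", "th_w2"]),
    ("zh_sentence_2", ["th_w3"]),
]
thai_to_lao = {"th_w1": "lo_w1", "th_w2": "lo_w2"}

corpus = build_lao_chinese_corpus(zh_th_pairs, thai_to_lao)
print(corpus)
```

The second sentence is dropped because `th_w3` has no dictionary entry; a real classifier would instead score each candidate pair.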

Mongolian-Chinese machine translation method based on neural network Turing machine

Active | CN110619127A | High precision | High speed | Neural architectures | Special data processing applications | Internal memory | Nerve network

A Mongolian-Chinese machine translation method based on a neural Turing machine comprises the following steps: first, preprocessing and vectorizing a Mongolian-Chinese bilingual corpus, and constructing a bilingual dictionary from it; second, expanding storage with a neural Turing machine (NTM), extending from the internal memory unit of the LSTM to an external memory and introducing a memory mechanism, so as to extract semantic relations and give the semantic relation between two entity words; and finally, searching for an optimal solution through decoder model training. Compared with the prior art, the invention performs semantic analysis with the neural Turing machine, finds and extracts the relevant semantic knowledge, and uses that knowledge to greatly improve the accuracy of natural language processing. Corpora are preprocessed by a CPU and a GPU working in parallel, which nearly doubles the speed, and the overall translation quality is further improved.

Owner:INNER MONGOLIA UNIV OF TECH

Chinese-Vietnamese parallel sentence pair extraction method based on cross-language bilingual pre-training and Bi-LSTM

Pending | CN112287695A | Improve the quality of acquisition | Good effect | Natural language translation | Web data indexing | Nerve network | Sentence pair

The invention relates to a Chinese-Vietnamese parallel sentence pair extraction method based on cross-language bilingual pre-training and Bi-LSTM, and belongs to the technical field of natural language processing. The method comprises the following steps: first, collecting Chinese-Vietnamese comparable corpora from which Chinese-Vietnamese parallel sentence pairs are to be extracted; adding a Chinese-Vietnamese bilingual dictionary and a large amount of Chinese and Vietnamese monolingual data in pre-training, performing word alignment by mapping the Chinese-Vietnamese bilingual dictionary to a common semantic space, and then iteratively generating a new dictionary from the Chinese-Vietnamese seed dictionary in a self-learning manner, so that the semantic similarity between Chinese and Vietnamese sentences is represented to the greatest extent; then inputting the pre-trained Chinese and Vietnamese sentences into a twin (Siamese) neural network composed of a Bi-LSTM and a CNN to extract global and local features of the sentences; and finally, using a fully connected layer to judge whether the input sentence pair is a Chinese-Vietnamese parallel sentence pair. The method achieves good results in experiments on extracting parallel sentence pairs from comparable corpora.

Owner:KUNMING UNIV OF SCI & TECH

Bilingual corpus sentence alignment method and device, readable storage medium and computer equipment

Active | CN111259652A | Simplify complexity | Reduce coupling | Natural language translation | Energy efficient computing | Sentence pair | Engineering

The invention relates to a bilingual corpus sentence alignment method and device, a computer readable storage medium and a computer device. The method comprises the steps of obtaining a to-be-aligned parallel text together with the language types of its original text and translation text; preprocessing the to-be-aligned parallel text to obtain to-be-aligned parallel sentence pairs; calling, from a group of monolingual word segmentation models trained with the SentencePiece algorithm, the models corresponding to the language types of the original text and the translation text, and performing word segmentation to obtain sentence fragment groups for the to-be-aligned original text and the to-be-aligned translation text; formatting those sentence fragment groups according to a preset format processing mode to obtain bilingual sentence pair groups; and calling a sentence alignment tool to align the bilingual sentence pair groups according to the bilingual dictionary, yielding a sentence-aligned parallel corpus. Because the monolingual word segmentation models of all languages are trained with the SentencePiece algorithm, code coupling and maintenance difficulty are reduced, which lowers the maintenance cost.

Owner:TENCENT TECH (SHENZHEN) CO LTD

Method for processing out-of-set words of Chinese-Vietnamese neural machine translation integrated with classification dictionary

Active | CN110457715A | Accurate translation | Will not affect the translation effect | Special data processing applications | Text database clustering/classification | Word list | Algorithm

The invention relates to a method for processing out-of-set (out-of-vocabulary) words in Chinese-Vietnamese neural machine translation integrated with a classification dictionary, and belongs to the technical field of natural language processing. Out-of-set words are classified so that each type can be handled by a different method, and a classification dictionary is built accordingly: a bilingual dictionary handles the translation of rare words outside the word list; an entity dictionary handles inaccurate translation of entity words; and a rule dictionary handles the translation of numbers, symbols, times, dates and similar words. In the preprocessing stage of the model, out-of-set words are recognized by querying the classification dictionary, and they are replaced with labels at the encoding end of the model. After model translation, a translation result containing labels is obtained, and the labels are restored to translations by querying the classification dictionary again. With the classification dictionary fused into neural machine translation, out-of-set words are translated more accurately, and the performance of the neural machine translation system improves.

Owner:KUNMING UNIV OF SCI & TECH
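
The label replacement and recovery described in the entry above can be illustrated with a short sketch. It is a simplification under stated assumptions: the three dictionaries are merged into one hypothetical lookup table `class_dict` that maps a source word to a category and a translation, the rule dictionary is reduced to plain lookup rather than real rules, the label format `<CATEGORY_n>` is invented for this example, and the translation model is assumed to copy labels through unchanged.

```python
def replace_oov(tokens, vocab, class_dict):
    """Swap out-of-vocabulary tokens found in the dictionary for numbered labels."""
    labelled, slots = [], {}
    for tok in tokens:
        if tok not in vocab and tok in class_dict:
            category, translation = class_dict[tok]
            label = f"<{category}_{len(slots)}>"
            slots[label] = translation  # remember what the label stands for
            labelled.append(label)
        else:
            labelled.append(tok)
    return labelled, slots

def restore_labels(translated_tokens, slots):
    """Replace each label in the model output with its stored translation."""
    return [slots.get(tok, tok) for tok in translated_tokens]

# Placeholder data: source-side vocabulary and a merged classification dictionary.
vocab = {"i", "visited", "on"}
class_dict = {
    "hanoi": ("ENT", "tgt_hanoi"),        # entity dictionary entry
    "2019-05-01": ("RULE", "tgt_date"),   # rule dictionary entry, reduced to lookup
}

labelled, slots = replace_oov(["i", "visited", "hanoi", "on", "2019-05-01"], vocab, class_dict)
# A real system would translate `labelled` here; this sketch passes it through as-is.
restored = restore_labels(labelled, slots)
print(labelled)
print(restored)
```

Numbering the labels keeps two out-of-vocabulary words of the same category distinct when they are restored.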
