
61 results about "Bilingual dictionary" patented technology

A bilingual dictionary or translation dictionary is a specialized dictionary used to translate words or phrases from one language to another. Bilingual dictionaries can be unidirectional, meaning that they list the meanings of words of one language in another, or can be bidirectional, allowing translation to and from both languages. Bidirectional bilingual dictionaries usually consist of two sections, each listing words and phrases of one language alphabetically along with their translation. In addition to the translation, a bilingual dictionary usually indicates the part of speech, gender, verb type, declension model and other grammatical clues to help a non-native speaker use the word. Other features sometimes present in bilingual dictionaries are lists of phrases, usage and style guides, verb tables, maps and grammar references. In contrast to the bilingual dictionary, a monolingual dictionary defines words and phrases instead of translating them.

Multiclass emotion analyzing method and system facing bilingual microblog text

The invention relates to a multiclass emotion analysis method and system for bilingual microblog text and belongs to the technical field of microblog text emotion analysis. The method comprises the following steps: (1) bilingual dictionary construction: a corpus of a certain size with emotional inclination is first collected, high-frequency words with emotional inclination are extracted from the corpus, the emotional dictionary is then expanded using an existing knowledge base and a vocabulary similarity calculation model, and finally network language and emotion symbols are added to the emotional dictionary; (2) text preprocessing: the to-be-identified text is segmented into words, stop words are removed, and English word forms are normalized; (3) text feature space representation: the bilingual emotional dictionary is used to vectorize the text; (4) emotion recognition: the emotion recognition task on the corpus text is carried out through a multi-emotion classification model. The accuracy and the F1 value of the method are higher than those of traditional classification methods; in particular, the classification performance of a semi-supervised Gaussian mixture model classification algorithm on a small-scale training set is clearly better than that of the other methods.
Owner:BEIJING INSTITUTE OF TECHNOLOGY
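
Step (3) of the entry above, vectorizing a text with the bilingual emotional dictionary, might look roughly like the following sketch. The lexicon entries, the three emotion classes, and the function name `vectorize` are illustrative assumptions, not details taken from the patent, and the input is assumed to be tokenized already.

```python
from collections import Counter

# Hypothetical bilingual emotion lexicon: word -> emotion class.
EMOTION_LEXICON = {
    "happy": "joy", "开心": "joy",
    "angry": "anger", "生气": "anger",
    "sad": "sadness", "难过": "sadness",
}
EMOTION_CLASSES = ["joy", "anger", "sadness"]


def vectorize(tokens):
    """Count lexicon hits per emotion class for an already-tokenized text."""
    counts = Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)
    return [counts[c] for c in EMOTION_CLASSES]


# Mixed Chinese/English microblog tokens (toy example).
vec = vectorize(["今天", "很", "开心", "so", "happy", "not", "sad"])
```

A vector like `vec` would then be passed to whatever multi-class emotion model is used in step (4); the classifier itself is outside this sketch.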

Method for extracting chapter-level parallel phrase pair of comparable corpus based on parallel corpus training

The invention discloses a method for extracting chapter-level parallel phrase pairs from a comparable corpus based on parallel corpus training and relates to a method for extracting parallel phrase pairs from a comparable corpus. The method addresses two problems: acquiring a parallel corpus is expensive, and approaches that treat the two words or fragments with the most similar contexts as mutual translations depend heavily on a bilingual dictionary when applied to a comparable corpus. The method comprises the following steps: 1, providing a source language sentence set S and a target language sentence set T; 2, obtaining a phrase pair set of the parallel corpus; 3, obtaining the parallel phrase pairs of the parallel corpus; 4, obtaining the non-parallel phrase pairs of the parallel corpus; 5, obtaining a support vector machine binary classifier; 6, extracting candidate parallel phrase pairs <s, t>; 7, obtaining the noisy parallel phrase pairs in the comparable corpus; 8, obtaining the parallel phrase pairs of the comparable corpus; 9, obtaining an extension decoder. The method is applied to the field of parallel phrase pair extraction from comparable corpora.
Owner:哈尔滨工业大学高新技术开发总公司

Establishing device and method for multilingual dictionary

The invention provides a multilingual dictionary establishing device which may comprise a monolingual dictionary module, a keyword extraction module, a bilingual dictionary module and a translation confirmation module. The monolingual dictionary module selects words from a preset monolingual dictionary and obtains the paraphrase of each sense of those words; the keyword extraction module extracts key words from the paraphrases; the bilingual dictionary module looks up translations of the words in a preset bilingual dictionary, wherein one language of the bilingual dictionary is the same as the language of the monolingual dictionary; and the translation confirmation module calculates the similarity of each translation to the words and the key words, selects from the translations the final translation for each sense of the words, and generates the multilingual dictionary. The invention further provides a multilingual dictionary establishing method. With the device and the method, the multilingual dictionary can be established automatically, manpower and material resources for dictionary establishment are saved, the accuracy of the generated multilingual dictionary is guaranteed, and the compilation of the multilingual dictionary can be completed based on ordinary monolingual and bilingual dictionaries.
Owner:FUJITSU LTD
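
The translation confirmation step in the entry above, choosing one translation per sense by similarity to the sense's key words, can be sketched as follows. The abstract does not say how similarity is computed; Jaccard overlap is swapped in here as a simple stand-in, and the toy data (each candidate translation paired with related source-language words) is hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two word collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def pick_translation(sense_keywords, candidates):
    """Pick the candidate whose related words best overlap the sense key words.

    candidates maps a translation to related source-language words
    (a hypothetical representation, e.g. back-translations).
    """
    return max(candidates, key=lambda t: jaccard(sense_keywords, candidates[t]))


# Toy data: the English word "bank" with two senses and two French candidates.
senses = {
    "bank#1": ["money", "financial", "institution"],
    "bank#2": ["river", "land", "edge"],
}
candidates = {
    "banque": ["money", "financial", "deposit"],
    "rive": ["river", "edge", "shore"],
}
chosen = {sense: pick_translation(kw, candidates) for sense, kw in senses.items()}
```

Here `chosen` maps each sense to a single translation; a real system would likely use a richer similarity measure than set overlap.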

Human body behavior identification method based on thematic knowledge transfer

The invention discloses a human body behavior identification method based on thematic knowledge transfer. The method comprises the following steps: a bilingual dictionary covering a training visual angle and a test visual angle is built, wherein the bilingual dictionary is used for transforming low-layer features of the same action at the two visual angles to the same representation, and building it includes three steps: low-layer feature extraction, middle-layer feature extraction and bilingual dictionary obtaining; using all action videos at the training visual angle, low-layer features of the different actions are transformed to representations through the bilingual dictionary, and classification models recognizing the different actions are trained; using test action videos at the test visual angle, low-layer features of the actions are transformed to representations through the bilingual dictionary, and recognition results for the actions are obtained through the classification models. The method significantly improves the recognition rate of human body behaviors across visual angles, is highly robust to changes in visual angle, and has significant value in video monitoring.
Owner:NANJING UNIV OF POSTS & TELECOMM

Method for interactively extracting comparable corpus and bilingual dictionary and device thereof

The invention relates to a method for interactively extracting a comparable corpus and a bilingual dictionary and a device thereof, and aims to overcome two difficulties: identifying a comparable corpus when the domain seed bilingual dictionary is too small, and extracting inter-translated vocabulary when documents differ in their degree of comparability. The method comprises the following steps: performing word form reduction, word segmentation and stop word removal on the documents to obtain a preprocessed document set and a vocabulary set; constructing relations between source language documents and target language documents, between source language vocabulary and target language vocabulary, and between bilingual vocabulary pairs and bilingual document pairs respectively; iteratively reinforcing and calculating the weights of the bilingual document pairs and the bilingual vocabulary pairs; selecting the bilingual document pairs with the largest weights to construct the comparable corpus, and selecting the bilingual vocabulary pairs with the largest weights to construct the bilingual dictionary. Similarity between documents in different languages helps judge the similarity between words in different languages, similarity between words in turn strengthens the similarity between documents, and synchronous extraction of the comparable corpus and the bilingual dictionary is realized through this interactive iteration and reinforcement.
Owner:HEFEI INSTITUTES OF PHYSICAL SCIENCE - CHINESE ACAD OF SCI
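
The mutual reinforcement between document-pair weights and vocabulary-pair weights described in the entry above can be illustrated with a small sketch. The update rule below (sum the weights on the other side, then normalize) is an assumption chosen for simplicity, not the patent's formula; the function name, initial weights, and toy documents are likewise hypothetical.

```python
def reinforce(src_docs, tgt_docs, seed_pairs, iterations=5):
    """src_docs/tgt_docs: doc id -> set of words. seed_pairs: set of (src, tgt) words."""
    src_vocab = set().union(*src_docs.values())
    tgt_vocab = set().union(*tgt_docs.values())
    # Initial word-pair weights: 1.0 for seed dictionary entries, 0.1 otherwise.
    word_w = {(s, t): (1.0 if (s, t) in seed_pairs else 0.1)
              for s in src_vocab for t in tgt_vocab}
    doc_w = {}
    for _ in range(iterations):
        # Document-pair weight: total weight of the word pairs the two documents contain.
        doc_w = {(d, e): sum(word_w[s, t] for s in src_docs[d] for t in tgt_docs[e])
                 for d in src_docs for e in tgt_docs}
        total = sum(doc_w.values())
        doc_w = {k: v / total for k, v in doc_w.items()}
        # Word-pair weight: total weight of the document pairs where both words occur.
        word_w = {(s, t): sum(w for (d, e), w in doc_w.items()
                              if s in src_docs[d] and t in tgt_docs[e])
                  for (s, t) in word_w}
        total = sum(word_w.values())
        word_w = {k: v / total for k, v in word_w.items()}
    return doc_w, word_w


# Toy comparable documents and a two-entry seed dictionary.
src_docs = {"zh1": {"猫", "鱼"}, "zh2": {"车", "路"}}
tgt_docs = {"en1": {"cat", "fish"}, "en2": {"car", "road"}}
doc_w, word_w = reinforce(src_docs, tgt_docs, {("猫", "cat"), ("车", "car")})
```

In this toy run the non-seed pair ("鱼", "fish") ends up weighted above ("鱼", "car") because its documents are linked by a seed entry; the highest-weighted pairs on each side would then be selected for the corpus and the dictionary.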

Vietnamese event entity recognition method fusing dictionary and adversarial migration

Active · CN112926324A · Improve entity recognition · Rich semantic representation · Natural language translation · Semantic analysis · Semantic representation · Data set
The invention relates to a Vietnamese event entity recognition method fusing a dictionary and adversarial migration. Vietnamese is taken as the target language, English and Chinese are respectively taken as the source languages, and the entity recognition performance on the target language is improved by utilizing the entity labeling information of the source languages and the bilingual dictionary. In the method, semantic space sharing between a source language and the target language is first achieved through word-level adversarial migration; multi-granularity feature embedding is then conducted by fusing a bilingual dictionary to enrich the semantic representation of target language words; language-independent sequence features are then extracted through sentence-level adversarial migration; and finally the entity recognition result is labeled through a CRF. Experimental results on a Vietnamese news data set show that, with English and Chinese as source languages, the proposed model improves entity recognition over both a monolingual entity recognition model and current mainstream transfer learning models; compared with the monolingual entity recognition model, F1 values increase by 19.61 and 18.73 respectively.
Owner:KUNMING UNIV OF SCI & TECH

Lao-Chinese bilingual corpus construction method and device with Thai language as pivot

The invention relates to a Lao-Chinese bilingual corpus construction method and device taking the Thai language as a pivot, and belongs to the field of natural language processing. The method comprises the steps of first performing Thai word segmentation on Chinese-Thai parallel corpus data; constructing a Lao-Thai bilingual dictionary, and translating Thai sentences into Lao sentence sequences word by word using the Lao-Thai bilingual dictionary to obtain candidate Lao-Thai parallel sentence pairs; constructing a bidirectional-LSTM-based Lao-Thai parallel sentence pair classification model, and classifying the candidate Lao-Thai parallel sentence pairs to obtain Lao-Thai bilingual parallel sentence pairs; and using the Thai language as a pivot language to match the Lao language and the Chinese language, so that a Lao-Chinese bilingual parallel corpus is constructed. The Lao-Chinese bilingual parallel corpus construction device taking Thai as the pivot language addresses the scarcity of Lao-Chinese corpora and has theoretical significance and practical application value for the construction of Lao-Chinese bilingual corpora.
Owner:KUNMING UNIV OF SCI & TECH
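
The pivot idea in the entry above (translate Thai into Lao word by word, filter the candidates, then let each Lao sentence inherit the Chinese side of its Thai source) can be sketched as below. The tokens are placeholders rather than real Thai or Lao words, and the dictionary-coverage threshold is a deliberately simple stand-in for the bidirectional-LSTM pair classifier, which is not reproduced here.

```python
# Toy Thai -> Lao dictionary; tokens are placeholders, not real words.
THAI_TO_LAO = {"th_cat": "lo_cat", "th_eats": "lo_eats", "th_fish": "lo_fish"}


def translate_word_by_word(thai_tokens):
    """Return the Lao tokens plus the fraction of Thai tokens found in the dictionary."""
    lao = [THAI_TO_LAO[t] for t in thai_tokens if t in THAI_TO_LAO]
    coverage = len(lao) / len(thai_tokens) if thai_tokens else 0.0
    return lao, coverage


def build_lao_chinese(chinese_thai_pairs, min_coverage=0.8):
    """Pivot through Thai: a kept Lao sentence is paired with its Thai source's Chinese side."""
    corpus = []
    for chinese, thai_tokens in chinese_thai_pairs:
        lao, coverage = translate_word_by_word(thai_tokens)
        # Assumption: a coverage threshold replaces the learned pair classifier.
        if coverage >= min_coverage:
            corpus.append((" ".join(lao), chinese))
    return corpus


# Chinese-Thai parallel data with the Thai side already segmented (toy example).
pairs = [
    ("猫吃鱼", ["th_cat", "th_eats", "th_fish"]),
    ("未知句子", ["th_unknown", "th_cat"]),
]
corpus = build_lao_chinese(pairs)
```

The second pair is dropped because only half of its Thai tokens are covered by the dictionary.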

Chinese-Vietnamese parallel sentence pair extraction method based on cross-language bilingual pre-training and Bi-LSTM

The invention relates to a Chinese-Vietnamese parallel sentence pair extraction method based on cross-language bilingual pre-training and Bi-LSTM and belongs to the technical field of natural language processing. The method comprises the following steps: first, collecting Chinese-Vietnamese comparable corpora from which Chinese-Vietnamese parallel sentence pairs are to be extracted; adding a Chinese-Vietnamese bilingual dictionary and a large amount of Chinese and Vietnamese monolingual data in pre-training, performing word alignment by mapping the Chinese-Vietnamese bilingual dictionary to a common semantic space, and then iteratively generating a new dictionary from the Chinese-Vietnamese seed dictionary in a self-learning mode, so that the semantic similarity between Chinese and Vietnamese sentences is represented to the maximum extent; then inputting the Chinese and Vietnamese sentences obtained after pre-training into a twin (Siamese) neural network composed of Bi-LSTM and CNN, and extracting global features and local features of the sentences; and finally, judging whether the input sentence pair is a Chinese-Vietnamese bilingual parallel sentence pair by using a fully connected layer. Good results are achieved in an experiment on extracting parallel sentence pairs from comparable corpora.
Owner:KUNMING UNIV OF SCI & TECH

Bilingual corpus sentence alignment method and device, readable storage medium and computer equipment

The invention relates to a bilingual corpus sentence alignment method and device, a computer readable storage medium and a computer device. The method comprises: obtaining a to-be-aligned parallel text and the language types of its original text and translation text; preprocessing the to-be-aligned parallel text to obtain to-be-aligned parallel sentence pairs; calling the monolingual word segmentation models corresponding to the language types of the original text and the translation text from a group of monolingual word segmentation models trained through the SentencePiece algorithm, and performing word segmentation to obtain a sentence segment group of the to-be-aligned original text and a sentence segment group of the to-be-aligned translation text; performing format processing on the two sentence segment groups according to a preset format processing mode to obtain bilingual sentence pair groups; and calling a sentence alignment tool to align the bilingual sentence pair groups according to the bilingual dictionary, obtaining a sentence-aligned parallel corpus. Because the monolingual word segmentation models of all languages are trained through the SentencePiece algorithm, code coupling and maintenance difficulty are reduced, which lowers the maintenance cost.
Owner:TENCENT TECH (SHENZHEN) CO LTD

Method for processing out-of-vocabulary words in Chinese-Vietnamese neural machine translation integrated with a classification dictionary

The invention relates to a method for processing out-of-vocabulary words in Chinese-Vietnamese neural machine translation integrated with a classification dictionary, and belongs to the technical field of natural language processing. Out-of-vocabulary words are classified so that different types can be processed with different methods. A classification dictionary is built in a targeted manner: a bilingual dictionary is used for the translation of rare words outside the word list; an entity dictionary is used for inaccurately translated entity words; and a rule dictionary is used for the translation of numbers, symbols, times, dates and similar words. In the preprocessing stage of the model, out-of-vocabulary words are recognized by querying the classification dictionary. Label replacement is performed on the out-of-vocabulary words at the encoding end of the model. A translation result containing labels is obtained after model translation, and the labels are restored to translations by querying the classification dictionary. With the classification dictionary fused into neural machine translation, out-of-vocabulary words can be translated more accurately and the performance of the neural machine translation system is improved.
Owner:KUNMING UNIV OF SCI & TECH
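
The label replacement and recovery flow in the entry above can be sketched as follows: out-of-set (out-of-vocabulary) tokens found in the classification dictionary are swapped for category labels before translation, and the labels are swapped back for dictionary translations afterwards. The dictionary entries, label format, and `fake_translate` stub are all hypothetical; the stub merely stands in for the neural translation model, and the rule dictionary for numbers and dates is omitted.

```python
# Hypothetical classification dictionary: source token -> (category, target translation).
CLASS_DICT = {
    "src_rare": ("RARE", "tgt_rare"),
    "src_city": ("ENT", "tgt_city"),
}


def mask_oov(tokens, vocab):
    """Replace out-of-vocabulary tokens found in the dictionary with numbered labels."""
    masked, slots = [], {}
    for tok in tokens:
        if tok not in vocab and tok in CLASS_DICT:
            category, translation = CLASS_DICT[tok]
            label = f"<{category}{len(slots)}>"
            slots[label] = translation  # remembered for recovery after translation
            masked.append(label)
        else:
            masked.append(tok)
    return masked, slots


def restore(translated_tokens, slots):
    """Swap each label in the model output for its dictionary translation."""
    return [slots.get(tok, tok) for tok in translated_tokens]


def fake_translate(tokens):
    """Stand-in for the NMT model: copies labels through, uppercases everything else."""
    return [t if t.startswith("<") else t.upper() for t in tokens]


vocab = {"i", "visit"}
masked, slots = mask_oov(["i", "visit", "src_city", "src_rare"], vocab)
output = restore(fake_translate(masked), slots)
```

Numbering the labels keeps two out-of-vocabulary words of the same category distinguishable when they are restored.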