Chinese-English cross-lingual lexical representation learning method and system based on paraphrase primitives

A learning method based on paraphrase primitive words, applied in natural language translation, semantic analysis, special data processing applications, etc. It addresses the low accuracy of existing representation learning techniques and their inability to accurately express the semantic information of words.

Active Publication Date: 2019-03-01
IOL WUHAN INFORMATION TECH CO LTD
Cites: 7; Cited by: 9

AI Technical Summary

Problems solved by technology

[0007] The technical problem to be solved by the present invention is to provide a Chinese-English cross-language vocabul


Embodiment Construction

[0027] In order to have a clearer understanding of the technical features, purposes and effects of the present invention, the specific implementation manners of the present invention will now be described in detail with reference to the accompanying drawings.

[0028] A Chinese-English cross-lingual vocabulary representation learning method based on paraphrase primitive words, as shown in Figure 1, includes the following five steps:

[0029] Step 1, extract Chinese paraphrase primitive words: decompose all paraphrases in the preset Chinese dictionary (that is, the definition sentences that explain the words in the Chinese dictionary). Using the method described in Zhang Jin and Huang Changning's "A method for obtaining definition primitives from a monolingual dictionary", the Chinese vocabulary (that is, all the words that appear in the Chinese dictionary, including the words that are explained in the dictionary and the words that only appear in the explanation senten...
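The notion of "Chinese vocabulary" used in Step 1 (headwords plus words that occur only inside definition sentences) can be illustrated with a small sketch. This is not the Zhang and Huang extraction method, whose details are not given here; the function name `collect_vocabulary`, the toy dictionary, and the assumption that definitions are already segmented into tokens are all hypothetical.

```python
def collect_vocabulary(dictionary):
    """Split a toy dictionary's words into headwords and definition-only words.

    `dictionary` maps each headword to a list of definition tokens.
    Word segmentation is assumed to have been done elsewhere.
    """
    headwords = set(dictionary)
    definition_words = {tok for tokens in dictionary.values() for tok in tokens}
    vocabulary = headwords | definition_words        # every word seen anywhere
    definition_only = definition_words - headwords   # never explained themselves
    return vocabulary, headwords, definition_only


# Hypothetical three-entry dictionary, for illustration only.
toy_dictionary = {
    "母亲": ["女性", "父母"],
    "父亲": ["男性", "父母"],
    "父母": ["父亲", "和", "母亲"],
}

vocabulary, headwords, definition_only = collect_vocabulary(toy_dictionary)
print(len(vocabulary), sorted(definition_only))
```

Words that appear only in definitions are natural starting candidates for a primitive set, since the dictionary relies on them without explaining them.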

Abstract

The invention discloses a Chinese-English cross-lingual vocabulary representation learning method and system based on paraphrase primitive words. It represents the vocabulary of both Chinese and English as vectors in the same vector space and obtains more accurate word embeddings that incorporate semantic information. First, a set of paraphrase primitives is obtained by processing the paraphrase relations in a Chinese dictionary, so that the words in the set cover all the lexical semantics in the dictionary. Second, all the words in the Chinese and English dictionaries are expressed through the vector representations of the paraphrase primitives. Finally, drawing on the context and semantic relations in Chinese and English corpora, weights are set on the paraphrase primitives in each word's representation to obtain more accurate word embeddings that reflect semantic relations. Compared with existing word embeddings, the invention offers high embedding accuracy, strong extensibility and convenient implementation, and can better serve downstream natural language processing tasks.
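The idea of expressing every word through weighted primitive vectors can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the helper names `expand_to_primitives` and `compose_vector`, the weighted-average composition rule, the per-primitive `weights` dictionary, and the toy primitives, vectors and definitions (including the mapping of the English word onto Chinese primitives) are all hypothetical.

```python
def expand_to_primitives(word, dictionary, primitives, _seen=frozenset()):
    """Recursively replace a word by its definition words until only primitives remain."""
    if word in primitives:
        return [word]
    if word in _seen or word not in dictionary:
        return []  # circular or undefined word: contributes nothing in this sketch
    result = []
    for tok in dictionary[word]:
        result.extend(expand_to_primitives(tok, dictionary, primitives, _seen | {word}))
    return result


def compose_vector(word, dictionary, primitives, primitive_vectors, weights=None):
    """Return the weighted average of the primitive vectors reached from `word`."""
    weights = weights or {}
    prims = expand_to_primitives(word, dictionary, primitives)
    weight_sum = sum(weights.get(p, 1.0) for p in prims)
    if not prims or weight_sum == 0:
        raise ValueError(f"cannot compose a vector for {word!r}")
    dim = len(primitive_vectors[prims[0]])
    total = [0.0] * dim
    for p in prims:
        w = weights.get(p, 1.0)  # unweighted primitives default to 1.0
        for i, x in enumerate(primitive_vectors[p]):
            total[i] += w * x
    return [x / weight_sum for x in total]


# Toy data, assumed for illustration only.
primitives = {"女", "男", "亲"}
primitive_vectors = {"女": [1.0, 0.0, 0.0], "男": [0.0, 1.0, 0.0], "亲": [0.0, 0.0, 1.0]}
definitions = {"母亲": ["女", "亲"], "mother": ["女", "亲"], "father": ["男", "亲"]}

zh_vec = compose_vector("母亲", definitions, primitives, primitive_vectors)
en_vec = compose_vector("mother", definitions, primitives, primitive_vectors)
print(zh_vec, en_vec)
```

Because both languages are composed from the same primitive vectors, a Chinese word and an English word with the same primitive decomposition land on the same point in the shared space; changing the weights shifts a word toward particular primitives.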

Description

Technical field

[0001] The invention relates to a Chinese-English cross-lingual vocabulary representation learning method and system based on paraphrase primitive words.

Background

[0002] Word embedding refers to the use of distributed vectors to represent the semantic information of words. By mapping the vocabulary of a natural language to low-dimensional, dense vectors, all words are placed in the same vector space, and the concept of "distance" is introduced to measure the semantic similarity between words, which helps to obtain vector representations that express semantic information more fully. At present, most deep-learning-based natural language processing is built on word embedding representations.

[0003] Research on core words in dictionary definitions has produced many results worldwide. For example, "English Teaching Dictionary" (4th edition) edited by West et al. selected 1409 words to explain 24000 items...
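The "distance" idea in paragraph [0002] can be made concrete with cosine similarity, one common choice of measure; the text does not say which measure the invention uses, so this choice is an assumption. The three-dimensional vectors below are made up for illustration and do not come from any trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical embeddings in one shared Chinese-English space.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "猫": [0.8, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9],
}

same_meaning = cosine_similarity(embeddings["cat"], embeddings["猫"])
different_meaning = cosine_similarity(embeddings["cat"], embeddings["car"])
print(same_meaning > different_meaning)
```

In a well-formed cross-lingual space, a word and its translation should score closer to 1 than an unrelated pair does, which is what the comparison above checks on the toy data.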


Application Information

IPC(8): G06F17/27, G06F17/28
CPC: G06F40/30, G06F40/40
Inventor: 梁庆中, 姚宏, 李兵, 郑坤, 刘超, 董理君
Owner: IOL WUHAN INFORMATION TECH CO LTD