Semantic item representation and disambiguation method based on word statistics and WordNet

A technology for word-sense disambiguation, applied in computing, special data processing applications, instruments, etc., that addresses problems such as reduced accuracy of semantic calculations

Pending Publication Date: 2019-12-13
芽米科技(广州)有限公司

AI Technical Summary

Problems solved by technology

[0004] The problem the present invention addresses is that each word can use only a single word vector for semantic calculation across different language environments, which greatly reduces the accuracy of semantic calculation. The invention provides a semantic item representation and disambiguation method based on word statistics and WordNet.

Examples

Embodiment Construction

[0041] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with specific examples.

[0042] A semantic item representation and disambiguation method based on word statistics and WordNet, as shown in Figure 1, specifically includes the following steps:

[0043] First, the offline page files of Wikipedia are obtained. Illegal characters in them are converted into spaces, images and tables are deleted with only their titles retained, link text is retained in the body, and finally only plain text containing a-z (characters in the A-Z range are converted to lowercase) and digits is left. After cleaning is completed, a co-occurrence matrix is generated by the word statistical model and the corresponding word vectors are obtained from it; these form the initial word vectors used as the input of the semantic item generation model. It is to use the synset...
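The cleaning and co-occurrence steps described in [0043] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the window size of 2, and the use of raw co-occurrence counts as the word vector are all assumptions, since the source does not specify the word statistical model. Handling of Wikipedia markup (images, tables, links) is omitted.

```python
import re
from collections import Counter


def clean_text(raw: str) -> str:
    """Lowercase the text and turn every run of characters outside
    a-z and 0-9 into a single space."""
    text = raw.lower()
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()


def cooccurrence_counts(tokens, window=2):
    """Count how often each ordered pair of words appears within
    `window` positions of each other (window size is an assumption)."""
    counts = Counter()
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts


def word_vector(word, vocab, counts):
    """Read one row of the co-occurrence matrix: one dimension per
    vocabulary word, holding the raw co-occurrence count."""
    return [counts[(word, context)] for context in vocab]


if __name__ == "__main__":
    tokens = clean_text("The bank approved the loan; the bank is closed.").split()
    vocab = sorted(set(tokens))
    counts = cooccurrence_counts(tokens, window=2)
    print(vocab)
    print(word_vector("bank", vocab, counts))
```

Each dimension of such a vector corresponds to a vocabulary word, which is what lets the later steps use the meaning of the dimension words themselves.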

Abstract

The invention discloses a semantic item representation and disambiguation method based on word statistics and WordNet. The word sense sets and synonym sets organized in WordNet, which are widely accepted internationally, are used as prior knowledge. The invention provides a semantic item vector generation method based on Wikipedia word statistics: with Wikipedia as the corpus, a word statistical model is trained to obtain preliminary word vectors. The semantic information carried by the dimension words of the word-statistics vectors is then fully utilized, and the word vectors of the WordNet synonyms are combined to form the semantic item vectors of each word. The method improves the precision of semantic calculation for words in different language environments, allows semantic item vectors to be used reasonably and accurately in practical applications, and can be widely applied to semantic calculation tasks in natural language processing.
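The idea of combining synonym word vectors into semantic item vectors, and then choosing among them by context, can be sketched as follows. Everything here is a toy assumption: the abstract does not say how the vectors are combined, so plain averaging is used; the two-dimensional vectors, the sense labels `bank#finance` and `bank#river`, and the synonym lists are hypothetical stand-ins for real Wikipedia-trained vectors and WordNet synsets; and cosine similarity against the mean context vector is one possible disambiguation rule, not necessarily the patent's.

```python
import math

# Hypothetical word vectors standing in for trained word-statistics vectors.
WORD_VECTORS = {
    "depository": [1.0, 0.0],
    "lender": [0.8, 0.2],
    "shore": [0.0, 1.0],
    "riverside": [0.1, 0.9],
    "money": [0.9, 0.1],
    "loan": [0.7, 0.1],
    "river": [0.1, 0.9],
    "water": [0.2, 0.8],
}

# Hypothetical synonym sets for two senses of "bank" (stand-in for WordNet).
SENSES = {
    "bank#finance": ["depository", "lender"],
    "bank#river": ["shore", "riverside"],
}


def mean_vector(words):
    """Average the vectors of the given words, dimension by dimension."""
    vectors = [WORD_VECTORS[w] for w in words]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(context_words, senses=SENSES):
    """Pick the sense whose averaged synonym vector is closest
    to the averaged context vector."""
    context = mean_vector(context_words)
    return max(senses, key=lambda s: cosine(mean_vector(senses[s]), context))


if __name__ == "__main__":
    print(disambiguate(["money", "loan"]))
    print(disambiguate(["river", "water"]))
```

With one vector per sense instead of one per word, the same surface word can map to different vectors depending on its context, which is the gain in precision the abstract describes.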

Description

Technical field

[0001] The invention relates to the field of natural language understanding in artificial intelligence, and in particular to a semantic item representation and disambiguation method based on word statistics and WordNet.

Background technique

[0002] At present, deep learning technology in the field of artificial intelligence is advancing by leaps and bounds. It not only performs well in the image domain but is also widely used in natural language processing. With the combination of deep neural networks and natural language processing, word vectors have been proposed. They aim to solve the vector representation of natural language in neural networks by converting words into dense vectors, such that the vectors of similar words are also close in the vector space. In natural language processing applications, word vectors are input as features of deep learning models. Therefore, the effect of the final model depends largely on the e...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F16/33
CPC: G06F16/3344
Inventor: 朱新华, 郭青松, 温海旭, 陈宏朝
Owner: 芽米科技(广州)有限公司