Context similarity calculation-based word sense disambiguation method

A similarity calculation and word sense disambiguation technology, applied in the field of word sense disambiguation based on contextual similarity calculation, that addresses problems such as poor disambiguation accuracy and improves the disambiguation accuracy rate.

Active Publication Date: 2018-03-27
SHENYANG AEROSPACE UNIVERSITY

AI Technical Summary

Problems solved by technology

[0004] Prior-art word sense disambiguation models the different parts of speech of a word as a single point, which results in poor disambiguation accuracy. To address this deficiency, the present invention provides a word sense disambiguation method based on contextual similarity calculation that distinguishes the different parts of speech of the same word well and improves disambiguation accuracy.


Examples


Embodiment Construction

[0023] The present invention will be further elaborated below in conjunction with the accompanying drawings of the description.

[0024] The word sense disambiguation method based on context similarity calculation of the present invention comprises the following steps:

[0025] 1) Process the training corpus, using the part-of-speech-tagged version of ukWaC to train the model;

[0026] 2) Screen by part of speech, keeping only content words: nouns, adjectives, adverbs, and verbs;

[0027] 3) Train the bidirectional LSTM model on the corpus that has been screened by part of speech;

[0028] 4) Input the example sentences of the word to be disambiguated into the bidirectional LSTM model to obtain their context vectors;

[0029] 5) Input the context of the word to be disambiguated into the bidirectional LSTM model to obtain the context vector of the word to be disambiguated;

[0030] 6) Calculate the cosine similarity between the context vector of the word to be ...
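The part-of-speech screening in step 2) can be illustrated with a minimal sketch. It assumes the corpus has already been read into (word, tag) pairs and that tags follow a Penn/TreeTagger-style scheme in which nouns start with `N`, adjectives with `J`, adverbs with `RB`, and verbs with `V`. The tag prefixes, the function name `keep_content_words`, and the sample sentence are illustrative assumptions, not details taken from the patent.

```python
# Assumed tag prefixes for content words (nouns, adjectives, adverbs, verbs).
# The real tagset of the corpus may differ; adjust the prefixes accordingly.
CONTENT_TAG_PREFIXES = ("N", "J", "RB", "V")


def keep_content_words(tagged_sentence):
    """Return only the (word, tag) pairs whose tag marks a content word."""
    return [
        (word, tag)
        for word, tag in tagged_sentence
        if tag.startswith(CONTENT_TAG_PREFIXES)
    ]


# Hypothetical tagged sentence used only for illustration.
sentence = [
    ("The", "DT"), ("bank", "NN"), ("quickly", "RB"),
    ("approved", "VVD"), ("the", "DT"), ("large", "JJ"), ("loan", "NN"),
]

if __name__ == "__main__":
    print(keep_content_words(sentence))
```

Function words such as determiners are dropped, so only content words reach the model training in step 3).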


Abstract

The invention relates to a word sense disambiguation method based on context similarity calculation. The method comprises the steps of: processing the training corpus and training a model using the part-of-speech-tagged version of ukWaC; screening by part of speech and keeping only content words, including nouns, adjectives, adverbs and verbs; training a bidirectional LSTM model on the part-of-speech-screened corpus; inputting example sentences of the word to be disambiguated into the bidirectional LSTM model to obtain context vectors; inputting the context of the word to be disambiguated into the bidirectional LSTM model to obtain the context vector of the word to be disambiguated; and calculating the cosine similarity between the context vector of the word to be disambiguated and the context vectors of the example sentences, then selecting the sense of the word to be disambiguated with a k-nearest-neighbor method according to the similarity results. With this method, word senses are modeled better: each word is combined with its part of speech by appending the part of speech directly after the word with an underscore, the resulting word vectors distinguish the different parts of speech of the same word well, and in experiments the disambiguation accuracy improves by 0.5% over the baseline.
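The underscore pairing, cosine similarity and k-nearest-neighbor selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the context vectors would come from the bidirectional LSTM model, whereas here they are hand-written toy vectors, and the function names, the sense labels (`finance`, `river`), the tag label `N` and the choice `k=3` are all hypothetical.

```python
import math
from collections import Counter


def join_word_pos(word, pos):
    """Combine a word and its part of speech with an underscore, e.g. bank_N."""
    return f"{word}_{pos}"


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_sense(target_vec, examples, k=3):
    """Pick the majority sense among the k example vectors most similar to target_vec.

    `examples` is a list of (sense_label, context_vector) pairs.
    """
    ranked = sorted(
        examples,
        key=lambda ex: cosine_similarity(target_vec, ex[1]),
        reverse=True,
    )
    votes = Counter(sense for sense, _ in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy context vectors standing in for bidirectional LSTM outputs (assumed values).
examples = [
    ("finance", [0.9, 0.1]),
    ("finance", [0.8, 0.3]),
    ("river", [0.1, 0.9]),
    ("river", [0.2, 0.8]),
]
target = [0.7, 0.2]

if __name__ == "__main__":
    print(join_word_pos("bank", "N"), "->", knn_sense(target, examples, k=3))
```

In this toy setup the two `finance` examples are the nearest neighbors of the target vector, so the vote selects `finance`. In practice the value of `k` and the handling of ties would need to be tuned on held-out data.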

Description

Technical field

[0001] The invention relates to natural language translation technology, in particular to a word sense disambiguation method based on context similarity calculation.

Background

[0002] Word sense disambiguation (WSD) is a problem with a long history and a wide range of applications. Current approaches fall into three categories: supervised methods, unsupervised methods and knowledge-based methods. Published supervised word sense disambiguation systems perform well when given large-scale training corpora annotated with specific senses, but the lack of such large-scale annotated corpora is the main problem. Using pre-trained word vectors can solve this problem to some extent: because word vectors pre-trained on a large-scale corpus contain more semantic and grammatical information, using them to train a supervised system improves performance. In order to infer the meaning of words in a sentence, both the target word and the c...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
CPC: G06F40/284, G06F40/30
Inventor: 周俏丽, 孟禹光
Owner: SHENYANG AEROSPACE UNIVERSITY