
Word sense disambiguation method and device based on word vector

A word sense disambiguation and word vector technology, applied in the fields of special data processing applications, instruments, and electrical digital data processing. It addresses problems such as the difficulty of expressing the semantic information of words and the semantic relationships between them, and data sparseness.

Active Publication Date: 2018-08-24
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0005] Word vectors have long been used in word sense disambiguation tasks. The earlier vector representation is the one-hot representation: the length of a word vector equals the size of the vocabulary, and every position of the vector is zero except the dimension corresponding to the word's position in the vocabulary, which is 1. This representation can hardly express the semantic information contained in a word or the semantic relationships between words. In addition, it suffers from data sparsity.
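The two weaknesses described above can be illustrated with a minimal sketch. The five-word vocabulary and the helper names `one_hot` and `dot` are assumptions made for illustration only; they do not come from the patent.

```python
def one_hot(word, vocabulary):
    """Return the one-hot vector of `word` over an ordered vocabulary."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Toy vocabulary (hypothetical); a real vocabulary has tens of thousands of entries.
vocabulary = ["spade", "shovel", "lever", "activate", "bicycle"]

spade = one_hot("spade", vocabulary)
shovel = one_hot("shovel", vocabulary)

# Vector length equals vocabulary size, and only one position is non-zero.
print(spade)  # [1, 0, 0, 0, 0]

# Any two distinct words are orthogonal, so related words such as
# "spade" and "shovel" look exactly as unrelated as any other pair.
print(dot(spade, shovel))  # 0

# Fraction of zero entries; it approaches 1 as the vocabulary grows.
sparsity = spade.count(0) / len(spade)
print(sparsity)  # 0.8
```

Because every pair of distinct one-hot vectors has a dot product of zero, no similarity measure built on them can separate related words from unrelated ones, which is the gap that dense, trained word vectors are meant to close.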



Examples


Embodiment 1

[0044] Embodiment 1: This embodiment uses the Senseval-3 data set, which comprises a training set, a test set, and the set of sense items of all ambiguous words. The training set contains 7860 documents and the test set contains 3944 documents; each document is labeled with its ambiguous word, a document code, and the correct sense of the ambiguous word in that document. The set of sense items contains the codes and meanings of 57 ambiguous words. A document containing the ambiguous word "activate" is taken as an example of disambiguation.

[0045] Document containing the ambiguous word "activate": Do you know what it is , and where I can get one . We suspect you had seen the Terrex Autospade , which is made by Wolf Tools . It is quite a hefty spade , with bicycle - type handlebars and a sprung lever at the rear , which you step on to activate it . Used correctly , you should n't have to bend your back during general digging , althou...
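Before a document like this one can be mapped to word vectors, it has to be reduced to a clean token list. The following is a minimal sketch of such preprocessing, assuming whitespace tokenization, stripping of surrounding punctuation, and lowercasing. The function name `preprocess` and the lowercasing step are assumptions for illustration; the patent's own preprocessing may differ.

```python
import string


def preprocess(text):
    """Split on whitespace, strip surrounding punctuation, lowercase, drop empties."""
    tokens = []
    for raw in text.split():
        token = raw.strip(string.punctuation).lower()
        if token:  # pure-punctuation tokens such as "." or "-" become empty
            tokens.append(token)
    return tokens


# A fragment of the example document, already space-separated as in the data set.
sample = "which you step on to activate it ."
tokens = preprocess(sample)

# Position of the ambiguous word, needed later to locate its context words.
target_index = tokens.index("activate")

print(tokens)        # ['which', 'you', 'step', 'on', 'to', 'activate', 'it']
print(target_index)  # 5
```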


Abstract

The invention relates to a word sense disambiguation method and device based on word vectors. The method comprises the following steps: data preprocessing, in which the documents and semantic items undergo processing such as punctuation removal and word segmentation; word vector training, in which a word vector training tool is used to train the word vectors; context vector representation, in which the word vectors of the context words are obtained and a local weighting method is used to calculate the context vector; semantic item vector representation, in which the word vector of each word of a semantic item is obtained and the semantic item vector is calculated from them; similarity calculation, in which the cosine similarity between the context vector and each semantic item vector is calculated; semantic item distribution frequency calculation, in which the distribution frequency of each semantic item of the ambiguous word in the data set is counted; and final score calculation, in which a comprehensive score is calculated from the cosine similarity between the context and each semantic item and the frequency of that semantic item, the semantic item with the highest score being taken as the optimal word sense.
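The scoring steps of the abstract can be sketched roughly as follows. Several details are assumptions, since the abstract does not specify them: the local weighting is taken to be inverse distance from the ambiguous word, the semantic item vector is taken to be the plain average of its word vectors, and the comprehensive score is taken to be a linear mix controlled by a hypothetical parameter `alpha`. The two-dimensional embeddings, sense labels, and frequencies are toy values, not data from the patent.

```python
import math


def weighted_average(vectors, weights):
    """Weighted mean of equal-length vectors."""
    if not vectors:
        raise ValueError("no vectors to average")
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(dim)]


def context_vector(tokens, target_index, embeddings, window=4):
    """Context vector with inverse-distance weights (assumed weighting scheme)."""
    vectors, weights = [], []
    for i, token in enumerate(tokens):
        distance = abs(i - target_index)
        if 0 < distance <= window and token in embeddings:
            vectors.append(embeddings[token])
            weights.append(1.0 / distance)
    return weighted_average(vectors, weights)


def sense_vector(gloss_tokens, embeddings):
    """Semantic item vector as the unweighted mean of its word vectors (assumed)."""
    vectors = [embeddings[t] for t in gloss_tokens if t in embeddings]
    return weighted_average(vectors, [1.0] * len(vectors))


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(tokens, target_index, glosses, frequencies, embeddings, alpha=0.7):
    """Return the best sense and all scores; alpha mixes similarity and frequency."""
    ctx = context_vector(tokens, target_index, embeddings)
    scores = {}
    for sense, gloss in glosses.items():
        similarity = cosine(ctx, sense_vector(gloss, embeddings))
        scores[sense] = alpha * similarity + (1 - alpha) * frequencies[sense]
    return max(scores, key=scores.get), scores


# Toy 2-dimensional word vectors (hypothetical; real ones come from a trained model).
embeddings = {
    "step": [1.0, 0.0], "lever": [0.9, 0.1],
    "start": [1.0, 0.1], "machine": [0.8, 0.2],
    "chemical": [0.0, 1.0], "reaction": [0.1, 0.9],
}
tokens = ["step", "on", "lever", "to", "activate", "it"]
glosses = {  # hypothetical sense labels and gloss words
    "start_operation": ["start", "machine"],
    "make_reactive": ["chemical", "reaction"],
}
frequencies = {"start_operation": 0.6, "make_reactive": 0.4}  # toy relative frequencies

best, scores = disambiguate(tokens, 4, glosses, frequencies, embeddings)
print(best)
print(scores)
```

In this toy run the context words "step" and "lever" lie close to the gloss words of the hypothetical `start_operation` sense, so that sense receives the higher combined score. The frequency term acts as a prior that can tip the decision when the similarities are close.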

Description

Technical field

[0001] The present invention relates to a word sense disambiguation method and device based on word vectors, and belongs to the fields of natural language processing, machine translation, artificial intelligence and the like.

Background art

[0002] In recent years, with the development of science and technology, word sense disambiguation has become increasingly important in natural language processing, machine translation, artificial intelligence and other fields, and has become a problem in urgent need of a solution.

[0003] As interest in word sense disambiguation has grown, scholars have proposed a range of solutions. The disambiguation knowledge used in the early days consisted of manually written rules, but writing the rules by hand was time-consuming and laborious and there was a bottleneck problem of knowledge a...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/28
CPC: G06F40/289, G06F40/58
Inventors: 吕晓伟, 贾连印
Owner: KUNMING UNIV OF SCI & TECH