Domain entity disambiguation method for fusing word vectors and topic model

Topic model and word vector technology, applied in the fields of natural language processing and deep learning, addresses the inability to distinguish the different meanings of a polysemous word.

Active Publication Date: 2018-03-30
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0003] The present invention provides a domain entity disambiguation method that combines word vectors and topic models. It addresses a limitation of existing entity disambiguation methods: when handling a polysemous word, the Skip-gram word vector model can only compute a single word vector that mixes several senses, so it cannot distinguish the different meanings of the word.

Examples

Embodiment 1

[0065] Embodiment 1: As shown in Figures 1-4, a domain entity disambiguation method that fuses word vectors and topic models comprises the following specific steps:

[0066] Step1. First, use Word2vec to train a word vector model on an encyclopedia corpus from the tourism domain;

[0067] The specific sub-steps of Step1 are:

[0068] Step1.1. From the offline Chinese Wikipedia database, extract the pages under the tourism category, take the summary of each page, and save the summaries as text;

[0069] Step1.2. Write a custom crawler program to collect tourism-domain text from travel websites and encyclopedia entries, and merge it with the Wikipedia text;

[0070] Because webpage structures differ, the positions and tags to be crawled differ from site to site, and no ready-made program exists, so a separate program must be written for each crawling task. ...
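As a rough illustration of what the corpus built in Step1 is used for, the sketch below shows how the Skip-gram architecture of Word2vec draws (center word, context word) training pairs from a tokenized sentence. It is not the training procedure itself and is not taken from the patent; the function name `skipgram_pairs`, the window size, and the sample sentence are assumptions made for the example.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its neighbours.

    `window` is the maximum distance between a center word and a context
    word; the value used here is an illustrative assumption.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)                 # left edge of the window
        hi = min(len(tokens), i + window + 1)   # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                          # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical, already-tokenized tourism sentence.
sentence = ["Lijiang", "Old", "Town", "is", "in", "Yunnan"]
pairs = skipgram_pairs(sentence, window=1)
for center, context in pairs:
    print(center, "->", context)
```

A Skip-gram model is trained to predict the context word from the center word for each such pair, which is why a word that occurs in very different contexts ends up with one vector that blends all of its senses.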

Abstract

The invention relates to a domain entity disambiguation method that fuses word vectors and a topic model, and belongs to the technical field of natural language processing and deep learning. The method comprises the steps of: obtaining the candidate entity set of each entity to be disambiguated; obtaining vector representations of the entity to be disambiguated and of its candidate entities, obtaining the category referents of the entity to be disambiguated with the help of a domain knowledge base of hyponymy relations, and calculating context similarity and category referent similarity; training word vectors on documents grouped by topic classification using the LDA topic model and the Skip-gram word vector model to obtain word vector representations of the different meanings of a polysemous word, extracting the domain topic keywords of a text with the K-Means algorithm, and calculating domain topic keyword similarity; and finally, fusing the three feature similarities and taking the candidate entity with the highest similarity as the final target entity. According to the inventors, the method is superior to conventional disambiguation methods and can meet the demands of practical applications.
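The final fusion step can be sketched as follows. The abstract does not say how the three similarities are combined, so this example assumes a weighted sum with equal weights and cosine similarity for each feature; the names `cosine`, `fuse`, `pick_entity`, the dictionary keys, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuse(sim_context, sim_category, sim_topic, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three feature similarities (weights are assumed)."""
    w_ctx, w_cat, w_top = weights
    return w_ctx * sim_context + w_cat * sim_category + w_top * sim_topic


def pick_entity(mention, candidates, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Score every candidate and return the best name plus all scores.

    `mention` and each candidate are dicts holding three vectors under the
    hypothetical keys "context", "category", and "topic".
    """
    scores = {}
    for name, cand in candidates.items():
        scores[name] = fuse(
            cosine(mention["context"], cand["context"]),
            cosine(mention["category"], cand["category"]),
            cosine(mention["topic"], cand["topic"]),
            weights,
        )
    best = max(scores, key=scores.get)
    return best, scores


# Toy two-dimensional vectors, made up for the example.
mention = {"context": [1.0, 0.0], "category": [1.0, 0.0], "topic": [1.0, 0.0]}
candidates = {
    "candidate_a": {"context": [2.0, 0.0], "category": [1.0, 0.0], "topic": [1.0, 1.0]},
    "candidate_b": {"context": [0.0, 1.0], "category": [0.0, 1.0], "topic": [1.0, 1.0]},
}
best, scores = pick_entity(mention, candidates)
print(best, scores)
```

In practice the weights would likely be tuned on labeled data rather than fixed to equal values; the source gives no figures, so none are assumed here beyond the equal-weight default.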

Description

Technical field

[0001] The invention relates to a domain entity disambiguation method that combines word vectors and topic models, and belongs to the technical fields of natural language processing and deep learning.

Background art

[0002] Entity disambiguation is one of the important tasks in the field of natural language processing. It aims to eliminate semantic ambiguity by clarifying the meanings of polysemous words in text, helping both humans and computers better understand natural language information. Entity disambiguation is usually carried out on general text, such as news and web pages, and often uses a corpus composed of texts from multiple fields. In practical applications, however, it is often necessary to disambiguate text in a specific field. This is of great significance for mining domain knowledge, and it also helps with the construction of domain knowledge bases, automatic translation of professional litera...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
CPC: G06F16/951, G06F40/295, G06F40/30
Inventor: 郭剑毅, 马晓军, 余正涛, 陈玮, 张志坤
Owner: KUNMING UNIV OF SCI & TECH