Vertical Domain Entity Disambiguation Method Fused with Topic Model and Convolutional Neural Network

The method applies convolutional neural network and topic model techniques (classified under biological neural network models, neural architectures, and character and pattern recognition). It addresses reference errors, the inability of existing models to account for grammar and word order, and the difficulty of reflecting contextual semantic influence and constraints. The stated effects are optimized text-processing complexity and improved disambiguation accuracy.

Active Publication Date: 2021-12-07
ZHEJIANG UNIV OF TECH

AI Technical Summary

Problems solved by technology

First, mainstream methods rely on entity surface features, popularity features, and similar signals. These features compute the similarity between a mention and its candidate entities only from the global context of the document, ignoring the topic information that is evident locally in the text, so the topic of a mention is captured with large error. Second, most mainstream disambiguation models are built on the bag-of-words model, which cannot account for grammar or word order and struggles to reflect the influence and constraints of contextual semantics on entities. As a result, these models cannot make full use of the context to extract semantic features effectively.

Embodiment Construction

[0047] The present invention will be described in further detail below in conjunction with the accompanying drawings.

[0048] To overcome the shortcomings of traditional disambiguation methods, the present invention adopts a multi-model fusion approach that extracts text features effectively and improves the accuracy of the disambiguation results. Word vector models in natural language processing map each word to a high-dimensional vector through corpus training, which carries more information than the bag-of-words model. The LDA topic model extracts features from the local information of the context, and the resulting topic features point to a topic more clearly than global features do, which helps disambiguation within the domain. The convolutional neural network model, widely used in recent years, has achieved great success in natural language processing. The multi-layer convolution operation can effectively ...
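The contrast drawn above between the bag-of-words model and convolution over word vectors can be illustrated with a toy sketch. This is not the patented model: the two-dimensional embeddings, the single width-2 kernel, and the function names `bag_of_words` and `conv1d_max_pool` are all hypothetical values chosen only to show that a bag of words cannot tell two word orders apart, while a convolution window can.

```python
from collections import Counter


def bag_of_words(tokens):
    """Count tokens; word order is discarded by construction."""
    return Counter(tokens)


def conv1d_max_pool(vectors, kernel):
    """Slide a kernel over consecutive word vectors, then max-pool.

    `kernel` holds one weight vector per window position, so the
    response depends on which word appears at which position.
    """
    width = len(kernel)
    responses = []
    for start in range(len(vectors) - width + 1):
        window = vectors[start:start + width]
        response = sum(
            w * x
            for weights, vec in zip(kernel, window)
            for w, x in zip(weights, vec)
        )
        responses.append(response)
    return max(responses)


# Hypothetical toy embeddings (assumption, not from the patent).
embeddings = {
    "apple": [1.0, 0.0],
    "buys": [0.0, 1.0],
    "startup": [0.5, 0.5],
}

sentence_a = ["apple", "buys", "startup"]
sentence_b = ["startup", "buys", "apple"]

# Hypothetical kernel: responds most to an "apple"-like word followed by a "buys"-like word.
kernel = [[1.0, 0.0], [0.0, 1.0]]

same_bag = bag_of_words(sentence_a) == bag_of_words(sentence_b)
feature_a = conv1d_max_pool([embeddings[t] for t in sentence_a], kernel)
feature_b = conv1d_max_pool([embeddings[t] for t in sentence_b], kernel)

print(same_bag, feature_a, feature_b)
```

The two sentences share one bag-of-words representation but produce different pooled convolution features, which is the word-order sensitivity the paragraph attributes to the CNN component.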

Abstract

A vertical domain entity disambiguation method that fuses a topic model with a convolutional neural network, comprising:
1. Build a domain knowledge base.
2. Train a word vector model on the preprocessed data set and build the corresponding dictionary.
3. Extract the name of the entity to be disambiguated, determine the candidate entity set for that entity from the domain knowledge base, and represent the entity's context as word vectors.
4. Use the manually annotated training corpus and the dictionary to construct a keyword dictionary representing entity topics, then use it as input to train and save the topic model.
5. Use the manually annotated data set as the training and validation sets, optimize the model parameters, and save the CNN model.
6. Take the topic feature similarity Sim1 and the semantic feature similarity Sim2 obtained from steps 4 and 5, and fuse the two similarities with a weight normalization operation.
7. The candidate entity with the largest fused similarity is the final disambiguated entity.
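Steps 6 and 7 can be sketched as follows. The abstract does not give the fusion formula, so this assumes a simple weighted sum whose weights are normalized to sum to 1. The function names `fuse` and `disambiguate`, the weight values, and the candidate scores are hypothetical and serve only as an illustration.

```python
def fuse(sim1, sim2, w_topic, w_semantic):
    """Fuse topic similarity (Sim1) and semantic similarity (Sim2).

    Assumption: a weighted sum with weights normalized to sum to 1.
    """
    total = w_topic + w_semantic
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return (w_topic * sim1 + w_semantic * sim2) / total


def disambiguate(candidates, w_topic=0.4, w_semantic=0.6):
    """Return the candidate with the largest fused similarity.

    `candidates` maps a candidate entity name to its (Sim1, Sim2) pair.
    """
    return max(
        candidates,
        key=lambda name: fuse(*candidates[name], w_topic, w_semantic),
    )


# Hypothetical similarity scores for one mention (assumption, not patent data).
candidates = {
    "Apple Inc.": (0.9, 0.7),
    "Apple (fruit)": (0.2, 0.8),
}

best = disambiguate(candidates)
print(best)
```

Because the weights are divided by their sum, passing `w_topic=2, w_semantic=3` gives the same fused scores as `0.4` and `0.6`. How the weights are actually chosen is not stated in the abstract.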

Description

Technical field

[0001] The invention belongs to the technical field of computer data processing, and in particular relates to a vertical domain entity disambiguation method.

Background

[0002] In the Internet age, information is exploding. Faced with massive amounts of information, cutting-edge AI technology can link text to large volumes of entity information, making reading smoother and improving the user experience in a targeted way. Intelligent information processing not only provides intelligent services for specific fields, but also leaves more room for innovation.

[0003] Entity disambiguation is a core task in natural language processing. In essence, a word in a sentence may have multiple meanings, and the exact meaning it expresses must be determined from the context and from knowledge in the knowledge base. The full name of a company in a specific field is unambiguous, but in texts such as news, research reports, and Q&As, the co...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F40/30, G06F40/242, G06F40/295, G06K9/62, G06N3/04
CPC: G06F40/30, G06F40/242, G06F40/295, G06N3/045, G06F18/22, G06F18/214
Inventor: 王万良, 胡明志, 赵燕伟, 陈嘉诚, 尹晶, 王铁军
Owner: ZHEJIANG UNIV OF TECH