Short text topic modeling method based on word semantic similarity

A semantic similarity and topic modeling technology, applied in the computer field, that addresses the difficulty traditional topic models have in achieving good results on short texts.

Active Publication Date: 2016-09-21
WUHAN UNIV

AI Technical Summary

Problems solved by technology

For example, in 2003 Blei et al. published the article "Latent Dirichlet Allocation" in the Journal of Machine Learning Research; the model it describes is often used to analyze traditional texts. However, because short texts are sparse, traditional topic models struggle to achieve good results on them.



Examples


Embodiment Construction

[0055] To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the accompanying drawings and embodiments. The implementation examples described here serve only to illustrate and explain the present invention and are not intended to limit it.

[0056] The invention provides a short text topic modeling method based on word semantic similarity. It uses semantic information from an external corpus or knowledge base to greatly alleviate the sparsity of word co-occurrence in short texts. To a certain extent, the present invention resolves the difficulties that traditional topic models often encounter on short text data sets. The model of the present invention proposes a method for extracting topic patterns from short text collections: the model sets a topic variable for each short text, and this vari...
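The idea of drawing word similarity from an external source can be sketched as follows. This is a minimal illustration, not the procedure specified in the patent: it assumes the external knowledge takes the form of word vectors, that similarity is measured by cosine similarity, and that a fixed cut-off decides membership in a similar word set. The names `cosine`, `build_similar_word_sets`, `threshold` and the toy vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def build_similar_word_sets(embeddings, threshold=0.7):
    """Map each word to the set of other words whose cosine similarity
    to it is at least `threshold` (an assumed, hypothetical cut-off)."""
    words = sorted(embeddings)
    similar = {w: set() for w in words}
    for i, w in enumerate(words):
        for v in words[i + 1:]:
            if cosine(embeddings[w], embeddings[v]) >= threshold:
                similar[w].add(v)
                similar[v].add(w)
    return similar

# Toy 2-dimensional vectors, invented purely for illustration.
toy_embeddings = {
    "soccer": [0.9, 0.1],
    "football": [0.85, 0.2],
    "stock": [0.1, 0.95],
    "market": [0.2, 0.9],
}
similar_sets = build_similar_word_sets(toy_embeddings)
```

With these toy vectors, "soccer" and "football" end up in each other's sets, as do "stock" and "market". In practice the vectors would come from a model trained on a large external corpus, and the threshold would need tuning.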


Abstract

The invention discloses a short text topic modeling method based on word semantic similarity. The method comprises: building a similar word set for each word in the short text collection from word semantic similarity supplied by an external source; determining the number of topics used in modeling; randomly assigning a topic to each short text; iteratively determining, through a Gibbs sampling process, the topic of each short text and the distribution of words within each topic; and, from the final assignment of these variables, reporting the word distribution under each topic and the topic associated with each short text. The method effectively addresses the sparse information content and unclear semantic expression of short texts. Based on the model result, a short text can be expressed as a topic vector, which serves as the final feature vector of that short text. The topic-vector representation has good semantic interpretability and can serve as the algorithmic basis for various applications. The method is widely applicable to many kinds of short text data and has broad practical significance and commercial value.
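The steps listed above (random initialization, Gibbs sampling with one topic per short text, then reading off the word distribution per topic) can be sketched as follows. This is a hedged illustration rather than the patented update rule, which is not given in this excerpt: it uses the standard collapsed Gibbs sampler for a Dirichlet multinomial mixture and, as a simplifying assumption, adds a fixed weight `mu` to the counts of similar words whenever a word is counted under a topic. The function name `gibbs_dmm` and the parameters `mu`, `alpha`, `beta` and `iters` are hypothetical.

```python
import random

def gibbs_dmm(docs, num_topics, similar=None, mu=0.3,
              alpha=0.1, beta=0.1, iters=30, seed=0):
    """Collapsed Gibbs sampling for a one-topic-per-document mixture model.

    docs: list of token lists; similar: dict mapping word -> set of similar words.
    mu is an assumed promotion weight given to similar words.
    Returns (topic of each document, word distribution of each topic)."""
    rng = random.Random(seed)
    similar = similar or {}
    vocab = sorted({w for d in docs for w in d})
    V, K, D = len(vocab), num_topics, len(docs)

    m = [0] * K                                            # documents per topic
    n_kw = [dict.fromkeys(vocab, 0.0) for _ in range(K)]   # word mass per topic
    n_k = [0.0] * K                                        # total word mass per topic

    def update(d, k, sign):
        """Add (sign=+1) or remove (sign=-1) document d from topic k."""
        m[k] += sign
        for w in docs[d]:
            n_kw[k][w] += sign
            n_k[k] += sign
            for s in similar.get(w, ()):                   # promote similar words
                if s in n_kw[k]:
                    n_kw[k][s] += sign * mu
                    n_k[k] += sign * mu

    # Randomly assign a topic to every short text.
    z = [rng.randrange(K) for _ in range(D)]
    for d in range(D):
        update(d, z[d], +1)

    for _ in range(iters):
        for d in range(D):
            update(d, z[d], -1)                            # exclude d from the counts
            weights = []
            for k in range(K):
                p = m[k] + alpha       # the normalizer D - 1 + K*alpha is the same for all k
                seen = {}
                for i, w in enumerate(docs[d]):
                    j = seen.get(w, 0)                     # earlier occurrences of w in d
                    p *= (n_kw[k][w] + beta + j) / (n_k[k] + V * beta + i)
                    seen[w] = j + 1
                weights.append(p)
            z[d] = rng.choices(range(K), weights=weights)[0]
            update(d, z[d], +1)

    # Smoothed word distribution under each topic.
    phi = [{w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
           for k in range(K)]
    return z, phi

# Toy corpus and similar word sets, invented for illustration.
docs = [["soccer", "goal"], ["football", "goal"],
        ["stock", "price"], ["market", "price"]]
similar = {"soccer": {"football"}, "football": {"soccer"},
           "stock": {"market"}, "market": {"stock"}}
z, phi = gibbs_dmm(docs, num_topics=2, similar=similar)
```

Here `z[d]` is the topic assigned to short text `d` and `phi[k]` is the word distribution under topic `k`. Because sampling is random, the assignment depends on the seed and on the number of iterations. The promotion step is what lets "soccer" contribute evidence for "football" even when the two never co-occur in the same short text.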

Description

Technical field

[0001] The invention belongs to the field of computer technology and relates to a method for text mining and topic modeling, in particular to a method that uses external word semantic similarity information to strengthen the correlation between words in short texts, thereby improving the effect of topic modeling on short texts.

Background technique

[0002] With the rise of the Internet, we have gradually entered the era of big data. At the same time, short text has become an increasingly popular form of text data on the Internet. Common examples are web page summaries, news headlines, text advertisements, microblogs, circle of friends updates, etc. How to build an efficient topic model and mine latent semantic information from large short text datasets has become the cornerstone of many applications. Through the topic model, we can obtain a computer-understandable formal expression of short texts, which can be applied to many basi...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/247, G06F40/30
Inventor: 李晨亮, 王浩然, 张芷芊, 孙爱欣
Owner: WUHAN UNIV