
A Short Text Topic Recognition Method Based on Dirichlet Variational Autoencoder

An autoencoder-based topic recognition method, applied in the field of short text, that addresses problems such as slow model training and sparse short text topic model features, with the effects of improved efficiency, reduced topic redundancy, and simple training.

Active Publication Date: 2022-06-03
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to overcome the shortcomings of the prior art: slow training, high time complexity, poor scalability, and sparse short text features. The present invention provides a short text topic recognition method based on a Dirichlet variational autoencoder that speeds up model training, addresses the feature sparsity of short text topic models, and improves short text classification and clustering while performing short text topic recognition.



Examples


Embodiment 1


[0065] where μ is a sample drawn from a uniform distribution.
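The formula that paragraph [0065] refers to is not reproduced on this page, so the following is only a sketch of one common way a uniform sample enters a Dirichlet variational autoencoder, not necessarily the patent's formula. It assumes the inverse-CDF approximation for Gamma draws, F^{-1}(u; α, β) ≈ (u·α·Γ(α))^{1/α} / β, which is reasonable only for small α, followed by normalization to obtain topic proportions. The function names and the parameter `beta` are hypothetical.

```python
import math
import random


def approx_gamma_sample(alpha, u, beta=1.0):
    """Approximate inverse-CDF draw from Gamma(alpha, beta) for u in (0, 1].

    Assumption (not from the source): uses the small-alpha approximation
    F^{-1}(u) ~= (u * alpha * Gamma(alpha)) ** (1 / alpha) / beta.
    """
    return (u * alpha * math.gamma(alpha)) ** (1.0 / alpha) / beta


def sample_topic_proportions(alphas, rng=random):
    """Normalize approximate Gamma draws into an approximate Dirichlet sample."""
    # 1.0 - rng.random() lies in (0, 1], so every draw is strictly positive.
    draws = [approx_gamma_sample(a, 1.0 - rng.random()) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


if __name__ == "__main__":
    random.seed(0)
    print(sample_topic_proportions([0.1, 0.2, 0.3]))
```

Because the uniform sample is the only source of randomness, the draw is a deterministic function of the concentration parameters, which is what lets gradients flow through the sampling step in an autoencoder setting.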



Abstract

The present invention provides a short text topic recognition method based on a Dirichlet variational autoencoder, comprising the following steps: S1. Preprocessing the short text data set by word segmentation and by removing stop words, punctuation marks, and numbers, to obtain the text feature vectors of the data set; S2. Training a clustering model to determine the category of each short text in the short text set, which serves as supplementary feature information for that short text; S3. Constructing a conditional variational neural topic model to obtain the document-topic distribution and the topic-word distribution of the corpus; S4. Performing short text topic recognition: obtaining the supplementary feature information of the short text as its feature representation and using it for text classification and clustering. The method accelerates model training, addresses the feature sparsity of short text topic models, and improves the classification and clustering of short texts while performing short text topic recognition.
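Step S1 and the hand-off from S2 can be illustrated with a minimal sketch. It assumes English text tokenized by a regular expression and a tiny hypothetical stop-word list; the abstract does not name a word segmenter, stop-word list, or vocabulary scheme, and Chinese text would need a dedicated segmenter. Appending a one-hot cluster label to the bag-of-words vector is likewise only an assumption about how the supplementary feature from S2 might be attached. All function names are illustrative.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and", "in"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    # Matching [a-z]+ discards punctuation marks and numbers in one pass.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words]


def build_vocab(token_lists):
    """Map each distinct token to an index, in sorted order."""
    words = sorted({t for tokens in token_lists for t in tokens})
    return {w: i for i, w in enumerate(words)}


def bow_vector(tokens, vocab):
    """Bag-of-words count vector over the given vocabulary."""
    vec = [0] * len(vocab)
    for word, count in Counter(tokens).items():
        if word in vocab:
            vec[vocab[word]] = count
    return vec


def append_cluster_one_hot(vec, cluster_id, n_clusters):
    """Assumed conditioning: append a one-hot cluster label to the features."""
    one_hot = [1 if k == cluster_id else 0 for k in range(n_clusters)]
    return vec + one_hot


if __name__ == "__main__":
    docs = ["The price of 3 apples is high!", "Apples and oranges"]
    tokenized = [preprocess(d) for d in docs]
    vocab = build_vocab(tokenized)
    print([bow_vector(t, vocab) for t in tokenized])
```

The cluster labels themselves would come from whatever clustering model S2 trains; this sketch takes them as given.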

Description

Technical field

The present invention relates to the technical field of short text, and more specifically to a short text topic recognition method based on a Dirichlet variational autoencoder.

Background technique

[0002] With the vigorous development of the Internet, the Internet has become an important source of information for people. Text, as a main carrier of information, plays an important role in the dissemination of network information. Many data analysis applications, such as Weibo, SMS, and comments, involve extracting topic information from short texts, and extracting latent topics benefits downstream analysis such as sentiment analysis, text classification, and recommendation systems. However, because short text data contains few words and is written casually, it is difficult to directly extract inform...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/289, G06F40/216, G06N3/04, G06N3/08, G06K9/62
CPC: G06F40/289, G06F40/216, G06N3/08, G06N3/045, G06F18/23213, G06F18/2415
Inventor: 饶洋辉, 丁诚
Owner: SUN YAT SEN UNIV