Short text topic recognition method based on Dirichlet variational auto-encoder

An autoencoder-based recognition method applied in the field of short text. It addresses slow model training and the feature sparseness of short text topic models, with the effects of simpler training, reduced topic redundancy, and more efficient topic recognition.

Active Publication Date: 2021-04-02
SUN YAT SEN UNIV
8 Cites, 3 Cited by
AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to overcome the disadvantages of slow training speed, high time complexity, poor scalability, and sparse short text features in the prior art. The present invention provides a short text topic recognition method based on a Dirichlet variational autoencoder that speeds up model training, addresses the sparse features of the short text topic model, and enhances the classification and clustering of short texts while performing short text topic recognition.


Examples

Embodiment 1

[0045] Figures 1 and 2 show the first embodiment of the short text topic recognition method based on a Dirichlet variational autoencoder of the present invention. The method includes the following specific steps:

[0046] S1. Preprocess the short text data set: segment the text into words and remove stop words, punctuation marks and numbers, to obtain the text feature vector of the data set;

[0047] S2. Train a clustering model on the text feature vectors obtained by the preprocessing of step S1 and perform clustering to determine the category to which each short text in the short text collection belongs; this category is used as supplementary feature information of the short text;

[0048] S3. Construct a conditional variational neural topic model based on the text feature vector obtained in step S1 and the supplementary feature information of the short text obtained in step S2, and obtain the document-topic d...
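Steps S1 and S2 above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the stop-word list, the regular-expression tokenizer (which assumes whitespace-delimited text; Chinese text would need a dedicated word segmenter), the bag-of-words feature vector, and the choice of plain k-means are all assumptions made for the example, and the function names `preprocess`, `bag_of_words` and `kmeans` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller, language-specific one.
STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in", "on", "for"}

def preprocess(text):
    """S1: split into words, then drop stop words, punctuation and numbers."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(docs):
    """Build a sorted vocabulary and one term-count vector per document."""
    vocab = sorted({t for d in docs for t in d})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        vec = [0.0] * len(vocab)
        for word, count in Counter(d).items():
            vec[index[word]] = float(count)
        vectors.append(vec)
    return vocab, vectors

def kmeans(vectors, k, iters=20):
    """S2 (assumed algorithm): plain k-means seeded with the first k vectors."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centers[j])))
            for v in vectors
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

texts = [
    "The goal of the match: 3 goals!",
    "Stock market prices in 2021",
    "Match goal, football match",
    "Market prices and stock",
]
docs = [preprocess(t) for t in texts]
vocab, vectors = bag_of_words(docs)
labels = kmeans(vectors, k=2)  # one cluster id per short text
```

Each entry of `labels` plays the role of the category from step S2 and can be passed to the topic model as supplementary feature information, for example as a one-hot vector.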

Abstract

The invention provides a short text topic recognition method based on a Dirichlet variational auto-encoder, which comprises the following steps: S1, preprocessing a short text data set by word segmentation and removal of stop words, punctuation marks and numbers, to obtain a text feature vector of the data set; S2, training and performing clustering to determine the category of each short text in the short text set, and taking the category as supplementary feature information of the short text; S3, constructing a conditional variational neural topic model to obtain the document-topic distribution and topic-word distribution in the corpus; and S4, carrying out short text topic recognition to obtain supplementary feature information of the short text as the feature representation of the short text, wherein the supplementary feature information is used for text classification and clustering. According to the method, model training is accelerated, the problem of sparse features of a short text topic model is addressed, and the classification and clustering of short texts are enhanced while short text topic recognition is performed.
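To make step S3 more concrete, here is a hedged sketch of one forward pass through a conditional Dirichlet variational topic model. The source does not give the network architecture, so everything here is an assumption: a single linear encoder layer with a softplus mapping to Dirichlet concentrations, a linear-plus-softmax decoder, randomly initialized weights, and the hypothetical names `init_params` and `forward`. Sampling uses the standard fact that normalized independent Gamma draws follow a Dirichlet distribution. Training is omitted, since backpropagating through the sampling step needs a reparameterization technique that this sketch does not implement.

```python
import math
import random

def softplus(x):
    """Numerically stable log(1 + exp(x)); keeps concentrations positive."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]

def init_params(vocab_size, n_clusters, n_topics, rng):
    """Small random weights; shapes are illustrative assumptions."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "enc_w": mat(n_topics, vocab_size + n_clusters), "enc_b": [0.0] * n_topics,
        "dec_w": mat(vocab_size, n_topics), "dec_b": [0.0] * vocab_size,
    }

def forward(bow, cluster_id, n_clusters, params, rng):
    # Conditional input: word counts concatenated with a one-hot cluster label (from S2).
    cond = [1.0 if j == cluster_id else 0.0 for j in range(n_clusters)]
    x = list(bow) + cond
    # Encoder output mapped to strictly positive Dirichlet concentrations.
    alpha = [softplus(h) + 1e-3 for h in linear(params["enc_w"], params["enc_b"], x)]
    theta = sample_dirichlet(alpha, rng)  # document-topic distribution
    # Decoder: mixes topic-word weights by theta, then normalizes over the vocabulary.
    word_probs = softmax(linear(params["dec_w"], params["dec_b"], theta))
    return alpha, theta, word_probs

rng = random.Random(0)
V, C, K = 7, 2, 3  # vocabulary size, number of clusters, number of topics
params = init_params(V, C, K, rng)
alpha, theta, word_probs = forward([0, 1, 1, 0, 1, 0, 0], 0, C, params, rng)
```

In this sketch `theta` is a point on the probability simplex, so it can be read directly as a document-topic distribution and reused as a low-dimensional feature vector for downstream classification or clustering.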

Description

Technical field

[0001] The present invention relates to the technical field of short texts, and more specifically, relates to a short text topic recognition method based on a Dirichlet variational autoencoder.

Background technique

[0002] With the vigorous development of the Internet, the Internet has become an important source of information for people. As the main information carrier, text plays an important role in the dissemination of network information. Many data analysis applications such as Weibo, SMS, and comments involve extracting topic information from short texts, and extracting potential topics is conducive to further analysis, such as sentiment analysis, text classification, and recommendation systems. However, it is difficult for us to directly extract information from short text data due to the small number of characters and random writing characteristics of short text data.

[0003] Chinese Patent Publication No. CN107798043A, date of publication June 28...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289, G06F40/216, G06N3/04, G06N3/08, G06K9/62
CPC: G06F40/289, G06F40/216, G06N3/08, G06N3/045, G06F18/23213, G06F18/2415
Inventor: 饶洋辉, 丁诚
Owner: SUN YAT SEN UNIV