Short text topic identification method and system

A short text topic recognition method and system in the field of data processing. It addresses word sparseness in short texts, alleviating the sparsity problem and improving both the accuracy and the efficiency of topic recognition.

Active Publication Date: 2019-07-23
HEFEI UNIV OF TECH
Cites: 14 | Cited by: 9

AI Technical Summary

Problems solved by technology

[0006] To address the deficiencies of the prior art, the present invention provides a short text topic recognition method and system that solves the problem of sparse word co-occurrence in short texts.
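Sparse word co-occurrence can be made concrete by counting how many distinct word pairs ever appear together in the same document. The sketch below is an illustration only and is not part of the patented method; the function `cooccurrence_density` and the toy reviews are hypothetical.

```python
from itertools import combinations


def cooccurrence_density(docs):
    """Fraction of distinct vocabulary word pairs that co-occur in at least one document."""
    vocab = {w for doc in docs for w in doc}
    seen = set()
    for doc in docs:
        # Unordered pairs of distinct words appearing in the same document.
        seen.update(combinations(sorted(set(doc)), 2))
    n = len(vocab)
    total = n * (n - 1) // 2
    return len(seen) / total if total else 0.0


short_reviews = [["movie", "plot"], ["phone", "battery"], ["movie", "actor"]]
long_document = [[w for doc in short_reviews for w in doc]]
print(cooccurrence_density(short_reviews))  # 0.3
print(cooccurrence_density(long_document))  # 1.0
```

Only 3 of the 10 possible word pairs co-occur across the three short reviews, while concatenating them into one long document makes all 10 co-occur. Topic models such as LDA learn from document-level co-occurrence, so they have far less evidence to work with on short texts.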

Embodiment Construction

[0062] To make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0063] By providing a short text topic recognition method and system, the embodiments of the present invention solve the problem of sparse word co-occurrence in short texts and achieve more accurate clustering of the short text data set to be processed.
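The patent names a Dirichlet process mixture model as the component that lets the number of topics be inferred rather than fixed in advance. As an illustration only, the sketch below clusters tokenized short texts by collapsed Gibbs sampling under a Chinese restaurant process prior (concentration `alpha`) and a symmetric Dirichlet prior (`beta`) on topic-word distributions. It assumes one topic per short text and omits the latent feature word vectors. The patent's actual inference procedure is not given in this excerpt, and `dpmm_gibbs` and its defaults are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def dpmm_gibbs(docs, alpha=1.0, beta=0.1, iters=50, seed=0):
    """Collapsed Gibbs sampling for a Dirichlet process mixture of unigrams.

    Each short text belongs to exactly one cluster (topic). New clusters open via
    the Chinese restaurant process. Returns (assignments, topic_word), where
    topic_word[k][w] is the posterior-mean estimate of p(w | k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    counts = [Counter(doc) for doc in docs]

    m = defaultdict(int)       # documents per cluster
    n = defaultdict(int)       # word tokens per cluster
    nw = defaultdict(Counter)  # word counts per cluster
    z = [0] * len(docs)        # start with every document in cluster 0
    for doc, c in zip(docs, counts):
        m[0] += 1
        n[0] += len(doc)
        nw[0].update(c)
    next_id = 1

    def log_pred(cluster_nw, cluster_n, c, length):
        # Log Dirichlet-multinomial predictive probability of a whole document.
        s = 0.0
        for w, cnt in c.items():
            for j in range(cnt):
                s += math.log(cluster_nw[w] + beta + j)
        for i in range(length):
            s -= math.log(cluster_n + V * beta + i)
        return s

    for _ in range(iters):
        for d, c in enumerate(counts):
            k, length = z[d], len(docs[d])
            # Remove document d from its current cluster.
            m[k] -= 1
            n[k] -= length
            nw[k].subtract(c)
            if m[k] == 0:
                del m[k], n[k], nw[k]
            # Existing clusters: cluster size times predictive likelihood.
            ks = list(m)
            logp = [math.log(m[j]) + log_pred(nw[j], n[j], c, length) for j in ks]
            # A new cluster: alpha times the prior predictive likelihood.
            ks.append(next_id)
            logp.append(math.log(alpha) + log_pred(Counter(), 0, c, length))
            top = max(logp)
            weights = [math.exp(x - top) for x in logp]
            k = rng.choices(ks, weights=weights)[0]
            if k == next_id:
                next_id += 1
            z[d] = k
            m[k] += 1
            n[k] += length
            nw[k].update(c)

    topic_word = {
        k: {w: (nw[k][w] + beta) / (n[k] + V * beta) for w in vocab} for k in m
    }
    return z, topic_word
```

The number of keys in `topic_word` is the inferred number of topics. In this simplified model, each text's document-topic distribution collapses to its single sampled cluster. Working in log space and subtracting the maximum before exponentiating keeps the product over a document's words from underflowing.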

[0064] To solve the above technical problems, the general idea of the technical solution in the embodiments of the present invention is as follows:

[0065] The embodim...

Abstract

The invention provides a short text topic identification method and system in the technical field of data processing. The method comprises the following steps: S1, obtaining a first corpus and a second corpus, wherein the first corpus is the short text data set to be processed and the second corpus is an auxiliary corpus; S2, obtaining latent feature vectors of words from the second corpus, and constructing a Dirichlet process mixture model on the first corpus; S3, constructing a nonparametric topic model from the latent feature vectors and the Dirichlet process mixture model; S4, performing parameter inference on the posterior topic distribution of the nonparametric topic model; S5, identifying the number of topics in the first corpus from the inferred parameters, and simultaneously obtaining the document-topic distribution and the topic-word distribution of the first corpus. By constructing the Dirichlet process mixture model and introducing latent feature vector representations of words, the method effectively alleviates the sparsity problem and improves the accuracy of short text topic identification.
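The abstract does not say how the latent feature vectors from step S2 enter the topic-word distribution in step S3. One published approach for short texts is LF-DMM (Nguyen et al., 2015), which mixes a softmax over word vectors with the usual Dirichlet multinomial component. The sketch below assumes that form and is not claimed to be the patent's model. The function name, the mixing weight `lam`, the topic vector, and the toy word vectors are all hypothetical.

```python
import math


def latent_feature_word_probs(topic_vec, word_vecs, phi_k, lam=0.6):
    """Mix a word-vector softmax with a count-based topic-word distribution.

    topic_vec : hypothetical latent vector for topic k
    word_vecs : word -> latent feature vector (assumed pre-trained on the auxiliary corpus)
    phi_k     : word -> count-based (Dirichlet multinomial) probability under topic k
    lam       : hypothetical weight of the latent feature component
    """
    scores = {w: sum(t * x for t, x in zip(topic_vec, v)) for w, v in word_vecs.items()}
    top = max(scores.values())  # subtract the max for a numerically stable softmax
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    norm = sum(exps.values())
    return {w: lam * exps[w] / norm + (1 - lam) * phi_k[w] for w in word_vecs}


word_vecs = {"movie": [1.0, 0.0], "actor": [0.9, 0.1], "battery": [0.0, 1.0]}
phi_k = {"movie": 0.5, "actor": 0.1, "battery": 0.4}
probs = latent_feature_word_probs([2.0, 0.0], word_vecs, phi_k)
print(probs)
```

With these toy numbers, "actor" rises from 0.1 in the count-based distribution to about 0.29 because its vector lies close to that of "movie", while "battery" drops from 0.4 to about 0.20. This is how word vectors learned on a larger auxiliary corpus can supply evidence that sparse short-text counts lack.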

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a short text topic recognition method and system.

Background art

[0002] With the rapid development of Internet technology, more and more people express their views or opinions through various online platforms. For example, users can post reviews of movies or TV dramas on websites that introduce them, post reviews of products they have purchased or used on online shopping platforms, and put forward opinions and suggestions to service or application operators through feedback channels. Since most of these comments are fragmentary and contain little text, they can be regarded as short text data.

[0003] In recent years, experts and scholars at home and abroad have carried out in-depth research on short text topic recognition algorithms, and proposed many sh...

Application Information

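IPC(8): G06F16/33; G06F16/35; G06F17/27
CPC: G06F16/3344; G06F16/35; G06F40/289
Inventors: 刘业政 (Liu Yezheng), 钱洋 (Qian Yang), 陶丹丹 (Tao Dandan), 姜元春 (Jiang Yuanchun), 毕文亮 (Bi Wenliang), 孙见山 (Sun Jianshan), 孙春华 (Sun Chunhua), 陈夏雨 (Chen Xiayu), 凌海峰 (Ling Haifeng)
Owner: HEFEI UNIV OF TECH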