Sampling acceleration method for biterm topic model

A sampling acceleration method for the Biterm topic model. It addresses the problem that sampling is time-consuming and cannot meet user needs, and its effect is to optimize sampling time complexity and reduce topic-mining time for long texts.

Active Publication Date: 2019-05-31
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

Gibbs sampling is very time-consuming. Especially when the number of topics K and the data set are very large, Gibbs sampling cannot meet the needs of users.
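To illustrate why the per-biterm cost grows with K, the following is a minimal sketch of one collapsed Gibbs draw. It assumes the conditional commonly given for BTM, in which the weight of topic k is proportional to (n_k + alpha)(n_{wi|k} + beta)(n_{wj|k} + beta) / (n_{.|k} + V*beta)^2. That formula, the count arrays, and all function names are assumptions made for illustration and are not taken from the patent text.

```python
import random


def gibbs_weights(wi, wj, n_k, n_wk, n_tokens_k, alpha, beta, vocab_size):
    """Unnormalized topic weights for the biterm (wi, wj).

    Assumed (hypothetical) count layout:
      n_k[k]        number of biterms currently assigned to topic k
      n_wk[k][w]    number of times word w is assigned to topic k
      n_tokens_k[k] total number of word tokens assigned to topic k
    """
    weights = []
    for k in range(len(n_k)):  # one pass over all K topics: O(K) per biterm
        denom = (n_tokens_k[k] + vocab_size * beta) ** 2
        numer = (n_k[k] + alpha) * (n_wk[k][wi] + beta) * (n_wk[k][wj] + beta)
        weights.append(numer / denom)
    return weights


def draw_topic(weights, rng):
    """Inverse-CDF draw from unnormalized weights, again linear in K."""
    u = rng.random() * sum(weights)
    for k, w in enumerate(weights):
        u -= w
        if u < 0:
            return k
    return len(weights) - 1  # guard against floating-point round-off


if __name__ == "__main__":
    rng = random.Random(0)
    w = gibbs_weights(0, 1, [1, 3], [[1, 1], [3, 3]], [2, 6], 1.0, 1.0, 2)
    print(w, draw_topic(w, rng))
```

Both the weight computation and the draw visit every topic, so a full sweep over B biterms costs on the order of B times K operations, which is the cost the passage above refers to.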



Embodiment Construction

[0027] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention.

[0028] In describing the present invention, it should be understood that the orientations or positional relationships indicated by the terms "center", "longitudinal", "transverse", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer" are based on the orientations or positional relationships shown in the drawings, and are only for the convenience of describing the present invention and simplifying the description, rather than indicating or implying that the device or element refe...


Abstract

The invention provides a sampling acceleration method for the Biterm topic model (BTM). The method includes: establishing an alias table for each term and selecting one biterm; sampling a new topic for the biterm from a corpus proposal and calculating the acceptance probability; judging whether the acceptance probability is greater than r and, if so, updating the biterm's topic, otherwise performing no update; sampling another new topic for the biterm from a word proposal and calculating the acceptance probability; and judging whether the acceptance probability is greater than r and, if so, updating the biterm's topic, otherwise performing no update. The method optimizes the sampling time complexity of BTM and greatly increases its convergence rate without affecting the quality of the final topic clustering. It reduces the time needed for short-text topic mining and for long-text topic mining as well.
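The two building blocks named in the abstract, an alias table and an accept-or-reject test against r, can be sketched as follows. This is a minimal illustration, not the patented procedure: it assumes r is a uniform random number in [0, 1) and that the acceptance probability is the usual Metropolis-Hastings ratio for an independence proposal. The function names and the example weights are hypothetical.

```python
import random


def build_alias_table(weights):
    """Vose's alias method: O(n) construction, O(1) per draw."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # The large entry donates mass to fill the small entry's column.
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:  # leftovers are full columns up to round-off
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def alias_draw(prob, alias, rng):
    """Draw one index in O(1): pick a column, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def mh_step(current, proposed, target, proposal, rng):
    """Accept the proposed topic if the acceptance probability exceeds r.

    target and proposal hold unnormalized weights per topic (assumed inputs).
    """
    ratio = (target[proposed] * proposal[current]) / (
        target[current] * proposal[proposed]
    )
    acceptance = min(1.0, ratio)
    r = rng.random()  # assumption: r is uniform on [0, 1)
    return proposed if acceptance > r else current


if __name__ == "__main__":
    rng = random.Random(42)
    proposal = [1.0, 2.0, 3.0, 4.0]  # hypothetical proposal weights
    target = [2.0, 1.0, 4.0, 3.0]    # hypothetical true conditional weights
    prob, alias = build_alias_table(proposal)
    topic = 0
    for _ in range(5):
        candidate = alias_draw(prob, alias, rng)
        topic = mh_step(topic, candidate, target, proposal, rng)
    print(topic)
```

Because a draw from the alias table takes constant time, each proposal step avoids the pass over all K topics that a direct draw from the full conditional requires. The accept-or-reject test corrects for the difference between the proposal and the target distribution.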

Description

Technical field

[0001] The invention relates to the technical field of software engineering based on component objects of computer programs, and in particular to a sampling acceleration method for the Biterm topic model.

Background

[0002] With the popularity of social networks such as Weibo and Twitter, topic mining of short texts is becoming more and more important. The Biterm topic model (BTM), shown in Figure 1(a), is a topic model that differs from traditional topic models such as LDA (Latent Dirichlet Allocation, document topic generation model), shown in Figure 1(b). BTM is suitable for both short texts and long texts. Traditional topic models are severely affected by the sparseness of short-text feature items, so they are generally only suitable for long texts. Many researchers nevertheless use these traditional topic models for short texts. The main methods adopted are to use external knowledge to enrich short texts, or to aggregate short text...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27
CPC: G06F40/205, G06F40/268
Inventor: 徐华, 贺星伟, 邓俊辉, 孙晓民
Owner: TSINGHUA UNIV