Biterm topic model (BTM) sampling acceleration method

A topic model sampling technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses problems such as time-consuming sampling and unmet user needs, and it shortens long-text topic mining time while improving convergence speed and clustering time.

Active Publication Date: 2017-05-31
TSINGHUA UNIV
2 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

From this, it can be seen that Gibbs sampling is very time-consuming, especial



Embodiment Construction

[0027] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary, only used to explain the present invention, and should not be construed as a limitation of the present invention.

[0028] In the description of the present invention, it should be understood that the orientation or positional relationship indicated by terms such as "center", "portrait", "horizontal", "top", "bottom", "front", "rear", "left", "right", "vertical", "inner", and "outer" is based on the orientation or positional relationship shown in the drawings, and is only for the convenience of describing the present invention and simplifying the description, rather than indicating or implying that the dev...


Abstract

The invention provides a sampling acceleration method for the Biterm topic model (BTM). The method includes: establishing an alias table for each term and selecting one biterm; sampling a new topic for the biterm from the corpus proposal and calculating the acceptance probability; judging whether the acceptance probability is greater than r, and if so, updating the biterm, otherwise performing no update; sampling another new topic for the biterm from the word proposal and calculating the acceptance probability; judging whether the acceptance probability is greater than r, and if so, updating the biterm, otherwise performing no update. The method reduces the time complexity of BTM sampling and greatly increases the convergence rate of BTM without affecting the quality of the final topic clustering, so that topic mining time is shortened for both short texts and long texts.
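The two building blocks named in the abstract, an alias table and an accept/reject test against r, can be illustrated with a minimal Python sketch. This is not the patented procedure: the excerpt does not give the exact form of the corpus proposal or the word proposal, so `target` and `proposal` below are hypothetical callables, and drawing r uniformly from [0, 1) is an assumption taken from the standard Metropolis-Hastings scheme. The alias table uses Vose's construction, which is one common way to build such a table.

```python
import random


def build_alias_table(weights):
    """Build an alias table (Vose's method) from non-negative weights in O(n)."""
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # The large entry donates mass to fill the small entry's column.
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:
        # Leftovers are full columns (up to rounding error).
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def alias_sample(prob, alias, rng):
    """Draw one index in O(1): pick a column, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def mh_step(current, proposed, target, proposal, rng):
    """One accept/reject step (assumed Metropolis-Hastings form).

    target(k) and proposal(k) are hypothetical callables returning the
    unnormalized target and proposal weights of topic k.
    """
    num = target(proposed) * proposal(current)
    den = target(current) * proposal(proposed)
    accept = 1.0 if den == 0 else min(1.0, num / den)
    r = rng.random()  # assumption: r is uniform on [0, 1)
    return proposed if accept > r else current


if __name__ == "__main__":
    rng = random.Random(0)
    topic_weights = [1.0, 2.0, 3.0, 4.0]  # hypothetical per-term topic weights
    prob, alias = build_alias_table(topic_weights)
    new_topic = alias_sample(prob, alias, rng)
    topic = mh_step(0, new_topic, lambda k: topic_weights[k],
                    lambda k: topic_weights[k], rng)
    print(topic)
```

Once the table is built, each draw costs constant time regardless of the number of topics, which is why alias tables are attractive for proposals that are reused across many sampling steps.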

Description

Technical field

[0001] The invention relates to the technical field of software engineering based on component objects of computer programs, in particular to a sampling acceleration method for the Biterm topic model.

Background technique

[0002] With the popularity of social networks such as Weibo and Twitter, topic mining of short texts is becoming more and more important. The Biterm topic model (BTM), shown in Figure 1(a), is a topic model that differs from traditional topic models such as LDA (Latent Dirichlet Allocation, a document topic generation model), shown in Figure 1(b). BTM is suitable for both short texts and long texts, whereas traditional topic models are severely affected by the sparseness of short-text feature items and are therefore generally suitable only for long texts. Many researchers nevertheless use these traditional topic models for short texts; the main methods adopted are to use external knowledge to enrich short texts, or to aggregate short texts into...


Application Information

IPC(8): G06F17/27
CPC: G06F40/205, G06F40/268
Inventor: 徐华, 贺星伟, 邓俊辉, 孙晓民
Owner: TSINGHUA UNIV