A Topic Modeling Method Based on Selection Units

A topic modeling technique based on selection units, applied in special-purpose data processing applications, instruments, electrical digital data processing, etc. It addresses the problem that existing models do not consider that words may express other topics or noise.
CN103559193B · Active · Publication Date: 2016-08-31 · ZHEJIANG UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: ZHEJIANG UNIV
Publication Date: 2016-08-31


Abstract

The invention discloses a topic modeling method based on selection units. The method comprises: extracting, according to a query request, the words, segment structures and word features contained in the search results from a database; determining the topics to be used for modeling; randomly initializing the topic of each segment structure, the topic of each word and the binary choice of each word; iteratively determining these variables by Gibbs sampling; and, according to the final assignments of the variables, returning to the user the significant documents and words of each topic, together with the capacity of words with each feature to express the topic of the segment structure in which they are located. The method has the following advantages: topic modeling can be performed on data of various modalities; the implicit structural information of the data is fully utilized while the drawbacks of overly strong structural constraints are avoided; information on the correlation between word features and segment structure constraints can be provided, helping users understand the data; and the method is highly extensible and can serve as the algorithmic basis for a variety of applications.
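The random-allocation step and the final per-feature summary described in the abstract can be illustrated with a minimal Python sketch. It is not the patented procedure: the data layout, the function names `init_assignments` and `feature_selection_rates`, the `num_topics` parameter and the 0.5 initial probability are all hypothetical, and the sketch assumes that the "binary choice" records whether a word expresses the topic of its segment. The Gibbs sampling updates themselves are omitted because the excerpt does not give the sampling formulas.

```python
import random
from collections import defaultdict


def init_assignments(docs, num_topics, seed=0):
    """Randomly allocate segment topics, word topics and binary choices.

    docs: list of documents; each document is a list of segments;
    each segment is a list of (word, feature) pairs.
    This layout is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    state = []
    for doc in docs:
        doc_state = []
        for segment in doc:
            words = []
            for word, feature in segment:
                words.append({
                    "word": word,
                    "feature": feature,
                    # Topic assigned to the word itself.
                    "topic": rng.randrange(num_topics),
                    # Assumed meaning of the binary choice: True if the
                    # word expresses the topic of its segment.
                    "follows_segment": rng.random() < 0.5,
                })
            doc_state.append({
                "topic": rng.randrange(num_topics),  # segment topic
                "words": words,
            })
        state.append(doc_state)
    return state


def feature_selection_rates(state):
    """For each word feature, the fraction of words whose binary choice
    says they express their segment's topic."""
    counts = defaultdict(lambda: [0, 0])  # feature -> [selected, total]
    for doc_state in state:
        for segment in doc_state:
            for w in segment["words"]:
                counts[w["feature"]][1] += 1
                if w["follows_segment"]:
                    counts[w["feature"]][0] += 1
    return {f: sel / total for f, (sel, total) in counts.items()}


if __name__ == "__main__":
    # Toy input: one document with two segments.
    docs = [[[("goal", "noun"), ("the", "stopword")], [("match", "noun")]]]
    state = init_assignments(docs, num_topics=3)
    print(feature_selection_rates(state))
```

In a full implementation, the assignments produced by `init_assignments` would be resampled repeatedly by the Gibbs sampler before `feature_selection_rates` is computed from the final state.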

Description

Technical field

[0001] The invention relates to multimedia retrieval, in particular to a topic modeling method based on selection units.

Background art

[0002] At present, with the development of Internet architecture, storage technology and other related technologies, there is more and more multimedia data in various modalities, such as news, pictures, audio and video. The rapid growth of multimedia data not only gives Internet users a better browsing experience and provides more samples for multimedia retrieval applications, but also raises the challenge of how to cluster large-scale data automatically. To meet this challenge, many multimedia retrieval and integration applications use unsupervised hierarchical Bayesian models (topic models) in their core algorithms, such as LDA (Latent Dirichlet Allocation, a widely used traditional topic model) and its extensions. Since it was proposed in 2003 until today, LDA and its derivative models h...

Claims
