A Topic Modeling Method Based on Selection Units

A topic modeling technique based on selection units, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the problem that existing models do not consider that a word may express other topics or noise.

Active Publication Date: 2016-08-31
ZHEJIANG UNIV
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Existing topic models that use fragment structure generally impose too strong a structural constraint on (visual) words: they assume that each word must follow the topic of the fragment structure it belongs to, and ignore the possibility that the word expresses another topic or noise.

Examples

Embodiment 1

[0105] Taking the text-type query "NYT+CNN" submitted by a user as an example, the present invention processes the query against the database as follows:

[0106] 1. Search the multimedia database for all the news published by NYT and CNN, and extract the text in the search results;

[0107] 2. Use natural language processing tools to divide the document into sentences, and use the obtained sentences as the fragment structure of the data;

[0108] 3. Use natural language processing tools to mark the part-of-speech of each word, and use the obtained part-of-speech tagging structure as the feature of each word;

[0109] 4. Remove uninformative high-frequency words and rare low-frequency words;

[0110] 5. Collect all the words that still appear in the text after this filtering to form a vocabulary;

[0111] 6. Based on the data set covered by the query, set the number of topics to 20;

[0112] 7. For each sentence contained in the data s...
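
Steps 2, 4 and 5 above can be illustrated with a minimal sketch. It is not the patent's implementation: the regex sentence splitter stands in for the unnamed natural language processing tools, the function names and the thresholds `max_doc_ratio` and `min_count` are hypothetical, and the part-of-speech tagging of step 3 is left out because it needs an external tagger.

```python
import re
from collections import Counter


def split_sentences(document):
    """Step 2 (simplified): split on sentence-ending punctuation.
    Each sentence is treated as one fragment structure."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", sentence.lower())


def build_vocabulary(documents, max_doc_ratio=0.9, min_count=2):
    """Steps 4-5: drop very frequent and very rare words, then collect
    the remaining words into a vocabulary. Thresholds are assumptions."""
    fragments = [[tokenize(s) for s in split_sentences(d)] for d in documents]

    total = Counter()     # corpus-wide occurrence count per word
    doc_freq = Counter()  # number of documents containing each word
    for doc in fragments:
        words = [w for sent in doc for w in sent]
        total.update(words)
        doc_freq.update(set(words))

    n_docs = len(documents)
    keep = {
        w for w, count in total.items()
        if count >= min_count and doc_freq[w] / n_docs <= max_doc_ratio
    }
    filtered = [[[w for w in sent if w in keep] for sent in doc]
                for doc in fragments]
    return filtered, sorted(keep)
```

Calling `build_vocabulary(list_of_news_texts)` returns the filtered documents, still grouped by sentence, together with the sorted vocabulary; the sentence grouping is what a fragment-aware model would consume next.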

Embodiment 2

[0139] Taking the image-type query "LabelMe+MSRC" submitted by a user as an example, the present invention processes the query against the database as follows:

[0140] 1. Search the multimedia database for the two image data sets, LabelMe and MSRC v2, and extract the images in the search results;

[0141] 2. Use OpenSIFT to extract the SIFT features of all pictures to form a set of 128-dimensional feature points;

[0142] 3. Use K-means to cluster the feature point set to obtain a visual dictionary, and replace each SIFT point with the visual word of the cluster it falls into;

[0143] 4. Use the existing annotations to extract attributes such as object boundaries and color histograms in the image, and use the object boundaries as the fragment structure in the image;

[0144] 5. Cluster the objects to obtain the category label to which each visual word belongs, and use the category label as the feature of the visual word.

[0145...
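
Step 3 above (building a visual dictionary with K-means and mapping descriptors to visual words) can be sketched as follows. This is a plain, standard-library K-means written for illustration only; the function names, the iteration count and the seed are hypothetical, and real 128-dimensional SIFT descriptors would normally be clustered with an optimized library.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)),
               key=lambda k: squared_distance(point, centers[k]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns k cluster centers
    (the 'visual dictionary')."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for idx, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[idx] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centers


def quantize(descriptors, centers):
    """Replace each descriptor with the id of its visual word."""
    return [nearest(d, centers) for d in descriptors]
```

With `centers = kmeans(descriptors, k)` computed once over the whole collection, `quantize(image_descriptors, centers)` turns each image into a sequence of visual-word ids, which plays the same role as the word sequence in the text embodiment.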

Abstract

The invention discloses a topic modeling method based on selection units. The method comprises: extracting the words, fragment structures and word features contained in the search results from a database according to a query request; determining the topics used for modeling; generating each fragment-structure topic, word topic and binary choice by random assignment; iteratively updating these variables through Gibbs sampling; and, according to the final assignment of the variables, returning to the user the representative documents and words of each topic, together with the capacity of words with each feature to express the topic of the fragment structure they belong to. The method has the following advantages: topic modeling can be performed on data of various modalities; the implicit structural information of the data is fully used while the drawbacks of overly strong structural constraints are avoided; information on the correlation between word features and fragment-structure constraints is provided, which helps users understand the data; and the method extends well and can serve as the algorithmic basis of various applications.
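
The random-assignment step in the abstract can be pictured with the sketch below. It covers only the initialization that precedes Gibbs sampling; the sampling updates themselves are not given in this excerpt and are not reproduced here. The data layout, the function names, and the reading of the binary choice as a per-word flag (1 meaning the word follows its fragment's topic) are assumptions made for illustration.

```python
import random


def random_initialize(corpus, num_topics, seed=0):
    """Randomly assign a topic to every fragment, a topic to every word,
    and a binary selector to every word.

    corpus: list of documents; each document is a list of fragments;
            each fragment is a list of word ids.
    """
    rng = random.Random(seed)
    state = []
    for document in corpus:
        doc_state = []
        for fragment in document:
            doc_state.append({
                "fragment_topic": rng.randrange(num_topics),
                "word_topics": [rng.randrange(num_topics) for _ in fragment],
                # Assumed meaning: 1 = follow the fragment topic, 0 = own topic.
                "selectors": [rng.randrange(2) for _ in fragment],
            })
        state.append(doc_state)
    return state


def effective_topic(fragment_state, position):
    """Topic a word currently expresses under the assumed selector meaning."""
    if fragment_state["selectors"][position] == 1:
        return fragment_state["fragment_topic"]
    return fragment_state["word_topics"][position]
```

A sampler would then revisit these three groups of variables in turn until the assignments stabilize; counting how often words with a given feature end up with selector 1 is one way to read off the "capacity" the abstract mentions.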

Description

Technical field

[0001] The invention relates to multimedia retrieval, and in particular to a topic modeling method based on selection units.

Background

[0002] With the development of Internet architecture, storage technology and other related technologies, there is more and more multimedia data in various modalities, such as news, pictures, audio and video. The rapid growth of multimedia data gives Internet users a better browsing experience and provides more samples for multimedia retrieval applications, but it also raises the challenge of how to cluster large-scale data automatically. To meet this challenge, many multimedia retrieval and integration applications use unsupervised hierarchical Bayesian models (topic models) in their core algorithms, such as LDA (Latent Dirichlet Allocation, a widely used traditional topic model) and its extensions. Since it was proposed in 2003 until today, LDA and its derivative models h...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/40, G06F40/253
Inventor: 汤斯亮, 张寅, 王翰琪, 鲁伟明, 吴飞, 庄越挺
Owner: ZHEJIANG UNIV