Topic modeling method based on selected cell

A topic modeling technology based on selection units, applied in special data processing applications, instruments, and electrical digital data processing. It addresses the problem that existing models do not consider that a word may express other topics or noise.

Active Publication Date: 2014-02-05

ZHEJIANG UNIV

Cites: 2; Cited by: 18

Problems solved by technology

However, these models generally impose too strong a structural constraint on (visual) words: they assume that every word must follow the topic of the fragment structure it belongs to, and ignore the possibility that the word expresses another topic or is noise.

Examples

Embodiment 1

[0106] Taking the text type query "NYT+CNN" submitted by the user as an example, the steps of the present invention to process the query in the database are as follows:

[0107] 1. Search the multimedia database for all the news published by NYT and CNN, and extract the text in the search results;

[0108] 2. Use natural language processing tools to divide the document into sentences, and use the obtained sentences as the fragment structure of the data;

[0109] 3. Use natural language processing tools to mark the part-of-speech of each word, and use the obtained part-of-speech tagging structure as the feature of each word;

[0110] 4. Remove uninformative high-frequency words and rare low-frequency words;

[0111] 5. Collect all the words that remain in the text after this frequency filtering to form a vocabulary;

[0112] 6. Based on the data set covered by the query, set the number of topics to 20;

[0113] 7. For each sentence contained in the data s...
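
Steps 2 to 5 above can be sketched in Python as follows. This is a minimal sketch, not the patent's implementation: the regex sentence splitter, the TOY_POS lookup table standing in for a real part-of-speech tagger, the toy documents, and the frequency thresholds min_count and max_ratio are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for a real part-of-speech tagger (assumption).
TOY_POS = {"the": "DET", "market": "NOUN", "stocks": "NOUN",
           "fell": "VERB", "rose": "VERB", "sharply": "ADV"}


def split_sentences(document):
    """Step 2 (naive): break on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def preprocess(documents, min_count=2, max_ratio=0.2):
    """Return (vocabulary, corpus); corpus[d][s] is a list of (word, pos) pairs."""
    tokenized = [[tokenize(s) for s in split_sentences(d)] for d in documents]
    counts = Counter(w for doc in tokenized for sent in doc for w in sent)
    total = sum(counts.values())
    # Step 4: drop rare words and words that dominate the corpus.
    keep = {w for w, c in counts.items()
            if c >= min_count and c / total <= max_ratio}
    # Step 3: attach a part-of-speech feature to every surviving word.
    corpus = [[[(w, TOY_POS.get(w, "UNK")) for w in sent if w in keep]
               for sent in doc] for doc in tokenized]
    # Step 5: the vocabulary is every surviving word.
    vocabulary = sorted(keep)
    return vocabulary, corpus


docs = ["The market fell sharply. The stocks fell.",
        "The market rose. Stocks rose sharply!"]
vocabulary, corpus = preprocess(docs)
```

Here each sentence is one fragment structure, and the tag attached to each word plays the role of the word feature. In practice the thresholds would be tuned to the corpus rather than fixed as above.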

Embodiment 2

[0140] Taking the image type query "LabelMe+MSRC" submitted by the user as an example, the steps of the present invention to process the query in the database are as follows:

[0141] 1. Search the multimedia database for the two image data sets, LabelMe and MSRC v2, and extract the images in the search results;

[0142] 2. Use OpenSIFT to extract the SIFT features of all pictures to form a set of 128-dimensional feature points;

[0143] 3. Use K-means to cluster the feature point set to obtain a visual dictionary, and replace each SIFT point with the visual word of the cluster it falls into;

[0144] 4. Use the existing annotations to extract attributes such as object boundaries and color histograms in the image, and use the object boundaries as the fragment structure in the image;

[0145] 5. Cluster the objects to obtain the category label to which each visual word belongs, and use the category label as the feature of the visual word.

[0146...
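
The visual-dictionary step (step 3 above) can be sketched in Python as follows. This is an illustrative sketch only: it uses a plain Lloyd's K-means written with the standard library, and toy 2-dimensional points stand in for the 128-dimensional SIFT descriptors. The function names and the toy data are assumptions, not part of the patent.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)),
               key=lambda k: squared_distance(point, centers[k]))


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm; returns k cluster centers (the visual dictionary)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers


def quantize(descriptors, centers):
    """Replace each descriptor with the index of its visual word."""
    return [nearest(d, centers) for d in descriptors]


# Toy 2-dimensional "descriptors" in two well-separated groups (assumption).
descriptors = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
               [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
dictionary = kmeans(descriptors, k=2)
visual_words = quantize(descriptors, dictionary)
```

After quantization, every image is a bag of visual-word indices, which lets the same topic model treat images and text documents alike.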

Abstract

The invention discloses a topic modeling method based on a selected cell (selection unit). Given a query request, the method extracts the words, segment structures and word features contained in the search results from a database; determines the topics to be used in modeling; initializes the topic of each segment structure, the topic of each word and the binary choice of each word by random allocation; iteratively updates these variables through Gibbs sampling; and, according to the final allocation of the variables, returns to the user the significant documents and words of each topic, together with the capacity of words with each feature to express the topic of the segment structure in which they are located. The method has the following advantages: topic modeling can be performed on data of various modalities; the implicit structural information of the data is fully utilized while the disadvantages of overly strong structural constraints are eliminated; information on the correlation between word features and segment structure constraints is provided, which helps users understand the data; and the method is highly extensible and can serve as the algorithmic basis of various applications.
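
The random allocation step named in the abstract can be sketched in Python as follows. This is a sketch under stated assumptions, not the patented procedure: the data layout and function names are hypothetical, and the reading that a binary choice of 1 means "the word follows its segment's topic" while 0 means "the word keeps its own topic" is an interpretation of the abstract. The Gibbs sampling update itself is omitted because the excerpt does not give the conditional distributions.

```python
import random


def random_init(corpus, num_topics, seed=0):
    """Randomly assign a topic to every segment and every word,
    plus a binary choice per word.

    corpus[d][s] is the list of words in segment s of document d.
    """
    rng = random.Random(seed)
    segment_topic, word_topic, selector = [], [], []
    for doc in corpus:
        segment_topic.append([rng.randrange(num_topics) for _ in doc])
        word_topic.append([[rng.randrange(num_topics) for _ in seg]
                           for seg in doc])
        selector.append([[rng.randrange(2) for _ in seg] for seg in doc])
    return segment_topic, word_topic, selector


def effective_topics(segment_topic, word_topic, selector):
    """Assumed reading: selector 1 means the word follows its segment's
    topic; selector 0 means the word keeps its own topic."""
    return [[[segment_topic[d][s] if selector[d][s][i] == 1
              else word_topic[d][s][i]
              for i in range(len(word_topic[d][s]))]
             for s in range(len(word_topic[d]))]
            for d in range(len(word_topic))]


# Toy corpus: two documents, segments given as lists of words (assumption).
corpus = [[["market", "fell"], ["stocks"]], [["rose"]]]
segment_topic, word_topic, selector = random_init(corpus, num_topics=20)
topics = effective_topics(segment_topic, word_topic, selector)
```

A sampler would then repeatedly revisit each of these three groups of variables and resample them given the rest until the assignments stabilize.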

Description

Technical field

[0001] The invention relates to multimedia retrieval, in particular to a topic modeling method based on selection units.

Background technique

[0002] At present, with the development of Internet architecture, storage technology and other related technologies, there are more and more multimedia data in various modalities, such as news, pictures, and audio and video. The rapid growth of multimedia data not only provides Internet users with a better browsing experience and provides more samples for multimedia retrieval applications, but also brings the challenge of how to automatically cluster large-scale data. In order to meet this challenge, many multimedia retrieval and integration applications use unsupervised hierarchical Bayesian models (or topic models) in their core algorithms, such as LDA (Latent Dirichlet Allocation, a broad traditional topic model) and its extensions, etc. Since it was proposed in 2003 until today, LDA and its derivative models h...

Application Information

IPC(8): G06F17/30, G06F17/27

CPC: G06F16/40, G06F40/253

Inventor: 汤斯亮, 张寅, 王翰琪, 鲁伟明, 吴飞, 庄越挺

Owner: ZHEJIANG UNIV
