Web image clustering method based on image and text relevant mining

A technology of image clustering and correlation, applied in special data processing applications, instruments, electrical digital data processing, etc.

Status: Inactive; Publication Date: 2009-11-18
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

Traditional graph models can only model a single type of node and homogeneous links between nodes.
Bipartite graphs c



Examples

Embodiment

[0119] Five visually polysemous words are selected as queries: "apple", "bass", "jaguar", "mouse" and "tower". A crawler program was written to submit each keyword as a query to Google Image Search™ and automatically extract the returned results. For each image in the returned results, the image file and the Web page containing it are downloaded. Because Google limits the number of results actually returned per search, the dataset contains about 4000 data items in total. To obtain the accompanying text of an image, the Web page containing it is parsed and the text surrounding the image is extracted. All accompanying texts are part-of-speech tagged to extract nouns. For each query, the noun vocabulary of the accompanying texts contains 1000-2000 words. To obtain the benchmark (ground-truth) category label vectors, the image categories in the dataset were annotated manually.
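The surrounding-text extraction described in [0119] might be sketched roughly as follows. This is only an illustration under stated assumptions, not the crawler used in the embodiment. The class `SurroundingTextParser`, the function `accompanying_text`, the `window` parameter and the sample page are all hypothetical, and "surrounding" is assumed here to mean the few text chunks immediately before and after the image tag, plus its `alt` text.

```python
from html.parser import HTMLParser


class SurroundingTextParser(HTMLParser):
    """Collects text chunks in document order and notes where the target image sits."""

    def __init__(self, image_src):
        super().__init__()
        self.image_src = image_src
        self.chunks = []         # non-empty text chunks, in document order
        self.image_index = None  # number of chunks seen before the target image
        self.alt_text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "img" and self.image_index is None:
            attributes = dict(attrs)
            if attributes.get("src") == self.image_src:
                self.image_index = len(self.chunks)
                self.alt_text = attributes.get("alt") or ""

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def accompanying_text(html, image_src, window=2):
    """Return the alt text plus up to `window` text chunks on each side of the image.

    Assumption: a fixed chunk window approximates "the text around the image".
    Script and style contents are not filtered out in this sketch.
    """
    parser = SurroundingTextParser(image_src)
    parser.feed(html)
    parser.close()
    if parser.image_index is None:
        return ""
    i = parser.image_index
    nearby = parser.chunks[max(0, i - window): i + window]
    return " ".join([parser.alt_text] + nearby).strip()


# Hypothetical page used only to exercise the sketch.
page = (
    "<html><body><p>Site menu</p><p>Intro</p><h1>Jaguar cars</h1>"
    '<img src="a.jpg" alt="red jaguar"><p>A sports car.</p>'
    "<p>Price list</p><p>Footer</p></body></html>"
)
text = accompanying_text(page, "a.jpg", window=1)
```

The resulting string would then be passed to a part-of-speech tagger so that only nouns are kept, as the paragraph describes. The tagger itself is outside the scope of this sketch.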

[0120] The work flow chart o...


Abstract

The invention discloses a Web image clustering method based on image and text correlation mining, which comprises the following steps: (1) extracting the images and their associated texts from Google image search results for a query; (2) extracting the nouns in the associated texts to form a vocabulary; (3) calculating the visibility of each word in the vocabulary, and combining the visibility with the TF-IDF method to calculate the relevance between words and images; (4) calculating the topical relatedness between any two words in the vocabulary; (5) modeling these correlations with a complex graph; (6) applying a complex graph clustering algorithm to cluster the images. By combining word visibility with TF-IDF to define the relevance between words and images, the method overcomes the limitation that TF-IDF, as a text processing technique, cannot directly measure the relation between words and images. By modeling the correlations between words and images, and between words, with a complex graph, it provides a Web image clustering framework in which image search results are grouped by topic, which makes searching more convenient for users.
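Step (3) can be illustrated with a small sketch. The abstract does not say how visibility and TF-IDF are combined, so the product rule below is an assumption made purely for illustration, as are the function names `tf_idf` and `word_image_relevance`, the toy documents and the visibility scores.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: tf-idf weight} dict per document (a list of noun tokens)."""
    n_docs = len(docs)
    # Document frequency: number of accompanying texts that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        weights.append(
            {w: (c / total) * math.log(n_docs / df[w]) for w, c in counts.items()}
        )
    return weights


def word_image_relevance(docs, visibility):
    """Scale each TF-IDF weight by the word's visibility score.

    Assumption: visibility lies in [0, 1] and is combined by multiplication;
    the patent abstract does not specify the actual combination rule.
    """
    return [
        {w: score * visibility.get(w, 0.0) for w, score in doc_weights.items()}
        for doc_weights in tf_idf(docs)
    ]


# Toy accompanying texts (nouns only), one per image; values are made up.
docs = [
    ["mouse", "keyboard", "computer"],
    ["mouse", "cheese", "animal"],
    ["mouse", "computer", "price"],
]
visibility = {
    "mouse": 0.9, "keyboard": 0.8, "computer": 0.7,
    "cheese": 0.6, "animal": 0.5, "price": 0.1,
}
relevance = word_image_relevance(docs, visibility)
```

In this toy data the query word "mouse" appears in every accompanying text, so its inverse document frequency is zero and it contributes nothing to any image, while a low-visibility word such as "price" is down-weighted relative to "keyboard" even though both have the same TF-IDF weight.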

Description

Technical field

[0001] The invention relates to multimedia retrieval, and in particular to a Web image clustering method based on image and text correlation mining.

Background

[0002] On the Web, searching for images by keyword is still an effective and common retrieval method, as in the image search of the commercial search engines Google and AltaVista. In Web image retrieval, the keywords submitted by users are often visually polysemous words, that is, words with several different visual meanings. For example, the word "mouse" can refer to multiple topics such as "computer mouse", "mouse (the animal)" and "Mickey Mouse". Therefore, when images are queried with such visually polysemous words, the returned results contain multiple topics, and images of different topics are mixed together. This calls for a post-retrieval process that groups images according to the topic they express. Recently, many researchers have proposed Web image clustering methods to...


Application Information

IPC(8): G06F17/30
Inventor: 庄越挺, 吴飞, 韩亚洪
Owner ZHEJIANG UNIV