Tag clustering method and system

A tag clustering method and system, applicable to special-purpose data processing, instruments, and electrical digital data processing, that addresses problems such as inaccurate calculation of tag similarity.

Status: Inactive. Publication Date: 2011-07-20
UNIV OF SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0004] In order to solve the above technical problems, the main technical purpose of the present invention is to propose a label clustering method and system to overcome the defects of inaccurate calcul...



Examples


Embodiment 1

[0072] To improve the accuracy of tag clustering results, an embodiment of the present invention provides a tag clustering method. As shown in the schematic flow chart of Figure 1, the method may include the following steps:

[0073] Step S101: Establish a feature vector of each tag to be clustered.

[0074] In this step, each tag to be clustered is modeled and represented by a multi-dimensional feature vector.

[0075] This embodiment specifically provides the following three methods for establishing feature vectors for the tags to be clustered:

[0076] Method 1: Resource-based feature vector representation (item-based-vector, IBV).

[0077] A resource is usually annotated with several tags, and each tag is related to the resources it annotates. Conversely, a tag can be represented by the set of resources related to it.

[0078] Based on the above idea, the present invention can use the feature vector composed of t...
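The resource-based representation can be illustrated with a minimal sketch. The source text is truncated before it specifies how vector components are weighted, so the sketch assumes each component counts how many times the tag was applied to a resource. The function name `build_item_based_vectors` and the `(resource_id, tag)` input format are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_item_based_vectors(annotations):
    """Build one resource-indexed feature vector per tag.

    annotations: iterable of (resource_id, tag) pairs (hypothetical format).
    Returns (resources, vectors), where vectors[tag][i] is the number of
    times `tag` was applied to resources[i]. Count weighting is an
    assumption; the source does not state the weighting scheme.
    """
    annotations = list(annotations)  # allow generators; we iterate twice
    resources = sorted({resource_id for resource_id, _ in annotations})
    index = {resource_id: i for i, resource_id in enumerate(resources)}
    vectors = defaultdict(lambda: [0] * len(resources))
    for resource_id, tag in annotations:
        vectors[tag][index[resource_id]] += 1
    return resources, dict(vectors)


# Toy data: three resources annotated with three tags.
annotations = [
    ("photo1", "cat"), ("photo1", "pet"),
    ("photo2", "cat"), ("photo2", "kitten"),
    ("photo3", "pet"),
]
resources, vectors = build_item_based_vectors(annotations)
```

In this toy data, tags that co-occur on the same resources end up with overlapping non-zero components, which is what a later similarity step can exploit.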

Embodiment 2

[0122] Corresponding to the tag clustering method provided in the first embodiment, this embodiment provides a tag clustering system to improve the accuracy of tag clustering. Figure 7 is a schematic diagram of the structure of the system, which specifically includes:

[0123] The feature vector establishment module 701 is used to establish the feature vector of each tag to be clustered;

[0124] The similarity calculation module 702 is used to calculate the cosine of the angle between two feature vectors in Euclidean space to obtain the similarity between the tags to be clustered;
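As an illustration of the similarity calculation, for two feature vectors u and v (symbols introduced here, not taken from the source) the cosine of the angle between them is (u · v) / (|u| |v|). The following is a minimal sketch; the function name `cosine_similarity` is hypothetical, and returning 0.0 for a zero vector is an assumption, since the source does not say how empty vectors are handled.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: a tag with an all-zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)
```

Because the measure depends only on direction, a frequently used tag and a rarely used tag attached to the same resources in the same proportions receive a similarity close to 1.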

[0125] The clustering module 703 is configured to use the K-Means algorithm to cluster the tags to be clustered according to the similarity between the tags to be clustered.
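The clustering step can be sketched as follows. The excerpt does not describe how the cosine similarity is combined with K-Means, so this sketch assumes one common arrangement: vectors are normalized to unit length, each tag is assigned to the centroid with the highest cosine similarity, and centroids are recomputed as normalized means (a variant often called spherical K-Means). The function name `kmeans_cosine`, its parameters, and the seeded random initialization are hypothetical.

```python
import math
import random


def _normalize(v):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def kmeans_cosine(tag_vectors, k, max_iter=100, seed=0):
    """Cluster tags with K-Means, using cosine similarity for assignment.

    tag_vectors: dict mapping tag -> feature vector (hypothetical format).
    Returns a dict mapping tag -> cluster index in range(k).
    """
    tags = sorted(tag_vectors)
    if not 1 <= k <= len(tags):
        raise ValueError("k must be between 1 and the number of tags")
    # On unit vectors, the dot product equals the cosine similarity.
    unit = {t: _normalize(tag_vectors[t]) for t in tags}
    rng = random.Random(seed)
    centroids = [unit[t] for t in rng.sample(tags, k)]
    assignment = {}
    for _ in range(max_iter):
        # Assignment step: pick the most similar centroid for each tag.
        new_assignment = {
            t: max(range(k), key=lambda c: _dot(unit[t], centroids[c]))
            for t in tags
        }
        if new_assignment == assignment:
            break  # converged: no tag changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the normalized mean of its members.
        for c in range(k):
            members = [unit[t] for t in tags if assignment[t] == c]
            if members:  # an empty cluster keeps its previous centroid
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = _normalize(mean)
    return assignment
```

A call such as `kmeans_cosine({"cat": [1, 1, 0, 0], "kitten": [1, 0.9, 0, 0], "car": [0, 0, 1, 1], "auto": [0, 0, 0.9, 1]}, k=2)` groups the first two tags apart from the last two; the cluster indices themselves are arbitrary.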

[0126] Based on the three methods for establishing feature vectors for the tags to be clustered provided in the first embodiment, the feature vector establishing module 701 may include any one or more of the follo...


Abstract

The embodiments of the invention disclose a tag clustering method and a tag clustering system. The method comprises the steps of: establishing a feature vector for each tag to be clustered; calculating the cosine of the angle between two feature vectors in Euclidean space to obtain the similarity between every two tags to be clustered; and clustering the tags using the K-Means algorithm according to that similarity. The tag clustering system comprises: a feature vector establishing module, which establishes the feature vector of each tag to be clustered; a similarity calculating module, which calculates the cosine of the angle between two feature vectors in Euclidean space to obtain the similarity between every two tags to be clustered; and a clustering module, which clusters the tags using the K-Means algorithm according to that similarity. The technical scheme overcomes the inaccurate tag similarity calculation of current collaborative tagging systems, addresses disordered tag organization and fuzzy tag semantics, and effectively improves the accuracy of tag clustering.

Description

Technical field

[0001] The present invention relates to the technical field of data mining, in particular to collaborative tagging, and specifically to a tag clustering method and system for large data sets.

Background technique

[0002] Web 2.0, as a highly networked and open form of the Internet built around users, content, and applications, has attracted a large number of Internet users and has given rise to applications such as blogs, podcasts, community networks, web digests, and Wikipedia. The social tagging system is a typical Web 2.0 application, which is very popular and has a bright future. For example, websites such as Flickr, del.icio.us, and Douban.com all use collaborative tagging. One of their main characteristics is that they are open and uncontrolled systems. Users label resources with different tags according to their social and cultural background, expertise and world outlook, and use these user tags to complete the classification, organizati...


Application Information

IPC(8): G06F17/30
Inventors: 陈超, 周津, 俞能海
Owner: UNIV OF SCI & TECH OF CHINA