Scalable user clustering based on set similarity

A user-clustering technology, applied in special data processing applications, instruments, calculations, etc., that addresses clustering problems that are difficult to solve at large scale

Inactive Publication Date: 2009-09-16
GOOGLE LLC

AI Technical Summary

Problems solved by technology

However, these clustering techniques have disadvantages. For example, the running time of hierarchical agglomerative clustering (HAC) is O(n^2), which is impractical when n is in the hundreds of millions; and the k-means algorithm needs to represent the mean of the data points, which is not feasible when the data points are sets.



Examples


Embodiment Construction

[0023] Figure 1 shows a logical illustration of a min-hash method for clustering users. While this approach can be implemented as shown, it is presented here primarily for purposes of explanation. A practical implementation for clustering users in a system with a large number of users is described below with reference to Figure 2.

[0024] As shown in Figure 1, the inputs to the min-hash method are: a universe of items 110, denoted U; a set of k permutations 112, denoted p1, p2, ..., pk; and a user's interest set 114, denoted X_A for user A.

[0025] Each permutation is a permutation over U, selected uniformly at random from the set of all permutations over U, so that every permutation is equally likely to be selected. A permutation is a one-to-one mapping of U onto U (a bijection). Such permutations are possible only if U is fixed and countable. The integer k is a selectable parameter. Usually the val...
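The inputs described in [0024] and [0025] can be illustrated with a minimal Python sketch. It is not the patent's implementation: the helper names `random_permutations` and `minhash_signature`, the toy universe, and the choice to store each permutation as an item-to-rank table are all assumptions made for illustration, and the approach only works for a small, fixed U.

```python
import random

def random_permutations(universe, k, seed=0):
    """Draw k permutations of `universe` uniformly at random (hypothetical helper).

    Each permutation is stored as a dict mapping an item to its rank, i.e. a
    bijection from U onto {0, ..., |U| - 1}. This representation is an assumption.
    """
    rng = random.Random(seed)
    items = sorted(universe)
    perms = []
    for _ in range(k):
        shuffled = items[:]
        rng.shuffle(shuffled)  # uniform shuffle of a small, fixed universe
        perms.append({item: rank for rank, item in enumerate(shuffled)})
    return perms

def minhash_signature(interest_set, perms):
    """For each permutation, return the item of the set with the smallest rank."""
    return [min(interest_set, key=p.__getitem__) for p in perms]

# Toy data (assumed): universe U, k = 4 permutations, two interest sets.
U = {"a", "b", "c", "d", "e", "f"}
perms = random_permutations(U, k=4, seed=42)
X_A = {"a", "c", "e"}
X_B = {"a", "c", "f"}
sig_A = minhash_signature(X_A, perms)
sig_B = minhash_signature(X_B, perms)
```

Under a single uniformly random permutation, two sets share the same minimum item with probability equal to the size of their intersection divided by the size of their union, which is why comparing `sig_A` and `sig_B` position by position gives a rough estimate of set similarity.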


Abstract

Methods and apparatus, including systems and computer program products, to provide clustering of users in which users are each represented as a set of elements representing items, e.g., items selected by users using a system. In one aspect, a program operates to obtain a respective interest set for each of multiple users, each interest set representing items in which the respective user expressed interest; for each of the users, to determine k hash values of the respective interest set, wherein the i-th hash value is a minimum value under a corresponding i-th hash function; and to assign each of the multiple users to each of the respective k clusters established for the respective user, the i-th cluster being represented by the i-th hash value. The assignment of each of the users to k clusters is done without regard to the assignment of any of the other users to k clusters.
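The assignment step in the abstract can be sketched as follows, assuming non-empty interest sets. The salted SHA-256 hash `h`, the `(i, value)` cluster identifier, and the sample users are hypothetical choices for illustration; the source does not specify which hash functions are used.

```python
import hashlib
from collections import defaultdict

def h(i, item):
    """Hypothetical i-th hash function: 64 bits of SHA-256 salted with the index i."""
    digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def cluster_ids(interest_set, k):
    """Return k cluster ids; the i-th id is the minimum of h(i, .) over the set.

    Depends only on this user's own interest set (assumed non-empty).
    """
    return [(i, min(h(i, item) for item in interest_set)) for i in range(k)]

def assign_users(interest_sets, k):
    """Map each cluster id to the set of users assigned to it."""
    clusters = defaultdict(set)
    for user, items in interest_sets.items():
        for cid in cluster_ids(items, k):
            clusters[cid].add(user)
    return clusters

# Toy data (assumed): alice and bob share all interests, carol shares none.
interest_sets = {
    "alice": {"news/1", "news/2", "news/3"},
    "bob": {"news/1", "news/2", "news/3"},
    "carol": {"news/9"},
}
clusters = assign_users(interest_sets, k=3)
```

Because `cluster_ids` reads only one user's interest set, each user's k assignments can be computed in isolation, which matches the abstract's statement that assignment is done without regard to the assignment of other users.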

Description

Technical Field

[0001] The present invention relates to digital data processing, and more particularly to grouping users of computer applications or systems into clusters.

Background

[0002] Grouping users into clusters serves several purposes. For user personalization, for example, a well-known technique, collaborative filtering, involves clustering users and recommending to a user items in which other users in the same cluster have expressed interest. A user may generally be considered to have expressed interest in an item in a variety of ways, for example, by clicking on the item, purchasing the item, or adding the item to a shopping cart. Recommendations can be presented in many ways, for example, as part of search results, as news stories that the user may want to read, or as items that the user may want to buy.

[0003] One way to achieve user clustering is to first define a distan...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F7/00, G06F17/00, G06F17/30
CPC: G06Q30/02, G06F17/30867, G06F16/9535
Inventors: Mayur Datar, Ashutosh Garg
Owner: GOOGLE LLC