Efficient computation of top-K aggregation over graph and network data

A graph and network data technology, applied in computing, instruments, electric digital data processing, etc. It addresses the difficulty of answering top-k operations in Structured Query Language (SQL) query engines, the poor performance of relational query engines on h-hop queries, and obstacles to advanced analysis of social networks.

Status: Inactive
Publication Date: 2012-07-31
INT BUSINESS MASCH CORP

AI Technical Summary

Benefits of technology

[0035] Embodiments of the present invention provide efficient techniques using forward processing with a differential index and/or backward processing with partial distribution to find nodes of a graph that have top-k aggregate values over their neighbors within h-hops.

Problems solved by technology

Furthermore, advanced analysis of social networks may address very complicated mining tasks, such as evaluating the network value of customers and link prediction.
An h-hop query that can be decomposed into an aggregation operation and a top-k operation cannot be answered easily by Structured Query Language (SQL) query engines.
Moreover, the performance of using a relational query engine to process h-hop queries is often unacceptable.

Examples

Example 1

[0058] An online professional networking tool helps people discover inside connections to recommended job candidates, industry experts and business partners. It is natural for companies to submit queries to find top-k candidates who have strong expertise and are referred by professionals in the same domain. For example, a query may find top-k candidates who have experience in database research and are also referred by many database experts.

[0059] There is no doubt that the above queries are useful for emerging applications in many online social communities and other networks, such as book recommendations on a website of an online retailer, targeted marketing on a social networking website, and gene function finding in biological networks. These applications are unified by a general aggregation query definition over a network. In general, a top-k aggregation on graphs needs to solve three problems, listed below as P1, P2 and P3. Note that P1, P2 and P3 are listed below for pr...

Example 2

[0071] FIG. 3A and FIG. 3B illustrate an example of the first forward processing approach. Given a graph 300-1 shown in FIG. 3A with nodes 302-1, 302-2, 302-3, 302-4, 302-5 and 302-6 (a.k.a. nodes e1, e2, e3, e4, e5 and e6, respectively), the SUM function for 1-hop neighbors is computed to generate aggregate scores (a.k.a. aggregate values). Node e3 is selected first for forward processing and SUM(e3,1) is computed as 1.8 (i.e., the aggregate score of e3).

[0072] In the SUM function in FIGS. 3A and 3B and in the examples that are presented below, the “1” parameter (e.g., the “1” in SUM(e3,1)) indicates 1-hop. Thus, in Example 2, SUM(e3,1) is the sum of the score assigned to node e3 plus the scores assigned to the 1-hop neighbors of node e3 (i.e., score of node e3+score of node e1+score of node e2+score of node e4+score of node e5, or 0.2+0.5+0+0.1+1=1.8).

[0073] In Example 2, node e1 is selected as the next node and SUM(e1,1) is computed as 0.5+0+0.2=0.7. Thus, the aggregate score of ...
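The arithmetic in Example 2 can be reproduced with a short Python sketch. The edge list is reconstructed from the text rather than taken from FIG. 3A: the edges from e3 follow from its stated 1-hop neighbors, while the e1-e2 edge is an assumption inferred from the terms of SUM(e1,1). Node e6 is omitted because the excerpt gives neither its score nor its edges in graph 300-1. The helper names `build_adjacency` and `sum_1hop` are illustrative only.

```python
# Edges reconstructed from the text of Example 2 (not from the figure).
# The e1-e2 edge is an assumption inferred from SUM(e1,1) = 0.5 + 0 + 0.2.
EDGES = [("e1", "e2"), ("e1", "e3"), ("e2", "e3"), ("e3", "e4"), ("e3", "e5")]

# Node scores quoted in Example 2. e6 is left out: its score is not given.
SCORES = {"e1": 0.5, "e2": 0.0, "e3": 0.2, "e4": 0.1, "e5": 1.0}


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of direct neighbors."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def sum_1hop(node, adjacency, scores):
    """SUM(node, 1): the node's own score plus its 1-hop neighbors' scores."""
    return scores[node] + sum(scores[n] for n in sorted(adjacency[node]))


ADJ = build_adjacency(EDGES)
print(round(sum_1hop("e3", ADJ, SCORES), 6))  # 1.8
print(round(sum_1hop("e1", ADJ, SCORES), 6))  # 0.7
```

The values are rounded before printing because floating-point addition of 0.1 and 0.2 style decimals is not exact.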

Example 3

[0091] Consider graph 400 in FIG. 4, which depicts an example of forward processing using differential index-based pruning. For node e3, the differential indexes of its neighbors in 1-hop are: delta(e1−e3)=0, delta(e2−e3)=0, delta(e4−e3)=0, and delta(e5−e3)=1. The differential index values for delta(e1−e3), delta(e2−e3), and delta(e4−e3) are zero because e1, e2 and e4's 1-hop nodes are a subset of e3's 1-hop nodes. The differential index value for delta(e5−e3) is 1 because e5 has one node that is in its 1-hop but not in e3's 1-hop (i.e., node e6).
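A minimal sketch of the differential index computation described above, under two assumptions: graph 400 has the edges listed below (reconstructed from the text, with e1-e2 inferred rather than quoted), and a node's "1-hop nodes" include the node itself, which is the reading that reproduces the stated values. The names `closed_1hop` and `differential_index` are hypothetical.

```python
# Edges reconstructed from the text (an assumption, not copied from FIG. 4).
EDGES = [("e1", "e2"), ("e1", "e3"), ("e2", "e3"),
         ("e3", "e4"), ("e3", "e5"), ("e5", "e6")]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of direct neighbors."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def closed_1hop(node, adjacency):
    """The node itself plus its direct neighbors (assumed meaning of '1-hop nodes')."""
    return {node} | adjacency[node]


def differential_index(v, u, adjacency):
    """delta(v - u): number of nodes in v's 1-hop set that are not in u's."""
    return len(closed_1hop(v, adjacency) - closed_1hop(u, adjacency))


ADJ = build_adjacency(EDGES)
for v in ("e1", "e2", "e4", "e5"):
    print(v, differential_index(v, "e3", ADJ))
# e1 0
# e2 0
# e4 0
# e5 1
```

Only e5 reaches a node (e6) outside e3's 1-hop set, which is why its differential index is the only nonzero one.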

[0092] In Example 3, a forward processing is done on node e3 with the SUM aggregate values evaluated on node e3's 1-hop nodes to obtain SUM(e3,1)=1.8. Then the upper bound of e3's neighbor nodes is computed, as FIG. 4 shows. For instance, SUM(e1,1)=1.8 because given delta(e1−e3)=0, the aggregate value of node e1 can at most be the same as SUM(e3,1). SUM(e4,1)=1.1 because node e4's own score is 0.1 and node e4 has only one neighbor. Thus, N(e4...
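One plausible reading of the two upper bounds quoted for e1 and e4 is sketched below. It is not taken from the patent text, which is truncated here. The sketch assumes that every node score is at most 1 (`MAX_SCORE`), that a node's 1-hop set includes the node itself, and that the tighter of two bounds is kept: the processed neighbor's aggregate plus the differential index, and the node's own score plus its neighbor count. All function names and the edge list are illustrative assumptions.

```python
# Edges and scores reconstructed from Examples 2 and 3 (assumptions).
EDGES = [("e1", "e2"), ("e1", "e3"), ("e2", "e3"),
         ("e3", "e4"), ("e3", "e5"), ("e5", "e6")]
SCORES = {"e1": 0.5, "e2": 0.0, "e3": 0.2, "e4": 0.1, "e5": 1.0}  # e6 not given
MAX_SCORE = 1.0  # assumption: no node score exceeds 1


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of direct neighbors."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def differential_index(v, u, adjacency):
    """Nodes in v's closed 1-hop set that are missing from u's."""
    return len(({v} | adjacency[v]) - ({u} | adjacency[u]))


def upper_bound(v, u, sum_u, adjacency, scores):
    """Bound SUM(v,1) after SUM(u,1) is known, keeping the tighter of two bounds."""
    # Nodes unseen from u can each add at most MAX_SCORE on top of SUM(u,1).
    delta_bound = sum_u + differential_index(v, u, adjacency) * MAX_SCORE
    # v's own score plus at most MAX_SCORE per direct neighbor.
    degree_bound = scores[v] + len(adjacency[v]) * MAX_SCORE
    return min(delta_bound, degree_bound)


ADJ = build_adjacency(EDGES)
sum_e3 = SCORES["e3"] + sum(SCORES[n] for n in sorted(ADJ["e3"]))
print(round(upper_bound("e1", "e3", sum_e3, ADJ, SCORES), 6))  # 1.8
print(round(upper_bound("e4", "e3", sum_e3, ADJ, SCORES), 6))  # 1.1
```

Under these assumptions the differential-index bound is the tighter one for e1, while the neighbor-count bound is the tighter one for e4, matching the two figures quoted in the paragraph above.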

Abstract

A method and system for efficiently answering a local neighborhood aggregation query over graph data. A graph which has a plurality of nodes is received and stored in memory. A local neighborhood aggregation query is received. A processing engine applies forward processing with differential index-based pruning, backward processing using partial distribution, or an enhanced backward processing that combines the backward processing and the forward processing. As a result of the forward, backward, or enhanced backward processing, nodes in the graph that have the top-k highest aggregate values over neighbors within h-hops of the nodes are determined. Identities of entities or persons associated with the determined nodes are presented and/or stored.
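For orientation, the query itself (top-k nodes by aggregate value over their h-hop neighborhoods) can be stated as a naive baseline in Python. This is not the forward, backward, or enhanced backward processing of the abstract; it simply evaluates every node with a breadth-first search and applies no pruning. The toy path graph, its scores, and the function names are hypothetical.

```python
import heapq
from collections import deque


def h_hop_neighborhood(start, adjacency, h):
    """Return the start node plus every node reachable within h hops (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == h:
            continue  # do not expand beyond h hops
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


def top_k_sum(adjacency, scores, h, k):
    """Naive baseline: SUM over each node's h-hop neighborhood, then top-k."""
    totals = {
        node: sum(scores[m] for m in h_hop_neighborhood(node, adjacency, h))
        for node in adjacency
    }
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Hypothetical toy data: a path a - b - c - d with made-up integer scores.
TOY_ADJ = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
TOY_SCORES = {"a": 1, "b": 2, "c": 4, "d": 8}

print(top_k_sum(TOY_ADJ, TOY_SCORES, h=1, k=2))  # [('c', 14), ('d', 12)]
```

A baseline like this visits every node's full neighborhood, which is the cost that pruning-based approaches aim to avoid on large graphs.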

Description

FIELD OF THE INVENTION

[0001] The present invention relates to searching and mining large-scale graphs, and more particularly to efficiently answering local aggregation queries over large-scale networks.

BACKGROUND OF THE INVENTION

[0002] Managing and mining large-scale networks (e.g., physical, biological, and social networks) is critical to a variety of application domains, ranging from personalized recommendation in social networks, to search for functional associations in biological pathways. Network linkage analysis can find a group of tightly connected people that form a community or discover the centrality of nodes such as hub and authority. Furthermore, advanced analysis of social networks may address very complicated mining tasks, such as evaluating the network value of customers and link prediction. Existing network analytical tools develop application-specific criteria to gauge the importance of nodes or to discover knowledge hidden in complex networks. However, there is a gro...

Application Information

Patent Type & Authority: Patents (United States)
IPC: IPC(8): G06F7/00
CPC: G06F17/30489, G06F17/30958, G06F16/24556, G06F16/9024
Inventor: HE, BIN
Owner: INT BUSINESS MASCH CORP