Method for calculating graph node similarity in a distributed manner

A distributed computing and similarity technology, applied in computing, special data processing applications, instruments, etc. It addresses the high complexity of existing calculation methods.

Active Publication Date: 2014-11-19
NORTHEASTERN UNIV
Cites: 3; Cited by: 18

AI Technical Summary

Problems solved by technology

However, this calculation method is too compl...



Examples


Embodiment 1

[0061] For ease of understanding, this embodiment walks through the processing of a small graph in detail. It takes a character relationship network as an example, drawn from Microsoft People Cube relationship search (http://renlifang.msra.cn/GuanxiMap.aspx?query=355). As shown in Figure 1, the network describes the interpersonal relationships among seven celebrities: Dong Qing, Zhu Jun, Li Yong, Bai Yansong, Bi Fujian, Zhao Benshan and Li Sisi. From these relationships, this embodiment calculates and analyzes how closely each pair of the seven people is associated.

[0062] To reduce the number of calculations, this embodiment sets the calculation accuracy to ε = 0.01; to speed up convergence, it sets the attenuation (decay) factor to C = 0.4; in this embodiment, it is agreed that all da...
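A minimal sketch of the traditional, centralized SimRank iteration with this embodiment's parameters (C = 0.4, ε = 0.01) is given below. It rests on stated assumptions: the accuracy ε is treated as a stopping rule (stop once no pair's score changes by at least ε between rounds), and the relationship network is treated as undirected, so a node's neighbors serve as its in-neighbors. Because the edges of Figure 1 are not reproduced here, the example runs on a hypothetical four-node graph, not the seven-person network; the function name `simrank` is illustrative.

```python
from itertools import product


def simrank(neighbors, c=0.4, eps=0.01, max_iter=100):
    """Naive iterative SimRank over all ordered node pairs (illustrative sketch).

    neighbors: dict mapping each node to the list of its neighbors, used here
    as its in-neighbors (assumption: undirected graph).
    """
    nodes = list(neighbors)
    # s_0: 1 on the diagonal, 0 elsewhere
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(max_iter):
        new = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new[(a, b)] = 1.0
                continue
            na, nb = neighbors[a], neighbors[b]
            if not na or not nb:
                # A node without in-neighbors has similarity 0 to every other node
                new[(a, b)] = 0.0
                continue
            total = sum(sim[(i, j)] for i in na for j in nb)
            new[(a, b)] = c * total / (len(na) * len(nb))
        # Assumed stopping rule: largest per-pair change below eps
        delta = max(abs(new[k] - sim[k]) for k in sim)
        sim = new
        if delta < eps:
            break
    return sim


# Hypothetical square graph A-B-D-C-A; NOT the Figure 1 network
toy = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
scores = simrank(toy)
print(round(scores[("B", "C")], 3))  # → 0.248
```

Here B and C share both neighbors, so their score rises over three rounds (0.2, 0.24, 0.248) until the change falls below ε, while adjacent nodes such as A and B stay at 0 because this square graph is bipartite.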

Embodiment 2

[0125] Table 7 lists five datasets. The Wiki dataset consists of Wikipedia page data: nodes represent pages, edges represent links between pages, and the similarity of a node pair can represent the degree of association between two pages. For example, the Wikipedia pages for Liaoning Province, Fengtian Province and Shenyang City are highly associated, because many web pages link to all three. Datasets Gnu1 to Gnu4 are communication data from a distributed P2P file-sharing system, in which nodes represent servers and edges represent access between servers; node similarity in these networks plays an important role in maintenance and data fusion operations.

[0126] Table 7 Datasets

[0127]
Dataset   Number of nodes   Number of edges
Wiki      7155              103689
Gnu1      6301              20777
Gnu2      8864              31839
Gnu3      10876             39994
Gnu4      22687             54705
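To give a sense of scale, a dense all-pairs SimRank computation keeps one score per ordered node pair, so storage and per-iteration work grow with the square of the node count even though these graphs are sparse. The quick calculation below uses only the sizes from Table 7; treating the scores as a dense n × n table is an assumption for illustration, and `scale_summary` is a hypothetical helper name.

```python
# Sizes copied from Table 7: name -> (number of nodes, number of edges)
DATASETS = {
    "Wiki": (7155, 103689),
    "Gnu1": (6301, 20777),
    "Gnu2": (8864, 31839),
    "Gnu3": (10876, 39994),
    "Gnu4": (22687, 54705),
}


def scale_summary(nodes: int, edges: int) -> tuple[float, int]:
    """Return (average edges per node, entries in a dense all-pairs score table)."""
    avg_edges_per_node = edges / nodes
    node_pairs = nodes * nodes  # one score per ordered pair (a, b)
    return avg_edges_per_node, node_pairs


for name, (n, m) in DATASETS.items():
    avg, pairs = scale_summary(n, m)
    print(f"{name}: {avg:.2f} edges per node, {pairs:,} node pairs")
```

For Gnu4 this comes to 22687² = 514,699,969 ordered pairs against only 54,705 edges, which illustrates why spreading the pairwise work over several machines is attractive.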

[0128] Figure...



Abstract

The invention discloses a method for calculating graph node similarity in a distributed manner, belonging to the field of computer data mining. The method comprises the following steps: a distributed computing platform is built in master/slave mode; the master computer reads the object data, builds a graph model and sends it to the slave computers; the master computer divides the work into tasks and assigns them to the slave computers; each slave computer calculates the similarity increments of the node pairs of the graph model in its assigned task and sends the results to the master computer; the master computer calculates a deviation ratio and sends the results to the corresponding slave computers; the slave computers correct the similarity increments of the nodes in their local tasks, sum them, and send the results to the master computer; finally, the master computer integrates these results to obtain the similarity of the nodes of the graph model. Compared with the traditional SimRank calculation method, the invention has lower transmission cost and shorter calculation time, and its efficiency is significantly improved.
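The master/slave flow in the abstract can be illustrated with a simplified single-machine sketch: the master splits the node pairs into tasks, each "slave" recomputes SimRank for its pairs from the previous round's scores, and the master merges the results and checks convergence. This is plain iterative SimRank parallelized by partitioning node pairs; it does not reproduce the patent's similarity increments or deviation-ratio correction, whose formulas are not given in this excerpt. The function names, the round-robin task split, the stopping rule, and the use of threads from Python's `concurrent.futures` in place of networked slave computers are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

Graph = dict[str, list[str]]
Scores = dict[tuple[str, str], float]


def lookup(scores: Scores, i: str, j: str) -> float:
    """Symmetric lookup: diagonal is fixed at 1, off-diagonal keys are stored sorted."""
    if i == j:
        return 1.0
    return scores[(min(i, j), max(i, j))]


def slave_task(graph: Graph, prev: Scores, pairs: list, c: float) -> Scores:
    """Hypothetical slave: recompute SimRank for its assigned pairs only."""
    out = {}
    for a, b in pairs:
        na, nb = graph[a], graph[b]
        if not na or not nb:
            out[(a, b)] = 0.0
            continue
        total = sum(lookup(prev, i, j) for i in na for j in nb)
        out[(a, b)] = c * total / (len(na) * len(nb))
    return out


def master(graph: Graph, n_slaves=3, c=0.4, eps=0.01, max_iter=100) -> Scores:
    """Hypothetical master: divide pairs into tasks, dispatch, merge, test convergence."""
    nodes = sorted(graph)
    pairs = list(combinations(nodes, 2))  # a < b only; scores are symmetric
    tasks = [pairs[k::n_slaves] for k in range(n_slaves)]  # round-robin split
    scores = {p: 0.0 for p in pairs}
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        for _ in range(max_iter):
            # Every slave reads the same previous-round scores (read-only)
            futures = [pool.submit(slave_task, graph, scores, t, c) for t in tasks]
            new: Scores = {}
            for f in futures:
                new.update(f.result())
            delta = max((abs(new[p] - scores[p]) for p in pairs), default=0.0)
            scores = new
            if delta < eps:
                break
    return scores


# Hypothetical square graph A-B-D-C-A, for illustration only
toy = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
scores = master(toy)
print(round(scores[("B", "C")], 3))  # → 0.248
```

Each pair is computed exactly as in a centralized iteration, so the partitioned result matches the naive one; only the division of work changes. Threads keep the sketch runnable in one process, whereas a real deployment would place each task on a separate machine and would then have to account for the cost of shipping scores between rounds, which is the transmission cost the abstract claims to reduce.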

Description

Technical field
[0001] The invention belongs to the field of computer data mining, and in particular relates to a method for computing graph node similarity in a distributed manner.

Background art
[0002] With the wide application of graph structures, calculating the similarity between two nodes has become a basic graph operation. For example, in a graph model of a social network, nodes represent personal accounts and edges represent relationships between accounts; node similarity expresses the degree of association between two accounts and has important applications in detecting similar groups and in friend recommendation. As another example, in a graph model of a citation network, nodes represent articles and edges represent citation relationships between articles; node similarity can be applied to article classification and similar-article recommendation.
[0003] At presen...
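For reference, the standard SimRank measure, which the abstract names as the traditional calculation method, is commonly defined recursively and computed by iteration as below. Here I(v) denotes the set of in-neighbors of node v, C ∈ (0, 1) is the decay factor (the embodiment sets C = 0.4), and the iteration starts from s_0(a, b) = 1 if a = b and 0 otherwise. This is the usual textbook form; the patent's own formulation is not reproduced in this excerpt.

```latex
s_{k+1}(a,b) =
\begin{cases}
1, & a = b,\\[4pt]
\dfrac{C}{|I(a)|\,|I(b)|}\displaystyle\sum_{i \in I(a)}\sum_{j \in I(b)} s_k(i,j), & a \neq b,\ I(a) \neq \emptyset,\ I(b) \neq \emptyset,\\[4pt]
0, & \text{otherwise.}
\end{cases}
```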


Application Information

IPC(8): H04L29/08; G06F17/30; H04L12/58
Inventors: Shen Derong (申德荣), Feng Shuo (冯朔), Kou Yue (寇月), Nie Tiezheng (聂铁铮), Wang Zhenhua (王振华), Yu Ge (于戈)
Owner NORTHEASTERN UNIV