Distributed graph calculation method based on disk

A disk-based distributed graph computing technique in the computer field. It addresses the problems of costly update operations, graph algorithms that fail to converge, and difficult parallelization, with the effects of lower hardware cost, good scalability and a small system scale.

Active Publication Date: 2016-06-08
HUAZHONG UNIV OF SCI & TECH
Cites: 3; Cited by: 21

AI Technical Summary

Problems solved by technology

[0004] First, after each superstep the system must run a costly synchronization process to exchange messages between vertices. Second, a vertex can observe only the results of the previous superstep, which slows convergence, and some graph algorithms cannot converge at all. Third, the strong coupling between the vertices of graph data and the limited parallelism of graph algorithms make it difficult to exploit the parallel processing capability of large clusters. Fourth, the degree distribution of natural graphs is extremely uneven: a very small number of vertices account for most of the edges. These high-degree vertices become a major performance bottleneck in the BSP model, because the system cannot perform the synchronous message exchange until the slowest vertex has finished computing. The model is therefore inefficient, and this drawback becomes more pronounced as the processed graph grows.
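The fourth point can be made concrete with a little arithmetic. The sketch below is a hypothetical model, not the patent's: it assumes that each superstep ends in a global barrier, so a superstep lasts as long as its slowest worker, and compares that with the time the same work would take if it were perfectly balanced. The worker loads and the names `bsp_time` and `ideal_time` are invented for illustration, chosen to mimic one worker that owns a high-degree vertex.

```python
# Hypothetical model (assumption): every superstep ends with a global barrier,
# so its wall-clock time is set by the slowest worker, not the average worker.
from typing import List


def bsp_time(per_superstep_loads: List[List[float]]) -> float:
    """Total time when every superstep waits for its slowest worker."""
    return sum(max(loads) for loads in per_superstep_loads)


def ideal_time(per_superstep_loads: List[List[float]]) -> float:
    """Time if the same work were perfectly balanced across workers."""
    return sum(sum(loads) / len(loads) for loads in per_superstep_loads)


if __name__ == "__main__":
    # Invented loads (work units) for 4 workers over 3 supersteps; worker 0
    # owns a high-degree vertex and carries most of the work each time.
    loads = [[10.0, 1.0, 1.0, 1.0]] * 3
    print(bsp_time(loads))    # 30.0
    print(ideal_time(loads))  # 9.75
```

With one overloaded worker, the barrier-bound schedule takes about three times as long as the balanced one; the more skewed the degree distribution, the larger this gap.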
[0005] Memory-based Bulk Asynchronous Parallel model (BAP): the BAP model also computes in parallel with the vertex as the unit, but during computation a vertex can observe the latest values of its neighbor vertices within the current iteration; the BAP
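The difference between observing only the previous superstep (BSP) and observing the latest neighbor values (BAP) can be illustrated with min-label propagation for connected components. In this sketch, which is an illustrative assumption rather than the method of the patent, a snapshot copy of the labels stands in for BSP-style synchronous updates, and a single-process in-place sweep stands in for asynchronous updates; `build_adj` and `propagate` are hypothetical helper names.

```python
# Illustrative assumption: connected-components labeling by min-label
# propagation, run with synchronous (snapshot) or asynchronous (in-place)
# updates, counting sweeps until no label changes.
from typing import Dict, List, Tuple


def build_adj(n: int, edges: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Undirected adjacency lists for vertices 0..n-1."""
    adj: Dict[int, List[int]] = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def propagate(adj: Dict[int, List[int]], asynchronous: bool) -> Tuple[List[int], int]:
    """Return (final labels, number of sweeps including the final no-change sweep)."""
    labels = list(range(len(adj)))  # every vertex starts with its own id
    sweeps = 0
    while True:
        sweeps += 1
        # Synchronous: read only the previous sweep's labels (a snapshot).
        # Asynchronous: read the live list, so updates made earlier in this
        # sweep are already visible to vertices visited later.
        source = labels if asynchronous else labels[:]
        changed = False
        for v in adj:
            best = min([source[v]] + [source[u] for u in adj[v]])
            if best < labels[v]:
                labels[v] = best
                changed = True
        if not changed:
            return labels, sweeps


if __name__ == "__main__":
    chain = build_adj(8, [(i, i + 1) for i in range(7)])
    print(propagate(chain, asynchronous=False))  # ([0, 0, 0, 0, 0, 0, 0, 0], 8)
    print(propagate(chain, asynchronous=True))   # ([0, 0, 0, 0, 0, 0, 0, 0], 2)
```

On an 8-vertex chain the synchronous version needs 8 sweeps (7 to carry label 0 to the far end plus one to detect that nothing changed), while the in-place version needs 2. How large the gap is depends on the order in which vertices are visited, so this illustrates the convergence argument rather than measuring it.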


Examples


Embodiment Construction

[0043] In order to make the object, technical solution and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not conflict with each other.

[0044] In this embodiment, a graph processing system (XGraph) is built on the graph computing model of the present invention. It consists of one master node and 3 computing nodes: the master node is responsible for task scheduling, coordinating the work of the computing nodes, etc., and the computing nodes execute the tasks assigned to them. XGraph uses the LiveJournal graph to...


Abstract

The invention discloses a disk-based distributed graph computing method. The method adopts a disk-based distributed computing model: a graph partitioning algorithm divides the original graph into P subgraphs, and one run of a graph algorithm completes in N iterations. One execution of one subgraph constitutes one task, so a run comprises (P*N) tasks. Each task consists of the following steps: (1) loading and constructing the subgraph; (2) computing on the subgraph; and (3) storing the result and sending the relevant data to other subgraphs. The method schedules tasks in a pipelined manner, so tasks execute in an overlapping fashion that hides the latency of disk reads, disk writes and communication during execution. As a result, the running time of the whole system is reduced to almost the computation time alone, which greatly improves system performance. The system can also remain very small when processing graphs of different scales, greatly reducing hardware cost.
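The effect of pipelined scheduling described in the abstract can be estimated with a simple flow-shop model. The sketch below assumes, beyond what the abstract states, that each of the three steps (load, compute, store/send) runs on its own resource that handles one task at a time, that tasks are issued in (iteration, subgraph) order, and that data dependencies between subgraphs across iterations never stall the pipeline. The stage durations are hypothetical, and `make_tasks`, `serial_makespan` and `pipelined_makespan` are illustrative names.

```python
# Assumption: each of the P*N tasks passes through three stages in order:
# load (disk read + subgraph construction), compute, store/send. Each stage is
# modeled as one resource serving one task at a time; cross-subgraph data
# dependencies between iterations are ignored.
from typing import List, Tuple

Stage = Tuple[float, float, float]  # (load, compute, store) durations


def make_tasks(p: int, n: int) -> List[Tuple[int, int]]:
    """Enumerate the P*N tasks as (iteration, subgraph) pairs."""
    return [(it, sg) for it in range(n) for sg in range(p)]


def serial_makespan(times: List[Stage]) -> float:
    """Each task runs all three stages before the next task starts."""
    return sum(sum(t) for t in times)


def pipelined_makespan(times: List[Stage]) -> float:
    """Flow-shop recurrence: a stage starts once both this task's previous
    stage and the previous task's use of this stage have finished."""
    finish = [0.0, 0.0, 0.0]  # finish time of the latest task on each stage
    for t in times:
        ready = 0.0  # when this task's previous stage finished
        for s in range(3):
            finish[s] = max(finish[s], ready) + t[s]
            ready = finish[s]
    return finish[2]


if __name__ == "__main__":
    tasks = make_tasks(p=3, n=4)
    times = [(2.0, 5.0, 2.0)] * len(tasks)  # hypothetical durations
    print(len(tasks))                 # 12
    print(serial_makespan(times))     # 108.0
    print(pipelined_makespan(times))  # 64.0
```

With P = 3 subgraphs, N = 4 iterations and a dominant compute stage, the 12 tasks take 108 time units when run one after another but 64 when pipelined, close to the 60 units of pure computation. Once the pipeline is full, only the slowest stage sets the pace, which is the sense in which overlapping execution hides disk and communication latency.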

Description

Technical field
[0001] The invention belongs to the field of computer technology, and more specifically relates to a disk-based distributed graph computing method.
Background art
[0002] The graph is one of the most commonly used abstract data structures in computer science. It is more complex than linear lists and trees in both structure and semantics, and has more general representational power. In today's big-data context, demand for large-scale graph analysis applications keeps growing, and such applications are usually processed according to a computing model. The computing model determines important characteristics of a graph processing system, such as hardware cost, performance and efficiency. Current graph processing systems mainly use the following computing models:
[0003] Memory-based Bulk Synchronous Parallel model (BSP): A graph computing job (job) ...
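The BSP model named in [0003] is commonly implemented as a vertex-centric loop of supersteps. The sketch below shows that structure for single-source shortest paths in a single process, as an illustrative assumption: messages produced in one superstep are buffered and delivered only after the barrier, which is the restriction criticized in [0004]. The function name `bsp_sssp` and the adjacency format are hypothetical.

```python
# Illustrative assumption: vertex-centric BSP in one process. In each superstep
# every vertex that received messages reads only the messages sent to it in
# the previous superstep, updates its value, and sends messages that become
# visible after the global barrier.
from typing import Dict, List, Tuple

INF = float("inf")


def bsp_sssp(adj: Dict[int, List[Tuple[int, float]]], source: int) -> Tuple[Dict[int, float], int]:
    """Single-source shortest paths; returns (distances, supersteps run)."""
    dist = {v: INF for v in adj}
    inbox: Dict[int, List[float]] = {source: [0.0]}
    supersteps = 0
    while inbox:  # halt when no messages are in flight
        supersteps += 1
        outbox: Dict[int, List[float]] = {}
        for v, msgs in inbox.items():
            best = min(msgs)
            if best < dist[v]:
                dist[v] = best
                for u, w in adj[v]:
                    outbox.setdefault(u, []).append(best + w)
        # Barrier: messages produced in this superstep are delivered only now.
        inbox = outbox
    return dist, supersteps


if __name__ == "__main__":
    graph = {0: [(1, 1.0), (2, 4.0)], 1: [(2, 1.0)], 2: []}
    print(bsp_sssp(graph, 0))  # ({0: 0.0, 1: 1.0, 2: 2.0}, 3)
```

In the small example, vertex 2 is first given distance 4.0 and corrected to 2.0 only in the third superstep, because the shorter path through vertex 1 becomes visible only after another barrier.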


Application Information

IPC(8): G06F3/06
CPC: G06F3/061; G06F3/0659; G06F3/0689
Inventors: 王芳 (Wang Fang), 程永利 (Cheng Yongli), 冯丹 (Feng Dan), 汪修能 (Wang Xiuneng), 张永选 (Zhang Yongxuan), 戎佳磊 (Rong Jialei)
Owner: HUAZHONG UNIV OF SCI & TECH