Inter-node heterogeneous bandwidth-oriented data distribution method in Gaia cluster

A data distribution method for clusters with heterogeneous inter-node bandwidth, applied in the field of big data. It addresses the long data distribution completion time and low job efficiency of conventional methods.

Active Publication Date: 2020-03-06
BEIJING INSTITUTE OF TECHNOLOGY +1

AI Technical Summary

Problems solved by technology

In a cluster with heterogeneous bandwidth, some nodes are bandwidth bottlenecks, so distributing data evenly across nodes makes data distribution take a long time to complete.
For data-intensive jobs, network transmission is often the bottleneck of job execution, so traditional data distribution methods lead to inefficient job execution.
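
The effect of a bottleneck node can be illustrated with a small calculation. This is only a sketch under a simplifying assumption that is not taken from the patent text: each slave node receives its share in parallel at its own fixed bandwidth, so distribution completes when the slowest node finishes. The bandwidth values, the data size, and all function names are hypothetical.

```python
from typing import List


def completion_time(total_mb: float,
                    proportions: List[float],
                    bandwidths_mb_s: List[float]) -> float:
    """Time until the slowest node has received its share (assumed model)."""
    return max(total_mb * p / b for p, b in zip(proportions, bandwidths_mb_s))


def even_proportions(node_count: int) -> List[float]:
    """Traditional scheme: every node receives the same share."""
    return [1.0 / node_count] * node_count


def bandwidth_proportional(bandwidths_mb_s: List[float]) -> List[float]:
    """Share of each node is proportional to its bandwidth.

    Under the assumed model this equalizes the per-node transfer times,
    which minimizes the maximum transfer time.
    """
    total = sum(bandwidths_mb_s)
    return [b / total for b in bandwidths_mb_s]


# Hypothetical cluster: two fast slaves and one bottleneck slave.
bandwidths = [100.0, 100.0, 25.0]   # MB/s, illustrative values only
total_data = 9000.0                 # MB, illustrative value only

even_time = completion_time(total_data, even_proportions(3), bandwidths)
prop_time = completion_time(total_data, bandwidth_proportional(bandwidths), bandwidths)

print(f"even distribution:         {even_time:.1f} s")
print(f"bandwidth-proportional:    {prop_time:.1f} s")
```

With these illustrative numbers the even split is limited by the 25 MB/s node, while the proportional split lets all three nodes finish at about the same time. The optimization model in the invention itself may include factors that this sketch ignores.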

Embodiment Construction

[0043] The present invention will be described in detail below with reference to the accompanying drawings and examples.

[0044] Figure 1 is a schematic diagram of a Gaia cluster with heterogeneous bandwidth among nodes. The Gaia cluster uses a Master/Slave architecture. In the figure, the Master is the JobManager, and the Slaves are the three computing nodes Slave1, Slave2 and Slave3. The JobManager maintains communication with all computing nodes, and each computing node reports its own status, including bandwidth information, to the JobManager. After a Gaia job is submitted,

[0045] First, the job graph (JobGraph) is generated on the client side of the node where the job is submitted, and the client submits the JobGraph to the JobManager on the master node. The JobManager first selects the TaskManager nodes on which the job will run according to the job information, and then calculates the optimal data distrib...

Abstract

The invention discloses a data distribution method for heterogeneous inter-node bandwidth in a Gaia cluster, which can reduce the time required for data distribution. The Gaia cluster receives a batch processing job submitted by the user and submits the job graph to the master node. At the master node, the slave nodes on which the job will be deployed and run are selected. An optimization model based on data transmission time is constructed with the objective of minimizing the data transmission time required by the data distribution process, and the optimal data distribution proportion for the selected slave nodes is calculated. Sampling logic is added to the job graph; it segments the data to be distributed according to the optimal data distribution proportion, and data in the same segment is sent to the same node during data distribution. An execution graph is generated from the modified job graph, and each subtask in the execution graph is deployed and run on the selected slave nodes. The batch processing job then starts executing, and the data to be distributed is distributed according to the optimal data distribution proportion.
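
The segmentation step can be sketched as sampling-based range partitioning. This is an illustrative reading of the abstract, not the patented implementation: it assumes records carry a sortable numeric key, that segment boundaries are taken from quantiles of a random sample, and that the proportions have already been computed. All names and values below are hypothetical.

```python
import bisect
import random
from typing import List, Sequence


def segment_boundaries(sample: Sequence[float],
                       proportions: Sequence[float]) -> List[float]:
    """Choose key boundaries so segment i covers about proportions[i] of the sample."""
    ordered = sorted(sample)
    boundaries = []
    cumulative = 0.0
    for p in proportions[:-1]:          # the last segment is open-ended
        cumulative += p
        index = min(int(cumulative * len(ordered)), len(ordered) - 1)
        boundaries.append(ordered[index])
    return boundaries


def target_node(key: float, boundaries: List[float]) -> int:
    """Keys in the same segment always map to the same node index."""
    return bisect.bisect_right(boundaries, key)


# Hypothetical input: uniformly distributed keys and a 10% sample of them.
rng = random.Random(0)
keys = [rng.random() for _ in range(10_000)]
sample = rng.sample(keys, 1_000)

# Hypothetical proportions for three slave nodes (two fast, one slow).
proportions = [4 / 9, 4 / 9, 1 / 9]
boundaries = segment_boundaries(sample, proportions)

counts = [0] * len(proportions)
for key in keys:
    counts[target_node(key, boundaries)] += 1
shares = [c / len(keys) for c in counts]
print("target proportions:", [round(p, 3) for p in proportions])
print("observed shares:   ", [round(s, 3) for s in shares])
```

Because the boundaries come from a sample, the observed shares only approximate the target proportions; a larger sample tightens the approximation at the cost of more sampling work.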

Description

Technical field

[0001] The invention relates to the technical field of big data, and in particular to a data distribution method for heterogeneous inter-node bandwidth in a Gaia cluster.

Background

[0002] In recent years, as big data technology has developed further, users have come to demand more timely data processing, and big data platforms based on streaming engines have made great progress. Batch processing nevertheless remains very important. Many real-world scenarios require batch jobs and stream jobs to interact, so a big data computing system that integrates batch and stream processing is needed.

[0003] Gaia is a high-efficiency, scalable next-generation big data analysis system in which multiple computing models coexist. Its goal is to solve a series of key technologies at several core levels of big data analysis systems, such as adaptive and scalable big data storage, batch-stream fusion big data computing, high-di...

Application Information

IPC(8): H04L29/08
CPC: H04L67/10, H04L67/1008, H04L67/101
Inventor: 马卿云, 王国仁, 赵宇海, 郑军, 李荣华
Owner: BEIJING INSTITUTE OF TECHNOLOGY