Data allocation strategy in Hadoop heterogeneous cluster

A data allocation technique for heterogeneous clusters, applied in resource allocation, multiprogramming arrangements, program control arrangements, etc. It addresses the problems that existing approaches give insufficient consideration to data distribution, produce estimates of limited reference value, and cannot ensure network stability.

Active Publication Date: 2013-07-24
FUZHOU UNIV

AI Technical Summary

Problems solved by technology

However, these strategies do not fully consider the differences in the inherent capabilities of each node in a heterogeneous cluster, such as thread switching capability and node storage capability. As a result, data distribution is insufficiently considered and network stability cannot be ensured.
Other methods do consider network transmission, but they estimate network distance as the sum of the distances from each node in the topology to the nearest common ancestor. Because network bandwidths differ in practical deployments, this estimate has significant limitations and offers limited reference value.
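
The topology-based distance estimate described above can be illustrated with a minimal sketch. It assumes each node is identified by a slash-separated location path such as `/d1/r1/n1` (data center, rack, node); the function name and the paths are hypothetical and are not taken from the patent text.

```python
def topology_distance(path_a: str, path_b: str) -> int:
    """Sum of the hops from each node up to their nearest common ancestor.

    Paths are assumed to look like "/datacenter/rack/node" (illustrative only).
    """
    a = [part for part in path_a.split("/") if part]
    b = [part for part in path_b.split("/") if part]

    # Count how many leading path components the two nodes share.
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1

    # Hops from each node up to the shared ancestor, added together.
    return (len(a) - common) + (len(b) - common)


if __name__ == "__main__":
    print(topology_distance("/d1/r1/n1", "/d1/r1/n2"))  # same rack
    print(topology_distance("/d1/r1/n1", "/d1/r2/n3"))  # different racks
```

The result depends only on the shape of the topology tree. Two links with very different bandwidths get the same distance, which is the limitation noted above.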

Examples

Embodiment Construction

[0021] To make the object, technical solution and advantages of the present invention clearer, the invention is described in further detail below through specific embodiments and the related drawings.

[0022] The present invention provides a data allocation strategy in a Hadoop heterogeneous cluster, comprising the following steps:

[0023] S01: Test and store the execution time each node takes to process data of different scales, and convert it into a static performance reference index;

[0024] S02: Monitor and store the storage load of each node and the network transmission speed between nodes, and convert them into dynamic performance reference indexes;

[0025] S03: According to the preset weight of each performance factor, use a calculation module to compute the number of data blocks to allocate to each node, then perform the data block-to-node mapping and the allocation transfer through the data allocation server.
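
Steps S01 to S03 can be sketched roughly as follows. The excerpt does not give the actual formulas, so this is only an illustration under stated assumptions: each factor is normalized so that higher is better, the factors are combined by a weighted sum, and fractional block counts are rounded with a largest-remainder rule. The class, function, field names and default weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    name: str
    exec_time_s: float         # S01: measured time on a reference workload (must be > 0)
    free_storage_ratio: float  # S02: fraction of storage still free, 0..1
    net_speed_mbps: float      # S02: measured transfer speed to this node


def allocate_blocks(nodes, total_blocks, w_cpu=0.5, w_storage=0.2, w_net=0.3):
    """Split total_blocks across nodes in proportion to a weighted score.

    The weighted-sum scoring is an assumption, not the patent's formula.
    """
    fastest = min(n.exec_time_s for n in nodes)
    best_net = max(n.net_speed_mbps for n in nodes)

    scores = {}
    for n in nodes:
        static_idx = fastest / n.exec_time_s    # 1.0 for the fastest node
        net_idx = n.net_speed_mbps / best_net   # 1.0 for the best link
        scores[n.name] = (w_cpu * static_idx
                          + w_storage * n.free_storage_ratio
                          + w_net * net_idx)

    # Proportional share of blocks, then largest-remainder rounding
    # so the integer counts add up to total_blocks exactly.
    total_score = sum(scores.values())
    quotas = {name: total_blocks * s / total_score for name, s in scores.items()}
    counts = {name: int(q) for name, q in quotas.items()}
    leftover = total_blocks - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    cluster = [
        NodeMetrics("fast", exec_time_s=10, free_storage_ratio=0.8, net_speed_mbps=1000),
        NodeMetrics("slow", exec_time_s=40, free_storage_ratio=0.5, net_speed_mbps=100),
    ]
    print(allocate_blocks(cluster, total_blocks=100))
```

Changing the weights shifts blocks between nodes without touching the measurements, which is the kind of flexible configuration of performance factors the strategy relies on.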

[0026] The followi...


Abstract

The invention relates to a data allocation strategy in a Hadoop heterogeneous cluster. The strategy comprises the following steps: step S01, testing and storing the execution time each node takes to process data of different scales, and converting the execution time into a static performance reference index; step S02, monitoring and storing the storage load of each node and the network transmission speed between nodes, and converting them into dynamic performance reference indexes; and step S03, calculating the number of data blocks to allocate to each node with a calculation module according to the preset weight of each performance factor, then performing the data block-to-node mapping and the allocation transfer through a data allocation server. By allowing each performance factor of the static and dynamic performance reference indexes to be configured flexibly, the strategy improves adaptability, ensures effectiveness, increases data locality, reduces job response time and network transmission, improves the load stability of the system, and optimizes the use of cluster resources.

Description

Technical field

[0001] The invention relates to a data allocation strategy in the field of high-performance clusters, and in particular to a data allocation strategy in a Hadoop heterogeneous cluster that comprehensively considers multiple performance factors such as node computing capability, network transmission capability and node load capability.

Background art

[0002] Hadoop is a software framework for the distributed processing of large amounts of data. Its high reliability, high scalability, high efficiency and high fault tolerance have attracted wide attention in industry and research. Hadoop includes two relatively independent subsystems: the distributed parallel computing system MapReduce, which consists of JobTrackers and TaskTrackers, and the distributed storage system HDFS, which stores files on all storage nodes in the Hadoop cluster. When executing MapReduce tasks, the corresponding data blocks must be obtained from HDFS for processing. In order t...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F9/44, G06F9/50
Inventor: 郭文忠, 陈国龙, 林常航
Owner: FUZHOU UNIV