
Data block copy placement method based on heterogeneous Hadoop cluster environment

A Hadoop cluster and data block technology, applied in the field of data block copy placement, that can address unbalanced node load, low execution speed, and low system resource utilization.

Active Publication Date: 2020-10-27
NORTHWEST UNIV
3 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

That is, in a heterogeneous cluster environment, this leads to a range of problems for the nodes, such as low system resource utilization, unbalanced node load, low execution speed, low fault tolerance, and high communication load, and it can even cause node crashes.



Examples


Embodiment

[0072] To demonstrate the practicability of the method of the present invention, experiments were carried out to verify the proposed dynamic placement of newly added data block copies. First, the data access request records in the HDFS logs were collected as the data set, covering a total of 1000 data blocks. The experimental environment consists of four different types of servers arranged in four racks, and the racks communicate through switches. The default data block size is 128 MB. The cluster has 1 NameNode and 39 DataNodes. The virtual machines run on VMware Workstation 12.0 with Ubuntu 14.04 LTS. The data block access rate, the copy decision for each data block, and the dynamic data copy placement strategy were evaluated on Hadoop 2.7.3.
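As a rough illustration of how access request records could be turned into per-period access counts, the following Python sketch buckets records into 10 time periods. The record shape `(timestamp_seconds, block_id)`, the function name `access_counts`, and the block identifiers are assumptions made for this example; the source does not describe the log format or the period length.

```python
from collections import defaultdict


def access_counts(records, start, period_len, periods=10):
    """Count accesses per block in each time period.

    records    : iterable of (timestamp_seconds, block_id) tuples
                 (hypothetical shape, not taken from the source)
    start      : timestamp at which the first period begins
    period_len : length of one period in seconds
    periods    : number of periods to track
    """
    counts = defaultdict(lambda: [0] * periods)
    for ts, block in records:
        idx = int((ts - start) // period_len)
        # Ignore records that fall outside the observed window.
        if 0 <= idx < periods:
            counts[block][idx] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [(0, "blk_1"), (5, "blk_1"), (10, "blk_1"), (99, "blk_2")]
    print(access_counts(sample, start=0, period_len=10))
```

The resulting per-block series is the kind of input a popularity prediction step would consume.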

[0073] The popularity of the 1000 data blocks over 10 time periods is predicted with the grey prediction model. Figure 1 shows the data access rate of a data block and its original copy over the 10 time periods (ab...
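For readers unfamiliar with grey prediction, the sketch below implements the standard GM(1,1) grey model in plain Python and uses it to extrapolate an access-rate series. It assumes that GM(1,1) is the variant meant here, which the excerpt does not confirm; the function name `gm11_forecast` and the sample series are illustrative only.

```python
import math


def gm11_forecast(series, steps=1):
    """Forecast the next `steps` values of a positive series with GM(1,1)."""
    n = len(series)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 observations")

    # 1. Accumulated generating operation (running sum).
    x1, total = [], 0.0
    for value in series:
        total += value
        x1.append(total)

    # 2. Background values: mean of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = list(series[1:])

    # 3. Least-squares fit of y = -a * z + b.
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    denom = m * szz - sz * sz
    if denom == 0:
        raise ValueError("series is degenerate; cannot fit GM(1,1)")
    slope = (m * szy - sz * sy) / denom
    b = (sy - slope * sz) / m
    a = -slope

    # A development coefficient of zero means a flat series.
    if abs(a) < 1e-12:
        return [b] * steps

    # 4. Time response of the accumulated series, then difference it back.
    c = series[0] - b / a

    def x1_hat(k):
        return c * math.exp(-a * k) + b / a

    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


if __name__ == "__main__":
    history = [100, 110, 121, 133.1, 146.41]  # made-up access counts
    print(gm11_forecast(history, steps=2))
```

GM(1,1) suits short series with roughly exponential trends, which is why it is a common choice when only a handful of observation periods are available.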


Abstract

The invention discloses a method for placing data block copies in a heterogeneous Hadoop cluster environment. The method classifies the nodes of the heterogeneous cluster according to their performance parameters and then, working from the data block heat (popularity) prediction results, places the calculated number of copies of each data block on the nodes in sequence. By combining multiple factors to decide which data block should be placed on which node, the invention improves MapReduce performance and reduces execution time.
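The general idea of the abstract, ranking nodes by performance and then placing copies of the hottest blocks first, can be sketched in Python as follows. This is not the patented algorithm, whose details are not given in this excerpt. The `Node` fields, the single `score` value standing in for the performance parameters, and the load-per-score tie-breaking rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    score: float  # hypothetical aggregate of CPU, memory, disk, etc.
    blocks: set = field(default_factory=set)


def classify(nodes, tiers=3):
    """Split nodes into performance tiers; tier 0 holds the fastest nodes."""
    if not nodes:
        return []
    ranked = sorted(nodes, key=lambda n: n.score, reverse=True)
    size = -(-len(ranked) // tiers)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def place(block_heat, replicas, nodes):
    """Place copies of each block, hottest block first.

    block_heat : dict of block id -> predicted heat
    replicas   : dict of block id -> number of copies to place
    Each copy goes to the node with the lowest load relative to its
    score; ties are broken in favour of the faster node.
    """
    placement = {}
    for block in sorted(block_heat, key=block_heat.get, reverse=True):
        candidates = sorted(
            nodes, key=lambda n: (len(n.blocks) / n.score, -n.score)
        )
        chosen = candidates[:replicas[block]]
        for node in chosen:
            node.blocks.add(block)
        placement[block] = [node.name for node in chosen]
    return placement


if __name__ == "__main__":
    cluster = [Node("A", 4.0), Node("B", 2.0), Node("C", 1.0)]
    print(place({"b1": 0.9, "b2": 0.1}, {"b1": 2, "b2": 1}, cluster))
```

Weighting load by the node score is one simple way to keep faster nodes busier without overloading them; a real policy would also need to account for rack placement and fault tolerance.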

Description

Technical field

[0001] The invention belongs to the field of big data distributed computing and relates to a data block copy placement method based on a heterogeneous Hadoop cluster environment.

Background

[0002] Over the past decade, the Hadoop platform developed by the Apache Foundation has become the most prominent open source framework for big data analysis. The 15-year IDC report "Trends in Enterprise Hadoop Deployments" found that 32% of companies had already implemented Hadoop, and a further 31% planned to deploy it within 12 months. Beyond enterprise computing, Hadoop is also gaining steady momentum in the HPC (high performance computing) community. Among the many cloud computing products, Hadoop has become the preferred solution for a growing number of Internet companies with massive data because of its high reliability, high scalability, high efficiency, low cost, and open source nature, and Hadoop has been put into practical use in industrial applic...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F3/06
CPC: G06F3/061, G06F3/0611, G06F3/064, G06F3/0643, G06F3/065, G06F3/067
Inventor: 吴奇石, 刘洋, 张晓阳, 侯爱琴, 王永强
Owner: NORTHWEST UNIV