MapReduce data localization method based on dynamic marker priority value

A technique based on tag values and priority values, applied in electrical digital data processing, special-purpose data processing applications, instruments, and related fields. It addresses differences in computing performance among computing nodes, unbalanced distribution of data blocks, and vicious preemption of localization tasks, thereby improving the degree of data localization and the overall throughput rate.

Inactive Publication Date: 2018-12-07
CHENGDU UNIV OF INFORMATION TECH
Cites: 3 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] However, this data localization method cannot achieve a high degree of data localization. The main reasons are: 1) the data blocks of a job are distributed relatively unevenly across the computing nodes; 2) the computing nodes differ in computing performance and are not fully homogeneous machines, whereas the original MapReduce scheduling method assumes that the machines in the cluster are homogeneous and does not take the unbalanced distribution of data blocks into account; 3) the computing nodes apply no priority when selecting local data blocks, and because data blocks have redundant replicas, this can lead to vicious preemption of localization tasks.
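The preemption problem in reason 3 can be illustrated with a small sketch. The scheduler below is a deliberately naive stand-in, not Hadoop's actual scheduler: each requesting node simply takes its lowest-numbered local block that is still pending, and falls back to a remote block when none is left. The node names, block ids, and the `naive_schedule` function are hypothetical.

```python
def naive_schedule(placement, request_order):
    """Assign one block per requesting node with no priority among local blocks.

    placement: dict mapping node name -> set of block ids stored on that node.
    request_order: node names in the order they ask for a task.
    Returns a list of (node, block, is_local) tuples.
    """
    pending = set().union(*placement.values())
    result = []
    for node in request_order:
        local = sorted(placement[node] & pending)
        if local:
            # Assumed policy: take the first local block, ignoring other nodes.
            block, is_local = local[0], True
        elif pending:
            # No local block left: the task must read its block remotely.
            block, is_local = min(pending), False
        else:
            break
        pending.discard(block)
        result.append((node, block, is_local))
    return result


# Hypothetical layout: block 1 has three replicas, block 2 has two, block 3 one.
placement = {"A": {1}, "B": {1, 2}, "C": {1, 2, 3}}
for node, block, is_local in naive_schedule(placement, ["C", "B", "A"]):
    print(node, block, "local" if is_local else "remote")
```

With this layout, node C takes block 1 even though it also holds blocks 2 and 3, so node A, whose only local block is block 1, is left with a remote task.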



Embodiment Construction

[0017] The purpose of the present invention is to further improve the throughput rate of the Hadoop platform and the execution efficiency of MapReduce jobs by improving the degree of data localization. This is done by dynamically setting tag values that change the scheduling priority of localization tasks. A higher degree of data localization means fewer data blocks have to be transmitted remotely, which shortens the time computing nodes spend waiting for data. In addition, because there are fewer non-localized tasks, each job uses less network bandwidth, so the concurrency of the MapReduce cluster can be increased, which in turn improves the overall throughput of the Hadoop platform. The data localization scheduling method based on the dynamic tag priority value proposed by the present invention works on both homogeneous and heterogeneous clusters.

[0018] The overall idea of the MapReduce data localizat...


Abstract

The present invention provides a MapReduce data localization method based on a dynamic tag priority value, comprising the following steps: the tag value of each localized data block on each computing node is initialized to 2n, and the number of localized data blocks on each computing node is counted; the computing nodes are sorted in ascending order of their number of data blocks; starting from the computing node with the fewest data blocks and proceeding in order, a fixed amount DecS is subtracted from the tag value of the subsequent backup copies of the data blocks on each computing node; the data block with the largest tag value is scheduled; after task scheduling, the number of data blocks on each computing node is counted again and the nodes are re-sorted in ascending order, the tag values are adjusted after sorting, and this continues until data processing is complete. By marking data blocks and assigning them different priority values, the present invention improves the degree of data localization of jobs, thereby improving the execution efficiency of jobs on MapReduce and reducing their bandwidth usage.
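The steps in the abstract can be sketched as follows. This is one possible reading, not the patented implementation: "subsequent backup copies" is interpreted here as replicas of the same block held by nodes that come later in the ascending sort order, the initial tag value is passed in as a parameter because n is not defined in this excerpt, and tie-breaking by node name and block id is an assumption. All function and parameter names are hypothetical.

```python
def compute_tags(placement, pending, init_tag, dec_s):
    """Return {(node, block): tag} for every pending local block.

    placement: dict node -> set of block ids stored on that node.
    pending: set of block ids not yet scheduled.
    init_tag: initial tag value (the abstract states 2n; n is undefined here).
    dec_s: amount DecS subtracted from replicas on later nodes.
    """
    local = {node: sorted(blocks & pending) for node, blocks in placement.items()}
    # Ascending by number of pending local blocks; ties by name (assumption).
    order = sorted(local, key=lambda node: (len(local[node]), node))
    tags = {(node, b): init_tag for node in order for b in local[node]}
    for i, node in enumerate(order):
        for block in local[node]:
            # Lower the priority of this block's replicas on later nodes.
            for later in order[i + 1:]:
                if (later, block) in tags:
                    tags[(later, block)] -= dec_s
    return tags


def pick_block(node, tags):
    """Pick the node's local block with the largest tag (smallest id on ties)."""
    candidates = [(tag, b) for (n, b), tag in tags.items() if n == node]
    if not candidates:
        return None
    best = max(tag for tag, _ in candidates)
    return min(b for tag, b in candidates if tag == best)


def schedule(placement, request_order, init_tag, dec_s):
    """Recount, re-sort, and re-tag before each scheduling decision."""
    pending = set().union(*placement.values())
    assignments = []
    for node in request_order:
        tags = compute_tags(placement, pending, init_tag, dec_s)
        block = pick_block(node, tags)
        if block is not None:
            pending.discard(block)
            assignments.append((node, block))
    return assignments


placement = {"A": {1}, "B": {1, 2}, "C": {1, 2, 3}}
print(schedule(placement, ["C", "B", "A"], init_tag=6, dec_s=1))
```

In this hypothetical layout, node C's replica of block 1 is decremented twice and its replica of block 2 once, so C schedules block 3 first and every node ends up with a local block.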

Description

Technical field

[0001] The invention relates to large-scale data computation, in particular to the field of MapReduce computing, and specifically to a MapReduce data localization method based on a dynamic tag priority value.

Background

[0002] The MapReduce computing framework is the core component of the Hadoop platform. All computing tasks on the Hadoop platform are carried out on MapReduce. Therefore, the computing efficiency and throughput of the Hadoop platform are closely tied to the job execution efficiency and throughput of MapReduce.

[0003] On the Hadoop platform, the MapReduce distributed computing framework is built on HDFS, a distributed file system that stores data blocks redundantly. HDFS stores user data as data blocks, and the default replication factor of a data block is 3. That is, when the file corresponding to a job consists of 100 data blocks (in HDFS, the default data block size is 64MB), the number of...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/1752, G06F16/182
Inventor: 杨玉琴, 陈麟
Owner: CHENGDU UNIV OF INFORMATION TECH