A data prefetching method based on MapReduce

A data prefetching and data block technology, applied in the field of computing, that addresses problems of existing prefetching schemes: the prefetching effect cannot be guaranteed because the performance of the computing nodes is not taken into account. The method improves the overall throughput rate, shortens job execution time, and is flexible and convenient to implement.

Inactive Publication Date: 2018-02-09
UNIV OF ELECTRONICS SCI & TECH OF CHINA
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

[0005] Seo et al. (HPMR: Prefetching and pre-shuffling in shared MapReduce computing environment [C] // Cluster Computing and Workshops, 2009. CLUSTER'09. IEEE International Conference on. IEEE, 2009: 1-8.) systematically analyzed MapReduce job execution scenarios and the importance of network bandwidth in MapReduce computing, and proposed a prefetching and early-Shuffle scheme to reduce network bandwidth consumption and improve cluster throughput and job execution efficiency. However, that scheme cannot guarantee a good prefetching effect, because the performance of the computing nodes is not considered.



Examples


Embodiment Construction

[0031] This specific embodiment adopts the following technical scheme:

[0032] A data prefetching method based on MapReduce, whose process is shown in Figure 1: on a cluster with n physical computing nodes, for a specific scheduled job A, data prefetching is performed in the following way during its execution:

[0033] Step 1: Since clusters can be either homogeneous or heterogeneous, the cluster is assumed to be homogeneous before computation starts; that is, the computing performance P_i of every physical computing node is assumed to be 1, where i ∈ [1, n]. For job A, let the number of data blocks corresponding to the job be b. Since the default number of replicas for each data block on HDFS is 3, and the number of data-block replicas stored on node i is denoted F_Ti, the total number of data-block replicas is ΣF_Ti = 3b;
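As a minimal illustrative sketch of Step 1 (not part of the patent text), the initialization can be expressed in Python. The inputs `nodes` and `block_locations` are hypothetical names standing for the cluster's node list and HDFS's block-replica placement for job A:

```python
from collections import defaultdict

def init_step1(nodes, block_locations, replication=3):
    """Step 1 sketch: homogeneous performance assumption and F_Ti counts.

    nodes           -- the n physical computing nodes (hypothetical input)
    block_locations -- block id -> nodes holding its replicas (hypothetical input)
    """
    # Before computation starts, assume a homogeneous cluster: P_i = 1 for all i.
    performance = {node: 1.0 for node in nodes}

    # F_Ti: number of data-block replicas of job A stored on node i.
    local_blocks = defaultdict(int)
    for replica_nodes in block_locations.values():
        for node in replica_nodes:
            local_blocks[node] += 1

    # With b blocks and 3 replicas each, the replica counts sum to 3b.
    assert sum(local_blocks.values()) == replication * len(block_locations)
    return performance, dict(local_blocks)
```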

[0034] The number of localized data blocks of job A on each computing node is used as a parameter to establish a small to...
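The paragraph above is truncated; it appears to describe building a small-top heap (min-heap) keyed by each node's localized block count. Under that assumption only, a minimal sketch:

```python
import heapq

def build_locality_heap(local_blocks):
    # Min-heap ordered by F_Ti, so the node with the fewest localized blocks
    # of job A (the likeliest candidate for non-local tasks) sits on top.
    heap = [(count, node) for node, count in local_blocks.items()]
    heapq.heapify(heap)
    return heap
```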



Abstract

The invention provides a data prefetching method based on MapReduce, which belongs to the technical field of computers. The method predicts the data block processing capacity of each computing node through performance evaluation and, from a series of calculations, determines which computing nodes will be assigned non-localized tasks. Before such a node applies to process a task, the required data block is prefetched to that computing node in advance, so that the node does not have to wait for computation. The purpose of the invention is to improve the execution efficiency of MapReduce jobs and the overall throughput of the system, so that computing nodes do not need to wait for the remote transfer of data blocks, and to improve the utilization of computing nodes. The proposed prefetching method works in both homogeneous and heterogeneous MapReduce cluster environments, and the prefetching idea is not limited to MapReduce: any distributed computing framework can borrow and improve upon it.
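The abstract's decision logic (estimate per-node processing capacity, identify nodes expected to receive non-localized tasks, and prefetch their data in advance) can be sketched as follows. This is an illustrative approximation, not the patented algorithm; `plan_prefetch` and its proportional-share heuristic are assumptions of this sketch:

```python
import math

def plan_prefetch(performance, local_blocks, block_locations):
    """Sketch: choose blocks to prefetch to nodes likely to run non-local tasks.

    performance     -- estimated processing capacity P_i of each node
    local_blocks    -- F_Ti, blocks of the job already local to node i
    block_locations -- block id -> set of nodes holding a replica
    """
    total_blocks = len(block_locations)
    total_perf = sum(performance.values())

    plan = {}
    for node, perf in performance.items():
        # Blocks the node is expected to process, proportional to its capacity.
        expected = math.ceil(total_blocks * perf / total_perf)
        shortfall = expected - local_blocks.get(node, 0)
        if shortfall <= 0:
            continue  # enough local data; no prefetch needed for this node
        # Pick up to `shortfall` blocks that are not already stored locally.
        remote = [b for b, holders in block_locations.items() if node not in holders]
        plan[node] = remote[:shortfall]
    return plan
```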

Description

Technical field

[0001] The invention belongs to the technical field of computers, and in particular relates to a data prefetching method for MapReduce.

Background technique

[0002] As one of the core components of Hadoop, MapReduce is mainly used for distributed computing. On the Hadoop platform, the MapReduce distributed computing framework is built on the distributed file system HDFS (Hadoop Distributed File System); that is, the data input and output required by the MapReduce framework are based on HDFS. When MapReduce processes data, it divides a large job into small computing tasks, which fall into Map tasks and Reduce tasks. Map tasks obtain data from HDFS as input, and different Map tasks are independent of each other. The data input of a Reduce task comes from the output of the Map tasks, and the processed data is finally stored on HDFS.

[0003] When the HDFS distributed file system stores data, the data is divided in...
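For readers unfamiliar with the Map/Shuffle/Reduce data flow described above, the following is a toy, framework-free word-count illustration (not Hadoop API code; a real job would read its input splits from HDFS and write its results back to HDFS):

```python
from collections import defaultdict

def map_phase(split):
    # Each Map task processes one input split independently of the others.
    for line in split:
        for word in line.split():
            yield word, 1

def reduce_phase(key, values):
    # Each Reduce task aggregates all values shuffled to it for one key.
    return key, sum(values)

splits = [["hello mapreduce"], ["hello hdfs"]]
shuffled = defaultdict(list)
for split in splits:
    for k, v in map_phase(split):
        shuffled[k].append(v)
print(dict(reduce_phase(k, vs) for k, vs in shuffled.items()))
```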

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC (8): G06F17/30
CPC: G06F16/27
Inventor: 高胜立, 薛瑞尼, 敖立翔
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA