A data prefetching method based on MapReduce

A data prefetching and data block technology applied in the computer field. It addresses the problems that existing approaches cannot guarantee the prefetching effect and do not take the performance of the computing nodes into account, among others. It improves overall throughput, is flexible and convenient to implement, and shortens execution time.
CN104933110B · Inactive · Publication Date: 2018-02-09 · UNIV OF ELECTRONICS SCI & TECH OF CHINA

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
UNIV OF ELECTRONICS SCI & TECH OF CHINA
Publication Date
2018-02-09
Estimated Expiration
Not applicable · inactive patent

Figure 1

Abstract

The invention provides a MapReduce-based data prefetching method in the field of computer technology. The method predicts the data block processing capacity of each computing node through performance evaluation and, through a series of calculations, determines which computing nodes will run non-local tasks. The data blocks those tasks need are prefetched to the corresponding computing node before the node requests the task, so the node does not sit idle waiting for data. The purpose of the invention is to improve the execution efficiency of MapReduce jobs and the overall throughput of the system: computing nodes no longer wait for data blocks to be transferred from remote nodes, which raises their utilization. The proposed prefetching method works in both homogeneous and heterogeneous MapReduce clusters, and the prefetching idea is not limited to MapReduce; other distributed computing frameworks can adopt and adapt it.
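The abstract does not spell out the "series of calculations", so the following is only a hedged illustration of the idea. It assumes that performance evaluation yields a per-node processing time per block (`seconds_per_block`, a hypothetical name) and simulates the order in which nodes would request tasks: a node that runs out of local blocks while other nodes still have pending blocks is predicted to run a non-local task, and that block becomes a prefetch candidate. `Node` and `plan_prefetch` are hypothetical names, and the rule for which block to take (the last pending block of the node with the most work left) is an assumption, not the patent's stated rule.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Hypothetical output of performance evaluation: time to process one block.
    seconds_per_block: float
    local_blocks: list[str] = field(default_factory=list)


def plan_prefetch(nodes: list[Node]) -> dict[str, list[str]]:
    """Predict non-local tasks and return {node name: blocks to prefetch}.

    Assumes a prefetched block costs the same to process as a local one,
    because it is already on the node when the task starts.
    """
    pending = {n.name: list(n.local_blocks) for n in nodes}
    cost = {n.name: n.seconds_per_block for n in nodes}
    prefetch: dict[str, list[str]] = {n.name: [] for n in nodes}
    # Min-heap of (time the node next becomes free, node name).
    ready = [(0.0, n.name) for n in nodes]
    heapq.heapify(ready)
    remaining = sum(len(blocks) for blocks in pending.values())

    while remaining:
        t, name = heapq.heappop(ready)
        if pending[name]:
            pending[name].pop(0)  # data-local task
        else:
            # No local data left: this node would run a non-local task.
            # Assumed rule: take the last block of the node with the most work left.
            donor = max(pending, key=lambda k: (len(pending[k]), k))
            prefetch[name].append(pending[donor].pop())
        remaining -= 1
        heapq.heappush(ready, (t + cost[name], name))
    return prefetch
```

For example, with a fast node A (1 s per block, local blocks a1 and a2) and a slow node B (3 s per block, local blocks b1 to b4), the simulation predicts that A will process b4 and b3 non-locally, so those two blocks would be copied to A before it asks for them. In a homogeneous cluster with evenly spread blocks, the same simulation predicts no non-local tasks and therefore no prefetching.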

Description

Technical Field

[0001] The invention belongs to the technical field of computers and specifically relates to a data prefetching method for MapReduce.

Background Art

[0002] MapReduce, one of the core components of Hadoop, is used mainly for distributed computing. On the Hadoop platform, the MapReduce framework is built on the Hadoop Distributed File System (HDFS); that is, both the input data and the output data of the MapReduce framework reside on HDFS. When processing data, MapReduce divides a large job into small computing tasks of two kinds: Map tasks and Reduce tasks. Map tasks read their input from HDFS and are independent of one another. Reduce tasks take the output of the Map tasks as their input, and the final processed data is stored on HDFS.
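To make the data flow described in [0002] concrete, here is a single-process Python sketch of a word-count job. It is not Hadoop's API: plain strings stand in for HDFS input blocks, word count is a hypothetical example job, and in a real cluster each Map task would run on a computing node, ideally the one that stores its block.

```python
from collections import defaultdict
from itertools import chain


def map_task(block: str) -> list[tuple[str, int]]:
    # Each Map task reads one input block on its own and emits (key, value) pairs.
    return [(word, 1) for word in block.split()]


def reduce_task(key: str, values: list[int]) -> tuple[str, int]:
    # A Reduce task combines all values that the Map tasks emitted for one key.
    return key, sum(values)


def run_job(blocks: list[str]) -> dict[str, int]:
    # Map phase: tasks do not depend on each other, so they could run in parallel.
    mapped = [map_task(b) for b in blocks]
    # Shuffle: group the Map output by key; this grouped data is the Reduce input.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in chain.from_iterable(mapped):
        groups[key].append(value)
    # Reduce phase: in Hadoop the result would be written back to HDFS.
    return dict(reduce_task(k, v) for k, v in groups.items())
```

For instance, `run_job(["a b a", "b c"])` returns `{"a": 2, "b": 2, "c": 1}`. Because each Map task needs exactly one input block, where that block is stored relative to the node running the task determines whether the task is data-local.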

[0003] When HDFS stores data, the data is divided in...

Claims
