A data prefetching method based on MapReduce

A data prefetching and data block technology, applied in the field of computing, that addresses problems of existing prefetching schemes: the prefetching effect cannot be guaranteed because the performance of the computing nodes is not taken into account. The method improves the overall throughput rate, shortens job execution time, and is flexible and convenient to implement.

Inactive Publication Date: 2018-02-09
UNIV OF ELECTRONICS SCI & TECH OF CHINA
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

[0005] Seo et al. (HPMR: Prefetching and pre-shuffling in shared MapReduce computing environment [C] // Cluster Computing and Workshops, 2009. CLUSTER'09. IEEE International Conference on. IEEE, 2009: 1-8.) systematically analyzed MapReduce job execution scenarios and the importance of network bandwidth in MapReduce computing, and proposed a prefetching and early-Shuffle scheme to reduce network bandwidth consumption and improve cluster throughput and job execution efficiency. However, that scheme cannot guarantee a good prefetching effect, because the performance of the computing nodes is not considered.



Examples


Embodiment Construction

[0031] This specific embodiment adopts the following technical scheme:

[0032] A data prefetching method based on MapReduce, whose process is shown in Figure 1: on a cluster with n physical computing nodes, for a specific scheduled job A, data prefetching is performed in the following way during its execution:

[0033] Step 1: Since clusters can be either homogeneous or heterogeneous, the cluster is assumed to be homogeneous before computation starts; that is, the computing performance P_i of every physical computing node is assumed to be 1, where i ∈ [1, n]. For job A, let the number of data blocks corresponding to the job be b. Since the default number of replicas for each data block on HDFS is 3, and the number of data-block replicas stored on node i is denoted F_Ti, the total number of data-block replicas is ΣF_Ti = 3b;
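As a minimal illustrative sketch of Step 1 (not part of the patent text), the initialization can be expressed in Python. The inputs `nodes` and `block_locations` are hypothetical names standing for the cluster's node list and HDFS's block-replica placement for job A:

```python
from collections import defaultdict

def init_step1(nodes, block_locations, replication=3):
    """Step 1 sketch: homogeneous performance assumption and F_Ti counts.

    nodes           -- the n physical computing nodes (hypothetical input)
    block_locations -- block id -> nodes holding its replicas (hypothetical input)
    """
    # Before computation starts, assume a homogeneous cluster: P_i = 1 for all i.
    performance = {node: 1.0 for node in nodes}

    # F_Ti: number of data-block replicas of job A stored on node i.
    local_blocks = defaultdict(int)
    for replica_nodes in block_locations.values():
        for node in replica_nodes:
            local_blocks[node] += 1

    # With b blocks and 3 replicas each, the replica counts sum to 3b.
    assert sum(local_blocks.values()) == replication * len(block_locations)
    return performance, dict(local_blocks)
```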

[0034] The number of localized data blocks of job A on each computing node is used as a parameter to establish a small to...
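The paragraph above is truncated; it appears to describe building a small-top heap (min-heap) keyed by each node's localized block count. Under that assumption only, a minimal sketch:

```python
import heapq

def build_locality_heap(local_blocks):
    # Min-heap ordered by F_Ti, so the node with the fewest localized blocks
    # of job A (the likeliest candidate for non-local tasks) sits on top.
    heap = [(count, node) for node, count in local_blocks.items()]
    heapq.heapify(heap)
    return heap
```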



Abstract

The invention provides a data prefetching method based on MapReduce, which belongs to the technical field of computers. The method predicts the data block processing capacity of each computing node through performance evaluation and, from a series of calculations, determines which computing nodes will be assigned non-localized tasks. Before such a node applies to process a task, the required data block is prefetched to that computing node in advance, so that the node does not have to wait for computation. The purpose of the invention is to improve the execution efficiency of MapReduce jobs and the overall throughput of the system, so that computing nodes do not need to wait for the remote transfer of data blocks, and to improve the utilization of computing nodes. The proposed prefetching method works in both homogeneous and heterogeneous MapReduce cluster environments, and the prefetching idea is not limited to MapReduce: any distributed computing framework can borrow and improve upon it.
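The abstract's decision logic (estimate per-node processing capacity, identify nodes expected to receive non-localized tasks, and prefetch their data in advance) can be sketched as follows. This is an illustrative approximation, not the patented algorithm; `plan_prefetch` and its proportional-share heuristic are assumptions of this sketch:

```python
import math

def plan_prefetch(performance, local_blocks, block_locations):
    """Sketch: choose blocks to prefetch to nodes likely to run non-local tasks.

    performance     -- estimated processing capacity P_i of each node
    local_blocks    -- F_Ti, blocks of the job already local to node i
    block_locations -- block id -> set of nodes holding a replica
    """
    total_blocks = len(block_locations)
    total_perf = sum(performance.values())

    plan = {}
    for node, perf in performance.items():
        # Blocks the node is expected to process, proportional to its capacity.
        expected = math.ceil(total_blocks * perf / total_perf)
        shortfall = expected - local_blocks.get(node, 0)
        if shortfall <= 0:
            continue  # enough local data; no prefetch needed for this node
        # Pick up to `shortfall` blocks that are not already stored locally.
        remote = [b for b, holders in block_locations.items() if node not in holders]
        plan[node] = remote[:shortfall]
    return plan
```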

Description

Technical field

[0001] The invention belongs to the technical field of computers, and in particular relates to a data prefetching method for MapReduce.

Background technique

[0002] As one of the core components of Hadoop, MapReduce is mainly used for distributed computing. On the Hadoop platform, the MapReduce distributed computing framework is built on the distributed file system HDFS (Hadoop Distributed File System); that is, the data input and output required by the MapReduce framework are based on HDFS. When MapReduce processes data, it divides a large job into small computing tasks, which fall into Map tasks and Reduce tasks. Map tasks obtain data from HDFS as input, and different Map tasks are independent of each other. The data input of a Reduce task comes from the output of the Map tasks, and the processed data is finally stored on HDFS.

[0003] When the HDFS distributed file system stores data, the data is divided in...
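For readers unfamiliar with the Map/Shuffle/Reduce data flow described above, the following is a toy, framework-free word-count illustration (not Hadoop API code; a real job would read its input splits from HDFS and write its results back to HDFS):

```python
from collections import defaultdict

def map_phase(split):
    # Each Map task processes one input split independently of the others.
    for line in split:
        for word in line.split():
            yield word, 1

def reduce_phase(key, values):
    # Each Reduce task aggregates all values shuffled to it for one key.
    return key, sum(values)

splits = [["hello mapreduce"], ["hello hdfs"]]
shuffled = defaultdict(list)
for split in splits:
    for k, v in map_phase(split):
        shuffled[k].append(v)
print(dict(reduce_phase(k, vs) for k, vs in shuffled.items()))
```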

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC (8): G06F17/30
CPC: G06F16/27
Inventor: 高胜立, 薛瑞尼, 敖立翔
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA