Network I/O (input/output) cost evaluation based ReduceTask data locality scheduling method for Hadoop big data platform

A big data platform and data locality technology, applied in the fields of electronic digital data processing, resource allocation, and program control design. It addresses the large volume, multiple modes, and low value density of big data and achieves the effect of saving network resources.

Active Publication Date: 2016-03-23

重庆中邮信科集团股份有限公司

Cites: 3. Cited by: 8.


AI Technical Summary

Problems solved by technology

Traditional data storage capabilities and processing technologies are gradually becoming inadequate.

The 4V characteristics of big data, namely large volume (volume), multiple modes (variety), high speed (velocity), and low value density (value), increase the difficulty and complexity of data management and information extraction [3].

The popularity of big data does not mean that big data is deeply understood; rather, it indicates that big data is in danger of being overhyped [6].


Embodiment Construction

[0029] The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0030] Figure 1 is the overall flow chart of the method of the present invention. As shown in the figure, the ReduceTask data locality scheduling strategy based on network I/O cost evaluation in the Hadoop big data platform mainly comprises the following four steps.
Step 1: Maintain a mapping table for each user job (Job) initialized by JobTracker; whenever JobTracker allocates a new MapTask for the job, add the new mapping entry to this table.
Step 2: Maintain a further mapping table for each user job (Job) initialized by JobTracker.
Step 3: Calculate the network I/O cost incurred when any physical node executes the ReduceTask, and write the result to the mapping table from step 2.
Step 4: According to the mapping table from step 3 for the current Hadoop cluster, the node The principle that the lo...
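The four steps above can be sketched in Python as a rough illustration. The extract does not give the layout of either mapping table, and the text of step 4 is truncated, so everything here is an assumption: the class and method names are hypothetical, the first table is taken to map each host to the bytes of Map output it holds, the cost of a host is taken to be the Map output held on all other hosts, and step 4 is assumed to prefer the lowest-cost host.

```python
from collections import defaultdict


class JobLocalityTables:
    """Per-job bookkeeping. All names and table layouts are hypothetical."""

    def __init__(self):
        # Step 1 table (assumed layout): host -> bytes of Map output on that host.
        self.map_output_bytes = defaultdict(int)
        # Step 2 table (assumed layout): host -> estimated network I/O cost.
        self.io_cost = {}

    def record_map_task(self, host, output_bytes):
        """Step 1: add an entry whenever a MapTask is allocated to a host."""
        self.map_output_bytes[host] += output_bytes

    def update_costs(self, candidate_hosts):
        """Step 3: cost of a host = Map output held on every other host."""
        total = sum(self.map_output_bytes.values())
        for host in candidate_hosts:
            self.io_cost[host] = total - self.map_output_bytes.get(host, 0)

    def pick_reduce_host(self):
        """Step 4 (assumed rule): prefer the host with the lowest cost."""
        return min(self.io_cost, key=self.io_cost.get)


tables = JobLocalityTables()
tables.record_map_task("node-a", 500)
tables.record_map_task("node-b", 200)
tables.record_map_task("node-a", 100)
tables.update_costs(["node-a", "node-b", "node-c"])
best_host = tables.pick_reduce_host()
```

In this toy run node-a already holds 600 of the 800 bytes of Map output, so it has the smallest amount left to copy in and is selected. A real scheduler would also have to consider free Reduce slots and per-partition output sizes, which this sketch ignores.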


Abstract

The invention relates to a ReduceTask data locality scheduling method based on network I/O (input/output) cost evaluation in a Hadoop big data platform, and belongs to the technical field of cloud computing platform optimization. For each recorded Host node, the method evaluates the network I/O cost that would be incurred if that node were used as the execution node and the Map output data of the other nodes were copied to it. The priority of ReduceTask assignment is derived from this evaluation, and Reduce tasks are assigned to the nodes with high priority. This reduces the network I/O cost of copying Map output data to the Reduce nodes, and the selected Reduce nodes have the best data locality. Taking data locality into account in ReduceTask assignment reduces the network load caused by data copying in the Shuffle stage and saves network bandwidth in Hadoop clusters.
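A small worked example may help make the priority idea concrete. It is only a sketch under assumptions: the function names are hypothetical, the cost of a host is modelled as the bytes of Map output stored on all other hosts, and a lower cost is taken to mean a higher assignment priority. The abstract does not state the exact cost formula, so this is not the patent's definition.

```python
def assignment_priority(map_output_bytes):
    """Return (host, cost) pairs ordered from lowest to highest copy cost."""
    total = sum(map_output_bytes.values())
    # Bytes that must cross the network if the Reduce task runs on each host.
    costs = {host: total - local for host, local in map_output_bytes.items()}
    # Ties are broken by host name so the ordering is deterministic.
    return sorted(costs.items(), key=lambda item: (item[1], item[0]))


def traffic_saved(ranking):
    """Shuffle bytes avoided by choosing the best host instead of the worst."""
    return ranking[-1][1] - ranking[0][1]


# Hypothetical Map output sizes (in bytes) held on three nodes.
sizes = {"node-a": 600, "node-b": 300, "node-c": 100}
ranking = assignment_priority(sizes)
saved = traffic_saved(ranking)
```

With these made-up sizes, running the Reduce task on node-a means copying 400 bytes, against 900 bytes on node-c, so the best choice avoids 500 bytes of Shuffle traffic compared with the worst one.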

Description

Technical field

[0001] The invention belongs to the technical field of cloud computing platform optimization, and relates to a ReduceTask data locality scheduling method based on network I/O cost evaluation in a Hadoop big data platform.

Background

[0002] With the development of the information industry, the amount of data generated by enterprises and other organizations is increasing rapidly. Traditional data storage capabilities and processing technologies are gradually becoming inadequate. The 4V characteristics of big data, namely large volume (volume), multiple modes (variety), high speed (velocity), and low value density (value), increase the difficulty and complexity of data management and information extraction [3]. The popularity of big data does not mean that big data is deeply understood; rather, it indicates that big data is in danger of being overhyped [6]. There are many doubts and controversies about the basic concepts, key technologies an...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F9/50

CPC: G06F9/5038, G06F9/5088, G06F2209/5017, G06F2209/5021, G06F2209/503

Inventor: 尚凤军, 闫辰云

Owner: 重庆中邮信科集团股份有限公司
