
Network I/O (input/output) cost evaluation based ReduceTask data locality scheduling method for Hadoop big data platform

A data locality technology for big data platforms, applied in the fields of electronic digital data processing, resource allocation, and program control design. It addresses the large volume, multiple modes, and low value density of big data and achieves the effect of saving network resources.

Active Publication Date: 2016-03-23
重庆中邮信科集团股份有限公司
3 Cites, 8 Cited by

AI Technical Summary

Problems solved by technology

Traditional data storage capabilities and processing technologies are gradually becoming inadequate.
The 4V characteristics of big data, namely large volume (volume), multiple modes (variety), high speed (velocity), and low value density (value), increase the difficulty and complexity of data management and information extraction [3].
The popularity of big data does not mean that big data is deeply understood; rather, it indicates that big data is in danger of being overhyped [6].



Embodiment Construction

[0029] The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0030] Figure 1 is the high-level flow chart of the method of the present invention. As shown in the figure, the ReduceTask data locality scheduling strategy based on network I/O cost evaluation in the Hadoop big data platform mainly includes the following four steps:
Step 1: Maintain a mapping table for each user job (Job) initialized by JobTracker, namely […]; whenever JobTracker allocates a new MapTask for the job, add the new mapping entry to this table.
Step 2: Maintain a mapping table for each user job (Job) initialized by JobTracker, that is, […].
Step 3: For any physical node, calculate the network I/O cost of executing the ReduceTask on that node, and update the result in the mapping table from Step 2.
Step 4: According to the mapping table for the current Hadoop cluster from Step 3, the node The principle that the lo...
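The bookkeeping in Step 1 can be illustrated with a minimal sketch. The source elides the exact form of the mapping, so this example assumes a per-job table from host name to the MapTask IDs placed on that host. The class `JobMapTable`, its methods, and the host and task names are hypothetical and are not part of Hadoop or of the patent text.

```python
from collections import defaultdict


class JobMapTable:
    """Hypothetical per-job table: host name -> MapTask ids placed on that host.

    The real mapping used by the method is not given in the text; this
    structure is an assumption made only for illustration.
    """

    def __init__(self, job_id):
        self.job_id = job_id
        self.tasks_by_host = defaultdict(list)

    def record_map_task(self, host, task_id):
        # Called whenever the scheduler assigns a new MapTask for this job.
        self.tasks_by_host[host].append(task_id)

    def hosts(self):
        # Hosts that currently hold at least one MapTask of this job.
        return sorted(self.tasks_by_host)


table = JobMapTable("job_0001")
table.record_map_task("node-a", "m_000000")
table.record_map_task("node-b", "m_000001")
table.record_map_task("node-a", "m_000002")
```

Keeping the table per job means a later cost evaluation only needs to look at the hosts that actually ran MapTasks for that job.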


Abstract

The invention relates to a ReduceTask data locality scheduling method based on network I/O (input/output) cost evaluation in a Hadoop big data platform and belongs to the technical field of cloud computing platform optimization. For each recorded Host node, the method evaluates the network I/O cost incurred when that node is taken as the execution node of the cloud computing platform and the Map output data of the other nodes are copied to it. The priority of ReduceTask assignment is derived from this evaluation, and Reduce tasks are assigned to the nodes with high priority, so that the network I/O cost of copying Map output data to the Reduce nodes is reduced and the selected Reduce nodes have the best data locality. Adding data locality to ReduceTask assignment reduces the network load caused by data copying in the Shuffle stage and saves the network bandwidth resources of Hadoop clusters.
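The evaluation described in the abstract might be sketched as follows. This is a simplified illustration, not the patented formula: it assumes the cost of a candidate host is simply the number of Map output bytes stored on all other hosts, and it ignores network topology and partitioning. The function names and the sample sizes are hypothetical.

```python
def network_io_cost(map_output_bytes, candidate):
    """Bytes that must cross the network if `candidate` runs the ReduceTask.

    Assumption: cost equals the Map output held on every host other than
    the candidate, since local output needs no network copy.
    """
    return sum(size for host, size in map_output_bytes.items() if host != candidate)


def rank_reduce_hosts(map_output_bytes):
    """Order hosts from lowest cost (highest priority) to highest cost."""
    return sorted(
        map_output_bytes,
        key=lambda host: (network_io_cost(map_output_bytes, host), host),
    )


# Hypothetical Map output sizes per host, in bytes.
outputs = {"node-a": 600, "node-b": 300, "node-c": 100}
costs = {host: network_io_cost(outputs, host) for host in outputs}
ranking = rank_reduce_hosts(outputs)
```

Under this assumption the host holding the most Map output has the lowest copy cost, so it receives the highest priority for the ReduceTask.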

Description

Technical field

[0001] The invention belongs to the technical field of cloud computing platform optimization and relates to a ReduceTask data locality scheduling method based on network I/O cost evaluation in a Hadoop big data platform.

Background

[0002] With the development of the information industry, the amount of data generated by enterprises and other organizations is increasing rapidly. Traditional data storage capabilities and processing technologies are gradually becoming inadequate. The 4V characteristics of big data, namely large volume (volume), multiple modes (variety), high speed (velocity), and low value density (value), increase the difficulty and complexity of data management and information extraction [3]. The popularity of big data does not mean that big data is deeply understood; rather, it indicates that big data is in danger of being overhyped [6]. There are many doubts and controversies about the basic concepts, key technologies an...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/50
CPC: G06F9/5038, G06F9/5088, G06F2209/5017, G06F2209/5021, G06F2209/503
Inventor: 尚凤军, 闫辰云
Owner: 重庆中邮信科集团股份有限公司