Transparent efficiency for in-memory execution of map reduce job sequences

A MapReduce job technology, applied in the field of computer applications, that addresses problems such as job controllers giving up, unstructured data, and job sequences that are difficult to understand, and that achieves transparent sharing of heap state and improved metrics associated with jobs.

Inactive Publication Date: 2014-02-27
IBM CORP
6 Cites, 158 Cited by
AI Technical Summary

Benefits of technology

This patent describes methods and systems for executing a map reduce sequence, a type of processing used in big data applications. All jobs in the sequence are executed by a collection of multiple processes, with each process running zero or more mappers, combiners, partitioners, and reducers for each job. Heap state is shared transparently between the jobs, and the processes communicate among themselves to coordinate completion of the map, shuffle, and reduce phases and of the entire sequence. The stated result is improved metrics associated with the job. The system includes a map reduce module that can execute all jobs in the sequence, and a computer readable storage medium holds the program instructions for performing the methods.

Problems solved by technology

Increasingly inter-connected, global computing systems are generating an enormous amount of irregular, unstructured data.
Within limits, of course, if there are a large number of failures, the job controller may give up.
This incurs I/O cost as well as (de-)serialization cost.
Mappers and reducers for each job are started in new JVMs (JVMs typically have high startup cost).
Solid black lines represent expensive out of memory (disk or network) operations.
Serializing and then deserializing this data wastes central processing unit (CPU) cycles.
It is not possible to combine output across JVM instances.

Embodiment Construction

[0035] The present disclosure is related to increasing the performance of MapReduce job sequences in distributed processing. A method of the present disclosure in one embodiment applies to a Hadoop MapReduce job sequence or the like. In one embodiment, the method may run all jobs in the Hadoop MapReduce job sequence and potentially run multiple mappers and reducers in the same job. The method may store key value sequences in a family of long-lived Java™ virtual machines (JVMs) and share heap state between jobs. It should be understood that while the present disclosure refers to Hadoop's MapReduce model, the methodology of the present disclosure may apply to other similar models.
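
The idea of keeping key value sequences in long-lived processes and handing them from one job to the next can be illustrated with a minimal single-process Python sketch. It is not the patent's implementation, which targets JVMs: the names `run_job` and `run_sequence` and the two example jobs are hypothetical, and an ordinary Python list stands in for the shared heap state.

```python
from collections import defaultdict


def run_job(pairs, mapper, reducer):
    """Run one map / shuffle / reduce pass entirely in memory."""
    groups = defaultdict(list)
    for key, value in pairs:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # in-memory stand-in for the shuffle
    result = []
    for key in sorted(groups):  # sorted only to make the output deterministic
        result.extend(reducer(key, groups[key]))
    return result


def run_sequence(pairs, jobs):
    """Feed each job's output list directly to the next job.

    Nothing is serialized or written to disk between jobs; the list simply
    stays on the heap of the same long-lived process (an assumption of this sketch).
    """
    for mapper, reducer in jobs:
        pairs = run_job(pairs, mapper, reducer)
    return pairs


# Hypothetical job 1: word count.
def split_words(_, line):
    for word in line.split():
        yield word, 1


def sum_counts(word, counts):
    yield word, sum(counts)


# Hypothetical job 2: group words by their count.
def invert(word, count):
    yield count, word


def collect(count, words):
    yield count, sorted(words)


lines = [(0, "a b a"), (1, "b c a")]
histogram = run_sequence(lines, [(split_words, sum_counts), (invert, collect)])
print(histogram)
```

In a conventional deployment the intermediate word counts would be written out by the first job and read back by the second; here the second job reads the same in-memory objects the first one produced.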

[0036] A methodology in one embodiment of the present disclosure optimizes the shuffling phase of the MapReduce programming model for in-memory workloads. Such workloads fit in aggregate global cluster memory. In one embodiment of the present disclosure, a combiner may be run for a given mapper after accumulating a...

Abstract

Executing a map reduce sequence may comprise executing all jobs in the sequence by a collection of a plurality of processes with each process running zero or more mappers, combiners, partitioners and reducers for each job, and transparently sharing heap state between the jobs to improve metrics associated with the job. Processes may communicate among themselves to coordinate completion of map, shuffle and reduce phases, and completion of said all jobs in the sequence.
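
As a rough illustration of the roles named in the abstract, the following Python sketch simulates a small collection of processes, each running a mapper and a combiner on its local input, a partitioner that routes keys to a target process, and a reducer on whatever arrives. Everything here is an assumption made for illustration: the processes are plain list slots in one interpreter, the word-count functions are hypothetical, and the toy partitioner is chosen only because it is deterministic.

```python
from collections import defaultdict


def mapper(_, line):
    for word in line.split():
        yield word, 1


def combiner(key, values):
    # Local pre-aggregation on the process that ran the mapper.
    yield key, sum(values)


def partitioner(key, num_processes):
    # Toy deterministic partitioner (Python's built-in hash() of a str
    # varies between interpreter runs, so it is avoided here).
    return sum(ord(ch) for ch in key) % num_processes


def reducer(key, values):
    yield key, sum(values)


def group(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def run_job(inputs_per_process):
    """Simulate map, shuffle, and reduce phases over a list of 'processes'."""
    n = len(inputs_per_process)
    inboxes = [[] for _ in range(n)]  # one shuffle inbox per process

    # Map and combine phase: each process works on its own input.
    for local_input in inputs_per_process:
        mapped = [kv for key, value in local_input for kv in mapper(key, value)]
        for key, values in group(mapped).items():
            for out_key, out_value in combiner(key, values):
                # Shuffle: route the combined pair to its target process.
                inboxes[partitioner(out_key, n)].append((out_key, out_value))

    # Reduce phase: each process reduces the keys routed to it.
    outputs = []
    for inbox in inboxes:
        grouped = group(inbox)
        outputs.append(
            [kv for key in sorted(grouped) for kv in reducer(key, grouped[key])]
        )
    return outputs


inputs = [[(0, "a b a")], [(1, "b c a")]]
outputs = run_job(inputs)
print(outputs)
```

The phases run strictly one after another in this sketch; the coordination between processes that the abstract describes (agreeing when map, shuffle, and reduce are complete) is replaced by ordinary sequential control flow.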

Description

FIELD

[0001] The present application relates generally to computers, and computer applications, and more particularly to MapReduce job sequences in distributed processing.

BACKGROUND

[0002] Increasingly inter-connected, global computing systems are generating an enormous amount of irregular, unstructured data. Mining such data for actionable business intelligence can give an enterprise a significant competitive advantage. High-productivity programming models that enable programmers to write small pieces of sequential code to analyze massive amounts of data are particularly valuable in mining this data.

[0003] Over the last several years, MapReduce has emerged as an important programming model in this space. In this model, the programming problem is broken up into specifying mappers (map operation) and reducers (reduce operation). A mapper takes a small chunk of data (typically in the form of pairs of (key, value)), and produces zero or more additional key value pairs. Multiple mappers are e...

Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F9/46
CPC: G06F16/2471, G06F16/00, G09G5/00
Inventors: CUNNINGHAM, DAVID; HERTA, BENJAMIN W.; SARASWAT, VIJAY A.; SHINNAR, AVRAHAM E.
Owner: IBM CORP