Cloud platform MapReduce workflow scheduling optimizing method

A workflow scheduling optimization method, applied in the field of big data computing, that addresses the failure of existing methods to consider cluster rental costs and achieves fast computation, improved optimization efficiency, and a stable optimization effect.

Status: Inactive
Publication Date: 2014-12-10
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

Big data workflows can be composed of batch jobs or stream processing jobs. Existing big data processing optimization methods only target a single job, and do not consider the cluster rental fee when running on the cloud platform.



Examples


Embodiment 1

[0026] A cloud platform MapReduce workflow scheduling optimization method, as shown in Figure 1, includes the following specific steps:

[0027] Refactoring step 100: the core of this step is to reconstruct the workflow W submitted by the user into a new workflow whose structure is better suited to the genetic optimization algorithm. Specifically, the user-submitted workflow W, which includes at least one job, is reconstructed into a new workflow G. As an optional solution, the workflow W can be expressed as W(Γ,Λ,s,d), where Γ is the task set, representing the set of all jobs in workflow W. Here, each job is regarded as a task and as a node of the directed acyclic graph of workflow W. Λ is the set of directed edges, each representing the connection between two nodes of the graph; s represents the size of the initial input data set of workflow W; and d represents the running deadline of workflow W, that is, the end time of ...
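The tuple W(Γ,Λ,s,d) described above can be illustrated with a small data structure. The following is only a sketch: the class name `Workflow`, its field names, and the `topological_order` helper are hypothetical and do not come from the patent text, and the units of s and d are left unspecified because the source does not state them. The helper uses Kahn's algorithm to check that the jobs and edges really form a directed acyclic graph.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Workflow:
    """Hypothetical container for W(Γ, Λ, s, d); names are illustrative only."""
    tasks: set[str]               # Γ: all jobs, each treated as one DAG node
    edges: set[tuple[str, str]]   # Λ: directed edges (upstream job, downstream job)
    input_size: float             # s: size of the initial input data set
    deadline: float               # d: running deadline of the workflow

    def topological_order(self) -> list[str]:
        """Return the jobs in dependency order, or raise if the graph has a cycle."""
        indegree = {task: 0 for task in self.tasks}
        successors = {task: [] for task in self.tasks}
        for src, dst in self.edges:
            if src not in self.tasks or dst not in self.tasks:
                raise ValueError(f"edge ({src}, {dst}) references an unknown job")
            successors[src].append(dst)
            indegree[dst] += 1

        # Kahn's algorithm: repeatedly emit jobs with no unprocessed predecessors.
        ready = deque(sorted(t for t, n in indegree.items() if n == 0))
        order = []
        while ready:
            task = ready.popleft()
            order.append(task)
            for nxt in sorted(successors[task]):
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)

        if len(order) != len(self.tasks):
            raise ValueError("workflow graph contains a cycle")
        return order
```

A three-job chain such as `Workflow({"extract", "join", "report"}, {("extract", "join"), ("join", "report")}, 10.0, 3600.0)` would be ordered as extract, join, report. How the patent's reconstruction step turns W into G is not recoverable from the truncated text, so that step is not sketched here.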


Abstract

The invention relates to big data computing and discloses a cloud platform MapReduce workflow scheduling optimization method. The method broadly comprises three steps: reconstruction, in which the existing workflow is reconstructed to obtain a new workflow; optimization, in which the workflow is optimized using a genetic algorithm; and historical data acquisition, in which historical data is retained either by recording it directly or by recording the relevant data of a regression model once that model has been established. In this way, different individuals can be generated from part of the historical data during optimization. The method considers both the running time of the workflow and the cluster rental cost incurred when computing on a cloud platform, achieves a good optimization effect, and fundamentally addresses the low efficiency of workflow scheduling on large cloud computing platforms.
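To make the optimization step more concrete, here is a minimal genetic-algorithm sketch that trades rental cost against a deadline. Everything specific in it is an assumption made for illustration and is not taken from the patent: jobs are assumed to run one after another, each individual is a list of node counts (one per job), runtime is modelled as `work / nodes + overhead * nodes`, and the operators are tournament selection, one-point crossover, random-reset mutation, and elitism. Seeding individuals from historical data, which the abstract mentions, is not shown.

```python
import random


def fitness(nodes, work, deadline, price=1.0, overhead=0.1, penalty=100.0):
    """Rental cost plus a penalty for missing the deadline (lower is better).

    Assumed toy model: job i takes work[i] / nodes[i] + overhead * nodes[i]
    time units, and renting n nodes for t time units costs price * n * t.
    """
    runtimes = [w / n + overhead * n for w, n in zip(work, nodes)]
    cost = sum(price * n * t for n, t in zip(nodes, runtimes))
    lateness = max(0.0, sum(runtimes) - deadline)
    return cost + penalty * lateness


def optimize(work, deadline, max_nodes=16, pop_size=30, generations=50, seed=0):
    """Search for a per-job node allocation with a simple genetic algorithm."""
    rng = random.Random(seed)  # seeded so that runs are repeatable

    def score(individual):
        return fitness(individual, work, deadline)

    # Initial population: random node counts in [1, max_nodes] for every job.
    population = [[rng.randint(1, max_nodes) for _ in work] for _ in range(pop_size)]

    for _ in range(generations):
        population.sort(key=score)
        next_gen = [population[0][:]]  # elitism: carry over the best individual
        while len(next_gen) < pop_size:
            # Tournament selection of two parents (best of three random picks).
            a = min(rng.sample(population, 3), key=score)
            b = min(rng.sample(population, 3), key=score)
            # One-point crossover.
            cut = rng.randint(0, len(work))
            child = a[:cut] + b[cut:]
            # Mutation: reset each gene with 10% probability.
            for i in range(len(child)):
                if rng.random() < 0.1:
                    child[i] = rng.randint(1, max_nodes)
            next_gen.append(child)
        population = next_gen

    return min(population, key=score)
```

Calling `optimize([40.0, 20.0, 60.0], deadline=30.0)` returns one node count per job. Under this toy model, fewer nodes are cheaper but slower, so the deadline penalty is what pushes the search toward larger allocations.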

Description

Technical field

[0001] The invention relates to the field of big data computing, and in particular to a cloud platform MapReduce workflow scheduling optimization method that effectively improves the optimization efficiency of workflow scheduling on the cloud platform.

Background

[0002] With the emergence and development of new information publishing channels represented by the Internet of Things, social networking services (SNS), and bioinformatics, the types and quantities of data in human society are growing explosively, and the era of big data has arrived. There is currently no generally accepted definition of big data. It differs from traditional concepts such as "massive data" and "ultra-large-scale data" mainly in that big data must have three characteristics: volume, variety, and velocity. According to statistics, the New York Stock Exchange generates about 1TB of transaction dat...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F9/5066, G06F16/951
Inventor: 吴朝晖, 何延彰, 姜晓红, 陈英芝, 毛宇
Owner: ZHEJIANG UNIV