Big data cluster self-adaptive resource scheduling method based on cloud platform

A resource scheduling technology for big data clusters, applied in the field of cloud computing. It addresses the difficulty that traditional schemes have in responding rapidly in complex and changeable environments, and achieves the effects of improving resource utilization and reducing total cost.

Active Publication Date: 2019-10-29
FUDAN UNIV

AI Technical Summary

Problems solved by technology

Traditional resource scheduling schemes find it increasingly difficult to meet the requirements of rapid response in complex and changeable environments. With the popularity of data-driven methods in recent years, academia has produced data-driven research on cloud platform resource scheduling, including work based on traditional machine learning (ML) and work based on reinforcement learning (RL). When applied to a cloud platform, however, these studies still have shortcomings.

Christina Delimitrou et al. [CC2013, Christina Delimitrou and Christos Kozyrakis. QoS-aware scheduling in heterogeneous datacenters with Paragon. 2013.] capture the characteristics of a task by modeling the workload, associate it with tasks that have similar characteristics, and use historical experience to guide resource allocation for upcoming tasks. This method relies on a large amount of workload data for accurate modeling, and requires sufficient historical operating data to support resource allocation decisions.

Shivaram Venkataraman et al. [SZM2016, Shivaram Venkataraman, Zongheng Yang, Michael Franklin, Benjamin Recht, and Ion Stoica. Ernest: Efficient performance prediction for large-scale advanced analytics. NSDI'16, 363-378.] propose a method based on machine learning modeling for the problem of selecting a big data cluster configuration on the cloud platform. For a specific task and a specific type of virtual machine, given the number of virtual machines and the data size, a prediction model trained on historical data predicts the execution time of the task. Although this method can predict time accurately and assign a reasonable configuration based on the predicted time, its training cost is too high: a large amount of historical data is required to ensure the accuracy of the model, and the model must be retrained for different tasks and virtual machine types, so its flexibility is poor.

Omid Alipourfard et al. [OHJ2017, Omid Alipourfard, Hongqiang Harry Liu, Jianshu Chen, Shivaram Venkataraman, Minlan Yu, and Ming Zhang. CherryPick: Adaptively unearthing the best cloud configurations for big data analytics. NSDI'17, 469-482.] use a data-driven method to obtain a near-optimal cluster configuration from only a small number of samples, but this solution applies only to recurring daily tasks, and re-sampling and retraining are required for different tasks.

Examples

Embodiment 1

[0055] The present invention configures big data cluster resources for different types of tasks on the cloud platform, as shown in Figure 9. When a user needs to apply for resources to run a task, the big data analysis task is first classified and analyzed: a three-layer neural network classifier assigns the task to a pre-marked type. Then, in the initial cluster configuration stage with a small number of samples, the Bayesian optimization method is used to find the configuration that minimizes the cost of the resources requested by the user, and that configuration is returned to the user. After that, an online optimization module, added on the basis of the data-driven idea, performs real-time iterative dynamic optimization to address problems, such as inaccurate classification, that may remain from the previous stage (the first three stages form the workflow without a time limit, as shown in Figure 2). Finally, for tasks with time constraints, use NNLS (non-negative least squares) to ...
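The classification step in the workflow above can be illustrated with a minimal sketch of a three-layer (input, hidden, output) network forward pass. Everything specific here is an assumption for illustration and is not taken from the patent: the two input features (`cpu_util`, `io_wait`), the two labels, the layer sizes, and the hand-picked weights, which would normally come from training on labeled tasks.

```python
import math

# Hypothetical pre-marked task types; the patent's actual label set is not shown.
LABELS = ["cpu_intensive", "io_intensive"]

# Hand-picked illustrative weights (one row per output unit), not trained values.
W_HIDDEN = [[2.0, -1.0], [-1.0, 2.0]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [[1.5, -1.5], [-1.5, 1.5]]
B_OUT = [0.0, 0.0]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Convert scores to probabilities; subtract the max for numerical stability."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify_task(cpu_util, io_wait):
    """Input layer -> hidden layer (ReLU) -> output layer (softmax)."""
    hidden = relu(dense([cpu_util, io_wait], W_HIDDEN, B_HIDDEN))
    probs = softmax(dense(hidden, W_OUT, B_OUT))
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs


label, probs = classify_task(cpu_util=0.9, io_wait=0.1)
print(label, probs)
```

The predicted label would then select which pre-marked task type guides the later configuration search; how the patent maps types to search settings is not visible in this excerpt.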

Embodiment 2

[0134] 1) Big data analysis task classification analysis experiment

Abstract

The invention belongs to the technical field of cloud computing and relates to an adaptive resource scheduling method for big data clusters based on a cloud platform. The method comprises four stages. In the big data analysis task classification stage, a neural network classifier performs a preliminary analysis of the CPU and I/O characteristics of the big data analysis task. In the initial cluster configuration stage with a small number of samples, a Bayesian optimization algorithm rapidly obtains the optimal configuration. In the online cluster configuration optimization stage, the configuration selection strategy is iteratively optimized. In the configuration selection stage with sufficient samples and a time limit, the execution time of the big data analysis task under different configurations is predicted with a non-negative least squares method, and the optimal configuration that satisfies the time limit is selected. The method addresses the problem of reasonably selecting a cluster configuration for running big data analysis tasks on the cloud platform, and maintains the resource utilization rate of the cloud platform while guaranteeing task execution efficiency.
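The time-limited selection stage can be sketched as follows: fit a non-negative least squares model to past runs, predict execution time for each candidate cluster size, and keep the cheapest candidate predicted to meet the deadline. This is an illustration under stated assumptions, not the patent's implementation. The feature vector (constant, data per machine, log of machine count, machine count) is borrowed from Ernest-style models, the solver is a simple coordinate descent stand-in for a library NNLS routine, and the training runs, price, and function names are all made up.

```python
import math


def features(machines, scale):
    """Assumed Ernest-style features for one run; all are >= 0 when machines >= 1."""
    return [1.0, scale / machines, math.log(machines), float(machines)]


def nnls(A, b, sweeps=5000):
    """Approximately minimise ||A x - b||^2 subject to x >= 0.

    Cyclic coordinate descent: each update is the exact one-dimensional
    minimiser for that coefficient, clipped at zero.
    """
    n = len(A[0])
    gram = [[sum(r[i] * r[j] for r in A) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * y for r, y in zip(A, b)) for i in range(n)]
    x = [0.0] * n
    for _ in range(sweeps):
        for j in range(n):
            if gram[j][j] == 0.0:
                continue  # all-zero column: coefficient stays at 0
            grad_j = sum(gram[j][k] * x[k] for k in range(n)) - atb[j]
            x[j] = max(0.0, x[j] - grad_j / gram[j][j])
    return x


def predict_time(theta, machines, scale):
    """Predicted execution time as a non-negative combination of the features."""
    return sum(t * f for t, f in zip(theta, features(machines, scale)))


def choose_config(theta, candidates, scale, deadline, price_per_machine_second):
    """Cheapest candidate predicted to finish within the deadline, or None."""
    best = None
    for m in candidates:
        t = predict_time(theta, m, scale)
        if t > deadline:
            continue  # predicted to miss the time limit
        cost = m * t * price_per_machine_second
        if best is None or cost < best[2]:
            best = (m, t, cost)
    return best


# Made-up training runs: (machines, data scale, observed seconds).
runs = [(1, 1.0, 112.0), (2, 1.0, 64.0), (4, 1.0, 42.0),
        (8, 1.0, 33.0), (16, 1.0, 31.0)]
A = [features(m, s) for m, s, _ in runs]
b = [t for _, _, t in runs]
theta = nnls(A, b)

# Hypothetical deadline of 60 seconds and price per machine-second.
choice = choose_config(theta, [1, 2, 4, 8, 16], 1.0, 60.0, 0.0001)
print(theta, choice)
```

The non-negativity constraint keeps every term's contribution to the predicted time from going negative, which is why this kind of model suits the "sufficient samples" stage: with enough runs, each coefficient has a physical reading (fixed overhead, per-machine work, coordination cost).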

Description

Technical field

[0001] The invention belongs to the technical field of cloud computing, and in particular relates to an adaptive resource scheduling method for big data clusters based on a cloud platform.

Background art

[0002] With the improvement of computer storage capacity and the development of complex algorithms, the amount of data on the Internet has grown exponentially in recent years. These trends have led to the rapid development of science and technology. By 2020, the total amount of newly added and copied data worldwide is expected to grow to 44 ZB. The industry attributes this accumulation and growth of data to the data-related practices of companies and individuals. Taking Facebook as an example, users upload up to 300 million pictures per day, daily content delivery can reach up to 2.5 billion, and up to 500 TB of data can be added per day; similarly, Google's mon...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, H04L29/08
CPC: H04L67/51, H04L67/60, G06F18/24155, G06F18/24, Y02D10/00
Inventors: 吕智慧, 吴杰, 李俊楠
Owner: FUDAN UNIV