Multi-objective scheduling system for concurrent data integration tasks
A scheduling system and data integration technology, applied in the field of big data, that addresses problems such as reduced execution efficiency of ETL tasks, unreasonable utilization of cluster resources, and unbalanced node load, with the effect of shortening the average execution time, improving load balance, and making rational use of cluster resources.
Embodiment
[0044] As shown in Figure 1, an embodiment of the present invention discloses a multi-objective scheduling system for concurrent data integration tasks, comprising:
[0045] Objective selection for ETL task scheduling: the ETL execution nodes process the tasks, completing data extraction, transformation, and loading, and the load balance of tasks across the ETL execution cluster is selected as a scheduling objective;
[0046] A task execution time prediction model based on random forest, which comprises selection of the factors and features that affect task execution time, acquisition of a training data set, and construction of a random forest regression model. The factors and features that affect task execution time include: the software and hardware parameters of the current execution node, the data volume processed by the ETL task, and variable factors in the runtime environment. Acquisition of the training data set comprises the following steps:
[0047] Configure a group of ETL task execution nodes with different hardware...
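The random forest regression model described in [0046] can be illustrated with a minimal sketch. It is not the patented implementation: the three feature names (cpu_cores, data_mb, concurrent_tasks), the synthetic timing formula, and all function names are assumptions chosen only to stand in for the hardware parameters, data volume, and runtime variables named above. The forest is written with the Python standard library so the bootstrap sampling, random feature selection, and averaging steps are visible.

```python
import random
from statistics import mean

def build_tree(rows, targets, rng, max_depth=4, min_size=2, n_try=2):
    """Grow one regression tree; each split considers a random feature subset."""
    if max_depth == 0 or len(rows) < 2 * min_size or len(set(targets)) == 1:
        return mean(targets)  # leaf: average execution time of its samples
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):
        for threshold in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            if len(left) < min_size or len(right) < min_size:
                continue
            # Score the split by the summed squared error of both sides.
            sse = 0.0
            for side in (left, right):
                m = mean(targets[i] for i in side)
                sse += sum((targets[i] - m) ** 2 for i in side)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, left, right)
    if best is None:
        return mean(targets)
    _, f, threshold, left, right = best
    grow = lambda idx: build_tree([rows[i] for i in idx],
                                  [targets[i] for i in idx],
                                  rng, max_depth - 1, min_size, n_try)
    return (f, threshold, grow(left), grow(right))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's mean target."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node

def train_forest(rows, targets, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """The forest prediction is the average of the per-tree predictions."""
    return mean(predict_tree(tree, row) for tree in forest)

# Synthetic training set (an assumption, not measured data):
# features are [cpu_cores, data_mb, concurrent_tasks].
rows, times = [], []
for cores in (2, 4, 8):
    for data_mb in (100, 200, 400, 800):
        for concurrent in (0, 1, 2, 3):
            rows.append([cores, data_mb, concurrent])
            times.append(data_mb / cores * (1 + 0.2 * concurrent))

forest = train_forest(rows, times)
slow = predict_forest(forest, [2, 800, 3])  # weak node, large input, busy
fast = predict_forest(forest, [8, 100, 0])  # strong node, small input, idle
print(f"predicted seconds: slow={slow:.1f}, fast={fast:.1f}")
```

In a real deployment the rows would come from timing ETL tasks on the differently configured execution nodes described in [0047], and a scheduler could compare the predicted times across candidate nodes before placing a task.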


