
Multi-target scheduling system for concurrent data integration tasks

A scheduling system and data integration technology, applied in the field of big data, that addresses problems such as reduced execution efficiency of ETL tasks, unreasonable utilization of cluster resources, and unbalanced node load. Its effects are a shorter average execution time, improved load balance, and more reasonable use of cluster resources.

Status: Inactive
Publication Date: 2021-10-29
贵州优联博睿科技有限公司

AI Technical Summary

Problems solved by technology

As the number of users grows, more ETL tasks are executed concurrently, and the task scheduling strategy strongly affects how efficiently those tasks run.
Current research on ETL task scheduling optimization mainly takes the average task execution time as the optimization goal. Although such a single-objective strategy reduces the average task execution time to a certain extent, it easily causes load imbalance among nodes, so that cluster resources are not used reasonably.
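
The trade-off described above can be illustrated with a minimal sketch; it is not the patented method. It contrasts a time-only rule with a weighted rule that also penalizes load imbalance. The function names, the `weight` parameter, the use of population standard deviation as the imbalance measure, and the task times are all assumptions made for illustration. The source names a load balancing evaluation model based on space projection, but its details are not given here.

```python
from statistics import pstdev


def pick_node(predicted_time, node_load, weight):
    """Return the index of the node with the lowest combined score.

    predicted_time[i]: predicted run time of the task on node i
    node_load[i]: work already assigned to node i
    weight: hypothetical knob; 1.0 = time only, 0.0 = balance only
    """
    best, best_score = None, float("inf")
    for i, t in enumerate(predicted_time):
        loads = list(node_load)
        loads[i] += t                  # cluster load if the task went to node i
        imbalance = pstdev(loads)      # assumed stand-in for a balance metric
        score = weight * t + (1 - weight) * imbalance
        if score < best_score:         # ties keep the lower node index
            best, best_score = i, score
    return best


def schedule(tasks, n_nodes, weight):
    """Assign tasks one at a time and return the final per-node load."""
    load = [0.0] * n_nodes
    for times in tasks:
        i = pick_node(times, load, weight)
        load[i] += times[i]
    return load


# Four identical tasks; node 0 is slightly faster (10 vs 12 time units).
tasks = [[10, 12]] * 4
time_only = schedule(tasks, 2, weight=1.0)
weighted = schedule(tasks, 2, weight=0.5)
print(time_only, weighted)
```

With these made-up numbers the time-only rule sends every task to the faster node, giving loads of `[40.0, 0.0]`, while the weighted rule alternates between nodes and ends at `[20.0, 24.0]`. Each task is individually fastest on node 0, yet the cluster as a whole finishes sooner when the load is spread.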


Examples


Embodiment

[0044] As shown in Figure 1, the embodiment of the present invention discloses a multi-objective scheduling system oriented to concurrent data integration tasks, including:

[0045] Objective selection for ETL task scheduling, which handles the ETL execution nodes that complete data extraction, transformation and loading, and selects the load balance of tasks on the ETL execution cluster as a scheduling objective;

[0046] A task execution time prediction model based on random forest, which covers selection of the factors and features that affect task execution time, acquisition of the training data set, and construction of the random forest regression model. The factors and features that affect task execution time include: the software and hardware parameters of the execution node, the data volume processed by the ETL task, and variable factors in the runtime environment. Acquisition of the training data set comprises the following steps:

[0047] Configure a group of ETL task execution nodes with different hardware...
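
The random forest regression idea in [0046] can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the model disclosed in the patent: the feature columns (CPU cores, memory, data volume), the synthetic run times, the hyperparameters and all function names are hypothetical, and in practice an established library implementation would normally be used instead of hand-written trees.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean (the split criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, n_features, max_depth, rng, depth=0):
    """Grow one regression tree; a leaf is a float, a split is a tuple."""
    if depth >= max_depth or len(set(y)) == 1:
        return mean(y)
    best = None
    for f in rng.sample(range(len(X[0])), n_features):  # random feature subset
        # Skipping the smallest value keeps both sides of the split non-empty.
        for thr in sorted({row[f] for row in X})[1:]:
            left = [i for i, row in enumerate(X) if row[f] < thr]
            right = [i for i, row in enumerate(X) if row[f] >= thr]
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:  # every sampled feature is constant in this node
        return mean(y)
    _, f, thr, left, right = best
    return (f, thr,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       n_features, max_depth, rng, depth + 1),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       n_features, max_depth, rng, depth + 1))


def fit_forest(X, y, n_trees=15, n_features=2, max_depth=5, seed=0):
    """Train each tree on a bootstrap sample of the training runs."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 n_features, max_depth, rng))
    return forest


def predict(forest, row):
    """Average the leaf values reached in every tree."""
    total = 0.0
    for node in forest:
        while isinstance(node, tuple):
            f, thr, left, right = node
            node = left if row[f] < thr else right
        total += node
    return total / len(forest)


def synthetic_runs(n, seed=0):
    """Made-up training runs: (cpu_cores, memory_gb, data_volume_mb) -> seconds."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        cores = rng.choice([2, 4, 8])
        mem = rng.choice([4, 8, 16, 32])
        volume = rng.choice(range(100, 1001, 100))
        X.append((cores, mem, volume))
        y.append(volume / (cores * 5.0) + rng.uniform(-1.0, 1.0))
    return X, y


X, y = synthetic_runs(150, seed=1)
forest = fit_forest(X, y)
print(predict(forest, (8, 16, 100)), predict(forest, (2, 16, 1000)))
```

A scheduler could call `predict` once per candidate node to estimate how long a pending task would take there. The synthetic generator only stands in for measurements collected from real execution nodes.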


Abstract

The invention relates to a multi-target scheduling system for concurrent data integration tasks. The system comprises: objective selection for ETL (Extract, Transform and Load) task scheduling, which handles the ETL execution nodes that complete data extraction, transformation and loading, and selects the load balance of tasks on the ETL execution cluster; a task execution time prediction model based on random forest, which covers selection of the factors and features that influence task execution time, acquisition of a training data set, and construction of a random forest regression model; and a load balancing evaluation model based on space projection. The system effectively shortens the average execution time of ETL tasks while improving the load balance between execution nodes, so that cluster resources are used more reasonably.

Description

Technical field

[0001] The invention relates to the technical field of big data, in particular to a multi-objective scheduling system oriented to concurrent data integration tasks.

Background

[0002] With the development of informatization, enterprises have accumulated a large amount of data over their long-term operation. These data are generated in different periods and stored in different departments. The phased and distributed nature of the data leads to the phenomenon of "information islands". Before data analysis, enterprises need to integrate the data scattered across departments in order to maximize the value it contains. Data extraction, transformation and loading (Extract, Transform and Load, ETL for short) are three important steps of data integration. As the number of users grows, more ETL tasks are executed concurrently, and the task scheduling strategy strongly affects the execution efficiency of ETL tasks. At pre...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/25, G06F9/48, G06F9/50, G06K9/62
CPC: G06F16/254, G06F9/4806, G06F9/5027, G06F9/5083, G06F18/24323
Inventor: 李晖, 韩文彪, 丁玺润
Owner: 贵州优联博睿科技有限公司