Method and system for dispatching and executing distributed ETL tasks

A distributed task scheduling technique for data warehouses. It addresses uneven load across nodes, differing data volumes across tasks, and low data integration efficiency, so that data is loaded on time and both efficiency and resource utilization improve.

Active Publication Date: 2021-03-19
NORTH CHINA UNIVERSITY OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

ETL tasks differ in execution time and in the amount of data they contain, and execution nodes differ in their current load. Scheduling that ignores these differences easily leads to unbalanced cluster load and low resource utilization, which in turn lowers data integration efficiency.



Embodiment Construction

[0034] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below through specific embodiments in conjunction with the accompanying drawings. It should be understood that the described embodiments are some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0035] Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided in order to give a thorough understanding of embodiments of the invention. However, those skilled in the art will appreciate that the technical solutions of the present invention may be practiced without one or...


Abstract

The embodiment of the invention provides a method and a system for scheduling and executing distributed ETL tasks. The method comprises: extracting, from the target tables contained in each ETL task to be scheduled, the associations between the entities involved in the task and accessory tables, the associations between the entities and dimension tables, and the one-to-many associations between the entities; determining a scheduling priority for the ETL task based on a preset weight for each type of association and the number of associations of each type in the task; and allocating the ETL tasks to the execution nodes in order of scheduling priority, from high to low. In this scheme, ETL tasks are distributed to the execution nodes with different weights according to factors such as the complexity of the business logic behind each task and the importance of the business data to be integrated. This keeps core data loading timely and the load across nodes balanced, and improves data integration efficiency and resource utilization.
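The priority rule and the high-to-low allocation described in the abstract can be illustrated with a minimal Python sketch. Several details here are assumptions, not taken from the patent: the weight values in `WEIGHTS`, the names `EtlTask`, `priority`, and `allocate`, the use of a weighted sum to combine weights and counts, and the choice to send each task to the currently least-loaded node, with the priority score standing in for task cost. The abstract states only that weights are preset and that tasks are allocated in descending priority order.

```python
import heapq
from dataclasses import dataclass

# Hypothetical preset weights per association type (values are assumptions).
WEIGHTS = {"accessory": 1.0, "dimension": 2.0, "one_to_many": 3.0}


@dataclass
class EtlTask:
    name: str
    accessory: int    # number of entity-to-accessory-table associations
    dimension: int    # number of entity-to-dimension-table associations
    one_to_many: int  # number of one-to-many associations between entities


def priority(task: EtlTask, weights: dict = WEIGHTS) -> float:
    """Weighted sum of association counts (assumed combination rule)."""
    return (weights["accessory"] * task.accessory
            + weights["dimension"] * task.dimension
            + weights["one_to_many"] * task.one_to_many)


def allocate(tasks: list, nodes: list) -> dict:
    """Assign tasks, highest priority first, to the least-loaded node.

    The node selection rule is an assumption; the priority score is
    reused as a rough proxy for the load a task adds to a node.
    """
    ordered = sorted(tasks, key=priority, reverse=True)
    heap = [(0.0, node) for node in nodes]  # (accumulated load, node name)
    heapq.heapify(heap)
    plan = {node: [] for node in nodes}
    for task in ordered:
        load, node = heapq.heappop(heap)
        plan[node].append(task.name)
        heapq.heappush(heap, (load + priority(task), node))
    return plan


tasks = [
    EtlTask("orders", accessory=1, dimension=2, one_to_many=1),
    EtlTask("users", accessory=0, dimension=1, one_to_many=0),
    EtlTask("logs", accessory=2, dimension=0, one_to_many=0),
]
plan = allocate(tasks, ["n1", "n2"])
```

With these made-up weights, `orders` scores highest and is placed first; the two lighter tasks then land on the other node, which is the balancing behavior the abstract aims for. A real scheduler would presumably measure node load directly rather than infer it from priority scores.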

Description

Technical field

[0001] The invention relates to data warehouses, and in particular to a method and system for ETL task scheduling and execution.

Background

[0002] At present, data extraction, transformation, and loading (Extract-Transform-Load, ETL) is one of the key steps in building a data warehouse in a big data environment. It is the process of integrating scattered, heterogeneous data into a unified standard library. The extraction, transformation, and loading steps can be combined into a schedulable ETL script job (also called an ETL task). In a big data environment, dozens or even tens of thousands of ETL tasks often need to be executed, and scheduling these tasks efficiently is an important part of building a data warehouse. At present, ETL task scheduling mainly relies on distributed cluster scheduling schemes, in which scheduling algorithms such as the polling (round-robin) algorithm, the first-come-first-served algorithm, and the Min-Min algorithm are used to di...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/25, G06F16/28, G06F9/48, G06F9/50
CPC: G06F9/4881, G06F9/5038, G06F9/505, G06F16/254, G06F16/283
Inventor: 杨冬菊, 徐晨阳
Owner: NORTH CHINA UNIVERSITY OF TECHNOLOGY