ETL task scheduling method, system and device and storage medium

A task scheduling and storage medium technology, applied in program control design, program startup/switching, resource allocation, etc. It can solve the problems of wasted cluster computing resources, wasted time, and low resource utilization of a single task, so as to avoid wasting resources.

Inactive Publication Date: 2019-10-22
SHENZHEN LEXIN SOFTWARE TECH CO LTD

AI Technical Summary

Problems solved by technology

Often, the resource utilization rate of a single task is very low, wasting cluster computing resources, and t


Examples


Embodiment 1

[0031] Figure 1 is a flow chart of an ETL task scheduling method provided by Embodiment 1 of the present invention. This embodiment is applicable to situations where one or more ETL tasks are waiting to be executed, and specifically includes the following steps:

[0032] Step 110, acquiring the SQL-based ETL task.

[0033] ETL is the process of extracting, cleaning and transforming data from business systems and loading it into a data warehouse. Its purpose is to integrate the scattered, messy and inconsistent data in an enterprise to provide a basis for analysis and decision-making. ETL tasks are currently implemented in several common ways, such as with dedicated ETL tools or in SQL. The SQL-based method is more flexible and can improve the efficiency of ETL operations, so this embodiment implements ETL tasks in SQL.
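As an illustration of what an SQL-based ETL task can look like, the following is a minimal sketch using Python's built-in sqlite3 module. The table names (src_orders, dw_orders), the columns and the cleaning rules are hypothetical and are not taken from the patent; the sketch only shows extraction, cleaning and loading expressed entirely in SQL.

```python
import sqlite3


def run_sql_etl(conn):
    """Run one SQL-only ETL step and return the number of rows in the target table.

    All table and column names here are hypothetical, chosen for illustration.
    """
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS dw_orders (
            order_id     INTEGER PRIMARY KEY,
            customer     TEXT NOT NULL,
            amount_cents INTEGER NOT NULL
        );
        -- Extract from the source table, clean the values, load the warehouse table.
        INSERT OR REPLACE INTO dw_orders (order_id, customer, amount_cents)
        SELECT order_id,
               UPPER(TRIM(customer)),
               CAST(ROUND(amount * 100) AS INTEGER)
        FROM src_orders
        WHERE amount IS NOT NULL AND TRIM(customer) <> '';
        """
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM dw_orders").fetchone()[0]


# Hypothetical source data standing in for a business system table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?)",
    [(1, " alice ", 12.5), (2, "bob", None), (3, "", 3.0), (4, "carol", 7.25)],
)

loaded = run_sql_etl(conn)
rows = conn.execute(
    "SELECT order_id, customer, amount_cents FROM dw_orders ORDER BY order_id"
).fetchall()
print(loaded, rows)
```

Rows with a missing amount or an empty customer name are dropped by the WHERE clause, so only the cleaned rows reach the target table.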

[0034] Step 120, judging whether the ETL task is the first type task or the second type ta...

Embodiment 2

[0056] Figure 4 is a flow chart of an ETL task scheduling method provided by Embodiment 2 of the present invention. On the basis of Embodiment 1, this embodiment further specifies the process of executing the second type of task through the second computing resource, specifically including:

[0057] Step 210, use the minimum value algorithm to determine the currently free computing power pool for the ETL task.

[0058] The executor Spark thriftServer has multiple computing power pools (Spark thriftServer1, Spark thriftServer2, Spark thriftServer3, ...) according to the second computing resource configuration. When an ETL task classified as the second type of task is to be executed by Spark thriftServer, the minimum value algorithm can identify a suitable free computing power pool, for example Spark thriftServer2, which can then perform this ETL task.
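The text does not spell out what quantity the minimum value algorithm minimizes. The sketch below assumes, purely for illustration, that each pool reports a count of running tasks and a capacity, and that the selected pool is the one with the fewest running tasks among those with spare capacity. The Pool class, its fields and the pick_free_pool function are hypothetical names, not part of the patent or of Spark.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pool:
    name: str
    running_tasks: int  # assumed load metric; the patent does not define one
    capacity: int       # assumed upper bound on concurrent tasks


def pick_free_pool(pools: List[Pool]) -> Optional[Pool]:
    """Return the least-loaded pool that still has spare capacity, or None."""
    candidates = [p for p in pools if p.running_tasks < p.capacity]
    if not candidates:
        return None
    # "Minimum value" is interpreted here as the minimum number of running tasks.
    return min(candidates, key=lambda p: p.running_tasks)


pools = [
    Pool("Spark thriftServer1", running_tasks=5, capacity=8),
    Pool("Spark thriftServer2", running_tasks=1, capacity=8),
    Pool("Spark thriftServer3", running_tasks=3, capacity=8),
]
chosen = pick_free_pool(pools)
print(chosen.name if chosen else "no free pool")
```

Under these assumptions the least-loaded pool in the sample data is Spark thriftServer2, matching the example in the text. When every pool is full the function returns None and the caller has to decide how to handle the task.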

[0059] Step 220, judge whether the task can be executed, based on the free computing power pool.

[0060] After Spark thri...

Embodiment 3

[0067] Figure 5 is a schematic structural diagram of an ETL task scheduling system 300 provided in Embodiment 3 of the present invention. The specific structure of the ETL task scheduling system is as follows:

[0068] A task acquiring module 310, configured to acquire SQL-based ETL tasks.

[0069] ETL tasks are currently implemented in several common ways, such as with dedicated ETL tools or in SQL. The SQL-based method is more flexible and can improve the efficiency of ETL operations, so this embodiment implements ETL tasks in SQL.

[0070] The task judging module 320 is configured to judge whether the ETL task is a first type task or a second type task according to preset rules.

[0071] The first execution module 330 is configured to select a first computing resource to execute the ETL task when the ETL task is a first type task.

[0072] The second execution module 340 is configured to select a second computing resource to exec...

Abstract

The embodiment of the invention discloses an ETL task scheduling method, system and device and a storage medium. The method comprises the steps of obtaining an ETL task based on SQL; judging whether the ETL task is a first type task or a second type task according to a preset rule; if the ETL task is a first type task, selecting a first computing resource to execute the ETL task; and if the ETL task is a second type task, selecting a second computing resource to execute the ETL task. According to the embodiment of the invention, the ETL tasks are classified based on the preset rule, and different types of ETL tasks are executed by selecting different computing resources, so that the problems of overlong queuing time and computing resource waste caused by executing the ETL tasks through unified computing resources are avoided.
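The classify-then-dispatch flow described in the abstract can be sketched as follows. The preset rule is not given in the available text, so the rule used here (a threshold on an estimated input size) is a placeholder assumption, as are the EtlTask fields, the executor functions and the choice of which side of the threshold counts as the first type.

```python
from dataclasses import dataclass
from typing import Callable, Dict

FIRST_TYPE = "first"
SECOND_TYPE = "second"


@dataclass
class EtlTask:
    name: str
    sql: str
    estimated_input_gb: float  # hypothetical attribute used by the placeholder rule


def classify(task: EtlTask, threshold_gb: float = 10.0) -> str:
    """Placeholder preset rule: the real rule is not specified in the source."""
    return FIRST_TYPE if task.estimated_input_gb >= threshold_gb else SECOND_TYPE


def run_on_first_resource(task: EtlTask) -> str:
    # Stand-in for submitting the SQL to the first computing resource.
    return f"{task.name} -> first computing resource"


def run_on_second_resource(task: EtlTask) -> str:
    # Stand-in for submitting the SQL to the second computing resource.
    return f"{task.name} -> second computing resource"


EXECUTORS: Dict[str, Callable[[EtlTask], str]] = {
    FIRST_TYPE: run_on_first_resource,
    SECOND_TYPE: run_on_second_resource,
}


def schedule(task: EtlTask, rule: Callable[[EtlTask], str] = classify) -> str:
    """Classify the task with the preset rule, then hand it to the matching executor."""
    return EXECUTORS[rule(task)](task)


tasks = [
    EtlTask("daily_orders", "INSERT INTO dw_orders SELECT ...", estimated_input_gb=50.0),
    EtlTask("small_lookup", "INSERT INTO dw_dim SELECT ...", estimated_input_gb=0.2),
]
results = [schedule(t) for t in tasks]
print(results)
```

Passing the rule as a parameter keeps the classification separate from the dispatch, so a different preset rule can be swapped in without touching the executors.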

Description

Technical field

[0001] The embodiment of the present invention relates to the technical field of big data scheduling, and in particular to an SQL-based ETL task scheduling method, system, device and storage medium.

Background

[0002] Big data technology is a field that all industries are striving to promote and rely on. In the Internet, e-commerce, consumer finance and other industries in particular, tens of thousands of big data ETL tasks need to be run daily as an important support for data analysis and business decision-making. Distributed computing engines are also emerging rapidly, with diverse usage methods and differing requirements for the technical level of analysts. For cost reasons, the industry is trending toward SQL-based approaches, which reduce the difficulty of use. In the common SQL execution process of ETL, a single task execution resource is set, and the execution efficiency and resource utilization of a single SQL ...

Application Information

IPC(8): G06F9/48, G06F9/50
CPC: G06F9/4881, G06F9/5038
Inventor: 吴志龙
Owner: SHENZHEN LEXIN SOFTWARE TECH CO LTD