ETL-based task scheduling method and device, computer device, and storage medium

By configuring the task scheduling model and parameter passing, the problems of location and result judgment in task scheduling are solved, and efficient task scheduling management and optimization are achieved.

CN116595072B · Active · Publication Date: 2026-06-26 · CHINA PING AN PROPERTY INSURANCE CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHINA PING AN PROPERTY INSURANCE CO LTD
Filing Date: 2023-04-03
Publication Date: 2026-06-26

Application Information

Patent Timeline

03 Apr 2023: Application
26 Jun 2026: Publication (CN116595072B)

IPC: G06F16/25; G06F9/48

CPC: G06F16/254; G06F9/4881; Y02D10/00

AI Tagging

Technology Topics

Scheduling instructions; Distributed computing

Technical Efficacy Phrases

Fast delivery; Rapid positioning

AI Technical Summary

Technical Problem

Existing technologies cannot quickly locate task scheduling problems and determine the impact of scheduling results, leading to difficulties in system maintenance and problem localization, especially in complex data ETL logic where the interdependencies between schedulers are complex.

Method used

The task scheduling model is used to configure the parameter categories of task scheduling, define the dependencies of the task scheduling layer, and pass parameter data through a subscription/publishing model to achieve orderly execution of task chains and result aggregation and analysis.

Benefits of technology

It improves the efficiency of task scheduling maintenance, management and exception handling, and can quickly locate problems and optimize the ETL task scheduling chain.

Abstract

The embodiment of the application belongs to the technical field of big data processing, and relates to a task scheduling method based on ETL, which comprises the following steps: when a task scheduling instruction is received, a task scheduling model is used to configure a parameter category of task scheduling, and a dependency relationship with an upper task scheduling layer and / or a lower task scheduling layer of each task scheduling layer is formulated; the task scheduling layers are sequentially executed according to the dependency relationship, and in the process of executing the task scheduling layers, parameter data corresponding to the parameter category generated by the upper task scheduling layer is transmitted to the lower task scheduling layer; after all the task scheduling layers are executed, all the parameter data is summarized and analyzed to determine a scheduling result of the task scheduling. The application also provides a task scheduling device based on ETL, a computer device and a storage medium. The application can realize positioning of a task scheduling problem and judging of an influence of a task scheduling result.

Description

Technical Field

[0001] This application relates to the field of big data processing technology, and in particular to a task scheduling method, apparatus, computer equipment and storage medium based on ETL.

Background Technology

[0002] Currently, in the field of big data, data flow mainly involves the following stages:

[0003] Data production stage: Business data generated by customers or applications.

[0004] Data ETL (Extract-Transform-Load) stage: Data warehouse, data middle platform, etc. are used to uniformly process and store data.

[0005] Data application stage: Provide application data for analysis and display scenarios (such as data dashboards, data reports, and data analysis reports).

[0006] Data ETL is the prerequisite and foundation for data applications. Each data ETL process typically generates a timed scheduler to represent the logic to be processed. Different processes can reuse the same base data from the same table. This, in turn, creates dependencies between schedulers, a relationship often referred to as "lineage" in the industry. As data sources become increasingly diversified and data ETL logic becomes more complex, the dependencies between schedulers also become more intricate. For example, in insurance business statistics, the number of schedulers can range from hundreds to tens of thousands, forming a network-like structure. This makes system maintenance and problem localization increasingly difficult. The dependencies along the entire chain are extremely complex; if a scheduler malfunctions, it is impossible to quickly pinpoint the problem or assess the extent of its impact on the final ETL data processing results.

Summary of the Invention

[0007] The purpose of this application is to propose an ETL-based task scheduling method, apparatus, computer device, and storage medium to solve the technical problems of existing technologies being unable to locate task scheduling problems and determine the impact of task scheduling results.

[0008] To address the aforementioned technical problems, this application provides an ETL-based task scheduling method, employing the following technical solution:

[0009] When a task scheduling instruction is received, the parameter categories for task scheduling are configured using the task scheduling model, and the dependencies with the upper and / or lower task scheduling layers of each task scheduling layer are established.

[0010] The task scheduling layer is executed sequentially according to the dependency relationship, and during the execution of the task scheduling layer, the parameter data corresponding to the parameter category generated by the upper task scheduling layer is passed to the lower task scheduling layer.

[0011] After all the task scheduling layers have been executed, all parameter data are summarized and analyzed to determine the scheduling result of the task scheduling.

[0012] Furthermore, the step of establishing dependencies between each task scheduling layer and its upper and / or lower task scheduling layers includes:

[0013] Define the task chain;

[0014] Define the dependencies between the upper and / or lower task scheduling layers of each task scheduling layer in each task chain, and define the parameter categories of the upper task scheduling layers that each task scheduling layer depends on.

[0015] Furthermore, the steps for defining the task chain include:

[0016] The functions required for task scheduling are determined based on the task scheduling instructions.

[0017] The corresponding task chain is determined based on the described function.

[0018] Furthermore, the parameter category includes running status, runtime, the attribution of the task program in the task scheduling layer, the consumption of the upper-level task scheduling layer in each task chain, and the level flag of the task scheduling layer; the step of transferring the parameter data corresponding to the parameter category generated by the upper-level task scheduling layer to the lower-level task scheduling layer includes:

[0019] In each task chain, the upper-layer task scheduling layer generates corresponding parameter data according to the defined parameter category;

[0020] The parameter data is then passed to the lower-level task scheduling layer.

[0021] Furthermore, the step of summarizing and analyzing all the parameter data to determine the scheduling result of the task scheduling includes:

[0022] The abnormal parameter data of all task scheduling layers is summarized to obtain the overall abnormal parameter data;

[0023] Determine whether the overall abnormal parameter data exceeds a preset threshold;

[0024] If the result of the judgment is yes, then a task optimization reminder will be output.

[0025] Furthermore, the step of outputting the task optimization reminder includes:

[0026] The task scheduling layer that generated the abnormal parameter data is located based on the overall abnormal parameter data and the hierarchical relationship of the scheduling tasks in the task chain.

[0027] The corresponding reminder object is determined based on the attribution of the task scheduler in the task scheduling layer that generated the abnormal parameter data;

[0028] The abnormal parameter data of the task scheduling layer that generated it is sent to the reminder object.

[0029] Furthermore, prior to the step of configuring the parameter categories for task scheduling using the task scheduling model, the following steps are included:

[0030] Store the source data for task scheduling in the source database;

[0031] After summarizing and analyzing all the parameter data, the following steps are included:

[0032] The terminal data obtained after executing all task scheduling layers is transmitted to the application database and displayed according to preset rules.

[0033] To address the aforementioned technical problems, this application also provides an ETL-based task scheduling device, which employs the following technical solution:

[0034] The task scheduling device includes:

[0035] The configuration module is used to configure the parameter categories of task scheduling using the task scheduling model when a task scheduling instruction is received, and to define the dependencies with the upper and / or lower task scheduling layers of each task scheduling layer.

[0036] An execution module is used to execute the task scheduling layer sequentially according to the dependency relationship, and during the execution of the task scheduling layer, to pass the parameter data corresponding to the parameter category generated by the upper task scheduling layer to the lower task scheduling layer.

[0037] The analysis module is used to summarize and analyze all parameter data after all the task scheduling layers have been completed, in order to determine the scheduling result of the task scheduling.

[0038] To address the aforementioned technical problems, this application also provides a computer device that employs the following technical solution:

[0039] The computer device includes a memory and a processor. The memory stores computer-readable instructions, and the processor executes the computer-readable instructions to implement the steps of the task scheduling method described above.

[0040] To address the aforementioned technical problems, this application also provides a computer-readable storage medium, employing the technical solution described below:

[0041] The computer-readable storage medium stores computer-readable instructions, which, when executed by a processor, implement the steps of the task scheduling method described above.

[0042] Compared with the prior art, the embodiments of this application have the following advantages: This application provides a task scheduling method, apparatus, computer device and storage medium based on ETL. The task scheduling method includes: upon receiving a task scheduling instruction, configuring the parameter categories of the task scheduling using a task scheduling model, and establishing dependencies with each upper-level and / or lower-level task scheduling layer. Configuring the parameter categories provides a basis for parameter passing in subsequent task scheduling. These dependencies ensure that each task scheduling layer can accurately and quickly obtain the corresponding upper-level task scheduling layer and pass the parameters to the corresponding lower-level task scheduling layer during execution. Then, the task scheduling layers are executed sequentially according to the dependencies. During the execution of each task scheduling layer, parameter data corresponding to the parameter categories generated by the upper-level task scheduling layer is passed to the lower-level task scheduling layer, so that parameter data is transmitted efficiently. After all task scheduling layers are executed, all parameter data is summarized and analyzed to determine the scheduling result. This allows for rapid location of the specific task scheduling layer when problems arise, based on the parameter data transmitted between task scheduling layers. Furthermore, summarizing all parameter data allows for analysis of the overall task scheduling result, such as time consumption and memory usage. Therefore, this application can improve the efficiency of task scheduling maintenance, management, and exception handling, and can also perform data analysis of the entire task scheduling data based on parameter data to optimize the entire ETL task scheduling chain.

Attached Figure Description

[0043] To more clearly illustrate the solutions in this application, the accompanying drawings used in the description of the embodiments of this application will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0044] Figure 1 is an exemplary system architecture diagram to which this application can be applied;

[0045] Figure 2 is a flowchart of an embodiment of the ETL-based task scheduling method according to this application;

[0046] Figure 3 is a flowchart of a specific implementation of step S203 in Figure 2;

[0047] Figure 4 is a schematic diagram illustrating the principle of an application embodiment of the ETL-based task scheduling method;

[0048] Figure 5 is a schematic diagram of the structure of an embodiment of the ETL-based task scheduling device according to this application;

[0049] Figure 6 is a schematic diagram of the structure of one embodiment of the computer device according to this application.

Detailed Implementation

[0050] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains; the terminology used herein in the specification of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having," and any variations thereof, in the specification, claims, and foregoing drawings of this application, are intended to cover non-exclusive inclusion. The terms "first," "second," etc., in the specification, claims, or foregoing drawings of this application are used to distinguish different objects, not to describe a particular order.

[0051] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.

[0052] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.

[0053] Currently, ETL data processing in the industry typically employs batch processing. Specifically, this involves running data ETL programs on a scheduled basis, such as in the case of statistical insurance applications where batch processing is performed every morning. The data processed is often generated the previous day. Batch processing solutions offer high stability and flexibility, and can be widely supported for various data application scenarios. Especially in some statistical analysis scenarios, specific basic data needs to be extracted, which can only be accomplished through batch processing. However, existing task scheduling methods only check the status of the upper-level task scheduling layer by polling. If all upper-level task scheduling layers are complete, the current scheduling layer is executed. There is no transmission of parameter data representing the attribute information of the scheduled tasks between task scheduling layers. Therefore, if a problem occurs in task scheduling, it is impossible to quickly locate the issue or analyze the overall task scheduling result. Based on this, this application provides an ETL-based task scheduling method, which will be detailed below.

[0054] Please see Figure 1, which is an exemplary system architecture diagram to which this application can be applied. As shown in Figure 1, system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. Network 104 serves as the medium for providing communication links between terminal devices 101, 102, and 103 and server 105. Network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables, etc.

[0055] Users can use terminal devices 101, 102, and 103 to interact with server 105 via network 104 to receive or send messages, etc. Various communication client applications can be installed on terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social media platform software, etc.

[0056] Terminal devices 101, 102, and 103 can be various electronic devices that have displays and support web browsing, including but not limited to smartphones, tablets, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptops, and desktop computers, etc.

[0057] Server 105 can be a server that provides various services, such as a backend server that supports the pages displayed on terminal devices 101, 102, and 103.

[0058] It should be noted that the ETL-based task scheduling method provided in this application embodiment is generally executed by a server / terminal device, and correspondingly, the ETL-based task scheduling device is generally set in the server / terminal device.

[0059] It should be understood that the number of terminal devices, networks, and servers shown in Figure 1 is merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.

[0060] Continuing to refer to Figure 2, a flowchart of an embodiment of the ETL-based task scheduling method according to this application is shown. The ETL-based task scheduling method includes the following steps:

[0061] Step S201: When a task scheduling instruction is received, the parameter categories of task scheduling are configured using the task scheduling model, and the dependencies with the upper and / or lower task scheduling layers of each task scheduling layer are established.

[0062] In this embodiment, the ETL-based task scheduling method can run on an electronic device (e.g., the server / terminal device shown in Figure 1), and task scheduling instructions can be received via wired or wireless connections. It should be noted that the aforementioned wireless connection methods may include, but are not limited to, 3G / 4G / 5G connections, Wi-Fi connections, Bluetooth connections, WiMAX connections, ZigBee connections, UWB (Ultra Wideband) connections, and other currently known or future wireless connection methods.

[0063] In another embodiment, a pre-set program instruction can be used to activate the task scheduling instruction. For example, the task scheduling instruction can be triggered at midnight every day.

[0064] Task scheduling is accomplished through multiple task scheduling layers, each of which executes its own task scheduler. The first task scheduling layer operates on source data from the source database. In this embodiment, before the task scheduling parameter categories are configured, the source data is stored in the source database. Specifically, it can be synchronized to the big data platform database using SQOOP (a tool for transferring data between Hadoop and traditional databases) or OGG (Oracle GoldenGate, which synchronizes data from Oracle to heterogeneous databases). Each task scheduling layer depends on the execution results of the task scheduler in its upper-level layer to execute its own task scheduler, and also needs to publish the execution results of its own task scheduler to the lower-level layer. Therefore, this embodiment requires a scheduling model to define the dependencies between the task scheduling layers, ensuring that each layer executes its task scheduler in an orderly manner. The scheduling model can be a publish / subscribe model, meaning that each task scheduling layer can subscribe to the upper-level layer or publish to the lower-level layer.
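The publish / subscribe relationship between layers can be illustrated with a minimal in-memory sketch. The patent does not disclose an implementation, so the class `SchedulingBus`, its methods, and the layer names (`source_sync`, `clean`, `report`) are all hypothetical. The sketch assumes a lower layer runs as soon as every upper layer it subscribes to has published a result.

```python
from collections import defaultdict


class SchedulingBus:
    """Minimal in-memory publish/subscribe hub (hypothetical, not from the patent)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # upper layer name -> lower layer names
        self.inbox = defaultdict(dict)        # lower layer name -> {upper name: result}
        self.upstream = {}                    # layer name -> set of upper layer names
        self.programs = {}                    # layer name -> callable taking the inbox
        self.order = []                       # execution order, kept for inspection

    def register(self, name, program, depends_on=()):
        self.programs[name] = program
        self.upstream[name] = set(depends_on)
        for upper in depends_on:
            self.subscribers[upper].append(name)  # subscribe to the upper layer

    def publish(self, name, result):
        # Deliver the result to each subscribed lower layer, and run a lower
        # layer once all of its upper layers have published.
        for lower in self.subscribers[name]:
            self.inbox[lower][name] = result
            if set(self.inbox[lower]) == self.upstream[lower]:
                self.run(lower)

    def run(self, name):
        result = self.programs[name](self.inbox[name])
        self.order.append(name)
        self.publish(name, result)

    def start(self):
        # Layers without an upper layer (e.g. source synchronization) start the chain.
        for name, deps in list(self.upstream.items()):
            if not deps:
                self.run(name)


bus = SchedulingBus()
bus.register("source_sync", lambda inputs: {"rows": 100})
bus.register("clean", lambda inputs: {"rows": inputs["source_sync"]["rows"] - 5},
             depends_on=["source_sync"])
bus.register("report", lambda inputs: {"rows": inputs["clean"]["rows"]},
             depends_on=["clean"])
bus.start()
print(bus.order)
```

Compared with the polling approach described in [0053], a layer here is triggered by the arrival of upstream results, and those results carry data rather than only a completion flag.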

[0065] In this embodiment, a task scheduling model can be used to define the task chains required for task scheduling. Furthermore, the dependencies between the upper and / or lower task scheduling layers of each task scheduling layer within each task chain can be defined, as well as the parameter categories of the upper task scheduling layers that each task scheduling layer depends on. Task scheduling includes multiple task chains, each of which executes its own task scheduling program in an orderly manner through different task scheduling layers to achieve task scheduling for each chain. The entire task scheduling is completed once all task chains are finished.

[0066] It should be understood that task chains can be independent or intersecting, and their design depends on the needs of task scheduling. Task chains are designed based on the functions required by the scheduled task. Specifically, the functions to be implemented in the task scheduling can be determined first based on the task scheduling instructions. For example, the task scheduling may require calculating the number of new customers added in the previous day's insurance business and the previous day's insurance sales figures. Different task chains are then designed based on different functions, meaning each task chain can implement at least a portion of the functions. It is understood that if a task chain is independent, it implements an independent function; if task chains are intersecting, each task chain may only implement a portion of the functions.

[0067] In addition to defining the dependencies between task chains and task scheduling layers, this embodiment further configures the parameter categories for task scheduling. These parameter categories include at least one of the following: running status, runtime, the attribution of task programs within the task scheduling layer, the consumption of the upper-level task scheduling layer in each task chain, and the level flag of the task scheduling layer. Furthermore, the parameter categories of the upper-level task scheduling layers that each task scheduling layer depends on are also defined. For example, the parameter categories of upper-level task scheduling layer B that task scheduling layer A depends on are running status, runtime, and attribution; the parameter categories of the upper-level task scheduling layer that task scheduling layer C depends on include consumption, runtime, and attribution. The running status can include the status of the task scheduling program running in the task scheduling layer, specifically including information such as not running, running, running successfully, running failed, and offline.

[0068] Runtime includes the duration of the task scheduler running in the task scheduling layer, the comparison value with the average duration (such as the timeout value exceeding the average duration, the advance value of the time less than the average duration), and the time value delayed from the set start time.

[0069] Attribution includes the development unit to which the task scheduler belongs. For example, if task scheduler A was developed by department B, then the task scheduler belongs to department B. This attribution is mainly used to directly locate department B when an exception occurs during the execution of task scheduler A, so as to resolve the exception problem more efficiently.

[0070] The overhead of the upper-level task scheduling layer in each task chain mainly includes CPU consumption. This parameter category can work in conjunction with attribution parameter categories. For example, if it is found that the CPU consumption of a task scheduler in a certain task scheduling layer is too high, the department to which the task scheduler belongs can be quickly contacted to optimize the task scheduler and thus optimize the task scheduling chain.

[0071] The level label of the task scheduling layer mainly refers to the importance level of the task scheduling layer, which can be determined by each task scheduling requirement.
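The five parameter categories described above could be represented as a single record, as in the following sketch. The type and field names (`ParameterData`, `RunStatus`, `select_categories`) are hypothetical, and the unit of `cpu_consumption` is an assumption, since the patent only says that consumption mainly includes CPU consumption.

```python
from dataclasses import dataclass
from enum import Enum


class RunStatus(Enum):
    NOT_RUNNING = "not running"
    RUNNING = "running"
    SUCCEEDED = "running successfully"
    FAILED = "running failed"
    OFFLINE = "offline"


@dataclass
class ParameterData:
    layer: str                  # name of the task scheduling layer
    status: RunStatus           # running status
    runtime_minutes: float      # duration of the task scheduler
    average_minutes: float      # average duration used for comparison
    start_delay_minutes: float  # delay relative to the set start time
    attribution: str            # development unit that owns the task scheduler
    cpu_consumption: float      # consumption (unit assumed: CPU-seconds)
    level: int                  # importance level flag

    @property
    def runtime_deviation(self) -> float:
        """Positive when slower than average (timeout), negative when faster."""
        return self.runtime_minutes - self.average_minutes


def select_categories(data: ParameterData, categories: list[str]) -> dict:
    """Keep only the categories a lower layer has declared that it depends on."""
    return {name: getattr(data, name) for name in categories}


upper = ParameterData("layer_B", RunStatus.SUCCEEDED, 12.0, 10.0, 0.0,
                      "department_B", 340.0, 1)
print(select_categories(upper, ["status", "runtime_minutes", "attribution"]))
print(upper.runtime_deviation)
```

`select_categories` mirrors the example in [0067], where layer A depends only on the running status, runtime, and attribution of layer B.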

[0072] Step S202: Execute the task scheduling layer sequentially according to the dependency relationship, and during the execution of the task scheduling layer, pass the parameter data corresponding to the parameter category generated by the upper task scheduling layer to the lower task scheduling layer.

[0073] In each task chain, the upper-level task scheduling layer generates corresponding parameter data based on the defined parameter categories and transmits this parameter data to the lower-level task scheduling layer. For example, if the parameter category of the upper-level task scheduling layer that a certain task scheduling layer depends on is predefined as runtime, then when the upper-level task scheduling layer executes its task scheduling program, the generated parameter data is: runtime 10 minutes, and this 10-minute runtime is transmitted to that task scheduling layer. When that task scheduling layer runs its own task scheduling program, it similarly generates corresponding parameter data based on the predefined parameter categories that the lower-level task scheduling layers depend on, and transmits it further. For example, it calculates the runtime parameter category and adds it to the runtime of the previous task scheduling layer to obtain the total runtime, which is then transmitted to the next task scheduling layer, and so on, until all task scheduling layers have completed execution. The total runtime can then be obtained on the terminal. Therefore, it can be seen that each task scheduling layer generates corresponding parameter data based on predefined parameter categories and transmits it to the next task scheduling layer.
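The runtime example above amounts to a running total carried down the chain. A minimal sketch, assuming a single linear chain and using a hypothetical function name `run_chain` and hypothetical layer names and durations:

```python
def run_chain(layer_runtimes):
    """Pass a running total of runtime from each layer to the next.

    layer_runtimes: list of (layer_name, runtime_minutes) in execution order.
    Returns the parameter data that each layer passes downstream.
    """
    published = []
    total = 0.0  # the first layer has no upper layer, so it starts from zero
    for name, runtime in layer_runtimes:
        total += runtime  # own runtime plus the total received from the upper layer
        published.append({"layer": name, "runtime": runtime, "total_runtime": total})
    return published


chain = [("layer_1", 10.0), ("layer_2", 4.0), ("layer_3", 6.5)]
for message in run_chain(chain):
    print(message)
```

The last message carries the total runtime of the chain (20.5 minutes for these sample values), which is the figure available at the terminal once all layers have run.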

[0074] This step transmits parameter data from the task scheduling layer to the next task scheduling layer, enabling the propagation of parameter data along the task chain. This allows the acquisition of the parameter data required for task scheduling, providing a data foundation for evaluating the results of the task scheduling.

[0075] Step S203: After all the task scheduling layers have been executed, all parameter data are summarized and analyzed to determine the scheduling result of the task scheduling.

[0076] In this embodiment, the terminal data obtained after executing all task scheduling layers is transmitted to the application database and displayed according to preset rules.

[0077] When task scheduling problems occur, the specific task scheduling layer can be quickly located based on the parameter data transmitted between task scheduling layers. For example, if a task scheduling fails, since each task scheduling layer transmits the result (failure or success) of its task scheduling program to the dependent lower-level task scheduling layers, the failed task scheduling program can be located based on the dependency relationship, enabling rapid problem localization. Furthermore, the department to which the failed task scheduling program belongs can be determined based on its attribution, allowing for rapid notification of the relevant department for correction and optimization. This improves the efficiency of task scheduling maintenance, management, and exception handling, and also enables data analysis of the entire task scheduling data based on parameter data, optimizing the entire ETL task scheduling chain.
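Locating a failed task scheduling program from the dependency relationship can be sketched as an upstream walk. This is one possible reading, not the patent's stated algorithm: it assumes a failure is reflected in the status of every dependent lower layer, and it treats a failed layer whose upper layers all succeeded as the origin. The function name, layer names, and owner table are hypothetical.

```python
def find_root_failures(layer, upstream, status):
    """Walk upstream from `layer` and return the failed layers whose own
    upper layers all succeeded, i.e. where the failure originated."""
    roots, seen, stack = [], set(), [layer]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        failed_uppers = [u for u in upstream.get(current, []) if status[u] == "failed"]
        if status[current] == "failed" and not failed_uppers:
            roots.append(current)
        stack.extend(failed_uppers)  # keep tracing only through failed layers
    return roots


upstream = {
    "report": ["aggregate"],
    "aggregate": ["clean_a", "clean_b"],
    "clean_a": ["source_sync"],
    "clean_b": ["source_sync"],
}
status = {"source_sync": "success", "clean_a": "failed", "clean_b": "success",
          "aggregate": "failed", "report": "failed"}
owners = {"clean_a": "department_B"}  # attribution of each task scheduling program

for failed in find_root_failures("report", upstream, status):
    print(failed, "->", owners.get(failed, "unassigned"))
```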

[0078] This application, upon receiving a task scheduling instruction, configures the parameter categories of the task scheduling using a task scheduling model and establishes dependencies with each upper and / or lower task scheduling layer. Then, it executes the task scheduling layers sequentially according to these dependencies. During the execution of each task scheduling layer, the parameter data corresponding to the parameter categories generated by the upper task scheduling layer is passed to the lower task scheduling layer. Finally, after all task scheduling layers have been executed, all parameter data is summarized and analyzed to determine the scheduling result. This improves the efficiency of task scheduling maintenance, management, and exception handling. Furthermore, it allows for data analysis of the entire task scheduling data based on the parameter data, optimizing the entire ETL task scheduling chain.

[0079] As shown in Figure 3, in some optional implementations of this embodiment, step S203 further includes the following steps:

[0080] Step S301: Summarize the abnormal parameter data of all task scheduling layers to obtain the overall abnormal parameter data.

[0081] In this embodiment, during the execution of the task scheduling layer, abnormal data can be transmitted to the lower task scheduling layer through dependencies. The lower task scheduling layer also collects and updates the current abnormal parameter data by comparing the runtime abnormal parameter data with the abnormal data of the upper task scheduling layer, and then transmits it further. This process continues until all task scheduling layers have completed the execution of the task scheduling program. Then, all the abnormal parameter data is summarized to obtain the overall abnormal parameter data.
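The collect-and-pass-on behavior described above can be sketched as follows. The record layout and the choice to sum values per category are assumptions made for illustration; the patent does not specify how abnormal parameter data is combined. `merge_anomalies` and `summarize` are hypothetical names.

```python
def merge_anomalies(received, layer, own):
    """Combine anomaly records received from the upper layer with this
    layer's own runtime anomalies, and return the updated collection."""
    updated = list(received)  # copy so the upper layer's data is not mutated
    for category, value in own.items():
        updated.append({"layer": layer, "category": category, "value": value})
    return updated


def summarize(anomalies):
    """Sum anomaly values per category to obtain the overall abnormal data."""
    overall = {}
    for record in anomalies:
        category = record["category"]
        overall[category] = overall.get(category, 0) + record["value"]
    return overall


passed = []
passed = merge_anomalies(passed, "layer_1", {"delay_minutes": 3})
passed = merge_anomalies(passed, "layer_2", {})  # no anomaly in this layer
passed = merge_anomalies(passed, "layer_3", {"delay_minutes": 8, "cpu_overrun": 120})
print(summarize(passed))
```

Keeping the originating layer in each record is what later allows the abnormal layer to be located from the overall data.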

[0082] Step S302: Determine whether the overall abnormal parameter data exceeds a preset threshold. If the determination result is yes, proceed to step S303. If the determination result is no, proceed to step S304 and end.

[0083] It's worth noting that some abnormal parameter data, if small, will have little impact on task scheduling and can be left unattended. For example, regarding time delays, if the entire task scheduling process is delayed by 2 minutes, it's considered to have little impact on task scheduling and can be left unattended. However, if it exceeds 10 minutes, it can be considered to have a significant impact and requires optimization.

[0084] Therefore, this step sets a threshold for each overall abnormal parameter data. If the threshold is exceeded, the process jumps to step S303 for intervention. If the threshold is not exceeded, the process ends.
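Steps S301 and S302 can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of this application: the names `THRESHOLDS`, `summarize_abnormal`, and `needs_optimization` are hypothetical, the 10-minute threshold is borrowed from the delay example above, and summation is assumed as the aggregation rule because the text leaves that rule open.

```python
from typing import Dict, List

# Hypothetical thresholds, one per abnormal parameter category.
THRESHOLDS: Dict[str, float] = {"delay_minutes": 10.0}


def summarize_abnormal(layers: List[Dict[str, float]]) -> Dict[str, float]:
    """Step S301: combine per-layer abnormal data into overall abnormal data.

    Summation is an assumption; other categories may need another rule.
    """
    overall: Dict[str, float] = {}
    for layer in layers:
        for category, value in layer.items():
            overall[category] = overall.get(category, 0.0) + value
    return overall


def needs_optimization(overall: Dict[str, float]) -> List[str]:
    """Step S302: return the categories whose overall value exceeds its threshold."""
    return [
        category
        for category, value in overall.items()
        if category in THRESHOLDS and value > THRESHOLDS[category]
    ]
```

An empty list from `needs_optimization` corresponds to step S304 (end); a non-empty list corresponds to step S303.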

[0085] Step S303: Output task optimization reminder.

[0086] In this embodiment, the output optimization reminder can be a text reminder, an audio broadcast reminder, or a light response reminder, etc.

[0087] Furthermore, the task scheduling layer that generated the abnormal parameter data is located based on the overall abnormal parameter data and the hierarchical relationship of the scheduling tasks in the task chain. Therefore, abnormalities can be quickly located.

[0088] Once the abnormal task scheduling layer is located, the corresponding notification object can be determined based on the attribution of the task scheduling program in the task scheduling layer that generated the abnormal parameter data, such as the department to which the abnormal scheduling program belongs. Finally, the abnormal parameter data of that task scheduling layer can be sent to the notification object, such as the owning department, so that staff can quickly contact that department for optimization.
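The localization and notification described in the two paragraphs above might look like the sketch below. All names (`LayerReport`, `locate_abnormal_layer`, `build_notification`) and the rule of picking the layer with the largest delay of its own are assumptions made for illustration; the text does not specify a selection rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LayerReport:
    name: str             # task scheduling program in this layer
    owner: str            # attribution, e.g. the owning department
    upstream: List[str]   # names of the upper layers this layer depends on
    delay_minutes: float  # abnormal data generated by this layer itself


def locate_abnormal_layer(
    reports: Dict[str, LayerReport], start: str
) -> Optional[LayerReport]:
    """Walk upstream from `start` and return the layer with the largest own delay."""
    worst: Optional[LayerReport] = None
    stack, seen = [start], set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        report = reports[name]
        if report.delay_minutes > 0 and (
            worst is None or report.delay_minutes > worst.delay_minutes
        ):
            worst = report
        stack.extend(report.upstream)
    return worst


def build_notification(report: LayerReport) -> Dict[str, object]:
    """Address the abnormal data to the owner recorded in the attribution."""
    return {
        "to": report.owner,
        "program": report.name,
        "delay_minutes": report.delay_minutes,
    }
```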

[0089] This application sets a threshold for abnormal parameter data and outputs an optimization reminder only when the abnormal parameter data exceeds the threshold. This avoids unnecessary optimization procedures. The abnormal task scheduling layer can also be located quickly based on the dependencies, so that the relevant personnel are contacted in time and optimization efficiency improves.

[0090] The ETL-based task scheduling method described above is illustrated below with an example, as shown in Figure 4. Task scheduling includes two task chains, 1 and 2. The smallest unit of the task scheduling hierarchy has three layers: the current task scheduling layer, the upper task scheduling layer, and the lower task scheduling layer. The upper task scheduling layer can extend upstream, ending at the source system synchronization program. The lower task scheduling layer can extend downstream, ending at the application end. The source system synchronization program mainly synchronizes data from the source system to the big data platform, for example through SQOOP synchronization or OGG synchronization. The application-side scheduler mainly synchronizes the data produced by the completed task scheduling to the application-side data storage, such as ES (a cluster), DRUID (a partitioned database based on a microservice architecture), or MySQL (a relational database management system).

[0091] The dependencies in task chain 1 and task chain 2 are as follows:

[0092] A lower-level task scheduler in task chain 1 depends on a single upper-level program. For example, in Figure 4, program B depends on program A1.

[0093] A lower-level task scheduler in task chain 2 depends on multiple upper-level programs. For example, in Figure 4, program C depends on programs A2, A3, and A4.

[0094] For task chain 1, after program A1 in the upper-level task scheduling layer completes, the parameter data can be directly passed to program B. For task chain 2, program C needs to obtain the parameter data of each program A2, A3, and A4, and then synthesize and summarize it to form a final parameter data. The statistics or summarization of parameter data depends on the type of parameter data. For example, if programs A2, A3, and A4 in the upper layer of program C all take longer than usual, then program C can calculate how much longer it has taken than usual.

[0095] For example, if program A2 is delayed by 30 minutes, program A3 by 40 minutes, and program A4 by 20 minutes, then by summarizing and taking the maximum value, we can determine that the lower-level program C will be delayed by at least 40 minutes compared to usual, and the subsequent task chain will be passed in the same way.
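The delay example above can be written out as a short sketch. The function name `inherited_delay` and the 15-minute delay assigned to program A1 are assumptions for illustration; the values for A2, A3, and A4 come from the example.

```python
from typing import Dict


def inherited_delay(upstream_delays: Dict[str, float]) -> float:
    """Delay a lower-level program inherits from its upper-level programs.

    The program cannot start until its slowest dependency has finished, so the
    inherited delay is the maximum of the upstream delays, not their sum.
    """
    return max(upstream_delays.values(), default=0.0)


# Task chain 1: program B depends only on A1, so the value passes straight through.
delay_b = inherited_delay({"A1": 15.0})

# Task chain 2: program C depends on A2, A3, and A4.
delay_c = inherited_delay({"A2": 30.0, "A3": 40.0, "A4": 20.0})
```

Here `delay_c` evaluates to 40.0, which matches the "at least 40 minutes" conclusion, and the same function can be applied again at each later layer of the chain.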

[0096] If it's status information, then the status information that the upper layer depends on can be passed to the lower layer. If a task scheduler encounters an error, the information passed from the upper layer can be used to determine whether the error was caused by an error in the upper layer. Therefore, the error can be located quickly.
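Passing status information down the chain can be sketched in the same way. The helper names `pass_down` and `upstream_failures` and the dictionary representation are hypothetical; the status strings follow the running statuses listed elsewhere in this description.

```python
from typing import Dict, Iterable, List

FAILED = "running failed"
SUCCEEDED = "running successfully"


def pass_down(
    received: Iterable[Dict[str, str]], name: str, own_status: str
) -> Dict[str, str]:
    """Merge the status maps received from upper layers and add this layer's status."""
    merged: Dict[str, str] = {}
    for statuses in received:
        merged.update(statuses)
    merged[name] = own_status
    return merged


def upstream_failures(passed: Dict[str, str], name: str) -> List[str]:
    """List the failed upper-level programs recorded in the map passed to `name`."""
    return sorted(p for p, s in passed.items() if p != name and s == FAILED)


# Program C receives one status map from each of A2, A3, and A4.
from_a2 = pass_down([], "A2", SUCCEEDED)
from_a3 = pass_down([], "A3", FAILED)
from_a4 = pass_down([], "A4", SUCCEEDED)
at_c = pass_down([from_a2, from_a3, from_a4], "C", FAILED)
culprits = upstream_failures(at_c, "C")
```

In this hypothetical run `culprits` contains only `"A3"`, which indicates that the failure of program C can be traced to an upper-level error rather than to program C alone.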

[0097] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing related hardware with computer-readable instructions. These computer-readable instructions can be stored in a computer-readable storage medium. When the instructions are executed, they can include the processes of the embodiments of the above methods. The aforementioned storage medium can be a non-volatile storage medium such as a magnetic disk, optical disk, or read-only memory (ROM), or it can be random access memory (RAM).

[0098] It should be understood that although the steps in the flowcharts of the accompanying figures are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the accompanying figures may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times, and their execution order is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the sub-steps or stages of other steps.

[0099] With further reference to Figure 5, as an implementation of the method shown in Figure 2 above, this application provides an embodiment of an ETL-based task scheduling device. This device embodiment corresponds to the method embodiment shown in Figure 2, and the device can be applied to various electronic devices.

[0100] As shown in Figure 5, the ETL-based task scheduling device 500 described in this embodiment includes a configuration module 501, an execution module 502, and an analysis module 503. Specifically:

[0101] The configuration module 501 is used to configure the parameter categories of task scheduling using the task scheduling model when a task scheduling instruction is received, and to define the dependencies with the upper and / or lower task scheduling layers of each task scheduling layer.

[0102] In this embodiment, a task scheduling instruction is received as indicated by a program command. For example, a task scheduling instruction may be triggered at midnight every day. Alternatively, a task scheduling instruction triggered by the user when needed may also be received.

[0103] Task scheduling is accomplished through multiple task scheduling layers. Each task scheduling layer executes the current task scheduler, with the first layer executing source data from the source database. Each task scheduling layer depends on the execution results of the task schedulers of the upper-level layers to execute its own task scheduler, and also needs to publish the execution results of its own task scheduler to the lower-level layers. Therefore, the configuration module 501 in this embodiment needs to use a scheduling model to define the dependencies between the task scheduling layers, so that each task scheduling layer can execute its task scheduler in an orderly manner. The scheduling model can be a publish / subscribe model, meaning that each task scheduling layer can subscribe to the upper-level task scheduling layer or publish to the lower-level task scheduling layer.
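A publish / subscribe style dependency model of the kind described above could be sketched as follows. The class `SchedulingModel` and its methods are hypothetical, and using the standard-library `graphlib.TopologicalSorter` to derive an execution order is an assumption of this sketch, not something stated in this application. The layer names follow the Figure 4 example.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


class SchedulingModel:
    """Records which upper layers each task scheduling layer subscribes to."""

    def __init__(self) -> None:
        self._subscriptions: Dict[str, Set[str]] = {}

    def add_layer(self, name: str) -> None:
        self._subscriptions.setdefault(name, set())

    def subscribe(self, lower: str, upper: str) -> None:
        """`lower` subscribes to the results that `upper` publishes."""
        self.add_layer(upper)
        self._subscriptions.setdefault(lower, set()).add(upper)

    def execution_order(self) -> List[str]:
        """Order the layers so that each one runs after the layers it depends on."""
        return list(TopologicalSorter(self._subscriptions).static_order())


model = SchedulingModel()
model.subscribe("B", "A1")            # task chain 1
for upper in ("A2", "A3", "A4"):      # task chain 2
    model.subscribe("C", upper)
order = model.execution_order()
```

The relative order of independent layers (for example A2 and A3) is not fixed by the dependencies, so only the constraint that an upper layer precedes its subscribers should be relied on.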

[0104] In this embodiment, the configuration module 501 can use a task scheduling model to define the task chains required for task scheduling, as well as the dependencies between the upper and / or lower task scheduling layers of each task scheduling layer in each task chain. Task scheduling includes multiple task chains, each of which executes its own task scheduling program in an orderly manner through different task scheduling layers to achieve task scheduling for each chain. The entire task scheduling is completed once all task chains are finished.

[0105] It should be understood that task chains can be independent or intersecting, depending on the needs of task scheduling.

[0106] In addition to defining the dependencies between task scheduling layers, configuration module 501 further configures the parameter categories for task scheduling. These parameter categories include at least one of the following: running status, runtime, the attribution of task programs within the task scheduling layer, the consumption of the upper-level task scheduling layer in each task chain, and the level flag of the task scheduling layer.

[0107] The running status can include the status of the task scheduler running in the task scheduling layer, specifically including information such as not running, running, running successfully, running failed, and offline.

[0108] Runtime includes the duration of the task scheduler running in the task scheduling layer, the comparison value with the average duration (such as the timeout value exceeding the average duration, the advance value of the time less than the average duration), and the time value delayed from the set start time.

[0109] Attribution includes the development unit to which the task scheduler belongs. For example, if task scheduler A was developed by department B, then the task scheduler belongs to department B. This attribution is mainly used to directly locate department B when an exception occurs during the execution of task scheduler A, so as to resolve the exception problem more efficiently.

[0110] The overhead of the upper-level task scheduling layer in each task chain mainly includes CPU consumption. This parameter category can be used in conjunction with attribution parameter categories. For example, if it is found that the CPU consumption of a task scheduler in a certain task scheduling layer is too high, the department to which the task scheduler belongs can be quickly contacted to optimize the task scheduler and thus optimize the task scheduling chain.

[0111] The level label of the task scheduling layer mainly refers to the importance level of the task scheduling layer, which can be determined by each task scheduling requirement.
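The parameter categories described above can be gathered into one record, as in the sketch below. The type names, field names, and units are assumptions chosen for illustration; only the categories themselves come from the description.

```python
from dataclasses import dataclass
from enum import Enum


class RunStatus(Enum):
    NOT_RUNNING = "not running"
    RUNNING = "running"
    SUCCEEDED = "running successfully"
    FAILED = "running failed"
    OFFLINE = "offline"


@dataclass
class ParameterData:
    status: RunStatus            # running status
    duration_minutes: float      # how long the task scheduler ran
    average_minutes: float       # historical average used for comparison
    start_delay_minutes: float   # delay from the set start time
    owner: str                   # attribution (development unit)
    cpu_seconds: float           # consumption; CPU is the main item named in the text
    level: int                   # importance level of the task scheduling layer

    @property
    def deviation_minutes(self) -> float:
        """Positive for a timeout over the average, negative for an early finish."""
        return self.duration_minutes - self.average_minutes
```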

[0112] The execution module 502 is used to execute the task scheduling layer sequentially according to the dependency relationship, and during the execution of the task scheduling layer, it passes the parameter data corresponding to the parameter category generated by the upper task scheduling layer to the lower task scheduling layer.

[0113] In each task chain, the execution module 502 transmits the parameter data corresponding to the parameter category generated by the upper task scheduling layer to the lower task scheduling layer. For example, if the upper task scheduling layer takes 10 minutes to execute its task scheduling program, this 10-minute time is transmitted to the next task scheduling layer. The next task scheduling layer also calculates its own time and adds it to the time of the previous task scheduling layer to obtain the total time. This total time is then transmitted to the next task scheduling layer, and so on, until all task scheduling layers have completed their execution. The total time can then be obtained on the terminal.
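The accumulation of time along a task chain can be illustrated with a few lines of code. The function name `propagate_total_time` is hypothetical, and the 5-minute and 8-minute values for the second and third layers are made up for the example; only the 10-minute first layer comes from the text.

```python
from typing import Iterable, List


def propagate_total_time(layer_minutes: Iterable[float]) -> List[float]:
    """Return the running total that each layer passes to the next one."""
    totals: List[float] = []
    received = 0.0                 # the first layer receives nothing from above
    for own in layer_minutes:
        received += own            # add this layer's own time to what it received
        totals.append(received)    # value passed to the next layer
    return totals


totals = propagate_total_time([10.0, 5.0, 8.0])
total_at_terminal = totals[-1]
```

With these inputs the values passed along the chain are 10, 15, and 23 minutes, so the total available at the terminal is 23 minutes.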

[0114] The execution module 502 transmits the parameter data from the task scheduling layer to the next task scheduling layer, thereby enabling the propagation of parameter data along the task chain. This allows the acquisition of the parameter data required for task scheduling, providing a data foundation for evaluating the results of the task scheduling.

[0115] The analysis module 503 is used to summarize and analyze all the parameter data after all the task scheduling layers have been completed, so as to determine the scheduling result of the task scheduling.

[0116] The analysis module 503 can quickly locate the specific task scheduling layer when problems occur, based on the parameter data transmitted between task scheduling layers. For example, if a task scheduling fails, since each task scheduling layer transmits the result (failure or success) of its task scheduling program to the dependent lower-level task scheduling layers, the failed task scheduling program can be located based on the dependency relationship, enabling rapid problem localization. Furthermore, it can determine the department to which the failed task scheduling program belongs based on its attribution, allowing for rapid notification of the relevant department for correction and optimization. This improves the efficiency of task scheduling maintenance, management, and exception handling, and also enables data analysis of the entire task scheduling data based on parameter data, optimizing the entire ETL task scheduling chain.

[0117] In some optional implementations of this embodiment, the analysis module 503 is further used to summarize the abnormal parameter data of all task scheduling layers to obtain the overall abnormal parameter data, and then determine whether the overall abnormal parameter data exceeds a preset threshold. Finally, if the determination result is yes, it is determined that the task scheduling needs to be optimized, and a task optimization reminder is output.

[0118] In some optional implementations of this embodiment, the analysis module 503 is further configured to locate the task scheduling layer that generated the abnormal parameter data, based on the overall abnormal parameter data and the hierarchical relationship of the scheduling tasks in the task chain.

[0119] In some optional implementations of this embodiment, the ETL-based task scheduling device 500 further includes a first storage module 504 and a second storage module 505. The first storage module 504 stores the source data of task scheduling in a source database. The second storage module 505 transmits the terminal data obtained after executing all task scheduling layers to the application database and displays it according to preset rules.

[0120] To address the aforementioned technical problems, embodiments of this application also provide a computer device. Please refer to Figure 6, which is a basic structural block diagram of the computer device in this embodiment.

[0121] The computer device 6 includes a memory 61, a processor 62, and a network interface 63 that are interconnected via a system bus. It should be noted that only the computer device 6 with components 61-63 is shown in the figure; however, it should be understood that it is not required to implement all the shown components, and more or fewer components can be implemented alternatively. Those skilled in the art will understand that the computer device described here is a device capable of automatically performing numerical calculations and / or information processing according to pre-set or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, etc.

[0122] The computer device can be a desktop computer, laptop, handheld computer, or cloud server, etc. The computer device can interact with the user via a keyboard, mouse, remote control, touchpad, or voice control.

[0123] The memory 61 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as the hard disk or memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, smart memory card (SMC), secure digital card (SD), flash memory card, etc. Of course, the memory 61 may include both internal storage units and external storage devices of the computer device 6. In this embodiment, the memory 61 is typically used to store the operating system and various application software installed on the computer device 6, such as the computer-readable instructions of the ETL-based task scheduling method. In addition, the memory 61 can also be used to temporarily store various types of data that have been output or will be output.

[0124] In some embodiments, the processor 62 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or other data processing chip. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is used to execute computer-readable instructions stored in the memory 61 or to process data, for example, to execute computer-readable instructions of the ETL-based task scheduling method.

[0125] The network interface 63 may include a wireless network interface or a wired network interface, which is typically used to establish communication connections between the computer device 6 and other electronic devices.

[0126] This embodiment implements an ETL-based task scheduling method, which can improve the efficiency of task scheduling maintenance, management, and exception handling. At the same time, it can perform data analysis on the entire task scheduling data based on parameter data to optimize the entire ETL task scheduling chain.

[0127] This application also provides another embodiment, namely, providing a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor to cause the at least one processor to perform the steps of the ETL-based task scheduling method described above.

[0128] This application, upon receiving a task scheduling instruction, configures the parameter categories of the task scheduling using a task scheduling model and establishes dependencies with each upper and / or lower task scheduling layer. Then, it executes the task scheduling layers sequentially according to these dependencies. During the execution of each task scheduling layer, the parameter data corresponding to the parameter categories generated by the upper task scheduling layer is passed to the lower task scheduling layer. Finally, after all task scheduling layers have been executed, all parameter data is summarized and analyzed to determine the scheduling result. This improves the efficiency of task scheduling maintenance, management, and exception handling. Furthermore, it allows for data analysis of the entire task scheduling data based on the parameter data, optimizing the entire ETL task scheduling chain.

[0129] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk), and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of this application.

[0130] Obviously, the embodiments described above are only some embodiments of this application, not all embodiments. The accompanying drawings show preferred embodiments of this application, but do not limit the patent scope of this application. This application can be implemented in many different forms and is not limited to the embodiments described herein; rather, the purpose of providing these embodiments is to provide a more thorough and comprehensive understanding of the disclosure of this application. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or make equivalent substitutions for some of the technical features. Any equivalent structures made using the content of this application's specification and drawings, whether applied directly or indirectly to other related technical fields, are likewise within the scope of patent protection of this application.

Claims

1. A task scheduling method based on ETL, characterized in that the method includes the following steps: when a task scheduling instruction is received, the parameter categories for task scheduling are configured using the task scheduling model, and the dependencies with the upper and / or lower task scheduling layers of each task scheduling layer are established; the task scheduling layers are executed sequentially according to the dependencies, and during the execution of the task scheduling layers, the parameter data corresponding to the parameter category generated by the upper task scheduling layer is passed to the lower task scheduling layer; after all the task scheduling layers have been executed, all parameter data are summarized and analyzed to determine the scheduling result of the task scheduling; wherein the step of summarizing and analyzing all parameter data to determine the scheduling result of the task scheduling includes: summarizing the abnormal parameter data of all task scheduling layers to obtain the overall abnormal parameter data; determining whether the overall abnormal parameter data exceeds a preset threshold; and, if the determination result is yes, outputting a task optimization reminder.

2. The task scheduling method according to claim 1, characterized in that the step of establishing dependencies between each task scheduling layer and its upper and / or lower task scheduling layers includes: defining the task chain; and defining the dependencies between the upper and / or lower task scheduling layers of each task scheduling layer in each task chain, and defining the parameter categories of the upper task scheduling layers that each task scheduling layer depends on.

3. The task scheduling method according to claim 2, characterized in that the step of defining the task chain includes: determining the functions required for task scheduling based on the task scheduling instruction; and determining the corresponding task chain based on those functions.

4. The task scheduling method according to claim 2, characterized in that the parameter categories include running status, runtime, the attribution of task programs in the task scheduling layer, the consumption of the upper task scheduling layer in each task chain, and the level marker of the task scheduling layer; and the step of passing the parameter data corresponding to the parameter category generated by the upper task scheduling layer to the lower task scheduling layer includes: in each task chain, the upper task scheduling layer generates corresponding parameter data according to the defined parameter category; and the parameter data is then passed to the lower task scheduling layer.

5. The task scheduling method according to claim 2, characterized in that the step of outputting the task optimization reminder includes: locating the task scheduling layer that generated the abnormal parameter data, based on the overall abnormal parameter data and the hierarchical relationship of the scheduling tasks in the task chain; determining the corresponding reminder object based on the attribution of the task scheduler in the task scheduling layer that generated the abnormal parameter data; and sending the abnormal parameter data of that task scheduling layer to the reminder object.

6. The task scheduling method according to claim 1, characterized in that, before the step of configuring the parameter categories for task scheduling using the task scheduling model, the method includes the following step: storing the source data for task scheduling in the source database; and, after the step of summarizing and analyzing all the parameter data, the method includes the following step: transmitting the terminal data obtained after executing all task scheduling layers to the application database and displaying it according to preset rules.

7. A task scheduling device based on ETL, characterized in that the task scheduling device includes: a configuration module, used to configure the parameter categories of task scheduling using the task scheduling model when a task scheduling instruction is received, and to define the dependencies with the upper and / or lower task scheduling layers of each task scheduling layer; an execution module, used to execute the task scheduling layers sequentially according to the dependencies, and, during the execution of the task scheduling layers, to pass the parameter data corresponding to the parameter category generated by the upper task scheduling layer to the lower task scheduling layer; and an analysis module, used to summarize and analyze all parameter data after all the task scheduling layers have been executed, so as to determine the scheduling result of the task scheduling; wherein the analysis module is further configured to summarize the abnormal parameter data of all task scheduling layers to obtain overall abnormal parameter data, determine whether the overall abnormal parameter data exceeds a preset threshold, and output a task optimization reminder if the determination result is yes.

8. A computer device, characterized in that the computer device includes a memory and a processor, the memory storing computer-readable instructions, and the processor executing the computer-readable instructions to implement the steps of the task scheduling method as described in any one of claims 1 to 6.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-readable instructions which, when executed by a processor, implement the steps of the task scheduling method as described in any one of claims 1 to 6.