ETL process execution method based on asynchronous mode

An ETL process execution method based on asynchronous mode, applied in the field of medical information system data processing. It addresses the high CPU usage of traditional ETL tasks and reduces the number of threads required.

Pending Publication Date: 2022-01-11
中电四川数据服务有限公司

AI Technical Summary

Problems solved by technology

[0007] The present invention provides an ETL process execution method based on asynchronous mode. Applied in the field of medical information system data processing, it solves the problem of high CPU usage by traditional ETL tasks in data ETL scenarios.



Examples


Embodiment 1

[0031] As shown in Figures 1 to 4, this embodiment provides a method for executing an ETL process based on an asynchronous mode, comprising the following steps:

[0032] Step 1: Configuration parsing;

[0033] Step 2: task initialization;

[0034] Step 3: Run the task. Set the task status to the start status and send the start status to the Actor container of each associated node through the Actor Manager; this marks the associated nodes as ready for data processing;

[0035] The start status is then sent to the start node, which, on receiving the message, starts event processing and reads data; the task status is now marked as running;
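The status handling in [0034] and [0035] can be pictured as a small state machine. The following Python sketch is illustrative only: the names `TaskState`, `TaskStatus`, `CREATED`, and `FINISHED` and the transition table are assumptions, and the `outbox` list merely stands in for delivery through the Actor Manager, whose interface the source does not describe.

```python
from enum import Enum


class TaskStatus(Enum):
    # Hypothetical names: the source explicitly mentions only the "start"
    # and "running" states, plus a finished task in step 4.
    CREATED = "created"
    STARTED = "started"
    RUNNING = "running"
    FINISHED = "finished"


# Assumed legal transitions for this sketch (not specified in the source).
ALLOWED = {
    TaskStatus.CREATED: {TaskStatus.STARTED},
    TaskStatus.STARTED: {TaskStatus.RUNNING},
    TaskStatus.RUNNING: {TaskStatus.FINISHED},
    TaskStatus.FINISHED: set(),
}


class TaskState:
    def __init__(self, start_node, associated_nodes):
        self.status = TaskStatus.CREATED
        self.start_node = start_node
        self.associated_nodes = list(associated_nodes)
        self.ready_nodes = set()
        # (target, message) pairs that a real Actor Manager would deliver.
        self.outbox = []

    def _transition(self, new_status):
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"illegal transition {self.status.name} -> {new_status.name}")
        self.status = new_status

    def start(self):
        # [0034]: enter the start status and notify every associated node,
        # which marks that node as ready for data processing.
        self._transition(TaskStatus.STARTED)
        for node in self.associated_nodes:
            self.outbox.append((node, "start"))
            self.ready_nodes.add(node)
        # [0035]: then notify the start node; the task is now running.
        self.outbox.append((self.start_node, "start"))
        self._transition(TaskStatus.RUNNING)

    def finish(self):
        self._transition(TaskStatus.FINISHED)
```

With `TaskState("reader", ["filter", "writer"])`, calling `start()` leaves the task in `RUNNING` with the start messages queued in the order the text describes: associated nodes first, then the start node.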

[0036] The start node obtains the data, assembles it into a message, and sends the message to the message bus using the address obtained in step 2. The message bus processor finds the message queue of the associated node through the Actor Manager and pushes the message onto that queue. After receiving the message, the associated...
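A minimal sketch of the routing in [0036], assuming Python's `asyncio`: each node's message queue is an `asyncio.Queue`, the hypothetical `ActorManager` maps a channel id to that queue, and the hypothetical `MessageBus` looks the queue up and pushes the message onto it. The end-of-stream sentinel and the `payload` message shape are also assumptions; the source does not specify a message format.

```python
import asyncio


class ActorManager:
    """Hypothetical registry mapping a channel id to an actor's queue."""

    def __init__(self):
        self._mailboxes = {}

    def register(self, channel_id):
        self._mailboxes[channel_id] = asyncio.Queue()
        return self._mailboxes[channel_id]

    def mailbox(self, channel_id):
        return self._mailboxes[channel_id]


class MessageBus:
    """Finds the target node's queue via the manager and pushes the message."""

    def __init__(self, manager):
        self._manager = manager

    async def send(self, channel_id, message):
        await self._manager.mailbox(channel_id).put(message)


END = object()  # assumed sentinel marking the end of the data stream


async def start_node(bus, send_channels, rows):
    # Read data, wrap each row in a message, and send it to every channel
    # address the start node received during task initialization.
    for row in rows:
        for channel_id in send_channels:
            await bus.send(channel_id, {"payload": row})
    for channel_id in send_channels:
        await bus.send(channel_id, END)


async def sink_node(mailbox, collected):
    # An associated node waits on its own queue and processes each message.
    while True:
        message = await mailbox.get()
        if message is END:
            return
        collected.append(message["payload"])


async def run_task(rows):
    manager = ActorManager()
    bus = MessageBus(manager)
    sink_box = manager.register("sink-1")  # hypothetical channel id
    collected = []
    await asyncio.gather(
        sink_node(sink_box, collected),
        start_node(bus, ["sink-1"], rows),
    )
    return collected
```

For example, `asyncio.run(run_task([1, 2, 3]))` returns `[1, 2, 3]`: the rows travel from the start node through the bus into the associated node's queue, and neither node blocks a thread while waiting.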

Embodiment 2

[0049] Step 1 specifically includes passing in a preset JSON configuration and parsing it with a JSON parser; the configuration includes the task running parameters, all nodes of the task, the connections between nodes, and the flow conditions.
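As an illustration of step 1, the sketch below parses a configuration into typed objects. The JSON field names (`params`, `nodes`, `links`, `from`, `to`, `condition`) and the class names are assumptions, since the source does not publish its schema; flow conditions are kept as opaque dictionaries rather than evaluated.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class NodeConfig:
    node_id: str
    node_type: str


@dataclass
class LinkConfig:
    source: str
    target: str
    # Flow condition kept as raw data; its schema is an assumption.
    condition: Optional[Dict[str, Any]] = None


@dataclass
class TaskConfig:
    params: Dict[str, Any]                                 # task running parameters
    nodes: Dict[str, NodeConfig]                           # all nodes of the task, by id
    links: List[LinkConfig] = field(default_factory=list)  # connections between nodes


def parse_config(text: str) -> TaskConfig:
    """Parse a hypothetical JSON task configuration into typed objects."""
    raw = json.loads(text)
    nodes = {n["id"]: NodeConfig(n["id"], n["type"]) for n in raw["nodes"]}
    links = []
    for item in raw.get("links", []):
        # Reject connections that reference nodes missing from "nodes".
        for end in (item["from"], item["to"]):
            if end not in nodes:
                raise ValueError(f"link references unknown node {end!r}")
        links.append(LinkConfig(item["from"], item["to"], item.get("condition")))
    return TaskConfig(raw.get("params", {}), nodes, links)
```

Validating connections at parse time means a malformed flow fails before any Actor container is created in step 2.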

[0050] Step 2 specifically includes: starting from the start node, querying its associated nodes, creating a data receiving channel id for each associated node, and adding that id to the data sending channel set of the start node, so that the start node holds the channel addresses of its associated nodes; then creating an Actor execution container for each node in a loop and registering it in the Actor Manager.
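Step 2 can be sketched as follows. This is an illustration under assumptions: `ActorContainer`, `ActorManager`, and `initialize` are hypothetical names, channel ids are generated with `uuid.uuid4()`, and the sketch extends the channel wiring described for the start node to every node reachable from it.

```python
import uuid
from collections import defaultdict


class ActorContainer:
    """Hypothetical execution container for one node."""

    def __init__(self, node_id, receive_channel):
        self.node_id = node_id
        self.receive_channel = receive_channel  # data receiving channel id
        self.send_channels = set()              # data sending channel set


class ActorManager:
    """Hypothetical registry of containers, keyed by receiving channel id."""

    def __init__(self):
        self.containers = {}

    def register(self, container):
        self.containers[container.receive_channel] = container


def initialize(start_node, links, manager):
    """links: iterable of (source, target) node-id pairs from the parsed config."""
    successors = defaultdict(list)
    nodes = {start_node}
    for source, target in links:
        successors[source].append(target)
        nodes.update((source, target))

    # One data receiving channel id per node (uuid4 is an assumption).
    channel_of = {node: str(uuid.uuid4()) for node in nodes}
    containers = {node: ActorContainer(node, channel_of[node]) for node in nodes}

    # Walk from the start node, adding each associated node's receiving
    # channel id to the current node's sending channel set.
    stack, visited = [start_node], set()
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        for target in successors[node]:
            containers[node].send_channels.add(channel_of[target])
            stack.append(target)

    # Register every container with the Actor Manager.
    for container in containers.values():
        manager.register(container)
    return containers
```

After `initialize("reader", [("reader", "filter"), ("filter", "writer")], manager)`, the `reader` container holds exactly the receiving channel id of `filter`, and the manager has three registered containers.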

[0051] In practice, the method of this solution has low resource consumption: a large number of process nodes can be run with a small number of threads.
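The claim that many process nodes can run on few threads is easiest to see with coroutines. In this hedged Python sketch, `asyncio` stands in for whatever actor runtime the patent uses (the source does not name one); `process_node` and `run_chain` are hypothetical. A chain of nodes each adds 1 to the values passing through, and every node records the OS thread it runs on.

```python
import asyncio
import threading


async def process_node(inbox, outbox, seen_threads):
    # A process node is a coroutine rather than a thread. It records the OS
    # thread it runs on, adds 1 to each value and forwards it downstream.
    seen_threads.add(threading.get_ident())
    while True:
        value = await inbox.get()
        if value is None:  # None marks the end of the stream in this sketch
            await outbox.put(None)
            return
        await outbox.put(value + 1)


async def run_chain(node_count, values):
    # node_count nodes connected in a line by node_count + 1 queues.
    queues = [asyncio.Queue() for _ in range(node_count + 1)]
    seen_threads = set()
    tasks = [
        asyncio.create_task(process_node(queues[i], queues[i + 1], seen_threads))
        for i in range(node_count)
    ]
    for value in list(values) + [None]:
        await queues[0].put(value)
    results = []
    while (item := await queues[-1].get()) is not None:
        results.append(item)
    await asyncio.gather(*tasks)
    return results, len(seen_threads)
```

`asyncio.run(run_chain(1000, [0, 10]))` returns `([1000, 1010], 1)`: all 1,000 nodes ran on one thread, whereas the thread-per-node design described in the background would start a sub-thread for each of them.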



Abstract

The invention discloses an ETL process execution method based on an asynchronous mode. It relates to the field of medical information system data processing and addresses two technical problems: pausing, resuming, and stopping threads in existing Java-based implementations consumes resources, and Spark-like frameworks require substantial resources to run. The method comprises the following steps: step 1, parsing the configuration; step 2, initializing the task; step 3, running the task; and step 4, after the task finishes, summarizing the task results and sending them to a task result processing module. Actors communicate asynchronously: after an Actor sends a message, it can continue handling other work without blocking or waiting. Each Actor has a mailbox that stores its messages. When the Actor scheduler detects that an Actor has a message, it invokes that Actor's event handling; the Actor processes the event and, once processing is complete, sends a result message to other Actors.
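The mailbox-and-scheduler behavior described in the abstract can be reduced to a few lines. This Python sketch is a simplification, not the patent's implementation: `Actor` and `Scheduler` are hypothetical, the scheduler is a single-threaded loop that dispatches only actors with non-empty mailboxes, and handlers return the messages they want sent rather than sending them directly.

```python
from collections import deque


class Actor:
    """Hypothetical actor: a mailbox plus a handler that may emit messages."""

    def __init__(self, name, handler):
        self.name = name
        self.mailbox = deque()  # message storage for this actor
        self.handler = handler  # handler(message) -> list of (target, message)


class Scheduler:
    """Single-threaded scheduler that only calls actors holding messages."""

    def __init__(self):
        self.actors = {}

    def add(self, actor):
        self.actors[actor.name] = actor

    def send(self, target, message):
        # Sending only enqueues; the sender never blocks or waits for a reply.
        self.actors[target].mailbox.append(message)

    def run(self):
        handled = 0
        while True:
            # Find the actors that currently have at least one message.
            ready = [a for a in self.actors.values() if a.mailbox]
            if not ready:
                return handled
            for actor in ready:
                message = actor.mailbox.popleft()
                # Process the event, then forward any result messages.
                for target, out in actor.handler(message):
                    self.send(target, out)
                handled += 1
```

In a usage example, a `double` actor forwards `m * 2` to a `collect` actor that appends to a list. After sending 1, 2, and 3 to `double`, `run()` returns 6 (messages handled) and the list holds `[2, 4, 6]`; a count like this is the kind of figure step 4 could include in its task summary, though the source does not say what the summary contains.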

Description

Technical field

[0001] The invention relates to the field of data processing for medical information systems, and more specifically to an ETL process execution method based on an asynchronous mode.

Background

[0002] At present, ETL tasks in related patents are mainly executed as follows. Tools like Kettle use one thread per task and start a sub-thread for each node, so a single task may start dozens or hundreds of threads. This approach suffers from high CPU usage and long task startup times, and a single machine cannot execute many tasks. Spark-like frameworks, on the other hand, require a lot of resources for computation.

[0003] The existing disadvantages are as follows:

[0004] 1. When an ETL task process is executed in the Java language, each node of the task process running in the main thread starts a sub-thread for data processing. In this case, a task may start dozens or hundreds of sub-threads, which will lead to a very high cp...


Application Information

Patent type & authority: Application (China)
IPC (8): G06F16/25; G06F9/54
CPC: G06F16/254; G06F9/546; G06F9/544; G06F2209/548
Inventor: 肖渊
Owner: 中电四川数据服务有限公司