Multithreading data processing method based on ETL (Extract Transform Loading)

A data processing and multithreading technique, applied in the field of data processing, that addresses low utilization of hardware resources, low data throughput, and slow extraction. It reduces the probability of ETL paralysis, improves fault tolerance, and improves throughput and extraction speed.

Active Publication Date: 2010-11-10
山东中创软件商用中间件股份有限公司

AI Technical Summary

Problems solved by technology

[0006] In view of this, the present invention provides an ETL-based multi-threaded data processing method to solve the problems of



Examples


Detailed Description of the Embodiments

[0025] An embodiment of the present invention discloses an ETL-based multithreaded data processing method, which includes: dividing the ETL data extraction process into three distinct stages, namely extraction, sending, and synchronization, and using independent threads to execute the following four steps in parallel:

[0026] Step 10: The data extraction unit starts an extraction thread, which extracts the source table data in real time according to the rules, packages the data, and stores it in the to-be-sent message queue. If an error occurs during extraction, the erroneous data is sent to the error data message queue;

[0027] Step 11: The data sending unit starts a sending thread, which polls the to-be-sent message queue in a loop. When the queue contains data to be sent, the thread sends the data to the to-be-synchronized message queue. If an error occurs during sending, the erroneous data is sent to the error data message queue;

[0028] Step 1...
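The staged, queue-connected design described in the steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `STOP` sentinel, the dictionary used as the synchronization target, and the "row must have an id" rule are all hypothetical. The synchronization stage is inferred from the stage names in paragraph [0025], since the remaining steps are truncated in this text. The sketch also uses a blocking `Queue.get()` rather than the polling loop the text describes, and it drains the error queue at the end instead of handling it in its own thread.

```python
import queue
import threading

STOP = object()  # hypothetical end-of-stream marker, not part of the source text


def extract(source_rows, to_send, errors):
    """Step 10 (sketch): package each source row and queue it for sending."""
    for row in source_rows:
        try:
            if "id" not in row:  # stand-in for the patent's extraction rules
                raise ValueError("row has no id")
            to_send.put({"id": row["id"], "payload": row})
        except Exception as exc:
            # A bad row goes to the error queue; extraction continues.
            errors.put(("extract", row, str(exc)))
    to_send.put(STOP)


def send(to_send, to_sync, errors):
    """Step 11 (sketch): move packaged data to the to-be-synchronized queue."""
    while True:
        msg = to_send.get()  # blocks until data is available
        if msg is STOP:
            to_sync.put(STOP)
            return
        try:
            to_sync.put(msg)
        except Exception as exc:
            errors.put(("send", msg, str(exc)))


def synchronize(to_sync, target, errors):
    """Assumed synchronization stage: apply each message to the target store."""
    while True:
        msg = to_sync.get()
        if msg is STOP:
            return
        try:
            target[msg["id"]] = msg["payload"]
        except Exception as exc:
            errors.put(("sync", msg, str(exc)))


def run_pipeline(source_rows):
    """Run the three stages in their own threads and collect the results."""
    to_send, to_sync, errors = queue.Queue(), queue.Queue(), queue.Queue()
    target = {}
    threads = [
        threading.Thread(target=extract, args=(source_rows, to_send, errors)),
        threading.Thread(target=send, args=(to_send, to_sync, errors)),
        threading.Thread(target=synchronize, args=(to_sync, target, errors)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    failed = []
    while not errors.empty():
        failed.append(errors.get())
    return target, failed
```

Calling `run_pipeline` with a list of row dictionaries returns the synchronized rows together with the error records. A row that fails in one stage is diverted to the error queue while the other rows keep flowing, which is the fault-tolerance property the method aims for.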


Abstract

The invention discloses an ETL (Extract, Transform, Load) based multithreaded data processing method, comprising the following steps: dividing the ETL data extraction process into three distinct stages (extraction, sending, and synchronization), executing the extraction, sending, and synchronization of data in parallel using separate independent threads, and persisting error data. By using a multithreaded processing framework, the invention parallelizes the ETL data extraction process and greatly improves throughput, extraction rate, and utilization of hardware resources. By handling the error data generated during the extraction, sending, and synchronization of data, it also improves fault tolerance and reduces the probability that an error during data extraction paralyzes the whole ETL process.
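The "persisting error data" step mentioned in the abstract could look something like the sketch below. The source does not say where or in what format error data is persisted, so the JSON-lines file, the `persist_errors` and `demo` names, the record fields, and the `stop_event` shutdown signal are all assumptions made for illustration.

```python
import json
import os
import queue
import tempfile
import threading


def persist_errors(errors, path, stop_event):
    """Drain the error queue and append each record to a JSON-lines file.

    The loop exits only once shutdown has been requested AND the queue is
    empty, so records queued before shutdown are still written.
    """
    with open(path, "a", encoding="utf-8") as fh:
        while not (stop_event.is_set() and errors.empty()):
            try:
                record = errors.get(timeout=0.05)
            except queue.Empty:
                continue
            fh.write(json.dumps(record) + "\n")
            fh.flush()  # keep the file current if the process dies later


def demo():
    """Queue two hypothetical error records and read them back from disk."""
    errors = queue.Queue()
    stop_event = threading.Event()
    path = os.path.join(tempfile.mkdtemp(), "etl_errors.jsonl")
    worker = threading.Thread(
        target=persist_errors, args=(errors, path, stop_event)
    )
    worker.start()
    errors.put({"stage": "extract", "row": {"v": "bad"}, "reason": "missing id"})
    errors.put({"stage": "send", "row": {"id": 7}, "reason": "timeout"})
    stop_event.set()
    worker.join()
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh]
```

Writing errors from a dedicated thread keeps slow disk I/O off the extraction, sending, and synchronization threads, and the persisted records can be inspected or replayed later without stopping the pipeline.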

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to an ETL-based multithreaded data processing method.

Background

[0002] The data warehouse is a comprehensive data platform currently in wide use by enterprises, and it is an independent data environment. Unlike relational databases, data warehouse technology has no strict mathematical foundation; it is oriented toward practical engineering applications and requires a data extraction mechanism to import data into the data warehouse from online transaction processing environments, external data sources, and offline data storage media. Data extraction, transformation, and loading (ETL, Extraction-Transformation-Loading) is therefore a very important part of the data warehouse and a necessary link between the data sources and the warehouse.

[0003] ETL is responsible for extracting data from distributed and heterogeneous data sources, such as relational ...


Application Information

IPC(8): G06F17/30
Inventor: 周钢, 陈俊
Owner 山东中创软件商用中间件股份有限公司