Method for classifying and evaluating ETL job scheduling resources in distributed environment

A distributed-environment job scheduling technology, applied in the network field, that addresses the waste of ETL server cluster resources and improves the efficiency of ETL job execution.

Active Publication Date: 2020-05-12
NO 30 INST OF CHINA ELECTRONIC TECH GRP CORP

AI Technical Summary

Problems solved by technology

In a medium-scale data warehouse in the field of network operation and maintenance, the number of ETL data connection tasks that are ready to run is generally kept above 100 in order to serve the various data analysis needs and the daily, weekly, monthly and other statistics. The ETL server cluster generally consists of about twelve ordinary computers or virtual machines. If the strategy of "whoever is idle handles it" is simply adopted, the resources of the ETL server cluster are wasted, as shown in Figure 1.
[0003] Under normal circumstances, data connection tasks fall into three main types: data extraction, data conversion, and data loading. Each type places a different emphasis on CPU, memory, and I/O resources. If an ETL server spends a long time on ETL operations that take up a lot of I/O resources while its CPU computing resources remain idle, that idle CPU is a great waste from the point of view of ETL jobs that need CPU computing resources. Similarly, on a computer that is busy with memory-intensive operations, idle I/O capacity is wasted with respect to ETL jobs that only need simple data replication between tables. If ETL jobs can be accurately matched to the idle resources of the ETL servers, the execution efficiency of ETL tasks will be greatly improved. This requires first determining the type of each ETL job, then accurately evaluating the service capacity of each ETL server's idle resources, and finally accurately matching the ETL jobs to be executed with the ETL servers, as shown in Figure 2.
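
The difference between the "whoever is idle handles it" strategy and type-aware matching can be sketched in Python. This is a minimal illustration only, not the procedure claimed in the patent: the `Server` class, the idle fractions, the 0.2 threshold, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Server:
    """Hypothetical ETL server record: idle fraction (0..1) per resource."""
    name: str
    idle: Dict[str, float]  # keys assumed here: "cpu", "memory", "io"


def pick_first_idle(servers: List[Server], threshold: float = 0.2) -> Optional[Server]:
    """Baseline strategy: take the first server with ANY resource idle.

    The job type is ignored, so an I/O-heavy job may land on a server
    whose only spare capacity is CPU.
    """
    for server in servers:
        if any(fraction > threshold for fraction in server.idle.values()):
            return server
    return None


def pick_best_match(servers: List[Server], job_type: str) -> Server:
    """Type-aware strategy: take the server with the most idle capacity
    in the resource that the job type mainly consumes."""
    return max(servers, key=lambda server: server.idle[job_type])


servers = [
    Server("etl-1", {"cpu": 0.8, "memory": 0.5, "io": 0.1}),
    Server("etl-2", {"cpu": 0.2, "memory": 0.4, "io": 0.9}),
]

# An I/O-heavy job: the baseline picks etl-1 (idle CPU, busy I/O),
# while the type-aware choice is etl-2 (idle I/O).
baseline = pick_first_idle(servers)
matched = pick_best_match(servers, "io")
```

Here the baseline sends the I/O-heavy job to a server whose I/O is already busy, which is the waste the passage describes; the type-aware choice uses the idle I/O of the other server instead.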

Examples

Embodiment Construction

[0019] As shown in Figure 3, the method of the present invention for classifying and evaluating ETL job scheduling resources in a distributed environment includes the following steps:

[0020] Step 1, determining the index system for evaluating the performance of the ETL server;

[0021] The indicators for evaluating the performance of the ETL server are divided into three categories: CPU, memory, and I/O. The CPU indicators include the CPU utilization rate and the occupancy rate of the CPU process queue; the memory indicators include [...] waiting rate and the redo buffer non-waiting rate; the I/O indicators include the I/O busy rate. The complete index system for evaluating the performance of the ETL server is shown in Table 1.

[0022] Table 1:

[0023]

[0024]

[0025] Step 2, determining the ETL job type by calculating the comprehensive evaluation value of the ETL job;

[0026] According to the index system for evaluating ETL server performance, ET...

Abstract

The invention discloses a method for classifying and evaluating ETL job scheduling resources in a distributed environment. The method comprises the following steps: 1, determining an index system for evaluating the performance of an ETL server; 2, determining the ETL job type; 3, performing cluster analysis on the ETL server index data based on the index system to obtain ETL server candidate sets classified to correspond to the ETL job types; 4, establishing an index evaluation matrix, and calculating the information entropy of each index and the weight derived from that entropy; 5, sorting the ETL server candidate set of each type; and 6, determining the type of each ETL job by the calculation of step 2, and then selecting a top-ranked ETL server from the candidate set of the corresponding classification. The method uses cluster analysis and an information-entropy-based evaluation to assess ETL server performance and automatically matches ETL jobs that have not yet entered the execution state, so that the idle resources of the ETL servers can be fully utilized and computing resources are allocated dynamically in near real time.
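
Steps 4 and 5 can be illustrated with the standard entropy weight method. The sketch below is an assumption about how such weights are commonly computed, since the formulas are not given in this text. The function names, the sample matrix, and the convention that every indicator is non-negative and "larger is better" (for example, idle capacity rather than utilization) are all hypothetical.

```python
import math
from typing import List


def entropy_weights(matrix: List[List[float]]) -> List[float]:
    """Entropy weight method on an index evaluation matrix.

    matrix[i][j] is the value of indicator j on server i (assumed
    non-negative, larger is better). Returns one weight per indicator,
    summing to 1. Indicators whose values differ more across servers
    carry more information and receive a larger weight.
    """
    m = len(matrix)
    if m < 2:
        raise ValueError("need at least two servers to compute entropy")
    n = len(matrix[0])
    k = 1.0 / math.log(m)  # scales entropy into the range [0, 1]

    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total == 0:
            divergences.append(0.0)  # indicator carries no information
            continue
        entropy = 0.0
        for value in column:
            p = value / total
            if p > 0:  # 0 * ln(0) is treated as 0
                entropy -= k * p * math.log(p)
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))

    divergence_sum = sum(divergences)
    if divergence_sum == 0:
        return [1.0 / n] * n  # all indicators uniform: equal weights
    return [d / divergence_sum for d in divergences]


def rank_servers(names: List[str], matrix: List[List[float]]) -> List[str]:
    """Sort servers by weighted score, best first."""
    weights = entropy_weights(matrix)
    scores = {
        name: sum(w * x for w, x in zip(weights, row))
        for name, row in zip(names, matrix)
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)


# Hypothetical candidate set: two indicators per server.
# The first indicator is identical on both servers, so nearly all
# of the weight goes to the second one.
sample = [[0.5, 0.9], [0.5, 0.1]]
weights = entropy_weights(sample)
ranking = rank_servers(["etl-1", "etl-2"], sample)
```

In a scheduler built along these lines, `rank_servers` would be run once per candidate set produced by the clustering step, and the job would be sent to the first server in the ranking for its type.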

Description

Technical field

[0001] The invention belongs to the field of network technology, and in particular relates to a method for classifying and evaluating ETL job scheduling resources in a distributed environment.

Background technique

[0002] The rapid development of information technology has brought with it a vigorous rise in data construction. Data warehouses have gradually grown in scale, and their system structure has become more complex. The most important link in building a data warehouse is ETL data introduction, that is, data extraction, conversion, and loading, which accounts for 80% of the workload of data warehouse construction. Distributed data connection is currently the mainstream data connection technology: data connection operations can be distributed across relatively cheap computer clusters, so that ETL jobs are allocated reasonably and ETL server computing resources are fully used. In a medium-scale data warehouse in the ...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06Q10/06, G06F16/28, G06F16/27
CPC: G06Q10/06393, G06F16/285, G06F16/27
Inventor: 杜海, 唐伟力, 苗青鹏, 吴迪
Owner NO 30 INST OF CHINA ELECTRONIC TECH GRP CORP