Method and device for ETL (Extract Transform and Load) task off-lining and data cleaning in data warehouse

A data warehouse and data cleaning technology in the field of data warehouse analysis. It addresses the low efficiency, low accuracy and unpredictable errors of manual off-lining, and saves system computing and storage resources, improves work efficiency and improves system performance.

Active Publication Date: 2013-03-27
ALIBABA GRP HLDG LTD
Cites: 2; Cited by: 14

AI Technical Summary

Problems solved by technology

[0006] The problem with the existing technology above is its reliance on confirmation with the business side. When the business side has many personnel, reviewing each application one by one is extremely inefficient and may still fail to reach everyone. Omissions during cleaning are therefore inevitable, and an application that is still in use may be taken offline by mistake. Such subjective human judgments are not backed by data, so the accuracy of off-lining is low, and human misjudgment may lead to unpredictable errors.



Examples


Embodiment Construction

[0045] To make the above objects, features and advantages of the present application clearer and easier to understand, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.

[0046] ETL is the process of extracting, cleaning, transforming and loading database objects in a data warehouse. As the volume of data in the data warehouse grows rapidly, the cost of querying and storing that data keeps rising. Because there is no reasonable off-lining strategy, many ETL tasks and data belonging to applications that have declined or are no longer used remain in place, greatly wasting the limited computing and storage resources of the data warehouse. In the prior art, ETL task off-lining and data cleaning are performed manually, which is both inefficient and inaccurate.
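The excerpt does not say how an ETL task is tied to the applications that consume its output. As a minimal sketch, assuming a hypothetical lineage map (which tables each ETL task writes, and which consumers still read each table), one way to find tasks that serve only unused applications is to mark a task for off-lining when none of its output tables has a live reader, and to repeat until nothing changes, since off-lining one task can remove the last reader of another task's output. The names `find_offline_tasks`, `task_outputs` and `readers` are illustrative, not from the patent.

```python
# Hypothetical lineage model (an assumption, not described in the patent):
#   task_outputs: ETL task -> set of tables the task writes
#   readers:      table    -> set of consumers (applications or ETL tasks) reading it
def find_offline_tasks(task_outputs, readers):
    """Return ETL tasks whose output tables have no remaining live readers."""
    offline = set()
    changed = True
    while changed:  # repeat until no further task can be marked
        changed = False
        for task, tables in task_outputs.items():
            if task in offline:
                continue
            # The task is still needed if any of its tables is read by a
            # consumer that has not itself been marked for off-lining.
            still_needed = any(
                reader not in offline
                for table in tables
                for reader in readers.get(table, set())
            )
            if not still_needed:
                offline.add(task)
                changed = True
    return offline


task_outputs = {"etl_a": {"t_a"}, "etl_b": {"t_b"}, "etl_c": {"t_c"}}
readers = {"t_a": {"etl_b"}, "t_b": set(), "t_c": {"app_dashboard"}}
print(sorted(find_offline_tasks(task_outputs, readers)))  # ['etl_a', 'etl_b']
```

This rule is deliberately conservative: tasks that only read each other's outputs in a cycle are never marked, even if no application uses any of them.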

[0047] One of the core concepts of the embodiment of the present application is to obtain the call information of ea...



Abstract

The invention provides a method and device for ETL (Extract Transform and Load) task off-lining and data cleaning in a data warehouse. The method comprises: obtaining call information for each database object in the data warehouse; and, according to the call information of each database object, performing the corresponding ETL task off-lining and/or the corresponding data cleaning. The invention improves the efficiency and accuracy of ETL task off-lining and data cleaning in the data warehouse.
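The abstract does not define what the call information contains or which rule turns it into an off-lining decision. A minimal sketch, assuming each database object has a hypothetical `CallInfo` record (time of last call and number of calls within the log window) and an illustrative 90-day idle threshold, shows how the two steps could fit together: objects with no recent calls become candidates for ETL task off-lining and/or data cleaning. `CallInfo`, `plan_offline`, `idle_days`, `min_calls` and the table names are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class CallInfo:
    """Hypothetical call record for one database object, aggregated from access logs."""
    object_name: str
    last_called: Optional[datetime]  # None: never called within the log window
    call_count: int                  # calls observed within the log window


def plan_offline(call_infos: List[CallInfo], now: datetime,
                 idle_days: int = 90, min_calls: int = 1) -> Tuple[List[str], List[str]]:
    """Split objects into (keep, offline_candidates) by recent call activity."""
    cutoff = now - timedelta(days=idle_days)
    keep, offline = [], []
    for info in call_infos:
        recently_used = (
            info.last_called is not None
            and info.last_called >= cutoff
            and info.call_count >= min_calls
        )
        (keep if recently_used else offline).append(info.object_name)
    return keep, offline


now = datetime(2013, 3, 27)
infos = [
    CallInfo("dw.orders_daily", datetime(2013, 3, 20), 42),
    CallInfo("dw.legacy_report", datetime(2012, 6, 1), 3),
    CallInfo("dw.tmp_export", None, 0),
]
keep, offline = plan_offline(infos, now)
print(keep)     # ['dw.orders_daily']
print(offline)  # ['dw.legacy_report', 'dw.tmp_export']
```

Basing the decision on recorded calls rather than on a survey of the business side is the point the abstract makes; the threshold values here are placeholders that a real deployment would have to choose.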

Description

Technical field

[0001] The present application relates to the technical field of data warehouse analysis, and in particular to a method and a device for ETL task off-lining and data cleaning in a data warehouse.

Background art

[0002] A data warehouse is an independent data environment. Data is imported into it through an extraction process from online transaction processing environments, external data sources and offline data storage media. Its purpose is to build a structured data storage space that takes data from different data sources, forms a unified and effective data set, and finally processes and integrates it into the required data.

[0003] ETL (Extraction-Transformation-Loading) is the process of data extraction, cleaning, transformation and loading, and is an important part of building a data warehouse. Users extract the required data from the database, after data cleaning...
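The four ETL stages named in [0003] can be illustrated with a toy pipeline. This is a minimal sketch, not the patent's implementation: the record fields (`order_id`, `city`, `amount`), the cleaning rules and the dict standing in for a warehouse table are all hypothetical.

```python
def extract(source):
    """Extract: pull raw records from an operational source (here, any iterable of dicts)."""
    return list(source)


def clean(rows):
    """Clean: drop records missing required fields and normalize the city name."""
    cleaned = []
    for row in rows:
        if row.get("order_id") is None or row.get("amount") is None:
            continue  # incomplete record: reject it
        cleaned.append({**row, "city": row.get("city", "").strip().title()})
    return cleaned


def transform(rows):
    """Transform: aggregate order amounts per city."""
    totals = {}
    for row in rows:
        totals[row["city"]] = totals.get(row["city"], 0) + row["amount"]
    return totals


def load(totals, warehouse):
    """Load: write the result into a dict standing in for a warehouse table."""
    warehouse["sales_by_city"] = dict(sorted(totals.items()))
    return warehouse


source = [
    {"order_id": 1, "city": " hangzhou ", "amount": 100},
    {"order_id": 2, "city": "Hangzhou", "amount": 50},
    {"order_id": None, "city": "Beijing", "amount": 70},  # rejected by clean()
    {"order_id": 3, "city": "beijing", "amount": 30},
]
warehouse = load(transform(clean(extract(source))), {})
print(warehouse)  # {'sales_by_city': {'Beijing': 30, 'Hangzhou': 150}}
```

Each stage is a separate function so that a scheduler could run, retry or retire it independently; in a real warehouse each would typically be a scheduled ETL task, which is the unit the present application takes offline.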


Application Information

Patent type and authority: Application (China)
IPC (8th edition): G06F17/30
Inventors: 黄晓婧 (Huang Xiaojing), 曾春秋 (Zeng Chunqiu), 孙伟光 (Sun Weiguang), 吴伟 (Wu Wei), 方建江 (Fang Jianjiang)
Owner ALIBABA GRP HLDG LTD