Incremental data capturing and extraction method based on timestamps and logs

An incremental data capture and extraction technique, applied in the computer field. It addresses problems such as uncaptured changes degrading the efficiency of incremental extraction, the inability of some capture methods to process modified data, and the poor performance of the full-table comparison method. Its effects are reduced pressure on the business database, improved extraction efficiency, and lower technical complexity.

Active Publication Date: 2013-02-06
BEIJING JINHER SOFTWARE

AI Technical Summary

Problems solved by technology

If changed data cannot be captured effectively, the efficiency of incremental extraction suffers seriously.
Existing data capture techniques mainly include the trigger method, the timestamp method, the full-table comparison method, and the CDC (Changed Data Capture) method. Each of them solves only one aspect of the problem or places high demands on the business system. For example, the trigger method requires the data source to support triggers, and the connection between the source database and the target database must stay open at all times; the timestamp method handles only new and modified data and cannot handle deleted data; the full-table comparison method performs poorly and cannot handle modified data; and the CDC method requires the source database and the target database to be homogeneous.
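
The blind spot of the timestamp method can be seen in a small sketch. It uses Python's built-in sqlite3 module purely for illustration; the table `source`, its columns, the helper `changed_since`, and the integer logical clock are all assumptions and do not come from the patent.

```python
import sqlite3


def changed_since(conn, last_extract_time):
    """Timestamp method: return rows modified after the last extraction."""
    cur = conn.execute(
        "SELECT id, name FROM source WHERE modified_at > ? ORDER BY id",
        (last_extract_time,),
    )
    return cur.fetchall()


# Hypothetical source table; modified_at is an integer logical clock.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT, modified_at INTEGER)"
)
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?)",
    [(1, "alice", 10), (2, "bob", 10)],
)

# Assume both rows were already extracted at logical time 10.
last_extract_time = 10

# One update, one insert, and one delete happen afterwards.
conn.execute("UPDATE source SET name = 'bobby', modified_at = 11 WHERE id = 2")
conn.execute("INSERT INTO source VALUES (3, 'carol', 12)")
conn.execute("DELETE FROM source WHERE id = 1")

# The update and the insert are found; the deleted row leaves no trace.
delta = changed_since(conn, last_extract_time)
print(delta)
```

The deleted row with `id = 1` has no timestamp left to compare, so a target that relies on this query alone would keep it indefinitely.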


Examples


Detailed Description of the Embodiments

[0027] The present invention will be further described below in conjunction with the accompanying drawings, so that those of ordinary skill in the art can implement it after referring to this specification.

[0028] A method for capturing and extracting incremental data based on time stamps and logs according to the present invention comprises the following steps:

[0029] Step 1. Add an auto-increment surrogate key field and a timestamp field to each table in the business system that requires incremental data extraction. After this, every row in each such table has a timestamp value and a surrogate key value. As shown in Figure 1, group management is a common function in instant messaging (IM) software, and the group user table stores the correspondence between group IDs and user IDs. Add an auto-increment surrogate key field "relationship ID" and a timestamp field "modification time" to it, and each row of data in this table has a relations...
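
As a minimal sketch of Step 1, the group user table could be declared as follows. SQLite (through Python's sqlite3 module) is used only for illustration; the identifiers `group_user`, `relationship_id`, `group_id`, `user_id`, and `modification_time` are assumed spellings of the fields named above, and the column types are assumptions as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Group user table with the two added fields: an auto-increment
# surrogate key and a timestamp that is filled in on insert.
conn.execute(
    """
    CREATE TABLE group_user (
        relationship_id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        group_id          INTEGER NOT NULL,
        user_id           INTEGER NOT NULL,
        modification_time TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP  -- timestamp
    )
    """
)

# The application inserts only the business columns.
conn.executemany(
    "INSERT INTO group_user (group_id, user_id) VALUES (?, ?)",
    [(100, 1), (100, 2), (200, 1)],
)

rows = conn.execute(
    "SELECT relationship_id, group_id, user_id, modification_time "
    "FROM group_user ORDER BY relationship_id"
).fetchall()
for row in rows:
    print(row)
```

Each row receives both values without any change to the insert statements the business system already issues. A column default covers inserts only, so updates would still need to refresh `modification_time` explicitly.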


Abstract

The invention discloses an incremental data capture and extraction method based on timestamps and logs. The method comprises the following steps: step 1, adding an auto-increment surrogate key field and a timestamp field to each table in the business system from which data must be extracted incrementally; step 2, using the timestamp approach when data in the tables are added or modified; and step 3, using the log-recording approach when data in the tables are deleted. With this method, the business database and the data warehouse can be heterogeneous, and they need to be connected only during extraction rather than at all times. Each run extracts only the added or modified data not extracted by the previous run and processes only the deletions not processed by the previous run, so the volume of data handled is small. The method markedly improves extraction efficiency, relieves the pressure that extraction places on the business database, and reduces the technical complexity of extraction.
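
The three steps can be sketched end to end as follows. This is one possible reading and not the patented implementation: the `delete_log` table, the helper functions, the `state` dictionary holding the two watermarks, and the integer logical clock are all hypothetical, and two in-memory SQLite databases stand in for the business database and the data warehouse.

```python
import sqlite3


def setup():
    """Create a hypothetical business database and a separate warehouse."""
    src = sqlite3.connect(":memory:")
    src.executescript(
        """
        CREATE TABLE group_user (
            relationship_id   INTEGER PRIMARY KEY AUTOINCREMENT,
            group_id          INTEGER NOT NULL,
            user_id           INTEGER NOT NULL,
            modification_time INTEGER NOT NULL
        );
        CREATE TABLE delete_log (
            log_id          INTEGER PRIMARY KEY AUTOINCREMENT,
            relationship_id INTEGER NOT NULL
        );
        """
    )
    dw = sqlite3.connect(":memory:")
    dw.execute(
        "CREATE TABLE dw_group_user ("
        "relationship_id INTEGER PRIMARY KEY, group_id INTEGER, user_id INTEGER)"
    )
    return src, dw


def add_member(src, group_id, user_id, now):
    # Step 2: an insert stamps the row with the current time.
    src.execute(
        "INSERT INTO group_user (group_id, user_id, modification_time) "
        "VALUES (?, ?, ?)",
        (group_id, user_id, now),
    )


def move_member(src, relationship_id, group_id, now):
    # Step 2: an update refreshes the timestamp.
    src.execute(
        "UPDATE group_user SET group_id = ?, modification_time = ? "
        "WHERE relationship_id = ?",
        (group_id, now, relationship_id),
    )


def remove_member(src, relationship_id):
    # Step 3: record the surrogate key in the log, then delete the row.
    src.execute(
        "INSERT INTO delete_log (relationship_id) VALUES (?)", (relationship_id,)
    )
    src.execute(
        "DELETE FROM group_user WHERE relationship_id = ?", (relationship_id,)
    )


def extract(src, dw, state):
    """Apply only the changes made since the previous run."""
    changed = src.execute(
        "SELECT relationship_id, group_id, user_id, modification_time "
        "FROM group_user WHERE modification_time > ? ORDER BY modification_time",
        (state["last_time"],),
    ).fetchall()
    for rel_id, group_id, user_id, ts in changed:
        dw.execute(
            "INSERT OR REPLACE INTO dw_group_user VALUES (?, ?, ?)",
            (rel_id, group_id, user_id),
        )
        state["last_time"] = max(state["last_time"], ts)

    deleted = src.execute(
        "SELECT log_id, relationship_id FROM delete_log "
        "WHERE log_id > ? ORDER BY log_id",
        (state["last_log_id"],),
    ).fetchall()
    for log_id, rel_id in deleted:
        dw.execute("DELETE FROM dw_group_user WHERE relationship_id = ?", (rel_id,))
        state["last_log_id"] = log_id
    return len(changed), len(deleted)


src, dw = setup()
state = {"last_time": 0, "last_log_id": 0}

add_member(src, 100, 1, now=1)
add_member(src, 100, 2, now=1)
add_member(src, 200, 3, now=2)
first = extract(src, dw, state)

move_member(src, 2, 200, now=3)
add_member(src, 300, 4, now=4)
remove_member(src, 1)
second = extract(src, dw, state)

third = extract(src, dw, state)  # nothing changed since the second run

warehouse = dw.execute(
    "SELECT relationship_id, group_id, user_id FROM dw_group_user "
    "ORDER BY relationship_id"
).fetchall()
print(first, second, third)
print(warehouse)
```

The two databases are touched together only inside `extract`, which mirrors the claim that a connection is needed only during extraction. The surrogate key is what lets the warehouse locate the row to overwrite or delete.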

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to ETL technology within data warehousing, and applies mainly to products that involve data warehouse ETL operations.

Background

[0002] ETL stands for Extract, Transform, Load, and it is the core step of data warehouse implementation. There are two methods of extraction: full extraction and incremental extraction. Full extraction is similar to data migration or data replication: it extracts all data from the data source to the target database at one time. The usual implementation of full extraction is to delete the data in the target database on every run and reload all the data from the data source. This method is simple to implement, but when the amount of data is large, extraction takes a long time and performance is very poor. Incremental extraction is based on the last extrac...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 施霖, 杨爱民
Owner: BEIJING JINHER SOFTWARE