
Data update method and device in distributed data warehouse

A data update technology for distributed data warehouses. It addresses the large volume of captured data, the heavy consumption of computing and storage resources, and the impact on normal operation of the source system. The effect is to reduce the volume of captured data, lessen the impact on the source system, and avoid unnecessary computing and storage overhead.

Active Publication Date: 2018-04-13
BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD +1
8 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] The data update technology in the above-mentioned distributed data warehouse has the following defects. First, the amount of captured data is huge, which has a great impact on the normal operation of the source system. Second, the data from before the upgrade of the source system cannot be retained, and the updated data in the distributed data warehouse does not show which specific data changes were caused by the source system upgrade. Third, as the source system is upgraded continuously, the frequent updates to the data in the distributed data warehouse consume a large amount of computing and storage resources.



Examples


Embodiment 1

[0032] Figure 1a is a schematic flowchart of a method for updating data in a distributed data warehouse provided by Embodiment 1 of the present invention. The method can be executed by a data update device for a distributed data warehouse, implemented in hardware and/or software. The device is typically deployed on a distributed data processing platform. Such a platform usually includes multiple processors in a distributed configuration, and the device can be configured on one of those processors.

[0033] The method includes Step 110 to Step 170.

[0034] Step 110, setting the initial version number of the data model of the distributed data warehouse in the distributed data processing platform.

[0035] Step 120: Set the initial version number of the original data table in the distributed data warehouse ac...
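Steps 110 and 120 can be pictured with a minimal Python sketch. It assumes integer version numbers and a toy in-memory `Warehouse` class; the class, the function `initialize_versions`, and the table names are hypothetical and are not taken from the patent text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Warehouse:
    """Toy in-memory stand-in for a distributed data warehouse (hypothetical)."""
    model_version: int = 0
    table_versions: Dict[str, int] = field(default_factory=dict)


def initialize_versions(warehouse: Warehouse,
                        table_names: List[str],
                        initial_version: int = 1) -> Warehouse:
    # Step 110: set the initial version number of the data model.
    warehouse.model_version = initial_version
    # Step 120: each original data table takes its initial version
    # from the data model's initial version (assumed: same value).
    for name in table_names:
        warehouse.table_versions[name] = warehouse.model_version
    return warehouse


wh = initialize_versions(Warehouse(), ["orders", "users"])
```

Deriving the table versions from the model version, rather than setting them separately, keeps the two in step from the start.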

example

[0059] In the above solution, the data model may be a hierarchical data model;

[0060] Setting the initial version number of the data model may specifically include:

[0061] Set the initialization version number of the hierarchical data model according to the hierarchical name of the hierarchical data model;

[0062] Correspondingly, according to the initial version number of the data model and the upgrade sequence number, setting the current version number of the data model may specifically include:

[0063] Set the current version number of the hierarchical data model according to the initialization version number of the hierarchical data model, the upgrade sequence number and the hierarchical name.
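As a rough illustration of paragraphs [0061] and [0063], the sketch below derives version numbers from a layer name. The `<LAYER>_V<number>` format and both function names are assumptions made for this example; the text does not specify how the version string is built.

```python
def initial_model_version(layer_name: str, base: int = 1) -> str:
    """Initial version of a hierarchical data model layer.

    Assumed format: "<LAYER>_V<number>", for example "FDM_V1".
    """
    return f"{layer_name}_V{base}"


def current_model_version(initial_version: str,
                          upgrade_seq: int,
                          layer_name: str) -> str:
    """Current version from the initial version, the upgrade sequence
    number, and the layer name (assumed rule: add the sequence number)."""
    prefix, _, number = initial_version.rpartition("_V")
    if prefix != layer_name:
        raise ValueError("initial version does not belong to this layer")
    return f"{layer_name}_V{int(number) + upgrade_seq}"


v0 = initial_model_version("FDM")
v2 = current_model_version(v0, upgrade_seq=2, layer_name="FDM")
```

Under these assumptions `v0` is `"FDM_V1"` and `v2` is `"FDM_V3"`. Keeping the layer name in the version string lets each layer of the hierarchy be versioned independently.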

[0064] The following description refers to the hierarchical data model shown in Figure 1b.

[0065] The hierarchical data model includes: dimensional data layer (DIM), buffer data layer (Buffering Data Model, BDM), basic data layer (Fundamental Data Model, FDM), general data layer (...

Embodiment 2

[0074] Figure 2 is a schematic structural diagram of a data updating device in a distributed data warehouse provided by Embodiment 2 of the present invention. The device includes: a data model version initialization module 210, a data table version initialization module 220, an update element acquisition module 230, a data table update module 240, a data table name configuration module 250, a data model version configuration module 260 and a data table version configuration module 270.

[0075] The data model version initialization module 210 is used to set the initial version number of the data model of the distributed data warehouse in the distributed data processing platform; the data table version initialization module 220 is used to set, according to the initial version number of the data model, the initial version number of the original data table in the distributed data warehouse, wherein the data in the original data table in the dis...


Abstract

The invention provides a data updating method and device in a distributed data warehouse. The method comprises: setting an initial version number of a data model of the distributed data warehouse; setting an initial version number of an original data table in the distributed data warehouse; during an update of a source system, capturing the corresponding updated data and the name of the source data table to which the updated data belongs, and determining the update type and update sequence number of the source system; inputting the updated data into the data model to obtain a currently updated data table; determining, according to the source data table name, the name of the corresponding original data table in the distributed data warehouse; setting the current version number of the data model according to the initial version number of the data model and the update sequence number; and setting the current version number of the currently updated data table according to the current version number of the data model and the update type. In this way, each update of a data table in the distributed data warehouse can be traced to the corresponding update type of the source system.
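The update steps in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the classes `SourceUpdate` and `VersionedTable`, the function `apply_update`, the `"<number>-<type>"` version format, the identity transform standing in for the data model, and the append-only `history` dictionary are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SourceUpdate:
    source_table: str   # name of the source table the updated data belongs to
    rows: List[dict]    # only the captured updated data, not a full snapshot
    update_type: str    # hypothetical labels such as "schema" or "data"
    update_seq: int     # update sequence number of the source system


@dataclass
class VersionedTable:
    name: str
    version: str
    rows: List[dict]


def apply_update(update: SourceUpdate,
                 name_map: Dict[str, str],
                 model_initial_version: int,
                 history: Dict[str, List[VersionedTable]]) -> VersionedTable:
    # Map the source table name to the warehouse table name.
    warehouse_name = name_map[update.source_table]
    # Current model version from the initial version and the sequence number
    # (assumed rule: simple addition).
    model_version = model_initial_version + update.update_seq
    # Table version from the model version and the update type
    # (assumed format: "<model version>-<update type>").
    table_version = f"{model_version}-{update.update_type}"
    # "Inputting the updated data into the data model" is modeled here
    # as an identity transform that copies the captured rows.
    table = VersionedTable(warehouse_name, table_version, list(update.rows))
    # Append rather than overwrite, so earlier versions stay available.
    history.setdefault(warehouse_name, []).append(table)
    return table


history: Dict[str, List[VersionedTable]] = {}
upd = SourceUpdate("orders", [{"id": 1, "status": "paid"}], "schema", 2)
result = apply_update(upd, {"orders": "dw_orders"}, 1, history)
```

Because the version string carries the update type, a reader of the warehouse table can tell which kind of source system change produced it, which is the traceability the abstract describes.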

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of data warehouses, and in particular, to a method and device for updating data in a distributed data warehouse.

Background

[0002] Taking e-commerce as an example, with the development of e-commerce, the large amount of user, product and production-related data generated and accumulated by e-commerce companies such as JD.com, Taobao, and Amazon in their daily operations has shown explosive growth, and data structures have also begun to diversify. The amount of information contained in the data is increasing, and these companies are paying more and more attention to digital operations. Databases are used to process this data and play a major role. With the advent of the era of big data, the database has transformed into a distributed architecture to meet the explosive growth of computing and storage needs. Distributed data warehouses generally use columnar storage and store...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/2329, G06F16/27
Inventor: 孙冬, 董月红
Owner: BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD