An incremental data convergence updating method and system based on CarbonData

An incremental data update technique applied to database updating, electronic digital data processing, and structured data retrieval. It addresses degraded aggregation-query capability, data unavailability, and poor compatibility, so that data can be queried and used at any time, business needs are met, and real-time use is supported.

Active Publication Date: 2019-05-10
中电福富信息科技有限公司 +1

AI Technical Summary

Problems solved by technology

CarbonData stream processing cannot be used directly in scenarios where data must be deleted or updated; a full-update workaround is required, which is complex and inefficient.
Compatibility with older external data source systems is also poor (non-message-stream file formats, older Kafka versions such as 0.8.x, RabbitMQ, RocketMQ).
CarbonData lacks effective management of incremental data and cannot trace the order and accuracy of the incremental data convergence process.
In addition, delete and update operations tend to produce fragments. CarbonData provides SQL statements for merging and clearing fragments, but running merge and clean operations too close together makes the data unavailable for a long time, and merging or cleaning too often also impairs the ability to serve aggregation queries at any time.
For scenarios that need to delete and update data, CarbonData stream processing cannot be used directly.
When the external data source system does not meet CarbonData's compatibility requirements, the data source is unavailable.
When an exception occurs during incremental data processing, the order and accuracy of the incremental data convergence process cannot be traced.
Defragmentation impairs the ability to serve aggregation queries at any time.

Examples

Example 1

[0050] Example 1 takes the product instance table "PROD_INST" as an example:

[0051] hdfs://ns1/user/e_carbon/public/temp/PROD_INST_ADD_H

[0052] hdfs://ns1/user/e_carbon/public/todo/PROD_INST_ADD_H

[0053] hdfs://ns1/user/e_carbon/public/add/PROD_INST_ADD_H

[0054] hdfs://ns1/user/e_carbon/public/trash/PROD_INST_ADD_H

[0055] hdfs://ns1/user/e_carbon/public/PROD_INST_ADD_C

[0056] For increments in message-stream format, step 1 proceeds as follows: the incremental data landing module consumes a batch of messages, converts them to CSV file format, and writes them to the temp directory. When the time interval or the file size reaches the configured threshold, it merges or splits the files in the temp directory and moves them to the todo directory in order.

[0057] Step 1.2: when the incremental data arrives in file format, the incremental data landing module stores it directly in the todo directory;...
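
The landing step for message-stream increments can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it uses the local filesystem in place of HDFS, and the function names `land_batch` and `flush_to_todo`, the `batch_*.csv` naming scheme, and the size-only threshold are all assumptions made for the example. Splitting oversized files is omitted.

```python
import csv
import shutil
from pathlib import Path


def land_batch(messages, temp_dir, batch_id):
    """Write one consumed batch of messages (rows) as a CSV file in temp_dir."""
    temp_dir.mkdir(parents=True, exist_ok=True)
    path = temp_dir / f"batch_{batch_id:06d}.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(messages)
    return path


def flush_to_todo(temp_dir, todo_dir, max_bytes):
    """Merge the temp files and move the result to todo_dir.

    Only acts once the combined size reaches max_bytes (hypothetical
    threshold). Returns the merged file path, or None if nothing was moved.
    """
    parts = sorted(temp_dir.glob("batch_*.csv"))
    total = sum(p.stat().st_size for p in parts)
    if not parts or total < max_bytes:
        return None

    # Merge inside temp first so todo never exposes a half-written file.
    staging = temp_dir / "merging.tmp"
    with staging.open("wb") as out:
        for p in parts:
            with p.open("rb") as src:
                shutil.copyfileobj(src, out)

    todo_dir.mkdir(parents=True, exist_ok=True)
    merged = todo_dir / f"{parts[0].stem}_to_{parts[-1].stem}.csv"
    staging.replace(merged)  # move the finished file into todo
    for p in parts:
        p.unlink()  # the batch files are now contained in the merged file
    return merged
```

A caller would invoke `land_batch` once per consumed batch and call `flush_to_todo` periodically; a time-interval threshold could be added alongside the size check in the same way.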

Example 2

[0080] $SCHEDULE[BEGIN=14:11:11,END=16:22:22] ALTER TABLE carbon_table COMPACT 'MINOR';

[0081] $SCHEDULE[BEGIN=17:11:11,END=17:22:22] CLEAN FILES FOR TABLE carbon_table;

Example 3

[0083] $SCHEDULE[BEGIN=14:11:11,END=16:22:22] ALTER TABLE carbon_table COMPACT 'MINOR';

[0084] $NOSCHEDULE[BEGIN=14:11:11,END=16:22:22] CLEAN FILES FOR TABLE carbon_table;

[0085] $NOSCHEDULE[BEGIN=17:11:11,END=19:22:22] ALTER TABLE carbon_table COMPACT 'MINOR';

[0086] $SCHEDULE[BEGIN=17:11:11,END=19:22:22] CLEAN FILES FOR TABLE carbon_table;
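
One way to read the directives in Examples 2 and 3 is sketched below. The excerpt does not define their semantics, so the interpretation here is an assumption: a statement may run only while a `$SCHEDULE` window for it is open, and never while a `$NOSCHEDULE` window for it is open. The names `parse_directive` and `is_allowed` are hypothetical, and windows that cross midnight are not handled.

```python
import re
from datetime import time

# Matches "$SCHEDULE[BEGIN=hh:mm:ss,END=hh:mm:ss] <sql>" and the
# "$NOSCHEDULE" variant, with or without a space after the bracket.
DIRECTIVE = re.compile(
    r"^\$(?P<kind>NOSCHEDULE|SCHEDULE)\["
    r"BEGIN=(?P<begin>\d{2}:\d{2}:\d{2}),"
    r"END=(?P<end>\d{2}:\d{2}:\d{2})\]\s*(?P<sql>.+)$"
)


def normalize(sql):
    """Collapse whitespace, drop a trailing semicolon, and upper-case."""
    return " ".join(sql.strip().rstrip(";").split()).upper()


def parse_directive(line):
    """Return (kind, begin, end, normalized_sql) for one directive line."""
    m = DIRECTIVE.match(line.strip())
    if m is None:
        raise ValueError(f"not a schedule directive: {line!r}")
    return (
        m["kind"],
        time.fromisoformat(m["begin"]),
        time.fromisoformat(m["end"]),
        normalize(m["sql"]),
    )


def is_allowed(directives, sql, now):
    """Assumed rule: allowed inside a SCHEDULE window unless a NOSCHEDULE
    window for the same statement is open at the same moment."""
    key = normalize(sql)
    permitted = False
    for kind, begin, end, stmt in directives:
        if stmt != key or not (begin <= now <= end):
            continue
        if kind == "NOSCHEDULE":
            return False  # an open exclusion window always wins
        permitted = True
    return permitted
```

Under this reading, the four lines of Example 3 keep compaction and file cleaning in separate windows: each operation is scheduled in one window and explicitly excluded from the other.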

Abstract

The invention discloses a CarbonData-based incremental data convergence update method and system. The method comprises the following steps: Step 1, identify the incremental data, land it in the todo directory, and add a record to a metadata table; Step 2, adding the record fires a trigger preset on the metadata table, and the trigger adds a scheduling task to the data incremental update convergence module; Step 3, the data incremental update convergence module obtains a scheduling task and determines the current task type according to the trigger-type task priority principle; Step 4, perform the task logic for the current task type and move the incremental file to the add directory; Step 5, execute the update-convergence SQL statement to converge the incremental file into the full table; Step 6, after the convergence update completes, return the corresponding record to the unoccupied state and modify the field content of that record. The method solves the problem that the CarbonData streaming scheme does not support incremental updates and reduces the workload of service transformation.
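
The control flow of steps 2 to 6 can be pictured with a small in-memory sketch. Everything concrete in it is an assumption made for illustration: the `Record` fields, the `Converger` class, the trigger type names, and the numeric priorities in `PRIORITY` do not come from the patent text, and `execute_sql` stands in for whatever runs the update-convergence SQL against CarbonData.

```python
import heapq
from dataclasses import dataclass
from itertools import count

# Hypothetical trigger types and priorities; a lower number runs first.
PRIORITY = {"delete": 0, "update": 1, "insert": 2}


@dataclass
class Record:
    file_name: str
    trigger_type: str
    state: str = "unoccupied"
    location: str = "todo"


class Converger:
    def __init__(self, execute_sql):
        self.execute_sql = execute_sql  # callable taking a Record
        self.metadata = []              # stands in for the metadata table
        self.queue = []                 # scheduling tasks as a priority heap
        self._seq = count()             # tie-breaker keeps arrival order

    def add_record(self, file_name, trigger_type):
        """Steps 1-2: a new metadata record 'fires the trigger' and enqueues a task."""
        rec = Record(file_name, trigger_type)
        self.metadata.append(rec)
        heapq.heappush(self.queue, (PRIORITY[trigger_type], next(self._seq), rec))
        return rec

    def run_next(self):
        """Steps 3-6: take the highest-priority task and converge its file."""
        if not self.queue:
            return None
        _, _, rec = heapq.heappop(self.queue)
        rec.state = "occupied"
        rec.location = "add"      # step 4: file moved from todo to add
        self.execute_sql(rec)     # step 5: converge the increment into the full table
        rec.state = "unoccupied"  # step 6: release the record
        rec.location = "done"     # step 6: update the record's fields (assumed value)
        return rec
```

The sequence counter in each heap entry means two tasks with the same priority are processed in arrival order, which is one simple way to keep the convergence order traceable.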

Description

Technical field

[0001] The present invention relates to a CarbonData-based incremental data convergence update method and system.

Background

[0002] Incremental updates from various external data sources (message streams or file formats) need to be converged quickly into the full table on the Hadoop platform, while aggregation queries remain available at any time. To meet this need, many companies have in recent years begun to use CarbonData, an open-source, high-performance big data storage format with clear advantages over other technologies. It provides rich indexing and encoding support, builds on the Hadoop ecosystem, and integrates closely with Spark, but some problems still limit its application.

[0003] CarbonData's current stream processing only supports data insertion, not data deletion and update. Scenarios that n...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/2453, G06F16/23, G06F16/2457
Inventor: 连城, 孙而焓, 杨逸栩, 郭海涛
Owner: 中电福富信息科技有限公司