Spark-based data synchronization method in e-commerce platform

A data synchronization technique for e-commerce platforms, applied in the field of data processing. It addresses problems such as inadequate remediation of synchronization failures, an overly simple synchronization strategy, and poor support for delete operations, with the aim of ensuring accuracy, performance, and efficiency.

Pending Publication Date: 2020-03-13
INSPUR SOFTWARE CO LTD

AI Technical Summary

Problems solved by technology

For data analysis, data is usually extracted to the target database with ETL tools, but big data platform data lakes such as Hive do not handle delete operations well.
[0003] Existing incremental synchronization for DB2 databases uses an overly simple synchronization strategy, has no elastic scaling design, and offers insufficient remediation when data synchronization fails.
[0004] Synchronizing data from the e-commerce platform to the big data platform involves updating (or deleting) historical data. If Hive is chosen as the only storage component, the Hive transactional table mechanism must be enabled, but update (or delete) operations on Hive transactional tables perform poorly, and Spark cannot be used to read transactional table data for computation. If HBase is chosen as the only storage component, updates (or deletes) are supported, but Spark reads HBase table data far more slowly than it reads Hive data, so the time requirements of aggregate calculations cannot be met.

Examples

Embodiment Construction

[0033] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative work fall within the scope of protection of the present invention.

[0034] The method involves components such as MYSQL, NIFI, HBASE, HIVE and SPARK.

[0035] MYSQL: The business database of the e-commerce platform.

[0036] NIFI: Data extraction and write back.

[0037] HBASE: Create an incremental table to store intermediate incremental data.

[0038] HIVE: Stores the full data set up to and including yesterday. ...
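
The role of the incremental table can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: a plain dictionary stands in for the HBASE incremental table, and the `ChangeEvent` type, its field names, the operation labels, and the `stage_increment` function are all hypothetical. It shows one plausible behaviour, where several changes to the same row collapse to the most recent one, so the staging area holds at most one pending change per primary key.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change, as it might be derived from a binlog (hypothetical shape)."""
    op: str              # assumed labels: "insert", "update" or "delete"
    pk: str              # primary key of the changed row, used as the staging key
    row: Optional[dict]  # new column values; None for a delete
    ts: int              # event timestamp used to order changes to the same row


def stage_increment(events: Iterable[ChangeEvent]) -> Dict[str, ChangeEvent]:
    """Keep only the latest change per primary key.

    The returned dict plays the part of the incremental table in this sketch.
    """
    staged: Dict[str, ChangeEvent] = {}
    for ev in events:
        current = staged.get(ev.pk)
        # A newer (or equally recent) event replaces whatever was staged before.
        if current is None or ev.ts >= current.ts:
            staged[ev.pk] = ev
    return staged
```

Keying the staged data by primary key means that an insert followed by a delete of the same row leaves only the delete pending, which keeps the later merge into the full data set small.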

Abstract

The invention provides a Spark-based data synchronization method for an e-commerce platform and belongs to the technical field of data processing. MySQL, the relational database of the e-commerce platform, serves as the data source and generates incremental data in real time. An ETL tool reads the MySQL binlog to obtain the incremental data, which is marked and stored in HBase. According to a configured synchronization rule, Spark periodically synchronizes the marked data from HBase to Hive. This solves the problem of synchronizing business data among multiple systems in an e-commerce platform.
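
As a rough illustration of the periodic synchronization step, the following Python sketch merges marked incremental records into a full snapshot. It is a sketch under assumptions rather than the method claimed in the patent: dictionaries stand in for the HBase incremental table and the Hive full table, no Spark API is used, and the mark values `"U"` (insert or update) and `"D"` (delete) as well as the function name `merge_increment` are hypothetical.

```python
from typing import Dict, Tuple

# Assumed mark values attached to each incremental record.
UPSERT = "U"  # row was inserted or updated at the source
DELETE = "D"  # row was deleted at the source


def merge_increment(
    full: Dict[str, dict],
    increment: Dict[str, Tuple[str, dict]],
) -> Dict[str, dict]:
    """Return a new full snapshot with the marked increment applied.

    `full` maps primary key -> row; `increment` maps primary key -> (mark, row).
    The input snapshot is not modified, so a failed run can simply be retried.
    """
    merged = dict(full)
    for pk, (mark, row) in increment.items():
        if mark == DELETE:
            merged.pop(pk, None)  # deleting a row that is already absent is a no-op
        elif mark == UPSERT:
            merged[pk] = row
        else:
            raise ValueError(f"unknown mark {mark!r} for key {pk!r}")
    return merged
```

Building a new snapshot instead of updating rows in place reflects the trade-off described in the background: deletes and updates are absorbed in the incremental store, while the full table is rewritten in bulk and never needs row-level transactions.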

Description

Technical field

[0001] The invention relates to data processing technology, in particular to a Spark-based data synchronization method in an e-commerce platform.

Background

[0002] Current Internet e-commerce platforms inevitably involve multiple systems, so data synchronization between systems is particularly important. When departments have different requirements for business data, such as data analysis or report display, the data needs to be migrated from the current business database to the corresponding target database. For data analysis, data is usually extracted to the target database with ETL tools, but big data platform data lakes such as Hive do not handle delete operations well.

[0003] Existing incremental synchronization for DB2 databases uses an overly simple synchronization strategy, has no elastic scaling design, and offers insufficient remediation when data synchronization fails.

[0004] Data synchroniza...

Application Information

IPC(8): G06F16/23, G06F16/25, G06F16/27
CPC: G06F16/2365, G06F16/254, G06F16/27, Y02D10/00
Inventor 张秀超
Owner INSPUR SOFTWARE CO LTD