
A method for synchronously updating data based on Spark

A synchronous data update technology, applied to database updating, electronic digital data processing, structured data retrieval, and related fields. It addresses complicated, error-prone update processes and improves performance by speeding up data reads and updates.

Active Publication Date: 2021-12-17
ZHEJIANG BAISHI TECH

AI Technical Summary

Problems solved by technology

[0009] To address the complicated and error-prone process of the prior art, the present invention provides a method for synchronously updating data based on Spark.

Examples

Embodiment Construction

[0022] The present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments.

[0023] As shown in Figure 1, a method for synchronously updating data based on Spark includes the following steps:

[0024] (1) Import data: use the Spark platform to read the target table from the database and store it in files on HDFS;

[0025] (2) Partition naming: partition the data in the HDFS files using Spark's partitioning method. Pull the target data for the required creation-time interval according to creation date, create and name a separate folder for each creation date in that interval, and store the folders in the original data directory on HDFS;

[0026] (3) Acquire updated data: pull the updated data from the database by filtering on the update date field, partition the updated data by creation time, and save it to the temporary updated data file of...
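The partitioning logic of steps (2) and (3) can be sketched in plain Python, without Spark or HDFS, to show how creation-date partitions let an update touch only the affected folders. This is a minimal illustration under stated assumptions, not the patented implementation: the record fields (`id`, `create_date`, `update_date`, `status`), the folder naming scheme `create_date=YYYY-MM-DD`, and the helper names are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def partition_name(create_date: date) -> str:
    """Folder name for one creation-date partition (naming scheme is an assumption)."""
    return f"create_date={create_date.isoformat()}"


def partition_by_create_date(rows, start, end):
    """Step (2): group rows created within [start, end] by partition folder name."""
    partitions = defaultdict(list)
    for row in rows:
        if start <= row["create_date"] <= end:
            partitions[partition_name(row["create_date"])].append(row)
    return dict(partitions)


def select_updates(rows, since):
    """Step (3): pull rows updated on or after `since`, grouped by creation-date partition."""
    updates = defaultdict(list)
    for row in rows:
        if row["update_date"] >= since:
            updates[partition_name(row["create_date"])].append(row)
    return dict(updates)


# Hypothetical sample data standing in for the database table.
rows = [
    {"id": 1, "create_date": date(2021, 1, 1), "update_date": date(2021, 1, 1), "status": "new"},
    {"id": 2, "create_date": date(2021, 1, 1), "update_date": date(2021, 1, 5), "status": "paid"},
    {"id": 3, "create_date": date(2021, 1, 2), "update_date": date(2021, 1, 2), "status": "new"},
]

original = partition_by_create_date(rows, date(2021, 1, 1), date(2021, 1, 2))
updated = select_updates(rows, since=date(2021, 1, 5))
```

Because the updated rows are keyed by the same partition name as the original data, only the partitions that appear in `updated` would need to be read and rewritten.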

Abstract

The invention relates to the technical field of data updating and discloses a method for synchronously updating data based on Spark, including: (1) importing data; (2) naming partitions; (3) acquiring updated data; (4) replacing original data with updated data: the primary key IDs of the table data in the updated data are compared with the primary key IDs of the target data, rows with the same primary key ID are replaced to form new target data, the primary key IDs of the updated data are cached in an update set, and a configured split value is used to judge the data size and classify the data for processing. The invention divides a large file into multiple small files, so that during sorting it is known which small file holds the updated data, and only those small files need to be read and traversed, which speeds up reading and updating. The sorting process combines two methods and uses Spark to achieve fast, distributed computation.
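The primary-key replacement in step (4) can be illustrated with a short plain-Python sketch. It is an assumption-laden illustration rather than the patent's Spark code: the function name `replace_by_primary_key`, the `id` key, and the sample rows are hypothetical, and the split-value classification is not modeled because the text above does not specify it.

```python
def replace_by_primary_key(target_rows, updated_rows, key="id"):
    """Replace target rows whose primary key also appears in the updated rows.

    Returns the new target data and the set of primary keys seen in the update.
    Updated rows with no matching target row are not appended here, since the
    text only describes replacing rows with the same primary key ID.
    """
    updated_by_id = {row[key]: row for row in updated_rows}
    update_ids = set(updated_by_id)  # the cached "update set" of primary key IDs
    merged = [updated_by_id.get(row[key], row) for row in target_rows]
    return merged, update_ids


# Hypothetical data for one partition.
target = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
updates = [{"id": 2, "status": "paid"}, {"id": 9, "status": "refunded"}]

merged, update_ids = replace_by_primary_key(target, updates)
```

Applied per partition, a lookup like this only has to traverse the small file that holds the matching creation date, which is the source of the read and update speed-up the abstract describes.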

Description

Technical field

[0001] The invention relates to the technical field of data updating, in particular to a method for synchronously updating data based on Spark.

Background technique

[0002] At present, with the popularization of the Internet, people use more and more electronic products, and the amount of data stored by Internet companies and related companies keeps growing. For large and medium-sized companies that generate tens of millions of business records every day, being able to run statistical analysis on this data, such as analysis of business growth trends and user behavior, is of great help to business growth.

[0003] However, databases such as Oracle and MySQL currently hold many tables with very large amounts of data. SQL analysis in Oracle hits a performance bottleneck: it is very slow and directly affects system stability. Big data technologies are therefore needed for data analysis.

[0004] Big data an...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/23, G06F16/2455
Inventors: 周韶宁, 金建华
Owner: ZHEJIANG BAISHI TECH