Template data processing method and device, server and storage medium
A template data processing technology, applied in the field of big data processing, that addresses problems such as reduced data processing efficiency and incompatibility between real-time processing and batch processing, and achieves stable and efficient processing.
Examples
Embodiment 1
[0043] Figure 1 is a flowchart of a template data processing method provided by Embodiment 1 of the present invention. This embodiment applies to obtaining data from a big data stream. The method can be executed by a server in the big data system and specifically includes the following steps:
[0044] S101. Acquire first data from a data stream.
[0045] In this step, the client SDK reports the event stream, and the server receives the event data and stores it in the Kafka cluster. The first data in this step is the raw data, before any processing. Optionally, the method further includes, after this step, storing the first data in the Kafka cluster.
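Step S101 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Each SDK event is assumed to arrive as a dict. An in-memory, partitioned log (`InMemoryTopic`, hypothetical) stands in for the Kafka cluster and is not the Kafka client API. The `device_id` field is also hypothetical. The point of the sketch is that the event is stored unchanged, which is what makes it "first data".

```python
import json
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


class InMemoryTopic:
    """Hypothetical stand-in for a Kafka topic: one append-only log per partition."""

    def __init__(self, num_partitions: int = 3) -> None:
        self.num_partitions = num_partitions
        self.partitions: Dict[int, List[bytes]] = defaultdict(list)

    def append(self, key: str, raw: bytes) -> Tuple[int, int]:
        # Hashing the key keeps all events for one key in one partition, preserving their order.
        # (Kafka's default partitioner uses murmur2; crc32 here is a simplification.)
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        log = self.partitions[partition]
        log.append(raw)
        return partition, len(log) - 1  # (partition, offset)


def acquire_first_data(event: Dict[str, object], topic: InMemoryTopic) -> Tuple[int, int]:
    """S101 sketch: store the reported event as-is, with no cleaning or transformation."""
    raw = json.dumps(event, sort_keys=True).encode("utf-8")
    # "device_id" is an assumed partitioning key, not a field named in the patent.
    return topic.append(str(event.get("device_id", "")), raw)
```

Keying by a per-device identifier means events from one client are kept in reporting order. Later processing stages would usually rely on that ordering.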
[0046] S102. Based on data processing requirements, generate multiple Flink tasks in the Flink framework to process the first data to generate second data.
[0047] In this step, multiple Flink tasks are written with the Flink framework, because Flink can build real-time data channels to move and con...
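The idea in S102 of several tasks transforming first data into second data can be illustrated with a simple chaining sketch. In Flink, each task would be built with the DataStream API. The Python generators below only imitate that record-at-a-time flow and are not Flink code. The task names (`drop_missing_event`, `normalize_event`) and the `event` field are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Task = Callable[[Iterable[Record]], Iterator[Record]]


def drop_missing_event(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical task: keep only records that carry an event name."""
    for record in records:
        if record.get("event"):
            yield record


def normalize_event(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical task: lower-case the event name without mutating the input record."""
    for record in records:
        yield {**record, "event": str(record["event"]).lower()}


def run_tasks(first_data: Iterable[Record], tasks: List[Task]) -> List[Record]:
    """Chain tasks lazily (each consumes the previous one's output) and materialize the second data."""
    stream: Iterable[Record] = first_data
    for task in tasks:
        stream = task(stream)
    return list(stream)
```

Task order matters here. Filtering before normalizing means `normalize_event` never sees a record without an event name.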
Embodiment 2
[0055] As shown in Figure 2, this embodiment provides a template data processing method. Building on the above embodiment, it describes in detail how multiple Flink tasks are generated in the Flink framework to process the first data and generate the second data. Each Flink task carries custom processing templates that can be added or removed, which makes the data processing rules extensible. The method includes the following steps:
[0056] S201. Acquire first data from a data stream.
[0057] S2021. Use a preset first ETL template to perform invalid data cleaning on the first data.
[0058] The first ETL template, and the second ETL template described below, are the programs executed during ETL processing; these programs are built as dynamic templates so that they can be customized and extended. The first ETL template is a common template (Common Template), which is used to perform a common cleaning step on the ...
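The template mechanism in Embodiment 2 can be sketched as follows. A task holds an ordered set of templates that can be added or removed at run time, and a common cleaning template is registered first. This is a sketch under assumptions, not the patent's code. `TemplateTask`, `common_clean_template`, and the validity rule in `REQUIRED_FIELDS` are all hypothetical, because the excerpt does not list the actual cleaning criteria.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, object]
Template = Callable[[Record], Optional[Record]]  # return None to drop the record

REQUIRED_FIELDS = ("event_id", "event", "ts")  # assumed validity rule, not taken from the patent


def common_clean_template(record: Record) -> Optional[Record]:
    """Hypothetical 'first ETL template': drop records that are invalid for every downstream task."""
    if any(record.get(name) in (None, "") for name in REQUIRED_FIELDS):
        return None  # a required field is missing or empty
    if not isinstance(record["ts"], (int, float)) or record["ts"] < 0:
        return None  # timestamp is malformed
    return record


class TemplateTask:
    """Stand-in for one Flink task whose rules are an ordered, editable set of templates."""

    def __init__(self) -> None:
        self._templates: Dict[str, Template] = {}  # dicts keep insertion order

    def add_template(self, name: str, template: Template) -> None:
        self._templates[name] = template  # re-adding a name replaces it in its original position

    def remove_template(self, name: str) -> None:
        self._templates.pop(name, None)

    def process(self, record: Record) -> Optional[Record]:
        for template in self._templates.values():
            result = template(record)
            if result is None:
                return None  # a template rejected the record
            record = result
        return record
```

Usage: register `common_clean_template` first so that every custom template (for example, a hypothetical lower-casing step) receives only valid records. A custom rule can later be removed with `remove_template` without touching the cleaning step.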
Embodiment 3
[0135] As shown in Figure 3, this embodiment provides a template data processing device 3, which includes the following modules:
[0136] An acquisition module 301, configured to acquire first data from a data stream;
[0137] The data processing module 302 is configured to generate a plurality of Flink tasks in the Flink framework based on data processing requirements to process the first data to generate second data;
[0138] The storage module 303 is configured to store data with high timeliness among the second data into the first cluster for real-time calculation, and store data with low timeliness among the second data into the second cluster for offline calculation.
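The three modules of device 3 can be wired together as shown below, with the storage module routing second data by timeliness. This is a hypothetical sketch. The excerpt does not define "high timeliness", so the sketch assumes an event counts as timely when it is at most `max_age_s` seconds old. Plain lists stand in for the first (real-time) and second (offline) clusters. The class and parameter names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, object]


@dataclass
class TemplateDataProcessingDevice:
    """Hypothetical wiring of modules 301 -> 302 -> 303; names are illustrative, not from the patent."""

    process: Callable[[Record], Optional[Record]]  # data processing module 302 (None = record dropped)
    max_age_s: float = 60.0                        # assumed timeliness threshold, in seconds
    first_cluster: List[Record] = field(default_factory=list)   # stand-in: real-time calculation
    second_cluster: List[Record] = field(default_factory=list)  # stand-in: offline calculation

    def acquire(self, stream: Iterable[Record]) -> Iterator[Record]:
        """Acquisition module 301: pass the raw first data through untouched."""
        return iter(stream)

    def store(self, record: Record, now: float) -> None:
        """Storage module 303: route by timeliness (assumed to mean event age)."""
        age = now - float(record["ts"])
        target = self.first_cluster if age <= self.max_age_s else self.second_cluster
        target.append(record)

    def run(self, stream: Iterable[Record], now: float) -> None:
        for first_data in self.acquire(stream):
            second_data = self.process(first_data)
            if second_data is not None:
                self.store(second_data, now)
```

Passing `now` explicitly keeps the routing decision deterministic and easy to test. A streaming deployment would more likely compare event time against a watermark.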
[0139] In an optional embodiment, as shown in Figure 4, the data processing module 302 includes:
[0140] A first cleaning unit, configured to use a preset first ETL template to perform invalid data cleaning on the first data, which includes:
[0141] The first judging subunit 30211 is used to judge whet...