Template data processing method and device, server and storage medium

A template data processing technology, applied in the field of big data processing, that addresses reduced data processing efficiency and the incompatibility between real-time processing and batch processing, and achieves stable and efficient processing.

Status: Pending
Publication Date: 2020-06-30
UBTECH ROBOTICS CORP LTD

AI Technical Summary

Problems solved by technology

In real-world scenarios, various complex business processes must be completed together with data cleaning.
[0004] Existing solutions make real-time processing and batch processing incompatible, which reduces data processing efficiency.

Examples

Embodiment 1

[0043] Figure 1 is a flow chart of a template data processing method provided by Embodiment 1 of the present invention. This embodiment applies to obtaining data from a big data stream. The method can be executed by a server in a big data system and specifically includes the following steps:

[0044] S101. Acquire first data from a data stream.

[0045] In this step, the client SDK reports the event stream, and the server receives the event data and stores it in a Kafka cluster. The first data referred to in this step is the raw data, without any processing. Optionally, this step is followed by storing the first data in the Kafka cluster.
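As a rough illustration of step S101, the sketch below keeps reported events exactly as received, so that what is later read back is unprocessed "first data". It is only a sketch under stated assumptions: the in-memory `raw_topic` queue is a hypothetical stand-in for the Kafka cluster, and `receive_event` and `acquire_first_data` are illustrative names that do not come from the patent text.

```python
import json
from collections import deque

# Hypothetical in-memory stand-in for a Kafka topic (assumption for this
# sketch); a real deployment would write to a Kafka cluster instead.
raw_topic = deque()


def receive_event(payload: str) -> None:
    """Store one reported event exactly as received, with no cleaning."""
    raw_topic.append(payload)


def acquire_first_data():
    """Yield the stored raw events in arrival order, unmodified."""
    while raw_topic:
        yield raw_topic.popleft()


# Usage: two events reported by a client, one of them malformed.
receive_event(json.dumps({"event": "click", "ts": 1}))
receive_event("not-json")
first_data = list(acquire_first_data())
```

Note that the malformed event is deliberately kept at this stage; removing invalid data belongs to the later cleaning steps.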

[0046] S102. Based on data processing requirements, generate multiple Flink tasks in the Flink framework to process the first data to generate second data.

[0047] In this step, multiple Flink tasks are written using the Flink framework, because Flink can build real-time data channels to move and con...

Embodiment 2

[0055] As shown in Figure 2, this embodiment provides a template data processing method. Building on the above embodiment, it describes in detail how multiple Flink tasks are generated in the Flink framework to process the first data and generate the second data. By adding a custom processing template that can be added or removed in each Flink task, the rules of the data processing program become extensible. The method includes the following steps:

[0056] S201. Acquire first data from a data stream.

[0057] S2021. Use a preset first ETL template to perform invalid data cleaning on the first data.

[0058] The first ETL template and the second ETL template described below are the execution programs used in the ETL process; each execution program uses a dynamic template so that the program can be customized and extended. The first ETL template is a common template (Common Template), which is used to perform a common cleaning step on the ...
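The idea of processing templates that can be added or removed can be sketched as a small registry of cleaning functions applied in order. This is a minimal sketch under stated assumptions, not the patent's implementation: `TemplateRegistry`, `common_template`, and `lowercase_event` are hypothetical names, and the cleaning rules shown (dropping records without an event name, lowercasing it) are invented for illustration because the source text is truncated before it specifies them.

```python
from typing import Callable, Dict, Optional

Record = dict
# A template returns the cleaned record, or None to drop the record.
Template = Callable[[Record], Optional[Record]]


class TemplateRegistry:
    """Ordered set of named templates that can be added or removed at run time."""

    def __init__(self) -> None:
        self._templates: Dict[str, Template] = {}

    def add(self, name: str, template: Template) -> None:
        self._templates[name] = template

    def remove(self, name: str) -> None:
        self._templates.pop(name, None)

    def apply(self, record: Record) -> Optional[Record]:
        # Apply templates in insertion order; stop once a record is dropped.
        for template in list(self._templates.values()):
            record = template(record)
            if record is None:
                return None
        return record


def common_template(record: Record) -> Optional[Record]:
    """Hypothetical common cleaning rule: drop records without an event name."""
    return record if record.get("event") else None


def lowercase_event(record: Record) -> Optional[Record]:
    """Hypothetical custom rule: normalize the event name to lower case."""
    return {**record, "event": record["event"].lower()}


registry = TemplateRegistry()
registry.add("common", common_template)
registry.add("lowercase", lowercase_event)

cleaned = registry.apply({"event": "CLICK"})
dropped = registry.apply({"event": ""})

# Removing a template changes the behaviour without touching the others.
registry.remove("lowercase")
unchanged = registry.apply({"event": "CLICK"})
```

Keeping the templates in a name-keyed mapping is one simple way to let rules be swapped without rewriting the surrounding task; how the patent loads templates dynamically is not shown in the available text.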

Embodiment 3

[0135] As shown in Figure 3, this embodiment provides a template data processing device 3, including the following modules:

[0136] An acquisition module 301, configured to acquire first data from a data stream;

[0137] The data processing module 302 is configured to generate a plurality of Flink tasks in the Flink framework based on data processing requirements to process the first data to generate second data;

[0138] The storage module 303 is configured to store data with high timeliness among the second data into the first cluster for real-time calculation, and store data with low timeliness among the second data into the second cluster for offline calculation.
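The three modules above can be sketched as methods on a single class, with the storage step routing each record by timeliness. This is an illustrative sketch only: the class name, the two in-memory lists standing in for the first and second clusters, the placeholder cleaning rule, and the `high_timeliness` flag are all assumptions, since the text does not say how timeliness is determined.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class TemplateDataDevice:
    # In-memory stand-ins for the first (real-time) and second (offline) clusters.
    realtime_cluster: List[dict] = field(default_factory=list)
    offline_cluster: List[dict] = field(default_factory=list)

    def acquire(self, stream: Iterable[dict]) -> List[dict]:
        """Acquisition module 301: read the first data from a data stream."""
        return list(stream)

    def process(self, first_data: List[dict]) -> List[dict]:
        """Data processing module 302: placeholder cleaning that drops
        records without an event name (an assumed rule)."""
        return [r for r in first_data if r.get("event")]

    def store(self, second_data: List[dict]) -> None:
        """Storage module 303: route records by a hypothetical
        'high_timeliness' flag."""
        for record in second_data:
            if record.get("high_timeliness", False):
                self.realtime_cluster.append(record)
            else:
                self.offline_cluster.append(record)


device = TemplateDataDevice()
stream = [
    {"event": "alarm", "high_timeliness": True},
    {"event": "daily_log", "high_timeliness": False},
    {"high_timeliness": True},  # no event name, removed during processing
]
device.store(device.process(device.acquire(stream)))
```

After this run, the alarm record sits in the real-time stand-in and the daily log record in the offline one, which mirrors the split between real-time and offline calculation described for module 303.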

[0139] In an alternative embodiment, as shown in Figure 4, the data processing module 302 includes:

[0140] The first cleaning unit, configured to use a preset first ETL template to perform invalid data cleaning on the first data, and including:

[0141] The first judging subunit 30211 is used to judge whet...

Abstract

The invention discloses a template data processing method. The method comprises: obtaining first data from a data stream; generating, based on data processing requirements, a plurality of Flink tasks in the Flink framework to process the first data and generate second data; and storing data with high timeliness among the second data in a first cluster for real-time calculation, and data with low timeliness among the second data in a second cluster for offline calculation. The invention further discloses a template data processing device, a server and a storage medium. Because the processed data are distinguished by timeliness and stored in different clusters, real-time processing and batch processing of the data are made compatible.

Description

Technical field

[0001] Embodiments of the present invention relate to big data processing technologies, and in particular to a template data processing method, device, server and storage medium.

Background

[0002] Big data now touches every aspect of life. Smart traffic scheduling, smart medical forecasting, financial stock analysis, e-commerce shopping recommendations and similar applications all rely on big data technology. Data cleaning is a crucial part of the entire big data pipeline and is also the basis of data analysis. ETL comprises three stages: Extract, Transform and Load.

[0003] In the process of data cleaning, the data required by the business must be extracted from massive data to support real-time and offline computing. When the received data becomes more complex, the program should be able to extend its parsing methods dynamically, without code changes or a program restart. And in ac...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06F16/25
CPC: G06F16/215, G06F16/254
Inventor: 曾逸清, 熊友军
Owner: UBTECH ROBOTICS CORP LTD