Data quality monitoring method and apparatus, and big data computing platform

A data quality monitoring and computing platform technology, applied in the field of data processing. It addresses problems such as insufficient evaluation of service quality, false alarms, and inaccurate alarms, with the effect of optimizing big data computing services, reducing the false alarm rate, and reducing workload.

Active Publication Date: 2018-04-20
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

[0006] Second, the business data produced each day changes constantly, yet it is all checked against the same threshold, which may cause false alarms.
[0007] Third, after the threshold is set, if upstream relationships or the business change in ways that affect the data but the threshold setter is unaware of this, and the original threshold is still used fo...



Examples


Embodiment 1

[0046] This embodiment uses statistical prediction: it predicts the current data of a parameter from the parameter's historical data, collects the parameter's current data, compares it with the prediction result, and performs alarm processing according to the comparison result. The embodiment concerns quality monitoring of data in a data warehouse that is deployed in a computing cluster providing big data computing services. The present invention is not limited to this, however, and can also be used for data quality monitoring in other systems and on other nodes.

[0047] As shown in Figure 1, the data quality monitoring method of this embodiment includes:

[0048] Step 110: collect historical data related to the parameter to be monitored, and use the historical data as samples to build a prediction model for the parameter;

[0049] Statistical forecasting belongs to the research category of forecasting...
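
The flow described in this embodiment (build a model from history, predict the current value, derive a monitoring threshold, compare, and alarm) can be sketched as follows. This is only an illustration under stated assumptions: the excerpt does not say which prediction model is used, so a least-squares linear trend is swapped in here, and the band width `k` and all function names are hypothetical.

```python
from statistics import mean, stdev


def fit_linear_trend(history):
    """Least-squares line through (0, y0), (1, y1), ...; returns (slope, intercept).

    Assumption: a linear trend stands in for the unspecified prediction model.
    """
    xs = range(len(history))
    x_bar = mean(xs)
    y_bar = mean(history)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def monitoring_range(history, k=3.0):
    """Predict the next value and return the (low, high) normal range around it.

    The range is prediction +/- k standard deviations of the fit residuals;
    k is a hypothetical tuning parameter, not a value from the source.
    """
    slope, intercept = fit_linear_trend(history)
    residuals = [y - (slope * x + intercept) for x, y in enumerate(history)]
    predicted = slope * len(history) + intercept
    spread = stdev(residuals)
    return predicted - k * spread, predicted + k * spread


def check(history, current, k=3.0):
    """Compare the collected current value with the normal range."""
    low, high = monitoring_range(history, k)
    return "ok" if low <= current <= high else "alarm"


if __name__ == "__main__":
    daily_counts = [100, 112, 119, 131, 140, 151]  # made-up sample history
    print(monitoring_range(daily_counts))
    print(check(daily_counts, 161), check(daily_counts, 120))
```

Because the range is recomputed from the latest history on every run, the threshold follows the data instead of staying fixed, which is the behaviour the embodiment contrasts with manually set thresholds.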

Embodiment 2

[0094] To address the problem that existing big data computing services cannot provide users with data quality monitoring, this embodiment provides a user data quality monitoring method for big data computing services. As shown in Figure 3, the method includes:

[0095] Step 210: the big data computing platform collects, from the stored user data, historical data related to the parameter to be monitored, predicts the current data of the parameter from the historical data, and obtains a prediction result;

[0096] The big data computing platform may be, for example, a cloud computing platform, and the user data may be the data of individual users or of enterprise users. The data may be of various kinds, such as data generated when the user performs business processing on the big data computing platform, log data, or data generated by accessing the big data computing platform.

[0097] The user data alarm of the big data computing service can be used as a val...

Example 1

[0130] The data monitored in this example is the data in an offline data warehouse in a cloud computing system. The data in the offline warehouse must be imported regularly from front-end databases such as MySQL or Oracle business databases. The imported data is stored as data tables, which can be called source data or source tables. Summary tables are often generated from this source data. The monitored data can relate either to a source table or to a summary table generated from the source tables.
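
As a small illustration of the source-table and summary-table relationship described above, the sketch below builds both in an in-memory SQLite database and collects one possible monitored parameter, the record count, for each. SQLite stands in for the offline warehouse purely for demonstration, and the table names, columns, and helper functions are hypothetical.

```python
import sqlite3


def build_demo_warehouse():
    """Create a toy source table and a summary table derived from it."""
    conn = sqlite3.connect(":memory:")
    # Source table: stands in for data imported from a front-end database.
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10, 5.0), (2, 10, 7.5), (3, 11, 2.0)],
    )
    # Summary table: generated from the source table.
    conn.execute(
        "CREATE TABLE orders_by_user AS "
        "SELECT user_id, COUNT(*) AS n_orders, SUM(amount) AS total "
        "FROM orders GROUP BY user_id"
    )
    return conn


def collect_row_counts(conn, tables):
    """Return {table_name: record_count} for each monitored table."""
    counts = {}
    for table in tables:
        # Table names cannot be bound as SQL parameters, so they are quoted
        # and must come from a trusted, fixed list as they do here.
        row = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        counts[table] = row[0]
    return counts


if __name__ == "__main__":
    conn = build_demo_warehouse()
    print(collect_row_counts(conn, ["orders", "orders_by_user"]))
```

Collected once per import cycle, such counts would form the historical series that a prediction model is fitted to.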

[0131] In this example, when monitoring data quality, an appropriate prediction model can be selected according to the characteristics of the data table. For example, when monitoring the number of records in the list of registered users: after a user registers, logging out does not delete the record but only marks its status as logged out, so the number of records in the list of regis...
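
Selecting a prediction model by table characteristics could look roughly like the sketch below. The source paragraph is truncated, so the two table kinds ("cumulative" and "periodic"), the two predictors, and the 7-day period are all assumptions made for illustration and are not the models named in the patent.

```python
from statistics import mean


def predict_cumulative(history):
    """Assumed model for a table whose record count only grows:
    last value plus the average step between consecutive observations."""
    steps = [b - a for a, b in zip(history, history[1:])]
    return history[-1] + mean(steps)


def predict_periodic(history, period=7):
    """Assumed model for a series with a fixed cycle: average of the values
    one, two, ... whole periods before the point being predicted.
    Needs at least `period` observations."""
    same_phase = history[-period::-period]
    return mean(same_phase)


# Hypothetical registry mapping a table characteristic to a predictor.
PREDICTORS = {
    "cumulative": predict_cumulative,
    "periodic": predict_periodic,
}


def predict(history, table_kind):
    """Dispatch to the predictor registered for this kind of table."""
    return PREDICTORS[table_kind](history)


if __name__ == "__main__":
    print(predict([100, 110, 125], "cumulative"))
    print(predict(list(range(14)), "periodic"))
```

Keeping the choice in a lookup table means a new table characteristic only needs a new entry, not changes to the monitoring loop.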



Abstract

The invention discloses a data quality monitoring method and apparatus, and a big data computing platform. According to the data quality monitoring apparatus, historical data related to a parameter to be monitored is collected, and a prediction model of the parameter is built by taking the historical data as a sample; according to the prediction model, current data of the parameter is predicted, and according to a prediction result, a monitoring threshold of the parameter is determined; and the current data of the parameter is collected and compared with a normal value range defined by the monitoring threshold, and according to a comparison result, alarm processing is performed. Intelligent setting of the monitoring threshold is realized. Compared with manual threshold setting, the accuracy is higher and the false alarm rate is reduced. Based on a prediction analysis method, the invention furthermore provides a big data computing service-based user data quality monitoring method and a corresponding big data computing platform.

Description

Technical field

[0001] The present invention relates to data processing, and more specifically, to a data quality monitoring method and apparatus and a big data computing platform.

Background

[0002] Big data has not only become the strategic direction of major Internet companies; other industries have also begun to explore it. The data quality problems that accompany big data, however, are much more severe than those in traditional databases. Big data services perform PB-level data computations every day. To ensure data quality, monitoring the data as it is produced is particularly important. If the quality of the data content does not meet the standard, data monitoring can raise an alarm to inform the user and so avoid larger-scale pollution of downstream data.

[0003] The Data Quality Center (DQC) system can monitor the data of big data computing services such as MaxCompute (formerly known as ODPS). Once t...


Application Information

IPC(8): G06F17/30, G06F11/32
CPC: G06F11/327, G06F16/215, G06F16/219
Inventor 解敏陈欢范茸
Owner ALIBABA GRP HLDG LTD