Method and system for data quality monitoring

A data quality and data volume technology applied in the field of big data processing. It addresses problems of existing approaches such as increased development workload, heavy occupation of cluster computing resources, and interference with production data output, and thereby improves effective resource utilization, reduces resource usage, and reduces development workload.

Active Publication Date: 2020-01-07
GEO POLYMERIZATION (BEIJING) ARTIFICIAL INTELLIGENCE TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] Existing data quality monitoring technology can only target data that has already been produced: a separate MapReduce job is run to scan the entire data set, and only then is the quality of that data known. In other words, the data must be scanned a second time, and its quality cannot be obtained directly at the moment it is output. If many data sets need quality monitoring and the cluster is already under heavy computing load, this approach occupies a large share of the cluster's computing resources and seriously affects the output of other production data.
At the same time, a different MapReduce monitoring job must be developed for each kind of data. If there is a lot of data to monitor, many MapReduce jobs must be developed, which increases the development workload.



Examples


Embodiment Construction

[0017] As shown in Figure 1, the method for data quality monitoring includes the following steps:

[0018] (1) Configure the monitoring rules: the data quality monitoring rules are implemented by means of the scheduling and computing framework;

[0019] (2) Transmit the monitoring rules: the scheduling system passes the monitoring rules to the computing framework as configuration information, and the computing framework passes this rule information on through the context;

[0020] (3) Identify the monitoring rules: the computing framework obtains the monitoring rules from the context information. By default, the computing framework treats configuration entries beginning with the ruler string as the configuration of data quality monitoring rules. It applies these monitoring rules to each row of data to be output, judges whether each row satisfies them, and counts the satisfied ones respecti...
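Step (3) can be illustrated with a minimal Python sketch. It is not the patent's implementation: the only detail taken from the text is that configuration entries beginning with the string ruler are treated as monitoring rules. The key layout after the prefix, the "column:check" rule syntax, the two checks, and the function names are all assumptions made for illustration. Because the source text is truncated at this point, counting violated rows alongside satisfied ones is also an assumption.

```python
from collections import Counter

RULE_PREFIX = "ruler"  # prefix named in the text; everything after it is assumed


def extract_rules(context):
    """Keep only the context entries whose key starts with the rule prefix."""
    return {k: v for k, v in context.items() if k.startswith(RULE_PREFIX)}


def parse_rule(spec):
    """Turn a hypothetical 'column:check' spec into a predicate over a row dict."""
    column, check = spec.split(":", 1)
    if check == "not_empty":
        return lambda row: row.get(column) not in (None, "")
    if check == "is_digit":
        return lambda row: str(row.get(column, "")).isdigit()
    raise ValueError(f"unknown check: {check}")


def count_row(rules, row, counters):
    """Apply every rule to one output row and update the per-rule counters."""
    for name, predicate in rules.items():
        outcome = "satisfied" if predicate(row) else "violated"  # 'violated' is assumed
        counters[f"{name}.{outcome}"] += 1


# Hypothetical context: two rule entries plus one unrelated job setting.
context = {
    "ruler.user_id_present": "user_id:not_empty",
    "ruler.age_numeric": "age:is_digit",
    "job.name": "daily_export",
}
rules = {name: parse_rule(spec) for name, spec in extract_rules(context).items()}
counters = Counter()
for row in [{"user_id": "u1", "age": "30"}, {"user_id": "", "age": "x"}]:
    count_row(rules, row, counters)
```

The unrelated `job.name` entry is ignored because it lacks the prefix, so adding a rule for a new data set only requires a new configuration entry rather than a new monitoring job.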



Abstract

The invention provides a data quality monitoring method comprising the following steps: (1) configuring monitoring rules: the data quality monitoring rules are implemented by means of a scheduling front end; (2) transmitting the monitoring rules: the scheduling system passes the monitoring rules to a computing framework as configuration information, and the computing framework passes the rule information on through the context; (3) identifying the monitoring rules: the computing framework obtains the monitoring rules from the context information and, by default, treats configuration information starting with the ruler character string as the configuration of the data quality monitoring rules; the computing framework applies the monitoring rules to each row of data to be produced, judges whether each row meets the monitoring rules, and counts the rows that meet them; (4) outputting monitoring data: when data production is finished, the data quality monitoring data is output in the form of counters. The invention also provides a data quality monitoring system.
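The single-pass idea in steps (3) and (4) can be sketched as follows: rows are checked while they are being written, and the counters are reported once when production ends, so no second scan of the output is needed. This is an illustrative Python sketch under stated assumptions, not the patented system. The `MonitoredWriter` class, the list used as an output sink, the `rows_total` counter, and the example rule name are all hypothetical.

```python
from collections import Counter


class MonitoredWriter:
    """Hypothetical wrapper that checks each row at the moment it is written."""

    def __init__(self, sink, rules):
        self.sink = sink            # any object with append(), e.g. a list
        self.rules = rules          # mapping: rule name -> predicate(row) -> bool
        self.counters = Counter()

    def write(self, row):
        # Evaluate every rule on the row in the same pass that produces it.
        for name, predicate in self.rules.items():
            if predicate(row):
                self.counters[name] += 1
        self.counters["rows_total"] += 1
        self.sink.append(row)       # the row is produced whatever the rule outcome

    def close(self):
        """Call once when production ends; returns the counters as a plain dict."""
        return dict(self.counters)


# Hypothetical rule: a phone number must have exactly 11 characters.
rules = {"ruler.phone_len_11": lambda r: len(r.get("phone", "")) == 11}
sink = []
writer = MonitoredWriter(sink, rules)
for row in [{"phone": "13800000000"}, {"phone": "123"}]:
    writer.write(row)
report = writer.close()
```

Keeping a total row count next to the per-rule counts is a design choice of this sketch: it lets a consumer derive a pass rate from the counters alone, without rereading the produced data.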

Description

Technical field

[0001] The invention relates to the technical field of big data processing, in particular to a data quality monitoring method and a data quality monitoring system.

Background

[0002] Existing data quality monitoring solutions can only target existing data: MapReduce jobs are developed separately according to business needs, the business monitoring rules are implemented in the MapReduce job code, and the job is run to scan the entire data set and finally output data quality information. See, for example, the Chinese patent application "System and method for data quality monitoring", application number CN201210225743.X.

[0003] Existing data quality monitoring technology can only target data that has already been produced: a separate MapReduce job is run to scan the entire data set, and only then is the quality of that data known. In other words, the data must be scanned repeatedly and, when the data is output, its quality cannot be obtai...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/215, G06Q10/10
CPC: G06F16/215, G06Q10/10
Inventor: 何良均, 张翼, 温宗臣, 冯森林, 范卫卫, 李冰, 张书凡
Owner: GEO POLYMERIZATION (BEIJING) ARTIFICIAL INTELLIGENCE TECH CO LTD