
Data quality analysis

A data quality analysis technique, applied in the fields of electrical digital data processing, special data processing applications, and natural language data processing, that addresses problems such as poor data quality in datasets.

Active Publication Date: 2018-03-16
INITIO TECH

AI Technical Summary

Problems solved by technology

Typically, when an error occurs during the processing of a dataset, the dataset has poor data quality.




Embodiment Construction

[0039] A method for identifying the root cause of data quality issues based on data lineage analysis is described here. If a data quality issue is identified in a downstream dataset, the upstream datasets and upstream transformation elements (sometimes referred to as upstream data lineage elements) from which the downstream dataset is derived are identified. The quality of each upstream data lineage element is evaluated to identify one or more upstream data lineage elements that may themselves have data quality issues that result in data quality issues in the downstream dataset. In some examples, the profile characterizing each upstream dataset is compared to a base profile, such as a historical average profile, for that dataset to determine whether the dataset has data quality issues. In some examples, a value in a field of an upstream dataset is compared to one or more allowed or forbidden values for the field to determine whether the dataset has a data quality issue.
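As an illustration of the first step in paragraph [0039], the sketch below collects every upstream data lineage element of a downstream dataset. It is a minimal sketch, not the patented implementation: the dictionary representation of lineage, the function name `upstream_elements`, and the element names (`report`, `join_step`, and so on) are all hypothetical and chosen only for this example.

```python
from collections import deque


def upstream_elements(lineage, target):
    """Return all elements that `target` is derived from, nearest first.

    `lineage` is an assumed representation: it maps each element (dataset
    or transformation) to the list of elements it is directly derived from.
    """
    seen = set()
    order = []
    queue = deque(lineage.get(target, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # an element can be reachable through several paths
        seen.add(node)
        order.append(node)
        queue.extend(lineage.get(node, []))
    return order


# Hypothetical lineage: a report built by a join over two datasets,
# one of which is itself produced by a cleaning step.
lineage = {
    "report": ["join_step"],
    "join_step": ["customers", "orders"],
    "orders": ["clean_step"],
    "clean_step": ["raw_orders"],
}

candidates = upstream_elements(lineage, "report")
print(candidates)
```

Each element in `candidates` would then be evaluated individually, for example by comparing its profile to a base profile or by checking field values against allowed or forbidden values, as the paragraph describes.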

...


Abstract

A method includes receiving information indicative of an output dataset generated by a data processing system; identifying, based on data lineage information relating to the output dataset, one or more upstream datasets on which the output dataset depends; and analyzing one or more of the identified upstream datasets on which the output dataset depends. The analyzing includes, for each particular upstream dataset of the one or more upstream datasets, applying one or more of: (i) a first rule indicative of an allowable deviation between a profile of the particular upstream dataset and a reference profile for the particular upstream dataset, and (ii) a second rule indicative of one or more allowable values or prohibited values for each of one or more data elements in the particular upstream dataset, and, based on the results of applying the one or more rules, selecting one or more of the upstream datasets. The method includes outputting information associated with the selected one or more upstream datasets.
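The two rules in the abstract can be sketched as follows. This is a hedged illustration under stated assumptions, not the claimed method: the abstract does not say how a profile is represented or how deviation is measured, so the example assumes a profile is a dictionary of numeric metrics and that the allowable deviation is a relative tolerance. All function names, field names, and sample values are hypothetical.

```python
def violates_profile_rule(profile, reference, max_relative_deviation):
    """First rule (assumed form): any metric deviating from the reference
    profile by more than the relative tolerance counts as a violation."""
    for metric, expected in reference.items():
        actual = profile.get(metric)
        if actual is None:
            return True  # a missing metric is treated as a violation
        if expected == 0:
            if actual != 0:
                return True
            continue
        if abs(actual - expected) / abs(expected) > max_relative_deviation:
            return True
    return False


def violates_value_rule(records, field, allowed=None, prohibited=None):
    """Second rule: a field value outside `allowed`, or inside
    `prohibited`, counts as a violation."""
    for record in records:
        value = record.get(field)
        if allowed is not None and value not in allowed:
            return True
        if prohibited is not None and value in prohibited:
            return True
    return False


def select_suspects(datasets, max_relative_deviation):
    """Select the upstream datasets that violate at least one rule."""
    suspects = []
    for name, info in datasets.items():
        bad = violates_profile_rule(
            info["profile"], info["reference"], max_relative_deviation
        )
        for rule in info.get("value_rules", []):
            bad = bad or violates_value_rule(info["records"], **rule)
        if bad:
            suspects.append(name)
    return suspects


# Hypothetical upstream datasets with made-up profiles and records.
datasets = {
    "customers": {
        "profile": {"row_count": 1000},
        "reference": {"row_count": 980},
        "records": [{"country": "US"}, {"country": "DE"}],
        "value_rules": [{"field": "country", "allowed": {"US", "DE"}}],
    },
    "orders": {
        "profile": {"row_count": 400},
        "reference": {"row_count": 1000},
        "records": [{"status": "shipped"}],
    },
    "payments": {
        "profile": {"row_count": 500},
        "reference": {"row_count": 500},
        "records": [{"currency": "XXX"}],
        "value_rules": [{"field": "currency", "prohibited": {"XXX"}}],
    },
}

suspects = select_suspects(datasets, max_relative_deviation=0.1)
print(suspects)
```

Here `orders` is selected because its row count deviates from its reference by more than the tolerance, and `payments` because it contains a prohibited value; `customers` passes both rules.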

Description

Background

[0001] This specification relates to data quality analysis. The data quality of a dataset indicates whether the data records in the dataset have errors. Typically, where an error occurs during processing of a dataset, the dataset has poor data quality.

Summary

[0002] In a general aspect, a method includes: receiving information indicative of an output dataset generated by a data processing system; identifying, based on data lineage information related to the output dataset, one or more upstream datasets on which the output dataset depends; and analyzing one or more of the upstream datasets on which the output dataset depends. The analysis includes, for each particular upstream dataset of the one or more upstream datasets, applying one or more of the following rules: (i) a first rule indicating the allowable relationship between the profile of the speci...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/215, G06F16/2365, G06F8/65, G06F40/197, G06F16/24568, G06F16/248
Inventor: C. Spitz (C·斯皮茨), Joel Gould (乔尔·古尔德)
Owner: INITIO TECH