Data cleaning framework evaluation method, device and equipment and storage medium

A data cleaning framework technology, applied in the field of data cleaning, that addresses the problems of prolonged quality control time, a long quality control period, and an extended data cleaning cycle.

Pending Publication Date: 2021-07-16
BEIJING MEDICAL CROSS THE CLOUD TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] The quality control period of existing data cleaning framework evaluation methods is long: finding the production logic relationships among the data takes a great deal of time (and when there are more than two data sources, the quality control time grows further), which seriously slows the upgrade iteration of the data cleaning framework.
Moreover, even if a problem is found during an upgrade of the data cleaning framework, it cannot be located quickly, which increases the workload of subsequent code inspections and prolongs the data cleaning cycle.

Embodiment Construction

[0040] Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals denote the same or similar structures in the drawings, and thus their repeated descriptions will be omitted.

[0041] Figure 1 is a flowchart of the evaluation method of the data cleaning framework of the present invention. As shown in Figure 1, an embodiment of the present invention provides a method for evaluating a data cleaning framework, comprising the following steps:

[0042] S101. Clean the original database according to the second data cleaning framework to obtain a second data set, and the second data cleaning fram...

Abstract

The invention provides a data cleaning framework evaluation method, device, equipment and storage medium. The method comprises the following steps: cleaning an original database according to a second data cleaning framework to obtain a second data set, the second data cleaning framework being an upgraded version of a first data cleaning framework; comparing the fields in a first data set with the fields in the second data set and establishing a difference data set according to the comparison result, wherein the first data set is obtained by cleaning the original database according to the first data cleaning framework; selecting a cleaning field in the difference data set and carrying out data tracing to obtain the original field information corresponding to that cleaning field in the original database; and comparing the cleaning field information in the difference data set with the original field information, and evaluating the second data cleaning framework on that basis. The method and device use data traceability, through the production logic relationship between the data source and the data, to quickly verify the effect of a cleaning framework upgrade on the original data.
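The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the record layout, the two toy frameworks (`framework_v1`, `framework_v2`), the `lineage` mapping from cleaned fields to raw fields, and all sample values are hypothetical assumptions made only to show the flow of cleaning, comparing, tracing and reviewing.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Dataset = Dict[str, Record]              # record id -> record
Framework = Callable[[Record], Record]   # hypothetical: one raw record in, one cleaned record out


def clean(raw: Dataset, framework: Framework) -> Dataset:
    """Apply a cleaning framework to every record of the original database."""
    return {rid: framework(rec) for rid, rec in raw.items()}


def build_difference_set(first: Dataset, second: Dataset) -> List[dict]:
    """Compare the two cleaned data sets field by field and keep the mismatches."""
    diffs = []
    for rid in sorted(first.keys() | second.keys()):
        rec1, rec2 = first.get(rid, {}), second.get(rid, {})
        for field in sorted(rec1.keys() | rec2.keys()):
            if rec1.get(field) != rec2.get(field):
                diffs.append({"id": rid, "field": field,
                              "v1": rec1.get(field), "v2": rec2.get(field)})
    return diffs


def trace_to_original(diffs: List[dict], raw: Dataset,
                      lineage: Dict[str, str]) -> List[dict]:
    """Attach to each differing field the raw value it was derived from."""
    traced = []
    for d in diffs:
        raw_field = lineage.get(d["field"], d["field"])
        original = raw.get(d["id"], {}).get(raw_field)
        traced.append({**d, "raw_field": raw_field, "original": original})
    return traced


# Hypothetical first framework: keeps age as the raw text.
def framework_v1(rec: Record) -> Record:
    return {"name": rec["patient_name"].strip(), "age": rec["age_text"]}


# Hypothetical upgraded framework: parses age into an integer, or None if unparseable.
def framework_v2(rec: Record) -> Record:
    age = rec["age_text"].strip()
    return {"name": rec["patient_name"].strip(),
            "age": int(age) if age.isdigit() else None}


# Hypothetical original database and field lineage (assumed, not from the source).
raw_db: Dataset = {
    "r1": {"patient_name": " Li Wei ", "age_text": "42"},
    "r2": {"patient_name": "Zhang Min", "age_text": "unknown"},
}
lineage = {"name": "patient_name", "age": "age_text"}

first_set = clean(raw_db, framework_v1)
second_set = clean(raw_db, framework_v2)
report = trace_to_original(build_difference_set(first_set, second_set),
                           raw_db, lineage)

# Each row pairs both cleaned values with the traced raw value, so a reviewer
# can judge whether the upgraded framework handled that field better.
for row in report:
    print(row)
```

Only fields whose values differ between the two cleaned data sets are traced back, which keeps the review limited to what the upgrade actually changed. How the final evaluation is scored is not specified in this excerpt, so the sketch stops at producing the side-by-side report.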

Description

Technical field

[0001] The present invention relates to the field of data cleaning, and in particular to an evaluation method for a data cleaning framework and a corresponding device, equipment and storage medium.

Background

[0002] Data cleaning is the process of re-examining and verifying data, with the purpose of removing duplicate information, correcting existing errors, and ensuring data consistency. As the name suggests, data cleaning "washes out" the "dirty" data: it is the last procedure for finding and correcting identifiable errors in data files, including checking data consistency and handling invalid and missing values. Because the data in a data warehouse is a collection of data oriented to a certain topic, extracted from multiple business systems and containing historical data, it is unavoidable that some data are wrong and some data conflict with each other. These wrong or conflicting data are obviously unwan...

Application Information

IPC(8): G06F16/215, G06F16/21, G06F16/242
CPC: G06F16/215, G06F16/21, G06F16/2433
Inventor: 付麟钧
Owner: BEIJING MEDICAL CROSS THE CLOUD TECH CO LTD