Data cleaning and data quality visualization method

A data quality and data cleaning technology, applied in the computer field, that addresses problems such as quality results that are hard for users to understand and master and the lack of integrated data detection and quality assessment, and that simplifies preparation work.

Status: Inactive
Publication Date: 2022-07-01
南通展飞智能装备科技有限公司

AI Technical Summary

Problems solved by technology

[0003] Some data cleaning and data quality assessment technologies already exist, but none integrates data detection, cleaning and quality assessment. Existing data cleaning technologies can only apply their built-in methods mechanically and cannot meet the specific needs of users. Existing data quality assessment technologies present their conclusions as numbers or text, which makes the results hard for users to understand and master.



Examples


Embodiment Construction

[0024] A data cleaning and data quality visualization method, as shown in Figure 1, includes the following steps:

[0025] 1) Import the data set to be cleaned and quality assessed, denoted as P1;

[0026] 2) Detect the overall missing rate p of the data set P1; if p ≤ 50%, go to the next step, otherwise go to step 8);

[0027] 3) Select the missing value processing rule through the graphical interface, and execute the rule to obtain the processed data set P2. The specific method is as follows:

[0028] 3.1) Detect the missing rate p of each field in the data set and display it on the graphical interface; the missing rate of a field is the ratio of the number of missing values to the total number of values in that column;

[0029] 3.2) The user subjectively judges the importance of each field in the data set based on actual needs;

[0030] 3.3) The user selects the missing value processing rule in the graphical interface according to the two reference factors of importance and missi...
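The missing-rate checks in steps 2) and 3.1) can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's implementation: the row representation (a list of dictionaries with `None` marking a missing value), the function names, and the sample fields `temp` and `speed` are all assumptions, and the overall missing rate is read here as missing cells divided by all cells.

```python
from typing import Dict, List, Optional

# Assumed representation: each row maps a field name to a value, None = missing.
Row = Dict[str, Optional[float]]


def field_missing_rates(rows: List[Row], fields: List[str]) -> Dict[str, float]:
    """Per-field missing rate: missing values / total values in the column."""
    total = len(rows)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for r in rows if r.get(f) is None) / total
        for f in fields
    }


def overall_missing_rate(rows: List[Row], fields: List[str]) -> float:
    """Overall missing rate, taken here as missing cells / all cells (assumption)."""
    cells = len(rows) * len(fields)
    if cells == 0:
        return 0.0
    missing = sum(1 for r in rows for f in fields if r.get(f) is None)
    return missing / cells


def passes_missing_gate(rows: List[Row], fields: List[str],
                        threshold: float = 0.5) -> bool:
    """Step 2): continue cleaning only if the overall missing rate is <= 50%."""
    return overall_missing_rate(rows, fields) <= threshold


# Hypothetical data set P1 with two fields.
P1: List[Row] = [
    {"temp": 21.5, "speed": None},
    {"temp": None, "speed": 3.0},
    {"temp": 22.0, "speed": 3.2},
    {"temp": 20.9, "speed": None},
]
FIELDS = ["temp", "speed"]

if __name__ == "__main__":
    print(overall_missing_rate(P1, FIELDS))
    print(field_missing_rates(P1, FIELDS))
    print(passes_missing_gate(P1, FIELDS))
```

The per-field rates are what a graphical interface would display alongside the user's importance judgment when a missing value processing rule is chosen.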


Abstract

The invention relates to a data cleaning and data quality visualization method. An initial data set is imported and its total missing rate is calculated. If the total missing rate is greater than 50%, the data set is replaced; otherwise the missing rate of each field in the data set is detected. The user judges the importance of each field according to actual requirements and selects a missing value processing rule on a graphical interface based on two reference factors, importance and missing rate. Abnormal value detection is then performed on the processed data set, and the user selects a suitable abnormal value processing rule on the graphical interface according to the intended use of the data set. The data set is then exported if it is judged acceptable; otherwise the data cleaning method can be executed again. The method integrates data detection, cleaning and quality visualization: the graphical interface lets the user choose cleaning rules subjectively, and the visual charts let the user understand and master the data quality more intuitively and clearly.
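The control flow described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions: the user's choices on the graphical interface are stood in for by plain callables, the rules `drop_incomplete_rows` and `drop_negative_rows` are hypothetical examples rather than rules named in the source, and the `max_rounds` limit is added only so the loop terminates.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]          # None marks a missing value (assumption)
Rule = Callable[[List[Row]], List[Row]]   # stand-in for a rule picked in the GUI


def overall_missing_rate(rows: List[Row], fields: List[str]) -> float:
    """Missing cells / all cells (one assumed reading of 'total missing rate')."""
    cells = len(rows) * len(fields)
    if cells == 0:
        return 0.0
    return sum(1 for r in rows for f in fields if r.get(f) is None) / cells


def clean(rows: List[Row], fields: List[str], missing_rule: Rule,
          outlier_rule: Rule, is_acceptable: Callable[[List[Row]], bool],
          max_rounds: int = 3) -> Optional[List[Row]]:
    """Return the cleaned data set, or None if it should be replaced or is never accepted."""
    if overall_missing_rate(rows, fields) > 0.5:
        return None  # total missing rate above 50%: the data set is replaced
    current = rows
    for _ in range(max_rounds):  # max_rounds is an assumption, not from the source
        current = outlier_rule(missing_rule(current))
        if is_acceptable(current):
            return current  # the data set is exported
    return None


def drop_incomplete_rows(rows: List[Row]) -> List[Row]:
    """Hypothetical missing value rule: delete rows containing a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_negative_rows(rows: List[Row]) -> List[Row]:
    """Hypothetical abnormal value rule: treat negative readings as abnormal."""
    return [r for r in rows if all(v >= 0 for v in r.values() if v is not None)]


DATA: List[Row] = [
    {"x": 1.0, "y": 2.0},
    {"x": None, "y": 3.0},
    {"x": -5.0, "y": 1.0},
    {"x": 2.0, "y": 4.0},
]

if __name__ == "__main__":
    result = clean(DATA, ["x", "y"], drop_incomplete_rows, drop_negative_rows,
                   is_acceptable=lambda rows: len(rows) >= 2)
    print(result)
```

Passing the rules in as callables mirrors the idea that the user, not the tool, decides which cleaning rule applies to a given data set.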

Description

Technical field

[0001] The invention relates to the technical field of computers, and in particular to a data cleaning and data quality visualization method.

Background

[0002] Data cleaning and data quality assessment are an indispensable part of the data analysis process. They review and verify data quality, with the aim of handling missing values, outliers and duplicate data. The quality of the results directly affects model performance and final conclusions, so data cleaning and quality assessment are very important.

[0003] At present, there are some data cleaning technologies and data quality assessment technologies, but there is a lack of a technology that integrates data detection, cleaning and quality assessment, and the existing data cleaning technologies can only perform built-in methods mechanically and cannot meet the specific needs of users. Existing data quality assessment technologies all draw conclus...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06F16/26
CPC: G06F16/215, G06F16/26
Inventor: 张鹏, 孟另伟, 李孟委
Owner: 南通展飞智能装备科技有限公司