Data cleaning method and device based on probability density clustering

A data cleaning technology based on probability density, applied in the fields of electrical digital data processing, special data processing applications, and digital data information retrieval. It addresses the inaccurate detection of numerical error data, the difficulty of operation, and the resulting obstacles to adoption, and achieves low working complexity and high precision.

Pending Publication Date: 2022-01-14
HANGZHOU DIANZI UNIV

AI Technical Summary

Problems solved by technology

[0003] At present, detection methods for batch numerical error data are either not accurate enough or require extensive and complicated manual work. As a result, a high proportion of correct data is flagged as erroneous, or erroneous data goes undetected. In addition, these methods are difficult to operate in practice, which hinders their adoption.



Examples


Detailed Description of the Embodiments

[0035] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0036] A data cleaning method based on probability density clustering, as shown in Figure 1, comprises the following steps:

[0037] Step (1), extract numerical metadata column by column from the specified database;

[0038] Step (2), calculate the probability density of the extracted numerical metadata column by column; specifically:

[0039] 2-1 For the j-th column of numerical metadata, all of its elements form the set U_j, and all of its distinct element values form the set D_j.

[0040] 2-2 Assuming that the numerical metadata follows a normal distribution, calculate the probability density of all numerical metadata according to formula (1):

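[0041] $$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (1)$$

[0042] where x ∈ D_j, μ is the mean of all elements in D_j, and σ² is the variance of D_j.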

[0043] In step (3), the numerical metadata is preprocessed according to the p...
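Steps (1) and (2) above can be illustrated with a minimal Python sketch. It is not the patent's implementation. The function names `numeric_columns` and `column_density`, the use of SQLite as the "specified database", and the choice of population variance (dividing by the number of distinct values) are all assumptions made for illustration. The density is the standard normal density with mean μ and variance σ² taken over the distinct values D_j, as paragraph [0042] describes.

```python
import math
import sqlite3


def numeric_columns(conn, table):
    """Step (1) sketch: return {column name: list of values} for every column
    of `table` whose non-null values are all numeric.
    `table` is assumed to be a trusted identifier (it is interpolated into SQL)."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    names = [d[0] for d in cur.description]
    rows = cur.fetchall()
    columns = {}
    for j, name in enumerate(names):
        values = [row[j] for row in rows if row[j] is not None]
        if values and all(isinstance(v, (int, float)) for v in values):
            columns[name] = [float(v) for v in values]
    return columns


def column_density(u_j):
    """Step (2) sketch: map each distinct value x in D_j to its normal density.
    mu and the variance are computed over D_j (distinct values), not U_j.
    Population variance is an assumption; the source does not say which form is used."""
    d_j = sorted(set(u_j))                      # D_j: distinct values of U_j
    mu = sum(d_j) / len(d_j)
    var = sum((x - mu) ** 2 for x in d_j) / len(d_j)
    if var == 0:
        raise ValueError("density is undefined when D_j has a single value")
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    return {x: norm * math.exp(-((x - mu) ** 2) / (2.0 * var)) for x in d_j}
```

A caller would open a connection with `sqlite3.connect(...)`, pass it to `numeric_columns` together with a table name, and then apply `column_density` to each returned column. Values with unusually low density are the natural candidates for the later steps to examine.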


Abstract

The invention discloses a data cleaning method and device based on probability density clustering. The method predicts numerical error data with a hierarchical clustering model: it derives a feature vector for the data from the data's probability density, then trains the model and makes predictions from that feature vector. This improves the accuracy of error-data prediction while keeping the manual workload and the working complexity relatively low. When the feature vectors are derived from the probability density, the thresholds are set with high precision and in large number, which gives the method a degree of generalization ability.

Description

Technical field

[0001] The invention belongs to the technical field of data cleaning, and relates to a data cleaning method and device based on probability density clustering.

Background

[0002] Cleaning the numerical data output by sensors such as thermometers, CO2 concentration detectors, PM value detectors, atmospheric pressure detectors, gyroscopes, and infrared distance detectors is of great significance. Data cleaning detects and repairs the dirty data caused by the sensor itself or by other factors, so that later data mining and decision-making based on the cleaned data do not produce erroneous results because of errors in the original data. Of the two major steps of data cleaning, error detection and error repair, error detection comes first: erroneous data must be accurately detected before it can be repaired.

[0003] At present, the detection method for batch-type numerical ...


Application Information

IPC(8): G06F16/215, G06V10/762, G06V10/764, G06V10/774, G06K9/62
CPC: G06F16/215, G06F18/2321, G06F18/214, G06F18/24
Inventor: 王玮, 周仁杰, 张纪林, 任永坚, 邓飞, 王星, 邵衢进, 沈佳冰, 袁俊峰, 欧东阳
Owner HANGZHOU DIANZI UNIV