Data cleaning method and device based on probability density clustering
A data cleaning technology based on probability density, applied in the fields of electrical digital data processing, special data processing applications, and digital data information retrieval. It addresses the inaccurate detection of erroneous numerical data, the difficulty of operation, and the resulting obstacles to wider adoption, and achieves low working complexity and high precision.
Embodiment Construction
[0035] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
[0036] A data cleaning method based on probability density clustering, as shown in Figure 1, is implemented through the following steps:
[0037] Step (1), extract numerical metadata by column from the specified database;
[0038] Step (2), calculate the probability density of the extracted numerical metadata by column; specifically:
[0039] 2-1 For the jth column of numerical metadata, all of its elements form a set U_j, and all of its distinct element values form a set D_j.
[0040] 2-2 Assuming that the numerical metadata follows a normal distribution, calculate the probability density of all numerical metadata according to formula (1);
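[0041] $$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \qquad (1)$$
[0042] where x ∈ D_j, μ is the mean of all elements in D_j, and σ² is the variance of D_j.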
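Steps 2-1 and 2-2 can be illustrated with a minimal Python sketch. It assumes that formula (1) is the standard normal density, that the variance of D_j plays the role of σ², and that the population variance (division by the number of distinct values) is intended; none of these details is confirmed by the text. The function name `column_density` and the sample column are hypothetical.

```python
import math


def column_density(column):
    """Sketch of steps 2-1 and 2-2 for a single numeric column.

    Returns (u_j, d_j, density), where density maps each distinct
    value in d_j to its normal probability density.
    """
    u_j = list(column)        # U_j: all elements, duplicates kept
    d_j = sorted(set(u_j))    # D_j: distinct element values

    # Mean and variance are taken over D_j, as the text describes.
    mu = sum(d_j) / len(d_j)
    # Assumption: population variance; the source does not say which.
    var = sum((x - mu) ** 2 for x in d_j) / len(d_j)
    if var == 0:
        raise ValueError("column has a single distinct value; density undefined")

    sigma = math.sqrt(var)
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    density = {x: norm * math.exp(-((x - mu) ** 2) / (2.0 * var)) for x in d_j}
    return u_j, d_j, density


if __name__ == "__main__":
    # Hypothetical column with one suspicious value.
    u_j, d_j, density = column_density([1.0, 2.0, 2.0, 3.0, 100.0])
    for value in d_j:
        print(value, density[value])
```

In this sample the value 100.0 lies farthest from the mean of D_j and therefore receives the lowest density, which is the kind of signal a later step could use to flag suspect entries.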
[0043] Step (3), the numerical metadata is preprocessed according to the p...