Hadoop data cleaning method and system based on outlier mining
A data cleaning technology based on isolated points (outliers), applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses problems such as the scarcity of existing solutions and the fact that different kinds of erroneous data require different cleaning solutions. Its effects are accurate data cleaning, support for cleaning massive data, and improved data cleaning efficiency.
Embodiment Construction
[0024] The technical solution of the present invention will be further described below in conjunction with the accompanying drawings.
[0025] As shown in Figure 1, a Hadoop data cleaning method based on outlier mining includes the following steps:
[0026] S1. Load data from various heterogeneous data sources into the Hadoop distributed file system;
[0027] S2. Preprocess the data in the Hadoop distributed file system: pull the data to be cleaned from the Hadoop distributed file system, mine the isolated points (outliers) with abnormal attributes in that data, and record the number of isolated points as N;
[0028] S3. Determine whether the isolated points obtained in S2 meet the cleaning rules, and clean the isolated points that satisfy the cleaning rules. This specifically includes the following three situations:
[0029] S31. If all the N isolated points meet the cleaning rules, perform data cleaning on all the N isolated points according to the cleaning rules...
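Steps S2 and S3 can be illustrated with a minimal in-memory sketch. It is not the patented implementation and does not use Hadoop: plain Python stands in for the distributed job. The text does not say which outlier-mining technique is used, so the modified z-score (median absolute deviation) below is an assumption, as are the names `mine_isolated_points`, `clean_isolated_points`, the `temp` attribute, and the sentinel-value cleaning rule. How the method handles isolated points that do not meet the cleaning rules is not shown in this excerpt, so the sketch only reports them.

```python
from statistics import median


def mine_isolated_points(records, attribute, threshold=3.5):
    """S2 (assumed technique): return indices of records whose attribute
    value is an outlier by the modified z-score, which is based on the
    median absolute deviation (MAD)."""
    values = [r[attribute] for r in records]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, so nothing can be flagged by this test
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


def clean_isolated_points(records, isolated, rule_applies, rule_fix):
    """S3: clean only the isolated points that satisfy the cleaning rule.
    Returns the cleaned copy plus the indices that did and did not match."""
    cleaned = list(records)  # copy the list so the input stays unchanged
    matched = [i for i in isolated if rule_applies(records[i])]
    for i in matched:
        cleaned[i] = rule_fix(records[i])
    matched_set = set(matched)
    unmatched = [i for i in isolated if i not in matched_set]
    return cleaned, matched, unmatched


# Hypothetical data: sensor readings with one sentinel value and one spike.
temps = [20.1, 19.8, 20.5, 21.0, -999.0, 20.3, 85.0, 19.9]
records = [{"id": k, "temp": t} for k, t in enumerate(temps)]

# Hypothetical cleaning rule: -999.0 marks a missing reading; blank it out.
def is_sentinel(record):
    return record["temp"] == -999.0

def blank_out(record):
    return {**record, "temp": None}

isolated = mine_isolated_points(records, "temp")
n = len(isolated)  # the count N recorded in S2
cleaned, matched, unmatched = clean_isolated_points(
    records, isolated, is_sentinel, blank_out)
```

With this data, two records are flagged as isolated points; the sentinel reading satisfies the hypothetical rule and is cleaned, while the spike is left as-is and returned in `unmatched`. When every flagged index ends up in `matched`, the run corresponds to situation S31.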