Fast-running big data cleaning method
A data cleaning method, applied in the field of big data processing, that addresses the problems of untargeted cleaning, low big data cleaning efficiency, and long cleaning time, so as to achieve targeted cleaning of big data, improve data cleaning efficiency, and reduce the time required.
Examples
Embodiment 1
[0056] The present invention provides a fast-running big data cleaning method, as shown in Figure 1, including the following cleaning steps:
[0057] S1. Data collection: collect the data that needs to be cleaned and the items that need to be cleaned;
[0058] S2. Establishing a database: enter the collected data types to be cleaned and the data cleaning items into the database, and establish a data cleaning database;
[0059] S3. Data analysis: analyze the relationship between data types and data cleaning items through the data cleaning database, and obtain the most frequently used cleaning items for each data type;
[0060] S4. Establishing a database of cleaning items: enter the various cleaning items into the data cleaning module, and establish a database of cleaning items;
[0061] S5. Data pre-cleaning: import data of a single type into the data cleaning database, match it with the cleaning items most frequently used for that type, and clean it separately;
[0062] S6. Deep cleaning of data: re-...
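The frequency analysis in S3 and the matching in S5 can be illustrated with a minimal sketch. The patent does not specify a data layout or any function names, so everything here is an assumption for illustration: the `history` list of (data type, cleaning item) pairs stands in for the data cleaning database, and `count_item_usage` and `most_used_item` are hypothetical helpers.

```python
from collections import Counter, defaultdict

# Hypothetical usage history: (data_type, cleaning_item) pairs, standing in
# for the contents of the data cleaning database built in S1 and S2.
history = [
    ("customer_records", "delete duplicates"),
    ("customer_records", "fill vacancies"),
    ("customer_records", "delete duplicates"),
    ("sensor_logs", "correct errors"),
    ("sensor_logs", "correct errors"),
    ("sensor_logs", "compress data"),
]


def count_item_usage(pairs):
    """S3: count how often each cleaning item is used for each data type."""
    usage = defaultdict(Counter)
    for data_type, item in pairs:
        usage[data_type][item] += 1
    return usage


def most_used_item(usage):
    """S3/S5: select the most frequently used cleaning item per data type."""
    return {
        data_type: counts.most_common(1)[0][0]
        for data_type, counts in usage.items()
    }


usage = count_item_usage(history)
primary_items = most_used_item(usage)
print(primary_items)
```

In this sketch, pre-cleaning (S5) would look up `primary_items[data_type]` for an incoming batch and apply only that item first, which is one way to read "targeted" cleaning; the remaining items would be left to the later deep-cleaning step.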
Embodiment 2
[0069] As the second embodiment of the present invention, in order to facilitate the analysis of the collected data and data cleaning items, the present invention also improves the data cleaning database. As a preferred embodiment, as shown in Figure 2 and Figure 3, the data cleaning database includes a data collection module, a data storage module and a data analysis module. The data collection module is used to collect the data that needs to be cleaned and the data cleaning items, the data storage module is used to save the collected data and cleaning items in the data cleaning database, and the data analysis module is used to analyze the relationship between the data types to be cleaned and the data cleaning items.
[0070] In this embodiment, the data analysis module includes a data type classification module, a cleaning item classification module, a frequency calculation module and a main cleaning item calculation module, which are used to analyze the relationship between data types and cleaning items.
[0071] Further, the data type classificat...
Embodiment 3
[0084] As the third embodiment of the present invention, in order to facilitate data cleaning, the present invention also improves the data cleaning module. As a preferred embodiment, as shown in Figure 4 and Figure 5, the data cleaning module includes a cleaning item database, a pre-cleaning module and a deep cleaning module. The cleaning item database is used to enter multiple cleaning items, which include correcting errors, deleting duplicates, unifying specifications, correcting logic, transforming structures, compressing data, filling vacancies and discarding data.
[0085] In this embodiment, the error correction module is used to correct data errors, including data value errors, data type errors, data encoding errors, data format errors, data anomaly errors, dependency conflicts and multivalued errors.
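As an illustration of two of the error kinds named in paragraph [0085] (data type errors and data format errors), followed by the "deleting duplicates" cleaning item, the sketch below shows one possible shape for such corrections. The patent does not disclose how each correction is implemented; the function names, the row layout, and the list of accepted date formats are all assumptions made for this example.

```python
from datetime import datetime


def correct_type_error(value):
    """Data type error: convert a numeric string such as ' 42 ' to a number."""
    if not isinstance(value, str):
        return value
    text = value.strip()
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return value  # not numeric: leave unchanged


def correct_format_error(value, formats=("%Y-%m-%d", "%d/%m/%Y", "%Y.%m.%d")):
    """Data format error: normalize a date string to ISO form (YYYY-MM-DD).

    The accepted input formats are an assumption for this sketch.
    """
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except (ValueError, TypeError):
            continue
    return value  # no known format matched: leave unchanged


def delete_duplicates(rows):
    """Deleting duplicates: keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical input: the same record entered twice with inconsistent
# typing and date formatting.
rows = [
    {"id": " 1 ", "date": "03/02/2021"},
    {"id": "1", "date": "2021-02-03"},
]
corrected = [
    {"id": correct_type_error(r["id"]), "date": correct_format_error(r["date"])}
    for r in rows
]
cleaned = delete_duplicates(corrected)
print(cleaned)
```

Running error correction before duplicate removal matters in this sketch: the two input rows only become identical, and therefore removable, once their types and formats have been unified.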
[0086] Further, ...