
Fast-running big data cleaning method

A data cleaning technology applied in the field of big data processing. It addresses the inability to clean data in a targeted way, the low efficiency of big data cleaning, and long cleaning times, and it achieves targeted cleaning of big data, higher cleaning efficiency, and shorter cleaning time.

Active Publication Date: 2019-10-29
南京安夏电子科技有限公司
Cites: 4; Cited by: 5

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide a fast-running big data cleaning method. It addresses the problem raised in the background section: most existing data cleaning solutions clean big data as a whole and cannot clean in a targeted way according to the type of data, which makes big data cleaning inefficient and time-consuming.



Examples


Embodiment 1

[0056] The present invention provides a fast-running big data cleaning method which, as shown in Figure 1, includes the following cleaning steps:

[0057] S1. Data collection: collect the data that needs to be cleaned and the items that need to be cleaned;

[0058] S2. Establishing a database: input the collected cleaning data types and data cleaning items into the database, and establish a data cleaning database;

[0059] S3. Data analysis: analyze the relationship between data types and data cleaning items through the data cleaning database, and identify the most frequently used cleaning item for each data type;

[0060] S4. Establish a database of cleaning items: enter various cleaning items in the data cleaning module, and establish a database of cleaning items;

[0061] S5. Data pre-cleaning: import a single data type into the data cleaning database, match it with its most frequently used cleaning item, and clean it separately with that item;

[0062] S6. Deep cleaning of data: re-...
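
Steps S1 through S5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record layout, the function names (`analyze`, `pre_clean`), the two cleaning items, and the sample values are all hypothetical. S6 is left out because its description is truncated in the source.

```python
from collections import Counter, defaultdict

# Hypothetical cleaning items: each takes a list of values and returns a cleaned list.
def delete_duplicates(values):
    """Drop repeated values while keeping first-seen order."""
    return list(dict.fromkeys(values))

def fill_vacancies(values, default="unknown"):
    """Replace missing (None) values with a placeholder."""
    return [default if v is None else v for v in values]

# S4: database of cleaning items, keyed by item name (assumed structure).
CLEANING_ITEMS = {
    "delete_duplicates": delete_duplicates,
    "fill_vacancies": fill_vacancies,
}

def analyze(history):
    """S3: for each data type, find the most frequently used cleaning item."""
    counts = defaultdict(Counter)
    for data_type, item in history:
        counts[data_type][item] += 1
    return {t: c.most_common(1)[0][0] for t, c in counts.items()}

def pre_clean(data_type, values, main_items):
    """S5: clean a single data type with its most frequently used item only."""
    item = main_items[data_type]
    return CLEANING_ITEMS[item](values)

# S1/S2: collected (data type, cleaning item) records; illustrative values.
history = [
    ("user_id", "delete_duplicates"),
    ("user_id", "delete_duplicates"),
    ("user_id", "fill_vacancies"),
    ("city", "fill_vacancies"),
]
main_items = analyze(history)
cleaned_ids = pre_clean("user_id", [7, 7, 8, 9, 8], main_items)
cleaned_cities = pre_clean("city", ["Nanjing", None], main_items)
```

In this sketch each data type passes through only one cleaning function during pre-cleaning, which is the source of the claimed time saving compared with running every cleaning item over all data.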

Embodiment 2

[0069] In the second embodiment of the present invention, the data cleaning database is improved to make it easier to analyze the collected data and data cleaning items. As a preferred embodiment, and as shown in Figures 2 and 3, the data cleaning database includes a data collection module, a data storage module and a data analysis module. The data collection module collects the data that needs to be cleaned and the data cleaning items, the data storage module saves the collected data and items in the data cleaning database, and the data analysis module analyzes the relationship between the cleaning data types and the data cleaning items.

[0070] In this embodiment, the data analysis module includes a data type classification module, a cleaning item classification module, a frequency calculation module and a main cleaning item calculation module, which are used to determine the relationship between data types and cleaning items.
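
The four sub-modules of the data analysis module can be pictured as four small functions chained together. The sketch below is an assumption-laden illustration: the function names, the record layout, and the definition of the frequency value as an item's share of all uses within one data type are not taken from the source, which only states that the item with the maximum frequency value is selected.

```python
from collections import defaultdict

def classify_by_data_type(records):
    """Data type classification: group cleaning items under each data type."""
    groups = defaultdict(list)
    for data_type, item in records:
        groups[data_type].append(item)
    return dict(groups)

def classify_cleaning_items(items):
    """Cleaning item classification: count how often each item occurs."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts

def item_frequencies(counts):
    """Frequency calculation (assumed definition): share of each item among all uses."""
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}

def main_cleaning_item(frequencies):
    """Main cleaning item calculation: the item with the maximum frequency value."""
    return max(frequencies, key=frequencies.get)

# Illustrative records of (data type, cleaning item) pairs.
records = [
    ("phone", "unify_specifications"),
    ("phone", "unify_specifications"),
    ("phone", "correct_errors"),
    ("phone", "unify_specifications"),
]
by_type = classify_by_data_type(records)
freqs = item_frequencies(classify_cleaning_items(by_type["phone"]))
main_item = main_cleaning_item(freqs)
```

Keeping the four stages separate mirrors the module split described in this embodiment; a production system might compute the same result in a single grouped query.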

[0071] Further, the data type classificat...

Embodiment 3

[0084] In the third embodiment of the present invention, the data cleaning module is improved to make data cleaning easier. As a preferred embodiment, and as shown in Figures 4 and 5, the data cleaning module includes a cleaning item database, a pre-cleaning module and a deep cleaning module. The cleaning item database is used to enter multiple cleaning items; the pre-cleaning module cleans data with the cleaning item that has the maximum frequency value for that data; and the deep cleaning module performs comprehensive cleaning on the pre-cleaned data. The cleaning item database includes correcting errors, deleting duplicates, unifying specifications, correcting logic, transforming structures, compressing data, filling vacancies and discarding data.

[0085] In this embodiment, the error correction module corrects erroneous data. It covers the correction of data value errors, data type errors, data encoding errors, data format errors, data anomalies, dependency conflicts and multi-valued errors.
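
Three of the error categories handled by the error correction module can be illustrated as follows. This is a hedged sketch only: the function names, the list of accepted date layouts, and the choice of ISO 8601 as the target format are assumptions, and the source does not say how each correction is actually performed.

```python
from datetime import datetime

def correct_type_error(value):
    """Data type error: turn a numeric string such as '42' into an int."""
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return value  # not a whole number: leave unchanged
    return value

def correct_format_error(value, formats=("%Y/%m/%d", "%d.%m.%Y")):
    """Data format error: rewrite known date layouts as ISO 8601 (YYYY-MM-DD)."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return value  # unknown layout: leave the value unchanged

def correct_encoding_error(raw, encoding="utf-8"):
    """Data encoding error: decode bytes, replacing undecodable sequences."""
    if isinstance(raw, bytes):
        return raw.decode(encoding, errors="replace")
    return raw

# Illustrative inputs, one per error category.
fixed_count = correct_type_error("42")
fixed_date = correct_format_error("29.10.2019")
fixed_text = correct_encoding_error(b"\xe5\x8d\x97\xe4\xba\xac")
```

Each function returns its input unchanged when it cannot apply a correction, so several correctors can be applied in sequence without one undoing the work of another.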

[0086] Further, ...


Abstract

The invention relates to the technical field of big data processing, in particular to a fast-running big data cleaning method that uses a data cleaning database, a pre-cleaning module and a deep cleaning module. In the data cleaning database, a cleaning data sub-database is established according to the type of data to be cleaned, and a cleaning item sub-database is established according to the type of cleaning item, so that comprehensive information on both the cleaning data and the cleaning items is available. The pre-cleaning module cleans the data with the cleaning item that has the maximum frequency value for that data, which achieves targeted cleaning of big data, shortens the time needed for cleaning and improves cleaning efficiency. The deep cleaning module then performs comprehensive cleaning on the pre-cleaned data, which improves the completeness of the cleaning.

Description

Technical field

[0001] The invention relates to the technical field of big data processing, in particular to a fast-running big data cleaning method.

Background

[0002] With the advent of the era of big data and the continuous surge of massive data, various industries can use big data technology to integrate and readjust existing resources, improve operating efficiency, and tap the industry's potential. However, most existing data cleaning solutions clean big data as a whole and cannot clean in a targeted way according to the type of data, which makes big data cleaning inefficient and time-consuming. In view of this, we propose a fast-running big data cleaning method.

Summary of the invention

[0003] The purpose of the present invention is to provide a fast-running big data cleaning method to solve the problem that most of the data ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06F16/25, G06F16/28
CPC: G06F16/215, G06F16/252, G06F16/284, Y02D10/00
Inventor: 谷敏骏, 吴庆东, 李普阳
Owner: 南京安夏电子科技有限公司