Data cleaning method and device based on Spark framework

A data cleaning technology, applied in the field of data processing, that addresses the long execution time and low efficiency of MapReduce-based cleaning programs, with the effect of ensuring data accuracy and improving data cleaning efficiency.

Status: Inactive. Publication Date: 2018-09-21
CHENGDU ZHIYUN SCI & TECH
6 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0002] Most existing mainstream data cleaning methods are built on MapReduce programs. However, when a MapReduce program cleans large data sets, a large number of intermediate results must be written to local disk, so the cleaning run is time-consuming and inefficient.

Embodiment Construction

[0041] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations.

[0042] Accordingly, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art wi...

Abstract

The embodiment of the invention provides a data cleaning method and device based on the Spark framework. The method includes the following steps: obtaining the data to be cleaned; judging whether the data to be cleaned meets a preset requirement and, if it does not, cleaning it and using the cleaned data as the data to be saved; calculating the data attributes of the data to be saved and writing the calculated attributes into an attribute file; and saving the data to be saved together with the attribute file. The method and device can effectively improve data cleaning efficiency and ensure the authenticity, integrity and rationality of the data.
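
The four steps in the abstract can be sketched in plain Python. This is only an illustration of the control flow, not Spark code and not the patent's implementation. The preset requirement (a non-empty `id` and a numeric `value`), the cleaning rule (trim whitespace, drop records that still fail), the attribute names (`record_count`, `md5`) and the file names (`data.csv`, `attributes.json`) are all assumptions made up for this sketch.

```python
import csv
import hashlib
import json
import tempfile
from pathlib import Path

FIELDS = ["id", "value"]  # hypothetical schema, not from the patent


def meets_requirement(record):
    """Hypothetical preset requirement: non-empty id and a numeric value."""
    if not (record.get("id") or "").strip():
        return False
    try:
        float(record.get("value"))
    except (TypeError, ValueError):
        return False
    return True


def clean(records):
    """Hypothetical cleaning rule: trim whitespace, drop records that still fail."""
    trimmed = ({k: (v or "").strip() for k, v in r.items()} for r in records)
    return [r for r in trimmed if meets_requirement(r)]


def run(records, out_dir):
    # Step 1: `records` is the data to be cleaned.
    # Step 2: judge against the requirement; clean only if something fails.
    if all(meets_requirement(r) for r in records):
        to_save = list(records)
    else:
        to_save = clean(records)

    # Step 3: compute attributes of the data to be saved (assumed attributes).
    payload = "\n".join(f"{r['id']},{r['value']}" for r in to_save).encode("utf-8")
    attributes = {
        "record_count": len(to_save),
        "md5": hashlib.md5(payload).hexdigest(),
    }

    # Step 4: save the data and the attribute file side by side.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "data.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(to_save)
    (out / "attributes.json").write_text(json.dumps(attributes), encoding="utf-8")
    return to_save, attributes


if __name__ == "__main__":
    raw = [
        {"id": "a", "value": "1.5"},
        {"id": " ", "value": "2"},   # fails: blank id
        {"id": "b", "value": "x"},   # fails: non-numeric value
    ]
    with tempfile.TemporaryDirectory() as d:
        saved, attrs = run(raw, d)
        print(saved, attrs["record_count"])
```

Storing a record count and a checksum next to the data is one way a later reader could check that the saved file is complete and unmodified; the attributes actually used by the invention are not stated in this excerpt.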

Description

Technical field

[0001] The present invention relates to the technical field of data processing, in particular to a data cleaning method and device based on a Spark framework.

Background

[0002] Most existing mainstream data cleaning methods are built on MapReduce programs. However, when a MapReduce program cleans large data sets, a large number of intermediate results must be written to local disk, so the cleaning run is time-consuming and inefficient.

Summary of the invention

[0003] In view of this, embodiments of the present invention provide a data cleaning method and device based on a Spark framework, which can effectively solve the above problems and improve data cleaning efficiency.

[0004] A preferred embodiment of the present invention provides a data cleaning method based on the Spark framework, the method comprisin...

Application Information

IPC(8): G06F17/30
Inventors: 姜光植, 严雪枫, 谢川, 黄瀚林
Owner CHENGDU ZHIYUN SCI & TECH