Method and device for cleaning mass data

A data cleaning technology for massive data, applied in the field of data communication. It addresses the lack of unified specifications, low operation efficiency, and low output efficiency of existing approaches, and achieves high operation efficiency and consistency.

Active Publication Date: 2014-02-19
ADVANCED NEW TECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] The purpose of this application is to solve the problems in the prior art that the data cleaning code is manually generated, lacks unified specifications, low o



Examples


Embodiment Construction

[0038] The technical solutions of the present application will be described in further detail below in conjunction with the accompanying drawings and embodiments, and the following embodiments do not limit the present application.

[0039] In this application, the massive data cleaning method, as shown in Figure 1, includes the following steps:

[0040] Step 101, configuring a data cleaning rule file.

[0041] Specifically, Table 1 provides a specific embodiment of a data cleaning rule file:

[0042]

[0043] Table 1

[0044] Taking Table 1 as an example, the data cleaning rule file includes:

[0045] rule_id: rule serial number;

[0046] table_name: the name of the data table, that is, the name of the data table to which the rule belongs;

[0047] bit_offset: the rule's serial number expressed as a binary offset; bit_offset is used to tag the data;

[0048] rule_code: Pseudocode of data cleaning rules;

[0049] description: Chinese description of the data cleaning rule...
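As a minimal sketch of the rule file fields listed above, one rule can be modelled as a record, and the rules for one table can be selected by table_name. The five field names come from the text. The class name `CleaningRule`, the helper `rules_for_table`, the table names, and every rule value below are hypothetical, since the contents of Table 1 are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CleaningRule:
    rule_id: int        # rule serial number
    table_name: str     # data table the rule belongs to
    bit_offset: int     # binary offset used to tag data
    rule_code: str      # pseudocode of the cleaning rule
    description: str    # human-readable description of the rule


# Hypothetical rule records, for illustration only.
RULES: List[CleaningRule] = [
    CleaningRule(1, "orders", 0, "pay_time < create_time",
                 "payment time earlier than order creation time"),
    CleaningRule(2, "orders", 1, "amount < 0", "negative order amount"),
    CleaningRule(3, "users", 0, "user_id IS NULL", "missing user id"),
]


def rules_for_table(rules: List[CleaningRule], table_name: str) -> List[CleaningRule]:
    """Return the rules whose table_name matches the table to be cleaned."""
    return [rule for rule in rules if rule.table_name == table_name]


order_rules = rules_for_table(RULES, "orders")
```

Keeping table_name on each rule lets a single rule file serve many tables, with the relevant subset looked up when a given table is cleaned.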


Abstract

The invention discloses a method and device for cleaning mass data. The method first configures a data cleaning rule file, then obtains the data cleaning rules that correspond to the data table to be cleaned according to the table name recorded in the rules, and automatically generates cleaning code to perform the cleaning. During cleaning, every datum to be cleaned is tagged; tag analysis then determines which data cleaning rule each datum triggered, and the corresponding cleaning processing is performed. The device comprises a data rule configuration module, a data cleaning code generation module, an execution module, and an analysis module, and cleans mass data using this method. Mass data can thus be cleaned effectively and efficiently, the dirty data that are cleaned out are retained by category, and the source and destination of every dirty datum can be located precisely.
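The tagging and tag analysis described in the abstract can be illustrated with a bitmask. This sketch assumes that a datum's tag is an integer in which bit number bit_offset is set when the corresponding rule is triggered; the text visible here does not confirm that encoding. The rules, field names, and the helpers `tag_row` and `triggered_rules` are hypothetical, and the rules are written as Python predicates rather than generated cleaning code.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]

# Hypothetical rules: (bit_offset, description, predicate that detects dirty data).
RULES: List[Tuple[int, str, Callable[[Row], bool]]] = [
    (0, "payment time earlier than order creation time",
     lambda row: row["pay_time"] < row["create_time"]),
    (1, "negative order amount",
     lambda row: row["amount"] < 0),
]


def tag_row(row: Row) -> int:
    """Set one bit per triggered rule; a tag of 0 means no rule fired."""
    tag = 0
    for bit_offset, _description, is_dirty in RULES:
        if is_dirty(row):
            tag |= 1 << bit_offset
    return tag


def triggered_rules(tag: int) -> List[str]:
    """Tag analysis: decode which rules a tagged row triggered."""
    return [description for bit_offset, description, _ in RULES
            if tag & (1 << bit_offset)]


rows = [
    {"create_time": 100, "pay_time": 120, "amount": 30},  # clean
    {"create_time": 100, "pay_time": 90, "amount": 30},   # triggers bit 0
    {"create_time": 100, "pay_time": 90, "amount": -5},   # triggers bits 0 and 1
]
tags = [tag_row(row) for row in rows]
clean_rows = [row for row, tag in zip(rows, tags) if tag == 0]
dirty_rows = [(row, triggered_rules(tag)) for row, tag in zip(rows, tags) if tag != 0]
```

Under this assumption, a single integer per row records every rule the row violated, so dirty rows can be kept and grouped by rule instead of being discarded.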

Description

Technical field

[0001] The present application belongs to the technical field of data communication, and in particular relates to a method and device for cleaning massive data.

Background

[0002] With the rapid development of computer technology and communication technology, people can obtain more and more digital information, but they also need to invest more time in sorting and organizing it. In business systems, for example, dirty data is often generated by factors such as code defects, business definition changes, and network delays. An order whose payment time is earlier than its creation time is one such case: the record does not conform to business logic. Before statistical analysis is performed on the data, such dirty data must be filtered out to ensure the accuracy of the statistics. Data cleaning is a process of reducing data errors and inconsistencies. The main task is to detec...


Application Information

IPC(8): G06F17/30
CPC: G06F16/951
Inventor: 刘欣
Owner: ADVANCED NEW TECH CO LTD