A data cleaning method and device

A data cleaning technology in the field of data processing. It addresses three problems: cleaning programs cannot be replaced dynamically, cleaning tasks cannot be coordinated across multiple cleaning engines, and the deployment of cleaning-program versions affects the quality and speed of data cleaning.

Pending Publication Date: 2018-12-18
BANK OF CHINA
Cites: 4; Cited by: 9

AI Technical Summary

Problems solved by technology

In addition, a data system usually cleans data with a single data cleaning engine; when it does use multiple engines, each engine runs separately on its own share of the data to be cleaned. During cleaning, each engine runs a fixed cleaning program that cannot be changed dynamically. As a result, how quickly new versions of the cleaning program are deployed and updated directly affects the quality and speed of data cleaning: for example, an older version with low efficiency or code logic errors can slow the cleaning process down or cause it to report errors.
Furthermore, because each data cleaning engine runs independently, the cleaning tasks of multiple engines cannot be coordinated when they run at the same time.

Embodiment Construction

[0015] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0016] The following technical terms are used in the embodiments of the present invention:

[0017] Data cleaning: because of how data is generated and collected, network transmission, and other processes, data that reaches a data system (such as a big data platform or a management information system, MIS) may contain problem data, such as missing data items, incorrect data meanings, and values that are too long. In order to facilitate...
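The kinds of problem data listed in [0017] can be made concrete with a small record check. This is a minimal sketch under stated assumptions: the field names (`customer_id`, `name`, `amount`), the length limits, and the problem labels are hypothetical, and "incorrect data meaning" is narrowed to a non-numeric amount purely for illustration; the patent text shown here does not define any record schema.

```python
# Hypothetical schema: field names and limits are illustrative only.
REQUIRED_FIELDS = ("customer_id", "name", "amount")
MAX_LENGTHS = {"customer_id": 10, "name": 20}


def _is_number(value) -> bool:
    """True if value can be parsed as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def find_problems(record: dict) -> list:
    """List the problems found in one record (an empty list means clean)."""
    problems = []
    # Missing data items: required field absent or empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing:{field}")
    # Excessive length: string longer than its (hypothetical) limit.
    for field, limit in MAX_LENGTHS.items():
        value = record.get(field)
        if isinstance(value, str) and len(value) > limit:
            problems.append(f"too_long:{field}")
    # Incorrect meaning, narrowed here to a non-numeric amount.
    amount = record.get("amount")
    if amount not in (None, "") and not _is_number(amount):
        problems.append("bad_value:amount")
    return problems


if __name__ == "__main__":
    print(find_problems({"customer_id": "", "name": "Li", "amount": "abc"}))
    # ['missing:customer_id', 'bad_value:amount']
```

A real cleaning step would then repair or discard the flagged records; this sketch only detects them.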

Abstract

The embodiments of the invention disclose a data cleaning method and device in the field of data processing that enable dynamic data cleaning on a data cleaning engine. The method comprises the following steps: determining, in a data source, the data to be cleaned for each data cleaning engine; selecting a target cleaning rule for the data to be cleaned from at least one pre-stored cleaning rule according to the type of that data; invoking a target cleaning plug-in from at least one pre-stored cleaning plug-in according to the target cleaning rule, where each cleaning rule corresponds to at least one cleaning plug-in; and running the target cleaning plug-in on the data cleaning engine to clean the data according to the target cleaning rule, obtaining cleaning result data.
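The rule-selection, plug-in-invocation, and execution steps in the abstract can be sketched as a small rule/plug-in registry. This is a minimal sketch, not the patented implementation: the class names `CleaningRegistry` and `CleaningEngine`, the example plug-in functions, and keying rules by a data-type string are all hypothetical. The first step (determining each engine's data) is omitted, and because the abstract only says each rule corresponds to at least one plug-in, running a rule's plug-ins in sequence is an assumption.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Plugin = Callable[[List[Record]], List[Record]]


# Hypothetical example plug-ins; the patent does not name concrete plug-ins.
def strip_whitespace(rows: List[Record]) -> List[Record]:
    """Trim leading/trailing spaces from every string field."""
    return [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()} for r in rows]


def drop_missing_id(rows: List[Record]) -> List[Record]:
    """Remove rows whose 'id' field is missing or empty."""
    return [r for r in rows if r.get("id")]


def dedupe_by_id(rows: List[Record]) -> List[Record]:
    """Keep the first row for each 'id' value."""
    seen, out = set(), []
    for r in rows:
        if r["id"] not in seen:
            seen.add(r["id"])
            out.append(r)
    return out


class CleaningRegistry:
    """Pre-stored cleaning rules and plug-ins (hypothetical structure)."""

    def __init__(self) -> None:
        self.rule_by_type: Dict[str, str] = {}       # data type -> rule name
        self.plugins: Dict[str, Plugin] = {}          # plug-in name -> callable
        self.rule_plugins: Dict[str, List[str]] = {}  # rule name -> plug-in names

    def register_plugin(self, name: str, fn: Plugin) -> None:
        # Registering an existing name replaces that plug-in for later runs.
        self.plugins[name] = fn

    def add_rule(self, rule: str, data_type: str, plugin_names: List[str]) -> None:
        # Each rule must correspond to at least one plug-in.
        if not plugin_names:
            raise ValueError("a cleaning rule needs at least one plug-in")
        self.rule_by_type[data_type] = rule
        self.rule_plugins[rule] = list(plugin_names)

    def select_rule(self, data_type: str) -> str:
        # Choose the target rule from the type of the data.
        return self.rule_by_type[data_type]

    def plugins_for(self, rule: str) -> List[Plugin]:
        # Resolve the target plug-ins by name at call time.
        return [self.plugins[name] for name in self.rule_plugins[rule]]


class CleaningEngine:
    def __init__(self, name: str, registry: CleaningRegistry) -> None:
        self.name = name
        self.registry = registry

    def clean(self, data_type: str, rows: List[Record]) -> List[Record]:
        # Run the selected plug-ins in order (chaining is an assumption).
        rule = self.registry.select_rule(data_type)
        for plugin in self.registry.plugins_for(rule):
            rows = plugin(rows)
        return rows


if __name__ == "__main__":
    registry = CleaningRegistry()
    registry.register_plugin("strip", strip_whitespace)
    registry.register_plugin("drop_missing_id", drop_missing_id)
    registry.register_plugin("dedupe", dedupe_by_id)
    registry.add_rule("customer_rule", "customer", ["strip", "drop_missing_id", "dedupe"])

    engine = CleaningEngine("engine-1", registry)
    rows = [{"id": " 1 ", "name": " Li "}, {"id": "1", "name": "Li"}, {"id": "", "name": "Wang"}]
    print(engine.clean("customer", rows))  # [{'id': '1', 'name': 'Li'}]
```

Because the engine looks plug-ins up by name each time it cleans, calling `register_plugin` again with an existing name swaps the cleaning logic without redeploying the engine. This is the kind of dynamic replacement that, per the summary, a fixed cleaning program lacks.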

Description

Technical field

[0001] Embodiments of the present invention relate to the field of data processing, and in particular to a data cleaning method and device.

Background technology

[0002] Data cleaning is the process of finding and correcting identifiable errors in data files, including checking data consistency and handling invalid and missing values. It re-examines and validates data in order to remove duplicate information, correct existing errors, and make the data consistent.

[0003] When a cleaning program is developed with Pro*C and run on the AIX (Advanced Interactive eXecutive) operating system, the data to be cleaned must be manually divided into as many parts as there are data cleaning engines (such as the computers or processors that perform data cleaning), and each data cleaning engine uses a cleaning program to process the data to be cleaned allocated to the data cleaning engin...
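The prior-art approach in [0003], splitting the data into as many parts as there are engines before cleaning starts, can be illustrated with a static partition. This is a hypothetical sketch: the function name `split_for_engines` and the contiguous, near-equal split are assumptions, since the background does not say how the parts are sized.

```python
def split_for_engines(records: list, engine_count: int) -> list:
    """Statically split records into engine_count contiguous, near-equal parts.

    The first (len(records) % engine_count) parts get one extra record.
    """
    if engine_count < 1:
        raise ValueError("engine_count must be >= 1")
    base, extra = divmod(len(records), engine_count)
    parts, start = [], 0
    for i in range(engine_count):
        size = base + (1 if i < extra else 0)
        parts.append(records[start:start + size])
        start += size
    return parts


if __name__ == "__main__":
    print(split_for_engines(list(range(10)), 3))
    # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Because the partition is fixed before any engine starts, a slow engine cannot hand its remaining work to an idle one, which illustrates why the cleaning tasks of independently running engines cannot be coordinated.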

Application Information

IPC(8): G06F17/30
Inventors: 陈世强 (Chen Shiqiang), 王鹏晴 (Wang Pengqing), 李晓东 (Li Xiaodong), 钟华剑 (Zhong Huajian), 徐雅光 (Xu Yaguang), 刘利刚 (Liu Ligang)
Owner: BANK OF CHINA