A distributed heterogeneous data cleaning system based on visual management

The invention applies data cleaning and distributed technology to a distributed heterogeneous data cleaning system. It addresses difficulties in data standardization and structuring, dirty data being written into the target database, and missing data, and it aims for convenient and effective management, a high code reuse rate, and low operation and maintenance costs.

Active Publication Date: 2022-05-24
众创网(武汉)科技有限公司

AI Technical Summary

Problems solved by technology

[0003] Because data comes from heterogeneous sources, standardizing and structuring it is difficult, and a single code base can rarely process data from different sources at the same time. For example, structural differences between Oracle and MySQL databases mean that the same kind of data, such as dates, requires different processing logic before it can be converted into target data. During conversion, bugs in programs such as ETL jobs may write dirty data into the target database, for example spelling mistakes, repeated information, or missing data, so that data quality fails to meet requirements. In addition, traditional cleaning methods cannot process data in parallel on a large scale or manage the processing workflow; data must be processed serially, so cleaning and conversion are slow. These problems are unavoidable with traditional processing methods.
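The date example in paragraph [0003] can be illustrated with a minimal sketch. It is not taken from the patent: the function name `normalize_date` and the candidate formats are assumptions chosen for illustration, and a real system would take its formats from the actual source schemas.

```python
from datetime import datetime

# Candidate textual date formats. These are illustrative assumptions,
# not formats specified by the patent.
CANDIDATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",  # e.g. a MySQL DATETIME rendered as text
    "%Y-%m-%d",           # e.g. a MySQL DATE rendered as text
    "%d-%b-%y",           # e.g. an Oracle-style DD-MON-YY string (approximated with %y)
)


def normalize_date(value):
    """Return the value as an ISO 8601 date string, or None if it cannot be parsed."""
    if value is None:
        return None
    text = str(value).strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None  # unparseable values are left for a later checking step


print(normalize_date("24-MAY-22"))
print(normalize_date("2022-05-24 13:05:00"))
```

Returning `None` instead of raising keeps unparseable values visible, so that a later check can reject them rather than write them to the target database.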


Embodiment Construction

[0020] The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

[0021] As shown in Figure 1, a distributed heterogeneous data cleaning system based on visual management includes a plurality of heterogeneous data cleaning modules, a data cleaning visualization management terminal, and a formal library. The heterogeneous data cleaning modules are arranged in parallel, and each heterogeneous data cleaning module can run in parallel on an indepen...
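The parallel arrangement described in paragraph [0021] might be sketched as follows. This is an assumption-laden illustration, not the patented implementation: `run_cleaning_module` and `schedule_modules` are hypothetical names, and threads in one process stand in for the independent servers or processes mentioned in the text.

```python
from concurrent.futures import ThreadPoolExecutor


def run_cleaning_module(name, rows):
    """Hypothetical cleaning module: drop empty values and trim whitespace."""
    cleaned = [r.strip() for r in rows if r and r.strip()]
    return {"module": name, "read": len(rows), "written": len(cleaned)}


def schedule_modules(sources, max_workers=4):
    """Hypothetical management terminal: start one module per source, collect status."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(run_cleaning_module, name, rows)
            for name, rows in sources.items()
        }
        # result() blocks until the corresponding module has finished.
        return {name: future.result() for name, future in futures.items()}


status = schedule_modules({
    "oracle_src": [" a ", "", "b"],
    "mysql_src": ["c", None],
})
print(status)
```

The per-module status dictionary stands in for what a visualization terminal could display; the patent text available here does not say which fields are reported.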

Abstract

The invention belongs to the technical field of data processing and discloses a distributed heterogeneous data cleaning system based on visual management. The heterogeneous data cleaning modules are arranged in parallel, and each module can run in parallel on an independent server or in a heterogeneous data cleaning process. The data cleaning visualization management terminal manages and schedules the heterogeneous data cleaning modules in a unified way. Each cleaning module includes an original database, a test target library, a database reading preprocessing middleware, an ETL conversion module, a data checking middleware, and a log module. The invention enables distributed heterogeneous data cleaning and improves the speed of cleaning and conversion.
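The components listed in the abstract suggest a staged pipeline inside each cleaning module. The sketch below shows one possible arrangement under stated assumptions: every function name, the column mapping, and the required fields are hypothetical, and in-memory lists stand in for the original database and the test target library.

```python
import logging

logger = logging.getLogger("cleaning_module")  # stands in for the log module


def preprocess(raw_rows):
    """Reading/preprocessing step: trim whitespace in string fields."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in raw_rows
    ]


def etl_convert(rows):
    """ETL conversion step: rename source columns to target columns."""
    mapping = {"ID": "id", "NAME": "name"}  # hypothetical column mapping
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]


def check(rows, required=("id", "name")):
    """Data checking step: reject rows with missing fields or a repeated id."""
    seen, clean, rejected = set(), [], []
    for row in rows:
        missing = any(row.get(field) in (None, "") for field in required)
        if missing or row.get("id") in seen:
            rejected.append(row)
        else:
            seen.add(row["id"])
            clean.append(row)
    return clean, rejected


def run_module(raw_rows):
    """Run the stages in order; 'clean' is what would reach the test target library."""
    clean, rejected = check(etl_convert(preprocess(raw_rows)))
    logger.info("accepted=%d rejected=%d", len(clean), len(rejected))
    return clean, rejected


raw = [{"ID": 1, "NAME": " Ann "}, {"ID": 1, "NAME": "Ann"}, {"ID": 2, "NAME": ""}]
clean, rejected = run_module(raw)
print(clean, rejected)
```

Keeping the check as a separate stage after conversion reflects the concern in the text that conversion bugs can introduce dirty data; rows are validated before anything is treated as written.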

Description

Technical field

[0001] The present invention relates to the technical field of data processing, and more particularly to a distributed heterogeneous data cleaning system based on visual management.

Background technique

[0002] Data cleaning is at the core of government intensification, data warehousing, and data mining, and is the basis of government data migration. The complexity of heterogeneous data makes data cleaning slow and error-prone. ETL technology draws on a wide range of data sources, which may be stored on different hardware or under different operating systems, so some "dirty data" in these sources is inevitable. The purpose of data cleaning is to find and eliminate data that does not meet the specifications. This is important for ensuring the high quality of the data warehouse and strongly affects the correctness of the data warehouse and of subsequent data mining and decision analysis.

[0003] The heterogeneity of d...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/215, G06F16/25
CPC: G06F16/215, G06F16/254, G06F16/258
Inventor: 于天宝, 罗丞
Owner: 众创网(武汉)科技有限公司