Distributed heterogeneous data cleaning system based on visual management

Data cleaning and distributed processing technology, applied in the field of distributed heterogeneous data cleaning systems. It addresses difficulties in data standardization and structuring, dirty data being written into the target database, and missing data, and it achieves convenient and effective management, a high code reuse rate, and low operation and maintenance costs.

Active Publication Date: 2020-08-28
众创网(武汉)科技有限公司
4 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0003] Heterogeneous data sources make data standardization and structuring difficult, and a single code base can rarely process data from different sources in the same way. For example, structural differences between Oracle and MySQL databases mean that data types such as dates require different processing logic before they can be converted into the target data. During conversion, bugs in programs such as ETL jobs may write dirty data into the target database, for example spelling mistakes, repeated information, and missing data, so that data quality fails to meet requirements. In addition, traditional cleaning methods cannot parallelize or manage the data processing workflow on a large scale: data must be processed serially, so cleaning and conversion are slow. These problems are unavoidable in traditional processing methods.
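The date problem described in [0003] can be illustrated with a minimal sketch. It assumes, hypothetically, that the Oracle source exports dates as strings such as "28-AUG-20" and the MySQL source as "2020-08-28"; the format table and the function name `to_target_date` are illustrative and do not come from the patent.

```python
from datetime import date, datetime

# Hypothetical per-source date formats (assumptions, not taken from the patent).
SOURCE_DATE_FORMATS = {
    "oracle": "%d-%b-%y",  # e.g. "28-AUG-20"
    "mysql": "%Y-%m-%d",   # e.g. "2020-08-28"
}


def to_target_date(source: str, raw: str) -> date:
    """Convert a source-specific date string into one common target type."""
    fmt = SOURCE_DATE_FORMATS[source]
    # strptime raises ValueError on malformed input, so dirty values
    # are surfaced instead of being written to the target silently.
    return datetime.strptime(raw.strip(), fmt).date()
```

Keeping the source-specific logic in a lookup table means that supporting another source only requires a new table entry, which is one way to keep the conversion code reusable.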



Examples


Embodiment Construction

[0020] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0021] As shown in Figure 1, a distributed heterogeneous data cleaning system based on visual management includes multiple heterogeneous data cleaning modules, a data cleaning visualization management terminal, and a formal library. The heterogeneous data cleaning modules are arranged in parallel, and each heterogeneous data cleaning module can run in parallel on an independent server or in a heterogeneous...
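The parallel arrangement in [0021] can be sketched roughly as follows. This is only an illustration under stated assumptions: a thread pool stands in for the independent servers or processes, `run_all` stands in for the management terminal, a plain dictionary stands in for the formal library, and `clean_source` is a hypothetical cleaning step. None of these names come from the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def clean_source(name, rows):
    """Hypothetical cleaning module: trims whitespace and drops empty values."""
    cleaned = [r.strip() for r in rows if r and r.strip()]
    return name, cleaned


def run_all(sources):
    """Stand-in for the management terminal: schedules one module per source."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = [pool.submit(clean_source, n, rows) for n, rows in sources.items()]
        # Collect each module's output under its source name.
        return dict(f.result() for f in futures)


# A dictionary stands in for the formal library that receives cleaned data.
formal_library = run_all({"oracle": [" a ", ""], "mysql": ["b", None, " c"]})
```

Because each module only receives its own source and returns its own result, the modules do not share state and could be moved to separate processes or machines without changing the scheduling logic.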


Abstract

The invention belongs to the technical field of data processing and discloses a distributed heterogeneous data cleaning system based on visual management. The system comprises a plurality of heterogeneous data cleaning modules, a data cleaning visualization management terminal, and a formal library. The heterogeneous data cleaning modules are arranged in parallel, and each module can run in parallel on an independent server or in a heterogeneous data cleaning process. The data cleaning visualization management terminal performs unified management and scheduling of the heterogeneous data cleaning modules. Each heterogeneous data cleaning module comprises an original database, a test target library, database reading preprocessing middleware, an ETL conversion module, data checking middleware, and a log module. The system achieves distributed heterogeneous data cleaning and improves the speed of cleaning and conversion.
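As a rough illustration of how data checking middleware and a log module might cooperate inside one cleaning module, the sketch below rejects records with missing required fields or repeated identifiers and logs each rejection. The schema in `REQUIRED_FIELDS`, the function name `check_records`, and the rejection rules are assumptions for illustration; the abstract does not specify them.

```python
import logging

logger = logging.getLogger("cleaning_module")

REQUIRED_FIELDS = ("id", "name")  # hypothetical schema, not from the patent


def check_records(records):
    """Split records into accepted ones and (record, reason) rejections."""
    accepted, rejected = [], []
    seen_ids = set()
    for rec in records:
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) in (None, "")]
        if missing:
            rejected.append((rec, f"missing: {', '.join(missing)}"))
        elif rec["id"] in seen_ids:
            rejected.append((rec, "duplicate id"))
        else:
            seen_ids.add(rec["id"])
            accepted.append(rec)
    # Log every rejection so that dirty data can be traced afterwards.
    for rec, reason in rejected:
        logger.warning("rejected %r (%s)", rec, reason)
    return accepted, rejected
```

In a setup like this, only the accepted records would move on from the test target library toward the formal library, while the log keeps a trace of what was filtered out and why.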

Description

Technical Field

[0001] The present invention relates to the technical field of data processing, and more specifically to a distributed heterogeneous data cleaning system based on visual management.

Background Technique

[0002] Data cleaning is the core of government intensification, data warehousing, and data mining, and it is the basis of government data migration. The complexity of heterogeneous data makes data cleaning slow and error-prone. Because the data sources in ETL technology are very extensive and may be stored on different hardware or different operating systems, some "dirty data" in these data sources is inevitable. Data quality has a very important impact on the correctness of the data warehouse and on subsequent data mining and decision analysis.

[0003] The heterogeneous sources of data lead to difficulties in data standardization and structuring, and it is difficult for a set of cod...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06F16/25
CPC: G06F16/215, G06F16/254, G06F16/258
Inventor: 于天宝, 罗丞
Owner: 众创网(武汉)科技有限公司