A distributed heterogeneous data cleaning system based on visual management

A data cleaning and distributed processing technology, applied in the field of distributed heterogeneous data cleaning systems. It addresses problems such as standardizing and structuring data, writing data into the target database, and missing data, and it aims for convenient and effective management, a high code reuse rate, and low operation and maintenance costs.
CN111597181B · Active · Publication Date: 2022-05-24 · 众创网(武汉)科技有限公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
众创网(武汉)科技有限公司
Publication Date
2022-05-24

Abstract

The invention belongs to the technical field of data processing and discloses a distributed heterogeneous data cleaning system based on visual management. The heterogeneous data cleaning modules are arranged in parallel. Each module can run in parallel, either on an independent server or within the heterogeneous data cleaning process, and a data cleaning visualization management terminal manages and schedules the modules in a unified way. Each cleaning module includes an original database, a test target library, a database-reading preprocessing middleware, an ETL conversion module, a data checking middleware, and a log module. The invention enables distributed heterogeneous data cleaning and improves the speed of cleaning and conversion processing.
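The arrangement described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Every class, method, and field name here is hypothetical, in-memory lists stand in for the original database and the test target library, and a thread pool stands in for the visualization management terminal that schedules the modules.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class CleaningModule:
    """One heterogeneous data cleaning module (all names are hypothetical)."""
    name: str
    source_rows: list                                 # stands in for the original database
    target_rows: list = field(default_factory=list)   # stands in for the test target library
    log: list = field(default_factory=list)           # stands in for the log module

    def preprocess(self, row):
        # Database-reading preprocessing middleware: trim whitespace from strings.
        return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}

    def transform(self, row):
        # ETL conversion module: map source fields onto an assumed target schema.
        return {"id": row.get("id"), "name": (row.get("name") or "").title()}

    def check(self, row):
        # Data checking middleware: reject rows with missing required fields.
        return row["id"] is not None and row["name"] != ""

    def run(self):
        for raw in self.source_rows:
            row = self.transform(self.preprocess(raw))
            if self.check(row):
                self.target_rows.append(row)
            else:
                self.log.append(f"{self.name}: rejected {raw!r}")
        # Report (module name, rows written, rows rejected) to the scheduler.
        return self.name, len(self.target_rows), len(self.log)


def schedule(modules, max_workers=4):
    """Stand-in for the management terminal: run all modules in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda m: m.run(), modules))


if __name__ == "__main__":
    modules = [
        CleaningModule("hr", [{"id": 1, "name": " alice "}, {"id": None, "name": "bob"}]),
        CleaningModule("finance", [{"id": 2, "name": ""}]),
    ]
    for name, written, rejected in schedule(modules):
        print(f"{name}: {written} written, {rejected} rejected")
```

Because each module owns its own source, target, and log, the modules share no state and could equally be placed on separate servers. The thread pool is used here only to keep the sketch self-contained.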

Description

Technical field

[0001] The present invention relates to the technical field of data processing, and more particularly, to a distributed heterogeneous data cleaning system based on visual management.

Background art

[0002] Data cleaning is central to government intensification, data warehousing, and data mining, and it is the basis of government data migration. The complexity of heterogeneous data makes data cleaning slow and error-prone. ETL technology draws on a wide range of data sources, which may be stored on different hardware or under different operating systems, so some "dirty data" is inevitable in these sources. The purpose of data cleaning is to find and eliminate data that does not meet the specifications. This is important for ensuring the quality of the data warehouse, and it strongly affects the correctness of the warehouse and of subsequent data mining and decision analysis.

[0003] The heterogeneity of d...
