Distributed data cleaning system and method based on data analysis

A distributed data cleaning and data analysis technology, applied in the fields of electronic digital data processing, special data processing applications, and digital data information retrieval. It addresses problems such as slow cleaning speed.

Active Publication Date: 2020-10-30
山东省科院易达科技咨询有限公司

AI Technical Summary

Problems solved by technology

[0005] Adding diversified data sources to a heterogeneous database forms a multivariate heterogeneous database. Compared with an ordinary heterogeneous database, a multivariate heterogeneous database also carries the characteristics of its diverse sources, so its data is more complex. If such a database is cleaned directly with cleaning tools, the predetermined cleaning rules cannot be applied generally to all of its data. Cleaning complex, multivariate data in this way leads to loss of data attributes and damages data integrity; cleaning is slow, and the cleaning effect is unpredictable.


Examples

Embodiment 1

[0085] As shown in Figures 1-2, the processing unit 2 includes a collection module 201, a processing module 202, a metadata classification module 203, a cleaning module 204 and an output module 205. The collection module 201 is used to collect the user model, the metadata elements, and the source data elements of the multivariate heterogeneous database 1;

[0086] The processing module 202 is configured to screen out initial metadata elements according to the correlation between the metadata elements collected by the collection module 201 and the user model;

[0087] The metadata classification module 203 screens, from the metadata elements collected by the collection module 201, the metadata elements that have a common relationship with the initial metadata elements, and extracts, from the source data elements collected by the collection module 201, the source data elements corresponding to those metadata elements, the source...
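The flow through modules 202 and 203 described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not define how "correlation" or "common relationship" is measured, so the sketch approximates both by shared attribute names. The `MetadataElement` class, the function names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataElement:
    """Hypothetical metadata record; field names are illustrative only."""
    name: str
    attributes: frozenset


def screen_initial(elements, user_model_attrs, min_overlap=1):
    """Sketch of processing module 202.

    ASSUMPTION: correlation with the user model is approximated by the
    number of attribute names an element shares with the user model.
    """
    return [e for e in elements
            if len(e.attributes & user_model_attrs) >= min_overlap]


def classify_related(elements, initial):
    """Sketch of metadata classification module 203.

    ASSUMPTION: a 'common relationship' means sharing at least one
    attribute name with any initial metadata element.
    """
    initial_attrs = frozenset().union(*(e.attributes for e in initial))
    return [e for e in elements
            if e not in initial and e.attributes & initial_attrs]


def extract_sources(source_data, related):
    """Pick the source data that corresponds to the related elements."""
    return {e.name: source_data[e.name]
            for e in related if e.name in source_data}


# Made-up sample data standing in for what collection module 201 gathers.
elements = [
    MetadataElement("customer", frozenset({"age", "region"})),
    MetadataElement("orders", frozenset({"region", "amount"})),
    MetadataElement("logs", frozenset({"timestamp"})),
]
user_model_attrs = frozenset({"age"})
source_data = {
    "customer": [{"age": 30, "region": "north"}],
    "orders": [{"region": "north", "amount": 10}],
    "logs": [],
}

initial = screen_initial(elements, user_model_attrs)
related = classify_related(elements, initial)
extracted = extract_sources(source_data, related)
```

In this sample, only the elements that share an attribute with the initial element are carried forward, which is the narrowing step that lets later cleaning operate on a smaller, related subset.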

Abstract

The invention relates to a distributed data cleaning system based on data analysis, which comprises a multivariate heterogeneous database storing a user model, metadata elements and source data elements corresponding to the metadata elements, and at least one processing unit operable to: extract the user model, metadata elements, and source data elements of the multivariate heterogeneous database; select initial metadata elements; select at least one data attribute item of the user model as a relationship parameter, together with a preset weighted value corresponding to the relationship parameter, and extract a metadata set Q; and clean the metadata elements in the metadata set Q. According to the invention, multiple categories of metadata sets can be screened by selecting relationship parameters that match the user model, and targeted cleaning rules can be selected for each category of metadata set. This increases the cleaning speed, provides high flexibility and practicability, and improves the controllability of data cleaning.
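As a rough illustration of the weighted selection and per-category cleaning summarized in the abstract, the sketch below scores each metadata element by the preset weights of the relationship parameters it matches and keeps those that reach a threshold as the set Q. The scoring formula, the threshold, the category names, and the two cleaning rules are assumptions made for this example; the abstract does not specify them.

```python
def extract_set_q(metadata, weights, threshold):
    """Return names of metadata elements whose weighted score reaches threshold.

    ASSUMPTION: the score is the sum of the preset weights of all
    relationship parameters present in the element's attributes.
    """
    q = []
    for name, attrs in metadata.items():
        score = sum(w for param, w in weights.items() if param in attrs)
        if score >= threshold:
            q.append(name)
    return q


def strip_whitespace(values):
    """Hypothetical cleaning rule for text-like data."""
    return [v.strip() for v in values]


def drop_missing(values):
    """Hypothetical cleaning rule for numeric data."""
    return [v for v in values if v is not None]


# One targeted rule per category (categories are illustrative).
CLEANING_RULES = {"text": strip_whitespace, "numeric": drop_missing}


def clean_set(q, categories, source_data):
    """Apply the rule chosen for each element's category to its source data."""
    return {name: CLEANING_RULES[categories[name]](source_data[name])
            for name in q}


# Made-up sample data.
metadata = {
    "customer_name": {"region", "customer_id"},
    "order_amount": {"customer_id"},
    "server_log": {"timestamp"},
}
weights = {"customer_id": 0.6, "region": 0.4}  # relationship parameters
categories = {"customer_name": "text", "order_amount": "numeric"}
source_data = {
    "customer_name": [" Ann ", "Bob"],
    "order_amount": [10, None, 25],
}

q = extract_set_q(metadata, weights, threshold=0.5)
cleaned = clean_set(q, categories, source_data)
```

Keeping the rules in a lookup table keyed by category mirrors the idea of selecting a targeted rule per category of metadata set, instead of applying one predetermined rule to all data.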

Description

Technical field

[0001] The invention belongs to the technical field of artificial intelligence data processing, and in particular relates to a distributed data cleaning system based on data analysis.

Background art

[0002] A heterogeneous database system is a collection of related multiple database systems, which can realize data sharing and transparent access. Several database systems already existed before joining the heterogeneous database system. They have their own database management systems and external database systems. Each component has its own autonomy. While realizing data sharing, each database system still has its own application characteristics, integrity control and security control;

[0003] The goal of the heterogeneous database system is to realize the merging and sharing of data information resources, hardware equipment resources and human resources between different databases. The key point is to establish a global data model or a global exte...

Application Information

IPC(8): G06F16/215, G06K9/62
CPC: G06F16/215, G06F18/22
Inventor: 张伟, 徐志峰
Owner: 山东省科院易达科技咨询有限公司