A distributed data cleaning system and method based on data analysis

A distributed data cleaning technology based on data analysis, applied in the fields of electronic digital data processing, digital data information retrieval, special data processing applications, etc. It addresses the problems of unpredictable cleaning effect, damage to data integrity, and loss of data attributes, and achieves improved controllability and precision, high flexibility and practicality, and improved speed and accuracy.

Active Publication Date: 2021-06-15
山东省科院易达信息科技有限公司

AI Technical Summary

Problems solved by technology

[0005] Adding diversified data sources to a heterogeneous database forms a multivariate heterogeneous database. Compared with an ordinary heterogeneous database, a multivariate heterogeneous database has the additional characteristic of diversified data sources, so the complexity of its data is a level higher. If such a multivariate heterogeneous database is cleaned directly with cleaning tools, the predetermined cleaning rules cannot be applied generally to the data it contains. Cleaning such complex, multivariate data leads to loss of data attributes and damages the integrity of the data; the cleaning is slow, and the cleaning effect is unpredictable.

Examples

Embodiment 1

[0085] As shown in Figures 1-2, the processing unit 2 includes a collection module 201, a processing module 202, a metadata classification module 203, a cleaning module 204, and an output module 205. The collection module 201 is used to collect the user models, the metadata elements, and the source data elements of the multivariate heterogeneous database 1;

[0086] The processing module 202 is configured to screen out initial metadata elements according to the correlation between the metadata elements collected by the collection module 201 and the user model;

[0087] The metadata classification module 203 screens, from the metadata elements collected by the collection module 201, the metadata elements that have a common relationship with the initial metadata elements, and extracts, from the source data elements collected by the collection module 201, the source data elements corresponding to those metadata elements, the source da...
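The screening steps of modules 202 and 203 might be sketched as follows. This is only an illustration under stated assumptions: the excerpt does not define how "correlation" or "common relationship" is measured, so the sketch assumes a Jaccard overlap of attribute names for the former and "shares at least one attribute with an initial element" for the latter. The names `MetadataElement`, `correlation`, `screen_initial`, and `classify`, and the threshold value, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataElement:
    element_id: str
    attributes: frozenset  # attribute names describing the element (assumed representation)


def correlation(element, user_model):
    """Assumed correlation measure: Jaccard overlap of attribute names."""
    union = element.attributes | user_model
    return len(element.attributes & user_model) / len(union) if union else 0.0


def screen_initial(elements, user_model, threshold=0.5):
    """Sketch of processing module 202: keep elements correlated with the user model."""
    return [e for e in elements if correlation(e, user_model) >= threshold]


def classify(elements, initial, source_data):
    """Sketch of classification module 203: find elements sharing an attribute
    with any initial element, then pull their corresponding source data."""
    initial_attrs = set().union(*(e.attributes for e in initial))
    related = [e for e in elements if e.attributes & initial_attrs]
    extracted = {e.element_id: source_data[e.element_id]
                 for e in related if e.element_id in source_data}
    return related, extracted


# Toy data standing in for what collection module 201 would gather.
user_model = frozenset({"age", "region"})
elements = [
    MetadataElement("m1", frozenset({"age", "region"})),
    MetadataElement("m2", frozenset({"age", "income"})),
    MetadataElement("m3", frozenset({"color"})),
]
source_data = {"m1": [34, 51], "m2": [1200, 900], "m3": ["red", "blue"]}

initial = screen_initial(elements, user_model)
related, extracted = classify(elements, initial, source_data)
```

In this toy run only `m1` passes the correlation threshold; `m2` is then pulled in because it shares the `age` attribute with `m1`, while `m3` is left out.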

Abstract

The present invention relates to a distributed data cleaning system based on data analysis. The system includes a multivariate heterogeneous database for storing user models, metadata elements, and the source data elements corresponding to them; and at least one processing unit operable to: extract the user models, metadata elements, and source data elements of the multivariate heterogeneous database; select initial metadata elements; select at least one data attribute item of the user model as a relationship parameter and preset a weighted value corresponding to each relationship parameter; extract the metadata set Q; and clean the metadata elements in the metadata set Q. Because the relationship parameters are selected on the basis of the user model, the invention can screen out metadata sets of multiple categories, and a cleaning rule can be selected for each category of metadata set, which improves the cleaning speed, provides high flexibility and practicability, and improves the controllability of data cleaning.
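A minimal sketch of the weighted selection and per-category cleaning described in the abstract follows. The abstract does not give the scoring formula, so this assumes a simple weighted sum of per-parameter match values compared against a threshold; the dictionary layout, the `match` field, the threshold, and the two cleaning rules are all hypothetical and chosen only for illustration.

```python
def weighted_score(element, weights):
    """Assumed scoring: weighted sum of the element's match values
    over the chosen relationship parameters."""
    return sum(w * element["match"].get(param, 0.0) for param, w in weights.items())


def extract_q(elements, weights, threshold):
    """Build the metadata set Q from elements whose score reaches the threshold."""
    return [e for e in elements if weighted_score(e, weights) >= threshold]


# Hypothetical cleaning rules, one per metadata category.
def strip_whitespace(values):
    return [v.strip() for v in values]


def drop_missing(values):
    return [v for v in values if v is not None]


CLEANING_RULES = {"text": strip_whitespace, "numeric": drop_missing}


def clean(q):
    """Apply the rule selected for each element's category."""
    return {e["id"]: CLEANING_RULES[e["category"]](e["values"]) for e in q}


# Relationship parameters (user-model attribute items) with preset weights.
weights = {"region": 0.7, "age": 0.3}
elements = [
    {"id": "name", "category": "text",
     "match": {"region": 1.0, "age": 0.0}, "values": [" Li ", "Wang "]},
    {"id": "income", "category": "numeric",
     "match": {"region": 0.5, "age": 1.0}, "values": [10, None, 30]},
    {"id": "color", "category": "text",
     "match": {"region": 0.0, "age": 0.5}, "values": ["red"]},
]

q = extract_q(elements, weights, threshold=0.6)
cleaned = clean(q)
```

Changing the weights or the chosen relationship parameters changes which elements land in Q, which is one way to read the abstract's point that the selection of relationship parameters controls which metadata sets are cleaned and with which rule.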

Description

Technical field

[0001] The invention belongs to the technical field of artificial intelligence data processing, and in particular relates to a distributed data cleaning system based on data analysis.

Background technique

[0002] A heterogeneous database system is a collection of multiple related database systems that can realize data sharing and transparent access. Several of these database systems already existed before joining the heterogeneous database system; they have their own database management systems and external database systems. Each component has its own autonomy: while realizing data sharing, each database system still has its own application characteristics, integrity control, and security control;

[0003] The goal of the heterogeneous database system is to realize the merging and sharing of data information resources, hardware equipment resources, and human resources between different databases. The key point is to establish a global data model or a global exte...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/215, G06K9/62
CPC: G06F16/215, G06F18/22
Inventors: 张伟, 徐志峰
Owner: 山东省科院易达信息科技有限公司