Data cleaning method and device and computer readable storage medium

A data cleaning technology for structured data, in the field of data cleaning. It addresses problems such as the difficulty of identifying complex quality problems, and improves timeliness.

Status: Inactive. Publication date: 2020-06-05
HARBIN INST OF TECH
Cites: 1 | Cited by: 5

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a comprehensive data cleaning method based on timeliness, completeness, and consistency that remedies at least some of the above deficiencies, namely the difficulty in the prior art of identifying and repairing complex quality problems in structured data.


Embodiment Construction

[0050] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0051] As shown in Figure 1 and Figure 2, a data cleaning method provided by an embodiment of the present invention includes the following steps:

[0052] S1. Data preprocessing: obtain the structured data to be cleaned and the time constraints, and establish a time-sequence graph over all tuples in the structured data according to the time constr...
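The excerpt of step S1 is truncated, and neither it nor the abstract gives the exact graph construction or the timeliness formula. The following Python sketch only illustrates the idea under stated assumptions: each timeliness constraint is a pair (older, newer) of tuple IDs, the abstract's sub-graph step is taken to be a transitive reduction of the resulting directed acyclic graph, and a tuple's timeliness value is taken to be its longest-path depth scaled to [0, 1]. The function names `transitive_reduction` and `timeliness_values` are hypothetical.

```python
from collections import defaultdict, deque


def transitive_reduction(edges):
    """Drop edge (u, v) when v is still reachable from u via another successor.

    Assumes the constraint graph is a DAG; edges are (older, newer) tuple-ID pairs.
    """
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)

    def reachable_via_other_path(u, v):
        # Depth-first search from u's other successors, looking for v.
        stack = [w for w in succ[u] if w != v]
        seen = set(stack)
        while stack:
            w = stack.pop()
            if w == v:
                return True
            for x in succ[w]:
                if x not in seen:
                    seen.add(x)
                    stack.append(x)
        return False

    return {(u, v) for u, v in set(edges) if not reachable_via_other_path(u, v)}


def timeliness_values(nodes, edges):
    """Hypothetical timeliness score: longest-path depth from the oldest tuples, scaled to [0, 1]."""
    succ = defaultdict(list)
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    depth = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)
    processed = 0
    while queue:  # Kahn's topological sort, relaxing depths along the way
        u = queue.popleft()
        processed += 1
        for v in succ[u]:
            depth[v] = max(depth[v], depth[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if processed != len(nodes):
        raise ValueError("timeliness constraints contain a cycle")
    top = max(depth.values()) or 1
    return {n: depth[n] / top for n in nodes}


# Example: t1 older than t2 older than t3 older than t4, plus the redundant constraint t1 < t3.
nodes = ["t1", "t2", "t3", "t4"]
constraints = [("t1", "t2"), ("t2", "t3"), ("t1", "t3"), ("t3", "t4")]
reduced = transitive_reduction(constraints)
values = timeliness_values(nodes, reduced)
```

Here `reduced` drops ("t1", "t3") because t3 is already reachable through t2, and the values come out as 0, 1/3, 2/3, and 1 for t1 through t4. Ranking by graph depth rather than by raw timestamps is one way to order tuples whose timestamps are unavailable, which is the situation the abstract targets; it is not necessarily the patent's own formula.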


Abstract

The invention relates to a data cleaning method, a data cleaning device, and a computer-readable storage medium. The method comprises the following steps: obtaining structured data and timeliness constraints, establishing a time-sequence graph over all tuples, and obtaining a timeliness subgraph through transitive reduction; calculating a timeliness value for each tuple based on the time-sequence subgraph; using the timeliness-consistency joint repair distance as the metric, calculating the edit distance between each erroneous tuple and the high-quality tuples, and selecting the repair that satisfies the consistency rule constraints and is closest to the timeliness value of the erroneous tuple to perform consistency repair on it; using a Bayesian filling method in which the timeliness value of each tuple is added as a new attribute that participates in Bayesian training, thereby filling missing values; and obtaining the cleaned data set. The method can effectively identify and repair three data quality problems that occur simultaneously in a data set: unavailable timestamps, incomplete attribute values, and inconsistent attribute values.
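The repair and filling steps of the abstract can be illustrated with the Python sketch below. It is not the patent's implementation: the weighting `alpha`, the per-attribute normalization of the edit distance, the representation of consistency rules as Python predicates (here a hypothetical functional dependency in which city determines province), the discretized timeliness attribute `age`, and the use of naive Bayes as the concrete Bayesian method are all assumptions made for illustration. All function and attribute names are hypothetical.

```python
from collections import Counter, defaultdict


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def tuple_distance(t, s, attrs):
    """Mean per-attribute edit distance, each normalized by the longer value."""
    total = 0.0
    for a in attrs:
        x, y = str(t.get(a, "")), str(s.get(a, ""))
        total += edit_distance(x, y) / max(len(x), len(y), 1)
    return total / max(len(attrs), 1)


def repair(error, error_time, candidates, rules, attrs, alpha=0.5):
    """Return the rule-satisfying candidate with the smallest hypothetical joint distance:
    alpha * tuple edit distance + (1 - alpha) * |timeliness gap|."""
    best, best_score = None, float("inf")
    for cand, cand_time in candidates:
        if not all(rule(cand) for rule in rules):
            continue  # violates a consistency rule, so it cannot be used as a repair
        score = alpha * tuple_distance(error, cand, attrs) + (1 - alpha) * abs(error_time - cand_time)
        if score < best_score:
            best, best_score = cand, score
    return best


def bayes_fill(rows, target, features):
    """Naive Bayes imputation of missing (None) `target` values with add-one smoothing.
    A discretized timeliness attribute is simply listed in `features`."""
    complete = [r for r in rows if r.get(target) is not None]
    prior = Counter(r[target] for r in complete)
    cond = defaultdict(Counter)  # (feature, class) -> counts of feature values
    for r in complete:
        for f in features:
            cond[(f, r[target])][r[f]] += 1
    vocab = {f: {r[f] for r in rows} for f in features}

    def posterior(r, c):
        p = prior[c] / len(complete)
        for f in features:
            p *= (cond[(f, c)][r[f]] + 1) / (prior[c] + len(vocab[f]))
        return p

    return [r if r.get(target) is not None
            else {**r, target: max(prior, key=lambda c: posterior(r, c))}
            for r in rows]


def harbin_in_hlj(t):
    # Hypothetical consistency rule: a tuple with city Harbin must have province HLJ.
    return t["city"] != "Harbin" or t["prov"] == "HLJ"


error = {"city": "Harbn", "prov": "HLJ"}
candidates = [
    ({"city": "Harbin", "prov": "HLJ"}, 0.8),
    ({"city": "Harbin", "prov": "BJ"}, 0.9),  # violates the rule
    ({"city": "Harbn", "prov": "HLJ"}, 0.1),  # identical values but far away in time
]
fixed = repair(error, 0.9, candidates, [harbin_in_hlj], ["city", "prov"])

rows = [
    {"city": "Harbin", "age": "new", "prov": "HLJ"},
    {"city": "Harbin", "age": "new", "prov": "HLJ"},
    {"city": "Beijing", "age": "old", "prov": "BJ"},
    {"city": "Harbin", "age": "new", "prov": None},
]
filled = bayes_fill(rows, "prov", ["city", "age"])
```

The misspelled city is repaired from the candidate that satisfies the rule and whose timeliness value (0.8) is closest to that of the error tuple (0.9); the value-identical but stale candidate loses on the timeliness term. The missing province in the last row is filled as HLJ because both the city and the timeliness bucket agree with that class.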

Description

Technical field

[0001] The present invention relates to the technical field of data cleaning, in particular to a data cleaning method, device, and computer-readable storage medium.

Background

[0002] As an important step in data preprocessing, data cleaning is widely used in data warehousing, data quality management, data mining, and other fields. Data cleaning can effectively repair errors in data and improve data quality.

[0003] In data quality management, timeliness, completeness, and consistency are three important factors for evaluating data quality. Existing techniques usually clean data with respect to only one of these factors, and existing data cleaning methods often ignore the timeliness aspect of data quality problems. This tends to reduce the reliability and accuracy of the cleaning, resulting in more false detections and missed detections. In structured data, ...


Application Information

Patent type & authority: Application (China)
IPC (8th edition): G06F16/215
CPC: G06F16/215
Inventors: 王宏志 (Wang Hongzhi), 丁小欧 (Ding Xiaoou), 苏佳轩 (Su Jiaxuan)
Owner: HARBIN INST OF TECH