Data cleaning method based on functional dependency

A data cleaning technology based on functional dependency, applied in the field of big data processing. It can solve problems related to the influence of expert experience and to the temporal and spatial characteristics and importance of data, and it achieves good universality and good scalability.

Active Publication Date: 2016-03-30
HUAZHONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0004] However, the above-mentioned studies and methods have certain limitations in application scenarios, either const




Embodiment Construction

[0048] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not constitute a conflict with each other.

[0049] The invention includes five parts: attribute conversion, attribute self-dependency function extraction, attribute inter-dependency function extraction, determination of the attributes to be cleaned, and functional-dependency-based cleaning. As shown in Figure 1, the data cleaning method based on functional dependency of the present invention comprises the following steps:

[0050] Step 1: Perform dat...


Abstract

The present invention discloses a data cleaning method based on functional dependency. The method comprises: performing data conversion on raw data so that attributes of all types are converted into numerical attributes; extracting the self-dependency function characteristics of the attributes of the converted data; extracting the inter-dependency functions among the attributes of the converted data; determining the attributes and the samples to be cleaned according to the self-dependency function characteristics and the inter-dependency functions, and forming the basis of a cleaning policy from those attributes and samples; and determining whether each to-be-cleaned attribute is cleaned with the self-dependency function or with the inter-dependency function. If the self-dependency function is used, samples that do not satisfy the condition are calibrated and recovered according to the polynomial determined by the self-dependency function, with white noise used as a random perturbation. The data cleaning method provided by the present invention can solve the dirty data problem in mass data and provide high-quality data for subsequent analysis and mining of the mass data.
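The self-dependency branch described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented procedure: it assumes the self-dependency function is a polynomial in the sample index, that "samples which do not accord with the condition" are those whose residual deviates from the median residual by more than a multiple of the median absolute deviation, and that categorical attributes are converted by simple integer coding. The names `encode_categories`, `fit_polynomial`, `evaluate`, and `clean_self_dependency`, and the parameters `degree`, `threshold`, and `noise_std`, are all hypothetical.

```python
import random
import statistics


def encode_categories(values):
    """Assumed conversion step: map each distinct value to an integer code."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in values]


def fit_polynomial(xs, ys, degree):
    """Least-squares fit via normal equations; returns c[0] + c[1]*x + ..."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def clean_self_dependency(ys, degree=2, threshold=3.0, noise_std=0.01, seed=0):
    """Repair samples that deviate from the fitted polynomial (assumed rule)."""
    rng = random.Random(seed)
    xs = list(range(len(ys)))
    coeffs = fit_polynomial(xs, ys, degree)
    residuals = [y - evaluate(coeffs, x) for x, y in zip(xs, ys)]
    # Robust scale: median absolute deviation, floored to avoid a zero scale.
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    scale = max(mad, 1e-9)
    dirty = {i for i, r in enumerate(residuals) if abs(r - med) > threshold * scale}
    # Refit on the samples judged clean, then recover the dirty ones from the
    # polynomial, adding Gaussian white noise as the random perturbation.
    keep = [i for i in xs if i not in dirty]
    coeffs = fit_polynomial(keep, [ys[i] for i in keep], degree)
    return [
        evaluate(coeffs, i) + rng.gauss(0.0, noise_std) if i in dirty else ys[i]
        for i in xs
    ]


if __name__ == "__main__":
    series = [float(x * x) for x in range(10)]
    series[5] = 1000.0  # inject one dirty sample
    print(clean_self_dependency(series))
```

The detection rule and the refit on clean samples are design choices made for this sketch; the source only states that the polynomial determined by the self-dependency function is used for calibration and recovery, with white noise as perturbation.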

Description

Technical field

[0001] The invention belongs to the field of big data processing and, more specifically, relates to a data cleaning method based on functional dependency.

Background

[0002] With the rapid development of the mobile Internet and information technology, the data of governments, enterprises and various industries is growing at a rate of TB/s. People's lives, business decision-making and precision services increasingly depend on data, and the requirements for data quality are getting higher and higher; any "dirty data" will affect data analysis and target decision-making. As the awareness of "data-driven operations" has become widely recognized and popularized in all walks of life, people are spending more and more energy on data analysis and mining, and more than 80% of the time is spent on "dirty data" processing.

[0003] The causes of "dirty data" include system failures, cross-system, multi-source data, changes in data standa...


Application Information

IPC(8): G06F17/30
CPC: G06F16/215
Inventor: 莫益军, 曾志华, 谭辉
Owner: HUAZHONG UNIV OF SCI & TECH