Unsupervised data automatic cleaning method

An unsupervised automatic data cleaning technology in the field of data management. It addresses problems such as increased delivery costs for enterprises, wrong delivery of goods, and economic losses, and it saves labor costs while improving cleaning accuracy and effectiveness.

Active Publication Date: 2019-03-19
SICHUAN CHANGHONG ELECTRIC CO LTD
Cites: 7; Cited by: 0

AI Technical Summary

Problems solved by technology

In business, erroneous data can cause large financial losses. For example, wrong customer information may lead to the wrong delivery of purchased goods, which not only increases the company's delivery cost but can also harm its image over the long term.
[0003] Among existing data cleaning methods, some require heavy manual participation in the cleaning process, such as suggesting cleaning actions or confirming repairs.



Examples


Embodiment

[0049] As shown in Figure 1, the unsupervised automatic data cleaning method cleans data without predefined data quality patterns/rules and without manual intervention, while maintaining the effectiveness and efficiency of the cleaning.

[0050] Specifically, the method includes the following steps:

[0051] S10. Data model learning:

[0052] To discover the hidden patterns/rules, the dependencies between attributes must be learned from raw data that may contain invalid values. Because of those invalid values, absolute (strong) dependencies between the attributes of the data table may not hold. Instead, the method finds the implicit non-absolute, relatively weak dependencies and expresses them as a Bayesian network, which constitutes the data model.
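The idea of a weak, non-absolute dependency can be made concrete with a small sketch. It is not the patent's learning algorithm (the text does not specify one). It assumes rows are Python dicts of strings and scores each ordered attribute pair by the confidence of an approximate functional dependency, that is, the share of rows that agree with the majority right-hand value for their left-hand value. Pairs at or above a hypothetical `threshold` are added greedily, strongest first, skipping any edge that would close a cycle, so the result is a directed acyclic graph that could serve as a Bayesian network skeleton. The function names and the 0.8 default are illustrative assumptions.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def afd_confidence(rows: Sequence[Row], x: str, y: str) -> float:
    """Share of rows that agree with the most frequent y-value for their x-value.

    1.0 means x -> y holds exactly; a value slightly below 1.0 is a
    "non-absolute" dependency that survives a few dirty rows."""
    groups: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        groups[row[x]][row[y]] += 1
    agreeing = sum(c.most_common(1)[0][1] for c in groups.values())
    return agreeing / len(rows)


def reaches(edges: List[Tuple[str, str]], start: str, target: str) -> bool:
    """True if target is reachable from start along the directed edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(dst for src, dst in edges if src == node)
    return False


def learn_dependency_dag(rows: Sequence[Row], attributes: Sequence[str],
                         threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Greedy, hypothetical stand-in for Bayesian network structure learning."""
    scored = sorted(
        ((afd_confidence(rows, x, y), x, y) for x, y in permutations(attributes, 2)),
        reverse=True,  # strongest dependencies first; ties broken by attribute name
    )
    edges: List[Tuple[str, str]] = []
    for conf, x, y in scored:
        # Keep strong-enough dependencies; skip x -> y if y already reaches x,
        # because adding the edge would then close a cycle.
        if conf >= threshold and not reaches(edges, y, x):
            edges.append((x, y))
    return edges
```

On a toy zip/city/state table in which one of three `10001` rows spells the city `Newyork`, `afd_confidence(rows, "zip", "city")` is 0.8 rather than 1.0: the dependency is still detected, but as a weak one. Edge direction between equally strong dependencies is settled only by the sort order here; a fuller implementation would more likely use score-based structure learning (for example BIC with hill climbing) and then fit conditional probability tables.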

[0053] The key sub-steps of this step are as follows:

[0054] S101. Evaluate and sample the data to be repaired;

[0055] S102. Learn from the original data set or the sampled data set to obtain the struc...
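Step S101 does not say how the data is evaluated or how large a sample is taken. A minimal sketch, assuming that "evaluate" means checking the table size against a hypothetical row budget `max_rows`, and that a uniform random sample without replacement is acceptable input for model learning in S102:

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_learning_data(rows: Sequence[T], max_rows: int = 10_000, seed: int = 0) -> List[T]:
    """S101 sketch: evaluate the table size and sample only when it exceeds the budget.

    max_rows and seed are hypothetical parameters, not values from the patent."""
    if len(rows) <= max_rows:
        return list(rows)  # small enough: learn the model (S102) from the full data set
    rng = random.Random(seed)  # fixed seed -> reproducible sample
    return rng.sample(list(rows), max_rows)  # uniform sample without replacement
```

Fixing `seed` makes the sample reproducible, so re-runs learn the same model; for tables too large to hold in memory, reservoir sampling would be the streaming alternative.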



Abstract

The invention discloses an unsupervised automatic data cleaning method comprising the following steps: A, learning a data model: learning the dependencies among attributes from original data that may contain invalid data, and finding hidden non-absolute or relatively weak dependencies to obtain a data model represented as a Bayesian network; B, generating data cleaning rules: after obtaining the complete data model of the original data (or of data sampled from it), generating predicates and first-order predicate rules; C, generating a Markov logic network based on the predicates and first-order predicate rules generated in step B; and D, generating inference rules based on the Markov logic network generated in step C and cleaning the data based on the inference results. The method can effectively improve the data quality of each of a company's business systems without consuming large amounts of manpower and material resources, and it helps management make correct decisions.
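Steps B to D can be illustrated with a deliberately simplified sketch under stated assumptions: each learned dependency X → Y becomes the first-order rule `Eq(t1.X, t2.X) => Eq(t1.Y, t2.Y)` over pairs of rows; its weight is the log-odds of the dependency's confidence (a hypothetical choice, since the abstract gives no weighting scheme); and a cell takes the candidate value that maximizes the total weight of satisfied groundings, in the spirit of Markov logic MAP inference. Real Markov logic network inference (for example MaxWalkSAT) reasons over all cells jointly; here each cell is repaired independently against the unrepaired rows, and all names are illustrative.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    """Weighted first-order rule: Eq(t1.lhs, t2.lhs) => Eq(t1.rhs, t2.rhs)."""
    lhs: str
    rhs: str
    weight: float


def rule_from_confidence(lhs: str, rhs: str, confidence: float) -> Rule:
    """Turn a learned dependency lhs -> rhs into a weighted rule (hypothetical log-odds weighting)."""
    if confidence <= 0.5:
        raise ValueError("only dependencies with confidence > 0.5 become rules")
    c = min(confidence, 0.999)  # cap so an exact dependency gets a large but finite weight
    return Rule(lhs, rhs, math.log(c / (1.0 - c)))


def repair_cell(rows: List[Row], i: int, attr: str, rules: List[Rule]) -> str:
    """Choose the value of rows[i][attr] that maximises the weight of satisfied groundings."""
    scores: Counter = Counter()
    for value in {row[attr] for row in rows}:
        for rule in rules:
            if rule.rhs != attr:
                continue
            for j, other in enumerate(rows):
                # Groundings with a false premise are satisfied for every value,
                # so only rows sharing the left-hand value affect the choice.
                if j != i and other[rule.lhs] == rows[i][rule.lhs] and other[attr] == value:
                    scores[value] += rule.weight
    current = rows[i][attr]
    best = max(scores.values(), default=0.0)
    if scores.get(current, 0.0) >= best:
        return current  # keep the observed value unless another value scores strictly higher
    return min(v for v, s in scores.items() if s == best)  # deterministic tie-break


def clean(rows: List[Row], rules: List[Rule]) -> List[Row]:
    """Repair every rule-consequent cell, using the original rows as evidence."""
    targets = {rule.rhs for rule in rules}
    cleaned = [dict(row) for row in rows]
    for i in range(len(rows)):
        for attr in targets:
            cleaned[i][attr] = repair_cell(rows, i, attr, rules)
    return cleaned
```

In a table with three `10001`/`NY` rows, one of which spells the city `Newyork`, the misspelled cell is changed to `New York`, while a clean cell whose competing values tie keeps its observed value.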

Description

Technical field

[0001] The invention relates to the technical field of data management, in particular to an unsupervised automatic data cleaning method.

Background art

[0002] Real-world data usually needs to be cleaned (data that needs cleaning is referred to below as dirty data); for example, it may contain inconsistent, noisy, incomplete, or duplicate values. In business, erroneous data can cause great financial losses. For example, wrong customer information may lead to the wrong delivery of purchased goods, which not only increases the company's delivery cost but can also harm its image over the long term.

[0003] Among existing data cleaning methods, some require heavy manual participation in the cleaning process, such as suggesting cleaning actions or confirming repairs; although other methods do not require manual participation in the cleaning process, they need ...


Application Information

IPC (8): G06F16/215; G06K9/62
CPC: G06F18/295
Inventors: 李玲 (Li Ling), 唐军 (Tang Jun), 吴纯彬 (Wu Chunbin), 于跃 (Yu Yue), 陈秋宇 (Chen Qiuyu)
Owner: SICHUAN CHANGHONG ELECTRIC CO LTD