Data cleaning method for improving accuracy of target data and cleaning system thereof

A data cleaning technology for target data, applied in the database field, that addresses poor target data quality and the resulting errors in subsequent analysis and modeling.

Status: Inactive; Publication Date: 2010-03-24
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

[0010] The first purpose of the present invention is to provide a data cleaning method that improves the accuracy of target data, so as to solve the technical problem in the prior art that target data obtained from data sources is of poor quality, which introduces errors into subsequent analysis and modeling.

Embodiment Construction

[0034] The present invention will be described in detail below in conjunction with the accompanying drawings.

[0035] Referring to Figure 2, which is a schematic structural diagram of a data cleaning system provided by the present invention, the system includes a database 21 and a server 22. The database 21 includes a data source 211 and a data warehouse 212; the data source 211 stores, in real time, the data generated by users' business processing. The data warehouse 212 also stores a logical processing model for each target data value:

[0036] Target data value = f(M1(q1, G1(A1)), M2(q2, G2(A2)), ..., Mn(qn, Gn(An)))

[0037] Here, A1, A2, ..., An are the original field items; G1(A1), G2(A2), ..., Gn(An) are attribute functions reflecting the data attributes of each original field item; q1, q2, ..., qn are the weight values of the original field items; M1(q1, G1(A1)), M2(q2, G2(A2)), ..., Mn(qn, Gn(An)) are the influence functions of each original field component on the target data value; and f() is a determination function f...
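The structure of the model in [0036] and [0037] can be illustrated as a composition of pluggable functions. This is a minimal sketch, not the patent's implementation: the patent does not give concrete forms for G_i, M_i, or f, so the weighted-product influence function M_i(q, g) = q * g, the choice f = sum, and the field names `orders` and `verified` are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Sequence


@dataclass(frozen=True)
class FieldTerm:
    """One term M_i(q_i, G_i(A_i)) of the logic processing model."""
    field: str                                  # original field item A_i (hypothetical name)
    weight: float                               # weight value q_i
    attribute: Callable[[Any], float]           # attribute function G_i
    influence: Callable[[float, float], float]  # influence function M_i(q_i, G_i(A_i))


@dataclass(frozen=True)
class LogicModel:
    """Target data value = f(M_1(q_1, G_1(A_1)), ..., M_n(q_n, G_n(A_n)))."""
    terms: Sequence[FieldTerm]
    determine: Callable[[list[float]], float]   # determination function f

    def evaluate(self, record: Mapping[str, Any]) -> float:
        # Compute each influence M_i(q_i, G_i(A_i)), then combine them with f.
        influences = [
            t.influence(t.weight, t.attribute(record[t.field])) for t in self.terms
        ]
        return self.determine(influences)


# Illustrative instance (assumed, not from the patent):
# M_i(q, g) = q * g and f = sum, i.e. a weighted sum of field attributes.
def weighted(q: float, g: float) -> float:
    return q * g


model = LogicModel(
    terms=[
        FieldTerm("orders", 0.5, float, weighted),
        FieldTerm("verified", 4.0, lambda v: 1.0 if v else 0.0, weighted),
    ],
    determine=sum,
)

print(model.evaluate({"orders": 12, "verified": True}))  # 10.0
```

Keeping G_i, M_i, and f as separate callables mirrors the formula's decomposition, so a different determination function (for example, a maximum or a threshold rule) can be swapped in without touching the per-field terms.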

Abstract

The invention relates to a data cleaning method for improving the accuracy of target data, comprising the following steps: (1) finding, in a data source, a plurality of original field items A1, A2, ..., An related to the target data; (2) establishing a logic processing model: target data value = f(M1(q1, G1(A1)), M2(q2, G2(A2)), ..., Mn(qn, Gn(An))), where A1, A2, ..., An are the original field items; G1(A1), G2(A2), ..., Gn(An) are attribute functions that reflect the data attribute of each original field item; q1, q2, ..., qn are the weight values of the original field items; M1(q1, G1(A1)), M2(q2, G2(A2)), ..., Mn(qn, Gn(An)) are influence functions describing how each original field component influences the target data value; and f() is the determination function that determines the target data value from the influence functions; and (3) at each data cleaning run, finding all the original field items and determining the value of the target data according to the logic processing model. The data cleaning method improves the accuracy of the target data cleaned out of the data source.
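The three steps of the abstract can be sketched as a single cleaning run. This is a minimal sketch under stated assumptions, not the patent's method: the data source is modeled as a list of dict rows with a hypothetical `id` key, the model is a hypothetical mapping from field name to (q_i, G_i), M_i(q, g) = q * g and f = sum are illustrative choices, and the returned dict merely stands in for the data warehouse 212.

```python
from typing import Any, Callable, Iterable, Mapping

# Hypothetical model spec: field item name -> (weight q_i, attribute function G_i).
# M_i(q, g) = q * g and f = sum are illustrative assumptions, not from the patent.
ModelSpec = Mapping[str, tuple[float, Callable[[Any], float]]]


def find_field_items(source_columns: Iterable[str], spec: ModelSpec) -> list[str]:
    """Step (1): locate the original field items A_1..A_n in the data source."""
    available = set(source_columns)
    missing = [name for name in spec if name not in available]
    if missing:
        raise KeyError(f"data source lacks field items: {missing}")
    return list(spec)


def determine_target(record: Mapping[str, Any], spec: ModelSpec) -> float:
    """Step (2) model: f(M_1(q_1, G_1(A_1)), ..., M_n(q_n, G_n(A_n)))."""
    return sum(q * g(record[name]) for name, (q, g) in spec.items())


def clean_run(rows: list[dict[str, Any]], spec: ModelSpec) -> dict[Any, float]:
    """Step (3): on each cleaning run, read all field items and compute the target."""
    if not rows:
        return {}
    # Assumption: the first row's keys describe the data source schema.
    fields = find_field_items(rows[0].keys(), spec)
    warehouse: dict[Any, float] = {}  # stand-in for data warehouse 212
    for row in rows:
        warehouse[row["id"]] = determine_target({f: row[f] for f in fields}, spec)
    return warehouse


spec = {"orders": (0.5, float), "verified": (4.0, lambda v: 1.0 if v else 0.0)}
rows = [
    {"id": 1, "orders": 12, "verified": True, "note": "x"},
    {"id": 2, "orders": 4, "verified": False, "note": ""},
]
print(clean_run(rows, spec))  # {1: 10.0, 2: 2.0}
```

Fields not named in the model (such as `note`) are ignored, and a run fails loudly if the source lacks any required field item rather than silently producing a target value from partial inputs.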

Description

Technical Field

[0001] The invention relates to the field of databases, in particular to a data cleaning method and a data cleaning system for cleaning target data in a data warehouse.

Background Art

[0002] A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data that supports enterprise management and decision-making. That is, the data of all application systems, such as customer relationship management (CRM) systems and financial systems, is integrated by subject, and its full history of changes is recorded. As enterprise informatization advances, enterprises accumulate large amounts of business data; the data warehouse processes these mutually independent, scattered data uniformly to meet the enterprise's needs for high-level decision-making and analysis.

[0003] Referring to Figure 1, which is a block diagram of the architecture of the...

Application Information

IPC(8): G06F17/30
Inventors: 徐建军 (Xu Jianjun), 向继新 (Xiang Jixin)
Owner: ALIBABA GRP HLDG LTD