Data cleaning method based on data warehouse
A data cleaning method based on data warehouse technology, in the field of electrical digital data processing and special data processing applications. It addresses the complexity of the cleaning process, with the stated effects of improving cleaning efficiency, ensuring correctness, and reducing complexity.
Examples
Embodiment 1
[0021] The data cleaning method is implemented in five steps: preprocessing, assigning weights to attributes, duplicate record detection, database-level duplicate record clustering, and conflict handling.
[0022] Preprocessing: because the amount of data is large, select for record matching only those attributes that can represent the characteristics of a record.
[0023] Assign weights to attributes: assign a different weight to each attribute according to its importance in determining the similarity of two records. When computing record similarity, attributes of greater importance are assigned higher weights. For example, the weight of the name attribute is higher than that of the gender attribute, because the name better reflects the characteristics of a record. During the cleaning of duplicate records, the weights can be adjusted to find more duplicate records.
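The weighting idea in [0023] can be sketched as a weighted average of per-attribute similarities. This is a minimal illustration, not the patent's own procedure: the weight values, the `address` attribute, the 0.85 threshold, and the use of `difflib.SequenceMatcher` as the per-field similarity measure are all assumptions chosen for the example. The source only states that name should weigh more than gender.

```python
from difflib import SequenceMatcher

# Hypothetical weights: the source only says name outweighs gender.
WEIGHTS = {"name": 0.6, "address": 0.3, "gender": 0.1}


def field_similarity(a, b):
    """Similarity in [0, 1] between two values, compared as trimmed, lowercased strings."""
    left = str(a).strip().lower()
    right = str(b).strip().lower()
    return SequenceMatcher(None, left, right).ratio()


def record_similarity(r1, r2, weights=WEIGHTS):
    """Weighted average of per-attribute similarities over the selected attributes."""
    total = sum(weights.values())
    score = sum(
        w * field_similarity(r1.get(attr, ""), r2.get(attr, ""))
        for attr, w in weights.items()
    )
    return score / total


def is_duplicate(r1, r2, threshold=0.85, weights=WEIGHTS):
    """Flag a pair as a candidate duplicate when its weighted similarity reaches the threshold."""
    return record_similarity(r1, r2, weights) >= threshold
```

With these assumed weights, two records that differ only in gender still score about 0.9 and are flagged, while two records that differ only in name score about 0.4 and are not. Lowering the threshold or shifting weight between attributes is one way to "adjust the weight to find more duplicate records".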
[0024] Database-level duplicate r...
Embodiment 2
[0027] Take the cleaning of the association between business tables as an example:
[0028] 1) Select the main table and establish the association between the main table and the auxiliary table.
[0029] 2) Use an SQL statement to LEFT OUTER JOIN the main table to the auxiliary table, and find the records in the main table that cannot be associated with the auxiliary table.
[0030] 3) Analyze the records that cannot be associated. For genuinely dirty data, add a "default record" to the auxiliary table and associate all the unmatched records in the main table with that "default record".
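Steps 1) to 3) can be sketched with Python's built-in `sqlite3` module. The table names (`orders` as the main table, `customers` as the auxiliary table), the column names, the default record's id `0`, and its `'UNKNOWN'` label are hypothetical and chosen only for illustration; the source does not name any schema.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a main table (orders) and an auxiliary table (customers)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99), (13, NULL);
        """
    )
    return conn


def find_orphans(conn):
    """Step 2: main-table rows that the LEFT OUTER JOIN cannot match to any auxiliary row."""
    return conn.execute(
        """
        SELECT o.id, o.customer_id
        FROM orders AS o
        LEFT OUTER JOIN customers AS c ON o.customer_id = c.id
        WHERE c.id IS NULL
        ORDER BY o.id
        """
    ).fetchall()


def attach_to_default(conn, order_ids, default_id=0):
    """Step 3: create the default record if missing, then point the given orders at it."""
    conn.execute(
        "INSERT OR IGNORE INTO customers (id, name) VALUES (?, ?)",
        (default_id, "UNKNOWN"),
    )
    conn.executemany(
        "UPDATE orders SET customer_id = ? WHERE id = ?",
        [(default_id, order_id) for order_id in order_ids],
    )
    conn.commit()
```

A possible usage: call `find_orphans(conn)`, review the returned rows, and pass only the ids confirmed as dirty to `attach_to_default`. Taking an explicit id list, rather than reassigning every orphan automatically, keeps the manual analysis in step 3) as a separate decision.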