Data cleaning method and device
A data cleaning technology for target data, applied in the information field, addresses problems such as dirty data that cannot be cleaned and poor cleaning results. It improves the cleaning effect and reduces the probability of misidentifying data as dirty data.
Examples
Embodiment 1
[0023] Figure 1 is a schematic flow diagram of a data cleaning method provided in Embodiment 1 of the present invention. As shown in Figure 1, the method includes:
[0024] Step 101. Match cleaning rules according to the data characteristics of the target data.
[0025] Here, the data characteristics are used to describe the target data.
[0026] Specifically, data-related information may be obtained from the requesting end that requests cleaning of the target data, for example: the original business that generates the target data, the target business in which the target data will be used, the original computing task that generates the target data in the original business, and/or the target computing task that will use the target data in the target business.
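Step 101 can be illustrated with a minimal Python sketch, under stated assumptions. The names `DataCharacteristics`, `RuleEntry`, and `match_rules`, and the idea of treating an unset field as a wildcard, are hypothetical choices made for illustration; the source does not specify how matching is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
Rule = Callable[[Record], bool]  # returns True when the record passes the rule

FIELDS = ("original_business", "target_business", "original_task", "target_task")


@dataclass(frozen=True)
class DataCharacteristics:
    """Hypothetical container for the data-related information in [0026]."""
    original_business: Optional[str] = None
    target_business: Optional[str] = None
    original_task: Optional[str] = None
    target_task: Optional[str] = None


@dataclass
class RuleEntry:
    """A cleaning rule plus the characteristics it applies to.

    Assumption: a field left as None in `applies_to` matches any value.
    """
    name: str
    rule: Rule
    applies_to: DataCharacteristics


def match_rules(features: DataCharacteristics,
                registry: List[RuleEntry]) -> List[RuleEntry]:
    """Return the registry entries whose constraints agree with `features`."""
    matched = []
    for entry in registry:
        wanted = entry.applies_to
        if all(
            getattr(wanted, f) is None or getattr(wanted, f) == getattr(features, f)
            for f in FIELDS
        ):
            matched.append(entry)
    return matched


# Example registry with made-up business names.
registry = [
    RuleEntry("non_empty_id", lambda r: r.get("id") is not None,
              DataCharacteristics()),  # applies to every business
    RuleEntry("positive_amount", lambda r: r.get("amount", 0) > 0,
              DataCharacteristics(target_business="billing")),
]

billing = DataCharacteristics(original_business="orders", target_business="billing")
search = DataCharacteristics(original_business="orders", target_business="search")
billing_rules = [e.name for e in match_rules(billing, registry)]
search_rules = [e.name for e in match_rules(search, registry)]
```

In this sketch the same target data would be cleaned with a different rule set depending on which target business requests it, which is the point of matching rules to data characteristics before cleaning.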
[0027] The original business that generates the target data, the target business that the target data needs to be used for, the original computing task that generates the target data in the origi...
Embodiment 2
[0036] Figure 2 is a schematic flow diagram of a data cleaning method provided in Embodiment 2 of the present invention. As shown in Figure 2, the method includes:
[0037] Step 201, configure cleaning rules.
[0038] Specifically, the cleaning rules can be configured in advance, either manually by the user or automatically by the data cleaning platform based on existing cleaning rules.
[0039] As a possible implementation form, the cleaning rule includes three levels: first-level cleaning sub-rules, second-level cleaning sub-rules, and third-level cleaning sub-rules. The three levels are described below:
[0040] A. The first-level cleaning sub-rules consist of rules common to all businesses, and are mainly used to identify dirty data that is incomplete, duplicated, or obviously wrong.
[0041] For example, the first-level cleaning sub-rules may include: a field in the data cannot be empty, the data has been complete...
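As a rough illustration of first-level cleaning sub-rules, the following Python sketch flags records that are incomplete (a required field is empty) or duplicated. The function names, the `required_fields` and `key_fields` parameters, and the choice to keep the first occurrence of a duplicate are assumptions for illustration only. The second-level and third-level sub-rules are not sketched because their details are not given in this excerpt.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]


def is_incomplete(record: Record, required_fields: Sequence[str]) -> bool:
    """True when any required field is missing, None, or an empty string."""
    return any(record.get(f) in (None, "") for f in required_fields)


def first_level_clean(
    records: List[Record],
    required_fields: Sequence[str],
    key_fields: Sequence[str],
) -> Tuple[List[Record], List[Record]]:
    """Split records into (clean, dirty) using two common rules.

    Assumptions: values of `key_fields` are hashable, and the first
    occurrence of a duplicated key is kept as clean.
    """
    clean: List[Record] = []
    dirty: List[Record] = []
    seen = set()
    for record in records:
        key = tuple(record.get(f) for f in key_fields)
        if is_incomplete(record, required_fields) or key in seen:
            dirty.append(record)
        else:
            seen.add(key)
            clean.append(record)
    return clean, dirty


records = [
    {"id": 1, "name": "alice"},
    {"id": 1, "name": "alice"},  # duplicate of the first record
    {"id": 2, "name": ""},       # required field is empty
    {"id": 3, "name": "bob"},
]
clean, dirty = first_level_clean(records, required_fields=["id", "name"], key_fields=["id"])
```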
Embodiment 3
[0063] Figure 3 is a schematic structural diagram of a data cleaning device provided in Embodiment 3 of the present invention. As shown in Figure 3, the device includes a matching module 31 and a cleaning module 32.
[0064] The matching module 31 is configured to match the cleaning rules according to the data characteristics of the target data.
[0065] The cleaning module 32 is configured to clean the target data using the matched cleaning rules.
[0066] In this embodiment, cleaning rules are first matched according to the data characteristics of the target data, and the target data is then cleaned using the matched rules. Because the cleaning rules fit the data characteristics, the cleaning is more targeted, more dirty data is effectively cleaned out, and the cleaning effect is improved.
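The two-module structure can be sketched in Python as follows. This is only an illustrative composition: the class names, the dictionary keyed by a characteristic string, and the policy of dropping any record that fails a rule are assumptions, not details given in the source.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Rule = Callable[[Record], bool]  # returns True when the record passes the rule


class MatchingModule:
    """Counterpart of matching module 31: selects rules by data characteristics."""

    def __init__(self, rules_by_characteristic: Dict[str, List[Rule]]):
        self._rules = rules_by_characteristic

    def match(self, characteristics: List[str]) -> List[Rule]:
        matched: List[Rule] = []
        for characteristic in characteristics:
            matched.extend(self._rules.get(characteristic, []))
        return matched


class CleaningModule:
    """Counterpart of cleaning module 32: applies the matched rules."""

    def clean(self, records: List[Record], rules: List[Rule]) -> List[Record]:
        # Assumption: a record is kept only if it passes every matched rule.
        return [r for r in records if all(rule(r) for rule in rules)]


class DataCleaningDevice:
    """Wires the two modules together: match first, then clean."""

    def __init__(self, matching: MatchingModule, cleaning: CleaningModule):
        self.matching = matching
        self.cleaning = cleaning

    def run(self, records: List[Record], characteristics: List[str]) -> List[Record]:
        rules = self.matching.match(characteristics)
        return self.cleaning.clean(records, rules)


device = DataCleaningDevice(
    MatchingModule({"billing": [lambda r: r.get("amount") is not None]}),
    CleaningModule(),
)
result = device.run([{"amount": 5}, {"amount": None}], ["billing"])
unmatched = device.run([{"amount": None}], ["search"])
```

Keeping matching and cleaning as separate objects mirrors the device description and lets either module be replaced without touching the other.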