Python script-based distributed big data cleaning method
A distributed data cleaning technique, applied in the field of data cleaning, that addresses problems such as too few cleaning rules, insufficient computing power for cleaning, and mediocre cleaning results, thereby improving cleaning accuracy and overcoming insufficient cleaning capability.
Embodiment Construction
[0028] The embodiments of the present invention are described in detail below. This embodiment is implemented on the basis of the technical solution of the present invention, and a detailed implementation and specific operating procedures are given, but the protection scope of the present invention is not limited to the following embodiment.
[0029] As shown in Figure 1, this embodiment provides a technical solution: a Python script-based distributed big data cleaning method, comprising the following steps:
[0030] Step 1: Load the data to be cleaned, then shard the loaded data;
[0031] Step 2: Schedule and execute the cleaning of the sharded data in a distributed manner;
[0032] Step 3: Request the data to be cleaned and backfill the cleaning results;
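The three steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (shard, clean_record, clean_shard, run_pipeline), the email-normalizing cleaning rule, and the shard size are all hypothetical, and a local thread pool stands in for whatever distributed scheduler the method actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(records, shard_size):
    """Step 1: split the loaded records into fixed-size shards."""
    return [records[i:i + shard_size] for i in range(0, len(records), shard_size)]


def clean_record(record):
    """Hypothetical cleaning rule: trim whitespace and lowercase the email field."""
    cleaned = dict(record)
    cleaned["email"] = record["email"].strip().lower()
    return cleaned


def clean_shard(indexed_shard):
    """Step 2 worker: clean one shard and keep its index for backfilling."""
    index, rows = indexed_shard
    return index, [clean_record(row) for row in rows]


def run_pipeline(records, shard_size=2, workers=4):
    """Shard the records, clean the shards concurrently, and backfill in order."""
    shards = shard(records, shard_size)
    results = [None] * len(shards)
    # Assumption: a local thread pool stands in for the distributed scheduler.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for index, cleaned in pool.map(clean_shard, enumerate(shards)):
            # Step 3: backfill each shard's result into its original slot.
            results[index] = cleaned
    # Flatten the per-shard results back into one list of records.
    return [row for shard_rows in results for row in shard_rows]


if __name__ == "__main__":
    sample = [
        {"id": 1, "email": "  A@X.COM "},
        {"id": 2, "email": "b@y.org"},
        {"id": 3, "email": " C@Z.net"},
    ]
    for row in run_pipeline(sample):
        print(row)
```

Carrying the shard index alongside each shard lets the results be written back to their original positions regardless of which worker finishes first, which is one plausible reading of the "backfill" in Step 3.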
[0033] Step 1 specifically comprises the following sub-steps:
[0034] S1: Data loading, first load the data that needs to be cl...