Data cleaning method and data cleaning system
A data cleaning technology, applied in the communication field, that addresses the high threshold of use, lowering the technical threshold and improving the user experience.
Examples
Embodiment 1
[0042] This embodiment provides a data cleaning method which, as shown in Figure 1, includes:
[0043] Step S10: Select a data source to be cleaned from heterogeneous data sources through a graphical interface. The heterogeneous data sources include text files and database data.
[0044] In this step, the heterogeneous data source is the HDFS file system, an open source distributed file system for big data. The data sources to be cleaned must first be imported into the HDFS file system and are then cleaned and processed. A text file is an ordinary user data file, which can be uploaded directly to the HDFS file system through the graphical interface. The database data is the user's relational database; its data must be extracted by the Sqoop component and then saved in the HDFS file system. The Sqoop component is an open source big data tool used for data extraction and data conversion betw...
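The two staging routes described in this step (direct upload for text files, Sqoop extraction for relational data) can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the function names, the `source_kind` labels and the example paths are hypothetical, and the commands are built as argument lists rather than executed. It assumes the standard `hdfs dfs -put` shell command and the `sqoop import` tool with its `--connect`, `--table` and `--target-dir` options.

```python
from typing import List


def hdfs_upload_command(local_path: str, hdfs_dir: str) -> List[str]:
    """Build the argv that copies a local text file into HDFS."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]


def sqoop_import_command(jdbc_url: str, table: str, hdfs_dir: str) -> List[str]:
    """Build the argv that extracts one relational table into HDFS via Sqoop."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", hdfs_dir,
    ]


def staging_command(source_kind: str, **options: str) -> List[str]:
    """Pick the staging route for a data source (the kinds are hypothetical labels)."""
    if source_kind == "text_file":
        return hdfs_upload_command(options["local_path"], options["hdfs_dir"])
    if source_kind == "database":
        return sqoop_import_command(
            options["jdbc_url"], options["table"], options["hdfs_dir"]
        )
    raise ValueError(f"unsupported source kind: {source_kind}")
```

A caller could pass the returned list to `subprocess.run` on a host where the Hadoop and Sqoop clients are installed; connection credentials are deliberately left out of this sketch.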
Embodiment 2
[0051] This embodiment provides a data cleaning method which, as shown in Figure 2, includes:
[0052] Step S10: Select a data source to be cleaned from heterogeneous data sources through a graphical interface. The heterogeneous data sources include text files and database data.
[0053] Step S11: Edit data cleaning rules through a graphical interface.
[0054] This step specifically includes: Step S110: Select the file to be cleaned from the selected data sources to be cleaned through a graphical interface.
[0055] Step S111: Designate the fields to be cleaned in the file to be cleaned through the graphical interface.
[0056] Step S112: Configure cleaning rules for the field to be cleaned through the graphical interface.
[0057] For example, steps S110 to S112 may be carried out as follows. Select the file user to be cleaned, for example:
[0058]
[0059] Choose to clean the 2nd, 3rd and 4th fields in the file user, and specify cleaning rules for each field. I...
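Per-field rule configuration of this kind can be sketched as a mapping from field position to a cleaning function. The sketch below is illustrative only: the source does not show the file layout or the actual rules, so the comma delimiter, the three rule functions and the name `USER_RULES` are all assumptions made for the example.

```python
from typing import Callable, Dict

# A rule takes the raw field text and returns the cleaned text.
Rule = Callable[[str], str]


def strip_whitespace(value: str) -> str:
    """Hypothetical rule: remove leading and trailing whitespace."""
    return value.strip()


def to_lowercase(value: str) -> str:
    """Hypothetical rule: normalize letter case."""
    return value.lower()


def digits_only(value: str) -> str:
    """Hypothetical rule: keep only the digit characters."""
    return "".join(ch for ch in value if ch.isdigit())


def clean_line(line: str, rules: Dict[int, Rule], delimiter: str = ",") -> str:
    """Apply each rule to its 1-based field position; other fields pass through."""
    fields = line.split(delimiter)
    for position, rule in rules.items():
        index = position - 1
        if 0 <= index < len(fields):
            fields[index] = rule(fields[index])
    return delimiter.join(fields)


# Rules for the 2nd, 3rd and 4th fields of the file "user" (assumed rules).
USER_RULES: Dict[int, Rule] = {
    2: strip_whitespace,
    3: to_lowercase,
    4: digits_only,
}
```

With these assumed rules, `clean_line("1,  Alice ,ALICE@EXAMPLE.COM,tel: 555-0100", USER_RULES)` returns `"1,Alice,alice@example.com,5550100"`; the first field is untouched because no rule is configured for it.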
Embodiment 3
[0077] This embodiment provides a data cleaning system which, as shown in Figure 3, includes: a selection module 1, used to select a data source to be cleaned from heterogeneous data sources through a graphical interface, where the heterogeneous data sources include text files and database data; an editing module 2, used to edit data cleaning rules through a graphical interface; and a cleaning module 3, used to perform data cleaning through a graphical interface.
[0078] The heterogeneous data source is the HDFS file system, an open source distributed file system for big data. The data source to be cleaned must first be imported into the HDFS file system and is then cleaned and processed. In addition, the data cleaning system in this embodiment further includes a data source management module 8 for managing data input by users into heterogeneous data sources. For example, a text file is a user's common data file, which can be directly ...
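The division of work among selection module 1, editing module 2 and cleaning module 3 can be sketched as three small classes. This is a rough illustration under stated assumptions, not the system described in the patent: the class and method names are hypothetical, the graphical interface and HDFS storage are replaced by in-memory data, and a comma delimiter is assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SelectionModule:
    """Module 1: selects one data source from those already staged."""
    sources: Dict[str, List[str]]  # source name -> lines (stands in for HDFS)

    def select(self, name: str) -> List[str]:
        return self.sources[name]


@dataclass
class EditingModule:
    """Module 2: records the cleaning rule chosen for each field position."""
    rules: Dict[int, Callable[[str], str]] = field(default_factory=dict)

    def set_rule(self, position: int, rule: Callable[[str], str]) -> None:
        self.rules[position] = rule  # position is 1-based


@dataclass
class CleaningModule:
    """Module 3: applies the recorded rules to every line of the source."""
    delimiter: str = ","  # assumed field separator

    def run(self, lines: List[str], rules: Dict[int, Callable[[str], str]]) -> List[str]:
        cleaned = []
        for line in lines:
            fields = line.split(self.delimiter)
            for position, rule in rules.items():
                if 1 <= position <= len(fields):
                    fields[position - 1] = rule(fields[position - 1])
            cleaned.append(self.delimiter.join(fields))
        return cleaned
```

A typical flow would select a source with `SelectionModule.select`, register rules with `EditingModule.set_rule`, and then pass both to `CleaningModule.run`.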



