
Data cleaning method and data cleaning system

A data cleaning technology, applied in the communications field, that addresses the high barrier to using existing cleaning tools, lowering the technical threshold and improving the user experience.

Inactive Publication Date: 2018-08-03
CHINA UNITED NETWORK COMM GRP CO LTD

AI Technical Summary

Problems solved by technology

Existing cleaning methods require users to master multiple cleaning tools and to have strong development skills with those tools, so the barrier to use is relatively high.



Examples


Embodiment 1

[0042] This embodiment provides a data cleaning method which, as shown in Figure 1, includes:

[0043] Step S10: Select a data source to be cleaned from heterogeneous data sources through a graphical interface. The heterogeneous data sources include text files and database data.

[0044] In this step, the heterogeneous data source is the HDFS file system, an open-source distributed file system for big data. A data source to be cleaned must first be imported into the HDFS file system before it can be cleaned and processed. A text file is an ordinary user data file, which can be uploaded directly to the HDFS file system through the graphical interface. Database data is data in the user's relational database; it must be extracted by the Sqoop component and then saved in the HDFS file system. The Sqoop component is an open-source big data tool used for data extraction and data conversion betw...
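The two ingestion paths described in paragraph [0044] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `ingest_command`, its parameters, and the example paths and JDBC URL are all hypothetical. It only builds the command line (using the standard `hdfs dfs -put` and `sqoop import` invocations) and does not execute anything.

```python
from typing import List


def ingest_command(source_type: str, source: str, hdfs_dir: str,
                   table: str = "") -> List[str]:
    """Build (but do not run) the command that loads one source into HDFS.

    source_type: "text" for an ordinary file, "database" for a relational table.
    source:      local file path (text) or JDBC URL (database).
    All names here are assumptions made for illustration.
    """
    if source_type == "text":
        # Ordinary data files are copied straight into HDFS.
        return ["hdfs", "dfs", "-put", source, hdfs_dir]
    if source_type == "database":
        # Relational data is extracted by Sqoop and written to HDFS.
        if not table:
            raise ValueError("a table name is required for database sources")
        return ["sqoop", "import", "--connect", source,
                "--table", table, "--target-dir", hdfs_dir]
    raise ValueError(f"unsupported source type: {source_type}")


# Hypothetical examples of the two source kinds.
text_cmd = ingest_command("text", "user.txt", "/data/raw")
db_cmd = ingest_command("database", "jdbc:mysql://dbhost/crm",
                        "/data/raw/user", table="user")
```

A graphical front end would presumably collect these values from a form and hand the resulting command to a job runner; that part is outside the scope of this sketch.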

Embodiment 2

[0051] This embodiment provides a data cleaning method which, as shown in Figure 2, includes:

[0052] Step S10: Select a data source to be cleaned from heterogeneous data sources through a graphical interface. The heterogeneous data sources include text files and database data.

[0053] Step S11: Edit data cleaning rules through a graphical interface.

[0054] This step specifically includes: Step S110: Select the file to be cleaned from the selected data source through the graphical interface.

[0055] Step S111: Designate the fields to be cleaned in the file to be cleaned through the graphical interface.

[0056] Step S112: Configure cleaning rules for the fields to be cleaned through the graphical interface.

[0057] As an example of steps S110 to S112, select the file user for cleaning, such as:

[0058]

[0059] Choose to clean the 2nd, 3rd and 4th fields in the file user, and specify cleaning rules for each field. I...
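Steps S110 to S112 amount to attaching one rule to each selected field position of a delimited file. The sketch below illustrates that idea under stated assumptions: the source text is truncated before it lists the actual rules, so the three rules shown (trim whitespace, uppercase, fill a default) are hypothetical, as are the comma delimiter, the sample record, and the names `clean_line` and `user_rules`.

```python
from typing import Callable, Dict

Rule = Callable[[str], str]


def clean_line(line: str, rules: Dict[int, Rule], sep: str = ",") -> str:
    """Apply per-field rules to one delimited record.

    `rules` maps a 1-based field position to the function that cleans it.
    Positions beyond the end of the record are ignored.
    """
    fields = line.split(sep)
    for position, rule in rules.items():
        index = position - 1  # convert 1-based position to list index
        if 0 <= index < len(fields):
            fields[index] = rule(fields[index])
    return sep.join(fields)


# Hypothetical rules for the 2nd, 3rd and 4th fields of the file "user".
user_rules: Dict[int, Rule] = {
    2: str.strip,                        # remove surrounding whitespace
    3: str.upper,                        # normalize case
    4: lambda v: v if v else "unknown",  # fill empty values with a default
}

cleaned = clean_line("1001,  alice ,cn,", user_rules)
print(cleaned)  # 1001,alice,CN,unknown
```

Keeping each rule as a plain function keyed by field position mirrors what a form-based rule editor would produce: a field selection plus one configured rule per field.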

Embodiment 3

[0077] This embodiment provides a data cleaning system which, as shown in Figure 3, includes: a selection module 1, used to select a data source to be cleaned from heterogeneous data sources through a graphical interface, where the heterogeneous data sources include text files and database data; an editing module 2, used to edit data cleaning rules through the graphical interface; and a cleaning module 3, used to perform data cleaning through the graphical interface.

[0078] The heterogeneous data source is the HDFS file system, an open-source distributed file system for big data. A data source to be cleaned must first be imported into the HDFS file system before it can be cleaned and processed. In addition, the data cleaning system in this embodiment further includes a data source management module 8 for managing the data that users import into the heterogeneous data sources. For example, a text file is a user's common data file, which can be directly ...
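The division of work among selection module 1, editing module 2, and cleaning module 3 can be sketched as three small classes that share one job description. Everything below is a hypothetical illustration: the class and method names are invented, an in-memory dictionary stands in for the HDFS file system, and no graphical interface is modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Rows = List[List[str]]


@dataclass
class CleaningJob:
    """State passed between the modules: chosen source plus field rules."""
    source: str = ""
    rules: Dict[int, Callable[[str], str]] = field(default_factory=dict)


class SelectionModule:
    """Module 1: pick the data source to be cleaned."""

    def __init__(self, sources: Dict[str, Rows]) -> None:
        self.sources = sources

    def select(self, job: CleaningJob, name: str) -> None:
        if name not in self.sources:
            raise KeyError(f"unknown data source: {name}")
        job.source = name


class EditingModule:
    """Module 2: attach a cleaning rule to a 1-based field position."""

    def set_rule(self, job: CleaningJob, position: int,
                 rule: Callable[[str], str]) -> None:
        job.rules[position] = rule


class CleaningModule:
    """Module 3: apply the configured rules to every row of the source."""

    def __init__(self, sources: Dict[str, Rows]) -> None:
        self.sources = sources

    def run(self, job: CleaningJob) -> Rows:
        cleaned: Rows = []
        for row in self.sources[job.source]:
            new_row = list(row)  # leave the stored rows untouched
            for position, rule in job.rules.items():
                if 1 <= position <= len(new_row):
                    new_row[position - 1] = rule(new_row[position - 1])
            cleaned.append(new_row)
        return cleaned


# Stand-in for data already imported into the file system.
store: Dict[str, Rows] = {"user": [["1", " bob "], ["2", "eve  "]]}

job = CleaningJob()
SelectionModule(store).select(job, "user")
EditingModule().set_rule(job, 2, str.strip)
result = CleaningModule(store).run(job)
```

Separating the job description from the modules keeps each module replaceable, which matches the way the embodiment lists them as independent components.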


Abstract

The invention provides a data cleaning method and a data cleaning system. The data cleaning method comprises the following steps: S10: selecting, through a graphical interface, a data source to be cleaned from heterogeneous data sources, wherein the heterogeneous data sources comprise text files and database data; S11: editing data cleaning rules through the graphical interface; and S12: executing data cleaning through the graphical interface. Because the data source to be cleaned is selected from heterogeneous data sources through the graphical interface, different data sources can be cleaned together. At the same time, the user can clean data through simple operations on the graphical interface without having to master the development and use of data cleaning tools, which lowers the technical threshold of big data application services and improves the user's experience of big data services.

Description

Technical field

[0001] The present invention relates to the technical field of communications, and in particular to a data cleaning method and a data cleaning system.

Background

[0002] In recent years, the development of big data technology has provided new analysis techniques for massive volumes of existing logs, online records, historical data, and the like. Analyzing this data can uncover valuable information that would not normally be found; in data analysis, quantitative change brings qualitative change. Big data technology can support an enterprise's internal business analysis and enable new applications externally, bringing more and better services to users.

[0003] To provide big data services, the first step is to collect scattered data, clean it, and put the cleaned data into storage. This process is also called ETL, and involves three steps: Extract (data extraction), Transform (data conversion), and...


Application Information

IPC(8): G06F17/30, G06F9/451
CPC: G06F9/451, G06F16/215
Inventor 博格利贾子翔龙岳蒋成郭佳睿
Owner CHINA UNITED NETWORK COMM GRP CO LTD