Excel data source cleaning method, system, electronic equipment and storage medium based on big data

A big-data-based technique for EXCEL data sources, applied in the field of data cleaning. It addresses wasted labor costs and poor data quality and reliability, improving accuracy and reducing user workload.

Active Publication Date: 2021-08-03
航天神舟智慧系统技术有限公司

AI Technical Summary

Problems solved by technology

[0004] Industries mainly store their data in EXCEL files and various databases, and the storage structures vary widely. Cleaning such data requires manually sorting out data of many structures and types, which wastes labor costs.
[0005] The quality and reliability of most data stored in EXCEL are very poor.

Examples

Embodiment approach

[0050] According to an embodiment of the present invention, parsing and structuring the EXCEL data source includes:

[0051] Upload the EXCEL data source and specify the number of header rows in the list;

[0052] Distinguish the header row and data area according to the number of header rows;

[0053] Automatically build the data model from the last header row and define the corresponding field names;

[0054] Establish the correspondence between fields and header titles;

[0055] Store the data from the EXCEL data source into the database.
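
The modeling steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name HeaderModel, the method buildFieldMap, and the COL_n field naming scheme are all hypothetical, and rows are assumed to have already been read into memory as lists of strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: derives a field-to-title mapping from the last header row. */
class HeaderModel {

    /**
     * Returns an ordered map from a generated field name (COL_1, COL_2, ...)
     * to the title text found in the last header row.
     * The COL_n naming scheme is an assumption, not taken from the source.
     */
    static Map<String, String> buildFieldMap(List<List<String>> rows, int headerRowCount) {
        if (headerRowCount < 1 || headerRowCount > rows.size()) {
            throw new IllegalArgumentException("headerRowCount out of range: " + headerRowCount);
        }
        // Header rows are counted from 1, list indices from 0.
        List<String> lastHeaderRow = rows.get(headerRowCount - 1);
        Map<String, String> fieldToTitle = new LinkedHashMap<>();
        for (int i = 0; i < lastHeaderRow.size(); i++) {
            String raw = lastHeaderRow.get(i);
            String title = (raw == null) ? "" : raw.trim();
            fieldToTitle.put("COL_" + (i + 1), title);
        }
        return fieldToTitle;
    }
}
```

A LinkedHashMap keeps the fields in column order, which makes it straightforward to reuse the same mapping later when writing data rows to a database table.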

[0056] Further, standardizing the key attribute names of the parsed and structured data means matching the key field data in the EXCEL data source against the standard data.

[0057] Further, cleaning the standardized EXCEL data source includes:

[0058] Preprocess the data in the EXCEL data source;

[0059] Build a knowledge base model, compare the data in the preprocessed EXCEL data ...

Embodiment 1

[0103] Input: an EXCEL list containing non-standard data, together with the specified number of header rows;

[0104] Output: an EXCEL list of standardized data;

[0105] Processing flow:

[0106] The header and the data are distinguished according to titleNum, the number of header rows in the EXCEL list. Rows 1 through titleNum form the header area, and rows (titleNum+1) through the last row form the data area;
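
The split described in [0106] can be sketched in a few lines. The names SheetSplitter, Areas and split are hypothetical, and the rows are assumed to be already loaded as lists of strings; only titleNum comes from the source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that separates header rows from data rows. */
class SheetSplitter {

    /** Simple holder for the two areas of a sheet. */
    static final class Areas {
        final List<List<String>> header;
        final List<List<String>> data;

        Areas(List<List<String>> header, List<List<String>> data) {
            this.header = header;
            this.data = data;
        }
    }

    /** Rows 1..titleNum become the header area; the remaining rows become the data area. */
    static Areas split(List<List<String>> rows, int titleNum) {
        if (titleNum < 0 || titleNum > rows.size()) {
            throw new IllegalArgumentException("titleNum out of range: " + titleNum);
        }
        // subList is 0-based and end-exclusive, so 1-based rows 1..titleNum map to [0, titleNum).
        List<List<String>> header = new ArrayList<>(rows.subList(0, titleNum));
        List<List<String>> data = new ArrayList<>(rows.subList(titleNum, rows.size()));
        return new Areas(header, data);
    }
}
```

Copying the sublists into new ArrayList instances keeps the two areas independent of later changes to the original row list.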

[0107] Use JAVA POI technology to parse the data in the header area and data area of the EXCEL list:

[0108] Parse the suffix of the EXCEL file to determine whether it is "XLSX" or "XLS";

[0109] Create corresponding workbooks according to different suffixes;

[0110] Parse the first sheet in the workbook;

[0111] Loop to parse each row of data in the sheet;

[0112] Loop through each cell in each row;

[0113] Read the data in the cell and store the data in memory.
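
The suffix check in [0108] and [0109] might look like the sketch below. It uses only the standard library so it runs without Apache POI on the classpath. The names ExcelSuffix, Format and detect are hypothetical. In Apache POI itself, the legacy XLS format is typically opened with HSSFWorkbook and the XLSX format with XSSFWorkbook, so the returned value would decide which of the two to construct.

```java
import java.util.Locale;

/** Hypothetical helper that classifies an EXCEL file by its suffix. */
class ExcelSuffix {

    enum Format { XLS, XLSX }

    /** Returns the workbook format implied by the file name, ignoring letter case. */
    static Format detect(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("No file suffix: " + fileName);
        }
        // Locale.ROOT avoids locale-specific case mapping surprises.
        String suffix = fileName.substring(dot + 1).toUpperCase(Locale.ROOT);
        switch (suffix) {
            case "XLS":
                return Format.XLS;
            case "XLSX":
                return Format.XLSX;
            default:
                throw new IllegalArgumentException("Unsupported suffix: " + suffix);
        }
    }
}
```

Rejecting unknown suffixes early gives a clearer error than letting the parser fail partway through a file it cannot read.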

[0114] Use JDBC to store the header titles that were read in the T_DATA_SOURCE_COLUMN table. Create the corresponding table structure acco...

Abstract

The invention relates to a method, system, electronic device and storage medium for cleaning an EXCEL data source based on big data. The method includes: parsing and structuring the EXCEL data source; standardizing the key attribute names of the data in the parsed and structured EXCEL data source; cleaning the standardized EXCEL data source; and performing standard matching on the cleaned EXCEL data source against the standard database and completing the data information. According to the technical solution of the present invention, the accuracy of data processing can be effectively improved, the workload of users can be reduced, and reliable data can be provided for subsequent big data analysis and use.

Description

Technical field

[0001] The present invention relates to the technical field of data cleaning, in particular to a method, system, electronic equipment and storage medium for cleaning an EXCEL data source based on big data.

Background technique

[0002] The construction of smart cities requires the support of big data technology. The big data field currently focuses mainly on data mining, analysis and use, while data standardization and accuracy are left to users, which imposes a huge workload on them. Moreover, even after users spend a lot of time and energy, the accuracy of manually sorted data is not necessarily high.

[0003] All walks of life hold large amounts of data of different types, and these data have various problems that seriously hinder their accurate use. To remove these obstacles, the data must be cleaned to obtain accurate, high-quality data.

[0004] The data storage methods of various industri...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/215, G06F16/28
CPC: G06F16/215, G06F16/284
Inventor: 孙东祥, 常卫涛, 张坤, 郑媛媛, 王茹
Owner: 航天神舟智慧系统技术有限公司