Excel data source cleaning method, system, electronic equipment and storage medium based on big data
Big data technology applied to the cleaning of EXCEL data sources. It addresses problems such as wasted labor cost and poor data quality and reliability, with the effect of improving accuracy and reducing workload.
Embodiment approach
[0050] According to an embodiment of the present invention, parsing and structuring the EXCEL data source includes:
[0051] Upload the EXCEL data source and specify the number of header rows in the list;
[0052] Distinguish the header area from the data area according to the number of header rows;
[0053] Automatically build the data model from the last header row and define the corresponding field names;
[0054] Establish the correspondence between fields and titles;
[0055] Store the data from the EXCEL data source into the database.
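As an illustration of steps [0053] and [0054], the following is a minimal sketch, not the patent's implementation. It assumes the sheet has already been read into memory as a list of rows of strings. The class name `FieldMapper` and the `COL_n` field naming scheme are hypothetical, since the source does not say how field names are generated.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: derives field names from the last header row. */
final class FieldMapper {
    private FieldMapper() {}

    /**
     * Maps generated field names (COL_1, COL_2, ...) to the titles found in
     * the last header row. The COL_n scheme is an assumption for illustration.
     */
    static Map<String, String> mapFieldsToTitles(List<List<String>> rows, int headerRowCount) {
        if (headerRowCount < 1 || headerRowCount > rows.size()) {
            throw new IllegalArgumentException("headerRowCount out of range: " + headerRowCount);
        }
        // The last header row is at zero-based index headerRowCount - 1.
        List<String> lastHeaderRow = rows.get(headerRowCount - 1);
        // LinkedHashMap keeps the fields in column order.
        Map<String, String> fieldToTitle = new LinkedHashMap<>();
        for (int i = 0; i < lastHeaderRow.size(); i++) {
            fieldToTitle.put("COL_" + (i + 1), lastHeaderRow.get(i).trim());
        }
        return fieldToTitle;
    }
}
```

A mapping of this kind is what would let the stored column values be traced back to the titles the user saw in the original list.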
[0056] Further, standardizing the key attribute names of the parsed and structured EXCEL data means matching the key field data in the EXCEL data source against the standard data.
[0057] Further, cleaning the standardized EXCEL data source includes:
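The matching in [0056] could be sketched as a lookup against a set of standard values. The source does not specify the matching rule, so the normalization used here (ignore whitespace and letter case) is an assumption, and the class `KeyFieldStandardizer` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical matcher between raw key field values and standard data. */
final class KeyFieldStandardizer {
    private final Map<String, String> standardByNormalizedForm = new HashMap<>();

    /** Registers a standard value, indexed under its normalized form. */
    void addStandard(String standardValue) {
        standardByNormalizedForm.put(normalize(standardValue), standardValue);
    }

    /** Returns the standard value matching the raw cell text, if any. */
    Optional<String> match(String rawValue) {
        if (rawValue == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(standardByNormalizedForm.get(normalize(rawValue)));
    }

    /** Assumed normalization: drop all whitespace and ignore letter case. */
    private static String normalize(String s) {
        return s.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }
}
```

Values that return an empty `Optional` are the ones that would need further handling, for example in the cleaning stage that follows.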
[0058] Preprocess the data in the EXCEL data source;
[0059] Build a knowledge base model, compare the data in the preprocessed EXCEL data ...
Embodiment 1
[0103] Input: an EXCEL list containing non-standard data, together with the specified number of header rows;
[0104] Output: an EXCEL list of standardized data;
[0105] Processing flow:
[0106] The title area and the data area are distinguished by titleNum, the number of EXCEL title rows: rows 1 through titleNum form the title area, and rows (titleNum+1) through the last row form the data area;
[0107] Use Java POI to parse the data in the title area and data area of the EXCEL list:
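The split in [0106] can be sketched on rows that are already in memory. Only `titleNum` comes from the source; the class `SheetRegions` and its fields are hypothetical. Because Java lists are zero-based, rows 1 through titleNum correspond to indices 0 through titleNum - 1.

```java
import java.util.List;

/** Hypothetical holder for the two regions of a parsed sheet. */
final class SheetRegions {
    final List<List<String>> titleArea;
    final List<List<String>> dataArea;

    private SheetRegions(List<List<String>> titleArea, List<List<String>> dataArea) {
        this.titleArea = titleArea;
        this.dataArea = dataArea;
    }

    /** Splits rows so that the first titleNum rows are titles and the rest are data. */
    static SheetRegions split(List<List<String>> rows, int titleNum) {
        if (titleNum < 0 || titleNum > rows.size()) {
            throw new IllegalArgumentException("titleNum out of range: " + titleNum);
        }
        // subList is end-exclusive: [0, titleNum) is the title area,
        // [titleNum, size) is the data area.
        return new SheetRegions(rows.subList(0, titleNum), rows.subList(titleNum, rows.size()));
    }
}
```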
[0108] Parse the suffix of the EXCEL file to determine whether it is "XLSX" or "XLS";
[0109] Create corresponding workbooks according to different suffixes;
[0110] Parse the first sheet in the workbook;
[0111] Loop to parse each row of data in the sheet;
[0112] Loop through each cell in each row;
[0113] Read the data in the cell and store the data in memory.
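Steps [0108] and [0109] amount to a decision on the file suffix. The sketch below shows only that decision and uses no POI classes, so that it stays dependency-free; `ExcelFormat` and `ExcelSuffix` are hypothetical names. In Apache POI, an `.xlsx` file is normally opened with `XSSFWorkbook` and an `.xls` file with `HSSFWorkbook`, which is presumably what "corresponding workbooks" refers to.

```java
import java.util.Locale;

/** The two workbook formats named in the text. */
enum ExcelFormat { XLSX, XLS }

/** Hypothetical helper that decides the format from the file suffix. */
final class ExcelSuffix {
    private ExcelSuffix() {}

    static ExcelFormat detect(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("File name has no suffix: " + fileName);
        }
        // Compare case-insensitively, since the text quotes the suffixes in upper case.
        String suffix = fileName.substring(dot + 1).toUpperCase(Locale.ROOT);
        switch (suffix) {
            case "XLSX":
                return ExcelFormat.XLSX; // would map to XSSFWorkbook in Apache POI
            case "XLS":
                return ExcelFormat.XLS;  // would map to HSSFWorkbook in Apache POI
            default:
                throw new IllegalArgumentException("Unsupported suffix: " + suffix);
        }
    }
}
```

Rejecting unknown suffixes early keeps the later row and cell loops from running on a file the parser cannot open.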
[0114] Use JDBC to store the read titles in the T_DATA_SOURCE_COLUMN table. Create the corresponding table structure acco...