Data preprocessing system supporting multiple file formats

A data preprocessing technology supporting multiple file formats, applied in the field of big data processing. It addresses the inability of existing schemes to process multiple source files, to sort dynamically, and to support new formats, and it achieves a rich choice of output media, reduced processing time and difficulty, and a high degree of intelligence.

Inactive Publication Date: 2021-10-26
KUNMING UNIV
Cites: 0; Cited by: 1

AI Technical Summary

Problems solved by technology

[0003] Existing data preprocessing solutions can only process source files in certain specific formats and are difficult to extend when source files in new formats appear. They cannot flexibly process multiple source files through configuration files, cannot sort dynamically, and can only write their results to a text file.

Examples

Embodiment Construction

[0026] The specific content of the present invention will be described in detail below through specific examples.

[0027] As shown in Figure 1, the data preprocessing system of the present invention, which supports multiple file formats, comprises a central processing module, a configuration file management module, a parsing mode processing module, a delimiter mode processing module, an EXCEL mode processing module, a dynamic link mode processing module, an arrangement processing module, a sorting processing module, a text file output module, a mysql output module, a kafka output module and a log management module.
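
One way to picture the module list above is as two lookup tables, one for processing modes and one for outputs, with a central function that picks a handler from each by name. The following is only a minimal sketch under that assumption: the registry names `PROCESSORS` and `OUTPUTS`, the `register` helper and the function signatures are hypothetical and do not come from the patent, and only the delimiter mode and the text output are stubbed in.

```python
from typing import Callable, Dict, List

Record = List[str]

# Hypothetical registries: mode name -> handler. The keys mirror module names
# from the patent text; the signatures are assumptions made for this sketch.
PROCESSORS: Dict[str, Callable[[str], List[Record]]] = {}
OUTPUTS: Dict[str, Callable[[List[Record]], str]] = {}


def register(table: Dict[str, Callable], name: str) -> Callable:
    """Return a decorator that stores a handler in `table` under `name`."""
    def decorator(fn: Callable) -> Callable:
        table[name] = fn
        return fn
    return decorator


@register(PROCESSORS, "delimiter")
def delimiter_mode(text: str) -> List[Record]:
    # Split each non-blank line on commas. A real module would presumably
    # read the delimiter from the channel configuration.
    return [line.split(",") for line in text.splitlines() if line.strip()]


@register(OUTPUTS, "text")
def text_output(records: List[Record]) -> str:
    # Stand-in for the text file output module: render tab-separated lines
    # instead of writing a file.
    return "\n".join("\t".join(record) for record in records)


def run(mode: str, output: str, text: str) -> str:
    """Central dispatch: look up a processing mode and an output by name."""
    records = PROCESSORS[mode](text)
    return OUTPUTS[output](records)
```

Under this layout, supporting a new source format or output medium would mean registering one more handler rather than changing the dispatch code, which is the kind of extensibility the text describes.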

[0028] The central processing module calls the configuration file management module with the channel number passed as a parameter, finds the configuration information for that channel number, and obtains the data source file directory to be processed, the processing method, the output method and so on. It then scans the data source file directory and, for files that meet the conditions, the corresponding proces...
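
The lookup-and-scan step described in this paragraph could look roughly like the sketch below. It assumes a configuration held in memory as a dictionary keyed by channel number and a file-name glob as the "condition"; the `ChannelConfig` fields, the `pattern` condition and the function names are all hypothetical, since the source text does not specify the configuration layout.

```python
import fnmatch
import os
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ChannelConfig:
    source_dir: str  # data source file directory to scan
    pattern: str     # hypothetical file condition, e.g. "*.csv"
    mode: str        # processing method, e.g. "delimiter"
    output: str      # output method, e.g. "text"


def find_config(configs: Dict[int, ChannelConfig], channel: int) -> ChannelConfig:
    """Look up the configuration for a channel number, failing loudly if absent."""
    try:
        return configs[channel]
    except KeyError:
        raise ValueError(f"no configuration for channel {channel}") from None


def scan(config: ChannelConfig) -> List[str]:
    """Return paths of regular files in source_dir whose names match pattern."""
    matches = []
    for name in sorted(os.listdir(config.source_dir)):
        path = os.path.join(config.source_dir, name)
        if os.path.isfile(path) and fnmatch.fnmatch(name, config.pattern):
            matches.append(path)
    return matches


def plan(configs: Dict[int, ChannelConfig], channel: int) -> List[Tuple[str, str, str]]:
    """List (file path, processing method, output method) for one channel."""
    config = find_config(configs, channel)
    return [(path, config.mode, config.output) for path in scan(config)]
```

Calling `plan(configs, 7)` would return one tuple per matching file in the directory configured for channel 7, ready to be handed to whichever processing and output modules the configuration names.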

Abstract

The invention discloses a data preprocessing system supporting multiple file formats. The system comprises a central processing module, a configuration file management module, a parsing mode processing module, a delimiter mode processing module, an EXCEL mode processing module, a dynamic link mode processing module, an arrangement processing module, a sorting processing module, a text file output module, a mysql output module, a kafka output module and a log management module. With this design, incomplete, inconsistent and irregular data encountered in data analysis and processing work can be arranged into regular data that meets system requirements, and after sorting, the specified content can be output to the required storage medium. This improves the quality of data analysis and processing and reduces the time and difficulty of the actual processing. The output meets the standard data requirements of downstream systems, many source data formats are supported, the choice of output media is rich, the degree of intelligence is high, and the system can be set up through configuration, so it is well suited to application and wider adoption.
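
The abstract's arrangement and sorting steps can be illustrated with a small sketch. The cleaning rules here (trim whitespace, drop blank rows, pad or truncate to a fixed width) and the sort specification as `(column, descending)` pairs are assumptions chosen for illustration; the patent text shown here does not say which rules the arrangement and sorting modules actually apply.

```python
from typing import List, Sequence, Tuple

Record = List[str]


def arrange(rows: Sequence[Sequence[str]], width: int, fill: str = "") -> List[Record]:
    """Trim fields, drop fully blank rows, and pad or truncate rows to `width`."""
    cleaned = []
    for row in rows:
        fields = [field.strip() for field in row]
        if not any(fields):
            continue  # skip rows with no content at all
        cleaned.append((fields + [fill] * width)[:width])
    return cleaned


def sort_records(rows: List[Record], keys: Sequence[Tuple[int, bool]]) -> List[Record]:
    """Sort by (column, descending) pairs; the first pair has highest priority.

    Fields are compared as strings, so "10" sorts before "9".
    """
    result = list(rows)
    # list.sort is stable, so sorting by the lowest-priority key first and the
    # highest-priority key last produces a multi-key ordering.
    for column, descending in reversed(keys):
        result.sort(key=lambda record: record[column], reverse=descending)
    return result
```

Because the sort keys are plain data, they could be read from a configuration file at run time, which is one plausible reading of the "dynamic sorting" the document mentions.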

Description

Technical field

[0001] The invention belongs to the technical field of big data processing, and relates to a data preprocessing system supporting multiple file formats.

Background technique

[0002] Big data processing can be summarized in four steps: collection, import and preprocessing, statistics and analysis, and mining. The import and preprocessing step must solve the data processing problems caused by large volumes of data and by data in different formats: the source data needs to be processed into relatively regular data. Data in different formats may call for different processing methods, but whatever the data, some steps and methods are common to the whole data processing process. A data file preprocessing system can carry out these general steps, reducing data processing time and simplifying a cumbersome process.

[0003] Existing data preprocessing sch...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/11, G06F16/14, G06F40/151
CPC: G06F16/116, G06F16/148, G06F40/151
Inventors: 李冬萍, 杨迎春
Owner: KUNMING UNIV