Mass data system with data cleaning function

A mass data and data cleaning technology, applied in the field of data systems, that addresses problems such as missing data fields, inaccurate source data, and the difficulty of handling big data, and that resolves missing values.

Inactive Publication Date: 2017-01-25
CHENGDU CALABAR INFORMATION TECH
7 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

Traditional centralized processing methods designed for structured data struggle with the problems that big data raises. Given the three characteristics of big data (massiveness, distribution, and heterogeneity), the integration and cleaning of big data becomes particularly...



Embodiment

[0020] As shown in Figure 1, the mass data system with data cleaning function includes:

[0021] Data collection module: collects data from the various data sources into the data processing center and performs preliminary processing on the collected data, namely format inspection and standardization;
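The preliminary processing performed by the data collection module (format inspection followed by standardization) can be illustrated with a minimal sketch. The field names (`station_id`, `timestamp`, `value`), the input timestamp layout, and the function names below are assumptions chosen for illustration; the source does not specify a record schema.

```python
from datetime import datetime

# Hypothetical required fields; the patent text does not define a schema.
REQUIRED_FIELDS = ("station_id", "timestamp", "value")


def normalize_keys(record: dict) -> dict:
    """Lower-case and trim field names; trim string values."""
    return {
        str(k).strip().lower(): (v.strip() if isinstance(v, str) else v)
        for k, v in record.items()
    }


def inspect_format(record: dict) -> list:
    """Format inspection: return a list of problems (empty means the record passed)."""
    return [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if record.get(field) in (None, "")
    ]


def standardize(record: dict) -> dict:
    """Standardization: coerce values to one agreed representation."""
    return {
        "station_id": str(record["station_id"]),
        # Assumed input layout "YYYY/MM/DD HH:MM", emitted as ISO 8601.
        "timestamp": datetime.strptime(record["timestamp"], "%Y/%m/%d %H:%M").isoformat(),
        "value": float(record["value"]),
    }


def collect(raw_records):
    """Split incoming records into standardized records and rejected ones."""
    accepted, rejected = [], []
    for raw in raw_records:
        record = normalize_keys(raw)
        problems = inspect_format(record)
        if problems:
            rejected.append((raw, problems))
            continue
        try:
            accepted.append(standardize(record))
        except (ValueError, TypeError) as exc:
            rejected.append((raw, [f"bad value: {exc}"]))
    return accepted, rejected
```

Rejected records are kept together with the reason for rejection rather than silently dropped, so that a later stage can inspect them.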

[0022] Data processing module: decodes and converts the format of the preliminarily processed data from the data collection module to generate standard-format data products; sets a quality control code for each data item, producing standard-format data products with quality control codes; and selects, integrates, and statistically processes some of the real-time and non-real-time mass data to generate processed data;
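As a rough illustration of attaching a quality control code to each data item and then statistically processing the result, consider the sketch below. The code values (`QC_GOOD`, `QC_SUSPECT`, `QC_MISSING`), the range check, and the thresholds are hypothetical; the source states only that each data item receives a quality control code and does not say how the code is determined.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

# Hypothetical quality control codes; the patent text does not enumerate them.
QC_GOOD, QC_SUSPECT, QC_MISSING = 0, 1, 2


@dataclass(frozen=True)
class DataProduct:
    """A standard-format data item carrying its quality control code."""
    source_id: str
    value: Optional[float]
    qc_code: int


def assign_qc(source_id: str, value: Optional[float],
              low: float = -50.0, high: float = 60.0) -> DataProduct:
    """Assign a QC code using an assumed plausibility range [low, high]."""
    if value is None:
        code = QC_MISSING
    elif low <= value <= high:
        code = QC_GOOD
    else:
        code = QC_SUSPECT
    return DataProduct(source_id, value, code)


def summarize(products) -> dict:
    """Statistical processing: aggregate only the items flagged as good."""
    good = [p.value for p in products if p.qc_code == QC_GOOD]
    return {
        "count": len(products),
        "good": len(good),
        "mean_good": mean(good) if good else None,
    }
```

Carrying the code alongside the value, instead of discarding doubtful items, lets downstream consumers decide how strictly to filter.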

[0023] Data cleaning module: first completes data analysis and defines the error types, then searches for and identifies the erroneous records, and finally corrects the errors. Error types include structure-level errors and record-level errors; the method of identifying errors is based on data ...
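The three-step workflow of the data cleaning module (define error types, identify erroneous records, correct them) might look like the following sketch. The concrete error types, the `SCHEMA` tuple, the detection rules, and the fill-in default are all assumptions made for illustration; the source text is truncated before it describes the actual identification method.

```python
from enum import Enum


class ErrorType(Enum):
    """Step 1: define error types (this particular set is assumed)."""
    UNEXPECTED_FIELD = "structure-level: field not in the schema"
    MISSING_VALUE = "record-level: required value absent"
    DUPLICATE_RECORD = "record-level: same id seen before"


SCHEMA = ("id", "value")  # hypothetical target schema


def identify_errors(records):
    """Step 2: search the records and return (index, ErrorType) pairs."""
    found, seen_ids = [], set()
    for i, rec in enumerate(records):
        if set(rec) - set(SCHEMA):
            found.append((i, ErrorType.UNEXPECTED_FIELD))
        if rec.get("value") is None:
            found.append((i, ErrorType.MISSING_VALUE))
        if rec.get("id") in seen_ids:
            found.append((i, ErrorType.DUPLICATE_RECORD))
        seen_ids.add(rec.get("id"))
    return found


def correct_errors(records, found, default_value=0.0):
    """Step 3: drop duplicates, strip unexpected fields, fill missing values."""
    duplicates = {i for i, err in found if err is ErrorType.DUPLICATE_RECORD}
    cleaned = []
    for i, rec in enumerate(records):
        if i in duplicates:
            continue
        fixed = {field: rec.get(field) for field in SCHEMA}
        if fixed["value"] is None:
            fixed["value"] = default_value  # assumed fill strategy
        cleaned.append(fixed)
    return cleaned
```

Keeping identification separate from correction means the list of detected errors can be reviewed or logged before any record is changed.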



Abstract

The invention discloses a mass data system with a data cleaning function. The system comprises a data collection module, a data processing module, a data cleaning module, a data storage management module, a data service module, and a data monitoring module. The data collection module collects data from the various data sources into a data processing center. The data processing module decodes and converts the format of the data that the data collection module has collected and preliminarily processed. The data cleaning module first completes data analysis and defines the error types, then searches for and identifies the erroneous records, and finally corrects the errors. The data storage management module manages the storage of the data processed by the data processing module. The data service module fulfills clients' data access requests. The data monitoring module monitors, records, and processes the data in the data collection module, the data processing module, and the data service module. With this system, structure-level errors and record-level errors can be found and corrected, and missing values, erroneous values, duplicate records, and inconsistencies within and between data sources are avoided.
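How the six modules named in the abstract could fit together is sketched below. Every class and function name here (`Monitor`, `Store`, `run_pipeline`, and the stage functions) is hypothetical, and each stage is reduced to a stub; the sketch shows only the data flow and the fact that the monitoring module observes the collection, processing, and service stages.

```python
class Monitor:
    """Data monitoring module: records what passes through a watched stage."""

    def __init__(self):
        self.log = []

    def watch(self, name, stage):
        def wrapped(data):
            result = stage(data)
            # Record the stage name with input and output sizes.
            self.log.append((name, len(data), len(result)))
            return result
        return wrapped


class Store:
    """Storage management module (in-memory stand-in for real storage)."""

    def __init__(self):
        self._rows = []

    def save(self, rows):
        self._rows.extend(rows)

    def query(self, predicate):
        """Data service module entry point: return the rows a client asks for."""
        return [row for row in self._rows if predicate(row)]


def collect(sources):
    """Collection stub: gather rows from every source into one list."""
    return [row for source in sources for row in source]


def process(rows):
    """Processing stub: convert each row to a standard format."""
    return [{"id": r["id"], "value": float(r["value"])} for r in rows]


def clean(rows):
    """Cleaning stub: keep the first record seen for each id."""
    seen, out = set(), []
    for r in rows:
        if r["id"] not in seen:
            seen.add(r["id"])
            out.append(r)
    return out


def run_pipeline(sources, store, monitor):
    rows = monitor.watch("collect", collect)(sources)
    rows = monitor.watch("process", process)(rows)
    rows = clean(rows)
    store.save(rows)
    return rows
```

A client request can be monitored in the same way, for example `monitor.watch("service", lambda ids: store.query(lambda r: r["id"] in ids))`.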

Description

Technical field

[0001] The invention relates to a data system, in particular to a mass data system with a data cleaning function.

Background

[0002] In recent years, with the rapid development of information technology, the amount of data that is collected, stored, processed, and analyzed has kept increasing, and the processing of mass data has become more and more common. Unlike traditional data, big data has three characteristics: massiveness, distribution, and heterogeneity. Massiveness refers to the huge scale of the data and its ever-increasing growth rate; distribution reflects the fact that such a volume of data cannot be stored, computed, and analyzed on a single machine; heterogeneity reflects the diversity of data types and data sources. Using the traditional centralized processing method for structured data, it is difficult to solve the problems caused by b...


Application Information

IPC(8): G06F17/30
CPC: G06F16/215
Inventor 朱焰冰
Owner CHENGDU CALABAR INFORMATION TECH