Data quality improvement method and system based on distributed system

A data quality technology based on a distributed system, applied in the field of data processing, that addresses the inability of existing methods to cope with industrial data and achieves improved data quality.

Pending Publication Date: 2020-01-31
QILU UNIV OF TECH

AI Technical Summary

Problems solved by technology

At present, to address the data quality problems of industrial big data, researchers in China and abroad are designing general-purpose data cleaning frameworks that provide description languages for user programming and reduce the difficulty of data cleaning. However, traditional data quality improvement methods can no longer cope with the large volumes of industrial data. This disclosure therefore proposes a data quality improvement method based on a distributed system, which can effectively address data quality problems and improve data quality.

Examples

Detailed Description of Embodiments

[0023] It should be noted that the following detailed description is exemplary and intended to provide further explanation of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

[0024] It should be noted that the terminology used herein is only for describing specific embodiments and is not intended to limit the exemplary embodiments according to the present disclosure. As used herein, unless the context clearly dictates otherwise, the singular forms are intended to include the plural forms. It should also be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, means, components, and/or combinations thereof.

[0025] Embodiment 1

[0026] This embodiment discloses a method for improving data quality based on a distribute...

Abstract

The invention provides a data quality improvement method and system based on a distributed system. The method comprises: acquiring data from all data sources and loading it into a distributed file system; preprocessing the loaded data by the distributed file system, where the preprocessing mainly comprises filling incomplete fields and merging duplicate records; performing data cleaning on the data to be cleaned; and, after the data cleaning is finished, evaluating data quality with a constructed data quality evaluation model. The method effectively addresses inconsistent data formats, duplicate data records, and missing fields, cleans most of the erroneous data, and greatly improves data quality.
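The four steps named in the abstract (load, preprocess, clean, evaluate) can be illustrated with a minimal single-process sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: plain Python lists stand in for the distributed file system, and the field names, the merge key `(device_id, timestamp)`, the default fill value, the accepted timestamp formats, and the completeness metric are all hypothetical. In particular, `completeness` is only a placeholder for the data quality evaluation model, whose construction is not given in this text.

```python
from datetime import datetime

# Hypothetical schema and fill rule; the source does not specify either.
FIELDS = ("device_id", "timestamp", "temperature", "status")
DEFAULTS = {"status": "unknown"}


def preprocess(records):
    """Merge records that share (device_id, timestamp), then fill gaps from DEFAULTS."""
    merged = {}
    for rec in records:
        key = (rec.get("device_id"), rec.get("timestamp"))
        target = merged.setdefault(key, {})
        for field in FIELDS:
            # Keep the first non-missing value seen for each field.
            if target.get(field) is None:
                target[field] = rec.get(field)
    for rec in merged.values():
        for field, default in DEFAULTS.items():
            if rec[field] is None:
                rec[field] = default
    return list(merged.values())


def normalize_timestamp(value):
    """Coerce a few assumed formats to ISO 8601; return None if none match."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M:%S"):
        try:
            return datetime.strptime(value, fmt).isoformat()
        except (TypeError, ValueError):
            continue
    return None


def clean(records):
    """Unify timestamp formats and drop records that cannot be repaired."""
    cleaned = []
    for rec in records:
        ts = normalize_timestamp(rec["timestamp"])
        if ts is None or rec["device_id"] is None:
            continue
        cleaned.append({**rec, "timestamp": ts})
    return cleaned


def completeness(records):
    """Share of field slots that hold a value (stand-in quality metric)."""
    if not records:
        return 0.0
    filled = sum(rec.get(f) is not None for rec in records for f in FIELDS)
    return filled / (len(records) * len(FIELDS))


# Two hypothetical data sources with a duplicate, gaps, and mixed formats.
source_a = [
    {"device_id": "m1", "timestamp": "2020-01-31 08:00:00", "temperature": 71.5, "status": None},
    {"device_id": "m1", "timestamp": "2020-01-31 08:00:00", "temperature": None, "status": "ok"},
]
source_b = [
    {"device_id": "m2", "timestamp": "2020/01/31 08:05:00", "temperature": None, "status": "ok"},
    {"device_id": "m3", "timestamp": "not a date", "temperature": 65.0, "status": "ok"},
]

raw = source_a + source_b          # "load" step: gather all sources
cleaned = clean(preprocess(raw))   # preprocess, then clean

print(f"records: {len(raw)} -> {len(cleaned)}")
print(f"completeness: {completeness(raw):.4f} -> {completeness(cleaned):.4f}")
```

In this sketch, duplicates are merged on the raw timestamp string, so two copies of one record written in different timestamp formats would not be merged. A real deployment would more likely express the same steps as jobs on a distributed processing framework reading from the distributed file system.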

Description

Technical field

[0001] The present disclosure relates to the technical field of data processing, in particular to a method and system for improving data quality based on a distributed system.

Background technique

[0002] In recent years, with the continuous development of Industry 4.0, Internet of Things technology, and enterprise internal management systems, massive amounts of data have been gathered in the process. These data are of inestimable value to the development of enterprises. However, the quality of these data is not high, which will affect the accuracy of the data analysis results, and the decision-making based on problematic data is likely to bring losses to the enterprise.

[0003] Industrial big data is divided into three categories, namely: (1) various business data related to manufacturing enterprises, including production data, sales data, etc.; (2) data generated by various machinery and equipment, such as various machinery and equipment operating inform...

Application Information

IPC(8): G06F16/182, G06F16/215
CPC: G06F16/182, G06F16/215
Inventor: 孙涛, 刘秀源, 郭爱章
Owner: QILU UNIV OF TECH