
Big data quality management task scheduling method based on pipeline model and task combination

A data quality and task scheduling technology, applied in the fields of electronic digital data processing, special data processing applications, and structured data retrieval. It addresses the low performance of existing systems by improving task parallelism and the efficiency of data quality management.

Active Publication Date: 2020-07-28
NANJING UNIV

AI Technical Summary

Problems solved by technology

[0005] Purpose of the invention: In view of the problems and deficiencies of the prior art described above, the invention provides a big data quality management task scheduling method based on a pipeline model and task merging. The method addresses the low performance of existing systems when they process multiple data quality management tasks in a big data scenario, while covering both the detection and the repair of various data quality problems.



Embodiment Construction

[0022] The present invention is further illustrated below with reference to the accompanying drawings and specific embodiments. These embodiments serve only to illustrate the invention and are not intended to limit its scope. After reading this disclosure, modifications in equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of this application.

[0023] The present invention proposes a big data quality management task scheduling method based on a pipeline model and task merging, which addresses the execution efficiency of data quality detection and repair tasks in big data scenarios. It designs a task classification and merging method based on the computational characteristics of tasks and the correlations between them, which improves task parallelism and avoids repeated computation.
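As a minimal sketch of how merging related tasks can avoid repeated computation, the Python example below groups tasks that read the same column and scans that column once per group instead of once per task. The `Task` class, the grouping key (the column a task reads), and the sample data are illustrative assumptions; the source does not state the actual classification or merging criteria.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Task:
    """Hypothetical data quality check; not the patent's task definition."""
    name: str
    column: str                       # column the task inspects (assumed merge key)
    check: Callable[[object], bool]   # returns True when a value is acceptable


def merge_by_column(tasks):
    """Group tasks that read the same column so it can be scanned once."""
    groups = defaultdict(list)
    for task in tasks:
        groups[task.column].append(task)
    return dict(groups)


def run_merged(rows, tasks):
    """Return ({task name: failing row indices}, number of column scans)."""
    violations = {task.name: [] for task in tasks}
    scans = 0
    for column, group in merge_by_column(tasks).items():
        scans += 1  # one pass over the data per merged group, not per task
        for index, row in enumerate(rows):
            value = row.get(column)
            for task in group:
                if not task.check(value):
                    violations[task.name].append(index)
    return violations, scans


rows = [
    {"age": 34, "email": "a@example.com"},
    {"age": -5, "email": None},
    {"age": None, "email": "not-an-email"},
]
tasks = [
    Task("age_not_null", "age", lambda v: v is not None),
    Task("age_non_negative", "age", lambda v: v is None or v >= 0),
    Task("email_not_null", "email", lambda v: v is not None),
]
violations, scans = run_merged(rows, tasks)
```

Here three tasks need only two scans, because the two `age` checks share one pass. A real system would presumably merge on richer criteria than a shared column.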

[0024...


Abstract

The invention discloses a big data quality management task scheduling method based on a pipeline model and task merging. The method comprises the following steps: 1, reading dirty data from various underlying heterogeneous big data sources; 2, defining a series of data quality detection and repair tasks, and sending the tasks to a task scheduler; 3, having the task scheduler classify the received data quality management tasks; 4, merging the classified tasks that can be combined; 5, sequentially executing the tasks through a parallel data processing function; and 6, uniformly outputting and feeding back the execution results of the data quality detection and repair tasks. The method addresses the insufficient performance of existing data quality management systems in big data scenarios and improves the execution efficiency of data quality management tasks, while supporting both data quality detection and repair.
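The six steps above can be sketched end to end in Python. This is an illustration under stated assumptions, not the patented implementation: the in-memory partitions stand in for heterogeneous data sources, the `detect`/`repair` classification and the per-row fusion rule are hypothetical, and a thread pool stands in for whatever parallel data processing engine the invention uses.

```python
from concurrent.futures import ThreadPoolExecutor


# Step 1: read dirty data (in-memory partitions stand in for real sources).
def read_sources():
    return [
        [{"id": 1, "city": " nanjing "}, {"id": 2, "city": None}],
        [{"id": 3, "city": "Suzhou"}, {"id": None, "city": "wuxi"}],
    ]


# Step 2: define detection and repair tasks as (kind, column, function).
TASKS = [
    ("detect", "id", lambda v: v is None),    # flags missing ids
    ("detect", "city", lambda v: v is None),  # flags missing cities
    ("repair", "city", lambda v: v.strip().title() if isinstance(v, str) else v),
]


# Step 3: classify tasks (here simply by kind; an assumed criterion).
def classify(tasks):
    classes = {"detect": [], "repair": []}
    for kind, column, fn in tasks:
        classes[kind].append((column, fn))
    return classes


# Step 4: merge the classified tasks into one per-row function,
# so each partition is traversed once for all tasks.
def merge(classes):
    def process_row(row):
        issues = [col for col, fn in classes["detect"] if fn(row.get(col))]
        fixed = dict(row)
        for col, fn in classes["repair"]:
            fixed[col] = fn(fixed.get(col))
        return fixed, issues
    return process_row


# Step 5: execute the merged function over partitions in parallel.
def execute(partitions, process_row):
    def run_partition(partition):
        return [process_row(row) for row in partition]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_partition, partitions))  # keeps input order


# Step 6: collect a unified result: cleaned rows plus detected issues.
def report(results):
    cleaned, issues = [], []
    for partition in results:
        for fixed, row_issues in partition:
            cleaned.append(fixed)
            issues.extend((fixed.get("id"), col) for col in row_issues)
    return cleaned, issues


cleaned, issues = report(execute(read_sources(), merge(classify(TASKS))))
```

Fusing detection and repair into one function per row is only one possible reading of "merging". It was chosen here because it makes the single pass per partition explicit.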

Description

Technical field

[0001] The present invention relates to the field of big data quality management, and in particular to a big data quality management task scheduling method based on a pipeline model and task merging. It specifically concerns multiple underlying heterogeneous big data sources that contain various data quality problems, and provides a unified data quality management task scheduling method.

Background

[0002] In today's big data era, data quality issues are receiving more and more attention. Data quality management is not only a basic data processing job; it also helps to clean data with quality problems, integrate clean data, and provide high-quality data services. It is a necessary prerequisite for users to develop upper-level applications, mine the value of data, and make correct decisions, and it directly affects the social value and economic benefits that big data can bring. In addition, in practical applications, data quality management runs through the entire life cycle of dat...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/48, G06F16/215, G06F16/27
CPC: G06F9/4881, G06F16/215, G06F16/27
Inventors: 顾荣, 齐扬, 黄宜华
Owner: NANJING UNIV