Distributed data integration job scheduling method and device

A distributed data integration job scheduling technology in the field of big data infrastructure. It addresses problems in existing schedulers such as overly long job scheduling times, poor fit for complex data integration scenarios, and failure to consider low-latency access to job metadata and job metadata consistency. The effect is a reduced risk of delay and interruption.

Active Publication Date: 2019-10-22
ENJOYOR COMPANY LIMITED

AI Technical Summary

Problems solved by technology

Chinese invention patent CN201610197298 discloses a task scheduling method, device and system. It proposes a multi-channel, multi-task distributed scheduling method that solves the starvation of other jobs caused by an overly long scheduling time for a single task, but it does not consider low-latency access to job metadata or how to ensure the consistency of job metadata.
Chinese invention patent CN201410748604 discloses a distributed task scheduling system and method that ensures th



Examples


Embodiment

[0045] Example: As shown in Figure 1, a distributed data integration job scheduling device is composed of a job scheduling device and a job running device, which exchange information with each other. The job scheduling device includes a job management module, a job preloading module, and a resource scheduling module. The job management module receives, caches, and stores job-related meta information and performs concurrency control. The job preloading module obtains pending jobs from the job management module and determines their scheduling priority order. The resource scheduling module obtains the job preloading information and the computing resource information of the job running device in order to complete resource allocation and scheduling distribution. The job running device includes a master control node and a work node; the master control node is responsible for management and coordination, and the ...
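The division of labor among the three modules can be illustrated with a minimal Python sketch. It is not the patent's implementation: the class names, the `slots_needed` field, the "lower number runs first" priority rule, and the first-fit worker selection are all assumptions made for illustration, and the job meta information is held in memory rather than in a cache or store.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                                   # assumption: lower value is scheduled first
    job_id: str = field(compare=False)
    slots_needed: int = field(compare=False, default=1)


class JobManager:
    """Receives and stores job meta information (in memory in this sketch)."""

    def __init__(self):
        self._pending = {}

    def submit(self, job):
        self._pending[job.job_id] = job

    def take_pending(self):
        # Hand over all pending jobs and clear the pending set.
        jobs, self._pending = list(self._pending.values()), {}
        return jobs


class JobPreloader:
    """Obtains pending jobs from the manager and fixes their priority order."""

    def __init__(self, manager):
        self._manager = manager
        self._heap = []

    def preload(self):
        for job in self._manager.take_pending():
            heapq.heappush(self._heap, job)

    def pop_next(self):
        return heapq.heappop(self._heap) if self._heap else None

    def push_back(self, job):
        heapq.heappush(self._heap, job)


class ResourceScheduler:
    """Matches preloaded jobs to free slots reported by work nodes."""

    def __init__(self, preloader, free_slots):
        self._preloader = preloader
        self._free = dict(free_slots)               # worker name -> free slot count

    def dispatch(self):
        assignments, deferred = [], []
        while (job := self._preloader.pop_next()) is not None:
            # First-fit: pick the first worker with enough free slots.
            worker = next(
                (w for w, s in self._free.items() if s >= job.slots_needed), None
            )
            if worker is None:
                deferred.append(job)                # no capacity now; retry later
                continue
            self._free[worker] -= job.slots_needed
            assignments.append((job.job_id, worker))
        for job in deferred:
            self._preloader.push_back(job)
        return assignments
```

In this sketch a job that cannot be placed is pushed back into the preloader's queue rather than dropped, so it is considered again on the next `dispatch()` call once a work node reports free capacity.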


Abstract

The invention relates to a distributed data integration job scheduling method and device. The method targets the special scenarios that data integration may face. A job scheduling device is responsible for issuing data integration jobs to the job running device. The job running device receives the scheduling task, starts job execution, feeds back job running state information to the job management module, feeds back work node computing resources to the resource scheduling module, and feeds back lost or fault information to the job preloading module. The method has the following combined characteristics: (1) high availability, fault tolerance and weak consistency; (2) low latency for quasi-real-time job scheduling; (3) multi-tenant concurrency control oriented to cloud service applications; (4) computing resource isolation and multi-job parallel scheduling; and (5) a priority scheduling mechanism.
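The three feedback paths named in the abstract can be sketched as a small routing table. This is only an illustration under stated assumptions: `FeedbackKind`, `Feedback`, `SchedulingDevice`, the module keys, and the payload layout are hypothetical names, and each module is represented by a plain in-memory inbox rather than a real component.

```python
from dataclasses import dataclass
from enum import Enum, auto


class FeedbackKind(Enum):
    JOB_STATE = auto()        # running state of a job
    NODE_RESOURCES = auto()   # computing resources of a work node
    FAULT = auto()            # lost or fault information


# Destination module for each kind of feedback, following the abstract.
ROUTES = {
    FeedbackKind.JOB_STATE: "job_management",
    FeedbackKind.NODE_RESOURCES: "resource_scheduling",
    FeedbackKind.FAULT: "job_preloading",
}


@dataclass
class Feedback:
    kind: FeedbackKind
    payload: dict


class SchedulingDevice:
    """Stand-in for the job scheduling device: one inbox per module."""

    def __init__(self):
        self.inbox = {name: [] for name in ROUTES.values()}

    def receive(self, feedback):
        # Deliver the payload to the module responsible for this feedback kind.
        module = ROUTES[feedback.kind]
        self.inbox[module].append(feedback.payload)
        return module
```

Keeping the routing in one table makes it easy to see which module consumes which message, for example that a fault report goes back to the job preloading module so the affected job can be reordered for scheduling.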

Description

Technical field

[0001] The invention relates to the field of big data infrastructure technology, and in particular to a distributed data integration job scheduling method and device.

Background

[0002] With the evolution of the digital economy, business digitization has matured in many industries, and digital business has gradually become a new focus. However, business digitization has produced a large number of data islands, which have become a common pain point in realizing digital business. Industries urgently need data integration to break down and avoid data islands and to integrate and manage data resources, so that the associated value between data can be developed effectively.

[0003] Data integration often involves scheduling tens of thousands of jobs, including data exchange, data preprocessing and other job types. The design of the scheduling system needs to consider various complex scenarios. For example, some scenarios not only have a large number...


Application Information

IPC(8): G06F9/48, G06F16/27
CPC: G06F9/485, G06F9/4881, G06F16/27
Inventor: 李建元, 刘飞黄, 王超群, 刘兴田, 贾建涛, 温晓岳
Owner: ENJOYOR COMPANY LIMITED