Method for realizing task data decoupling in spark operation scheduling system

A job scheduling and data decoupling technology, applied to multi-programming devices and the like. It addresses the absence of a task-scheduling implementation in Spark, and achieves improved coordination and maintainability as well as enhanced collaborative development capability.

Active Publication Date: 2015-02-18
北京赛特斯信息科技股份有限公司

Problems solved by technology

Spark provides implementations of both job scheduling and Action scheduling, but it lacks an implementation of task scheduling.

Examples


Specific Embodiment

[0073] In practical applications, as shown in Figure 2, a specific embodiment of the present invention proceeds as follows:

[0074] 1: First create a global context object, which saves the context information and global attribute information of the Spark runtime state. This attribute information can be specified by the developer.

[0075] 2: Read the configuration information of each task.

[0076] 3: Based on the configuration information of these tasks, a directed acyclic graph is constructed; the dependencies between tasks can be analyzed through this graph.

[0077] 4: Create a global state object instance, which saves the RDD information of the global scope and the iteration state object of each iteration cycle. In this way, all state objects can be traversed through this instance to obtain the necessary RDD information.

[0078] 5: Start an iteration cycle, and execute the tasks in this cycle in sequence according to...
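The five steps above can be sketched in plain Python. This is a minimal, hypothetical illustration under stated assumptions: class names (`GlobalContext`, `GlobalState`), the task-configuration dictionary shape, and the use of plain Python lists as stand-ins for RDDs are all illustrative inventions, not the patent's actual API. The sketch builds the dependency graph from the task configurations, derives an execution order by topological sort, and runs each iteration cycle in that order.

```python
from collections import deque

class GlobalContext:
    """Step 1: holds Spark-runtime context and developer-specified attributes (hypothetical)."""
    def __init__(self, properties):
        self.properties = dict(properties)

class GlobalState:
    """Step 4: holds global-scope RDD info plus one state record per iteration cycle."""
    def __init__(self):
        self.rdd_info = {}          # task name -> produced "RDD" (a plain list here)
        self.iteration_states = []  # one dict of task results per iteration cycle

def build_dag(task_configs):
    """Step 3: check the dependency graph is acyclic and return a topological order."""
    deps = {t["name"]: set(t.get("depends_on", [])) for t in task_configs}
    order, ready = [], deque(name for name, d in deps.items() if not d)
    while ready:
        node = ready.popleft()
        order.append(node)
        for name, d in deps.items():
            d.discard(node)
            if not d and name not in order and name not in ready:
                ready.append(name)
    if len(order) != len(deps):
        raise ValueError("cycle detected in task dependencies")
    return order

def run(task_configs, properties, cycles=1):
    ctx = GlobalContext(properties)      # step 1: global context object
    order = build_dag(task_configs)      # steps 2-3: read configs, build the DAG
    state = GlobalState()                # step 4: global state object instance
    tasks = {t["name"]: t for t in task_configs}
    for _ in range(cycles):              # step 5: iteration cycle
        iteration_state = {}
        for name in order:               # execute tasks in dependency order
            inputs = [state.rdd_info[d] for d in tasks[name].get("depends_on", [])]
            result = tasks[name]["fn"](ctx, inputs)
            state.rdd_info[name] = result
            iteration_state[name] = result
        state.iteration_states.append(iteration_state)
    return state
```

A task here is just a name, an optional `depends_on` list, and a callable taking the context and its upstream results; the topological sort guarantees each task sees its dependencies' outputs.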


Abstract

The invention relates to a method for realizing task data decoupling in a Spark job scheduling system. The method comprises the following steps: within one iteration cycle, the system reads the iteration RDD (Resilient Distributed Dataset) information of the iteration state object through a task context object instance and stores the iteration RDD information into a task context object; the system finds the corresponding RDD information in the task context object through a Spark task object instance and stores it into a task result object; the system analyzes the RDD information in the task result object through a task state object instance and stores the corresponding RDD information into the corresponding state objects. With this method, RDDs can be transmitted among tasks, or between an earlier and a later cycle of the same task, so that each task can be compiled in a modular manner and the method achieves a wider application range.
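The three-stage handoff the abstract describes (iteration state → task context → task result → state objects) can be sketched as a minimal Python illustration. All names here are hypothetical, and ordinary dictionaries and lists stand in for the patent's RDD information; the point is only to show how the intermediate objects keep tasks decoupled from one another's state.

```python
class TaskContext:
    """Per-cycle container for RDD info read from the iteration state object."""
    def __init__(self):
        self.rdds = {}

class TaskResult:
    """Holds only the RDD info a task actually looked up from the context."""
    def __init__(self):
        self.rdds = {}

class StateObject:
    """Receives its share of the result RDD info at the end of the cycle."""
    def __init__(self):
        self.rdd = None

def iteration_cycle(iteration_state, task_keys, state_objects):
    # Stage 1: read iteration RDD info into a task context object.
    context = TaskContext()
    context.rdds.update(iteration_state)
    # Stage 2: the task instance finds the RDD info it needs in the context
    # and stores it in a task result object.
    result = TaskResult()
    for key in task_keys:
        result.rdds[key] = context.rdds[key]
    # Stage 3: the task state object distributes the result RDD info
    # into the corresponding state objects.
    for key, rdd in result.rdds.items():
        state_objects[key].rdd = rdd
    return result
```

Because tasks exchange data only through these intermediate objects rather than referencing each other directly, each task can be developed and compiled as an independent module, which is the decoupling the abstract claims.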

Description

Technical field

[0001] The invention relates to the field of distributed big-data processing, in particular to Spark job scheduling design, and specifically to a method for realizing task data decoupling in a Spark job scheduling system.

Background technique

[0002] Spark is an open-source cluster computing system based on in-memory computing, whose purpose is to make data analysis faster. Spark is a general parallel computing framework similar to MapReduce (a programming model), but unlike MapReduce, its intermediate results can be stored in memory, bringing higher efficiency and better interactivity (lower latency). In addition, Spark provides a wider range of dataset operation types, supporting multiple paradigms such as in-memory computing, multi-iteration batch processing, ad hoc query, stream processing, and graph computing.

[0003] Spark also introduces an abstraction called Resilient Distributed Datasets (RDD). An RDD is a collection of read-only objects distri...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F9/46
Inventors: 逯利军钱培专汪金忠余聪林强李克民李拯
Owner: 北京赛特斯信息科技股份有限公司