A method for decoupling task data in the Spark job scheduling system

A technology for job scheduling and data decoupling, applied to multi-program devices and related fields. It solves problems such as the lack of a task-scheduling implementation, and achieves the effects of improved synergy and maintainability and simplified dependency configuration.

Active Publication Date: 2017-10-31
北京赛特斯信息科技股份有限公司

AI Technical Summary

Problems solved by technology

Spark provides implementations for both job scheduling and Action scheduling, but lacks an implementation of task scheduling.

Method used


Examples


Specific Embodiment

[0073] In practical application, as shown in Figure 2, a specific embodiment of the present invention has the following concrete flow:

[0074] 1: First create a global context object, which saves the context information of the Spark runtime state and global attribute information. This attribute information can be specified by the developer.

[0075] 2: Read the configuration information of each task.

[0076] 3: A directed acyclic graph is constructed from this task configuration information; the dependencies between tasks can then be analyzed through the directed acyclic graph.

[0077] 4: Create a global state object instance, which saves the RDD information of the global scope and the iteration state object of each iteration cycle. Through this object instance, all state objects can be traversed to obtain the necessary RDD information.

[0078] 5: Start an iteration cycle, and execute the tasks in this cycle sequentially according to the infor...
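The five steps above can be sketched in code. This is a minimal, hypothetical illustration, not the patent's implementation: all names (`GlobalContext`, `build_dag`, the sample task names) are invented for the example, plain dicts stand in for the task configuration files, and the dependency analysis over the directed acyclic graph is shown as a standard Kahn topological sort.

```python
from collections import defaultdict, deque

class GlobalContext:
    """Step 1: holds Spark runtime state and developer-specified attributes."""
    def __init__(self, attributes):
        self.attributes = dict(attributes)

def build_dag(task_configs):
    """Step 3: build a dependency DAG from each task's configuration."""
    edges = defaultdict(list)                      # dependency -> dependents
    indegree = {name: 0 for name in task_configs}  # unmet dependency counts
    for name, cfg in task_configs.items():
        for dep in cfg.get("depends_on", []):
            edges[dep].append(name)
            indegree[name] += 1
    return edges, indegree

def run_iteration(task_configs, edges, indegree):
    """Step 5: execute one iteration cycle's tasks in dependency order
    (Kahn's topological sort over the DAG from step 3)."""
    counts = dict(indegree)
    ready = deque(t for t, d in counts.items() if d == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)            # a real scheduler would run the task here
        for nxt in edges[task]:
            counts[nxt] -= 1
            if counts[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(task_configs):
        raise ValueError("cycle detected: task graph is not a DAG")
    return order

# Step 2: read (here: inline) the configuration of each task.
configs = {
    "load":      {"depends_on": []},
    "transform": {"depends_on": ["load"]},
    "join":      {"depends_on": ["load"]},
    "report":    {"depends_on": ["transform", "join"]},
}
ctx = GlobalContext({"app_name": "demo"})  # step 1
edges, indegree = build_dag(configs)       # step 3
print(run_iteration(configs, edges, indegree))
# ['load', 'transform', 'join', 'report']
```

Step 4 (the global state object holding per-iteration RDD information) is omitted here for brevity; the key point is that execution order is derived entirely from the configuration-driven DAG, so tasks never need to reference one another directly.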



Abstract

The invention relates to a method for realizing task data decoupling in a Spark job scheduling system. The method comprises the following steps: within one iteration cycle, the system reads the iteration RDD (Resilient Distributed Dataset) information of the iteration state object through a task context object instance, and stores that iteration RDD information into the task context object; the system finds the corresponding RDD information in the task context object through a Spark task object instance, and stores it into a task result object; the system then analyzes the RDD information in the task result object through the task state object instance, and stores the corresponding RDD information into the corresponding state objects. With this method, RDDs can be passed between tasks, or between an earlier and a later cycle of the same task, so that each task can be composed in a modular fashion and a wider range of applications can be supported.
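The hand-off described in the abstract can be sketched as follows. This is an illustrative sketch only: the class names mirror the abstract's terminology but are invented for the example, and plain Python lists stand in for Spark RDD handles. The point being illustrated is that a task reads inputs from the shared context object and writes outputs to a result object, so tasks never hold references to each other.

```python
class IterationState:
    """Holds the RDD information produced in one iteration cycle."""
    def __init__(self, rdd_info):
        self.rdd_info = dict(rdd_info)

class TaskContext:
    """Tasks read their inputs from here instead of from other tasks."""
    def __init__(self):
        self.rdds = {}
    def load_from(self, iteration_state):
        # step 1 of the abstract: iteration state -> task context
        self.rdds.update(iteration_state.rdd_info)

class TaskResult:
    """A task writes its output RDD info here, not into another task."""
    def __init__(self):
        self.rdds = {}

def run_task(ctx, result, in_key, out_key):
    # The task sees only the context and result objects, keeping it
    # decoupled from whichever task produced its input.
    rdd = ctx.rdds[in_key]
    result.rdds[out_key] = [x * 2 for x in rdd]  # stand-in transformation

# iteration state -> task context -> task result -> next state object
state = IterationState({"input_rdd": [1, 2, 3]})
ctx = TaskContext()
ctx.load_from(state)
result = TaskResult()
run_task(ctx, result, "input_rdd", "doubled_rdd")
next_state = IterationState(result.rdds)  # available to later tasks/cycles
print(next_state.rdd_info["doubled_rdd"])  # [2, 4, 6]
```

Because the output lands back in a state object, the same mechanism serves both cases the abstract names: passing an RDD from one task to another, and passing it from an earlier iteration cycle of a task to a later one.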

Description

Technical field

[0001] The invention relates to the field of distributed big-data processing, in particular to Spark job scheduling design, and specifically to a method for decoupling task data in a Spark job scheduling system.

Background technique

[0002] Spark is an open-source cluster computing system based on in-memory computing that aims to make data analysis faster. Spark is a general-purpose parallel computing framework like MapReduce (a programming model), but unlike MapReduce, intermediate results can be stored in memory, which brings higher efficiency and better interactivity (lower latency). In addition, Spark provides a wider range of dataset operations, supporting multiple paradigms such as in-memory computing, multi-pass iterative batch processing, ad hoc query, stream processing, and graph computing.

[0003] Spark also introduces an abstraction called Resilient Distributed Datasets (RDDs). An RDD is a read-only collection of objects distribu...
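The RDD abstraction in [0003] can be illustrated with a toy sketch. This is not Spark itself: `MiniRDD` is an invented name, and the sketch only shows two properties the background describes, namely that the collection is read-only (transformations return a new dataset rather than mutating the original) and that evaluation is deferred along a lineage of parent datasets until an action is called.

```python
class MiniRDD:
    """Toy stand-in for an RDD: immutable, lazily evaluated via lineage."""
    def __init__(self, data=None, parent=None, fn=None):
        self._data = data      # only the root dataset holds concrete data
        self._parent = parent  # lineage link, like RDD dependency tracking
        self._fn = fn

    def map(self, fn):
        # Transformation: returns a NEW dataset; the original never changes.
        return MiniRDD(parent=self, fn=fn)

    def collect(self):
        # Action: walk the lineage and apply the deferred transformations.
        if self._parent is None:
            return list(self._data)
        return [self._fn(x) for x in self._parent.collect()]

rdd = MiniRDD(data=[1, 2, 3])
doubled = rdd.map(lambda x: x * 2)
print(rdd.collect())      # [1, 2, 3]  -- original is unchanged
print(doubled.collect())  # [2, 4, 6]
```

Real RDDs add partitioning across the cluster and fault recovery by recomputing lost partitions from this same lineage; the sketch omits both.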

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F9/46
Inventors: 逯利军, 钱培专, 汪金忠, 余聪, 林强, 李克民, 李拯
Owner: 北京赛特斯信息科技股份有限公司