Workflow scheduling method for data analysis tasks

A workflow scheduling method for data analysis, classified under electrical digital data processing, program startup/switching, and program control design. It addresses the failure of entire data analysis tasks, provides an efficient workflow scheduling method, reduces redundant system input/output, and weakens strong dependencies between tasks.

Active Publication Date: 2018-08-17
成都优易数据有限公司

AI Technical Summary

Problems solved by technology

[0008] In many current scheduling methods, a single node manager is responsible for task execution on every node, so if one node's program runs incorrectly or the node manager goes down, the entire data analysis task fails. To solve this problem, the present invention provides a workflow scheduling method applied to data analysis tasks.



Examples


Embodiment 1

[0071] S1: Receive the task sequence, then parse and encapsulate the minimum task units in it. Each minimum task unit includes the various identifiers mentioned above.
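As a rough illustration of step S1, the sketch below parses a raw task sequence into encapsulated minimum task units. The excerpt does not list the identifiers a unit carries, so the field names here (`task_id`, `env_id`, `operation`, `upstream`, `is_end`) and the dictionary input format are assumptions made for illustration only. They are not the patent's actual data layout.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class TaskUnit:
    """One encapsulated minimum task unit (all field names are hypothetical)."""
    task_id: str                    # identifier of this unit
    env_id: str                     # unique identifier of the operating environment
    operation: str                  # what the unit does, e.g. "merge_by_column"
    upstream: Tuple[str, ...] = ()  # ids of units whose output this unit consumes
    is_end: bool = False            # end flag marking the last task of a workflow


def parse_task_sequence(raw_sequence: Iterable[Dict[str, Any]]) -> List[TaskUnit]:
    """Parse raw task descriptions and wrap each one as a TaskUnit."""
    units = []
    for raw in raw_sequence:
        units.append(
            TaskUnit(
                task_id=str(raw["task_id"]),
                env_id=str(raw["env_id"]),
                operation=str(raw["operation"]),
                upstream=tuple(str(u) for u in raw.get("upstream", ())),
                is_end=bool(raw.get("is_end", False)),
            )
        )
    return units
```

Freezing the dataclass keeps a unit immutable once encapsulated, which is one simple way to make it safe to share between worker threads later on.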

[0072] S2: Construct a data flow graph from the minimum task units and divide it into several production-consumption relationships. The data flow graph contains nodes. While the graph is being constructed, if a task in the received sequence is detected to contain an end flag, the complete workflow submitted by the current user in this submission is looked up by the unique identifier of the current operating environment, and that complete workflow is used as the data flow graph. Figure 2 contains 7 minimum task units in total: node 1 represents streaming data, node 2 represents offline data, node 3 represents merging by column, node 4 represents feature conversion / feature importance selection, node 5 represents model training, and node 6 repres...
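One way to picture step S2 is to treat each production-consumption relationship as a directed edge from producer to consumer, then group nodes whose inputs are all available into layers that could run in parallel. The sketch below does this with a standard in-degree count (Kahn's algorithm), which is a substitute technique and not necessarily the patent's own. The edge list is an assumption: the excerpt names nodes 1 to 5 but does not give the edges of Figure 2, and nodes 6 and 7 are left out because their description is truncated.

```python
from collections import defaultdict


def build_graph(edges):
    """Turn (producer, consumer) pairs into consumer lists and in-degree counts."""
    consumers = defaultdict(list)
    indegree = defaultdict(int)
    for producer, consumer in edges:
        consumers[producer].append(consumer)
        indegree[consumer] += 1
        indegree.setdefault(producer, 0)  # make sure source nodes are registered
    return dict(consumers), dict(indegree)


def parallel_layers(edges):
    """Group nodes into layers; nodes in the same layer have no mutual dependency."""
    consumers, indegree = build_graph(edges)
    layers = []
    ready = sorted(node for node, degree in indegree.items() if degree == 0)
    while ready:
        layers.append(ready)
        next_ready = []
        for node in ready:
            for consumer in consumers.get(node, []):
                indegree[consumer] -= 1
                if indegree[consumer] == 0:
                    next_ready.append(consumer)
        ready = sorted(next_ready)
    return layers


# Assumed edges: streaming data (1) and offline data (2) feed merge-by-column (3),
# which feeds feature conversion (4), which feeds model training (5).
assumed_edges = [(1, 3), (2, 3), (3, 4), (4, 5)]
print(parallel_layers(assumed_edges))  # [[1, 2], [3], [4], [5]]
```

Under these assumed edges, nodes 1 and 2 have no producers, so they form the first parallel layer.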



Abstract

The invention provides a workflow scheduling method for data analysis tasks, and relates to the field of computer software. The method includes the steps that S1, a task sequence is received, and minimum task units in the task sequence are parsed and packaged; S2, a data flow diagram is constructed according to the minimum task units, wherein the data flow diagram includes nodes; S3, the nodes in the data flow diagram are initialized; S4, the initialized data flow diagram is searched for parallel nodes; S5, tasks are executed through multiple worker threads according to the data flow diagram, wherein the parallel nodes execute tasks simultaneously; S6, steps S4-S5 are repeated until execution of the task with an ending tag is finished, and then the whole workflow scheduling is completed. The method solves the problem that in most current scheduling methods, one node manager takes charge of task execution of all nodes, and once one node program runs incorrectly or the node manager breaks down, execution of the whole data analysis task fails.
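The loop formed by steps S3 to S6 can be sketched roughly as follows, using Python's standard `concurrent.futures.ThreadPoolExecutor` for the worker threads. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `run_workflow`, the `tasks` and `deps` dictionaries, and the batch-by-batch waiting strategy are all hypothetical choices made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(tasks, deps, end_task, max_workers=4):
    """Run a workflow until the end-tagged task has finished.

    tasks:    dict mapping node name -> callable taking a dict of upstream results
    deps:     dict mapping node name -> iterable of upstream node names
    end_task: name of the node carrying the end tag
    """
    # S3: initialise every node as pending.
    state = {name: "pending" for name in tasks}
    results = {}
    batches = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # S6: repeat S4-S5 until the end-tagged task is done.
        while state[end_task] != "done":
            # S4: parallel nodes are pending nodes whose upstream nodes are all done.
            batch = sorted(
                name
                for name, status in state.items()
                if status == "pending"
                and all(state[up] == "done" for up in deps.get(name, ()))
            )
            if not batch:
                raise RuntimeError("no runnable node: cycle or unmet dependency")
            # S5: submit the whole batch so its nodes execute simultaneously.
            futures = {
                name: pool.submit(
                    tasks[name], {up: results[up] for up in deps.get(name, ())}
                )
                for name in batch
            }
            for name, future in futures.items():
                results[name] = future.result()  # re-raises any task exception
                state[name] = "done"
            batches.append(batch)
    return results, batches
```

A small usage example, with made-up tasks that return numbers in place of real data sets:

```python
tasks = {
    "stream": lambda upstream: 1,
    "offline": lambda upstream: 2,
    "merge": lambda upstream: upstream["stream"] + upstream["offline"],
}
deps = {"merge": ["stream", "offline"]}
results, batches = run_workflow(tasks, deps, end_task="merge")
```

Waiting for a whole batch before looking for the next one keeps the sketch short. A production scheduler would more likely release each consumer as soon as its own producers finish.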

Description

Technical field

[0001] The invention relates to the field of computer software, in particular to a workflow scheduling method applied to data analysis tasks.

Background

[0002] As people's lives become more digitized and intelligent, data analysis plays an increasingly important role. Because businesses are diverse and complex, it is often necessary to combine multiple data analysis tasks into a larger analysis task, which is executed in the form of a workflow.

[0003] The scheduling and execution of a data analysis workflow is affected by multiple factors such as data size, storage location, business process, computation, and data transmission, and it directly affects the stability, reliability, and efficiency of the entire task. To optimize the process effectively, it is necessary to weigh these factors together and design a scientific and reasonable scheduling optimization method.

[0004] The current popular scheduling methods generally have the ad...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/445, G06F9/48
CPC: G06F9/44563, G06F9/4843
Inventor: 王永波, 傅玉生
Owner: 成都优易数据有限公司