Distributed task processing method, device, equipment, storage medium and system

A distributed task processing technology applied in the fields of artificial intelligence and deep learning. It solves the problem that a model training task can only be terminated when a node fails, and achieves the effect of improving task processing efficiency.

Pending Publication Date: 2021-02-26
GUANGZHOU HUYA TECH CO LTD

AI Technical Summary

Problems solved by technology

If a node fails during the processing of a model training task, the training task can only be terminated.



Examples


Embodiment 1

[0043] Figure 1 is a flowchart of a distributed task processing method in Embodiment 1 of the present invention. This embodiment is applicable to the situation where a local node processes training tasks in a distributed training scenario. The method can be executed by a distributed task processing device, which may be implemented in software and/or hardware and integrated in a node.

[0044] As shown in Figure 1, the method includes:

[0045] S110. According to the node role of the local node in the target node set, obtain matching subtasks from the target task for processing, where the subtasks in the target task are jointly processed by the nodes in the target node set.

[0046] Wherein, the local node is a worker node in the distributed training, the target node set includes multiple nodes, and all nodes in the same target node set process the same target task. The node role is used to indicate the position of the local node in the target node set, and the posi...
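To make the role-to-subtask mapping concrete, below is a minimal Python sketch, not the patent's implementation: the node role is reduced to an integer rank, and `get_matching_subtasks`, `rank`, and `world_size` are illustrative names. Each node in the target node set evaluates the same rule, so together the nodes cover every subtask exactly once.

```python
# Illustrative sketch only: derive a node's subtasks from its role (rank)
# in the target node set. Names are assumptions, not the patent's API.

def get_matching_subtasks(subtasks, rank, world_size):
    """Return the slice of subtasks owned by the node at `rank`.

    Every node evaluates the same rule, so the union of all slices
    covers the whole target task exactly once, with no overlap.
    """
    return [t for i, t in enumerate(subtasks) if i % world_size == rank]

# Example: 8 subtasks jointly processed by a 3-node target node set.
subtasks = [f"shard-{i}" for i in range(8)]
for rank in range(3):
    print(rank, get_matching_subtasks(subtasks, rank, world_size=3))
```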

Embodiment 2

[0055] Figure 2a is a flowchart of a distributed task processing method in Embodiment 2 of the present invention. On the basis of the above embodiment, this embodiment further refines the process of determining the node role of the local node in the target node set, the process of processing the target task according to that node role, and the process of detecting whether the target node set has changed, and it adds the processes of node registration, generation of the target node set, node shutdown, and node failure handling.

[0056] Correspondingly, the method of this embodiment may include:

[0057] S210. According to a node start instruction from the node controller, acquire the task ID of the target task, where all nodes in the target node set of the target task share the same task ID.

[0058] Wherein, the node controller is used to start and stop nodes, and the node start instruction is used to instruct a node to start...
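As a hedged illustration of S210 only, the sketch below shows a node that, on startup, reads a shared task ID and registers itself under it so that peers in the same target node set can discover one another. The environment variable names, the `Node` class, and the in-memory registry (a stand-in for whatever shared store the controller keeps) are all assumptions.

```python
# Illustrative sketch of S210: on a start instruction from the node
# controller, a node acquires the task ID shared by its target node set
# and registers under it. All names here are assumptions.
import os

class Node:
    def __init__(self):
        # Every node launched for the same target task gets the same ID.
        self.task_id = os.environ.get("TARGET_TASK_ID", "task-0")
        self.node_id = os.environ.get("NODE_ID", "node-0")

    def register(self, registry):
        # Register under the shared task ID so peers can discover each other.
        registry.setdefault(self.task_id, set()).add(self.node_id)
        return sorted(registry[self.task_id])

registry = {}  # stand-in for a shared store maintained by the controller
print(Node().register(registry))
```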

Embodiment 3

[0106] Figure 3 is a flowchart of a distributed task processing method in Embodiment 3 of the present invention. This embodiment is applicable to the situation where a node set is assigned to a training task in a distributed training scenario. The method can be executed by a distributed task processing device, which may be implemented in software and/or hardware and integrated in the node controller.

[0107] As shown in Figure 3, the method includes:

[0108] S310. Acquire a target task, where the target task includes multiple subtasks.

[0109] In the embodiment of the present invention, after acquiring the target task, the node controller allocates the target node set for the target task.

[0110] S320. According to the target task and currently available computing resources, create a target node set matching the target task, and assign the same task identifier to each node in the target node set.

[0111] In the embodiment of the present invention...
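A minimal sketch of S310 and S320 follows, with invented names throughout (`TargetTask`, `NodeHandle`, `create_target_node_set`): the controller sizes the target node set from the number of subtasks and the currently available computing resources, then stamps every node in the set with the same task identifier.

```python
# Hedged sketch of S310/S320: size a target node set from the task and the
# available resources, and give every node the same task identifier.
# All names are illustrative assumptions, not the patent's implementation.
import uuid
from dataclasses import dataclass

@dataclass
class TargetTask:
    subtasks: list

@dataclass
class NodeHandle:
    name: str
    task_id: str

def create_target_node_set(task, available_nodes):
    # Use no more nodes than there are subtasks or free resources.
    n = min(len(task.subtasks), len(available_nodes))
    task_id = f"task-{uuid.uuid4().hex[:8]}"  # shared task identifier
    return [NodeHandle(name, task_id) for name in available_nodes[:n]]

nodes = create_target_node_set(TargetTask(["s0", "s1", "s2"]),
                               ["gpu-a", "gpu-b", "gpu-c", "gpu-d"])
print([(n.name, n.task_id) for n in nodes])  # three nodes, one shared ID
```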


Abstract

The embodiment of the invention discloses a distributed task processing method, device, equipment, storage medium and system. The method comprises: according to the node role of a local node in a target node set, obtaining matched subtasks from a target task for processing, the subtasks in the target task being jointly processed by all nodes in the target node set; when it is detected that the target node set has changed, suspending the current processing; and re-determining a new node role of the local node in the changed target node set, and according to the new node role, obtaining matched subtasks from the target task to continue processing. By using the technical scheme of the invention, elastic expansion and contraction of nodes during task processing can be realized, and task processing efficiency is improved.
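The loop below is a minimal sketch of that suspend/re-role/resume behavior, under strong simplifying assumptions: membership is observed by polling a `get_members` callable, the node role is simply the node's rank in the sorted member list, the local node is assumed to stay in the set, and subtasks finished by departed nodes are not redistributed. None of these names come from the patent.

```python
# Minimal sketch of the elastic loop: process owned subtasks, and when the
# target node set changes, suspend, re-derive the node role from the new
# membership, and continue. All helper names are assumptions; the local
# node is assumed to remain in the set.

def run(node_id, get_members, subtasks, process_one):
    done = set()                            # indices this node has finished
    while True:
        members = sorted(get_members())     # current target node set
        rank = members.index(node_id)       # re-determined node role
        mine = [i for i in range(len(subtasks))
                if i % len(members) == rank and i not in done]
        if not mine:
            break                           # nothing left under the current role
        for i in mine:
            if sorted(get_members()) != members:
                break                       # set changed: suspend and re-derive
            process_one(subtasks[i])
            done.add(i)

# Example: node-0 in a 2-node set processes subtasks 0, 2 and 4.
members = {"node-0", "node-1"}
run("node-0", lambda: members, [f"s{i}" for i in range(6)], print)
```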

Description

technical field

[0001] Embodiments of the present invention relate to the fields of artificial intelligence and deep learning, and in particular to a distributed task processing method, device, equipment, storage medium and system.

Background technique

[0002] Distributed training can be used in deep learning to train deep neural networks: the workload used to train the model is split and shared among multiple processors called worker nodes, and each node works in parallel to speed up model training.

[0003] In existing distributed training, once a model training task starts, the number of nodes is fixed and cannot be changed. If computing resources are idle while the model training task is being processed, the number of nodes for the task cannot be expanded, resulting in a waste of computing resources. And if, while the model training task is being processed, computing resources are insufficient and a high-priority model training task needs to...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F9/50
CPC: G06F9/5061
Inventor: 刘柏芳
Owner: GUANGZHOU HUYA TECH CO LTD