Cluster training node allocation method and electronic equipment

A node allocation technique applied in resource allocation, neural network learning methods, program control design, and related areas. It addresses improper resource allocation, frequent error reports, and the difficulty of dynamically optimizing the allocation of computing resources.

Pending Publication Date: 2021-06-08
杭州幻方人工智能基础研究有限公司 (Hangzhou Huanfang Artificial Intelligence Basic Research Co., Ltd.)

AI Technical Summary

Problems solved by technology

Unless the user actively disables a node that reports many errors, node allocation is generally not adjusted according to the running status. This one-way, static allocation makes it difficult to optimize computing-resource allocation dynamically so that nodes in good condition are fully utilized and reused.
This can easily lead to improper resource allocation, resource overload, more error reports, and a poor user experience.

Embodiment

[0024] Embodiment: as shown in Figure 1, a cluster training node allocation method includes the following steps:

[0025] (101) Set the training allocation parameters and submit a cluster training task. The training allocation parameters include the requested group names and the number n of requested nodes, and may also include the numbers of specified nodes.
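Step (101) can be pictured as building a small parameter record before submission. The sketch below assumes node identifiers are strings; the class `AllocationParams`, its field names, and the rule that specified nodes may not outnumber n are illustrative assumptions, not details taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AllocationParams:
    """Hypothetical container for the step (101) training allocation parameters."""

    group_names: tuple[str, ...]  # requested group names
    n: int  # number of requested nodes
    specified_nodes: frozenset[str] = field(default_factory=frozenset)  # optional

    def __post_init__(self) -> None:
        if not self.group_names:
            raise ValueError("at least one group name is required")
        if self.n < 1:
            raise ValueError("n must be a positive integer")
        # Assumption: specified nodes occupy slots in the n-node set,
        # so there cannot be more of them than requested nodes.
        if len(self.specified_nodes) > self.n:
            raise ValueError("more specified nodes than requested nodes")
```

For example, `AllocationParams(group_names=("a100",), n=4, specified_nodes=frozenset({"node-07"}))` describes a four-node request that must include one particular node (all names here are made up).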

[0026] The training task may be a machine learning model training program, such as a neural network training program or another deep learning model training program. The user develops and debugs the training task through the client development module to make sure it can initially run end to end. Once development is complete, the training task can be uploaded to the cluster training task queue to wait for assignment.
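The hand-off from the client to the cluster can be pictured as a first-in, first-out queue of pending tasks. This is a minimal sketch assuming FIFO order; the `TrainingTask` fields and the `TrainingTaskQueue` class with its `submit` and `next_task` methods are hypothetical, since the text only says that uploaded tasks wait in the queue for assignment.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainingTask:
    """Hypothetical queued training task."""

    task_id: str
    entry_point: str  # e.g. the training script to run (illustrative field)


class TrainingTaskQueue:
    """Hypothetical FIFO queue of tasks waiting for node assignment."""

    def __init__(self) -> None:
        self._pending: deque[TrainingTask] = deque()

    def submit(self, task: TrainingTask) -> None:
        # An uploaded task joins the back of the queue.
        self._pending.append(task)

    def next_task(self) -> Optional[TrainingTask]:
        # The scheduler takes the oldest waiting task, or None if the queue is empty.
        return self._pending.popleft() if self._pending else None

    def __len__(self) -> int:
        return len(self._pending)
```

A real scheduler might reorder tasks (for example by user quota); plain FIFO is used here only because it is the simplest reading of "wait for task assignment".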

[0027] The cluster in this embodiment includes multiple nodes; each node is a computer server with one or more GPUs. Most of the operations involved in deep learning are vectoriz...
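Given that a node is a server with one or more GPUs, the available node set M that the allocation steps in the abstract start from can be derived from per-node state. The `Node` fields `busy` and `disabled`, and the rule that a node is available only when it has at least one GPU and is neither busy nor disabled, are assumptions for illustration; the visible text does not define availability.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical record for one cluster node (a GPU server)."""

    node_id: str
    gpu_count: int
    busy: bool = False  # assumed field: already running another task
    disabled: bool = False  # assumed field: taken out of service, e.g. by the user


def available_nodes(nodes: list[Node]) -> set[str]:
    """Return the IDs of nodes that have GPUs and are neither busy nor disabled."""
    return {
        node.node_id
        for node in nodes
        if node.gpu_count >= 1 and not node.busy and not node.disabled
    }
```

Returning a set of IDs keeps the later intersection with the requested node set a single set operation.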

Abstract

The invention relates to the field of node allocation, in particular to a cluster training node allocation method and electronic equipment. The method comprises the following steps: setting training allocation parameters and submitting a cluster training task; obtaining an available node set M, obtaining the list of all grouping information and the node lists corresponding to the selected requested group names, and thereby obtaining a requested node set Y; obtaining a pre-allocated available node set Z = M ∩ Y; checking whether the number of nodes in Z meets the number of requested nodes; if so, establishing an allocation execution node set whose capacity equals the number of requested nodes; putting the specified nodes into the allocation execution node set; sorting the nodes in Z by priority and putting them into the allocation execution node set in turn until it is full; and locking the allocation execution nodes, distributing the task, and starting training. The method optimizes the way nodes are allocated, makes full use of equipment resources, and reduces the overall error-report rate and failure rate of training tasks.
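The allocation steps in the abstract can be sketched as a single function. This is a minimal illustration under stated assumptions, not the patented implementation: node IDs are strings, the function `allocate_nodes` and its parameters are hypothetical, the priority criterion is not given in the visible text so it is passed in as a scoring function (the example scores nodes by a made-up error count), specified nodes are assumed to have to belong to Z, and "locking" is modeled as removing the chosen nodes from M.

```python
from typing import Callable, Iterable, Mapping, Optional


def allocate_nodes(
    available: set[str],
    group_members: Mapping[str, Iterable[str]],
    requested_groups: Iterable[str],
    n: int,
    specified: Iterable[str] = (),
    priority: Callable[[str], float] = lambda node_id: 0.0,
) -> Optional[list[str]]:
    """Select n node IDs, or return None if the request cannot be met.

    available        -- M, IDs of currently available nodes (mutated on success)
    group_members    -- grouping information: group name -> node IDs
    requested_groups -- the requested group names
    n                -- number of requested nodes
    specified        -- optional specified node IDs
    priority         -- hypothetical score; higher-scoring nodes are chosen first
    """
    # Y: all nodes listed under the requested groups.
    requested: set[str] = set()
    for name in requested_groups:
        requested.update(group_members.get(name, ()))

    # Z = M ∩ Y, the pre-allocated available node set.
    pre_allocated = available & requested
    if len(pre_allocated) < n:
        return None

    # Allocation execution node set (capacity n); specified nodes go in first.
    chosen: list[str] = []
    for node_id in specified:
        if node_id not in pre_allocated:  # assumption: specified nodes must be in Z
            return None
        if node_id not in chosen:
            chosen.append(node_id)
    if len(chosen) > n:
        return None

    # Fill the remaining slots from Z by descending priority, ties broken by ID.
    rest = sorted(pre_allocated - set(chosen), key=lambda x: (-priority(x), x))
    chosen.extend(rest[: n - len(chosen)])

    # "Lock" the chosen nodes by removing them from M (assumed locking model).
    available.difference_update(chosen)
    return chosen


if __name__ == "__main__":
    M = {"n1", "n2", "n3", "n4", "n5"}
    groups = {"a100": ["n1", "n2", "n3"], "v100": ["n4", "n6"]}
    error_counts = {"n1": 5, "n2": 0, "n3": 1, "n4": 2}  # made-up data
    print(allocate_nodes(M, groups, ["a100", "v100"], 3, specified=["n4"],
                         priority=lambda x: -error_counts.get(x, 0)))
    # → ['n4', 'n2', 'n3']
    print(sorted(M))  # → ['n1', 'n5']
```

Because M is only modified after every check passes, a request that cannot be satisfied leaves the available set untouched; what the scheduler then does with the waiting task is not specified in the abstract.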

Description

Technical field

[0001] The invention relates to the field of node allocation, in particular to a cluster training node allocation method and electronic equipment.

Background art

[0002] With the development of AI technology, many complex AI models need to be trained on computing clusters. Because multiple users share the cluster's node resources at the same time, the resources of the node computers must be allocated to each single-machine or multi-machine training task.

[0003] In the prior art, node allocation is one-way and static: either the user directly selects a training group of node computers, or the system allocates directly according to the remaining node resources. Unless the user actively disables a node that reports many errors, node allocation is generally not adjusted according to the running status. This one-way, static allocation makes it difficult to achieve dynamic op...


Application Information

IPC (8): G06F9/50, G06F9/48, G06F9/455, G06F11/30, G06N3/08, G06N20/00
CPC: G06F9/5038, G06F9/4881, G06F9/45558, G06F11/3006, G06F11/3051, G06N20/00, G06N3/08, G06F2009/45595, G06F2009/45562, G06F2009/45575
Inventors: 郑达韡 (Zheng Dawei), 徐进 (Xu Jin)
Owner: 杭州幻方人工智能基础研究有限公司 (Hangzhou Huanfang Artificial Intelligence Basic Research Co., Ltd.)