
Parameter updating method, device and apparatus of AI distributed training system

A parameter updating technology for AI distributed training systems, applied in the computer field. It addresses the problem that the overall performance of an AI distributed training system is limited by slow worker nodes or slow transmission links, for which there has been no effective solution.

Active Publication Date: 2020-11-27
INSPUR SUZHOU INTELLIGENT TECH CO LTD
4 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

However, in practical applications, new kinds of computing devices are continuously added to the worker nodes of an AI distributed training system. In this case, if the above three algorithms are used to update the parameters in the system, its overall performance is limited by the worker node with the slowest computing performance or by the communication link with the slowest transmission in the distributed computing environment. At present, there is no effective solution to this technical problem.
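The bottleneck described above can be illustrated with a small sketch. It assumes a fully synchronous update in which every worker must finish its computation and communication before the next iteration starts. The function name and all timings are hypothetical and chosen only for illustration; they do not come from the patent.

```python
def synchronous_step_time(compute_s, comm_s):
    """Duration of one synchronous iteration.

    Every worker waits for the slowest one, so the step time is the
    maximum of (compute time + communication time) over all workers.
    """
    return max(c + t for c, t in zip(compute_s, comm_s))


# Hypothetical per-worker timings in seconds (illustrative values only).
compute_s = [0.10, 0.12, 0.45]  # the third worker is a slower device
comm_s = [0.02, 0.02, 0.05]     # the third worker also has a slower link

step = synchronous_step_time(compute_s, comm_s)
print(f"synchronous step time: {step:.2f} s")
```

Under this assumption, the two fast workers sit idle for most of each iteration, which is the limitation the text attributes to mixed (heterogeneous) worker nodes.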

Embodiment Construction

[0046] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0047] See figure 1, which is a flowchart of a parameter update method for an AI distributed training system provided by an embodiment of the present invention. The parameter update method includes:

[0048] Step S11: When the distributed heterogeneous system needs to complete the AI acceleration task, start the training task of the AI algorithm model on the target worker node of the distributed heterogeneous system, and initialize the model paramet...

Abstract

The invention discloses a parameter updating method for an AI distributed training system. The method comprises: starting a training task of an AI algorithm model on a target worker node of the distributed heterogeneous system; controlling the node to load model parameters; randomly selecting sample data for the k-th training iteration of the node; performing a gradient update on the model parameters; randomly creating a target node set; using the set to update the adjacency matrix with non-zero values; and using the updated adjacency matrix to update the model parameters on each node in the set. When the k-th training iteration is completed, if the AI algorithm model converges, iterative training is repeated on the node until the node completes M training iterations, at which point the distributed heterogeneous system is judged to have completed the AI acceleration task. The method reduces the communication bandwidth each worker node in the distributed computing cluster requires during parameter synchronization while supporting a hybrid heterogeneous distributed computing environment.
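The sequence in the abstract (local gradient update, random node set, non-zero update of the matrix, parameter update within the set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the abstract does not say which non-zero values are written into the matrix, so uniform averaging weights over the selected set are assumed here, and all function names (`local_gradient_step`, `build_mixing_matrix`, `mix_parameters`) are hypothetical.

```python
import random


def local_gradient_step(params, grad, lr):
    """Plain gradient-descent update of one node's parameters."""
    return [p - lr * g for p, g in zip(params, grad)]


def build_mixing_matrix(n_nodes, node_set):
    """Start from the identity matrix, then write non-zero weights into the
    rows of the selected nodes. ASSUMPTION: uniform weights 1/len(node_set);
    the source does not specify the values."""
    w = 1.0 / len(node_set)
    A = [[1.0 if i == j else 0.0 for j in range(n_nodes)] for i in range(n_nodes)]
    for i in node_set:
        for j in range(n_nodes):
            A[i][j] = w if j in node_set else 0.0
    return A


def mix_parameters(all_params, A, node_set):
    """Replace each selected node's parameters by the A-weighted combination
    of all nodes' parameters; nodes outside the set are left unchanged."""
    n, dim = len(all_params), len(all_params[0])
    mixed = {
        i: [sum(A[i][j] * all_params[j][d] for j in range(n)) for d in range(dim)]
        for i in node_set
    }
    for i, p in mixed.items():
        all_params[i] = p
    return all_params


# Illustrative run: 4 nodes, one scalar parameter each, made-up gradients.
rng = random.Random(0)
params = [[0.0], [2.0], [4.0], [6.0]]
grads = [[0.5], [0.5], [0.5], [0.5]]
params = [local_gradient_step(p, g, lr=0.1) for p, g in zip(params, grads)]
node_set = rng.sample(range(len(params)), 2)   # randomly created node set
A = build_mixing_matrix(len(params), node_set)
params = mix_parameters(params, A, node_set)
print(node_set, params)
```

Because only the nodes in the random set exchange parameters in a given iteration, no node has to synchronize with the whole cluster at once, which is consistent with the reduced bandwidth requirement the abstract claims.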

Description

Technical field

[0001] The present invention relates to the field of computer technology, in particular to a parameter updating method, device, equipment and medium for an AI distributed training system.

Background technique

[0002] In practical applications, distributed clusters are often used to accelerate the training of AI (Artificial Intelligence) algorithm models. When multiple worker nodes in a distributed cluster perform data-parallel training of an AI algorithm model, the same model is first deployed on each worker node, and the labeled training data is processed iteratively in batches. In each iteration, a batch of training data is divided into N micro-batches according to the number of worker nodes, and these N micro-batches are distributed to different worker nodes for model training. Finally, when all worker nodes complete the training data of each micr...
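The batch-splitting step described in the background can be sketched as below. The helper name `split_into_micro_batches` and the contiguous, near-equal split are assumptions made for illustration; the source only states that a batch is divided into N micro-batches according to the number of worker nodes.

```python
def split_into_micro_batches(batch, n_workers):
    """Split one batch into n_workers contiguous micro-batches.

    Sizes differ by at most one sample; the first (len(batch) % n_workers)
    micro-batches receive the extra sample.
    """
    base, extra = divmod(len(batch), n_workers)
    micro_batches, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        micro_batches.append(batch[start:start + size])
        start += size
    return micro_batches


# Illustrative use: 10 samples distributed over 3 worker nodes.
batch = list(range(10))
for worker_id, mb in enumerate(split_into_micro_batches(batch, 3)):
    print(f"worker {worker_id}: {mb}")
```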

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N20/00, G06F16/23, G06F16/27, H04L29/08
CPC: G06N20/00, G06F16/23, G06F16/27, H04L67/10
Inventor: 郭振华, 范宝余, 曹芳, 赵雅倩, 李仁刚
Owner: INSPUR SUZHOU INTELLIGENT TECH CO LTD