Model training method, server and computer readable storage medium

A model training technology, applied in the field of computing and digital computer components, that addresses problems such as the inability to exploit parallel computing capability, wasted hardware resources, and low system speed-up ratios, with the effect of eliminating computing-power and bandwidth bottlenecks and improving the model training speed-up ratio.

Active Publication Date: 2019-08-16
ZTE CORP

AI Technical Summary

Problems solved by technology

Because a large amount of data must be exchanged, the bandwidth and CPU processing power between the parameter server (PS) and the Workers often become the bottleneck, so the powerful parallel computing capability of the GPUs on the Workers cannot be exploited, resulting in a low system speed-up ratio, poor scalability, and wasted hardware resources.

Examples


Embodiment 1

[0029] As shown in Figure 3, an embodiment of the present invention provides a model training method. The method includes:

[0030] S301. After receiving a training job, acquire job information, where the job information includes a model, sample data, and a number of iterations.

[0031] Specifically, after receiving the training job submitted by the user, the task management system extracts the job information from it. The job information can include the deep learning model, sample data, resource requirements, and the number of training iterations. The model is generally in the form of program code written in a computer programming language, and the training system refers to the task management system that manages the GPU cluster together with a general training platform (such as TensorFlow, Caffe2, etc.).
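As an illustration only (the patent does not prescribe a concrete data structure), the job information acquired in S301 could be held in a small record such as the following Python sketch; the field and function names here are hypothetical, not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class TrainingJobInfo:
    """Hypothetical container for the job information extracted in S301."""
    model_code: str        # model definition, e.g. Python source for TensorFlow/Caffe2
    sample_data_path: str  # location of the training samples
    num_gpus: int          # resource requirement: how many GPUs to use
    num_iterations: int    # number of training iterations requested by the user

def parse_training_job(job_request: dict) -> TrainingJobInfo:
    """Extract job information from a submitted training job (illustrative only)."""
    return TrainingJobInfo(
        model_code=job_request["model"],
        sample_data_path=job_request["samples"],
        num_gpus=job_request.get("gpus", 1),
        num_iterations=job_request["iterations"],
    )
```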

[0032] S302. Copy the model to each GPU, and synchronize the initial values of the model parameters of each GPU.

[0033] Specifically, the traini...
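A minimal NumPy sketch of what S302 amounts to, with GPUs simulated as plain Python dictionaries: every "GPU" receives its own copy of the model parameters, all initialized to identical values (equivalent to a broadcast from GPU 0). The function name and the use of NumPy are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def replicate_and_sync(init_params, num_gpus):
    # Give every simulated GPU its own copy of the model parameters,
    # all starting from the same initial values (a broadcast from GPU 0).
    return [{name: value.copy() for name, value in init_params.items()}
            for _ in range(num_gpus)]

# Example: two parameter tensors replicated onto 4 simulated GPUs.
init_params = {"w": np.random.randn(3, 3), "b": np.zeros(3)}
per_gpu_params = replicate_and_sync(init_params, num_gpus=4)
assert all(np.array_equal(per_gpu_params[0]["w"], p["w"]) for p in per_gpu_params)
```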

Embodiment 2

[0041] After the first embodiment described above is implemented, a global reduction of the parameter gradients must be completed across all GPUs. Typical deep learning models have parameters on the order of millions, tens of millions, or hundreds of millions, and these parameter gradients are usually organized as a large number of multidimensional matrix arrays. Carrying out the global reduction on these arrays one by one across all GPUs incurs a very large additional overhead. To solve this problem, Embodiment 2 of the present invention exploits the fact that communication protocols have comparatively low overhead when handling long messages, and adds aggregation and splitting operations before and after the parameter gradient global reduction, so that the initial N small parameter gradient multidimensional matrix arrays are merged into M (1 ≤ M < N) larger arrays, as shown in Figure 4 ...
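The aggregate-then-split idea can be sketched in NumPy as follows; here the N gradient arrays are fused into a single buffer (the M = 1 case) before the reduction and split back afterwards. The sum over a Python list stands in for the actual inter-GPU global reduction, and all names are illustrative rather than taken from the patent.

```python
import numpy as np

def fuse_gradients(grads):
    # Flatten and concatenate N small gradient arrays into one long buffer,
    # so the reduction sends one long message instead of N short ones.
    shapes = [g.shape for g in grads]
    flat = np.concatenate([g.ravel() for g in grads])
    return flat, shapes

def split_gradients(flat, shapes):
    # Split the reduced buffer back into arrays with the original shapes.
    out, offset = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        out.append(flat[offset:offset + size].reshape(shape))
        offset += size
    return out

# 4 simulated GPUs, each holding the same two gradient arrays.
per_gpu_grads = [[np.random.randn(4, 4), np.random.randn(10)] for _ in range(4)]
fused_buffers, shapes = zip(*(fuse_gradients(g) for g in per_gpu_grads))
reduced = np.sum(fused_buffers, axis=0)              # stands in for the global reduction
reduced_grads = split_gradients(reduced, shapes[0])  # restore the original layout
```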

Embodiment 3

[0052] When performing the parameter gradient global reduction on a single-node or multi-node GPU cluster, communication between nodes and within a node may pass through various transmission media such as NVLink, PCIe, IB, or ETH. In general, the media between GPUs within a node (such as NVLink or PCIe) have high bandwidth, while the bandwidth between nodes is low, so directly synchronizing all parameter gradients across nodes and within nodes makes the lower-bandwidth media (such as IB or ETH) a bottleneck. To solve this problem, the third embodiment of the present invention splits the parameter gradient global reduction into multiple steps and divides the GPUs interconnected by high-bandwidth links within a node into logical global reduction groups. First, the GPUs within a group perform a global reduction operation, and then inter-group synchronization is performed through "representatives" selected from each group, so that the global reduction reduces ...
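The grouping idea of Embodiment 3 can be illustrated with a small NumPy simulation: GPUs on the same node form a group, each group first reduces internally over its fast links, only one "representative" per group exchanges data over the slow inter-node link, and the result is then redistributed inside each group. Function and variable names are hypothetical, and the sums stand in for the actual reduction primitives.

```python
import numpy as np

def hierarchical_allreduce(per_gpu_grads, group_size):
    num_gpus = len(per_gpu_grads)
    groups = [list(range(i, i + group_size)) for i in range(0, num_gpus, group_size)]

    # Step 1: reduce inside each high-bandwidth group (e.g. NVLink/PCIe within a node).
    group_sums = [np.sum([per_gpu_grads[g] for g in grp], axis=0) for grp in groups]

    # Step 2: only the group "representatives" reduce across the slower inter-node link.
    total = np.sum(group_sums, axis=0)

    # Step 3: each representative redistributes the result inside its own group.
    return [total.copy() for _ in range(num_gpus)]

# 8 simulated GPUs, 4 per node: only 2 representatives cross the slow link.
grads = [np.full(5, float(i)) for i in range(8)]
reduced = hierarchical_allreduce(grads, group_size=4)
assert np.allclose(reduced[0], np.sum(grads, axis=0))
```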


Abstract

The invention discloses a model training method, a server and a computer-readable storage medium, and belongs to the field of artificial intelligence computing. The model training method comprises the steps of: after a training job is received, acquiring job information; copying a model to each GPU and synchronizing the initial values of the model parameters of each GPU; in each iteration, extracting a part of the sample data, splitting the extracted sample data, distributing the split sample data to different GPUs for training, performing a global reduction operation on the parameter gradients obtained by the training on all GPUs, and updating the model parameters on the GPUs according to the reduced parameter gradients; and after the specified number of iterations is completed, selecting the model parameters of any GPU and storing them as the model training result. According to the model training method, the bandwidth bottleneck and the computing-power bottleneck between the computing nodes are eliminated by fully utilizing the high-speed GPU-to-GPU data transmission bandwidth, so that the synchronous training efficiency and the speed-up ratio of the model on the GPU cluster are improved.
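Read end to end, the abstract describes a synchronous data-parallel loop. The following Python sketch shows that flow with simulated GPUs; grad_fn is a stand-in for whatever backpropagation the model performs, and everything else (names, the plain learning-rate update) is illustrative rather than the patent's implementation.

```python
import numpy as np

def train(per_gpu_params, sample_data, num_iterations, batch_size, grad_fn, lr=0.01):
    # Synchronous data-parallel loop: split each batch across GPUs, reduce the
    # gradients globally, and apply the same update on every replica.
    num_gpus = len(per_gpu_params)
    for step in range(num_iterations):
        start = (step * batch_size) % len(sample_data)
        batch = sample_data[start:start + batch_size]         # extract part of the samples
        shards = np.array_split(batch, num_gpus)              # split them across the GPUs
        grads = [grad_fn(p, s) for p, s in zip(per_gpu_params, shards)]
        reduced = {k: np.mean([g[k] for g in grads], axis=0)  # global reduction of gradients
                   for k in grads[0]}
        for params in per_gpu_params:                         # identical update on every GPU
            for k in params:
                params[k] -= lr * reduced[k]
    return per_gpu_params[0]   # after the last iteration, any replica is the trained model
```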

Description

technical field [0001] The invention relates to the field of artificial intelligence computing, in particular to a model training method, a server and a computer-readable storage medium. Background technique [0002] Training deep learning models requires huge computing resources, and a single training run can take days or even months to complete. To speed up model training, large-scale GPU (Graphics Processing Unit) clusters are often used for parallel training. At the same time, deep learning training tasks often use parallel algorithms to distribute the training work across multiple computing nodes that run simultaneously. There are two types of parallel algorithms, data parallelism and model parallelism, of which data parallelism is more commonly used. [0003] In data parallel algorithms, as shown in Figure 1, a PS-Worker (Parameter Server-Worker, i.e., parameter server-computing node) architecture [0004] is usually deployed on GPU c...
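For contrast with the method above, a minimal sketch of the PS-Worker pattern this background section describes (names are illustrative): every worker's gradients travel to the parameter server, the server applies the update on its CPU, and the workers then pull the new parameters, which is where the bandwidth and CPU bottleneck arises.

```python
import numpy as np

def ps_worker_step(server_params, worker_grads, lr=0.01):
    # All gradients cross the PS link and the update is applied on the server's CPU.
    aggregated = {k: np.mean([g[k] for g in worker_grads], axis=0)
                  for k in server_params}
    for k in server_params:
        server_params[k] -= lr * aggregated[k]
    # Every worker then pulls a fresh copy of the parameters from the server.
    return [{k: v.copy() for k, v in server_params.items()} for _ in worker_grads]
```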


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F15/163, G06N3/08
CPC: G06F15/163, G06N3/08, Y02D10/00
Inventor: 戎海栋
Owner: ZTE CORP