
Tensor storage management method for large model training

A storage management method for large-model training, in the field of computer technology, that addresses problems such as system crashes caused by static memory partitioning and achieves strong, flexible functionality.

Pending Publication Date: 2022-08-05
北京潞晨科技有限公司

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to provide a tensor storage management method for large-scale model training, addressing the fact that DeepSpeed, the best current heterogeneous training solution described in the background above, still leaves substantial room for optimization: DeepSpeed statically partitions model data between CPU and GPU memory, and this memory layout is fixed across different training configurations. When the GPU memory or the CPU memory is insufficient for its assigned share of the model data, the system crashes even though free memory remains on the other device.
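The static-versus-dynamic contrast described above can be illustrated with a small toy model. This is a hypothetical sketch, not the patented implementation: the `MemoryPool`, `place_static`, and `place_dynamic` names are invented here for illustration only.

```python
class MemoryPool:
    """Toy model of one device's memory, tracked in MB."""
    def __init__(self, name, capacity_mb):
        self.name = name
        self.capacity_mb = capacity_mb
        self.used_mb = 0

    def can_fit(self, size_mb):
        return self.used_mb + size_mb <= self.capacity_mb

    def allocate(self, size_mb):
        if not self.can_fit(size_mb):
            raise MemoryError(f"{self.name} pool exhausted")
        self.used_mb += size_mb


def place_static(tensors_mb, gpu, cpu, gpu_ratio=0.5):
    """Static split (the DeepSpeed-style layout criticized above): a fixed
    fraction of tensors is pinned to the GPU. Raises MemoryError if either
    side runs out, even when the other device still has free memory."""
    split = int(len(tensors_mb) * gpu_ratio)
    for size in tensors_mb[:split]:
        gpu.allocate(size)
    for size in tensors_mb[split:]:
        cpu.allocate(size)


def place_dynamic(tensors_mb, gpu, cpu):
    """Dynamic placement: prefer the GPU, spill to the CPU when the GPU is
    full, so placement only fails when *both* pools are exhausted."""
    placement = []
    for size in tensors_mb:
        if gpu.can_fit(size):
            gpu.allocate(size)
            placement.append("gpu")
        elif cpu.can_fit(size):
            cpu.allocate(size)
            placement.append("cpu")
        else:
            raise MemoryError("both pools exhausted")
    return placement
```

With a 100 MB GPU pool, a 300 MB CPU pool, and five 60 MB tensors, the static 50/50 split crashes while placing its second GPU tensor, whereas dynamic placement fits one tensor on the GPU and spills the remaining four to the CPU.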


Image

(Three drawings, each titled "Tensor storage management method for large model training".)


Embodiment Construction

[0022] In the description of the present invention, unless otherwise stated, "plurality" means two or more. Orientation or positional terms such as "upper", "lower", "left", "right", "inner", "outer", "front end", "rear end", "head", and "tail" are based on the orientations or positional relationships shown in the accompanying drawings; they are used only for convenience and simplicity of description, do not indicate or imply that the referred device or element must have a particular orientation or be constructed and operated in a particular orientation, and are not to be construed as limiting the invention. Furthermore, the terms "first", "second", "third", etc. are used for descriptive purposes only and should not be construed as indicating or implying relative importance.

[0023] In the description of the present invention, it should be noted that, unless otherwise expressly specified and limited, the terms "...



Abstract

The invention discloses a tensor storage management method for large model training, relating to the field of computer technology. The method comprises memory management software for heterogeneous large-model training, the software comprising a memory manager and a memory information counter. The memory management software allows tensors to be distributed dynamically across the combined CPU-GPU storage space during training, so that model training breaks through the GPU's memory wall. The memory manager is responsible for the model-data tensors and marks their state information, and large-model training comprises a warm-up stage and a formal stage. The method periodically samples the system's CPU and GPU memory usage and accurately counts the non-model-data memory at each statistical moment; by managing how tensors are stored across the CPU and GPU, it enables training of larger models under the same storage hardware configuration.
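The warm-up statistics the abstract describes, periodically sampling device memory and separating out the non-model-data portion at each statistical moment, might be sketched as follows. This is one hypothetical reading of the text: the `MemoryTracer` class and its method names are invented here for illustration and do not appear in the patent.

```python
class MemoryTracer:
    """Toy memory information counter: subtracts the model data the
    manager itself placed from sampled total usage, and records the
    remainder as peak non-model memory per statistical moment."""

    def __init__(self):
        self.model_data_mb = 0       # memory the manager placed itself
        self.nonmodel_peak_mb = {}   # statistical moment -> peak non-model MB

    def on_tensor_placed(self, size_mb):
        self.model_data_mb += size_mb

    def on_tensor_freed(self, size_mb):
        self.model_data_mb -= size_mb

    def sample(self, moment, total_used_mb):
        """Called at each statistical moment during a warm-up iteration."""
        nonmodel = total_used_mb - self.model_data_mb
        prev = self.nonmodel_peak_mb.get(moment, 0)
        self.nonmodel_peak_mb[moment] = max(prev, nonmodel)

    def available_for_model(self, moment, capacity_mb):
        """After warm-up: memory the manager may safely use for model
        data at this moment without risking an out-of-memory crash."""
        return capacity_mb - self.nonmodel_peak_mb.get(moment, 0)
```

In the formal stage, a budget like `available_for_model(moment, capacity_mb)` would let a manager decide how many model-data tensors fit on each device at that point in the iteration, which is what allows the dynamic CPU-GPU placement described in the abstract.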

Description

Technical Field

[0001] The invention relates to the field of computer technology, and in particular to a tensor storage management method oriented to large model training.

Background

[0002] The emergence of pre-trained models (PTMs), represented by BERT and GPT, is a milestone in the field of natural language processing (NLP); NLP is entering the era of pre-training. A PTM uses a neural network of stacked Transformer layers to pre-train general language feature representations on large amounts of text, and then transfers the learned knowledge to different downstream tasks through fine-tuning. Using massive text data from the Internet, PTMs can capture subtle features of natural language and substantially improve performance on downstream tasks. The consensus in the AI community is therefore to adopt PTMs as the backbone for specific NLP tasks, rather than training models from scratch on task-specific datasets.

[0003] The source of a PTM's power is its parameter scale...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F 9/50; G06F 17/16; G06N 3/04
CPC: G06F 9/5016; G06F 9/5027; G06F 17/16; G06N 3/04; Y02D 10/00
Inventors: 方佳瑞, 尤鹏
Owner: 北京潞晨科技有限公司