Model training method and apparatus, storage medium, and program product

CN120654764B · Active · Publication Date: 2026-06-30 · MOORE THREADS TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
MOORE THREADS TECH CO LTD
Filing Date
2025-06-13
Publication Date
2026-06-30

Abstract

This disclosure relates to a model training method and apparatus, a storage medium, and a program product. In the forward propagation phase, for each checkpoint module, the method stores the module's input and output in GPU memory and releases the module's intermediate activation values from GPU memory. The input is kept so that the intermediate activation values can be recomputed later, and the output is used for the forward computation of the modules that follow the checkpoint module. In the backpropagation phase, for each checkpoint module whose last layer is a linear layer, the method skips the forward recomputation of that last layer, computes the gradient from the gradient formula corresponding to that layer, and completes the backpropagation of the checkpoint module based on the gradients of each layer in the module. Because the last layer's forward pass is not recomputed, the disclosure reduces computational overhead while achieving the same computational accuracy and GPU memory savings as standard recomputation schemes.
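The idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the module layout (Linear, tanh, Linear), the class name `CheckpointModule`, the helper functions, and the `linear_forward_calls` counter are all hypothetical, chosen only for illustration. The sketch relies on the standard gradient formulas of a linear layer y = W2·h + b2, namely dW2 = dy ⊗ h, db2 = dy, and dh = W2ᵀ·dy, none of which need the value of y, so the last layer's forward pass does not have to be repeated during recomputation.

```python
import math

def matvec(W, v):
    """Return W @ v for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def matvec_t(W, v):
    """Return W^T @ v."""
    return [sum(W[i][j] * v[i] for i in range(len(W))) for j in range(len(W[0]))]

def outer(a, b):
    """Return the outer product a b^T."""
    return [[ai * bj for bj in b] for ai in a]

class CheckpointModule:
    """Hypothetical checkpoint module: Linear(W1, b1) -> tanh -> Linear(W2, b2)."""

    def __init__(self, W1, b1, W2, b2):
        self.W1, self.b1, self.W2, self.b2 = W1, b1, W2, b2
        self.linear_forward_calls = 0  # counts linear-layer forward passes

    def _linear(self, W, b, v):
        self.linear_forward_calls += 1
        return [s + bi for s, bi in zip(matvec(W, v), b)]

    def forward(self, x):
        h = [math.tanh(z) for z in self._linear(self.W1, self.b1, x)]
        y = self._linear(self.W2, self.b2, h)
        # Keep only the module input and output; the intermediate
        # activation h is dropped here and recomputed in backward().
        self.saved_input, self.saved_output = x, y
        return y

    def backward(self, grad_y):
        x = self.saved_input
        # Recompute the intermediate activation from the saved input.
        h = [math.tanh(z) for z in self._linear(self.W1, self.b1, x)]
        # The last layer is linear: skip its forward pass and apply its
        # gradient formulas, which need only h, W2 and grad_y.
        grad_W2 = outer(grad_y, h)
        grad_b2 = list(grad_y)
        grad_h = matvec_t(self.W2, grad_y)
        # Backpropagate through tanh and the first linear layer.
        grad_z = [g * (1.0 - hi * hi) for g, hi in zip(grad_h, h)]
        grad_W1 = outer(grad_z, x)
        grad_b1 = list(grad_z)
        grad_x = matvec_t(self.W1, grad_z)
        return grad_x, (grad_W1, grad_b1, grad_W2, grad_b2)
```

In this sketch, one forward pass plus one backward pass runs three linear-layer forward computations (two in `forward`, one in `backward`). A standard recomputation scheme that replays the whole module would run four. The gradients are the same in both cases because the skipped output is never used by the gradient formulas.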