Model training method, apparatus, device, and storage medium

By constructing an optimizer construction interface and a learnable parameter update interface, and customizing optimizer parameters and update methods, the problems of training efficiency and accuracy of deep learning models are solved, and more efficient model training is achieved.

CN115688912B · Active · Publication Date: 2026-06-12 · SHANGHAI SENSETIME INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHANGHAI SENSETIME INTELLIGENT TECH CO LTD
Filing Date
2022-06-06
Publication Date
2026-06-12


Abstract

The method comprises the following steps: an optimizer construction interface constructs an optimizer based on optimizer construction parameters; in response to completion of reverse propagation gradient calculation based on a deep learning model, a learnable parameter update interface determines updated learnable parameters based on the optimizer, learnable parameters in the deep learning model, and gradient information corresponding to the learnable parameters; and the learnable parameter update interface obtains the deep learning model with updated parameters based on the updated learnable parameters.

Description

Technical Field

[0001] This disclosure relates to, but is not limited to, the field of data processing technology, and in particular to a model training method, apparatus, device, and storage medium.

Background Technology

[0002] In the process of training a deep learning model, in order to make the output of the deep learning model closer to the real label and improve the accuracy of the deep learning model, it is necessary to build an optimizer to optimize and update the values of the learnable parameters in the deep learning model.

Summary of the Invention

[0003] In view of the above, the present disclosure provides at least one model training method, apparatus, device, storage medium, and program product.

[0004] The technical solution of this disclosure embodiment is implemented as follows:

[0005] On one hand, this disclosure provides a model training method, characterized in that the method includes:

[0006] The optimizer construction interface builds the optimizer based on the optimizer construction parameters;

[0007] In response to the completion of backpropagation gradient calculation based on the deep learning model, the learnable parameter update interface determines the updated learnable parameters based on the optimizer, the learnable parameters in the deep learning model, and the gradient information corresponding to the learnable parameters.

[0008] The learnable parameter update interface obtains the updated deep learning model based on the updated learnable parameters.

[0009] On the other hand, embodiments of this disclosure provide a model training apparatus, the apparatus comprising:

[0010] The optimizer construction interface is used to build the optimizer based on the optimizer construction parameters.

[0011] The learnable parameter update interface is used to, in response to the completion of backpropagation gradient calculation based on the deep learning model, determine the updated learnable parameters based on the optimizer, the learnable parameters in the deep learning model, and the gradient information corresponding to the learnable parameters.

[0012] The learnable parameter update interface is also used to obtain a deep learning model with updated parameters based on the updated learnable parameters.

[0013] In another aspect, embodiments of this disclosure provide a computer device including a memory and a processor, wherein the memory stores a computer program that can run on the processor, and the processor executes the program to implement some or all of the steps in the above-described method.

[0014] In another aspect, embodiments of this disclosure provide a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements some or all of the steps in the above-described method.

[0015] In another aspect, embodiments of this disclosure provide a computer program including computer-readable code, which, when executed in a computer device, causes a processor in the computer device to perform some or all of the steps in the above-described method.

[0016] In another aspect, embodiments of this disclosure provide a computer program product, the computer program product including a non-transitory computer-readable storage medium storing a computer program, wherein when the computer program is read and executed by a computer, it implements some or all of the steps in the above method.

[0017] In this embodiment, by setting an optimizer construction interface, custom optimizer construction parameters can be received to complete the construction of the optimizer. This allows different optimizers to be set for different training tasks, improving the overall training efficiency of the deep learning model. At the same time, by setting a learnable parameter update interface, the learnable parameters of the input deep learning model can be updated based on the constructed optimizer. This allows the deep learning model to be trained based on the specifically configured optimizer after backpropagation gradient calculation.

Brief Description of the Drawings

[0018] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this disclosure and, together with the specification, serve to illustrate the technical solutions of this disclosure.

[0019] Figure 1 is a schematic diagram illustrating the implementation process of a model training method provided in an embodiment of this disclosure;

[0020] Figure 2 is a schematic diagram illustrating the implementation process of a model training method provided in an embodiment of this disclosure;

[0021] Figure 3 is a schematic diagram illustrating the implementation process of a model training method provided in an embodiment of this disclosure;

[0022] Figure 4 is a schematic diagram illustrating the implementation process of a model training method provided in an embodiment of this disclosure;

[0023] Figure 5 is a schematic diagram of the composition structure of a model training apparatus provided in an embodiment of this disclosure;

[0024] Figure 6 is a schematic diagram of the hardware entity of a model training device provided in an embodiment of this disclosure.

Detailed Description

[0025] To make the objectives, technical solutions, and advantages of this disclosure clearer, the technical solutions of this disclosure are further described in detail below with reference to the accompanying drawings and embodiments. The described embodiments should not be regarded as limitations on this disclosure. All other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this disclosure.

[0026] In the following description, references to "some embodiments" describe a subset of all possible embodiments; however, it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments and may be combined with each other without conflict. The terms "first / second / third" are used merely to distinguish similar objects and do not represent a specific ordering of objects. It is understood that "first / second / third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of this disclosure described herein can be implemented in orders other than those illustrated or described herein.

[0027] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. The terminology used herein is for descriptive purposes only and is not intended to limit the scope of this disclosure.

[0028] This disclosure provides a model training method that can be executed by a processor of a computer device. The computer device may be a server, laptop computer, tablet computer, desktop computer, mobile device (e.g., mobile phone, portable video player, portable gaming device), or any other device with data processing capabilities. Figure 1 is a schematic diagram illustrating the implementation process of a model training method provided in an embodiment of this disclosure. As shown in Figure 1, the method includes the following steps S101 to S103:

[0029] Step S101: The optimizer construction interface constructs the optimizer based on the optimizer construction parameters.

[0030] In some embodiments, the optimizer is used to update the values of the learnable parameters in the deep learning model, so that the output of the deep learning model after the update is closer to the true label.

[0031] In some embodiments, the optimizer construction parameters can be used to determine the optimization algorithm corresponding to the optimizer. This optimization algorithm includes, but is not limited to, stochastic gradient descent (SGD), Momentum, AdaGrad, and Adam.

[0032] In some embodiments, the optimizer construction parameters can be used to determine the learning parameters corresponding to the optimizer. These learning parameters include, but are not limited to, the base learning rate, momentum coefficient, and weight decay coefficient.

[0033] Step S102: In response to the completion of the backpropagation gradient calculation based on the deep learning model, the learnable parameter update interface determines the updated learnable parameters based on the optimizer, the learnable parameters in the deep learning model, and the gradient information corresponding to the learnable parameters.

[0034] In some embodiments, during any iteration of training, the loss value corresponding to the current iteration of training can be obtained. After the backpropagation gradient calculation process is completed based on the loss value, the gradient information corresponding to the learnable parameters in the deep learning model can be obtained.

[0035] In some embodiments, the learnable parameters in the deep learning model and the gradient information corresponding to the learnable parameters can be received based on a preset learnable parameter update interface, and the updated learnable parameters can be determined based on the optimizer constructed above.

[0036] Step S103: The learnable parameter update interface obtains the parameter-updated deep learning model based on the updated learnable parameters.

[0037] In this embodiment, by setting an optimizer construction interface, custom optimizer construction parameters can be received to complete the construction of the optimizer. This allows different optimizers to be set for different training tasks, improving the overall training efficiency of the deep learning model. Simultaneously, by setting a learnable parameter update interface, the learnable parameters of the input deep learning model can be updated based on the constructed optimizer. This allows the deep learning model to be trained based on the specifically configured optimizer after backpropagation gradient calculation.
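Steps S101 to S103 can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the disclosure: the dictionary layout of the model, the configuration keys (`algorithm`, `lr`, `weight_decay`), and the plain SGD rule are all hypothetical, and the `closure` argument of the update interface is omitted for brevity.

```python
# Hypothetical sketch of steps S101-S103; data layout and SGD rule are assumptions.

def construct_optimizer(optimizer_config):
    """S101: build an optimizer from the optimizer construction parameters."""
    return {
        "algorithm": optimizer_config.get("algorithm", "sgd"),
        "lr": optimizer_config.get("lr", 0.01),
        "weight_decay": optimizer_config.get("weight_decay", 0.0),
    }


def update_learnable_parameter(model, optimizer):
    """S102-S103: compute updated parameters and write them back into the model."""
    lr, wd = optimizer["lr"], optimizer["weight_decay"]
    updated = {}
    for name, value in model["params"].items():
        grad = model["grads"][name] + wd * value  # weight decay folded into the gradient
        updated[name] = value - lr * grad         # plain SGD step
    model["params"] = updated                     # S103: model with updated parameters
    return model


# Toy model: one weight and one bias, with gradients from a finished backward pass.
model = {"params": {"w": 1.0, "b": 0.5}, "grads": {"w": 0.2, "b": -0.1}}
optimizer = construct_optimizer({"algorithm": "sgd", "lr": 0.1})
model = update_learnable_parameter(model, optimizer)
```

Keeping construction and update as two separate calls means the same training loop can be reused with a different optimizer configuration for a different training task.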

[0038] Figure 2 is an optional flowchart of the model training method provided in this disclosure, which can be executed by a processor of a computer device. The optimizer construction parameters include an optimization algorithm and optimization parameters. Based on Figure 1, S101 in Figure 1 can be updated to S201, and S102 can be updated to S202. The steps are described below with reference to Figure 2.

[0039] Step S201: The optimizer construction interface constructs an optimizer based on the optimization algorithm and optimization parameters; the optimizer is used to update the learnable parameters in the deep learning model based on the optimization strategy determined by the optimization algorithm and the optimization parameters.

[0040] In some embodiments, the optimization parameters include at least one of the following: base learning rate, momentum coefficient, and parameter decay coefficient.

[0041] In some embodiments, the optimizer construction interface constructs an optimizer based on an optimization algorithm and optimization parameters, including: the optimizer construction interface verifies the optimization algorithm and the optimization parameters to obtain a construction parameter verification result; if the construction parameter verification result indicates successful verification, the optimizer construction interface constructs the optimizer based on the optimization method identifier and optimization parameters carried in the optimizer construction request; if the construction parameter verification result indicates failed verification, the optimizer construction interface generates construction failure exception information.
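The verify-then-construct behaviour described above might look like the following sketch. The exception class, the set of supported algorithm names, and the concrete value ranges are assumptions made for illustration; the disclosure only states that verification either succeeds or produces construction failure exception information. The base learning rate is treated as required here.

```python
# Hypothetical validation sketch; algorithm names and value ranges are assumptions.

class OptimizerConstructionError(ValueError):
    """Raised when the optimizer construction parameters fail verification."""


SUPPORTED_ALGORITHMS = {"sgd", "momentum", "adagrad", "adam"}


def verify_construction_parameters(algorithm, params):
    """Return a list of problems; an empty list means verification succeeded."""
    problems = []
    if algorithm not in SUPPORTED_ALGORITHMS:
        problems.append(f"unknown optimization algorithm: {algorithm!r}")
    if not params.get("lr", 0.0) > 0.0:
        problems.append("base learning rate must be positive")
    if not 0.0 <= params.get("momentum", 0.0) < 1.0:
        problems.append("momentum coefficient must be in [0, 1)")
    if params.get("weight_decay", 0.0) < 0.0:
        problems.append("parameter decay coefficient must be non-negative")
    return problems


def construct_optimizer(algorithm, params):
    """Build the optimizer only if verification succeeds, otherwise raise."""
    problems = verify_construction_parameters(algorithm, params)
    if problems:
        raise OptimizerConstructionError("; ".join(problems))
    return {"algorithm": algorithm, **params}
```

Collecting every problem before raising lets a caller fix all invalid construction parameters in one pass rather than one error at a time.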

[0042] In step S202, in response to the completion of backpropagation gradient calculation based on the deep learning model, the learnable parameter update interface determines the updated learnable parameters based on the optimization algorithm, the optimization parameters, the learnable parameters in the deep learning model, and the gradient information corresponding to the learnable parameters.

[0043] Based on the above embodiments, by setting the optimizer construction interface, targeted optimization algorithms and optimization parameters can be received, and an optimizer suitable for the current training process can be generated based on the optimization algorithms and optimization parameters. During the process of updating learnable parameters based on the optimizer, the probability of overfitting or underfitting can be reduced, thereby improving the overall training efficiency of the deep learning model.

[0044] Figure 3 is an optional flowchart of the model training method provided in this disclosure, which can be executed by a processor of a computer device. Based on any of the above embodiments, and taking Figure 1 as an example, S102 in Figure 1 can be updated to steps S301 to S302, and S103 can be updated to steps S303 to S304. The steps are described below with reference to Figure 3.

[0045] Step S301: The learnable parameter update interface receives the deep learning model and a callback function; the deep learning model includes the learnable parameters and the gradient information corresponding to the learnable parameters;

[0046] In step S302, the learnable parameter update interface determines the updated learnable parameters based on the optimization strategy corresponding to the optimizer, the learnable parameters, and the gradient information corresponding to the learnable parameters.

[0047] Step S303: The learnable parameter update interface obtains an intermediate deep learning model based on the updated learnable parameters.

[0048] Step S304: The learnable parameter update interface performs at least one parameter fine-tuning process on the intermediate deep learning model through the callback function, and determines the intermediate deep learning model after the at least one parameter fine-tuning process as the parameter-updated deep learning model.

[0049] In some embodiments, the parameter fine-tuning process includes determining a new loss function value based on the intermediate deep learning model, and adjusting the model parameters of the intermediate deep learning model based on the new loss function value.
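One way to read steps S301 to S304 is sketched below. The contract assumed for the callback (it recomputes the loss and refreshes the stored gradients), the `fine_tune_rounds` argument, and the toy loss (w - 3)^2 are all hypothetical; the disclosure does not specify them.

```python
# Hypothetical sketch of S301-S304; the callback contract is an assumption.

def update_learnable_parameter(model, optimizer, closure=None, fine_tune_rounds=1):
    # S301-S302: read parameters and gradients, apply the optimizer's strategy.
    lr = optimizer["lr"]
    for name in model["params"]:
        model["params"][name] -= lr * model["grads"][name]
    # S303: `model` now plays the role of the intermediate deep learning model.
    if closure is not None:
        # S304: each round re-evaluates the loss, which refreshes the gradients,
        # and then adjusts the parameters based on that new loss.
        for _ in range(fine_tune_rounds):
            closure(model)
            for name in model["params"]:
                model["params"][name] -= lr * model["grads"][name]
    return model


def closure(model):
    """Recompute the loss (w - 3)^2 and its gradient for the current parameters."""
    w = model["params"]["w"]
    model["grads"]["w"] = 2.0 * (w - 3.0)
    return (w - 3.0) ** 2


model = {"params": {"w": 0.0}, "grads": {"w": -6.0}}  # gradient of (w - 3)^2 at w = 0
model = update_learnable_parameter(model, {"lr": 0.1}, closure, fine_tune_rounds=2)
```

With these numbers the weight moves from 0 toward the minimum at 3 in three small steps: one regular update followed by two fine-tuning passes.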

[0050] In some embodiments, the learnable parameter update interface generates a first model definition error exception message in response to a failure to parse the deep learning model.

[0051] In the process of obtaining the deep learning model through the learnable parameter update interface, if the parsing fails, a first model definition error exception message is generated; if the parsing succeeds, the learnable parameters and the gradient information corresponding to the learnable parameters in the deep learning model are obtained.

[0052] In some embodiments, the learnable parameter update interface generates model gradient anomaly information in response to the failure to obtain gradient information corresponding to the learnable parameters in the deep learning model.

[0053] Specifically, during the process of obtaining the gradient information corresponding to the learnable parameters of the deep learning model through the learnable parameter update interface, if the acquisition fails, model gradient anomaly information is generated; if the acquisition succeeds, the updated learnable parameters are determined based on the learnable parameters and their corresponding gradient information in the deep learning model. Acquisition failure can include the following situations: the gradient information calculation of the deep learning model is incomplete, or the gradient information of the learnable parameters exceeds a preset normal range.

[0054] In some embodiments, the learnable parameter update interface generates parameter update exception information in response to failure to update learnable parameters in the deep learning model.

[0055] Specifically, in the process of determining the updated learnable parameters based on the learnable parameters and the gradient information corresponding to the learnable parameters in the deep learning model, if the updated learnable parameters cannot be calculated, parameter update anomaly information is generated; if the updated learnable parameters can be calculated, the next step is executed.
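The gradient and update checks described above could be sketched as follows. The exception class names, the threshold `MAX_ABS_GRADIENT`, and the plain SGD rule are assumptions for illustration; the disclosure only names the two kinds of anomaly information.

```python
import math

# Hypothetical exception names and gradient bound; neither is specified by the source.


class ModelGradientError(RuntimeError):
    """Gradient information is missing or outside the preset normal range."""


class ParameterUpdateError(RuntimeError):
    """The updated learnable parameter could not be calculated."""


MAX_ABS_GRADIENT = 1e4  # assumed "preset normal range"


def checked_update(params, grads, lr):
    """Apply an SGD step, raising as soon as a gradient or result is unusable."""
    updated = {}
    for name, value in params.items():
        grad = grads.get(name)
        if grad is None:
            raise ModelGradientError(f"gradient for {name!r} was not computed")
        if math.isnan(grad) or abs(grad) > MAX_ABS_GRADIENT:
            raise ModelGradientError(f"gradient for {name!r} is out of range: {grad}")
        new_value = value - lr * grad
        if not math.isfinite(new_value):
            raise ParameterUpdateError(f"update for {name!r} produced {new_value}")
        updated[name] = new_value
    return updated
```

The result is built in a fresh dictionary, so a failure part-way through leaves the original parameters untouched.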

[0056] Based on the above embodiments, by setting a learnable parameter update interface, the learnable parameters of the input deep learning model can be updated based on the constructed optimizer. After the backpropagation gradient calculation is completed, the deep learning model can be trained based on the specifically set optimizer. At the same time, the callback function set by the learnable parameter update interface can also perform at least one parameter fine-tuning process on the deep learning model (intermediate deep learning model) with updated learnable parameters. In this way, the convergence process in the deep learning model training process can be accelerated and the model training efficiency can be improved.

[0057] Figure 4 is an optional flowchart of the model training method provided in this disclosure, which can be executed by a processor of a computer device. Based on any of the above embodiments, and taking Figure 1 as an example, the method further includes step S401, which is described below with reference to Figure 4.

[0058] Step S401: In response to obtaining the deep learning model with updated network parameters, the learnable parameter gradient zeroing interface clears, based on the optimizer, the gradient information of that deep learning model, thereby obtaining the target deep learning model.

[0059] In some embodiments, step S401 can be implemented by S4011 to S4012.

[0060] Step S4011: The learnable parameter gradient zeroing interface receives the deep learning model after updating the network parameters; the deep learning model after updating the network parameters includes the updated learnable parameters and the gradient information corresponding to the updated learnable parameters.

[0061] In step S4012, the learnable parameter gradient zeroing interface zeroes the gradient information corresponding to the updated learnable parameters, and determines the target deep learning model based on the updated learnable parameters and the zeroed gradient information corresponding to the updated learnable parameters.

[0062] In some embodiments, the learnable parameter update interface generates a second model definition error exception message in response to a failure to parse the deep learning model after updating the network parameters.

[0063] Based on the above embodiments, and using the constructed learnable parameter gradient zeroing interface, after completing one backpropagation, the optimizer iterates through all learnable parameters of the deep learning model, sets the gradients of the learnable parameters to zero, and then returns the model with zeroed gradients for subsequent operations. This avoids the result of the previous backpropagation affecting the result of the next backpropagation.
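The effect of skipping the zeroing step can be seen in a small sketch. It assumes, as many frameworks do, that a backward pass adds to the stored gradient rather than overwriting it; the dictionary-based model and the toy loss (w * x - target)^2 are hypothetical, and the `optimizer` argument is accepted only to mirror the interface signature.

```python
# Hypothetical sketch: gradients accumulate across backward passes unless cleared.

def backward(model, x, target):
    """Accumulate d/dw of (w * x - target)^2 into the stored gradient."""
    w = model["params"]["w"]
    model["grads"]["w"] += 2.0 * (w * x - target) * x


def zero_gradient(model, optimizer=None):
    """Traverse all learnable parameters and reset their gradients to zero."""
    for name in model["params"]:
        model["grads"][name] = 0.0
    return model


model = {"params": {"w": 1.0}, "grads": {"w": 0.0}}

backward(model, x=2.0, target=6.0)   # stored gradient is now -16.0
backward(model, x=2.0, target=6.0)   # without zeroing it doubles to -32.0
stale = model["grads"]["w"]

model = zero_gradient(model)
backward(model, x=2.0, target=6.0)   # after zeroing it is -16.0 again
fresh = model["grads"]["w"]
```

The `stale` value mixes two backward passes, which is exactly the cross-iteration interference that the zeroing interface is meant to prevent.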

[0064] The following describes the application of the model training method provided in this embodiment in a real-world scenario.

[0065] The optimizer constructed by the optimizer construction interface provided in this embodiment can provide users with functions such as parameter updates, optimization algorithms, learning rate updates, and gradient zeroing.

[0066] In some embodiments, the model training apparatus provided in this disclosure can provide an optimizer construction interface to the user. The user constructs an optimizer through this interface, determining basic information such as the optimization algorithm, learning rate, momentum coefficient, and parameter decay coefficient. Based on this optimizer, a learnable parameter update interface can be provided during each training round to update relevant learnable parameters in the deep learning model. The optimizer can also provide a gradient zeroing function to prevent the gradient calculated in the current round from affecting subsequent training of the deep learning model through the learnable parameter gradient zeroing interface.

[0067] The optimizer construction interface, learnable parameter update interface, and learnable parameter gradient zeroing interface are explained below.

[0068] 1) Optimizer construction interface

[0069] A. Interface Name: construct_optimizer

[0070] The optimizer construction interface has the form: construct_optimizer(optimizer_config,).

[0071] B. Interface Function Description:

[0072] The optimizer is constructed based on optimizer_config.

[0073] C. Interface parameter list:

[0074] The parameters of the optimizer's interface are shown in Table 1 below:

[0075]

[0076]

[0077] Table 1

[0078] D. Interface exception handling:

[0079] No errors: Operation successful.

[0080] Config parsing error: config could not be parsed or the configuration parameters are out of range.

[0081] Model Error: Deep learning model definition error.

[0082] E. Other additional notes: None

[0083] F. Implementation Principle: The optimizer is a class implemented based on object-oriented principles. During construction, the parameters related to the deep learning model and the optimizer are stored in the object for later use. The most basic optimizer should be a base class, upon which deep learning frameworks or algorithms can define derived classes to implement different optimization algorithms based on the optimizer's basic interface, such as SGD and Adam.
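The base-class and derived-class arrangement described here might be sketched as follows. The class layout, the `step(grads)` method, and the dictionary of float parameters are assumptions for illustration; the update rules themselves are the standard SGD-with-momentum and Adam formulas, with commonly used default hyperparameters.

```python
# Hypothetical class layout; only the SGD and Adam update rules are standard.
import math


class Optimizer:
    """Base class: stores the parameters and hyperparameters for later use."""

    def __init__(self, params, lr):
        self.params = params   # dict: parameter name -> float value
        self.lr = lr

    def step(self, grads):
        raise NotImplementedError


class SGD(Optimizer):
    def __init__(self, params, lr, momentum=0.0):
        super().__init__(params, lr)
        self.momentum = momentum
        self.velocity = {name: 0.0 for name in params}

    def step(self, grads):
        for name, grad in grads.items():
            self.velocity[name] = self.momentum * self.velocity[name] + grad
            self.params[name] -= self.lr * self.velocity[name]


class Adam(Optimizer):
    def __init__(self, params, lr, beta1=0.9, beta2=0.999, eps=1e-8):
        super().__init__(params, lr)
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.m = {name: 0.0 for name in params}   # first-moment estimate
        self.v = {name: 0.0 for name in params}   # second-moment estimate
        self.t = 0

    def step(self, grads):
        self.t += 1
        for name, grad in grads.items():
            self.m[name] = self.beta1 * self.m[name] + (1 - self.beta1) * grad
            self.v[name] = self.beta2 * self.v[name] + (1 - self.beta2) * grad * grad
            m_hat = self.m[name] / (1 - self.beta1 ** self.t)   # bias correction
            v_hat = self.v[name] / (1 - self.beta2 ** self.t)
            self.params[name] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
```

Because both derived classes expose the same `step` method, calling code can switch between them without changing the training loop.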

[0084] 2) Learnable parameter update interface

[0085] A. Interface Name: update_learnable_parameter (optional)

[0086] The learnable parameter update interface has the form: update_learnable_parameter(model, optimizer, closure)

[0087] B. Interface Function Description: Based on a deep learning model, when the forward propagation result and backpropagation gradient of the model have been calculated based on a certain loss function, the learnable parameters in the model are updated using the strategy in the optimizer.

[0088] C. Interface parameter list:

[0089] The parameters of the learnable parameter update interface are shown in Table 2 below:

[0090]

[0091]

[0092] Table 2

[0093] D. Interface exception handling:

[0094] No errors: Operation successful.

[0095] Model definition error: Deep learning model definition error.

[0096] Model gradient error: The gradient calculation of the deep learning model was not completed or an error occurred.

[0097] Model update error: The optimizer policy definition cannot correctly update the model's learnable parameters, for example, it enters a NaN state.

[0098] E. Additional notes: Here, "model" refers to both the input and the output.

[0099] F. Implementation principle: After the optimizer is constructed, it already has the corresponding optimization algorithm and optimization parameters built in. At this time, what is needed is the model parameters and their gradients. Some optimizer algorithms may also need a corresponding callback function. With these inputs, the optimizer can use the built-in optimization algorithm and optimization parameters to calculate the new model parameters. After the optimizer replaces the original model parameters with the new model parameters, a round of parameter update is completed.

[0100] 3) Learnable parameter gradient zeroing interface

[0101] A. Interface Name: zero_gradient (optional)

[0102] The learnable parameter gradient zeroing interface has the form: zero_gradient(model, optimizer)

[0103] B. Interface Function Description:

[0104] Based on the deep learning model, the gradients of the learnable parameters within the model are cleared to 0.

[0105] C. Interface Parameter List: The parameters of this learnable parameter gradient zeroing interface are shown in Table 3 below:

[0106]
Parameter type | Keyword   | Parameter description              | Required?
Input / Output | model     | Deep learning model, type Module   | Required
Input          | optimizer | Optimizer, type Optimizer          | Required

[0107] Table 3

[0108] D. Interface exception handling:

[0109] No errors: Operation successful.

[0110] Model definition error: Deep learning model definition error

[0111] E. Other additional notes:

[0112] Here, "model" refers to both the input and the output.

[0113] F. Implementation principle: The optimizer will traverse all parameters of the model, set their gradients to zero, and then return the model with the gradients set to zero for subsequent operations.

[0114] Based on the above embodiments, the standardization of optimizer-related interfaces helps training frameworks better develop optimizers, enables algorithms and algorithm frameworks to better call the optimizer interface and use the optimizer functions of deep learning frameworks, and allows for the customization of optimizers required by algorithms based on existing optimizers. The optimizer interface scheme designed in this solution can be used by deep learning frameworks and related tools and applications derived from them to build related models or model frameworks. For example, the company's internal training framework and algorithm framework use this set of optimizer interfaces to align the design and use of optimizers.

[0115] To better understand the embodiments of this disclosure, the computer vision interface model is briefly described below:

[0116] 1) Algorithm Interface Goals

[0117] By defining the interface adaptation layer, the following goals can be achieved: a) the algorithm implementation can use different deep learning frameworks as backends and can be switched; b) it can run on different hardware, such as servers and distributed clusters; c) the algorithm can read different data through a unified data interface.

[0118] 2) Algorithm Interface Model

[0119] This standard describes the relationship between data, algorithms, and models from three levels: system resources, interface adaptation, and algorithm application. The standard primarily specifies the technical requirements for the interface adaptation layer: a) System resource layer: This includes both hardware and software, providing the necessary storage, computation, and inference functions for computer vision systems. The specific implementation of the system resource layer varies among vendors, and this standard does not further describe or specify it; b) Interface adaptation layer: This provides service interfaces such as data interfaces, optimization interfaces, distributed interfaces, and model interfaces to the upper-layer system processes, ensuring efficient and flexible model training and model migration between different frameworks; c) Algorithm application layer: This completes the model training and inference processes, requiring the use of various interfaces defined in this standard during model training.

[0120] The interface requirements for computer vision systems (mandatory and optional requirements, and appropriate usage) are as follows:

[0121] 1. Data and Model Structure

[0122] 1) Data Structure

[0123] Datasets can be in formats such as images, videos, and binary data. The dataset annotation file is in JSON format and contains annotation information for all samples in the dataset. If the annotation results include other auxiliary files, such as mask layer information, the relative paths of these auxiliary files are stored in the JSON file. Common data types are represented as follows:

[0124] Supports category labels, such as integers, where 0 represents background and positive integers represent foreground; supports bounding boxes, such as using the coordinates of the top-left and bottom-right vertices in the order (top-left x-coordinate, top-left y-coordinate, bottom-right x-coordinate, bottom-right y-coordinate); supports annotation files being parsed into lists or arrays after passing through the data reading interface, where each element is a dictionary or key-value pair container containing all relevant information for a sample, accessible by the dataset using an index; supports loading specific dataset formats and custom datasets; supports annotation files in mainstream open-source dataset formats, such as COCO and PASCAL VOC.

[0125] 2) Model Structure

[0126] a) Basic operators should be supported, including but not limited to "+", "-", "*", "/", convolution operations, etc.; b) The meaning and values of the parameters of the operator should be defined, and the computational logic for obtaining the output from the input during forward operation should be defined; c) Basic operators should support the construction of computational graphs through operator chaining and function nesting; d) The computational graph should support differentiation through backpropagation using the chain rule; e) The computational graph should support construction through conditional judgments or loops; f) User-defined operators should be compiled and allowed to be added to the computational graph; g) Serialization and deserialization of model parameters should be supported.

[0127] 2. Training Interface

[0128] 1) Optimizer Interface

[0129] The optimizer interface should support network model optimization, updating model parameters according to different optimization algorithms such as SGD, Adam, and Momentum. Specifically, it should implement functions such as gradient calculation and backpropagation, parameter updates, optimization algorithms, and learning rate updates.

[0130] The optimizer interface should implement: a) support implementation as a class whose constructor parameters are the network model or a list of model parameters, as well as other required parameters, such as the learning rate; b) support implementation of the `step()` function for performing a single parameter optimization update. After this function is called, the model parameters should be updated based on the accumulated gradients.

[0131] 2) Mixed Precision Training Interface

[0132] The mixed-precision training interface provides unified support for mixed-precision training of algorithms, which can reduce memory consumption and improve training speed when the graphics card supports it. This interface should implement the following functions:

[0133] a) Supports precision conversion, converting model parameters to fp16, except for special layers (such as BN layers), while retaining a copy of the fp32 parameters; b) Supports input forward propagation, converting input data to fp16 for forward propagation and loss calculation; c) Supports loss amplification, amplifying the calculated loss with both fixed and dynamic amplification modes; d) Supports gradient calculation, calculating and backpropagating gradients in fp16 mode, then converting them to fp32 and scaling them back to the actual scale proportionally according to the amplification factor in c); e) Supports parameter update, updating parameters in the fp32 parameter copy based on the gradient calculated in d), and then assigning the updated parameters to the fp16 model.
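The reason for loss amplification in c) and for the higher-precision parameter copy in e) can be illustrated with the standard library alone. In this sketch the `struct` format code `"e"` is used to round values to IEEE 754 half precision; Python floats stand in for the fp32 copy, and the function names and the scale factor 1024 are assumptions.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def scaled_gradient(grad, loss_scale):
    """Gradient as seen after an fp16 backward pass with loss amplification."""
    grad16 = to_fp16(grad * loss_scale)   # c)-d): the amplified gradient lives in fp16
    return grad16 / loss_scale            # d): scale back in higher precision


def update_master_weight(master_w, grad, lr, loss_scale):
    """e): update the higher-precision copy, then refresh the fp16 copy from it."""
    master_w = master_w - lr * scaled_gradient(grad, loss_scale)
    return master_w, to_fp16(master_w)


lost = scaled_gradient(1e-8, loss_scale=1.0)      # underflows to 0.0 in fp16
kept = scaled_gradient(1e-8, loss_scale=1024.0)   # survives, close to 1e-8
master_w, w16 = update_master_weight(1.0, 1e-8, lr=1.0, loss_scale=1024.0)
```

Here the tiny update is preserved in `master_w` even though the refreshed fp16 copy `w16` still rounds to 1.0, which is why updates are accumulated in the higher-precision copy.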

[0134] 3) Distributed Interface

[0135] Through a distributed interface, the framework can complete the data transfer between multiple processes in a distributed training scenario with multiple machines and multiple GPUs. This interface should have the following core functionalities:

[0136] The distributed interface should cover functions such as `bcast()`, `reduce()`, `scatter()`, `gather()`, `allreduce()`, `allgather()`, and `sync()`. This set of interfaces should implement the following: support broadcasting data from the main process to each process; support reducing data from each process to the main process; support scattering a set of data from the main process to each process; support collecting data scattered across each process into the main process as a set; support reducing data from each process and then broadcasting it to each process; support collecting scattered data from each process into a set and then broadcasting it to each process; and support ensuring that all previously issued communication commands have been completed.
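The semantics of these collective operations can be illustrated with a single-process simulation. The sketch below is an assumption-laden model rather than a real communication backend: each list index stands for one process rank, summation is assumed as the reduction, and `sync()` is omitted because nothing here is asynchronous.

```python
def bcast(data, root=0):
    """Every process receives a copy of the value held by the main process."""
    return [data[root] for _ in data]


def reduce(data, root=0):
    """Only the main process receives the sum; the others receive None."""
    total = sum(data)
    return [total if rank == root else None for rank in range(len(data))]


def scatter(chunks, world_size):
    """The main process's chunks are split so that rank i receives chunks[i]."""
    assert len(chunks) == world_size
    return list(chunks)


def gather(data, root=0):
    """The main process receives the values of all processes as a set."""
    return [list(data) if rank == root else None for rank in range(len(data))]


def allreduce(data):
    """Reduce, then broadcast: every process receives the sum."""
    total = sum(data)
    return [total for _ in data]


def allgather(data):
    """Gather, then broadcast: every process receives all values."""
    return [list(data) for _ in data]


grads = [1.0, 2.0, 3.0]      # one local gradient per simulated process
summed = allreduce(grads)    # each process now holds the same total
```

In data-parallel training, `allreduce()` is typically the operation applied to gradients so that every process performs the same parameter update.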

[0137] 4) Quantization training interface

[0138] The quantization training interface enables algorithms to perceive the information loss caused by model quantization during neural network training. During training, quantized weights are approximated using floating-point weights, allowing the quantized model to be simulated during forward propagation. The floating-point error is then calculated and backpropagated to update the weights. Quantization helps accelerate model inference and reduce storage requirements. This interface should implement the following functionalities:

[0139] a) Supports input quantization, converting input from 32-bit floating-point type to 8-bit or custom-bit fixed-point type; b) Supports quantization and dequantization of convolution and addition operators; c) Supports pseudo-quantization nodes, which should include functions for quantizing and dequantizing floating-point weights; d) Supports error backpropagation, calculating the error through b) during forward propagation, updating the floating-point weights, and then quantizing; e) Supports quantized model output, allowing the quantized trained model to be converted into a fixed-point model for storage.
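A pseudo-quantization node of the kind described in c) can be sketched as a quantize-then-dequantize function. This is a minimal sketch under assumptions: symmetric signed fixed-point quantization with a single scale is assumed, and the function name and values are hypothetical.

```python
def fake_quantize(x, scale, num_bits=8):
    """Quantize x to a signed fixed-point grid, then dequantize back to float.

    Returns (simulated_float_value, integer_code).
    """
    qmin = -(2 ** (num_bits - 1))
    qmax = 2 ** (num_bits - 1) - 1
    q = max(qmin, min(qmax, round(x / scale)))  # round, then clamp to the integer range
    return q * scale, q


weight = 0.337  # floating-point weight kept during training
scale = 0.01    # hypothetical quantization step
w_sim, w_int = fake_quantize(weight, scale)

# Quantization error seen by the forward pass during training
error = w_sim - weight
```

The forward pass uses `w_sim`, so the loss reflects the quantization error, while the floating-point `weight` remains the value that is updated; `w_int` is what a fixed-point model would store, as in e).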

[0140] 5) Data processing interface

[0141] The data interface should support the conversion of data into formats required by the module, such as tensors. The data interface should have both outer and inner interfaces. The framework should be able to iteratively prepare continuous training data for algorithm training. This interface should have the following core functionalities:

[0142] Supports the implementation of an iterable data loader type, with each iteration returning a batch of data;

[0143] It supports sampling data from the dataset according to training requirements; it supports constructing dataset objects based on the dataset path and related parameters; it supports reading part or all of the dataset data from storage devices or services, such as annotation files and data samples; and it supports preprocessing operations on the data, such as image scaling, flipping, and color perturbation.
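An iterable data loader with sampling and preprocessing can be sketched as follows. The class name, constructor arguments, and the in-memory list used as a dataset are assumptions made for the example; reading from storage devices or services is not modeled.

```python
import random


class DataLoader:
    """Illustrative iterable loader: each iteration yields one batch."""

    def __init__(self, dataset, batch_size, shuffle=False, transform=None, seed=0):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.transform = transform
        self.rng = random.Random(seed)

    def __iter__(self):
        indices = list(range(len(self.dataset)))
        if self.shuffle:
            self.rng.shuffle(indices)  # sampling order chosen per training requirements
        for start in range(0, len(indices), self.batch_size):
            batch = [self.dataset[i] for i in indices[start:start + self.batch_size]]
            if self.transform is not None:
                batch = [self.transform(x) for x in batch]  # preprocessing step
            yield batch


loader = DataLoader(list(range(5)), batch_size=2, transform=lambda x: x * 10)
batches = list(loader)
```

The last batch is shorter when the dataset size is not a multiple of the batch size; whether to keep or drop it is a design choice the text leaves open.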

[0144] 6) Visualization Interface

[0145] The training visualization interface provides visualizations of model structure, parameters, gradients, and features during algorithm training. This interface should support: visualization of model structure diagrams; visualization of feature maps; visualization of weight histograms; visualization of scalar changes; and visualization of convolutional kernels.

[0146] 7) Distillation interface

[0147] The distillation interface supports the use of a teacher network to guide the student network during training, thereby improving the training accuracy of the student model. It should support target distillation methods and feature distillation methods.
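The two distillation methods can be illustrated with simple loss functions. The sketch below rests on assumptions not stated in the text: target distillation is taken to mean matching the teacher's softened output distribution (KL divergence with a temperature), and feature distillation is taken to mean a mean squared error between intermediate features. All function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def target_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's softened outputs to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def feature_distillation_loss(student_feat, teacher_feat):
    """Mean squared error between intermediate feature vectors."""
    n = len(student_feat)
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / n


teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss_target = target_distillation_loss(student, teacher)
loss_feature = feature_distillation_loss([1.0, 2.0], [1.0, 4.0])
```

Either loss would typically be added to the student's ordinary training loss with a weighting factor, which is likewise an assumption here.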

[0148] 8) Graph-computation fusion interface

[0149] Graph-computation fusion optimizes the overall network execution time by analyzing and optimizing the existing network computation graph logic, reducing overhead during operator execution intervals, and improving the utilization of device computing resources. It supports operations such as splitting, reorganizing, and merging existing computation logic; and allows enabling graph-computation fusion by modifying the context parameter in the training script.

[0150] 3. Inference Interface

[0151] 1) Process orchestration interface

[0152] The computer vision system supports workflow orchestration, and its interface meets the following requirements:

[0153] It should be able to combine key processes such as image acquisition, image decoding, image scaling, object detection, image cropping, image classification, and serialization;

[0154] It is advisable to support the plug-in approach for key processes, with configurable properties for each plug-in.

[0155] Users should be able to mount metadata.

[0156] It should support configuration file-based process orchestration management and have management components;

[0157] It should support specifying accelerators for particular processes;

[0158] It should support orchestration of multiple request and multiple output processes;

[0159] The following models should be supported for orchestration: YOLOv3, YOLOv3-tiny, ResNet50, Faster R-CNN, YOLOv4, SSD-VGG16, SSD MobileNet v1 FPN, CRNN, YOLOv5, Faster R-CNN-FPN/Cascade R-CNN-FPN, ResNet-18, DeepLabv3+, CTPN, DeepLabv3, BERT-Base (Uncased), U-Net, Mask R-CNN, FaceNet, OpenPose, Unet++, RetinaNet;

[0160] It should support single-input, single-output, multiple-input, and multiple-output orchestration.

[0161] 2) Data processing interface

[0162] The computer vision system has a data processing interface that meets the following requirements:

[0163] It should support reading data from image files and moving it into a pre-configured cache;

[0164] It should support JPG / JPEG / BMP format image decoding, with a resolution range of (32*32, 8192*8192);

[0165] It should at least support JPG image encoding, with a resolution range of (32*32, 8192*8192);

[0166] Image scaling with specified target width and height should be supported, and image width and height should be scaled and aligned to the step size.

[0167] It should support specifying the expansion ratio in the four directions (top, bottom, left, and right) to expand the area of the target bounding box for cropping.

[0168] It should support H264 / H265 video decoding, with a resolution range of (128*128, 4096*4096).

[0169] Width and height scaling alignment should be supported for step-based scaling.

[0170] It should support resolutions ranging from 128*128 to 1920*1920, and both H264 MP and H265 MP.

[0171] Image normalization, center cropping, affine transformation, and rotation should be supported;

[0172] It should support data transfer between key processes and preferably support the multiple distribution of a single input;

[0173] It should support data transfer between processor memory and main memory;

[0174] It should support frame skipping processing of video data;

[0175] Serialization should be supported.
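Two of the data processing requirements above, expanding a target bounding box by per-direction ratios and aligning a scaled dimension to a step size, can be sketched as follows. The function names are hypothetical, and two details are assumptions: ratios are taken relative to the box's own width and height, and alignment rounds up to the next multiple of the step.

```python
def expand_box(box, ratios, image_size):
    """Expand (x1, y1, x2, y2) by (top, bottom, left, right) ratios, clipped to the image."""
    x1, y1, x2, y2 = box
    top, bottom, left, right = ratios
    w, h = x2 - x1, y2 - y1
    img_w, img_h = image_size
    return (
        max(0, x1 - left * w),
        max(0, y1 - top * h),
        min(img_w, x2 + right * w),
        min(img_h, y2 + bottom * h),
    )


def align_to_step(size, step):
    """Round a dimension up to the nearest multiple of step (rounding up is an assumption)."""
    return ((size + step - 1) // step) * step


crop = expand_box((10, 20, 50, 60), (0.5, 0.5, 0.25, 0.25), (100, 70))
aligned_height = align_to_step(1080, 16)
```

Clipping keeps the expanded crop inside the image, so a box near the border expands less on that side than the ratio requests.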

[0176] 3) Plug-in interface

[0177] The computer vision system should support the development and use of visual processing plugins and meet the following requirements: it should support user-developed plugins, registration, and compilation; it should support interfaces for plugin initialization, deinitialization, execution, attribute registration, and retrieval; it should support interfaces for defining variable and immutable ports for plugin input and output; and it should support interfaces for defining and throwing business logic exceptions.

[0178] It should support a streaming plugin interface to achieve the following functions: a) sending data of a specified type or channel to different ports; b) outputting data from multiple ports in sequence through a single port.

[0179] It should support multiple instantiation interfaces for plugins of the same type; it should support plugin caching mechanisms and interfaces to enable the transfer of business data (such as decoded video and image data) between plugins; it should support description interfaces for plugin metadata (such as classification information and target information), and implement the transfer by relying on the plugin cache; it should support single-input, single-output, multi-input, and multi-output plugin interfaces; it should support inference plugin interfaces, supporting target classification, detection, and tensor-based (input) inference; it should support model post-processing plugin interfaces, enabling it to interface with models for target detection, classification, semantic segmentation, text generation, text box detection, pose detection, etc.

[0180] The video analytics plugin interface should be supported to implement the following functions: a) multi-target (including machine, non-human, and face) path recording; b) face alignment (correcting detected face images); c) video quality diagnosis.

[0181] It should support a debugging plugin interface to enable data export (e.g., JSON format) and data loading and restoration; it should also support a screen display plugin interface to enable drawing basic units on images, such as drawing frames, lines, circles, and writing text.

[0182] 4) Block detection interface

[0183] The computer vision system supports a block detection interface and meets the following requirements:

[0184] It supports filtering duplicate targets in overlapping areas after segmentation; it supports user-defined parameters such as the number / size of segments and overlap, and automatically generates target boxes for image segments; it supports merging images of segmented inference results; and during multi-level inference, it supports filtering post-processing results based on the selection of maximum and minimum area, upper and lower area limits, and confidence thresholds.
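Block detection can be sketched as tile generation followed by duplicate filtering. This is an illustrative sketch under assumptions: tiles are square with a pixel overlap, and duplicates in overlapping areas are removed with a greedy intersection-over-union filter, which the text does not prescribe. All names are hypothetical.

```python
def make_tiles(width, height, tile, overlap):
    """Return (x1, y1, x2, y2) tiles covering the image with the given pixel overlap."""
    assert 0 <= overlap < tile
    stride = tile - overlap
    boxes = []
    y = 0
    while True:
        x = 0
        while True:
            boxes.append((x, y, min(x + tile, width), min(y + tile, height)))
            if x + tile >= width:
                break
            x += stride
        if y + tile >= height:
            break
        y += stride
    return boxes


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_duplicates(detections, iou_threshold=0.5):
    """Keep higher-confidence boxes and drop lower ones that overlap them."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) < iou_threshold for k in kept):
            kept.append((box, score))
    return kept


tiles = make_tiles(100, 60, tile=60, overlap=20)
merged = filter_duplicates([
    ((45, 10, 55, 20), 0.9),  # found in the first tile
    ((46, 10, 56, 20), 0.8),  # same object found again in the second tile
    ((80, 30, 90, 40), 0.7),
])
```

Detections are assumed to be already mapped back to full-image coordinates before filtering, so that boxes from different tiles are comparable.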

[0185] 4. Module

[0186] A Module is a fundamental module in a neural network. Neural network modules are built upon this base class to construct graphs. A Module provides the following functionalities:

[0187] 1) Forward computation of the module: a) Interface name: forward; b) Interface function description: The module performs a forward computation and returns the computation result of the module. If it is in training state, a computation graph is constructed during the forward computation process to calculate the gradient of the module parameters.

[0188] 2) Get trainable parameters of the module: a) Interface name: get_parameters; b) Interface function description: Returns the trainable parameters of the module.

[0189] 3) Retrieve Modules and Submodules: a) Interface Name: get_modules; b) Interface Function Description: Optional. This interface returns an iterator that iterates through the module itself and its submodules. Duplicate modules are returned only once.

[0190] 4) Get module state: a) Interface name: get_state_dict; b) Interface function description: Returns the module state in key-value pairs, including the module parameters and buffer.

[0191] 5) Loading module status: a) Interface name: load_state_dict; b) Interface function description: Loading module status, including module parameters and buffers.

[0192] 6) Module Backward Computation: a) Interface Name: backward(grad_input, grad_output); b) Interface Function Description: The module performs a backward computation and returns the computation result. If in training mode, the gradient of the module parameters is calculated during the backward computation process. This function is automatically generated by the computation graph and can also be registered later using register_backward_function.

[0193] 7) Register Backward Computation Function of Module: a) Interface Name: register_backward_function; b) Interface Function Description: Registers the function with which the module performs a backward computation and returns the computation result of the module. If it is in training state, the gradient of the module parameters is calculated during the backward computation process.
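Part of the Module interface listed above can be sketched as a small base class. The sketch is illustrative only: it uses the interface names from the text, but the storage layout (`params`, `children`), the dotted state keys, and the example subclasses are assumptions, and buffers, computation graph construction, and backward computation are not modeled.

```python
class Module:
    """Illustrative base class exposing some of the interface names listed above."""

    def __init__(self):
        self.params = {}    # name -> float (hypothetical storage layout)
        self.children = {}  # name -> Module

    def forward(self, x):
        raise NotImplementedError

    def get_modules(self, _seen=None):
        """Yield this module and its submodules; duplicates are returned only once."""
        seen = set() if _seen is None else _seen
        if id(self) in seen:
            return
        seen.add(id(self))
        yield self
        for child in self.children.values():
            yield from child.get_modules(seen)

    def get_parameters(self):
        return [v for m in self.get_modules() for v in m.params.values()]

    def get_state_dict(self, prefix=""):
        state = {prefix + k: v for k, v in self.params.items()}
        for name, child in self.children.items():
            state.update(child.get_state_dict(prefix + name + "."))
        return state

    def load_state_dict(self, state, prefix=""):
        for k in self.params:
            self.params[k] = state[prefix + k]
        for name, child in self.children.items():
            child.load_state_dict(state, prefix + name + ".")


class Scale(Module):
    def __init__(self, w):
        super().__init__()
        self.params["w"] = w

    def forward(self, x):
        return self.params["w"] * x


class Chain(Module):
    def __init__(self, first, second):
        super().__init__()
        self.children = {"first": first, "second": second}

    def forward(self, x):
        return self.children["second"].forward(self.children["first"].forward(x))


shared = Scale(2.0)
net = Chain(shared, shared)  # the same submodule is used twice
```

Because `shared` appears twice, `get_modules()` yields it once while `get_state_dict()` still lists it under both key paths.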

[0194] Based on the foregoing embodiments, this disclosure provides a model training device. The units included in the device, and the modules included in each unit, can be implemented by a processor in a computer device or, alternatively, by specific logic circuits. In implementation, the processor can be a central processing unit (CPU), a microprocessor unit (MPU), a digital signal processor (DSP), or a field programmable gate array (FPGA), etc.

[0195] Figure 5 is a schematic diagram of the composition structure of a model training device provided in an embodiment of the present disclosure. As shown in Figure 5, the model training device 500 includes:

[0196] The optimizer construction interface 501 is used to construct an optimizer based on optimizer construction parameters.

[0197] The learnable parameter update interface 502 is used to, in response to completion of backpropagation gradient calculation based on the deep learning model, determine the updated learnable parameters based on the optimizer, the learnable parameters in the deep learning model, and the gradient information corresponding to the learnable parameters.

[0198] The learnable parameter update interface 502 is also used to obtain a deep learning model with updated parameters based on the updated learnable parameters.

[0199] In some embodiments, the optimizer construction parameters include an optimization algorithm and optimization parameters; the optimizer construction interface is used to construct an optimizer based on the optimization algorithm and optimization parameters; the optimizer is used to update the learnable parameters in the deep learning model based on the optimization strategy determined by the optimization algorithm and the optimization parameters; wherein, the optimization parameters include at least one of the following: a base learning rate, a momentum coefficient, and a parameter decay coefficient.

[0200] In some embodiments, the learnable parameter update interface is used to determine the updated learnable parameters based on the optimization algorithm, the optimization parameters, the learnable parameters in the deep learning model, and the gradient information corresponding to the learnable parameters.

[0201] In some embodiments, the optimizer construction interface is used to verify the optimization algorithm and the optimization parameters to obtain the construction parameter verification result;

[0202] The optimizer construction interface is used to construct the optimizer based on the optimization method identifier and optimization parameters carried in the optimizer construction request, provided that the construction parameter verification result indicates successful verification.

[0203] The optimizer construction interface is used to generate construction failure exception information when the construction parameter verification result indicates that the verification has failed.
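The verify-then-construct behavior of the optimizer construction interface can be sketched as follows. The exception class, the set of supported algorithm identifiers, the parameter names (`base_lr`, `momentum`, `weight_decay`), and the specific validity checks are all assumptions made for illustration; the optimizer is represented by a plain dictionary.

```python
class OptimizerConstructionError(Exception):
    """Hypothetical exception carrying the construction failure information."""


SUPPORTED_ALGORITHMS = {"sgd", "momentum", "adam"}  # assumed identifiers


def verify(algorithm, params):
    """Return (ok, message) for the requested algorithm and optimization parameters."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        return False, f"unknown optimization algorithm: {algorithm!r}"
    lr = params.get("base_lr")
    if not isinstance(lr, (int, float)) or lr <= 0:
        return False, "base_lr must be a positive number"
    if not 0.0 <= params.get("momentum", 0.0) < 1.0:
        return False, "momentum must be in [0, 1)"
    if params.get("weight_decay", 0.0) < 0.0:
        return False, "weight_decay must be non-negative"
    return True, "ok"


def build_optimizer(algorithm, **params):
    """Construct the optimizer only if verification succeeds; otherwise raise."""
    ok, message = verify(algorithm, params)
    if not ok:
        raise OptimizerConstructionError(message)
    return {"algorithm": algorithm, **params}


optimizer = build_optimizer("sgd", base_lr=0.1, momentum=0.9)

try:
    build_optimizer("sgd", base_lr=-1.0)
    failure = None
except OptimizerConstructionError as exc:
    failure = str(exc)  # construction failure exception information
```

Validating before construction means an invalid request fails at build time with a specific message, rather than surfacing later as a malformed update during training.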

[0204] In some embodiments:

[0205] The learnable parameter update interface is used to receive the deep learning model and a callback function; the deep learning model includes the learnable parameters and the gradient information corresponding to the learnable parameters;

[0206] The learnable parameter update interface is used to determine the updated learnable parameters based on the optimization strategy corresponding to the optimizer, the learnable parameters, and the gradient information corresponding to the learnable parameters.

[0207] The learnable parameter update interface is used to obtain an intermediate deep learning model based on the updated learnable parameters.

[0208] The learnable parameter update interface is used to perform at least one parameter fine-tuning process on the intermediate deep learning model through the callback function, and to determine the intermediate deep learning model after the at least one parameter fine-tuning process as the parameter-updated deep learning model.

[0209] The parameter fine-tuning process includes determining a new loss function value based on the intermediate deep learning model, and adjusting the model parameters of the intermediate deep learning model based on the new loss function value.
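The update-then-fine-tune flow of this embodiment can be sketched as follows. The sketch is illustrative under assumptions: the model is a dictionary of scalar parameters and gradients, the optimization strategy is plain gradient descent, and the callback `fine_tune` with its loss (w - 3)^2 is hypothetical.

```python
def update_parameters(model, lr, callback=None, fine_tune_steps=1):
    """Apply one optimizer update, then let the callback fine-tune the result."""
    updated = [p - lr * g for p, g in zip(model["params"], model["grads"])]
    intermediate = {"params": updated, "grads": list(model["grads"])}
    if callback is not None:
        for _ in range(fine_tune_steps):  # at least one fine-tuning pass
            intermediate = callback(intermediate)
    return intermediate


def fine_tune(model, lr=0.1):
    """Hypothetical callback: compute a new loss value and adjust the parameters."""
    new_params, new_grads = [], []
    for w in model["params"]:
        loss = (w - 3.0) ** 2      # new loss value on the intermediate model
        grad = 2.0 * (w - 3.0)     # gradient of that loss with respect to w
        new_params.append(w - lr * grad if loss > 0.0 else w)
        new_grads.append(grad)
    return {"params": new_params, "grads": new_grads}


model = {"params": [0.0], "grads": [-6.0]}  # gradient of (w - 3)^2 at w = 0
plain = update_parameters(model, lr=0.1)
tuned = update_parameters(model, lr=0.1, callback=fine_tune)
```

Passing the fine-tuning logic as a callback keeps the update interface independent of how the new loss is computed.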

[0210] In some embodiments, the learnable parameter update interface is used to generate a first model definition error exception message in response to a failure to parse the deep learning model.

[0211] The learnable parameter update interface is used to generate model gradient anomaly information in response to the failure to obtain gradient information corresponding to the learnable parameters in the deep learning model.

[0212] The learnable parameter update interface is used to generate parameter update exception information in response to the failure to update the learnable parameters in the deep learning model.

[0213] In some embodiments, the apparatus further includes: a learnable parameter gradient zeroing interface 503, used to, in response to obtaining a deep learning model with updated network parameters, zero out the gradient information of the deep learning model with updated network parameters based on the optimizer, to obtain a target deep learning model.

[0214] In some embodiments, the learnable parameter gradient zeroing interface is used to receive the deep learning model after updating the network parameters; the deep learning model after updating the network parameters includes the updated learnable parameters and the gradient information corresponding to the updated learnable parameters.

[0215] The learnable parameter gradient zeroing interface is used to zero out the gradient information corresponding to the updated learnable parameters, and to determine the target deep learning model based on the updated learnable parameters and the zeroed gradient information corresponding to the updated learnable parameters.
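The role of the gradient zeroing interface can be illustrated by comparing training with and without it. This is a minimal sketch assuming that backpropagation accumulates into the stored gradient; the names and values are hypothetical.

```python
class Param:
    """Hypothetical learnable parameter with an accumulated gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


def backward(param, grad):
    param.grad += grad  # gradients accumulate across backward passes


def step(param, lr):
    param.value -= lr * param.grad


def zero_grad(param):
    param.grad = 0.0  # gradient zeroing after the parameter update


def train(zero_between_steps):
    w = Param(1.0)
    for _ in range(2):
        backward(w, 1.0)
        step(w, lr=0.1)
        if zero_between_steps:
            zero_grad(w)
    return w.value


with_zeroing = train(True)      # two updates of 0.1 each
without_zeroing = train(False)  # second update uses a stale, doubled gradient
```

Without zeroing, the second update is driven by the sum of both gradients, so the parameter moves further than intended.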

[0216] In some embodiments, the apparatus further includes:

[0217] The learnable parameter update interface is used to generate a second model definition error exception message in response to the failure to parse the deep learning model after updating the network parameters.

[0218] The descriptions of the apparatus embodiments above are similar to those of the method embodiments above, and have similar beneficial effects. In some embodiments, the functions or modules included in the apparatus provided by this disclosure can be used to perform the methods described in the method embodiments above. For technical details not disclosed in the apparatus embodiments of this disclosure, please refer to the descriptions of the method embodiments of this disclosure for understanding.

[0219] It should be noted that, in the embodiments of this disclosure, if the above-described model training method is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of this disclosure, or the part that contributes to related technologies, can be embodied in the form of a software product. This software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the methods described in the various embodiments of this disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, external hard drive, read-only memory (ROM), magnetic disk, or optical disk. Thus, the embodiments of this disclosure are not limited to any specific hardware, software, or firmware, or any combination of hardware, software, and firmware.

[0220] This disclosure provides a computer device including a memory and a processor. The memory stores a computer program that can run on the processor. When the processor executes the program, it implements some or all of the steps in the above-described method.

[0221] This disclosure provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements some or all of the steps in the above-described method. The computer-readable storage medium may be transitory or non-transitory.

[0222] This disclosure provides a computer program including computer-readable code, wherein when the computer-readable code is executed in a computer device, a processor in the computer device performs some or all of the steps in the above-described method.

[0223] This disclosure provides a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program. When the computer program is read and executed by a computer, it implements some or all of the steps in the above-described method. This computer program product can be implemented specifically through hardware, software, or a combination thereof. In some embodiments, the computer program product is specifically embodied as a computer storage medium; in other embodiments, the computer program product is specifically embodied as a software product, such as a software development kit (SDK), etc.

[0224] It should be noted that the descriptions of the various embodiments above tend to emphasize the differences between them, while their similarities or commonalities can be referenced interchangeably. The descriptions of the above embodiments of the device, storage medium, computer program, and computer program product are similar to the descriptions of the above method embodiments and have similar beneficial effects. For technical details not disclosed in the embodiments of the device, storage medium, computer program, and computer program product of this disclosure, please refer to the descriptions of the method embodiments of this disclosure for understanding.

[0225] Figure 6 is a schematic diagram of the hardware entity of a computer device provided in an embodiment of the present disclosure. As shown in Figure 6, the hardware entity of the device 600 includes a processor 601 and a memory 602, wherein the memory 602 stores a computer program that can run on the processor 601, and the processor 601 executes the program to implement the steps in the method of any of the above embodiments.

[0226] The memory 602 stores computer programs that can run on the processor. The memory 602 is configured to store instructions and applications that can be executed by the processor 601. It can also cache data to be processed or already processed by the processor 601 and the various modules in the device 600 (e.g., image data, audio data, voice communication data and video communication data). It can be implemented by flash memory or random access memory (RAM).

[0227] When processor 601 executes a program, it implements the steps of any of the above-mentioned model training methods. Processor 601 typically controls the overall operation of device 600.

[0228] This disclosure provides a computer storage medium storing one or more programs that can be executed by one or more processors to implement the steps of the model training method as described in any of the above embodiments.

[0229] It should be noted that the descriptions of the storage medium and device embodiments above are similar to those of the method embodiments above, and have similar beneficial effects. For technical details not disclosed in the storage medium and device embodiments of this disclosure, please refer to the descriptions of the method embodiments of this disclosure for understanding.

[0230] The aforementioned processor can be at least one of the following: Application Specific Integrated Circuit (ASIC), Digital Signal Processor (DSP), Digital Signal Processing Device (DSPD), Programmable Logic Device (PLD), Field Programmable Gate Array (FPGA), Central Processing Unit (CPU), Controller, Microcontroller, and Microprocessor. It is understood that other electronic devices can also implement the functions of the aforementioned processor, and this disclosure does not specifically limit the specific implementation.

[0231] The aforementioned computer storage media/memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), ferromagnetic random access memory (FRAM), flash memory, magnetic surface memory, optical disc, or compact disc read-only memory (CD-ROM), etc.; or it can be various terminals that include one or any combination of the above-mentioned memories, such as mobile phones, computers, tablet devices, personal digital assistants, etc.

[0232] It should be understood that the phrase "an embodiment" or "one embodiment" throughout the specification means that a specific feature, structure, or characteristic related to the embodiment is included in at least one embodiment of this disclosure. Therefore, "in one embodiment" or "one embodiment" appearing throughout the specification does not necessarily refer to the same embodiment. Furthermore, these specific features, structures, or characteristics can be combined in any suitable manner in one or more embodiments. It should be understood that in the various embodiments of this disclosure, the sequence numbers of the above steps / processes do not imply a sequential order of execution; the execution order of each step / process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this disclosure. The sequence numbers of the above embodiments of this disclosure are merely descriptive and do not represent the superiority or inferiority of the embodiments.

[0233] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.

[0234] In the several embodiments provided in this disclosure, it should be understood that the disclosed devices and methods can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of units is only a logical functional division, and in actual implementation, there may be other division methods, such as: multiple units or components may be combined, or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the various components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms.

[0235] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units. They may be located in one place or distributed across multiple network units. Some or all of the units may be selected to achieve the purpose of this embodiment according to actual needs.

[0236] In addition, each functional unit in the various embodiments of this disclosure can be integrated into one processing unit, or each unit can be a separate unit, or two or more units can be integrated into one unit; the integrated unit can be implemented in hardware or in the form of hardware plus software functional units.

[0237] Those skilled in the art will understand that all or part of the steps of the above method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When the program is executed, it performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as mobile storage devices, read-only memory (ROM), magnetic disks, or optical disks.

[0238] Alternatively, if the integrated units described above are implemented as software functional modules and sold or used as independent products, they can also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this disclosure, or the part that contributes to related technologies, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the methods described in the various embodiments of this disclosure. The aforementioned storage medium includes various media capable of storing program code, such as mobile storage devices, ROM, magnetic disks, or optical disks.

[0239] The above description is merely an embodiment of this disclosure, but the scope of protection of this disclosure is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this disclosure should be included within the scope of protection of this disclosure.

Claims

1. A model training method applied to a computer vision system, wherein the computer vision system includes at least an optimizer construction interface and a learnable parameter update interface, characterized in that the method includes: the optimizer construction interface constructs an optimizer based on the optimization algorithms and optimization parameters corresponding to different training tasks; the optimizer is used to update the learnable parameters in the deep learning model based on the optimization strategy determined by the optimization algorithm and the optimization parameters; wherein the optimization parameters include at least one of the following: a base learning rate, a momentum coefficient, and a parameter decay coefficient; the training tasks include at least: object detection, object classification, pose detection, multi-object path recording, face alignment, and video quality diagnosis; in response to the completion of backpropagation gradient calculation based on the deep learning model, the learnable parameter update interface determines the updated learnable parameters based on the optimizer, the learnable parameters in the deep learning model, and the gradient information corresponding to the learnable parameters; and the learnable parameter update interface obtains the deep learning model with updated parameters based on the updated learnable parameters.

2. The method according to claim 1, characterized in that the learnable parameter update interface determining the updated learnable parameters based on the optimizer, the learnable parameters in the deep learning model, and the gradient information corresponding to the learnable parameters includes: the learnable parameter update interface determines the updated learnable parameters based on the optimization algorithm, the optimization parameters, the learnable parameters in the deep learning model, and the gradient information corresponding to the learnable parameters.

3. The method according to claim 1, characterized in that, The optimizer construction interface constructs an optimizer based on optimization algorithms and parameters corresponding to different training tasks, including: The optimizer construction interface verifies the optimization algorithm and the optimization parameters to obtain the construction parameter verification result; If the verification result of the construction parameters indicates that the verification is successful, the optimizer construction interface constructs the optimizer based on the optimization method identifier and optimization parameters carried in the optimizer construction request; If the verification result of the construction parameters indicates a failure, the optimizer construction interface generates a construction failure exception message.

4. The method according to any one of claims 1 to 3, characterized in that the learnable parameter update interface determining the updated learnable parameters based on the optimizer, the learnable parameters in the deep learning model, and the gradient information corresponding to the learnable parameters includes: the learnable parameter update interface receives the deep learning model and a callback function, the deep learning model including the learnable parameters and the gradient information corresponding to the learnable parameters; and the learnable parameter update interface determines the updated learnable parameters based on the optimization strategy corresponding to the optimizer, the learnable parameters, and the gradient information corresponding to the learnable parameters; and the learnable parameter update interface obtaining a deep learning model with updated parameters based on the updated learnable parameters includes: the learnable parameter update interface obtains an intermediate deep learning model based on the updated learnable parameters; and the learnable parameter update interface performs at least one parameter fine-tuning process on the intermediate deep learning model through the callback function, and determines the intermediate deep learning model after the at least one parameter fine-tuning process as the deep learning model with updated parameters; wherein the parameter fine-tuning process includes determining a new loss function value based on the intermediate deep learning model, and adjusting the model parameters of the intermediate deep learning model based on the new loss function value.

5. The method according to claim 4, characterized in that the method further includes: the learnable parameter update interface generates a first model definition error exception message in response to a failure to parse the deep learning model; the learnable parameter update interface generates model gradient anomaly information in response to a failure to obtain the gradient information corresponding to the learnable parameters in the deep learning model; and the learnable parameter update interface generates parameter update exception information in response to a failure to update the learnable parameters in the deep learning model.

6. The method according to any one of claims 1 to 3, characterized in that the method further includes: in response to obtaining the deep learning model with updated network parameters, a learnable parameter gradient zeroing interface clears, based on the optimizer, the gradient information of the deep learning model with updated network parameters, thereby obtaining a target deep learning model.

7. The method according to claim 6, characterized in that the learnable parameter gradient zeroing interface clearing, based on the optimizer, the gradient information of the deep learning model with updated network parameters includes: the learnable parameter gradient zeroing interface receives the deep learning model with updated network parameters, which includes the updated learnable parameters and the gradient information corresponding to the updated learnable parameters; and the learnable parameter gradient zeroing interface zeroes the gradient information corresponding to the updated learnable parameters, and determines the target deep learning model based on the updated learnable parameters and the zeroed gradient information corresponding to the updated learnable parameters.

8. The method according to claim 7, characterized in that the method further includes: the learnable parameter update interface generates a second model definition error exception message in response to a failure to parse the deep learning model with updated network parameters.

9. A model training apparatus applied to a computer vision system, wherein the computer vision system includes at least an optimizer construction interface and a learnable parameter update interface, characterized in that the apparatus includes: the optimizer construction interface, used to construct an optimizer based on an optimization algorithm and optimization parameters corresponding to different training tasks; the optimizer is used to update the learnable parameters in a deep learning model based on the optimization strategy determined by the optimization algorithm and the optimization parameters; wherein the optimization parameters include at least one of the following: base learning rate, momentum coefficient, and parameter decay coefficient; the training tasks include at least: object detection, object classification, pose detection, multi-object path recording, face alignment, and video quality diagnosis; the learnable parameter update interface, used to determine, in response to the completion of backpropagation gradient calculation based on the deep learning model, updated learnable parameters based on the optimizer, the learnable parameters in the deep learning model, and the gradient information corresponding to the learnable parameters; and the learnable parameter update interface is also used to obtain a deep learning model with updated parameters based on the updated learnable parameters.

10. The apparatus according to claim 9, characterized in that the learnable parameter update interface is used to determine the updated learnable parameters based on the optimization algorithm, the optimization parameters, the learnable parameters in the deep learning model, and the gradient information corresponding to the learnable parameters.

11. The apparatus according to claim 9, characterized in that the optimizer construction interface is used to verify the optimization algorithm and the optimization parameters to obtain a construction parameter verification result; the optimizer construction interface is used to construct the optimizer based on the optimization method identifier and the optimization parameters carried in the optimizer construction request when the construction parameter verification result indicates that the verification succeeded; and the optimizer construction interface is used to generate construction failure exception information when the construction parameter verification result indicates that the verification failed.

12. The apparatus according to any one of claims 9 to 11, characterized in that the learnable parameter update interface is used to receive the deep learning model and a callback function, the deep learning model including the learnable parameters and the gradient information corresponding to the learnable parameters; the learnable parameter update interface is used to determine the updated learnable parameters based on the optimization strategy corresponding to the optimizer, the learnable parameters, and the gradient information corresponding to the learnable parameters; the learnable parameter update interface is used to obtain an intermediate deep learning model based on the updated learnable parameters; and the learnable parameter update interface is used to perform at least one parameter fine-tuning process on the intermediate deep learning model through the callback function, and to determine the intermediate deep learning model after the at least one parameter fine-tuning process as the deep learning model with updated parameters; wherein the parameter fine-tuning process includes determining a new loss function value based on the intermediate deep learning model, and adjusting the model parameters of the intermediate deep learning model based on the new loss function value.

13. The apparatus according to claim 12, characterized in that the learnable parameter update interface is used to generate a first model definition error exception message in response to a failure to parse the deep learning model; the learnable parameter update interface is used to generate model gradient anomaly information in response to a failure to obtain the gradient information corresponding to the learnable parameters in the deep learning model; and the learnable parameter update interface is used to generate parameter update exception information in response to a failure to update the learnable parameters in the deep learning model.

14. The apparatus according to any one of claims 9 to 11, characterized in that the apparatus further includes: a learnable parameter gradient zeroing interface, used to zero out, based on the optimizer and in response to obtaining the deep learning model with updated network parameters, the gradient information of the deep learning model with updated network parameters, to obtain a target deep learning model.

15. The apparatus according to claim 14, characterized in that the learnable parameter gradient zeroing interface is used to receive the deep learning model with updated network parameters, which includes the updated learnable parameters and the gradient information corresponding to the updated learnable parameters; and the learnable parameter gradient zeroing interface is used to zero out the gradient information corresponding to the updated learnable parameters, and to determine the target deep learning model based on the updated learnable parameters and the zeroed gradient information corresponding to the updated learnable parameters.

16. The apparatus according to claim 15, characterized in that the learnable parameter update interface is used to generate a second model definition error exception message in response to a failure to parse the deep learning model with updated network parameters.

17. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that, when the processor executes the program, it implements the steps of the method according to any one of claims 1 to 8.

18. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the computer program implements the steps of the method according to any one of claims 1 to 8.
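
For illustration only, the flow recited in claims 1 to 8 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: every name in it (`build_optimizer`, `step`, `zero_grad`, `Param`, `Optimizer`, and the exception classes) is hypothetical, the model is reduced to a list of scalar parameters, momentum SGD stands in for whatever optimization algorithm is selected, `weight_decay` is assumed to play the role of the parameter decay coefficient, and the zeroing function is simplified so that it does not take the optimizer as an argument.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class ConstructionError(ValueError):
    """Construction parameters failed verification (claim 3)."""


class ModelDefinitionError(TypeError):
    """The model could not be parsed (claims 5 and 8)."""


class GradientError(RuntimeError):
    """Gradient information is missing for a learnable parameter (claim 5)."""


@dataclass
class Param:  # hypothetical stand-in for one learnable parameter
    value: float
    grad: Optional[float] = None  # filled in by backpropagation


@dataclass
class Optimizer:  # hypothetical container for the optimization strategy
    algorithm: str
    lr: float            # base learning rate
    momentum: float      # momentum coefficient
    weight_decay: float  # assumed to be the parameter decay coefficient
    velocity: Dict[int, float] = field(default_factory=dict)


def build_optimizer(algorithm: str, lr: float, momentum: float = 0.0,
                    weight_decay: float = 0.0) -> Optimizer:
    """Optimizer construction interface: verify first, then construct."""
    if algorithm != "sgd":  # this sketch supports a single algorithm
        raise ConstructionError(f"unsupported algorithm: {algorithm!r}")
    if lr <= 0 or not 0 <= momentum < 1 or weight_decay < 0:
        raise ConstructionError("optimization parameters out of range")
    return Optimizer(algorithm, lr, momentum, weight_decay)


def _check_model(model: object) -> None:
    """Raise ModelDefinitionError unless the model is a list of Param."""
    if not isinstance(model, list) or not all(isinstance(p, Param) for p in model):
        raise ModelDefinitionError("model must be a list of Param")


def step(optimizer: Optimizer, model: List[Param],
         callback: Optional[Callable[[List[Param]], None]] = None) -> List[Param]:
    """Learnable parameter update interface, called after backpropagation."""
    _check_model(model)
    if any(p.grad is None for p in model):
        raise GradientError("missing gradient for a learnable parameter")
    for i, p in enumerate(model):
        g = p.grad + optimizer.weight_decay * p.value
        v = optimizer.momentum * optimizer.velocity.get(i, 0.0) + g
        optimizer.velocity[i] = v
        p.value -= optimizer.lr * v
    if callback is not None:
        callback(model)  # optional fine-tuning pass over the intermediate model
    return model


def zero_grad(model: List[Param]) -> List[Param]:
    """Learnable parameter gradient zeroing interface."""
    _check_model(model)
    for p in model:
        p.grad = 0.0
    return model


# Usage: construct, update once, then clear the gradients.
opt = build_optimizer("sgd", lr=0.1, momentum=0.9)
model = [Param(1.0, grad=2.0)]
step(opt, model)
zero_grad(model)
```

In this sketch the three failure paths are ordinary Python exceptions, which is only one possible way to surface the exception information that the claims describe. The callback receives the intermediate model in place, so a caller could recompute a loss and adjust parameters inside it.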