Neural network model training method and device based on in-memory computing chip

By designing a configurable in-memory computing simulator and a quantization-aware training method combined with teacher network fine-tuning, the invention addresses the accuracy loss of neural networks deployed on in-memory computing chips, achieving higher inference accuracy and adaptability.

CN122242636A · Pending · Publication Date: 2026-06-19 · CHINA NANHU ACAD OF ELECTRONICS & INFORMATION TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: CHINA NANHU ACAD OF ELECTRONICS & INFORMATION TECH
Filing Date: 2024-12-18
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing neural network training methods are mainly based on the traditional von Neumann architecture, which leads to a decrease in accuracy when applied to in-memory computing chips, making it difficult to fully utilize the advantages of hardware performance and to flexibly configure parameters, thus failing to meet the needs of different types of in-memory computing chips.

Method used

Design a configurable in-memory computing simulator and combine quantization-aware training with teacher network fine-tuning. Fine-tune the student network with a quantization-aware strategy, using the quantization-aware network and the final teacher network to generate high-quality guidance signals that optimize the student network model. Combine cross-entropy loss and KL divergence into the final loss function for iterative training.

Benefits of technology

It improves the inference accuracy of neural networks on in-memory computing chips, adapts to the characteristics of different types of in-memory computing chips, fully leverages the potential of neural network models, and enhances the accuracy and adaptability of the models.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention provides a neural network model training method based on an in-memory computing chip. The method includes: inputting training data into a student network and a teacher network for gradient updates to obtain initial student and teacher network models; quantizing the teacher network model, loading the quantized model through an in-memory computing simulator to generate a hardware-optimized final teacher model; inputting a query set into the final teacher network model to generate a high-quality guidance signal with hardware-aware characteristics, used for distillation training to update the student network; inputting the query set into the student network and the final teacher network, calculating the KL divergence and combining it with classification loss to obtain the final loss function, and updating the student network. This method improves the inference accuracy of the student model through quantization and hardware optimization, narrows the accuracy gap between the student and teacher models, and enhances the inference performance of the model on the in-memory computing chip.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, specifically to a method and apparatus for training neural network models based on in-memory computing chips.

Background Technology

[0002] With the rapid development of artificial intelligence and deep learning, especially in fields such as image recognition and natural language processing, the demand for computing power is constantly increasing. The traditional von Neumann architecture, due to its separate computing and storage units, suffers from increasingly significant data transmission latency and power consumption issues, failing to meet the requirements for efficiently processing large-scale data. To address this problem, Compute-in-Memory (CIM) architecture, as an emerging computing paradigm, integrates computing and storage within a single chip, effectively reducing data transmission latency and improving computing efficiency.

[0003] However, existing neural network training methods are mainly based on the traditional von Neumann architecture, which leads to a decrease in accuracy when applied to in-memory computing chips, making it difficult to fully utilize the hardware's performance advantages. Current research attempts to introduce factors such as quantization error and noise to better simulate the working environment of in-memory computing chips. However, these methods often fail to fully reflect the complexity of in-memory computing chip architectures. Numerous factors affect the inference accuracy of neural networks, including the non-ideal characteristics of the device, architecture design, analog-to-digital converter (ADC) design, data storage methods, quantization strategies, and the accuracy of the input data.

[0004] Currently, most research focuses on one or a few factors, lacking in-depth exploration of the overall characteristics of in-memory computing architectures. Furthermore, traditional training methods often face problems of insufficient accuracy and low efficiency when ported to in-memory computing architectures, and they cannot flexibly configure parameters to meet the needs of different types of in-memory computing chips.

[0005] Therefore, developing efficient neural network training methods for in-memory computing architectures has become a key focus in this field. A configurable in-memory computing architecture simulator, combined with quantization-aware training and fine-tuning guided by the teacher network, makes it possible to better adapt to the characteristics of in-memory computing chips, thereby improving the inference accuracy of neural networks in practical applications.

Summary of the Invention

[0006] This invention aims to solve, at least in part, one of the technical problems described above. To that end, one objective of this invention is to provide a neural network model training method based on an in-memory computing chip that designs a configurable in-memory computing simulator and combines quantization-aware training with teacher network fine-tuning, so as to better adapt to the characteristics of the in-memory computing chip and improve the inference accuracy of the neural network.

[0007] A neural network training method based on an in-memory computing chip includes:

[0008] Obtaining a training set, wherein the training set includes a query set;

[0009] Inputting the training set into the student network and the teacher network and performing preliminary training with a traditional neural network training method to obtain an initial student model and a teacher network with a larger capacity, wherein the model capacity and accuracy of the teacher network are higher than those of the student network;

[0010] Loading the teacher network with an in-memory computing simulator to obtain the final teacher model after quantization and hardware optimization;

[0011] Inputting the query set into the final teacher network to generate a high-quality guidance signal with hardware characteristics, which is used to fine-tune the initial student network;

[0012] Fine-tuning the student network with a quantization-aware strategy: the query set is input into the student network and the simulator to obtain the first and second output results, the KL divergence between them is calculated, and the final loss function is constructed by combining the classification loss and the KL divergence to update the initial student network. The above steps are repeated to iteratively train the student network model until it balances accuracy against hardware adaptation loss.

[0013] Furthermore, fine-tuning the student network with a quantization-aware strategy further includes:

[0014] The final teacher network model output is used as a teacher guidance signal, such as a probability distribution, to update the parameters of the student model to reduce quantization error and increase hardware awareness.

[0015] During fine-tuning, the high-quality signals generated by the quantization-aware network and the final teacher network are used jointly to optimize the student model and complete its training. The training process uses cross-entropy loss and KL divergence as the final loss function.

[0016] Furthermore, the in-memory computing simulator includes:

[0017] The hardware configuration loading module, which loads user-defined chip architecture data and sets the weight storage precision and storage format, the numbers of word lines and bit lines, the ADC precision, and so on;

[0018] The data input module performs input data conversion, selects the data storage format, and inputs the data into the storage array;

[0019] The weight conversion module reshapes the weights of the four-dimensional neural network into a two-dimensional array suitable for the in-memory computing chip format, and calculates the weight sum. The converted weights correspond to the in-memory computing integrated cross array, completing the mapping from weights to the in-memory computing array. Each cross point of the in-memory computing array is a storage unit.

[0020] The calculation module simulates the in-memory cross array calculation mode, implements matrix multiplication calculation, performs convolution operations in neural networks, and processes the output size and bias terms. Since actual in-memory computing devices such as MRAM and RRAM have signal attenuation problems during the calculation process, a signal attenuation module has been added to the simulator. The calculation results are output through the bit line and enter the ADC processing module.

[0021] The ADC module truncates or scales the output of the calculation module to simulate quantization loss. The module supports at least two operating modes, including direct truncation mode and scaling mode.
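The calculation module described above can be illustrated with a minimal sketch of one array pass. The function name `crossbar_mac` and the data layout are hypothetical, and the text does not state how signal attenuation is modeled; the sketch assumes the summed bit-line value is simply scaled by `(1 - decay)`.

```python
from typing import List


def crossbar_mac(inputs: List[int], weights: List[List[int]], decay: float = 0.0) -> List[float]:
    """Simulate one pass through an in-memory computing array.

    inputs[r] drives word line r; weights[r][c] is the cell at the cross
    point of word line r and bit line c. Each bit line accumulates the
    products along its column.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    if not weights or len(inputs) != len(weights):
        raise ValueError("need one input per word line")
    n_bit_lines = len(weights[0])
    outputs = []
    for c in range(n_bit_lines):
        acc = sum(x * row[c] for x, row in zip(inputs, weights))
        # ASSUMPTION: attenuation scales the summed bit-line value by (1 - decay).
        outputs.append(acc * (1.0 - decay))
    return outputs
```

In a fuller simulator, the values returned here would be passed to the ADC module before being accumulated across arrays.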

[0022] Furthermore, the ADC module includes two operating modes: a direct truncation mode and a scaling processing mode.

[0023] In the direct truncation mode, the computation results obtained from the computation array are processed according to the following formula:

[0024] $Y_{res} = \mathrm{clip}(Y, -2^{a-1}, 2^{a-1})$

[0025] Here clip is the truncation function, Y is the array calculation result, and a is the ADC precision; the calculation result Y is truncated directly to the corresponding interval;

[0026] The scaling processing mode processes the computation results obtained from the computation array according to the following formula:

[0027]

[0028] Where b is the maximum data bit width of the calculated result Y, and a is the accuracy of the ADC.
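The two ADC modes can be sketched as follows. The truncation function follows the clip formula as written in the text. The scaling formula is not reproduced in the text, so `adc_scale` is only an assumption: it drops the `b - a` least significant bits so that a b-bit result fits into a bits. Both function names are hypothetical.

```python
def adc_truncate(y: int, a: int) -> int:
    """Direct truncation mode: clip(Y, -2^(a-1), 2^(a-1)), as written in the text."""
    bound = 2 ** (a - 1)
    return max(-bound, min(bound, y))


def adc_scale(y: int, a: int, b: int) -> int:
    """Scaling mode.

    ASSUMPTION: the result is rescaled by an arithmetic right shift of
    (b - a) bits, where b is the maximum bit width of Y and a is the ADC
    precision. The patent's actual scaling formula may differ.
    """
    shift = max(b - a, 0)
    return y >> shift
```

Truncation preserves small values exactly but saturates large ones, whereas the assumed scaling keeps the full range at the cost of resolution.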

[0029] Furthermore, the hardware configuration loading module supports various hardware features of in-memory computing architectures, including word line and bit line configuration, ADC accuracy configuration, and signal attenuation parameter configuration.

[0030] Furthermore, the final loss function is obtained according to the following formula:

[0031]

[0032] Where L represents the joint optimization function of distillation training and quantization-aware training; C is the number of categories; $y_i$ represents the probability distribution corresponding to the true labels; $p_i$ represents the predicted probability output by the model for category i; P(x) and Q(x) represent the output probabilities of the teacher network and the student network, respectively; and α and β are loss adjustment factors.
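The loss formula itself is not reproduced in the text. Based on the symbol definitions, one plausible reading is a weighted sum of cross-entropy and KL divergence, L = α · CE(y, p) + β · KL(P ‖ Q). The sketch below implements that assumed form; the function names, the use of the student output as p, and the small `eps` added for numerical stability are all assumptions.

```python
import math
from typing import Sequence


def cross_entropy(y_true: Sequence[float], p_pred: Sequence[float], eps: float = 1e-12) -> float:
    """Classification loss: -sum_i y_i * log(p_i) over the C categories."""
    return -sum(y * math.log(p + eps) for y, p in zip(y_true, p_pred))


def kl_divergence(p_teacher: Sequence[float], q_student: Sequence[float], eps: float = 1e-12) -> float:
    """KL(P || Q) with P from the teacher and Q from the student."""
    return sum(p * math.log((p + eps) / (q + eps)) for p, q in zip(p_teacher, q_student) if p > 0)


def joint_loss(y_true, teacher_probs, student_probs, alpha: float = 1.0, beta: float = 1.0) -> float:
    """ASSUMED form: alpha * cross-entropy + beta * KL divergence."""
    ce = cross_entropy(y_true, student_probs)
    kl = kl_divergence(teacher_probs, student_probs)
    return alpha * ce + beta * kl
```

With this form, α weights agreement with the true labels and β weights agreement with the hardware-aware teacher.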

[0033] Another aspect of the present invention provides a neural network training device based on an in-memory computing chip, comprising:

[0034] The acquisition module is used to acquire the training dataset, wherein the training dataset includes a query set;

[0035] A building module is used to build a student network module and a teacher network model, wherein the teacher network model includes an initial teacher network model and a teacher network model loaded by the simulator;

[0036] The first update module is used to input the training set into the student model and the teacher network model to perform gradient updates and obtain the initial network model.

[0037] The simulator module is used to load the teacher network into the hardware simulator and generate high-quality guidance signals to guide the student network in gradient updates.

[0038] The knowledge distillation module is used to input the training dataset into the student network module and the teacher network model loaded by the simulator to obtain corresponding first and second output results, and to obtain KL divergence based on the first and second output results, and to obtain the final loss function based on the KL divergence, so as to update the student network module according to the final loss function; and to iteratively train the student network model to obtain a trained student network model.

[0039] Furthermore, the knowledge distillation module fine-tunes the student model according to the following formula:

[0040]

[0041] Where L represents the joint optimization function of distillation training and quantization-aware training; C is the number of categories; $y_i$ represents the probability distribution corresponding to the true labels; $p_i$ represents the predicted probability output by the model for category i; and P(x) and Q(x) represent the probabilities output by the teacher network and the student network, respectively.

[0042] Another aspect of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps of the neural network training method based on a memory-computing chip described above.

[0043] Compared with the prior art, the present invention has the following beneficial effects:

[0044] By designing a configurable in-memory computing architecture simulator and combining quantization-aware training with fine-tuning guided by the teacher network, the neural network training method of this invention perceives the error characteristics of in-memory computing hardware more comprehensively during neural network inference. This allows better adaptation to the characteristics of in-memory computing chips and thereby improves the inference accuracy of neural networks in practical applications. The parameters of the in-memory computing chip simulator are adjustable, so users can set them according to their own in-memory computing chips, achieving targeted neural network model training, fully utilizing the potential of the neural network model, and improving accuracy.

Description of the Drawings

[0045] Figure 1 is a flowchart of a neural network training method based on an in-memory computing chip according to an embodiment of the present invention;

[0046] Figure 2 is an example diagram illustrating the workflow of the in-memory computing simulator according to an embodiment of the present invention;

[0047] Figure 3 is a framework diagram of a neural network training device based on an in-memory computing chip according to an embodiment of the present invention.

Detailed Implementation

[0048] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0049] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention.

[0050] This invention primarily addresses the poor compatibility between existing neural network training methods and in-memory computing (IMC) architecture chips. Specifically, most existing neural network training methods are based on traditional von Neumann architectures (such as CPUs or GPUs), which are not fully compatible with newer IMC architectures. Although IMC chips have great potential in terms of computing speed, efficiency, and energy efficiency, they are currently still in the experimental stage, supporting relatively low precision (typically below 8 bits). Furthermore, IMC arrays are based on analog computing, which suffers from signal attenuation and ADC accuracy limitations. Therefore, traditional neural network models and quantization models cannot fully adapt to and be compatible with IMC architectures. To solve this problem, this invention proposes combining quantization-aware training and distillation training to train the model, and optimizing the teacher model in the distillation process. By modeling the IMC array architecture, computation method, and ADC calculation through an IMC simulator, the loaded teacher model can fully perceive the hardware characteristics. Based on this, high-quality guidance signals are generated, enabling the student model to better simulate the behavior of the teacher model and fully perceive quantization errors and other hardware-related errors. This method improves the inference accuracy of the model on IMC chips.

[0051] This invention comprises two core modules: a quantization-aware distillation module and an in-memory computing simulator module. To achieve a close integration of quantization and knowledge distillation, an intuitive approach is to directly utilize knowledge distillation during quantization-aware training. However, while distillation can compress the model, the directly distilled quantized model fails to fully perceive hardware errors, and the difference in representational capabilities between the teacher model and the low-precision student model makes the simple combination less than satisfactory. To address this issue, this invention proposes using an in-memory computing hardware simulator to model inference errors. The loaded teacher model can fully perceive hardware characteristics and generate high-quality guidance signals. Through distillation training, the student network can better perceive hardware characteristics. This stage aims to transfer the knowledge and hardware characteristics of the teacher model to the low-precision student model, enabling it to achieve performance comparable to the teacher model while maintaining lower computational and storage requirements. Compared to traditional knowledge distillation methods, this invention allows the student network to better learn the behavior of the teacher model, not only improving model accuracy but also enabling the student network to learn hardware characteristics, thus enhancing the inference accuracy of the neural network on an in-memory computing chip. Furthermore, the method of this invention has good generalization ability and can adapt to different datasets and model types.

[0052] To better understand the above technical solutions, the following will provide a detailed explanation of the technical solutions in conjunction with the accompanying drawings and specific implementation methods.

[0053] Example 1

[0054] Figure 1 is a flowchart of a neural network training method based on an in-memory computing chip according to an embodiment of the present invention. The neural network training method based on an in-memory computing chip in this embodiment includes the following steps:

[0055] Step 1: Obtain the training dataset, which includes the query set.

[0056] It should be noted that the training dataset can be drawn from two widely recognized benchmark datasets, CIFAR-10 or ImageNet. CIFAR-10 is a classic classification dataset with far fewer images and classes than ImageNet, making it suitable for validating the effectiveness of the method. ImageNet is a large-scale dataset suitable for validating the model's adaptability to complex tasks. Alternatively, a task-specific target dataset can be collected to make the training more targeted.

[0057] Step 2: Construct student network models and teacher network models.

[0058] As one example, the student and teacher network models can be any of VGG-8, VGG-16, ResNet20, or ResNet32; this example does not impose a specific limitation. Inputting the dataset into the student and teacher networks yields a pre-trained, full-precision network model.

[0059] Step 3: The hardware simulator generates the teacher signal.

[0060] The quantization method is determined first; options include symmetric and asymmetric quantization. The hardware simulator then automatically performs post-training quantization on the teacher network model, fixing quantization parameters such as the scaling factor S and the zero point. The simulator's parameters are configured to match the in-memory computing hardware in use. After configuration, inputting the query set data generates hardware-aware output signals such as probability distributions, which are used to guide the training of the student model.

[0061] Configure the simulator parameters, specifically including:

[0062] a. Number of word lines and bit lines (64, 128, 256, etc.), which represents the size of the memory array core.

[0063] b. Analog-to-digital converter (ADC) and digital-to-analog converter (DAC) precision, selectable as 4, 6, or 8 bits.

[0064] c. Bit line precision, selectable in the range 1-8.

[0065] d. Data storage format, which can be either two's complement or sign-magnitude (original code).

[0066] e. Signal attenuation parameter `decay`, which represents the degree of current attenuation on each bit line, ranges from 0 to 1, and is used to simulate current attenuation.
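The parameters listed above could be gathered into a validated configuration object along the following lines. This is only a sketch: the class name `SimulatorConfig`, the field names, and the default values are hypothetical; only the permitted ranges are taken from the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulatorConfig:
    word_lines: int = 128            # a. array rows (64, 128, 256, ...)
    bit_lines: int = 128             # a. array columns
    adc_bits: int = 8                # b. ADC precision: 4, 6, or 8
    dac_bits: int = 8                # b. DAC precision: 4, 6, or 8
    bit_line_precision: int = 1      # c. selectable range 1-8
    storage_format: str = "twos_complement"  # d. or "sign_magnitude"
    decay: float = 0.0               # e. per-bit-line attenuation in [0, 1]

    def __post_init__(self) -> None:
        if self.word_lines <= 0 or self.bit_lines <= 0:
            raise ValueError("array dimensions must be positive")
        if self.adc_bits not in (4, 6, 8) or self.dac_bits not in (4, 6, 8):
            raise ValueError("ADC/DAC precision must be 4, 6, or 8 bits")
        if not 1 <= self.bit_line_precision <= 8:
            raise ValueError("bit line precision must be in 1..8")
        if self.storage_format not in ("twos_complement", "sign_magnitude"):
            raise ValueError("unknown storage format")
        if not 0.0 <= self.decay <= 1.0:
            raise ValueError("decay must lie in [0, 1]")
```

Rejecting out-of-range values at construction time keeps a mistyped chip parameter from silently skewing the simulated teacher outputs.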

[0067] Step 4: The student model performs forward propagation to generate output, and quantizes it in each forward propagation.

[0068] Step 5: Calculate the difference between the student model output and the teacher network model output (quantized by a hardware simulator). The loss function can combine cross-entropy loss and distillation loss (such as KL divergence). Calculate the gradient based on the distillation loss and quantization error, and update the student model parameters.

[0069] Step 6: After training is complete, the optimized student model is deployed on the in-memory computing chip for actual inference and performance verification.
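Step 4 quantizes the student's values in every forward pass. A common way to do this in quantization-aware training is "fake quantization" (quantize, then dequantize) combined with a straight-through gradient estimate. The text does not say which scheme is used, so the sketch below is an assumption, and both function names are hypothetical.

```python
def fake_quantize(x: float, scale: float, bits: int = 8) -> float:
    """Quantize x to a signed `bits`-bit grid and map it back to a float.

    The forward pass sees the rounding and clipping error that the
    fixed-point hardware would introduce.
    """
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale


def ste_grad(x: float, scale: float, bits: int, upstream: float) -> float:
    """Straight-through estimator (ASSUMED): pass the gradient unchanged
    where x lies inside the representable range, and block it where x
    was clipped."""
    lo = -(2 ** (bits - 1)) * scale
    hi = (2 ** (bits - 1) - 1) * scale
    return upstream if lo <= x <= hi else 0.0
```

In a framework with automatic differentiation, the same effect is usually obtained by attaching a custom backward rule to the rounding operation.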

[0070] Figure 2 is an example diagram illustrating the workflow of the in-memory computing simulator proposed in this embodiment. As shown in Figure 2, the workflow of this in-memory computing simulator includes the following steps:

[0071] Step 1: Initialize the teacher network

[0072] When training the teacher network using traditional methods, a model with a large capacity, such as ResNet32, can be selected.

[0073] Step 2: Complete the quantization of the initial teacher network model.

[0074] Since in-memory computing chips only support fixed-point computation with a precision of 8 bits or less, the initial teacher network model needs to be quantized. Symmetric or asymmetric quantization can be used, and the quantized objects include weights and activations, which are uniformly mapped to the data range supported by the in-memory computing chip. The quantization parameters, such as the scaling factor S and the zero point (ZeroPoint), are then fixed.
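The symmetric and asymmetric options can be sketched with the standard uniform quantization formulas. The text does not give the exact calibration rule, so deriving S and the zero point from the observed minimum and maximum is an assumption, as are the function names.

```python
from typing import List, Sequence, Tuple


def symmetric_params(values: Sequence[float], bits: int = 8) -> Tuple[float, int]:
    """Scale S chosen so the largest magnitude maps to qmax; zero point is 0."""
    max_abs = max(abs(v) for v in values)
    qmax = 2 ** (bits - 1) - 1
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return scale, 0


def asymmetric_params(values: Sequence[float], bits: int = 8) -> Tuple[float, int]:
    """Scale S and zero point mapping [min, max] (extended to include 0)
    onto the unsigned range [0, 2^bits - 1]."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    qmax = 2 ** bits - 1
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(values: Sequence[float], scale: float, zero_point: int,
             qmin: int, qmax: int) -> List[int]:
    """Uniform quantization: q = clamp(round(v / S) + zero_point, qmin, qmax)."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
```

Once computed on calibration data, S and the zero point stay fixed, which corresponds to the parameter-fixing step described above.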

[0075] Step 3: Load the quantized teacher network

[0076] The quantized weight data format is converted into a format supported by the in-memory computing chip, and then the mapping is completed.
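The mapping step can be illustrated by flattening a four-dimensional convolution weight into a two-dimensional matrix and cutting it into array-sized tiles. The layout chosen here (one row per input-channel and kernel position, one column per output channel) is an assumption, as are the function names.

```python
from typing import List

Matrix = List[List[int]]


def conv_weights_to_2d(w: List[List[List[List[int]]]]) -> Matrix:
    """Reshape w[out_ch][in_ch][kh][kw] into rows = in_ch*kh*kw (word lines)
    and columns = out_ch (bit lines)."""
    out_ch = len(w)
    matrix = []
    for c in range(len(w[0])):
        for i in range(len(w[0][0])):
            for j in range(len(w[0][0][0])):
                matrix.append([w[o][c][i][j] for o in range(out_ch)])
    return matrix


def column_sums(matrix: Matrix) -> List[int]:
    """Weight sum per bit line (the 'weight sum' mentioned for the module)."""
    return [sum(col) for col in zip(*matrix)]


def split_into_tiles(matrix: Matrix, word_lines: int, bit_lines: int) -> List[Matrix]:
    """Cut the 2D matrix into blocks that fit one word_lines x bit_lines array."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [row[c0:c0 + bit_lines] for row in matrix[r0:r0 + word_lines]]
        for r0 in range(0, rows, word_lines)
        for c0 in range(0, cols, bit_lines)
    ]
```

Each element of a tile then corresponds to one storage cell at a cross point of the array.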

[0077] Step 4: Configure simulator parameters to generate the final teacher network.

[0078] Based on the chip parameters, configure the simulator parameters, including ADC accuracy, DAC accuracy, number of bits per input, whether the signal is signed or unsigned, and signal attenuation rate. After configuration, generate the final teacher network model.

[0079] The ADC has two operating modes: truncation mode and scaling mode. Assuming the array calculation result is Y and the ADC precision is a bits, the ADC output $Y_{res}$ in truncation mode is as follows:

[0080] $Y_{res} = \mathrm{clip}(Y, -2^{a-1}, 2^{a-1})$

[0081] Here, clip is a truncation function that directly truncates the calculated result of Y within the corresponding interval.

[0082] Assuming the array calculation result is Y and the ADC precision is a bits, the ADC output $Y_{res}$ in scaling mode is as follows:

[0083]

[0084] Where b is the data bit width of the calculation result Y.

[0085] Step 5: Input data into the final teacher network model to generate high-quality guidance signals.

[0086] The query set data is input into the final teacher network model, and through inference, a final high-quality guidance signal is obtained to guide the fine-tuning of the student model.
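Generating the guidance signal can be sketched as running the query set through the simulator-loaded teacher and converting its outputs into probability distributions. Here `teacher_forward` is a hypothetical callable standing in for the final teacher network model, and the use of a plain softmax is an assumption.

```python
import math
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guidance_signals(teacher_forward: Callable[[object], Sequence[float]],
                     query_set: Sequence[object]) -> List[List[float]]:
    """Return one teacher probability distribution per query sample."""
    return [softmax(teacher_forward(x)) for x in query_set]
```

The returned distributions play the role of P(x) when the KL divergence against the student output is computed.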

[0087] Example 2

[0088] This embodiment provides a neural network training device based on an in-memory computing chip, including:

[0089] The acquisition module is used to acquire the training dataset, wherein the training dataset includes a query set;

[0090] A building module is used to build a student network module and a teacher network model, wherein the teacher network model includes an initial teacher network model and a teacher network model loaded by the simulator;

[0091] The first update module is used to input the training set into the student model and the teacher network model to perform gradient updates and obtain the initial network model.

[0092] The simulator module is used to load the teacher network into the hardware simulator and generate high-quality guidance signals to guide the student network in gradient updates.

[0093] The knowledge distillation module is used to input the training dataset into the student network module and the teacher network model loaded by the simulator to obtain corresponding first and second output results, and to obtain KL divergence based on the first and second output results, and to obtain the final loss function based on the KL divergence, so as to update the student network module according to the final loss function; and to iteratively train the student network model to obtain a trained student network model.

[0094] The knowledge distillation module fine-tunes the student model according to the following formula:

[0095]

[0096] Where L represents the joint optimization function of distillation training and quantization-aware training; C is the number of categories; $y_i$ represents the probability distribution corresponding to the true labels; $p_i$ represents the predicted probability output by the model for category i; and P(x) and Q(x) represent the probabilities output by the teacher network and the student network, respectively.

[0097] Another aspect of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps of the neural network training method based on a memory-computing chip described above.

[0098] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Furthermore, any references to memory, storage, databases, or other media used in the embodiments provided in this application can include non-volatile and / or volatile memory.

[0099] Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

[0100] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0101] The embodiments described above are merely illustrative of several implementations of the present invention, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these modifications and improvements all fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the appended claims.

Claims

1. A neural network training method based on an in-memory computing chip, comprising: obtaining a training set, wherein the training set includes a query set; inputting the training set into the student network and the teacher network and performing preliminary training with a traditional neural network training method to obtain an initial student model and a teacher network with a larger capacity, wherein the model capacity and accuracy of the teacher network are higher than those of the student network; loading the teacher network with an in-memory computing simulator to obtain the final teacher model after quantization and hardware optimization; inputting the query set into the final teacher network to generate a high-quality guidance signal with hardware characteristics, which is used to fine-tune the initial student network; fine-tuning the student network with a quantization-aware strategy, wherein the query set is input into the student network and the simulator to obtain the first and second output results, the KL divergence is calculated, and the final loss function is constructed by combining the classification loss and the KL divergence to update the initial student network; and repeating the above steps to iteratively train the student network model until it achieves a balance between accuracy and hardware adaptation loss.

2. The method according to claim 1, characterized in that fine-tuning the student network with a quantization-aware strategy further includes: using the final teacher network model output as a teacher guidance signal, such as a probability distribution, to update the parameters of the student model to reduce quantization error and increase hardware awareness; and, during fine-tuning, using the high-quality signals generated by the quantization-aware network and the final teacher network jointly to optimize the student model and complete its training, wherein the training process uses the classification loss and KL divergence as the final loss function.

3. The method according to claim 1, characterized in that the in-memory computing simulator includes: a hardware configuration loading module, which loads user-defined chip architecture data and sets the weight storage precision and storage format, the numbers of word lines and bit lines, the ADC precision, and so on; a data input module, which performs input data conversion, selects the data storage format, and inputs the data into the storage array for multiplication and addition calculations; a weight conversion module, which reshapes the four-dimensional neural network weights into a two-dimensional array suitable for the in-memory computing chip format and calculates the weight sum, wherein the converted weights correspond to the in-memory computing cross array, completing the mapping from weights to the in-memory computing array, and each cross point in the in-memory computing array is regarded as a storage unit; a calculation module, which simulates the in-memory cross array calculation mode, implements matrix multiplication, performs the convolution operations of neural networks, and processes the output size and bias terms, wherein a signal attenuation module is added and the calculation results are output through the bit lines and enter the ADC processing module; and an ADC module, which truncates or scales the output of the calculation module to simulate quantization loss and supports at least two operating modes, including a direct truncation mode and a scaling mode.

4. The method according to claim 3, characterized in that the ADC module includes two operating modes: a direct truncation mode and a scaling processing mode. In the direct truncation mode, the computation results obtained from the computation array are processed according to the following formula: $ADC_{res} = \mathrm{clip}(Y, -2^{a-1}, 2^{a-1})$, where clip is the truncation function, Y is the array calculation result, and a is the ADC precision, so that the calculation result Y is truncated directly to the corresponding interval. In the scaling processing mode, the computation results obtained from the computation array are processed according to the following formula: [formula missing in the source text], where b is the maximum data bit width of the calculation result Y, and a is the ADC precision.
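The two ADC modes of claim 4 can be sketched as follows. The truncation mode clips Y to the interval from -2^(a-1) to 2^(a-1) as the claim states. The scaling formula is not reproduced in the text, so the version here, which divides by 2^(b-a) to fit a b-bit result into a bits, is only an assumption chosen for illustration; the function names are hypothetical.

```python
def adc_truncate(y, a):
    """Direct truncation mode: clip y to [-2**(a-1), 2**(a-1)]."""
    bound = 2 ** (a - 1)
    return max(-bound, min(bound, y))


def adc_scale(y, a, b):
    """Scaling mode (ASSUMED form, not taken from the claim).

    Drops the (b - a) least significant bits by floor division so that a
    result of bit width b fits into an ADC of precision a.
    """
    shift = max(b - a, 0)
    return y // (2 ** shift)


print(adc_truncate(300, 8))   # 128
print(adc_truncate(-300, 8))  # -128
print(adc_scale(300, 8, 10))  # 75
```

Truncation preserves small values exactly but saturates large ones, while scaling keeps the full range at the cost of resolution; which trade-off is preferable depends on the distribution of the array outputs.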

5. The method according to claim 3, characterized in that the hardware configuration loading module supports various hardware features of in-memory computing architectures, including word line and bit line configuration, ADC precision configuration, and signal attenuation parameter configuration.
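One possible shape for the configuration loaded by the hardware configuration loading module is sketched below. The class name, field names, default values, and validation rules are all assumptions made for illustration; the claims only name the configurable features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CimHardwareConfig:
    """Hypothetical container for user-defined chip parameters."""
    word_lines: int = 128     # rows of the crossbar array (assumed default)
    bit_lines: int = 128      # columns of the crossbar array (assumed default)
    weight_bits: int = 8      # weight storage precision (assumed default)
    adc_bits: int = 8         # ADC precision (assumed default)
    attenuation: float = 1.0  # signal attenuation factor, 1.0 = no loss

    def __post_init__(self):
        sizes = (self.word_lines, self.bit_lines, self.weight_bits, self.adc_bits)
        if min(sizes) <= 0:
            raise ValueError("array sizes and bit widths must be positive")
        if not 0.0 < self.attenuation <= 1.0:
            raise ValueError("attenuation must lie in (0, 1]")


def load_config(user_values):
    """Build a validated configuration from a user-supplied dictionary."""
    return CimHardwareConfig(**user_values)


config = load_config({"word_lines": 64, "adc_bits": 6, "attenuation": 0.95})
print(config.word_lines, config.bit_lines, config.adc_bits)  # 64 128 6
```

Validating at load time keeps a badly specified chip description from silently producing misleading simulation results.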

6. The method according to claim 2, characterized in that the final loss function is obtained using the following formula: [formula missing in the source text], where L represents the joint optimization function of distillation training and quantization-aware training; C is the number of categories; $y_i$ represents the probability distribution corresponding to the true labels; $p_i$ represents the predicted probability output by the model for category i; P(x) and Q(x) represent the output probabilities of the teacher network and the student network, respectively; and α and β are loss adjustment factors.
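A joint loss consistent with the symbols defined in claim 6 can be sketched as below. The sketch assumes the form L = α · CE(y, p) + β · KL(P ‖ Q), with the cross-entropy summed over the C categories; the exact formula is not reproduced in the text, so the pairing of α with the cross-entropy term and β with the KL term is an assumption, as are the function names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, p):
    """Classification loss: -sum_i y_i * log(p_i) over the C categories."""
    return -sum(y * math.log(pi) for y, pi in zip(y_true, p) if y > 0)


def kl_divergence(p_teacher, q_student):
    """KL(P || Q) = sum_i P_i * log(P_i / Q_i)."""
    return sum(
        p * math.log(p / q) for p, q in zip(p_teacher, q_student) if p > 0
    )


def joint_loss(y_true, student_logits, teacher_logits, alpha, beta):
    """ASSUMED form: alpha * CE(y, Q) + beta * KL(P || Q)."""
    q = softmax(student_logits)  # student output Q(x)
    p = softmax(teacher_logits)  # teacher output P(x)
    return alpha * cross_entropy(y_true, q) + beta * kl_divergence(p, q)


# When teacher and student agree, only the cross-entropy term remains.
print(joint_loss([1, 0], [0.0, 0.0], [0.0, 0.0], alpha=1.0, beta=1.0))
```

Raising β relative to α makes the student follow the teacher's soft distribution more closely, while raising α weights the true labels more heavily.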

7. A neural network training device based on a memory computing chip, characterized in that it includes: an acquisition module, used to acquire the training dataset, wherein the training dataset includes a query set; a building module, used to build a student network model and a teacher network model, wherein the teacher network model includes an initial teacher network model and a teacher network model loaded by the simulator; a first update module, used to input the training set into the student network model and the teacher network model to perform gradient updates and obtain the initial network model; a simulator module, used to load the teacher network into the hardware simulator and generate high-quality guidance signals that guide the gradient updates of the student network; and a knowledge distillation module, used to input the training dataset into the student network model and the teacher network model loaded by the simulator to obtain corresponding first and second output results, to obtain the KL divergence from the first and second output results, to obtain the final loss function from the KL divergence, to update the student network model according to the final loss function, and to iteratively train the student network model to obtain a trained student network model.
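The iterative update performed by the knowledge distillation module can be illustrated with a toy example. Here the "student" is reduced to a single logit vector and the teacher probabilities stand in for the output of the teacher loaded in the simulator; both simplifications, the loss form α · CE + β · KL(P ‖ Q), and all names are assumptions. For that loss, the gradient with respect to the student logits is α(q − y) + β(q − P), where q is the student softmax output.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_step(student_logits, y_true, p_teacher, alpha, beta, lr):
    """One gradient-descent step on alpha * CE(y, q) + beta * KL(P || q).

    Uses the closed-form gradient with respect to the logits:
    alpha * (q - y) + beta * (q - P).
    """
    q = softmax(student_logits)
    grad = [
        alpha * (qi - yi) + beta * (qi - pi)
        for qi, yi, pi in zip(q, y_true, p_teacher)
    ]
    return [z - lr * g for z, g in zip(student_logits, grad)]


def train(student_logits, y_true, p_teacher,
          alpha=1.0, beta=1.0, lr=0.5, steps=500):
    """Iterate the update and return the final student distribution."""
    for _ in range(steps):
        student_logits = distillation_step(
            student_logits, y_true, p_teacher, alpha, beta, lr
        )
    return softmax(student_logits)


# Toy stand-ins: a one-hot label and a soft teacher distribution.
student = train([0.0, 0.0, 0.0], [1, 0, 0], [0.7, 0.2, 0.1])
print([round(q, 3) for q in student])  # [0.85, 0.1, 0.05]
```

With equal weights the student settles on the average of the label and the teacher distribution, which shows how the two guidance signals are blended by the loss adjustment factors.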

8. The neural network training device based on a memory computing chip according to claim 7, characterized in that the knowledge distillation module fine-tunes the student model according to the following formula: [formula missing in the source text], where L represents the joint optimization function of distillation training and quantization-aware training; C is the number of categories; $y_i$ represents the probability distribution corresponding to the true labels; $p_i$ represents the predicted probability output by the model for category i; and P(x) and Q(x) represent the probabilities output by the teacher network and the student network, respectively.

9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps of the neural network training method based on a memory computing chip as described in any one of claims 1-6.