Lightweight lithium battery capacity estimation method based on hardware constraint-driven knowledge distillation

By converting one-dimensional time-domain data of lithium batteries into two-dimensional aging feature maps and using a pre-trained model for knowledge distillation, a lightweight student model is designed, which solves the problems of real-time performance and resource constraints in lithium battery capacity estimation and achieves efficient capacity estimation.

CN122283451A · Pending · Publication Date: 2026-06-26 · NORTHWESTERN POLYTECHNICAL UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NORTHWESTERN POLYTECHNICAL UNIV
Filing Date
2026-03-20
Publication Date
2026-06-26

Figure CN122283451A_ABST

Abstract

This application belongs to the field of lithium battery capacity estimation technology. This application provides a lightweight capacity estimation method for lithium batteries based on hardware-constrained knowledge distillation. The embodiments of this disclosure effectively solve the problem that complex deep learning models are difficult to deploy on resource-constrained computing platforms. By converting irregular partial charging data into a two-dimensional aging feature map, the proposed method successfully captures the aging mechanism of lithium batteries without relying on complete cycle data, thereby ensuring robust capacity estimation even under conditions of incomplete charging data.

Description

Technical Field

[0001] This disclosure relates to the field of lithium battery capacity estimation technology, and in particular to a lightweight lithium battery capacity estimation method based on hardware constraint-driven knowledge distillation.

Background Technology

[0002] In recent years, lithium-ion batteries have played a crucial role in many fields, and their reliability and lifespan have become key factors. However, as the number of charge-discharge cycles increases, battery capacity decays, which can cause system failures and safety hazards. Therefore, accurate monitoring of battery capacity decay is of great significance.

[0003] State of Health (SOH) reflects the aging condition of a battery and is defined as the ratio of the battery's capacity during the most recent charge-discharge cycle to its initial capacity. Since it is difficult to directly measure battery capacity, the estimation process requires the use of monitorable signals. Several capacity estimation methods have been proposed, categorized into model-based and data-driven approaches.

[0004] Model-based methods construct physical and mathematical models based on the internal mechanisms and structure of batteries, and combine these with parameter estimation and filtering algorithms to predict remaining lifespan. Common models include electrochemical, equivalent circuit, and Brownian motion models. Although their accuracy has been proven, they suffer from problems such as inaccurate modeling, unstable parameter estimation, and poor model generalization ability. Furthermore, modeling requires a large amount of prior conditions and experimental data, which varies greatly between different power systems, making data acquisition difficult and limiting their wide applicability.

[0005] With the improvement of hardware parallel computing capabilities, model-based methods have become increasingly difficult to advance, making data-driven methods a potential solution. Compared to model-based methods, data-driven methods estimate capacity solely based on measurement data, establishing a mapping relationship between capacity decay and features, without requiring complex analysis and modeling. Examples include the methods of Liu et al. and Jiang et al., but these require complete charge-discharge cycles or additional impedance measurements, making online estimation impossible. Other researchers have input battery data into deep learning models to automatically extract features, such as the methods of You et al., Shen et al., and Cui et al. While data-driven methods can establish reliable mappings, they require high-quality datasets and carefully designed models.

[0006] The key to data-driven methods is extracting effective features from battery operating data; inappropriate features can negatively impact estimation. For example, Wang et al. and Yang et al. selected different features for training. Some studies have attempted to use images as input to deep networks for battery health estimation. For instance, Li et al. organized charging data into 15×15 segments; feeding these two-dimensional data sequences to convolutional neural networks improved battery capacity estimation performance. Zhao et al. converted multi-dimensional aging features into two-dimensional grayscale images and estimated capacity using specific charging data segments. Zhang et al. expanded the data feature space using Gramian angular fields, encoded the area of the discharge curve, and then extracted features using a Bayesian CNN. These methods improve estimation accuracy, but deploying high-precision models in real-world battery management systems (BMS) is challenging: most deep learning models have computational and memory requirements that far exceed the capacity of microprocessors, making local deployment for online estimation difficult. Cloud-based solutions can use remote servers for computation, but they suffer from data transmission latency and rely on stable networks. Techniques such as knowledge distillation, pruning, and quantization provide pathways for model compression, but involve trade-offs in estimation accuracy.

[0007] Based on the above analysis, most current studies still have the following limitations in the extraction of lithium battery capacity degradation features and the design of estimation models: 1) The training data used requires complete charge-discharge cycles or additional impedance measurements, and depends on specific tests in a laboratory environment. This data is difficult to collect and process in real time to achieve online capacity estimation.

[0008] 2) Traditional manual feature extraction methods rely heavily on the prior knowledge of domain experts and are at risk of information loss. In addition, deep learning models designed based on specific features generally need to be trained from scratch on limited domain data, resulting in extremely high data dependence, computing power, and time costs.

[0009] 3) While cloud-based deployment of deep learning models can improve estimation accuracy, it cannot meet the requirements of real-time, stable, and low-latency network communication. Model compression techniques can create sufficiently efficient models that can run on resource-constrained microcontrollers, but most existing compression techniques can only compress the model to a certain ratio and do not simultaneously consider the available resources of edge devices, the desired accuracy, and the average processing time of the task.

[0010] Therefore, it is necessary to improve one or more of the problems existing in the above-mentioned related technical solutions.

[0011] It should be noted that this section is intended to provide background or context for the technical solutions of this disclosure as set forth in the claims. The description herein does not constitute an admission that it is prior art simply because it is included in this section.

Summary of the Invention

[0012] The purpose of this disclosure is to provide a lightweight capacity estimation method for lithium batteries based on hardware constraint-driven knowledge distillation, thereby overcoming at least to some extent one or more problems caused by the limitations and defects of related technologies.

[0013] According to embodiments of this disclosure, a lightweight capacity estimation method for lithium batteries based on hardware constraint-driven knowledge distillation is provided, comprising: acquiring one-dimensional time-domain data generated during battery charging; converting the one-dimensional time-domain data into a size-normalized two-dimensional aging feature map; inputting the two-dimensional aging feature map into a pre-trained teacher model, wherein the teacher model includes a conv1 input layer, a convolutional backbone, and a regression output layer; based on the hardware constraints of the target edge device, designing and generating a lightweight student model that meets the hardware constraints, wherein the hardware constraints include at least the maximum available memory and the average processing time limit; under the guidance of the teacher model and with the goal of minimizing a combined loss function, training the lightweight student model through knowledge distillation to obtain a trained lightweight student model; and using the trained student model that meets the deployment conditions of the target edge device to estimate the capacity of the battery to be estimated, obtaining the capacity estimation result.

[0014] Furthermore, the step of converting one-dimensional time-domain data into a size-normalized two-dimensional aging feature map includes: selecting, from the one-dimensional time-domain data, feature data segments whose correlation with battery capacity aging is higher than a preset threshold; performing a polynomial fit with voltage as the independent variable and time as the dependent variable to obtain the functional relationship between charging voltage and time; uniformly sampling N points within a preset voltage range to generate a standard voltage sequence; obtaining the corresponding one-dimensional time series from the functional relationship between charging voltage and time and the standard voltage sequence; constructing a two-dimensional voltage-time feature matrix and a voltage-space feature matrix from the one-dimensional time series, wherein each element of the voltage-time feature matrix is determined by the logarithmic average of the i-th and j-th elements of the one-dimensional time series, and each element of the voltage-space feature matrix is calculated as the ratio of the time difference to the index difference between the i-th and j-th sampling points of the time series; and fusing the voltage-time feature and the voltage-space feature to generate a hybrid feature map, which serves as the two-dimensional aging feature map.

[0015] Furthermore, the processing procedure for the trained teacher model is as follows: channel adaptation and weight reconstruction are performed on the conv1 layer: let the weights of the conv1 layer be $W_{\text{conv1}} \in \mathbb{R}^{64 \times 3 \times 7 \times 7}$, where 64 is the number of output channels, 3 is the number of input channels, and 7×7 is the kernel size; the mean of $W_{\text{conv1}}$ is calculated along the input channel dimension to generate new weights $W'_{\text{conv1}} \in \mathbb{R}^{64 \times 1 \times 7 \times 7}$ adapted to single-channel input; the two-dimensional aging feature map is normalized and standardized to obtain the preprocessed input feature map; the input feature map is processed through the channel-adapted conv1 layer and a convolutional backbone consisting of four residual stages to extract high-level feature representations step by step; and the high-level feature representation is mapped by the regression output layer to output the initial battery capacity estimation result.

[0016] Furthermore, the step of designing and generating a lightweight student model that satisfies the hardware constraints of the target edge device includes: taking the target device's maximum available memory $M_{\max}$ and the maximum allowed processing time $T_{\max}$ for the task as constraints, the memory usage $M$ and the execution time $T$ are defined as:

[0017] $$M = \sum_{\theta \in \Theta} |\theta| \cdot b, \qquad T = \sum_{l=1}^{L} \alpha \cdot \mathrm{FLOPs}_l$$

[0018] where $\Theta$ represents the set of all parameters within the teacher model; $|\theta|$ is the number of elements in tensor $\theta$; $b$ is the size in bytes of a single element; $L$ represents the total number of layers in the teacher model; $\mathrm{FLOPs}_l$ is the number of floating-point operations of the $l$-th layer; and $\alpha$ is the time coefficient. Based on memory usage and execution time, a multi-constraint optimization problem is constructed, with the goal of minimizing overall resource consumption:

[0019] The problem is expressed in terms of the lightweight student model, the loss of the teacher model on the training set, and the preset performance threshold. By iteratively sparsifying connections using structured dynamic Dropout, a sparse model that meets the preset performance loss threshold constraint is obtained. Weight decomposition is then applied to the computationally intensive layers of the sparse model to further compress the model parameters and computational load until the memory usage and execution time of the student model simultaneously meet the hardware constraints, resulting in a lightweight student model that satisfies the hardware constraints.
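As a rough illustration of the two resource estimates described above, the sketch below sums parameter bytes and per-layer floating-point operations and checks both against device limits. It assumes the memory estimate is the total element count times the element size and that the time estimate uses one time coefficient for every layer. The function names, layer shapes, FLOP counts, coefficient, and limits are hypothetical and are not taken from the application.

```python
def memory_bytes(param_shapes, bytes_per_element=4):
    """Memory estimate: sum over parameter tensors of (element count * element size)."""
    total = 0
    for shape in param_shapes:
        count = 1
        for dim in shape:
            count *= dim
        total += count * bytes_per_element
    return total


def execution_time(flops_per_layer, time_coeff):
    """Time estimate: sum over layers of (time coefficient * layer FLOPs)."""
    return sum(time_coeff * flops for flops in flops_per_layer)


def meets_constraints(mem, time_s, max_mem, max_time):
    """Both hardware constraints must hold at the same time."""
    return mem <= max_mem and time_s <= max_time


# Hypothetical toy model: one 3x3 convolution and one fully connected layer.
shapes = [(16, 1, 3, 3), (16,), (10, 16), (10,)]
flops = [1_000_000, 50_000]            # assumed per-layer FLOP counts
mem = memory_bytes(shapes)             # 330 elements * 4 bytes
time_s = execution_time(flops, 1e-8)   # assumed 1e-8 seconds per FLOP
fits = meets_constraints(mem, time_s, max_mem=64 * 1024, max_time=0.05)
```

In practice the time coefficient would have to be measured on the target microcontroller, since it depends on the clock rate and on how each layer type is executed.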

[0020] Furthermore, the step of iteratively sparsifying connections using structured dynamic Dropout to obtain a sparse model that satisfies the preset performance loss threshold constraint includes: defining an importance score $s_i$ for the weight vector $w_i$ corresponding to the $i$-th output channel, using the L2 norm of the weights as the measure of importance:

[0021] $$s_i = \lVert w_i \rVert_2 = \sqrt{\sum_{k} w_{i,k}^2}$$ where $w_i$ denotes the vectorized weights associated with the $i$-th output channel, $k$ is the element index within $w_i$, and $s_i$ is used as the channel importance score.

[0022] In the $t$-th iteration, a structured binary mask $M^{(t)}$ is generated based on the dynamic Dropout rate $p_t$. All channels are sorted by importance score from low to high, and a threshold $\tau_t$ is selected; this threshold corresponds to the importance score at the position in the sorted order determined by $p_t$. Mask positions corresponding to weights whose importance score is lower than or equal to this threshold are set to 0, indicating that the connection will be removed; the remaining mask positions are set to 1, indicating that the connection will be retained. For each element $w_{ij}$ of the weight tensor $W$, which denotes the connection weight between the $i$-th neuron and the $j$-th input, the corresponding mask value $m_{ij}$ is generated as:

[0023] $$m_{ij} = \begin{cases} 0, & s_i \le \tau_t \\ 1, & \text{otherwise} \end{cases}$$ where $\tau_t$ is the importance threshold determined by the Dropout rate $p_t$. The network weights are then updated as $W \leftarrow W \odot M^{(t)}$.

[0024] After applying the mask, the sparsified model is trained and its performance loss is evaluated. If the loss is lower than the preset threshold, the iteration terminates; otherwise, the Dropout rate is updated based on the current number of connections in the network and the number of connections after sparsification:

[0025] The update uses a regulating factor. The iteration continues until the loss threshold is met or the maximum number of iterations is reached. The final output is a sparse model that meets the initial performance requirements.
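The mask generation step described above can be sketched as follows. This is a minimal illustration and not the application's implementation: it scores each output channel by the L2 norm of its weights, sorts the scores, and zeroes every channel whose score is at or below the threshold. The rule that the threshold sits at position floor(rate × channels) in the sorted order is an assumption, as are the function name and the toy weights. The Dropout-rate update is not shown because its formula is not recoverable from the text.

```python
import numpy as np


def channel_mask(weight, drop_rate):
    """Build a structured binary mask over output channels (axis 0 of `weight`).

    Channels are scored by the L2 norm of their flattened weights. The number
    of channels removed, floor(drop_rate * channels), is an assumed rule.
    """
    flat = weight.reshape(weight.shape[0], -1)
    scores = np.sqrt((flat ** 2).sum(axis=1))
    num_drop = int(np.floor(drop_rate * scores.size))
    mask = np.ones_like(weight)
    if num_drop > 0:
        # Threshold is the score of the num_drop-th least important channel.
        threshold = np.sort(scores)[num_drop - 1]
        mask[scores <= threshold] = 0.0
    return mask, scores


# Hypothetical layer with 4 output channels and 2 inputs.
weight = np.array([[3.0, 4.0], [0.1, 0.2], [1.0, 0.0], [6.0, 8.0]])
mask, scores = channel_mask(weight, drop_rate=0.5)
pruned = weight * mask  # element-wise masking removes the two weakest channels
```

Because the mask is applied per output channel, whole rows of the weight tensor become zero, which is what makes the sparsity structured rather than scattered across individual weights.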

[0026] Furthermore, performing weight decomposition on the computationally intensive layers of the sparse model to further compress model parameters and computational cost, until the student model's memory usage and execution time simultaneously meet the hardware constraints, includes the following steps: a convolutional layer with weight tensor $W \in \mathbb{R}^{C_{out} \times C_{in} \times K \times K}$ is decomposed into two consecutive convolutional layers: a pointwise convolutional layer with kernel size $1 \times 1$, whose weights $W_1 \in \mathbb{R}^{r \times C_{in} \times 1 \times 1}$ are used for dimensionality reduction, and a convolutional layer with kernel size $K \times K$, whose weights $W_2 \in \mathbb{R}^{C_{out} \times r \times K \times K}$ are used for feature extraction and dimensionality increase. The number of parameters is reduced from $C_{out} C_{in} K^2$ to $r C_{in} + C_{out} r K^2$, where $r$ is the intermediate rank, much smaller than $C_{in}$ and $C_{out}$; $C_{in}$ and $C_{out}$ are the numbers of input channels and output channels of the convolutional layer, respectively; and $K$ is the size of the spatial kernel. A fully connected layer with weight matrix $W \in \mathbb{R}^{m \times n}$ is approximated as a product of low-rank matrices using singular value decomposition:

$$W \approx U_k \Sigma_k V_k^{\top}$$

[0027] where $k$ is the number of singular values to be retained; $U_k$ is the left singular vector matrix; $\Sigma_k$ is the diagonal matrix of singular values; and $V_k^{\top}$ is the transpose of the right singular vector matrix. The fully connected layer is replaced with two consecutive fully connected layers whose weights are formed from these factors, reducing its parameter count from $mn$ to $k(m+n)$. In each iteration, weight decomposition is applied to several layers of the sparse model and the resulting memory usage and execution time are evaluated, until the model's resource consumption simultaneously satisfies the memory constraint and the time constraint, yielding a lightweight student model that meets the requirements.
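The fully connected case can be illustrated with a truncated singular value decomposition. The sketch below is an assumption-laden example rather than the application's code: it folds the singular values into the first of the two new layers (the text does not say how the factors are split), and the function name, matrix sizes, and rank are hypothetical.

```python
import numpy as np


def low_rank_fc(weight, rank):
    """Split an (m x n) fully connected weight into two factors via truncated SVD.

    Returns w_first (rank x n), applied to the input first, and
    w_second (m x rank), applied second, so w_second @ w_first approximates weight.
    Folding the singular values into w_first is an assumed choice.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    w_first = s[:rank, None] * vt[:rank]
    w_second = u[:, :rank]
    return w_first, w_second


# Hypothetical 8 x 6 weight matrix that is exactly rank 2.
rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
w_first, w_second = low_rank_fc(weight, rank=2)

params_before = weight.size                 # m * n
params_after = w_first.size + w_second.size  # rank * (m + n)
reconstruction = w_second @ w_first
```

Here the toy matrix has rank 2, so keeping two singular values reproduces it up to floating-point error. For a trained layer the retained rank trades approximation error against the parameter saving.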

[0028] Furthermore, guided by the teacher model, and with the goal of minimizing the combined loss function, the lightweight student model is trained through knowledge distillation to obtain the trained lightweight student model. This process includes: Freeze the parameters of the teacher model; The first few layers of the teacher model and the student model are shared; A lightweight student model is trained on a two-dimensional aging feature map using a combined loss function that includes task loss, distillation loss, and attention loss to obtain a well-trained lightweight student model.

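[0029] Furthermore, the combined loss function is:

$$L_{\text{total}} = \lambda_1 L_{\text{task}} + \lambda_2 L_{\text{KD}} + \lambda_3 L_{\text{AT}}$$

[0030] where $L_{\text{task}}$ is the task loss, $L_{\text{KD}}$ is the knowledge distillation loss, $L_{\text{AT}}$ is the attention transfer loss, and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the first, second, and third weight hyperparameters, respectively.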
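A weighted sum of the three losses might look like the sketch below. The text names the three terms but does not define them, so every definition here is an assumption: mean squared error against the true capacity for the task loss, mean squared error against the teacher's output for the distillation loss, and a squared distance between normalized spatial attention maps (channel-wise mean of squared activations) for the attention transfer loss. The function names, weights, and inputs are hypothetical.

```python
import numpy as np


def attention_map(features):
    """Spatial attention from a (C, H, W) feature map: mean of squared
    activations over channels, flattened and L2-normalized (assumed form)."""
    att = (features ** 2).mean(axis=0).ravel()
    return att / (np.linalg.norm(att) + 1e-12)


def combined_loss(y_true, y_student, y_teacher, f_student, f_teacher,
                  w_task=1.0, w_kd=0.5, w_at=0.1):
    """Weighted sum of task, distillation, and attention transfer losses."""
    task = np.mean((y_student - y_true) ** 2)       # assumed: MSE to labels
    distill = np.mean((y_student - y_teacher) ** 2)  # assumed: MSE to teacher
    attention = np.sum((attention_map(f_student) - attention_map(f_teacher)) ** 2)
    return w_task * task + w_kd * distill + w_at * attention


# Hypothetical single sample; identical feature maps give zero attention loss.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 4))
loss = combined_loss(np.array([1.0]), np.array([1.2]), np.array([1.1]),
                     features, features)
```

Collapsing the channel axis means the student and teacher feature maps only need matching spatial size, not matching channel counts, which matters once the student has been pruned.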

[0031] The technical solutions provided by the embodiments of this disclosure may include the following beneficial effects: First, the lightweight capacity estimation method for lithium batteries based on hardware constraint-driven knowledge distillation, as described above, proposes a two-dimensional aging feature map construction method, enhancing the model's application potential. This method converts easily collectable one-dimensional time-domain data from the charging process of the lithium battery into two-dimensional image features, eliminating reliance on complete charge-discharge cycle data and effectively solving the problem of incomplete data under operating conditions, thus providing a reliable data foundation for estimating lithium battery capacity. Second, it introduces a deep learning method based on transfer learning, significantly reducing the cost and data dependence of model training. By transforming the battery aging state estimation problem into a computer vision task, it directly uses a teacher model that has been fully pre-trained on a large image dataset. This leverages the general feature extraction capabilities of the pre-trained model and avoids the need to design and train a network from scratch for specific battery data, greatly reducing reliance on large-scale labeled data and the computational and time costs of model training. Third, it constructs a resource-constrained lightweight model design and training framework. Based on the maximum available memory and average processing time limitations of edge devices, a lightweight student model that meets hardware constraints is designed. A pre-trained teacher model is then used for guided training to generate a lightweight model that meets hardware constraints while ensuring accuracy.

Description of the Drawings

[0032] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure. It is obvious that the drawings described below are merely some embodiments of this disclosure, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort.

[0033] Figure 1 is a flowchart of the steps of a lightweight capacity estimation method for lithium batteries based on hardware constraint-driven knowledge distillation in an exemplary embodiment of this disclosure;
Figure 2 is a framework diagram of the lightweight capacity estimation method for lithium batteries based on hardware constraint-driven knowledge distillation in an exemplary embodiment of this disclosure;
Figure 3 illustrates the CCCV charging process in an exemplary embodiment of this disclosure;
Figure 4 illustrates the features under different cycles in an exemplary embodiment of this disclosure;
Figure 5 illustrates the structure of the teacher model in an exemplary embodiment of this disclosure;
Figure 6 illustrates training conducted under the guidance of [unspecified] in an exemplary embodiment of this disclosure;
Figure 7 shows the teacher model's estimation results on different batteries in an exemplary embodiment of this disclosure;
Figure 8 is a schematic diagram of an experimental platform in an exemplary embodiment of this disclosure.

Detailed Description of Embodiments

[0034] Exemplary embodiments will now be described more fully with reference to the accompanying drawings. However, these exemplary embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, they are provided so that this disclosure will be more comprehensive and complete, and will fully convey the concept of the exemplary embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

[0035] Furthermore, the accompanying drawings are merely illustrative diagrams of embodiments of this disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and therefore repeated descriptions of them will be omitted. Some block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities.

[0036] This example embodiment provides a lightweight capacity estimation method for lithium batteries based on hardware constraint-driven knowledge distillation. As shown in Figure 1, the method may include:
Step S1: Obtain one-dimensional time-domain data generated by the battery during charging;
Step S2: Convert the one-dimensional time-domain data into a size-normalized two-dimensional aging feature map;
Step S3: Input the two-dimensional aging feature map into the pre-trained teacher model, wherein the teacher model includes a conv1 input layer, a convolutional backbone, and a regression output layer;
Step S4: Based on the hardware constraints of the target edge device, design and generate a lightweight student model that meets the hardware constraints, wherein the hardware constraints include at least the maximum available memory and the average processing time limit;
Step S5: Under the guidance of the teacher model, with the goal of minimizing the combined loss function, train the lightweight student model through knowledge distillation to obtain a trained lightweight student model;
Step S6: Using the trained student model that meets the deployment conditions of the target edge device, estimate the capacity of the battery to be estimated to obtain the capacity estimation result.

[0037] The aforementioned lightweight capacity estimation method for lithium batteries based on hardware-constrained knowledge distillation makes three main contributions. First, a method for constructing two-dimensional aging feature maps is introduced, enhancing the model's application potential. This method converts easily collected one-dimensional time-domain data from the charging process into two-dimensional image features, eliminating reliance on complete charge-discharge cycle data and effectively addressing the problem of incomplete data under complex operating conditions. This provides a reliable data foundation for estimating lithium battery capacity. Second, a deep learning method based on transfer learning is introduced, significantly reducing the cost and data dependence of model training. By transforming the battery aging state estimation problem into a computer vision task, a teacher model fully pre-trained on a large image dataset can be used directly. This leverages the general feature extraction capabilities of the pre-trained model and avoids the need to design and train networks from scratch for specific battery data, which greatly reduces reliance on large-scale labeled data and lowers the computational and time costs of model training. Third, a resource-constrained lightweight model design and training framework is constructed. Based on the maximum available memory and average processing time limitations of edge devices, a lightweight student model that meets hardware constraints is designed. A pre-trained teacher model is then used for guided training to generate a lightweight model that meets hardware constraints while ensuring accuracy.

[0038] The steps of the above-described hardware constraint-driven knowledge distillation-based lithium battery lightweight capacity estimation method in this example embodiment are described in more detail below with reference to Figures 1 to 8.

[0039] In one embodiment, the architecture of the battery capacity estimation model corresponding to the proposed hardware-constrained knowledge distillation-based lightweight lithium battery capacity estimation method is shown in Figure 2.

[0040] In steps S1 and S2, one-dimensional time-domain data generated by the battery during charging is acquired; the one-dimensional time-domain data is converted into a size-standardized two-dimensional aging feature map.

[0041] Specifically, for aging feature map construction: to effectively use battery charging data and transform it into input suitable for computer vision models, an aging feature map construction method is designed. This method converts one-dimensional, variable-length battery charging data sequences into two-dimensional, size-normalized aging feature images. Within their finite size, these images contain rich aging feature information that subsequent computer vision models can process more efficiently. This application proposes an improved hybrid feature construction strategy to enhance the representational power of the original features.

[0042] 1) Data Selection: To verify the effectiveness and generalization ability of the proposed method, this application selected battery aging datasets publicly available from NASA's Ames Research Center, including aging data from four lithium-ion batteries (B0005, B0006, B0007, and B0018). These batteries use LiCoO2 / LiNiCoAlO2 as the positive electrode material and graphite as the negative electrode material. As shown in Figure 3, each battery is charged at 1.5 A in constant current (CC) mode until the voltage reaches 4.2 V, and then charged in constant voltage (CV) mode until the current drops below 20 mA.

[0043] Compared to the discharge process, which is closely coupled with complex and variable task loads, the battery charging process follows a more standardized and predictable protocol. To ensure the feasibility of the model in practical applications, this application uses only charging data for lithium battery capacity estimation. To select the feature segments most relevant to battery capacity aging from the charging data, this application conducts a systematic correlation analysis of voltage and current data during the charging process. Spearman's rank correlation coefficient $\rho$ is used to evaluate the correlation between different data segments and battery capacity decay. The calculation formula is as follows:

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

[0044] where $d_i$ denotes the difference, in the $i$-th cycle, between the rank of the data segment feature and the rank of the corresponding capacity, and $n$ is the total number of effective cycles.
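For reference, the standard Spearman coefficient for untied ranks is one minus six times the sum of squared rank differences, divided by n(n² − 1), where n is the number of cycles. The sketch below computes it in plain Python. The function names and the per-cycle values are hypothetical and are not data from the application, and ties are not handled.

```python
def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(feature, capacity):
    """Spearman rank correlation from squared rank differences per cycle."""
    n = len(feature)
    rank_f = ranks(feature)
    rank_c = ranks(capacity)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_f, rank_c))
    return 1.0 - 6.0 * sum_d2 / (n * (n * n - 1))


# Hypothetical per-cycle feature (seconds spent in a voltage segment)
# and the capacity (Ah) measured in the same cycles.
segment_time = [3200.0, 3150.0, 3050.0, 2990.0, 2900.0]
capacity = [1.86, 1.84, 1.80, 1.77, 1.73]
rho = spearman_rho(segment_time, capacity)
```

A coefficient near +1 or −1 indicates that the segment feature changes monotonically with capacity, which is the property used to choose the voltage segment.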

[0045] Table 1. Rank correlation coefficients of each charging data segment

[0046] As shown in Table 1, for all selected batteries, the charging voltage data showed a higher and more stable correlation with capacity aging than the charging current data. In particular, a high correlation was observed within the voltage range of 3.8V to 4.1V. Therefore, this application selected the 3.8V-4.2V constant current charging voltage range for subsequent aging characteristic map construction.

[0047] 2) Image Feature Construction: To eliminate the influence of inconsistent data acquisition step sizes in different cycles and accurately capture the dynamic characteristics of the voltage-time curve, this application establishes the functional relationship between charging voltage $V$ and time $t$ using polynomial fitting. With voltage $V$ as the independent variable and time $t$ as the dependent variable, a polynomial fit is performed, giving the functional relationship:

$$t_k(V) = \sum_{m} a_{k,m} V^{m}$$

[0048] where $k$ is the cycle number and $a_{k,m}$ are the coefficients of the polynomial.

[0049] After obtaining the functional relationship of each cycle, a standardized voltage sequence is defined to construct a size-standardized feature map. N points (N=256 in this application) are uniformly sampled within the range [3.8 V, 4.2 V] to generate a standard voltage sequence $V_s$:

$$V_s = \{v_1, v_2, \dots, v_N\}$$

[0050] Subsequently, the established functional relationship is used to obtain the corresponding time series $T_k$:

$$T_k = \{t_k(v_1), t_k(v_2), \dots, t_k(v_N)\}$$

[0051] Based on the generated one-dimensional time series $T_k$, the two-dimensional voltage-time feature $F_t$ is constructed. Each element $F_t(i,j)$ of this matrix is determined by the logarithmic mean of the $i$-th and $j$-th elements of the time series $T_k$. This smoothly reflects the numerical relationship between different points in time, and its calculation formula is as follows:

[0052] where $i$ and $j$ take values in [1, N], and $\varepsilon$ is a very small value (e.g., $10^{-8}$). This feature can stably characterize the changes in charging time of batteries during aging.

[0053] By calculating the ratio of the time difference to the index difference between different sampling points, the local slope variation of the charging curve is approximated, thereby capturing subtle changes in the battery's dynamic characteristics. The voltage-space feature $F_s$ is calculated as follows:

$$F_s(i,j) = \frac{t_k(v_i) - t_k(v_j)}{i - j}, \quad i \neq j$$

[0054] To address the computational limitations of the voltage-space feature and to integrate the advantages of both features, a hybrid feature $F_h$ is proposed. Its construction merges the two features through a mask operation. The mask is defined by whether the absolute value of the index difference $|i - j|$ is less than a preset width $d$ ($d=5$ in this application). The hybrid feature map $F_h$ is generated as follows:

[0055] As shown in Figure 4, the stability of the voltage-time feature $F_t$ is used to fill in the missing information in the voltage-space feature $F_s$. The hybrid feature therefore contains both the overall trend of the charging curve over time and the local dynamic changes, giving a more complete and more robust representation.
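The construction steps above can be sketched end to end with NumPy. Several details are assumptions, because the corresponding formulas are not recoverable from the text: the voltage-time element is taken as the average of the logarithms of the two time values (with a small epsilon added), the band with index difference below the mask width is filled from the voltage-time feature and the rest from the voltage-space feature, the polynomial degree is 3, and the fit is done on voltage minus the lower limit for numerical conditioning. The function name and the synthetic charging curve are hypothetical.

```python
import numpy as np


def build_hybrid_map(voltage, time, n_points=256, v_lo=3.8, v_hi=4.2,
                     degree=3, width=5, eps=1e-8):
    """Turn one charging segment into an n_points x n_points feature map."""
    voltage = np.asarray(voltage, dtype=float)
    time = np.asarray(time, dtype=float)

    # Fit time as a polynomial of voltage (shifted by v_lo for conditioning).
    coeffs = np.polyfit(voltage - v_lo, time, degree)

    # Resample on a uniform standard voltage sequence.
    v_std = np.linspace(v_lo, v_hi, n_points)
    t = np.clip(np.polyval(coeffs, v_std - v_lo), 0.0, None)  # keep log defined

    # Voltage-time feature: assumed average of log(t_i + eps) and log(t_j + eps).
    log_t = np.log(t + eps)
    f_time = 0.5 * (log_t[:, None] + log_t[None, :])

    # Voltage-space feature: time difference over index difference (0 on diagonal).
    idx = np.arange(n_points)
    d_idx = idx[:, None] - idx[None, :]
    d_time = t[:, None] - t[None, :]
    f_space = np.divide(d_time, d_idx, out=np.zeros_like(d_time), where=d_idx != 0)

    # Assumed mask: near-diagonal band from f_time, everything else from f_space.
    return np.where(np.abs(d_idx) < width, f_time, f_space)


# Hypothetical charging segment: 120 samples, time grows quadratically with voltage.
volts = np.linspace(3.8, 4.2, 120)
seconds = 3000.0 * ((volts - 3.8) / 0.4) ** 2
hybrid = build_hybrid_map(volts, seconds)
```

Because every cycle is resampled on the same voltage grid, the output size is fixed regardless of how many raw samples the cycle contained. Resizing to the network's input resolution would be a separate step.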

[0056] In step S3, the two-dimensional aging feature map is input into the pre-trained teacher model.

[0057] Deep convolutional neural networks (CNNs) are widely used due to their powerful automatic feature extraction capabilities. However, training an efficient CNN model from scratch requires not only a large amount of labeled data but also enormous computational resources and time. Recent research has shown that general features learned by CNN models pre-trained on a large-scale general dataset (such as ImageNet) can be effectively transferred to other tasks. Therefore, this application uses a ResNet50 model pre-trained on the ImageNet dataset as the core feature extractor. ResNet50 addresses the vanishing gradient problem in deep network training with its residual learning framework, allowing the network to be made deeper; its structure is shown in Figure 5. Its core is the residual block, whose output can be represented as:

$$y = F(x, \{W_i\}) + x$$

where $x$ and $y$ are the input and output of the block and $F$ is the residual mapping learned by its stacked layers.

[0058] The ResNet50 model pre-trained on ImageNet is treated as a function $f(\cdot\,; \theta)$, where $\theta$ is the set of parameters learned on the ImageNet dataset. This set can be divided into the parameters of the convolutional backbone responsible for feature extraction, $\theta_{\text{conv}}$, and the parameters of the original fully connected layer responsible for classification, $\theta_{\text{fc}}$. The target model for battery capacity prediction is not trained from random values. Instead, its parameters are initialized as follows:

[0059] The convolutional backbone parameters of the pre-trained model are transferred directly to the target model. To meet the input requirements of the ResNet50 model, the pixel values of the feature map $X$ are normalized to [0, 1] and then standardized using the same mean $\mu$ and standard deviation $\sigma$ as the ImageNet dataset.

[0060] $$\hat{X} = \frac{X - \mu}{\sigma}$$

[0061] For the preprocessed hybrid feature map $\hat{X} \in \mathbb{R}^{N \times C \times H \times W}$ (where $N$ is the number of samples, $C$ is the number of channels, and $H$ and $W$ are the image height and width; in this application $C$, $H$, and $W$ are 1, 224, and 224, respectively), the convolution operation in the model can be expressed as:

$$Y = W * \hat{X} + b$$

[0062] where $W$ and $b$ are the weights and biases of the convolution kernel, respectively, and $*$ denotes the convolution operation.

[0063] Since the generated aging feature map is a single-channel image, the first convolutional layer (conv1) of the model is modified so that the rich feature information contained in the pre-trained weights can still be used. Let the original weights of the conv1 layer be $W_{\text{conv1}} \in \mathbb{R}^{64 \times 3 \times 7 \times 7}$, where 64 is the number of output channels (i.e., the number of filters), 3 is the number of input channels (RGB), and 7×7 is the convolution kernel size. A new weight suited to single-channel input, $W'_{\text{conv1}} \in \mathbb{R}^{64 \times 1 \times 7 \times 7}$, is generated by averaging over the input-channel dimension. This converts the layer from three input channels to one without altering the output channel structure, while preserving the general representational capability of the pre-trained convolutional kernels as far as possible.
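
The channel adaptation above amounts to a mean over the input-channel axis. The following NumPy sketch illustrates only the tensor operation; the random array stands in for real pre-trained conv1 weights, and the function name `adapt_conv1_weights` is hypothetical.

```python
import numpy as np

def adapt_conv1_weights(w_rgb):
    """Average a (out, 3, kH, kW) kernel over its input channels -> (out, 1, kH, kW)."""
    w_rgb = np.asarray(w_rgb)
    if w_rgb.ndim != 4 or w_rgb.shape[1] != 3:
        raise ValueError("expected weights of shape (out_channels, 3, kH, kW)")
    # keepdims=True keeps the input-channel axis with size 1
    return w_rgb.mean(axis=1, keepdims=True)

# Placeholder for pre-trained conv1 weights (random values, illustration only)
rng = np.random.default_rng(0)
w_conv1 = rng.standard_normal((64, 3, 7, 7))
w_single = adapt_conv1_weights(w_conv1)
```

The output-channel count (64) and kernel size (7×7) are unchanged, so the rest of the backbone can be reused as is.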

[0064] At the output end, to migrate the network from the ImageNet classification task to the regression task of lithium-ion battery capacity estimation, this application replaces the fully connected layer originally used for classification in ResNet50 with a linear regression layer whose output dimension is 1, so that the final output of the model directly corresponds to a continuous capacity value. The calculation is as follows:

$$\hat{y} = W_{\text{fc}}\, h + b_{\text{fc}}$$

[0065] where $h$ is the feature vector produced by the convolutional backbone, $W_{\text{fc}}$ and $b_{\text{fc}}$ are the weights and bias of the new fully connected layer, and $\hat{y}$ is the output battery capacity estimate.

[0066] Therefore, the construction of the teacher network can be summarized as follows: Based on ImageNet pre-training, the convolutional backbone parameters of ResNet50 are transferred, and the input layer convolutional weights are reconstructed to adapt to single-channel feature maps; simultaneously, normalization and standardization preprocessing are performed on the input feature maps; then, residual convolutional backbones are used to extract high-level representations step by step; finally, the original classification output layer is replaced with a regression output layer to meet the requirements of the lithium battery capacity estimation task. This setup enables the teacher model to output more accurate estimation results on the target dataset, thereby providing more accurate and stable soft-label supervision signals for the subsequent distillation process.

[0067] To adapt to the lithium battery capacity estimation task, the original fully connected layer designed for classification was replaced with a new linear layer.

[0068] In steps S4 to S6, a lightweight student model that meets the hardware constraints of the target edge device is designed and generated. The hardware constraints include at least the maximum available memory and the average processing time limit. Under the guidance of the teacher model, the lightweight student model is trained by knowledge distillation with the goal of minimizing the combined loss function, resulting in a trained lightweight student model. The trained student model that meets the deployment conditions of the target edge device is used to estimate the battery capacity of the battery to be estimated, so as to obtain the capacity estimation result.

[0069] Specifically, a resource-constrained lightweight distillation framework is adopted. To deploy high-performance deep neural networks on resource-constrained edge devices, a two-stage lightweight model design and training framework is constructed. The framework first designs a lightweight network structure that meets the hardware constraints of the edge device, and then trains it using knowledge distillation (KD) so that high prediction accuracy is preserved as far as possible while deployment efficiency is maintained.

[0070] Knowledge distillation is a technique for improving the performance of a lightweight deep neural network (the student). It uses a large deep neural network (the teacher) to transfer its knowledge, or generalization ability, to the student, so that the student can imitate the teacher and produce similar output patterns. This training from teacher to student is sometimes referred to as teacher-student training.

[0071] A. Lightweight model design oriented towards hardware constraints. A lightweight student model that meets specific hardware constraints is obtained by compressing a ResNet50 model pre-trained on the aging feature dataset. The design process uses the maximum available memory of the target device, $M_{\max}$ (MB), and the maximum allowed processing time for the task, $T_{\max}$ (ms), as constraints. The memory usage $M$ and execution time $T$ of the lightweight model can be defined as:

[0072] $$M = \sum_{\theta \in \Theta} \operatorname{numel}(\theta) \cdot b, \qquad T = \sum_{l=1}^{L} \text{FLOPs}_l \cdot \tau$$

[0073] where $\Theta$ is the set of all parameters within the model; $\operatorname{numel}(\theta)$ is the number of elements in tensor $\theta$; $b$ is the size in bytes of a single element; $L$ is the total number of layers in the model; $\text{FLOPs}_l$ is the number of floating-point operations of the $l$-th layer; and $\tau$ is the time coefficient required to execute a unit FLOP, which depends on the target hardware architecture.
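
As a rough illustration of how these two quantities could be evaluated against the device limits, the sketch below uses plain Python. The function names, the conversion of bytes to MB by dividing by 2^20, and all numeric values are assumptions for illustration; the time coefficient would have to be measured on the target hardware.

```python
import math

def memory_mb(param_shapes, bytes_per_element=4):
    """Sum of element counts times element size, converted to MB (assumed 2**20 bytes)."""
    n_elements = sum(math.prod(shape) for shape in param_shapes)
    return n_elements * bytes_per_element / 2**20

def exec_time_ms(flops_per_layer, ms_per_flop):
    """Sum of per-layer FLOPs times a hardware-dependent time coefficient."""
    return sum(flops_per_layer) * ms_per_flop

def meets_constraints(param_shapes, flops_per_layer, ms_per_flop, max_mb, max_ms):
    """True when both the memory and the time constraint are satisfied."""
    return (memory_mb(param_shapes) <= max_mb
            and exec_time_ms(flops_per_layer, ms_per_flop) <= max_ms)

# Hypothetical two-layer model: one conv kernel and one fully connected layer
shapes = [(64, 1, 7, 7), (1, 2048)]
flops = [1.0e8, 4.0e3]
ok = meets_constraints(shapes, flops, ms_per_flop=1e-8, max_mb=1.0, max_ms=5.0)
```

A check of this kind can be run after every compression step to decide whether further compression is needed.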

[0074] Based on this, the model design process is formulated as a multi-constraint optimization problem whose goal is to minimize overall resource consumption, subject to the hardware constraints and a performance constraint.

[0075] The performance constraint is expressed through the loss of the model on the training set and a preset performance threshold. To solve this problem, this application proposes a two-step iterative compression algorithm that combines Dropout sensitivity analysis and weight decomposition, dynamically approaching the optimal solution that satisfies the constraints by gradually adjusting the number of compressed layers.

[0076] 1) Process 1: Iterative connection sparsification. By employing a structured dynamic Dropout strategy based on weight magnitude, redundant feature channels in the network are identified and suppressed, giving a preliminary lightweight model structure. First, an importance score $s_i$ is defined for the weight vector $w_i$ corresponding to the $i$-th output channel of the network. This application uses the L2 norm of the weights as the measure of importance:

$$s_i = \lVert w_i \rVert_2 = \sqrt{\sum_{k} w_{i,k}^2}$$

where $k$ indexes the elements of $w_i$.

[0077] Subsequently, in the $t$-th iteration, a structured binary mask $m$ is generated based on the dynamic Dropout rate $p_t$. All weights are sorted by importance score from low to high, and a threshold $\delta_t$ is selected as the importance score of the channel at the position in the sorted order determined by $p_t$. Weights whose importance score is lower than or equal to this threshold have their mask positions set to 0, indicating that these connections will be removed; the remaining mask positions are set to 1, indicating that these connections will be retained. Specifically, for each element $w_{ij}$ of the weight tensor $W$ (the connection weight between the $i$-th neuron and the $j$-th input), the corresponding mask value $m_{ij}$ is generated as follows, where $s_i$ is the importance score of the $i$-th output channel:

$$m_{ij} = \begin{cases} 0, & s_i \le \delta_t \\ 1, & \text{otherwise} \end{cases}$$

[0078] where $\delta_t$ is the importance threshold defined based on the pruning rate $p_t$. The network weight update can be expressed as:

$$W \leftarrow W \odot m$$
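
A single masking step of this kind can be sketched in NumPy as follows. The choice of the threshold as the score at sorted position `floor(rate * channels)` is an assumption for illustration, as is the function name `channel_mask`; the retraining and rate-update steps are not shown.

```python
import numpy as np

def channel_mask(W, rate):
    """Score output channels by L2 norm and build a structured 0/1 mask."""
    W = np.asarray(W, dtype=float)
    # One score per output channel (first axis), over all of its weights
    scores = np.linalg.norm(W.reshape(W.shape[0], -1), axis=1)

    # ASSUMPTION: the threshold is the score of the k-th lowest channel,
    # with k = floor(rate * number_of_channels)
    k = int(np.floor(rate * scores.size))
    if k == 0:
        return scores, np.ones_like(W)
    threshold = np.sort(scores)[k - 1]

    # Channels with score <= threshold are removed (mask 0), others kept (mask 1)
    keep = scores > threshold
    keep = keep.reshape((-1,) + (1,) * (W.ndim - 1))
    mask = np.broadcast_to(keep, W.shape).astype(W.dtype)
    return scores, mask

W = np.array([[3.0, 4.0], [0.0, 1.0], [1.0, 0.0], [6.0, 8.0]])
scores, mask = channel_mask(W, rate=0.5)
W_sparse = W * mask  # element-wise weight update
```

Because whole rows of the mask are zero or one, the sparsity is structured: each output channel is either kept or removed in full.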

[0079] After the mask is applied, the sparsified model is trained and its performance loss is evaluated. If the loss is below the preset threshold, the iteration terminates. Otherwise, the pruning rate is updated according to the current number of connections in the network and the number of connections after sparsification, scaled by an adjustment factor.

[0080] This update is repeated until the loss threshold is met or the maximum number of iterations is reached. The final output is a sparse model that meets the initial performance requirements.

[0081] 2) Process 2: Resource-aware weight decomposition. Starting from the sparse network obtained in Process 1, weight decomposition is used to reconstruct the computationally intensive layers in the network, yielding the final lightweight student model, whose structure strictly satisfies the hardware constraints on memory and execution time.

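[0082] A convolutional layer with weight tensor $W \in \mathbb{R}^{C_{\text{out}} \times C_{\text{in}} \times k \times k}$ can be decomposed into two consecutive convolutional layers: a pointwise convolution layer with a kernel size of $1 \times 1$ (weights $W_1 \in \mathbb{R}^{r \times C_{\text{in}} \times 1 \times 1}$), used for dimensionality reduction; and a convolutional layer with a kernel size of $k \times k$ (weights $W_2 \in \mathbb{R}^{C_{\text{out}} \times r \times k \times k}$), used for feature extraction and dimensionality increase. Here $r$ is an intermediate rank much smaller than $C_{\text{in}}$ and $C_{\text{out}}$. The number of parameters is reduced from $C_{\text{in}} C_{\text{out}} k^2$ to $r\,(C_{\text{in}} + C_{\text{out}} k^2)$.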
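
The effect of this decomposition on the parameter count can be checked with a few lines of Python. The layer sizes and the rank below are hypothetical values chosen only to make the arithmetic concrete, and bias terms are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def decomposed_conv_params(c_in, c_out, k, r):
    """Weights of a 1 x 1 reduction to r channels followed by a k x k convolution."""
    return c_in * r + r * c_out * k * k

# Hypothetical layer: 256 -> 256 channels, 3 x 3 kernel, intermediate rank 32
before = conv_params(256, 256, 3)
after = decomposed_conv_params(256, 256, 3, 32)
ratio = after / before
```

For these values the decomposed pair keeps roughly 14% of the original weights; the saving shrinks as the rank approaches the channel counts.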

[0083] A fully connected layer with weight matrix $W \in \mathbb{R}^{m \times n}$ is approximated as the product of two low-rank matrices using singular value decomposition (SVD):

$$W \approx U_r \Sigma_r V_r^{\top}$$

[0084] where $r$ is the number of singular values retained. The layer is then replaced with two consecutive fully connected layers whose weights are the two low-rank factors, and the number of parameters is reduced from $mn$ to $r\,(m + n)$. In each iteration, weight decomposition is applied to several layers of the sparse model and its resource consumption (memory usage and execution time) is evaluated, until the model's resource consumption simultaneously satisfies the memory constraint and the time constraint. A lightweight student model that meets the requirements is thus obtained.
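
A truncated SVD factorization of a fully connected weight matrix can be sketched with NumPy as follows. How the singular values are split between the two factors is not stated in the text; folding them into the first factor is an assumption here, and `low_rank_factors` is a hypothetical name.

```python
import numpy as np

def low_rank_factors(W, r):
    """Return A (m x r) and B (r x n) such that A @ B approximates W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # ASSUMPTION: singular values are absorbed into the left factor
    A = U[:, :r] * s[:r]
    B = Vt[:r, :]
    return A, B

# A 6 x 4 matrix of exact rank 2, so r = 2 reproduces it up to rounding
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
A, B = low_rank_factors(W, r=2)

params_before = W.size          # m * n
params_after = A.size + B.size  # r * (m + n)
```

Applying `B` and then `A` as two consecutive linear layers reproduces the original layer when the matrix has rank at most `r`, and approximates it otherwise.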

[0085] B. Lightweight model training based on knowledge distillation. Once a lightweight student model that satisfies the hardware constraints has been obtained, knowledge distillation is used to train it efficiently, recovering and improving the prediction accuracy that may have been lost through model compression. A robust, fixed teacher model is employed, namely the pre-trained ResNet50. Throughout training its parameters are completely frozen, so it serves as a stable, high-quality source of knowledge. Inspired by the "rocket launch" method, this application shares the first few layers between the student model and the teacher model, as shown in Figure 6; that is, the first few layers of the two models are identical. Layer sharing improves the performance of the trained lightweight student model and also saves computing resources during training.

[0086] This knowledge distillation framework aims to transfer the rich knowledge contained in the teacher model, including its final prediction logic and its intermediate feature extraction strategies, to the lightweight student model. The training objective is to minimize a combined loss function consisting of three parts.

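[0087] $$\mathcal{L}_{\text{total}} = \alpha\, \mathcal{L}_{\text{task}} + \beta\, \mathcal{L}_{\text{KD}} + \gamma\, \mathcal{L}_{\text{AT}}$$

[0088] where $\mathcal{L}_{\text{task}}$, $\mathcal{L}_{\text{KD}}$, and $\mathcal{L}_{\text{AT}}$ are the task loss, the knowledge distillation loss, and the attention transfer loss, respectively, and $\alpha$, $\beta$, and $\gamma$ are the weight hyperparameters used to balance the contribution of each loss. By minimizing the combined loss function, the student model is trained under the guidance of the teacher model, ultimately achieving prediction accuracy close to that of the large teacher model while meeting the preset hardware constraints.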
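
The text does not give the form of the three loss terms, so the sketch below fills them in with common choices that are assumptions only: mean squared error against the label for the task loss, mean squared error against the teacher output for the distillation loss, and the distance between L2-normalized spatial attention maps for the attention transfer loss. All function and parameter names are hypothetical.

```python
import numpy as np

def attention_map(feat):
    """Collapse (N, C, H, W) features to L2-normalized (N, H*W) attention maps."""
    a = (feat ** 2).mean(axis=1).reshape(feat.shape[0], -1)
    return a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)

def combined_loss(y_student, y_teacher, y_true, f_student, f_teacher,
                  alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of task, distillation, and attention transfer losses."""
    # ASSUMPTION: MSE against the ground-truth capacity
    l_task = np.mean((y_student - y_true) ** 2)
    # ASSUMPTION: MSE against the frozen teacher's prediction
    l_kd = np.mean((y_student - y_teacher) ** 2)
    # ASSUMPTION: squared distance between normalized attention maps
    diff = attention_map(f_student) - attention_map(f_teacher)
    l_at = np.mean(np.sum(diff ** 2, axis=1))
    return alpha * l_task + beta * l_kd + gamma * l_at

# Toy values: the student matches the teacher but not the label
feat = np.ones((1, 2, 2, 2))
loss = combined_loss(np.array([1.0]), np.array([1.0]), np.array([2.0]),
                     feat, feat, alpha=0.5)
```

The attention maps require the compared student and teacher layers to have the same spatial size; the channel counts may differ because the channel axis is averaged out.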

[0089] The trained student model, which meets the deployment conditions of the target edge device, is deployed on the Jetson Orin Nano platform to estimate the battery capacity of the battery to be evaluated, so as to obtain the capacity estimation result.

[0090] In one specific embodiment, to verify the effectiveness of the resource-constrained lightweight distillation framework method proposed in this application, this embodiment uses a PC device configured with an Intel(R) Core(TM) i7-12700KF @3.60GHz 12-core processor and an NVIDIA GeForce RTX 4070 Ti GPU, as well as a Jetson Orin Nano edge device for experimental verification.

[0091] A. Experimental Setup. 1) Dataset Description and Partitioning. This embodiment uses the lithium-ion battery aging dataset made publicly available by NASA's Ames Research Center for experimental verification. This dataset contains aging data of 18650 lithium-ion batteries (B0005, B0006, B0007, B0018) that underwent cyclic charge-discharge testing. In the experiment, the model input is a 256×256 pixel two-dimensional hybrid image feature generated by the aforementioned aging feature map construction method.

[0092] To simulate the model's generalization ability to unknown batteries in real-world applications, a "leave-one-out" approach is used to partition the dataset. Specifically, all cyclic data from batteries B0006, B0007, and B0018 are used as the training set for pre-training the teacher model and distillation training of the student model; all cyclic data from battery B0005 are used as an independent test set to evaluate the model's final performance.
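
The partition described above, with one battery held out entirely, can be written as a short helper. The function name `leave_one_battery_out` is hypothetical; only the battery identifiers come from the text.

```python
BATTERIES = ["B0005", "B0006", "B0007", "B0018"]

def leave_one_battery_out(batteries, test_battery):
    """Hold out all cycles of one battery for testing; train on the rest."""
    if test_battery not in batteries:
        raise ValueError(f"unknown battery: {test_battery}")
    train = [b for b in batteries if b != test_battery]
    return train, [test_battery]

train_ids, test_ids = leave_one_battery_out(BATTERIES, "B0005")
```

Splitting by battery rather than by cycle keeps every cycle of the test battery unseen during training.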

[0093] 2) Evaluation Metrics. To quantitatively evaluate the model's performance on the battery capacity estimation task, this application employs the following two widely used regression metrics. Mean absolute percentage error (MAPE) measures the average relative error between the predicted and actual values.

[0094] $$\text{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$

[0095] Root mean squared error (RMSE) gives higher weight to larger prediction errors and effectively reflects the prediction deviation of the model.

[0096] $$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$

[0097] where $n$ is the total number of test samples, $y_i$ is the actual battery capacity of the $i$-th sample, and $\hat{y}_i$ is the corresponding capacity predicted by the model.
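
Both metrics are straightforward to compute; a minimal NumPy sketch follows. The sample values are made up for illustration and are not results from this application.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the capacity values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Illustrative capacities only
actual = [2.0, 1.0]
predicted = [1.9, 1.1]
m = mape(actual, predicted)
r = rmse(actual, predicted)
```

MAPE is unit-free, which makes it comparable across batteries of different nominal capacity, while RMSE stays in the unit of the capacity values.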

[0098] C. Experimental Procedure and Result Analysis. To verify the superiority of the proposed image feature construction method, a series of ablation experiments was conducted in this embodiment. Voltage-time features, voltage-space features, hybrid feature maps, and space and time features from different battery data segments were selected as inputs to the ResNet50 model, and training and evaluation were performed under the same pre-training configuration of the teacher model.

[0099] As shown in Figure 7, on the four independent battery datasets, the capacity estimates of the teacher model are all closely distributed around the reference diagonal. The estimation errors of each dataset follow a normal distribution with zero mean and low variance, and the vast majority of sample errors are concentrated in a very narrow interval. Even in the later stages of battery aging, the model maintains stable prediction accuracy without any degradation in estimation performance. This indicates that, through the combined effect of transfer learning and hybrid feature maps, the teacher model effectively overcomes individual differences among batteries, accurately captures the common patterns of capacity degradation, and can provide reliable soft-label supervision for the subsequent lightweight student model.

[0100] Furthermore, this embodiment also evaluated the impact of three different voltage segments on the model estimation accuracy. Table 2 shows the mean absolute percentage error and root mean square error for each voltage range under different feature combinations. The results show that, regardless of the features or initialization strategy, the error of the model trained using the interval [3.6V, 4.0V] is significantly higher than the other two intervals. In contrast, the intervals [3.6V, 4.2V] and [3.8V, 4.2V] both achieved better estimation performance, indicating that a wider voltage range or segments closer to the charging cutoff voltage can provide richer aging identification information. Notably, the [3.8V, 4.2V] interval, using only a shorter data segment, achieved similar estimation accuracy to [3.6V, 4.2V], verifying a stronger correlation between this interval and battery capacity degradation.

[0101] To verify the necessity of transfer learning in this application, this embodiment compares two model initialization strategies: random weight initialization and ImageNet pre-trained weights (ImageNet). As shown in Table 2, under different voltage ranges and feature combinations, the model using pre-trained weights significantly outperforms the randomly initialized model in both MAPE and RMSE. Taking the [3.8V, 4.2V] range as an example, when using time features as input, the transfer learning method reduces MAPE from 2.23% to 1.25%, a relative error reduction of 43.9%; even for spatial features with weaker representational power, MAPE decreases by 31.8%, indicating that the transfer learning method enables the model to utilize the general low-level visual features learned on the ImageNet dataset, avoiding the process of learning basic feature extraction from scratch. This helps alleviate overfitting and local optima problems caused by insufficient data, thereby improving the model's generalization ability and estimation accuracy.

[0102] To further verify the feasibility of deploying the proposed lightweight student model on real-world edge devices, this embodiment deployed the trained student model, which met the hardware constraints, on a Jetson Orin Nano platform, and conducted capacity estimation experiments on different battery datasets on this device. Table 3 lists the results of the student model's operation on the edge device. The experimental results show that the deployed model can achieve stable capacity estimation on all four independent battery datasets. Specifically, the estimation errors for batteries B0005, B0007, and B0018 remained at a low level, with RMSEs of 0.0275, 0.0353, and 0.0495, and MAPEs of 1.29%, 1.61%, and 2.43%, respectively. For battery B0006, the model still achieved effective estimation, with an RMSE of 0.0829 and a MAPE of 4.60%. These results demonstrate that the proposed student model has good practical deployment capabilities on resource-constrained edge platforms like the Jetson Orin Nano, and can complete lithium battery capacity estimation tasks while maintaining a certain level of estimation accuracy, thus verifying the effectiveness of the proposed method for edge device applications.

[0103] Table 2 Comparison of different feature combinations and initialization strategies

[0104] Table 3 Results of running the student model on Jetson Orin Nano

[0105] This application proposes a lightweight capacity estimation framework for lithium-ion batteries in resource-constrained edge devices, effectively addressing the challenge of directly deploying complex deep learning models on edge computing platforms. By converting irregular partial charging data into a two-dimensional aging feature map, the proposed method can extract effective information related to battery aging without relying on complete charge-discharge cycle data, thus achieving robust capacity estimation even in scenarios with incomplete charging data. Furthermore, combining hardware-constrained model compression with knowledge distillation improves the model's practical deployment capability while maintaining both lightweight design and estimation accuracy. Experimental results on the Jetson Orin Nano platform demonstrate that the proposed student model can complete capacity estimation tasks with multiple sets of battery data on edge devices while maintaining good estimation performance, thus validating the feasibility of this method in edge battery management systems.

[0106] The aforementioned lightweight capacity estimation method for lithium batteries based on hardware-constrained knowledge distillation makes three main contributions. First, a method for constructing two-dimensional aging feature maps is introduced, enhancing the model's application potential. This method converts easily collected one-dimensional time-domain data from the charging process into two-dimensional image features, eliminating reliance on complete charge-discharge cycle data and effectively addressing the problem of incomplete data under complex operating conditions. This provides a reliable data foundation for estimating lithium battery capacity. Second, a deep learning method based on transfer learning is introduced, significantly reducing the cost and data dependence of model training. By transforming the battery aging state estimation problem into a computer vision task, a teacher model fully pre-trained on a large image dataset can be used directly. This method leverages the general feature extraction capabilities of the pre-trained model and avoids the need to design and train networks from scratch for specific battery data, which greatly reduces reliance on large-scale labeled data and significantly lowers the computational and time costs of model training. Third, a resource-constrained lightweight model design and training framework is constructed. Based on the maximum available memory and average processing time limits of the edge device, a lightweight student model that meets the hardware constraints is designed. Subsequently, the pre-trained teacher model is used to guide training, producing a lightweight model that meets the hardware constraints while maintaining accuracy.

[0107] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this disclosure. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. In addition, those skilled in the art can combine and integrate the different embodiments or examples described in this specification.

[0108] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this disclosure are indicated by the appended claims.

Claims

1. A lightweight capacity estimation method for lithium batteries based on hardware constraint-driven knowledge distillation, characterized in that it comprises: acquiring one-dimensional time-domain data generated during battery charging; converting the one-dimensional time-domain data into a size-normalized two-dimensional aging feature map; inputting the two-dimensional aging feature map into a pre-trained teacher model, wherein the teacher model includes a conv1 input layer, a convolutional backbone, and a regression output layer; designing and generating, based on the hardware constraints of a target edge device, a lightweight student model that meets the hardware constraints, wherein the hardware constraints include at least the maximum available memory and the average processing time limit; training the lightweight student model by knowledge distillation under the guidance of the teacher model, with the goal of minimizing a combined loss function, to obtain a trained lightweight student model; and estimating the capacity of the battery to be estimated using the trained student model that meets the deployment conditions of the target edge device, to obtain a capacity estimation result.

2. The lightweight capacity estimation method for lithium batteries based on hardware constraint-driven knowledge distillation according to claim 1, characterized in that the step of converting the one-dimensional time-domain data into a size-normalized two-dimensional aging feature map comprises: selecting, from the one-dimensional time-domain data, feature data segments whose correlation with battery capacity aging is higher than a preset threshold; performing polynomial fitting with voltage as the independent variable and time as the dependent variable to obtain the functional relationship between charging voltage and time; uniformly sampling N points within a preset voltage range to generate a standard voltage sequence; obtaining the corresponding one-dimensional time series based on the functional relationship between charging voltage and time and the standard voltage sequence; constructing a two-dimensional voltage-time feature matrix and a voltage-space feature matrix based on the one-dimensional time series, wherein each element of the voltage-time feature matrix is determined by the logarithmic average of the i-th and j-th elements of the one-dimensional time series, and each element of the voltage-space feature matrix is calculated as the ratio of the time difference to the index difference between the i-th and j-th sampling points in the time series; and fusing the voltage-time feature and the voltage-space feature to generate a hybrid feature map as the two-dimensional aging feature map.

3. The lightweight capacity estimation method for lithium batteries based on hardware constraint-driven knowledge distillation according to claim 2, characterized in that the processing procedure of the trained teacher model comprises: performing channel adaptation and weight reconstruction on the conv1 layer, in which the weights of the conv1 layer are $W_{\text{conv1}} \in \mathbb{R}^{64 \times 3 \times 7 \times 7}$, where 64 is the number of output channels, 3 is the number of input channels, and 7×7 is the kernel size, and $W_{\text{conv1}}$ is averaged along the input-channel dimension to generate new weights $W'_{\text{conv1}} \in \mathbb{R}^{64 \times 1 \times 7 \times 7}$ adapted to single-channel input; performing normalization and standardization preprocessing on the two-dimensional aging feature map to obtain a preprocessed input feature map; processing the input feature map through the channel-adapted conv1 layer and a convolutional backbone consisting of four residual stages to extract high-level feature representations step by step; and mapping the high-level feature representation through the regression output layer to output an initial battery capacity estimation result.

4. The lightweight capacity estimation method for lithium batteries based on hardware constraint-driven knowledge distillation according to claim 3, characterized in that the step of designing and generating a lightweight student model that satisfies the hardware constraints of the target edge device comprises: taking the maximum available memory $M_{\max}$ of the target device and the maximum allowed processing time $T_{\max}$ for the task as constraints, and defining the memory usage $M$ and the execution time $T$ as $M = \sum_{\theta \in \Theta} \operatorname{numel}(\theta) \cdot b$ and $T = \sum_{l=1}^{L} \text{FLOPs}_l \cdot \tau$, where $\Theta$ is the set of all parameters within the teacher model, $\operatorname{numel}(\theta)$ is the number of elements in tensor $\theta$, $b$ is the size in bytes of a single element, $L$ is the total number of layers in the teacher model, $\text{FLOPs}_l$ is the number of floating-point operations of the $l$-th layer, and $\tau$ is the time coefficient; constructing, based on the memory usage and the execution time, a multi-constraint optimization problem with the goal of minimizing overall resource consumption, wherein the problem is defined over the lightweight student model, the loss of the teacher model on the training set, and a preset performance threshold; performing iterative connection sparsification using structured dynamic Dropout to obtain a sparse model that meets the preset performance loss threshold constraint; and performing weight decomposition on the computationally intensive layers in the sparse model to further compress the model parameters and computational load until the memory usage and execution time of the student model simultaneously meet the hardware constraints, resulting in a lightweight student model that satisfies the hardware constraints.

5. The lightweight capacity estimation method for lithium batteries based on hardware constraint-driven knowledge distillation according to claim 4, characterized in that the step of performing iterative connection sparsification using structured dynamic Dropout to obtain a sparse model that satisfies the preset performance loss threshold constraint comprises: defining an importance score $s_i$ for the weight vector $w_i$ corresponding to the $i$-th output channel, using the L2 norm of the weights as the measure of importance, $s_i = \lVert w_i \rVert_2 = \sqrt{\sum_k w_{i,k}^2}$, where $w_i$ denotes the vectorized weights associated with the $i$-th output channel, $k$ is the element index in $w_i$, and $s_i$ is used as the channel importance score; in the $t$-th iteration, generating a structured binary mask $m$ based on the dynamic Dropout rate $p_t$, in which all weights are sorted by importance score from low to high and a threshold $\delta_t$ is selected as the importance score of a channel in the sorted order, the mask positions corresponding to weights whose importance score is lower than or equal to this threshold are set to 0, indicating that the connection will be removed, and the remaining mask positions are set to 1, indicating that the connection will be retained; for each element $w_{ij}$ of the weight tensor $W$, which denotes the connection weight between the $i$-th neuron and the $j$-th input, generating the corresponding mask value as $m_{ij} = 0$ if $s_i \le \delta_t$ and $m_{ij} = 1$ otherwise, where $\delta_t$ is the importance threshold defined based on the Dropout rate $p_t$; updating the network weights as $W \leftarrow W \odot m$; after applying the mask, training the sparsified model and evaluating its performance loss; terminating the iteration if the loss is lower than the preset threshold, and otherwise updating the Dropout rate based on the current number of connections in the network and the number of connections after sparsification, using an adjustment factor, until the loss threshold is met or the maximum number of iterations is reached; and outputting a sparse model that meets the initial performance requirements.

6. The lightweight capacity estimation method for lithium batteries based on hardware constraint-driven knowledge distillation according to claim 5, characterized in that the step of performing weight decomposition on the computationally intensive layers in the sparse model to further compress the model parameters and computational load until the memory usage and execution time of the student model simultaneously meet the hardware constraints comprises: for a convolutional layer with weight tensor $W \in \mathbb{R}^{C_{\text{out}} \times C_{\text{in}} \times k \times k}$, decomposing it into two consecutive convolutional layers, namely a pointwise convolutional layer with a kernel size of $1 \times 1$, whose weights $W_1 \in \mathbb{R}^{r \times C_{\text{in}} \times 1 \times 1}$ are used for dimensionality reduction, and a convolutional layer with a kernel size of $k \times k$, whose weights $W_2 \in \mathbb{R}^{C_{\text{out}} \times r \times k \times k}$ are used for feature extraction and dimensionality increase, so that the number of parameters is reduced from $C_{\text{in}} C_{\text{out}} k^2$ to $r\,(C_{\text{in}} + C_{\text{out}} k^2)$, where $r$ is an intermediate rank much smaller than $C_{\text{in}}$ and $C_{\text{out}}$, $C_{\text{in}}$ and $C_{\text{out}}$ are the number of input channels and the number of output channels of the convolutional layer, respectively, and $k$ is the size of the spatial kernel; for a fully connected layer with weight matrix $W \in \mathbb{R}^{m \times n}$, approximating it as a product of two low-rank matrices using singular value decomposition, $W \approx U_r \Sigma_r V_r^{\top}$, where $r$ is the number of singular values retained, $U_r$ is the left singular vector matrix, $\Sigma_r$ is the diagonal matrix of singular values, and $V_r^{\top}$ is the transpose of the right singular vector matrix; replacing the fully connected layer with two consecutive fully connected layers whose weights are the two low-rank factors, so that the number of parameters is reduced from $mn$ to $r\,(m + n)$; in each iteration, performing weight decomposition on several layers of the sparse model and evaluating its memory usage $M$ and execution time $T$; and repeating until the model's resource consumption simultaneously satisfies the memory constraint $M \le M_{\max}$ and the time constraint $T \le T_{\max}$, thereby obtaining a lightweight student model that meets the requirements.

7. The lightweight capacity estimation method for lithium batteries based on hardware constraint-driven knowledge distillation according to claim 6, characterized in that the step of training the lightweight student model by knowledge distillation under the guidance of the teacher model, with the goal of minimizing the combined loss function, to obtain the trained lightweight student model comprises: freezing the parameters of the teacher model; sharing the first few layers between the teacher model and the student model; and training the lightweight student model on the two-dimensional aging feature map using a combined loss function that includes a task loss, a distillation loss, and an attention loss, to obtain the trained lightweight student model.

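8. The lightweight capacity estimation method for lithium batteries based on hardware constraint-driven knowledge distillation according to claim 7, characterized in that the combined loss function is $\mathcal{L}_{\text{total}} = \alpha\, \mathcal{L}_{\text{task}} + \beta\, \mathcal{L}_{\text{KD}} + \gamma\, \mathcal{L}_{\text{AT}}$, where $\mathcal{L}_{\text{task}}$ is the task loss, $\mathcal{L}_{\text{KD}}$ is the knowledge distillation loss, $\mathcal{L}_{\text{AT}}$ is the attention transfer loss, $\alpha$ is the first weight hyperparameter, $\beta$ is the second weight hyperparameter, and $\gamma$ is the third weight hyperparameter.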