Battery BMS intelligent detection and identification system based on reinforcement learning

The battery BMS intelligent detection and identification system, which utilizes reinforcement learning, dynamically adjusts the detection model parameters and anomaly detection thresholds. By combining these with battery aging characteristics, it solves the problem of decreased detection accuracy during battery aging, achieving efficient fault identification and system robustness.

CN121978546B · Active · Publication Date: 2026-06-16 · SHANDONG FENGHUO POWER COMM TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHANDONG FENGHUO POWER COMM TECH CO LTD
Filing Date
2026-04-09
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing battery detection models suffer from decreased detection accuracy, increased false alarm and missed alarm rates, and an inability to effectively identify faults such as micro-short circuits, all caused by concept drift during battery aging.

Method used

A battery BMS intelligent detection and identification system based on reinforcement learning is adopted. Through an adaptive detection model construction and processing module and a concept drift real-time compensation analysis and processing module, the detection model parameters and abnormal detection thresholds are dynamically adjusted. Combined with battery aging characteristics such as SEI film thickness and active lithium loss rate, the system realizes real-time iterative updates of the model and concept drift compensation.

Benefits of technology

Maintaining high detection accuracy during battery aging reduces false alarm and missed alarm rates, thereby improving system robustness and computational resource utilization efficiency.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a battery BMS intelligent detection and identification system based on reinforcement learning and belongs to the technical field of battery detection. By introducing SEI film thickness, active lithium loss rate and other characteristics reflecting the essence of battery aging, the limitation of traditional detection based on surface parameters can be overcome. By dynamically adjusting the reward weights according to SOH, the model can maintain optimal detection performance in the new, middle-aged and old states of the battery. By comprehensively considering the detection accuracy, false negative rate and false positive rate, multi-objective balanced optimization is realized. By introducing a clipping threshold that adapts to the aging state, the accuracy and reliability of model detection and identification are effectively improved. By monitoring the battery aging degree in real time, triggering the compensation mechanism, enhancing the detection weight of the micro-short circuit features, dynamically adjusting the anomaly detection threshold, verifying the effect and performing closed-loop optimization, a complete concept drift compensation closed loop is formed. The adjustment is triggered only in the deep aging stage, so the detection performance in the deep aging stage and the utilization of computing resources are effectively improved.

Description

Technical Field

[0001] This invention relates to the field of battery testing technology, and more specifically to a battery BMS intelligent detection and identification system based on reinforcement learning.

Background Technology

[0002] Battery BMS intelligent detection and identification refers to a technical system that uses artificial intelligence (AI), big data analysis and advanced algorithms to perform high-precision estimation of battery status, early warning of faults and automatic classification of abnormal modes in the battery management system (BMS). It surpasses the traditional BMS mode of passive protection that relies solely on fixed thresholds such as overvoltage / undervoltage and overtemperature, and instead adopts a data-driven approach.

[0003] In the implementation of existing technologies, most intelligent detection models are trained on data from the new state of the battery or a specific aging stage. However, a battery is a highly time-varying system, and its electrochemical characteristics change nonlinearly with the number of cycles, temperature history, and storage conditions, i.e., concept drift. For example, existing detection models have an accuracy rate of up to 99% in identifying micro-short circuits in the early stage of battery use (the first 500 cycles), but as the battery enters its middle and old age (after 1,500 cycles), due to changes in internal side reaction mechanisms, such as the thickening of the SEI film and changes in the proportion of active lithium loss, the original characteristic fingerprint shifts, causing the model to experience systematic false negatives or a surge in false positives. This indicates that the detection model is aging and failing due to concept drift.

Summary of the Invention

[0004] The purpose of this invention is to provide a battery BMS intelligent detection and identification system based on reinforcement learning, in order to solve the technical problems existing in the background art.

[0005] The objective of this invention can be achieved through the following technical solutions:

[0006] A reinforcement learning-based intelligent detection and identification system for battery management systems includes:

[0007] The adaptive detection model construction and processing module uses the real-time cycle count, temperature sequence, SOH value, SEI film thickness, and active lithium loss rate of the collected and processed battery as the core state space. It uses the detection accuracy, false negative rate, and false positive rate as the multi-objective reward function and trains the agent through an improved proximal policy optimization (PPO) algorithm so that the model parameters are dynamically updated as the battery ages.

[0008] The real-time concept drift compensation analysis and processing module obtains the real-time health status and the moving average of the health status based on the real-time monitoring data of the current battery. It performs data analysis on the moving average of the health status and dynamically triggers the concept drift compensation mechanism. Through dynamic adjustment of the micro-short circuit association feature weights, updating the anomaly detection threshold, real-time compensation effect verification, and closed-loop optimization, it achieves real-time compensation for concept drift.

[0009] Furthermore, the battery pack's cycle count, temperature sequence, terminal voltage, and charge / discharge current are collected in real time through the BMS sensor network.

[0010] Furthermore, during online estimation of electrochemical characteristics, the state of health (SOH) is calculated in real time using the capacity decay method;

[0011] Based on the second-order RC equivalent circuit model, the polarization resistance is estimated in real time by extended Kalman filtering, and then the SEI film thickness is converted by empirical formula.

[0012] The loss rate of active lithium was calculated using the coulombic efficiency method.

[0013] Furthermore, based on the calculation results, the state space of the agent is defined as a high-dimensional electrochemical feature vector, which serves as the input state of the agent; and the action space is defined, with a fixed set of binary classification actions; the dynamic transition equations of each state element are constructed, and the transition logic of each state element is encapsulated as a state transition function.

[0014] Furthermore, based on thresholds derived from real-time BMS data, actual fault labels are generated.

[0015] Here, the expression yields the actual fault label from the terminal voltage measurement at time k and the measured charge/discharge current at time k.

[0016] Furthermore, a multi-objective reward function R is defined, which integrates three core metrics: detection accuracy, false negative rate, and false positive rate.

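[0017] $R = w_1 R_{acc} + w_2 R_{miss} + w_3 R_{false}$, where $R_{acc}$ is the reward for detection accuracy, with values in [0, 1]; $R_{miss}$ is the penalty for false negatives, with values in [-1, 0]; $R_{false}$ is the penalty for false alarms, with values in [-1, 0]; and $w_1$, $w_2$, $w_3$ are weighting coefficients, adaptively adjusted based on the state of health (SOH).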

[0018] Furthermore, real-time monitoring data of the current battery is obtained from the BMS system, specifically including polarization resistance, cycle count, voltage sequence, and internal resistance sequence;

[0019] The real-time health status is calculated based on the acquired monitoring data, and the moving average of SOH over three consecutive cycles is used as the basis for judgment, dynamically triggering the concept drift compensation mechanism.

[0020] Furthermore, when dynamically adjusting the weights of the micro-short circuit correlation features, the voltage spike frequency and internal resistance mutation rate are extracted from the real-time monitoring data;

[0021] The voltage spike frequency weight and the internal resistance mutation rate weight are added to the original feature weight vector, and both weights are increased. The adjusted weight vector is normalized and deployed to the online anomaly detection model to replace the original feature weights.
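The weight adjustment in [0021] can be illustrated with a minimal sketch. The boost factor of 1.5, the four-element weight vector, and the function name `boost_and_normalize` are assumptions made for illustration; the source does not state how much the two weights are increased.

```python
def boost_and_normalize(weights, boost_idx, factor=1.5):
    """Scale the selected feature weights by `factor`, then renormalize to sum to 1.

    `factor` is a hypothetical value; the source only says the weights are increased.
    """
    if factor <= 1.0:
        raise ValueError("factor must exceed 1 to increase a weight")
    adjusted = [w * factor if i in boost_idx else w for i, w in enumerate(weights)]
    total = sum(adjusted)
    return [w / total for w in adjusted]


# Hypothetical layout: indices 2 and 3 hold the voltage spike frequency and
# internal resistance mutation rate weights.
weights = [0.25, 0.25, 0.25, 0.25]
new_weights = boost_and_normalize(weights, boost_idx={2, 3}, factor=1.5)
print(new_weights)
```

After normalization the boosted features carry a larger share of the total weight while the vector still sums to 1, so it can replace the original weights without rescaling downstream scores.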

[0022] Furthermore, when updating the anomaly detection threshold, a sliding window with a window size of W = 100 cycles is used to acquire the micro-short circuit association feature data of the previous 100 cycles in real time, and calculate the mean and standard deviation of each micro-short circuit feature within the window; the mean ± 3 times the standard deviation is used as the threshold range for anomaly detection.

[0023] If the real-time feature value exceeds the threshold range, it is judged as abnormal.
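A minimal sketch of the sliding-window threshold in [0022] and [0023], assuming one scalar feature per cycle. The class name `SlidingThreshold` and the use of the population standard deviation are assumptions; the source does not say whether the sample or population form is used.

```python
from collections import deque
from statistics import mean, pstdev


class SlidingThreshold:
    """Keeps the last `window` feature values and flags values outside mean ± k·std."""

    def __init__(self, window=100, k=3.0):
        self.values = deque(maxlen=window)  # oldest cycle drops out automatically
        self.k = k

    def update(self, value):
        self.values.append(value)

    def bounds(self):
        m = mean(self.values)
        s = pstdev(self.values)  # assumption: population standard deviation
        return m - self.k * s, m + self.k * s

    def is_abnormal(self, value):
        low, high = self.bounds()
        return value < low or value > high


detector = SlidingThreshold(window=100, k=3.0)
for cycle in range(100):
    detector.update(1.0 if cycle % 2 == 0 else 3.0)  # synthetic feature history
print(detector.bounds(), detector.is_abnormal(5.5))
```

Because the window slides with the battery's recent history, the threshold follows the feature distribution as it shifts with aging instead of staying at a fixed value.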

[0024] Furthermore, when verifying the real-time compensation effect and optimizing the closed loop, the adjusted feature weights and anomaly detection thresholds are used to perform anomaly detection on the monitoring data of the most recent several cycles, and the detection accuracy is calculated.

[0025] If the detection accuracy is greater than or equal to the accuracy threshold, the current adjustment parameters are maintained; otherwise, the feature distribution of the sliding window is recalculated and the anomaly detection threshold is updated.

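[0026] Furthermore, the agent's network structure adopts an Actor-Critic dual-network structure; when training the agent based on the improved PPO algorithm, the objective function of the PPO algorithm is modified by introducing a clipping threshold that adapts to the aging state:

[0027] $L^{CLIP}(\theta) = \mathbb{E}\left[\min\left(r_k(\theta) A_k,\ \mathrm{clip}\left(r_k(\theta),\, 1-\varepsilon(SOH_k),\, 1+\varepsilon(SOH_k)\right) A_k\right)\right]$, where $L^{CLIP}(\theta)$ is the clipped policy objective function; $\mathbb{E}[\cdot]$ is the expectation operator; $\min$ is the minimum operator, which takes the smaller of its two arguments; $\mathrm{clip}$ is the clipping operator, which restricts its input to $[1-\varepsilon(SOH_k),\, 1+\varepsilon(SOH_k)]$; $r_k(\theta)$ is the probability ratio between the new and old policies; $A_k$ is the advantage function; $\theta$ denotes the policy network parameters; and $\varepsilon(SOH_k)$ is the SOH-adaptive clipping threshold, where $SOH_k$ is the battery state of health at time k.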

[0028] Compared to existing solutions, the beneficial effects achieved by this invention are:

[0029] This invention overcomes the limitations of traditional surface parameter-based detection by introducing features that reflect the essence of battery aging, such as SEI film thickness and active lithium loss rate. By dynamically adjusting the reward weight through SOH, the model can maintain optimal detection performance in the new, middle-aged, and old states of the battery. By comprehensively considering detection accuracy, false negative rate, and false positive rate, multi-objective balanced optimization is achieved. By introducing a clipping threshold that adapts to the aging state, the accuracy and reliability of model detection and recognition can be effectively improved.

[0030] This invention forms a complete concept drift compensation closed loop by real-time monitoring of battery aging, triggering a compensation mechanism, enhancing the detection weight of micro-short circuit features, dynamically adjusting the anomaly detection threshold, verifying the effect, and performing closed-loop optimization. The adjustment is triggered only in the deep aging stage, which can effectively improve the detection performance and utilization of computing resources in the deep aging stage. At the same time, it can dynamically adapt to the feature distribution changes caused by aging, avoid the performance degradation of fixed thresholds and weights in non-stationary environments, and enhance the robustness of the system.

Attached Figure Description

[0031] The invention will now be further described with reference to the accompanying drawings.

[0032] Figure 1 is a flowchart illustrating the operation of the reinforcement learning-based intelligent detection and identification system for battery management systems (BMS) according to the present invention.

[0033] Figure 2 is a block diagram of the battery BMS intelligent detection and identification system based on reinforcement learning according to the present invention.

Detailed Implementation

[0034] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0035] As shown in Figure 1 and Figure 2, the present invention is a battery BMS intelligent detection and identification system based on reinforcement learning, including a BMS sensor network, an adaptive detection model construction and processing module, and a concept drift real-time compensation analysis and processing module.

[0036] The adaptive detection model construction and processing module uses the real-time battery cycle count, temperature sequence, SOH value, SEI film thickness, and active lithium loss rate as the core state space. It employs a multi-objective reward function of detection accuracy - false negative rate - false positive rate, and trains the agent using an improved proximal policy optimization algorithm, allowing the model parameters to be dynamically updated iteratively as the battery ages. Specific steps include:

[0037] The battery pack's cycle count, temperature sequence, terminal voltage, and charge / discharge current are collected in real time through the BMS sensor network.

[0038] The cycle count is the cumulative charge-discharge cycle count, which is automatically incremented by 1 after each complete charge-discharge cycle.

[0039] The temperature sequence is formed by sampling the surface temperature of a single cell every 100ms, creating a sliding window sequence of length 20, covering the temperature changes within 2 seconds;

[0040] Terminal voltage and charge / discharge current are sampled every 50ms for subsequent electrochemical characteristic estimation.

[0041] When performing online estimation of electrochemical characteristics, the state of health (SOH) is calculated in real time using the capacity decay method. The corresponding expression is:

[0042] $SOH = Q_{act} / Q_{nom}$, where $Q_{act}$ is the actual discharge capacity of the current cycle, obtained by integrating the discharge current, $Q_{act} = \int I(t)\,dt$, with $I(t)$ the discharge current at time t; and $Q_{nom}$ is the nominal capacity of the battery, provided by the battery manufacturer.
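The capacity decay calculation can be sketched as follows, assuming the discharge current is sampled at a fixed interval and integrated with the trapezoidal rule. The function names and the choice of integration rule are illustrative assumptions, not details taken from the source.

```python
def discharge_capacity_ah(currents_a, dt_s):
    """Integrate the discharge current magnitude over time and return ampere-hours."""
    total_as = 0.0
    for prev, curr in zip(currents_a, currents_a[1:]):
        total_as += 0.5 * (abs(prev) + abs(curr)) * dt_s  # trapezoidal rule
    return total_as / 3600.0


def soh_from_capacity(currents_a, dt_s, nominal_ah):
    """SOH as the ratio of measured discharge capacity to nominal capacity."""
    if nominal_ah <= 0:
        raise ValueError("nominal capacity must be positive")
    return discharge_capacity_ah(currents_a, dt_s) / nominal_ah


# Synthetic cycle: a constant 2 A discharge for one hour, sampled every second.
# Discharge current is negative, following the sign convention in the text.
samples = [-2.0] * 3601
soh = soh_from_capacity(samples, dt_s=1.0, nominal_ah=2.5)
print(soh)
```

With the 50 ms sampling described in [0040], `dt_s` would be 0.05; the one-second step here only keeps the example small.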

[0043] Based on the second-order RC equivalent circuit model, the polarization resistance is estimated in real time using an extended Kalman filter. Then, the SEI film thickness is converted using empirical formulas. The corresponding expression is:

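[0044] $\delta_{SEI} = \alpha R_p$, where $\delta_{SEI}$ is the SEI film thickness, $R_p$ is the estimated polarization resistance, and $\alpha$ is a proportionality constant, determined by the characteristics of the battery materials and calibrated through laboratory aging tests. In this embodiment of the invention, the default value of $\alpha$ is 0.02.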

[0045] Among them, the polarization resistance is estimated in real time by extended Kalman filtering. The specific steps include:

[0046] The second-order RC equivalent circuit model is constructed. The core parameters of the model include ohmic internal resistance, polarization resistance, polarization capacitance and open-circuit voltage.

[0047] Among them, the ohmic internal resistance is the inherent resistance of the electrolyte and electrode materials inside the battery;

[0048] The polarization resistance is the equivalent resistance of the battery's electrochemical polarization and concentration polarization;

[0049] Polarization capacitance reflects the charge storage capacity during the polarization process;

[0050] The open-circuit voltage is the terminal voltage of the battery under no-load conditions, and it has a non-linear mapping relationship with the state of charge (SOC).

[0051] Furthermore, the voltage response equation of the model is:

[0052] $U_t = U_{oc} + I R_0 + U_p$, where $U_t$ is the real-time terminal voltage of the battery, i.e., the externally measured voltage between the positive and negative terminals; $U_{oc}$ is the battery open-circuit voltage; $I$ is the real-time charge/discharge current, positive during charging and negative during discharging; $R_0$ is the ohmic internal resistance; and $U_p$ is the polarization voltage, i.e., the voltage across the polarization capacitor, which satisfies the RC dynamic relationship:

[0053] $\dot{U}_p = -\dfrac{U_p}{R_p C_p} + \dfrac{I}{C_p}$, where $\dot{U}_p$ is the time rate of change of the polarization voltage; $R_p$ is the polarization resistance; and $C_p$ is the polarization capacitance;

[0054] Define an extended state vector $x$ containing the polarization voltage and the parameters to be estimated;

[0055] Discretize the RC dynamic relationship and the parameter change model with a sampling time Δt = 0.05 s, i.e., a sampling period of 50 ms. The corresponding expression is:

[0056] $x_{k+1} = A x_k + B I_k + w_k$, where k is the discrete time step index, representing the k-th sampling time; $x_k$ is the extended state vector at time k, which fuses the battery polarization state with the parameters to be estimated; $x_{k+1}$ is the predicted state vector at time k+1, derived from the state at time k; $A$ is the state transition matrix, describing the dynamic mapping from the state at time k to the state at time k+1; $B$ is the input matrix; $I_k$ is the discrete charge/discharge current at time k; and $w_k$ is the process noise vector, which follows a Gaussian distribution with mean 0 and covariance $Q$;

[0057] Furthermore, an observation model is established based on the measured values of the battery terminal voltage, and the expression for the observation equation is as follows:

[0058] $y_k = h(x_k, I_k) + v_k$, with $h(x_k, I_k) = U_{oc}(SOC_k) + I_k R_0 + U_{p,k}$, where $y_k$ is the measured value of the battery terminal voltage at time k, directly acquired by the BMS voltage sensor; $h$ is the observation function, which maps the state vector and the input current to the theoretical terminal voltage; $U_{oc}(SOC_k)$ is the battery open-circuit voltage at time k; $SOC_k$ is the state of charge of the battery at time k; $U_{p,k}$ is the polarization voltage at time k; and $v_k$ is the measurement noise, which follows a Gaussian distribution with mean 0 and covariance $R$;

[0059] The initial value of the polarization voltage and the initial value of the polarization resistance are obtained from battery factory test data or offline pulse testing.

[0060] Initial covariance matrix: ;

[0061] Voltage and current data are collected using voltage and current sensors in the BMS. The collected voltage and current data are then filtered using a sliding window mean filter with a window size of 5 to eliminate high-frequency noise interference. The sliding window mean filter is a conventional existing technical solution, and the specific implementation steps will not be described here.
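For reference, a sliding window mean filter with a window size of 5 can be sketched as below. The handling of the first few samples, which are averaged over a shorter window, is an assumption; the source does not specify the warm-up behavior.

```python
from collections import deque


def moving_mean(samples, window=5):
    """Return the running mean over the last `window` samples.

    Until `window` samples have arrived, the mean is taken over what is available.
    """
    buffer = deque(maxlen=window)
    filtered = []
    for sample in samples:
        buffer.append(sample)
        filtered.append(sum(buffer) / len(buffer))
    return filtered


voltages = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # synthetic readings
smoothed = moving_mean(voltages, window=5)
print(smoothed)
```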

[0062] Based on the state estimate from the previous time step, predict the state at the current time step:

[0063] $\hat{x}_k^- = A \hat{x}_{k-1} + B I_{k-1}$, where $\hat{x}_k^-$ is the prior predicted state vector at time k, representing the predicted state at time k based on the optimal estimate at time k-1; $\hat{x}_{k-1}$ is the posterior optimal state vector at time k-1, i.e., the final estimate after correction by the measurement at time k-1; and $I_{k-1}$ is the measured charge/discharge current at time k-1;

[0064] Calculate the covariance matrix of the predicted state:

[0065] $P_k^- = A P_{k-1} A^T + Q$, where $P_k^-$ is the prior prediction covariance matrix; $P_{k-1}$ is the posterior optimal covariance matrix at time k-1, representing the uncertainty of the optimal estimate at time k-1; and $A^T$ is the transpose of the state transition matrix;

[0066] Taking the partial derivative of the observation function $h$ with respect to the state vector x, we obtain the Jacobian matrix:

[0067] $H_k = \left.\dfrac{\partial h}{\partial x}\right|_{x = \hat{x}_k^-}$, where $H_k$ is the Jacobian matrix of the observation function, describing how strongly small changes in the state vector influence the observed value; and $\partial h / \partial x$ is the partial derivative of the observation function with respect to the state vector;

[0068] Calculate the optimal gain, which weighs the confidence in the predicted and measured values:

[0069] $K_k = P_k^- H_k^T \left(H_k P_k^- H_k^T + R\right)^{-1}$, where $K_k$ is the Kalman gain vector, representing the weighting of the prediction and the measurement in the state update; $H_k^T$ is the transpose of the Jacobian matrix; and $R$ is the measurement noise variance;

[0070] Correct the predicted state with the measured value to obtain the optimal estimate for the current time:

[0071] $\hat{x}_k = \hat{x}_k^- + K_k \left(y_k - h(\hat{x}_k^-, I_k)\right)$, where $\hat{x}_k$ is the posterior optimal state vector at time k, i.e., the final estimate after correction by the measurement; $y_k$ is the terminal voltage measurement at time k; and $I_k$ is the measured charge/discharge current at time k;

[0072] Update the covariance matrix of the state estimate:

[0073] $P_k = (I - K_k H_k) P_k^-$, where $P_k$ is the posterior optimal covariance matrix, representing the uncertainty of the optimal state estimate at time k; $I$ is the identity matrix; and $K_k H_k$ is the update weight matrix;

[0074] Extract the real-time estimate of the polarization resistance from the updated state vector: $\hat{R}_{p,k} = \hat{x}_k(2)$, where $\hat{R}_{p,k}$ is the optimal estimate of the polarization resistance at time k and $\hat{x}_k(2)$ is the second element of the optimal state vector;

[0075] The real-time estimates at 10 consecutive time points are passed through a moving average filter to obtain the final output polarization resistance estimate.
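The prediction and correction steps above can be sketched in plain Python for a two-element state [polarization voltage, polarization resistance]. This is a simplified illustration under stated assumptions, not the patented filter: the RC time constant `tau`, the open-circuit voltage `ocv`, and the ohmic resistance `r0` are treated as known constants, the noise settings `q` and `r` are placeholder values, and the check data are synthetic and noise-free.

```python
import math


def ekf_step(x, P, i_prev, i_now, u_meas, ocv, r0, tau, dt=0.05,
             q=(1e-8, 1e-10), r=1e-4):
    """One predict/update cycle for the state x = [U_p, R_p]."""
    a = math.exp(-dt / tau)   # discrete RC decay factor (tau assumed known)
    b = (1.0 - a) * i_prev    # sensitivity of the next U_p to R_p
    # Predict: U_p follows the RC dynamics, R_p is modeled as constant.
    x_pred = [a * x[0] + b * x[1], x[1]]
    F = [[a, b], [0.0, 1.0]]  # Jacobian of the state transition
    FP = [[sum(F[i][m] * P[m][j] for m in range(2)) for j in range(2)]
          for i in range(2)]
    P_pred = [[sum(FP[i][m] * F[j][m] for m in range(2)) for j in range(2)]
              for i in range(2)]
    P_pred[0][0] += q[0]
    P_pred[1][1] += q[1]
    # Update: terminal voltage = ocv + i * r0 + U_p, so the Jacobian is H = [1, 0].
    innovation = u_meas - (ocv + i_now * r0 + x_pred[0])
    s = P_pred[0][0] + r
    K = [P_pred[0][0] / s, P_pred[1][0] / s]
    x_new = [x_pred[0] + K[0] * innovation, x_pred[1] + K[1] * innovation]
    # P = (I - K H) P_pred, written out for H = [1, 0].
    P_new = [[P_pred[i][j] - K[i] * P_pred[0][j] for j in range(2)]
             for i in range(2)]
    return x_new, P_new


# Synthetic constant-current charge with a true R_p of 30 mOhm (all values hypothetical).
OCV, R0, TAU, DT, CURRENT, RP_TRUE = 3.7, 0.01, 10.0, 0.05, 2.0, 0.03
decay = math.exp(-DT / TAU)
up_true = 0.0
x, P = [0.0, 0.02], [[1e-4, 0.0], [0.0, 1e-4]]
for _ in range(2000):
    up_true = decay * up_true + (1.0 - decay) * RP_TRUE * CURRENT
    u_meas = OCV + CURRENT * R0 + up_true
    x, P = ekf_step(x, P, CURRENT, CURRENT, u_meas, OCV, R0, TAU, DT)
rp_est = x[1]
print(rp_est)
```

Starting from an initial guess of 20 mΩ, the resistance estimate moves toward the simulated 30 mΩ because the polarization voltage response depends on the resistance through the RC dynamics. A production filter would also need the open-circuit voltage as a function of SOC and real sensor noise statistics.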

[0076] The active lithium loss rate is calculated using the coulombic efficiency method. The corresponding expression is:

[0077] Here, the expression uses the charging capacity and the discharge capacity of the current cycle; after each charge-discharge cycle, both values are read from the capacity statistics module of the BMS.
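As an illustration only, one common way to turn coulombic efficiency into a per-cycle loss figure is to take one minus the ratio of discharge capacity to charge capacity. The exact expression used in the patent is not preserved in this text, so the formula and the function name below are assumptions.

```python
def active_lithium_loss_rate(charge_capacity_ah, discharge_capacity_ah):
    """Per-cycle loss estimate as 1 minus coulombic efficiency (assumed form)."""
    if charge_capacity_ah <= 0:
        raise ValueError("charge capacity must be positive")
    coulombic_efficiency = discharge_capacity_ah / charge_capacity_ah
    return 1.0 - coulombic_efficiency


# Hypothetical cycle: 2.00 Ah charged, 1.99 Ah discharged.
loss = active_lithium_loss_rate(2.00, 1.99)
print(loss)
```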

[0078] Normalize all features to the [0,1] interval to eliminate dimensional differences. This is a conventional technical solution. The specific implementation steps will not be elaborated here.

[0079] It should be noted that by constructing a state space that includes the essential characteristics of battery aging, a comprehensive basis for decision-making can be provided for reinforcement learning agents; preprocessing operations ensure the stability and convergence speed of model training.

[0080] Compared to existing technologies that only use surface features such as voltage and current, this model introduces electrochemical features such as SEI film thickness and active lithium loss rate, enabling it to capture the deep mechanisms of battery aging and effectively shorten the time for early identification of faults such as micro-short circuits.

[0081] Based on the calculation results, the state space S of the agent is defined as a high-dimensional electrochemical feature vector, which serves as the input state of the agent. Its elements are: the equivalent cumulative cycle count at time k; the temperature sequence at time k, summarized by the mean and variance of the temperature over the last 10 minutes, with 200 sampling points corresponding to the last 10 minutes; the battery state of health at time k; the SEI film thickness at time k; and the active lithium loss rate at time k;

[0082] In addition, the action space is defined as a fixed set of binary classification actions;

[0083] When initializing the state transition model parameters, load the offline calibrated aging kinetic parameters, such as the SEI film growth rate constant and the active lithium loss rate constant;

[0084] The dynamic transition equations for each state element are constructed, specifically including equivalent cycle number transfer, temperature sequence update, SOH transfer, SEI film thickness transfer, and active lithium loss rate transfer.

[0085] When implementing the equivalent cycle count transfer, the transition equation uses the equivalent cumulative cycle count at time k+1 and the change in SOC during a single charge-discharge cycle at time k.

[0086] When updating the temperature series, the sliding window removes the oldest temperature value, adds the new temperature value at time k+1, and recalculates the mean and variance.

[0087] When performing SOH transfer, the transition equation uses the following quantities: the battery state of health at time k+1; the gas constant, a universal thermodynamic constant; the absolute temperature of the battery at time k; the polarization resistance growth rate constant, obtained by fitting polarization resistance data over the entire life cycle, with a default value of 0.02; the activation energy of the polarization resistance growth reaction, obtained by fitting aging tests at different temperatures, with a default value of 30000; the initial polarization resistance of a new battery when it leaves the factory; and the polarization resistance threshold at the end of battery life;

[0088] When performing SEI film thickness transfer, the transition equation uses the following quantities: the thickness of the SEI film on the negative electrode of the battery at time k+1; the SEI film growth rate constant, obtained by fitting the SEI film thickness under different currents, with a default value of 0.1; and the activation energy of the SEI film growth reaction, obtained by fitting SEI film tests at different temperatures, with a default value of 40000.

[0089] When implementing active lithium loss rate transfer, the transition equation uses the following quantities: the active lithium loss rate of the battery at time k+1; the active lithium loss rate constant, obtained by fitting capacity decay tests under different currents, with a default value of 0.001; and the activation energy of the active lithium loss reaction, obtained by fitting capacity decay tests at different temperatures, with a default value of 35000.

[0090] The transition logic of each state element is encapsulated into a state transition function, which simulates the dynamic transition from the state at time k to the state at time k+1;

[0091] When generating real fault labels, the determination is based on a threshold of real-time BMS data:

[0092] Here, the expression yields the actual fault label.

[0093] The actual fault labels are strictly aligned with the 10-second sliding window of the state space to ensure that each state vector corresponds to a unique actual label;

[0094] Define a multi-objective reward function R that integrates three core metrics: detection accuracy, false negative rate, and false positive rate.

[0095] $R = w_1 R_{acc} + w_2 R_{miss} + w_3 R_{false}$, where $R_{acc}$ is the detection accuracy reward, with values in [0, 1]; the higher the accuracy, the greater the reward: $R_{acc} = \dfrac{TP + TN}{TP + TN + FP + FN}$, where $TP$ is the number of samples for which the model correctly detected a fault, $TN$ is the number of samples the model correctly identified as normal, $FP$ is the number of samples the model mistakenly identified as faulty, and $FN$ is the number of samples for which the model missed a fault; $R_{miss}$ is the penalty for missed detections, with values in [-1, 0]; the more missed detections, the heavier the penalty; $R_{false}$ is the penalty for false alarms, with values in [-1, 0]; the more false alarms, the heavier the penalty; and $w_1$, $w_2$, $w_3$ are weighting coefficients, adaptively adjusted based on the state of health (SOH):

[0096] ;

[0097] It should be noted that by guiding reinforcement learning agents to prioritize core detection targets at different aging stages, the overall performance of detection accuracy, false negative rate, and false positive rate can be balanced.

[0098] When the battery enters the aging stage, i.e., SOH≤0.7, by increasing the weight of the missed detection penalty term, the model's missed detection rate and false detection rate for faults such as micro short circuits can be effectively reduced.
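A minimal sketch of an SOH-dependent multi-objective reward follows. The weight values, the use of the false negative rate and false positive rate as the two penalty terms, and the single switch at SOH = 0.7 are assumptions chosen for illustration; the weighting schedule in the source is not preserved in this text.

```python
def soh_weights(soh):
    """Hypothetical weights (accuracy, miss penalty, false-alarm penalty)."""
    if soh > 0.7:
        return 0.5, 0.25, 0.25  # new or mid-life battery: favor accuracy
    return 0.3, 0.5, 0.2        # aged battery: emphasize missed detections


def detection_reward(tp, tn, fp, fn, soh):
    """Weighted sum of an accuracy reward and two non-positive penalty terms."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    r_acc = (tp + tn) / total                     # in [0, 1]
    r_miss = -fn / (tp + fn) if tp + fn else 0.0  # in [-1, 0], assumed form
    r_false = -fp / (fp + tn) if fp + tn else 0.0 # in [-1, 0], assumed form
    w_acc, w_miss, w_false = soh_weights(soh)
    return w_acc * r_acc + w_miss * r_miss + w_false * r_false


# Same confusion counts evaluated for a fresh and an aged battery.
reward_new = detection_reward(tp=8, tn=80, fp=10, fn=2, soh=0.9)
reward_aged = detection_reward(tp=8, tn=80, fp=10, fn=2, soh=0.6)
print(reward_new, reward_aged)
```

With these placeholder weights, the same two missed faults cost more once the battery is aged, which is the behavior [0098] describes.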

[0099] Furthermore, the agent's network structure adopts an Actor-Critic dual-network structure:

[0100] The input to the Actor network is the state vector, and the output is the action probability distribution; the action space includes normal state, warning state, and fault state.

[0101] The input to the Critic network is the same as that of the Actor network, and the output is a state value function, which is used to evaluate the long-term reward expectation of the current state.

[0102] When training the agent based on the improved PPO algorithm, the objective function of the PPO algorithm is modified by introducing a clipping threshold that adapts to the aging state:

[0103] $L^{CLIP}(\theta) = \mathbb{E}\left[\min\left(r_k(\theta) A_k,\ \mathrm{clip}\left(r_k(\theta),\, 1-\varepsilon(SOH_k),\, 1+\varepsilon(SOH_k)\right) A_k\right)\right]$, where $L^{CLIP}(\theta)$ is the clipped policy objective function; $\mathbb{E}[\cdot]$ is the expectation operator, estimated by Monte Carlo over sampled trajectories, $\mathbb{E}[\cdot] \approx \frac{1}{N}\sum_{i=1}^{N}(\cdot)_i$, where i is the index of the sampled trajectory and N is the number of sampled trajectories; $\min$ is the minimum operator, which takes the smaller of its two arguments; $\mathrm{clip}$ is the clipping operator, which restricts its input to $[1-\varepsilon(SOH_k),\, 1+\varepsilon(SOH_k)]$; $r_k(\theta)$ is the probability ratio between the new and old policies; $A_k$ is the advantage function, $A_k = \sum_{l \ge 0} (\gamma\lambda)^l \delta_{k+l}$, where $\delta_k$ is the temporal-difference error, $\gamma$ is the discount factor, with a value of 0.99, and $\lambda$ is the GAE coefficient, with a value of 0.95; $\theta$ denotes the parameters of the policy network, including its weights and biases, such as the weight matrix and bias from the input layer to the hidden layer and the weight matrix and bias from the hidden layer to the output layer; and $\varepsilon(SOH_k)$ is the SOH-adaptive clipping threshold, where $SOH_k$ is the battery state of health at time k;
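The advantage estimate with a discount factor of 0.99 and a GAE coefficient of 0.95 can be computed with the standard backward recursion of generalized advantage estimation. The sketch below assumes a single trajectory and a value list that carries one extra bootstrap entry for the state after the last step; the function name is illustrative.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one trajectory.

    `values` must have len(rewards) + 1 entries; the last one is the bootstrap
    value of the state reached after the final step.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must contain one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # Temporal-difference error at step t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


adv = gae_advantages(rewards=[1.0, 1.0], values=[0.0, 0.0, 0.0])
print(adv)
```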

[0104] It should be noted that battery aging is a typical non-stationary environment, and a fixed clipping threshold can easily lead to the policy converging to a local optimum. This embodiment of the invention introduces an adaptive threshold, which allows larger policy updates during the aging stage and can effectively improve the long-term performance of the model.

[0105] Using a small threshold during the new battery phase ensures training stability, and using a large threshold during the aging phase ensures policy flexibility. This resolves the trade-off in traditional PPO in non-stationary environments, where a fixed small threshold is stable but inflexible and a fixed large threshold is flexible but unstable, thus balancing training stability and flexibility.

[0106] In this embodiment of the invention, by introducing features that reflect the essence of battery aging, such as SEI film thickness and active lithium loss rate, the limitations of traditional detection based on surface parameters can be overcome; by dynamically adjusting the reward weight through SOH, the model can maintain optimal detection performance in the new, middle-aged, and old states of the battery; by comprehensively considering detection accuracy, false negative rate, and false positive rate, multi-objective balanced optimization is achieved; by introducing a clipping threshold that adapts to the aging state, the accuracy and reliability of model detection and recognition can be effectively improved.
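The idea of a small clipping threshold for a new battery and a larger one for an aged battery can be sketched as follows. The linear schedule between 0.1 and 0.3 and the SOH breakpoint of 0.7 are hypothetical; the source does not preserve the actual mapping from SOH to threshold.

```python
def adaptive_epsilon(soh, eps_min=0.1, eps_max=0.3, soh_aged=0.7):
    """Hypothetical schedule: eps_min at SOH = 1, rising linearly to eps_max at soh_aged."""
    s = min(max(soh, soh_aged), 1.0)
    fraction = (1.0 - s) / (1.0 - soh_aged)
    return eps_min + (eps_max - eps_min) * fraction


def clipped_surrogate(ratios, advantages, sohs):
    """Mean PPO clipped objective with a per-sample, SOH-dependent clip range."""
    total = 0.0
    for ratio, advantage, soh in zip(ratios, advantages, sohs):
        eps = adaptive_epsilon(soh)
        clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)


# The same policy ratio is clipped more tightly for a new battery than for an aged one.
objective_new = clipped_surrogate([1.5], [1.0], [1.0])
objective_aged = clipped_surrogate([1.5], [1.0], [0.7])
print(objective_new, objective_aged)
```

A wider clip range lets a single update move the policy further, which is why the aged-battery case yields the larger objective value for the same ratio and advantage.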

[0107] The concept drift real-time compensation analysis and processing module: Based on the current real-time monitoring data of the battery, it obtains the real-time health status and the moving average of the health status. It analyzes the moving average of the health status and dynamically triggers the concept drift compensation mechanism. Through dynamically adjusting the weights of micro-short circuit correlation features, updating the anomaly detection threshold, verifying the real-time compensation effect, and performing closed-loop optimization, it achieves real-time compensation for concept drift. Specific steps include:

[0108] Obtain real-time monitoring data of the current battery from the BMS system, specifically including polarization resistance, cycle count, voltage sequence, and internal resistance sequence;

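[0109] The real-time health status (SOH) is calculated from the acquired monitoring data. The corresponding expression is:

[0110] $SOH = 1 - \dfrac{R_p - R_{p,\mathrm{new}}}{R_{p,\mathrm{eol}} - R_{p,\mathrm{new}}}$; where $R_p$ is the polarization resistance; $R_{p,\mathrm{new}}$ is the polarization resistance of a new battery, with a default value of 20 mΩ; and $R_{p,\mathrm{eol}}$ is the polarization resistance of a scrapped battery, with a default value of 200 mΩ.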

[0111] The moving average of SOH over three consecutive cycles, $\overline{SOH}_t$, is used as the basis for judgment. The corresponding expression is:

[0112] $\overline{SOH}_t = \dfrac{1}{3}\sum_{i=0}^{2} SOH_{t-i}$; where $SOH_{t-i}$ is the real-time battery health status in cycle $t-i$, and $i$ is the time offset used to index SOH data from different cycles;

[0113] If $\overline{SOH}_t$ falls below the trigger threshold, the concept drift compensation mechanism is triggered; otherwise, the original anomaly detection parameters remain unchanged. The default trigger threshold is 0.7.

[0114] It should be noted that identifying the timing of a battery entering the deep aging stage through data analysis can avoid unnecessary adjustments during the new battery and mid-aging stages, thus saving computing resources. Since traditional methods use fixed adjustment thresholds, the probability of false adjustments being triggered is relatively high during the new battery stage. This embodiment of the invention uses a moving average judgment for three consecutive cycles, which can effectively reduce the false trigger rate and thus effectively save computing resources.
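The trigger logic of the preceding steps can be sketched as follows. This is a minimal illustration that assumes SOH is interpolated linearly between the new-battery and scrapped-battery polarization resistances (20 mΩ and 200 mΩ) and that compensation is triggered when the three-cycle moving average drops strictly below 0.7; the names `soh_from_polarization` and `DriftTrigger` are hypothetical.

```python
from collections import deque

R_NEW_MOHM = 20.0    # default polarization resistance of a new battery
R_EOL_MOHM = 200.0   # default polarization resistance of a scrapped battery
TRIGGER_SOH = 0.7    # default trigger threshold


def soh_from_polarization(r_p_mohm):
    """Assumed linear mapping: 1.0 at the new-battery value, 0.0 at the scrapped value."""
    soh = 1.0 - (r_p_mohm - R_NEW_MOHM) / (R_EOL_MOHM - R_NEW_MOHM)
    return min(max(soh, 0.0), 1.0)


class DriftTrigger:
    """Fires only when the moving average of the last `window` SOH values is low."""

    def __init__(self, window=3, threshold=TRIGGER_SOH):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def update(self, r_p_mohm):
        self.history.append(soh_from_polarization(r_p_mohm))
        if len(self.history) < self.history.maxlen:
            return False  # not enough cycles observed yet
        return sum(self.history) / len(self.history) < self.threshold


if __name__ == "__main__":
    trigger = DriftTrigger()
    for r_p in (20.0, 20.0, 110.0, 110.0, 110.0):
        print(r_p, trigger.update(r_p))
```

A single low reading after two healthy cycles does not fire the trigger, because the average of the three values stays above the threshold; this is the false-trigger suppression that the moving average is meant to provide.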

[0115] When dynamically adjusting the weights of the micro-short-circuit correlation features, the voltage spike frequency $f_v$ and the internal resistance mutation rate $r_R$ are extracted from the real-time monitoring data;

[0116] Here, the voltage spike frequency is the number of times per unit time that the voltage leaves its normal range; leaving the normal range covers two cases, each defined relative to the standard voltage;

[0117] The internal resistance mutation rate is the relative rate of change between the internal resistance of the current cycle and the average internal resistance of the previous 10 cycles. The corresponding expression is: $r_R = \dfrac{R_{\mathrm{cur}} - \bar{R}_{10}}{\bar{R}_{10}}$; where $R_{\mathrm{cur}}$ is the internal resistance of the current cycle, and $\bar{R}_{10}$ is the average internal resistance over the previous 10 cycles;

[0118] The voltage spike frequency weight $w_f$ and the internal resistance mutation rate weight $w_r$ are filled into the original feature weight vector; $w_f$ has a default value of 0.3, and $w_r$ has a default value of 0.3;

[0119] The weights of the micro-short-circuit correlation features are increased to 0.7, giving the adjusted weight vector $W'$;

[0120] The adjusted weight vector is normalized so that all weights sum to 1. The corresponding expression is: $\tilde{w}_i = \dfrac{w'_i}{\sum_j w'_j}$, where $w'_i$ is the $i$-th element of $W'$;

[0121] The normalized weight vector is deployed to the online anomaly detection model, where it replaces the original feature weights and takes effect in real time.

[0122] It should be noted that by enhancing the contribution of micro-short circuit correlation features in anomaly detection, it is possible to effectively adapt to the characteristic that the probability of micro-short circuit faults increases significantly in the deep aging stage, thereby effectively improving the accuracy of micro-short circuit detection and reducing the false negative rate.
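The feature extraction and weight adjustment described above can be sketched as follows. The feature names in the example weight dictionary and the values of the non-micro-short-circuit weights are hypothetical; the sketch also assumes that both micro-short-circuit weights are raised to 0.7 individually and that the mutation rate is a signed relative change.

```python
def resistance_mutation_rate(r_current, previous_resistances, lookback=10):
    """Relative change of the current-cycle resistance vs. the mean of the previous cycles."""
    recent = previous_resistances[-lookback:]
    mean_prev = sum(recent) / len(recent)
    return (r_current - mean_prev) / mean_prev


def boost_and_normalize(weights,
                        boosted=("voltage_spike_freq", "resistance_mutation_rate"),
                        boost_to=0.7):
    """Raise the micro-short-circuit feature weights, then rescale so all weights sum to 1."""
    adjusted = dict(weights)          # leave the caller's dictionary untouched
    for name in boosted:
        adjusted[name] = boost_to
    total = sum(adjusted.values())
    return {name: w / total for name, w in adjusted.items()}


if __name__ == "__main__":
    # Hypothetical original weights; only the two 0.3 defaults come from the text.
    original = {
        "voltage_deviation": 0.2,
        "temperature_rise": 0.2,
        "voltage_spike_freq": 0.3,
        "resistance_mutation_rate": 0.3,
    }
    print(boost_and_normalize(original))
    print(resistance_mutation_rate(33.0, [30.0] * 10))
```

After normalization the two boosted features still dominate the vector, but their absolute values are below 0.7 because the total is rescaled to 1.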

[0123] When updating the anomaly detection threshold, a sliding window with a window size of W = 100 cycles is used to obtain the micro-short-circuit correlation feature data of the previous 100 cycles in real time:

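[0124] $D_t = \{(x_{1,i},\, x_{2,i}) \mid i = t-W+1, \dots, t\}$; where $D_t$ is the sliding-window dataset at time $t$, containing the micro-short-circuit correlation feature data of the previous 100 cycles; $x_{1,i}$ is the voltage spike frequency at time $i$, corresponding to micro-short-circuit correlation feature 1; and $x_{2,i}$ is the internal resistance mutation rate at time $i$, corresponding to micro-short-circuit correlation feature 2;

[0125] Calculate the mean and standard deviation of each micro-short-circuit feature within the window:

[0126] $\mu_1 = \dfrac{1}{W}\sum_{i=t-W+1}^{t} x_{1,i}$, $\sigma_1 = \sqrt{\dfrac{1}{W}\sum_{i=t-W+1}^{t} \left(x_{1,i} - \mu_1\right)^2}$;

[0127] $\mu_2 = \dfrac{1}{W}\sum_{i=t-W+1}^{t} x_{2,i}$, $\sigma_2 = \sqrt{\dfrac{1}{W}\sum_{i=t-W+1}^{t} \left(x_{2,i} - \mu_2\right)^2}$;

[0128] where $\mu_1$ and $\sigma_1$ are the mean and standard deviation of correlation feature 1 within the sliding window, and $\mu_2$ and $\sigma_2$ are the mean and standard deviation of correlation feature 2 within the sliding window;

[0129] The threshold range for anomaly detection is defined as the mean ± 3 standard deviations.

[0130] $T_1^{\mathrm{low}} = \mu_1 - 3\sigma_1$, $T_1^{\mathrm{up}} = \mu_1 + 3\sigma_1$; where $T_1^{\mathrm{low}}$ and $T_1^{\mathrm{up}}$ are the lower and upper limits of the voltage spike frequency threshold.

[0131] $T_2^{\mathrm{low}} = \mu_2 - 3\sigma_2$, $T_2^{\mathrm{up}} = \mu_2 + 3\sigma_2$; where $T_2^{\mathrm{low}}$ and $T_2^{\mathrm{up}}$ are the lower and upper limits of the internal resistance mutation rate threshold.

[0132] If a real-time feature value falls outside its threshold range, that is, $x_{1,t} < T_1^{\mathrm{low}}$ or $x_{1,t} > T_1^{\mathrm{up}}$, or $x_{2,t} < T_2^{\mathrm{low}}$ or $x_{2,t} > T_2^{\mathrm{up}}$, it is judged to be abnormal;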

[0133] It should be noted that by tracking the changes in feature distribution during the deep aging stage in real time, the judgment criteria for anomaly detection are dynamically adjusted to offset the feature distribution drift caused by aging. Traditional fixed threshold models have a high false alarm rate during the deep aging stage. In this embodiment of the invention, the threshold is dynamically updated through a sliding window, which can effectively reduce the false alarm rate while ensuring the recall rate of anomaly detection.
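The sliding-window threshold update can be sketched with the Python standard library. This is a minimal illustration that assumes the population standard deviation (`statistics.pstdev`) and that a new reading is judged against the window before being appended to it; the class name `SlidingThreshold` and the helper `sample_is_abnormal` are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class SlidingThreshold:
    """Tracks one feature over the last `window` cycles and derives mean +/- k*sigma bounds."""

    def __init__(self, window=100, k=3.0):
        self.values = deque(maxlen=window)  # the oldest cycle is dropped automatically
        self.k = k

    def push(self, x):
        self.values.append(x)

    def bounds(self):
        mu = mean(self.values)
        sigma = pstdev(self.values)  # population standard deviation (an assumption)
        return mu - self.k * sigma, mu + self.k * sigma

    def is_abnormal(self, x):
        low, high = self.bounds()
        return x < low or x > high


def sample_is_abnormal(detectors, sample):
    """A sample is abnormal if any feature leaves its own threshold range."""
    return any(detectors[name].is_abnormal(value) for name, value in sample.items())


if __name__ == "__main__":
    spike = SlidingThreshold()
    for i in range(100):
        spike.push(1.0 if i % 2 == 0 else 3.0)
    print(spike.bounds())
    print(spike.is_abnormal(5.5), spike.is_abnormal(4.0))
```

Because the bounds are recomputed from the most recent window, a slow drift in the feature distribution moves the thresholds with it rather than producing a steady stream of alarms.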

[0134] When verifying the real-time compensation effect and performing closed-loop optimization, the adjusted feature weights and anomaly detection threshold are used to perform anomaly detection on the monitoring data of the most recent several cycles, specifically up to 10, and the detection accuracy is calculated: $Acc = \dfrac{N_{\mathrm{correct}}}{N_{\mathrm{total}}}$; where $N_{\mathrm{correct}}$ is the number of correctly detected samples, and $N_{\mathrm{total}}$ is the total number of samples;

[0135] If $Acc \geq Acc_{\mathrm{th}}$, the current adjustment parameters are kept; otherwise, the feature distribution of the sliding window is recalculated and the anomaly detection threshold is updated. $Acc_{\mathrm{th}}$ is the accuracy threshold, with a default value of 90%.

[0136] It should be noted that by verifying the effectiveness of the concept drift compensation mechanism, a closed-loop optimization is formed to avoid performance degradation caused by single adjustment errors. This can stabilize the anomaly detection accuracy in the deep aging stage, avoid detection performance fluctuations caused by rapid changes in feature distribution, and improve the robustness of the system.
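The verification step can be sketched as a small decision function. This is an illustration only: it assumes that reference labels for the recent cycles are available for comparison, and the names `detection_accuracy` and `verify_compensation` as well as the `"keep"` / `"recalibrate"` return values are hypothetical.

```python
def detection_accuracy(predictions, labels):
    """Fraction of samples whose predicted anomaly flag matches the reference label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def verify_compensation(predictions, labels, acc_threshold=0.90):
    """Keep the adjusted parameters if accuracy meets the threshold, else recalibrate."""
    acc = detection_accuracy(predictions, labels)
    decision = "keep" if acc >= acc_threshold else "recalibrate"
    return decision, acc


if __name__ == "__main__":
    labels = [False] * 8 + [True] * 2           # reference labels for 10 recent cycles
    preds_good = [False] * 8 + [True, False]    # 9 of 10 correct
    preds_poor = [False] * 6 + [True] * 4       # 8 of 10 correct
    print(verify_compensation(preds_good, labels))
    print(verify_compensation(preds_poor, labels))
```

A `"recalibrate"` result corresponds to recomputing the sliding-window statistics and updating the anomaly detection threshold before the next verification pass.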

[0137] In this embodiment of the invention, by real-time monitoring of battery aging, triggering a compensation mechanism, enhancing the detection weight of micro-short circuit features, dynamically adjusting the anomaly detection threshold, verifying the effect, and performing closed-loop optimization, a complete concept drift compensation closed loop is formed. The adjustment is only triggered during the deep aging stage, which can effectively improve the detection performance and utilization of computing resources during the deep aging stage. At the same time, it can dynamically adapt to the feature distribution changes caused by aging, avoid the performance degradation of fixed thresholds and weights in non-stationary environments, and enhance the robustness of the system.

[0138] In the several embodiments provided by this invention, it should be understood that the disclosed system can be implemented in other ways. The embodiments of the invention described above are merely illustrative; for example, the division of modules is only a logical functional division, and other divisions are possible in actual implementation.

[0139] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical modules; they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0140] Furthermore, the functional modules in the various embodiments of the present invention can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module. The integrated module can be implemented in hardware or in the form of hardware plus software functional modules.

[0141] It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that the present invention can be implemented in other specific forms without departing from the essential characteristics of the present invention.

[0142] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A battery BMS intelligent detection and identification system based on reinforcement learning, characterized in that it comprises:
an adaptive detection model construction and processing module, which takes the real-time cycle count, temperature sequence, SOH value, SEI film thickness, and active lithium loss rate of the battery, as collected and processed, as the core state space, takes the detection accuracy, false negative rate, and false positive rate as a multi-objective reward function, and trains the agent through an improved proximal policy optimization (PPO) algorithm so that the model parameters are dynamically updated as the battery ages;
a concept drift real-time compensation analysis and processing module, which, based on the real-time monitoring data of the current battery, obtains the real-time health status and the moving average of the health status, performs data analysis on the moving average of the health status, and dynamically triggers the concept drift compensation mechanism, and which achieves real-time compensation for concept drift by dynamically adjusting the weights of micro-short-circuit correlation features, updating the anomaly detection threshold, verifying the real-time compensation effect, and performing closed-loop optimization;
wherein, when dynamically adjusting the weights of the micro-short-circuit correlation features, the voltage spike frequency and the internal resistance mutation rate are extracted from the real-time monitoring data; the voltage spike frequency weight and the internal resistance mutation rate weight are filled into the original feature weight vector and are then increased; and the adjusted weight vector is normalized and deployed to the online anomaly detection model to replace the original feature weights;
when updating the anomaly detection threshold, a sliding window with a window size of W = 100 cycles is used to acquire the micro-short-circuit correlation feature data of the previous 100 cycles in real time, and the mean and standard deviation of each micro-short-circuit feature within the window are calculated; the threshold range of anomaly detection is taken as the mean ± 3 times the standard deviation; and if a real-time feature value falls outside the threshold range, it is judged to be abnormal;
the agent's network adopts an Actor-Critic dual-network structure; when training the agent based on the improved PPO algorithm, the objective function of the PPO algorithm is improved by introducing an aging-state adaptive clipping threshold: $L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_k\left[\min\left(r_k(\theta)\,A_k,\ \mathrm{clip}\left(r_k(\theta),\, 1-\varepsilon(SOH_k),\, 1+\varepsilon(SOH_k)\right) A_k\right)\right]$; where $L^{\mathrm{CLIP}}(\theta)$ is the clipped policy objective function; $\mathbb{E}_k$ is the expectation operator; $\min$ is the minimum operator, which takes the smaller of its two terms; $\mathrm{clip}$ is the clipping operator, which restricts its input to $[1-\varepsilon(SOH_k),\, 1+\varepsilon(SOH_k)]$; $r_k(\theta)$ is the probability ratio between the new and old policies; $A_k$ is the advantage function; $\theta$ denotes the policy network parameters; $\varepsilon(SOH_k)$ is the SOH-adaptive clipping threshold; and $SOH_k$ is the battery health status at time k.

2. The battery BMS intelligent detection and identification system based on reinforcement learning according to claim 1, characterized in that, when performing online estimation of electrochemical characteristics, the state of health (SOH) is calculated in real time using the capacity decay method; based on a second-order RC equivalent circuit model, the polarization resistance is estimated in real time by extended Kalman filtering, and the SEI film thickness is then derived from it by an empirical formula; and the active lithium loss rate is calculated using the coulombic efficiency method.

3. The battery BMS intelligent detection and identification system based on reinforcement learning according to claim 2, characterized in that, based on the calculation results, the state space of the agent is defined as a high-dimensional electrochemical feature vector, which serves as the input state of the agent; the action space is defined as a fixed set of binary classification actions; and the dynamic transition equation of each state element is constructed, with the transition logic of each state element encapsulated as a state transition function.

4. The battery BMS intelligent detection and identification system based on reinforcement learning according to claim 3, characterized in that actual fault labels are generated from thresholds derived from real-time BMS data, the label being determined from the terminal voltage measured at time k and the charge and discharge current measured at time k.

5. The battery BMS intelligent detection and identification system based on reinforcement learning according to claim 4, characterized in that a multi-objective reward function R is defined that integrates three core metrics, namely detection accuracy, false negative rate, and false positive rate: $R = w_1 R_{\mathrm{acc}} + w_2 R_{\mathrm{FN}} + w_3 R_{\mathrm{FP}}$; where $R_{\mathrm{acc}}$ is the detection accuracy reward, with a value range of [0, 1]; $R_{\mathrm{FN}}$ is the false negative penalty, with a value range of [-1, 0]; $R_{\mathrm{FP}}$ is the false positive penalty, with a value range of [-1, 0]; and $w_1$, $w_2$, and $w_3$ are weighting coefficients that are adaptively adjusted based on the state of health (SOH).

6. The battery BMS intelligent detection and identification system based on reinforcement learning according to claim 5, characterized in that, The system obtains real-time monitoring data of the current battery from the BMS system, calculates the real-time health status based on the obtained monitoring data, and uses the moving average of SOH over three consecutive cycles as the judgment criterion to dynamically trigger the concept drift compensation mechanism.

7. The battery BMS intelligent detection and identification system based on reinforcement learning according to claim 1, characterized in that, When performing real-time compensation effect verification and closed-loop optimization, the adjusted feature weights and anomaly detection thresholds are used to perform anomaly detection on the monitoring data of the most recent several cycles and calculate the detection accuracy. If the detection accuracy is greater than or equal to the accuracy threshold, then keep the current adjustment parameters. Conversely, the feature distribution of the sliding window is recalculated, and the anomaly detection threshold is updated.