Predictive maintenance decision system and method accounting for imperfect maintenance stochastic characteristics
By combining a decoupled remaining life prediction module with a reinforcement learning decision-making module, the invention addresses inaccurate equipment status assessment and suboptimal decision-making under imperfect maintenance, enabling accurate assessment of equipment operating status and adaptive generation of optimal maintenance actions, thereby reducing maintenance costs.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- XI AN JIAOTONG UNIV
- Filing Date
- 2026-04-17
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to accurately assess equipment operating status and adaptively generate optimal maintenance actions under imperfect maintenance conditions. Status assessments are inaccurate and decisions are suboptimal, failing to effectively address the stochastic nature of imperfect maintenance.
The design combines a decoupled remaining lifetime prediction module with a reinforcement learning decision-making module. The prediction module decouples the effect of imperfect maintenance from the baseline degradation trend: it quantifies the degradation rate change ratio through multi-scale features and uses that ratio to correct the remaining lifetime probability distribution. Reinforcement learning then generates the optimal maintenance strategy as a hybrid decision action comprising a discrete maintenance type and a continuous execution time.
It significantly improves the accuracy of state assessment in imperfect maintenance scenarios, generates more robust and cost-effective dynamic maintenance strategies, avoids over-maintenance and unexpected failures, and reduces overall maintenance costs.
Figure CN122241027A_ABST
Description
Technical Field
[0001] This invention belongs to the field of mechanical equipment health management and predictive maintenance technology, and specifically relates to a predictive maintenance decision system and method that takes into account the stochastic characteristics of imperfect maintenance.
Background Technology
[0002] With the development of modern industry, mechanical equipment plays an increasingly crucial role in industrial systems. Effective maintenance is not only a prerequisite for stable equipment operation but also a key means of extending equipment lifespan, reducing total life-cycle costs, and maximizing return on investment. In practical engineering applications, imperfect maintenance has become the most widely used maintenance method because of its relatively low cost; its recovery effect is random and falls somewhere between "as good as new" and "as bad as old." Incorporating imperfect maintenance into the decision-making framework is therefore essential for improving the economic feasibility of maintenance strategies.
[0003] However, existing technologies have clear shortcomings in both condition assessment and strategy optimization.
(1) Condition assessment: the coupling between the effect of imperfect maintenance and the baseline degradation trend hinders effective model updating, resulting in inaccurate condition assessment. Most existing studies use statistical models to characterize the degradation trend of equipment (J. Wang, H. Liu, T. Lin. Optimal rearrangement and preventive maintenance policies for heterogeneous balanced systems with three failure modes [J]. Reliability Engineering & System Safety. 2023, 238:109429.). Because the effect of imperfect maintenance is random, it is intertwined with the baseline degradation trend in these models and is difficult to separate, so the system cannot accurately update the corresponding model parameters from monitoring data. As the equipment continues to operate and imperfect maintenance events accumulate, the deviation between the model prediction and the actual degradation trend gradually increases, causing serious assessment errors.
(2) Strategy optimization: existing methods oversimplify the impact of imperfect maintenance, which can easily lead to suboptimal decisions. Existing technologies typically assume that the degree of imperfect maintenance is controllable and deterministic (X. Zhao, B. Liu, J. Xu, X. Wang. Imperfect maintenance policies for warranted products under stochastic performance degradation [J]. European Journal of Operational Research. 2023, 308:150-65.); that is, the system state is assumed to recover to a predetermined level after maintenance and then continue to degrade along a deterministic trend. In actual engineering, however, factors such as operator skill and differences in parts assembly make the recovery level and subsequent degradation trend difficult to control precisely, and both exhibit strong randomness. Decision-making models optimized under idealized assumptions therefore cannot anticipate or adapt to the stochastic characteristics of imperfect maintenance in practice, leading to decision errors and increased maintenance costs.
[0004] Existing research thus struggles to achieve accurate state assessment and optimal strategy formulation under the random effects of imperfect maintenance. There is an urgent need for a maintenance decision-making system and method that achieves accurate assessment of equipment operating status and adaptive generation of optimal maintenance actions under the stochastic characteristics of imperfect maintenance.
Summary of the Invention
[0005] In order to overcome the shortcomings of the prior art, the present invention aims to provide a predictive maintenance decision system and method that takes into account the random characteristics of imperfect maintenance, so as to achieve accurate evaluation of equipment operating status and adaptive generation of optimal maintenance actions under the random characteristics of imperfect maintenance.
[0006] To achieve the above objectives, the technical solution adopted by the present invention is as follows: A predictive maintenance decision system that takes into account the stochastic characteristics of imperfect maintenance includes a decoupled remaining lifetime prediction module and a reinforcement learning decision module. The decoupled remaining lifetime prediction module includes a baseline prediction network and an imperfect maintenance effect quantification network. The baseline prediction network takes current monitoring data as input and is used to evaluate the remaining lifetime probability distribution under the baseline state. The imperfect maintenance effect quantification network combines historical data before maintenance and monitoring data after maintenance to quantify the change ratio of degradation rate after imperfect maintenance, and uses this ratio to correct the remaining lifetime probability distribution output by the baseline prediction network. Reinforcement learning decision-making module: Based on the remaining lifetime probability distribution output by the decoupled remaining lifetime prediction module and related maintenance information, it generates a hybrid decision action that includes discrete maintenance types and continuous maintenance execution times, so as to control the optimal maintenance timing.
[0007] The baseline prediction network introduces a Monte Carlo Dropout mechanism to obtain the probability distribution; the imperfect maintenance effect quantification network captures and extracts multidimensional degradation features from short-term fluctuations to long-term trends before and after maintenance through a multi-scale mean absolute difference extractor.
[0008] A predictive maintenance decision-making method considering the stochastic characteristics of imperfect maintenance, using the above system and based on a Markov decision process, includes the following steps:
Step 1: Decoupled remaining life prediction. The decoupled remaining life prediction module uses the baseline prediction network to obtain the remaining life probability distribution under the baseline state; uses the imperfect maintenance effect quantification network to calculate the degradation rate change ratio before and after maintenance; and corrects the remaining life probability distribution under the baseline state based on the evaluated degradation rate change ratio.
Step 2: Reinforcement learning-based decision optimization. The reinforcement learning decision module uses the remaining life prediction results output by the decoupled remaining life prediction module to calculate the failure probability in the next decision cycle, and combines it with the number of consecutive imperfect maintenance actions to form the current state. Based on the current state, it outputs an action comprising a discrete maintenance type and a continuous maintenance execution time. Decision optimization is performed based on the reward obtained from the environment after the action is executed, where the reward function is defined as the negative of the expected cost generated by each maintenance action.
[0009] In the imperfect maintenance effect quantification network, the mean absolute difference at each time scale is calculated from the monitoring data over its data length, using an element-wise absolute value operation. The correction formula takes as inputs the mean and standard deviation of the remaining lifetime probability distribution under the baseline state and the degradation rate change ratio, and outputs the corrected mean and standard deviation of the remaining lifetime.
[0010] In the decision optimization phase of the reinforcement learning decision module, the current state is constructed from two components: the probability that the currently predicted remaining life is less than the length of the decision cycle (in time units), and the number of consecutive imperfect maintenance actions.
[0011] The hybrid decision action of the reinforcement learning decision module consists of two components: the maintenance type, which is either preventive replacement or imperfect maintenance, and the time at which the selected maintenance type is executed.
[0012] The reward function of the reinforcement learning decision module is defined as the negative of the expected cost generated by the different maintenance actions. Its expression involves the fixed cost of preventive replacement, the spare parts preparation cost rate, the cost of imperfect maintenance, and the cost of corrective replacement, subject to a constraint among these costs; it also involves the true remaining life of the equipment at each time and an integer index over time units.
[0013] In the decision optimization phase of the reinforcement learning decision module, the Soft Actor-Critic algorithm based on the maximum entropy objective is used for training. The global objective is to maximize the expected return together with the policy entropy. Its terms are: the maintenance policy; the state-action distribution induced by the policy; the discount factor; the reward for a state-action pair; the temperature parameter, which balances exploration and exploitation; and the policy entropy, computed from the probability distribution over actions in the current state.
The training objective of the policy network is to minimize the policy loss, which is evaluated over the experience replay pool. Its terms are: the discrete action probability distribution in the current state; the temperature parameters corresponding to continuous and discrete actions, respectively; the conditional probability density of the continuous action given the discrete action and the current state; and the action value function value evaluated by the critic network.
The critic network is updated by minimizing the mean squared Bellman error. Its loss is computed over experience tuples sampled from the experience replay pool, including the current state, the executed action, the next state reached at the following decision cycle, and the episode termination flag; it compares the action value function values evaluated by the two critic networks of the dual-critic structure against the target value at the corresponding time.
The temperature parameters use an adaptive update mechanism: the temperature parameters for continuous and discrete actions are each updated with their own loss function, defined relative to the target entropy for continuous actions and the target entropy for discrete actions, respectively. The target entropies are calculated from the dimension of the continuous action and the number of discrete action types.
[0014] Compared with the prior art, the present invention has the following beneficial effects: (1) This invention decouples the effect of imperfect maintenance from the baseline degradation trend by designing a decoupled remaining lifetime prediction module. It uses multi-scale features to quantify the degradation rate change ratio and correct the remaining lifetime probability distribution under the baseline state, effectively overcoming the problem of evaluation error accumulating with maintenance events in traditional methods and greatly improving the state evaluation accuracy in imperfect maintenance scenarios.
[0015] (2) This invention constructs a reinforcement learning decision module that integrates the stochastic characteristics of imperfect maintenance, avoiding the risk of suboptimal decision-making caused by oversimplifying those characteristics in traditional decision theory. This enables the reinforcement learning decision module to learn more robust and cost-effective dynamic maintenance strategies in a highly dynamic and stochastic degradation environment.
[0016] (3) The present invention designs a hybrid action space that includes discrete maintenance types and continuous execution time, so that the system can decide not only what kind of maintenance to perform but also when to perform it, balancing the prevention of over-maintenance against the avoidance of unexpected failures and reducing the overall maintenance cost.
Description of the Drawings
[0017] Figure 1 is a flowchart of a method according to an embodiment of the present invention.
[0018] Figure 2 is a schematic diagram of the baseline prediction network in an embodiment of the present invention.
[0019] Figure 3 is a schematic diagram of the structure of the imperfect maintenance effect quantification network in an embodiment of the present invention.
[0020] Figure 4 shows the lifetime prediction results of the decoupled remaining lifetime prediction module for different degradation rate change ratios under the influence of imperfect maintenance, according to an embodiment of the present invention.
[0021] Figure 5 is a schematic diagram of the policy network structure in an embodiment of the present invention.
[0022] Figure 6 is a schematic diagram of the critic network structure in an embodiment of the present invention.
[0023] Figure 7 shows the maintenance decision results for an aircraft turbofan engine in a single episode according to an embodiment of the present invention.
Detailed Implementation
[0024] The present invention will be further illustrated below with reference to the embodiments and accompanying drawings. It should be understood that the following specific embodiments are for illustrative purposes only and are not intended to limit the scope of the present invention.
[0025] A predictive maintenance decision system that takes into account the stochastic characteristics of imperfect maintenance includes a decoupled remaining lifetime prediction module and a reinforcement learning decision module. The decoupled remaining lifetime prediction module includes a baseline prediction network and an imperfect maintenance effect quantification network. The baseline prediction network takes current monitoring data as input and is used to evaluate the remaining lifetime probability distribution under the baseline state. The imperfect maintenance effect quantification network combines historical data before maintenance and monitoring data after maintenance to quantify the change ratio of degradation rate after imperfect maintenance, and uses this ratio to correct the remaining lifetime probability distribution output by the baseline prediction network. Reinforcement learning decision-making module: Based on the remaining lifetime probability distribution output by the decoupled remaining lifetime prediction module and related maintenance information, it generates a hybrid decision action that includes discrete maintenance types and continuous maintenance execution times, so as to control the optimal maintenance timing.
[0026] In this embodiment, in the decoupled remaining lifetime prediction module, the baseline prediction network takes the current monitoring data as input and introduces the Monte Carlo Dropout mechanism after the network's convolutional and fully connected layers. This mechanism remains active during the inference phase and approximates the probability distribution of the prediction results by performing multiple random forward propagations, thereby outputting the remaining lifetime probability distribution under the baseline state.
[0027] In real-world industrial scenarios, imperfect maintenance is highly stochastic. Traditional models often couple the effects of imperfect maintenance with baseline degradation trends, leading to difficulties in updating model parameters and an increase in evaluation error as imperfect maintenance events accumulate. This embodiment introduces an imperfect maintenance effect quantification network to specifically process monitoring data containing imperfect maintenance characteristics. Since higher degradation rates are often accompanied by more significant data fluctuations, the imperfect maintenance effect quantification network uses a multi-scale mean absolute difference extractor to set multiple time scales arranged in ascending order. It can comprehensively capture various degradation characteristics of monitoring data, from short-term sharp fluctuations to long-term gradual trend shifts, and output the change ratio of the degradation rate after maintenance relative to the baseline. Using this ratio, the mean and variance of the baseline prediction are corrected, which greatly reduces the deviation between the prediction results and the actual degradation process.
[0028] The overall optimization goal of the reinforcement learning decision-making module is to output the optimal maintenance strategy based on the state evaluation results, thereby minimizing the long-term operation and maintenance costs of the equipment. This module employs a deep reinforcement learning architecture that includes a policy network and a critic network. Its state space integrates the remaining lifetime probability distribution within a fixed decision period, as well as the number of consecutive imperfect maintenance operations. This probabilistic state representation helps the reinforcement learning decision-making module more comprehensively capture the uncertainty of equipment degradation. The module achieves precise optimization of the maintenance strategy by outputting a hybrid decision action that includes both discrete and continuous actions. If the continuous execution time output by the reinforcement learning decision-making module is less than or equal to the decision period, the corresponding maintenance action is executed in the next decision period; conversely, if the continuous execution time is greater than the decision period, no maintenance action is executed in the next decision period.
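The execution rule at the end of the paragraph above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the names `HybridAction` and `schedule`, the string labels for the maintenance types, and the convention that the execution time is measured from the start of the next decision period are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class HybridAction:
    """Hypothetical container for a hybrid decision action."""
    maintenance_type: str   # assumed labels: "preventive_replacement" or "imperfect_maintenance"
    execution_time: float   # continuous execution time, in time units


def schedule(action: HybridAction, decision_period: float) -> Optional[Tuple[str, float]]:
    """Return (type, time) if the action falls inside the next decision period, else None."""
    if action.execution_time <= decision_period:
        return (action.maintenance_type, action.execution_time)
    # Execution time exceeds the decision period: no maintenance in the next period.
    return None
```

With a decision period of 5 time units, an action timed at 3.0 would be scheduled, while one timed at 7.5 would be deferred to a later decision.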
[0029] In the reinforcement learning decision-making module, the policy network extracts common features using shared layers and outputs the probability distribution of maintenance types and Gaussian distribution parameters corresponding to execution times through specific branching structures. To improve the performance of the reinforcement learning decision-making module, the critic network adopts a double-Q network structure to alleviate the problem of value overestimation during function approximation. Simultaneously, a Soft Actor-Critic algorithm based on the maximum entropy objective is employed, using the policy entropy term to drive the policy to fully explore under the stochastic characteristics of imperfect maintenance, avoiding premature convergence to a suboptimal policy. Furthermore, a target critic network is introduced, employing a soft update mechanism to reduce the instability caused by frequent fluctuations in target values during training. By directly incorporating the stochastic effects of imperfect maintenance into the modeling rather than pre-setting a deterministic recovery level, this module can guide equipment to delay maintenance as much as possible before impending failure, while reasonably interspersing low-cost imperfect maintenance to effectively extend the overall operating cycle of the equipment.
[0030] A predictive maintenance decision-making method considering the stochastic characteristics of imperfect maintenance aims to address the inaccurate state assessment caused by the coupling of baseline degradation trends and imperfect maintenance effects, as well as the suboptimal decisions resulting from the oversimplification of the stochastic characteristics of imperfect maintenance in traditional decision theory. The method uses the predictive maintenance decision-making system described above and is based on a Markov decision process. Referring to Figure 1, it includes the following steps:
Step 1: Decoupled remaining life prediction. The decoupled remaining life prediction module uses the baseline prediction network to obtain the remaining life probability distribution under the baseline state; uses the imperfect maintenance effect quantification network to calculate the degradation rate change ratio before and after maintenance; and corrects the remaining life probability distribution under the baseline state based on the evaluated degradation rate change ratio.
Step 2: Reinforcement learning-based decision optimization. The reinforcement learning decision module uses the remaining life prediction results output by the decoupled remaining life prediction module to calculate the failure probability in the next decision cycle, and combines it with the number of consecutive imperfect maintenance actions to form the current state. Based on the current state, it outputs an action comprising a discrete maintenance type and a continuous maintenance execution time. Decision optimization is performed based on the reward obtained from the environment after the action is executed, where the reward function is defined as the negative of the expected cost generated by each maintenance action.
[0031] This embodiment takes predictive maintenance of an aircraft turbofan engine as an example, using the FD001 subset of the NASA Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset. The dataset records the engine degradation process with the flight cycle as the time unit. In the data preprocessing stage, 14 of the 21 sensor measurements are selected as valid and normalized. The monitoring data sequence length is set to 30 time units, and each such sequence serves as the input to the baseline prediction network. For the imperfect maintenance scenario, it is assumed that the engine degradation level after maintenance follows a truncated normal distribution, the degradation rate increment follows an exponential distribution with parameter 5, and the degradation rate change ratio under imperfect maintenance ranges from 1 to 2.
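The preprocessing described above (normalization followed by fixed-length sequences of 30 time units) might look like the following sketch. The text does not state which normalization is used, so min-max scaling is an assumption here, and the helper names `min_max_normalize` and `sliding_windows` are hypothetical.

```python
from typing import List, Sequence


def min_max_normalize(series: Sequence[float]) -> List[float]:
    """Scale one sensor channel to [0, 1] (min-max scaling is an assumed choice)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant channel carries no degradation information; map it to zeros.
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def sliding_windows(rows: Sequence, length: int = 30) -> List[Sequence]:
    """Cut a per-cycle record into overlapping windows of `length` time units."""
    return [rows[i:i + length] for i in range(len(rows) - length + 1)]
```

Each window would then hold 30 consecutive flight cycles of the 14 selected, normalized channels and could be fed to the baseline prediction network.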
[0032] Referring to Figure 2, the baseline prediction network employs a convolutional neural network architecture consisting of 5 convolutional layers followed by 2 fully connected layers, with the tanh activation function. The convolutional layers extract degradation features from the current monitoring data, and the fully connected layers map the extracted features to the prediction. A Monte Carlo Dropout mechanism is introduced after some of the convolutional layers and after the final flattening layer. It remains active during the inference phase, so that multiple random forward passes approximate the probability distribution of the prediction. Statistics over these passes give the mean and standard deviation of the remaining lifetime probability distribution under the baseline state.
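To illustrate how Monte Carlo Dropout turns a point predictor into a distribution estimate, the sketch below keeps dropout active at inference and summarizes repeated stochastic forward passes by their mean and standard deviation. It deliberately replaces the convolutional network with a toy linear model; `toy_rul_model`, its weights, the dropout rate, and the number of passes are all assumptions chosen only for illustration.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def dropout(values: Sequence[float], rate: float, rng: random.Random) -> List[float]:
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def toy_rul_model(features, weights, bias, rate, rng) -> float:
    """Stand-in for the baseline network: a linear map with dropout on its inputs."""
    dropped = dropout(features, rate, rng)
    return bias + sum(w * x for w, x in zip(weights, dropped))


def mc_dropout_predict(features, weights, bias, rate=0.2, passes=200, seed=0) -> Tuple[float, float]:
    """Run `passes` stochastic forward passes and return (mean, standard deviation)."""
    rng = random.Random(seed)
    samples = [toy_rul_model(features, weights, bias, rate, rng) for _ in range(passes)]
    return statistics.fmean(samples), statistics.stdev(samples)
```

The returned pair plays the role of the baseline mean and standard deviation; with the dropout rate set to zero, every pass is identical and the standard deviation collapses to zero.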
[0033] Referring to Figure 3, the imperfect maintenance effect quantification network consists of a multi-scale mean absolute difference extractor and a degradation rate change ratio predictor. The extractor is configured with a sequence of time scales of different spans. For each time scale, the mean absolute difference is calculated from the monitoring data over its data length, using an element-wise absolute value operation. Calculating the mean absolute difference at different time scales captures the degradation characteristics of the monitoring data comprehensively, from short-term sharp fluctuations to long-term gradual trend shifts.
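One plausible reading of the multi-scale mean absolute difference is sketched below. The exact formula is not reproduced in this text, so the definition used here (the average of |x[i + k] - x[i]| over all valid i for a time scale k) is an assumption, as are the function names and the example scales.

```python
from typing import List, Sequence


def mean_abs_diff(x: Sequence[float], scale: int) -> float:
    """Assumed form: average of |x[i + scale] - x[i]| over all valid i."""
    if scale <= 0 or scale >= len(x):
        raise ValueError("scale must satisfy 0 < scale < len(x)")
    diffs = [abs(x[i + scale] - x[i]) for i in range(len(x) - scale)]
    return sum(diffs) / len(diffs)


def multi_scale_features(x: Sequence[float], scales: Sequence[int] = (1, 2, 4, 8)) -> List[float]:
    """One feature per time scale, with scales given in ascending order."""
    return [mean_abs_diff(x, s) for s in scales]


def maintenance_feature_vector(before, after, scales=(1, 2, 4, 8)) -> List[float]:
    """Concatenate pre-maintenance and post-maintenance features into one vector."""
    return multi_scale_features(before, scales) + multi_scale_features(after, scales)
```

Small scales respond to short-term fluctuations, while larger scales respond to slow trend shifts; for a perfectly linear signal the feature at scale k equals k times the slope.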
[0034] The degradation rate change ratio predictor concatenates the multi-scale mean absolute difference features extracted from historical data before maintenance and monitoring data after maintenance to form a unified feature vector. This feature vector is then passed through a sequence of fully connected layers. The network architecture contains two fully connected layers, each followed in turn by a batch normalization layer, a GELU activation function, and a Dropout layer. The final output is the predicted degradation rate change ratio.
[0035] Based on the degradation rate change ratio, the remaining lifetime probability distribution under the baseline state is corrected to obtain the lifetime distribution under the imperfect maintenance scenario. The correction formula maps the mean and standard deviation of the remaining lifetime probability distribution under the baseline state to the corrected mean and standard deviation of the remaining lifetime.
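The correction step can be illustrated with a short sketch. The correction formula itself is not reproduced in this text, so the form below is purely an assumption: if degradation proceeds `ratio` times faster than the baseline, the remaining lifetime is taken to shrink by the same factor, which scales both the mean and the standard deviation by 1/ratio. The function name is hypothetical.

```python
from typing import Tuple


def correct_rul_distribution(mean: float, std: float, ratio: float) -> Tuple[float, float]:
    """Assumed correction: divide baseline mean and std by the degradation rate change ratio."""
    if ratio <= 0.0:
        raise ValueError("degradation rate change ratio must be positive")
    return mean / ratio, std / ratio
```

Under this assumption, a baseline estimate of 100 ± 10 cycles with a change ratio of 2 becomes 50 ± 5 cycles, and a ratio of 1 leaves the baseline distribution unchanged.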
[0036] Referring to Figure 4, the decoupled remaining lifetime prediction module exhibits good prediction performance under different degradation rate change ratios. Within the degradation rate change ratio range of 1 to 2, the lifetime prediction results fluctuate stably around the true remaining lifetime. This indicates that the decoupled prediction method can effectively quantify the impact of imperfect maintenance and provide an accurate condition assessment basis for subsequent maintenance decisions.
[0037] In the reinforcement learning-based decision optimization phase, the reinforcement learning decision module optimizes the policy by interacting with the environment. Based on the remaining life prediction results, the probability of equipment failure in the next decision cycle is calculated and combined with the number of consecutive imperfect maintenance actions to obtain the current state. The state therefore has two components: the probability that the currently predicted remaining life is less than the length of the decision cycle (in time units), and the number of consecutive imperfect maintenance actions. The hybrid decision action of the reinforcement learning decision module likewise has two components: the discrete maintenance type, which is either preventive replacement or imperfect maintenance, and the continuous time at which the selected maintenance type is executed.
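The state construction above can be sketched as follows. The text does not specify the shape of the remaining lifetime distribution, so treating it as Gaussian with the corrected mean and standard deviation is an assumption, and `prob_rul_below` and `build_state` are hypothetical names.

```python
import math
from typing import Tuple


def prob_rul_below(mean: float, std: float, horizon: float) -> float:
    """P(RUL < horizon) under an assumed Gaussian remaining lifetime distribution."""
    if std <= 0.0:
        # Degenerate distribution: all probability mass sits at the mean.
        return 1.0 if mean < horizon else 0.0
    z = (horizon - mean) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def build_state(mean: float, std: float, decision_period: float,
                consecutive_imperfect: int) -> Tuple[float, int]:
    """State = (failure probability within the decision period, consecutive imperfect count)."""
    return (prob_rul_below(mean, std, decision_period), consecutive_imperfect)
```

A healthy engine far from failure yields a probability near zero, while an engine whose predicted mean remaining life equals the decision period yields 0.5.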
[0038] Specifically, to transform the goal of minimizing costs into maximizing rewards in reinforcement learning, the reward function is defined as the negative of the expected cost generated by each maintenance action. Its expression involves the fixed cost of preventive replacement, the spare parts preparation cost rate, the cost of imperfect maintenance, and the cost of corrective replacement, subject to a constraint among these costs; it also involves the true remaining life of the equipment at each time and an integer index over time units.
[0039] Referring to Figure 5, the policy network adopts a hybrid architecture of shared layers and branch structures. The input state is first processed by two shared hidden layers to extract common state features; the network then splits into five branches, each containing two hidden layers:
(1) Preventive replacement time mean branch: outputs the mean of the preventive replacement execution time;
(2) Preventive replacement time standard deviation branch: outputs the standard deviation of the preventive replacement execution time;
(3) Imperfect maintenance time mean branch: outputs the mean of the imperfect maintenance execution time;
(4) Imperfect maintenance time standard deviation branch: outputs the standard deviation of the imperfect maintenance execution time;
(5) Maintenance type probability branch: outputs the probability distribution over maintenance types.
Based on the output of the policy network, an action is sampled using noise drawn from a standard Gaussian distribution.
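The sampling step can be sketched as below. The sampling formula is not reproduced in this text, so this assumes the common reparameterized form: draw the discrete maintenance type from the type probabilities, then compute the execution time as the mean plus the standard deviation times standard Gaussian noise for the selected type. The names and type labels are hypothetical.

```python
import random
from typing import Sequence, Tuple

# Assumed ordering of the two maintenance types in the policy outputs.
MAINTENANCE_TYPES = ("preventive_replacement", "imperfect_maintenance")


def sample_hybrid_action(type_probs: Sequence[float], means: Sequence[float],
                         stds: Sequence[float], rng: random.Random) -> Tuple[str, float]:
    """Sample (maintenance type, execution time) from the policy outputs."""
    # Discrete part: categorical draw over maintenance types.
    k = rng.choices(range(len(type_probs)), weights=type_probs, k=1)[0]
    # Continuous part: reparameterized Gaussian sample for the chosen type.
    eps = rng.gauss(0.0, 1.0)
    return MAINTENANCE_TYPES[k], means[k] + stds[k] * eps
```

For instance, `sample_hybrid_action((0.3, 0.7), (12.0, 4.0), (2.0, 1.0), random.Random(0))` returns one type label with an execution time scattered around that type's mean.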
[0040] Referring to Figure 6, the critic network employs a double-Q network structure to mitigate the overestimation problem. The input of each Q network is a state-action pair. Each network contains two hidden layers and outputs the action value for the different maintenance types. A target critic network is also introduced, with its parameters updated using a soft update mechanism.
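The soft update of the target critic is commonly implemented as an exponential moving average of the online parameters. The sketch below assumes that form; the coefficient name `tau` and its default value are illustrative and are not taken from the text, and parameters are represented as flat lists of floats for simplicity.

```python
from typing import List, Sequence


def soft_update(target_params: Sequence[float], online_params: Sequence[float],
                tau: float = 0.005) -> List[float]:
    """Move each target parameter a fraction `tau` of the way toward its online counterpart."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]
```

A small `tau` makes the target values drift slowly, which is what reduces the instability caused by rapidly changing targets during training.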
[0041] The reinforcement learning decision module is trained using the Soft Actor-Critic algorithm based on the maximum entropy objective. The global objective is to maximize the expected return together with the policy entropy. Its terms are: the maintenance policy; the state-action distribution induced by the policy; the discount factor; the reward for a state-action pair; the temperature parameter, which balances exploration and exploitation; and the policy entropy, computed from the probability distribution over actions in the current state. The policy network uses a shared layer structure to extract common features, and its training objective is to minimize the policy loss, which is evaluated over the experience replay pool. The terms of the policy loss are: the discrete action probability distribution in the current state; the temperature parameters corresponding to continuous and discrete actions, respectively; the conditional probability density of the continuous action given the discrete action and the current state; and the action value evaluated by the critic network, which is calculated from the action value function values evaluated by the two networks of the dual-critic structure.
[0042] The critic network is updated by minimizing the mean squared Bellman error. Its loss is computed over experience tuples sampled from the experience replay pool, including the current state, the executed action, the next state reached at the following decision cycle, and the episode termination flag; it compares the action value function values evaluated by the two critic networks of the dual-critic structure against the target value at the corresponding time.
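For orientation, the sketch below shows the textbook Soft Actor-Critic target and squared Bellman error for a pair of critics. It is a simplification and an assumption rather than the expression used in the invention: it uses a single temperature `alpha` instead of separate temperatures for continuous and discrete actions, and it takes the minimum of the two target critic values, which is the usual way a double-Q structure counters overestimation.

```python
def soft_td_target(reward: float, done: bool, next_q1: float, next_q2: float,
                   next_log_prob: float, gamma: float = 0.99, alpha: float = 0.2) -> float:
    """Textbook SAC target: r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi)."""
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * (1.0 - float(done)) * soft_value


def critic_loss(q1: float, q2: float, target: float) -> float:
    """Sum of squared Bellman errors of the two critics for one sampled transition."""
    return (q1 - target) ** 2 + (q2 - target) ** 2
```

When the termination flag is set, the bootstrap term vanishes and the target reduces to the immediate reward, which here is the negative maintenance cost.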
[0043] To keep the exploration level of the reinforcement learning decision module within a reasonable range, the temperature parameters are updated adaptively. Separate temperature parameters are defined for continuous and discrete actions, and each is updated using its own loss function. Each loss function depends on a target entropy, one for continuous actions and one for discrete actions; these target entropies are calculated from the dimension of the continuous action and the number of discrete action types, respectively.
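The adaptive mechanism can be sketched in Python as one gradient step on the log-temperature. The sketch assumes the usual Soft Actor-Critic temperature loss, `-log_alpha * mean(log_prob + target_entropy)`; the function name, the learning rate, and the example target entropy of -1.0 (a common heuristic of minus the action dimension for a one-dimensional execution time) are assumptions, not values taken from the invention. The same step would be applied separately to the continuous and discrete temperatures.

```python
import math
from typing import Sequence


def temperature_step(log_alpha: float, log_probs: Sequence[float],
                     target_entropy: float, lr: float = 0.01) -> float:
    """One gradient-descent step on log(alpha).

    loss(log_alpha) = -log_alpha * mean(log_prob + target_entropy), so the
    gradient with respect to log_alpha is -mean(log_prob + target_entropy).
    """
    if not log_probs:
        raise ValueError("log_probs must not be empty")
    grad = -sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    return log_alpha - lr * grad


# Sampled entropy (-2.0) is below the assumed target (-1.0): temperature rises.
raised = temperature_step(0.0, [2.0, 2.0], target_entropy=-1.0, lr=0.1)
# Sampled entropy (-0.6) is above the assumed target (-1.0): temperature falls.
lowered = temperature_step(0.0, [0.5, 0.7], target_entropy=-1.0, lr=0.1)
print(math.exp(raised), math.exp(lowered))
```

Optimizing the logarithm of the temperature keeps the temperature itself positive without an explicit constraint.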
[0044] During the overall training phase, the reinforcement learning decision module interacts continuously with the environment, stores states, actions, and rewards in the experience replay pool, and randomly samples batches of data to update the policy network and the critic network together. It ultimately outputs the optimal maintenance decision adapted to the highly dynamic degradation environment.
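The experience replay pool described here can be sketched in Python as a bounded buffer with uniform random sampling. The class and field names, the tuple layouts for state and action, and the fixed capacity are illustrative assumptions.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Tuple


class Transition(NamedTuple):
    state: Tuple[float, int]       # (failure probability, consecutive imperfect maintenances)
    action: Tuple[int, float]      # (maintenance type index, execution time)
    reward: float
    next_state: Tuple[float, int]
    done: bool


class ReplayPool:
    """Fixed-capacity pool; the oldest transitions are discarded first."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._buffer: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, transition: Transition) -> None:
        self._buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement; raises ValueError if the
        # pool holds fewer than batch_size transitions.
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)


pool = ReplayPool(capacity=3)
for step in range(5):
    pool.store(Transition((0.1 * step, 0), (0, 1.0), -1.0, (0.1 * (step + 1), 0), False))
batch = pool.sample(2)
print(len(pool), len(batch))
```

A training loop would call `store` after every environment step and `sample` before each joint update of the policy and critic networks.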
[0045] Validation in a turbofan engine maintenance scenario shows that the method proposed in this invention makes sound maintenance decisions. Referring to Figure 7, the single-episode maintenance decision results (alternating background colors distinguish different engines) show that the reinforcement learning decision module schedules the different types of maintenance actions rationally: in the early stage of equipment degradation, no maintenance is performed, which avoids over-maintenance; as the remaining lifetime decreases and the failure probability approaches the threshold, low-cost imperfect maintenance is applied in time to slow the degradation process; and shortly before the equipment fails, preventive replacement is carried out, which avoids corrective replacement while deferring replacement as long as possible. The proposed decision method thus arranges maintenance adaptively according to equipment status and reduces maintenance costs.
Claims
1. A predictive maintenance decision system considering the stochastic characteristics of imperfect maintenance, characterized in that it comprises a decoupled remaining lifetime prediction module and a reinforcement learning decision module; the decoupled remaining lifetime prediction module comprises a baseline prediction network and an imperfect maintenance effect quantification network; the baseline prediction network takes current monitoring data as input and evaluates the remaining lifetime probability distribution under the baseline state; the imperfect maintenance effect quantification network combines historical data before maintenance with monitoring data after maintenance to quantify the change ratio of the degradation rate after imperfect maintenance, and uses this ratio to correct the remaining lifetime probability distribution output by the baseline prediction network; the reinforcement learning decision module, based on the remaining lifetime probability distribution output by the decoupled remaining lifetime prediction module and the related maintenance information, generates a hybrid decision action comprising a discrete maintenance type and a continuous maintenance execution time, so as to control the optimal maintenance timing.
2. The decision-making system according to claim 1, characterized in that: the baseline prediction network uses a Monte Carlo Dropout mechanism to obtain the remaining lifetime probability distribution.
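The Monte Carlo Dropout idea can be illustrated with a toy Python sketch: dropout stays active at prediction time, and the spread of repeated stochastic forward passes is summarized as a mean and standard deviation. The single linear layer is a stand-in for the baseline prediction network, and the function names, weights, drop probability, and number of passes are all illustrative assumptions.

```python
import random
import statistics
from typing import Sequence, Tuple


def dropout_forward(features: Sequence[float], weights: Sequence[float], bias: float,
                    drop_prob: float, rng: random.Random) -> float:
    """One stochastic pass through a linear layer with inverted dropout."""
    keep = 1.0 - drop_prob
    total = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:
            total += (x * w) / keep  # rescale so the expected output is unchanged
    return total


def mc_dropout_predict(features: Sequence[float], weights: Sequence[float], bias: float,
                       drop_prob: float = 0.2, passes: int = 200,
                       seed: int = 0) -> Tuple[float, float]:
    """Return the mean and standard deviation over repeated dropout passes."""
    rng = random.Random(seed)
    samples = [dropout_forward(features, weights, bias, drop_prob, rng)
               for _ in range(passes)]
    return statistics.fmean(samples), statistics.stdev(samples)


mean_rul, std_rul = mc_dropout_predict([1.0, 2.0, 3.0], [10.0, 5.0, 2.0], bias=4.0)
print(mean_rul, std_rul)
```

The resulting mean and standard deviation play the role of the baseline remaining lifetime distribution parameters that later steps correct and consume.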
3. The decision-making system according to claim 1, characterized in that: the imperfect maintenance effect quantification network uses a multi-scale mean absolute difference extractor to extract multidimensional degradation features, ranging from short-term fluctuations to long-term trends, before and after maintenance.
4. A predictive maintenance decision-making method considering the stochastic characteristics of imperfect maintenance, using the decision system according to any one of claims 1-3 and based on a Markov decision process, comprising the following steps: Step 1: in the decoupled remaining lifetime prediction module, the baseline prediction network obtains the remaining lifetime probability distribution under the baseline state; the imperfect maintenance effect quantification network calculates the change ratio of the degradation rate before and after maintenance; and the remaining lifetime probability distribution under the baseline state is corrected based on the evaluated change ratio. Step 2: the reinforcement learning decision module uses the remaining lifetime prediction results output by the decoupled remaining lifetime prediction module to calculate the failure probability in the next decision cycle, and combines it with the number of consecutive imperfect maintenance actions to form the current state; based on the current state, it outputs an action comprising a discrete maintenance type and a continuous maintenance execution time; decision optimization is performed based on the reward obtained from the environment after the action is executed, the reward function being defined as the negative of the expected cost incurred by each maintenance action.
5. The decision-making method according to claim 4, characterized in that: in the imperfect maintenance effect quantification network, the mean absolute difference at a given time scale is calculated by a formula whose terms are the data length, the monitoring data, and the element-wise absolute value operation; the correction formula takes the mean and standard deviation of the remaining lifetime probability distribution under the baseline state, together with the change ratio of the degradation rate, and yields the corrected mean and standard deviation of the remaining lifetime.
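One plausible reading of a multi-scale mean absolute difference is sketched below in Python: for each time scale, average the absolute difference between samples that lie that many steps apart. The lagged-difference definition, the function names, and the example scales are assumptions; the claimed formula may use a different windowing or normalization. The sketch covers only the feature extraction, not the correction of the mean and standard deviation.

```python
from typing import Dict, Sequence


def mean_absolute_difference(signal: Sequence[float], scale: int) -> float:
    """Average |x[i + scale] - x[i]| over all valid positions i."""
    n = len(signal)
    if not 1 <= scale < n:
        raise ValueError("scale must satisfy 1 <= scale < len(signal)")
    total = sum(abs(signal[i + scale] - signal[i]) for i in range(n - scale))
    return total / (n - scale)


def multi_scale_mad(signal: Sequence[float],
                    scales: Sequence[int] = (1, 4, 16)) -> Dict[int, float]:
    """Small scales respond to short-term fluctuation, large scales to trend."""
    return {s: mean_absolute_difference(signal, s) for s in scales}


# A steadily rising toy signal: the feature grows in proportion to the scale.
ramp = [float(i) for i in range(20)]
print(multi_scale_mad(ramp))
```

Computing these features on the data before and after a maintenance action gives two feature sets whose comparison could feed the change-ratio estimate.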
6. The decision-making method according to claim 4, characterized in that: in the decision optimization phase of the reinforcement learning decision module, the current state consists of two components: the probability that the currently predicted remaining lifetime is less than the length of one decision cycle, expressed in units of time, and the number of consecutive imperfect maintenance actions; the hybrid decision action of the reinforcement learning decision module consists of the maintenance type, which includes preventive replacement and imperfect maintenance, and the time at which the selected maintenance is executed.
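The state and action described in this claim can be sketched in Python as follows. The sketch assumes the remaining lifetime distribution is Gaussian, so that its mean and standard deviation fully determine the failure probability within one decision cycle; that distributional choice, the function and type names, and the string labels for maintenance types are illustrative assumptions.

```python
import math
from typing import NamedTuple, Tuple


class HybridAction(NamedTuple):
    maintenance_type: str    # assumed labels: "preventive_replacement" or "imperfect_maintenance"
    execution_time: float    # continuous time at which the maintenance is executed


def failure_probability(rul_mean: float, rul_std: float, cycle_length: float) -> float:
    """P(remaining lifetime < cycle_length) under an assumed Gaussian distribution."""
    if rul_std <= 0.0:
        raise ValueError("rul_std must be positive")
    z = (cycle_length - rul_mean) / (rul_std * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def build_state(rul_mean: float, rul_std: float, cycle_length: float,
                consecutive_imperfect: int) -> Tuple[float, int]:
    """State = (failure probability in the next cycle, consecutive imperfect maintenances)."""
    return failure_probability(rul_mean, rul_std, cycle_length), consecutive_imperfect


state = build_state(rul_mean=25.0, rul_std=5.0, cycle_length=20.0, consecutive_imperfect=2)
action = HybridAction("imperfect_maintenance", execution_time=12.5)
print(state, action)
```

Any other distribution family would only change `failure_probability`; the two-component state layout stays the same.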
7. The decision-making method according to claim 6, characterized in that: the reward function of the reinforcement learning decision module is defined as the negative of the expected cost incurred by the different maintenance actions; the terms of its mathematical expression are the fixed cost of preventive replacement, the spare parts preparation cost rate, the cost of imperfect maintenance, the cost of corrective replacement, which satisfies a specified constraint, the true remaining lifetime of the equipment at a given time, and an integer index over units of time.
8. The decision-making method according to claim 4, characterized in that: in the decision optimization phase of the reinforcement learning decision module, the Soft Actor-Critic algorithm based on the maximum entropy objective is used for training, and the global objective function maximizes the expected return together with the policy entropy; the terms of this objective are the maintenance policy, the state-action distribution induced by that policy, the discount factor, the reward for a state-action pair, the temperature parameter, which balances exploration and exploitation, and the policy entropy, taken over the probability distribution of actions in the current state; the training objective of the policy network is to minimize the policy loss, whose terms are the experience replay pool, the discrete action probability distribution in the current state, the temperature parameters corresponding to continuous and discrete actions, respectively, the conditional probability density of the continuous action given the discrete action and the current state, and the action value evaluated by the critic network; the critic network is updated by minimizing the mean squared Bellman error, and the terms of its loss function are the next state, reached in the following decision cycle after the action is executed in the current state, the episode termination flag, an experience tuple sampled from the experience replay pool, the action value function values evaluated by the first and second critic networks of the dual critic network, and the target value at the current time step; the temperature parameters corresponding to continuous and discrete actions are updated adaptively, each using its own loss function; each loss function depends on a target entropy, one for continuous actions and one for discrete actions, calculated from the dimension of the continuous action and the number of discrete action types, respectively.