A multi-working-condition equipment residual service life prediction method and system
By combining shared and private experts with a window-level state-driven routing mechanism, the method resolves the conflict between cross-condition adaptability and single-condition temporal fidelity, enabling accurate prediction of the remaining service life of equipment under multiple operating conditions and improving both prediction accuracy and engineering interpretability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HUAZHONG UNIV OF SCI & TECH
- Filing Date
- 2026-05-22
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to strike a balance between cross-condition adaptability and single-condition time-series fidelity, which affects prediction accuracy.
A collaborative design of shared and private experts is adopted. Shared experts capture general degradation trends across operating conditions, while private experts fit specific operating condition degradation patterns. A window-level state-driven routing mechanism dynamically selects expert network combinations. Combined with a state modulation feature embedding module and a multi-head self-attention layer, explicit encoding of operating condition vectors and degradation states is achieved.
It significantly improves the generalization ability under multiple operating conditions, ensures that the prediction curve is smooth and consistent within a single operating condition, and provides a reliable basis for maintenance decisions.
Smart Images

Figure CN122241292A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the technical field of predictive maintenance of equipment and, more specifically, relates to a method and system for predicting the remaining service life of equipment under multiple operating conditions.
Background Technology
[0002] Maintenance management is a fundamental aspect of ensuring the safe, stable, and efficient operation of modern large-scale machinery and equipment. With the continuous advancement of sensing technology and condition monitoring methods, real-time acquisition of multi-dimensional equipment health data has become possible. Based on this, Condition-based Maintenance (CBM) is gradually replacing traditional fixed-cycle preventative maintenance, becoming the mainstream industrial strategy. CBM dynamically collects and analyzes key operating parameters to assess equipment health status in real time and flexibly formulates maintenance plans based on the actual degree of degradation, significantly improving maintenance efficiency and reducing operating costs. It has been widely applied in high-value equipment fields such as aero-engines, wind turbine gearboxes, and key components of nuclear power plants. As a core supporting technology of CBM, Remaining Useful Life (RUL) prediction scientifically estimates the remaining life and potential failure risks of equipment by integrating current and historical operating data, providing a crucial basis for accurate maintenance decisions.
[0003] However, engineering systems are complex in structure and diverse in operating mechanisms, and their degradation behavior presents a fundamental contradiction under multiple operating conditions: on the one hand, different operating conditions (such as changes in load, speed, and ambient temperature) lead to significant differences in degradation trajectories, i.e., highly heterogeneous degradation modes; on the other hand, the degradation process within a single operating condition must maintain smooth and coherent evolutionary characteristics, i.e., meet the requirement of temporal continuity. This contradiction poses a severe challenge to the design of predictive models—globally unified models tend to ignore the specificity of operating conditions, while overly segmented strategies may sever the temporal dependence of the degradation process. In actual engineering, major failures often stem from complex causal relationships and unexpected interactions between multiple subsystems, further exacerbating the modeling difficulty.
[0004] Current RUL prediction methods fall into three main categories. Physics of Failure (PoF) based methods are accurate when the failure mechanism is clear, but they rely heavily on complete physical prior knowledge, struggle to cover heterogeneity across multiple operating conditions, and fail when the failure mechanism is unknown or the parameters are difficult to calibrate. Data-driven methods use run-to-failure sensor data to learn the mapping between degradation patterns and RUL and have strong engineering applicability; however, with a static parameter-sharing mechanism (as in a standard LSTM, CNN, or Transformer), features common across operating conditions become coupled with condition-specific features, which weakens the modeling of heterogeneous degradation, while fine-grained processing destroys the temporal continuity within a single operating condition. Hybrid methods attempt to combine physical constraints with data-driven adaptability, but they still struggle to balance generalization ability and local temporal fidelity under dynamic operating conditions.
[0005] Specifically, Recurrent Neural Networks (RNNs) and their variants (LSTM, GRU) model temporal dynamics through recursive structures, but are susceptible to gradient vanishing and have limited long-term dependency modeling capabilities. Convolutional Neural Networks (CNNs) extract local patterns along the time dimension, but expanding the receptive field requires stacking deep structures, limiting efficiency and expressiveness. While Transformers efficiently capture long-range dependencies through self-attention mechanisms, they still adhere to a static sharing paradigm and focus primarily on the time dimension, neglecting the differentiated contributions of different sensor signals to degradation under specific conditions. To enhance adaptability, Mixture-of-Experts (MoE) models introduce a dynamic computation paradigm, but their native token-level routing strategy causes severe granularity mismatch in temporal tasks: consecutive time steps within the same observation window may be assigned to different experts, leading to non-physical breaks in the degradation trajectory. Furthermore, traditional MoE relies on black-box multilayer perceptron (MLP) gating for routing decisions, failing to explicitly integrate physical parameters of the operating conditions and degradation state indicators into the decision logic, resulting in a lack of physical basis and engineering interpretability in expert selection.
[0006] In summary, existing technologies struggle to achieve a balance between cross-condition adaptability and single-condition time-series fidelity. There is an urgent need for a new RUL prediction method that can accurately decouple the heterogeneity of operating conditions, ensure the temporal continuity of the degradation process, and possess physical interpretability.
Summary of the Invention
[0007] To address the aforementioned deficiencies or improvement needs of existing technologies, this invention provides a method and system for predicting the remaining service life of equipment under multiple operating conditions. This method aims to solve the problem that existing technologies struggle to achieve a balance between cross-operating condition adaptability and single-operating condition time-series fidelity, thereby affecting prediction accuracy.
[0008] To achieve the above objectives, according to one aspect of the present invention, a method for predicting the remaining service life of equipment under multiple operating conditions is provided, comprising:
Offline training phase: Historical monitoring data of equipment sensors under various operating conditions are acquired, and the time-series data are divided into time-series samples by a sliding window. A mode generation module is set up to generate a condition vector that reflects the operating mode based on the operating condition parameters corresponding to each sample in the time-series sample set. The prediction model is constructed from a state-driven encoding module and a prediction head module. The state-driven encoding module includes a state-driven hybrid expert layer, which comprises a private expert layer and a shared expert layer. The private expert layer generates gating weights based on the window-level state and produces a weighted output based on these gating weights. The state-driven hybrid expert layer combines the outputs of the private and shared expert layers to obtain high-dimensional time-series features. The window-level state is obtained from the condition vector and the degradation information of the time-series data. The prediction head module maps the high-dimensional time-series features to the predicted remaining useful life. The mode generation module and the prediction model are trained on the time-series sample set to obtain the trained mode generation module and prediction model.
Online prediction phase: Real-time monitoring data are acquired from the equipment sensors, the real-time operating condition vector is obtained from the trained mode generation module, and the trained prediction model is used to predict the remaining service life.
[0009] According to the multi-condition equipment remaining service life prediction method provided by the present invention, a mode generation module is set up to generate a condition vector reflecting the working mode based on the condition parameters corresponding to each sample in the time series sample set, specifically including: K-Means clustering is performed on the various operating condition parameters corresponding to the time series sample set to divide the operating state into multiple operating condition modes; For each sample, the working condition mode label of the last time step is taken as a discrete input index, which is then mapped into a continuous working condition vector by the mode generation module.
[0010] According to the multi-condition equipment remaining service life prediction method provided by the present invention, the prediction model further includes a state modulation feature embedding module. The state modulation feature embedding module includes a state modulation layer, which generates a scale factor and an offset factor from the condition vector through linear mapping and applies a state-based affine transformation to the input features using a feature-wise linear modulation mechanism. Sample data are input to the state modulation feature embedding module and, after state modulation, output to the state-driven encoding module.
[0011] According to the multi-condition equipment remaining service life prediction method provided by the present invention, the state modulation feature embedding module further includes an input embedding layer and a position encoding layer before the state modulation layer; the input embedding layer uses a linear projection matrix to map the input sample data into high-dimensional features; the position encoding layer uses classical sine and cosine position encoding to construct a position vector for each time step within the window, and adds the position vector to the high-dimensional features output by the input embedding layer element by element to obtain a feature sequence representation containing position information; And / or, the state-driven coding module further includes a multi-head self-attention layer preceding the state-driven hybrid expert layer, wherein the multi-head self-attention layer employs the MHA mechanism to capture long-distance temporal dependencies and mine correlation patterns between sensors; each sub-layer of the state-driven coding module is preceded by a root mean square normalization layer, and the output of the sub-layer is added to the input through a residual connection.
[0012] According to the multi-condition equipment remaining service life prediction method provided by the present invention, the window-level state $\mathbf{z}$ is obtained by combining the operating condition vector with degradation information from the time-series data, as follows:
$$\mathbf{z} = [\mathbf{v};\ \boldsymbol{\delta}], \qquad \boldsymbol{\delta} = \mathbf{x}_{L} - \bar{\mathbf{x}}$$
In the formula, $\mathbf{v}$ is the operating condition vector, representing the macroscopic operating conditions; $\boldsymbol{\delta}$ is the degradation-stage information, reflecting the microscopic, instantaneous degradation trend; $\mathbf{x}_{L}$ is the end feature of the input feature window; and $\bar{\mathbf{x}}$ is the mean of the features across all time steps within the window.
[0013] According to the multi-condition equipment remaining service life prediction method provided by the present invention, the private expert layer generates gating weights at the window level and produces its weighted output based on these gating weights, as follows: the window-level state is projected into a query vector, and each private expert in the private expert layer is associated with a learnable key vector; the routing score of each private expert is obtained by scaled dot-product attention between the query vector and that expert's key vector; the routing scores of all private experts are normalized to obtain the routing weight of each private expert; and the outputs of the K private experts with the highest routing weights are weighted by those routing weights and summed to form the output of the private expert layer.
[0014] According to the multi-condition equipment remaining service life prediction method provided by the present invention, the routing score corresponding to each private expert is obtained by the following formulas:
$$\mathbf{q} = \mathbf{W}_{2}\,\sigma(\mathbf{W}_{1}\mathbf{z})$$
$$g_{e} = \frac{\mathbf{q}^{\top}\mathbf{k}_{e}}{\sqrt{d}}$$
In the formulas, $\mathbf{z}$ is the window-level state representation; $\mathbf{W}_{1}$ and $\mathbf{W}_{2}$ are learnable projection matrices; $\sigma$ is an activation function; $\mathbf{q}$ is the query vector generated from the window-level state representation; $d$ is the private expert dimension, used as a scaling factor to control the numerical range of the dot product; $\mathbf{k}_{e}$ is the learnable key vector of the $e$-th private expert; and $g_{e}$ is the raw routing score of the $e$-th private expert. The output $\mathbf{Y}$ of the state-driven hybrid expert layer is as follows:
$$\mathbf{Y} = \sum_{c=1}^{N_{s}} E^{s}_{c}(\mathbf{X}) + \sum_{e \in \mathcal{T}_{K}} \tilde{g}_{e}\, E^{p}_{e}(\mathbf{X})$$
In the formula, $E^{s}_{c}(\mathbf{X})$ is the output of the $c$-th shared expert; $N_{s}$ is the number of shared experts; $\mathbf{X}$ is the input features; $\mathcal{T}_{K}$ is the set of the $K$ private experts with the highest routing weights; $E^{p}_{e}(\mathbf{X})$ is the output of the selected $e$-th private expert; and $\tilde{g}_{e}$ is the normalized routing weight.
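By way of non-limiting illustration, the window-level routing described above can be sketched in NumPy as follows. The sketch assumes that the window-level state is the concatenation of the condition vector with the deviation of the end feature from the window mean, that the activation is tanh, that normalization is a softmax over all private experts, and that each expert is a toy single-layer map; the names `window_state`, `route`, `hybrid_expert_layer`, and `make_expert` are hypothetical. One routing decision is made per window, so every time step in the window is processed by the same expert combination.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def window_state(cond_vec, feats):
    # Assumed combination: concatenate the condition vector with the
    # deviation of the last time step from the window mean.
    return np.concatenate([cond_vec, feats[-1] - feats.mean(axis=0)])

def route(state, W1, W2, keys, top_k):
    q = W2 @ np.tanh(W1 @ state)                # query; tanh is an assumed activation
    scores = keys @ q / np.sqrt(keys.shape[1])  # scaled dot product, one score per expert
    weights = softmax(scores)                   # normalize over all private experts
    chosen = np.argsort(-weights)[:top_k]       # indices of the K largest weights
    return chosen, weights[chosen]

def hybrid_expert_layer(feats, cond_vec, shared, private, W1, W2, keys, top_k):
    # One routing decision for the whole window.
    chosen, w = route(window_state(cond_vec, feats), W1, W2, keys, top_k)
    out = sum(expert(feats) for expert in shared)
    out = out + sum(w_e * private[e](feats) for e, w_e in zip(chosen, w))
    return out, chosen, w

def make_expert(W):
    # Toy expert (assumption): one linear map with tanh, applied to every time step.
    return lambda x: np.tanh(x @ W)

rng = np.random.default_rng(0)
L, H, d_c, d_k, M = 30, 16, 4, 8, 4             # illustrative sizes only
feats = rng.normal(size=(L, H))
cond_vec = rng.normal(size=d_c)
shared = [make_expert(rng.normal(size=(H, H)) * 0.1)]
private = [make_expert(rng.normal(size=(H, H)) * 0.1) for _ in range(M)]
W1 = rng.normal(size=(32, d_c + H)) * 0.1
W2 = rng.normal(size=(d_k, 32)) * 0.1
keys = rng.normal(size=(M, d_k))
out, chosen, w = hybrid_expert_layer(feats, cond_vec, shared, private,
                                     W1, W2, keys, top_k=2)
```

In this sketch the selected weights are used as produced by the softmax, without renormalizing over the selected experts; whether renormalization is applied is a design choice not fixed here.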
[0015] According to the multi-condition equipment remaining service life prediction method provided by the present invention, the prediction head module consists of two sub-layers: a time-biased pooling layer and a regression prediction head. The time-biased pooling layer performs non-uniform aggregation of the input sequence features based on time-series weights to generate a fixed-dimensional window-level representation. The regression prediction head then maps the window-level representation to the final remaining service life estimate.
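As a non-limiting sketch of one way such a prediction head could be realized, the NumPy code below pools a window with softmax weights formed from a static query score plus a bias that decays linearly with distance from the end of the window, then applies a linear regression head. The exact weighting scheme, the `decay` parameter, and the function names `time_biased_pool` and `regression_head` are assumptions for illustration.

```python
import numpy as np

def time_biased_pool(feats, query, decay):
    """Pool (L, H) features into one (H,) vector with weights favoring late steps."""
    L, H = feats.shape
    scores = feats @ query / np.sqrt(H)        # content score from a static query
    bias = -decay * (L - 1 - np.arange(L))     # 0 at the last step, lower for older steps
    logits = scores + bias
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ feats, weights

def regression_head(pooled, w_out, b_out):
    # Linear map from the window-level representation to a scalar RUL estimate.
    return float(pooled @ w_out + b_out)

rng = np.random.default_rng(0)
feats = rng.normal(size=(30, 16))
# With a zero query the weights depend only on the time bias.
pooled, weights = time_biased_pool(feats, query=np.zeros(16), decay=0.1)
rul_hat = regression_head(pooled, rng.normal(size=16), 0.0)
```

With a zero query the weights increase monotonically toward the end of the window, which is the non-uniform, end-weighted aggregation the paragraph describes.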
[0016] According to the multi-condition equipment remaining service life prediction method provided by the present invention, the total loss function $\mathcal{L}$ during training is a weighted combination of two components, the RUL regression loss $\mathcal{L}_{\mathrm{RUL}}$ and the load-balancing auxiliary loss $\mathcal{L}_{\mathrm{bal}}$, as follows:
$$\mathcal{L} = \mathcal{L}_{\mathrm{RUL}} + \lambda\,\mathcal{L}_{\mathrm{bal}}$$
In the formula, $\lambda$ is the weight used to balance regression accuracy and expert load balancing. The RUL regression loss uses the mean squared error as the regression loss function, as follows:
$$\mathcal{L}_{\mathrm{RUL}} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_{i} - y_{i}\right)^{2}$$
In the formula, $N$ is the batch size, and $\hat{y}_{i}$ and $y_{i}$ are the predicted and actual values of the $i$-th sample. The load-balancing auxiliary loss is constructed as follows. The hard usage frequency $f_{e}$ of the $e$-th expert is defined as the normalized number of times this expert is selected in the current batch:
$$f_{e} = \frac{1}{NK}\sum_{i=1}^{N}\mathbb{1}\left(e \in \mathcal{T}_{i}\right)$$
In the formula, $K$ is the number of private experts selected for each sample; $\mathbb{1}(\cdot)$ is the indicator function; and $\mathcal{T}_{i}$ is the set of private experts selected for the $i$-th sample. The soft selection probability $P_{e}$ of the $e$-th expert is defined as the average routing weight of this expert in the current batch:
$$P_{e} = \frac{1}{N}\sum_{i=1}^{N} p_{i,e}$$
The load-balancing auxiliary loss is the inner product of the expert soft selection probability and the hard usage frequency:
$$\mathcal{L}_{\mathrm{bal}} = \sum_{e=1}^{M} f_{e}\,P_{e}$$
In the formula, $M$ is the total number of private experts, and $\mathbf{p}_{i}$ is the private expert score vector corresponding to the $i$-th sample, with $p_{i,e}$ its $e$-th entry.
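For illustration only, the loss described above can be computed on a small batch as in the following NumPy sketch. It assumes a single balancing weight `lam` applied to the auxiliary term and no extra scaling of the inner product; the function name `total_loss` and the toy numbers are hypothetical.

```python
import numpy as np

def total_loss(y_pred, y_true, probs, chosen, lam):
    """probs: (N, M) normalized routing weights; chosen: (N, K) selected expert indices."""
    N, M = probs.shape
    K = chosen.shape[1]
    l_rul = float(np.mean((y_pred - y_true) ** 2))      # mean squared error
    counts = np.bincount(chosen.ravel(), minlength=M)   # times each expert was selected
    f = counts / (N * K)                                # hard usage frequency
    P = probs.mean(axis=0)                              # soft selection probability
    l_bal = float(f @ P)                                # inner product of the two
    return l_rul + lam * l_bal, l_rul, l_bal

# Toy batch: N = 2 samples, M = 3 private experts, K = 1 expert per sample.
y_pred = np.array([10.0, 20.0])
y_true = np.array([12.0, 20.0])
probs = np.array([[0.5, 0.3, 0.2],
                  [0.2, 0.7, 0.1]])
chosen = np.array([[0], [1]])
loss, l_rul, l_bal = total_loss(y_pred, y_true, probs, chosen, lam=0.01)
# l_rul = 2.0, l_bal = 0.425, loss = 2.00425
```

The auxiliary term is smallest when both the selection counts and the average routing weights are spread evenly across the private experts, which discourages routing collapse onto a few experts.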
[0017] According to another aspect of the present invention, a multi-condition equipment remaining service life prediction system is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of any of the above-described multi-condition equipment remaining service life prediction methods.
[0018] Overall, compared with the prior art, the multi-condition equipment remaining service life prediction method and system provided by this invention offer the following advantages:
1. This invention uses a collaborative design of shared and private experts: shared experts capture general degradation trends across operating conditions, while private experts fit the degradation patterns of specific operating conditions. A window-level state-driven routing mechanism is employed: the entire sliding time window is the basic routing unit; the window-level state is generated from the operating condition vector and the window-level degradation deviation; and the weights of the private experts are computed dynamically from the window-level state. This forces all temporal features within the window to be processed by one unified expert combination, and the expert network combination is selected dynamically based on the operating condition vector and the degradation deviation. Common trends across operating conditions are thereby separated from condition-specific degradation features, improving generalization across multiple operating conditions.
2. This invention proposes a window-level state-driven routing mechanism that uses the engineering degradation observation cycle as the granularity for expert scheduling, avoiding the degradation trajectory breaks caused by token-level routing and ensuring smooth and continuous prediction curves within a single operating condition.
3. In this invention, the state modulation feature embedding module includes a state modulation layer, which enables the model to dynamically modulate the scale and bias of the feature space according to the current operating conditions, thereby achieving state alignment and helping the model identify heterogeneous degradation modes under multiple operating conditions.
4. This invention explicitly encodes the operating condition parameters and the degradation-state deviation as the basis for routing, so the state gating decision process has clear engineering semantics, providing a reliable basis for maintenance decisions.
Attached Figure Description
[0019] Figure 1 is a framework diagram of the equipment remaining useful life prediction model based on a state-driven gated Transformer in an embodiment of the present invention.
[0020] Figure 2 is a schematic diagram of the error changes during model training in an embodiment of the present invention.
[0021] Figure 3 is a comparison chart of the actual values and predicted values in an embodiment of the present invention.
Detailed Implementation
[0022] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention. Furthermore, the technical features involved in the various embodiments of this invention described below can be combined with each other as long as they do not conflict with each other.
[0023] Referring to Figure 1, this embodiment provides a method for predicting the remaining service life of equipment under multiple operating conditions. The method includes:
Offline training phase: Historical monitoring data of equipment sensors under various operating conditions are acquired, and the time-series data are divided into time-series samples by a sliding window. A mode generation module is set up to generate a condition vector that reflects the operating mode based on the operating condition parameters corresponding to each sample in the time-series sample set. The prediction model is constructed from a state-driven encoding module and a prediction head module. The state-driven encoding module includes a state-driven hybrid expert layer, which comprises a private expert layer and a shared expert layer. The private expert layer generates gating weights based on the window-level state and produces a weighted output based on these gating weights. The state-driven hybrid expert layer combines the outputs of the private and shared expert layers to obtain high-dimensional time-series features. The window-level state is obtained from the condition vector and the degradation information of the time-series data. The prediction head module maps the high-dimensional time-series features to the predicted remaining useful life. The mode generation module and the prediction model are trained on the time-series sample set to obtain the trained mode generation module and prediction model.
Online prediction phase: Real-time monitoring data are acquired from the equipment sensors, the real-time operating condition vector is obtained from the trained mode generation module, and the trained prediction model is used to predict the remaining service life.
[0024] This embodiment also includes labeling each sample with its corresponding remaining useful life when constructing the time-series sample set. Time-series features are extracted by the state-driven encoding module, in which the state-driven hybrid expert layer divides the expert network into shared experts (capturing general degradation trends across operating conditions) and private experts (fitting degradation patterns for specific operating conditions) and adopts a window-level state-driven routing mechanism: the entire sliding time window is the basic routing unit, the window-level state is generated from the operating condition vector and the window-level degradation deviation, and the private expert weights are computed dynamically from the window-level state by a physically constrained state gating module, so that all time-series features within the window are processed by one unified expert combination. This embodiment thus selects the expert network combination dynamically based on the operating condition vector and the degradation deviation. Through the window-level state-driven routing mechanism (i.e., obtaining the window-level state before the private expert layer produces its output) and the physically constrained state gating design (i.e., generating the private expert weights from the window-level state), combined with the output of the shared expert layer, it resolves the fundamental contradiction between the heterogeneity of degradation modes and temporal continuity under multiple operating conditions. While preserving the smoothness of the degradation trajectory within a single operating condition, it captures the differences in degradation characteristics across operating conditions, improving the accuracy, temporal consistency, and engineering interpretability of RUL prediction.
[0025] This embodiment addresses the technical challenge of highly heterogeneous equipment degradation modes under multiple operating conditions. It provides a remaining service life prediction model for multi-condition equipment systems based on a state-driven gated Transformer, together with its construction method. Specifically, it proposes a window-level state-driven routing mechanism that dynamically matches the optimal expert combination by using the operating condition vector and the degradation deviation as routing signals. A state modulation feature embedding module is designed to explicitly inject the operating condition prior into the feature space. A state-driven hybrid expert layer is constructed so that all features within the same time window share the same expert combination coefficients, maintaining the continuity of temporal evolution. An attention pooling mechanism based on a static query and a linear decay bias is developed to strengthen the weight of key information at the end of the monitoring window. The method is validated on the NASA C-MAPSS multi-condition dataset, demonstrating good accuracy and robustness in predicting remaining service life under complex operating conditions and providing reliable technical support for predictive maintenance of industrial equipment. As shown in Figure 1, the method includes the following steps:
1. Training set construction:
1.1 Data Acquisition: Degradation monitoring data under multiple operating conditions are obtained from the condition-based maintenance data of the equipment during operation. The dataset consists of complete run-to-failure trajectories of multiple units of similar equipment, each trajectory containing D-dimensional sensor monitoring variables and covering the entire process from normal state to failure.
[0026] 1.2 Data preprocessing of the raw equipment sensor data:
1.2.1 Normalization: Since the physical quantities monitored by different sensors differ greatly in units and numerical ranges, using the raw data directly would let features with large values dominate the model and unbalance the optimization. Therefore, min-max normalization is used to map the data of each sensor to the interval [0, 1]. For the multi-condition dataset, the statistics are computed independently within each condition cluster, eliminating the differences in units while preserving the relative distribution differences between conditions. The normalization formula is as follows:
$$x'_{i,j} = \frac{x_{i,j} - x^{\min}_{c,j}}{x^{\max}_{c,j} - x^{\min}_{c,j}}$$
In the formula, $x_{i,j}$ is the raw value of the $j$-th sensor in the $i$-th sample, and $x'_{i,j}$ is the normalized value; $x^{\max}_{c,j}$ and $x^{\min}_{c,j}$ are the maximum and minimum values of the $j$-th sensor under operating condition $c$.
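As a non-limiting illustration, per-condition min-max normalization can be sketched in NumPy as follows. The function name `minmax_by_condition` and the toy data are hypothetical, and the guard that maps a sensor with constant readings inside a cluster to 0 is an added assumption to avoid division by zero.

```python
import numpy as np

def minmax_by_condition(x, cond, eps=1e-12):
    """Min-max normalize each sensor column separately within each condition cluster.

    x    : (N, D) raw sensor readings
    cond : (N,) integer condition-cluster label of each row
    """
    x = np.asarray(x, dtype=float)
    cond = np.asarray(cond)
    out = np.empty_like(x)
    for c in np.unique(cond):
        rows = cond == c
        lo = x[rows].min(axis=0)
        hi = x[rows].max(axis=0)
        # Assumption: a sensor that is constant within a cluster maps to 0.
        span = np.where(hi - lo > eps, hi - lo, 1.0)
        out[rows] = (x[rows] - lo) / span
    return out

raw = np.array([[10.0, 1.0], [20.0, 3.0], [100.0, 5.0], [300.0, 5.0]])
labels = np.array([0, 0, 1, 1])
scaled = minmax_by_condition(raw, labels)
# scaled == [[0, 0], [1, 1], [0, 0], [1, 0]]
```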
[0027] 1.2.2 Sliding Window Sampling: Sliding windows are commonly used for data segmentation preprocessing to enable the model to extract as much valuable information as possible from multivariate time series. Specifically, a fixed window length of L and a sliding step size of s are used to extract continuous segments of sensor readings along the time axis. This process not only preserves local temporal dependencies but also effectively expands the number of training samples.
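For illustration, sliding window sampling of a single run-to-failure trajectory can be sketched in NumPy as below. The function name `sliding_windows` is hypothetical, the trajectory is assumed to be at least L steps long, and labeling each window with the number of cycles remaining after its last time step is an assumed labeling convention.

```python
import numpy as np

def sliding_windows(traj, L, s):
    """Cut one run-to-failure trajectory (T, D) into windows of length L with step s.

    Returns windows (n, L, D) and RUL labels (n,). Each label is the number of
    cycles left after the last time step of the window (assumed convention).
    """
    traj = np.asarray(traj, dtype=float)
    T = traj.shape[0]
    starts = range(0, T - L + 1, s)
    windows = np.stack([traj[i:i + L] for i in starts])
    rul = np.array([T - (i + L) for i in starts], dtype=float)
    return windows, rul

traj = np.arange(20, dtype=float).reshape(10, 2)   # T = 10 cycles, D = 2 sensors
windows, rul = sliding_windows(traj, L=4, s=2)
# windows.shape == (4, 4, 2); rul == [6, 4, 2, 0]
```

A smaller step s yields more, heavily overlapping samples from the same trajectory; a larger step yields fewer, less correlated samples.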
[0028] 1.2.3 Operating Condition Construction: A mode generation module is set up to generate operating condition vectors reflecting the operating modes based on the operating condition parameters corresponding to each sample in the time-series sample set. Specifically, K-Means clustering is performed on the operating condition parameters corresponding to the time-series sample set to divide the operating states into multiple operating condition modes; for each sample, the operating condition mode label of the last time step is taken as a discrete input index, which the mode generation module maps to a continuous operating condition vector.
[0029] Specifically, the operating condition parameters provided by the original dataset are usually discrete physical settings, which deep networks cannot readily use as continuous features. To obtain the explicit operating condition prior defined in the method, K-Means clustering is performed on the operational setting parameters in the original data, and the operating states are divided into multiple typical modes, with the number of modes chosen by the elbow method. For example, when constructing the time-series sample set for a piece of equipment with three operating parameters, each sample corresponds to one combination of these three parameters. All samples are clustered on these parameter combinations to divide them into multiple operating modes, each with a label. For each time window, the label of the operating mode at its end is taken as a discrete input index. Within the model, a learnable embedding layer, i.e., the mode generation module, then maps this index to a continuous operating condition vector, providing a macroscopic physical context for feature extraction.
[0030] In online prediction, the operating mode of the real-time monitoring data is determined from the operating parameters at the end of the window using the K-Means clustering, and the corresponding operating condition vector is then obtained.
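The operating condition construction can be illustrated, without limitation, by the NumPy sketch below. It uses a plain Lloyd's K-Means with farthest-first initialization (an assumed choice made for determinism), fixes the number of modes at three rather than selecting it by the elbow method, and stands in a random table for the learnable embedding layer; the names `assign`, `kmeans`, and `embedding` and the toy settings are hypothetical.

```python
import numpy as np

def assign(points, centers):
    """Index of the nearest center for each row of points."""
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def kmeans(points, k, iters=50):
    """Lloyd's algorithm with farthest-first initialization (an assumed choice)."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers

# Toy operating settings: three parameters per time step, three distinct regimes.
settings = np.array([[0.0, 0.0, 100.0], [0.1, 0.0, 100.0],
                     [10.0, 0.25, 100.0], [10.1, 0.25, 100.0],
                     [25.0, 0.62, 60.0], [25.1, 0.62, 60.0]])
centers = kmeans(settings, k=3)
labels = assign(settings, centers)

rng = np.random.default_rng(0)
embedding = rng.normal(size=(3, 8))      # stands in for the learnable embedding table

window_settings = settings[2:4]          # settings at each time step of one window
mode = assign(window_settings[-1:], centers)[0]   # mode label of the last time step
cond_vec = embedding[mode]               # continuous operating condition vector
```

At prediction time only `assign` and the embedding lookup are needed: the cluster centers found offline are reused, and the settings at the end of the incoming window select the mode.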
[0031] 2. Model Construction:
The prediction model also includes a state modulation feature embedding module. The state modulation feature embedding module includes a state modulation layer, which generates a scale factor and an offset factor from the operating condition vector through linear mapping and applies a state-based affine transformation to the input features using a feature-wise linear modulation mechanism. Sample data are input into the state modulation feature embedding module and, after state modulation, output to the state-driven encoding module.
[0032] The state modulation feature embedding module further includes an input embedding layer and a position encoding layer before the state modulation layer; the input embedding layer uses a linear projection matrix to map the input sample data into high-dimensional features; the position encoding layer uses classical sine and cosine position encoding to construct a position vector for each time step within the window, and adds the position vector to the high-dimensional features output by the input embedding layer element by element to obtain a feature sequence representation containing position information.
[0033] 2.1 Model Structure: The prediction model is a state-driven, encoder-only Transformer architecture (SDG-Former), which includes a state modulation feature embedding module, a state-driven encoding module, and a prediction head module.
[0034] 2.2 State Modulation Feature Embedding Module: This module maps the multivariate sensor sequences within the window to a unified latent space to form a high-dimensional feature representation and explicitly injects operating condition information before the features enter the encoding module. The module consists of three parts: input embedding, position encoding, and state modulation. The input embedding layer converts the original sensor data into latent features, the position encoding layer injects temporal position information, and the state modulation layer generates a scale factor and an offset factor from the operating condition vector and applies an affine transformation to the latent features, thereby explicitly injecting the operating condition information.
[0035] 2.2.1 The input embedding layer uses a linear projection matrix $\mathbf{W}_{\mathrm{emb}} \in \mathbb{R}^{D \times H}$ to map the input multivariate sensor sequence $\mathbf{X} \in \mathbb{R}^{L \times D}$ to high-dimensional features $\mathbf{X}\mathbf{W}_{\mathrm{emb}} \in \mathbb{R}^{L \times H}$.
2.2.2 The position encoding layer uses classic sine and cosine position encoding to construct a position vector for each time step $t$ within the window and adds it element-wise to the embedded features to obtain a sequence representation containing positional information. The position encoding functions are as follows:
$$PE(t, 2a) = \sin\left(\frac{t}{10000^{2a/H}}\right)$$
$$PE(t, 2a+1) = \cos\left(\frac{t}{10000^{2a/H}}\right)$$
In the formulas, $PE$ is the position information, $t$ is the time step index within the window, $a$ is the feature dimension index, and $H$ is the dimension of the model's latent space. This design allows the encoding at any fixed offset to be expressed as a linear function of the encoding at the current position, giving the model the ability to capture relative positional relationships.
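By way of illustration, the input embedding and the sine and cosine position encoding can be sketched in NumPy as follows. The latent dimension is assumed to be even, the random matrix `W_emb` only stands in for the learned projection, and the sizes and the function name `sinusoidal_pe` are hypothetical.

```python
import numpy as np

def sinusoidal_pe(L, H):
    """Return the (L, H) sine/cosine position table; H is assumed to be even."""
    t = np.arange(L)[:, None]                 # time step index within the window
    a = np.arange(H // 2)[None, :]            # index of each sine/cosine pair
    angle = t / (10000.0 ** (2 * a / H))
    pe = np.zeros((L, H))
    pe[:, 0::2] = np.sin(angle)               # even feature dimensions
    pe[:, 1::2] = np.cos(angle)               # odd feature dimensions
    return pe

rng = np.random.default_rng(0)
L, D, H = 30, 14, 16                          # illustrative sizes only
X = rng.normal(size=(L, D))                   # one window of D sensor channels
W_emb = rng.normal(size=(D, H)) * 0.1         # stands in for the learned projection
pe = sinusoidal_pe(L, H)
features = X @ W_emb + pe                     # embedded features with position info
```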
[0036] 2.2.3 The state modulation layer introduces a Feature-wise Linear Modulation (FiLM) mechanism to align the embedded features with the operating state. The mechanism uses the operating condition vector $\mathbf{v}$ to generate a scale factor and an offset factor and applies a state-based affine transformation to the latent features. The output of the state modulation layer is:
$$Z_{\mathrm{mod}} = f_{\gamma}(\mathbf{v}) \odot Z_{\mathrm{pos}} + f_{\beta}(\mathbf{v})$$
In the formula, $f_{\gamma}$ and $f_{\beta}$ are linear projection layers whose outputs are the scale factor and the shift factor, respectively; $\odot$ denotes the Hadamard product; and $Z_{\mathrm{pos}}$, the input of the state modulation layer, is the feature sequence representation containing positional information output by the position encoding layer. This design lets the model dynamically modulate the scale and bias of the feature space according to the current operating condition, achieving state alignment and helping the model identify heterogeneous degradation patterns under multiple operating conditions.
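A minimal sketch of FiLM-style state modulation follows. It assumes the common form in which the same scale and shift vectors are applied to every time step of the window; the helper names and the toy weights are hypothetical.

```python
def linear(weights, bias, vec):
    """Plain linear layer: weights @ vec + bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def film_modulate(features, condition, w_gamma, b_gamma, w_beta, b_beta):
    """Scale and shift each time step with factors derived from the condition vector."""
    gamma = linear(w_gamma, b_gamma, condition)  # scale factor
    beta = linear(w_beta, b_beta, condition)     # shift factor
    return [[g * x + b for g, x, b in zip(gamma, row, beta)]
            for row in features]


# Toy example: 2 time steps, 2 latent features, 2-dimensional condition vector.
modulated = film_modulate(
    features=[[1.0, 2.0], [3.0, 4.0]],
    condition=[1.0, 0.0],
    w_gamma=[[2.0, 0.0], [0.0, 1.0]], b_gamma=[0.0, 1.0],
    w_beta=[[0.0, 0.0], [1.0, 0.0]], b_beta=[0.5, 0.0],
)
```

Here the condition vector yields a scale of `[2.0, 1.0]` and a shift of `[0.5, 1.0]`, which are applied identically to both time steps.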
[0037] 2.3 State-Driven Encoding Module: This module mainly consists of a multi-head self-attention (MHA) layer followed by a state-driven hybrid expert (SDG-MoE) layer and adopts a pre-norm structure. The multi-head self-attention layer captures long-range temporal dependencies and mines correlation patterns between sensors. Each sub-layer of the module is preceded by a root mean square normalization (RMSNorm) layer, and the sub-layer output is added to its input via a residual connection.
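[0038] Formally, let $Z^{(b)}$ be the input features of the multi-head self-attention layer in layer $b$ ($b = 1, \dots, B$). The calculation process of this layer can be represented as follows:
$$\hat{Z}^{(b)} = Z^{(b)} + \mathrm{MHA}\left(\mathrm{RMSNorm}\left(Z^{(b)}\right)\right)$$
$$Z^{(b+1)} = \hat{Z}^{(b)} + \mathrm{SDGMoE}\left(\mathrm{RMSNorm}\left(\hat{Z}^{(b)}\right)\right)$$
In the formula, $Z^{(b+1)}$ is the input to the multi-head self-attention layer in layer $b+1$, and $\hat{Z}^{(b)}$ is the output of the multi-head self-attention sub-layer in layer $b$.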
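The pre-norm structure with residual connections can be sketched as below. This is an illustration under stated assumptions: RMSNorm is written with an optional gain vector and a small epsilon, and the attention and expert sub-layers are passed in as placeholder callables (`attention`, `moe`), which are hypothetical names.

```python
import math


def rms_norm(vec, gain=None, eps=1e-6):
    """Root mean square normalization of one token vector."""
    rms = math.sqrt(sum(v * v for v in vec) / len(vec) + eps)
    gain = gain or [1.0] * len(vec)
    return [g * v / rms for g, v in zip(gain, vec)]


def pre_norm_residual(seq, sublayer):
    """Normalize first, apply the sub-layer, then add the original input back."""
    normed = [rms_norm(token) for token in seq]
    out = sublayer(normed)
    return [[x + o for x, o in zip(x_row, o_row)]
            for x_row, o_row in zip(seq, out)]


def encoder_layer(seq, attention, moe):
    """One encoder layer: attention sub-layer followed by the expert sub-layer."""
    seq = pre_norm_residual(seq, attention)
    return pre_norm_residual(seq, moe)


# With sub-layers that output zeros, the residual path returns the input unchanged.
zero = lambda seq: [[0.0] * len(token) for token in seq]
unchanged = encoder_layer([[1.0, 2.0]], zero, zero)
```

Normalizing before each sub-layer leaves the residual path untouched, which is the usual motivation for pre-norm designs in deep encoders.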
[0039] 2.3.1 The multi-head attention layer employs the MHA mechanism to capture long-range temporal dependencies and mine correlation patterns between multiple sensors. Specifically, given the normalized input $\tilde{Z}$, $M$ sets of independent learnable linear transformations project the query, key, and value into low-dimensional subspaces of dimensions $d_k$, $d_k$, and $d_v$, respectively. For the $m$-th head, the attention output is calculated as follows:
$$\mathrm{head}_m = \mathrm{softmax}\left(\frac{Q_m K_m^{\top}}{\sqrt{d_k}}\right) V_m$$
In the formula, $\mathrm{head}_m$ is the output of the $m$-th attention head, and $Q_m$, $K_m$, and $V_m$ are the query, key, and value, respectively.
[0040] Finally, the outputs of all heads are concatenated and fused by a learnable linear layer $W^{O}$ to restore the original feature dimension and aggregate temporal information from multiple perspectives:
$$\mathrm{MHA}(\tilde{Z}) = \mathrm{Concat}\left(\mathrm{head}_1, \dots, \mathrm{head}_M\right) W^{O}$$
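The attention computation can be sketched in plain Python as follows. This assumes standard scaled dot-product attention without masking or bias terms; the function names and the toy projection matrices are illustrative only.

```python
import math


def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention_head(q, k, v):
    """Scaled dot-product attention for one head."""
    d_k = len(k[0])
    out = []
    for q_t in q:
        scores = [sum(a * b for a, b in zip(q_t, k_j)) / math.sqrt(d_k) for k_j in k]
        weights = softmax(scores)
        out.append([sum(w * v_j[c] for w, v_j in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def multi_head_attention(x, heads, w_o):
    """heads: list of (W_q, W_k, W_v) per head; w_o: output projection."""
    head_outs = [attention_head(matmul(x, wq), matmul(x, wk), matmul(x, wv))
                 for wq, wk, wv in heads]
    # Concatenate the head outputs per time step, then fuse them linearly.
    concat = [sum((h[t] for h in head_outs), []) for t in range(len(x))]
    return matmul(concat, w_o)


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
# Zero key projection makes all scores equal, so each output is the mean of the values.
averaged = multi_head_attention([[1.0, 0.0], [0.0, 1.0]],
                                heads=[(identity, zeros, identity)], w_o=identity)
```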
[0041] 2.3.2 State-Driven Hybrid Expert Layer (SDG-MoE): This layer adds a state-driven mechanism to the hybrid expert layer and generates the gating weights from a window-level state, so that all tokens within the same time window share the same set of expert combination coefficients. A gating module preceding the private expert layer receives the input feature map and the operating condition vector and derives the window-level state. The window-level state is fed to the state gating router (that is, the state gating module), which assigns a routing weight to each private expert; the weighted output of the private experts is the output of the private expert layer. The shared expert layer receives the input feature map, and its output is combined with the output of the private expert layer to form the final output of the state-driven hybrid expert layer.
[0042] (1) Construction of the window-level state representation: The window-level state $\mathbf{s}$ is obtained from the operating condition vector and the degradation information of the time series data. The operating condition vector and the degradation stage information are combined to construct the state information that guides expert selection:
$$\mathbf{s} = \left[\mathbf{v} \,;\, \boldsymbol{\delta}\right], \qquad \boldsymbol{\delta} = \mathbf{z}_T - \bar{\mathbf{z}}$$
In the formula, $\mathbf{v}$ is the operating condition vector, representing the macroscopic operating conditions; $\boldsymbol{\delta}$ is the degradation stage information, reflecting the microscopic, instantaneous degradation trend; $\mathbf{z}_T$ is the feature at the end of the input feature window; and $\bar{\mathbf{z}}$ is the mean of the features across all time steps within the window.
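A minimal sketch of building the window-level state is given below. It assumes that the degradation information is the deviation of the last time step from the window mean and that the two parts are joined by concatenation; both choices are readings of the description rather than confirmed details, and `window_state` is a hypothetical name.

```python
def window_state(condition, features):
    """Concatenate the condition vector with (last-step feature - window mean)."""
    steps = len(features)
    dim = len(features[0])
    mean = [sum(row[j] for row in features) / steps for j in range(dim)]
    last = features[-1]
    degradation = [l - m for l, m in zip(last, mean)]  # instantaneous trend
    return list(condition) + degradation


state = window_state(
    condition=[1.0, 0.0],
    features=[[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]],
)
```

Because the state is computed once per window, every token in that window is routed with the same expert weights.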
[0043] (2) Attention-based Gated Routing: The routing process is modeled as semantic matching between states and experts. The state gating module uses an attention mechanism in which the operating parameters and the deviation of the degradation state form the query and each expert contributes a key, producing routing weights with clear physical semantics and making the expert selection process interpretable in engineering terms. The private expert layer generates gating weights from the window-level state and weights its output accordingly, as follows: the window-level state is projected into a query vector, and each private expert in the private expert layer is associated with a learnable key vector; the routing score of each private expert is obtained by scaled dot-product attention between the query vector and that expert's key vector; the routing scores of all private experts are normalized to obtain the routing weight of each private expert; and the outputs of the top-K private experts with the highest routing weights are weighted and summed to form the output of the private expert layer.
[0044] The routing score for each private expert is obtained using the following formulas:
$$\mathbf{q} = W_q \, \sigma\left(W_s \mathbf{s}\right)$$
$$r_e = \frac{\mathbf{q}^{\top} \mathbf{k}_e}{\sqrt{d_g}}$$
In the formula, $\mathbf{s}$ is the window-level state representation; $W_s$ and $W_q$ are learnable projection matrices; $\sigma$ is an activation function; $\mathbf{q}$ is the query vector generated from the window-level state representation; $\mathbf{k}_e$ is the learnable key vector corresponding to the $e$-th private expert; $r_e$ is the raw routing score of the $e$-th private expert; and $d_g$ is the gate space dimension, used as a scaling factor to prevent the dot product from growing too large as the dimension increases, thus maintaining training stability. This mechanism enables the model to dynamically match the most suitable expert combination to the current state.
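The routing step can be sketched as follows. Several details are assumptions made for illustration: the query is produced by two projections with a SiLU activation in between, scores are normalized with a softmax, and the weights of the selected top-K experts are renormalized to sum to one. The names `route`, `w_s`, and `w_q` are hypothetical.

```python
import math


def silu(x):
    return x / (1.0 + math.exp(-x))


def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def route(state, w_s, w_q, expert_keys, top_k):
    """Return ({expert index: gate weight}, full routing probabilities)."""
    query = matvec(w_q, [silu(h) for h in matvec(w_s, state)])
    d_g = len(query)  # gate space dimension used for scaling
    raw = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_g)
           for key in expert_keys]
    probs = softmax(raw)
    chosen = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:top_k]
    kept = sum(probs[e] for e in chosen)
    return {e: probs[e] / kept for e in chosen}, probs


gates, probs = route(
    state=[1.0],
    w_s=[[1.0]],
    w_q=[[1.0], [0.0]],
    expert_keys=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    top_k=2,
)
```

In this toy case the query points along the first key, so expert 0 receives the largest weight and expert 2, whose key points the opposite way, is dropped.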
[0045] (3) Expert Hybrid Computation: The expert network is divided into two groups, shared experts and private experts, to achieve functional decoupling. The shared experts are always active and aim to capture general degradation trends, while the top-K private experts, selected dynamically by routing, focus on fitting the patterns of specific operating conditions. The output of each expert is a feature mapping: each expert is a basic feed-forward network (FFN) with independent parameters. When input features enter an expert, the expert performs a nonlinear mapping and outputs a feature representation with the same dimension as the input or with a preset dimension; this result is the output of the expert. The final output of the state-driven hybrid expert layer is a weighted aggregation of the two groups, specifically:
$$\mathbf{y} = \sum_{c=1}^{N_s} E_c^{\mathrm{sh}}(\mathbf{x}) + \sum_{e \in \mathcal{T}_K} g_e \, E_e^{\mathrm{pr}}(\mathbf{x})$$
In the formula, $E_c^{\mathrm{sh}}(\mathbf{x})$ is the output of the $c$-th shared expert; $N_s$ is the number of shared experts; $\mathbf{x}$ is the input feature; $\mathcal{T}_K$ is the set of the $K$ private experts with the highest routing weights; $E_e^{\mathrm{pr}}(\mathbf{x})$ is the output of the selected $e$-th private expert; and $g_e$ is the normalized routing weight.
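The aggregation of shared and private experts can be sketched as below. Experts are represented as plain callables standing in for FFNs, and the gate weights are assumed to be already normalized and shared by every token in the window; `moe_layer` and the toy experts are hypothetical.

```python
def moe_layer(tokens, shared_experts, private_experts, gate_weights):
    """Sum all shared experts plus the gate-weighted selected private experts.

    gate_weights maps a private expert index to its window-level weight.
    """
    outputs = []
    for x in tokens:
        y = [0.0] * len(x)
        for expert in shared_experts:           # always active
            y = [a + b for a, b in zip(y, expert(x))]
        for index, gate in gate_weights.items():  # only the selected experts
            y = [a + gate * b for a, b in zip(y, private_experts[index](x))]
        outputs.append(y)
    return outputs


# Toy experts standing in for feed-forward networks.
shared = [lambda x: [2.0 * v for v in x]]
private = [
    lambda x: [v + 1.0 for v in x],
    lambda x: [0.0 for _ in x],
    lambda x: [-v for v in x],
]
mixed = moe_layer([[1.0, 2.0]], shared, private, gate_weights={0: 0.75, 2: 0.25})
```

Only the experts named in `gate_weights` are evaluated, so unselected private experts add no computation for that window.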
[0046] 2.4 Prediction Head Module: As the output stage of the model, this module maps the high-dimensional temporal features output by the encoder to a scalar remaining useful life (RUL) value. It consists of two sub-layers: a temporal bias pooling layer and a regression prediction head. The temporal bias pooling layer aggregates the input sequence features non-uniformly according to temporal weights, using a weighted average that assigns a different weight to each time step within the window so that recent degradation states dominate, and produces a fixed-dimensional window-level representation. The regression prediction head then maps the window-level representation to the final RUL estimate.
[0047] 2.4.1 Temporal Bias Pooling: An attention pooling mechanism based on a static query and a linearly decaying bias is designed. The mechanism introduces a globally shared learnable query vector that serves as a reference for searching key features in the latent space. At the same time, a learnable penalty term that increases with temporal distance is superimposed, guiding the model to focus on the latest state at the end of the window and balancing noise robustness against timeliness. Specifically, a learnable static query vector $\mathbf{q}_s$ is defined as a parameterized reference vector independent of the input sample. It is trained to represent the target degradation patterns, and the relative importance weight of each time step is computed through its interaction with the temporal features. The input sequence $Z$ of the temporal bias pooling layer is linearly projected as follows:
$$\mathbf{K} = Z W_K, \qquad \mathbf{V} = Z W_V$$
In the formula, $\mathbf{K}$ is the key matrix, $\mathbf{V}$ is the value matrix, and $W_K$ and $W_V$ are the linear projection parameters. The relevance between the query vector $\mathbf{q}_s$ and the key vector $\mathbf{k}_t$ of each time step is then computed, and a linearly decaying temporal distance bias is injected. The unnormalized attention score $u_t$ at each time step is calculated as follows:
$$u_t = \frac{\mathbf{q}_s^{\top} \mathbf{k}_t}{\sqrt{H}} - \phi(a)\,(T - t)$$
In the formula, $a$ is the learnable slope parameter; $\phi(\cdot)$ ensures that the penalty coefficient is positive; $\mathbf{k}_t$ is the key vector in $\mathbf{K}$ corresponding to the $t$-th time step; $H$ is the feature dimension, used to scale the dot product; and $T$ is the time dimension of the input sequence. This design gives the model an explicit temporal inductive bias, allowing the attention mechanism to adaptively suppress interference from distant history and ensuring that the key degradation features at the end of the window dominate the aggregation. The final aggregated feature $\mathbf{h}$ is obtained by weighted summation:
$$\alpha_t = \frac{\exp(u_t)}{\sum_{j=1}^{T} \exp(u_j)}$$
$$\mathbf{h} = \sum_{t=1}^{T} \alpha_t \mathbf{v}_t$$
In the formula, $u_t$ is the attention score, $\alpha_t$ is the normalized attention weight at each time step, and $\mathbf{v}_t$ is the value vector in $\mathbf{V}$ corresponding to the $t$-th time step.
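A minimal sketch of temporal bias pooling is shown below. It assumes that the keys and values are already projected, that softplus is the function keeping the slope positive, and that the distance of step `t` from the window end is `T - 1 - t` with zero-based indexing; these choices and the function names are assumptions, not details confirmed by the description.

```python
import math


def softplus(x):
    return math.log1p(math.exp(x))


def temporal_bias_pool(keys, values, query, raw_slope):
    """Attention pooling with a static query and a linear distance penalty."""
    steps, dim = len(keys), len(query)
    slope = softplus(raw_slope)  # assumed positivity mapping
    scores = [
        sum(q * k for q, k in zip(query, keys[t])) / math.sqrt(dim)
        - slope * (steps - 1 - t)  # older steps are penalized more
        for t in range(steps)
    ]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * v[j] for w, v in zip(weights, values))
              for j in range(len(values[0]))]
    return pooled, weights


# With identical keys, only the distance penalty shapes the weights.
pooled, weights = temporal_bias_pool(
    keys=[[0.0, 0.0]] * 3,
    values=[[1.0], [2.0], [3.0]],
    query=[1.0, 1.0],
    raw_slope=0.0,
)
```

With `raw_slope=0.0` the slope is ln 2, so each step back in time halves the weight, giving weights of roughly 1/7, 2/7, and 4/7.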
[0048] 2.4.2 Regression Prediction Head: After the fixed-dimensional window-level representation $\mathbf{h}$ is obtained, a classic two-layer MLP maps it to the final RUL estimate. The network consists of linear transformation layers, a SiLU activation function, and Dropout regularization, and aims to extract nonlinear degradation patterns from the features while preventing overfitting. The specific calculation process is as follows:
$$\mathbf{h}_1 = \mathrm{Dropout}\left(\mathrm{SiLU}\left(W_1 \mathbf{h}\right)\right)$$
$$\hat{y} = W_2 \mathbf{h}_1$$
In the formula, $W_1$ and $W_2$ are learnable weight matrices, and $\hat{y}$ is the model's RUL prediction for the current input window.
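The regression head can be sketched as a two-layer MLP in inference mode. Dropout is omitted because it is inactive at inference, bias terms are left out for brevity, and the weights below are toy values; `regression_head` is a hypothetical name.

```python
import math


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def regression_head(pooled, w1, w2):
    """Map a window-level representation to a scalar RUL estimate."""
    hidden = [silu(sum(w * h for w, h in zip(row, pooled))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden))


rul_estimate = regression_head(
    pooled=[1.0, -1.0],
    w1=[[1.0, 0.0], [0.0, 1.0]],
    w2=[2.0, 0.0],
)
```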
[0049] 3. Model Training: 3.1 Loss Function Construction: The model parameters are adjusted using a physically constrained optimization algorithm: they are optimized jointly on the prediction error and on constraint losses (such as a degradation monotonicity constraint) so that the predictions conform to the physical laws of equipment operation. The total loss function $\mathcal{L}$ used during training is a weighted combination of the RUL regression loss $\mathcal{L}_{\mathrm{RUL}}$ and the load balancing auxiliary loss $\mathcal{L}_{\mathrm{aux}}$:
$$\mathcal{L} = \mathcal{L}_{\mathrm{RUL}} + \lambda \, \mathcal{L}_{\mathrm{aux}}$$
In the formula, $\lambda$ is a weight used to balance regression accuracy against expert load balancing.
[0050] 3.1.1 RUL Regression Loss: The main objective is to minimize the difference between the predicted RUL and the true label. The model uses the mean squared error as the regression loss function:
$$\mathcal{L}_{\mathrm{RUL}} = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2$$
In the formula, $N$ is the batch size, and $\hat{y}_i$ and $y_i$ are the predicted and actual values of the $i$-th sample.
[0051] 3.1.2 Load Balancing Auxiliary Loss: An auxiliary load balancing loss $\mathcal{L}_{\mathrm{aux}}$ is introduced. It aims to minimize the covariance between the "hard usage frequency" and the "soft selection probability" of expert assignments, so that expert selection remains decisive while imbalanced expert utilization is prevented.
[0052] For a batch containing $N$ window samples, first define the hard usage frequency $f_e$ of the $e$-th expert, that is, the normalized number of times the expert is selected by Top-K in the current batch:
$$f_e = \frac{1}{N K} \sum_{i=1}^{N} \mathbb{1}\left[e \in \mathcal{T}_i\right]$$
In the formula, $K$ is the number of private experts selected for each sample; $\mathbb{1}[\cdot]$ is the indicator function; and $\mathcal{T}_i$ is the set of private experts selected for the $i$-th sample. Next, define the soft selection probability $P_e$ of the $e$-th expert as the average routing weight of this expert in the current batch:
$$P_e = \frac{1}{N} \sum_{i=1}^{N} p_{i,e}$$
Based on these two metrics, the load balancing auxiliary loss is the inner product of the expert soft selection probability and the hard usage frequency:
$$\mathcal{L}_{\mathrm{aux}} = \sum_{e=1}^{N_e} f_e \, P_e$$
In the formula, $N_e$ is the total number of private experts, and $\mathbf{p}_i$, with components $p_{i,e}$, is the private expert score vector corresponding to the $i$-th sample. By minimizing $\mathcal{L}_{\mathrm{aux}}$, the model tends to avoid an extremely skewed distribution of expert selection, preventing a collapse in which a few experts bear most of the load while the other experts are idle.
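The loss construction can be illustrated with a short sketch. It assumes the plain inner-product form of the auxiliary loss with no extra scaling by the number of experts, and an additive combination with a single weight; `load_balance_loss`, `total_loss`, and the toy batches are hypothetical.

```python
def load_balance_loss(routing_probs, selected, top_k):
    """Inner product of hard usage frequency and soft selection probability."""
    n = len(routing_probs)
    n_experts = len(routing_probs[0])
    hard = [sum(1 for chosen in selected if e in chosen) / (n * top_k)
            for e in range(n_experts)]
    soft = [sum(p[e] for p in routing_probs) / n for e in range(n_experts)]
    return sum(f * p for f, p in zip(hard, soft))


def total_loss(predicted, actual, aux, weight):
    """Mean squared error plus the weighted auxiliary loss."""
    mse = sum((p - y) ** 2 for p, y in zip(predicted, actual)) / len(predicted)
    return mse + weight * aux


# Collapsed routing: both samples pick the same two of four experts.
collapsed = load_balance_loss(
    [[0.5, 0.5, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]], [{0, 1}, {0, 1}], top_k=2)
# Balanced routing: the load is spread over all four experts.
balanced = load_balance_loss(
    [[0.25] * 4, [0.25] * 4], [{0, 1}, {2, 3}], top_k=2)
```

The collapsed batch yields a larger auxiliary loss than the balanced one, which is the behavior the description relies on to discourage idle experts.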
[0053] 3.3 Evaluation Metrics: To evaluate prediction performance comprehensively, two commonly used metrics, the Root Mean Square Error (RMSE) and a score function, are used to measure the difference between the model output and the true RUL. They are calculated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2}$$
$$\mathrm{Score} = \sum_{i=1}^{n} \begin{cases} e^{-d_i/13} - 1, & d_i < 0 \\ e^{d_i/10} - 1, & d_i \ge 0 \end{cases}$$
In the formula, $n$ is the number of test samples, and $d_i = \hat{y}_i - y_i$ is the prediction error of the $i$-th sample.
[0054] RMSE primarily measures the overall deviation of predicted values and is highly sensitive to large errors. The Score function, on the other hand, employs an asymmetric penalty mechanism. Considering that the safety risks associated with overestimating RUL in industrial forecasting far outweigh those with underestimating RUL, it imposes a more severe penalty on overestimation errors. For both metrics, smaller values indicate better predictive performance.
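Both metrics can be computed with a few lines of code. The sketch assumes the score function commonly used with C-MAPSS, with time constants of 13 for underestimation and 10 for overestimation; the function names are illustrative.

```python
import math


def rmse(predicted, actual):
    """Root mean square error over all test samples."""
    n = len(predicted)
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(predicted, actual)) / n)


def rul_score(predicted, actual):
    """Asymmetric score: late predictions (overestimated RUL) cost more."""
    total = 0.0
    for p, y in zip(predicted, actual):
        error = p - y
        if error < 0:
            total += math.exp(-error / 13.0) - 1.0  # underestimation
        else:
            total += math.exp(error / 10.0) - 1.0   # overestimation
    return total


over = rul_score([10.0], [0.0])   # RUL overestimated by 10 cycles
under = rul_score([0.0], [10.0])  # RUL underestimated by 10 cycles
```

For the same absolute error of 10 cycles, the overestimate scores about 1.72 while the underestimate scores about 1.16, reflecting the asymmetric penalty.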
[0055] This embodiment also provides a multi-condition equipment remaining service life prediction system, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the steps of any of the above-described multi-condition equipment remaining service life prediction methods.
[0056] The following is a specific example. The turbofan engine degradation simulation dataset (C-MAPSS) was selected as the data source for method validation. This dataset contains engine operating data in four subsets. Under each operating condition, the engine operates at a fixed altitude and Mach number, but the environmental conditions and operating settings differ, resulting in significant heterogeneity in degradation patterns. Subset FD001 contains 100 training sequences and 100 test sequences; subset FD002 contains 260 training sequences and 259 test sequences; subset FD003 contains 100 training sequences and 100 test sequences; and subset FD004 contains 248 training sequences and 249 test sequences. Each time step includes 21-dimensional sensor readings and 3-dimensional operating condition parameters, which characterize the engine's operating state.
[0057] Given that FD001 and FD003 contain only a single operating condition and cannot fully reflect the challenges of cross-condition modeling, the focus is on the subsets FD002 and FD004, which have high operating condition heterogeneity. The C-MAPSS dataset contains 21 sensor channels, of which sensors 1, 5, 6, 10, 16, 18, and 19 have constant values or extremely small variances during operation, indicating that their data are irrelevant to the engine degradation process. Therefore, the remaining 14 sensor channels, which are highly correlated with engine degradation, are selected as model inputs. Furthermore, over the full life cycle of an engine, early operation is usually a healthy and stable period; the minor degradation that occurs then is difficult to capture from sensor data and is often treated as a negligible, non-critical stage in engineering maintenance. Therefore, to focus on the more predictable mid-to-late-stage degradation process and to eliminate the interference of uninformative early gradients on model training, a piecewise linear RUL strategy is adopted with a maximum lifespan threshold $R_{\max} = 125$, that is, $y = \min\left(y_{\mathrm{true}}, R_{\max}\right)$.
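The preprocessing choices above can be sketched as follows. The sketch assumes sensors are numbered from 1 to 21 and that the RUL label of the last recorded cycle of a run-to-failure sequence is 0; the names `kept_sensor_ids` and `piecewise_rul` are hypothetical.

```python
DROPPED_SENSORS = {1, 5, 6, 10, 16, 18, 19}  # near-constant channels


def kept_sensor_ids(total=21):
    """Sensor indices retained as model inputs."""
    return [s for s in range(1, total + 1) if s not in DROPPED_SENSORS]


def piecewise_rul(total_cycles, cap=125):
    """RUL labels for one run-to-failure sequence, clipped at `cap`."""
    return [min(total_cycles - cycle, cap) for cycle in range(1, total_cycles + 1)]


sensors = kept_sensor_ids()
labels = piecewise_rul(total_cycles=130)
```

For a 130-cycle engine, the first five labels stay at the cap of 125 and the labels then decrease by one per cycle down to 0.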
[0058] To determine the structural configuration of SDG-Former, Bayesian optimization was used to search for optimal model parameters on the FD002 and FD004 subsets, respectively. The relevant structural parameter settings are shown in Table 1. The sliding time window lengths for the FD002 and FD004 datasets were set to 60 and 52, respectively, with step sizes of 1 and 2. During training, the AdamW optimizer was used with a maximum of 200 training epochs and an early stopping strategy with a patience of 30 epochs. The batch size was set to 128, and the random seed was set to 42. The learning rates for FD002 and FD004 were set to $1.6 \times 10^{-4}$ and $4.0 \times 10^{-4}$, respectively, and the load balancing auxiliary loss weights were set to 0.08 and 0.12, respectively. All experiments were conducted on a Windows 11 workstation with an AMD Ryzen 9 9950X processor, 128GB of RAM, and an NVIDIA RTX 5090 32GB GPU.
[0059] Table 1 SDG-Former structural parameters
[0060] Figure 2 shows the trend of the mean squared error during training. The model's loss converged rapidly within 10 training epochs, indicating that the algorithm learns strongly in the early stages of training and quickly captures the main features of the data. The smooth curve in the later stages, without drastic oscillations, indicates that training was stable and that neither overfitting nor gradient explosion occurred. The similar convergence levels in both cases indicate that the algorithm adapts well to data of different complexities.
[0061] Figure 3 compares the temporal evolution of the actual and predicted RUL values on the test set. The results show that the predicted curves and the measured values are highly consistent in both the normal degradation stage and the accelerated degradation stage. In Figure 3(a), the predicted curve (red dashed line) closely follows the actual RUL curve (blue solid line), remaining consistent throughout the entire degradation process of 140 time steps and successfully predicting the continuously, linearly decreasing RUL. In Figure 3(b), the prediction curve accurately captures the two-stage degradation pattern, remaining stable before time step 70 and then accurately predicting the downward trend. Neither prediction curve shows significant oscillations or abrupt changes.
[0062] The above results show that the algorithm, through effective temporal feature extraction and multi-scale modeling capabilities, simultaneously captures short-term fluctuations and long-term degradation trends in sensor data, achieving high-precision tracking of the FD002 linear degradation mode and the FD004 segmented degradation mode. The predicted curve closely follows the real RUL curve and remains stable without oscillation throughout, demonstrating strong adaptive learning capabilities. It can adapt to different degradation modes without manual adjustment and adopts an end-to-end prediction method directly from sensor data to RUL without intermediate feature engineering, maintaining accurate prediction even at critical moments when RUL is close to 0.
[0063] Those skilled in the art will readily understand that the above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A method for predicting the remaining service life of equipment under multiple operating conditions, characterized in that it comprises: Offline training phase: Acquire historical monitoring data of equipment sensors under various operating conditions and construct a time series sample set by dividing the time series data into sliding window segments. A modality generation module is set up to generate an operating condition vector reflecting the operating mode based on the operating condition parameters corresponding to each sample in the time series sample set. The prediction model is constructed from a state-driven encoding module and a prediction head module. The state-driven encoding module includes a state-driven hybrid expert layer, which comprises a private expert layer and a shared expert layer. The private expert layer generates gating weights based on window-level states and performs weighted output based on these gating weights. The state-driven hybrid expert layer combines the outputs of the private and shared expert layers to obtain high-dimensional time-series features. The window-level states are obtained based on the operating condition vector and the degradation information of the time-series data. The prediction head module maps the high-dimensional time-series features to predicted remaining useful life values. The modality generation module and the prediction model are trained using the time series sample set to obtain the trained modality generation module and prediction model; Online prediction phase: Acquire real-time monitoring data from equipment sensors, obtain real-time operating condition vectors based on the trained modality generation module, and use the trained prediction model to predict the remaining service life.
2. The method for predicting the remaining service life of equipment under multiple operating conditions as described in claim 1, characterized in that, A modality generation module is set up to generate an operating condition vector reflecting the operating mode based on the operating condition parameters corresponding to each sample in the time-series sample set. Specifically, this includes: K-Means clustering is performed on the operating condition parameters corresponding to the time series sample set to divide the operating state into multiple operating condition modes; For each sample, the operating condition mode label of the last time step is taken as a discrete input index, which is then mapped into a continuous operating condition vector by the modality generation module.
3. The method for predicting the remaining service life of equipment under multiple operating conditions as described in claim 1, characterized in that, The prediction model further includes a state modulation feature embedding module; the state modulation feature embedding module includes a state modulation layer, which generates a scale factor and a shift factor from the operating condition vector through linear mapping and performs a state-based affine transformation on the input features using a feature-wise linear modulation mechanism; sample data is input into the state modulation feature embedding module and, after state modulation, is output to the state-driven encoding module.
4. The method for predicting the remaining service life of equipment under multiple operating conditions as described in claim 3, characterized in that, The state modulation feature embedding module further includes an input embedding layer and a position encoding layer preceding the state modulation layer; the input embedding layer uses a linear projection matrix to map the input sample data into high-dimensional features; The position encoding layer uses classical sine and cosine position encoding to construct a position vector for each time step within the window, and adds the position vector to the high-dimensional features output by the input embedding layer element by element to obtain a feature sequence representation containing position information; And / or, the state-driven coding module further includes a multi-head self-attention layer preceding the state-driven hybrid expert layer, wherein the multi-head self-attention layer employs the MHA mechanism to capture long-distance temporal dependencies and mine correlation patterns between sensors; each sub-layer of the state-driven coding module is preceded by a root mean square normalization layer, and the output of the sub-layer is added to the input through a residual connection.
5. The method for predicting the remaining service life of multi-condition equipment as described in any one of claims 1-4, characterized in that, The window-level state $\mathbf{s}$ is obtained from the operating condition vector and the degradation information of the time series data, as follows:
$$\mathbf{s} = \left[\mathbf{v} \,;\, \boldsymbol{\delta}\right], \qquad \boldsymbol{\delta} = \mathbf{z}_T - \bar{\mathbf{z}}$$
In the formula, $\mathbf{v}$ is the operating condition vector, representing the macroscopic operating conditions; $\boldsymbol{\delta}$ is the degradation stage information, reflecting the microscopic, instantaneous degradation trend; $\mathbf{z}_T$ is the feature at the end of the input feature window; and $\bar{\mathbf{z}}$ is the mean of the features across all time steps within the window.
6. The method for predicting the remaining service life of multi-condition equipment as described in any one of claims 1-4, characterized in that, The private expert layer generates gating weights from the window-level state and performs weighted output based on these gating weights, as follows: The window-level state is projected into a query vector, and each private expert in the private expert layer is associated with a learnable key vector; the routing score corresponding to each private expert is obtained by scaled dot-product attention between the query vector and the key vector corresponding to that private expert. The routing scores of all private experts are normalized to obtain the routing weight corresponding to each private expert. The outputs of the top-K private experts with the highest routing weights are then weighted and output as the output of the private expert layer.
7. The method for predicting the remaining service life of equipment under multiple operating conditions as described in claim 6, characterized in that, The routing score for each private expert is obtained using the following formulas:
$$\mathbf{q} = W_q \, \sigma\left(W_s \mathbf{s}\right)$$
$$r_e = \frac{\mathbf{q}^{\top} \mathbf{k}_e}{\sqrt{d_g}}$$
In the formula, $\mathbf{s}$ is the window-level state representation; $W_s$ and $W_q$ are learnable projection matrices; $\sigma$ is an activation function; $\mathbf{q}$ is the query vector generated from the window-level state representation; $d_g$ is the gate space dimension, used as a scaling factor to control the numerical range of the dot product; $\mathbf{k}_e$ is the learnable key vector corresponding to the $e$-th private expert; and $r_e$ is the raw routing score of the $e$-th private expert; The output $\mathbf{y}$ of the state-driven hybrid expert layer is specifically as follows:
$$\mathbf{y} = \sum_{c=1}^{N_s} E_c^{\mathrm{sh}}(\mathbf{x}) + \sum_{e \in \mathcal{T}_K} g_e \, E_e^{\mathrm{pr}}(\mathbf{x})$$
In the formula, $E_c^{\mathrm{sh}}(\mathbf{x})$ is the output of the $c$-th shared expert; $N_s$ is the number of shared experts; $\mathbf{x}$ is the input feature; $\mathcal{T}_K$ is the set of the $K$ private experts with the highest routing weights; $E_e^{\mathrm{pr}}(\mathbf{x})$ is the output of the selected $e$-th private expert; and $g_e$ is the normalized routing weight.
8. The method for predicting the remaining service life of multi-condition equipment as described in any one of claims 1-4, characterized in that, The prediction head module consists of two sub-layers: a time-biased pooling layer and a regression prediction head. The time-biased pooling layer performs non-uniform aggregation of the input sequence features based on temporal weights to generate a fixed-dimensional window-level representation. The regression prediction layer further maps the window-level representation to the final remaining lifetime estimation result.
9. The method for predicting the remaining service life of multi-condition equipment as described in any one of claims 1-4, characterized in that, The total loss function $\mathcal{L}$ during training consists of two weighted components, the RUL regression loss $\mathcal{L}_{\mathrm{RUL}}$ and the load balancing auxiliary loss $\mathcal{L}_{\mathrm{aux}}$, as follows:
$$\mathcal{L} = \mathcal{L}_{\mathrm{RUL}} + \lambda \, \mathcal{L}_{\mathrm{aux}}$$
In the formula, $\lambda$ is the weight used to balance regression accuracy and expert load balancing; the RUL regression loss uses the mean squared error as the regression loss function, as follows:
$$\mathcal{L}_{\mathrm{RUL}} = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2$$
In the formula, $N$ is the batch size, and $\hat{y}_i$ and $y_i$ are the predicted and actual values of the $i$-th sample; the load balancing auxiliary loss is constructed as follows: Define the hard usage frequency $f_e$ of the $e$-th expert as the normalized number of times this expert was selected in the current batch:
$$f_e = \frac{1}{N K} \sum_{i=1}^{N} \mathbb{1}\left[e \in \mathcal{T}_i\right]$$
In the formula, $K$ is the number of private experts selected for each sample; $\mathbb{1}[\cdot]$ is the indicator function; and $\mathcal{T}_i$ is the set of private experts selected for the $i$-th sample; Define the soft selection probability $P_e$ of the $e$-th expert as the average routing weight of this expert in the current batch:
$$P_e = \frac{1}{N} \sum_{i=1}^{N} p_{i,e}$$
The load balancing auxiliary loss is the inner product of the expert soft selection probability and the hard usage frequency:
$$\mathcal{L}_{\mathrm{aux}} = \sum_{e=1}^{N_e} f_e \, P_e$$
In the formula, $N_e$ is the total number of private experts, and $\mathbf{p}_i$, with components $p_{i,e}$, is the private expert score vector corresponding to the $i$-th sample.
10. A multi-condition equipment remaining service life prediction system, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the processor executes the program, it implements the steps of the multi-condition equipment remaining service life prediction method as described in any one of claims 1 to 9.