Optimization method for aircraft engine maintenance strategies based on nonparametric reinforcement learning
Nonparametric reinforcement learning with Gaussian processes and Bayesian networks addresses system uncertainty in aircraft engines, enhancing maintenance efficiency and reliability by integrating uncertainty and updating training data.
Patent Information
- Authority / Receiving Office: JP
- Patent Type: Application
- Current Assignee / Owner: ZHEJIANG UNIV
- Filing Date: 2024-08-01
- Publication Date: 2026-07-02
AI Technical Summary
Conventional aircraft engine maintenance methods struggle with system uncertainty, particularly in complex environments with sensor failures, leading to inefficiencies and resource waste, and there is a need for intelligent maintenance strategies that balance safety and economic efficiency.
A method utilizing nonparametric reinforcement learning, combining Gaussian processes and Bayesian networks to model aircraft engine systems, integrating system uncertainty and prior maintenance experience data, and dynamically updating training data to improve decision-making.
Enhances the reliability and safety of aircraft engine maintenance by accurately predicting maintenance needs, reducing system feedback costs, and improving sampling efficiency through dynamic data updates.
Description
[Technical Field]
[0001] This invention relates to aircraft engine operation and maintenance technology, and more particularly to a method for optimizing aircraft engine maintenance strategies based on nonparametric reinforcement learning.
[Background Art]
[0002] Aircraft engines are a vital part of the infrastructure for transporting cargo and passengers, and developing rational maintenance strategies is crucial for keeping engines functioning properly, increasing reliability, and reducing airline costs. However, routine aircraft engine inspections are hampered by the complexity of the engine, harsh working environments, high sensor uncertainty, complex degradation mechanisms in subcomponents, and the requirement for safety-assured engine system design. As a result, aircraft engine operation and maintenance suffers from reduced operational efficiency and considerable waste of resources.
[0003] To solve the above problems, system uncertainty must be incorporated into policy formulation and algorithmic efficiency must be improved. There is therefore an urgent need for new aircraft engine operation and maintenance approaches that achieve intelligent operation and maintenance while balancing safety and economic efficiency. Against this backdrop, a method for formulating aircraft engine operation and maintenance policies based on nonparametric reinforcement learning has been developed.
[0004] Existing aircraft engine maintenance decision-making methods primarily include corrective (fault repair) maintenance, preventive maintenance, and condition-based maintenance. Corrective maintenance is action taken after a device or system has failed, i.e., unplanned maintenance that usually occurs when a device fails or stops working; it mainly includes fault detection, diagnosis, device repair, and restoration to normal operating conditions. Preventive maintenance is planned maintenance performed before a device fails, with the goal of preventing failure through regular inspections, servicing, and repairs, and of keeping the device operating efficiently within a given timeframe. Condition-based maintenance determines maintenance timing from the actual operating and health status of the device, using sensors, monitoring devices, and related technologies to collect data, predict the device's health, and decide when to perform maintenance. Conventional aircraft engine maintenance decision-making methods generally rely on static maintenance plans or on the current system status, and are inadequate in situations requiring rapid response and high safety levels. For example, they struggle to provide accurate maintenance recommendations for decision problems involving system uncertainty, such as sensor failures.
[0005] By introducing multi-agent reinforcement learning and combining it with Gaussian process uncertainty quantification, system uncertainty in aircraft engine maintenance can be better addressed. Combined with prior maintenance experience data, maintenance decisions can be made for engines with complex, coupled subsystems, improving the operational efficiency of the equipment.
[Summary of the Invention]
[Problems to be Solved by the Invention]
[0006] The objective of this invention is to provide a method for optimizing aircraft engine maintenance strategies based on nonparametric reinforcement learning that addresses the shortcomings of the prior art.
[Means for Solving the Problem]
[0007] To achieve the above objective, the present invention adopts the following technical solution. In a first aspect, the present invention provides a method for optimizing an aircraft engine maintenance strategy based on nonparametric reinforcement learning, the method comprising the following steps:
(a) collecting historical aircraft engine operating data and constructing an aircraft engine model;
(b) constructing a policy network and a value network, randomly selecting an initial system state $s_0$, and formulating an initial maintenance operation $a_0$;
(c) interacting with the aircraft engine model based on the initial system state $s_0$ and the initial maintenance operation $a_0$ to obtain a state-operation-value group consisting of the system state, the maintenance operation, the system feedback cost, and the next system state;
(d) constructing a replay buffer and storing the state-operation-value group $(s_t, a_t, r(s_t), s_{t+1})$ of each step in it;
(e) randomly sampling from the replay buffer to form a training set;
(f) updating the policy network and the value network;
(g) updating the training set; and
(h) determining whether the number of system operation steps has reached a predetermined number of training steps; if so, stopping training, obtaining the trained networks, and outputting an aircraft engine maintenance optimization policy; otherwise, continuing to select actions with the policy network and repeating training until the number of operation steps reaches the number of training steps.
[0008] Further, the parameter values of the system state are, for each assembly $i$ of the engine, the stagnation pressure $p_i^*$, stagnation temperature $T_i^*$, stagnation enthalpy $h_i^*$, pressure ratio $\pi_i$, specific heat at constant pressure of the working medium $c_{p,i}$, specific heat ratio of the working medium $\gamma_i$, and efficiency factor $\eta_i$. The maintenance operations include overhaul and minor repair.
[0009] Further, the aircraft engine model includes a system state transition model and a system feedback model. The system state transition model uses a Bayesian network to fit the system state transition probabilities derived from a Markov process; the Bayesian network parameter $L$ is expressed by maximum likelihood estimation, and the Bayesian network parameters are optimized with the expectation-maximization (EM) algorithm to obtain the system state transition function $P(s_{t+1} \mid s_t)$. The system feedback model fits the operating cost $r_1(s_t)$ with a Gaussian process based on the system state $s_t$. Based on the maintenance operations determined by the policy network in system state $s_t$, it takes the sum of the prices of the maintenance operations of all assemblies as the maintenance cost $r_2(s_t)$, and takes the sum of the operating cost $r_1(s_t)$ and the maintenance cost $r_2(s_t)$ as the system feedback cost $r(s_t)$ in system state $s_t$. The next system state $s_{t+1}$ is then obtained from the system state transition function $P(s_{t+1} \mid s_t)$.
[0010] Further, the policy network selects an action based on the system state $s_t$; the execution probability $P_{\text{net}}(a_t)$ of each maintenance operation is obtained through a Gaussian process and is calculated by the following formula 10:
[Formula 10: image not reproduced]
[0011] Further, action selection uses an ε-greedy algorithm to choose between maintenance operation search (exploration) and maintenance policy utilization (exploitation) when formulating the next maintenance operation. If the ε-greedy sample is greater than ε, search is performed: the policy network samples the advantage function at random within the uncertainty range of the Gaussian process. If the sample is less than ε, utilization is performed: the policy network outputs the mean of the Gaussian process as the advantage function.
[0012] Further, Bayesian data fusion is used to combine prior maintenance experience data with the maintenance operations obtained from the policy network, thereby supporting and guiding decision-making.
[0013] Further, in the step of updating the policy network and the value network, the Gaussian process fitting dataset is extended using the system states $s_t^{\text{old}}$ and advantage values $A^\pi(s_t,a_t)^{\text{old}}$ of the original sampling points together with the system states $s_t^{\text{new}}$ and advantage values $A^\pi(s_t,a_t)^{\text{new}}$ of the new sampling points. The input of the extended Gaussian process is the combined set of old and new system states, and its output is the combined set of old and new advantage values. $A^\pi(s_t,a_t)$ is the advantage function of performing operation $a_t$ in state $s_t$ and is calculated by the following formula 14:
[Formula 14: image not reproduced]
[0014] Further, the kernel function of the Gaussian process is calculated by the following equation 6:
[Equation 6: image not reproduced]
[0015] Further, the step of updating the training set includes the following substeps: the degree of fit between each training sample in the training set and the value network is obtained by calculating the negative log-likelihood; batch random sampling is performed again on the replay buffer, and the degree of fit between each resampled point and the value network is likewise obtained by calculating the negative log-likelihood; and the two are compared so that data with a high degree of fit in the training set are replaced with data with a low degree of fit. This biases training toward data the model fits poorly and further improves the reliability of the model as a whole.
[0016] In a second aspect, the present invention further provides an optimization system for aircraft engine maintenance strategies based on nonparametric reinforcement learning, comprising an aircraft engine model building module, a network initialization module, a maintenance operation determination module, a replay buffer module, a model training module, and a dynamic update module. The aircraft engine model building module constructs a system state transition model and a system feedback model based on Bayesian networks and Gaussian processes from collected aircraft engine operating data. The network initialization module initializes the policy network and the value network. The maintenance operation determination module formulates maintenance strategies based on the current system state of the aircraft engine model, so as to avoid severe deterioration of the aircraft engine and reduce system feedback costs. The replay buffer module stores the state-operation-value group of each step. The model training module trains the policy network and the value network using state-action-value groups extracted in batches from the replay buffer module. The dynamic update module dynamically updates the training set by replacing data with a high degree of fit to the value network with data with a low degree of fit.
[Effects of the Invention]
[0017] A beneficial effect of this invention is that, by constructing an aircraft engine model and a nonparametric reinforcement learning method based on Gaussian processes, an engine model can be fitted from sparse data using a Bayesian network and a Gaussian process while maintaining similarity to high-accuracy simulations. By fusing system uncertainty with a dynamic data update method, the invention improves the degree of fit between the entire training dataset and the model, thereby improving the sampling efficiency of the algorithm. An action selection method that fuses uncertainty with prior maintenance experience data further improves system safety and sampling efficiency. The method outperforms conventional methods and offers high versatility and ease of use.
[Brief Description of the Drawings]
[0018] [Figure 1] A schematic diagram of the agent learning process according to the present invention.
[Figure 2] A schematic diagram of the construction process of the aircraft engine model according to the present invention.
[Figure 3] A block diagram of the optimization system for aircraft engine maintenance strategies based on nonparametric reinforcement learning according to the present invention.
[Figure 4] The fitting performance of the aircraft engine model: (a) comparison of model predictions with actual results; (b) global sensitivity analysis of the feedback model for the high-pressure turbine, low-pressure turbine, high-pressure compressor, and low-pressure compressor.
[Figure 5] Comparison of training results between this method and conventional decision algorithms: (a) training convergence curves; (b) box plot of the results of five applications of each trained algorithm to the aircraft engine model.
[Figure 6] The specific maintenance measures formulated for the high-pressure turbine, low-pressure turbine, high-pressure compressor, and low-pressure compressor of the aircraft engine.
[Modes for Carrying Out the Invention]
[0019] The following describes the technical concepts in the embodiments of the present invention clearly and completely with reference to the drawings of the embodiments of the present invention. It is obvious that the embodiments described are only some, and not all, embodiments of the present invention. Any other embodiments that a person skilled in the art can obtain based on the embodiments of the present invention without requiring inventive work are all within the scope of the protection of the present invention.
[0020] Furthermore, the features of the embodiments of the present invention can be combined with each other, as long as they do not contradict each other.
[0021] The core technology of this invention mathematically models the sparse features of historical aircraft engine data, constructs an aircraft engine model, and solves that model with nonparametric reinforcement learning, which handles the system uncertainty and data sparsity characteristic of aircraft engine cases. Nonparametric reinforcement learning uses the uncertainty quantification and nonparametric nature of Gaussian processes to bring the system uncertainty of aircraft engine scenarios into the overall flow of the reinforcement learning algorithm, addressing uncertainty with the sparse datasets found in real industrial settings. A Bayesian data fusion algorithm integrates prior maintenance experience into the maintenance strategy search, improving sampling efficiency. In this way, the predictive maintenance problem for aircraft engine systems is addressed, and a method for formulating operation and maintenance strategies that integrates system uncertainty is realized. The concept can be summarized as follows:
1. Construct an aircraft engine model. Based on historical engine operating data, model the state transition probabilities of the aircraft engine with a Bayesian network, fit the system feedback cost with a Gaussian process, and construct an aircraft engine model that can interact with a reinforcement learning process.
2. A nonparametric reinforcement learning method based on Gaussian processes. The policy and value functions of the reinforcement learning algorithm are built on Gaussian processes; actions are explored and exploited according to the Gaussian process uncertainty; prior maintenance experience data are fused with reinforcement learning decisions by a Bayesian data fusion algorithm; engine system uncertainty is integrated into the reinforcement learning process; and the deterioration state of the engine subsystems serves as the safety evaluation index while the system feedback cost serves as the economic evaluation index of the maintenance plan. This allows operation and maintenance strategies for aircraft engine systems to be formulated and improves the overall safety and reliability of the system.
3. A dynamic update method for model sampling points. In each model iteration, the degree of fit between the training data and the value function model is calculated from the negative log-likelihood of the training data under the value network. Through dynamic updating, training points with a low degree of fit are preferentially selected for batch training, improving how well the reinforcement learning network fits all sampling points and enhancing system reliability.
4. An optimization system for aircraft engine maintenance strategies based on nonparametric reinforcement learning.
[0022] As shown in Figure 1, the method for optimizing aircraft engine maintenance strategies based on nonparametric reinforcement learning provided by the present invention specifically includes the following steps.
(1) Construction of the aircraft engine model. An aircraft engine model for algorithm verification is constructed, consisting of two parts: a system state transition model and a system feedback model. The system state transition model determines the engine deterioration process from the current engine state, and the system feedback model calculates the total cost of each engine operation and maintenance step as the system reward. As shown in Figure 2, this step includes the following substeps.
(1.1) The system state transition model determines the engine deterioration process based on the current aircraft engine state. Because aircraft engines are expensive to operate, actual flight failure data and historical operation and maintenance records are scarce, and much historical engine operating data is sparse. To fit the transition probabilities between system states, the deterioration state of the engine at time $t$ (hereinafter, the system state $s_t$) is first determined from sparse engine flight-limit failure data, and a Bayesian network is then used to fit the state transition probability $P(s_{t+1} \mid s_t)$ from that data.
[0023] The state transitions between engine assemblies are described by isentropic relations and calibrated polytropic efficiency. The state parameters of the $i$-th assembly are the stagnation pressure $p_i^*$, stagnation temperature $T_i^*$, stagnation enthalpy $h_i^*$, pressure ratio $\pi_i$, specific heat of the working fluid at constant pressure $c_{p,i}$, specific heat ratio of the working fluid $\gamma_i$, and efficiency factor $\eta_i$. The state parameter transition equations between assemblies are as follows:
[Equations: images not reproduced]
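The transition equations referenced above are not reproduced. As a rough illustration only, the sketch below applies the standard textbook polytropic-efficiency relations for a compressor and a turbine with constant specific heats. Whether the patent's equations take exactly this form is an assumption, as is taking stagnation enthalpy as $c_p T^*$; the names `AssemblyState`, `compressor_exit`, and `turbine_exit` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AssemblyState:
    p_t: float  # stagnation pressure [Pa]
    T_t: float  # stagnation temperature [K]
    h_t: float  # stagnation enthalpy [J/kg], assumed to be c_p * T_t (constant c_p)


def compressor_exit(inlet: AssemblyState, pi: float, gamma: float,
                    eta_p: float, cp: float) -> AssemblyState:
    """Compression by pressure ratio pi with polytropic efficiency eta_p:
    T_out / T_in = pi ** ((gamma - 1) / (gamma * eta_p))."""
    T_out = inlet.T_t * pi ** ((gamma - 1.0) / (gamma * eta_p))
    return AssemblyState(inlet.p_t * pi, T_out, cp * T_out)


def turbine_exit(inlet: AssemblyState, pi: float, gamma: float,
                 eta_p: float, cp: float) -> AssemblyState:
    """Expansion by pressure ratio pi = p_in / p_out with polytropic efficiency eta_p:
    T_in / T_out = pi ** (eta_p * (gamma - 1) / gamma)."""
    T_out = inlet.T_t / pi ** (eta_p * (gamma - 1.0) / gamma)
    return AssemblyState(inlet.p_t / pi, T_out, cp * T_out)


# Illustrative values only: sea-level inlet, pressure ratio 10.
inlet = AssemblyState(101325.0, 288.15, 1005.0 * 288.15)
hpc_out = compressor_exit(inlet, pi=10.0, gamma=1.4, eta_p=0.9, cp=1005.0)
print(f"{hpc_out.p_t:.0f} Pa, {hpc_out.T_t:.1f} K")
```

A lower polytropic efficiency raises the compressor exit temperature for the same pressure ratio, which is one way a degraded assembly could show up in these state parameters.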
[0024] In the Bayesian network fitting process, the state transition probabilities of the aircraft engine are calculated from a Markov process using the following equation:
[Equation: image not reproduced]
[0025] The Bayesian network parameter $L$ is expressed by maximum likelihood estimation:
[Equations: images not reproduced]
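For discrete, fully observed degradation states, the maximum likelihood estimate of $P(s_{t+1} \mid s_t)$ reduces to normalized transition counts. The sketch below covers only that complete-data case; the expectation-maximization step the patent uses for incomplete data is not shown, and the optional pseudo-count `alpha` (a Dirichlet-style smoothing term) is an added assumption. The function name `fit_transition_matrix` is hypothetical.

```python
import numpy as np


def fit_transition_matrix(sequences, n_states, alpha=0.0):
    """Estimate P[i, j] = P(s_{t+1} = j | s_t = i) from observed state sequences.

    alpha = 0 gives the maximum likelihood estimate (normalized counts);
    alpha > 0 adds pseudo-counts. Rows with no observations fall back to uniform.
    """
    counts = np.full((n_states, n_states), float(alpha))
    for seq in sequences:
        for s, s_next in zip(seq[:-1], seq[1:]):
            counts[s, s_next] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    uniform = np.full_like(counts, 1.0 / n_states)
    # Divide only where a row has data; other rows keep the uniform fallback.
    return np.divide(counts, row_sums, out=uniform, where=row_sums > 0)


# Two short degradation histories over 3 levels (0 = healthy, 2 = most degraded).
P = fit_transition_matrix([[0, 0, 1, 2], [0, 1, 1, 2]], n_states=3)
print(P.round(3))
```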
[0026] (1.2) The system feedback model fits the operation and maintenance costs of the aircraft engine from sparse historical maintenance data and the transitions between the engine's degradation states. In an aircraft engine system, operation and maintenance costs arise both during operation and when the engine degrades or is repaired from one state to another. In addition, system sparsity leads to problems such as missing data. A Gaussian process is therefore used to fit the system feedback costs of the aircraft engine system: as a nonparametric stochastic process, it handles the missing-data problem of sparse systems better than other fitting algorithms.
[0027] Historical operating cost data are collected for each engine assembly. Denoting the aircraft engine system state $s_t$ and the operating cost $r_1(s_t)$ as $x^{(i)}$ and $y^{(i)}$, respectively, the kernel function of the Gaussian process is as follows:
[Equations: images not reproduced]
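Because the kernel equation is not reproduced, the sketch below assumes a squared-exponential kernel with fixed hyperparameters and a constant-mean prior (the sample mean of the costs), and implements standard Gaussian process regression of the operating cost $r_1(s)$ on numeric state features. The class name `CostGP` and all hyperparameter values are illustrative; fitting hyperparameters by marginal likelihood is omitted.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel k(a, b) = s2 * exp(-||a - b||^2 / (2 l^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


class CostGP:
    def __init__(self, length_scale=1.0, signal_var=1.0, noise_var=1e-4):
        self.ls, self.sv, self.nv = length_scale, signal_var, noise_var

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float).reshape(len(X), -1)  # (n, d) inputs x^(i)
        y = np.asarray(y, dtype=float)
        self.y_mean = y.mean()  # constant prior mean
        K = rbf_kernel(self.X, self.X, self.ls, self.sv) + self.nv * np.eye(len(self.X))
        self.L = np.linalg.cholesky(K)
        # alpha = K^{-1} (y - mean), via two triangular solves.
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y - self.y_mean))
        return self

    def predict(self, Xs):
        """Posterior mean and variance of the latent cost at new states."""
        Xs = np.asarray(Xs, dtype=float).reshape(len(Xs), -1)
        Ks = rbf_kernel(Xs, self.X, self.ls, self.sv)
        mean = self.y_mean + Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = np.maximum(self.sv - np.sum(v**2, axis=0), 0.0)
        return mean, var


# Synthetic stand-in for (state feature, operating cost) pairs.
X = np.linspace(0.0, 5.0, 11)
y = np.sin(X) + 2.0
gp = CostGP().fit(X, y)
mean, var = gp.predict([2.25, 10.0])
```

The predictive variance grows back to the prior variance far from observed states, which is the uncertainty signal the later action-selection steps rely on.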
[0028] Based on the maintenance operation determined by the policy network for each assembly in the current system state and the corresponding maintenance price (based on current factory estimates), the maintenance prices of all assemblies are summed to obtain the maintenance cost $r_2(s_t)$ of the engine in system state $s_t$.
[0029] The sum of the engine operating cost $r_1(s_t)$ and the maintenance cost $r_2(s_t)$ is the system feedback cost: $r(s_t) = r_1(s_t) + r_2(s_t)$.
[0030] (1.3) Integration and initialization of the aircraft engine system. The mathematical model of the aircraft engine system in this embodiment consists of the system state transition model and the system feedback model. It takes the current state of the system during operation and, based on that state and the maintenance operation, returns the next state and the system feedback cost. In each cycle of the aircraft engine model, the system transition function $P(s_{t+1} \mid s_t)$ obtained in step (1.1) gives the engine state $s_{t+1}$ of the next cycle from the current state $s_t$ and the adopted maintenance operation $a_t$. The system feedback model calculates the engine operating cost $r_1(s_t)$ and the maintenance cost $r_2(s_t)$ in the current state $s_t$ and combines them into the system feedback cost $r(s_t)$.
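One cycle of the engine model in steps (1.1) to (1.3) can be sketched as below. For illustration only, it assumes that each assembly has a discrete degradation level that evolves under a transition matrix selected by its maintenance operation, that `r1` is any callable (for example, a fitted Gaussian process mean), and that the prices in `PRICES` are hypothetical placeholders rather than factory estimates. The function name `engine_step` is also hypothetical.

```python
import numpy as np

# Hypothetical relative prices per maintenance operation (not factory estimates).
PRICES = {"none": 0.0, "minor": 1.0, "overhaul": 5.0}


def engine_step(state, ops, P, r1, prices=PRICES, rng=None):
    """One cycle of a toy engine model.

    state: list of discrete degradation levels, one per assembly.
    ops:   maintenance operation per assembly (a key of `P` and `prices`).
    P:     dict mapping an operation to a row-stochastic transition matrix.
    r1:    callable returning the operating cost r1(s_t).
    Returns (s_{t+1}, r(s_t)) with r(s_t) = r1(s_t) + r2(s_t).
    """
    rng = np.random.default_rng() if rng is None else rng
    r2 = sum(prices[op] for op in ops)  # maintenance cost: sum over assemblies
    r = r1(state) + r2
    next_state = [int(rng.choice(len(P[op][s]), p=P[op][s]))
                  for s, op in zip(state, ops)]
    return next_state, r


P = {
    "none":     np.array([[0, 1, 0], [0, 0, 1], [0, 0, 1]], dtype=float),  # degrade one level
    "minor":    np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=float),  # improve one level
    "overhaul": np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0]], dtype=float),  # restore to level 0
}
next_state, r = engine_step([1, 2], ["none", "overhaul"], P, r1=lambda s: float(sum(s)))
# next_state == [2, 0]; r == 3.0 + 5.0 == 8.0
```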
[0031] The above procedure yields the aircraft engine model and its initial state. In this embodiment, the engine can be modeled with nonparametric approaches such as Bayesian networks and Gaussian processes, which fit sparse datasets effectively.
[0032] (2) Initializing the decision system networks and formulating a policy. The policy function Pnet and value function Vnet of the algorithm are constructed and initialized, and the maintenance policy is formulated through the policy network and experience support, starting from the initial state of the aircraft engine model of step (1). This step includes the following substeps.
(2.1) To formulate the engine maintenance policy from the aircraft engine model's system state transition function $P(s_{t+1} \mid s_t)$ and system feedback $R(s_t)$, this embodiment constructs a reinforcement learning policy network Pnet. The system state $s_t$ is input to Pnet, which outputs the execution probability $P(a_t)$ of each maintenance operation. Denoting the system state $s_t$ and the advantage function $A^\pi(s_t,a_t)$ as $x$ and $y$, a Gaussian process is constructed, whose kernel function is as follows:
[Equations: images not reproduced]
[0033] The policy network Pnet selects an action based on the system state. Action selection uses the ε-greedy algorithm to choose between two modes, action search (exploration) and policy utilization (exploitation); an advantage value is obtained for each maintenance action, and the next maintenance action is formulated. Here ε is a number between 0 and 1, and a random number between 0 and 1 is drawn. If the sample is greater than ε, action search is adopted: the policy network samples the advantage function at random within the uncertainty range of the Gaussian process. Conversely, if the sample is less than ε, policy utilization is adopted, and the policy network outputs the mean of the Gaussian process as the advantage function.
[0034] In action search, the policy network samples at random within the uncertainty range of the Gaussian process to obtain the advantage function; the sampling range is given by:
[Equation: image not reproduced]
[0035] Policy utilization uses the trained policy network to output the mean of the Gaussian process, inheriting the results of previous training.
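The search/utilization choice in [0033] to [0035] can be sketched as follows, given the Gaussian process posterior mean and standard deviation of each action's advantage. Note the convention in the text: search is triggered when the uniform sample exceeds ε, so search occurs with probability 1 - ε, the reverse of the usual textbook ε-greedy labeling. Treating the "uncertainty range" as the interval mean ± k·std with uniform sampling, the value k = 2, and the name `select_advantages` are all assumptions.

```python
import numpy as np


def select_advantages(mu, sigma, eps, k=2.0, rng=None):
    """Return per-action advantage values and the mode used.

    Following the text's convention, a uniform draw u > eps triggers 'search'
    (advantages sampled uniformly in [mu - k*sigma, mu + k*sigma]); otherwise
    'utilization' returns the Gaussian process mean.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    if rng.random() > eps:
        return rng.uniform(mu - k * sigma, mu + k * sigma), "search"
    return mu.copy(), "utilization"


# Hypothetical posterior for three actions: no action, minor repair, overhaul.
adv, mode = select_advantages([0.2, 0.5, 0.1], [0.3, 0.1, 0.4], eps=0.9,
                              rng=np.random.default_rng(0))
best_action = int(np.argmax(adv))
```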
[0036] The execution probability of each maintenance action is calculated from the advantage function values output by the Gaussian process, as follows:
[Equations: images not reproduced]
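The formula for the execution probability is not reproduced. A common way to turn per-action advantage values into execution probabilities is a softmax with a temperature; the sketch below uses that form purely as an assumption, not as the patent's confirmed formula, and `action_probabilities` and `temperature` are hypothetical names.

```python
import numpy as np


def action_probabilities(advantages, temperature=1.0):
    """Softmax over advantage values; higher advantage gives a higher probability."""
    z = np.asarray(advantages, dtype=float) / temperature
    z -= z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()


probs = action_probabilities([0.2, 0.5, 0.1])
```

A lower temperature concentrates probability on the best-scoring action; a higher one spreads it out.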
[0037] After the maintenance actions of the policy network are obtained, Bayesian data fusion combines the prior maintenance experience data with the policy network output. Guiding decisions with prior maintenance experience improves system safety and sampling efficiency and accelerates the convergence of the algorithm.
[Equation: image not reproduced]
[0038] Here, $P_a^{P}$ is the decision probability of the policy network, $P_a^{e}$ is the execution probability of the operation based on prior maintenance experience data, and $w$ is the influence weight of decisions based on prior maintenance experience.
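The fusion equation is not reproduced either. One plausible Bayesian-style reading, used here purely as an assumption, is log-linear pooling, $P_a \propto (P_a^{P})^{1-w} (P_a^{e})^{w}$. This recovers the policy network at $w = 0$ and the prior experience at $w = 1$. The function name `fuse` is hypothetical.

```python
import numpy as np


def fuse(p_policy, p_expert, w):
    """Log-linear pooling of policy-network and prior-experience probabilities."""
    p_policy = np.clip(np.asarray(p_policy, dtype=float), 1e-12, None)  # avoid log(0)
    p_expert = np.clip(np.asarray(p_expert, dtype=float), 1e-12, None)
    log_p = (1.0 - w) * np.log(p_policy) + w * np.log(p_expert)
    log_p -= log_p.max()
    p = np.exp(log_p)
    return p / p.sum()


# Policy is undecided between two actions; prior experience strongly favors the first.
fused = fuse([0.5, 0.5], [0.9, 0.1], w=0.5)  # -> [0.75, 0.25]
```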
[0039] (2.2) In the reinforcement learning process, the value network Vnet evaluates the value of the current state. In a single Markov process, the value function Vnet is the expected sum of feedback values starting from the current state $s$:
$$V^\pi(s) = \mathbb{E}_\pi\left[G_t \mid s_t = s\right]$$
[0040] Here $G_t$ is the system feedback (return). The system state $s_t$ is input to the value network Vnet, which produces an estimated operating value $V_t$. Denoting the system state $s_t$ and the estimated operating value $V_t$ as $x^{(i)}$ and $y^{(i)}$, a Gaussian process is constructed whose Gaussian kernel function and marginal likelihood are the same as in step (2.1). The output of this Gaussian process is the value network Vnet, which evaluates the system state as $V^\pi(s)$.
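The return $G_t$ that the value network regresses on can be computed from a sequence of feedback costs. The sketch below assumes a discount factor `gamma` (the text does not specify one; `gamma = 1` gives the plain sum) and estimates $V^\pi(s)$ by every-visit Monte Carlo averaging. Such estimates could then serve as the $y^{(i)}$ targets of the value Gaussian process; the function names are hypothetical.

```python
from collections import defaultdict


def discounted_returns(feedback, gamma=1.0):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    G, out = 0.0, []
    for r in reversed(feedback):
        G = r + gamma * G
        out.append(G)
    return out[::-1]


def mc_value_estimates(episodes, gamma=1.0):
    """Every-visit Monte Carlo: average return observed after each state."""
    totals, counts = defaultdict(float), defaultdict(int)
    for states, feedback in episodes:
        for s, G in zip(states, discounted_returns(feedback, gamma)):
            totals[s] += G
            counts[s] += 1
    return {s: totals[s] / counts[s] for s in totals}


V = mc_value_estimates([(["a", "b"], [1.0, 2.0]), (["a"], [4.0])])
# V == {"a": 3.5, "b": 2.0}
```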
[0041] (2.3) Based on the policy network Pnet and value network Vnet generated in steps (2.1) and (2.2), the aircraft engine model constructed in step (1) is randomly initialized, a system state $s_0$ is randomly selected and input to the policy network Pnet, and the first maintenance operation $a_0$ is formulated.
[0042] (3) Policy iteration and network training. Based on the initial maintenance operation $a_0$ and initial system state $s_0$ from step (2), the agent interacts with the aircraft engine model built in step (1) to obtain state-operation-value groups. A replay buffer is built, and the state-operation-value group $(s_t, a_t, R(s_t), s_{t+1})$ of each step is stored in it, sampled, and used as a training set. The policy network Pnet and value network Vnet are then updated, and the training set is dynamically updated using the negative log-likelihood.
[0043] (3.1) Starting from the initial system state $s_0$ and initial maintenance operation $a_0$ obtained in step (2.3), the agent interacts with the aircraft engine system to obtain the next system state and the cost of the state transition. Each such transition is organized as a state-operation group $(s_t, a_t, R(s_t), s_{t+1})$. Because both the policy network Pnet and the value network Vnet are trained on batches of data, a replay buffer is constructed to store every state-action group so that batches can be extracted for training; the state-action group of each step is stored in this buffer.
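A minimal replay buffer for the state-operation groups $(s_t, a_t, R(s_t), s_{t+1})$ might look like this. The fixed capacity with oldest-first eviction is an assumption, since the text does not say how the buffer is bounded; `Transition` and `ReplayBuffer` are hypothetical names.

```python
import random
from collections import deque
from typing import NamedTuple, Optional


class Transition(NamedTuple):
    state: tuple       # s_t
    action: int        # a_t
    feedback: float    # R(s_t)
    next_state: tuple  # s_{t+1}


class ReplayBuffer:
    def __init__(self, capacity: int):
        self._data = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, state, action, feedback, next_state) -> None:
        self._data.append(Transition(state, action, feedback, next_state))

    def sample(self, batch_size: int, rng: Optional[random.Random] = None):
        """Uniform random batch without replacement."""
        rng = rng or random.Random()
        return rng.sample(list(self._data), min(batch_size, len(self._data)))

    def __len__(self) -> int:
        return len(self._data)

    def __iter__(self):
        return iter(self._data)


buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push((t,), t % 2, float(t), (t + 1,))
batch = buf.sample(2, random.Random(0))
```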
[0044] (3.2) Training data are extracted in batches from the replay buffer constructed in step (3.1), and the advantage function $A^\pi$ is calculated from the value network's prediction $V^\pi(s_t)$ for states under policy $\pi$:
[Equation: image not reproduced]
[0045] Using the system states $s_t^{\text{old}}$ and advantage values $A^\pi(s_t,a_t)^{\text{old}}$ of the original sampling points (i.e., those of the previous period) together with the system states $s_t^{\text{new}}$ and advantage values $A^\pi(s_t,a_t)^{\text{new}}$ of the new sampling points (i.e., those of the current period), the Gaussian process fitting dataset is extended and the policy network is updated. The input of the extended Gaussian process is the combined set of old and new system states, and its output is the combined set of old and new advantage values, where $A^\pi(s_t,a_t)$ is the advantage function of performing operation $a_t$ in state $s_t$. The extended global covariance matrix is:
[Equations: images not reproduced]
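Extending the Gaussian process dataset amounts to stacking the old and new inputs and outputs; the extended global covariance matrix then has the block structure $\begin{bmatrix} K_{oo} & K_{on} \\ K_{no} & K_{nn} \end{bmatrix}$. The sketch assumes a squared-exponential kernel and a small noise term, both hypothetical because the kernel equation is not reproduced; `rbf` and `extend_dataset` are illustrative names.

```python
import numpy as np


def rbf(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def extend_dataset(X_old, y_old, X_new, y_new, noise_var=1e-4):
    """Stack old and new sampling points and assemble the extended covariance
    matrix from its four blocks. In practice K_oo could be cached from the
    previous period; here it is recomputed to keep the sketch self-contained."""
    K_oo = rbf(X_old, X_old)
    K_on = rbf(X_old, X_new)
    K_nn = rbf(X_new, X_new)
    K = np.block([[K_oo, K_on], [K_on.T, K_nn]])
    X = np.vstack([X_old, X_new])        # old and new system states
    y = np.concatenate([y_old, y_new])   # old and new advantage values
    return X, y, K + noise_var * np.eye(len(X))


rng = np.random.default_rng(0)
X_old, X_new = rng.normal(size=(4, 2)), rng.normal(size=(3, 2))
y_old, y_new = rng.normal(size=4), rng.normal(size=3)
X, y, K = extend_dataset(X_old, y_old, X_new, y_new)
```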
[0046] The value function $V^\pi(s_t)$ is corrected using the temporal-difference error (TD error), batch training is performed, and the Gaussian process is refitted:
[Equation: image not reproduced]
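As an illustration of the TD correction, the sketch assumes the one-step TD(0) form $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$, with the targets $r_t + \gamma V(s_{t+1})$ used to refit the value Gaussian process. The discount `gamma`, the optional terminal mask `done`, and the name `td_targets` are assumptions. With costs as feedback, values are expected cumulative costs and the same algebra applies.

```python
import numpy as np


def td_targets(feedback, v_s, v_next, gamma=0.95, done=None):
    """One-step TD targets r + gamma * V(s') and TD errors (target - V(s)).

    `done` marks terminal transitions, whose bootstrap term is dropped.
    """
    feedback = np.asarray(feedback, dtype=float)
    v_s = np.asarray(v_s, dtype=float)
    v_next = np.asarray(v_next, dtype=float)
    not_done = np.ones_like(v_next) if done is None else 1.0 - np.asarray(done, dtype=float)
    targets = feedback + gamma * not_done * v_next
    return targets, targets - v_s


targets, errors = td_targets([1.0, 2.0], [0.5, 1.0], [2.0, 4.0], gamma=0.5)
# targets == [2.0, 4.0]; errors == [1.5, 3.0]
```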
[0047] (3.3) For the training data batched in step (3.2), the negative log-likelihood between each training sample and the value network Vnet is calculated:
[Equations: images not reproduced]
[0048] Here, $e$ is the set of sampling points used for training, and $e^*$ is the set of resampled points.
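The training-set update in (3.3) can be sketched with the per-point Gaussian predictive negative log-likelihood $\tfrac12\log(2\pi\sigma^2) + (y-\mu)^2/(2\sigma^2)$, where $\mu$ and $\sigma^2$ are the value network's predictive mean and variance. The pairwise swapping rule below is one reading of [0015], not a confirmed procedure: the best-fitted training point is replaced by the worst-fitted resampled point for as long as the candidate fits worse. Skipping candidates already in the training set is an added assumption, and the function names are hypothetical.

```python
import numpy as np


def gaussian_nll(y, mu, var):
    """Per-point negative log-likelihood of y under N(mu, var)."""
    y, mu, var = (np.asarray(a, dtype=float) for a in (y, mu, var))
    return 0.5 * np.log(2.0 * np.pi * var) + (y - mu) ** 2 / (2.0 * var)


def swap_well_fitted(train_idx, train_nll, cand_idx, cand_nll):
    """Replace the best-fitted training points (lowest NLL) with the worst-fitted
    resampled candidates (highest NLL), pair by pair, while the candidate fits worse."""
    train_idx = list(train_idx)
    train_nll = np.asarray(train_nll, dtype=float)
    present = set(train_idx)
    keep = np.asarray([j for j, c in enumerate(cand_idx) if c not in present], dtype=int)
    cand_idx = [cand_idx[j] for j in keep]  # drop candidates already in training
    cand_nll = np.asarray(cand_nll, dtype=float)[keep]

    best_train_first = np.argsort(train_nll)
    worst_cand_first = np.argsort(cand_nll)[::-1]
    for ti, ci in zip(best_train_first, worst_cand_first):
        if cand_nll[ci] <= train_nll[ti]:
            break  # remaining candidates fit at least as well; stop swapping
        train_idx[ti] = cand_idx[ci]
    return train_idx


train = swap_well_fitted([10, 11, 12], [0.1, 2.0, 0.5], [20, 21], [3.0, 0.2])
# train == [20, 11, 12]
```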
[0049] (3.4) The number of training steps is set in advance, and it is determined whether the number of system operation steps has reached it. If so, training stops; otherwise, action selection continues and step (3) is repeated until the number of operation steps reaches the number of training steps.
[0050] Example 1. An embodiment of the present invention was implemented on a machine equipped with an Intel(R) Xeon(R) Gold 6330 CPU, an NVIDIA GeForce RTX 3090 graphics processor, and 128 GB of memory. Following the above embodiment, the results shown in Figures 4 to 6 were obtained.
[0051] The General Electric GE90 engine was modeled as the aircraft engine, and the engine's state transition matrix and system feedback were fitted; the fitting results are shown in Figure 4. Figure 4(a) compares the model predictions with the actual results, with predicted values on the vertical axis and actual values on the horizontal axis; circles represent actual data points and solid plus signs represent predicted points. The method accurately predicts the state transition process and system feedback of the aircraft engine. Figure 4(b) shows a global sensitivity analysis of the system feedback model for the high-pressure turbine, low-pressure turbine, high-pressure compressor, and low-pressure compressor. Figure 5(a) compares this method with a rule-based maintenance method, a deep value-based reinforcement learning algorithm, and an actor-critic reinforcement learning algorithm. The bottom curve is the convergence result of this method, and the top dashed line is the rule-based maintenance algorithm; nonparametric reinforcement learning substantially reduces the system feedback cost relative to the rule-based algorithm. The two other convergence curves near the top are for the deep value-based and actor-critic reinforcement learning algorithms, respectively. After 1000 training iterations, these two algorithms still had not matched the results that nonparametric reinforcement learning achieved after 100 iterations. Both comparison algorithms also oscillated strongly during convergence, indicating low sampling efficiency and unstable training, whereas nonparametric reinforcement learning trained relatively stably with high sampling efficiency.
These results show that nonparametric reinforcement learning improved data utilization efficiency by more than 100 times compared with conventional reinforcement learning methods. The method raises sampling efficiency by fusing system uncertainty into learning and, through dynamic data updates, by improving how well the training data as a whole fit the model. Figure 5(b) shows box plots of the average results obtained by applying each trained method five times, comparing this method with the rule-based maintenance method, the value-based deep reinforcement learning algorithm, and the actor-critic reinforcement learning algorithm. The leftmost box plot is the result for this method. Nonparametric reinforcement learning achieved markedly lower maximum, minimum, and mean system feedback costs than the comparison methods, whereas the actor-critic model sometimes produced divergent decisions. The method therefore formulates aircraft engine maintenance strategies stably. Figure 6 shows the specific maintenance strategies the method formulated for the high-pressure turbine, low-pressure turbine, high-pressure compressor, and low-pressure compressor, with circles denoting minor repairs and inverted triangles denoting overhauls. Overall, the method models the aircraft engine mathematically with high accuracy and formulates rational maintenance strategies for its individual components.
[0052] This embodiment also provides an optimization system for aircraft engine maintenance strategies based on nonparametric reinforcement learning that implements the above embodiment. The terms "module," "unit," etc., used below refer to a combination of software and/or hardware capable of realizing a predetermined function.
[0053] As shown in Figure 3, the nonparametric reinforcement learning-based aircraft engine maintenance strategy optimization system according to this embodiment includes the following modules:
- An aircraft engine model building module, which constructs a system state transition model and a system feedback model based on Bayesian networks and Gaussian processes from the collected aircraft engine operating data.
- A network initialization module, which initializes the policy network and the value network.
- A maintenance operation determination module, which formulates maintenance measures from the current system state of the aircraft engine model so as to avoid severe deterioration of the aircraft engine and reduce the system feedback cost $r(s_t)$.
- A replay buffer module, which stores the state-action-value group of each step so that batches can be extracted to train the policy network and the value network.
- A model training module, which trains the policy network and the value network on state-action-value groups extracted in batches by the replay buffer module.
- A dynamic update module, which dynamically updates the training set by replacing data that fit the value network well with data that fit it poorly.
[0054] The above embodiments serve only to illustrate the advantages and specific steps of the present invention and do not limit its scope of application. Simulations of other forms performed by those skilled in the art on the basis of the present invention also fall within the scope of protection of the present invention.
Claims
1. A method for optimizing an aircraft engine maintenance strategy based on nonparametric reinforcement learning, characterized by comprising the steps of: collecting historical aircraft engine operating data and constructing an aircraft engine model; constructing a policy network and a value network, randomly selecting an initial system state $s_0$, and formulating an initial maintenance operation $a_0$; interacting with the aircraft engine model on the basis of the initial system state $s_0$ and the initial maintenance operation $a_0$ to obtain a state-action-value group comprising the system state, the maintenance operation, the system feedback cost, and the next system state; constructing a replay buffer and storing the state-action-value group $(s_t, a_t, r(s_t), s_{t+1})$ of each step in the replay buffer; randomly sampling the replay buffer to form a training set; updating the policy network and the value network; updating the training set; and determining whether the number of system operation steps has reached a predetermined number of training steps and, if so, stopping training, acquiring the trained networks, and outputting an optimized aircraft engine maintenance strategy.
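The training loop of claim 1 can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: `toy_engine_step` is a hypothetical stand-in for the fitted Bayesian-network and Gaussian-process engine model with a single degradation level from 0 to 4, the action codes (0 = no maintenance, 1 = minor repair, 2 = overhaul) and the `PRICES` table are invented for illustration, and a random placeholder policy replaces the policy network so that only the replay buffer mechanics and the step-count stopping rule are shown.

```python
import random
from collections import deque

PRICES = [0.0, 2.0, 6.0]  # hypothetical prices: no action, minor repair, overhaul


class ReplayBuffer:
    """Fixed-capacity store of (s_t, a_t, r(s_t), s_{t+1}) groups."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform random sampling without replacement forms the training set.
        return random.sample(list(self.data), min(batch_size, len(self.data)))

    def __len__(self):
        return len(self.data)


def toy_engine_step(s, a):
    """Hypothetical engine model: one component with degradation level 0..4."""
    cost = s ** 2 + PRICES[a]             # toy operating cost r1 + maintenance cost r2
    if a == 2:                            # overhaul restores the component
        s_next = 0
    elif a == 1:                          # minor repair removes one level
        s_next = max(s - 1, 0)
    else:                                 # no action: may degrade by one level
        s_next = min(s + random.choice([0, 1]), 4)
    return cost, s_next


def train(total_steps, batch_size=32, seed=0):
    random.seed(seed)
    buffer = ReplayBuffer(capacity=10_000)
    s, a = random.randint(0, 4), random.randint(0, 2)  # random s_0 and a_0
    for _ in range(total_steps):          # stop at the predetermined step count
        r, s_next = toy_engine_step(s, a)
        buffer.push(s, a, r, s_next)
        # The claimed method would update the policy/value networks and the
        # training set from this batch; those updates are omitted here.
        _batch = buffer.sample(batch_size)
        s, a = s_next, random.randint(0, 2)  # placeholder for the policy network
    return buffer
```

For example, `train(50)` returns a buffer holding 50 state-action-value groups.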
2. The method for optimizing an aircraft engine maintenance strategy based on nonparametric reinforcement learning according to claim 1, characterized in that the parameter values of the system state include, for each component $i$ of the engine, the stagnation pressure, stagnation temperature, stagnation enthalpy, pressure ratio $\pi_i$, specific heat at constant pressure of the working medium, specific heat ratio of the working medium $\gamma_i$, and efficiency factor $\eta_i$; and the maintenance operation includes overhaul and minor repair.
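A possible container for the state parameters listed in claim 2 is sketched below, flattening the per-component parameters into one system state vector. The class, field, and function names are hypothetical, and the `NONE` action (no maintenance) is an assumption; the claim itself names only overhaul and minor repair.

```python
from dataclasses import astuple, dataclass
from enum import IntEnum


class MaintenanceAction(IntEnum):
    NONE = 0          # assumption: "no maintenance" is also selectable
    MINOR_REPAIR = 1
    OVERHAUL = 2


@dataclass
class ComponentState:
    """State parameters of one engine component i, as listed in claim 2."""
    stagnation_pressure: float
    stagnation_temperature: float
    stagnation_enthalpy: float
    pressure_ratio: float      # pi_i
    cp: float                  # specific heat at constant pressure of the working medium
    gamma: float               # specific heat ratio gamma_i
    efficiency: float          # efficiency factor eta_i


def system_state(components):
    """Flatten the per-component parameters into one system state vector s_t."""
    return [value for c in components for value in astuple(c)]
```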
3. The method for optimizing an aircraft engine maintenance strategy based on nonparametric reinforcement learning according to claim 1, characterized in that the aircraft engine model includes a system state transition model and a system feedback model; the system state transition model is obtained by fitting, with a Bayesian network, the system state transition probabilities derived from a Markov process, wherein the Bayesian network parameters are estimated by maximizing the likelihood $L$ and are optimized with the expectation-maximization algorithm to obtain the system state transition function $P(s_{t+1} \mid s_t)$; the system feedback model fits the operating cost $r_1(s_t)$ from the system state $s_t$ with a Gaussian process, and takes as the maintenance cost $r_2(s_t)$ the sum of the prices of the maintenance operations that the policy network determines for each component in the system state $s_t$; the sum of the operating cost $r_1(s_t)$ and the maintenance cost $r_2(s_t)$ is the system feedback cost $r(s_t)$ of the system state $s_t$; and the next system state $s_{t+1}$ is then obtained from the system state transition function $P(s_{t+1} \mid s_t)$.
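The two halves of the engine model in claim 3 can be illustrated with a simplified sketch. It deliberately swaps in a simpler estimator than the claim: assuming the states are discretized and fully observed, maximum likelihood estimation of $P(s_{t+1} \mid s_t)$ reduces to normalized transition counts, so no expectation-maximization step is needed (EM matters when some network variables are hidden). Giving unseen states a uniform row is a further assumption. The cost function follows the claim's decomposition $r(s_t) = r_1(s_t) + r_2(s_t)$; `operating_cost` is a hypothetical callable standing in for the Gaussian process fit of $r_1$, and the component names and prices in the test are invented.

```python
import numpy as np


def mle_transition_matrix(sequences, n_states):
    """Count-based MLE of P(s_{t+1} | s_t) from fully observed discrete states."""
    counts = np.zeros((n_states, n_states))
    for seq in sequences:
        for s, s_next in zip(seq[:-1], seq[1:]):
            counts[s, s_next] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Assumption: states never observed get a uniform transition row.
    P = np.full((n_states, n_states), 1.0 / n_states)
    seen = row_sums[:, 0] > 0
    P[seen] = counts[seen] / row_sums[seen]
    return P


def feedback_cost(state, actions, operating_cost, prices):
    """r(s_t) = r1(s_t) + r2(s_t), following the decomposition in claim 3.

    operating_cost: callable standing in for the GP fit of r1 (hypothetical).
    actions: maps each component to its chosen maintenance operation.
    prices: maps each maintenance operation to its price.
    """
    r1 = operating_cost(state)
    r2 = sum(prices[a] for a in actions.values())
    return r1 + r2
```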
4. The method for optimizing an aircraft engine maintenance strategy based on nonparametric reinforcement learning according to claim 1, characterized in that the policy network selects an action based on the system state $s_t$: the maintenance operation execution probability $P_{net}(a_t)$ is obtained through a Gaussian process and calculated by the following formula 10,
[Formula 10]
where $A^\pi(s_t, a_i)$ is the advantage function, predicted by the policy network, of the $i$-th maintenance operation $a_i$ in the system state $s_t$; an action is then selected by maximum probability estimation, and the system outputs the formulated maintenance operation $a_t$, calculated by the following formula 11,
[Formula 11]
and the value network takes the system state $s_t$ as input and outputs, through a Gaussian process, the value estimate $V^\pi(s_t)$ of the system state $s_t$.
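Formulas 10 and 11 are not reproduced in this text. One common choice consistent with the surrounding description, assumed here rather than taken from the patent, is a softmax over the predicted advantages for formula 10 and selection of the most probable operation for formula 11. The sketch assumes a larger advantage means a better operation; if the advantages are computed from costs, negate them first. The `temperature` parameter is a hypothetical addition.

```python
import numpy as np


def action_probabilities(advantages, temperature=1.0):
    """Assumed form of formula 10: softmax over the advantages A(s_t, a_i)."""
    z = np.asarray(advantages, dtype=float) / temperature
    z -= z.max()                      # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()


def select_action(advantages):
    """Assumed form of formula 11: pick the most probable maintenance operation."""
    return int(np.argmax(action_probabilities(advantages)))
```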
5. The method for optimizing an aircraft engine maintenance strategy based on nonparametric reinforcement learning according to claim 4, characterized in that the action selection uses an ε-greedy algorithm to choose between maintenance operation search (exploration) and maintenance policy use (exploitation) when formulating the next maintenance operation: if the ε-greedy sample is greater than ε, a maintenance operation search is executed, in which the policy network obtains the advantage function by random sampling within the uncertainty range of its Gaussian process; if the sample is less than ε, the maintenance policy is used, and the policy network outputs the mean of the advantage function from its Gaussian process.
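The ε-greedy rule of claim 5 is sketched below with the comparison direction exactly as the claim words it: exploration when the uniform sample exceeds ε, so exploration happens with probability $1 - \varepsilon$, the reverse of the most common textbook convention. Treating the Gaussian process uncertainty range as an independent normal distribution per operation, parameterized by its posterior mean and standard deviation, is an assumption, as is choosing the operation by the argmax of the sampled or mean advantages. The function name is hypothetical.

```python
import numpy as np


def choose_action(adv_mean, adv_std, epsilon, rng):
    """epsilon-greedy rule with the comparison direction worded in claim 5.

    adv_mean, adv_std: GP posterior mean and standard deviation of the
    advantage for each maintenance operation in the current state.
    """
    u = rng.random()
    if u > epsilon:
        # Maintenance operation search: draw advantages within the GP
        # uncertainty range (here, independent Gaussian draws).
        adv = rng.normal(adv_mean, adv_std)
    else:
        # Maintenance policy use: take the GP posterior mean.
        adv = np.asarray(adv_mean, dtype=float)
    return int(np.argmax(adv))
```

A usage example is `choose_action([0.1, 0.9, 0.3], [0.2, 0.2, 0.2], 0.8, np.random.default_rng(0))`.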
6. The method for optimizing an aircraft engine maintenance strategy based on nonparametric reinforcement learning according to claim 4, characterized in that a Bayesian method is used to fuse prior maintenance experience data with the maintenance operations obtained by the policy network, thereby providing decision guidance.
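Claim 6 does not state how prior experience and the policy output are combined. A minimal Bayes-rule sketch, assuming the prior is a categorical distribution built from counts of past maintenance decisions (with a hypothetical Laplace smoothing parameter `alpha`) and treating the policy network's operation probabilities as the likelihood, is:

```python
import numpy as np


def prior_from_counts(counts, alpha=1.0):
    """Categorical prior over operations from past maintenance decision counts."""
    c = np.asarray(counts, dtype=float) + alpha   # Laplace smoothing
    return c / c.sum()


def fuse_with_prior(policy_probs, prior_probs):
    """Posterior over operations proportional to prior x policy likelihood."""
    post = np.asarray(prior_probs, dtype=float) * np.asarray(policy_probs, dtype=float)
    total = post.sum()
    if total == 0:
        raise ValueError("prior and policy give zero mass to every operation")
    return post / total
```

With a flat policy output the posterior equals the prior, and with a flat prior it equals the policy output, so the prior only shifts decisions where experience is informative.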
7. The method for optimizing an aircraft engine maintenance strategy based on nonparametric reinforcement learning according to claim 1, characterized in that the step of updating the policy network and the value network comprises the substeps of:
extending the Gaussian process fitting dataset with the system states and advantage values $A^\pi(s_t, a_t)_{\text{old}}$ of the original sampling points and the system states and advantage values $A^\pi(s_t, a_t)_{\text{new}}$ of the new sampling points, so that the input of the Gaussian process is the combined set of original and new sampling-point states and the output is the corresponding combined set of advantage values, where $A^\pi(s_t, a_t)$ is the advantage function of performing operation $a_t$ in state $s_t$, calculated by the following formula 14,
[Formula 14]
in which $\gamma$ is the discount (decay) coefficient and $r(s_t, a_t)$ is the system feedback cost function;
constructing the Gaussian process from the covariance matrix and calculating the updated policy network by the following formula 16,
[Formula 16]
in which the noise term is the variance of the observation noise, $I$ is the identity matrix, and $K$ is the covariance matrix computed with the Gaussian process kernel function by the following formula 15,
[Formula 15]
and correcting the value estimate $V^\pi(s_t)$ of the value network by the temporal-difference method, performing batch training, and updating the value network.
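The update in claim 7 can be sketched with the standard Gaussian process regression equations. Formulas 14 to 16 are not reproduced in this text, so the following forms are assumptions: the one-step temporal-difference advantage $A = r + \gamma V(s_{t+1}) - V(s_t)$ for formula 14, a squared-exponential kernel matrix for formula 15, and the textbook posterior mean $k(X_*, X)\,(K + \sigma_n^2 I)^{-1} y$ for formula 16, where $\sigma_n^2$ denotes the observation-noise variance. Function names and default hyperparameters are hypothetical.

```python
import numpy as np


def rbf(X1, X2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of X1 and X2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale ** 2)


def td_advantage(r, v_s, v_next, gamma=0.95):
    """Assumed one-step form of formula 14: A = r + gamma * V(s') - V(s)."""
    return r + gamma * v_next - v_s


def gp_posterior_mean(X_old, y_old, X_new, y_new, X_query, noise_var=1e-2):
    """Extend the GP dataset with the new points, then return the posterior mean."""
    X = np.vstack([X_old, X_new])              # combined sampling-point states
    y = np.concatenate([y_old, y_new])         # combined advantage values
    K = rbf(X, X)                              # covariance matrix (cf. formula 15)
    weights = np.linalg.solve(K + noise_var * np.eye(len(X)), y)
    return rbf(X_query, X) @ weights           # assumed form of formula 16
```

Solving the linear system rather than inverting $K + \sigma_n^2 I$ explicitly is the usual numerically safer choice.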
8. The method for optimizing an aircraft engine maintenance strategy based on nonparametric reinforcement learning according to any one of claims 3 to 7, characterized in that the kernel function of the Gaussian process is calculated by the following formula 6,
[Formula 6]
where $x$ and $x'$ are input vectors, the distance term is the squared Euclidean distance between them, $l$ is the length-scale parameter, which controls the smoothness of the function, and the amplitude term is the signal variance, which controls the overall amplitude of the kernel function.
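Formula 6 is not reproduced here. The quantities the claim lists (a squared Euclidean distance, a length scale $l$ controlling smoothness, and a signal variance controlling amplitude) match the standard squared-exponential kernel, given below as a likely form rather than the patent's exact expression; the factor 2 in the denominator is a common convention that may differ, and $\sigma_f^2$ is a symbol chosen here for the signal variance:

$$
k(x, x') = \sigma_f^2 \exp\!\left( -\frac{\lVert x - x' \rVert^2}{2 l^2} \right)
$$

A larger $l$ makes the kernel decay more slowly with distance, so the fitted function varies more smoothly; $\sigma_f^2$ equals $k(x, x)$, the prior variance at any single input.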
9. The method for optimizing an aircraft engine maintenance strategy based on nonparametric reinforcement learning according to claim 1, characterized in that the step of updating the training set comprises the substeps of: computing the negative log-likelihood of each training datum in the training set to compare its degree of fit to the value network; performing batch random sampling from the replay buffer again and computing the negative log-likelihood of the resampled points to obtain their degree of fit to the value network; and replacing data in the training set that have a high degree of fit with resampled data that have a low degree of fit.
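The training-set update of claim 9 can be sketched as follows, assuming Gaussian predictive distributions from the value network's Gaussian process so that the negative log-likelihood of a point has the closed form $\tfrac12\log(2\pi\sigma^2) + (y-\mu)^2/(2\sigma^2)$. Pairing the lowest-NLL training points with the highest-NLL resampled points, and stopping once a candidate no longer fits worse than the point it would replace, is one reasonable reading of the claim rather than a detail it specifies. Function names are hypothetical.

```python
import numpy as np


def gaussian_nll(y, mean, var):
    """Per-point negative log predictive likelihood under N(mean, var)."""
    return 0.5 * np.log(2 * np.pi * var) + 0.5 * (y - mean) ** 2 / var


def swap_training_points(train_X, train_y, train_nll, cand_X, cand_y, cand_nll):
    """Replace well-fitted training points with poorly fitted resampled points."""
    train_X, train_y = np.array(train_X, dtype=float), np.array(train_y, dtype=float)
    train_nll = np.asarray(train_nll, dtype=float)
    cand_nll = np.asarray(cand_nll, dtype=float)
    best_fit = np.argsort(train_nll)           # lowest NLL first = best fit
    worst_fit = np.argsort(cand_nll)[::-1]     # highest NLL first = worst fit
    for i, j in zip(best_fit, worst_fit):
        if cand_nll[j] <= train_nll[i]:
            break                              # remaining candidates fit no worse
        train_X[i], train_y[i] = cand_X[j], cand_y[j]
    return train_X, train_y
```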
10. An optimization system for aircraft engine maintenance strategies based on nonparametric reinforcement learning, characterized by comprising an aircraft engine model building module, a network initialization module, a maintenance operation determination module, a replay buffer module, a model training module, and a dynamic update module, wherein: the aircraft engine model building module constructs a system state transition model and a system feedback model based on Bayesian networks and Gaussian processes from the collected aircraft engine operating data; the network initialization module initializes the policy network and the value network; the maintenance operation determination module formulates maintenance strategies based on the current system state of the aircraft engine model so as to avoid severe deterioration of the aircraft engine and reduce the system feedback cost; the replay buffer module stores the state-action-value group of each step; the model training module trains the policy network and the value network on state-action-value groups extracted in batches by the replay buffer module; and the dynamic update module dynamically updates the training set by replacing data that fit the value network well with data that fit it poorly.