Satellite fault detection
On-board AI and machine learning systems for satellites facilitate rapid fault detection and isolation, addressing ground-based latency issues and enhancing system robustness and mission management.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- THE CHARLES STARK DRAPER LABORATORY INC
- Filing Date
- 2025-12-22
- Publication Date
- 2026-07-02
AI Technical Summary
Current diagnostics and prognostics for satellites are typically performed on the ground due to computational constraints, leading to delays in fault detection and response, and existing on-board systems lack robustness and efficiency in identifying and isolating faults.
Implementing on-board fault detection systems using artificial intelligence and machine learning algorithms that utilize sensor data and control data to identify, isolate, and self-improve through communication with other satellites, enabling rapid fault detection and proactive system health monitoring.
Enables rapid fault detection and isolation, reduces latency, increases system robustness, and enhances mission management by utilizing the full range of available sensor data without relying on ground-based analysis.
Description
Attorney Docket No 40093-151-B-WO
Client Docket No. CSDL.7364.10
SATELLITE FAULT DETECTION
CROSS-REFERENCES TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 63/738,512 filed December 23, 2024, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
[0002] Aspects of the disclosure generally relate to systems and methods for failure prediction in vehicles and, more specifically, in satellites.
BACKGROUND OF THE INVENTION
[0003] Diagnostics and prognostics for health monitoring of flight vehicles, such as satellites, are typically performed on the ground, remote from the vehicles, due to computational constraints of in-flight vehicles such as satellites. Currently, the flight industry generally flies on-board flight code with guaranteed execution time and reactive behavior.
SUMMARY OF THE INVENTION
[0004] Using artificial intelligence or machine learning would be advantageous because doing so could create a more systematic way of doing fault protection, although trust is low in these solutions.
[0005] Presented herein are systems and methods for on-board fault prediction in vehicles, such as satellites. The system can include one or more vehicles in operation, such as a satellite orbiting a body. The vehicle can include a variety of sensors to collect information or data about the vehicle and/or the environment in which it operates. In some cases, one or more models on-board the vehicle can identify information as being related to a potential fault in the vehicle. The one or more models can identify the fault based on a constraint system of inputs and outputs as well as technical measures. The one or more models can provide an indication of the type of fault, a source of the fault, or a quantified rating of the fault, based on the information on-board the satellite. The system can further self-improve through communication with other vehicles, such as other satellites in communication with the satellite as a part of a constellation of satellites.
[0006] In a general aspect, a system for on-board fault detection in vehicles includes a vehicle including a plurality of sensors configured to provide sensor data about the vehicle and one or more processors configured to receive the sensor data from the plurality of sensors, receive control data associated with operation of the vehicle, generate, using one or more models, predicted values for one or more of the sensors based on at least one of the sensor data and the control data, determine a fault in the vehicle based at least in part on a comparison of the predicted values to measured values from the sensor data, and identify one or more components of the vehicle responsible for the fault.
[0007] Aspects may include one or more of the following features.
[0008] The one or more models may include a network of constraints interconnected at nodes, each constraint representing a model of a component or subsystem of the vehicle. The model may include a physical or analytical model of the component or subsystem of the vehicle.
[0009] Generating the predicted values may include propagating information forward through the network of constraints from input nodes to output nodes to generate forward-propagated values and propagating information backward through the network of constraints from output nodes to input nodes to generate backward-propagated values.
[0010] Determining the fault may include generating residuals at the nodes by comparing forward-propagated values, backward-propagated values, and sensed values and comparing the residuals to thresholds. The one or more processors may be further configured to propagate uncertainties associated with the sensor data and the models through the network of constraints, and the thresholds may be scaled based on the propagated uncertainties.
[0011] Identifying the one or more components responsible for the fault may include iteratively suspending one or more constraints in the network of constraints, for each suspended constraint, re-propagating information through the network with the constraint suspended, and identifying a constraint as corresponding to a faulty component when suspension of that constraint causes the residuals to return to nominal levels.
[0012] At least one constraint in the network of constraints may include a machine learning model trained to perform one or both of forward propagation and backward propagation. The machine learning model may include a neural network trained to predict output values from input values in forward propagation and predict required input values from observed output values in backward propagation.
[0013] The one or more models may include a probabilistic recurrent neural network configured to predict a probability density function over sensor variables at a next time step based on a history of the sensor data and the control data. The probabilistic recurrent neural network may generate parameters representing a Gaussian Mixture Model including mean vectors for a number of Gaussian components, covariance matrices for the number of Gaussian components, and weights for the number of Gaussian components.
[0014] Determining the fault may include comparing measured sensor values to the predicted probability density function and identifying a fault when the measured sensor values are statistically inconsistent with the predicted probability density function. Identifying the one or more components responsible for the fault may include activating a network of constraints representing component models, iteratively suspending individual constraints in the network, and identifying which constraint suspension causes anomaly scores to return to nominal levels.
[0015] The vehicle may include a satellite orbiting a celestial body. The one or more processors may be located onboard the vehicle. The number of sensors may include at least two of: a tachometer, a magnetometer, a star tracker, a coarse sun sensor, a medium sun sensor, an accelerometer, a pressure transducer, or a thermistor.
[0016] The one or more processors may be further configured to transmit fault data identifying the fault and the one or more components responsible for the fault to a ground station. The system may further include a controller configured to receive fault data identifying the one or more components responsible for the fault and, in response to the fault data, reconfigure operation of the vehicle to compensate for the fault by at least one of switching to a redundant component or generating a surrogate output for a faulty component using data from other components.
[0017] In another general aspect, a method for on-board fault detection in vehicles includes receiving sensor data from a number of sensors on-board a vehicle, receiving control data associated with operation of the vehicle, generating, using one or more models, predicted values for one or more of the sensors based on at least one of the sensor data and the control data, determining a fault in the vehicle based at least in part on a comparison of the predicted values to measured values from the sensor data, and identifying one or more components of the vehicle responsible for the fault.
[0018] In another general aspect, a non-transitory computer-readable medium stores instructions that, when executed by one or more processors on-board a vehicle, cause the one or more processors to receive sensor data from a number of sensors on the vehicle, receive control data associated with operation of the vehicle, generate, using one or more models, predicted values for one or more of the sensors based on at least one of the sensor data and the control data, determine a fault in the vehicle based at least in part on a comparison of the predicted values to measured values from the sensor data, and identify one or more components of the vehicle responsible for the fault.
[0019] At least one aspect of the present disclosure is directed to a system for onboard fault prediction in vehicles. The system can include a vehicle. The vehicle can include a set of sensors to provide on-board data about the vehicle. The system can include one or more processors coupled with memory. The one or more processors can receive the data from the vehicle. The one or more processors can determine, by one or more models, a prospective fault in the vehicle based on the data.
[0020] System health monitoring for satellites can be done in a systematic way by only modeling nominal behavior and detecting faults using artificial intelligence (AI) based algorithms such as those discussed herein. By using AI to detect faults, the system can achieve comparable or better diagnostics resolution with existing system data rather than needing multiple copies of the same data. For example, on-board systems typically have access to orders of magnitude more data than can be transmitted to ground processing systems. If AI or other forms of machine learning are used within an on-board fault detection system, the robustness of the system can therefore be increased, and the system can quickly detect and isolate faults that typically require hours or days to determine because the data must be transmitted to, and then processed at, a base station. Such a system would require an ability to discriminate based on the mission phase or safety criticality of an action. For example, the system could execute initial responses based on a preliminary categorization of the proposed action but then go into safe mode if operations are not at a critical time.
[0021] System health monitoring for satellites typically focuses on fault diagnostics (i.e., determining whether something is already failing). Industry typically separates out prognostics (i.e., determining whether something could fail) at a component level. Using this approach, prognostics could be done on board with similar diagnostics algorithms such as those discussed herein. If these algorithms are used to do prognostics, modifications would need to be made to alter the detection thresholds to minimize the number of false alarms or missed detections, such as carrying out fault prognostics as a low priority task that does not interfere with critical operations or using machine learning to reduce the number of false alarms. For example, machine learning algorithms could assist with this optimization by identifying a clustering of false alarms and by interacting with AI algorithms to provide feedback to the base system. The value of such fault prognostics increases with the number of satellites or sensors used in the data collection, such as, for example, in a constellation.
[0022] Other features and advantages of the invention are apparent from the following description, and from the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] FIG. 1 is a satellite orbiting Earth.
[0024] FIG. 2 is a satellite system.
[0025] FIG. 3 is an embodiment of a fault detector.
[0026] FIG. 4 is a propagation module.
[0027] FIG. 5 shows propagation and fault detection where no fault is detected.
[0028] FIG. 6 shows propagation and fault detection where a fault is detected.
[0029] FIG. 7 is another embodiment of a fault detector.
[0030] FIG. 8 is a comparison of anomaly detection using Gaussian mixture modeling and single Gaussian distribution.
[0031] FIG. 9 is a diagram of a system for on-board fault prediction in vehicles.
[0032] FIG. 10 is a constraint diagram for determining on-board faults in vehicles.
DETAILED DESCRIPTION
1 OVERVIEW
[0033] Referring to FIG. 1, a satellite 1000 orbits a celestial body 1001 (e.g., Earth) and communicates with a ground station 1002 via a communication link 1004. In general, the satellite 1000 transmits telemetry data, system health status information, and fault detection results to the ground station 1002 using the communication link 1004.
[0034] As is described in greater detail below, the satellite 1000 includes on-board fault detection capabilities that analyze sensor data from multiple sensors on-board the satellite along with control data to detect, identify, and isolate faults in satellite components and subsystems. By performing fault detection on-board the satellite 1000 using the full range of available sensor data, the system can rapidly detect faults and take corrective action without the delays inherent in ground-based analysis. The on-board processing also enables the satellite to prioritize which data to downlink, making efficient use of limited communication bandwidth.
[0035] Referring to FIG. 2, the satellite 1000 includes a controller 2006, a fault detector 2008, sensors 2010 (e.g., a tachometer 2012 (TAC), magnetometer 2014 (MAG), star tracker 2016 (ST), and other sensors), actuators 2018 (e.g., reaction wheels 2020 (RWs), thrusters 2022, and gimbals 2024), and a communications system 2026. The controller 2006 receives sensor data from the sensors 2010 and generates commands for the actuators 2018 to maintain the satellite's desired state, such as attitude control, orbital maintenance, and payload operations. The fault detector 2008 receives both the sensor data from the sensors 2010 and the commands generated by the controller 2006 for the actuators 2018 and analyzes the consistency between commanded actions and measured sensor responses to identify faults in satellite components and subsystems.
[0036] When a fault is identified and isolated, the fault detector 2008 provides fault data to the controller 2006, which can then take corrective action such as reconfiguring operation of the satellite to compensate for the fault by, for example, switching to redundant sensors, adjusting control algorithms, creating virtual sensors through analytic redundancy, or entering a safe mode. The fault detector 2008 also communicates fault data via the communications system 2026 for transmission to ground stations, enabling ground operators to monitor system health and make informed decisions about satellite operations.
2 FAULT DETECTOR
[0037] Referring to FIG. 3, in one example, the fault detector 2008 uses a stochastic constraint suspension algorithm that processes sensor data 3028 and control data 3029 to identify and isolate faults, which are output as fault data 3030. For example, the fault detector 2008 includes a propagation module 3032, a constraint network 3034, a thresholding module 3036, and a fault isolation module 3038.
[0038] Very generally, the fault detector 2008 implements a two-step process for fault detection and isolation. In the first step, the propagation module 3032 receives sensor data 3028 and control data 3029 (e.g., actuator commands) and propagates this information through the constraint network 3034 (as is described in greater detail below). In some examples, the constraint network 3034 includes interconnected models (e.g., physical models) of satellite components and subsystems. The propagation module 3032 performs both forward propagation and backward propagation through the constraint network 3034, where information flows in both directions across constraints within the network. For example, actuator commands are propagated forward through component models to predict what sensor measurements should be observed at output locations, while sensor measurements are propagated backward through the same models to predict what the actuator commands and intermediate states should have been to produce those measurements.
[0039] This bidirectional propagation generates multiple predicted values at nodes within the constraint network 3034, where nodes represent interfaces between components. The propagation module 3032 generates residuals at these nodes by comparing the forward-propagated values, backward-propagated values, and actual sensed values. When these values are consistent with one another (within expected uncertainty bounds), the system is operating nominally. The thresholding module 3036 applies statistical hypothesis testing to these residuals to determine whether a fault is present.
[0040] If a fault is detected, the second step begins. In the second step, the fault isolation module 3038 iteratively modifies the constraint network 3034 by suspending individual constraints or sets of constraints (representing individual components or sets of components) and directs the propagation module 3032 to re-propagate sensor data 3028 and control data 3029 through the modified network. When suspension of a particular constraint or set of constraints causes all residuals to return to nominal levels, that component or set of components is identified as faulty.
[0041] The fault detector generates the fault data 3030, including for example an alert that a fault exists and a list of one or more components that were identified as faulty.
2.1 Step 1: Detection
[0042] The first step detects faults by propagating data through the constraint network 3034 and applying thresholds to the residuals generated by that propagation.
2.1.1 Propagation Module
[0043] Referring to FIG. 4, as mentioned above, when activated, the propagation module 3032 propagates sensor data 3028 and control data 3029 through the constraint network 3034 to generate residual data 4040. In some examples, the constraint network 3034 includes multiple constraints 4042, each representing a physical or analytical model of a satellite component or subsystem. The constraints 4042 are interconnected at nodes 4044, which represent interfaces between components where sensor measurements or analytical predictions can be compared. Information is propagated bidirectionally across the constraints 4042. Sensed input values enter the constraint network 3034 and are propagated forward through the constraints 4042 to generate propagated output values at downstream nodes 4044. Similarly, sensed output values are propagated backward through the constraints 4042 to generate propagated input values at upstream nodes 4044. At each node 4044, the propagation module 3032 compares the sensed value, the forward-propagated value, and the backward-propagated value to generate a residual.
[0044] In general, the propagation module 3032 not only propagates state values (e.g., torques, velocities) through the constraint network 3034, but also propagates uncertainties associated with sensor measurements and analytical models. Each constraint 4042 models both the nominal behavior of a component and the uncertainties inherent in that component's operation and measurement. As values are propagated forward and backward through the constraints 4042, the associated uncertainties are also propagated and accumulated, such that each predicted value at a node 4044 has an associated uncertainty distribution (e.g., a Gaussian distribution or Gaussian Mixture Model).
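For intuition, the uncertainty bookkeeping can be written out for the simplest possible constraint. The following is a sketch under assumptions that are not stated in the disclosure: a scalar linear constraint $y = a x + b$ with $a \neq 0$, Gaussian uncertainties, and an independent model uncertainty of variance $\sigma_c^2$. The symbols $a$, $b$, $\sigma_c$, and the "sensed" subscript are illustrative only.

```latex
% Forward propagation through y = a x + b (input mean \mu_x, input variance \sigma_x^2)
\mu_y = a\,\mu_x + b, \qquad \sigma_y^2 = a^2\,\sigma_x^2 + \sigma_c^2

% Backward propagation through x = (y - b)/a, starting from a sensed output
\mu_x = \frac{\mu_{y,\mathrm{sensed}} - b}{a}, \qquad
\sigma_x^2 = \frac{\sigma_{y,\mathrm{sensed}}^2 + \sigma_c^2}{a^2}
```

In this simplified picture, every constraint crossed adds its own model variance, so predictions that travel through many constraints arrive at a node with wider uncertainty than predictions that travel through few.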
[0045] As a practical example, consider a reaction wheel subsystem. A commanded torque (control data) enters the constraint network 3034 at an input node 4044. This command is propagated forward through a first constraint 4042 that models the reaction wheel's torque generation, producing a predicted torque value at an intermediate node 4044. This torque value is then propagated forward through a second constraint 4042 that models the spacecraft's rotational dynamics, producing a predicted angular velocity at an output node 4044 where a gyroscope provides a sensed angular velocity measurement. The sensed angular velocity is propagated backward through the rotational dynamics constraint 4042 to produce a backward-propagated torque value at the intermediate node 4044, and this torque value is propagated backward through the reaction wheel constraint 4042 to produce a backward-propagated command value at the input node 4044. At each node 4044, residuals are computed by comparing the forward-propagated, backward-propagated, and sensed values. If the reaction wheel is functioning properly, these values will agree within expected tolerances.
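The reaction wheel chain described above can be sketched in a few lines. This is a minimal illustration, not the disclosed implementation: it assumes a single rotation axis, a wheel modeled as a constant gain, a rigid body integrated over one time step, and a sign convention in which positive torque increases angular velocity. `WHEEL_GAIN`, `BODY_INERTIA`, `DT`, and all function names are hypothetical.

```python
# Hypothetical two-constraint chain: command -> torque -> angular velocity.
# All constants are illustrative assumptions, not values from the disclosure.
WHEEL_GAIN = 0.95     # assumed torque delivered per unit commanded torque
BODY_INERTIA = 12.0   # assumed spacecraft inertia about one axis, kg*m^2
DT = 1.0              # assumed propagation time step, s


def wheel_forward(command):
    """Reaction wheel constraint, forward: command -> torque."""
    return WHEEL_GAIN * command


def wheel_backward(torque):
    """Reaction wheel constraint, backward: torque -> command."""
    return torque / WHEEL_GAIN


def dynamics_forward(torque, omega_prev):
    """Rotational dynamics constraint, forward: torque -> angular velocity."""
    return omega_prev + torque / BODY_INERTIA * DT


def dynamics_backward(omega_now, omega_prev):
    """Rotational dynamics constraint, backward: angular velocity -> torque."""
    return (omega_now - omega_prev) * BODY_INERTIA / DT


def node_residuals(command, omega_prev, omega_sensed):
    """Compare forward- and backward-propagated values at each node."""
    torque_fwd = wheel_forward(command)
    omega_fwd = dynamics_forward(torque_fwd, omega_prev)
    torque_bwd = dynamics_backward(omega_sensed, omega_prev)
    command_bwd = wheel_backward(torque_bwd)
    return {
        "command": abs(command - command_bwd),
        "torque": abs(torque_fwd - torque_bwd),
        "omega": abs(omega_fwd - omega_sensed),
    }
```

With a gyroscope reading that matches the forward prediction, all three residuals are essentially zero. If the wheel is commanded but the sensed angular velocity does not change, the residuals at every node become nonzero, which is the condition the thresholding step is meant to catch.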
[0046] In some examples, one or more of the constraints 4042 in the constraint network 3034 are implemented as machine learning models rather than physics-based analytical models. For example, a constraint 4042 may be implemented as a neural network trained to model the input-output relationship of a particular component. Implementing constraints as machine learning models can be particularly advantageous for addressing the challenge of backward propagation (i.e., model inversion), which can be difficult to formulate analytically for complex component behaviors. A machine learning model can be trained to perform both forward propagation (predicting outputs from inputs) and backward propagation (predicting required inputs from observed outputs) by training on data with known input-output relationships and their associated uncertainties. This approach reduces the engineering effort required to develop analytical backward propagation models and enables the constraint network 3034 to incorporate components whose behavior is difficult to model analytically.
[0047] The residual values determined by the propagation module 3032 are output as residual data 4040.
2.1.2 Thresholding Module
[0048] Referring again to FIG. 3, the thresholding module 3036 processes the residual data 4040 to determine whether a fault is present in the system. In some examples, the thresholding module 3036 performs statistical hypothesis testing on the residuals by comparing them against statistically significant thresholds. In general, the thresholds are not fixed values but are automatically scaled based on the propagated uncertainties in the system.
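One simple way to picture an uncertainty-scaled threshold is a k-sigma test on the residual. The sketch below assumes that the uncertainties contributing to a node are independent and Gaussian, so their variances add. The function name `fault_flag` and the default `k = 3.0` are illustrative choices, not values from the disclosure.

```python
import math


def fault_flag(residual, sigmas, k=3.0):
    """Return True when |residual| exceeds k combined standard deviations.

    sigmas: standard deviations of the independent uncertainties that were
    propagated to this node (an assumption of this sketch).
    """
    combined_sigma = math.sqrt(sum(s * s for s in sigmas))
    return abs(residual) > k * combined_sigma
```

The same residual can therefore be flagged at a low-uncertainty node and accepted at a noisy one: `fault_flag(0.5, [0.1, 0.1])` is true, while `fault_flag(0.5, [0.3, 0.3])` is false.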
[0049] The thresholding module 3036 uses the propagated uncertainties from the propagation module 3032 to establish consistency checking thresholds at each node 4044. For example, if a node 4044 has high propagated uncertainty due to noisy sensors or imprecise models, the threshold at that node will be wider to account for the anticipated noise environment. On the other hand, nodes with low uncertainty will have tighter thresholds. When a residual at a node 4044 exceeds its statistically significant threshold (accounting for the propagated uncertainty), the thresholding module 3036 indicates that a fault has been detected. This approach enables the fault detector 2008 to distinguish between normal system variability and true anomalies or faults.
2.1.3 Simple Example
[0050] Referring to FIG. 5, a simple example of propagation and fault detection illustrates the fault detection process when no fault is present. An input value of 1 enters a simple constraint network 5034 and is propagated forward through a first constraint 5042A that doubles the input, producing a forward-propagated value of 2. This value is then propagated through two parallel constraints (i.e., a second constraint 5042B that adds 3, and a third constraint 5042C that multiplies by 5) to downstream nodes where sensed values of 5 and 10 are measured, respectively. These sensed values are propagated backward through their respective constraints, both producing reverse-propagated values of 2 at an intermediate node 5044. As shown in the table, the forward-propagated value (2), the first reverse-propagated value (2), and the second reverse-propagated value (2) all agree at the intermediate node 5044, indicating consistency throughout the system. Since all propagated values match within expected tolerances, no residual exceeds the threshold and no fault is flagged. This demonstrates nominal system operation where the mathematical relationships modeled by the constraints accurately describe the physical system behavior.
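The arithmetic of this example is small enough to reproduce directly. The sketch below hard-codes the three constraints of FIG. 5. The function names, and the choice to define the residual as the largest disagreement among the estimates, are illustrative assumptions.

```python
# Toy network of FIG. 5: constraint A doubles, B adds 3, C multiplies by 5.
# Function and variable names are illustrative, not taken from the disclosure.

def intermediate_estimates(input_value, sensed_b, sensed_c):
    """Return the three estimates of the intermediate node value."""
    forward = 2 * input_value    # forward through constraint A
    reverse_b = sensed_b - 3     # backward through constraint B
    reverse_c = sensed_c / 5     # backward through constraint C
    return forward, reverse_b, reverse_c


def residual(estimates):
    """Largest disagreement among the estimates at the node."""
    return max(estimates) - min(estimates)


nominal = intermediate_estimates(1, 5, 10)
```

For the nominal inputs, all three estimates equal 2 and the residual is zero, so no threshold is exceeded.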
[0051] Referring to FIG. 6, another simple example of propagation and fault detection illustrates the fault detection process when a fault is present. As in FIG. 5, an input value of 1 enters the constraint network 5034 and is propagated forward through the first constraint 5042A that doubles the input, producing a forward-propagated value of 2. This value is then propagated through the second constraint 5042B and the third constraint 5042C in parallel to downstream nodes. However, in this example, the second constraint 5042B (shown with bold border) is faulty and produces an incorrect sensed value of 0 instead of the expected value of 5. The third constraint 5042C continues to function properly with a sensed value of 10.
[0052] When the sensed values are propagated backward, the faulty upper branch produces a reverse-propagated value of -3 (computed by subtracting 3 from the erroneous sensed value of 0), while the properly functioning lower branch produces a reverse-propagated value of 2. As shown in the table, the forward-propagated value (2) and the second reverse-propagated value (2) agree with each other, but the first reverse-propagated value (-3) is inconsistent with both. This inconsistency generates a residual at the intermediate node 5044 that exceeds the statistical threshold, triggering a fault detection.
2.2 Step 2: Fault Isolation
[0053] Referring again to FIG. 3, when the thresholding module 3036 detects a fault, the fault isolation module 3038 systematically modifies the constraint network 3034 to identify which component or components are responsible for the fault. The fault isolation module 3038 employs a constraint suspension strategy, wherein individual constraints or sets of constraints are iteratively removed from the constraint network 3034, and the propagation module 3032 re-propagates the sensor data 3028 and control data 3029 through the modified network. After each modification, the thresholding module 3036 evaluates whether the residuals have returned to nominal levels. If suspension of a particular constraint or set of constraints causes all residuals to fall below their statistical thresholds, that constraint (and the component it represents) is identified as faulty.
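Constraint suspension can be illustrated on the toy network of FIGS. 5 and 6 (constraint A doubles, B adds 3, C multiplies by 5). In this sketch, which is one possible reading rather than the disclosed implementation, suspending a constraint means ignoring the estimate it contributes at the intermediate node and checking whether the remaining estimates agree. The names `isolate`, `inconsistent`, and the tolerance `tol` are hypothetical.

```python
def estimates_at_intermediate(x, yb, yc):
    """Estimate of the intermediate node contributed by each constraint."""
    return {"A": 2 * x, "B": yb - 3, "C": yc / 5}


def inconsistent(values, tol=1e-6):
    """True when the estimates disagree by more than the tolerance."""
    values = list(values)
    return max(values) - min(values) > tol


def isolate(x, yb, yc, tol=1e-6):
    """Return the constraints whose suspension restores consistency."""
    est = estimates_at_intermediate(x, yb, yc)
    if not inconsistent(est.values(), tol):
        return []  # step 1 found no fault, so there is nothing to isolate
    suspects = []
    for name in est:
        # Suspend one constraint by dropping its estimate, then re-check.
        remaining = [v for key, v in est.items() if key != name]
        if not inconsistent(remaining, tol):
            suspects.append(name)
    return suspects
```

For the FIG. 6 inputs, `isolate(1, 0, 10)` returns `["B"]`: only with constraint B suspended do the remaining estimates (2 and 2) agree. For the nominal FIG. 5 inputs, `isolate(1, 5, 10)` returns an empty list.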
[0054] In some examples, the fault isolation module 3038 employs various strategies to efficiently identify faulty components. For example, in a sequential approach, the fault isolation module 3038 removes constraints one at a time in a predetermined order (e.g., based on component criticality or failure history) until the faulty component is identified. This approach scales linearly with the number of components in the constraint network.
[0055] In another example, the fault isolation module 3038 employs a localization strategy that prioritizes constraints based on proximity to nodes exhibiting the largest residuals. For example, if nodes 4044 in a particular region of the constraint network 3034 show high residuals while other regions remain nominal, the fault isolation module 3038 first tests constraints 4042 adjacent to those high-residual nodes. This reduces the search space and accelerates fault identification by focusing on the most likely fault locations.
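A localization ordering of this kind might be sketched as a sort over constraints, keyed by the largest residual at any node each constraint touches. The data layout (a mapping from constraint names to adjacent node names), the function name `rank_by_residual`, and the example component names are assumptions made for illustration. The sketch also assumes every constraint touches at least one node.

```python
def rank_by_residual(adjacency, residuals):
    """Order constraints so those next to the largest residuals come first.

    adjacency: dict mapping constraint name -> list of adjacent node names
    residuals: dict mapping node name -> residual magnitude
    """
    def score(constraint):
        return max(residuals.get(node, 0.0) for node in adjacency[constraint])

    return sorted(adjacency, key=score, reverse=True)


adjacency = {
    "wheel": ["cmd", "torque"],
    "dynamics": ["torque", "omega"],
    "heater": ["temp_cmd", "temp"],
}
residuals = {"cmd": 0.1, "torque": 0.2, "omega": 4.0, "temp_cmd": 0.0, "temp": 0.05}
order = rank_by_residual(adjacency, residuals)
```

Here the constraint adjacent to the high-residual `omega` node is tested first and the thermally unrelated constraint last, which is the search-space reduction the paragraph describes.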
[0056] In yet another example, the fault isolation module 3038 employs a hierarchical approach that leverages the hierarchical structure of the constraint network 3034. Rather than testing individual low-level component constraints, the fault isolation module 3038 first suspends higher-level subsystem constraints to narrow down which subsystem contains the fault, then progressively tests finer-grained components within the identified faulty subsystem. This divide-and-conquer strategy significantly reduces the number of iterations required for complex systems with many components.
[0057] In some examples, the fault isolation module 3038 may identify multiple simultaneous faults or common-cause faults by testing combinations of constraints. For example, if suspending a single constraint does not restore all residuals to nominal levels, the fault isolation module 3038 may test pairs or sets of constraints to identify multiple concurrent failures or cascading fault effects.
2.3 Machine Learning State of Health
[0058] Referring to FIG. 7, in other examples, the fault detector is implemented as a machine learning-based state of health algorithm (ML-SOH). The fault detector 7008 of FIG. 7 uses a machine learning approach rather than the constraint-based approach shown in previous figures. The fault detector 7008 includes an ML-SOH module 7032 and a thresholding module 7036. The ML-SOH module 7032 receives sensor data 7028 and control data 7029 as inputs and generates predictions with quantified uncertainty. The thresholding module 7036 compares the predictions against actual sensor measurements to identify anomalies, outputting fault data 7030 when discrepancies exceed defined thresholds.
[0059] In some examples, the ML-SOH module 7032 is formulated as a probabilistic recurrent neural network that models sensor and actuator time series data. The neural network uses the observed history of sensor data 7028 and control data 7029 to predict future sensor readings, with quantified uncertainty. Sensor readings that are incompatible with the probabilistic predictions are flagged as anomalies by the thresholding module 7036. This machine learning approach scales well to large numbers of sensors with minimal additional engineering effort and can run in real time after training.
[0060] At each time step, the probabilistic recurrent neural network of the ML-SOH module 7032 predicts a probability density function (PDF) over all the modeled sensor and actuator variables at the next time step. The recurrent layer of the neural network allows it to condition the predicted PDF on the full history of observations from the sensor data 7028 and control data 7029, not just the most recent observation. The neural network can use all sensor inputs to predict the full joint probability density function over all the sensor outputs.
[0061] To handle multimodal or asymmetric uncertainties in the underlying sensor data 7028, the ML-SOH module 7032 can implement a Gaussian Mixture Model (GMM) to represent the system uncertainty. In such an algorithm, the neural network outputs parameters, conditioned upon the previously observed inputs, representing the M mean vectors, covariance matrices, and weights that describe the Gaussian components that make up the overall PDF.
[0062] This uncertainty model is a weighted sum of individual Gaussian distributions and can represent arbitrary continuous probability distributions, given enough modes. The GMM can trivially represent multimodal distributions but can also represent asymmetric or long-tailed unimodal distributions by stacking Gaussian distributions on top of each other.
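As a hedged illustration of the weighted-sum form, the sketch below evaluates a one-dimensional mixture density with NumPy. The function name `gmm_pdf` and all numerical parameters are hypothetical values chosen only to show a bimodal case and a long-tailed unimodal case.

```python
import numpy as np

def gmm_pdf(x, weights, means, stds):
    """Evaluate a 1-D Gaussian mixture density at the points x."""
    x = np.asarray(x, dtype=float)[..., None]       # add a trailing axis for the modes
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    comps = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2.0 * np.pi))
    return (weights * comps).sum(axis=-1)           # weighted sum over the modes

# Bimodal example: two well-separated modes (illustrative values only).
bimodal = dict(weights=[0.5, 0.5], means=[-2.0, 2.0], stds=[0.5, 0.5])
# Long-tailed unimodal example: a narrow and a wide component stacked on one mean.
long_tail = dict(weights=[0.8, 0.2], means=[0.0, 0.0], stds=[0.5, 3.0])

# Numerical check that both mixtures integrate to approximately one.
grid = np.linspace(-30.0, 30.0, 60001)
dx = grid[1] - grid[0]
area_bimodal = gmm_pdf(grid, **bimodal).sum() * dx
area_long_tail = gmm_pdf(grid, **long_tail).sum() * dx
```

In the long-tailed case, the wide component keeps the density far from the mean many orders of magnitude above what a single narrow Gaussian would give, which is the behavior the stacking description refers to.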
[0063] The thresholding module 7036 receives the predicted probability distributions from the ML-SOH module 7032 and compares them against actual sensor measurements from the sensor data 7028. In some examples, the thresholding module 7036 calculates a fault anomaly score that quantifies the discrepancy between the predicted distribution and the observed measurement. When the fault anomaly score exceeds a defined threshold, the thresholding module 7036 generates fault data 7030 indicating a detected anomaly. The threshold can be tuned to balance the tradeoff between false positives and missed detections.
[0064] In some implementations, the GMM approach implemented in the ML-SOH module 7032 provides significant improvements in anomaly detection accuracy. For example, referring to FIG. 8, anomaly detection results for the GMM versus a standard Gaussian model for medium sun sensor (MSS) current measurements over several orbits demonstrate substantial performance gains. Using a GMM reduces the false positives observed on current measurements during eclipse by approximately a factor of ten. Further tuning of the fault anomaly score threshold in the thresholding module 7036 can reduce the number of false positives even further.
[0065] In some examples, alternative distance metrics, such as the Mahalanobis distance, are implemented in the thresholding module 7036 to reduce false positives and can be calculated significantly faster than the GMM approach. However, the Mahalanobis distance requires careful tuning of the threshold in each individual channel, which reduces the flexibility and portability of the approach to different systems. While the Mahalanobis distance is simple to define for the Gaussian output model, there are multiple plausible ways to define it for the Gaussian mixture model. In addition, the Mahalanobis distance may struggle to detect performance degradation faults because the degradation becomes built into the covariance over time. The GMM approach therefore provides a more robust and adaptable solution for fault detection across diverse satellite systems and operating conditions.
[0066] In some examples, to address issues with the Mahalanobis distance as discussed above, various alternative anomaly detection metrics may be implemented. Two categories of improvements may be considered. First, adjustments may be made to the original metrics to better reflect the expected types of errors. These adjustments may include considering a sliding window of anomaly scores and considering sub-system level faults in addition to channel-level faults. Secondly, additional metrics may be implemented.
[0067] In some examples, the anomaly detection metrics that are considered include a p-value metric (i.e., the probability of obtaining a value less than or equal to the true value, given the predicted distribution), a proxy Mahalanobis distance metric (i.e., selecting the closest mean in the GMM and calculating the Mahalanobis distance, with thresholding to determine fault), a difference of values metric (i.e., calculating the difference in neighboring values), a negative log likelihood metric (i.e., the negative log likelihood of the true data given the GMM), smoothed metrics (i.e., for each of the given metrics, considering a sliding window around a given timestep), and sub-system level metrics (i.e., for each of the given metrics, only considering channel-level faults if faults are also detected at a sub-system level).
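Three of the listed metrics can be sketched for a single channel as follows. This is a minimal illustration in plain Python under stated assumptions: the mixture is one-dimensional, "closest mean" is taken as the smallest absolute difference, and the function names and numerical values are hypothetical.

```python
import math

def gmm_cdf(x, weights, means, stds):
    """p-value metric: P(X <= x) under a 1-D Gaussian mixture."""
    return sum(w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
               for w, m, s in zip(weights, means, stds))

def proxy_mahalanobis(x, means, stds):
    """Distance to the closest component mean, in units of that component's std."""
    m, s = min(zip(means, stds), key=lambda ms: abs(x - ms[0]))
    return abs(x - m) / s

def gmm_nll(x, weights, means, stds):
    """Negative log likelihood of x under the mixture.

    A production version would likely use a log-sum-exp form to avoid underflow.
    """
    density = sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
                  for w, m, s in zip(weights, means, stds))
    return -math.log(density)

# Illustrative two-mode prediction for one channel (values are made up).
weights, means, stds = [0.7, 0.3], [1.0, 0.0], [0.1, 0.05]
nominal, suspect = 0.98, 0.6
scores = {name: (gmm_cdf(v, weights, means, stds),
                 proxy_mahalanobis(v, means, stds),
                 gmm_nll(v, weights, means, stds))
          for name, v in (("nominal", nominal), ("suspect", suspect))}
```

In this example the suspect reading lies between the two modes, so both the proxy Mahalanobis distance and the negative log likelihood are larger for it than for the nominal reading.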
[0068] In some examples, the anomaly detection metric that is ultimately used combines three of these options: difference of outputs, negative log likelihood, and smoothed metrics. First, the negative log likelihood is calculated for each channel in the model output at each timestep ($\mathrm{NLL}_t$). Then, the difference between neighboring values is calculated at each timestep:
$$\mathrm{dNLL}_t = \mathrm{NLL}_t - \mathrm{NLL}_{t-1}$$
[0069] A moving average smooths the values over a given timestep window w:
[0070] Finally, the ratio of the raw difference value to the moving average is thresholded to determine fault:
$$\frac{\mathrm{dNLL}_t}{\mathrm{smooth}(\mathrm{dNLL}_t)} > \mathrm{thresh}$$
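The three steps above (difference, smoothing, thresholded ratio) can be sketched for one channel as shown below. The exact form of the moving average is not recoverable from the text, so this sketch assumes a trailing mean of the absolute differences over the last `window` steps and adds a small `eps` to avoid division by zero; the function name, window length, threshold, and synthetic data are all hypothetical.

```python
import numpy as np

def ratio_fault_flags(nll, window, thresh, eps=1e-9):
    """Flag timesteps where dNLL_t / smooth(dNLL_t) exceeds thresh.

    Assumptions (not specified in the text): the moving average is a trailing
    mean of |dNLL| over the last `window` steps, and `eps` guards against
    division by zero.
    """
    nll = np.asarray(nll, dtype=float)
    d_nll = np.diff(nll, prepend=nll[0])        # dNLL_t = NLL_t - NLL_{t-1}, with dNLL_0 = 0
    abs_d = np.abs(d_nll)
    csum = np.concatenate(([0.0], np.cumsum(abs_d)))
    idx = np.arange(len(nll))
    start = np.maximum(idx - window + 1, 0)     # shorter window at the start of the trace
    smooth = (csum[idx + 1] - csum[start]) / (idx - start + 1)
    ratio = d_nll / (smooth + eps)
    return ratio > thresh, ratio

# Synthetic single-channel NLL trace: small nominal noise, then a jump at t = 60.
rng = np.random.default_rng(1)
nll = rng.normal(0.0, 0.05, size=100)
nll[60:] += 5.0
flags, ratio = ratio_fault_flags(nll, window=10, thresh=5.0)
```

Because the ratio is normalized by recent variability, the same threshold value has a comparable meaning on quiet and noisy channels, although per-channel selection from nominal data is still needed.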
[0071] The threshold for each channel is then selected based on examination of nominal trajectory results. Training models with more realistic data and additional tuning can result in decreased time to recognize the faults.
3 ALTERNATIVE EMBODIMENTS AND DETAILS
[0072] Spacecraft employ fault detection and isolation (FDI) mechanisms in order to maintain system reliability and availability of a space asset. These mechanisms are traditionally based on hardware redundancy or analytic redundancy to address fail-over situations. FDI approaches generally fall into three categories: sensor-based, model-based (MB), and knowledge-based. Each method depends on traditional state of health (SOH) sensors and, in general, does not take into account potentially valuable information from other mission system sensors and measurands.
[0073] A typical satellite generates many thousands of measurands from a vast array of sensors, such as voltmeters, accelerometers, pressure transducers, thermistors, etc. Simple ground-based SOH systems sample the telemetry from specific sensors to pull out only a few measures considered characteristic of satellite performance, and generally alarm when measures fall above or below set thresholds. The relationship between health states and measurands is not well defined and does not result in optimal performance across platforms, satellites, etc. Additionally, subtle patterns in the telemetry may be lost. While some systems were developed to monitor the full range of measurands and note these subtle patterns, patterns had to be identified ahead of time and were not amenable to on-board changes in state of the vehicle or environment.
[0074] To address these and other technical problems, a system is described herein for in-flight detection and prediction of faults in vehicles. The systems and methods described herein can utilize constraint suspension to detect and isolate faults occurring in a system by using nominal models of system components and interconnections to check for consistency between sensor measurements. By developing models of nominal system behavior, source knowledge of system components can be catalogued. No models of failure modes or off-nominal behavior are required. Unlike some systems, constraint suspension as described herein is capable of detecting faults both anticipated and unanticipated. The nominal models are not restricted by assumptions of linearity, continuity, or differentiability. The component models can be algebraic, algorithmic, lookup tables, or even high-fidelity simulations unto themselves. Moreover, these models may contain both analog and discrete information. By propagating system information through component models to search for inconsistency, faults can be identified earlier than simple thresholds around expected individual component telemetry.
[0075] Further, once a fault is detected, the system and methods described herein can include an isolation system that systematically examines components and combinations of components until a hypothesis regarding the cause of the fault is not disproven. The approach allows both single and multiple cause faults to be considered. This includes the possibility of detecting common cause faults between pieces of redundant hardware (e.g., failures caused by sensor firmware bugs). In order to reduce computational complexity and simplify the process of system design, models can be connected hierarchically. Exploiting this capability permits the user to model systems, subsystems, and components at varying levels of fidelity without sacrificing feasibility or diagnostic resolution.
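The isolation search can be illustrated with a deliberately simplified sketch. It assumes that every component is a redundant sensor observing the same quantity, so that "consistency" reduces to agreement within a tolerance; a fuller implementation would propagate values through component models instead. The function names, sensor names, readings, and tolerance are hypothetical.

```python
from itertools import combinations

def consistent(readings, suspended, tol):
    """Remaining sensors are consistent if their readings agree within tol."""
    active = [v for name, v in readings.items() if name not in suspended]
    return max(active) - min(active) <= tol if active else True

def isolate(readings, tol, max_faults=2):
    """Return the smallest sets of components whose suspension restores consistency."""
    if consistent(readings, frozenset(), tol):
        return []                      # no fault detected, nothing to isolate
    for size in range(1, max_faults + 1):
        hypotheses = [set(combo) for combo in combinations(sorted(readings), size)
                      if consistent(readings, frozenset(combo), tol)]
        if hypotheses:
            return hypotheses          # stop at the first size that is not disproven
    return None                        # no hypothesis within max_faults explains the data

# Hypothetical redundant sensors observing the same quantity.
single = isolate({"s1": 10.0, "s2": 10.1, "s3": 14.0, "s4": 10.05}, tol=0.5)
double = isolate({"s1": 10.0, "s2": 10.1, "s3": 14.0, "s4": 18.0, "s5": 10.05}, tol=0.5)
```

Searching single components before pairs keeps the common single-fault case cheap, while the second call shows how a multiple-cause fault is reached only after every single-component hypothesis has been disproven.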
[0076] The systems and methods described herein identify variables, models, and workflows that support autonomous fault prognostics and diagnostics on-board a satellite platform. Neighboring constellation platforms provide additional evidence through direct communications. The systems and methods described herein provide methods to assess individual platform data for fault information, models that enable fault prediction and diagnostics, workflows to rank data sources for detailed ground reviews, data sharing of platform & payload SOH knowledge across the common relevant operating picture (CROP), and allocations for functions performed on-board and on ground, among others.
[0077] Through the integration of various machine learning models and systems, a combination of inputs and outputs can determine and predict faults in in-flight vehicles, such as spacecraft failure faults or space radiation events, among others. Further, the systems and methods described herein can reduce computational power associated with determination of these faults, by communicating with nearer vehicles as opposed to communicating all information to Earth for calculations to be performed. In this manner, bandwidth, computational power, and latency can be reduced across vehicle systems, such as a satellite system orbiting in a low-Earth-orbit environment. In some examples, the machine learning approaches to fault detection may reduce computational power relative to the nominal constraint suspension algorithm, regardless of whether communication with nearby vehicles or the ground is performed. This is because the iterative nature of constraint suspension (remove a component / constraint and re-propagate sensed values) can be computationally expensive depending on the constraint or network, whereas machine learning can simplify and reduce these complexities.
[0078] Further, autonomous machine learning (ML) approaches can be included in the systems and methods described here. Autonomous machine learning approaches can identify onboard data used in SOH assessment not available to ground operators, better monitor SOH using all available measures, begin to predict failures and remaining useful life (RUL) of components, assign causes to those failures, perform onboard SOH processing in order to reduce ground operator intervention, improve overall space situational awareness, and provide real-time reactive control system inputs, among others.
[0079] Integrating online system health monitoring and prognostics onboard the spacecraft in a constellation yields a number of advantages that would not be available if those functions were performed solely on the ground. These advantages primarily stem from several factors: reduced overall system latency, larger pool (and greater diversity) of input information, improved data sharing, and enhanced mission management effectiveness. Ground telemetry is limited by the frequency of ground downlink opportunities and limited bandwidth of ground downlink. The advantages of onboard analysis combine to allow faults to be identified and isolated sooner than would otherwise be available. In addition, the greater quantity of data available has the potential to improve vehicle health monitoring (VHM) fault detection and isolation (FDI) statistics (i.e., probability of missed detection, probability of false alarm), the granularity of isolation capability (i.e., more available data between smaller system components can lead to more system components that can be diagnosed), and improved prognostic accuracy and granularity.
[0080] FIG. 9 depicts a diagram of a system 100 for detecting faults in in-flight vehicles. The system 100 can include a data processing system 105. The data processing system 105 can include memory storage, such as a database. The data processing system 105 can include technical measures 115, models 110A-N, and faults 120. The system 100 can include a vehicle 125. The vehicle 125 can include sensors 135 and data 130. In some cases, the data processing system 105 is communicatively coupled with the vehicle 125. For example, the data processing system 105 can be onboard the vehicle 125 and configured to communicate (e.g., via fiber optics, copper wires, network communications, etc.) with the vehicle and its subcomponents. Embodiments may comprise additional or alternative components or omit certain components from those of FIG. 9, and still fall within the scope of this disclosure.
[0081] The vehicle 125 is any device capable of movement, orbit, or housing devices. In some cases, the vehicle 125 is a spacecraft, such as a satellite or rocket. The vehicle 125 can have onboard a variety of sensors 135.
[0082] The sensors 135 can detect or identify data 130 about the vehicle 125 or the environment in which it operates. The sensors 135 can include tachometers (TAC), magnetometers (MAG), star trackers (ST), coarse sun sensors (CSS), and medium sun sensors (MSS), among others.
[0083] The data processing system 105 can include any hardware or software capable of receiving the data 130 and determining a fault 120 of the vehicle 125 based on the data 130. The data processing system 105 can include the models 110A-N. In some cases, the models 110 detect the faults 120 as compared to one or more technical measures 115 (or technical performance measures).
[0084] The system can include one or more technical measures 115, also referred to as technical performance measures (TPMs). The TPMs, which describe Measures of Effectiveness (MOEs) at the mission level and Measures of Performance (MOPs) at the platform level, are mapped to representative objective functions including system latency, data sharing, resource utilization, and mission management effectiveness, among others.
[0085] The system can include one or more models 110A-N (also referred to herein as the "model(s) 110"). In some cases, the models 110 can be or include machine learning models. Each of the models 110 selected for autonomous, onboard vehicle health monitoring (VHM) can have different strengths and weaknesses. In some cases, certain models of the models 110 can be more suited for certain subsets of data 130.
[0086] The system can include the data 130. In some cases, data 130 is data that is available from the vehicle 125. In some cases, the data 130 can be defined to provide SOH information. In some cases, the data 130 can include data beyond SOH information, such as data collected by sensors not directed towards the health of the vehicle 125. The data 130 can include non-traditional platform data, labeled data, or data sources with their operating environments (healthy, single system stress, platform failure, etc.), among others.
[0087] For example, the data 130 can include a subset of the data produced by on-orbit and simulated satellites. For example, the data 130 can include categories of data, such as inputs, such as data provided to flight software. This can include sensor data and ground commands. Categories of data can include outputs, such as commands generated by flight software, actuator commands and informing the GPS of spacecraft attitude. Categories of data can include telemetry, such as diagnostic data generated by the vehicle 125 or the data processing system 105. This data provides diagnostic state information of the vehicle 125. The state information can include operating modes, sensor processing telemetry, attitude and rate filter telemetry, and Vehicle Health and Safety Monitoring (VHSM) telemetry.
[0088] The data 130 can be characterized. In some cases, the data processing system 105 characterizes the data 130. The data 130 can be characterized according to various attributes, such as Volume, Velocity, Variety, Veracity, and Value. In some cases, the sensors 135 provide the data 130 as lightweight messages at high frequency, where lightweight means a small number of structured or unstructured data fields within each file.
[0089] The data 130 can include the following example data:
[0090] Each model 110 can be trained on a respective or overlapping subset of the data 130. In some cases, one or more of the models 110 can be trained off of a subset of the data 130 over a period of time. For example, the model 110A can be trained off of a subset of data related to SOH information of various satellites over a period of time. For example, the model 110A can be trained off of a subset of the data from one or more sensors in one or more vehicles over a period of time. In some cases, the models 110 can take as input for training nominal analytical models of subsystem and component performance onboard the vehicle 125.
[0091] Each model 110 can utilize a respective subset of the data 130 to determine or predict the SOH of satellite systems (e.g., the vehicle 125). For example, given limited data 130 (a few hours of telemetry data from a couple of sensors) and a requirement that the prediction be made in near real-time onboard, decision tree-based ensemble approaches such as gradient boosting or adaptive boosting decision trees could be used as one or more of the models 110. As another example, given a larger set of the data 130 (weeks to months) from more sensors (tens to hundreds) and available GPUs, but no prior knowledge of which features are relevant, neural network approaches are suitable for one or more of the models 110, not only to perform health prognostics but also to learn features that are important for health monitoring and prediction. In contrast, if very limited data 130 is available, the models 110 can include unsupervised learning algorithms that perform anomaly detection in order to detect system behavior that does not fit the normal operating condition.
[0092] Machine learning (ML) and data analytics (DA) employed at the platform level provide an opportunity to improve MOPs while enhancing overall mission effectiveness at the constellation level. Smarter space assets will be able to continuously monitor system SOH and perform DPHM to autonomously achieve a desired state independent of ground resources. Using onboard ML, the systems and methods described herein can enable continuous DPHM and autonomous system control that incorporates not just the SOH data from sensors, actuators, payloads, etc. but also the engineering data to effect system state changes and provide rapid responses to internal problems as well as external threats. These improvements, along with data sharing across the constellation, provide contributions at the platform level to be shared with the common relevant operating picture (CROP) across the constellation.
[0093] The data processing system 105 can, with the models 110, assess the data 130. Assessing the data 130 can include both direct and indirect methods because the data are both high dimensional and temporal. In low dimensions, modeling the relationship among independent variables (x_i) is straightforward when the number of observations (n) is much greater than the number of variables (i), i.e., n ≫ i. When that assumption is violated, modeling the relationship becomes ill conditioned: the covariance matrix becomes sparse or populated by values that differ by multiple orders of magnitude. The models 110 can reduce the variable dimensionality by selecting a subset of the data, projecting the data into a smaller domain, and using non-linear non-statistical models, among others.
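The ill-conditioning and one of the listed remedies, projection into a smaller domain, can be sketched with NumPy. The sketch uses principal component projection via the singular value decomposition, which is one possible projection and is chosen here as an assumption; the synthetic data, channel counts, and the 99% variance cutoff are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars, n_latent = 20, 50, 3          # fewer observations than variables

# Synthetic telemetry: 50 channels driven by 3 hidden factors plus small noise.
latent = rng.normal(size=(n_obs, n_latent))
mixing = rng.normal(size=(n_latent, n_vars))
data = latent @ mixing + 0.01 * rng.normal(size=(n_obs, n_vars))

centered = data - data.mean(axis=0)
cov = centered.T @ centered / (n_obs - 1)
cov_rank = np.linalg.matrix_rank(cov)        # at most n_obs - 1, so cov is singular

# Project onto the leading principal directions obtained from the SVD.
_, sing, vt = np.linalg.svd(centered, full_matrices=False)
explained = sing**2 / np.sum(sing**2)
k = int(np.searchsorted(np.cumsum(explained), 0.99) + 1)   # components for 99% variance
reduced = centered @ vt[:k].T
reduced_cov = reduced.T @ reduced / (n_obs - 1)
```

The 50-by-50 covariance cannot be inverted because its rank is bounded by the number of observations, whereas the covariance of the projected data is small and full rank, so distance-based or likelihood-based checks remain well defined.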
[0094] The data processing system 105 can review the data 130 upon its assessment. The data processing system 105 can quantify the extent that variations in the SOH and sensor messages relate to health. The initial analyses are performed individually, followed by coupled and multiple sensors’ impact as diagnostics. The models 110 can aggregate platform orbital characteristics into the analysis. The models 110 can, in addition to quantifying utility, analyze the prognostics for how the changes in SOH and sensor magnitude predict future changes in sensor state, particularly failures.
[0095] The models 110 can determine relationships among the various onboard vehicle data 130. For example, in some cases a subset of the data 130A can predict another subset of the data 130B. The models can analyze spatial relationships to separate subsystem health issues from regional platform issues. One possible outcome is, for example, a surrogate or suite of surrogates that allow continued operations even when the direct sensor measurements are unavailable. Another possible outcome is ranking the data 130 for downlinking. This work also provides insights for developing inter-platform communications that confirm / refute candidate causes for unexpected changes in health.
[0096] In some cases, one or more of the models 110 can include prognostic / diagnostic capabilities. In some cases, one or more of the models 110 can evaluate an outcome of another model of the models 110 against one or more technical measures 115. In some cases, the one or more models 110 can evaluate a TPM score (or relative placement) for each TPM for each model 110 based on the data 130. The one or more models 110 can process the data 130 from massively dispersed runs of the candidate algorithms across all reference scenarios to evaluate absolute and relative TPM scoring that can inform fault detection.
[0097] Using classification and prediction for both labeled and unlabeled data, the models 110 can determine which technical measures 115 best describe SOH performance. The models 110 can quantify relationships between non-traditional, labeled, data sources with their operating environments (healthy, single system stress, platform failure, etc.). The data processing system 105 can provide data analytics for assessing datastreams as fault information. Furthermore, the data processing system 105 can enable the generation of the models 110 from various data sources, such as the data 130 and the technical measures 115. The one or more models 110 can determine metrics associated with the identified faults 120, such as detection effectiveness; probability of false alarm; missed detection; isolation effectiveness; diagnostic resolution (i.e., the ability to narrowly isolate the underlying cause of a fault); probability of correct isolation; remaining useful life prediction (i.e., accurately characterizing the time remaining in which a component remains viable); and characterization of performance trends within requirements that can support configuration tuning and longevity enhancement (e.g., understanding long-term actuator bias trends can allow those biases to be planned for in orbit maintenance), among others.
[0098] In some cases, one or more models of the models 110 can select, curate, or otherwise process the data 130 to provide as inputs to other models of the models 110. For example, a first model 110A can select relevant data threads to provide to a second model 110B. In this manner, the models 110 can build time series models for healthy and known unhealthy conditions. The evidence accumulates with correlated predictions across the various models' expected outcomes. Models outside agreement provide evidence for unexpected and possibly anomalous conditions that need to be further explored. For machine learning, this translates to anomaly detection and unsupervised learning paradigms. On-board diagnostics allow for low-latency, data-rich evidentiary exploration to provide the models 110 with the data 130. With the data 130, one or more models 110 can determine relationships between the data 130 and the faults 120, such as whether the anomaly is associated with a particular platform function, whether the anomaly is associated with a particular platform area, whether the anomaly is associated with an external factor, and whether the anomaly is growing at any rate.
[0099] The data processing system 105 can determine agreement among the sensors 135 based on the data 130. In some cases, the data processing system 105 can determine if a similar anomaly is occurring across platforms (e.g., other vehicles communicatively coupled with the vehicle 125).
[0100] In some cases, the system 100 can include more than one vehicle 125. For example, multiple vehicles 125 may include a data processing system 105, or may communicate with one or more data processing systems 105. The data processing system 105 may receive the data 130 from a variety of the vehicles 125. In some cases, a plurality of the vehicles 125 can be a constellation, such as a constellation of intercommunicating satellites. In this manner, the systems and methods described herein have the ability to expand the scope of fault detections from the sensor level of an individual satellite to the platform level of a small constellation using the data 130 available.
[0101] This availability of data 130 from various vehicles 125 can be enhanced through direct inter-satellite links (ISL) in the constellation. These advantages can include:
1) Heterogeneous capability across constellation
2) Higher bandwidth, lower latency communication among spacecraft vs. direct to ground
3) Look-ahead capability
4) Coordinated autonomous response
[0102] In the case of (1), those vehicles in the constellation that have additional computing resources for more intensive onboard system health monitoring and prognostics can support low-latency, aggregated multi-platform processing.
[0103] For (2), allowing all or nearly all of the high bandwidth data available within each satellite's flight computer to be shared across the constellation enables a more capable onboard constellation-level data collection capability. Sharing strategies vary from complete (i.e., between every individual spacecraft and every other spacecraft), to partial (i.e., with nearest neighbors), to converging (i.e., spacecraft share information with neighbor spacecraft and pass on their neighbors’ information to other neighbors, eventually giving all vehicles a complete picture of the constellation state). The advantages of (2) are analogous to those in (1). In addition, the network communication provides an extra layer of robustness that is separate from the links supporting ground communication.
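The converging strategy can be illustrated with a small simulation in plain Python. The sketch assumes synchronous exchange rounds and a fixed link topology, neither of which is stated in the text; the function name `converge` and the example topologies are hypothetical.

```python
def converge(neighbors, max_rounds=100):
    """Simulate the converging strategy: each round, every spacecraft merges
    its neighbors' current knowledge into its own.

    Returns the number of rounds until every spacecraft knows every state, or
    None if the link graph is disconnected (or max_rounds is reached).
    """
    known = {sat: {sat} for sat in neighbors}      # each starts with its own state only
    everyone = set(neighbors)
    for rounds in range(max_rounds + 1):
        if all(k == everyone for k in known.values()):
            return rounds
        # Synchronous update: all exchanges in a round use last round's knowledge.
        updated = {sat: set(known[sat]).union(*(known[n] for n in neighbors[sat]))
                   for sat in neighbors}
        if updated == known:
            return None                            # no progress: graph is disconnected
        known = updated
    return None

# Hypothetical five-satellite ring with links to nearest neighbors only.
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
rounds_ring = converge(ring)
# A line of four satellites: the end nodes are three hops apart.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rounds_line = converge(line)
```

Under these assumptions the number of rounds needed equals the largest hop count between any two spacecraft, which is the trade made against the complete strategy: fewer direct links in exchange for more relay rounds.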
[0104] Further, as described in (3), a constellation can create a look-ahead capability with lower latency to diagnosis and response than any similar capability with the ground in the loop. For example, suppose a vehicle is damaged by debris. Intersatellite communication via ISL during the damage event allows the other platforms to correctly assess that damage occurred and to correctly isolate the causes. The process alerts vehicles in similar orbits (i.e., in imminent danger of the same debris) to take evasive action or to enter safing modes / protect instruments ahead of reaching the likely debris. A cache of intersatellite communicated data from the satellite that was damaged could also be downlinked from a different satellite in order to more thoroughly assess the root cause of any issues (i.e., as opposed to being permanently unavailable due to the failure of the damaged vehicle ahead of its next ground pass).
[0105] Direct communication between vehicles in the constellation enables coordinated autonomous response (4) in which vehicles coordinate collective action in response to a converged upon agreement. For example, in the scenario where evasive action is required, a coordinated action plan could result in vehicles selecting avoidance maneuvers that are guaranteed to not collide with one another or that maximize the likelihood of some vehicles in the constellation surviving (e.g., maximizing physical separation due to uncertain orbit parameters of the debris). Another type of coordinated response could involve autonomously configuring network paths through inter-satellite communication to restore ground connectivity if communication links to the ground are interrupted for some of the vehicles in the constellation.
[0106] In some cases, one or more of the models 110 can intake health and status data 130 to predict a fault 120 such as sensor failure with low frequency data with long prediction times, while the components perform with small variance in their lifecycles and a smooth degradation surface. In some cases, one or more of the models 110 can utilize time series analyses to quantify the extent that a component’s health and status data predict its health. For example, a model of the models 110 including a recurrent neural network, such as a long short-term memory (LSTM) network, can characterize the predictive capacity. The example model can compare the component’s health and status data to other data that are locally relevant, in space and / or kind, to characterize their utility as a surrogate to predict component health and status.
[0107] In some cases, the models 110 can adjust or create the technical measures 115. The technical measures 115 can include metrics and indicators. In some cases, the models 110 can characterize the utility of system 100 and sensor health metrics (e.g., from the sensors 135). The metrics and indicators are multiple single value parameters derived directly from the health and status data 130 or through their datastream analysis.
[0108] The models 110 can perform validation of the technical measures 115 and / or the data 130 and any relationships derived therefrom. In some cases, the models 110 can quantify data information for generation of other models 110. In some cases, the data processing system 105 can train a subset of the models 110 based on the received data 130, the technical measures 115, and / or other models 110 to detect one or more faults 120. For example, the models 110 can exploit deterministic and stochastic functions to assess attribute information content to support prediction and classification in cases of normal and failure operations.
[0109] In some cases, the models 110 or a subset thereof can include a constraint algorithm, such as the Stochastic Constraint Suspension (StCS) algorithm. The StCS algorithm can use one or more models (e.g., the models 110A and 110B) to perform fault detection and isolation. Fault detection can begin with the acquisition of sensor data 130 of the vehicle 125 and the propagation of this data through a physics-based system model. For example, a first model 110A of the models 110 can acquire the data 130 from the sensors 135 and propagate this data through the physics-based system model 110A. The model 110A can include smaller physical models each referred to as a constraint. Information can be propagated in both directions across a constraint, generating multiple values to compare at the interfaces, or nodes, of each constraint. The residual between propagated values can be compared at each node, and stochastic hypothesis testing is performed. If the residual between propagated values exceeds a statistically significant threshold, the model 110A can indicate a fault, prompting the system to diagnose the issue further via a fault isolation process through a second model 110B.
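The node-level residual test can be sketched as follows in plain Python. The sketch assumes independent Gaussian errors on the two propagated estimates and a linear constraint, and it uses a fixed n-sigma threshold as the hypothesis test; the function names, the torque constant, and all numerical values are hypothetical.

```python
import math

def node_fault(forward, backward, n_sigma=3.0):
    """Test whether two propagated estimates of the same node disagree.

    `forward` and `backward` are (value, variance) pairs. Independent Gaussian
    errors are assumed, so the residual variance is the sum of the two variances.
    """
    (vf, var_f), (vb, var_b) = forward, backward
    z = abs(vf - vb) / math.sqrt(var_f + var_b)
    return z > n_sigma, z

def through_gain(value, variance, gain):
    """Propagate a value and its variance across a hypothetical linear constraint y = gain * x."""
    return gain * value, gain**2 * variance

# Hypothetical constraint: wheel torque = k_t * motor current, with k_t = 0.05 N*m/A.
k_t = 0.05
forward = through_gain(2.0, 0.01**2, k_t)            # propagated from a measured current (A)
healthy = node_fault(forward, (0.1004, 0.001**2))    # torque inferred from another sensor
faulty = node_fault(forward, (0.0800, 0.001**2))
```

Because the residual is divided by the combined standard deviation, the same n-sigma threshold adapts to the noise level at each node rather than requiring a hand-tuned absolute tolerance.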
[0110] While some implementations of StCS may individually model the subsystem data paths with the states and uncertainties propagated forwards and backwards through the model and compared at each node, the model 110A can learn and adapt for the forward and backward propagation models. In this manner, computational requirements can be lessened. Without usage of machine learning models 110, models for forward and backwards propagation of the states and uncertainties must be identified, which can be particularly challenging for the uncertainties and backwards propagation of the states. Further, the thresholds for comparing the propagated sensor values are tuned for a given subsystem, which can require significant development effort. By implementing StCS with machine learning models 110 (i.e., the model 110A), forward and reverse uncertainty propagation models can be developed by analyzing results of forward and reverse constraints operating on data 130 with known dispersions and uncertainty characteristics.
[0111] In some cases, the models 110 such as the model 110A can include probabilistic recurrent neural networks to model sensor and actuator time series data. The neural networks can use the observed history of sensor and actuator data to predict future sensor readings, with quantified uncertainty. Sensor readings that are incompatible with the probabilistic predictions can be flagged as anomalies. The model 110A can include neural-network models that execute to produce fast but highly accurate results for forward constraints, backward constraints, and forward / backward uncertainty propagation.
[0112] In some cases, at each time step, the probabilistic recurrent neural network of the model predicts a probability density function (PDF) over all the modeled sensor and actuator variables at the next time step. The recurrent layer of the neural network allows it to condition the predicted PDF on the full history of observations, not just the most recent observation. The neural network can use all sensor inputs to predict the full joint probability density function over all the sensor outputs.
[0113] In some cases, to handle multimodal or asymmetric uncertainties in the underlying sensor data, one or more of the models 110 can implement a Gaussian Mixture Model (GMM) to represent the system uncertainty. The one or more models can include a neural network with output parameters representing mean vectors, covariance matrices, and weights, conditioned upon the previously observed inputs.
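By way of non-limiting illustration, the following minimal sketch scores a sensor reading against a Gaussian mixture and flags readings with low predicted density. For brevity it assumes a single scalar sensor, so the mean vectors and covariance matrices reduce to scalar means and variances. The mixture parameters are hard-coded stand-ins for the output of a neural network, and the log-density cutoff of -6.0 is an arbitrary illustrative value.

```python
import math

def gmm_log_pdf(x, weights, means, variances):
    """Log density of a scalar x under a one-dimensional Gaussian mixture."""
    terms = []
    for w, mu, var in zip(weights, means, variances):
        log_norm = -0.5 * math.log(2.0 * math.pi * var)
        terms.append(math.log(w) + log_norm - 0.5 * (x - mu) ** 2 / var)
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))

def is_anomalous(x, weights, means, variances, min_log_pdf=-6.0):
    """Flag a reading whose predicted log density falls below the cutoff."""
    return gmm_log_pdf(x, weights, means, variances) < min_log_pdf

# Hypothetical bimodal prediction for the next time step.
weights = [0.7, 0.3]
means = [0.0, 5.0]
variances = [1.0, 0.25]

for reading in (0.2, 5.1, 12.0):
    print(reading, is_anomalous(reading, weights, means, variances))
```

Readings near either mode (0.2 and 5.1) are accepted, whereas a reading far from both modes (12.0) is flagged. A single Gaussian fitted to the same data would place its mean between the two modes and could not represent this shape.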
[0114] In some cases, the models 110 can accept analog or discrete inputs. In some cases, in an analog system including uncertainties, state uncertainties are propagated across constraints along with the system states by the models 110. By modeling the uncertainties of sensor and analytical measurements, this example system can perform consistency checking with thresholds that are automatically scaled for the anticipated noise environment.
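By way of non-limiting illustration, the following minimal sketch propagates a state and its variance across a chain of two constraints and derives a consistency threshold from the propagated uncertainty. It assumes linear constraints of the form y = gain * x + offset with additive Gaussian model noise; the gains, offsets, variances, and the three-sigma multiplier are hypothetical values chosen for illustration.

```python
import math

def propagate(mean, var, gain, offset=0.0, model_var=0.0):
    """Push a Gaussian state (mean, variance) across y = gain * x + offset."""
    return gain * mean + offset, gain ** 2 * var + model_var

def consistent(pred_mean, pred_var, meas, meas_var, k=3.0):
    """Return (is_consistent, threshold) for a predicted versus sensed value.

    The threshold scales with the combined predicted and sensor variances.
    """
    threshold = k * math.sqrt(pred_var + meas_var)
    return abs(meas - pred_mean) <= threshold, threshold

# Chain of two hypothetical constraints starting from a sensed input.
mean, var = 1.0, 0.01
mean, var = propagate(mean, var, gain=2.0, model_var=0.01)
mean, var = propagate(mean, var, gain=0.5, offset=1.0)

quiet_ok, quiet_thr = consistent(mean, var, meas=2.2, meas_var=0.0025)
noisy_ok, noisy_thr = consistent(mean, var, meas=2.2, meas_var=0.04)
print(quiet_ok, round(quiet_thr, 3))
print(noisy_ok, round(noisy_thr, 3))
```

The same check yields a wider threshold when the sensor variance is larger, so the tolerance follows the anticipated noise without hand tuning per subsystem.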
[0115] The model 110B can perform a fault isolation process to identify the specific component or sensor responsible for the fault 120. The model 110B can include a constraint suspension strategy, iteratively removing each constraint, observing the system residuals, and identifying which component suspension causes the residuals to return to nominal levels. If all residuals return to nominal levels once a component is removed, the component can be flagged as faulty.
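By way of non-limiting illustration, the constraint suspension strategy can be sketched as follows. The sketch makes a simplifying assumption: rather than re-propagating values through the network, each node residual is associated with the set of components whose constraints feed it, and suspending a component is modeled as discarding every residual that depends on it. The component and node names are hypothetical.

```python
def isolate_fault(violations, dependencies, components):
    """Return components whose suspension clears every violated residual.

    violations:   dict mapping node -> True if its residual exceeds threshold
    dependencies: dict mapping node -> set of components feeding that residual
    components:   iterable of candidate component names
    """
    if not any(violations.values()):
        return []  # nothing to isolate when all residuals are nominal
    suspects = []
    for comp in components:  # one suspension per component
        remaining = [
            node for node, bad in violations.items()
            if bad and comp not in dependencies[node]
        ]
        if not remaining:  # all residuals nominal with comp suspended
            suspects.append(comp)
    return suspects

components = ["wheel", "gyro", "star_tracker"]
dependencies = {
    "n1": {"wheel", "gyro"},
    "n2": {"gyro", "star_tracker"},
    "n3": {"wheel", "star_tracker"},
}
violations = {"n1": True, "n2": True, "n3": False}

print(isolate_fault(violations, dependencies, components))  # ['gyro']
```

Only the gyro participates in both violated residuals, so it is the only component whose suspension returns the network to nominal. The loop performs one suspension per component.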
[0116] In some cases, certain models 110 can detect and isolate faults in a system using only a nominal system model. For example, there may be no need to model off-nominal behavior or to predict failure modes of components a priori. This function can be extended to detecting degrading components (i.e., prognostics) by tightening detection thresholds. Faults can be detected and isolated in sensors, actuators, subsystems, etc. In some cases, for certain models 110, the computation needed to isolate a single fault scales linearly with N, the number of components in the model. In some cases, certain models 110 can detect and isolate multiple faults and common cause faults. In some cases, certain models 110 can define faults. For example, certain models 110 can define faults 120 in terms of failing to satisfy performance requirements, enabling methodical comparisons of data from substantially disparate sources.
[0117] In some cases, certain models 110 can provide for performance of tuned fault monitoring to achieve a risk posture with respect to probability of false alarm and probability of missed detection. This tuning can be applied uniformly throughout the system or set higher or lower for individual components. In some cases, the models 110 can be parallelized or made to operate on hierarchically defined components / subsystems in order to further reduce computational requirements.
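By way of non-limiting illustration, the following minimal sketch derives a residual threshold from a target probability of false alarm and evaluates the resulting probability of missed detection for a fault of a given size. It assumes a zero-mean Gaussian residual under nominal conditions, a two-sided test, and a fault that appears as a constant bias in the residual; these modeling choices and the numeric values are assumptions made for illustration.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()

def threshold_for_false_alarm(sigma, p_false_alarm):
    """Two-sided residual threshold giving the requested false alarm rate."""
    k = _STANDARD_NORMAL.inv_cdf(1.0 - p_false_alarm / 2.0)
    return k * sigma

def missed_detection_probability(threshold, sigma, fault_bias):
    """Probability a residual with mean fault_bias stays within +/- threshold."""
    shifted = NormalDist(mu=fault_bias, sigma=sigma)
    return shifted.cdf(threshold) - shifted.cdf(-threshold)

sigma = 1.0
thr = threshold_for_false_alarm(sigma, p_false_alarm=0.0027)
p_md = missed_detection_probability(thr, sigma, fault_bias=3.0)
print(round(thr, 3), round(p_md, 3))
```

A false alarm probability of 0.0027 corresponds to a threshold of approximately three standard deviations. Requesting a lower false alarm probability raises the threshold and therefore raises the probability of missing a fault of the same size, which is the trade a per-component risk posture would need to balance.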
[0118] FIG. 10 shows an example constraint diagram. In this example, the Z-axis angular velocity prediction was generated by propagating reaction wheel commands from node 3 through a series of physical constraint models. Ultimately, component 11 integrates reaction wheel torque to approximate the resulting spacecraft Z-axis angular velocity at node 24, where hypothesis testing is performed against the gyroscope measurement. In the figure, the orange line represents the gyroscope measurement, the solid blue line represents the StCS forward propagation, and the dotted blue line indicates the range of sensor measurements for which StCS would not flag a fault. Since the sensed and propagated values closely match at all timesteps, no fault is flagged.
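By way of non-limiting illustration, the integration of torque into a Z-axis angular velocity and its comparison against a gyroscope measurement can be sketched as follows. The sketch assumes a single-axis rigid body, a torque already expressed as the net torque acting on the spacecraft body about Z, simple Euler integration, and a fixed acceptance band in place of a propagated uncertainty. The inertia, time step, torque, and gyroscope values are hypothetical.

```python
def propagate_rate(torques, inertia_zz, dt, omega0=0.0):
    """Euler-integrate body torque about Z (N*m) into angular rate (rad/s)."""
    omega = omega0
    rates = []
    for tau in torques:
        omega += (tau / inertia_zz) * dt
        rates.append(omega)
    return rates

def flag_faults(predicted, measured, band):
    """Flag each time step where the gyro reading leaves the acceptance band."""
    return [abs(m - p) > band for p, m in zip(predicted, measured)]

torques = [0.01] * 5  # constant commanded torque, N*m
predicted = propagate_rate(torques, inertia_zz=10.0, dt=1.0)

gyro_nominal = [0.0011, 0.0019, 0.0031, 0.0040, 0.0049]
gyro_stuck = [0.0010] * 5  # gyro output frozen at its first value

print(flag_faults(predicted, gyro_nominal, band=0.0005))
print(flag_faults(predicted, gyro_stuck, band=0.0005))
```

With the nominal gyroscope trace every sample stays inside the band and no fault is flagged. With the frozen trace the predicted rate keeps growing while the measurement does not, so the later time steps are flagged.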
[0119] Through these systems and methods, autonomous, on-board platform and constellation SOH monitoring, detection, diagnostics and prognostics are provided. The systems and methods described herein can identify data sets and algorithms for SOH assessments.
4 IMPLEMENTATIONS
[0120] The approaches described above can be implemented, for example, using a programmable computing system executing suitable software instructions, in suitable hardware such as a field-programmable gate array (FPGA), or in some hybrid form. For example, in a programmed approach the software may include procedures in one or more computer programs that execute on one or more programmed or programmable computing systems (which may be of various architectures such as distributed, client / server, or grid), each including at least one processor, at least one data storage system (including volatile and / or non-volatile memory and / or storage elements), and at least one user interface (for receiving input using at least one input device or port, and for providing output using at least one output device or port). The software may include one or more modules of a larger program, for example, that provides services related to the design, configuration, and execution of data processing graphs. The modules of the program (e.g., elements of a data processing graph) can be implemented as data structures or other organized data conforming to a data model stored in a data repository.
[0121] The software may be stored in non-transitory form, such as being embodied in a volatile or non-volatile storage medium, or any other non-transitory medium, using a physical property of the medium (e.g., surface pits and lands, magnetic domains, or electrical charge) for a period of time (e.g., the time between refresh periods of a dynamic memory device such as a dynamic RAM). In preparation for loading the instructions, the software may be provided on a tangible, non-transitory medium, such as a CD-ROM or other computer-readable medium (e.g., readable by a general or special purpose computing system or device), or may be delivered (e.g., encoded in a propagated signal) over a communication medium of a network to a tangible, non-transitory medium of a computing system where it is executed. Some or all of the processing may be performed on a special purpose computer, or using special-purpose hardware, such as coprocessors or field-programmable gate arrays (FPGAs), dedicated, application-specific integrated circuits (ASICs), or graphics processing units (GPUs) (e.g., for efficient execution of large language models or other machine learning / artificial intelligence models). The processing may be implemented in a distributed manner in which different parts of the computation specified by the software are performed by different computing elements. Each such computer program is preferably stored on or downloaded to a computer-readable storage medium (e.g., solid state memory or media, or magnetic or optical media) of a storage device accessible by a general or special purpose programmable computer, for configuring and operating the computer when the storage device medium is read by the computer to perform the processing described herein.
The inventive system may also be considered to be implemented as a tangible, non-transitory medium, configured with a computer program, where the medium so configured causes a computer to operate in a specific and predefined manner to perform one or more of the processing steps described herein.
[0122] A number of embodiments of the invention have been described. Nevertheless, it is to be understood that the foregoing description is intended to illustrate and not to limit the scope of the invention, which is defined by the scope of the following claims. Accordingly, other embodiments are also within the scope of the following claims. For example, various modifications may be made without departing from the scope of the invention. Additionally, some of the steps described above may be order independent, and thus can be performed in an order different from that described.
Claims
WHAT IS CLAIMED IS:
1. A system for on-board fault detection in vehicles, comprising: a vehicle including a plurality of sensors configured to provide sensor data about the vehicle; one or more processors configured to: receive the sensor data from the plurality of sensors; receive control data associated with operation of the vehicle; generate, using one or more models, predicted values for one or more of the sensors based on at least one of the sensor data and the control data; determine a fault in the vehicle based at least in part on a comparison of the predicted values to measured values from the sensor data; and identify one or more components of the vehicle responsible for the fault.
2. The system of claim 1, wherein the one or more models include a network of constraints interconnected at nodes, each constraint representing a model of a component or subsystem of the vehicle.
3. The system of claim 2, wherein the model includes a physical or analytical model of the component or subsystem of the vehicle.
4. The system of claim 2, wherein generating the predicted values includes: propagating information forward through the network of constraints from input nodes to output nodes to generate forward-propagated values; and propagating information backward through the network of constraints from output nodes to input nodes to generate backward-propagated values.
5. The system of claim 4, wherein determining the fault includes: generating residuals at the nodes by comparing forward-propagated values, backward-propagated values, and sensed values; and comparing the residuals to thresholds.
6. The system of claim 5, wherein the one or more processors are further configured to propagate uncertainties associated with the sensor data and the models through the network of constraints, and wherein the thresholds are scaled based on the propagated uncertainties.
7. The system of claim 5, wherein identifying the one or more components responsible for the fault includes: iteratively suspending one or more constraints in the network of constraints; for each suspended constraint, re-propagating information through the network with the constraint suspended; and identifying a constraint as corresponding to a faulty component when suspension of that constraint causes the residuals to return to nominal levels.
8. The system of claim 2, wherein at least one constraint in the network of constraints includes a machine learning model trained to perform one or both of forward propagation and backward propagation.
9. The system of claim 8, wherein the machine learning model includes a neural network trained to: predict output values from input values in forward propagation; and predict required input values from observed output values in backward propagation.
10. The system of claim 1, wherein the one or more models include a probabilistic recurrent neural network configured to predict a probability density function over sensor variables at a next time step based on a history of the sensor data and the control data.
11. The system of claim 10, wherein the probabilistic recurrent neural network generates parameters representing a Gaussian Mixture Model including: mean vectors for a plurality of Gaussian components; covariance matrices for the plurality of Gaussian components; and weights for the plurality of Gaussian components.
12. The system of claim 10, wherein determining the fault includes: comparing measured sensor values to the predicted probability density function; and identifying a fault when the measured sensor values are statistically inconsistent with the predicted probability density function.
13. The system of claim 12, wherein identifying the one or more components responsible for the fault includes: activating a network of constraints representing component models; iteratively suspending individual constraints in the network; and identifying which constraint suspension causes anomaly scores to return to nominal levels.
14. The system of claim 1, wherein the vehicle includes a satellite orbiting a celestial body.
15. The system of claim 1, wherein the one or more processors are located on-board the vehicle.
16. The system of claim 1, wherein the plurality of sensors includes at least two of: a tachometer, a magnetometer, a star tracker, a coarse sun sensor, a medium sun sensor, an accelerometer, a pressure transducer, or a thermistor.
17. The system of claim 1, wherein the one or more processors are further configured to transmit fault data identifying the fault and the one or more components responsible for the fault to a ground station.
18. The system of claim 1, further comprising a controller configured to: receive fault data identifying the one or more components responsible for the fault; and in response to the fault data, reconfigure operation of the vehicle to compensate for the fault by at least one of: switching to a redundant component; or generating a surrogate output for a faulty component using data from other components.
19. A method for on-board fault detection in vehicles, comprising: receiving sensor data from a plurality of sensors on-board a vehicle; receiving control data associated with operation of the vehicle; generating, using one or more models, predicted values for one or more of the sensors based on at least one of the sensor data and the control data; determining a fault in the vehicle based at least in part on a comparison of the predicted values to measured values from the sensor data; and identifying one or more components of the vehicle responsible for the fault.
20. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors on-board a vehicle, cause the one or more processors to: receive sensor data from a plurality of sensors on the vehicle; receive control data associated with operation of the vehicle; generate, using one or more models, predicted values for one or more of the sensors based on at least one of the sensor data and the control data; determine a fault in the vehicle based at least in part on a comparison of the predicted values to measured values from the sensor data; and identify one or more components of the vehicle responsible for the fault.