A UAV waterway inspection system and method based on perception with a large multimodal spatiotemporal prediction model
By constructing a multimodal spatiotemporal prediction model, the UAV waterway inspection system addresses the high false alarm rate and poor environmental adaptability of UAVs in complex waters, and achieves high-precision, high-efficiency obstacle detection and energy optimization.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- INST OF COMM SCI YUNNAN PROV
- Filing Date
- 2026-01-13
- Publication Date
- 2026-06-16
AI Technical Summary
Existing UAV waterway inspection technologies suffer from high false alarm rates and poor environmental adaptability in complex waters, making them unable to effectively detect underwater hidden obstacles. Furthermore, existing methods cannot be efficiently applied to lightweight UAV platforms.
A UAV waterway inspection system based on a multimodal spatiotemporal prediction model is adopted. The system acquires prediction data on future environmental conditions, constructs a dynamic benchmark spatiotemporal prediction model and calibrates it against that prediction data, uses the calibrated model to process real-time environmental response data into detection result signals describing the probability and physical properties of obstacles, and performs linkage control.
It achieves high-precision and robust inspection in complex waters, reduces false detection and missed detection rates, optimizes the utilization of UAV energy resources, and provides rich decision-making information.
Figure CN122219476A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of unmanned aerial vehicle (UAV) inspection technology, specifically to a UAV waterway inspection system and method based on a multimodal spatiotemporal prediction model.
Background Technology
[0002] With the widespread application of drone technology in fields such as water conservancy, shipping, and environmental monitoring, efficient and reliable detection of underwater hidden obstacles in waterways has become a core prerequisite for ensuring operational safety and efficiency.
[0003] Currently, the technical solutions in this field mainly rely on sonar or optical sensors. However, sonar devices are limited by their size, power consumption, and performance in shallow water, making them difficult to deploy widely on lightweight UAV platforms. Traditional direct optical detection methods are severely constrained by complex environmental channel conditions such as water turbidity, surface ripples, and specular reflection (i.e., glare), resulting in very limited detection capabilities. To address this, some researchers have proposed indirectly inferring the presence of obstacles by analyzing the ripple field generated by the UAV's downwash airflow on the water surface and detecting abnormal patterns within it. While this approach is innovative, it suffers from an inherent technical bottleneck: existing methods typically use a static or oversimplified "ideal ripple" as the comparison benchmark. This benchmark cannot be dynamically adjusted according to real-time environmental factors such as wind, illumination, and water flow. It therefore exhibits a low signal-to-noise ratio when distinguishing between weak, localized distortions caused by real obstacles and complex background ripples caused by environmental noise. This directly leads to the high false alarm rates and poor environmental adaptability of existing indirect detection methods.
Summary of the Invention
[0004] The purpose of this invention is to provide an unmanned aerial vehicle (UAV) waterway inspection system and method based on a multimodal spatiotemporal prediction model to solve the problems mentioned in the background art.
[0005] To achieve the above objectives, the present invention provides the following technical solution: a UAV waterway inspection method based on a multimodal spatiotemporal prediction model, comprising the following steps:
S1. Obtain first prediction data characterizing the future environmental state of the target inspection water area, wherein the first prediction data is configured to quantify the visual detection channel quality of each spatial location in the target inspection water area within a future time window;
S2. Acquire real-time environmental response data generated by the UAV's own energy field on the water surface medium and collected by airborne sensors;
S3. Based on the first prediction data, calibrate the dynamic benchmark spatiotemporal prediction model used to analyze the real-time environmental response data, so as to generate a calibration benchmark spatiotemporal prediction model adjusted for the environmental channel state;
S4. Based on the calibration benchmark spatiotemporal prediction model, process the real-time environmental response data to generate detection result signals that characterize the existence probability and physical properties of hidden obstacles in the water;
S5. Based on the first prediction data and the detection result signals, generate and execute linkage control commands for adjusting the flight status and energy field intensity of the UAV.
[0006] A UAV waterway inspection system based on a multimodal spatiotemporal prediction model includes:
an environmental prediction data acquisition module, used to acquire first prediction data characterizing the future environmental state of the target inspection water area, wherein the first prediction data is configured to quantify the visual detection channel quality of each spatial location in the target inspection water area within a future time window;
a real-time response data acquisition module, used to acquire real-time environmental response data generated by the UAV's own energy field on the water surface medium and collected by airborne sensors;
a dynamic benchmark calibration module, used to calibrate, based on the first prediction data, the dynamic benchmark spatiotemporal prediction model used to analyze the real-time environmental response data, so as to generate a calibration benchmark spatiotemporal prediction model adjusted for the environmental channel state;
a detection, analysis and processing module, used to process the real-time environmental response data based on the calibration benchmark spatiotemporal prediction model to generate detection result signals that characterize the existence probability and physical properties of hidden obstacles in the water;
a linkage control generation module, used to generate and execute linkage control commands for adjusting the flight status and energy field intensity of the UAV based on the first prediction data and the detection result signals.
[0007] Compared with the prior art, the beneficial effects of the present invention are as follows. This invention provides a method for high-precision and robust inspection of waterways in complex and dynamic aquatic environments. Existing UAV waterway inspection technologies either rely on passive visual observation, which is susceptible to interference from surface glare, turbulence, and other environmental noise, leading to missed or false detections of semi-submerged or underwater concealed obstacles; or they employ macro-environmental prediction for strategic path planning, which cannot address unknown local and instantaneous risks. Existing technologies therefore lack a means of combining macro-environmental prediction with real-time micro-level detection, and so cannot adaptively optimize and close the control loop over detection energy and perception algorithms in dynamically changing environments. This invention aims to address these shortcomings and provide a method that significantly improves the safety, efficiency, and data quality of UAVs in complex waterway inspection missions.
[0008] By constructing a spatio-temporal prediction model and generating a four-dimensional spatio-temporal prediction glare index field covering the target inspection water area, the present invention enables the inspection system to have the ability of forward-looking perception of the future environmental channel quality. Instead of passively coping with harsh detection conditions, the system can obtain in advance the visual detection channel quality at specific spatio-temporal points on the route, providing a decision-making basis for subsequent dynamic detection and control strategies.
[0009] Based on the predicted glare index, the present invention selects from a preset model library, or generates through interpolation, a calibration benchmark spatiotemporal prediction model that matches the predicted environmental channel state. The calibration benchmark spatiotemporal prediction model is defined by a set of core parameters such as the main wavelength, attenuation coefficient, and degree of anisotropy. Compared with the static or simplified ideal water surface models of the prior art, a dynamic benchmark constructed in this way reflects the expected background ripple pattern under a specific environment more realistically. When the difference comparison is performed, it therefore provides a higher-quality reference for accurately separating abnormal signals, helping to reduce misjudgment caused by environmental noise.
[0010] The present invention generates linkage control instructions through an optimal energy field intensity mapping function, where the channel compensation energy component is accurately calculated as a function related to the predicted glare index and the flight height H. This enables the unmanned aerial vehicle (UAV) to perform non-linear and target-oriented energy compensation according to the predicted channel severity and flight height. When the channel quality is good, the system operates with basic energy; only when the channel quality deteriorates, additional energy input is increased as needed. This realizes the optimized utilization of the UAV's energy resources compared with the strategy of using constant high-power detection throughout the inspection process.
[0011] The present invention calculates the normalized ripple field distortion energy index and compares it with the distortion energy threshold determined through systematic calibration experiments, providing a quantitative and repeatable basis for determining the presence or absence of obstacles. In addition, by performing a Fourier transform on the time series of this distortion energy index and analyzing its energy spectral density, it is possible to preliminarily classify the physical properties (such as rigidity or flexibility) of the obstacles, going beyond the traditional binary "yes or no" detection result and providing richer decision-making information.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] Figure 1 is an isometric schematic diagram of the logical steps of the present invention; Figure 2 is a schematic diagram of the execution steps of the overall method flow of the present invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0013] In order to make the above objects, features, and advantages of the present invention more obvious and understandable, the following detailed description of the specific embodiments of the present invention will be made in conjunction with the drawings in the specification.
[0014] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.
[0015] Example 1: Referring to Figures 1 and 2, this invention provides a technical solution: a UAV waterway inspection method based on a large multimodal spatiotemporal prediction model, the specific steps of which include:
S1: Acquire first prediction data characterizing the future environmental state of the target inspection water area. The first prediction data is configured to quantify the visual detection channel quality of each spatial location in the target inspection water area within a future time window. The purpose of step S1 is to give the inspection system predictive capability, so that it knows in advance the quality of the visual detection channel at each spatiotemporal point along the route, thus providing a basis for subsequent active control decisions. In a specific embodiment, step S1 obtains the first prediction data from a remote server via a network interface. The first prediction data is generated by a spatiotemporal prediction model deployed on the server. The spatiotemporal prediction model processes its input data and outputs a spatiotemporal four-dimensional predicted glare index field covering the target inspection water area, spanning the next 24 hours, with a spatial resolution of 10 meters and a temporal resolution of 15 minutes. For each three-dimensional spatial coordinate point (longitude, latitude, and altitude) within the water area at each future moment, this index field gives the probability of specular reflection occurring; this probability value is the predicted glare index, normalized to the closed interval [0, 1].
[0016] S1 includes: acquiring upstream hydrological data, regional meteorological data, and channel topology data covering the target inspection area through a network interface; and processing the upstream hydrological data, regional meteorological data, and channel topology data using a spatiotemporal prediction model to generate the first prediction data, wherein the first prediction data includes a spatiotemporal four-dimensional predicted glare index field, and the predicted glare index field gives the probability of specular reflection occurring at each spatial coordinate point in the target inspection area at each future moment.
[0017] To ensure that those skilled in the art can implement the present invention without ambiguity, the construction and implementation of the aforementioned spatiotemporal prediction model are now described in detail. The data used to build the model in this embodiment comprises three types of input data and one type of label data.
[0018] Input Data 1: Upstream hydrological and regional meteorological data. A standardized API provided by public meteorological and hydrological departments is used to obtain hourly historical data sequences for the past 72 hours from observation stations upstream of the target inspection area, including water flow, velocity, wind speed, wind direction, total cloud cover, and solar altitude and azimuth angles.
Input Data 2: Channel topology data. Geographic Information System (GIS) vector data of the target inspection area is obtained and rasterized to generate a two-dimensional channel mask matrix, in which water locations are assigned a value of one and land locations a value of zero.
Input Data 3: Time-coded data. The future time points to be predicted (e.g., every hour within the next 24 hours) are converted into periodic sine and cosine codes that represent the time periodicity within a day and within a year.
Label data: Historical actual glare index. Fixed observation units facing the water surface are pre-installed at several key locations in the target inspection area, either on the shore or on a fixed platform. Each observation unit includes a wide-angle camera with a fixed downward angle, whose field of view continuously covers a pre-defined, representative body of water, and a solar position sensor that works synchronously with the camera to record the solar altitude and azimuth angles at the time of each image acquisition. An image analysis algorithm is run on a long-term (e.g., year-long) sequence of images of this fixed body of water, combined with the synchronously recorded solar position data. The algorithm quantifies the historical actual glare index, a value between zero and one associated with a specific solar position and water surface state, by calculating the area ratio of oversaturated pixel regions caused by specular reflection and the gradient diffusion of highlight regions in the image. This index is then used as the ground truth for training the spatiotemporal prediction model.
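The area-ratio part of the label computation can be illustrated with a minimal sketch. It assumes an 8-bit grayscale frame stored as nested lists and a hypothetical saturation threshold of 250; the gradient-diffusion term mentioned above is not specified in enough detail to reproduce and is omitted. The function name `saturated_area_ratio` is illustrative only.

```python
def saturated_area_ratio(gray_image, saturation_level=250):
    """Return the fraction of pixels at or above `saturation_level`.

    `gray_image` is a list of rows of 0-255 intensities. The threshold is an
    assumed value, not one given in the description.
    """
    total = 0
    saturated = 0
    for row in gray_image:
        for pixel in row:
            total += 1
            if pixel >= saturation_level:
                saturated += 1
    return saturated / total if total else 0.0


# Hypothetical 3x4 frame with three oversaturated pixels in the top-left corner.
frame = [
    [255, 255, 120, 80],
    [254, 130, 90, 60],
    [100, 95, 70, 50],
]
ratio = saturated_area_ratio(frame)
print(ratio)
```

The result already lies in [0, 1], which is the range required of the historical actual glare index.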
[0019] This embodiment selects a spatiotemporal graph convolutional network (ST-GCN) as the core spatiotemporal prediction model. This type of model is chosen because it can simultaneously capture spatial topological dependencies (through graph convolution) and temporal evolution patterns (through temporal convolution or RNN structures), which matches the multi-source spatiotemporal data prediction problem that this invention aims to solve. The specific structure of the spatiotemporal prediction model is as follows. First, the data embedding module maps the previously processed hydrological, meteorological, topological, and time-coded data into a unified high-dimensional feature space through their respective independent fully connected layers. Next, multiple spatiotemporal graph convolutional modules are stacked. Within each module, a graph convolutional network (GCN) layer first aggregates and propagates spatial features on a graph defined by the waterway topology; a gated recurrent unit (GRU) layer then updates and evolves the time-series features. Finally, an output decoding module decodes the feature vector output by the last spatiotemporal module into a four-dimensional tensor through two fully connected layers. This tensor is the predicted glare index field, with dimensions of time, latitude, longitude, and predicted value.
[0020] The spatiotemporal prediction model is then trained. The dataset collected over the past year is divided chronologically, with the first 80% used as the training set and the last 20% as the test set. Mean squared error (MSE) is used as the loss function, and the Adam (adaptive moment estimation) optimizer is employed. The initial learning rate is set to 0.05% (i.e., 0.0005), the batch size is 16, and training is performed for 100 epochs. The model version that exhibits the lowest mean squared error on the test set is the one finally deployed. In this embodiment, the mean absolute error of the trained spatiotemporal prediction model on the test set is less than 0.08.
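The chronological 80/20 division described above can be sketched as follows. This is a minimal illustration, assuming each sample is a `(timestamp, payload)` tuple; the helper name `chronological_split` and the toy data are hypothetical.

```python
def chronological_split(samples, train_fraction=0.8):
    """Split time-stamped samples so that the test set is strictly later
    than the training set (no shuffling)."""
    ordered = sorted(samples, key=lambda sample: sample[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical samples: (hour index, observed glare index).
samples = [(hour, 0.01 * hour) for hour in range(10)]
train_set, test_set = chronological_split(samples)
print(len(train_set), len(test_set))
```

Splitting by time rather than at random keeps future observations out of the training set, which matters for a model that is evaluated on its ability to forecast.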
[0021] To achieve a forward-looking quantification of the visual detection channel quality, this step uses the spatiotemporal prediction model to construct a predicted glare index field that covers the target inspection water area and has both temporal and spatial dimensions. The construction process includes the following steps:
S11. Acquisition and spatiotemporal alignment of the first prediction data, which includes the following steps:
S111. By calling meteorological and hydrological data services through the network interface, obtain historical and forecast datasets covering the target inspection water area and its upstream basin, from the past 72 hours to the next 24 hours, at 15-minute intervals, and obtain the first feature vector. The first feature vector includes at least: total cloud cover, wind speed, wind direction, solar altitude angle, solar azimuth angle, and upstream water flow.
The total cloud cover is obtained by calling the API (Application Programming Interface) of a third-party commercial meteorological data service provider. An HTTP request is sent to the API service endpoint, containing the latitude and longitude coordinates of the target geographical location (e.g., 110.582°E, 30.851°N) and the future time point to be predicted (e.g., 10 hours after the prediction task starts). The returned value, representing the percentage of the sky covered by clouds, is defined as the raw percentage value. This raw percentage value is linearly normalized to the [0,1] interval, and the total cloud cover is calculated using the following formula: $V_{cc} = (C_{raw} - C_{min}) / (C_{max} - C_{min})$. In the formula, $C_{raw}$ represents the original cloud cover percentage, and $C_{min}$ and $C_{max}$ represent the preset cloud cover range boundaries, i.e., the minimum and maximum percentages of the total cloud cover (0 and 100, respectively). For example, if the raw value is 10%, then the total cloud cover is (10 - 0) / (100 - 0) = 0.10.
[0022] The wind speed is obtained by collecting raw wind speed values in m/s, for example, 15.0 m/s. Based on the safe flight threshold for humans and aircraft and local historical meteorological data, an effective wind speed operating range is set (e.g., [0 m/s, 20 m/s]). Within this range, linear normalization is performed, and the wind speed is calculated using the following formula: $V_{ws} = (\mathrm{clip}(W_{raw}, W_{min}, W_{max}) - W_{min}) / (W_{max} - W_{min})$. In the formula, $W_{raw}$ is the original wind speed value, and $W_{min}$ and $W_{max}$ are the preset wind speed operating range boundaries, representing the minimum and maximum wind speed values, i.e., 0 and 20. The clip function truncates values that exceed the range at its boundaries. Example: if the original value is 15.0 m/s, the normalized value is (15.0 - 0) / (20 - 0) = 0.75.
Wind direction is obtained by acquiring raw wind direction angle values, with true north as 0° and increasing clockwise within the range [0°, 360°), for example, 225° (representing a southwest wind). Wind direction data is periodic (359° is very close to 0°), and direct linear normalization would lose this periodicity. Therefore, trigonometric functions are used to convert it into sine and cosine components in a two-dimensional Cartesian coordinate system to preserve its periodicity. The sine component $V_{wd,\sin}$ and cosine component $V_{wd,\cos}$ of the wind direction are calculated using the following formulas: $V_{wd,\sin} = \sin(\theta_{wd})$ and $V_{wd,\cos} = \cos(\theta_{wd})$, where $\theta_{wd}$ is the original wind direction angle value. Example: if the obtained original value is 225°, then $V_{wd,\sin}$ = sin(225°) ≈ -0.707 and $V_{wd,\cos}$ = cos(225°) ≈ -0.707.
[0023] The solar altitude angle is obtained as follows: the angle between the sunlight and the horizon at the target spatiotemporal point is collected and defined as the original angle. The original angle is then linearly normalized within the effective range (above the horizon, i.e., [0°, 90°]), and the solar altitude angle is calculated using the following formula: $V_{alt} = (\alpha_{raw} - \alpha_{min}) / (\alpha_{max} - \alpha_{min})$, where $\alpha_{raw}$ is the original angle and $\alpha_{min}$ and $\alpha_{max}$ are the effective range boundaries, 0 and 90. For example, if the calculated original value is 79.2°, then the normalized value is (79.2 - 0) / (90 - 0) = 0.88.
The solar azimuth angle is obtained by acquiring the projection direction of the sun on the horizontal plane at the target spatiotemporal point, i.e., the original azimuth angle. The original azimuth angle is an angular value representing direction, usually with true north as 0° and increasing clockwise within the range [0°, 360°), for example, 240°. The solar azimuth angle is periodic data and is encoded using trigonometric functions to obtain its sine component $V_{az,\sin}$ and cosine component $V_{az,\cos}$, calculated using the following formulas: $V_{az,\sin} = \sin(\varphi_{az})$ and $V_{az,\cos} = \cos(\varphi_{az})$, where $\varphi_{az}$ is the original azimuth angle. For example, if the calculated original value is 240°, then $V_{az,\sin}$ = sin(240°) ≈ -0.866 and $V_{az,\cos}$ = cos(240°) = -0.5.
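The reason for encoding a periodic angle as a sine and cosine pair can be checked numerically. The sketch below uses illustrative function names (`encode_angle`, `encoded_distance`) and compares how far apart 359° and 0° sit under the encoding, next to the 225° example from the text. Note that both components of 225° are negative.

```python
import math


def encode_angle(degrees):
    """Map an angle in degrees to its (sine, cosine) pair."""
    radians = math.radians(degrees)
    return math.sin(radians), math.cos(radians)


def encoded_distance(a_degrees, b_degrees):
    """Euclidean distance between two angles in the (sin, cos) plane."""
    sin_a, cos_a = encode_angle(a_degrees)
    sin_b, cos_b = encode_angle(b_degrees)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


wd_sin, wd_cos = encode_angle(225.0)      # southwest wind from the example
near = encoded_distance(359.0, 0.0)       # neighbouring directions stay close
linear_gap = abs(359.0 - 0.0) / 360.0     # naive linear scaling pushes them apart
print(round(wd_sin, 3), round(wd_cos, 3), round(near, 4), round(linear_gap, 3))
```

Under linear scaling the two almost identical directions would land at opposite ends of the [0, 1) range, whereas the encoded points are nearly coincident.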
[0024] The upstream water flow is obtained by collecting forecast data from hydrological monitoring stations upstream of the target inspection area to obtain the raw water flow. The original water flow rate represents the amount of water flowing through a cross-section per unit time, with the unit being cubic meters per second (m³ / s), for example, 19500 m³ / s.
[0025] Based on historical water flow data from upstream hydrological monitoring stations in the target inspection area, including annual average flow, a minimum value $Q_{min}$ and a maximum value $Q_{max}$ are set for the normalization range, here [0 m³/s, 30000 m³/s], and the upstream water flow is calculated using the following formula: $V_{flow} = (Q_{raw} - Q_{min}) / (Q_{max} - Q_{min})$, where $Q_{raw}$ is the original water flow rate. Calculation example: if the original water flow rate is 19500 m³/s and the normalization range is [0, 30000], the normalized upstream water flow is (19500 - 0) / (30000 - 0) = 0.65.
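The four scalar normalizations in S111 share one form, so they can be checked with a single helper. This is a sketch: the name `normalize` is illustrative, and applying the clip to every feature (the text states it explicitly only for wind speed) is an assumption.

```python
def normalize(value, lower, upper):
    """Min-max normalize `value` to [0, 1], truncating out-of-range inputs."""
    scaled = (value - lower) / (upper - lower)
    return min(1.0, max(0.0, scaled))


cloud_cover = normalize(10.0, 0.0, 100.0)        # raw 10 %
wind_speed = normalize(15.0, 0.0, 20.0)          # raw 15.0 m/s
solar_altitude = normalize(79.2, 0.0, 90.0)      # raw 79.2 degrees
upstream_flow = normalize(19500.0, 0.0, 30000.0) # raw 19500 m^3/s
gust = normalize(26.0, 0.0, 20.0)                # hypothetical out-of-range reading

print(cloud_cover, wind_speed, round(solar_altitude, 2), upstream_flow, gust)
```

The first four values match the worked examples in the text (0.10, 0.75, 0.88, 0.65); the last shows the effect of truncation on a reading above the operating range.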
[0026] S112. Obtain high-precision geographic information system (GIS) vector map data of the target inspection water area, and rasterize it into a two-dimensional waterway topology matrix. Each element in the matrix corresponds to a 10-meter by 10-meter geographic grid, and the geographic grid is binarized and assigned a value according to whether it is a water area (1 for water area and 0 for land). S113. For each future time point that needs to be predicted, generate its corresponding time encoding vector, which includes the periodic position of the time point within a day (through sine / cosine transformation) and the periodic position within a year.
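The time encoding of S113 can be sketched with the standard library. The function name `time_encoding`, the four-component layout, and the 365-day year are assumptions; the worked example later in the description lists only a single sine and cosine pair for time.

```python
import math
from datetime import datetime


def time_encoding(moment):
    """Return (day_sin, day_cos, year_sin, year_cos) for a datetime."""
    seconds = moment.hour * 3600 + moment.minute * 60 + moment.second
    day_angle = 2.0 * math.pi * seconds / 86400.0
    # tm_yday starts at 1; a 365-day year is assumed, so leap days are ignored.
    year_angle = 2.0 * math.pi * (moment.timetuple().tm_yday - 1) / 365.0
    return (
        math.sin(day_angle),
        math.cos(day_angle),
        math.sin(year_angle),
        math.cos(year_angle),
    )


code = time_encoding(datetime(2026, 1, 1, 6, 0, 0))
print([round(component, 3) for component in code])
```

At 06:00 on 1 January the daily phase is a quarter turn and the yearly phase is zero, so the daily pair is approximately (1, 0) and the yearly pair is (0, 1).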
[0027] S114. Align the temporal data obtained in S111 with the spatial data obtained in S112. For each target inspection water area, associate the corresponding hydrological, meteorological and time coding features at each historical and future time point to form a unified, multi-channel spatiotemporal data cube.
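One way to picture the multi-channel spatiotemporal data cube of S114 is as a nested structure indexed by time, grid row, grid column, and channel. The sketch below is an assumption about layout rather than the described implementation: `build_cube`, the dictionary input format, and zero-filling of land cells are all hypothetical choices.

```python
def build_cube(mask, cell_features, time_codes):
    """Assemble cube[t][row][col] = per-cell features + time code.

    mask          -- 2D list, 1 for water and 0 for land
    cell_features -- one dict per time step mapping (row, col) to a feature list
    time_codes    -- one time-encoding list per time step
    Land cells are zero-filled so every cell has the same channel count.
    """
    cube = []
    for step, code in enumerate(time_codes):
        width = len(next(iter(cell_features[step].values()))) + len(code)
        frame = []
        for r, mask_row in enumerate(mask):
            row = []
            for c, is_water in enumerate(mask_row):
                if is_water:
                    row.append(list(cell_features[step][(r, c)]) + list(code))
                else:
                    row.append([0.0] * width)
            frame.append(row)
        cube.append(frame)
    return cube


# Hypothetical 2x2 grid with one land cell and two time steps.
mask = [[1, 0],
        [1, 1]]
cell_features = [
    {(0, 0): [0.10, 0.75], (1, 0): [0.10, 0.72], (1, 1): [0.12, 0.65]},
    {(0, 0): [0.20, 0.70], (1, 0): [0.20, 0.68], (1, 1): [0.22, 0.60]},
]
time_codes = [[0.50, 0.87], [0.56, 0.83]]
cube = build_cube(mask, cell_features, time_codes)
print(cube[1][1][1])
```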
[0028] The principle and purpose of the first prediction data acquisition and spatiotemporal alignment is that glare formation is the result of multiple factors acting together, including spatial (geographical location, solar position) and temporal (weather changes, water flow changes). Step S11 aims to build a unified, structured data foundation, integrating data from different systems with different spatiotemporal attributes into a unified spatiotemporal coordinate system. This is the fundamental prerequisite for the subsequent spatiotemporal prediction model to perform effective feature learning. The introduction of the waterway topology matrix enables the spatiotemporal prediction model to understand the boundaries and connectivity of waterways, while time encoding allows the spatiotemporal prediction model to capture patterns with periodic regularity (such as daily dawn-dusk variations).
[0029] S12. Spatial dependency modeling based on graph convolutional networks, the specific steps of which include: S121. Convert the channel topology matrix generated in S112 into an undirected graph structure. Each target inspection area is regarded as a node in the graph; if two target inspection areas are geographically adjacent (including sharing edges or vertices), an edge is established between the corresponding two nodes, thus constructing a spatial adjacency graph. S122. For any data slice at any time point in S114, take the first feature vector of each node as input. S123. Apply a Graph Convolutional Network (GCN) layer to process the feature vectors of all nodes. For any node, the GCN layer obtains a second feature vector by aggregating its own features and the features of all its direct neighbors and performing a weighted average.
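Step S121 can be sketched as a scan over the binary waterway matrix that links every water cell to the water cells among its eight surrounding cells, since adjacency is defined as sharing an edge or a vertex. The function name `build_adjacency` and the adjacency-list representation are illustrative choices.

```python
def build_adjacency(mask):
    """Build an undirected adjacency list over water cells (value 1).

    Two water cells are linked when they share an edge or a vertex,
    i.e. when they are 8-neighbours on the grid.
    """
    nodes = [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value == 1
    ]
    node_set = set(nodes)
    adjacency = {node: [] for node in nodes}
    for r, c in nodes:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue  # skip the cell itself
                neighbour = (r + dr, c + dc)
                if neighbour in node_set:
                    adjacency[(r, c)].append(neighbour)
    return adjacency


# Hypothetical 3x3 waterway mask with a diagonal bend.
mask = [
    [0, 1, 1],
    [0, 1, 0],
    [1, 0, 0],
]
adjacency = build_adjacency(mask)
print(adjacency[(1, 1)])
```

Because land cells never become nodes, the graph follows the actual shape of the waterway instead of the rectangular grid it was rasterized from.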
[0030] The principle and purpose of spatial dependency modeling is that the glare state of a given water grid is not only affected by its own factors but also closely related to its surrounding environment (for example, upstream water flow affects downstream water surfaces, and canyon topography alters local wind fields). Traditional convolutional neural networks (CNNs) can only process regular grid data and cannot effectively handle irregular channel shapes. Step S12 creatively abstracts the water topology into a graph structure and utilizes a graph convolutional network to enable the spatiotemporal prediction model to propagate and aggregate features along real geographical adjacency relationships, thereby gaining a deeper understanding of the spatial correlation of the water environment.
[0031] S13. Decoding and generation of the predicted glare index field, the specific steps of which include:
S131. Input the second feature vector from step S12 into a decoder module consisting of one or more fully connected layers;
S132. The last layer of the decoder module uses a sigmoid activation function to compress the output value into the closed interval between 0 and 1. This output value is defined as the predicted glare index of the corresponding spatial grid at that future time point.
S133. Repeat the calculations of S131 to S132 for all target inspection waters and all future time points, and combine all results into a four-dimensional tensor with dimensions of [time, latitude grid index, longitude grid index, and predicted value]. This tensor is the spatiotemporal four-dimensional predicted glare index field of this invention.
[0032] A detailed calculation example of the single-point generation process follows, showing how the spatiotemporal prediction model of this invention generates the final predicted glare index from multi-source input data. The predicted glare index is calculated for the water grid at coordinates N(19,34) at the 40th future time point (t=39, i.e., 10 hours later).
[0033] S11. The first feature vector of the target spatiotemporal point is collected. A specific example is as follows: at t=39, the system collects all relevant input features for the target grid N(19,34). According to the feature engineering method of this invention, these features are processed into a standardized vector containing 10 components (wind direction and solar azimuth angle each occupy 2 components, and time encoding occupies 2 components). The first feature vector has the form: [total cloud cover, wind speed, wind direction sin, wind direction cos, solar altitude angle, solar azimuth angle sin, solar azimuth angle cos, upstream water flow, time encoding sin, time encoding cos]. The first feature vector collected and processed in this embodiment is: $X_{t39}(19,34)$ = [0.10, 0.75, -0.707, -0.707, 0.88, -0.866, -0.50, 0.65, 0.50, 0.87]. The first feature vector $X_{t39}(19,34)$ is a comprehensive and quantitative description of the environmental state of the target point at a specific future time. Specifically:
- Total cloud cover = 0.10: clear sky.
- Wind speed = 0.75: strong wind.
- [Wind direction sin, wind direction cos] = [-0.707, -0.707]: a southwest wind (225°). Whether this direction runs along or against the line of sight has a specific impact on the ripple pattern.
- Solar altitude angle = 0.88: the sun is near the zenith, providing favorable conditions for specular reflection.
- [Solar azimuth angle sin, solar azimuth angle cos] = [-0.866, -0.50]: the sun's azimuth is 240°, i.e., southwest by west. The relationship between this angle and the observation angle is the decisive factor in glare.
- Upstream water flow = 0.65: the water flow is relatively fast.
- [Time encoding sin, time encoding cos] = [0.50, 0.87]: the periodic position of the predicted time within a day.
[0034] S12. Spatial feature fusion (graph convolution operation): The spatiotemporal prediction model finds the four spatial neighbor nodes of N(19,34) and obtains their 10-dimensional feature vectors at time t=39: X t39 (18,34) (upstream) = [0.10, 0.78, -0.707, -0.707, 0.88, -0.866, -0.50, 0.68, 0.50, 0.87]; X t39 (20,34) (downstream) = [0.10, 0.72, -0.707, -0.707, 0.88, -0.866, -0.50, 0.62, 0.50, 0.87]; X t39 (19,33) (nearshore) = [0.12, 0.65, -0.659, -0.752, 0.88, -0.866, -0.50, 0.65, 0.50, 0.87] (the nearshore wind direction may be slightly deflected); X t39 (19,35) (center) = [0.10, 0.75, -0.707, -0.707, 0.88, -0.866, -0.50, 0.65, 0.50, 0.87]. The graph convolutional network layer performs a weighted averaging operation so that each water grid cell (node) perceives its surrounding environment. The model does not analyze the target point N(19,34) in isolation, but simultaneously aggregates the feature information of all its directly adjacent nodes (up, down, left, right). The weighted averaging operation calculates the second feature vector using the following formula: X2_i(t) = Σ over j in N_i ∪ U_i of [ Ã_ij / √(d_i × d_j) ] × X_j(t). In the formula, X2_i(t) represents the new feature vector obtained by spatial fusion of the target node i (i.e., N(19,34) in this example) at time t; X_j(t) represents the original 10-dimensional feature vector of node j at time t; N_i represents the set of neighboring nodes of the target node i; Ã represents the adjacency matrix with self-loops, whose entry Ã_ij is 1 when nodes i and j are connected and 0 otherwise; d_i indicates the number of connections of the target node i, and d_j the number of connections of node j, both counted with the self-loop. Dividing by √(d_i × d_j), a value related to the connectivity of node j and target node i, normalizes each contribution; in this example every node has five connections, so each weight is 1/5.
N_i ∪ U_i represents all neighboring nodes of the target node i plus the target node i itself (U_i). When calculating the new features of i, the weighted average must include not only the features of its surrounding neighbor nodes N_i but also its own original features U_i. If U_i is not added (that is, without self-loops), a node updating its features would include only information about its surrounding neighbors and lose its own original information. Using N_i ∪ U_i ensures that the updated features incorporate environmental information (spatial fusion) while retaining the node's own characteristics.
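The spatial fusion step can be illustrated with a minimal sketch. It assumes the symmetric normalization 1/√(d_i × d_j) with self-loops and assumes that every node involved is an interior grid node with five connections (four neighbors plus itself), so each weight is 1/5; the function name `fuse_features` is hypothetical. The signs of the sine and cosine components follow the stated 225° wind direction and 240° solar azimuth.

```python
import math

# Feature vectors at t=39 for the target node and its four neighbors.
features = {
    (19, 34): [0.10, 0.75, -0.707, -0.707, 0.88, -0.866, -0.50, 0.65, 0.50, 0.87],
    (18, 34): [0.10, 0.78, -0.707, -0.707, 0.88, -0.866, -0.50, 0.68, 0.50, 0.87],
    (20, 34): [0.10, 0.72, -0.707, -0.707, 0.88, -0.866, -0.50, 0.62, 0.50, 0.87],
    (19, 33): [0.12, 0.65, -0.659, -0.752, 0.88, -0.866, -0.50, 0.65, 0.50, 0.87],
    (19, 35): [0.10, 0.75, -0.707, -0.707, 0.88, -0.866, -0.50, 0.65, 0.50, 0.87],
}

def fuse_features(target, neighbors, features, degree=5):
    """Aggregate the target node and its neighbors with weight 1/sqrt(d_i * d_j).

    `degree` is an assumption: every node has 4 neighbors plus a self-loop.
    """
    fused = [0.0] * len(features[target])
    for node in [target] + list(neighbors):  # N_i plus the node itself
        weight = 1.0 / math.sqrt(degree * degree)
        for k, value in enumerate(features[node]):
            fused[k] += weight * value
    return fused

second_vector = fuse_features((19, 34), [(18, 34), (20, 34), (19, 33), (19, 35)], features)
print([round(v, 3) for v in second_vector])
# [0.104, 0.73, -0.697, -0.716, 0.88, -0.866, -0.5, 0.65, 0.5, 0.87]
```

With equal degrees the operation reduces to a plain mean over the five nodes, which matches the magnitudes of the second feature vector given in the embodiment.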
[0035] After the spatial feature fusion (weighted averaging) operation in step S12, the resulting second feature vector X 2 t39,(19,34) is: X 2 t39,(19,34) = [0.104, 0.730, -0.697, -0.716, 0.880, -0.866, -0.500, 0.650, 0.500, 0.870]; S13. The second feature vector is input into a decoder consisting of two fully connected layers, and the predicted glare index is calculated using the Sigmoid activation function.
[0036] Specifically, each of the ten features of the second feature vector is assigned an importance weight; each feature value is multiplied by its weight and the products are summed to obtain the first aggregate score. The bias term b obtained in the spatiotemporal prediction model training, set to 0.1, is then added, and the result is converted to a standardized value between 0 and 1 to obtain the predicted glare index, as shown in Table 1 below. Table 1 shows the weighted calculation results of the second feature vector (containing 10 environmental feature values) of the target inspection water area at coordinates (19,34) on the water area grid map at time point 39. Adding all the values in the "Weighted Result" column of Table 1 gives: 0.052 + 0.146 + 1.0455 - 0.5728 + 1.056 + 0.7794 - 0.050 + 1.170 + 0.150 + 0.957 = 4.7331, which is the first aggregate score. Adding the bias term b, the first aggregate score becomes 4.7331 + 0.1 = 4.8331, which represents the original tendency of glare. This score is then input into the Sigmoid function, converting it to a value between 0 and 1, resulting in the predicted glare index, denoted as PGIinst. The calculation formula is PGIinst = 1 / (1 + e^(-(first aggregate score))), where e is the base of the natural logarithm, taken as 2.71828. Substituting the values: predicted glare index = 1 / (1 + e^(-4.8331)) ≈ 0.992; S2. Acquire real-time environmental response data generated by the UAV's own energy field on the water surface medium and collected by airborne sensors; S2 includes: generating a water surface ripple field on the target inspection area by using the downwash airflow generated by the UAV's rotor; and collecting a real-time video stream containing water surface ripple field morphology information through the visual sensor on the UAV as real-time environmental response data.
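The glare index arithmetic of step S13 can be checked with a few lines. This is only a sketch of the final weighted sum and Sigmoid; it does not model the two fully connected layers. The fourth and seventh weighted terms are taken as negative, which is what makes the stated total of 4.7331 hold; the variable names are hypothetical.

```python
import math

# Weighted results per feature, as listed for Table 1 (fourth and seventh terms negative).
weighted_terms = [0.052, 0.146, 1.0455, -0.5728, 1.056, 0.7794, -0.050, 1.170, 0.150, 0.957]
bias = 0.1  # bias term b from the embodiment

aggregate = sum(weighted_terms)                 # sum of the weighted results
score = aggregate + bias                        # score fed to the Sigmoid
pgi_inst = 1.0 / (1.0 + math.exp(-score))       # Sigmoid maps the score into (0, 1)

print(round(aggregate, 4), round(score, 4), round(pgi_inst, 3))  # 4.7331 4.8331 0.992
```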
[0037] S3. Based on the first prediction data, the dynamic reference spatiotemporal prediction model used to analyze real-time environmental response data is calibrated to generate a calibration reference spatiotemporal prediction model adjusted for environmental channel state. The step of calibrating the dynamic reference spatiotemporal prediction model in S3 specifically includes: based on the predicted glare index extracted from the predicted glare index field, which corresponds to the current position and time of the UAV, an ideal ripple field spatiotemporal prediction model corresponding to the water surface turbulence level represented by the predicted glare index is selected from the preset spatiotemporal prediction model library or generated by interpolation; the ideal ripple field spatiotemporal prediction model is used as the calibration reference spatiotemporal prediction model.
[0038] S4. Based on the calibration benchmark spatiotemporal prediction model, process real-time environmental response data to generate detection result signals that characterize the probability and physical properties of hidden obstacles in the water area; the steps of processing real-time environmental response data in S4 specifically include: performing pixel-level difference operations between the current frame image in the real-time video stream and the calibration benchmark spatiotemporal prediction model to generate a ripple field difference map.
[0039] The steps in S4 for processing real-time environmental response data also include: calculating the normalized ripple field distortion energy index of the spatiotemporal distortion fusion map; and generating a detection result signal indicating the presence of hidden obstacles when the normalized ripple field distortion energy index exceeds a preset distortion energy threshold.
[0040] S5. Based on the first prediction data and the detection result signal, generate and execute the linkage control command for adjusting the flight status and energy field intensity of the UAV.
[0041] This embodiment illustrates how the processing device acquires real-time environmental response data.
[0042] Assume that at time 40 (t=39), a drone flies to the target inspection area, its geographic coordinates precisely corresponding to position (19.2, 34.5) on the water's grid map. The drone hovers 5 meters above the water surface. Its quadcopter system operates at 3000 RPM, generating a downwash that acts vertically on the water surface. This downwash energy creates a ripple field with a specific shape and dynamic evolution, centered directly below the drone. The morphology of this ripple field (e.g., wavelength, amplitude, and diffusion velocity) is a direct response to a combination of environmental factors such as surface tension, viscosity, wind, and water flow. The drone's onboard vision sensor (e.g., an industrial camera with a resolution of 1920x1080 pixels and a frame rate of 60fps) is pointed vertically downwards, continuously capturing images of the ripple field generated by the downwash. The continuous sequence of images captured by the camera constitutes a real-time video stream. A processing device (located at a ground station or in the cloud) receives this video stream in real time via a wireless communication link. For example, within the first second of time t=39, the processing device receives 60 frames of images. Each frame can be represented as a 1920×1080 pixel matrix, where each pixel value (e.g., grayscale value range 0-255) represents the light reflection intensity of the water surface at that point, indirectly reflecting the instantaneous slope of the water surface. This series of high-frequency image matrices constitutes the real-time environmental response data in this embodiment. The processing device knows the current spatiotemporal position of the UAV (coordinates (19.2, 34.5), time t=39). However, the generated four-dimensional predicted glare index field is based on a discrete grid with an integer index spatial resolution. 
Therefore, the device needs to obtain the predicted value of the precise position through interpolation calculation. Suppose that the device queries the index field and finds the predicted glare indices of the four grid nodes closest to the drone's position at time t=39 as follows: PGIinst(19,34)=0.992; PGIinst(20,34)=0.991; PGIinst(19,35)=0.993; PGIinst(20,35)=0.992; The processing device uses the bilinear interpolation algorithm to calculate the predicted glare index at position (19.2,34.5): PGIinst(19.2,34.5)=0.9923; S3 Step-by-Step Explanation: From Predicting Glare Index to Generating an Ideal Model; In practical applications, the "ideal ripple field spatiotemporal prediction model" is not a large and independent software program, but a mathematical function defined by a set of core parameters (i.e., dominant wavelength λ0, attenuation coefficient α, and anisotropy ε) that can accurately describe the physical behavior of ripples. Therefore, the "spatiotemporal prediction model library" is a pre-set database or lookup table that stores the mapping relationship between different water surface turbulence levels (quantified by the calibrated glare index) and the corresponding core parameter sets.
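The bilinear interpolation at position (19.2, 34.5) can be reproduced as follows. This is a minimal sketch that assumes the index field is available as a dictionary keyed by integer grid coordinates; the function name `bilinear` and the variable `pgi_field` are hypothetical.

```python
import math

def bilinear(x: float, y: float, grid: dict) -> float:
    """Bilinearly interpolate a value at (x, y) from the four surrounding grid nodes."""
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = x0 + 1, y0 + 1
    fx, fy = x - x0, y - y0
    # Interpolate along x on both rows, then along y between the rows.
    row_low = grid[(x0, y0)] * (1 - fx) + grid[(x1, y0)] * fx
    row_high = grid[(x0, y1)] * (1 - fx) + grid[(x1, y1)] * fx
    return row_low * (1 - fy) + row_high * fy

pgi_field = {(19, 34): 0.992, (20, 34): 0.991, (19, 35): 0.993, (20, 35): 0.992}
pgi_at_uav = bilinear(19.2, 34.5, pgi_field)
print(round(pgi_at_uav, 4))  # 0.9923
```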
[0043] When the processing device obtains a real-time calculated predicted glare index, it queries this spatiotemporal prediction model library. If the predicted glare index value exactly matches a calibration value in the library, the core parameter set corresponding to that calibration value is directly selected. If the predicted glare index value lies between two adjacent calibration values in the library, a new core parameter set that corresponds continuously to the current predicted glare index value is generated through interpolation, as shown in Table 2 (Spatiotemporal Prediction Model Library). The predicted glare index at position (19.2, 34.5) is PGIinst(19.2, 34.5) = 0.9923; referring to Table 2, this value lies between model ID M010 (corresponding index 0.9) and model ID M011 (corresponding index 1.0). Core parameter interpolation calculation: the processing device performs linear interpolation on the three core parameters to obtain a parameter set that precisely matches the predicted glare index 0.9923: dominant wavelength λ0 = 6.5 + (5.0 - 6.5) × (0.9923 - 0.9) / (1.0 - 0.9) = 5.1155 cm; attenuation coefficient α = 0.910 + (1.000 - 0.910) × (0.9923 - 0.9) / (1.0 - 0.9) = 0.99307 m⁻¹; anisotropy degree ε = 2.620 + (3.000 - 2.620) × (0.9923 - 0.9) / (1.0 - 0.9) = 2.97074. In these formulas, 0.9923 is the target input value calculated for the UAV's current spatiotemporal point (19.2, 34.5, t=39), i.e., the predicted glare index; 0.9 and 1.0 are the two known input value boundaries in Table 2 that "sandwich" the target input value 0.9923. 0.9 is the "calibrated predicted glare index" corresponding to model ID M010 in the spatiotemporal prediction model library.
[0044] 1.0 is the "calibrated predicted glare index" corresponding to model ID M011 in the spatiotemporal prediction model library. 6.5 and 5.0 are known output value boundaries that correspond one-to-one with the two input value boundaries mentioned above: 6.5 (in cm) is the value of the "core parameter dominant wavelength λ0" in row M010 of Table 2, and 5.0 (in cm) is the value of the "core parameter dominant wavelength λ0" in row M011 of Table 2. The processing device has thus generated the calibration reference spatiotemporal prediction model at the current spatiotemporal point. The calibration reference spatiotemporal prediction model is uniquely determined by this set of precise core parameters (λ0 = 5.1155 cm, α = 0.99307 m⁻¹, ε = 2.97074).
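The parameter lookup with interpolation can be sketched as below. Only the two library rows quoted in the embodiment (M010 and M011) are included; the dictionary layout, key names, and the function name `interpolate_parameters` are hypothetical.

```python
# Two rows of the model library as quoted in the text; other rows are not reproduced here.
library = [
    {"id": "M010", "pgi": 0.9, "wavelength_cm": 6.5, "attenuation_per_m": 0.910, "anisotropy": 2.620},
    {"id": "M011", "pgi": 1.0, "wavelength_cm": 5.0, "attenuation_per_m": 1.000, "anisotropy": 3.000},
]

def interpolate_parameters(pgi: float, rows: list) -> dict:
    """Return the core parameter set for `pgi`, interpolating linearly between adjacent rows."""
    rows = sorted(rows, key=lambda r: r["pgi"])
    for low, high in zip(rows, rows[1:]):
        if low["pgi"] <= pgi <= high["pgi"]:
            t = (pgi - low["pgi"]) / (high["pgi"] - low["pgi"])
            return {
                key: low[key] + (high[key] - low[key]) * t
                for key in ("wavelength_cm", "attenuation_per_m", "anisotropy")
            }
    raise ValueError("predicted glare index lies outside the library range")

params = interpolate_parameters(0.9923, library)
print({k: round(v, 5) for k, v in params.items()})
# {'wavelength_cm': 5.1155, 'attenuation_per_m': 0.99307, 'anisotropy': 2.97074}
```

An exact match with a calibration value falls out of the same code path, since the interpolation fraction is then 0 or 1.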
[0045] S4. After generating the ripple field difference map, the processing device performs a feature processing step dynamically guided by the core parameters of the ideal ripple field. Using the calculated dominant wavelength λ0 and anisotropy ε, a set of spatial frequency matched filters is configured. These filters are precisely tuned to suppress residuals in the difference map that conform to the expected background ripple structure. Simultaneously, based on a complete baseline model including the attenuation coefficient α, an idealized expected optical flow field is generated and vector-subtracted from the measured optical flow field calculated from the real-time video stream to obtain a motion residual field containing only anomalous motion patterns. Finally, by fusing the structural anomalous signal and the motion residual field, a highly clean, denoised spatiotemporal distortion fusion map is generated.
[0046] The following is a specific example of generating a spatiotemporal distortion fusion map. The input is a 1920x1080 image, the current frame captured from the real-time video stream. Step S4 generates from it a spatiotemporal distortion fusion map that clearly indicates potential subtle anomalies. For detailed illustration, three representative pixels in the image are selected for full data tracing.
[0047] Pixel A1 (coordinates 200, 350): Located in a normal background ripple area. Pixel A2 (coordinates 800, 500): Located in an area with a slightly altered shape caused by a real underwater obstacle. Pixel A3 (coordinates 1500, 600): Located in a noise point formed by a momentary reflection from the water surface, inconsistent with the background shape and movement.
[0048] S41. The processing device performs a pixel-level difference calculation between the real-time acquired current frame image and the ideal image generated by the "calibration benchmark spatiotemporal prediction model", obtaining the ripple field difference map D(x,y). At each pixel, D(x,y) is the absolute value of the difference between the brightness value of the current frame image and the brightness value predicted by the ideal model, in the pixel value range 0-255. D(x,y) is the original input for all subsequent analyses and mixes real anomalies with background residuals. Example: D(A1)=30 (background area, small difference, mainly normal residuals from model prediction); D(A2)=180 (real anomaly, water surface morphology is destroyed, large difference from the ideal model); D(A3)=120 (reflective noise, sudden brightness change, large difference from the ideal model); S42. Structural anomaly processing: structural anomalies that do not conform to the expected ripple pattern are separated from D(x,y). The parameters used are the dominant wavelength λ0 and the anisotropy ε; λ0 is defined as the spatial period of the ideal background ripple, and ε is its directional stretching. In this example, the values are: λ0 = 50 pixels, ε = 3.0; these two core parameters "tell" the filter what kind of background texture to match and suppress.
[0049] The processing device configures a set of spatial frequency matched filters using λ0=50 and ε=3.0. The filters are applied to D(x,y) to obtain the background response map Gresponse(x,y), defined as the matching response strength of the filter on D(x,y) to the expected background ripples. Its value is high where the ripples match expectations and low where anomalies or noise do not match. In this example, with pixel values in the range 0-255: Gresponse(A1)=28 (highly matched to the expected ripples, strong response); Gresponse(A2)=10 (morphology is disrupted, mismatch, weak response); Gresponse(A3)=5 (sharp bright spots, non-periodic ripples, mismatch, weak response). Calculate the structural anomaly signal: Mstructraw(x,y) = max(0, D(x,y) - Gresponse(x,y)); at coordinate A1, Mstructraw(A1) = max(0, 30 - 28) = 2; at coordinate A2, Mstructraw(A2) = max(0, 180 - 10) = 170; at coordinate A3, Mstructraw(A3) = max(0, 120 - 5) = 115. Normalization yields the structural anomaly value of each pixel, denoted Mstruct(x,y): a floating-point number in the range [0,1] that quantifies the degree of mismatch between the morphological structure at each pixel coordinate (x,y) and the ideal background. In this example, the values (normalized with 255 as the maximum value) are: Mstruct(A1) = 2 / 255 ≈ 0.008; Mstruct(A2) = 170 / 255 ≈ 0.667; Mstruct(A3) = 115 / 255 ≈ 0.451; S43. Motion anomaly processing, used to identify dynamic anomalies that do not conform to expected motion patterns.
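The per-pixel structural anomaly arithmetic of step S42 can be sketched as follows. The matched filtering itself is not modeled; the background responses are taken directly from the example, and the function name `structural_anomaly` is hypothetical.

```python
def structural_anomaly(difference: float, background_response: float, max_value: float = 255.0):
    """Return (raw, normalized) structural anomaly for one pixel.

    raw = max(0, D - Gresponse); normalized = raw / max_value, in [0, 1].
    """
    raw = max(0.0, difference - background_response)
    return raw, raw / max_value

# (D, Gresponse) for the three traced pixels of the example.
pixels = {"A1": (30, 28), "A2": (180, 10), "A3": (120, 5)}
m_struct = {}
for name, (d, g) in pixels.items():
    raw, normalized = structural_anomaly(d, g)
    m_struct[name] = normalized
    print(name, raw, round(normalized, 3))
# A1 2.0 0.008
# A2 170.0 0.667
# A3 115.0 0.451
```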
[0050] The inputs are the attenuation coefficient α, the expected optical flow field Videal, and the measured optical flow field Vreal. α is one of the parameters used to generate the ideal model and affects the prediction of ripple motion. Videal is the ideal velocity field generated without disturbance based on the complete model (including α). Vreal is the actual pixel motion velocity field calculated from the real video. In this example, the vector values (unit: pixels / frame) are: Videal(A1)=(2.0, 1.0), Vreal(A1)=(2.1, 1.1) (background area, motion matches expectations); Videal(A2)=(2.0, 1.0), Vreal(A2)=(0.3, 0.2) (real anomaly, water flow is obstructed, actual motion is much slower than expected); Videal(A3)=(2.0, 1.0), Vreal(A3)=(-4.0, 3.0) (reflective noise, apparent motion is violent and irregular). Calculate the motion residual field and take its modulus: Mmotionraw(x,y) = ||Vreal(x,y) - Videal(x,y)||; Mmotionraw(A1) = ||(0.1, 0.1)|| ≈ 0.141; Mmotionraw(A2) = ||(-1.7, -0.8)|| ≈ 1.879; Mmotionraw(A3) = ||(-6.0, 2.0)|| ≈ 6.325. Normalization yields the final motion anomaly value: Mmotion(x,y) = Mmotionraw(x,y) / Vdiffmax, where the parameter Vdiffmax is the maximum velocity difference used for normalization and is set to 7.0 pixels / frame. Mmotion(x,y) is a floating-point number in the range [0,1] that quantifies the degree of mismatch between the motion pattern of each pixel and the ideal background.
[0051] In this example, the values are: Mmotion(A1) = 0.141 / 7.0 ≈ 0.020; Mmotion(A2) = 1.879 / 7.0 ≈ 0.268; Mmotion(A3) = 6.325 / 7.0 ≈ 0.904; S44. A spatiotemporal distortion fusion map is generated. By fusing anomalous signals from both structural and motion dimensions, the final decision-making basis is produced. The two normalized outlier values, Mstruct and Mmotion, are multiplied pixel-by-pixel, and the result is then mapped back to the image brightness range of 0-255. The spatiotemporal distortion fusion map S(x,y) defines the final, highly clean, denoised image. Its pixel value (0-255) represents the overall confidence level that a true anomaly exists at that point.
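The motion anomaly values of step S43 can be reproduced with the sketch below. The measured velocity at A3 is taken as (-4.0, 3.0), which is what yields a residual of modulus 6.325 against the ideal velocity (2.0, 1.0). Clamping the normalized value to 1.0 is an added assumption for residuals larger than Vdiffmax; the function name `motion_anomaly` is hypothetical.

```python
import math

V_DIFF_MAX = 7.0  # pixels / frame, normalization constant from the embodiment

def motion_anomaly(v_real, v_ideal, v_diff_max=V_DIFF_MAX):
    """Return (raw, normalized) motion anomaly: modulus of the residual velocity vector."""
    raw = math.hypot(v_real[0] - v_ideal[0], v_real[1] - v_ideal[1])
    return raw, min(1.0, raw / v_diff_max)  # clamp is an assumption, not from the text

v_ideal = (2.0, 1.0)
measured = {"A1": (2.1, 1.1), "A2": (0.3, 0.2), "A3": (-4.0, 3.0)}
m_motion = {}
for name, v_real in measured.items():
    raw, normalized = motion_anomaly(v_real, v_ideal)
    m_motion[name] = normalized
    print(name, round(raw, 3), round(normalized, 3))
# A1 0.141 0.02
# A2 1.879 0.268
# A3 6.325 0.904
```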
[0052] The calculation formula is S(x,y)=Mstruct(x,y)×Mmotion(x,y)×255; In this example, the final pixel value S(A1) (background) = 0.008 × 0.020 × 255 ≈ 0.04 (almost 0, pure black). S(A2) (True Anomaly) = 0.667 × 0.268 × 255 ≈ 45.6 (Presented as a medium-brightness pixel); S(A3) (reflective noise) = 0.451 × 0.904 × 255 ≈ 104.0 (represented as a high-brightness pixel); S45. Based on the spatiotemporal distortion fusion map, calculate its normalized ripple field distortion energy index Ed, which comprehensively quantifies the degree of physical deviation of the real-time signal. When the ripple field distortion energy index Ed exceeds the preset distortion energy threshold, it indicates a distortion risk that does not match the environmental prediction, and the processing device then generates a detection result signal indicating the presence of hidden obstacles.
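The pixel-wise product fusion of step S44 is short enough to check directly. The sketch below uses the rounded anomaly values quoted in the example; the function name `fuse_pixel` is hypothetical.

```python
def fuse_pixel(m_struct: float, m_motion: float) -> float:
    """Fuse structural and motion anomaly values into a 0-255 brightness value."""
    return m_struct * m_motion * 255.0

fused = {
    "A1": fuse_pixel(0.008, 0.020),  # background
    "A2": fuse_pixel(0.667, 0.268),  # true anomaly
    "A3": fuse_pixel(0.451, 0.904),  # reflective noise
}
print(round(fused["A1"], 2), round(fused["A2"], 1), round(fused["A3"], 1))  # 0.04 45.6 104.0
```

In these numbers the reflective-noise pixel A3 still scores above the true-anomaly pixel A2 after fusion, so a single pixel value does not separate the two cases; the region-level analysis in step S45 operates on connected regions rather than isolated pixels.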
[0053] The specific method for obtaining the ripple field distortion energy index Ed aims to transform pixel-level information in the spatiotemporal distortion fusion map into a normalized scalar that can characterize anomalous regions. The calculation process includes the following steps: The input "denoised spatiotemporal distortion fusion map" (denoted as matrix S) is binarized, with pixels above a preset low-noise threshold set to 1 and the rest set to 0, forming a binary mask. Subsequently, candidate anomaly region analysis is performed on this mask to identify spatially independent candidate anomaly regions composed of anomalous pixels, denoted as C1, C2, ..., Cn.
[0054] Regional energy and spatial compactness weighting: For each identified k-th candidate anomaly region Ck, two core metrics are calculated: raw energy and spatial compactness. The raw energy, denoted Eraw(k), is obtained by summing the values of all pixels of the original spatiotemporal distortion fusion map S that lie within the k-th candidate anomaly region Ck. Spatial compactness, denoted Wc(k), is the ratio of the area A(k) of the k-th candidate anomaly region Ck to its minimum convex hull area AH(k): Wc(k) = A(k) / AH(k). This value is between 0 and 1; the closer the value is to 1, the more regular and "clustered" the region is, rather than elongated or scattered noise. The weighted energy Eweighted(k) = Eraw(k) × Wc(k) is calculated for each candidate anomaly region. This step aims to highlight regularly shaped anomaly regions while suppressing residual noise with irregular shapes. The largest weighted energy value Emax = max(Eweighted(k)) among all candidate anomaly regions is selected as the final energy representation. To eliminate the influence of factors such as image size and sensor gain, Emax is compared with a preset reference energy benchmark Eref, obtained through experimental calibration. Eref represents the weighted energy value that a standard-sized reference object produces under typical conditions. The final distortion energy index Ed is calculated as follows: Ed = Emax / Eref. Data example: in a 1920x1080 pixel "denoised spatiotemporal distortion fusion map", most areas have a pixel value of 0. Candidate anomaly region analysis identifies two candidate anomaly regions: C1, a long, narrow, and scattered region consisting of 100 pixels, possibly noise from water surface reflections; and C2, a relatively rounded region consisting of 450 pixels, suspected to be a genuine anomaly.
Weighted calculation of region energy and spatial compactness: For region C1: raw energy Eraw(1) = 5,000 (total pixel values); spatial compactness Wc(1) = area(100) / convex hull area(300) = 0.33; weighted energy Eweighted(1) = 5,000 × 0.33 = 1,650; For region C2: raw energy Eraw(2) = 40,000; spatial compactness Wc(2) = area(450) / convex hull area(500) = 0.90; weighted energy Eweighted(2) = 40,000 × 0.90 = 36,000; The maximum weighted energy is selected as Emax = max(1650, 36000) = 36,000. Assume the system's reference energy baseline Eref is calibrated to 44,444. The final distortion energy index is calculated as: Ed = 36,000 / 44,444 ≈ 0.81.
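The distortion energy index of the data example can be recomputed with the following sketch. Region extraction and convex hull computation are not modeled; the per-region sums and areas are taken from the example, and the `Region` class and function name are hypothetical. Returning 0.0 when no region is found is an added assumption.

```python
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    raw_energy: float   # Eraw: sum of fusion-map pixel values inside the region
    area: float         # A: number of pixels in the region
    hull_area: float    # AH: area of the minimum convex hull

    @property
    def compactness(self) -> float:
        return self.area / self.hull_area          # Wc, between 0 and 1

    @property
    def weighted_energy(self) -> float:
        return self.raw_energy * self.compactness  # Eweighted

def distortion_energy_index(regions, e_ref: float) -> float:
    """Ed = max weighted energy over all candidate regions, divided by Eref."""
    if not regions:
        return 0.0  # assumption: no candidate region means no distortion energy
    return max(r.weighted_energy for r in regions) / e_ref

regions = [Region("C1", 5_000, 100, 300), Region("C2", 40_000, 450, 500)]
ed = distortion_energy_index(regions, e_ref=44_444)
obstacle_detected = ed > 0.5  # TE = 0.5 in the embodiment
print(round(ed, 2), obstacle_detected)  # 0.81 True
```

Without rounding Wc(1) to 0.33, the weighted energy of C1 comes out at about 1,667 rather than 1,650; this does not affect Emax or Ed.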
[0055] The calculated Ed=0.81 is compared with the preset distortion energy threshold TE=0.5.
[0056] Since 0.81 > 0.5, the system determines that an abnormal distortion has been detected and generates a detection result signal indicating the presence of a hidden obstacle.
[0057] It should be noted that the distortion energy threshold, denoted TE, is a preset threshold used to determine whether the calculated normalized ripple field distortion energy index has reached the significance level indicating the presence of a hidden obstacle. In this embodiment, the preferred value is 0.5. The "distortion energy threshold TE" is the core parameter of the detection decision module of this invention, and its value directly affects the system's detection performance. There is an inherent trade-off in setting TE: a lower TE improves the system's sensitivity to weak anomalies, but may misjudge slight distortions caused by environmental noise (such as water surface ripples or small-scale turbulence) as obstacles, leading to an increased false alarm rate. A higher TE enhances the system's ability to suppress environmental noise, making the system more robust, but may ignore distortions caused by real obstacles that are small in size or have inconspicuous physical properties, leading to an increased missed detection rate. Therefore, the optimal value of this parameter is not determined in isolation, but is obtained through the following joint calibration experiments. Experimental environment and reference preparation: in this embodiment, a standardized water tank experimental environment is prepared, with tank dimensions of 5 meters × 5 meters and a water depth of 2 meters, equipped with a device for precisely controlling the water flow velocity and surface wind field intensity. Two types of key reference objects are also prepared. The target obstacle sample set is a group of objects with different sizes, shapes, and materials that simulate real concealed obstacles.
These include spheres with diameters ranging from 5 centimeters to 30 centimeters, PVC pipes of different lengths, and a flexible mesh structure simulating a fishing net. The noise simulation source is an adjustable-power bubble generator placed at the bottom of the tank to simulate non-obstacle-related local surface turbulence generated by underwater gas escape or biological activity, serving as a typical environmental noise source.
[0058] False negative rate calibration experiment: each sample from the "target obstacle sample set" was placed sequentially in the water tank and fixed at different depths below the water surface (10 cm, 30 cm, and 50 cm). For each combination of obstacle and depth, the drone was driven to hover directly above the obstacle, and the normalized ripple field distortion energy index was continuously calculated according to the method of this invention, recording the stable value. Multiple sets of experiments were repeated to obtain the distortion energy index distribution curves for various real obstacles. False alarm rate calibration experiment: all target obstacles were removed, and the "noise simulation source" (bubble generator) was started, with multiple sets of experiments run at different power levels. At each power level, the drone was driven to hover directly above the noise source, and the normalized ripple field distortion energy index was continuously calculated and recorded, yielding the distortion energy index distribution curves for various typical environmental noises. In offline data analysis, the search range for the distortion energy threshold TE was set to [0.1, 0.9], with a step size of 0.01. For each candidate threshold TE, the following statistics were computed from the two types of data distribution curves collected. Calculate the false negative rate: in the data from the "False Negative Rate Calibration Experiment," count the number of times the distortion energy index generated by real obstacles is lower than the current candidate threshold TE. The proportion of these instances to the total number of real obstacle experiments is the false negative rate at that threshold. Calculate the false positive rate: in the data from the "False Alarm Rate Calibration Experiment," count the number of times the distortion energy index generated by environmental noise is higher than the current candidate threshold TE. 
The proportion of these instances to the total number of noise experiments is the false positive rate at that threshold. Finally, plot the false negative rate curve and the false positive rate curve as a function of the threshold TE. Select the threshold point that minimizes the total error rate (false negative rate + false positive rate), or, based on the specific application scenario (setting a minimum false negative rate for safety-critical scenarios), select the threshold point that minimizes the false positive rate while meeting specific performance indicators (e.g., a false negative rate below 1%), as the final preferred value. In this embodiment, the preferred value of the distortion energy threshold TE determined by the above method is 0.5.
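The threshold search described above can be sketched as a simple sweep. The score lists below are hypothetical placeholders, not experimental data from the embodiment, so the selected value is illustrative only; the function name `sweep_threshold` and the `max_fnr` option are likewise hypothetical.

```python
def sweep_threshold(obstacle_scores, noise_scores, lo=0.10, hi=0.90, step=0.01, max_fnr=None):
    """Scan candidate TE values and return (criterion, te, fnr, fpr) for the best one.

    Without `max_fnr`, the criterion is the total error rate fnr + fpr.
    With `max_fnr`, only thresholds with fnr <= max_fnr are kept and fpr is minimized.
    Ties are resolved in favor of the lowest threshold.
    """
    best = None
    for k in range(round((hi - lo) / step) + 1):
        te = round(lo + k * step, 2)
        fnr = sum(s < te for s in obstacle_scores) / len(obstacle_scores)  # missed obstacles
        fpr = sum(s > te for s in noise_scores) / len(noise_scores)        # noise flagged
        if max_fnr is not None and fnr > max_fnr:
            continue
        criterion = fpr if max_fnr is not None else fnr + fpr
        if best is None or criterion < best[0]:
            best = (criterion, te, fnr, fpr)
    return best

# Hypothetical distortion energy indices, for illustration only.
obstacle_scores = [0.55, 0.62, 0.81, 0.74, 0.48, 0.90, 0.67, 0.58]
noise_scores = [0.12, 0.20, 0.35, 0.41, 0.28, 0.52, 0.17, 0.30]

total_error, te, fnr, fpr = sweep_threshold(obstacle_scores, noise_scores)
print(te, fnr, fpr)
```

When several thresholds give the same error, as happens with small samples, the tie-breaking rule decides the result, which is one reason to collect enough runs per condition.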
[0059] The steps for generating linkage control commands in S5 specifically include: calculating the target energy field intensity value through the optimal energy field intensity mapping function; wherein the target energy field intensity value (Etarget) output by the optimal energy field intensity mapping function is the linear superposition of the base energy component (Ebase) and the channel compensation energy component (Ecomp), i.e., Etarget = Ebase + Ecomp; the base energy component is set to a preset constant value; the channel compensation energy component is calculated as a function of the predicted glare index and the current flight altitude (H) of the UAV, and its form is defined as: Ecomp = g × (PGIinst / (1 - PGIinst)) × H^γ, where g is the preset channel compensation gain coefficient, γ is the altitude influence index, and γ > 1. The function is configured such that when the predicted glare index approaches 1, the channel compensation energy component increases nonlinearly, and the magnitude of this increase grows faster than linearly (as H^γ) with increasing flight altitude. Energy decomposition divides the total energy into "base energy" and "compensation energy," a structure that is consistent with engineering practice. Base energy ensures basic detection capability under any circumstances, while compensation energy is the "cost" incurred to overcome environmental noise. This decomposition is logically clear, easy to understand, and easy to implement. The channel compensation term PGIinst / (1 - PGIinst) has a clear physical interpretation. PGIinst represents the severity of the channel's noise level. When PGIinst is very small (close to 0, the channel is good), this term is also close to 0, and the compensation energy is very small. When PGIinst approaches 1 (the channel is extremely bad), the denominator (1 - PGIinst) approaches 0, and the value of the entire fraction increases sharply and nonlinearly. 
This reflects the physical reality that maintaining communication (probing) requires a very large energy cost when channel quality is about to fail completely. The altitude influence term H^γ (with γ > 1) also aligns with physical intuition. The higher the drone flies, the more significant the energy loss of its rotor downwash airflow upon reaching the water surface. Therefore, to generate ripples of the same intensity on the water surface, the energy required must increase faster than linearly with altitude. γ, as an exponent greater than 1, characterizes this nonlinear relationship.
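The mapping function can be sketched as below. The numeric values of Ebase, g, and γ are hypothetical placeholders, since the embodiment does not state them, and the cap on PGIinst is an added assumption to avoid division by zero when the index equals 1.

```python
def target_energy(pgi: float, height_m: float, e_base: float = 1.0,
                  gain: float = 0.05, gamma: float = 1.5, pgi_cap: float = 0.999) -> float:
    """Etarget = Ebase + g * (PGI / (1 - PGI)) * H**gamma.

    e_base, gain, gamma and pgi_cap are illustrative values, not from the embodiment.
    """
    if not 0.0 <= pgi <= 1.0:
        raise ValueError("predicted glare index must lie in [0, 1]")
    if gamma <= 1.0:
        raise ValueError("gamma must be greater than 1")
    pgi = min(pgi, pgi_cap)  # assumption: keep the denominator away from zero
    e_comp = gain * (pgi / (1.0 - pgi)) * height_m ** gamma
    return e_base + e_comp

for pgi in (0.0, 0.5, 0.9, 0.9923):
    print(pgi, round(target_energy(pgi, height_m=5.0), 3))
```

With these placeholder values, doubling the altitude multiplies the compensation component by 2^1.5, roughly 2.83, which is the faster-than-linear growth the text describes.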
[0060] In S5, the linkage control command also includes a trajectory adjustment command for avoiding hidden obstacles identified by the detection result signal; when the first prediction data is lost or its confidence level is lower than the preset confidence level threshold, the linkage control command is forced to be set to a preset conservative detection strategy, which corresponds to the preset flight altitude and rotor speed under the worst operating conditions based on historical statistics.
[0061] The method also includes step S6: after the linkage control command is executed, continuously monitor the temporal change of the normalized ripple field distortion energy index; extract the energy spectral density within the preset characteristic frequency range by performing a Fourier transform on the temporal sequence of the normalized ripple field distortion energy index; and classify and identify the material properties or motion state of the hidden obstacles by comparing the energy spectral density with a preset feature library that stores the correspondence between different obstacle materials and shaking modes.
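Step S6 can be illustrated with a small frequency-band energy estimate over a time series of Ed values. This is a sketch only: it uses a direct discrete Fourier transform and a periodogram-style energy estimate, and the band limits, class labels, sampling rate, and energy floor are all hypothetical, since the embodiment does not list the contents of the feature library.

```python
import cmath
import math

def band_energy(samples, sample_rate_hz, f_lo, f_hi):
    """Sum |X(k)|^2 / n over DFT bins whose frequency lies in [f_lo, f_hi]."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant (DC) level
    energy = 0.0
    for k in range(1, n // 2 + 1):
        if f_lo <= k * sample_rate_hz / n <= f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered))
            energy += abs(coeff) ** 2 / n
    return energy

def classify(samples, sample_rate_hz, feature_library, min_energy=1e-6):
    """Pick the band with the most energy; report no periodic motion below the floor."""
    scores = {label: band_energy(samples, sample_rate_hz, lo, hi)
              for label, (lo, hi) in feature_library.items()}
    label = max(scores, key=scores.get)
    return (label if scores[label] >= min_energy else "no periodic motion"), scores

# Hypothetical feature library: label -> characteristic frequency band in Hz.
feature_library = {"slow sway": (0.2, 1.0), "fast vibration": (2.0, 5.0)}

rate = 20.0  # hypothetical: 20 Ed values per second, 10 seconds of data
swaying = [0.8 + 0.1 * math.sin(2 * math.pi * 0.5 * t / rate) for t in range(200)]
label, scores = classify(swaying, rate, feature_library)
print(label)  # slow sway
```

A constant Ed series, as a fixed rigid obstacle might produce, carries no energy in either band and falls through to the "no periodic motion" result.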
[0062] The core of this invention is to infer the presence of underwater obstacles by analyzing the distortion of the "ripple field" generated by the drone on the water surface. Imagine this ripple field as a "detection net" cast by the drone on the water. When a certain point on this "net" deforms due to an underwater obstacle, the system issues an alarm. However, this "avoidance" is not traditional avoidance to prevent physical collisions, but rather "active detection-oriented avoidance." Its core principle is as follows: the avoidance principle is not to prevent collisions, but to see more clearly. When the drone performs an "avoidance" maneuver, its primary purpose is not to fly away from the danger zone, but to actively adjust its position and attitude to move to a more favorable position for "observing" the distorted area. Changing the observation angle: By flying laterally or hovering, the drone can observe the ripple distortion from different angles. This is similar to subconsciously adjusting one's observation angle to see a reflective object more clearly. This effectively eliminates misjudgments caused by environmental factors such as solar glare and water surface reflection at certain angles, further confirming the authenticity of the distortion.
[0063] Optimizing the detection energy field: Changes in the drone's position alter the location and intensity of its rotor downwash airflow acting on the water surface. By adjusting its flight path, the drone can more precisely focus its detection energy (i.e., downwash airflow) on the area where the suspected obstacle is located, or adjust the energy intensity by changing the distance to obtain the distortion signal with the highest signal-to-noise ratio.
[0064] Initial detection (S4): The system calculates the "normalized ripple field distortion energy index" and preliminarily determines that an obstacle "may" exist. This is a "present or absent" signal.
[0065] Executing evasive maneuvers (S5's trajectory adjustment command): At this point, the system does not simply fly the drone away. Instead, it generates a specific "trajectory adjustment command." This command need not involve flying away in a straight line; it may instead call for small-scale circling, lateral movement, or an altitude adjustment. The drone is thereby controlled to perform a reconnaissance maneuver that moves it from its current position to a new observation location; this action essentially initiates a "detailed reconnaissance" procedure.
[0066] Secondary Confirmation and Attribute Analysis (S6): During the "avoidance" maneuver (i.e., detailed flight mode), the UAV continuously collects data. The system continuously monitors the temporal changes of the distortion energy index and analyzes its energy spectral density using methods such as Fourier transform. By comparing with the feature library, the system can classify and identify the material of the obstacle (is it rigid rock or flexible aquatic plants?) and its motion state (is it fixed or swaying?). If it simply flies away, the system cannot collect the richer data required for attribute analysis. Therefore, the "avoidance" maneuver is a key prerequisite for subsequent refined analysis (S6). Although the UAV will not collide with underwater obstacles, these obstacles pose a direct threat to the safety of other equipment that may be launched into the water (such as underwater robots, survey vessels, divers) or the waterway itself. The purpose of this invention is to provide high-precision underwater risk maps for these subsequent operations.
[0067] Therefore, when a drone detects an obstacle, it performs a "detailed avoidance survey" to: increase the confidence of the detection results, ensuring that the report sent to the operator is not a false alarm; and provide richer decision-making information, not just telling the operator "there is something underwater," but telling them "there is a suspected 2×3 meter rigid object 10 meters underwater," which provides extremely valuable decision-making basis for subsequent obstacle removal or navigation planning.
[0068] Specific Implementation: In the aforementioned steps, the system has calculated the entire spatiotemporal distortion fusion map S(x,y) and extracted the maximum saliency value Smax = 0.81 from it. The distortion energy threshold is TE = 0.5; since 0.81 > 0.5, the system has generated a "detection result signal = presence of a hidden obstacle," confirming the detection of an abnormal target underwater. The location coordinates of this abnormal target have also been recorded.
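The threshold decision described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `detect_obstacle`, the use of NumPy, and the 3×3 toy map are hypothetical and not part of the application; only Smax = 0.81 and TE = 0.5 come from the embodiment.

```python
import numpy as np

def detect_obstacle(fusion_map, threshold=0.5):
    """Compare the maximum saliency of a 2-D fusion map S(x, y) with TE.

    Returns (detected, s_max, (row, col)), where (row, col) is the grid
    cell holding the maximum value.
    """
    fusion_map = np.asarray(fusion_map, dtype=float)
    row, col = np.unravel_index(int(np.argmax(fusion_map)), fusion_map.shape)
    s_max = float(fusion_map[row, col])
    return s_max > threshold, s_max, (int(row), int(col))

# Toy map with illustrative values; only the peak (0.81) mirrors the embodiment.
demo_map = np.array([[0.10, 0.12, 0.09],
                     [0.11, 0.81, 0.30],
                     [0.08, 0.25, 0.14]])
detected, s_max, peak_cell = detect_obstacle(demo_map, threshold=0.5)
```

In a real pipeline the grid cell would still have to be converted to the recorded location coordinates; that conversion is not described in this passage and is left out here.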
[0069] Step S5: Generate linkage control commands. After the system receives the "there is a hidden obstacle" signal, it immediately executes S5 to generate a composite command that includes energy regulation and trajectory adjustment.
[0070] S51: Calculate the target energy field intensity value (Etarget); Parameter preset: The base energy component (Ebase) is set to a constant value of 100 units (to ensure basic detection capability).
[0071] The channel compensation gain factor (g) is set to 0.5. The height influence index (γ) is set to 1.5 (to meet the requirement of γ>1).
[0072] Predicted glare index (PGIinst): The system calculates this value to be 0.6 in real time based on the current lighting and water surface reflection.
[0073] UAV current flight altitude (H): Read from the flight control system; the current altitude is 15 meters. Calculate the channel compensation energy component (Ecomp): according to the formula $E_{\text{comp}} = g \times \frac{\text{PGI}_{\text{inst}}}{1 - \text{PGI}_{\text{inst}}} \times H^{\gamma}$, we obtain $E_{\text{comp}} = 0.5 \times \frac{0.6}{1 - 0.6} \times 15^{1.5} \approx 43.57$ units. Calculate the final target energy field intensity value (Etarget): according to the formula $E_{\text{target}} = E_{\text{base}} + E_{\text{comp}}$, we obtain $E_{\text{target}} = 100 + 43.57 = 143.57$ units. S52: Generate complete linkage control commands, specifically converting the calculation results and detection position into specific flight control commands, including: the first command, which converts the target energy field intensity value of 143.57 units into the rotor target speed through a preset mapping table, as shown in Table 3 below.
[0074] Table 3: Example of target energy field intensity versus rotor target rotation speed (RPM) mapping. Assuming the mapping relationship is linear, the first command is: "Adjust the average rotor speed of the quadcopter to 2850 RPM". The second command is based on the previously detected abnormal target position: in order to avoid the hidden obstacle, it controls the UAV to perform a reconnaissance maneuver and move from the current position to a new observation position. The command is: "Move 2 meters to the left at the current altitude (15 meters) and then hover," so that a second observation can be made from the new position.
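The S51 arithmetic can be checked with a short sketch. It assumes the compensation formula reads Ecomp = g × (PGIinst / (1 − PGIinst)) × H^γ, which is the form that reproduces the 43.57 value given in the embodiment; the function names are hypothetical. The energy-to-RPM mapping of Table 3 is not modeled, because its entries are not reproduced in this text.

```python
def channel_compensation(pgi_inst, altitude_m, gain=0.5, gamma=1.5):
    """Channel compensation energy component Ecomp (assumed formula, see lead-in)."""
    if not 0.0 <= pgi_inst < 1.0:
        # PGIinst = 1 would divide by zero; values outside [0, 1) are rejected.
        raise ValueError("pgi_inst must lie in [0, 1)")
    return gain * (pgi_inst / (1.0 - pgi_inst)) * altitude_m ** gamma

def target_energy(pgi_inst, altitude_m, e_base=100.0, gain=0.5, gamma=1.5):
    """Target energy field intensity: Etarget = Ebase + Ecomp."""
    return e_base + channel_compensation(pgi_inst, altitude_m, gain, gamma)

# Values from the embodiment: PGIinst = 0.6, H = 15 m, g = 0.5, gamma = 1.5, Ebase = 100.
e_comp = channel_compensation(0.6, 15.0)
e_target = target_energy(0.6, 15.0)
print(round(e_comp, 2), round(e_target, 2))  # 43.57 143.57
```

Because the ratio PGIinst / (1 − PGIinst) grows without bound as the glare index approaches 1, a practical implementation would probably also cap Ecomp; the source does not describe such a cap, so none is included here.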
[0075] S53: Example of a scenario triggering a conservative detection strategy, including the following: Suppose that during the execution of the first or second command, the image transmission signal between the UAV and the ground station is interrupted due to interference, resulting in the inability to receive real-time "first prediction data" (i.e., inability to calculate PGIinst). In this case, the system determines that the confidence level is below the threshold and forcibly triggers the conservative detection strategy. The system immediately abandons the calculation of Ecomp, directly queries the preset "worst-case" parameter library, generates the command: "Immediately adjust the flight altitude to 8 meters, set the rotor speed to 3200 RPM", and continues to execute it until the data link is restored.
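The fallback rule in S53 can be sketched as a small selection function. The 8 m altitude and 3200 RPM come from the example above; the `Command` type, the function name, and the confidence threshold of 0.7 are hypothetical placeholders, since the application only refers to "a preset confidence level threshold".

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Command:
    altitude_m: float
    rotor_rpm: float
    mode: str

# Worst-case parameters quoted in the S53 example.
CONSERVATIVE = Command(altitude_m=8.0, rotor_rpm=3200.0, mode="conservative")

def select_command(pgi_inst: Optional[float], confidence: float,
                   nominal: Command, confidence_threshold: float = 0.7) -> Command:
    """Return the nominal command unless the prediction is missing or untrusted.

    pgi_inst is None models loss of the first prediction data; the default
    threshold of 0.7 is an assumed value for illustration only.
    """
    if pgi_inst is None or confidence < confidence_threshold:
        return CONSERVATIVE
    return nominal

nominal = Command(altitude_m=15.0, rotor_rpm=2850.0, mode="nominal")
link_lost = select_command(None, confidence=0.9, nominal=nominal)
healthy = select_command(0.6, confidence=0.9, nominal=nominal)
```

Keeping the conservative command as a fixed constant means the fallback path never depends on the data whose loss triggered it.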
[0076] Step S6: Classification and identification of obstacle attributes. After the UAV executes the commands in S52 (translating, then hovering at 2850 RPM), the system enters S6 to perform in-depth analysis of the target.
[0077] S61. Time-series data monitoring and acquisition: The UAV hovers stably at the new position and rotor speed, continuously generating and detecting water surface ripples. The system records the "normalized ripple field distortion energy index" (assuming this is an indicator that reflects the degree of ripple anomaly in real time) for 10 consecutive seconds at a frequency of 30Hz.
[0078] The collected time series sample is: [0.78, 0.81, 0.76, 0.72, 0.75, 0.80, 0.77, 0.73, ...] (This sequence exhibits periodic oscillations due to the interaction between obstacles and ripples).
[0079] S62. Fourier Transform and Energy Spectral Density Extraction: The system performs a Fast Fourier Transform (FFT) on the sequence of 300 data points (10 seconds × 30 Hz) to generate a transformed spectrum. The spectrum shows a significant peak in the energy spectral density within the preset characteristic frequency range of 2 Hz to 4 Hz, with the peak center frequency located at approximately 2.8 Hz. The energy is lower in other frequency bands.
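A minimal sketch of the S62 step, assuming NumPy and a synthetic input: the embodiment lists only the first few samples, so the sequence here is a generated 2.8 Hz sinusoid around 0.77 with illustrative amplitude, not measured data. The function name `band_peak` is hypothetical. With 300 samples at 30 Hz the frequency resolution is 0.1 Hz, so 2.8 Hz falls on an FFT bin.

```python
import numpy as np

FS_HZ = 30.0           # sampling rate from S61
DURATION_S = 10.0      # recording length from S61
BAND_HZ = (2.0, 4.0)   # preset characteristic frequency range from S62

def band_peak(samples, fs_hz, band_hz):
    """Return (peak frequency in band, fraction of spectral energy in band)."""
    samples = np.asarray(samples, dtype=float)
    centered = samples - samples.mean()        # remove the DC offset
    spectrum = np.fft.rfft(centered)
    freqs = np.fft.rfftfreq(samples.size, d=1.0 / fs_hz)
    energy = np.abs(spectrum) ** 2             # energy per frequency bin
    in_band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    peak_index = int(np.argmax(energy[in_band]))
    band_fraction = energy[in_band].sum() / energy.sum()
    return float(freqs[in_band][peak_index]), float(band_fraction)

# Synthetic stand-in for the recorded index sequence (illustrative values).
t = np.arange(int(FS_HZ * DURATION_S)) / FS_HZ
synthetic = 0.77 + 0.04 * np.sin(2 * np.pi * 2.8 * t)
peak_hz, band_fraction = band_peak(synthetic, FS_HZ, BAND_HZ)
```

The band fraction gives a simple way to express "the energy is lower in other frequency bands" as a number; the application itself does not specify such a measure.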
[0080] S63. Feature Database Comparison and Recognition: The system matches the extracted 2.8 Hz peak feature against a preset obstacle feature database, as shown in Table 4.
Table 4: Preset obstacle feature library
S64. Output classification and recognition results: The detected 2.8 Hz peak matches the characteristic frequency range [2,5] of "partially submerged flexible objects (such as fishing nets)" in the preset obstacle feature library. The system updates the status or sends the final recognition report to the ground station via data link: "Underwater concealed obstacle detected, type identified as: flexible object (high probability of fishing net), location coordinates [x,y,z]." Through this series of steps, the system not only completes the "present or absent" detection, but also realizes intelligent optimization of the detection behavior, and further provides in-depth analysis of "what it is", forming a complete "detection-decision-action-analysis" closed loop.
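The S63/S64 lookup can be sketched as a range match. Only one library entry is quoted in the text, the [2,5] range for partially submerged flexible objects, so the library below contains just that entry; the unit is assumed to be Hz, and the structure and names are hypothetical.

```python
# (label, low_hz, high_hz); a single entry, taken from the S64 example.
FEATURE_LIBRARY = [
    ("partially submerged flexible object (e.g. fishing net)", 2.0, 5.0),
]

def classify_peak(peak_hz, library=FEATURE_LIBRARY):
    """Return the first label whose frequency range contains peak_hz, else None."""
    for label, low_hz, high_hz in library:
        if low_hz <= peak_hz <= high_hz:
            return label
    return None

label = classify_peak(2.8)
unknown = classify_peak(9.0)   # no matching entry in this one-row library
```

Returning `None` for an unmatched peak keeps "unknown type" distinct from any listed class, which matters when the report is forwarded to an operator.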
[0081] In Figure 1, the isometric view visually illustrates the physical interaction between the UAV, the ripple field it creates on the water surface, and underwater concealed obstacles, which forms the basis for the detection in this invention. The technical roadmap clearly reveals the logical steps of this invention in solving the technical problem. Specifically, the UAV and the ripple field below it together constitute the physical scene of step S2 (acquiring real-time environmental response data). The sunlight symbol in the figure and the first box "Environmental Prediction and Channel Awareness" in the roadmap together represent step S1 (acquiring the first prediction data), that is, making a forward-looking prediction of the environmental channel quality. The second box "Dynamic Benchmark Calibration and Data Processing" and the third box "Obstacle Identification and Attribute Analysis" in the roadmap correspond to the core processes of steps S3 and S4, that is, calibrating the benchmark model based on the prediction data (S1), processing real-time data (S2) to identify the ripple distortion caused by underwater obstacles (such as the underwater geometry shown in the figure), and finally generating the detection result signal. The last box in the roadmap, "Linked Control and Strategy Optimization," corresponds to step S5, whereby the system generates linked control commands to adjust the UAV's flight state and energy field based on the prediction results (S1) and detection results (S4), thereby achieving closed-loop optimization.
[0082] A UAV airway inspection system based on a multimodal spatiotemporal prediction model includes: an environmental prediction data acquisition module, used to acquire first prediction data characterizing the future environmental state of the target inspection water area, wherein the first prediction data is configured to quantify the visual detection channel quality of each spatial location in the target inspection water area within a future time window. The real-time response data acquisition module is used to acquire real-time environmental response data generated by the UAV's own energy field on the water surface medium and collected by airborne sensors. The dynamic benchmark calibration module is used to calibrate the dynamic benchmark spatiotemporal prediction model used to parse real-time environmental response data based on the first prediction data, so as to generate a calibration benchmark spatiotemporal prediction model adjusted by environmental channel state. The detection, analysis and processing module is used to process real-time environmental response data based on the calibration benchmark spatiotemporal prediction model to generate detection result signals that characterize the probability and physical properties of hidden obstacles in the water. The linkage control generation module is used to generate and execute linkage control commands for adjusting the flight status and energy field intensity of the UAV based on the first prediction data and the detection result signal.
[0083] It should be noted that all calculation formulas in this application are obtained by regression analysis, including but not limited to machine learning algorithms, which is used to analyze the collected relevant parameters in depth and identify their natural trends and interrelationships. Specialized software, such as Python's Scikit-learn library or the R language, is used to automatically generate mathematical spatiotemporal prediction models that match the data. The performance of the spatiotemporal prediction models is then objectively evaluated through methods such as cross-validation, combined with continuous feedback and optimization, to ensure that the resulting formulas truly reflect the inherent laws of the data, thereby guaranteeing their effectiveness and accuracy. In all calculation formulas in this application, the parameters in each formula undergo dimensionless processing within a consistent range to ensure that different physical quantities are compared on the same scale; dimensionless processing techniques include, but are not limited to, min-max normalization and Z-score standardization. The algorithm of this invention is implemented as a Python script. Before executing the core logic, the program first executes a data loading module (e.g., using the widely used pandas library in Python) configured to read the aforementioned spreadsheet file and load its contents into the program's working memory (e.g., a DataFrame data structure). Subsequent algorithm steps directly query and retrieve the required configuration parameters from this in-memory data structure.
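The two dimensionless processing techniques named above can be sketched as follows, assuming NumPy. The function names and the sample altitude values are illustrative only; the application does not state which parameters are scaled with which technique.

```python
import numpy as np

def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; a constant input maps to zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span

def z_score(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        return np.zeros_like(values)
    return (values - values.mean()) / std

# Hypothetical flight altitudes in meters, used only to exercise the functions.
altitudes_m = np.array([8.0, 12.0, 15.0, 20.0])
scaled = min_max_normalize(altitudes_m)
standardized = z_score(altitudes_m)
```

Min-max normalization preserves the bounded range that threshold comparisons rely on, while Z-score standardization is less sensitive to where the extremes happen to fall; which one suits a given parameter is a design choice the source leaves open.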
[0084] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.
Claims
1. A method for UAV airway inspection based on a multimodal spatiotemporal prediction model, characterized in that, The specific steps include: S1. Obtain first prediction data characterizing the future environmental state of the target inspection water area. The first prediction data is configured to quantify the visual detection channel quality of each spatial location in the target inspection water area within a future time window. S2. Acquire real-time environmental response data generated by the UAV's own energy field on the water surface and collected by airborne sensors; S3. Based on the first prediction data, the dynamic benchmark spatiotemporal prediction model used to analyze real-time environmental response data is calibrated to generate a calibrated benchmark spatiotemporal prediction model after environmental channel state adjustment. S4. Based on the calibration benchmark spatiotemporal prediction model, process real-time environmental response data to generate detection result signals that characterize the probability and physical properties of hidden obstacles in the water. S5. Based on the first prediction data and the detection result signal, generate and execute the linkage control command for adjusting the flight status and energy field intensity of the UAV.
2. The method for UAV airway inspection based on a multimodal spatiotemporal prediction model according to claim 1, characterized in that: S1 includes: The system acquires upstream hydrological data, regional meteorological data, and channel topology data covering the target inspection area through network interfaces. Furthermore, using a spatiotemporal prediction model, upstream hydrological data, regional meteorological data, and waterway topology data are processed to generate first prediction data. The first prediction data includes a spatiotemporal four-dimensional predicted glare index field, which assigns a value to the probability of specular reflection at each spatial coordinate point in the target inspection water area at each future moment.
3. The method for UAV airway inspection based on a multimodal spatiotemporal prediction model according to claim 1, characterized in that: S2 includes: The downwash airflow generated by the drone rotor creates a water surface ripple field on the surface of the target inspection area. Additionally, real-time video streams containing information about the morphology of water surface ripples are collected using visual sensors mounted on drones, serving as real-time environmental response data.
4. A method for UAV airway inspection based on a multimodal spatiotemporal prediction model according to claim 1, characterized in that: The steps for calibrating the dynamic reference spatiotemporal prediction model in S3 specifically include: based on the predicted glare index extracted from the predicted glare index field, which corresponds to the current position and time of the UAV, selecting from the preset spatiotemporal prediction model library or generating an ideal ripple field spatiotemporal prediction model corresponding to the water surface turbulence level represented by the predicted glare index through interpolation; and using the ideal ripple field spatiotemporal prediction model as the calibration reference spatiotemporal prediction model.
5. A method for UAV airway inspection based on a multimodal spatiotemporal prediction model according to claim 1, characterized in that: The steps for processing real-time environmental response data in S4 specifically include: performing pixel-level difference operations between the current frame image in the real-time video stream and the calibration benchmark spatiotemporal prediction model to generate a ripple field difference map. After generating the ripple field difference map, a feature processing step dynamically guided by the preset ideal ripple core parameters is executed. Using the calculated dominant wavelength and anisotropy, a set of spatial frequency matched filters is configured. The filters are precisely tuned to suppress residuals in the difference map that conform to the expected background ripple structure. Meanwhile, based on a complete benchmark model including attenuation coefficients, the expected optical flow field under ideal conditions is generated, and then vector subtraction is performed between it and the measured optical flow field calculated from the real-time video stream to obtain a motion residual field containing only abnormal motion patterns. By fusing the structural anomalous signal and the motion residual field, a spatiotemporal distortion fusion map is generated.
6. A method for UAV airway inspection based on a multimodal large spatiotemporal prediction model according to claim 5, characterized in that: The steps in S4 for processing real-time environmental response data also include: calculating the normalized ripple field distortion energy index of the spatiotemporal distortion fusion map; and generating a detection result signal indicating the presence of hidden obstacles when the normalized ripple field distortion energy index exceeds a preset distortion energy threshold.
7. A method for UAV airway inspection based on a multimodal large spatiotemporal prediction model according to claim 1, characterized in that: The steps for generating linkage control commands in S5 specifically include: calculating the target energy field intensity value through the optimal energy field intensity mapping function; wherein, the target energy field intensity value output by the optimal energy field intensity mapping function is composed of the linear superposition of the basic energy component and the channel compensation energy component; the basic energy component is set to a preset constant value; the channel compensation energy component increases nonlinearly, and the increase is exponentially amplified with the increase of flight altitude.
8. A method for UAV airway inspection based on a multimodal spatiotemporal prediction model according to claim 7, characterized in that: In S5, the linkage control command also includes a trajectory adjustment command for avoiding hidden obstacles identified by the detection result signal; when the first prediction data is lost or its confidence level is lower than the preset confidence level threshold, the linkage control command is forced to be set to a preset conservative detection strategy, which corresponds to a preset flight altitude and rotor speed under the worst operating condition based on historical statistics.
9. A method for UAV airway inspection based on a multimodal spatiotemporal prediction model according to claim 8, characterized in that: The method also includes step S6: after the linkage control command is executed, continuously monitor the temporal change of the normalized ripple field distortion energy index; extract the energy spectral density within the preset characteristic frequency range by performing a Fourier transform on the temporal sequence of the normalized ripple field distortion energy index; and classify and identify the material properties or motion state of the hidden obstacles by comparing the energy spectral density with a preset feature library that stores the correspondence between different obstacle materials and shaking modes.
10. A UAV airway inspection system based on a multimodal spatiotemporal prediction model, characterized in that: The system is used to perform the method of any one of claims 1-9, comprising: The environmental prediction data acquisition module is used to acquire first prediction data characterizing the future environmental state of the target inspection water area. The first prediction data is configured to quantify the visual detection channel quality of each spatial location in the target inspection water area within a future time window. The real-time response data acquisition module is used to acquire real-time environmental response data generated by the UAV's own energy field on the water surface medium and collected by airborne sensors. The dynamic benchmark calibration module is used to calibrate the dynamic benchmark spatiotemporal prediction model used to parse real-time environmental response data based on the first prediction data, so as to generate a calibration benchmark spatiotemporal prediction model adjusted by environmental channel state. The detection, analysis and processing module is used to process real-time environmental response data based on the calibration benchmark spatiotemporal prediction model to generate detection result signals that characterize the probability and physical properties of hidden obstacles in the water. The linkage control generation module is used to generate and execute linkage control commands for adjusting the flight status and energy field intensity of the UAV based on the first prediction data and the detection result signal.