A method and system for detecting marine red tide anomaly by fusing multi-source remote sensing and graph neural network

By integrating multi-source remote sensing with graph neural networks, the problems of insufficient multi-source data fusion and spatiotemporal dynamic feature extraction in red tide detection were solved, achieving high-precision red tide event detection and trend analysis, and improving the level of intelligence in red tide disaster response.

CN120656076B · Active · Publication Date: 2026-06-16 · SHANDONG MARINE RESOURCE AND ENVIRONMENT RESEARCH INSTITUTE (SHANDONG MARINE ENVIRONMENTAL MONITORING CENTER SHANDONG AQUATIC PRODUCTS QUALITY INSPECTION CENTER)

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHANDONG MARINE RESOURCE AND ENVIRONMENT RESEARCH INSTITUTE (SHANDONG MARINE ENVIRONMENTAL MONITORING CENTER SHANDONG AQUATIC PRODUCTS QUALITY INSPECTION CENTER)
Filing Date
2025-06-19
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing red tide detection methods have weak multi-source data fusion capabilities and insufficient extraction of spatiotemporal dynamic features, making it difficult to achieve high-precision red tide early warning. Furthermore, they lack the ability to identify atypical samples and sudden abnormal events, leading to prediction lag and misjudgment.

Method used

The invention integrates multi-source remote sensing with graph neural networks. Remote sensing image data, UAV image data, and monitoring point data are acquired; data preprocessing, feature extraction, and feature fusion are performed; and a spatiotemporal graph structure is constructed. A cross-modal contrastive self-supervised learning mechanism is used for consistent representation learning, and an attention mechanism is combined with it to perform heterogeneous feature fusion and joint representation. Finally, anomaly detection and early warning are performed.

Benefits of technology

It significantly improves the detail and global perception capabilities of red tide feature modeling, enhances the understanding of the nonlinear coupling relationship between red tide inducing factors, realizes high-precision red tide event detection and trend analysis, and improves the intelligence and foresight of red tide disaster response.

✦ Generated by Eureka AI based on patent content.

Abstract

The present application relates to the technical field of red tide anomaly detection, and more particularly to a marine red tide anomaly detection method and system fusing multi-source remote sensing and graph neural networks. The method comprises: acquiring remote sensing image data, unmanned aerial vehicle (UAV) image data, and monitoring data of monitoring points; performing data preprocessing on the acquired remote sensing image data and UAV image data; performing feature extraction and feature fusion on the remote sensing image and the UAV image to obtain remote sensing feature data; constructing a spatiotemporal graph structure based on the monitoring data of the monitoring points to obtain graph structure data; and performing consistent representation learning of the remote sensing feature modality and the graph structure feature modality based on a cross-modal contrastive self-supervised learning mechanism. By introducing multi-source heterogeneous data and fusing a graph neural network modeling approach, the application overcomes the limitations of single-data-source methods, namely coarse red tide recognition granularity and low spatiotemporal accuracy, and significantly improves the level of detail and the global perception ability of red tide feature modeling.

Description

Technical Field

[0001] This invention relates to the field of red tide anomaly detection technology, and in particular to a method and system for detecting marine red tide anomalies that integrates multi-source remote sensing and graph neural networks.

Background Technology

[0002] With the increasing intensity of global climate change and near-shore human activities, the marine ecological environment faces increasingly severe challenges, and the frequent occurrence of harmful algal blooms such as red tides has attracted widespread attention. Red tides, as a typical marine disaster characterized by sudden onset, strong regionality, and rapid evolution, often cause reduced marine fisheries production, damage to coastal tourism, and ecological imbalance in aquatic bodies. In severe cases, they can even endanger human health and the safety of coastal infrastructure. To achieve early detection, dynamic tracking, and accurate warning of marine red tides, it is urgent to construct an intelligent anomaly detection technology system with high spatiotemporal resolution and strong environmental adaptability.

[0003] Existing red tide monitoring and detection methods mainly fall into three categories: First, traditional monitoring methods based on water quality factors rely on physicochemical parameters such as nutrients, chlorophyll, and temperature collected from fixed monitoring stations. Red tide risk is assessed using empirical formulas or threshold rules. While this method has a certain scientific basis, its spatial coverage is limited, making it difficult to meet the needs of dynamic monitoring across wide marine areas. Second, image recognition methods based on remote sensing images utilize visible light or infrared bands to invert ocean color anomalies for macroscopic identification. These methods offer advantages in terms of wide coverage and high frequency, but suffer from problems such as cloud cover, insufficient spatial resolution, and strong target heterogeneity. Third, data-driven machine learning methods train classifiers or prediction models based on historical images or monitoring data. While this can improve detection efficiency, it often relies on a large number of labeled samples and lacks sufficient spatiotemporal dynamic feature modeling capabilities, making it difficult to capture the complex causal relationships and cross-modal feature interactions in red tide evolution. In practical applications, these methods generally face problems such as prediction lag, regional misjudgment, and difficulties in cross-scale fusion, making it difficult to effectively support intelligent monitoring and scientific decision-making in the marine ecological environment.

[0004] Current red tide detection methods mainly rely on single remote sensing image processing, threshold-based early warning models, or static classification models trained on historical samples. These methods have the following prominent drawbacks: (1) They have weak multi-source data fusion capabilities, making it difficult to effectively integrate remote sensing images, UAV images, and red tide forecasting factors such as nutrients and temperature, resulting in one-sided information utilization and limited monitoring sensitivity and spatial coverage; (2) They lack a modeling mechanism for spatiotemporal evolution, and most existing methods are static analyses, making it difficult to capture the occurrence and development trend of red tides, resulting in delayed predictions and insufficient accuracy; (3) The models have insufficient ability to identify atypical samples and sudden abnormal events, and are prone to misjudgment or missed detections when there are insufficient samples or significant changes in the scene, making it difficult to meet the requirements of real-time, stable risk warning. Therefore, it is urgent to propose an intelligent red tide detection system that integrates multi-source remote sensing images and red tide factor data and has spatiotemporal modeling and graph structure reasoning capabilities. By constructing a spatiotemporal heterogeneous graph and introducing a graph neural network for high-order information extraction, it can achieve accurate detection, trend analysis, and interpretable early warning of red tide events, thereby improving the system's comprehensive perception and intelligent response capabilities.

Summary of the Invention

[0005] To address the challenges of multi-source data fusion, insufficient spatiotemporal dynamic feature extraction, and lack of cross-modal consistency modeling in marine red tide anomaly detection, this invention provides a method and system for marine red tide anomaly detection that integrates multi-source remote sensing and graph neural networks.

[0006] In a first aspect, the present invention provides a method for detecting marine red tide anomalies that integrates multi-source remote sensing and graph neural networks, employing the following technical solution:

[0007] A method for detecting marine red tide anomalies that integrates multi-source remote sensing and graph neural networks includes:

[0008] Acquire remote sensing image data, UAV image data, and monitoring data from monitoring points;

[0009] Data preprocessing is performed on the acquired remote sensing image data and UAV image data;

[0010] Feature extraction and feature fusion are performed on remote sensing images and UAV images to obtain remote sensing feature data;

[0011] A spatiotemporal graph structure is constructed based on the monitoring data from the monitoring points to obtain graph structure data;

[0012] Learning the consistency representation of remote sensing feature modalities and graph structure feature modalities based on a cross-modal contrastive self-supervised learning mechanism;

[0013] Heterogeneous feature fusion and joint representation based on consistency learning;

[0014] Anomaly detection and early warning based on fusion characterization results.

[0015] Furthermore, the data preprocessing of the acquired remote sensing image data and UAV image data includes first performing physical and geometric consistency processing on the remote sensing images and UAV images, and uniformly standardizing all factor data; then performing spatiotemporal alignment and missing data completion, including aligning the image data and factor data in time according to timestamps; for monitoring point data missing because a buoy is offline, spatial weighted interpolation is used for completion; a time-series-based frame interpolation and image completion method is introduced, and an optical-flow-guided image reconstruction method is used to predict and complete the content of frames obscured by fog. Let the image frames acquired by the UAV at consecutive times be $I_{t-1}$ and $I_{t+1}$; the goal is to reconstruct the missing intermediate image frame $I_t$. First, the forward optical flow $F_{t-1 \to t+1}$ and the reverse optical flow $F_{t+1 \to t-1}$ between the two frames are calculated; using the bidirectional optical flow and a temporal weight $\alpha$, the reconstructed value of a pixel location in the intermediate frame is estimated, expressed as:

[0016] Where $(x,y)$ represents a pixel position in the image, $(u,v)$ represents the motion vector of the pixel from time point a to b, $F_{a \to b}$ represents the optical flow field from frame a to frame b, $(u_1, v_1)$ is the optical flow from $I_{t-1}$ to $I_{t+1}$, and $(u_2, v_2)$ is the optical flow from $I_{t+1}$ to $I_{t-1}$.

[0017] Furthermore, the feature extraction and feature fusion of remote sensing images and UAV images includes two branches. In the time series feature extraction branch, a UAV image sequence of length T is denoted $\{X_u\}_{u=1}^{T}$, where each frame $X_u$ is the multispectral image data at the corresponding time step. First, a lightweight convolutional module is used to extract local perceptual features. Then, the self-attention mechanism in the Transformer architecture is introduced to model the dynamic dependencies between image sequences, capturing the behavioral patterns of key regions in the time dimension and finally yielding a sequence-level dynamic embedding representation. In the spatial high-resolution feature extraction branch, the input is a single remote sensing image. Multi-level spatial semantic features are extracted through a pre-trained multi-scale residual network (ResNet-50 + ASPP). At the same time, in order to highlight potential abnormal regions in the image, a spatial attention mechanism is introduced to recalibrate the feature distribution.

[0018] Furthermore, the construction of the spatiotemporal graph structure based on the monitoring data of the monitoring points includes constructing a dynamic adjacency graph based on a sliding time window. At each time step t, based on the historical sequence {t-w+1,…,t} with a window length of w, an adjacency matrix $A_t$ is constructed using feature similarity. This approach captures the dynamic connection strength between nodes at the current moment. To address the strong volatility of marine environmental variables across different time scales, a sliding window decomposition mechanism is introduced to divide long-term time series data into multiple short sliding sequences. This extracts short-term dynamic features and prevents long-term stable trends from masking sudden anomalies. At each time step t, a node feature tensor $S_t \in \mathbb{R}^{w \times N \times F}$ under the current sliding window is constructed, and graph convolution is performed on it. Finally, to accommodate the differences in node characteristics, a node-level personalized parameter vector $\theta_i$ is constructed and a personalized weight adjustment mechanism is introduced, which is expressed as:

[0019] ,

[0020] where the output term denotes the final graph convolution output of node i at time t; $\theta_i$ denotes the trainable individual weights of node i, reflecting its preference for factor responses; $\odot$ denotes element-wise multiplication; the input term denotes the input features of node j at time t; and the neighbor set denotes the set of nodes adjacent to node i at time t.
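The sliding-window adjacency and the personalized weighting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented formula: cosine similarity as the feature similarity, row normalization of the adjacency matrix, a shared projection matrix `weight`, and a ReLU activation are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def dynamic_adjacency(window: np.ndarray) -> np.ndarray:
    """Build a row-normalized adjacency matrix from a window of shape (w, N, F).

    Assumption: similarity between two nodes is the cosine similarity of
    their flattened histories over the window; negative values are clipped.
    """
    w, n, f = window.shape
    flat = window.transpose(1, 0, 2).reshape(n, w * f)  # one row per node
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
    sim = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(sim, 0.0)  # no self-loops in this sketch
    return sim / (sim.sum(axis=1, keepdims=True) + 1e-12)

def personalized_graph_conv(adj, x, theta, weight):
    """One graph convolution step with node-level weights.

    adj: (N, N), x: (N, F) node inputs, theta: (N, F) per-node weights,
    weight: (F, D) shared projection (assumed). Returns (N, D).
    """
    # sum_j adj[i, j] * (theta[i] * x[j]) equals theta[i] * (adj @ x)[i]
    aggregated = theta * (adj @ x)
    return np.maximum(aggregated @ weight, 0.0)  # ReLU (assumed activation)
```

Because $\theta_i$ multiplies every neighbor message of node i element-wise, it can be factored out of the neighborhood sum, which is why the sketch applies it once after aggregation.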

[0021] Furthermore, the method for learning consistent representations of remote sensing feature modalities and graph structure feature modalities based on cross-modal contrastive self-supervised learning includes, to achieve consistent mapping between the two modalities in the global semantic space, projecting the two modalities into a shared semantic space, and constructing a global modality alignment target using InfoNCE loss based on contrastive learning, as follows:

[0022] ,

[0023] Where sim(a,b) represents the cosine similarity; the temperature coefficient adjusts the smoothness of the distribution; the negative sample set contains graph structures from different time steps; one feature vector is generated by the spatiotemporal graph neural network, and the other is generated by the dual-branch encoder.
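For orientation, a standard InfoNCE loss matching these definitions can be sketched as below. The names `z_img`, `z_graph_pos`, `z_graph_negs`, and `tau` are assumptions introduced here, and the patent's exact expression may differ in detail. The loss has the usual form: negative log of exp(sim(positive)/tau) divided by the sum of that term and the same term over all negatives.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def info_nce(z_img, z_graph_pos, z_graph_negs, tau=0.1) -> float:
    """InfoNCE loss for one image embedding.

    z_img: embedding from the image encoder.
    z_graph_pos: graph embedding of the same time step (positive).
    z_graph_negs: iterable of graph embeddings from other time steps.
    tau: temperature coefficient.
    """
    pos = cosine_sim(z_img, z_graph_pos) / tau
    negs = np.array([cosine_sim(z_img, z) / tau for z in z_graph_negs])
    logits = np.concatenate(([pos], negs))
    m = logits.max()
    # -log softmax of the positive logit, via a stable log-sum-exp
    return float(m + np.log(np.exp(logits - m).sum()) - pos)
```

A lower temperature sharpens the distribution, so mismatched pairs are penalized more heavily.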

[0024] Furthermore, the method for learning consistent representations between remote sensing feature modalities and graph structure feature modalities based on cross-modal contrastive self-supervised learning also includes introducing a local semantic alignment mechanism to enhance the semantic alignment capability between modalities at a fine-grained level. Specifically, the remote sensing image is divided into K fixed spatial window regions, and a local embedding representation is extracted for each region using a convolutional encoder. Meanwhile, each node in the graph structure has its own embedding representation. The goal is to enable each local region of the image to find the semantically closest site node in the graph structure, constructing cross-modal local alignment pairs. The alignment loss is expressed as:

[0025] ,

[0026] where $v$ represents a node in the graph structure, and sim(a,b) represents the similarity function.
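One way to realize the "closest node per region" idea is sketched below. The loss form used here, the mean of one minus the best cosine similarity per region, is an assumption chosen for illustration; the patent's alignment loss is not reproduced in this text and may differ.

```python
import numpy as np

def local_alignment_loss(regions: np.ndarray, nodes: np.ndarray):
    """Match each image region to its most similar graph node.

    regions: (K, d) local region embeddings.
    nodes: (N, d) graph node embeddings.
    Returns (loss, nearest) where nearest[k] is the matched node index.
    """
    r = regions / (np.linalg.norm(regions, axis=1, keepdims=True) + 1e-12)
    v = nodes / (np.linalg.norm(nodes, axis=1, keepdims=True) + 1e-12)
    sim = r @ v.T                      # (K, N) cosine similarities
    nearest = sim.argmax(axis=1)       # closest node for each region
    loss = float(np.mean(1.0 - sim.max(axis=1)))  # assumed loss form
    return loss, nearest
```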

[0027] Furthermore, the heterogeneous feature fusion and joint representation based on consistency learning includes introducing a modal attention mechanism to adaptively adjust the fusion weights of different modal features according to task relevance, wherein the image modal features are denoted as $h_I \in \mathbb{R}^d$ and the graph structure modal features as $h_G \in \mathbb{R}^d$. The fusion weights are calculated using a shared attention network, and are expressed as follows:

[0028] Where $W$ is the learnable attention parameter, $h_I$ represents the modal features of the image, and $h_G$ represents the modal features of the graph structure. The final fused representation combines $h_I$ and $h_G$ using the two fusion weights.
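A minimal sketch of modality-level attention fusion follows, assuming the shared attention network is a single scoring vector `w` applied to a tanh of each modality and that the weights are softmax-normalized. Both choices are assumptions for illustration rather than the patent's stated formula.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def modal_attention_fusion(h_img, h_graph, w):
    """Fuse two d-dimensional modality features with learned weights.

    w: (d,) shared scoring vector (stands in for the attention parameter W).
    Returns (fused, alpha) where alpha holds the two fusion weights.
    """
    scores = np.array([w @ np.tanh(h_img), w @ np.tanh(h_graph)])
    alpha = softmax(scores)            # fusion weights sum to 1
    fused = alpha[0] * h_img + alpha[1] * h_graph
    return fused, alpha
```

Sharing `w` across modalities keeps the two scores comparable, so the softmax reflects relative relevance rather than differences in per-modality scaling.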

[0029] Furthermore, the heterogeneous feature fusion and joint representation based on consistency learning also includes employing a semantic channel attention mechanism to enhance the semantic dimensions highly correlated with red tides in the fused features. This mechanism performs weighted adjustments on the fused joint representation along the channel dimension. Let the fused feature be $h_{fused}$. The channel attention weights are constructed using a Squeeze-and-Excitation mechanism and applied to the fused feature along the semantic dimension to obtain the feature $h_{recon}$. The feature $h_{recon}$ is then fed into the Transformer-based spatiotemporal modeling module to construct a joint representation sequence across multiple time steps and multiple modalities. The fused feature sequence of the past T time steps is input into the temporal encoder and represented as:

[0030] ,

[0031] where the output denotes the fused dynamic joint spatiotemporal feature sequence, which serves as the input to the downstream red tide early warning and spatial reasoning module.
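The Squeeze-and-Excitation reweighting can be illustrated on a sequence of fused feature vectors. In this sketch the squeeze step averages over the T time steps and the excitation step is a two-layer bottleneck with ReLU and sigmoid; the matrices `w1` and `w2` and the choice to squeeze over time are assumptions.

```python
import numpy as np

def se_channel_attention(h_seq: np.ndarray, w1: np.ndarray, w2: np.ndarray):
    """Reweight channels of a fused feature sequence.

    h_seq: (T, d) fused features over T time steps.
    w1: (d, r) reduction weights, w2: (r, d) expansion weights.
    Returns (reweighted sequence, per-channel gate in (0, 1)).
    """
    squeezed = h_seq.mean(axis=0)                 # squeeze: (d,)
    hidden = np.maximum(squeezed @ w1, 0.0)       # excitation, ReLU
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # sigmoid gate, (d,)
    return h_seq * gate, gate                     # broadcast over time
```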

[0032] Furthermore, the anomaly detection and early warning based on the fusion representation results includes employing an unsupervised anomaly detection method based on an autoencoder, where the input joint feature is a vector in $\mathbb{R}^d$ and the autoencoder consists of an encoder and a decoder. The normal state is modeled by minimizing the reconstruction error, and anomalies are detected by comparing the reconstruction error against a set threshold. Historical fusion feature sequences are also used to train a time-series prediction model that predicts the feature representation at the next moment or at several future moments, from which the probability of red tide occurrence is further predicted.
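The reconstruction-error test can be sketched with a linear autoencoder, which is used here as a stand-in for the neural autoencoder in the text: its optimal encoder and decoder are given by PCA, so no training loop is needed. The quantile-based threshold is likewise an assumption, since the text only says the threshold is set.

```python
import numpy as np

class LinearAutoencoder:
    """Linear autoencoder fitted in closed form via SVD (equivalent to PCA)."""

    def __init__(self, n_latent: int):
        self.n_latent = n_latent

    def fit(self, x_normal: np.ndarray) -> "LinearAutoencoder":
        self.mean_ = x_normal.mean(axis=0)
        _, _, vt = np.linalg.svd(x_normal - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_latent]   # (n_latent, d)
        return self

    def reconstruction_error(self, x: np.ndarray) -> np.ndarray:
        centered = x - self.mean_
        code = centered @ self.components_.T      # encoder
        recon = code @ self.components_           # decoder
        return np.sum((centered - recon) ** 2, axis=1)

def detect_anomalies(model, x_normal, x_new, quantile=0.99):
    """Flag samples whose error exceeds a quantile of the normal errors."""
    threshold = np.quantile(model.reconstruction_error(x_normal), quantile)
    errors = model.reconstruction_error(x_new)
    return errors > threshold, errors, threshold
```

Samples that lie close to the subspace learned from normal data reconstruct well, while samples with a large off-subspace component produce a large error and are flagged.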

[0033] In a second aspect, a marine red tide anomaly detection system integrating multi-source remote sensing and graph neural networks includes:

[0034] The data acquisition module is configured to acquire remote sensing image data, UAV image data, and monitoring data from monitoring points.

[0035] The preprocessing module is configured to perform data preprocessing on the acquired remote sensing image data and UAV image data;

[0036] The remote sensing feature module is configured to extract and fuse features from remote sensing images and UAV images to obtain remote sensing feature data.

[0037] The graph structure module is configured to construct a spatiotemporal graph structure based on the monitoring data of the monitoring points to obtain graph structure data.

[0038] The consistency module is configured to learn the consistency representation of remote sensing feature modalities and graph structure feature modalities based on a cross-modal contrastive self-supervised learning mechanism.

[0039] The joint module is configured to perform heterogeneous feature fusion and joint representation based on consistency learning;

[0040] The early warning module is configured to perform anomaly detection and early warning based on the fusion representation results.

[0041] In a third aspect, the present invention provides a computer-readable storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor of a terminal device to execute the method for detecting marine red tide anomalies by integrating multi-source remote sensing and graph neural networks.

[0042] In a fourth aspect, the present invention provides a terminal device, including a processor and a computer-readable storage medium, wherein the processor is used to implement various instructions, and the computer-readable storage medium is used to store multiple instructions, the instructions being adapted to be loaded by the processor to execute the method for detecting marine red tide anomalies by integrating multi-source remote sensing and graph neural networks.

[0043] In summary, the present invention has the following beneficial technical effects:

[0044] Compared with existing technologies, the marine red tide spatiotemporal modeling and early warning system based on remote sensing images, UAV images and red tide forecast factor data constructed by the present invention has the following beneficial effects: by introducing multi-source heterogeneous data (including remote sensing images, UAV monitoring images and various environmental factors) and integrating graph neural network modeling methods, it effectively overcomes the limitations of single data-driven methods in terms of coarse red tide identification granularity and low spatiotemporal accuracy, and significantly improves the detail and global perception capability of red tide feature modeling.

[0045] By constructing a cross-modal contrastive self-supervised learning mechanism, unified semantic alignment between image modalities and graph structure modalities is achieved, enhancing the model's ability to understand the potential correlation patterns of multimodal data. Combining the attention mechanism and joint representation module, the nonlinear coupling relationship between red tide inducing factors is further explored, improving the interpretability and reliability of red tide triggering mechanism modeling. On this basis, anomaly detection and time series prediction modules are introduced, which can perform high-precision dynamic extrapolation of the occurrence and evolution trend of marine red tides and realize visualized early warning, effectively improving the intelligence and foresight level of red tide disaster response.

Attached Figure Description

[0046] Figure 1 is a schematic diagram of a marine red tide anomaly detection method integrating multi-source remote sensing and graph neural networks according to Embodiment 1 of the present invention;

[0047] Figure 2 is a schematic diagram comparing the ACC and F1 of various red tide anomaly detection models in Embodiment 1 of the present invention;

[0048] Figure 3 is a radar comparison diagram of different models for red tide anomaly detection in Embodiment 1 of the present invention;

[0049] Figure 4 is a heat map schematic of the spatial distribution of red tide anomaly detection according to Embodiment 1 of the present invention;

[0050] Figure 5 is a schematic diagram of the red tide anomaly detection concentration time-series prediction curve in Embodiment 1 of the present invention.

Detailed Implementation

[0051] The present invention will be further described in detail below with reference to the accompanying drawings.

[0052] Example 1

[0053] Referring to Figure 1, this embodiment of a method for detecting marine red tide anomalies by integrating multi-source remote sensing and graph neural networks includes:

[0054] A method and system for detecting marine red tide anomalies by integrating multi-source remote sensing images, UAV images, and red tide forecasting factors is presented. The overall system design consists of six core modules, specifically: an image and monitoring data preprocessing module for standardizing and registering remote sensing images, UAV images, and factor data; a dual-branch feature encoding module for extracting dynamic and spatial features from time-series images and high-resolution images, respectively; a spatiotemporal graph structure modeling module for constructing a spatiotemporal graph structure integrating factors such as water temperature, salinity, and pH, using the monitoring area as graph nodes; a cross-modal contrastive self-supervised learning module for achieving consistent representation learning of image features and graph structure features; an attention fusion and joint representation module for weighted fusion of multimodal features to enhance the expressive ability of red tide induction information; and an anomaly detection and prediction early warning module for identifying and predicting trends of red tide anomalies based on fused features, and realizing visualized early warning output.

[0055] S1. Image and Monitoring Data Preprocessing Module

[0056] In the task of detecting marine red tide anomalies, the raw data comes from multiple sources, mainly including satellite remote sensing images with high spatial resolution but slow temporal updates, UAV patrol images with high temporal resolution but limited coverage, and red tide forecast factor data from monitoring buoys or sensors, such as water temperature, salinity, pH, and chlorophyll-a concentration. The heterogeneity, scale inconsistencies, and potential missing data of this data severely limit the accuracy of subsequent feature extraction and modeling. Therefore, this module aims to standardize, spatiotemporally register, and structurally transform all the data to form a standardized dataset that can be used as input for graph modeling and deep learning. The specific processing flow includes the following three parts:

[0057] 1) Preprocessing of remote sensing and UAV images: This involves addressing the physical and geometric consistency issues between remote sensing and UAV images. Remote sensing images are often affected by the atmosphere, clouds, solar altitude angle, etc., requiring radiometric correction to restore the true reflectance of ground features. The raw digital values (DN values) of the remote sensing images are converted into the radiance $L$ received by the sensor:

[0058] $L = G \cdot DN + B$,

[0059] Where $DN$ represents the digital value recorded by the sensor, $G$ represents the sensor's calibrated gain, and $B$ represents the sensor's calibrated offset. Furthermore, atmospheric correction converts the reflectance of the surface-atmosphere system (i.e., the value seen in the remotely sensed image) into the true surface reflectance, eliminating interference from atmospheric aerosols, molecular scattering, water vapor, and other factors, and improving the spectral consistency of images acquired at different times.

[0060] ,

[0061] where the terms of the formula are: the surface reflectance; the top-of-atmosphere reflectance obtained from the remote sensing imagery; the path reflectance; the transmittances along the solar incident path and the sensor observation path; and the atmospheric scattering term $S$.
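Both correction steps can be sketched numerically. The radiometric step is the linear gain and offset conversion described above. For the atmospheric step, the sketch assumes the widely used Lambertian-surface relation, in which top-of-atmosphere reflectance equals path reflectance plus T_sun * T_view * rho_s / (1 - S * rho_s), and inverts it for the surface reflectance rho_s. That relation uses exactly the quantities listed above, but it is an assumption and may not match the patent's expression term for term; all function and parameter names are hypothetical.

```python
import numpy as np

def dn_to_radiance(dn, gain: float, offset: float) -> np.ndarray:
    """Radiometric calibration: radiance = gain * DN + offset."""
    return gain * np.asarray(dn, dtype=float) + offset

def surface_reflectance(rho_toa, rho_path, t_sun, t_view, s_atm) -> np.ndarray:
    """Invert the assumed Lambertian relation for surface reflectance.

    rho_toa: top-of-atmosphere reflectance, rho_path: path reflectance,
    t_sun / t_view: transmittance on the solar and viewing paths,
    s_atm: atmospheric scattering term (spherical albedo).
    """
    y = (np.asarray(rho_toa, dtype=float) - rho_path) / (t_sun * t_view)
    return y / (1.0 + s_atm * y)
```

In practice the atmospheric parameters come from a radiative transfer model or lookup tables; here they are simply passed in as numbers.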

[0062] Due to the low sensor altitude and variable viewing angle, UAV images exhibit significant geometric distortion. Therefore, geometric distortion correction and orthorectification are necessary to ensure alignment with remote sensing images in a unified coordinate system. For spatial alignment, to achieve high-precision registration between remote sensing and UAV images, a feature point matching algorithm is used to extract and align key regions between the images, ensuring a consistent spatial reference frame for different modalities. Feature points are extracted from both remote sensing and UAV images using the Scale Invariant Feature Transform (SIFT) algorithm to obtain local image feature descriptors that are scale- and rotation-invariant. Let $X_r$ represent the remote sensing image and $X_u = \{X_{u1}, X_{u2}, \ldots, X_{uT}\}$ represent the UAV images; the extracted feature point sets are as follows:

[0063] ,

[0064] where $f_{ri}$ and $f_{uj}$ denote the $i$-th and $j$-th feature points extracted from the remote sensing image and the UAV images, respectively. Each feature point contains attributes such as position, scale, and orientation, as well as a descriptor vector $d_i$. The similarity between two descriptor vectors is calculated using the Euclidean distance to obtain an initial set of matching pairs. Due to issues such as viewpoint differences, occlusion, and repeated textures during feature matching, the matching pairs may contain a large number of incorrect matches. Therefore, the RANSAC (Random Sample Consensus) algorithm is introduced to perform robust filtering and geometric transformation model estimation on the initial matching pairs, evaluating the matching consistency of each point.

[0065] ,

[0066] Where $x_i$ represents a feature point in the remote sensing image, $x_i'$ represents the corresponding point in the UAV image, $H$ represents the homography transformation matrix to be estimated, which maps the coordinates of the remote sensing image to the UAV image, and $\|\cdot\|$ represents the Euclidean distance. The remote sensing image $X_r$ and the UAV time series images $X_u = \{X_{u1}, X_{u2}, \ldots, X_{uT}\}$ then undergo spatial resolution unification (using bilinear interpolation for resampling) to ensure consistent input dimensions for downstream models. Finally, grayscale normalization (0-1 normalization) and image enhancement (contrast stretching) are performed to improve feature representation capabilities.
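The consistency check that RANSAC applies to each candidate match can be sketched as a reprojection-error test: a match is an inlier when the Euclidean distance between the UAV point and the projected remote sensing point is below a tolerance. The tolerance `tol` and the function names are assumptions; estimating `h` itself (for example with a library RANSAC routine) is outside this sketch.

```python
import numpy as np

def project(h: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply a 3x3 homography to an (M, 2) array of points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]   # back to Cartesian coordinates

def inlier_mask(h, pts_rs, pts_uav, tol: float = 3.0) -> np.ndarray:
    """Mark matches whose reprojection error is below tol (in pixels).

    pts_rs: (M, 2) points in the remote sensing image.
    pts_uav: (M, 2) matched points in the UAV image.
    """
    err = np.linalg.norm(pts_uav - project(h, pts_rs), axis=1)
    return err < tol
```

RANSAC repeats this test for many candidate homographies fitted to random minimal subsets of matches and keeps the candidate with the most inliers.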

[0067] 2) Time-series standardization of red tide factor data: Monitoring factor data is typically collected periodically by equipment such as buoys and marine sensors. It has high temporal granularity and diverse dimensions, making it a crucial driving data source for red tide anomaly detection. These factors include water temperature (T), salinity (S), pH value, and chlorophyll concentration (Chl), among others. Their value ranges and trends vary, and directly inputting them into the model can easily lead to a bias in the training process towards features with larger numerical scales. Therefore, it is necessary to standardize all factor data uniformly beforehand.

[0068] Let the factor vector of the i-th monitoring point at time t be $F(x_i^t) = [T_i^t, S_i^t, pH_i^t, Chl_i^t]$. The Z-score standardization method is used to convert each factor into a distribution with a mean of 0 and a variance of 1 using the following formula:

[0069] $\tilde{x}_i = \dfrac{x_i - \mu_i}{\sigma_i}$,

[0070] where $x_i$ is a raw value of the i-th factor, and $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the i-th factor over all the monitoring data, respectively. This keeps each factor on the same order of magnitude during the training phase and avoids gradient update bias.

[0071] In addition, to enhance the stationarity of time series, difference operations can be introduced to reduce the interference of periodic fluctuations on the modeling results.

[0072] $y_t = x_t - x_{t-1}$,

[0073] where $x_t$ represents the observation of the original time series at time t, $x_{t-1}$ represents the observation at the previous moment, and $y_t$ represents the sequence value after differencing.
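The two standardization steps can be sketched as follows, assuming the factor data is arranged as an array with one row per time step and one column per factor. The guard against a zero standard deviation is an addition for robustness, not something stated in the text.

```python
import numpy as np

def zscore(x: np.ndarray) -> np.ndarray:
    """Standardize each factor (column) to zero mean and unit variance."""
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    # Constant columns would divide by zero, so leave them centered only.
    return (x - mu) / np.where(sigma > 0, sigma, 1.0)

def first_difference(x: np.ndarray) -> np.ndarray:
    """First-order differencing along time: y[t] = x[t] - x[t-1]."""
    return x[1:] - x[:-1]
```

Note that differencing shortens the series by one step, so downstream alignment with image timestamps has to account for the dropped first observation.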

[0074] 3) Spatiotemporal alignment and missing data completion: Due to complex observation environments and poor equipment stability, multi-source data often suffers from temporal inconsistencies and spatial missing data. To ensure the spatiotemporal alignment of the inputs to the graph neural network and the time series model, the following three steps are required:

[0075] Time synchronization aligns image data and factor data based on timestamps. For example, if remote sensing images are acquired daily while factor data is recorded hourly, a time window can be set to extract factor values ​​closest to the image time or perform linear interpolation.

[0076] ,
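The time synchronization step can be sketched for a single factor. Timestamps are represented as plain numbers such as hours since a reference time, and both options mentioned above are shown: taking the nearest recorded value, or interpolating linearly between the two surrounding records. The function name and the `mode` parameter are hypothetical.

```python
import numpy as np

def align_factor_to_image(image_time, factor_times, factor_values, mode="linear"):
    """Return the factor value matched to one image timestamp.

    factor_times must be sorted in increasing order (required by np.interp).
    mode: "nearest" picks the closest record, "linear" interpolates.
    """
    factor_times = np.asarray(factor_times, dtype=float)
    factor_values = np.asarray(factor_values, dtype=float)
    if mode == "nearest":
        return float(factor_values[np.argmin(np.abs(factor_times - image_time))])
    return float(np.interp(image_time, factor_times, factor_values))
```

For example, with hourly records at times 0, 1, and 2 and an image taken at 1.25, the nearest mode uses the record at time 1, while the linear mode blends the records at times 1 and 2.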

[0077] Spatial interpolation for data completion: For monitoring point data lost for reasons such as a buoy going offline, spatial weighted interpolation is used for completion. Let the value at a certain monitoring point $i$ be missing; it is completed from the valid data in its neighboring area using distance weighting:

[0078] ,
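Distance-weighted completion is commonly implemented as inverse distance weighting, which is what the sketch below assumes. The exponent `power` and the function name are assumptions, since the weighting function is not reproduced in this text.

```python
import numpy as np

def idw_fill(target_xy, neighbor_xy, neighbor_values, power: float = 2.0) -> float:
    """Estimate a missing value from neighbors by inverse distance weighting.

    target_xy: (2,) location of the point with the missing value.
    neighbor_xy: (M, 2) locations of neighbors with valid data.
    neighbor_values: (M,) valid values at those neighbors.
    """
    neighbor_xy = np.asarray(neighbor_xy, dtype=float)
    neighbor_values = np.asarray(neighbor_values, dtype=float)
    d = np.linalg.norm(neighbor_xy - np.asarray(target_xy, dtype=float), axis=1)
    if np.any(d == 0):                 # co-located neighbor: use its value
        return float(neighbor_values[np.argmin(d)])
    w = 1.0 / d ** power               # closer neighbors weigh more
    return float(np.sum(w * neighbor_values) / np.sum(w))
```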

[0079] Image occlusion handling: for image information loss due to fog or other obstructions, image data integrity can be improved through frame interpolation algorithms based on image sequences (e.g., optical flow-based image completion). This embodiment introduces a time-series-based frame interpolation and image completion method, employing optical-flow-guided frame interpolation to predict and complete the content of fog-obstructed frames. Let the image frames acquired by the UAV at consecutive times be $I_{t-1}$ and $I_{t+1}$; the goal is to reconstruct the missing intermediate image frame $I_t$. First, the forward optical flow $F_{t-1 \to t+1}$ and the reverse optical flow $F_{t+1 \to t-1}$ between the two frames are calculated, defined as follows:

[0080] ,

[0081] Where (x,y) represents a pixel position in the image, (u,v) represents the motion vector (optical flow) of the pixel from time point a to b, and F a→b This represents the optical flow field from frame a to frame b. Using bidirectional optical flow and temporal weight α, the reconstructed value of a pixel location in an intermediate frame can be estimated.

[0082] Where u1 and v1 represent from I t-1 To I t+1 Optical flow, u2, v2 from I t+1 To I t-1 The optical flow.

[0083] S2. Dual-branch feature encoding and spatiotemporal fusion module

[0084] In the anomaly detection task of marine red tides, different modalities of images contain different dimensions of information: remote sensing images have the advantage of high spatial resolution, which can capture large-scale spatial structures and regional anomalies; while UAV images have the advantage of high temporal resolution, which can reflect dynamic changes in local areas in a timely manner. To fully explore the complementary information contained in these two types of images, the system designs a dual-branch feature encoding and spatiotemporal fusion module, which constructs a time series feature branch and a spatial texture feature branch respectively, and achieves spatiotemporal semantic fusion through a cross-scale attention mechanism.

[0085] 1) In the time series feature extraction branch, a UAV image sequence of length $T$ is defined as $\{X_t\}_{t=1}^{T}$, where each frame $X_t$ is the multispectral image data at time $t$. To preserve long-range dependencies and regional variation trajectories within the sequence, a lightweight convolutional module is first used to extract local perceptual features:

[0086] $$f_t = \mathrm{Conv}(X_t;\, \theta)$$

[0087] where $f_t \in \mathbb{R}^{d}$ is the feature representation of the $t$-th frame and $\theta$ denotes the network parameters. Subsequently, the self-attention mechanism of the Transformer architecture is introduced to model the dynamic dependencies across the image sequence and obtain the temporal behavior patterns of key regions. The attention is calculated as follows:

[0088] $$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$

[0089] where $Q, K, V \in \mathbb{R}^{T \times d_k}$ are the query, key, and value matrices, respectively, $d_k$ is the feature dimension, and Softmax normalizes the attention weights. The resulting sequence-level dynamic embedding $H = \{h_t\}_{t=1}^{T}$ serves as the encoding of regional dynamic behavior.
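
The self-attention step over the frame features can be illustrated with a small pure-Python version of scaled dot-product attention. It is a sketch for intuition only: a single head, no learned projections, and matrices represented as lists of rows; the function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: Softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out


# With all-zero queries and keys, every frame receives equal weight,
# so each output row is the average of the value rows.
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(attention(zeros, zeros, [[1.0, 2.0], [3.0, 4.0]]))  # [[2.0, 3.0], [2.0, 3.0]]
```

In practice the rows of `Q`, `K`, and `V` would be learned linear projections of the per-frame features, and a tensor library would replace the nested loops.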

[0090] 2) In the spatial high-resolution feature extraction branch, the input is a single remote sensing image, and multi-level spatial semantic features are extracted through a pre-trained multi-scale residual network (ResNet-50 + ASPP):

[0091] $$g = \mathrm{ResNetASPP}(X^{\mathrm{RS}};\, \phi)$$

[0092] where $g \in \mathbb{R}^{d}$ is the spatial structural feature, $X^{\mathrm{RS}}$ is the input remote sensing image, and $\phi$ denotes the network parameters. The ASPP module expands the receptive field so that regional context can be perceived. To highlight potentially anomalous regions in the image (such as the edges of red tide patches or abrupt changes), a spatial attention mechanism is introduced to recalibrate the feature distribution:

[0093] ,

[0094] where $W_1$ and $W_2$ are the weight matrices of the attention module, $b_1$ and $b_2$ are the bias terms, $\sigma$ is the Sigmoid function, and $\odot$ denotes element-wise multiplication, which explicitly enhances the salient regions.

[0095] 3) In the spatiotemporal fusion stage, to achieve cross-modal semantic alignment, the dynamic features $h_t$ and the static features $g$ are concatenated in a unified embedding space and then fed into a fully connected network for nonlinear remapping:

[0096] $$z_t = \mathrm{FC}\big([h_t \,;\, g]\big)$$

[0097] where $z_t$ is the fused multimodal spatiotemporal representation and $[\cdot\,;\,\cdot]$ denotes concatenation. To further enhance the discriminative power of the fused features in prediction tasks, gating mechanisms or cross-attention modules can be introduced for semantic enhancement.

[0098] ,

[0099] Here, Gate represents a gating unit, which is used to adaptively adjust the importance weights of each modality feature in different scenarios.

[0100] S3. Spatiotemporal Graph Structure Modeling Module

[0101] In the spatiotemporal evolution of marine red tides, various monitoring data (such as water temperature, salinity, and pH) are typically collected continuously by multiple automated monitoring stations distributed across different sea areas. These monitoring stations have topological connections in geospatial space, and the collected data exhibits obvious time-series characteristics. Therefore, constructing a spatiotemporal graph structure model is crucial to accurately characterize the dependency structure of each factor in its spatial propagation and temporal evolution. The core objective of this module is to construct a spatial topological graph by integrating hydrological environmental factors, using monitoring stations as graph nodes, and then achieving high-dimensional dynamic modeling of the marine environment based on a graph neural network (GNN) and a temporal modeling module.

[0102] 1) Dynamic graph modeling: a dynamic adjacency graph is constructed based on a sliding time window. Because the influence relationships between monitoring nodes evolve over time, this module uses a sliding time window to construct a dynamic graph structure. Specifically, at each time step $t$, an adjacency matrix $A_t$ is constructed from feature similarity over the historical sequence $\{t-w+1, \ldots, t\}$ with window length $w$, capturing the dynamic connection strength between nodes at the current moment. The edge weights are calculated as:

[0103] ,

[0104] where $F_i^{t} \in \mathbb{R}^{w \times F}$ is the feature matrix of node $i$ over the past $w$ time steps and $F$ is the factor dimension; $\|\cdot\|$ denotes the L2 norm, which measures the Euclidean distance between the feature sequences of two nodes; $A_t(i, j)$ is the dynamic similarity edge weight between nodes $i$ and $j$ at time $t$; and $N$ is the total number of nodes in the graph. The dynamic adjacency matrix $A_t$ reflects the time-varying structural dependencies between nodes and provides the structural foundation for subsequent graph convolution operations.
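
One way to turn window-level feature distances into similarity edge weights is sketched below. The Gaussian kernel and its `bandwidth` parameter are assumptions made for illustration; the text only states that the weights are derived from the L2 distance between the nodes' windowed feature sequences.

```python
import math


def window_distance(Fi, Fj):
    """L2 distance between two w x F feature windows (lists of rows)."""
    return math.sqrt(sum((a - b) ** 2
                         for row_i, row_j in zip(Fi, Fj)
                         for a, b in zip(row_i, row_j)))


def dynamic_adjacency(windows, bandwidth=1.0):
    """Build an N x N similarity matrix from per-node feature windows.

    windows[i] holds the last w observations of node i. The Gaussian
    kernel used here is an illustrative choice, not the patent's formula.
    """
    n = len(windows)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = window_distance(windows[i], windows[j])
                A[i][j] = math.exp(-(d ** 2) / (2 * bandwidth ** 2))
    return A


# Hypothetical windows (w = 2 steps, F = 2 factors) for three stations.
windows = [
    [[1.0, 2.0], [1.0, 2.0]],
    [[1.0, 2.0], [1.0, 2.0]],
    [[5.0, 9.0], [6.0, 8.0]],
]
A_t = dynamic_adjacency(windows)
print(A_t[0][1], A_t[0][2])
```

Recomputing this matrix at every time step, with the window slid forward by one observation, yields the sequence of adjacency matrices that the dynamic graph relies on.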

[0105] 2) Sliding window decomposition for modeling short-term dynamic changes. Because marine environmental variables fluctuate strongly across different time scales, this module introduces a sliding window decomposition mechanism that divides long time series into multiple short sliding sequences, so that short-term dynamic features are extracted and long-term stable trends do not mask sudden anomalies. At each time step $t$, the node feature tensor under the current sliding window, $S_t \in \mathbb{R}^{w \times N \times F}$, is constructed and fed into the graph convolution operation:

[0106] $$H_t = \sigma\!\left(D_t^{-1/2} A_t D_t^{-1/2} S_t W_t\right)$$

[0107] where $H_t \in \mathbb{R}^{N \times d}$ is the encoded output of the nodes at time $t$ and $d$ is the output dimension; $D_t$ is the degree matrix of $A_t$, whose $i$-th diagonal entry is the weighted sum $\sum_j A_t(i, j)$; $W_t$ is the learnable weight matrix of the graph convolutional layer; $\sigma$ is a nonlinear activation function (such as ReLU); and $S_t$ is the sequence of node features in the sliding window, which serves as the input signal. The sliding mechanism enhances the model's ability to perceive abrupt trends (such as rapid increases in ocean factors) and seasonal cyclical changes, thereby improving the model's sensitivity and robustness.
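
A single graph convolution step over one time slice of the window can be sketched as follows. The sketch assumes the symmetric normalization $D^{-1/2} A D^{-1/2}$ familiar from standard graph convolutional networks and ReLU as the activation; it processes one $N \times F$ slice rather than the full $w \times N \times F$ tensor, and the helper names are hypothetical.

```python
import math


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def gcn_layer(A, X, W):
    """Compute ReLU(D^-1/2 A D^-1/2 X W), where D is the degree matrix of A."""
    n = len(A)
    degree = [sum(row) for row in A]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    A_norm = [[inv_sqrt[i] * A[i][j] * inv_sqrt[j] for j in range(n)]
              for i in range(n)]
    Z = matmul(matmul(A_norm, X), W)
    return [[max(0.0, z) for z in row] for row in Z]


# Two connected stations simply exchange features when W is the identity.
A = [[0.0, 1.0], [1.0, 0.0]]
X = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(A, X, identity))  # [[3.0, 4.0], [1.0, 2.0]]
```

Isolated nodes (degree zero) receive a zero normalization factor here, which avoids a division by zero at the cost of zeroing their output; adding self-loops is a common alternative.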

[0108] 3) Personalized graph convolutional modeling to adapt to differences in node responses. Different monitoring stations respond very differently to red tide factors because of their location (e.g., nearshore, deep sea, estuary) or sensor deployment conditions. To model this heterogeneity, a node-level personalized parameter vector $\theta_i$ is designed and a personalized weight adjustment mechanism is introduced:

[0109] ,

[0110] where the left-hand side is the final graph convolution output of node $i$ at time $t$; $\theta_i$ is the trainable individual weight vector of node $i$, reflecting its preference for factor responses; $\odot$ denotes element-wise multiplication; and the remaining terms are the input features of node $j$ at time $t$ and the set of nodes adjacent to node $i$ at time $t$.

[0111] S4. Cross-modal contrastive self-supervised learning module

[0112] In the spatiotemporal modeling of red tides using multi-source information fusion, remote sensing images and graph-structured data constructed from marine monitoring stations exhibit significant differences in multiple aspects, including data modality, sensing method, temporal resolution, and spatial density. This modal heterogeneity not only increases the difficulty of cross-modal learning but also leads to problems of information redundancy and representation inconsistency. Traditional fusion strategies often rely on supervised learning methods, which are limited by the scarcity of red tide occurrence samples and the difficulty of manual annotation, making it difficult to obtain sufficient training signals. Therefore, this module introduces a cross-modal contrastive self-supervised learning mechanism to drive the consistent representation learning between remote sensing image modalities and monitoring graph-structured data in an unsupervised manner, thereby improving the robustness and generalization ability of downstream spatiotemporal prediction and inference.

[0113] 1) Global Modality Alignment: In red tide early warning systems, remote sensing image data typically contains large-scale marine environmental features, while graph-structured data consists of multiple monitoring stations, reflecting local numerical indicators such as water temperature, salinity, pH, and chlorophyll concentration. To achieve consistent mapping between the two modalities in the global semantic space, the two modalities are projected into a shared semantic space, and a global modality alignment objective is constructed using InfoNCE loss based on contrastive learning.

[0114] $$\mathcal{L}_{\mathrm{global}} = -\log \frac{\exp\!\big(\mathrm{sim}(z_I, z_G)/\tau\big)}{\exp\!\big(\mathrm{sim}(z_I, z_G)/\tau\big) + \sum_{z_k \in \mathcal{N}^{-}} \exp\!\big(\mathrm{sim}(z_I, z_k)/\tau\big)}$$

[0115] where $\mathrm{sim}(a, b)$ is the cosine similarity; $\tau$ is a temperature coefficient that adjusts the smoothness of the distribution; $\mathcal{N}^{-}$ is the set of negative-sample graph structures drawn from different time steps; $z_G$ is the feature vector generated by the spatiotemporal graph neural network; $z_I$ is the feature vector generated by the dual-branch encoder; and $z_k$ denotes a feature vector in the sample set $\mathcal{N}^{-}$.
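
The InfoNCE objective for one image embedding and its matching graph embedding can be written out directly. The sketch below assumes the common form in which the positive pair also appears in the denominator; the function names and the default temperature of 0.1 are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE loss for one anchor, one positive, and a list of negatives."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical 2-D embeddings: the image embedding is the anchor, the graph
# embedding from the same time step is the positive, other steps are negatives.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
print(aligned < misaligned)  # True
```

A smaller temperature sharpens the distribution, so hard negatives contribute more strongly to the gradient.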

[0116] 2) Local semantic alignment: Red tide formation exhibits significant spatial heterogeneity, often concentrating in certain sea areas or near particular stations, so relying solely on global embeddings may mask the semantic responses of key local regions. This module therefore introduces a local semantic alignment mechanism to enhance fine-grained semantic alignment between modalities. Specifically, the remote sensing image is divided into $K$ fixed spatial window regions (e.g., 16×16 patches), and a local embedding is extracted for each region using a convolutional encoder. The embedding of each node in the graph structure is denoted $v_j$. The goal is for each local region of the image to find the semantically closest site node in the graph structure, forming cross-modal local alignment pairs. The alignment loss takes the following form:

[0117] ,

[0118] where $v_j$ represents a node in the graph structure and $\mathrm{sim}(a, b)$ is the similarity function.

[0119] This loss encourages the local spatial structure of the image to obtain the most similar semantic mapping in the graph node space, so that the local features are consistent and the model’s ability to perceive and generalize to local red tide change areas is enhanced.

[0120] 3) Region-aware negative sampling: In cross-modal contrastive learning, the quality of negative samples directly determines the effectiveness of the learning signal. If negative samples are too similar to positive samples, the learning objective becomes blurred, convergence slows, and gradients may even vanish. A region-aware negative sampling strategy is therefore designed. When constructing image negative samples and graph structure negative samples, the joint constraints of spatial distance and graph topological distance are applied to eliminate semantically similar pseudo-negative samples. Let the positive sample come from image region $p^{+}$ with corresponding graph structure node $v^{+}$, and let the candidate negative sample regions be $\{p^{-}\}$ with corresponding nodes $v^{-}$. The effective negative sample set is defined as follows:

$$\mathcal{N}^{-}_{\mathrm{valid}} = \big\{\, p^{-} : \|p^{-} - p^{+}\|_2 > \delta \ \text{and}\ \mathrm{GraphDist}(v^{+}, v^{-}) > \eta \,\big\}$$

[0121] where $\|p^{-} - p^{+}\|_2$ is the Euclidean distance in image pixel space, $\mathrm{GraphDist}(v^{+}, v^{-})$ is the shortest-path graph distance between sites, and $\delta$ and $\eta$ are the minimum spatial distance and minimum structural distance thresholds, respectively. Combining global modality alignment, local semantic alignment, and the region-aware negative sampling mechanism, the final joint objective for cross-modal contrastive self-supervised training is as follows:

[0122] ,

[0123] Wherein, λ1 and λ2 are weighting coefficients used to balance the representation alignment loss at different scales.
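
The region-aware filtering of negative samples can be sketched as a joint test on pixel distance and graph hop distance. The sketch assumes strict inequalities against both thresholds, an unweighted station graph stored as an adjacency dictionary, and breadth-first search for the shortest path; the function names and sample data are hypothetical.

```python
import math
from collections import deque


def graph_dist(adj, src, dst):
    """Shortest-path hop count by breadth-first search; inf if unreachable."""
    hops = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return hops[node]
        for nxt in adj.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return math.inf


def valid_negatives(pos_region, pos_node, candidates, adj, delta, eta):
    """Keep candidates that are far from the positive in both image and graph space."""
    keep = []
    for region, node in candidates:
        far_in_image = math.dist(region, pos_region) > delta
        far_in_graph = graph_dist(adj, pos_node, node) > eta
        if far_in_image and far_in_graph:
            keep.append((region, node))
    return keep


# Hypothetical chain of stations A-B-C-D and candidate (pixel position, station) pairs.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
candidates = [((1, 1), "B"), ((50, 50), "B"), ((50, 50), "D"), ((2, 2), "D")]
print(valid_negatives((0, 0), "A", candidates, adj, delta=10, eta=1))
# [((50, 50), 'D')]
```

Only the candidate that is distant in both spaces survives, which is how semantically similar pseudo-negatives near the positive region or station are excluded.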

[0124] S5. Attention Fusion and Joint Representation Module

[0125] The formation and evolution of red tides are often driven by multiple environmental factors, including changes in sea surface temperature, salinity, nutrient concentration, sunlight, wind, and hydrodynamics. Remote sensing images and monitoring station graph data capture the apparent morphology and triggering mechanisms of red tides from different dimensions. The previous modules complete deep feature extraction and cross-modal consistency learning for each modality; effectively integrating these heterogeneous features into a joint representation is what then determines the accuracy of red tide prediction and the model's generalization ability.

[0126] 1) Modal attention-weighted fusion: remote sensing images and monitoring graphs differ dynamically in how they express red tide information. At certain times, image information may be more sensitive (e.g., clearly showing red tide discoloration), while at other times, graph structure data (e.g., nutrient changes) may reflect the potential trend of red tides earlier. Therefore, a modal attention mechanism is first introduced to adaptively adjust the fusion weights of the different modal features based on task relevance. Let the image modal features be $h_I \in \mathbb{R}^{d}$ and the graph structure modal features be $h_G \in \mathbb{R}^{d}$. The fusion weights are calculated using a shared attention network:

[0127] where $W$ is the learnable attention parameter, $h_I$ is the image modal feature, and $h_G$ is the graph structure modal feature. The final fused representation is:

[0128] $$h_{\mathrm{fuse}} = \alpha_I\, h_I + \alpha_G\, h_G$$

[0129] where $\alpha_I$ and $\alpha_G$ are the fusion weights. Based on the intensity of modal information expression at different time points, this mechanism dynamically allocates the contribution of each modality to the final fused features, effectively improving the flexibility and discriminative power of red tide cause characterization.
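
The modal attention fusion can be illustrated as a two-way softmax over one score per modality, followed by a weighted sum. Scoring each modality with a shared weight vector `w` is an assumption made for this sketch, as are the function and variable names.

```python
import math


def fuse_modalities(h_img, h_graph, w):
    """Weight the image and graph features by a softmax over shared-attention scores."""
    s_img = sum(wi * hi for wi, hi in zip(w, h_img))
    s_graph = sum(wi * hi for wi, hi in zip(w, h_graph))
    peak = max(s_img, s_graph)  # subtract the maximum for numerical stability
    e_img = math.exp(s_img - peak)
    e_graph = math.exp(s_graph - peak)
    a_img = e_img / (e_img + e_graph)
    a_graph = 1.0 - a_img
    fused = [a_img * x + a_graph * y for x, y in zip(h_img, h_graph)]
    return fused, a_img, a_graph


# With a zero attention vector both modalities score equally,
# so the fused feature is their average.
fused, a_img, a_graph = fuse_modalities([2.0, 0.0], [0.0, 2.0], [0.0, 0.0])
print(fused, a_img, a_graph)  # [1.0, 1.0] 0.5 0.5
```

Because the two weights sum to one, the fused vector always lies between the two modal features, and whichever modality scores higher at a given time step dominates the result.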

[0130] 2) Semantic channel attention: To further enhance the semantic dimensions of the fused features that are highly correlated with red tides (such as sudden water temperature rise, high chlorophyll areas, and abnormal reflectance regions), a semantic channel attention mechanism weights the fused joint representation along the channel dimension. This mechanism guides the model to focus on key semantic channels by learning importance weights across feature channels. The Squeeze-and-Excitation (SE) mechanism is used to construct the channel attention weights, which reweight the fused features along the semantic dimension to obtain $h_{\mathrm{recon}}$:

[0131] $$h_{\mathrm{recon}} = h_{\mathrm{fuse}} \odot \sigma\big(W_2\, \mathrm{ReLU}(W_1 h_{\mathrm{fuse}})\big)$$

[0132] where $h_{\mathrm{fuse}}$ is the fused representation, $W_1$ and $W_2$ are the weights for dimensionality reduction and dimensionality increase, $\sigma$ is the Sigmoid function, and $\odot$ denotes element-wise multiplication. Through this mechanism, the model can automatically suppress redundant channels unrelated to red tides (such as cloud interference and shoreline background) and strengthen the response of key channels, thereby improving overall semantic consistency and predictive discriminative power.
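
The channel reweighting can be sketched for a single feature vector as a bottleneck of two linear maps with a ReLU in between and a Sigmoid gate at the end, as in the standard SE block. Bias terms and the squeeze (pooling) step are omitted for brevity, and the name `se_reweight` is hypothetical.

```python
import math


def se_reweight(h, W1, W2):
    """Scale each channel of h by a Sigmoid gate computed through a bottleneck.

    h:  length-C feature vector
    W1: (C/r) x C reduction weights
    W2: C x (C/r) expansion weights
    """
    hidden = [max(0.0, sum(w * x for w, x in zip(row, h))) for row in W1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * z for w, z in zip(row, hidden))))
             for row in W2]
    return [g * x for g, x in zip(gates, h)]


# One hidden unit; the first channel is emphasized and the second suppressed.
print(se_reweight([1.0, 1.0], [[1.0, 1.0]], [[1.0], [-1.0]]))
```

Each gate lies strictly between 0 and 1, so the block can only attenuate channels relative to one another; channels with near-zero gates are effectively suppressed.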

[0133] 3) Spatiotemporal joint representation construction: Red tides, as a complex spatiotemporal evolution phenomenon, exhibit continuity and coupling in their spatial expansion and temporal evolution. Relying solely on single frames or local information is insufficient to form a complete predictive perspective. Therefore, the fused feature $h_{\mathrm{recon}}$ is further input into a Transformer-based spatiotemporal modeling module to construct a joint representation sequence over multiple time steps and multiple modalities. Let $H_{\mathrm{recon}} = \{h_{\mathrm{recon}}^{(t)}\}_{t=1}^{T}$ denote the fused feature sequence from the past $T$ time steps, which is input into the temporal encoder (TemporalTransformer):

[0134] $$\tilde{H} = \mathrm{TemporalTransformer}(H_{\mathrm{recon}})$$

[0135] where $\tilde{H}$ is the fused dynamic joint spatiotemporal feature sequence, which serves as the input for the downstream red tide early warning and spatial reasoning module. This joint feature not only integrates information from both the remote sensing and monitoring graph structures but also explicitly models their coupling along the time axis, contributing to a complete representation of the evolutionary chain.

[0136] S6. Anomaly Detection and Prediction Early Warning Module

[0137] In the marine red tide monitoring system, to achieve real-time identification and trend prediction of abnormal red tide events, this module uses the multimodal joint features obtained from the fusion of previous modules as input. Through deep learning and statistical anomaly detection algorithms, it constructs a comprehensive early warning platform integrating anomaly detection, trend prediction, and visual early warning. This module can not only capture sudden anomalies in the spatiotemporal distribution of red tides but also predict the trends of potential future red tide events, providing a scientific basis and intuitive display for prevention and control decisions. Specifically, it includes the following three parts:

[0138] 1) Anomaly detection: building on the fused joint feature, the anomaly detection module aims to identify whether a red tide anomaly exists at the current moment. An unsupervised anomaly detection method based on an autoencoder is employed. Let the input joint feature be $\tilde{h} \in \mathbb{R}^{d}$. The autoencoder consists of an encoder $f_{\mathrm{enc}}$ and a decoder $f_{\mathrm{dec}}$, and models the normal state by minimizing the reconstruction error:

[0139] $$\hat{h} = f_{\mathrm{dec}}\big(f_{\mathrm{enc}}(\tilde{h})\big)$$

[0140] where $\hat{h}$ is the reconstructed feature. The reconstruction error is given as the mean squared error (MSE):

[0141] $$\mathcal{L}_{\mathrm{rec}} = \frac{1}{d}\,\big\|\tilde{h} - \hat{h}\big\|_2^{2}$$

[0142] The assumption underlying anomaly detection is that, under normal conditions, the model reconstructs the input features well, whereas the reconstruction error increases significantly when anomalies occur. Let the threshold $\epsilon$ be a preset upper limit on the reconstruction error; then if

[0143] $$\mathcal{L}_{\mathrm{rec}} > \epsilon$$

[0144] an abnormal event is detected. Because the autoencoder learns the inherent patterns in the data without supervision, the system can identify anomalies even in the absence of labeled samples.
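
The thresholding rule on the reconstruction error can be sketched independently of how the autoencoder is trained. In the sketch below, `encode` and `decode` are toy stand-ins (a mean bottleneck for 3-dimensional features) used only to make the example runnable; they are not the patent's network, and `epsilon = 0.5` is an arbitrary illustrative threshold.

```python
def mse(x, x_hat):
    """Mean squared error between a feature vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def is_anomalous(x, encode, decode, epsilon):
    """Flag x when its reconstruction error exceeds the threshold epsilon."""
    error = mse(x, decode(encode(x)))
    return error > epsilon, error


# Toy stand-in for a trained autoencoder: compress to the mean, then
# expand back to three identical values. Flat vectors reconstruct well.
def encode(x):
    return [sum(x) / len(x)]


def decode(z):
    return [z[0]] * 3


print(is_anomalous([1.0, 1.0, 1.0], encode, decode, epsilon=0.5))  # (False, 0.0)
print(is_anomalous([1.0, 1.0, 7.0], encode, decode, epsilon=0.5))  # (True, 8.0)
```

In a deployed system the threshold would typically be calibrated on reconstruction errors from periods known to be free of red tides, for example as a high percentile of that distribution.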

[0145] 2) Trend prediction: To dynamically predict red tide evolution trends, this module further constructs a time series prediction neural network, such as a Long Short-Term Memory (LSTM) model. The prediction model is trained on historical fused feature sequences to predict the feature representation at the next moment or at several future moments, and from it the probability of red tide occurrence. Let the prediction model be $F_{\mathrm{pred}}(\cdot)$; the fused feature predicted for the future time $T+1$ is:

[0146] $$\tilde{h}_{T+1}^{\mathrm{pred}} = F_{\mathrm{pred}}\big(\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_T\big)$$ where $\tilde{h}_1, \ldots, \tilde{h}_T$ are the historical fused features.

[0147] Next, the fused feature output by the prediction model is mapped to the red tide anomaly probability:

[0148] ,

[0149] Here, σ(·) is the Sigmoid function, which maps the output to the interval [0,1], representing the probability of a red tide event occurring. When the probability exceeds a preset probability threshold γ, the occurrence of an abnormal red tide event is predicted, and future trend prediction information is output.
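
Mapping a predicted feature to a warning decision can be sketched as a Sigmoid over a linear score followed by a comparison with the threshold γ. The linear output layer (`w`, `b`) is an assumption introduced for the example, as are the function names and the default `gamma = 0.5`.

```python
import math


def red_tide_probability(h_pred, w, b):
    """Sigmoid of a linear score; w and b are hypothetical output-layer parameters."""
    score = sum(wi * hi for wi, hi in zip(w, h_pred)) + b
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)  # stable branch for large negative scores
    return e / (1.0 + e)


def should_warn(prob, gamma=0.5):
    """Issue a warning when the predicted probability exceeds the threshold gamma."""
    return prob > gamma


p = red_tide_probability([3.0], [2.0], 0.0)
print(p > 0.99, should_warn(p, gamma=0.8))  # True True
```

Raising γ trades a lower false alarm rate for a shorter warning lead time, so the threshold is a policy choice as much as a modeling one.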

[0150] 3) Early warning visualization: after anomaly detection and trend prediction are completed, the system presents the detection results to decision-makers and users in an intuitive way. The early warning visualization module displays information such as the anomaly distribution, future trends, and predicted probabilities as charts, time series curves, and similar views.

[0151] Experimental verification

[0152] To verify the effectiveness of the marine red tide spatiotemporal modeling and early warning system proposed in this study, which is constructed based on remote sensing images, UAV images, and red tide forecasting factor data, this paper builds an experimental platform for field scene simulation and multi-source data-driven analysis in a typical nearshore high-incidence red tide area. Multimodal data, including remote sensing images, high-resolution low-altitude UAV images, and temperature, salinity, pH, and chlorophyll concentration from monitoring buoy stations, are collected, totaling 11,000 sets of data. This covers the complete red tide evolution cycle (occurrence-development-regression) and exhibits rich spatial and temporal variation characteristics. To enhance the experiment's relevance to actual early warning needs, complex environmental variables such as typhoon interference, cloud cover, degraded remote sensing image quality, and missing data from some monitoring nodes are introduced during the design process to comprehensively test the model's robustness and generalization ability under uncertain scenarios.

[0153] The comparison methods are representative red tide detection and prediction models: Transformer, ConvLSTM (which fuses convolutional and temporal modeling), ASTGCN (which incorporates a graph attention mechanism), Multi-Modal Fusion Net (MMFN, with multimodal input), and Cross-Modal Matching Transformer (CMMT). All models were compared under a unified dataset partition (training:validation:test = 6:2:2), a consistent optimization strategy, and the same number of training epochs to ensure a fair evaluation. Six performance metrics were used: accuracy (ACC), F1-score, spatial positioning error, warning lead time, false alarm rate, and model inference latency. The models were thus evaluated along three dimensions: detection accuracy, timeliness, and deployment feasibility. The experimental results are shown in Figure 2, Figure 3, and Table 1: the proposed method outperforms the comparison models on all indicators, verifying its advantages in multimodal modeling and red tide prediction tasks.

[0154] Table 1. Comparison of data from different methods under six major indicators.

[0155]

[0156] As shown in Figure 2, Figure 3, and Table 1, the comparison methods (Transformer, ASTGCN, MMFN, CMMT, and ConvLSTM) all demonstrated some predictive capability in red tide anomaly detection, but still showed significant shortcomings in key performance dimensions. Transformer has advantages in global modeling and can model long-range dependencies in time series, but its weak ability to model regional spatial details leads to large localization errors and a false alarm rate as high as 9.8%. ASTGCN introduces graph structure modeling, effectively integrating temporal and spatial factor information, and outperforms Transformer in early warning lead time and accuracy; however, its static graph structure is ill-suited to the rapid, dynamic evolution of red tides. MMFN and CMMT improved prediction accuracy through multimodal fusion mechanisms, achieving F1 scores of 83.9% and 86.8%, respectively. However, their fusion methods are relatively shallow and lack cross-modal consistency alignment and semantic enhancement, so the false alarm rate remains high and inference latency increases significantly. ConvLSTM has certain advantages in time series modeling, but because it lacks an explicit spatial structure representation, its prediction accuracy is low, with the lowest ACC and F1 values (82.2% and 80.7%).

[0157] In contrast, the method of this invention, based on key technologies such as dual-branch feature encoding, cross-modal contrastive self-supervised learning, attention fusion, and graph structure modeling, fully integrates multi-source information including remote sensing images, UAV imagery, and red tide inducing factors, achieving stronger semantic consistency and structure-sensitive modeling. In the experiments, this method outperforms the comparison methods across all six evaluation metrics: accuracy reaches 91.3% and the F1 score reaches 89.6%; it leads in early warning lead time (3.0 days) and spatial positioning accuracy (error of 6.3 km); the false alarm rate is reduced to 5.1%; and inference latency is kept within 3.8 minutes. These results verify the practicality and advancement of this method in red tide anomaly identification and trend prediction, and indicate good prospects for engineering deployment and emergency response.

[0158] To verify the model's ability to perceive spatial anomalies, a spatial heatmap of the red tide occurrence probability was constructed; the results are shown in Figure 4. In the figure, different shades of color correspond to the probability of red tide occurrence in different monitoring areas, with darker colors indicating a higher risk of red tide in the area. The anomaly distribution predicted by the method of this invention exhibits significant spatial clustering, with high-risk areas concentrated in near-shore waters heavily affected by human activities. The spatial boundaries are clear, and the trends are consistent with historical monitoring data. These results demonstrate that this method not only achieves high prediction accuracy but also effectively locates red tide anomaly areas on a spatial scale, providing a refined reference for subsequent early warning and response measures.

[0159] To verify the model's ability to predict the dynamic trend of red tide concentration, the red tide concentration predictions of the different models at typical monitoring stations were compared and analyzed; the results are shown in Figure 5. The figure plots the actual red tide concentration curve and the predicted curves of the comparison models over time (t, t+1, t+2, ...). The concentration trajectory predicted by the method of this invention closely matches the actual observed values, accurately capturing the rise and fall of red tides, and fits best at abrupt change points and inflection points. The experimental results demonstrate that the method of this invention has higher prediction accuracy and response sensitivity in red tide concentration time-series modeling, contributing to continuous dynamic monitoring and trend early warning of red tide development.

[0160] Example 2

[0161] This embodiment provides a marine red tide anomaly detection system that integrates multi-source remote sensing and graph neural networks, including:

[0162] The data acquisition module is configured as follows:

[0163] A computer-readable storage medium stores a plurality of instructions adapted to be loaded by a processor of a terminal device to execute the aforementioned method for detecting marine red tide anomalies by integrating multi-source remote sensing and graph neural networks.

[0164] A terminal device includes a processor and a computer-readable storage medium; the processor is used to execute instructions, and the computer-readable storage medium stores a plurality of instructions adapted to be loaded by the processor to execute the aforementioned method for detecting marine red tide anomalies by integrating multi-source remote sensing and graph neural networks.

[0165] The above are all preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Therefore, all equivalent changes made in accordance with the structure, shape and principle of the present invention should be covered within the scope of protection of the present invention.

Claims

1. A method for detecting marine red tide anomalies by integrating multi-source remote sensing and graph neural networks, characterized in that it comprises:
acquiring remote sensing image data, UAV image data, and monitoring data from monitoring points;
performing data preprocessing on the acquired remote sensing image data and UAV image data;
performing feature extraction and feature fusion on the remote sensing images and UAV images to obtain remote sensing feature data;
constructing a spatiotemporal graph structure based on the monitoring data from the monitoring points to obtain graph structure data, which includes building a dynamic adjacency graph based on a sliding time window: at each time step t, an adjacency matrix $A_t$ is constructed from feature similarity over the historical sequence {t-w+1, …, t} with window length w, so as to capture the dynamic connection strength between nodes at the current moment; to address the strong volatility of marine environmental variables across different time scales, a sliding-window decomposition mechanism is introduced that divides long-term time series data into multiple short sliding sequences, which extracts short-term dynamic features and prevents long-term stable trends from masking sudden anomalies; at each time step t, a node feature tensor $S_t \in \mathbb{R}^{w \times N \times F}$ under the current sliding window is constructed and subjected to graph convolution; finally, to accommodate differences in node characteristics, a node-level personalized parameter vector $\theta_i$ is constructed and a personalized weight adjustment mechanism is introduced, expressed as: [formula missing in source], where the defined terms are, in order: the final graph convolution output of node i at time t; the trainable individual weight of node i, which reflects its preference in responding to the factors; element-wise multiplication; the input features of node j at time t; and the set of nodes adjacent to node i at time t;
learning a consistent representation of the remote sensing feature modality and the graph structure feature modality based on a cross-modal contrastive self-supervised learning mechanism;
performing heterogeneous feature fusion and joint representation based on the consistency learning;
performing anomaly detection and early warning based on the fused representation results;
wherein learning the consistent representation of the remote sensing feature modality and the graph structure feature modality based on the cross-modal contrastive self-supervised learning mechanism includes projecting the two modalities into a shared semantic space to achieve a consistent mapping between them in the global semantic space, and constructing a global modality alignment objective using the contrastive-learning InfoNCE loss, as follows: [formula missing in source], where sim(a,b) denotes cosine similarity and the remaining terms are, in order: a temperature coefficient that adjusts the smoothness of the distribution; a set of negative-sample graph structures containing different time steps; the feature vector generated by the spatiotemporal graph neural network; the feature vector generated by the dual-branch encoder; and a feature vector in the negative sample set;
wherein the cross-modal contrastive self-supervised learning mechanism further includes a local semantic alignment mechanism that strengthens fine-grained semantic alignment between the modalities: the remote sensing image is divided into K fixed spatial window regions, and a local embedding representation is extracted for each region using a convolutional encoder; an embedding representation is likewise obtained for each node in the graph structure [expression missing in source]; the goal is for each local region of the image to find the semantically closest site node in the graph structure, thereby constructing cross-modal local alignment pairs; the alignment loss is expressed as: [formula missing in source], where v denotes a node in the graph structure and sim(a,b) denotes the similarity function.
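The graph-construction and global-alignment steps of claim 1 can be illustrated with a minimal Python sketch. The claim's formulas are not reproduced in the text, so every concrete choice here is an assumption rather than the patented formulation: cosine similarity with row normalisation for the adjacency matrix, a ReLU after the personalised aggregation, the image-side vector as the contrastive anchor, and the positive pair being included in the InfoNCE denominator. All function and parameter names are hypothetical.

```python
import numpy as np

def dynamic_adjacency(window):
    """Build A_t from a sliding window of node features.

    window: array of shape (w, N, F), the last w time steps for N nodes.
    Returns a row-normalised (N, N) matrix (assumed cosine-similarity form).
    """
    w, n, f = window.shape
    flat = window.transpose(1, 0, 2).reshape(n, w * f)  # one row per node
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
    sim = np.clip(unit @ unit.T, 0.0, None)  # drop negative similarities
    np.fill_diagonal(sim, 1.0)               # keep self-connections
    return sim / sim.sum(axis=1, keepdims=True)

def personalized_graph_conv(adj, x, theta, weight):
    """Assumed form: out_i = ReLU((sum_j adj[i, j] * (theta_i * x_j)) @ weight).

    adj: (N, N), x: (N, F) features at time t,
    theta: (N, F) per-node personalised vectors, weight: (F, H).
    """
    # theta_i does not depend on j, so it can be applied after aggregation.
    aggregated = theta * (adj @ x)
    return np.maximum(aggregated @ weight, 0.0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def info_nce(z_image, z_graph, negatives, temperature=0.1):
    """InfoNCE loss for one image/graph pair.

    z_image: vector from the image encoder (anchor, assumed).
    z_graph: graph vector for the same time step (positive).
    negatives: graph vectors from other time steps.
    """
    logits = np.array([cosine(z_image, z_graph)] +
                      [cosine(z_image, z_neg) for z_neg in negatives])
    logits = logits / temperature
    shifted = logits - logits.max()          # numerically stable log-sum-exp
    log_denominator = np.log(np.exp(shifted).sum()) + logits.max()
    return float(log_denominator - logits[0])
```

Because the personalised vector of node i is constant across its neighbours, applying it after the neighbourhood sum gives the same result as applying it to each neighbour's features first, which keeps the sketch to a single matrix product.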

2. The method for detecting marine red tide anomalies by integrating multi-source remote sensing and graph neural networks according to claim 1, characterized in that the data preprocessing of the acquired remote sensing image data and UAV image data includes: first performing physical and geometric consistency processing on the remote sensing images and UAV images, and then performing unified standardization on all factor data; then performing spatiotemporal alignment and missing data completion, including temporal alignment of the image data and factor data based on timestamps; for data loss at monitoring points caused by buoys going offline, spatial weighted interpolation is used for completion; a time-series-based frame interpolation and image completion method is introduced, which uses optical-flow-guided image reconstruction to predict and complete the content of frames obscured by fog. Let $I_{t-1}$ and $I_{t+1}$ be image frames acquired by the UAV at consecutive times; the goal is to reconstruct the missing intermediate frame $I_t$. First, the forward optical flow $F_{t-1 \to t+1}$ and the reverse optical flow $F_{t+1 \to t-1}$ between the two frames are computed; then, using the bidirectional optical flow and a temporal weight α, the reconstructed value at each pixel location of the intermediate frame is estimated, expressed as: [formula missing in source], where (x,y) denotes a pixel position in the image, (u,v) denotes the motion vector of a pixel from time a to time b, $F_{a \to b}$ denotes the optical flow field from frame a to frame b, (u1, v1) is the optical flow from $I_{t-1}$ to $I_{t+1}$, and (u2, v2) is the optical flow from $I_{t+1}$ to $I_{t-1}$.
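The optical-flow-guided frame reconstruction of claim 2 can be sketched as follows. Since the claim's formula is not reproduced in the text, this uses one common bidirectional blending scheme as an assumption: each neighbouring frame is backward-warped toward the intermediate time with its own flow, and the two warped frames are mixed with the temporal weight α. Nearest-neighbour sampling, the `(H, W, 2)` flow layout, and all function names are illustrative choices, not the patented method.

```python
import numpy as np

def warp_nearest(image, u, v, scale):
    """Sample image at (x - scale*u, y - scale*v) with nearest-neighbour lookup.

    image: (H, W) array; u, v: (H, W) horizontal and vertical flow components.
    Coordinates falling outside the image are clamped to the border.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs - scale * u).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys - scale * v).astype(int), 0, h - 1)
    return image[src_y, src_x]

def interpolate_frame(prev_frame, next_frame, flow_fwd, flow_bwd, alpha=0.5):
    """Estimate the missing frame between prev_frame and next_frame.

    flow_fwd: (H, W, 2) flow (u1, v1) from prev_frame to next_frame.
    flow_bwd: (H, W, 2) flow (u2, v2) from next_frame to prev_frame.
    alpha: temporal position of the missing frame, 0 = prev, 1 = next.
    """
    u1, v1 = flow_fwd[..., 0], flow_fwd[..., 1]
    u2, v2 = flow_bwd[..., 0], flow_bwd[..., 1]
    # A pixel seen at p in the middle frame was at p - alpha*(u1, v1) earlier
    # and will be at p - (1 - alpha)*(u2, v2) in the later frame.
    from_prev = warp_nearest(prev_frame, u1, v1, alpha)
    from_next = warp_nearest(next_frame, u2, v2, 1.0 - alpha)
    return (1.0 - alpha) * from_prev + alpha * from_next
```

In practice the two flow fields would come from an optical flow estimator, and bilinear sampling with occlusion handling would usually replace the nearest-neighbour lookup used here for brevity.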

3. The method for detecting marine red tide anomalies by integrating multi-source remote sensing and graph neural networks according to claim 2, characterized in that the feature extraction and feature fusion of remote sensing images and UAV images includes: in the time-series feature extraction branch, a UAV image sequence of length T is denoted $\{X_u\}_{u=1}^{T}$, where each frame $X_u$ represents the multispectral image data at the corresponding time step; a lightweight convolution module first extracts local perceptual features, the self-attention mechanism of the Transformer architecture then models the dynamic dependencies between frames of the image sequence so as to capture the behavior patterns of key regions along the time dimension, and a sequence-level dynamic embedding representation is finally obtained; in the spatial high-resolution feature extraction branch, the input is a single remote sensing image, and multi-level spatial semantic features are extracted through a pre-trained multi-scale residual network (ResNet-50 + ASPP); in addition, to highlight potentially abnormal regions in the image, a spatial attention mechanism is introduced to recalibrate the feature distribution.
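The temporal branch of claim 3 relies on Transformer-style self-attention over per-frame features. The sketch below shows single-head scaled dot-product attention over T frame embeddings, followed by mean pooling to form a sequence-level embedding. The projection matrices stand in for learned weights, the per-frame embeddings are assumed to come from the lightweight convolution module, and mean pooling is an assumption, since the claim does not say how the sequence-level representation is formed.

```python
import numpy as np

def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)  # avoid overflow in exp
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(frames, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    frames: (T, d) per-frame embeddings; w_q, w_k: (d, d_k); w_v: (d, d_v).
    Returns the contextualised frames (T, d_v) and the (T, T) attention map.
    """
    q, k, v = frames @ w_q, frames @ w_k, frames @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attention = softmax(scores, axis=-1)  # each frame attends to all frames
    return attention @ v, attention

def sequence_embedding(frames, w_q, w_k, w_v):
    """Pool the attended frames into one sequence-level vector (mean pooling)."""
    context, attention = self_attention(frames, w_q, w_k, w_v)
    return context.mean(axis=0), attention
```

Row i of the attention map shows which frames most influence frame i, which is one way to inspect the temporal dependencies the claim describes.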

4. The method for detecting marine red tide anomalies by integrating multi-source remote sensing and graph neural networks according to claim 3, characterized in that the heterogeneous feature fusion and joint representation based on consistency learning includes introducing a modal attention mechanism that adaptively adjusts the fusion weights of the different modal features according to task relevance; let the image modal features be $h_I \in \mathbb{R}^d$ and the graph structure modal features be $h_G \in \mathbb{R}^d$; the fusion weights are calculated using a shared attention network, expressed as: [formula missing in source], where W is the learnable attention parameter, $h_I$ denotes the image modal features, and $h_G$ denotes the graph structure modal features; the final fused representation is: [formula missing in source], where the two coefficients denote the fusion weights of the respective modalities.
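The modal attention fusion of claim 4 can be sketched as below. The claim's formulas are not reproduced in the text, so the form used here is an assumption: a shared parameter vector W scores each modality, a softmax turns the two scores into fusion weights, and the fused feature is the weighted sum of the two modal features. Treating W as a vector of length d is a simplification made for this sketch.

```python
import numpy as np

def modal_attention_fusion(h_image, h_graph, w):
    """Fuse two d-dimensional modal features with attention weights.

    h_image, h_graph: (d,) modal feature vectors.
    w: (d,) shared attention parameter (assumed shape).
    Returns the fused (d,) feature and the two fusion weights.
    """
    scores = np.array([w @ h_image, w @ h_graph])  # one scalar per modality
    scores = scores - scores.max()                 # stable softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    fused = weights[0] * h_image + weights[1] * h_graph
    return fused, weights
```

Because the weights sum to one, the fused vector stays on the segment between the two modal features, so whichever modality scores higher pulls the joint representation toward itself.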

5. The method for detecting marine red tide anomalies by integrating multi-source remote sensing and graph neural networks according to claim 4, characterized in that the heterogeneous feature fusion and joint representation based on consistency learning further includes a semantic channel attention mechanism that strengthens the semantic dimensions of the fused features that are highly correlated with red tides; this mechanism reweights the fused joint representation $h_{fused}$ along the channel dimension, with the channel attention weights constructed using a Squeeze-and-Excitation mechanism, to obtain the semantically enhanced feature $h_{recon}$; $h_{recon}$ is then input into a Transformer-based spatiotemporal modeling module to construct a joint representation sequence over multiple time steps and multiple modalities; the fused feature sequence of the past T time steps is input into the temporal encoder, expressed as: [formula missing in source], where the output denotes the fused dynamic joint spatiotemporal feature sequence, which serves as the input to the downstream red tide early warning and spatial reasoning module.
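The Squeeze-and-Excitation channel reweighting named in claim 5 can be sketched as follows. The claim does not give the exact layout, so this sketch assumes the fused features arrive as a sequence of T vectors, squeezes by averaging over time, and passes the result through a two-layer bottleneck with a sigmoid gate. The weight matrices stand in for learned parameters and all names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_channel_attention(h_fused_seq, w1, w2):
    """Reweight channels of a fused feature sequence.

    h_fused_seq: (T, d) fused features over T time steps.
    w1: (d, d_hidden) reduction weights; w2: (d_hidden, d) expansion weights.
    Returns the reweighted (T, d) features and the (d,) channel gates.
    """
    squeezed = h_fused_seq.mean(axis=0)       # squeeze: one statistic per channel
    hidden = np.maximum(squeezed @ w1, 0.0)   # excitation bottleneck with ReLU
    gates = sigmoid(hidden @ w2)              # per-channel weights in (0, 1)
    return h_fused_seq * gates, gates
```

The gates scale every time step's feature vector by the same per-channel factor, so channels the gate considers informative are preserved while the others are attenuated before the temporal encoder sees them.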

6. The method for detecting marine red tide anomalies by integrating multi-source remote sensing and graph neural networks according to claim 5, characterized in that the anomaly detection and early warning based on the fused representation results includes employing an unsupervised anomaly detection method based on an autoencoder, where the input joint feature is a vector in $\mathbb{R}^d$ and the autoencoder consists of an encoder and a decoder; the normal state is modeled by minimizing the reconstruction error, and anomalies are detected by comparing the reconstruction error against a set threshold; in addition, historical fused feature sequences are used to train a time-series prediction model that predicts the feature representation at the next moment or at several future moments, from which the probability of red tide occurrence is further predicted.
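The reconstruction-error anomaly test of claim 6 can be illustrated with a deliberately simple stand-in. The sketch below uses a linear autoencoder fitted by PCA (via SVD) instead of a trained neural autoencoder, and sets the threshold at the mean plus three standard deviations of the training errors. Both choices are assumptions made for illustration; the claim only requires an encoder, a decoder, and a set threshold. The time-series prediction step of the claim is not covered here.

```python
import numpy as np

class LinearAutoencoder:
    """PCA-based linear autoencoder used as a stand-in for a neural one."""

    def fit(self, features, latent_dim):
        """features: (n, d) joint features assumed to describe the normal state."""
        self.mean = features.mean(axis=0)
        _, _, vt = np.linalg.svd(features - self.mean, full_matrices=False)
        self.components = vt[:latent_dim]            # (latent_dim, d)
        errors = self.reconstruction_error(features)
        # Assumed threshold rule: mean + 3 standard deviations of training error.
        self.threshold = errors.mean() + 3.0 * errors.std()
        return self

    def encode(self, features):
        return (features - self.mean) @ self.components.T

    def decode(self, latent):
        return latent @ self.components + self.mean

    def reconstruction_error(self, features):
        features = np.atleast_2d(features)
        residual = features - self.decode(self.encode(features))
        return (residual ** 2).sum(axis=1)           # squared L2 error per sample

    def is_anomaly(self, features):
        return self.reconstruction_error(features) > self.threshold
```

A sample that lies close to the subspace learned from normal data reconstructs almost perfectly, while a sample with energy outside that subspace leaves a large residual and is flagged.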

7. A marine red tide anomaly detection system integrating multi-source remote sensing and graph neural networks, which executes the marine red tide anomaly detection method integrating multi-source remote sensing and graph neural networks according to claim 1, characterized in that it comprises:
a data acquisition module configured to acquire remote sensing image data, UAV image data, and monitoring data from monitoring points;
a preprocessing module configured to perform data preprocessing on the acquired remote sensing image data and UAV image data;
a remote sensing feature module configured to extract and fuse features from the remote sensing images and UAV images to obtain remote sensing feature data;
a graph structure module configured to construct a spatiotemporal graph structure based on the monitoring data from the monitoring points to obtain graph structure data;
a consistency module configured to learn a consistent representation of the remote sensing feature modality and the graph structure feature modality based on a cross-modal contrastive self-supervised learning mechanism;
a joint module configured to perform heterogeneous feature fusion and joint representation based on the consistency learning; and
an early warning module configured to perform anomaly detection and early warning based on the fused representation results.