End edge cloud hyperspectral monitoring method and device based on binary mask and mechanism cooperation

By employing a binarized mask and mechanism-based end-edge cloud hyperspectral monitoring method, the contradiction between band redundancy and dynamic computing power in resource-constrained environments of hyperspectral water quality monitoring systems is resolved, achieving efficient and robust water quality monitoring that is adaptable to complex environments and harsh network conditions.

CN122242789A · Pending · Publication Date: 2026-06-19 · XIAMEN SIXIN INTERNET OF THINGS TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: XIAMEN SIXIN INTERNET OF THINGS TECH CO LTD
Filing Date: 2026-05-25
Publication Date: 2026-06-19

Abstract

This invention provides an edge-cloud hyperspectral monitoring method and device based on binary mask and mechanism synergy, relating to the interdisciplinary fields of edge computing and hyperspectral remote sensing. The invention acquires raw spectral data and outputs a digital binary feature mask through a multi-agent action space combined with a three-dimensional reward function. In the computing power scheduling stage, the edge gateway constructs a Lyapunov-based cost function to decide dynamically on computing power offloading. When the decision is local inference, the first-layer computation graph of the network is reconstructed through diagonal matrix projection, and column pruning and compressed inference are performed, so that the floating-point computation falls strictly in proportion to the band retention rate. For physical mechanism injection, the residual term of the Gordon water radiative transfer equation is introduced into the training of the cloud-based teacher network, and physical priors are injected into the edge student network through knowledge distillation. The invention resolves three contradictions in resource-constrained environments: between band redundancy and dynamic computing power, between edge lightweighting and physical generalization, and between multi-device collaboration and catastrophic forgetting.

Description

Technical Field

[0001] This invention relates to the fields of intelligent Internet of Things, edge computing and hyperspectral remote sensing, and more specifically, to an edge cloud hyperspectral monitoring method and device based on the synergy of binary mask and mechanism.

Background Technology

[0002] Hyperspectral remote sensing technology, as an important means of modern water quality monitoring, can acquire rich spectral information of target water bodies and plays an increasingly important role in fields such as marine environmental monitoring, lake eutrophication assessment, and river pollution tracking. With the rapid development of smart IoT and edge computing technologies, the edge-cloud collaborative architecture provides a new technical path for solving real-time hyperspectral processing in resource-constrained environments.

[0003] However, existing hyperspectral water quality monitoring systems face a core contradiction that is difficult to reconcile when deployed in real-world scenarios. In resource-constrained environments, hyperspectral equipment typically collects hundreds or even thousands of spectral bands, resulting in massive amounts of data, while edge gateways have limited computing power and fluctuating transmission bandwidth. Most existing band selection methods employ static offline dimensionality reduction strategies, which cannot adapt to real-time network bandwidth conditions. More importantly, when the number of dynamically selected bands changes, the fixed input layer dimension of a deep learning model prevents band vectors of varying length from being fed in directly for inference. The selected bands must then be zero-padded or the model retrained, which severely wastes the underlying hardware computing resources, a phenomenon known as idle computing power. This contradiction between band redundancy and dynamic computing power severely restricts the real-time performance and robustness of hyperspectral monitoring systems in complex field environments.

[0004] Meanwhile, lightweight, purely data-driven networks generalize poorly under complex lighting conditions. Hyperspectral water quality inversion involves complex optical mechanisms of water bodies, including the absorption and scattering of light by components such as chlorophyll, suspended matter, and yellow substance (colored dissolved organic matter). Traditional methods neglect water body optical physics models such as the Gordon radiative transfer equation and rely solely on historical data to fit the data distribution. When lighting changes abruptly or water composition varies drastically, the model's prediction accuracy drops significantly. Furthermore, edge devices, constrained by computational resources and energy consumption, struggle to deploy complex physics-informed neural networks that solve partial differential equations online, which makes it difficult to inject physical mechanism knowledge into edge models.

[0005] In multi-node collaborative monitoring scenarios, traditional federated learning methods can train models in a distributed manner while protecting data privacy, but their aggregation mechanism, which fits local features with pure data, suffers from catastrophic forgetting. When multiple edge nodes train jointly, the turnover of local data gradually dilutes the physical laws that the model originally inherited through knowledge distillation, so the model loses its implicit representation of the optical mechanisms of water bodies. In addition, traditional federated learning depends heavily on cloud communication links. Where network coverage over outdoor waters is poor or temporarily lost, edge nodes struggle to obtain aggregation guidance from the cloud, and model updates stagnate.

[0006] In view of the above, this application is hereby submitted.

Summary of the Invention

[0007] This invention aims to provide a method, device, equipment, and medium for edge cloud hyperspectral monitoring based on binary mask and mechanism synergy, in order to solve the contradictions between band redundancy and dynamic computing power, edge lightweighting and physical generalization, and multi-device collaboration and catastrophic forgetting in existing hyperspectral water quality monitoring systems in resource-constrained environments.

[0008] To solve the above-mentioned technical problems, the present invention is achieved through the following technical solution:

[0009] An edge-cloud hyperspectral monitoring method based on the synergy of binary mask and mechanism includes:
S1, acquiring raw spectral data covering the entire spectral band from visible light to near-infrared;
S2, dividing the entire band of the raw spectral data into several adjacent sub-bands and deploying an independent agent for each sub-band; using a three-dimensional reward function based on mutual information regularization to guide each agent to output binary selection actions for its sub-band; and concatenating the binary actions to form a digital binary feature mask vector;
S3, constructing a cost function based on the Lyapunov multi-objective decision framework, and generating the optimal computing power offloading decision by minimizing the cost function;
S4, when the optimal computing power offloading decision is local inference, transforming the digital binary feature mask into a diagonal indicator matrix; reconstructing the first-layer computation graph of the network by projection; and extracting the non-zero indices of the digital binary feature mask and using them to column-prune and compress the preset first-layer weights of the student network and the raw spectral data, yielding the first-layer compressed inference output features, from which the student network outputs the water quality inversion parameters.
The student network is a lightweight inference model deployed on the edge gateway. During training, it inherits the physical prior feature weights of the cloud teacher network through knowledge distillation. The cloud teacher network is a deep neural network deployed in the cloud. During its training, the physical conservation residual term of Gordon's water body radiative transfer equation is introduced so that prior knowledge of the optical physical laws of water bodies is embedded in the hidden layer features of the network.

[0010] This invention also provides an edge-cloud hyperspectral monitoring device based on the synergy of binary mask and mechanism, comprising:
An acquisition unit, used to acquire raw spectral data covering the entire visible to near-infrared region;
A multi-agent construction unit, used to divide the entire band of the raw spectral data into several adjacent sub-bands and deploy an independent agent for each sub-band, to guide each agent with a three-dimensional reward function based on mutual information regularization to output binary selection actions for its sub-band, and to concatenate the binary actions to form a digital binary feature mask vector;
A multi-objective coordination and control unit, used to construct a cost function based on the Lyapunov multi-objective decision framework and generate the optimal computing power offloading decision by minimizing the cost function;
A compression and reconstruction unit, used, when the optimal computing power offloading decision is local inference, to transform the digital binary feature mask into a diagonal indicator matrix, reconstruct the first-layer computation graph of the network through projection, extract the non-zero indices of the digital binary feature mask, and column-prune and compress the preset first-layer weights of the student network and the raw spectral data to obtain the first-layer compressed inference output features, from which the student network outputs the water quality inversion parameters.
The student network is a lightweight inference model deployed on an edge gateway, which inherits the physical prior feature weights of the cloud-based teacher network through knowledge distillation during training. The cloud-based teacher network is a deep neural network deployed in the cloud, which incorporates the physical conservation residual term of Gordon's water radiative transfer equation during training to embed prior knowledge of water optical physics laws into the network's hidden layer features.

[0011] The present invention also provides an edge cloud hyperspectral monitoring device based on the synergy of binary mask and mechanism, including a processor and a memory. The memory stores a computer program that can be executed by the processor to realize the edge cloud hyperspectral monitoring method based on the synergy of binary mask and mechanism as described above.

[0012] The present invention also provides a computer-readable storage medium storing computer-readable instructions, which, when executed by a processor of the device on which the computer-readable storage medium resides, implement the edge cloud hyperspectral monitoring method based on binary mask and mechanism synergy as described above.

[0013] In summary, compared with the prior art, the present invention has the following beneficial effects: This invention generates digital binary feature masks by decoupling the action space of multiple agents to adapt to dynamic network bandwidth, achieves precise computing power offloading and scheduling through the Lyapunov multi-objective decision framework, realizes lightweight edge inference through Gordon water radiation residual physical distillation, and achieves continuous model evolution under harsh network conditions through LoRa Mesh topology decentralized federated learning.

[0014] This invention provides a method for efficient processing of hyperspectral data in resource-constrained environments. It optimizes computing power through hardware and software collaboration using digital binary masks and enhances prediction robustness in complex environments by incorporating water body optical mechanism models.

[0015] This invention enables adaptive evolution of edge models through a decentralized federated learning mechanism even without network support. This has significant scientific and engineering value for promoting hyperspectral water quality monitoring technology from the laboratory to practical applications and building intelligent, real-time, and robust environmental monitoring systems.

[0016] This invention maximizes the release of computing power through hardware-software decoupling. Using the decision mask, it directly reduces the first-layer feature map of the deep network to a feature subset (column pruning), so that floating-point computation at the hardware level falls strictly in proportion to the band retention rate. This removes the hardware bottleneck of traditional dimensionality reduction algorithms, which still compute a large number of zero values after the input dimension has been reduced, and it frees the maximum effective computing resources even on edge devices with limited computing power.

[0017] This invention achieves high-dimensional generalization by incorporating physical equations. It completely departs from the purely data-driven approach, forcibly introducing the Gordon radiative transfer partial differential equation as a physical conservation residual term into the loss function of the cloud-based teacher network, and injecting physical priors into the edge student network through knowledge distillation. This scheme significantly broadens the generalization ability and robustness of the water quality prediction model under complex lighting conditions, ensuring stable monitoring performance in uncontrolled field environments.

[0018] This invention achieves robust evolution under harsh network conditions. By combining a LoRa-based decentralized federated learning mechanism with RSSI signal matrices, it enables edge gateways to continue collaborative model training even when disconnected from the network. It also achieves unsupervised feature drift calibration through optimal transport theory, so that model iteration can continue uninterrupted in remote water environments without network coverage or manual labeling. This provides continuous adaptive evolution in real-world environments.

Attached Figure Description

To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of the present invention and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained from these drawings without creative effort.

[0019] Figure 1 is a schematic diagram of an edge cloud hyperspectral monitoring method based on the synergy of binary mask and mechanism, provided in Example 1.

[0020] Figure 2 is a schematic diagram of an edge cloud hyperspectral monitoring device based on the synergy of binary mask and mechanism, provided in Example 2.

[0021] The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

Detailed Implementation

[0022] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort are within the scope of protection of the present invention. Therefore, the following detailed description of the embodiments of the present invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention.

Example 1

[0023] Embodiment 1 of the present invention provides an end-edge cloud hyperspectral monitoring method based on the synergy of binary mask and mechanism. The method can be implemented by an end-edge cloud hyperspectral monitoring device based on the synergy of binary mask and mechanism (hereinafter referred to as the monitoring device) and, specifically, is executed by one or more processors within the monitoring device.

[0024] In this embodiment, the monitoring device may be any electronic device equipped with a processor that stores and can execute a computer program for the edge cloud hyperspectral monitoring method based on binary mask and mechanism synergy, such as a computer, smartphone, tablet, or workstation; no limitation is imposed here.

[0025] The monitoring equipment has a three-layer architecture: an end-side hyperspectral sensor, an edge gateway computing power box, and a cloud server, forming a complete closed loop from front-end data acquisition to cloud-based intelligent processing.

[0026] This invention achieves dynamic dimensionality reduction and bandwidth saving of hyperspectral data by using a binary mask in an edge-cloud architecture, and achieves optimal scheduling of edge computing power through Lyapunov decision-making. Hardware-level sparse computation is performed during local inference, while the lightweight student network inherits the water physics mechanism of the cloud teacher network. Finally, real-time, low-power, high-precision, and robust hyperspectral water quality monitoring is achieved on edge devices.

[0027] In practical applications, end-side hyperspectral sensors are typically deployed on unmanned surface vessels, surface buoys, or fixed monitoring stations to collect real-time water spectral data. The raw spectral data vector acquired by the sensor is a 200-dimensional full-band spectral vector covering the visible to near-infrared region, with the visible band ranging from 400 nm to 700 nm and the near-infrared band ranging from 700 nm to 1100 nm. This raw spectral vector contains complex noise caused by factors such as light fluctuations, water disturbances, and changes in phytoplankton concentration. Furthermore, the volume of data that can be transmitted is limited by network bandwidth, necessitating dynamic dimensionality reduction. The edge gateway computing unit is deployed near the monitoring platform and has limited computing resources and storage space; its hardware typically includes an embedded ARM processor, an FPGA accelerator, and a LoRa wireless communication module. The cloud server is deployed at the ground control center or data center, is equipped with a high-performance GPU cluster and large-capacity storage, and is responsible for model training and global policy optimization. The three layers exchange data and control information via 4G/5G cellular networks or satellite communication links. When the main communication link is interrupted, the edge gateways maintain collaborative communication through a LoRa Mesh self-organizing network.

[0028] As shown in Figure 1, an edge cloud hyperspectral monitoring method based on the synergy of binary mask and mechanism includes steps S1 to S4.

[0029] S1, acquire raw spectral data covering the entire visible to near-infrared region.

[0030] This step involves connecting to a real-time hyperspectral sensor to acquire raw spectral data vectors. After the hyperspectral sensor completes one spectral scan, it generates a raw vector containing reflectance data for approximately two hundred bands. Preliminary quality control of the raw spectral data is then performed to remove abnormal band data caused by sensor contamination, light oversaturation, or water bubble interference. The standard deviation and coefficient of variation for each band are calculated. When the standard deviation of a band exceeds a preset threshold or the coefficient of variation is abnormal, that band is marked as pending processing.

[0031] In one specific embodiment, when the reflectance of a visible light band exceeds 0.95 or falls below 0.02, the system automatically determines that the data in that band is affected by direct solar reflection and adds the band to the candidate set for rejection by the digital binary feature mask. After quality control is completed, the processed raw spectral data vector is used for subsequent band selection decisions.
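
The quality control described in [0030] and [0031] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `flag_bands`, the thresholds `std_max` and `cv_max`, and the use of a short window of repeated scans to obtain per-band statistics are all hypothetical. Only the reflectance bounds 0.02 and 0.95 come from the text, and the sketch applies them to every band passed in, whereas the text applies them to visible bands.

```python
from statistics import mean, pstdev


def flag_bands(scans, std_max=0.05, cv_max=0.5, r_low=0.02, r_high=0.95):
    """Return the sorted indices of bands marked as pending processing.

    scans: list of scans, each a list of per-band reflectance values.
    std_max, cv_max: hypothetical thresholds (not given in the source).
    r_low, r_high: reflectance bounds quoted in the text.
    """
    n_bands = len(scans[0])
    flagged = set()
    for b in range(n_bands):
        values = [scan[b] for scan in scans]
        mu = mean(values)
        sd = pstdev(values)
        # Coefficient of variation; treated as abnormal when the mean is not positive.
        cv = sd / mu if mu > 0 else float("inf")
        if sd > std_max or cv > cv_max:
            flagged.add(b)
        # Reflectance outside the quoted bounds in any scan flags the band.
        if any(v > r_high or v < r_low for v in values):
            flagged.add(b)
    return sorted(flagged)


# Three hypothetical scans of four bands.
scans = [
    [0.10, 0.97, 0.30, 0.01],
    [0.11, 0.96, 0.60, 0.01],
    [0.10, 0.98, 0.05, 0.01],
]
print(flag_bands(scans))
```

In this toy input, band 0 is stable and in range, band 1 is oversaturated, band 2 fluctuates strongly, and band 3 is below the lower bound.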

[0032] S2, the entire band of the raw spectral data is divided into several adjacent sub-bands, and an independent agent is deployed for each sub-band. A three-dimensional reward function based on mutual information regularization guides each agent to output binary selection actions for its sub-band. The binary actions are then concatenated to form a digital binary feature mask vector.

[0033] This step divides the entire band into several adjacent sub-bands by performing a multi-agent action space decoupling operation. In a preferred embodiment, the entire band can be divided into four sub-bands, corresponding to the visible blue light band (400nm to 500nm), the visible green light band (500nm to 600nm), the visible red light band (600nm to 700nm), and the near-infrared band (700nm to 1100nm), respectively.

[0034] Each sub-band comprises 50 consecutive bands, and an independent agent is deployed for each sub-band. Each agent is responsible for the band selection decisions of its sub-band: its policy network, trained within a reinforcement learning framework under the three-dimensional reward function, outputs binary actions. The policy network takes as input the 50-dimensional spectral vector of the corresponding sub-band together with global state information, and outputs a 50-dimensional binary decision vector.

[0035] The three-dimensional reward function guides the action decisions of the multi-agent system. Its expression can be written in the form:

$$R = \alpha \cdot Acc - \beta(B(t)) \cdot \rho - \gamma \cdot I$$

$$\rho = \frac{1}{N} \sum_{i=1}^{N} m_i$$

$$I \approx \frac{\sum_{i<j} m_i m_j \, |r_{ij}|}{\sum_{i<j} m_i m_j}$$

$$r_{ij} = \frac{\mathbb{E}\left[(x_i - \mu_i)(x_j - \mu_j)\right]}{\sigma_i \sigma_j}$$

where $R$ is the value of the three-dimensional reward function; $\alpha$, $\beta$, $\gamma$ are the inference accuracy weighting coefficient, the retention rate penalty weighting coefficient, and the mutual information regularization coefficient, respectively; $Acc$ is the accuracy corresponding to the selected spectral bands; $B(t)$ is the real-time network bandwidth at time $t$; $t$ is the current moment; $\rho$ is the overall retention rate of the spectral bands; $I$ is the mutual information, for which the Pearson cross-correlation matrix is used as an engineering approximation; $N$ is the total number of spectral bands; $m_i$ is the digital binary feature mask decision value for the $i$-th band; $r_{ij}$ is the cross-correlation coefficient of the spectral vectors $x_i$, $x_j$ of the $i$-th and $j$-th bands; $\mathbb{E}[\cdot]$ denotes expectation; $\mu_i$, $\mu_j$ are the means of the spectral vectors $x_i$, $x_j$; and $\sigma_i$, $\sigma_j$ are their standard deviations.

[0036] The parameter β is negatively correlated with real-time network bandwidth: when bandwidth decreases, β increases to penalize high retention rates. The parameter γ rewards the selection of band combinations with good spectral orthogonality.

[0037] The mutual-information-based penalty term incentivizes the system to select band combinations with low cross-correlation coefficients and good spectral orthogonality, avoiding the negative impact of redundant bands on inference accuracy.

[0038] The weighting coefficients α, β, γ are adjusted adaptively according to the deployment environment and performance requirements. When the monitoring task demands high prediction accuracy, the system increases the value of α to prioritize prediction performance; when network bandwidth resources are scarce, the system increases the value of β to compress the amount of transmitted data; when the physical interpretability of band selection must be ensured, the system increases the value of γ to strengthen the spectral orthogonality constraint. Through the above multi-agent action space decoupling and mutual information regularized reward mechanism, the system can dynamically generate digital binary feature masks that adapt to changes in the field environment based on real-time network bandwidth and physical orthogonality constraints.
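
The reward described in [0035] to [0038] can be illustrated with a short sketch. It assumes the reward is an accuracy term minus a retention penalty minus a redundancy penalty. The exact functional forms are assumptions made for illustration: β is taken as `beta0 / bandwidth_mbps` (any mapping that rises as bandwidth falls would match the text), and the mutual information term is approximated by the mean absolute Pearson coefficient over pairs of retained bands. All function and parameter names are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson cross-correlation coefficient of two equal-length samples."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def retention_rate(mask):
    """Fraction of bands kept by the binary mask."""
    return sum(mask) / len(mask)


def redundancy(mask, bands):
    """Mean |r_ij| over distinct pairs of retained bands (assumed proxy for I)."""
    kept = [i for i, m in enumerate(mask) if m == 1]
    pairs = [(i, j) for i in kept for j in kept if i < j]
    if not pairs:
        return 0.0
    return sum(abs(pearson(bands[i], bands[j])) for i, j in pairs) / len(pairs)


def reward(mask, bands, accuracy, bandwidth_mbps,
           alpha=1.0, beta0=1.0, gamma=0.5):
    # Assumed form: beta grows as bandwidth shrinks.
    beta = beta0 / bandwidth_mbps
    return (alpha * accuracy
            - beta * retention_rate(mask)
            - gamma * redundancy(mask, bands))


# Three hypothetical bands observed over four samples; bands 0 and 1 are collinear.
bands = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]
r_orthogonal = reward([1, 0, 1], bands, accuracy=0.9, bandwidth_mbps=1.0)
r_redundant = reward([1, 1, 0], bands, accuracy=0.9, bandwidth_mbps=1.0)
print(r_orthogonal > r_redundant)
```

With equal accuracy and retention rate, the mask that keeps the weakly correlated pair scores higher, and the same mask scores higher at 10 Mbps than at 1 Mbps because the retention penalty shrinks.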

[0039] Each agent executes local binary actions in parallel. The action output is 1 when the decision is to retain a given band within the sub-band, and 0 when the decision is to remove that band. The binary action outputs of the agents are concatenated to form a fixed-length 200-dimensional digital binary feature mask vector $m$, where $m \in \{0, 1\}^{200}$.

[0040] The digital binary feature mask has a dual technical function. First, as the basis for the data downlink strategy, the data corresponding to bands with a value of 0 in the mask does not need to be uploaded to the edge gateway or cloud server, which adapts transmission to dynamically changing network bandwidth conditions. Second, as the underlying hardware execution mask, it directly controls the input dimension of the edge inference network, realizing deep collaboration between software strategy and hardware execution.
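
As a rough illustration of [0039] and [0040], the sketch below concatenates the outputs of four agents into a 200-dimensional mask and then uses the mask to decide which band values are transmitted. The agent outputs are made-up examples, and the helper names `concatenate_actions` and `bands_to_transmit` are hypothetical.

```python
def concatenate_actions(agent_actions):
    """Join per-agent binary action vectors into one mask vector."""
    mask = []
    for actions in agent_actions:
        if any(a not in (0, 1) for a in actions):
            raise ValueError("agent actions must be binary")
        mask.extend(actions)
    return mask


def bands_to_transmit(spectrum, mask):
    """Return (band_index, value) pairs for retained bands only."""
    return [(i, v) for i, (v, m) in enumerate(zip(spectrum, mask)) if m == 1]


# Hypothetical decisions from four agents, 50 bands each.
agent_actions = [
    [1] * 50,              # first sub-band: keep everything
    [1, 0] * 25,           # second sub-band: keep every other band
    [0] * 50,              # third sub-band: drop everything
    [1] * 10 + [0] * 40,   # fourth sub-band: keep the first ten bands
]
mask = concatenate_actions(agent_actions)
spectrum = [0.01 * i for i in range(200)]  # placeholder reflectance values
payload = bands_to_transmit(spectrum, mask)
print(len(mask), sum(mask) / len(mask), len(payload))
```

Sending index and value pairs is one possible encoding; since the mask itself is known to both sides, the values alone would also suffice.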

[0041] Each agent is trained using a deep Q-network or a policy gradient method, with the action space following a Bernoulli distribution of {0,1}. Each agent makes independent decisions based on local observations, avoiding the communication overhead and single-point-of-failure risk associated with centralized decision-making.

[0042] The action space decoupling design allows agents to run in parallel, significantly reducing decision latency. In one specific embodiment, when the network bandwidth decreases from 10 Mbps to 1 Mbps, the decision-making strategies of each agent are coordinated to automatically increase the band removal ratio, reducing the retention rate from 0.8 to 0.4, ensuring that the compressed data can adapt to the limited transmission bandwidth.

[0043] S3, a cost function is constructed based on the Lyapunov multi-objective decision framework, and the optimal computing power offloading decision is generated by minimizing the cost function.

[0044] This step integrates a lightweight accuracy predictor and a preset first-layer weight matrix of the student network. The lightweight accuracy predictor uses a predefined MLP architecture, which can contain two fully connected layers and a ReLU activation function. The input layer has 200 neurons corresponding to the mask vector dimension, the hidden layer has 64 neurons, and the output layer has 1 neuron corresponding to the expected inference accuracy value. The accuracy predictor takes the mask vector as input, performs a forward computation, and outputs the corresponding expected inference accuracy value.

[0045] The cost function is constructed from queue drift, latency, energy consumption, and accuracy loss. Its expression can be written in the form:

$$J(t) = \Delta(t) + V \cdot \mathbb{E}\left[\omega_1 T(a_t) + \omega_2 E(a_t) + \omega_3 \Delta Acc(a_t)\right]$$

$$\Delta(t) = \tfrac{1}{2} Q(t+1)^2 - \tfrac{1}{2} Q(t)^2$$

$$Q(t+1) = \max\{Q(t) - \mu(t),\ 0\} + A(t)$$

where $J(t)$ is the value of the Lyapunov multi-objective decision cost function at time $t$; $\Delta(t)$ is the single-step queue drift; $V$ is the Lyapunov optimization weighting coefficient; $\mathbb{E}[\cdot]$ denotes expectation; $a_t \in \{0, 1\}$ is the offloading decision variable, where 0 represents local inference and 1 represents cloud offloading; $T(a_t)$ is the communication and inference latency under the decision at time $t$; $E(a_t)$ is the energy consumption under the decision at time $t$; $\Delta Acc(a_t)$ is the accuracy loss under the decision at time $t$; $\omega_1$, $\omega_2$, $\omega_3$ are the weighting coefficients for latency, energy consumption, and accuracy loss, respectively; $Q(t)$ is the backlog in the edge gateway's data buffer queue at time $t$; $\mu(t)$ is the data processing rate at time $t$; $A(t)$ is the data arrival rate at time $t$; and $\max\{\cdot\}$ denotes taking the maximum value.

[0046] In this embodiment, the communication and inference latency is the total time from data acquisition to outputting monitoring results under the current offloading decision. When the decision is local inference, the latency is the sum of the spectral data compression time and the network forward inference time; when the decision is cloud inference, the latency is the sum of the data upload and transmission time and the cloud processing and result delivery time.

[0047] The energy consumption is the total energy that an edge device consumes to complete one monitoring cycle under the current offloading decision. When the decision is local inference, the energy consumption includes the power consumed by compression, computation, and storage on the edge device; when the decision is cloud inference, the energy consumption includes the data transmission power consumption plus the edge device standby power consumption.
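
The decision step of S3 can be sketched as a drift-plus-penalty comparison between the two options. The sketch assumes a quadratic Lyapunov function (half the squared backlog), a queue update of the form "serve, clip at zero, then add arrivals", and point estimates in place of the expectation. All numeric values, the option dictionaries, and the function names are hypothetical.

```python
def next_backlog(q, service, arrival):
    """Queue update: serve what is possible, never below zero, then add arrivals."""
    return max(q - service, 0.0) + arrival


def drift(q, q_next):
    """Single-step drift of the assumed quadratic Lyapunov function 0.5 * Q^2."""
    return 0.5 * q_next ** 2 - 0.5 * q ** 2


def cost(q, option, arrival, v=1.0, w=(1.0, 1.0, 1.0)):
    """Drift plus V-weighted penalty for one offloading option."""
    q_next = next_backlog(q, option["service"], arrival)
    penalty = (w[0] * option["latency"]
               + w[1] * option["energy"]
               + w[2] * option["acc_loss"])
    return drift(q, q_next) + v * penalty


def decide(q, local, cloud, arrival, v=1.0, w=(1.0, 1.0, 1.0)):
    """Return (decision, costs): 0 = local inference, 1 = cloud offloading."""
    costs = {0: cost(q, local, arrival, v, w),
             1: cost(q, cloud, arrival, v, w)}
    return min(costs, key=costs.get), costs


# Hypothetical per-slot estimates for a gateway on a weak uplink.
local = {"service": 3.0, "latency": 0.2, "energy": 0.5, "acc_loss": 0.05}
cloud = {"service": 1.0, "latency": 1.5, "energy": 0.3, "acc_loss": 0.0}
decision, costs = decide(q=4.0, local=local, cloud=cloud, arrival=2.0)
print(decision, costs)
```

In this example the slow uplink lets the backlog grow under cloud offloading, so the drift term dominates and local inference is chosen even though it loses a little accuracy. Raising `v` shifts the balance toward the latency, energy, and accuracy penalties.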

[0048] The accuracy loss is calculated by feeding the digital binary feature mask vector into the lightweight accuracy predictor, whose forward inference outputs the expected inference accuracy, and comparing that value with the full-band reference accuracy. The expressions can be written in the form:

$$\Delta Acc = Acc_{full} - \widehat{Acc}$$

$$\widehat{Acc} = \mathrm{Softmax}\left(W_2 \cdot \mathrm{ReLU}(W_1 m + b_1) + b_2\right)$$

where $\Delta Acc$ is the accuracy loss; $\widehat{Acc}$ is the expected inference accuracy; $Acc_{full}$ is the highest benchmark inference accuracy obtained with the full band; $\mathrm{Softmax}$ is the Softmax activation function; $\mathrm{ReLU}$ is the ReLU activation function; $W_1$, $W_2$ are the first-layer and second-layer weight matrices of the lightweight accuracy predictor; $m$ is the digital binary feature mask vector; and $b_1$, $b_2$ are the first-layer and second-layer bias vectors of the lightweight accuracy predictor.
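
A forward pass of the 200-64-1 accuracy predictor from [0044] and [0048] might look like the sketch below. The weights are random and untrained, so the printed value is meaningless as an accuracy estimate. One substitution is made and should be read as an assumption: the output uses a logistic sigmoid, because a softmax over a single output neuron is identically 1 and could not express a varying accuracy. The function names and the reference accuracy 0.95 are hypothetical.

```python
import math
import random


def affine(W, x, b):
    """Compute W x + b for a list-of-rows matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_accuracy(mask, W1, b1, W2, b2):
    """Two-layer MLP: 200 -> 64 (ReLU) -> 1 (sigmoid, an assumed output squashing)."""
    hidden = relu(affine(W1, mask, b1))
    return sigmoid(affine(W2, hidden, b2)[0])


def accuracy_loss(mask, acc_full, params):
    """Full-band reference accuracy minus the predicted accuracy for this mask."""
    return acc_full - predict_accuracy(mask, *params)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
W1 = [[rng.uniform(-0.1, 0.1) for _ in range(200)] for _ in range(64)]
b1 = [0.0] * 64
W2 = [[rng.uniform(-0.1, 0.1) for _ in range(64)]]
b2 = [0.0]
params = (W1, b1, W2, b2)

mask = [1] * 80 + [0] * 120  # hypothetical mask keeping 40% of the bands
acc_hat = predict_accuracy(mask, *params)
print(acc_hat, accuracy_loss(mask, 0.95, params))
```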

[0049] S4, when the optimal computing power offloading decision is local inference, the digital binary feature mask is transformed into a diagonal indicator matrix, and the first-layer computation graph of the network is reconstructed by projection. The non-zero indices of the digital binary feature mask are extracted and used to column-prune and compress the preset first-layer weights of the student network and the raw spectral data, yielding the first-layer compressed inference output features, from which the student network outputs the water quality inversion parameters. The student network is a lightweight inference model deployed on the edge gateway. During training, it inherits the physical prior feature weights of the cloud teacher network through knowledge distillation. The cloud teacher network is a deep neural network deployed in the cloud. During its training, the physical conservation residual term of Gordon's water body radiative transfer equation is introduced so that prior knowledge of the optical physical laws of water bodies is embedded in the hidden layer features of the network.
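
The equivalence that S4 relies on can be checked on a toy example: multiplying the first-layer weights by the masked input (the diagonal indicator matrix applied to the spectrum) gives the same result as dropping the masked columns of the weights and the matching input entries, while the multiply-accumulate count falls in proportion to the retention rate. The matrix, spectrum, and helper names below are illustrative assumptions, and bias terms are omitted.

```python
def matvec(W, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def project(mask, x):
    """Apply the diagonal indicator matrix diag(mask) to x."""
    return [m * v for m, v in zip(mask, x)]


def prune(mask, W, x):
    """Keep only the columns of W and entries of x where the mask is 1."""
    idx = [i for i, m in enumerate(mask) if m == 1]
    W_c = [[row[i] for i in idx] for row in W]
    x_c = [x[i] for i in idx]
    return W_c, x_c


# Toy first layer: 2 output features, 4 input bands.
W = [[1.0, 2.0, 3.0, 4.0],
     [0.5, -1.0, 2.0, 0.0]]
x = [0.2, 0.4, 0.6, 0.8]
mask = [1, 0, 1, 0]

full = matvec(W, project(mask, x))      # zero-padded route: 2 * 4 multiplies
W_c, x_c = prune(mask, W, x)
compressed = matvec(W_c, x_c)           # pruned route: 2 * 2 multiplies

macs_full = len(W) * len(W[0])
macs_compressed = len(W_c) * len(W_c[0])
print(full, compressed, macs_compressed / macs_full)
```

The ratio printed at the end equals the band retention rate of the mask, which is the saving the text attributes to column pruning for the first layer.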

[0050] In this embodiment, the student network is a lightweight inference model deployed on an edge gateway, used to output water quality inversion parameters based on the first-layer compressed inference output features. The student network is trained using a knowledge distillation mechanism, inheriting prior physical knowledge from the cloud-based teacher network.

[0051] The training of the cloud-based teacher network not only relies on water quality label data, such as the prediction errors of chlorophyll-a concentration, turbidity, and dissolved oxygen content as supervision signals, but also forcibly introduces the radiative transfer residual term based on the Gordon water optical model to constrain the model output to conform to the optical physics laws of water.

[0052] The loss function of the cloud-based teacher network combines a data term and a physical term. The data term is the mean square error loss between the true label vector of the water quality parameters and the water quality parameter prediction vector output by the teacher network. The physical term is the L2 norm of the residual between the water body remote sensing reflectance and the reflectance that Gordon's optical model of water gives from the absorption coefficient and the backscattering coefficient of the water body at each wavelength, using the medium constant of the model. The physical term is multiplied by a weighting coefficient, which balances the data loss against the physical constraint.
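By way of illustration only, the following Python sketch shows one way a loss of this kind could be assembled. It assumes the commonly used Gordon-type relation in which remote sensing reflectance is a constant times b_b / (a + b_b), a squared L2 norm for the residual, and hypothetical numeric values throughout (including the constant `g` and the weight `lam`); none of these specifics are confirmed by the text.

```python
import numpy as np

def gordon_reflectance(a, bb, g):
    """Assumed Gordon-type relation: R_rs = g * bb / (a + bb), per wavelength."""
    return g * bb / (a + bb)

def teacher_loss(y_true, y_pred, r_rs, a, bb, g, lam):
    """MSE data term plus a weighted physical residual term."""
    data_term = np.mean((y_true - y_pred) ** 2)
    residual = r_rs - gordon_reflectance(a, bb, g)
    physics_term = np.sum(residual ** 2)   # squared L2 norm (assumption)
    return float(data_term + lam * physics_term)

# Hypothetical values for two wavelengths.
y = np.array([15.3, 8.2, 7.8])             # labels; prediction set equal here
a = np.array([0.2, 0.4])                   # absorption coefficients
bb = np.array([0.05, 0.05])                # backscattering coefficients
g, lam = 0.1, 10.0                         # illustrative constants

r_ok = gordon_reflectance(a, bb, g)        # reflectance that obeys the model
loss_ok = teacher_loss(y, y, r_ok, a, bb, g, lam)
loss_bad = teacher_loss(y, y, r_ok + 0.01, a, bb, g, lam)
```

With perfect predictions, the loss is zero only when the reflectance agrees with the assumed model, so any departure from it is penalized even when the labels are matched.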

[0053] The cloud-based teacher network trained using the aforementioned loss function contains prior knowledge of the physical laws described by Gordon's water radiative transfer equation in its hidden layer features.

[0054] The a priori physical laws described by Gordon's water radiative transfer equation refer to the inherent optical quantitative constraints between the remote sensing reflectance of natural water bodies and the absorption coefficient and backscattering coefficient of water bodies. Specifically, it manifests as the monotonic physical change law that the remote sensing reflectance of water bodies decreases as the absorption coefficient increases and increases as the backscattering coefficient increases, as well as the inherent spectral mechanism that the absorption and scattering parameters of water bodies change continuously and smoothly with wavelength under different spectral bands. These constitute the physical conservation a priori constraints that must be followed in hyperspectral observation of water bodies.

[0055] The edge student network does not need to solve partial derivatives; it inherits the teacher's implicit physical characteristics solely by minimizing the total knowledge distillation loss. The total knowledge distillation loss of the edge student network combines two terms. The first is the cross-entropy loss between the predicted water quality parameters output by the student network and the true labels of the water quality parameters. The second is a distillation term, scaled by the weighting factor for the distillation loss. The distillation term is the KL divergence, which measures the difference between the output distributions of the teacher and the student. It is computed between the Softmax of the unnormalized logits vector output by the cloud-based teacher network and the Softmax of the unnormalized logits vector output by the edge student network, with both logits vectors divided by the distillation temperature, which softens the output distribution of the teacher network.
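The following Python sketch illustrates a distillation loss of the kind described in [0055]. It is a sketch under stated assumptions: the two terms are simply added with a single weight `alpha` on the KL term, no temperature-squared scaling is applied, and the logits, label, `alpha`, and temperature `T` are hypothetical values. The exact form used by the invention is not recoverable from the text.

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T   # temperature softening
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def kd_loss(z_student, z_teacher, y_onehot, alpha, T):
    """Cross-entropy on labels plus alpha * KL(teacher || student) at temperature T."""
    p_s = softmax(z_student)
    ce = -np.sum(y_onehot * np.log(p_s))
    p_t_soft = softmax(z_teacher, T)
    p_s_soft = softmax(z_student, T)
    kl = np.sum(p_t_soft * (np.log(p_t_soft) - np.log(p_s_soft)))
    return float(ce + alpha * kl), float(kl)

# Hypothetical logits and label.
z_t = [2.0, 0.5, -1.0]
z_s = [1.0, 1.0, 0.0]
y = np.array([1.0, 0.0, 0.0])

total, kl = kd_loss(z_s, z_t, y, alpha=0.5, T=2.0)
_, kl_same = kd_loss(z_t, z_t, y, alpha=0.5, T=2.0)
```

The KL term vanishes when the student reproduces the teacher's softened distribution, which is the sense in which the student inherits the teacher's behavior without access to the physical residual term.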

[0056] By minimizing the KL divergence between the output distributions of the teacher and student networks, the student network inherits the physical prior feature weights from the teacher network. Through the aforementioned cloud-edge knowledge distillation process, the edge student network carries prior feature weights distilled from the Gordon water radiation residual loss function introduced from the cloud, maintaining robust water quality prediction performance even under complex lighting conditions.

[0057] This step constructs a cost function based on the Lyapunov multi-objective decision framework. By minimizing the cost function, the system generates the optimal computing power offloading decision. When edge gateway computing resources are sufficient and network transmission latency is high, the system tends to choose local inference; when the buffer queue is severely backed up and cloud computing resources are idle, the system tends to choose cloud offloading. In one specific embodiment, the edge gateway monitors local CPU utilization, memory usage, and GPU utilization. When CPU utilization is below 50% and the buffer queue backlog is below a preset threshold, the system sets the offloading decision to 0 and executes local inference; when CPU utilization exceeds 80% or the buffer queue backlog grows continuously, the system sets the offloading decision to 1 and the inference task is offloaded to the cloud for execution. Through the Lyapunov multi-objective decision framework described above, the system can balance accuracy and efficiency under conditions of limited computing resources.
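The threshold rule of the specific embodiment in [0057] can be sketched in Python as follows. This illustrates only the 50% / 80% CPU thresholds and the backlog conditions, not the Lyapunov cost function itself. The three-sample window used to detect sustained backlog growth, the priority given to the cloud condition, and the fallback to the previous decision when neither condition holds are assumptions, since the text leaves those cases open.

```python
LOCAL, CLOUD = 0, 1

def backlog_growing(history, window=3):
    """True if the last `window` backlog samples are strictly increasing (assumed test)."""
    recent = history[-window:]
    return len(recent) == window and all(a < b for a, b in zip(recent, recent[1:]))

def offload_decision(cpu_util, backlog_history, backlog_threshold, previous=LOCAL):
    # Cloud offloading: CPU above 80% or backlog growing continuously.
    if cpu_util > 0.80 or backlog_growing(backlog_history):
        return CLOUD
    # Local inference: CPU below 50% and backlog below the preset threshold.
    if cpu_util < 0.50 and backlog_history[-1] < backlog_threshold:
        return LOCAL
    # Neither condition met: keep the previous decision (assumption).
    return previous

d_idle = offload_decision(0.30, [5, 4, 4], backlog_threshold=10)
d_busy = offload_decision(0.90, [1, 1, 1], backlog_threshold=10)
d_growing = offload_decision(0.30, [1, 2, 3], backlog_threshold=10)
d_between = offload_decision(0.60, [4, 4, 4], backlog_threshold=10, previous=CLOUD)
```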

[0058] Specifically, when the decision is local inference, column pruning and compressed inference are performed. The local inference process is as follows. First, the digital binary feature mask is converted into a diagonal indicator matrix $D$: $$D = \mathrm{diag}(m)$$ where $m$ is the digital binary feature mask vector and $\mathrm{diag}(\cdot)$ is the diagonal matrix operator.

[0059] Next, the index set of the non-zero columns of the diagonal indicator matrix $D$ is extracted as the retained band index set $\mathcal{S}$: $$\mathcal{S} = \{\, i \mid m_i = 1 \,\}$$ where $m_i$ is the digital binary feature mask decision value for the $i$-th band.

[0060] Then, based on the band index set $\mathcal{S}$, column pruning is performed on the preset first-layer weight matrix of the student network and on the input raw spectral data to obtain the sparse compressed weights $\tilde{W}$ and the compressed spectrum $\tilde{x}$.

[0061] Finally, the sparse compressed weights $\tilde{W}$ are multiplied by the compressed spectrum $\tilde{x}$ to obtain the first-layer compressed inference output features: $$h = \tilde{W}\tilde{x}$$ where $h$ is the first-layer compressed inference output feature.
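The steps of [0058] to [0061] can be checked numerically. The Python sketch below builds the diagonal indicator matrix from a mask, extracts the retained band indices, prunes the columns of a first-layer weight matrix and the matching entries of a spectrum, and confirms that the pruned product equals the masked full-size product. The matrix sizes, the random values, and the omission of a bias term are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_out, n_bands = 4, 10
W = rng.normal(size=(n_out, n_bands))     # hypothetical first-layer weights
x = rng.normal(size=n_bands)              # hypothetical raw spectrum
m = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])   # binary feature mask

D = np.diag(m)                            # diagonal indicator matrix
keep = np.flatnonzero(m)                  # retained band index set
W_c = W[:, keep]                          # column-pruned weights
x_c = x[keep]                             # compressed spectrum

h_masked = W @ (D @ x)                    # zero-padding approach, full size
h_pruned = W_c @ x_c                      # compressed inference
```

Both products give the same output features, but the pruned product multiplies a 4×5 matrix instead of a 4×10 one, so the zeroed bands cost nothing.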

[0062] This method does not employ the inefficient approach of padding zeros with a mask in traditional schemes, which results in wasted computing power. Instead, it reconstructs the first-layer computation graph of the network through diagonal matrix projection, achieving hardware-level column pruning and compression.

[0063] Through the aforementioned column pruning mechanism, the floating-point computation cost of edge inference is reduced in proportion to the band retention rate. When the band retention rate is 0.5, the FLOPs of edge inference are reduced by 50%; when the band retention rate is 0.3, the FLOPs of edge inference are reduced by 70%. This technical solution substantially eliminates the hardware idling that traditional masking schemes suffer when the input dimension changes, and it does so without changing the dimensions of subsequent layers in the network. Furthermore, a dynamic sparse computation kernel can be integrated: when the mask vector of the current batch of data is detected to be the same as that of the previous batch, the re-indexing operation of the weight matrix is skipped and the previous sparse computation graph is directly reused, further reducing inference latency.
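The mask-reuse behavior described at the end of [0063] can be sketched as a small cache around the first layer. The class name `PrunedFirstLayer`, its attributes, and the counter used to observe re-indexing are hypothetical; a real dynamic sparse kernel on an accelerator would differ, and this Python sketch only illustrates the control flow.

```python
import numpy as np

class PrunedFirstLayer:
    """First layer that re-indexes its weights only when the mask changes."""

    def __init__(self, W):
        self.W = W
        self._mask = None        # mask used for the cached pruned weights
        self._keep = None        # cached retained band indices
        self._W_c = None         # cached column-pruned weights
        self.reindex_count = 0   # how many times re-indexing actually ran

    def forward(self, x, mask):
        mask = np.asarray(mask)
        if self._mask is None or not np.array_equal(mask, self._mask):
            self._keep = np.flatnonzero(mask)
            self._W_c = self.W[:, self._keep]
            self._mask = mask.copy()
            self.reindex_count += 1
        return self._W_c @ x[self._keep]

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 6))
x = rng.normal(size=6)
mask_a = np.array([1, 1, 0, 0, 1, 0])
mask_b = np.array([1, 0, 0, 1, 1, 1])

layer = PrunedFirstLayer(W)
y1 = layer.forward(x, mask_a)
y2 = layer.forward(x, mask_a)            # same mask: cached weights reused
count_after_repeat = layer.reindex_count
y3 = layer.forward(x, mask_b)            # new mask: re-indexing runs again
```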

[0064] When the optimal computing power offloading decision is determined to be cloud offloading, the edge gateway performs band clipping on the original spectral data according to the digital binary feature mask, uploads the clipped spectral data and the digital binary feature mask to the cloud, and the cloud teacher network performs high-precision inference. The monitoring results and the updated physical prior weights are then sent to the edge gateway to update the student network.

[0065] Specifically, in Lyapunov multi-objective decision-making, when minimizing the cost function yields an offloading decision equal to 1, the current optimal decision is cloud offloading: local computing power is insufficient and energy consumption is too high, so high-precision full-band inference is required. The cloud offloading process can be described as follows: (1) At the edge, the digital binary feature mask vector is packaged with the original spectral data, which is lightly pruned according to the bands retained by the binary mask so that only the effective band data is uploaded rather than the full band. This reduces the amount of data uploaded, lowers bandwidth usage, and speeds up the upload. At the same time, the node ID, timestamp, and current data distribution statistics (mean and covariance, used for drift detection) are reported.

[0066] (2) The cloud server receives the cropped spectral and mask information uploaded from the edge and sends the data to the teacher network for high-precision inference. If data drift is detected, the cloud performs feature calibration synchronously to achieve the highest precision inference by utilizing the cloud's powerful computing capabilities.

[0067] (3) The cloud-based teacher network loads the physical prior of Gordon's water radiative transfer equation, inputs the clipped spectrum, and outputs: water quality monitoring results (COD, ammonia nitrogen, turbidity, chlorophyll a, etc.), inference confidence, and model activation vector distribution, so as to give full play to the advantages of the cloud-based large model and output more accurate results than the edge student network.

[0068] The model activation vector refers to the feature vector output by the activation function of a certain intermediate network layer (usually a hidden layer) during the inference process.

[0069] (4) The cloud sends the monitoring results and new physical prior weights to the edge. The edge student network updates the local physical prior subnet using knowledge distillation. If data drift occurs, the cloud synchronously sends the affine transformation plugin parameters to complete the knowledge iteration from cloud to edge, making the edge model continuously stronger.

[0070] (5) Save the latest monitoring results at the edge, update the physical prior weights and drift calibration plugin, cache the current model state, and prepare for the next local inference to ensure that the next local inference still has high accuracy.

[0071] It also includes: S5, which extracts the mean and covariance matrix of the activation vectors inside the student network and reports them to the cloud to calculate the second-order Wasserstein drift score to detect whether the data distribution has drifted. When the drift exceeds the set threshold, the affine transformation plugin parameters are solved using the optimal transport theory to perform unsupervised feature drift calibration.

[0072] After first-level column pruning and compression inference, the student network outputs the first-level compressed inference output features. The features output from the first layer of compressed inference are transmitted to subsequent layers of the student network to perform a complete forward inference process, ultimately outputting water quality inversion parameters, including but not limited to key water quality indicators such as chlorophyll a concentration, turbidity, dissolved oxygen content, total nitrogen concentration, and total phosphorus concentration.

[0073] During continuous interaction between the cloud and the edge, the system implements unsupervised feature drift detection and calibration. The edge gateway periodically extracts the mean and covariance matrix of the activation vectors within the student network and reports them to the cloud. The cloud calculates the second-order Wasserstein drift score to detect whether data distribution drift has occurred.

[0074] The formula for calculating the second-order Wasserstein drift score is as follows: $$W_2^2 = \lVert \mu_c - \mu_b \rVert_2^2 + \mathrm{Tr}\left(\Sigma_c + \Sigma_b - 2\left(\Sigma_b^{1/2}\,\Sigma_c\,\Sigma_b^{1/2}\right)^{1/2}\right)$$ where $W_2^2$ is the second-order Wasserstein drift score, which measures the degree of difference between the current data distribution and the baseline distribution (a higher score indicates a more severe data distribution drift); $\mu_c$ is the mean vector of the model activation vectors under the current drift data distribution; $\mu_b$ is the mean vector of the model activation vectors under the baseline data distribution; $\mathrm{Tr}(\cdot)$ is the trace operation, that is, the sum of the elements on the main diagonal of a matrix; $\Sigma_c$ is the covariance matrix of the model activation vectors under the current drift data distribution; $\Sigma_b$ is the covariance matrix of the model activation vectors under the baseline data distribution; and $\lVert \cdot \rVert_2$ is the L2 norm.
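As a hedged illustration, the Python sketch below evaluates the standard closed form of the squared second-order Wasserstein distance between two Gaussian distributions from their means and covariance matrices, which matches the quantities listed in [0074]. Whether the invention's score is this squared form or its square root is an assumption, as are the function names and the example statistics. The matrix square root is taken by eigendecomposition, which is valid for symmetric positive semi-definite matrices.

```python
import numpy as np

def sqrtm_psd(S):
    """Square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    vals = np.clip(vals, 0.0, None)        # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def w2_squared(mu_c, cov_c, mu_b, cov_b):
    """Squared 2-Wasserstein distance between N(mu_c, cov_c) and N(mu_b, cov_b)."""
    root_b = sqrtm_psd(cov_b)
    cross = sqrtm_psd(root_b @ cov_c @ root_b)
    mean_term = np.sum((mu_c - mu_b) ** 2)
    cov_term = np.trace(cov_c + cov_b - 2.0 * cross)
    return float(mean_term + cov_term)

I2 = np.eye(2)
zero = np.zeros(2)
score_same = w2_squared(zero, I2, zero, I2)                    # no drift
score_shift = w2_squared(np.array([3.0, 4.0]), I2, zero, I2)   # mean drift only
score_scale = w2_squared(zero, 4.0 * I2, zero, I2)             # covariance drift only
```

Because only a mean vector and a covariance matrix are needed, the edge gateway can report these statistics instead of raw activations.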

[0075] In this embodiment, the model activation vector of the student network is a feature vector output by the intermediate hidden layer, which is used to characterize the feature representation of spectral data in the model. By statistically analyzing the activation vectors of batch samples, the mean vector and covariance matrix of the corresponding feature distribution can be obtained, which can be used for data distribution drift detection and unsupervised feature calibration in the cloud.

[0076] When a drift exceeding a threshold is detected, the cloud uses optimal transport theory to solve for the matrix parameters of the affine transformation plugin. The mathematical calculations of the affine transformation plugin rely on the covariance trace operation of the features in the source and target domains. The cloud encapsulates the affine transformation plugin and distributes it to the edge gateway. The edge gateway inserts a calibration plugin layer into the student network, performs the affine transformation, and achieves unsupervised feature drift calibration.

[0077] Solving with optimal transport theory yields two affine transformation plug-in parameters: a scaling matrix, which maps the features of the drift distribution back to the reference distribution space, and a bias vector, which is used together with the scaling matrix to align the feature distributions.
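One well-known way to obtain such a scaling matrix and bias vector is the closed-form optimal transport map between two Gaussian distributions. The Python sketch below uses that map as an assumption, since the exact expressions of the invention are not recoverable from the text. The function names and the example statistics are hypothetical.

```python
import numpy as np

def _eig_pow(S, power):
    """Matrix power of a symmetric positive definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    vals = np.clip(vals, 1e-12, None)      # keep eigenvalues strictly positive
    return (vecs * vals ** power) @ vecs.T

def gaussian_ot_affine(mu_c, cov_c, mu_b, cov_b):
    """Affine map z -> A @ z + b sending N(mu_c, cov_c) to N(mu_b, cov_b)."""
    root_c = _eig_pow(cov_c, 0.5)
    inv_root_c = _eig_pow(cov_c, -0.5)
    middle = _eig_pow(root_c @ cov_b @ root_c, 0.5)
    A = inv_root_c @ middle @ inv_root_c   # scaling matrix
    b = mu_b - A @ mu_c                    # bias vector
    return A, b

# Hypothetical drifted (current) and reference (baseline) statistics.
mu_c = np.array([1.0, -2.0])
cov_c = np.array([[2.0, 0.5], [0.5, 1.0]])
mu_b = np.array([0.0, 0.5])
cov_b = np.array([[1.0, 0.2], [0.2, 0.5]])

A, b = gaussian_ot_affine(mu_c, cov_c, mu_b, cov_b)
mapped_mean = A @ mu_c + b
mapped_cov = A @ cov_c @ A.T
```

Applying the map to drifted activations gives features whose mean and covariance match the reference statistics, which is the alignment the calibration plug-in layer is described as performing.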

[0078] In one specific embodiment, assume the edge gateway is deployed at a lake monitoring station, where the initial water sample data exhibits the low-temperature, low-turbidity characteristics of winter. After the cloud distributes the initial weights to the student network, the edge gateway begins performing water quality monitoring tasks. After several months of operation, the summer high-temperature and high-algae period arrives, and the spectral characteristics of the water body change significantly. At this time, the cloud detects that the second-order Wasserstein drift score exceeds the threshold of 0.15, triggering the calibration plugin distribution mechanism. Based on the recent source domain feature distribution and the historical target domain feature distribution, the cloud calculates the optimal transport affine transformation parameters and pushes the calibration plugin to the edge gateway. After the edge gateway inserts the calibration plugin into the student network, it performs an affine transformation on the activation vector during subsequent inference, so that the output distribution is realigned to the initial calibration space, ensuring the accuracy and consistency of the water quality inversion results.

[0079] S6, the student network model is decoupled into a shared physical prior subnet for storing physical law knowledge inherited from cloud distillation, and a personalized prediction subnet for adapting to specific characteristics of the local monitoring environment. When the edge gateway is interrupted in communication with the cloud, a topology graph is constructed through LoRa Mesh networking, and decentralized federated gradient aggregation updates are performed based on the gradient weights of neighboring nodes dynamically allocated according to RSSI signals. An EWC regularization term is introduced during local gradient updates to prevent physical prior knowledge from being forgotten.

[0080] When the communication link between the edge gateway and the cloud is interrupted, adjacent edge gateways automatically form a LoRa Mesh topology network for decentralized federated learning. The system decouples the student model into two sub-networks: a shared physical prior sub-network and a personalized prediction sub-network. The shared physical prior sub-network is responsible for storing the physical law knowledge inherited from cloud distillation, and its parameter updates are strictly constrained; the personalized prediction sub-network is responsible for adapting to the specific characteristics of the local monitoring environment, and its parameters can be updated relatively flexibly.

[0081] The decentralized federated gradient aggregation update is defined by two expressions: the parameter update of the shared physical prior subnet and the weight transfer matrix. In the parameter update, the edge gateway node $i$ currently performing the gradient update aggregates the parameter vectors of the shared physical prior subnets of its neighboring nodes, weighted by the weight transfer matrix, and applies a gradient step using the learning rate and the loss gradient that node $i$ calculates on its local data with respect to the parameters of the shared physical prior subnet. The weight transfer matrix entry from node $i$ to a neighboring node $j$ is obtained from the signal strength that node $i$ receives from node $j$, normalized over the signal strengths that node $i$ receives from each node $k$ in the set of all its neighboring nodes. A higher signal strength indicates better communication quality and a lower packet loss rate.

[0082] When the EWC regularization term is introduced during local gradient updates, the total loss function for local training at the node is the basic loss function of the local water quality prediction task plus the EWC regularization term, scaled by a weighting coefficient that controls the strength of protection for the physical prior knowledge. The regularization term sums, over all parameters in the current parameter vector of the student network model, the squared difference between the $i$-th current parameter and the $i$-th parameter of the initial physical prior weights distributed from the cloud to the edge gateway, each weighted by the diagonal element of the Fisher information matrix corresponding to that parameter.
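The EWC term of [0082] can be illustrated with the short Python sketch below. It assumes the plain form "task loss + coefficient × Σ Fisher diagonal × squared deviation" with no additional factor of one half, and all numeric values are hypothetical.

```python
import numpy as np

def ewc_total_loss(task_loss, theta, theta_anchor, fisher_diag, lam):
    """Task loss plus the Fisher-weighted quadratic penalty around the anchor weights."""
    penalty = np.sum(fisher_diag * (theta - theta_anchor) ** 2)
    return float(task_loss + lam * penalty)

def ewc_penalty_grad(theta, theta_anchor, fisher_diag, lam):
    """Gradient of the penalty term with respect to theta."""
    return 2.0 * lam * fisher_diag * (theta - theta_anchor)

theta_anchor = np.array([0.0, 2.0])    # initial physical prior weights from the cloud
theta = np.array([1.0, 2.0])           # current local parameters
fisher = np.array([0.5, 3.0])          # diagonal of the Fisher information matrix

loss_at_anchor = ewc_total_loss(0.3, theta_anchor, theta_anchor, fisher, lam=2.0)
loss_moved = ewc_total_loss(0.3, theta, theta_anchor, fisher, lam=2.0)
grad = ewc_penalty_grad(theta, theta_anchor, fisher, lam=2.0)
```

Parameters with a large Fisher value are pulled back toward the anchor more strongly, which is how the penalty protects the weights that matter most for the inherited physical knowledge.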

[0083] By fixing the initial physical prior weights as the penalty anchor points of the EWC regularization term, the system fundamentally solves the problem of forgetting physical laws caused by local data fitting.

[0084] In one specific embodiment, assume that three edge gateway nodes A, B, and C are deployed in a lake monitoring area. Under normal circumstances, the three nodes maintain a connection with the cloud via a 4G network and periodically receive model updates and calibration plugins from the cloud. When the 4G signal is completely interrupted, for example in a mountainous canyon region, the three nodes automatically switch to LoRa Mesh communication mode. Nodes A, B, and C construct a weight transfer matrix based on their respective RSSI signal strengths. Assuming that node A receives an RSSI of -85 dBm from node B and an RSSI of -92 dBm from node C, the aggregation weights that node A assigns to nodes B and C are 0.58 and 0.42, respectively. Each node updates its local model based on the weighted aggregated gradient, while adding an EWC regularization term to the loss function so that the parameters of the shared physical prior subnet do not deviate from the initial cloud weights beyond a preset range. Through this mechanism, the three nodes can still collaboratively train the student network while disconnected from the network, ensuring that the model's physical prior knowledge is not forgotten, and they push their local updates to the cloud for global aggregation after the signal is restored.
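The text does not preserve the formula that turns RSSI values into aggregation weights. As an illustration only, the Python sketch below shifts each RSSI by an assumed floor of -110 dBm and normalizes. This particular choice happens to reproduce the 0.58 and 0.42 weights of the example in [0084], but it should be read as a hypothesis rather than as the invention's formula. The update rule (weighted average of neighbor parameters followed by a local gradient step) and the function names are likewise assumptions.

```python
def rssi_weights(rssi_dbm, floor_dbm=-110.0):
    """Normalize (RSSI - floor) over all neighbors; floor_dbm is an assumed constant."""
    shifted = {node: rssi - floor_dbm for node, rssi in rssi_dbm.items()}
    total = sum(shifted.values())
    return {node: value / total for node, value in shifted.items()}

def aggregate_step(neighbor_params, weights, local_grad, lr):
    """Weighted average of neighbor parameter vectors, then a local gradient step."""
    dim = len(local_grad)
    mixed = [sum(weights[n] * neighbor_params[n][k] for n in weights)
             for k in range(dim)]
    return [mixed[k] - lr * local_grad[k] for k in range(dim)]

# Node A's view of its neighbors B and C, using the RSSI values of the example.
w = rssi_weights({"B": -85.0, "C": -92.0})
new_theta = aggregate_step({"B": [1.0, 0.0], "C": [0.0, 1.0]}, w,
                           local_grad=[0.0, 0.0], lr=0.1)
```

A stronger link gives a neighbor more influence on the aggregated parameters, which matches the stated rationale that higher signal strength means a lower packet loss rate.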

[0085] In practical applications, this invention's system is deployed on an unmanned surface vessel (USV) platform for water quality monitoring. The USV, equipped with a hyperspectral sensor, an edge gateway, and a LoRa communication module, autonomously navigates along a pre-set path and collects water spectral data in real time. Assuming the USV collects a set of raw spectral data vectors during a sunny day, the edge gateway receives this data and performs quality control. When the current network bandwidth is determined to be 8 Mbps and the buffer queue backlog is normal, the system generates a digital binary feature mask with a band retention rate of 0.7. It retains the blue light band, green light band and part of the red light band.

[0086] For the mask vector output by the multi-agent system, the accuracy predictor estimates an accuracy loss of approximately 2.3%, which meets the monitoring requirements. A decision is then made based on the Lyapunov cost function, and local inference is chosen. Based on the mask, column pruning is performed on the first-layer weights of the student network, compressing the weight matrix from 512×200 to 512×140, and the input spectral vector is compressed accordingly. The compressed inference process consumes only 70% of the original computational cost, achieving a real-time inference frame rate of 30 fps on the embedded GPU of the edge gateway. The student network outputs water quality parameters such as a chlorophyll a concentration of 15.3 μg/L, a turbidity of 8.2 NTU, and a dissolved oxygen content of 7.8 mg/L. Finally, the monitoring results are transmitted to the shore base station via the LoRa link, and the complete data is recorded in the local storage device of the unmanned vessel for subsequent analysis.
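The arithmetic behind the 512×200 to 512×140 compression can be checked directly. The Python sketch below counts one multiply-accumulate per weight of the first layer, which is an assumption about how the cost is measured; the function name is hypothetical.

```python
def first_layer_macs(n_out, n_in):
    """Multiply-accumulate count of a dense layer with an n_out x n_in weight matrix."""
    return n_out * n_in

retention_rate = 0.7
kept_bands = round(retention_rate * 200)     # bands retained out of 200
full = first_layer_macs(512, 200)            # before pruning
pruned = first_layer_macs(512, kept_bands)   # after column pruning
ratio = pruned / full
```

The ratio equals the band retention rate for the first layer. Since subsequent layers keep their dimensions, the saving for the whole network depends on how much of the total cost the first layer accounts for.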

[0087] When the unmanned surface vessel (USV) enters areas under bridges or obscured by trees, network bandwidth drops sharply to below 0.5 Mbps. The automatic adjustment strategy compresses the band retention rate to 0.3, retaining only the characteristic bands most correlated with water quality parameters. At this point, the accuracy predictor estimates an accuracy loss of approximately 5%, but this is still sufficient for emergency monitoring. Decisions are then made based on buffer queue backlog. The system transmits compressed data of key bands to the cloud for high-precision inference, and the cloud inference results are sent back to the unmanned surface vessel (USV) as a calibration reference for local predictions. When cloud-edge communication is completely interrupted, adjacent USVs form a temporary federated learning consortium through a LoRa Mesh network. Each USV shares the gradient of the physical prior subnet, and the model is collaboratively updated based on an RSSI-weighted decentralized aggregation mechanism. This ensures that the student network can continue to evolve and retain physical mechanism knowledge even under poor communication conditions.

[0088] In another application scenario, the system of this invention is deployed on a surface buoy platform to achieve long-term continuous monitoring of water bodies. Multiple buoy nodes are distributed across monitoring sections of reservoirs, lakes, or rivers. Each buoy is equipped with a miniature hyperspectral sensor, a solar power supply system, and a LoRa communication module. The buoy nodes collect spectral data once per hour during the day and switch to a low-power mode at night to collect data once every four hours.

[0089] During daytime monitoring, the spectral data collected by the buoy nodes is processed by a multi-agent system to generate a digital binary feature mask adapted to daytime lighting conditions. Due to ample daytime light and relatively stable water optical properties, the multi-agent model learns a strategy to retain more visible light bands to achieve higher prediction accuracy. During nighttime monitoring, the water spectral characteristics exhibit significant differences due to varying lighting conditions. The multi-agent model automatically adjusts its masking strategy, increasing the retention ratio of the near-infrared band to compensate for the decrease in the signal-to-noise ratio of the visible light band. The edge gateway records the optimal masking strategy under different lighting conditions, constructing a mapping table from lighting conditions to masking strategies to achieve adaptive responses to environmental changes.

[0090] Each buoy node periodically reports the statistical characteristics of its model activation vectors to the cloud, where unsupervised drift detection is performed. When multiple nodes simultaneously exhibit feature drift, it indicates a systemic change in water quality conditions within the monitored area, such as algal blooms or upstream pollutant input. The cloud triggers an emergency calibration process, calculating a global calibration plugin based on optimal transport theory and distributing it to each buoy node. After inserting the calibration plugin, each buoy node can accurately track the changed water quality characteristic distribution and continue to output reliable monitoring results. If a buoy node loses connection with the cloud for more than 48 hours due to equipment failure or weather conditions, the node automatically initiates local EWC constrained training. It uses previously accumulated local data to fine-tune its personalized prediction subnet under the constraints of physical prior subnet parameters, preventing the local forgetting of physical knowledge. Once the network recovers, the node pushes its local updates to the cloud, where federated averaging aggregation updates the global model and distributes a new round of calibration plugins.

[0091] In summary, compared with the prior art, the present invention has the following beneficial effects: First, it achieves maximum computational power release through hardware-software decoupling. This invention is the first to directly reduce the dimensionality of the first-layer feature map of a deep network to a feature subset, i.e., column pruning, through a decision mask. At the hardware level, it strictly reduces the floating-point operation volume to a proportion proportional to the band retention rate. This solution completely solves the hardware bottleneck caused by the need to calculate a large number of zero values in traditional dimensionality reduction algorithms even after reducing the input dimension, thus releasing the maximum effective computing resources under the condition of limited computing power on edge devices.

[0092] Second, it achieves high-dimensional generalization by incorporating physical equations. This invention completely departs from the purely data-driven approach, forcibly introducing the Gordon radiative transfer partial differential equation as a physical conservation residual term into the loss function of the cloud-based teacher network, and injecting physical priors into the edge student network through knowledge distillation. This scheme greatly broadens the generalization ability and robustness of the water quality prediction model under complex lighting conditions, ensuring stable monitoring performance in uncontrolled field environments.

[0093] Third, robust evolution under harsh network conditions is achieved. This invention adopts a LoRa-based decentralized federated learning mechanism weighted by the RSSI signal matrix, enabling edge gateways to continue collaborative model training in network-disconnected environments. At the same time, it achieves unsupervised feature drift calibration through optimal transport theory, so that the system is no longer unable to iterate for lack of network access and manual labeling in field waters, and can continuously adapt and evolve in real field environments.

[0094] Example 2. As shown in Figure 2, the second embodiment of the present invention also provides an edge cloud hyperspectral monitoring device based on the synergy of binary mask and mechanism, comprising: an acquisition unit, used to acquire raw spectral data covering the entire visible to near-infrared region; a multi-agent construction unit, used to divide the entire band of the original spectral data into several adjacent sub-bands and deploy independent agents one by one, where a three-dimensional reward function based on mutual information regularization guides each agent to output a binary action for its sub-band, and the binary actions are spliced together to form a digital binary feature mask vector; a multi-objective coordination and control unit, used to construct a cost function based on the Lyapunov multi-objective decision framework and generate the optimal computing power offloading decision by minimizing the cost function; and a compression and reconstruction unit, used to transform the digital binary feature mask into a diagonal indicator matrix when the optimal computing power offloading decision is local inference, reconstruct the first-layer computation graph of the network through projection, extract the non-zero indices of the digital binary feature mask, and perform column pruning and dimension compression on the preset first-layer weights of the student network and the original spectral data to obtain the first-layer compressed inference output features, which the student network then uses to output water quality inversion parameters. The student network is a lightweight inference model deployed on an edge gateway, which inherits the physical prior feature weights of the cloud-based teacher network through knowledge distillation during training. The cloud-based teacher network is a deep neural network deployed in the cloud, which incorporates the physical conservation residual term of Gordon's water radiative transfer equation during training to embed prior knowledge of water optical physics laws into the network's hidden layer features.

[0095] Example 3. The third embodiment of the present invention also provides an edge cloud hyperspectral monitoring device based on the synergy of binary mask and mechanism, which includes a memory, a processor, a LoRa communication module, and a neural network accelerator. The processor uses an ARM Cortex-A series embedded chip and is responsible for system resource scheduling and decision logic execution. The memory uses an LPDDR4 chip with a capacity of 4 GB and is used to store student network weights, masking strategies, and intermediate calculation results. The LoRa communication module supports multi-band operation at 433 MHz, 868 MHz, and 915 MHz, with a maximum transmit power of 20 dBm and a communication distance of up to 10 kilometers in open environments. The neural network accelerator uses FPGA or NPU chips, providing 3 TOPS to 10 TOPS of INT8 computing power, which can efficiently perform forward inference of the student network. The memory stores a computer program that can be executed by the processor to implement the edge cloud hyperspectral monitoring method based on binary mask and mechanism synergy, as described above.

[0096] In the specific implementation of the edge gateway device, the digital binary feature mask is stored in high-speed SRAM with an access latency of less than 10 nanoseconds. The precision predictor and small network weights are stored in DRAM, and data interaction with the neural network accelerator is achieved through a DMA controller. When performing column pruning and compression inference, the system first reads the mask vector from SRAM, then generates a sparse weight index table based on the mask vector, and finally starts the neural network accelerator to perform compressed matrix multiplication operations. Sparse matrix operations support dynamic sparse mode, which can dynamically adjust the computation graph according to the mask vector of each inference, avoiding the repeated compilation overhead caused by mask changes in traditional solutions.

[0097] Example 4. The fourth embodiment of the present invention also provides a computer-readable storage medium storing computer-readable instructions. When the computer-readable instructions are executed by the processor of the device where the computer-readable storage medium is located, the edge cloud hyperspectral monitoring method based on binary mask and mechanism synergy, as described above, is implemented.

[0098] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. An edge-cloud hyperspectral monitoring method based on the synergy of binary mask and mechanism, characterized in that it comprises:
S1, acquiring raw spectral data covering the entire spectral range from visible light to near-infrared;
S2, dividing the entire band of the raw spectral data into several adjacent sub-bands and deploying an independent agent for each sub-band; using a three-dimensional reward function based on mutual information regularization to guide each agent to output a binary selection action for its sub-band; and concatenating the binary actions to form a digital binary feature mask vector;
S3, constructing a cost function based on the Lyapunov multi-objective decision framework, and generating the optimal computing power offloading decision by minimizing the cost function;
S4, when the optimal computing power offloading decision is local inference, transforming the digital binary feature mask into a diagonal indicator matrix; reconstructing the first-layer computation graph of the network by projection; and extracting the non-zero indices of the digital binary feature mask and performing column pruning and dimension compression on the preset first-layer weights of the student network and on the raw spectral data to obtain the first-layer compressed inference output features, so that water quality inversion parameters are output through the student network;
wherein the student network is a lightweight inference model deployed on an edge gateway which, during training, inherits the physical prior feature weights of the cloud teacher network through knowledge distillation; and the cloud teacher network is a deep neural network deployed in the cloud, into whose training the physical conservation residual term of the Gordon water body radiative transfer equation is introduced, so that prior knowledge of the optical physical laws of water bodies is embedded in the hidden layer features of the network.

2. The end-edge-cloud hyperspectral monitoring method based on the synergy of binary mask and mechanism as described in claim 1, characterized in that it further comprises: extracting the mean and covariance matrix of the activation vectors within the student network and reporting them to the cloud, where a second-order Wasserstein drift score is calculated to detect whether the data distribution has drifted; and, when the drift score exceeds a set threshold, solving the parameters of an affine transformation plug-in using optimal transport theory to perform unsupervised feature drift calibration.
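The drift check and calibration in this claim can be illustrated with a minimal sketch. It assumes the activation statistics are treated as Gaussian, in which case the squared 2-Wasserstein distance and the optimal-transport map have closed forms in terms of means and covariances. The function names (`sqrtm_psd`, `w2_squared`, `affine_calibration`) and the use of NumPy are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def sqrtm_psd(a):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def w2_squared(mu_cur, cov_cur, mu_ref, cov_ref):
    """Squared 2-Wasserstein distance between N(mu_cur, cov_cur) and N(mu_ref, cov_ref)."""
    root_ref = sqrtm_psd(cov_ref)
    cross = sqrtm_psd(root_ref @ cov_cur @ root_ref)
    mean_term = float(np.sum((mu_cur - mu_ref) ** 2))
    cov_term = float(np.trace(cov_cur + cov_ref - 2.0 * cross))
    return mean_term + cov_term

def affine_calibration(mu_cur, cov_cur, mu_ref, cov_ref):
    """Gaussian optimal-transport map x -> A x + b from the drifted to the reference statistics.

    Assumes cov_cur is invertible (hypothetical precondition for this sketch).
    """
    root_cur = sqrtm_psd(cov_cur)
    inv_root_cur = np.linalg.inv(root_cur)
    a = inv_root_cur @ sqrtm_psd(root_cur @ cov_ref @ root_cur) @ inv_root_cur
    b = mu_ref - a @ mu_cur
    return a, b

# Illustrative statistics only; real values would come from student-network activations.
mu_ref, cov_ref = np.zeros(2), np.eye(2)
mu_cur, cov_cur = np.array([1.0, 2.0]), np.diag([4.0, 9.0])
score = w2_squared(mu_cur, cov_cur, mu_ref, cov_ref)
A, b = affine_calibration(mu_cur, cov_cur, mu_ref, cov_ref)
```

After calibration, the mapped statistics `A @ mu_cur + b` and `A @ cov_cur @ A.T` coincide with the reference mean and covariance, which is the sense in which the plug-in maps drifted features back to the reference distribution.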

3. The edge-cloud hyperspectral monitoring method based on the synergy of binary mask and mechanism as described in claim 1, characterized in that it further comprises: decoupling the student network model into a shared physical prior subnet, which stores the physical law knowledge inherited from the cloud through knowledge distillation, and a personalized prediction subnet, which adapts to the specific characteristics of the local monitoring environment; when communication between the edge gateway and the cloud is interrupted, constructing a topology graph through LoRa Mesh networking and performing decentralized federated gradient aggregation updates, in which the gradient weights of neighboring nodes are allocated dynamically according to their RSSI signals; and introducing an EWC regularization term during local gradient updates to prevent the physical prior knowledge from being forgotten.

4. The method for edge cloud hyperspectral monitoring based on the synergy of binary mask and mechanism as described in claim 1, characterized in that, The expression for the three-dimensional reward function is: ；；；； in, The value of the three-dimensional reward function; , , These are the inference accuracy weighting coefficient, retention rate penalty weighting coefficient, and mutual information regularization term coefficient, respectively. For the accuracy corresponding to the spectral band; for Real-time network bandwidth at any given moment The current moment; The overall retention rate of the spectral band; For mutual information, the Pearson cross-correlation matrix is used as an engineering approximation. This represents the total number of spectral bands. For the first Digital binary feature mask decision values for each band; For the first , spectral vectors of each band , The cross-correlation coefficient; Expressing expectations; , Representing spectral vectors respectively , The mean; , Representing spectral vectors respectively , The standard deviation.
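The three ingredients named in this claim (an accuracy term, a retention-rate penalty, and a mutual-information regularizer approximated by Pearson cross-correlation) can be sketched as follows. The exact expression is not recoverable from the text, so the combination used here, the helper names `pearson` and `reward`, and the default coefficients are all assumptions. In particular, the claim lists real-time network bandwidth among the symbols of the reward function, but how it enters is not recoverable, so the sketch leaves it out.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sample lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y) if dev_x > 0 and dev_y > 0 else 0.0

def reward(mask, bands, accuracy, alpha=1.0, beta=0.5, gamma=0.5):
    """Assumed form: alpha * accuracy - beta * retention - gamma * redundancy.

    mask:  list of 0/1 decisions, one per band
    bands: bands[i] holds the samples of band i
    """
    kept = [i for i, m in enumerate(mask) if m == 1]
    retention = len(kept) / len(mask)
    pairs = list(combinations(kept, 2))
    # Mean absolute correlation between retained bands stands in for mutual information.
    redundancy = (
        sum(abs(pearson(bands[i], bands[j])) for i, j in pairs) / len(pairs)
        if pairs else 0.0
    )
    return alpha * accuracy - beta * retention - gamma * redundancy

# Illustrative data: bands 0 and 1 are perfectly correlated, band 2 is not.
bands = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 1.0, 3.0, 2.0]]
redundant = reward([1, 1, 0], bands, accuracy=0.9)
diverse = reward([1, 0, 1], bands, accuracy=0.9)
```

With equal accuracy and equal retention rate, the mask that keeps the two weakly correlated bands receives the higher reward, which is the behavior the regularizer is meant to encourage.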

5. The method for edge cloud hyperspectral monitoring based on the synergy of binary mask and mechanism as described in claim 1, characterized in that, The cost function is constructed based on queue drift, latency, energy consumption, and accuracy loss, and its expression is: ；；； in, express The value of the Lyapunov multi-objective decision cost function at time t; This represents the single-step queue drift amount; Optimize the weighting coefficients for Lyapunov; Expressing expectations; To unload decision variables, where 0 represents local inference and 1 represents cloud unloading; express Communication and inference latency under real-time decision-making; express Energy consumption under real-time decision-making; express The amount of accuracy loss under real-time decision-making; , , These represent the weighting coefficients for latency, energy consumption, and accuracy loss, respectively. for The backlog in the edge gateway's data buffer queue at any given time; for Data processing rate at any given moment; for Data arrival rate at any given time; This indicates taking the maximum value; The accuracy loss is calculated by using the digital binary feature mask vector as input to the lightweight accuracy predictor for forward inference to output the expected inference accuracy, and then combining it with the full-band reference accuracy. The expression is as follows: ；； in, This refers to the amount of precision loss. To achieve the expected inference accuracy; This represents the highest benchmark inference accuracy across the entire frequency band. This represents the Softmax activation function; This represents the ReLU activation function; , These are the first and second layer weight matrices of the lightweight precision predictor, respectively. This is a digital binary feature mask vector; , These are the first and second layer bias vectors of the lightweight precision predictor, respectively.
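The decision rule in this claim resembles a standard Lyapunov drift-plus-penalty scheme, which the sketch below illustrates. The queue recursion, the quadratic Lyapunov function, the per-decision service rates, and all names and numbers are assumptions chosen for illustration; the accuracy loss is taken as the full-band reference accuracy minus the predicted accuracy, and the lightweight accuracy predictor itself is not modeled.

```python
def queue_update(backlog, service, arrivals):
    """Assumed buffer recursion: Q(t+1) = max(Q(t) - service, 0) + arrivals."""
    return max(backlog - service, 0.0) + arrivals

def drift(backlog, service, arrivals):
    """Single-step drift of the quadratic Lyapunov function L(Q) = Q^2 / 2."""
    nxt = queue_update(backlog, service, arrivals)
    return 0.5 * (nxt ** 2 - backlog ** 2)

def decision_cost(backlog, service, arrivals, latency, energy, acc_loss,
                  v=1.0, w_lat=1.0, w_en=1.0, w_acc=1.0):
    """Drift plus V-weighted penalty of latency, energy, and accuracy loss."""
    penalty = w_lat * latency + w_en * energy + w_acc * acc_loss
    return drift(backlog, service, arrivals) + v * penalty

def choose_offload(backlog, arrivals, options, v=1.0):
    """Return the decision (0 = local inference, 1 = cloud offloading) with minimum cost."""
    costs = {
        decision: decision_cost(backlog, arrivals=arrivals, v=v, **opt)
        for decision, opt in options.items()
    }
    best = min(costs, key=costs.get)
    return best, costs

# Hypothetical per-decision figures; acc_loss = full-band accuracy - predicted accuracy.
options = {
    0: {"service": 2.0, "latency": 0.05, "energy": 0.8, "acc_loss": 0.95 - 0.91},
    1: {"service": 5.0, "latency": 0.90, "energy": 0.3, "acc_loss": 0.0},
}
busy_choice, _ = choose_offload(backlog=10.0, arrivals=3.0, options=options)
idle_choice, _ = choose_offload(backlog=0.0, arrivals=3.0, options=options)
```

In this toy setting a long backlog tips the decision toward cloud offloading because draining the queue dominates the cost, while an empty queue favors local inference because only the latency, energy, and accuracy penalties differ.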

6. The method for end-edge cloud hyperspectral monitoring based on the synergy of binary mask and mechanism as described in claim 5, characterized in that, The specific process of performing local inference is as follows: First, the digital binary feature mask is converted into a diagonal indicator matrix. , is represented as: ； in, This is a digital binary feature mask vector; Represents a diagonal matrix operator; Then extract the diagonal indicator matrix. The set of indexes for non-zero columns is used to obtain the set of band indexes. , is represented as: ； in, For the first Digital binary feature mask decision values for each band; Then based on the band index set Column pruning is performed on the pre-defined first-layer weight matrix of the student network and the input raw spectral data to obtain sparse compressed weights. With compressed spectrum ; Reconstruct sparse compression weights With compressed spectrum The output features of the first-layer compressed inference are obtained, and are represented as follows: ； in, This is the output feature of the first-level compressed inference.
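The equivalence between the diagonal-matrix projection and column pruning described in this claim can be checked numerically. The following is a minimal sketch assuming a bias-free linear first layer; the array shapes, the variable names, and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def compressed_first_layer(weights, spectrum, mask):
    """Column-prune the first-layer weights and the input spectrum with a binary mask.

    weights:  (n_hidden, n_bands) first-layer weight matrix
    spectrum: (n_bands,) raw spectral vector
    mask:     (n_bands,) vector of 0/1 band decisions
    """
    idx = np.flatnonzero(mask)   # band index set: positions where the mask is 1
    w_c = weights[:, idx]        # keep only the columns of retained bands
    x_c = spectrum[idx]          # keep only the retained band values
    return w_c @ x_c, idx

rng = np.random.default_rng(0)
n_bands, n_hidden = 8, 4
weights = rng.normal(size=(n_hidden, n_bands))
spectrum = rng.normal(size=n_bands)
mask = np.array([1, 0, 0, 1, 1, 0, 0, 0])

# Dense reference: project the input through the diagonal indicator matrix diag(mask).
dense = weights @ np.diag(mask) @ spectrum
compressed, idx = compressed_first_layer(weights, spectrum, mask)

# Multiply-accumulate count of the first layer scales with the band retention rate.
flop_ratio = len(idx) / n_bands
```

Both paths give the same first-layer output, but the compressed product touches only the retained columns, so its multiply-accumulate count is the dense count multiplied by the retention rate (3 of 8 bands in this example).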

7. The end-edge-cloud hyperspectral monitoring method based on the synergy of binary mask and mechanism as described in claim 5, characterized in that, when the optimal computing power offloading decision is cloud offloading, the edge gateway crops the bands of the raw spectral data according to the digital binary feature mask and uploads the cropped spectral data together with the digital binary feature mask to the cloud; the cloud teacher network performs high-precision inference; and the monitoring results and the updated physical prior weights are then sent to the edge gateway to update the student network.

8. The method for end-edge cloud hyperspectral monitoring based on the synergy of binary mask and mechanism as described in claim 2, characterized in that, The formula for calculating the second-order Wasserstein drift fraction is as follows: ； in, The second-order Wasserstein drift score measures the degree of difference between the current data distribution and the baseline distribution; a higher score indicates a more severe data distribution drift. This represents the mean vector of the model activation vectors under the current drift data distribution; This represents the mean vector of the model activation vectors under the baseline data distribution. The trace operation represents the sum of the elements on the main diagonal of a matrix. This represents the covariance matrix of the model activation vectors under the current drift data distribution; This represents the covariance matrix of the model activation vectors under the baseline data distribution. Represents the L2 norm; The expression for solving the affine transformation plug-in parameters using optimal transport theory is as follows: ；； in, The scaling matrix represents the affine transformation plugin, used to map the features of the drift distribution back to the reference distribution space; This represents the bias vector of the affine transformation plugin, used in conjunction with the scaling matrix to align the feature distribution. 
The training of the cloud-based teacher network not only relies on the prediction error of water quality label data as a supervision signal, but also forcibly introduces a radiative transfer residual term based on the Gordon water optics model to constrain the model output to conform to the optical physics laws of water bodies; the loss function formula of the cloud-based teacher network is: ； in, The loss to the cloud-based teacher network; This represents the true label vector of water quality parameters; The water quality parameter prediction vector output by the teacher network; This is the mean square error loss; These are the weighting coefficients for the physical residual term, used to balance data loss and physical constraints; For water body remote sensing reflectance, , The water body at wavelengths Absorption coefficient and backscattering coefficient at the location; This is the medium constant in Gordon's optical model of water; Represents the L2 norm; The edge student network does not require solving partial derivatives; it inherits the teacher's physical latent features solely by minimizing the total knowledge distillation loss. The expression is: ； in, The total knowledge distillation loss for edge student networks; This is the weighting factor for distillation losses; Predicted water quality parameters output to the student network; For accurate labeling of water quality parameters; Cross-entropy loss; This is the distillation temperature, used to soften the output distribution of the teacher network. The Softmax activation function is used. The unnormalized Logits vector output by the cloud-based teacher network; The unnormalized Logits vector output by the edge student network; This is the KL divergence, used to measure the difference between the output distributions of teachers and students.
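The two training losses described here can be sketched under stated assumptions. For the teacher, the sketch assumes the single-constant Gordon relation in which remote sensing reflectance is approximately g * b_b / (a + b_b), and a squared L2 norm for the residual. For the student, it assumes the common distillation form that mixes cross-entropy with a temperature-scaled KL term multiplied by T squared. Function names, the mixing convention, and the example constant g = 0.09 are illustrative and are not taken from the patent.

```python
from math import exp, log

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with optional temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def teacher_loss(y_true, y_pred, r_rs, absorption, backscatter, g, lam):
    """MSE on labels plus lam times the squared Gordon residual (assumed form)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    residual = [r - g * bb / (a + bb)
                for r, a, bb in zip(r_rs, absorption, backscatter)]
    physics = sum(e ** 2 for e in residual)
    return mse + lam * physics

def distillation_loss(student_logits, teacher_logits, label_index, alpha, temperature):
    """(1 - alpha) * cross-entropy + alpha * T^2 * KL(teacher || student), assumed form."""
    ce = -log(softmax(student_logits)[label_index])
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * log(t / s) for t, s in zip(q_teacher, q_student))
    return (1.0 - alpha) * ce + alpha * temperature ** 2 * kl

# Illustrative values only: reflectance that satisfies the assumed relation exactly.
g = 0.09
absorption, backscatter = [0.5, 1.0], [0.5, 1.0]
consistent_rrs = [g * bb / (a + bb) for a, bb in zip(absorption, backscatter)]
loss_ok = teacher_loss([1.0, 2.0], [1.0, 2.0], consistent_rrs, absorption, backscatter, g, lam=10.0)
loss_bad = teacher_loss([1.0, 2.0], [1.0, 2.0], [0.2, 0.2], absorption, backscatter, g, lam=10.0)
kd_same = distillation_loss([2.0, 1.0, 0.0], [2.0, 1.0, 0.0], 0, alpha=0.5, temperature=4.0)
kd_diff = distillation_loss([2.0, 1.0, 0.0], [0.0, 1.0, 2.0], 0, alpha=0.5, temperature=4.0)
```

The physics term vanishes when the reflectance obeys the assumed relation and grows otherwise, while the distillation term vanishes when student and teacher logits agree, so only the student side needs logits and no derivative of the radiative transfer relation.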

9. The method for end-edge cloud hyperspectral monitoring based on the synergy of binary mask and mechanism as described in claim 3, characterized in that, The expression for the decentralized federated gradient aggregation update is: ；； in, This refers to the edge gateway node currently performing gradient updates. , For nodes The neighboring nodes; For nodes The set of all neighboring nodes; , They are nodes , The parameter vector of the shared physical prior subnet; The learning rate is used for gradient updates; For nodes arrive The weight transition matrix; Represents a node Receive from node Signal strength; Represents a node Receive from node Signal strength; For nodes The loss gradient calculated on local data for parameters of the shared physical prior subnet; The expression for introducing the EWC regularization term during local gradient updates is: ； in, This is the current parameter vector of the student network model; The total loss function for local training at the node; The basic loss function for local water quality prediction tasks; The weighting coefficients for the EWC regularization term are used to control the strength of protection for physical prior knowledge. The first parameter in the current parameter vector of the model One parameter; for The first physical prior weight in the initial weights sent from the cloud to the edge gateway One parameter; For the Fisher information matrix The diagonal elements corresponding to each parameter.
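The aggregation and regularization steps in this claim can be sketched as follows. The weighting rule is an assumption: RSSI values in dBm are converted to linear power and normalized over the neighbor set, and the node then mixes neighbor parameters before taking a local gradient step. The EWC term uses the usual diagonal-Fisher quadratic penalty. All names and numbers are hypothetical.

```python
def rssi_weights(rssi_dbm):
    """Normalize neighbor RSSI (dBm) into mixing weights via linear power (assumed rule)."""
    linear = {node: 10 ** (value / 10) for node, value in rssi_dbm.items()}
    total = sum(linear.values())
    return {node: power / total for node, power in linear.items()}

def aggregate(neighbor_params, rssi_dbm, local_grad, lr):
    """RSSI-weighted mix of neighbor parameters followed by a local gradient step."""
    weights = rssi_weights(rssi_dbm)
    mixed = [
        sum(weights[node] * neighbor_params[node][k] for node in weights)
        for k in range(len(local_grad))
    ]
    return [m - lr * g for m, g in zip(mixed, local_grad)]

def ewc_loss(task_loss, params, prior_params, fisher_diag, lam):
    """Task loss plus (lam / 2) * sum_k F_k * (theta_k - theta_prior_k)^2."""
    penalty = sum(f * (p - p0) ** 2
                  for f, p, p0 in zip(fisher_diag, params, prior_params))
    return task_loss + 0.5 * lam * penalty

# Hypothetical two-neighbor example: node "a" is heard 10 dB stronger than node "b".
rssi = {"a": -60.0, "b": -70.0}
params = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
updated = aggregate(params, rssi, local_grad=[0.0, 0.0], lr=0.1)
total = ewc_loss(1.0, params=[1.0, 2.0], prior_params=[1.0, 1.0],
                 fisher_diag=[0.5, 2.0], lam=0.4)
```

A 10 dB advantage gives node "a" ten times the weight of node "b" under this rule, and the EWC penalty charges only the parameter that moved away from its cloud-provided prior, scaled by its Fisher importance.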

10. An edge-cloud hyperspectral monitoring device based on the synergy of binary mask and mechanism, used to implement the edge-cloud hyperspectral monitoring method based on the synergy of binary mask and mechanism as described in any one of claims 1-9, characterized in that it comprises:
an acquisition unit, used to acquire raw spectral data covering the entire spectral range from visible light to near-infrared;
a multi-agent construction unit, used to divide the entire band of the raw spectral data into several adjacent sub-bands and deploy an independent agent for each sub-band, to use the three-dimensional reward function based on mutual information regularization to guide each agent to output a binary selection action for its sub-band, and to concatenate the binary actions to form a digital binary feature mask vector;
a multi-objective coordination and control unit, used to construct a cost function based on the Lyapunov multi-objective decision framework and to generate the optimal computing power offloading decision by minimizing the cost function;
a compression and reconstruction unit, used, when the optimal computing power offloading decision is local inference, to transform the digital binary feature mask into a diagonal indicator matrix, to reconstruct the first-layer computation graph of the network by projection, and to extract the non-zero indices of the digital binary feature mask and perform column pruning and dimension compression on the preset first-layer weights of the student network and on the raw spectral data to obtain the first-layer compressed inference output features, so that water quality inversion parameters are output through the student network;
wherein the student network is a lightweight inference model deployed on an edge gateway which, during training, inherits the physical prior feature weights of the cloud-based teacher network through knowledge distillation; and the cloud-based teacher network is a deep neural network deployed in the cloud, into whose training the physical conservation residual term of the Gordon water radiative transfer equation is introduced, so that prior knowledge of the optical physical laws of water bodies is embedded in the hidden layer features of the network.