A visible light-hyperspectral dual-flow feature adaptive fusion identification method and system

By storing the static weights of B-spline branches on the edge computing terminal and using a dual-threshold hysteresis state machine and local information entropy as guidance, the resource fluctuation problem of the edge computing terminal is solved, achieving efficient and stable dual-stream feature fusion and recognition, and ensuring the continuity and accuracy of the recognition model.

CN122244627A · Pending · Publication Date: 2026-06-19 · ZHEJIANG FORESTRY UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ZHEJIANG FORESTRY UNIVERSITY
Filing Date
2026-05-25
Publication Date
2026-06-19


Abstract

This invention discloses a recognition method and system for adaptive fusion of visible light and hyperspectral dual-stream features. Addressing the risk of inference interruption caused by high temperatures and memory fragmentation at edge terminals, it acquires visible light and hyperspectral images from the same field of view, constructs spatial texture branches and spectral nonlinear branches, and performs feature weighted fusion using local information entropy priors. During runtime, it collects the maximum continuously allocatable memory block, processor temperature, and load indicators, employs a dual-threshold hysteresis state machine and an emergency protection path to update the mode control factor, and activates and deactivates B-spline enhancement branches as needed. Mode switching is achieved without restarting the model or reloading weights through resident static weight parameters. Combined with spatial sparsity and rollback strategies, it reduces invalid computation, improves inference continuity and stability under resource fluctuation scenarios, and reduces the risk of edge terminal OOM (Out of Memory) crashes.

Description

Technical Field

[0001] This invention belongs to the interdisciplinary field of computer vision, forestry information technology and edge computing, and specifically relates to a recognition method and system for adaptive fusion of visible light and hyperspectral dual-stream features.

Background Technology

[0002] Accurate identification of tree species and disease conditions has become a crucial part of refined forestry management. Hyperspectral data can capture subtle changes in plant biochemical components (such as chlorophyll and water content), while visible light imagery is strong at representing spatial texture. Developing cross-modal identification methods based on "image-spectrum fusion" that combine the two is therefore a key approach to overcoming the challenges of forestry monitoring in complex field environments. Multimodal feature fusion and model design have already been explored. For example, the existing technology CN114663785A, "A Method and System for Detecting Litchi Diseases Based on UAV Hyperspectral Data", processes and stitches hyperspectral and visible light data to create a three-dimensional panoramic image, which is then input into a pre-constructed disease data model to determine the plant's health status and disease type.

[0003] However, when deployed on edge computing terminals with limited computing power, existing technologies still face the following problems. First, there is a significant contradiction between the static computation graph design and the variable resource load at the edge. Existing models fix the computation path and power consumption during the compilation phase, but in forestry field inspections, thermal protection due to high temperatures and memory shortages caused by multi-process concurrency are common. Static models lack a closed-loop response that is aware of hardware status, making it difficult to degrade gracefully when resources are limited. Second, there is a conflict between the performance gains of high-order nonlinear operators and the extremely limited computing quota at the edge. Although operators such as B-splines can enhance spectral feature extraction, the additional computation they introduce often becomes a heavy burden on the terminal side. There is currently no elastic switching logic that lets the system move between "pursuing ultimate performance" and "ensuring low power consumption and safety" with low switching latency and without a restart, so dynamic fluctuations can easily lead to system crashes or sudden drops in frame rate. Third, due to the lack of explicit guidance from physical priors, the fusion of dual-stream features is often blind. Traditional attention mechanisms rely primarily on data-driven weight learning, which struggles to reliably lock onto high-value information in low-quality forestry imaging environments with uneven lighting or motion interference. To address these challenges, an adaptive framework capable of real-time edge load awareness and immediate computational path reconstruction is needed. By embedding a hardware feedback mechanism into the inference loop, smooth switching can be achieved without interrupting operation or reloading weights.

Summary of the Invention

[0004] Existing edge-side multimodal recognition models suffer from inference interruption because they cannot adapt to resource fluctuations, from high latency when switching high-precision nonlinear modules, and from a lack of physical prior guidance in dual-stream feature fusion. To address these problems, this invention provides a visible-hyperspectral dual-stream feature adaptive fusion recognition method. One purpose of this invention is to keep the static weights of the B-spline branch resident in memory, so that mode switching only changes the computation path without restarting the model or reloading weights, achieving millisecond-level switching and solving the inference continuity problem at the edge in high-temperature and memory fragmentation scenarios. Another purpose is to use a dual-threshold hysteresis state machine that collects the maximum contiguous allocatable memory block and the processor chip temperature and dynamically controls the switching of the B-spline branch, allowing the high-precision computation branch to degrade elastically under resource constraints. A further purpose is to introduce local information entropy as a physical prior that guides a lightweight gating network in generating channel weights, achieving channel-by-channel complementary weighted fusion and solving the problem of poor robustness of dual-stream feature fusion in low-quality imaging environments. Yet another purpose is to use a spatially sparse inference mechanism that performs B-spline computation only on the region of interest, eliminating invalid computation of the B-spline branch in the background region and saving computational resources at the edge.

[0005] To solve the above-mentioned technical problems, the present invention adopts the following technical solution: a recognition method based on adaptive fusion of visible light and hyperspectral dual-stream features, comprising the following steps:
S1: Acquire visible light and hyperspectral images, and construct a feature cube;
S2: A spatial feature extraction branch and a spectral feature extraction branch each extract features from the feature cube, and output spatial texture features and spectral features respectively. The spectral feature extraction branch includes a basic activation branch and a B-spline enhancement branch, wherein the B-spline enhancement branch is enabled or disabled by a mode control factor. The mode control factor is dynamically updated by the hysteresis state machine based on the set of load indicators collected while the edge terminal is running, and the set of load indicators includes the maximum size of the contiguous allocatable memory block. The static weight parameters of the B-spline enhancement branch always remain in memory and are not released when the B-spline enhancement branch is disabled;
S3: The spatial texture features and the spectral features are fused to obtain fused features, and the recognition result is output based on the fused features.

[0006] Preferably, the set of load metrics also includes one or more of the following: processor chip temperature, RAM usage, CPU utilization, and GPU utilization.

[0007] Preferably, the hysteresis state machine is configured with a trigger threshold, a recovery threshold, and a continuous count: if the number of times the load index continuously reaches the trigger threshold reaches the continuous count, the B-spline enhancement branch is controlled to be turned off; if the number of times the load index continuously reaches the recovery threshold reaches the continuous count, the B-spline enhancement branch is controlled to be turned on.

[0008] Preferably, the hysteresis state machine is further provided with an emergency threshold: when a single sample reaches the emergency threshold, the B-spline enhancement branch is forced to close.

[0009] Preferably, the spatial texture features and the spectral features are weighted and fused to obtain fused features. The weighted fusion includes: inputting the entropy prior map into a lightweight gating network to generate a channel weight vector, and fusing the spatial texture features and the spectral features according to the channel weight vector.

[0010] Preferably, the entropy prior map is obtained by constructing a single-channel reference map, calculating the local information entropy of each pixel within a sliding window, and assigning the local information entropy to the corresponding pixel to form the entropy prior map.

[0011] Preferably, the spectral feature extraction branch further includes sparsification calculation: the feature cube is divided into a region of interest and a background region according to the spatial texture features, wherein the B-spline enhancement branch is only calculated for the region of interest.

[0012] Preferably, the spectral feature extraction branch also includes a fallback mechanism: before performing the sparsification calculation, the proportion of the region of interest relative to the sum of the region of interest and the background region is calculated as the region of interest percentage, and this percentage is compared with a preset threshold. If it exceeds the preset threshold, the calculation falls back to the full image.

[0013] Preferably, the preset threshold is determined by measuring the time consumed by the sparsification calculation and by the full-image calculation at different region of interest percentages, and selecting as the preset threshold the percentage at which the two times are closest.
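The threshold selection and fallback decision described in [0012] and [0013] can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the timing values are hypothetical and chosen only to show the crossover logic.

```python
def pick_roi_threshold(ratios, sparse_ms, full_ms):
    """Return the ROI ratio at which sparse and full-image timings are closest."""
    if not ratios or not (len(ratios) == len(sparse_ms) == len(full_ms)):
        raise ValueError("need equally sized, non-empty measurement lists")
    best = min(range(len(ratios)), key=lambda i: abs(sparse_ms[i] - full_ms[i]))
    return ratios[best]


def use_sparse_path(roi_pixels, background_pixels, threshold):
    """Return True to compute the B-spline branch on the ROI only."""
    total = roi_pixels + background_pixels
    if total == 0:
        return False
    roi_ratio = roi_pixels / total
    # Above the threshold, sparse computation no longer pays off: fall back.
    return roi_ratio <= threshold


# Hypothetical timings in milliseconds, for illustration only.
ratios = [0.2, 0.4, 0.6, 0.8]
sparse_ms = [4.0, 6.5, 9.0, 11.5]
full_ms = [9.5, 9.5, 9.5, 9.5]
threshold = pick_roi_threshold(ratios, sparse_ms, full_ms)
```

With these made-up timings the crossover lies at a ratio of 0.6, so a frame with 20% ROI takes the sparse path and a frame with 85% ROI falls back to the full image.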

[0014] As a preferred embodiment, a recognition system that adaptively fuses visible light and hyperspectral dual-stream features includes:
The multimodal data preprocessing module, which is used to preprocess and register the visible light and hyperspectral images and to construct the feature cube;
The spatial texture modeling module, which includes a spatial feature extraction branch for extracting spatial texture features from the feature cube;
The spectral feature modeling module, which includes a spectral feature extraction branch for extracting spectral features from the feature cube;
The feature fusion module, which is used to perform weighted fusion of the spatial texture features and the spectral features to obtain fused features;
The classification output module, which is used to classify the fused features and output the recognition results;
The calculation mode control module, which includes a state machine used to collect a set of load indicators of the edge terminal in real time and to dynamically update the mode control factor according to that set.

[0015] Compared with the prior art, the present invention has the following beneficial effects: This invention keeps the static weight parameters of the B-spline enhancement branch resident in memory and uses a mode control factor to switch the B-spline branch on and off. During switching, only the calculation path is changed through conditional execution; weights are neither released nor reloaded. Compared with traditional solutions, this achieves millisecond-level seamless switching without restarting the model, avoids inference interruption and weight reloading overhead, and ensures the continuity and real-time performance of edge inference services.

[0016] This invention uses the maximum contiguous allocatable memory block as the core indicator to directly reflect the degree of memory fragmentation and establishes a correspondence with the continuous memory allocation requirements of the B-spline branch dynamic activation tensor. A dual-threshold hysteresis state machine is used to dynamically control the switching of the B-spline branches. The dual thresholds form a buffer, and combined with continuous counting, instantaneous fluctuations are eliminated, suppressing frequent switching. In simulated edge device constrained environments, this effectively avoids service interruptions at the edge due to memory fragmentation or high temperatures.

[0017] This invention introduces local information entropy as a physical prior to guide dual-stream feature fusion. Compared to purely data-driven attention mechanisms such as SE-Block and CBAM, this invention offers stronger physical interpretability and environmental adaptability, enabling more stable locking of high-value information even in low-quality forestry imaging environments. The channel weight vector is generated by a lightweight gating network, so the computational complexity is lower than that of traditional methods and inference is more efficient when deployed at the edge, while achieving channel-by-channel complementary weighted fusion. This invention also achieves dual adaptive optimization in time and space: in the temporal dimension, the B-spline branch is dynamically switched via a state machine; in the spatial dimension, a saliency mask restricts B-spline computation to the region of interest (ROI), and an ROI ratio fallback mechanism avoids negative optimization from sparse computation. Together these form three energy efficiency levels, "full high precision", "sparse high precision", and "low-power safe", enabling fine-grained hierarchical scheduling of computing resources.

Attached Figure Description

[0018] Figure 1 is a flowchart of the overall process of the method of the present invention.

[0019] Figure 2 is a flowchart of the spectral feature extraction module.

[0020] Figure 3 is a system module architecture diagram.

Detailed Implementation

[0021] The specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the scope of the invention.

[0022] Example 1

[0023] This embodiment uses "early monitoring of pine wilt disease using an embedded edge computing terminal (such as NVIDIA Jetson AGX Orin) on a drone" as a typical application scenario. This scenario faces three major technical challenges: (1) Early symptoms are hidden and cannot be identified under visible light, requiring the use of weak nonlinear features in specific bands of the hyperspectral spectrum, which requires strong nonlinear operators such as B-splines for modeling; (2) The high temperature environment in summer combined with the heat dissipation and computing load of the drone makes the edge chip prone to triggering thermal protection, resulting in a sudden drop in performance or crash; (3) Multi-task concurrency (flight control, image transmission) causes severe memory fluctuations, and the static model is prone to OOM crash due to failure of dynamic tensor allocation.

[0024] The technical solution of this invention specifically addresses the above challenges: (a) B-spline enhancement branches are used to capture the weak spectral features of early-stage diseases; (b) a dual-threshold hysteresis state machine switches between high-precision and low-power modes in real time based on chip temperature and memory status; and (c) a spatial sparsity mechanism concentrates B-spline calculations on suspected disease areas. Those skilled in the art should understand that this scenario is merely an example, and this invention is also applicable to other forestry remote sensing identification scenarios such as forest fire risk early warning and invasive tree species identification.

[0025] The system adopts an "end-edge" collaborative deployment: the end consists of a visible light camera and a hyperspectral sensor mounted on the drone, while the edge consists of an onboard computing module responsible for real-time inference. As shown in Figure 1, the specific steps include:

[0026] (I) Data acquisition and preprocessing. (1) Data acquisition: Acquire a visible light image $I_v$ and a hyperspectral image $H$ of the same scene. The two must have consistent field of view coverage, which can be achieved through synchronous hardware triggering of the sensors on the same platform or through time synchronization based on a unified clock.

[0027] (2) Radiometric calibration and geometric correction: The hyperspectral image was radiometrically calibrated using the standard black and white plate calibration method to correct sensor response differences; geometric correction was performed based on preset ground control points (GCPs) and a polynomial correction model to reduce the effects of distortion and attitude changes.

[0028] (3) Dimensionality reduction: Reduce the hyperspectral image along the spectral dimension to obtain the dimension-reduced hyperspectral features $H_r$ with $C_h$ channels. Dimensionality reduction methods can include principal component analysis, band selection, or equivalent methods; for example, $C_h = 16$ or $C_h = 32$.

[0029] (4) Spatial registration and stitching: Taking the visible light image as the reference, $H_r$ is spatially aligned to the same pixel coordinate system (e.g., eliminating pose differences through a homography transformation), and the two are concatenated along the channel dimension to form the input feature cube:

$$X = \mathrm{Concat}\left(I'_v,\; H'_r\right)$$

[0030] where $I'_v$ and $H'_r$ are the aligned data, and $X$ has $C$ channels, with $C = 3 + C_h$ or the number of channels after brightness/feature extraction. $X$ can additionally be normalized (e.g., using Min-Max normalization to map the values to the [0,1] interval) to adapt to the network input scale.
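As a rough sketch of the channel concatenation and Min-Max normalization described above, the snippet below represents each channel as a flat list of pixel values. The function names are hypothetical, and normalizing each channel separately (rather than the whole cube at once) is an assumption, since the text does not say which is used.

```python
def min_max_normalize(channel, eps=1e-12):
    """Map one channel to [0, 1]; a constant channel is mapped to all zeros."""
    lo, hi = min(channel), max(channel)
    span = hi - lo
    if span < eps:
        return [0.0 for _ in channel]
    return [(v - lo) / span for v in channel]


def build_feature_cube(rgb_channels, hsi_channels):
    """Concatenate aligned visible and hyperspectral channels, then normalize.

    rgb_channels: 3 channels of the aligned visible image.
    hsi_channels: C_h channels of the aligned, dimension-reduced hyperspectral data.
    Returns a list of C = 3 + C_h normalized channels.
    """
    return [min_max_normalize(ch) for ch in list(rgb_channels) + list(hsi_channels)]


# Toy data: 3 visible channels and C_h = 16 hyperspectral channels, 3 pixels each.
cube = build_feature_cube([[0, 5, 10]] * 3, [[2, 2, 2]] * 16)
```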

[0031] (II) Dual-stream feature extraction: spatial texture modeling. This embodiment uses a method of "block embedding + multi-directional serialization scanning + state space recursive update" to extract spatial context.

[0032] (1) Block embedding: The input feature cube $X$ is partitioned into $P \times P$ blocks (e.g., $P = 4$ or $P = 8$), which are mapped to a two-dimensional token feature map $T$ through a linear or convolutional mapping.

[0033] Here, $D$ denotes the token dimension of $T$.

[0034] (2) Static Index Mapping Table Construction: The index mapping table is configured to store the mapping relationship between two-dimensional pixel coordinates and one-dimensional sequence subscripts to support fast indexing of four scanning paths: left→right, right→left, top→bottom, and bottom→top. In the preferred embodiment, omnidirectional four-channel scanning is used to maximize spatial context coverage. At the same time, for hardware scenarios with extremely limited computing resources, this embodiment can also adopt a simplified topology structure that retains only horizontal bidirectional or vertical bidirectional interaction. Technicians can adaptively trim the number of scanning paths according to the load envelope of the target hardware platform.
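One way to picture the static index mapping table is as four precomputed lists of flat indices, one per scanning path. The sketch below makes two assumptions that the text does not state: tokens are stored in row-major order, and the reverse paths (right→left, bottom→top) are the full reversals of the forward sequences. The function names are illustrative.

```python
def build_scan_tables(height, width):
    """Precompute flat (row-major) token indices for the four scan paths."""
    left_right = [r * width + c for r in range(height) for c in range(width)]
    top_bottom = [r * width + c for c in range(width) for r in range(height)]
    return {
        "left_to_right": left_right,
        "right_to_left": left_right[::-1],
        "top_to_bottom": top_bottom,
        "bottom_to_top": top_bottom[::-1],
    }


def gather(tokens, table):
    """Reorder a flat token list according to one scan table."""
    return [tokens[i] for i in table]


# A 2 x 3 token grid, flattened row by row.
tables = build_scan_tables(2, 3)
tokens = ["a", "b", "c", "d", "e", "f"]
column_order = gather(tokens, tables["top_to_bottom"])
```

Because the tables depend only on the grid size, they can be built once and reused for every frame; trimming to two directions simply means using fewer of the four entries.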

[0035] (3) Serialization and state space update: During the inference phase, index rearrangement is performed on $T$ (e.g., using the gather or an equivalent index operator) to obtain the one-dimensional sequence $S_k$ for each scanning path, and a recursive state-space update is applied to the sequence. An example of an implementable recursive form is given in [0038].

[0036] The discretization parameters $A_s$, $B_s$ are determined by the state-space parameters of the continuous system, written here as $A$ and $B$, and the sampling time step $\Delta$, following the zero-order hold (ZOH) principle:

$$A_s = \exp(\Delta A), \qquad B_s = (\Delta A)^{-1}\left(\exp(\Delta A) - I\right)\Delta B$$

[0037] When $\Delta A$ is not invertible, a pseudo-inverse or a stable approximate discretization method can be used.

[0038]

$$h_t = A_s \odot h_{t-1} + B_s \odot x_t, \qquad y_t = C_s \odot h_t + D_s \odot x_t$$

[0039] where $x_t$ is the sequence input, $h_t$ is the state, and $y_t$ is the output; $A_s$, $B_s$, $C_s$, $D_s$ are learnable parameters or coefficients generated by the network. In an element-wise parameterized implementation, the coefficients can be configured in diagonal or element-wise form, and the symbol $\odot$ indicates element-wise multiplication.

[0040] (4) Multi-directional aggregation: The outputs of the four directions are aggregated by element-wise summation to obtain the spatial texture features $F_{spatial}$.
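The recursive update and the summation over scan directions can be illustrated with a scalar (single-channel, diagonal) sketch. The function names are hypothetical. For the non-invertible case the code uses the limit value $\Delta \cdot b$ as $a \to 0$, which is one possible stable approximation and not necessarily the one intended by the text.

```python
import math


def zoh_discretize(a, b, delta):
    """Zero-order-hold discretization of one scalar (diagonal) state entry."""
    a_s = math.exp(delta * a)
    if abs(a) < 1e-12:
        b_s = delta * b  # limit of (exp(delta*a) - 1) / a * b as a -> 0
    else:
        b_s = (a_s - 1.0) / a * b
    return a_s, b_s


def ssm_scan(xs, a_s, b_s, c_s, d_s):
    """Run h_t = a_s*h_{t-1} + b_s*x_t and y_t = c_s*h_t + d_s*x_t over a sequence."""
    h = 0.0
    ys = []
    for x in xs:
        h = a_s * h + b_s * x
        ys.append(c_s * h + d_s * x)
    return ys


def aggregate_directions(tokens, tables, params):
    """Scan each direction, scatter outputs back to token positions, and sum."""
    out = [0.0] * len(tokens)
    for table in tables:
        ys = ssm_scan([tokens[i] for i in table], *params)
        for pos, y in zip(table, ys):
            out[pos] += y
    return out


# Two directions over three tokens, with a_s=0.5, b_s=1, c_s=1, d_s=0.
fused = aggregate_directions([1.0, 1.0, 1.0], [[0, 1, 2], [2, 1, 0]], (0.5, 1.0, 1.0, 0.0))
```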

[0041] (III) Dual-stream feature extraction: spectral nonlinear modeling. This embodiment performs a combined nonlinear mapping on the input features in the spectral branch, with the basic activation branch always executing and the spline term able to be turned on and off at runtime. The switchable nonlinear enhancement branch can be any nonlinear mapping operator that satisfies the conditional execution constraints; this embodiment uses the B-spline enhancement branch as the preferred implementation for description, but in other embodiments it can also be implemented using piecewise polynomial basis expansion, lookup table approximation, or an equivalent nonlinear basis expansion, all of which follow the operating mechanism of "resident static weights + conditional execution switch".

[0042] (1) Combined nonlinear mapping: for any scalar element $x$ of the input feature tensor (that is, operating element-wise), compute:

$$\phi(x) = w_b\,\sigma(x) + \alpha \sum_{i=1}^{n} c_i\, B_{i,d}(x)$$

[0043] where $\sigma(\cdot)$ can be ReLU, SiLU, or an equivalent basic activation; $B_{i,d}(x)$ are the B-spline basis functions of order $d$; $n$ is the number of basis functions (e.g., $n = 8$ or $n = 16$); $\alpha \in \{0, 1\}$ is the mode control factor; and $w_b$, $c_i$ are learnable parameters.

[0044] (2) Example of a feasible construction of the B-spline terms: The input feature $x$ can be linearly normalized to the interval [0,1] per channel, and a uniform or piecewise uniform knot vector $\{u_k\}$ can be used to construct the order-$d$ B-spline basis functions (e.g., an open uniform knot vector in which the first and last knots have multiplicity $d + 1$ and the intermediate knots are uniformly distributed). The basis functions can be calculated using the Cox-de Boor recursion:

$$B_{i,0}(x) = \begin{cases} 1, & u_i \le x < u_{i+1} \\ 0, & \text{otherwise} \end{cases}$$

$$B_{i,d}(x) = \frac{x - u_i}{u_{i+d} - u_i}\, B_{i,d-1}(x) + \frac{u_{i+d+1} - x}{u_{i+d+1} - u_{i+1}}\, B_{i+1,d-1}(x)$$

[0045] where $\{u_k\}$ is the knot vector and $d$ is the spline order; when a denominator is 0, the corresponding fractional term is set to 0 to avoid numerical anomalies. The spline coefficients $c_i$ are kept resident in memory.
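A minimal sketch of the switchable spline term follows, assuming ReLU as the basic activation, an open uniform knot vector on [0, 1], and zero-based indexing of the basis functions. Clamping $x$ just below 1 is an added assumption that keeps the half-open Cox-de Boor intervals well defined at the right boundary. All function names are illustrative.

```python
def open_uniform_knots(n, d):
    """Open uniform knot vector for n basis functions of order d on [0, 1]."""
    if n < d + 1:
        raise ValueError("need n >= d + 1")
    interior = n - d - 1
    inner = [(k + 1) / (interior + 1) for k in range(interior)]
    return [0.0] * (d + 1) + inner + [1.0] * (d + 1)


def bspline_basis(i, d, x, knots):
    """Cox-de Boor recursion; terms with a zero denominator are set to 0."""
    if d == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_den = knots[i + d] - knots[i]
    right_den = knots[i + d + 1] - knots[i + 1]
    left = 0.0
    if left_den != 0:
        left = (x - knots[i]) / left_den * bspline_basis(i, d - 1, x, knots)
    right = 0.0
    if right_den != 0:
        right = (knots[i + d + 1] - x) / right_den * bspline_basis(i + 1, d - 1, x, knots)
    return left + right


def phi(x, w_b, coeffs, d, knots, alpha):
    """Basic activation plus an optional spline term gated by alpha."""
    base = w_b * max(x, 0.0)  # ReLU chosen here as the basic activation
    if alpha == 0:
        return base  # spline subgraph is skipped; coeffs stay resident
    xc = min(max(x, 0.0), 1.0 - 1e-9)  # assumed clamp into the spline support
    spline = sum(c * bspline_basis(i, d, xc, knots) for i, c in enumerate(coeffs))
    return base + spline


knots = open_uniform_knots(n=8, d=3)
coeffs = [1.0] * 8  # placeholder coefficients, not trained values
```

Switching `alpha` between 0 and 1 changes only which code path runs; `coeffs` and `knots` are never released or rebuilt, which mirrors the resident-weight idea described in the text.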

[0046] (3) Runtime topology reconstruction strategy: The scheme selects the execution path of the computation graph by setting the mode control factor α. When α = 0, the system activates internal pruning logic, using a logical mask or branch selection instructions so that the computation flow bypasses the complex B-spline operation subgraph and executes only the basic activation path. During this mode switching, because the parameters remain resident in memory, the system avoids the scheduling and memory management overhead caused by frequent memory allocation and deallocation, thereby ensuring low latency when switching between high-precision and low-precision modes.

[0047] (IV) Feature fusion. This embodiment uses local information entropy as a prior to guide the generation of the dual-stream fusion weights.

[0048] (1) Single-channel reference map: A reference map $I_{ref}$ is constructed from the input feature cube $X$; it can be the visible light brightness component or the first principal component of the dimension-reduced hyperspectral features $H_r$.

[0049] (2) Quantization and sliding window histogram: $I_{ref}$ is quantized into $L$ gray levels (e.g., $L = 16$ or $L = 32$). For the neighborhood window of each pixel (e.g., $W_{win} = 9 \times 9$ or $W_{win} = 15 \times 15$), a histogram is computed and normalized to probabilities $p_j$.

[0050] (3) Local information entropy: for the center position of the window, compute:

$$E = -\sum_{j=1}^{L} p_j \log\left(p_j + \varepsilon\right)$$

[0051] where $\varepsilon$ is a constant that avoids log(0) (e.g., $\varepsilon = 10^{-6}$). The per-pixel values form the entropy prior map $E$.
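The quantization, sliding-window histogram, and entropy steps can be sketched in plain Python as below. The function name is hypothetical. Two details are assumptions because the text leaves them open: the natural logarithm is used, and windows are clipped at the image border rather than padded.

```python
import math


def local_entropy_map(img, levels=16, win=3, eps=1e-6):
    """Compute a local-entropy prior map.

    img: 2-D list of values in [0, 1] (the single-channel reference map).
    levels: number of gray levels L; win: odd window side length.
    """
    h, w = len(img), len(img[0])
    # Quantize to integer gray levels 0 .. levels-1.
    q = [[min(int(v * levels), levels - 1) for v in row] for row in img]
    r = win // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            hist = [0] * levels
            count = 0
            for yy in range(max(0, y - r), min(h, y + r + 1)):
                for xx in range(max(0, x - r), min(w, x + r + 1)):
                    hist[q[yy][xx]] += 1
                    count += 1
            # Empty bins contribute 0, so only occupied bins are summed.
            out[y][x] = -sum((c / count) * math.log(c / count + eps) for c in hist if c)
    return out


flat = local_entropy_map([[0.5] * 3 for _ in range(3)])
checker = local_entropy_map([[(r + c) % 2 for c in range(4)] for r in range(4)])
```

A uniform patch yields an entropy close to zero, while a checkerboard patch yields a clearly positive value, which is the contrast the gating network is meant to exploit. The nested loops are written for readability; a deployed version would need a vectorized or incremental histogram.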

[0052] (4) Lightweight gating network: The local information entropy prior map $E$ is input to a custom-designed gating network. This network employs a lightweight structure that alternates depthwise convolutions and pointwise convolutions in cascade, and uses the sigmoid function to map the weights of each pathway. The resulting gating map is spatially reduced by a global average pooling layer, yielding the channel weight vector λ. In the specific implementation, the depthwise convolutional layers preferably use 3×3 kernels to cover local spatial features, while the pointwise convolutions map the channel dimension to the target size for downstream fusion.

[0053] (5) Channel-by-channel complementary fusion: Before fusion, the spatial texture features $F_{spatial}$ are first upsampled by bilinear interpolation to the same spatial resolution as the spectral features $F_{spectral}$ and aligned to the same number of channels using 1×1 convolutions. The two are then fused channel by channel, weighted complementarily according to λ, to obtain the fused features $F_{fused}$.
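A possible reading of "complementary" weighting is that each channel of one stream receives weight λ and the matching channel of the other stream receives 1 − λ. The sketch below assumes exactly that, with λ applied to the spatial stream; which stream receives λ is not stated in the text, and the function name is hypothetical. Features are assumed to be already aligned in resolution and channel count.

```python
def fuse_channels(f_spatial, f_spectral, lam):
    """Channel-wise complementary fusion.

    f_spatial, f_spectral: lists of channels, each a flat list of values.
    lam: one weight in [0, 1] per channel (assumed to weight the spatial stream).
    """
    if not (len(f_spatial) == len(f_spectral) == len(lam)):
        raise ValueError("channel counts must match")
    if any(not 0.0 <= l <= 1.0 for l in lam):
        raise ValueError("weights must lie in [0, 1]")
    return [
        [l * s + (1.0 - l) * p for s, p in zip(ch_s, ch_p)]
        for l, ch_s, ch_p in zip(lam, f_spatial, f_spectral)
    ]


fused = fuse_channels([[1.0, 2.0]], [[3.0, 4.0]], [0.25])
```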

[0054] (V) Classification output. The fused features $F_{fused}$ are sent to the classification output module. In this embodiment, the classification output module consists of global average pooling followed by a fully connected layer.

[0055] Specifically, global average pooling is first performed on the fused features, compressing the spatial information of each feature channel into a scalar value to obtain a feature vector. This operation reduces the spatial dimension to 1, preserving the semantic information of the channel dimension while significantly reducing the number of parameters in the subsequent fully connected layer. The feature vector is then input into the fully connected layer, which maps the channel dimension to the number of categories to be identified and yields the raw output score for each category. In this embodiment, for the early monitoring task of pine wilt disease, the number of categories to be identified is set to 5, corresponding to five output categories: healthy, early disease, mid-stage disease, late disease, and others (non-pine or non-disease targets). Finally, the raw output scores are converted into a probability distribution, and the category with the highest probability is taken as the final identification result.
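The classification head can be sketched as pooling, one linear layer, and a normalization step. Softmax is assumed here as the way scores become a probability distribution (the text does not name the function), and the weights in the example are arbitrary placeholders rather than trained values.

```python
import math

CLASSES = ["healthy", "early disease", "mid-stage disease", "late disease", "other"]


def global_average_pool(feature):
    """feature: list of channels, each a flat list of spatial values."""
    return [sum(ch) / len(ch) for ch in feature]


def linear(vec, weights, bias):
    """Fully connected layer: one weight row and one bias per output class."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature, weights, bias):
    probs = softmax(linear(global_average_pool(feature), weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Two-channel toy feature map and placeholder weights (5 classes x 2 channels).
feature = [[1.0, 3.0], [0.0, 2.0]]
weights = [[0, 0], [1, 0], [0, 1], [1, 1], [-1, -1]]
label, probs = classify(feature, weights, [0.0] * 5)
```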

[0056] (VI) Runtime adaptive control and conditional execution. The system starts an independent monitoring thread that periodically collects load metrics at a sampling interval of Δt = 1 second. The maximum contiguous allocatable memory block size is estimated by parsing the `/proc/buddyinfo` file in Linux to obtain the number of free page blocks at each order (e.g., if the page size is 4KB, the 4MB trigger threshold corresponds to order 10); this is the core indicator for determining the degree of memory fragmentation. The processor chip temperature is obtained by reading `/sys/class/thermal/thermal_zone0/temp`.
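The two readings can be sketched as below. The function names are hypothetical. The parser assumes the usual `/proc/buddyinfo` layout, in which each line starts with `Node N, zone NAME` followed by the free-block counts for orders 0, 1, 2, and so on, and the temperature reader assumes the sysfs value is reported in millidegrees Celsius, as is typical on Linux.

```python
def max_contiguous_free_bytes(buddyinfo_text, page_size=4096):
    """Estimate the largest free contiguous block from buddyinfo-style text."""
    best_order = -1
    for line in buddyinfo_text.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        counts = [int(p) for p in parts[4:]]
        for order, count in enumerate(counts):
            if count > 0 and order > best_order:
                best_order = order
    # A free block of order k spans 2**k pages.
    return 0 if best_order < 0 else page_size * (2 ** best_order)


def read_temperature_celsius(path="/sys/class/thermal/thermal_zone0/temp"):
    """Read the thermal zone value (assumed millidegrees) and convert to Celsius."""
    with open(path) as fh:
        return int(fh.read().strip()) / 1000.0


# Sample text in the buddyinfo layout; the numbers are made up for illustration.
SAMPLE = "Node 0, zone   Normal    216     55    189    101     84     38     37     27      5      3      0\n"
largest = max_contiguous_free_bytes(SAMPLE)
```

In this sample the highest order with a free block is 9, so the estimate is 2 MiB, which would fall below the 4MB trigger threshold used in this embodiment. On a device, the text would come from `open("/proc/buddyinfo").read()`.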

[0057] Dual-threshold hysteresis state machine logic: The system is initially in high-precision mode (α=1). Define a continuous degradation counter cnt_degrade and a continuous recovery counter cnt_recover, initially set to 0.

[0058] Degradation judgment in high-precision mode: 1. Emergency protection path: If a single sample meets the condition of "memory block < emergency threshold (e.g., 2MB)" or "temperature > emergency threshold (e.g., 90℃)", then α is immediately set to 0, continuous counting is skipped, and immediate blocking is performed.

[0059] 2. Standard Degradation Path: If the emergency condition is not met, check if "memory block < trigger threshold (e.g., 4MB) or temperature > trigger threshold (e.g., 80℃)" is satisfied. If satisfied, increment cnt_degrade by 1; otherwise, clear it to zero. When cnt_degrade ≥ K_high (e.g., 3), set α to 0 and switch to low-power safe mode.

[0060] Recovery determination in low-power safe mode: 1. Determine if the following conditions are met: "memory block > recovery threshold (e.g., 16MB) and temperature < recovery threshold (e.g., 65℃)". If both conditions are met, increment cnt_recover by 1; otherwise, clear it to zero.

[0061] 2. When cnt_recover ≥ K_low, set α to 1 and restore to high-precision mode.

[0062] The dual thresholds (e.g., trigger threshold 4MB vs. recovery threshold 16MB) form a hysteresis buffer, which, combined with continuous counting (K_high=3, K_low=5), effectively filters out instantaneous load fluctuations and suppresses frequent mode switching.
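The degradation, recovery, and emergency rules above can be condensed into a small controller. This is a sketch using the example thresholds from this embodiment; the class and attribute names are hypothetical, and resetting both counters on every mode switch is an assumption that the text does not spell out.

```python
from dataclasses import dataclass


@dataclass
class HysteresisController:
    mem_trigger_mb: float = 4.0
    mem_recover_mb: float = 16.0
    mem_emergency_mb: float = 2.0
    temp_trigger_c: float = 80.0
    temp_recover_c: float = 65.0
    temp_emergency_c: float = 90.0
    k_high: int = 3
    k_low: int = 5
    alpha: int = 1  # 1 = high-precision mode, 0 = low-power safe mode
    cnt_degrade: int = 0
    cnt_recover: int = 0

    def update(self, mem_block_mb, temp_c):
        """Feed one sample and return the updated mode control factor."""
        if self.alpha == 1:
            if mem_block_mb < self.mem_emergency_mb or temp_c > self.temp_emergency_c:
                self._switch(0)  # emergency path: no continuous counting
            elif mem_block_mb < self.mem_trigger_mb or temp_c > self.temp_trigger_c:
                self.cnt_degrade += 1
                if self.cnt_degrade >= self.k_high:
                    self._switch(0)
            else:
                self.cnt_degrade = 0
        else:
            if mem_block_mb > self.mem_recover_mb and temp_c < self.temp_recover_c:
                self.cnt_recover += 1
                if self.cnt_recover >= self.k_low:
                    self._switch(1)
            else:
                self.cnt_recover = 0
        return self.alpha

    def _switch(self, new_alpha):
        self.alpha = new_alpha
        self.cnt_degrade = 0  # assumed: counters restart after a switch
        self.cnt_recover = 0


ctrl = HysteresisController()
history = [ctrl.update(3.0, 70.0) for _ in range(3)]  # three low-memory samples
```

Three consecutive samples below the 4MB trigger are needed before the controller degrades, whereas a single sample below 2MB or above 90℃ degrades immediately.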

[0063] (VII) Experimental results and analysis. To verify the effectiveness of this invention, tests were conducted in a simulated edge environment. The main results are as follows. Recognition accuracy: In the five-class forest tree disease identification task, the overall accuracy improved from 82.5% with the B-spline enhancement branch disabled (α=0) to 95.8% with it enabled (α=1), which verifies the key role of B-splines in spectral nonlinear modeling.

[0064] Resource saving and OOM protection: After switching to low-power safe mode (α=0), the peak memory usage of dynamically activated tensors is reduced by approximately 66.7%. In continuous memory stress testing (simulating a 4GB edge environment), the static high-precision model crashed due to OOM, while the inference service of the present invention remained uninterrupted, verifying its OOM protection capability.

[0065] Switching latency: The average end-to-end latency for switching between high-precision and low-precision modes is 7.46ms, with a maximum of 12.26ms. The control switching time is only 0.015ms, which is about 300 times faster than the traditional solution that requires weight reloading (~4.67ms). The switching process does not require restarting the model or reloading the weights.

[0066] Spatial sparsity benefits: In sparse scenarios where the ROI accounts for 20% (such as distant trees and sky), sparse computation saves 52.8% of computation compared to full-map computation; when the ROI accounts for 85%, it automatically reverts to full-map computation to avoid negative optimization.

[0067] The above experimental data fully demonstrates that, while ensuring high recognition accuracy, the present invention achieves effective resource elastic scaling, low-latency seamless switching, and stable edge inference continuity.

[0068] Example 2. This embodiment describes the specific implementation of the dual-threshold hysteresis state machine and the emergency protection path in runtime adaptive control, and details the update logic of the mode control factor α, the continuous counting mechanism, and the triggering conditions of the emergency protection path.

[0069] (I) Selection of load indicators. In edge computing scenarios, out-of-memory (OOM) failures in deep learning inference tasks are often not caused directly by insufficient total memory, but by memory fragmentation that prevents the allocation of a large, contiguous block of memory. Specifically, the forward computation of complex operators such as the B-spline enhancement branch requires the allocation of high-dimensional intermediate feature maps (dynamic activation tensors), which need large blocks of physically or virtually contiguous memory.

[0070] Traditional monitoring methods that use metrics such as "total RAM utilization" or "percentage of available memory" cannot reflect the degree of memory fragmentation. A system may still have total memory remaining, but due to frequent memory allocation and deallocation, the largest available contiguous memory block has become very small. In this case, even if the total memory is "sufficient," allocating a large dynamic activation tensor will still fail, triggering an OutOfMemoryError (OOM) and causing the inference service to crash.

[0071] To address this fundamental problem, this invention creatively introduces the "maximum contiguous allocatable memory block" as a core monitoring metric. This metric directly reflects the current fragmentation state of the system's memory and establishes a direct correspondence with the dynamic activation tensor size required for B-spline enhancement branches. When this metric falls below the trigger threshold, it means that even if there is still surplus total memory, the system is already in a high-risk state, and high-memory-consuming B-spline branches should be immediately shut down to ensure the continuity and stability of the inference service.

[0072] It should be noted that the monitoring indicator set of this invention is logically divided into two categories. Positive risk indicators: processor chip temperature, RAM usage, and CPU / GPU utilization. Higher values of these indicators indicate a heavier system load and a higher risk.

[0073] Negative risk indicator: Maximum contiguous allocatable memory block. The lower the value of this indicator, the more severe the memory fragmentation and the higher the risk of failure in allocating large contiguous memory blocks.

[0074] When making judgments, the hysteresis state machine must apply different comparison logic depending on the risk direction of each indicator. For the negative indicator, a value below the trigger threshold constitutes a risk condition, while a value above the recovery threshold constitutes a safe condition; for positive indicators the comparisons are reversed. This differentiated processing is key to the accurate perception of OOM risk in this invention.

[0075] (II) Load indicator collection: An independent monitoring thread is started on the edge terminal, and the following load metrics are collected periodically at a sampling interval of Δt = 1 second. (1) Maximum contiguous allocatable memory block: its size can be obtained or estimated using one of the following methods, depending on the operating system and hardware platform of the target edge terminal. Method 1: Directly read operating system kernel statistics.

[0076] On edge terminals running a standard operating system (such as Linux), this information can be obtained directly by parsing the memory management interface exposed by the kernel. For example, on Linux, reading the `/proc/buddyinfo` file gives the number of free page blocks of each order in each memory zone of the buddy system. From this, the maximum contiguous allocatable memory block size can be estimated as: maximum contiguous block size = 2^(highest order with at least one free page block) × system page size. If multiple memory zones exist, the largest of the per-zone estimates can be used.
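Method 1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `max_contiguous_block`, the default 4096-byte page size, and the sample text are assumptions, and the parser expects the usual `Node N, zone NAME count0 count1 ...` layout of `/proc/buddyinfo`.

```python
import re

PAGE_SIZE = 4096  # assumed page size in bytes; os.sysconf("SC_PAGE_SIZE") gives the real value


def max_contiguous_block(buddyinfo_text: str, page_size: int = PAGE_SIZE) -> int:
    """Estimate the largest free contiguous block (bytes) across all memory zones."""
    best = 0
    for line in buddyinfo_text.splitlines():
        # One line per zone; the trailing numbers are free-block counts for order 0, 1, 2, ...
        m = re.match(r"Node\s+\d+,\s+zone\s+\S+\s+(.*)", line.strip())
        if not m:
            continue
        counts = [int(c) for c in m.group(1).split()]
        orders = [order for order, n in enumerate(counts) if n > 0]
        if orders:
            # 2^(highest non-empty order) pages, converted to bytes
            best = max(best, (2 ** max(orders)) * page_size)
    return best


# Hypothetical sample in the /proc/buddyinfo layout (not measured data)
sample = """Node 0, zone      DMA      1      1      0      0      0      0      0      0      0      0      0
Node 0, zone   Normal    120     40      9      3      1      0      0      0      0      0      0"""
print(max_contiguous_block(sample))
```

On a real Linux terminal the text would come from `open("/proc/buddyinfo").read()`; the result is an estimate of physically contiguous free memory, not a guarantee that a user-space allocation of that size will succeed.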

[0077] Method 2: Tentative memory allocation estimation.

[0078] For platforms that do not support direct access to the above information, a low-overhead, tentative allocation strategy can be used for estimation. Specifically, in a background task isolated from the main inference thread, memory blocks of different sizes are attempted to be allocated using malloc or platform-specific APIs, following a binary search method or a preset size sequence (e.g., 1MB, 2MB, 4MB, ..., maximum threshold). The largest size of the last successful allocation is recorded as an estimate of the current maximum contiguous allocatable memory block. After each attempt, the allocated memory block must be released immediately, and a strict number of attempts and time budget (e.g., no more than 0.5ms per attempt) should be set to ensure that the real-time performance of the main inference task is not affected.
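The binary-search variant of Method 2 might look like the sketch below. The callable `try_alloc` is hypothetical: it stands for a platform-specific routine that attempts an allocation of the given size, releases it immediately, and reports success. The 1 MB granularity, the 64 MB upper bound, and the attempt cap are assumptions, and the per-attempt time budget mentioned above is left to `try_alloc`.

```python
from typing import Callable

MB = 1024 * 1024


def estimate_max_block(try_alloc: Callable[[int], bool],
                       lo: int = MB, hi: int = 64 * MB,
                       max_attempts: int = 8) -> int:
    """Binary-search the largest size (in whole MB) that try_alloc can allocate.

    Returns 0 if no probed size succeeds. Assumes success is monotonic:
    if a size can be allocated, every smaller size can be allocated too.
    """
    best = 0
    lo_mb, hi_mb = lo // MB, hi // MB
    attempts = 0
    while lo_mb <= hi_mb and attempts < max_attempts:
        mid = (lo_mb + hi_mb) // 2
        attempts += 1
        if try_alloc(mid * MB):
            best = mid * MB      # remember the largest successful size
            lo_mb = mid + 1      # try something larger
        else:
            hi_mb = mid - 1      # try something smaller
    return best


# Stand-in allocator that pretends the largest free block is 10 MB
print(estimate_max_block(lambda size: size <= 10 * MB) // MB)
```

Capping the number of attempts bounds the probing cost per sampling period, at the price of a coarser estimate when the search range is wide.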

[0079] (2) Processor chip temperature: obtained by reading `/sys/class/thermal/thermal_zone0/temp` and dividing the value read by 1000 to convert it to degrees Celsius. Let the temperature trigger threshold T_trigger_temp = 80°C, the temperature recovery threshold T_recover_temp = 65°C, and the temperature emergency threshold T_emergency_temp = 90°C.

[0080] The collected indicator data is pushed into a circular queue with a queue length N satisfying N ≥ K_low (K_low = 5 in this embodiment), and is used for subsequent continuous counting judgment.
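A minimal sketch of the sample queue, assuming the sysfs value arrives as text in millidegrees Celsius. The names `millideg_to_celsius`, `push_sample`, and `samples` are illustrative, and the readings in the loop are made-up inputs rather than measurements.

```python
from collections import deque

K_LOW = 5  # recovery count of this embodiment; the queue must hold at least this many samples


def millideg_to_celsius(raw: str) -> float:
    """Convert a sysfs thermal reading (millidegrees Celsius, as text) to degrees Celsius."""
    return int(raw.strip()) / 1000.0


# deque with maxlen behaves as a circular queue: the oldest sample is dropped automatically
samples = deque(maxlen=K_LOW)


def push_sample(mem_block_bytes: int, raw_temp: str) -> None:
    """Append one (memory block size, temperature) sample to the circular queue."""
    samples.append((mem_block_bytes, millideg_to_celsius(raw_temp)))


# Seven hypothetical samples; only the five most recent remain in the queue
for i in range(7):
    push_sample(8 * 1024 * 1024, f"{60000 + 1000 * i}\n")
print(list(samples))
```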

[0081] (III) Dual-threshold hysteresis state machine logic: The state machine is initially set to high-precision mode (α = 1). A continuous degradation counter `cnt_degrade` and a continuous recovery counter `cnt_recover` are defined, both initially set to 0.

[0082] (a) Degradation judgment in high-precision mode (α = 1): The following judgment is performed in each sampling period: (1) Emergency protection path judgment: If the current sample satisfies MemBlock < T_emergency_mem or Temp > T_emergency_temp, then immediately set α to 0, and clear cnt_degrade and cnt_recover without waiting for continuous count confirmation. This path is used to deal with the risk of OOM crash or chip thermal damage in extreme cases.

[0083] (2) Normal degradation path judgment: If the emergency condition is not met, check whether the degradation condition is met: MemBlock < T_trigger_mem or Temp > T_trigger_temp. If it is met, cnt_degrade is incremented by 1; otherwise, cnt_degrade is cleared.

[0084] (3) When cnt_degrade ≥ K_high (K_high = 3 in this embodiment), α is set to 0, switching to low-power safe mode, and cnt_degrade and cnt_recover are cleared. This switch takes effect in the next inference cycle.

[0085] (b) Recovery judgment in low-power safe mode (α = 0): The following judgment is performed in each sampling period: (1) Recovery condition check: MemBlock > T_recover_mem and Temp < T_recover_temp. If both conditions are met, cnt_recover is incremented by 1; otherwise, cnt_recover is cleared.

[0086] (2) When cnt_recover ≥ K_low (K_low = 5 in this embodiment), α is set to 1 to restore the high-precision mode, and cnt_degrade and cnt_recover are cleared. This switch takes effect in the next inference cycle.

[0087] To suppress frequent jitter at the threshold edge, this embodiment employs a hysteresis comparison mechanism: a significant difference exists between the degradation trigger threshold (4MB / 80°C) and the recovery threshold (16MB / 65°C), forming a buffer. Simultaneously, the continuous counting mechanism requires that the indicator meets the conditions multiple times consecutively before switching is executed, further filtering out false triggers caused by sensor sampling noise or short-term sudden load peaks.
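The degradation, recovery, and emergency rules of this section can be sketched as one small class. The thresholds 4 MB / 16 MB and 80 / 65 / 90 °C and the counts K_high = 3, K_low = 5 come from this embodiment; the memory emergency threshold of 1 MB is an assumed placeholder because the text does not state it, and the class and method names are illustrative.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class HysteresisController:
    t_trigger_mem: int = 4 * MB
    t_recover_mem: int = 16 * MB
    t_emergency_mem: int = 1 * MB   # assumed value; not given in the text
    t_trigger_temp: float = 80.0
    t_recover_temp: float = 65.0
    t_emergency_temp: float = 90.0
    k_high: int = 3                 # consecutive samples needed to degrade
    k_low: int = 5                  # consecutive samples needed to recover
    alpha: int = 1                  # start in high-precision mode
    cnt_degrade: int = 0
    cnt_recover: int = 0

    def _switch(self, alpha: int) -> None:
        """Set the mode and clear both counters."""
        self.alpha, self.cnt_degrade, self.cnt_recover = alpha, 0, 0

    def step(self, mem_block: int, temp: float) -> int:
        """Process one sample and return the updated mode control factor."""
        if self.alpha == 1:
            if mem_block < self.t_emergency_mem or temp > self.t_emergency_temp:
                self._switch(0)                       # emergency path: no counting
            elif mem_block < self.t_trigger_mem or temp > self.t_trigger_temp:
                self.cnt_degrade += 1
                if self.cnt_degrade >= self.k_high:
                    self._switch(0)                   # regular degradation
            else:
                self.cnt_degrade = 0                  # streak broken
        else:
            if mem_block > self.t_recover_mem and temp < self.t_recover_temp:
                self.cnt_recover += 1
                if self.cnt_recover >= self.k_low:
                    self._switch(1)                   # recovery
            else:
                self.cnt_recover = 0
        return self.alpha


ctrl = HysteresisController()
print([ctrl.step(3 * MB, 70.0) for _ in range(3)])   # three low-memory samples in a row
print([ctrl.step(32 * MB, 60.0) for _ in range(5)])  # five healthy samples in a row
```

The inference loop would read `alpha` once at the start of each cycle, which matches the statement that regular switches take effect in the next inference cycle.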

[0088] (iv) Immediate activation of the emergency protection path: The emergency protection path is independent of the regular hysteresis switching path and has a higher response priority. When an emergency condition is triggered, the update of α takes effect immediately on the scheduling of operators in the current inference cycle that have not yet started, and the current cycle ends once the operators that have already started finish executing. Specifically: if the emergency trigger occurs before the current inference cycle starts, the entire cycle is executed in low-power safe mode; if it occurs during the current inference cycle, operators that have already started continue to execute, and operators that have not yet started are scheduled in low-power safe mode.

[0089] This mechanism ensures that when memory deteriorates rapidly or temperature spikes suddenly, the system can stop the dynamic activation tensor allocation of the B-spline enhancement branch as quickly as possible, avoiding an OOM crash due to allocation failure.
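One way to picture the mid-cycle behaviour is a scheduler that re-reads α immediately before each B-spline operator starts. The sketch below is only an illustration of that idea: `run_cycle`, the operator tuples, and `read_alpha` are hypothetical names, and a real runtime would dispatch tensor operators rather than strings.

```python
from typing import Callable, List, Tuple


def run_cycle(ops: List[Tuple[str, bool]], read_alpha: Callable[[], int]) -> List[str]:
    """Run one inference cycle and return the names of the operators that executed.

    Each op is (name, is_spline_op). alpha is read right before a spline op would
    start, so an emergency update only affects operators that have not started yet.
    """
    executed = []
    for name, is_spline in ops:
        if is_spline and read_alpha() == 0:
            continue  # skipped: no activation tensor is allocated for this op
        executed.append(name)
    return executed


# alpha drops from 1 to 0 between the first and second spline operator
alpha_values = iter([1, 0])
ops = [("base1", False), ("spline1", True), ("base2", False), ("spline2", True)]
print(run_cycle(ops, lambda: next(alpha_values)))
```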

[0090] (v) Conditional execution and mode switching: The mode control factor α is stored in the mode control variable (which can be implemented as a register or a memory variable). In the spectral feature extraction branch, the B-spline enhancement branch corresponds to an independent subgraph whose execution is controlled by a conditional execution mechanism. This embodiment uses conditional branching:

```python
def forward(self, x):
    # The basic activation branch always runs
    y = self.base(x)
    # alpha is the mode control factor maintained by the state machine
    if alpha == 1:
        # The B-spline enhancement branch runs only in high-precision mode
        y = y + self.spline(x)
    return y
```

When α = 0, the forward function of the B-spline enhancement branch is not called, the operators of this branch are not executed, and therefore no intermediate tensor buffer is created for it. At the same time, the static weight parameters of the B-spline enhancement branch (spline coefficients c_i, knot vectors, etc.) are already resident in memory once the model is loaded and are neither released nor reloaded during switching, thus achieving millisecond-level low-latency switching.

[0091] (vi) Joint triggering strategy In a variant of this embodiment, a joint triggering strategy can be enabled. In addition to the maximum contiguous allocatable memory block and processor chip temperature, at least one of RAM usage, CPU utilization, or GPU utilization is collected as an auxiliary judgment indicator. During degradation judgment, a degradation switch is performed when the trigger condition for the maximum contiguous allocatable memory block is met, and at least one auxiliary indicator simultaneously meets its corresponding trigger condition. During recovery judgment, a recovery switch is performed when the recovery condition for the maximum contiguous allocatable memory block is met, and all configured auxiliary indicators simultaneously meet their corresponding recovery conditions. This strategy can further improve the robustness of mode switching and avoid erroneous switching due to anomalies in a single indicator.
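The joint rule reduces to two boolean combinations, sketched below with hypothetical function names. The per-indicator threshold comparisons are assumed to have been done already, so each dictionary maps an auxiliary indicator name to whether its own condition currently holds.

```python
from typing import Dict


def joint_degrade(mem_triggered: bool, aux_triggered: Dict[str, bool]) -> bool:
    """Degrade when the memory-block trigger holds and at least one auxiliary trigger holds."""
    return mem_triggered and any(aux_triggered.values())


def joint_recover(mem_recovered: bool, aux_recovered: Dict[str, bool]) -> bool:
    """Recover when the memory-block condition holds and every auxiliary condition holds."""
    return mem_recovered and all(aux_recovered.values())


print(joint_degrade(True, {"ram": False, "gpu": True}))   # one auxiliary trigger suffices
print(joint_recover(True, {"ram": True, "gpu": False}))   # all auxiliaries must recover
```

The asymmetry (any for degradation, all for recovery) makes degradation easier to enter than to leave, which is consistent with the hysteresis design of the regular path.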

[0092] Example 3 To further improve computational efficiency at the edge, this invention introduces, on top of the macro-level temporal mode switching, a sparse inference mechanism in the micro-level spatial dimension that schedules the computation of the B-spline enhancement branch at fine spatial granularity. This embodiment details saliency mask generation, ROI percentage calculation, sparse computation execution, and the automatic fallback rule based on the break-even point, including the following steps: (i) Saliency mask generation: The spatial texture features F_spatial output in step S2 are used to generate the saliency mask M_sal. Specifically, the mean and standard deviation of F_spatial are calculated along the channel dimension to obtain the spatial saliency response map S_resp.

[0093] Here (i, j) denotes spatial coordinates, μ_F and σ_F denote the mean and standard deviation along the channel dimension, respectively, and β is an adjustable hyperparameter (e.g., β = 0.5). Adaptive thresholding (e.g., Otsu's method or a fixed percentile threshold) is applied to S_resp, comparing it with the saliency threshold, to obtain the binary saliency mask M_sal.

[0094] The region where M_sal = 1 is the region of interest (ROI), and the region where M_sal = 0 is the background region.

[0095] To prevent diseased areas with weak texture features from being misclassified as background and missed, this invention employs a conservative threshold strategy: the saliency threshold is set to a relatively low percentile of the spatial response map (e.g., the 50th to 70th percentile, preferably the 70th percentile rather than the 90th), so that the region of interest errs on the side of being larger rather than smaller. Furthermore, morphological dilation (e.g., 1 to 2 iterations with a 3×3 structuring element) is applied to the binary mask to expand its edges, ensuring complete coverage of potential disease areas and their neighborhoods. This conservative strategy trades a small amount of additional computation for a significant reduction in the risk of missed detections.
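The mask generation steps can be sketched in plain Python as below. The response formula used here, mean + β · standard deviation, is an assumption, since the original equation is not reproduced in this text; the nearest-rank percentile, the nested-list feature layout, and the name `saliency_mask` are likewise illustrative choices.

```python
import statistics


def saliency_mask(feat, beta=0.5, percentile=70, dilate_iters=1):
    """feat: list of C channels, each an H x W nested list. Returns an H x W 0/1 mask."""
    h, w = len(feat[0]), len(feat[0][0])
    resp = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [ch[i][j] for ch in feat]
            # Assumed combination of channel-wise mean and standard deviation
            resp[i][j] = statistics.fmean(vals) + beta * statistics.pstdev(vals)
    # Fixed-percentile threshold (nearest rank) over the whole response map
    flat = sorted(v for row in resp for v in row)
    thr = flat[min(len(flat) - 1, int(len(flat) * percentile / 100))]
    mask = [[1 if resp[i][j] >= thr else 0 for j in range(w)] for i in range(h)]
    # Morphological dilation with a 3x3 structuring element
    for _ in range(dilate_iters):
        src = mask
        mask = [[1 if any(src[y][x]
                          for y in range(max(0, i - 1), min(h, i + 2))
                          for x in range(max(0, j - 1), min(w, j + 2))) else 0
                 for j in range(w)] for i in range(h)]
    return mask


# Toy input: two 4x4 channels with a single strong response at (1, 1)
feat = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
feat[0][1][1] = 10.0
for row in saliency_mask(feat, percentile=95):
    print(row)
```

Because pixels equal to the threshold are kept, ties enlarge the ROI rather than shrink it, which is in line with the conservative strategy described above.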

[0096] (ii) Calculation of ROI percentage: The saliency mask M_sal is downsampled to the spatial resolution H_s × W_s of the B-spline enhancement branch feature map (H_s = W_s = 160 in this embodiment). The number of pixels marked as 1 in the mask is counted, and the ROI percentage is calculated as:

$$\text{ROI percentage} = \frac{1}{H_s W_s} \sum_{i=1}^{H_s} \sum_{j=1}^{W_s} M_{sal}(i, j)$$

[0097] The ROI percentage reflects the proportion of the current image where B-spline augmentation calculations need to be performed. For example, in a distant forest + sky scene, the sky area is marked as background, and the ROI percentage is about 20%; in a dense forest scene, most areas are regions of interest, and the ROI percentage can reach over 85%.

[0098] (III) Sparse computation execution: When the mode control factor α = 1 (high-precision mode), the B-spline enhancement branch is computed only on the regions of interest marked by the saliency mask. For the background region, the B-spline term computation is skipped, and only the basic activation terms are output. In practice, sparse computation operators or mask-based conditional execution mechanisms can be used.

[0099] In the mask-based formulation, the B-spline term is multiplied element-wise by M_sal after the mask has been aligned with the feature map size. At background pixel positions where M_sal = 0, the calculation of the B-spline term is skipped entirely, and only the output of the basic activation term is retained.

[0100] (iv) ROI percentage fallback rule: When the ROI percentage exceeds the fallback threshold, spatial sparsification is disabled and the B-spline enhancement branch reverts to full-map computation. The fallback threshold is the break-even point, determined from the time consumed by sparse computation and the time consumed by full-map computation on the target platform.
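Steps (ii) to (iv) can be combined in a short sketch. Here `base` and `spline` are per-element stand-ins for the basic activation and B-spline operators, the function names are hypothetical, and the threshold values in the example are arbitrary placeholders rather than calibrated break-even points.

```python
def roi_ratio(mask):
    """Fraction of mask pixels equal to 1."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


def spline_branch(x, mask, base, spline, alpha, fallback_thr):
    """x, mask: H x W nested lists. Returns the combined output as an H x W nested list."""
    if alpha == 0:
        # Low-power safe mode: the B-spline term is never evaluated
        return [[base(v) for v in row] for row in x]
    # Sparse only while the ROI share does not exceed the fallback threshold
    sparse = roi_ratio(mask) <= fallback_thr
    return [[base(v) + (spline(v) if (not sparse or m) else 0.0)
             for v, m in zip(xr, mr)] for xr, mr in zip(x, mask)]


x = [[1.0, 2.0], [3.0, 4.0]]
mask = [[1, 0], [0, 0]]            # ROI share is 25 %
identity = lambda v: v             # stand-in for the basic activation
bump = lambda v: 10.0              # stand-in for the B-spline term
print(spline_branch(x, mask, identity, bump, alpha=1, fallback_thr=0.5))  # sparse path
print(spline_branch(x, mask, identity, bump, alpha=1, fallback_thr=0.2))  # full-map fallback
```

The conditional expression short-circuits, so `spline` is not called at background positions in the sparse path; a production kernel would gather the ROI positions instead of looping per pixel.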

[0101] In this embodiment, the fallback threshold can be determined as follows: select a set of candidate ROI percentages on the target platform (e.g., 0.1 to 0.9 in steps of 0.05). For each candidate value, measure the average time of sparse computation and of full-map computation on a representative sample set (repeating the measurement several times and averaging). The ROI percentage at which the absolute difference between the two times is smallest is taken as the break-even point. When no candidate coincides exactly with the intersection, the candidate closest to the intersection is used as the fallback threshold.
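The calibration procedure amounts to an argmin over the candidate grid, as sketched below. The timing lists are synthetic placeholders generated from a made-up linear cost model, purely to exercise the function; on a real platform they would be replaced by averaged measurements.

```python
def break_even_threshold(candidates, t_sparse, t_full):
    """Return the candidate ROI percentage where |t_sparse - t_full| is smallest."""
    diffs = [abs(s - f) for s, f in zip(t_sparse, t_full)]
    return candidates[diffs.index(min(diffs))]


# Candidate grid 0.10, 0.15, ..., 0.90
candidates = [round(0.1 + 0.05 * k, 2) for k in range(17)]

# Synthetic timings in ms (NOT measured): sparse cost grows with the ROI share,
# full-map cost is constant
t_sparse = [2.0 + 12.0 * r for r in candidates]
t_full = [10.0 for _ in candidates]

print(break_even_threshold(candidates, t_sparse, t_full))
```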

[0102] (v) Synergistic effect of time-space dual saving mechanism: Time dimension: By using a dual-threshold hysteresis state machine, α is switched to 0 when the system load is too high, and the B-spline branch is completely shut down, achieving a significant saving in computing power and memory;

[0103] Spatial dimension: In the high-precision mode with α = 1, the B-spline calculation is focused only on potential disease areas or areas with complex canopy textures by using a saliency mask, skipping background areas (such as the sky and ground) that contribute little to the recognition results, thereby reducing unnecessary computational overhead; the amount of saving is related to the proportion of the region of interest (ROI) in the scene.

[0104] As shown in Figure 2, the time-space dual adaptive mechanism enables the system to flexibly switch between three energy efficiency states: (i) full high-precision mode (α = 1, full-map computation), (ii) sparse high-precision mode (α = 1, ROI computation only), and (iii) low-power safe mode (α = 0, B-splines completely off), thereby achieving fine-grained hierarchical scheduling of computing resources.

[0105] Example 4 This embodiment provides a recognition system that adaptively fuses visible light and hyperspectral dual-stream features, used to implement the method described in any one of Examples 1 to 3. As shown in Figure 3, the system includes a multimodal data preprocessing module, a spatial texture modeling module, a spectral nonlinear modeling module, a feature fusion module, a classification output module, and a computation mode control module.

[0106] (a) Multimodal data preprocessing module The multimodal data preprocessing module is used to preprocess, register, and construct input feature cubes for visible light and hyperspectral images. This module includes the following sub-units: The data acquisition unit acquires visible light and hyperspectral images of the same field of view through hardware synchronization triggering or time synchronization with a unified clock.

[0107] The radiometric calibration unit uses a black-and-white plate correction method to perform radiometric calibration on hyperspectral images and correct for differences in sensor response.

[0108] The geometric correction unit, based on preset ground control points, uses a polynomial correction model to perform geometric correction on hyperspectral images, reducing the impact of distortion and attitude changes.

[0109] The spectral dimensionality reduction unit performs principal component analysis on the hyperspectral image in the spectral dimension to obtain the dimensionality-reduced hyperspectral features.

[0110] The spatial registration unit uses the visible light image as a reference, employs ORB feature point matching and RANSAC to estimate the homography matrix, and resamples the dimension-reduced hyperspectral features to the pixel coordinate system of the visible light image.

[0111] The channel concatenation and normalization unit concatenates the registered visible light image and the dimension-reduced hyperspectral features along the channel dimension to form the input feature cube X.

[0112] (II) Spatial Texture Modeling Module The spatial texture modeling module is used to extract spatial texture features from the input feature cube. This module includes the following sub-units: The block embedding unit divides the input feature cube X into P×P blocks (P=4 in this embodiment), and maps it into a two-dimensional token feature map T through linear mapping or convolution.

[0113] The static index mapping table memory pre-stores the mapping relationship between two-dimensional pixel coordinates and one-dimensional sequence subscripts, supporting fast indexing for four scan paths: left→right, right→left, top→bottom, and bottom→top.

[0114] The multi-directional serialization / rearrangement unit performs index rearrangement on the two-dimensional token feature map during the inference phase to obtain a one-dimensional sequence S_k.
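A static index table for the four scan paths could be built as in the sketch below. The exact traversal order within each direction is an assumption: here right→left and bottom→top are taken to be the full reversals of the row-major and column-major orders, and the function names are illustrative.

```python
def build_scan_tables(h, w):
    """Map each scan direction to the list of flat (row-major) token indices in visiting order."""
    row_major = [i * w + j for i in range(h) for j in range(w)]
    col_major = [i * w + j for j in range(w) for i in range(h)]
    return {
        "left_to_right": row_major,
        "right_to_left": row_major[::-1],
        "top_to_bottom": col_major,
        "bottom_to_top": col_major[::-1],
    }


def serialize(tokens_flat, table):
    """Rearrange a flattened token map into a 1-D sequence using a precomputed table."""
    return [tokens_flat[k] for k in table]


tables = build_scan_tables(2, 3)          # built once, reused at every inference
tokens = ["a", "b", "c", "d", "e", "f"]   # 2 x 3 token map, flattened row by row
print(serialize(tokens, tables["top_to_bottom"]))
```

Since the tables depend only on the token map size, they can be computed once at load time, which is what makes the rearrangement a pure indexing operation during inference.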

[0115] The state-space recursive update unit is used to perform state-space recursive updates on a one-dimensional sequence, capturing long-distance dependencies within the sequence. This unit uses learnable state transition parameters to iteratively update each element of the sequence.

[0116] The spatial texture feature output unit aggregates the outputs in four directions using an element-wise summation method to output spatial texture features.

[0117] (III) Spectral Feature Modeling Module The spectral nonlinear modeling module extracts spectral features from the input feature cube. This module includes a basic activation branch and a switchable B-spline enhancement branch. During the model loading phase, the static weight parameters of the B-spline enhancement branch are locked in memory and cannot be released during inference. This module includes the following sub-units: The basic activation unit is used to perform a basic nonlinear transformation on the input features using the SiLU activation function. This branch always participates in the forward computation and is the basic path for spectral feature extraction.

[0118] The B-spline basis function calculation unit calculates the B-spline basis function values using the Cox-de Boor recursion formula. This unit precomputes a basis function lookup table during system initialization and retrieves basis function values from the table during inference, avoiding real-time recursive evaluation.
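The recursion and the precomputed table can be sketched as follows. The uniform knot vector, cubic degree, and grid resolution in the example are assumptions chosen for illustration; the patent text does not specify them.

```python
def bspline_basis(i, k, t, knots):
    """Value at t of the i-th B-spline basis function of degree k (Cox-de Boor recursion)."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + k] - knots[i]
    if d1 > 0:  # skip terms with zero-width support (repeated knots)
        left = (t - knots[i]) / d1 * bspline_basis(i, k - 1, t, knots)
    d2 = knots[i + k + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + k + 1] - t) / d2 * bspline_basis(i + 1, k - 1, t, knots)
    return left + right


def build_lut(knots, degree, n_samples):
    """Precompute all basis values on a uniform grid over [knots[degree], knots[-degree-1])."""
    n_basis = len(knots) - degree - 1
    lo, hi = knots[degree], knots[-degree - 1]
    grid = [lo + (hi - lo) * s / n_samples for s in range(n_samples)]
    table = [[bspline_basis(i, degree, t, knots) for i in range(n_basis)] for t in grid]
    return grid, table


knots = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # assumed uniform knot vector
grid, lut = build_lut(knots, degree=3, n_samples=8)
print([round(v, 4) for v in lut[0]])
```

At inference time an input would be mapped to the nearest grid index and the corresponding row read from the table; the table resolution then bounds the approximation error.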

[0119] The spline coefficient storage unit stores the static weight parameters—the spline coefficients—of the B-spline augmentation branches. This unit requests a memory lock from the operating system during model loading to ensure that these parameters are not replaced or released throughout the inference lifetime.

[0120] The mode control factor storage unit stores the mode control factor α, which takes the value of 0 or 1. When α is 1, the B-spline enhancement branch is enabled; when α is 0, the branch is disabled.

[0121] The conditional execution control unit determines whether to execute the B-spline augmentation branch based on the α value. When α is 0, the computation path corresponding to the branch is masked through a conditional branch or scheduling mask, and no storage space for intermediate tensors is allocated for it; when α is 1, the branch is enabled and the corresponding working area is allocated. The switching process does not restart the model or reload the weights.

[0122] The combined mapping output unit is used to selectively superimpose the output of the basic activation unit with the output of the B-spline enhancement branch. When α is 1, the two are superimposed; when α is 0, only the basic activation result is output, forming a combined nonlinear mapping.

[0123] The spectral feature output unit is used to output the spectral feature map after combined nonlinear mapping.

[0124] The saliency mask generation unit is used to generate saliency masks using spatial texture features. This unit calculates the mean and standard deviation along the channel dimension to construct a spatial saliency response map, and uses adaptive thresholding to obtain a binary mask. To ensure that potential disease areas are not missed, this unit adopts a conservative thresholding strategy (taking the 70th percentile) and performs morphological dilation on the mask boundaries.

[0125] The spatial sparsity inference control unit determines the B-spline calculation method based on the proportion of the region of interest when α is 1. This unit counts the proportion of pixels marked as 1 in the mask. If this proportion is lower than a preset backoff threshold, the B-spline is calculated only for the region of interest; if it is higher than the backoff threshold, it automatically backs down to full-image calculation. The backoff threshold is offline calibrated based on the break-even point between sparse and full-image calculations on the target platform.

[0126] (iv) Feature Fusion Module The feature fusion module is used to perform weighted fusion of spatial texture features and spectral features to obtain fused features. This module includes the following sub-units: The single-channel reference map construction unit is used to construct a single-channel reference map from the input feature cube, which can take the brightness component of the visible light image or the first principal component of the dimension-reduced hyperspectral features.

[0127] The quantization unit is used to quantize the reference map into 16 gray levels, reducing computational complexity.

[0128] The sliding window histogram statistics unit is used to statistically analyze grayscale histograms within a 9×9 sliding window and normalize them into a probability distribution.

[0129] The local entropy calculation unit is used to calculate the local entropy at the center of each window based on the probability distribution, forming an entropy prior map. Regions with high entropy values indicate rich texture, while regions with low entropy values indicate flat areas of the image.
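The quantization, histogram, and entropy units can be illustrated with a direct, unoptimized sketch. It assumes the reference map is normalized to [0, 1], clips windows at the image border, and reports Shannon entropy in bits; all three are choices made for this example, as is the name `local_entropy_map`.

```python
import math


def local_entropy_map(ref, levels=16, win=9):
    """ref: H x W nested list with values in [0, 1]. Returns an H x W local entropy map (bits)."""
    h, w = len(ref), len(ref[0])
    # Quantize to `levels` gray levels
    q = [[min(levels - 1, int(v * levels)) for v in row] for row in ref]
    r = win // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            hist = [0] * levels
            n = 0
            # Windows are clipped at the border (an assumption; padding would also work)
            for y in range(max(0, i - r), min(h, i + r + 1)):
                for x in range(max(0, j - r), min(w, j + r + 1)):
                    hist[q[y][x]] += 1
                    n += 1
            # Shannon entropy of the normalized histogram
            out[i][j] = -sum((c / n) * math.log2(c / n) for c in hist if c)
    return out


flat = [[0.5] * 10 for _ in range(10)]
checker = [[float((i + j) % 2) for j in range(10)] for i in range(10)]
print(local_entropy_map(flat)[5][5], round(local_entropy_map(checker)[5][5], 4))
```

The flat image yields zero entropy everywhere, while the checkerboard yields close to 1 bit, matching the interpretation that high entropy marks textured regions. The naive loop costs O(H·W·win²); an incremental sliding histogram would reduce this on an edge device.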

[0130] The depthwise convolutional unit is used to encode local spatial features of the entropy prior map using 3×3 depthwise convolution, and to extract the entropy distribution pattern in the neighborhood.

[0131] Point convolutional units are used to perform cross-channel feature interaction using 1×1 point convolution, fusing entropy information from different channels.

[0132] The Sigmoid normalization unit is used to apply the Sigmoid activation to the convolution output, mapping the weight values to the range of 0 to 1.

[0133] The global average pooling unit is used to compress the spatial dimension to 1, obtain the weight scalar of each feature channel, and form the channel-gated weight vector.

[0134] The channel-wise complementary fusion unit is used to perform channel-wise weighted fusion, which superimposes spatial texture features and spectral features according to channel weights.
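The gating path (3×3 depthwise convolution, 1×1 pointwise convolution, Sigmoid, global average pooling) and the fusion step can be sketched in plain Python. Two points are assumptions: the fusion is written in the complementary form g · spatial + (1 − g) · spectral, which the text suggests but does not spell out, and the pointwise layer is reduced to one scalar weight and bias per output channel because the entropy prior map has a single channel. All names are illustrative.

```python
import math


def conv3x3(img, k):
    """Single-channel 3x3 convolution (cross-correlation) with zero padding."""
    h, w = len(img), len(img[0])
    return [[sum(k[dy + 1][dx + 1] * img[i + dy][j + dx]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if 0 <= i + dy < h and 0 <= j + dx < w)
             for j in range(w)] for i in range(h)]


def channel_gates(entropy, k, pw_weight, pw_bias):
    """Depthwise 3x3 -> 1x1 pointwise -> Sigmoid -> global average pooling; one gate per channel."""
    enc = conv3x3(entropy, k)
    n = len(enc) * len(enc[0])
    gates = []
    for wc, bc in zip(pw_weight, pw_bias):
        s = sum(1.0 / (1.0 + math.exp(-(wc * v + bc))) for row in enc for v in row)
        gates.append(s / n)
    return gates


def fuse(f_spatial, f_spectral, gates):
    """Assumed complementary fusion per channel: g * spatial + (1 - g) * spectral."""
    return [[[g * a + (1 - g) * b for a, b in zip(ra, rb)]
             for ra, rb in zip(ca, cb)]
            for g, ca, cb in zip(gates, f_spatial, f_spectral)]


entropy = [[0.0] * 3 for _ in range(3)]
kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # placeholder weights, not trained values
gates = channel_gates(entropy, kernel, pw_weight=[1.0, 2.0], pw_bias=[0.0, 0.0])
print(gates, fuse([[[2.0]], [[2.0]]], [[[4.0]], [[4.0]]], gates))
```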

[0135] The fusion feature output unit is used to output the fused feature map.

[0136] (v) Classification Output Module The classification output module is used to classify the fused features and output the recognition results. This module includes the following sub-units: The global average pooling unit is used to perform global average pooling on the fused features, compressing the spatial information of each feature channel into a scalar value to obtain a one-dimensional feature vector.

[0137] The classifier unit uses a fully connected layer to map the feature vector to the number of categories to be identified, obtaining the original output score for each category. This embodiment sets up five categories for early monitoring of pine wilt disease: healthy, early-stage disease, mid-stage disease, late-stage disease, and others.

[0138] The Softmax normalization unit is used to apply the Softmax function to the original output score, converting it into a probability distribution where the sum of the probabilities of each category is 1.

[0139] The recognition result output unit is used to take the category corresponding to the maximum probability as the final recognition result, and output the corresponding confidence score. When the confidence score is lower than a preset threshold, the image frame can be marked as pending review, requiring manual confirmation or triggering a second detection.
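The Softmax and result output units can be sketched as below. The five class names follow this embodiment; the review threshold of 0.6 is an assumed placeholder, since the text only says the threshold is preset, and `classify` is an illustrative name.

```python
import math

CLASSES = ["healthy", "early-stage disease", "mid-stage disease",
           "late-stage disease", "other"]


def classify(logits, review_threshold=0.6):
    """Return (label, confidence, needs_review) from raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]   # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    # Low-confidence frames are flagged for manual confirmation or a second detection
    return CLASSES[best], probs[best], probs[best] < review_threshold


print(classify([5.0, 0.0, 0.0, 0.0, 0.0]))
print(classify([0.0, 0.0, 0.0, 0.0, 0.0]))
```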

[0140] (vi) Calculation mode control module The computational mode control module is used to collect the load index set of edge terminals in real time, dynamically update the mode control factor α using a state machine, and control whether the B-spline enhancement branch is executed through a conditional execution mechanism. This module includes the following sub-units: The performance data statistics unit is used to obtain the maximum contiguous allocatable memory block size by parsing the buddy system information in the Linux system's proc file system, and to obtain indicators such as RAM usage, CPU utilization, and GPU utilization through the system monitoring interface.

[0141] The temperature reading unit is used to obtain the processor chip temperature by reading the system temperature sensor's sysfs interface. If the device has multiple temperature zones, the maximum temperature value from the preset temperature zone set is used.

[0142] The indicator aggregation and comparison unit is used to push the collected indicator data into a circular queue for use by the state machine. The length of the circular queue meets the maximum requirement for continuous counting.

[0143] The trigger threshold determination unit stores and compares degradation trigger thresholds to determine whether the current sample meets the degradation conditions. For memory metrics, lower values indicate higher risk; for temperature metrics, higher values indicate higher risk.

[0144] The recovery threshold determination unit stores and compares recovery thresholds to determine whether the current sample meets the recovery conditions. The recovery conditions are more stringent than the trigger conditions, forming a hysteresis interval.

[0145] The hysteresis state machine unit maintains continuous degradation and continuous recovery counters, executing dual-threshold hysteresis logic. In high-precision mode, a degradation command is output when the indicator meets the degradation condition multiple times consecutively; in low-power safe mode, a recovery command is output when the indicator meets the recovery condition multiple times consecutively. The continuous counting mechanism effectively filters out false triggers caused by instantaneous fluctuations.

[0146] The α update control unit receives switching instructions from the state machine and updates the mode control factor α. This unit is also responsible for the emergency protection path: when a single sample meets the memory emergency threshold or the temperature emergency threshold, α is forcibly set to 0 without waiting for continuous count confirmation. The emergency protection path has the highest priority, ensuring that the system can quickly degrade in extreme situations.

[0147] This module connects to the spectral nonlinear modeling module via a control signal interface, outputting the mode control factor α to the conditional execution control unit. Through the collaborative work of these sub-units, this module achieves closed-loop control of "hardware perception → state machine decision → computation graph reconstruction," ensuring the continuity and stability of inference at the edge in resource-fluctuating scenarios.

[0148] The above six modules work together according to the process described above to form a complete recognition system.

Claims

1. A visible light-hyperspectral dual-stream feature adaptive fusion identification method, characterized in that it comprises the following steps:
S1: acquiring visible light and hyperspectral images, and constructing a feature cube;
S2: extracting features from the feature cube with a spatial feature extraction branch and a spectral feature extraction branch, respectively, and outputting spatial texture features and spectral features; wherein the spectral feature extraction branch includes a basic activation branch and a B-spline enhancement branch, the B-spline enhancement branch being enabled or disabled by a mode control factor; the mode control factor is dynamically updated by a hysteresis state machine based on a set of load indicators during operation of the edge terminal, the set of load indicators including the maximum contiguous allocatable memory block size; and the static weight parameters of the B-spline enhancement branch remain resident in memory and are not released when the B-spline enhancement branch is disabled;
S3: performing weighted fusion of the spatial texture features and the spectral features to obtain fused features, and outputting the recognition result based on the fused features.

2. The method of claim 1, wherein, The set of load metrics also includes one or more of the following: processor chip temperature, RAM usage, CPU utilization, and GPU utilization.

3. The method of claim 2, wherein, The hysteresis state machine is configured with a trigger threshold, a recovery threshold, and a continuous count: if the number of times the load index continuously reaches the trigger threshold reaches the continuous count, the B-spline enhancement branch is controlled to be turned off; if the number of times the load index continuously reaches the recovery threshold reaches the continuous count, the B-spline enhancement branch is controlled to be turned on.

4. The method of claim 3, wherein, The hysteresis state machine is also equipped with an emergency threshold: when a single sample reaches the emergency threshold, the B-spline enhancement branch is forced to close.

5. The method of claim 1, wherein, The weighted fusion includes: inputting the entropy prior map into a lightweight gated network to generate a channel weight vector, and fusing the spatial texture features and the spectral features according to the channel weight vector.

6. The method of claim 5, wherein, The entropy prior map is obtained by constructing a single-channel reference map, calculating the local information entropy of each pixel within a sliding window, and assigning the local information entropy to the corresponding pixel to form the entropy prior map.

7. The method of claim 1, wherein the spectral feature extraction branch further includes a sparsification calculation: the feature cube is divided into a region of interest and a background region according to the spatial texture features, and the B-spline enhancement branch is calculated only for the region of interest.

8. The method of claim 7, wherein the spectral feature extraction branch further includes a rollback mechanism: before the sparsification calculation is performed, the proportion of the region of interest in the sum of the region of interest and the background region is calculated as the region of interest ratio, and the region of interest ratio is compared with a preset threshold; if it exceeds the preset threshold, the calculation rolls back to full-image calculation.

9. The method of claim 8, wherein the preset threshold is determined as follows: measure the time consumed by the sparsification calculation and by the full-image calculation at different region of interest ratios, and select as the preset threshold the region of interest ratio at which the two time costs are closest.
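
The threshold selection of claim 9 and the rollback check of claim 8 can be sketched together. The function names and the timing figures below are made up for illustration; in practice the timings would be measured on the target edge terminal.

```python
import numpy as np

def pick_rollback_threshold(ratios, sparse_ms, full_ms):
    """Return the ROI ratio at which sparse and full-image timings are closest."""
    ratios = np.asarray(ratios, dtype=float)
    gap = np.abs(np.asarray(sparse_ms, dtype=float) - np.asarray(full_ms, dtype=float))
    return float(ratios[np.argmin(gap)])

def use_sparse_path(roi_mask, threshold):
    """True if sparsification is used, False if rolling back to the full image."""
    ratio = np.count_nonzero(roi_mask) / roi_mask.size
    return ratio <= threshold


# Hypothetical profiling results (milliseconds) at five ROI ratios.
ratios = [0.1, 0.3, 0.5, 0.7, 0.9]
sparse_ms = [2.0, 5.0, 8.0, 11.0, 14.0]
full_ms = [10.0, 10.0, 10.0, 10.0, 10.0]
threshold = pick_rollback_threshold(ratios, sparse_ms, full_ms)

half_mask = np.zeros((10, 10), dtype=bool)
half_mask[:5] = True                  # ROI covers 50% of the pixels
dense_mask = np.zeros((10, 10), dtype=bool)
dense_mask[:8] = True                 # ROI covers 80% of the pixels
decisions = (use_sparse_path(half_mask, threshold),
             use_sparse_path(dense_mask, threshold))
```

Above the selected ratio the gather and scatter overhead of sparsification is no longer paid back by the skipped pixels, which is why the full-image path is preferred there.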

10. A recognition system based on adaptive fusion of visible-light and hyperspectral dual-stream features, characterized in that it is used to implement the method of any one of claims 1-9 and comprises:
a multimodal data preprocessing module, used to preprocess and register the visible light and hyperspectral images and to construct the feature cube;
a spatial texture modeling module, including the spatial feature extraction branch for extracting spatial texture features from the feature cube;
a spectral feature modeling module, including the spectral feature extraction branch for extracting spectral features from the feature cube;
a feature fusion module, used to perform weighted fusion of the spatial texture features and the spectral features to obtain fused features;
a classification output module, used to classify the fused features and output the recognition result;
a calculation mode control module, including a state machine, used to collect the set of load indicators of the edge terminal in real time and to dynamically update the mode control factor according to the set of load indicators.