Agricultural hyperspectral reconstruction method and portable device
By improving and optimizing the MST++ model and combining it with the design of a portable device, the invention addresses the shortcomings of portable agricultural detection equipment in computing power and hardware integration, achieving efficient, low-power hyperspectral reconstruction that meets the needs of real-time field detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HARBIN INST OF TECH
- Filing Date
- 2026-05-06
- Publication Date
- 2026-06-12
Figure CN122199734A_ABST
Description
Technical Field
[0001] This invention relates to the field of agricultural detection technology, and in particular to an agricultural hyperspectral reconstruction method and a portable device.
Background Technology
[0002] Hyperspectral imaging technology, as a core tool for precision detection in modern agriculture, can simultaneously acquire the spatial texture and continuous spectral information of crops, providing strong data support for early diagnosis of pests and diseases, analysis of nutrient stress, and maturity assessment. However, traditional hyperspectral cameras are limited by physical bottlenecks such as large size, high cost, and slow data acquisition speed, making it difficult to achieve large-scale mobile detection in vast fields.
[0003] In recent years, deep learning-based computational spectral reconstruction technology has offered a new approach to solving this problem. It involves acquiring images using low-cost RGB cameras and then reconstructing hyperspectral data using algorithmic models. While advanced models such as MST++ based on the Transformer architecture have achieved significant breakthroughs in reconstruction accuracy, their large number of parameters and high computational costs limit current technologies to running primarily on servers or desktops equipped with high-performance graphics cards. This strong reliance on a fixed power supply and high-performance computing power sharply contradicts the mobile detection requirements of agricultural field operations, which demand portability, low power consumption, and no external power source.
[0004] Existing technical solutions have two main drawbacks. First, traditional portable agricultural testing equipment often uses dedicated chips with low computing power, which struggle to support complex deep neural networks; reconstruction accuracy drops significantly and fails to meet the testing standards of precision agriculture. Second, if the original model trained under the PyTorch framework is deployed directly on an embedded development board, it often faces technical barriers such as slow inference, video memory overflow, and excessive power consumption, making real-time video stream processing difficult to achieve.
[0005] Furthermore, the system design of existing agricultural detection terminals lacks in-depth hardware integration optimization for hyperspectral reconstruction tasks. Camera calibration, power management, and model inference lack an efficient coordination mechanism, and the human-computer interaction methods are limited, which greatly restricts the adoption of intelligent spectral detection technology in the field.
Summary of the Invention
[0006] The purpose of this invention is to provide an agricultural hyperspectral reconstruction method and a portable device, which aims to solve or improve at least one of the above-mentioned technical problems.
[0007] To achieve the above objectives, the present invention provides the following solution: an agricultural hyperspectral reconstruction method, comprising:
obtaining an agricultural hyperspectral dataset and dividing it into a training set and a validation set;
constructing a reconstruction model by improving the MST++ model, specifically: constraining the input image resolution to a fixed size; performing constant folding on the truncated normal distribution and variance scaling calculations used for model weight initialization; fusing the cascaded convolution, normalization, and activation structures in the MST++ model into a single equivalent operator; restructuring the multi-scale spectral attention (MS_MSA) module and replacing it with an integrated computational kernel; and restructuring the FeedForward module and replacing it with a single composite operator;
training the reconstruction model on the training set and validation set to generate the final reconstruction model;
deploying the final reconstruction model, setting a precision constraint strategy during the TensorRT compilation stage, and optimizing memory pooling and space reuse based on tensor lifetimes; and
acquiring real-time hyperspectral images and inputting them into the reconstruction model to generate reconstructed hyperspectral images.
[0008] Furthermore, performing constant folding on the truncated normal distribution and variance scaling calculations used for model weight initialization includes: pre-calculating the model weight initialization equations at compile time, including random sampling, the inverse error function transformation, variance scaling, and boundary truncation, and fixing the resulting final weight values as constant parameters at compile time.
[0009] Furthermore, fusing the cascaded convolution, normalization, and activation structures in the MST++ model into a single equivalent operator includes: the compiler pre-merges the scaling factor $\gamma$ and translation coefficient $\beta$ of the normalization layer with the original weights $W$ and bias $b$ of the convolutional layer, and the equivalent convolution parameters computed offline are expressed as $W' = \gamma \cdot W$, $b' = \gamma \cdot b + \beta$, where $W'$ is the merged weight and $b'$ is the merged bias term; the activation function is embedded into the convolution computation kernel.
[0010] Furthermore, restructuring the multi-scale spectral attention (MS_MSA) module and replacing it with an integrated computational kernel includes: merging the three independent linear projections of Q, K, and V into a single unified batched matrix transformation, and integrating the entire process of L2 normalization, matrix multiplication, scaling factor application, Softmax, and weighted aggregation of the value vectors into a single computational kernel.
[0011] Furthermore, restructuring the FeedForward module and replacing it with a single composite operator includes: while keeping the FeedForward module mathematically equivalent, logically chaining the weights of its multiple convolutions and embedding the GELU activation function directly into the hardware pipeline of the convolution computation, so that they are integrated into a single composite operator.
[0012] Furthermore, the precision constraint strategy includes: when building the inference engine, the FP32 single-precision floating-point calculation mode is used throughout.
[0013] Furthermore, optimizing memory pooling and space reuse based on tensor lifetimes includes: during the TensorRT engine build phase, the compiler identifies memory regions whose computation is complete and on which no subsequent node depends, and re-allocates them by overwriting.
[0014] A portable device includes: a battery, a power management unit, an acquisition unit, an inference unit, a storage unit, an interaction unit, and a heat dissipation unit. The battery powers the other modules and eliminates the need for an external power source. The power management unit includes overcharge, over-discharge, and overcurrent protection circuits. The acquisition unit acquires RGB images of crops and reduces field lighting interference through filter calibration to ensure image consistency. The inference unit deploys the TensorRT-optimized model and performs image preprocessing, model inference, and result reconstruction. The storage unit stores the deployment files and the intermediate files generated during device operation. The interaction unit visualizes the reconstruction result, accepts acquire / save / transmit commands through touch operation, and displays the battery level and device status, so that the entire process can be completed without an external computer. The heat dissipation unit dissipates heat during device operation.
[0015] According to the specific embodiments provided by the present invention, the present invention achieves the following technical effects: the invention discloses an agricultural hyperspectral reconstruction method and a portable device. By using TensorRT to build an inference engine for the MST++ model under the FP32 precision constraint, the running efficiency of the model on embedded devices is improved. While ensuring spectral reconstruction accuracy (PSNR > 31 dB), the method meets the timeliness requirements of real-time field detection.
[0016] Secondly, by employing a memory overwrite and re-allocation strategy, the memory footprint at model runtime is significantly reduced, enabling complex Transformer models to run stably on low-power edge devices, eliminating dependence on external power supplies and achieving portable mobile detection.
Description of the Drawings
[0017] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0018] Figure 1 is a schematic diagram of the method flow in this invention; Figure 2 is a schematic diagram of the device structure in this embodiment; Figure 3 is a schematic diagram of the operation flow of the device in this embodiment; Figure 4 is a schematic diagram comparing the reconstructed spectral data with that of the original MST++ model in this embodiment; Figure 5 is a schematic diagram comparing the single-channel loss after reconstruction in this embodiment.
Detailed Implementation
[0019] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0020] The purpose of this invention is to provide an agricultural hyperspectral reconstruction method and a portable device, which aims to solve or improve at least one of the above-mentioned technical problems.
[0021] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.
[0022] As shown in Figure 1, the present invention provides an agricultural hyperspectral reconstruction method, comprising:
Step 1: Obtain the agricultural hyperspectral dataset and divide it into a training set and a validation set.
Step 2: Improve the MST++ model to construct a reconstruction model. This involves: constraining the input image resolution to a fixed size; performing constant folding on the truncated normal distribution and variance scaling calculations used for model weight initialization; fusing the cascaded convolution, normalization, and activation structures in the MST++ model into a single equivalent operator; restructuring the multi-scale spectral attention (MS_MSA) module and replacing it with an integrated computational kernel; and restructuring the FeedForward module and replacing it with a single composite operator.
Performing constant folding on the truncated normal distribution and variance scaling calculations used for model weight initialization includes: pre-calculating the model weight initialization equations at compile time, including random sampling, the inverse error function transformation, variance scaling, and boundary truncation, and fixing the resulting final weight values as constant parameters at compile time. The model weight initialization equation is expressed as:
$$W = \mathrm{Clip}\left(\mu + \sigma\sqrt{2}\,\mathrm{erf}^{-1}(2u - 1),\ a,\ b\right)$$
where $W$ is the initialized model weight tensor; $u$ is a random variable sampled from a uniform distribution on the interval [0,1]; $\mathrm{erf}^{-1}(\cdot)$ is the inverse of the error function, used to map the uniform distribution back to a normal distribution; $\mu$ is the expected value of the normal distribution, which determines the central location of the weight initialization; $\sigma$ is the standard deviation of the normal distribution, which controls the dispersion and scaling of the weight initialization; $a$ is the lower truncation bound, i.e., the minimum weight that can be generated; $b$ is the upper truncation bound, i.e., the maximum weight that can be generated; and $\mathrm{Clip}(\cdot)$ is the truncation function, which ensures that the final weight values fall strictly within the interval [a, b], with any value outside this range clamped to the nearest boundary value.
[0023] The constant folding operation described above completely removes all complex mathematical operators related to weight initialization from the forward inference computation flow, thereby significantly simplifying the computation graph structure and eliminating redundant computational overhead.
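The idea can be illustrated with a minimal sketch, assuming the initialization follows the common inverse-error-function form W = Clip(μ + σ√2·erf⁻¹(2u − 1), a, b). The weights are drawn once at "build time" and frozen, so the forward pass only reads constants. The names `erfinv`, `trunc_normal_sample`, `fold_weights`, and `FOLDED_WEIGHTS`, as well as the default values of μ, σ, a, and b, are illustrative assumptions and are not taken from the patent.

```python
import random
from statistics import NormalDist

SQRT2 = 2.0 ** 0.5


def erfinv(x):
    # Inverse error function expressed through the standard normal quantile:
    # erfinv(x) = Phi^{-1}((x + 1) / 2) / sqrt(2), valid for -1 < x < 1.
    return NormalDist().inv_cdf((x + 1.0) / 2.0) / SQRT2


def trunc_normal_sample(u, mu=0.0, sigma=0.02, a=-2.0, b=2.0):
    # Map a uniform sample u in (0, 1) to a normal sample, then clip to [a, b].
    w = mu + sigma * SQRT2 * erfinv(2.0 * u - 1.0)
    return min(max(w, a), b)


def fold_weights(n, seed=0):
    # "Build time": run sampling, erfinv, scaling and truncation exactly once
    # and freeze the result as an immutable constant (hypothetical helper).
    rng = random.Random(seed)
    return tuple(
        trunc_normal_sample(rng.uniform(1e-9, 1.0 - 1e-9)) for _ in range(n)
    )


FOLDED_WEIGHTS = fold_weights(4)


def forward(x):
    # "Inference time": only a dot product with constants remains;
    # no sampling or erfinv call appears in this path.
    return sum(w * xi for w, xi in zip(FOLDED_WEIGHTS, x))
```

In a real deployment the frozen constants would be the trained weights exported with the model; the sketch only shows that the initialization operators need not appear in the inference graph.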
[0024] Specifically, fusing the cascaded convolution, normalization, and activation structures in the MST++ model into a single equivalent operator includes the following. The discrete computation process comprises three independent stages: the convolution operation, the linear transformation of normalization, and the nonlinear activation, expressed as:
$$Z_1 = W * X + b,\qquad Z_2 = \gamma \cdot Z_1 + \beta,\qquad Y = f(Z_2)$$
where $X$ is the input intermediate feature tensor; $W$ is the kernel weight of the convolutional layer; $b$ is the bias term of the convolutional layer; $\gamma$ is the scaling factor of the normalization; $\beta$ is the translation coefficient of the normalization; $f(\cdot)$ is the nonlinear activation function; $Z_1$ and $Z_2$ are the intermediate feature tensors; and $Y$ is the feature tensor finally output by the module.
Based on the associative and distributive laws of algebra, the above multi-step calculation can be equivalently rewritten as a single expression:
$$Y = f\big((\gamma \cdot W) * X + (\gamma \cdot b + \beta)\big)$$
During the underlying compilation and deployment phase, the compiler pre-merges the scaling factor $\gamma$ and translation coefficient $\beta$ of the normalization layer with the original weights $W$ and bias $b$ of the convolutional layer, and the equivalent convolution parameters computed offline are expressed as:
$$W' = \gamma \cdot W,\qquad b' = \gamma \cdot b + \beta$$
where $W'$ is the merged weight and $b'$ is the merged bias term. The activation function is embedded into the convolution computation kernel.
[0025] With this fusion strategy, the model completes the calculation during forward inference directly with the equivalent parameters. The two intermediate feature tensors (the convolution output and the normalization output) are eliminated entirely, along with the redundant writes to global memory. Data is processed in a pipelined manner within the on-chip storage of the computing unit (such as SRAM or registers), which significantly reduces memory access frequency and bandwidth usage and thereby greatly improves the overall inference throughput of the model.
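The algebra behind the merge can be checked with a small sketch. It assumes a 1×1 convolution evaluated at a single pixel (which reduces to a matrix-vector product), a per-channel scale γ and shift β, and ReLU as a stand-in activation; all function names are hypothetical, and the actual fusion in the described method is carried out by the TensorRT compiler rather than by code like this.

```python
def conv1x1(x, weights, bias):
    # 1x1 convolution at one pixel: a matrix-vector product plus bias.
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def relu(values):
    # Stand-in activation (an assumption; any pointwise activation works).
    return [max(0.0, v) for v in values]


def unfused(x, weights, bias, gamma, beta):
    # Three separate stages, each producing an intermediate tensor.
    z1 = conv1x1(x, weights, bias)
    z2 = [g * v + s for g, v, s in zip(gamma, z1, beta)]
    return relu(z2)


def fuse_params(weights, bias, gamma, beta):
    # Offline merge: W' = gamma * W, b' = gamma * b + beta (per output channel).
    fused_w = [[g * w for w in row] for g, row in zip(gamma, weights)]
    fused_b = [g * b + s for g, b, s in zip(gamma, bias, beta)]
    return fused_w, fused_b


def fused(x, fused_w, fused_b):
    # Single stage: convolution with merged parameters, activation applied inline.
    return relu(conv1x1(x, fused_w, fused_b))
```

Calling `fuse_params` once and then `fused` for every input gives the same output as `unfused`, up to floating-point rounding, without materializing `z1` or `z2`.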
[0026] Restructuring the multi-scale spectral attention (MS_MSA) module and replacing it with an integrated computational kernel includes the following. For the multi-scale spectral attention (MS_MSA) structure at the core of MST++, this invention integrates a series of discrete operators, such as QKV projection, normalization, attention matrix calculation, and weighted aggregation, into one integrated computational kernel.
[0027] The multi-scale spectral attention (MS_MSA) structure is expressed as:
$$Q = \mathrm{Norm}(XW_Q),\qquad K = \mathrm{Norm}(XW_K),\qquad V = \mathrm{Norm}(XW_V)$$
$$A = \mathrm{Softmax}\left(\alpha \cdot QK^{\top}\right),\qquad Y = AV$$
where $X$ is the input feature tensor; $W_Q$, $W_K$, and $W_V$ are the linear projection matrices that generate the query, key, and value, respectively; $\mathrm{Norm}(\cdot)$ is a normalization function used to stabilize training and enhance the discriminative power of the features; $Q$, $K$, and $V$ are the query, key, and value matrices obtained after projection and normalization; $\alpha$ is the scaling factor, set to $1/\sqrt{d_k}$, where $d_k$ is the dimension of the key vector; $A$ is the attention weight matrix, representing the correlation between different positions of the input features; and $Y$ is the final output feature tensor.
During fusion, the underlying computing framework merges the three independent linear projections of Q, K, and V into a single unified batched matrix transformation, and integrates the entire process of L2 normalization, matrix multiplication, scaling factor application, Softmax, and weighted aggregation of the value vectors into a single computational kernel.
[0028] The above steps eliminate intermediate data-rearrangement operators such as Transpose and Reshape and keep the Q, K, and V features resident in on-chip storage throughout, avoiding repeated reads and writes of hyperspectral features to external memory during the calculation and thus improving the parallel efficiency and memory access efficiency of the attention calculation.
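The projection-merging step can be sketched as follows. The example assumes row-wise L2 normalization of Q, K, and V, a scaling factor of 1/√d_k, and attention computed as Softmax(α·QKᵀ)·V; these choices follow the textual description, and the helper names are hypothetical. Pure Python cannot reproduce on-chip kernel fusion, so the sketch only demonstrates that one concatenated projection gives the same result as three separate ones.

```python
import math


def matmul(a, b):
    # Plain matrix product of two lists of rows.
    b_cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in b_cols] for row in a]


def l2norm_rows(m, eps=1e-12):
    # L2-normalize every row (the normalization axis is an assumption).
    return [[v / (math.sqrt(sum(t * t for t in row)) + eps) for v in row] for row in m]


def softmax_rows(m):
    # Numerically stable row-wise softmax.
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attend(q, k, v):
    # Scale, softmax, and aggregate the value vectors.
    alpha = 1.0 / math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[alpha * s for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)


def attention_separate(x, wq, wk, wv):
    # Reference: three independent projections.
    q, k, v = (l2norm_rows(matmul(x, w)) for w in (wq, wk, wv))
    return attend(q, k, v)


def attention_merged(x, wq, wk, wv):
    # One batched projection with W_qkv = [W_Q | W_K | W_V], then a split.
    d = len(wq[0])
    w_qkv = [rq + rk + rv for rq, rk, rv in zip(wq, wk, wv)]
    p = matmul(x, w_qkv)
    q = l2norm_rows([row[:d] for row in p])
    k = l2norm_rows([row[d:2 * d] for row in p])
    v = l2norm_rows([row[2 * d:] for row in p])
    return attend(q, k, v)
```

Both functions return the same matrix; the merged variant issues a single projection instead of three, which is the part of the optimization that is visible at this level of abstraction.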
[0029] Restructuring the FeedForward module and replacing it with a single composite operator includes the following. The FeedForward module of MST++ is a composite structure formed by cascading, in alternation, a 1×1 convolution, a GELU activation, a 3×3 depthwise convolution, a GELU activation, and a 1×1 convolution, expressed as:
$$H_1 = \mathrm{GELU}\big(\mathrm{Conv}_{1\times 1}(X)\big),\qquad H_2 = \mathrm{GELU}\big(\mathrm{DWConv}_{3\times 3}(H_1)\big),\qquad Y = \mathrm{Conv}_{1\times 1}(H_2)$$
where $X$ is the input feature tensor of the FeedForward module; $\mathrm{Conv}_{1\times 1}$ is the 1×1 convolution, mainly used to apply a linear transformation to the feature channels that increases or decreases their dimensionality; $H_1$ and $H_2$ are intermediate feature tensors; $\mathrm{GELU}$ is the activation function; $\mathrm{DWConv}_{3\times 3}$ is the depthwise convolution; and $Y$ is the final output of the FeedForward module.
In the fused implementation, the underlying computing framework keeps the mathematical expression of the FeedForward module fully equivalent, logically chains the weights of the multi-layer convolutions, and embeds the GELU activation function directly into the hardware pipeline of the convolution computation, integrating the whole structure into a single composite operator.
[0030] With this single composite operator, the hyperspectral feature map passes through the successive spatial transformations and nonlinear activations without the intermediate results being written to global video memory. Data is processed and passed along entirely within the high-speed on-chip cache of the computing unit, eliminating redundant video memory reads and writes, which significantly improves computing density and greatly reduces video memory bandwidth consumption.
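A rough way to see that the intermediate maps need not be stored is to compare a staged evaluation with a per-pixel evaluation that keeps every intermediate value local. The sketch assumes the 1×1 convolution, GELU, 3×3 depthwise convolution, GELU, 1×1 convolution layout, zero padding, and no bias terms; the function names are hypothetical, and the per-pixel version recomputes neighbours, which a real fused kernel would avoid through on-chip tiling.

```python
import math


def gelu(t):
    # Exact GELU: 0.5 * t * (1 + erf(t / sqrt(2))).
    return 0.5 * t * (1.0 + math.erf(t / math.sqrt(2.0)))


def pointwise(x, weights):
    # 1x1 convolution on a feature map stored as x[channel][row][col].
    height, width = len(x[0]), len(x[0][0])
    return [[[sum(weights[o][c] * x[c][i][j] for c in range(len(x)))
              for j in range(width)] for i in range(height)]
            for o in range(len(weights))]


def depthwise3x3(x, kernels):
    # 3x3 depthwise convolution with zero padding: one kernel per channel.
    height, width = len(x[0]), len(x[0][0])
    return [[[sum(kernels[c][di][dj] * plane[i + di - 1][j + dj - 1]
                  for di in range(3) for dj in range(3)
                  if 0 <= i + di - 1 < height and 0 <= j + dj - 1 < width)
              for j in range(width)] for i in range(height)]
            for c, plane in enumerate(x)]


def activate(x):
    return [[[gelu(v) for v in row] for row in plane] for plane in x]


def feedforward_staged(x, w1, kernels, w2):
    # Reference: every stage materializes a full intermediate feature map.
    h1 = activate(pointwise(x, w1))
    h2 = activate(depthwise3x3(h1, kernels))
    return pointwise(h2, w2)


def feedforward_pixel(x, w1, kernels, w2, i, j):
    # Same mathematics for one output pixel, with intermediates kept in locals.
    height, width = len(x[0]), len(x[0][0])
    h2 = []
    for m in range(len(w1)):
        acc = 0.0
        for di in range(3):
            for dj in range(3):
                p, q = i + di - 1, j + dj - 1
                if 0 <= p < height and 0 <= q < width:
                    h1 = gelu(sum(w1[m][c] * x[c][p][q] for c in range(len(x))))
                    acc += kernels[m][di][dj] * h1
        h2.append(gelu(acc))
    return [sum(w2[o][m] * h2[m] for m in range(len(h2))) for o in range(len(w2))]
```

For any pixel `(i, j)`, `feedforward_pixel` reproduces the corresponding entries of `feedforward_staged`, which is the equivalence the composite operator relies on.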
[0031] Step 3: Train the reconstruction model on the training set and validation set to generate the final reconstruction model.
Step 4: Deploy the final reconstruction model, set a precision constraint strategy during the TensorRT compilation stage, and optimize memory pooling and space reuse based on tensor lifetimes. The precision constraint strategy set during the TensorRT compilation stage is as follows: when building the inference engine, the FP32 single-precision floating-point calculation mode is used throughout.
[0032] The above settings ensure that all operator fusions and graph transformations are performed based on strict mathematical equivalence, thereby preserving the hyperspectral reconstruction accuracy of the original model to the greatest extent and ensuring the reliability of agricultural analysis results.
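To give a feel for why the precision mode matters, the following sketch round-trips a single normalized value through IEEE 754 half precision and single precision using the standard `struct` module. The sample value and the helper names are illustrative assumptions; the sketch does not call TensorRT and says nothing about how the engine is actually configured.

```python
import struct


def round_trip(value, fmt):
    # fmt "e" packs as IEEE 754 half precision, "f" as single precision.
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def rounding_error(value, fmt):
    # Absolute error introduced by storing the value in the given format.
    return abs(round_trip(value, fmt) - value)


sample = 0.123456789  # hypothetical normalized spectral intensity
err_fp16 = rounding_error(sample, "e")
err_fp32 = rounding_error(sample, "f")
```

For this sample, the half-precision error is several orders of magnitude larger than the single-precision error, which is the kind of per-value deviation an FP32-only engine avoids.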
[0033] Optimizing memory pooling and space reuse based on tensor lifetimes includes the following. During the TensorRT engine build phase, the compiler performs static analysis on the computation graph of the MST++ model, tracks the interval over which each intermediate feature map (tensor) is live, and determines the full lifetime of each tensor.
[0034] A global video memory pool is constructed. Once the computation that depends on an intermediate tensor (such as the output of the i-th layer) is complete, the corresponding video memory region is marked as reclaimable.
[0035] During the TensorRT engine build phase, the compiler identifies memory regions whose computation is complete and on which no subsequent node depends, and re-allocates these regions by overwriting. Specifically, when video memory must be allocated for a subsequent layer (such as the j-th layer) to store a new intermediate result, the memory manager does not request new physical video memory; instead, it searches the video memory pool for an old region that is large enough and has been marked as reclaimable.
[0036] If such a region exists, the new data is written directly into it, overwriting the same physical addresses.
[0037] If no such region exists, additional video memory is requested.
[0038] With the above settings, it is not necessary to allocate separate space for each intermediate tensor. This effectively reduces the peak memory usage of edge devices, fundamentally avoiding overflow crashes caused by insufficient video memory during hyperspectral reconstruction on embedded platforms, and ensuring that the model can run stably, continuously, and efficiently on resource-constrained edge devices.
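The reuse policy described above can be sketched as a small static planner. It assumes each tensor is described by a size, the step at which it is produced, and the last step at which it is read, and it picks the smallest reclaimable block that fits; the best-fit choice, the data layout, and the name `plan_memory` are assumptions for illustration and do not describe TensorRT internals.

```python
def plan_memory(tensors):
    """Assign tensors to pooled blocks based on their lifetimes.

    tensors: list of (name, size, produced_at, last_used_at), ordered by
    produced_at. Returns (assignment, total_pool_size).
    """
    pool = []          # each block: {"size": int, "free_after": int}
    assignment = {}
    for name, size, produced_at, last_used_at in tensors:
        # A block is reclaimable if its previous tenant is no longer read
        # by this or any later step, and it is large enough.
        candidates = [
            k for k, blk in enumerate(pool)
            if blk["free_after"] < produced_at and blk["size"] >= size
        ]
        if candidates:
            # Overwrite the smallest suitable old region (best fit).
            k = min(candidates, key=lambda idx: pool[idx]["size"])
        else:
            # No reclaimable region fits: request additional memory.
            pool.append({"size": size, "free_after": -1})
            k = len(pool) - 1
        pool[k]["free_after"] = last_used_at
        assignment[name] = k
    return assignment, sum(blk["size"] for blk in pool)


# A hypothetical four-layer chain: each output is read only by the next layer.
layers = [
    ("t0", 100, 0, 1),
    ("t1", 100, 1, 2),
    ("t2", 50, 2, 3),
    ("t3", 100, 3, 4),
]
assignment, pooled_size = plan_memory(layers)
naive_size = sum(size for _, size, _, _ in layers)
```

In this toy chain the planner ends up with two blocks of 100 units (`pooled_size` is 200) instead of one allocation per tensor (`naive_size` is 350), because `t2` overwrites the region of `t0` and `t3` overwrites the region of `t1`.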
[0039] Step 5: Acquire a real-time hyperspectral image and input it into the reconstruction model to generate the reconstructed hyperspectral image.
[0040] As shown in Figure 2, one embodiment provides a portable device applying the agricultural hyperspectral reconstruction method, comprising: a battery, a power management unit, an acquisition unit, an inference unit, a storage unit, an interaction unit, and a heat dissipation unit. The battery powers the other modules and eliminates the need for an external power source. The power management unit includes overcharge, over-discharge, and overcurrent protection circuits. The acquisition unit acquires RGB images of crops and reduces field lighting interference through filter calibration to ensure image consistency. The inference unit deploys the TensorRT-optimized model and performs image preprocessing, model inference, and result reconstruction. The storage unit stores the deployment files and the intermediate files generated during device operation. The interaction unit visualizes the reconstruction result, accepts acquire / save / transmit commands through touch operation, and displays the battery level and device status, so that the entire process can be completed without an external computer. The heat dissipation unit dissipates heat during device operation.
[0041] As shown in Figure 3, the operation flow of the device is as follows:
Power-on: The lithium battery powers the device, the terminal device starts and loads the TensorRT engine file, and the touchscreen displays the operation interface.
Image acquisition: The user triggers the acquisition command through the touchscreen, and the RGB camera is aimed at the crops to acquire 1080P RGB images.
Preprocessing: The image is scaled to the size expected by the model (e.g., 512×512) and normalized to [0,1], with FP32 precision used throughout.
Model inference: The TensorRT engine is called to perform hyperspectral reconstruction, outputting hyperspectral data in 31 bands (400-700 nm) within 800 ms to 1200 ms.
Result output: The reconstructed hyperspectral image (with selectable bands) is displayed on the touchscreen, and saving by touch is supported.
Low-power standby: After 30 seconds of inactivity, the screen brightness and the Jetson clock speed are automatically reduced to extend battery life.
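The preprocessing step can be sketched as follows, assuming 8-bit RGB input stored as nested lists of `(r, g, b)` tuples and nearest-neighbour resampling. The interpolation method and the function names are assumptions; the description only states that the image is scaled to the model size and normalized to [0,1].

```python
def resize_nearest(img, out_h, out_w):
    # Nearest-neighbour resampling of an image stored as img[row][col].
    in_h, in_w = len(img), len(img[0])
    return [
        [img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def normalize(img):
    # Map 8-bit channel values 0..255 to floats in [0, 1].
    return [[tuple(c / 255.0 for c in px) for px in row] for row in img]


def preprocess(img, size=512):
    # Scale to the fixed model input size, then normalize.
    return normalize(resize_nearest(img, size, size))
```

A 1080P frame passed to `preprocess` would come out as a 512×512 grid of float triples; a production pipeline would more likely use a vectorized library and a smoother interpolation filter.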
[0042] To verify the effectiveness of the present invention, experiments were carried out. As shown in Table 1, at FP32 precision, the reconstruction time for a single 512×512 RGB image is 800 ms to 1200 ms.
[0043] Table 1
[0044] Accuracy comparisons were made between the reconstructed hyperspectral image data and those reconstructed using the original MST++ model. Experimental results show that, at FP32 accuracy, the model has an average inference time of 924.05 ms, an MRSE of 0.000190, an RMSE as low as 0.000031, and a PSNR as high as 92.099869 dB. The extremely low error metrics and extremely high peak signal-to-noise ratio indicate that the reconstructed spectrum almost perfectly matches the original spectrum at the numerical level, demonstrating that the model's reconstruction accuracy has reached an extremely high level.
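The description does not give formulas for these metrics. As a hedged sketch, the conventional definitions of RMSE and PSNR over flattened spectra are shown below, assuming a peak value of 1.0 because the data are normalized to [0,1]; the function names and the peak value are assumptions, and the exact averaging used in the reported experiment may differ.

```python
import math


def rmse(reference, reconstructed):
    # Root-mean-square error between two equally long sequences.
    squared = [(a - b) ** 2 for a, b in zip(reference, reconstructed)]
    return math.sqrt(sum(squared) / len(squared))


def psnr(reference, reconstructed, peak=1.0):
    # Peak signal-to-noise ratio in dB: 20 * log10(peak / RMSE).
    error = rmse(reference, reconstructed)
    if error == 0.0:
        return float("inf")
    return 20.0 * math.log10(peak / error)
```

Under these definitions a smaller RMSE maps directly to a higher PSNR, which is why the two figures are reported together.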
[0045] To verify the reconstruction effect further and more intuitively, this embodiment uses a multi-dimensional visual comparison.
[0046] As shown in Figure 4, the reconstructed spectral data and the original spectral data were compared using 31-channel spectral curves. The curves overlap closely across the entire channel range with minimal deviation, indicating that the model stably reproduces the variation trends of the original spectrum in the different channels and that the reconstruction results exhibit good global consistency.
[0047] As shown in Figure 5, channel 10 was selected for a single-channel image comparison. The reconstructed single-channel image remains highly consistent with the original image in spatial texture and detail features, and the spatial distribution of spectral information shows no obvious distortion or blurring, verifying the effectiveness of the model in preserving spatial details and spectral information. Taking the quantitative indicators and the visualization results together, the model demonstrates excellent accuracy and robustness in the spectral reconstruction task, and the reconstruction effect reaches an ideal level.
[0048] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other.
[0049] This document uses specific examples to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the core ideas of the present invention. Furthermore, those skilled in the art will recognize that, based on the ideas of the present invention, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of the present invention.
Claims
1. An agricultural hyperspectral reconstruction method, characterized in that it comprises: obtaining an agricultural hyperspectral dataset and dividing it into a training set and a validation set; constructing a reconstruction model by improving the MST++ model, specifically: constraining the input image resolution to a fixed size; performing constant folding on the truncated normal distribution and variance scaling calculations used for model weight initialization; fusing the cascaded convolution, normalization, and activation structures in the MST++ model into a single equivalent operator; restructuring the multi-scale spectral attention (MS_MSA) module and replacing it with an integrated computational kernel; and restructuring the FeedForward module and replacing it with a single composite operator; training the reconstruction model on the training set and validation set to generate the final reconstruction model; deploying the final reconstruction model, setting a precision constraint strategy during the TensorRT compilation stage, and optimizing memory pooling and space reuse based on tensor lifetimes; and acquiring real-time hyperspectral images and inputting them into the reconstruction model to generate reconstructed hyperspectral images.
2. The agricultural hyperspectral reconstruction method according to claim 1, characterized in that performing constant folding on the truncated normal distribution and variance scaling calculations used for model weight initialization includes: pre-calculating the model weight initialization equations at compile time, including random sampling, the inverse error function transformation, variance scaling, and boundary truncation, and fixing the resulting final weight values as constant parameters at compile time.
3. The agricultural hyperspectral reconstruction method according to claim 1, characterized in that fusing the cascaded convolution, normalization, and activation structures in the MST++ model into a single equivalent operator includes: the compiler pre-merges the scaling factor $\gamma$ and translation coefficient $\beta$ of the normalization layer with the original weights $W$ and bias $b$ of the convolutional layer, and the equivalent convolution parameters computed offline are expressed as $W' = \gamma \cdot W$, $b' = \gamma \cdot b + \beta$, where $W'$ is the merged weight and $b'$ is the merged bias term; the activation function is embedded into the convolution computation kernel.
4. The agricultural hyperspectral reconstruction method according to claim 1, characterized in that restructuring the multi-scale spectral attention (MS_MSA) module and replacing it with an integrated computational kernel includes: merging the three independent linear projections of Q, K, and V into a single unified batched matrix transformation, and integrating the entire process of L2 normalization, matrix multiplication, scaling factor application, Softmax, and weighted aggregation of the value vectors into a single computational kernel.
5. The agricultural hyperspectral reconstruction method according to claim 1, characterized in that restructuring the FeedForward module and replacing it with a single composite operator includes: while keeping the FeedForward module mathematically equivalent, logically chaining the weights of its multiple convolutions and embedding the GELU activation function directly into the hardware pipeline of the convolution computation, so that they are integrated into a single composite operator.
6. The agricultural hyperspectral reconstruction method according to claim 1, characterized in that the precision constraint strategy includes: when building the inference engine, the FP32 single-precision floating-point calculation mode is used throughout.
7. The agricultural hyperspectral reconstruction method according to claim 1, characterized in that optimizing memory pooling and space reuse based on tensor lifetimes includes: during the TensorRT engine build phase, the compiler identifies memory regions whose computation is complete and on which no subsequent node depends, and re-allocates them by overwriting.
8. A portable device applying the method of any one of claims 1-7, characterized in that it comprises: a battery, a power management unit, an acquisition unit, an inference unit, a storage unit, an interaction unit, and a heat dissipation unit; the battery powers the other modules and eliminates the need for an external power source; the power management unit includes overcharge, over-discharge, and overcurrent protection circuits; the acquisition unit acquires RGB images of crops and reduces field lighting interference through filter calibration to ensure image consistency; the inference unit deploys the TensorRT-optimized model and performs image preprocessing, model inference, and result reconstruction; the storage unit stores the deployment files and the intermediate files generated during device operation; the interaction unit visualizes the reconstruction result, accepts acquire / save / transmit commands through touch operation, and displays the battery level and device status, so that the entire process can be completed without an external computer; and the heat dissipation unit dissipates heat during device operation.