A physically constrained illumination adaptive spectral reflectance reconstruction system
By employing a physically constrained deep unfolding framework based on the Alternating Direction Method of Multipliers (ADMM), combined with an asymmetric U-Net architecture, the invention addresses spectral reflectance reconstruction under unknown lighting conditions. It enables accurate recovery of the spectral reflectance of materials from a single RGB image, improving both reconstruction accuracy and computational efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING INST OF TECH
- Filing Date
- 2026-03-19
- Publication Date
- 2026-06-23
AI Technical Summary
Existing spectral reflectance reconstruction methods under unknown or dynamic lighting conditions suffer from limitations of known lighting assumptions, underdeterminacy of single RGB images, dependence on auxiliary hardware, and lack of end-to-end learning and lighting compensation mechanisms with physical constraints, resulting in unstable reconstruction results and poor generalization ability.
A physically constrained deep unfolding framework is adopted, combined with the Alternating Direction Method of Multipliers (ADMM) and an asymmetric U-Net architecture. The data preprocessing module lifts the 3-channel RGB image to C spectral channels. Model-driven and data-driven modules then perform iterative optimization. By combining global and local feature modeling, illumination-adaptive spectral reflectance reconstruction is achieved.
Achieving robust spectral reflectance reconstruction under unknown or complex lighting conditions improves reconstruction accuracy and computational efficiency, maintains consistent reconstruction accuracy across different color temperatures and indoor/outdoor scenes, and enhances PSNR performance.
Figure CN122265063A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of computer vision and computational imaging technology, and in particular relates to an illumination-adaptive spectral reflectance reconstruction system based on physical constraints.
Background Technology
[0002] Spectral reflectance characterizes the intrinsic optical properties of a material at different wavelengths and remains unchanged with illumination conditions. Each material possesses unique spectral characteristics, like an optical "fingerprint," which can be used to accurately identify and distinguish objects that might be indistinguishable under traditional RGB cameras. Spectral reflectance plays a crucial role in remote sensing, medical diagnostics, agricultural monitoring, and materials analysis. Traditional methods for obtaining spectral reflectance rely on hyperspectral imaging systems, which employ spectral scanning and typically require long exposure times and bulky optical components, limiting their applicability in real-time and large-scale applications.
[0003] Spectral reconstruction can be categorized into two types based on the reconstruction objective: hyperspectral image reconstruction (HIR), which recovers radiance, and spectral reflectance reconstruction (SRR). Radiance represents the actual light intensity captured by a sensor at each wavelength and varies with the illumination conditions at the time of capture. Conversely, reflectance characterizes the intrinsic light-interaction properties of the material itself and remains constant regardless of illumination variations. While radiance reconstruction has been extensively studied, reflectance reconstruction faces more fundamental challenges.
[0004] Deep unfolding networks integrate iterative optimization algorithms into trainable network layers, enabling a combination of traditional optimization methods and the principles of deep learning. This approach offers both the interpretability of optimization models and the expressive power of neural networks, providing a promising strategy for solving complex inverse problems. Deep unfolding methods have demonstrated excellent performance in hyperspectral image reconstruction. However, the application of deep unfolding in spectral reflectance reconstruction remains limited.
[0005] Early optimization-based methods relied on prior knowledge from hand-designed methods, such as principal component analysis (PCA) and Wiener estimation. The core techniques of these methods include:
[0006] (1) PCA method: Ayala et al. (2006) and Parkkinen et al. (1989) represented reflectance as a linear combination of basis vectors and reconstructed the spectrum through a small number of principal components. This method assumes that the spectrum can be represented by a finite number of orthogonal basis functions and obtains the spectral basis by performing singular value decomposition on the training data.
[0007] (2) Wiener estimation method: Stigell et al. (2007) and Nishidate et al. (2013) formulated the reconstruction problem as a minimum mean square error optimization problem, estimated the optimal linear transformation matrix through statistical methods, and predicted the spectral reflectance from the RGB values.
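The Wiener estimation idea described above can be illustrated with a minimal NumPy sketch. It assumes the noise-free textbook form of the estimator, W = K S^T (S K S^T)^(-1), where K is the autocorrelation of training spectra and S is a 3×C camera response with the known illuminant already folded in. The function name `wiener_matrix`, the random data, and the channel count of 31 are hypothetical and are not taken from the cited works.

```python
import numpy as np

def wiener_matrix(train_reflectance, response):
    """Noise-free Wiener (MMSE) estimator mapping RGB to spectra.

    train_reflectance: (N, C) training spectra.
    response: (3, C) camera response (illuminant assumed known and folded in).
    Returns W with shape (C, 3).
    """
    n = train_reflectance.shape[0]
    # Autocorrelation matrix of the training spectra, shape (C, C).
    k_r = train_reflectance.T @ train_reflectance / n
    # W = K S^T (S K S^T)^{-1}
    return k_r @ response.T @ np.linalg.inv(response @ k_r @ response.T)

rng = np.random.default_rng(0)
spectra = rng.random((200, 31))      # hypothetical training reflectances
response = rng.random((3, 31))       # hypothetical camera response
W = wiener_matrix(spectra, response)

rgb = spectra @ response.T           # simulated RGB observations, (200, 3)
recovered = rgb @ W.T                # estimated spectra, (200, 31)
```

The estimate always reproduces the observed RGB values exactly, but many different spectra share the same RGB values, which is the underdetermination that the later paragraphs discuss.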
[0008] However, these methods typically assume that lighting conditions are known or standardized, which limits their applicability in complex real-world environments with varying lighting. Prior knowledge of the lighting necessitates either the use of standard light sources (such as CIE D65) or on-site calibration using gray cards or color standard boards, significantly limiting the flexibility of practical deployment.
[0009] The emergence of deep learning has enabled end-to-end learning of reflectance reconstruction. The main technical solutions include: (1) Basic CNN method: Nguyen et al. (2014) proposed training a convolutional neural network to predict spectral reflectance directly from RGB images. This method uses white balance preprocessing to approximate illumination normalization, attempting to eliminate the influence of illumination by standardizing RGB values. The network architecture employs multiple convolutional and fully connected layers to learn the RGB-to-spectral mapping end-to-end.
[0010] (2) Joint learning of illumination and reflectance: Fu et al. (2018) attempted to jointly learn illumination and reflectance through sparse representation, decomposing the problem into two sub-problems, illumination estimation and reflectance recovery, and solving them by alternating optimization.
[0011] (3) Adaptive camera spectral sensitivity: Li et al. (2020) proposed an adaptive weighted attention network that combines the prior of camera spectral response and adaptively weights the reconstruction of different spectral channels through an attention mechanism.
[0012] However, methods relying solely on a single RGB image remain inadequate because they attempt to infer two distinct physical quantities (reflectance and illumination) from only three input channels, resulting in limited reconstruction accuracy. Under unknown or dynamic lighting conditions, these methods struggle to perform reliable illumination compensation, reducing stability and consistency in practical applications.
[0013] To mitigate the underdetermined nature of single-RGB-image methods, researchers have proposed incorporating auxiliary information: (1) Multi-lighting capture method: Park et al. (2007) used multiplexed lighting to capture multiple images of the same scene under different known light sources. Because the changing lighting conditions provide spectral diversity, the separation of reflectance and illumination becomes more reliable. This method requires a controllable lighting environment and multiple synchronized captures.
[0014] (2) Meta-learning framework: Huo et al. (2024) proposed the SRR-MAXL method, which integrates dual-LED acquisition with meta-assisted learning and image-by-image optimization, and captures image pairs by using two LEDs with different spectral characteristics for alternating illumination. The meta-learning framework learns a general representation across illumination conditions and optimizes it for each image at test time, which significantly improves reconstruction fidelity.
[0015] While these methods are more robust in separating illumination and reflectance, they require specialized hardware (such as programmable LED arrays or multispectral illumination systems) or involve high computational complexity (such as test-time optimization requiring dozens of forward-backward propagations per image), limiting practical deployment.
[0016] Deep unfolding networks have demonstrated excellent performance in hyperspectral radiance reconstruction: (1) GAP-Net: Meng et al. (2023) combined generalized alternating projection with a CNN denoiser to achieve high-fidelity reconstruction in coded aperture snapshot spectral imaging (CASSI) systems. The ADMM optimization algorithm was unfolded into a multi-stage network, with each stage containing a data fidelity term and a prior regularization term.
[0017] (2) DAUHST: Cai et al. (2022) introduced a Transformer-based unfolding framework and used degradation-aware parameter estimation to further improve reconstruction accuracy. By learning the parameters of the system degradation model, the reconstruction process is adaptively adjusted.
[0018] (3) DPU: Zhang et al. (2024) used a dual prior mechanism and an asymmetric encoder to significantly improve computational efficiency while maintaining performance. They used two complementary prior representations, local and global, to capture fine textures and global structures, respectively.
[0019] While these methods have achieved success in radiance reconstruction, the application of deep unfolding to spectral reflectance reconstruction remains limited. SRR requires not only physically guided optimization but also adaptive compensation for unknown scene lighting, a requirement not yet met by existing unfolding frameworks.
[0020] In summary, the existing technology has the following drawbacks: (1) Limitations of known lighting assumptions: Traditional optimization methods assume that lighting conditions are known or standardized, requiring gray card calibration or standard light sources, which severely limits their applicability in real dynamic lighting scenarios. In practical applications, scene lighting is often an unknown and complex mixture of light sources, which may include a combination of sunlight, artificial light and environmental reflections.
[0021] (2) Underdetermined nature of single RGB images: Simultaneously recovering reflectance and illumination from only three channels is inherently an underdetermined problem. Although CNN methods can learn complex mappings, they struggle to reliably separate these two intertwined physical quantities without additional constraints, so the reconstruction results are sensitive to changes in illumination and generalize poorly.
[0022] (3) Dependence on auxiliary hardware: Although multi-lighting methods and meta-learning frameworks improve separation robustness, they require dedicated hardware (dual-LED systems, programmable lighting arrays) or intensive test-time optimization (dozens of iterations per image), which limits practical deployment. The high hardware cost and system complexity make it difficult to promote in consumer applications.
[0023] (4) End-to-end learning lacking physical constraints: Purely data-driven deep learning methods lack the constraints of physical imaging models, are prone to overfitting the lighting distribution in the training data, and experience performance degradation under new lighting conditions. The network learns statistical regularities in the dataset rather than physical regularities, thus limiting its generalization ability.
[0024] (5) Lack of illumination compensation mechanism: Existing deep unfolding frameworks are mainly used for radiance reconstruction and do not consider illumination adaptability. In the reflectance reconstruction task, illumination must be dynamically estimated and compensated during the iterative optimization process, and existing unfolding frameworks lack this key mechanism.
[0025] (6) Limitations of global-local feature modeling: Traditional CNN methods are limited by a finite receptive field, making it difficult to capture long-range spectral correlations and global consistency. Although Transformer can model global dependencies, its computational complexity is high. Existing methods lack an efficient mechanism to balance local fine textures and global structure.
Summary of the Invention
[0026] To address the aforementioned issues, this invention provides a physically constrained illumination-adaptive spectral reflectance reconstruction system. Through a deep unfolding framework based on physical constraints, it achieves accurate reconstruction of spectral reflectance from a single RGB image without requiring known lighting or auxiliary hardware, and maintains robust performance under diverse lighting conditions.
[0027] A physically constrained illumination-adaptive spectral reflectance reconstruction system includes a data preprocessing module and a deep unfolding network part. The data preprocessing module lifts the initial 3-channel RGB image $Y$ to a set number $C$ of spectral channels. The deep unfolding network part consists of $K$ cascaded driving components, each obtained by cascading a model-driven module, a data-driven module, and a dual variable update module. When $k \ge 2$, the $k$-th model-driven module $\mathcal{M}_k$, based on the received 3-channel initial RGB image $Y$, the reflectance features $Z^{k-1}$ output by the $(k-1)$-th data-driven module, the dual variable $U^{k-1}$ output by the $(k-1)$-th dual variable update module, and the illumination estimate $L^{k-1}$ output by the $(k-1)$-th model-driven module, calculates and corrects the deviation between the currently predicted RGB image and the actually observed RGB image, and outputs the updated spectral features $X^{k}$ and a new illumination estimate $L^{k}$. Then, the $k$-th data-driven module $\mathcal{D}_k$ performs spatial-spectral regularization on the spectral features $X^{k}$ and outputs refined reflectance features $Z^{k}$ for use by the next driving component; at the same time, the dual variable update module, which has no explicit network structure, obtains the dual variable $U^{k}$ from the difference between $X^{k}$ and $Z^{k}$. The final output $Z^{K}$ of the $K$-th data-driven module is the reconstructed spectral reflectance image. When $k = 1$, the $C$-channel spectral image output by the data preprocessing module is used as the initial reflectance features $Z^{0}$, an all-ones tensor of length $C$ is used as the initial illumination estimate $L^{0}$, and an all-zero tensor with the same dimensions as $Z^{0}$ is used as the initial dual variable $U^{0}$.
[0028] Furthermore, the $k$-th model-driven module $\mathcal{M}_k$ obtains the spectral features $X^{k}$ and the new illumination estimate $L^{k}$ as follows. The current radiance feature baseline $B^{k}$ is obtained from the reflectance features $Z^{k-1}$, the dual variable $U^{k-1}$, and the illumination estimate $L^{k-1}$ of the previous driving component:
$$B^{k} = \left(Z^{k-1} + U^{k-1}\right) \odot L^{k-1}$$
[0029] where $\odot$ indicates element-wise multiplication. $B^{k}$ is mapped to an RGB image through the camera's spectral response function $\Phi$, giving the forward-predicted RGB image $\hat{Y}^{k} = \Phi(B^{k})$. The deviation between the predicted RGB image $\hat{Y}^{k}$ and the actually observed initial RGB image $Y$ is then calculated as $E^{k} = Y - \hat{Y}^{k}$. A learnable inverse mapping network $\Phi^{\dagger}$ back-projects the deviation $E^{k}$ to the high-dimensional spectral domain, yielding the bias-corrected spectral image $\Delta B^{k} = \Phi^{\dagger}(E^{k})$. The bias-corrected spectral image $\Delta B^{k}$ is added to the radiance feature baseline $B^{k}$ to obtain the updated high-fidelity radiance features $\hat{B}^{k}$:
$$\hat{B}^{k} = B^{k} + \Delta B^{k}$$
[0030] Scene-level lighting cues are extracted from $B^{k}$ through the global average pooling operation $\mathrm{IA}(\cdot)$ to obtain the new illumination estimate $L^{k}$:
$$L^{k} = \mathrm{IA}\left(B^{k}\right)$$
[0031] Each channel of the updated high-fidelity radiance features $\hat{B}^{k}$ is divided by the new illumination estimate $L^{k}$ to decouple the updated spectral features $X^{k}$:
$$X^{k} = \hat{B}^{k} \oslash L^{k}$$
[0032] where $\oslash$ indicates element-wise division.
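The model-driven update just described can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the camera response is taken to be a 3×C matrix `phi`, the learnable inverse mapping network is replaced by a fixed pseudo-inverse stand-in, the illumination adapter is replaced by a sigmoid of the per-channel mean, the adapter is fed the radiance baseline, and the sign of the RGB deviation is chosen as observed minus predicted. All function and variable names are hypothetical.

```python
import numpy as np

def model_driven_step(y, z_prev, u_prev, l_prev, phi, inverse_map, illum_adapter):
    """One model-driven update.

    y: (H, W, 3) observed RGB; z_prev, u_prev: (H, W, C); l_prev: (C,);
    phi: (3, C) camera response; inverse_map and illum_adapter are callables.
    """
    baseline = (z_prev + u_prev) * l_prev    # radiance baseline, (H, W, C)
    y_hat = baseline @ phi.T                 # forward-predicted RGB, (H, W, 3)
    deviation = y - y_hat                    # assumed sign: observed - predicted
    correction = inverse_map(deviation)      # back-projection to (H, W, C)
    radiance = baseline + correction         # skip connection
    l_new = illum_adapter(baseline)          # assumed input: the baseline
    x = radiance / l_new                     # element-wise division per channel
    return x, l_new

rng = np.random.default_rng(0)
phi = rng.random((3, 8))
y = rng.random((4, 4, 3))
z0 = rng.random((4, 4, 8))
u0 = np.zeros_like(z0)
l0 = np.ones(8)

# Stand-ins for the learned components (assumptions, not the patented networks).
pinv_map = lambda e: e @ np.linalg.pinv(phi).T
gray_world = lambda b: 1.0 / (1.0 + np.exp(-b.mean(axis=(0, 1))))

x1, l1 = model_driven_step(y, z0, u0, l0, phi, pinv_map, gray_world)
```

With the pseudo-inverse stand-in, the corrected radiance `x1 * l1` reprojects exactly onto the observed RGB image, which shows the data fidelity role of the correction step; a learned inverse mapping would only approximate this.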
[0033] Furthermore, the global average pooling operation $\mathrm{IA}(\cdot)$ specifically includes the following steps, performed sequentially on the radiance feature baseline $B^{k}$: global average pooling, a first 1×1 convolution, Leaky ReLU activation, a second 1×1 convolution, and Sigmoid activation, to obtain the new illumination estimate $L^{k}$.
[0034] Furthermore, each data-driven module consists of an encoder and a decoder arranged as mirror images of each other; the encoder includes three local Mamba block (LMB) feature extraction modules separated by downsampling operations, and the decoder includes three global Mamba block (GMB) feature extraction modules separated by upsampling operations. The $k$-th data-driven module $\mathcal{D}_k$ obtains the reflectance features $Z^{k}$ as follows. The spectral features $X^{k}$ first pass through the first LMB to extract shallow local features. A first downsampling then halves the spatial size of the shallow local features and doubles the number of feature channels. The output of the first downsampling passes through the second LMB for mid-level feature modeling and then undergoes a second downsampling. Finally, the output of the second downsampling passes through the third LMB at the deepest layer to obtain the final encoded semantic features. The encoded semantic features are processed by a first 1×1 convolution and then fed into the first GMB for global context information fusion. A first upsampling then doubles the spatial size of the first GMB's output and halves the number of feature channels. The first upsampled output and the encoded semantic features are fed into a second 1×1 convolution for feature fusion, followed by the second GMB for global context information fusion. A second upsampling then doubles the spatial size of the second GMB's output and halves the number of feature channels. The second upsampled output and the output of the second LMB are fed into a third 1×1 convolution for feature fusion, followed by the third GMB for global context information fusion. Finally, the output of the third GMB is used as the reflectance features $Z^{k}$.
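To make the resolution and channel bookkeeping of this encoder-decoder easier to follow, the short Python sketch below traces the feature-map shape seen by each block. It only tracks shapes; the blocks themselves and the skip-connection fusion are not modeled. The function name `trace_shapes` and the example sizes (64×64 input, 28 channels) are hypothetical.

```python
def trace_shapes(height, width, channels):
    """Trace (H, W, C) through three LMB stages and three GMB stages."""
    h, w, c = height, width, channels
    shapes = [("LMB 1", (h, w, c))]
    for name in ("LMB 2", "LMB 3"):
        # Each downsampling halves the spatial size and doubles the channels.
        h, w, c = h // 2, w // 2, c * 2
        shapes.append((name, (h, w, c)))
    shapes.append(("GMB 1", (h, w, c)))
    for name in ("GMB 2", "GMB 3"):
        # Each upsampling doubles the spatial size and halves the channels.
        h, w, c = h * 2, w * 2, c // 2
        shapes.append((name, (h, w, c)))
    return shapes

for name, shape in trace_shapes(64, 64, 28):
    print(name, shape)
```

The first and last entries have the same shape, so the output of the third GMB can serve directly as the reflectance features for the next stage.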
[0035] Furthermore, any GMB includes sequentially cascaded layer normalization processing layers, global / window Mamba layers, 1×1 dimensionality reduction convolutional layers, Leaky ReLU activation function layers, and 1×1 dimensionality increase convolutional layers; the final output of the GMB is the sum of the final output of the global / window Mamba layer and the output of the 1×1 dimensionality increase convolutional layer. The global / window Mamba layer consists of a first normalization layer, a first linear layer, a depthwise separable convolutional layer, a global / window G-SSM layer, a second normalization layer, and a second linear layer, all cascaded sequentially. The output of the first normalization layer then passes through a third linear layer and a SiLU activation layer before entering the second linear layer. The sum of the output of the second linear layer and the input of the first normalization layer serves as the final output of the global / window Mamba layer.
[0036] Furthermore, the global / window G-SSM layer uses a Hilbert curve as a rearrangement path to traverse the spatial positions of each pixel in the received two-dimensional feature map, thereby rearranging the two-dimensional feature map into a one-dimensional sequence for output.
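A minimal sketch of Hilbert-curve rearrangement is given below, assuming a square feature map whose side length is a power of two and using the standard index-to-coordinate conversion for the Hilbert curve. The helper names `hilbert_order`, `to_sequence`, and `to_map` are hypothetical; how the patented G-SSM layer handles non-square or non-power-of-two maps is not stated in the text.

```python
import numpy as np

def hilbert_order(n):
    """Return the (x, y) positions of an n x n grid in Hilbert-curve order."""
    order = []
    for d in range(n * n):
        x = y = 0
        t = d
        s = 1
        while s < n:
            rx = 1 & (t // 2)
            ry = 1 & (t ^ rx)
            if ry == 0:
                # Rotate or flip the current quadrant.
                if rx == 1:
                    x, y = s - 1 - x, s - 1 - y
                x, y = y, x
            x += s * rx
            y += s * ry
            t //= 4
            s *= 2
        order.append((x, y))
    return order

def to_sequence(feature_map):
    """Rearrange an (n, n, C) feature map into an (n*n, C) sequence."""
    n = feature_map.shape[0]
    xs, ys = zip(*hilbert_order(n))
    return feature_map[list(ys), list(xs)]

def to_map(sequence, n):
    """Inverse of to_sequence: scatter the sequence back onto the grid."""
    xs, ys = zip(*hilbert_order(n))
    out = np.empty((n, n, sequence.shape[-1]), dtype=sequence.dtype)
    out[list(ys), list(xs)] = sequence
    return out

fmap = np.arange(8 * 8 * 2).reshape(8, 8, 2)
seq = to_sequence(fmap)
```

Consecutive elements of the sequence are always spatial neighbors, which is why this path preserves locality better than a row-by-row scan when a two-dimensional map is fed to a one-dimensional sequence model.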
[0037] Furthermore, any LMB consists of sequentially cascaded layer normalization layers, global / window Mamba layers, 1×1 dimensionality reduction convolutional layers, Leaky ReLU activation function layers, and 1×1 dimensionality increase convolutional layers; the final output of the LMB is the sum of the final output of the global / window Mamba layer and the output of the 1×1 dimensionality increase convolutional layer. The global / window Mamba layer consists of a first normalization layer, a first linear layer, a depthwise separable convolutional layer, a global / window L-SSM layer, a second normalization layer, and a second linear layer, all cascaded sequentially. The output of the first normalization layer then passes through a third linear layer and a SiLU activation layer before entering the second linear layer. The sum of the output of the second linear layer and the input of the first normalization layer serves as the final output of the global / window Mamba layer.
[0038] Furthermore, the global / window L-SSM layer uses multiple non-overlapping local windows to traverse the spatial positions of each pixel in the received two-dimensional feature map. Then, the pixel regions extracted in each local window are rearranged into one-dimensional subsequences in a sequential manner. Finally, all one-dimensional subsequences are concatenated into the final one-dimensional sequence for output.
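The window-based rearrangement can be sketched with NumPy reshapes as below. The sketch assumes the feature-map height and width are divisible by the window size and that pixels inside each window are read row by row; the function name `window_sequence` is hypothetical, and the default window of 8 follows the 8×8 window size mentioned in the beneficial effects.

```python
import numpy as np

def window_sequence(feature_map, window=8):
    """Rearrange an (H, W, C) map into a sequence of per-window subsequences."""
    h, w, c = feature_map.shape
    if h % window or w % window:
        raise ValueError("height and width must be divisible by the window size")
    # Split both spatial axes into (number of windows, window size).
    blocks = feature_map.reshape(h // window, window, w // window, window, c)
    # Bring the two window-index axes to the front: (nH, nW, window, window, C).
    blocks = blocks.transpose(0, 2, 1, 3, 4)
    # Flatten: windows in row-major order, pixels row by row inside each window.
    return blocks.reshape(-1, c)

demo = np.arange(16).reshape(4, 4, 1)
seq = window_sequence(demo, window=2)
print(seq.ravel().tolist())
```

For the 4×4 demonstration map with 2×2 windows, the first four elements of the sequence come from the top-left window, the next four from the top-right window, and so on.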
[0039] Furthermore, the dual variable update module obtains the dual variable $U^{k}$ as follows:
$$U^{k} = U^{k-1} + \rho \odot R^{k}$$
[0040] where $R^{k}$ is the difference between the spectral features $X^{k}$ and the reflectance features $Z^{k}$, $\rho$ is the penalty parameter matrix learned adaptively during training of the deep unfolding network part, $\odot$ indicates element-wise multiplication, and $U^{k-1}$ is the dual variable obtained by the $(k-1)$-th dual variable update module.
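The dual variable update has no network structure and reduces to a few array operations, as the sketch below shows. The orientation of the difference (spectral features minus reflectance features) and the plus sign are assumptions made for illustration, since the text only states that the update uses the difference between the two quantities; `rho` stands for the learned penalty parameters and may be a scalar, a per-channel vector, or a full tensor thanks to broadcasting.

```python
import numpy as np

def dual_update(u_prev, x, z, rho):
    """Update the dual variable from the mismatch between x and z.

    u_prev, x, z: arrays of identical shape (H, W, C).
    rho: penalty parameters, broadcastable to that shape.
    Assumed orientation of the difference: x - z.
    """
    return u_prev + rho * (x - z)

rng = np.random.default_rng(0)
x = rng.random((4, 4, 8))
z = rng.random((4, 4, 8))
u0 = np.zeros_like(x)
rho = np.full(8, 0.5)                # hypothetical per-channel penalty
u1 = dual_update(u0, x, z, rho)
```

When the two estimates agree, the dual variable stays unchanged, so it accumulates only the remaining disagreement between the model-driven and data-driven outputs.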
[0041] Furthermore, the data preprocessing module consists of sequentially cascaded 3×3 dimensionality-reducing convolutional layers, Leaky ReLU activation function layers, and 3×3 dimensionality-upgrading convolutional layers.
[0042] Beneficial effects: 1. This invention provides a physically constrained illumination-adaptive spectral reflectance reconstruction system. Based on the I-SRR framework, which requires no known illumination, auxiliary hardware, or multiple captures, it accurately recovers the intrinsic spectral reflectance of materials from a single RGB image. It embeds a spectral imaging physical model into a deep unfolding framework, constructs a differentiable optimization structure based on the Alternating Direction Method of Multipliers (ADMM), implicitly integrates the spectral formation and recovery processes, and achieves physically guided end-to-end learning through physically constrained deep unfolding optimization. This invention can achieve robust reflectance recovery under unknown or complex lighting conditions. Through a learned compensation mechanism, it adaptively adjusts features, maintaining consistent reconstruction accuracy across different color temperatures (4000K-11200K) and indoor / outdoor scenes. Furthermore, this invention achieves a PSNR of 36.01 dB on the BJTU-UVA dataset, a 2.64 dB improvement over the current best method, while maintaining computational efficiency and real-time performance, thus achieving robust reconstruction under dynamic lighting conditions.
[0043] 2. This invention provides a physically constrained illumination-adaptive spectral reflectance reconstruction system. It uses scene-average spectral statistics as reliable cues to ambient lighting: an adaptive correction mechanism driven by scene statistics estimates the global scene lighting and performs channel-level spectral compensation, without the need for physical calibration or reference targets.
[0044] 3. This invention provides a physically constrained illumination adaptive spectral reflectance reconstruction system, which adopts a dual Mamba module asymmetric architecture. The local Mamba block (LMB) captures fine textures and material boundaries through 8×8 window processing, while the global Mamba block (GMB) models long-range correlations through Hilbert curve space rearrangement. The complementary local-global representation can enhance detail fidelity and global consistency.
[0045] 4. This invention provides a physically constrained illumination-adaptive spectral reflectance reconstruction system. It captures hierarchical features through a U-Net-style multi-scale architecture, combines the selective mechanism of the state-space model, and relies on multi-scale spatial-spectral modeling to effectively model inter-channel correlations and geometric constraints, thereby improving spectral shape fidelity and material boundary clarity.
Attached Figure Description
[0046] Figure 1 is the overall architecture diagram of the physically constrained illumination-adaptive spectral reflectance reconstruction system provided by the present invention; Figure 2 is a structure diagram of the deep unfolding network; Figure 3 is a structural flowchart of the image preprocessing module; Figure 4 is a structure diagram of the model-driven module; Figure 5 is a structure diagram of the inverse mapping module; Figure 6 is a structure diagram of the data-driven module and the dual variable update module; Figure 7 is a structure diagram of the global / window Mamba block; Figure 8 is a schematic diagram illustrating the different rearrangement strategies; Figure 9 is a flowchart of the lighting adapter module.
Detailed Implementation
[0047] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
[0048] This invention embeds the physical model of spectral imaging into an ADMM-based, physically constrained deep unfolding framework.
[0049] The embedded differentiable optimization structure is unfolded into $K$ iterations, each comprising a model-driven module $\mathcal{M}_k$ and a data-driven module $\mathcal{D}_k$. It implicitly integrates the spectral formation and recovery processes to achieve physically guided end-to-end learning.
[0050] Specifically, as shown in Figure 1, a physically constrained illumination-adaptive spectral reflectance reconstruction system includes a data preprocessing module and a deep unfolding network part. The data preprocessing module lifts the initial 3-channel RGB image $Y$ to a set number $C$ of spectral channels. As shown in Figure 2, the deep unfolding network part consists of $K$ cascaded driving components, each obtained by cascading a model-driven module, a data-driven module, and a dual variable update module. When $k \ge 2$, the $k$-th model-driven module $\mathcal{M}_k$, based on the received 3-channel initial RGB image $Y$, the reflectance features $Z^{k-1}$ output by the $(k-1)$-th data-driven module, the dual variable $U^{k-1}$ output by the $(k-1)$-th dual variable update module, and the illumination estimate $L^{k-1}$ output by the $(k-1)$-th model-driven module, calculates and corrects the deviation between the currently predicted RGB image and the actually observed RGB image, and outputs the updated spectral features $X^{k}$ and a new illumination estimate $L^{k}$. Then, the $k$-th data-driven module $\mathcal{D}_k$ performs spatial-spectral regularization on the spectral features $X^{k}$ and outputs refined reflectance features $Z^{k}$ for use by the next driving component; at the same time, the dual variable update module, which has no explicit network structure, obtains the dual variable $U^{k}$ from the difference between $X^{k}$ and $Z^{k}$. The final output $Z^{K}$ of the $K$-th data-driven module is the reconstructed spectral reflectance image. When $k = 1$, i.e., in the first iteration, where no outputs from a previous iteration exist, the $C$-channel spectral image output by the data preprocessing module is used as the initial reflectance features $Z^{0}$, an all-ones tensor of length $C$ is used as the initial illumination estimate $L^{0}$, and an all-zero tensor with the same dimensions as $Z^{0}$ is used as the initial dual variable $U^{0}$.
[0051] Therefore, the initial $C$-channel features output by the data preprocessing module are used as the input to the deep unfolding network, and the final spectral reflectance reconstruction result is obtained after $K$ iterations of optimization. Each iteration of the deep unfolding network includes a model-driven module $\mathcal{M}_k$ and a data-driven module $\mathcal{D}_k$, corresponding to two phases: physically constrained spectral reflectance updating and learning-based spectral reflectance regularization.
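The overall control flow of the unfolded iterations can be summarized by the following Python sketch. The three learned components are passed in as callables, and the trivial stand-ins used in the example (channel replication for preprocessing, a baseline-only model step, clipping as the regularizer) are placeholders chosen only so the loop runs; they are not the networks of the invention. The name `unfold`, the shared scalar penalty, and the orientation of the dual update are assumptions.

```python
import numpy as np

def unfold(y, preprocess, model_step, data_step, rho, num_stages):
    """Run num_stages cascaded driving components on an RGB image y."""
    z = preprocess(y)                  # initial reflectance features, (H, W, C)
    l = np.ones(z.shape[-1])           # initial illumination: all-ones, length C
    u = np.zeros_like(z)               # initial dual variable: all zeros
    for _ in range(num_stages):
        x, l = model_step(y, z, u, l)  # physically constrained update
        z = data_step(x)               # learned spatial-spectral regularization
        u = u + rho * (x - z)          # dual update (assumed orientation)
    return z                           # reconstructed spectral reflectance

# Placeholder components (assumptions for illustration only).
lift = lambda rgb: np.repeat(rgb.mean(axis=-1, keepdims=True), 5, axis=-1)
baseline_only = lambda y, z, u, l: ((z + u) * l, l)
clip01 = lambda x: np.clip(x, 0.0, 1.0)

rng = np.random.default_rng(0)
rgb = rng.random((4, 4, 3))
reflectance = unfold(rgb, lift, baseline_only, clip01, rho=0.5, num_stages=3)
```

In a trained system each stage would typically hold its own parameters, so `model_step` and `data_step` would be indexed by the stage number rather than shared.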
[0052] It should be noted that, depending on the computational resource constraints of the actual deployment (such as mobile devices or edge computing devices), the deep unfolding network can be replaced with a lightweight variant. For example, the number $K$ of driving components can be reduced to 3, the downsampling depth of the asymmetric U-Net can be reduced to 2 layers, or the standard convolutions in the network can be completely replaced with depthwise separable convolutions. Such lightweight replacements do not change the core logic and remain consistent with the physically constrained adaptive reconstruction idea of this invention.
[0053] The following details the specific structure and processing flow of each module in the physically constrained illumination-adaptive spectral reflectance reconstruction system proposed in this invention.
[0054] I. Data Preprocessing Module
As shown in Figure 3, the data preprocessing module consists of sequentially cascaded 3×3 dimensionality-reducing convolutional layers, Leaky ReLU activation function layers, and 3×3 dimensionality-upgrading convolutional layers.
[0055] The present invention first inputs the RGB image into the data preprocessing module for processing. The function of this module is to upscale the 3-channel RGB image to C spectral channels, providing initial features for subsequent deep unfolding network iterative optimization.
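The channel lifting performed by the preprocessing module can be sketched with a plain NumPy 3×3 convolution. The hidden width of 16, the output width of 28 channels, the zero padding, the Leaky ReLU slope, and the random weights are all hypothetical, since the text does not specify them; the sketch only shows how a convolution, an activation, and a second convolution turn a 3-channel image into a C-channel feature map of the same spatial size.

```python
import numpy as np

def conv3x3(x, kernel, bias):
    """3x3 convolution with zero padding.

    x: (H, W, Cin); kernel: (3, 3, Cin, Cout); bias: (Cout,).
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, kernel.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            # Shifted view of the input times the weights of this kernel tap.
            out += padded[dy:dy + h, dx:dx + w] @ kernel[dy, dx]
    return out + bias

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def preprocess(rgb, k1, b1, k2, b2):
    """Conv 3x3 -> Leaky ReLU -> Conv 3x3, lifting 3 channels to C channels."""
    return conv3x3(leaky_relu(conv3x3(rgb, k1, b1)), k2, b2)

rng = np.random.default_rng(0)
hidden, num_bands = 16, 28           # hypothetical widths
k1 = rng.normal(scale=0.1, size=(3, 3, 3, hidden))
k2 = rng.normal(scale=0.1, size=(3, 3, hidden, num_bands))
rgb = rng.random((32, 32, 3))
z0 = preprocess(rgb, k1, np.zeros(hidden), k2, np.zeros(num_bands))
```

The result `z0` has the same height and width as the input and `num_bands` channels, which is the form expected for the initial reflectance features.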
[0056] II. Model-Driven Module Model-driven module This module is used for physically constrained data fidelity alignment between reconstructed reflectance and initial RGB observations, while adaptively estimating and compensating for scene-specific global illumination. The specific structure of this module is as follows: Figure 4 As shown, specifically, the first Model-driven modules Acquiring spectral features and new lighting estimates The specific method is as follows: S1: Radiometric Feature Construction: Obtain the current radiometric feature benchmark. as follows:
[0057] in, This indicates element-wise multiplication; In other words, in the first Second iteration At that time, module Receive reflectivity features from the previous iteration Dual variables Lighting estimation and original RGB image First, and Add together, then combine Element-wise multiplication yields the current radiometric characteristic benchmark; S2: Forward RGB Prediction and Residual Calculation: [The sentence is incomplete and requires further context.] Through the camera's spectral response function Mapped to an RGB image, i.e., the forward-predicted RGB image. Subsequently, the predicted RGB image is calculated. Compared with the initial RGB image observed in practice Deviation between This is to quantify the deviation between the current estimate and the actual observation; S3: Globally Guided Inverse Mapping: Utilizing Learnable Inverse Mapping Networks Deviation Back-projecting to the high-dimensional spectral domain yields a bias-corrected spectral image. ; like Figure 5 The diagram shown illustrates the structure of the inverse mapping module of this invention. The inverse mapping sequentially comprises a 3×3 convolutional layer, a Leaky ReLU activation function, a global Mamba block (GMB), a Sigmoid activation function, and another 3×3 convolutional layer. The global receptive field of the GMB helps capture long-range dependencies during spectral sampling.
[0058] S4: Skip connection: The bias-corrected spectral image obtained from the inverse mapping is then... Compared with the aforementioned radiometric characteristic benchmark Adding them together yields the updated high-fidelity radiometric characteristics. :
[0059] S5: Illumination adapter (IA) modulation and reflectance recovery: scene-level illumination cues are extracted from the radiometric features via global average pooling to obtain the new illumination estimate.
[0060] Here, the pooling-based illumination estimation works as follows. While the main path is being computed, the radiometric features are also fed into the illumination adapter (IA). The IA extracts scene-level illumination cues from global statistics using global average pooling and computes a global illumination spectrum estimate; this estimate is used for forward RGB prediction in the next model-driven module. The IA is embedded in the processing flow of the model-driven module within the overall framework, and its structure is shown in Figure 9. The input signal of this module is the radiometric feature reference. The input is first compressed from spatial features into a scene-level global statistical vector by a global average pooling unit. A non-linear transformation is then applied by a network consisting of two 1×1 convolutional layers and a Leaky ReLU activation function. Finally, a Sigmoid activation generates the output signal, which is the updated illumination spectrum estimate.
[0061] Therefore, the illumination adaptive correction mechanism of the present invention (the illumination adapter, IA) uses scene-average spectral statistics to estimate the global illumination and generates compensation weights through a channel-level MLP network (Conv 1×1 + LeakyReLU + Conv 1×1 + Sigmoid), achieving illumination adaptation without physical calibration or a reference target.
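The illumination adapter described above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions: the channel count `C = 31`, the hidden width of 16, the Leaky ReLU slope, and the random weights are hypothetical and not taken from the patent. On a globally pooled vector, a 1×1 convolution reduces to a matrix product, which is what the sketch uses.

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    # slope is an assumed value; the text does not specify it
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def illumination_adapter(features, w1, b1, w2, b2):
    """Map radiometric features (C, H, W) to an illumination spectrum (C,)."""
    pooled = features.mean(axis=(1, 2))      # global average pooling -> (C,)
    hidden = leaky_relu(w1 @ pooled + b1)    # first 1x1 conv, applied to a 1x1 map
    return sigmoid(w2 @ hidden + b2)         # second 1x1 conv followed by Sigmoid

rng = np.random.default_rng(0)
C, hidden_dim = 31, 16                       # hypothetical sizes
w1 = rng.normal(scale=0.1, size=(hidden_dim, C))
b1 = np.zeros(hidden_dim)
w2 = rng.normal(scale=0.1, size=(C, hidden_dim))
b2 = np.zeros(C)

features = rng.random((C, 8, 8))
illum = illumination_adapter(features, w1, b1, w2, b2)
```

Because only the spatial mean enters the network, the estimate does not depend on where in the image a given spectrum occurs, which matches the "scene-level" role the text assigns to this module.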
[0062] It should be noted that, in an alternative embodiment, the illumination adapter (IA) module need not participate in every iteration update of the deep unfolding network and can instead act as an independent pre-estimation module. That is, a global illumination spectrum estimate is extracted from the input RGB image through a single network inference before the first iteration only. In all K subsequent iterations of the model-driven module, this illumination estimate is used unchanged for physical forward prediction. This approach can further improve the inference speed of the model at the cost of a slight loss in accuracy.
[0063] S6: Each channel of the updated high-fidelity radiometric features is divided by the new illumination estimate to decouple the updated spectral features.
[0064] Here the division is element-wise.
[0065] In summary, the learnable inverse mapping of the model-driven module of this invention performs spectral upsampling under GMB guidance, back-projecting the RGB residual into the spectral domain, and the result is modulated via the illumination adapter. Illumination estimation and reflectance reconstruction are jointly optimized across iterations: in each iteration the illumination estimate is updated based on the current reflectance estimate, so that illumination and reflectance converge in a coordinated manner. On this basis, the complete data update of the model-driven module can be summarized as the sequence of steps S1 to S6.
[0066] Here, multiplication and division are both element-wise. As mentioned before, when k = 1, the initial reflectance features are the output features of the preprocessing module, the initial illumination estimate is an all-ones tensor with channel dimension C, and the initial dual variables are an all-zero tensor with the same dimensions as the initial reflectance features.
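Steps S1 to S6 and the k = 1 initialization can be sketched as follows. Since the formula images did not survive in this text, every symbol name here (`Z`, `U`, `L`, `Y`, `R`, `X`, `Phi`) is an assumption introduced for illustration. The camera response is modeled as a 3×C matrix, the learned inverse mapping network is replaced by the transpose of that matrix, the illumination adapter is a trivial stand-in fed with the updated features, and the sign of the residual and the small `eps` guard are likewise assumptions.

```python
import numpy as np

def model_driven_step(Z_prev, U_prev, L_prev, Y, Phi,
                      inverse_map, illum_adapter, eps=1e-6):
    """One pass of S1-S6. Z_prev, U_prev: (C,H,W); L_prev: (C,); Y: (3,H,W); Phi: (3,C)."""
    R = (Z_prev + U_prev) * L_prev[:, None, None]   # S1: radiometric feature benchmark
    Y_hat = np.einsum("rc,chw->rhw", Phi, R)         # S2: forward RGB prediction
    residual = Y - Y_hat                              # S2: deviation (sign assumed)
    E = inverse_map(residual)                         # S3: back-projection to (C,H,W)
    R_new = R + E                                     # S4: skip connection
    L_new = illum_adapter(R_new)                      # S5: new illumination estimate
    X = R_new / (L_new[:, None, None] + eps)          # S6: element-wise division
    return X, L_new

rng = np.random.default_rng(0)
C, H, W = 31, 4, 4                                    # hypothetical sizes
Phi = rng.random((3, C))                              # stand-in camera response
Z0 = rng.random((C, H, W))                            # stands in for the preprocessing output
U0 = np.zeros_like(Z0)                                # all-zero initial dual variables
L0 = np.ones(C)                                       # all-ones initial illumination
Y = np.einsum("rc,chw->rhw", Phi, Z0 * L0[:, None, None])

inverse_map = lambda r: np.einsum("rc,rhw->chw", Phi, r)   # Phi^T, not the learned network
illum_adapter = lambda feats: np.ones(feats.shape[0])      # trivial stand-in for the IA

X1, L1 = model_driven_step(Z0, U0, L0, Y, Phi, inverse_map, illum_adapter)
```

In this toy setup the observation is generated from `Z0` itself, so the residual is zero and the step returns the input features essentially unchanged, which is the fixed-point behavior one would expect from a data-fidelity update.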
[0067] III. Data-Driven Module. As shown in Figure 6, the data-driven module adopts an asymmetric U-Net architecture. Each data-driven module consists of an encoder and a decoder arranged as mirror images of each other. The encoder includes three LMB feature extraction modules interleaved with downsampling operations, which progressively extract hierarchical representations at different spatial resolutions. The decoder includes three GMB feature extraction modules interleaved with upsampling operations. The GMB integrates global context information, and each upsampling operation doubles the feature map size and halves the number of feature channels through transposed convolution. The k-th data-driven module obtains the reflectance features as follows. The spectral features first pass through the first LMB to extract shallow local features; the first downsampling then halves the size of the shallow local features and doubles the number of feature channels. The output of the first downsampling is processed by the second LMB for mid-level feature modeling, followed by a second downsampling. Finally, the output of the second downsampling is processed by the third LMB to extract high-level semantic features at the deepest level, giving the final encoded semantic features. It should be noted that the LMB captures fine spatial-spectral relationships within an 8×8 local window. After each feature modeling step, downsampling is performed by a convolution with a stride of 2, which halves the feature map size and doubles the number of feature channels, so that shallow, low-level local details are gradually modeled into deep abstract features.
[0068] The encoded semantic features are processed by a first 1×1 convolution and then fed into the first global Mamba block (GMB) for global context information fusion. The output of the first GMB is then doubled in size and its number of feature channels is halved by a first upsampling. The upsampled output of the first GMB and the encoded semantic features are then fed into a second 1×1 convolution for feature fusion, followed by the second GMB for global context information fusion. The output of the second GMB is then doubled in size and its number of feature channels is halved by a second upsampling. The upsampled output of the second GMB and the output of the second local Mamba block (LMB) are then fed into a third 1×1 convolution for feature fusion, followed by the third GMB for global context information fusion. Finally, the output of the third GMB is used as the reflectance features.
[0069] As can be seen, the multi-scale regularization in the data-driven module captures hierarchical spatial-spectral representations through a three-layer encoder-decoder structure (64-128-256 channels). Each layer contains downsampling / upsampling operations and corresponding Mamba processing, achieving feature learning from coarse to fine.
[0070] It should be noted that, between the encoder and decoder at the same level, a skip connection is used in conjunction with a 1×1 convolution to adaptively fuse shallow high-resolution local features from the encoder with deep global semantic features from the decoder, thereby effectively mitigating the spatial information loss caused by feature downsampling. The essential technical advantage of this asymmetric design lies in: by limiting the receptive field (LMB) during the downsampling stage to reduce the sequence length and enhance local modeling capabilities and parallel computing speed, while expanding the receptive field (GMB) during the upsampling stage to ensure long-range global dependencies, an optimal balance between local texture details and global spectral consistency is achieved while maintaining linear computational complexity.
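The shape bookkeeping of the three-level encoder and decoder can be traced with a short helper. This is a sketch under assumptions: the base width of 64 channels follows the 64-128-256 progression stated above, the LMB and GMB blocks are taken to preserve shape, and the 64×64 input size is arbitrary.

```python
def unet_shapes(height, width, base_channels=64, levels=3):
    """Return (channels, height, width) at each encoder and decoder level."""
    encoder = []
    c, h, w = base_channels, height, width
    for level in range(levels):
        encoder.append((c, h, w))              # LMB: shape preserved (assumed)
        if level < levels - 1:
            c, h, w = c * 2, h // 2, w // 2    # stride-2 convolution
    decoder = []
    for level in range(levels):
        decoder.append((c, h, w))              # GMB: shape preserved (assumed)
        if level < levels - 1:
            c, h, w = c // 2, h * 2, w * 2     # transposed convolution
    return encoder, decoder

enc, dec = unet_shapes(64, 64)
```

The decoder shapes mirror the encoder shapes in reverse order, so a same-level skip connection always pairs tensors of identical size, and a 1×1 convolution is sufficient to fuse them.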
[0071] Furthermore, the structure of the GMB is shown on the left of Figure 7. The GMB is used in the learnable inverse mapping of the model-driven module and also in the decoder path of the asymmetric U-Net inside the data-driven module. The GMB models long-range spatial-spectral dependencies across the entire feature map by combining state-space modeling with a spatial rearrangement strategy. Any GMB consists of a sequentially cascaded layer normalization layer, global / window Mamba layer, 1×1 dimensionality-reducing convolutional layer, Leaky ReLU activation function layer, and 1×1 dimensionality-increasing convolutional layer; the final output of the GMB is the sum of the final output of the global / window Mamba layer and the output of the 1×1 dimensionality-increasing convolutional layer. The structure of the global / window Mamba layer is shown on the right of Figure 7. It includes a first normalization layer, a first linear layer, a depthwise separable convolution (DSConv), a global / window G-SSM layer (Global State-Space Model, G-SSM), a second normalization layer, and a second linear layer, all cascaded sequentially. In addition, the output of the first normalization layer passes through a third linear layer and a SiLU activation layer before entering the second linear layer. The sum of the output of the second linear layer and the input of the first normalization layer serves as the final output of the global / window Mamba layer.
[0072] It should be noted that G-SSM is the core module of GMB, and its internal structure adopts the discretized state space model (SSM) conventional in this field. Specifically, the standard SSM mainly consists of a linear projection layer, one-dimensional depthwise convolution, feature serialization, and state space discretization update operations, used to achieve linear complexity causal modeling of the input sequence. Since this structure is a mature technology in this field, its internal structural details will not be elaborated here.
[0073] The innovation of this invention lies in introducing a global rearrangement (GR) strategy based on Hilbert curves into the feature serialization process of the G-SSM, replacing the conventional line-by-line flattening operation. For the specific principle of this strategy and its comparison with other rearrangement methods, see Figure 8 and the corresponding explanation. Figure 8 illustrates three spatial rearrangement strategies for serializing 2D feature maps into 1D sequences: Sequential Rearrangement (SR), Global Rearrangement (GR), and Window Rearrangement (WR). Each column in the figure comprises two parts: the upper squares represent the spatial positions of pixels in the original 2D feature map, where different colored blocks (blue, pink, green, yellow, etc.) are used only to identify spatially adjacent pixel groups so that their relative positions after serialization can be tracked intuitively; the lower bars represent the corresponding 1D unfolded sequence, where the arrangement of blocks of the same color within the bars reflects the relative distances of that group of spatially adjacent pixels after serialization. A tight cluster of blocks of the same color indicates that spatially adjacent pixels remain close to each other in the sequence, while a dispersed cluster indicates that spatial relationships have been disrupted.
[0074] Sequential rearrangement (SR) is the conventional row-by-row flattening operation: pixels are arranged sequentially into a one-dimensional sequence in row-major order. This method is simple to implement, but it disrupts the local continuity of a two-dimensional image. For example, as shown in the SR row of Figure 8, pixels that were originally vertically adjacent in two-dimensional space end up a full row width apart in the one-dimensional sequence, making it difficult for the state space model to capture spatial relationships across rows.
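The row-width gap introduced by row-major flattening is easy to verify numerically. The 4×8 feature map size below is an arbitrary choice for illustration.

```python
import numpy as np

H, W = 4, 8
index = np.arange(H * W).reshape(H, W)        # row-major sequence position of each pixel

horizontal_gap = index[1, 3] - index[1, 2]    # neighbors within a row
vertical_gap = index[2, 3] - index[1, 3]      # neighbors within a column
```

Horizontal neighbors stay one step apart in the sequence, while vertical neighbors are separated by `W` steps, and this gap grows with the image width.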
[0075] Global Rearrangement (GR) is the global serialization strategy based on Hilbert curves that this invention uses in the GMB. Hilbert curves, as space-filling curves, possess the continuous traversal characteristics of fractal geometry, enabling them to recursively map a two-dimensional plane onto a one-dimensional path. For example, as shown in the GR row of Figure 8, compared with the SR strategy, the GR strategy ensures that pixels (blocks of the same color) that were adjacent in two-dimensional space remain close to each other in the unfolded one-dimensional sequence, providing the contextual structure that the subsequent G-SSM requires to model long-range global spectral consistency.
[0076] Therefore, this invention dedicates the GR strategy to GMB, corresponding to the modeling objective of the global receptive field. Specifically, the global / window G-SSM layer uses a Hilbert curve as the rearrangement path to traverse the spatial positions of each pixel in the received two-dimensional feature map, thereby rearranging the two-dimensional feature map into a one-dimensional sequence for output.
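A Hilbert-curve serialization of a feature map can be sketched with the well-known index-to-coordinate conversion for Hilbert curves. This is an illustration, not the patented implementation: it assumes a square feature map whose side is a power of two, and the function names are hypothetical.

```python
import numpy as np

def hilbert_d2xy(n, d):
    """Convert position d along a Hilbert curve into (x, y) on an n x n grid."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                   # rotate or flip the current quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_serialize(fmap):
    """Rearrange a (C, n, n) feature map into a (C, n*n) sequence along the curve."""
    C, H, W = fmap.shape
    assert H == W and (H & (H - 1)) == 0, "sketch assumes a power-of-two square map"
    coords = [hilbert_d2xy(H, d) for d in range(H * W)]
    xs = np.array([x for x, _ in coords])
    ys = np.array([y for _, y in coords])
    return fmap[:, ys, xs], coords

fmap = np.arange(64, dtype=float).reshape(1, 8, 8)
seq, coords = hilbert_serialize(fmap)
```

Every pair of consecutive sequence positions corresponds to pixels that are direct spatial neighbors, which is the locality property that row-major flattening lacks at row boundaries.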
[0077] As can be seen, this invention is based on an asymmetric U-Net structure. The encoder uses LMB for local window processing and parallel computation (capturing 8×8 fine texture), the decoder uses GMB for global context aggregation (Hilbert curve traversal), and cross-scale skip connections fuse multi-resolution features through 1×1 convolution.
[0078] It should be noted that in the Global Mamba Block (GMB), in addition to using Hilbert curves for spatial rearrangement of one-dimensional sequences, other space-filling curves (such as Z-order curves and Peano curves) can also be used, or a multi-directional cross-order rearrangement strategy (such as alternating horizontal, vertical, and diagonal rearrangement) can be adopted to achieve efficient mapping from two-dimensional feature maps to one-dimensional sequences, thereby preserving the global spatial dependencies of the image.
[0079] Furthermore, the LMB is located in the encoder path of the asymmetric U-Net inside the data-driven module. Its structure, shown on the left of Figure 7, is almost identical to that of the GMB; the only difference is that its core module is the L-SSM rather than the G-SSM.
[0080] Specifically, any LMB consists of a series of cascaded layer normalization layers, a global / window Mamba layer, a 1×1 dimensionality reduction convolutional layer, a Leaky ReLU activation function layer, and a 1×1 dimensionality increase convolutional layer; the final output of the LMB is the sum of the final output of the global / window Mamba layer and the output of the 1×1 dimensionality increase convolutional layer. The global / window Mamba layer consists of a first normalization layer, a first linear layer, a depthwise separable convolutional layer, a global / window L-SSM layer, a second normalization layer, and a second linear layer, all cascaded sequentially. The output of the first normalization layer then passes through a third linear layer and a SiLU activation layer before entering the second linear layer. The sum of the output of the second linear layer and the input of the first normalization layer serves as the final output of the global / window Mamba layer.
[0081] It should be noted that L-SSM is the core module of LMB, and its structure is no different from that of the standard SSM. The innovation lies in the window rearrangement strategy (WR) used in its feature serialization process. This strategy divides the feature map evenly into several non-overlapping local windows, and the pixels in each window are independently serialized and then fed into L-SSM for parallel processing.
[0082] Specifically, the global / window L-SSM layer uses multiple non-overlapping local windows to traverse the spatial positions of each pixel in the received two-dimensional feature map. Then, the pixel regions extracted in each local window are rearranged into one-dimensional subsequences in a sequential manner. Finally, all one-dimensional subsequences are concatenated into the final one-dimensional sequence for output.
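The window rearrangement can be expressed with a reshape and a transpose. The sketch below assumes the feature map height and width are multiples of the window size and that pixels inside each window are read in row-major order; the function name is hypothetical.

```python
import numpy as np

def window_serialize(fmap, window=8):
    """(C, H, W) -> (C, H*W): windows in row-major order, row-major pixels inside each."""
    C, H, W = fmap.shape
    assert H % window == 0 and W % window == 0, "sketch assumes divisible sizes"
    x = fmap.reshape(C, H // window, window, W // window, window)
    x = x.transpose(0, 1, 3, 2, 4)     # (C, windows_y, windows_x, window, window)
    return x.reshape(C, -1)            # concatenate the per-window subsequences

fmap = np.arange(16 * 16).reshape(1, 16, 16)
seq = window_serialize(fmap, window=8)
```

Each block of 64 consecutive sequence entries comes from one 8×8 window, so the windows can be processed independently and in parallel.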
[0083] As shown in the WR row of Figure 8, spatially adjacent pixels (blocks of the same color) within the same window remain closely arranged in the one-dimensional sequence, while different windows are independent of each other. This preserves the fine modeling capability for local high-frequency textures and maintains high computational efficiency through parallel operations across windows. Therefore, this invention applies the WR strategy specifically to the LMB, corresponding to the modeling objective of "local" window processing.
[0084] It should be noted that the dual Mamba module asymmetric architecture includes an 8×8 window rearrangement of the local Mamba block (LMB) and a global rearrangement of the global Mamba block (GMB). Complementary local-global feature representations are achieved through selective state-space modeling, with a computational complexity of O(L).
[0085] Furthermore, the G-Mamba and L-Mamba modules employ state-space equations.
[0086]
[0087] By combining selective gating, relevant spatial-spectral information is adaptively aggregated to capture long-range dependencies while maintaining linear complexity.
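For orientation, the conventional discretized state-space recurrence can be sketched as a single pass over the sequence. This is the textbook form with a diagonal state matrix and zero-order-hold discretization, not necessarily the exact equations of the patent, whose formula is not reproduced in this text. The parameters are fixed here, whereas a selective (Mamba-style) model would make them depend on the input; all names are assumptions.

```python
import numpy as np

def ssm_scan(x, a, b, c, delta):
    """Run h_t = a_bar * h_{t-1} + b_bar * x_t, y_t = c . h_t over a 1D sequence.

    x: (L,) input sequence; a, b, c: (N,) diagonal state parameters; delta: step size.
    """
    x = np.asarray(x, dtype=float)
    a_bar = np.exp(delta * a)              # zero-order-hold discretization of a
    b_bar = (a_bar - 1.0) / a * b          # matching discretization of b
    h = np.zeros_like(a)
    y = np.empty(len(x))
    for t, x_t in enumerate(x):            # one state update per element: O(L)
        h = a_bar * h + b_bar * x_t
        y[t] = c @ h
    return y

a = np.array([-1.0, -2.0])
b = np.ones(2)
c = np.array([0.5, 0.25])
delta = 0.1
impulse = np.array([1.0, 0.0, 0.0, 0.0])
y = ssm_scan(impulse, a, b, c, delta)
```

The loop touches each sequence element once, which is the source of the linear complexity in the sequence length, and the order in which pixels enter the sequence (sequential, Hilbert, or windowed) determines which spatial neighbors the state can relate easily.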
[0088] Meanwhile, in the asymmetric U-Net of the data-driven module and the inverse mapping network of the model-driven module, the global Mamba block (GMB) used to capture long-range dependencies and the local Mamba block (LMB) used for local texture extraction can be replaced with a vision Transformer module based on a multi-head self-attention mechanism (such as Swin Transformer) or with a dilated convolution module with a large receptive field, and the spatial-spectral regularization and refinement of features can still be achieved.
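[0089] IV. Dual Variable Update Module. See the content highlighted by the dotted line on the left of Figure 6. At the final stage of the k-th iteration, that is, after the model-driven module has solved for the spectral features and the data-driven module has solved for the reflectance features, the system updates the dual variables by residual accumulation according to the ADMM criterion. The new dual variables are obtained from the previous dual variables and the difference between the spectral features and the reflectance features, weighted element-wise by a penalty parameter matrix.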
[0090] Here, the penalty parameter matrix is learned adaptively during training of the deep unfolding network part and dynamically balances the constraint strength of the data fidelity term and the prior regularization term at different iteration stages. In other words, the present invention allows the network to adaptively adjust the constraint strength at different iteration stages during training. The multiplication is element-wise, and the previous dual variables are those obtained by the (k-1)-th dual variable update module. The updated dual variables carry the discrepancy between the "physics-driven" and "data-driven" results of the current round and are passed, together with the reflectance features and the illumination estimate, to the model-driven module of the (k+1)-th iteration. This mechanism avoids the error amplification problem of conventional black-box cascaded networks, ensuring the theoretically rigorous convergence of the deep unfolding network.
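The residual-accumulation update can be sketched as follows. The update formula itself is not reproduced in this text, so the form below is an assumption modeled on the standard scaled ADMM dual update: the symbol names, the sign of the difference, and the per-channel shape of the penalty matrix `rho` are all hypothetical.

```python
import numpy as np

def dual_update(U_prev, X, Z, rho):
    """Accumulate the weighted mismatch between spectral features X and reflectance features Z."""
    return U_prev + rho * (X - Z)          # element-wise product; sign convention assumed

rng = np.random.default_rng(0)
C, H, W = 31, 4, 4                         # hypothetical sizes
X = rng.random((C, H, W))                  # output of the model-driven module
Z = rng.random((C, H, W))                  # output of the data-driven module
U_prev = np.zeros((C, H, W))
rho = np.full((C, 1, 1), 0.5)              # stand-in for the learned penalty matrix

U_next = dual_update(U_prev, X, Z, rho)
U_same = dual_update(U_prev, Z, Z, rho)    # no mismatch between the two modules
```

When the two modules agree, the dual variables stop changing, which is the usual signature of ADMM convergence.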
[0091] In summary, compared with the prior art, the present invention has the following advantages: 1. Significantly Improved Reconstruction Accuracy: Experiments show that the I-SRR-9 stage model achieves a PSNR of 36.01 dB on the BJTU-UVA dataset, which is 2.64 dB higher than the strongest competing method, SRR-MAXL. It achieves PSNRs of 36.01 dB, 33.95 dB, and 25.95 dB on synthetic, augmented, and real datasets, respectively, comprehensively surpassing all baseline methods (QDO: 28.17 / 25.63 / 16.64 dB, MST++: 32.66 / 30.11 / 22.78 dB, DPU: 33.51 / 30.87 / 23.62 dB).
[0092] 2. End-to-end reconstruction of a single RGB image: High-fidelity reflectance reconstruction is achieved from a single RGB image without the need for known illumination, auxiliary hardware (such as dual-LED systems or programmable illumination arrays), or multiple captures. Compared to Park et al.'s method, which requires multiplexed illumination, and Huo et al.'s method, which requires dual LEDs and testing-time optimization, this invention significantly reduces system complexity and the barrier to entry.
[0093] 3. Robust generalization across lighting conditions: The lighting adapter module uses scene statistics to adaptively estimate lighting, maintaining high reconstruction quality under diverse lighting conditions such as color temperature range of 4000K-11200K and different times of day (morning and afternoon) indoors and outdoors.
[0094] 4. Interpretable Optimization Based on Physical Constraints: The ADMM optimization algorithm is unfolded into a deep network, with each stage corresponding to a clearly defined optimization step. This retains the interpretability of traditional optimization while gaining the expressive power of deep learning. Embedding the physical imaging model Φ ensures that the reconstruction results conform to optical principles. Ablation experiments show that removing the physical constraints results in a 2.65 dB performance decrease.
[0095] 5. Efficient Multi-Scale Feature Modeling: The dual Mamba architecture models long-range dependencies with linear complexity O(L) in the sequence length L, compared with the O(L²) complexity of Transformer self-attention. The 5-stage variant maintains a PSNR of 31.39 dB while reducing the parameter count (2.07M vs 3.72M) and the computational cost (57.70 GFLOPs vs 98.59 GFLOPs) by 44% and 41%, respectively.
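The quoted reduction percentages can be checked directly from the stated figures. The sketch assumes that 3.72M parameters and 98.59 GFLOPs refer to the larger 9-stage model against which the 5-stage variant is compared.

```python
def relative_reduction(smaller, larger):
    """Fraction by which `smaller` undercuts `larger`."""
    return 1.0 - smaller / larger

param_cut = relative_reduction(2.07, 3.72)      # parameters, in millions
flop_cut = relative_reduction(57.70, 98.59)     # computational cost, in GFLOPs
```

Both values round to the percentages given in the text (about 44% fewer parameters and about 41% fewer GFLOPs).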
[0096] 6. Balancing Fine Texture and Global Structure: In the asymmetric U-Net design, the encoder's LMB captures material boundaries and 8×8 local textures, while the decoder's GMB integrates scene-level context. Ablation experiments show that the GMB (data-driven module) contributes a +1.58 dB improvement, and the LMB contributes a +0.13 dB improvement.
[0097] 7. Accuracy of Illumination Estimation: Experimental validation shows that under illuminants A and D65, the scene's average spectral response accurately matches the true illumination spectrum. The illumination adapter module contributed the largest improvement (+2.65 dB) in the ablation experiments, confirming that adaptive illumination compensation is key to accurate reflectance reconstruction.
[0098] 8. Balance between computational efficiency and performance: Compared to SRR-MAXL (81.40 GFLOPs with additional iterations per image) which requires test-time optimization, I-SRR-9stg can complete inference at 98.59 GFLOPs without test-time optimization. The inference time for a single 512×512 image is approximately 0.15 seconds (5 stages) or 0.25 seconds (9 stages), meeting the requirements of real-time applications.
[0099] Of course, the present invention may have other various embodiments. Without departing from the spirit and essence of the present invention, those skilled in the art can make various corresponding changes and modifications according to the present invention, but these corresponding changes and modifications should all fall within the protection scope of the appended claims.
Claims
1. A physically constrained illumination adaptive spectral reflectance reconstruction system, characterized in that it includes a data preprocessing module and a deep unfolding network part; the data preprocessing module is used to lift the 3-channel initial RGB image to a set number C of spectral channels; the deep unfolding network part consists of K cascaded driving components, each of which is obtained by cascading a model-driven module, a data-driven module, and a dual variable update module; when k > 1, the k-th model-driven module, based on the received 3-channel initial RGB image, the reflectance features output by the (k-1)-th data-driven module, the dual variables output by the (k-1)-th dual variable update module, and the illumination estimate output by the (k-1)-th model-driven module, calculates and corrects the deviation between the current predicted RGB image and the actually observed RGB image, and outputs the updated spectral features and a new illumination estimate; the k-th data-driven module then performs spatial-spectral regularization on the spectral features and outputs refined reflectance features for use by the next driving component; at the same time, the dual variable update module, which has no explicit structure, obtains the dual variables from the difference between the spectral features and the reflectance features; the final output of the K-th data-driven module is the reconstructed spectral reflectance image; when k = 1, the spectral image with C spectral channels output by the data preprocessing module is used as the initial reflectance features, an all-ones tensor of length C is used as the initial illumination estimate, and an all-zero tensor with the same dimensions as the initial reflectance features is used as the initial dual variables.
2. The illumination adaptive spectral reflectance reconstruction system based on physical constraints as described in claim 1, characterized in that the k-th model-driven module obtains the spectral features and the new illumination estimate as follows: the current radiometric feature benchmark is obtained by adding the reflectance features and the dual variables from the previous iteration and multiplying the sum element-wise by the previous illumination estimate; the radiometric feature benchmark is mapped to an RGB image through the camera's spectral response function, giving the forward-predicted RGB image, and the deviation between the predicted RGB image and the actually observed initial RGB image is then computed; a learnable inverse mapping network back-projects the deviation into the high-dimensional spectral domain to obtain a bias-corrected spectral image; the bias-corrected spectral image obtained by inverse mapping is added to the aforementioned radiometric feature benchmark to obtain the updated high-fidelity radiometric features; scene-level illumination cues are extracted by global average pooling to obtain the new illumination estimate; each channel of the updated high-fidelity radiometric features is divided element-wise by the new illumination estimate to decouple the updated spectral features.
3. The illumination adaptive spectral reflectance reconstruction system based on physical constraints as described in claim 2, characterized in that the extraction by global average pooling specifically includes sequentially performing the following steps on the input radiometric features: global average pooling, a first 1×1 convolution, Leaky ReLU activation, a second 1×1 convolution, and Sigmoid activation, to obtain the new illumination estimate.
4. The illumination adaptive spectral reflectance reconstruction system based on physical constraints as described in claim 1, characterized in that each data-driven module consists of an encoder and a decoder arranged as mirror images of each other; the encoder includes three LMB feature extraction modules interleaved with downsampling operations, and the decoder includes three GMB feature extraction modules interleaved with upsampling operations; the k-th data-driven module obtains the reflectance features as follows: the spectral features first pass through the first LMB to extract shallow local features; the first downsampling then halves the size of the shallow local features and doubles the number of feature channels; the output of the first downsampling is processed by the second LMB for mid-level feature modeling and then subjected to a second downsampling; finally, the output of the second downsampling is processed by the third LMB at the deepest level to obtain the final encoded semantic features; the encoded semantic features are processed by a first 1×1 convolution and then fed into the first global Mamba block (GMB) for global context information fusion; the output of the first GMB is then doubled in size and its number of feature channels is halved by a first upsampling; the upsampled output of the first GMB and the encoded semantic features are then fed into a second 1×1 convolution for feature fusion, followed by the second GMB for global context information fusion; the output of the second GMB is then doubled in size and its number of feature channels is halved by a second upsampling; the upsampled output of the second GMB and the output of the second local Mamba block (LMB) are then fed into a third 1×1 convolution for feature fusion, followed by the third GMB for global context information fusion; finally, the output of the third GMB is used as the reflectance features.
5. The illumination adaptive spectral reflectance reconstruction system based on physical constraints as described in claim 4, characterized in that, Any GMB consists of sequentially cascaded layer normalization layers, global / window Mamba layers, 1×1 dimensionality reduction convolutional layers, Leaky ReLU activation function layers, and 1×1 dimensionality increase convolutional layers; the final output of the GMB is the sum of the final output of the global / window Mamba layer and the output of the 1×1 dimensionality increase convolutional layer. The global / window Mamba layer consists of a first normalization layer, a first linear layer, a depthwise separable convolutional layer, a global / window G-SSM layer, a second normalization layer, and a second linear layer, all cascaded sequentially. The output of the first normalization layer then passes through a third linear layer and a SiLU activation layer before entering the second linear layer. The sum of the output of the second linear layer and the input of the first normalization layer serves as the final output of the global / window Mamba layer.
6. The illumination adaptive spectral reflectance reconstruction system based on physical constraints as described in claim 5, characterized in that, The global / window G-SSM layer uses a Hilbert curve as a rearrangement path to traverse the spatial position of each pixel in the received two-dimensional feature map, thereby rearranging the two-dimensional feature map into a one-dimensional sequence for output.
7. The illumination adaptive spectral reflectance reconstruction system based on physical constraints as described in claim 4, characterized in that, Any LMB consists of a series of cascaded layer normalization layers, a global / window Mamba layer, a 1×1 dimensionality reduction convolutional layer, a Leaky ReLU activation function layer, and a 1×1 dimensionality increase convolutional layer; the final output of the LMB is the sum of the final output of the global / window Mamba layer and the output of the 1×1 dimensionality increase convolutional layer. The global / window Mamba layer consists of a first normalization layer, a first linear layer, a depthwise separable convolutional layer, a global / window L-SSM layer, a second normalization layer, and a second linear layer, all cascaded sequentially. The output of the first normalization layer then passes through a third linear layer and a SiLU activation layer before entering the second linear layer. The sum of the output of the second linear layer and the input of the first normalization layer serves as the final output of the global / window Mamba layer.
8. The illumination adaptive spectral reflectance reconstruction system based on physical constraints as described in claim 7, characterized in that, The global / window L-SSM layer uses multiple non-overlapping local windows to traverse the spatial positions of each pixel in the received two-dimensional feature map. Then, the pixel regions extracted in each local window are rearranged into one-dimensional subsequences in a sequential manner. Finally, all one-dimensional subsequences are concatenated into the final one-dimensional sequence for output.
9. The illumination adaptive spectral reflectance reconstruction system based on physical constraints as described in claim 1, characterized in that the dual variable update module obtains the dual variables as follows: the dual variables are updated from the dual variables obtained by the (k-1)-th dual variable update module and the difference between the spectral features and the reflectance features, using element-wise multiplication by a penalty parameter matrix that is learned adaptively during the training of the deep unfolding network part.
10. The illumination adaptive spectral reflectance reconstruction system based on physical constraints as described in claim 1, characterized in that, The data preprocessing module consists of sequentially cascaded 3×3 dimensionality-reducing convolutional layers, Leaky ReLU activation function layers, and 3×3 dimensionality-upgrading convolutional layers.