Facility leafy vegetable image enhancement method based on multi-dimensional feature fusion

An image enhancement method that integrates multi-dimensional feature fusion and correction addresses the problem of insufficient image quality in facility-grown leafy vegetable scenes. The method produces high-quality corrected leafy vegetable images for analysis, thereby improving the level of precision management and intelligence in facility agriculture.

CN122265064A · Pending · Publication Date: 2026-06-23 · KUNMING UNIV OF SCI & TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
KUNMING UNIV OF SCI & TECH
Filing Date
2026-03-20
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing general-purpose image enhancement technologies are poorly adapted to leafy vegetable scenes in greenhouses. They cannot effectively correct defects such as uneven exposure, highlight spots, color cast, and noise. Moreover, their reliance on single-dimensional feature extraction severs the synergistic relationship between the edge texture, frequency domain details, and multi-scale illumination context of leafy vegetable images, making it difficult to meet the accuracy requirements of leafy vegetable phenotypic analysis.

Method used

A method of multi-dimensional feature extraction, feature fusion and correction, and multi-scale decoding output is adopted. By extracting edge texture, frequency domain details, and cross-scale contextual features, combined with a conditional diffusion network, image correction is performed to output high-quality leafy vegetable enhanced images.

Benefits of technology

It effectively corrects defects such as uneven exposure, highlight spots, color cast, and noise in images of leafy vegetables in greenhouses, and outputs high-quality enhanced images with uniform exposure, clear texture, and true colors. It also supports the accurate extraction of phenotypic indicators such as leafy vegetable canopy coverage and vegetation index, as well as yield prediction.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122265064A_ABST
Patent Text Reader

Abstract

This invention discloses a multi-dimensional feature fusion method for enhancing images of leafy vegetables in greenhouses, belonging to the field of image processing. It addresses problems such as uneven exposure, highlight spots, and color distortion in leafy vegetable images caused by changes in light intensity and leaf characteristics under greenhouse conditions, achieving high-quality image enhancement. The method includes image normalization, multi-branch multi-dimensional feature extraction, feature fusion and conditional diffusion network correction, as well as multi-scale decoding and output. By introducing complementary features of various types, such as edge, frequency domain, cross-scale context, and exposure prior, and utilizing a lightweight diffusion backbone network and dynamic weight loss optimization, the clarity and realism of the images are effectively improved. The enhanced leafy vegetable images are suitable for agricultural phenotypic analysis scenarios such as canopy coverage, vegetation index extraction, and yield prediction.

Description

Technical Field

[0001] This invention belongs to the field of image processing, and more specifically relates to a method for enhancing images of leafy vegetables in greenhouses through multi-dimensional feature fusion.

Background Technology

[0002] Leafy vegetables are a core crop in greenhouse agriculture. The quality of leafy vegetable images directly determines the accuracy of extracting phenotypic indicators (canopy coverage, vegetation index, etc.) and predicting yield, which is of great significance for precision management in greenhouse agriculture. Traditional leafy vegetable image enhancement methods mostly rely on manual adjustments or general image processing tools, which are not only cumbersome and inefficient, but also difficult to adapt to the image defects under the complex lighting conditions of greenhouses, and cannot meet the needs of large-area, high-precision leafy vegetable phenotypic analysis. With the development of computer vision and artificial intelligence technologies, using image enhancement techniques to optimize leafy vegetable image quality has become a new direction for phenotypic research on leafy vegetables in greenhouse agriculture.

[0003] However, existing image enhancement techniques (such as sub-region enhancement, latent variable diffusion reconstruction, illumination mapping, and convolutional enhancement) are mostly designed for general visual scenes. While they can achieve basic image quality optimization, they have significant limitations in the context of facility-grown leafy vegetables: 1. They fail to dynamically correlate environmental factors such as illumination fluctuations and uneven diffuse scattering in facility scenes with the reflectivity of leafy vegetable leaves. Relying solely on general enhancement logic to process leafy vegetable images makes it difficult to comprehensively and accurately correct the unique defects of leafy vegetable images, such as uneven exposure, highlight spots, and color cast noise, which can easily lead to distortion of phenotypic features and affect the accuracy of subsequent phenotypic analysis; 2. They rely on single-dimensional feature extraction methods (such as general convolution, edge-preserving filtering, and structural feature extraction), which sever the collaborative correlation between edge texture, frequency domain details, and multi-scale illumination context of leafy vegetable images, failing to simultaneously ensure texture fidelity and illumination uniformity required for leafy vegetable phenotypic analysis. Therefore, there is an urgent need for a facility-grown leafy vegetable image enhancement method that integrates multi-dimensional features to achieve accurate correction of leafy vegetable images, preserve key phenotypic features, provide high-quality image support for leafy vegetable phenotypic index extraction and yield prediction, and contribute to the intelligent development of facility agriculture.

[0004] Image Processing Method and Apparatus (WO2017148035A1): An image processing method and apparatus, the method comprising: acquiring an image frame; dividing the image frame into several sub-partitions; performing image feature analysis and statistics on each sub-partition to obtain image feature information for each sub-partition; applying an enhancement algorithm to each sub-partition for image enhancement processing based on the image feature information of each sub-partition; and merging the enhanced images to obtain a processed image frame. Through sub-partition image analysis and enhancement processing, the characteristics of local image regions can be taken into account, achieving fine-tuning of the image and improving the image processing effect; furthermore, through sensitive area equalization enhancement processing, the abrupt change effect between blocks caused by partition processing in certain cases can be avoided, improving the fine-tuning degree of the image enhancement algorithm.

[0005] Image Enhancement Method, Apparatus, Electronic Device, Computer-Readable Storage Medium, and Computer Program Product (WO2025039694A1): This application provides an image enhancement method, apparatus, electronic device, computer-readable storage medium, and computer program product, which can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation, and assisted driving. The method includes: obtaining latent variables of an object image to be enhanced, and adding noise to the latent variables to obtain a noisy latent variable of the object image, wherein the object image is an image of a target object; extracting object structural features of the target object in the object image; combining the object structural features to perform denoising processing on the noisy latent variable to obtain a denoised latent variable of the object image; and performing image reconstruction on the denoised latent variable to obtain a first object-enhanced image of the object image.

[0006] An image enhancement method, apparatus, and storage medium (WO2020173320A1): This application discloses an image enhancement method, apparatus, and storage medium. The method involves acquiring an original image, performing synthetic processing on the features of the original image to obtain a first illumination map corresponding to the original image. The resolution of the first illumination map is lower than that of the original image. Based on the first illumination map, a mapping relationship for mapping the image to a second illumination map is obtained. Based on the mapping relationship, the original image is processed to obtain a second illumination map. The resolution of the second illumination map is the same as that of the original image. Image enhancement processing is performed on the original image according to the second illumination map to obtain a target image. This scheme can improve the efficiency of image enhancement.

[0007] Image Enhancement Method and Apparatus (WO2021063341A1): This application discloses an image enhancement method and apparatus in the field of computer vision within the field of artificial intelligence. The image enhancement method includes: acquiring an image to be processed; performing feature enhancement processing on the image to be processed through a neural network to obtain enhanced image features, wherein the neural network includes N convolutional layers, where N is a positive integer; and performing color enhancement processing and brightness enhancement processing on the image to be processed based on the enhanced image features to obtain an output image. The technical solution of this application improves the performance of the image to be processed in terms of detail, color, and brightness, thereby enhancing the image enhancement effect.

[0008] Image Enhancement Method, Data Processing Device, and Storage Medium (WO2019157966A1): This application discloses an image enhancement method, a data processing device, and a storage medium. The image enhancement method, executed by a data processing device, includes: performing edge-preserving filtering on an original image to obtain a first processed image; acquiring detailed features of the original image; determining a second processed image based on the detailed features and the first processed image; and using the original image as a guide map, processing the second processed image based on a guide map filtering method to obtain a third processed image.

[0009] Existing image enhancement technologies are mostly designed for general visual scenes. While they can achieve basic image quality optimization, they still have significant limitations in adapting to leafy vegetable scenes. They fail to dynamically correlate environmental factors, such as ambient light fluctuations and uneven diffuse scattering, with the reflective characteristics of leafy vegetables, making it difficult to accurately correct defects unique to leafy vegetable images, such as uneven exposure, highlight spots, and color cast noise, which can easily lead to phenotypic distortion. Furthermore, their reliance on single-dimensional feature extraction methods severs the synergistic relationship between the edge texture, frequency domain details, and multi-scale lighting context of leafy vegetable images. This makes it impossible to simultaneously achieve the texture fidelity and lighting uniformity required for leafy vegetable phenotypic analysis, hindering the accurate extraction of leafy vegetable phenotypic indicators and yield prediction.

Summary of the Invention

[0010] This invention aims to address the limitations of existing general image enhancement technologies in adaptability to greenhouse leafy vegetable scenarios. Based on the correction of specific defects in greenhouse leafy vegetable images (uneven exposure, highlight spots, color cast noise), this invention uses multi-dimensional feature extraction, feature fusion and correction, and multi-scale decoding output techniques to obtain high-quality leafy vegetable enhanced images with uniform exposure, clear texture, and realistic colors. This provides reliable data support for the extraction of phenotypic indicators such as leafy vegetable canopy coverage and vegetation index, as well as yield prediction, thus contributing to the precision management and intelligent development of facility agriculture.

[0011] To achieve the above objectives, the present invention employs the following technical solution: the method comprises:

[0012] Image standardization: Obtain the original visible light images of leafy vegetables collected in the facility environment. The original images have defects such as overexposure / underexposure, specular highlights and color distortion due to greenhouse lighting and leaf characteristics.

[0013] Multi-dimensional feature extraction: Based on a standardized image, complementary features are extracted through four parallel branches;

[0014] Feature fusion and correction: The extracted multi-dimensional features are fused and targeted correction is achieved through a conditional diffusion network;

[0015] Multi-scale decoding and output: Reconstructing the denoised latent space features into high-resolution, high-fidelity enhanced images to meet the needs of phenotypic analysis;

[0016] Implementation details: All convolutional layers use 3×3 convolutional kernels with zero padding to maintain consistent size. The activation function is the Gaussian error linear unit (GELU). Model training uses a composite loss function together with the GradNorm dynamic weight balancing strategy.

[0017] In one approach, the image standardization includes: acquiring original visible light images of different varieties of leafy vegetables such as rapeseed, lettuce, and bok choy collected under facility conditions; uniformly scaling the original images to 512×512; and performing pixel normalization to map pixel values to the range [0,1] to obtain standardized leafy vegetable images.
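The standardization step can be illustrated with a minimal NumPy sketch. It assumes 8-bit RGB input and nearest-neighbour resampling; the source does not name the interpolation method, and the function name `standardize` is hypothetical.

```python
import numpy as np

def standardize(image: np.ndarray, size: int = 512) -> np.ndarray:
    """Resize an (H, W, 3) uint8 image to (size, size, 3) and map pixels to [0, 1]."""
    h, w = image.shape[:2]
    # Nearest-neighbour sampling grid (assumption: interpolation is not specified in the source).
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows[:, None], cols[None, :]]
    # Pixel normalization: 8-bit values 0..255 are mapped to the range [0, 1].
    return resized.astype(np.float32) / 255.0

# Usage with a synthetic stand-in for a raw greenhouse image.
raw = np.random.default_rng(0).integers(0, 256, size=(768, 1024, 3), dtype=np.uint8)
x = standardize(raw)
print(x.shape, x.dtype)
```

A production pipeline would more likely use bilinear or bicubic resampling from an imaging library; nearest-neighbour is used here only to keep the sketch dependency-free.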

[0018] In one approach, the multi-dimensional feature extraction includes: extracting complementary features in parallel from a standardized leafy vegetable image obtained through image standardization by using an edge texture feature extraction branch, a frequency domain detail feature extraction branch, a cross-scale context feature extraction branch, and an exposure prior feature extraction branch.

[0019] In one approach, the feature fusion and correction includes: fusing multi-source features such as edge-aware features, frequency-domain detail features, and cross-scale context features to generate a multi-source fused feature vector; encoding a standardized leafy vegetable image using a variational autoencoder to obtain latent space features; and inputting the latent space features as a query, the multi-source fused feature vector as a key, and the exposure prior features as a value into a conditional diffusion network for feature interaction and exposure correction.

[0020] In one scheme, the multi-scale decoding and output includes: inputting the denoised and repaired latent space features into the multi-scale decoder, and gradually reconstructing from the low-scale features to a resolution consistent with the original image;

[0021] The multi-scale decoder is a 4-level cascaded structure, with each level including an upsampling layer and residual dense blocks; preferably, each level includes two layers of residual dense blocks, which are used to gradually restore spatial resolution and enhance detail reconstruction capabilities, and finally output a preliminary reconstructed image and perform color correction through the output layer.

[0022] Subsequently, residual noise is eliminated and color correction is performed on the preliminary reconstructed image to obtain a high-quality leafy vegetable enhanced image. This enhanced image is suitable for subsequent analysis needs such as the extraction of phenotypic indicators (leafy vegetable canopy coverage, vegetation index) and yield prediction.

[0023] In one implementation scheme, the implementation details include: all convolutional layers (except for 1×1 dimensionality reduction layers) use 3×3 convolutional kernels, and the padding method is 'Same' to ensure consistent size; the activation function uses Gaussian error linear unit (GELU) to enhance the model's ability to fit nonlinear patterns to complex illumination distributions.

[0024] The model training employs the GradNorm dynamic weight balancing strategy, and the composite loss function includes reconstruction loss, structural loss, frequency domain consistency loss, and perceptual loss to optimize the enhancement effect.

[0025] In one scheme, the multi-dimensional feature extraction includes: an edge texture feature extraction branch, which extracts edge-aware features by converting the standardized image to grayscale and cascading two layers of depthwise separable convolution with the GELU activation function; a frequency domain detail feature extraction branch, which performs multi-scale decomposition of the standardized image using Haar wavelet basis functions and applies sub-band adaptive weight balancing; a cross-scale context feature extraction branch, which outputs cross-scale context features through multi-scale, multi-branch pooling, depthwise separable convolution, normalization, and non-linear activation; and an exposure prior feature extraction branch, which generates exposure prior features through 3×3 convolution, three layers of residual downsampling, global average pooling, a multilayer perceptron, Sigmoid activation, and multi-scale fusion.
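As a rough illustration of the edge texture branch, the sketch below stacks two depthwise separable convolutions (a 3×3 depthwise convolution followed by a 1×1 pointwise convolution) with GELU on a grayscale image. The channel width of 8, the random weights, the luma coefficients used for grayscale conversion, and all function names are assumptions made for the example, not values from the source.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the Gaussian error linear unit
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def depthwise_conv3x3(x, k):
    """x: (C, H, W); k: (C, 3, 3). One 3x3 kernel per channel, zero 'same' padding."""
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += k[:, i, j, None, None] * xp[:, i:i + h, j:j + w]
    return out

def pointwise_conv(x, w):
    """x: (C_in, H, W); w: (C_out, C_in). A 1x1 convolution that mixes channels."""
    return np.einsum("oc,chw->ohw", w, x)

def edge_branch(rgb, layers):
    """rgb: (H, W, 3) in [0, 1]; layers: list of (depthwise, pointwise) weight pairs."""
    gray = rgb @ np.array([0.299, 0.587, 0.114])  # luma weights (assumption)
    feat = gray[None]                              # (1, H, W)
    for dw, pw in layers:
        feat = gelu(pointwise_conv(depthwise_conv3x3(feat, dw), pw))
    return feat

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(1, 3, 3)), rng.normal(size=(8, 1))),  # layer 1: 1 -> 8 channels
    (rng.normal(size=(8, 3, 3)), rng.normal(size=(8, 8))),  # layer 2: 8 -> 8 channels
]
edge_feat = edge_branch(rng.random((64, 64, 3)), layers)
print(edge_feat.shape)
```

Splitting each convolution into a per-channel spatial filter and a 1×1 channel mixer is what keeps the branch lightweight; in a trained model the weights would be learned rather than sampled.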

[0026] In one scheme, the feature fusion and correction includes: inputting the multi-source fusion feature vector, exposure prior features, and latent space features into a lightweight conditional diffusion backbone network with optimized attention computation; dynamically correcting exposure unevenness and highlight flare under the guidance of the multi-source features and exposure prior features; and outputting the denoised and repaired latent space features.

[0027] Beneficial effects of this invention:

[0028] Based on visible light images of leafy vegetables collected in facility environments, a multi-dimensional feature fusion image enhancement method for facility leafy vegetables is constructed. This method can effectively correct defects such as uneven exposure, highlight spots, and color cast noise caused by light fluctuations and leaf reflectivity. It outputs high-quality enhanced images with uniform exposure, clear texture, and realistic colors, providing reliable data support for the extraction of phenotypic indicators such as leafy vegetable canopy coverage and vegetation index, as well as yield prediction, thereby improving the level of precision management and intelligent decision-making in facility agriculture.

Attached Figure Description

[0029] Figure 1 is a flowchart of the method of the present invention.

Detailed Implementation

[0030] To facilitate understanding of the present invention, a more complete description will be given below with reference to the accompanying drawings. Typical embodiments of the invention are shown in the drawings. However, the invention can be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

[0031] Unless otherwise defined, all technical and scientific terms used in this invention have the same meaning as understood by one of ordinary skill in the art to which this invention pertains. The terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention.

[0032] As shown in Figure 1, this invention takes visible light images of leafy vegetables collected in a facility environment as the processing object. It aims to correct the quality defects of the original images caused by light fluctuations and leaf reflection characteristics, such as uneven exposure, highlight spots, and color cast noise, and to improve the image signal-to-noise ratio and detail fidelity, providing high-quality data support for the subsequent extraction of phenotypic indicators such as leafy vegetable canopy coverage and vegetation index, as well as yield prediction. The method mainly includes the following four steps:
① Image standardization: Input the original visible light image of leafy vegetables collected in the facility, and perform standardization preprocessing such as size unification and pixel normalization to eliminate format differences and pixel fluctuation interference, providing a unified input basis for subsequent feature extraction.
② Multi-dimensional feature extraction: Four parallel branches (edge texture feature extraction branch, frequency domain detail feature extraction branch, cross-scale context feature extraction branch, and exposure prior feature extraction branch) extract edge-aware features, frequency domain detail features, cross-scale context features, and exposure prior features respectively, capturing multi-dimensional complementary features.
③ Feature fusion and correction: A multi-source fusion feature vector is generated through multi-source feature fusion. Together with the latent space features obtained from the variational autoencoder, the multi-source fusion feature vector is used as the key, the exposure prior features as the value, and the latent space features as the query, and all three are input into a lightweight diffusion backbone network. Guided by the multi-source features and the exposure prior, exposure anomalies and highlight spots are dynamically corrected.
④ Multi-scale decoding output: A multi-scale decoder with two layers of residual dense blocks per level progressively upsamples and reconstructs the denoised latent space features, ultimately outputting a high-quality leafy vegetable enhanced image with uniform exposure, clear texture, and realistic colors.
The method flow is shown in Figure 1, and the specific steps are as follows:

[0033] 1) Image Standardization: First, the original visible light images of leafy vegetables collected under facility conditions are acquired. The leafy vegetables include different varieties such as rapeseed, lettuce, and bok choy. Due to factors such as greenhouse lighting and leaf characteristics, the original images have defects such as overexposure / underexposure, specular highlights, and color distortion. Then, the original images are uniformly resized to 512×512. At the same time, pixel normalization is performed to map the pixel values to the range [0,1], eliminating interference caused by differences in image format and fluctuations in pixel values. Finally, a standardized leafy vegetable image is obtained, which serves as the unified input basis for subsequent feature extraction.

[0034] 2) Multi-dimensional Feature Extraction: Based on the standardized image obtained in step 1), complementary features are extracted through four parallel branches, as follows:

① Edge Texture Feature Extraction Branch: Designed to address the differences in leaf textures (smooth and highly reflective in rapeseed, wrinkled and diffusely reflective in lettuce), this branch adapts to different textures to extract fine edge features. The standardized image is converted to grayscale, and edge features are extracted by cascading two depthwise separable convolutions (3×3) with the GELU activation function. This reduces the noise and error of traditional edge extraction methods and outputs edge-aware features that accurately preserve the fine texture structure of the leaves.

② Frequency Domain Detail Feature Extraction Branch: This branch performs multi-scale decomposition of the image using the Haar wavelet basis function. Let the input image be $X$, the low-pass filter be $h = \frac{1}{\sqrt{2}}[1, 1]$, and the high-pass filter be $g = \frac{1}{\sqrt{2}}[1, -1]$. The four subbands obtained from the decomposition are:

$X_{LL} = \downarrow_2\big((h^{T}h) * X\big)$ (low-frequency contour), and $X_{LH} = \downarrow_2\big((h^{T}g) * X\big)$, $X_{HL} = \downarrow_2\big((g^{T}h) * X\big)$, $X_{HH} = \downarrow_2\big((g^{T}g) * X\big)$ (high-frequency details),

where $*$ denotes convolution, ${}^{T}$ denotes matrix transpose, and $\downarrow_2$ denotes a downsampling operation. This decomposition reduces the spatial dimensionality of the image while redistributing image information to different frequency sub-bands, achieving separate representation of low-frequency contours and high-frequency details. To balance the contribution of different frequency sub-bands to the reconstruction, adaptive sub-band weights are introduced:

[0035] $w_k = \sigma\big(\mathrm{MLP}(\mathrm{GAP}(F_k))\big)$, $k \in \{LL, LH, HL, HH\}$, where $F_k$ is the corresponding subband feature, GAP is global average pooling, MLP consists of two fully connected layers, and $\sigma$ is the Sigmoid activation function. The final frequency domain features are reconstructed using the inverse wavelet transform (IDWT): $F_{\text{freq}} = \mathrm{IDWT}\big(w_{LL}F_{LL},\, w_{LH}F_{LH},\, w_{HL}F_{HL},\, w_{HH}F_{HH}\big)$.

③ Cross-scale Context Feature Extraction Branch: Designed for uneven greenhouse illumination (local strong light / shadow / diffuse scattering), this branch captures contextual features from different receptive fields through multi-scale pooling, balancing global illumination uniformity with local details. First, local, balanced, and global features are extracted through multi-branch pooling. After feature mapping and multi-scale fusion, depthwise separable convolution (3×3), normalization, and non-linear activation are applied to output cross-scale context features that combine global illumination uniformity with local detail integrity.

④ Exposure Prior Feature Extraction Branch: Designed for uneven exposure and highlight flare defects in leafy vegetable images, this branch generates exposure guidance features for precise correction. After deep features are obtained through 3×3 convolution and three layers of residual downsampling, they are processed in two parallel paths: one path generates adaptive exposure coefficients through global average pooling + multilayer perceptron + Sigmoid activation; the other path obtains flattened features through multi-scale fusion + adaptive pooling + layer normalization + flattening. The outputs of the two paths are merged, and the final output is the exposure prior feature, which guides the network in repairing abnormally exposed areas.
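The frequency domain branch described above can be sketched in NumPy as a single-level Haar decomposition, per-subband weights from global average pooling, a two-layer MLP and a Sigmoid, and an inverse transform. This is an illustrative sketch under assumptions: only one decomposition level is shown, the MLP weights are random stand-ins for learned parameters, the hidden activation (ReLU) and the LH/HL naming convention are not specified in the source, and all function names are hypothetical.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal Haar DWT of x with shape (C, H, W); H and W must be even."""
    a, b = x[:, 0::2, 0::2], x[:, 0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[:, 1::2, 0::2], x[:, 1::2, 1::2]   # bottom-left, bottom-right
    return {"LL": (a + b + c + d) / 2, "LH": (a - b + c - d) / 2,
            "HL": (a + b - c - d) / 2, "HH": (a - b - c + d) / 2}

def haar_idwt2(s):
    """Inverse of haar_dwt2: rebuilds a (C, 2H, 2W) array from four (C, H, W) subbands."""
    ll, lh, hl, hh = s["LL"], s["LH"], s["HL"], s["HH"]
    ch, h, w = ll.shape
    x = np.empty((ch, 2 * h, 2 * w), dtype=ll.dtype)
    x[:, 0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[:, 0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[:, 1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[:, 1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def subband_weight(f, w1, w2):
    """w = Sigmoid(MLP(GAP(f))): one weight per channel, each in (0, 1)."""
    z = f.mean(axis=(1, 2))                      # global average pooling -> (C,)
    hidden = np.maximum(w1 @ z, 0.0)             # first FC layer + ReLU (assumption)
    return 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # second FC layer + Sigmoid

def frequency_branch(x, mlps):
    sub = haar_dwt2(x)
    weighted = {k: subband_weight(f, *mlps[k])[:, None, None] * f for k, f in sub.items()}
    return haar_idwt2(weighted)

rng = np.random.default_rng(0)
img = rng.random((3, 64, 64))
mlps = {k: (rng.normal(size=(6, 3)), rng.normal(size=(3, 6))) for k in ("LL", "LH", "HL", "HH")}
freq_feat = frequency_branch(img, mlps)
print(freq_feat.shape)
```

Because the Haar transform is orthonormal, the round trip without weighting reproduces the input exactly, so any change in the output comes only from the learned subband weights.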

[0036] 3) Feature Fusion and Correction: The multi-dimensional features extracted in step 2) are fused, and targeted correction is achieved through a conditional diffusion network, specifically as follows:

① Multi-source feature fusion: Edge-aware features, frequency domain detail features, and cross-scale context features are fused to generate a multi-source fused feature vector (which serves as the Key input to the subsequent diffusion network). The fusion is achieved through an efficient conditional fusion mechanism (Cross-KV Attention). First, the edge-aware features $F_{\text{edge}}$, frequency domain detail features $F_{\text{freq}}$, and cross-scale context features $F_{\text{ctx}}$ are concatenated along the channel dimension, and a multi-source fused feature vector is generated through a 1×1 convolution mapping, serving as the Key of the diffusion network: $K = \mathrm{Conv}_{1\times 1}\big(\mathrm{Concat}(F_{\text{edge}}, F_{\text{freq}}, F_{\text{ctx}})\big)$. Simultaneously, the exposure prior features are used as the Value $V$ of the diffusion network, and the latent space features obtained from VAE encoding are used as the Query $Q$. The fused features are:

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right)V$

where $d$ is the feature dimension. The attention mechanism enables the model to dynamically select the edge or frequency domain features most conducive to exposure correction.

② Latent space encoding: The standardized leafy vegetable image is encoded by a variational autoencoder to obtain latent space features (which serve as the Query input for the subsequent diffusion network).

③ Lightweight diffusion backbone network (DiT-s/4): This network is designed for the hardware resource constraints of agricultural scenarios, achieving lightweight and efficient operation while retaining feature processing capability. The multi-source fusion feature vector (Key), exposure prior features (Value), and latent space features (Query) are input into the DiT-s/4 network. The feature dimension of this network is set to 384 and the depth to 10. Flash-Attn2 is used to optimize the efficiency of the attention computation. Under the guidance of the multi-source features and exposure prior, the network dynamically corrects uneven exposure and highlight flare, and outputs the denoised and repaired latent space features.
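The Query/Key/Value roles described in this step follow standard scaled dot-product attention, which can be sketched as below. The token counts (64 query tokens, 16 key/value tokens) and the random inputs are assumptions for illustration; the feature dimension 384 is taken from the text. The sketch also assumes the fused features (Key) and exposure prior features (Value) have already been projected to the same number of tokens, which the source does not spell out.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the row maximum for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_kv_attention(q, k, v):
    """q: (Nq, d) latent tokens; k: (Nk, d) fused features; v: (Nk, d) exposure prior."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)             # (Nq, Nk) similarity of each query to each key
    return softmax(scores, axis=-1) @ v       # (Nq, d) weighted mix of the values

rng = np.random.default_rng(0)
d = 384                                       # feature dimension stated in the text
q = rng.normal(size=(64, d))                  # hypothetical: 64 latent tokens
k = rng.normal(size=(16, d))                  # hypothetical: 16 fused-feature tokens
v = rng.normal(size=(16, d))                  # hypothetical: 16 exposure-prior tokens
out = cross_kv_attention(q, k, v)
print(out.shape)
```

Each output row is a convex combination of the exposure prior tokens, with mixing weights determined by how well the latent token matches the fused edge, frequency, and context features. An optimized kernel such as the Flash-Attn2 implementation named in the text computes the same quantity with lower memory traffic.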

[0037] 4) Multi-scale Decoding and Output: The denoised latent space features are reconstructed into a high-resolution, high-fidelity enhanced image to meet the needs of phenotypic analysis. Specifically:

① Multi-scale decoding and reconstruction: Designed to progressively increase resolution and repair details, addressing highlight blur and texture loss. The denoised latent space features obtained in step 3) are input into a multi-scale decoder containing two layers of residual dense blocks. Starting from low-scale features (32×32), the image is ultimately reconstructed to a resolution consistent with the original image (512×512). Detail repair is enhanced through residual dense connections, and a preliminary reconstructed image is output. The multi-scale decoder uses a combination of 4-level cascaded residual dense blocks (RRDB) and upsampling layers. The specific steps are as follows:
First level: the input 32×32 features are processed by RRDB feature extraction and expanded to 64×64 by nearest neighbor interpolation and convolution.
Second level: the input 64×64 features are processed by RRDB feature extraction and upsampled to 128×128.
Third level: the above process is repeated, upsampling to 256×256.
Fourth level: the features are finally upsampled to 512×512, and color correction is then performed through the output layer.
Each upsampling level includes a residual connection of the form $F_{\text{out}} = F_{\text{in}} + \mathrm{RRDB}(F_{\text{in}})$, which ensures that deep details are not lost.

② Image optimization and output: Residual noise is eliminated and color correction is performed on the preliminary reconstructed image to obtain a high-quality enhanced image of leafy vegetables. This enhanced image can be used for the subsequent extraction of phenotypic indicators such as leafy vegetable canopy coverage and vegetation index, as well as yield prediction.
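The four-level resolution schedule of the decoder (32×32 up to 512×512) can be traced with a small NumPy sketch. It is only a structural illustration: the RRDB feature extractor is replaced by a hypothetical placeholder function, the convolution that follows interpolation in the text is omitted, and the residual connection is assumed to wrap the feature extractor.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of x with shape (C, H, W)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def decoder_level(x, block):
    x = x + block(x)          # residual connection around the feature extractor (assumed form)
    return upsample2x(x)      # the convolution after interpolation is omitted in this sketch

def decode(latent, blocks):
    x = latent
    for block in blocks:      # one block per decoder level
        x = decoder_level(x, block)
    return x

# Hypothetical stand-in for a residual dense block: any shape-preserving function works here.
placeholder_rrdb = lambda f: 0.1 * np.tanh(f)

latent = np.random.default_rng(0).normal(size=(4, 32, 32))
recon = decode(latent, [placeholder_rrdb] * 4)
print(recon.shape)            # spatial size doubles at each of the four levels
```

Four doublings give a 16× increase in side length, which is exactly what is needed to go from a 32×32 latent to the 512×512 standardized input size.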

[0038] Furthermore, the present invention also includes the following settings at the implementation level: ① Convolution parameters: all convolutional layers in the present invention (except for the 1×1 dimensionality reduction layers) adopt 3×3 convolutional kernels, and padding is uniformly set to 'Same' (i.e., zero padding to maintain consistent size). ② Activation function: Gaussian error linear units (GELU) are used to enhance the model's ability to fit the nonlinear patterns of complex lighting distributions. ③ Composite loss function: model training employs the GradNorm dynamic weight balancing strategy, with the total loss $L_{\text{total}}$ defined as: $L_{\text{total}} = \lambda_{1}L_{\text{rec}} + \lambda_{2}L_{\text{str}} + \lambda_{3}L_{\text{freq}} + \lambda_{4}L_{\text{per}}$, where $L_{\text{rec}}$ is the reconstruction loss, $L_{\text{str}}$ is the structural loss, $L_{\text{freq}}$ is the frequency domain consistency loss, $L_{\text{per}}$ is the perceptual loss, and the weights $\lambda_{i}$ are balanced dynamically by GradNorm.
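To make the dynamic weighting concrete, the sketch below shows a weighted sum of four loss values and one simplified GradNorm-style weight update. It assumes the per-loss gradient norms (with respect to a shared layer) have already been computed by the training framework; the hyperparameters `alpha` and `lr`, the example numbers, and the function names are hypothetical and not taken from the source.

```python
import numpy as np

def total_loss(losses, weights):
    """Weighted sum of the four loss terms (reconstruction, structural, frequency, perceptual)."""
    return float(np.dot(weights, losses))

def gradnorm_step(weights, grad_norms, losses, initial_losses, alpha=1.5, lr=0.025):
    """One GradNorm-style update of the loss weights.

    grad_norms[i] is the gradient norm of the unweighted loss i at a shared layer.
    """
    weights = np.asarray(weights, dtype=float)
    g = weights * grad_norms                       # weighted gradient norm per loss
    ratio = losses / initial_losses                # how much of each loss is still left
    r = ratio / ratio.mean()                       # relative inverse training rate
    target = g.mean() * r ** alpha                 # desired gradient norms (held constant)
    grad_w = np.sign(g - target) * grad_norms      # derivative of sum |g_i - target_i| w.r.t. w_i
    new_w = np.clip(weights - lr * grad_w, 1e-6, None)
    return new_w * len(new_w) / new_w.sum()        # renormalise so the weights sum to their count

w = np.ones(4)
grad_norms = np.array([1.0, 1.0, 1.0, 1.0])
losses = np.array([0.9, 0.5, 0.5, 0.5])            # the first loss is converging more slowly
initial = np.ones(4)
new_w = gradnorm_step(w, grad_norms, losses, initial)
print(new_w, total_loss(losses, new_w))
```

In this toy setting the first loss has decreased the least, so its weight is raised relative to the others, which is the balancing behaviour the strategy is meant to provide.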

[0039] Example

[0040] Three common leafy vegetables—rapeseed, lettuce, and bok choy—are grown in a greenhouse. Visible light image acquisition equipment is installed on the greenhouse roof to automatically collect raw visible light images under different lighting conditions, including sunny days with strong sunlight and cloudy days with diffuse scattering, at fixed times each day. This captures images with defects such as uneven exposure, highlighting spots, and color cast noise caused by greenhouse light fluctuations and differences in leaf reflectivity (lettuce wrinkles easily create shadows, bok choy has medium texture, and rapeseed is smooth and easily produces highlights). First, the collected raw images of the three types of leafy vegetables undergo image standardization preprocessing, unifying the image size to 512×512 and normalizing pixel values ​​to the [0,1] range to eliminate format and pixel fluctuation interference. Then, a multi-dimensional feature extraction stage is entered, extracting complementary features through four parallel branches: ① Edge texture feature extraction branch: The standardized image is grayscaled, used only to enhance the structural edge response, without affecting the preservation and utilization of original color information by other branches. The algorithm employs a multi-level design to extract fine edge-aware features. The first branch utilizes depthwise separable convolution and GELU activation to autonomously adapt to the texture differences of three types of leafy vegetables. The second branch extracts frequency-domain detail features by using Haar wavelet transform (DWT) to decompose the image into low-frequency contour sub-bands and high-frequency detail sub-bands. After low-frequency contour enhancement, high-frequency detail repair, and adaptive weight adjustment, sub-band features are fused and compressed. 
Then, the inverse Haar discrete wavelet transform (IDWT) reconstructs the result to repair details lost in highlight areas, outputting the frequency-domain detail features. The third branch extracts cross-scale contextual features: multi-branch pooling yields local, balanced, and global features, which are combined through feature mapping and multi-scale fusion and then passed through depthwise separable convolution, normalization, and non-linear activation. This balances global illumination uniformity against local detail and outputs the cross-scale contextual features. The fourth branch extracts exposure prior features: a 3×3 convolution and three-layer residual downsampling produce deep features, which are processed along two parallel paths (global average pooling followed by a multilayer perceptron, and multi-scale fusion followed by adaptive pooling) and combined with Sigmoid activation, layer normalization, and flattening to generate adaptive exposure coefficients, ultimately outputting the exposure prior features. Next, the edge-aware features, frequency-domain detail features, and cross-scale contextual features are fused into a multi-source fused feature vector. At the same time, a variational autoencoder encodes the standardized image to obtain latent space features. The multi-source fused feature vector is then used as the key, the exposure prior features as the value, and the latent space features as the query, and all three are input into a lightweight diffusion backbone network (DiT-S/4). Guided by the multi-source features and the exposure prior, the network dynamically corrects uneven exposure and highlight flare and outputs denoised and repaired latent space features.
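
The query, key, and value assignment described above can be sketched as a single-head scaled dot-product cross-attention. This is an illustration under stated assumptions, not the attention used inside the diffusion backbone: the text does not give the exact attention form, so standard scaled dot-product attention is assumed, learned projection matrices and multiple heads are omitted, the key and value are assumed to share the same number of tokens, and the names `softmax` and `cross_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, key, value):
    """Single-head scaled dot-product cross-attention.

    query: n_q x d   (here: latent space features)
    key:   n_k x d   (here: multi-source fused features)
    value: n_k x d_v (here: exposure prior features)
    Returns an n_q x d_v list of lists.
    """
    d = len(query[0])
    d_v = len(value[0])
    out = []
    for q in query:
        # Similarity of this query token to every key token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in key]
        weights = softmax(scores)
        # Weighted average of the value tokens.
        out.append([sum(w * v[c] for w, v in zip(weights, value)) for c in range(d_v)])
    return out


# Usage with toy 2-dimensional tokens (values chosen arbitrarily).
latent = [[1.0, 0.0], [0.0, 1.0]]       # query
fused = [[1.0, 0.0], [0.0, 1.0]]        # key
exposure_prior = [[1.0], [3.0]]         # value
corrected = cross_attention(latent, fused, exposure_prior)
```

Each latent token draws most of its value from the exposure prior entry whose fused feature it matches best, which is the sense in which the fused features guide the correction.
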
Finally, the corrected latent space features are input into a multi-scale decoder containing two layers of residual dense blocks, which progressively upsamples from low resolution to the original resolution while suppressing residual noise and color distortion, ultimately outputting a high-quality enhanced visible light image of leafy vegetables with uniform exposure, clear texture, and realistic colors.
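
Two of the earlier steps in this example, pixel normalization and the Haar decomposition with its inverse, can be sketched in pure Python as follows. The sketch rests on assumptions not stated in the text: 8-bit input pixels, a single channel, a single decomposition level, even image dimensions, and the orthonormal Haar scaling. The function names are hypothetical, sub-band naming conventions vary between libraries, and the resize to 512×512 and the learned sub-band enhancement, repair, and weighting between DWT and IDWT are omitted.

```python
def normalize(image_u8):
    """Map 8-bit pixel values (assumed 0..255) to the [0, 1] range."""
    return [[p / 255.0 for p in row] for row in image_u8]


def haar_dwt2(img):
    """Single-level 2D Haar DWT on an image with even height and width.

    Returns (ll, lh, hl, hh): one low-frequency sub-band and three
    high-frequency detail sub-bands, each half the input size.
    """
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r, s = i // 2, j // 2
            ll[r][s] = (a + b + c + d) / 2  # local average (contour)
            lh[r][s] = (a - b + c - d) / 2  # difference across columns
            hl[r][s] = (a + b - c - d) / 2  # difference across rows
            hh[r][s] = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-resolution image."""
    h, w = len(ll), len(ll[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for s in range(w):
            p, x, y, z = ll[r][s], lh[r][s], hl[r][s], hh[r][s]
            img[2 * r][2 * s] = (p + x + y + z) / 2
            img[2 * r][2 * s + 1] = (p - x + y - z) / 2
            img[2 * r + 1][2 * s] = (p + x - y - z) / 2
            img[2 * r + 1][2 * s + 1] = (p - x - y + z) / 2
    return img


# Usage on a toy 4 x 4 image (pixel values chosen arbitrarily).
raw = [[0, 255, 10, 20], [255, 0, 30, 40], [5, 5, 5, 5], [100, 100, 200, 200]]
standardized = normalize(raw)
sub_bands = haar_dwt2(standardized)
reconstructed = haar_idwt2(*sub_bands)
```

With unmodified sub-bands the inverse transform returns the input up to floating-point rounding, so any change in the reconstructed image comes only from how the sub-bands are adjusted between the two transforms.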

[0041] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc.

[0042] It should be understood that the above detailed description of the technical solutions of the present invention with reference to preferred embodiments is illustrative and not restrictive. Those skilled in the art can modify the technical solutions described in the embodiments or make equivalent substitutions for some of the technical features based on reading this specification; however, these modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for enhancing images of facility-grown leafy vegetables by multi-dimensional feature fusion, characterized in that the method includes: image standardization: obtaining original visible light images of leafy vegetables collected in the facility environment, the original images having defects such as overexposure or underexposure, specular highlights, and color distortion caused by greenhouse lighting and leaf characteristics; multi-dimensional feature extraction: based on the standardized image, extracting complementary features through four parallel branches; feature fusion and correction: fusing the extracted multi-dimensional features and achieving targeted correction through a conditional diffusion network; multi-scale decoding and output: reconstructing the denoised latent space features into high-resolution, high-fidelity enhanced images to meet the needs of phenotypic analysis; implementation details: all convolutional layers use 3×3 convolutional kernels with zero padding to maintain consistent size, the activation function is the Gaussian error linear unit (GELU), and the model is trained with a composite loss function under the GradNorm dynamic weight balancing strategy.

2. The method for enhancing images of facility-grown leafy vegetables by multi-dimensional feature fusion according to claim 1, characterized in that: the image standardization includes: acquiring original visible light images of different varieties of leafy vegetables, such as rapeseed, lettuce, and bok choy, collected under facility conditions; uniformly processing the original images to adjust the image size to 512×512; and performing pixel normalization to map the pixel values to the range [0,1], thereby obtaining standardized leafy vegetable images.

3. The method for enhancing images of facility-grown leafy vegetables by multi-dimensional feature fusion according to claim 1, characterized in that: the multi-dimensional feature extraction includes: extracting complementary features in parallel from the standardized leafy vegetable image obtained by image standardization through an edge texture feature extraction branch, a frequency domain detail feature extraction branch, a cross-scale context feature extraction branch, and an exposure prior feature extraction branch.

4. The method for enhancing images of facility-grown leafy vegetables by multi-dimensional feature fusion according to claim 1, characterized in that: the feature fusion and correction includes: fusing multi-source features such as edge-aware features, frequency domain detail features, and cross-scale context features to generate a multi-source fused feature vector; encoding the standardized leafy vegetable image using a variational autoencoder to obtain latent space features; and inputting the latent space features as the query, the multi-source fused feature vector as the key, and the exposure prior features as the value into a conditional diffusion network for feature interaction and exposure correction.

5. The method for enhancing images of facility-grown leafy vegetables by multi-dimensional feature fusion according to claim 1, characterized in that: the multi-scale decoding and output includes: inputting the denoised and repaired latent space features into the multi-scale decoder, and gradually reconstructing from the low-scale features to the same resolution as the original image; the multi-scale decoder is a 4-level cascaded structure, with each level including an upsampling layer and residual dense blocks; preferably, each level includes two layers of residual dense blocks, which are used to gradually restore spatial resolution and enhance detail reconstruction capability, and the output layer finally outputs a preliminary reconstructed image with color correction; subsequently, residual noise is eliminated and color correction is performed on the preliminary reconstructed image to obtain a high-quality enhanced leafy vegetable image; the high-quality enhanced leafy vegetable image is suitable for subsequent analysis such as extraction of phenotypic indicators, including leafy vegetable canopy coverage and vegetation index, and yield prediction.

6. The method for enhancing images of facility-grown leafy vegetables by multi-dimensional feature fusion according to claim 1, characterized in that: the implementation details include: all convolutional layers (except for the 1×1 dimensionality reduction layer) use 3×3 convolutional kernels, and the padding method is 'Same' to ensure consistent size; the activation function is the Gaussian error linear unit (GELU), used to strengthen the model's nonlinear fitting of complex lighting distributions; the model training employs the GradNorm dynamic weight balancing strategy, and the composite loss function includes a reconstruction loss, a structural loss, a frequency domain consistency loss, and a perceptual loss to optimize the enhancement effect.

7. The method for enhancing images of facility-grown leafy vegetables by multi-dimensional feature fusion according to claim 3, characterized in that: the multi-dimensional feature extraction includes: an edge texture feature extraction branch, which includes converting the standardized image to grayscale and connecting two depthwise separable convolutions in series with the GELU activation function to extract edge-aware features; a frequency domain detail feature extraction branch, which includes multi-scale decomposition of the standardized image using Haar wavelet basis functions and adaptive sub-band weight balancing; a cross-scale context feature extraction branch, which includes outputting cross-scale context features through multi-scale pooling, multi-branch pooling, depthwise separable convolution, normalization, and non-linear activation; and an exposure prior feature extraction branch, which includes generating exposure prior features through 3×3 convolution, three-layer residual downsampling, global average pooling, a multilayer perceptron, Sigmoid activation, and multi-scale fusion.

8. The method for enhancing images of facility-grown leafy vegetables by multi-dimensional feature fusion according to claim 4, characterized in that: the feature fusion and correction includes: inputting the multi-source fused feature vector, the exposure prior features, and the latent space features into a lightweight diffusion backbone network; using the lightweight conditional diffusion network to optimize the attention calculation; achieving dynamic correction of uneven exposure and highlight flare under the guidance of the multi-source features and the exposure prior features; and outputting the denoised and repaired latent space features.