Ranking learning based no-reference hyperspectral image quality assessment method and device

By using an S-Transformer network based on ranking learning to acquire deep features and calculate Wasserstein distance, the problem of poor generalization ability of handcrafted features in existing technologies is solved, thereby improving the accuracy of hyperspectral image quality assessment.

CN117876317B · Active · Publication Date: 2026-06-19 · XIDIAN UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: XIDIAN UNIV
Filing Date: 2024-01-08
Publication Date: 2026-06-19

Abstract

This invention provides a no-reference hyperspectral image quality assessment method and apparatus based on ranking learning. The method inputs the restored hyperspectral image to be evaluated into a pre-trained S-Transformer network to obtain deep features and their probability distribution. The distance between the probability distribution of the deep features and a baseline distribution is calculated and used as an evaluation metric to assess the distortion level of the restored hyperspectral image. The S-Transformer network is pre-trained on the task of evaluating the relative quality of paired images. During pre-training, the network calculates self-attention along the spectral dimension to mine quality-related deep features. The assessment fully considers translation and scaling transformations of the distribution and uses the degree of shift in the distribution of the deep features of the distorted image as a reference for measuring image quality, making it better suited to hyperspectral image quality assessment and yielding better evaluation results.

Description

Technical Field

[0001] This invention belongs to the field of image quality assessment technology, and specifically relates to a no-reference hyperspectral image quality assessment method and apparatus based on ranking learning.

Background Technology

[0002] Hyperspectral images contain detailed scene representation information and are widely used in fields such as remote sensing and object detection. Coded aperture snapshot spectral imaging systems enable rapid imaging, and the two-dimensional measurement matrix they capture can theoretically be reconstructed into a three-dimensional hyperspectral image. However, in real-world tasks only the two-dimensional measurement matrix is available, without a true three-dimensional spectral image, making it impossible to calculate full-reference image quality evaluation metrics. Existing research typically evaluates the performance of reconstruction algorithms on simulated datasets. However, evaluation results on simulated datasets cannot represent the performance of algorithms in real-world tasks.

[0003] The existing method for no-reference hyperspectral image quality assessment (Jingxiang Y, Yongqiang Z, Chen Y, et al. No-Reference Hyperspectral Image Quality Assessment via Quality-Sensitive Features Learning[J]. Remote Sensing, 2017, 9(4):305. DOI:10.3390/rs9040305.) is based on statistical handcrafted features. However, handcrafted features have poor representation and generalization capabilities, and the selection process is cumbersome. Furthermore, when measuring the difference between feature vector distributions, that method uses an improved Bhattacharyya distance, which measures similarity by calculating the degree of overlap between two distributions, whereas translation and scaling transformations also have a significant impact on the difference between feature vector distributions. Therefore, the existing method is not effective for evaluating hyperspectral images.

Summary of the Invention

[0004] To address the aforementioned problems in the existing technology, this invention provides a no-reference hyperspectral image quality assessment method and apparatus based on ranking learning. The technical problem to be solved by this invention is addressed through the following technical solution:

[0005] In a first aspect, the present invention provides a no-reference hyperspectral image quality assessment method based on ranking learning, comprising:

[0006] S100, acquire a restored hyperspectral image to be evaluated;

[0007] S200, Input the restored hyperspectral image into a pre-trained S-Transformer network to obtain deep features and probability distribution;

[0008] The pre-trained S-Transformer network is obtained by training a pre-defined S-Transformer network by taking the evaluation of the quality of paired images on the distortion level dataset as the pre-training task; during the pre-training process, the pre-defined S-Transformer network calculates self-attention along the spectral dimension to mine quality-related deep features.

[0009] S300, calculate the distance between the probability distribution of the deep features and the baseline distribution, and use this distance as an evaluation index to evaluate the degree of distortion of the restored hyperspectral image.

[0010] In a second aspect, the present invention provides a no-reference hyperspectral image quality assessment device based on ranking learning, comprising:

[0011] An acquisition device configured to acquire a restored hyperspectral image to be evaluated;

[0012] An extraction device configured to input the restored hyperspectral image into a pre-trained S-Transformer network to obtain deep features and a probability distribution;

[0013] The pre-trained S-Transformer network is obtained by training a pre-defined S-Transformer network by taking the evaluation of the quality of paired images on the distortion level dataset as the pre-training task; during the pre-training process, the pre-defined S-Transformer network calculates self-attention along the spectral dimension to mine quality-related deep features.

[0014] A computing device configured to calculate the distance between the probability distribution of the deep features and a baseline distribution, and to use this distance as an evaluation metric to assess the degree of distortion of the restored hyperspectral image.

[0015] Beneficial effects:

[0016] This invention provides a no-reference hyperspectral image quality assessment method and apparatus based on ranking learning. The method acquires a reconstructed hyperspectral image to be evaluated; inputs the reconstructed hyperspectral image into a pre-trained S-Transformer network to obtain deep features and a probability distribution; and calculates the distance between the probability distribution of the deep features and a baseline distribution, using this distance as an evaluation metric to assess the distortion level of the reconstructed hyperspectral image. The S-Transformer network is pre-trained on the task of evaluating the quality of paired images from a distortion level dataset. During pre-training, the S-Transformer network calculates self-attention along the spectral dimension to mine quality-related deep features. The invention fully considers translation and scaling transformations and uses the degree of shift in the distribution of the deep features of the distorted image as a reference for measuring image quality, making it better suited to hyperspectral image quality assessment and yielding better evaluation performance.

[0017] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

Description of the Drawings

[0018] Figure 1 is a flowchart of the no-reference hyperspectral image quality assessment method based on ranking learning provided by the present invention;

[0019] Figure 2 is a schematic diagram of the process of evaluating restored hyperspectral images without ground truth images, as provided by the present invention;

[0020] Figure 3 is a flowchart of the pre-training process of the S-Transformer network provided by the present invention;

[0021] Figure 4 is a schematic diagram of the S-Transformer network provided by the present invention;

[0022] Figure 5 is a schematic diagram of the multi-head spectral attention layer provided by the present invention;

[0023] Figure 6 is a schematic diagram of the calculation process of spectral self-attention provided by the present invention.

Detailed Implementation

[0024] The present invention will be further described in detail below with reference to specific embodiments, but the implementation of the present invention is not limited thereto.

[0025] Referring to Figure 1 and Figure 2, this invention provides a no-reference hyperspectral image quality assessment method based on ranking learning, comprising:

[0026] S100, acquire a restored hyperspectral image to be evaluated;

[0027] The restored hyperspectral image to be evaluated is a restored hyperspectral image of a new scene without a ground truth image.

[0028] S200, input the restored hyperspectral image into a pre-trained S-Transformer network to obtain deep features and a probability distribution;

[0029] The pre-trained S-Transformer network is obtained by training a pre-defined S-Transformer network by taking the evaluation of the quality of paired images on the distortion level dataset as the pre-training task; during the pre-training process, the pre-defined S-Transformer network calculates self-attention along the spectral dimension to mine quality-related deep features.

[0030] This invention inputs the restored hyperspectral image into the pre-trained S-Transformer network, so that the embedding layer performs embedding processing on the restored hyperspectral image and outputs the result to the first multi-head spectral attention layer. The multi-head spectral attention layer processes this input to obtain the three-dimensional output matrix of the restored hyperspectral image and outputs it to the combination layers. The activated output of the second linear mapping layer of the pre-trained S-Transformer network is used as the deep feature of the restored hyperspectral image, and a probability distribution is fitted to this deep feature. This fitted distribution is used as the probability distribution of the deep features of the restored hyperspectral image.

[0031] It is worth noting that the pre-trained S-Transformer network of this invention includes: embedding layer → multi-head attention layer → 4 combined layers (composed of downsampling layer and multi-head attention layer) → downsampling layer → flattening → 7 linear mapping layers (composed of fully connected + ReLU activation). The probability distribution of the deep features is not obtained from the subsequent network layers, but is fitted from this deep feature vector. The fitting formula is as follows:

[0032]

[0033] During the evaluation phase, i.e., the application phase, the deep feature vector is the output of the second linear mapping layer. The output of the 1×1×1 linear mapping layer is needed during the pre-training phase, but only the output of the second linear mapping layer is needed as the deep feature vector during evaluation, and the probability distribution is fitted from this vector. After the pre-training phase, this invention can use the pre-trained network to evaluate the quality of restored hyperspectral images without ground truth images.
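As a minimal sketch only, and assuming the fit referred to above is a one-dimensional Gaussian estimated from the entries of the deep feature vector (an assumption suggested by the Gaussian baseline distribution described for S300, not a formula given in this text), the fitting step could look as follows. The function name `fit_gaussian` and the sample values are hypothetical.

```python
from statistics import fmean, pstdev


def fit_gaussian(features):
    """Fit a 1-D Gaussian to the entries of a deep feature vector.

    Returns (mean, standard deviation) using the maximum-likelihood
    (population) estimates. The Gaussian form is an assumption.
    """
    if len(features) < 2:
        raise ValueError("need at least two feature values")
    mu = fmean(features)
    sigma = pstdev(features, mu)
    return mu, sigma


# Hypothetical stand-in for the activated output of the second linear mapping layer.
deep_feature = [0.0, 1.0, 2.0, 3.0, 4.0]
mu, sigma = fit_gaussian(deep_feature)
```

The pair `(mu, sigma)` is then what a distance to the baseline distribution would be computed from.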

[0034] Unlike existing solutions that rely on manually designed features, this invention is based on deep features. Deep features can learn higher-level abstract representations from raw data through multi-layered nonlinear transformations. This not only simplifies the process by eliminating the need for manually designed features but has also achieved better performance on tasks in several other domains. This invention designs an S-Transformer network for hyperspectral images and pre-trains it on a ranking task to ensure that the mined deep features are quality-sensitive.

[0035] S300, calculate the distance between the probability distribution of the deep features and the baseline distribution, and use this distance as an evaluation index to evaluate the degree of distortion of the restored hyperspectral image.

[0036] The process of obtaining the baseline distribution in S300 of the present invention includes:

[0037] M original hyperspectral images with ground truth are input into the pre-trained S-Transformer network, and the activated output of the second linear mapping layer of the pre-trained S-Transformer network is taken as the reference deep feature. The probability distribution of the reference deep feature follows a Gaussian distribution;

[0038] wherein this Gaussian distribution is used as the baseline distribution.
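Claim 5 adds that the reference deep features of the M original images are averaged to obtain a benchmark deep feature. A rough sketch of that averaging followed by a Gaussian fit is given below; the element-wise averaging, the population estimates, and the name `baseline_distribution` are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def baseline_distribution(reference_features):
    """Average M reference deep-feature vectors element-wise, then fit a Gaussian.

    Returns the benchmark feature vector and its (mean, std) parameters.
    """
    if not reference_features:
        raise ValueError("need at least one reference feature vector")
    length = len(reference_features[0])
    if any(len(f) != length for f in reference_features):
        raise ValueError("all feature vectors must have the same length")
    # zip(*...) walks the vectors column by column, i.e. one feature index at a time.
    benchmark = [fmean(column) for column in zip(*reference_features)]
    return benchmark, (fmean(benchmark), pstdev(benchmark))


# Two hypothetical reference feature vectors (M = 2).
refs = [[0.0, 2.0, 4.0], [2.0, 4.0, 6.0]]
benchmark, (mu_ref, sigma_ref) = baseline_distribution(refs)
```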

[0039] This step calculates the Wasserstein distance between the probability distribution of the deep features and the baseline distribution, and uses this distance as an evaluation metric to assess the degree of distortion in the restored hyperspectral image.

[0040] The Wasserstein distance measures the difference between the distribution of the restored image and the baseline distribution. Compared to the Bhattacharyya distance, the Wasserstein distance considers not only the overlap between distributions but also their translation and scaling transformations. This allows the Wasserstein distance to capture structural differences between the two distributions more accurately, which is advantageous in nontrivial cases.
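To illustrate the contrast drawn above, the following sketch computes both distances for one-dimensional Gaussian distributions. It assumes the 2-Wasserstein distance, which has a closed form for Gaussians, and the standard (not the "improved") Bhattacharyya distance; the text does not state the order of the Wasserstein distance, and the function names and sample parameters are hypothetical.

```python
import math


def wasserstein2_gaussian(mu1, sigma1, mu2, sigma2):
    """Closed-form 2-Wasserstein distance between two 1-D Gaussians."""
    return math.sqrt((mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2)


def bhattacharyya_gaussian(mu1, sigma1, mu2, sigma2):
    """Standard Bhattacharyya distance between two 1-D Gaussians."""
    var_sum = sigma1 ** 2 + sigma2 ** 2
    mean_term = 0.25 * (mu1 - mu2) ** 2 / var_sum
    scale_term = 0.5 * math.log(var_sum / (2.0 * sigma1 * sigma2))
    return mean_term + scale_term


baseline = (0.0, 1.0)   # hypothetical (mean, std) of the baseline distribution
restored = (3.0, 5.0)   # hypothetical (mean, std) fitted to a restored image
quality_score = wasserstein2_gaussian(*restored, *baseline)
```

With equal standard deviations the Wasserstein value equals the shift of the mean, and with equal means it equals the difference of the standard deviations, which is how translation and scaling enter this metric directly.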

[0041] Referring to Figure 3, the pre-training process of the S-Transformer network of the present invention is as follows:

[0042] S400, hyperspectral images with random Gaussian noise added at σ = 0.05 and σ = 0.20 are regarded as distorted images with noise levels 1 and 2, and hyperspectral images blurred with 3×3 and 5×5 blurring kernels are regarded as distorted images with blur levels 1 and 2. The distorted images are combined into a distortion level dataset.
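A small sketch of how such a level dataset might be assembled is shown below. It assumes image intensities scaled to [0, 1], zero-mean noise, and a mean (box) blurring kernel with edge replication applied to each band independently; the text only gives the noise levels and kernel sizes, so the kernel type, the border handling, and all function names are hypothetical.

```python
import random


def add_gaussian_noise(cube, sigma, rng):
    """Add zero-mean Gaussian noise to every entry of an H x W x C cube."""
    return [[[v + rng.gauss(0.0, sigma) for v in pixel] for pixel in row] for row in cube]


def box_blur(cube, k):
    """Blur each band with a k x k mean kernel, replicating edge pixels."""
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    h, w, c = len(cube), len(cube[0]), len(cube[0][0])
    r = k // 2
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for b in range(c):
                total = 0.0
                for dy in range(-r, r + 1):
                    for dx in range(-r, r + 1):
                        yy = min(max(y + dy, 0), h - 1)  # clamp to the image border
                        xx = min(max(x + dx, 0), w - 1)
                        total += cube[yy][xx][b]
                out[y][x][b] = total / (k * k)
    return out


def build_level_dataset(cube, seed=0):
    """Return distorted copies keyed by (distortion type, level)."""
    rng = random.Random(seed)
    return {
        ("noise", 1): add_gaussian_noise(cube, 0.05, rng),
        ("noise", 2): add_gaussian_noise(cube, 0.20, rng),
        ("blur", 1): box_blur(cube, 3),
        ("blur", 2): box_blur(cube, 5),
    }


# Hypothetical 6 x 6 image with 3 bands.
cube = [[[float((y + x + b) % 2) for b in range(3)] for x in range(6)] for y in range(6)]
dataset = build_level_dataset(cube)
```

Keying the copies by distortion type and level makes it straightforward to draw pairs that share a distortion type but differ in level.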

[0043] S500, images of different levels with the same distortion type are input in pairs into the pre-defined S-Transformer network for pre-training, and a score is output for each image in the input pair.

[0044] Referring to Figure 4, the pre-defined S-Transformer network of this invention includes an embedding layer, multiple multi-head spectral attention layers, multiple downsampling layers, and a fully connected layer. The output of the embedding layer is connected to a multi-head spectral attention layer, whose output is input to multiple sequentially connected combination layers. Each combination layer consists of a downsampling layer and a multi-head spectral attention layer connected sequentially. The output of the last combination layer is connected to a downsampling layer, whose output is connected to the input of the fully connected layer. Each downsampling layer downsamples the 3D output matrix of the preceding multi-head spectral attention layer to obtain a feature matrix with half the spatial resolution, and outputs it to the next multi-head spectral attention layer. The last downsampling layer flattens its own output and outputs it to the fully connected layer. The fully connected layer progressively maps the flattened result of the last downsampling layer, outputting a score for each hyperspectral image.

[0045] In the S-Transformer network, a hyperspectral image with shape H×W×C first enters the embedding layer. The output of the embedding layer passes through the multi-head spectral attention layer and the downsampling layer in the first two stages. The multi-head spectral attention layer does not change the shape of the input, while the downsampling layer halves the spatial resolution and doubles the number of channels.

[0046] The structure of the multi-head spectral attention layer is shown in Figure 5. Each multi-head spectral attention layer is used for:

[0047] a. Normalize the spatial matrix $X_{in} \in \mathbb{R}^{H \times W \times C}$ of each of the B hyperspectral images output by the previous sampling layer to obtain the feature normalization result for each hyperspectral image; where H×W represents the size of the hyperspectral image feature, and $X_{in}$ represents the spatial matrix;

[0048] b. Split the normalized result along the spectral dimension to obtain C two-dimensional matrices, one per channel, each with height H and width W;

[0049] c. Flatten the two-dimensional matrices into $X \in \mathbb{R}^{HW \times C}$; the flattening formula is $\mathrm{Flatten}(X_{in}(:,:,i)) = X(:,i)$, where i denotes the i-th channel;

[0050] d. Calculate the spectral self-attention of the flattened result to obtain the three-dimensional feature matrix, and output it;

[0051] Referring to Figure 6, this step linearly maps the flattened result into a value vector, a key vector, and a query vector, represented as $V = XW^{V}$, $K = XW^{K}$, $Q = XW^{Q}$. V, K, and Q are each split into N heads along the spectral dimension, and the dimension of each head is... where $V = [V_1, \ldots, V_N]$, $K = [K_1, \ldots, K_N]$, $Q = [Q_1, \ldots, Q_N]$. Then the spectral self-attention scores of the N heads in the multi-head spectral attention layer are calculated and output, represented as... where $\sigma_j$ is a learnable adaptive weight coefficient. The outputs of the N heads of the multi-head spectral attention layer are concatenated along the spectral dimension and a positional embedding $f_p(V)$ is added, yielding the three-dimensional feature matrix, represented as... where $W_l$ is a learnable weight matrix and $\mathrm{Head}_j$ is the output of the j-th head.

[0052] e. Perform layer normalization on the three-dimensional feature matrix again to obtain the normalized result of the three-dimensional feature matrix;

[0053] f. Pass the normalized result of the three-dimensional feature matrix sequentially through a 1×1 two-dimensional convolution, a 3×3 channel-wise two-dimensional convolution, and a 1×1 two-dimensional convolution to obtain the three-dimensional output matrix of size H×W×C.

[0054] This step performs another layer normalization on the output of the multi-head attention block; using a 1×1 2D convolutional layer will change the input from H×W×C to H×W×4C; using a 3×3 channel-wise 2D convolutional layer with different convolutional kernels for different spectral dimensions and setting padding to 1 ensures that the spatial dimensions of the feature map remain unchanged, and the output shape is still H×W×4C; using a 1×1 2D convolutional layer will change the output back to the shape of H×W×C.
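Step c of the list above, Flatten(X_in(:,:,i)) = X(:,i), can be sketched on nested lists as follows. Row-major pixel order is an assumption (the text does not state the ordering), and the helper names are hypothetical.

```python
def flatten_spatial(x_in):
    """Flatten an H x W x C cube into an HW x C matrix (row-major pixel order)."""
    return [list(pixel) for row in x_in for pixel in row]


def unflatten_spatial(x, h, w):
    """Inverse of flatten_spatial: rebuild the H x W x C cube."""
    if len(x) != h * w:
        raise ValueError("row count must equal H * W")
    return [[list(x[r * w + col]) for col in range(w)] for r in range(h)]


def channel_column(x, i):
    """Column i of the flattened matrix, i.e. the flattened i-th spectral channel."""
    return [row[i] for row in x]


# Hypothetical 2 x 3 cube with 2 channels; each value encodes its (y, x, band) index.
x_in = [[[y * 100 + x * 10 + b for b in range(2)] for x in range(3)] for y in range(2)]
x_flat = flatten_spatial(x_in)
```

Each column of `x_flat` holds one spectral channel, so attention computed between columns relates channels to each other rather than pixels.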

[0055] Referring to Figure 4, in the three combination layers of the S-Transformer network everything else remains the same, except that the downsampling layer only halves the spatial resolution while keeping the number of channels the same as its input. After sequential processing by the embedding layer and the combination layers of the S-Transformer network, the output feature map is obtained and is then flattened into a one-dimensional vector.
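The shape bookkeeping described in [0045] and [0055] can be traced with a few lines of code. This is a sketch under stated assumptions: the embedding and attention layers keep the shape, there are five downsampling layers in total (four combination layers plus the final one, following [0031]), the first two double the channel count and the remaining three do not, and the 256×256×28 input size is a hypothetical example.

```python
def trace_shapes(h, w, c, double_channels=(True, True, False, False, False)):
    """Trace (H, W, C) through the downsampling layers.

    double_channels[i] says whether downsampling layer i also doubles the
    channel count; every downsampling layer halves the spatial resolution.
    Returns the list of shapes and the length of the flattened vector.
    """
    shapes = [("embedding + attention", (h, w, c))]
    for i, double in enumerate(double_channels, start=1):
        if h % 2 or w % 2:
            raise ValueError("spatial size must be even before each downsampling")
        h, w = h // 2, w // 2
        if double:
            c *= 2
        shapes.append((f"downsampling {i}", (h, w, c)))
    return shapes, h * w * c


shapes, flat_length = trace_shapes(256, 256, 28)
for name, shape in shapes:
    print(name, shape)
```

Under these assumptions the final feature map is 8×8×112, which flattens to a vector of length 7168 before the fully connected layers.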

[0056] Referring to Figure 6, the present invention calculates the spectral self-attention of the flattened result to obtain and output the three-dimensional feature matrix as follows:

[0057] The flattened result is linearly mapped to a value vector, a key vector, and a query vector, represented as $V = XW^{V}$, $K = XW^{K}$, $Q = XW^{Q}$. V, K, and Q are each split into N heads along the spectral dimension, and the dimension of each head is... The spectral self-attention scores of the N heads of the multi-head spectral attention layer are calculated and output; the outputs of the N heads are concatenated along the spectral dimension, and the positional embedding $f_p(V)$ is added to yield the three-dimensional feature matrix.
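The per-head formulas are not given in this text, so the following is only a sketch of one plausible form of spectral self-attention. It assumes a per-head dimension of C/N, an attention map A_j = softmax(σ_j K_jᵀ Q_j) of size (C/N)×(C/N) with the softmax taken over each row, and Head_j = V_j A_j; the positional embedding f_p(V) is omitted. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def spectral_attention(x, w_q, w_k, w_v, w_l, sigmas):
    """x is the HW x C flattened input; returns an HW x C matrix.

    Attention is computed between spectral channels, not between pixels.
    The positional embedding term is left out of this sketch.
    """
    n_heads, c = len(sigmas), len(x[0])
    if c % n_heads:
        raise ValueError("C must be divisible by the number of heads")
    d = c // n_heads  # assumed per-head dimension
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    heads = []
    for j, sigma in enumerate(sigmas):
        cols = slice(j * d, (j + 1) * d)
        q_j = [row[cols] for row in q]
        k_j = [row[cols] for row in k]
        v_j = [row[cols] for row in v]
        scores = matmul(transpose(k_j), q_j)  # d x d channel-to-channel scores
        a_j = softmax_rows([[sigma * s for s in row] for row in scores])
        heads.append(matmul(v_j, a_j))        # HW x d output of head j
    # Concatenate the heads along the spectral dimension, then apply W_l.
    concat = [sum((h[i] for h in heads), []) for i in range(len(x))]
    return matmul(concat, w_l)


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
x = [[1.0, 3.0, 5.0, 7.0], [2.0, 2.0, 4.0, 8.0]]  # HW = 2 pixels, C = 4 channels
out = spectral_attention(x, identity, identity, identity, identity, [0.0, 0.0])
```

Because the attention map is (C/N)×(C/N), its size depends on the number of spectral channels rather than on the number of pixels H×W.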

[0058] S600, calculate the pairwise ranking loss using the score of each image in the pairwise input images, and update the network parameters of the S-Transformer network according to the gradient of the pairwise ranking loss.

[0059] The flattened image feature is progressively mapped to vectors of 1×1×4096, 1×1×4096, 1×1×1000, 1×1×512, 1×1×64, and 1×1×1 through fully connected layers and ReLU activations. The final output f(x; θ) is taken as the network's score for the image. The pairwise ranking loss L(x1, x2; θ) is calculated from the scores of the two paired input images, where x1 and x2 represent paired distorted images of different levels and θ represents the parameters of the neural network: L(x1, x2; θ) = max(0, f(x2; θ) - f(x1; θ) + ε), where ε is a margin that ensures f(x1; θ) and f(x2; θ) are not equal. The gradient of the pairwise ranking loss is then calculated to update the network parameters.

[0060]

[0061] where $\mathrm{rank}_{x_1}$ and $\mathrm{rank}_{x_2}$ represent the true quality levels of the two images, respectively. The above formula is based on the assumption that $\mathrm{rank}_{x_1} > \mathrm{rank}_{x_2}$.
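The loss L(x1, x2; θ) = max(0, f(x2; θ) - f(x1; θ) + ε) and its subgradient with respect to the two scores can be written directly, assuming as above that x1 has the higher rank. The default margin of 1.0 and the function names are hypothetical; the text does not give a value for ε.

```python
def pairwise_ranking_loss(score_x1, score_x2, epsilon=1.0):
    """Hinge ranking loss; zero once f(x1) exceeds f(x2) by at least epsilon."""
    return max(0.0, score_x2 - score_x1 + epsilon)


def pairwise_ranking_grad(score_x1, score_x2, epsilon=1.0):
    """Subgradient of the loss with respect to (f(x1), f(x2))."""
    if score_x2 - score_x1 + epsilon > 0:
        return (-1.0, 1.0)  # push f(x1) up and f(x2) down
    return (0.0, 0.0)       # pair already ranked with enough margin


correct = pairwise_ranking_loss(2.0, 0.5)  # ordered correctly, margin satisfied
wrong = pairwise_ranking_loss(0.5, 2.0)    # ordered the wrong way round
```

Here `correct` is 0.0 and `wrong` is 2.5, so only violated or too-close pairs contribute a gradient.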

[0062] S700, repeat S500 to S600 until the preset number of training iterations is reached, to obtain the pre-trained S-Transformer network.
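The loop formed by S500 to S700 can be sketched with a deliberately tiny stand-in: a linear scorer f(x; θ) = θ·x in place of the S-Transformer, trained by subgradient descent on the pairwise ranking loss. The scorer, learning rate, margin, epoch count, and feature vectors are all hypothetical and serve only to show the control flow.

```python
def score(theta, x):
    """Stand-in scorer f(x; theta): a dot product instead of the S-Transformer."""
    return sum(t * v for t, v in zip(theta, x))


def train(pairs, dim, epochs=50, lr=0.1, epsilon=1.0):
    """pairs holds (x1, x2) with x1 assumed to have the higher rank."""
    theta = [0.0] * dim
    for _ in range(epochs):                       # S700: repeat for a fixed number of iterations
        for x1, x2 in pairs:                      # S500: score both images of a pair
            margin = score(theta, x2) - score(theta, x1) + epsilon
            if margin > 0:                        # S600: loss is positive, so take a gradient step
                # d(loss)/d(theta) = x2 - x1 for a linear scorer
                theta = [t - lr * (b - a) for t, a, b in zip(theta, x1, x2)]
    return theta


# Hypothetical feature pairs; the first element of each pair is the better image.
pairs = [([1.0, 0.0], [0.0, 1.0]), ([2.0, 0.5], [0.5, 2.0])]
theta = train(pairs, dim=2)
```

After training, every pair in this toy set is ranked with at least the margin ε, so further passes leave θ unchanged.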

[0063] This invention proposes a novel deep-feature-based distance metric that can evaluate the quality of reconstructed hyperspectral images when ground-truth hyperspectral images are unavailable. To extract quality-sensitive deep features, a simulation dataset with varying degrees of distortion is constructed. The ranking of two hyperspectral images with different distortion levels is used as a pre-training task to train a Transformer model based on spectral self-attention. This model can capture deep features related to image quality. The distribution of the deep features of a distorted image is shifted relative to that of the original images in the training set. The degree of this shift is taken as a reference for image quality, and the Wasserstein distance is used to calculate the difference between the distribution of the reconstructed image and the baseline distribution. A larger distance indicates a greater difference and a lower-quality reconstructed image; conversely, a smaller Wasserstein distance indicates a less distorted reconstructed image.

[0064] This invention provides a no-reference hyperspectral image quality assessment device based on ranking learning, comprising:

[0065] An acquisition device configured to acquire a restored hyperspectral image to be evaluated;

[0066] An extraction device configured to input the restored hyperspectral image into a pre-trained S-Transformer network to obtain deep features and a probability distribution;

[0067] The pre-trained S-Transformer network is obtained by training a pre-defined S-Transformer network by taking the evaluation of the quality of paired images on the distortion level dataset as the pre-training task; during the pre-training process, the pre-defined S-Transformer network calculates self-attention along the spectral dimension to mine quality-related deep features.

[0068] A computing device configured to calculate the distance between the probability distribution of the deep features and a baseline distribution, and to use this distance as an evaluation metric to assess the degree of distortion of the restored hyperspectral image.

[0069] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.

[0070] Although this application has been described herein in conjunction with various embodiments, those skilled in the art will understand and implement other variations of the disclosed embodiments by reviewing the accompanying drawings, the disclosure, and the appended claims in carrying out the claimed application. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality.

[0071] The above description, in conjunction with specific preferred embodiments, provides a further detailed explanation of the present invention. It should not be construed that the specific implementation of the present invention is limited to these descriptions. For those skilled in the art, various simple deductions or substitutions can be made without departing from the concept of the present invention, and all such modifications and substitutions should be considered within the scope of protection of the present invention.

Claims

1. A ranking learning based no-reference hyperspectral image quality assessment method, characterized in that it comprises:
S100, acquire a restored hyperspectral image to be evaluated;
S200, input the restored hyperspectral image into a pre-trained S-Transformer network to obtain deep features and a probability distribution; the pre-trained S-Transformer network is obtained by training a pre-defined S-Transformer network by taking the evaluation of the quality of paired images on the distortion level dataset as the pre-training task; during the pre-training process, the pre-defined S-Transformer network calculates self-attention along the spectral dimension to mine quality-related deep features;
S300, calculate the distance between the probability distribution of the deep features and the baseline distribution, and use this distance as an evaluation index to evaluate the degree of distortion of the restored hyperspectral image;
the pre-training process of the S-Transformer network is as follows:
S400, hyperspectral images with random Gaussian noise added at σ = 0.05 and σ = 0.20 are regarded as distorted images with noise levels 1 and 2, and hyperspectral images blurred with 3×3 and 5×5 blurring kernels are regarded as distorted images with blur levels 1 and 2; the distorted images are combined into a distortion level dataset;
S500, images of different levels with the same distortion type are input in pairs into the pre-defined S-Transformer network for pre-training, and a score is output for each image in the input pair;
S600, calculate the pairwise ranking loss using the score of each image in the input pair, and update the network parameters of the S-Transformer network according to the gradient of the pairwise ranking loss;
S700, repeat S500 to S600 until the preset number of training iterations is reached, to obtain the pre-trained S-Transformer network;
the pre-defined S-Transformer network includes an embedding layer, multiple multi-head spectral attention layers, multiple downsampling layers, and a fully connected layer; the output of the embedding layer is connected to a multi-head spectral attention layer, and the output of the multi-head spectral attention layer is input to multiple sequentially connected combination layers; each combination layer consists of a downsampling layer and a multi-head spectral attention layer connected sequentially; the output of the last combination layer is connected to a downsampling layer, and the output of that downsampling layer is connected to the input of the fully connected layer.

2. The ranking learning based no-reference hyperspectral image quality assessment method according to claim 1, characterized in that each downsampling layer is used to downsample the 3D output matrix of the previous multi-head spectral attention layer to obtain a feature matrix with half the spatial resolution, and to output it to the next multi-head spectral attention layer; the last downsampling layer flattens its own output and outputs it to the fully connected layer; the fully connected layer is used to progressively map the flattened result of the last downsampling layer, outputting a score for each hyperspectral image.

3. The ranking learning based no-reference hyperspectral image quality assessment method according to claim 2, wherein each multi-head spectral attention layer is used for:
normalizing the spatial matrix $X_{in} \in \mathbb{R}^{H \times W \times C}$ of each of the B hyperspectral images output by the previous sampling layer to obtain the feature normalization result for each hyperspectral image, where H×W represents the size of the hyperspectral image feature and $X_{in}$ represents the spatial matrix;
splitting the normalized result along the spectral dimension to obtain C two-dimensional matrices, each with height H and width W;
flattening the two-dimensional matrices into $X \in \mathbb{R}^{HW \times C}$;
calculating the spectral self-attention of the flattened result to obtain and output a three-dimensional feature matrix;
performing layer normalization on the three-dimensional feature matrix again to obtain the normalized result of the three-dimensional feature matrix;
passing the normalized result of the three-dimensional feature matrix sequentially through a 1×1 two-dimensional convolution, a 3×3 channel-wise two-dimensional convolution, and a 1×1 two-dimensional convolution to obtain the H×W×C three-dimensional output matrix.

4. The ranking learning based no-reference hyperspectral image quality assessment method according to claim 3, wherein calculating the spectral self-attention of the flattened result to obtain and output the three-dimensional feature matrix comprises: linearly mapping the flattened result to a value vector V, a key vector K, and a query vector Q, denoted as $V = XW^{V}$, $K = XW^{K}$, $Q = XW^{Q}$; splitting V, K, and Q into N heads along the spectral dimension, the dimension of each head being...; calculating and outputting the spectral self-attention scores of the N heads of the multi-head spectral attention layer; and concatenating the outputs of the N heads of the multi-head spectral attention layer along the spectral dimension and adding a positional embedding to obtain the three-dimensional feature matrix.

5. The ranking learning based no-reference hyperspectral image quality assessment method according to claim 1, characterized in that the process of obtaining the baseline distribution in S300 comprises: inputting M original hyperspectral images with ground truth into the pre-trained S-Transformer network; taking the activated output of the second linear mapping layer of the pre-trained S-Transformer network as the reference deep feature, the probability distribution of the reference deep feature satisfying a Gaussian distribution; averaging the reference deep features of the M original hyperspectral images to obtain the benchmark deep feature; and using the probability distribution of the benchmark deep feature as the baseline distribution.

6. The ranking learning based no-reference hyperspectral image quality assessment method according to claim 2, characterized in that S200 comprises: inputting the restored hyperspectral image into the pre-trained S-Transformer network, so that the embedding layer performs embedding processing on the restored hyperspectral image and outputs the result to the first multi-head spectral attention layer; obtaining the three-dimensional output matrix of the restored hyperspectral image through the processing of the multi-head spectral attention layer and outputting it to the combination layers; using the activated output of the second linear mapping layer of the pre-trained S-Transformer network as the deep feature of the restored hyperspectral image; and obtaining the probability distribution of the deep feature of the restored hyperspectral image through the multiple combination layers, the last downsampling layer, and the fully connected layer.

7. The ranking learning based no-reference hyperspectral image quality assessment method according to claim 1, characterized in that S300 comprises: calculating the Wasserstein distance between the probability distribution of the deep features and the baseline distribution, and using this distance as an evaluation metric to assess the degree of distortion of the restored hyperspectral image.

8. A ranking learning based no-reference hyperspectral image quality assessment device, characterized in that it comprises:
an acquisition device configured to acquire a restored hyperspectral image to be evaluated;
an extraction device configured to input the restored hyperspectral image into a pre-trained S-Transformer network to obtain deep features and a probability distribution; the pre-trained S-Transformer network is obtained by training a pre-defined S-Transformer network by taking the evaluation of the quality of paired images on the distortion level dataset as the pre-training task; during the pre-training process, the pre-defined S-Transformer network calculates self-attention along the spectral dimension to mine quality-related deep features;
a computing device configured to calculate the distance between the probability distribution of the deep features and a baseline distribution, and to use this distance as an evaluation metric to assess the degree of distortion of the restored hyperspectral image;
the pre-training process of the S-Transformer network is as follows:
S400, hyperspectral images with random Gaussian noise added at σ = 0.05 and σ = 0.20 are regarded as distorted images with noise levels 1 and 2, and hyperspectral images blurred with 3×3 and 5×5 blurring kernels are regarded as distorted images with blur levels 1 and 2; the distorted images are combined into a distortion level dataset;
S500, images of different levels with the same distortion type are input in pairs into the pre-defined S-Transformer network for pre-training, and a score is output for each image in the input pair;
S600, calculate the pairwise ranking loss using the score of each image in the input pair, and update the network parameters of the S-Transformer network according to the gradient of the pairwise ranking loss;
S700, repeat S500 to S600 until the preset number of training iterations is reached, to obtain the pre-trained S-Transformer network;
the pre-defined S-Transformer network includes an embedding layer, multiple multi-head spectral attention layers, multiple downsampling layers, and a fully connected layer; the output of the embedding layer is connected to a multi-head spectral attention layer, and the output of the multi-head spectral attention layer is input to multiple sequentially connected combination layers; each combination layer consists of a downsampling layer and a multi-head spectral attention layer connected sequentially; the output of the last combination layer is connected to a downsampling layer, and the output of that downsampling layer is connected to the input of the fully connected layer.