A no-reference image quality evaluation model training method
By learning image ranking information through Siamese networks and adding relevance constraints, the no-reference image quality assessment model is optimized, which solves the problems of insufficient generalization ability and inconsistent objectives in existing methods, and achieves image quality assessment with higher accuracy and better generalization.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIHANG UNIV
- Filing Date
- 2023-02-15
- Publication Date
- 2026-06-30
AI Technical Summary
Existing no-reference image quality assessment methods rely on manually designed features, have limited generalization ability, and the regression methods are too restrictive, lacking the utilization of quality relationships between images, resulting in inconsistencies between model training objectives and evaluation methods.
A Siamese network is used to learn the ranking information between images. The no-reference image quality assessment model is trained with a monotonicity loss and a correlation loss computed over image pairs, which optimizes the monotonicity and correlation between the network predictions and the subjective quality ground truth.
It significantly improves the model's accuracy and generalization performance, enabling it to predict image quality relationships more accurately, outperforming traditional regression methods.
Figure CN116129203B_ABST (abstract drawing)
Description
Technical Field
[0001] This invention relates to the field of image quality assessment technology, and more specifically to a method for training a no-reference image quality assessment model.
Background Technology
[0002] No-reference image quality assessment methods have become a research hotspot due to their wide range of applications.
[0003] Currently, most no-reference image quality assessment algorithms are based on regression to complete image quality assessment tasks, and are mainly divided into traditional methods and deep learning-based methods.
[0004] Traditional no-reference image quality assessment models generally consist of two parts: a feature extraction unit and a quality regression model. Depending on the feature extraction method, traditional no-reference image quality assessment models can be categorized into transform domain-based, spatial domain-based, and dictionary-learning-based methods. Based on the regression model, they can be classified into support vector machine-based, probabilistic model-based, and random forest-based methods. However, all of these methods rely on hand-designed features, which typically have limited generalization ability, and the design process requires significant time and extensive expertise, greatly limiting the further development and application of these methods.
[0005] With the successful application of deep learning-based methods in computer vision, some scholars have begun to use deep learning to perform image quality assessment. However, the methods mentioned above all treat image quality assessment as a regression task, using mean squared error (MSE) or mean absolute error (MAE) to directly regress the image quality score. The goal of an image quality assessment algorithm, however, is to obtain an objective quality assessment that is consistent with the subjective quality score; the objective result is not required to match the subjective quality score numerically.
[0006] Therefore, using regression methods is both too restrictive and lacks utilization of the quality relationships between images, which can also lead to inconsistencies between model training objectives and evaluation methods.
[0007] Therefore, how to provide a no-reference image quality assessment method that can overcome the above-mentioned defects is a problem that urgently needs to be solved by those skilled in the art.
Summary of the Invention
[0008] In view of this, the present invention provides a training method for a no-reference image quality assessment model. By considering the monotonicity and correlation between image qualities, the no-reference image quality assessment model is optimized and trained to improve its evaluation accuracy and generalization performance.
[0009] To achieve the above objectives, the present invention adopts the following technical solution:
[0010] A method for training a no-reference image quality assessment model is disclosed. The no-reference image quality assessment model has two identical image quality assessment networks that share parameters. The model is optimized and trained using the following monotonicity loss function between image pairs:
[0011] $l_{rank}(x_1, x_2) = \max(0, f(x_2) - f(x_1) + \varepsilon)$
[0012] In the formula, $x_1$ and $x_2$ are the input images of the two image quality assessment networks, and the true quality value of image $x_1$ is higher than that of image $x_2$; $f(x_1)$ and $f(x_2)$ are the corresponding outputs, and $\varepsilon$ represents the margin.
[0013] Preferably, the input to the no-reference image quality assessment model is a pair of images or labels.
[0014] As a preferred approach, all images within a mini-batch during training are first sorted in descending order of their ground truth quality. Considering all image pairs within this mini-batch, the monotonicity loss of all image pairs is calculated through one forward propagation, using the following formula:
[0015] $L_{rank} = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} l_{rank}(x_i, x_j)$
[0016] In the formula, n represents the number of input images, $L_{rank}$ is the monotonicity loss over all image pairs, and $l_{rank}$ represents the monotonicity loss between any two images.
[0017] Preferably, a matrix M is constructed based on the true quality values of the input images and the outputs of the no-reference image quality assessment model, and $L_{rank}$ is calculated based on the matrix M.
[0018] Preferably, the no-reference image quality assessment model is optimized and trained using the following correlation loss function.
[0019]
[0020] In the formula, n is the number of input images and $y_i$ is the ground-truth value of the i-th image; the formula also uses the average of the ground-truth values of the n images and the average of the outputs of the image quality assessment model over the n images.
[0021] Preferably, two loss functions are used together to optimize and train the no-reference image quality assessment model.
[0022] Preferably, the no-reference image quality assessment model outputs the assessment results using the sigmoid function.
[0023] As can be seen from the above technical solution, this invention discloses a method for training a no-reference image quality assessment model. Compared with existing technologies, this invention learns the ranking information between images through a Siamese network and learns the numerical quantification of quality differences between images by adding an additional linear correlation constraint. The monotonicity and correlation between the network's predicted values and the subjective quality ground truth values are optimized through the two loss functions. The no-reference image quality assessment model trained by this invention can achieve significant improvements in accuracy and generalization performance.
Description of the Drawings
[0024] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0025] Figure 1 is a schematic diagram illustrating the training process of the no-reference image quality assessment model provided by the present invention.
Detailed Description
[0026] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0027] This invention discloses a novel training method for a no-reference image quality assessment model. The no-reference quality assessment model obtained by the training method of this invention can effectively overcome the shortcomings of regression methods, such as strong constraints and inability to utilize the quality correlation between images. It is superior to the models trained by regression methods in terms of model accuracy and generalization performance.
[0028] Specifically, as shown in Figure 1, the evaluation model training method disclosed in this invention is as follows:
[0029] First, in order to directly optimize the quality assessment model during training, this application uses a Siamese network to learn the ranking information between images. That is, the no-reference image quality assessment model has two identical image quality assessment networks that share parameters. The input of the no-reference image quality assessment model is a pair of images or labels. Then, the output of the network and the corresponding loss are calculated by forward propagation, and finally, the model is optimized by backpropagation algorithm.
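The parameter sharing described above can be illustrated with a minimal Python sketch. The linear scorer `score`, the weight vector `w`, and the 4-dimensional feature vectors are hypothetical stand-ins for a real image quality assessment network; the only point is that both branches evaluate the same function with the same parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4)  # hypothetical shared parameters (stand-in for network weights)

def score(x, w):
    """Toy quality predictor: one linear unit applied to a feature vector."""
    return float(np.dot(x, w))

def siamese_forward(x1, x2, w):
    """Both branches call the same function with the same parameters."""
    return score(x1, w), score(x2, w)

x1 = rng.normal(size=4)  # hypothetical features of the first image
x2 = rng.normal(size=4)  # hypothetical features of the second image
f1, f2 = siamese_forward(x1, x2, w)

# Swapping the inputs simply swaps the outputs, because the branches are identical.
g2, g1 = siamese_forward(x2, x1, w)
```

Because there is only one set of parameters, a gradient step computed from a pair updates the single underlying network.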
[0030] Furthermore, the monotonicity loss function between image pairs is calculated using the following formula, and this function is used to optimize and train the no-reference image quality assessment model.
[0031] $l_{rank}(x_1, x_2) = \max(0, f(x_2) - f(x_1) + \varepsilon)$
[0032] In the formula, $x_1$ and $x_2$ are the input images of the two image quality assessment networks, and the true quality value of image $x_1$ is higher than that of image $x_2$; $f(x_1)$ and $f(x_2)$ are the corresponding outputs, and $\varepsilon$ represents the margin.
[0033] The monotonicity between image pairs is an important indicator for evaluating no-reference image quality assessment algorithms, specifically measuring whether the algorithm can correctly predict the relative quality relationships between different images. This invention uses the output values of the no-reference image quality assessment model and the true image quality values to calculate the monotonicity loss between image pairs. From the formula for the monotonicity loss function, it can be seen that when f(x1) exceeds f(x2) by at least ε (for ε = 0, whenever f(x1) is greater than or equal to f(x2)), i.e., the order of the network outputs is consistent with the order of the inputs, the loss is 0; when the order of the network outputs is inconsistent with that of the inputs, the loss is nonzero, and the more f(x2) exceeds f(x1), the greater the loss. Therefore, this loss can measure the monotonicity between the algorithm's output values and the true subjective image quality values.
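The behavior of the pairwise monotonicity loss can be checked with a short Python sketch. The function name `pairwise_rank_loss` and the numeric values are illustrative assumptions; the formula itself is the max(0, f(x2) - f(x1) + ε) expression given above, with x1 being the image of higher ground-truth quality.

```python
def pairwise_rank_loss(f_x1: float, f_x2: float, eps: float = 0.0) -> float:
    """Monotonicity loss for one pair; x1 is the image with the higher true quality."""
    return max(0.0, f_x2 - f_x1 + eps)

# Predicted order agrees with the true order: no loss.
print(pairwise_rank_loss(0.75, 0.25))           # 0.0
# Predicted order is reversed: loss equals the size of the violation.
print(pairwise_rank_loss(0.25, 0.75))           # 0.5
# Order is correct, but the gap (0.25) is smaller than the margin (0.5).
print(pairwise_rank_loss(0.75, 0.5, eps=0.5))   # 0.25
```

The third call shows the role of ε: with a nonzero margin, the better image must be scored higher by at least ε before the loss vanishes.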
[0034] However, a significant drawback of Siamese networks is the large amount of redundant computation during training. For example, considering all possible image pairs composed of three images, in a standard Siamese network implementation, since each image appears in two different image pairs, all three images need to be forward-propagated twice within the network. Because the two branches of a Siamese network are identical, the computational cost of this process is twice the actual requirement.
[0035] To address this issue, we consider all possible image pairs within a mini-batch during training and calculate the loss for all image pairs through a single forward propagation.
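The saving can be made concrete by counting forward passes. The sketch below assumes that a pair-by-pair Siamese implementation runs two forward passes per pair, while the mini-batch approach scores each image once; the function names are hypothetical.

```python
from math import comb

def passes_pairwise(n: int) -> int:
    """Forward passes when every pair is fed separately through the two branches."""
    return 2 * comb(n, 2)

def passes_batched(n: int) -> int:
    """Forward passes when each image in the mini-batch is scored exactly once."""
    return n

for n in (3, 8, 32):
    print(n, passes_pairwise(n), passes_batched(n))
# prints:
# 3 6 3
# 8 56 8
# 32 992 32
```

For three images the pair-by-pair cost is twice the batched cost, matching the example above; under these assumptions the ratio is n - 1, so it grows with the mini-batch size.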
[0036] In one embodiment, assuming there are n images in a mini-batch, all possible image pairs within the mini-batch are considered. The corresponding loss over all image pairs is as follows:
[0037] $L_{rank} = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} l_{rank}(x_i, x_j)$
[0038] In the formula, n represents the number of input images, $L_{rank}$ is the monotonicity loss over all image pairs, and $l_{rank}$ represents the monotonicity loss between any two images.
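As a reference point, the loss over all pairs can be written as a plain double loop. This is a minimal sketch, assuming the images are first sorted in descending order of ground-truth quality and that the per-pair losses are summed without normalization; `batch_rank_loss_loop` and the sample values are hypothetical.

```python
import numpy as np

def batch_rank_loss_loop(preds, targets, eps=0.0):
    """Sum the pairwise monotonicity loss over all pairs in a mini-batch."""
    # Sort predictions so that index 0 is the image with the highest true quality.
    order = np.argsort(-np.asarray(targets, dtype=float), kind="stable")
    f = np.asarray(preds, dtype=float)[order]
    n = len(f)
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # x_i has higher true quality than x_j, so f[i] should exceed f[j].
            total += max(0.0, f[j] - f[i] + eps)
    return total

# Hypothetical mini-batch: true qualities 3 > 2 > 1, predictions partly out of order.
print(batch_rank_loss_loop([0.5, 0.25, 0.75], [3.0, 1.0, 2.0]))  # 0.25
```

The only violated pair is the one whose true qualities are 3 and 2 (predictions 0.5 and 0.75), which contributes 0.25.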
[0039] Furthermore, in this embodiment, a matrix M is constructed based on the true quality values of the input images and the outputs of the no-reference image quality assessment model, and $L_{rank}$ is calculated based on matrix M in order to accelerate the above algorithm.
[0040] Specifically, in order to fully utilize the parallel computing capabilities of modern GPUs, and without loss of generality, this application first sorts the images within a mini-batch in descending order of their quality ground truth, denoted as $X = [x_1, x_2, \ldots, x_n]$. The network output is $f = [f(x_1), f(x_2), \ldots, f(x_n)]$. The following n×n matrix F is then constructed:
[0041] $F = \begin{bmatrix} f(x_1) & f(x_1) & \cdots & f(x_1) \\ f(x_2) & f(x_2) & \cdots & f(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ f(x_n) & f(x_n) & \cdots & f(x_n) \end{bmatrix}$
[0042] Furthermore, matrix M is calculated using the following formula:
[0043] $M = F^T - F + \varepsilon P$
[0044] where P is an n×n matrix whose elements are all 1. Each element of matrix M is as follows:
[0045] $M_{ij} = f(x_j) - f(x_i) + \varepsilon$
[0046] $L_{rank}$ is calculated based on matrix M as follows:
[0047] Let matrix $M_{rank} = \max(0, M)$; the loss value $L_{rank}$ for the entire mini-batch is then the sum of the upper half of matrix $M_{rank}$ (excluding the main diagonal). This calculation can fully utilize the parallel acceleration capabilities of the GPU, significantly improving the calculation speed. Moreover, it can be easily implemented using commonly used deep learning frameworks.
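The matrix form can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the patented implementation: row i of F is taken to hold f(x_i) in every column, so that entry (i, j) of M equals f(x_j) - f(x_i) + ε, and `np.triu(..., k=1)` is used to sum the part above the main diagonal. The function name and sample values are hypothetical.

```python
import numpy as np

def batch_rank_loss_matrix(preds, targets, eps=0.0):
    """Vectorized monotonicity loss over all pairs of a mini-batch."""
    # Sort predictions in descending order of ground-truth quality.
    order = np.argsort(-np.asarray(targets, dtype=float), kind="stable")
    f = np.asarray(preds, dtype=float)[order]
    n = f.shape[0]
    F = np.tile(f[:, None], (1, n))   # F[i, j] = f(x_i)  (assumed layout)
    P = np.ones((n, n))               # all-ones matrix
    M = F.T - F + eps * P             # M[i, j] = f(x_j) - f(x_i) + eps
    M_rank = np.maximum(0.0, M)       # element-wise hinge
    # Entries above the main diagonal are exactly the pairs with i < j.
    return float(np.triu(M_rank, k=1).sum())

print(batch_rank_loss_matrix([0.5, 0.25, 0.75], [3.0, 1.0, 2.0]))           # 0.25
print(batch_rank_loss_matrix([0.5, 0.25, 0.75], [3.0, 1.0, 2.0], eps=0.5))  # 1.0
```

All n predictions come from a single forward pass, and the remaining work is a handful of array operations. In a deep learning framework the same steps would be written with that framework's differentiable tensor operations so that gradients flow back to the network.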
[0048] On the other hand, this application proposes a model optimization method that considers correlation. The correlation between the output values of the image quality assessment model and the ground truth image quality is another important indicator for evaluating no-reference image quality assessment algorithms. In order to optimize this objective during training and enable the network to learn the numerical quantification of quality differences between different images, this invention adds an additional linear correlation constraint so that the output of the Siamese network and the ground truth image quality are linearly correlated. Let the ground truth image quality within a mini-batch be $y = [y_1, y_2, \ldots, y_n]$ and the linear correlation loss for the entire mini-batch be $L_{line}$. The correlation loss function is then calculated as follows:
[0049]
[0050] In the formula, n is the number of input images and $y_i$ is the ground-truth value of the i-th image; the formula also uses the average of the ground-truth values of the n images and the average of the outputs of the image quality assessment model over the n images.
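As an illustration only, and not necessarily the exact formula of the invention, a linear correlation constraint is often written as one minus the Pearson correlation coefficient between predictions and ground truth, which uses the same quantities named above (n, the ground-truth values, and the two averages). The sketch below assumes that form; `linear_correlation_loss` and the stabilizing constant `tiny` are hypothetical.

```python
import numpy as np

def linear_correlation_loss(preds, targets, tiny=1e-12):
    """Assumed form: 1 - Pearson correlation between predictions and ground truth."""
    f = np.asarray(preds, dtype=float)
    y = np.asarray(targets, dtype=float)
    fc = f - f.mean()   # predictions centered on their mini-batch average
    yc = y - y.mean()   # ground truth centered on its mini-batch average
    r = (fc * yc).sum() / (np.sqrt((fc ** 2).sum() * (yc ** 2).sum()) + tiny)
    return float(1.0 - r)

y = np.array([1.0, 2.0, 3.0, 4.0])           # hypothetical ground-truth scores
print(linear_correlation_loss(2 * y + 1, y))  # close to 0: perfectly linear
print(linear_correlation_loss(-y, y))         # close to 2: perfectly anti-correlated
```

Under this assumed form the loss depends only on how linear the relationship is, not on the scale or offset of the predictions, which fits the stated goal of not forcing the outputs to match the subjective scores numerically.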
[0052] The present invention further proposes that, when training a no-reference image quality assessment model, two loss functions are used together to optimize the training of the no-reference image quality assessment model.
[0053] During joint training, the loss function is as follows:
[0054] $L = \alpha L_{rank} + \beta L_{line}$
[0055] In the formula, $L_{rank}$ is the monotonicity loss function, $L_{line}$ is the correlation loss function, and α and β are the weights of the corresponding losses.
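The weighted combination can be written directly. The weight values below are placeholders chosen for the example, since no specific values of α and β are given here.

```python
def joint_loss(l_rank: float, l_line: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted sum of the monotonicity loss and the correlation loss."""
    return alpha * l_rank + beta * l_line

# Hypothetical loss values and weights for one mini-batch.
print(joint_loss(0.25, 0.5, alpha=1.0, beta=2.0))  # 1.25
```

Setting β to 0 recovers training with the monotonicity loss alone, and setting α to 0 recovers training with the correlation loss alone.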
[0056] Furthermore, the last layer of the Siamese network in this application uses a single neuron to predict the quality score of the input image. Since the algorithm in this application does not directly regress the subjective quality score of the image, there is no constraint on the range of the network output value, and the network would otherwise have to learn the range of output values from the distribution of the data. In order to reduce the difficulty of network learning and make the meaning of the network output value clearer, this application passes the output of the last-layer neuron through the sigmoid function to obtain the final output.
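The effect of the sigmoid on the output range can be seen in a few lines. The raw values below are hypothetical outputs of the last-layer neuron.

```python
import numpy as np

def sigmoid(z):
    """Map unbounded raw outputs to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

raw = np.array([-4.0, -1.0, 0.0, 2.0, 6.0])  # hypothetical unbounded neuron outputs
scores = sigmoid(raw)
print(scores)
```

Because the sigmoid is strictly increasing, it bounds the outputs without changing their order, so the ranking learned through the monotonicity loss is preserved.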
[0057] This invention addresses the goal of image quality assessment by providing a novel method for training a no-reference image quality assessment task. This method learns the ranking information between images through a Siamese network, optimizing the monotonicity between network predictions and subjective quality ground truth values. Simultaneously, it proposes an efficient method for training the Siamese network and adds additional linear correlation constraints based on this method to learn the numerical quantification of quality differences between images, optimizing the correlation between algorithm predictions and subjective quality ground truth values. The no-reference image quality assessment model trained by this invention outperforms models trained using regression methods in both model accuracy and generalization performance.
[0058] To fully illustrate its beneficial effects, this application demonstrates them through the following experiments:
[0059] This invention selected four representative image quality assessment datasets (KonIQ-10k, LIVE-C, BID, and LIVE) for experiments, and used Spearman's rank-order correlation coefficient (SROCC) and Pearson's linear correlation coefficient (PLCC) as evaluation metrics.
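The two metrics can be computed with NumPy as in the sketch below. It assumes no tied values when ranking (a full implementation would use average ranks for ties); the function names and sample scores are hypothetical.

```python
import numpy as np

def plcc(pred, target):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(pred, target)[0, 1])

def srocc(pred, target):
    """Spearman rank-order correlation: Pearson correlation of the ranks (no ties assumed)."""
    rank = lambda v: np.argsort(np.argsort(v))
    return plcc(rank(pred), rank(target))

mos = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical subjective scores
pred = mos ** 3                            # monotonic but nonlinear predictions
print(srocc(pred, mos))  # ranking is fully preserved
print(plcc(pred, mos))   # linear agreement is lower
```

SROCC measures only whether the predicted order matches the subjective order, while PLCC also reflects how linear the relationship is; these correspond to the monotonicity and correlation objectives optimized during training.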
[0060] The method of this invention is compared with training using the MSE and MAE loss functions on the four datasets mentioned above. To ensure a fair comparison, all settings are identical except for the loss function used. Each comparison is repeated 10 times, and the median is taken as the final result. The experimental results are shown in Table 1.
[0061] Table 1: Comparison with regression methods on different datasets.
[0062]
[0063] As can be seen from Table 1, on the KonIQ-10k, LIVE-C, and BID datasets, the model trained by this invention performs significantly better than the models obtained by using MSE or MAE as the loss function. On the LIVE dataset, this invention achieves results comparable to MSE and MAE. This indicates that the method proposed in this invention outperforms the regression methods in terms of overall performance and is an effective new approach to no-reference image quality assessment.
[0064] Furthermore, the generalization performance of the regression methods and the proposed method was tested on the KonIQ-10k, LIVE-C, and BID datasets. Specifically, the proposed method and the regression methods were each trained on one of the three datasets and then tested on the other two datasets. The experimental results are shown in Table 2.
[0065] Table 2: Comparison of generalization performance with regression methods across datasets.
[0066]
[0067] As shown above, the method in this application achieved the best results in four of the six cross-dataset validation experiments, indicating that the generalization ability of this method is better than that of the regression methods.
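The six cross-dataset experiments follow from training on one of the three datasets and testing on each of the other two. A minimal sketch of that enumeration, using only the dataset names given above:

```python
from itertools import permutations

datasets = ["KonIQ-10k", "LIVE-C", "BID"]

# Every ordered (train, test) pair with different datasets.
experiments = list(permutations(datasets, 2))
for train, test in experiments:
    print(f"train on {train}, test on {test}")

print(len(experiments))  # 6
```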
[0068] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus disclosed in the embodiments, since they correspond to the methods disclosed in the embodiments, the description is relatively simple; relevant parts can be referred to the method section.
[0069] The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A method for training a no-reference image quality assessment model, characterized in that the no-reference image quality assessment model has two identical image quality assessment networks that share parameters, and the no-reference image quality assessment model is optimized and trained using two loss functions. The model is optimized and trained using the following monotonicity loss function between image pairs: $l_{rank}(x_1, x_2) = \max(0, f(x_2) - f(x_1) + \varepsilon)$; in the formula, $x_1$ and $x_2$ are the input images of the two image quality assessment networks, the true quality value of image $x_1$ is higher than that of image $x_2$, $f(x_1)$ and $f(x_2)$ are the quality assessment results output by the two image quality assessment networks, and $\varepsilon$ represents the margin. Considering all possible image pairs within a mini-batch during training, the loss for all image pairs is calculated through one forward propagation, which includes constructing a matrix M based on the true quality values of the input images and the corresponding outputs of the no-reference image quality assessment model, and calculating $L_{rank}$ based on the matrix M: the images within a mini-batch are sorted in descending order of their quality ground truth, and the following n×n matrix F is constructed, $F = \begin{bmatrix} f(x_1) & \cdots & f(x_1) \\ \vdots & \ddots & \vdots \\ f(x_n) & \cdots & f(x_n) \end{bmatrix}$, where $f(x_i)$ is the output of the no-reference image quality assessment model; the matrix M is calculated according to the following formula: $M = F^T - F + \varepsilon P$, where P is an n×n matrix whose elements are all 1, and each element of the matrix M is $M_{ij} = f(x_j) - f(x_i) + \varepsilon$; $L_{rank}$ is calculated based on the matrix M as follows: let matrix $M_{rank} = \max(0, M)$, and the loss value $L_{rank}$ of the entire mini-batch is the sum of the upper half of matrix $M_{rank}$. The no-reference image quality assessment model is also optimized and trained using a correlation loss function, in whose formula n is the number of input images, $y_i$ is the ground-truth value of the i-th image, and the average of the ground-truth values of the n images and the average of the outputs of the image quality assessment model over the n images are used.
2. The method for training a no-reference image quality assessment model according to claim 1, characterized in that the input to the no-reference image quality assessment model is a pair of images or labels.
3. The method for training a no-reference image quality assessment model according to claim 1, characterized in that, after the input images are sorted according to their ground truth image quality, the monotonicity loss of all possible image pairs is calculated through one forward propagation, using the following formula: $L_{rank} = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} l_{rank}(x_i, x_j)$; in the formula, n represents the number of input images, $L_{rank}$ is the monotonicity loss over all image pairs, and $l_{rank}$ represents the monotonicity loss between any two images.
4. The method for training a no-reference image quality assessment model according to claim 3, characterized in that a matrix M is constructed based on the true quality values of the input images and the outputs of the no-reference image quality assessment model, and $L_{rank}$ is calculated based on the matrix M.
5. The method for training a no-reference image quality assessment model according to claim 1, characterized in that the no-reference image quality assessment model outputs the assessment results through the sigmoid function.