Methods and apparatus for image quality detection

CN116309246B · Active · Publication Date: 2026-06-30 · GONGDAO NETWORK TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
GONGDAO NETWORK TECH CO LTD
Filing Date
2022-09-07
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, image noise affects the accuracy of image information extraction. How to improve the accuracy of image quality detection is an urgent technical problem to be solved.

Method used

By scaling the image to be detected to various preset sizes, cutting it into image blocks, and using the feature extraction layer, attention layer, and score calculation layer of the image quality detection model, the quality score of the image is calculated by mimicking human eye observation.

Benefits of technology

It improves the accuracy of image quality detection, making the detection results closer to human subjective recognition and enhancing the accuracy of image information extraction.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN116309246B_ABST

Abstract

The specification discloses a method and apparatus for image quality detection. The method includes: acquiring an image to be detected; scaling the image to be detected to multiple preset sizes to obtain scaled images corresponding to each size; dividing each scaled image into several image blocks; inputting the image blocks into a trained image quality detection model, and outputting a quality score for the image to be detected through the image quality detection model; wherein the image quality detection model includes a feature extraction layer, an attention layer, and a score calculation layer. This technical solution makes the image quality score calculated by the model closer to the quality score perceived by human subjective judgment, thereby improving the accuracy of image quality detection.

Description

Technical Field

[0001] This specification relates to the field of image processing, and in particular to a method and apparatus for image quality detection.

Background

[0002] Image information extraction is widely used in daily life, for example, extracting user identification information from uploaded ID photos. However, due to factors such as the user's camera, the shooting environment, and the image transmission method, images often contain significant noise, that is, a high noise density. Excessive image noise reduces the accuracy of subsequent image information extraction.

[0003] Currently, before extracting image information, the image quality can be detected first, and information can only be extracted from images that meet the quality standards. How to improve the accuracy of image quality detection is a technical problem that urgently needs to be solved.

Summary of the Invention

[0004] In view of this, this specification provides a method and apparatus for image quality detection.

[0005] Specifically, this specification is implemented through the following technical solution:

[0006] Firstly, this specification proposes a method for image quality detection, which includes:

[0007] Acquire the image to be detected;

[0008] The image to be detected is scaled to multiple preset sizes to obtain a scaled image corresponding to each size;

[0009] For each scaled image, the scaled image is cut into several image blocks;

[0010] The image blocks cut from each scaled image are input into a trained image quality detection model, and the image quality detection model outputs the quality score of the image to be detected.

[0011] The image quality detection model includes a feature extraction layer, an attention layer, and a score calculation layer. The feature extraction layer extracts image features, block location features, and size fusion features of each image block, and determines the image block features of each image block based on the image features, block location features, and size fusion features. The attention layer uses an attention mechanism to determine the attention score between image blocks, and determines the comprehensive image features of the image to be detected based on the attention score and the image block features. The score calculation layer calculates the quality score of the image to be detected based on the comprehensive image features.

[0012] Optionally, the process of the feature extraction layer extracting the block location features of each image block includes:

[0013] Determine the initial position features of each image block;

[0014] For each image block at the reference size, the initial position features of the image block are determined as the block position features of the image block;

[0015] For each image block at the non-reference size, the initial position features of the image block are mapped to the reference size to obtain the block position features of the image block.

[0016] Optionally, the process by which the feature extraction layer determines the image block features of each image block based on the image features, block location features, and size fusion features includes:

[0017] The size fusion feature is decomposed into sub-size features corresponding to each size, and the sub-size features include the block size features of each image block under the corresponding size;

[0018] For each image block, the image block features are determined based on the image features, block position features, and block size features of the image block.

[0019] Optionally, scaling the image to be detected to multiple preset sizes to obtain a scaled image corresponding to each size includes:

[0020] The image to be detected is adjusted based on a preset reference size to obtain an image to be detected with the reference size;

[0021] Based on multiple preset scaling factors, the image to be detected at the reference size is scaled to obtain a scaled image corresponding to each size.

[0022] Optionally, the training process of the image quality detection model includes:

[0023] Acquire sample images, which have quality score labels;

[0024] The sample image is scaled to multiple preset sizes to obtain a scaled sample image for each size.

[0025] For each scaled sample image, the scaled sample image is cut into several sample image blocks;

[0026] The sample image blocks are input into the image quality detection model to obtain the predicted quality score of the sample image output by the image quality detection model;

[0027] Calculate the difference between the predicted quality score and the quality score label of the sample image;

[0028] The parameters of the image quality detection model are updated based on the difference.
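The training steps above can be illustrated with a minimal sketch. The text does not specify the loss function or the optimizer, so squared error and plain gradient descent are assumptions here, and `predict` is a hypothetical linear scorer standing in for the full image quality detection model.

```python
def predict(weights, features):
    """Hypothetical linear scorer standing in for the full model."""
    return sum(w * x for w, x in zip(weights, features))


def train_step(weights, features, label, lr=0.01):
    """One gradient-descent step on the squared error 0.5 * (pred - label) ** 2."""
    diff = predict(weights, features) - label   # predicted score minus quality score label
    # The gradient of 0.5 * diff**2 with respect to w_k is diff * x_k.
    new_weights = [w - lr * diff * x for w, x in zip(weights, features)]
    return new_weights, 0.5 * diff ** 2


if __name__ == "__main__":
    weights = [0.0, 0.0]
    for _ in range(5):
        weights, loss = train_step(weights, features=[1.0, 1.0], label=2.0, lr=0.25)
        print(round(loss, 4))   # the loss shrinks as the prediction approaches the label
```

In the real model the gradient would be backpropagated through all three layers; the toy scorer only shows the difference-then-update loop.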

[0029] Optionally, the process of determining the quality score label includes:

[0030] For each sample image, multiple quality scores are obtained, wherein different quality scores are determined by different scorers;

[0031] By combining the multiple quality scores, the quality score label of the sample image is obtained.
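A common way to combine several raters' scores is to average them into a mean opinion score. The text only says the scores are combined, so both the averaging and the optional `trim` parameter (dropping the most extreme ratings) in this sketch are assumptions, and `quality_label` is a hypothetical name.

```python
from statistics import mean


def quality_label(scores, trim=0):
    """Combine several raters' scores for one sample image into a single label.

    trim: number of lowest and highest scores to drop before averaging
    (a hypothetical option for damping outlier raters).
    """
    if len(scores) <= 2 * trim:
        raise ValueError("not enough scores left after trimming")
    kept = sorted(scores)[trim:len(scores) - trim]
    return mean(kept)


print(quality_label([6, 7, 8]))                  # plain average of three raters
print(quality_label([2, 7, 7, 8, 10], trim=1))   # drops the 2 and the 10 first
```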

[0032] Optionally, the method further includes:

[0033] Determine whether the quality score of the image to be detected is less than a preset quality score threshold;

[0034] When the quality score is less than the preset quality score threshold, a quality failure message is output.

[0035] Secondly, this specification also provides an image quality detection apparatus, the apparatus comprising:

[0036] The image acquisition module is used to acquire the image to be detected;

[0037] The image scaling module is used to scale the image to be detected to multiple preset sizes to obtain a scaled image corresponding to each size;

[0038] An image segmentation module is used to segment each scaled image into several image blocks;

[0039] The image quality detection module is used to input the image blocks cut from each scaled image into a trained image quality detection model, and to output the quality score of the image to be detected through the image quality detection model.

[0040] The image quality detection model includes a feature extraction layer, an attention layer, and a score calculation layer. The feature extraction layer extracts image features, block location features, and size fusion features of each image block, and determines the image block features of each image block based on the image features, block location features, and size fusion features. The attention layer uses an attention mechanism to determine the attention score between image blocks, and determines the comprehensive image features of the image to be detected based on the attention score and the image block features. The score calculation layer calculates the quality score of the image to be detected based on the comprehensive image features.

[0041] Thirdly, this specification also provides an electronic device comprising:

[0042] a processor;

[0043] a memory configured to store machine-executable instructions;

[0044] The processor executes the executable instructions to implement the steps of the method described above.

[0045] Fourthly, this specification also provides a computer-readable storage medium having computer instructions stored thereon that, when executed by a processor, implement the steps of the method described above.

[0046] By employing the above technical solution, the image to be detected can first be scaled to various preset sizes, and then each scaled image can be cut into image blocks and input into the image quality detection model. The feature extraction layer of the image quality detection model extracts the image block features of each image block, and then the attention layer determines the attention score between image blocks through the attention mechanism. Based on the image block features and the attention score, the comprehensive image features of the image to be detected are determined. The attention mechanism enables the image quality detection model to mimic human eye observation of images, reducing the influence of factors such as image block position on the model calculation results. Subsequently, the quality score of the image to be detected is calculated based on the comprehensive image features, making the image quality score calculated by the image quality detection model closer to the image quality score subjectively recognized by humans, thereby improving the accuracy of image quality detection.

Description of the Drawings

[0047] Figure 1 is a schematic flowchart of an image quality detection method according to an exemplary embodiment of this specification.

[0048] Figure 2 is a structural block diagram of an image quality detection model according to an exemplary embodiment of this specification.

[0049] Figure 3 is a schematic diagram of the image block matrices obtained by cutting three scaled images according to an exemplary embodiment of this specification.

[0050] Figure 4 is a schematic flowchart of the training process of an image quality detection model according to an exemplary embodiment of this specification.

[0051] Figure 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of this specification.

[0052] Figure 6 is a block diagram of an image quality detection apparatus according to an exemplary embodiment of this specification.

Detailed Description

[0053] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this specification. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this specification as detailed in the appended claims.

[0054] The terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to be limiting of this specification. The singular forms “a,” “an,” and “the” as used in this specification and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and/or” as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.

[0055] It should be understood that although the terms first, second, third, etc., may be used in this specification to describe various information, this information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," or "in response to determining."

[0056] With the rapid development of artificial intelligence, the application of images is becoming increasingly widespread. For example, in cross-border e-commerce, images of user-uploaded documents can be used to extract customs clearance information, or in the process of document digitization, user-uploaded document images can be converted into electronic documents for storage and use. The quality of user-uploaded images has a significant impact on subsequent image information extraction steps. Therefore, to increase the accuracy of image information extraction, image quality checks can be performed on user-uploaded images before extraction.

[0057] Figure 1 is a schematic flowchart of an image quality detection method according to an exemplary embodiment of this specification.

[0058] Referring to Figure 1, the image quality detection method may include the following steps:

[0059] Step 102: Obtain the image to be detected.

[0060] In this specification, the image to be detected can be an image uploaded by the user that requires image quality detection.

[0061] For example, when users upload ID photos while shopping on e-commerce platforms, the platforms can obtain user identity information from the ID photos for subsequent customs clearance. To ensure accurate acquisition of user identity information, the platforms can perform image quality checks on the uploaded ID photos.

[0062] For example, the image to be detected can also be a document image uploaded by a user during digital office work. Other users can view the document image to obtain relevant information. To ensure that other users can accurately obtain the information in the document image, image quality detection can be performed on the document image uploaded by the user.

[0063] The above application scenarios are merely illustrative examples. The image quality detection scheme described in this specification can also be applied to other scenarios that require image quality detection, and this specification does not impose any special limitations on them.

[0064] Step 104: Scale the image to be detected to multiple preset sizes to obtain a scaled image corresponding to each size.

[0065] In this specification, the image to be detected can be adjusted based on a preset reference size, for example, the image to be detected can be adjusted to an image resolution of 640*640. The preset reference size can be determined according to the scenario in which the image quality detection is applied. For example, when the image to be detected is an ID photo, the preset reference size can be 640*640, which means that the number of pixels in the height and width directions of the image is 640. When the image to be detected is a document, the preset reference size can be 1280*1280, etc.

[0066] Users' capture devices differ in resolution, and different platforms impose different size requirements on uploaded images, so the images to be detected obtained in step 102 vary in size. Adjusting the image size as described above standardizes the size of the images to be detected, which facilitates subsequent calculations.

[0067] In this specification, after adjusting the image to be detected to a reference size, the image to be detected at the reference size can be scaled to obtain scaled images corresponding to various preset sizes.

[0068] In one example, the image to be detected, after being adjusted to the reference size, is scaled according to several preset scaling sizes. Taking a reference size of 640*640 as an example, three scaling sizes can be set: 640*640, 512*512, and 320*320. In this step, scaling is performed based on these three sizes to obtain scaled images of 640*640, 512*512, and 320*320.

[0069] In another example, the image to be detected, after being adjusted to the reference size, is scaled according to several preset scaling factors. Assuming three preset scaling factors of 1, 0.8, and 0.5, and again taking the reference size of 640*640 as an example, a scaling factor of 1 requires no scaling, a scaling factor of 0.8 yields a 512*512 scaled image, and a scaling factor of 0.5 yields a 320*320 scaled image.
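The two steps in [0065] to [0069] (adjust to a reference size, then rescale by preset factors) can be sketched as follows. The text does not name an interpolation method, so the nearest-neighbour resize used here, and the helper names `scaled_sizes`, `resize_nearest`, and `multi_scale`, are illustrative assumptions; images are plain 2-D lists of pixel values to keep the example self-contained.

```python
def scaled_sizes(reference_size, scale_factors):
    """(height, width) of each scaled image, rounded to whole pixels."""
    ref_h, ref_w = reference_size
    return [(round(ref_h * f), round(ref_w * f)) for f in scale_factors]


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list of pixel values (assumed method)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def multi_scale(image, reference_size=(640, 640), scale_factors=(1.0, 0.8, 0.5)):
    """Adjust the image to the reference size, then rescale it by each factor."""
    reference = resize_nearest(image, *reference_size)
    return [resize_nearest(reference, h, w)
            for h, w in scaled_sizes(reference_size, scale_factors)]


print(scaled_sizes((640, 640), (1.0, 0.8, 0.5)))   # [(640, 640), (512, 512), (320, 320)]
```

In practice an image library with bilinear or area interpolation would likely be used instead; the size arithmetic is the same.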

[0070] Step 106: For each scaled image, cut the scaled image into several image blocks.

[0071] In this specification, each image of different sizes is cut into several image blocks of the same size. The size requirement for the cut can be set according to the preset size in step 104. For example, the three scaled images with sizes of 640*640, 512*512 and 320*320 in step 104 can be cut into several image blocks of size 32*32.
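A sketch of the cutting step, assuming each scaled image is a 2-D list whose sides are exact multiples of the block size (true for 640, 512, and 320 with 32*32 blocks). The function name `cut_into_blocks` is hypothetical; it also records each block's 1-based (row, column) index in its block matrix, the convention the block position features rely on.

```python
def cut_into_blocks(image, block_h=32, block_w=32):
    """Cut a 2-D image (list of rows) into non-overlapping block_h x block_w blocks.

    Returns (row, col, block) tuples, where (row, col) is the 1-based position
    of the block in the block matrix: (1, 1), (1, 2), ...
    """
    h, w = len(image), len(image[0])
    if h % block_h or w % block_w:
        raise ValueError("image size must be a multiple of the block size")
    blocks = []
    for r in range(h // block_h):
        for c in range(w // block_w):
            block = [row[c * block_w:(c + 1) * block_w]
                     for row in image[r * block_h:(r + 1) * block_h]]
            blocks.append((r + 1, c + 1, block))
    return blocks


for size in (640, 512, 320):
    image = [[0] * size for _ in range(size)]
    blocks = cut_into_blocks(image)
    print(size, len(blocks), blocks[-1][:2])   # 640 400 (20, 20), then 512 256 (16, 16), then 320 100 (10, 10)
```

For the three example sizes this yields block matrices of 20*20, 16*16, and 10*10, matching the coordinate ranges given for Figure 3.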

[0072] In practical applications, the blocks cut from a smaller scaled image carry more holistic information, while the blocks cut from a larger scaled image carry more detailed information. For example, when the image to be detected is a user's ID photo, a block cut from the 640*640 scaled image might contain only part of the user's eye, offering more detail, while a block cut from the 320*320 scaled image might contain the user's entire eye, offering a better overall view. Since the human eye considers both the whole image and its details when judging image quality, this specification cuts scaled images of different sizes into image blocks so that subsequent steps can obtain more layers of information from the image to be detected.

[0073] Step 108: Input the image blocks after each scaled image is cut into the trained image quality detection model, and output the quality score of the image to be detected through the image quality detection model.

[0074] Referring to Figure 2, in this specification the image quality detection model may include a feature extraction layer, an attention layer, and a score calculation layer, wherein the feature extraction layer includes an image feature extraction unit, a block location feature extraction unit, a size fusion feature extraction unit, and an image block feature determination unit.

[0075] In this specification, the image quality detection model extracts the image features of each image block through the image feature extraction unit of the feature extraction layer, extracts the block position features of each image block through the block location feature extraction unit, and extracts the size fusion features of each image block through the size fusion feature extraction unit; the image block feature determination unit then determines the image block features of each image block based on these three features. The attention layer determines the attention scores between the image block features, and the comprehensive features of the image to be detected are then determined based on the attention scores and the image block features. Finally, the score calculation layer calculates the quality score of the image to be detected based on the comprehensive features. As described above, the technical solution first scales the image to be detected to various preset sizes, then cuts each scaled image into image blocks and inputs them into the image quality detection model. The feature extraction layer of the image quality detection model extracts the image block features of each image block, and the attention layer determines the attention score between image blocks through the attention mechanism. Based on the image block features and the attention score, the comprehensive image features of the image to be detected are determined. The attention mechanism enables the image quality detection model to mimic human eye observation of images, reducing the influence of factors such as image block position on the model calculation results. Subsequently, the quality score of the image to be detected is calculated based on the comprehensive image features, making the image quality score calculated by the image quality detection model closer to the image quality score subjectively recognized by humans, thereby improving the accuracy of image quality detection.

[0076] The image quality detection model is described in detail below with reference to Figure 2.

[0077] I. Image Feature Extraction Unit

[0078] In this specification, after the image blocks are input into the image quality detection model, the image feature extraction unit of the feature extraction layer can use a convolutional neural network to extract the image features of each image block. For example, each image block can first be convolved to obtain a convolved image block matrix; the convolution kernel can be a 7*7 kernel with 64 channels and a stride of 2. The convolved image block matrix is then pooled to obtain a reduced-dimensional image block matrix. Finally, the reduced-dimensional image block matrix is passed through a fully connected layer to obtain the image features of the image block. For example, after an image block is input into the quality detection model, convolution can yield a seven-dimensional image block matrix, which pooling reduces to a four-dimensional image block matrix, and the fully connected layer then produces the image features of the image block.
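The text fixes the first convolution (7*7 kernel, 64 channels, stride 2) but not its padding or the pooling window. The sketch below, assuming a padding of 3 and a 3*3 max-pooling window with stride 2 and padding 1 (a common ResNet-style stem, not stated in the source), tracks how the spatial size of one 32*32 image block shrinks before the fully connected stage. The function names are hypothetical.

```python
def conv_output_size(n, kernel, stride, padding):
    """Output length along one axis for a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1


def block_feature_map_shape(block_size=32, channels=64):
    """(channels, height, width) after the convolution and after pooling."""
    # 7*7 kernel, 64 channels, stride 2 come from the text; padding 3 is assumed.
    after_conv = conv_output_size(block_size, kernel=7, stride=2, padding=3)
    # 3*3 max pooling, stride 2, padding 1 is assumed (ResNet-style stem).
    after_pool = conv_output_size(after_conv, kernel=3, stride=2, padding=1)
    return (channels, after_conv, after_conv), (channels, after_pool, after_pool)


conv_shape, pool_shape = block_feature_map_shape()
print(conv_shape, pool_shape)                          # (64, 16, 16) (64, 8, 8)
print(pool_shape[0] * pool_shape[1] * pool_shape[2])   # 4096 inputs to the fully connected layer
```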

[0079] Similarly, image features of all image blocks input to the image feature extraction unit can be extracted.

[0080] In practical applications, residual units can be added after the pooling layer to mitigate the effect of excessive network depth on the extraction results. The residual units refine the pooled image block matrix, and the refined image block matrix is then passed through the fully connected layer.

[0081] Through the above image feature extraction steps, texture, color, high-dimensional abstract features, etc., in the image blocks are extracted as image features for each image block, so as to facilitate the calculation of subsequent image quality detection steps.

[0082] II. Block Location Feature Extraction Unit

[0083] In this specification, when each scaled image is cut into several image blocks, the image blocks corresponding to each size will form a corresponding image block matrix. Each image block matrix has a separate coordinate system. The initial coordinates of each image block in its image block matrix are used as the initial position features of each image block. The initial coordinates are the horizontal and vertical coordinates of the current image block in its corresponding image block matrix. For example, the initial coordinates of the first image block in the first row of the current image block matrix are (1,1), and the initial coordinates of the second image block in the first row are (1,2). Similarly, the initial coordinates of other image blocks in the image block matrix can be obtained.

[0084] Taking the three scaled image sizes of 640*640, 512*512, and 320*320 as an example, each scaled image is cut into 32*32 image blocks. Referring to Figure 3, matrix 1 is the image block matrix of the 640*640 scaled image, in which the initial coordinates of the image blocks are (A1, B1), (A2, B2)...(A20, B20); matrix 2 is the image block matrix of the 512*512 scaled image, in which the initial coordinates are (I1, J1), (I2, J2)...(I16, J16); and matrix 3 is the image block matrix of the 320*320 scaled image, in which the initial coordinates are (P1, Q1), (P2, Q2)...(P10, Q10).

[0085] In this specification, the position coordinates of image blocks of different sizes can be mapped to the same size. During implementation, the image block matrix corresponding to a certain size can be pre-set as the image block matrix of the reference size to facilitate subsequent mapping steps. For example, the image size of 640*640 can be set as the reference size, and the image sizes of 512*512 and 320*320 can be set as non-reference sizes.

[0086] For each image block at the reference size, the initial position feature of the image block is determined as the block position feature of the image block. For example, the block position feature of the image block after being cut from a scaled image of size 640*640 is its initial coordinates (A1, B1), (A2, B2)...(A20, B20).

[0087] For each image block at the non-reference size, the initial position features of the image block are mapped to the reference size to obtain the block position features (Ti, Tj) of the image block. The mapping is described by formula (1) below, where H is the height of the scaled image at the non-reference size, W is the width of the scaled image at the non-reference size, h is the height of the image block, w is the width of the image block, and G = height of the scaled image at the reference size / h.

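[0088] Ti = i * G / (H / h), Tj = j * G / (W / w) ----------------- Formula (1), where (i, j) are the initial coordinates of the image block in its image block matrix.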

[0089] For example, with G = 640 / 32 = 20, the block position features of the image blocks of the 512*512 scaled image (H / h = 512 / 32 = 16) are (I1*20 / 16, J1*20 / 16), (I2*20 / 16, J2*20 / 16)……(I16*20 / 16, J16*20 / 16), and the block position features of the image blocks of the 320*320 scaled image (H / h = 320 / 32 = 10) are (P1*20 / 10, Q1*20 / 10), (P2*20 / 10, Q2*20 / 10)……(P10*20 / 10, Q10*20 / 10).
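The worked numbers in [0089] are consistent with the mapping Ti = i * G / (H / h), Tj = j * G / (W / w), using the symbols defined in [0087]. The sketch below implements this inferred form; whether the patent also rounds the result to an integer grid index is not stated, so no rounding is applied. The function name `map_to_reference` is hypothetical.

```python
def map_to_reference(i, j, scaled_h, scaled_w, block_h, block_w, reference_h):
    """Map initial block coordinates (i, j) onto the reference-size block grid.

    Inferred form: Ti = i * G / (H / h), Tj = j * G / (W / w),
    with G = reference_h / h as defined in the text. No rounding is applied.
    """
    g = reference_h / block_h              # G: blocks per side at the reference size
    t_i = i * g / (scaled_h / block_h)     # H / h: blocks along the height at this size
    t_j = j * g / (scaled_w / block_w)     # W / w: blocks along the width at this size
    return t_i, t_j


# The last block of the 512*512 and of the 320*320 matrix both land on (20, 20).
print(map_to_reference(16, 16, 512, 512, 32, 32, 640))   # (20.0, 20.0)
print(map_to_reference(10, 10, 320, 320, 32, 32, 640))   # (20.0, 20.0)
```

Mapping every size onto the same 20*20 grid lets blocks from different scaled images that cover the same region share comparable position features.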

[0090] III. Size fusion feature extraction unit

[0091] In practical applications, the image feature extraction unit and block location feature extraction unit in the image quality detection model extract image features and block location features for each image block. However, this cannot reflect the size of each image block, which may result in the loss of some semantic information of the image to be detected, thus affecting the accuracy of the output results of the image quality detection model.

[0092] For example, the image blocks of the 640*640 scaled image have a smaller receptive field and contain more detailed semantic information. However, using only these image features in subsequent calculations can easily lose the overall semantic information of the image to be detected. For instance, for an elongated target, the above steps split the target across multiple image blocks before feature extraction; because the target features in each block are too fragmentary, the whole target cannot be recognized from these scattered blocks, which affects the accuracy of the subsequent quality score calculation. Conversely, the image blocks of the 320*320 scaled image have a larger receptive field and thus contain more global semantic information, but not enough detail. They need to be combined with the finer-grained image features of the blocks from the larger scaled images to more completely capture the semantic information actually contained in the image.

[0093] To better utilize the information across images of different sizes, this specification employs a multi-size fusion technique to fuse semantic information from the scaled images corresponding to the multiple preset sizes. First, for each size, the image blocks in the corresponding image block matrix are stitched back together to form the scaled image of that size; in the same way, the scaled images of all sizes are obtained. Then, image features are extracted from the scaled images of the various sizes, yielding an image feature matrix for each size. These feature matrices are fused into a fused image feature matrix, which integrates semantic information from different levels of the scaled images of the various sizes. Finally, using a size feature formula, the fused image feature matrix is decomposed into sub-size features S for the scaled images corresponding to the multiple preset sizes. The size feature formula is given in Formula 2, where K is the number of scaled images corresponding to the multiple preset sizes in the model, R denotes the space of the fused image feature, and D is the dimension of the fused image feature. That is, S is a (K+1)*D matrix made up of K+1 D-dimensional sub-size features.

[0094] S ∈ R^((K+1)*D) -----------------------Formula 2

[0095] Through the above steps, the size fusion feature extraction unit extracts the size fusion features of the image blocks, thereby making up for the lack of information in the image features and block position features, and increasing the accuracy of image quality detection.

IV. Image Block Feature Determination Unit

[0097] In this specification, the image block features of each image block are determined from the image features, block position features, and size fusion features using a feature fusion formula, given in Formula 3, where Z is the image block feature of each image block, xclass is a learnable flag vector of the attention layer (randomly initialized by the image quality detection model), Ep is the image feature of each image block, Ehse is the block position feature of each image block, and Esce is the block size feature of each image block.

[0098] Z = [xclass; Ep + Ehse + Esce] ---------------------- Formula 3.
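Formula 3 can be sketched with plain vectors. Two details are assumptions not fixed by the text: the block position and block size features are taken to be already embedded as D-dimensional vectors (so they can be added to Ep), and each block's size feature Esce is looked up as the row of the (K+1)*D size feature matrix that corresponds to its scale index. The function names are hypothetical.

```python
def size_features_for_blocks(size_matrix, scale_index_of_block):
    """Pick each block's Esce: the row of the (K+1)*D size matrix for its scale (assumed)."""
    return [list(size_matrix[k]) for k in scale_index_of_block]


def block_features(e_p, e_hse, e_sce, x_class):
    """Formula 3: Z = [x_class; Ep + Ehse + Esce].

    e_p, e_hse, e_sce: one D-dimensional vector per image block, in the same order.
    x_class: learnable D-dimensional flag vector, prepended as the first row of Z.
    """
    summed = [
        [p + h + s for p, h, s in zip(vp, vh, vs)]
        for vp, vh, vs in zip(e_p, e_hse, e_sce)
    ]
    return [list(x_class)] + summed


S = [[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]]    # K + 1 = 3 sizes, D = 2
e_sce = size_features_for_blocks(S, [0, 2])        # block 1 at size 0, block 2 at size 2
Z = block_features([[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]], e_sce, [0.5, 0.5])
print(Z)   # [[0.5, 0.5], [11.0, 13.0], [34.0, 34.0]]
```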

[0099] V. Attention Layer

[0100] In practical applications, to reduce the influence of each image block's input position on subsequent calculation results and to optimize the image block features received by the attention layer, this specification determines a relative position encoding for each image block from its image block features using trigonometric functions, given in Formulas 4 and 5, where pos ranges from 0 to G, G = height of the scaled image at the reference size / h, H is the height of the scaled image at the non-reference size, h is the height of the image block, and i equals hid_j modulo 2, where hid_j is the dimension index of each image block feature input to the attention layer, ranging from 0 to D, and D is the sum of the dimensions of all input image block features.

[0101]

[0102]

[0103] By fusing the relative position encoding of the image block and the image block features of the image block, the image block feature vector of the image block is determined. Similarly, the D-dimensional image block feature vectors of all image blocks input to the attention layer can be determined.
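Formulas 4 and 5 are not reproduced in this text. The sketch below assumes they follow the standard sinusoidal encoding of the original Transformer (sine on even dimensions, cosine on odd ones, base 10000), and that fusing means element-wise addition; both are assumptions rather than the patent's stated formulas, and the function names are hypothetical.

```python
import math


def sinusoidal_encoding(pos, dim, base=10000.0):
    """Standard Transformer sinusoidal encoding for one position (assumed form)."""
    enc = []
    for hid in range(dim):
        i = hid // 2                                 # paired dimensions share a frequency
        angle = pos / (base ** (2 * i / dim))
        enc.append(math.sin(angle) if hid % 2 == 0 else math.cos(angle))
    return enc


def add_position_encoding(features, positions):
    """Fuse (here: add) each block's encoding to its D-dimensional feature vector."""
    return [
        [f + e for f, e in zip(feat, sinusoidal_encoding(pos, len(feat)))]
        for feat, pos in zip(features, positions)
    ]


print(sinusoidal_encoding(0, 4))   # [0.0, 1.0, 0.0, 1.0]
```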

[0104] In this specification, the attention layer uses an attention mechanism to obtain correlation scores between each image block feature in the D-dimensional image block feature vectors and its neighboring image block features. The correlation scores are normalized with a softmax function, and the normalized result is used as the attention score of the current image block feature. The attention score is given by Formula 6, where Attn(Q, K, V) is the attention score, Q is the Query (used to query the correlation score between the current image block feature and another image block feature), K is the Key (the current image block feature), V is the Value (the other image block feature), and α is the dimension of Q, K, and V.

[0105] $$\mathrm{Attn}(Q,K,V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\alpha}\right)V \qquad \text{(Formula 6)}$$

[0106] Then, based on the attention score and the image patch features, the comprehensive image features of the image to be detected are determined according to Formula 7, where X is the comprehensive image feature, Wp is a learnable weight, Attn(Q, K, V) is the attention score, and Z is the vector of all input image patch features.

[0107] $$X = W_p \cdot \mathrm{Attn}(Q, K, V) + Z \qquad \text{(Formula 7)}$$
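The attention step and the residual combination of Formula 7 can be sketched in plain Python as follows. This assumes the standard scaled dot-product form softmax(QK^T/α)V with α set to the square root of the feature dimension, and it simplifies by taking Q = K = V = Z (no separate learned projections). Patch features are stored as rows, so W_p is applied on the right, which corresponds to Formula 7 written for column vectors. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, alpha):
    """Scaled dot-product attention: softmax(Q K^T / alpha) V."""
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)  # scores[i][j]: correlation of patch i with patch j
    weights = [softmax([s / alpha for s in row]) for row in scores]
    return matmul(weights, v)


def attention_block(z, w_p):
    """Formula 7 sketch: X = Attn(Q, K, V) W_p + Z, with Q = K = V = Z (simplified)."""
    alpha = math.sqrt(len(z[0]))  # assumed scale: square root of feature dimension
    attn = attention(z, z, z, alpha)
    projected = matmul(attn, w_p)
    return [[p + r for p, r in zip(p_row, z_row)] for p_row, z_row in zip(projected, z)]
```

The "+ Z" term is a residual connection: even if W_p were zero, the original patch features would pass through unchanged.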

[0108] Through the attention layer, the image quality detection model can mimic how humans process image information: it selects the important information from the input image patch features and focuses on it. This brings the quality score output by the subsequent layers of the model closer to the quality score perceived by the human eye, thereby improving the accuracy of the image quality detection results.

[0109] VI. Score Calculation Layer

[0110] The comprehensive image features output by the attention layer are high-dimensional, typically 512 dimensions, whereas score calculation commonly uses 11 dimensions, one for each integer score in the range 0-10. To ensure the accuracy of the subsequent score calculation layer, this specification can therefore reduce the dimensionality of the comprehensive image features output by the above steps before they are input into the score calculation layer.

[0111] In practical applications, the image quality detection model can reduce the dimensionality of the comprehensive image features according to preset dimensionality reduction requirements. The comprehensive image features are first passed through a fully connected layer, and an activation function is then applied to introduce nonlinearity, yielding the dimensionality-reduced comprehensive image features.

[0112] In practical applications, a dropout module can be added to the above dimensionality reduction step. The dropout module is used to prevent overfitting in the above dimensionality reduction step, thereby optimizing the output comprehensive image features.
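A minimal sketch of the dimensionality reduction step (fully connected layer, activation, dropout) is shown below. It assumes ReLU as the activation function and inverted dropout that is active only during training; the specification names neither. The function names are hypothetical, and in practice the weight matrix would map, for example, 512 inputs to 11 outputs.

```python
import random


def fully_connected(x, weights, bias):
    """y = W x + b, with W stored as out_dim rows of length in_dim."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def dropout(x, p, training, rng):
    """Inverted dropout: zero each value with probability p during training only."""
    if not training or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def reduce_dim(x, weights, bias, p=0.1, training=False, rng=None):
    """Fully connected layer -> activation (ReLU, assumed) -> dropout."""
    rng = rng or random.Random(0)
    return dropout(relu(fully_connected(x, weights, bias)), p, training, rng)
```

Scaling the surviving values by 1/(1 - p) during training keeps their expected magnitude equal to the inference-time output, so no rescaling is needed when dropout is switched off.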

[0113] In this specification, the score calculation layer uses the softmax function to map the dimensionality-reduced comprehensive image features onto the score range 0-10, obtaining a probability value for each score in that range, and then selects the score with the highest probability as the quality score of the image to be detected. For example, suppose the score calculation layer yields probability P0 for a score of 0, P1 for a score of 1, P2 for a score of 2, and so on up to P10 for a score of 10. The probabilities over the range 0-10 sum to 100%, and the largest of them is P10 = 40%. The quality score of the image to be detected is therefore output as 10.
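The mapping from the 11 reduced features to a single score can be sketched as a softmax followed by an argmax. The function names below are hypothetical; ties are resolved in favour of the lower score, which is an assumption.

```python
import math

SCORES = list(range(11))  # one output dimension per integer score 0-10


def score_probabilities(logits):
    """Softmax over the 11 reduced features -> probability of each score."""
    if len(logits) != len(SCORES):
        raise ValueError("expected one logit per score 0-10")
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def quality_score(logits):
    """Return the most probable score and its probability."""
    probs = score_probabilities(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return SCORES[best], probs[best]
```

For instance, if the last logit is clearly the largest, the output score is 10, as in the P10 example above.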

[0114] In practical applications, when the quality score of the image to be detected is less than a preset quality score threshold, the user can be prompted to re-upload the image. The quality score threshold can be set according to the user's accuracy requirements for subsequent image information extraction: the higher the required extraction accuracy, the higher the threshold; the lower the required accuracy, the lower the threshold.

[0115] When the quality score is greater than or equal to the preset quality score threshold, the user can proceed with subsequent image processing steps, such as extracting user identity information.
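The threshold check in paragraphs [0114] and [0115] amounts to a simple gate. In this sketch, the function name and the message texts are hypothetical, and the threshold is supplied by the caller according to the accuracy requirements described above.

```python
def quality_gate(score: float, threshold: float) -> tuple[bool, str]:
    """Pass images at or above the threshold; otherwise ask for a re-upload."""
    if score < threshold:
        return False, "Image quality too low; please re-upload the image."
    return True, "Image quality OK; proceeding to information extraction."
```

A score exactly equal to the threshold passes, matching the "greater than or equal to" condition in paragraph [0115].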

[0116] Figure 4 is a schematic flowchart illustrating the training process of an image quality detection model according to an exemplary embodiment of this specification.

[0117] Referring to Figure 4, the training process of the image quality detection model may include the following steps:

[0118] Step 402: Obtain a sample image, which has a quality score label.

[0119] In this specification, the sample images may be images uploaded by users, and the quality score labels may be determined with the assistance of expert raters.

[0120] For example, for each sample image, multiple different raters can determine the quality score of the sample image according to preset rating requirements. In this way, each sample image can have multiple quality scores, and then these quality scores can be combined to determine the quality score label of the sample image.

[0121] For instance, suppose 20 raters rate the quality of a sample image, so that the image receives 20 quality scores. These 20 scores are then combined to obtain the quality score label of the sample image. The scores can be combined by, for example, taking their average or median; this specification does not impose any particular restriction on the combination method.
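Combining several raters' scores into one label, using either of the two methods mentioned above, can be sketched with Python's standard `statistics` module. The function name and the `method` parameter are hypothetical.

```python
import statistics


def score_label(ratings, method="mean"):
    """Combine several raters' scores into one quality score label."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if method == "mean":
        return statistics.fmean(ratings)
    if method == "median":
        return statistics.median(ratings)
    raise ValueError(f"unknown method: {method}")
```

The median is less sensitive than the mean to a single rater who scores far from the others.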

[0122] By collecting images from real-world environments uploaded by users as sample images, the authenticity of semantic information in the sample images can be increased. This makes the parameters of the trained image quality detection model more closely resemble the image quality detection situation in real-world environments, thereby increasing the accuracy of subsequent image quality detection.

[0123] Step 404: Scale the sample image to multiple preset sizes to obtain a scaled image corresponding to each size.

[0124] Consistent with step 104 above, the sample image can first be adjusted based on a preset reference size to obtain a sample image of the reference size. Then, based on preset size factors, the sample image of the reference size is scaled to obtain a scaled image corresponding to each size.

Step 406: For each scaled image, cut the scaled image into several image blocks.
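Steps 404 and 406 can be sketched on an image stored as a list of pixel rows. This assumes nearest-neighbour resampling, size factors such as 1.0 and 0.5, and that partial blocks at the right and bottom edges are dropped; the specification fixes none of these details, and the function names are hypothetical.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an image stored as a list of pixel rows."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def multiscale(img, ref_h, ref_w, factors):
    """Resize to the reference size, then scale by each size factor."""
    ref = resize_nearest(img, ref_h, ref_w)
    scaled = []
    for f in factors:
        h, w = max(1, round(ref_h * f)), max(1, round(ref_w * f))
        scaled.append(ref if (h, w) == (ref_h, ref_w) else resize_nearest(ref, h, w))
    return scaled


def cut_patches(img, p):
    """Cut an image into non-overlapping p x p blocks, dropping partial edge blocks."""
    h, w = len(img), len(img[0])
    return [[row[left:left + p] for row in img[top:top + p]]
            for top in range(0, h - p + 1, p)
            for left in range(0, w - p + 1, p)]
```

With an 8 x 8 reference image, factors 1.0 and 0.5, and a block size of 4, the two scaled images yield 4 blocks and 1 block respectively.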

[0125] Step 408: Input the sample image blocks into the initial image quality detection model and output the predicted score of the sample image.

[0126] In this specification, the initial image quality detection model is an untrained quality detection model.

[0127] Step 410: Calculate the difference between the predicted score and the score label of the sample image.

[0128] In this specification, the difference L between the predicted score and the score label of the sample image is calculated using the preset loss function given in Formula 8, where q_i is the predicted score of the i-th sample image, r_i is its score label, and N is the number of sample images in the training set.

[0129]
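Formula 8 is not reproduced here. As a sketch only, assuming it is the mean squared error between predicted scores q_i and labels r_i over N samples, the difference L could be computed as follows; the function name is hypothetical.

```python
def mse_loss(predicted, labels):
    """L = (1/N) * sum_i (q_i - r_i)^2 over N sample images (assumed form of Formula 8)."""
    if not predicted or len(predicted) != len(labels):
        raise ValueError("need equally sized, non-empty score lists")
    return sum((q - r) ** 2 for q, r in zip(predicted, labels)) / len(predicted)
```

Squaring penalizes large disagreements with the human label more heavily than small ones; an absolute-error form would be an equally plausible reading of the text.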

[0130] Step 412: Update the image quality detection model parameters based on the differences.

[0131] In this specification, before training the image quality detection model, the sample images can be divided into training set sample images, validation set sample images, and test set sample images. The ratio of the number of images in the three sample image sets can be 8:1:1.
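The 8:1:1 division of sample images can be sketched as a seeded shuffle followed by slicing. The function name and seed are hypothetical; any remainder from rounding goes to the test set.

```python
import random


def split_8_1_1(samples, seed=0):
    """Shuffle and split samples into training, validation and test sets (8:1:1)."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

Fixing the seed makes the split reproducible across training runs.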

[0132] In practical applications, the image quality detection model is trained iteratively on samples drawn from the training set. The sample images in the training set are first divided into iteration batches of 8 images each. The initial learning rate is 0.001 and decays according to a cosine schedule. For each batch, the predicted scores of the sample images are obtained by forward propagation through the current image quality detection model, and the difference between the predicted scores and the quality score labels is calculated according to the loss function. The parameters of the model are then optimized based on this difference, and the optimized model is used for the next iteration.

[0133] Once the image quality detection model completes the preset number of training iterations, iteration stops and the model parameters are saved. For example, completing 600 iterations counts as one round of training. After 300 rounds, iteration stops, and the parameters of the model that achieved the smallest difference between the predicted scores and the score labels during training are saved.
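The schedule in paragraphs [0132] and [0133] can be sketched as a cosine learning-rate decay plus best-checkpoint tracking. This assumes the cosine decay runs per iteration over all 600 x 300 iterations down to zero; the specification does not state the decay period or the final learning rate, and the names below are hypothetical.

```python
import math

BASE_LR = 0.001          # initial learning rate from the text
ITERS_PER_ROUND = 600    # iterations per round of training
ROUNDS = 300             # rounds before training stops
TOTAL_STEPS = ITERS_PER_ROUND * ROUNDS


def cosine_lr(step, total_steps=TOTAL_STEPS, base_lr=BASE_LR, min_lr=0.0):
    """Cosine decay from base_lr to min_lr over total_steps (assumed per-iteration)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


class BestCheckpoint:
    """Keep the parameters with the smallest loss seen so far."""

    def __init__(self):
        self.best_loss = math.inf
        self.best_params = None

    def update(self, loss, params):
        if loss < self.best_loss:
            self.best_loss = loss
            self.best_params = dict(params)  # shallow copy of the parameter mapping
            return True
        return False
```

Saving the best rather than the last parameters matches the text's choice of the model with the smallest difference observed during training.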

[0134] In this specification, if the difference is still greater than a preset difference threshold after completing one iteration of training, the next iteration can be continued. The preset difference threshold can be set according to the user's requirements for the accuracy of image quality detection. The larger the difference threshold, the lower the accuracy of image quality detection; conversely, the smaller the difference threshold, the higher the accuracy of image quality detection.

[0135] Through the above steps, the parameters of the image quality detection model are updated and optimized based on the differences, so that the model's predictions move closer to human ratings, and the optimized model is then saved. To further verify the accuracy of the optimized model, the validation set samples can be used to detect problems in the training process, such as overfitting. The test set samples can then be used to evaluate the generalization ability of the trained model, to ensure that it is applicable to real-world application scenarios.

[0136] Corresponding to the aforementioned embodiments of the image quality detection method, this application also provides embodiments of an image quality detection apparatus.

[0137] The embodiments of the image quality detection apparatus of this application can be applied to electronic devices. The apparatus embodiments can be implemented in software, in hardware, or in a combination of both. Taking software implementation as an example, the apparatus, as a logical device, is formed by the processor of the electronic device loading the corresponding computer program instructions from non-volatile memory into memory and executing them. From a hardware perspective, Figure 5 is a hardware structure diagram of the electronic device in which the image quality detection apparatus is located. In addition to the processor, memory, network interface, and non-volatile memory shown in Figure 5, the electronic device in which the apparatus of the embodiment is located may also include other hardware depending on its actual functions, which will not be described in detail here.

[0138] Figure 6 shows an image quality detection apparatus according to an exemplary embodiment of this specification.

[0139] Referring to Figure 6, the image quality detection apparatus can be applied to the electronic device shown in Figure 5 and includes:

[0140] The image acquisition module 602 is used to acquire the image to be detected.

[0141] The image scaling module 604 is used to scale the image to be detected to multiple preset sizes to obtain a scaled image corresponding to each size.

[0142] The image segmentation module 606 is used to segment each scaled image into several image blocks.

[0143] The image quality detection module 608 is used to input the image blocks cut from each scaled image into a trained image quality detection model, and output the quality score of the image to be detected through the image quality detection model.

[0144] Optionally, the image quality detection module 608 includes:

[0145] Determine the initial position features for each image patch;

[0146] For each image block at the reference size, the initial position features of the image block are determined as the block position features of the image block;

[0147] For each image block at the non-reference size, the initial position features of the image block are mapped to the reference size to obtain the block position features of the image block.
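One way to map a block's position at a non-reference size onto the reference size is to rescale its grid coordinates proportionally, so that every scale indexes the same table of position features. The sketch below assumes a square reference grid of ref_grid x ref_grid positions and integer floor division; both the scheme and the function names are hypothetical illustrations, not the method fixed by the specification.

```python
def map_to_reference_grid(row, col, grid_h, grid_w, ref_grid):
    """Map a block's (row, col) in a grid_h x grid_w block grid onto a ref_grid x ref_grid grid."""
    return row * ref_grid // grid_h, col * ref_grid // grid_w


def position_index(row, col, grid_h, grid_w, ref_grid):
    """Index into a table of ref_grid * ref_grid position features (hypothetical)."""
    ref_row, ref_col = map_to_reference_grid(row, col, grid_h, grid_w, ref_grid)
    return ref_row * ref_grid + ref_col
```

For blocks at the reference size the mapping is the identity, which matches the rule that their initial position features are used directly as block position features.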

[0148] Optionally, the image quality detection module 608 includes:

[0149] The size fusion feature is decomposed into sub-size features corresponding to each size, and the sub-size features include the block size features of each image block under the corresponding size;

[0150] For each image block, the image block features are determined based on the image features, block position features, and block size features of the image block.

[0151] Optionally, the image scaling module 604 further includes:

[0152] The image to be detected is adjusted based on a preset reference size to obtain an image to be detected with the reference size;

[0153] Based on a variety of preset size factors, the image to be detected at the reference size is scaled to obtain a scaled image corresponding to each size.

[0154] Optionally, the device further includes:

[0155] Acquire sample images, which have quality score labels;

[0156] The sample image is scaled to multiple preset sizes to obtain a scaled sample image for each size.

[0157] For each sample scaled image, the sample scaled image is cut into several sample image blocks;

[0158] The sample image block is input into the image quality detection model to obtain the quality prediction score of the sample image output by the image quality detection model.

[0159] Calculate the difference between the predicted quality score and the quality score label of the sample image;

[0160] The image quality detection model parameters are updated based on the differences.

[0161] Optionally, the device further includes:

[0162] Determine whether the quality score of the image to be detected is less than a preset quality score threshold;

[0163] When the quality score is less than a preset quality score threshold, a quality failure message is output.

Optionally, the device further includes:

[0164] Based on preset scoring requirements, several quality scores are obtained for the sample images;

[0165] The quality score label of the sample image is determined by combining the aforementioned quality scores.

[0166] The specific implementation process of the functions and roles of each unit in the above device can be found in the implementation process of the corresponding steps in the above method, and will not be repeated here.

[0167] For the device embodiments, since they basically correspond to the method embodiments, the relevant parts can be referred to in the description of the method embodiments. The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this application according to actual needs. Those skilled in the art can understand and implement this without creative effort.

[0168] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer, which can take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email sending and receiving device, game console, tablet computer, wearable device, or any combination of these devices.

[0169] In a typical configuration, a computer includes one or more processors (CPU), input / output interfaces, network interfaces, and memory.

[0170] Memory may include non-persistent storage in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.

[0171] Computer-readable media, including both permanent and non-permanent, removable and non-removable media, can store information using any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0172] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0173] Corresponding to the aforementioned embodiments of the image quality detection method, this application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, performs the following steps:

[0174] Acquire the image to be detected;

[0175] The image to be detected is scaled to multiple preset sizes to obtain a scaled image corresponding to each size;

[0176] For each scaled image, the scaled image is cut into several image blocks;

[0177] The image blocks cut from each scaled image are input into a trained image quality detection model, which outputs the quality score of the image to be detected.

[0178] The image quality detection model includes a feature extraction layer, an attention layer, and a score calculation layer. The feature extraction layer extracts image features, block location features, and size fusion features of each image block, and determines the image block features of each image block based on the image features, block location features, and size fusion features. The attention layer uses an attention mechanism to determine the attention score between image blocks, and determines the comprehensive image features of the image to be detected based on the attention score and the image block features. The score calculation layer calculates the quality score of the image to be detected based on the comprehensive image features.

[0179] Optionally, the process of the feature extraction layer extracting the block location features of each image block includes:

[0180] Determine the initial position features for each image patch;

[0181] For each image block at the reference size, the initial position features of the image block are determined as the block position features of the image block;

[0182] For each image block at the non-reference size, the initial position features of the image block are mapped to the reference size to obtain the block position features of the image block.

[0183] Optionally, the process by which the feature extraction layer determines the image block features of each image block based on the image features, block location features, and size fusion features includes:

[0184] The size fusion feature is decomposed into sub-size features corresponding to each size, and the sub-size features include the block size features of each image block under the corresponding size;

[0185] For each image block, the image block features are determined based on the image features, block position features, and block size features of the image block.

[0186] Optionally, scaling the image to be detected to multiple preset sizes to obtain a scaled image corresponding to each size includes:

[0187] The image to be detected is adjusted based on a preset reference size to obtain an image to be detected with the reference size;

[0188] Based on a variety of preset size factors, the image to be detected at the reference size is scaled to obtain a scaled image corresponding to each size.

[0189] Optionally, the training process of the image quality detection model includes:

[0190] Acquire sample images, which have quality score labels;

[0191] The sample image is scaled to multiple preset sizes to obtain a scaled sample image for each size.

[0192] For each sample scaled image, the sample scaled image is cut into several sample image blocks;

[0193] The sample image block is input into the image quality detection model to obtain the quality prediction score of the sample image output by the image quality detection model.

[0194] Calculate the difference between the predicted quality score and the quality score label of the sample image;

[0195] The image quality detection model parameters are updated based on the differences.

[0196] Optionally, the process of determining the quality score label includes:

[0197] For each sample image, multiple quality scores are obtained, wherein different quality scores are determined by different scorers;

[0198] By combining the multiple quality scores, the quality score label of the sample image is obtained.

[0199] Optionally, the method further includes:

[0200] Determine whether the quality score of the image to be detected is less than a preset quality score threshold;

[0201] When the quality score is less than the preset quality score threshold, a quality failure message is output.

[0202] The foregoing has described specific embodiments of this application. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired results. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0203] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.

Claims

1. A method for image quality detection, characterized in that, The method includes: Acquire the image to be detected; The image to be detected is scaled to multiple preset sizes to obtain a scaled image corresponding to each size; For each scaled image, the scaled image is cut into several image blocks; Each scaled image is cut into image blocks and then input into a trained image quality detection model. The image quality detection model then outputs the quality score of the image to be detected. The image quality detection model includes a feature extraction layer, an attention layer, and a score calculation layer. The feature extraction layer extracts image features, block location features, and size fusion features of each image block, and determines the image block features of each image block based on the image features, block location features, and size fusion features. The attention layer uses an attention mechanism to determine the attention score between image blocks, and determines the comprehensive image features of the image to be detected based on the attention score and the image block features. The score calculation layer calculates the quality score of the image to be detected based on the comprehensive image features. 
The feature extraction layer extracts the size fusion features of the image to be detected, including: obtaining the image block matrix corresponding to each scaled-size image block, concatenating them into a scaled image corresponding to the scaled-size image, obtaining scaled images corresponding to all scaled-size image blocks, and extracting image features from the scaled images of various sizes to obtain several image feature matrices corresponding to each scaled image, fusing the extracted several image feature matrices corresponding to each scaled size to obtain a fused image feature matrix, and decomposing the fused image feature matrix into sub-size features of multiple scaled images corresponding to preset sizes as the size fusion features using a size feature formula.

2. The method according to claim 1, characterized in that, The process of the feature extraction layer extracting the block location features of each image block includes: Determine the initial position features for each image patch; For each image block at the reference size, the initial position features of the image block are determined as the block position features of the image block; For each image block at the non-reference size, the initial position features of the image block are mapped to the reference size to obtain the block position features of the image block.

3. The method according to claim 1, characterized in that, The process by which the feature extraction layer determines the image block features of each image block based on the image features, block location features, and size fusion features includes: The size fusion feature is decomposed into sub-size features corresponding to each size, and the sub-size features include the block size features of each image block under the corresponding size; For each image block, the image block features are determined based on the image features, block position features, and block size features of the image block.

4. The method according to claim 1, characterized in that, The step of scaling the image to be detected to multiple preset sizes to obtain a scaled image corresponding to each size includes: The image to be detected is adjusted based on a preset reference size to obtain an image to be detected with the reference size; Based on a variety of preset size factors, the image to be detected at the reference size is scaled to obtain a scaled image corresponding to each size.

5. The method according to claim 1, characterized in that, The training process of the image quality detection model includes: Acquire sample images, which have quality score labels; The sample image is scaled to multiple preset sizes to obtain a scaled sample image for each size. For each sample scaled image, the sample scaled image is cut into several sample image blocks; The sample image block is input into the image quality detection model to obtain the quality prediction score of the sample image output by the image quality detection model. Calculate the difference between the predicted quality score and the quality score label of the sample image; The image quality detection model parameters are updated based on the differences.

6. The method according to claim 5, characterized in that, The process of determining the quality score label includes: For each sample image, multiple quality scores are obtained, wherein different quality scores are determined by different scorers; By combining the multiple quality scores, the quality score label of the sample image is obtained.

7. The method according to claim 1, characterized in that, The method further includes: Determine whether the quality score of the image to be detected is less than a preset quality score threshold; When the quality score is less than the preset quality score threshold, a quality failure message is output.

8. An image quality detection device, characterized in that, The device includes: The image acquisition module is used to acquire the image to be detected; The image scaling module is used to scale the image to be detected to multiple preset sizes to obtain a scaled image corresponding to each size; An image segmentation module is used to segment each scaled image into several image blocks; The image quality detection module is used to input the image blocks cut from each scaled image into a trained image quality detection model, and output the quality score of the image to be detected through the image quality detection model. The image quality detection model includes a feature extraction layer, an attention layer, and a score calculation layer. The feature extraction layer extracts image features, block location features, and size fusion features of each image block, and determines the image block features of each image block based on the image features, block location features, and size fusion features. The attention layer uses an attention mechanism to determine the attention score between image blocks, and determines the comprehensive image features of the image to be detected based on the attention score and the image block features. The score calculation layer calculates the quality score of the image to be detected based on the comprehensive image features. 
The feature extraction layer extracts the size fusion features of the image to be detected, including: obtaining the image block matrix corresponding to each scaled-size image block, concatenating them into a scaled image corresponding to the scaled-size image, obtaining scaled images corresponding to all scaled-size image blocks, and extracting image features from the scaled images of various sizes to obtain several image feature matrices corresponding to each scaled image, fusing the extracted several image feature matrices corresponding to each scaled size to obtain a fused image feature matrix, and decomposing the fused image feature matrix into sub-size features of multiple scaled images corresponding to preset sizes as the size fusion features using a size feature formula.

9. An electronic device, characterized in that it comprises: a processor; and a memory for storing processor-executable instructions; wherein the processor implements the method according to any one of claims 1-7 by executing the executable instructions.

10. A computer-readable storage medium storing computer instructions, characterized in that, when executed by a processor, the instructions implement the steps of the method according to any one of claims 1-7.