Image downsampling method and apparatus
By using a CNN model to extract the information that the image downsampling model loses, and fusing the two downsampling results, the method solves the problem of information loss in image downsampling. This allows deep learning models to be applied to images of different resolutions and improves image processing efficiency.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: CHINA MOBILE INFORMATION TECHNOLOGY CO LTD
- Filing Date: 2021-11-10
- Publication Date: 2026-06-30
AI Technical Summary
Existing image downsampling methods discard useful image information, and the deep learning models that follow them can only process images of a fixed resolution.
This method uses a CNN model to extract the information that would otherwise be lost during downsampling, downsamples both the target image and the CNN features with the image downsampling model inside the image preprocessing model, and fuses the two downsampling results. The fusion enhances the downsampling result of the target image and makes the approach suitable for images of different resolutions.
It preserves useful image information, improves the ability of deep learning models to process images of different resolutions, reduces the manual effort required to prepare training datasets, and improves image processing efficiency.
Figure CN116109545B_ABST (abstract drawing)
Description
Technical Field
[0001] This invention relates to the field of image processing technology, and in particular to an image downsampling method and apparatus.
Background Technology
[0002] In computer vision and deep learning, deep learning models have a very large number of parameters. To reduce the number of parameters, lower model complexity, and speed up computation, images are downsampled with scaling algorithms such as bilinear or bicubic interpolation before being input into the deep learning model for inference. This reduces the input image to a smaller resolution, for example scaling a high-resolution 1920×1080 image down to 128×128.
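As a worked example using only the resolutions quoted above, the reduction from 1920×1080 to 128×128 is:

```latex
\frac{1920 \times 1080}{128 \times 128} = \frac{2\,073\,600}{16\,384} = 126.5625,
\qquad \frac{1920}{128} = 15, \qquad \frac{1080}{128} = 8.4375
```

Each output pixel therefore stands in for about 127 input pixels, and because the two axes are scaled by different factors, the 16:9 aspect ratio also becomes 1:1.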
[0003] Image downsampling inevitably loses information, and which information is lost cannot be controlled. If the lost information is crucial for subsequent model processing, downsampling degrades the accuracy of the deep learning model. Furthermore, a fixed scaling algorithm ties processing to a specific resolution: for example, if the training data consists of 1920×1080 images, the model must be given images of the same resolution at inference time to produce accurate output.
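To illustrate this loss, the sketch below implements plain bilinear downsampling in NumPy and applies it to a one-pixel checkerboard. It is a minimal sketch under assumptions: half-pixel-centre sampling without anti-aliasing is chosen here because the text names bilinear interpolation only as an example, and `bilinear_resize` is a hypothetical helper, not code from the source.

```python
import numpy as np


def bilinear_resize(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinear resize of an (H, W) or (H, W, C) array with half-pixel centres.

    No anti-aliasing is applied (an assumption made for this sketch).
    """
    in_h, in_w = img.shape[:2]

    def axis_weights(in_size: int, out_size: int):
        # Map each output pixel centre back to a fractional source coordinate.
        src = (np.arange(out_size) + 0.5) * (in_size / out_size) - 0.5
        src = np.clip(src, 0, in_size - 1)
        lo = np.floor(src).astype(int)
        hi = np.minimum(lo + 1, in_size - 1)
        return lo, hi, src - lo

    y0, y1, fy = axis_weights(in_h, out_h)
    x0, x1, fx = axis_weights(in_w, out_w)
    # Reshape the weights so they broadcast over an optional channel axis.
    extra = [1] * (img.ndim - 2)
    fy = fy.reshape(-1, 1, *extra)
    fx = fx.reshape(1, -1, *extra)

    # Interpolate horizontally on the two neighbouring rows, then vertically.
    top = img[y0][:, x0] * (1 - fx) + img[y0][:, x1] * fx
    bottom = img[y1][:, x0] * (1 - fx) + img[y1][:, x1] * fx
    return top * (1 - fy) + bottom * fy


# A one-pixel checkerboard: the finest detail an 8x8 image can hold.
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
small = bilinear_resize(checker, 4, 4)
print(small)  # every entry is 0.5
```

A smooth ramp survives this 2× reduction, but the checkerboard collapses to a uniform 0.5 image: detail finer than the new pixel grid is averaged away and cannot be recovered from the output.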
[0004] In summary, existing image downsampling methods lose useful information from images, and the deep learning models that follow them can only process images of a fixed resolution.
Summary of the Invention
[0005] This invention provides an image downsampling method and apparatus to address these shortcomings of existing image downsampling methods, namely that they lose useful information from the image and restrict subsequent deep learning models to images of a fixed resolution. The invention preserves useful image information, enabling subsequent deep learning models to be applied to images of different resolutions.
[0006] This invention provides an image downsampling method, comprising:
[0007] The target image is input into the CNN model in the image preprocessing model, and the first feature of the target image is output.
[0008] The first feature is input into the image downsampling model in the image preprocessing model to obtain the downsampling result of the first feature, and the target image is input into the image downsampling model to obtain the first downsampling result of the target image;
[0009] Based on the first downsampling result of the target image and the downsampling result of the first feature, a second downsampling result of the target image is obtained.
[0010] According to an image downsampling method provided by the present invention, the step of inputting a target image into a CNN model in an image preprocessing model and outputting a first feature of the target image includes:
[0011] The target image is sequentially input into the first convolutional layer, the first nonlinear activation layer, the second convolutional layer, the second nonlinear activation layer, and the first batch normalization layer in the CNN model, and the first feature of the target image is output.
[0012] According to an image downsampling method provided by the present invention, obtaining a second downsampling result of the target image based on a first downsampling result of the target image and a downsampling result of the first feature includes:
[0013] The downsampling result of the first feature is sequentially input into the third convolutional layer and the second batch normalization layer in the image preprocessing model, and the second feature of the downsampling result of the first feature is output.
[0014] The second feature is added to the downsampling result of the first feature;
[0015] Based on the summation result and the first downsampling result of the target image, the second downsampling result of the target image is obtained.
[0016] According to an image downsampling method provided by the present invention, obtaining a second downsampling result of the target image based on the summation result and a first downsampling result of the target image includes:
[0017] The summation result is input into the fourth convolutional layer of the image preprocessing model, and the third feature of the summation result is output.
[0018] The third feature and the first downsampling result of the target image are added together to obtain the second downsampling result of the target image.
[0019] According to an image downsampling method provided by the present invention, before inputting the target image into a CNN model in an image preprocessing model and outputting the first feature of the target image, the method further includes:
[0020] The sample image is input into the image preprocessing model, and the second downsampling result of the sample image is output.
[0021] The second downsampling result of the sample image is input into the deep learning model, and the prediction processing result of the sample image is output.
[0022] The predicted processing result of the sample image is compared with the preset actual processing result of the sample image;
[0023] The image preprocessing model and the deep learning model are trained based on the comparison results.
[0024] According to an image downsampling method provided by the present invention, the deep learning model is an image feature extraction model, and the prediction processing result is an image feature.
[0025] The present invention also provides an image downsampling device, comprising:
[0026] The extraction module is used to input the target image into the CNN model in the image preprocessing model and output the first feature of the target image;
[0027] The sampling module is used to input the first feature into the image downsampling model in the image preprocessing model to obtain the downsampling result of the first feature, and to input the target image into the image downsampling model to obtain the first downsampling result of the target image;
[0028] The fusion module is used to obtain a second downsampling result of the target image based on the first downsampling result of the target image and the downsampling result of the first feature.
[0029] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of any of the image downsampling methods described above.
[0030] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of any of the above-described image downsampling methods.
[0031] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of any of the image downsampling methods described above.
[0032] In the image downsampling method and apparatus provided by this invention, a CNN model extracts the information that the image downsampling model may lose during sampling. The features output by the CNN model are themselves downsampled with the image downsampling model, so that they have the same size as the downsampling result of the target image. The two downsampling results are then fused, so that the features extracted by the CNN model enhance the downsampling result of the target image and the useful information of the image is preserved. In addition, because the image is downsampled by a trainable deep learning image preprocessing model rather than by the image downsampling model alone, the approach is suitable for subsequent processing of images of various resolutions.
Attached Figure Description
[0033] To more clearly illustrate the technical solutions in this invention or in the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of this invention; those skilled in the art can derive other drawings from them without creative effort.
[0034] Figure 1 is a first flowchart of the image downsampling method provided by the present invention;
[0035] Figure 2 is a second flowchart of the image downsampling method provided by the present invention;
[0036] Figure 3 is a third flowchart of the image downsampling method provided by the present invention;
[0037] Figure 4 is a schematic diagram of the image downsampling device provided by the present invention;
[0038] Figure 5 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Implementation
[0039] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0040] The image downsampling method of the present invention is described below with reference to Figure 1. The method comprises: Step 101: input a target image into the CNN (Convolutional Neural Network) model in an image preprocessing model, and output a first feature of the target image;
[0041] The target image is the image that needs to be downsampled. In this embodiment, the size of the target image is not limited.
[0042] The image preprocessing model ModelA includes a CNN model and an image downsampling model. It preprocesses the target image before the image is input to the deep learning model ModelB, reducing the resolution of the target image to meet the input resolution requirement of ModelB.
[0043] The image preprocessing model ModelA can be used in conjunction with any deep learning model ModelB, such as ResNet, MobileNet, and EfficientNet, to improve the image processing capabilities of ModelB.
[0044] The image preprocessing model mainly consists of two branches: a CNN model branch, which extracts information that may be lost during image downsampling, and an image downsampling model branch, which produces the downsampling result of the target image. The CNN model branch selectively supplements the information lost by the image downsampling model branch.
[0045] Step 102: Input the first feature into the image downsampling model in the image preprocessing model to obtain the downsampling result of the first feature, and input the target image into the image downsampling model to obtain the first downsampling result of the target image;
[0046] To ensure that the first feature output by the CNN model has the same size as the first downsampling result that the image downsampling model produces from the target image, the first feature is also downsampled with the image downsampling model.
[0047] Optionally, the image downsampling model is a bilinear interpolation (Bilinear) model, but this embodiment is not limited to this model.
[0048] Step 103: Obtain the second downsampling result of the target image based on the first downsampling result of the target image and the downsampling result of the first feature.
[0049] Fusing the first downsampling result of the target image with the downsampling result of the first feature, for example by directly adding the two downsampling results, compensates for the important information that the image downsampling model loses, thereby enhancing the first downsampling result of the target image.
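Steps 101 to 103 can be sketched as a small function that takes the CNN model and the image downsampling model as parameters and fuses by direct addition, the simplest option named above. This is a minimal sketch, not the patented implementation: `preprocess`, `block_average` (a 2×2 block mean standing in for the Bilinear model) and `edge_magnitude` (a fixed edge filter standing in for the trained CNN) are all hypothetical. Direct addition also assumes the CNN output has as many channels as the image; the detailed embodiment below handles differing channel counts with further convolutions.

```python
import numpy as np
from typing import Callable

Array = np.ndarray


def preprocess(target: Array,
               cnn: Callable[[Array], Array],
               downsample: Callable[[Array], Array]) -> Array:
    first_feature = cnn(target)                 # step 101: first feature
    feature_result = downsample(first_feature)  # step 102: downsampling result of the first feature
    first_result = downsample(target)           # step 102: first downsampling result of the image
    if feature_result.shape != first_result.shape:
        raise ValueError(f"cannot fuse {feature_result.shape} with {first_result.shape}")
    return first_result + feature_result        # step 103: second downsampling result


def block_average(x: Array, f: int = 2) -> Array:
    # Hypothetical stand-in for the image downsampling model: f x f block mean.
    h, w = x.shape[0] // f, x.shape[1] // f
    x = x[:h * f, :w * f]
    return x.reshape(h, f, w, f, *x.shape[2:]).mean(axis=(1, 3))


def edge_magnitude(x: Array) -> Array:
    # Hypothetical stand-in for the CNN model: absolute horizontal differences.
    d = np.zeros_like(x)
    d[:, 1:] = np.abs(x[:, 1:] - x[:, :-1])
    return d


checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
print(block_average(checker))                              # all 0.5: the pattern is averaged away
print(preprocess(checker, edge_magnitude, block_average))  # 1.0 in column 0, 1.5 elsewhere
```

Block averaging alone turns the checkerboard into a flat 0.5 image, while the fused result is raised by the edge-magnitude branch, so a trace of the fine texture survives downsampling. The shape check mirrors the requirement in [0046] that both downsampling results have the same size.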
[0050] In this embodiment, a CNN model extracts the information that the image downsampling model may lose during sampling. The features output by the CNN model are themselves downsampled with the image downsampling model, so that they have the same size as the downsampling result of the target image. The two downsampling results are then fused, so that the features extracted by the CNN model enhance the downsampling result of the target image and the useful information of the image is preserved. In addition, because the image is downsampled by a trainable deep learning image preprocessing model rather than by the image downsampling model alone, the approach is suitable for subsequent processing of images of various resolutions.
[0051] Based on the above embodiments, in this embodiment the step of inputting the target image into the CNN model in the image preprocessing model and outputting the first feature of the target image includes: sequentially inputting the target image into the first convolutional layer, the first nonlinear activation layer, the second convolutional layer, the second nonlinear activation layer and the first batch normalization layer in the CNN model, and outputting the first feature of the target image.
[0052] As shown in Figure 2, feature extraction from the target image with the CNN model proceeds as follows:
[0053] 1. Input the target image into the first convolutional layer Conv2D in the CNN model to obtain the output Conv2D_OUTPUT1, which contains shallow features of the image.
[0054] At the same time, the target image is input into the image downsampling model, such as a Bilinear scaling layer, which reduces the target image to TARGET_SIZE and produces the output Bilinear_OUTPUT1. TARGET_SIZE is the input resolution of the deep learning model ModelB.
[0055] Optionally, the output dimension of the first convolutional layer is 16, the kernel size is 7, and the stride is 1.
[0056] It should be noted that each convolutional layer in the image preprocessing model is used to extract deep features of the target image. Because the ultimate purpose of these convolutional layers is to enhance the output of the image downsampling model, there is no need to extract very high-dimensional features, so the output dimension is set to 16. To enlarge the receptive field of the CNN, a relatively large kernel size of 7 is chosen. The CNN model compensates for the shortcomings of the image downsampling model rather than performing the image scaling itself, so the stride is set to 1.
[0057] 2. After Conv2D_OUTPUT1 passes through a nonlinear activation layer such as LeakyReLU, it is input into the second convolutional layer Conv2D to obtain the output Conv2D_OUTPUT2, which contains deeper image features than Conv2D_OUTPUT1.
[0058] Optionally, the second convolutional layer has an output dimension of 16, a kernel size of 1, and a stride of 1. The nonlinear activation layers allow the model to fit nonlinear feature relationships.
[0059] 3. After processing Conv2D_OUTPUT2 through the non-linear activation layer LeakyReLU and the batch normalization layer BatchNormalization, it is input into the image scaling layer Bilinear and scaled to TARGET_SIZE to obtain the output Bilinear_OUTPUT2.
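Steps 1 to 3 can be sketched in NumPy as follows. This is a minimal, untrained sketch under assumptions not stated in the text: zero "same" padding, a LeakyReLU slope of 0.2, batch normalization computed from the statistics of the single input with γ = 1 and β = 0, random weights, a 3-channel input, and half-pixel-centre bilinear resizing. Helper names such as `conv2d_same` and `cnn_branch` are hypothetical. The layer sizes follow the optional values above (16 output channels, kernel sizes 7 and 1, stride 1).

```python
import numpy as np

rng = np.random.default_rng(0)


def conv2d_same(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Stride-1 convolution (cross-correlation, as in DL frameworks), zero 'same' padding.

    x: (H, W, C_in), w: (k, k, C_in, C_out) with odd k, b: (C_out,).
    """
    k = w.shape[0]
    p = (k - 1) // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    h, wd = x.shape[:2]
    out = np.zeros((h, wd, w.shape[3])) + b
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j) at every output pixel.
            out += xp[i:i + h, j:j + wd] @ w[i, j]
    return out


def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    return np.where(x > 0, x, slope * x)  # slope value is an assumption


def batch_norm(x: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    # Per-channel normalisation with this input's statistics (gamma = 1, beta = 0).
    return (x - x.mean(axis=(0, 1))) / np.sqrt(x.var(axis=(0, 1)) + eps)


def bilinear_resize(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    # Half-pixel-centre bilinear interpolation of an (H, W, C) array.
    def taps(n_in: int, n_out: int):
        src = np.clip((np.arange(n_out) + 0.5) * n_in / n_out - 0.5, 0, n_in - 1)
        lo = np.floor(src).astype(int)
        return lo, np.minimum(lo + 1, n_in - 1), src - lo

    y0, y1, fy = taps(img.shape[0], out_h)
    x0, x1, fx = taps(img.shape[1], out_w)
    fy, fx = fy[:, None, None], fx[None, :, None]
    top = img[y0][:, x0] * (1 - fx) + img[y0][:, x1] * fx
    bottom = img[y1][:, x0] * (1 - fx) + img[y1][:, x1] * fx
    return top * (1 - fy) + bottom * fy


def init_conv(k: int, c_in: int, c_out: int):
    return rng.normal(0.0, 0.1, (k, k, c_in, c_out)), np.zeros(c_out)


params = {"conv1": init_conv(7, 3, 16), "conv2": init_conv(1, 16, 16)}


def cnn_branch(target: np.ndarray, target_size) -> np.ndarray:
    h = conv2d_same(target, *params["conv1"])            # Conv2D_OUTPUT1
    h = conv2d_same(leaky_relu(h), *params["conv2"])     # Conv2D_OUTPUT2
    first_feature = batch_norm(leaky_relu(h))            # first feature
    return bilinear_resize(first_feature, *target_size)  # Bilinear_OUTPUT2


TARGET_SIZE = (8, 8)
for shape in [(20, 30, 3), (16, 12, 3)]:
    image = rng.random(shape)
    out2 = cnn_branch(image, TARGET_SIZE)
    out1 = bilinear_resize(image, *TARGET_SIZE)  # Bilinear_OUTPUT1
    print(shape, "->", out2.shape, out1.shape)
```

Inputs of 20×30 and 16×12 both yield an 8×8×16 Bilinear_OUTPUT2 and an 8×8×3 Bilinear_OUTPUT1. Because the final resize always targets TARGET_SIZE, ModelB receives a fixed input size regardless of the original resolution.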
[0060] Based on the above embodiments, and as shown in Figure 2, in this embodiment obtaining the second downsampling result of the target image based on the first downsampling result of the target image and the downsampling result of the first feature includes: sequentially inputting the downsampling result of the first feature into the third convolutional layer and the second batch normalization layer in the image preprocessing model, and outputting the second feature of the downsampling result of the first feature;
[0061] The downsampling result Bilinear_OUTPUT2 of the first feature is sequentially input into the third convolutional layer Conv2D and the second batch normalization layer BatchNormalization to obtain the output BN_OUTPUT2.
[0062] The third convolutional layer has an output dimension of 16, a kernel size of 3, and a stride of 1. The second batch normalization layer stabilizes the training loss and thus speeds up convergence.
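Because BN_OUTPUT2 is later added element-wise to Bilinear_OUTPUT2, the third convolution must not change the spatial size. The standard output-size formula for a convolution with kernel size k, stride s and zero padding p per side shows that stride 1 with p = (k - 1)/2 keeps the size unchanged. The padding value is an assumption ("same" padding); the text does not state it.

```latex
H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} + 2p - k}{s} \right\rfloor + 1,
\qquad s = 1,\; p = \tfrac{k-1}{2}
\;\Longrightarrow\;
H_{\text{out}} = H_{\text{in}} + (k - 1) - k + 1 = H_{\text{in}}
```

For the kernel sizes used in this embodiment this gives p = 3 for k = 7, p = 0 for k = 1 and p = 1 for k = 3; the same argument applies to the width.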
[0063] The second feature is added to the downsampling result of the first feature;
[0064] Add BN_OUTPUT2 and Bilinear_OUTPUT2 to obtain ADD_OUTPUT1.
[0065] Based on the summation result and the first downsampling result of the target image, the second downsampling result of the target image is obtained.
[0066] Based on ADD_OUTPUT1 and Bilinear_OUTPUT1, the second downsampling result of the target image is obtained.
[0067] Based on the above embodiments, and as shown in Figure 2, in this embodiment obtaining the second downsampling result of the target image based on the summation result and the first downsampling result of the target image includes: inputting the summation result into the fourth convolutional layer in the image preprocessing model, and outputting the third feature of the summation result;
[0068] ADD_OUTPUT1 is input into the fourth convolutional layer Conv2D to obtain Conv2D_OUTPUT3. Optionally, if the output of the image downsampling model is a 3-channel image, the output dimension of the fourth convolutional layer is 3. The kernel size is 7 and the stride is 1; this layer further adjusts the output result.
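As a rough size estimate, the four convolutional layers can be counted with the usual formula for a k×k convolution with bias. This assumes a 3-channel (RGB) input and bias terms in every convolution, and it excludes the batch normalization parameters; none of these details are stated explicitly in the text.

```latex
\begin{aligned}
P &= k^2 \, C_{\text{in}} \, C_{\text{out}} + C_{\text{out}} \\
P_1 &= 7^2 \cdot 3 \cdot 16 + 16 = 2368 \\
P_2 &= 1^2 \cdot 16 \cdot 16 + 16 = 272 \\
P_3 &= 3^2 \cdot 16 \cdot 16 + 16 = 2320 \\
P_4 &= 7^2 \cdot 16 \cdot 3 + 3 = 2355 \\
P_1 + P_2 + P_3 + P_4 &= 7315
\end{aligned}
```

Under these assumptions ModelA adds only about 7.3 thousand convolution weights, which is small compared with backbones such as ResNet, MobileNet or EfficientNet listed for ModelB.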
[0069] The third feature and the first downsampling result of the target image are added together to obtain the second downsampling result of the target image.
[0070] Add Conv2D_OUTPUT3 to Bilinear_OUTPUT1 to obtain the final output ModelA_OUTPUT of the image preprocessing model ModelA.
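The fusion path described above (BN_OUTPUT2, ADD_OUTPUT1, Conv2D_OUTPUT3 and ModelA_OUTPUT) can be sketched in NumPy as below. This is a hedged sketch: zero "same" padding, batch normalization from the single input's statistics, random weights, and the random stand-ins for Bilinear_OUTPUT1 and Bilinear_OUTPUT2 are assumptions, and `fusion` is a hypothetical name. The kernel sizes (3 and 7) and channel counts (16 and 3) follow the optional values given in the text.

```python
import numpy as np

rng = np.random.default_rng(1)


def conv2d_same(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Stride-1 convolution with zero "same" padding (padding is an assumption).
    k = w.shape[0]
    p = (k - 1) // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    h, wd = x.shape[:2]
    out = np.zeros((h, wd, w.shape[3])) + b
    for i in range(k):
        for j in range(k):
            out += xp[i:i + h, j:j + wd] @ w[i, j]  # contribution of kernel tap (i, j)
    return out


def batch_norm(x: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    # Per-channel normalisation with this input's statistics (gamma = 1, beta = 0).
    return (x - x.mean(axis=(0, 1))) / np.sqrt(x.var(axis=(0, 1)) + eps)


def fusion(bilinear_output1, bilinear_output2, conv3, conv4):
    bn_output2 = batch_norm(conv2d_same(bilinear_output2, *conv3))  # second feature
    add_output1 = bn_output2 + bilinear_output2                     # summation result
    conv2d_output3 = conv2d_same(add_output1, *conv4)               # third feature
    return conv2d_output3 + bilinear_output1                        # ModelA_OUTPUT


h, w = 8, 8
bilinear_output1 = rng.random((h, w, 3))        # stand-in: downsampled target image
bilinear_output2 = rng.normal(size=(h, w, 16))  # stand-in: downsampled first feature
conv3 = (rng.normal(0.0, 0.1, (3, 3, 16, 16)), np.zeros(16))
conv4 = (rng.normal(0.0, 0.1, (7, 7, 16, 3)), np.zeros(3))
model_a_output = fusion(bilinear_output1, bilinear_output2, conv3, conv4)
print(model_a_output.shape)  # (8, 8, 3)
```

Because the last step is a residual addition, setting the fourth convolution's weights and bias to zero makes ModelA_OUTPUT equal Bilinear_OUTPUT1 exactly, so the CNN path acts as a learned correction on top of plain bilinear downsampling. This follows from the structure described above; the text does not say how the layers are initialized.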
[0071] Based on the above embodiments, before inputting the target image into the CNN model in the image preprocessing model and outputting the first feature of the target image in this embodiment, the method further includes: inputting a sample image into the image preprocessing model and outputting the second downsampling result of the sample image;
[0072] The second downsampling result of the sample image is input into the deep learning model, and the prediction processing result of the sample image is output.
[0073] The predicted processing result of the sample image is compared with the preset actual processing result of the sample image;
[0074] The image preprocessing model and the deep learning model are trained based on the comparison results.
[0075] This embodiment includes two models: an image preprocessing model (ModelA) and a deep learning model (ModelB). As shown in Figure 3, in the model application stage, the target image is first input into the image preprocessing model ModelA for downsampling to obtain the downsampling result; the downsampling result is then input into the deep learning model ModelB for image processing to obtain the image processing result. This embodiment does not limit the specific type of ModelB.
[0076] The image preprocessing model ModelA and the deep learning model ModelB are trained as a whole, using the same training method as ModelB. This can be understood as merging ModelA and ModelB into a new model ModelC, and then training ModelC using the same method as training ModelB, thereby improving the overall performance of the model.
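A toy sketch of this joint training idea: ModelA and ModelB are composed into one function (ModelC), a single loss compares its prediction with preset results, and each update adjusts the parameters of both models together. Everything here is hypothetical: the one-parameter ModelA, the two-parameter ModelB, the synthetic data and targets, the learning rate, and the central-difference gradients, which stand in for the backpropagation a real framework would perform.

```python
import numpy as np

rng = np.random.default_rng(0)


def blocks(x: np.ndarray) -> np.ndarray:
    """Regroup an (H, W) image into (H/2, W/2, 4) non-overlapping 2x2 blocks."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)


def model_a(x: np.ndarray, a: float) -> np.ndarray:
    # Hypothetical toy ModelA: block-average downsampling plus a learnable share
    # of the within-block detail (standard deviation) that averaging discards.
    b = blocks(x)
    return b.mean(axis=-1) + a * b.std(axis=-1)


def model_b(z: np.ndarray, w: float, c: float) -> float:
    # Hypothetical toy ModelB: predicts one scalar "feature" from the downsampled image.
    return w * z.mean() + c


def loss(theta: np.ndarray, images: np.ndarray, targets: np.ndarray) -> float:
    a, w, c = theta
    # ModelC(x) = ModelB(ModelA(x)); compare predictions with the preset results.
    preds = np.array([model_b(model_a(x, a), w, c) for x in images])
    return float(np.mean((preds - targets) ** 2))


def numerical_grad(f, theta: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Central differences stand in for backpropagation.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad


images = rng.random((16, 8, 8))
# Synthetic "preset actual processing results" that depend on fine detail.
targets = np.array([x.mean() + 2 * blocks(x).std(axis=-1).mean() for x in images])


def objective(theta: np.ndarray) -> float:
    return loss(theta, images, targets)


theta = np.array([0.0, 0.5, 0.0])  # [a (ModelA), w (ModelB), c (ModelB)]
initial = objective(theta)
for _ in range(200):
    # One gradient step updates the parameters of both models together.
    theta = theta - 0.1 * numerical_grad(objective, theta)
final = objective(theta)
print(f"loss {initial:.4f} -> {final:.4f}; a = {theta[0]:.3f}")
```

The loss falls and ModelA's parameter `a` moves away from zero even though only ModelB's output is compared with the targets, which is the effect of training ModelA and ModelB as a single model ModelC.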
[0077] In this embodiment, the image preprocessing model and the deep learning model are trained together, so that the image scaling process learns to discard information selectively while retaining useful information. This improves the model's robustness to images of different resolutions, so that it adapts better to such images in real-world applications without the model needing to be readjusted. It also reduces the manpower required to prepare the training dataset and improves image processing efficiency.
[0078] Based on the above embodiments, the deep learning model in this embodiment is an image feature extraction model, and the prediction processing result is an image feature.
[0079] Optionally, when the target image is a face image, the image features extracted by the deep learning model are the facial features in the image, that is, the prediction result is the facial features, such as eyes, eyebrows, nose, and mouth. The deep learning model can use CNN, RNN (Recurrent Neural Network), or FNN (Feedforward Neural Network), etc.
[0080] During training, the facial features output by the deep learning model are compared with the facial features of manually annotated sample images, and the parameters of the image preprocessing model and the deep learning model are adjusted simultaneously based on the differences between the two.
[0081] During the model usage phase, the face image is input into the image preprocessing model to obtain the downsampling result of the face image; the downsampling result of the face image is then input into the deep learning model to output the face features. Besides face feature extraction, this embodiment is also applicable to other application scenarios, and this embodiment does not impose specific limitations.
[0082] The image downsampling device provided by the present invention is described below. The image downsampling device described below and the image downsampling method described above correspond to each other and may be cross-referenced.
[0083] As shown in Figure 4, the device includes an extraction module 401, a sampling module 402, and a fusion module 403.
[0084] Extraction module 401 is used to input the target image into the CNN model in the image preprocessing model and output the first feature of the target image;
[0085] The sampling module 402 is used to input the first feature into the image downsampling model in the image preprocessing model to obtain the downsampling result of the first feature, and input the target image into the image downsampling model to obtain the first downsampling result of the target image;
[0086] The fusion module 403 is used to obtain a second downsampling result of the target image based on the first downsampling result of the target image and the downsampling result of the first feature.
[0087] In this embodiment, a CNN model extracts the information that the image downsampling model may lose during sampling. The features output by the CNN model are themselves downsampled with the image downsampling model, so that they have the same size as the downsampling result of the target image. The two downsampling results are then fused, so that the features extracted by the CNN model enhance the downsampling result of the target image and the useful information of the image is preserved. In addition, because the image is downsampled by a trainable deep learning image preprocessing model rather than by the image downsampling model alone, the approach is suitable for subsequent processing of images of various resolutions.
[0088] Figure 5 is a schematic diagram of the physical structure of an electronic device. As shown in Figure 5, the electronic device may include a processor 510, a communication interface 520, a memory 530, and a communication bus 540, wherein the processor 510, the communication interface 520, and the memory 530 communicate with each other through the communication bus 540. The processor 510 can call logical instructions in the memory 530 to execute an image downsampling method, which includes: inputting a target image into a CNN model in an image preprocessing model and outputting a first feature of the target image; inputting the first feature into an image downsampling model in the image preprocessing model and obtaining a downsampling result of the first feature; inputting the target image into the image downsampling model and obtaining a first downsampling result of the target image; and obtaining a second downsampling result of the target image based on the first downsampling result of the target image and the downsampling result of the first feature.
[0089] Furthermore, the logical instructions in the aforementioned memory 530 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, essentially, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0090] On the other hand, the present invention also provides a computer program product, the computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer is able to execute the image downsampling method provided by the above methods. The method includes: inputting a target image into a CNN model in an image preprocessing model and outputting a first feature of the target image; inputting the first feature into an image downsampling model in the image preprocessing model and obtaining a downsampling result of the first feature; inputting the target image into the image downsampling model and obtaining a first downsampling result of the target image; and obtaining a second downsampling result of the target image based on the first downsampling result of the target image and the downsampling result of the first feature.
[0091] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements an image downsampling method provided by the methods described above. The method includes: inputting a target image into a CNN model in an image preprocessing model and outputting a first feature of the target image; inputting the first feature into an image downsampling model in the image preprocessing model and obtaining a downsampling result of the first feature; inputting the target image into the image downsampling model and obtaining a first downsampling result of the target image; and obtaining a second downsampling result of the target image based on the first downsampling result of the target image and the downsampling result of the first feature.
[0092] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0093] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0094] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. An image downsampling method, characterized in that it comprises: The target image is input into the CNN model in the image preprocessing model, and the first feature of the target image is output. The first feature is input into the image downsampling model in the image preprocessing model to obtain the downsampling result of the first feature, and the target image is input into the image downsampling model to obtain the first downsampling result of the target image; Based on the first downsampling result of the target image and the downsampling result of the first feature, a second downsampling result of the target image is obtained; The step of obtaining a second downsampling result of the target image based on the first downsampling result of the target image and the downsampling result of the first feature includes: The downsampling result of the first feature is sequentially input into the third convolutional layer and the second batch normalization layer in the image preprocessing model, and the second feature of the downsampling result of the first feature is output. The second feature is added to the downsampling result of the first feature; Based on the summation result and the first downsampling result of the target image, the second downsampling result of the target image is obtained.
2. The image downsampling method according to claim 1, characterized in that, The step of inputting the target image into the CNN model in the image preprocessing model and outputting the first feature of the target image includes: The target image is sequentially input into the first convolutional layer, the first nonlinear activation layer, the second convolutional layer, the second nonlinear activation layer, and the first batch normalization layer in the CNN model, and the first feature of the target image is output.
3. The image downsampling method according to claim 1, characterized in that, The step of obtaining the second downsampling result of the target image based on the summation result and the first downsampling result of the target image includes: The summation result is input into the fourth convolutional layer of the image preprocessing model, and the third feature of the summation result is output. The third feature and the first downsampling result of the target image are added together to obtain the second downsampling result of the target image.
4. The image downsampling method according to any one of claims 1-3, characterized in that, before the target image is input into the CNN model in the image preprocessing model and the first feature of the target image is output, the method further includes: The sample image is input into the image preprocessing model, and the second downsampling result of the sample image is output. The second downsampling result of the sample image is input into the deep learning model, and the prediction processing result of the sample image is output. The predicted processing result of the sample image is compared with the preset actual processing result of the sample image; The image preprocessing model and the deep learning model are trained based on the comparison results.
5. The image downsampling method according to claim 4, characterized in that, The deep learning model is an image feature extraction model, and the prediction processing result is an image feature.
6. An image downsampling device, characterized in that it comprises: The extraction module is used to input the target image into the CNN model in the image preprocessing model and output the first feature of the target image; The sampling module is used to input the first feature into the image downsampling model in the image preprocessing model to obtain the downsampling result of the first feature, and to input the target image into the image downsampling model to obtain the first downsampling result of the target image; The fusion module is used to obtain a second downsampling result of the target image based on the first downsampling result of the target image and the downsampling result of the first feature; The fusion module is specifically used for: The downsampling result of the first feature is sequentially input into the third convolutional layer and the second batch normalization layer in the image preprocessing model, and the second feature of the downsampling result of the first feature is output. The second feature is added to the downsampling result of the first feature; Based on the summation result and the first downsampling result of the target image, the second downsampling result of the target image is obtained.
7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the processor executes the program, it implements the steps of the image downsampling method as described in any one of claims 1 to 5.
8. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by a processor, it implements the steps of the image downsampling method as described in any one of claims 1 to 5.
9. A computer program product, comprising a computer program, characterized in that, When the computer program is executed by a processor, it implements the steps of the image downsampling method as described in any one of claims 1 to 5.