An image processing method and system based on a sub-true value image multi-loss function

By introducing multiple loss functions of sub-ground images into the training of the demosaic network, the problem of training getting stuck in local optima is solved, and the quality and signal-to-noise ratio of the demosaic images are improved.

CN118365559B · Active · Publication Date: 2026-06-23 · NORTHWESTERN POLYTECHNICAL UNIV


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: NORTHWESTERN POLYTECHNICAL UNIV
Filing Date: 2024-04-29
Publication Date: 2026-06-23


IPC: G06T5/73; G06T5/60; G06V10/774; G06V10/82; G06N3/0464; G06N3/084

AI Tagging

Application Domain

Image enhancement; Biological models

Technology Topics

Pattern recognition; Imaging processing


AI Technical Summary

Technical Problem

Existing demosaic network training methods for hyperspectral imaging are prone to getting stuck in local optima, resulting in poor demosaic image quality.

Method used

A demosaic network model is trained with multiple loss functions based on sub-ground truth images. Loss values are calculated between the predicted image and the ground truth image, and between the predicted image and each of several sub-ground truth images; the network parameters are then optimized on the combined loss to avoid local optima and improve image quality.

Benefits of technology

It helps keep training from getting stuck in local optima and improves the quality of demosaiced images; in the described embodiment, the signal-to-noise ratio improves by 0.7 dB.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses an image processing method and system based on a sub-true value image multi-loss function, relates to the technical field of artificial intelligence and image processing, and is characterized in that the de-mosaic network model used by the image processing method is trained based on a loss function of a sub-true value image and a loss function of a true value image. Specifically, a first loss value between a predicted image and a true value image is calculated; a plurality of sub-true value images are obtained according to the predicted image and the true value image; a plurality of second loss values between the predicted image and the sub-true value images are respectively calculated; and a final loss value is obtained according to the first loss value and the plurality of second loss values to optimize network parameters of a de-mosaic network structure, thereby avoiding the defects that a plurality of loss functions are based on one true value image, one or more loss functions are invalid, or training is trapped in a local optimal point, and the quality of a de-mosaic image is improved.


Description

Technical Field

[0001] This invention relates to the fields of artificial intelligence and image processing technology, and in particular to an image processing method and system based on sub-true image multiple loss functions.

Background Technology

[0002] Hyperspectral imaging, as an emerging imaging system, has been widely applied in fields such as land surveying, urban and rural construction, statistical surveys, agricultural and forestry resource monitoring, and real/false target identification, thanks to the continuous maturation of imaging technology. Existing rapid hyperspectral imaging techniques capture raw images through a multispectral filter array (MSFA), so that different pixels record information (color) from different spectral channels, as shown in Figure 1. However, this approach requires a demosaicing algorithm to process the original image into a full-spectrum, full-resolution image, i.e., a demosaiced image, as shown in Figure 2.
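As a rough illustration of how an MSFA produces a raw image in which each pixel records only one spectral channel, the following sketch samples a mosaic from a full spectral cube. The 4x4 pattern, its row-major channel order, and the function name `mosaic_from_cube` are assumptions made for illustration; the text does not specify the filter layout.

```python
import numpy as np

def mosaic_from_cube(cube: np.ndarray, pattern: np.ndarray) -> np.ndarray:
    """Sample a single-channel raw mosaic from an (H, W, C) spectral cube.

    pattern[r, c] is the channel index recorded at pixels whose
    row % pattern_height == r and column % pattern_width == c.
    """
    h, w, _ = cube.shape
    ph, pw = pattern.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    channel = pattern[rows % ph, cols % pw]  # (H, W) channel index per pixel
    return cube[rows, cols, channel]

# Hypothetical 4x4 MSFA covering 16 channels in row-major order.
pattern = np.arange(16).reshape(4, 4)
cube = np.random.default_rng(0).random((8, 8, 16))
raw = mosaic_from_cube(cube, pattern)  # shape (8, 8), one channel per pixel
```

A demosaicing network has to invert this sampling, recovering all channels at every pixel from `raw`.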

[0003] Existing demosaicing network training methods generally consist of five steps: initialization, forward propagation, loss calculation, back propagation, and parameter update, with loss calculation being the most crucial. To ensure the network output meets various quality metrics, a weighted average of multiple loss functions is calculated as the final loss value, with each loss function corresponding to a specific metric. Although multiple metrics are used, most of them are computed against a single ground truth image, so all loss functions point to the same optimization target point. This can render one or more loss functions ineffective or cause the training to get stuck in a local optimum, ultimately degrading the quality of the demosaiced image.

[0004] Therefore, there is an urgent need for an image processing method that can avoid invalid loss functions and prevent the training of the demosaic network from getting stuck in local optima, thereby improving the quality of demosaic images.

Summary of the Invention

[0005] The purpose of this invention is to provide an image processing method and system based on multiple loss functions of sub-ground truth images. The method trains a demosaic network model using loss functions based on different sub-ground truth images together with a loss function based on the ground truth image, thereby mitigating the problem of training getting stuck in local optima and improving the quality of demosaiced images.

[0006] To achieve the above objectives, the present invention provides the following solution:

[0007] In a first aspect, the present invention provides an image processing method based on sub-true image multiple loss functions, comprising:

[0008] Obtain the original image of the target;

[0009] The original target image is input into a demosaic network model, and the demosaiced target image is output. The training process of the demosaic network model specifically includes:

[0010] Obtain a training dataset, wherein the training dataset includes multiple sets of image data, and each set of image data includes an original image and a corresponding ground truth image;

[0011] The original image from a set of image data is input into a demosaicing network structure, and the predicted image is output.

[0012] Calculate the loss value between the predicted image and the ground truth image in the current group of image data to obtain a first loss value;

[0013] Based on the predicted image and the ground truth image, multiple sub-ground truth images are obtained;

[0014] Calculate the loss value between the predicted image and the sub-true image respectively to obtain multiple second loss values;

[0015] Based on the first loss value and multiple second loss values, optimize the network parameters of the demosaic network structure;

[0016] The demosaicing network structure is trained by iterating through all groups of image data in the training dataset, and the demosaicing network model is determined based on the network parameters of the trained demosaicing network structure.

[0017] Optionally, a first loss value is obtained by calculating the loss value between the predicted image and the ground truth image in the current group of image data, specifically including:

[0018] Calculate the first root mean square between the predicted image and the ground image;

[0019] The first loss value is determined based on the first root mean square.

[0020] Optionally, based on the predicted image and the ground truth image, multiple sub-ground truth images are obtained, specifically including:

[0021] Using the ground truth image as guiding data and the predicted image as target data, a plurality of sub-ground truth images are generated by a guided filtering method.

[0022] Optionally, the ground truth image is used as guiding data, and the predicted image is used as target data. A guided filtering method is employed to generate multiple sub-ground truth images, specifically including:

[0023] The predicted image is processed using a low-pass filter to obtain an average predicted image;

[0024] The ground truth image is processed using a low-pass filter to obtain an average ground truth image;

[0025] Based on the average predicted image and the average ground truth image, a plurality of sub-ground truth images are generated, wherein the plurality of sub-ground truth images include at least a second ground truth image and a third ground truth image.

[0026] Optionally, the process of generating the second truth image specifically includes:

[0027] Perform an element-wise product operation on the predicted image and the ground image to obtain the first product operation result;

[0028] The result of the first product operation is processed using a low-pass filter to obtain the average coproduct;

[0029] Perform an element-wise product operation on the average predicted image and the average true image to obtain a second product operation result;

[0030] The covariance is obtained based on the difference between the average coproduct and the result of the second product operation;

[0031] Perform element-wise multiplication on the predicted image itself to obtain the third product result;

[0032] The result of the third product operation is processed using a low-pass filter to obtain the average predicted product;

[0033] The prediction variance is obtained based on the difference between the average prediction product and the result of the second product operation;

[0034] The gradient is obtained based on the covariance, the prediction variance, and the fuzzy parameters;

[0035] Perform an element-wise product operation on the average predicted image and the gradient to obtain the fourth product operation result;

[0036] The intersection point is obtained based on the difference between the average true value image and the result of the fourth product operation;

[0037] Perform an element-wise product operation on the true image and the gradient to obtain the fifth product operation result;

[0038] The second truth image is obtained by summing the intersection point and the result of the fifth product operation.

[0039] Optionally, the process of generating the third truth image specifically includes:

[0040] The high-frequency information of the true image is obtained based on the difference between the true image and the average true image.

[0041] By fusing the high-frequency information of the average predicted image and the ground truth image, a third ground truth image is obtained.

[0042] Optionally, the loss value between the predicted image and each sub-ground truth image is calculated respectively, resulting in multiple second loss values, specifically including:

[0043] For each sub-ground truth image, perform the following steps:

[0044] Calculate the second root mean square between the predicted image and the sub-ground truth image;

[0045] The second loss value is determined based on the second root mean square.

[0046] Optionally, the network parameters of the demosaic network structure are optimized based on the first loss value and multiple second loss values, specifically including:

[0047] The first loss value and multiple second loss values are weighted and summed to obtain the final loss value;

[0048] Calculate the optimization direction and optimization value for each network parameter based on the final loss value;

[0049] The demosaic network structure is optimized based on the optimization direction and the optimization value.

[0050] In a second aspect, the present invention provides an image processing system based on sub-true image multiple loss functions, comprising:

[0051] The target image acquisition subsystem is used to acquire the original target image;

[0052] The prediction subsystem is used to input the original target image into the demosaic network model and output the target demosaic image.

[0053] A training subsystem is used to train the demosaic network model;

[0054] The training subsystem specifically includes:

[0055] An initialization module is used to obtain a training dataset, wherein the training dataset includes multiple sets of image data, and each set of image data includes an original image and a corresponding ground truth image;

[0056] The forward propagation module is used to input the original image from a set of image data into the demosaicing network structure and output the predicted image;

[0057] The first loss calculation module is used to calculate the loss value between the predicted image and the ground image in the current group of image data to obtain the first loss value;

[0058] The sub-truth image generation module is used to obtain multiple sub-truth images based on the predicted image and the truth image;

[0059] The second loss value calculation module is used to calculate the loss value between the predicted image and the sub-true image respectively, and obtain multiple second loss values accordingly.

[0060] The optimization module is used to optimize the network parameters of the demosaic network structure based on the first loss value and multiple second loss values;

[0061] The traversal module is used to traverse all groups of image data in the training dataset, complete the training of the demosaicing network structure, and determine the demosaicing network model based on the network parameters of the trained demosaicing network structure.

[0062] According to specific embodiments provided by the present invention, the present invention discloses the following technical effects:

[0063] This invention provides an image processing method and system based on multiple loss functions of sub-ground truth images. The demosaic network model used in this image processing method is trained with loss functions based on the sub-ground truth images and a loss function based on the ground truth image. This avoids the defect of basing multiple loss functions on a single ground truth image, which may render one or more loss functions ineffective or cause the training to get stuck in a local optimum, and thereby improves the quality of the demosaiced image.

Attached Figure Description

[0064] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0065] Figure 1 is a schematic diagram of the MSFA filter array and the original image;

[0066] Figure 2 is a schematic diagram illustrating the process of obtaining a de-mosaic image from the original image using a de-mosaic algorithm;

[0067] Figure 3 is a flowchart of the training method of the demosaic network model in an image processing method based on sub-truth image multiple loss functions provided in Embodiment 1 of the present invention;

[0068] Figure 4 is a schematic diagram of the training process of the demosaic network model provided in Embodiment 1 of the present invention;

[0069] Figure 5 is a schematic diagram of the training subsystem in an image processing system based on sub-true image multiple loss functions, as provided in Embodiment 2 of the present invention.

Detailed Implementation

[0070] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0071] The purpose of this invention is to provide an image processing method and system based on multiple loss functions of sub-ground truth images. The method trains a demosaic network model using loss functions based on different sub-ground truth images together with a loss function based on the ground truth image, thereby mitigating the problem of training getting stuck in local optima and improving the quality of demosaiced images.

[0072] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0073] Embodiment 1:

[0074] To overcome the shortcomings of existing demosaic network training methods, this embodiment provides a new approach to training demosaic networks. Multiple loss functions are established, each targeting a different optimization objective point: one loss function is based on the ground truth image, while the others are based on different sub-ground truth images. Applying the trained demosaic network in an image processing method can improve the quality of the demosaiced image output by the network. Specifically, this embodiment provides an image processing method based on sub-ground truth image multiple loss functions, including:

[0075] S1: Obtain the original image of the target;

[0076] S2: Input the original target image into the demosaic network model and output the target demosaic image. The training process of the demosaic network model is shown in Figure 3 and Figure 4. Assuming there are n loss functions, it specifically includes:

[0077] Step 1000: Initialize the network parameters of the demosaicing network structure and obtain the training dataset, wherein the training dataset should include multiple sets of image data, each set of image data including an original image and the corresponding ground truth image.

[0078] In this embodiment, the DPG-Net network is used as the demosaic network structure. The DPG-Net network parameters are randomly initialized, and a training dataset is prepared. This training dataset contains 69 sets of training data and 35 sets of test data. Each set of data contains one original image and one 16-channel ground truth (demosaic) image.

[0079] Step 2000: Input the original image from a set of image data into the demosaicing network structure and output the predicted image.

[0080] Step 3000: Calculate the final loss value based on the predicted image and the ground truth image in the current group of image data, specifically including:

[0081] Step 3100: Calculate the loss value between the predicted image and the ground truth image in the current group of image data to obtain a first loss value, specifically including:

[0082] Calculate the first root mean square between the predicted image and the ground image;

[0083] The first loss value is determined based on the first root mean square, wherein the expression for the first loss value is:

[0084] First loss value = $\lVert \text{Predicted image} - \text{Ground truth image} \rVert_2^2$.
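A minimal sketch of this loss, assuming the expression denotes the squared L2 norm of the pixel-wise difference (the sum of squared differences). The helper name `squared_l2_loss` is hypothetical; a mean-squared variant would differ only by a constant factor for images of fixed size.

```python
import numpy as np

def squared_l2_loss(predicted, target) -> float:
    """Sum of squared pixel-wise differences between two images."""
    diff = np.asarray(predicted, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    return float(np.sum(diff ** 2))

# Tiny worked example: differences are 0, 2, 3, 0, so the loss is 4 + 9 = 13.
loss = squared_l2_loss([[1, 2], [3, 4]], [[1, 0], [0, 4]])
```

The same helper could be reused for the later loss values by passing a sub-ground truth image as `target`.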

[0085] Step 3200: Based on the predicted image and the ground truth image, obtain n-1 sub-ground truth images, specifically including:

[0086] Step 3210: Using the ground truth image as guiding data and the predicted image as target data, a plurality of the sub-ground truth images are generated using a guided filtering method.

[0087] A second ground truth image is generated using a simplified guided filter. A classic guided filter requires four inputs: guide data, target data, window size, and blur parameters. This step uses the ground truth image as guide data, the predicted image as target data, and sets the window size and blur parameters to 5 and 0.01, respectively, to generate the second ground truth image, i.e.:

[0088] Sub-true image = simplified guided filter (guided = true image, target = predicted image, window = 5, blur = 0.01).

[0089] Step 3210 may further include:

[0090] Low-pass filters are used to process the predicted image and the ground truth image respectively to obtain the low-frequency information of the predicted image and the ground truth image.

[0091] Based on the ground truth image and its low-frequency information, obtain the high-frequency information of the ground truth image;

[0092] By fusing low-frequency information from the predicted image and high-frequency information from the ground truth image, a sub-ground truth image is obtained.

[0093] Assuming n=3, this embodiment takes the generation process of the second and third ground truth images as examples to introduce two feasible methods for generating sub-ground truth images.

[0094] The process of generating the second ground truth image specifically includes:

[0095] Step 3211: Perform convolution processing on the predicted image using a low-pass filter to obtain the average predicted image:

[0096] Average predicted image = predicted image * low-pass filter;

[0097] In this embodiment, the method used is as follows:

[0098] Step 3212: Perform convolution processing on the ground truth image using a low-pass filter to obtain the average ground truth image:

[0099] Average ground truth image = Ground truth image * Low-pass filter;

[0100] Step 3213: Perform element-wise multiplication on the predicted image and the ground truth image to obtain a first product result; process the first product result using a low-pass filter to obtain the average coproduct.

[0101] Average coproduct = (Predicted image ⊙ Ground image) * Low-pass filter

[0102] Where ⊙ represents element-wise multiplication;

[0103] Step 3214: Perform element-wise multiplication on the average predicted image and the average true image to obtain a second product result; obtain the covariance based on the difference between the average coproduct and the second product result.

[0104] Covariance = Average coproduct - Average predicted image ⊙ Average true image;

[0105] Step 3215: Perform element-wise multiplication on the predicted image and the predicted image to obtain a third product result; process the third product result using a low-pass filter to obtain the average predicted product:

[0106] Average prediction product = (predicted image ⊙ predicted image) * low-pass filter;

[0107] Step 3216: Obtain the prediction variance based on the difference between the average predicted product and the result of the second product operation.

[0108] Prediction variance = Average prediction product - Average predicted image ⊙ Average true image;

[0109] Step 3217: Obtain the gradient based on the covariance, the prediction variance, and the fuzzy parameters:

[0110] Gradient = Covariance / (Prediction Variance + Fuzzy Parameter);

[0111] Step 3218: Perform element-wise multiplication on the average predicted image and the gradient to obtain the fourth product result; obtain the intersection point based on the difference between the average true image and the fourth product result.

[0112] Intersection point = Average true image - Gradient ⊙ Average predicted image;

[0113] Step 3219: Perform element-wise multiplication on the ground truth image and the gradient to obtain the fifth product result; obtain the second ground truth image as the sum of the intersection point and the fifth product result.

[0114] The second ground truth image = intersection point + gradient ⊙ ground truth image.
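Steps 3211 to 3219 can be sketched as follows for a single-channel image. This is an illustrative reading, not the patent's implementation: the low-pass filter is assumed to be a box (mean) filter with edge replication, and the names `box_filter` and `second_sub_ground_truth` are hypothetical. The "gradient" and "intersection point" play the roles of the slope and intercept of a guided filter.

```python
import numpy as np

def box_filter(img: np.ndarray, window: int = 5) -> np.ndarray:
    """Mean filter over a window x window neighbourhood (edges replicated)."""
    pad = window // 2
    padded = np.pad(img, ((pad, pad), (pad, pad)), mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(window):
        for dx in range(window):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (window * window)

def second_sub_ground_truth(pred, gt, window=5, blur=0.01):
    mean_pred = box_filter(pred, window)                 # Step 3211
    mean_gt = box_filter(gt, window)                     # Step 3212
    mean_coproduct = box_filter(pred * gt, window)       # Step 3213
    covariance = mean_coproduct - mean_pred * mean_gt    # Step 3214
    mean_pred_product = box_filter(pred * pred, window)  # Step 3215
    # Step 3216 as written in the text; a textbook variance would subtract
    # mean_pred * mean_pred instead of mean_pred * mean_gt.
    pred_variance = mean_pred_product - mean_pred * mean_gt
    gradient = covariance / (pred_variance + blur)       # Step 3217
    intersection = mean_gt - gradient * mean_pred        # Step 3218
    return intersection + gradient * gt                  # Step 3219
```

For a 16-channel image, the function could be applied to each channel separately. Because the quantity in Step 3216 is not a true variance, the denominator can in principle approach zero; a practical implementation may need to guard against that.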

[0115] The process of generating the third truth image specifically includes:

[0116] Step 3220: Perform convolution processing on the predicted image using a low-pass filter to obtain the average predicted image:

[0117] Average predicted image = predicted image * low-pass filter;

[0118] Step 3221: Perform convolution processing on the ground truth image using a low-pass filter to obtain the average ground truth image:

[0119] Average ground truth image = Ground truth image * Low-pass filter;

[0120] Step 3222: Based on the difference between the ground truth image and the average ground truth image, obtain the high-frequency information of the ground truth image; fuse the high-frequency information of the average predicted image and the ground truth image to obtain a third ground truth image, namely:

[0121] The third ground truth image = ground truth image - average ground truth image + average predicted image.
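The third ground truth image combines the low-frequency content of the prediction with the high-frequency content of the ground truth. A minimal single-channel sketch, again assuming a box filter as the low-pass filter (the helper names are hypothetical):

```python
import numpy as np

def box_filter(img: np.ndarray, window: int = 5) -> np.ndarray:
    """Mean filter over a window x window neighbourhood (edges replicated)."""
    pad = window // 2
    padded = np.pad(img, ((pad, pad), (pad, pad)), mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(window):
        for dx in range(window):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (window * window)

def third_sub_ground_truth(pred, gt, window=5):
    mean_pred = box_filter(pred, window)  # Step 3220: low frequencies of the prediction
    mean_gt = box_filter(gt, window)      # Step 3221: low frequencies of the ground truth
    high_freq_gt = gt - mean_gt           # Step 3222: high frequencies of the ground truth
    return high_freq_gt + mean_pred       # fuse the two parts
```

When the prediction already equals the ground truth, this target reduces to the ground truth itself, so the corresponding loss term vanishes at the same point as the first loss.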

[0122] Step 3300: Calculate the loss value between the predicted image and each sub-ground truth image to obtain multiple second loss values (i.e., the other loss values in Figure 3), taking the second and third ground truth images as examples:

[0123] Step 3310: Calculate the second root mean square between the predicted image and the second ground truth image, and determine the second loss value based on the second root mean square, i.e.:

[0124] Second loss value = $\lVert \text{Predicted image} - \text{Second ground truth image} \rVert_2^2$.

[0125] Step 3320: Calculate the third root mean square between the predicted image and the third ground truth image, and determine the third loss value based on the third root mean square, i.e.:

[0126] Third loss value = $\lVert \text{Predicted image} - \text{Third ground truth image} \rVert_2^2$.

[0127] Step 3400: Based on the first loss value and multiple second loss values, optimize the network parameters of the demosaic network structure, specifically including:

[0128] Step 3410: Perform a weighted summation of the first loss value and multiple second loss values to obtain the final loss value. Taking the first loss value, second loss value, and third loss value as an example, the expression for the final loss value is:

[0129] Final loss value = k1 × first loss value + k2 × second loss value + k3 × third loss value;

[0130] Here we set k1 = 0.9, k2 = 0.1, and k3 = 0.
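The weighted summation can be written in a few lines. The loss values below are made-up numbers for illustration only, and `final_loss` is a hypothetical helper name; the weights are the ones stated above.

```python
def final_loss(losses, weights) -> float:
    """Weighted sum of per-target loss values."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    return sum(k * value for k, value in zip(weights, losses))

# Hypothetical first, second, and third loss values.
losses = [2.0, 4.0, 10.0]
weights = [0.9, 0.1, 0.0]  # k1, k2, k3 as set in this embodiment
total = final_loss(losses, weights)  # 0.9*2.0 + 0.1*4.0 + 0.0*10.0
```

With k3 = 0, the third loss value does not contribute to the final loss in this configuration.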

[0131] Step 3411: Analyze the final loss value according to the backpropagation algorithm, and calculate the optimization direction and optimization value of each network parameter.

[0132] Step 3412: Update each network parameter of the demosaic network structure according to the optimization direction and the optimization value.

[0133] Step 4000: Extract another set of training data and train again until all training data in the training dataset has been traversed 6000 times. Then the training of the demosaic network structure is completed, and the demosaic network model is determined based on the network parameters of the trained demosaic network structure.
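To make the loop structure of Steps 2000 to 4000 concrete, here is a toy sketch in which the "network" is a single gain parameter `w` and gradients are computed analytically. Everything about the model and data is an assumption for illustration: the real embodiment uses DPG-Net with backpropagation and 6000 traversals. The sub-ground truth image is treated as a fixed target when differentiating, which the text does not state explicitly, and the weights k1 = 0.9, k3 = 0.1 are chosen only so that the example needs a single sub-ground truth image.

```python
import numpy as np

def box_filter(img: np.ndarray, window: int = 5) -> np.ndarray:
    """Mean filter over a window x window neighbourhood (edges replicated)."""
    pad = window // 2
    padded = np.pad(img, ((pad, pad), (pad, pad)), mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(window):
        for dx in range(window):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (window * window)

rng = np.random.default_rng(0)
raws = [rng.random((16, 16)) for _ in range(4)]  # toy "original images"
dataset = [(raw, 2.0 * raw) for raw in raws]     # toy ground truth: 2 x raw

w = 0.0            # Step 1000: initialize the single toy parameter
k1, k3 = 0.9, 0.1  # illustrative loss weights
lr = 0.5           # illustrative learning rate
epochs = 50        # far fewer traversals than the 6000 used in the embodiment

for _ in range(epochs):                 # Step 4000: traverse the dataset repeatedly
    for raw, gt in dataset:
        pred = w * raw                  # Step 2000: forward pass
        # Step 3200: third sub-ground truth image, held fixed as a target
        target3 = gt - box_filter(gt) + box_filter(pred)
        # Steps 3100-3411: gradient of
        # k1*mean((pred-gt)^2) + k3*mean((pred-target3)^2) with respect to w
        grad = 2.0 * np.mean((k1 * (pred - gt) + k3 * (pred - target3)) * raw)
        w -= lr * grad                  # Step 3412: parameter update
```

In this toy setting both loss terms vanish at `w = 2`, so the extra target changes the path of the optimization rather than its end point.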

[0134] The DPG-Net trained using the training method disclosed in this embodiment can improve the signal-to-noise ratio of the generated de-mosaic image by 0.7 dB compared to the DPG-Net trained using existing training methods.
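To put the 0.7 dB figure in perspective, the following arithmetic converts a decibel gain into a noise-power ratio. It assumes the signal-to-noise ratio is defined as 10·log10(signal power / noise power) with the signal power unchanged; the text does not state which definition was used.

```python
gain_db = 0.7

# With SNR = 10 * log10(P_signal / P_noise) and fixed signal power,
# a gain of gain_db multiplies the noise power by 10 ** (-gain_db / 10).
noise_power_ratio = 10 ** (-gain_db / 10)
reduction_percent = (1.0 - noise_power_ratio) * 100.0

print(f"noise power ratio: {noise_power_ratio:.3f}")
print(f"noise power reduction: {reduction_percent:.1f}%")
```

Under that assumption, a 0.7 dB gain corresponds to roughly a 15% reduction in error power.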

[0135] The training method in this embodiment can be extended to any supervised image processing network (where ground truth images exist).

[0136] To demonstrate the scalability of the method, the training method of this embodiment is applied to train the SPyNet network on the FlyingChairs optical flow dataset.

[0137] The FlyingChairs dataset contains 22,872 sets of data, each set containing one original image, one image after movement, and one ground truth optical flow describing the image movement. 15,000 sets were used as the training dataset, and the remaining 7,872 sets were used as the test dataset.

[0138] The SPyNet network has two inputs, the original image and the moved image, and one output, the predicted optical flow. First, all parameters of the SPyNet network are randomly initialized. Second, a set of data is randomly extracted from the training dataset, and the original and moved images are input into the SPyNet network to obtain the predicted optical flow. Then, the mean squared error between the predicted and ground truth optical flows is calculated to obtain the first loss value; the predicted and ground truth optical flows are fused to obtain the second and third ground truth optical flows; the mean squared error between the predicted and second ground truth optical flows is calculated to obtain the second loss value; the mean squared error between the predicted and third ground truth optical flows is calculated to obtain the third loss value; these three loss values are then weighted and summed using k1 = 0.9, k2 = 0, and k3 = 0.1 to obtain the final loss value. Next, the final loss value is analyzed to determine the optimization direction for each parameter of the SPyNet network, and each parameter is updated according to this optimization direction. After the update is complete, another set of data is extracted from the training dataset and input into the SPyNet network, until the training dataset has been traversed 50 times, thus completing the training of the SPyNet network. The SPyNet trained using the method disclosed in this embodiment can reduce the error in the generated optical flow by 0.5% compared to SPyNet trained using existing methods.

[0139] Embodiment 2:

[0140] This embodiment provides an image processing system based on sub-true image multiple loss functions, including:

[0141] The target image acquisition subsystem is used to acquire the original target image;

[0142] The prediction subsystem is used to input the original target image into the demosaic network model and output the target demosaic image.

[0143] A training subsystem is used to train the demosaic network model;

[0144] Referring to Figure 5, the training subsystem specifically includes:

[0145] The initialization module M1 is used to obtain the training dataset, wherein the training dataset includes multiple sets of image data, and each set of image data includes an original image and a corresponding ground truth image;

[0146] Forward propagation module M2 is used to input the original image from a set of image data into the demosaicing network structure and output the predicted image;

[0147] The first loss calculation module M3 is used to calculate the loss value between the predicted image and the ground image in the current group of image data to obtain the first loss value;

[0148] The sub-truth image generation module M4 is used to obtain multiple sub-truth images based on the predicted image and the truth image;

[0149] The second loss value calculation module M5 is used to calculate the loss value between the predicted image and the sub-true image respectively, and obtain multiple second loss values accordingly.

[0150] Optimization module M6 is used to optimize the network parameters of the demosaic network structure based on the first loss value and multiple second loss values;

[0151] The traversal module M7 is used to traverse all groups of image data in the training dataset, complete the training of the demosaicing network structure, and determine the demosaicing network model based on the network parameters of the trained demosaicing network structure.

[0152] Each embodiment in this specification focuses on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the systems disclosed in the embodiments, since they correspond to the methods disclosed in the embodiments, the descriptions are relatively simple; relevant parts can be found in the method section.

[0153] This document uses specific examples to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. Furthermore, those skilled in the art will recognize that, based on the ideas of the present invention, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of the present invention.

Claims

1. An image processing method based on sub-truth image multiple loss functions, characterized in that the image processing method includes:
obtaining an original target image;
inputting the original target image into a demosaicing network model and outputting a demosaiced target image;
wherein the training process of the demosaicing network model specifically includes:
obtaining a training dataset, wherein the training dataset includes multiple groups of image data, and each group of image data includes an original image and a corresponding ground truth image;
inputting the original image from one group of image data into a demosaicing network structure and outputting a predicted image;
calculating the loss value between the predicted image and the ground truth image in the current group of image data to obtain a first loss value;
obtaining multiple sub-ground truth images based on the predicted image and the ground truth image;
calculating the loss value between the predicted image and each sub-ground truth image to obtain multiple second loss values;
optimizing the network parameters of the demosaicing network structure based on the first loss value and the multiple second loss values;
traversing all groups of image data in the training dataset to complete the training of the demosaicing network structure, and determining the demosaicing network model based on the network parameters of the trained demosaicing network structure.

2. The image processing method based on sub-truth image multiple loss functions according to claim 1, characterized in that calculating the loss value between the predicted image and the ground truth image in the current group of image data to obtain a first loss value specifically includes:
calculating a first root mean square between the predicted image and the ground truth image;
determining the first loss value based on the first root mean square.

3. The image processing method based on sub-truth image multiple loss functions according to claim 1, characterized in that obtaining multiple sub-ground truth images based on the predicted image and the ground truth image specifically includes:
using the ground truth image as guiding data and the predicted image as target data, generating multiple sub-ground truth images by a guided filtering method.

4. The image processing method based on sub-truth image multiple loss functions according to claim 3, characterized in that using the ground truth image as guiding data and the predicted image as target data and generating multiple sub-ground truth images by a guided filtering method specifically includes:
processing the predicted image with a low-pass filter to obtain an average predicted image;
processing the ground truth image with a low-pass filter to obtain an average ground truth image;
generating multiple sub-ground truth images based on the average predicted image and the average ground truth image, wherein the multiple sub-ground truth images include at least a second ground truth image and a third ground truth image.

5. The image processing method based on sub-truth image multiple loss functions according to claim 4, characterized in that the process of generating the second ground truth image specifically includes:
performing an element-wise product operation on the predicted image and the ground truth image to obtain a first product operation result;
processing the first product operation result with a low-pass filter to obtain an average coproduct;
performing an element-wise product operation on the average predicted image and the average ground truth image to obtain a second product operation result;
obtaining a covariance based on the difference between the average coproduct and the second product operation result;
performing an element-wise product operation on the predicted image with itself to obtain a third product operation result;
processing the third product operation result with a low-pass filter to obtain an average predicted product;
obtaining a prediction variance based on the difference between the average predicted product and the second product operation result;
obtaining a gradient based on the covariance, the prediction variance, and a fuzzy parameter;
performing an element-wise product operation on the average predicted image and the gradient to obtain a fourth product operation result;
obtaining an intersection point based on the difference between the average ground truth image and the fourth product operation result;
performing an element-wise product operation on the ground truth image and the intersection point to obtain a fifth product operation result;
obtaining the second ground truth image by summing the intersection point and the fifth product operation result.

6. The image processing method based on sub-truth image multiple loss functions according to claim 4, characterized in that the process of generating the third ground truth image specifically includes:
obtaining high-frequency information of the ground truth image based on the difference between the ground truth image and the average ground truth image;
fusing the average predicted image with the high-frequency information of the ground truth image to obtain the third ground truth image.

7. The image processing method based on sub-truth image multiple loss functions according to claim 1, characterized in that calculating the loss value between the predicted image and each sub-ground truth image to obtain multiple second loss values specifically includes:
for each sub-ground truth image, performing the following steps:
calculating a second root mean square between the predicted image and the sub-ground truth image;
determining the second loss value based on the second root mean square.

8. The image processing method based on sub-truth image multiple loss functions according to claim 1, characterized in that optimizing the network parameters of the demosaicing network structure based on the first loss value and the multiple second loss values specifically includes:
weighting and summing the first loss value and the multiple second loss values to obtain a final loss value;
calculating an optimization direction and an optimization value for each network parameter based on the final loss value;
optimizing the demosaicing network structure based on the optimization direction and the optimization value.

9. An image processing system based on sub-truth image multiple loss functions, characterized in that the image processing system includes:
a target image acquisition subsystem, used to acquire an original target image;
a prediction subsystem, used to input the original target image into a demosaicing network model and output a demosaiced target image;
a training subsystem, used to train the demosaicing network model;
wherein the training subsystem specifically includes:
an initialization module, used to obtain a training dataset, wherein the training dataset includes multiple groups of image data, and each group of image data includes an original image and a corresponding ground truth image;
a forward propagation module, used to input the original image from one group of image data into a demosaicing network structure and output a predicted image;
a first loss calculation module, used to calculate the loss value between the predicted image and the ground truth image in the current group of image data to obtain a first loss value;
a sub-truth image generation module, used to obtain multiple sub-ground truth images based on the predicted image and the ground truth image;
a second loss value calculation module, used to calculate the loss value between the predicted image and each sub-ground truth image to obtain multiple second loss values;
an optimization module, used to optimize the network parameters of the demosaicing network structure based on the first loss value and the multiple second loss values;
a traversal module, used to traverse all groups of image data in the training dataset, complete the training of the demosaicing network structure, and determine the demosaicing network model based on the network parameters of the trained demosaicing network structure.
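As an illustration of the low-pass filtering and high-frequency fusion steps recited in claims 4 and 6, the following sketch builds the third ground truth image from a predicted image and a ground truth image. It is a minimal sketch under stated assumptions rather than the claimed implementation: the low-pass filter is taken to be a box (mean) filter with edge padding, "fusing" is taken to be element-wise addition, the images are single-band 2-D arrays, and the names `box_filter`, `third_sub_ground_truth` and `radius` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_filter(img, radius=1):
    """Assumed low-pass filter: mean over a (2*radius+1)^2 window, edges replicated."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    windows = sliding_window_view(padded, (k, k))  # shape (H, W, k, k)
    return windows.mean(axis=(-2, -1))


def third_sub_ground_truth(pred, gt, radius=1):
    """Average predicted image plus the high-frequency part of the ground truth image."""
    mean_pred = box_filter(pred, radius)   # average predicted image
    mean_gt = box_filter(gt, radius)       # average ground truth image
    high_freq_gt = gt - mean_gt            # high-frequency information of the ground truth
    return mean_pred + high_freq_gt        # fusion assumed to be element-wise addition
```

Under these assumptions, when the predicted image already equals the ground truth image, the third ground truth image reduces to the ground truth image itself, so the corresponding second loss value is zero.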

Citation Information

Patent Citations

CN110334735A: Multi-task network generation method and device, computer equipment and storage medium
CN112465737A: Image processing model training method, image processing method and image processing device
