Image completion model training method, image completion method, device, and storage medium
By constructing a dual-module collaborative mechanism consisting of a value network and a policy network, the method optimizes the training process of a diffusion model, avoids the resource waste and instability caused by retraining the diffusion model, and achieves efficient image completion.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HARBIN INST OF TECH
- Filing Date
- 2026-02-28
- Publication Date
- 2026-06-12
AI Technical Summary
Existing diffusion models must be retrained to handle new image categories, which incurs high computational and time costs; training is also prone to gradient explosion or vanishing gradients, which degrades network performance.
An image completion model training method based on a diffusion model is adopted. A dual-module collaborative mechanism consisting of a value network and a policy network is constructed, denoising is carried out through complete iteration and partial iteration, and the network parameters are optimized, which avoids the problems caused by retraining and maintains gradient stability.
This reduces the time and hardware resource costs of training while maintaining gradient stability, avoids destroying prior information, and improves the performance of the image completion model.
Figure CN122199331A_ABST
Abstract
Description
Technical Field
[0001] This application relates to an image completion model training method, an image completion method, an electronic device, and a computer-readable storage medium, belonging to the field of computer image processing.
Background Technology
[0002] Currently, image completion technology can repair damaged or missing parts of an image, making the repaired image look natural and seamless. With the development of deep learning technology, especially with the widespread application of diffusion models, image completion technology has made significant progress.
[0003] In practical applications, it is common to encounter situations where the content to be generated belongs to an image category that the diffusion model did not learn during training. In such cases, a dataset of images from the new category must be added as additional input to the diffusion model for secondary training. However, retraining the diffusion model from scratch consumes excessive computational hardware resources and time because of its complex network structure and numerous training steps. Furthermore, retraining may disrupt the generative priors captured during pre-training, thus affecting network performance. In addition, the training process of the diffusion model is prone to gradient explosion or vanishing gradients, which exacerbates the instability of training.
Summary of the Invention
[0004] This application discloses an image completion model training method, an image completion method, an electronic device, and a computer-readable storage medium.
[0005] The image completion model in this application is based on a diffusion model, and the image completion model training method includes:
[0006] Based on the initial policy network parameters or the policy network parameters obtained through iteration, complete iterative denoising is performed according to pre-generated random Gaussian noise, and the value network loss of the image completion model during the complete iterative denoising process is determined in order to optimize the value network parameters of the image completion model. Based on the initial value network parameters or the value network parameters obtained through iteration, partial iterative denoising is performed according to the random Gaussian noise, and the policy network loss of the image completion model during the partial iterative denoising process is determined in order to optimize the policy network parameters of the image completion model. When the loss change rates of the value network parameters and the policy network parameters meet the preset conditions, the trained image completion model is obtained.
[0007] In some implementations, performing complete iterative denoising according to pre-generated random Gaussian noise, based on the initial policy network parameters or the policy network parameters obtained through iteration, and determining the value network loss of the image completion model during the complete iterative denoising process includes: based on the current policy network in the image completion model, determining an initial optimized noise according to the random Gaussian noise, wherein the current policy network has the initial policy network parameters or the policy network parameters obtained through iteration; based on the initial optimized noise, performing the complete iterative denoising to obtain the optimized noise at each iteration step and the corresponding iteration step information; based on the current value network in the image completion model, determining the quality score of the optimized noise according to the optimized noise and the corresponding iteration step information; and determining the value network loss based on the quality score of the optimized noise at each iteration step, so as to optimize the value network parameters of the image completion model.
[0008] In some implementations, the formula for the value network loss is:
[0009] where the terms of the formula are: the value network loss; the image generated by the image completion model; the loss of the generated image relative to the real image; the quality score of the optimized noise at the i-th iteration step; and t, the total number of iteration steps.
[0010] In some implementations, the formula for optimizing the value network parameters of the image completion model is:
[0011] where the terms of the formula are the value network parameters before optimization and the optimized value network parameters.
[0012] In some implementations, performing partial iterative denoising according to the random Gaussian noise, based on the initial value network parameters or the value network parameters obtained through iteration, and determining the policy network loss of the image completion model during the partial iterative denoising process includes: based on the current policy network in the image completion model, determining the initial optimized noise according to the random Gaussian noise; based on the initial optimized noise, performing the partial iterative denoising to obtain the optimized noise at each iteration step and the corresponding iteration step information, wherein the number of iteration steps in the partial iterative denoising process is less than that in the complete iterative denoising process; based on the current value network in the image completion model, determining the quality score of the optimized noise according to the optimized noise and the corresponding iteration step information; and determining the policy network loss based on the quality score of the optimized noise at each iteration step, so as to optimize the policy network parameters of the image completion model.
[0013] In some implementations, the formula for the policy network loss is:
[0014] where the terms of the formula are: the policy network loss; the upper limit of the number of iteration steps in the partial iterative denoising process; and the quality score of the optimized noise.
[0015] In some implementations, the formula for optimizing the policy network parameters of the image completion model is:
[0016] where the terms of the formula are the policy network parameters before optimization and the optimized policy network parameters.
[0017] The image completion method in this application is implemented based on an image completion model trained according to the image completion model training method in the above embodiments. The image completion method includes: determining the initial optimized noise based on pre-generated random Gaussian noise; based on the image completion model, performing complete iterative denoising according to the initial optimized noise, the image to be completed, and the mask information to determine the denoised image, wherein the image to be completed and the mask information are obtained from user input information; and, based on the denoised image, performing image processing on the missing region in the image to be completed to determine the completed target image.
[0018] The electronic device in this application includes a memory and a processor. The memory stores a computer program, and when the computer program is executed by the processor, it implements the image completion model training method or the image completion method in the above embodiments.
[0019] The computer-readable storage medium in this application embodiment stores a computer program that, when executed by one or more processors, implements the image completion model training method or the image completion method described in the above embodiments.
[0020] The beneficial effects of this application are as follows: by constructing a dual-module collaborative mechanism of "effect judgment-noise optimization", the image completion model training method in the embodiments of this application avoids destroying the generative prior information, as happens when a traditional diffusion model is retrained. At the same time, because training alternates between complete iteration and partial iteration across the two modules, the number of execution steps in the training process is greatly reduced, which lowers the training time cost and hardware resource cost and maintains gradient stability during training.
Attached Figure Description
[0021] Figure 1 is the first flowchart of the image completion model training method in the embodiments of this application; Figure 2 is the second flowchart of the image completion model training method in the embodiments of this application; Figure 3 is the third flowchart of the image completion model training method in the embodiments of this application; Figure 4 is a flowchart of the image completion method in the embodiments of this application.
Detailed Implementation
[0022] Referring to Figure 1, the image completion model in this application is based on a diffusion model, and the image completion model training method includes:
Step 01: Based on the initial policy network parameters or the policy network parameters obtained through iteration, perform complete iterative denoising according to the pre-generated random Gaussian noise, and determine the value network loss of the image completion model during the complete iterative denoising process, so as to optimize the value network parameters of the image completion model;
Step 02: Based on the initial value network parameters or the value network parameters obtained through iteration, perform partial iterative denoising according to the random Gaussian noise, and determine the policy network loss of the image completion model during the partial iterative denoising process, so as to optimize the policy network parameters of the image completion model;
Step 03: If the loss change rates of the value network parameters and the policy network parameters meet the preset conditions, the trained image completion model is obtained.
[0023] Specifically, the image completion model training method in this application trains an image completion model based on a diffusion model. The main training logic aims to enable the image completion model to generate high-quality noisy image information, thereby further utilizing this noisy image information to complete the missing regions in the image to be completed. In addition to the main model body, the aforementioned image completion model includes a value network and a policy network. The main model body is primarily responsible for iteratively denoising random noisy images. The value network is mainly used to score the quality of the noisy image obtained in each iteration of denoising. The policy network, guided by the score from the value network, is mainly used to avoid gradient explosion or gradient vanishing during the iterative denoising process.
[0024] The image completion model training method described above mainly includes two steps: first, training the value network to optimize its parameters, thereby improving the accuracy and stability of the value network's quality scoring of noisy images in each iteration step; second, training the policy network to optimize its parameters, thereby improving the protection of gradient stability during iterative denoising. These two training steps work together to continuously adjust the parameters of the value network and the policy network during multiple iterations.
[0025] To determine the iteration endpoint, after each iteration, a condition check needs to be performed on the value network parameters and policy network parameters. The check involves the rate of change of loss for both the value network parameters and the policy network parameters. If the rate of change of loss for both the value network parameters and the policy network parameters is less than or equal to the corresponding rate of change threshold, then the rate of change of loss for both the value network parameters and the policy network parameters can be considered to meet the preset conditions. At this point, the training process of the image completion model is complete, and the resulting image completion model is the one that meets the usage requirements.
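The stopping check described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the source does not define how the loss change rate is computed, so the relative change between two consecutive loss values is assumed here, and the function names and the `eps` guard are hypothetical.

```python
def loss_change_rate(previous_loss: float, current_loss: float, eps: float = 1e-12) -> float:
    """Relative change between two consecutive loss values (assumed definition)."""
    return abs(current_loss - previous_loss) / max(abs(previous_loss), eps)


def training_converged(value_losses, policy_losses, value_threshold, policy_threshold):
    """True when the latest change rate of both losses is within its threshold."""
    if len(value_losses) < 2 or len(policy_losses) < 2:
        return False  # at least two rounds are needed to measure a change
    value_rate = loss_change_rate(value_losses[-2], value_losses[-1])
    policy_rate = loss_change_rate(policy_losses[-2], policy_losses[-1])
    return value_rate <= value_threshold and policy_rate <= policy_threshold
```

Both conditions must hold at the same time, which mirrors the requirement that the change rates of the value network and the policy network each stay at or below their own threshold.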
[0026] Before training the value network and the policy network, the basic training environment must first be set up and initialized. First, the pre-trained diffusion model is obtained. Based on this diffusion model, an image completion model for a specific image category is then obtained; this model includes the image information to be completed, the mask information, and the real image used for reference, and the diffusion model serves as the main model body of the image completion model. Next, the parameters of the value network and the policy network of the model are initialized to obtain the initial value network parameters and the initial policy network parameters. In addition, according to the specific completion scene of the image to be completed (such as faces, natural scenes, or buildings), the training hyperparameters are set, such as the total number of iteration steps t in the complete iterative denoising process, the upper limit N of the number of iteration steps in the partial iterative denoising process, the learning rate and data batch size during training, and the aforementioned change rate thresholds.
[0027] For the specific training process of the value network, referring to Figure 2, in some implementations Step 01 includes:
Step 011: Based on the current policy network in the image completion model, determine the initial optimized noise according to the random Gaussian noise, wherein the current policy network has the initial policy network parameters or the policy network parameters obtained through iteration;
Step 012: Based on the initial optimized noise, perform complete iterative denoising to obtain the optimized noise at each iteration step and the corresponding iteration step information;
Step 013: Based on the current value network in the image completion model, determine the quality score of the optimized noise according to the optimized noise and the corresponding iteration step information;
Step 014: Determine the value network loss based on the quality score of the optimized noise at each iteration step, so as to optimize the value network parameters of the image completion model.
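One way to hold the hyperparameters listed above is a small validated configuration object. The sketch below is illustrative only: the class name, field names, and every numeric value are assumptions, since the source names the hyperparameters but gives no concrete values. The one constraint taken from the text is that partial iteration uses fewer steps than complete iteration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    total_steps: int              # t: steps in the complete iterative denoising process
    partial_steps: int            # N: upper limit of steps in partial iterative denoising
    learning_rate: float
    batch_size: int
    value_rate_threshold: float   # change rate threshold for the value network loss
    policy_rate_threshold: float  # change rate threshold for the policy network loss

    def __post_init__(self):
        if self.total_steps <= 0 or self.partial_steps <= 0:
            raise ValueError("step counts must be positive")
        if self.partial_steps >= self.total_steps:
            raise ValueError("partial iteration must use fewer steps than complete iteration")
        if self.learning_rate <= 0 or self.batch_size <= 0:
            raise ValueError("learning rate and batch size must be positive")


# Illustrative values only; the source does not state concrete numbers.
face_config = TrainingConfig(total_steps=50, partial_steps=10, learning_rate=1e-4,
                             batch_size=8, value_rate_threshold=1e-3,
                             policy_rate_threshold=1e-3)
```

A separate configuration per completion scene (faces, natural scenes, buildings) would follow the same pattern.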
[0028] Specifically, the training steps of the value network will be described below as an example.
[0029] In some examples, random Gaussian noise following a standard Gaussian distribution is first generated by the image completion model. The current policy network included in the image completion model then performs a preliminary optimization on the random Gaussian noise to obtain the initial optimized noise. As a special case, if the current policy network has the initial policy network parameters obtained during initialization, the initial optimized noise it outputs is identical to the random Gaussian noise. If the current policy network has policy network parameters obtained through iterative training and adjustment, the initial optimized noise is obtained according to Formula 1.
[0030] ... Formula 1
Next, the main model body of the image completion model performs complete iterative denoising on the initial optimized noise along the reverse, step-by-step denoising direction, from the starting iteration step to the final iteration step, where t denotes the total number of iteration steps in the complete iterative denoising process. The iterative process is shown in Formula 2.
[0031] ... Formula 2
In this way, after denoising at any iteration step m, the optimized noise at that iteration step and the corresponding iteration step information m are obtained. Therefore, at each iteration step, the current value network included in the image completion model can be called, and the optimized noise at the current iteration step together with the corresponding iteration step information m is input into the current value network for quality scoring, as shown in Formula 3.
[0032] ... Formula 3
where the output is the quality score of the optimized noise at iteration step m, which is a function of the value network parameters.
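The special case above, where an untrained policy network returns the random Gaussian noise unchanged, can be illustrated with a toy residual mapping. Formula 1 is not reproduced in this text, so the residual form below, along with the names `sample_gaussian_noise`, `policy_optimize`, `scale`, and `shift`, is a hypothetical stand-in rather than the application's policy network.

```python
import random


def sample_gaussian_noise(size, seed=None):
    """Draw `size` samples from a standard Gaussian distribution."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def policy_optimize(noise, scale=0.0, shift=0.0):
    """Hypothetical residual policy: z' = z + scale * z + shift.

    With scale = shift = 0 (standing in for the initial parameters),
    the noise is returned unchanged.
    """
    return [z + scale * z + shift for z in noise]


noise = sample_gaussian_noise(4, seed=0)
initial_optimized = policy_optimize(noise)            # identical to `noise`
trained_optimized = policy_optimize(noise, scale=0.1)  # differs once parameters move
```

Starting from an identity mapping means the first complete denoising pass runs on exactly the noise the pre-trained diffusion model expects.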
[0033] Therefore, after the above quality scoring process, t groups of optimized noise, t pieces of iteration step information, and t quality scores are obtained. After the complete iterative denoising process is finished, the generated image is also obtained. From the generated image and the real image used for reference in the model, the loss value of the generated image relative to the real image can be calculated.
[0034] Furthermore, based on all the quality scores obtained during the complete iterative denoising process and the loss value of the final generated image relative to the real image, the overall value network loss can be calculated, thereby quantifying the deviation between the value network's quality scores and the true quality, as shown in Formula 4.
[0035] ... Formula 4
where the terms of the formula are: the value network loss; the image generated by the image completion model; the loss of the generated image relative to the real image; the quality score of the optimized noise at the i-th iteration step; and t, the total number of iteration steps in the complete iterative denoising process.
[0036] Furthermore, by performing gradient optimization based on the aforementioned value network loss, the value network parameters can be optimized, as shown in Formula 5.
[0037] ... Formula 5
where the terms of the formula are the value network parameters before optimization and the optimized value network parameters. During the gradient optimization shown in Formula 5, a term that is constant with respect to the value network parameters does not participate in the gradient calculation.
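The bookkeeping of the complete iteration, one optimized noise, one step index, and one quality score per step, can be sketched as a simple loop. Formulas 2 and 3 are not reproduced in this text, so `toy_denoise_step` and `toy_value_network` are placeholder functions invented for illustration, and the step range from t down to 1 is an assumption.

```python
def run_denoising(initial_noise, start_step, end_step, denoise_step, value_network):
    """Denoise from `start_step` down to `end_step` (exclusive), scoring each result.

    Returns the final sample and a list of (optimized_noise, step, score) records.
    """
    sample = list(initial_noise)
    trajectory = []
    for m in range(start_step, end_step, -1):
        sample = denoise_step(sample, m)   # stand-in for Formula 2
        score = value_network(sample, m)   # stand-in for Formula 3
        trajectory.append((list(sample), m, score))
    return sample, trajectory


# Toy stand-ins so the sketch runs; neither is the application's model.
def toy_denoise_step(sample, m):
    return [x * 0.9 for x in sample]


def toy_value_network(sample, m):
    return -sum(x * x for x in sample) / len(sample)


total_steps = 5  # plays the role of t
generated, records = run_denoising([1.0, -2.0, 0.5], total_steps, 0,
                                   toy_denoise_step, toy_value_network)
```

The same loop with a later `end_step` would cover the partial iteration, which stops after fewer steps.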
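A value network update of this kind can be sketched with a toy example. Formulas 4 and 5 are not reproduced in this text, so everything concrete below is an assumption: the loss is taken to be the mean squared deviation of the per-step quality scores from the image loss, the image loss is assumed to be the constant target through which no gradient flows, the update is plain gradient descent, and the value network is a one-parameter linear model.

```python
def value_loss(scores, image_loss):
    """Assumed form: mean squared deviation of per-step scores from the image loss."""
    return sum((s - image_loss) ** 2 for s in scores) / len(scores)


def value_gradient_step(weight, features, image_loss, learning_rate):
    """One gradient-descent update for a toy linear value network s_i = weight * f_i.

    `image_loss` is treated as a constant target: no gradient flows through it.
    """
    n = len(features)
    grad = sum(2.0 * (weight * f - image_loss) * f for f in features) / n
    return weight - learning_rate * grad


features = [1.0, 0.8, 0.6, 0.4]  # hypothetical per-step inputs to the value network
image_loss = 0.5                 # hypothetical loss of the generated image vs. the real image
w_before = 0.0
w_after = value_gradient_step(w_before, features, image_loss, learning_rate=0.1)
loss_before = value_loss([w_before * f for f in features], image_loss)
loss_after = value_loss([w_after * f for f in features], image_loss)
```

In this toy setting a single step lowers the loss, which is the behavior the gradient update is meant to produce; a real value network would rely on automatic differentiation instead of a hand-written gradient.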
[0038] After the above steps, the parameters of the value network can be optimized using the complete iterative denoising process for random Gaussian noise, thereby realizing the training process for the value network.
[0039] For the specific training process of the policy network, referring to Figure 3, in some implementations Step 02 includes:
Step 021: Based on the current policy network in the image completion model, determine the initial optimized noise according to the random Gaussian noise;
Step 022: Based on the initial optimized noise, perform partial iterative denoising to obtain the optimized noise at each iteration step and the corresponding iteration step information, wherein the number of iteration steps in the partial iterative denoising process is less than that in the complete iterative denoising process;
Step 023: Based on the current value network in the image completion model, determine the quality score of the optimized noise according to the optimized noise and the corresponding iteration step information;
Step 024: Determine the policy network loss based on the quality score of the optimized noise at each iteration step, so as to optimize the policy network parameters of the image completion model.
[0040] Specifically, similar to training the value network, the training steps for the policy network will be described below as an example.
[0041] In some examples, random Gaussian noise following a standard Gaussian distribution is first generated by the image completion model. The current policy network included in the image completion model then performs a preliminary optimization on the random Gaussian noise to obtain the initial optimized noise. Specifically, the initial optimized noise is obtained according to Formula 1 in the example above.
[0042] Next, the main model body of the image completion model performs partial iterative denoising on the initial optimized noise along the reverse, step-by-step denoising direction, where t denotes the total number of iteration steps in the complete iterative denoising process and N denotes the upper limit of the number of iteration steps in the partial iterative denoising process. The iterative process is shown in Formula 2 in the example above.
[0043] In this way, after denoising at any iteration step m, the optimized noise at that iteration step and the corresponding iteration step information m are obtained. Therefore, at each iteration step, the current value network included in the image completion model can be called, and the optimized noise at the current iteration step together with the corresponding iteration step information m is input into the current value network for quality scoring, as shown in Formula 3 in the example above.
[0044] Therefore, after the above quality scoring process, N groups of optimized noise, N pieces of iteration step information, and N quality scores are obtained. Furthermore, based on all the quality scores obtained during the partial iterative denoising process, the overall policy network loss can be calculated, thereby quantifying the deviation between the quality scores achieved by the policy network and the true quality, as shown in Formula 6.
[0045] ... Formula 6
where the terms of the formula are: the policy network loss; N, the upper limit of the number of iteration steps in the partial iterative denoising process; and the quality score of the optimized noise, which is a function of the policy network parameters.
[0046] Furthermore, by performing gradient optimization based on the aforementioned policy network loss, the policy network parameters can be optimized, as shown in Formula 7.
[0047] ... Formula 7
where the terms of the formula are the policy network parameters before optimization and the optimized policy network parameters.
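The policy network update can be sketched in the same toy style. Formulas 6 and 7 are not reproduced in this text, so the following are assumptions made purely for illustration: the policy loss is the negative mean of the N quality scores, the update is gradient descent with a central-difference gradient, and the policy network, denoising step, and value network are one-line stand-ins. The value network is held fixed while the policy parameter moves.

```python
def partial_policy_loss(shift, noise, partial_steps):
    """Assumed policy loss: negative mean quality score over N partial denoising steps."""
    sample = [z + shift for z in noise]                            # toy policy network
    scores = []
    for _ in range(partial_steps):
        sample = [0.9 * x for x in sample]                         # toy denoising step
        scores.append(-sum(x * x for x in sample) / len(sample))   # toy value network
    return -sum(scores) / len(scores)


def policy_gradient_step(shift, noise, partial_steps, learning_rate, h=1e-5):
    """Gradient descent on the toy policy parameter using a central difference."""
    grad = (partial_policy_loss(shift + h, noise, partial_steps)
            - partial_policy_loss(shift - h, noise, partial_steps)) / (2 * h)
    return shift - learning_rate * grad


noise = [1.0, 2.0, 3.0]
shift_before = 0.0
shift_after = policy_gradient_step(shift_before, noise, partial_steps=3, learning_rate=0.1)
policy_loss_before = partial_policy_loss(shift_before, noise, 3)
policy_loss_after = partial_policy_loss(shift_after, noise, 3)
```

Because only N steps are unrolled, each policy update costs a fraction of a complete denoising pass, which is the source of the training-cost reduction the description claims.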
[0048] After the above steps, the parameters of the policy network can be optimized by using the partial iterative denoising process for random Gaussian noise, thereby realizing the training process for the policy network.
[0049] Therefore, the above optimization training steps for value network parameters and policy network parameters are executed iteratively and collaboratively. The value network parameters are optimized using the new policy network parameters, and then the policy network parameters are optimized using the new value network parameters. This iterative training is continuously performed to achieve collaborative updates of the value network parameters and policy network parameters. When the rate of change of the loss of both the value network parameters and the policy network parameters is less than or equal to the corresponding rate of change threshold, it can be considered that the rate of change of the loss of the value network parameters and the policy network parameters meets the preset conditions. At this point, the training process of the image completion model is completed, and the resulting image completion model is the image completion model that meets the usage requirements.
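The alternating schedule can be sketched as a loop that updates the value network, then the policy network, and stops once both loss change rates fall within their thresholds. The callables `update_value` and `update_policy`, the relative change-rate definition, and the `max_rounds` safety cap are assumptions added for illustration; the decaying toy losses exist only to exercise the loop.

```python
def train_alternating(update_value, update_policy, value_threshold, policy_threshold,
                      max_rounds=1000):
    """Alternate value and policy updates until both loss change rates are small.

    `update_value` and `update_policy` are caller-supplied callables that run one
    update and return the resulting loss. Returns the number of rounds executed.
    """
    prev_value = prev_policy = None
    for round_index in range(1, max_rounds + 1):
        value_loss = update_value()    # complete iteration, current policy parameters
        policy_loss = update_policy()  # partial iteration, updated value parameters
        if prev_value is not None:
            value_rate = abs(value_loss - prev_value) / max(abs(prev_value), 1e-12)
            policy_rate = abs(policy_loss - prev_policy) / max(abs(prev_policy), 1e-12)
            if value_rate <= value_threshold and policy_rate <= policy_threshold:
                return round_index
        prev_value, prev_policy = value_loss, policy_loss
    return max_rounds


def make_decaying_loss(start, floor, decay):
    """Toy loss that decays geometrically toward a floor on every call."""
    state = {"loss": start}

    def step():
        state["loss"] = floor + (state["loss"] - floor) * decay
        return state["loss"]

    return step


rounds = train_alternating(make_decaying_loss(1.0, 0.1, 0.5),
                           make_decaying_loss(2.0, 0.2, 0.5),
                           value_threshold=0.01, policy_threshold=0.01)
```

The order inside the loop matters: each value update sees the latest policy parameters, and each policy update sees the value parameters that were just refreshed.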
[0050] Referring to Figure 4, the image completion method in this application is implemented based on the image completion model trained by the image completion model training method in the above embodiments. The method specifically includes:
Step 001: Determine the initial optimized noise based on the pre-generated random Gaussian noise;
Step 002: Based on the image completion model, perform complete iterative denoising according to the initial optimized noise, the image to be completed, and the mask information to determine the denoised image, wherein the image to be completed and the mask information are obtained from user input information;
Step 003: Based on the denoised image, perform image processing on the missing region in the image to be completed, so as to determine the completed target image.
[0051] Specifically, once the image completion model has been trained according to the above implementations, an image completion model that can be applied directly to image completion is obtained. This model includes the image to be completed and the mask information input in advance by the user, and its value network parameters and policy network parameters have been optimized. The specific process of applying the model to perform image completion can therefore be implemented as in the following example.
[0052] First, the trained policy network in the image completion model is invoked to optimize randomly generated Gaussian noise and output the initial optimized noise. Then, the initial optimized noise, together with the image to be completed and the mask information, forms the input of the main model body in the image completion model, and the main model body performs the complete t-step iterative denoising process step by step. After the iterative denoising is finished, the denoised image corresponding to the missing region in the image to be completed is obtained from the initial optimized noise. Finally, the denoised image is used to fill in the image to be completed, forming the completed target image and finishing the image completion process.
[0053] By constructing a dual-module collaborative mechanism of "effect judgment-noise optimization", the image completion model training method in this application avoids destroying the generative prior information, as happens when a traditional diffusion model is retrained. At the same time, because training alternates between complete iteration and partial iteration across the two modules, the number of execution steps in the training process is greatly reduced, which lowers the training time cost and hardware resource cost and maintains gradient stability during training.
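The final filling step can be sketched as mask-based compositing. The source does not specify the image processing used, so the rule below (take generated pixels where the mask marks a missing pixel, keep the original pixels elsewhere), the mask convention of 1 for missing, and the function name are assumptions.

```python
def composite_completion(image, generated, mask):
    """Fill masked pixels from `generated` and keep known pixels from `image`.

    All three arguments are equally sized 2-D lists; a mask value of 1 marks a
    missing pixel (assumed convention).
    """
    return [
        [g if m == 1 else p for p, g, m in zip(img_row, gen_row, mask_row)]
        for img_row, gen_row, mask_row in zip(image, generated, mask)
    ]


image = [[10, 20], [30, 0]]       # the 0 at the bottom right stands for the missing pixel
generated = [[11, 19], [29, 40]]  # hypothetical denoised output
mask = [[0, 0], [0, 1]]
completed = composite_completion(image, generated, mask)
```

Only the masked position changes, so pixels the user already supplied are preserved exactly.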
[0054] The electronic device in this application includes a memory and a processor. The memory stores a computer program, and when the computer program is executed by the processor, it implements the image completion model training method or the image completion method in the above embodiments.
[0055] The computer-readable storage medium in this application embodiment stores a computer program that, when executed by one or more processors, implements the image completion model training method or the image completion method described in the above embodiments.
[0056] The above description is merely a preferred embodiment of this application and is not intended to limit this application in any way. Although this application has disclosed the preferred embodiment as above, it is not intended to limit this application. Any person skilled in the art can make some modifications or alterations to the above-disclosed technical content to create equivalent embodiments without departing from the scope of the technical solution of this application. Any simple modifications, equivalent substitutions, and improvements made to the above embodiments without departing from the technical solution of this application, based on the technical essence of this application and within the spirit and principles of this application, shall still fall within the protection scope of the technical solution of this application.
Claims
1. A method for training an image completion model, characterized in that the image completion model is built based on a diffusion model, and the method includes: based on the initial policy network parameters or the policy network parameters obtained through iteration, performing complete iterative denoising according to pre-generated random Gaussian noise, and determining the value network loss of the image completion model during the complete iterative denoising process in order to optimize the value network parameters of the image completion model; based on the initial value network parameters or the value network parameters obtained through iteration, performing partial iterative denoising according to the random Gaussian noise, and determining the policy network loss of the image completion model during the partial iterative denoising process in order to optimize the policy network parameters of the image completion model; and, when the loss change rates of the value network parameters and the policy network parameters meet the preset conditions, obtaining the trained image completion model.
2. The method according to claim 1, characterized in that performing complete iterative denoising according to pre-generated random Gaussian noise, based on the initial policy network parameters or the policy network parameters obtained through iteration, and determining the value network loss of the image completion model during the complete iterative denoising process includes: based on the current policy network in the image completion model, determining an initial optimized noise according to the random Gaussian noise, wherein the current policy network has the initial policy network parameters or the policy network parameters obtained through iteration; based on the initial optimized noise, performing the complete iterative denoising to obtain the optimized noise at each iteration step and the corresponding iteration step information; based on the current value network in the image completion model, determining the quality score of the optimized noise according to the optimized noise and the corresponding iteration step information; and determining the value network loss based on the quality score of the optimized noise at each iteration step, so as to optimize the value network parameters of the image completion model.
3. The method according to claim 2, characterized in that the formula for the value network loss is: where the terms of the formula are: the value network loss; the image generated by the image completion model; the loss of the generated image relative to the real image; the quality score of the optimized noise at the i-th iteration step; and t, the total number of iteration steps.
4. The method according to claim 2, characterized in that the formula for optimizing the value network parameters of the image completion model is: where the terms of the formula are the value network parameters before optimization and the optimized value network parameters.
5. The method according to claim 2, characterized in that performing partial iterative denoising according to the random Gaussian noise, based on the initial value network parameters or the value network parameters obtained through iteration, and determining the policy network loss of the image completion model during the partial iterative denoising process includes: based on the current policy network in the image completion model, determining the initial optimized noise according to the random Gaussian noise; based on the initial optimized noise, performing the partial iterative denoising to obtain the optimized noise at each iteration step and the corresponding iteration step information, wherein the number of iteration steps in the partial iterative denoising process is less than that in the complete iterative denoising process; based on the current value network in the image completion model, determining the quality score of the optimized noise according to the optimized noise and the corresponding iteration step information; and determining the policy network loss based on the quality score of the optimized noise at each iteration step, so as to optimize the policy network parameters of the image completion model.
6. The method according to claim 5, characterized in that the formula for the policy network loss is: where the terms of the formula are: the policy network loss; the upper limit of the number of iteration steps in the partial iterative denoising process; and the quality score of the optimized noise.
7. The method according to claim 5, characterized in that the formula for optimizing the policy network parameters of the image completion model is: where the terms of the formula are the policy network parameters before optimization and the optimized policy network parameters.
8. An image completion method, characterized in that the image completion method is implemented based on an image completion model trained according to the image completion model training method according to any one of claims 1-7, and the image completion method includes: determining the initial optimized noise based on pre-generated random Gaussian noise; based on the image completion model, performing complete iterative denoising according to the initial optimized noise, the image to be completed, and the mask information to determine the denoised image, wherein the image to be completed and the mask information are obtained from user input information; and, based on the denoised image, performing image processing on the missing region in the image to be completed, so as to determine the completed target image.
9. An electronic device, characterized in that the electronic device includes a memory and a processor, wherein the memory stores a computer program that, when executed by the processor, implements the image completion model training method as described in any one of claims 1-7 or the image completion method as described in claim 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by one or more processors, implements the image completion model training method as described in any one of claims 1-7 or the image completion method as described in claim 8.