A method for constructing a training data set for spatial noise reduction of video frames

By collecting and simulating the noise characteristics of video frames, a noise estimation model is established, which solves the problem of insufficient simulation of non-uniform noise in video frames in existing technologies. It realizes the construction of an efficient training dataset for spatial denoising of video frames, and improves the denoising effect and training efficiency of the model.

CN122243782A · Pending · Publication Date: 2026-06-19 · HEFEI JUNZHENG TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: HEFEI JUNZHENG TECH CO LTD
Filing Date: 2024-12-17
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing methods for constructing image denoising datasets are designed for single-image denoising and cannot accurately simulate the non-uniform noise that remains in video frames after temporal denoising. In particular, the noise intensity differs between static areas, trailing areas, and moving areas, so datasets built this way cannot effectively train a spatial denoising model to remove non-uniform noise.

Method used

By acquiring flat-field frames, black frames, and noise-free RAW images, a noise estimation model is established that divides noise into signal-dependent noise and signal-independent noise. Poisson, Gaussian, Tukey-lambda, and uniform distributions are used to simulate the noise components. Combined with an average frame number and brightness adjustment, paired non-uniform-noise image data is generated.

Benefits of technology

It improves the simulation accuracy of non-uniform noise in video frames after temporal denoising, shortens the data acquisition cycle, and is suitable for training efficient spatial denoising models.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention provides a method for constructing a training dataset for spatial domain noise reduction of video frames, comprising: S1. acquiring flat-field frames, black frames, and noise-free RAW images using a target camera; S2. establishing a noise estimation model for the target camera; S3. calibrating the unknown parameters of the noise estimation model using the flat-field frames and black frames; S4. performing intensity matching on the signal-independent noise calculated by the noise estimation model based on the acquired black frames containing real noise, and estimating the average frame number *m* of the simulated noise in each region from the real noise intensity in the static, trailing, and moving regions, thereby simulating the noise intensity of the different regions more accurately; S5. adjusting the brightness of the noise-free RAW image to simulate different illumination environments, generating a simulated non-uniform noise image based on the noise-free RAW image and the noise estimation model, and outputting paired data; S6. generating a simulated non-uniform noise image based on the noise-free RAW image and the noise estimation model, and outputting the final paired data. This solves the problem that non-uniform noise in video frames after temporal domain noise reduction could not be simulated.

Description

Technical Field

[0001] This invention belongs to the field of video image denoising technology, and specifically relates to a method for constructing a training dataset for video frame spatial domain denoising.

Background Technology

[0002] With the development of deep learning technology, in the fields of video and image denoising, neural network-based temporal and spatial denoising methods generally outperform traditional methods in denoising performance and have been successfully applied to scenarios with high real-time requirements, such as security monitoring, remote video conferencing, and real-time video calls. Deep learning-based denoising methods are data-driven; the denoising effect mainly depends on the size of the denoising model and the quality of the training dataset. Efficiently and accurately constructing the training dataset is crucial for improving the model's denoising performance and shortening the training cycle.

[0003] Existing methods for constructing image denoising datasets mainly include collecting real-world paired data, building a noise estimation model, and generating noise via a network. Collecting real-world paired data provides high data fidelity, directly reflecting real-world noise conditions and aiding in training high-performance denoising models; however, it is costly, requiring significant time and resources for data collection and labeling. Building a noise estimation model can simulate and generate various types of noise, quickly generating large amounts of data, suitable for large-scale training. However, insufficient accuracy of the noise estimation model can lead to deviations between the generated noise data and the actual situation. Generating noise via a network is similar to building a noise estimation model; the fidelity of the generated noise depends on the accuracy of the model.

[0004] However, the shortcomings of existing technology are:

[0005] Existing methods for constructing image denoising datasets are only applicable to single-image denoising. In video frames that contain moving targets, temporal denoising produces static areas, trailing areas, and moving areas, and the noise intensity differs among the three. With existing dataset construction methods it is difficult to obtain noisy images with non-uniform noise together with the corresponding noise-free images.

[0006] Furthermore, the terminology used in this art includes:

[0007] RAW image: refers to the raw image data output by the image sensor without any processing or decoding, existing in the form of uncompressed, raw pixel information.

[0008] Black frame: RAW image captured by the camera in a dark environment.

[0009] Flat-field frame: a RAW image obtained by shooting a uniform object, such as white paper or a white wall, under uniform lighting.

ISO: in photography, a standardized measure of camera sensitivity named after the International Organization for Standardization, often referred to as sensitivity or ISO speed. In digital cameras, the ISO setting controls the sensitivity of the camera sensor to light, thus affecting the brightness and noise level of the image.

[0010] Tukey-lambda distribution: a continuous probability distribution proposed by John Tukey in 1960 and defined by its quantile function. It is a family of distributions that can approximate many common distributions.

Summary of the Invention

[0011] To address the aforementioned issues, the purpose of this application is to provide a method for constructing a training dataset for spatial domain denoising of video frames. This method aims to solve the problem that existing dataset construction methods cannot accurately simulate the non-uniform noise in video frames after temporal domain denoising and are not suitable for training spatial domain denoising models that eliminate non-uniform noise.

[0012] Specifically, the present invention provides a method for constructing a training dataset for spatial domain denoising of video frames, the method comprising the following steps:

[0013] S1. Use the target camera to acquire flat-field frames, black frames, and noise-free RAW images;

[0014] S2. Establish a noise estimation model for the target camera. The model is based on the physical imaging process of the image sensor and divides the total noise into two categories: signal-dependent noise and signal-independent noise. The signal-dependent noise is photon shot noise, simulated using a Poisson distribution; the signal-independent noise includes stripe noise, readout noise, and quantization noise, simulated using Gaussian, Tukey-lambda, and uniform distributions, respectively. To simulate the noise level after multiple noisy frames are averaged in temporal denoising, an average frame number m is introduced into the noise estimation model;

[0015] S3. Calibrate the unknown parameters of the noise estimation model using the flat-field frames and black frames. The unknown parameters include the system gain of the photon shot noise, the variance of the stripe noise, and the shape parameter and variance of the readout noise;

[0016] S4. Based on the collected black frames containing real noise, perform intensity matching on the signal-independent noise calculated by the noise estimation model, i.e. make the noise level of the signal-independent noise generated by the model consistent with that of the real noise. Based on the real noise intensity of the static area, the trailing area, and the moving area, estimate the average frame number m of the simulated noise in the corresponding area, so as to simulate the noise intensity of the different areas more accurately;

[0017] S5. Adjust the brightness of the noise-free RAW image to simulate different illumination environments; generate a simulated noise non-uniformity image based on the noise-free RAW image and the noise estimation model, and output paired data;

[0018] S6. Generate a simulated noise non-uniformity image based on the noise-free RAW image and the noise estimation model, and output the final paired data.

[0019] Step S1 further includes:

[0020] S1.1 Select multiple ISOs within the target camera's ISO range, and capture flat-field frames and black frames at each ISO. For flat-field frames, the camera needs to be facing a uniformly lit white paper, and the exposure parameters are adjusted at each ISO, capturing two RAW images at each exposure parameter. For black frames, the camera needs to be placed in a dark environment, and multiple RAW images are captured at each ISO.

[0021] S1.2, Use the target camera to acquire noise-free RAW images under normal daylight conditions. To ensure the diversity of the dataset, it is necessary to cover different scenes such as indoor and outdoor, distant and close-up, and overexposed areas. Dark areas in RAW images acquired under normal lighting conditions may contain a small amount of noise, which can be removed using a low-pass filter.

[0022] The ISO values selected in step S1.1 vary from camera to camera. For example, if the target camera's maximum ISO is ISO_MAX = 32000, the selected ISO values are 1000, 2000, 4000, 8000, 16000, and 32000, i.e. $1000 \times 2^{n}$ with $n = 0, 1, \ldots$ The number of RAW images captured at each ISO for black frames is preferably 30 to 60.
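As a small illustration of the ISO selection rule above, the following sketch lists the values $1000 \times 2^{n}$ up to the camera's maximum ISO. The function name `iso_ladder` and the `iso_base` parameter are illustrative and not part of the method.

```python
def iso_ladder(iso_max: int, iso_base: int = 1000) -> list[int]:
    """Return iso_base * 2**n for n = 0, 1, ... while the value stays <= iso_max."""
    if iso_base <= 0 or iso_max < iso_base:
        raise ValueError("expected 0 < iso_base <= iso_max")
    values = []
    iso = iso_base
    while iso <= iso_max:
        values.append(iso)
        iso *= 2
    return values


print(iso_ladder(32000))  # [1000, 2000, 4000, 8000, 16000, 32000]
```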

[0023] In step S1.2, the type of low-pass filter is not limited, including mean filtering, Gaussian filtering, bilateral filtering and guided filtering.

[0024] Step S2 further includes:

[0025] The signal-dependent noise is the photon shot noise $N_{shot}$, simulated using a Poisson distribution;

[0026] The signal-independent noise includes the stripe noise $N_{band}$, the readout noise $N_{read}$, and the quantization noise $N_{quant}$, simulated using Gaussian, Tukey-lambda, and uniform distributions, respectively.

[0027] The final noise estimation model is shown in equations (1-5):

[0028] $N = N_{p} + N_{band} + N_{read} + N_{quant}$ (1)

[0029]

[0030] Equation (2) states that the photon shot noise term $N_{p}$ is generated from a Poisson distribution with the given parameters, where $I$ is the noise-free image and $K$ is the system gain of the target camera; equation (3) states that the stripe noise $N_{band}$ follows a Gaussian distribution with mean 0 and variance $\sigma_{b}$; equation (4) states that the readout noise $N_{read}$ follows a Tukey-lambda distribution with shape parameter $\lambda$, mean 0, and variance $\sigma_{TL}$; equation (5) states that the quantization noise $N_{quant}$ follows a uniform distribution within the range given there.

[0031] Step S3 further includes:

[0032] S3.1, calibrate the system gain corresponding to each ISO based on the flat-field frames; for exposure parameter j at the i-th ISO, compute the average and the difference of the two acquired RAW images $X_{ij1}$ and $X_{ij2}$, then calculate the median $X_{ij}$ of the averaged result and the variance $Y_{ij}$ of the difference result;

[0033]

[0034] In equations (6) and (7), median and var represent the operations of finding the median and the variance, respectively;

[0035] After all exposure parameters for the i-th ISO have been processed, the corresponding sequences $X_{i}$ and $Y_{i}$ are obtained. A least squares fit of $Y_{i}$ against $X_{i}$ gives the slope $K_{i}$, from which the system gain corresponding to the i-th ISO is obtained.

[0036] The steps for performing the least squares fit on $X_{i}$ and $Y_{i}$ are as follows:

[0037] 1) Define the linear model $Y_{i} = K_{i} X_{i} + b_{i}$;

[0038] 2) Calculate the slope $K_{i}$ and intercept $b_{i}$ according to formulas (8) and (9);

[0039]

[0040] Perform the above calculation process for each pre-selected ISO to obtain the system gain corresponding to all ISOs. Assuming there are n ISOs in total, the final system gain sequence of length n is obtained as K(ISO) = [K(1), K(2), ... K(n)];

[0041] S3.2, calibrate the parameters of the readout noise and stripe noise based on the black frames, namely the shape parameter $\lambda$ and variance $\sigma_{TL}$ of the readout noise and the variance $\sigma_{b}$ of the stripe noise. For the Bayer-format black frame acquired at the i-th ISO, subtract the mean of each channel, then estimate the variance $\sigma_{b}(i)$ of the stripe noise $N_{band}$ from the mean values of all rows of the image; then subtract the mean of each row from the image to eliminate the influence of the stripe noise, and estimate the shape parameter $\lambda(i)$ and variance $\sigma_{TL}(i)$ of the readout noise $N_{read}$;

[0042] S3.3, to transform the discrete ISO sampling space into a continuous space, based on the $K(\mathrm{ISO})$, $\sigma_{b}(\mathrm{ISO})$, and $\sigma_{TL}(\mathrm{ISO})$ estimated in S3.1 and S3.2, the linear relationship between the system gain $K$, the variance $\sigma_{b}$ of the stripe noise $N_{band}$, and the variance $\sigma_{TL}$ of the readout noise $N_{read}$ is established by least squares fitting:

[0043]

[0044] When sampling the parameters of the noise estimation model, first randomly sample a system gain $K$ from the uniform distribution $U(K(1), K(n))$, then calculate the corresponding $\sigma_{b}$ and $\sigma_{TL}$ according to equations (10) and (13), thereby sampling the three noise parameters $K$, $\sigma_{b}$, and $\sigma_{TL}$ in a continuous space.

[0045] Step S4 further includes:

[0046] S4.1, calculate the variance of the real signal-independent noise and the variance of the simulated signal-independent noise;

[0047] Based on the real average frame numbers N0, N1, and N2 in the static area, the trailing area, and the moving area, the black frames are stacked and averaged, and the variances V0, V1, and V2 of the averaged results are calculated.

[0048] Starting from the initial values M0, M1, and M2 of the estimated average frame number m in the static area, the trailing area, and the moving area, M0, M1, and M2 and the noise parameters corresponding to the ISO of the black frames are input into the noise model to generate signal-independent noise, and the variances V0', V1', and V2' of that noise are calculated.

[0049] S4.2, calculate the absolute differences |V0'-V0|, |V1'-V1|, and |V2'-V2| between the variance of the real signal-independent noise and the variance of the simulated signal-independent noise, and determine whether each absolute difference is less than a preset threshold ε. The preset threshold ε ranges from 1 to 5, with a preferred value of 2. If any absolute difference is not less than ε, adjust the current M0, M1, and M2 and recalculate |V0'-V0|, |V1'-V1|, and |V2'-V2| until all absolute differences are less than ε.

[0050] In step S5, the noise-free RAW images are acquired under normal daylight conditions, which keeps their noise low or negligible, but such images are not directly suitable for low-light denoising. To simulate image brightness in low-light environments, the following processing is required:

[0051] S5.1, perform automatic white balance on the noise-free image and treat pixels whose brightness exceeds a preset brightness threshold L0 as overexposed pixels. After this processing, the noise-free image is divided into overexposed and non-overexposed areas;

S5.2, based on the average brightness $L_{target}$ of the target low-light image and the average brightness $L_{ori}$ of the noise-free image, calculate the brightness adjustment factor. For the non-overexposed region $I_{0}$ of the noise-free image, reduce the brightness by this factor and leave the overexposed areas unadjusted, yielding a noise-free low-light image with reduced brightness.
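A minimal sketch of steps S5.1 and S5.2 follows. It assumes that the brightness adjustment factor is the ratio of the target average brightness to the original average brightness, that the average is taken over the whole image, and that white balance and black-level subtraction have already been applied. None of these details is fixed by the text above, and the function and parameter names are illustrative.

```python
import numpy as np


def darken_clean_image(clean, l_target, l0):
    """Scale non-overexposed pixels so the image brightness moves toward l_target.

    Assumption: factor = l_target / l_ori, with l_ori the mean of the whole
    image; the exact formula is not spelled out in the text.
    """
    img = np.asarray(clean, dtype=np.float64)
    overexposed = img > l0                 # S5.1: pixels above the threshold L0
    l_ori = float(img.mean())
    factor = l_target / l_ori              # assumed form of the brightness factor
    # S5.2: darken only the non-overexposed pixels.
    dark = np.where(overexposed, img, img * factor)
    return dark, overexposed
```

Keeping the overexposed pixels unchanged preserves saturated highlights such as lamps, which stay bright in a real low-light scene.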

[0052] Step S6 further includes:

[0053] S6.1, noise-free RAW image preprocessing: determine the approximate image brightness range based on the application environment of the target camera, and adjust the brightness of the noise-free RAW image according to this range; randomly select two adjacent non-overlapping rectangular areas in the noise-free RAW image as the moving area and the trailing area, and treat the remaining areas as the static area;

[0054] S6.2, noise estimation model parameter sampling: randomly sample a system gain $K$ from the uniform distribution $U(K(1), K(n))$, then calculate the corresponding $\sigma_{b}$ and $\sigma_{TL}$ according to equations (10) and (13);

[0055] S6.3, apply random numerical perturbations to the estimated average frame numbers M0, M1, and M2 to expand the combination space of non-uniform noise;

[0056] S6.4, input the sampled noise parameters $K$, $\sigma_{b}$, and $\sigma_{TL}$ and the average frame numbers M0, M1, and M2 into the noise estimation model to generate simulated noise for the static area, the trailing area, and the moving area, respectively;

[0057] S6.5, combine the simulated noise of the different areas and superimpose it onto the processed noise-free RAW image to obtain a complete image containing non-uniform noise, and output the paired data "non-uniform noise image - noise-free image".

[0058] Therefore, this application has the following advantages:

[0059] 1. The proposed noise estimation model incorporates an average frame count, which can simulate the noise level after averaging any number of video frame images. Furthermore, it performs intensity matching based on real noise when simulating signal-independent noise, thereby improving the simulation accuracy of the non-uniform noise remaining after time-domain denoising of video frames.

[0060] 2. By adjusting the brightness of noise-free RAW images instead of acquiring RAW images under different illumination levels, the data acquisition cycle can be shortened.

Attached Figure Description

[0061] The accompanying drawings, which are provided to further illustrate the invention and form part of this application, are not intended to limit the scope of the invention.

[0062] Figure 1 is a flowchart illustrating the overall process of this method.

[0063] Figure 2 is a schematic diagram of estimating the average frame number of the simulated noise in the static area, the trailing area, and the moving area, as provided in the embodiments of this application.

[0064] Figure 3 is a schematic diagram of the simulation process for the non-uniform noise image provided in the embodiments of this application.

Detailed Implementation

[0065] To better understand the technical content and advantages of the present invention, the present invention will now be described in further detail with reference to the accompanying drawings.

[0066] This invention provides a method for constructing a training dataset for spatial domain denoising of video frames. The overall process of the method is shown in Figure 1 and includes the following steps:

[0067] Step S1. Acquire flat-field frames, black frames, and noise-free RAW images using the target camera. Different image sensors have different hardware characteristics and circuit structures, resulting in differences in noise distribution. Therefore, it is necessary to acquire the necessary image data using the target camera when training the denoising model.

[0068] Step S1.1: Select multiple ISOs within the target camera's ISO range. The range varies from camera to camera. For example, if the target camera's maximum ISO is ISO_MAX = 32000, the selected ISOs are 1000, 2000, 4000, 8000, 16000, and 32000, i.e. $1000 \times 2^{n}$ with $n = 0, 1, \ldots$ Capture flat-field frames and black frames at each ISO setting. For flat-field frames, point the camera directly at a uniformly lit white sheet of paper, adjust the exposure parameters at each ISO setting, and capture two RAW images at each exposure setting. For black frames, place the camera in a dark environment and capture multiple RAW images at each ISO setting, for example, 30 to 60 images.

[0069] Step S1.2: Acquire noise-free RAW images using the target camera under normal daylight conditions. To ensure dataset diversity, it needs to cover different scenes such as indoor and outdoor, distant and close-up views, and overexposed areas. Dark areas in RAW images acquired under normal lighting conditions may contain a small amount of noise, which can be removed using a low-pass filter. In this embodiment, the type of low-pass filter is not limited, including but not limited to mean filtering, Gaussian filtering, bilateral filtering, and guided filtering.
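The dark-area cleanup in step S1.2 could look roughly like the sketch below, which uses a mean filter (one of the filter types listed). It assumes the input is a single Bayer color plane, for example `raw[0::2, 0::2]`, and that "dark" means a local mean below a user-chosen `dark_threshold`; both the threshold and the function names are assumptions made for illustration.

```python
import numpy as np


def box_filter(img: np.ndarray, radius: int = 1) -> np.ndarray:
    """Mean filter with a (2*radius+1)^2 window and edge replication."""
    padded = np.pad(img.astype(np.float64), radius, mode="edge")
    size = 2 * radius + 1
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (size * size)


def smooth_dark_areas(plane: np.ndarray, dark_threshold: float, radius: int = 1) -> np.ndarray:
    """Replace pixels whose local mean is below dark_threshold with that local mean."""
    smoothed = box_filter(plane, radius)
    # Decide on the smoothed value so that noise does not flip the mask pixel by pixel.
    dark = smoothed < dark_threshold
    return np.where(dark, smoothed, plane.astype(np.float64))
```

Filtering only the dark areas leaves well-exposed detail untouched, so the reference image stays sharp where the residual noise is already negligible.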

[0070] Step S2. Establish a noise estimation model for the target camera. The model is based on the physical imaging process of the image sensor and classifies the total noise into two categories: signal-dependent noise and signal-independent noise. The signal-dependent noise is the photon shot noise $N_{shot}$, simulated using a Poisson distribution; the signal-independent noise includes the stripe noise $N_{band}$, the readout noise $N_{read}$, and the quantization noise $N_{quant}$, simulated using Gaussian, Tukey-lambda, and uniform distributions, respectively. To simulate the noise level after multiple noisy frames are averaged in temporal denoising, the average frame number m is introduced into the noise estimation model. The final noise estimation model is shown in equations (1-5).

[0071] $N = N_{p} + N_{band} + N_{read} + N_{quant}$ (1)

[0072]

[0073] Equation (2) states that the photon shot noise term $N_{p}$ is generated from a Poisson distribution with the given parameters, where $I$ is the noise-free image and $K$ is the system gain of the target camera; equation (3) states that the stripe noise $N_{band}$ follows a Gaussian distribution with mean 0 and variance $\sigma_{b}$; equation (4) states that the readout noise $N_{read}$ follows a Tukey-lambda distribution with shape parameter $\lambda$, mean 0, and variance $\sigma_{TL}$; equation (5) states that the quantization noise $N_{quant}$ follows a uniform distribution within the range given there.
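The four noise components and the role of the average frame number m can be sketched as follows. This is an illustration under stated assumptions, not the exact form of equations (2) to (5): the shot noise is taken as $K \cdot \mathrm{Poisson}(I/K) - I$, the stripe noise is one Gaussian value per row, the quantization noise is uniform on $\pm 0.5$ quantization steps, and averaging is simulated directly by drawing m independent noise fields and taking their mean. All function and parameter names are hypothetical.

```python
import math

import numpy as np


def tukey_lambda_std(lam: float) -> float:
    """Standard deviation of the unit-scale Tukey-lambda distribution (lam > -0.5, lam != 0)."""
    var = (2.0 / lam**2) * (
        1.0 / (1.0 + 2.0 * lam) - math.gamma(lam + 1.0) ** 2 / math.gamma(2.0 * lam + 2.0)
    )
    return math.sqrt(var)


def sample_tukey_lambda(rng, lam, var, size):
    """Zero-mean Tukey-lambda samples rescaled to the requested variance."""
    p = rng.uniform(np.finfo(float).tiny, 1.0, size=size)
    q = (p**lam - (1.0 - p) ** lam) / lam  # quantile function that defines the distribution
    return q * math.sqrt(var) / tukey_lambda_std(lam)


def simulate_noise(clean, K, var_band, lam, var_read, m=1, q_step=1.0, rng=None):
    """Return the mean of m independent noise fields for a clean image (assumed model)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = clean.shape
    frames = (m, h, w)
    # Signal-dependent shot noise: Poisson photon counts scaled by the system gain.
    shot = K * rng.poisson(np.broadcast_to(clean / K, frames)) - clean
    # Signal-independent parts: row-wise stripe noise, readout noise, quantization noise.
    band = rng.normal(0.0, math.sqrt(var_band), size=(m, h, 1))
    read = sample_tukey_lambda(rng, lam, var_read, frames)
    quant = rng.uniform(-0.5 * q_step, 0.5 * q_step, size=frames)
    return (shot + band + read + quant).mean(axis=0)
```

With independent frames, the variance of the averaged noise falls roughly as 1/m, which is why a larger m represents the cleaner static area and a smaller m the noisier moving area.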

[0074] Step S3. In the noise estimation model, all parameters of the noise distribution are unknown. Before generating simulated noise, the parameters of the noise model need to be calibrated using flat frames and black frames.

[0075] Step S3.1: Calibrate the system gain corresponding to each ISO based on the flat-field frames. For exposure parameter j at the i-th ISO, compute the average and the difference of the two acquired RAW images $X_{ij1}$ and $X_{ij2}$, then calculate the median $X_{ij}$ of the averaged result and the variance $Y_{ij}$ of the difference result.

[0076]

[0077] In equations (6) and (7), median and var represent the operations of finding the median and the variance, respectively;

[0078] After all exposure parameters for the i-th ISO have been processed, the corresponding sequences $X_{i}$ and $Y_{i}$ are obtained. A least squares fit of $Y_{i}$ against $X_{i}$ gives the slope $K_{i}$, from which the system gain corresponding to the i-th ISO is obtained. The steps for performing the least squares fit on $X_{i}$ and $Y_{i}$ are as follows:

[0079] 1) Define the linear model $Y_{i} = K_{i} X_{i} + b_{i}$;

[0080] 2) Calculate the slope $K_{i}$ and intercept $b_{i}$ according to formulas (8) and (9),

[0081]

[0082] Perform the above calculation process for each pre-selected ISO to obtain the system gain corresponding to all ISOs. Assuming there are n ISOs in total, the final system gain sequence of length n is obtained as K(ISO) = [K(1), K(2), ... K(n)];
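Step S3.1 can be sketched as below: one (median, variance) point per flat-field pair, followed by a closed-form least squares line fit. The code uses the standard textbook least squares solution, which may differ in notation from formulas (8) and (9), and it assumes the black level has already been subtracted from the frames. The names `flat_pair_point` and `fit_line` are illustrative.

```python
import numpy as np


def flat_pair_point(x1, x2):
    """Return (median of the averaged image, variance of the difference image) for one pair."""
    x1 = np.asarray(x1, dtype=np.float64)
    x2 = np.asarray(x2, dtype=np.float64)
    return float(np.median((x1 + x2) / 2.0)), float(np.var(x1 - x2))


def fit_line(x, y):
    """Closed-form least squares fit of y = k * x + b; returns (k, b)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    dx = x - x.mean()
    k = float(np.sum(dx * (y - y.mean())) / np.sum(dx * dx))
    b = float(y.mean() - k * x.mean())
    return k, b
```

Differencing two frames of the same scene cancels fixed-pattern structure such as lens shading. Because the difference of two independent frames carries twice the per-frame noise variance, the fitted slope is about twice the per-frame variance-to-signal ratio.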

[0083] Step S3.2: Calibrate the parameters of the readout noise and stripe noise based on the black frames, namely the shape parameter $\lambda$ and variance $\sigma_{TL}$ of the readout noise and the variance $\sigma_{b}$ of the stripe noise. For the Bayer-format black frame acquired at the i-th ISO, subtract the mean of each channel, then estimate the variance $\sigma_{b}(i)$ of the stripe noise $N_{band}$ from the mean values of all rows of the image. Then subtract the mean of each row from the image to eliminate the influence of the stripe noise, and estimate the shape parameter $\lambda(i)$ and variance $\sigma_{TL}(i)$ of the readout noise $N_{read}$. In step S2 the parameters of all distributions are unknown; step S3 estimates them.

[0084] Performing the above estimation for the n ISOs gives the variance parameters of the stripe noise $N_{band}$ for all ISOs, $\sigma_{b}(\mathrm{ISO}) = [\sigma_{b}(1), \sigma_{b}(2), \ldots, \sigma_{b}(n)]$, and the variance parameters of the readout noise $N_{read}$, $\sigma_{TL}(\mathrm{ISO}) = [\sigma_{TL}(1), \sigma_{TL}(2), \ldots, \sigma_{TL}(n)]$.
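A possible implementation of the black-frame decomposition in step S3.2 is sketched below. The text does not say how the shape parameter $\lambda$ is estimated; the sketch assumes a probability-plot correlation search over a grid of candidate values, which is one common way to fit a Tukey-lambda shape. The 2x2 Bayer layout, the grid range, and all names are assumptions.

```python
import numpy as np


def black_frame_stats(black, pattern=2):
    """Split a Bayer black frame into stripe (row) noise and residual readout noise."""
    frame = np.asarray(black, dtype=np.float64).copy()
    # Remove the mean of each Bayer channel (the per-channel black level).
    for dy in range(pattern):
        for dx in range(pattern):
            frame[dy::pattern, dx::pattern] -= frame[dy::pattern, dx::pattern].mean()
    row_means = frame.mean(axis=1)         # one value per row
    var_band = float(row_means.var())      # stripe noise variance
    residual = frame - row_means[:, None]  # stripe component removed
    var_read = float(residual.var())       # readout noise variance
    return var_band, var_read, residual


def fit_tukey_lambda(samples, lambdas=None):
    """Pick the lambda whose quantiles correlate best with the sorted samples (assumed method)."""
    lambdas = np.linspace(-0.4, 0.4, 33) if lambdas is None else lambdas
    x = np.sort(np.ravel(samples))
    p = (np.arange(1, x.size + 1) - 0.5) / x.size  # plotting positions in (0, 1)
    best_lam, best_r = None, -np.inf
    for lam in lambdas:
        if abs(lam) < 1e-9:
            q = np.log(p / (1.0 - p))              # lambda -> 0 limit (logistic)
        else:
            q = (p**lam - (1.0 - p) ** lam) / lam
        r = np.corrcoef(q, x)[0, 1]
        if r > best_r:
            best_lam, best_r = float(lam), float(r)
    return best_lam
```

Calling `black_frame_stats` on each black frame of an ISO and passing the returned residual to `fit_tukey_lambda` would give one $(\sigma_{b}(i), \lambda(i), \sigma_{TL}(i))$ triple per ISO.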

[0085] Step S3.3: To transform the discrete ISO sampling space into a continuous space, based on the $K(\mathrm{ISO})$, $\sigma_{b}(\mathrm{ISO})$, and $\sigma_{TL}(\mathrm{ISO})$ estimated in S3.1 and S3.2, the linear relationship between the system gain $K$, the variance $\sigma_{b}$ of the stripe noise $N_{band}$, and the variance $\sigma_{TL}$ of the readout noise $N_{read}$ is established by least squares fitting:

[0086]

[0087] When sampling the parameters of the noise estimation model, first randomly sample a system gain $K$ from the uniform distribution $U(K(1), K(n))$, then calculate the corresponding $\sigma_{b}$ and $\sigma_{TL}$ according to equations (10) and (13), thereby sampling the three noise parameters $K$, $\sigma_{b}$, and $\sigma_{TL}$ in a continuous space.
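The continuous sampling in step S3.3 might be implemented as in the sketch below. It assumes plain linear fits of $\sigma_{b}$ and $\sigma_{TL}$ against $K$ with no random spread around the fitted line; the actual equations (10) to (13) are not reproduced in this text, so that form is an assumption, as are the function names.

```python
import numpy as np


def fit_linear(k_values, sigma_values):
    """Least squares line sigma = slope * K + intercept; returns (slope, intercept)."""
    slope, intercept = np.polyfit(k_values, sigma_values, deg=1)
    return float(slope), float(intercept)


def sample_parameters(k_values, band_fit, read_fit, rng):
    """Draw K ~ U(K(1), K(n)) and evaluate the two fitted lines at that K."""
    k = float(rng.uniform(k_values[0], k_values[-1]))
    sigma_b = band_fit[0] * k + band_fit[1]
    sigma_tl = read_fit[0] * k + read_fit[1]
    return k, sigma_b, sigma_tl
```

Sampling $K$ continuously lets the generated data cover gain settings between the calibrated ISOs rather than only the discrete values measured in S3.1 and S3.2.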

[0088] Step S4. For a frame of the video after temporal denoising, the noise intensity increases progressively from the static area to the trailing area to the moving area. To simulate the noise intensity of the different areas more accurately, the signal-independent noise calculated by the noise estimation model is intensity-matched against the acquired black frames containing real noise, and the average frame number m of the simulated noise in each area is estimated from the real noise intensity in the static area, the trailing area, and the moving area. The estimation process for the average frame number m is shown in Figure 2. Step S2 only provides the mathematical prototype of the noise estimation model; the specific parameter values are determined in steps S3 and S4.

Step S4.1: Calculate the variance of the real signal-independent noise and the variance of the simulated signal-independent noise. Based on the real average frame numbers N0, N1, and N2 in the static area, the trailing area, and the moving area, stack and average the black frames, then calculate the variances V0, V1, and V2 of the averaged results. Starting from the initial values M0, M1, and M2 of the estimated average frame number m in the static area, the trailing area, and the moving area, input M0, M1, and M2 and the noise parameters corresponding to the ISO of the black frames into the noise model to generate signal-independent noise, and calculate the variances V0', V1', and V2' of that noise.

[0089] Step S4.2: Calculate the absolute differences |V0'-V0|, |V1'-V1|, and |V2'-V2| between the variance of the real signal-independent noise and the variance of the simulated signal-independent noise, and determine whether each absolute difference is less than a preset threshold ε. The preset threshold ε ranges from 1 to 5, with a preferred value of 2. If any absolute difference is not less than ε, adjust the current M0, M1, and M2 and recalculate |V0'-V0|, |V1'-V1|, and |V2'-V2| until all absolute differences are less than ε.
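The matching loop of steps S4.1 and S4.2 can be sketched for a single area as follows. The text does not specify how m is adjusted, so the sketch assumes a unit-step search that moves m toward the real variance, with a fallback to the closest value when the threshold is never met. `simulate_variance` is a hypothetical callback that returns the variance V' of the simulated signal-independent noise for a given m.

```python
import numpy as np


def averaged_variance(black_stack, n_frames):
    """Variance V of the mean of the first n_frames black frames (stack shape: frames, H, W)."""
    frames = np.asarray(black_stack[:n_frames], dtype=np.float64)
    return float(frames.mean(axis=0).var())


def match_average_frames(v_real, simulate_variance, m_init, eps=2.0, max_iter=64):
    """Adjust m until |V' - V| < eps; return the closest m if the threshold is never met."""
    m = max(1, int(m_init))
    best_m, best_gap = m, float("inf")
    for _ in range(max_iter):
        v_sim = simulate_variance(m)
        gap = abs(v_sim - v_real)
        if gap < best_gap:
            best_m, best_gap = m, gap
        if gap < eps:
            return m
        # Averaging more frames lowers the variance, so step m in the direction that closes the gap.
        m = m + 1 if v_sim > v_real else max(1, m - 1)
    return best_m
```

Running this once per area, with V0, V1, and V2 from `averaged_variance` at N0, N1, and N2, would yield the matched M0, M1, and M2.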

[0090] Step S5. Adjust the brightness of the noise-free RAW image to simulate different illumination environments. The noise-free RAW images are acquired under normal daylight conditions, which keeps their noise low or negligible, but such images are not directly suitable for low-light denoising. To simulate image brightness under low-light conditions, the following processing is required:

[0091] Step S5.1: Perform automatic white balance on the noise-free image. Pixels whose brightness exceeds the preset brightness threshold L0 are treated as overexposed pixels. After this processing, the noise-free image is divided into overexposed areas and non-overexposed areas.

[0092] Step S5.2: Based on the average brightness $L_{target}$ of the target low-light image and the average brightness $L_{ori}$ of the noise-free image, calculate the brightness adjustment factor. For the non-overexposed region $I_{0}$ of the noise-free image, reduce the brightness by this factor and leave the overexposed areas unadjusted, yielding a noise-free low-light image with reduced brightness.

Step S6. Generate a simulated non-uniform noise image based on the noise-free RAW image and the noise estimation model, and output paired data. The process is shown in Figure 3:

[0093] Step S6.1: Noise-free RAW image preprocessing. Determine the approximate image brightness range based on the target camera's application environment, and adjust the brightness of the noise-free RAW image according to this range. Randomly select two adjacent, non-overlapping rectangular regions in the noise-free RAW image as the moving area and the trailing area, and treat the remaining regions as the static area.

[0094] Step S6.2: Noise estimation model parameter sampling. Randomly sample a system gain $K$ from the uniform distribution $U(K(1), K(n))$, then calculate the corresponding $\sigma_{b}$ and $\sigma_{TL}$ according to equations (10) and (13).

[0095] Step S6.3: Apply random numerical perturbations to the estimated average frame numbers M0, M1, and M2 to expand the combination space of non-uniform noise.

[0096] Step S6.4: Input the sampled noise parameters $K$, $\sigma_{b}$, and $\sigma_{TL}$ and the average frame numbers M0, M1, and M2 into the noise estimation model to generate simulated noise for the static area, the trailing area, and the moving area, respectively.

[0097] Step S6.5: Combine the simulated noise of the different areas and superimpose it onto the processed noise-free RAW image to obtain a complete image containing non-uniform noise, and output the paired data "non-uniform noise image - noise-free image".
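The region selection, perturbation, and composition of step S6 can be sketched as below. The sketch assumes the two rectangles sit side by side with a fixed size of one quarter of the image in each dimension, and that the perturbation is a random integer offset of at most `jitter`; these choices, the label convention, and the `noise_fn` callback (any generator returning a noise field for a given m) are illustrative assumptions.

```python
import numpy as np


def region_masks(shape, box_h, box_w, rng):
    """Label map: 0 = static, 1 = trailing, 2 = moving (two adjacent rectangles)."""
    h, w = shape
    top = int(rng.integers(0, h - box_h + 1))
    left = int(rng.integers(0, w - 2 * box_w + 1))
    labels = np.zeros(shape, dtype=np.uint8)
    labels[top:top + box_h, left:left + box_w] = 2                # moving area
    labels[top:top + box_h, left + box_w:left + 2 * box_w] = 1    # trailing area next to it
    return labels


def make_pair(clean, m_static, m_trail, m_move, noise_fn, rng, jitter=1):
    """Return (noisy, clean, labels) with a different average frame number per area."""
    labels = region_masks(clean.shape, clean.shape[0] // 4, clean.shape[1] // 4, rng)
    noisy = np.asarray(clean, dtype=np.float64).copy()
    for label, m in enumerate((m_static, m_trail, m_move)):
        # Random perturbation of M0 / M1 / M2, kept at 1 or above.
        m = max(1, m + int(rng.integers(-jitter, jitter + 1)))
        noise = noise_fn(clean, m, rng)
        mask = labels == label
        noisy[mask] += noise[mask]
    return noisy, clean, labels
```

A usage example with a stand-in Gaussian generator would be `make_pair(clean, 16, 4, 1, lambda img, m, r: r.normal(0.0, 8.0 / np.sqrt(m), img.shape), rng)`; in practice `noise_fn` would wrap the calibrated noise estimation model.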

[0098] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. For those skilled in the art, various modifications and variations can be made to the embodiments of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A method for constructing a training dataset for video frame spatial domain denoising, characterized in that the method includes the following steps: S1. Use the target camera to acquire flat-field frames, black frames, and noise-free RAW images; S2. Establish a noise estimation model for the target camera; based on the physical imaging process of the image sensor, establish a noise estimation model, dividing the total noise into two categories: signal-dependent noise and signal-independent noise; the signal-dependent noise is photon shot noise, simulated using a Poisson distribution; the signal-independent noise includes stripe noise, readout noise, and quantization noise, simulated using Gaussian, Tukey-lambda, and uniform distributions, respectively; to simulate the noise level after averaging multiple frames of noisy images in temporal denoising, an average frame number m is introduced into the noise estimation model; S3. Calibrate the unknown parameters of the noise estimation model using the flat-field frames and black frames, the unknown parameters including the system gain of the photon shot noise, the variance of the stripe noise, and the shape parameter and variance of the readout noise; S4. Based on the collected black frames containing real noise, perform intensity matching on the signal-independent noise calculated by the noise estimation model, the noise intensity matching making the noise level of the signal-independent noise generated by the noise estimation model consistent with the noise level of the real noise; based on the real noise intensity of the static area, the trailing area, and the moving area, estimate the average frame number m of the simulated noise in the corresponding area to more accurately simulate the noise intensity of the different areas; S5. Adjust the brightness of the noise-free RAW image to simulate different illumination environments; generate a simulated non-uniform noise image based on the noise-free RAW image and the noise estimation model, and output paired data; S6. Generate a simulated non-uniform noise image based on the noise-free RAW image and the noise estimation model, and output the final paired data.

2. The method for constructing a training dataset for video frame spatial domain denoising according to claim 1, characterized in that step S1 further includes: S1.1, select multiple ISO values within the target camera's ISO range, and capture flat-field frames and black frames at each ISO. For flat-field frames, the camera faces a uniformly lit sheet of white paper, the exposure parameters are varied at each ISO, and two RAW images are captured at each exposure parameter. For black frames, the camera is placed in a dark environment, and multiple RAW images are captured at each ISO. S1.2, use the target camera to acquire noise-free RAW images under normal daylight conditions. To ensure the diversity of the dataset, the images should cover different scenes such as indoor and outdoor, distant and close-up, and scenes with overexposed areas. Dark areas in RAW images acquired under normal lighting conditions may still contain a small amount of noise, which can be removed using a low-pass filter.

3. The method for constructing a training dataset for video frame spatial domain denoising according to claim 2, characterized in that the multiple ISO values mentioned in step S1.1 vary from camera to camera. Assuming the target camera's maximum ISO is ISO_MAX = 32000, the multiple ISO values are 1000, 2000, 4000, 8000, 16000, and 32000, that is, $1000 \times 2^n$ with $n = 0, 1, \ldots$ The number of RAW images is preferably 30 to 60.
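As an illustration of the ISO selection in claim 3, the following minimal sketch enumerates the values 1000 × 2^n up to a camera's maximum ISO. The function name `iso_ladder` and the `base` parameter are hypothetical; the claim only gives the ISO_MAX = 32000 example and states that the values differ per camera.

```python
def iso_ladder(iso_max, base=1000):
    """Return base * 2**n for n = 0, 1, ... while the value does not exceed iso_max."""
    isos = []
    n = 0
    while base * 2 ** n <= iso_max:
        isos.append(base * 2 ** n)
        n += 1
    return isos


# The example from the claim: ISO_MAX = 32000.
print(iso_ladder(32000))  # [1000, 2000, 4000, 8000, 16000, 32000]
```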

4. The method for constructing a training dataset for video frame spatial domain denoising according to claim 1, characterized in that, in step S1.2, the type of low-pass filter is not limited, and includes mean filtering, Gaussian filtering, bilateral filtering, and guided filtering.

5. The method for constructing a training dataset for video frame spatial domain denoising according to claim 1, characterized in that step S2 further includes: the signal-dependent noise is photon shot noise $N_{shot}$, simulated using a Poisson distribution; the signal-independent noise includes stripe noise $N_{band}$, readout noise $N_{read}$, and quantization noise $N_{quant}$, simulated using Gaussian, Tukey-lambda, and uniform distributions, respectively; the final noise estimation model is given by equations (1) to (5): $N = N_{shot} + N_{band} + N_{read} + N_{quant}$ (1). Equation (2) indicates that the photon shot noise is generated from a Poisson distribution with the given parameters, where $I$ represents the noise-free image and $K$ is the system gain of the target camera; equation (3) indicates that the stripe noise $N_{band}$ follows a Gaussian distribution with mean 0 and variance $\sigma_b$; equation (4) indicates that the readout noise $N_{read}$ follows a Tukey-lambda distribution with shape parameter $\lambda$, mean 0, and variance $\sigma_{TL}$; equation (5) indicates that the quantization noise $N_{quant}$ follows a uniform distribution within a given range.
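The four-component model of claim 5 can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the patented formulas, since equations (2) to (5) are not reproduced in this text. Assumed here: shot noise is drawn as K·Poisson(I/K) − I; the stripe noise is one Gaussian offset per image row with `sigma_b` used as a standard deviation; `sigma_tl` is used as a scale factor on a standard Tukey-lambda draw; quantization noise is uniform on [−0.5, 0.5]; and the average frame number m is modelled by literally averaging m independent draws. All function and parameter names are hypothetical.

```python
import numpy as np


def tukey_lambda_samples(rng, lam, scale, size):
    """Draw Tukey-lambda samples by inverse-transform sampling of the quantile function."""
    u = rng.uniform(1e-9, 1.0 - 1e-9, size=size)  # keep away from 0 and 1
    if lam == 0.0:
        q = np.log(u / (1.0 - u))  # logistic limit of the family
    else:
        q = (u ** lam - (1.0 - u) ** lam) / lam
    return scale * q


def simulate_noise(clean, K, sigma_b, lam, sigma_tl, m=1, rng=None):
    """Return the mean of m independent single-frame noise draws for a clean image."""
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=np.float64)
    h, w = clean.shape
    total = np.zeros((h, w))
    for _ in range(m):
        shot = K * rng.poisson(clean / K) - clean            # signal-dependent (assumed form)
        band = rng.normal(0.0, sigma_b, size=(h, 1))         # one offset per row (assumption)
        read = tukey_lambda_samples(rng, lam, sigma_tl, (h, w))
        quant = rng.uniform(-0.5, 0.5, size=(h, w))          # assumed range of +/- 0.5
        total += shot + band + read + quant                  # band broadcasts across columns
    return total / m


rng = np.random.default_rng(0)
clean = np.full((64, 64), 100.0)
n1 = simulate_noise(clean, K=2.0, sigma_b=1.0, lam=0.1, sigma_tl=1.0, m=1, rng=rng)
n8 = simulate_noise(clean, K=2.0, sigma_b=1.0, lam=0.1, sigma_tl=1.0, m=8, rng=rng)
print(n1.var(), n8.var())  # the 8-frame average is markedly less noisy
```

Averaging independent draws is the most literal reading of "average frame number"; a closed-form model that scales each component's parameters by m would be cheaper but depends on the equations that are missing here.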

6. The method for constructing a training dataset for video frame spatial domain denoising according to claim 1, characterized in that step S3 further includes: S3.1, calibrate the system gain corresponding to each ISO based on the flat-field frames; for exposure parameter j at the i-th ISO, take the two acquired RAW images $X_{ij1}$ and $X_{ij2}$, compute their summed average and their difference, and calculate the median $X_{ij}$ of the summation result and the variance $Y_{ij}$ of the difference result; in equations (6) and (7), median and var denote the operations of finding the median and the variance, respectively; after all exposure parameters at the i-th ISO have been processed, the corresponding sequences $X_i$ and $Y_i$ are obtained, and a least-squares fit of $X_i$ and $Y_i$ gives the slope $K_i$, from which the system gain corresponding to the i-th ISO is obtained. The steps of the least-squares fit of $X_i$ and $Y_i$ are as follows: 1) define a linear model $Y_i = K_i X_i + b_i$; 2) calculate the slope $K_i$ and intercept $b_i$ according to formulas (8) and (9). Perform the above calculation for each pre-selected ISO to obtain the system gains corresponding to all ISOs; assuming there are n ISOs in total, the final system gain sequence of length n is $K(ISO) = [K(1), K(2), \ldots, K(n)]$; S3.2, calibrate the parameters of the readout noise and stripe noise based on the black frames, namely the shape parameter $\lambda$ and variance $\sigma_{TL}$ of the readout noise and the variance $\sigma_b$ of the stripe noise; subtract the mean of each channel from the Bayer-format black frame acquired at the i-th ISO, and estimate the variance $\sigma_b(i)$ of the stripe noise $N_{band}$ from the means of all rows of the image; then subtract the mean of each row from the image to eliminate the influence of the stripe noise, and estimate the shape parameter $\lambda(i)$ and variance $\sigma_{TL}(i)$ of the readout noise $N_{read}$; S3.3, in order to transform the discrete ISO sampling space into a continuous space, based on $K(ISO)$, $\sigma_b(ISO)$, and $\sigma_{TL}(ISO)$ estimated in S3.1 and S3.2, establish by least-squares fitting the linear relationships between the system gain $K$ and the stripe noise variance $\sigma_b$ and between the system gain $K$ and the readout noise variance $\sigma_{TL}$. When sampling the parameters of the noise estimation model, first randomly sample a system gain $K$ from the uniform distribution $U(K(1), K(n))$, then calculate the corresponding $\sigma_b$ and $\sigma_{TL}$ according to equations (10) and (13), thereby sampling the three noise parameters $K$, $\sigma_b$, and $\sigma_{TL}$ in a continuous space.
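The flat-field calibration of S3.1 can be sketched as follows. Formulas (6) to (9) are not reproduced in this text, so the sketch makes two assumptions: the pair statistics are the median of the pixel-wise average and the population variance of the pixel-wise difference (any normalisation factor the patent may apply is unknown), and the fit uses the standard closed-form least-squares estimates for a line. The names `pair_stats` and `fit_line` are hypothetical, and frames are passed as flat lists of pixel values.

```python
from statistics import median, pvariance


def pair_stats(frame_a, frame_b):
    """Return (X, Y): median of the pixel-wise average, variance of the pixel-wise difference."""
    avg = [(a + b) / 2 for a, b in zip(frame_a, frame_b)]
    diff = [a - b for a, b in zip(frame_a, frame_b)]
    return median(avg), pvariance(diff)


def fit_line(xs, ys):
    """Closed-form least squares for y = k * x + b; returns (k, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    b = mean_y - k * mean_x
    return k, b


# One (X, Y) point from a toy pair of flat-field frames.
x_ij, y_ij = pair_stats([10, 12, 14, 16], [10, 10, 16, 16])

# Toy sequence for one ISO: variance grows linearly with signal, slope 2.
xs = [10.0, 20.0, 30.0, 40.0]
ys = [2.0 * x + 3.0 for x in xs]
k_i, b_i = fit_line(xs, ys)
print(k_i, b_i)
```

The same `fit_line` helper could also serve for the linear relationships of S3.3, fitted over the per-ISO sequences instead of the per-exposure ones.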

7. The method for constructing a training dataset for video frame spatial domain denoising according to claim 1, characterized in that step S4 further includes: S4.1, calculate the variance of the real signal-independent noise and the variance of the simulated signal-independent noise; based on the actual average frame numbers N0, N1, and N2 in the static area, the trailing area, and the moving area, superimpose and average the black frames and calculate the variances V0, V1, and V2 of the averaged results; based on the initial values M0, M1, and M2 of the estimated average frame number m in the static area, the trailing area, and the moving area, input M0, M1, and M2 together with the noise parameters corresponding to the ISO of the black frames into the noise model to generate signal-independent noise, and calculate the variances V0', V1', and V2' of that noise. S4.2, calculate the absolute differences |V0'-V0|, |V1'-V1|, and |V2'-V2| between the variance of the real signal-independent noise and the variance of the simulated signal-independent noise, and determine whether each absolute difference is less than a preset threshold ε. The preset threshold ε has a value range of 1 to 5, with a preferred value of 2. If an absolute difference is not less than ε, first adjust the current M0, M1, and M2, and then recalculate the absolute differences |V0'-V0|, |V1'-V1|, and |V2'-V2|, repeating until all absolute differences are less than ε.
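The matching loop of S4.2 can be sketched for a single region as follows. The claim does not say how M is adjusted, so this sketch assumes a unit-step search: if the simulated variance is too high, m is increased (more averaging), otherwise it is decreased. The callable `simulated_var`, the bound `m_max`, and the toy model `single_frame_model` (variance inversely proportional to m) are all hypothetical stand-ins for the calibrated noise model.

```python
def match_average_frames(real_var, simulated_var, m_init, eps=2.0, m_max=64):
    """Step m by 1 until |simulated_var(m) - real_var| < eps; return m, or None if none qualifies."""
    m = m_init
    for _ in range(m_max):              # bounded, so an unreachable target cannot loop forever
        diff = simulated_var(m) - real_var
        if abs(diff) < eps:
            return m
        m += 1 if diff > 0 else -1      # too noisy: average more frames; too clean: fewer
        if not 1 <= m <= m_max:
            break
    return None


def single_frame_model(m, sigma2=100.0):
    """Toy stand-in: variance of the mean of m independent frames with variance sigma2."""
    return sigma2 / m


# Real black frames averaged over 8 frames would have variance 100 / 8 = 12.5 in this toy model.
m_loose = match_average_frames(12.5, single_frame_model, m_init=4, eps=2.0)
m_tight = match_average_frames(12.5, single_frame_model, m_init=4, eps=0.5)
print(m_loose, m_tight)  # 7 8
```

With the preferred threshold ε = 2 the search stops one step early at m = 7, which shows that ε trades matching accuracy against the number of iterations. The same routine would be run once per region to obtain M0, M1, and M2.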

8. The method for constructing a training dataset for video frame spatial domain denoising according to claim 1, characterized in that, in step S5, the noise-free RAW image is acquired under normal daylight conditions, which has the advantage of being noise-free or low in noise, but its brightness does not match the low-light scenes targeted by denoising. In order to simulate the image brightness under low-light conditions, the following processing is required: S5.1, perform automatic white balance on the noise-free image, and regard pixels in the image that are greater than the preset brightness threshold L0 as overexposed pixels; after this processing, the noise-free image is divided into overexposed areas and non-overexposed areas; S5.2, based on the average brightness $L_{target}$ of the target low-light image and the average brightness $L_{ori}$ of the noise-free image, calculate the brightness adjustment factor; for the non-overexposed region I0 in the noise-free image, reduce the brightness using this factor while leaving the overexposed areas unadjusted, thereby obtaining a noise-free low-light image with reduced brightness.
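The brightness adjustment of S5.2 can be sketched as follows. The formula for the adjustment factor is not reproduced in this text, so the sketch assumes the simple ratio L_target / L_ori, with L_ori taken as the mean over all pixels; automatic white balance is omitted. The function name and the flat-list image representation are hypothetical.

```python
from statistics import fmean


def darken_non_overexposed(pixels, l_target, overexposure_threshold):
    """Scale non-overexposed pixels by l_target / mean(pixels); keep overexposed ones unchanged."""
    l_ori = fmean(pixels)
    factor = l_target / l_ori           # assumed form of the brightness adjustment factor
    adjusted = [p if p > overexposure_threshold else p * factor for p in pixels]
    return adjusted, factor


pixels = [100.0, 200.0, 300.0, 1000.0]  # the last pixel exceeds the threshold L0 = 900
adjusted, factor = darken_non_overexposed(pixels, l_target=100.0, overexposure_threshold=900.0)
print(factor, adjusted)  # 0.25 [25.0, 50.0, 75.0, 1000.0]
```

Leaving the overexposed pixel at 1000 keeps light sources and clipped highlights saturated, as they would be in a real low-light capture.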

9. The method for constructing a training dataset for video frame spatial domain denoising according to claim 1, characterized in that step S6 further includes: S6.1, noise-free RAW image preprocessing: determine the approximate image brightness range based on the application environment of the target camera, and adjust the brightness of the noise-free RAW image according to this range; randomly select two adjacent non-overlapping rectangular areas in the noise-free RAW image as the moving area and the trailing area, and treat the remaining areas as the static area; S6.2, noise estimation model parameter sampling: randomly sample a system gain $K$ from the uniform distribution $U(K(1), K(n))$, then calculate the corresponding $\sigma_b$ and $\sigma_{TL}$ according to equations (10) and (13); S6.3, apply random numerical perturbations to the estimated average frame numbers M0, M1, and M2 to expand the combination space of non-uniform noise; S6.3, input the sampled noise parameters and the average frame numbers M0, M1, and M2 into the noise estimation model, and generate simulated noise for the static area, the trailing area, and the moving area, respectively; S6.4, combine the simulated noise of the different regions and superimpose it onto the processed noise-free RAW image to obtain a complete image containing non-uniform noise, and output paired data of "non-uniform noise image - noise-free image".
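The region layout and noise composition of claim 9 can be sketched with NumPy as follows. To keep the example short, a zero-mean Gaussian whose standard deviation shrinks as 1/sqrt(m) stands in for the full noise estimation model; that stand-in, the uniform perturbation of ±0.5 frames, the box coordinates, and the example frame numbers (more averaging in the static area than in the moving area) are all assumptions, and every function name is hypothetical.

```python
import numpy as np


def build_frame_number_map(shape, moving_box, trailing_box, m_static, m_trailing, m_moving):
    """Per-pixel average-frame-number map; boxes are (top, left, height, width)."""
    m_map = np.full(shape, float(m_static))
    for (top, left, height, width), m in ((trailing_box, m_trailing), (moving_box, m_moving)):
        m_map[top:top + height, left:left + width] = m
    return m_map


def add_nonuniform_noise(clean, m_map, sigma_single, rng):
    """Add Gaussian stand-in noise whose std is sigma_single / sqrt(m) at each pixel."""
    noise = rng.normal(0.0, 1.0, size=clean.shape) * sigma_single / np.sqrt(m_map)
    return clean + noise


rng = np.random.default_rng(0)
clean = np.full((64, 64), 100.0)

# Perturb the estimated frame numbers (assumed +/- 0.5, floored at 1 frame).
m0, m1, m2 = (max(1.0, m + rng.uniform(-0.5, 0.5)) for m in (8.0, 4.0, 1.0))

# Two adjacent, non-overlapping 16x16 boxes: moving area, with the trailing area to its right.
m_map = build_frame_number_map(clean.shape, (16, 16, 16, 16), (16, 32, 16, 16), m0, m1, m2)
noisy = add_nonuniform_noise(clean, m_map, sigma_single=4.0, rng=rng)
residual = noisy - clean
print(residual[16:32, 16:32].var(), residual[:16].var())  # moving area vs. static rows
```

The pair `(noisy, clean)` corresponds to one "non-uniform noise image - noise-free image" training sample; replacing the Gaussian stand-in with a per-region call to the calibrated noise model would follow the claim more closely.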