A gamma noise-based diffusion model image generation method and medium

By using a diffusion model based on gamma noise and leveraging the asymmetry and skewed jump properties of the gamma distribution, combined with a pre-trained scoring network, the problem of adaptive offset and asymmetric skewness of image data in existing technologies is solved, achieving high-precision image generation and improved training efficiency.

CN122265077A · Pending · Publication Date: 2026-06-23 · SUZHOU UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SUZHOU UNIV
Filing Date
2026-03-12
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing technologies struggle to accurately reconstruct high-frequency details in natural image data with complex statistical characteristics, and numerical biases are especially prone to occur during the reverse sampling stage. Furthermore, traditional models cannot adapt to overall positional shifts of the data or to asymmetric skewness characteristics.

Method used

A diffusion model based on gamma noise is adopted. A gamma noise tensor is obtained as the initial state and, by exploiting the asymmetry and skewed jump properties of the gamma distribution together with a pre-trained scoring network, a deterministic probability-flow velocity field state update is performed to generate high-quality images.

Benefits of technology

It achieves highly accurate fitting of complex image data, expands the application boundaries of generative artificial intelligence, improves the data throughput efficiency and robustness of model training, and ensures the structural integrity and color fidelity of generated images.

✦ Generated by Eureka AI based on patent content.

Figure CN122265077A_ABST

Abstract

The application discloses an image generation method and medium based on a gamma-noise diffusion model. The method obtains a pure gamma noise tensor and centers it to generate the initial state tensor that serves as the starting point. For each current time step of the preset continuous discrete time grid, it generates an effective diffusion coefficient and performs a forward action on the current state tensor with the pre-trained scoring network to generate an edge score approximation. Based on the probability-flow ordinary differential equation, it performs a numerical-integration state update with these parameters to generate the state tensor of the previous time step. The loop repeats until time reaches zero, after which inverse normalization and numerical truncation are applied to the final state tensor to output the target generated image. The application breaks through the limitation of the traditional Gaussian assumption and improves the generation quality and efficiency for images with complex skewness and sparse structure.

Description

Technical Field

[0001] This application belongs to the field of image generation, specifically relating to an image generation method and medium based on a diffusion model of gamma noise.

Background Technology

[0002] With the rapid evolution of generative artificial intelligence technology, image generation models based on stochastic differential equations have become a core technical path in the field of computer vision. Traditional denoising diffusion probability models are usually built on Wiener processes, whose theoretical basis assumes that noise injection in the forward diffusion process and state transitions in the reverse denoising process strictly follow a standard normal distribution. Although this Gaussian distribution-based assumption has the convenience of obtaining closed-form solutions using Itō's lemma in mathematical derivation, it exhibits fundamental limitations when processing natural image data with complex statistical properties. Image textures, edge gradients, and residual information in real-world scenes often exhibit significant non-Gaussian properties, such as a long-tailed distribution that is heavier than the normal distribution and asymmetric skewness characteristics. The inherent light-tailed property and symmetry of the Gaussian distribution make it difficult for the model to accurately reconstruct high-frequency details of the image when fitting data with complex textures or extreme pixel value distributions.

[0003] To overcome the limitations of the Gaussian assumption, existing techniques have employed gamma distributions as alternatives to Gaussian distributions. For example, the paper "Denoising Diffusion Gamma Models" (arXiv:2110.05948) proposes a gamma distribution-based generation method that uses the non-negativity and asymmetry of the gamma distribution to model noise. However, this technique is strictly limited to a two-parameter gamma distribution defined by shape and scale parameters. Although the support of a two-parameter gamma random variable is the non-negative half-axis, the paper typically centers the gamma random variable to create a zero-mean perturbation when constructing the diffusion noise, so its noise term does not simply require the probability density function to start from a fixed origin. Nevertheless, the technique does not introduce an explicit location parameter that can independently adjust the overall position of the distribution, and it therefore lacks direct adaptability to overall positional shifts of the data. In practical applications, the latent feature distribution of image data is often accompanied by unknown baseline shifts. Existing two-parameter models cannot adapt to such translations by adjusting a location parameter, which leads to numerical deviations in the denoising trajectory at the end of the reverse sampling process.

[0004] In addition, another existing technique, the paper "Heavy-Tailed Diffusion Models" (OpenReview:tozlOEN4qp, arXiv:2410.14171), introduces the Student's t-distribution to address the long-tail problem, enhancing the model's ability to characterize extreme samples and outliers and improving its robustness. However, the Student's t-distribution is strictly symmetric about its location parameter. This mathematical characteristic means that, without a skewed extension, it is difficult to directly describe the skewness commonly found in natural image data. When the data distribution exhibits significant left or right skewness, forcing a symmetric distribution onto it can easily introduce systematic bias, potentially weakening the expression of directional lighting variations or texture statistics in the generated results.

Summary of the Invention

[0005] Firstly, in view of the shortcomings of the prior art, the purpose of this application is to provide an image generation method based on a diffusion model of gamma noise, which solves one or more problems in the prior art in terms of addressing the adaptive offset of noise distribution location, asymmetric skewness, and heavy-tailed characteristics.

[0006] The objective of this application can be achieved through the following technical solution: a method for generating images based on a diffusion model of gamma noise, comprising:

Obtaining a preset continuous discrete time grid, the step size of adjacent time steps, the preset noise scheduling parameters corresponding to each time step in the continuous discrete time grid, the preset noise intensity hyperparameter, the preset scheduling extremum parameters, and the pre-trained scoring network parameters;

Obtaining an initial gamma noise tensor from a preset gamma distribution using the noise intensity hyperparameter and the preset noise scheduling parameter corresponding to the maximum time step in the continuous discrete time grid, and generating an initial state tensor from the initial gamma noise tensor;

For each current time step of the preset continuous discrete time grid, obtaining the corresponding current state tensor, and generating the effective diffusion coefficient of the current time step from the preset noise scheduling parameter corresponding to the current time step and the preset scheduling extremum parameters;

Using the current state tensor and the current time step, performing a forward action based on the pre-trained scoring network parameters to generate an edge score approximation for the current state;

Starting from the initial state tensor, performing a deterministic probability-flow velocity field state update using the edge score approximation, the effective diffusion coefficient, and the step size of adjacent time steps to generate the state tensor of the previous time step, and so on until the state tensor of every step of the continuous discrete time grid has been generated;

Forming the target generated image from the state tensor output at each step of the continuous discrete time grid.

[0007] Secondly, in view of the shortcomings of the prior art, the purpose of this application is to provide a computer-readable storage medium that solves one or more problems in the prior art in terms of addressing the adaptive offset of noise distribution, asymmetric skewness, and heavy-tailed characteristics.

[0008] The objective of this application can be achieved through the following technical solutions: A computer-readable storage medium having computer instructions stored thereon, which, when executed by a processor, implement the steps of the gamma noise-based diffusion model image generation method as described in the first aspect.

[0009] The beneficial effects of this application are as follows.

This invention obtains a pure gamma noise tensor as the initial foundation for the reverse evolution. By exploiting the probability density characteristics of the asymmetric, one-sided gamma distribution with skewed jump properties, it can not only perform unconditional generation of conventional images with high quality, but also fit and generate, with extremely high accuracy, high-dimensional tensor data with sparse structural features (such as industrial line drawings and circuit diagrams) or physical jump properties (such as medical speckle images and weather radar echoes), thus fundamentally expanding the application boundaries of generative artificial intelligence.

In the data flow stage of model training, this invention does not employ the extremely time-consuming stepwise Markov chain discrete jump simulation. Instead, it uses the additivity and scalar multiplication properties of gamma increments and computes according to a formula [formula missing], realizing direct closed-form synthesis of the initial real image data and a single sampled training gamma noise tensor at any specified discrete time step. This eliminates the sequence dependency bottleneck in the forward noise addition process of traditional non-Gaussian diffusion models, so that training data batches can be constructed independently and in a highly parallel manner in GPU / NPU memory, which greatly improves data throughput and computing power utilization during model training.

This invention obtains a conditional score target value specific to the gamma distribution through analytical derivation, and then dynamically generates a second moment weight coefficient according to a formula [formula missing]. By introducing the second moment weight coefficient into the calculation of the weighted error loss function, the backpropagated gradient of extreme outliers is automatically suppressed during gradient descent. This ensures the stable convergence of the scoring network in the complex asymmetric manifold space and significantly improves the success rate and robustness of large model training.

This invention derives and constructs mathematically equivalent probability-flow ordinary differential equations (ODE) and stochastic differential equations (SDE) for dual reverse data flow updates. In practical engineering deployments, if maximum on-device image generation speed is desired, a large-step numerical integrator can be configured for the deterministic ODE formula to compress the generation trajectory losslessly into a very small number of iterations, meeting the requirements of second-level real-time interaction. For extremely complex data distributions, the system can switch to the SDE formula and use its built-in local gamma increment for stochastic prediction and correction, which effectively prevents the generated trajectory from becoming trapped in local suboptimal solutions and provides an extremely flexible scheduling space between the upper limit of generation quality and computational efficiency.

At the end of the reverse evolution calculation, this invention obtains the final state tensor and strictly performs inverse normalization and numerical truncation for values exceeding the standard pixel value range. This effectively eliminates the high-frequency abnormal bright spots (white noise artifacts) that may be caused by the heavy-tailed characteristics of the gamma distribution at the end of the reverse generation, ensuring the structural integrity and color fidelity of the final target generated image for human visual perception devices or subsequent computer vision processing pipelines.

Attached Figure Description

[0010] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0011] Figure 1 is an overall schematic diagram of an embodiment of this application; Figure 2 is a schematic diagram of a visualized sample grid for CIFAR-10 according to an embodiment of this application; Figure 3 is a schematic diagram of the generated sample grid for MNIST according to an embodiment of this application; Figure 4 is a schematic diagram of the phased generation of sample grids for CelebA in an embodiment of this application.

Detailed Implementation

[0012] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0013] This application provides a method for generating images based on a diffusion model of gamma noise, including:

S100. Obtain a preset continuous discrete time grid, the step size of adjacent time steps, the preset noise scheduling parameters corresponding to each time step in the continuous discrete time grid, the preset noise intensity hyperparameter, the preset scheduling extremum parameters, and the pre-trained scoring network parameters.

S200. Obtain an initial gamma noise tensor from a preset gamma distribution using the noise intensity hyperparameter and the preset noise scheduling parameter corresponding to the maximum time step in the continuous discrete time grid, and generate an initial state tensor from the initial gamma noise tensor.

S300. For each current time step of the preset continuous discrete time grid, obtain the corresponding current state tensor, and generate the effective diffusion coefficient of the current time step from the preset noise scheduling parameter corresponding to the current time step and the preset scheduling extremum parameters.

S400. Using the current state tensor and the current time step, perform a forward action based on the pre-trained scoring network parameters to generate an edge score approximation for the current state.

S500. Starting from the initial state tensor, perform a deterministic probability-flow velocity field state update using the edge score approximation, the effective diffusion coefficient, and the step size of adjacent time steps to generate the state tensor of the previous time step, and so on until the state tensor of every step of the continuous discrete time grid has been generated.

S600. The state tensor output at each step of the continuous discrete time grid forms the target generated image.
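The flow of S100 to S600 can be sketched as a short NumPy loop. This is only an illustrative sketch under stated assumptions, not the application's implementation: the geometric schedule form, the choice of gamma shape and scale, the explicit Euler update in S500, the [-1, 1] normalization range in S600, and the function `toy_score` (a stand-in for the pre-trained scoring network) are all hypothetical, because the corresponding formulas are not reproduced in this text.

```python
import numpy as np

def geometric_schedule(t, sched_min, sched_max):
    # ASSUMPTION: geometric interpolation between the scheduling extremes.
    return sched_min * (sched_max / sched_min) ** t

def toy_score(x, t):
    # HYPOTHETICAL stand-in for the pre-trained scoring network.
    return -x

def generate(tensor_shape, num_steps=50, noise_intensity=1.0,
             sched_min=0.01, sched_max=1.0, score_fn=toy_score, seed=0):
    rng = np.random.default_rng(seed)

    # S100: decreasing time grid from 1 to 0 with a constant step size.
    grid = np.linspace(1.0, 0.0, num_steps + 1)
    dt = 1.0 / num_steps
    log_ratio = np.log(sched_max / sched_min)

    # S200: sample gamma noise at the maximum time step, then center it.
    # ASSUMPTION: shape = intensity * schedule and scale = 1, so the gamma
    # mean equals the offset that is subtracted.
    offset = noise_intensity * geometric_schedule(grid[0], sched_min, sched_max)
    x = rng.gamma(shape=offset, scale=1.0, size=tensor_shape) - offset

    for t in grid[:-1]:
        # S300: effective diffusion coefficient = schedule value * ln(max/min).
        coeff = geometric_schedule(t, sched_min, sched_max) * log_ratio
        # S400: forward pass of the (stand-in) scoring network.
        score = score_fn(x, t)
        # S500: ASSUMED explicit Euler step toward the previous time step.
        x = x + dt * coeff * score

    # S600: inverse normalization (assumed [-1, 1] range) and truncation.
    pixels = np.clip((x + 1.0) * 127.5, 0.0, 255.0)
    return pixels.astype(np.uint8)

image = generate((3, 8, 8))
```

With a real scoring network, `score_fn` would be replaced by the network's forward pass, and the update rule would follow the application's own probability-flow formula.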

[0014] In this application, for S100, the gamma noise-based image generation inference system initially acquires the computational environment data and model weight data required for the inference scene from local memory or video memory. The computational environment data includes a preset continuous discrete time grid, the step size of adjacent time steps, preset noise scheduling parameters for the corresponding time steps, preset noise intensity hyperparameters, and preset scheduling extremum parameters; the model weight data is the pre-trained scoring network parameters.

[0015] In this context, the continuous discrete time grid can be understood as a global timeline that controls the evolution of an image from a purely random state to a state with clear physical semantics. It contains at least a series of time nodes decreasing from the starting point to the ending point, such as [example missing]. The preset noise intensity hyperparameter can be understood as a constant scaling benchmark that controls the intensity of the gamma perturbation across the whole system. The preset scheduling extremum parameters can be understood as the theoretical signal-to-noise ratio limits of the diffusion evolution process in the highest and lowest noise states. The pre-trained scoring network parameters can be understood as a deep neural network weight dictionary that characterizes the vector field distribution of the target data manifold space, and which includes at least the weight matrix and bias vector of each network layer in a multilayer perceptron or U-Net architecture.

[0016] The system acquires the computational environment data and model weight data for the inference scenario in real time through a configuration read interface and non-volatile memory. The preset noise intensity hyperparameter is a static constant read from the system configuration file, and its value can be selected according to the pixel value range of the image data being processed. For example, to ensure that the scale of the gamma noise variance is sufficient to cover the true variance of complex high-dimensional data, it can be statically set to [value missing]. The preset scheduling extremum parameters comprise a maximum scheduling extremum and a minimum scheduling extremum, and their range can be selected according to the fidelity requirements of the actual generation task. For example, to provide enough prior noise at the starting point of reverse generation to completely erase structural information while retaining as much clear image detail as possible at the end point, the extremum parameters can be set to [value missing] and [value missing]. The preset continuous discrete time grid and the step size of adjacent time steps are generated by a grid partitioning module that performs discretized sampling within the continuous time domain. The granularity of the grid, i.e., the total number of time steps, can be adjusted dynamically according to edge computing power limits and output quality requirements. For example, to achieve second-level image output on edge computing devices such as mobile terminals, a uniform partitioning strategy can be adopted with the total number of steps set to [value missing], in which case the step size of adjacent time steps is constant. For generating medical-grade high-precision speckle images, the total number of steps can be set to [value missing] or even higher.
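A uniform partitioning of the time domain can be sketched as follows. The interval endpoints (1 down to 0), the step count of 4, and the name `uniform_time_grid` are illustrative assumptions, since the concrete values are missing from this text.

```python
import numpy as np

def uniform_time_grid(num_steps, t_start=1.0, t_end=0.0):
    """Return a decreasing uniform grid and its constant step size."""
    # num_steps intervals require num_steps + 1 time nodes.
    grid = np.linspace(t_start, t_end, num_steps + 1)
    step = (t_start - t_end) / num_steps
    return grid, step

# Illustrative step count only; the source does not state the actual value.
grid, step = uniform_time_grid(4)
```

A smaller `num_steps` trades output quality for speed, which is the trade-off the paragraph above describes for edge devices versus high-precision generation.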

[0017] The preset noise scheduling parameters corresponding to each time step in the continuous discrete time grid can be computed dynamically in real time by a parameter calculation unit based on a preset geometric scheduling function. For example, in this application, obtaining these parameters includes: obtaining the preset continuous time variable of the continuous discrete time grid; and, from the continuous time variable and the preset maximum and minimum scheduling extremum parameters, computing according to the formula [formula missing] to generate the preset noise scheduling parameter for the corresponding time node.
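The scheduling formula itself is missing from this text. As a hedged illustration only, the sketch below uses the geometric form commonly seen in score-based diffusion models, in which the schedule moves from the minimum extremum at time 0 to the maximum extremum at time 1 by a constant ratio. Both the formula and the name `geometric_noise_schedule` are assumptions, not the application's confirmed definition.

```python
import numpy as np

def geometric_noise_schedule(t, sched_min, sched_max):
    """ASSUMED geometric schedule: sched_min * (sched_max / sched_min) ** t."""
    t = np.asarray(t, dtype=np.float64)
    return sched_min * (sched_max / sched_min) ** t

# Illustrative extremes; the source does not state the actual values.
times = np.linspace(0.0, 1.0, 5)
values = geometric_noise_schedule(times, sched_min=0.01, sched_max=1.0)
ratios = values[1:] / values[:-1]  # constant for a geometric schedule
```

On an evenly spaced grid, consecutive values of a geometric schedule differ by a fixed multiplicative factor, which is what `ratios` checks.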

[0018] The acquired pre-trained scoring network parameters are handled by the tensor preprocessing unit embedded in the loading module, which uses floating-point precision alignment and tensor memory allocation operations to parse and deserialize the serialized model file in non-volatile memory (such as a pre-trained weight file on hard disk) and transfer it to the high-speed video memory of the graphics processing unit (GPU) or a dedicated artificial intelligence accelerator card. This forms a basic computing environment and a network weight tensor with unified dimensions that conform to the calling standards of the numerical solution operators of the underlying differential equation.

[0019] In this application, for S200, the initialization based on gamma noise includes a prior state initialization unit, which dynamically instantiates, in the video memory or main memory of the computing device, a purely random tensor without any prior structural information about the target image at the absolute starting point of the reverse evolution inference trajectory. The prior state initialization unit obtains an initial gamma noise tensor by sampling from a preset gamma distribution, using the preset noise intensity hyperparameter and the preset noise scheduling parameter corresponding to the maximum time step in the continuous discrete time grid, and generates an initial state tensor by performing algebraic operations on the initial gamma noise tensor.

[0020] The maximum time step can be understood as the endpoint of the forward diffusion process, that is, the absolute starting point of the reverse generation process. The preset noise scheduling parameter corresponding to the maximum time step can be understood as the signal attenuation limit value of the system when it is in the state of highest information entropy. The preset gamma distribution can be understood as a one-sided asymmetric probability density function with specific shape and scale parameters, sampled at the underlying level by calling a pseudo-random number generator (PRNG). The initial gamma noise tensor can be understood as a multi-dimensional floating-point array in video memory whose elements are independent and identically distributed gamma random values. The initial state tensor can be understood as the data structure that, after mean offset correction, serves as the actual input for the first round of forward inference calculations in the scoring network.

[0021] The initial gamma noise tensor is obtained from the preset gamma distribution by the prior state initialization unit, which calls the random number sampling operator of the underlying deep learning framework. The preset gamma distribution is configured as follows: the noise intensity hyperparameter and the preset noise scheduling parameter corresponding to the maximum time step are read from system memory, the shape parameter of the gamma distribution is dynamically configured as [formula missing], and the scale parameter is configured as [formula missing]. The tensor shape of the generated initial gamma noise tensor can be allocated dynamically according to the pixel resolution and channel count of the target to be generated. For example, to generate standardized high-definition color target images, the tensor dimensions can be set to [value missing] (corresponding to the three RGB channels and the [value missing] resolution), thereby instantiating the corresponding initial gamma noise tensor in hardware.

[0022] For the acquired initial gamma noise tensor, the initial state tensor is generated with a tensor-scalar subtraction performed by the tensor arithmetic logic unit (ALU) embedded in the data preprocessing module. Since the support of the original gamma distribution is only the positive half-axis, the sampled initial gamma noise tensor is a matrix of all positive numbers. To prevent the activation functions of subsequent deep neural networks from causing "vanishing gradients" and "neuron death" when processing large-scale positively biased data, and to maintain the stability of global floating-point calculations, this matrix must be shifted left and centered.

[0023] The specific machine operation process is as follows: the acquired noise intensity hyperparameter and the preset noise scheduling parameter are read and multiplied to determine a fixed offset scalar, and this scalar is subtracted element-wise from the initial gamma noise tensor in video memory to align the mean of the one-sided heavy-tailed data, thereby forming an initial state tensor that is dimensionally normalized, numerically centered, and meets the input requirements of the underlying network.
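The sampling and centering described in [0021] to [0023] can be sketched in NumPy. The offset (noise intensity times scheduling parameter) follows the text; the shape and scale configuration is missing from this text, so the sketch assumes shape equal to the offset and scale equal to 1, which makes the gamma mean equal to the offset. The tensor size and the name `init_state` are likewise illustrative.

```python
import numpy as np

def init_state(tensor_shape, noise_intensity, sched_at_max_step, seed=0):
    """Sample a gamma noise tensor and subtract a fixed offset scalar."""
    rng = np.random.default_rng(seed)
    # Fixed offset scalar: product of the two preset parameters (per the text).
    offset = noise_intensity * sched_at_max_step
    # ASSUMPTION: shape = offset, scale = 1, so E[noise] = shape * scale = offset.
    noise = rng.gamma(shape=offset, scale=1.0, size=tensor_shape)
    # Element-wise subtraction centers the one-sided samples around zero.
    state = noise - offset
    return noise, state

# Illustrative 3-channel tensor; the actual resolution is not stated here.
noise, state = init_state((3, 64, 64), noise_intensity=2.0, sched_at_max_step=1.5)
```

After centering, the state tensor contains both negative and positive values even though every raw gamma sample is non-negative.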

[0024] In this application, for S300, the inverse inference system based on probabilistic flow ordinary differential equations (ODE) or stochastic differential equations (SDE) includes: a dynamic velocity field calculation module, used to calculate in real time the velocity parameters driving the tensor state evolution within each cycle of the inverse generation iteration. For each current time step of a preset continuous discrete time grid, the dynamic velocity field calculation module obtains the corresponding current state tensor, and performs algebraic calculations using preset noise scheduling parameters and preset scheduling extremum parameters corresponding to the current time step to generate the effective diffusion coefficient for the current time step.

[0025] The current time step can be understood as the current discrete time node in the reverse denoising loop, for example a time node visited as the time step index decreases from [value missing] to 1. The current state tensor can be understood as an intermediate multidimensional floating-point matrix held in memory at the current time step, which is undergoing denoising evolution and contains a mixture of latent features of the target image and local gamma noise. The effective diffusion coefficient can be understood as the instantaneous evolution velocity scalar of the probability-flow ordinary differential equation at the current time point, which determines the theoretical step size of the system state's transition toward clear physical semantics at the current time step.

[0026] To obtain the current state tensor, the system uses its embedded memory read / write control interface to read tensor data in real time from the working memory buffer of the graphics processing unit (GPU) or neural network processor (NPU) at the beginning of each discrete time step cycle. The specific machine workflow is as follows: if the current time step is the absolute starting point of the reverse evolution, the initial state tensor is read directly and used as the current state tensor; if it is a subsequent iteration step, the state tensor of the previous time step that was calculated and output in the previous cycle is read and used as the current state tensor. To reduce the inference latency of the deep learning model and avoid frequent cross-bus data copying between main memory and graphics card memory, the state tensor can be kept resident in the device's high-bandwidth memory (HBM) and read and overwritten in place.

[0027] To generate the effective diffusion coefficient for the current time step, the floating-point arithmetic logic unit (ALU) or tensor core embedded in the dynamic velocity field calculation module calls the underlying mathematical operator library to perform division, natural logarithm, and multiplication operations. The specific machine workflow is as follows. First, the system reads the preset scheduling extremum parameters, i.e., the maximum scheduling extremum and the minimum scheduling extremum, and the ALU performs division and natural logarithm operations on them to generate an extremum logarithm scalar that remains constant throughout the global reverse evolution. Subsequently, within each time step loop, the system obtains the preset noise scheduling parameter corresponding to the current time step and multiplies it by the extremum logarithm scalar, strictly according to the formula [formula missing], to obtain the effective diffusion coefficient.

[0028] To ensure computational stability and prevent floating-point underflow or division-by-zero errors at extreme noise boundaries, the aforementioned logarithm and multiplication operations are all performed on the chip in single-precision floating-point (FP32) or double-precision floating-point (FP64) format. This operation generates an effective diffusion coefficient with accurate values and uniform dimensions, which provides accurate velocity field scaling weights for the edge score approximations output by the scoring network in subsequent steps.
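The computation described in [0027] and [0028] reduces to one logarithm and one multiplication in FP64. The sketch below assumes the division is maximum over minimum. The helper `geometric_schedule` is a hypothetical schedule form used only to illustrate that, under that assumption, the coefficient equals the time derivative of the schedule; the text does not state whether this interpretation is intended.

```python
import numpy as np

def effective_diffusion_coefficient(sched_t, sched_min, sched_max):
    """Schedule value at the current step times ln(sched_max / sched_min)."""
    # Computed in float64 to avoid precision loss near extreme noise levels.
    log_ratio = np.log(np.float64(sched_max) / np.float64(sched_min))
    return np.float64(sched_t) * log_ratio

def geometric_schedule(t, sched_min, sched_max):
    # ASSUMPTION: geometric schedule, not confirmed by the source text.
    return sched_min * (sched_max / sched_min) ** t

lo, hi, t, h = 0.01, 1.0, 0.3, 1e-6
coeff = effective_diffusion_coefficient(geometric_schedule(t, lo, hi), lo, hi)
# Central finite difference of the assumed schedule at time t.
finite_diff = (geometric_schedule(t + h, lo, hi)
               - geometric_schedule(t - h, lo, hi)) / (2.0 * h)
```

The extremum logarithm scalar depends only on the two extremes, so it can be computed once and reused in every iteration of the reverse loop.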

[0029] In this application, for S400, the deep feature extraction and score prediction module based on pre-trained weights includes a forward inference computation unit, which in each cycle of the reverse generation iteration calls the deep neural network model resident in the video memory of the computing device to predict the denoising direction field of the current system state in real time. Using the current state tensor and the current time step, the forward inference computation unit performs a forward action based on the pre-trained scoring network parameters (i.e., the forward propagation mechanism of the deep learning framework) to generate an edge score approximation for the current state.

[0030] The current state tensor can be understood as the multidimensional image matrix features (i.e., the input tensor) containing gamma noise of a specific magnitude in the current iteration period. The current time step can be understood as a time scalar (i.e., an input variable) indicating the position of the current system within the entire diffusion evolution process. The pre-trained scoring network parameters can be understood as a dictionary of neural network layer weights (i.e., fixed parameters) that is frozen and stored after convergence under the weighted error loss function during the model training phase, and that characterizes the gradient field of the data probability density. Forward actions can be understood as the large-scale parallel multiply-accumulate operations (MACs) that occur as the data stream passes sequentially through each convolutional layer, attention layer, and activation function layer of the neural network. The edge score approximation can be understood as the gradient feature tensor (i.e., the output tensor) that is fully aligned with the input state tensor in its spatial dimensions and indicates the direction of recovery toward the ideal noise-free image.

[0031] The system uses an embedded forward inference computation unit (for example, the Tensor Cores of a GPU) to schedule and execute the forward action of the neural network model in real time. Regarding the current time step: since neural networks struggle to process a single scalar value directly, a data preprocessing module is needed to map this input to high-dimensional features. For example, so that the scoring network can fully perceive the different diffusion stages and adaptively adjust its denoising feature extraction, sinusoidal positional encoding or Fourier feature embedding can be used to convert the current time step into a multidimensional time embedding vector, which is injected into each residual block of the scoring network.
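As an illustration of the time-step preprocessing described above, the following is a minimal sketch of a sinusoidal embedding in plain Python. The function name `sinusoidal_time_embedding`, the embedding width `dim`, and the `max_period` constant are assumptions made for this example; the application does not specify them.

```python
import math

def sinusoidal_time_embedding(t, dim, max_period=10000.0):
    """Map a scalar time step t to a dim-dimensional vector of sines and cosines."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Geometrically spaced frequencies, starting at 1 and decreasing toward 1/max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    # First half holds the sines, second half the matching cosines.
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

# Hypothetical usage: embed the time step t = 0.25 into 8 dimensions.
emb = sinusoidal_time_embedding(t=0.25, dim=8)
```

In a real scoring network this vector would typically pass through a small learned projection before being added inside each residual block; that part is omitted here.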

[0032] Regarding the forward action performed with the pre-trained scoring network parameters: the current state tensor and the time embedding vector are fed as inputs through the multi-layer neural network structure. For engineering reasons such as raising inference throughput (FPS) on edge devices, reducing peak memory usage, and accelerating matrix multiplication, the pre-trained scoring network parameters can be converted from standard single-precision floating point (FP32) to half-precision (FP16) or BFloat16 format before being loaded into memory. In addition, for the entire duration of the forward action, the memory that would otherwise store backward gradients is released through framework-level memory control (for example, by disabling gradient tracking in the backpropagation engine, i.e., freezing gradient recording on the computation graph).

[0033] Regarding the marginal score approximation generated for the current state: this is the tensor produced by the final output layer of the scoring network. To ensure that the subsequent ordinary or stochastic differential equations can be numerically integrated with pixel-level accuracy, the output layer of the scoring network has no nonlinear activation function (i.e., it uses a linear output), so that the marginal score approximation can take unbounded floating-point values, both negative and positive. Its tensor shape is kept identical to that of the current state tensor, which provides a dimensionally aligned basis for constructing the subsequent explicit integration update formula.

[0034] In this application, for S500, the reverse evolution integration module based on the probability flow ordinary differential equation (ODE) includes a state update and loop control unit. Given the evolution velocity field and the denoising gradient, this unit performs numerical integration along the continuous time trajectory to complete the stepwise denoising and structural reconstruction of image features. Starting from the initial state tensor, the unit uses the marginal score approximation and the effective diffusion coefficient at the current time step, together with the step size between adjacent time steps, to perform a state update action along the deterministic probability flow velocity field. This generates the state tensor of the previous time step corresponding to the current state tensor, and the procedure iterates with a decreasing time step index until the state tensor at every step of the discrete time grid has been generated.

[0035] In this context, the deterministic probability flow velocity field can be understood as an equivalent mapping of the complex evolution process, which originally contains discrete gamma random jumps, onto a smooth ordinary differential equation vector field free of random perturbation. The state update action can be understood as calling a numerical integrator (such as the explicit Euler method or a higher-order Runge-Kutta method) to compute the increment of the state tensor over a very short time slice. The state tensor of the previous time step can be understood as a multidimensional intermediate state matrix (i.e., a variable) that has moved one step closer to the real image domain along the time axis, with a slightly lower noise ratio and slightly stronger semantic features. Iterating until the state tensor of each step has been generated can be understood as decrementing loop control logic at the system level, which runs until the system time reaches zero.

[0036] For the state update action along the deterministic probability flow velocity field, the tensor floating-point unit (FPU) embedded in the state update and loop control unit calls the underlying Basic Linear Algebra Subprograms (BLAS) library to perform multidimensional tensor multiply-accumulate (MAC) operations. The machine workflow is as follows. First, the system reads from working video memory the effective diffusion coefficient (a scalar) and the marginal score approximation (a matrix), performs the tensor-scalar multiplication, and multiplies the result by a preset constant to generate a discrete velocity field tensor that represents the direction and magnitude of the current instantaneous velocity. Next, the step size between adjacent time steps is obtained from the system clock or a configuration table and multiplied by the discrete velocity field tensor to give the state change increment within the current time slice. Finally, the current state tensor is read and added to the state change increment. To accelerate on-device inference and avoid the lengthy random sampling of traditional Markov chains, this embodiment configures the integrator as a first-order explicit Euler integrator, so that the state tensor of the previous time step is generated by a fast deterministic computation that strictly follows the explicit Euler update formula.

[0037] To generate the state tensor at each step of the discrete time grid in sequence, the system's main control program uses control flow instructions (such as a for loop or a while loop) to alternate between forward actions and state update actions. For example, to reduce peak memory usage, once the state tensor of the previous time step has been generated in a loop iteration, the framework uses memory pointer redirection or an in-place overwrite so that the newly generated matrix data physically overwrites the old matrix data in video memory, and the computation graph cache is cleared automatically. This decrementing loop starts from the initial state tensor (the starting point in time) and runs until the time step index reaches the end of the discrete time grid, leaving in the video memory buffer a final noise-free state tensor from which the gamma noise features have been completely removed.
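The update and loop described in the two paragraphs above can be sketched as follows. This is a minimal illustration in plain Python, not the application's implementation: the names `euler_step` and `integrate_reverse`, the callables `score_fn` and `d_eff_fn`, and in particular the value `c = -0.5` for the preset constant are assumptions, since the source text does not preserve the constant or the exact update formula.

```python
def euler_step(x, score, d_eff, dt, c=-0.5):
    """One explicit Euler update x <- x + dt * (c * d_eff * score), done in place.

    c is the preset constant from the text; -0.5 is an assumed value.
    """
    for i in range(len(x)):
        velocity = c * d_eff * score[i]  # entry of the discrete velocity field
        x[i] += dt * velocity            # add the state change increment
    return x

def integrate_reverse(x, times, score_fn, d_eff_fn):
    """Walk the grid from times[-1] down to times[0], overwriting x at every step."""
    for n in range(len(times) - 1, 0, -1):
        t = times[n]
        dt = times[n - 1] - t            # negative step: integration runs backward in time
        score = score_fn(x, t)           # forward action (stand-in for the scoring network)
        euler_step(x, score, d_eff_fn(t), dt)
    return x

# Hypothetical usage with a toy linear score in place of a trained network.
grid = [i / 100 for i in range(101)]
state = integrate_reverse([1.0, -2.0], grid, lambda x, t: [-v for v in x], lambda t: 2.0)
```

Overwriting the list in place mirrors the in-place memory reuse described above; only one state buffer is live at any time.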

[0038] In this application, for S600, the image post-processing and rendering output module based on the explicit integration sequence includes a tensor inversion and pixel mapping unit. After the reverse generation loop has terminated, this unit converts the unbounded floating-point matrix residing in video memory into valid pixel data that conforms to a physical display device or a standard image file format. The unit obtains the final state tensor left after iterating over the discrete time grid and performs output actions such as inverse normalization and numerical truncation to form the target generated image.

[0039] In this context, the state tensors of the steps of the discrete time grid can be understood as the sequence of multidimensional state features generated in turn by the underlying integrator from the pure gamma noise starting point to time zero, with the state tensor (i.e., a variable) at the final time node being the data fitted to the true distribution after the prior gamma noise has been completely removed. The output action can be understood as converting the abstract floating-point tensor data structure of the deep learning framework into the pixel value domain and dumping it from memory, for the human visual system or for a general computer vision interface (API). The target generated image can be understood as a two-dimensional or three-dimensional digital media file that has a standard color space (such as RGB), a valid bit depth (such as 8-bit unsigned integer), and can be rendered directly on a physical display.

[0040] To acquire the state tensor and perform output preprocessing, the tensor inversion and pixel mapping unit obtains the final state tensor generated by the main control program when the time step index has decreased to the minimum time node. During model training and the initial stage of inference, the input feature data is usually pre-normalized to a specific compact floating-point range, for example to improve the convergence stability of the scoring network weights. The system therefore calls the floating-point arithmetic logic unit (ALU) of the graphics processor to perform the inverse normalization on the final state tensor. Specifically, the system applies scalar addition and division according to the inverse normalization formula, mapping the pre-defined interval to the standard image pixel value range and generating a numerical matrix in that range.

[0041] To output the numerical matrix as the target generated image, the system uses an embedded numerical comparator and a data type conversion operator to perform outlier truncation and quantization on the numerical matrix. Because of the asymmetry and heavy-tailed jump characteristics of the gamma distribution, and because of the truncation error accumulated by numerically integrating the probability flow ODE with a finite step size, the numerical matrix often contains values outside the standard pixel value range. The system identifies these outlier pixel locations. To eliminate overly bright or dark artifacts in the final displayed image, it performs a numerical truncation (i.e., a Clamp instruction) on them: all negative pixel values are forcibly rewritten to 0, and all values above the upper bound of the range are forcibly rewritten to that upper bound. This yields floating-point pixel data that lies entirely within the valid pixel range. The system then executes a data type conversion instruction that multiplies the floating-point data by a scaling scalar, quantizes it by rounding down, and converts it to an unsigned 8-bit integer type (UInt8). Finally, the unsigned integer data is encapsulated and output as the target generated image (in JPEG, PNG, or another standard format) through the video memory output bus or an image encoding API, completing the end-to-end unconditional image generation process based on gamma noise.
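The post-processing chain of the two paragraphs above (inverse normalization, clamping, quantization) can be sketched as follows. The normalization interval [-1, 1], the intermediate range [0, 1], and the scaling scalar 255 are assumptions for this example; the source text does not preserve the actual values. The function name `to_uint8_pixels` is likewise hypothetical.

```python
import math

def to_uint8_pixels(x, lo=-1.0, hi=1.0):
    """Inverse-normalize from [lo, hi] to [0, 1], clamp, then quantize to 0..255."""
    out = []
    for v in x:
        u = (v - lo) / (hi - lo)                 # inverse normalization: shift, then divide
        u = min(1.0, max(0.0, u))                # clamp values that left the valid range
        out.append(int(math.floor(u * 255.0)))   # scale and round down to an 8-bit value
    return out

# Hypothetical usage: two of the six values lie outside the assumed [-1, 1] range.
pixels = to_uint8_pixels([-1.7, -1.0, 0.0, 0.5, 1.0, 3.2])
```

Clamping before quantization matters: converting an out-of-range float straight to an 8-bit integer would wrap around or saturate in an implementation-dependent way.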

[0042] Symbol explanation. The symbols used in the derivations denote, respectively: the original sample value; the value of the noisy sample at a given diffusion time; the diffusion time or noise level; the model parameters; the probability density function of the original data distribution; the conditional probability density function of forward noising; the marginal probability density function of the noisy sample; the posterior conditional probability density function; the expectation taken under a stated distribution; the expectation taken under a given condition; the gradient with respect to a variable; the score; the conditional score; and the output of the scoring network parameterized by the model parameters.

Construction of a diffusion model driven by gamma noise

This application replaces traditional Gaussian noise with gamma noise to construct a diffusion process with an asymmetric perturbation mechanism, and on this basis establishes a trainable scoring network and an implementable reverse sampling algorithm. Unlike the continuous trajectories of Gaussian diffusion, gamma noise naturally corresponds to the independent-increment structure of a pure jump process, and is therefore better suited to random perturbations with sparse, abrupt changes. The presentation follows the line from distribution to algorithm: first, the distribution structure of the noise increment is characterized; second, a synthesizable forward noising chain is constructed; then, the analytical forms of the conditional density and conditional score are derived and the training objective is given accordingly; finally, an implementable reverse generation path is established, closing the loop from modeling to sampling.

Basic properties of the gamma distribution

[0043] There are two common notations for the gamma distribution: the shape-scale form and the shape-rate form. They differ only in the meaning of the second parameter: the former characterizes the scaling of the distribution with a scale parameter and the latter with a rate parameter, and the two parameters are reciprocals of each other. Subsequent derivations uniformly adopt the shape-scale notation and retain its scale parameter as the noise intensity hyperparameter of the diffusion process described later.
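To make the two notations concrete, the sketch below writes both densities in their standard textbook form and checks that they agree when the rate is the reciprocal of the scale. The symbols `k` (shape), `theta` (scale), and `beta` (rate) and the function names are assumptions chosen for this example; the application's own symbols were lost in the source text.

```python
import math

def gamma_pdf_scale(x, k, theta):
    """Shape-scale density: x^(k-1) exp(-x/theta) / (Gamma(k) theta^k) for x > 0."""
    if x <= 0:
        return 0.0
    return math.exp((k - 1) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta))

def gamma_pdf_rate(x, k, beta):
    """Shape-rate density: beta^k x^(k-1) exp(-beta x) / Gamma(k) for x > 0."""
    if x <= 0:
        return 0.0
    return math.exp(k * math.log(beta) + (k - 1) * math.log(x) - beta * x - math.lgamma(k))

# With beta = 1 / theta the two forms describe the same distribution.
gap = abs(gamma_pdf_scale(2.5, 3.0, 0.5) - gamma_pdf_rate(2.5, 3.0, 2.0))
```

Working in log space and exponentiating once avoids overflow in `Gamma(k)` for large shape values.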

[0044] In characterizing the diffusion increment, deterministic overall translations must be allowed in addition to random fluctuations. To this end, a three-parameter gamma distribution with a location parameter is introduced: a gamma-distributed random variable is shifted by the location parameter, and its probability density function is the gamma density shifted accordingly. The following are two fundamental properties of the three-parameter gamma distribution.

[0045] 1. Additivity: if two random variables following three-parameter gamma distributions are independent and have the same scale parameter, then their sum also follows a three-parameter gamma distribution.

2. Scalar multiplication: if a random variable follows a three-parameter gamma distribution, then its product with any positive constant also follows a three-parameter gamma distribution.

These properties indicate that, under the corresponding conditions, the three-parameter gamma distribution is closed under addition and proportional scaling, and the result still belongs to the gamma family.
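The two closure properties can be checked empirically. The sketch below assumes the standard parameter rules: for a common scale, shapes and locations add under summation; under multiplication by c > 0, scale and location are multiplied by c while the shape is unchanged. The helper names and the parameter values are hypothetical, and matching the first two moments is a sanity check rather than a proof.

```python
import random

def sample_gamma3(rng, k, theta, mu):
    """Draw from a three-parameter gamma: location mu plus Gamma(shape k, scale theta)."""
    return mu + rng.gammavariate(k, theta)

def mean_var(xs):
    """Empirical mean and (biased) variance of a list of samples."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(0)
n = 100_000

# Additivity: (k=2, theta=0.5, mu=-1.0) + (k=3, theta=0.5, mu=0.2)
# should behave like (k=5, theta=0.5, mu=-0.8): mean 1.7, variance 1.25.
sums = [sample_gamma3(rng, 2.0, 0.5, -1.0) + sample_gamma3(rng, 3.0, 0.5, 0.2)
        for _ in range(n)]

# Scaling by c = 3: (k=2, theta=0.5, mu=-1.0) should become
# (k=2, theta=1.5, mu=-3.0): mean 0.0, variance 4.5.
c = 3.0
scaled = [c * sample_gamma3(rng, 2.0, 0.5, -1.0) for _ in range(n)]

print(mean_var(sums), mean_var(scaled))
```

The mean of a three-parameter gamma is `mu + k * theta` and its variance is `k * theta**2`, which is where the target values in the comments come from.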

Gamma noise-driven diffusion model

[0046] This application presents a gamma noise-driven forward diffusion process. First, the continuous-time state equation is written in terms of state-dependent gamma increments and then discretized with the Euler scheme to obtain the one-step transition density. Next, a class of synthesizable discrete forward chains is constructed such that the perturbed sample at any time can be generated directly from a single gamma sample, which provides a closed-form starting point for the subsequent conditional density, score function, and training objective.

[0047] Forward and backward modeling of state-dependent gamma increments

Scoring function and weighted training objective

The scoring network uses the conditional score as its supervisory signal and is trained with weighted denoising score matching. The weighting function is chosen to be of the same order as the reciprocal of the second moment of the conditional score. From the closed-form forward synthesis, for a given noise level the injected noise increment follows a three-parameter gamma distribution, and its log-conditional density is given by (3.5) up to an irrelevant additive constant. Differentiating (3.5) gives the conditional score (3.6). To calculate the second moment in (3.4), an auxiliary random variable is introduced that follows an inverse gamma distribution, whose mean and variance are known in closed form. The second moment then follows from (3.6), which fixes an admissible choice of weight, (3.9). Substituting (3.9) into (3.3) gives an equivalent form of the objective. Furthermore, the weighted conditional score can be written as (3.11). As (3.11) shows, the weighted supervision term is essentially the centered and scale-normalized form of this inverse gamma random variable, that is, its standardized form. As the shape parameter of the gamma noise increases, the skewness and heavy tails of the inverse gamma distribution are significantly reduced, and the standardized variable gradually approaches a symmetric, light-tailed distribution. Therefore, although this application uses gamma noise, the resulting weighted supervision signal still maintains an interpretable connection with the standardized score learning mechanism of the traditional Gaussian noise diffusion model.

[0048] In one specific implementation, the parameter training process of the scoring network includes the following steps:

Step S100: Obtain training data and time step features.

[0049] An initial batch of image data is randomly selected from a pre-defined image training set. At the same time, a time step parameter for this batch is drawn uniformly at random from the preset discrete time grid, and the noise scheduling parameter corresponding to that time step is obtained.

[0050] Step S200: Generate gamma noise and construct the forward perturbed sample.

[0051] According to the noise scheduling parameter and the preset noise intensity hyperparameter, a gamma noise tensor is sampled from the specified gamma distribution. The gamma noise tensor is injected into the initial image data and centered by subtracting its mean offset, which gives the noisy perturbed sample at that time step in closed form.

Step S300: Calculate the theoretical conditional score target and the second-moment weight.

[0052] The actually injected noise increment variable is calculated. From this variable, the noise intensity hyperparameter, and the scheduling parameter, the theoretical conditional score target used as the network supervision signal is computed analytically. Meanwhile, to balance the gradient scale across time steps, the second-moment weighting coefficient, which depends on the current noise level, is calculated.

Step S400: Perform network forward propagation to obtain the predicted score.

[0053] The noisy perturbed sample and the corresponding time step parameter are fed as a joint input into the scoring network being trained, and forward propagation outputs the predicted score for the current state.

Step S500: Calculate the weighted error loss and update the network parameters.

[0054] For each sample in the batch, the squared error between the predicted score and the theoretical conditional score target is computed and multiplied by the second-moment weighting coefficient. The mean is then taken over the batch dimension to form the weighted denoising score matching loss. The gradient of the loss with respect to the network parameters is computed, and the scoring network parameters are updated by gradient descent with the preset learning rate.

Step S600: Iterate until convergence.

[0055] Steps S100 to S500 are executed repeatedly until the scoring network parameters satisfy the preset convergence condition (such as the loss stabilizing or the maximum number of iterations being reached), and the trained scoring network is output.
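Steps S200, S300, and the loss of S500 can be sketched as follows. The source text does not preserve the formulas, so this example rests on a loudly stated assumption: the perturbed sample is the clean sample plus Gamma(k, theta) noise minus the noise mean k * theta, with the schedule supplying the shape k and the noise intensity hyperparameter acting as the scale theta. Under that assumption the conditional score and its second moment follow from the standard gamma and inverse gamma formulas. All function names are hypothetical, and the gradient step of S500 is omitted because it needs an automatic differentiation framework.

```python
import random

def perturb(rng, x0, k, theta):
    """S200: add gamma noise and subtract its mean k*theta. Returns (x_t, eps)."""
    eps = [rng.gammavariate(k, theta) for _ in x0]
    x_t = [a + e - k * theta for a, e in zip(x0, eps)]
    return x_t, eps

def conditional_score(eps, k, theta):
    """S300: derivative of log Gamma(eps; k, theta) w.r.t. the noisy sample."""
    return [(k - 1.0) / e - 1.0 / theta for e in eps]

def second_moment_weight(k, theta):
    """S300: reciprocal of E[score^2] = 1 / (theta^2 (k - 2)); needs k > 2."""
    if k <= 2.0:
        raise ValueError("weight requires shape k > 2")
    return theta ** 2 * (k - 2.0)

def weighted_dsm_loss(pred, target, weight):
    """S500: mean over entries of weight * (prediction - target)^2."""
    return weight * sum((p - s) ** 2 for p, s in zip(pred, target)) / len(pred)

rng = random.Random(0)
k, theta = 10.0, 0.5
x0 = [0.0] * 50_000
x_t, eps = perturb(rng, x0, k, theta)
target = conditional_score(eps, k, theta)
w = second_moment_weight(k, theta)
# Loss of a trivial predictor that always outputs zero; the weight normalizes it to about 1.
baseline = weighted_dsm_loss([0.0] * len(target), target, w)
```

The baseline value illustrates the purpose of the weight: after weighting, the supervision target has roughly unit second moment regardless of the noise level, so no time step dominates the gradient.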

[0056] In one specific implementation, the process of generating an image by reverse sampling with the trained scoring network includes the following steps:

Step S100: Obtain the prior distribution and the initial state.

[0057] Based on the preset maximum discrete time step and its corresponding noise scheduling parameter, an initial gamma noise tensor is sampled from the specified gamma distribution, whose parameters include the preset noise intensity hyperparameter. The initial gamma noise tensor is centered by subtracting its mean offset, which gives the initial state tensor at the starting point of reverse sampling.

[0058] Step S200: Enter the reverse discrete time step iteration.

[0059] The current discrete time step variable is decreased step by step from the maximum time step, and at each time step the loop of steps S300 to S600 is executed in sequence.

[0060] Step S300: Calculate the step size scheduling difference.

[0061] The noise scheduling parameter of the current time step and that of the previous time step are obtained, and their difference is calculated to characterize the relative noise span of the current iteration step.

[0062] Step S400: Perform network forward prediction to obtain the marginal score.

[0063] The current iteration state tensor and the current time step are fed as input to the trained scoring network, whose forward computation outputs the marginal score approximation for the current state.

[0064] Step S500: Calculate the reverse increment and the state correction coefficient.

[0065] Based on the step size scheduling difference and the noise intensity hyperparameter, a local gamma noise increment is drawn independently from a gamma distribution with the corresponding parameters. At the same time, the marginal score approximation and the noise intensity hyperparameter are combined to calculate the correction coefficient that controls the scale of the local reverse evolution.

[0066] Step S600: Update and output the state for the next iteration.

[0067] Using the step size scheduling difference, the local gamma noise increment, and the correction coefficient, a reverse dynamics update is applied to the current state tensor to obtain the state tensor of the previous time step.

Step S700: End the loop and output the generated image.

[0068] When the time step variable has decreased to the end of the grid and the loop is complete, the iteration stops and the final state tensor is output as the generated image data.

ODE evolution

[0069] Building on the gamma noise-driven forward process constructed in the previous section, this application presents a deterministic probability flow ODE consistent with its distribution evolution, which forms a deterministic generation path suitable for sampling by numerical integration. The core idea is to rewrite the effect of the stochastic dynamics at the distribution level as an identity for the expectation of a test function, then obtain the density evolution equation through integration by parts, and finally rewrite it as a continuity equation from which the probability flow velocity field can be read off. The velocity field advances samples along deterministic characteristic curves at each time, and thereby stays consistent with the stochastic process at the level of densities. Unlike discrete jump increment sampling, advancing the probability flow ODE does not require explicit generation of gamma jump increments; instead, the influence of random perturbations on the density is absorbed into a deterministic velocity field through the score function. The stochastic diffusion path of the previous section and the ODE path presented here are therefore equivalent at the level of marginal distributions, and correspond to stochastic sampling and deterministic integration respectively.

[0070] It should be noted that gamma noise corresponds to the independent-increment structure of a pure jump process, and its rigorous generator typically leads to a Kolmogorov-Feller type forward equation containing integral terms. At the scale of the time step, this application adopts a second-order truncation approximation based on the first- and second-order conditional moments to express the density evolution in second-order partial differential form, so that the construction of the probability flow ODE keeps the same derivation framework as the classical diffusion model. This treatment is consistent with the second-order truncation of the Kramers-Moyal expansion commonly used in the diffusion model literature, which characterizes the dominant influence of the increment on the expectation of the test function through drift and diffusion terms and neglects the remaining terms as higher-order infinitesimals. Under the parameterization adopted in this application, the first-order and second-order contributions are of the same order, so the density evolution given by the second-order truncation can serve as an effective distribution-level approximation in the small-step limit. The significance of the probability flow ODE is that it reproduces the characteristic-curve motion of this effective density evolution with a deterministic velocity field, so that the sampler can replace the stepwise simulation of random jump increments with numerical integration of an ODE and concentrate all randomness in the sampling of the initial distribution.

[0071] Let the marginal density of the state be given, and take test functions that are smooth with compact support. If the expectation identity holds for every such test function, then the corresponding density equation holds in the sense of distributions, almost everywhere. This criterion implies that once the expectation identity is established for all compactly supported smooth test functions, the partial differential equation that uniquely characterizes the density evolution is obtained. On this basis, this application starts from the effect of a one-step discrete update on the expectation of the test function, derives the density equation step by step, and reads off the probability flow velocity field.

[0072] Starting from the state-dependent gamma increment of the previous section, consider its Euler discretization, whose coefficients are the drift term, the noise intensity, and the shape strength. Because this application uses the three-parameter gamma notation, the conditional mean and second moment of the increment are available in closed form for the given scale and shape. This means that, at the scale of the time step, the first- and second-order contributions of the increment dominate, while the contributions of higher-order moments are of higher order. On this basis, this application adopts the second-order truncation of the Kramers-Moyal expansion and retains the dominant terms determined by the first- and second-order conditional moments, which gives the density evolution equation as a second-order partial differential equation and allows the probability flow ODE to be constructed. In the diffusion model literature, this step corresponds to expressing the dominant influence of random perturbations on the density evolution through a drift coefficient and a diffusion coefficient, and neglecting the remaining terms as higher-order infinitesimals.

[0073] Performing a second-order Taylor expansion of the test function at the updated state about the current state gives (3.14). Taking the conditional expectation of (3.14) and retaining terms up to the leading order in the time step gives the one-step identity. Taking the expectation again and writing it in integral form, then subtracting the expectation at the current time from both sides, dividing by the time step, and letting the time step tend to zero, gives (3.17). Equation (3.17) characterizes the density evolution as seen through the test function; the first and second derivatives of the test function appear in it, corresponding respectively to the translation effect and the curvature effect of the one-step increment on the function value. The next step is to transfer the derivatives from the test function to the density, which yields the partial differential equation.

[0074] Using the compact support of the test function and integrating by parts gives (3.19) and (3.20). Substituting (3.19)-(3.20) back into (3.17) and combining with (3.12) gives the Fokker-Planck equation (3.21). Equation (3.21) gives the density evolution in the sense of the second-order truncation: the first term corresponds to the mass transport caused by the effective drift, and the second term to the mass dispersion caused by the effective diffusion coefficient. This expression has the same form as the density evolution in the classical diffusion model, so the subsequent construction of the probability flow ODE can follow the existing derivation structure. More specifically, if the density is viewed as a probability mass distribution evolving over time, the first term describes the transport of mass under the velocity field, and the second term describes the spreading and redistribution caused by the local diffusion intensity.

[0075] To obtain a deterministic ODE path, equation (3.21) is rewritten in the form of a continuity equation by introducing a suitable velocity field. Direct expansion verifies that (3.21) and the continuity equation are equivalent. The probability flow ODE is then obtained by applying the method of characteristics to the continuity equation. Substituting (3.18) back into (3.22) gives its explicit form. Equations (3.23)-(3.24) show that if the marginal score can be obtained, the sample trajectory can be advanced by a deterministic velocity field without introducing random increments, and the resulting set of trajectories is consistent with the density evolution of (3.21) at the level of marginal distributions.

[0076] In the previous section, so that the perturbed sample at any noise level can be synthesized directly from a single gamma sample, this application employs a geometric schedule, from which the continuous-time coefficients (3.25) are obtained. Substituting (3.25) into (3.18) gives (3.26) and (3.27). As (3.26)-(3.27) show, under the schedule selected in this application the effective drift terms cancel in this approximation, so the probability flow velocity field is determined only by the diffusion term and the score term. In this case (3.22) simplifies accordingly, and when the effective diffusion coefficient varies only with time it simplifies further. In actual generation, the marginal score is approximated by the scoring network, so the probability flow ODE can be written in the implementable form (3.30). Equation (3.30) is the dynamical system ultimately used for ODE sampling in this application: its time-dependent coefficient is determined entirely by the noise schedule, and the network only needs to approximate the marginal score at any given time.

[0077] Take a discrete time grid and denote the step size between adjacent nodes. At each time node the network outputs the marginal score approximation, which is substituted into the velocity field and advanced with the explicit Euler method to give the deterministic sampling path. Specifically, the state is initialized at the final time and advanced step by step in the reverse time direction to the initial time. Because the right-hand side of (3.30) gives the instantaneous velocity at a given time, one explicit Euler step uses the velocity at the current node to approximate the average velocity over the whole interval, which yields a simple and stable deterministic iterative scheme. The iteration contains no explicit random term, so the sampling trajectory is deterministic and reproducible for a fixed initialization and fixed network parameters. The sampling cost is determined mainly by the number of steps, and the computational cost of each step by the forward computation of the network.

[0078] In some embodiments, a method for generating gamma-noise images based on probability flow ordinary differential equations (ODEs) includes the following steps: Step S100: Construct the time grid and initialize the prior states.

[0079] Discrete-time grid over a fixed continuous time domain And calculate the step size between adjacent time nodes. Get the maximum time step. Corresponding noise scheduling parameters From the preset gamma distribution Medium sampling obtains the initial noise tensor The initial state tensor of the reverse sampling is obtained through centralized calculation. ;in This is the preset noise intensity hyperparameter.

[0080] Step S200: Enter the deterministic inverse numerical integration iteration.

[0081] Set a current time-step index variable and decrease it step by step from the maximum number of steps to its final value. For each index, steps S300 to S500 are executed in sequence.

[0082] Step S300: Calculate the effective diffusion coefficient for the current time step.

[0083] Obtain the noise scheduling parameter of the current time step and, combined with the two preset scheduling extreme-value parameters, calculate the effective diffusion coefficient that characterizes the diffusion velocity field at the current moment.
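The exact formula for the effective diffusion coefficient is not reproduced in this text. As a hedged illustration only, the sketch below assumes the geometric schedule takes the familiar VE-SDE form and that the coefficient is the time derivative of the squared noise scale. The function names and the values `sigma_min=0.01` and `sigma_max=50.0` are illustrative assumptions, not values taken from this application.

```python
import math

def sigma_geometric(t, sigma_min=0.01, sigma_max=50.0):
    """Geometric schedule: sigma(t) = sigma_min * (sigma_max / sigma_min) ** t for t in [0, 1]."""
    return sigma_min * (sigma_max / sigma_min) ** t

def effective_diffusion_sq(t, sigma_min=0.01, sigma_max=50.0):
    """ASSUMPTION: VE-style coefficient g(t)^2 = d(sigma^2)/dt = 2 * sigma(t)^2 * ln(sigma_max / sigma_min)."""
    s = sigma_geometric(t, sigma_min, sigma_max)
    return 2.0 * s * s * math.log(sigma_max / sigma_min)

if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0):
        print(t, sigma_geometric(t), effective_diffusion_sq(t))
```

Under this assumption the coefficient depends only on the current noise level and the two schedule extremes, which matches the inputs listed for step S300.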

[0084] Step S400: Perform forward prediction of the network to obtain edge scores.

[0085] The current iteration state tensor and the current continuous time step are fed jointly into the trained score network. The forward pass of the network outputs an approximate marginal (edge) score for the current state, which mathematically approximates the gradient of the log probability density of the real data distribution.

[0086] Step S500: Perform state update of the probabilistic flow velocity field.

[0087] Based on the explicit Euler method, the current state tensor is advanced along the deterministic probability flow trajectory using the effective diffusion coefficient, the approximate marginal score, and the step size, which gives the state tensor of the previous time node.

Step S600: End the loop and output the generated image.

[0088] When the time-step index variable has been decremented to its final value, stop the iteration and output the final state tensor as the generated image data.
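Steps S100 to S600 can be sketched as a single loop. This is a minimal sketch under stated assumptions, not the application's exact update: the centered gamma prior uses a hypothetical shape `k` and a scale chosen so that the noise standard deviation equals `sigma_max`, the effective diffusion coefficient uses the VE-style form `2 * sigma^2 * ln(sigma_max / sigma_min)`, and `score_fn` stands in for the trained score network.

```python
import numpy as np

def sample_pf_ode(score_fn, out_shape, n_steps=100, k=2.0,
                  sigma_min=0.01, sigma_max=50.0, seed=0):
    """Deterministic reverse-time Euler sampler following steps S100 to S600."""
    rng = np.random.default_rng(seed)
    # S100: time grid from t=1 down to t=0, plus a centered gamma prior sample.
    ts = np.linspace(1.0, 0.0, n_steps + 1)
    theta = sigma_max / np.sqrt(k)  # ASSUMPTION: scale giving noise std sigma_max
    x = rng.gamma(shape=k, scale=theta, size=out_shape) - k * theta
    log_ratio = np.log(sigma_max / sigma_min)
    # S200: iterate over the grid in the reverse time direction.
    for i in range(n_steps):
        t, dt = ts[i], ts[i] - ts[i + 1]
        # S300: effective diffusion coefficient (ASSUMED VE-style form).
        sigma = sigma_min * (sigma_max / sigma_min) ** t
        g2 = 2.0 * sigma ** 2 * log_ratio
        # S400: approximate marginal score from the user-supplied network.
        score = score_fn(x, t)
        # S500: explicit Euler step of dx/dt = -0.5 * g2 * score, taken backwards in time.
        x = x + 0.5 * g2 * score * dt
    # S600: return the final state tensor.
    return x

if __name__ == "__main__":
    def toy_score(x, t, smin=0.01, smax=50.0):
        # Score of a zero-mean Gaussian with std sigma(t); a stand-in for a trained network.
        s = smin * (smax / smin) ** t
        return -x / (s * s)

    print(sample_pf_ode(toy_score, (2, 2), seed=1))
```

Because the loop contains no random term after initialization, two calls with the same seed and the same `score_fn` return identical tensors. The denormalization and clipping mentioned in the abstract would be applied to the returned tensor; they are omitted here because the normalization range is not specified in this text.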

[0089] Experiment and Results Analysis. This application systematically verifies and analyzes the proposed gamma noise-driven diffusion model through experiments, focusing on three key questions. First, can the constructed gamma noise forward perturbation mechanism and the analytical conditional score supervision form a stable and trainable optimization loop? Second, can the generation quality of the proposed method reach a quantitative level comparable to mainstream diffusion models on a standard image generation benchmark dataset? Third, when the data distribution expands from low-dimensional grayscale digits to semantically and texturally more complex natural images and even face data, what patterns emerge in the model's convergence behavior, sampling error morphology, and visual quality? To ensure the reproducibility and interpretability of the experimental conclusions, the discussion is organized in four parts: datasets and preprocessing, evaluation metrics and evaluation protocols, experimental configuration and model settings, and result presentation and analysis. Finally, the experimental boundaries under resource constraints are explained.

[0090] Compared to traditional Gaussian diffusion, the core difference in this application's method lies solely in replacing the symmetric Gaussian noise increment distribution with an asymmetric gamma increment, thereby deriving the corresponding inverse update and training supervision signals. Aside from this difference, the experiments in this application maintain as much consistency as possible with mainstream diffusion models in terms of network skeleton, training paradigm, and evaluation pipeline, thus allowing the experimental comparison to focus more on the impact of noise modeling itself. In particular, the CIFAR-10 metrics report and the qualitative demonstration on MNIST follow the common reproduction process found in diffusion model literature; while the results for CelebA are presented as exploratory experiments, clarifying the reasons and boundaries for its lack of rigorous quantitative evaluation.

[0091] Datasets and Data Preprocessing MNIST dataset MNIST is a classic handwritten digit image dataset, with images in grayscale single-channel format, containing 10 categories (digits 0–9). Its typical characteristics include low resolution, strong structure, simple semantics, and intra-class variations primarily manifested in morphological differences such as stroke thickness, writing slant, and local breaks and connections. For diffusion models, MNIST's distribution is relatively concentrated, making training easier to converge, and the quality of the sampled data is more intuitively assessable. Therefore, MNIST plays two main roles in the experiments of this application: first, as a rapid verification platform for the method's trainability and stability, used to observe loss reduction, whether the sampling process diverges, and whether the generated samples can form a clearly discernible digit structure; second, as a qualitative demonstration platform to compensate for the instability of the FID metric in this data domain.

[0092] In terms of preprocessing, this application applies uniform resizing and numerical normalization to MNIST. Specifically, the original images are scaled or cropped to the resolution required by the model input, and pixel values are linearly normalized to the range specified on the training side. Normalization follows the principle of "training and sampling consistency": the generated samples output by the model are strictly denormalized and clipped back to the legal pixel range for visualization and for a consistent IS calculation. To avoid uncontrollable effects from additional augmentation, this application does not apply data augmentation such as random cropping or random rotation to MNIST; the only sources of training randomness are random batch sampling and random noise injection.

[0093] CIFAR-10 dataset CIFAR-10 is one of the most commonly used low-resolution benchmarks for natural image generation tasks. It includes 10 semantic categories, and the images are... The CIFAR-10 dataset contains RGB three-channel color images. Compared to MNIST, CIFAR-10 exhibits a more complex distribution, with more significant differences between categories, and intra-class variations encompassing multidimensional factors such as pose, background, illumination, and local texture. For diffusion models, CIFAR-10 not only tests the model's ability to fit multimodal distributions but also more readily exposes sampling error accumulation and local artifacts. Therefore, this application uses CIFAR-10 as the core quantitative evaluation platform and reports FID and IS as the primary metrics.

[0094] In terms of preprocessing, this application adopts a standardized process for CIFAR-10: maintaining the original resolution or unifying it to [the required resolution]. The RGB three-channel input format is preserved, and pixel values ​​are linearly normalized to the specified range for training. Similar to MNIST, the sampled output is inversely normalized and cropped to a valid pixel range, ensuring that the images input to the feature extraction network have the correct numerical distribution during evaluation. To maintain evaluation fairness, this experiment does not introduce additional enhancement strategies that would significantly alter the generation quality; the training end only uses the dataset's native random shuffling and batch sampling mechanism.

[0095] CelebA dataset CelebA is a large-scale face attribute dataset containing numerous face images with different poses, lighting, and backgrounds, and providing rich attribute labels. Of course, to comply with privacy protection requirements, animal face images or other publicly available image data that does not involve personal privacy can also be used for face attribute datasets. Compared to CIFAR-10, CelebA's generation task places greater emphasis on structural consistency: faces have obvious geometric priors, such as the relative positions of facial features, facial contour symmetry, and natural transitions in local textures. Diffusion models on CelebA can typically generate samples with strong semantic consistency, but they are also more prone to exposing structural artifacts. These phenomena place higher demands on the local accuracy of the score field and the control of sampler error.

[0096] This application conducted exploratory training and sampling on CelebA. Due to limitations in equipment resources and time costs, this application was unable to complete full training on CelebA. In the later stages of training, the model was able to generate samples with recognizable facial contours and basic facial feature layouts, and exhibited a certain degree of statistical regularity in skin color and hairstyle texture. Based on the rigor of the research boundaries, this application positions this part as a feasibility and phenomenon supplement, aiming to demonstrate the engineering feasibility of the gamma noise diffusion construction in this application for training and image generation on data distributions with stronger structural constraints, rather than providing quantitative conclusions that can be compared with the literature.

[0097] In the preprocessing of CelebA, this application employs common face centering and uniform scaling methods to align the images to the training input resolution, and uses a normalization strategy consistent with CIFAR-10. The motivation for this choice is to maximize the reuse of the same network skeleton and training script, avoid introducing additional variables due to engineering differences, and thus more purely observe the training and sampling behavior after noise modeling replacement.

[0098] Evaluation indicators and evaluation protocols This application uses InceptionScore (IS) and FréchetInceptionDistance (FID) as the main quantitative indicators, supplemented by sampling grid visualization for qualitative evaluation. To ensure consistency in indicator interpretation, this application not only provides definitions but also emphasizes the boundaries of indicator applicability on different datasets and explains the evaluation protocol and randomness control methods of this application.

[0099] Inception Score. IS measures the recognizability and diversity of generated samples and is defined as $\mathrm{IS} = \exp\left(\mathbb{E}_{x \sim p_g}\left[D_{\mathrm{KL}}\left(p(y \mid x)\,\|\,p(y)\right)\right]\right)$, where $p_g$ is the sample distribution induced by the generative model, $p(y \mid x)$ is the distribution predicted by the pre-trained classifier for a test sample $x$, and $p(y) = \mathbb{E}_{x \sim p_g}\left[p(y \mid x)\right]$ is the marginal class distribution. A larger IS value generally indicates clearer samples and more balanced class coverage. Since IS depends on the classifier's training domain and class system, its numerical meaning is comparable mainly under the same dataset and implementation. Therefore, this application uses IS as a unified reference metric for MNIST and CIFAR-10 and supplements its interpretation with visualization results.
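As a small illustration of the Inception Score, the sketch below computes it from a matrix of classifier probabilities, taken as the exponential of the average KL divergence between each per-sample class distribution and the marginal class distribution. The function name and the smoothing constant `eps` are illustrative; obtaining the probabilities from a pre-trained classifier is outside the sketch.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS = exp(mean_x KL(p(y|x) || p(y))) for an (n_samples, n_classes) probability matrix."""
    probs = np.asarray(probs, dtype=np.float64)
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

if __name__ == "__main__":
    confident = np.eye(10)[np.arange(100) % 10]  # one-hot, evenly spread over 10 classes
    uniform = np.full((100, 10), 0.1)            # classifier cannot tell classes apart
    print(inception_score(confident), inception_score(uniform))
```

Confident predictions spread evenly across ten classes give a score close to 10, while uniform predictions give a score of 1, which is the sense in which IS rewards both clarity and balanced coverage.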

[0100] Fréchet Inception Distance. FID measures the distance between the generated distribution and the true distribution in the feature space. Let the real data and the generated data be approximated in the Inception feature space by the Gaussian distributions $\mathcal{N}(\mu_r, \Sigma_r)$ and $\mathcal{N}(\mu_g, \Sigma_g)$; then $\mathrm{FID} = \|\mu_r - \mu_g\|_2^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)$. A smaller FID indicates that the generated distribution is closer to the true distribution. Compared with IS, FID better characterizes the overall alignment of the distributions and agrees more often with visual quality in the natural image domain. However, the reliability of FID depends on how well the Inception features fit the data and on a sufficient sample size.
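The Fréchet distance between two Gaussians can be evaluated directly from their means and covariances. The sketch below is a NumPy-only illustration with hypothetical names; it computes the trace of the matrix square root from the eigenvalues of the covariance product, which are real and non-negative for positive semi-definite inputs, rather than calling a dedicated matrix square root routine. Extracting Inception features is outside the sketch.

```python
import numpy as np

def frechet_distance(mu_r, cov_r, mu_g, cov_g):
    """||mu_r - mu_g||^2 + Tr(cov_r + cov_g - 2 (cov_r cov_g)^{1/2})."""
    mu_r, mu_g = np.asarray(mu_r, dtype=float), np.asarray(mu_g, dtype=float)
    cov_r, cov_g = np.asarray(cov_r, dtype=float), np.asarray(cov_g, dtype=float)
    diff = mu_r - mu_g
    # Tr((cov_r cov_g)^{1/2}) equals the sum of square roots of the eigenvalues
    # of cov_r @ cov_g; tiny negative or imaginary parts are numerical noise.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)

if __name__ == "__main__":
    eye = np.eye(2)
    print(frechet_distance([0, 0], eye, [3, 4], eye))      # mean shift only
    print(frechet_distance([0, 0], eye, [0, 0], 4 * eye))  # covariance mismatch only
```

In practice the means and covariances would be estimated from Inception features of the real and generated sample sets, which is why the sample size matters for the stability of the estimate.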

[0101] On MNIST, because the data are grayscale and differ significantly from the Inception network's training domain, FID is often unstable or lacks interpretability. Therefore, this application does not report the MNIST FID, but uses a visual grid and IS as the main evaluation methods. For CelebA, FID / IS can be calculated in principle, but this requires sufficient training and sample statistics under a fixed protocol. Since this application only completed exploratory experiments on CelebA and did not meet the requirements of rigorous evaluation, its FID / IS values are not reported; only a description of the visual phenomena is provided.

[0102] Evaluation Protocol and Randomness Control After model training converges, this application samples from the final model checkpoint to obtain a generated sample set, and calculates the FID and IS of CIFAR-10 under a fixed evaluation configuration. To reduce the impact of randomness, a consistent sampler configuration and random seed control strategy are used in the evaluation phase, and the metrics are calculated under the same implementation. For MNIST, this application generates several grid images under fixed sampling settings to demonstrate category coverage and intra-class style variations, using the continuity of digit strokes, structural recognizability, and diversity as the main criteria. For CelebA, this application also generates samples and displays grids under fixed sampling settings, focusing on observing whether the facial structure appears stably, whether the geometric relationships of facial features are reasonable, and whether there are obvious structural artifacts.

[0103] Experimental setup and model configuration This application provides all reproducible experimental settings and clarifies the alignment of the training and sampling ends at the implementation level. Since the method in this application maintains the main structure of the score-based framework in its theoretical derivation, only replacing the noise with gamma perturbations and rewriting the conditional scores and training weights accordingly, the experimental configuration also follows the same paradigm. That is, it uniformly provides the perturbation schedule, scoring network structure, optimization strategy, and sampler settings, thereby ensuring consistent caliber in metric calculation and visualization comparison.

[0104] Perturbation schedule and noise intensity parameterization. Both training and sampling are carried out on a continuous time interval. To avoid degeneracy at the numerical boundaries, this application parameterizes the noise intensity with a geometric noise schedule similar to that of VE-SDE. Consistent with the gamma perturbation structure described above, the noise intensity hyperparameter is fixed, and the gamma noise parameters are constructed point by point along the pixel dimension. For any training sample and time, pixel-wise noise is sampled and the training input is generated by a centered perturbation. The centering operation removes the mean offset of the noise, so that the perturbation intensity is determined mainly by the noise scale. This way of controlling the perturbation concentrates the learning pressure at different noise scales into changes in the scale of the score term, which makes it easier to stabilize training with time-dependent weights.
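The centered perturbation can be illustrated with a short sketch. The exact gamma parameterization is not reproduced in this text, so the following is an assumption: a fixed shape `k` (standing in for the noise intensity hyperparameter) and a scale `theta = sigma(t) / sqrt(k)`, chosen so that the noise standard deviation equals the geometric noise scale. The schedule endpoints are illustrative values.

```python
import numpy as np

def perturb_centered_gamma(x0, t, k=2.0, sigma_min=0.01, sigma_max=50.0, rng=None):
    """Return (x_t, eps, theta) with x_t = x0 + eps - E[eps] and eps ~ Gamma(k, theta) per pixel."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = sigma_min * (sigma_max / sigma_min) ** t  # geometric noise scale
    theta = sigma / np.sqrt(k)                        # ASSUMPTION: gives Var[eps] = sigma^2
    eps = rng.gamma(shape=k, scale=theta, size=x0.shape)
    x_t = x0 + eps - k * theta                        # subtract the gamma mean k * theta
    return x_t, eps, theta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = np.zeros((8, 3, 32, 32))
    x_t, eps, theta = perturb_centered_gamma(x0, 0.5, rng=rng)
    print(x_t.mean(), x_t.std())
```

With this choice the perturbed sample has the same mean as the clean sample and a spread set by the noise scale alone, while the skewness of the gamma distribution is retained.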

[0105] Network structure and input / output conventions. The score network uses the NCSN++ backbone with continuous conditioning. The network inputs are an image tensor and a noise scalar. For the time-condition embedding, the scalar input is passed through a Fourier feature mapping to obtain a 256-dimensional embedding, which is then mapped to 512 dimensions by two fully connected layers and injected additively into each residual block. GroupNorm is used for normalization and SiLU for the nonlinearity, and the residual blocks adopt a BigGAN-style structure with varying resolution. A self-attention module is added to strengthen the modeling of medium-scale dependencies.

[0106] Taking CIFAR-10 as an example, the input resolution is The number of channels is 3. The basic number of network channels is... Multi-scale channel magnification Each scale contains four residual blocks. Both downsampling and upsampling use a resampling structure with convolution, and an FIR filter kernel is enabled to reduce interpolation artifacts. The output is set to scale_by_sigma, and the final network output is divided by... This makes the output value range more stable under different noise scales, which is convenient for training with a unified optimizer hyperparameter.

[0107] Regarding the data input method, this application adopts a decentralized data setting, and the pixel value remains at [a certain value]. During the training phase, no additional strong enhancements that alter the distribution are introduced. Only random horizontal flipping is enabled for CIFAR-10 to improve generalization stability under limited computing power. The preprocessing of MNIST and CelebA follows the same principle, namely, maintaining the same data scale and channel conventions as the evaluation, so that the training output can be directly denormalized to the pixel domain for visualization and metric calculation.

[0108] Training objectives, weights, and optimization strategies. The training objective is weighted denoising score matching. Given the pixel-level perturbation variables, the conditional score supervision signal is written in analytical form, and the network output is trained to match it. To suppress training instability caused by the difference in gradient scale between high-noise and low-noise regions, this application scales the error with a coefficient term consistent with the derivation above. In the implementation, when this term is not numerically positive it degenerates to a fallback form, which avoids numerical anomalies caused by a failed coefficient. The overall loss is obtained by summing the pixel-level loss over pixels and then averaging over each batch. This definition is consistent with the weighted supervision in the theoretical part of this application, and its effect is equivalent to standardizing the supervision signal across noise scales, which gives a smoother optimization surface.
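A hedged sketch of the supervision signal and loss reduction follows. For noise drawn from a gamma distribution with shape `k` and scale `theta`, the gradient of the log density with respect to the noise value is `(k - 1) / eps - 1 / theta`, and because the perturbed sample differs from the noise only by a constant shift, the same expression serves as a conditional score target. The weighting coefficient of this application is not reproduced in this text, so `weight` is left as a caller-supplied array, and both function names are hypothetical.

```python
import numpy as np

def gamma_conditional_score(eps, k, theta):
    """d/d(eps) of log Gamma(eps; k, theta) = (k - 1) / eps - 1 / theta, valid for eps > 0."""
    return (k - 1.0) / eps - 1.0 / theta

def weighted_dsm_loss(pred, target, weight):
    """Weighted squared error, summed over pixels, then averaged over the batch axis."""
    per_pixel = weight * (pred - target) ** 2
    return float(per_pixel.reshape(per_pixel.shape[0], -1).sum(axis=1).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, theta = 2.0, 0.5
    eps = rng.gamma(shape=k, scale=theta, size=(4, 1, 8, 8))
    target = gamma_conditional_score(eps, k, theta)
    pred = np.zeros_like(target)       # stand-in for a network output
    weight = np.ones_like(target)      # placeholder weighting, not the application's coefficient
    print(weighted_dsm_loss(pred, target, weight))
```

The target grows without bound as the noise value approaches zero, which is one reason a noise-dependent weighting of the error is useful in practice.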

[0109] The optimizer is Adam with preset learning rate and moment hyperparameters, and no weight decay is used. Training uses an epoch-based warmup lasting two epochs. To avoid gradient explosion and unstable oscillation, parameter gradients are clipped by norm with a threshold of 1.0. To improve the stability of evaluation and sampling, an exponential moving average (EMA) of the parameters is maintained throughout training, with the EMA decay rate set to 0.999. Evaluation and sampling are always performed with the EMA parameters, which reduces metric fluctuations caused by training noise.
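The two stabilization measures described here, gradient norm clipping at 1.0 and an EMA with decay 0.999, can be sketched framework-free. The norm type is not stated in this text, so a global L2 norm over all parameter gradients is assumed; the function names are hypothetical.

```python
import numpy as np

def clip_global_norm(grads, max_norm=1.0):
    """Scale a list of gradient arrays so their joint L2 norm is at most max_norm (ASSUMED norm type)."""
    total = float(np.sqrt(sum(np.sum(g * g) for g in grads)))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads], total

def ema_update(ema_params, params, decay=0.999):
    """One EMA step: ema <- decay * ema + (1 - decay) * params."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]

if __name__ == "__main__":
    grads = [np.array([3.0, 4.0])]
    clipped, norm_before = clip_global_norm(grads)
    print(norm_before, np.linalg.norm(clipped[0]))
    print(ema_update([np.zeros(2)], [np.ones(2)]))
```

Gradients whose joint norm is already below the threshold pass through unchanged, and the EMA copy moves only 0.1% of the way toward the current parameters at each step, which is what smooths out step-to-step training noise.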

[0110] Regarding the number of training epochs, this application organizes training in two phases. The first phase trains from scratch for 2000 epochs to establish a stable initial score field. The second phase continues training for 8000 epochs to further refine structure and texture quality in the low-to-medium noise range. During training, the validation loss is evaluated with the EMA parameters every 5 epochs, and after more than 50 epochs of training a best-checkpoint strategy is applied, with an improved validation loss as an additional condition for saving. In addition, a checkpoint is saved at every epoch so that training can be resumed after an interruption and changes in generation quality at different stages can be traced back.

[0111] Sampling strategy, grid visualization and evaluation process. The sampling phase uses a Predictor-Corrector framework to balance sample quality and numerical stability. The discrete time grid runs from the largest time down to the smallest with a preset step setting. Each step begins with a LangevinCorrector update, whose step size is determined adaptively by a signal-to-noise ratio parameter, with the number of corrector steps set to 1. A ReverseDiffusionPredictor update is then executed to complete the main propagation. In this process, a local Markov correction at each noise scale improves accessibility of the high-dimensional space, while the predictor carries out the main reverse-time propagation, which gives better generation quality with a finite number of steps.

[0112] The initial samples are drawn from a prior consistent with the gamma modeling in this application. In the implementation, an initial noise tensor is sampled from the gamma distribution, and for CIFAR-10 a per-channel mean bias is added to align with the pixel-domain statistics of the training data. The final output is produced with the `noise_removal` setting enabled, which takes the predictor's output as the result and thereby reduces the high-frequency graininess caused by residual noise at the end of sampling.

[0113] For visualization, a snapshot sampling mechanism is enabled during training. After saving the checkpoint at each epoch, an additional mini-batch of samples is sampled for rapid quality inspection. The mini-batch size is set to 36 and exported as an image grid, facilitating observation of class coverage, structure formation, and texture refinement processes at different training stages. It is recommended that this grid image be presented in pairs with quantitative indicators in the experimental results and analysis of this application; that is, grid samples from the same checkpoint should be provided along with the FID or IS data, so that quantitative and qualitative conclusions support each other.

[0114] In terms of evaluation, the final results of CIFAR-10 generated 60,000 samples under fixed sampling settings, with a batch size of 1000. FID was calculated uniformly after generation to ensure sample size and statistical stability, reducing variance caused by small sample evaluation. MNIST often exhibits unstable FID performance due to the mismatch between the Inception feature space and data domain. Therefore, this application primarily uses it as a qualitative evaluation dataset, focusing on reporting the clarity, inter-class separability, and intra-class diversity of grid samples, supplemented by IS as a comparable quantitative reference. Furthermore, this application conducted supplementary attempts on the CelebA dataset. CelebA belongs to the face distribution, with stronger structural priors and richer texture details, making it more sensitive to noise modeling and sampling stability. Due to limitations in training equipment and time budget, this experiment did not fully train to convergence, nor did it perform FID statistics, but it was able to generate samples with recognizable faces at intermediate checkpoints. Based on this, this application presents the CelebA results as a separate supplementary experiment to illustrate the transferability and improvement potential of the method, and clearly outlines the subsequent plan to improve it into a complete comparative experiment in the summary and outlook sections.

[0115] Experimental Results and Analysis. CIFAR-10 Quantitative Results and Phenomena Interpretation. On the CIFAR-10 dataset, the model in this application achieved good quantitative results: an FID of 2.58 and an IS of 10.02. The low FID indicates that the generated distribution is close to the true distribution in the Inception feature space, while the high IS indicates that the generated samples have good class discriminability and balanced coverage. The visualized sample grid in Figure 2 is consistent with these metrics: the generated images look natural in subject structure, local texture, and background color statistics, and they cover multiple target classes while maintaining a certain degree of intra-class diversity. To standardize the evaluation criteria, Table 1 summarizes the FID and IS of several representative methods on the CIFAR-10 unconditional generation task for comparison.

[0116] Table 1. CIFAR-10 sample quality. From the perspective of error morphology, the visible defects of the proposed method on CIFAR-10 are mainly manifested in slight oversmoothing of local textures, slight jitter of edge details, or small-scale color drift. Compared with Gaussian diffusion, the asymmetric increment of gamma noise makes the inverse update more sensitive to score estimation bias: when there is a systematic bias in the score field in certain noise intervals, the error is more likely to be amplified in subsequent steps by offset accumulation, manifesting as structural artifacts. The weighted training objective of this application alleviates the problem of inconsistent supervision scales in different noise intervals to a certain extent, making the score estimation at the medium-to-high noise end more stable, thereby reducing the impact of initial sampling errors on the final sample quality. This can be reflected in the lower final FID value.

[0117] Furthermore, the CIFAR-10 results demonstrate that the key derivations and engineering implementations of this application form a stable closed loop: the forward closed synthesis mechanism ensures that perturbation samples with arbitrary noise levels can be directly constructed during training; the analytical conditional score provides a rigorous supervisory signal; the weighting strategy improves the numerical conditions for optimization; and the inverse sampler, guided by the network output, can gradually return from the prior to the data distribution. These elements collectively support the achievement of high-quality generated results on complex natural image distributions.

[0118] MNIST Qualitative Results and Indicator Boundaries. MNIST consists of grayscale handwritten digits with a relatively simple data distribution, for which the commonly used Inception features are poorly suited, so FID results lack stable comparability. Therefore, this application does not report the MNIST FID; it relies mainly on generated sample grids for qualitative evaluation and reports IS as an auxiliary reference. The visualization results show that the generated digits have continuous strokes and clear structure, that different categories are clearly distinguishable, and that reasonable variations in writing style are preserved within each category, indicating that the model learns an effective generative distribution on this low-dimensional data. Figure 3 shows the generated sample grid for MNIST.

[0119] The effective structure of MNIST mainly consists of a small number of edges and local stroke shapes, exhibiting an overall sparse characteristic. The abrupt perturbations of gamma noise are more consistent with this sparsity, thus being more conducive to learning a stable denoised orientation field. Although this application did not conduct a larger-scale comparison on MNIST to quantify this advantage, from the perspectives of training convergence and sample clarity, gamma noise replacement did not weaken the generative ability of the diffusion model on simple data distributions; on the contrary, it maintained stable training behavior consistent with Gaussian diffusion.

[0120] CelebA exploratory results. This application conducted exploratory training and sampling on the CelebA dataset. Due to limitations in equipment and time budget, training did not reach full convergence, and FID / IS were not calculated according to standard evaluation protocols. Although training was terminated early, the model generated samples with clear facial contours and basic facial feature layouts at later checkpoints, demonstrating the feasibility of this method on face data distributions with stronger structural constraints. Figure 4 shows the sample grid generated for CelebA at successive training stages.

[0121] To obtain quantitative results on CelebA comparable to those in the literature, more training epochs, more standardized face alignment preprocessing protocols, and more refined sampler and hyperparameter tuning are still needed. Due to the rigor of the research boundaries, this application positions the CelebA results as a supplementary demonstration of the method's feasibility, rather than drawing quantitative conclusions about its superiority or inferiority.

[0122] In the description of this specification, the references to terms such as "an embodiment," "example," "specific example," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.

[0123] The foregoing has shown and described the basic principles, main features, and advantages of this application. Those skilled in the art should understand that this application is not limited to the above embodiments. The embodiments and descriptions in the specification are merely illustrative of the principles of this application. Various changes and modifications can be made to this application without departing from the spirit and scope thereof, and all such changes and modifications fall within the scope of the claims of this application.

Claims

1. An image generation method based on a gamma-noise diffusion model, characterized in that it comprises:
obtaining a preset continuous discrete time grid, the step size between adjacent time steps, the preset noise scheduling parameter corresponding to each time step in the continuous discrete time grid, a preset noise intensity hyperparameter, preset scheduling extreme-value parameters, and pre-trained scoring network parameters;
obtaining an initial gamma noise tensor from a preset gamma distribution using the preset noise scheduling parameter corresponding to the maximum time step in the continuous discrete time grid and the noise intensity hyperparameter, and generating an initial state tensor from the initial gamma noise tensor;
for each current time step of the preset continuous discrete time grid, obtaining the corresponding current state tensor, and generating the effective diffusion coefficient of the current time step from the preset noise scheduling parameter corresponding to the current time step and the preset scheduling extreme-value parameters;
using the current state tensor and the current time step, performing a forward pass based on the pre-trained scoring network parameters to generate an approximate edge score for the current state;
starting from the initial state tensor, performing a deterministic probability flow velocity field state update using the approximate edge score, the effective diffusion coefficient, and the step size between adjacent time steps to generate the state tensor of the previous time step, and so on until the state tensor of each step of the continuous discrete time grid has been generated;
the state tensor output at each step of the continuous discrete time grid forming the target generated image.

2. The image generation method based on a diffusion model of gamma noise as described in claim 1, characterized in that, An initial gamma noise tensor is obtained from a preset gamma distribution using the preset noise scheduling parameters corresponding to the maximum time step in a continuous discrete-time grid and the noise intensity hyperparameter. An initial state tensor is then generated from the initial gamma noise tensor, including: By obtaining the preset noise intensity hyperparameter The preset noise scheduling parameters corresponding to the maximum time step and the initial gamma noise tensor ,in accordance with Perform the calculation to generate the initial state tensor. .

3. The image generation method based on a diffusion model of gamma noise as described in claim 1, characterized in that, For each current time step of a preset continuous discrete-time grid, the corresponding current state tensor is obtained. Then, using the preset noise scheduling parameters and the preset scheduling extremum parameters corresponding to the current time step, the effective diffusion coefficient for the current time step is generated, including: By obtaining the preset noise scheduling parameters corresponding to the current time step and the preset scheduling extreme value parameters and According to the formula Perform the calculation to generate the effective diffusion coefficient. .

4. The image generation method based on a diffusion model using gamma noise according to claim 1, characterized in that performing the deterministic probability flow velocity field state update using the marginal score approximation, the effective diffusion coefficient, and the step size between adjacent time steps to generate the state tensor of the previous time step comprises: obtaining the current state tensor, the effective diffusion coefficient, the marginal score approximation, and the step size between adjacent time steps, and performing the calculation according to the formula [formula not reproduced] to generate the state tensor of the previous time step.

5. The image generation method based on a diffusion model using gamma noise according to claim 1, characterized in that performing the deterministic probability flow velocity field state update using the marginal score approximation, the effective diffusion coefficient, and the step size between adjacent time steps to generate the state tensor of the previous time step comprises:
obtaining the effective diffusion coefficient and the marginal score approximation, and multiplying them according to the formula [formula not reproduced] to generate the discrete velocity field tensor corresponding to the probability flow ordinary differential equation at the current time step; and
obtaining the current state tensor, the discrete velocity field tensor, and the step size between adjacent time steps, configuring the probability flow ordinary differential equation as an explicit Euler numerical integrator, and performing the numerical integration update according to the formula [formula not reproduced] to generate the state tensor of the previous time step.
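The two-stage update described in this claim (form a velocity field as the product of the effective diffusion coefficient and the score approximation, then take an explicit Euler step) can be sketched in Python as follows. The claim's formulas are not reproduced in this text, so the sign convention `x_prev = x + dt * velocity`, the scalar coefficient `d_eff`, and the Gaussian `toy_score` that stands in for the trained network are all assumptions for illustration.

```python
def euler_pf_step(x, score, d_eff, dt):
    """One explicit Euler step of a probability flow ODE, backward in time.

    Stage 1: velocity = d_eff * score (elementwise product).
    Stage 2: x_prev = x + dt * velocity (assumed sign convention).
    """
    velocity = [d_eff * s for s in score]
    return [xi + dt * vi for xi, vi in zip(x, velocity)]

def toy_score(x, sigma2=1.0):
    # Score of a zero-mean Gaussian, -x / sigma^2; a hypothetical stand-in
    # for the forward pass of the pre-trained score network.
    return [-xi / sigma2 for xi in x]

x = [2.0, -1.0, 0.5]
for _ in range(10):
    x = euler_pf_step(x, toy_score(x), d_eff=0.5, dt=0.1)
```

With this toy score each step scales the state by `1 - d_eff * dt = 0.95`, so the state contracts deterministically toward the mode; no fresh noise is injected during sampling, which is what distinguishes the probability flow update from a stochastic reverse sampler.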

6. The image generation method based on a diffusion model using gamma noise according to claim 1, characterized in that obtaining the pre-trained score network parameters comprises:
obtaining a batch size, initial real image data, discrete time step parameters, and the corresponding training noise scheduling parameters;
performing a random sampling operation on a preset training gamma distribution, using the training noise scheduling parameters and the preset noise intensity hyperparameter, to generate a training gamma noise tensor;
obtaining the initial real image data, the training gamma noise tensor, the preset noise intensity hyperparameter, and the training noise scheduling parameters, and performing the calculation according to the formula [formula not reproduced] to generate a centered noise-perturbed sample;
obtaining the conditional score target value, derived in reverse from the centered noise-perturbed sample, and the second-moment weighting coefficient;
obtaining the predicted score generated by a forward pass of the score network, with its parameters to be updated, on the centered noise-perturbed sample and the discrete time step parameters, together with the conditional score target value and the second-moment weighting coefficient, and performing the calculation according to the formula [formula not reproduced] to generate a loss value; and
obtaining the loss value and a preset learning rate, and performing the calculation according to the formula [formula not reproduced] to generate the updated parameters of the pre-trained score network.

7. The image generation method based on a diffusion model using gamma noise according to claim 6, characterized in that obtaining the conditional score target value, derived in reverse from the centered noise-perturbed sample, and the second-moment weighting coefficient comprises:
obtaining the centered noise-perturbed sample, the initial real image data, the preset noise intensity hyperparameter, and the training noise scheduling parameters, and performing the calculation according to the formula [formula not reproduced] to generate a noise increment variable;
obtaining the noise increment variable, the preset noise intensity hyperparameter, and the training noise scheduling parameters, and performing the calculation according to the formula [formula not reproduced] to generate the conditional score target value; and
obtaining the preset noise intensity hyperparameter and the training noise scheduling parameters, and performing the calculation according to the formula [formula not reproduced] to generate the second-moment weighting coefficient.
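The training steps in claims 6 and 7 end with a weighted loss between a predicted score and a conditional score target, followed by a learning-rate update. The loss and update formulas are not reproduced in this text, so the Python sketch below assumes one common form: a weighted mean squared error and a plain gradient descent step. The one-parameter model `theta * x` is a hypothetical stand-in for the score network, and the data values are made up for illustration.

```python
def weighted_score_loss(theta, xs, targets, weights):
    """Weighted mean squared error between predicted scores and targets."""
    n = len(xs)
    return sum(w * (theta * x - t) ** 2
               for x, t, w in zip(xs, targets, weights)) / n

def loss_gradient(theta, xs, targets, weights):
    """Analytic derivative of weighted_score_loss with respect to theta."""
    n = len(xs)
    return sum(2.0 * w * (theta * x - t) * x
               for x, t, w in zip(xs, targets, weights)) / n

xs = [1.0, -2.0, 0.5]          # hypothetical perturbed samples
targets = [-1.0, 2.0, -0.5]    # hypothetical score targets (fit by theta = -1)
weights = [0.5, 0.5, 0.5]      # hypothetical weighting coefficients

theta = 0.0
lr = 0.1                       # preset learning rate (assumed value)
loss_before = weighted_score_loss(theta, xs, targets, weights)
theta = theta - lr * loss_gradient(theta, xs, targets, weights)
loss_after = weighted_score_loss(theta, xs, targets, weights)
```

A single step moves `theta` from 0 toward the value that fits the targets and lowers the loss; a real score network would replace the analytic gradient with automatic differentiation over a batch.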

8. The image generation method based on a diffusion model using gamma noise according to claim 1, characterized in that obtaining the preset noise scheduling parameter corresponding to each time step in the continuous discrete-time grid comprises: obtaining the continuous time variable of the continuous discrete-time grid; and obtaining the continuous time variable and the preset scheduling extreme value parameters, and performing the calculation according to the formula [formula not reproduced] to generate the preset noise scheduling parameter for the corresponding time node.

9. The image generation method based on a diffusion model using gamma noise according to claim 1, characterized in that forming the target generated image from the state tensors output at the steps of the continuous discrete-time grid comprises: mapping the state tensor values of each step of the continuous discrete-time grid to the standard image pixel value range to generate a numerical matrix; performing a numerical truncation operation on the outlier pixel positions in the numerical matrix that fall outside the standard pixel value range to generate pixel data within the legal pixel range; and outputting the pixel data as the target generated image.
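The mapping and truncation steps of this claim can be sketched in Python as below. The claim does not state the numeric ranges, so the normalized range `[-1, 1]`, the 8-bit pixel range `[0, 255]`, and the function name `to_pixels` are assumptions for illustration.

```python
def to_pixels(state, lo=-1.0, hi=1.0):
    """Map state values from [lo, hi] to [0, 255], then clip and round.

    Values outside [lo, hi] map outside [0, 255] and are truncated to the
    nearest bound, mirroring the claim's numerical truncation step.
    """
    pixels = []
    for v in state:
        scaled = (v - lo) / (hi - lo) * 255.0     # inverse normalization
        clipped = min(255.0, max(0.0, scaled))    # numerical truncation
        pixels.append(int(clipped + 0.5))         # round to nearest integer
    return pixels

pixels = to_pixels([-1.0, 0.0, 1.0, 2.5, -3.0])
# pixels == [0, 128, 255, 255, 0]
```

The last two inputs lie outside the normalized range and are truncated to 255 and 0 respectively, so every output is a legal pixel value.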

10. A computer-readable storage medium having stored thereon computer instructions that, when executed by a processor, implement the steps of the gamma noise-based diffusion model image generation method according to any one of claims 1-9.