Apparatus and method for generating synthetic data in a generative network

By modifying the input variable values and adding noise, and combining the classification model's response with the optimized loss function of the generative model, the gradient vanishing problem in deep generative model training is solved, achieving a more stable latent space search and a more efficient training process.

CN112733875B · Active · Publication Date: 2026-06-23 · ROBERT BOSCH GMBH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ROBERT BOSCH GMBH
Filing Date
2020-10-23
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Deep generative models suffer from the vanishing gradient problem during training, which prevents optimization from continuing and/or from converging within a reasonable time, making it difficult to exhaustively search their latent space to understand how the generator constructs the latent space.

Method used

By modifying the input variable values, adding noise, and combining gradients, the loss function of the generative model is optimized using the response of the classification model as a condition, in order to generate a more stable search path and a shorter search time.

Benefits of technology

This approach solves the gradient vanishing problem in the optimization of generative models, allowing for more stable and efficient latent space search, understanding the latent space structure of the generator, and improving the training efficiency and interpretability of the model.

✦ Generated by Eureka AI based on patent content.

Abstract

The present disclosure relates to a computer-implemented method for generating synthetic data instances using a generative model. According to various embodiments, a computer-implemented method for generating synthetic data instances using a generative model is described, the method comprising: generating, by the generative model, a synthetic data instance for input variable values of input variables supplied to the generative model; classifying, by a classification model, the synthetic data instance to generate a classification result; determining, for the classification result, a loss function value of a loss function, the loss function evaluating the classification result; and determining a gradient of the loss function with respect to the input variables at the input variable values. If an absolute value of the gradient is less than or equal to a predefined threshold, the method comprises: modifying the input variable values to generate a plurality of modified input variable values; determining, for each modified input variable value of the plurality of modified input variable values, a gradient of the loss function with respect to the input variables at the respective modified input variable value; combining the gradients of the loss function with respect to the input variables at the respective modified input variable values to generate an estimated gradient at the input variable values; and modifying the input variable values in a direction determined by the estimated gradient to generate a further input variable value. Finally, the generative model generates a further synthetic data instance for the further input variable value.

Description

[0001] This disclosure relates to a computer-implemented method for generating synthetic data instances using a generative model.

[0002] Research on deep generative models, such as Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs), has exploded in the past few years. These typically unsupervised models have shown very promising results and are now able to reproduce images that look natural at high resolution and with sufficient quality to even fool human observers.

[0003] The future potential applications of deep generative models are considerable, ranging from anomaly detection and high-quality synthetic image generation to tools that provide interpretability for datasets and models. For example, this could enable the creation of corner-case datasets to test car perception systems and sensors, or the creation of meaningful inputs (inputs that are significant to humans) to test the decision boundaries of a given model.

[0004] However, these potential benefits come at the cost of more difficult and complex training compared to standard classification models. For example, several problems may arise during the training of GANs, such as training instability, mode collapse, high-variance gradients, and vanishing generator gradients.

[0005] In the context of explainable AI (artificial intelligence), understanding how the generator of a generative model constructs its latent space is crucial to understanding what it has truly learned.

[0006] Typically, the latent space of a generator is high-dimensional, which makes exhaustive search or analysis intractable. In recent years, several methods have been developed for sampling and visualizing the latent space of generative models, such as the method described by Tom White in the paper "Sampling Generative Networks," published in CoRR abs/1609.04468 in 2016 (available at https://arxiv.org/abs/1609.04468 and incorporated herein by reference). This method is used for searching through the latent space and produces visualizations of the latent codes that illustrate the relationships the generator has learned.

[0007] This search for explanations, that is, a search through the generator's latent space, can be conditioned on the response (output) of the classification (discrimination) model, as a way to ensure that the paths taken by the generator through the latent space produce realistic and meaningful instances (samples). This allows for a better understanding of how well the generator models the transformations between semantic concepts.

[0008] In practice, this conditional effect can be achieved through an optimization process that involves calculating the generator's gradient based on the classifier's (discriminator's) response to the generator's output. The optimization can be, for example, based on the optimization of an objective function, such as maximizing or minimizing the objective function, and may involve one or more neural networks.

[0009] However, a problem can arise during the computation of generator gradients, where the gradient becomes very small or zero. This is similar to the well-known "vanishing generator gradient" problem, which can occur during the training of GANs.

[0010] The method and apparatus of the independent claims make it possible to resolve situations in which the gradient at a point of a search (e.g., an optimization search) through the latent space of the generator becomes too small or zero. "Too small" can be understood as the gradient being too small to allow a sufficient reduction or increase of the objective function (e.g., loss function), that is, a reduction or increase that would let the optimization converge within a reasonable time.

[0011] As a result, a search that is conditioned, for example, on the response of the classification (discrimination) model can proceed more stably and in a shorter time through the generator's latent space.

[0012] In particular, it allows for the finding of feasible implementations of the condition.

[0013] Further examples are described below:

[0014] A computer-implemented method for generating synthetic data instances using a generative model, the generative model generating a synthetic data instance for input variable values supplied to it, the computer-implemented method comprising: classifying the synthetic data instance by a classification model to generate a classification result; determining a loss function value for the classification result, the loss function evaluating the classification result; determining the gradient of the loss function with respect to the input variables at the input variable values; the method further comprising, if the absolute value of the gradient is less than or equal to a predefined threshold: modifying the input variable values to generate a plurality of modified input variable values; for each of the plurality of modified input variable values, determining the gradient of the loss function with respect to the input variables at the corresponding modified input variable value; combining the gradients of the loss function with respect to the input variables at the corresponding modified input variable values to generate an estimated gradient at the input variable values; and modifying the input variable values in the direction determined by the estimated gradient to generate further input variable values; and the generative model generating a further synthetic data instance for the further input variable values. The computer-implemented method mentioned in this paragraph provides a first example.

[0015] This computer-implemented method makes it possible to resolve problems that may arise during optimization processes using the loss function of a generative model (such as, for example, the search through the latent space of the generator mentioned above) when the gradient of the loss function becomes so small that the optimization process cannot continue and/or fails to converge in a reasonable time. In such cases, the method described above is used to generate new (further) input variable values in order to compute new gradients at these new input variable values.

[0016] For example, the new (estimated) gradient may have a larger absolute value than the (original) gradient, thereby enabling the optimization of the loss function to continue and/or converge in a reasonable time.

[0017] Therefore, the described method allows a search that is conditioned on the output of any compatible auxiliary model, such as the response of a classification (discrimination) model, to proceed, for example, more stably across the latent space of the generator and in a shorter time.
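The steps of the first example can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the linear "generator" (weights `W`), the logistic "classifier" (weights `V`), the loss `1 - p`, the Euclidean norm as the "absolute value" of a vector gradient, and all parameter values are hypothetical stand-ins chosen so that the gradient can be written analytically.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Hypothetical stand-ins: a linear "generator" and a logistic "classifier".
W = [[1.0, 0.5], [-0.5, 1.0], [0.25, 0.25]]  # maps 2 latent values to 3 outputs
V = [2.0, -1.0, 1.5]                         # classifier weights


def generate(z):
    """Generative model: synthetic data instance for input variable values z."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]


def classify(x):
    """Classification model: probability of the target class."""
    return sigmoid(sum(v * xi for v, xi in zip(V, x)))


def loss(z):
    """Loss evaluating the classification result (assumed form: 1 - p)."""
    return 1.0 - classify(generate(z))


def grad_loss(z):
    """Analytic gradient of the loss with respect to z (chain rule)."""
    p = classify(generate(z))
    return [-p * (1.0 - p) * sum(V[i] * W[i][j] for i in range(len(V)))
            for j in range(len(z))]


def norm(g):
    """Euclidean norm, used here as the 'absolute value' of a gradient vector."""
    return math.sqrt(sum(c * c for c in g))


def further_input(z, threshold=1e-3, num_samples=16, sigma=1.0, lr=0.5, seed=0):
    """One update of z; falls back to an estimated gradient if needed."""
    rng = random.Random(seed)
    g = grad_loss(z)
    if norm(g) <= threshold:
        # Gradient too small: average the gradients at noisy copies of z.
        grads = [grad_loss([zj + rng.gauss(0.0, sigma) for zj in z])
                 for _ in range(num_samples)]
        g = [sum(col) / num_samples for col in zip(*grads)]
    # Move against the (estimated) gradient to reduce the loss.
    return [zj - lr * gj for zj, gj in zip(z, g)]


z0 = [-5.0, 0.0]                   # starts deep in the saturated region
z1 = further_input(z0)             # further input variable values
further_instance = generate(z1)    # further synthetic data instance
```

In this toy setting every gradient component is non-positive, so the update can only move `z` toward lower loss; with a real generator and discriminator the gradients would come from automatic differentiation rather than a closed-form expression.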

[0018] The method may further include that combining the gradients of the loss function comprises summing or averaging the gradients of the loss function with respect to the input variables at the corresponding modified input variable values. The features mentioned in this paragraph, combined with the first example, provide a second example.

[0019] Summing or averaging the gradients at the modified input variable values, for example by using the arithmetic mean, reduces the influence of possible outliers and helps ensure that the estimated gradient obtained is close to the true gradient of the loss function.
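As a minimal sketch of the combination step, assuming each gradient is represented as a plain list of floats, the helper below (a hypothetical name, `combine_gradients`) sums or averages the gradients component by component. The numeric values are made up for illustration.

```python
def combine_gradients(gradients, mode="mean"):
    """Combine per-sample gradient vectors component-wise by sum or mean."""
    if not gradients:
        raise ValueError("at least one gradient is required")
    if mode not in ("sum", "mean"):
        raise ValueError("mode must be 'sum' or 'mean'")
    summed = [sum(components) for components in zip(*gradients)]
    if mode == "sum":
        return summed
    return [s / len(gradients) for s in summed]


# Gradients evaluated at three modified input variable values (made-up numbers).
gradients = [[0.2, -0.4], [0.4, -0.2], [0.0, 0.0]]
estimated_sum = combine_gradients(gradients, mode="sum")
estimated_mean = combine_gradients(gradients, mode="mean")
```

The sum and the mean point in the same direction and differ only by the factor `len(gradients)`, which can be absorbed into the learning rate.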

[0020] The method may further include that modifying the input variable values to generate the plurality of modified input variable values comprises adding noise to the input variable values. The features mentioned in this paragraph, combined with any of the first to second examples, provide a third example.

[0021] Adding controlled noise, such as Gaussian noise, to the input variable values allows for a good estimate of the true gradient of the loss function in situations where otherwise no gradient would be returned or the gradient would be less than a predefined threshold. The ideal parameters for the noise added to the input variables can be found, for example, through experimentation.
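The noise step can be illustrated as follows. This is a sketch only: the function name `perturb`, the zero-mean isotropic Gaussian, and the values of `sigma` and `num_samples` are assumptions, since the text leaves the noise parameters to experimentation.

```python
import random


def perturb(z, num_samples, sigma, rng):
    """Return num_samples noisy copies of z, each with i.i.d. N(0, sigma^2) noise."""
    return [[zj + rng.gauss(0.0, sigma) for zj in z] for _ in range(num_samples)]


rng = random.Random(42)          # fixed seed so the sketch is reproducible
z = [0.5, -1.0, 2.0]             # current input variable values (made-up)
modified = perturb(z, num_samples=8, sigma=0.1, rng=rng)
```

A larger `sigma` reaches further away from a saturated point but makes the averaged gradient less representative of the gradient at `z` itself.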

[0022] The method may further include: checking whether the absolute value of the estimated gradient is higher than the predefined threshold or a further predefined threshold; and, if the absolute value of the estimated gradient is less than or equal to the predefined threshold or the further predefined threshold, performing the following one or more times to modify the estimated gradient: modifying the input variable values to generate a plurality of modified input variable values; for each of the plurality of modified input variable values, determining the gradient of the loss function with respect to the input variables at the corresponding modified input variable value; and combining the gradients of the loss function with respect to the input variables at the corresponding modified input variable values to generate an estimated gradient at the input variable values. The features mentioned in this paragraph, combined with any of the first through third examples, provide a fourth example.

[0023] If the absolute value of the estimated gradient is not higher than the predefined threshold or the further predefined threshold (i.e., a threshold different from the one used for the original gradient can be set for the estimated gradient), the estimated gradient can be repeatedly modified until a suitable estimated gradient is found.
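The repeated estimation can be sketched for a one-dimensional toy loss. Everything specific here is an assumption made for illustration: the loss `1 - sigmoid(z)`, the attempt limit, and in particular the choice to double the noise scale before each retry, which the text does not prescribe (it only says the estimate is modified repeatedly).

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def grad_loss(z):
    """Gradient of the toy loss 1 - sigmoid(z); it is never positive."""
    s = sigmoid(z)
    return -s * (1.0 - s)


def estimate_gradient(z, sigma, num_samples, rng):
    """Average the gradients at num_samples noisy copies of z."""
    total = sum(grad_loss(z + rng.gauss(0.0, sigma)) for _ in range(num_samples))
    return total / num_samples


def estimate_until_usable(z, threshold=1e-3, sigma=1.0, num_samples=16,
                          max_attempts=8, seed=0):
    """Re-estimate until |estimate| exceeds the threshold or attempts run out."""
    rng = random.Random(seed)
    g = 0.0
    for attempt in range(1, max_attempts + 1):
        g = estimate_gradient(z, sigma, num_samples, rng)
        if abs(g) > threshold:
            return g, attempt, True
        sigma *= 2.0  # assumption: widen the noise before the next attempt
    return g, max_attempts, False


g, attempts, usable = estimate_until_usable(-12.0)
```

The `usable` flag lets the caller decide what to do if no suitable estimate is found within the attempt budget, for example stopping the search.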

[0024] The method may further include: searching for the optimal value of the loss function, wherein the loss function is evaluated along a sequence of input variable values; for each input variable value of the sequence, determining the gradient of the loss function with respect to the input variables at the input variable value; if the absolute value of the gradient is less than or equal to a predefined threshold: modifying the input variable value to generate a plurality of modified input variable values; for each of the plurality of modified input variable values, determining the gradient of the loss function with respect to the input variables at the corresponding modified input variable value; combining the gradients of the loss function with respect to the input variables at the corresponding modified input variable values to generate an estimated gradient at the input variable value; and modifying the input variable value in the direction determined by the estimated gradient to generate a further input variable value; and stopping the search for the optimal value when the loss function value of the last input variable value in the sequence of input variable values satisfies a predetermined convergence criterion. The features mentioned in this paragraph, combined with any of the first through fourth examples, provide a fifth example.

[0025] As mentioned above, this method makes it possible to address the vanishing gradient problem that arises when a loss function is optimized using a generative model. During the optimization, whenever the absolute value of the gradient of the loss function becomes too small, that is, less than a predefined threshold (for example, too small to allow reasonable numerical processing, e.g., not much larger than the size of the rounding error), an estimated gradient is used instead. This continues until the optimization of the loss function stops, i.e., when a predetermined convergence criterion has been met.
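The search of the fifth example can be sketched as a loop over a sequence of input variable values. The toy loss `1 - sigmoid(sum(z))`, the convergence criterion "loss at most `tol`", the Euclidean norm as the "absolute value" of the gradient, and all parameter values are assumptions made for this sketch; the text leaves these choices open.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def toy_loss(z):
    """Toy loss that saturates (flat gradient) when sum(z) is very negative."""
    return 1.0 - sigmoid(sum(z))


def toy_grad(z):
    """Analytic gradient of toy_loss with respect to each component of z."""
    p = sigmoid(sum(z))
    return [-p * (1.0 - p)] * len(z)


def search_latent(z, loss_fn, grad_fn, threshold=1e-2, sigma=2.0,
                  num_samples=32, lr=1.0, tol=0.05, max_steps=1000, seed=0):
    """Gradient search that uses an estimated gradient when the true one is small."""
    rng = random.Random(seed)
    path = [list(z)]
    for _ in range(max_steps):
        if loss_fn(z) <= tol:  # assumed convergence criterion
            break
        g = grad_fn(z)
        if math.sqrt(sum(c * c for c in g)) <= threshold:
            # Fall back to the average gradient at noisy copies of z.
            grads = [grad_fn([zj + rng.gauss(0.0, sigma) for zj in z])
                     for _ in range(num_samples)]
            g = [sum(col) / num_samples for col in zip(*grads)]
        z = [zj - lr * gj for zj, gj in zip(z, g)]
        path.append(z)
    return path


path = search_latent([-4.0, -4.0], toy_loss, toy_grad)
losses = [toy_loss(z) for z in path]
```

Returning the whole `path` rather than only the final point keeps the sequence of input variable values available, which is what a latent space exploration would feed back into the generator.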

[0026] The method may further include: the input variables supplied to the generative model are latent space variables. The features mentioned in this paragraph, combined with any of the first through fifth examples, provide a sixth example.

[0027] In particular, the input variables can be latent variables sampled to follow a predefined probability distribution (e.g., a uniform or Gaussian probability distribution), or they can be intermediate latent space representations of the optimization process.

[0028] The method may further include that modifying the input variable values in the direction determined by the estimated gradient to generate further input variable values comprises adding the estimated gradient multiplied by a predefined learning rate to the input variable values, or subtracting it from them. The features mentioned in this paragraph, combined with any of the first through sixth examples, provide a seventh example.

[0029] Moving the input variable values in the direction of the estimated gradient multiplied by a predefined learning rate allows control over the step size used when generating further input variable values. This step size must be large enough to move far enough away from gradient saturation points, but small enough not to "skip" the extrema of the loss function and/or prevent the optimization process of the loss function from converging.
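A minimal sketch of this update rule follows. The function name `step` and the `minimize` flag are hypothetical; subtracting corresponds to minimizing the loss function and adding to maximizing it.

```python
def step(z, estimated_gradient, learning_rate, minimize=True):
    """Move z along the estimated gradient, scaled by the learning rate."""
    sign = -1.0 if minimize else 1.0
    return [zj + sign * learning_rate * gj
            for zj, gj in zip(z, estimated_gradient)]


z = [1.0, 2.0]
g = [0.5, -0.5]
z_min = step(z, g, learning_rate=0.1)                   # descend
z_max = step(z, g, learning_rate=0.1, minimize=False)   # ascend
```

The learning rate directly scales the distance moved, so it is the knob that trades off escaping a saturated region against overshooting an extremum.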

[0030] The method may further include: the classification model being a discriminator of the generative adversarial network. The features mentioned in this paragraph, combined with any of the first through seventh examples, provide an eighth example.

[0031] This method is particularly suitable for discriminators in Generative Adversarial Networks (GANs). GAN discriminators perform binary classification tasks and are therefore often trained using a cross-entropy loss function, which expects an activation function modeling a Bernoulli probability distribution, such as tanh or sigmoid, and such activation functions are known for their strong saturation at both ends. Therefore, when the classification model used in this method is a GAN discriminator, the gradient of the loss function may saturate, i.e., the gradient may be zero or less than a predefined threshold.
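The saturation of a sigmoid activation can be made concrete with a short numerical check. This sketch only evaluates the derivative of the logistic function in double precision; it is not tied to any particular discriminator architecture.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_derivative(x):
    """d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# The derivative peaks at 0.25 for x = 0 and collapses for large |x|.
for x in (0.0, 5.0, 20.0, 40.0):
    print(f"x = {x:5.1f}   sigmoid'(x) = {sigmoid_derivative(x):.3e}")
```

At `x = 40` the value of `sigmoid(x)` rounds to exactly `1.0` in double precision, so the computed derivative is exactly zero and any gradient that passes through this activation vanishes.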

[0032] The method may further include: the generative model being one of the following: a decoder portion of an autoencoder or variational autoencoder, a generative portion of a generative adversarial network, a flow model, and an autoregressive model. The features mentioned in this paragraph, combined with any of the first through eighth examples, provide a ninth example.

[0033] The method may further include: the synthetic data instance being a synthetic image. The features mentioned in this paragraph, combined with any of the first through ninth examples, provide a tenth example.

[0034] The application of methods for synthesizing images can, for example, allow for the examination of object classification in autonomous driving or visual inspection systems. In particular, it can allow for meaningful interpretations of object classification in autonomous driving and visual inspection systems, leading to a better understanding of how sensitive the classification is to changes in the input data. This information can then be used as part of the verification and validation processes performed on the object classification.

[0035] A computer-implemented method for training a neural network may include: providing training sensor data samples to form a training dataset; training the neural network using the training dataset; generating one or more synthetic data instances according to any one of the first to tenth examples using a generative model (which may be an additional neural network) trained on the training dataset; adding the one or more generated synthetic data instances to the training dataset to obtain an expanded training dataset; and training the neural network and/or other neural networks and/or image classifiers, particularly an image classifier including one of the neural networks, using the expanded training dataset. Such a neural network may be a convolutional neural network. The computer-implemented training method mentioned in this paragraph provides an eleventh example.

[0036] The training method in the eleventh example enables the refinement of existing models / datasets to obtain better trained and more robust neural networks, or to train new models / neural networks.

[0037] An apparatus may include an actuator and a controller, the controller being configured to control the actuator using classification results from a neural network trained according to the eleventh example. The apparatus mentioned in this paragraph provides a twelfth example.

[0038] Such devices enable more robust classification of input sensor data, and thus more reliable control of actuators. For example, in the context of autonomous driving, robust object classification can be achieved, leading to reliable vehicle control.

[0039] A device can be configured to perform a computer-implemented method of any of the first through eleventh examples. The device mentioned in this paragraph provides a thirteenth example.

[0040] A computer program may include instructions arranged to cause a processor system to perform a computer-implemented method of any one of the first through eleventh examples. The computer program mentioned in this paragraph provides a fourteenth example.

[0041] A computer-readable medium may include transitory or non-transitory data representing instructions arranged to cause a processor system to perform a computer-implemented method of any one of the first through eleventh examples. The computer-readable medium mentioned in this paragraph provides a fifteenth example.

[0042] In the accompanying drawings, the same reference numerals are used throughout different views and generally refer to the same parts. The drawings are not necessarily to scale, and the emphasis is usually placed instead on illustrating the principles of the invention. In the following description, various aspects are described with reference to the following drawings, wherein:

[0043] Figure 1 shows an exemplary visual inspection system for detecting defective parts;

[0044] Figure 2 shows an example of object classification in the context of autonomous driving;

[0045] Figure 3 shows an example of a generative adversarial network;

[0046] Figure 4 shows the update of the input code for the generator of the generative adversarial network;

[0047] Figure 5 shows a flowchart illustrating an exemplary method for gradient estimation.

[0048] The following detailed description references the accompanying drawings, which show, by way of illustration, specific details and aspects of the present disclosure in which the invention may be practiced. Other aspects may be utilized, and structural, logical, and electrical changes may be made, without departing from the scope of the invention. The various aspects of the present disclosure are not necessarily mutually exclusive, as some aspects of the present disclosure may be combined with one or more other aspects of the present disclosure to form new aspects.

[0049] The various examples will be described in more detail below.

[0050] Figure 1 shows an inspection system 100, illustrating an example of detecting defective parts.

[0051] In the example of Figure 1, a component 101 is positioned on an assembly line 102.

[0052] The controller 103 includes data processing components, such as a processor (e.g., a CPU (central processing unit)) 104 and a memory 105 for storing the control software according to which the controller 103 operates and the data on which the processor 104 operates.

[0053] In this example, the stored control software includes instructions that, when executed by processor 104, cause processor 104 to implement visual inspection system 106.

[0054] The visual inspection system 106 includes a classification model 107, which comprises at least one neural network and is capable of classifying input sensor data provided to it. In particular, the classification model may be a binary classifier.

[0055] The data stored in memory 105 may include, for example, image data from one or more image sources (sensors) 108 (e.g., cameras). An image may include a collection of data representing one or more objects or patterns. The one or more image sources 108 may, for example, output one or more grayscale or color images of each of components 101. The one or more image sources 108 may be responsive to visible or invisible light, such as, for example, infrared or ultraviolet light, ultrasonic or radar waves, or other electromagnetic or acoustic signals.

[0056] It should be noted that image classification can be considered equivalent to the classification of objects shown in the image. If the original image shows multiple objects or patterns, segmentation can be performed (possibly via an additional neural network) such that each segment represents an object or pattern, and these segments are used as input to an image classification neural network.

[0057] The controller 103 can determine a defect in one of the components 101 based on image data from the one or more image sources 108. For example, the classification model 107 can be a binary classifier, and if the component meets all quality criteria, it is classified as "qualified"; otherwise, if the component does not meet at least one quality criterion, it can be classified as "unqualified".

[0058] The visual inspection system 106 can further provide an interpretation for the "unqualified" classification of component 101. This interpretation can be constrained by design to a discrete set of possibilities, such as physical defects in a specific location of the component or changes in control conditions, such as lighting during manufacturing processes.

[0059] Based on the interpretation provided by the visual inspection system 106, the controller 103 can determine that component 101 is defective. The controller 103 can then send a feedback signal 109 to the error handling module 110. The feedback signal 109 contains an explanation, i.e., information indicating why component 101 was determined to be defective (such as characteristics of the component).

[0060] The error handling module 110 can then adapt the manufacturing process accordingly, for example, using the feedback signal 109. For instance, when an interpretation shows that a part has a specific physical defect in a particular location, the error handling module 110 can modify the operating parameters of the manufacturing process, such as the applied pressure, heat, welding time, etc., to reduce the risk of such failures.

[0061] If the interpretation generated by the visual inspection system 106 is unexpected or critical according to predefined (and / or user-defined) criteria, the error handling module 110 can control the manufacturing process to operate in a safe mode after receiving the feedback signal 109.

[0062] Furthermore, this interpretation can be used to determine how an image classified as "unqualified" would have to be modified for the classification model to classify it as "qualified." This information can then be used with a generative model (not shown in Figure 1) to generate new data samples, which can be incorporated into future training sets to refine the visual inspection system 106 or to train other models/systems.

[0063] Furthermore, a system such as the one illustrated in Figure 1 can be trained to learn the optimal operating parameter settings for manufacturing processes by learning the correspondence between interpretations and operating parameter settings.

[0064] A system similar to the one illustrated in Figure 1 can be used in other technological fields, such as in access control systems, computer-controlled machines such as robots, household appliances, power tools, or personal assistants.

[0065] Figure 2 shows an example 200 of object detection in an autonomous driving scenario.

[0066] In the example of Figure 2, a vehicle 201, such as a car, truck, or motorcycle, is provided with a vehicle controller 202.

[0067] The vehicle controller 202 includes data processing components, such as a processor (e.g., a CPU (central processing unit)) 203 and a memory 204 for storing the control software according to which the vehicle controller 202 operates and the data on which the processor 203 operates.

[0068] In this example, the stored control software includes instructions that, when executed by processor 203, cause the processor to implement an object detection system 205 including a classification model 206, which includes at least one neural network and is capable of performing classification of input sensor data provided to it.

[0069] The data stored in memory 204 may include input sensor data from one or more sensors 207. For example, the one or more sensors 207 may be one or more cameras that acquire images.

[0070] An image may include a collection of data representing one or more objects or patterns. One or more sensors (e.g., cameras) may, for example, output grayscale or color images of the vehicle's environment. One or more sensors 207 may be responsive to visible or invisible light, such as infrared or ultraviolet light, ultrasonic or radar waves, or other electromagnetic or acoustic signals. For example, the one or more sensors 207 may output radar sensor data that measures the distance to objects in front of and / or behind the vehicle 201.

[0071] In this example, we assume that the input sensor data is input image data.

[0072] It should be noted that image classification can be considered equivalent to the classification of objects shown in the image. If the original image shows multiple objects or patterns, segmentation can be performed (possibly via an additional neural network) such that each segment represents an object or pattern, and these segments are used as input to an image classification neural network.

[0073] Based on the input image data and by using classification model 206, the control software can determine the presence of objects, such as stationary objects like traffic signs or road markings, and / or moving objects like pedestrians, animals, and other vehicles.

[0074] Vehicle 201 can be controlled by vehicle controller 202 based on the determination of the presence of the object. For example, vehicle controller 202 can control actuator 208 to control the speed of the vehicle, such as actuating the vehicle's brakes, or can prompt a human driver to take over control of vehicle 201.

[0075] Before systems such as the inspection system 100 shown in Figure 1 or the object detection system 205 shown in Figure 2 are deployed in the field, they must be tested and investigated for their reliability in classifying objects (parts). In particular, the robustness and reliability of the classification model/neural network must be ensured. To this end, it should be possible to examine the reasons behind the predictions output by one or more classification models, where meaningful classification reasons appear reasonable and understandable to a human operator.

[0076] Generally speaking, the operation of these systems should be thoroughly investigated, especially considering security and reliability. In other words, such systems must be verified and validated before deployment.

[0077] Therefore, in order to obtain a meaningful interpretation of the output of the classification model, the classification model can be investigated with the help of a generative model trained on the same dataset (i.e., the same dataset used for training / validation of the classification model).

[0078] The trained generative model is then used in conjunction with the classification model to investigate the reliability of the classification model.

[0079] Various generative models can be used. For example, a generative model can be part of a generative adversarial network (GAN), can include the decoder part of an autoencoder (e.g., a variational autoencoder (VAE)), can be a flow model, or can be an autoregressive model.

[0080] The following text will describe examples of generative models as part of GANs in more detail.

[0081] Figure 3 shows an exemplary generative adversarial network (GAN) 300; in particular, Figure 3 shows an example of training a GAN.

[0082] In the example of Figure 3, the training dataset 301 contains training images 302, all of which are of the same size. It should be noted that, in general, training datasets can contain other types of data, such as audio, radar, sonar, lidar, or motion data, and the training data (images) do not necessarily contain samples (images) of the same size.

[0083] The generative neural network 303 receives an input code 304, which is, for example, an instance or value of a vector z that is randomly or pseudo-randomly generated following a predefined probability distribution (e.g., a uniform or normal distribution). Different input codes 304 can be provided to the generative neural network during training.
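Sampling such an input code can be sketched as follows. The function name `sample_input_code`, the standard normal parameters, and the uniform range of -1 to 1 are assumptions for illustration; the text only requires a predefined probability distribution.

```python
import random


def sample_input_code(dim, distribution="normal", rng=None):
    """Draw a pseudo-random latent vector z of length dim."""
    rng = rng or random.Random()
    if distribution == "normal":
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]
    if distribution == "uniform":
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    raise ValueError("distribution must be 'normal' or 'uniform'")


rng = random.Random(0)  # seeded so that the same codes can be regenerated
z_normal = sample_input_code(100, "normal", rng)
z_uniform = sample_input_code(100, "uniform", rng)
```

Passing an explicitly seeded generator makes it possible to reproduce a given input code later, which is convenient when a specific synthetic instance needs to be regenerated.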

[0084] Based on the input code 304, the generator (short for "generative neural network") 303 generates a synthetic instance (image) 305 of the same size as the training image 302. In other words, the generator 303 maps the input code to the image space.

[0085] The space from which the input code is sampled is called the generator's latent space. Taken in itself and in isolation, the latent space can be considered meaningless / structureless. The generator maps this latent space to the image space, and this mapping can be seen as providing meaning / structure to the latent space. Therefore, the input code can also be called a latent space representation.

[0086] For example, the structure of the latent space given by the generator can be explored by interpolating and performing vector operations between points (vectors) in the latent space and observing the (meaningful) effects on the output synthetic image.
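As an illustration of such an exploration, the following minimal Python sketch linearly interpolates between two latent vectors. The function name `interpolate_latent` and the choice of linear (rather than, for example, spherical) interpolation are assumptions made for illustration only. Each interpolated code would then be passed to the generator in order to observe the effect on the synthetic image.

```python
from typing import List


def interpolate_latent(z_a: List[float], z_b: List[float], steps: int) -> List[List[float]]:
    """Return `steps` latent codes evenly spaced on the line from z_a to z_b."""
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same dimension")
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for s in range(steps):
        t = s / (steps - 1)  # interpolation weight: 0.0 at z_a, 1.0 at z_b
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


# Hypothetical 3-dimensional latent codes, for illustration only.
path = interpolate_latent([0.0, 1.0, -1.0], [1.0, 0.0, 1.0], steps=5)
```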

[0087] Therefore, the latent space representation / input code can be viewed as a compressed representation of the final synthesized image output by the generator.

[0088] For example, based on input code 304, generator 303 can be trained to generate synthetic image 305 similar to the image included in training dataset 301.

[0089] Input codes 304 typically have a smaller dimension than the images (instances) they generate, for example, at most 1% or 0.1% of the number of parameters (features) of the generated images, and therefore at most 1% or 0.1% of the number of parameters (features) of the images included in training dataset 301.

[0090] As a specific example, the synthesized image 305 can be an RGB image of size 256×256×3, and the input code 304 can be a vector with 100 values, that is, the latent space representation of the synthesized image 305 can consist of only 100 features.
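The proportion in this example can be checked with a few lines of Python. The variable names are illustrative only.

```python
# Number of values in a 256 x 256 RGB image versus a 100-value input code.
image_features = 256 * 256 * 3
latent_features = 100

# Fraction of the image's feature count taken up by the latent representation.
ratio = latent_features / image_features
```

Here the input code has roughly 0.05% of the number of features of the image, which is within the 0.1% bound mentioned above.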

[0091] By training the generator 303 to optimally reproduce the training dataset 301, semantic meaning can be attributed to features of the latent space representation. For example, if the generator is used to generate synthetic images of traffic conditions, specific features can indicate the presence of traffic lights and the colors they display, vehicle types, weather conditions, etc.

[0092] During training, the discriminative neural network 306 is supplied with one or more training images, such as a batch of training images 302 from training dataset 301, and one or more synthetic images, such as a batch of synthetic images 305 from generator 303.

[0093] Then, a discriminator (short for "discriminative neural network") 306 is trained to distinguish between the synthetic image generated by the generator 303 and the image contained in the training dataset 301. In other words, the discriminator 306 is trained to maximize the probability of assigning the correct label to both the image from the training dataset 301 and the synthetic image 305 generated by the generator 303.

[0094] The generator 303 is trained to maximize the probability that the discriminator 306 will assign the wrong label to the synthetic image 305, i.e. it is adversarially trained to try to deceive the discriminator 306.

[0095] The discriminator 306 can be trained separately from the generator 303, but typically both are trained on the same or similar training dataset, such as training dataset 301. For example, they can be trained repeatedly and alternately.

[0096] The discriminator 306 then outputs a classification score 307 for each image, received from the training dataset and / or from the generator.

[0097] The classification score 307 can be a probability value, but it can also be a more general representation of probability; for example, a higher score corresponds to a higher probability.

[0098] Typically, a classification score 307 indicating a probability value of 1 indicates that the input image has been classified by the discriminator 306 as an image from the training dataset 301, and a score indicating a probability value of 0 indicates that the input image has been classified as a synthetic image 305 generated using the generator 303.

[0099] The classification score 307 is then used in the objective function 308, which is often referred to as the loss, value, or cost function.

[0100] In GAN, the corresponding objective function should reflect the distance between the distribution of the data generated by generator 303 and the distribution of the data contained in training dataset 301.

[0101] In the context of GAN training, the objective function 308 is given by, for example, the following equation (1):

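[0102] $$\min_G \max_D L(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \qquad (1)$$

[0103] In equation (1), D is the discriminator, G is the generator, and $L(D, G)$ is the loss function. $z$ is an input code following a distribution $p_z(z)$, $D(x)$ represents the probability that an image $x$ comes from the distribution $p_{data}$ of the ground-truth (training) data rather than from the generator distribution $p_g$, and $1 - D(G(z))$ represents the probability that an image $G(z)$ comes from the generator distribution $p_g$ rather than from the training data distribution $p_{data}$.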

[0104] Therefore, GAN training can be viewed as a two-player minimax game, where the performance of the two neural networks improves over time, and the goal is to find a Nash equilibrium. Ideally, at the end of training, $D(x) = 1/2$ for every image $x$, that is, the discriminator D cannot distinguish between the training images and the synthetic images generated by the generator G.

[0105] In 309, the weights / parameters of both the generator 303 and the discriminator 306 are updated based on the optimization of the objective function 308. For example, gradient ascent and backpropagation are used to maximize the objective of equation (1) with respect to the weights / parameters of the discriminator 306, and gradient descent and backpropagation are used to minimize the objective of equation (1) with respect to the weights / parameters of the generator 303.
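To make the two opposing goals concrete, the following Python sketch estimates the objective from batches of discriminator scores. It assumes that equation (1) is the standard GAN minimax objective, that is, the sum of the expected value of log D(x) over training images and the expected value of log(1 - D(G(z))) over input codes. The function name `gan_objective` and the example scores are hypothetical.

```python
import math
from typing import Sequence


def gan_objective(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Batch estimate of E[log D(x)] + E[log(1 - D(G(z)))].

    real_scores: discriminator outputs D(x) for training images, in (0, 1).
    fake_scores: discriminator outputs D(G(z)) for synthetic images, in (0, 1).
    """
    real_term = sum(math.log(s) for s in real_scores) / len(real_scores)
    fake_term = sum(math.log(1.0 - s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


# A discriminator that cannot tell real from synthetic outputs 0.5 everywhere.
undecided = gan_objective([0.5, 0.5], [0.5, 0.5])
# A confident and correct discriminator obtains a higher objective value.
confident = gan_objective([0.9, 0.95], [0.1, 0.05])
```

Under this assumption, the discriminator update tries to push the value up (towards `confident`), while the generator update tries to push it down by raising the scores of its synthetic images.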

[0106] It should be noted that other objective functions can be used to train GANs.

[0107] Once a GAN has been trained, the generator can be used, for example, to create synthetic instances / samples to augment the (training) dataset.

[0108] To explain and understand what a trained GAN generator has learned, it is important to understand how the generator shapes its latent space. However, exhaustive search through the latent space of a trained generator is typically difficult to handle due to the high dimensionality of the latent space.

[0109] Therefore, methods known in the art, such as interpolation through the latent space, have been developed to examine how a generator constructs or shapes the latent space.

[0110] Conditioning the search through the latent space with the output of the discriminator (e.g., discriminator 306) is a way to ensure that the paths taken through the latent space produce meaningful and realistic samples / instances (e.g., synthetic image 305) – that is, samples / instances that show how well the model translates between semantic concepts.

[0111] As mentioned earlier, trained GAN generators can also be used in the field of interpretable AI to enhance the understanding of the outputs of deep neural networks, especially classification models.

[0112] For example, a generator from a GAN can be used to try to find a meaningful interpretation of the output of a classification model, where the generator of the GAN is trained on the same dataset as the classification model.

[0113] The trained GAN generator is then used with the classification model to try to find a meaningful interpretation of the classification model's output.

[0114] In particular, in order to find these meaningful explanations, counterfactuals of synthetic data samples are generated using the trained generator of the GAN.

[0115] Counterfactual data samples are data samples that, according to a classification model, have different classifications and / or significantly different classification scores (e.g., classification scores falling below a predetermined threshold) compared to the corresponding (synthetic) data samples.

[0116] It should be noted that images of the data samples will be referenced in the following text; however, this does not preclude the use and / or generation of other types of data samples.

[0117] The generated counterfactual should ideally have a small amount of meaningful (pixel) difference, that is, a difference that is associated with the image portion representing the semantic object or part—in other words, with the image portion that a human operator would likely view when classifying the object. Therefore, a meaningful difference should appear to a human operator as a plausible justification for the change in classification and / or the change in classification score.

[0118] Generating counterfactuals for synthetic images can be achieved, in particular, by using a perturbation mask, where the mask indicates perturbations that cause the classification model to classify the synthetic image differently. The perturbation mask is applied to the latent space representation of a generated synthetic image. This causes the generator to produce a new synthetic image that is classified differently by the classification model. In other words, the mask causes the generator to produce a counterfactual of the corresponding synthetic image.

[0119] It should be noted that the newly generated synthetic image does not necessarily have to be counterfactual and / or classified differently; it can be classified as the same, but with low certainty, and it may be close to the decision boundary of the classification model, etc.

[0120] Finding a suitable (small), ideally minimal, perturbation mask that has a significant impact on the classification of synthetic images helps in understanding the properties of classification models, particularly identifying the critical decision boundaries of the model—the non-smooth, typically overfitting decision boundaries where predictions are neither uniform nor predictable. In other words, it leads to a better understanding of how sensitive the classification model is to changes in its input.

[0121] Then, this investigation into the operation of the classification model can be further developed and quantified into validation and verification metrics, particularly into the assessment of the reliability of the classification model.

[0122] In practice, both optimization approaches, namely conditioning the search through the latent space on the discriminator's output and using a GAN's generator alongside a classification model to try to find a meaningful interpretation of the classification results, require computing gradients through the generator G based on the response of the discriminator D (correspondingly, the classification model D') to the output of the generator G. In other words, both cases require computing the gradient, with respect to the input code $z$, of the loss (objective) function to be optimized (e.g., maximized or minimized), specifically of the loss function for $D(G(z))$ (correspondingly, $D'(G(z))$) or of a loss function containing the term $D(G(z))$ (correspondingly, $D'(G(z))$).

[0123] However, when performing the required computations, such as gradient descent (ascent) and / or backpropagation, difficulties may arise when the gradient calculated with respect to $z$ is zero or close to zero, for example when it is less than a predetermined threshold.

[0124] This type of problem typically arises when the classification model and / or discriminator D has a sigmoid or tanh output function. For example, the discriminator D used in GANs—which performs a binary classification task by determining whether an input sample comes from the training dataset or is generated by the generator G—is often trained using a cross-entropy loss function, which expects an activation function that models the Bernoulli probability distribution, such as a tanh or sigmoid output (activation) function.

[0125] The tanh activation function will "squeeze" the output between [-1, 1] and the sigmoid activation function will "squeeze" the output between [0, 1]. Therefore, these types of output functions are severely saturated at one or both ends, meaning that the gradient of these functions converges to zero near at least one of the endpoints.
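The saturation effect can be reproduced numerically. The following Python sketch evaluates the derivatives of the sigmoid and tanh functions at the centre of the curve and deep in the saturated region. The evaluation points 0 and 40 are arbitrary choices for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


# Gradients where the curves are steepest ...
grad_centre = (sigmoid_grad(0.0), tanh_grad(0.0))
# ... and deep in the saturated region, where both round to zero in
# double-precision floating point.
grad_saturated = (sigmoid_grad(40.0), tanh_grad(40.0))
```

At the saturated point, any gradient that is backpropagated through the output function is multiplied by zero, so no update direction for the input code can be derived from it.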

[0126] This type of problem can also occur during GAN training, known as the "generator gradient vanishing" problem. One possible solution during GAN training is to use different batches of input data for gradient calculation. This increases training time, but at least makes it possible to continue training even when the discriminator saturates.

[0127] However, this is not feasible when the search is conditioned on the output of a discriminator (e.g., a trained discriminator of a GAN), or when an auxiliary (optimization) model with a saturating output activation is used on top of the generator output, as in the aforementioned method of finding a meaningful interpretation of the classification model's output by applying a perturbation mask to the latent space representation. In such cases, a gradient equal to or close to zero may mean that the entire process (computation) cannot proceed. For example, unlike during the training of a GAN, it would then be impossible to find a meaningful explanation of why a particular image is classified in a certain way by the classification model.

[0128] It should be noted that systems such as the inspection system 100 shown in Figure 1 for detecting defective parts can have very strict acceptance criteria. This can lead to extreme prediction scores, corresponding to scores with very high (close to one) or very low (close to zero) probabilities.

[0129] For example, with the output function tanh or sigmoid, the system can often predict a "perfect" score with a probability of zero (or very close to zero) when a part has been classified as "unqualified", and the system can often predict a "perfect" score with a probability of one (or very close to one) when a part has been classified as "qualified".

[0130] It should be noted that these “perfect” scores should not be interpreted as an indication of how accurate the model is, but rather as a result of very high acceptance criteria imposed upon it.

[0131] When searching for a meaningful interpretation of the classification scores of the parts inspected by inspection system 100, the perfect scores predicted by a system such as inspection system 100 will lead to the vanishing gradient problem described above for the gradient with respect to z.

[0132] According to various embodiments, in such cases, the modified gradient value is used to make it possible to continue the computation and optimization of the loss function, i.e., to continue optimization processes such as interpolation, search for meaningful interpretations, and / or search for the minimum perturbation mask.

[0133] It should be noted that using modified gradient values during GAN training can also be advantageous, thereby enabling more efficient and robust training of GANs in cases where the discriminator saturates.

[0134] Ideally, the modified gradient value relative to z should remain close to the actual gradient value, for example, differing from the actual gradient value below a predetermined threshold. Furthermore, it should indicate the correct gradient direction on which the optimization of the loss function should continue.

[0135] Figure 4 illustrates the update of the input code for the generator of a generative adversarial network.

[0136] In particular, Figure 4 shows how the underlying distribution of the input code z is changed in order to modify the gradient with respect to z of, for example, the loss function for $D(G(z))$ or for $D'(G(z))$.

[0137] The dashed line 401 represents the boundary between the processing used to update the input code z and the external input / parameters.

[0138] Input code 402 is a k-dimensional input code z, also known as an input vector, latent vector, or latent space representation. The input code z can be sampled to follow a predefined probability distribution, such as a uniform or Gaussian probability distribution, or it can be an intermediate latent space representation of an optimization process (such as the optimization process described above).

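[0139] In step 403, N variations of the input code z are generated by adding noise to the vector z. In this example, the noise is Gaussian noise following an $\mathcal{N}(0, \sigma^2)$ distribution. This results in N values $z_i = z + \varepsilon_i$, where $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ and $i = 1, \ldots, N$.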
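Step 403 can be sketched in a few lines of Python. The sketch assumes zero-mean Gaussian noise that is drawn independently for every component of z, with standard deviation `sigma` (the square root of the variance). The function name `noisy_variations` and the example values are hypothetical.

```python
import random
from typing import List


def noisy_variations(z: List[float], n: int, sigma: float,
                     rng: random.Random) -> List[List[float]]:
    """Return n copies of z, each perturbed by independent N(0, sigma^2) noise."""
    return [[z_j + rng.gauss(0.0, sigma) for z_j in z] for _ in range(n)]


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
z = [0.2, -1.3, 0.7]    # hypothetical 3-dimensional input code
variations = noisy_variations(z, n=8, sigma=0.1, rng=rng)
```

Each of the resulting N input codes can then be passed through the generator and the discriminator (or classification model) to obtain one gradient per variation.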

[0140] It should be noted that other distributions concentrated around their mean can also be used for the noise added to the vector z, such as, for example, the hyperbolic secant distribution, the Laplace distribution, or Student's t distribution.

[0141] In 404, the value of the variance $\sigma^2$ used in step 403 is predefined, for example provided or set by the user of the model.

[0142] A good value for the variance $\sigma^2$ can be found experimentally. It should be set as small as possible while ensuring that the gradient values calculated in the following step 407 are not all zero. The smaller the variance of the added Gaussian noise, the closer the gradient of $D(G(z_i))$ (or $D'(G(z_i))$) with respect to $z_i$ will be to the gradient of $D(G(z))$ (or $D'(G(z))$) with respect to z. However, the variance must be large enough that the gradient / weight updates are not too small. Therefore, the variance should be large enough to enable optimization processes such as the optimization of the loss function. For example, it should be large enough to continue interpolation processes, the search for meaningful interpretations, and / or the search for the minimum perturbation mask.

[0143] Reference numeral 405 denotes the generator G of the GAN. In this example, the generator G maps the k-dimensional input code z to the image space, generating a synthetic image $G(z)$:

[0144] $G: \mathbb{R}^k \ni z \mapsto G(z)$.

[0145] Reference numeral 406 denotes the discriminator D of the GAN or, alternatively, the classification model D'. In this example, the discriminator D of the GAN maps an image of the size generated by the generator to a single output:

[0146] $D: G(z) \mapsto D(G(z)) \in \mathbb{R}$.

[0147] The classification model D' typically maps an image of the same size to several different outputs (categories):

[0148] $D': G(z) \mapsto D'(G(z))$.

[0149] During the training of a GAN, the weights / parameters of the generator G can be updated by following the gradient direction that minimizes the loss for $D(G(z))$. The combined model, in which the discriminator D is stacked on top of the generator G, is called the critic C:

[0150] $C = D \circ G$, that is, $C(z) = D(G(z))$.

[0151] Alternatively, the classification model D' outputs a classification score based on the generated input and is "stacked" on top of the generator G.

[0152] The resulting model C' has a mapping similar to that of model C:

[0153] $C' = D' \circ G$, that is, $C'(z) = D'(G(z))$.

[0154] As explained above, calculating the gradient of the output of model C or C' with respect to the input code z, that is, the gradient of $D(G(z))$ or $D'(G(z))$ with respect to z, can result in a gradient of zero (or close to zero).

[0155] Therefore, when the gradient of C with respect to z is zero or close to zero, for example less than a predetermined threshold, the gradients at the variations of the input code z are calculated in step 407, that is, for each noisy variation $z_i$ of the input code the gradient $\nabla_{z_i} C(z_i)$ is calculated. This results in N gradient values $\nabla_{z_i} C(z_i)$, $i = 1, \ldots, N$.

[0156] When classifier D' is used instead of discriminator D, the same calculation is performed for cases where the gradient of C' with respect to z is zero or close to zero, such as less than a predetermined threshold.

[0157] It should be noted that the variance of the noise should be selected in such a way that at least one of the calculated gradient values is different from zero.

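[0158] In step 408, the arithmetic mean of the N gradient values calculated in step 407 is calculated, that is, the following value is calculated:

[0159] $\bar{g} = \frac{1}{N} \sum_{i=1}^{N} \nabla_{z_i} C(z_i)$, where $z_1, \ldots, z_N$ are the N noisy variations of the input code z generated in step 403.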

[0160] Steps 403, 407, and 408 can be repeated (executed again) until a suitable gradient estimate is found, for example until the gradient estimate is greater than the predefined threshold.

[0161] It should be noted that the noise added to the input code z can vary when the steps are repeated. In particular, the variance of the noise can be chosen differently when the steps are repeated.

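[0162] In step 409, the input code z is updated to the new input code z' in the following manner:

[0163] $z' = z - \lambda \, \bar{g}$, where $\bar{g}$ denotes the gradient estimate calculated in step 408.

[0164] The parameter $\lambda$ (410) can be viewed as the step size. Specifically, $\lambda$ can be a predefined learning rate.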

[0165] In step 411, the updated input code z' is output. This code can then be used, for example, by the generator G to generate synthetic data instances (images).
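The update of Figure 4 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the implementation of the embodiment: the critic is a hypothetical toy model (a sigmoid applied to a weighted sum of the input code) whose output is itself the quantity to be minimized, the noise is zero-mean Gaussian, and the update is a gradient descent step in which `step_size` plays the role of parameter 410. All function and variable names are illustrative.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
W = [3.0, 3.0]  # hypothetical weights of a toy critic C(z) = sigmoid(W . z)


def critic(z: Vector) -> float:
    """Toy critic: numerically stable sigmoid of the weighted sum of z."""
    a = sum(w * x for w, x in zip(W, z))
    if a >= 0.0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def critic_grad(z: Vector) -> Vector:
    """Analytic gradient of the toy critic with respect to z."""
    s = critic(z)
    return [s * (1.0 - s) * w for w in W]


def norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def estimate_gradient(grad_fn: Callable[[Vector], Vector], z: Vector, n: int,
                      sigma: float, rng: random.Random) -> Vector:
    """Average the gradients at n noisy copies of z."""
    total = [0.0] * len(z)
    for _ in range(n):
        z_i = [x + rng.gauss(0.0, sigma) for x in z]   # step 403
        g_i = grad_fn(z_i)                               # step 407
        total = [t + g for t, g in zip(total, g_i)]
    return [t / n for t in total]                        # step 408


def update_input_code(grad_fn: Callable[[Vector], Vector], z: Vector,
                      threshold: float, step_size: float, n: int,
                      sigma: float, rng: random.Random) -> Vector:
    """Return z - step_size * g (step 409), where g is the true gradient if its
    norm exceeds the threshold and the noise-averaged estimate otherwise."""
    g = grad_fn(z)
    if norm(g) <= threshold:
        g = estimate_gradient(grad_fn, z, n, sigma, rng)
    return [x - step_size * gx for x, gx in zip(z, g)]


rng = random.Random(0)
z = [2.0, 2.0]              # W . z = 12, so the sigmoid is close to saturation
true_grad = critic_grad(z)  # norm of about 2.6e-5, below the threshold used here
est_grad = estimate_gradient(critic_grad, z, n=200, sigma=1.0, rng=rng)
z_new = update_input_code(critic_grad, z, threshold=1e-4, step_size=1.0,
                          n=200, sigma=1.0, rng=rng)
```

In this toy setting the averaged gradient is considerably larger than the true gradient at z, because some of the noisy copies land closer to the steep part of the sigmoid, so the update moves the input code by a noticeable amount and the critic output decreases.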

[0166] It should be noted that the updated input code z' is not necessarily the final input code of the optimization process, that is, the input code resulting from the optimization process that minimizes the loss function. The final input code is typically the code used to generate a minimal and suitable perturbation mask.

[0167] Therefore, in cases where the gradient of $D(G(z))$ (or $D'(G(z))$) with respect to z is zero or close to zero, adding controlled noise to the input code z makes it possible to use the gradient evaluated at the noise-perturbed input codes instead of the true gradient of $D(G(z))$ (or $D'(G(z))$).

[0168] It should be noted that the method illustrated in Figure 4 can also be used during the training of GANs to provide updated input codes and estimated gradients when the discriminator saturates, thus enabling more efficient and robust training of GANs.

[0169] A further advantage of updating the input code according to the above method steps is that the method is adaptive; that is, near the gradient saturation points on the tanh and sigmoid curves (which are the typical output functions of the discriminator D), the estimated gradient will be biased towards the gradient at points far from the saturation points. Furthermore, the estimation is unbiased when the tanh and sigmoid curves reach their maximum gradient values.

[0170] In summary, according to various embodiments, a computer-implemented method 500 for generating synthetic data instances using a generative model is provided, as illustrated in Figure 5.

[0171] In step 501, the generative model generates a synthetic data instance for input variable values of input variables supplied to the generative model.

[0172] In step 502, the synthetic data instance is classified by a classification model to generate a classification result.

[0173] In step 503, a loss function value of a loss function is determined for the classification result, the loss function evaluating the classification result.

[0174] In step 504, the gradient of the loss function with respect to the input variables is determined at the input variable values.

[0175] In step 505, if the absolute value of the gradient is less than or equal to a predefined threshold, the input variable values are modified to generate a plurality of modified input variable values.

[0176] In step 506, for each of the plurality of modified input variable values, the gradient of the loss function with respect to the input variables is determined at the respective modified input variable value.

[0177] In step 507, the gradients of the loss function with respect to the input variables at the respective modified input variable values are combined to generate an estimated gradient at the input variable values.

[0178] In step 508, the input variable values are modified in the direction determined by the estimated gradient to generate a further input variable value.

[0179] In step 509, the generative model generates a further synthetic data instance for the further input variable value.
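Steps 501 to 509 can be combined into a search loop, sketched below in plain Python. Everything model-specific is an assumption made for illustration: the "generator" is a hypothetical linear map from a 2-dimensional input code to three "pixels", the "classification model" is a sigmoid on top of it, the loss is simply the classification score, the search stops once the score drops below 0.5, and each update moves a fixed distance against the (estimated) gradient direction. The noise level is doubled whenever the estimate is still at or below the threshold.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy generator matrix (hypothetical)
V = [1.0, 1.0, 1.0]                        # toy classifier weights (hypothetical)


def generate(z: Vector) -> Vector:
    """Steps 501 and 509: synthetic instance for input variable values z."""
    return [sum(a * x for a, x in zip(row, z)) for row in A]


def classify(image: Vector) -> float:
    """Step 502: classification score in [0, 1] (numerically stable sigmoid)."""
    a = sum(v * p for v, p in zip(V, image))
    if a >= 0.0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def loss_and_grad(z: Vector) -> Tuple[float, Vector]:
    """Steps 503 and 504: loss (here the score) and its gradient w.r.t. z."""
    s = classify(generate(z))
    a_t_v = [sum(A[i][j] * V[i] for i in range(len(A))) for j in range(len(z))]
    return s, [s * (1.0 - s) * c for c in a_t_v]


def norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def estimated_grad(z: Vector, n: int, sigma: float, rng: random.Random) -> Vector:
    """Steps 505 to 507: average the gradients at n noisy copies of z."""
    total = [0.0] * len(z)
    for _ in range(n):
        z_i = [x + rng.gauss(0.0, sigma) for x in z]
        total = [t + g for t, g in zip(total, loss_and_grad(z_i)[1])]
    return [t / n for t in total]


def search(z: Vector, threshold: float = 1e-12, step: float = 0.25, n: int = 100,
           sigma: float = 1.0, max_iters: int = 200, seed: int = 0) -> Vector:
    rng = random.Random(seed)
    for _ in range(max_iters):
        loss, g = loss_and_grad(z)
        if loss < 0.5:  # assumed convergence criterion
            break
        s = sigma
        for _attempt in range(10):  # re-estimate with more noise if needed
            if norm(g) > threshold:
                break
            g = estimated_grad(z, n, s, rng)
            s *= 2.0
        g_norm = norm(g)
        if g_norm == 0.0:
            break  # no usable direction was found
        # Step 508: move a fixed distance against the (estimated) gradient.
        z = [x - step * gx / g_norm for x, gx in zip(z, g)]
    return z


z_start = [10.0, 10.0]                  # the score rounds to exactly 1.0 here
initial_grad = loss_and_grad(z_start)[1]
z_final = search(z_start)
image_final = generate(z_final)         # step 509: further synthetic instance
```

At the starting point the true gradient is exactly zero in double precision, so a plain gradient search could not leave it. With the noise-averaged estimate, the search obtains a direction and continues until the toy classifier's score for the generated instance falls below 0.5.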

[0180] The method of Figure 5 can be executed by one or more processors. The term "processor" can be understood as any type of entity that allows the processing of data or signals. For example, data or signals can be processed according to at least one (i.e., one or more) specific functions performed by the processor. A processor can include analog circuits, digital circuits, mixed-signal circuits, logic circuits, microprocessors, central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs), integrated circuits, or any combination thereof. Any other means of implementing the corresponding functions (described in more detail below) can also be understood as a processor or logic circuit. It should be understood that one or more of the method steps described in detail herein can be executed (e.g., implemented) by a processor through one or more specific functions performed by that processor.

[0181] The method of Figure 5 can be used to synthesize data corresponding to sensor signals from any sensor, that is, to operate on any type of input sensor data such as video, radar, lidar, ultrasound, and motion.

[0182] It should be noted that synthetic data is not limited to images, but can also correspond to any image-like data (e.g., data structured in the form of one or more two-dimensional or even higher-dimensional arrays), such as sound spectrograms, radar spectra, ultrasound images, etc. In addition, raw 1D data (e.g., audio) or 3D data (e.g., video or RGBD (red, green, blue, depth) data) can also be used as input.

[0183] Although specific embodiments have been illustrated and described herein, those skilled in the art will appreciate that various alternatives and / or equivalent implementations can be used instead of the specific embodiments shown and described without departing from the scope of the invention. This application is intended to cover any adaptations or variations of the specific embodiments discussed herein. Therefore, the invention is intended to be limited only by the claims and their equivalents.

Claims

1. A computer-implemented method for generating synthetic data instances using a generative model, the method comprising: generating, by the generative model, a synthetic data instance for input variable values of input variables supplied to the generative model; classifying, by a classification model, the synthetic data instance to generate a classification result; determining, for the classification result, a loss function value of a loss function, the loss function evaluating the classification result; determining a gradient of the loss function with respect to the input variables at the input variable values; if an absolute value of the gradient is less than or equal to a predefined threshold: modifying the input variable values to generate a plurality of modified input variable values, determining, for each modified input variable value of the plurality of modified input variable values, a gradient of the loss function with respect to the input variables at the respective modified input variable value, combining the gradients of the loss function with respect to the input variables at the respective modified input variable values to generate an estimated gradient at the input variable values, and modifying the input variable values in a direction determined by the estimated gradient to generate a further input variable value; and generating, by the generative model, a further synthetic data instance for the further input variable value; wherein the synthetic data instance is a synthetic image.

2. The method of claim 1, wherein combining the gradients of the loss function comprises summing or averaging the gradients of the loss function with respect to the input variables at the corresponding modified input variable values.

3. The method according to any one of claims 1 to 2, wherein modifying the input variable values to generate a plurality of modified input variable values includes adding noise to the input variable values.

4. The method according to any one of claims 1 to 3, further comprising: checking whether the estimated gradient is higher than the predefined threshold or a further predefined threshold; and, if the absolute value of the estimated gradient is less than or equal to the predefined threshold or the further predefined threshold, performing the following steps one or more times to modify the estimated gradient: modifying the input variable values to generate a plurality of modified input variable values; determining, for each of the plurality of modified input variable values, the gradient of the loss function with respect to the input variables at the respective modified input variable value; and combining the gradients of the loss function with respect to the input variables at the respective modified input variable values to generate an estimated gradient at the input variable values.

5. The method according to any one of claims 1 to 4, further comprising: searching for an optimum of the loss function, the search comprising evaluating the loss function along a sequence of input variable values, wherein, for each input variable value in the sequence of input variable values: the gradient of the loss function with respect to the input variables is determined at the input variable value; and, if the absolute value of the gradient is less than or equal to a predefined threshold: the input variable values are modified to generate a plurality of modified input variable values, the gradient of the loss function with respect to the input variables is determined, for each of the plurality of modified input variable values, at the respective modified input variable value, the gradients of the loss function with respect to the input variables at the respective modified input variable values are combined to generate an estimated gradient at the input variable values, and the input variable values are modified in the direction determined by the estimated gradient to generate a further input variable value; and stopping the search for the optimum when the loss function value at the last input variable value in the sequence of input variable values satisfies a predetermined convergence criterion.

6. The method according to any one of claims 1 to 5, wherein the input variables supplied to the generative model are latent space variables.

7. The method according to any one of claims 1 to 6, wherein the classification model is a discriminator of a generative adversarial network.

8. A computer-implemented method for training a neural network, comprising: providing training sensor data samples that form a training dataset; training the neural network using the training dataset; generating one or more synthetic data instances according to the method of any one of claims 1 to 7, using a generative model trained on the training dataset; adding the one or more generated synthetic data instances to the training dataset to obtain an augmented training dataset; and training the neural network and / or a further neural network and / or an image classifier using the augmented training dataset; wherein the synthetic data instance is a synthetic image.

9. An apparatus comprising: an actuator; and a controller configured to control the actuator using a classification result from a neural network trained according to claim 8.

10. An apparatus configured to perform a computer-implemented method according to any one of claims 1 to 8.

11. A computer program comprising instructions arranged to cause a processor system to perform a computer-implemented method according to any one of claims 1 to 8.

12. A computer-readable medium comprising transitory or non-transitory data representing instructions arranged to cause a processor system to perform a computer-implemented method according to any one of claims 1 to 8.