Training of a generator for producing realistic images with a semantic segmentation discriminator
By generating realistic images using generative adversarial networks and combining them with semantic segmentation training methods, the problem of insufficient training data in existing technologies is solved, enabling efficient training and accurate recognition of image classifiers under rare traffic conditions.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ROBERT BOSCH GMBH
- Filing Date
- 2021-08-20
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to effectively train image classifiers using synthetic image data, especially in situations where traffic conditions are rare, and the tedious manual labeling process leads to insufficient learning by the image classifiers.
A generative adversarial network (GAN) is used to generate realistic images. The generator is trained by combining a semantic map and a discriminator. The parameters of the generator and discriminator are optimized by semantic segmentation to generate realistic images to enrich the training dataset. The image classifier is trained by using the feedback of semantic segmentation.
The training dataset was expanded, improving the accuracy and precision of the image classifier under rare traffic conditions, reducing manual labeling work, and enhancing the semantic meaning recognition ability of the image classifier.
Figure CN116113989B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the training of a generator for realistic images, which in turn can be used to train an image classifier.
Background Technology
[0002] Approximately 90% of the information a human driver needs to navigate a vehicle in road traffic is visual. Therefore, it is absolutely essential for at least partial automation of driving vehicles to correctly evaluate the image data recorded while monitoring the vehicle's environment, regardless of the modality. Of particular importance for driving tasks is classifying the image data according to what traffic-related objects it contains, such as other road users, lane markings, obstacles, and traffic signs.
[0003] The corresponding image classifier must be trained using training images captured under various traffic conditions. Obtaining training images is relatively difficult and expensive. Traffic conditions that rarely occur in reality may result in insufficient representativeness in the dataset of training images, preventing the image classifier from optimally learning its accurate classification. Furthermore, a significant amount of manual work is required to "label" the training images (or their pixels) using the desired class association ("ground truth").
[0004] Therefore, synthetically generated training data, produced by a generator based on a Generative Adversarial Network (GAN), is also used. Such a generator for radar data is known from DE 10 2018 204494 B3.
Summary of the Invention
[0005] Within the scope of this invention, a method for training an image generator has been developed.
[0006] The concept of an image is not limited to still camera images, but also includes, for example, video images, radar images, lidar images, and ultrasound images.
[0007] The images to be generated may be realistic, in particular with regard to a predefined application. In this context, "realistic" may specifically mean that these images can be used in downstream processing, such as the training of an image classifier, in the same way as images captured by physical sensors. The realistically generated images can be used, for example, to enrich a stock of real, sensor-captured, and subsequently "labeled" training images for the image classifier. For better readability, the images to be generated are therefore referred to below as "realistic images" or "realistically generated images".
[0008] The generator produces realistic images from a semantic map. This semantic map associates each pixel of the realistic image to be generated with the semantic meaning of the object to which that pixel belongs. The generated image is therefore not an arbitrary realistic image, but an image reflecting the situation specified in the semantic map. The semantic map can, for example, describe a traffic situation with different lanes, lane boundaries, traffic signs, road users, and other objects.
[0009] In this method, real training images and associated semantic training maps are provided, which associate each pixel of the respective training image with a semantic meaning. There is therefore one semantic training map for each real training image. Conversely, there is at least one real training image for each semantic training map, because a semantically identical scene may have been captured with different exposures or other imaging parameters. A semantic training map can be obtained, for example, by manually labeling a real training image.
[0010] Using a generator to be trained, realistic images are generated from at least one semantic training map. For the same at least one semantic training map, at least one real training image is determined. A discriminator is used to train the generator. To do this, the discriminator is fed the realistic images generated by the generator and at least one real training image.
[0011] The discriminator is constructed to determine a semantic segmentation of the image fed to it, which associates each pixel of that image with a semantic meaning. Based on the semantic segmentation determined by the discriminator, it is evaluated whether the image fed to it is a realistically generated image or instead a real training image. This evaluation can be performed within the discriminator itself and/or in a separate functional unit. The evaluation does not need to be trainable, but can also follow static rules.
[0012] The discriminator is fed a realistic image generated by the generator, at least one real training image, and at least one mixed image. The generator parameters, which characterize the generator's behavior, are optimized so that the realistic image generated by the generator is misclassified as a real image by the discriminator.
[0013] Simultaneously, or alternately, the discriminator parameters, which characterize the discriminator's behavior, are optimized with the aim of improving the accuracy in distinguishing between realistically generated images and real images. Therefore, the discriminator is trained such that, after evaluating the semantic segmentation provided by the discriminator, realistically generated images are classified as realistically generated images, and real training images are classified as real training images.
[0014] Using such a discriminator has several advantages: it not only makes a binary distinction between real training images on the one hand and realistically generated images on the other, but also provides a complete semantic segmentation of the images fed to it.
[0015] The binary distinction mentioned above always concerns the entire image globally. Semantic segmentation, by contrast, is performed at the local level of individual pixels. For example, the discriminator may well identify one portion of an image as part of a real training image and another portion of the same image as part of a realistically generated image. This contradiction only needs to be resolved in the downstream evaluation.
[0016] For example, an image can be rated as a realistically generated image if the proportion of image pixels identified as portions of a realistically generated image exceeds a pre-defined threshold. Conversely, an image can be rated as a real training image if the proportion of image pixels identified as portions of a real training image exceeds a pre-defined threshold. Any rating is possible in between. If the discriminator, for example, identifies 60% of the pixels as portions of a real training image and 40% as portions of an image realistically generated by the generator, then the image fed to the discriminator can be rated as a real training image with a score of 0.6, and can be rated as an image realistically generated by the generator with a score of 0.4.
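The pixel-based rating described above can be illustrated with a minimal sketch. The label value `GENERATED`, the function names, and the threshold of 0.5 are assumptions made for this illustration only; the text leaves the concrete threshold open.

```python
def rate_image(segmentation, generated_label):
    """Return (real_score, generated_score) as fractions of all pixels."""
    pixels = [label for row in segmentation for label in row]
    generated = sum(1 for label in pixels if label == generated_label)
    real = len(pixels) - generated
    return real / len(pixels), generated / len(pixels)


def classify(real_score, threshold=0.5):
    """Rate the whole image as real if the share of real pixels exceeds the threshold."""
    return "real" if real_score > threshold else "generated"


GENERATED = 3  # hypothetical label for "part of a realistically generated image"

# Toy segmentation of a 2 x 5 image: 6 pixels carry real categories, 4 are marked as generated.
segmentation = [
    [0, 0, 0, 1, 1],
    [2, GENERATED, GENERATED, GENERATED, GENERATED],
]

real_score, generated_score = rate_image(segmentation, GENERATED)
print(real_score, generated_score)  # 0.6 0.4
print(classify(real_score))         # real
```

This reproduces the 60% / 40% example: the image is rated as a real training image with a score of 0.6 and as a generated image with a score of 0.4.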
[0017] Therefore, when evaluating the semantic segmentation, it is generally advantageous to compare the number of pixels that the discriminator evaluates as belonging to a real training image with the number of pixels that it evaluates as belonging to a realistically generated image, and/or to relate the two numbers to one another.
[0018] By performing semantic segmentation at the pixel level, the discriminator is advantageously incentivized to learn fine details of the image fed to it at the level of one or a few pixels, and to detect spatial consistency with the semantic training map. Based on this spatial consistency, the extent to which the discriminator has learned the discriminative semantic features of the image can be examined at any scale. If, by contrast, only a single global statement is required from the discriminator, it is "irrelevant" to the cost function used for the optimization where in detail the discriminator derives that statement from. Detailed work at the pixel level is then not "rewarded."
[0019] In the present application, the training of a discriminator designed for semantic segmentation can generally be supervised better, because the full information content of the given semantic training maps can be used directly as the "ground truth" for training the discriminator. For the same labeling effort, the more of the existing labeling of the training data can be exploited during training, the higher the accuracy of the trained discriminator will be.
[0020] Furthermore, feedback from the semantic segmentation discriminator enables the generator to be trained to produce realistic images from two-dimensional noise covering the entire input image without requiring additional spatial information. Therefore, the existing "ground truth" used to train the discriminator can, in a sense, also be indirectly used to train the generator.
[0021] In a particularly advantageous construction scheme, the possible semantic meanings in the semantic segmentation provided by the discriminator include at least the semantic meaning of the semantic training map and the classification of parts of the realistically generated image. For example, the semantic meaning of the semantic training map could represent objects of N discrete categories appearing in the real training image. The classification of parts of the realistically generated image could be added to this as another category N+1.
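As a minimal sketch of this N+1 labeling, the target segmentation for the discriminator could be built as follows. The category names, the zero-based indexing, and the function names are assumptions of this illustration, not taken from the patent text.

```python
N = 3          # hypothetical real categories: 0 = sky, 1 = garden, 2 = house
GENERATED = N  # zero-based index of the additional category N+1


def discriminator_target(semantic_map, is_generated):
    """Per-pixel target labels the discriminator should output for one image."""
    if is_generated:
        # Every pixel of a generated image belongs to the extra category N+1.
        return [[GENERATED for _ in row] for row in semantic_map]
    # For a real training image, the semantic training map itself is the target.
    return [list(row) for row in semantic_map]


def one_hot(label, num_classes=N + 1):
    """One-hot encoding of a label over all N+1 categories."""
    return [1 if c == label else 0 for c in range(num_classes)]


semantic_map = [[0, 0], [2, 1]]
target_real = discriminator_target(semantic_map, is_generated=False)
target_generated = discriminator_target(semantic_map, is_generated=True)
print(target_real)       # [[0, 0], [2, 1]]
print(target_generated)  # [[3, 3], [3, 3]]
print(one_hot(GENERATED))  # [0, 0, 0, 1]
```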
[0022] In a particularly advantageous construction scheme, the discriminator parameters are optimized such that a cost function attains an optimal value. In this cost function, for all pixels and all possible semantic meanings, the discriminator's classification scores for the corresponding semantic meaning are summed, weighted by a binary indicator of whether that semantic meaning is correct according to the semantic training map. This cost function represents a cross-entropy over N+1 categories and thus aggregates, in a statistically sound manner, the discriminator's many individual decisions, whose number corresponds to the number of pixels in the image, into one overall decision.
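[0023] An example of the cost function $L_D$ of the discriminator D is
[0024] $$L_D = -E_{(x,t)}\left[\sum_{c=1}^{N} \alpha_c \sum_{i,j}^{H \times W} t_{i,j,c} \log D(x)_{i,j,c}\right] - E_{(z,t)}\left[\sum_{i,j}^{H \times W} \log D(G(z,t))_{i,j,c=N+1}\right].$$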
[0025] Here, E denotes the formation of expected values over all pairs consisting of a real image x and a semantic training map t, or over all pairs consisting of noise z sampled from a random distribution and a semantic training map t. In this example, the semantic training map t is a three-dimensional tensor. Two dimensions represent the spatial coordinates i and j, which run up to the height H and the width W of the image, respectively. The category c at position i, j is encoded along the third dimension of the tensor t as a "one-hot" vector, which contains an entry of 1 only for category c and 0 otherwise.
[0026] Therefore, $t_{i,j,c}$ describes, for each constellation consisting of coordinates i, j and category c, the probability that the association between the pixel at location i, j and category c corresponds to the "truth"; this probability is always 1 or 0. Whenever this probability is 1 and the probability $D(x)_{i,j,c}$ output by the discriminator for the association of pixel i, j with category c is not 1 (which would yield 0 when taking the logarithm), the cost function $L_D$ is increased by a penalty for this "violation", which is smaller or larger depending on its "severity." In the example above, this penalty is additionally weighted by a category-specific factor $\alpha_c$.
[0027] For all pixels i, j that do not actually belong to a real training image, the probabilities $t_{i,j,c}$ are zero for all classes c from 1 to N. For these pixels, the cost function instead considers the realistic image G(z, t) generated by the generator G from the noise z. Ideally, the discriminator should identify these pixels i, j as belonging to the realistically generated image with a probability $D(G(z,t))_{i,j,c=N+1}$ of 1, which becomes zero when the logarithm is taken. Any probability less than 1 causes the corresponding pixel to contribute a penalty to the cost function $L_D$.
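The behavior of the cost function $L_D$ described in the preceding paragraphs can be sketched for a single image pair as follows. This is an illustrative single-sample estimate under stated assumptions: the discriminator outputs are given as per-pixel probability lists over N+1 categories, category indices are zero-based, and the small constant `EPS` (not part of the text) guards against taking the logarithm of zero.

```python
import math

EPS = 1e-12  # assumption of this sketch: avoids log(0)


def discriminator_loss(probs_real, semantic_map, probs_generated, alpha):
    """Single-sample estimate of L_D for one real and one generated image.

    probs_real[i][j][c]      -- D(x) at pixel (i, j) for category c
    probs_generated[i][j][c] -- D(G(z, t)) at pixel (i, j) for category c
    semantic_map[i][j]       -- true category of pixel (i, j) in the real image
    alpha[c]                 -- category-specific weights for the N real categories
    """
    n = len(alpha)  # index of the extra "generated" category N+1 (zero-based)
    loss = 0.0
    # Real image: because t is one-hot, only the true category contributes per pixel.
    for i, row in enumerate(semantic_map):
        for j, c in enumerate(row):
            loss -= alpha[c] * math.log(max(probs_real[i][j][c], EPS))
    # Generated image: every pixel should be assigned to category N+1.
    for row in probs_generated:
        for pixel in row:
            loss -= math.log(max(pixel[n], EPS))
    return loss


# Toy 1 x 2 image with N = 2 real categories.
semantic_map = [[0, 1]]
alpha = [1.0, 2.0]
probs_real = [[[0.5, 0.25, 0.25], [0.0, 1.0, 0.0]]]
probs_generated = [[[0.0, 0.0, 1.0], [0.25, 0.25, 0.5]]]
loss = discriminator_loss(probs_real, semantic_map, probs_generated, alpha)
print(round(loss, 6))  # 1.386294
```

Two pixels are classified perfectly and contribute nothing; the other two each contribute a penalty of log 2, giving a total of 2 log 2.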
[0028] In a particularly advantageous construction scheme, the semantic meanings of the semantic training map are weighted in the summation by the reciprocal of their frequency measured in the relevant pixels. This takes into account the fact that the frequencies of semantic meanings in real training images are often very unevenly distributed. If the semantic meanings represent, for example, traffic signs, then priority signs or speed limits appear far more frequently than, for example, warnings about railway crossings or warnings about unsecured shorelines. The weighting ensures that such infrequent but still very important traffic signs are also adequately taken into account during training.
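[0029] For example, the weighting factor $\alpha_c$ in the example above can take the following form:
[0030] $$\alpha_c = \frac{H \times W}{\sum_{i,j}^{H \times W} E_t\left[\mathbb{1}\left[t_{i,j,c} = 1\right]\right]}.$$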
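A minimal sketch of such inverse-frequency weights, assuming all semantic training maps have the same size and that a category that never occurs simply receives the weight zero (the latter is a choice made for this illustration):

```python
def class_weights(training_maps, num_classes):
    """Weight each category by the reciprocal of its pixel frequency."""
    counts = [0] * num_classes
    total = 0
    for semantic_map in training_maps:
        for row in semantic_map:
            for label in row:
                counts[label] += 1
                total += 1
    # total / count is the reciprocal of the relative frequency of the category.
    return [total / n if n > 0 else 0.0 for n in counts]


# Two toy 2 x 2 maps: category 0 is frequent, category 1 is rare, category 2 never occurs.
training_maps = [
    [[0, 0], [0, 1]],
    [[0, 0], [0, 0]],
]
alpha = class_weights(training_maps, num_classes=3)
print(alpha[1])  # 8.0
```

The rare category 1 (one pixel out of eight) receives a weight of 8, while the frequent category 0 (seven pixels out of eight) receives a weight of 8/7.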
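[0031] The generator can also be trained on the basis of a cost function $L_G$. For example, the cost function $L_G$ can be modeled on the cost function $L_D$ for the discriminator and take the following form:
[0032] $$L_G = -E_{(z,t)}\left[\sum_{c=1}^{N} \alpha_c \sum_{i,j}^{H \times W} t_{i,j,c} \log D(G(z,t))_{i,j,c}\right].$$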
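A generator cost function modeled on the discriminator cost function could be sketched as follows for a single generated image. The assumption here is that the generator is rewarded when the discriminator assigns each pixel of the generated image to the category intended by the semantic training map rather than to the extra "generated" category; names and the constant `EPS` are illustrative.

```python
import math

EPS = 1e-12  # assumption of this sketch: avoids log(0)


def generator_loss(probs_generated, semantic_map, alpha):
    """Single-sample estimate of L_G for one generated image.

    probs_generated[i][j][c] -- D(G(z, t)) at pixel (i, j) for category c
    semantic_map[i][j]       -- category the generator was asked to draw at (i, j)
    alpha[c]                 -- category-specific weights for the N real categories
    """
    loss = 0.0
    for i, row in enumerate(semantic_map):
        for j, c in enumerate(row):
            loss -= alpha[c] * math.log(max(probs_generated[i][j][c], EPS))
    return loss


# Toy 1 x 2 image with N = 2 real categories plus the "generated" category.
semantic_map = [[0, 1]]
alpha = [1.0, 2.0]
probs_generated = [[[0.5, 0.0, 0.5], [0.25, 0.25, 0.5]]]
loss = generator_loss(probs_generated, semantic_map, alpha)
print(round(loss, 6))  # 3.465736
```

The first pixel contributes log 2 and the second contributes 2 · log 4, so the total is 5 log 2. The loss falls to zero once the discriminator assigns every generated pixel to its intended real category with probability 1.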
[0033] In another particularly advantageous construction scheme, a discriminator is chosen that comprises an encoder-decoder layout having an encoder structure and a decoder structure. The encoder structure transforms the input image into a reduced-information representation across multiple processing layers. The decoder structure further transforms this reduced-information representation into an association between each pixel of the input image and its semantic meaning. This type of encoder-decoder layout is particularly suitable for determining semantic segmentation.
[0034] In another advantageous construction scheme, the discriminator has at least one direct connection between a processing layer of the encoder structure and a processing layer of the decoder structure that bypasses the reduced-information representation. Particularly relevant portions of the information can then be selectively transferred from the encoder structure to the decoder structure without having to pass through the "bottleneck" of the maximally reduced-information representation. The discriminator thereby obtains a "U-Net" architecture. It coherently aggregates global information obtained through the aforementioned "bottleneck" and local information obtained through the aforementioned direct connections.
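The effect of such a direct connection can be illustrated with a deliberately simplified sketch that contains no trainable layers. Average pooling stands in for the encoder, nearest-neighbor upsampling for the decoder, and concatenation of per-pixel features for the direct connection; all of these are stand-ins chosen for this illustration, not the architecture of the patent.

```python
def downsample(img):
    """Encoder stand-in: 2 x 2 average pooling (image sides must be even)."""
    return [
        [(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4
         for j in range(0, len(img[0]), 2)]
        for i in range(0, len(img), 2)
    ]


def upsample(img):
    """Decoder stand-in: nearest-neighbor upsampling by a factor of 2."""
    out = []
    for row in img:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def encode_decode(img, use_skip):
    """Return per-pixel feature tuples, optionally with a direct connection."""
    bottleneck = downsample(img)    # reduced-information representation
    decoded = upsample(bottleneck)  # back to input resolution
    if use_skip:
        # Direct connection: append the full-resolution encoder-side feature.
        return [[(d, e) for d, e in zip(d_row, e_row)]
                for d_row, e_row in zip(decoded, img)]
    return [[(d,) for d in d_row] for d_row in decoded]


img = [[0.0, 4.0], [8.0, 4.0]]
without_skip = encode_decode(img, use_skip=False)
with_skip = encode_decode(img, use_skip=True)
print(without_skip[0])  # [(4.0,), (4.0,)]
print(with_skip[0])     # [(4.0, 0.0), (4.0, 4.0)]
```

Without the direct connection, every pixel ends up with the same feature because the local differences were averaged away in the bottleneck. With it, each pixel keeps its local value alongside the global average.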
[0035] As previously explained, the main application of the training method described herein is to expand the training dataset of the image classifier, and thus, starting from a pre-given training dataset with real training images and their expected associations with semantic meaning, to better train the image classifier overall. Therefore, the present invention also relates to a method for training an image classifier that associates an input image (and / or pixels of the input image) with semantic meaning.
[0036] In this method, a generator is trained as described previously. The trained generator is then used to produce realistic images from semantic maps. These semantic maps are no longer limited to those used to train the generator, but can describe any desired scene.
[0037] From each semantic map used, the expected semantic meaning is determined onto which the trained image classifier should map the corresponding realistically generated image. In particular, the expected meaning can include, for example, membership of one or more predefined categories. If, for example, a vehicle is drawn at a specific location in the semantic map, then the realistically generated image will contain a vehicle at that location. The image classifier should therefore at least associate that image region with the category "vehicle".
[0038] The training dataset for the image classifier, which contains real training images and their associated expected semantic meanings, is expanded by the realistically generated images and their associated expected semantic meanings. The image classifier is then trained using this expanded training dataset.
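The expansion of the training dataset can be sketched as follows. The function `toy_generator` is a hypothetical placeholder for a trained generator, and using the semantic map itself as the expected semantic meaning is a simplifying assumption of this sketch.

```python
def toy_generator(semantic_map):
    """Hypothetical stand-in for a trained generator: one gray value per label."""
    return [[label * 10 for label in row] for row in semantic_map]


def expand_dataset(real_pairs, semantic_maps, generator):
    """Append (generated image, expected meaning) pairs to the real training pairs."""
    expanded = list(real_pairs)
    for semantic_map in semantic_maps:
        image = generator(semantic_map)
        # The expected meaning is read directly off the semantic map, so no manual labeling is needed.
        expanded.append((image, semantic_map))
    return expanded


real_pairs = [([[12, 11], [31, 19]], [[1, 1], [3, 2]])]  # (image, labels)
new_maps = [[[0, 0], [2, 2]], [[1, 2], [1, 2]]]          # freely chosen scenes
dataset = expand_dataset(real_pairs, new_maps, toy_generator)
print(len(dataset))  # 3
```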
[0039] As previously explained, this method can, in particular, enrich the training dataset with realistic images of situations that were previously underrepresented in it. In this way, the image classifier becomes better able to handle these situations.
[0040] For example, training images of rare but dangerous traffic situations are often difficult to obtain. For instance, fog, extreme snowfall, or black ice, which may be essential components of such a situation, occur only rarely. Other parts of the situation, such as two vehicles on a collision course, may be too dangerous to re-enact with real vehicles.
[0041] Therefore, the present invention also relates to another method. In this method, as previously described, an image classifier is trained using realistic images generated by a trained generator. Using the trained image classifier, images captured by at least one sensor carried by the vehicle are associated with semantic meanings. From the semantic meanings determined by the image classifier, a control signal is determined. Using this control signal, the vehicle is controlled.
[0042] Through improved training, the accuracy of the semantic meaning provided by the image classifier is advantageously improved. Therefore, the probability that the vehicle's response triggered by the control signal is appropriate for the traffic situation shown in the image is advantageously increased.
[0043] In particular, the method can be implemented entirely or partially by a computer. Therefore, the present invention also relates to a computer program having machine-readable instructions that, when executed on one or more computers, cause the one or more computers to perform one of the described methods. In this sense, control devices for vehicles and embedded systems for technical devices, which are also capable of executing machine-readable instructions, should also be considered as computers.
[0044] Similarly, the present invention also relates to a machine-readable data carrier having a computer program, and / or to a downloadable product having a computer program. A downloadable product is a digital product that can be transmitted via a data network, that is, a digital product that can be downloaded by a user of the data network, and said digital product may, for example, be sold in an online store for immediate download.
[0045] In addition, a computer may be equipped with the computer program, the machine-readable data carrier, or the downloadable product.
Attached Figure Description
[0046] In the following description of preferred embodiments of the invention, other measures to improve the invention are shown in more detail with reference to the accompanying drawings.
[0047] Figure 1 shows an embodiment of the method 100 for training the generator 1;
[0048] Figure 2 illustrates the semantic segmentation 6 created by the discriminator 7;
[0049] Figure 3 shows an embodiment of the method 200 for training the image classifier 9;
[0050] Figure 4 shows an embodiment of the method 300 with the complete chain of action up to controlling the vehicle 50.
Detailed Implementation
[0051] Figure 1 is a schematic flowchart of an embodiment of the method 100. In step 110, real training images 5 and associated semantic training maps 5a are provided. Each semantic training map 5a associates a semantic meaning 4 with each pixel of the corresponding training image 5.
[0052] In step 120, a realistic image 3 is generated from at least one semantic training map 5a using the generator 1 to be trained. In step 130, for the same at least one semantic training map 5a, at least one real training image 5 is determined. For example, this could be the training image 5 whose “labels” are used to fully form the semantic training map 5a.
[0053] In step 140, the discriminator 7 is fed a realistic image 3 generated by the generator 1 and at least one real training image 5, both of which belong to the same semantic training map 5a. According to block 141, the discriminator 7 may, in particular, have an encoder-decoder layout. The encoder structure in this layout transforms the input image into a reduced-information representation in multiple successive processing layers. The decoder structure further transforms this reduced-information representation into an association between each pixel of the input image and a semantic meaning 4. According to block 141a, at least one direct connection that bypasses the reduced-information representation may be provided in the discriminator 7 between a processing layer of the encoder structure and a processing layer of the decoder structure.
[0054] In step 150, the discriminator 7 determines a semantic segmentation 6 for the images 3 and 5 fed to it, wherein each pixel of the semantic segmentation is associated with a semantic meaning 4. According to block 151, the possible semantic meanings 4 in the semantic segmentation 6 may include, in particular, the semantic meanings 4 of the semantic training map 5a and the classification as parts of the realistically generated image 3.
[0055] However, this is not yet the final determination of whether the input image is a realistically generated image 3 or a real training image 5. This judgment, marked with the reference symbol "3 or 5", is determined in step 160 by the subsequent evaluation of the semantic segmentation 6. According to block 161, the number of pixels evaluated by the discriminator 7 as belonging to a real training image 5 can, in particular, be compared with the number of pixels evaluated by the discriminator 7 as belonging to a realistically generated image 3, and/or the two pixel counts can be related to one another.
[0056] In step 170, the generator parameter 1a, which characterizes the behavior of generator 1, is optimized for the purpose of causing the realistic image 3 generated by generator 1 to be misclassified as the real image 5 by discriminator 7. Simultaneously or alternately, in step 180, the discriminator parameter 7a, which characterizes the behavior of discriminator 7, is optimized for the purpose of improving the accuracy in distinguishing between the realistically generated image 3 and the real image 5.
[0057] Here, according to block 181, the discriminator parameters 7a can be optimized such that a cost function attains an optimal value. In this cost function, the classification scores of the discriminator 7 for the corresponding semantic meaning 4 are summed over all pixels and all possible semantic meanings 4. The summands are weighted by a binary indicator of whether the semantic meaning 4 is correct according to the semantic training map 5a. According to block 181a, the meanings 4 of the semantic training map 5a can, in particular, be weighted in the summation by the reciprocal of their frequency measured in the relevant pixels. For a given meaning 4, the weight can also be set to zero. For example, the training image 5 may contain pixels that are not labeled and thus carry the placeholder label "unknown". By setting the weight for this label to zero, these pixels can be completely ignored during optimization. Conventional discriminators cannot readily do this because they do not compute their cost function ("loss function") at the pixel level.
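Ignoring "unknown" pixels by means of a zero weight can be sketched as follows. The label value `UNKNOWN`, the weights, and the function name are assumptions made for this illustration.

```python
import math

UNKNOWN = 2  # hypothetical placeholder label for unlabeled pixels


def weighted_pixel_loss(probs, labels, weights):
    """Pixel-wise weighted cross-entropy; labels with weight zero are skipped."""
    loss = 0.0
    for p_row, l_row in zip(probs, labels):
        for p, label in zip(p_row, l_row):
            if weights[label] == 0.0:
                continue  # zero weight: this pixel contributes nothing
            loss -= weights[label] * math.log(p[label])
    return loss


labels = [[0, UNKNOWN]]
weights = [1.0, 1.0, 0.0]  # weight for UNKNOWN set to zero

# The two predictions differ only at the unlabeled pixel.
probs_a = [[[0.5, 0.5, 0.0], [0.9, 0.1, 0.0]]]
probs_b = [[[0.5, 0.5, 0.0], [0.1, 0.9, 0.0]]]
loss_a = weighted_pixel_loss(probs_a, labels, weights)
loss_b = weighted_pixel_loss(probs_b, labels, weights)
print(loss_a == loss_b)  # True
```

Whatever the discriminator outputs at the unlabeled pixel, the loss stays the same, so that pixel exerts no influence on the optimization.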
[0058] Figure 2 uses a simple example to illustrate how the discriminator 7 forms the semantic segmentation 6. The semantic training map 5a contains regions with three different semantic meanings 4, namely sky 41, garden 42, and house 43. These semantic meanings were obtained by “labeling” the real training image 5, which contains a clear sky 11, a house 12, and a garden 13.
[0059] Using the generator 1, a realistic image 3 is generated from the semantic training map 5a. The realistic image 3 contains a sky with continuous rain 11' and a house 12' that differs from the house in the real training image 5, but a garden 13 that is the same as the garden in the real training image 5.
[0060] The discriminator 7 processes the realistic image 3 into semantic segmentation 6. In this semantic segmentation 6, the image of a sky with continuous rain 11' is associated with the semantic meaning 41 "sky". The image of a garden 13 is associated with the semantic meaning 42 "garden". However, the image of a house 12' that has been altered relative to the real training image 5 is associated with the semantic meaning 44 "a part of the image 3 that is realistically generated".
[0061] Unlike conventional discriminators, the discriminator 7 thus essentially makes the judgment of whether a real image 5 or a realistically generated image 3 is present at the local pixel level.
[0062] The downstream evaluation 160 can derive the sought judgment, namely whether the image should be classified as a real image 5 or as a realistically generated image 3, in a variety of ways. If the evaluation 160 is based on the majority of pixels, the image can pass as a real image 5, because the sky 11' identified as sky and the garden 13 identified as garden together occupy more pixels than the altered house 12'. However, the evaluation 160 can also focus, for example, on whether the image contains an object with the semantic meaning 43 "house" at the expected location, regardless of the garden, sky, or other accessories. In this case, the image can be identified as a realistically generated image 3. As explained above in the discussion of the exemplary cost function $L_D$ for the discriminator 7, the cost function contributions relating to the semantic meanings 41-43 appearing in the real image and the cost function contributions relating to the semantic meaning 44 “the portion of image 3 that is realistically generated” can also be averaged or otherwise aggregated.
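The two evaluation strategies from this example can be contrasted in a small sketch. The image layout, the label constants, and the function names are assumptions chosen to mirror the sky, garden, and house example; they are not taken from the figures.

```python
SKY, GARDEN, HOUSE, GENERATED = 0, 1, 2, 3  # illustrative label values

# Semantic map: a house at a defined location, surrounded by sky and garden.
semantic_map = [
    [SKY, SKY, SKY, SKY],
    [SKY, HOUSE, HOUSE, SKY],
    [GARDEN, GARDEN, GARDEN, GARDEN],
]
# Segmentation by the discriminator: the house pixels are flagged as generated.
segmentation = [
    [SKY, SKY, SKY, SKY],
    [SKY, GENERATED, GENERATED, SKY],
    [GARDEN, GARDEN, GARDEN, GARDEN],
]


def majority_rule(seg):
    """Rate the image by whichever kind of pixel is in the majority."""
    pixels = [p for row in seg for p in row]
    generated = sum(1 for p in pixels if p == GENERATED)
    return "generated" if generated > len(pixels) / 2 else "real"


def object_rule(seg, sem_map, category):
    """Rate the image as real only if `category` is found wherever the map expects it."""
    for seg_row, map_row in zip(seg, sem_map):
        for s, m in zip(seg_row, map_row):
            if m == category and s != category:
                return "generated"
    return "real"


print(majority_rule(segmentation))                     # real
print(object_rule(segmentation, semantic_map, HOUSE))  # generated
```

Only 2 of 12 pixels are flagged as generated, so the majority rule lets the image pass as real, whereas the rule focused on the house rates it as generated.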
[0063] Figure 3 is a schematic flowchart of an embodiment of the method 200 for training an image classifier 9. In step 210, the generator 1 is trained using the previously described method 100. In step 220, the trained generator 1 is used to generate realistic images 3 from semantic maps 2. In step 230, the expected semantic meaning 4' onto which the image classifier 9 is to map the respective realistic image 3 (or its pixels) is determined from the respectively used semantic map 2.
[0064] In step 240, the realistic images 3 generated by the generator 1 and their associated expected meanings 4' are added to the training dataset 9a, which already contains real training images 5 and their associated expected meanings 4'. In step 250, the image classifier 9 is trained using the expanded training dataset.
[0065] Figure 4 is a schematic flowchart of an embodiment of the method 300. In step 310, an image classifier 9 is trained using the previously described method 200. In step 320, the image classifier 9 is used to associate images 5 captured by at least one sensor 50a carried by the vehicle 50 with semantic meanings 4. In step 330, a control signal 330a is determined from the semantic meanings 4 determined by the image classifier 9. In step 340, the vehicle 50 is controlled using the control signal 330a.
Claims
1. A method (100) for training a generator (1) for images (3) from a semantic map (2), wherein the semantic map (2) associates each pixel of the image (3) with the semantic meaning (4) of the object to which the pixel belongs, the method comprising the steps of:
• providing (110) real training images (5) and associated semantic training maps (5a), which associate a semantic meaning (4) with each pixel of the corresponding training image (5);
• generating (120) an image (3) from at least one semantic training map (5a) using the generator (1);
• determining (130) at least one real training image (5) for the same at least one semantic training map (5a);
• feeding (140) the image (3) generated by the generator (1) and the at least one real training image (5), which belong to the same semantic training map (5a), to a discriminator (7), wherein the discriminator (7) determines (150) a semantic segmentation (6) of the image fed to it, the semantic segmentation (6) associating a semantic meaning (4) with each pixel of that image;
• evaluating (160), based on the semantic segmentation (6) determined by the discriminator (7), whether the image fed to the discriminator (7) is an image (3) generated by the generator (1) or a real training image (5);
• optimizing (170) generator parameters (1a), which characterize the behavior of the generator (1), with the aim that images (3) generated by the generator (1) are misclassified as real images (5);
• optimizing (180) discriminator parameters (7a), which characterize the behavior of the discriminator (7), with the aim of improving the accuracy in distinguishing between images (3) generated by the generator (1) and real images (5).
2. The method (100) according to claim 1, wherein the possible semantic meanings (4) in the semantic segmentation (6) provided by the discriminator (7) include at least (151) the semantic meanings (4) of the semantic training map (5a) and the classification as part of an image (3) generated by the generator (1).
3. The method (100) according to claim 2, wherein the discriminator parameters (7a) are optimized (181) such that a cost function attains an optimal value, in which cost function, for all pixels and all possible semantic meanings (4), the classification scores of the discriminator (7) for the corresponding semantic meaning (4) are summed, weighted by a binary indicator of whether the semantic meaning (4) is correct according to the semantic training map (5a).
4. The method (100) according to claim 3, wherein the semantic meanings (4) of the semantic training map (5a) are weighted (181a) in the summation by the reciprocal of their frequency measured in the relevant pixels.
5. The method (100) according to any one of claims 1 to 4, wherein a discriminator (7) is selected (141) that has an encoder structure and a decoder structure, wherein the encoder structure transforms the input image into a reduced-information representation in multiple successive processing layers, and the decoder structure further transforms the reduced-information representation into an association of each pixel of the input image with a semantic meaning (4).
6. The method (100) according to claim 5, wherein a discriminator (7) is selected (141a) that has at least one direct connection between a processing layer of the encoder structure and a processing layer of the decoder structure, the direct connection bypassing the reduced-information representation.
7. The method (100) according to any one of claims 1 to 4, wherein the evaluation (160) of the semantic segmentation (6) includes comparing the number of pixels evaluated by the discriminator (7) as belonging to a real training image (5) with the number of pixels evaluated by the discriminator (7) as belonging to an image (3) generated by the generator (1), and/or relating the two pixel counts to one another (161).
8. A method (200) for training an image classifier (9), the image classifier (9) associating an input image and/or pixels of the input image with a semantic meaning, the method (200) comprising the steps of:
• training (210) a generator (1) using the method (100) according to any one of claims 1 to 7;
• generating (220) images (3) from semantic maps (2) using the trained generator (1);
• determining (230), from the respectively used semantic maps (2), the expected semantic meanings (4') onto which the trained image classifier (9) is to map the respective images (3);
• expanding (240) a training dataset (9a) for the image classifier (9), which contains real training images and associated expected semantic meanings (4'), with the images (3) generated by the generator (1) and the associated expected semantic meanings (4');
• training (250) the image classifier (9) using the expanded training dataset.
9. A method (300) for controlling a vehicle, comprising the steps of:
• training (310) an image classifier (9) using the method (200) according to claim 8;
• associating (320), using the image classifier (9), images (5) captured by at least one sensor (51) carried by the vehicle (50) with a semantic meaning (4);
• determining (330) a control signal (330a) from the semantic meaning (4) determined by the image classifier (9);
• controlling (340) the vehicle (50) using the control signal (330a).
10. A computer program product having a computer program containing machine-readable instructions that, when executed on one or more computers, cause the one or more computers to perform the method (100, 200, 300) according to any one of claims 1 to 9.
11. A machine-readable data carrier having a computer program containing machine-readable instructions that, when executed on one or more computers, cause the one or more computers to perform the method according to any one of claims 1 to 9.
12. A computer equipped with a computer program product according to claim 10, and / or equipped with a machine-readable data carrier according to claim 11.