Method for generating synthetic sensor data

The method uses a scene database and machine learning techniques to transform sensor data within a multidimensional space, producing high-fidelity synthetic sensor data. This addresses the problem of insufficient testing of rare scenarios in autonomous driving systems, enables more detailed testing and verification, and reduces the need for data storage and sharing.

CN122240648A · Pending · Publication Date: 2026-06-19 · ZENSEACT AB

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: ZENSEACT AB
Filing Date: 2025-12-18
Publication Date: 2026-06-19

Application Information

Patent Timeline

18 Dec 2025

Application

19 Jun 2026

Publication

CN122240648A

IPC: G06F16/245

CPC: G06N3/045; G06N3/08; G06F11/3013; G06F11/3476; G06F11/3684; G06F16/24535; B60W60/00

AI Tagging

Application Domain

Digital data information retrieval; Error detection/correction

Figure CN122240648A_ABST

Abstract

The techniques disclosed herein relate to methods for generating synthetic sensor data. More specifically, the disclosed techniques relate at least in part to a computer-implemented method (100) for generating synthetic sensor data using a scene database, the method (100) comprising: obtaining (S102) a request for a specified query scene, wherein the query scene is associated with a query embedding representing the query scene in a multidimensional space; identifying (S106) at least one scene sample within the scene database whose transform contains the query embedding; and, in response to successfully identifying at least one scene sample, generating (S108) synthetic sensor data corresponding to the query scene by transforming the sensor data of the identified at least one scene sample into synthetic sensor data whose associated scene embedding in the multidimensional space lies within a threshold distance from the query embedding.

Description

Technical Field

[0001] This disclosure relates to the field of autonomous driving systems. Specifically, this disclosure relates to methods and apparatus for generating synthetic sensor data using existing scene databases.

Background

[0002] One of the central principles of verifying and validating Automated Driving Systems (ADS) is the use of scenario-based testing. This type of testing is used to systematically evaluate the system's performance across a wide range of driving scenarios, including statistically rare and challenging situations that would otherwise require extensive real-world driving to encounter. These scenarios are crucial for exposing potential system weaknesses and ensuring robustness in diverse and unpredictable environments.

[0003] Traditionally, scenario-based testing focuses on the decision-making and control components of the ADS, because these components can be tested using reliably generated inputs such as pre-recorded trajectories or simulated vehicle behavior. However, this approach excludes the perception system, a crucial subsystem responsible for interpreting raw sensor data and understanding the environment surrounding the vehicle. Ignoring the perception system limits the comprehensiveness of the testing and reduces the usefulness and generalizability of the results.

[0004] To address this issue, it has been proposed to re-simulate previously captured sensor sequences, i.e., to reprocess recorded sensor data through the perception system in order to test the system's response. While effective for validating common scenarios, this approach is inherently limited to scenarios that have already been experienced and recorded, and therefore does not cover uncommon ones. Furthermore, the large amounts of raw sensor data collected during ADS operation present significant challenges for storage and sharing due to bandwidth and cloud infrastructure limitations.

[0005] Alternative approaches use high-fidelity simulation or synthetic rendering of sensor data to create scenes that are difficult to capture in the real world. While these approaches can generate rare and diverse scenes, their effectiveness is hampered by uncertainty about the fidelity of the synthetic data and its ability to accurately replicate real-world sensor stimuli.

[0006] Therefore, there is a growing need for new and improved methods that generate reliable, efficient, and high-quality data for unprecedented and rare scenarios. Such methods would enable comprehensive testing of all ADS subsystems, including those focused on perception, and would support extensive development and validation activities.

Summary of the Invention

[0007] The techniques disclosed herein aim to mitigate, alleviate, or eliminate one or more of the deficiencies and disadvantages of the prior art identified above, in order to address various problems associated with generating synthetic sensor data. More specifically, the disclosed techniques address the problem of generating valid, reliable, and accurate data for rare or unprecedented scenarios for ADS development, with the added effect of allowing sparser data collection while still reliably covering the space of possible scenarios. The proposed solutions further enable new scenarios to be queried. Various aspects and embodiments of the disclosed techniques are defined below and in the appended independent and dependent claims.

[0008] According to a first aspect, a computer-implemented method is provided for generating synthetic sensor data using a scene database. The scene database includes multiple scene samples, each scene sample comprising sensor data depicting the surrounding environment of a vehicle over a period of time. Each scene sample is associated with a scene embedding representing that scene sample in a multidimensional space. Each scene embedding is associated with a transform in the multidimensional space. The transform indicates a set of possible transformed scenes that can be generated from the corresponding scene sample. The method includes obtaining a request for a specified query scene. The query scene is associated with a query embedding representing the query scene in the multidimensional space. The method further includes identifying at least one scene sample within the scene database whose transform contains the query embedding. The method further includes, in response to successfully identifying at least one scene sample, generating synthetic sensor data corresponding to the query scene by transforming the sensor data of the identified at least one scene sample into synthetic sensor data whose associated scene embedding in the multidimensional space lies within a threshold distance from the query embedding. This aspect of the disclosed technology has advantages and preferred features similar to those of the other aspects.
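The identification and generation steps of the first aspect can be sketched in Python as follows. This is a minimal, hypothetical illustration under stated assumptions: each scene sample's transform is modelled as a ball of radius `transform_radius` around its scene embedding (the disclosure does not prescribe a shape), closeness is measured as Euclidean distance, and `transform_fn` and `embed_fn` stand in for a transformation system (e.g., a NeRF or a generative model) and a scene embedding network. None of these names come from the disclosure.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two points in the multidimensional space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class SceneSample:
    sensor_data: object      # recorded sensor data (placeholder type)
    embedding: List[float]   # scene embedding of this sample
    transform_radius: float  # ASSUMPTION: the transform is a ball around the embedding

    def transform_contains(self, query: Vector) -> bool:
        """True if the query embedding lies inside this sample's transform."""
        return euclidean(self.embedding, query) <= self.transform_radius


def identify_samples(db: List[SceneSample], query: Vector) -> List[SceneSample]:
    """Step S106: scene samples whose transform contains the query embedding."""
    return [s for s in db if s.transform_contains(query)]


def generate_synthetic(
    db: List[SceneSample],
    query: Vector,
    transform_fn: Callable[[SceneSample, Vector], object],  # hypothetical transformation system
    embed_fn: Callable[[object], Vector],                   # hypothetical scene embedding network
    threshold: float,
) -> Optional[object]:
    """Step S108: transform sensor data until its embedding lands near the query."""
    for sample in identify_samples(db, query):
        synthetic = transform_fn(sample, query)
        if euclidean(embed_fn(synthetic), query) <= threshold:
            return synthetic
    return None  # no sample contains the query, or no transform met the threshold


if __name__ == "__main__":
    db = [
        SceneSample([0.0, 0.0], [0.0, 0.0], 1.0),
        SceneSample([5.0, 5.0], [5.0, 5.0], 0.5),
    ]

    def move_towards(sample: SceneSample, q: Vector) -> List[float]:
        # Toy transformer: shift the data 90% of the way towards the query.
        return [x + 0.9 * (y - x) for x, y in zip(sample.sensor_data, q)]

    print(generate_synthetic(db, [0.6, 0.0], move_towards, lambda d: d, threshold=0.1))
    print(generate_synthetic(db, [3.0, 3.0], move_towards, lambda d: d, threshold=0.1))
```

In this toy run the sensor data doubles as its own embedding (`embed_fn` is the identity), purely so the geometry is easy to follow; the second query lies outside every transform, so no synthetic data is returned.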

[0009] According to a second aspect, a computer program product including instructions is provided, which, when executed by a computing device, cause the computing device to perform the method according to any embodiment of the first aspect. According to an alternative embodiment of the second aspect, a (non-transitory) computer-readable storage medium is provided. The non-transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a processing system, the one or more programs including instructions for performing the method according to any embodiment of the first aspect. Similar advantages and preferred features exist for this aspect of the disclosed technology as for the other aspects.

[0010] As used herein, the term "non-transitory" is intended to describe a computer-readable storage medium (or "memory") that does not include propagating electromagnetic signals, but is not intended to otherwise limit the type of physical computer-readable storage device included in the term computer-readable medium or memory. For example, the terms "non-transitory computer-readable medium" or "tangible memory" are intended to cover types of storage devices that do not necessarily store information permanently, including, for example, random access memory (RAM). Program instructions and data stored in a non-transitory form on a tangible computer-accessible storage medium can be further transmitted via a transmission medium or a signal such as an electrical, electromagnetic, or digital signal, which can be transmitted via a communication medium such as a network and/or a wireless link. Therefore, as used herein, the term "non-transitory" is a limitation on the medium itself (i.e., tangible, not a signal), rather than a limitation on the persistence of data storage (e.g., RAM vs. ROM).

[0011] According to a third aspect, a computing device is provided for generating synthetic sensor data using a scene database. The scene database includes multiple scene samples. Each scene sample includes sensor data depicting the surrounding environment of a vehicle over a period of time. Each scene sample is associated with a scene embedding representing that scene sample in a multidimensional space. Each scene embedding is associated with a transform in the multidimensional space. The transform indicates a set of possible transformed scenes that can be generated from the corresponding scene sample. The computing device includes control circuitry. The control circuitry is configured to obtain a request for a specified query scene. The query scene is associated with a query embedding representing the query scene in the multidimensional space. The control circuitry is further configured to identify at least one scene sample within the scene database whose transform contains the query embedding. The control circuitry is further configured to, in response to successfully identifying at least one scene sample, generate synthetic sensor data corresponding to the query scene by transforming the sensor data of the identified at least one scene sample into synthetic sensor data whose associated scene embedding in the multidimensional space lies within a threshold distance from the query embedding. This aspect of the disclosed technology has advantages and preferred features similar to those of the other aspects.

[0012] The disclosed aspects and preferred embodiments may be suitably combined with each other in any manner apparent to those skilled in the art, such that one or more features or embodiments disclosed with respect to one aspect may also be regarded as disclosed with respect to another aspect or to an embodiment of another aspect.

[0013] Some embodiments have the advantage of making more comprehensive scenario-based development of ADS functionality easier and more efficient. By performing more thorough scenario-based testing and verification, the robustness and reliability of the ADS can be ensured across a wider range of scenarios.

[0014] One advantage of some embodiments is that synthetic sensor data can be generated in a reliable manner by taking into account the transformations of existing scene samples.

[0015] One advantage of some embodiments is that they reduce the need for fleet data transmission while still achieving accurate and reliable testing of the ADS. This is due to the fact that less raw sensor data is needed to exhaustively cover the scene space.

[0016] Some embodiments have the advantage of enabling the generation of scenarios that closely resemble already collected data, but that might otherwise have taken months or years of real-world driving to collect.

[0017] One advantage of some embodiments is that they can exhaust the scene space with less collected data.

[0018] One advantage of some embodiments is the ability to generate more realistic synthetic sensor data for simulated scenes rendered by a simulation engine. This can further improve the usability of scene samples for ADS development.

[0019] Further embodiments are defined in the dependent claims. It should be emphasized that, when used in this specification, the term "comprising" is used to indicate the presence of a stated feature, integer, step, or component. It does not exclude the presence or addition of one or more other features, integers, steps, components, or groups thereof.

[0020] These and other features and advantages of the disclosed technology will be further elucidated below with reference to the embodiments described hereinafter.

Brief Description of the Drawings

[0021] The foregoing aspects, features, and advantages of the disclosed technology will be more fully understood through the following illustrative and non-limiting detailed description of exemplary embodiments of the present disclosure, taken in conjunction with the accompanying drawings, in which:

[0022] Figure 1 is a schematic flowchart representing a method according to some embodiments;

[0023] Figure 2 is a schematic illustration of a computing device according to some embodiments;

[0024] Figure 3 is a schematic illustration of a vehicle according to some embodiments;

[0025] Figure 4 is an example diagram illustrating the mapping between scene samples and the multidimensional space;

[0026] Figure 5 is an example diagram illustrating scene samples in the multidimensional space;

[0027] Figures 6A and 6B are example diagrams illustrating how the scene space can be filled.

Detailed Description

[0028] This disclosure will now be described in detail with reference to the accompanying drawings, in which some exemplary embodiments of the disclosed technology are illustrated. However, the disclosed technology may be embodied in other forms and should not be construed as limited to the exemplary embodiments disclosed. The exemplary embodiments disclosed are provided to fully convey the scope of the disclosed technology to those skilled in the art. Those skilled in the art will appreciate that the steps, services, and functions set forth herein can be implemented using separate hardware circuitry, software that works in conjunction with a programmed microprocessor or general-purpose computer, one or more application-specific integrated circuits (ASICs), one or more field-programmable gate arrays (FPGAs), and/or one or more digital signal processors (DSPs).

[0029] It will also be appreciated that, when described in terms of method, this disclosure can also be implemented as an apparatus including one or more processors and one or more memories coupled to the one or more processors, wherein computer code is loaded to implement the method. For example, in some embodiments, the one or more memories may store one or more computer programs that, when executed by the one or more processors, cause the apparatus to perform the steps, services, and functions disclosed herein.

[0030] It should also be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. It should be noted that, as used in the specification and appended claims, unless the context clearly dictates otherwise, the articles “a,” “an,” and “the” are intended to indicate the presence of one or more elements. Thus, for example, in some contexts, a reference to “a unit” or “the unit” may refer to more than one unit, etc. Furthermore, the words “comprising,” “including,” and “containing” do not exclude other elements or steps. It should be emphasized that, when used in this specification, the term “comprising” is used to indicate the presence of the stated feature, integer, step, or component. It does not exclude the presence or addition of one or more other features, integers, steps, components, or groups thereof. The term “and/or” should be interpreted as meaning both, as well as each of the alternatives individually.

[0031] It will also be understood that although the terms first, second, etc., may be used herein to describe various elements or features, these elements should not be limited by these terms. These terms are used only to distinguish one element from another. For example, a first machine learning model may be referred to as a second machine learning model, and similarly, a second machine learning model may be referred to as a first machine learning model, without departing from the scope of the embodiments. Both the first machine learning model and the second machine learning model are machine learning models, but they are not the same machine learning model.

[0032] As used herein, the phrase "one or more" in a set of elements (e.g., "one or more of A, B, and C" or "at least one of A, B, and C") should be interpreted as conjunction or disjunction. In other words, it can refer to all elements in a set of elements, one element, or a combination of two or more elements. For example, the phrase "one or more of A, B, and C" can be interpreted as A or B or C, A and B and C, A and B, B and C, or A and C.
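The interpretation in [0032] amounts to taking every non-empty subset of the listed elements; for three elements that gives 2^3 - 1 = 7 readings, matching the seven listed above. A small Python illustration (the function name is purely illustrative):

```python
from itertools import combinations
from typing import List, Sequence, Set


def one_or_more_of(elements: Sequence[str]) -> List[Set[str]]:
    """Every non-empty selection covered by 'one or more of ...' (2**n - 1 of them)."""
    return [
        set(combo)
        for size in range(1, len(elements) + 1)   # sizes 1..n, never the empty set
        for combo in combinations(elements, size)
    ]


print(len(one_or_more_of(["A", "B", "C"])))  # 7
```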

[0033] As used herein, the term “in response to…” can be understood, depending on the context, as meaning “when…”, “as soon as…”, or “if.” Similarly, the phrase “in response to successfully [identifying at least one scene sample]” can be interpreted as “when it is determined that at least one scene sample has been identified,” “in the case that at least one scene sample has been identified,” “once at least one scene sample has been identified, then…”, etc.

[0034] Overview

[0035] As previously described, the disclosed technology relates to synthetic data generation for the development of Automated Driving Systems (ADS). More specifically, it can be used to generate valid, reliable, and accurate data for previously unseen scenarios using a database of existing scenarios. The disclosed technology makes it possible to cover as much of the scenario space as possible, more efficiently than relying on a fleet of vehicles to experience all possible scenarios. The technology relies on the fact that small perturbations or transformations can be applied to recorded sensor data while still achieving the desired reliability of the generated synthetic sensor data. In this context, reliability refers to the generated synthetic sensor data appearing realistic to the perception system applied to it (i.e., looking and feeling like real sensor data). For example, if real and synthetic data depict the same scenario, the perception system is expected to produce similar outputs for either. The reliability of synthetic sensor data can involve several different aspects that together indicate the overall quality and credibility of the synthetic sensor data in representing real-world scenarios and in supporting robust and reliable testing and development of the ADS (or any subsystem thereof). Reliability in this broad sense can, for example, encompass the validity, the reliability in a narrower sense, and/or the accuracy of the synthetic sensor data. In other words, reliability can be viewed as a collective term for one or more of these aspects. More specifically, validity can refer to the degree to which synthetic sensor data reflects (or conforms to) the characteristics of real-world sensor data or scenarios. In other words, validity can be a measure of how credible the synthetic sensor data is for its intended purpose, without introducing artifacts or inconsistencies that could lead to erroneous conclusions.
Reliability in the narrower sense can reflect the level of consistency of the generated synthetic sensor data. In other words, it can mean that the generated data consistently exhibits the same characteristics and behavior under similar conditions, thus ensuring reproducibility and predictability when the generated data is used for testing or analysis. Accuracy can refer to the degree to which the synthetic sensor data matches the scenario it is supposed to depict, in other words, how close it is to the expected or requested scenario.

[0036] Recent examples of techniques for perturbing data include using neural radiance fields (NeRFs) to shift the viewpoint of an image or even to move, add, or remove objects in a scene, and using generative models to alter the texture and appearance of certain objects or parts of a scene. Generative models can refer to generative adversarial networks (GANs), denoising diffusion probabilistic models (DDPMs), and normalizing flows, among others. Perturbations or transformations of this kind can render alternative scenes from the original raw input data. The disclosed techniques assume that, provided the transformation is of reasonable magnitude, such an alternative scene appears realistic, so that synthetic data can be rendered based on samples of the original data. The disclosed techniques use this to combine such a transformation system with a query system for new scenes, yielding a system that can render a queried scene beyond the collected raw data with a desired level of reliability. For this purpose, the disclosed techniques utilize a scene space (also referred to as a multidimensional space). Reliability can further indicate the fidelity of the generated synthetic sensor data; more specifically, the synthetic data can be expected to have a fidelity representative of real sensor data.

[0037] Definitions

[0038] Throughout this disclosure, reference is made to various machine learning techniques, commonly referred to as machine learning models (or simply "models"). Herein, this means any form of machine learning algorithm, such as a deep learning model or a neural network, capable of learning and adapting from input data and subsequently performing predictions, decisions, classifications, or other related tasks on new data.

[0039] Deploying a machine learning model typically involves a training phase in which the model learns from labeled or unlabeled training data so as to make accurate predictions during a subsequent inference phase. Training data (and input data during inference) can be, for example, images or image sequences, LiDAR data (i.e., point clouds), radar data, or any other form of data. Furthermore, training/input data can include combinations or fusions of two or more different data types. Additionally or alternatively, it can include combinations or fusions of two or more instances of the same data type, such as images from two or more different cameras.

[0040] In some embodiments, the machine learning model may be implemented in any suitable manner known to those skilled in the art, using suitable publicly available machine learning code elements (such as those available in PyTorch, TensorFlow, and Keras, or on any other suitable software development platform).

[0041] One example of the machine learning techniques referred to below is the so-called Neural Radiance Field (NeRF). NeRFs provide a learnable (through backpropagation) representation of a scene that can be used in conjunction with a rendering process. NeRFs are therefore examples of learned, rendering-based scene representations. As the name suggests, NeRFs utilize radiance fields and are thus radiance-based techniques. Furthermore, NeRFs are neural in the sense that they are (at least in part) constructed from neural networks. For example, NeRFs are capable of rendering new viewpoints of a recorded scene or of changing the presence or position of objects in the scene.

[0042] More specifically, a NeRF is a neural network that can reconstruct a three-dimensional scene from a partial set of two-dimensional images (or other sensor data types). A NeRF can learn the geometry, objects, and viewing angles of a particular scene, for example from how light travels through the scene. It can then be used to render realistic views from different viewpoints and for different sensor data types. These views can be rendered as 2D or 3D views. They can further be generated along the time dimension to produce dynamic scenes, so these views can also be rendered as 4D views. A NeRF is typically constructed from a so-called multilayer perceptron (MLP), which is a fully connected neural network architecture. This network can be trained to map spatial coordinates and viewing directions (e.g., of light rays through points in an image) to color and density values. The MLP passes its inputs (e.g., a position in 3D space and a 2D viewing direction) through its learned layers to determine the color and density values for each point in the 3D scene.
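To make the input/output contract of [0042] concrete, here is a deliberately tiny, untrained sketch of such an MLP in plain Python. It is a simplification under stated assumptions: real NeRFs typically add positional encoding, use many more layers, and feed the viewing direction only into the color branch; here the direction is passed as a 3D unit vector (an equivalent alternative to 2D angles), and the class name, layer sizes, and random initialization are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function; keeps colors in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def _dense(weights, bias, inputs):
    """Fully connected layer: weights @ inputs + bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, bias)]


class TinyRadianceMLP:
    """Hypothetical one-hidden-layer MLP: (position, view direction) -> (rgb, density)."""

    def __init__(self, hidden: int = 16, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(6)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(4)]
        self.b2 = [0.0] * 4

    def __call__(
        self, position: Sequence[float], direction: Sequence[float]
    ) -> Tuple[Tuple[float, float, float], float]:
        x = list(position) + list(direction)                     # 3D point + 3D view direction
        h = [max(0.0, v) for v in _dense(self.w1, self.b1, x)]  # ReLU hidden layer
        out = _dense(self.w2, self.b2, h)
        rgb = (_sigmoid(out[0]), _sigmoid(out[1]), _sigmoid(out[2]))  # color in (0, 1)
        density = max(0.0, out[3])                               # volume density is non-negative
        return rgb, density


if __name__ == "__main__":
    mlp = TinyRadianceMLP()
    print(mlp((0.1, 0.2, 0.3), (0.0, 0.0, 1.0)))
```

Training such a network (see the rendering step in [0044]) would adjust `w1`, `w2`, and the biases; here they stay at their random initial values.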

[0043] A NeRF needs to be trained (i.e., learned) for each unique scene using sensor data (e.g., images) from different viewpoints. Furthermore, the position and orientation of the sensor need to be known, which requires sensor tracking. This can be accomplished through some combination of SLAM, GPS, or inertial measurements. Alternatively, the sensor poses can be estimated after data capture by analyzing the sensor data, for example with the help of a neural network.

[0044] The training process for a NeRF can generally be described as follows, using a camera as an example. For each of the sparse camera (image) viewpoints provided, camera rays are traced through the scene, generating a set of 3D points, each with a given viewing direction (the direction in which light enters the camera). For these points, the MLP is used to predict the volume density and the emitted radiance. Using the predicted densities, the colors along each ray are weighted together, which accounts for occlusion (i.e., objects blocking the ray). The rendered image is then generated using classical volume rendering. The error between the rendered image and the original image can be minimized across multiple viewpoints (e.g., through gradient descent), thus encouraging the MLP to develop a coherent model of the scene.
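The volume-rendering step in [0044] is usually written as the quadrature C = Σ_i T_i (1 - exp(-σ_i δ_i)) c_i with transmittance T_i = Π_{j<i} exp(-σ_j δ_j), where σ_i and c_i are the density and color predicted for the i-th sample along a ray and δ_i is the spacing between samples. This is the standard NeRF formulation rather than something spelled out in this disclosure. A minimal Python sketch of it, together with the squared-error photometric loss that training would minimize, follows; the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def render_ray(
    sigmas: Sequence[float], colors: Sequence[Color], deltas: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Composite samples front to back along one camera ray."""
    transmittance = 1.0          # T_i: fraction of light not yet blocked
    rgb = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha          # contribution, accounting for occlusion
        weights.append(weight)
        rgb = [acc + weight * c for acc, c in zip(rgb, color)]
        transmittance *= 1.0 - alpha
    return rgb, weights


def photometric_loss(rendered: Sequence[float], observed: Sequence[float]) -> float:
    """Squared error between a rendered pixel and the recorded pixel."""
    return sum((r - o) ** 2 for r, o in zip(rendered, observed))


if __name__ == "__main__":
    # A dense red sample in front of a blue one: the blue sample is occluded.
    pixel, w = render_ray([1e3, 5.0], [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [1.0, 1.0])
    print(pixel, w)
```

In a real NeRF, `sigmas` and `colors` come from the MLP, and the gradient of `photometric_loss` flows back through `render_ray` into the network weights.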

[0045] Another example of a learned, rendering-based method for representing a scene is Gaussian splatting. Like NeRFs, Gaussian splatting is a radiance-field-based technique, but one that involves rasterization. More specifically, Gaussian splatting is a volume rendering technique that processes volume data directly without converting the data into surface or line primitives. The technique starts from sparse points generated during camera calibration and represents the scene with 2D or 3D Gaussian distributions that preserve the properties of a continuous volumetric radiance field. The sparse points (or point clouds) can be initialized, for example, randomly and/or from LiDAR point clouds. The Gaussian distributions may have positions that vary over time and can therefore be used to render dynamic 4D scenes (i.e., scenes including the time dimension).

[0046] In this case, the scene representation can include a set of Gaussian distributions, whose learnable parameters can correspond to the position, size, rotation, and spherical harmonics coefficients (describing view-dependent color) of each Gaussian. Rendering can be accomplished by projecting the 3D (or dynamic 4D) Gaussians onto the image plane. Then, for each pixel, the algorithm iterates over the splatted Gaussians in order of their distance to the current camera position, accumulating their density and color.
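The per-pixel accumulation in [0046] can be sketched as front-to-back alpha compositing of Gaussians that have already been projected onto the image plane. This is a simplified, hypothetical sketch: it assumes the projection step is done, replaces spherical harmonics with a constant color per Gaussian, adds a per-Gaussian opacity (a common choice in Gaussian splatting implementations, not something stated above), and omits tiling and learning.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class ProjectedGaussian:
    mean: Tuple[float, float]          # projected center (u, v) on the image plane
    cov: Tuple[float, float, float]    # 2D covariance (a, b, c) meaning [[a, b], [b, c]]
    depth: float                       # distance to the current camera position
    opacity: float                     # ASSUMPTION: learned opacity in [0, 1]
    color: Tuple[float, float, float]  # ASSUMPTION: constant color instead of spherical harmonics


def footprint(g: ProjectedGaussian, px: float, py: float) -> float:
    """Unnormalized 2D Gaussian value at pixel (px, py)."""
    a, b, c = g.cov
    det = a * c - b * b
    dx, dy = px - g.mean[0], py - g.mean[1]
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    m = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * m)


def shade_pixel(gaussians: Iterable[ProjectedGaussian], px: float, py: float) -> List[float]:
    """Accumulate color front to back, nearest Gaussians first."""
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for g in sorted(gaussians, key=lambda g: g.depth):
        alpha = min(0.99, g.opacity * footprint(g, px, py))
        rgb = [acc + transmittance * alpha * c for acc, c in zip(rgb, g.color)]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # early exit once the pixel is effectively opaque
            break
    return rgb


if __name__ == "__main__":
    near = ProjectedGaussian((0.0, 0.0), (1.0, 0.0, 1.0), 1.0, 1.0, (1.0, 0.0, 0.0))
    far = ProjectedGaussian((0.0, 0.0), (1.0, 0.0, 1.0), 5.0, 1.0, (0.0, 0.0, 1.0))
    print(shade_pixel([far, near], 0.0, 0.0))  # red dominates regardless of input order
```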

[0047] As an alternative to rendering-based techniques, generative models can also be used to transform sensor data. Examples of this machine learning technique include Generative Adversarial Networks (GANs) and diffusion models.

[0048] Generative Adversarial Networks (GANs) are machine learning frameworks consisting of two neural networks, a generator and a discriminator, that compete against each other in a zero-sum game. The generator creates data that resembles samples from the real world (e.g., images, audio, or text), while the discriminator evaluates whether a given input is real (from a dataset) or fake (generated by the generator). Through this adversarial process, the generator can improve its ability to create realistic outputs, while the discriminator can improve its ability to distinguish between real and fake data. This dynamic results in the generation of highly realistic synthetic data.
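The zero-sum game in [0048] is commonly formalized as the standard GAN minimax objective below, where D(x) is the discriminator's estimated probability that x is real, G(z) maps a noise vector z to a generated sample, p_data is the real-data distribution, and p_z is the noise prior. The disclosure does not commit to a particular objective; this is shown only as the usual reference form.

```latex
\min_{G}\max_{D}\; V(D,G) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator ascends V by telling real from generated samples, while the generator descends it by making D(G(z)) large, which is the adversarial dynamic described above.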

[0049] Diffusion models are a class of generative machine learning models used to create synthetic data, including sensor data. They work by progressively transforming random noise into structured data through an iterative process. During the training phase, noise is incrementally added to real data, progressively destroying its structure, and the model learns to reverse this noising process. In the generation phase, the model applies the learned reverse process to random noise to generate new data similar to the training data.
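As a hedged illustration of the training phase in [0049], the sketch below uses the closed-form forward (noising) process of DDPMs, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 - ᾱ_t)·ε with ᾱ_t = Π_{s≤t} (1 - β_s), plus the usual noise-prediction loss a denoising network would be trained on. The linear schedule (1e-4 to 0.02 over 1000 steps) is a common DDPM default, not a value from this disclosure; the denoising network itself is omitted, and all function names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def linear_betas(steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Per-step noise variances of the forward process."""
    return [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]


def cumulative_alphas(betas: Sequence[float]) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s): how much signal survives to step t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_sample(
    x0: Sequence[float], t: int, alpha_bar: Sequence[float], rng: random.Random
) -> Tuple[List[float], List[float]]:
    """Draw x_t ~ q(x_t | x_0) in one shot and return it with the noise used."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def noise_prediction_loss(predicted_eps: Sequence[float], true_eps: Sequence[float]) -> float:
    """Mean squared error the denoiser minimizes when learning the reverse process."""
    return sum((p - e) ** 2 for p, e in zip(predicted_eps, true_eps)) / len(true_eps)


if __name__ == "__main__":
    alpha_bar = cumulative_alphas(linear_betas())
    xt, eps = noise_sample([1.0, -1.0, 0.5], t=500, alpha_bar=alpha_bar, rng=random.Random(0))
    print(alpha_bar[0], alpha_bar[-1], xt)
```

By the last step almost no signal remains (ᾱ_T is close to zero), which is why generation can start from pure random noise.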

[0050] Diffusion models are particularly well-suited for generating synthetic sensor data because they can capture fine details and complex patterns, making them suitable for creating realistic representations of inputs such as images, point clouds, or time-series data.

[0051] All of the above techniques can be used as part of the aforementioned transformation system. This system can be used in conjunction with a query system based on so-called embedding networks (also known as “encoding networks,” “embedding neural networks,” or “embedding artificial neural networks”).

[0052] Embedding networks refer to a set of computational models or techniques used to enable computers to generate embeddings of input data (e.g., sensor data, text data, etc.), where an "embedding" is a mathematical (vector) representation of the input data. More specifically, embedding networks can be used to transform input data into a more compact representation in a multidimensional space while preserving meaningful relationships between input data points.

[0053] Embedding networks are used, for example, in tasks such as Natural Language Processing (NLP) and Computer Vision. These networks take raw input data, such as the words in a sentence or the pixels in an image, and transform them into fixed-size numeric vectors (embeddings) that capture the essential features or characteristics of the input data. More specifically, in NLP, embedding networks transform words into numeric vectors, where words with similar meanings or contextual usages are placed closer together in the embedding space. Similarly, in computer vision, embedding networks transform images into numeric vectors, enabling the network to capture visual similarities, such as placing similar objects or situations closer together in the embedding space (a multidimensional vector space).

[0054] Embedding networks themselves can comprise layers of a neural network architecture, typically employing convolutional layers, recurrent layers, fully connected layers, attention layers, or transformer layers to learn and extract meaningful patterns from input data. Embedding networks can be trained through supervised, unsupervised, or self-supervised learning to optimize the embeddings for specific downstream tasks such as classification, clustering, or recommendation.

[0055] Different data types can be embedded using different embedding networks. These different embedding networks can then be trained to generate embeddings in the same embedding space (the same multidimensional space), such that contextually, spatially, and/or temporally related embeddings (generated by different embedding networks) point to the same point within the multidimensional space. The phrase "pointing to the same point within the multidimensional space" should be interpreted broadly in the following text and encompasses "pointing in substantially the same direction within the multidimensional space," "pointing to substantially the same point within the multidimensional space," etc. More specifically, two embedding vectors pointing to the same point or in the same direction allow a relationship between the two underlying data samples to be inferred. For example, given two embedding vectors, one can calculate how close they are to pointing to the same point or in the same direction in order to determine the relationship between the underlying data samples: the closer they are, the more likely the underlying data samples relate to the same object or context.
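The two closeness notions in [0055] can be computed, for instance, as a Euclidean distance ("same point") and a cosine similarity ("same direction"). The plain-Python sketch below assumes these two measures; the function names and the idea of picking the most related candidate by cosine similarity are illustrative, not taken from the disclosure.

```python
import math
from typing import Dict, Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Small value: the embeddings point to (nearly) the same point."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Value close to 1: the embeddings point in (nearly) the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_related(query: Sequence[float], candidates: Dict[str, Sequence[float]]) -> str:
    """Key of the candidate embedding whose direction best matches the query."""
    return max(candidates, key=lambda k: cosine_similarity(query, candidates[k]))


if __name__ == "__main__":
    image_emb = [0.9, 0.1, 0.0]
    lidar_embs = {"same_scene": [1.8, 0.25, 0.05], "other_scene": [0.0, 0.2, 1.0]}
    print(most_related(image_emb, lidar_embs))  # same_scene
```

Cosine similarity ignores vector length, so it suits the "same direction" reading; Euclidean distance also penalizes differences in magnitude, matching the stricter "same point" reading.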

[0056] This can be accomplished, for example, by first training a first embedding network to generate embeddings in a multidimensional space based on input data from a first data source. Each of the other embedding networks can then be trained "against" the first embedding network, or in association with it (or with any of the other already-trained embedding networks), such that those embeddings of the other networks that are contextually, spatially, and / or temporally related to embeddings of the first embedding network point to the same point in the multidimensional space as the related embeddings of the first embedding network. For example, if the first embedding network is trained to generate image embeddings for camera images and a second embedding network is to generate embeddings for LiDAR data, the second embedding network can be trained by feeding it LiDAR data of a scene for which the corresponding image embeddings serve as ground truth (the desired output). By repeating this process for each subsequent embedding network, one obtains a set of embedding networks that can ingest data from various data sources and output corresponding embeddings whose contextual, spatial, and / or temporal relationships are represented by the proximity or directional similarity of the embeddings (vectors) in the multidimensional space.
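The "train against the first network" idea can be illustrated with a toy example under loudly stated assumptions: the second (LiDAR) embedding network is reduced to a single hypothetical linear map `W`, the image embeddings produced by the first network are treated as fixed targets, and the objective is mean squared error. The disclosure specifies neither the architecture nor the loss; the sketch only shows the frozen-target structure.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Apply the toy linear 'LiDAR embedding network' W to a feature vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def alignment_loss(w: Matrix, lidar: List[Vector], image_emb: List[Vector]) -> float:
    """Mean squared error between W·x and the paired (frozen) image embeddings."""
    total = 0.0
    for x, target in zip(lidar, image_emb):
        pred = matvec(w, x)
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
    return total / len(lidar)


def train_against_frozen(lidar: List[Vector], image_emb: List[Vector],
                         dim_out: int, dim_in: int,
                         lr: float = 0.1, steps: int = 200) -> Matrix:
    """Gradient descent on W only; the first network's outputs serve as ground truth."""
    w = [[0.0] * dim_in for _ in range(dim_out)]
    n = len(lidar)
    for _ in range(steps):
        grad = [[0.0] * dim_in for _ in range(dim_out)]
        for x, target in zip(lidar, image_emb):
            pred = matvec(w, x)
            for i in range(dim_out):
                err = pred[i] - target[i]
                for j in range(dim_in):
                    grad[i][j] += 2.0 * err * x[j] / n
        for i in range(dim_out):
            for j in range(dim_in):
                w[i][j] -= lr * grad[i][j]
    return w


# Hypothetical paired samples: LiDAR features and the image embeddings of the same scenes.
lidar_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_embs = [[0.5, 0.2], [0.1, 0.9], [0.6, 1.1]]
w_lidar = train_against_frozen(lidar_feats, image_embs, dim_out=2, dim_in=2)
print(alignment_loss(w_lidar, lidar_feats, image_embs))
```

Only `W` is updated, so the image embedding space stays fixed and the LiDAR embeddings are pulled into it; repeating this for each further network is what yields a set of networks sharing one space.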

[0057] Below, reference is made to scene embedding networks and query embedding networks. As generally described above, both are embedding networks; the different names merely indicate the different functions they serve. Scene embedding networks are configured to generate scene embeddings of scene samples based on sensor data recorded by a vehicle, and query embedding networks are configured to generate query embeddings of query scenes. This is elaborated further below in conjunction with Figure 1.

[0058] It should be noted that the disclosed techniques are not limited to the examples of machine learning techniques described above. For example, as those skilled in the art will recognize, other machine learning techniques employing some of the aspects described above, as well as entirely different techniques, can be used.

[0059] The vehicle's surrounding environment can be understood as the general area around the vehicle in which objects (e.g., traffic signs, other vehicles, landmarks, obstacles, etc.) can be detected and identified by the vehicle's sensors (radar, LiDAR, cameras, etc.), i.e., the area within the vehicle's sensor range. Sensor data can therefore depict the world around the vehicle. In other words, the surrounding environment refers to the world around the vehicle that is relevant to its decision-making and control.

[0060] The term "synthetic" in "synthetic sensor data" means that the sensor data is machine-generated (computer-generated) rather than recorded or otherwise collected from the real world. However, as described below, synthetic sensor data can be generated from "real" sensor data, for example by applying a transformation to the real sensor data. In the present context, synthetic sensor data can thus be regarded as transformed sensor data, that is, original sensor data that has undergone some kind of transformation.

[0061] Example

[0062] Figure 1 is a schematic flowchart representation of a computer-implemented method 100 for generating synthetic sensor data. More specifically, method 100 can be a method for generating synthetic sensor data for a requested driving scenario (or query scenario). The generated synthetic sensor data can be used for scenario-based testing of an autonomous driving system (ADS). Method 100 can be executed by a general-purpose computing device such as a server (also referred to as a remote server, cloud server, central server, back-end server, or fleet server). More specifically, method 100 can be executed by a processing system of the server. The processing system may, for example, include one or more processors and one or more memories coupled to the one or more processors, wherein the one or more memories store one or more programs that, when executed by the one or more processors, perform the steps, services, and functions of method 100 disclosed herein.

[0063] The different steps of method 100 are described in more detail below. Even though they are explained in a specific order, the steps of method 100 can be performed in any suitable order and multiple times. Thus, although Figure 1 may show a specific order of method steps, the order of the steps may differ from what is depicted. In addition, two or more steps may be performed concurrently or partially concurrently. Such variation will depend on the software and hardware systems chosen and on designer choice. All such variations are within the scope of the invention. Likewise, software implementations can be accomplished with standard programming techniques, using rule-based logic and other logic to carry out the various steps. Further variations of method 100 will become apparent from this disclosure. The embodiments mentioned and described herein are given by way of example only and should not limit the invention. Other solutions, uses, objectives, and functions within the scope of the invention as claimed in the patent claims described below should be apparent to those skilled in the art. It should further be understood that, in Figure 1, method 100 includes steps illustrated in solid-line boxes and steps illustrated in dashed-line boxes. The steps in solid lines are those included in the broadest example embodiment of method 100. The steps in dashed lines are examples of optional steps that may form part of alternative embodiments. The optional steps do not need to be performed sequentially, and not all steps need to be performed. The example steps can be performed in any order and in any combination. For example, method 100 may optionally include step S104. Alternatively, or in combination with step S104, the method may optionally include step S110. Alternatively, or in combination with steps S104 and / or S110, method 100 may optionally include step S112.

[0064] Method 100 uses a scene database. The scene database can be considered a collection of existing scene samples. More specifically, the scene database includes multiple scene samples. Each scene sample includes sensor data depicting the vehicle's surrounding environment over a period of time. In other words, the scene samples in the scene database can include sequences of sensor data. Therefore, the sensor data for each scene sample can be multiple sensor records over a period of time (e.g., image frames that make up a video sequence). This time period can extend over at least two subsequent time points. The sensor data depicting the scene can, for example, include two or more sensor data frames. However, the time period can also be a single time point. Therefore, the sensor data associated with a scene sample can include the sensor data frame at that time point. It should be noted that the sensor data associated with a scene sample can include sensor data of one or more sensor data modalities. In other words, the sensor data can include one or more sensor data types such as image data, LiDAR data, radar data, ultrasonic data, etc. Furthermore, the sensor data can include sensor data captured by two or more instances of the same sensor data type, such as image data from two or more cameras.

[0065] It is important to note that even though a scene sample in the scene database refers to "a" or "the" vehicle, scene samples may of course be captured by or for multiple different vehicles. The vehicle referenced for a particular scene sample therefore refers to whichever vehicle captured the corresponding sensor data or is otherwise associated with that sensor data (e.g., depicted in it). The sensor data can thus be captured by the vehicle's onboard sensors. Alternatively, the sensor data can be captured by sensors external to the vehicle, such as sensors of roadside infrastructure or sensors of other road users (e.g., other vehicles), that capture the vehicle's surrounding environment.

[0066] A driving scenario (or situation) can be understood as a particular situation or series of events. A scenario can also be described as a situation that evolves over a period of time. Driving scenarios range from common situations (e.g., following another vehicle on a highway) to rarer edge cases (e.g., swerving to avoid an obstacle while merging onto a busy road). A goal of ADS development is to ensure that the ADS can handle both typical and exceptional situations effectively.

[0067] Furthermore, the scenario can be defined by a set of environmental conditions under which the vehicle operates during the time period in question. This can encompass various environmental factors that may affect the driving experience, the vehicle's performance, and how it operates. A driving scenario can include, for example, a specific route, geographical location, type of driving environment (e.g., school zone, urban environment, highway section, etc.), type of road (e.g., highway, street, rural road, intersection, or roundabout), presence and type of other road users, time of day (e.g., morning, noon, evening, night, etc.), traffic level (e.g., peak hours, low traffic density, etc.), weather conditions, road conditions, lighting level, and traffic conditions (e.g., speed, distance to other road users). A driving scenario can be defined by any combination of the examples above. As a non-limiting example, the scenario can be defined as "driving on a city street, in heavy rain, with pedestrians crossing the road, and in free-flowing traffic."

[0068] The term "sample" in "scene sample" can be viewed as an instance of a scene within the scene database. More specifically, a scene sample can be considered as an existing scene that has been recorded and presented in the scene database. Each scene sample can be associated with a set of data such as sensor data collected by onboard sensors and other data added to the sample, as will be further explained below.

[0069] The term "query scenario" can refer to the requested or desired scenario. A query scenario can be a scenario that does not currently exist in the scene database but is desired. A query scenario can be represented by a textual description, such as the sentence "driving on a city street, in heavy rain, with pedestrians crossing the road, and in free-flowing traffic." Alternatively or in combination, the query scenario can be represented by a computer-simulated scenario, for example one generated by computer graphics. More realistic sensor data of the simulated scenario may then be desired, which can be achieved through synthetic sensor data generated by the proposed method 100. More specifically, the computer-simulated scenario can be represented by an embedding in the same way as "real" sensor data. Method 100 then provides means for generating synthetic sensor data corresponding to that embedding.

[0070] Each scene sample is associated with a scene embedding that represents the scene sample in the multidimensional space. The scene embedding associated with a scene sample can be generated by processing the sensor data of the scene sample with a scene embedding network that has been trained to take sensor data as input and output the corresponding scene embedding in the multidimensional space.

[0071] Herein, the multidimensional space (also referred to as the scene space or embedding space) refers to the mathematical space in which high-dimensional data (e.g., sensor data) can be transformed into, and represented as, lower-dimensional vector representations (called embeddings). Multidimensional spaces can be structured so that the embeddings capture meaningful patterns, relationships, or features of the original data, enabling efficient processing, comparison, and analysis.

[0072] More specifically, in an embedding space, similar sensor data (e.g., frames from similar driving scenarios) can be mapped to points that are close together, while dissimilar data is mapped to points that are far apart. This is useful for tasks such as classification, clustering, retrieval, and anomaly detection in autonomous driving systems. For example, the embedding space can be used to group sensor data from similar driving scenarios (e.g., through clustering) or to identify rare or challenging events for training and testing purposes (e.g., through matching algorithms).

[0073] Embedding spaces can further enable mapping between different data modalities. More specifically, the multidimensional space can be a common space for two or more data modalities. For example, different types of sensor data (e.g., image data, LiDAR data, radar data, etc.) and text data (e.g., scene descriptions) can be mapped to the same embedding space. A query embedding generated for text data can thus be used to identify, for example, image data by comparing it with the embeddings associated with the image data. In some examples, the multidimensional space is a common space for both image and text data.

[0074] In the current context, the multidimensional space involves a space spanning different possible driving scenarios, i.e., the "scene space." Therefore, analysis of the scene space can provide information about which scenarios are (or are not) covered by existing scene samples in the scene database. Furthermore, the scene space can enable, for example, the identification of specific scenarios corresponding to a given query scenario, thereby aiding in the retrieval of relevant data from the scene database.

[0075] The construction of data embeddings (or vector representations or encodings) typically involves machine learning models such as neural networks, for example models trained to learn representations that preserve the underlying semantics of the input data. As mentioned earlier, such networks can be called embedding networks. Scene embeddings for a scene database can therefore be generated by processing each scene (or its sensor data) with one or more scene embedding networks. Different scene embedding networks can be used for different types of sensor data. Furthermore, the sensor data of a scene sample can include more than one sensor data type, such as two or more of image data, LiDAR data, radar data, etc. More generally, the sensor data of a scene sample can include sensor data of a first sensor type and sensor data of a second sensor type. Each scene embedding can then be formed by aggregating a first sensor embedding generated for the sensor data of the first sensor type and a second sensor embedding generated for the sensor data of the second sensor type. In other words, separate embeddings can be generated for the different sensor data types, and the scene embedding can then be formed by aggregating (or otherwise combining) them. However, a single embedding network can instead be trained to generate scene embeddings directly from two or more sensor data types (i.e., two or more sensor modalities).
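Two common ways to aggregate a first and a second sensor embedding into one scene embedding are sketched below, assuming both embeddings are plain lists of floats. The disclosure leaves the aggregation method open, so the weighted mean, the concatenation, and the helper names shown here are assumptions.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length so neither modality dominates purely by magnitude."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def aggregate_mean(first: Sequence[float], second: Sequence[float],
                   w_first: float = 0.5) -> List[float]:
    """Weighted mean of the two unit-normalized sensor embeddings, re-normalized."""
    if len(first) != len(second):
        raise ValueError("mean aggregation needs embeddings of equal dimension")
    a, b = l2_normalize(first), l2_normalize(second)
    return l2_normalize([w_first * x + (1.0 - w_first) * y for x, y in zip(a, b)])


def aggregate_concat(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Concatenation; the scene embedding dimension is the sum of both input dimensions."""
    return list(first) + list(second)


camera_emb = [1.0, 0.0]  # hypothetical first sensor embedding (camera)
lidar_emb = [0.0, 1.0]   # hypothetical second sensor embedding (LiDAR)
scene_emb = aggregate_mean(camera_emb, lidar_emb)
print(scene_emb, aggregate_concat(camera_emb, lidar_emb))
```

A mean keeps the scene embedding in the same dimension as each sensor embedding, which matters if it must be compared with query embeddings in a shared space; concatenation preserves more modality-specific information but changes the dimension.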

[0076] Each scene embedding (or, more precisely, each scene sample) is further associated with a transform body in the multidimensional space. The transform body indicates the set of possible transformed scenes that can be generated from the corresponding scene sample. The transform body of a particular scene sample can therefore be viewed as a subspace of the multidimensional space that covers the set of transforms reachable from that scene sample. In other words, the transform body spans the set of scene embeddings associated with the synthetic sensor data that can be generated from the scene sample. The transform body can further be viewed as a finite volume within the multidimensional space, or as a constraint on the transforms within the multidimensional space. It is bounded by the extent to which the original sensor data can be modified while maintaining a certain level of realism (e.g., meeting a certain reliability threshold). Deviating too far from the original sensor data introduces artifacts, biases, or other errors, which reduces the usefulness of the synthetic sensor data for ADS development. The transform body can therefore further indicate the set of possible transformed scenes that can be generated from the scene sample while meeting a validity threshold, a reliability threshold, and / or an accuracy threshold. As previously mentioned, the validity threshold can refer to the extent to which the synthetic sensor data reflects the characteristics of real-world sensor data. In other words, the validity threshold can be a measure of how credible the synthetic sensor data is for its intended purpose, without introducing artifacts or inconsistencies that could lead to erroneous conclusions. The reliability threshold can be a measure of how consistently the synthetic sensor data can be generated. The accuracy threshold can be a measure of how well the synthetic sensor data matches the query scenario. In some embodiments, a trustworthiness threshold can be used. The trustworthiness threshold can capture two or more aspects of the validity, reliability, and accuracy thresholds. In other words, the trustworthiness threshold can be a measure of how realistic the synthetic sensor data appears, or how well it represents real-world sensor data. The transform body is explained further below in conjunction with Figure 4 and Figure 5.
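The text does not prescribe a shape for the transform body. The following is a minimal sketch, assuming the body is approximated by an axis-aligned ellipsoid centered on the scene sample's embedding, with hypothetical per-dimension radii that bound how far the embedding may move while the validity, reliability, and / or accuracy thresholds still hold. The class and field names are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class TransformBody:
    """Hypothetical axis-aligned ellipsoid approximating a scene sample's transform body."""
    center: Tuple[float, ...]  # the scene sample's own scene embedding
    radii: Tuple[float, ...]   # maximum realistic deviation along each embedding axis

    def __post_init__(self) -> None:
        if len(self.center) != len(self.radii) or any(r <= 0.0 for r in self.radii):
            raise ValueError("need one positive radius per embedding dimension")

    def contains(self, point: Sequence[float]) -> bool:
        """True if `point` lies inside (or on the surface of) the ellipsoid."""
        if len(point) != len(self.center):
            raise ValueError("point and transform body must share the embedding dimension")
        s = sum(((p - c) / r) ** 2 for p, c, r in zip(point, self.center, self.radii))
        return s <= 1.0


body = TransformBody(center=(0.0, 0.0), radii=(2.0, 0.5))
print(body.contains((1.5, 0.1)), body.contains((0.0, 0.6)))
```

An ellipsoid lets the permitted deviation differ per direction of the embedding space; a learned, non-convex region would match the description of a "finite volume" more closely but is beyond a short sketch.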

[0077] It is important to note that the scene database can be constructed in different ways depending on the specific implementation. For example, the scene database can be represented by a single database that includes all the data of the scene database described herein. In another example, the data can be distributed across several databases linked together to form the scene database. As an example, the sensor data associated with each scene sample can be stored in a first database. The associated scene embeddings, along with the associated transforms, can be stored in a second database. The first and second databases can then be linked by scene sample identifiers. It should also be noted that the scene database can include additional data. As will be further explained below, for example, each scene sample can have an associated learned, rendering-based scene representation or other means for transforming the sensor data.
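The two-database layout described above can be sketched with in-memory dictionaries keyed by the scene sample identifier. All class and field names here are hypothetical, and a real deployment would use persistent storage rather than Python dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SensorRecord:
    """Row of the hypothetical first database: sensor data frames per sensor."""
    frames: Dict[str, List[bytes]]  # e.g. {"camera_front": [...], "lidar_top": [...]}


@dataclass
class EmbeddingRecord:
    """Row of the hypothetical second database: scene embedding and transform-body parameters."""
    scene_embedding: Tuple[float, ...]
    transform_radii: Tuple[float, ...]


@dataclass
class SceneDatabase:
    sensor_db: Dict[str, SensorRecord] = field(default_factory=dict)
    embedding_db: Dict[str, EmbeddingRecord] = field(default_factory=dict)

    def add(self, sample_id: str, sensor: SensorRecord, emb: EmbeddingRecord) -> None:
        """Store both halves of a scene sample under the same identifier."""
        self.sensor_db[sample_id] = sensor
        self.embedding_db[sample_id] = emb

    def sensor_data_for(self, sample_id: str) -> Optional[SensorRecord]:
        """Follow the shared identifier from the embedding side to the sensor side."""
        return self.sensor_db.get(sample_id)


db = SceneDatabase()
db.add("scene-0001",
       SensorRecord({"camera_front": [b"frame0", b"frame1"]}),
       EmbeddingRecord(scene_embedding=(0.1, 0.2), transform_radii=(0.5, 0.5)))
```

Keeping the embeddings separate from the bulky sensor data lets the search over the multidimensional space run against the small second store, touching the first store only for the samples that are actually selected.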

[0078] Method 100 includes obtaining (S102) a request for a specified query scenario, wherein the query scenario is associated with a query embedding representing the query scenario in the multidimensional space. The query embedding may be received as part of the request. Alternatively, method 100 may determine the query embedding based on the obtained query scenario.

[0079] The query embedding can be generated by processing the query scenario with a query embedding network. The query embedding network is trained to take a query scenario as input and output the corresponding query embedding in the multidimensional space. The query scenario can be, for example, a textual description of the query scenario or a representation of a computer-simulated scenario. As previously mentioned, the query embedding network and the scene embedding network can be trained in association with each other such that, when a query embedding and a scene embedding are contextually, spatially, and / or temporally related, the query embedding generated by the query embedding network and the scene embedding generated by the scene embedding network point to the same point in the multidimensional space. In other words, they can be trained to map into the same multidimensional space.
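The disclosure does not say how the query and scene embedding networks are trained "in association with each other". One widely used option, named here as a swapped-in technique, is a CLIP-style symmetric contrastive (InfoNCE) loss over a batch of matching (query, scene) embedding pairs. The sketch computes only that loss on precomputed embeddings, with a hypothetical temperature of 0.1; the networks and the optimizer are not shown.

```python
import math
from typing import List, Sequence


def _cos(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def _cross_entropy(logits: List[float], target: int) -> float:
    """-log softmax(logits)[target], computed with the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def symmetric_info_nce(queries: List[Sequence[float]], scenes: List[Sequence[float]],
                       temperature: float = 0.1) -> float:
    """Query i should match scene i (and vice versa) better than any other pair in the batch."""
    n = len(queries)
    sims = [[_cos(q, s) / temperature for s in scenes] for q in queries]
    query_to_scene = sum(_cross_entropy(sims[i], i) for i in range(n)) / n
    scene_to_query = sum(_cross_entropy([sims[j][i] for j in range(n)], i) for i in range(n)) / n
    return 0.5 * (query_to_scene + scene_to_query)


text_queries = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical query embeddings
paired_scenes = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical scene embeddings, same order
print(symmetric_info_nce(text_queries, paired_scenes))
```

Minimizing this loss pulls each query embedding toward its paired scene embedding and away from the other scenes in the batch, which is one way to make related embeddings "point to the same point".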

[0080] The term "obtain" is to be interpreted broadly herein and encompasses the receiving, retrieving, collecting, acquiring, etc., directly and / or indirectly between two entities configured to communicate with each other or further with other external entities. However, in some embodiments, the term "obtain" will be interpreted as determining, deriving, forming, calculating, etc.

[0081] Here, obtaining (S102) the request may include, for example, receiving a request from a developer. In another example, the requested query scenario may be determined or identified, as part of method 100, based on the existing scene samples in the scene database. For example, the computing device (e.g., a server) performing method 100 may determine the query scenario by identifying "holes" or "blank spaces" in the multidimensional space that are not covered by existing scene samples or their transform bodies.
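One way to look for such "holes" is sketched below, assuming each transform body is an axis-aligned ellipsoid given as a hypothetical `(center, radii)` pair and that the region of interest is a box: candidate embeddings are sampled uniformly and kept if no transform body covers them. The sampling strategy is an assumption; uniform sampling scales poorly with the number of dimensions, and a randomly drawn point need not correspond to a meaningful driving scenario.

```python
import random
from typing import List, Sequence, Tuple

Body = Tuple[Tuple[float, ...], Tuple[float, ...]]  # hypothetical (center, radii) ellipsoid


def covered(point: Sequence[float], bodies: List[Body]) -> bool:
    """True if any transform body contains `point`."""
    return any(
        sum(((p - c) / r) ** 2 for p, c, r in zip(point, center, radii)) <= 1.0
        for center, radii in bodies
    )


def find_holes(bodies: List[Body], low: float, high: float, dim: int,
               n_candidates: int = 1000, seed: int = 0) -> List[Tuple[float, ...]]:
    """Sample candidate embeddings uniformly in [low, high]^dim; keep those no body covers."""
    rng = random.Random(seed)
    holes = []
    for _ in range(n_candidates):
        point = tuple(rng.uniform(low, high) for _ in range(dim))
        if not covered(point, bodies):
            holes.append(point)
    return holes


bodies = [((0.0, 0.0), (1.0, 1.0))]
holes = find_holes(bodies, low=-1.0, high=1.0, dim=2)
print(len(holes))
```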

[0082] Method 100 further includes identifying (S106) at least one scene sample within the scene database that has a transform body in which the query embedding is located. This can be accomplished by comparing the location of the query embedding in the multidimensional space with the transform bodies associated with the scene samples in the scene database. In other words, method 100 may include finding (or searching for) one or more existing scene samples that, according to their associated transform bodies, can be transformed into the query embedding. Put differently, as part of step S106, one or more scene samples whose associated transform bodies cover the query embedding are identified. How the identification (S106) of at least one scene sample may be performed is elaborated further below in conjunction with Figure 5.
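Step S106 can be sketched as a linear scan that keeps every scene sample whose transform body covers the query embedding, again assuming a hypothetical axis-aligned ellipsoidal transform body per sample. The `SceneSample` type and the function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SceneSample:
    sample_id: str
    scene_embedding: Tuple[float, ...]
    transform_radii: Tuple[float, ...]  # hypothetical ellipsoidal transform body


def in_transform_body(sample: SceneSample, query: Sequence[float]) -> bool:
    """Ellipsoid containment test around the sample's scene embedding."""
    return sum(((q - c) / r) ** 2
               for q, c, r in zip(query, sample.scene_embedding, sample.transform_radii)) <= 1.0


def identify_candidates(db: List[SceneSample], query: Sequence[float]) -> List[SceneSample]:
    """Step S106 (sketch): every scene sample whose transform body covers the query embedding."""
    return [s for s in db if in_transform_body(s, query)]


db = [
    SceneSample("a", (0.0, 0.0), (1.0, 1.0)),
    SceneSample("b", (3.0, 0.0), (0.5, 0.5)),
    SceneSample("c", (1.5, 0.0), (1.0, 1.0)),
]
print([s.sample_id for s in identify_candidates(db, (0.8, 0.0))])
```

An empty result corresponds to failing to identify a scene sample. For a large scene database, the linear scan would likely be replaced by an approximate nearest-neighbour index that pre-filters candidates before the exact containment test.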

[0083] The scene sample(s) can further be identified based on the distance between the associated scene embeddings and the query embedding in the multidimensional space. In one example, a scene sample can be identified as the closest (most similar) scene sample in the scene database based on a matching score between its scene embedding and the query embedding. The matching score can be determined, for example, based on Euclidean distance or cosine similarity.
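The matching score can be computed with either metric named above. In this sketch, which uses hypothetical names, the Euclidean distance is negated so that a higher score always means a better match, and the best-scoring candidate is returned.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def matching_score(scene: Sequence[float], query: Sequence[float], metric: str = "cosine") -> float:
    """Higher is better for both metrics, so the Euclidean distance is negated."""
    if metric == "cosine":
        return cosine(scene, query)
    if metric == "euclidean":
        return -math.dist(scene, query)
    raise ValueError(f"unknown metric: {metric}")


def most_similar(candidates: Dict[str, Tuple[float, ...]], query: Sequence[float],
                 metric: str = "cosine") -> Optional[str]:
    """Id of the candidate scene sample with the best matching score, or None if there is none."""
    if not candidates:
        return None
    return max(candidates, key=lambda sid: matching_score(candidates[sid], query, metric))


candidates = {"a": (1.0, 0.0), "b": (0.6, 0.8), "c": (5.0, 0.1)}
query = (1.0, 0.05)
print(most_similar(candidates, query, "euclidean"), most_similar(candidates, query, "cosine"))
```

In this example the Euclidean score picks `"a"` while cosine similarity picks `"c"`, because cosine ignores vector length; which metric is appropriate depends on how the embedding networks were trained.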

[0084] In another example, the scene sample(s) can further be identified based on an estimate of the probability that the scene sample can be reliably transformed into the query scene. This can be advantageous because the closest scene sample (in terms of distance) is not necessarily the best one to transform into the query scene; a more distant scene sample (whose transform body still covers the query embedding) may be more suitable. In such an implementation, the transform body can further be associated with a probability distribution indicating how likely it is that the associated scene sample can be transformed into a given point in the multidimensional space. For example, two scene samples may both have transform bodies that cover the query embedding. However, at the point corresponding to the query embedding, one scene sample may have a higher probability than the other of being (reliably) transformed into the query embedding. In other words, a scene sample can be identified (S106) as the scene sample with the highest probability of being transformed into synthetic sensor data whose associated scene embedding corresponds to the query embedding.
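To illustrate why the nearest sample need not be the best one, the sketch below assumes that each transform body is a ball of hypothetical radius around the scene embedding, carrying a Gaussian-shaped, unnormalized score `exp(-d²/(2σ²))` for how reliably the sample can be transformed a distance `d`. Both the ball shape and the Gaussian profile are assumptions; the disclosure only states that a probability distribution may be associated with the transform body.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ProbabilisticSample:
    sample_id: str
    center: Tuple[float, ...]  # the sample's scene embedding
    radius: float              # hypothetical ball-shaped transform body
    sigma: float               # hypothetical spread of the transform-probability profile


def transform_score(s: ProbabilisticSample, query: Sequence[float]) -> float:
    """Unnormalized score for reliably transforming s to `query`; 0.0 outside the transform body."""
    d = math.dist(s.center, query)
    if d > s.radius:
        return 0.0
    return math.exp(-(d ** 2) / (2.0 * s.sigma ** 2))


def best_by_probability(samples: List[ProbabilisticSample], query: Sequence[float]) -> Optional[str]:
    """Id of the covering sample with the highest score, or None if no transform body covers `query`."""
    scored = [(transform_score(s, query), s.sample_id) for s in samples]
    scored = [(p, sid) for p, sid in scored if p > 0.0]
    if not scored:
        return None
    return max(scored, key=lambda t: t[0])[1]


near = ProbabilisticSample("near", (0.0, 0.0), radius=1.0, sigma=0.2)
far = ProbabilisticSample("far", (2.0, 0.0), radius=2.0, sigma=2.0)
print(best_by_probability([near, far], (0.5, 0.0)))
```

For the query `(0.5, 0.0)`, the nearer sample scores about 0.044 and the farther one about 0.755, so `"far"` is selected even though its scene embedding is farther from the query.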

[0085] Method 100 further includes, in response to successfully identifying at least one scene sample, generating (S108) synthetic sensor data corresponding to the query scene by transforming the sensor data of the at least one identified scene sample into synthetic sensor data whose associated scene embedding lies within a threshold distance of the query embedding in the multidimensional space. In other words, the synthetic sensor data is generated by transforming at least one existing scene sample such that the scene embedding generated for the synthetic sensor data is within the threshold distance of the query embedding. The threshold distance provides a margin of error, ensuring that the synthetic sensor data is "sufficiently" similar to the query scene. In other words, the synthetic sensor data can be generated to correspond to the query scene based on its similarity to the query embedding in the multidimensional space. The threshold distance can be compared with the Euclidean distance between the query embedding and the scene embedding of the synthetic sensor data. In another example, the threshold can be compared with the cosine similarity between the query embedding and the scene embedding of the synthetic sensor data. However, other similarity measures can also be used.
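The acceptance test for the generated data can be sketched as below. The function name is hypothetical; for the cosine variant, this sketch compares the threshold with the cosine distance (1 minus the cosine similarity), so that for both metrics a smaller value means a closer match. Comparing the similarity itself against a minimum value would be equivalent.

```python
import math
from typing import Sequence


def within_threshold(scene_emb: Sequence[float], query_emb: Sequence[float],
                     threshold: float, metric: str = "euclidean") -> bool:
    """Accept synthetic sensor data whose scene embedding is close enough to the query embedding."""
    if metric == "euclidean":
        return math.dist(scene_emb, query_emb) <= threshold
    if metric == "cosine":
        norm = math.hypot(*scene_emb) * math.hypot(*query_emb)
        if norm == 0.0:
            raise ValueError("cosine is undefined for zero vectors")
        similarity = sum(a * b for a, b in zip(scene_emb, query_emb)) / norm
        return 1.0 - similarity <= threshold  # cosine distance compared with the threshold
    raise ValueError(f"unknown metric: {metric}")
```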

[0086] Furthermore, the synthetic sensor data can be generated such that its associated scene embedding lies within the transform body of the at least one scene sample. This ensures that the synthetic sensor data meets the applicable realism requirements.

[0087] Synthetic sensor data can include one or more sensor data frames. For example, synthetic sensor data can include multiple consecutive sensor data frames (e.g., image frames, LiDAR point clouds, radar data, etc.). In some cases, synthetic sensor data can therefore be a video stream spanning two or more time instants. In another example, synthetic sensor data can be a sensor data frame at a single time point. Furthermore, synthetic sensor data can include sensor data from one or more sensor modalities (i.e., one or more sensor data types).

[0088] The following section will explain aspects related to the transformation of sensor data used to generate synthetic sensor data.

[0089] Transforming the sensor data of the at least one identified scene sample into synthetic sensor data can be performed by: (i) applying a transformation to the sensor data to generate updated sensor data; (ii) determining the location, in the multidimensional space, of the embedding representing the updated sensor data; repeating steps (i) and (ii) until the location of that embedding is within the threshold distance of the query embedding; and providing the updated sensor data as the synthetic sensor data. In other words, synthetic sensor data can be generated through an iterative process of transforming the (initial) sensor data until it corresponds to the query embedding (at least within the threshold distance). The transformation may include changing the viewpoint, adding or removing objects in the depicted surrounding environment, changing object attributes (e.g., modifying color, texture, or material), changing the weather, changing lighting conditions, changing the road layout, changing object trajectories, changing sensor characteristics, changing the available sensors, traversing previously unseen areas, etc.
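The iterative loop (i)-(ii) can be sketched generically, with the transformation and the embedding network passed in as functions. The `max_iters` cap is an assumption added so that the loop always terminates, and the toy transformation (which moves the data halfway toward the query each step, with an identity embedding) is purely illustrative; in practice, the transformation would be one of the edits listed above, carried out by a generative or rendering-based model.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def iterative_transform(sensor_data: Vector,
                        query_emb: Sequence[float],
                        transform: Callable[[Vector, Sequence[float]], Vector],
                        embed: Callable[[Vector], Vector],
                        threshold: float,
                        max_iters: int = 100) -> Optional[Vector]:
    """Repeat (i) transform and (ii) embed until within `threshold` of the query embedding."""
    current = list(sensor_data)
    for _ in range(max_iters):
        current = transform(current, query_emb)                # step (i)
        if math.dist(embed(current), query_emb) <= threshold:  # step (ii) plus stop test
            return current                                     # provide as synthetic sensor data
    return None  # assumption: give up after max_iters instead of looping forever


def toy_transform(x: Vector, q: Sequence[float], step: float = 0.5) -> Vector:
    """Hypothetical edit that moves the data a fraction `step` of the way toward the query."""
    return [xi + step * (qi - xi) for xi, qi in zip(x, q)]


result = iterative_transform([0.0, 0.0], [1.0, 2.0], toy_transform,
                             embed=lambda x: list(x), threshold=0.01)
print(result)
```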

[0090] The iterative transformation process described above can be combined with different machine learning techniques.

[0091] In some examples, generative machine learning techniques can be used. For instance, transforming sensor data from at least one identified scene sample into synthetic sensor data may include feeding the sensor data into a generative adversarial network (GAN) or diffusion model trained to output synthetic sensor data. The GAN and / or diffusion model can be further trained to take a query embedding as input. In another example, the GAN and / or diffusion model can be trained to take a query scene as input. The output synthetic sensor data can be viewed as transformed sensor data.

[0092] Alternatively or in combination, rendering-based techniques can be used. For example, each scene sample in the database can further be associated with a learned, rendering-based scene representation configured for subsequently rendering synthetic sensor data associated with the scene sample. The scene representation can thus be learned from the sensor data of the scene sample. Transforming the sensor data of the at least one identified scene sample into synthetic sensor data can then include rendering the synthetic sensor data using the learned, rendering-based scene representation. The rendered synthetic sensor data can thus depict a transformed scene that differs from the scene from which the scene representation was learned. This can be accomplished, for example, by modifying the parameters of the scene representation.

[0093] A scene representation can be understood as a set of learnable parameters that jointly describe a scene, including different physical properties such as geometry, objects, color, and lighting. In other words, a scene representation can be physically based, in the sense that it models the underlying physical processes occurring in the real world and how sensors capture these processes in sensor data (consider, for example, projection, refraction, lenses, etc.). This can, for example, be based on material properties and on modeling how light propagates through the environment. The scene representation can thus learn to model geometric aspects such as the location, orientation, and scale of 3D objects. It can further model lighting aspects such as color, shadows, brightness, and reflections, as well as the transparency and translucency that describe how light passes through different materials such as glass or fog. The learned, rendering-based scene representation can be a neural radiance field (NeRF). In another example, the learned, rendering-based scene representation can be a Gaussian splatting-based model.

[0094] Combinations of the above techniques are also possible. For example, a learned, rendering-based scene representation such as a NeRF can be trained, and a diffusion model can then be used to edit the NeRF representation directly.

[0095] In some embodiments, method 100 further includes dividing (S104) the query scene into multiple subqueries, each subquery being associated with a corresponding subquery embedding. The identification step S106 may then include identifying, for each subquery, a scene sample whose transform body contains the associated subquery embedding. The synthetic sensor data can be generated (S108) by transforming a combination of the sensor data from the identified scene samples. In other words, the query scene can be constructed from multiple scene samples in the scene database. In a non-limiting example, the query scene is "a blue bus cuts into a roundabout." First, a scene sample depicting the roundabout can be identified. Then, another scene sample depicting the blue bus can be identified. A model of the cut-in trajectory can then be generated. The different scene samples can subsequently be combined in the generation step to form synthetic sensor data depicting the query scene. Optionally, a diffusion model can be used to adjust the color, lighting, or shadows of the blue bus so that it better fits the roundabout. Furthermore, the diffusion model can be used to change the roundabout from summer to winter.
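The subquery step can be sketched as follows, assuming the subquery embeddings are already available (how the textual query is split is not shown) and that each transform body is a hypothetical ball given as `(scene embedding, radius)`. The sketch only selects one scene sample per subquery; the trajectory modeling and the combination of sensor data in S108 are out of scope, and the sample identifiers are invented for illustration.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Body = Tuple[Tuple[float, ...], float]  # hypothetical ball-shaped transform body: (scene embedding, radius)


def find_sample(db: Dict[str, Body], sub_emb: Sequence[float]) -> Optional[str]:
    """Nearest scene sample whose transform body covers the subquery embedding, if any."""
    hits = []
    for sample_id, (center, radius) in db.items():
        d = math.dist(center, sub_emb)
        if d <= radius:
            hits.append((d, sample_id))
    return min(hits)[1] if hits else None


def plan_composition(db: Dict[str, Body],
                     subqueries: Dict[str, Sequence[float]]) -> Optional[Dict[str, str]]:
    """S104/S106 (sketch): one scene sample per subquery, or None if any subquery is uncovered."""
    plan: Dict[str, str] = {}
    for name, sub_emb in subqueries.items():
        sample_id = find_sample(db, sub_emb)
        if sample_id is None:
            return None
        plan[name] = sample_id
    return plan


db = {
    "roundabout_0042": ((0.0, 1.0), 0.5),
    "bus_blue_0007": ((1.0, 0.0), 0.5),
    "highway_0100": ((5.0, 5.0), 1.0),
}
subqueries = {"roundabout": (0.1, 0.9), "blue bus": (0.9, 0.1)}
print(plan_composition(db, subqueries))
```

Returning `None` as soon as one subquery is uncovered mirrors the all-or-nothing nature of the composition: without a sample for every part, the query scene cannot be assembled.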

[0096] Method 100 may further include storing (S110) the synthetic sensor data for subsequent use in the development of an autonomous driving system (ADS). The synthetic sensor data may be stored (S110) in the scene database. Method 100 may further include applying ADS features under test to the synthetic sensor data.

[0097] Method 100 may further include, in response to failing to identify a scene sample, transmitting (S112) a data collection request to one or more vehicles in a fleet. In other words, if no existing scene sample in the scene database can be effectively, reliably, and / or accurately transformed into the query scene, a data collection request may be transmitted (S112) instead. The data collection request may indicate the query scene; for example, it may include the query embedding. If a vehicle in the fleet encounters a scene that matches the query scene (and records sensor data of that scene), the vehicle may transmit the associated sensor data to the computing device, or any other device, performing method 100. The data collection request may be formulated differently depending on how far the query embedding is from any existing scene embedding (or the associated transform bodies). For example, if the query scene has a high priority (e.g., because it is relatively far from existing scene samples), a request for raw sensor data of the recorded scene may be transmitted. In another example, if the query scene has a low priority (e.g., because it is relatively close to existing scenes), a request for encoded (or compressed) sensor data, or for a scene that can be transformed into the query scene, may be transmitted.
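How the request might depend on the distance to existing data is sketched below. The `far_threshold` parameter, the "raw"/"encoded" labels, and the two-level priority are hypothetical simplifications; the sketch also measures distance to scene embeddings only, whereas the text allows distance to their transform bodies as well.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class DataCollectionRequest:
    query_embedding: Tuple[float, ...]
    payload: str   # hypothetical label: "raw" or "encoded"
    priority: str  # hypothetical label: "high" or "low"


def build_request(query: Sequence[float], scene_embeddings: Sequence[Sequence[float]],
                  far_threshold: float) -> DataCollectionRequest:
    """S112 (sketch): the farther the query lies from existing scenes, the higher its priority."""
    nearest = min((math.dist(query, s) for s in scene_embeddings), default=math.inf)
    if nearest > far_threshold:
        return DataCollectionRequest(tuple(query), payload="raw", priority="high")
    return DataCollectionRequest(tuple(query), payload="encoded", priority="low")


existing = [(0.0, 0.0), (1.0, 1.0)]
print(build_request((9.0, 9.0), existing, far_threshold=5.0))
```

An empty scene database yields an infinite nearest distance, so every query is then treated as high priority, which seems a reasonable default when nothing can be transformed.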

[0098] Executable instructions for performing these functions may optionally be included in a non-transitory computer-readable storage medium or other computer program product configured to be executed by one or more processors.

[0099] Generally, computer-accessible media can include any tangible or non-transitory storage medium or memory medium, such as electronic, magnetic, or optical media, for example, a disk or CD/DVD-ROM coupled to a computer system via a bus. As used herein, the terms “tangible” and “non-transitory” are intended to describe computer-readable storage media (or “memory”) that do not include propagating electromagnetic signals, but are not intended to otherwise limit the types of physical computer-readable storage devices included in the term computer-readable media or memory. For example, the terms “non-transitory computer-readable medium” or “tangible memory” are intended to cover types of storage devices that do not necessarily permanently store information, including, for example, random access memory (RAM). Program instructions and data stored in a non-transitory form on a tangible computer-accessible storage medium can be further transmitted via a transmission medium or a signal such as an electrical, electromagnetic, or digital signal, which can be conveyed via a communication medium such as a network and/or a wireless link.

[0100] Figure 2 is a schematic illustration of a computing device 200 according to some embodiments of the disclosed technology. The computing device 200 can be configured to perform method 100 described in conjunction with Figure 1. Accordingly, computing device 200 is configured to generate synthetic sensor data using the scene database as described above.

[0101] As used herein, computing device 200 refers to a computer system or any device or general-purpose computing system configured to perform various functions. Computing device 200 may, for example, be a server. Although computing device 200 is illustrated herein as a single device, it can be a distributed computing system made up of multiple different devices.

[0102] The computing device 200 includes a control circuit 202. The control circuit 202 may physically comprise a single circuit device. Alternatively, the control circuit 202 may be distributed across several circuit devices.

[0103] As shown in the example of Figure 2, computing device 200 may further include a transceiver 206 and a memory 208. Control circuitry 202 is communicatively connected to transceiver 206 and memory 208. Control circuitry 202 may include a data bus, via which it may communicate with transceiver 206 and/or memory 208.

[0104] Control circuitry 202 can be configured to perform overall control of the functions and operations of computing device 200. Control circuitry 202 may include a processor 204, such as a central processing unit (CPU), microcontroller, or microprocessor. Processor 204 can be configured to execute program code stored in memory 208 to perform the functions and operations of computing device 200. Control circuitry 202 is configured to perform the steps of method 100 described above in connection with Figure 1. These steps can be implemented using one or more functions stored in memory 208.

[0105] Transceiver 206 is configured to enable computing device 200 to communicate with other entities, such as other devices. Transceiver 206 can send data from computing device 200 and receive data addressed to it. Computing device 200 may, for example, be part of a vehicle. Transceiver 206 can then enable computing device 200 to communicate with other systems of the vehicle or with external entities (e.g., other vehicles or remote servers).

[0106] Memory 208 may be a non-transitory computer-readable storage medium. Memory 208 may be one or more of a buffer, flash memory, hard disk drive, removable media, volatile memory, non-volatile memory, random access memory (RAM), or other suitable device. In a typical arrangement, memory 208 may include non-volatile memory for long-term data storage and volatile memory serving as system memory for computing device 200. Memory 208 may exchange data with control circuitry 202 over the data bus. Accompanying control lines and an address bus may also exist between memory 208 and control circuitry 202. As described above in connection with Figure 1, memory 208 can further store the scene database. Alternatively, the scene database can be provided externally to computing device 200, which can then be communicatively connected to it.

[0107] The functions and operations of computing device 200 can be implemented in the form of executable logic routines (e.g., lines of code, software programs, etc.) stored on a non-transitory computer-readable recording medium (e.g., memory 208) of computing device 200 and executed by control circuitry 202 (e.g., using processor 204). In other words, when control circuitry 202 is configured to perform a specific function, processor 204 of control circuitry 202 can be configured to execute a portion of the program code stored in memory 208, where that portion of program code corresponds to the specific function. Furthermore, the functions and operations of control circuitry 202 can be a standalone software application or form part of a software application that carries out additional tasks related to control circuitry 202. The described functions and operations can be considered a method that the corresponding device is configured to carry out, such as method 100 discussed above in connection with Figure 1. Furthermore, although the described functions and operations can be implemented in software, such functions can also be implemented via dedicated hardware or firmware, or some combination of hardware, firmware, and software. The functions and operations of computing device 200 are described below.

[0108] Control circuit 202 is configured to obtain a request for a specified query scene. The query scene is associated with a query embedding that represents the query scene in a multidimensional space. This can be performed, for example, by executing the obtaining function 210.

[0109] Control circuit 202 is further configured to identify at least one scene sample within the scene database, the at least one scene sample having a transform volume in which the query embedding is located. This can be performed, for example, by executing the identification function 212.

[0110] Control circuit 202 is further configured to, in response to successfully identifying the at least one scene sample, generate synthetic sensor data corresponding to the query scene by transforming sensor data of the at least one identified scene sample into synthetic sensor data, the synthetic sensor data having an associated scene embedding in the multidimensional space within a threshold distance from the query embedding. This can be performed, for example, by executing the generation function 214.
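Functions 212 and 214 can be sketched together in Python as follows. This is an illustration under stated assumptions: transform volumes are modeled as balls, and `transform` and `encode` are hypothetical stand-ins for the generative model and the embedding encoder, neither of which the source specifies. The acceptance test mirrors the requirement that the scene embedding of the synthetic data lie within a threshold distance of the query embedding.

```python
import math
from typing import Callable, Optional, Sequence

Embedding = tuple[float, ...]
# (sensor_data, scene_embedding, transform_volume_radius); ball-shaped
# volumes are an assumption made for this sketch.
Candidate = tuple[object, Embedding, float]


def generate_for_query(query: Embedding,
                       candidates: Sequence[Candidate],
                       transform: Callable[[object, Embedding], object],
                       encode: Callable[[object], Embedding],
                       threshold: float) -> Optional[object]:
    for sensor_data, embedding, radius in candidates:
        # Identification: skip samples whose volume does not contain the query.
        if math.dist(embedding, query) > radius:
            continue
        # Generation: transform toward the query, then verify that the result
        # lands within `threshold` of the query embedding.
        synthetic = transform(sensor_data, query)
        if math.dist(encode(synthetic), query) <= threshold:
            return synthetic
    # Nothing identified or generated: caller may issue a data collection request.
    return None


# Toy stand-ins: "sensor data" is just a point, encoding is the identity,
# and the transform only moves halfway toward the target.
def halfway(data: Embedding, target: Embedding) -> Embedding:
    return tuple((a + b) / 2 for a, b in zip(data, target))


def identity(data: Embedding) -> Embedding:
    return data


candidates = [((0.0, 0.0), (0.0, 0.0), 2.0)]
print(generate_for_query((1.0, 0.0), candidates, halfway, identity, threshold=0.6))
# (0.5, 0.0)
```

With a tighter threshold (for example 0.1), the same toy transform is rejected and the function returns `None`, which is where a real system would try other candidates or fall back to data collection.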

[0111] Control circuit 202 can be further configured to divide the query scene into multiple subqueries, each subquery being associated with a corresponding subquery embedding. This can be performed, for example, by executing the partitioning function 216. Control circuit 202 can then be configured to identify, for each subquery, a scene sample having a transform volume in which the associated subquery embedding is located, and to generate the synthetic sensor data by transforming a combination of the sensor data of the identified scene samples.

[0112] Control circuit 202 can be further configured to store the synthetic sensor data for later use in the development of the automated driving system. This can be performed, for example, by executing the storage function 218.

[0113] Control circuit 202 can be further configured to transmit a data collection request to one or more vehicles in the fleet in response to failing to identify any scene sample. This can be performed, for example, by executing the transmission function 220.

[0114] It should be noted that the principles, features, aspects, and advantages of method 100 described above in connection with Figure 1 also apply to computing device 200 described herein. To avoid unnecessary repetition, reference is made to the foregoing. Accordingly, the control circuitry can be configured to perform any of the steps described as part of method 100.

[0115] Figure 3 is a schematic illustration of a vehicle 300 according to some embodiments. Vehicle 300 may be equipped with an automated driving system (ADS) 310. As used herein, "vehicle" means any form of motorized vehicle. For example, vehicle 300 can be any road vehicle such as a sedan (as illustrated herein), a motorcycle, a (freight) truck, a bus, a smart bicycle, etc. In the present context, vehicle 300 should be understood as a vehicle on which an ADS trained using synthetic sensor data generated by method 100 described herein can be deployed. Vehicle 300 may further be a vehicle capable of collecting sensor data from the different driving scenarios it experiences.

[0116] In the present context, an automated driving system (ADS) refers to a complex combination of hardware and software components designed to control and operate a vehicle without direct human intervention. ADS technology aims to automate various aspects of driving, such as steering, acceleration, deceleration, and monitoring of the surrounding environment. The primary goal of ADS is to improve safety, efficiency, and convenience in transportation. As classified by standards such as SAE J3016, ADS can range from basic driver assistance systems to highly automated driving systems, depending on their level of automation. These systems use a variety of sensors, such as cameras, radar, and lidar, together with powerful computer algorithms to perceive the environment and make driving decisions. The specific capabilities and features of ADS vary greatly, from systems providing limited assistance to systems capable of independently handling complex driving tasks under specific conditions.

[0117] Advanced driver assistance systems (ADAS) do not necessarily provide complete autonomy; rather, they are technologies that assist the driver during driving. ADAS functions often serve as building blocks for ADS. Examples include adaptive cruise control, lane keeping assist, automatic emergency braking, and parking assist. They enhance safety and convenience but typically require some degree of human supervision and intervention. Autonomous driving (AD), on the other hand, is technology designed to control and navigate a vehicle without human supervision. The difference between ADAS and AD thus lies in the level of autonomy and control: ADAS is designed to assist and support the driver, while AD aims to take complete control of the vehicle without the need for continuous human supervision. Accordingly, AD targets higher levels of autonomy (e.g., Levels 4 and 5 according to SAE International standards), at which the vehicle can operate independently in most or all driving scenarios without human intervention. As mentioned earlier, the term "ADS" is used herein as a general term encompassing both ADAS and AD. In the present context, ADS functions or ADS features can be understood as specific functions or features of the overall ADS stack, such as a highway navigation feature, a traffic jam navigation feature, a route planning feature, etc.

[0118] Vehicle 300 includes several components common in autonomous or semi-autonomous vehicles. It will be understood that vehicle 300 can have any combination of the various elements shown in Figure 3. Furthermore, vehicle 300 may include elements in addition to those shown in Figure 3. Although various elements are shown herein as being located inside vehicle 300, one or more of the elements can be located outside vehicle 300. Furthermore, as will be readily understood by those skilled in the art, even though various elements are depicted herein in a particular arrangement, they can be implemented in different arrangements. It should be further noted that the various elements can be communicatively connected to each other in any suitable manner. Because the elements of vehicle 300 can be implemented in several different ways, vehicle 300 of Figure 3 should be considered merely an illustrative example.

[0119] Vehicle 300 includes a control system 302. Control system 302 is configured to perform overall control of the functions and operations of vehicle 300. Control system 302 includes control circuitry 304 and memory 306. Control circuitry 304 may physically comprise a single circuit device. Alternatively, control circuitry 304 may be distributed across several circuit devices. As an example, control system 302 may share its control circuitry 304 with other parts of the vehicle. Control circuitry 304 may include one or more processors, such as a central processing unit (CPU), microcontroller, or microprocessor. The one or more processors may be configured to execute program code stored in memory 306 to perform the functions and operations of vehicle 300. Some of the processors may be or include any number of hardware components for performing data or signal processing or for executing computer code stored in memory 306. In some embodiments, control circuitry 304, or some of its functions, may be implemented on one or more so-called systems-on-a-chip (SoCs). As an example, ADS 310 may be implemented on an SoC. Memory 306 may optionally include high-speed random access memory such as DRAM, SRAM, DDR RAM, or other random access solid-state memory devices, and may also optionally include non-volatile memory such as one or more disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. Memory 306 may include database components, object code components, script components, or any other type of information structure for supporting the various operations described herein.

[0120] In the illustrated example, memory 306 further stores map data 308. Map data 308 can be used, for example, by ADS 310 of vehicle 300 to perform autonomous functions of vehicle 300. Map data 308 may include high-definition (HD) map data and/or standard-definition (SD) map data. It is contemplated that, even though illustrated as an element separate from ADS 310, memory 306 may be provided as an integral element of ADS 310. In other words, according to some embodiments, any distributed or local memory device can be utilized in implementations of the inventive concept. Similarly, control circuitry 304 can be distributed, for example, such that one or more processors of control circuitry 304 are provided as integral elements of ADS 310 or any other system of vehicle 300. In other words, according to exemplary embodiments, any distributed or local control circuitry can be utilized in implementations of the disclosed technology.

[0121] Vehicle 300 further includes a sensor system 320. Sensor system 320 is configured to acquire sensory data about the vehicle itself or its surrounding environment. Sensor system 320 may, for example, include a Global Navigation Satellite System (GNSS) module 322 (e.g., GPS) configured to collect geographic location data of vehicle 300. Sensor system 320 may further include one or more sensors 324. The one or more sensors 324 may be any type of onboard sensor, such as cameras, lidar, radar, ultrasonic sensors, gyroscopes, accelerometers, odometers, etc. It should be understood that sensor system 320 may also provide the possibility of acquiring sensor data directly or via dedicated sensor control circuitry within vehicle 300.

[0122] Vehicle 300 further includes a communication system 326. Communication system 326 is configured to communicate with external units such as other vehicles (e.g., via vehicle-to-vehicle (V2V) communication protocols), remote servers (e.g., cloud servers), databases, or other external devices (e.g., via vehicle-to-infrastructure (V2I) or vehicle-to-everything (V2X) communication protocols). Communication system 326 can communicate using one or more communication technologies and may include one or more antennas. Cellular communication technologies can be used for long-range communication, such as to remote servers or cloud computing systems. Furthermore, if the cellular communication technology used has low latency, it can also be used for V2V, V2I, or V2X communication. Examples of cellular radio technologies are GSM, GPRS, EDGE, LTE, 5G, 5G NR, etc., including future cellular solutions. However, in some solutions, short- to medium-range communication technologies such as wireless local area networks (WLANs) (e.g., IEEE 802.11-based solutions) can be used for communication with other vehicles near vehicle 300 or with local infrastructure components. ETSI is developing cellular standards for vehicle communication, and 5G, for example, is considered a suitable solution owing to its low latency and efficient handling of high bandwidths and communication channels.

[0123] Communication system 326 can further provide the possibility of transmitting outputs (e.g., sensor data recorded for driving scenarios) to remote locations (e.g., remote servers, operators, or control centers) via one or more antennas. Furthermore, communication system 326 can be configured to enable various components of vehicle 300 to communicate with each other. For example, the communication system can provide a local network setup such as CAN bus, I2C, Ethernet, fiber optics, etc. Local communication within the vehicle can also be wireless, using protocols such as Wi-Fi, LoRa, Zigbee, Bluetooth, or similar medium/short-range technologies.

[0124] Vehicle 300 further includes a control system 328. Control system 328 is configured to control the maneuvering of vehicle 300. It includes a steering module 330 configured to control the heading of vehicle 300, a throttle module 332 configured to control actuation of the throttle of vehicle 300, and a braking module 334 configured to control actuation of the brakes of vehicle 300. The various modules of control system 328 can receive manual input from the driver of vehicle 300 (i.e., from the steering wheel, accelerator pedal, and brake pedal, respectively). In addition, control system 328 can be communicatively connected to ADS 310 of the vehicle to receive instructions on how the various modules should operate. ADS 310 is thereby able to control the maneuvering of vehicle 300.

[0125] As described above, vehicle 300 includes ADS 310. ADS 310 may be part of the vehicle's control system 302. ADS 310 is configured to perform autonomous functions and operations of vehicle 300. ADS 310 may include multiple modules, each responsible for a different function of ADS 310.

[0126] ADS 310 may include a positioning module 312 or a positioning function block/system. Positioning module 312 is configured to determine and/or monitor the geographic location and heading of vehicle 300, and may utilize data from sensor system 320, such as data from GNSS module 322. Alternatively or in combination, positioning module 312 may utilize data from the one or more sensors 324. Alternatively, the positioning system may be implemented as a real-time kinematic (RTK) GPS.

[0127] ADS 310 may further include a perception module 314 or a perception function block/system. Perception module 314 may refer to any known module and/or function, for example included in one or more electronic control modules and/or nodes of vehicle 300, that is adapted and/or configured to interpret driving-related sensory data of vehicle 300 to identify, for example, obstacles, lanes, relevant signs, appropriate navigation paths, etc. Perception module 314 may thus be adapted to combine, for example, sensor data from sensor system 320, relying on and obtaining input from multiple data sources such as automotive imaging, image processing, computer vision, and/or in-vehicle networking.

[0128] Positioning module 312 and/or perception module 314 can be communicatively connected to sensor system 320 to receive sensor data from it. Positioning module 312 and/or perception module 314 can further send control commands to sensor system 320.

[0129] The ADS may further include a path planning module 316. The path planning module 316 is configured to determine a planned path for the vehicle 300 based on the vehicle's perception and position, as determined by the perception module 314 and the positioning module 312, respectively. The planned path determined by the path planning module 316 can be sent to the control system 328 for execution. As an example, the determined current position of the vehicle on the navigation map can be transmitted to the path planning module 316.

[0130] The ADS may further include a decision and control module 318. Decision and control module 318 is configured to perform control of ADS 310 and make decisions for ADS 310. For example, decision and control module 318 may decide whether the planned path determined by path planning module 316 should be executed. Decision and control module 318 may be further configured to detect any deviating behavior of the vehicle, such as deviations from the planned path or the expected trajectory determined by path planning module 316. This includes evasive actions performed by either ADS 310 or the vehicle's driver.
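The source does not say how decision and control module 318 detects deviations from the planned path. A minimal Python sketch, assuming the planned path is a 2D polyline of at least two points and using a hypothetical lateral tolerance in meters, is to measure the largest distance from any driven position to that polyline. The function names are illustrative only.

```python
import math

Point = tuple[float, float]


def point_to_segment_distance(p: Point, a: Point, b: Point) -> float:
    # Euclidean distance from point p to the line segment a-b.
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.dist(p, a)  # degenerate segment
    # Project p onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def max_deviation(trajectory: list[Point], planned_path: list[Point]) -> float:
    # Largest distance from any driven position to the planned polyline.
    if len(planned_path) < 2:
        raise ValueError("planned path needs at least two points")
    segments = list(zip(planned_path, planned_path[1:]))
    return max(min(point_to_segment_distance(p, a, b) for a, b in segments)
               for p in trajectory)


def deviates(trajectory: list[Point], planned_path: list[Point],
             tolerance_m: float) -> bool:
    # Hypothetical rule: flag a deviation if any point exceeds the tolerance.
    return max_deviation(trajectory, planned_path) > tolerance_m


path = [(0.0, 0.0), (10.0, 0.0)]
driven = [(1.0, 0.2), (5.0, 1.5), (9.0, 0.1)]
print(max_deviation(driven, path), deviates(driven, path, tolerance_m=1.0))
# 1.5 True
```

A production system would more likely compare against the expected trajectory over time rather than purely geometrically, but the geometric check illustrates the idea.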

[0131] It should be understood that multiple parts of the described solution can be implemented in vehicle 300, in a system located outside the vehicle, or in a combination of inside and outside the vehicle; for example, multiple parts of the described solution can be implemented in a server communicating with the vehicle, i.e., a so-called cloud solution. Different features and principles of the embodiments can be combined in combinations other than those described. Furthermore, the elements (i.e., systems and modules) of vehicle 300 can be implemented in combinations different from those described herein.

[0132] Figure 4 is an example diagram illustrating the mapping between a scene sample and a multidimensional space 400. Figure 4 is intended to improve understanding of aspects of the techniques disclosed herein and should not be considered as limiting their scope. More specifically, Figure 4 shows the effect of transforming (initial) sensor data 412 in the embedding space 400. That is, the transformation can be mapped to the embedding space, meaning that it can be observed and analyzed there.

[0133] The upper part of Figure 4 shows a transformation system 410. Transformation system 410 includes a transformation module 414. Herein, transformation module 414 represents a functional block for performing techniques related to the transformation of (initial) sensor data 412 to generate synthetic sensor data 416a, 416b, 416c. Accordingly, transformation module 414 can implement the techniques mentioned earlier, such as generative machine learning techniques or rendering-based machine learning techniques.

[0134] The lower part of Figure 4 shows an illustration of a multidimensional space 400. For illustrative purposes, multidimensional space 400 is depicted herein as a two-dimensional space spanned by two axes. However, multidimensional space 400 can have any number of dimensions, that is, it can be formed by two or more dimensions.

[0135] Furthermore, as illustrated herein, transform volume 404 can be a single contiguous set, that is, a single connected volume in the multidimensional space. However, although transform volume 404 is depicted herein as a single contiguous set, it can also be formed by multiple separate subsets. In other words, the transform volume 404 associated with a scene sample can be formed by multiple sub-volumes that are separated from each other in the multidimensional space, i.e., the transform volume can consist of disjoint volumes. This may be the case, for example, if (at least some) possible transformations occur in discrete steps rather than continuously.
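A transform volume made of several disjoint sub-volumes can be represented as a union, as in the minimal Python sketch below. Axis-aligned boxes are used as sub-volumes purely for illustration (an assumption; the source does not fix their shape), and `Box` and `DisjointTransformVolume` are hypothetical names.

```python
from dataclasses import dataclass

Embedding = tuple[float, ...]


@dataclass
class Box:
    # Axis-aligned sub-volume given by per-dimension lower and upper bounds.
    lower: Embedding
    upper: Embedding

    def contains(self, point: Embedding) -> bool:
        return all(lo <= x <= hi for lo, x, hi in zip(self.lower, point, self.upper))


@dataclass
class DisjointTransformVolume:
    # Transform volume formed by several separate sub-volumes, e.g. when some
    # transformations can only be applied in discrete steps.
    parts: list[Box]

    def contains(self, point: Embedding) -> bool:
        # A point is covered if it lies in any one of the sub-volumes.
        return any(part.contains(point) for part in self.parts)


volume = DisjointTransformVolume([Box((0.0, 0.0), (1.0, 1.0)),
                                  Box((3.0, 0.0), (4.0, 1.0))])
print([volume.contains(p) for p in [(0.5, 0.5), (2.0, 0.5), (3.5, 0.5)]])
# [True, False, True]
```

The gap between the two boxes models embeddings that no valid transformation of the scene sample can reach, even though reachable regions exist on both sides of it.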

[0136] As mentioned earlier, scene samples (or the sensor data of scene samples) can be encoded into scene embeddings within multidimensional space 400. In the middle of Figure 4, the dashed double-headed arrows indicate how sensor data 412 can be mapped to the associated scene embedding 402 in multidimensional space 400.

[0137] When sensor data 412 is transformed into synthetic sensor data representing three different transformed scenes, indicated herein by 416a to 416c, the scene embedding associated with each set of transformed data moves to another point in multidimensional space 400. More specifically, the first synthetic sensor data 416a is mapped to a first transferred scene embedding 406a in the multidimensional space. Similarly, the second synthetic sensor data 416b is mapped to a second transferred scene embedding 406b. Finally, the third synthetic sensor data 416c is mapped to a third transferred scene embedding 406c.

[0138] By applying multiple such transformations (each with a reasonable “amplitude”, as explained below, so that the results remain valid, reliable, and accurate), the transform volume can be mapped out in the multidimensional space. Transform volume 404 can be viewed as a subspace of the multidimensional space, spanned by the scene embeddings that can be reached by transforming the scene sample formed by sensor data 412 and its associated scene embedding 402.

[0139] A transform volume 404 (also referred to as a perturbation space) can be determined for each scene sample in the scene database. More specifically, the size and/or shape of the transform volume of one scene sample may differ from the size and/or shape of the transform volume of another scene sample. In other words, the transform volume can be determined individually for each scene sample. It can be determined by performing multiple transformations on the corresponding sensor data to determine to what extent the sensor data can be transformed with sufficient reliability. As illustrated herein, this can result in an asymmetric transform volume. In the illustrated example, synthetic sensor data 416a to 416c can be considered as lying on the boundary of what can be effectively, reliably, and/or accurately realized by transforming the initial sensor data 412. The boundary of transform volume 404 can then be determined based on the locations of the corresponding scene embeddings 406a to 406c in the multidimensional space. As described above, transform volume 404 can be determined using an iterative method. The iterative method may include applying a large number of transformations at different magnitudes to observe the corresponding shifts in embedding space 400 and the volume they produce. The “amplitude” of a transformation can be understood as the degree or amount of change performed, or as a general measure of how much the initial sensor data is modified.
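The iterative determination in [0139] can be sketched in Python as a step-by-step increase in amplitude per kind of transformation. The source does not define when a transformed result counts as reliable, so `is_reliable`, `apply_transform`, and `encode` are hypothetical caller-supplied functions. The sketch only returns one boundary embedding per transformation kind; how those points are turned into a volume (for example, by a hull) is left open.

```python
from typing import Callable, Sequence

Embedding = tuple[float, ...]


def estimate_boundary(sensor_data: object,
                      transform_kinds: Sequence[str],
                      apply_transform: Callable[[object, str, float], object],
                      encode: Callable[[object], Embedding],
                      is_reliable: Callable[[object], bool],
                      step: float,
                      n_steps: int) -> dict[str, Embedding]:
    """For each kind of transformation, raise its amplitude step by step and
    return the embedding reached at the largest amplitude still judged reliable."""
    boundary: dict[str, Embedding] = {}
    for kind in transform_kinds:
        last_valid = encode(sensor_data)  # amplitude 0 is the scene embedding itself
        for i in range(1, n_steps + 1):
            candidate = apply_transform(sensor_data, kind, i * step)
            if not is_reliable(candidate):
                break  # past the edge of the transform volume in this direction
            last_valid = encode(candidate)
        boundary[kind] = last_valid
    return boundary


# Toy setup: data is a 2D point, "lighting" shifts x and "weather" shifts y,
# and the reliability limits differ per direction, giving an asymmetric volume.
AXIS = {"lighting": 0, "weather": 1}


def shift(data: Embedding, kind: str, amplitude: float) -> Embedding:
    moved = list(data)
    moved[AXIS[kind]] += amplitude
    return tuple(moved)


def reliable(data: Embedding) -> bool:
    return data[0] <= 0.5 and data[1] <= 0.25


print(estimate_boundary((0.0, 0.0), ["lighting", "weather"], shift,
                        lambda d: d, reliable, step=0.25, n_steps=8))
# {'lighting': (0.5, 0.0), 'weather': (0.0, 0.25)}
```

The unequal results per direction illustrate why the transform volume of a scene sample can be asymmetric.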

[0140] In another example, the transform volume can be rule-based. In other words, the transform volume can be determined for each scene sample based on predefined rules. As a non-limiting example, the transform volume can be defined as a circle (in a 2D embedding space) or a sphere (in a 3D embedding space) with a given radius, centered on the scene embedding of the scene sample. This is an example of a symmetric transform volume.
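The rule-based, symmetric case generalizes directly to any number of dimensions, as in this minimal Python sketch. The single shared radius is a hypothetical example of a "predefined rule"; `BallTransformVolume` and `rule_based_volumes` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import Sequence

Embedding = tuple[float, ...]


@dataclass
class BallTransformVolume:
    # Symmetric, rule-based transform volume: every point within `radius` of
    # the scene embedding (a circle in 2D, a sphere in 3D, a ball in general).
    center: Embedding
    radius: float

    def contains(self, point: Embedding) -> bool:
        if len(point) != len(self.center):
            raise ValueError("point and center must have the same dimension")
        return math.dist(self.center, point) <= self.radius


def rule_based_volumes(embeddings: Sequence[Embedding],
                       radius: float) -> list[BallTransformVolume]:
    # Hypothetical predefined rule: the same radius for every scene sample.
    return [BallTransformVolume(e, radius) for e in embeddings]


volumes = rule_based_volumes([(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)], radius=1.0)
print([v.contains((0.5, 0.5, 0.0)) for v in volumes])
# [True, False]
```

Compared with the iterative approach, this rule needs no transformations to be tried in advance, at the cost of ignoring how far each scene sample can actually be transformed in each direction.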

[0141] Accordingly, the disclosed technology is based on the ability to use transformation system 410 to generate synthetic sensor data corresponding to any scene embedding within the transform volume. Thus, it is not necessary to collect additional raw sensor data samples from within this volume. This effect is further illustrated below in connection with Figures 6A and 6B. For example, if the ADS needs to be tested using a scene within transform volume 404, this can be achieved by efficiently transforming the initial sensor data 412 of the associated scene sample into synthetic sensor data that matches the requested scene.

[0142] Figure 5 is an example diagram illustrating scene samples in a multidimensional space 500. More specifically, Figure 5 demonstrates the process of generating synthetic sensor data for a query scene based on analysis in multidimensional space 500.

[0143] Figure 5 illustrates first to fifth scene embeddings 502a, 502b, 502c, 502d, and 502e associated with corresponding first to fifth scene samples. Each of the first to fifth scene embeddings 502a to 502e is further shown with a corresponding first to fifth transform volume 504a to 504e. As shown herein, transform volumes 504a to 504e may have different shapes and/or sizes. Furthermore, transform volumes may be symmetric (as shown by the first transform volume 504a and the second transform volume 504b) or asymmetric (as shown by the third transform volume 504c, the fourth transform volume 504d, and the fifth transform volume 504e).

[0144] As described above in connection with Figure 1, the proposed method is based on obtaining a request for a query scene. The corresponding position of the query embedding in multidimensional space 500 can then be determined. Figure 5 illustrates a first query embedding 508a, a second query embedding 508b, and a third query embedding 508c, associated with corresponding first, second, and third query scenes. The process then further involves identifying scene samples whose associated transform volumes cover the query embeddings.

[0145] For the first query scene, the fifth scene sample can be identified as the scene sample to be used to generate synthetic sensor data, because the first query embedding 508a is located within the fifth transform volume 504e. Then, based on the desired location of the scene embedding of the synthetic sensor data in the multidimensional space (i.e., at or near the first query embedding 508a), an appropriate transformation can be applied to the fifth scene sample to transform its initial sensor data into synthetic sensor data of the query scene. This transformation in the multidimensional space is indicated by the arrow between the fifth scene embedding 502e and the first query embedding 508a.

[0146] Similarly, for the second query scene, both the first scene sample and the third scene sample can be identified, because the second query embedding 508b is located within both the first transform volume 504a and the third transform volume 504c. Accordingly, either the first or the third scene sample can be transformed into the second query scene. However, as indicated by the corresponding arrows, this may require different transformations and different degrees of transformation. As described in connection with Figure 1, further evaluation can be performed to determine which of the first and third scene samples should be used to generate the synthetic sensor data. For example, the distance between each scene embedding and the query embedding can be used. In this case, the first scene sample can be selected (or identified) as the scene sample for generating the synthetic sensor data, because the distance between the first scene embedding 502a and the second query embedding 508b is shorter than the distance between the third scene embedding 502c and the second query embedding 508b. In another example, the probability distributions of the corresponding first transform volume 504a and third transform volume 504c can be used. Assuming, for example, that the probability of generating reliable synthetic sensor data decreases as the query embedding approaches the boundary of the transform volume, the third scene sample can be selected for generating synthetic sensor data, because the second query embedding 508b is closer to the boundary of the first transform volume 504a than to the boundary of the third transform volume 504c. It should be noted that two or more identified scene samples (in this example, the first and third scene samples) can also be used together to generate synthetic sensor data corresponding to the query scene.
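The two selection criteria in [0146] can be compared in a short Python sketch. Ball-shaped transform volumes, and the use of the distance to the volume boundary as a stand-in for the "probability distribution", are assumptions. The coordinates are invented to reproduce the situation described for the second query embedding 508b; they are not taken from Figure 5. All function names are hypothetical.

```python
import math

Embedding = tuple[float, ...]
# name -> (scene_embedding, radius); ball-shaped volumes are an assumption.
Samples = dict[str, tuple[Embedding, float]]


def covering_candidates(query: Embedding, samples: Samples) -> Samples:
    # Keep only samples whose transform volume contains the query embedding.
    return {n: (e, r) for n, (e, r) in samples.items() if math.dist(e, query) <= r}


def select_nearest(query: Embedding, candidates: Samples) -> str:
    # Criterion 1: shortest distance from scene embedding to query embedding.
    return min(candidates, key=lambda n: math.dist(candidates[n][0], query))


def select_largest_margin(query: Embedding, candidates: Samples) -> str:
    # Criterion 2: assuming reliability drops toward a volume's boundary,
    # prefer the candidate whose boundary is farthest from the query.
    return max(candidates,
               key=lambda n: candidates[n][1] - math.dist(candidates[n][0], query))


def required_shift(query: Embedding, embedding: Embedding) -> Embedding:
    # The "arrow" from a scene embedding to the query embedding.
    return tuple(q - e for q, e in zip(query, embedding))


samples: Samples = {
    "first": ((0.0, 0.0), 1.0),
    "third": ((2.0, 0.0), 3.0),
    "fifth": ((10.0, 10.0), 1.0),
}
query = (0.9, 0.0)
cands = covering_candidates(query, samples)
print(sorted(cands), select_nearest(query, cands), select_largest_margin(query, cands))
# ['first', 'third'] first third
```

Both selectors raise `ValueError` on an empty candidate set; that case corresponds to the third query scene in [0147], where a data collection request is issued instead.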

[0147] For the third query scene, no transform of an existing scene sample covers the third query embedding 508c. In this case, a data collection request indicating the query scene can be transmitted.

[0148] Figures 6A and 6B are example diagrams illustrating how the scene space can be filled. More specifically, a comparison of Figures 6A and 6B highlights the effectiveness of the disclosed technique: the scene space can be exhaustively covered using fewer scene samples.

[0149] Figure 6A shows a first scene space 600a filled with a set of scene samples having associated scene embeddings 602. More specifically, Figure 6A illustrates the case in which the disclosed technique is not used. In this case, exhaustively covering the first scene space 600a (i.e., achieving coverage of all possible scenes) requires some form of grid sampling. The sensor data associated with each scene embedding (corresponding to grid points separated by specific distances (d1, d2) between neighboring points) would all need to be encountered, collected, transmitted, and ultimately stored in the scene database by the fleet. Because the distances between neighboring points must be relatively small, a large number of scenes would need to be collected. As mentioned earlier, this is not feasible given the large number of possible scenes and the rarity of some of them.

[0150] Figure 6B shows a second scene space 600b in which the disclosed technique is used. More specifically, it uses a transform 604 associated with each scene embedding 602 of the scene samples in the scene database. With these transforms, the same scene space can be exhaustively covered with fewer data samples, because the distances (d3, d4) between neighboring scene samples can be greater than the distances (d1, d2) in the example of Figure 6A. In other words, the scene space can be filled with a sparser set of scene samples. Grid sampling can still be applied, but since sensor data between grid points can be generated, the grid can be made sparser. However, grid sampling is not required. For example, since the transforms can be asymmetric, the scene database can be filled with scene samples drawn from any possible distribution over the scene space.
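A rough count shows why a sparser grid matters. Assuming, purely for illustration, a hypercube scene space of side length L in n dimensions and isotropic transforms that are balls of radius r around each scene embedding, a regular grid with spacing s covers the whole space when s * sqrt(n) / 2 <= r, because s * sqrt(n) / 2 is the largest distance from any point to its nearest grid point. The function names, the fine spacing of 0.5, and the radius of 1.0 are hypothetical and not taken from the disclosure.

```python
import math


def grid_sample_count(side_length, spacing, dims):
    """Number of grid points needed to span a hypercube with the given spacing."""
    per_axis = math.ceil(side_length / spacing) + 1
    return per_axis ** dims


def max_spacing_for_coverage(radius, dims):
    """Largest grid spacing for which ball-shaped transforms of `radius` cover the space.

    In a cubic grid with spacing s, the farthest any point lies from its nearest
    grid point is s * sqrt(dims) / 2; setting that equal to `radius` gives s.
    """
    return 2.0 * radius / math.sqrt(dims)


side, dims = 10.0, 2
dense = grid_sample_count(side, 0.5, dims)                      # Figure 6A style
sparse_spacing = max_spacing_for_coverage(radius=1.0, dims=dims)
sparse = grid_sample_count(side, sparse_spacing, dims)          # Figure 6B style
print(dense, sparse)  # 441 81
```

With these toy numbers, the dense grid needs 441 scene samples while the transform-based grid needs 81. Because both counts grow as a power of n, the gap widens quickly in higher-dimensional embedding spaces.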

[0151] The disclosed techniques have been presented above with reference to specific embodiments. However, other embodiments besides those described above are also possible and within the scope of the invention. Method steps different from those described above, performed by hardware or software, can be provided within the scope of the invention. Therefore, according to an exemplary embodiment, a non-transitory computer-readable storage medium is provided storing one or more programs configured to be executed by one or more processors of a vehicle control system, the programs including instructions for performing the methods according to any of the embodiments discussed above. Alternatively, according to another exemplary embodiment, a cloud computing system can be configured to perform any of the methods presented herein. The cloud computing system may include distributed cloud computing resources that collectively perform the methods presented herein under the control of one or more computer program products.

[0152] It should be noted that the reference numerals in the accompanying drawings do not limit the scope of the claims. The invention can be implemented, at least in part, by means of both hardware and software, and several “devices” or “units” can be represented by the same item of hardware.

Claims

1. A computer-implemented method (100) for generating synthetic sensor data using a scene database, wherein the scene database includes multiple scene samples, each scene sample including sensor data depicting a vehicle's surrounding environment over a period of time, wherein each scene sample is associated with a scene embedding representing the scene sample in a multidimensional space, wherein each scene embedding is associated with a transform in the multidimensional space, the transform indicating a set of possible transformed scenes that can be generated from the corresponding scene sample, and wherein the method (100) comprises: obtaining (S102) a request for a specified query scene, wherein the query scene is associated with a query embedding representing the query scene in the multidimensional space; identifying (S106) at least one scene sample within the scene database, the at least one scene sample having a transform in which the query embedding is located; and, in response to successfully identifying the at least one scene sample, generating (S108) synthetic sensor data corresponding to the query scene by transforming the sensor data of the identified at least one scene sample into synthetic sensor data having an associated scene embedding in the multidimensional space within a threshold distance from the query embedding.

2. The method (100) according to claim 1, wherein the at least one scene sample is further identified based on the distance in the multidimensional space between its associated scene embedding and the query embedding.

3. The method (100) according to claim 1, further comprising dividing (S104) the query scene into multiple subqueries, each subquery being associated with a corresponding subquery embedding, wherein the identifying step (S106) comprises identifying, for each subquery, a scene sample having a transform in which the associated subquery embedding is located, and wherein the synthetic sensor data is generated (S108) by transforming a combination of the sensor data of each identified scene sample.

4. The method (100) according to claim 1, wherein transforming the sensor data of the at least one identified scene sample into synthetic sensor data comprises: (i) applying a transformation to the sensor data to generate updated sensor data; (ii) determining the location, within the multidimensional space, of an embedding of the updated sensor data; repeating steps (i) and (ii) until the location of the updated sensor data is within the threshold distance of the query embedding; and providing the updated sensor data as the synthetic sensor data.

5. The method (100) according to claim 1, wherein the transform further indicates the set of possible transformed scenes that can be generated from the scene sample while satisfying a validity threshold, a reliability threshold, and/or an accuracy threshold.

6. The method (100) according to claim 1, wherein transforming the sensor data of the at least one identified scene sample into synthetic sensor data comprises feeding the sensor data into a generative adversarial network or a diffusion model trained to output synthetic sensor data.

7. The method (100) according to claim 1, wherein each scene sample in the scene database is further associated with a learned rendering-based scene representation configured for subsequent rendering of synthetic sensor data associated with the scene sample, and wherein transforming the sensor data of the at least one identified scene sample into synthetic sensor data comprises rendering the synthetic sensor data using the learned rendering-based scene representation.

8. The method (100) according to claim 7, wherein the learned rendering-based scene representation is a neural radiance field or a Gaussian splatting-based model.

9. The method (100) according to claim 1, wherein the multidimensional space is a common space shared by two or more data modalities.

10. The method (100) according to claim 1, wherein the query scene is represented by a textual description of the query scene or by a computer-simulated scene.

11. The method (100) according to claim 1, wherein the sensor data in the scene sample includes sensor data of a first sensor type and sensor data of a second sensor type, and wherein each scene embedding is formed by aggregating a first sensor embedding generated for the sensor data of the first sensor type and a second sensor embedding generated for the sensor data of the second sensor type.

12. The method (100) according to claim 1, further comprising storing (S110) the synthetic sensor data for subsequent use in developing an autonomous driving system.

13. The method (100) according to claim 1, further comprising, in response to failing to identify a scene sample, transmitting (S112) a data collection request to one or more vehicles in a fleet.

14. A computer program product comprising instructions which, when executed by a computing device, cause the computing device to perform the method (100) according to claim 1.

15. A computing device (200) for generating synthetic sensor data using a scene database, wherein the scene database includes multiple scene samples, each scene sample including sensor data depicting a vehicle's surrounding environment over a period of time, wherein each scene sample is associated with a scene embedding representing the scene sample in a multidimensional space, wherein each scene embedding is associated with a transform in the multidimensional space, the transform indicating a set of possible transformed scenes that can be generated from the corresponding scene sample, and wherein the computing device (200) comprises a control circuit (202) configured to: obtain a request for a specified query scene, wherein the query scene is associated with a query embedding representing the query scene in the multidimensional space; identify at least one scene sample within the scene database, the at least one scene sample having a transform in which the query embedding is located; and, in response to successfully identifying the at least one scene sample, generate synthetic sensor data corresponding to the query scene by transforming the sensor data of the identified at least one scene sample into synthetic sensor data having an associated scene embedding in the multidimensional space within a threshold distance from the query embedding.
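The iterative loop of claim 4 can be sketched as follows. The encoder `embed` and the transformation `apply_transformation` are hypothetical stand-ins (a per-channel mean and a shift towards the query embedding) chosen only so that the loop is runnable; in a real system the transformation might instead be one of the models named in claims 6 to 8. The `max_iterations` guard is an added assumption, not part of the claim, so that the loop terminates even if the threshold is never reached.

```python
import numpy as np


def embed(sensor_data):
    """Hypothetical encoder: the 'embedding' is the mean of each sensor channel."""
    return sensor_data.mean(axis=0)


def apply_transformation(sensor_data, query_embedding, step=0.5):
    """Hypothetical transformation: shift the data so that its embedding moves
    a fraction `step` of the way towards the query embedding."""
    shift = step * (query_embedding - embed(sensor_data))
    return sensor_data + shift  # broadcasts the per-channel shift over time steps


def transform_until_close(sensor_data, query_embedding, threshold, max_iterations=100):
    """Repeat steps (i) and (ii) until the embedding is within `threshold` of the query."""
    updated = sensor_data
    for _ in range(max_iterations):
        updated = apply_transformation(updated, query_embedding)  # step (i)
        location = embed(updated)                                  # step (ii)
        if np.linalg.norm(location - query_embedding) <= threshold:
            return updated  # provided as the synthetic sensor data
    raise RuntimeError("threshold not reached within max_iterations")


data = np.zeros((4, 2))  # toy recording: 4 time steps, 2 channels
synthetic = transform_until_close(data, np.array([1.0, 1.0]), threshold=0.1)
print(embed(synthetic))  # [0.9375 0.9375]
```

Here each pass halves the remaining distance, so four iterations bring the embedding from about 1.41 to about 0.088 away from the query embedding, which is inside the threshold of 0.1.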