Methods and devices for low-latency event processing
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- COMMISSARIAT A LENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES
- Filing Date
- 2024-05-30
- Publication Date
- 2026-06-11
Abstract
Description
[Technical Field] 【0001】 This application claims priority to French Patent Application No. 2305360, entitled “Method and device for low-latency event processing,” filed on 30 May 2023, which is incorporated herein by reference to the maximum extent permitted by law. 【0002】 This disclosure generally relates to the field of data processors and methods of data processing, and more particularly to a data processor for processing event-based data generated by event-based sensors. [Background technology] 【0003】 Event-based sensors are used in a variety of applications and are particularly well-suited to environments with sparse input data. In fact, event-based sensors offer significant power savings in such environments compared to synchronous sensors that periodically sample input data, even when there is no significant input signal. Furthermore, event-based sensors enable high data compression ratios. 【0004】 For example, an event camera differs from a typical frame-based camera in that, instead of each pixel periodically integrating charge to record light intensity, an event pixel asynchronously generates a binary event when light intensity changes. This allows for large-scale compression of information about motion within a scene. The same principle can be applied to other sensing modalities, such as event-based audio. 【0005】 Convolutional neural networks (CNNs), and more recently vision transformers, offer excellent means of solving frame-based computer vision tasks, but they are inherently incompatible with event data. To process event data using such models, events are often consolidated into so-called dense frames, for example, by counting the number of events generated for each pixel within a given time window. However, this does not allow for the utilization of the sparse nature and low latency of event data. 
Furthermore, while these techniques allow for the reuse of existing CNN architectures and their optimized hardware implementations, the fine spatiotemporal details captured by event cameras are effectively discarded. This can lead to performance degradation, particularly in applications such as optical flow prediction. Moreover, dense-frame CNNs do not readily exploit the inherently sparse nature of event data to reduce computational requirements. 【0006】 The use of graph neural networks has been proposed to process asynchronous event data. One such example is the publication "AEGNN: Asynchronous Event-based Graph Neural Networks" by Schaefer, Simon, Daniel Gehrig, and Davide Scaramuzza, Proceedings of the IEEE / CVF Conference on Computer Vision and Pattern Recognition, 2022. However, a technical difficulty with this solution is that graph generation is relatively slow and introduces relatively high latency. 【0007】 In this field, there is a need for improved devices and methods that can process event data in real time or near real time in order to enable processing that is tightly coupled with event sensors. 
[Summary of the Invention] 【0008】 According to one embodiment, a method is provided for generating a prediction based on event data captured by an event-based sensor, the method comprising: receiving, by a data processing device, a new event from the event-based sensor, the new event comprising a timestamp and an event data value having one or more dimensions; determining, by the data processing device, past events within a list of past events that fall within a time distance of the timestamp and within a data value distance of the event data value; constructing, by the data processing device, an event subgraph including a first node representing the new event, further nodes representing each of the determined past events, and unidirectional edges linking each further node to the first node; calculating, by the data processing device, a node embedding of the first node based on edges consisting only of the unidirectional edges linking each further node to the first node; and generating a prediction by applying a function to the node embedding. 【0009】 According to one embodiment, the method further includes controlling one or more actuators based on the prediction. 【0010】 According to one embodiment, constructing the unidirectional edges linking each further node to the first node includes constructing unidirectional edges that point forward in time, i.e., towards the future, and that link each further node to a corresponding input of the first node. 【0011】 According to one embodiment, determining the past events further includes selecting only M past events to form the determined past events, where M is an integer equal to at least 2, and constructing the event subgraph includes creating only M unidirectional edges linking each further node to the first node, the M past events being selected, for example, randomly or as the M past events closest to the new event. 
【0012】 According to one embodiment, the step of determining past events includes performing a K-nearest neighbor event search within a past search volume having an event data value radius in one or more dimensions of the event data value and a time radius in the time domain. 【0013】 According to one embodiment, the event-based sensor is an event camera, and the prediction is optical flow prediction. 【0014】 According to one embodiment, the data processing device is configured to perform real-time processing of new events, and the data processing device is configured to generate predictions within a processing delay of less than 100 microseconds, preferably less than 5 microseconds, after the time of the new event. 【0015】 According to one embodiment, the step of generating predictions includes passing an event subgraph through a graph neural network. 【0016】 According to one embodiment, the graph neural network includes multiple layers, and the step of passing an event subgraph through the graph neural network includes computing the embedding vectors of a first node and further nodes of the subgraph using the embedding of all events connected by the unidirectional edges, wherein the embedding vectors include each subvector of the layer, and applying a transformation to the embedding vector of each node, or in the case of the first layer of the graph neural network, to the input event data to generate the resulting embedding vector. 
【0017】 In a further embodiment, a data processing device is provided for generating predictions based on event data captured by an event-based sensor, the device being configured to receive a new event from the event-based sensor, where the new event includes a timestamp and an event data value having one or more dimensions; determine, within a list of past events, past events that fall within a time distance with respect to the timestamp and within a data value distance of the event data value; construct an event subgraph including a first node representing a new event, further nodes representing each of the determined past events, and unidirectional edges linking each further node to the first node; calculate node embeddings of the first node based on edges consisting only of the unidirectional edges linking each further node to the first node; and generate predictions by applying a function to the node embeddings. 【0018】 According to one embodiment, the data processing device is further configured to control one or more actuators based on predictions. 【0019】 According to one embodiment, constructing a unidirectional edge that links each further node to the first node involves constructing a unidirectional edge that points forward in time, i.e., to the future, and links each further node to the corresponding input of the first node. 【0020】 According to one embodiment, determining a past event further includes selecting only M past events that form the determined past event, where M is an integer equal to at least 2, and constructing an event subgraph includes creating only M unidirectional edges that link each further node to a first node, where the M past events are selected, for example, randomly or are selected to be the M past events closest to the new event. 
【0021】 According to one embodiment, the data processing device is configured to determine past events by performing a K-nearest neighbor event search within a past search volume having an event data value radius in one or more dimensions of the event data value and a time radius in the time domain. 【0022】 According to one embodiment, the event-based sensor is an event camera, and the prediction is optical flow prediction. 【0023】 According to one embodiment, the data processing device is configured to perform real-time processing of new events, and the data processing device is configured to generate predictions within a processing delay of less than 100 microseconds, preferably less than 5 microseconds, after the time of the new event. 【0024】 According to one embodiment, the data processing device is configured to generate predictions by passing an event subgraph through a graph neural network. 【0025】 According to one embodiment, the graph neural network includes multiple layers, and passing an event subgraph through the graph neural network includes the graph neural network computing embedding vectors for a first node and further nodes of the subgraph using the embedding of all events connected by the unidirectional edges, wherein the embedding vectors include each subvector of the layer, and the graph neural network applying transformations to the embedding vectors of each node, or in the case of the first layer of the graph neural network, to the input event data to generate the resulting embedding vectors. 【0026】 In a further embodiment, a control system is provided comprising the above-mentioned data processing device, an event-based sensor coupled to the data processing device and configured to generate new events and transmit new events to the data processing device, and one or more actuators. 
[Brief Description of the Drawings] 【0027】 The foregoing features and advantages, as well as others, will be described in detail in the following description of specific embodiments, given by way of illustration and not limitation with reference to the accompanying drawings.
[Figure 1] Schematically illustrates an event-based detection and processing system according to an exemplary embodiment of the present disclosure.
[Figure 2] Schematically illustrates the event processing device of Figure 1 in more detail according to an exemplary embodiment of the present disclosure.
[Figure 3] A graph illustrating an example of an event cloud generated from event camera recordings.
[Figure 4] A graph illustrating an example of event graph creation based on k-hops between indirectly connected events.
[Figure 5] A flowchart illustrating an example of operations in a method for determining an optical flow prediction.
[Figure 6] A graph illustrating an example of event graph creation according to an exemplary embodiment of the present disclosure.
[Figure 7] A flowchart illustrating an example of operations in a method for determining an optical flow prediction according to an embodiment of the present disclosure.
[Figure 8A] Illustrates an example of updating an event graph upon the arrival of a new event.
[Figure 8B] Illustrates an example of updating an event graph upon the arrival of a new event.
[Figure 9A] Illustrates the updating of an event graph upon the arrival of a new event according to an embodiment of the present disclosure.
[Figure 9B] Illustrates the updating of an event graph upon the arrival of a new event according to an embodiment of the present disclosure.
[Figure 10] Illustrates the generation of embedding vectors in each layer of an event graph according to an exemplary embodiment of the present disclosure.
[Figure 11] Illustrates the concatenation of embedding vectors and their application to a neural network.
[Figure 12] Schematically illustrates a neural network head for making predictions according to an exemplary embodiment of the present disclosure.
[Modes for carrying out the invention] 【0028】 Like features are designated by like reference numerals in the various figures. In particular, structural and/or functional features common to the various embodiments may have the same reference numerals and may have identical structural, dimensional, and material properties. 【0029】 For clarity, only the operations and elements useful for understanding the embodiments described herein are illustrated and described in detail. In particular, devices and methods for event-based detection are known in the art and are not described in detail. Furthermore, training graph neural networks and using graph neural networks to generate event-based predictions or detections are well known to those skilled in the art and are not described in detail. 【0030】 Unless otherwise specified, a reference to two elements connected to each other means a direct connection with no intermediate elements other than conductors, and a reference to two elements coupled to each other means that these two elements can be connected or can be coupled via one or more other elements. 【0031】 In the following disclosure, unless otherwise specified, absolute position modifiers such as “front,” “back,” “top,” “bottom,” “left,” and “right,” relative position modifiers such as “above,” “below,” “higher,” and “lower,” and orientation modifiers such as “horizontal” and “vertical” refer to the orientation shown in the figures. 【0032】 Unless otherwise specified, the expressions “approximately,” “about,” “substantially,” and “of the order of” mean within 10%, preferably within 5%. 
【0033】 Figure 1 schematically illustrates an event-based detection and processing system 100 according to an exemplary embodiment of the present disclosure. The system 100 comprises an event-based sensor 102 configured to generate temporal event data and an event processing device 104 configured to receive and process the temporal event data in order to generate event-based predictions. 【0034】 For example, the event-based sensor 102 is an event-based camera, or event camera, such as a visible light or infrared event camera. Alternatively, the event-based sensor may be an event-based audio sensor such as an event-based microphone. Other types of event-based sensors are also possible. 【0035】 The temporal event data includes, for example, asynchronous events, each event including, for example, a timestamp and an event data value having one or more dimensions. For example, if the event-based sensor 102 is a camera, the event data value may include the x and y addresses of the pixel that captured the event, an event being generated, for example, when the change in light level at that pixel exceeds a threshold relative to the light level at which the previous event was generated at that pixel. Furthermore, the detected light level of the pixel can provide further dimensions of the event data value, for example the polarity of the change or the absolute light level at the moment the event was generated. If the event-based sensor 102 is a microphone, an event is generated, for example, in response to a relative change in the analog signal generated by the microphone, and the characteristics of each event may include a timestamp, the absolute value of the signal, and/or the frequency band of the event if the signal is filtered into several frequency bands. 【0036】 The event processing device 104 is configured to process the temporal event data, for example, in real time or near real time, with a processing delay of, for example, less than 100 microseconds, and preferably less than 5 microseconds. 
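As an illustration of the event structure described in paragraph 【0035】, the following Python sketch models an event as a timestamp plus an x/y pixel address and a polarity, and shows one way in which a single pixel could emit events when its light level moves by more than a threshold away from the level at which it last emitted an event. The names Event and pixel_events, the threshold value, the sample data, and the choice to use the first sample only as the initial reference level are assumptions made for this illustration; they are not taken from the disclosure.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    """One asynchronous event: a timestamp plus a multi-dimensional data value."""
    t: float       # timestamp in seconds
    x: int         # pixel column address
    y: int         # pixel row address
    polarity: int  # +1 for a brightness increase, -1 for a decrease


def pixel_events(x: int, y: int,
                 samples: Iterable[Tuple[float, float]],
                 threshold: float) -> List[Event]:
    """Emit an event each time the light level of one pixel differs by more
    than `threshold` from the level at which the previous event was emitted.

    `samples` holds (timestamp, light_level) pairs. The first sample only
    sets the initial reference level (an assumption of this sketch).
    """
    events: List[Event] = []
    reference: Optional[float] = None
    for t, level in samples:
        if reference is None:
            reference = level
            continue
        change = level - reference
        if abs(change) > threshold:
            events.append(Event(t, x, y, 1 if change > 0 else -1))
            reference = level  # the new reference is the level at this event
    return events


# Hypothetical light levels sampled at one pixel located at (x=3, y=7).
samples = [(0.000, 10.0), (0.001, 10.5), (0.002, 12.0), (0.003, 12.4), (0.004, 9.0)]
events = pixel_events(3, 7, samples, threshold=1.0)
```

In this toy sequence only two of the five samples produce an event, which reflects the sparsity that event-based sensing relies on when the input changes little.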
【0037】 If the temporal event data is image data, the event-based prediction at the output of the event processing device 104 is, for example, an optical flow prediction for each generated event, an object classification of the entire event stream, or an object detection for each event. If the temporal event data is audio data, the event-based prediction at the output of the event processing device 104 is, for example, keyword spotting or sound class recognition for each group of events. 【0038】 Figure 2 schematically illustrates the event processing device 104 of Figure 1 in more detail according to an exemplary embodiment of the present disclosure. 【0039】 The device 104 comprises, for example, a processing device (P) 202 having one or more processors, and one or more memory devices such as a volatile memory (RAM) 204 and/or a non-volatile memory (FLASH) 206. In some embodiments, the volatile memory 204 is a random access memory and/or the non-volatile memory 206 is a flash memory, but other types of volatile and non-volatile memory may be used. 【0040】 The processing device 202 and the memories 204 and 206 are linked, for example, by a bus 208. The non-volatile memory 206 stores, for example, software code executed by the processing device 202. 【0041】 The event processing device 104 further comprises, for example, a sensor interface 210 coupled to the bus 208, the sensor interface 210 being configured to communicate with the event-based sensor 102, for example, via a wired or wireless communication interface. 【0042】 In some embodiments, the event processing device 104 further comprises a graph neural network 212, which is also linked, for example, to the bus 208 and is configured to generate the event-based predictions at the output of the event processing device 104. For example, although not shown in Figure 2, the event-based predictions are provided by the graph neural network to an output interface of the device 104 coupled to the bus 208. 
【0043】 The graph neural network 212 is hardware circuitry configured to implement a method for generating predictions based on event data captured by, for example, the event-based sensor 102, which is described in more detail herein. In an alternative embodiment, the graph neural network is implemented in software, and the method for generating predictions based on event data captured by the event-based sensor 102 is implemented by software stored in memory 204 or 206 and executed by processing device 202. Alternatively, a processor reads data from the event-based sensor, processes this data, and then feeds this data to the graph neural network in a format appropriate for performing event-based predictions. 【0044】 In some embodiments, event-based predictions generated by a graph neural network 212 are used to control one or more actuators 214 coupled to device 104. One or more actuators 214 include, for example, robotic systems such as robotic arms trained to pull weeds or harvest ripe fruit from trees, autopilot or braking systems in vehicles, or electronic actuators, the electronic actuators being configured to control the operation of one or more circuits, for example, by waking up a circuit from sleep mode, putting a circuit into sleep mode, causing a circuit to produce text output, or performing data encoding or decoding operations. In a further example, device 104 is an eye-tracking unit that determines an area in a scene that the user is looking at and controls an actuator according to the determined area, for example, to change the focus of a stereogram. 【0045】 Figure 3 is a graph illustrating an example of an event cloud generated from event camera recordings. Specifically, the graph in Figure 3 is a 3D graph in which the x-axis represents the x-coordinate of the camera's pixel array (X PIXEL), the y-axis represents the y-coordinate of the pixel array (Y PIXEL), and the z-axis represents time in seconds (TIME(s)). 
The spiral of data shown in Figure 3 arises from the circular motion of a person's hand in the image. 【0046】 By analyzing the 3D data structure of the event data in Figure 3, or by analyzing a 2D, 4D, or higher-order data structure of event data, it is possible to perform predictive calculations such as classification to determine the type of object that created the events, prediction of optical flow to determine the velocity vector of an object, or detection of a specific object present in the scene. 【0047】 Figure 4 is a graph illustrating an example of an event graph in which the number of k-hops between indirectly connected events is of interest. The construction of an event graph involves, in particular, the creation of a tree structure and a subsequent nearest-neighbor search on the tree, which allows edges to be defined between events (i.e., vertices). For example, the construction of a KD tree structure is described in more detail in the publication "Real-time KD-Tree Construction on Graphics Hardware" by Zhou, Kun et al., ACM Transactions on Graphics (TOG) 27.5 (2008):1-11. A set of events is recorded, and a tree structure is created. This tree structure is used to search for the set of nearest neighbors of each event. Edges are formed between these events (i.e., nodes) based on distance measures in the pixel dimensions (e.g., x, y) and the time dimension (t). For example, directed edges are formed between the M nearest events within a fixed search radius. 【0048】 This process is illustrated in Figure 4, which shows time (TIME) on the x-axis and the x, y position (XY POSITION) of the pixel data on the y-axis. A new event e_i for a given pixel is detected by the event-based sensor at a time t_{e_i}. Further events are also represented in this graph. A search volume S(r_xy, r_t) is defined as being within a time distance r_t of the time point t_{e_i} of the new event and within a spatial distance r_xy of the new event, as represented by the ellipsoid S. Half of this search volume S is before the time point t_{e_i} of the new event, and the other half of the search volume S is after the arrival time point t_{e_i} of the new event. Thus, as illustrated in Figure 4, the search volume includes the events within the radius of the new event e_i. Among the events that fall within the search volume, the K nearest neighbors are connected to the new event e_i by edges. These edges can be inherently bidirectional or unidirectional, but with such a Euclidean-based search paradigm two unidirectional edges are often formed between a pair of events. Furthermore, and importantly, when a new event is generated, if the new event e_i becomes one of the nearest neighbors of an existing event, the edge configuration of that existing event in the tree is also updated. As will be explained in more detail below in relation to Figure 5, the event graph is then updated to take into account the new event e_i and the new edges. In particular, the effect of the new event e_i on the embeddings of the existing event graph nodes is evaluated, potentially many times, across all L layers of the event graph neural network. Depending on the node embedding update method, this can be computationally expensive and lead to significant latency. To avoid updating the node embeddings of the entire active event graph every time a new event is generated, a recursive k-hop graph search function hop() can be applied, for example, to find a sparse subset of affected nodes in each of the L layers. Only this subset of nodes is updated upon the arrival of each new event, which slightly reduces the computational load. As illustrated in Figure 4, a direct connection to the new event e_i is established by a single hop, corresponding to k=0, and an integer number K of further hops is made via intermediate connecting nodes. In the example of Figure 4, there are K=4 hops between the central event e_i in the figure and the event in the upper left corner. 
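To illustrate why updates are costly in the approach of Figure 4, the following Python sketch performs a breadth-first search that collects the nodes lying within a given number of hops of a new event in a graph whose edges can be traversed in both directions. The function name affected_nodes, the adjacency-dictionary representation, and the toy graph are assumptions made for illustration; the disclosure names a recursive function hop() without detailing it, and this sketch uses an iterative search instead. Direct neighbors are counted as hop k=0, following the convention of the paragraph above.

```python
from collections import deque
from typing import Deque, Dict, Set


def affected_nodes(adjacency: Dict[str, Set[str]],
                   new_node: str, max_k: int) -> Dict[str, int]:
    """Return {node: k} for every node reachable from `new_node`, where
    direct neighbors have k = 0 and each further hop adds 1, up to and
    including `max_k`."""
    hops: Dict[str, int] = {}
    visited: Set[str] = {new_node}
    frontier: Deque[str] = deque()
    for neighbor in adjacency.get(new_node, ()):
        visited.add(neighbor)
        hops[neighbor] = 0
        frontier.append(neighbor)
    while frontier:
        node = frontier.popleft()
        if hops[node] == max_k:
            continue  # do not expand beyond the requested hop count
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                hops[neighbor] = hops[node] + 1
                frontier.append(neighbor)
    return hops


# Hypothetical graph: a chain new - a - b - c - d plus a second neighbor e.
adjacency = {
    "new": {"a", "e"},
    "a": {"new", "b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
    "e": {"new"},
}
one_hop = affected_nodes(adjacency, "new", max_k=1)
three_hops = affected_nodes(adjacency, "new", max_k=3)
```

In this toy graph the set of nodes to re-evaluate grows from three to five as the hop limit increases from 1 to 3, which gives a small-scale picture of the per-layer update cost discussed above.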
【0049】 Figure 5 is a flowchart illustrating an example of operations in a method for determining an optical flow prediction according to the approach illustrated by the graph of Figure 4. 【0050】 In operation 501, a new event e_i is received. 【0051】 In operation 502, the subset of past events e_j that lie within the distances r_xy and r_t of the event e_i is determined. In other words, the past events that fall within the search volume S of Figure 4 are identified. 【0052】 In operation 503, for each past event e_j, a K-nearest-neighbor search is performed and edges are created. In particular, for each of the K nearest neighbors, an edge e_j ⇒ e_i from the past event e_j towards the new event e_i is created, and an edge e_i ⇒ e_j from the new event e_i towards each of the K nearest past neighbors is created. The node embeddings of all affected nodes within an integer number K of hops from the event e_i must then be updated. The node embedding of an event e_j is thus likely to be modified several times as further new events are generated within a delay equal to the time search radius (which forms direct K=0 connections to future events) multiplied by the number of layers (through modifications via nodes connected by a certain number K of hops). 【0053】 In operation 504, the impact of the new event e_i on the existing event graph node embeddings is evaluated for all L layers of the event graph neural network. This involves a recursive k-hop graph search function hop() that finds a sparse subset of affected nodes in each layer l of the event graph, as shown in Figure 4. 【0054】 In operation 505, the affected node embeddings Z_{j,l} in each layer l are updated using a graph convolution function. The graph convolution function φ() may be, for example, the spectral graph convolution proposed in the publication "SEMI-SUPERVISED CLASSIFICATION WITH GRAPH CONVOLUTIONAL NETWORKS", Kipf 2017, in which Z_l = f(A'·Z_{l-1}·W_l), where A' = D^{-1/2}·A·D^{-1/2}, A is an adjacency matrix in which each cell stores either 0 or 1 to indicate whether a pair of nodes is connected, and D is a degree matrix. 【0055】 In operation 506, after a delay equal to the radius r_t of the search window multiplied by the number of graph neural network layers (during which further new events affecting the event e_i may be received and processed), the function σ() is applied to the node embedding Z_i in order to generate the output prediction V_i associated with the new event e_i, for example an optical flow prediction. 【0056】 The method shown in Figure 5 can be represented, for example, by the following algorithm. 【0057】 [Image: listing of Algorithm 1] 【0058】
Algorithm 1 Sparse global update
Input: ev_i = {x_i, y_i, t_i, p_i}, G = {EV, ε}, r_t, r_xy
Output: V_i
M = EV ∩ B(ev_i, r_xy, r_t)
for ev_m ∈ M do
    ε_m = knn(ev_m, EV, r_t, r_xy)
end for
G = π(G, ε_m, ev_i)
for l in range(L) do
    for ev_j ∈ hop(M, G, l) do
        z_{j,l} = φ(ev_j, G, l)
    end for
end for
if t > t_i + (r_t × L) then
    V_i = σ(Z_i)
end if
【0059】 Here, ev_i is the new event, x_i and y_i are the data values of the event data value, t_i is the event timestamp, p_i is the event polarity, G is the event graph, EV is a subset of events {x, y, t} ∈ R^3, ε denotes the edges of the event graph connecting the events EV, B() is the function used to extract the subset of operation 502 described above, knn() is the K-nearest-neighbor search, π() is a function that takes as input the previous graph state, the new event, and the newly recalculated edges ε_m and returns an updated event graph, hop() is a recursive k-hop graph search function, and z_{j,l} is the node embedding of the event ev_j in a given layer l. 【0060】 The main drawbacks of the method of Figure 5 are the excessive number of calculations and the relatively high latency between the arrival of a new event and the output of the prediction. This is partly due to the complexity of the processing and partly due to the graph construction latency, which includes the delay r_t × L, typically several hundred milliseconds. 【0061】 Figure 6 is a graph illustrating an example of event graph creation according to an exemplary embodiment of the present disclosure. The graph of Figure 6 is similar to that of Figure 4, except that the search volume S is modified to a search volume S', as will be described in more detail below, and the edges between the past events (EVENT) and the new event e_i are strictly unidirectional. In particular, the edges connect the outputs of the nodes representing the past events in the search volume S' to the input of the node corresponding to the new event e_i. 【0062】 According to the embodiment shown in Figure 6, graph construction latency is reduced by restricting edge creation in the event graph so that only directed unidirectional edges are formed from past events to the newly arrived event, within a hemispherical or semi-ellipsoidal search volume S' around the newly arrived event. In other words, when a new event arrives, only past neighbors are searched for events to which the new event connects and from which it acquires data. The edges pointing towards past events that were created according to the previous method are no longer created, and the edges are therefore temporally unidirectional from past to future, as shown in Figure 6 by arrows pointing forward in time, i.e., towards the future. This allows the prediction for a generated event to be computed immediately. 
In fact, it can be guaranteed that neither the node embeddings of this event nor its edges will change in the future. This significantly reduces latency and computational complexity. 【0063】 In the graph neural network, these edges are used to compute immediately, for each layer of the network, the node embeddings of only the newly generated event. Once computed, the embeddings of past events are never modified. This is in contrast to previous methods, which require the embeddings of a large number of past events to be recalculated upon the arrival of each event. The computation uses only the embeddings of past events within a specific temporal and spatial search radius. 【0064】 According to the method of Figure 6, the edges, i.e., the identifiers of the two events linked by each edge, and any edge properties such as time differences, are, for example, not explicitly stored in memory, because the information related to these edges is implicitly represented by the node embeddings of previously generated events. This method therefore allows a significant reduction in memory storage, for example in the memory 204 and/or 206 of Figure 2. In addition, past events outside the time search radius are, for example, not stored. This is in contrast to previous methods, in which the events within a time window equal to the search radius multiplied by the number of neural network layers are stored, together with all of the edges linking these events. 【0065】 The factors described above enable implementations with a significantly reduced memory footprint compared to other methods. Specifically, real-time implementations of the embodiments described herein incorporate, for example, a function that continuously cleans the memory of the system in which the algorithm is embedded, so that events older than a certain time are removed. 
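The memory behavior described above can be sketched in Python as follows: past events are kept in time order, events older than the time search radius are dropped, and no edges are stored. The class name PastEventBuffer, its methods, and the numeric values are assumptions made for illustration, and events are assumed to arrive in non-decreasing timestamp order.

```python
from collections import deque
from typing import Deque, Tuple

# (timestamp in seconds, x, y); a reduced event record assumed for this sketch.
StoredEvent = Tuple[float, int, int]


class PastEventBuffer:
    """Keeps only the past events that can still fall inside the temporal
    search radius r_t of a future event. Edges are never stored."""

    def __init__(self, r_t: float) -> None:
        self.r_t = r_t
        self._events: Deque[StoredEvent] = deque()

    def prune(self, t_now: float) -> None:
        """Remove events older than t_now - r_t (the oldest are on the left)."""
        while self._events and self._events[0][0] < t_now - self.r_t:
            self._events.popleft()

    def push(self, event: StoredEvent) -> None:
        """Clean the memory, then store the new event for later searches."""
        self.prune(event[0])
        self._events.append(event)

    def __len__(self) -> int:
        return len(self._events)


# Hypothetical event stream with an assumed time search radius of 30 ms.
buffer = PastEventBuffer(r_t=0.030)
for t in (0.000, 0.010, 0.025, 0.045, 0.080):
    buffer.push((t, 0, 0))
```

Because pruning only inspects the oldest entries, the cost per new event is proportional to the number of events that expire, and the buffer size is bounded by the event rate multiplied by the time search radius.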
【0066】 In some embodiments, to maintain a number of detected events relatively similar to the method in Figure 4, the radius r t For example, it increases with respect to a typical value used according to the method. Typically, this is increased with respect to the radius r so that the search volume is equivalent. t This is achieved by increasing the radius r, for example, by approximately doubling it. For example, the radius r in Figure 4 t For example, this is 10-20 ms, and the radius r in the case of Figure 6 t For example, this is 20-40ms. 【0067】 FIG. 7 is a flowchart illustrating an example of operations in a method for determining optical flow prediction according to an embodiment of the present disclosure. This method can be similarly used to determine other types of predictions, such as classification or object detection. In the case of classification where per-event output may not be required, the node embeddings of all events generated within a specific time window may be aggregated, for example, using max or average pooling, and this aggregated node embedding is periodically provided to, for example, an output prediction function. 【0068】 The method of FIG. 7 is implemented, for example, by the event processing device 104 of FIG. 1. 【0069】 In operation 701, a new event e i is received by the event processing device 104, for example, from an event-based sensor 102. 【0070】 In operation 702, a K-nearest neighbor event search is performed within a past search volume of radius r in one or more dimensions of the event data values xy and radius r in the time domain. t For example, this is achieved by performing a distance measurement between the new event and past events, and the distance calculation is based on, for example, the L2 norm. 
It is also possible to use other distance metrics, such as the L1 norm or simply a maximum-distance function, which takes the maximum value over all dimensions (for example, over all three dimensions if there are three). 【0071】 In operation 703, a subgraph is created that includes the new event e_i and, for each neighbor e_j among the K nearest neighbors, a unidirectional edge e_j ⇒ e_i. In some embodiments, a unidirectional edge is created from each of the neighbors e_j towards the new event e_i. In other embodiments, unidirectional edges are created from a selection of M of the events e_j, where M is, for example, at least 2, for example between 2 and 50, and preferably around 30. For example, in some embodiments, this selection of M events corresponds to the M nearest neighbors of the new event e_i. In alternative embodiments, the M events can be selected using another technique, such as randomly selecting M past events within the search volume or selecting the M most distant past events within the search volume. 【0072】 In operation 704, using the graph convolution function φ'(), L node embeddings are calculated for the event e_i. This function φ'() is, for example, the same as the function φ(), except that its input is a past-only graph as represented in Figure 6, and thus the edges used for the calculation of the L node embeddings consist only of strictly unidirectional edges (from past to present). 【0073】 In operation 705, the function σ'() is applied to the node embedding Z_i, for example, to generate the optical flow prediction V_i of the new event e_i. For example, the function σ'() is applied by passing the subgraph to a neural network such as that described in relation to Figure 6. 【0074】 The method of Figure 7 can be represented, for example, by the following algorithm.
【0075】 [Image: JPEG2026519106000003.jpg] 【0076】 Algorithm 2: Hemisphere update
Input: ev_i = {x_i, y_i, t_i, p_i}, EV, r_t, r_xy
Output: V_i
K_i, ε_i = knn(ev_i, EV, r_t, r_xy)
G_i = {K_i, ε_i}
for l in range(L) do
    z_{i,l} = φ'(G_i, l)
end for
V_i = σ'(Z_i)
【0077】 Here, ev_i is the new event, x_i and y_i are the event data values, t_i is the event timestamp, p_i is the event polarity, K_i are the nearest neighbor events connected by the unidirectional edges ε_i, G_i is the subgraph, and knn() is the K-nearest neighbor search. 【0078】 Figures 8A and 8B illustrate an example of updating the event graph upon the arrival of a new event, based on the techniques of Figures 4 and 5, but considering only edges that point forward in time between events, in other words, edges that point towards the future. 【0079】 Figure 8A illustrates an example of an event graph that initially includes four events connected by unidirectional edges that point forward in time, in other words, towards the future. In the example of Figure 8A, the unidirectional edges are present at the output of each event and point towards its two nearest neighbor events; in other words, M is equal to 2 with respect to the output edges of each event node. 【0080】 Figure 8B illustrates an example of the arrival of a new event, and thus the addition of a corresponding new event node (represented by striped lines in Figure 8B) to the event graph. This involves the creation of three new edges, represented by thick lines, each pointing from one of three past events towards the new event. For two of the three past events highlighted with dots in Figure 8B, there was previously a single output edge, and the new edge is the second edge.
Furthermore, the third past event node already has two output edges, and one previous edge, represented by the dashed arrow, is removed and replaced with a new edge pointing towards the new event. This results in an update of each of the past events highlighted with dots in Figure 8B, either due to a direct modification of its input (in the case of the dotted event node on the left, whose input edge is removed) or due to an update of one of its input nodes (in the case of the dotted event node on the right). 【0081】 Therefore, the algorithm applied to handle the new event in the examples of Figures 8A and 8B can be described as follows:
For each new event "e", do the following:
- Explore all connectable events
  ○ For each event "ec" discovered, do the following:
    1. If "ec" already has M output connections, do the following:
       a. Compare (ec=>e) with the M existing (ec=>ei) edges.
       b. Decide whether to replace one of these with the new (ec=>e) edge.
       c. If an (ec=>ei) edge is removed, update the GNN outputs of "ei" and of its children.
    2. Otherwise, add the (ec=>e) connection.
    3. Update the GNN output of "ec".
【0082】 As shown in Figures 8A and 8B, a drawback of such methods is that the arrival of a new event involves re-evaluating the edges defined for past event nodes and therefore updating those past event nodes, which is computationally intensive and implies storing in memory the output and edges of each past event within a given period, in case new edges are created from past event nodes. The period over which the outputs and edges of past events are stored is proportional to the number of GNN layers and to the time radius of the search volume, in particular #GNN_layers*Time_radius. 【0083】 Figures 9A and 9B illustrate an example of updating the event graph upon the arrival of a new event, based on the techniques of Figures 6 and 7.
【0084】 Figure 9A illustrates an example of an event graph that initially includes four events connected by unidirectional edges pointing forward in time, in other words, towards the future. In the example of Figure 9A, unidirectional edges are present at the input of each event and arrive from its two nearest neighbor events. In other words, M is still equal to 2, but M is defined based on the number of edges at the input of each event node, rather than on the number of edges at the output of each event node as in Figures 8A and 8B. 【0085】 Figure 9B illustrates an example of the arrival of a new event, and thus the addition of a corresponding new event node (represented by striped lines in Figure 9B) to the event graph. In contrast to the example of Figure 8B, incorporating a new event involves creating only M new edges that advance in time from the identified nearest neighbors towards the new event node, and these edges are final edges that are not modified by the arrival of new events. Furthermore, in contrast to the case of Figure 8B, the input edges of past event nodes are never modified, so none of the past event nodes is modified in response to the arrival of a new event. 【0086】 Therefore, the algorithm applied to handle the new event in the examples of Figures 9A and 9B can be described as follows:
For each new event "e", do the following:
○ Explore the M past events (e0, e1, ..., em)
○ Make the connections ((e0=>e), (e1=>e), ..., (em=>e))
○ Calculate the GNN output of "e".
【0087】 In some embodiments, the M past events are the M nearest neighbors within the search radius, but it is also possible to select the M past events using other methods, such as randomly selecting M past events within the search volume, or selecting the M most distant past events within the search volume.
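The three steps of paragraph 【0086】 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of the disclosure: the names `Node`, `past_neighbors`, `embed` and `on_new_event` are hypothetical, the distance is an L2 norm over space and time normalized by r_xy and r_t (an assumption), and `embed` is a simple averaging placeholder standing in for the graph convolution function φ'().

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    x: float
    y: float
    t: float
    embedding: list = field(default_factory=list)  # one vector per layer


def past_neighbors(new, past, r_xy, r_t, m):
    """Select up to m nearest past nodes inside the past-only search volume."""
    candidates = []
    for node in past:
        dt = new.t - node.t
        d_xy = math.hypot(new.x - node.x, new.y - node.y)
        if 0 <= dt <= r_t and d_xy <= r_xy:
            # Assumed metric: L2 norm with each axis scaled by its radius.
            candidates.append((math.hypot(d_xy / r_xy, dt / r_t), node))
    candidates.sort(key=lambda c: c[0])
    return [node for _, node in candidates[:m]]


def embed(new, neighbors, num_layers):
    """Placeholder for phi'(): layer 1 holds the raw features; each later
    layer averages the node's previous vector with its neighbors' vectors."""
    layers = [[new.t, new.x, new.y]]
    for l in range(1, num_layers):
        vectors = [layers[l - 1]] + [n.embedding[l - 1] for n in neighbors]
        layers.append([sum(col) / len(vectors) for col in zip(*vectors)])
    return layers


def on_new_event(x, y, t, past, r_xy, r_t, m, num_layers):
    """Handle one event: only the new node is computed, past nodes are read."""
    new = Node(x, y, t)
    neighbors = past_neighbors(new, past, r_xy, r_t, m)  # edges e_j => e
    new.embedding = embed(new, neighbors, num_layers)
    past.append(new)  # edges themselves are not stored
    return new, neighbors
```

In this sketch the embeddings of past nodes are only read, never written, which mirrors the property stated for Figure 9B that no past event node is modified when a new event arrives.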
【0088】 The advantage of the techniques applied in Figures 9A and 9B is that the processing at the time of arrival of each new event is relatively light, which means that real-time or near real-time processing is possible. Furthermore, given that the previously defined input edges of past events are not updated by the arrival of new events, the number of past event outputs stored in memory can be relatively small, corresponding, for example, only to the outputs of past nodes that fall within the search volume. 【0089】 Figure 10 illustrates the generation of embedding vectors in each layer of the event graph according to an exemplary embodiment of the present disclosure. A graph neural network is used to calculate the node embeddings. Figure 10 illustrates an example of an event graph neural network, applied to the event graph G_i, that has five layers l1 to l5. Each layer uses the embeddings of the connected events to calculate an embedding vector Z_{j,l} for the event. For example, Figure 10 illustrates an example of the embedding matrix Z_j of an event j associated with one of the events e_j connected to the new event e_i. The embedding matrix Z_j includes a sub-vector Z_{j,1} to Z_{j,5} for each of the layers 1 to 5. The embedding vector Z_{j,1} of the first layer includes, for example, time information in the form of a timestamp and event data values having one or more dimensions, formed from the corresponding event. In the example of Figure 10, each embedding vector Z_{j,l} contains three values, which correspond, for example, to the timestamp and the X and Y pixel coordinates in the first layer, and to an abstract vector representation of the event in subsequent layers. To obtain the embedding in each of these subsequent layers, each layer applies a transformation to the embedding vector of each node or, in the case of the first layer, to the input event data, thereby generating the resulting embedding vector Z_{i,l}.
This transformation also takes into account the embeddings of the other nodes to which the node is connected by edges. For example, the transformation can be the sum or average of the embedding vectors, or can be more complex, for example based on edge information. The layers of the event graph neural network are chained sequentially, for example as in a standard neural network. 【0090】 Figure 11 illustrates the concatenation of embedding vectors and their application to a neural network according to exemplary embodiments of the present disclosure. In particular, the embedding vectors Z_{i,l} of each layer of the neural network related to the new event e_i are, for example, processed by passing them through a neural network to generate a prediction related to the new event e_i. In the example of Figure 11, this involves concatenating the embedding vectors Z_{i,l} to form a single vector Z_i, and processing this vector using a multilayer perceptron. For example, in the first layer L1 of the multilayer perceptron neural network, instance normalization can be applied to generate a first vector representation v1 of the event. To generate the prediction V_i, which is, for example, an optical flow prediction, this first vector representation v1 is, for example, processed by one or more further layers, including an output layer that returns two output values V_x and V_y corresponding to the x and y components of the optical flow vector V_i of the event. 【0091】 Figure 12 schematically illustrates a neural network head 1200 for making predictions according to exemplary embodiments of the present disclosure. Figure 12 specifically illustrates an example in which there are five layers 1202, referenced l1 to l5, within the graph neural network, and the output state of each layer for each node is provided as input to the neural network head 1200.
For layers l2 to l5, the input to each layer is the output of the previous layer, although, for example, new weights and message passing parameters are applied. For example, in each layer, the same shared weights are applied to all nodes. The weights differ from layer to layer. Message passing depends on the type of convolution. For many types of convolution, there are no parameters. For more complex types, such as B-spline convolution, there is, for example, a number of basis functions/kernels that can be set as a parameter that varies from layer to layer. 【0092】 The neural network head 1200 in the example of Figure 12 comprises, for example, a layer L of four neurons that receive the node embeddings of each given node from layers l1 to l5 of the graph neural network. For example, the neurons in layer L are configured to apply a rectified linear activation function. Instance normalization can be applied to the activations of this layer. The outputs of the neurons in layer L are fed, for example, to a single output neuron Nout, which is configured to generate a prediction corresponding to a probability between 0 and 1. 【0093】 The advantages of the graph construction algorithm described herein are that it enables significantly lower latency than conventional methods, typically around four orders of magnitude lower, and a lower total computational load, for example two orders of magnitude lower. Furthermore, edges are, for example, no longer stored in memory, and events older than a time equal to the temporal search radius are, for example, removed from memory. 【0094】 Various embodiments and variations have been described. Those skilled in the art will understand that specific features of these embodiments can be combined, and other variations will be readily conceivable to those skilled in the art. 【0095】 For example, in the case of graph neural networks, pooling operations can be used to coarsen the graph, thereby reducing the number of nodes and the computational load required.
For example, pooling operations are periodically triggered on a given graph neural network layer, for example using a given voxel grid size, i.e., a periodic grid over the input event cloud in the spatial (xy) and temporal (t) dimensions. For example, events within a single voxel have their features aggregated into a single superevent, and the connections between these superevents are based either on maintaining the connections that previously existed between voxels or on performing new past-only searches and defining a new set of edges. 【0096】 Furthermore, although the embodiments are described primarily in the context of camera data, i.e., 2D x and y event data values, the techniques described herein are applicable to other types of event-based sensors, such as event-based cochleas, or silicon ears. In such cases, instead of encoding events in space and time, the silicon ear encodes events in energy and time in a frequency band, for example, resulting in a 2D point plane instead of the 3D point cloud obtained in the case of an event camera. For example, such a silicon ear is described in detail in the publication "AER EAR: A matched silicon cochlea pair with address event representation interface" by Chan, Vincent, Shih-Chii Liu, and André van Schaik, IEEE Transactions on Circuits and Systems I: Regular Papers 54.1 (2007): 48-59. Moreover, the embodiments described herein can be applied to any application having a temporal or causal dimension. For example, they can be applied to data structures that can be explored incrementally, with the possible constraint that no loops are created (the condition "edges connect only from past to future" can be rephrased as meaning that a graph obtained using the unidirectional edges will not have loops). 【0097】 Furthermore, arbitrary operations can be applied to the data before and/or after the proposed graph construction.
For example, data filtering can be applied so that events at a given pixel with the same polarity emitted within a specific time window are discarded before processing. Other options include normalization of the input features, and encoding of time using a single periodic function or a pair of periodic functions, such as a sawtooth wave, instead of absolute time, which may impose a certain memory load. 【0098】 Furthermore, the distance between nodes can be calculated using any distance metric (L2 norm, also known as Euclidean distance; L1 norm, also known as Manhattan distance; etc.). 【0099】 Furthermore, the features included in each node can be modified, for example. Some of the available possibilities include the event time, the x and y coordinates, an approximate normal vector to the local event plane (see the publication "Learning visual motion segmentation using event surfaces" by Mitrokhin, Anton et al., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020), and the event polarity. 【0100】 Node embedding normalization can be used to normalize node features within an intermediate graph neural network layer. Of particular interest is the use of instance normalization, because it is a technique that uses only node-level statistics, unlike batch normalization or layer normalization, which require pooling statistics from multiple nodes or multiple graphs. 【0101】 The features stored in the created edges can also be modified. For example, simple binary flags indicating a connection, or the spatial and temporal coordinate differences between the connected events, can be used, and this has been shown to exhibit particularly high performance. 【0102】 The search radius can be changed statically, in other words in the same way for all graphs, or dynamically, for example according to certain rules.
For example, in the case of a temporally dense set of events suggesting fast-moving edges in a particular area of the scene, the search radius can be reduced to maintain the spatial relationships between events generated by objects regardless of their velocity. Alternatively, the search radius can be adapted using an approximated normal vector of the events: the search radius can be reduced when the vector has a larger component in a given dimension, as this also suggests fast-moving objects. Conversely, in the case of a small normal vector, the search radius can be expanded. 【0103】 In some embodiments, the number of neighbors can be limited, fixed, dynamic, or unlimited. 【0104】 The processing performed is not necessarily a graph neural network such as those discussed in the publication titled "Graph Convolutional Neural Network (GCN), with the gconv and spline layers," available at https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html. For example, non-AI-based processing methods, including graph search algorithms, can be applied. 【0105】 Finally, the actual implementation of the embodiments and variations described herein is within the capabilities of those skilled in the art based on the functional descriptions provided above.
Claims
[Claim 1] A method for generating a prediction based on event data captured by an event-based sensor (102) using a data processing device (104), the method comprising: - receiving, by the data processing device (104), a new event (e_i), the new event including a timestamp and an event data value having one or more dimensions; - determining, by the data processing device (104), within a list of past events, the past events (e_j) that fall within a time distance (r_t) related to the timestamp and within a distance (r_xy) between the event data values; - constructing, by the data processing device (104), an event subgraph that includes a first node representing the new event (e_i), further nodes representing each of the determined past events (e_j), and unidirectional edges linking each further node to the first node; - calculating, by the data processing device (104), a node embedding (Z_i) of the first node based on edges consisting only of the unidirectional edges that link each further node to the first node; and - applying a function (σ'()) to the node embedding (Z_i) to generate the prediction (V_i). [Claim 2] The method according to claim 1, further comprising the step of controlling one or more actuators (214) based on the prediction (V_i). [Claim 3] The method according to claim 1 or 2, wherein the step of constructing the unidirectional edges linking each further node to the first node includes constructing unidirectional edges that point toward the future and link each further node to a corresponding input of the first node.
[Claim 4] The method according to any one of claims 1 to 3, wherein the step of determining the past events (e_j) further includes selecting only M past events to form the determined past events, where M is an integer equal to at least 2, and the step of constructing the event subgraph includes creating only M unidirectional edges linking each further node to the first node. [Claim 5] The method according to any one of claims 1 to 4, wherein the step of determining the past events (e_j) comprises performing a K-nearest neighbor event search within a past search volume having an event data value radius (r_xy) in the one or more dimensions of the event data value and a time radius (r_t) in the time domain. [Claim 6] The method according to any one of claims 1 to 5, wherein the event-based sensor (102) is an event camera and the prediction (V_i) is an optical flow prediction. [Claim 7] The method according to any one of claims 1 to 6, wherein the data processing device (104) is configured to perform real-time processing of the new event (e_i) and to generate the prediction (V_i) within a processing delay of less than 100 microseconds, preferably less than 5 microseconds, after the time of the new event (e_i). [Claim 8] The method according to any one of claims 1 to 7, wherein the step of generating the prediction (V_i) includes passing the event subgraph through a graph neural network (212). [Claim 9] The method according to claim 8, wherein the graph neural network (212) includes multiple layers, and the step of passing the event subgraph through the graph neural network (212) includes: - calculating, using the embeddings of all events connected by the unidirectional edges, an embedding vector (Z_{j,l}) of the first node and of the further nodes of the subgraph, the embedding vector (Z_j) including a sub-vector (Z_{j,1} to Z_{j,5}) of each of the layers; and - transforming the embedding vector (Z_{j,l}) of each node or, in the case of the first layer of the graph neural network, the input event data, to generate a resulting embedding vector (Z_{i,l}). [Claim 10] A data processing device for generating a prediction based on event data captured by an event-based sensor (102), the data processing device being configured to: - receive a new event (e_i), the new event including a timestamp and an event data value having one or more dimensions; - determine, within a list of past events, the past events (e_j) that fall within a time distance (r_t) related to the timestamp and within a distance (r_xy) between the event data values; - construct an event subgraph including a first node representing the new event (e_i), further nodes representing each of the determined past events (e_j), and unidirectional edges linking each further node to the first node; - calculate a node embedding (Z_i) of the first node based on edges consisting only of the unidirectional edges that link each further node to the first node; and - generate the prediction (V_i) by applying a function (σ'()) to the node embedding (Z_i). [Claim 11] The data processing device according to claim 10, further configured to control one or more actuators (214) based on the prediction (V_i). [Claim 12] The data processing device according to claim 10 or 11, wherein constructing the unidirectional edges linking each further node to the first node comprises constructing unidirectional edges that point toward the future and link each further node to a corresponding input of the first node.
[Claim 13] The data processing device according to any one of claims 10 to 12, wherein determining the past events (e_j) further comprises selecting only M past events to form the determined past events, where M is an integer equal to at least 2, and constructing the event subgraph comprises creating only M unidirectional edges linking each further node to the first node. [Claim 14] The data processing device according to any one of claims 10 to 13, configured to determine the past events (e_j) within a past search volume having an event data value radius (r_xy) in the one or more dimensions of the event data value and a time radius (r_t) in the time domain. [Claim 15] The data processing device according to any one of claims 10 to 14, wherein the event-based sensor (102) is an event camera and the prediction (V_i) is an optical flow prediction. [Claim 16] The data processing device according to any one of claims 10 to 15, wherein the data processing device (104) is configured to perform real-time processing of the new event (e_i) and to generate the prediction (V_i) within a processing delay of less than 100 microseconds, preferably less than 5 microseconds, after the time of the new event (e_i). [Claim 17] The data processing device according to any one of claims 10 to 16, configured to generate the prediction (V_i) by passing the event subgraph through a graph neural network (212). [Claim 18] The data processing device according to claim 17, wherein the graph neural network (212) includes multiple layers, and passing the event subgraph through the graph neural network (212) includes: - calculating, by the graph neural network (212), using the embeddings of all events connected by the unidirectional edges, an embedding vector (Z_{j,l}) of the first node and of the further nodes of the subgraph, the embedding vector (Z_j) including a sub-vector (Z_{j,1} to Z_{j,5}) of each of the layers; and - transforming, by the graph neural network (212), the embedding vector (Z_{j,l}) of each node or, in the case of the first layer of the graph neural network, the input event data, to generate a resulting embedding vector (Z_{i,l}). [Claim 19] A control system comprising: - the data processing device according to any one of claims 10 to 18; - an event-based sensor (102) coupled to the data processing device and configured to generate the new event (e_i) and to transmit the new event to the data processing device; and - the one or more actuators (214).