High-efficiency streaming processing devices, architectures, methods, and processors for spatiotemporally ordered sparse data streams
By employing a collaborative design of multi-layered waiting queues and multi-row shifted hash tables, the problem of input-output order mismatch in sparse data stream processing is solved, achieving low-latency, high-throughput streaming processing, which is suitable for scenarios such as bio-inspired sensors, LiDAR point cloud processing, and network traffic analysis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TSINGHUA UNIVERSITY
- Filing Date
- 2026-02-28
- Publication Date
- 2026-06-30
Figure CN122308916A_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of sparse data stream processing, and more particularly to an efficient streaming processing apparatus, architecture, method, and processor for spatiotemporally ordered sparse data streams.
Background Technology
[0002] With the rapid development of artificial intelligence, the Internet of Things, and robotics, the amount of data generated by various sensors is exploding. Among these, sparse data streams, as an important data form, have broad application prospects in many fields such as bio-inspired sensors, LiDAR, network traffic monitoring, and financial transaction analysis. Sparse data streams refer to data that is unevenly distributed in time or space, with no effective data in most time or spatial locations, and only a small number of locations generating data. Bio-inspired sensors, represented by neuromorphic cameras, are a typical application scenario for sparse data streams. These sensors can capture changes in light intensity in a scene with a time resolution on the order of microseconds, outputting an asynchronous data stream containing changes in light intensity, spatial location, timestamp, spatial gradient value, and temporal gradient value. Compared to traditional frame-based cameras, they have significant advantages such as high dynamic range, low latency, and low power consumption.
[0003] Some advanced sensors have begun to output sparse data streams with spatiotemporal ordering characteristics, i.e., the spatiotemporal differential signals within a certain time interval are sorted and arranged according to their two-dimensional spatial addresses. This spatiotemporal ordering makes efficient streaming processing of sparse data streams possible. However, existing technologies have not fully exploited this characteristic to design dedicated processing architectures, so existing processing technologies for spatiotemporally ordered sparse data streams suffer from technical problems such as input-output order mismatch and high memory access latency.
Summary of the Invention
[0004] In view of this, this disclosure proposes an efficient streaming processing device, architecture, method and processor for spatiotemporally ordered sparse data streams, which can achieve efficient decoupling from the input stream order to the output stream order through window address decoupling, while achieving low latency and high throughput streaming processing capabilities.
[0005] According to one aspect of this disclosure, an efficient streaming processing apparatus for spatiotemporally ordered sparse data streams is provided, comprising: a sparse data scheduler and an operator executor; the operator executor is used to perform operations of a specified sliding window operator on window data; the window data consists of multiple pixel values of any frame of sparse data in the sparse data stream within any sliding window; the sparse data scheduler has a multi-level waiting queue and a multi-row shift hash table; wherein, the multi-level waiting queue is used to store and maintain in an orderly manner the window addresses triggered by input data, the input data includes any valid data in any frame of sparse data in the sparse data stream, and the window address includes the window addresses of multiple sliding windows covering the pixel values of the valid data; the multi-row shift hash table is used to store the pixel values of the valid data in the sparse data and supports parallel reading of window data in a single clock cycle; the sparse data scheduler is used to determine the window data to be processed based on the window addresses stored in the multi-level waiting queues and the pixel values of the valid data stored in the multi-row shift hash table and input them to the operator executor to realize streaming processing of the sparse data stream.
[0006] In one possible implementation, the sparse data scheduler has the following processing states: an update state, a comparison state, a read state, and a write state. Determining the window data to be processed based on the window addresses stored in the multi-level waiting queue and the pixel values of the valid data stored in the multi-row shift hash table, and inputting the window data to the operator executor, includes the following. The sparse data scheduler is initially in the update state. The processing in the update state includes: upon receiving any valid data in any frame of sparse data in the sparse data stream, parsing the pixel position and pixel value of the valid data in that frame of sparse data; determining the window addresses of the multiple sliding windows covering the pixel value of the valid data based on the pixel position of the valid data and the window size of the sliding window operator; allocating the window addresses of the multiple sliding windows to the multi-level waiting queue according to the row dimension; and recording the minimum value among the window addresses of the multiple sliding windows. After completing the processing in the update state, the sparse data scheduler enters the comparison state. The processing in the comparison state includes: determining the global minimum value of the window addresses in the multi-level waiting queue and comparing it with the recorded minimum value. If the global minimum value is less than the recorded minimum value, the sparse data scheduler enters the read state. The processing in the read state includes: determining the multiple target pixel positions involved in the sliding window corresponding to the global minimum value, accessing the multi-row shift hash table based on the multiple target pixel positions to obtain the window data to be processed and inputting the window data to the operator executor, removing the global minimum value from the multi-level waiting queue, and returning to the comparison state. If the global minimum value is greater than or equal to the recorded minimum value, the sparse data scheduler enters the write state. The processing in the write state includes: writing the pixel value of the currently received valid data to the multi-row shift hash table based on the pixel position of the valid data, and returning to the update state.
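The four-state flow described in paragraph [0006] can be illustrated with a small software simulation. This is a minimal sketch under stated assumptions, not the disclosed hardware: the name `run_scheduler`, the `(row, col, value)` event format, the use of `(row, col)` tuples of window centers as window addresses, the plain dictionary standing in for the multi-row shift hash table, and the de-duplication rules for the queues are all hypothetical. Frame borders are not handled, and windows still queued when the input ends are not drained.

```python
from collections import deque
from enum import Enum, auto

class State(Enum):
    UPDATE = auto()
    COMPARE = auto()
    READ = auto()
    WRITE = auto()

def run_scheduler(events, operator, n=3):
    """Simulate the four states on (row, col, value) events given in row-major order."""
    half = n // 2
    queues = [deque() for _ in range(n)]   # one waiting queue per window row
    table = {}                             # dict stand-in for the shift hash table
    results = []
    pending = iter(events)
    state, current, recorded_min = State.UPDATE, None, None
    while True:
        if state is State.UPDATE:
            current = next(pending, None)
            if current is None:
                return results             # input exhausted; leftovers are not drained
            row, col, _ = current
            for level in range(n):
                for c in range(col - half, col + half + 1):
                    addr = (row - half + level, c)
                    # Assumed de-duplication within a level (keeps it ascending).
                    if not queues[level] or addr > queues[level][-1]:
                        queues[level].append(addr)
            recorded_min = (row - half, col - half)
            state = State.COMPARE
        elif state is State.COMPARE:
            global_min = min(q[0] for q in queues if q)
            state = State.READ if global_min < recorded_min else State.WRITE
        elif state is State.READ:
            r0, c0 = global_min
            window = [[table.get((r, c), 0) for c in range(c0 - half, c0 + half + 1)]
                      for r in range(r0 - half, r0 + half + 1)]
            results.append((global_min, operator(window)))
            for q in queues:               # assumed: pop the address from every level holding it
                if q and q[0] == global_min:
                    q.popleft()
            state = State.COMPARE
        else:                              # State.WRITE
            row, col, value = current
            table[(row, col)] = value
            state = State.UPDATE

window_sum = lambda w: sum(map(sum, w))    # example sliding window operator
out = run_scheduler([(3, 3, 7), (3, 4, 2), (6, 3, 1)], window_sum)
print([addr for addr, _ in out])
```

In this sketch a window is read only once its address falls below the minimum address triggered by the newest input, which is the point at which no later input in an ordered stream can still contribute to it. The emitted window addresses therefore come out in ascending order.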
[0007] In one possible implementation, the window address is a pixel position at a specified location within the sliding window, the specified location including the window center of the sliding window; the number of layers in the multi-level waiting queue and the number of rows in the multi-row shift hash table are equal to the maximum window size of the sliding window operator; each layer of the multi-level waiting queue is used to store the window addresses of sliding windows in the corresponding row of the multiple sliding windows, the window addresses stored in each layer of the waiting queue are arranged in ascending order, and the first window address stored in each layer of the waiting queue is the minimum value among the window addresses cached in that layer of the waiting queue.
[0008] In one possible implementation, determining the global minimum value of the window addresses in the multi-level waiting queue includes: reading the first (head) window address in each level of the multi-level waiting queue, comparing the head window addresses of all levels to obtain the minimum value among them, and taking that minimum value as the global minimum value.
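Because each level is kept in ascending order, only the head of each level needs to be compared. A minimal sketch follows, assuming each level is modelled as a `collections.deque` of `(row, col)` window addresses; the function name `global_minimum` is hypothetical.

```python
from collections import deque

def global_minimum(queues):
    """Compare only the head of each non-empty level; return the smallest, or None."""
    heads = [q[0] for q in queues if q]
    return min(heads) if heads else None

# Three levels, each already in ascending order; the last one is empty.
levels = [deque([(2, 3), (2, 4)]), deque([(3, 2), (3, 3)]), deque()]
print(global_minimum(levels))
```

The comparison touches at most one address per level, so its cost depends on the number of levels rather than on how many addresses are waiting.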
[0009] In one possible implementation, the step of accessing the multi-row shift hash table based on the multiple target pixel positions to obtain the window data to be processed includes: obtaining the storage location corresponding to each target pixel position by hash mapping the row number and column number corresponding to each target pixel position; the multi-row shift hash table outputs multiple pixel values corresponding to the multiple target pixel positions in parallel within a unit clock cycle based on the storage location corresponding to each target pixel position; wherein, for target pixel positions without valid data, the output pixel value is zero.
[0010] In one possible implementation, writing the pixel value of the valid data to the multi-row shift hash table based on the pixel position of the currently received valid data includes: obtaining the storage position of the valid data in the multi-row shift hash table by hash mapping the row number and column number corresponding to the pixel position of the valid data; writing the pixel value of the valid data to the multi-row shift hash table based on the storage position of the valid data in the multi-row shift hash table; wherein, if the storage position of the valid data in the multi-row shift hash table already contains data, the pixel value of the valid data is used to overwrite the data already stored at that storage position.
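The read and write behaviour of paragraphs [0009] and [0010] can be sketched in software as follows. The hash mapping used here (row number modulo the number of table rows, column number modulo the number of slots per row) and the stored `(row, col, value)` tag used to detect positions without valid data are assumptions made for illustration; the passage does not specify them. The class name `ShiftHashTable` is likewise hypothetical, and the Python loops only model the result of what the text describes as a parallel read within one clock cycle.

```python
class ShiftHashTable:
    """Software model of a multi-row table addressed by hashed (row, col)."""

    def __init__(self, n_rows=3, n_slots=8):
        self.n_rows, self.n_slots = n_rows, n_slots
        self.slots = [[None] * n_slots for _ in range(n_rows)]

    def _locate(self, row, col):
        # Assumed hash mapping: modulo on the row and column numbers.
        return row % self.n_rows, col % self.n_slots

    def write(self, row, col, value):
        r, s = self._locate(row, col)
        self.slots[r][s] = (row, col, value)   # overwrites data already stored there

    def read_window(self, center_row, center_col, n=3):
        half = n // 2
        window = []
        for row in range(center_row - half, center_row + half + 1):
            line = []
            for col in range(center_col - half, center_col + half + 1):
                r, s = self._locate(row, col)
                entry = self.slots[r][s]
                hit = entry is not None and entry[:2] == (row, col)
                line.append(entry[2] if hit else 0)   # zero where no valid data
            window.append(line)
        return window

table = ShiftHashTable()
table.write(3, 3, 7)
print(table.read_window(2, 2))
table.write(3, 3, 9)                # same storage position: the old value is overwritten
print(table.read_window(2, 2))
```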
[0011] In one possible implementation, the processing in the write state further includes: determining the row number difference between the row number corresponding to the pixel position of the currently received valid data and the smallest row number corresponding to the pixel value already stored in the multi-row shift hash table; if the row number difference exceeds the total number of rows in the multi-row shift hash table, the multi-row shift hash table performs a shift operation corresponding to the row number difference, wherein each shift operation is used to remove the oldest pixel value stored in the multi-row shift hash table and update the smallest row number corresponding to the pixel value already stored in the multi-row shift hash table; after completing at least one shift operation corresponding to the row number difference, the pixel value of the valid data is written to the multi-row shift hash table based on the pixel position of the currently received valid data.
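The shift behaviour in paragraph [0011] can be sketched as below. The exact threshold is an interpretation: this sketch shifts while the incoming row would not fit within the N rows currently retained, which is one reading of the row number difference "exceeding" the number of table rows. The class name `RowShifter`, the per-row dictionaries, and the assumption that rows arrive in non-decreasing order are all hypothetical.

```python
from collections import deque

class RowShifter:
    """Keeps at most n_rows consecutive rows of pixel values; rows[0] is the oldest."""

    def __init__(self, n_rows=3):
        self.n_rows = n_rows
        self.min_row = 0                                # smallest row number stored
        self.rows = deque({} for _ in range(n_rows))    # one {col: value} map per row

    def write(self, row, col, value):
        # Assumed threshold: shift while the incoming row does not fit in the table.
        while row - self.min_row >= self.n_rows:
            self.rows.popleft()      # remove the oldest row of pixel values
            self.rows.append({})     # free a row for newer data
            self.min_row += 1        # update the smallest stored row number
        self.rows[row - self.min_row][col] = value

s = RowShifter(3)
s.write(0, 1, 5)
s.write(2, 4, 6)
s.write(4, 0, 7)                     # triggers two shifts before the write
print(s.min_row, list(s.rows))
```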
[0012] In one possible implementation, the sparse data scheduler uses a finite state machine to manage transitions among the update state, the comparison state, the read state, and the write state; the multi-row shift hash table is implemented using shift registers, with each row in the multi-row shift hash table corresponding to a set of shift registers.
[0013] According to another aspect of this disclosure, an efficient streaming processing architecture for spatiotemporally ordered sparse data streams is provided, comprising: multiple cascaded processing devices, wherein the processing devices are the efficient streaming processing devices; wherein the result of the sliding window operator generated by any level of processing device is used as valid data input to the next level of processing device.
[0014] According to another aspect of this disclosure, an efficient streaming processing method for spatiotemporally ordered sparse data streams is provided, applied to a sparse data scheduler. The sparse data scheduler includes a multi-level waiting queue and a multi-row shift hash table. The multi-level waiting queue stores and maintains window addresses triggered by input data in an orderly manner. The input data includes any valid data in any frame of sparse data in the sparse data stream. The window address includes the window addresses of multiple sliding windows covering the pixel values of the valid data. The multi-row shift hash table stores the pixel values of the valid data in the sparse data and supports parallel reading of window data in a single clock cycle. The window data consists of multiple pixel values of any frame of sparse data in the sparse data stream within any sliding window. The processing method includes: the sparse data scheduler determines the window data to be processed based on the window addresses stored in the multi-level waiting queues and the pixel values of the valid data stored in the multi-row shift hash table, and inputs this data to an operator executor to achieve streaming processing of the sparse data stream. The operator executor performs operations on the window data using a specified sliding window operator.
[0015] According to another aspect of this disclosure, a processor is provided, comprising: the processing apparatus described herein, or comprising the high-efficiency streaming processing architecture described herein, or performing the processing method described herein.
[0016] According to various aspects of this disclosure, a sparse data scheduler, in conjunction with an operator executor, enables efficient sliding window operation processing of sparse data streams. In particular, by introducing a window address and a multi-layered waiting queue design, the window address serves as an intermediary, allowing the sparse data scheduler to determine the window data to be processed based on the pixel values stored in a multi-row shifted hash table. This decouples the spatiotemporal order of the input data from the processing order of the sliding window, successfully solving the problem of input-output order mismatch in sparse data stream processing. This allows traditional sliding window operators to be directly applied to spatiotemporally ordered sparse data streams without algorithm reconstruction, greatly improving the versatility and reusability of the device. Furthermore, the multi-row shifted hash table design enables efficient storage of pixel values and parallel access to window data within a single clock cycle. Compared to traditional non-contiguous memory access methods, the access latency for window data is reduced by more than an order of magnitude, thereby improving the overall throughput of sparse data streams and reducing data access latency, achieving a highly efficient pipelined processing flow for sliding window operations on sparse data streams.
[0017] Other features and aspects of this disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Attached Figure Description
[0018] The accompanying drawings, which are included in and form part of this specification, illustrate exemplary embodiments, features, and aspects of this disclosure together with the specification and serve to explain the principles of this disclosure.
[0019] Figure 1 shows a block diagram of an efficient streaming processing apparatus for spatiotemporally ordered sparse data streams according to an embodiment of the present disclosure.
[0020] Figure 2 shows a schematic diagram of adding addresses in the update logic of an N-layer waiting queue according to an embodiment of the present disclosure.
[0021] Figure 3 shows a schematic diagram of outputting addresses in the update logic of an N-layer waiting queue according to an embodiment of the present disclosure.
[0022] Figure 4 shows a schematic diagram of the update state of a sparse data scheduler according to an embodiment of the present disclosure.
[0023] Figure 5 shows a schematic diagram of the comparison state of a sparse data scheduler according to an embodiment of the present disclosure.
[0024] Figure 6 shows a schematic diagram of the read state of a sparse data scheduler according to an embodiment of the present disclosure.
[0025] Figure 7 shows a schematic diagram of the write state of a sparse data scheduler according to an embodiment of the present disclosure.
[0026] Figure 8 shows a schematic diagram of an efficient streaming processing architecture for spatiotemporally ordered sparse data streams according to an embodiment of the present disclosure.
Detailed Implementation
[0027] Various exemplary embodiments, features, and aspects of this disclosure will now be described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote elements that have the same or similar functions. Although various aspects of the embodiments are shown in the drawings, they are not necessarily drawn to scale unless specifically indicated otherwise.
[0028] As used herein, the terms “comprising,” “including,” “having,” or variations thereof are open-ended and include one or more of the stated features, integers, elements, steps, components, or functions, but do not exclude the presence or addition of one or more other features, integers, elements, steps, components, functions, or groups thereof.
[0029] When an element is referred to as “connected,” “coupled,” “responding,” or a variation thereof relative to another element, it may be directly connected, coupled, or responding to another element, or there may be an intermediate element present.
[0030] Although the terms first, second, third, etc., may be used herein to describe various elements / operations, these elements / operations should not be limited by these terms. These terms are only used to distinguish one element / operation from another. Therefore, without departing from the teachings of the inventive concept, a first element / operation in some embodiments may be referred to as a second element / operation in other embodiments.
[0031] The term “exemplary” as used herein means “serving as an example, embodiment, or illustration.” Any embodiment illustrated herein as “exemplary” is not necessarily to be construed as superior to or better than other embodiments.
[0032] Furthermore, to better illustrate this disclosure, numerous specific details are set forth in the following detailed description. Those skilled in the art will understand that this disclosure can be practiced without certain specific details. In some instances, methods, means, components, and circuits well known to those skilled in the art have not been described in detail in order to highlight the main points of this disclosure.
[0033] It should be noted that the information (including but not limited to user device information, user personal information, etc.), data (including but not limited to data used for analysis, data stored, data displayed, etc.) and signals involved in this application are all authorized by the user or fully authorized by all parties, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant regions.
[0034] As mentioned above, existing processing techniques for sparse data streams with spatiotemporal order characteristics suffer from problems such as input-output order mismatch and high memory access latency. Specifically, existing technologies mainly employ two mainstream approaches for processing sparse data streams with spatiotemporal order characteristics. The first approach is to convert the sparse data stream into a dense frame format representation and then apply traditional image-based visual algorithms for processing. This method can leverage mature streaming hardware architectures such as line buffering to achieve efficient data reuse and low-latency processing. However, converting sparse data into dense frames completely destroys the sparsity of the data, leading to a large amount of invalid data participating in the computation, increasing computational complexity and storage overhead. In other words, a large number of zero values with no data locations also need to participate in computation and storage, resulting in a serious waste of computing resources and a significant increase in memory consumption, which contradicts the limited computing and storage resources of edge computing devices. The second approach involves designing entirely new processing algorithms for time-sequential sparse data streams, directly processing asynchronous data in a streaming manner. While this method preserves data sparsity, the lack of spatial correlation between time-sequential sparse data streams makes it difficult to apply traditional sliding window-based operators. Furthermore, it requires designing entirely new algorithmic mechanisms, which may produce inaccurate calculation results under certain conditions. 
Although the approach of directly processing time-sequential data streams maintains sparsity, the order in which data packets arrive is inconsistent with the processing order required by the sliding window operator, resulting in a mismatch between the input and output streams. This prevents the adoption of an efficient pipelined processing architecture, severely limiting system throughput and restricting the scalability and versatility of existing technologies.
[0035] Furthermore, existing technologies face significant challenges in sparse data storage and access. Due to the discontinuous spatial address distribution of sparse data, traditional continuous memory access patterns are unsuitable. Address matching operations are necessary to locate and access the required data, resulting in significant memory access latency overhead. This becomes a key bottleneck restricting the overall performance of sparse data stream processing architectures and makes it difficult to adapt to different types of sparse data streams and different application scenarios.
[0036] In view of this, this disclosure addresses the technical problems of input-output order mismatch and high memory access latency in existing spatiotemporally ordered sparse data stream processing technologies, and proposes a high-efficiency streaming processing device for spatiotemporally ordered sparse data streams. The core innovation of this device lies in the design of a mechanism that integrates a multi-layered waiting first-in-first-out queue with a multi-row shifted hash table. This achieves efficient conversion from input stream order to output stream order through window address decoupling, while utilizing a shifted hash table structure based on shift registers to enable parallel access within a single clock cycle. This achieves low-latency, high-throughput streaming processing capabilities while maintaining data sparsity. This device is not only suitable for visual sparse data generated by bio-inspired sensors such as neuromorphic cameras, but can also be extended to various processing scenarios involving spatiotemporally ordered sparse data streams requiring sliding window operations, such as LiDAR point cloud processing, network traffic analysis, and high-frequency financial data processing.
[0037] Specifically, this disclosure proposes a core technical approach for an efficient streaming processing device for spatiotemporally ordered sparse data streams. This approach involves introducing a window address as an intermediate medium to decouple the spatiotemporal order of the input data from the processing order of the sliding window, thereby resolving the problem of input-output stream order mismatch. The technical principle lies in fully utilizing two inherent characteristics of the sliding window operator: first, the window size is fixed for a given sliding window algorithm; second, when the window slides in an ordered manner, all windows scanning the same pixel remain spatially ordered. Based on these two characteristics, this disclosure also designs a multi-layered waiting queue structure. Each waiting queue is responsible for maintaining the window address order along a row dimension. By comparing the first addresses of each queue, the global minimum window address is determined, thereby achieving ordered window output.
[0038] In terms of architecture, the processing device proposed in this disclosure adopts a modular design, mainly implemented by a sparse data scheduler and an operator executor. The sparse data scheduler consists of two core modules: a multi-layered waiting first-in-first-out queue and a multi-row shift hash table. The sparse data scheduler is implemented using a finite state machine, including four processing states: update, compare, read, and write, coordinating the working timing of each module. Key technical aspects include: mapping and conversion of input data to window addresses, orderly maintenance of window addresses in multi-layered queues, window output determination based on minimum address comparison, and single-cycle parallel memory access based on shift hash tables. The device also supports the cascading deployment of multi-level algorithms, where the output of each layer can be used as the input of the next layer, forming a complete algorithm processing pipeline.
[0039] The efficient streaming processing apparatus for spatiotemporally ordered sparse data streams proposed in the embodiments of this disclosure is described in detail below with reference to Figures 1 to 7.
[0040] Figure 1 shows a block diagram of an efficient streaming processing apparatus for spatiotemporally ordered sparse data streams according to an embodiment of the present disclosure. As shown in Figure 1, the apparatus includes:
[0041] A sparse data scheduler 11 and an operator executor 12. The operator executor 12 is used to perform the operations of a specified sliding window operator on window data. The window data consists of multiple pixel values of any frame of sparse data in the sparse data stream within any sliding window. The sparse data scheduler 11 has a multi-level waiting queue 111 and a multi-row shift hash table 112.
[0042] The multi-level waiting queue 111 is used to store and maintain the window address triggered by the input data in an orderly manner. The input data includes any valid data in any frame of sparse data in the sparse data stream. The window address includes the window address of multiple sliding windows covering the pixel values of the valid data.
[0043] The multi-row shift hash table 112 is used to store the pixel values of valid data in sparse data and supports parallel reading of window data in a single clock cycle;
[0044] The sparse data scheduler 11 is used to determine the window data to be processed and input it to the operator executor 12 based on the window address stored in the multi-level waiting queue 111 and the pixel value of the valid data stored in the multi-row shift hash table 112, so as to realize streaming processing of sparse data streams.
[0045] According to the apparatus of this disclosure, the sparse data scheduler, in conjunction with the operator executor, can efficiently perform sliding window operator operations on sparse data streams. In particular, by introducing a window address and a multi-layered waiting queue design, the window address can be used as an intermediate medium, allowing the sparse data scheduler to determine the window data to be processed based on the pixel values stored in a multi-row shifted hash table. This decouples the spatiotemporal order of the input data from the processing order of the sliding window, successfully solving the problem of input-output order mismatch in sparse data stream processing. This allows traditional sliding window operators to be directly applied to spatiotemporally ordered sparse data streams without algorithm reconstruction, greatly improving the versatility and reusability of the apparatus. Furthermore, the multi-row shifted hash table design enables efficient storage of pixel values and parallel access to window data within a single clock cycle. Compared to traditional non-contiguous memory access methods, the access latency for window data is reduced by more than an order of magnitude, thereby improving the overall throughput of sparse data streams and reducing data access latency, achieving a highly efficient pipelined processing flow for sliding window operations on sparse data streams.
[0046] In practical applications, the sparse data scheduler 11 and the operator executor 12 can each be implemented using dedicated hardware circuits, software algorithms, or a combination of hardware and software, as long as they can achieve their respective required functions. This disclosure does not impose any restrictions on this.
[0047] The sliding window operator can be any operator that performs operations based on a sliding window, such as linear operators like convolution operators, nonlinear operators like taking the maximum value in the window, or counting the number of zero values in the window. This disclosure does not limit the scope of the operators used. Furthermore, those skilled in the art can determine the type of sliding window operator executed by the operator executor and the corresponding window size based on actual needs; this disclosure also does not limit the scope of the operators used.
[0048] It should be understood that the sliding window of the sliding window operator slides in an ordered manner (e.g., from left to right and from top to bottom, with a step size of 1). Accordingly, the pixel positions of a frame of sparse data increase along the scanning order of the sliding window; in other words, the scanning order of the sliding window follows the direction of increasing pixel positions in the sparse data. For example, assuming the scanning order is from left to right and from top to bottom, the pixel positions in the frame of sparse data shown in Figure 2 increase from left to right and from top to bottom. Specifically, the pixel position of "A,1" in the first row and first column is less than that of "A,2" in the first row and second column, which in turn is less than that of "B,1" in the second row and first column, and so on. This ordering makes it possible to store window addresses in the multi-level waiting queue in an orderly manner and helps to quickly determine the global minimum address when reading window data.
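One way to make this ordering concrete is to map each pixel position to a single row-major integer, so that "less than" becomes an ordinary integer comparison. The sketch below assumes zero-based row and column indices and a hypothetical frame width `WIDTH`; neither the function name nor the width comes from the disclosure.

```python
def linear_address(row: int, col: int, width: int) -> int:
    """Row-major pixel position: grows left to right, then top to bottom."""
    return row * width + col

WIDTH = 6  # assumed frame width, for illustration only
a1 = linear_address(0, 0, WIDTH)      # "A,1": first row, first column
a2 = linear_address(0, 1, WIDTH)      # "A,2": first row, second column
next_row = linear_address(1, 0, WIDTH)  # first column of the second row
print(a1 < a2 < next_row)
```

Comparing `(row, col)` tuples lexicographically gives the same ordering without needing the frame width.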
[0049] The sparse data stream consists of multiple frames of sparse data in the time dimension, and each frame of sparse data includes valid data in the spatial dimension. Any input data format can include the pixel position (i.e., spatial position, or pixel coordinates, spatial coordinates, etc.) of the valid data in its respective sparse data, pixel value (such as the light intensity change value collected by the neuromorphic camera, the event value collected by the event camera, etc.), and timestamp (i.e., the time when the valid data was collected).
[0050] It should be understood that a frame of sparse data may contain pixel values at only some pixel locations (i.e., only a portion of the locations hold valid data). Since the sliding window has a certain window size, multiple sliding windows can cover a single pixel value. For example, as shown in Figure 2, assuming the sliding window operator has a window size of 3×3 and a step size of 1, there are 9 sliding windows covering the pixel value "e". That is, for an operator with a sliding window size of N×N, each pixel value triggers N² sliding windows.
[0051] In some embodiments, the window address of each sliding window can be a pixel position at a specified location within the sliding window. This specified location includes the center of the sliding window, or it can be any other position such as the top-left corner or bottom-right corner of the sliding window; this disclosure does not impose any limitation on this. For example, assuming the window address is defined as the pixel position of the window center, then, as shown in Figure 2, the window address of the first sliding window in the first row of the nine sliding windows covering the pixel value "e" can be the pixel position (i.e., pixel coordinate) corresponding to "C,3" in the third row and third column of the entire frame of sparse data. Similarly, the window address of the third sliding window in the third row of the nine sliding windows is the pixel position corresponding to "E,5" in the fifth row and fifth column of the entire frame of sparse data. Assuming that the pixel coordinate of the pixel value "e" is (x,y), the window addresses (window centers) of the nine sliding windows are: (x-1,y-1), ..., (x+1,y+1).
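The mapping from one valid pixel to its window addresses can be sketched as follows, assuming the window address is the window center, an odd window size, a step size of 1, and zero-based `(row, col)` coordinates. The function name `covering_window_centers` is hypothetical, and frame borders are ignored.

```python
def covering_window_centers(row: int, col: int, n: int = 3):
    """Centers of the n*n stride-1 windows that cover pixel (row, col), in row-major order."""
    half = n // 2
    return [(row + dr, col + dc)
            for dr in range(-half, half + 1)
            for dc in range(-half, half + 1)]

# With zero-based indices, a pixel at (3, 3) corresponds to the label "D,4".
centers = covering_window_centers(3, 3)
print(len(centers), centers[0], centers[-1])
```

Under this indexing the first center, (2, 2), corresponds to "C,3" and the last, (4, 4), to "E,5", matching the two window addresses named in the example.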
[0052] In some embodiments, the multi-level waiting queue 111 can adopt a first-in-first-out (FIFO) queue structure to maintain window addresses in an orderly manner. The multi-row shift hash table 112 can be implemented using shift registers, with each row in the multi-row shift hash table 112 corresponding to a set of shift registers (i.e., a set of storage units). This storage architecture, based on a multi-row shift hash table built from shift registers, is well suited to efficient storage and fast memory access for sparse data. The number of levels in the multi-level waiting queue 111 and the number of rows in the multi-row shift hash table 112 can be equal to the maximum window size of the sliding window operator. For example, if the maximum window size of the sliding window operator is N, then an N-level waiting queue 111 and an N-row shift hash table 112 can be used, so that window addresses are distributed by row into the levels of the waiting queue for orderly storage, and the multi-row shift hash table supports simultaneous access to N² pixel values within a single clock cycle, ensuring that all pixel values involved in the currently processed sliding window are in the hash table.
[0053] In some embodiments, each level of the multi-level waiting queue 111 can be used to store the window addresses of the sliding windows in the same row of the multiple sliding windows. The window addresses cached in each level of the waiting queue are arranged in ascending order, and the first window address cached in each level is the minimum value among the window addresses cached in that level. For example, as shown in Figure 2, assuming the sliding window operator has a window size of 3×3, a three-level waiting queue is used. The window addresses of the three sliding windows in the first row of the nine sliding windows covering the pixel value "e" (i.e., the pixel positions corresponding to "C,3", "C,4", and "C,5") can be stored in the first waiting queue of the three-level waiting queue, arranged in the order of "C,3", "C,4", and "C,5". If there are no other pixel positions before "C,3" in the first waiting queue, then the pixel position corresponding to "C,3" becomes the first window address in the first waiting queue, and so on. The window addresses of the three sliding windows in the second row (i.e., the pixel positions corresponding to "D,3", "D,4" and "D,5") can be stored in the second waiting queue of the three-level waiting queue and arranged in the order of "D,3", "D,4" and "D,5". The window addresses of the three sliding windows in the third row (i.e., the pixel positions corresponding to "E,3", "E,4" and "E,5") can be stored in the third waiting queue of the three-level waiting queue and arranged in the order of "E,3", "E,4" and "E,5".
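The row-wise distribution described above can be sketched as follows. This is a minimal software model, assuming (row, column) tuples compared in row-major order and assuming that an address already present at the tail of a queue is not queued again; the name `enqueue_windows` and the duplicate-skipping rule are illustrative assumptions, not features stated in this disclosure.

```python
from collections import deque


def enqueue_windows(queues, row, col, n):
    """Distribute the n*n window addresses of pixel (row, col) into n FIFO queues by window row."""
    half = n // 2
    for level, dr in enumerate(range(-half, half + 1)):
        queue = queues[level]
        for dc in range(-half, half + 1):
            address = (row + dr, col + dc)
            # Assumption: only append addresses larger than the current tail,
            # which keeps each queue strictly ascending for raster-ordered input.
            if not queue or address > queue[-1]:
                queue.append(address)


queues = [deque() for _ in range(3)]
enqueue_windows(queues, 4, 4, 3)  # pixel "e", assumed at row 4, column 4
enqueue_windows(queues, 4, 5, 3)  # a later pixel in the same row
print([list(q) for q in queues])
```

With raster-ordered input, the head of each queue is then always the smallest address held by that queue, which is the property the comparison logic relies on.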
[0054] In some embodiments, the multi-level waiting queue 111 can update the window addresses it stores based on the minimum value among the window addresses of the multiple sliding windows corresponding to a newly arrived pixel value. For example, as shown in Figure 3, assuming "e" is the newly arrived pixel value, its top-left window address can be found (that is, the minimum value among the window addresses of all sliding windows covering the pixel value "e", i.e., the pixel position corresponding to "C,3"). The window addresses of all waiting windows whose addresses are less than this top-left window address are then output and removed from the multi-level waiting queue 111. These waiting windows can be understood as windows whose pixel values are already determined. This fully utilizes the waiting queue to store window addresses in an ordered manner.
[0055] In some embodiments, the depth of each waiting queue in the multi-level waiting queue 111 (i.e., the number of window addresses it can store) can depend on the image width of a frame of sparse data. Considering the sparsity of sparse data, the depth parameter of each waiting queue can be obtained by configuring a scaling factor and multiplying the scaling factor by the image width. The multi-level waiting queue can be constructed based on this depth parameter, thus making full use of the storage space of each waiting queue.
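As a simple worked example of the depth calculation, the following sketch multiplies a scaling factor by the image width. The function name `queue_depth`, the rounding up to a whole number of entries, and the example values (a 640-pixel-wide frame and a factor of 0.25) are assumptions chosen only for illustration.

```python
import math


def queue_depth(image_width, scaling_factor):
    """Depth of one waiting queue: scaling factor times image width, rounded up."""
    return max(1, math.ceil(image_width * scaling_factor))


depth = queue_depth(640, 0.25)
print(depth)
```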
[0056] In some embodiments, the multi-row shift hash table 112 can employ a preset hash mapping function. This function can use a row-column separation mapping strategy to hash the row and column numbers corresponding to the pixel positions of the valid data to be stored. For example, the row number can be mapped to a row number in the shift hash table, and the column number can be modulo-mapped to a hash index to determine the specific storage location. Then, based on the mapped hash table rows and hash indexes, the pixel positions of the valid data are stored in the multi-row shift hash table 112. This method effectively stores the pixel values of valid data, reducing storage overhead.
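One possible form of the row-column separation mapping is sketched below. The sketch assumes that the table row is the offset of the pixel row from the smallest stored row number and that the hash index is the column number modulo the number of slots per row; the names `hash_slot`, `min_row`, and `n_slots` are hypothetical.

```python
def hash_slot(row, col, min_row, n_slots):
    """Map a pixel position to (table row, hash index) using row-column separation."""
    table_row = row - min_row   # assumption: row relative to the smallest stored row
    hash_index = col % n_slots  # column number mapped by modulo
    return table_row, hash_index


slot = hash_slot(5, 70, 3, 64)
print(slot)
```

Keeping the two dimensions separate means that all pixels of one frame row land in the same table row, so one window row can be served by one set of shift registers.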
[0057] In some embodiments, the multi-row shift hash table 112 may have shift control logic (shift mechanism), that is, by maintaining a pointer to the smallest row number of the currently stored pixel values in the hash table, when a new pixel value arrives, its row number is compared with the smallest row number pointer. If the difference exceeds the number of rows in the hash table, a shift operation is triggered. The shift operation adopts a cyclic overwrite strategy, marking the oldest pixel value in the hash table as invalid, and writing the new pixel value into that row. Efficient reuse of hash table rows is achieved through pointer cyclic overwrite. Specifically, when a new pixel value arrives, the number of shift operations to be performed can be determined based on the row number difference. Each shift operation pushes out the oldest pixel value and puts in the new pixel value. This design can adapt to different sparsity levels of sparse data, ensuring timely data updates in both sparse and dense scenarios.
[0058] In some embodiments, the multi-row shift hash table 112 may have a collision handling mechanism. When multiple pixel values are mapped to the same storage location in the hash table, the latest data overwrite strategy is adopted to ensure that the hash table stores the latest pixel value. In addition, it also supports adding a zero-value fast path. That is, for pixel positions without pixel values, the multi-row shift hash table 112 can directly return a zero value, avoiding actual storage and access operations and further reducing storage overhead.
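The collision handling and zero-value behavior can be illustrated with a minimal model of a single hash table row. The class name `HashRow` and the choice to store the column number alongside the value as a tag are assumptions made for this sketch; in hardware the zero value could be returned without touching storage at all.

```python
class HashRow:
    """One row of the hash table: latest-data-overwrite on collision, zero on miss."""

    def __init__(self, n_slots):
        self.n_slots = n_slots
        self.slots = [None] * n_slots  # each slot holds (col, value) or None

    def write(self, col, value):
        # A colliding entry is simply overwritten by the newest pixel value.
        self.slots[col % self.n_slots] = (col, value)

    def read(self, col):
        entry = self.slots[col % self.n_slots]
        if entry is None or entry[0] != col:
            return 0  # no pixel value stored for this column
        return entry[1]


row = HashRow(64)
row.write(4, 7)
row.write(68, 9)  # 68 % 64 == 4, so this overwrites the entry for column 4
print(row.read(68), row.read(4), row.read(5))
```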
[0059] In some embodiments, the sparse data scheduler 11 determines the window data to be processed based on the window addresses stored in the multi-level waiting queue 111 and the pixel values of the valid data stored in the multi-row shift hash table 112. For example, the sparse data scheduler 11 determines the ready processable window by comparing the first and second window addresses in each waiting queue, and then reads the window data (i.e., multiple pixel values) corresponding to the processable window from the multi-row shift hash table based on the pixel positions within the ready processable window. The window data can then be input to the operator executor to perform the sliding window operator operation on the window data. This disclosure does not limit the scope of the embodiments.
[0060] Based on the aforementioned sparse data scheduler, the data flow of the sparse data stream can be summarized as follows: the spatiotemporally ordered sparse data stream first enters the sparse data scheduler, which generates window addresses through address mapping and writes them into the multi-level waiting queue. The addresses at the heads of the queues are compared, and the globally minimum address is output. Based on this globally minimum address, the rows of the multi-row shift hash table are accessed in parallel to obtain the window data. The window data is then processed by the operator executor to generate the output stream. For multi-level algorithms, the output result of each level serves as the input data for the next level, maintains the same spatiotemporal order characteristics, and is also stored and accessed using a shift hash table structure.
[0061] In some embodiments, to achieve efficient streaming processing of sparse data streams, this disclosure also provides a sparse data scheduler 11 implemented based on a finite state machine. The sparse data scheduler 11 has the following processing states: update state, comparison state, read state, and write state. That is, the sparse data scheduler 11 uses a finite state machine to manage the update state, comparison state, read state, and write state. Based on the above four processing states, the sparse data scheduler 11 determines the window data to be processed and inputs it to the operator executor based on the window addresses stored in the multi-level waiting queue and the pixel values of the valid data stored in the multi-row shift hash table, including:
[0062] The sparse data scheduler is initially in an update state. The processing in the update state includes: when receiving any valid data in any frame of sparse data in the sparse data stream, parsing out the pixel position and pixel value of the valid data in the frame of sparse data; based on the pixel position of the valid data and the window size of the sliding window operator, determining the window addresses of multiple sliding windows covering the pixel value of the valid data; distributing the window addresses of multiple sliding windows to multiple waiting queues according to the row dimension; and recording the minimum value among the window addresses of multiple sliding windows.
[0063] After completing the processing in the update state, the sparse data scheduler 11 enters the comparison state. The processing in the comparison state includes: determining the global minimum value of the window address in the multi-level waiting queue, and comparing the global minimum value with the recorded minimum value.
[0064] When the global minimum is less than the recorded minimum, the sparse data scheduler enters the read state. The processing in the read state includes: determining the multiple target pixel positions involved in the sliding window corresponding to the global minimum, accessing the multi-row shift hash table based on the multiple target pixel positions, obtaining the window data to be calculated and inputting the window data into the operator executor, removing the global minimum from the multi-level waiting queue, and returning to the comparison state.
[0065] When the global minimum is greater than or equal to the minimum recorded value, the sparse data scheduler enters the write state. The processing in the write state includes: based on the pixel position of the currently received valid data, writing the pixel value of the valid data to a multi-row shift hash table, and returning to the update state.
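The four processing states described above can be sketched as a small software state machine. This is a behavioral model under stated assumptions, not the hardware design: a Python dictionary stands in for the multi-row shift hash table, addresses are (row, column) tuples compared in row-major order, duplicate addresses are skipped on enqueue and popped from every queue on read, and windows still queued when the stream ends are left pending. The names `State` and `run_scheduler` are hypothetical.

```python
from collections import deque
from enum import Enum, auto


class State(Enum):
    UPDATE = auto()
    COMPARE = auto()
    READ = auto()
    WRITE = auto()


def run_scheduler(events, n, operator):
    """events: ((row, col), value) pairs in raster order; returns [(window address, result)]."""
    half = n // 2
    offsets = range(-half, half + 1)
    queues = [deque() for _ in range(n)]
    table = {}  # stand-in for the multi-row shift hash table
    outputs = []
    stream = iter(events)
    state = State.UPDATE
    position = value = recorded_min = global_min = None
    while True:
        if state is State.UPDATE:
            item = next(stream, None)
            if item is None:
                break  # end of stream; remaining windows stay queued in this sketch
            position, value = item
            row, col = position
            for level, dr in enumerate(offsets):
                for dc in offsets:
                    address = (row + dr, col + dc)
                    if not queues[level] or address > queues[level][-1]:
                        queues[level].append(address)
            recorded_min = (row - half, col - half)
            state = State.COMPARE
        elif state is State.COMPARE:
            global_min = min(q[0] for q in queues if q)
            state = State.READ if global_min < recorded_min else State.WRITE
        elif state is State.READ:
            wr, wc = global_min
            window = [table.get((wr + dr, wc + dc), 0) for dr in offsets for dc in offsets]
            outputs.append((global_min, operator(window)))
            for q in queues:
                if q and q[0] == global_min:
                    q.popleft()
            state = State.COMPARE
        else:  # State.WRITE
            table[position] = value
            state = State.UPDATE
    return outputs


events = [((4, 4), 1), ((4, 5), 2), ((10, 10), 3)]
outputs = run_scheduler(events, 3, sum)
print(outputs[0], len(outputs))
```

In this example the window at (3, 3) is emitted as soon as the second event arrives, because no later event in raster order can fall inside it, while the windows triggered by the last event remain queued until further input arrives.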
[0066] In practical applications, during the initialization phase, the depth parameters of the multi-level waiting queue can be configured to construct the multi-level waiting queue 111, and the storage units of the multi-row shift hash table 112 can be initialized. The initial state of the sparse data scheduler 11 can be set to the update state to wait for data to arrive before processing.
[0067] When new valid data arrives in any frame of sparse data in the sparse data stream, the sparse data scheduler 11 enters the update state. First, it parses the pixel position and pixel value of the valid data. Based on the pixel position of the valid data and the window size of the sliding window operator, it calculates the window addresses of all associated sliding windows triggered by the valid data. As mentioned above, for an operator with a window size of N×N, the sliding step size is 1 by default, so each valid data will trigger N² sliding windows, and the window address of each can be defined, for example, as the pixel position at the center of the sliding window. The calculated window addresses are then distributed into N waiting queues according to the row dimension. Each waiting queue stores the window addresses of the corresponding row, and the minimum window address triggered by the valid data (i.e., the minimum value among the window addresses of the multiple sliding windows triggered by the valid data) is recorded. The sparse data scheduler 11 can be configured with a dedicated register to record this minimum value. Furthermore, the pixel position and pixel value of the parsed valid data can be recorded in their respective dedicated registers for subsequent processing.
[0068] For example, suppose the sliding window operator is a convolution operator with a window size of N×N. Figure 4 shows a schematic diagram of the update state of the sparse data scheduler. As shown in Figure 4, in the update state, the event value e (i.e., the pixel value of the currently received valid data) obtained by parsing the valid data can be recorded in register 01, and the event address (i.e., the pixel position of the currently received valid data) can be recorded in register 02. The event address is then converted into window addresses (i.e., the window addresses of all sliding windows triggered by the valid data are calculated), which are allocated to N rows of waiting queues (i.e., N levels of waiting queues). At the same time, the minimum value among the window addresses of all sliding windows triggered by the valid data can be recorded in register 03.
[0069] After completing the processing in the update state, the sparse data scheduler 11 transitions to the comparison state. In the comparison state, the first window address of each level of the multi-level waiting queue is read, and the global minimum window address (i.e., the global minimum value of the window addresses in the multi-level waiting queue) is determined through comparison logic. Specifically, determining the global minimum value of the window addresses in the multi-level waiting queue includes: reading the first window address of each waiting queue from the multi-level waiting queue; comparing the first window addresses of the waiting queues to obtain the minimum value among them; and using that minimum value as the global minimum value. As mentioned above, the first window address of each waiting queue is the minimum value of the window addresses stored in that waiting queue. Therefore, by comparing the first window addresses of the waiting queues, the global minimum value can be obtained. This global minimum value is then compared with the minimum value recorded in register 03 (i.e., the minimum value among the window addresses of the multiple sliding windows covering the pixel value of the valid data) to determine which of the two is larger.
[0070] For example, Figure 5 shows a schematic diagram of the comparison state of the sparse data scheduler. As shown in Figure 5, in the comparison state, comparator 04 can compare the first window address in each waiting queue to obtain the minimum value (i.e., the global minimum value) among the first window addresses in the multiple waiting queues. Then, comparator 05 compares the global minimum value output by comparator 04 with the minimum value recorded in register 03. The address comparison logic in the sparse data scheduler (i.e., the comparison logic implemented by comparator 04 and comparator 05) can complete the minimum value comparison of N addresses in a single clock cycle.
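The two comparison steps can be modeled as follows. The pairwise reduction in `min_tree` is one assumed way a comparator such as comparator 04 might find the minimum of N addresses; the disclosure does not specify the internal structure, and the function names and the string state labels are hypothetical.

```python
def min_tree(addresses):
    """Pairwise reduction to the minimum address (assumed structure for comparator 04)."""
    level = list(addresses)  # assumes at least one address
    while len(level) > 1:
        paired = [min(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # an odd element passes through unchanged
        level = paired
    return level[0]


def next_state(first_addresses, recorded_min):
    """Return the global minimum and the state the scheduler would enter next."""
    global_min = min_tree(first_addresses)  # role of comparator 04
    ready = global_min < recorded_min       # role of comparator 05
    return global_min, ("read" if ready else "write")


print(next_state([(3, 3), (4, 3), (5, 3)], (3, 4)))
print(next_state([(3, 4), (4, 3), (5, 3)], (3, 4)))
```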
[0071] If the global minimum value is less than the minimum value recorded in register 03, it means that all pixel values within the sliding window corresponding to the global minimum value have been determined. At this time, the sparse data scheduler 11 enters the read state. Based on the global minimum value (the address of the global minimum window) and the window size, it can calculate the multiple target pixel positions involved in the sliding window corresponding to the global minimum value. Then, based on these target pixel positions, it accesses the shift hash table in parallel to obtain the multiple pixel values within the sliding window (that is, it obtains the ready-to-process window data). As mentioned above, the multi-row shift hash table 112 can perform hash mapping according to the row number and column number of the pixel position, thereby returning multiple pixel values in a single clock cycle. For pixel positions with no valid data, it returns a zero value. Specifically, accessing the multi-row shift hash table based on the multiple target pixel positions to obtain the window data to be processed can include: obtaining the storage location corresponding to each target pixel position by hash mapping the row number and column number corresponding to that target pixel position; and the multi-row shift hash table outputting, in parallel within a unit clock cycle, the multiple pixel values corresponding to the multiple target pixel positions based on the storage location corresponding to each target pixel position; wherein, for target pixel positions without valid data (i.e., no pixel value stored), the output pixel value is zero. In this way, parallel access to the pixel data of multiple target pixel positions can be achieved, reducing data access latency and improving the throughput of the device in processing sparse data.
[0072] For example, suppose the sliding window operator is a convolution operator with a window size of N×N. Figure 6 shows a schematic diagram of the read state of the sparse data scheduler. As shown in Figure 6, the global minimum value output by comparator 04 is expanded by the window address expansion module; that is, the N² target pixel positions (i.e., N² addresses) involved in the sliding window corresponding to the global minimum value are determined. Register 06 records the smallest row number of the pixel values already stored in the N-row shift hash table (i.e., the row number of the pixel with the smallest pixel position in the N-row shift hash table). The read comparison module computes the difference between the row number corresponding to each of the N² target pixel positions output by the window address expansion module and the smallest row number recorded in register 06, and uses this difference to perform hash matching on the corresponding row in the N-row shift hash table. This means finding the relative position of the row of each target pixel position with respect to the smallest row number in the N-row shift hash table, which determines the hash table row of the pixel value to be read from the row number of the target pixel position. Then, by performing a hash mapping on the column number corresponding to the target pixel position (e.g., taking the column number modulo), the column of the pixel value to be read in the hash table (i.e., the hash index) is obtained. This locates the storage location of the pixel value to be read in the N-row shift hash table, so that the N² pixel values corresponding to the N² target pixel positions can be read in parallel. For target pixel positions with no pixel value, a zero value can be directly output. These N² pixel values are the window data within the sliding window corresponding to the global minimum value mentioned above. The signal output by comparator 05 and the signal output by the read comparison module can drive the read enable module to read the pixel values from the storage locations located in the N-row shift hash table. After the complete window data is obtained, the window data (i.e., the N² pixel values) can be input to the operator executor 12 to perform the sliding window operator calculation and generate the calculation result. At the same time, the global minimum window address (that is, the global minimum value output by comparator 04) is popped from the N-level waiting queue. The sparse data scheduler 11 returns to the above comparison state to continue processing the next window, that is, to determine the global minimum value in the N-level waiting queue again and compare it with the minimum value recorded in register 03.
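The read path described above can be sketched in software as follows. The sketch assumes center addressing, a table represented as a list of rows in which `rows[k]` holds frame row `min_row + k`, and slots that store a (column, value) pair or `None`; the function name `read_window` and this data layout are illustrative assumptions. The loop is sequential here, whereas the hardware described above reads all N² values in parallel.

```python
def read_window(rows, min_row, window_address, n):
    """Expand a window address into n*n target positions and fetch their pixel values."""
    half = n // 2
    wr, wc = window_address
    values = []
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            row, col = wr + dr, wc + dc
            relative_row = row - min_row  # position relative to the smallest stored row
            value = 0                     # zero for positions with no stored pixel value
            if 0 <= relative_row < len(rows):
                slots = rows[relative_row]
                entry = slots[col % len(slots)]  # column number mapped by modulo
                if entry is not None and entry[0] == col:
                    value = entry[1]
            values.append(value)
    return values


rows = [[None] * 8 for _ in range(3)]  # 3-row table holding frame rows 3, 4, 5
rows[1][4] = (4, 7)                    # pixel value 7 stored for position (4, 4)
print(read_window(rows, 3, (3, 3), 3))
print(read_window(rows, 3, (4, 4), 3))
```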
[0073] If the global minimum value is greater than or equal to the minimum value recorded in register 03, it indicates that the sliding window corresponding to this global minimum value, and the subsequent windows, still contain undetermined pixel values. In this case, the sparse data scheduler 11 enters the write state to write the pixel value and pixel position of the currently received valid data into the multi-row shift hash table 112. The multi-row shift hash table 112 can select the corresponding hash table row according to the row number of the pixel position and perform hash mapping according to the column number to obtain the storage location of the pixel value in the hash table; if the storage location already contains data, that data is overwritten. After the write is completed, the sparse data scheduler 11 returns to the update state to wait for the arrival of the next valid data. Therefore, the above-mentioned writing of the pixel value of the valid data to the multi-row shift hash table based on the pixel position of the currently received valid data includes:
[0074] By performing a hash mapping on the row and column numbers corresponding to the pixel positions of the valid data, the storage location of the valid data in the multi-row shift hash table can be obtained;
[0075] Based on the storage location of the valid data in the multi-row shift hash table, the pixel value of the valid data is written to the multi-row shift hash table; wherein, if the storage location of the valid data in the multi-row shift hash table already contains data, the pixel value of the valid data is used to overwrite the data already stored at that storage location.
[0076] As mentioned above, a pre-defined hash mapping function can be used to perform hash mapping on the row and column numbers corresponding to the pixel positions of valid data (e.g., row numbers correspond to hash table row numbers, and column numbers are moduloed to correspond to hash indexes). Furthermore, the multi-row shift hash table has a collision handling mechanism; when multiple pixels are mapped to the same hash table position, a latest data overwrite strategy is used to ensure that the hash table stores the latest pixel value.
[0077] As described above, the multi-row shift hash table 112 can have shift control logic (shift mechanism) to achieve efficient reuse of hash table rows. The shift mechanism of the shift hash table can be implemented based on the row number of the current data (i.e., the currently received valid data). When processing data with row number y, all pixel data with row numbers less than y minus N can be released from the hash table automatically via a shift register, where N is the number of rows in the multi-row shift hash table. Therefore, when the pixel value of newly arrived valid data needs to be written to the multi-row shift hash table, it is first determined whether a shift operation is required. If so, the corresponding shift is performed, and then the storage location is calculated according to the hash mapping function and written. When reading window data, the storage location is also calculated using the hash mapping function, and multiple pixel values are read in parallel. Therefore, the processing in the write state also includes:
[0078] Determine the row number difference between the row number corresponding to the pixel position of the currently received valid data and the smallest row number corresponding to the pixel value already stored in the multi-row shift hash table;
[0079] If the difference in row numbers exceeds the total number of rows in the multi-row shift hash table, the multi-row shift hash table performs a shift operation corresponding to the difference in row numbers. Each shift operation is used to remove the oldest pixel value stored in the multi-row shift hash table and update the smallest row number corresponding to the pixel value already stored in the multi-row shift hash table.
[0080] After completing the at least one shift operation corresponding to the row number difference, the pixel value of the currently received valid data is written to the multi-row shift hash table.
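The shift-then-write sequence above can be modeled with a small class. This sketch assumes the table holds exactly N consecutive frame rows, so a shift is performed whenever the row number difference is at least N; the source text states the condition as the difference exceeding the number of rows, so the threshold may need adjusting. The class name `ShiftTable` and its fields are hypothetical, and rows are assumed to arrive in non-decreasing order within a frame.

```python
from collections import deque


class ShiftTable:
    """N-row table in which each shift drops the oldest row and opens an empty one."""

    def __init__(self, n_rows, n_slots):
        self.n_rows = n_rows
        self.n_slots = n_slots
        self.rows = deque([None] * n_slots for _ in range(n_rows))
        self.min_row = 0  # smallest frame row currently held (role of register 06)

    def write(self, row, col, value):
        # Number of shifts needed so that `row` fits into the last table row.
        shifts = max(0, row - self.min_row - (self.n_rows - 1))
        for _ in range(shifts):
            self.rows.popleft()                       # release the oldest row
            self.rows.append([None] * self.n_slots)   # open an empty newest row
        self.min_row += shifts
        self.rows[row - self.min_row][col % self.n_slots] = (col, value)
        return shifts


table = ShiftTable(3, 8)
shift_counts = [table.write(0, 1, 5), table.write(2, 1, 6), table.write(3, 2, 7)]
print(shift_counts, table.min_row)
```

After the third write, frame row 0 has been released and the table holds rows 1 to 3, which is the automatic release of historical data that the shift mechanism is intended to provide.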
[0081] For example, Figure 7 shows a schematic diagram of the write state of the sparse data scheduler. As shown in Figure 7, the write comparison module compares the event address (i.e., the pixel position of the currently received valid data) recorded in register 02 with the minimum row number (i.e., the pointer to the minimum row number of the N-row shift hash table) recorded in register 06 to determine whether the difference in row numbers exceeds the number of rows N in the hash table. If the row number of the event address is greater than the minimum row number pointer and the difference in row numbers between the two exceeds the number of rows in the hash table, the write enable module is driven to perform a shift operation. At the same time, the minimum row number of the N-row shift hash table recorded in register 06 is updated. After the shift operation is completed, the event address (i.e., the row number and column number corresponding to the pixel position) recorded in register 02 is hashed using the hash mapping function. After the storage location is obtained, the event value (i.e., the pixel value of the currently received valid data) recorded in register 01 is written into the N-row shift hash table. The inverted output signal of comparator 05 and the output signal of the write comparison module can jointly drive the write enable module to complete the shift and write operations.
[0082] As described above, the multi-row shift hash table is implemented using shift registers. Each row corresponds to a shift register group. When new data arrives, the number of shift operations required is determined by the row number difference. Each shift pushes out the oldest pixel value, inserts the new pixel value, and simultaneously updates the smallest row number corresponding to the pixel value already stored in the multi-row shift hash table recorded in the register. After completing the shift operation corresponding to the row number difference, the storage location can be calculated based on the hash mapping function and written. This approach can adapt to the sparsity of different sparse data, ensuring timely updates of data storage in both sparse and dense scenarios. It also helps decouple the spatiotemporal order of input data from the processing order of the sliding window, thereby solving the problem of input / output stream mismatch.
[0083] Based on the efficient streaming processing device proposed in the above embodiments of this disclosure, this disclosure also provides an efficient streaming processing architecture for spatiotemporally ordered sparse data streams, including: multiple cascaded processing devices, each being the efficient streaming processing device described above; wherein the result of the sliding window operator generated by the processing device at any level is used as the valid data input to the processing device at the next level. For example, assuming the sliding window operator is a convolution operator, Figure 8 shows a schematic diagram of the efficient streaming processing architecture. As shown in Figure 8, multiple processing devices (each including a sparse data scheduler and a convolution operator executor) can be cascaded to construct a multi-level algorithm processing pipeline. The output of each processing device can be used as the input of the next processing device, and the output data of each sparse data scheduler still satisfies the spatiotemporal order characteristics. It should be understood that those skilled in the art can set the number of processing devices in the processing architecture according to actual needs, and this disclosure does not limit this.
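The cascading behavior can be illustrated with a functional reference model. This is not the streaming hardware: each stage here processes a whole frame at once and only reproduces what a stage is expected to output, namely operator results in ascending window-address order. The names `reference_stage` and `cascade`, the use of `sum` as a stand-in operator, and the choice to drop zero-valued results are assumptions made for illustration.

```python
def reference_stage(events, n, operator):
    """Non-streaming reference for one stage: ordered sparse events in, ordered sparse events out."""
    pixels = dict(events)
    half = n // 2
    offsets = [(dr, dc) for dr in range(-half, half + 1) for dc in range(-half, half + 1)]
    addresses = sorted({(r + dr, c + dc) for (r, c) in pixels for dr, dc in offsets})
    output = []
    for wr, wc in addresses:
        window = [pixels.get((wr + dr, wc + dc), 0) for dr, dc in offsets]
        result = operator(window)
        if result != 0:  # assumption: zero results are not forwarded
            output.append(((wr, wc), result))
    return output


def cascade(events, stages):
    """Feed the output of each stage into the next; stages is a list of (n, operator)."""
    for n, operator in stages:
        events = reference_stage(events, n, operator)
    return events


result = cascade([((4, 4), 1)], [(3, sum), (3, sum)])
print(len(result), dict(result)[(4, 4)])
```

Because every stage emits its results in ascending address order, the next stage receives input with the same ordering property as the original stream, which is what allows identical schedulers to be chained.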
[0084] The processing apparatus and architecture proposed in this disclosure offer several significant advantages. First, by decoupling window addresses and designing a multi-layered waiting queue, the problem of input-output order mismatch in sparse data stream processing is successfully solved. This allows traditional sliding window operators to be directly applied to spatiotemporally ordered sparse data streams without algorithm reconstruction, greatly improving the versatility and reusability of the technical solution. Second, the shifted hash table design enables parallel memory access within a single clock cycle. Compared to traditional discontinuous memory access methods, memory access latency is reduced by more than an order of magnitude, becoming a key factor in improving overall system performance. Furthermore, the processing apparatus and architecture completely preserve the sparsity of the data. By skipping the calculation of zero-value windows and the storage of zero-value pixels, near-linear computational speedup can be achieved in high-sparseness scenarios. Actual tests show that throughput in simple shape scenarios can reach more than five times that of the original frame processing method. In addition, due to the adoption of a streaming processing architecture, data only needs to pass through a pipeline of fixed depth from input to output, and end-to-end latency is controlled at the microsecond level, making it very suitable for applications with strict real-time requirements.
[0085] From a resource consumption perspective, the aforementioned processing device and architecture have relatively low hardware resource requirements. The three-layer algorithm architecture, implemented on a Field-Programmable Gate Array (FPGA), requires fewer than 100,000 lookup tables and tens of thousands of flip-flops, which is well within the resource capacity of edge computing devices, eliminating the risk of resource overload. More importantly, the aforementioned processing device and architecture have excellent scalability. Larger window operators can be supported by increasing the number of waiting queue levels and shift hash table rows. Multi-level algorithm processing pipelines can be constructed by cascading multiple sparse data schedulers, and the system's processing throughput can be further improved through static or dynamic parallel deployment.
[0086] The core technological innovation of the processing device and architecture proposed in this disclosure lies in the proposed sparse data stream processing architecture that combines a multi-layered waiting first-in-first-out queue with a shifted hash table. This architecture decouples input and output order by using a window address as an intermediate medium, and achieves low-latency sparse data storage and access through a hash table structure based on a shift register. The collaborative operation of these two technologies enables efficient streaming processing of spatiotemporally ordered sparse data streams. This innovation overcomes the performance bottleneck of existing technologies in sparse data processing, providing a novel technical path for efficient processing of sparse data streams.
[0087] Key technical features include: a data-to-window address mapping mechanism that calculates all associated window addresses triggered by each data based on the size and shape of the sliding window operator; a multi-level waiting queue address maintenance mechanism that distributes window addresses to different queues according to dimensions and maintains the order within each queue; a window output determination mechanism based on minimum address comparison that determines ready-to-process windows by comparing the first addresses of each queue; a hash table storage mechanism based on shift registers that automatically releases historical data through shift operations corresponding to row numbers; and a single-clock-cycle parallel memory access mechanism that enables parallel reading of multiple pixel values through hash mapping.
[0088] All implementations of the sparse data stream processing architecture based on the collaboration of multi-layer waiting queues and shifted hash tables, including but not limited to implementations on various hardware platforms such as Application Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), and Graphics Processing Units (GPUs), can be applied to various sparse data stream processing scenarios such as neuromorphic sensors, event cameras, LiDAR, and network traffic monitoring. Various variants can be implemented using different numbers of queues, different hash table structures, and different window operator types.
[0089] Based on the efficient streaming processing apparatus proposed in the above embodiments of this disclosure, this disclosure also provides an efficient streaming processing method for spatiotemporally ordered sparse data streams. This method is applied to a sparse data scheduler; the sparse data scheduler is provided with multi-layer waiting queues and multi-row shift hash tables.
[0090] The multi-level waiting queue is used to store and maintain in an orderly manner the window addresses triggered by the input data, which includes any valid data in any frame of sparse data in the sparse data stream, and the window address includes the window addresses of multiple sliding windows covering the pixel values of the valid data.
[0091] The multi-row shift hash table is used to store the pixel values of valid data in sparse data and supports parallel reading of window data in a single clock cycle; the window data is the multiple pixel values of any frame of sparse data in the sparse data stream within any sliding window.
[0092] The processing method includes:
[0093] The sparse data scheduler determines the window data to be processed and inputs it to the operator executor based on the window addresses stored in the multi-level waiting queue and the pixel values of the valid data stored in the multi-row shift hash table, so as to realize streaming processing of the sparse data stream; wherein, the operator executor is used to perform the operation of the specified sliding window operator on the window data.
[0094] In one possible implementation, the sparse data scheduler has the following processing states: update state, comparison state, read state, and write state; wherein, determining the window data to be processed and inputting it to the operator executor based on the window addresses stored in the multi-level waiting queue and the pixel values of the valid data stored in the multi-row shift hash table includes: the sparse data scheduler is initially in the update state, and the processing in the update state includes: when receiving any valid data in any frame of sparse data in the sparse data stream, parsing the pixel position and pixel value of the valid data in that frame of sparse data, determining the window addresses of multiple sliding windows covering the pixel value of the valid data based on the pixel position of the valid data and the window size of the sliding window operator, allocating the window addresses of the multiple sliding windows to the multi-level waiting queue according to the row dimension, and recording the minimum value among the window addresses of the multiple sliding windows; after completing the processing in the update state, the sparse data scheduler enters the comparison state; the processing in the comparison state includes: determining the global minimum value of the window addresses in the multi-level waiting queue and comparing it with the recorded minimum value; if the global minimum value is less than the recorded minimum value, the sparse data scheduler enters the read state; the processing in the read state includes: determining the multiple target pixel positions involved in the sliding window corresponding to the global minimum value, accessing the multi-row shift hash table based on the multiple target pixel positions to obtain the window data to be processed, inputting the window data to the operator executor, removing the global minimum value from the multi-level waiting queue, and returning to the comparison state; if the global minimum value is greater than or equal to the recorded minimum value, the sparse data scheduler enters the write state; the processing in the write state includes: writing the pixel value of the currently received valid data to the multi-row shift hash table based on the pixel position of the valid data, and returning to the update state.
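As a non-limiting software illustration, the four processing states described above can be modelled as follows. This is a minimal sketch rather than the disclosed hardware: the constants `K` and `WIDTH`, the padded address layout in `addr`, the use of a Python dictionary in place of the multi-row shift hash table, and the function name `run_scheduler` are all assumptions made for illustration. The sketch also assumes that valid data arrives in raster order and that each waiting queue layer holds the windows of one window row.

```python
from collections import deque

K = 3                      # window size of the sliding window operator (assumed odd)
HALF = K // 2
WIDTH = 8                  # hypothetical frame width in pixels
STRIDE = WIDTH + 2 * HALF  # padded row stride, so border windows keep raster order


def addr(row, col):
    """Window address: raster index of the window centre in padded coordinates."""
    return (row + HALF) * STRIDE + (col + HALF)


def centre(address):
    """Inverse of addr(): recover the window centre (row, col)."""
    return address // STRIDE - HALF, address % STRIDE - HALF


def run_scheduler(events):
    """Process (row, col, value) events given in raster order; return emitted windows."""
    queues = [deque() for _ in range(K)]  # one waiting queue per window row
    table = {}                            # dict stand-in for the shift hash table
    emitted = []
    for row, col, value in events:
        # UPDATE: enqueue the K*K windows covering the pixel, one window row per layer.
        for queue, dr in zip(queues, range(-HALF, HALF + 1)):
            for dc in range(-HALF, HALF + 1):
                a = addr(row + dr, col + dc)
                if not queue or queue[-1] < a:  # skip addresses already queued
                    queue.append(a)
        recorded_min = addr(row - HALF, col - HALF)
        while True:
            # COMPARE: smallest head address across all layers.
            global_min = min(q[0] for q in queues if q)
            if global_min >= recorded_min:
                break
            # READ: the window can no longer receive data, so gather and emit it.
            wr, wc = centre(global_min)
            window = [[table.get((wr + dr, wc + dc), 0)
                       for dc in range(-HALF, HALF + 1)]
                      for dr in range(-HALF, HALF + 1)]
            emitted.append(((wr, wc), window))
            for q in queues:
                if q and q[0] == global_min:
                    q.popleft()
        # WRITE: store the current pixel only after all ready windows were read.
        table[(row, col)] = value
    return emitted
```

In this sketch, windows that are still queued when the input ends are not flushed, and windows centred just outside the frame are also emitted; a practical design would decide both points according to the operator being executed.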
[0095] In one possible implementation, the window address is a pixel position at a specified location within the sliding window, the specified location including the window center of the sliding window; the number of layers in the multi-level waiting queue and the number of rows in the multi-row shift hash table are equal to the maximum window size of the sliding window operator; each layer of the multi-level waiting queue is used to store the window addresses of sliding windows in the corresponding row of the multiple sliding windows, the window addresses stored in each layer of the waiting queue are arranged in ascending order, and the first window address stored in each layer of the waiting queue is the minimum value among the window addresses cached in that layer of the waiting queue.
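The mapping from a pixel position to window addresses, and the allocation of those addresses to the layers of the waiting queue, can be sketched in software as follows. The linear address `row * width + col`, the function names `window_addresses` and `enqueue`, and the omission of border handling (the pixel is assumed to lie at least half a window away from the frame edge) are assumptions of this example and are not taken from the disclosure.

```python
from collections import deque


def window_addresses(row, col, k, width):
    """Return k lists of window addresses (window centres), one list per window row."""
    half = k // 2
    return [[(row + dr) * width + (col + dc) for dc in range(-half, half + 1)]
            for dr in range(-half, half + 1)]


def enqueue(queues, row, col, k, width):
    """Append new window addresses to each layer and return the recorded minimum."""
    layers = window_addresses(row, col, k, width)
    for queue, addresses in zip(queues, layers):
        for a in addresses:
            # Input arrives in raster order, so anything not above the tail
            # is already queued; appending therefore keeps the layer ascending.
            if not queue or queue[-1] < a:
                queue.append(a)
    return layers[0][0]


queues = [deque() for _ in range(3)]
first_min = enqueue(queues, 2, 3, 3, 8)   # pixel (2, 3), 3x3 window, width 8
second_min = enqueue(queues, 2, 4, 3, 8)  # neighbouring pixel shares most windows
```

Because each layer receives the windows of one window row, every layer can remain a plain first-in-first-out queue: no sorting is needed as long as the input is spatiotemporally ordered.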
[0096] In one possible implementation, determining the global minimum value of the window addresses in the multi-level waiting queue includes: reading the first window address in each layer of the multi-level waiting queue, obtaining the minimum value among these first window addresses by comparing the first window addresses of the layers with one another, and taking that minimum value as the global minimum value.
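The comparison of queue heads can be illustrated with a short software sketch. The function names `global_minimum` and `pop_global_minimum` are hypothetical, and removing the value from every layer whose head equals it is an assumption made here to cover the case where the same window address is held in more than one layer.

```python
from collections import deque


def global_minimum(queues):
    """Return the smallest head address across all non-empty layers, or None."""
    heads = [q[0] for q in queues if q]
    return min(heads) if heads else None


def pop_global_minimum(queues):
    """Remove the global minimum from every layer whose head equals it."""
    smallest = global_minimum(queues)
    if smallest is None:
        return None
    for q in queues:
        if q and q[0] == smallest:
            q.popleft()
    return smallest


queues = [deque([10, 11]), deque([5, 18]), deque([5, 26])]
ready = pop_global_minimum(queues)  # the address 5 leaves both layers that held it
```

Since each layer is kept in ascending order, only one comparison per layer is needed, which is what allows the hardware to evaluate the minimum with a small comparator tree.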
[0097] In one possible implementation, the step of accessing the multi-row shift hash table based on the multiple target pixel positions to obtain the window data to be processed includes: obtaining the storage location corresponding to each target pixel position by hash mapping the row number and column number corresponding to each target pixel position; the multi-row shift hash table outputs multiple pixel values corresponding to the multiple target pixel positions in parallel within a unit clock cycle based on the storage location corresponding to each target pixel position; wherein, for target pixel positions without valid data, the output pixel value is zero.
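The read access can be sketched in software as follows. The hash function `slot` (row number modulo the number of rows, column number modulo a hypothetical slot count `N_SLOTS`), the stored `(tag, value)` entry format, and the function name `read_window` are assumptions of this example; the disclosure does not fix them here. The loop is sequential in Python, whereas the hardware performs all lookups in parallel within one clock cycle.

```python
N_ROWS = 3    # rows in the table, taken equal to the window size
N_SLOTS = 16  # hypothetical number of storage slots per row


def slot(row, col):
    """Hypothetical hash mapping from (row, col) to a storage location."""
    return row % N_ROWS, col % N_SLOTS


def read_window(table, centre_row, centre_col, k=3):
    """Gather the k*k pixel values around the centre; missing pixels read as zero."""
    half = k // 2
    window = []
    for r in range(centre_row - half, centre_row + half + 1):
        line = []
        for c in range(centre_col - half, centre_col + half + 1):
            i, j = slot(r, c)
            entry = table[i][j]  # either None or ((row, col), value)
            hit = entry is not None and entry[0] == (r, c)
            line.append(entry[1] if hit else 0)
        window.append(line)
    return window


table = [[None] * N_SLOTS for _ in range(N_ROWS)]
i, j = slot(4, 5)
table[i][j] = ((4, 5), 9)            # one valid pixel at row 4, column 5
window = read_window(table, 4, 5)    # the pixel appears at the window centre
```

The tag comparison is one way to return zero for a position whose slot is occupied by a different pixel; other validity schemes are equally possible.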
[0098] In one possible implementation, writing the pixel value of the valid data to the multi-row shift hash table based on the pixel position of the currently received valid data includes: obtaining the storage position of the valid data in the multi-row shift hash table by hash mapping the row number and column number corresponding to the pixel position of the valid data; writing the pixel value of the valid data to the multi-row shift hash table based on the storage position of the valid data in the multi-row shift hash table; wherein, if the storage position of the valid data in the multi-row shift hash table already contains data, the pixel value of the valid data is used to overwrite the data already stored at that storage position.
[0099] In one possible implementation, the processing in the write state further includes: determining the row number difference between the row number corresponding to the pixel position of the currently received valid data and the smallest row number corresponding to the pixel value already stored in the multi-row shift hash table; if the row number difference exceeds the total number of rows in the multi-row shift hash table, the multi-row shift hash table performs a shift operation corresponding to the row number difference, wherein each shift operation is used to remove the oldest pixel value stored in the multi-row shift hash table and update the smallest row number corresponding to the pixel value already stored in the multi-row shift hash table; after completing at least one shift operation corresponding to the row number difference, the pixel value of the valid data is written to the multi-row shift hash table based on the pixel position of the currently received valid data.
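The write operation with row shifting can be illustrated by the following software sketch. The class name `ShiftTable`, the use of one Python dictionary per row in place of a set of shift registers, and the exact shift condition (shifting until the incoming row fits within the stored rows) are assumptions of this example; the precise threshold depends on how the row number difference is counted in a given implementation.

```python
from collections import deque


class ShiftTable:
    """Software stand-in for a multi-row shift hash table."""

    def __init__(self, n_rows):
        self.rows = deque({} for _ in range(n_rows))  # rows[0] is the oldest row
        self.min_row = 0                              # row number held in rows[0]

    def write(self, row, col, value):
        """Shift out old rows if needed, then store the value; return the shift count."""
        if row < self.min_row:
            raise ValueError("input must be ordered by row")
        shifts = 0
        while row - self.min_row >= len(self.rows):
            self.rows.popleft()      # release the oldest row
            self.rows.append({})     # make room for a newer row
            self.min_row += 1
            shifts += 1
        self.rows[row - self.min_row][col] = value  # overwrites any previous value
        return shifts

    def read(self, row, col):
        """Return the stored pixel value, or zero when no valid data is present."""
        index = row - self.min_row
        if 0 <= index < len(self.rows):
            return self.rows[index].get(col, 0)
        return 0


table = ShiftTable(3)
table.write(0, 1, 5)
table.write(2, 4, 7)
shifts = table.write(4, 0, 9)  # rows 0 and 1 are shifted out before the write
```

Releasing history by shifting means that no explicit deletion or garbage collection step is required: a row disappears as soon as the input has advanced far enough that no pending window can still need it.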
[0100] In one possible implementation, the sparse data scheduler uses a finite state machine to manage transitions among the update state, the comparison state, the read state, and the write state; the multi-row shift hash table is implemented using shift registers, with each row in the multi-row shift hash table corresponding to a set of shift registers.
[0101] This disclosure also provides a processor, including: the processing apparatus as described, or including the high-efficiency streaming processing architecture as described, or performing the processing method as described.
[0102] In practical applications, the aforementioned processor can be implemented on hardware platforms including but not limited to one or more application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), graphics processing units (GPUs), digital signal processors (DSPs), programmable logic devices (PLDs), general-purpose processors, controllers, microcontrollers, microprocessors, etc., and the embodiments disclosed herein do not impose any limitations on this.
[0103] The technical solutions described in this disclosure are not only applicable to the processing scenarios of neuromorphic vision sensors, but can also be extended to multiple fields such as LiDAR point cloud processing, sparse sampling reconstruction of medical images, abnormal network traffic detection, high-frequency data analysis in finance, and IoT sensor data processing. As long as the data stream is spatiotemporally ordered and sparse, it can be efficiently processed using the processing device, processing architecture, processing method, and processor proposed in this disclosure. This provides a broad market space and application prospects for the industrial application of the technology.
[0104] It should be noted that the block diagrams and schematic diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, apparatuses, and architectures according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction containing one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than those marked in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram and / or flowchart, and combinations of blocks in block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.
[0105] The various embodiments of this disclosure have been described above. These descriptions are exemplary and not exhaustive, and the disclosure is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles of the embodiments, their practical application, or their technical improvements over technologies found in the market, or to enable others skilled in the art to understand the embodiments disclosed herein.
Claims
1. A high-efficiency streaming processing device for spatiotemporally ordered sparse data streams, characterized in that it comprises: a sparse data scheduler and an operator executor; the operator executor is used to perform the operation of a specified sliding window operator on window data; the window data consists of the multiple pixel values of any frame of sparse data in a sparse data stream within any sliding window; the sparse data scheduler is provided with a multi-level waiting queue and a multi-row shift hash table; the multi-level waiting queue is used to store and maintain in an orderly manner the window addresses triggered by the input data; the input data includes any valid data in any frame of sparse data in the sparse data stream; the window addresses include the window addresses of multiple sliding windows covering the pixel value of the valid data; the multi-row shift hash table is used to store the pixel values of valid data in the sparse data and supports parallel reading of window data in a single clock cycle; the sparse data scheduler is used to determine the window data to be processed and input it to the operator executor based on the window addresses stored in the multi-level waiting queue and the pixel values of the valid data stored in the multi-row shift hash table, so as to realize streaming processing of the sparse data stream.
2. The apparatus according to claim 1, characterized in that, The sparse data scheduler has the following processing states: update state, comparison state, read state, and write state; The step of determining the window data to be processed and inputting it to the operator executor based on the window addresses stored in the multi-level waiting queue and the pixel values of the valid data stored in the multi-row shift hash table includes: The sparse data scheduler is initially in the update state. The processing in the update state includes: when receiving any valid data in any frame of sparse data in the sparse data stream, parsing the pixel position and pixel value of the valid data in the frame of sparse data, determining the window addresses of multiple sliding windows covering the pixel value of the valid data based on the pixel position of the valid data and the window size of the sliding window operator, allocating the window addresses of the multiple sliding windows to the multi-level waiting queue according to the row dimension, and recording the minimum value among the window addresses of the multiple sliding windows. After completing the processing in the update state, the sparse data scheduler enters the comparison state. The processing in the comparison state includes: determining the global minimum value of the window address in the multi-level waiting queue, and comparing the global minimum value with the recorded minimum value. When the global minimum is less than the recorded minimum, the sparse data scheduler enters a read state. 
The processing in the read state includes: determining the multiple target pixel positions involved in the sliding window corresponding to the global minimum, accessing the multi-row shift hash table based on the multiple target pixel positions, obtaining the window data to be processed and inputting the window data into the operator executor, removing the global minimum from the multi-level waiting queue, and returning to the comparison state. When the global minimum value is greater than or equal to the recorded minimum value, the sparse data scheduler enters the write state; the processing in the write state includes: based on the pixel position of the currently received valid data, writing the pixel value of the valid data to the multi-row shift hash table, and returning to the update state.
3. The apparatus according to claim 2, characterized in that, The window address is the pixel position at a specified location within the sliding window, and the specified location includes the center of the sliding window; The number of layers in the multi-layer waiting queue and the number of rows in the multi-row shift hash table are equal to the maximum window size of the sliding window operator; In the multi-level waiting queue, each waiting queue is used to store the window addresses of the sliding windows in the same row of the multiple sliding windows. The window addresses stored in each waiting queue are arranged in ascending order, and the first window address stored in each waiting queue is the minimum value among the window addresses cached in that waiting queue.
4. The apparatus according to claim 3, characterized in that, Determining the global minimum value of the window address in the multi-level waiting queue includes: The first window address of each waiting queue is read from the multi-level waiting queue. The minimum value of the first window address in each waiting queue is obtained by comparing the first window addresses in each waiting queue. The minimum value of the first window address in the multi-level waiting queue is taken as the global minimum value.
5. The apparatus according to claim 2, characterized in that, The step of accessing the multi-row shift hash table based on the multiple target pixel positions to obtain the window data to be processed includes: By performing a hash mapping on the row number and column number corresponding to each of the multiple target pixel positions, the storage location corresponding to each target pixel position is obtained; The multi-row shift hash table outputs multiple pixel values corresponding to the multiple target pixel positions in parallel within a unit clock cycle, based on the storage location corresponding to each target pixel position; wherein, for target pixel positions without valid data, the output pixel value is zero.
6. The apparatus according to claim 2, characterized in that, The step of writing the pixel value of the valid data to the multi-row shift hash table based on the pixel position of the currently received valid data includes: The storage location of the valid data in the multi-row shift hash table is obtained by hash mapping the row number and column number corresponding to the pixel position of the valid data. Based on the storage location of the valid data in the multi-row shift hash table, the pixel value of the valid data is written to the multi-row shift hash table; wherein, if the storage location of the valid data in the multi-row shift hash table already contains data, the pixel value of the valid data is used to overwrite the data already stored at that storage location.
7. The apparatus according to claim 2 or 6, characterized in that, the processing in the write state also includes: determining the row number difference between the row number corresponding to the pixel position of the currently received valid data and the smallest row number corresponding to the pixel values already stored in the multi-row shift hash table; if the row number difference exceeds the total number of rows in the multi-row shift hash table, the multi-row shift hash table performs shift operations corresponding to the row number difference, wherein each shift operation is used to remove the oldest pixel values stored in the multi-row shift hash table and update the smallest row number corresponding to the pixel values already stored in the multi-row shift hash table; after completing at least one shift operation corresponding to the row number difference, the pixel value of the valid data is written to the multi-row shift hash table based on the pixel position of the currently received valid data.
8. The apparatus according to claim 2, characterized in that, The sparse data scheduler uses a finite state machine to manage the state of the update state, the comparison state, the read state, and the write state. The multi-row shift hash table is implemented using shift registers, and each row in the multi-row shift hash table corresponds to a set of shift registers.
9. A high-efficiency streaming processing architecture for spatiotemporally ordered sparse data streams, characterized in that it comprises: multiple cascaded processing devices, wherein each processing device is a high-efficiency streaming processing device as described in any one of claims 1 to 8; wherein the operation result of the sliding window operator generated by the processing device at any level is used as valid data input to the processing device at the next level.
10. An efficient streaming processing method for spatiotemporally ordered sparse data streams, characterized in that, Applied to sparse data schedulers; The sparse data scheduler contains multiple layers of waiting queues and multiple rows of shift hash tables. The multi-level waiting queue is used to store and maintain in an orderly manner the window addresses triggered by the input data, which includes any valid data in any frame of sparse data in the sparse data stream, and the window address includes the window addresses of multiple sliding windows covering the pixel values of the valid data. The multi-row shift hash table is used to store the pixel values of valid data in sparse data and supports parallel reading of window data in a single clock cycle. The window data is the multiple pixel values of any frame of sparse data in a sparse data stream within any sliding window. The processing method includes: The sparse data scheduler determines the window data to be processed and inputs it to the operator executor based on the window addresses stored in the multi-level waiting queue and the pixel values of the valid data stored in the multi-row shift hash table, so as to realize streaming processing of the sparse data stream; wherein, the operator executor is used to perform the operation of the specified sliding window operator on the window data.
11. A processor, characterized in that it comprises the processing apparatus as described in any one of claims 1 to 8, or comprises the high-efficiency streaming processing architecture as described in claim 9, or performs the processing method as described in claim 10.