Multi-channel data processing method, reality display apparatus, device, medium, and product

By performing network inference in parallel on the preset row data output by the image signal processors, full-image tiling is eliminated, and a preset strategy is used to perform network inference on each data slice. This addresses unbalanced system resource utilization, supports high-frame-rate, high-resolution image processing, and reduces hardware cost and power consumption.

WO2026138838A1 · PCT designated stage · Publication Date: 2026-07-02 · GRAVITYXR ELECTRONICS & TECH CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: GRAVITYXR ELECTRONICS & TECH CO LTD
Filing Date: 2025-12-23
Publication Date: 2026-07-02

Application Information

IPC: G06T1/20

Figure CN2025144952_02072026_PF_FP_ABST

Abstract

The present application provides a multi-channel data processing method, a reality display apparatus, a device, a medium, and a product. The method comprises: acquiring predefined line data output by at least two image signal processor channels; performing, in parallel, network inference on the predefined line data output by the at least two image signal processor channels, thereby eliminating the need to tile the full image, which improves image processing accuracy and reduces the overall system computing power and data cache space requirements; and partitioning the predefined line data output by at least a portion of the image signal processor channels into a plurality of tiles and performing network inference on each tile according to a predefined policy, wherein under the predefined policy the movement path starts from a first end of the tile, moves toward the opposite second end, and returns to the first end upon reaching the second end, until inference on the tile data is completed. This reduces neural processing unit memory overhead, improves the ability of a single neural network processing unit to process a plurality of image data channels in parallel, supports high frame rates and high resolutions, and reduces network inference latency.

Description

Multi-channel Data Processing Method, Reality Display Device, Equipment, Medium and Product

[0001] This application claims the priority of the Chinese patent application submitted to the Chinese Patent Office on December 23, 2024, with the application number 202411909698.9 and the application title "Multi-channel Data Processing Method, Reality Display Device, Equipment, Medium and Product", the entire content of which is incorporated herein by reference.

Technical Field

[0002] This application relates to data processing technologies, and particularly to a multi-channel data processing method, a reality display device, equipment, a medium and a product.

Background Art

[0003] In devices such as mobile phones, cameras, and surveillance cameras, a separate neural processing unit (NPU) is usually configured for each image signal processor (ISP) to perform network inference.

[0004] However, extended reality display devices, such as augmented reality (AR), virtual reality (VR), and mixed reality (MR) displays, must process complex image and video data while supporting real-time environmental perception and user interaction. Configuring a separate neural processing unit for each image signal processor does not allow the overall computing power of the system to be used flexibly and fully. Computing power is wasted, resource utilization is unbalanced, and network inference latency increases, so the low-latency requirements of extended reality display devices cannot be met.

[0005] In addition, after the image signal processor receives the full image data with a resolution of H×W collected by the camera module, the image signal processor usually has to slice (tile) the full image, dividing it into multiple N×M image slices (N < H, M < W) according to the data processing capability of the neural processing unit and storing them in the buffer. The neural processing unit then reads the image slices from the buffer one by one for network inference.

[0006] This requires preprocessing of the image slices before network inference to meet the requirements of the neural network processing unit, and postprocessing of the image slices after network inference to ensure that the size of the whole image is the same before and after network processing. Preprocessing can include padding the image slices in four directions (top, bottom, left, and right). Postprocessing can include cropping the effective image data of each slice after network inference so that the slice has the same size as before network inference, and stitching the image slices together so that the whole image has the same size before and after network inference. This introduces additional operations and intermediate data, such as tile creation, zero-padding, windowing, and cropping of overlapping regions, which increases the overall computing power required by the system and degrades the image processing result. Specifically, these operations are hardened into the image signal processor, increasing its area, power consumption, and cost. Moreover, because the image slices are padded during preprocessing, the neural network processing unit must also compute the padded regions, which increases the computing resources, computing power, and power consumption it requires.

[0007] Tiling also requires the system to provide a larger data cache space, which greatly increases the area, power consumption, and cost of the image signal processor and severely limits the use of existing neural network processing units for parallel processing of multiple channels of image data.

Summary of the Invention

[0008] This application provides a multi-channel data processing method, a display device, an apparatus, a medium, and a product to reduce the overall computing power of the system and improve the ability of the neural network processing unit to process multi-channel data in parallel.

[0009] In a first aspect, this application provides a multi-channel data processing method, including:

[0010] acquiring preset row data output by at least two image signal processors;

[0011] performing network inference in parallel on the preset row data output by each of the image signal processors;

[0012] wherein at least some of the preset row data output by the image signal processors is divided into multiple data slices, and network inference is performed on each data slice according to a preset strategy. Under the preset strategy, the movement path starts from the first end of the data slice, moves to the opposite second end, and returns to the first end after reaching the second end, until inference on the data slice is completed.

[0013] In some embodiments, performing network inference on each data slice according to a preset strategy includes:

[0014] performing convolution calculations on each data slice according to the preset strategy, based on at least some of the convolutional layers, until the calculation result meets the preset requirements;

[0015] performing deconvolution calculations on the convolution result of each data slice based on the remaining convolutional layers, until the number of rows in the deconvolution result of each data slice is the same as the number of rows before the convolution calculation;

[0016] wherein, for any two adjacent convolutional layers, the calculation result of the earlier layer is used as the input data of the later layer.

[0017] In some embodiments, the method for determining the number of rows and columns in each piece of data includes:

[0018] The required number of rows and output shape for each convolutional layer are determined based on the kernel, stride, and padding of each convolutional layer.

[0019] When the storage space occupied by the minimum output shape is less than the available storage space of the neural network processing unit, it is determined whether the number of rows of the convolutional layer corresponding to the minimum output shape meets the preset delay requirement.

[0020] If so, determine the number of rows and columns of each data piece based on the number of rows in the convolutional layer corresponding to the smallest output shape;

[0021] If not, determine whether the storage space occupied by the second smallest output shape is less than the available storage space of the neural network processing unit, until the number of rows and columns of each piece of data is determined based on the output shape.
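The selection loop in [0018] to [0021] can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the names `LayerCandidate` and `pick_layer_for_tiling` are hypothetical, the per-layer row counts and output shapes are assumed to be precomputed, and the "preset delay requirement" is modeled here simply as an upper bound on the number of rows, which the source does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LayerCandidate:
    """Hypothetical record for one convolutional layer (names are assumptions)."""
    name: str
    required_rows: int                   # rows this layer needs, assumed precomputed
    output_shape: Tuple[int, int, int]   # (num_kernels, height, width)
    bytes_per_element: int = 1           # assumed element size

    def output_bytes(self) -> int:
        c, h, w = self.output_shape
        return c * h * w * self.bytes_per_element


def pick_layer_for_tiling(layers: List[LayerCandidate],
                          available_bytes: int,
                          max_rows_for_latency: int) -> Optional[LayerCandidate]:
    """Try output shapes from smallest to largest; return the first layer whose
    output fits in NPU storage and whose row count meets the latency bound."""
    for cand in sorted(layers, key=LayerCandidate.output_bytes):
        if cand.output_bytes() >= available_bytes:
            # Candidates are sorted by size, so no later candidate can fit either.
            return None
        if cand.required_rows <= max_rows_for_latency:
            return cand
    return None


layers = [
    LayerCandidate("conv1", required_rows=8, output_shape=(16, 8, 64)),
    LayerCandidate("conv2", required_rows=20, output_shape=(32, 4, 32)),
    LayerCandidate("conv3", required_rows=44, output_shape=(64, 2, 16)),
]
chosen = pick_layer_for_tiling(layers, available_bytes=5000, max_rows_for_latency=32)
print(chosen.name if chosen else None)
```

In this made-up example the smallest output (conv3) fits in storage but needs too many rows for the latency bound, so the search moves on to the second smallest output (conv2), mirroring the fallback described in [0021].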

[0022] In some embodiments, determining the required number of rows for each convolutional layer based on the kernel, stride, and padding of each convolutional layer includes:

[0023] Calculate a first difference between the kernel and padding of the current convolutional layer, and a first product between the first difference and the stride of the current convolutional layer;

[0024] The number of rows in the current convolutional layer is determined based on the first product and the number of rows in the previous convolutional layer.

[0025] The number of rows in the first convolutional layer is the difference between the number of rows in the first convolutional layer and the number of padding cells in the first convolutional layer.

[0026] In some embodiments, determining the output shape of each convolutional layer based on its kernel, stride, and padding includes:

[0027] Calculate a second difference between the height of the slice data and the convolution kernel of the first convolutional layer, a second product of the padding number of the first convolutional layer and a first preset value, and a first sum of the second difference and the second product. Determine the height of the output data of the first convolutional layer based on half of the first sum and a second preset value.

[0028] Calculate a third difference between the width of the slice data and the convolution kernel of the first convolutional layer, and a second sum of the third difference and the second product. Determine the width of the output data of the first convolutional layer based on half of the second sum and the second preset value.

[0029] Calculate the fourth difference between the height of the output data of the previous convolutional layer and the convolutional kernel of the current convolutional layer, the third product of the padding number of the current convolutional layer and the first preset value, and the third sum of the fourth difference and the third product. Determine the height of the output data of the current convolutional layer based on half of the third sum and the second preset value.

[0030] Calculate the fifth difference between the width of the output data of the previous convolutional layer and the convolutional kernel of the current convolutional layer, and the fourth sum of the fifth difference and the third product. Determine the width of the output data of the current convolutional layer based on half of the fourth sum and the second preset value.

[0031] The output shape of each convolutional layer is determined based on the number of convolutional kernels, the width of the output data, and the height of the output data.
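The shape calculation in [0027] to [0031] can be illustrated with a short Python sketch. The text does not state the two preset values or how "half" is rounded, so this sketch assumes the first preset value is 2, the second preset value is 1, and halving uses floor division, which together match the familiar output-size formula for a stride-2 convolution. The function names are hypothetical.

```python
def conv_output_size(size_in, kernel, padding, first_preset=2, second_preset=1):
    """Output height or width of one layer: (size_in - kernel + padding * first_preset),
    halved, plus second_preset. Preset defaults and floor division are assumptions."""
    return (size_in - kernel + padding * first_preset) // 2 + second_preset


def output_shapes(tile_h, tile_w, layers):
    """layers is a list of (num_kernels, kernel, padding) tuples.
    Each layer takes the previous layer's output size as its input size."""
    shapes = []
    h, w = tile_h, tile_w
    for num_kernels, kernel, padding in layers:
        h = conv_output_size(h, kernel, padding)
        w = conv_output_size(w, kernel, padding)
        shapes.append((num_kernels, h, w))
    return shapes


# Hypothetical 32x64 data slice passed through two 3x3 layers with padding 1.
shapes = output_shapes(32, 64, [(16, 3, 1), (32, 3, 1)])
print(shapes)
```

The storage occupied by each output shape is then the product of its three entries times the element size, which is the quantity compared against the available storage of the neural network processing unit.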

[0032] In some embodiments, the method further includes:

[0033] A neural network model is pre-compiled offline according to the size of the entire image. The neural network model is used to perform network processing on the preset row data output by the image signal processor.

[0034] In a second aspect, this application provides an extended reality display device, the device comprising: a neural network processing unit, at least two image signal processors, and a buffer, wherein the buffer is connected to the neural network processing unit and each of the image signal processors;

[0035] Each of the image signal processors is used to write each line of data sequentially into the buffer;

[0036] The neural network processing unit includes at least two cores, each core being used to acquire a preset row of data output by the image signal processor from the buffer, and to perform network inference on the acquired preset row of data in parallel.

[0037] wherein at least some of the cores divide the acquired preset row data into multiple data slices and perform network inference on each data slice according to a preset strategy. Under the preset strategy, the movement path starts from the first end of the data slice, moves to the opposite second end, and returns to the first end after reaching the second end, until inference on the data slice is completed.

[0038] In some embodiments, the number of buffers is at least two, and each buffer is connected to one of the image signal processors;

[0039] Each of the image signal processors is used to write each line of data sequentially into the corresponding buffer; each of the cores is used to obtain a preset line of data output by the image signal processor from the corresponding buffer;

[0040] or,

[0041] Each of the image signal processors writes each line of data sequentially into the buffer according to the corresponding address; each core is used to retrieve a preset line of data output by an image signal processor from the buffer according to the corresponding address.

[0042] In some embodiments, the device further includes: at least two image sensors, each image sensor being connected to one of the image signal processors;

[0043] Each of the image sensors is used to transmit the acquired image data line by line to the corresponding image signal processor;

[0044] Each of the image signal processors is used to process each received line of data and then write it into the buffer.

[0045] In some embodiments, each of the cores is further configured to write the processed preset row data into a corresponding buffer;

[0046] Each image signal processor obtains the data processed by the neural network processing unit from the corresponding buffer and performs post-processing on the obtained data.

[0047] In some embodiments, the apparatus further includes:

[0048] Binocular camera;

[0049] The binocular camera is connected to two image signal processors. Each image signal processor acquires the image data collected by the corresponding camera and writes each line of data into the buffer sequentially.

[0050] In a third aspect, this application provides an electronic device, including: a memory and a processor;

[0051] The memory is used to store instructions; the processor is used to invoke the instructions in the memory to execute the method of the first aspect or any possible design of the first aspect.

[0052] In a fourth aspect, this application provides a computer-readable storage medium storing computer instructions which, when executed by at least one processor of an electronic device, cause the electronic device to perform the method of the first aspect or any possible design of the first aspect.

[0053] In a fifth aspect, this application provides a computer program product comprising computer instructions that, when executed by at least one processor of an electronic device, cause the electronic device to perform the method of the first aspect or any possible design of the first aspect.

[0054] The multi-channel data processing method, display device, apparatus, medium, and product provided in this application acquire preset line data output by at least two image signal processors and perform network inference on that data in parallel. This eliminates the need for image segmentation, which improves image processing accuracy and reduces the requirements for overall system computing power and data cache space. Furthermore, at least some of the preset line data output by the image signal processors is divided into multiple slices, and network inference is performed on each slice according to a preset strategy. Under the preset strategy, the movement path starts from a first end of the slice data, moves to the opposite second end, and returns to the first end after reaching the second end, until inference on the slice data is completed. Therefore, under the same hardware manufacturing process and cost conditions, the memory overhead of the neural network processing unit is reduced, and the ability of a single neural network processing unit to process multiple channels of image data in parallel is improved. This helps support high frame rates and high resolutions and allows the overall computing power of the system to be used flexibly and fully to reduce network inference latency, meeting the needs of parallel processing of binocular image data in extended reality display devices.

Attached Figure Description

[0055] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0056] Figure 1 is a schematic diagram of the architecture of an extended reality display device provided in an embodiment of this application;

[0057] Figure 2 is a flowchart of a multi-channel data processing method provided in an embodiment of this application;

[0058] Figure 3 is a schematic diagram illustrating the sequence of convolution calculation according to an embodiment of this application;

[0059] Figure 4 is a timing diagram of a convolution calculation provided in an embodiment of this application;

[0060] Figure 5 is a flowchart illustrating a multi-channel data processing method provided in an embodiment of this application;

[0061] Figure 6 is a schematic diagram of the structure of an extended reality display device provided in an embodiment of this application;

[0062] Figure 7 is a schematic diagram of the operation of a virtual reality display device provided in an embodiment of this application;

[0063] Figure 8 is a schematic diagram of the network inference process provided in an embodiment of this application;

[0064] Figure 9 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of this application.

[0065] The accompanying drawings illustrate specific embodiments of this application, which will be described in more detail below. These drawings and descriptions are not intended to limit the scope of the concept in any way, but rather to illustrate the concept of this application to those skilled in the art through reference to particular embodiments.

Detailed Implementation

[0066] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.

[0067] In devices such as mobile phones, cameras, surveillance cameras, drones, augmented reality displays, virtual reality displays, and mixed reality displays, image signal processors (ISPs) and neural processing units (NPUs) work together to achieve high-quality image capture and intelligent analysis.

[0068] The image signal processor processes the raw images acquired by the image sensor to improve image quality. This includes functions such as denoising, demosaicing, color correction, white balance adjustment, and gamma correction. The neural network processing unit performs neural network calculations, enabling the device to quickly perform tasks such as image recognition, speech recognition, and natural language processing.

[0069] As described in the background section, there is a need for a parallel processing technology for multi-channel image data. Such a technology can improve the accuracy of image processing by eliminating the need to process the entire image in tiles, and can reduce the requirements for the overall computing power and data cache space of the system. Under the same hardware manufacturing process and cost conditions, it can improve the ability of a single neural network processing unit (NPU) to process multiple channels of image data in parallel, so that the overall computing power of the system is used flexibly and fully to reduce network inference latency and meet the needs of parallel processing of binocular image data in XR devices.

[0070] To address this, this application provides a multi-channel data processing method that performs network inference in parallel on preset row data output by at least two image signal processors. This eliminates the need for image segmentation, which improves image processing accuracy and reduces the requirements for overall system computing power and data cache space. Furthermore, at least a portion of the preset row data output by the image signal processors is divided into multiple slices, and network inference is performed on each slice according to a preset strategy. Under the preset strategy, the movement path starts from the first end of the slice data, moves towards the opposite second end, and returns to the first end after reaching the second end, until inference on the slice data is completed. Under the same hardware manufacturing process and cost conditions, this reduces the memory overhead of the neural network processing unit, enhances the ability of a single neural network processing unit to process multiple channels of image data in parallel, and helps support high frame rates and high resolutions. The overall computing power of the system is thereby used flexibly and fully to reduce network inference latency and meet the needs of parallel processing of binocular image data in extended reality display devices.

[0071] The technical solution of this application and how the technical solution of this application solves the above-mentioned technical problems are described in detail below with specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments. The embodiments of this application will be described below with reference to the accompanying drawings.

[0072] Figure 1 shows the architecture of the extended reality display device provided in the embodiments of this application. As shown in Figure 1, the extended reality display device may include a left-eye camera, a right-eye camera, two image signal processors, two buffers, and a dual-core neural network processing unit. Image signal processor 1 is connected to the left-eye camera and buffer 1, and image signal processor 2 is connected to the right-eye camera and buffer 2. Image signal processor 1 acquires the image data collected by the left-eye camera, processes it, and writes it line by line into buffer 1. Image signal processor 2 does the same for the right-eye camera and buffer 2. Core 1 and core 2 of the neural network processing unit then acquire preset rows of data from buffer 1 and buffer 2 respectively and perform network inference on them in parallel. Core 1 can divide the acquired preset rows of data into multiple data slices and perform network inference on each data slice according to a preset strategy, in which the movement path starts from the first end of the data slice, moves to the opposite second end, and returns to the first end after reaching the second end, until inference on the data slice is completed. Cores 1 and 2 then write the inferred data into buffers 1 and 2 respectively. Image signal processors 1 and 2 obtain the data processed by the neural network processing unit from buffers 1 and 2 respectively and process it further, for example by black level correction and format conversion.

[0073] Figure 2 shows a flowchart of a multi-channel data processing method provided in an embodiment of this application. As shown in Figure 2, with a neural network processing unit as the execution entity, the multi-channel data processing method provided in this embodiment includes:

[0074] S101. Obtain preset row data output from at least two image signal processors.

[0075] In this embodiment, the neural network processing unit reads preset rows of data for processing, which eliminates the need for image segmentation. This reduces the number of preprocessing and post-processing modules in the image signal processor, lowers the hardware requirements, and reduces the area, power consumption, and cost of the image signal processor. Eliminating image segmentation also reduces the computing resources, computing power, and power consumption required of the neural network processing unit, which lowers the overall computing power required by the system and improves image processing performance.

[0076] Taking a single image signal processor as an example, the image signal processor writes each row of data into the buffer in row order. Once the preset number of rows has been written, the neural network processing unit can read the preset row data. Specifically, the neural network processing unit reads the data from the buffer row by row, in row order, and writes it into its own internal buffer unit until the preset rows of data have been read from the buffer.
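The hand-off described in [0076] behaves like a row-granular first-in, first-out buffer. The following Python sketch models it in software under stated assumptions: the class `LineBuffer` and its method names are hypothetical, rows are plain Python lists, and no hardware synchronization or addressing is modeled.

```python
from collections import deque


class LineBuffer:
    """Software model of the row buffer between one ISP and one NPU core.
    The class and method names are illustrative assumptions."""

    def __init__(self, preset_rows):
        self.preset_rows = preset_rows
        self._rows = deque()

    def write_row(self, row):
        """ISP side: append one processed row, in row order."""
        self._rows.append(row)

    def ready(self):
        """True once at least the preset number of rows is available."""
        return len(self._rows) >= self.preset_rows

    def read_preset_rows(self):
        """NPU side: pop the oldest preset_rows rows, or return None if not ready."""
        if not self.ready():
            return None
        return [self._rows.popleft() for _ in range(self.preset_rows)]


buf = LineBuffer(preset_rows=4)
for y in range(3):
    buf.write_row([y] * 8)          # three rows written: not enough yet
early = buf.read_preset_rows()
buf.write_row([3] * 8)              # fourth row arrives
block = buf.read_preset_rows()
print(early, len(block))
```

The point of the sketch is that inference can start as soon as the preset number of rows is present, without waiting for a full frame to be buffered.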

[0077] S102. Perform network inference in parallel on the preset row data output by each image signal processor, wherein at least some of the preset row data output by the image signal processor is divided into multiple data pieces, and network inference is performed on each data piece according to a preset strategy. The preset strategy is that the movement path starts from the first end of the data piece, moves to the corresponding second end, and returns to the first end after reaching the second end, until the data piece inference is completed.

[0078] In this embodiment, the neural network processing unit includes at least two cores, each core being responsible for calculating the data output by one image signal processor, thereby enabling parallel processing of the data output by at least two image signal processors.
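The one-core-per-channel arrangement in [0078] can be mimicked on a host CPU with a thread pool. This is only a structural analogy: Python threads stand in for NPU cores, and `infer_rows` is a placeholder that doubles each pixel value rather than running a real network. Both function names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def infer_rows(rows):
    """Placeholder for network inference on one channel's preset rows.
    It doubles every pixel value; a real system would run the network here."""
    return [[2 * p for p in row] for row in rows]


def infer_channels_in_parallel(channel_blocks):
    """Dispatch one worker per ISP channel. Results come back in channel order
    because Executor.map preserves the order of its inputs."""
    with ThreadPoolExecutor(max_workers=len(channel_blocks)) as pool:
        return list(pool.map(infer_rows, channel_blocks))


left_rows = [[1, 2], [3, 4]]     # preset rows from the left-eye channel
right_rows = [[5, 6], [7, 8]]    # preset rows from the right-eye channel
results = infer_channels_in_parallel([left_rows, right_rows])
print(results)
```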

[0079] Furthermore, at least some of the preset row data output by the image signal processors is divided into multiple data slices, and network inference is performed on each data slice according to a preset strategy. Under the preset strategy, the movement path starts from the first end of the slice data, moves to the opposite second end, and returns to the first end after reaching the second end, until inference on the slice data is completed. Under the same hardware manufacturing process and cost conditions, this reduces the memory overhead of the neural network processing unit, improves the ability of a single neural network processing unit to process multiple channels of image data in parallel, and helps support high frame rates and high resolutions, so that the overall computing power of the system is used flexibly and fully to reduce network inference latency. Specifically, with this application the frame rate can reach 120 frames per second, the resolution can reach 4K, and the latency is about 0.2 ms. Conventional technologies generally achieve a frame rate of 60 frames per second and a resolution of 2K, with a latency of more than 16 ms.

[0080] For example, the preset strategy may be to perform network inference on the data slices in left-to-right, top-to-bottom order, forming a Z-shaped (zigzag) movement path.
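A left-to-right, top-to-bottom path of this kind is a raster scan: each pass runs from the left end to the right end, then returns to the left end one step lower. The Python sketch below shows the visiting order and a simple way to cut a block of preset rows into data slices in that same order. The function names and the list-of-lists data layout are assumptions for illustration only.

```python
def raster_path(height, width):
    """Yield (row, column) positions left to right, then top to bottom."""
    for y in range(height):
        for x in range(width):
            yield (y, x)


def split_into_tiles(block, tile_h, tile_w):
    """Cut a 2-D block (list of rows) into tiles, returned in raster order.
    Tiles on the right or bottom edge may be smaller than tile_h x tile_w."""
    tiles = []
    for top in range(0, len(block), tile_h):
        for left in range(0, len(block[0]), tile_w):
            tiles.append([row[left:left + tile_w]
                          for row in block[top:top + tile_h]])
    return tiles


path = list(raster_path(2, 3))
block = [[0, 1, 2, 3],
         [4, 5, 6, 7]]
tiles = split_into_tiles(block, tile_h=2, tile_w=2)
print(path)
print(tiles)
```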

[0081] For example, network inference can involve using AI (Artificial Intelligence) models to classify, regress, recognize images, and perform natural language processing on data. AI models include neural networks, decision trees, and support vector machines.

[0082] In some embodiments, convolution calculations are performed on each data slice according to the preset strategy, based on at least some of the convolutional layers, until the calculation results meet the preset requirements. Deconvolution calculations are then performed on the convolution result of each data slice based on the remaining convolutional layers, until the number of rows in the deconvolution result of each data slice is the same as the number of rows before the convolution calculation. This completes the network inference for each data slice and achieves the specific task objective. For adjacent convolutional layers, the calculation result of the earlier layer serves as the input of the later layer.

[0083] Taking two adjacent data slices as an example, as shown in Figure 3, the first convolutional layer performs convolution calculations on the first data slice in left-to-right, top-to-bottom order, which is Z-shaped path 1 in Figure 3. The convolution result for this data slice is output to the second convolutional layer, which continues the convolution calculations on the same data slice in left-to-right, top-to-bottom order, which is Z-shaped path 2 in Figure 3. The first convolutional layer then performs convolution calculations on the other data slice in left-to-right, top-to-bottom order, which is Z-shaped path 3 in Figure 3. The convolution result for the other data slice is output to the second convolutional layer, which continues the convolution calculations on that data slice in left-to-right, top-to-bottom order, which is Z-shaped path 4 in Figure 3.

[0084] For example, the data slices are labeled tile1, tile2, ..., tile_n in order from top to bottom and left to right. The first convolutional layer conv1 performs convolution calculation on the first data slice from left to right and top to bottom (i.e., in a zigzag pattern) based on the first convolutional kernel corresponding to the first convolutional layer, and outputs the calculation result to the second convolutional layer. The second convolutional layer conv2 performs convolution calculation on the received data from left to right and top to bottom based on the second convolutional kernel corresponding to the second convolutional layer, and outputs the calculation result to the third convolutional layer. The third convolutional layer conv3 performs convolution calculation on the received data from left to right and top to bottom based on the convolutional kernel corresponding to the third convolutional layer. Assuming that the data space is smallest at this point, the preset requirement is met.

[0085] Then, return to the first convolutional layer to perform calculations on the second piece of data, and so on until the nth piece of data has been calculated.

[0086] Subsequently, deconvolution calculations can be performed sequentially on the convolution results of each data piece based on the fourth and fifth convolutional layers until the number of rows in the deconvolution result of each data piece is consistent with the number of rows before the convolution calculation.

[0087] Specifically, as shown in Figure 4, in the first stage (stage0), the first convolutional layer (conv_1) performs convolution calculation on the first data tile (tile1), outputting the first convolution calculation result (conv_1_tile1) to the second convolutional layer (conv_2). In the second stage (stage1), the second convolutional layer (conv_2) performs convolution calculation on the first convolution calculation result (conv_1_tile1) of the first data tile, outputting the second convolution calculation result (conv_2_tile1) to the third convolutional layer (conv_3). In the third stage (stage2), the third convolutional layer (conv_3) performs convolution calculation on the second calculation result (conv_2_tile1) of the first data tile, obtaining the third convolution calculation result (conv_3_tile1) of the first data tile. At this point, the data space of the convolution calculation result of the first data tile is minimized, meeting the preset requirements.

[0088] Similarly, in stage i, the first convolutional layer conv_1 is used to perform convolution calculation on the nth data tile_n, and the first convolution calculation result conv_1_tile_n of the nth data tile is output to the second convolutional layer conv_2; in stage i+1, the second convolutional layer conv_2 is used to perform convolution calculation on the first convolution calculation result conv_1_tile_n of the nth data tile, and the second convolution calculation result conv_2_tile_n of the nth data tile is output to the third convolutional layer conv_3; in stage i+2, the third convolutional layer conv_3 is used to perform convolution calculation on the second convolution calculation result conv_2_tile_n of the nth data tile, and the third convolution calculation result conv_3_tile_n of the nth data tile is obtained. At this time, the data space of the convolution calculation of the nth data tile is minimized, which meets the preset requirements.

[0089] Then, in stage i+3, the fourth convolutional layer conv_4 is used to perform deconvolution on the third convolution result conv_3_tile1 of the first data piece, outputting the first deconvolution result conv_4_tile1 of the first data piece to the fifth convolutional layer conv_5. In stage i+4, the fifth convolutional layer is used to perform deconvolution on the first deconvolution result conv_4_tile1 of the first data piece, obtaining the second deconvolution result conv_5_tile1 of the first data piece. If the number of rows in the second deconvolution result conv_5_tile1 of the first data piece is consistent with the number of rows in the first data piece, then the deconvolution calculation continues for the second data piece, and so on, until the number of rows in the deconvolution result of each data piece is consistent with the number of rows before the convolution calculation.

[0090] Accordingly, the deconvolution results of each data segment are combined to form a complete preset row of data. This combined preset row of data can then be written to external storage so that the image signal processor can further process it.
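The stage sequence of Figure 4 can be sketched as a simple schedule generator. This is a minimal sketch assuming three convolution layers followed by two deconvolution layers, as in the example above; the function name `build_schedule` and its parameters are hypothetical.

```python
def build_schedule(num_tiles, num_conv_layers=3, num_deconv_layers=2):
    """Return (stage, layer_name, tile_name) tuples in execution order."""
    schedule = []
    stage = 0
    # Convolution phase: each data piece passes through conv_1..conv_k
    # before the first layer starts on the next data piece.
    for tile in range(1, num_tiles + 1):
        for layer in range(1, num_conv_layers + 1):
            schedule.append((stage, f"conv_{layer}", f"tile{tile}"))
            stage += 1
    # Deconvolution phase: the remaining layers restore the row count of
    # each data piece, one data piece after another.
    first_deconv = num_conv_layers + 1
    last_deconv = num_conv_layers + num_deconv_layers
    for tile in range(1, num_tiles + 1):
        for layer in range(first_deconv, last_deconv + 1):
            schedule.append((stage, f"conv_{layer}", f"tile{tile}"))
            stage += 1
    return schedule
```

For two data pieces, stages 0 to 5 cover the convolution phase and stages 6 to 9 cover the deconvolution phase, starting again from tile1.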

[0091] In some embodiments, considering that the neural network processing unit can complete the processing of the entire image in a preset row manner, the neural network model can be pre-compiled offline according to the size of the entire image. This allows the neural network model to perform network inference on the preset row data output by the image signal processor, avoiding the preprocessing and postprocessing operations caused by block compilation and improving the image processing accuracy.

[0092] Another embodiment of this application provides a multi-channel data processing method, which can first determine the number of rows and columns of each data segment based on a neural network model, and then determine the size of each data segment based on the number of rows and columns of each data segment. As shown in Figure 5, the method for determining the number of rows and columns of each data segment may include:

[0093] S201. Determine the number of rows and output shape required for each convolutional layer based on the convolutional kernel, stride, and padding of each convolutional layer.

[0094] For example, determining the number of rows required for each convolutional layer based on the kernel, stride, and padding of each convolutional layer may include: calculating a first difference between the kernel and padding of the current convolutional layer, and a first product between the first difference and the stride of the current convolutional layer;

[0095] The number of rows in the current convolutional layer is determined based on the first product and the number of rows in the previous convolutional layer.

[0096] The number of rows of the first convolutional layer is the difference between the convolutional kernel of the first convolutional layer and the padding number of the first convolutional layer.

[0097] For example, determining the output shape of each convolutional layer based on its kernel, stride, and padding can include:

[0098] The height of the output data of the first convolutional layer is determined by the second difference between the height of the slice data and the convolutional kernel of the first convolutional layer, the second product of the padding number of the first convolutional layer and the first preset value, and the first sum of the second difference and the second product. The height of the output data of the first convolutional layer is determined based on half of the first sum and the second preset value.

[0099] The width of the output data of the first convolutional layer is determined by calculating the third difference between the width of the slice data and the convolutional kernel of the first convolutional layer, and the second sum of the third difference and the second product. The width of the output data of the first convolutional layer is determined by half of the second sum and the second preset value.

[0100] Calculate the fourth difference between the height of the output data of the previous convolutional layer and the convolutional kernel of the current convolutional layer, the third product of the padding number of the current convolutional layer and the first preset value, and the third sum of the fourth difference and the third product. Determine the height of the output data of the current convolutional layer based on half of the third sum and the second preset value.

[0101] Calculate the fifth difference between the width of the output data of the previous convolutional layer and the convolutional kernel of the current convolutional layer, and the fourth sum of the fifth difference and the third product. Determine the width of the output data of the current convolutional layer based on half of the fourth sum and the second preset value.

[0102] At this point, the height and width of the output data for each convolutional layer can be obtained;

[0103] Then, the output shape of each convolutional layer can be determined based on the number of convolutional kernels, the width and height of the output data.

[0104] For example, suppose a neural network model has L convolutional layers, the convolutional kernel of the i-th convolutional layer is K_i, the number of convolutional kernels in the i-th convolutional layer is N_i, the stride of the i-th convolutional layer is S_i, and the padding number of the i-th convolutional layer is P_i. The input image has a shape of H×W×C, and the available storage space within the neural network processing unit is mem. The padding controls the size and shape of the output feature map, so that the output is the same as or similar in size to the input.

[0105] Accordingly, the number of rows required for each convolutional layer is calculated based on the kernel, stride, and padding of each convolutional layer, including:

[0106] The number of rows required for convolution calculation in the first convolutional layer is R1 = K1 - P1;

[0107] The number of rows required for convolution calculation in the second convolutional layer is R2 = R1 + (K2 - P2) * S1;

[0108] The number of rows required for convolution calculation in the third convolutional layer is R3 = R2 + (K3 - P3) * S2;

[0109] Similarly, the number of rows required for convolution calculation in the i-th convolutional layer is R_i = R_{i-1} + (K_i - P_i) * S_i.
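The row-count recurrence can be evaluated with a few lines of code. This is a hedged sketch: it follows the written-out forms for R2 and R3, in which the multiplier is the stride of the preceding layer, and that indexing is an assumption of the sketch. The function name `required_rows` is hypothetical.

```python
def required_rows(kernels, paddings, strides):
    """Return [R1, R2, ...] for per-layer kernel sizes, paddings and strides.

    Assumption: for layer i > 1 the multiplier is the stride of layer i - 1,
    as in the written-out forms R2 = R1 + (K2 - P2) * S1.
    """
    rows = []
    for i, (k, p) in enumerate(zip(kernels, paddings)):
        if i == 0:
            # First layer: R1 = K1 - P1.
            rows.append(k - p)
        else:
            rows.append(rows[-1] + (k - p) * strides[i - 1])
    return rows
```

For three layers with kernel 3, padding 1 and stride 2, this gives row counts of 2, 6 and 10.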

[0110] The output shape of each convolutional layer is calculated based on its kernel, stride, and padding.

[0111] The output shape of the first convolutional layer is shape(1)=(N1,H1,W1), where H1=(H0-K1+2P1) / S1+1, W1=(W0-K1+2P1) / S1+1

[0112] The output shape of the second convolutional layer is shape(2)=(N2,H2,W2), where H2=(H1-K2+2P2) / S2+1,W2=(W1-K2+2P2) / S2+1;

[0113] The output shape of the i-th convolutional layer is shape(i) = (N_i, H_i, W_i), where H_i = (H_{i-1} - K_i + 2P_i) / S_i + 1 and W_i = (W_{i-1} - K_i + 2P_i) / S_i + 1.
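The output-shape formulas can be checked numerically with a short sketch. The function name `output_shapes` and the dictionary keys are hypothetical, and the division is assumed to round down, which the text does not state.

```python
def output_shapes(h0, w0, layers):
    """Return [(N_i, H_i, W_i), ...] for a chain of convolutional layers.

    Each layer is a dict with keys "n" (number of kernels), "k" (kernel
    size), "s" (stride) and "p" (padding). Floor division is an assumption.
    """
    shapes = []
    h, w = h0, w0
    for layer in layers:
        k, s, p = layer["k"], layer["s"], layer["p"]
        h = (h - k + 2 * p) // s + 1
        w = (w - k + 2 * p) // s + 1
        shapes.append((layer["n"], h, w))
    return shapes
```

For a 32×64 input and two layers with kernel 3, stride 2 and padding 1 (8 and 16 kernels), the shapes are (8, 16, 32) and (16, 8, 16).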

[0114] S202. When the storage space occupied by the minimum output shape is less than the available storage space of the neural network processing unit, determine whether the number of rows of the convolutional layer corresponding to the minimum output shape meets the preset delay requirement.

[0115] Specifically, first find the minimum output shape. For example, if the convolutional layer corresponding to the minimum output shape is the i-th convolutional layer, then the minimum output shape is shape(i) = (N_i, H_i, W_i). Assuming the calculation bit width is B bits, the storage space occupied by the minimum output shape is N_i × H_i × W_i × B. Then, determine whether the storage space N_i × H_i × W_i × B occupied by the minimum output shape is less than the available storage space of the neural network processing unit. If it is, determine whether the number of rows of the convolutional layer corresponding to the minimum output shape meets the preset delay requirement.

[0116] For example, suppose the required latency is less than m1 rows, the minimum output shape is shape(i) = (N_i, H_i, W_i), and the number of rows of the corresponding i-th convolutional layer is R_i. Then determine whether the number of rows R_i of the i-th convolutional layer is less than the preset delay m1.

[0117] If yes, proceed to step S203; otherwise, proceed to step S204.

[0118] S203. Determine the number of rows and columns of each data piece based on the number of rows of the convolutional layer corresponding to the minimum output shape.

[0119] Specifically, the number of rows R_i of the convolutional layer corresponding to the minimum output shape is determined as the number of rows m and the number of columns n of each data piece. However, if the total number of columns corresponding to the preset rows is not divisible by R_i, the remainder of dividing the total number of columns by R_i is obtained, and the number of columns and rows of each data piece is updated based on this remainder.

[0120] S204. Determine whether the storage space occupied by the second smallest output shape is less than the available storage space of the neural network processing unit, until the number of rows and columns of each piece of data is determined according to the output shape.

[0121] For example, if the number of rows in the convolutional layer corresponding to the smallest output shape does not meet the preset latency requirement, the next smallest output shape is searched. Then, it is determined whether the storage space occupied by the next smallest output shape is less than the available storage space of the neural network processing unit. If so, it is determined whether the number of rows in the convolutional layer corresponding to the next smallest output shape meets the preset latency requirement. If the number of rows in the convolutional layer corresponding to the next smallest output shape meets the preset latency requirement, the number of rows and columns of each data piece is determined based on the number of rows in the convolutional layer corresponding to the next smallest output shape. If the number of rows in the convolutional layer corresponding to the next smallest output shape does not meet the preset latency requirement, the determination continues until the determination of all convolutional layers is completed, so that the number of rows and columns of each data piece can be determined based on the output shape.

[0122] For example, suppose the convolutional layer corresponding to the second smallest output shape is the j-th convolutional layer, and the second smallest output shape is shape(j) = (N_j, H_j, W_j). The storage space occupied by the second smallest output shape is N_j × H_j × W_j × B. When this storage space is less than the available storage space of the neural network processing unit, determine whether the number of rows R_j of the j-th convolutional layer is less than the preset delay m1. If so, determine the number of rows and columns of each data piece according to the number of rows R_j of the j-th convolutional layer.

[0123] In the above embodiments, the number of rows and output shape required for each convolutional layer are determined by parameters such as the convolution kernel, stride, and padding of each convolutional layer of the neural network model. Then, based on the size relationship between the space occupied by the output shape and the available storage space of the neural network processing unit, and whether the number of rows of the convolutional layer meets the preset delay requirements, the number of rows and columns of each piece of data that matches the neural network model are determined so that the neural network model can be used to perform network inference on each piece of data.
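Steps S202 to S204 amount to a search over layers ordered by output footprint. The following is a minimal sketch under stated assumptions: the function name `choose_tile_rows` is hypothetical, storage is measured in bits, and the search returns nothing when no layer satisfies both conditions, a case the description does not spell out.

```python
def choose_tile_rows(shapes, rows, bit_width, mem_bits, max_rows):
    """Pick the layer whose output shape is smallest and still acceptable.

    shapes: list of (N_i, H_i, W_i); rows: list of R_i.
    Returns (layer_index, R_i) for the first layer, in ascending order of
    output footprint, whose footprint is below mem_bits and whose row count
    is below max_rows. Returns None if no layer qualifies (an assumption).
    """
    def footprint(i):
        n, h, w = shapes[i]
        return n * h * w * bit_width

    # Visit layers from the smallest output footprint to the largest.
    for i in sorted(range(len(shapes)), key=footprint):
        if footprint(i) < mem_bits and rows[i] < max_rows:
            return i, rows[i]
    return None
```

The returned row count would then be used to set the number of rows and columns of each data piece.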

[0124] The multi-channel data processing method provided in the embodiments of this application has been described in detail above. The embodiments of this application also provide a structural schematic diagram of an extended reality display device, as shown in Figure 6. The extended reality display device provided in this embodiment includes:

[0125] The neural network processing unit 20, at least two image signal processors 10, and a buffer 30 are provided, with the buffer 30 connecting the neural network processing unit 20 and each image signal processor 10.

[0126] Each image signal processor 10 is used to write each line of data sequentially into the buffer 30;

[0127] The neural network processing unit 20 includes at least two cores, each core being used to acquire a preset row of data output by an image signal processor from the buffer 30, and to perform network inference on the acquired preset row of data in parallel.

[0128] In this process, at least some cores divide the acquired preset row data into multiple data pieces and perform network inference on each data piece according to a preset strategy. The preset strategy is that the movement path starts from the first end of the data piece, moves to the opposite second end, and returns to the first end after reaching the second end, until inference on the data piece is completed.

[0129] In this embodiment, the neural network processing unit includes at least two cores, each core being responsible for calculating the data output by one image signal processor, thereby enabling parallel processing of the data output by at least two image signal processors.
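The one-core-per-image-signal-processor arrangement can be mimicked in software. This sketch is only an analogy: Python worker threads stand in for hardware cores, and `infer_block` is a hypothetical placeholder for network inference, not the model described in this application.

```python
from concurrent.futures import ThreadPoolExecutor


def infer_block(block):
    """Placeholder for network inference on one block of preset rows."""
    return [[2 * value for value in row] for row in block]


def process_channels(blocks):
    """Process one block per image signal processor, one worker per block.

    Results are returned in the same order as the input blocks.
    """
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        return list(pool.map(infer_block, blocks))
```

Calling `process_channels([left_block, right_block])` returns the two processed blocks in the same order.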

[0130] Furthermore, because the neural network processing unit reads and processes data in units of preset rows, image segmentation is no longer needed. This reduces the number of preprocessing and post-processing modules in the image signal processor (ISP), lowering its hardware requirements and reducing its area, power consumption, and cost. Eliminating image segmentation also reduces the computational resources required of the ISP, lowering its computing power demand and power consumption, thereby reducing the overall system computing power demand and improving image processing performance.

[0131] Furthermore, each image signal processor writes the processed data line by line into the buffer. When the preset line is full, the neural network processing unit can read the data from the preset line and perform network inference on it. While the neural network processing unit performs network inference on the read preset line data, the image signal processor can continue writing data into the buffer in line order, overwriting the positions read by the neural network processing unit, thus achieving buffer address reuse and reducing the buffer size. Moreover, by continuously writing data line by line into the buffer and continuously retrieving preset line data from the buffer for network inference, the neural network processing unit can support image computation in a pipelined manner, reducing latency and supporting high frame rates and high resolutions.

[0132] In addition, at least some of the pre-defined rows of data are sliced and network inference is performed on each slice, so that the neural network processing unit only needs to process a small portion of the data at a time, reducing the memory occupied by the neural network processing unit.

[0133] In some examples, considering that the time required for the neural network processing unit to perform network inference on the preset row of data is affected by the single-core computing power and the neural network processing model, the time for the neural network processing unit to read the preset row of data from the buffer can be adjusted according to the actual situation. A read command is then issued to the buffer based on the adjusted read time. Furthermore, if the buffer is not yet full of the preset row of data when the read command is issued, the neural network processing unit can wait until the buffer contains the preset row of data before reading it. This reduces the buffer space, for example, to the space required for two preset rows of data.
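The buffer behavior described above, with line-by-line writes, block reads, and reuse of addresses that have already been read, resembles a ring buffer. The following is a minimal single-threaded sketch assuming a capacity of two preset-row blocks; the class name `LineBuffer` and its methods are hypothetical.

```python
class LineBuffer:
    """Ring buffer of image lines: one writer (ISP), one reader (NPU core)."""

    def __init__(self, preset_rows, blocks=2):
        self.preset_rows = preset_rows
        self.capacity = preset_rows * blocks
        self.slots = [None] * self.capacity
        self.written = 0  # total lines written so far
        self.read = 0     # total lines consumed so far

    def write_line(self, line):
        """Store one line, reusing a slot already read. False if full."""
        if self.written - self.read >= self.capacity:
            return False
        self.slots[self.written % self.capacity] = line
        self.written += 1
        return True

    def read_block(self):
        """Return preset_rows lines, or None if they are not yet available."""
        if self.written - self.read < self.preset_rows:
            return None
        block = [self.slots[(self.read + i) % self.capacity]
                 for i in range(self.preset_rows)]
        self.read += self.preset_rows
        return block
```

A reader that receives `None` simply waits and retries, which corresponds to waiting until the preset rows are present before reading.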

[0134] In some embodiments, a multi-channel image signal processor simultaneously receives image data acquired by multiple image sensors, such as two-channel or three-channel sensors. Each image signal processor processes the received image data and outputs it line by line to a buffer in row-by-row order.

[0135] In one implementation, an image sensor can acquire data for each row of pixels sequentially and transmit it to an image signal processor (ISP). The ISP processes each row of data it receives and outputs it to a buffer. For example, if N rows of data are exposed within a fixed exposure time, the image sensor can output the N rows of data to the ISP one by one. The ISP processes each row of data it receives and outputs it to a buffer until all N rows of data are output to the buffer.

[0136] In some embodiments, there can be multiple buffers, each connected to one image signal processor, and each buffer is used to store the data output by one image signal processor. Accordingly, each core in the neural network processing unit can obtain the data output by one image signal processor from one buffer, improving data throughput and reducing data transmission bottlenecks. For example, if the number of columns corresponding to the preset row data is M and the bit width is 10 bits, then the size of one buffer is 1×N×M×1.25 bytes.
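The buffer size expression can be evaluated as follows. This is a sketch under assumptions: N is taken to be the number of rows held in the buffer, and the leading factor 1 is treated as a channel count; neither reading is confirmed by the text. The function name `buffer_bytes` is hypothetical.

```python
def buffer_bytes(rows, cols, bit_width=10, channels=1):
    """Size in bytes of a buffer holding rows x cols samples.

    A 10-bit sample occupies 10 / 8 = 1.25 bytes, which gives the factor
    1.25 in the expression 1 x N x M x 1.25.
    """
    return channels * rows * cols * bit_width / 8
```

For example, 16 rows of 1920 columns at 10 bits would occupy 38400 bytes under these assumptions.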

[0137] It should be noted that, for the buffer, the image signal processor continuously writes data, and the neural network processing unit continuously reads data. Data is read and written simultaneously, and the read data is refreshed, thus forming a stable data stream.

[0138] For example, the neural network processing unit integrates a Direct Memory Access (DMA) module, which can request a preset row of data from the buffer at once through the AXI (Advanced eXtensible Interface) interface, and then perform slicing processing inside the neural network processing unit.

[0139] In other embodiments, there is only one buffer, which is connected to at least two image signal processors. This buffer is divided into independently addressed regions and stores the data output by the at least two image signal processors, reducing hardware resource usage and complexity. Accordingly, each image signal processor can sequentially write each line of data into the buffer according to its corresponding address, and each core can retrieve a preset line of data output by an image signal processor from the buffer according to its corresponding address.

[0140] In some examples, the neural network processing unit may have a buffer corresponding to each core. The buffers corresponding to the cores are independent of each other and are used to store data read from the corresponding image signal processor, intermediate data calculated by the model, and weight coefficients. Accordingly, the neural network processing unit includes an input data buffer, a weight coefficient buffer, and an intermediate data buffer. The intermediate data may include the convolution calculation results and deconvolution calculation results of the sliced data.

[0141] In some embodiments, the extended reality display device further includes at least two image sensors, each image sensor being connected to an image signal processor, and each image sensor transmitting the acquired image data line by line to the corresponding image signal processor. Accordingly, each image signal processor can process each received line of data and write it into a corresponding buffer, thereby enabling efficient conversion from light signals to the final image.

[0142] For example, an extended reality display device has two image acquisition units, one on the left and one on the right. When the extended reality display is started, the two image acquisition units simultaneously collect exposure data line by line in order, and after the exposure is completed, they collect data and transmit the collected data line by line to the corresponding image signal processor.

[0143] In some embodiments, the virtual reality display device may include a binocular camera, which is connected to two image signal processors. Each image signal processor acquires image data captured by the corresponding camera and writes each line of data sequentially into a buffer. Correspondingly, the dual cores in the neural network processing unit can acquire preset lines of data output by one image signal processor from the buffer and perform network inference on the acquired preset lines of data in parallel.

[0144] In some embodiments, each core is further configured to write the processed preset row data into a corresponding buffer. Each image signal processor obtains the data processed by the neural network processing unit from the corresponding buffer and performs post-processing on the obtained data. The neural network processing unit continuously reads and writes data in preset row units, forming a stable data stream and reducing latency.

[0145] For example, post-processing by an image signal processor may include: black level correction, dynamic range adjustment, data formatting, etc., and post-processing can be performed according to actual needs.

[0146] In practical applications, after the image signal processor has exposed the entire image data, it stops writing data to the buffer. Similarly, after the neural network processing unit obtains the last set of preset row data, it stops retrieving data from the buffer. Furthermore, considering that the neural network processing unit needs to perform network inference after reading the last set of preset row data, it will continue to calculate for a period of time. The calculation time can be determined by the structure of the neural network processing module. The entire image processing is completed when the neural network processing unit writes the last set of preset row data into the buffer. Then, the second image can be processed using the same method.

[0147] To facilitate understanding of the working process of a virtual reality display device, Figure 7 shows a schematic diagram of the device's operation. As shown in Figure 7, taking a virtual reality display device that includes binocular cameras (left and right cameras), two image signal processors, and a neural network processing unit including a first core and a second core as an example, the working process of the virtual reality display device can include the following steps:

[0148] S301. The left-eye camera acquires image data.

[0149] S302. The right-eye camera acquires image data.

[0150] As one implementation, when the virtual reality display device is turned on, the left and right cameras simultaneously acquire image data, for example, exposure data line by line in sequence.

[0151] S303. The first image signal processor acquires image data collected by the left-eye camera, processes the acquired data, and writes it line by line into the first storage space.

[0152] S304. The second image signal processor acquires image data collected by the right eye camera, processes the acquired data, and writes it line by line into the second storage space.

[0153] For example, the first storage space and the second storage space can be different storage spaces in a buffer, or they can be different buffers.

[0154] S305. The neural network processing unit issues a read data command.

[0155] The read instruction is used to retrieve data written by the first image signal processor from the first storage space and data written by the second image signal processor from the second storage space.

[0156] S306. The neural network processing unit obtains preset row data from the first storage space and the second storage space respectively, and performs network inference on the preset row data in parallel through the first core and the second core.

[0157] Figure 8 shows a schematic diagram of the network inference process performed by the first and second cores in the neural network processing unit. As shown in Figure 8, it includes the following steps:

[0158] S307. Divide the m rows of data into multiple pieces.

[0159] For example, divide m rows of data into m / n equal pieces using n columns.

[0160] S308. The first convolutional layer performs convolution calculations on the i-th slice according to a preset strategy.

[0161] Where i is greater than or equal to 1 and less than or equal to m / n.

[0162] For example, the preset strategy may include a zigzag movement path, that is, performing convolution calculations on the i-th slice in a left-to-right and top-to-bottom order.

[0163] For example, the convolution result of the i-th slice can also be stored in the storage space within the corresponding core.

[0164] S309. The second convolutional layer determines whether the convolution calculation result of the first convolutional layer meets the calculation requirements of the convolution kernel of the second convolutional layer.

[0165] If yes, proceed to step S310; otherwise, proceed to step S308.

[0166] S310. Calculate sequentially up to the t-th layer.

[0167] Where t is an integer greater than 2.

[0168] In this step, each convolutional layer from the second convolutional layer to the t-th convolutional layer performs convolution calculations sequentially, and the convolution calculation result of the previous convolutional layer is used as the input data of the next convolutional layer. The previous convolutional layer and the next convolutional layer are two adjacent layers.

[0169] The t-th layer represents the layer with the smallest output after the convolution calculation, which also means the smallest data space.

[0170] Repeat steps S308-S310 until all slice convolution calculations are completed.

[0171] S311. After all slice convolution calculations are completed, the (t+1)th convolutional layer performs deconvolution calculations on the convolution result of the i-th slice.

[0172] S312. The deconvolution calculation continues layer by layer until the size of the deconvolution calculation result of the L-th convolutional layer on the i-th slice is consistent with the size of the i-th slice input to the first convolutional layer.

[0173] Where L is greater than t+1.

[0174] Repeat steps S311 and S312 until all slice deconvolution calculations are completed.

[0175] S313. Each core of the neural network processing unit combines the deconvolution calculation results of each slice and writes them into the corresponding storage space.

[0176] Then, the data processed by the neural network processing unit can be further processed. Specifically, the first image signal processor reads the data processed by the neural network processing unit from the first storage space and performs subsequent processing, and the second image signal processor reads the data processed by the neural network processing unit from the second storage space and performs subsequent processing.
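The per-core flow of steps S307 to S313 can be sketched end to end with simple stand-ins. This is not the network of this application: `downsample` and `upsample` are hypothetical placeholders for stride-2 convolution and deconvolution, and the sketch assumes that the block width is a multiple of the slice width and that slice dimensions are divisible by 2 raised to `depth`.

```python
def split_columns(block, tile_cols):
    """S307: split a block (list of rows) into column slices."""
    width = len(block[0])
    return [[row[c:c + tile_cols] for row in block]
            for c in range(0, width, tile_cols)]


def downsample(tile):
    """Stand-in for a stride-2 convolution: keep every second row/column."""
    return [row[::2] for row in tile[::2]]


def upsample(tile):
    """Stand-in for a stride-2 deconvolution: repeat rows/columns twice."""
    out = []
    for row in tile:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def infer_block(block, tile_cols, depth=2):
    tiles = split_columns(block, tile_cols)
    # S308-S310: run all reducing layers on each slice in turn.
    reduced = []
    for tile in tiles:
        for _ in range(depth):
            tile = downsample(tile)
        reduced.append(tile)
    # S311-S312: restore each slice to its original size.
    restored = []
    for tile in reduced:
        for _ in range(depth):
            tile = upsample(tile)
        restored.append(tile)
    # S313: combine the slices side by side into full rows.
    return [[v for tile in restored for v in tile[r]]
            for r in range(len(block))]
```

The output has the same number of rows and columns as the input block, which is the property that steps S311 and S312 are meant to guarantee.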

[0177] Figure 9 shows a schematic diagram of the hardware structure of an electronic device provided in an embodiment of this application. As shown in Figure 9, the electronic device 20 is used to implement the operation corresponding to the neural network processing unit in any of the above method embodiments. The electronic device 20 in this embodiment may include: a memory 21, a processor 22, and a communication interface 23.

[0178] The memory 21 is used to store computer instructions. The memory 21 may include high-speed random access memory (RAM) and may also include non-volatile memory (NVM), such as at least one disk storage device, and may also be a USB flash drive, external hard drive, read-only memory, disk or optical disc, etc.

[0179] Processor 22 is used to execute computer instructions stored in memory to implement the multi-channel data processing method in the above embodiments. For details, please refer to the relevant descriptions in the foregoing method embodiments. The processor 22 can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), etc. A general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in this invention can be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules within the processor.

[0180] Alternatively, the memory 21 can be either standalone or integrated with the processor 22.

[0181] The communication interface 23 can be connected to the processor 22. The processor 22 can control the communication interface 23 to realize the functions of receiving and sending signals.

[0182] The electronic device provided in this embodiment can be used to execute the above-described multi-channel data processing method. Its implementation and technical effects are similar, and will not be described again here.

[0183] This application also provides a computer-readable storage medium storing computer instructions, which, when executed by a processor, are used to implement the methods provided in the various embodiments described above.

[0184] This application also provides a computer program product including computer instructions stored in a computer-readable storage medium. At least one processor of the device can read the computer instructions from the computer-readable storage medium, and the at least one processor executes the computer instructions to cause the device to perform the methods provided in the various embodiments described above.

[0185] This application also provides a chip including a memory and a processor. The memory stores computer instructions, and the processor retrieves and executes the computer instructions from the memory, causing a device with the chip installed to perform the methods described in the various possible implementations above.

[0186] Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this application are indicated by the following claims.

[0187] It should be understood that this application is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.

Claims

1. A multi-channel data processing method, wherein the method comprises: acquiring preset row data from at least two image signal processors; and performing network inference in parallel on the preset row data output by each of the image signal processors, wherein at least some of the preset row data output by the image signal processors is divided into multiple data slices, and network inference is performed on each data slice according to a preset strategy, the preset strategy being that a movement path starts from a first end of the data slice, moves to the opposite second end, and, after reaching the second end, returns to the first end, until inference on the data slice is completed.
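
The preset strategy of claim 1 can be pictured with a short Python sketch. It is only one possible reading: it assumes the data slice is visited as a sequence of window positions and that each return trip is a reverse sweep. The names `ping_pong_path`, `num_positions`, and `num_passes` are illustrative and do not appear in the application.

```python
from typing import Iterator


def ping_pong_path(num_positions: int, num_passes: int) -> Iterator[int]:
    """Yield window positions along one data slice.

    Assumption (not stated in the claims): even-numbered passes sweep from
    the first end (index 0) to the second end, and odd-numbered passes
    sweep back from the second end to the first end.
    """
    forward = range(num_positions)
    for pass_index in range(num_passes):
        if pass_index % 2 == 0:
            yield from forward            # first end -> second end
        else:
            yield from reversed(forward)  # second end -> first end


# Three passes over a slice with four window positions.
path = list(ping_pong_path(num_positions=4, num_passes=3))
```

In this reading, the sweep resumes at the position where the previous pass ended, so no jump back across the slice is needed between passes.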

2. The method of claim 1, wherein performing network inference on each data slice according to the preset strategy comprises: performing convolution calculation on each data slice according to the preset strategy based on at least some of the convolutional layers until the calculation result meets a preset requirement; and performing deconvolution calculation on the convolution result of each data slice based on the remaining convolutional layers until the number of rows in the deconvolution result of each data slice is the same as the number of rows before the convolution calculation, wherein, for any two adjacent convolutional layers, the calculation result of the previous convolutional layer is used as the input data of the next convolutional layer.

3. The method of claim 1 or 2, wherein determining the number of rows and columns of each data slice comprises: determining the required number of rows and the output shape of each convolutional layer based on the kernel, stride, and padding of that convolutional layer; when the storage space occupied by the smallest output shape is less than the available storage space of the neural network processing unit, determining whether the number of rows of the convolutional layer corresponding to the smallest output shape meets a preset delay requirement; if so, determining the number of rows and columns of each data slice based on the number of rows of the convolutional layer corresponding to the smallest output shape; if not, determining whether the storage space occupied by the second-smallest output shape is less than the available storage space of the neural network processing unit, and so on, until the number of rows and columns of each data slice is determined based on an output shape.
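
The selection loop of claim 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the storage footprint of an output shape is taken as the product of its dimensions times a per-element size, the delay requirement is supplied by the caller as a predicate, and the search stops once a shape no longer fits (larger shapes cannot fit either). `pick_slice_rows`, `meets_latency`, and `bytes_per_element` are hypothetical names, not taken from the application.

```python
from math import prod
from typing import Callable, Optional, Sequence, Tuple

Shape = Tuple[int, int, int]  # (number of kernels, height, width)


def pick_slice_rows(
    candidates: Sequence[Tuple[int, Shape]],
    available_bytes: int,
    meets_latency: Callable[[int], bool],
    bytes_per_element: int = 1,
) -> Optional[int]:
    """Return the row count of the first candidate layer that both fits in
    the available storage and satisfies the delay requirement, trying
    output shapes from smallest to largest. Returns None if none qualifies.

    Each candidate is (required rows of the layer, output shape of the layer).
    """
    ordered = sorted(candidates, key=lambda c: prod(c[1]))
    for rows, shape in ordered:
        footprint = prod(shape) * bytes_per_element
        if footprint >= available_bytes:
            break  # assumption: stop, since every later shape is at least as large
        if meets_latency(rows):
            return rows
    return None


candidates = [(8, (16, 32, 64)), (4, (32, 16, 32)), (2, (64, 8, 16))]
# Hypothetical delay rule for the example only: at least 4 rows per slice.
chosen = pick_slice_rows(candidates, 20000, lambda rows: rows >= 4)
```

The claim derives both the number of rows and the number of columns of a data slice from the chosen layer; the sketch returns only the row count to keep the control flow visible.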

4. The method of claim 3, wherein determining the required number of rows of each convolutional layer based on the kernel, stride, and padding of each convolutional layer comprises: calculating a first difference between the kernel and the padding of the current convolutional layer, and a first product of the first difference and the stride of the current convolutional layer; and determining the number of rows of the current convolutional layer based on the first product and the number of rows of the previous convolutional layer, wherein the number of rows of the first convolutional layer is the difference between the number of rows in the first convolutional layer and the number of padding cells in the first convolutional layer.
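
A small Python sketch may help in reading the recurrence of claim 4. The claim says only that the row count of the current layer is determined "based on" the first product and the row count of the previous layer; the sketch assumes, purely for illustration, that the two are added. The row count of the first layer is passed in by the caller rather than derived. `required_rows` and its parameters are hypothetical names.

```python
from typing import List, Sequence, Tuple


def required_rows(
    first_layer_rows: int,
    later_layers: Sequence[Tuple[int, int, int]],
) -> List[int]:
    """Return the required row count of every convolutional layer.

    later_layers holds (kernel, stride, padding) for the second layer onward.
    Assumption: current rows = previous rows + (kernel - padding) * stride.
    """
    rows = [first_layer_rows]
    for kernel, stride, padding in later_layers:
        first_difference = kernel - padding
        first_product = first_difference * stride
        rows.append(rows[-1] + first_product)  # assumed combination: a sum
    return rows


# Two further layers with kernel 3, stride 2, padding 1; first layer needs 2 rows.
rows_per_layer = required_rows(2, [(3, 2, 1), (3, 2, 1)])
```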

5. The method of claim 3 or 4, wherein determining the output shape of each convolutional layer based on its kernel, stride, and padding comprises: calculating a second difference between the height of the data slice and the convolution kernel of the first convolutional layer, a second product of the padding of the first convolutional layer and a first preset value, and a first sum of the second difference and the second product, and determining the height of the output data of the first convolutional layer based on half of the first sum and a second preset value; calculating a third difference between the width of the data slice and the convolution kernel of the first convolutional layer, and a second sum of the third difference and the second product, and determining the width of the output data of the first convolutional layer based on half of the second sum and the second preset value; calculating a fourth difference between the height of the output data of the previous convolutional layer and the convolution kernel of the current convolutional layer, a third product of the padding of the current convolutional layer and the first preset value, and a third sum of the fourth difference and the third product, and determining the height of the output data of the current convolutional layer based on half of the third sum and the second preset value; calculating a fifth difference between the width of the output data of the previous convolutional layer and the convolution kernel of the current convolutional layer, and a fourth sum of the fifth difference and the third product, and determining the width of the output data of the current convolutional layer based on half of the fourth sum and the second preset value; and
determining the output shape of each convolutional layer based on the number of convolution kernels, the width of the output data, and the height of the output data.
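
The arithmetic of claim 5 resembles the familiar stride-2 convolution output-size formula. The Python sketch below makes that reading explicit under assumptions that the claim does not state: the first preset value is taken as 2, the second preset value as 1, "half" is taken as integer (floor) division, and the half and the second preset value are combined by addition. `conv_output_shapes` and its parameters are illustrative names.

```python
from typing import List, Sequence, Tuple


def conv_output_shapes(
    slice_height: int,
    slice_width: int,
    layers: Sequence[Tuple[int, int, int]],
    first_preset: int = 2,   # assumption: padding is counted on both sides
    second_preset: int = 1,  # assumption: the usual "+1" term
) -> List[Tuple[int, int, int]]:
    """Return (number of kernels, height, width) for every layer.

    layers holds (kernel size, padding, number of kernels) per layer.
    The output of one layer is the input of the next.
    """
    shapes = []
    height, width = slice_height, slice_width
    for kernel, padding, num_kernels in layers:
        pad_term = padding * first_preset                         # second / third product
        height = (height - kernel + pad_term) // 2 + second_preset
        width = (width - kernel + pad_term) // 2 + second_preset
        shapes.append((num_kernels, height, width))
    return shapes


# A 64 x 128 data slice through two layers with kernel 3 and padding 1.
shapes = conv_output_shapes(64, 128, [(3, 1, 16), (3, 1, 32)])
# shapes == [(16, 32, 64), (32, 16, 32)]
```

With these assumed preset values, each layer halves the height and width of its input, which matches the "half of the sum" wording in the claim.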

6. The method of any one of claims 1-5, wherein the method further comprises: pre-compiling a neural network model offline according to the size of the entire image, the neural network model being used to perform network processing on the preset row data output by the image signal processors.

7. An extended reality display apparatus, wherein the apparatus comprises: a neural network processing unit, at least two image signal processors, and a buffer, wherein the buffer is connected to the neural network processing unit and to each of the image signal processors; each of the image signal processors is used to write each line of data sequentially into the buffer; and the neural network processing unit comprises at least two cores, each core being used to acquire, from the buffer, preset row data output by an image signal processor, and the cores performing network inference on the acquired preset row data in parallel, wherein at least some of the cores divide the acquired preset row data into multiple data slices and perform network inference on each data slice according to a preset strategy, the preset strategy being that a movement path starts from a first end of the data slice, moves to the opposite second end, and, after reaching the second end, returns to the first end, until inference on the data slice is completed.

8. The apparatus of claim 7, wherein there are at least two buffers, each buffer being connected to one of the image signal processors, each of the image signal processors is used to write each line of data sequentially into the corresponding buffer, and each of the cores is used to acquire, from the corresponding buffer, preset row data output by the image signal processor; or each of the image signal processors writes each line of data sequentially into the buffer according to a corresponding address, and each core is used to acquire, from the buffer according to the corresponding address, preset row data output by an image signal processor.
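
The first alternative of claim 8 (one buffer per image signal processor) can be modelled in software with a simple line queue. The Python sketch below is a toy model under stated assumptions, not the hardware design: an image signal processor appends one line at a time, and a core receives a block only once the preset number of rows has accumulated. `LineBuffer`, `write_line`, `read_block`, and `preset_rows` are hypothetical names.

```python
from collections import deque
from typing import Deque, List, Optional


class LineBuffer:
    """Toy model of one buffer between an image signal processor and a core."""

    def __init__(self, preset_rows: int) -> None:
        self.preset_rows = preset_rows
        self._lines: Deque[List[int]] = deque()

    def write_line(self, line: List[int]) -> None:
        """Called on the image signal processor side, one line at a time."""
        self._lines.append(line)

    def read_block(self) -> Optional[List[List[int]]]:
        """Called on the core side; returns preset_rows lines, oldest first,
        or None if too few lines have been written so far."""
        if len(self._lines) < self.preset_rows:
            return None
        return [self._lines.popleft() for _ in range(self.preset_rows)]


# Two channels, each with its own buffer.
buffers = [LineBuffer(preset_rows=2), LineBuffer(preset_rows=2)]
buffers[0].write_line([1, 2, 3])
early = buffers[0].read_block()   # None: only one line is available
buffers[0].write_line([4, 5, 6])
block = buffers[0].read_block()   # two lines, ready for one core
```

The second alternative of the claim, a shared buffer addressed per channel, could be modelled the same way by keeping one queue per address region inside a single object.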

9. The apparatus of claim 7 or 8, wherein the apparatus further comprises at least two image sensors, each image sensor being connected to one of the image signal processors; each of the image sensors is used to transmit acquired image data line by line to the corresponding image signal processor; and each of the image signal processors is used to process each received line of data and then write it into the buffer.

10. The apparatus of any one of claims 7-9, wherein each of the cores is also used to write the processed preset row data into the corresponding buffer; and each of the image signal processors obtains the data processed by the neural network processing unit from the corresponding buffer and performs post-processing on the obtained data.

11. The apparatus of any one of claims 7-10, wherein the apparatus further comprises a binocular camera; the binocular camera is connected to two image signal processors, and each of the two image signal processors acquires the image data collected by the corresponding camera and writes each line of data sequentially into the buffer.

12. An electronic device, comprising: a processor, and a memory communicatively connected to the processor; wherein the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory to implement the method of any one of claims 1 to 6.

13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions that, when executed by a processor, implement the method of any one of claims 1 to 6.

14. A computer program product, characterized in that the computer program product includes a computer program that, when executed by a processor, implements the method of any one of claims 1 to 6.