A dual-scheduling-mode neural network accelerator

The neural network accelerator with dual scheduling modes adopts data pixel priority and data channel priority scheduling modes to optimize data transmission and caching, solving the problem of hardware resource waste in the single scheduling mode and improving hardware utilization and computing efficiency.

CN115423083B · Active · Publication Date: 2026-06-19 · INST OF COMPUTING TECH CHINESE ACAD OF SCI

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI
Filing Date: 2022-09-16
Publication Date: 2026-06-19


Abstract

The application discloses a neural network accelerator with dual scheduling modes, which comprises a matrix operation array, a pooling unit and an activation unit, and further comprises an array switching module, a dual scheduling cache module and an auxiliary operation module. The array switching module controls the connection mode between sub-operation units in the matrix operation array to switch the array mode of the matrix operation array, controls how the dual scheduling cache module caches and transmits data, and controls the auxiliary operation module to perform auxiliary operations. The dual scheduling cache module caches neural network data to be processed, obtained from an external storage medium, according to the scheduling mode corresponding to the accelerator, and transmits the data to the matrix operation array according to that scheduling mode. The auxiliary operation module performs addition on the results that the matrix operation array produces in serial array mode, under the control of the array switching module.

Description

Technical Field

[0001] This invention relates to the field of neural network technology, specifically to the field of neural network accelerators, and more specifically to a neural network accelerator with a dual scheduling mode and a neural network data processing method.

Background Technology

[0002] In recent years, deep learning technology has developed rapidly and has been extensively researched and deployed in the field of image processing; it is indispensable for the various visual tasks in autonomous driving and robotics. However, terminal robots are resource-constrained platforms with limited energy, and running high-performance GPUs and CPUs on them consumes a large amount of energy. When computational efficiency is low, much of that energy is wasted on ineffective calculations, making battery life a bottleneck. Improving computational efficiency has therefore become very important.

[0003] Existing neural network accelerator architectures demonstrate good acceleration performance and energy efficiency in neural network-based vision tasks. However, as the number of vision tasks on terminal devices increases, such as depth perception, semantic segmentation, target recognition and detection, and optical flow detection in robots, neural network models require significant computational resources. Because the data sizes of the network layers are irregular across tasks, the data deployment efficiency on the hardware acceleration array decreases under a single scheduling architecture, so accelerator resources are spent on ineffective computation and hardware utilization declines.

[0004] In summary, existing neural network accelerators with a single scheduling mode have the following drawbacks: the data deployment efficiency on the hardware acceleration array is low, which leads to the use of neural network accelerator resources for ineffective computation, resulting in low hardware utilization.

Summary of the Invention

[0005] Therefore, the purpose of this invention is to overcome the shortcomings of the prior art and provide a neural network accelerator with dual scheduling mode and a neural network data processing method.

[0006] According to a first aspect of the present invention, a neural network accelerator with dual scheduling modes is provided, wherein the dual scheduling modes include a scheduling mode that prioritizes data acquisition based on data pixels and a scheduling mode that prioritizes data acquisition based on data channels. The accelerator includes a matrix operation array, a pooling unit, and an activation unit. The matrix operation array includes multiple sub-operation units composed of multipliers and adders. The accelerator further includes an array switching module, a dual scheduling cache module, and an auxiliary operation module. The array switching module includes multiple data gating units, which are respectively connected to each sub-operation unit in the matrix operation array, the dual scheduling cache module, and the auxiliary operation module. The data gating units output a switching control signal according to a preset rule to select the scheduling mode corresponding to the accelerator, thereby controlling the connection method between the sub-operation units in the matrix operation array to achieve array mode switching of the matrix operation array, controlling how the dual scheduling cache module caches and transmits data, and controlling the auxiliary operation module to perform auxiliary operations. Specifically, in a scheduling mode prioritizing data acquisition by data pixels, the matrix operation array is switched to a parallel array mode for neural network data computation, and the dual scheduling cache module is controlled to cache and transmit data in parallel. In a scheduling mode prioritizing data acquisition by data channels, the matrix operation array is switched to a serial array mode for neural network data computation, and the dual scheduling cache module is controlled to cache and transmit data serially. The dual scheduling cache module is used to cache neural network data to be processed from external storage media according to the accelerator's corresponding scheduling mode and to transmit data to the matrix operation array according to the corresponding scheduling mode. The auxiliary operation module is connected to the matrix operation array and the array switching module and includes at least one adder, used to perform addition calculations on the results of operations performed by the matrix operation array in serial array mode based on the control of the array switching module.

[0007] In some embodiments of the present invention, the scheduling mode of acquiring data with priority to data pixels refers to the scheduling mode of acquiring data in units of data from one or more channels of multiple pixels; the scheduling mode of acquiring data with priority to data channels refers to the scheduling mode of acquiring data in units of data from all channels of a single pixel.

[0008] Preferably, the preset rule is: to calculate the amount of invalid data generated when acquiring the current neural network data to be processed with priority of data pixels and priority of data channels respectively, and to use the method with the smaller amount of invalid data in the calculation result as the scheduling mode of the accelerator.

[0009] In some embodiments of the present invention, in the parallel array mode, all sub-operation units in the matrix operation array perform neural network data calculations in parallel; in the serial array mode, all sub-operation units in the matrix operation array perform neural network data calculations in serial mode.

[0010] In some embodiments of the present invention, the dual-scheduling cache module further includes a data prefetch dual-scheduling cache module, a data splicer, and an on-chip dual-scheduling main cache module, wherein: the data prefetch dual-scheduling cache module includes multiple cache units connected to an external storage medium, used to cache neural network data to be processed obtained from the external storage medium according to the scheduling mode corresponding to the accelerator, and to transmit the data processing results to the external storage medium; the data splicer is connected to the data prefetch dual-scheduling cache module and the on-chip dual-scheduling main cache module, used to splice the neural network data to be processed cached by the data prefetch dual-scheduling cache module according to the scheduling mode corresponding to the accelerator, and then transmit it to the on-chip dual-scheduling main cache module; the on-chip dual-scheduling main cache module includes multiple on-chip storage units connected to the data splicer, the matrix operation array, and the array switching module, used to transmit the neural network data to be processed spliced by the data splicer to the matrix operation array for calculation according to the scheduling mode corresponding to the accelerator under the control of the control signal of the array switching module, receive the calculation results, and transmit the results to the data prefetch dual-scheduling cache module via the data splicer.

[0011] Preferably, in the data prefetch dual-scheduling cache module, under the data pixel-first data acquisition scheduling mode, the data of one or more channels of multiple pixels are cached in parallel; under the data channel-first data acquisition scheduling mode, the data of all channels of a single pixel are cached serially. Under the data pixel-first data acquisition scheduling mode, the data splicer splices the data of the same channel of each pixel cached by the data prefetch dual-scheduling cache module into a vector and transmits it in parallel to the on-chip dual-scheduling main cache module; under the data channel-first data acquisition scheduling mode, the data of all channels corresponding to a single pixel are spliced into a vector and transmitted serially to the on-chip dual-scheduling main cache module. Under the data pixel-first data acquisition scheduling mode, the on-chip dual-scheduling main cache module transmits the spliced vector of the same channel of each pixel produced by the data splicer to each sub-operation unit in the matrix operation array for parallel computation, with each sub-operation unit calculating the data of one pixel; under the data channel-first data acquisition scheduling mode, the spliced vector of all channels corresponding to a single pixel produced by the data splicer is transmitted serially to all sub-operation units in the matrix operation array for serial computation.

[0012] In some embodiments of the present invention, the array switching module includes: an input connection switching module, which includes multiple data gating units, each of which is connected to an on-chip storage unit of the on-chip dual-scheduling main cache module, for controlling the on-chip dual-scheduling main cache module to transmit the neural network data to be processed to the matrix operation array according to the scheduling mode corresponding to the accelerator; and an output connection switching module, which includes multiple data gating units, for controlling the on-chip dual-scheduling main cache module to receive the data processing results according to the scheduling mode of the accelerator.

[0013] In some embodiments of the present invention, the accelerator further includes a data transmission interface, an instruction decoder, and a control module, wherein: the data transmission interface includes a data transmission logic device unit and is connected to an external storage medium, the instruction decoder, and the dual scheduling cache module, for transmitting the neural network data to be processed and the data processing results between the external storage medium and the dual scheduling cache module; the instruction decoder includes a data decoding unit and a logic operation unit and is connected to the data transmission interface and the dual scheduling cache module, for decoding the instructions corresponding to the neural network data to be processed; the control module is connected to all other modules in the accelerator and is used to control the operation of each module.

[0014] According to a second aspect of the present invention, a neural network data processing method for an accelerator as described in the first aspect of the present invention is provided. The method includes: S1, selecting a scheduling mode based on the neural network data to be processed and generating corresponding processing instructions, wherein the scheduling mode includes a scheduling mode that prioritizes data acquisition by data pixels and a scheduling mode that prioritizes data acquisition by data channels; S2, the dual scheduling cache module is used to cache the neural network data to be processed acquired from the outside according to the scheduling mode corresponding to the accelerator and to transmit the data to the matrix operation array according to the corresponding scheduling mode; S3, in the scheduling mode that prioritizes data acquisition by data pixels, controlling the matrix operation array to switch to parallel array mode to perform neural network data calculation and controlling the dual scheduling cache module to cache and transmit data in parallel; or in the scheduling mode that prioritizes data acquisition by data channels, controlling the matrix operation array to switch to serial array mode to perform neural network data calculation and controlling the dual scheduling cache module to cache and transmit data in serial mode.
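The mapping in step S3 from the selected scheduling mode to the array mode and cache behaviour can be sketched as follows. This is a minimal illustration only: the names `SchedulingMode`, `AcceleratorConfig`, and `configure` are hypothetical and do not appear in the patent, and the `use_auxiliary_adder` flag reflects the statement that the auxiliary operation module adds the results produced in serial array mode.

```python
from dataclasses import dataclass
from enum import Enum


class SchedulingMode(Enum):
    # Hypothetical labels for the two scheduling modes named in the text.
    PIXEL_PRIORITY = "pixel"      # one or more channels of multiple pixels
    CHANNEL_PRIORITY = "channel"  # all channels of a single pixel


@dataclass(frozen=True)
class AcceleratorConfig:
    array_mode: str            # "parallel" or "serial"
    cache_transfer: str        # "parallel" or "serial"
    use_auxiliary_adder: bool  # auxiliary module sums serial-mode results


def configure(mode: SchedulingMode) -> AcceleratorConfig:
    """Sketch of step S3: derive array and cache behaviour from the mode."""
    if mode is SchedulingMode.PIXEL_PRIORITY:
        # Pixel priority: parallel array mode, parallel caching and transfer.
        return AcceleratorConfig("parallel", "parallel", use_auxiliary_adder=False)
    # Channel priority: serial array mode, serial caching and transfer.
    return AcceleratorConfig("serial", "serial", use_auxiliary_adder=True)


if __name__ == "__main__":
    print(configure(SchedulingMode.PIXEL_PRIORITY))
    print(configure(SchedulingMode.CHANNEL_PRIORITY))
```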

[0015] Preferably, in step S1, the scheduling mode is selected as follows: the amount of invalid data generated when acquiring the current neural network data to be processed is calculated separately when data pixels are prioritized and when data channels are prioritized, and the mode with the smaller amount of invalid data in the calculation results is selected as the scheduling mode of the accelerator.

[0016] Compared with the prior art, the advantages of the present invention are as follows:

[0017] 1. A hardware structure compatible with two scheduling modes is established for the neural network accelerator. Based on the characteristics of the network layer parameters of the neural network data, the better scheduling mode is selected, improving the deployment rate of neural network data on the accelerator's computing array.

[0018] 2. A method for selecting scheduling modes based on the parameters of neural network data network layers is proposed to improve the hardware utilization of neural network accelerators.

Attached Figure Description

[0019] The embodiments of the present invention will be further described below with reference to the accompanying drawings, wherein:

[0020] Figure 1 is a schematic diagram of the structure of a neural network accelerator with dual scheduling mode according to an embodiment of the present invention;

[0021] Figure 2a is a schematic diagram illustrating the principle of acquiring the neural network data to be processed with priority given to data pixels in the dual scheduling mode according to an embodiment of the present invention;

[0022] Figure 2b is a schematic diagram illustrating the principle of acquiring the neural network data to be processed with priority given to data channels in the dual scheduling mode according to an embodiment of the present invention;

[0023] Figure 3a is a schematic diagram of the connection of the computing components in the dual scheduling mode according to an embodiment of the present invention, in which data pixels are given priority to acquire the neural network data to be processed;

[0024] Figure 3b is a schematic diagram of the connection of the computing components in the dual scheduling mode according to an embodiment of the present invention, in which data channels are given priority to acquire the neural network data to be processed;

[0025] Figure 4a is a schematic diagram of the cache structure and data transmission principle inside the dual-scheduling cache module according to an embodiment of the present invention;

[0026] Figure 4b is a schematic diagram of the cache structure and data transmission principle inside the dual-scheduling cache module when the neural network data to be processed is acquired with priority given to data pixels, according to an embodiment of the present invention;

[0027] Figure 4c is a schematic diagram of the cache structure and data transmission principle inside the dual-scheduling cache module when the neural network data to be processed is acquired with priority given to data channels, according to an embodiment of the present invention;

[0028] Figure 5 is a schematic diagram illustrating the switching principle between the dual-scheduling cache module and the array switching module of the matrix operation array according to an embodiment of the present invention;

[0029] Figure 6 is a schematic diagram illustrating the switching principle between the dual-scheduling cache module, the matrix operation array, and the auxiliary operation module according to an embodiment of the present invention;

[0030] Figure 7 is a schematic diagram illustrating the principle of serial-to-parallel mode switching within the dual-scheduling cache module according to an embodiment of the present invention;

[0031] Figure 8 is a schematic flowchart of a neural network data processing method according to an embodiment of the present invention.

Detailed Implementation

[0032] To make the objectives, technical solutions, and advantages of this invention clearer, the invention is further described in detail below through specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.

[0033] The technical concept of this invention is described below. As mentioned in the background, as the number of visual tasks on a terminal device increases, the neural network models running on existing single-scheduling-mode accelerators in the field of image processing require a large amount of computing resources. Furthermore, because the sizes of the neural network layer parameters are irregular across tasks, the data deployment efficiency on the accelerator's hardware acceleration array is low under a single scheduling mode. This results in a large amount of resources being wasted on repetitive computations, leading to low overall hardware utilization of the accelerator. Therefore, the inventors propose a neural network accelerator architecture with a dual scheduling mode. This architecture is a hardware structure compatible with two scheduling modes, capable of sequentially arranging irregular data and selecting an appropriate scheduling mode based on the size characteristics of the neural network layer parameters to acquire and compute data, thereby improving the effective deployment rate of neural network data on the accelerator's computing array and further enhancing the hardware utilization of the neural network accelerator.

[0034] According to one embodiment of the present invention, a neural network accelerator with dual scheduling modes is provided. As shown in Figure 1, the neural network accelerator includes: a data transmission interface 1, an instruction decoder 2, a control module 3, a dual scheduling cache module 4, a matrix operation array 5, an array switching module 6, an auxiliary operation module 7, a pooling unit 8, and an activation unit 9. The modules are connected by connecting lines. Each module is described in detail below.

[0035] First, it should be noted that the scheduling mode indicates the method of acquiring data from external storage media (e.g., DDR, DRAM), and the dual scheduling mode includes a scheduling mode that prioritizes data acquisition by data pixels and a scheduling mode that prioritizes data acquisition by data channels. The scheduling mode prioritizing data acquisition by data pixels refers to acquiring data in units of data from one or more channels of multiple pixels; the scheduling mode prioritizing data acquisition by data channels refers to acquiring data in units of data from all channels of a single pixel. According to one embodiment of the present invention, the amount of invalid data generated when acquiring the neural network data currently to be processed is calculated for both the data pixel-priority and data channel-priority modes, and the mode with the smaller amount of invalid data is used as the scheduling mode of the accelerator. It should be noted that this process is typically executed offline in advance, in the external storage media, to determine the scheduling mode selected by the accelerator, and the accelerator is configured based on the determined scheduling mode.

[0036] The data transmission interface 1 includes a data transmission logic device unit and is connected to an external storage medium, the instruction decoder 2, and the dual scheduling cache module 4. It is used to transmit the neural network data to be processed and the data processing results between the external storage medium and the dual scheduling cache module 4. It should be noted that when the neural network data to be processed is obtained from the external storage medium, the instructions corresponding to the scheduling mode, which is determined in advance from the parameters of the neural network data to be processed, and the weights used during computation are also input into the neural network accelerator.

[0037] The instruction decoder 2 includes a data decoding unit and a logic operation unit, and is connected to the data transmission interface 1 and the dual scheduling cache module 4, for decoding the instructions corresponding to the neural network data to be processed.

[0038] The control module 3 is connected to all other modules in the accelerator and is used to control the operation of each module according to the instructions decoded by the instruction decoder 2.

[0039] The dual-scheduling cache module 4 is used to cache neural network data to be processed obtained from the external storage medium according to the scheduling mode corresponding to the accelerator, and to transmit the data to the matrix operation array 5 according to the corresponding scheduling mode. According to an embodiment of the present invention, the dual-scheduling cache module 4 includes a data prefetch dual-scheduling cache module 41, a data splicer 42, and an on-chip dual-scheduling main cache module 43, wherein: the data prefetch dual-scheduling cache module 41 includes multiple cache units, which are connected to the external storage medium through the data transmission interface 1, and are used to cache neural network data to be processed obtained from the external storage medium according to the scheduling mode corresponding to the accelerator, and to transmit the data processing results to the external storage medium through the data transmission interface 1; the data splicer 42 is connected to the data prefetch dual-scheduling cache module 41 and the on-chip dual-scheduling main cache module 43, and is used to splice the neural network data to be processed cached in the data prefetch dual-scheduling cache module 41 according to the scheduling mode corresponding to the accelerator and then transmit it to the on-chip dual-scheduling main cache module 43; the on-chip dual-scheduling main cache module 43 includes multiple on-chip storage units, which are connected to the data splicer 42, the matrix operation array 5, and the array switching module 6. Under the control of the control signal of the array switching module 6, the on-chip dual-scheduling main cache module 43 transmits the neural network data to be processed, after splicing by the data splicer 42, to the matrix operation array 5 for calculation according to the scheduling mode corresponding to the accelerator, receives the calculation results, and transmits the results to the data prefetch dual-scheduling cache module 41 via the data splicer 42. It should be noted that the data splicer 42 includes multiple register units and multiple data gating units, which can concatenate data to achieve serial transmission or split data to achieve parallel transmission. For example, the serial input of 1, 2, 3, 4 can be concatenated into [1, 2, 3, 4] to achieve parallel transmission. In addition, the data splicer 42 can also rearrange the data, for example, rearrange [1,2,3,4] into [4,3,2,1] to achieve reverse data input. Preferably, the data splicer 42 can also be set between, but not limited to, the data transmission interface 1 and the data prefetch dual-scheduling cache module 41, the on-chip dual-scheduling main cache module 43 and the matrix operation array 5, and other modules, to complete the splitting, splicing and rearrangement of data transmission between different modules.
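The splicing, splitting, and rearrangement behaviour described for the data splicer 42 can be modelled in software roughly as follows. This is a behavioural sketch only, not a description of the register and gating hardware; the function names `splice`, `split`, and `rearrange` are hypothetical.

```python
from typing import Iterable, Iterator, List, Sequence


def splice(serial_inputs: Iterable[int]) -> List[int]:
    """Collect values that arrive one at a time into a single vector."""
    return list(serial_inputs)


def split(vector: Sequence[int]) -> Iterator[int]:
    """Emit the elements of a vector one at a time."""
    for value in vector:
        yield value


def rearrange(vector: Sequence[int], order: Sequence[int]) -> List[int]:
    """Reorder a vector; order[i] is the source index of output element i."""
    return [vector[i] for i in order]


if __name__ == "__main__":
    vec = splice(iter([1, 2, 3, 4]))
    print(vec)                           # [1, 2, 3, 4]
    print(rearrange(vec, [3, 2, 1, 0]))  # [4, 3, 2, 1]
    print(list(split(vec)))              # [1, 2, 3, 4], one element per step
```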

[0040] According to one embodiment of the present invention, the data prefetch dual-scheduling cache module 41 caches data from one or more channels of multiple pixels in parallel under a data pixel-first data acquisition scheduling mode; and caches data from all channels of a single pixel in serial mode under a data channel-first data acquisition scheduling mode. Under the data pixel-first data acquisition scheduling mode, the data splicer 42 splices the data from the same channel of each pixel cached by the data prefetch dual-scheduling cache module 41 into a vector and transmits it in parallel to the on-chip dual-scheduling main cache module 43; under the data channel-first data acquisition scheduling mode, the data splicer 42 splices the data from all channels corresponding to a single pixel cached by the data prefetch dual-scheduling cache module 41 into a vector and transmits it serially to the on-chip dual-scheduling main cache module 43. In the data acquisition scheduling mode prioritizing data pixels, the on-chip dual-scheduling main cache module 43 transmits the data splicing vector of the same channel of each pixel after splicing by the data splicer to each sub-operation unit in the matrix operation array 5 for parallel calculation, and each sub-operation unit calculates the data of one pixel; in the data acquisition scheduling mode prioritizing data channels, the on-chip dual-scheduling main cache module 43 transmits the data splicing vector of all channels corresponding to a single pixel after splicing by the data splicer 42 to all sub-operation units in the matrix operation array 5 for serial calculation.
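The two vector layouts produced by the data splicer 42 can be illustrated with a small sketch. It assumes, for illustration only, that the cached feature data is indexed as `pixels[p][c]` (pixel `p`, channel `c`); the function names are hypothetical.

```python
from typing import List, Sequence


def pixel_priority_vectors(pixels: Sequence[Sequence[int]],
                           channel_ids: Sequence[int]) -> List[List[int]]:
    """Pixel priority: one vector per channel, holding that channel's value
    from every pixel in the group (vectors are then sent in parallel)."""
    return [[px[c] for px in pixels] for c in channel_ids]


def channel_priority_vector(pixel: Sequence[int]) -> List[int]:
    """Channel priority: one vector holding all channels of a single pixel
    (the vector is then sent serially)."""
    return list(pixel)


if __name__ == "__main__":
    # Four pixels with four channels each; the value encodes (pixel, channel).
    pixels = [[10 * p + c for c in range(4)] for p in range(4)]
    print(pixel_priority_vectors(pixels, [0, 1]))
    # [[0, 10, 20, 30], [1, 11, 21, 31]]
    print(channel_priority_vector(pixels[2]))
    # [20, 21, 22, 23]
```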

[0041] The matrix operation array 5 includes multiple sub-operation units composed of multipliers and adders, used to perform matrix multiplication and addition operations.

[0042] The array switching module 6 includes multiple data gating units, which are respectively connected to each sub-operation unit in the matrix operation array 5, the dual-scheduling cache module 4, and the auxiliary operation module 7. Based on a preset rule, the data gating units select the scheduling mode corresponding to the accelerator and output a switching control signal that controls the connection between the sub-operation units in the matrix operation array 5. This switches the array mode of the matrix operation array 5, controls how the dual-scheduling cache module 4 caches and transmits data, and controls the auxiliary operation module 7 to perform auxiliary operations. Specifically, in the data pixel-priority data acquisition scheduling mode, the matrix operation array 5 switches to a parallel array mode for neural network data computation, and the dual-scheduling cache module 4 caches and transmits data in parallel. In the data channel-priority data acquisition scheduling mode, the matrix operation array 5 switches to a serial array mode for neural network data computation, and the dual-scheduling cache module 4 caches and transmits data serially. According to one embodiment of the present invention, the parallel array mode refers to all sub-operation units in the matrix operation array 5 performing neural network data calculations in parallel. In this mode, each sub-operation unit multiplies the neural network data of one of the multiple pixels by the neural network weights, adds the multiplication results using its own adder, and transmits the results in parallel through multiple paths to the subsequent pooling unit 8 and activation unit 9 for pooling and activation operations. The serial array mode refers to all sub-operation units in the matrix operation array 5 performing neural network data calculations serially.
In this mode, all sub-operation units perform a multiplication operation on all channels of a single pixel, combining the neural network weights and neural network data. The intermediate results of the multiplication operation are transmitted to the auxiliary operation module 7 for addition, and the addition results are transmitted serially through a single path to the subsequent pooling unit 8 and activation unit 9 for pooling and activation operations. It should be noted that the convolutional operations involved in the matrix operation array 5 and the auxiliary operation module 7 are well-known in the field of neural networks and will not be elaborated upon here.
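The difference between the two array modes can be sketched as follows. The sketch assumes four multipliers per sub-operation unit (as in the example used later in the description) and assumes that, in serial array mode, each sub-operation unit forwards its raw products to the auxiliary operation module 7; the text does not state whether the units' own adders are also used in that mode. All function names are hypothetical.

```python
from typing import List, Sequence

MULTIPLIERS_PER_UNIT = 4  # assumption taken from the four-multiplier example


def sub_unit_products(values: Sequence[int], weights: Sequence[int]) -> List[int]:
    """Multiplier stage of one sub-operation unit."""
    assert len(values) == len(weights) <= MULTIPLIERS_PER_UNIT
    return [v * w for v, w in zip(values, weights)]


def parallel_array_mode(pixel_slices: Sequence[Sequence[int]],
                        weights: Sequence[int]) -> List[int]:
    """Each sub-operation unit handles one pixel and sums with its own adder.
    Returns one result per pixel (output on multiple parallel paths)."""
    return [sum(sub_unit_products(px, weights)) for px in pixel_slices]


def serial_array_mode(pixel_channels: Sequence[int],
                      weights: Sequence[int],
                      num_units: int = 4) -> int:
    """All sub-operation units work on the channels of a single pixel; the
    auxiliary operation module adds the intermediate results (single path)."""
    intermediates: List[int] = []
    for unit in range(num_units):
        lo = unit * MULTIPLIERS_PER_UNIT
        hi = lo + MULTIPLIERS_PER_UNIT
        intermediates.extend(sub_unit_products(pixel_channels[lo:hi], weights[lo:hi]))
    return sum(intermediates)  # addition performed by the auxiliary adder


if __name__ == "__main__":
    four_pixels = [[1, 2, 3, 4], [0, 1, 0, 1], [2, 2, 2, 2], [1, 0, 0, 0]]
    print(parallel_array_mode(four_pixels, [1, 1, 1, 1]))  # [10, 2, 8, 1]
    print(serial_array_mode(list(range(16)), [1] * 16))    # 120
```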

[0043] According to one embodiment of the present invention, the array switching module 6 includes: an input connection switching module 61, which includes multiple data gating units, each of which is connected to an on-chip storage unit of the on-chip dual-scheduling main cache module, for controlling the on-chip dual-scheduling main cache module 43 to transmit the neural network data to be processed to the matrix operation array 5 according to the scheduling mode corresponding to the accelerator; and an output connection switching module 62, which includes multiple data gating units, for controlling the on-chip dual-scheduling main cache module 43 to receive the data processing results according to the scheduling mode of the accelerator.

[0044] The auxiliary operation module 7 is connected to the matrix operation array 5 and the array switching module 6 and includes at least one adder, which is used to perform addition calculations on the results of the matrix operation array 5 after the operation is performed in the serial array mode based on the control of the array switching module 6.

[0045] The pooling unit 8 includes, but is not limited to, a comparison module and a logic operation module, which are used to perform pooling operations on the operation results after the operation of the matrix operation array 5 or the auxiliary operation module 7, and transmit the results to the activation unit 9.

[0046] The activation unit 9 includes, but is not limited to, a comparison module and a logic operation module, used to perform activation operations on the data after the pooling operation by the pooling unit, and to transmit the data processing result back to the on-chip dual-scheduling main cache module 43. It should be noted that the pooling and activation involved in the above-mentioned pooling unit 8 and activation unit 9 are well-known contents in the field of neural networks, and will not be elaborated on here.

[0047] To better understand the working principles and processes of the above modules, the working process of the accelerator will be explained below with reference to the accompanying drawings and examples. It should be noted that the following explanation uses, as an example, the case where the matrix operation array 5 contains four sub-operation units (sub-operation units 51, 52, 53, and 54, each of which includes four multipliers and one adder) and the on-chip dual-scheduling main cache module 43 contains four on-chip storage units (on-chip storage units 431, 432, 433, and 434).

[0048] According to one example of the invention, Figures 2a and 2b respectively illustrate the principles of the two methods for acquiring the neural network data to be processed under the dual scheduling mode.

[0049] According to one example of the invention, Figures 3a and 3b are schematic diagrams of the connections between the data processing components for each of the two scheduling modes under the dual scheduling mode.

[0050] To facilitate a comparison of the two scheduling modes, the following explains, with reference to Figures 2a and 3a and to Figures 2b and 3b, why two scheduling modes are provided and how data is processed in each mode. First, the amount of invalid data generated when acquiring the current neural network data to be processed is calculated separately for the pixel-priority and channel-priority methods, and the scheduling mode with the smaller amount of invalid data is chosen as the accelerator's scheduling mode. Invalid data is evaluated from the remainders obtained when the width of the input feature map data and the number of channels are each divided by the number of computational units in the matrix operation vector. Specifically, if the remainder when the width parameter of the input feature map data is divided by the number of computational units in the matrix operation vector is less than the remainder when the number of channels is divided by the number of computational units in the matrix operation vector, the pixel-priority method is selected to prefetch the input feature data; if the former remainder is greater than the latter, the channel-priority method is selected to prefetch the input feature data.
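The selection rule in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Mode` and `select_mode` are hypothetical, and the `tie_break` argument is an assumption added here because the text does not say which mode is chosen when the two remainders are equal.

```python
from enum import Enum


class Mode(Enum):
    PIXEL_PRIORITY = "pixel"
    CHANNEL_PRIORITY = "channel"


def select_mode(width, channels, units, tie_break=Mode.PIXEL_PRIORITY):
    """Pick a scheduling mode by comparing the two remainders.

    width    -- width parameter of the input feature map
    channels -- number of input channels
    units    -- number of computational units
    tie_break is an assumption; the text leaves the equal case open.
    """
    if units <= 0:
        raise ValueError("units must be positive")
    width_rem = width % units
    channel_rem = channels % units
    if width_rem < channel_rem:
        return Mode.PIXEL_PRIORITY
    if width_rem > channel_rem:
        return Mode.CHANNEL_PRIORITY
    return tie_break
```

For instance, with four computational units, a layer of width 8 and 6 channels gives remainders 0 and 2, so the pixel-priority mode would be selected; a layer of width 7 and 16 channels gives remainders 3 and 0, so the channel-priority mode would be selected.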

[0051] As shown in Figure 2a, the neural network data to be processed consists of four 16-channel pixels. Assuming the calculation shows that pixel-priority data acquisition produces less invalid data, the pixel-priority scheduling mode is chosen. The neural network data to be processed is prefetched from the external storage medium in units of four pixels from the same neural network layer. Each pixel includes 16 channel feature values, meaning the data is segmented along the channel dimension by different pixels, and multiple periods are used to prefetch consecutive pixels from the segmented data. Each pixel in the selected set shares the same batch of weights, and matrix operations are performed based on these weights. Then, as shown in Figure 3a, since the data to be processed in the example contains four 16-channel pixels and the pixel-priority scheduling mode is selected, the matrix operation array takes four channels of four pixels in each cycle, corresponding to channel weights of size 4×4. Each sub-operation unit calculates one channel of four pixels at a time. Since there are four sub-operation units in the matrix operation array, the multiplication and addition operations in the convolution operation of four 1-channel pixels are completed in parallel. The four sub-operation units share weights. The above steps are repeated for four cycles, and finally the convolution results of the four pixels are output in parallel.
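The four-cycle flow of this example can be modelled in software as follows. This is a hedged sketch, not the circuit itself: it follows the reading in paragraph [0056] that each sub-operation unit handles the data of one pixel, it assumes that partial sums are accumulated across cycles, and the names `sub_unit` and `pixel_priority_conv` are hypothetical.

```python
def sub_unit(inputs, weights):
    """One sub-operation unit: four multipliers feeding one adder."""
    if len(inputs) != 4 or len(weights) != 4:
        raise ValueError("a sub-operation unit takes exactly four operand pairs")
    return sum(x * w for x, w in zip(inputs, weights))


def pixel_priority_conv(pixels, weights):
    """Return one output value per pixel, consuming four channels per cycle.

    pixels  -- list of pixels, each a list of channel values
    weights -- one weight per channel, shared by every pixel
    """
    lanes = 4  # multipliers per sub-operation unit in the example
    if len(weights) % lanes or any(len(p) != len(weights) for p in pixels):
        raise ValueError("channel count must match the weights and be a multiple of 4")
    acc = [0] * len(pixels)  # assumed per-unit accumulator across cycles
    for cycle in range(len(weights) // lanes):
        lo = cycle * lanes
        shared = weights[lo:lo + lanes]  # the same weight slice goes to every unit
        for unit, pixel in enumerate(pixels):
            acc[unit] += sub_unit(pixel[lo:lo + lanes], shared)
    return acc


pixels = [[p + 1] * 16 for p in range(4)]  # four 16-channel pixels
weights = [1] * 16
print(pixel_priority_conv(pixels, weights))  # [16, 32, 48, 64]
```

With 16 channels and four multipliers per unit, the loop runs for four cycles, and the four results become available together, which corresponds to the parallel output described above.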

[0052] As shown in Figure 2b, the neural network data to be processed consists of 16 pixels with 16 channels each. Assuming the calculation shows that channel-priority data acquisition produces less invalid data, the channel-priority scheduling mode is chosen. The neural network data to be processed is prefetched from the external storage medium one pixel per cycle, each prefetch taking all 16 channels of a single pixel of the neural network layer (the prefetching of the next pixel begins only after all channels of one pixel have been prefetched), i.e., the data is segmented along the pixel dimension in units of a single pixel. Then, as shown in Figure 3b, since the data to be processed in the example contains 16 pixels with 16 channels and the channel-priority scheduling mode is selected, the matrix operation array takes all 16 channels of one pixel in each cycle, corresponding to channel weights of size 1×16. The input data is again processed by the four sub-operation units. Since the channel dimension of the data is larger than the dimension of the matrix operation array, the 16 channels of the pixel are split into 4 groups for separate calculation. Each sub-operation unit calculates 4 channels, and the calculation results of the four sub-operation units are input to the auxiliary operation module for accumulation. Finally, the convolution result of one pixel is output serially in each cycle.
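The channel-priority computation for one pixel can be sketched as follows: the 16 channels are split into four groups, each group goes to one sub-operation unit, and the four partial results are summed by the auxiliary operation module. The function names are hypothetical, and the software model is only an illustration of the data flow.

```python
def sub_unit(inputs, weights):
    """One sub-operation unit: four multipliers feeding one adder."""
    if len(inputs) != 4 or len(weights) != 4:
        raise ValueError("a sub-operation unit takes exactly four operand pairs")
    return sum(x * w for x, w in zip(inputs, weights))


def auxiliary_add(partials):
    """Auxiliary operation module: adds the sub-unit results together."""
    return sum(partials)


def channel_priority_conv(pixel, weights, units=4):
    """Return the output value of a single pixel in one modelled cycle."""
    lanes = 4  # multipliers per sub-operation unit in the example
    if len(pixel) != units * lanes or len(weights) != units * lanes:
        raise ValueError("expected units * 4 channels and weights")
    partials = [
        sub_unit(pixel[u * lanes:(u + 1) * lanes], weights[u * lanes:(u + 1) * lanes])
        for u in range(units)
    ]
    return auxiliary_add(partials)


pixel = list(range(16))  # one 16-channel pixel
print(channel_priority_conv(pixel, [1] * 16))  # 120
```

Calling `channel_priority_conv` once per pixel, one pixel after another, mirrors the serial output of one convolution result per cycle.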

[0053] When performing multi-weighted operations, multiple computation vectors can be set up in each sub-operation unit, or the input feature data can be reused by multiple sub-operation units to complete the multi-weighted convolution operation task.

[0054] According to one example of the invention, Figures 4a, 4b, and 4c respectively illustrate the internal cache structure and data transmission principle of the dual-scheduling cache module, and its internal cache structure and data transmission principle under each of the two scheduling modes. These are explained below.

[0055] As shown in Figure 4a, the data prefetch dual-scheduling cache module includes multiple cache units, and the on-chip dual-scheduling main cache module includes multiple on-chip storage units. After the neural network data to be processed is received from the external storage medium through the data transmission interface, the data splicer splices the neural network data to be processed cached in the data prefetch dual-scheduling cache module according to the scheduling mode corresponding to the accelerator and then transmits it to the on-chip dual-scheduling main cache module. Then, the on-chip dual-scheduling main cache module transmits the neural network data to be processed, as spliced by the data splicer, to the matrix operation array for calculation according to the scheduling mode corresponding to the accelerator.

[0056] In the data pixel-priority scheduling mode shown in Figure 4b, the data prefetch dual-scheduling cache module caches the data of one or more channels of multiple pixels in parallel. The data splicer splices the data of the same channel of each pixel cached by the data prefetch dual-scheduling cache module into a vector and transmits it in parallel to the on-chip dual-scheduling main cache module. The on-chip dual-scheduling main cache module transmits the vector spliced by the data splicer from the same-channel data of each pixel to each sub-operation unit in the matrix operation array for parallel calculation. Each sub-operation unit calculates the data of one pixel.

[0057] In the data channel-priority scheduling mode shown in Figure 4c, the data prefetch dual-scheduling cache module caches the data of all channels of a single pixel in a serial manner. The data splicer splices the data of all channels corresponding to a single pixel into a vector and transmits it serially to the on-chip dual-scheduling main cache module. The on-chip dual-scheduling main cache module then serially transmits the spliced vector of all channels corresponding to a single pixel to all sub-operation units in the matrix operation array for serial calculation.
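The two splicing behaviours described for Figures 4b and 4c can be contrasted with a small sketch. It assumes the cached data is held as a list of pixels, each a list of channel values; the function names are hypothetical, and the model ignores timing and buffer sizes.

```python
def splice_pixel_priority(pixels):
    """Return one vector per channel, holding that channel for every pixel.

    The vectors are meant to be forwarded in parallel.
    """
    return [list(same_channel) for same_channel in zip(*pixels)]


def splice_channel_priority(pixels):
    """Yield one vector per pixel, holding all channels of that pixel.

    The generator models serial transmission: one pixel at a time.
    """
    for pixel in pixels:
        yield list(pixel)


# Two pixels with three channels each; value = 10 * pixel index + channel index.
cached = [[0, 1, 2], [10, 11, 12]]
print(splice_pixel_priority(cached))          # [[0, 10], [1, 11], [2, 12]]
print(list(splice_channel_priority(cached)))  # [[0, 1, 2], [10, 11, 12]]
```

In effect the pixel-priority splicer transposes the cached block so that same-channel values sit together, while the channel-priority splicer passes each pixel through whole.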

[0058] According to one example of the invention, Figure 5 shows the switching principle between the dual scheduling cache module 4 and the matrix operation array 5.

[0059] When inputting neural network data, in the data pixel-priority acquisition scheduling mode, the array switching module 6 controls the four pixels to be transmitted to their respective sub-operation units through the gating signal of the gating unit. Since the four pixels share weights, the gating unit copies the input single weight into four and transmits them to the sub-operation units in the matrix operation array 5 to complete the convolution operation of the four pixels. The operation results are then transmitted in parallel to the pooling unit 8 and the activation unit 9 to complete the subsequent pooling and activation operations (the subsequent pooling and activation are not shown in the figure). In the data channel-priority acquisition scheduling mode, the array switching module 6 divides all channels of a single pixel into four groups through the gating signal of the gating unit and transmits the data and weights to the sub-operation units in the matrix operation array 5 to complete the convolution operation. The intermediate results of the sub-operation units are transmitted to the auxiliary operation module 7 for addition operations (the subsequent transmission to the auxiliary operation module 7 is not shown in the figure), and the operation results are transmitted serially to the pooling unit 8 and the activation unit 9 to complete the subsequent pooling and activation operations (the subsequent pooling and activation are not shown in the figure). It should be noted that the control signal is used to change whether the data processed by the matrix operation array or auxiliary operation module is transmitted serially or in parallel.

[0060] When outputting data processing results in the data pixel-priority scheduling mode, since the data of 4 pixels is processed in parallel in each cycle, the array switching module 6 transmits the data processing results of the 4 pixels in parallel through four paths to the on-chip dual-scheduling main cache module 43 via the gating unit (here the data processing result is the result generated by the pooling unit 8 and the activation unit 9 after the operation results of the 4 pixels output by the matrix operation array 5 are transmitted to them in parallel). In the data channel-priority scheduling mode, since the data of a single pixel is processed serially in each cycle, the array switching module 6 transmits the data processing result of that pixel serially through a single path to the on-chip dual-scheduling main cache module 43 via the gating unit (here the data processing result is the result generated by the pooling unit 8 and the activation unit 9 after the operation result of the single pixel output by the auxiliary operation module 7 is transmitted to them serially).

[0061] According to one example of the invention, Figure 6 illustrates the switching principle between the dual-scheduling cache module and the matrix operation array 5 and the auxiliary operation module 7 in the two scheduling modes. In the data pixel-priority scheduling mode, the array switching module 6 transmits the results of the four sub-operation units in parallel through four paths to the subsequent pooling unit 8 and activation unit 9 via a control gating signal, and finally transmits the data processing result back to the on-chip dual-scheduling main cache module 43 in parallel through four paths. In the data channel-priority scheduling mode, the array switching module 6 transmits the intermediate operation results of the four sub-operation units serially to the auxiliary operation module 7 via a control gating signal. After the addition operation by the auxiliary operation module 7, the results are transmitted to the subsequent pooling unit 8 and activation unit 9 via a single path, and the data processing result is finally transmitted back to the on-chip dual-scheduling main cache module 43 serially through a single path.
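The output-side switching can be sketched as a single routing function. This is a simplified model with hypothetical names: the mode strings "pixel" and "channel" stand in for the gating signal, and the pooling and activation stages are omitted.

```python
def route_outputs(mode, unit_results):
    """Model the output-side gating for the two scheduling modes.

    Returns a list with one entry per active output path.
    """
    if mode == "pixel":
        # Four sub-unit results leave on four parallel paths, unchanged.
        return list(unit_results)
    if mode == "channel":
        # Intermediate results are summed by the auxiliary adder
        # and leave on a single path.
        return [sum(unit_results)]
    raise ValueError(f"unknown scheduling mode: {mode!r}")


print(route_outputs("pixel", [16, 32, 48, 64]))   # [16, 32, 48, 64]
print(route_outputs("channel", [6, 22, 38, 54]))  # [120]
```

The length of the returned list corresponds to the number of paths in use: four in the pixel-priority mode and one in the channel-priority mode.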

[0062] According to one example of the invention, Figure 7 illustrates the principle of serial-to-parallel mode switching within the dual-scheduling cache module 4 under the two scheduling modes. The array switching module 6 controls the switching of the data path between the data prefetch dual-scheduling cache module 41 and the on-chip dual-scheduling main cache module 43 in the dual-scheduling cache module 4 through a strobe signal. In the data pixel-priority scheduling mode, data is read and written in parallel and transmitted in parallel from the four on-chip storage units of the on-chip dual-scheduling main cache module 43 via four paths for subsequent operations. In the data channel-priority scheduling mode, data is read and written serially and transmitted serially from the four on-chip storage units of the on-chip dual-scheduling main cache module 43 via a data splicer 42 for subsequent operations via a single path.

[0063] According to an embodiment of the present invention, the present invention also provides a neural network data processing method based on the above-described dual-scheduling-mode neural network accelerator. As shown in Figure 8, the method includes: S1, selecting a scheduling mode based on the neural network data to be processed and generating corresponding processing instructions, wherein the scheduling modes include a scheduling mode that prioritizes data acquisition by data pixels and a scheduling mode that prioritizes data acquisition by data channels; S2, using the dual scheduling cache module to cache the neural network data to be processed acquired from the outside according to the scheduling mode corresponding to the accelerator and to transmit the data to the matrix operation array according to the corresponding scheduling mode; S3, in the scheduling mode that prioritizes data acquisition by data pixels, controlling the matrix operation array to switch to parallel array mode to perform neural network data calculation and controlling the dual scheduling cache module to cache and transmit data in parallel; or, in the scheduling mode that prioritizes data acquisition by data channels, controlling the matrix operation array to switch to serial array mode to perform neural network data calculation and controlling the dual scheduling cache module to cache and transmit data in serial mode.
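The mapping in step S3 from scheduling mode to control settings can be written down as a small lookup. The `ControlSignals` record and its field names are hypothetical and only illustrate the correspondence; the `use_auxiliary_adder` field reflects the description of the auxiliary operation module, which operates on results produced in the serial array mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlSignals:
    array_mode: str            # "parallel" or "serial"
    cache_transfer: str        # "parallel" or "serial"
    use_auxiliary_adder: bool  # True when sub-unit results must be summed


def control_for(mode):
    """Return the control settings for a scheduling mode, as in step S3."""
    if mode == "pixel":
        return ControlSignals("parallel", "parallel", False)
    if mode == "channel":
        return ControlSignals("serial", "serial", True)
    raise ValueError(f"unknown scheduling mode: {mode!r}")


print(control_for("pixel"))
print(control_for("channel"))
```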

[0064] The following example illustrates the data processing process of a single network layer using a dual-scheduling neural network accelerator. The specific data processing process includes five steps, T1-T5, which will be explained below.

[0065] In step T1, in offline mode, the external storage medium determines the scheduling mode based on the parameters of the neural network data to be processed and generates corresponding processing instructions. According to one embodiment of the present invention, when the neural network accelerator obtains the neural network data to be processed from the external storage medium in offline mode, it determines the corresponding scheduling mode based on its parameters and generates instructions corresponding to the scheduling mode. These instructions include, but are not limited to, accelerator activation, data retrieval method, scheduling mode switching, matrix operations, etc. Completing this step offline reduces the computation time of data in the neural network accelerator, thereby improving computational efficiency. It should be noted that the neural network data to be processed in the external storage medium can only be input into the neural network accelerator via the data transmission interface through the data prefetch dual-scheduling cache module.

[0066] In step T2, the accelerator inputs the neural network data, processing instructions, and weights to be processed from the external storage medium to the data prefetch dual-scheduling cache module in the dual-scheduling cache module via the data transmission interface. It should be noted that the neural network accelerator first prefetches and caches the data, instructions, and weights to be processed through the data prefetch dual-scheduling cache module in the dual-scheduling cache module.

[0067] In step T3, the instruction decoder loads processing instructions from the data prefetch dual-scheduling cache module and parses them to issue control instructions corresponding to different scheduling modes to the control unit. The control unit, based on the control instructions corresponding to different scheduling modes, controls the array switching module to switch the array mode of the matrix operation array and controls the dual-scheduling cache module to transmit data to the matrix operation array in a specific manner. According to one embodiment of the present invention, the data prefetch dual-scheduling cache module transmits data serially or in parallel to the on-chip dual-scheduling main cache module in different ways corresponding to the dual-scheduling modes.

[0068] In step T4, the control unit controls the dual scheduling cache module to transmit the neural network data and weights to be processed to the matrix operation array and auxiliary operation module for operation according to the scheduling mode, and performs pooling and activation of the operation results to generate data processing results, and transmits the data processing results to the dual scheduling cache module for caching.

[0069] In step T5, the data processing result is transmitted from the on-chip dual-scheduling main cache module to the data prefetch dual-scheduling cache module, and then output to the external storage medium via the data transmission interface. It should be noted that the data processing result can only be transmitted to the external storage medium via the data transmission interface through the data prefetch dual-scheduling cache module.

[0070] Compared with the prior art, the advantages of the present invention are as follows:

[0071] 1. A hardware structure compatible with two-dimensional scheduling was established for the neural network accelerator. Based on the characteristics of the network layer parameters of the neural network data, a better scheduling method was selected to improve the deployment rate of neural network data in the accelerator's computing array.

[0072] 2. A method for selecting scheduling modes based on the parameters of neural network data network layers is proposed to improve the hardware utilization of neural network accelerators.

[0073] It should be noted that although the steps are described in a specific order above, it does not mean that the steps must be executed in the above specific order. In fact, some of these steps can be executed concurrently, or even in a different order, as long as the required function can be achieved.

[0074] This invention can be a system, a method, and/or a computer program product. A computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for causing a processor to implement various aspects of the invention.

[0075] Computer-readable storage media can be tangible devices that hold and store instructions for use by an instruction execution device. Computer-readable storage media can be, for example, but are not limited to, electrical storage devices, magnetic storage devices, optical storage devices, electromagnetic storage devices, semiconductor storage devices, or any suitable combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination thereof.

[0076] The various embodiments of the present invention have been described above. These descriptions are exemplary and not exhaustive, and the invention is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies in the market, or to enable others skilled in the art to understand the embodiments disclosed herein.

Claims

1. A dual-scheduling-mode neural network accelerator, wherein the dual scheduling mode includes a scheduling mode that prioritizes data acquisition based on data pixels and a scheduling mode that prioritizes data acquisition based on data channels. The accelerator includes a matrix operation array, a pooling unit, and an activation unit. The matrix operation array includes multiple sub-operation units composed of multipliers and adders. The accelerator further includes an array switching module, a dual scheduling cache module, and an auxiliary operation module. The array switching module includes multiple data gating units, which are respectively connected to each sub-operation unit in the matrix operation array, the dual scheduling cache module, and the auxiliary operation module. The data gating units output switching control signals according to preset rules based on the accelerator's corresponding scheduling mode to control the connection method between the sub-operation units in the matrix operation array. This enables the switching of the matrix operation array's array mode, control of the dual scheduling cache module's data caching and transmission methods, and control of the auxiliary operation module to perform auxiliary operations. Specifically, in the scheduling mode prioritizing data acquisition by data pixels, the matrix operation array is controlled to switch to a parallel array mode for neural network data computation, and the dual scheduling cache module is controlled to cache and transmit data in parallel. In the scheduling mode prioritizing data acquisition by data channels, the matrix operation array is controlled to switch to a serial array mode for neural network data computation, and the dual scheduling cache module is controlled to cache and transmit data serially.
The dual-scheduling cache module is used to cache neural network data to be processed from external storage medium according to the scheduling mode corresponding to the accelerator, and to transmit the data to the matrix operation array according to the corresponding scheduling mode. The auxiliary operation module is connected to the matrix operation array and the array switching module and includes at least one adder, which is used to perform addition calculations on the results of the matrix operation array after the operation is performed in the serial array mode based on the control of the array switching module. The scheduling mode of acquiring data based on data pixels refers to the scheduling mode of acquiring data in units of data from one or more channels of multiple pixels; the scheduling mode of acquiring data based on data channels refers to the scheduling mode of acquiring data in units of data from all channels of a single pixel.

2. The accelerator of claim 1, wherein the preset rule is as follows: calculate the amount of invalid data generated when acquiring the neural network data to be processed with priority of data pixels and with priority of data channels respectively, and use the method with the smaller amount of invalid data in the calculation result as the scheduling mode of the accelerator.

3. The accelerator according to claim 2, characterized in that, in the parallel array mode, all sub-operation units in the matrix operation array perform neural network data calculations in parallel; in the serial array mode, all sub-operation units in the matrix operation array perform neural network data calculations in serial mode.

4. The accelerator of claim 3, wherein the dual-scheduling cache module further includes a data prefetch dual-scheduling cache module, a data splicer, and an on-chip dual-scheduling main cache module, wherein: the data prefetch dual-scheduling cache module includes multiple cache units, which are connected to the external storage medium and are used to cache the neural network data to be processed obtained from the external storage medium according to the scheduling mode corresponding to the accelerator and to transmit the data processing results to the external storage medium; the data splicer is connected to the data prefetch dual-scheduling cache module and the on-chip dual-scheduling main cache module, and is used to splice the neural network data to be processed cached in the data prefetch dual-scheduling cache module according to the scheduling mode corresponding to the accelerator and then transmit it to the on-chip dual-scheduling main cache module; the on-chip dual-scheduling main cache module includes multiple on-chip storage units, which are connected to the data splicer, the matrix operation array, and the array switching module, and, under the control of the control signal of the array switching module, the on-chip dual-scheduling main cache module transmits the neural network data to be processed, as spliced by the data splicer, to the matrix operation array for calculation according to the scheduling mode corresponding to the accelerator, receives the calculation results, and transmits the results to the data prefetch dual-scheduling cache module via the data splicer.

5. The accelerator according to claim 4, characterized in that: the data prefetch dual-scheduling cache module caches data from one or more channels of multiple pixels in parallel under the data pixel-priority scheduling mode, and caches data from all channels of a single pixel in serial mode under the data channel-priority scheduling mode; in the data pixel-priority scheduling mode, the data splicer splices the data of the same channel of each pixel cached by the data prefetch dual-scheduling cache module into a vector and transmits it in parallel to the on-chip dual-scheduling main cache module, and in the data channel-priority scheduling mode, it splices the data of all channels corresponding to a single pixel into a vector and transmits it serially to the on-chip dual-scheduling main cache module; in the data pixel-priority scheduling mode, the on-chip dual-scheduling main cache module transmits the vector spliced by the data splicer from the same-channel data of each pixel to each sub-operation unit in the matrix operation array for parallel computation, with each sub-operation unit calculating the data of one pixel, and in the data channel-priority scheduling mode, it transmits the vector spliced by the data splicer from all channels corresponding to a single pixel serially to all sub-operation units in the matrix operation array for serial computation.

6. The accelerator of claim 5, wherein the array switching module includes: an input connection switching module, which includes multiple data gating units, each of which is connected to an on-chip storage unit of the on-chip dual-scheduling main cache module, and which is used to control the on-chip dual-scheduling main cache module to transmit the neural network data to be processed to the matrix operation array according to the scheduling mode corresponding to the accelerator; and an output connection switching module, which includes multiple data gating units and is used to control the on-chip dual-scheduling main cache module to receive data processing results according to the accelerator's scheduling mode.

7. The accelerator of claim 6, wherein the accelerator also includes a data transmission interface, an instruction decoder, and a control module, wherein: the data transmission interface includes a data transmission logic device unit and is connected to an external storage medium, the instruction decoder, and the dual-scheduling cache module, and is used to realize the transmission of neural network data to be processed and data processing results between the external storage medium and the dual-scheduling cache module; the instruction decoder includes a data decoding unit and a logic operation unit, and is connected to the data transmission interface and the dual-scheduling cache module, for decoding the instructions corresponding to the neural network data to be processed; the control unit is connected to all other units in the accelerator and is used to control the operation of each module.

8. A neural network data processing method for an accelerator as described in any one of claims 1-7, characterized in that the method includes: S1, selecting a scheduling mode based on the neural network data to be processed and generating corresponding processing instructions, wherein the scheduling mode includes a scheduling mode that prioritizes data acquisition based on data pixels and a scheduling mode that prioritizes data acquisition based on data channels; S2, using the dual-scheduling cache module to cache neural network data to be processed from the outside according to the scheduling mode corresponding to the accelerator and to transmit the data to the matrix operation array according to the corresponding scheduling mode; S3, in the scheduling mode of prioritizing data acquisition by data pixels, controlling the matrix operation array to switch to parallel array mode for neural network data calculation and controlling the dual scheduling cache module to cache and transmit data in parallel; or, in the scheduling mode of prioritizing data acquisition by data channels, controlling the matrix operation array to switch to serial array mode for neural network data calculation and controlling the dual scheduling cache module to cache and transmit data in serial mode.

9. The method of claim 8, wherein, in step S1, the scheduling mode is selected as follows: the amount of invalid data generated when acquiring the current neural network data to be processed is calculated separately when data pixels are prioritized and when data channels are prioritized, and the mode with the smaller amount of invalid data in the calculation results is selected as the scheduling mode of the accelerator.

10. A computer-readable storage medium, characterized in that it stores a computer program that can be executed by a processor to implement the steps of the method as described in any one of claims 8-9.

11. An electronic device, comprising: one or more processors; and a storage device for storing one or more programs, which, when executed by the one or more processors, cause the electronic device to perform the steps of the method as described in any one of claims 8-9.