Neural network device and method of operating the same
By employing on-chip buffer memory and computing circuitry in the neural network device, and optimizing memory address allocation and operation sequence, the problem of high energy consumption in low-power systems of neural network devices is solved, achieving more efficient energy utilization and processing performance.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SAMSUNG ELECTRONICS CO LTD
- Filing Date
- 2020-06-28
- Publication Date
- 2026-06-12
AI Technical Summary
Existing neural network devices require frequent memory accesses when processing large amounts of computation, resulting in excessive energy consumption, especially in low-power systems such as mobile devices or IoT devices where resources are limited. It is necessary to reduce the energy consumption required to process large amounts of data.
It employs on-chip buffer memory and computing circuitry to receive and store input feature maps through a single port, and prioritizes write operations in each cycle. Combined with the controller to manage memory address allocation, it reduces access to external memory.
By reducing access to external memory, the power consumption of the neural network device is reduced, while processing efficiency and system performance are improved.
Figure CN113033790B_ABST
Description
[0001] This application claims the benefit of Korean Patent Application No. 10-2019-0162910, filed on December 9, 2019, with the Korean Intellectual Property Office, the entire disclosure of which is incorporated herein by reference for all purposes.
Technical Field
[0002] The following description relates to neural network devices and methods of operating neural network devices.
Background Technology
[0003] Neural network devices are computing systems based on computational architectures. Neural network technology can analyze input data and extract useful information from it.
[0004] Neural network devices typically require computation on large amounts of complex input data. To handle such computations, a typical neural network device often reads or writes large amounts of data from or to memory, resulting in significant energy consumption due to frequent memory accesses. Low-power and high-performance systems, such as mobile devices or Internet of Things (IoT) devices, typically have limited resources and therefore require techniques to reduce the energy consumption needed to process large amounts of data.
Summary of the Invention
[0005] This summary is provided to introduce, in a simplified form, the selection of concepts further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to help determine the scope of the claimed subject matter.
[0006] In one general aspect, a neural network device includes an on-chip buffer memory, computing circuitry, and a controller. The on-chip buffer memory is configured to store an input feature map of a first layer of the neural network. The computing circuitry is configured to receive the input feature map of the first layer through a single port of the on-chip buffer memory and perform neural network operations on it to output an output feature map of the first layer corresponding to the input feature map of the first layer. The controller is configured to send the output feature map of the first layer to the on-chip buffer memory through the single port, so that the output feature map of the first layer is stored together with the input feature map of the first layer in the on-chip buffer memory. The output feature map of the first layer is reused as the input feature map for the neural network operation of the subsequent second layer.
[0007] The computing circuitry can also be configured to perform neural network operations based on one or more computational loops. The controller can also be configured to: in each cycle, perform a read operation to read at least a portion of the data constituting the input feature map of the first layer from the on-chip buffer memory via the single port, with each of the one or more computational loops being executed in each cycle. When a write operation is requested to write at least a portion of the data constituting the output feature map of the first layer to the on-chip buffer memory via the single port at the time when the read operation would be executed, the write operation can be performed prior to the read operation.
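The write-before-read rule on a shared port can be illustrated with a small arbiter. This is only a minimal sketch under stated assumptions: the function name `arbitrate_single_port`, the request format, and the limit of one access per cycle are hypothetical and are not taken from the disclosure.

```python
from collections import deque

def arbitrate_single_port(read_requests, write_requests_by_cycle):
    """Simulate a single-port buffer that serves one access per cycle.

    read_requests: read labels, one wanted per cycle, in order.
    write_requests_by_cycle: dict mapping cycle index -> write label
        requested at that cycle (hypothetical input format).
    Returns the list of (cycle, kind, label) accesses actually served.
    """
    pending_reads = deque(read_requests)
    pending_writes = deque()
    schedule = []
    cycle = 0
    while (pending_reads or pending_writes
           or any(c >= cycle for c in write_requests_by_cycle)):
        if cycle in write_requests_by_cycle:
            pending_writes.append(write_requests_by_cycle[cycle])
        if pending_writes:
            # A requested write takes the port first; the read is deferred.
            schedule.append((cycle, "write", pending_writes.popleft()))
        elif pending_reads:
            schedule.append((cycle, "read", pending_reads.popleft()))
        cycle += 1
    return schedule

schedule = arbitrate_single_port(["r0", "r1", "r2"], {1: "w0"})
```

In this run the write requested at cycle 1 is served at cycle 1, and the read that would have used that cycle moves to cycle 2.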
[0008] The controller can allocate a first memory address for storing the input feature map of the first layer and a second memory address for storing the output feature map of the first layer in different directions.
[0009] The controller can allocate a first memory address from the starting point of the memory address corresponding to the storage space of the on-chip buffer memory along a first direction, and allocate a second memory address from the ending point of the memory address corresponding to the storage space of the on-chip buffer memory along a second direction opposite to the first direction.
[0010] When the output feature map of the first layer stored in the second memory address is reused as the input feature map of the second layer, and when the output feature map of the second layer corresponding to the input feature map of the second layer is output from the computing circuit, the controller may allocate a third memory address from the starting point along the first direction for storing the output feature map of the second layer in the on-chip buffer memory.
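The opposite-direction allocation described above can be sketched as follows. This is a hedged illustration, not the claimed implementation: the function name `allocate_layers`, the half-open `(start, end)` address ranges, and the unit-less sizes are assumptions made for the example.

```python
def allocate_layers(buffer_size, first_input_size, layer_output_sizes):
    """Return one (input_range, output_range) pair per layer.

    Ranges are half-open (start, end) address ranges. A map placed "from
    the starting point" grows upward from address 0; a map placed "from
    the ending point" grows downward from the end of the buffer.
    """
    allocations = []
    input_size = first_input_size
    input_at_start = True  # the first input is allocated from the start
    for output_size in layer_output_sizes:
        if input_size + output_size > buffer_size:
            raise ValueError("input and output do not fit together")
        if input_at_start:
            input_range = (0, input_size)
            output_range = (buffer_size - output_size, buffer_size)
        else:
            input_range = (buffer_size - input_size, buffer_size)
            output_range = (0, output_size)
        allocations.append((input_range, output_range))
        # The output stays in place and is reused as the next layer's input.
        input_size = output_size
        input_at_start = not input_at_start
    return allocations

plan = allocate_layers(buffer_size=8, first_input_size=2, layer_output_sizes=[6, 1])
```

Because each output is written at the end opposite to its input, the next layer can overwrite the previous input region without moving the data it still needs.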
[0011] Neural network operations may include convolution, activation, and pooling operations. The computational circuit may also be configured to output the results of pooling, convolution, and activation operations performed on the input feature map of the first layer as the output feature map of the first layer.
[0012] The neural network device may further include a weight buffer memory configured to store the weight values of the first layer for neural network operations on the input feature maps of the first layer. The weight buffer memory can receive the weight values of the first layer from an external memory outside the neural network device through a single port of the weight buffer memory, and can also send the weights of the first layer to the computation circuitry through the same single port.
[0013] On-chip buffer memory, computing circuitry, and controllers can be installed on a single chip.
[0014] The neural network device may also include an auxiliary buffer memory. When the output feature map of the second layer, corresponding to the input feature map of the second layer, is output from the computing circuit, the controller can determine whether the total size of the input feature map and the output feature map of the second layer exceeds the size of the on-chip buffer memory. If the total size exceeds the size of the on-chip buffer memory, the controller can temporarily store the output feature map of the second layer in the auxiliary buffer memory instead of the on-chip buffer memory. The output feature map of the second layer temporarily stored in the auxiliary buffer memory can be sent to external memory outside the neural network device at preset intervals.
[0015] When the output feature map of the second layer is reused as the input feature map for the subsequent neural network operation of the third layer, the controller can determine whether the size of the output feature map of the third layer exceeds the size of the on-chip buffer memory when the output feature map of the third layer corresponding to the input feature map of the third layer is output from the computing circuit. If the size of the output feature map of the third layer exceeds the size of the on-chip buffer memory, the controller can temporarily store the output feature map of the third layer in the auxiliary buffer memory. If the size of the output feature map of the third layer is less than or equal to the size of the on-chip buffer memory, the controller can store the output feature map of the third layer in the on-chip buffer memory.
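The two size checks described above can be written as a pair of small decision functions. This is a minimal sketch; the function names and the string labels `"auxiliary"` and `"on_chip"` are hypothetical, and sizes are in arbitrary units.

```python
def destination_for_second_layer_output(input_size, output_size, on_chip_size):
    """Second layer: compare the combined input and output size to the buffer."""
    if input_size + output_size > on_chip_size:
        return "auxiliary"
    return "on_chip"

def destination_for_third_layer_output(output_size, on_chip_size):
    """Third layer: compare only the output size to the buffer."""
    if output_size > on_chip_size:
        return "auxiliary"
    return "on_chip"

second = destination_for_second_layer_output(input_size=6, output_size=4, on_chip_size=8)
third = destination_for_third_layer_output(output_size=8, on_chip_size=8)
```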
[0016] In another general aspect, a method of operating a neural network device includes: storing an input feature map of a first layer of the neural network in an on-chip buffer memory; sending the input feature map of the first layer to a computing circuit via a single port of the on-chip buffer memory; when the computing circuit performs neural network operations on the input feature map of the first layer, outputting an output feature map of the first layer corresponding to the input feature map of the first layer; and sending the output feature map of the first layer to the on-chip buffer memory via the single port, thereby storing the output feature map of the first layer together with the input feature map of the first layer in the on-chip buffer memory. The output feature map of the first layer is reused as the input feature map for the neural network operation of the subsequent second layer.
[0017] The method may further include: performing a read operation in each cycle to read at least a portion of the data constituting the input feature map of the first layer from the on-chip buffer memory via the single port, to perform neural network operations based on the one or more operation loops, each of the one or more operation loops being executed in each cycle. When a write operation is requested to write at least a portion of the data constituting the output feature map of the first layer to the on-chip buffer memory via the single port at the time when the read operation would be executed, the write operation is performed prior to the read operation.
[0018] The step of storing the output feature map of the first layer together with the input feature map of the first layer in the on-chip buffer memory may include: allocating a first memory address for storing the input feature map of the first layer and a second memory address for storing the output feature map of the first layer in different directions.
[0019] The allocation step may include: allocating a first memory address for storing the input feature map of the first layer from the starting point of the memory address corresponding to the storage space of the on-chip buffer memory along a first direction, and allocating a second memory address for storing the output feature map of the first layer from the ending point of the memory address corresponding to the storage space of the on-chip buffer memory along a second direction opposite to the first direction.
[0020] When the output feature map of the first layer stored in the second memory address is reused as the input feature map of the second layer, when the output feature map of the second layer corresponding to the input feature map of the second layer is output from the computing circuit, a third memory address for storing the output feature map of the second layer in the on-chip buffer memory can be allocated from the starting point along the first direction.
[0021] Neural network operations may include convolution, activation, and pooling operations. The output steps may include: outputting the results of performing pooling, convolution, and activation operations on the input feature map of the first layer as the output feature map of the first layer.
[0022] When the weight values of the first layer are sent to the weight buffer memory from an external memory outside the neural network device through a single port of the weight buffer memory, the weight values of the first layer used for neural network operations can be stored in the weight buffer memory; and the weight values of the first layer are sent from the weight buffer memory to the computing circuit through a single port of the weight buffer memory.
[0023] When the output feature map of the second layer, corresponding to the input feature map of the second layer, is output from the computing circuit, it can be determined whether the total size of the input feature map and the output feature map of the second layer exceeds the size of the on-chip buffer memory. If it is determined that the total size exceeds the size of the on-chip buffer memory, the output feature map of the second layer can be temporarily stored in an auxiliary buffer memory instead of the on-chip buffer memory. The output feature map of the second layer temporarily stored in the auxiliary buffer memory can be sent to an external memory outside the neural network device based on a preset period.
[0024] When the output feature map of the second layer is reused as the input feature map for the subsequent neural network operation of the third layer, when the output feature map of the third layer corresponding to the input feature map of the third layer is output from the computing circuit, it can be determined whether the size of the output feature map of the third layer exceeds the size of the on-chip buffer memory. If the size of the output feature map of the third layer exceeds the size of the on-chip buffer memory, the output feature map of the third layer can be temporarily stored in the auxiliary buffer memory; conversely, if the size of the output feature map of the third layer is less than or equal to the size of the on-chip buffer memory, the output feature map of the third layer can be stored in the on-chip buffer memory.
[0025] A non-transitory computer-readable recording medium storing instructions that, when executed by a processor, enable the processor to control the execution of the above methods.
[0026] Other features and aspects will become clear from the following detailed description, the accompanying drawings, and the claims.
Attached Figure Description
[0027] Figure 1 is a diagram illustrating an example relationship between the input feature map and the output feature map in a neural network.
[0028] Figure 2 is a diagram illustrating an example neural network architecture.
[0029] Figure 3 is a block diagram of an example neural network device.
[0030] Figure 4 is a diagram illustrating an example memory address allocation scheme for storing feature maps in on-chip buffer memory.
[0031] Figure 5 is a block diagram of an example neural network device.
[0032] Figure 6 is a diagram illustrating an example movement path of a feature map of a neural network device.
[0033] Figure 7 illustrates an example computation loop that is executed to perform neural network operations.
[0034] Figure 8 is a block diagram of an example electronic system.
[0035] Figure 9 is a flowchart illustrating an example operation method of a neural network device.
[0036] Figure 10 is a flowchart illustrating an example operation method of a neural network device.
[0037] Throughout the accompanying drawings and detailed embodiments, the same reference numerals denote the same elements. The drawings may not be to scale, and for clarity, illustration, and convenience, the relative dimensions, scale, and depiction of elements in the drawings may be exaggerated.
Detailed Implementation
[0038] The following detailed embodiments are provided to aid the reader in gaining a comprehensive understanding of the methods, apparatus, and / or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatus, and / or systems described herein will become apparent upon understanding this disclosure. For example, the order of operations described herein is merely illustrative and is not limited to the order set forth herein; rather, the order of operations may be altered, as will become clear upon understanding this disclosure, except for operations that must occur in a specific order. Furthermore, for clarity and conciseness, descriptions of features known upon understanding this disclosure may be omitted.
[0039] The features described herein may be implemented in different forms and are not to be construed as limited to the examples described herein. Rather, the examples described herein have been provided merely to illustrate some of the many feasible ways of implementing the methods, apparatus, and / or systems described herein that will be clear upon understanding the disclosure of this application.
[0040] As used herein, the term “and / or” includes any one of the associated listed items and any combination of any two or more.
[0041] Although terms such as “first,” “second,” and “third” may be used herein to describe various components, assemblies, regions, layers, or parts, these components, assemblies, regions, layers, or parts are not limited by these terms. Rather, these terms are used only to distinguish one component, assembly, region, layer, or part from another. Thus, without departing from the teaching of the examples described herein, a first component, first assembly, first region, first layer, or first part may also be referred to as a second component, second assembly, second region, second layer, or second part.
[0042] The terminology used herein is for the purpose of describing various examples only and is not intended to limit the disclosure. Unless the context clearly indicates otherwise, the singular form is intended to include the plural form as well. The terms “comprising,” “including,” and “having” indicate the presence of the features, quantities, operations, components, elements, and / or combinations thereof stated therein, but do not preclude the presence or addition of one or more other features, quantities, operations, components, elements, and / or combinations thereof.
[0043] As will become clear upon understanding the disclosure of this application, the features of the examples described herein can be combined in various ways. Furthermore, although the examples described herein have various configurations, other configurations are possible, as will become clear upon understanding the disclosure of this application.
[0044] Figure 1 is a diagram illustrating an example relationship between the input feature map and the output feature map in a neural network.
[0045] The neural network can be a deep neural network (DNN) or an n-layer neural network. A DNN or an n-layer neural network can be, for example, a convolutional neural network (CNN), a recurrent neural network (RNN), a deep belief network, or a restricted Boltzmann machine.
[0046] In Figure 1, in a neural network layer, a first feature map FM1 may correspond to an input feature map, and a second feature map FM2 may correspond to an output feature map. The feature maps may represent a dataset expressing various features of the input data. Feature maps FM1 and FM2 may have elements of a two-dimensional matrix or elements of a three-dimensional matrix, and pixel values may be defined in each element. Each of the feature maps FM1 and FM2 has a width W (also referred to as a column), a height H (also referred to as a row), and a depth D. The depth D may correspond to the number of nodes. It should be noted that the term "may" (e.g., what an example or embodiment may include or implement) is used with respect to examples or embodiments to indicate that at least one example or embodiment includes or implements such a feature, while all examples and embodiments are not limited thereto.
[0047] A convolution operation can be performed on a first feature map FM1 and a weight map WM, and as a result, a second feature map FM2 can be generated. The weight map WM filters the features of the first feature map FM1 by performing a convolution operation with the first feature map FM1 using weight parameters defined in each element of the weight map WM. The weight map WM is convolved with a window (or tile) of the first feature map FM1 while shifting across the first feature map FM1 in a sliding-window manner. During each shift, each weight parameter included in the weight map WM can be multiplied with the corresponding pixel value in the overlapping window of the first feature map FM1, and the multiplication results are summed. As the first feature map FM1 and the weight map WM are convolved, a node of the second feature map FM2 can be generated. Although only one weight map WM is shown in Figure 1, multiple weight maps can be convolved with the first feature map FM1 to generate a second feature map FM2 with multiple nodes.
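The sliding-window multiply-and-sum described above can be sketched in a few lines. This is a minimal illustration under assumptions the text does not state: a single two-dimensional feature map, a stride of 1, and no padding. The function name `convolve2d` is hypothetical.

```python
def convolve2d(feature_map, weight_map):
    """Slide weight_map over feature_map (stride 1, no padding).

    At each window position, every weight is multiplied with the pixel
    it overlaps and the products are summed into one output element.
    """
    fh, fw = len(feature_map), len(feature_map[0])
    kh, kw = len(weight_map), len(weight_map[0])
    output = []
    for r in range(fh - kh + 1):
        row = []
        for c in range(fw - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += weight_map[i][j] * feature_map[r + i][c + j]
            row.append(acc)
        output.append(row)
    return output

fm1 = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
wm = [[1, 0],
      [0, 1]]
fm2 = convolve2d(fm1, wm)
```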
[0048] The second feature map FM2 can correspond to the input feature map of the next layer. For example, the second feature map FM2 can be the input feature map of a pooling (or subsampling) layer.
[0049] Figure 2 is a diagram illustrating an example neural network architecture.
[0050] In the example of Figure 2, neural network 2 has a structure including an input layer, hidden layers, and an output layer. Neural network 2 can perform operations based on received input data (e.g., I1 and I2) and can generate output data (e.g., O1 and O2) based on the results of the operations.
[0051] As mentioned above, neural network 2 can be an n-layer neural network or a DNN comprising two or more hidden layers. For example, as shown in Figure 2, neural network 2 can be a DNN comprising an input layer (layer 1), two hidden layers (layers 2 and 3), and an output layer (layer 4). When neural network 2 is implemented using a DNN architecture, it includes more layers capable of processing effective information, thus enabling it to handle datasets of higher complexity than a neural network with a single layer. Neural network 2 is shown as comprising four layers, but this is merely an example, and neural network 2 may include fewer or more layers, or fewer or more nodes. That is, neural network 2 may include layers with structures different from those shown in Figure 2.
[0052] Each layer in the neural network 2 may include multiple nodes. Each node may correspond to multiple artificial nodes (known as neurons, PEs, units, or similar terms). For example, as shown in Figure 2, the input layer 1 may include two nodes (channels), and the hidden layers 2 and 3 may each include three channels CH1, CH2, and CH3. However, this is just an example, and each layer included in the neural network 2 may include various numbers of nodes (channels).
[0053] The nodes in each layer of neural network 2 can connect to each other to process data. For example, a node can receive data from other nodes, perform calculations on the data, and output the results to other nodes.
[0054] The input and output of each node can be referred to as input activation and output activation, respectively. That is, activation can be the output of a node, or it can be parameters corresponding to the inputs of nodes included in the next layer. Each node can determine its own activation based on the activations, weights, and biases received from nodes included in the previous layer. Weights are parameters used to compute the output activation in each node and can be values assigned through iterative training based on the connections between nodes.
[0055] Each node can be processed by a computing unit or processing element that receives an input and outputs an output activation, and the input and output of each node can be mapped to each other. For example, when $\sigma$ is the activation function, $w_{jk}^{i}$ is the weight from the k-th node included in layer (i-1) to the j-th node included in layer i, $b_{j}^{i}$ is the bias of the j-th node included in the i-th layer, and $a_{j}^{i}$ is the activation of the j-th node in the i-th layer, the activation $a_{j}^{i}$ can be calculated using Equation 1 below.
[0056] Equation 1:
[0057] $$a_{j}^{i} = \sigma\left(\sum_{k}\left(w_{jk}^{i} \times a_{k}^{i-1}\right) + b_{j}^{i}\right)$$
[0058] As shown in Figure 2, the activation of the first node in the second layer (i.e., hidden layer 2) can be denoted by $a_{1}^{2}$. According to Equation 1, $a_{1}^{2}$ can have the value $a_{1}^{2} = \sigma\left(w_{1,1}^{2} \times a_{1}^{1} + w_{1,2}^{2} \times a_{2}^{1} + b_{1}^{2}\right)$. However, Equation 1 described above is merely an example used to describe the activations, weights, and biases used to process data in a neural network 2, and is not limited thereto. Activation can be a value obtained by passing a weighted sum of activations received from the previous layer to an activation function (such as a sigmoid function or a rectified linear unit (ReLU) function).
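The computation of a node's activation as an activation function applied to the weighted sum of the previous layer's activations plus a bias can be sketched as follows. This is an illustrative sketch only; the function names and the numeric values are hypothetical and are not taken from the disclosure.

```python
import math

def sigmoid(x):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Rectified linear unit (ReLU) activation function."""
    return max(0.0, x)

def node_activation(weights, prev_activations, bias, activation_fn=sigmoid):
    """Activation of one node: activation_fn(sum_k(w_k * a_k) + bias)."""
    weighted_sum = sum(w * a for w, a in zip(weights, prev_activations))
    return activation_fn(weighted_sum + bias)

# First node of a hidden layer fed by two input-layer nodes (illustrative values).
a_sigmoid = node_activation([0.5, -0.5], [1.0, 1.0], bias=0.0)
a_relu = node_activation([1.0, 2.0], [3.0, -4.0], bias=1.0, activation_fn=relu)
```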
[0059] As mentioned above, in neural network 2, large datasets are exchanged between multiple interconnected nodes and undergo multiple computational processes across multiple layers. Therefore, a technique is needed that can reduce the power consumption required to read or write large amounts of data for multiple computational processes.
[0060] For ease of description, Figure 1 and Figure 2 show only a schematic architecture of neural network 2. However, unlike the architecture shown in Figure 1 and Figure 2, neural network 2 can be implemented with more or fewer layers, feature maps, weight maps, and so on, and their sizes can also be modified in various other examples.
[0061] Figure 3 is a block diagram of an example neural network device 300.
[0062] In Figure 3, the neural network device 300 may include an on-chip buffer memory 310, computing circuitry 320, and a controller 330. The neural network device 300 can send data to and receive data from an external memory 390 located outside the neural network device 300. The neural network device 300 is an on-chip device, and the components within the neural network device 300 can be mounted on a single chip.
[0063] The neural network device 300 can be a hardware accelerator designed to execute neural networks. The neural network device 300 can be used to improve the processing speed of electronic systems that include the neural network device 300 as an accelerator.
[0064] In addition to the components shown in Figure 3, the neural network device 300 may also include other general-purpose components.
[0065] On-chip buffer memory 310 refers to memory disposed in the chip corresponding to the neural network device 300. By storing input feature maps and output feature maps together, the on-chip buffer memory 310 can reduce the movement of feature maps out of the neural network device 300 (for example, compared to a typical neural network device). Therefore, the access to external memory 390 required to read / write input or output feature maps can be reduced compared to such a typical example.
[0066] The on-chip buffer memory 310 can be configured as one memory address space. A memory address space defines a range of memory addresses used to store data, and represents a space in which feature maps that have been allocated memory addresses can be stored. For example, when the range of memory addresses corresponding to the memory address space of the on-chip buffer memory 310 is 0x0000 to 0xFFFF, an input feature map or output feature map is allocated memory addresses corresponding to at least a portion of 0x0000 to 0xFFFF, and can therefore be stored in the on-chip buffer memory 310. A typical memory in a typical neural network device includes multiple memory address spaces, and the memory address space storing the input feature map is separate from the memory address space storing the output feature map. In contrast, the on-chip buffer memory 310 according to one or more embodiments can store both the input feature map and the output feature map together in one memory address space.
[0067] Because the on-chip buffer memory 310 stores the input feature map and the output feature map together in one memory address space, its capacity can be reduced compared to that of a typical neural network device, thereby reducing the area occupied by the neural network device 300. A typical memory, in which the input and output feature maps are stored in separate buffer memories or in separate memory address spaces within a single buffer memory, requires space to accommodate the sum of the maximum size of the input feature map and the maximum size of the output feature map.
[0068] For example, when the sizes of the input and output feature maps of the first layer are 2 megabytes (MB) and 6 MB, respectively, and the sizes of the input and output feature maps of the second layer are 7 MB and 1 MB, respectively, a typical buffer memory must have a capacity capable of holding at least 13 MB of data. In contrast, as a non-limiting example, the on-chip buffer memory 310 according to the embodiment may only need a capacity capable of holding 8 MB of data in the above example. A memory address allocation method for storing the input feature map and the output feature map together in one address space of the on-chip buffer memory 310 is described further below with reference to Figure 4.
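The 13 MB and 8 MB figures in this example follow from two different capacity rules, which the short sketch below works through. The function names are hypothetical; the layer sizes are the ones given in the example.

```python
def separate_space_capacity(layers):
    """Separate input/output spaces: largest input plus largest output."""
    return max(i for i, _ in layers) + max(o for _, o in layers)

def shared_space_capacity(layers):
    """One shared space: largest per-layer sum of input and output."""
    return max(i + o for i, o in layers)

# (input size, output size) in MB for the first and second layers.
layers_mb = [(2, 6), (7, 1)]
typical_mb = separate_space_capacity(layers_mb)  # 7 + 6
shared_mb = shared_space_capacity(layers_mb)     # max(2 + 6, 7 + 1)
```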
[0069] The on-chip buffer memory 310 uses a single port. The use of a single port in the on-chip buffer memory 310 means that the port used for reading the feature map and the port used for writing the feature map are the same. The on-chip buffer memory 310 can move feature maps stored in a memory address space through a single port connected to a memory address space. Compared to a typical buffer memory with the same capacity using two ports, the on-chip buffer memory 310 using a single port can have half the area and power consumption. In other words, when the area of the on-chip buffer memory 310 using a single port is the same as that of a typical buffer memory using two ports, the on-chip buffer memory 310 using a single port can have twice the capacity of a typical buffer memory using two ports.
[0070] The width of the single port of the on-chip buffer memory 310 can be determined based on the storage unit of the feature map. The width of the single port can be TN words, where TN is the degree of parallelism of the feature map. The degree of parallelism is a variable representing the unit of features, among those constituting the feature map, that can be processed in a single operation. A word refers to the number of bits required to represent the data corresponding to one feature. For example, the number of bits in a word can be determined by the decimal-point format of the data, which can be selected as a floating-point or fixed-point format in various examples of the neural network device 300.
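As a small worked example of the port-width rule, the width in bits is the degree of parallelism multiplied by the word size. The function name and the specific values (TN = 16, 16-bit and 32-bit words) are illustrative assumptions and do not come from the disclosure.

```python
def port_width_bits(tn, bits_per_word):
    """Single-port width in bits for a port that is TN words wide."""
    if tn <= 0 or bits_per_word <= 0:
        raise ValueError("tn and bits_per_word must be positive")
    return tn * bits_per_word

# Hypothetical configurations: 16 features per operation.
width_fixed16 = port_width_bits(tn=16, bits_per_word=16)
width_float32 = port_width_bits(tn=16, bits_per_word=32)
```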
[0071] In one embodiment, the on-chip buffer memory 310 may store input feature maps included in a first layer of a neural network. The first layer is not limited to the first layer in the neural network, but rather refers to any layer included in a neural network, and is a term used to distinguish one layer from the other layers.
[0072] The computing circuit 320 may be a hardware configuration for outputting an output feature map corresponding to the input feature map by performing neural network operations on the input feature map. For example, the computing circuit 320 may receive the input feature map of the first layer through a single port of the on-chip buffer memory 310, and perform neural network operations on the input feature map to output the output feature map of the first layer.
[0073] In one example, neural network operations may include convolution, activation, and pooling operations. Computational circuitry 320 may perform activation operations on the result of convolution on the input feature map, perform pooling operations on the result of the activation operations, and output the result of the pooling operations as an output feature map. In one example, all three operations of computational circuitry 320 may be executed on a single chip without accessing external memory. In addition to the operations described above, computational circuitry 320 may include various operations used for neural network operations (such as batch normalization).
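The ordering described above (activation applied to the convolution result, then pooling applied to the activation result) can be sketched as follows. This is a hedged illustration: the choice of ReLU as the activation and of 2x2 max pooling with stride 2 is an assumption made for the example, and the function names are hypothetical. The convolution result is taken as given.

```python
def relu_map(feature_map):
    """Apply ReLU element-wise to a 2-D map."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(feature_map[r][c], feature_map[r][c + 1],
                           feature_map[r + 1][c], feature_map[r + 1][c + 1]))
        pooled.append(row)
    return pooled

def layer_output(conv_result):
    """Activation on the convolution result, then pooling on the activation."""
    return max_pool_2x2(relu_map(conv_result))

conv_result = [[1, -2, 3, 0],
               [-4, 5, -6, 7],
               [0, 0, -1, -1],
               [2, -3, -1, -5]]
out = layer_output(conv_result)
```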
[0074] The controller 330 can function as the overall controller of the neural network device 300. For example, the controller 330 can control the operation of the on-chip buffer memory 310. The controller 330 can be implemented as an array of logic gates, or it can be implemented as a combination of a general-purpose microprocessor and a memory storing instructions that can be executed in the microprocessor.
[0075] The controller 330 can send the input feature map stored in the on-chip buffer memory 310 to the computing circuit 320 through a single port of the on-chip buffer memory 310. The controller 330 can also send the output feature map to the on-chip buffer memory 310 through a single port of the on-chip buffer memory 310. The output feature map is the output result of neural network operation on the input feature map.
[0076] The controller 330 can store the input feature map and the output feature map together in the on-chip buffer memory 310. For example, the controller 330 can send the output feature map of the first layer to the on-chip buffer memory 310 via the single port of the on-chip buffer memory 310, and store the output feature map of the first layer and the input feature map of the first layer together in the on-chip buffer memory 310. The output feature map of the first layer can be reused as the input feature map for the neural network operation of the second layer, which is the layer following the first layer.
[0077] For example, compared to a typical neural network device, when the feature map is reused using the on-chip buffer memory 310, access to the external memory 390 for writing and reading the feature map can be excluded because the feature map is only moved and stored in the neural network device 300 with an on-chip structure.
[0078] Figure 4 is a diagram illustrating an example memory address allocation scheme for storing feature maps in on-chip buffer memory.
[0079] In order to store the input feature map and the output feature map together in one memory address space, the neural network device (e.g., the neural network device 300 of Figure 3) can employ a method of adjusting the points and directions used to allocate memory addresses for feature maps in the on-chip buffer memory.
[0080] Neural network devices can allocate memory addresses for storing input and output feature maps in a specific layer in opposite directions. For example, while the memory address of either the input or output feature map is allocated starting from the beginning of the memory address (memory address: 0) and moving towards the end of the memory address (memory address: maximum), the memory address of the other feature map in the input and output feature maps can be allocated starting from the end of the memory address and moving towards the beginning of the memory address.
[0081] In the example of Figure 4, the first memory address 410 of the on-chip buffer memory used to store the input feature map (IFM) of the first layer can be allocated starting from the beginning of the memory address and in the direction toward the end of the memory address, and the second memory address 420 of the on-chip buffer memory used to store the output feature map (OFM) of the first layer can be allocated starting from the end of the memory address and in the direction toward the beginning of the memory address.
[0082] When the output feature map of the first layer stored in the second memory address 420 is reused as the input feature map of the second layer, the output feature map of the second layer corresponding to the input feature map of the second layer can be output from the computing circuit. The third memory address 430 of the on-chip buffer memory used to store the output feature map of the second layer can be allocated starting from the beginning of the memory address and allocated in the direction towards the end of the memory address.
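As a non-limiting illustration, the alternating allocation directions can be sketched with half-open address ranges. The buffer size of 100 and the feature map sizes are hypothetical values, and the function names are assumptions introduced for this sketch.

```python
def allocate(buffer_size, fm_size, from_start):
    """Return the half-open address range [lo, hi) assigned to a feature map."""
    if fm_size > buffer_size:
        raise ValueError("feature map does not fit in the buffer")
    if from_start:
        return (0, fm_size)                        # grows from address 0 upward
    return (buffer_size - fm_size, buffer_size)    # grows from the end downward


def overlaps(a, b):
    """True if two half-open ranges share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


def plan_layers(buffer_size, fm_sizes):
    """fm_sizes[0] is the first layer's IFM; fm_sizes[i] is the OFM of layer i.

    Consecutive feature maps are allocated in opposite directions."""
    return [allocate(buffer_size, size, from_start=(i % 2 == 0))
            for i, size in enumerate(fm_sizes)]


ranges = plan_layers(100, [40, 50, 30])
print(ranges)  # [(0, 40), (50, 100), (0, 30)]
```

In this sketch the third range reuses the space of the first range, which is acceptable because the first layer's input feature map is no longer needed once the second layer's output is being written.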
[0083] According to the memory address allocation scheme of Figure 4, the neural network device can store the input feature map and the output feature map together while reducing overlap between them within a single memory address space. The neural network device can thereby reduce access to external memory by storing both the input and output feature maps in the on-chip buffer memory. Both the storage and movement of feature maps can be performed within a neural network device with an on-chip architecture.
[0084] As mentioned above, neural network devices can move and store feature maps internally within a neural network device with an on-chip architecture, thereby reducing the power consumption caused by accessing external memory. However, when it is difficult to store the input and output feature maps together in the on-chip buffer memory because of their total size, the neural network device can operate in a compatibility mode. In the following text, the situation in which the input feature map and the output feature map cannot be stored together in the on-chip buffer memory, regardless of the memory address allocation scheme of Figure 4, is further described with reference to Figure 5.
[0085] Figure 5 is a block diagram of an example neural network device 500.
[0086] In Figure 5, the neural network device 500 may include an on-chip buffer memory 510, computing circuitry 520, a controller 530, and an auxiliary buffer memory 550. The neural network device 500 can send data to and receive data from an external memory 590 located outside the neural network device 500. The neural network device 500 is an on-chip device, and the components of the neural network device 500 can be mounted on a single chip. The on-chip buffer memory 510, computing circuitry 520, and controller 530 of Figure 5 can respectively correspond to the on-chip buffer memory 310, computing circuit 320, and controller 330 of Figure 3. Therefore, redundant descriptions are omitted.
[0087] The controller 530 can determine whether the total size of the input feature map and the output feature map exceeds the size of the on-chip buffer memory 510. If the total size of the input feature map and the output feature map exceeds the size of the on-chip buffer memory 510, so that the neural network device 500 cannot store the input feature map and the output feature map together in the on-chip buffer memory 510, the neural network device 500 can be selected to operate in a compatibility mode. In compatibility mode, the neural network device 500 can use an auxiliary buffer memory 550 other than the on-chip buffer memory 510 to store the feature maps.
[0088] The auxiliary buffer memory 550 can be selectively operated only in compatibility mode. When the output feature map output from the computing circuit 520 cannot be stored in the on-chip buffer memory 510, the auxiliary buffer memory 550 can temporarily hold the output feature map. The output feature map temporarily held in the auxiliary buffer memory 550 can be sent to the external memory 590 according to a preset period. The output feature map can be stored in the external memory 590, transferred from the external memory 590 to the computing circuit 520, and reused as an input feature map for the neural network operation of the next layer.
[0089] Figure 6 is a diagram illustrating an example movement path of a feature map of a neural network device.
[0090] In Figure 6, the neural network device includes a weight buffer memory 640, an on-chip buffer memory 610, an auxiliary buffer memory 650, and a local bus 660. The neural network device may also include, for example, computing circuitry and a controller. The on-chip buffer memory 610, computing circuitry, auxiliary buffer memory 650, and controller of Figure 6 can respectively correspond to the on-chip buffer memory 510, computing circuitry 520, auxiliary buffer memory 550, and controller 530 of Figure 5; therefore, repeated descriptions of them are omitted. The computing circuitry may include convolution operation circuitry 621, activation operation circuitry 622, and pooling operation circuitry 623. External memory 690 is located outside the neural network device.
[0091] T_Z is the parallelism of the input feature map, a variable representing the unit of features that constitute the input feature map and can be processed in a single operation. T_M is the parallelism of the output feature map, a variable representing the unit of features that constitute the output feature map and can be processed in a single operation.
[0092] Access to external memory 690 is performed during the processes of inputting external data into the first layer (input layer) of the neural network device, outputting data from the last layer (output layer) of the neural network device, and reading weights. In other layers, operations of the neural network device (such as moving, calculating, and storing feature maps) can be performed entirely within the neural network device with an on-chip architecture without accessing external memory 690.
[0093] The weight buffer memory 640 stores weights used for neural network operations on the input feature map. The weight buffer memory 640 has a single port. The weight buffer memory 640 can receive weights from external memory 690 via this single port. The weight buffer memory 640 can also send weights to the convolution operation circuit 621 via this single port for neural network operations on the input feature map. The convolution operation circuit 621 performs convolution operations on the input feature map based on the weights.
[0094] The input feature map stored in the on-chip buffer memory 610 can be sent to the convolution operation circuit 621, and a convolution operation can be performed on the input feature map based on the weights. The result of the convolution operation can be input to the activation operation circuit 622, where an activation operation can be performed. The result of the activation operation can be input to the pooling operation circuit 623, where a pooling operation can be performed directly. The output feature map, as the result of the pooling operation, can be sent to the on-chip buffer memory 610 and stored together with the input feature map. However, when the total size of the input and output feature maps exceeds the size of the on-chip buffer memory 610, so that the neural network device cannot store the input and output feature maps together in the on-chip buffer memory 610, the neural network device can operate in compatibility mode. When the neural network device operates in compatibility mode, the auxiliary buffer memory 650 can be operated. When the total size of the input feature map stored in the on-chip buffer memory 610 and the output feature map output from the pooling operation circuit 623 exceeds the size of the on-chip buffer memory 610, the controller (e.g., the controller 330 of Figure 3 or the controller 530 of Figure 5) can temporarily store the output feature map in the auxiliary buffer memory 650. The output feature map temporarily stored in the auxiliary buffer memory 650 can be sent to the external memory 690 according to a preset period.
[0095] For example, when the total size of the input feature map of the second layer stored in the on-chip buffer memory 610 and the output feature map of the second layer output from the computing circuit exceeds the size of the on-chip buffer memory 610, the controller can temporarily store the output feature map of the second layer in the auxiliary buffer memory 650.
[0096] When the input feature map is stored in external memory 690, the controller can determine the storage location of the output feature map based on whether the size of the output feature map exceeds the size of the on-chip buffer memory 610. For example, when the output feature map of the second layer is stored in external memory 690 and is reused as the input feature map of the third layer, the controller can determine whether the size of the output feature map of the third layer output from the computing circuit exceeds the size of the on-chip buffer memory 610. When the size of the output feature map of the third layer exceeds the size of the on-chip buffer memory 610, the controller can temporarily store the output feature map of the third layer in auxiliary buffer memory 650. When the size of the output feature map of the third layer is less than or equal to the size of the on-chip buffer memory 610, the controller can store the output feature map of the third layer in the on-chip buffer memory 610.
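As a non-limiting illustration, the controller's choice of storage location described above can be sketched as a single decision function. The names `Location` and `choose_ofm_location` and the example sizes are assumptions made for this sketch.

```python
from enum import Enum


class Location(Enum):
    ON_CHIP = "on-chip buffer memory"
    AUXILIARY = "auxiliary buffer memory"


def choose_ofm_location(ifm_size, ofm_size, buffer_size, ifm_on_chip):
    """Decide where the output feature map (OFM) of a layer is stored.

    If the input feature map (IFM) resides in the on-chip buffer, both maps
    must fit together; if the IFM resides in external memory, only the OFM
    has to fit."""
    required = ifm_size + ofm_size if ifm_on_chip else ofm_size
    if required > buffer_size:
        return Location.AUXILIARY   # compatibility mode
    return Location.ON_CHIP


# Hypothetical sizes, in arbitrary units.
print(choose_ofm_location(60, 50, 100, ifm_on_chip=True).value)
print(choose_ofm_location(200, 80, 100, ifm_on_chip=False).value)
```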
[0097] However, even when the neural network device selects to operate in compatibility mode and stores the feature map in external memory 690, the minimum portion of the feature map required for reuse can be stored in on-chip buffer memory 610, and the remainder can be stored in external memory 690.
[0098] The neural network device can reduce the size of the output feature map by processing pooling operations within the computing circuitry rather than in a separate layer. For example, through the pooling operation, the size of the output feature map can be reduced by a factor equal to the square of the pooling stride. Therefore, the possibility of storing the output feature map in the on-chip buffer memory 610 can be increased, and access to the external memory 690 can be reduced.
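As a non-limiting illustration, the size reduction can be checked numerically. The sketch assumes that the row and column counts are divisible by the pooling stride and that the pooling window equals the stride; the feature map dimensions are hypothetical.

```python
def ofm_words(rows, cols, features, pool_stride=1):
    """Number of words in the output feature map after pooling with the given stride."""
    return (rows // pool_stride) * (cols // pool_stride) * features


without_pooling = ofm_words(32, 32, 16)               # 16384 words
with_pooling = ofm_words(32, 32, 16, pool_stride=2)   # 4096 words
print(without_pooling // with_pooling)                # 4, the square of the stride
```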
[0099] Local bus 660 is the data (feature maps or weights) movement path between the buffer memory and other components using a single port. Feature maps can move between on-chip buffer memory 610 and computing circuitry via local bus 660. Weights can move between weight buffer memory 640 and external memory 690 or computing circuitry via local bus 660. When a large amount of data must be sent via a single port, local bus 660 prevents collisions on the single port. When feature map read operations and feature map write operations are requested simultaneously, the order of read and write operations can be adjusted on local bus 660, thus preventing collisions between read and write operations within the single port.
[0100] Figure 7 illustrates an example computation loop that is executed to perform neural network operations.
[0101] M represents the number of features in the output feature map, R and C represent the number of rows and columns in the output feature map, Z represents the number of features in the input feature map, K represents the size of the convolution filter, S represents the stride of the convolution, P represents the stride of the pooling filter, and Q represents the size of the pooling filter.
[0102] A neural network device can execute at least one computation loop to perform neural network operations. For example, a neural network device can execute an M-loop using an input feature map as input. The M-loop is a loop that specifies the range of the number of features of the output feature map to be output. The range of the number of features can be expanded by T_M elements in each loop iteration. The M-loop can include an R / C-loop. The R / C-loop is a loop that specifies the rows and columns of the output feature map to be output. The R / C-loop includes an RR-loop, a CC-loop, an R-loop, and a C-loop.
[0103] Because the R / C-loop is within the M-loop, the size of the weight buffer memory can be kept relatively small. Placing the R / C-loop within the M-loop means that, once the process of expanding rows and columns over a fixed range of the number of features of the output feature map and outputting the features of the output feature map is completed, the range of the number of features to be output is shifted by T_M elements, and the process of expanding rows and columns and outputting the features of the output feature map is repeated in the same manner.
[0104] In one example, the size of the weight buffer memory required to maximize the reuse of weights within the computation loop can be proportional to T_M × Z × K^2. According to the algorithm of Figure 7, this size is smaller than the weight buffer memory size M × Z × K^2 required when the M-loop is within the R / C-loop. Therefore, the algorithm of Figure 7 can reduce the area occupied by the weight buffer memory in the neural network device.
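As a non-limiting illustration, the two weight buffer sizes can be compared for a hypothetical layer. The values M = 64, Z = 32, K = 3, and T_M = 8 are assumptions chosen for this sketch and do not come from the disclosure.

```python
def weight_buffer_words(output_features, input_features, kernel_size):
    """Weights held at once: one K x K filter per (output, input) feature pair."""
    return output_features * input_features * kernel_size ** 2


M, Z, K, T_M = 64, 32, 3, 8   # hypothetical layer and parallelism

m_loop_outside = weight_buffer_words(T_M, Z, K)   # T_M x Z x K^2
m_loop_inside = weight_buffer_words(M, Z, K)      # M x Z x K^2
print(m_loop_outside, m_loop_inside)              # 2304 18432
```

With these values the buffer is smaller by a factor of M / T_M = 8.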
[0105] K-loop #1 is a loop that performs convolution operations on the input feature map and weights. Immediately after the convolution operations in K-loop #1, activation and pooling operations are performed consecutively. Because activation and pooling operations are processed consecutively within a single loop rather than in separate loops, the processing of neural network operations is simplified. Pooling operations can be easily handled within the loop when the R / C-loop runs continuously without interference from other loops.
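As a non-limiting illustration, the loop ordering can be sketched in simplified form. This is not the algorithm of Figure 7: the RR-loop and CC-loop tiling and the T_Z parallelism are omitted, ReLU and max pooling are assumed, and the data layouts `ifm[z][row][col]` and `weights[m][z][i][j]` are hypothetical. The sketch only shows the M-loop advancing by T_M on the outside, the row and column loops inside it, and activation and pooling applied immediately after the convolution accumulation.

```python
def run_layer(ifm, weights, t_m, s=1, p=2, q=2):
    """Convolution (stride s), ReLU, and max pooling (size q, stride p) in one loop nest."""
    z_count, m_count, k = len(ifm), len(weights), len(weights[0][0])
    conv_rows = (len(ifm[0]) - k) // s + 1
    conv_cols = (len(ifm[0][0]) - k) // s + 1
    out_rows = (conv_rows - q) // p + 1
    out_cols = (conv_cols - q) // p + 1
    ofm = [[[0] * out_cols for _ in range(out_rows)] for _ in range(m_count)]

    for m0 in range(0, m_count, t_m):                 # M-loop: shift by T_M features
        tile = range(m0, min(m0 + t_m, m_count))      # only this tile's weights are needed
        for r in range(out_rows):                     # row loop of the output feature map
            for c in range(out_cols):                 # column loop of the output feature map
                for m in tile:
                    best = None
                    for pr in range(q):               # pooling window
                        for pc in range(q):
                            row0 = (r * p + pr) * s
                            col0 = (c * p + pc) * s
                            acc = 0
                            for z in range(z_count):  # Z-loop over input features
                                for i in range(k):    # K-loop over the filter
                                    for j in range(k):
                                        acc += ifm[z][row0 + i][col0 + j] * weights[m][z][i][j]
                            act = max(0, acc)         # activation right after convolution
                            best = act if best is None else max(best, act)
                    ofm[m][r][c] = best               # pooled value written once per (r, c)
    return ofm


ones = [[[1] * 3 for _ in range(3)] for _ in range(2)]
w = [[[[m + 1] * 2 for _ in range(2)] for _ in range(2)] for m in range(3)]
print(run_layer(ones, w, t_m=2))  # [[[8]], [[16]], [[24]]]
```

The result does not depend on the value of `t_m`; only the set of weights that must be resident at any one time changes.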
[0106] A read request can be made whenever the range of features in the input feature map used in the convolution operation is changed. That is, a read operation can be requested in each loop iteration where the z value of the input feature map is changed in the Z-loop. A write request can be made whenever a row or column is changed. That is, a write operation can be requested whenever the r or c value in the R / C-loop changes (when the loop of K-loop #1 completes). A read request for the input feature map whose z value has changed can be made in every cycle, and a write request for the output feature map whose r or c value has changed can be made every K × K × Z / T_Z cycles.
[0107] When read operations are performed every cycle and write operations are performed every K × K × Z / T_Z cycles, a conflict between a write operation and a read operation can occur at the single port every K × K × Z / T_Z cycles. However, compared to the frequency of read requests, which occur every cycle, the frequency of write requests, which occur every K × K × Z / T_Z cycles, can be very low. For example, the collisions that occur at the single port every K × K × Z / T_Z cycles can account for less than 1% of all cycles, and the impact on the computational speed of the neural network device may be very small.
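As a non-limiting illustration, the write period and the resulting share of conflicting cycles can be computed for a hypothetical layer. The values K = 3, Z = 64, and T_Z = 4 are assumptions chosen for this sketch, and the functions assume that K × K × Z is divisible by T_Z.

```python
from fractions import Fraction


def write_period_cycles(k, z, t_z):
    """Cycles between write requests: K x K x Z / T_Z."""
    return (k * k * z) // t_z


def conflict_fraction(k, z, t_z):
    """Share of cycles in which a write request coincides with a read request."""
    return Fraction(t_z, k * k * z)


period = write_period_cycles(3, 64, 4)
share = conflict_fraction(3, 64, 4)
print(period, float(share))   # 144 cycles, roughly 0.0069
```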
[0108] Lines 700 and 710 are code used only when the neural network device is selected to operate in compatibility mode. Line 700 is code used to read the input feature map from external memory when the input feature map is stored in external memory. Line 710 is code used to write the output feature map to external memory.
[0109] Figure 8 is a block diagram of an example electronic system 800. Electronic system 800 may correspond to any or all of the neural network devices described with reference to Figures 1 to 7, Figure 9, and Figure 10.
[0110] In Figure 8, electronic system 800 can extract useful information by analyzing input data in real time based on neural networks, and can determine the situation or control the configuration of electronic devices equipped with electronic system 800 based on the extracted information. For example, electronic system 800 can be applied to robotic devices (such as drones and advanced driver assistance systems (ADAS)), smart TVs, smartphones, medical devices, mobile devices, image display devices, measuring devices, Internet of Things (IoT) devices, etc. In addition, electronic system 800 can be installed on at least one electronic device of various types.
[0111] Electronic system 800 may include a central processing unit (CPU) 810, random access memory (RAM) 820, a neural network device 830, a memory 840, a sensor module 850, and a communication module 860. Electronic system 800 may also include an input / output module, a security module, a power control device, etc. Some of the hardware configuration of electronic system 800 may be at least one semiconductor chip or mounted on at least one semiconductor chip. Neural network device 830 may be any or all of the neural network devices having the on-chip structure described above (e.g., the neural network device 300 of Figure 3 or the neural network device 500 of Figure 5), or a device including such a neural network device.
[0112] CPU 810 controls the overall operation of electronic system 800. CPU 810 may include one or more processor cores. CPU 810 can process or execute instructions and / or data stored in memory 840. In one embodiment, CPU 810 can control the function of neural network device 830 by executing a program stored in memory 840. CPU 810 may be implemented as a CPU, graphics processing unit (GPU), application processor (AP), etc.
[0113] RAM 820 may temporarily store programs, data, or instructions. For example, instructions and / or data stored in memory 840 may be temporarily stored in RAM 820 under the control of CPU 810 or boot code. RAM 820 may be implemented as memory (such as dynamic random access memory (DRAM) or static random access memory (SRAM)).
[0114] The neural network device 830 can perform neural network operations based on received input data and generate information signals based on the results of the operations. The neural network may include, but is not limited to, CNNs, RNNs, deep belief networks, restricted Boltzmann machines, etc. The neural network device 830 may correspond to a hardware accelerator dedicated to neural networks.
[0115] The information signal may include one of various types of recognition signals (such as voice recognition signals, object recognition signals, image recognition signals, and biometric information recognition signals). For example, the neural network device 830 may receive frame data included in a video stream as input data and generate recognition signals from the frame data for objects included in an image represented by the frame data. However, this disclosure is not limited thereto; the neural network device 830 may receive various types of input data depending on the type or function of the electronic device on which the electronic system 800 is installed, and may generate recognition signals based on the input data.
[0116] Memory 840 is a storage location used to store data, and can store the operating system (OS), various instructions, and various types of data. Memory 840 can correspond to the external memory 390 of Figure 3. When intermediate results (e.g., output feature maps) generated during the operation of the neural network device 830 are stored in memory 840 for each operation, power consumption can increase due to frequent access to memory 840. The neural network device 830 can reduce access to memory 840 by keeping the feature maps generated during its operation within the neural network device 830. As a result, the disclosed neural network device 830 can reduce power consumption. Memory 840 may also store quantized neural network data (such as parameters, weight maps, or weight lists) used in the neural network device 830.
[0117] The memory 840 may be DRAM, but is not limited to it. The memory 840 may include at least one of volatile memory and non-volatile memory. Non-volatile memory includes read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable and programmable ROM (EEPROM), flash memory, phase-change RAM (PRAM), magnetic RAM (MRAM), resistive RAM (RRAM), ferroelectric RAM (FRAM), etc. Volatile memory includes DRAM, SRAM, synchronous DRAM (SDRAM), PRAM, MRAM, RRAM, FRAM, etc. In one embodiment, the memory 840 may include at least one of hard disk drive (HDD), solid-state drive (SSD), compact flash memory (CF), secure digital (SD), micro SD, mini SD, extreme digital (xD), and memory stick.
[0118] Sensor module 850 can collect information from the periphery of the electronic device on which electronic system 800 is installed. Sensor module 850 can sense or receive signals (e.g., image signals, audio signals, magnetic signals, biosignals, or touch signals) from outside the electronic device and convert the sensed or received signals into data. For this purpose, sensor module 850 may include at least one of various types of sensing devices (such as microphones, imaging devices, image sensors, light detection and ranging (LIDAR) sensors, ultrasonic sensors, infrared sensors, biosensors, and touch sensors).
[0119] The sensor module 850 can provide the converted data as input data to the neural network device 830. For example, the sensor module 850 may include an image sensor and can generate a video stream by capturing images of the external environment of the electronic device, and sequentially provide consecutive data frames of the video stream as input data to the neural network device 830. However, this disclosure is not limited thereto, and the sensor module 850 may provide various types of data to the neural network device 830.
[0120] The communication module 860 may include various wired or wireless interfaces for communicating with external devices. For example, the communication module 860 may include communication interfaces that can connect to a wired local area network (LAN), a wireless local area network (WLAN) (such as Wi-Fi), a wireless personal area network (WPAN) (such as Bluetooth), a wireless universal serial bus (USB), Zigbee, near field communication (NFC), radio frequency identification (RFID), power line communication (PLC), or a mobile cellular network (such as third-generation (3G), fourth-generation (4G), or long-term evolution (LTE)).
[0121] Figure 9 is a flowchart illustrating an example operation method of a neural network device.
[0122] In Figure 9, the operation method of the neural network device includes operations that are processed in time sequence in the neural network device 300 shown in Figure 3 or the neural network device 500 shown in Figure 5. Therefore, even if omitted below, the description of the neural network device given above with reference to Figures 3 to 8 can also be applied to the method of Figure 9.
[0123] In operation 910, the neural network device can store the input feature map of the first layer in the neural network in an on-chip buffer memory.
[0124] In operation 920, the neural network device can send the input feature map of the first layer to the computing circuitry via a single port of the on-chip buffer memory.
[0125] In operation 930, the neural network device can output an output feature map of the first layer corresponding to the input feature map of the first layer by performing neural network operations on the input feature map of the first layer.
[0126] The neural network device can perform neural network operations based on one or more computation loops. In each cycle in which each of the one or more computation loops is executed, the neural network device can perform a read operation to read at least a portion of the data constituting the input feature map of the first layer from the on-chip buffer memory via the single port. However, when a write operation is requested to write at least a portion of the data constituting the output feature map of the first layer to the on-chip buffer memory via the single port at the time when the read operation would be executed, the write operation can be performed prior to the read operation.
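As a non-limiting illustration, write-priority arbitration at a single port can be sketched as a cycle-by-cycle simulation. The model is an assumption made for this sketch: the port serves one access per cycle, a read is wanted in every cycle, a write is requested every `write_period` cycles, and a read that loses arbitration is retried in the next cycle.

```python
def simulate_port(total_reads, write_period):
    """Return (cycles used, reads delayed, access log) for a single-port buffer."""
    if write_period < 2:
        raise ValueError("write_period must be at least 2 so that reads can progress")
    cycle = reads_done = delayed = 0
    log = []
    while reads_done < total_reads:
        cycle += 1
        if cycle % write_period == 0:   # a write is requested in this cycle
            log.append("W")             # the write is served first
            delayed += 1                # the pending read waits one cycle
        else:
            log.append("R")
            reads_done += 1
    return cycle, delayed, "".join(log)


print(simulate_port(6, 4))  # (7, 1, 'RRRWRRR')
```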
[0127] Neural network operations may include convolution, activation, and pooling operations. The neural network device can output the result of performing the convolution, activation, and pooling operations on the input feature map of the first layer as the output feature map of the first layer.
[0128] In operation 940, the neural network device can send the output feature map of the first layer to the on-chip buffer memory via the single port, and store the output feature map of the first layer and the input feature map of the first layer together in the on-chip buffer memory.
[0129] The neural network device can allocate a first memory address for storing the input feature map of the first layer and a second memory address for storing the output feature map of the first layer in different directions, thereby reducing the overlap between the first memory address and the second memory address.
[0130] The neural network device can allocate a first memory address from the starting point of the memory address corresponding to the storage space of the on-chip buffer memory along a first direction, and allocate a second memory address from the ending point of the memory address corresponding to the storage space of the on-chip buffer memory along a second direction opposite to the first direction.
[0131] When the output feature map of the first layer stored in the second memory address is reused as the input feature map of the second layer, and when the output feature map of the second layer corresponding to the input feature map of the second layer is output from the computing circuit, the neural network device can allocate a third memory address from the starting point along the first direction for storing the output feature map of the second layer in the on-chip buffer memory, thereby reducing the overlap between the second memory address and the third memory address.
[0132] In one embodiment, the neural network device may store the weights of the first layer used for neural network operations on the input feature map of the first layer in a weight buffer memory. The weight buffer memory may receive the weights of the first layer from external memory outside the neural network device via a single port. The weight buffer memory may also send the weights of the first layer to the computation circuitry via the same single port.
[0133] The on-chip buffer memory, computing circuitry, and controller of the neural network device can be mounted on a single chip. Each operation of Figure 9 can be performed on the single chip.
[0134] In one example, when the output feature map of the second layer, corresponding to the input feature map of the second layer, is output from the computing circuit, the neural network device can determine whether the total size of the input and output feature maps of the second layer exceeds the size of the on-chip buffer memory. If it is determined that the total size of the input and output feature maps of the second layer exceeds the size of the on-chip buffer memory, the neural network device can temporarily store the output feature map of the second layer in an auxiliary buffer memory instead of the on-chip buffer memory. The output feature map of the second layer, temporarily stored in the auxiliary buffer memory, can be sent to external memory outside the neural network device according to a preset period.
[0135] When the output feature map of the second layer is reused as the input feature map for the neural network operation of the third layer, which is the layer following the second layer, the neural network device can output the output feature map of the third layer from the computing circuit.
[0136] The neural network device can determine whether the size of the output feature map of the third layer exceeds the size of the on-chip buffer memory. When it is determined that the size of the output feature map of the third layer exceeds the size of the on-chip buffer memory, the neural network device can temporarily store the output feature map of the third layer in an auxiliary buffer memory. When it is determined that the size of the output feature map of the third layer is less than or equal to the size of the on-chip buffer memory, the neural network device can store the output feature map of the third layer in the on-chip buffer memory.
[0137] Figure 10 is a flowchart illustrating an example of a method of operating a neural network device.
[0138] Referring to Figure 10, the method of operating the neural network device includes operations processed in time series by the neural network device 300 shown in Figure 3 or the neural network device 500 shown in Figure 5. Therefore, even where omitted below, the description of the neural network devices given above with reference to Figures 3 to 8 also applies to the method of Figure 10.
[0139] According to the principal operating method of the disclosed neural network device, the input feature map and the output feature map are stored together in the on-chip buffer memory. However, there may be exceptional cases in which the input and output feature maps cannot be stored together in the on-chip buffer memory. To handle such cases, the disclosed neural network device can operate in a compatibility mode in addition to the principal operating method. Specifically, the neural network device can operate in the compatibility mode when the total size of the input and output feature maps exceeds the size of the on-chip buffer memory. Figure 10 illustrates the overall operation process, including both the principal operating method and the compatibility mode of the neural network device.
[0140] Operations S1031, S1071, S1072, and S1073 in Figure 10 represent operations performed when the neural network device operates in the compatibility mode.
[0141] Figure 10 illustrates the operation method from the first layer (input layer) to the last layer (output layer) of the neural network. In Figure 10, 'n' indicates the order of the layers in the neural network. For example, when the neural network consists of a total of five layers, 'n=1' represents the first layer, 'n=2' represents the second layer, and 'n=5' represents the last layer.
[0142] In operations 1010 and 1020, the neural network device may store the input feature map of the first layer (n=1), which is first generated when external data is input, in external memory. When it is determined in operation 1030 that the size of the input feature map of the first layer is less than or equal to the size of the on-chip buffer memory, the neural network device may execute operation 1040. In operation 1040, the neural network device may store the input feature map of the first layer in the on-chip buffer memory. When it is determined in operation 1030 that the size of the input feature map of the first layer exceeds the size of the on-chip buffer memory, the neural network device may execute operation S1031. In operation S1031, the neural network device may operate in compatibility mode to store the input feature map of the first layer in external memory. In operation 1050, the neural network device may send the input feature map of the first layer to the computing circuit to perform neural network operations and output the output feature map of the first layer.
[0143] When it is determined in operation 1060 that the first layer is the last layer, the neural network device may execute operation 1120. In operation 1120, the neural network device may store the output feature map of the first layer in external memory. When it is determined in operation 1060 that the first layer is not the last layer, the neural network device may execute operation 1070. When it is determined in operation 1070 that the total size of the input feature map and output feature map of the first layer is less than or equal to the size of the on-chip buffer memory, the neural network device may execute operation 1080. In operation 1080, the neural network device may store the output feature map of the first layer in the on-chip buffer memory. When it is determined in operation 1070 that the total size of the input feature map and output feature map of the first layer exceeds the size of the on-chip buffer memory, the neural network device may enter operation S1071 and operate in compatibility mode.
[0144] When it is determined in operation S1071 that the input feature map of the first layer is stored in the on-chip buffer memory according to operation 1040, the neural network device may execute operation S1072. In operation S1072, the neural network device may temporarily store the output feature map of the first layer in the auxiliary buffer memory; the output feature map temporarily stored in the auxiliary buffer memory may then be transferred to and stored in external memory. When it is determined in operation S1071 that the input feature map of the first layer is stored in external memory according to operation S1031, the neural network device may proceed to operation S1073.
[0145] When it is determined in operation S1073 that the size of the output feature map of the first layer is less than or equal to the size of the on-chip buffer memory, the neural network device may execute operation 1080. In operation 1080, the neural network device may store the output feature map of the first layer in the on-chip buffer memory. When it is determined in operation S1073 that the size of the output feature map of the first layer exceeds the size of the on-chip buffer memory, the neural network device may execute operation S1072. In operation S1072, the neural network device may temporarily store the output feature map of the first layer in the auxiliary buffer memory.
[0146] In operation 1090, the neural network device can reuse the output feature map of the first layer as the input feature map of the second layer (n=2). When the output feature map of the first layer is reused as the input feature map of the second layer, the next layer in subsequent operations can correspond to "n=n+1". In operation 1100, the neural network device can send the input feature map of the second layer to the computing circuit to perform neural network operations and output the output feature map of the second layer.
[0147] In operation 1110, the neural network device determines whether the second layer is the last layer. When it is determined that the second layer is the last layer, the neural network device may perform operation 1120. In operation 1120, the neural network device may store the output feature map in external memory. When it is determined that the second layer is not the last layer, the neural network device may perform operation 1070 to determine whether to operate in compatibility mode based on the total size of the input feature map and the output feature map of the second layer.
[0148] Subsequently, operation 1090 is repeated so that the same processing described above is performed on each following layer. When the last layer is reached, the neural network device can store the output feature map of the last layer of the neural network in external memory and terminate its operation.
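The overall flow of Figure 10 described in paragraphs [0142] to [0148] can be traced with a short Python sketch. This is only an illustration under stated assumptions: the function name `plan_feature_map_storage`, the location strings, and the use of abstract size units are hypothetical, and the sketch assumes that an output staged in the auxiliary buffer is read back from external memory when it is reused as the next layer's input.

```python
def plan_feature_map_storage(first_ifm_size, ofm_sizes, on_chip_size):
    """Return a list of (n, ifm_location, ofm_location), one entry per layer.

    Locations are the illustrative strings "on_chip", "auxiliary", "external".
    ofm_sizes[n - 1] is the output feature map size of layer n.
    """
    plan = []
    ifm_size = first_ifm_size
    # Operations 1030, 1040, S1031: place the first-layer input feature map.
    ifm_location = "on_chip" if ifm_size <= on_chip_size else "external"
    last_layer = len(ofm_sizes)

    for n, ofm_size in enumerate(ofm_sizes, start=1):
        if n == last_layer:
            # Operations 1060/1110, then 1120: final output goes to external memory.
            ofm_location = "external"
        elif ifm_size + ofm_size <= on_chip_size:
            # Operation 1070 passes, then 1080: both maps share the on-chip buffer.
            ofm_location = "on_chip"
        elif ifm_location == "on_chip":
            # S1071, then S1072: the input occupies the buffer, so stage the output.
            ofm_location = "auxiliary"
        elif ofm_size <= on_chip_size:
            # S1073, then 1080: input is in external memory and the output fits alone.
            ofm_location = "on_chip"
        else:
            # S1073, then S1072: the output alone exceeds the on-chip buffer.
            ofm_location = "auxiliary"

        plan.append((n, ifm_location, ofm_location))

        # Operation 1090: reuse this output as the next layer's input (n = n + 1).
        # Assumption: staged outputs are forwarded to external memory.
        ifm_size = ofm_size
        ifm_location = "external" if ofm_location == "auxiliary" else "on_chip"

    return plan
```

For example, `plan_feature_map_storage(40, [50, 70, 30, 10], 100)` enters the compatibility mode only at the second layer, where the 50-unit input and 70-unit output together exceed the 100-unit buffer.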
[0149] The neural network devices, neural network device 300, neural network device 500, neural network device 830, on-chip buffer memory 310, on-chip buffer memory 510, computing circuit 320, computing circuit 520, controller 330, controller 530, external memory 390, external memory 590, auxiliary buffer memory 550, CPU 810, RAM 820, memory 840, sensor module 850, and transmit/receive module 860 in Figures 1 to 10 that perform the operations described in this application are implemented by hardware components configured to perform the operations described in this application. Examples of hardware components that can be used to perform the operations described in this application include, where appropriate, controllers, sensors, generators, drivers, memories, comparators, arithmetic logic units, adders, subtractors, multipliers, dividers, integrators, and any other electronic components configured to perform the operations described in this application. In other examples, one or more of the hardware components performing the operations described in this application are implemented by computing hardware (e.g., by one or more processors or computers). A processor or computer may be implemented by one or more processing elements, such as logic gate arrays, controllers and arithmetic logic units, digital signal processors, microcomputers, programmable logic controllers, field-programmable gate arrays, programmable logic arrays, microprocessors, or any other means or combination of means configured to respond to and execute instructions in a defined manner to achieve a desired result. In one example, the processor or computer includes or is connected to one or more memories storing instructions or software executed by the processor or computer. Hardware components implemented by the processor or computer may execute instructions or software (such as an operating system (OS) and one or more software applications running on the OS) for performing the operations described herein. The hardware components may also access, manipulate, process, create, and store data in response to the execution of instructions or software. For the sake of brevity, the singular terms "processor" or "computer" may be used in the description of the examples described herein, but in other examples, multiple processors or computers may be used, or a processor or computer may include multiple processing elements or multiple types of processing elements or both. For example, a single hardware component or two or more hardware components may be implemented by a single processor, or two or more processors, or a processor and a controller. One or more hardware components may be implemented by one or more processors, or processors and controllers, and one or more other hardware components may be implemented by one or more other processors, or additional processors and additional controllers. One or more processors, or processors and controllers, may implement a single hardware component or two or more hardware components. The hardware components can have any one or more different processing configurations, examples of which include: a single processor, a discrete processor, a parallel processor, single instruction single data (SISD) multiprocessing, single instruction multiple data (SIMD) multiprocessing, multiple instruction single data (MISD) multiprocessing, and multiple instruction multiple data (MIMD) multiprocessing.
[0150] The methods shown in Figures 1 to 10 that perform the operations described in this application are executed by computing hardware (e.g., by one or more processors or computers), which is implemented as described above to execute instructions or software to perform the operations performed by the methods described in this application. For example, a single operation or two or more operations may be executed by a single processor, or two or more processors, or a processor and a controller. One or more operations may be executed by one or more processors, or a processor and a controller, and one or more other operations may be executed by one or more other processors, or additional processors and additional controllers. One or more processors, or a processor and a controller, may execute a single operation or two or more operations.
[0151] Instructions or software for controlling computing hardware (e.g., one or more processors or computers) to implement the hardware components and perform the methods described above can be written as computer programs, code segments, instructions, or any combination thereof, to individually or collectively instruct or configure the one or more processors or computers to operate as a machine or special-purpose computer that performs the operations performed by the hardware components and methods described above. In one example, the instructions or software include machine code (such as machine code generated by a compiler) that is directly executed by one or more processors or computers. In another example, the instructions or software include high-level code that is executed by one or more processors or computers using an interpreter. The instructions or software can be written in any programming language based on the block diagrams and flowcharts shown in the accompanying drawings and the corresponding descriptions in the specification, which disclose algorithms for performing the operations performed by the hardware components and methods described above.
[0152] Instructions or software for controlling computing hardware (e.g., one or more processors or computers) to implement hardware components and perform the methods described above, along with any associated data, data files, and data structures, may be recorded, stored, or fixed on or in one or more non-transitory computer-readable storage media. Examples of non-transitory computer-readable storage media include: read-only memory (ROM), random access memory (RAM), flash memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, magnetic tape, floppy disk, magneto-optical data storage device, optical data storage device, hard disk, solid-state drive, and any other device configured to store instructions or software and any associated data, data files, and data structures in a non-transitory manner and to provide the instructions or software and any associated data, data files, and data structures to one or more processors or computers, such that one or more processors or computers can execute the instructions. In one example, instructions or software, along with any associated data, data files, and data structures, are distributed across a networked computer system, such that the instructions or software, along with any associated data, data files, and data structures, are stored, accessed, and executed in a distributed manner through one or more processors or computers.
[0153] While this disclosure includes specific examples, it will be clear upon understanding this disclosure that various changes in form and detail may be made to these examples without departing from the spirit and scope of the claims and their equivalents. The examples described herein are to be considered descriptive only and not for limiting purposes. The description of features or aspects in each example is to be considered applicable to similar features or aspects in other examples. Suitable results may be achieved if the described techniques are performed in a different order, and/or if components in the described system, architecture, apparatus, or circuit are combined in a different manner, and/or replaced or supplemented by other components or their equivalents. Therefore, the scope of the disclosure is not limited by the specific embodiments but by the claims and their equivalents, and all variations within the scope of the claims and their equivalents should be construed as included in the disclosure.
Claims
1. A neural network device, comprising: an on-chip buffer memory configured to store an input feature map of a first layer of a neural network; a computing circuit configured to receive the input feature map of the first layer through a single port of the on-chip buffer memory, and to perform a neural network operation on the input feature map of the first layer to output an output feature map of the first layer corresponding to the input feature map of the first layer; and a controller configured to send the output feature map of the first layer to the on-chip buffer memory via the single port, so as to store the output feature map of the first layer and the input feature map of the first layer together in the on-chip buffer memory, wherein the output feature map of the first layer is reused as an input feature map for a neural network operation of a subsequent second layer, wherein the controller allocates a first memory address for storing the input feature map of the first layer and a second memory address for storing the output feature map of the first layer along different directions, and wherein the controller: allocates the first memory address along a first direction from a starting point of the memory addresses corresponding to the storage space of the on-chip buffer memory, and allocates the second memory address along a second direction, opposite to the first direction, from an endpoint of the memory addresses corresponding to the storage space of the on-chip buffer memory.
2. The neural network device according to claim 1, wherein the computing circuit is further configured to perform the neural network operation based on one or more operation loops, and the controller is further configured to: perform a read operation in each cycle to read at least a portion of the data constituting the input feature map of the first layer from the on-chip buffer memory via the single port, wherein each of the one or more operation loops is executed in each cycle; and when a write operation to write at least a portion of the data constituting the output feature map of the first layer to the on-chip buffer memory via the single port is requested at a time when the read operation is to be performed, perform the write operation prior to the read operation.
3. The neural network device according to claim 1, wherein, when the output feature map of the first layer stored at the second memory address is reused as the input feature map of the second layer, and an output feature map of the second layer corresponding to the input feature map of the second layer is output from the computing circuit, the controller allocates a third memory address, for storing the output feature map of the second layer in the on-chip buffer memory, from the starting point along the first direction.
4. The neural network device according to claim 1 or claim 2, wherein the neural network operation includes a convolution operation, an activation operation, and a pooling operation, and the computing circuit is further configured to output a result of the pooling, convolution, and activation operations performed on the input feature map of the first layer as the output feature map of the first layer.
5. The neural network device according to claim 1 or claim 2, further comprising: a weight buffer memory configured to store weights of the first layer used for the neural network operation on the input feature map of the first layer, wherein the weight buffer memory receives the weights of the first layer from an external memory outside the neural network device through a single port of the weight buffer memory, and sends the weights of the first layer to the computing circuit through the single port of the weight buffer memory.
6. The neural network device according to claim 1, wherein the on-chip buffer memory, the computing circuit, and the controller are mounted in a single chip.
7. The neural network device according to claim 1 or claim 2, further comprising: an auxiliary buffer memory, wherein, when an output feature map of the second layer corresponding to the input feature map of the second layer is output from the computing circuit, the controller determines whether the total size of the input feature map of the second layer and the output feature map of the second layer exceeds the size of the on-chip buffer memory, and when the total size is determined to exceed the size of the on-chip buffer memory, the controller temporarily stores the output feature map of the second layer in the auxiliary buffer memory instead of the on-chip buffer memory, and wherein the output feature map of the second layer temporarily stored in the auxiliary buffer memory is sent to an external memory outside the neural network device according to a preset period.
8. The neural network device according to claim 7, wherein, when the output feature map of the second layer is reused as an input feature map for a neural network operation of a subsequent third layer, and an output feature map of the third layer corresponding to the input feature map of the third layer is output from the computing circuit, the controller determines whether the size of the output feature map of the third layer exceeds the size of the on-chip buffer memory, when the size of the output feature map of the third layer exceeds the size of the on-chip buffer memory, the controller temporarily stores the output feature map of the third layer in the auxiliary buffer memory, and when the size of the output feature map of the third layer is less than or equal to the size of the on-chip buffer memory, the controller stores the output feature map of the third layer in the on-chip buffer memory.
9. A method of operating a neural network device, the method comprising: storing an input feature map of a first layer of a neural network in an on-chip buffer memory; sending the input feature map of the first layer to a computing circuit through a single port of the on-chip buffer memory; outputting an output feature map of the first layer, corresponding to the input feature map of the first layer, when the computing circuit performs a neural network operation on the input feature map of the first layer; and sending the output feature map of the first layer to the on-chip buffer memory through the single port, thereby storing the output feature map of the first layer and the input feature map of the first layer together in the on-chip buffer memory, wherein the output feature map of the first layer is reused as an input feature map for a neural network operation of a subsequent second layer, wherein the storing of the output feature map of the first layer and the input feature map of the first layer together in the on-chip buffer memory includes allocating a first memory address for storing the input feature map of the first layer and a second memory address for storing the output feature map of the first layer in different directions, and wherein the allocating includes: allocating the first memory address for storing the input feature map of the first layer along a first direction from a starting point of the memory addresses corresponding to the storage space of the on-chip buffer memory, and allocating the second memory address for storing the output feature map of the first layer along a second direction, opposite to the first direction, from an endpoint of the memory addresses corresponding to the storage space of the on-chip buffer memory.
10. The method of claim 9, further comprising: performing, in each cycle, a read operation to read at least a portion of the data constituting the input feature map of the first layer from the on-chip buffer memory via the single port, so as to perform the neural network operation based on one or more operation loops, each of the one or more operation loops being executed in each cycle; and when a write operation to write at least a portion of the data constituting the output feature map of the first layer to the on-chip buffer memory via the single port is requested at a time when the read operation is to be performed, performing the write operation prior to the read operation.
11. The method of claim 9, further comprising: when the output feature map of the first layer stored at the second memory address is reused as the input feature map of the second layer, and an output feature map of the second layer corresponding to the input feature map of the second layer is output from the computing circuit, allocating a third memory address for storing the output feature map of the second layer from the starting point along the first direction.
12. The method according to claim 9 or claim 10, wherein the neural network operation includes a convolution operation, an activation operation, and a pooling operation, and the outputting includes: outputting a result of the pooling, convolution, and activation operations performed on the input feature map of the first layer as the output feature map of the first layer.
13. The method according to claim 9 or claim 10, further comprising: storing weights of the first layer used for the neural network operation in a weight buffer when the weights of the first layer are sent to the weight buffer from an external memory outside the neural network device through a single port of the weight buffer; and sending the weights of the first layer from the weight buffer to the computing circuit via the single port of the weight buffer.
14. The method according to claim 9 or claim 10, further comprising: when an output feature map of the second layer corresponding to the input feature map of the second layer is output from the computing circuit, determining whether the total size of the input feature map of the second layer and the output feature map of the second layer exceeds the size of the on-chip buffer memory; and when the total size is determined to exceed the size of the on-chip buffer memory, temporarily storing the output feature map of the second layer in an auxiliary buffer memory instead of the on-chip buffer memory, wherein the output feature map of the second layer temporarily stored in the auxiliary buffer memory is sent to an external memory outside the neural network device based on a preset period.
15. The method of claim 14, further comprising: when the output feature map of the second layer is reused as an input feature map for a neural network operation of a subsequent third layer, and an output feature map of the third layer corresponding to the input feature map of the third layer is output from the computing circuit, determining whether the size of the output feature map of the third layer exceeds the size of the on-chip buffer memory; and when the size of the output feature map of the third layer exceeds the size of the on-chip buffer memory, temporarily storing the output feature map of the third layer in the auxiliary buffer memory, and when the size of the output feature map of the third layer is less than or equal to the size of the on-chip buffer memory, storing the output feature map of the third layer in the on-chip buffer memory.
16. A non-transitory computer-readable recording medium storing instructions, which, when executed by a processor, cause the processor to control the execution of the method of claim 9.
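As an informal illustration only, and not part of the claims, the address allocation recited in claims 1, 3, 9, and 11 can be sketched in Python. The class name `TwoEndedAllocator`, the half-open `(start, end)` address ranges, and the strict alternation between the two ends are assumptions introduced for this sketch; the claims themselves only recite the first, second, and third memory addresses.

```python
class TwoEndedAllocator:
    """Hypothetical allocator that alternates between both ends of a buffer."""

    def __init__(self, capacity):
        self.capacity = capacity      # number of addressable units in the buffer
        self.from_start = True        # the first allocation begins at the start

    def allocate(self, size):
        """Return a half-open (start, end) address range for one feature map."""
        if size > self.capacity:
            raise ValueError("feature map does not fit in the on-chip buffer")
        if self.from_start:
            # First direction: addresses grow upward from the starting point.
            region = (0, size)
        else:
            # Second direction: addresses grow downward from the endpoint.
            region = (self.capacity - size, self.capacity)
        self.from_start = not self.from_start
        return region


def overlaps(a, b):
    """True when two half-open address ranges share at least one address."""
    return a[0] < b[1] and b[0] < a[1]
```

In this sketch, the first-layer input takes the range starting at address 0, the first-layer output takes the range ending at the last address, and the second-layer output is again allocated from address 0, reusing the space of the first-layer input that is no longer needed. The two maps that must coexist do not overlap as long as their combined size does not exceed the capacity.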