Apparatus for controlling data input and output of neural network circuit
By generating compressed data sequences and validity determination sequences, the problem of inefficient invalid bit handling in neural networks is solved, achieving more efficient operation processing and reduced power consumption, thereby improving learning performance.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SAMSUNG ELECTRONICS CO LTD
- Filing Date
- 2020-09-04
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies are inefficient and consume considerable power when handling invalid bits in neural networks, particularly because the handling of invalid bits during training and operation is not sufficiently optimized.
By generating a compressed data sequence and a validity determination sequence, the encoding circuit compresses consecutive invalid bits into a single bit, and the decoding circuit selects the bits to be sent to the neural network circuit so that the neural network circuit omits operations on non-consecutive invalid bits. A buffer stores the data and pointers control which bits are transmitted, so that invalid bits are skipped and valid bits are processed.
It improves the operation and processing speed of neural networks, reduces power consumption, optimizes the amount of computation during training, and enhances learning performance.
Figure CN113052303B_ABST
Abstract
Description
[0001] This application claims the benefit of Korean Patent Application No. 10-2019-0176097, filed with the Korean Intellectual Property Office on December 27, 2019, the entire disclosure of which is incorporated herein by reference for all purposes.
Technical Field
[0002] The following description relates to methods and apparatus for controlling the data inputs and outputs of neural network circuits.
Background Technology
[0003] Unlike traditional rule-based smart systems, an artificial intelligence (AI) system is a computer system in which a machine learns, makes judgments, and becomes more intelligent. The more an AI system is used, the higher the recognition rate it can achieve and the more accurately it can understand user preferences.
[0004] AI technologies can include machine learning (e.g., deep learning) and element techniques that utilize machine learning. Machine learning can be an algorithmic technique that classifies / learns features of input data, while element techniques can be techniques that implement functions (such as cognition and judgment) by using machine learning algorithms (such as deep learning) and can be implemented in technical fields such as language understanding, visual understanding, inference / prediction, knowledge representation, and motion control.
[0005] Artificial intelligence technologies can be applied to a variety of fields. Language understanding is a technology for recognizing and applying or processing human language and characters, and may include natural language processing, machine translation, dialogue systems, question answering, and speech recognition and synthesis. Visual understanding is a technology for recognizing and processing objects in the way human vision does, and may include object recognition, object tracking, image retrieval, person recognition, scene understanding, spatial understanding, and image enhancement. Inference and prediction is a technology for judging information and logically inferring and predicting from it, and may include knowledge-based and probability-based inference, optimization prediction, preference-based planning, and recommendation. Knowledge representation is a technology for automatically processing human experience information into knowledge data, and may include knowledge construction (data generation and classification) and knowledge management (data utilization). Motion control is a technology for controlling the autonomous driving of vehicles and the movement of robots, and may include, as non-limiting examples, motion control (navigation, collision, and driving) and operation control (action control).
Summary of the Invention
[0006] This summary is provided to introduce, in a simplified form, a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
[0007] In one general aspect, a neural network deep learning data control device includes: a memory; an encoding circuit configured to: receive a data sequence, generate a compressed data sequence in which consecutive invalid bits in a bit string of the data sequence are compressed into a single bit of the compressed data sequence, generate a validity determination sequence indicating valid and invalid bits in the bit string of the compressed data sequence, and write the compressed data sequence and the validity determination sequence into the memory; and a decoding circuit configured to: read the compressed data sequence and the validity determination sequence from the memory, and determine, based on the validity determination sequence, the bits in the bit string of the compressed data sequence to be sent to a neural network circuit, such that the neural network circuit omits operations on non-consecutive invalid bits.
[0008] The single bit of the compressed data sequence can indicate the number of consecutive invalid bits in the bit string of the data sequence.
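The encoding described in [0007] and [0008] can be illustrated with a short software sketch. It assumes, purely for illustration, that each "bit" is an integer element (for example a weight or activation value), that an element is valid when it exceeds a threshold that defaults to 0, and that each run of invalid elements is stored as its length. The function name `encode` is hypothetical; this models the behavior, not the encoding circuit itself.

```python
from typing import List, Tuple


def encode(data: List[int], threshold: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical encoder: returns (compressed data sequence, validity sequence)."""
    compressed: List[int] = []
    validity: List[int] = []
    run = 0  # length of the current run of invalid values

    for value in data:
        if value > threshold:
            if run:
                # The whole run of invalid values becomes one entry holding its length.
                compressed.append(run)
                validity.append(0)
                run = 0
            compressed.append(value)  # valid values are copied unchanged
            validity.append(1)
        else:
            run += 1

    if run:  # flush a run of invalid values at the end of the sequence
        compressed.append(run)
        validity.append(0)
    return compressed, validity


compressed, validity = encode([0, 0, 0, 5, 7, 0, 3, 0, 0])
print(compressed)  # [3, 5, 7, 1, 3, 2]
print(validity)    # [0, 1, 1, 0, 1, 0]
```

Because every run collapses into one entry, no two invalid entries are adjacent in the output, which is why the decoder only has to deal with non-consecutive invalid bits.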
[0009] The decoding circuit may include: a buffer configured to sequentially store a compressed data sequence and a validity determination sequence, and the decoding circuit may be configured to: store a first pointer and a second pointer, the first pointer indicating the position in the buffer where the current bit of the compressed data sequence to be sent to the neural network circuit is stored, and the second pointer indicating the position in the buffer where the next bit of the compressed data sequence to be sent to the neural network circuit in the next cycle is stored.
[0010] To determine the bit to be sent to the neural network circuit, the decoding circuit can be configured to: determine whether the current bit corresponding to the first pointer is valid based on a validity determination sequence; skip sending the current bit to the neural network circuit in response to the current bit being invalid; and send the current bit to the neural network circuit in response to the current bit being valid.
[0011] The decoding circuit can be configured to: determine whether the next bit corresponding to the second pointer is valid based on the validity determination sequence; move the first pointer to the position in the buffer where the next bit is stored in response to the next bit being valid; and, in response to the next bit being invalid, move the first pointer to the position in the buffer where the bit following the next bit (that is, the bit to be sent to the neural network circuit in the cycle after the next bit) is stored.
[0012] The decoding circuit can be configured to: determine whether the next bit corresponding to the second pointer is valid based on the validity determination sequence; in response to the next bit being valid, move the second pointer to the position in the buffer where the bit following the next bit is stored; and in response to the next bit being invalid, move the second pointer to the position in the buffer where the bit two positions after the next bit is stored.
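The two-pointer behavior of [0009] to [0012] can be modeled in software as follows. This is a hypothetical sketch, not the disclosed decoding circuit: the buffer is a Python list, one loop iteration stands for one cycle, and it assumes the input came from an encoder that never leaves two invalid entries next to each other. The example sequences correspond to the data sequence 0, 0, 0, 5, 7, 0, 3, 0, 0 with each invalid run replaced by its length.

```python
from typing import List, Tuple


def select_bits(compressed: List[int], validity: List[int]) -> Tuple[List[int], int]:
    """Hypothetical two-pointer decoder: returns (values sent, cycles used)."""
    sent: List[int] = []
    first, second = 0, 1  # positions of the current bit and the next bit in the buffer
    cycles = 0
    n = len(compressed)

    while first < n:
        cycles += 1
        if validity[first]:
            sent.append(compressed[first])  # valid: send to the neural network circuit
        # invalid current bit: nothing is sent in this cycle

        if second < n and not validity[second]:
            # Next bit is invalid: both pointers jump past it.
            first, second = second + 1, second + 2
        else:
            # Next bit is valid (or the buffer is exhausted): advance by one.
            first, second = second, second + 1
    return sent, cycles


sent, cycles = select_bits([3, 5, 7, 1, 3, 2], [0, 1, 1, 0, 1, 0])
print(sent, cycles)  # [5, 7, 3] 4
```

In this example six buffered entries are handled in four cycles, and only the three valid values reach the neural network circuit; only the invalid entry at the very start of the buffer costs a cycle of its own.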
[0013] The decoding circuit can be configured to determine, based on the validity determination sequence, whether to skip the operation processing of the neural network circuit.
[0014] The decoding circuit can be configured to determine whether to skip the operation processing of the neural network circuit based on the next bit corresponding to the second pointer.
[0015] The decoding circuit can be configured to: determine whether the next bit corresponding to the second pointer is valid based on the validity determination sequence; not skip the operation processing of the neural network circuit in response to the next bit being valid; and skip the operation processing of the neural network circuit in response to the next bit being invalid.
[0016] In response to the next bit being invalid, the decoding circuit can be configured to skip the operation processing of the neural network circuit by the value of the next bit.
[0017] In response to the next bit being invalid, the decoding circuit can be configured to skip the operation processing of the neural network circuit by a number equal to the value of the next bit plus 1.
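One way to read [0016] is sketched below, under the assumption that an invalid entry stores the length of the run it replaced: when the consumer meets an invalid entry, it advances its position in the other operand by that length instead of performing the corresponding multiplications. The name `sparse_dot` and the use of a dot product are illustrative assumptions, and the plus-one variant of [0017] is not modeled.

```python
from typing import List, Tuple


def sparse_dot(compressed: List[int], validity: List[int], x: List[int]) -> Tuple[int, int]:
    """Hypothetical consumer: dot product of a compressed weight row with dense x.

    Returns (result, number of multiply-accumulate operations performed).
    """
    acc = 0
    pos = 0   # position in the original, uncompressed sequence
    macs = 0
    for value, valid in zip(compressed, validity):
        if valid:
            acc += value * x[pos]
            macs += 1
            pos += 1
        else:
            pos += value  # skip as many operations as the run length stored here
    return acc, macs


# Weights 0, 0, 0, 5, 7, 0, 3, 0, 0 in compressed form.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(sparse_dot([3, 5, 7, 1, 3, 2], [0, 1, 1, 0, 1, 0], x))  # (76, 3)
```

The result equals the dense dot product, but only 3 of the 9 multiplications are performed.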
[0018] The decoding circuit can be configured to store a third pointer that indicates the location in the buffer where the compressed data sequence and validity determination sequence will be stored.
[0019] A valid bit can be a bit with a value greater than a predetermined threshold, and an invalid bit can be a bit with a value less than or equal to a predetermined threshold.
[0020] In the validity determination sequence, the bit value at the position corresponding to a valid bit of the compressed data sequence can be "1", and the bit value at the position corresponding to an invalid bit of the compressed data sequence can be "0".
[0021] The decoding circuit can be configured to use a validity determination sequence as a clock gating signal to perform the operation of the neural network circuit.
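Clock gating is a hardware technique, so the following is only a behavioral model under stated assumptions: the validity determination sequence drives an enable signal, and when it is 0 the accumulator register is not clocked, so it neither updates nor switches. The class `GatedMac` and the aligned activation list are hypothetical, and counting clocked cycles stands in, very roughly, for dynamic power.

```python
from dataclasses import dataclass


@dataclass
class GatedMac:
    """Hypothetical behavioral model of a multiply-accumulate unit with clock gating."""
    acc: int = 0
    clocked_cycles: int = 0  # cycles in which the accumulator register was clocked

    def tick(self, enable: int, weight: int, activation: int) -> None:
        if not enable:
            return  # clock gated: the register holds its value and does not switch
        self.acc += weight * activation
        self.clocked_cycles += 1


values = [3, 5, 7, 1, 3, 2]        # compressed data sequence (invalid entries hold run lengths)
validity = [0, 1, 1, 0, 1, 0]      # used directly as the enable signal
activations = [0, 4, 5, 0, 7, 0]   # hypothetical activations aligned with each entry

mac = GatedMac()
for en, w, a in zip(validity, values, activations):
    mac.tick(en, w, a)
print(mac.acc, mac.clocked_cycles)  # 76 3
```

Besides saving switching activity, gating on the validity bit keeps the stored run lengths from ever being used as operands.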
[0022] The buffer may include a ring buffer.
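A ring buffer lets the writing side keep storing entries while the reading side consumes them, by wrapping both positions around a fixed number of slots. In this hypothetical sketch, `write_ptr` plays the role of the third pointer of [0018] and `read_ptr` the role of the first pointer; each slot holds one (value, validity) pair. The capacity and method names are assumptions.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int]  # (value, validity bit)


class RingBuffer:
    """Hypothetical fixed-size ring buffer for (value, validity) pairs."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[Entry]] = [None] * capacity
        self.capacity = capacity
        self.write_ptr = 0  # where the next entry will be stored (cf. the third pointer)
        self.read_ptr = 0   # entry to be read next (cf. the first pointer)
        self.count = 0      # number of entries currently stored

    def push(self, entry: Entry) -> bool:
        if self.count == self.capacity:
            return False  # full: the writer must wait
        self.slots[self.write_ptr] = entry
        self.write_ptr = (self.write_ptr + 1) % self.capacity  # wrap around
        self.count += 1
        return True

    def pop(self) -> Optional[Entry]:
        if self.count == 0:
            return None  # empty
        entry = self.slots[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % self.capacity  # wrap around
        self.count -= 1
        return entry


buf = RingBuffer(4)
for e in [(3, 0), (5, 1), (7, 1), (1, 0)]:
    buf.push(e)
print(buf.push((3, 1)))  # False: buffer is full
print(buf.pop())         # (3, 0)
print(buf.push((3, 1)))  # True: stored in slot 0 after wrapping
```

Keeping an explicit count distinguishes a full buffer from an empty one even though the two pointers are equal in both cases.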
[0023] The encoding circuit can be configured to generate a compressed data sequence by compressing consecutive valid bits with the same bit value in the bit string of the data sequence into another single bit of the compressed data sequence.
[0024] In response to the data sequence including multiple pieces of reused data, the decoding circuit can be configured to store a fourth pointer for identifying the reused data.
[0025] When reading a plurality of compressed data sequences in parallel, the decoding circuit can be configured to add bits to the compressed data sequences so that they have the same length.
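For [0025], a minimal sketch follows, assuming that padding entries are invalid entries with value 0, i.e., runs of length zero, so that a consumer skipping by run length advances by nothing when it reaches them. The padding value is an assumption, not something the description specifies, and `pad_for_parallel` is a hypothetical name.

```python
from typing import List, Tuple

Seq = Tuple[List[int], List[int]]  # (compressed data sequence, validity sequence)


def pad_for_parallel(seqs: List[Seq]) -> List[Seq]:
    """Hypothetical padding: extend every sequence to the longest length.

    Assumes padding entries are invalid entries with value 0 (a run of length 0).
    """
    if not seqs:
        return []
    width = max(len(compressed) for compressed, _ in seqs)
    padded: List[Seq] = []
    for compressed, validity in seqs:
        extra = width - len(compressed)
        padded.append((compressed + [0] * extra, validity + [0] * extra))
    return padded


lanes = pad_for_parallel([([3, 5, 7, 1, 3, 2], [0, 1, 1, 0, 1, 0]), ([4, 9], [1, 1])])
print(lanes[1])  # ([4, 9, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0])
```

Note that the padded lane now contains adjacent invalid entries, so a decoder that relies on invalid entries never being consecutive would have to recognize padding separately.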
[0026] The data sequence can indicate the strengths of the edges connecting the nodes of the neural network implemented by the neural network circuit.
[0027] The device may include the neural network circuit, wherein the neural network circuit is configured to train the neural network by re-determining one or more of the connection strengths through a dropout operation, based on the determined bits of the received compressed data sequence.
[0028] In another general aspect, a neural network system includes a neural network circuit and a control device for controlling the data inputs and outputs of the neural network circuit.
[0029] In another general aspect, a method for training a neural network for image recognition includes: obtaining training image data; and training the neural network based on the training image data. Before a training operation of the current layer of the neural network, a raw data sequence for the operation of the current layer is processed, and the processed data sequence is applied to the training operation of the current layer, such that operations on invalid bits are omitted in that training operation. Processing the raw data sequence for the operation of the current layer includes: generating, based on the raw data sequence, a compressed data sequence in which consecutive invalid bits in the bit string of the raw data sequence are compressed into a single bit of the compressed data sequence; generating a validity determination sequence for determining valid and invalid bits in the bit string of the compressed data sequence; writing the compressed data sequence and the validity determination sequence to memory; reading the compressed data sequence and the validity determination sequence from memory; and determining, based on the validity determination sequence, the bits in the bit string of the compressed data sequence to be applied to the training operation.
[0030] The raw data sequence for the operation of the current layer can be the input data and / or the weights of the current layer. The input data of the first layer of the neural network can be the training image data, and the input data of the next layer of the neural network can be the output data of the current layer.
[0031] In another general aspect, a processor-implemented neural network deep learning data control method includes: receiving a data sequence; generating a compressed data sequence in which consecutive invalid bits in a bit string of the data sequence are compressed into a single bit of the compressed data sequence; generating a validity determination sequence for determining valid and invalid bits in the bit string of the compressed data sequence; writing the compressed data sequence and the validity determination sequence to memory; reading the compressed data sequence and the validity determination sequence from memory; and determining, based on the validity determination sequence, the bits in the bit string of the compressed data sequence to be sent to a neural network circuit, such that the neural network circuit omits operations on non-consecutive invalid bits.
[0032] The single bit of the compressed data sequence can indicate the number of consecutive invalid bits in the bit string of the data sequence.
[0033] The method may include: sequentially storing the compressed data sequence and the validity determination sequence in a buffer; and storing a first pointer and a second pointer, the first pointer indicating the position in the buffer where the current bit of the compressed data sequence to be sent to the neural network circuit is stored, and the second pointer indicating the position in the buffer where the next bit of the compressed data sequence to be sent to the neural network circuit in the next cycle is stored.
[0034] The determining may include: determining whether the current bit corresponding to the first pointer is valid based on the validity determination sequence; skipping the transmission of the current bit to the neural network circuit in response to the current bit being invalid; and transmitting the current bit to the neural network circuit in response to the current bit being valid.
[0035] The method may include: determining whether the next bit corresponding to the second pointer is valid based on the validity determination sequence; moving the first pointer to the position in the buffer where the next bit is stored in response to the next bit being valid; and, in response to the next bit being invalid, moving the first pointer to the position in the buffer where the bit following the next bit (that is, the bit to be sent to the neural network circuit in the cycle after the next bit) is stored.
[0036] The method may include: determining whether the next bit corresponding to the second pointer is valid based on the validity determination sequence; in response to the next bit being valid, moving the second pointer to the position in the buffer where the bit following the next bit is stored; and in response to the next bit being invalid, moving the second pointer to the position in the buffer where the bit two positions after the next bit is stored.
[0037] The method may include: determining whether to skip the operation processing of the neural network circuit based on the next bit corresponding to the second pointer.
[0038] The method may include: determining whether the next bit corresponding to the second pointer is valid based on the validity determination sequence; not skipping the operation processing of the neural network circuit in response to the next bit being valid; and skipping the operation processing of the neural network circuit in response to the next bit being invalid.
[0039] The skipping may include: in response to the next bit being invalid, skipping the operation processing of the neural network circuit by the value of the next bit.
[0040] The method may include: storing a third pointer, which indicates the location in the buffer where the compressed data sequence and validity determination sequence will be stored.
[0041] The generating may include generating the compressed data sequence by compressing consecutive valid bits having the same bit value in the bit string of the data sequence into another single bit of the compressed data sequence.
[0042] The method may include: in response to the data sequence including multiple pieces of reused data, storing a fourth pointer for identifying the reused data.
[0043] The method may include: when reading a plurality of compressed data sequences in parallel, adding bits to the compressed data sequences so that they have the same length.
[0044] A non-transitory computer-readable storage medium stores instructions that, when executed by a processor, configure the processor to perform the method.
[0045] In another general aspect, a processor-implemented neural network data control method includes: receiving a data sequence indicating the connection strengths of connections between nodes of a neural network; generating a compressed data sequence that includes the bits of the data sequence greater than a threshold and bits whose values are determined based on the number of consecutive bits of the data sequence less than or equal to the threshold; and training the neural network by performing a dropout operation on one or more of the connections based on the compressed data sequence.
[0046] Other features and aspects will become apparent from the following detailed description, the drawings, and the claims.
Brief Description of the Drawings
[0047] Figure 1A shows an example of a method for training a neural network.
[0048] Figure 1B shows an example of omitting operations in a neural network to improve learning performance.
[0049] Figure 2 shows an example of a control device.
[0050] Figure 3A shows an example of a sequence generated by an encoding circuit.
[0051] Figure 3B shows an example of a sequence generated by an encoding circuit.
[0052] Figures 4A to 4G illustrate an example of performing operations in a neural network based on the output of a control device.
[0053] Figure 5 illustrates an example of performing operations in a neural network based on the output of a control device.
[0054] Figure 6 shows an example of zero gating.
[0055] Figure 7 illustrates an example of increasing reuse by storing a range of values that are used repeatedly.
[0056] Figure 8 illustrates an example of using zero gating in a systolic array to reduce power consumption.
[0057] Figure 9 illustrates an example of controlling data input and output when data is stored in parallel.
[0058] Figure 10 illustrates an example of an application of a method for controlling data input and output.
[0059] Throughout the drawings and the detailed description, unless otherwise described or provided, the same reference numerals will be understood to refer to the same elements, features, and structures. The drawings may not be to scale, and the relative sizes, proportions, and depictions of elements in the drawings may be exaggerated for clarity, illustration, and convenience.
Detailed Description
[0060] The following detailed embodiments are provided to aid the reader in gaining a comprehensive understanding of the methods, apparatus, and / or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatus, and / or systems described herein will become apparent upon understanding this disclosure. For example, the order of operations described herein is merely illustrative and is not limited to the order set forth herein; rather, the order of operations may be changed as will become clear upon understanding this disclosure, except for operations that must occur in a specific order. Furthermore, for clarity and brevity, descriptions of features known upon understanding this disclosure may be omitted.
[0061] Although terms such as “first” or “second” are used herein to describe various components, assemblies, regions, layers, or parts, these components, assemblies, regions, layers, or parts are not to be limited by these terms. Rather, these terms are used only to distinguish one component, assembly, region, layer, or part from another. Thus, a first component, assembly, region, layer, or part referred to in the examples described herein may also be referred to as a second component, assembly, region, layer, or part without departing from the teachings of the examples.
[0062] Throughout this specification, when a component is described as "connected to" or "bonded to" another component, it may be directly "connected to" or "bonded to" that other component, or there may be one or more other components in between. In contrast, when an element is described as "directly connected to" or "directly bonded to" another element, there are no other elements in between. Likewise, expressions such as "between" and "immediately between," and "adjacent to" and "immediately adjacent to," should be interpreted in the same manner. As used herein, the term "and / or" includes any one of the associated listed items and any combination of any two or more of them.
[0063] The terminology used herein is for describing various examples only and is not intended to limit this disclosure. Unless the context clearly indicates otherwise, the singular form is intended to include the plural form as well. The terms “comprising,” “including,” and “having” indicate the presence of the stated features, quantities, operations, components, elements, and / or combinations thereof, but do not preclude the presence or addition of one or more other features, quantities, operations, components, elements, and / or combinations thereof.
[0064] Unless otherwise defined, all terms used herein (including technical and scientific terms) shall have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains and as commonly understood based on an understanding of the disclosure of this application. Unless expressly defined herein, terms (such as those defined in a general dictionary) shall be interpreted as having a meaning consistent with their meaning in the context of the relevant art and in the disclosure of this application, and shall not be interpreted in an idealized or overly formal sense. The use of the term "may" (e.g., regarding what an example or embodiment may include or implement) in relation to examples or embodiments indicates the existence of at least one example or embodiment that includes or implements such a feature, but not all examples are limited thereto.
[0065] The examples can be implemented as, or in, various types of products, such as personal computers, laptop computers, tablet computers, smartphones, televisions, smart home appliances, smart vehicles, self-service kiosks, and wearable devices. The examples will be described in detail below with reference to the accompanying drawings, in which the same reference numerals denote the same elements.
[0066] Figure 1A shows an example of a method for training a neural network.
[0067] Referring to Figure 1A, the neural network 100 may include an input layer 120, a hidden layer 140, and an output layer 145. In the example of Figure 1A, the neural network 100 can be a fully connected network that classifies and outputs information included in the input data. For example, the neural network 100 can be a neural network for recognizing images. Specifically, if the input data is image data, the neural network 100 can output, as result data, the result of classifying the types of image objects included in the image data.
[0068] The layers forming the neural network 100 may each include multiple nodes (e.g., node 125) that receive data. As shown in Figure 1A, two adjacent layers can be connected by multiple edges or connections (e.g., edges 130), and two nodes can be connected by an edge (e.g., edge 135). Each node may have weights, and the neural network 100 can determine output data based on values obtained by performing operations (e.g., multiplication) between the input signals (e.g., signals based on the image data) and the weights.
[0069] Referring to Figure 1A, the input layer 120 can receive input data (e.g., input data 110 including a cat as an image object).
[0070] In addition, the neural network 100 may include: a first edge layer 150 formed between the input layer 120 and the first hidden layer, a second edge layer 155 formed between the first hidden layer and the second hidden layer, a third edge layer 160 formed between the second hidden layer and the third hidden layer, and a fourth edge layer 170 formed between the third hidden layer and the output layer 145.
[0071] Multiple nodes included in the input layer 120 of the neural network 100 can receive signals corresponding to the input data 110. Through the operations of the layers included in the hidden layer 140, the output layer 145 can output output data 175 corresponding to the input data 110. In the example of Figure 1A, the neural network 100 can classify the type of image object included in the input image by performing operations, and output the output data 175 as an image recognition result of "cat probability: 98%". To increase the accuracy of the output data 175 of the neural network 100, learning or training can be performed in the direction from the output layer 145 to the input layer 120 to adjust the weights (e.g., by one or more learning techniques, such as backpropagation through the remaining nodes of the neural network that have not been dropped out).
[0072] As described above, the neural network 100 can learn to adjust the connection weights of one or more nodes in its layers. In one example, overfitting may occur during the weight adjustment process. Overfitting can be described as a decrease in output accuracy for new input data due to over-focusing on the training data. To address such overfitting, operations such as dropout or pruning can be used. Operations such as dropout or pruning can be techniques that improve learning performance by omitting operations in the neural network (e.g., operations determined to be unnecessary).
[0073] Figure 1B shows examples of omitting operations in a neural network (e.g., operations determined to be unnecessary) to improve learning performance.
[0074] Referring to Figure 1B, a fully connected neural network 180 and a partially connected neural network 190 are illustrated. The partially connected neural network 190 may have fewer nodes and fewer edges than the fully connected neural network 180. For example, the partially connected neural network 190 may be a network to which dropout has been applied, so that some of its edges are dropped.
[0075] A model ensemble can be used to improve the learning performance of the fully connected neural network 180. For a model ensemble, the models can be trained on different training data or can have different architectures. However, when deep networks are used, training to accurately estimate, interpret, or classify different types of image objects can involve training multiple networks (e.g., training each network on an individual image object type), which can require a large amount of computation. To reduce this computation, dropout can randomly omit a portion of the neurons in each learning iteration instead of training multiple networks. In this example, training the network with dropout configures it to accurately classify different types of image objects and produces an effect similar to training an exponential number of different models, thus achieving the effect of a model ensemble.
[0076] Referring to Figure 1B, the partially connected neural network 190 may have fewer edges than the fully connected neural network 180. Therefore, the edge sequence indicating the connections between the nodes of the partially connected neural network 190 may include multiple bit values of "0" indicating "disconnection".
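To illustrate why dropped edges suit this kind of compression, the following sketch builds an edge sequence for one edge layer, with 1 for a kept edge and 0 for a dropped one, and counts how many entries remain once each run of zeros collapses into one entry. It is an illustration under assumptions: edges are dropped independently with probability p (closer to what is often called DropConnect than to dropping whole neurons), the sequence is flattened row by row, and both function names are hypothetical.

```python
import random
from typing import List


def drop_edges(n_in: int, n_out: int, p: float, seed: int = 0) -> List[int]:
    """Hypothetical edge sequence: 1 = edge kept (connected), 0 = edge dropped."""
    rng = random.Random(seed)
    # One entry per edge between n_in nodes and n_out nodes, flattened row by row.
    return [0 if rng.random() < p else 1 for _ in range(n_in * n_out)]


def compressed_length(edge_seq: List[int]) -> int:
    """Entries left when every run of consecutive 0s collapses into one entry."""
    length, in_run = 0, False
    for bit in edge_seq:
        if bit:
            length += 1
            in_run = False
        elif not in_run:
            length += 1  # first 0 of a run: the run becomes a single entry
            in_run = True
    return length


edges = drop_edges(n_in=8, n_out=8, p=0.5, seed=1)
print(len(edges), compressed_length(edges))
```

The higher the drop rate, the longer the runs of "0" tend to be, and the more entries the compression removes.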
[0077] The following sections will describe in detail practical methods for omitting operations (e.g., operations determined to be unnecessary) in neural network operations. According to one or more embodiments of this disclosure, matrix and vector multiplications used in the operational processing of neural networks (e.g., fully connected networks) can be performed quickly and with low power.
[0078] Figure 2 shows an example of a control device.
[0079] Referring to Figure 2, the control device 200 may include a memory 210, an encoding circuit 220, and a decoding circuit 230. The control device 200 may be connected to a neural network circuit 240 that performs deep learning operations on the neural network. In one example, an artificial intelligence system may include the control device 200 and the neural network circuit 240. The control device 200 may receive information output during the operational processing of the neural network circuit 240 and send information generated by the control device 200 to the neural network circuit 240.
[0080] The neural network circuit 240 can receive training image data and perform deep learning or training based on the training image data. The neural network circuit 240 can perform training operations through a neural network including an input layer, hidden layers, and an output layer. Here, the hidden layer may include multiple layers (e.g., a first layer, a second layer, and a third layer). In one example, the input layer of the neural network circuit 240 receives the training image data and outputs, based on it, output data that serves as the input data of the subsequent hidden layer. The output layer of the neural network circuit 240 outputs the image recognition result based on the output data of the previous layer. Non-limiting example operations of the neural network performed by the neural network circuit 240 are described above with reference to Figures 1A and 1B.
[0081] Control device 200 can receive and output data to neural network circuit 240 in a first-in, first-out (FIFO) manner. Neural network circuit 240 can process information layer by layer. In one example, a waiting time may exist for each layer during the information processing of neural network circuit 240. For example, the result of the operation of the first layer may be processed or reprocessed within a predetermined waiting time after the operation of the first layer. The processing or reprocessing of the operation results can be performed by control device 200. Control device 200 can process the operation results of the first layer and send the processed operation results to neural network circuit 240. The processed operation results received by neural network circuit 240 from control device 200 can be used for the operation of the second layer. Control device 200 can sequentially receive data from neural network circuit 240 and sequentially output processed data to neural network circuit 240. In one example, before operating on the current layer of the neural network circuit 240, the control device 200 may receive a data sequence from the neural network circuit 240 as raw data for the operation of the current layer, process the data sequence, and output the processed data sequence to the neural network circuit 240 for the operation of the current layer of the neural network circuit 240.
[0082] The neural network circuit 240 can perform neural network operations. For example, the neural network can be a fully connected network. Nodes included in each layer of a fully connected network can have weights. In a fully connected network, the signal input to the current layer can be output (e.g., output to a subsequent layer) after an operation (e.g., multiplication) with the weight matrix. Here, the signal input to the current layer can be a matrix of size N×1 (N represents the number of nodes in the current layer). Furthermore, the weight matrix multiplied with the signal input to the current layer can be a matrix of size M×N (M represents the number of nodes in the layers after the current layer; N represents the number of nodes in the current layer). The signal output from the current layer can be input to a subsequent layer. Here, the signal output from the current layer can be input to a subsequent layer under the control of the control device 200. For example, the signal output from the current layer can be processed by the control device 200, and the processed signal can be input to a subsequent layer.
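As a minimal sketch of the fully connected operation described above, assuming plain Python lists and illustrative values (the function name `layer_forward` is hypothetical, not from the patent), the M×N weight matrix is multiplied by the N×1 input signal to produce the M×1 signal passed to the subsequent layer:

```python
def layer_forward(weights: list[list[float]], x: list[float]) -> list[float]:
    """Multiply an M x N weight matrix by an N x 1 input signal.

    Returns the M x 1 output signal that would be input to the next layer.
    """
    n = len(x)
    if any(len(row) != n for row in weights):
        raise ValueError("each weight row must have N entries (one per current-layer node)")
    # One dot product per node of the subsequent layer (M rows in total).
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Illustrative 2 x 3 weight matrix (M = 2, N = 3) and 3 x 1 input signal.
W = [[1, 2, 3],
     [4, 5, 6]]
x = [1, 0, 2]
y = layer_forward(W, x)
print(y)  # [7, 16]
```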
[0083] The memory 210 can store a bit stream or sequence of a predetermined size.
[0084] A sequence may include information related to an input feature map and/or information related to the weights of a filter. Such a sequence can be referred to as a "data sequence".
[0085] For example, the sequence may include information about whether nodes in the multiple layers constituting a neural network are connected by edges. More specifically, the sequence may include information indicating the connection or disconnection of multiple edges formed in layers included in the neural network. For example, referring to Figure 1A, the sequence may include information related to an edge sequence indicating the connection of a plurality of edges 130 included in a predetermined layer (e.g., the first edge layer 150).
[0086] Each bit value in a sequence may indicate the connection strength of a predetermined edge. For example, a larger bit value indicates a higher connection strength, and a smaller bit value indicates a lower connection strength. A sequence that indicates the connection strengths of predetermined edges in this way can be called a "data sequence".
[0087] The sequence may also include information associated with a sequence that distinguishes valid bits from invalid bits in the bit string of the data sequence. For example, a value "0" in the bit string of this sequence may indicate that the bit at the corresponding address in the data sequence is an invalid bit. Conversely, a value "1" in the bit string of this sequence may indicate that the bit at the corresponding address in the data sequence is a valid bit. Whether a bit in the data sequence is valid or invalid can be determined by comparing the value of the bit with a predetermined threshold. Hereinafter, the sequence that distinguishes valid and invalid bits in the bit string of the data sequence may be referred to as the "validity determination sequence".
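The thresholding rule above can be sketched as follows. This assumes, as in the Figure 3A examples, that a data sequence is written as a string of decimal digits (each "bit" in the examples is one digit); the helper name `validity_sequence` is hypothetical:

```python
def validity_sequence(data: str, threshold: int) -> str:
    """Mark each digit of a data sequence as valid ("1") or invalid ("0").

    A digit is invalid when its value is less than or equal to the threshold.
    """
    return "".join("1" if int(ch) > threshold else "0" for ch in data)


# Data sequence 310 of Figure 3A with a threshold of 0.
print(validity_sequence("0900310002400781", 0))  # 0100110001100111
```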
[0088] The memory 210 can store the aforementioned data sequence and/or validity determination sequence. The data sequence can be compressed and stored in the memory 210 as a compressed data sequence. Non-limiting examples of the data sequence, the compressed data sequence, and the validity determination sequence are described in detail below with reference to Figure 3A.
[0089] When the neural network circuit 240 terminates or completes the operation of a predetermined layer, the control device 200 can receive the operation result of that layer from the neural network circuit 240. In one example, the operation result of the layer can be the data sequence of the layer.
[0090] The encoding circuit 220 can process the data sequence received by the control device 200 and store the processed sequence in the memory 210. For example, the processed sequence can be a compressed data sequence obtained by compressing the data sequence. As another example, the processed sequence can be a validity determination sequence that distinguishes between valid and invalid bits in the bit string of the compressed data sequence. The encoding circuit 220 can generate the processed sequence corresponding to an operating cycle of the neural network circuit 240 and write the processed sequence into the memory 210. Because the compressed sequence may contain fewer bits than the sequence before compression, the amount of data the encoding circuit 220 writes to the memory 210 is reduced. Therefore, due to the reduced number of writes, the power consumption of the control device 200 in one or more embodiments can be advantageously reduced.
[0091] The decoding circuit 230 can send the processed sequence generated by the encoding circuit 220 to the neural network circuit 240, allowing the neural network circuit 240 to determine (or redetermine) the connection states (e.g., connection strengths) of edges in the neural network. The decoding circuit 230 can read the processed sequence from the memory 210, allowing the control device 200 to sequentially output the bit strings of the processed sequence. Because the compressed sequence may contain fewer bits than the uncompressed sequence, the amount of data the decoding circuit 230 reads from the memory 210 is reduced. Therefore, due to the reduced number of reads, the power consumption of the control device 200 in one or more embodiments can be advantageously reduced.
[0092] Furthermore, the decoding circuit 230 can determine which bits in the bit string of the compressed data sequence will be sent to the neural network circuit, allowing the neural network circuit to omit operations on non-contiguous invalid bits. By omitting operations on non-contiguous invalid bits, the decoding circuit 230 can advantageously improve the processing speed. Non-limiting example operations in which the decoding circuit 230 omits operations on non-contiguous invalid bits are described in detail below with reference to Figures 4A to 5.
[0093] In one embodiment, the control device 200 and the neural network circuit 240 may be included together in the neural network system.
[0094] Figure 3A shows an example of sequences generated by an encoding circuit (e.g., the encoding circuit 220 of Figure 2).
[0095] Referring to Figure 3A, examples of a data sequence 310, compressed data sequences 320 and 340, and validity determination sequences 330 and 350 are shown.
[0096] The data sequence 310 may include information indicating the connection strength of a predetermined edge. The data sequence 310 may include a bit string. Larger bit values included in the bit string may indicate a high connection strength of the predetermined edge, and smaller bit values may indicate a low connection strength of the predetermined edge.
[0097] The data sequence 310 may include valid bits and invalid bits. Whether a bit in the data sequence 310 is valid or invalid can be determined by comparing the value of the bit with a predetermined threshold. When a bit in the data sequence 310 has a value less than or equal to the threshold, the bit can be determined to be invalid. Invalidity may indicate that the edge corresponding to the bit is disconnected. Because a calculation using a bit whose value is less than or equal to the predetermined threshold can be determined to be unnecessary, such a calculation can be omitted by pruning or discarding.
[0098] Encoding circuit 220 can generate compressed data sequences 320 and 340, wherein consecutive invalid bits in the bit string of data sequence 310 are compressed into a single bit.
[0099] In one example of generating the compressed data sequence 320 and the validity determination sequence 330, when the predetermined threshold is "0", bits in the bit string of data sequence 310 with values less than or equal to "0" can be determined as invalid, and bits in the bit string of data sequence 310 with values greater than "0" can be determined as valid. Furthermore, data sequence 310 may include consecutive bits with values less than or equal to the threshold "0". When consecutive bits with values less than or equal to the threshold "0" exist, encoding circuit 220 can generate compressed data sequence 320 by representing said consecutive bits with a single bit value. In one example, the single bit value may indicate the number of consecutive bits in data sequence 310 with values less than or equal to the threshold "0". For example, when data sequence 310 includes three consecutive bits (such as "000") with values less than or equal to the threshold "0", "000" in data sequence 310 may be represented as "3" in compressed data sequence 320. Encoding circuit 220 can compress consecutive invalid bits into a single bit as described above, thereby improving the operating speed of neural network circuit 240. Furthermore, when the value of a bit in data sequence 310 is greater than a predetermined threshold "0", that bit can be included in the compressed data sequence 320. Therefore, encoding circuit 220 can compress data sequence 310 "0900310002400781" to generate compressed data sequence 320 "192313242781".
[0100] In one example of generating the compressed data sequence 340 and the validity determination sequence 350, when the predetermined threshold is "3", bits in the bit string of the data sequence 310 with values less than or equal to "3" can be determined as invalid, and bits with values greater than "3" can be determined as valid. Furthermore, the data sequence 310 may include consecutive bits with values less than or equal to the threshold "3". When consecutive bits with values less than or equal to the threshold "3" exist, the encoding circuit 220 can generate the compressed data sequence 340 by representing the consecutive bits with a single bit value. In this example, the single bit value may indicate the number of consecutive bits in the data sequence 310 with values less than or equal to the threshold "3". For example, when the data sequence 310 includes eight consecutive bits with values less than or equal to the threshold "3" (such as "00310002"), "00310002" in the data sequence 310 can be represented as "8" in the compressed data sequence 340. Encoding circuit 220 can compress consecutive invalid bits into a single bit as described above, thereby improving the operating speed of neural network circuit 240. Furthermore, when the value of a bit in data sequence 310 is greater than a predetermined threshold "3", that bit can be included in the compressed data sequence 340. Therefore, encoding circuit 220 can compress data sequence 310 "0900310002400781" to generate compressed data sequence 340 "19842781".
[0101] The encoding circuit 220 can generate the validity determination sequences 330 and 350, which indicate the valid and invalid bits in the bit strings of the compressed data sequences 320 and 340, respectively.
[0102] The validity determination sequences 330 and 350 can be binary sequences represented by "0" and "1". For example, a value "0" included in the bit strings of validity determination sequences 330 and 350 can indicate that the bit corresponding to the address of the corresponding bit in the compressed data sequences 320 and 340 is an invalid bit. Furthermore, a value "1" included in the bit strings of validity determination sequences 330 and 350 can indicate that the bit corresponding to the address of the corresponding bit in the compressed data sequences 320 and 340 is a valid bit.
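A minimal software sketch of the encoding in paragraphs [0099] to [0102] follows. It assumes digit-string sequences as in Figure 3A and that every invalid run is shorter than ten digits, so each run length fits in one digit; the function name `encode` is hypothetical, and the patent describes a hardware encoding circuit rather than this code:

```python
def encode(data: str, threshold: int) -> tuple[str, str]:
    """Compress runs of invalid digits and build the validity determination sequence.

    Each maximal run of digits <= threshold becomes one digit holding the run
    length (validity "0"); each digit > threshold is copied as-is (validity "1").
    Assumes every invalid run is shorter than 10 digits.
    """
    compressed, validity = [], []
    i = 0
    while i < len(data):
        if int(data[i]) <= threshold:
            j = i
            while j < len(data) and int(data[j]) <= threshold:
                j += 1
            run = j - i
            if run > 9:
                raise ValueError("invalid run too long for a single digit")
            compressed.append(str(run))   # run length of consecutive invalid digits
            validity.append("0")
            i = j
        else:
            compressed.append(data[i])    # valid digit is kept unchanged
            validity.append("1")
            i += 1
    return "".join(compressed), "".join(validity)


print(encode("0900310002400781", 0))  # ('192313242781', '010110110111')
print(encode("0900310002400781", 3))  # ('19842781', '01010110')
```

With threshold "0" this reproduces compressed data sequence 320 and the validity determination sequence shown in Figure 4A; with threshold "3" it reproduces compressed data sequence 340.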
[0103] Decoding circuit 230 can read compressed data sequences 320 and 340 and validity determination sequences 330 and 350 from memory 210. Decoding circuit 230 can determine the bits in the bit string of compressed data sequences 320 and 340 that will be sent to neural network circuit 240 based on validity determination sequences 330 and 350, so that neural network circuit 240 can omit operations on non-contiguous invalid bits.
[0104] Figure 3B shows an example of sequences generated by an encoding circuit (e.g., the encoding circuit 220 of Figure 2).
[0105] Referring to Figure 3B, examples of a data sequence 360, compressed data sequences 365, 375, and 390, and validity determination sequences 370, 380, 385, and 395 are shown.
[0106] The data sequence 360, the compressed data sequence 365, and the validity determination sequences 370 and 385 can be generated through the same operations used to generate the data sequence 310, the compressed data sequences 320 and 340, and the validity determination sequences 330 and 350 of Figure 3A. For example, the encoding circuit 220 can generate the compressed data sequence 365 by representing each run of consecutive bits of the data sequence 360 having values less than or equal to the threshold "0" with a single bit indicating the number of bits in the run.
[0107] Encoding circuit 220 generates compressed data sequences 375 and 390 by compressing consecutive invalid bits in the bit string of data sequence 360 into a single bit and also compressing consecutive valid bits with the same bit value into a single bit.
[0108] For example, the data sequence 360 may include consecutive bits having the same value greater than the threshold "0". When such consecutive bits exist in the data sequence 360, the encoding circuit 220 can generate the compressed data sequences 375 and 390 by representing the consecutive bits with a single bit value. In one example, the single bit value may be the value shared by the consecutive bits in the data sequence 360. For example, when the data sequence 360 includes four consecutive bits with values greater than the threshold "0" (such as "7777"), "7777" in the data sequence 360 may be represented as "7" in the compressed data sequences 375 and 390. As another example, when the data sequence 360 includes three consecutive bits with values greater than the threshold "0" (such as "222"), "222" in the data sequence 360 may be represented as "2" in the compressed data sequences 375 and 390. The encoding circuit 220 can compress consecutive valid bits having the same value into a single bit as described above, thereby improving the operating speed of the neural network circuit 240. Therefore, the encoding circuit 220 can compress the data sequence 360 "100334007777900310002220781" to generate the compressed data sequences 375 and 390 "1234279231321781".
[0109] Encoding circuit 220 can generate validity determination sequences 370 and 385, which indicate valid and invalid bits in the bit strings of compressed data sequences 365 and 375, respectively. Furthermore, encoding circuit 220 can generate validity determination sequence 380, which indicates the number of consecutive valid bits with the same bit value in the bit strings of compressed data sequences 365 and 375.
[0110] For example, the value "0" included in the bit string of the validity determination sequence 385 can indicate that the bit corresponding to the address of the corresponding bit in the compressed data sequence 375 is an invalid bit. In this example, the bit corresponding to the invalid bit in the validity determination sequence 380 has the value "0".
[0111] When the bit value in validity determination sequence 385 is "1", validity determination sequence 380 can be used to determine the number of consecutive valid bits with the same bit value. For example, the value "4" included in the bit string of validity determination sequence 380 can indicate that the bit corresponding to the address of the corresponding bit in compressed data sequence 375 appears four times consecutively in compressed data sequence 365 and data sequence 360.
[0112] Encoding circuit 220 can generate validity determination sequence 395, which indicates the valid and invalid bits in the bit strings of compressed data sequences 365 and 390, and also indicates the number of consecutive valid bits with the same bit value.
[0113] For example, a value "0" included in the bit string of the validity determination sequence 395 can indicate that the bit at the corresponding address in the compressed data sequence 390 is an invalid bit. In this example, the bit of the validity determination sequence 395 corresponding to the invalid bit has the value "0". Furthermore, a non-zero value included in the bit string of the validity determination sequence 395 can indicate that the bit at the corresponding address in the compressed data sequence 390 is a valid bit, and can indicate how many times that bit value occurs consecutively.
[0114] For example, the value "4" included in the bit string of the validity determination sequence 395 indicates that the bit corresponding to the address of the corresponding bit in the compressed data sequence 390 is valid, and indicates that the valid bit appears four times consecutively in both the compressed data sequence 365 and the data sequence 360. Furthermore, the value "0" included in the bit string of the validity determination sequence 395 can indicate that the bit corresponding to the address of the corresponding bit in the compressed data sequence 390 is invalid.
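Extending the same idea to the Figure 3B scheme, equal adjacent valid digits are also merged, and a list of repeat counts plays the role of validity determination sequence 395 (0 for an invalid entry, otherwise the number of repeats). This is a hypothetical illustration under the same assumptions as before (digit strings, invalid runs shorter than ten digits); the function name `encode_with_repeats` is illustrative:

```python
def encode_with_repeats(data: str, threshold: int = 0) -> tuple[str, str, list[int]]:
    """Return (compressed, validity, counts) for a digit-string data sequence.

    compressed: one digit per run (run length for an invalid run, the repeated
                value for a run of equal valid digits), cf. sequences 375/390.
    validity:   "1" for a valid entry, "0" for an invalid run, cf. sequence 385.
    counts:     repeat count for valid entries, 0 for invalid ones, cf. 380/395.
    """
    compressed, validity, counts = [], [], []
    i = 0
    while i < len(data):
        j = i + 1
        if int(data[i]) <= threshold:
            # Extend over every following digit that is also invalid.
            while j < len(data) and int(data[j]) <= threshold:
                j += 1
            if j - i > 9:
                raise ValueError("invalid run too long for a single digit")
            compressed.append(str(j - i))
            validity.append("0")
            counts.append(0)
        else:
            # Extend over following digits equal to this valid digit.
            while j < len(data) and data[j] == data[i]:
                j += 1
            compressed.append(data[i])
            validity.append("1")
            counts.append(j - i)
        i = j
    return "".join(compressed), "".join(validity), counts


c, v, n = encode_with_repeats("100334007777900310002220781")
print(c)  # 1234279231321781
print(v)  # 1011011011010111
print(n)  # [1, 0, 2, 1, 0, 4, 1, 0, 1, 1, 0, 3, 0, 1, 1, 1]
```

The count 4 at the sixth position corresponds to the run "7777", matching the value "4" described for validity determination sequences 380 and 395.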
[0115] Figures 4A to 4G illustrate an example of performing operations in a neural network based on the output of a control device.
[0116] Referring to Figures 4A to 4G, the control device 420 can output data for operations of a neural network (e.g., a fully connected network) to the neural network circuit 440. For example, the data output from the control device 420 can be input data for the current layer of the neural network. Operations can be performed using the input data for the current layer output from the control device 420 and the weight sequence 430 of the current layer. Although Figures 4A to 4G show an example in which the control device 420 outputs the input data for the current layer, in some examples the operation can be performed using the weight sequence 430 of the current layer output from the control device 420, or using both the weight sequence 430 of the current layer and the input data for the current layer output from the control device 420.
[0117] The neural network circuit 440 can perform matrix multiplication operations using processing elements. The neural network circuit 440 can output the result of the operation performed using data output from the control device 420 and the weight sequence 430 as the output of the current layer.
[0118] The decoding circuit 423 may include buffers that sequentially store compressed data sequences and validity determination sequences. The buffer may be a ring buffer.
[0119] The decoding circuit 423 can store a first pointer (e.g., the "c" pointer in Figures 4A to 4G), a second pointer (e.g., the "n" pointer in Figures 4A to 4G), and a third pointer (e.g., the "w" pointer in Figures 4A to 4G). The first pointer indicates the position in the buffer where the current bit of the compressed data sequence to be sent to the neural network circuit is stored, the second pointer indicates the position in the buffer where the bit of the compressed data sequence to be sent to the neural network circuit in the next cycle is stored, and the third pointer indicates the position in the buffer where the compressed data sequence and the validity determination sequence are to be stored. Here, the first pointer can be called the current pointer, the second pointer the next pointer, and the third pointer the write pointer.
[0120] The decoding circuit 423 can determine the bits in the bit string of the compressed data sequence that will be sent to the neural network circuit 440 based on the validity determination sequence, so that the neural network circuit omits operations on non-contiguous invalid bits.
[0121] The decoding circuit 423 can read the compressed data sequence and the validity determination sequence from the memory and sequentially store them in the buffer in a FIFO manner. Specifically, the decoding circuit 423 can read, from the memory, the bit of the compressed data sequence and the validity determination sequence indicated by a read pointer (e.g., the "r" pointer in Figures 4A to 4G), and write that bit into the buffer at the position indicated by the third (or write) pointer. The decoding circuit 423 can then move the read pointer and the third pointer by one position.
[0122] The decoding circuit 423 can move the first (or current) pointer and the second (or next) pointer one position when the bit value corresponding to the second pointer in the validity determination sequence is "1", and can move the first pointer and the second pointer two positions when the bit value corresponding to the second pointer in the validity determination sequence is "0".
[0123] The decoding circuit 423 can determine the bit value corresponding to the first pointer in the compressed data sequence and the validity determination sequence as the bit to be output.
[0124] Referring to Figure 4A, an example is shown in which a compressed data sequence "192313242781" and a validity determination sequence "010110110111" are output from the control device 420 to the neural network circuit 440. The compressed data sequence "192313242781" and the validity determination sequence "010110110111" can be generated by the encoding circuit 421 based on the data sequence 410 and written into the memory.
[0125] The decoding circuit 423 can provide, as data inputs of a multiplexer, the value "1" and the value obtained by adding "1" to the bit value corresponding to the second pointer in the compressed data sequence, and can provide the bit value corresponding to the second pointer in the validity determination sequence as the control signal of the multiplexer.
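The current/next pointer rule and the multiplexer output of paragraphs [0122] to [0125] can be modeled in software roughly as follows. This is a hypothetical sketch: it uses a flat Python list instead of a ring buffer, omits the read and write pointers, ignores the one-cycle pipeline delays of the figures, and, following the walk-through of Figures 4B to 4G (which outputs "1", "8", "7", ... first), consumes the buffered sequences starting from the rightmost digit. The function name `decode_pointers` is illustrative:

```python
def decode_pointers(compressed: str, validity: str) -> list[tuple[int, int]]:
    """Return (data value, weight address) for each entry sent to the NN circuit.

    c is the current pointer and n = c + 1 the next pointer. If validity[n] is
    "1", c and n advance by one and the multiplexer sends 1 to the address
    counter; if it is "0", they advance by two (skipping the invalid entry) and
    the multiplexer sends 1 + compressed[n], the length of the skipped run.
    Weight addresses are 0-based. Relies on the encoder having merged adjacent
    invalid digits, so an invalid entry is always followed by a valid one.
    """
    data = [int(ch) for ch in reversed(compressed)]
    valid = [ch == "1" for ch in reversed(validity)]
    if not data:
        return []
    c, addr = 0, 0
    if not valid[0]:              # leading invalid run: skip it before the first output
        c, addr = 1, data[0]
    out = []
    while c < len(data):
        out.append((data[c], addr))            # entry under the current pointer
        n = c + 1
        if n >= len(data):
            break
        mux_out = 1 if valid[n] else 1 + data[n]  # value sent to the address counter
        addr += mux_out
        c += 1 if valid[n] else 2
    return out


pairs = decode_pointers("192313242781", "010110110111")
print([v for v, _ in pairs])         # [1, 8, 7, 4, 2, 1, 3, 9]
print([a + 1 for _, a in pairs])     # [1, 2, 3, 6, 7, 11, 12, 15]
```

The 1-based addresses 1, 2, 3, 6, 7, and 11 match the weight bits that the address counter indicates in Figures 4B to 4G.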
[0126] Referring to Figure 4B, when the bit value corresponding to the second pointer in the validity determination sequence is "1" in the cycle of Figure 4A, the decoding circuit 423 can move the first pointer and the second pointer by one position.
[0127] In Figure 4B, the current address counter of the weight sequence 430 can indicate the address corresponding to the first bit of the weight sequence 430. When the bit value corresponding to the second pointer in the validity determination sequence is "1" in the cycle of Figure 4A, the decoding circuit 423 can send "1", as the output of the multiplexer, to the address counter. The value sent to the address counter can indicate the difference between the bit address of the weight sequence 430 for the current operation and the bit address of the weight sequence 430 for the next operation. For example, when the value "1" is sent to the address counter, the bit one position later participates in the next operation.
[0128] In Figure 4B, the decoding circuit 423 can identify the bit value of the bit waiting to be output in the compressed data sequence as "1" (e.g., when the bit corresponding to the first pointer in the compressed data sequence is "1" in the cycle of Figure 4A).
[0129] In Figure 4B, when the bit value of the bit waiting to be output in the validity determination sequence is "1" in the cycle of Figure 4A, the decoding circuit 423 can send the corresponding bit of the compressed data sequence to the neural network circuit 440. When that bit value is "0", the decoding circuit 423 may not send the corresponding bit of the compressed data sequence to the neural network circuit 440.
[0130] Referring to Figure 4C, when the bit value corresponding to the second pointer in the validity determination sequence is "1" in the cycle of Figure 4B, the decoding circuit 423 can move the first pointer and the second pointer by one position.
[0131] In Figure 4C, the current address counter of the weight sequence 430 can indicate the address corresponding to the second bit of the weight sequence 430. When the bit value corresponding to the second pointer in the validity determination sequence is "1" in the cycle of Figure 4B, the decoding circuit 423 can send "1" to the address counter.
[0132] In Figure 4C, the decoding circuit 423 can identify the bit value of the bit waiting to be output in the compressed data sequence as "8" (e.g., when the bit value corresponding to the first pointer in the compressed data sequence is "8" in the cycle of Figure 4B). The bit value "1" waiting to be output in the compressed data sequence in Figure 4B and the first bit "1" of the weight sequence 430 indicated by the address counter in Figure 4B can participate in the operation of the neural network circuit 440 in the cycle of Figure 4C.
[0133] Therefore, in Figure 4C, the bit value "1" of the compressed data sequence output from the decoding circuit 423 and the bit value "1" output from the weight sequence 430 can be input into the neural network circuit 440. The neural network circuit 440 can store the value "1" obtained by multiplying the bit value "1" of the compressed data sequence by the bit value "1" of the weight sequence 430.
[0134] Referring to Figure 4D, when the bit value corresponding to the second pointer in the validity determination sequence is "0" in the cycle of Figure 4C, the decoding circuit 423 can move the first pointer and the second pointer by two positions.
[0135] In Figure 4D, the current address counter of the weight sequence 430 can indicate the address corresponding to the third bit of the weight sequence 430. When the bit value corresponding to the second pointer in the validity determination sequence is "0" in the cycle of Figure 4C, the decoding circuit 423 can send "3", the value obtained by adding "1" to the bit value "2" corresponding to the second pointer in the compressed data sequence, to the address counter.
[0136] In Figure 4D, the decoding circuit 423 can identify the bit value of the bit waiting to be output in the compressed data sequence as "7" (e.g., when the bit value corresponding to the first pointer in the compressed data sequence is "7" in the cycle of Figure 4C). The bit value "8" waiting to be output in the compressed data sequence in Figure 4C and the second bit "3" of the weight sequence 430 indicated by the address counter in Figure 4C can participate in the operation of the neural network circuit 440 in the cycle of Figure 4D.
[0137] Therefore, in Figure 4D, the bit value "8" of the compressed data sequence output from the decoding circuit 423 and the bit value "3" output from the weight sequence 430 can be input into the neural network circuit 440. The neural network circuit 440 can obtain "24" by multiplying the bit value "8" of the compressed data sequence by the bit value "3" of the weight sequence 430, and can store the value "25" obtained by adding "24" to the intermediate result "1" stored in the cycle of Figure 4C. (For example, the above multiplication and addition operations can be implemented by a multiply-accumulator (MAC), but are not limited thereto.)
[0138] Referring to Figure 4E, when the bit value corresponding to the second pointer in the validity determination sequence is "1" in the cycle of Figure 4D, the decoding circuit 423 can move the first pointer and the second pointer by one position.
[0139] In Figure 4E, the current address counter of the weight sequence 430 can indicate the address corresponding to the sixth bit of the weight sequence 430. When the bit value corresponding to the second pointer in the validity determination sequence is "1" in the cycle of Figure 4D, the decoding circuit 423 can send "1" to the address counter.
[0140] In Figure 4E, the decoding circuit 423 can identify the bit value of the bit waiting to be output in the compressed data sequence as "4" (e.g., when the bit value corresponding to the first pointer in the compressed data sequence is "4" in the cycle of Figure 4D). The bit value "7" waiting to be output in the compressed data sequence in Figure 4D and the third bit "5" of the weight sequence 430 indicated by the address counter in Figure 4D can participate in the operation of the neural network circuit 440 in the cycle of Figure 4E.
[0141] Therefore, in Figure 4E, the bit value "7" of the compressed data sequence output from the decoding circuit 423 and the bit value "5" output from the weight sequence 430 can be input into the neural network circuit 440. The neural network circuit 440 can obtain "35" by multiplying the bit value "7" of the compressed data sequence by the bit value "5" of the weight sequence 430, and can store the value "60" obtained by adding "35" to the intermediate result "25" stored in the cycle of Figure 4D.
[0142] Referring to Figure 4F, when the bit value corresponding to the second pointer in the validity determination sequence is "0" in the cycle of Figure 4E, the decoding circuit 423 can move the first pointer and the second pointer by two positions.
[0143] In Figure 4F, the current address counter of the weight sequence 430 can indicate the address corresponding to the seventh bit of the weight sequence 430. When the bit value corresponding to the second pointer in the validity determination sequence is "0" in the cycle of Figure 4E, the decoding circuit 423 can send "4", the value obtained by adding "1" to the bit value "3" corresponding to the second pointer in the compressed data sequence, to the address counter.
[0144] In Figure 4F, the decoding circuit 423 can identify the bit value of the bit waiting to be output in the compressed data sequence as "2" (e.g., when the bit value corresponding to the first pointer in the compressed data sequence is "2" in the cycle of Figure 4E). The bit value "4" waiting to be output in the compressed data sequence in Figure 4E and the sixth bit "0" of the weight sequence 430 indicated by the address counter in Figure 4E can participate in the operation of the neural network circuit 440 in the cycle of Figure 4F.
[0145] However, in Figure 4F, because the bit of the weight sequence 430 participating in the operation is "0", that bit can be determined to be invalid and therefore not sent to the neural network circuit 440. Accordingly, the neural network circuit 440 may not perform a multiplication with the compressed data sequence. Thus, the bit value indicated by the data sequence of the weight sequence 430 stored in the neural network circuit 440 can be kept at "5", and the bit value indicated by the validity determination sequence of the weight sequence 430 can be set to "0". Furthermore, the neural network circuit 440 can retain the intermediate result value "60" already stored in the cycle of Figure 4E.
[0146] Referring to Figure 4G, when the bit value corresponding to the second pointer in the validity determination sequence is "1" in the cycle of Figure 4F, the decoding circuit 423 can move the first pointer and the second pointer by one position.
[0147] In Figure 4G, the current address counter of the weight sequence 430 can indicate the address corresponding to the eleventh bit of the weight sequence 430. When the bit value corresponding to the second pointer in the validity determination sequence is "1" in the cycle of Figure 4F, the decoding circuit 423 can send "1" to the address counter.
[0148] In Figure 4G, the decoding circuit 423 can identify the bit value of the bit waiting to be output in the compressed data sequence as "1" (e.g., when the bit corresponding to the first pointer in the compressed data sequence is "1" in the cycle of Figure 4F). The bit value "2" waiting to be output in the compressed data sequence in Figure 4F and the seventh bit "4" of the weight sequence 430 indicated by the address counter in Figure 4F can participate in the operation of the neural network circuit 440 in the cycle of Figure 4G.
[0149] Therefore, in Figure 4G, the bit value "2" of the compressed data sequence output from the decoding circuit 423 and the bit value "4" output from the weight sequence 430 can be input into the neural network circuit 440. The neural network circuit 440 can obtain "8" by multiplying the bit value "2" of the compressed data sequence by the bit value "4" of the weight sequence 430, and can store the value "68" obtained by adding "8" to the intermediate result "60" stored in the cycle of Figure 4F.
[0150] In the example described with reference to Figures 4A to 4G, invalid bits may not be output to the neural network circuit 440. Therefore, the control device of one or more embodiments may be configured to omit operations not only on consecutive invalid bits but also on non-consecutive invalid bits.
[0151] Figure 5 illustrates an example of performing operations in a neural network based on the output of a control device.
[0152] Referring to Figure 5, the decoding circuit 510 of the control device can output data used to train a neural network (e.g., a fully connected network) to the neural network circuit 520. The data output from the decoding circuit 510 can be the input data for the current layer of the neural network. The neural network circuit 520 can perform operations using the input data for the current layer output from the decoding circuit 510 and the weight data 530, 540, 550, and 560 for the current layer. Such operations may include multiplication between a weight matrix (e.g., a 4×16 matrix) and an input matrix (e.g., a 16×1 matrix). The neural network circuit 520 (also referred to as the training circuit 520) can use multiple processing elements to perform the matrix multiplication, with data moving sequentially from left to right among the processing elements.
[0153] The operation processing of Figure 5 can be performed by iteratively executing the operation processing described with reference to Figures 4A to 4G a number of times corresponding to the number of items in the weight data.
[0154] Figure 6 shows an example of zero gating.
[0155] Referring to Figure 6, the validity determination sequence can be used as a clock gating signal for the operation of the neural network circuit. The decoding circuit of the control device can determine whether the current bit corresponding to the first pointer is valid, and if the current bit is invalid (e.g., if its bit value in the validity determination sequence is "0"), the current bit may not be sent to the neural network circuit. The decoding circuit can enable the operation only when the data input to the neural network circuit 440 is valid, thereby reducing the power consumption of the neural network circuit.
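The skipping behavior walked through in Figures 4A to 4G can be approximated in software. The following is a minimal Python sketch, not the circuit itself: it assumes that each maximal run of invalid values (values not greater than a threshold, following claim 11) is replaced by one entry holding the run length, that the weight sequence is left uncompressed and indexed through an address counter, and that a zero weight also suppresses the multiplication. The names encode and sparse_dot are hypothetical. When the entry at the second pointer is invalid, both pointers jump by two and the address counter advances by the run length plus one, mirroring claim 9.

```python
from typing import List, Tuple


def encode(data: List[int], threshold: int = 0) -> Tuple[List[int], List[int]]:
    """Compress every run of invalid values (value <= threshold) into a single
    entry holding the run length; valid values are copied unchanged.
    Returns (compressed data sequence, validity determination sequence)."""
    compressed: List[int] = []
    validity: List[int] = []
    run = 0
    for value in data:
        if value > threshold:
            if run:  # close the pending run of invalid values
                compressed.append(run)
                validity.append(0)
                run = 0
            compressed.append(value)
            validity.append(1)
        else:
            run += 1
    if run:
        compressed.append(run)
        validity.append(0)
    return compressed, validity


def sparse_dot(compressed: List[int], validity: List[int],
               weights: List[int]) -> Tuple[int, int]:
    """Multiply-accumulate valid entries with an uncompressed weight sequence.
    Returns (accumulated result, number of multiplications performed)."""
    acc, macs = 0, 0
    addr = 0  # address counter into the weight sequence
    p1 = 0    # first pointer: entry sent to the NN circuit this cycle
    if validity and validity[0] == 0:  # the stream starts with an invalid run
        addr = compressed[0]
        p1 = 1
    while p1 < len(compressed):
        weight = weights[addr]
        if weight != 0:  # a zero weight is treated as invalid: no multiplication
            acc += compressed[p1] * weight
            macs += 1
        p2 = p1 + 1  # second pointer: entry for the next cycle
        if p2 < len(compressed) and validity[p2] == 0:
            addr += compressed[p2] + 1  # skip the run, plus one for this position
            p1 += 2                     # jump over the invalid entry
        else:
            addr += 1
            p1 += 1
    return acc, macs
```

For the data [5, 0, 0, 0, 0, 2, 0, 1], encode yields the entries [5, 4, 2, 1, 1] with validity [1, 0, 1, 0, 1], so the decoder visits three entries instead of eight positions. Because the encoder keeps runs maximal, two invalid entries are never adjacent, which is what lets a two-position jump always land on a valid entry.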
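Repeating the single-sequence procedure once per weight row gives the matrix-vector product of Figure 5. The Python sketch below assumes a 4×16 weight matrix stored as rows and a 16-element input whose zero runs are compressed as in the earlier examples; for brevity it walks the compressed entries directly instead of modeling the first and second pointers. The names compress and matvec_skip_zeros are hypothetical and do not describe the processing-element array of Figure 5.

```python
from typing import List, Tuple


def compress(x: List[int]) -> Tuple[List[int], List[int]]:
    """Run-length encode zero runs of x: each run becomes one entry holding
    its length (validity 0); non-zero values are kept (validity 1)."""
    entries: List[int] = []
    validity: List[int] = []
    run = 0
    for value in x:
        if value == 0:
            run += 1
            continue
        if run:
            entries.append(run)
            validity.append(0)
            run = 0
        entries.append(value)
        validity.append(1)
    if run:
        entries.append(run)
        validity.append(0)
    return entries, validity


def matvec_skip_zeros(weight_rows: List[List[int]],
                      x: List[int]) -> Tuple[List[int], int]:
    """Compute weight_rows @ x with one pass over the compressed input per
    weight row. Returns (output vector, number of multiplications done)."""
    entries, validity = compress(x)
    out: List[int] = []
    macs = 0
    for row in weight_rows:  # one iteration per item of weight data
        acc, addr = 0, 0     # addr plays the role of the address counter
        for value, valid in zip(entries, validity):
            if valid:
                acc += value * row[addr]
                macs += 1
                addr += 1
            else:
                addr += value  # jump over the whole run of zeros
        out.append(acc)
    return out, macs
```

With an input that has three non-zero values out of sixteen, each of the four rows costs three multiplications, i.e. 12 in total instead of 64.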
[0156] As described above with reference to Figures 4A to 4G, when the bit value corresponding to the second pointer in the validity determination sequence is "0", the decoding circuit can move the first pointer and the second pointer by two positions. Therefore, the current bit corresponding to the first pointer usually has a valid value.
[0157] However, because the first and second pointers cannot move past the third pointer, the current bit corresponding to the first pointer may have an invalid value in certain circumstances.
[0158] For example, in Example 610, the bit value corresponding to the second pointer (e.g., "N") in the compressed data sequence can be invalid, so the first pointer (e.g., "C") and the second pointer can move by two positions. However, in Example 620, the first and second pointers can move by only one position so as not to pass the third pointer (e.g., "W"). In Figure 6, "CG" can represent the clock gating signal, and "JP" can represent the number of positions by which the first and second pointers have moved.
[0159] Therefore, in Example 620, the current bit corresponding to the first pointer can have an invalid value. In Example 630, the current bit corresponding to the first pointer can be determined to be invalid by the clock gating signal "1", so instead of sending the current bit to the neural network circuit, the decoding circuit holds "9", the value of the previous bit of the data sequence.
[0160] Each of Examples 610, 620, and 630 may include a data sequence in the first row and a validity determination sequence in the second row.
[0161] Figure 7 illustrates an example of increasing data reuse by storing the range of data used in iterations.
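The interaction between the pointer jump, the third pointer, and clock gating in Examples 610 to 630 can be modeled cycle by cycle. The Python sketch below assumes that the third pointer equals the number of entries already written into the buffer, that a two-position jump is allowed only when the entry after the invalid one has already been written, and that CG = 1 means the clock is gated so the neural network circuit keeps its previous input. The function name decode_cycles and the sample values are hypothetical, not the data of Figure 6.

```python
from typing import List, Optional, Tuple


def decode_cycles(entries: List[int], validity: List[int],
                  third_ptr_per_cycle: List[int]
                  ) -> List[Tuple[Optional[int], int, int]]:
    """Return, for each cycle, (value at the NN circuit input, CG, JP).

    third_ptr_per_cycle[t] is the third pointer at cycle t: the number of
    buffer entries written so far. The first pointer never passes it.
    """
    p1 = 0                      # first pointer
    held: Optional[int] = None  # value latched at the NN circuit input
    trace: List[Tuple[Optional[int], int, int]] = []
    for third in third_ptr_per_cycle:
        if p1 >= third:         # nothing new in the buffer: stall, clock gated
            trace.append((held, 1, 0))
            continue
        cg = 0 if validity[p1] else 1
        if cg == 0:
            held = entries[p1]  # valid current bit: send it
        # otherwise the clock is gated and the previous value is kept
        p2 = p1 + 1             # second pointer
        # Jump over an invalid next entry only if the entry after it is
        # already in the buffer; otherwise move one position (Example 620).
        jp = 2 if p1 + 2 < third and validity[p2] == 0 else 1
        p1 += jp
        trace.append((held, cg, jp))
    return trace
```

With entries [6, 3, 7] and validity [1, 0, 1], a fully written buffer gives JP = 2 in the first cycle, as in Example 610. If only two entries are written in the first cycle, the pointers move by one, land on the invalid entry, and the next cycle has CG = 1 while the value 6 is held, as in Examples 620 and 630.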
[0162] Referring to Figure 7, when the data sequence includes multiple pieces of reused data, the decoding circuit can store a fourth pointer to identify the reused data. Here, the fourth pointer can also be called an iteration pointer.
[0163] When a data sequence to be reused is inserted into the buffer, the decoding circuit can place multiple fourth pointers representing the iteration intervals, thereby facilitating iteration. When a run of invalid data extends beyond the iteration range, the decoding circuit can split the run into two segments and store the segments separately in the buffer.
[0164] The decoding circuit can store a fourth pointer indicating the range of an iteration and, after reusing the data in that range the maximum number of times, iteratively decode the data of the next iteration interval.
[0165] Figure 8 illustrates an example of using zero gating in a systolic array to reduce power consumption.
[0166] Referring to Figure 8, the encoding circuit of the control device can compress the input data and the weight data for the current layer separately, based on the range of reuse. In Figure 8, "A" can represent the bit value of the data sequence, and "B" can represent the bit value of the corresponding bit of the weight sequence.
[0167] Figure 9 illustrates an example of controlling data input and output when data is stored in parallel.
[0168] Referring to Figure 9, when the memory has a large bit width, data can be stored in parallel. Data stored in parallel is likely to include different numbers of consecutive "0"s at the same address, so the decoding circuit can insert dummy values so that every sequence matches the sequence with the longest range. In one example, when multiple compressed data sequences are read in parallel, the decoding circuit can add bits to each compressed data sequence so that all of them have the same length.
[0169] In this way, the neural network circuit can omit operations on the non-consecutive invalid bits shared by the data stored in parallel.
[0170] Figure 10 illustrates examples of applications of the method for controlling data input and output.
[0171] The method for controlling data input and output is applicable to any scheme that reads consecutive data sequentially.
[0172] Referring to Figure 10, the method by which a control device connected to a neural network circuit performing deep learning operations controls data input and output can also be applied to the systolic array 1010. In Figure 10, "PE" can represent a processing element.
[0173] Furthermore, the method by which a control device connected to a neural network circuit performing deep learning operations controls data input and output can also be applied to the adder tree architecture 1020.
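The role of the fourth (iteration) pointer can be illustrated with a small model. The Python sketch below assumes fixed-length iteration intervals of interval positions, compresses zero runs as before but splits any run that would cross an interval boundary into two entries, and records the index of the first compressed entry of each interval as a fourth pointer, with a final pointer marking the end. The replay step then emits the valid values of each interval reuse times before moving on. The names encode_with_iteration_ptrs and replay, and the fixed interval length, are hypothetical.

```python
from typing import List, Tuple


def encode_with_iteration_ptrs(data: List[int], interval: int
                               ) -> Tuple[List[int], List[int], List[int]]:
    """Run-length encode zero runs, splitting any run that would cross an
    iteration boundary, and record where each interval starts (fourth
    pointers). The last fourth pointer marks the end of the sequence."""
    entries: List[int] = []
    validity: List[int] = []
    iter_ptrs: List[int] = []
    run = 0

    def flush_run() -> None:
        nonlocal run
        if run:
            entries.append(run)
            validity.append(0)
            run = 0

    for i, value in enumerate(data):
        if i % interval == 0:
            flush_run()                     # split a zero run at the boundary
            iter_ptrs.append(len(entries))  # fourth pointer for this interval
        if value != 0:
            flush_run()
            entries.append(value)
            validity.append(1)
        else:
            run += 1
    flush_run()
    iter_ptrs.append(len(entries))
    return entries, validity, iter_ptrs


def replay(entries: List[int], validity: List[int], iter_ptrs: List[int],
           reuse: int) -> List[int]:
    """Emit the valid values of each interval `reuse` times before
    decoding the next interval."""
    out: List[int] = []
    for start, end in zip(iter_ptrs, iter_ptrs[1:]):
        window = list(zip(entries[start:end], validity[start:end]))
        for _ in range(reuse):
            out.extend(value for value, valid in window if valid)
    return out
```

For the data [1, 0, 0, 0, 2, 0] with an interval of 3, the run of three zeros crosses the boundary and is stored as two entries (2 and 1), so each interval can be replayed from its own fourth pointer without decoding part of a neighboring run.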
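The padding step for sequences read in parallel can be sketched as follows. This Python example assumes that each lane is compressed independently with the same zero-run encoding as above, and that a dummy entry is an invalid entry with run length 0, so decoding it consumes no positions. The names compress, pad_lanes, and decompress, and the choice of dummy value, are assumptions made for illustration.

```python
from typing import List, Tuple

Lane = Tuple[List[int], List[int]]  # (compressed entries, validity bits)


def compress(x: List[int]) -> Lane:
    """Replace each run of zeros with one invalid entry holding its length."""
    entries: List[int] = []
    validity: List[int] = []
    run = 0
    for value in x:
        if value == 0:
            run += 1
            continue
        if run:
            entries.append(run)
            validity.append(0)
            run = 0
        entries.append(value)
        validity.append(1)
    if run:
        entries.append(run)
        validity.append(0)
    return entries, validity


def pad_lanes(lanes: List[Lane]) -> List[Lane]:
    """Append dummy entries (invalid, run length 0) so that every lane is as
    long as the longest one and the lanes can be read in lockstep."""
    longest = max((len(entries) for entries, _ in lanes), default=0)
    padded: List[Lane] = []
    for entries, validity in lanes:
        extra = longest - len(entries)
        padded.append((entries + [0] * extra, validity + [0] * extra))
    return padded


def decompress(entries: List[int], validity: List[int]) -> List[int]:
    """Expand a lane back to its original values (dummy entries add nothing)."""
    out: List[int] = []
    for entry, valid in zip(entries, validity):
        out.extend([entry] if valid else [0] * entry)
    return out
```

Because a dummy entry expands to zero positions, padding changes only the number of entries per lane, not the data each lane represents.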
[0174] The control devices, memories, encoding circuits, decoding circuits, neural network circuits, systolic arrays, adder tree architectures, address counters, accumulators, data control devices, weight control devices, input/weight control devices, output control devices, control device 200, memory 210, encoding circuit 220, decoding circuit 230, neural network circuit 240, control device 420, encoding circuit 421, decoding circuit 423, neural network circuit 440, decoding circuit 510, neural network circuit 520, systolic array 1010, adder tree architecture 1020, devices, units, modules, apparatuses, and other components described with respect to Figures 1 to 10 are implemented by or representative of hardware components. Examples of hardware components that can be used to perform the operations described in this application include, where appropriate, controllers, sensors, generators, drivers, memories, comparators, arithmetic logic units, adders, subtractors, multipliers, dividers, integrators, and any other electronic components configured to perform the operations described in this application. In other examples, one or more of the hardware components that perform the operations described herein are implemented by computing hardware (e.g., one or more processors or computers). A processor or computer may be implemented by one or more processing elements (such as an array of logic gates, a controller and an arithmetic logic unit, a digital signal processor, a microcomputer, a programmable logic controller, a field-programmable gate array, a programmable logic array, a microprocessor, or any other device or combination of devices configured to respond to and execute instructions in a defined manner to achieve a desired result). In one example, a processor or computer includes, or is connected to, one or more memories storing instructions or software that are executed by the processor or computer.
Hardware components implemented by a processor or computer can execute instructions or software (such as an operating system (OS) and one or more software applications that run on the OS) to perform the operations described herein. The hardware components may also access, manipulate, process, create, and store data in response to the execution of the instructions or software. For the sake of brevity, the singular term "processor" or "computer" is used in the description of the examples described herein; however, in other examples, multiple processors or computers may be used, or a processor or computer may include multiple processing elements, multiple types of processing elements, or both. For example, a single hardware component, or two or more hardware components, may be implemented by a single processor, by two or more processors, or by a processor and a controller. One or more hardware components may be implemented by one or more processors, or by a processor and a controller, and one or more other hardware components may be implemented by one or more other processors, or by another processor and another controller. One or more processors, or a processor and a controller, may implement a single hardware component or two or more hardware components. The hardware components can have any one or more of different processing configurations, examples of which include a single processor, discrete processors, parallel processors, single-instruction single-data (SISD) multiprocessing, single-instruction multiple-data (SIMD) multiprocessing, multiple-instruction single-data (MISD) multiprocessing, and multiple-instruction multiple-data (MIMD) multiprocessing.
[0175] The methods illustrated in Figures 1 to 10 that perform the operations described in this application are performed by computing hardware (e.g., by one or more processors or computers) implemented as described above to execute instructions or software to perform the operations described in this application that are performed by the methods. For example, a single operation, or two or more operations, may be performed by a single processor, by two or more processors, or by a processor and a controller. One or more operations may be performed by one or more processors, or by a processor and a controller, and one or more other operations may be performed by one or more other processors, or by another processor and another controller. One or more processors, or a processor and a controller, may perform a single operation or two or more operations.
[0176] Instructions or software for controlling computing hardware (e.g., one or more processors or computers) to implement hardware components and perform the methods described above can be written as computer programs, code segments, instructions, or any combination thereof to individually or collectively instruct or configure one or more processors or computers to operate as machines or special-purpose computers to perform operations performed by the hardware components and methods described above. In one example, the instructions or software include machine code (such as machine code generated by a compiler) that is directly executed by one or more processors or computers. In another example, the instructions or software include high-level code that is executed by one or more processors or computers using an interpreter. The instructions or software can be written using any programming language based on the block diagrams and flowcharts shown in the accompanying drawings and the corresponding description used herein, which disclose algorithms for performing operations performed by the hardware components and methods described above.
[0177] Instructions or software used to control computing hardware (e.g., one or more processors or computers) to implement the hardware components and perform the methods described above, along with any associated data, data files, and data structures, may be recorded, stored, or fixed in or on one or more non-transitory computer-readable storage media. Examples of non-transitory computer-readable storage media include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, hard disk drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card micro or a card, e.g., a Secure Digital (SD) or Extreme Digital (XD) card), magnetic tapes, floppy disks, magneto-optical data storage devices, optical data storage devices, hard disks, solid-state disks, and any other device configured to store the instructions or software and any associated data, data files, and data structures in a non-transitory manner and to provide the instructions or software and any associated data, data files, and data structures to one or more processors or computers so that the one or more processors or computers can execute the instructions. In one example, the instructions or software and any associated data, data files, and data structures are distributed over network-coupled computer systems so that the instructions and software and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by the one or more processors or computers.
[0178] While this disclosure includes specific examples, it will be clear upon understanding this disclosure that various changes in form and detail may be made to these examples without departing from the spirit and scope of the claims and their equivalents. The examples described herein are to be considered descriptive only and not for limiting purposes. The description of features or aspects in each example is to be considered applicable to similar features or aspects in other examples. Suitable results may be obtained if the described techniques are performed in a different order, and / or if components in the described system, architecture, apparatus, or circuit are combined in a different manner and / or replaced or supplemented by other components or their equivalents. Therefore, the scope of this disclosure is not limited by the specific embodiments but by the claims and their equivalents, and all variations within the scope of the claims and their equivalents shall be construed as included in this disclosure.
Claims
1. A control device for controlling data input and output of a neural network circuit, comprising: a memory; an encoding circuit configured to: receive a data sequence, generate a compressed data sequence in which consecutive invalid bits in a bit string of the data sequence are compressed into a single bit of the compressed data sequence, generate a validity determination sequence indicating valid bits and invalid bits in the bit string of the data sequence to be compressed, and write the compressed data sequence and the validity determination sequence into the memory; and a decoding circuit configured to: read the compressed data sequence and the validity determination sequence from the memory, and determine, based on the validity determination sequence, bits in the bit string of the compressed data sequence to be sent to the neural network circuit, such that the neural network circuit omits operations on invalid bits; wherein the decoding circuit includes a buffer configured to sequentially store the compressed data sequence and the validity determination sequence, and the decoding circuit is further configured to store a first pointer and a second pointer, the first pointer indicating a position in the buffer where a current bit of the compressed data sequence to be sent to the neural network circuit is stored, and the second pointer indicating a position in the buffer where a next bit of the compressed data sequence to be sent to the neural network circuit in a next cycle is stored.
The decoding circuit is further configured to: determine, based on the validity determination sequence, whether the next bit corresponding to the second pointer is valid; in response to the next bit being valid, move the first pointer to the position in the buffer where the next bit is stored; and in response to the next bit being invalid, move the first pointer to the position in the buffer where the bit to be sent to the neural network circuit in the next cycle is stored.
2. The control device according to claim 1, wherein the single bit of the compressed data sequence indicates the number of consecutive invalid bits in the bit string of the data sequence.
3. The control device according to claim 1 or claim 2, wherein, to determine the bits to be sent to the neural network circuit, the decoding circuit is further configured to: determine, based on the validity determination sequence, whether the current bit corresponding to the first pointer is valid; in response to the current bit being invalid, skip sending the current bit to the neural network circuit; and in response to the current bit being valid, send the current bit to the neural network circuit.
4. The control device according to claim 1 or claim 2, wherein the decoding circuit is further configured to: determine, based on the validity determination sequence, whether the next bit corresponding to the second pointer is valid; in response to the next bit being valid, move the second pointer to the position in the buffer where the bit to be sent to the neural network circuit in the next cycle is stored; and in response to the next bit being invalid, move the second pointer to the position in the buffer where the bit to be sent to the neural network circuit in the cycle after the next cycle is stored.
5. The control device according to claim 1 or claim 2, wherein the decoding circuit is further configured to determine, based on the validity determination sequence, to skip operation processing of the neural network circuit.
6. The control device according to claim 1 or claim 2, wherein the decoding circuit is further configured to determine, based on the next bit corresponding to the second pointer, whether to skip operation processing of the neural network circuit.
7. The control device according to claim 1 or claim 2, wherein the decoding circuit is further configured to: determine, based on the validity determination sequence, whether the next bit corresponding to the second pointer is valid; in response to the next bit being valid, not skip operation processing of the neural network circuit; and in response to the next bit being invalid, skip operation processing of the neural network circuit.
8. The control device according to claim 7, wherein the decoding circuit is further configured to skip the value of the next bit in response to the next bit being invalid.
9. The control device according to claim 7, wherein the decoding circuit is further configured to, in response to the next bit being invalid, send to an address counter a value obtained by adding 1 to the value of the next bit.
10. The control device according to claim 1 or claim 2, wherein the decoding circuit is further configured to store a third pointer indicating the position in the buffer where the compressed data sequence and the validity determination sequence are to be stored.
11. The control device according to claim 1 or claim 2, wherein a valid bit is a bit having a value greater than a predetermined threshold, and an invalid bit is a bit having a value less than or equal to the predetermined threshold.
12. The control device according to claim 1 or claim 2, wherein the bit value at a position corresponding to a valid bit in the compressed data sequence is "1", and the bit value at a position corresponding to an invalid bit in the compressed data sequence is "0".
13. The control device according to claim 1 or claim 2, wherein the decoding circuit is further configured to use the validity determination sequence as a clock gating signal for performing the operation of the neural network circuit.
14. The control device according to claim 1 or claim 2, wherein the buffer includes a ring buffer.
15. The control device according to claim 1 or claim 2, wherein the encoding circuit is further configured to generate the compressed data sequence by compressing consecutive valid bits having the same bit value in the bit string of the data sequence into another single bit of the compressed data sequence.
16. The control device according to claim 1 or claim 2, wherein the decoding circuit is further configured to, in response to the data sequence including multiple pieces of reused data, store a fourth pointer for identifying the multiple pieces of reused data.
17. The control device according to claim 1 or claim 2, wherein the decoding circuit is further configured to, when multiple compressed data sequences are read in parallel, add bits to the multiple compressed data sequences so that they have the same length.
18. The control device according to claim 1 or claim 2, wherein the data sequence indicates connection strengths of edges between nodes in the neural network circuit.
19. A neural network system comprising a neural network circuit and a control device for controlling the data input and output of the neural network circuit as described in claim 1.
20. A method for training a neural network for recognizing images, comprising: obtaining training image data; and training the neural network based on the training image data, wherein, before a training operation for a current layer of the neural network, an original data sequence used for the operation of the current layer is processed and the processed data sequence is applied to the training operation for the current layer, such that operations on invalid bits are omitted during the training of the current layer; wherein processing the original data sequence used for the operation of the current layer includes: generating, based on the original data sequence used for the operation of the current layer, a compressed data sequence in which consecutive invalid bits in a bit string of the data sequence are compressed into a single bit of the compressed data sequence; generating a validity determination sequence for determining valid bits and invalid bits in the bit string of the compressed data sequence; writing the compressed data sequence and the validity determination sequence into a memory; reading the compressed data sequence and the validity determination sequence from the memory; and determining, based on the validity determination sequence, bits in the bit string of the compressed data sequence to be used for the training operation; the method further comprising: sequentially storing the compressed data sequence and the validity determination sequence in a buffer; and storing a first pointer and a second pointer, the first pointer indicating a position in the buffer where a current bit of the compressed data sequence to be applied to the training operation is stored, and the second pointer indicating a position in the buffer where a next bit of the compressed data sequence, to be applied to the training operation in the cycle following the current bit, is stored.
The method further comprises: determining, based on the validity determination sequence, whether the next bit corresponding to the second pointer is valid; in response to the next bit being valid, moving the first pointer to the position in the buffer where the next bit is stored; and in response to the next bit being invalid, moving the first pointer to the position in the buffer where the bit to be applied to the training operation in the next cycle is stored.
21. The method according to claim 20, wherein the original data sequence used for the operation of the current layer is input data and/or weights of the current layer, and wherein, in the neural network, input data of a first layer is the training image data, and input data of a next layer is output data of the current layer.
22. The method according to claim 20 or claim 21, wherein the single bit of the compressed data sequence indicates the number of consecutive invalid bits in the bit string of the data sequence.
23. The method according to claim 20 or claim 21, wherein the determining includes: determining, based on the validity determination sequence, whether the current bit corresponding to the first pointer is valid; in response to the current bit being invalid, skipping applying the current bit to the training operation; and in response to the current bit being valid, applying the current bit to the training operation.
24. The method according to claim 20 or claim 21, further comprising: determining, based on the validity determination sequence, whether the next bit corresponding to the second pointer is valid; in response to the next bit being valid, moving the second pointer to the position in the buffer where the bit to be applied to the training operation in the next cycle is stored; and in response to the next bit being invalid, moving the second pointer to the position in the buffer where the bit to be applied to the training operation in the cycle after the next cycle is stored.
25. The method according to claim 20 or claim 21, further comprising: determining, based on the next bit corresponding to the second pointer, whether to skip training operation processing.
26. The method according to claim 20 or claim 21, further comprising: determining, based on the validity determination sequence, whether the next bit corresponding to the second pointer is valid; in response to the next bit being valid, not skipping training operation processing; and in response to the next bit being invalid, skipping training operation processing.
27. The method according to claim 26, wherein the skipping includes, in response to the next bit being invalid, skipping the value of the next bit in the training operation.
28. The method according to claim 20 or claim 21, further comprising: storing a third pointer indicating the position in the buffer where the compressed data sequence and the validity determination sequence are to be stored.
29. The method according to claim 20 or claim 21, wherein the generating includes generating the compressed data sequence by compressing consecutive valid bits having the same bit value in the bit string of the data sequence into another single bit of the compressed data sequence.
30. The method according to claim 20 or claim 21, further comprising: in response to the data sequence including multiple pieces of reused data, storing a fourth pointer for identifying the multiple pieces of reused data.
31. The method according to claim 20 or claim 21, further comprising: when multiple compressed data sequences are read in parallel, adding bits to the multiple compressed data sequences so that they have the same length.
32. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, configure the processor to perform the method of claim 20.