Data processing apparatus, data processing system and data processing method
By encoding the layer construction information of the neural network and transmitting only the updated layer information, the problem of excessive data transmission volume in multiple client systems is solved, thereby improving data transmission efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- MITSUBISHI ELECTRIC CORP
- Filing Date
- 2019-09-27
- Publication Date
- 2026-06-19
AI Technical Summary
In systems with multiple client-server connections, existing technologies require the transmission of information from outdated layers, leading to increased data transmission volume and an inability to effectively reduce data size.
By encoding the layer construction information of the neural network, only the information of updated layers is transmitted. Because the model title information and layer title information are also encoded, the decoding side can decode only the information it needs.
This reduces the amount of data transmitted, lowers the server's encoding processing load, and improves data transmission efficiency.
Figure CN114503119B_ABST (abstract drawing)
Abstract
Description
Technical Field
[0001] This invention relates to a data processing apparatus, a data processing system, and a data processing method for generating encoded data that encodes information related to the structure of a neural network.
Background Technology
[0002] Machine learning exists as a method for solving classification (recognition) and regression problems of input data. Within machine learning, there are methods such as neural networks that simulate the neural circuits (neurons) of the brain. In neural networks (hereinafter referred to as NNs), input data is classified (recognized) or regressed through a probabilistic model (recognition model, generative model) represented by a network formed by interconnected neurons.
[0003] Furthermore, neural networks (NNs) can achieve high performance by optimizing their parameters through learning from large amounts of data. However, with the increasing scale of NNs in recent years, the data size of NNs has tended to be large, which has also increased the computational load on computers using NNs.
[0004] For example, Non-Patent Document 1 describes a technique for encoding information representing the structure of a neural network, namely the weights of the edges (including bias values), after scalar quantization. By encoding the edge weights after scalar quantization, the data size of the edge-related data is compressed.
[0005] Existing technical documents
[0006] Non-patent literature
[0007] Non-patent literature 1: Vincent Vanhoucke, Andrew Senior, Mark Z. Mao, "Improving the speed of neural networks on CPUs", Proc. Deep Learning and Unsupervised Feature Learning NIPS Workshop, 2011.
Summary of the Invention
[0008] The problem the invention aims to solve
[0009] In systems where multiple clients connect to a server via a data transmission network, data representing the structure of a neural network (NN) learned on the server side is encoded, and the encoded data is decoded on the client side, so that each client can use the NN learned on the server for data processing. In existing systems, when the structure of the NN is updated, information about the layers that were not updated is transmitted to the clients along with information about the updated layers. As a result, the size of the transmitted data cannot be reduced.
[0010] The present invention addresses the aforementioned problems and aims to provide a data processing apparatus, data processing system, and data processing method capable of reducing the data size of data representing the structure of a neural network.
[0011] Means for solving the problem
[0012] The data processing apparatus of the present invention includes: a data processing unit that learns a neural network; and an encoding unit that generates encoded data that encodes model title information for identifying the model of the neural network, layer title information for identifying the layers of the neural network, and weight information of each edge belonging to the layer identified by the layer title information, and the encoding unit encodes layer structure information representing the layer structure of the neural network.
[0013] The effects of the invention
[0014] According to the present invention, the encoding unit encodes layer construction information representing the layer structure of the NN and a new layer flag indicating whether each layer being encoded is an update of a layer in the reference model or a new layer. Since only information related to the updated layers in the data representing the structure of the NN is encoded and transmitted, the data size of the data representing the structure of the NN can be reduced.
Attached Figure Description
[0015] Figure 1 is a block diagram illustrating the structure of the data processing system in Embodiment 1.
[0016] Figure 2 is a diagram illustrating an example of the structure of an NN.
[0017] Figure 3 is a block diagram showing the structure of the data processing apparatus (encoder) of Embodiment 1.
[0018] Figure 4 is a block diagram illustrating the structure of the data processing apparatus (decoder) of Embodiment 1.
[0019] Figure 5 is a flowchart illustrating the operation of the data processing apparatus (encoder) of Embodiment 1.
[0020] Figure 6 is a flowchart illustrating the operation of the data processing apparatus (decoder) of Embodiment 1.
[0021] Figure 7 is a diagram illustrating an example of encoded data in Embodiment 1.
[0022] Figure 8 is a diagram illustrating another example of encoded data in Embodiment 1.
[0023] Figure 9 is a diagram illustrating an example of convolution processing of one-dimensional data in Embodiment 1.
[0024] Figure 10 is a diagram illustrating an example of convolution processing of two-dimensional data in Embodiment 1.
[0025] Figure 11 is a diagram showing the matrix of edge weight information for each node in the l-th layer of an NN.
[0026] Figure 12 is a diagram showing the matrix of quantization step sizes for the edge weight information for each node in the l-th layer of an NN.
[0027] Figure 13 is a diagram showing the matrix of edge weight information in a convolutional layer.
[0028] Figure 14 is a diagram showing the matrix of quantization step sizes for the edge weight information in a convolutional layer.
[0029] Figure 15 is a block diagram showing the structure of a modified example of the data processing apparatus (encoder) of Embodiment 1.
[0030] Figure 16 is a diagram showing an outline of the update of encoded data in Embodiment 1.
[0031] Figure 17 is a diagram showing the structure of the network model corresponding to the update of the encoded data shown in Figure 16.
[0032] Figure 18 is a diagram illustrating an example of the layer construction information contained in the model information header.
[0033] Figure 19 is a diagram showing an example of the layer identification information corresponding to the layer construction information contained in the model information header.
[0034] Figure 20A is a block diagram illustrating a hardware structure that implements the functions of the data processing apparatus of Embodiment 1. Figure 20B is a block diagram illustrating a hardware structure that executes software implementing the functions of the data processing apparatus of Embodiment 1.
Detailed Implementation
[0035] Embodiment 1.
[0036] Figure 1 is a block diagram illustrating the structure of the data processing system in Embodiment 1. In the data processing system shown in Figure 1, server 1 is connected to clients 3-1, 3-2, ..., 3-N via data transmission network 2. N is a natural number greater than 2. Server 1 is a data processing apparatus that generates a high-performance neural network (NN) by optimizing its parameters through learning from large amounts of data; it is the first data processing apparatus of the data processing system shown in Figure 1.
[0037] Data transmission network 2 is a network that carries the data exchanged between server 1 and clients 3-1, 3-2, ..., 3-N; it is the Internet or an intranet. For example, information for generating an NN is sent from server 1 to clients 3-1, 3-2, ..., 3-N over data transmission network 2.
[0038] Clients 3-1, 3-2, ..., 3-N are devices that generate the NN learned by server 1 and use the generated NN for data processing. For example, clients 3-1, 3-2, ..., 3-N are devices with communication and data processing capabilities, such as personal computers (PCs), cameras, or robots. Each of clients 3-1, 3-2, ..., 3-N is a second data processing apparatus of the data processing system shown in Figure 1.
[0039] In the data processing system shown in Figure 1, the data size of the NN model and parameters, and the value representing appropriate performance, differ among clients 3-1, 3-2, ..., 3-N. Therefore, even if the model and parameters of the NN are encoded using the technique described in Non-Patent Document 1, they still need to be compressed to a data size suitable for each of clients 3-1, 3-2, ..., 3-N, which increases the encoding processing load.
[0040] Therefore, in the data processing system of Embodiment 1, server 1 generates encoded data that encodes model title information for identifying the model of the NN, layer title information for identifying the layers of the NN, and layer-by-layer edge weight information including bias values (hereafter, unless otherwise specified, edge weights include bias values), and sends this encoded data to clients 3-1, 3-2, ..., 3-N via data transmission network 2. Each of clients 3-1, 3-2, ..., 3-N can decode only the information related to the layers it needs from the encoded data transmitted from server 1 via data transmission network 2. This reduces the encoding processing load on server 1 and decreases the amount of data that server 1 sends to data transmission network 2.
[0041] Here, the structure of an NN will be explained. Figure 2 is a diagram illustrating an example of the structure of an NN. As shown in Figure 2, for example, input data (x_1, x_2, ..., x_{N_1}) is processed in each layer of the NN, and the processing results (y_1, ..., y_{N_L}) are output. N_l (l = 1, 2, ..., L) represents the number of nodes in the l-th layer, and L represents the number of layers in the NN. As shown in Figure 2, the NN has an input layer, hidden layers, and an output layer, each of which consists of multiple nodes connected by edges. The output value of each node can be calculated from the output values of the nodes in the preceding layer connected to it by edges, the weights of those edges, and the activation function defined for each layer.
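The node computation described above can be illustrated with a minimal sketch. It assumes fully connected layers, a weighted sum plus bias per node, and a ReLU activation for the hidden layer; the function names and the numeric values are hypothetical and are not taken from the patent.

```python
def relu(v):
    """Rectified linear unit, used here as an example activation function."""
    return max(0.0, v)


def layer_forward(inputs, weights, biases, activation):
    """Compute the outputs of one fully connected layer.

    weights[j][i] is the weight of the edge from input node i to output node j,
    and biases[j] is the bias value of output node j.
    """
    outputs = []
    for w_row, b in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(weighted_sum))
    return outputs


# Two input nodes, two hidden nodes, one output node (illustrative values only).
x = [1.0, 2.0]
hidden = layer_forward(x, [[0.5, -1.0], [1.0, 1.0]], [0.0, 0.5], relu)
y = layer_forward(hidden, [[2.0, 1.0]], [-1.0], lambda v: v)
```

Each layer needs only its own edge weights, biases, and activation function, which is what makes it natural to encode the model layer by layer.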
[0042] In neural networks (NNs), there exist, for example, convolutional neural networks (CNNs) that not only have fully-connected layers but also convolutional layers and pooling layers. CNNs can generate networks that perform data processing beyond classification and regression, such as those that filter data.
[0043] For example, CNNs can be used to filter images or sounds as input, removing noise or enhancing the quality of the input signal; restore high frequencies lost in compressed sound; restore images with missing regions (inpainting); or perform super-resolution processing. CNNs can also be used to construct neural networks that combine generative and recognition models, using a recognition model to determine whether data was generated by the generative model to verify its authenticity.
[0044] In recent years, a new type of NN called the Generative Adversarial Network (GAN) has been proposed. A GAN learns in an adversarial manner: the generative model learns to generate data that the recognition model cannot identify as not real, while the recognition model learns to identify data generated by the generative model as not real. A GAN of this kind can produce high-precision generative and recognition models.
[0045] Figure 3 is a block diagram showing the structure of the data processing apparatus (encoder) of Embodiment 1. The data processing apparatus shown in Figure 3 is a first data processing apparatus that learns an NN using a learning dataset and an evaluation dataset and generates encoded data of model information representing the structure of the NN (hereinafter referred to as model information). For example, it is server 1 shown in Figure 1.
[0046] The data processing apparatus shown in Figure 3 includes a data processing unit 10 and an encoding unit 11. The data processing unit 10 is a first data processing unit that learns an NN, and includes a learning unit 101, an evaluation unit 102, and a control unit 103. The encoding unit 11 generates encoded data that encodes model title information identifying the model of the NN learned by the learning unit 101, layer title information identifying the layers of the NN, and layer-by-layer edge weight information. Furthermore, the encoding unit 11 encodes the layer structure information of each layer to be encoded (coded layer) and encodes a new layer flag. The layer structure information represents the layer structure of the NN. The new layer flag is flag information used to identify whether the layer is a newly added layer or an update of an existing layer, and will be described in detail later.
[0047] The learning unit 101 performs learning processing on the NN using the learning dataset and generates model information for the learned NN. The model information is output from the learning unit 101 to the evaluation unit 102. The learning unit 101 also holds encoding model information, which is controlled by the control unit 103 described later, and outputs the encoding model information to the encoding unit 11 upon receiving a learning completion instruction from the control unit 103. The evaluation unit 102 generates the NN using the model information and performs inference processing with the generated NN on the evaluation dataset. The value of the evaluation metric obtained as a result of the inference processing is the evaluation result, which is output from the evaluation unit 102 to the control unit 103. The evaluation metric is set in the evaluation unit 102 and is, for example, inference accuracy or the output value of a loss function.
[0048] Based on the evaluation value obtained as the evaluation result by the evaluation unit 102, the control unit 103 determines whether to update the model of the NN learned by the learning unit 101 and whether the learning unit 101 has completed learning the NN. The control unit 103 then controls the learning unit 101 according to the determination results. For example, the control unit 103 compares the evaluation value with a model update determination criterion and determines, based on the comparison result, whether to update the encoding model information with the model information. Furthermore, the control unit 103 compares the evaluation value with a learning completion determination criterion and determines, based on the comparison result, whether the learning unit 101 has completed learning the NN. These determination criteria are determined based on the history of the evaluation values.
[0049] Figure 4 is a block diagram illustrating the structure of the data processing apparatus (decoder) of Embodiment 1. The data processing apparatus shown in Figure 4 is a second data processing apparatus that decodes the encoded data generated by the encoding unit 11 shown in Figure 3 to generate an NN, and uses the generated NN to perform inference processing on one or more pieces of evaluation data. For example, it is one of clients 3-1, 3-2, ..., 3-N shown in Figure 1.
[0050] The data processing apparatus shown in Figure 4 includes a decoding unit 201 and an inference unit 202. The decoding unit 201 decodes model information from the encoded data generated by the encoding unit 11. For example, the decoding unit 201 can decode from the encoded data only the information required by the data processing apparatus shown in Figure 4.
[0051] The inference unit 202 is a second data processing unit that generates a neural network (NN) using the model information decoded by the decoding unit 201 and performs data processing using the generated NN. For example, the data processing is inference processing for evaluation data using the NN. The inference unit 202 performs inference processing on the evaluation data using the NN and outputs the inference result.
[0052] Next, the operation of the data processing system in Embodiment 1 will be explained. Figure 5 is a flowchart illustrating the operation of the data processing apparatus (encoder) of Embodiment 1, and shows the data processing method of the data processing apparatus shown in Figure 3. The learning unit 101 learns the NN (step ST1). For example, the learning unit 101 learns the NN using a learning dataset and outputs the model information obtained through this learning to the evaluation unit 102.
[0053] Model information represents the structure of the neural network (NN), consisting of layer construction information representing the construction of each layer and the weights of the edges belonging to each layer. Layer construction information includes layer category information, structural information related to the layer category, and information needed to construct the layer other than the edge weights. Information needed to construct the layer other than the edge weights includes, for example, the activation function. Layer category information represents the type of layer; by referring to the layer category information, it is possible to identify the type of layer, such as convolutional layers, pooling layers, or fully connected layers.
[0054] Structural information related to layer category refers to information about the structure of the layer corresponding to the layer category information. For example, if the layer category corresponding to the layer category information is a convolutional layer, the structural information related to the layer category includes the number of channels for convolution, the data size and shape of the convolutional filter (kernel), the convolution interval (stride), whether there is padding for the input signal boundaries for convolution processing, and the padding method if padding is present. Furthermore, if the layer category corresponding to the layer category information is a pooling layer, the structural information related to the layer category includes the pooling method such as max pooling or average pooling, the shape of the pooling kernel, the pooling interval (stride), whether there is padding for the input signal boundaries for pooling processing, and the padding method if padding is present.
[0055] In representing the weights of each edge, sometimes the weights are set independently for each edge, as in fully connected layers. On the other hand, sometimes the weights of edges are shared in units of the convolutional filter (kernel) (channel unit), as in convolutional layers, meaning that the weights of edges are shared within a single filter.
[0056] The evaluation unit 102 evaluates the NN (step ST2). For example, the evaluation unit 102 generates the NN using the model information generated by the learning unit 101 and performs inference processing with the generated NN on the evaluation dataset. The evaluation result is output from the evaluation unit 102 to the control unit 103. The evaluation result is, for example, inference accuracy or the output value of a loss function.
[0057] Next, the control unit 103 determines whether to update the model information (step ST3). For example, if the evaluation value generated by the evaluation unit 102 does not meet the model update determination criteria, the control unit 103 determines that the encoding model information held by the learning unit 101 should not be updated; if the evaluation value meets the model update determination criteria, the control unit 103 determines that the encoding model information should be updated.
[0058] Examples of the model update determination criterion are as follows. When the evaluation value is the output value of the loss function, the criterion is that the evaluation value of the current learning is less than the minimum evaluation value in the learning history since the beginning of learning. When the evaluation value is inference accuracy, the criterion is that the evaluation value of the current learning is greater than the maximum evaluation value in the learning history since the beginning of learning.
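The two example criteria can be sketched as a single comparison function. This is only an illustration: the function name, the `metric` strings, and the treatment of an empty history (always update, which assumes learning starts with no history) are assumptions rather than details specified in the text.

```python
def should_update_model(current_value, history, metric):
    """Return True if the current evaluation value beats the whole learning history.

    metric is "loss" (a smaller value is better) or "accuracy" (a larger value is better).
    history is the list of evaluation values recorded since the beginning of learning.
    """
    if not history:
        # Assumption: with no learning history, the first evaluation always updates.
        return True
    if metric == "loss":
        return current_value < min(history)
    if metric == "accuracy":
        return current_value > max(history)
    raise ValueError(f"unknown metric: {metric}")
```

For instance, with a loss history of 0.9 and 0.7, a new loss of 0.6 would trigger an update, while 0.7 or 0.8 would not.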
[0059] Furthermore, the unit at which the learning history is switched can be arbitrary. For example, suppose that a learning history is kept for each model identification number (model_id) described later. In this case, if the model does not have a reference model identification number (reference_model_id) described later, learning begins with no learning history. That is, in the first execution of step ST3, the model information is always updated. On the other hand, if the model has a reference model identification number, the learning history (history A) of the model indicated by that reference model identification number is referenced. This prevents the model from being updated during learning to a model with a worse evaluation value (lower inference accuracy, larger loss function value, etc.) than the model indicated by the reference model identification number. In this case, if the model identification number and the reference model identification number are the same, the learning history (history A) corresponding to the reference model identification number is updated each time the model is learned. If the model identification number differs from the reference model identification number, the learning history (history A) corresponding to the reference model identification number is copied as the initial value of the learning history (history B) of the model identification number, and the learning history (history B) of the model is then updated each time the model is learned.
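One way to picture the handling of history A and history B is a small store keyed by model identification number. The class `HistoryStore` and its methods are hypothetical names introduced for illustration; only `model_id` and `reference_model_id` come from the text.

```python
class HistoryStore:
    """Keeps one learning history (list of evaluation values) per model_id."""

    def __init__(self):
        self._histories = {}

    def start_learning(self, model_id, reference_model_id=None):
        if reference_model_id is None:
            # No reference model: learning begins with no learning history.
            self._histories[model_id] = []
        elif reference_model_id != model_id:
            # Different ids: history A is copied as the initial value of history B.
            reference = self._histories.get(reference_model_id, [])
            self._histories[model_id] = list(reference)
        else:
            # Same id: history A itself keeps being extended.
            self._histories.setdefault(model_id, [])
        return self._histories[model_id]

    def record(self, model_id, evaluation_value):
        self._histories[model_id].append(evaluation_value)

    def history(self, model_id):
        return list(self._histories[model_id])
```

Copying rather than sharing the list keeps the reference model's history unchanged when a derived model with a different identification number continues learning.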
[0060] If the control unit 103 determines that the model information is to be updated (step ST3: Yes), the learning unit 101 updates the encoding model information to the new model information (step ST4). For example, the control unit 103 generates model update instruction information indicating that the model information is to be updated, and outputs learning control information containing the model update instruction information to the learning unit 101. The learning unit 101 updates the encoding model information to the new model information according to the model update instruction information contained in the learning control information.
[0061] On the other hand, if it is determined that the model information will not be updated (step ST3: No), the control unit 103 generates model update instruction information indicating that the model information will not be updated, and outputs learning control information containing the model update instruction information to the learning unit 101. The learning unit 101 does not update the encoding model information according to the model update instruction information contained in the learning control information.
[0062] Next, the control unit 103 compares the evaluation value with the learning completion determination criterion and determines, based on the comparison result, whether the learning unit 101 has completed learning the NN (step ST5). For example, if the learning completion determination criterion is whether the evaluation value generated by the evaluation unit 102 has reached a specific value, the control unit 103 determines that the learning unit 101 has completed learning the NN when the evaluation value meets the criterion, and determines that learning has not been completed when the evaluation value does not meet the criterion. Alternatively, the learning completion determination criterion may be based on the most recent learning history, for example, determining that learning is completed when the model information has not been updated M consecutive times (step ST3: No), where M is a predetermined integer greater than or equal to 1. In that case, the control unit 103 determines that the learning unit 101 has not completed learning the NN if the learning history does not meet the criterion.
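The two kinds of learning completion criterion might be sketched as follows. The function names and the representation of the recent learning history as a list of booleans (one per step ST3 decision) are assumptions made for illustration.

```python
def completed_by_target(evaluation_value, target, higher_is_better=True):
    """Criterion 1: learning is complete once the evaluation value reaches a target."""
    if higher_is_better:
        return evaluation_value >= target
    return evaluation_value <= target


def completed_by_no_update(update_decisions, m):
    """Criterion 2: learning is complete after M consecutive 'step ST3: No' results.

    update_decisions holds one boolean per step ST3 decision, oldest first
    (True means the model information was updated).
    """
    if m < 1:
        raise ValueError("M must be an integer greater than or equal to 1")
    if len(update_decisions) < m:
        return False
    return not any(update_decisions[-m:])
```

With M = 3, a decision sequence ending in update, no update, no update is not yet complete; one further "No" completes learning.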
[0063] If the control unit 103 determines that learning of the NN has been completed (step ST5: Yes), the learning unit 101 outputs the encoding model information to the encoding unit 11, and processing proceeds to step ST6. On the other hand, if the control unit 103 determines that learning of the NN has not been completed (step ST5: No), processing is executed again from step ST1.
[0064] The encoding unit 11 encodes the encoding model information input from the learning unit 101 (step ST6). The encoding unit 11 encodes the encoding model information generated by the learning unit 101 in units of layers of the NN, generating encoded data consisting of title information and layer-unit encoded data. In addition, the encoding unit 11 encodes the layer construction information and the new layer flag.
[0065] Figure 6 is a flowchart illustrating the operation of the data processing apparatus (decoder) of Embodiment 1, and shows the operation of the data processing apparatus shown in Figure 4. The decoding unit 201 decodes the model information from the encoded data encoded by the encoding unit 11 (step ST11). Next, the inference unit 202 generates an NN based on the model information decoded by the decoding unit 201 (step ST12). The inference unit 202 uses the generated NN to perform inference processing on the evaluation data and outputs the inference result (step ST13).
[0066] Next, the encoding of the model information by the encoding unit 11 in step ST6 of Figure 5 will be explained in detail. For the encoding of the model information by the encoding unit 11, encoding method (1) or (2) below can be used, for example. Alternatively, which of encoding methods (1) and (2) is used can be defined for each parameter. For example, by using method (1) for the title information and method (2) for the weight data, the decoder can easily parse the title information without performing variable-length decoding, while the weight data, which accounts for most of the data size of the encoded data, is highly compressed by variable-length encoding, which keeps the overall data size of the encoded data small.
[0067] (1) The parameters describing each piece of information contained in the model information are written as bit strings with the bit precision defined for each parameter, and the data in which these bit strings themselves are arranged in a preset order, including the title information if it exists, is used as the encoded data. The bit precision is the precision defined for the parameter, for example, 8 bits for an int type or 32 bits for a float type.
[0068] (2) The parameters describing each piece of information contained in the model information are encoded with the variable-length encoding method set for each parameter, and the data in which the resulting bit strings themselves are arranged in a preset order, including the title information, is used as the encoded data.
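The difference between methods (1) and (2) can be illustrated with a short sketch. The field layout in `encode_fixed` (an 8-bit layer number followed by 32-bit big-endian floats) is an assumption, and the LEB128-style varint is only a stand-in for "a variable-length encoding method"; the text does not name a specific code. The varint operates on non-negative integers, such as quantized weight indices.

```python
import struct


def encode_fixed(layer_id, weights):
    """Method (1) sketch: each parameter is written at its declared bit precision."""
    # Assumed layout: one unsigned 8-bit layer_id, then 32-bit big-endian floats.
    return struct.pack(">B", layer_id) + struct.pack(f">{len(weights)}f", *weights)


def encode_varint(value):
    """Method (2) stand-in: LEB128-style variable-length code for one unsigned integer."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)         # last byte
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one varint starting at pos; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

The fixed layout can be read at known offsets without any decoding loop, whereas the variable-length code spends fewer bytes on small values, which is the trade-off the paragraph above describes for title information versus weight data.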
[0069] Figure 7 is a diagram showing an example of encoded data in Embodiment 1. The encoded data of (1) or (2) above can be arranged in the order shown in Figure 7. The encoded data shown in Figure 7 consists of a collection of data called data units, which are either non-layer data units or layer data units. A layer data unit is a data unit that stores layer-unit encoded data, i.e., layer data.
[0070] Layer data consists of a start code, a data unit type, a layer information header, and weight data. The layer information header is obtained by encoding the layer title information used to identify the layers of the NN. The weight data is obtained by encoding the weight information of the edges belonging to the layer indicated by the layer information header. In the encoded data shown in Figure 7, the order of the layer data units does not necessarily have to match the order of the layers in the NN; it can be arbitrary. This is because the layer identification number (layer_id), which will be described later, identifies which layer of the NN each layer data unit belongs to.
[0071] Non-layer data units are data units that store data other than layer data. For example, a non-layer data unit stores a start code, a data unit type, and a model information header. The model information header is obtained by encoding the model title information used to identify the model of the NN.
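Because each layer data unit carries its own layer identification number, a decoder can pick out only the layers it needs regardless of their order in the stream. The sketch below assumes a 3-byte start code 0x000001, a 1-byte data unit type with hypothetical values 0 (non-layer) and 1 (layer), a layer information header consisting of a single 1-byte layer_id, and unit contents that never contain the start code.

```python
START_CODE = b"\x00\x00\x01"
NON_LAYER, LAYER = 0, 1  # hypothetical data unit type values


def make_unit(unit_type, header, payload=b""):
    """Build one data unit: start code, data unit type, header, then payload."""
    return START_CODE + bytes([unit_type]) + header + payload


def split_units(stream):
    """Split a stream into data unit bodies (everything after each start code)."""
    return stream.split(START_CODE)[1:]


def select_layers(stream, wanted_layer_ids):
    """Return {layer_id: weight_data} for the requested layers only."""
    found = {}
    for unit in split_units(stream):
        unit_type, body = unit[0], unit[1:]
        if unit_type != LAYER:
            continue
        layer_id = body[0]  # assumed 1-byte layer information header
        if layer_id in wanted_layer_ids:
            found[layer_id] = body[1:]
    return found


# Layer data units may appear in any order; layer_id identifies each one.
stream = (
    make_unit(NON_LAYER, b"\x07")
    + make_unit(LAYER, b"\x02", b"\xaa\xbb")
    + make_unit(LAYER, b"\x00", b"\xcc")
)
```

Here `select_layers(stream, {0})` returns only the weight data of layer 0 even though that unit comes last in the stream.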
[0072] The start code is a code stored at the beginning of a data unit and used to identify the start position of the data unit. Clients 3-1, 3-2, ..., 3-N (hereinafter referred to as the decoding side) can determine the start position of non-layer or layer data units by referring to the start code. For example, if 0x000001 is defined as the start code, data stored in the data unit other than the start code is set not to generate 0x000001. Therefore, the start position of the data unit can be determined based on the start code.
[0073] To prevent 0x000001 from occurring, for example, 03 can be inserted as the third byte of encoded data 0x000000 to 0x000003, resulting in 0x00000300 to 0x00000303. During decoding, 0x000003 is converted to 0x0000, which restores the original data. The start code only needs to be a uniquely identifiable bit string, so a bit string other than 0x000001 can also be defined as the start code. In addition, any method that can identify the start position of a data unit is acceptable, and a start code does not have to be used. For example, a bit string that identifies the end of a data unit can be appended to the end of the data unit. Alternatively, a start code can be attached only to the beginning of the non-layer data unit, and the data size of each layer data unit can be encoded as part of the model information header. The boundary of each layer data unit can then be identified from this information.
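The byte insertion and removal described here can be sketched as a pair of functions. This is a minimal illustration assuming the start code 0x000001 and the 03 insertion rule from the text; the function names are hypothetical.

```python
def escape_payload(data):
    """Insert 0x03 after any two zero bytes that are followed by a byte 0x00..0x03."""
    out = bytearray()
    zero_run = 0
    for byte in data:
        if zero_run >= 2 and byte <= 3:
            out.append(3)  # inserted byte that breaks up 0x0000 followed by 00..03
            zero_run = 0
        out.append(byte)
        zero_run = zero_run + 1 if byte == 0 else 0
    return bytes(out)


def unescape_payload(data):
    """Remove each inserted 0x03, converting 0x000003 back to 0x0000."""
    out = bytearray()
    zero_run = 0
    for byte in data:
        if zero_run >= 2 and byte == 3:
            zero_run = 0  # drop the inserted byte
            continue
        out.append(byte)
        zero_run = zero_run + 1 if byte == 0 else 0
    return bytes(out)
```

After escaping, the sequence 0x000001 can no longer appear inside a data unit, so a decoder can scan for start codes without parsing the unit contents.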
[0074] The data unit type is data stored after the start code within a data unit to identify the type of data unit. The data unit type has predefined values for each type of data unit. By referring to the data unit type stored in the data unit, the decoding side can identify whether a data unit is a non-layer data unit or a layer data unit, and further, can identify what kind of non-layer or layer data unit it is.
[0075] The model information header in the non-layer data unit includes a model identification number (model_id), the number of layer data units in the model (num_layers), and the number of coded layer data units (num_coded_layers). The model identification number is a number used to identify the model of the NN. Each model therefore basically has its own number; however, if the data processing apparatus (decoder) of Embodiment 1 receives a new model with the same model identification number as a previously received model, the model with that model identification number is overwritten. The number of layer data units in the model is the number of layer data units constituting the model identified by the model identification number. The number of coded layer data units is the number of layer data units actually present in the encoded data. In the example of Figure 7, layer data units (1) to (n) are present, so the number of coded layer data units is n. The number of coded layer data units must not exceed the number of layer data units in the model.
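As a rough illustration, the three header fields can be represented and serialized as below. The field widths (32-bit model_id, 16-bit counts, big-endian) and the names `ModelInfoHeader`, `HEADER_FORMAT`, and `receive_model` are assumptions; the text defines only the field names model_id, num_layers, and num_coded_layers.

```python
import struct
from dataclasses import dataclass

HEADER_FORMAT = ">IHH"  # assumed widths: 32-bit model_id, two 16-bit counts


@dataclass
class ModelInfoHeader:
    model_id: int          # identifies the model of the NN
    num_layers: int        # layer data units that make up the model
    num_coded_layers: int  # layer data units actually present in the encoded data

    def pack(self):
        return struct.pack(HEADER_FORMAT, self.model_id, self.num_layers, self.num_coded_layers)

    @classmethod
    def unpack(cls, data):
        size = struct.calcsize(HEADER_FORMAT)
        return cls(*struct.unpack(HEADER_FORMAT, data[:size]))


def receive_model(model_store, header, layers):
    """Decoder-side sketch: a model with an already known model_id is overwritten."""
    model_store[header.model_id] = {"header": header, "layers": layers}
```

Keeping num_layers and num_coded_layers separate lets a decoder tell a complete model apart from a transmission that carries only some of its layers.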
[0076] The layer information header in the layer data unit contains a layer identification number (layer_id) and layer construction information. The layer identification number is used to identify the layer. A pre-defined method for assigning layer identification number values allows identification of which layer it is. For example, the numbers can be assigned sequentially from the layer closest to the input layer, with the input layer being 0 and the next layer being 1. The layer construction information represents the structure of each layer in the neural network, including layer category information, structural information related to the layer category, and information necessary for constructing the layer other than edge weights. For example, this includes only the layer-specific information in `model_structure_information` and `layer_id_information`, which will be discussed later. Furthermore, as layer construction information, `weight_bit_length` represents the bit precision of the weights of each edge in that layer. For example, if `weight_bit_length = 8`, it means the weights are 8 bits of data. Therefore, the bit precision of the edge weights can be set on a layer-by-layer basis. This allows for adaptive control, such as changing the bit precision layer-by-layer based on the layer's importance (the degree to which bit precision affects the output).
[0077] Furthermore, while a layer information header containing layer structure information has been shown so far, the model information header may also contain all the layer structure information (model_structure_information) contained in the encoded data and the layer identification information (layer_id_information) corresponding to this layer structure information. The decoding side can determine the structure of the layer with each layer identification number by referring to the model information header. In that case, the layer information header may contain only the layer identification number. Thus, when the data size of a layer data unit is larger than the data size of a non-layer data unit, the data size of each layer data unit can be reduced, and the maximum data size of a data unit within the encoded data can be reduced.
[0078] Within the layer data unit, weight data encoded on a layer-by-layer basis is stored after the layer information header. The weight data packet contains a non-zero flag and non-zero weight data. The non-zero flag indicates whether the edge weight value is zero, and is set in relation to the weights of all edges belonging to the corresponding layer.
[0079] Non-zero weight data is the data set after the non-zero flags in the weight data. For each weight whose non-zero flag indicates non-zero (meaningful), the value of that weight is set. In Figure 7, weight data (1) to weight data (m), each representing a non-zero weight value, are set as the non-zero weight data. The number m of non-zero weight data is less than or equal to the total number M_l of weights in the corresponding layer l. Additionally, for a layer whose edges with non-zero weight values are sparse, the weight data contains little non-zero weight data and consists almost entirely of non-zero flags; therefore, the data size of the weight data can be significantly reduced.
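The relationship between the non-zero flags and the non-zero weight data can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the flags and weights are kept as plain Python lists rather than as an encoded bit string.

```python
def pack_weights(weights):
    """Split one layer's weights into non-zero flags and non-zero weight data."""
    flags = [1 if w != 0 else 0 for w in weights]  # one flag per edge
    nonzero = [w for w in weights if w != 0]       # only meaningful weights
    return flags, nonzero


def unpack_weights(flags, nonzero):
    """Rebuild the full weight list; weights flagged 0 are restored as zero."""
    values = iter(nonzero)
    return [next(values) if flag else 0 for flag in flags]


flags, nonzero = pack_weights([0, 3, 0, 0, -2, 0])
print(flags)    # [0, 1, 0, 0, 1, 0]
print(nonzero)  # [3, -2]
```

Here the number of non-zero weight data (2) is smaller than the number of weights in the layer (6), which is the case in which the weight data shrinks.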
[0080] Figure 8 is a diagram illustrating another example of the encoded data in Embodiment 1. The encoded data of (1) or (2) above may also be arranged in the order shown in Figure 8. The encoded data shown in Figure 8 differs from that of Figure 7 in the structure of the weight data: in the non-zero weight data, the weights of all edges belonging to the corresponding layer are arranged bit plane by bit plane, starting from the upper bit. Furthermore, the layer information header includes bit-plane data position identification information indicating the starting position of each bit of the edge weights.
[0081] For example, when the bit precision defined for the edge weights is X, the weights of all edges belonging to the corresponding layer are each described with bit precision X. From the bit strings of these weights, the encoding unit 11 sets the first-bit non-zero weight data, namely first-bit weight data (1), first-bit weight data (2), ..., first-bit weight data (m), as the non-zero weight data of the first bit. This process is repeated for the non-zero weight data of the second bit through the Xth bit. The first-bit weight data (1), first-bit weight data (2), ..., first-bit weight data (m) are the non-zero weight data constituting the bit plane of the first bit.
[0082] The decoding side can determine the necessary encoded data in the layer unit's encoded data based on the bit-plane data position identification information, and decode the determined encoded data with arbitrary bit precision. That is, the decoding side can select only the necessary encoded data from the encoded data and decode the NN model information corresponding to the decoding side's environment. Furthermore, the bit-plane data position identification information only needs to identify the division positions between bit-plane data, or it can be information indicating the beginning position of each bit-plane data, or information indicating the data size of each bit-plane data.
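The bit-plane arrangement and decoding at reduced bit precision can be sketched as follows. This sketch assumes, for simplicity, that the non-zero weights are unsigned integers of `bit_length` bits; sign handling and the actual bit-stream layout are not modelled, and the function names are hypothetical.

```python
def to_bit_planes(values, bit_length):
    """Return bit_length planes, upper bit first; plane p holds bit p of every weight."""
    return [
        [(v >> (bit_length - 1 - p)) & 1 for v in values]
        for p in range(bit_length)
    ]


def from_bit_planes(planes, bit_length):
    """Rebuild weights from the planes received; missing lower planes count as 0."""
    count = len(planes[0]) if planes else 0
    values = [0] * count
    for p, plane in enumerate(planes):
        shift = bit_length - 1 - p
        for i, bit in enumerate(plane):
            values[i] |= bit << shift
    return values


weights = [200, 5, 129]              # 8-bit non-zero weight data
planes = to_bit_planes(weights, 8)
print(from_bit_planes(planes, 8))      # [200, 5, 129]
print(from_bit_planes(planes[:4], 8))  # [192, 0, 128]
```

Decoding only the first four planes corresponds to a decoder that uses the position identification information to stop after the upper 4 bits.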
[0083] If the transmission bandwidth of the data transmission network 2 is insufficient to transmit the entire encoded data representing the structure of the NN to the decoding side, the encoding unit 11 can also restrict the non-zero weight data to be transmitted according to the transmission bandwidth of the data transmission network 2. For example, in a bit string of weight information described with 32-bit precision, the non-zero weight data of the upper 8 bits can be designated as the transmission target. Based on the start code placed after this non-zero weight data, the decoding side can recognize that the layer data unit corresponding to the next layer follows the non-zero weight data of the 8th bit in the encoded data. Furthermore, by referring to the non-zero flags in the weight data, the decoding side can correctly decode weights with a value of zero.
[0084] When the weight data is decoded at arbitrary bit precision on the decoding side, in order to improve the inference accuracy at that bit precision, the encoding unit 11 may also include in the layer information header a bias that is added to the weights decoded at each bit precision. For example, the encoding unit 11 adds the same bias, on a layer-by-layer basis, to the bit strings of the weights described at each bit precision, finds the bias that gives the highest accuracy, and includes the found bias in the layer information header for encoding.
[0085] Furthermore, the encoding unit 11 may also include the biases of the edge weights in all layers of the NN in the model information header for encoding. Moreover, the encoding unit 11 may also set a flag in the layer information header or model information header indicating whether the biases are included, for example, including the biases in the encoded data only when the flag is valid.
[0086] The encoding unit 11 can also set the difference between the edge weight value and a specific value as the encoding object. For example, the weight of the previous layer in the encoding order can be used as a specific value. In addition, the weight of the corresponding edge belonging to a higher layer (the layer closest to the input layer) can be set as a specific value, and the weight of the corresponding edge of the model before the update can also be set as a specific value.
[0087] Furthermore, the encoding unit 11 has the functions shown in (A), (B) and (C).
[0088] (A) The encoding unit 11 has a scalable encoding function that divides the encoding data into basic encoding data and enhanced encoding data for encoding.
[0089] (B) The encoding unit 11 has the function of encoding the difference from the weights of the edges in a reference NN.
[0090] (C) The encoding unit 11 has the function of encoding only a portion of the information in the reference NN (e.g., information of the layer unit) as information for updating the NN.
[0091] An example of (A) is explained.
[0092] The encoding unit 11 quantizes the edge weights using a predefined quantization method for the edge weights. The data encoded from the quantized weights is designated as basic encoded data, and the data encoded by treating the quantization error as weight is designated as enhanced encoded data. Through quantization, the bit precision of the weights becoming the basic encoded data is lower than the bit precision of the weights before quantization, thus reducing the data size. When the transmission bandwidth for transmitting encoded data to the decoding side is insufficient, the data processing apparatus of Embodiment 1 only transmits the basic encoded data to the decoding side. On the other hand, when the transmission bandwidth for transmitting encoded data to the decoding side is sufficient, the data processing apparatus of Embodiment 1 transmits the enhanced encoded data, in addition to the basic encoded data, to the decoding side.
[0093] Two or more pieces of enhanced encoded data may be provided. For example, the encoding unit 11 sets the quantized value obtained by further quantizing the quantization error as the first enhanced encoded data, and sets the quantization error of that quantization as the second enhanced encoded data. Furthermore, the quantization error of the second enhanced encoded data may in turn be split into a further-quantized value and its quantization error, and this may be repeated until the target number of pieces of enhanced encoded data is obtained. In this way, by using scalable encoding, encoded data corresponding to the transmission bandwidth and allowable transmission time of the data transmission network 2 can be transmitted.
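The split into basic and enhanced encoded data can be sketched as follows. This is a minimal sketch assuming uniform scalar quantization with one enhancement level; the step sizes and function names are hypothetical choices for illustration, since the text only requires a predefined quantization method.

```python
def quantize(values, step):
    """Uniform scalar quantization: map each value to an integer index."""
    return [round(v / step) for v in values]


def encode_scalable(weights, base_step, enh_step):
    """Return (basic data, enhanced data) as lists of quantization indices."""
    base = quantize(weights, base_step)
    # The quantization error of the basic data is treated as a weight
    # and quantized again with a finer step.
    errors = [w - q * base_step for w, q in zip(weights, base)]
    enhanced = quantize(errors, enh_step)
    return base, enhanced


def decode_scalable(base, base_step, enhanced=None, enh_step=None):
    """Decode from the basic data alone, or refine it with the enhanced data."""
    weights = [q * base_step for q in base]
    if enhanced is not None:
        weights = [w + e * enh_step for w, e in zip(weights, enhanced)]
    return weights


base, enhanced = encode_scalable([0.8, -0.3, 0.05], base_step=0.5, enh_step=0.125)
print(decode_scalable(base, 0.5))                   # [1.0, -0.5, 0.0]
print(decode_scalable(base, 0.5, enhanced, 0.125))  # [0.75, -0.25, 0.0]
```

A decoder that receives only the basic data obtains the coarser weights; one that also receives the enhanced data obtains weights closer to the originals.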
[0094] Additionally, the encoding unit 11 can also encode the upper M bits of the non-zero weight data shown in Figure 8 as the basic encoded data, and divide the remaining bit string into one or more pieces of enhanced encoded data. In this case, the encoding unit 11 sets the non-zero flag again in each of the basic encoded data and the enhanced encoded data. Weights that are 0 in the enhanced encoded data of the upper bits must be 0.
[0095] An example of (B) is explained.
[0096] In the case where the learning unit 101 has a model of the neural network before relearning, the encoding unit 11 can also encode the difference between the edge weights in the relearned neural network model and the corresponding edge weights in the model before relearning. Furthermore, the relearning can be transfer learning or supplementary learning. In the data processing system, when the neural network structure is updated frequently or when the distribution of the learning data changes little each time it is relearned, the difference in edge weights is small; therefore, the size of the encoded data after relearning can be reduced.
[0097] In addition to the model identification number, the encoding unit 11 includes in the model information header a reference model identification number (reference_model_id) for identifying the pre-update model that should be referenced. In the example of (B), the model before relearning can be identified based on this reference model identification number. Furthermore, the encoding unit 11 may also set a flag (reference_model_present_flag) in the model information header indicating whether the encoded data has a reference source. In this case, the encoding unit 11 first encodes this flag (reference_model_present_flag), and sets the reference model identification number in the model information header only if the flag indicates that the encoded data is for updating a model.
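The conditional presence of reference_model_id can be sketched as follows. The field names come from the text, but the byte layout (big-endian fixed-width fields packed with `struct`) is purely an assumption made for this illustration; the text does not define field widths or ordering beyond the flag preceding the reference number.

```python
import struct
from dataclasses import dataclass
from typing import Optional

FIXED = ">IHHB"  # assumed layout: model_id, num_layers, num_coded_layers, flag


@dataclass
class ModelHeader:
    model_id: int
    num_layers: int
    num_coded_layers: int
    reference_model_id: Optional[int] = None  # None means no reference source


def write_header(header: ModelHeader) -> bytes:
    present = header.reference_model_id is not None
    data = struct.pack(FIXED, header.model_id, header.num_layers,
                       header.num_coded_layers, int(present))
    if present:  # reference_model_id follows only when the flag is 1
        data += struct.pack(">I", header.reference_model_id)
    return data


def read_header(data: bytes) -> ModelHeader:
    model_id, num_layers, num_coded, flag = struct.unpack_from(FIXED, data, 0)
    reference = None
    if flag:
        (reference,) = struct.unpack_from(">I", data, struct.calcsize(FIXED))
    return ModelHeader(model_id, num_layers, num_coded, reference)


update = ModelHeader(model_id=10, num_layers=5, num_coded_layers=3,
                     reference_model_id=0)
print(read_header(write_header(update)))
```

Note that a reference model identification number of 0 is a valid value, which is why presence is signalled by a separate flag rather than by the number itself.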
[0098] For example, in the data processing system shown in Figure 1, when the update frequency of the NN differs between clients or when clients perform data processing using NNs of different models, a client can correctly identify which model the encoded update data applies to by referring to the reference model identification number. If the client determines from the reference model identification number that the encoded update data is for a model that the client does not have, it can also notify server 1 of this.
[0099] The example in (C) is explained.
[0100] When the learning unit 101 has a model of the NN before relearning, it may, for example for fine-tuning purposes, fix one or more arbitrary layers from the upper part of the NN (input layer side) and relearn only some of the layers. In this case, the encoding unit 11 encodes only the information representing the structure of the layers updated through relearning. Therefore, during NN updates, the size of the encoded data transmitted to the decoding side can be reduced. Furthermore, the number of encoded layer data units (num_coded_layers) in the encoded data is less than or equal to the number of inner layer data units (num_layers) of the model. On the decoding side, by referring to the reference model identification number included in the model information header and the layer identification number included in the layer information header, the layers that should be updated can be determined.
[0101] Next, the data processing of the learning unit 101, the evaluation unit 102, and the inference unit 202 will be explained.
[0102] Figure 9 is a diagram illustrating an example of convolution processing for one-dimensional data in Embodiment 1, showing a convolutional layer that performs the convolution processing on the one-dimensional data. Examples of one-dimensional data include audio data and time-series data. The convolutional layer shown in Figure 9 has 9 nodes (10-1 to 10-9) in the previous layer and 3 nodes (11-1 to 11-3) in the next layer. Edges 12-1, 12-6, and 12-11 are assigned the same weight; edges 12-2, 12-7, and 12-12 are assigned the same weight; edges 12-3, 12-8, and 12-13 are assigned the same weight; edges 12-4, 12-9, and 12-14 are assigned the same weight; and edges 12-5, 12-10, and 12-15 are assigned the same weight. Furthermore, the weights of edges 12-1 to 12-5 may all have different values, or several of them may have the same value.
[0103] Five of the nine nodes in the previous layer (10-1 to 10-9) are connected to one node in the next layer using the weights described above. The kernel size K is 5, and the kernel is defined by the combination of these weights. For example, as shown in Figure 9, node 10-1 is connected to node 11-1 via edge 12-1, node 10-2 is connected to node 11-1 via edge 12-2, node 10-3 is connected to node 11-1 via edge 12-3, node 10-4 is connected to node 11-1 via edge 12-4, and node 10-5 is connected to node 11-1 via edge 12-5. The kernel is defined by the combination of the weights of edges 12-1 to 12-5.
[0104] Node 10-3 is connected to node 11-2 via edge 12-6, node 10-4 is connected to node 11-2 via edge 12-7, node 10-5 is connected to node 11-2 via edge 12-8, node 10-6 is connected to node 11-2 via edge 12-9, and node 10-7 is connected to node 11-2 via edge 12-10. The kernel is defined by the combination of the weights of edges 12-6 to 12-10.
[0105] Node 10-5 is connected to node 11-3 via edge 12-11, node 10-6 is connected to node 11-3 via edge 12-12, node 10-7 is connected to node 11-3 via edge 12-13, node 10-8 is connected to node 11-3 via edge 12-14, and node 10-9 is connected to node 11-3 via edge 12-15. The kernel is defined by the combination of the weights of edges 12-11 to 12-15.
[0106] When processing input data using a CNN, the learning unit 101, evaluation unit 102, and inference unit 202 perform the convolution operation using the combination of edge weights of the convolutional layer, applying each kernel at intervals of the stride S (S = 2 in Figure 9). The combination of edge weights is determined through learning for each kernel. Furthermore, in CNNs used for image recognition, NNs are often constructed using convolutional layers with multiple kernels.
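The one-dimensional convolution of Figure 9 (9 input nodes, kernel size K = 5, stride S = 2, 3 output nodes) can be sketched as follows. The input and kernel values are made up for illustration, and the bias term and activation function are omitted.

```python
def conv1d(inputs, kernel, stride):
    """Apply one shared kernel to successive windows of the input."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, inputs[start:start + k]))
        for start in range(0, len(inputs) - k + 1, stride)
    ]


inputs = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # nodes 10-1 to 10-9
kernel = [1, 1, 1, 1, 1]              # shared weights of edges 12-1 to 12-5
print(conv1d(inputs, kernel, stride=2))  # [15, 25, 35]
```

The three windows start at nodes 10-1, 10-3, and 10-5, matching the connections to nodes 11-1, 11-2, and 11-3, and the same five weights are reused for each window.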
[0107] Figure 10 is a diagram illustrating an example of convolution processing of two-dimensional data in Embodiment 1, showing convolution processing of two-dimensional data such as image data. In the two-dimensional data shown in Figure 10, the kernel 20 is a block region of size K_x in the x-direction and K_y in the y-direction. The kernel size K is K = K_x × K_y. The learning unit 101, evaluation unit 102, or inference unit 202 performs the convolution operation on the two-dimensional data for each kernel 20, at intervals of the stride S_x in the x-direction and the stride S_y in the y-direction. Here, the strides S_x and S_y are integers greater than or equal to 1.
[0108] Figure 11 is a diagram showing a matrix of edge weight information for each node in the l-th (l = 1, 2, ..., L) layer, which is a fully connected layer of the NN. Figure 12 is a diagram showing a matrix of quantization step sizes of the edge weight information for each node in the l-th (l = 1, 2, ..., L) layer, which is a fully connected layer of the NN.
[0109] In the NN, the combination of the weights w_ij of each layer shown in Figure 11 is the data that constitutes the network. Therefore, in an NN with many layers, such as a deep neural network, the data volume typically reaches hundreds of megabytes or more, requiring a large memory size. Here, i is the node index, i = 1, 2, ..., N_l, and j is the edge index, j = 1, 2, ..., N_{l-1} + 1 (including bias).
[0110] Therefore, in the data processing apparatus of Embodiment 1, the weight information is quantized in order to reduce the amount of data for edge weight information. For example, as shown in Figure 12, a quantization step size q_ij is set for each edge weight w_ij. The quantization step size may also be shared across multiple node indices or multiple edge indices, or across multiple node indices and edge indices. This reduces the amount of quantization information that should be encoded.
[0111] Figure 13 is a diagram showing a matrix of edge weight information in a convolutional layer. Figure 14 is a diagram showing a matrix of quantization step sizes of the edge weight information in a convolutional layer. In a convolutional layer, the weights of the edges of one kernel are common to all nodes, and the number of edges connected to each node, i.e., the kernel size K, can be reduced so that the kernel is a small region. Figure 13 shows data in which the edge weights w_i'j' are set for each kernel, and Figure 14 shows data in which the quantization step sizes q_i'j' are set for each kernel. Here, i' is the kernel index, i' = 1, 2, ..., M_l (l = 1, 2, ..., L), and j' is the edge index, j' = 1, 2, ..., K_l + 1 (including bias).
[0112] The quantization step size can also be common across multiple kernel indices, multiple edge indices, or multiple kernel and edge indices. This reduces the amount of quantization information that should be encoded. For example, all quantization steps within a layer can be common, setting a single quantization step size within a single layer; or all quantization steps within a model can be common, setting a single quantization step size within a single model.
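The effect of sharing quantization step sizes on the amount of quantization information can be illustrated by counting how many step sizes would have to be encoded for one layer. The sharing mode names below are hypothetical labels for this sketch, not syntax elements from the text.

```python
def count_step_sizes(num_kernels, edges_per_kernel, sharing):
    """Number of quantization step sizes to encode for one layer."""
    if sharing == "per_weight":  # one q_i'j' for every edge weight
        return num_kernels * edges_per_kernel
    if sharing == "per_kernel":  # common across edge indices j'
        return num_kernels
    if sharing == "per_edge":    # common across kernel indices i'
        return edges_per_kernel
    if sharing == "per_layer":   # one step size for the whole layer
        return 1
    raise ValueError(f"unknown sharing mode: {sharing}")


# Example: a layer with 32 kernels and 76 weights per kernel (including bias).
for mode in ("per_weight", "per_kernel", "per_edge", "per_layer"):
    print(mode, count_step_sizes(32, 76, mode))
```

For this example the count falls from 2432 step sizes with no sharing to a single step size when the whole layer shares one.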
[0113] Figure 15 is a block diagram showing the structure of a modified example of the data processing apparatus (encoder) of Embodiment 1. The data processing apparatus shown in Figure 15 is a first data processing apparatus that learns an NN using a learning dataset and an evaluation dataset and generates encoded data of the model information of the NN; for example, it is server 1 shown in Figure 1. The data processing apparatus shown in Figure 15 includes a data processing unit 10A, an encoding unit 11, and a decoding unit 12.
[0114] The data processing unit 10A is a data processing unit that generates and learns the neural network (NN), and includes a learning unit 101A, an evaluation unit 102, and a control unit 103. The encoding unit 11 encodes the model information generated by the learning unit 101A, generating encoded data consisting of header information and encoded data for layer units. The decoding unit 12 decodes the model information from the encoded data generated by the encoding unit 11. Furthermore, the decoding unit 12 outputs the decoded model information to the learning unit 101A.
[0115] Similar to Learning Unit 101, Learning Unit 101A uses a learning dataset to learn the neural network and generates model information representing the structure of the learned neural network. Furthermore, Learning Unit 101A generates a neural network using the decoded model information and then uses the learning dataset to learn the parameters of the generated neural network again.
[0116] During the relearning process described above, by fixing the weights of some of the edges and relearning, high accuracy can be achieved while keeping the size of the encoded data small. For example, by performing relearning with the weights whose non-zero flag is 0 fixed at 0, the weights can be optimized while preventing the size of the encoded data from exceeding the size of the encoded data for the edge weights before relearning.
[0117] The data processing apparatus includes a decoding unit 12, and the data processing unit 10A uses the information decoded by the decoding unit 12 to learn the neural network (NN). Therefore, for example, even if the encoding unit 11 performs irreversible encoding that produces encoding distortion, this data processing apparatus can generate and learn the NN based on the actual decoding result of the encoded data, enabling the learning of the NN to minimize the impact of encoding errors even when the size of the encoded data is constrained.
[0118] In a data processing system that has the same structure as Figure 1, with the data processing apparatus shown in Figure 3 as server 1 and the data processing apparatus shown in Figure 4 as clients 3-1, 3-2, ..., 3-N, the data output from an intermediate layer of the NN can be used as a feature quantity for data processing of image data and sound data, for example the image retrieval or matching described in Reference 1 below.
[0119] (Reference 1) ISO/IEC JTC1/SC29/WG11/m39219, “Improved retrieval and matching with CNN feature for CDVA”, Chengdu, China, Oct. 2016.
[0120] For example, when the output data of an intermediate layer of the NN is used as an image feature quantity for image processing such as image retrieval, matching, or object tracking, it replaces or is added to the image feature quantities used in existing image processing methods, namely HOG (Histogram of Oriented Gradients), SIFT (Scale Invariant Feature Transform), or SURF (Speeded Up Robust Features). This allows the image processing to be performed using the same processing steps as image processing methods that use existing image feature quantities. In the data processing system of Embodiment 1, the encoding unit 11 encodes model information representing the structure of the NN up to the intermediate layer that outputs the image feature quantities.
[0121] Furthermore, the data processing device, which functions as server 1, uses the feature values from the aforementioned data processing to perform data processing such as image retrieval. The data processing device, which functions as client, generates a neural network (NN) from the encoded data up to the intermediate layer, and uses the data output from the intermediate layer of the generated NN as feature values to perform data processing such as image retrieval.
[0122] In the data processing system, the encoding unit 11 encodes the model information representing the structure up to the intermediate layers of the neural network (NN). This improves the compression ratio of the quantized parameter data and reduces the amount of weight information data before encoding. The client uses the model information decoded by the decoding unit 201 to generate the NN and uses the data output from the intermediate layers of the generated NN as feature values for data processing.
[0123] Furthermore, the data processing system of Embodiment 1 may have the same structure as Figure 1, with the data processing apparatus shown in Figure 3 or Figure 15 as server 1 and the data processing apparatus shown in Figure 4 as clients 3-1, 3-2, ..., 3-N. In a data processing system with this structure, a new layer flag (new_layer_flag) is set in the encoded data. When the new layer flag is 0 (invalid), the layer corresponding to the new layer flag is a layer that is updated based on the reference layer. When the new layer flag is 1 (valid), the layer corresponding to the new layer flag is a newly added layer.
[0124] When the new layer flag is 0 (invalid), a flag (channel_wise_update_flag) is set for the layer corresponding to the new layer flag to identify whether there is an update on the weights of edges on a channel-by-channel basis. If this flag is 0 (invalid), the weights of edges for all channels are encoded. If this flag is 1 (valid), an update flag (channel_update_flag) for the weights on a channel-by-channel basis is set. This update flag indicates whether there is an update from the reference layer on a per-channel basis. When the update flag is 1 (valid), the channel weights are encoded; if it is 0 (invalid), they are set to the same weights as the reference layer.
[0125] Furthermore, as the layer information header, information indicating the number of channels in the layer (num_channels) and information indicating the number of edge weights per channel (weights_per_channels) are set. The weights_per_channels of a certain layer l is the kernel size K_l + 1, or N_{l-1} + 1, where N_{l-1} is the number of edges from the previous layer, i.e., layer l-1.
[0126] By setting the information described above together with the new layer flag, the number of channels and the number of weights per channel can be determined solely from the encoded data of the layer data unit. Therefore, the update flags for the weights per channel can be decoded as part of the decoding process of the layer data unit.
[0127] Furthermore, setting the flag used to identify whether weights are updated on a channel-by-channel basis to 1 (valid) is restricted to cases where the layer has the same number of channels as the reference layer. This is because, when the number of channels differs from that of the reference layer, the correspondence between the channels of the reference layer and the channels of the layer corresponding to the flag becomes unclear.
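How a decoder might apply channel_wise_update_flag and channel_update_flag to a reference layer can be sketched as follows. The flag names come from the text; representing each channel as a list of weights and the function name are assumptions made for this illustration.

```python
def apply_layer_update(reference_channels, channel_wise_update_flag,
                       channel_update_flags, coded_channels):
    """Rebuild a layer's per-channel weights from a reference layer and update data."""
    if not channel_wise_update_flag:
        # Flag 0: the weights of all channels are present in the encoded data.
        return [list(ch) for ch in coded_channels]
    if len(channel_update_flags) != len(reference_channels):
        # Channel-wise update requires the same number of channels as the reference.
        raise ValueError("channel count differs from the reference layer")
    coded = iter(coded_channels)
    return [
        list(next(coded)) if flag else list(ref)  # flag 0: reuse reference weights
        for flag, ref in zip(channel_update_flags, reference_channels)
    ]


reference = [[1, 2], [3, 4], [5, 6]]
updated = apply_layer_update(reference, 1, [0, 1, 0], [[9, 9]])
print(updated)  # [[1, 2], [9, 9], [5, 6]]
```

Only the weights of the second channel are carried in the update data; the other two channels keep the reference layer's weights.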
[0128] Figure 16 is a diagram illustrating an outline of the update of encoded data in Embodiment 1. In Figure 16, the data shown on the upper side consists of a non-layer data unit and layer data units (1) to (4), and, as in Figure 7, is encoded sequentially starting from the layer data unit (4). In the non-layer data unit, the model identification number (model_id) = 0, the number of inner layer data units (num_layers) = 4, the layer structure information (model_structure_information), and the layer identification information (layer_id_information) are set as model header information, and the flag indicating whether there is a reference source (reference_model_present_flag) is set to 0 (invalid).
[0129] In layer data unit (1), the layer identification number (layer_id) is set to 0, the information indicating the number of channels (filters, kernels) of the layer (num_channels) is set to 32, and the information indicating the number of weights (including bias values) per channel (filter, kernel) (weights_per_channels) is set to 76. In addition, in layer data unit (2), the layer identification number (layer_id) is set to 1, the information indicating the number of channels of the layer (num_channels) is set to 64, and the information indicating the number of weights per channel (weights_per_channels) is set to 289.
[0130] In layer data unit (3), the layer identification number (layer_id) is set to 2, the information indicating the number of channels in the layer (num_channels) is set to 128, and the information indicating the number of weights per channel unit (weights_per_channels) is set to 577. In addition, in layer data unit (4), the layer identification number (layer_id) is set to 3, the information indicating the number of channels in the layer (num_channels) is set to 100, and the information indicating the number of weights per channel unit (weights_per_channels) is set to 32769.
[0131] In Figure 16, the data shown on the lower side is update data that updates the data shown on the upper side using layer construction information, layer update flags, and new layer flags. It consists of a non-layer data unit and layer data units (1'), (2), (3), (5), and (4'). To a client that has already received the data shown on the upper side, the non-layer data unit and layer data units (1'), (5), and (4') need to be transmitted, but layer data units (2) and (3) are not updated and do not need to be transmitted.
[0132] In the non-layer data unit shown on the lower side of Figure 16, the model identification number (model_id) = 10, the number of inner layer data units (num_layers) = 5, the layer structure information (model_structure_information), and the layer identification information (layer_id_information) are set as model header information. The flag indicating whether there is a reference source in the encoded data (reference_model_present_flag) is set to 1 (valid), the reference model identification number (reference_model_id) is set to 0, and the number of encoded layer data units (num_coded_layers) is set to 3.
[0133] In the layer data unit (1'), the layer identification number (layer_id) is 0, the new layer flag (new_layer_flag) is set to 0, the number of channels (num_channels) is set to 32, and the number of weights per channel (weights_per_channels) is set to 76. Furthermore, the flag used to identify whether weights have been updated per channel (channel_wise_update_flag) is set to 1 (valid), therefore, the update flag (channel_update_flag) for weights per channel is set.
[0134] Since layer data unit (2) with layer identification number (layer_id) 1 and layer data unit (3) with layer identification number (layer_id) 2 are not update targets, they are not included in the encoded data. Therefore, the above model header information is set with the number of inner layer data units (num_layers) = 5 and the number of encoded layer data units (num_coded_layers) = 3.
[0135] In the layer data unit (5), the layer identification number (layer_id) is 4, and the new layer flag (new_layer_flag) is set to 1 (valid). In addition, the information indicating the number of channels in the layer (num_channels) is set to 256, and the information indicating the number of weights per channel unit (weights_per_channels) is set to 1153.
[0136] In the layer data unit (4'), the layer identification number (layer_id) is 3, the new layer flag (new_layer_flag) is set to 0, the number of channels in the layer (num_channels) is set to 100, and the number of weights per channel (weights_per_channels) is set to 16385. Furthermore, the flag used to identify whether weights are updated per channel (channel_wise_update_flag) is set to 0 (invalid), so the update flag for weights per channel (channel_update_flag) is not set and the weights of all channels are encoded.
[0137] In the data shown on the lower side, the layer data units (1) and (4) of the data shown on the upper side are updated to layer data units (1') and (4'), and a layer data unit (5) with layer identification number 4 is added.
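The update of Figure 16 can be traced with a small sketch that merges the three coded layer data units into the reference model. The header values are those given in the text; storing each layer data unit as a dictionary keyed by layer_id, and the function name, are assumptions made for this illustration. The ordering of layers within the network, which the text derives from model_structure_information, is not modelled.

```python
def update_model(reference_layers, coded_units):
    """Merge coded layer data units into a copy of the reference model's layers."""
    layers = dict(reference_layers)
    for unit in coded_units:
        layer_id = unit["layer_id"]
        if unit["new_layer_flag"]:
            if layer_id in layers:
                raise ValueError(f"layer {layer_id} already exists")
        elif layer_id not in layers:
            raise ValueError(f"layer {layer_id} has no reference layer")
        layers[layer_id] = unit  # add the new layer or replace the updated one
    return layers


# Upper data of Figure 16: model_id = 0, layer data units (1) to (4).
reference = {
    0: {"layer_id": 0, "num_channels": 32, "weights_per_channels": 76},
    1: {"layer_id": 1, "num_channels": 64, "weights_per_channels": 289},
    2: {"layer_id": 2, "num_channels": 128, "weights_per_channels": 577},
    3: {"layer_id": 3, "num_channels": 100, "weights_per_channels": 32769},
}
# Lower data of Figure 16: only units (1'), (5), and (4') are coded.
coded = [
    {"layer_id": 0, "new_layer_flag": 0, "num_channels": 32, "weights_per_channels": 76},
    {"layer_id": 4, "new_layer_flag": 1, "num_channels": 256, "weights_per_channels": 1153},
    {"layer_id": 3, "new_layer_flag": 0, "num_channels": 100, "weights_per_channels": 16385},
]
model = update_model(reference, coded)
print(len(model), len(coded))  # 5 3
```

The resulting counts match num_layers = 5 and num_coded_layers = 3 in the model information header of the update data.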
[0138] Figure 17 is a diagram showing the structure of the network models corresponding to the update of the encoded data in Figure 16. In Figure 17, the network model shown on the left is the network model realized by decoding the data shown on the upper side of Figure 16. The network model shown on the right is the network model realized by decoding the data shown on the lower side of Figure 16.
[0139] In the layer data unit (1'), the flag for identifying whether the weights have been updated per channel (channel_wise_update_flag) is set to 1, so the weights of several channels are updated relative to the layer data unit (1). In addition, by adding the layer data unit (5) and updating the layer data unit (4) to the layer data unit (4'), a 2D convolution layer and a 2D max-pooling layer are added before the fully connected layer in the network model shown on the right.
[0140] Figure 18 is a diagram showing an example of the layer structure information contained in the model information header. Text information such as that shown in Figure 18 can also be set as the full-layer structure information (model_structure_information) contained in the model information header. The text information shown in Figure 18 represents the layer structure of a model according to a standard specification such as NNEF (Neural Network Exchange Format), described in Reference 2.
[0141] (Reference 2) "Neural Network Exchange Format", The Khronos NNEF Working Group, Version 1.0, Revision 3, 2018-06-13.
[0142] In the example of Figure 18, (A) the network model with model_id = 0 is the network model corresponding to the data shown on the upper side of Figure 16 (the network model shown on the left of Figure 17). (B) The network model with model_id = 10 is the network model corresponding to the data shown on the lower side of Figure 16 (the network model shown on the right of Figure 17).
[0143] Figure 19 is a diagram showing an example of the layer identification information (layer_id_information) corresponding to the layer structure information contained in the model information header. It shows layer identification information in which the layer identification numbers corresponding to the layer structure information of Figure 18 are set. In the example of Figure 19, (A) for the network model with model_id = 0 is the layer identification information corresponding to the network model shown on the left of Figure 17, and (B) for the network model with model_id = 10 is the layer identification information corresponding to the network model shown on the right of Figure 17. The weights and biases of each layer are assigned layer identification numbers, and their values correspond to the data shown in Figure 16.
[0144] File data, such as a file containing the full-layer structure information (model_structure_information) and a file containing the layer identification numbers corresponding to it (layer_id_information), is encoded into the model information header by inserting it after information indicating the number of bytes of that file data. Alternatively, the model information header may contain a URL (Uniform Resource Locator) indicating where the file data can be obtained. Furthermore, so that either structure can be selected, a flag identifying which structure is used may be placed before the file data or URL in the model information header. This identification flag may be shared by model_structure_information and layer_id_information, or held separately for each. Sharing it reduces the amount of information in the model information header; holding it separately allows each to be set independently according to the preconditions of use.
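The two alternatives of paragraph [0144] can be sketched as a flag followed by a byte count and a payload. The concrete layout is an assumption made for this sketch only: the text does not specify the flag values, the field widths, the byte order, or whether a URL is also preceded by a byte count. The names EMBEDDED_FILE, URL_REFERENCE, pack_field and unpack_field, the placeholder payload and the example.com URL are hypothetical.

```python
import struct
from typing import Tuple

EMBEDDED_FILE = 0  # hypothetical flag value: file bytes follow the byte count
URL_REFERENCE = 1  # hypothetical flag value: a URL string follows the byte count

_PREFIX = ">BI"    # assumed layout: 1-byte flag, 4-byte big-endian byte count


def pack_field(kind: int, payload: bytes) -> bytes:
    """Pack one header field as flag + byte count + payload."""
    if kind not in (EMBEDDED_FILE, URL_REFERENCE):
        raise ValueError("unknown identification flag")
    return struct.pack(_PREFIX, kind, len(payload)) + payload


def unpack_field(buf: bytes, offset: int = 0) -> Tuple[int, bytes, int]:
    """Return (flag, payload, next_offset) for the field starting at offset."""
    kind, size = struct.unpack_from(_PREFIX, buf, offset)
    start = offset + struct.calcsize(_PREFIX)
    end = start + size
    if end > len(buf):
        raise ValueError("truncated field")
    return kind, buf[start:end], end


# model_structure_information embedded as file data, layer_id_information by URL.
structure_file = b"<model_structure_information text>"   # placeholder content
layer_id_url = b"https://example.com/layer_id_information.txt"
header_bytes = (pack_field(EMBEDDED_FILE, structure_file)
                + pack_field(URL_REFERENCE, layer_id_url))

kind_a, payload_a, pos = unpack_field(header_bytes)
kind_b, payload_b, pos = unpack_field(header_bytes, pos)
```

Here each field carries its own flag, which corresponds to holding the identification flag separately. Sharing one flag between both fields would save one flag per header at the cost of that independence.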
[0145] Furthermore, the model information header may contain information indicating the format of the aforementioned text information. For example, NNEF may be assigned index 0, with other formats assigned indices from 1 onward. This allows the decoding side to identify the format in which the text information is written and to decode it correctly.
[0146] In addition, the layer structure information represented by text information and the information representing the layer identification numbers corresponding to the layer structure information, as shown in Figure 18 and Figure 19, can be applied to all systems shown in Embodiment 1. Furthermore, from model_structure_information and layer_id_information, it is possible to identify, based solely on the encoded data, which layer of the model each layer data unit belongs to. Therefore, even when updating a model (when reference_model_present_flag is valid), a model not generated from the encoded data shown in this embodiment can be set as the reference model. That is, because the encoded data shown in this embodiment has model_structure_information and layer_id_information as part of the model information header, any model can be set as the reference model. In this case, however, the correspondence between the reference model identification number (reference_model_id) and the reference model must be defined in advance.
[0147] Next, the hardware structure for implementing the functions of the data processing apparatus of Embodiment 1 will be described. The functions of the data processing unit 10 and the encoding unit 11 in the data processing apparatus of Embodiment 1 are implemented by a processing circuit. That is, the data processing apparatus of Embodiment 1 has a processing circuit for executing the processing of steps ST1 to ST6 shown in Figure 5. The processing circuit can be dedicated hardware, or it can be a CPU (Central Processing Unit) that executes programs stored in memory.
[0148] Figure 20A is a block diagram showing a hardware structure that implements the functions of the data processing apparatus of Embodiment 1. In Figure 20A, the processing circuit 300 is a dedicated circuit that functions as the data processing apparatus shown in Figure 3. Figure 20B is a block diagram showing a hardware structure that executes software implementing the functions of the data processing apparatus of Embodiment 1. In Figure 20B, the processor 301 and the memory 302 are connected to each other via a signal bus.
[0149] When the above processing circuit is the dedicated hardware shown in Figure 20A, the processing circuit 300 may be, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a combination thereof. Furthermore, the functions of the data processing unit 10 and the encoding unit 11 can be implemented using separate processing circuits, or these functions can be implemented together using a single processing circuit.
[0150] When the above processing circuit is the processor shown in Figure 20B, the functions of the data processing unit 10 and the encoding unit 11 are implemented through software, firmware, or a combination of software and firmware. The software or firmware is written as a program and stored in the memory 302. The processor 301 reads and executes the program stored in the memory 302, thereby implementing the functions of the data processing unit 10 and the encoding unit 11. That is, the data processing apparatus of Embodiment 1 has the memory 302 for storing a program that, when executed by the processor 301, results in the execution of the processing of steps ST1 to ST6 shown in Figure 5. These programs cause a computer to execute the steps or methods of the data processing unit 10 and the encoding unit 11. The memory 302 may also be a computer-readable storage medium storing a program for causing a computer to function as the data processing unit 10 and the encoding unit 11.
[0151] The memory 302 is, for example, a non-volatile or volatile semiconductor memory such as RAM (Random Access Memory), ROM (Read Only Memory), flash memory, EPROM (Erasable Programmable Read Only Memory), or EEPROM (Electrically Erasable Programmable Read Only Memory), or a hard disk, floppy disk, optical disk, compact disc, mini disc, DVD, or the like.
[0152] Furthermore, the functions of the data processing unit 10 and the encoding unit 11 can be partially implemented using dedicated hardware and partially implemented using software or firmware. For example, the data processing unit 10 can be implemented using a processing circuit that is dedicated hardware, and the encoding unit 11 can be implemented by the processor 301 reading and executing the program stored in the memory 302. In this way, the processing circuit can implement the above-mentioned functions using hardware, software, firmware, or a combination thereof.
[0153] The above description concerns the data processing apparatus shown in Figure 3, but the same applies to the data processing device shown in Figure 4. For example, the data processing device shown in Figure 4 has a processing circuit for executing the processing of steps ST11 to ST13 shown in Figure 6. This processing circuit can be dedicated hardware or a CPU that executes programs stored in memory.
[0154] When the above processing circuit is the dedicated hardware shown in Figure 20A, the processing circuit 300 may be, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC, an FPGA, or a combination thereof. Furthermore, the functions of the decoding unit 201 and the inference unit 202 can be implemented using separate processing circuits, or these functions can be implemented together using a single processing circuit.
[0155] When the above processing circuit is the processor shown in Figure 20B, the functions of the decoding unit 201 and the inference unit 202 are implemented through software, firmware, or a combination of software and firmware. The software or firmware is written as a program and stored in the memory 302. The processor 301 reads and executes the program stored in the memory 302, thereby implementing the functions of the decoding unit 201 and the inference unit 202. That is, the data processing device shown in Figure 4 has the memory 302 for storing a program that, when executed by the processor 301, results in the execution of the processing of steps ST11 to ST13 shown in Figure 6. These programs cause a computer to execute the steps or methods of the decoding unit 201 and the inference unit 202. The memory 302 may also be a computer-readable storage medium storing a program for causing a computer to function as the decoding unit 201 and the inference unit 202.
[0156] Furthermore, the functions of the decoding unit 201 and the inference unit 202 can be partially implemented using dedicated hardware and partially implemented using software or firmware. For example, the decoding unit 201 could be implemented using a processing circuit that is dedicated hardware, and the inference unit 202 could be implemented by the processor 301 reading and executing the program stored in the memory 302.
[0157] As described above, in the data processing apparatus of Embodiment 1, the encoding unit 11 encodes the layer structure information and the layer update flag. When the layer update flag indicates an update of the layer structure, a new layer flag is encoded. Of the data representing the structure of the NN, only the information related to updated layers is encoded and transmitted, so the data size of the data representing the structure of the NN can be reduced.
[0158] Furthermore, the encoding unit 11 encodes information representing the structure of the NN, generating encoded data consisting of header information and encoded data of layer units. Since only layer-related information required by the decoding side can be encoded, the processing load of encoding information related to the structure of the NN can be reduced, thus decreasing the amount of data transmitted to the decoding side.
[0159] In the data processing apparatus of Embodiment 1, the encoding unit 11 encodes the weight information of the edges belonging to a layer of the NN in units of bit planes, starting from the most significant bit. This reduces the size of the encoded data transmitted to the decoding side.
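Bit-plane ordering as described in paragraph [0159] can be illustrated with the following sketch. It assumes the weights have already been quantized to unsigned integers of a fixed bit width, which the paragraph does not state. The function names to_bit_planes and from_bit_planes and the sample values are hypothetical, and any entropy coding of the planes is omitted.

```python
from typing import List


def to_bit_planes(values: List[int], bits: int) -> List[List[int]]:
    """Split unsigned integers into bit planes, most significant plane first."""
    if any(v < 0 or v >= (1 << bits) for v in values):
        raise ValueError("value does not fit in the given bit width")
    return [[(v >> b) & 1 for v in values] for b in range(bits - 1, -1, -1)]


def from_bit_planes(planes: List[List[int]], bits: int) -> List[int]:
    """Rebuild integers from the leading planes; missing low planes count as zero."""
    if not planes:
        return []
    values = [0] * len(planes[0])
    for i, plane in enumerate(planes):
        shift = bits - 1 - i
        for j, bit in enumerate(plane):
            values[j] |= bit << shift
    return values


weights = [13, 2, 7, 8]                         # placeholder 4-bit quantized weights
planes = to_bit_planes(weights, 4)
exact = from_bit_planes(planes, 4)              # all four planes received
coarse = from_bit_planes(planes[:2], 4)         # only the two upper planes received
```

Because the upper planes come first, a decoder that receives only the leading planes still obtains a coarse approximation of every weight (here [12, 0, 4, 8] instead of [13, 2, 7, 8]), and each further plane refines it.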
[0160] In the data processing apparatus of Embodiment 1, the encoding unit 11 encodes information related to one or more layers specified by the header information. Therefore, by encoding only information related to the layers required by the decoding side, the size of the encoded data transmitted to the decoding side can be reduced.
[0161] In the data processing apparatus of Embodiment 1, the encoding unit 11 encodes the difference between a specific value and the weight value of each edge belonging to the layer specified by the title information. This reduces the size of the encoded data transmitted to the decoding side.
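The size reduction from coding differences can be illustrated with a small sketch. It assumes integer weights and a single specific value per layer that is known to both sides. How the specific value is chosen and signalled is not covered by this paragraph, so the value 100 below is a placeholder, and the helper names are hypothetical.

```python
from typing import List


def encode_difference(weights: List[int], specific_value: int) -> List[int]:
    """Replace each weight by its difference from the specific value."""
    return [w - specific_value for w in weights]


def decode_difference(diffs: List[int], specific_value: int) -> List[int]:
    """Recover the weights by adding the specific value back."""
    return [d + specific_value for d in diffs]


def magnitude_bits(values: List[int]) -> int:
    """Bits needed for the largest magnitude (sign bit not counted)."""
    return max(abs(v) for v in values).bit_length()


weights = [100, 102, 99, 101]     # placeholder integer weights of one layer
specific_value = 100              # placeholder; assumed known to the decoder
diffs = encode_difference(weights, specific_value)
restored = decode_difference(diffs, specific_value)
```

With these placeholder values the raw weights need 7 magnitude bits each, while the differences [0, 2, -1, 1] need only 2. Smaller magnitudes are the reason the encoded data can shrink when the weights cluster around the specific value.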
[0162] In the data processing apparatus of Embodiment 1, the encoding unit 11 encodes the edge weight information into basic encoded data and enhanced encoded data. This enables encoded data transmission that corresponds to the transmission bandwidth and transmission allowable time of the data transmission network 2.
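The split into basic and enhanced encoded data can be sketched as follows, using the description in claim 5 (quantized weights as the basic data, quantization error as the enhanced data). The uniform quantizer with floor rounding, the integer weights, the step size of 16 and the function names are assumptions of this sketch, and the subsequent encoding of both parts is omitted.

```python
from typing import List, Optional, Tuple


def split_base_enhancement(weights: List[int],
                           step: int) -> Tuple[List[int], List[int]]:
    """Return (quantization indices, quantization errors) for a uniform step."""
    if step <= 0:
        raise ValueError("step must be positive")
    base = [w // step for w in weights]                  # basic data
    error = [w - q * step for w, q in zip(weights, base)]  # enhanced data
    return base, error


def reconstruct(base: List[int], step: int,
                error: Optional[List[int]] = None) -> List[int]:
    """Rebuild weights from the basic data, refined by the enhanced data if given."""
    coarse = [q * step for q in base]
    if error is None:
        return coarse
    return [c + e for c, e in zip(coarse, error)]


weights = [37, -5, 18, 64]        # placeholder integer weights
base, error = split_base_enhancement(weights, 16)
coarse = reconstruct(base, 16)            # basic encoded data only
exact = reconstruct(base, 16, error)      # basic plus enhanced encoded data
```

A client on a narrow or time-constrained link could use only the basic data and obtain the coarse weights, while a client that also receives the enhanced data recovers the original values.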
[0163] Furthermore, the present invention is not limited to the above embodiments; within the scope of the present invention, the embodiments can be freely combined, and any structural element of each embodiment can be modified or omitted.
[0164] Industrial Applicability
[0165] The data processing apparatus of the present invention can be used, for example, in image recognition technology.
[0166] Explanation of Reference Signs
[0167] 1 server, 2 data transmission network, 3-1 to 3-N clients, 10 and 10A data processing units, 10-1 to 10-9 and 11-1 to 11-3 nodes, 11 encoding unit, 12 decoding unit, 12-1 to 12-15 edges, 20 cores, 101 and 101A learning units, 102 evaluation unit, 103 control unit, 201 decoding unit, 202 inference unit, 300 processing circuit, 301 processor, 302 memory.
Claims
1. A data processing apparatus, characterized in that the data processing apparatus has: a data processing unit that learns a neural network which takes an image or sound as input; and an encoding unit that generates encoded data in which model title information for identifying the model of the neural network, layer title information for identifying a specified layer of the neural network, and weight information of each edge belonging to the specified layer identified by the layer title information are encoded, wherein the specified layer is a layer newly added relative to a reference model or a layer updated relative to the reference model, the encoded data includes layer structure information representing the layer structure of the neural network and a new layer flag indicating whether each encoded specified layer is an update of a layer of the reference model or a new layer, and the encoded data is transmitted by the data processing apparatus to a data processing apparatus on the decoding side via a data transmission network.
2. The data processing apparatus according to claim 1, characterized in that the encoding unit encodes the weight information of the edges belonging to the layer in units of bit planes, starting from the most significant bit.
3. The data processing apparatus according to claim 1 or 2, characterized in that the encoding unit encodes the weight information of edges belonging to one or more layers identified by the layer title information.
4. The data processing apparatus according to claim 1 or 2, characterized in that the encoding unit encodes the difference between the weight value of the edge and a specific value.
5. The data processing apparatus according to claim 1 or 2, characterized in that the encoding unit encodes the weight information of the edges separately as basic encoded data and enhanced encoded data, wherein the basic encoded data is data generated by quantizing the weights of the edges and encoding the quantized weights, and the enhanced encoded data is data generated by encoding the quantization error as a weight.
6. The data processing apparatus according to claim 1 or 2, characterized in that the data processing apparatus includes a decoding unit that decodes the encoded data generated by the encoding unit, and the data processing unit uses the information decoded by the decoding unit to learn the neural network.
7. A data processing system, characterized in that the data processing system has a first data processing device and a second data processing device, wherein the first data processing device has: a first data processing unit that learns a neural network which takes an image or sound as input; and an encoding unit that generates encoded data in which model title information for identifying the model of the neural network, layer title information for identifying a specified layer of the neural network, and weight information of each edge belonging to the specified layer identified by the layer title information are encoded, wherein the specified layer is a layer newly added relative to a reference model or a layer updated relative to the reference model, the encoded data includes layer structure information representing the layer structure of the neural network and a new layer flag indicating whether each encoded specified layer is an update of a layer of the reference model or a new layer, and the encoded data is transmitted by the first data processing device to the second data processing device via a data transmission network; and the second data processing device has: a decoding unit that decodes the encoded data generated by the encoding unit; and a second data processing unit that uses the information decoded by the decoding unit to generate the neural network and performs data processing using the neural network.
8. The data processing system according to claim 7, characterized in that the encoding unit encodes information related to the layers of the neural network up to an intermediate layer, and the second data processing device performs data processing using data output from the intermediate layer of the neural network as feature quantities.
9. A data processing method, characterized in that the data processing method includes the following steps: a data processing unit learns a neural network which takes an image or sound as input; and an encoding unit generates encoded data in which model title information for identifying the model of the neural network, layer title information for identifying a specified layer of the neural network, and weight information of each edge belonging to the specified layer identified by the layer title information are encoded, wherein the specified layer is a layer newly added relative to a reference model or a layer updated relative to the reference model, the encoded data includes layer structure information representing the layer structure of the neural network and a new layer flag indicating whether each encoded specified layer is an update of a layer of the reference model or a new layer, and the encoded data is transmitted to the decoding side via a data transmission network.
Citation Information
Patent Citations
Image transmission method, device and system
CN107172428A
Neural network model processing method, device and terminal
CN109409518A