A data processing method, apparatus, and neural network processor chip
By concatenating feature vectors in edge computing devices and processing them using a streaming 2D convolutional network, the problem of high computational cost of 2D convolutional networks is solved, achieving efficient convolution processing suitable for edge computing devices.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SMARTER SILICON (SHANGHAI) TECH CO LTD
- Filing Date
- 2023-02-21
- Publication Date
- 2026-06-30
AI Technical Summary
The existing two-dimensional convolutional network structure requires a huge amount of computation, making it impossible to deploy neural networks on edge computing devices.
By concatenating the newly acquired feature vector with the historical vector sequence to form the input feature vector, and using the target convolution kernel for convolution processing, the amount of computation is reduced. A streaming 2D convolutional network is used for speech recognition or voice wake-up, and only the latest input data frame is calculated.
It reduces computational load, improves the efficiency and accuracy of convolution processing, is suitable for edge computing devices, and reduces the demand for computing resources.
Figure CN116306809B_ABST (abstract figure; image not reproduced)
Description
Technical Field
[0001] This application relates to the field of neural network technology, and includes, but is not limited to, a data processing method, an apparatus, and a neural network processor chip.
Background Technology
[0002] In related technologies, neural network structures often incorporate a large number of two-dimensional convolutional network structures. However, the computational cost of two-dimensional convolutional network structures is enormous, especially when these neural networks are applied to edge computing devices, where the huge computational requirements make it impossible to deploy the algorithm network.
Summary of the Invention
[0003] In view of this, embodiments of this application provide a data processing method, apparatus, and neural network processor chip.
[0004] In a first aspect, embodiments of this application provide a data processing method, the method comprising: acquiring a first feature vector corresponding to a first data frame; concatenating the first feature vector with a historical vector sequence containing multiple second feature vectors along the segmentation dimension of the first data frame to obtain an input feature vector; wherein each second feature vector corresponds to a second data frame, and the acquisition time of the second data frame is earlier than that of the first data frame; and performing convolution processing on the input feature vector using a target convolution kernel to obtain an output feature vector.
[0005] In a second aspect, embodiments of this application provide a data processing apparatus, comprising: a first acquisition module, configured to acquire a first feature vector corresponding to a first data frame; a concatenation module, configured to concatenate the first feature vector with a historical vector sequence containing multiple second feature vectors along the segmentation dimension of the first data frame to obtain an input feature vector; wherein each second feature vector corresponds to a second data frame, and the acquisition time of the second data frame is earlier than that of the first data frame; and a first processing module, configured to perform convolution processing on the input feature vector using a target convolution kernel to obtain an output feature vector.
[0006] In a third aspect, embodiments of this application provide a neural network processor chip, including multiple computing units and a processor; the processor is configured to acquire the size of the network structure parameters of a convolutional neural network, determine the number of computing units for each convolutional layer of the network structure based on the size of the network structure parameters, and allocate the corresponding number of computing units to each convolutional layer; wherein the network structure parameters include at least the number of convolutional layers; the computing units allocated to a convolutional layer are used to perform the convolution processing of that convolutional layer.
Attached Figure Description
[0007] Figure 1 is a flowchart illustrating a data processing method according to an embodiment of this application;
[0008] Figure 2 is a flowchart illustrating another data processing method according to an embodiment of this application;
[0009] Figure 3 is a schematic diagram illustrating a method for updating a historical vector sequence according to an embodiment of this application;
[0010] Figure 4 is a schematic diagram illustrating the training process of a convolutional neural network according to an embodiment of this application;
[0011] Figure 5 is a schematic diagram of a convolution processing procedure according to an embodiment of this application;
[0012] Figure 6 is a flowchart illustrating a convolution processing method according to an embodiment of this application;
[0013] Figure 7 is a schematic diagram of another convolution processing procedure according to an embodiment of this application;
[0014] Figure 8 is a flowchart illustrating another convolution processing method according to an embodiment of this application;
[0015] Figure 9 is a schematic diagram of the composition structure of a data processing device according to an embodiment of this application;
[0016] Figure 10 is a schematic diagram of the composition structure of a neural network processor chip according to an embodiment of this application;
[0017] Figure 11 is a schematic diagram of the hardware entity of an electronic device according to an embodiment of this application.
Detailed Implementation
[0018] The technical solution of this application will be further described in detail below with reference to the accompanying drawings and embodiments.
[0019] Figure 1 is a flowchart illustrating a data processing method according to an embodiment of this application. As shown in Figure 1, the method includes the following steps:
[0020] Step 102: Obtain the first feature vector corresponding to the first data frame;
[0021] Multiple first data frames can be arranged in a preset order, which can be chronological or another order. One or more first data frames can be acquired each time according to the preset order, and the number of frames acquired can be less than a preset number. When multiple first data frames are acquired at a time, feature processing can be performed on each first data frame to obtain the first feature vector corresponding to that frame. The first feature vector 21 corresponding to each first data frame can be represented as $[C_{in}, D_{in}, W_a]$, where $C_{in}$ represents the number of input channels, $D_{in}$ represents the number of input features (i.e., the number of rows) of the first data frame in the feature dimension, and $W_a$ represents the size of the first data frame in the segmentation dimension (i.e., the number of columns). As shown in Figure 2, the first feature vector 21 has $C_{in}=1$, $D_{in}=7$, $W_a=1$. It should be noted that, in the segmentation dimension, a data frame can consist of a single column of data or of multiple columns of data.
[0022] Step 104: Concatenate the first feature vector with a historical vector sequence containing multiple second feature vectors along the segmentation dimension of the first data frame to obtain the input feature vector;
[0023] Each of the second feature vectors corresponds to a second data frame, and the acquisition time of the second data frame is earlier than that of the first data frame.
[0024] The segmentation dimension can be the time dimension or another preset dimension. The historical vector sequence can be stored in a buffer, and the historical vector sequence 22 can be represented as $[C_{in}, D_{in}, W_b]$, where $C_{in}$ represents the number of input channels, $D_{in}$ represents the number of input features of the second data frame in the feature dimension, and $W_b$ represents the size of the second data frames in the segmentation dimension (i.e., the number of columns). As shown in Figure 2, the historical vector sequence 22 has $C_{in}=1$, $D_{in}=7$, $W_b=2$.
[0025] The first feature vector 21 and the historical vector sequence 22 are concatenated to obtain the input feature vector 23, which can be represented as $[C_{in}, D_{in}, W_c]$, where $C_{in}$ represents the number of input channels, $D_{in}$ represents the number of input features of the input feature vector in the feature dimension, and $W_c$ represents the number of columns of the input feature vector in the segmentation dimension. As shown in Figure 2, one column of data can correspond to one data frame, and the input feature vector 23 has $C_{in}=1$, $D_{in}=7$, $W_c=3$; that is, two second data frames and one first data frame are concatenated to form an input feature vector 23 containing three data frames.
[0026] Step 106: Perform convolution processing on the input feature vector using the target convolution kernel to obtain the output feature vector.
[0027] The target convolution kernel 24 can be represented as $[C_{out}, C_{in}, K_d, K_t]$, where $C_{out}$ represents the number of output channels, $C_{in}$ represents the number of input channels, $K_d$ represents the size of the target convolution kernel in the feature dimension, and $K_t$ represents the size of the target convolution kernel in the segmentation dimension. As shown in Figure 2, the target convolution kernel has $C_{out}=1$, $C_{in}=1$, $K_d=3$, $K_t=3$, and the convolution stride is $[2,1]$; that is, the convolution stride of the target convolution kernel is 2 in the feature dimension and 1 in the segmentation dimension. The output feature vector 25 can be represented as $[C_{out}, D_{out}, W_e]$, where $C_{out}$ represents the number of output channels, $D_{out}$ represents the number of output features of the output feature vector in the feature dimension, and $W_e$ represents the number of columns of the output feature vector in the segmentation dimension. As shown in Figure 2, the output feature vector 25 has $C_{out}=1$, $D_{out}=5$, $W_e=1$.
[0028] In some embodiments, the kernel size can be represented as [16, 8, 7, 3], that is, the number of output channels is 16, the number of input channels is 8, the size on the feature axis is 7, and the size on the time axis is 3; the size of a single target convolution kernel can be represented as [7, 3], that is, the size on the feature axis is 7 and the size on the time axis is 3; the number of first data frames is 1000, and the size of a single first data frame can be represented as [1, 8, 128, 1], that is, the number of output channels is 1, the number of input channels is 8, the size on the feature axis is 128, and the size on the time axis is 1; the number of convolutional layers in the convolutional neural network is 1. Tests show that with the convolution processing method of the related technologies, a single-core CPU takes an average of 643 clock cycles per frame, whereas with the convolution processing method of this embodiment it takes an average of 33.4 clock cycles per frame. The fewer clock cycles a convolution processing method consumes, the lower its complexity, so the convolution processing method of this embodiment can greatly reduce the amount of computation.
[0029] In this embodiment, the newly acquired first feature vector is concatenated with a historical vector sequence containing multiple second feature vectors to obtain an input feature vector, and the input feature vector is convolved to obtain an output feature vector. This avoids full data computation and only calculates the first feature vector corresponding to the latest input frame or multiple frames of the first data frame, reducing computational workload and improving convolution processing efficiency.
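Steps 102 to 106 can be sketched in a few lines of Python for the single-channel shapes of the Figure 2 example (D_in = 7, W_b = 2, W_a = 1, K_d = K_t = 3). This is a minimal illustration, not the implementation of this application: the function names `concat_columns` and `conv_feature_only`, the all-ones kernel, and the sample values are hypothetical, and the sketch assumes no padding and a feature-dimension stride of 1, which yields five outputs from seven input rows.

```python
def concat_columns(history, new_frame):
    """Concatenate along the segmentation (column) dimension.

    history:   D_in rows, each holding W_b values (the buffered frames)
    new_frame: D_in rows, each holding W_a values (the newest frame)
    """
    return [h_row + n_row for h_row, n_row in zip(history, new_frame)]


def conv_feature_only(x, kernel, stride=1):
    """Slide `kernel` along the feature (row) dimension only.

    x:      D_in x K_t input (list of rows)
    kernel: K_d x K_t weights
    Returns a list of D_out values, i.e. a single output column.
    """
    k_d, k_t = len(kernel), len(kernel[0])
    assert all(len(row) == k_t for row in x), "input width must equal K_t"
    out = []
    for top in range(0, len(x) - k_d + 1, stride):
        acc = 0.0
        for i in range(k_d):
            for j in range(k_t):
                acc += x[top + i][j] * kernel[i][j]
        out.append(acc)
    return out


# Hypothetical sample data with the Figure 2 shapes.
history = [[float(r), float(r + 1)] for r in range(7)]   # 7 x 2 (assumed values)
new_frame = [[float(r + 2)] for r in range(7)]           # 7 x 1 (assumed values)
kernel = [[1.0] * 3 for _ in range(3)]                   # 3 x 3, all ones (assumed)

x = concat_columns(history, new_frame)                   # 7 x 3 input feature vector
y = conv_feature_only(x, kernel)                         # one output column
```

Because the concatenated width equals K_t, the kernel has exactly one position along the segmentation dimension, so the loop over columns that a standard 2D convolution would need disappears.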
[0030] In some embodiments, the number of dimensions of the output feature vector is the same as the number of dimensions of the first data frame; and / or, the number of dimensions of the target convolutional kernel is greater than the number of dimensions of the first data frame.
[0031] In some embodiments, when the target convolution kernel is a three-dimensional vector, the first data frame can be a two-dimensional vector, that is, two-dimensional data is processed by a three-dimensional convolution kernel. When the target convolution kernel is a two-dimensional vector, the first data frame can be a one-dimensional vector, that is, one-dimensional data is processed by a two-dimensional convolution kernel.
[0032] As shown in Figure 2, the number of dimensions of the output feature vector 25 and of the first data frame are both 1, that is, the output feature vector 25 and the first feature vector are both one-dimensional vectors; the number of dimensions of the target convolution kernel is 2, that is, the target convolution kernel is a two-dimensional vector, which is greater than the number of dimensions of the first data frame.
[0033] As shown in Figure 2, the one-dimensional first feature vector 21 (feature dimension) and the two-dimensional historical vector sequence 22 (feature dimension and segmentation dimension) are concatenated to form the two-dimensional input feature vector 23 (feature dimension and segmentation dimension). The input feature vector 23 can be convolved by the two-dimensional target convolution kernel 24 (feature dimension and segmentation dimension). In this case, the target convolution kernel 24 is slid along the feature dimension of the input feature vector 23 to perform convolution processing, and the one-dimensional output feature vector 25 (feature dimension) is obtained.
[0034] Similarly, when the first feature vector is a two-dimensional vector (i.e., feature dimension and channel dimension) and the target convolution kernel is a three-dimensional convolution kernel, the two-dimensional first feature vector and the three-dimensional historical vector sequence (i.e., feature dimension, segmentation dimension and channel dimension) can be concatenated to form a three-dimensional input feature vector (i.e., feature dimension, segmentation dimension and channel dimension). The input feature vector can be convolved by the three-dimensional target convolution kernel (i.e., feature dimension, segmentation dimension and channel dimension). At this time, the target convolution kernel 24 is slid across the feature dimension and channel dimension of the input feature vector 23 to perform convolution processing, resulting in a two-dimensional output feature vector (i.e., feature dimension and channel dimension).
[0035] In this embodiment, the number of dimensions of the target convolution kernel is greater than the number of dimensions of the first data frame. Therefore, three-dimensional convolution can be used to process two-dimensional vectors, or two-dimensional convolution can be used to process one-dimensional vectors. Compared with the related technologies that use three-dimensional convolution to process three-dimensional vectors or two-dimensional convolution to process two-dimensional vectors, the computational workload can be reduced and the accuracy and efficiency of convolution processing can be improved.
[0036] In some embodiments, the method further includes:
[0037] Step 1011: Obtain the first time series signal to be identified;
[0038] The first time-series signal may include vibration signals, electrocardiogram signals, voice signals, video stream signals, etc.
[0039] Step 1012: Divide the first time series signal into multiple first data frames in the time dimension according to the time sequence;
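Steps 1011 and 1012 amount to cutting a time-ordered sample sequence into frames. The sketch below assumes consecutive, non-overlapping frames of a fixed length and drops any incomplete trailing frame; the text does not specify the frame length or whether frames overlap, so `split_into_frames` and `frame_len` are hypothetical, and the feature extraction that turns each frame into a first feature vector is omitted.

```python
def split_into_frames(signal, frame_len):
    """Split a 1-D sequence into consecutive, non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped
    (an assumption made for this sketch).
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


samples = list(range(10))               # stand-in for a sampled time-series signal
frames = split_into_frames(samples, 4)  # two full frames; samples 8 and 9 are dropped
```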
[0040] In some embodiments, the method further includes:
[0041] Step 1013: Determine the number of the second data frames by the difference between the size of the target convolutional kernel in the segmentation dimension and the number of the first data frames.
[0042] Here, the size of the target convolution kernel in the segmentation dimension is denoted $K_t$. The number of first data frames, i.e., the number of columns of the first feature vector corresponding to the first data frame in the segmentation dimension, is denoted $W_a$. The number of second data frames, i.e., the number of columns of the second data frames in the segmentation dimension, is denoted $W_b$. These satisfy $K_t = W_a + W_b$.
[0043] In this embodiment, the size of the feature map after buffer splicing is the same as the size of the target convolution kernel in the segmentation dimension. During the convolution process, the target convolution kernel does not need to be moved and calculated in the segmentation dimension, which can reduce the amount of computation and improve the efficiency of convolution operation.
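The relationship between the kernel width, the number of new columns, and the number of buffered columns can be checked with two small helpers. The function names are hypothetical, and `output_columns` assumes a stride of 1 and no padding in the segmentation dimension.

```python
def history_columns(k_t, w_a=1):
    """Return W_b = K_t - W_a, the number of buffered history columns."""
    if not 1 <= w_a <= k_t:
        raise ValueError("need 1 <= W_a <= K_t")
    return k_t - w_a


def output_columns(w_c, k_t):
    """Kernel positions along the segmentation dimension (stride 1, no padding)."""
    return w_c - k_t + 1


w_b = history_columns(k_t=3, w_a=1)        # columns kept in the buffer
w_e = output_columns(w_c=1 + w_b, k_t=3)   # columns produced per step
```

With K_t = 3 and one new column per step, two history columns are buffered and the kernel has a single position along the segmentation dimension, which is why no sliding is needed there.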
[0044] Correspondingly, step 102, "obtaining the first feature vector corresponding to the first data frame," includes:
[0045] Step 1021: Obtain the first feature vector corresponding to the first data frame for each frame;
[0046] In this process, the first data frame can be fed into the convolutional neural network one frame at a time, according to the timing of the first data frame.
[0047] Here, one first data frame is acquired each time, so $W_a=1$, and the number of second data frames is $W_b = K_t - W_a = K_t - 1$.
[0048] Step 106, "Convolutional processing of the input feature vector using the target convolution kernel to obtain the output feature vector," includes:
[0049] Step 1061: Perform convolution processing by sliding the target convolution kernel along the feature dimension of the input feature vector to obtain the output feature vector.
[0050] Since the sum of the number of columns in the segmentation dimension of the first data frame and the second data frame is equal to the size of the target convolution kernel in the segmentation dimension, the target convolution kernel can be used for convolution processing by simply sliding the target convolution kernel in the feature dimension.
[0051] In this embodiment, the convolution kernel only needs to slide and compute along the feature dimension of the input feature vector. It does not need to sweep from the left boundary to the right boundary of the two-dimensional input feature vector; only the newly added first data frame needs to be computed. Each output is therefore not a feature map but the output feature vector corresponding to the first data frame, which meets the requirements of time serialization, greatly reduces the amount of computation, and improves the efficiency of the convolution operation.
[0052] In some embodiments, the method further includes: acquiring a signal to be identified;
[0053] segmenting the signal to be identified along a set segmentation dimension to obtain multiple first data frames.
[0054] It should be noted that the segmentation dimension is not limited to the time dimension.
[0055] In this embodiment of the application, the signal to be identified may not be limited to time series signals, and the segmentation dimension may not be limited to the time dimension, thereby enabling convolution processing of signals other than time series signals, increasing the diversity of processed signals.
[0056] In some embodiments, the number of the first data frames is a first number, and the method further includes:
[0057] Step 107: Store the first number of the first feature vectors in the historical vector sequence;
[0058] Step 108: Remove the first number of second feature vectors that are earlier in the time sequence from the historical vector sequence.
[0059] As shown in Figure 3, the historical vector sequence before the update may include the second feature vectors 31 and 32. The first number can be 1, 3, 7, etc. The number of first feature vectors saved into the historical vector sequence should equal the number of second feature vectors removed from it. When the first number is 1, one first feature vector 33 is saved into the historical vector sequence, and the second feature vector 31, which is the earliest in the time sequence, is removed from the historical vector sequence to obtain the updated historical vector sequence. The updated historical vector sequence may include the second feature vector 32 and the first feature vector 33.
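The update in steps 107 and 108 behaves like a fixed-length first-in, first-out buffer. One way to sketch it, assuming Python's `collections.deque` with a `maxlen`, is shown below; the string labels stand in for feature vectors 31, 32, and 33 of Figure 3, and `update_history` is a hypothetical helper name.

```python
from collections import deque


def update_history(history, new_vectors):
    """Append the new first feature vectors; the same number of oldest
    entries is evicted from the left because the deque has a fixed maxlen."""
    for vec in new_vectors:
        history.append(vec)
    return history


# Labels stand in for the feature vectors of Figure 3 (illustrative only).
history = deque(["v31", "v32"], maxlen=2)
update_history(history, ["v33"])   # first number = 1
```

After the call, the buffer holds the entries labeled `v32` and `v33`, matching the updated historical vector sequence described for Figure 3.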
[0060] In some embodiments, the convolution processing is performed using a convolutional neural network model, wherein the training method of the convolutional neural network model includes:
[0061] Step 1051: Obtain the second time series signal for training;
[0062] Step 1052: Perform feature extraction on the second time series signal to obtain a first feature matrix corresponding to the second time series signal. The first feature matrix includes a third feature vector corresponding to each of the multiple time points in the segmentation dimension.
[0063] As shown in Figure 4, the segmentation dimension can be the time dimension. The first feature matrix 41 includes six third feature vectors 411 to 416 arranged in time sequence, with each third feature vector corresponding to a different time. In the first feature matrix, $C_{in}=1$, $D_{in}=7$, $W_a=6$.
[0064] Step 1053: The target convolutional kernel is slid across the feature dimension and segmentation dimension of the third feature vector to perform convolution processing in order to train the convolutional neural network.
[0065] In the training process of the convolutional neural network, the standard two-dimensional convolution method of the related technologies can still be used; that is, the number of dimensions of the first data frame and of the convolution kernel are the same: if the first data frame is one-dimensional, the convolution kernel is also one-dimensional, and if the first data frame is two-dimensional, the convolution kernel is also two-dimensional. By sliding the target convolution kernel 42 along the feature dimension and the segmentation dimension of the third feature vectors 411 to 416 to perform convolution processing, the output feature vector 43 can be obtained. As shown in Figure 4, the target convolution kernel 42 has $K_d=3$, $K_t=3$, and the output feature vector 43 has $C_{out}=1$, $D_{out}=5$, $W_e=4$.
[0066] In this embodiment, since the convolutional neural network still uses the standard two-dimensional convolution method for computation, it can improve both the accuracy of the training process and the efficiency of the application process.
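The reason a network trained with standard 2D convolution can be run in streaming form is that both compute the same output columns. The sketch below checks this for the Figure 4 shapes (a 7 x 6 feature matrix and a 3 x 3 kernel) under the assumptions of stride 1 and no padding; the function names, input values, and kernel weights are hypothetical, and the streaming loop here simply waits until K_t columns have arrived instead of starting from a pre-filled buffer.

```python
def conv2d_valid(x, kernel):
    """Standard 2-D convolution (cross-correlation), stride 1, no padding."""
    k_d, k_t = len(kernel), len(kernel[0])
    rows, cols = len(x), len(x[0])
    return [
        [
            sum(x[r + i][c + j] * kernel[i][j]
                for i in range(k_d) for j in range(k_t))
            for c in range(cols - k_t + 1)
        ]
        for r in range(rows - k_d + 1)
    ]


def stream_columns(x, kernel):
    """Feed `x` one column at a time and emit one output column per step."""
    k_d, k_t = len(kernel), len(kernel[0])
    rows, cols = len(x), len(x[0])
    window = []    # most recent columns, oldest first
    outputs = []
    for c in range(cols):
        window.append([x[r][c] for r in range(rows)])
        if len(window) > k_t:
            window.pop(0)              # discard the oldest column
        if len(window) == k_t:         # enough history to cover the kernel width
            outputs.append([
                sum(window[j][r + i] * kernel[i][j]
                    for i in range(k_d) for j in range(k_t))
                for r in range(rows - k_d + 1)
            ])
    return outputs                     # list of output columns


# Arbitrary integer test data (assumed values).
x = [[(r * 6 + c) % 5 for c in range(6)] for r in range(7)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

full = conv2d_valid(x, kernel)         # 5 rows x 4 columns
streamed = stream_columns(x, kernel)   # 4 columns, each with 5 values
same = streamed == [list(col) for col in zip(*full)]
```

Here `same` is `True`: the columns produced one frame at a time are exactly the columns of the full 2D output, so weights learned with the standard method carry over unchanged.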
[0067] In some embodiments, the method further includes:
[0068] Step 1054: Determine the number of computational units in each convolutional layer of the network structure based on the size of the network structure parameters of the convolutional neural network;
[0069] Step 1055: Assign a corresponding number of computation units to each of the convolutional layers; wherein the network structure parameters include at least the number of the convolutional layers;
[0070] Correspondingly, the "convolution processing through a convolutional neural network model" includes: using the corresponding number of computational units of the convolutional layer to perform the convolution processing on the corresponding convolutional layer.
[0071] In this embodiment, the number of computational units is determined based on the size of the network structure parameters of the convolutional neural network, thereby rationally allocating the computational resources of each convolutional layer, reducing resource waste, and improving computational efficiency.
[0072] In some embodiments, the network structure parameters further include at least one of the following: the number of convolutional kernels corresponding to each convolutional layer, the number of second data frames, and the number of convolutions in the feature dimension;
[0073] Step 1054, "determining the number of computational units in each convolutional layer of the network structure based on the size of the network structure parameters of the convolutional neural network," includes:
[0074] Step 10542: Determine the number of computational units in each convolutional layer of the network structure based on at least one of the number of convolutional layers, the number of convolutional kernels corresponding to each convolutional layer, the number of second data frames, and the number of convolutions in the feature dimension;
[0075] The method further includes: step 10541: determining the number of convolutions in the feature dimension of the corresponding convolutional layer based on the convolution stride of each convolutional layer in the feature dimension.
[0076] In this embodiment of the application, the number of operation units in each convolutional layer can be determined based on at least one of the number of convolutional layers, the number of convolutional kernels in each convolutional layer, the number of second data frames, and the number of convolutions in the feature dimension, thereby enabling a more accurate determination of the number of operation units in each convolutional layer.
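The text does not give a formula for turning the network structure parameters into a number of operation units, so the following is only an illustrative sketch. It computes the number of convolutions in the feature dimension from the stride (step 10541, assuming no padding) and then applies a purely hypothetical rule of one unit per pair of kernel and feature position; `ConvLayerSpec`, `feature_conv_count`, and `units_per_layer` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class ConvLayerSpec:
    d_in: int          # input size in the feature dimension
    k_d: int           # kernel size in the feature dimension
    stride_d: int      # stride in the feature dimension
    num_kernels: int   # number of convolution kernels in the layer


def feature_conv_count(layer):
    """Kernel positions along the feature dimension, assuming no padding."""
    return (layer.d_in - layer.k_d) // layer.stride_d + 1


def units_per_layer(layers):
    """Hypothetical rule: one unit per (kernel, feature position) pair."""
    return [layer.num_kernels * feature_conv_count(layer) for layer in layers]


# Two made-up layers for illustration.
layers = [
    ConvLayerSpec(d_in=7, k_d=3, stride_d=1, num_kernels=1),
    ConvLayerSpec(d_in=128, k_d=7, stride_d=2, num_kernels=16),
]
counts = [feature_conv_count(layer) for layer in layers]
units = units_per_layer(layers)
```

A real allocation would also have to respect the number of units physically available on the chip and could weight the number of second data frames; neither is modeled here.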
[0077] In related technologies, the neural network structure of intelligent speech typically incorporates numerous 2D convolutional network structures. For example, 2D convolutions are extensively used in speech recognition and voice wake-up. However, 2D convolutions involve enormous computational demands, especially when these neural networks are applied to edge computing devices, where the massive computational requirements can render the algorithm network unusable.
[0078] In this embodiment, speech recognition or voice wake-up can be performed using a streaming 2D convolutional network. Throughout the process, data frames are fed into the streaming convolutional network one at a time in chronological order. The size of the convolution kernel can be denoted $[C_{out}, C_{in}, K_d, K_t]$, where $C_{out}$ represents the number of output channels, $C_{in}$ represents the number of input channels, $K_d$ represents the size of the convolution kernel along the feature axis (i.e., the feature dimension), and $K_t$ represents the size of the convolution kernel along the time axis (i.e., the time dimension). A buffer of shape $[C_{in}, D_{in}, K_t-1]$ can be initialized. As shown in Figure 2, $K_d=3$, $K_t=3$, the convolution stride is $[2,1]$, and the buffer size is $[C_{in}, D_{in}, 2]$. The convolution processing flow in the embodiments of this application may include the following steps:
[0079] Step S201: Input a data frame, the shape of which is denoted $[C_{in}, D_{in}, 1]$;
[0080] The input data frame can also be referred to as the first data frame.
[0081] Step S202: Concatenate the input data frame with the buffer of shape $[C_{in}, D_{in}, 2]$ on the time axis (the last dimension) to form an input feature vector of shape $[C_{in}, D_{in}, 3]$, ensuring that the feature map satisfies the receptive field of the convolution on the time axis (there is enough data for computation);
[0082] The buffer may contain two second data frames.
[0083] Step S203: Perform two-dimensional convolution calculation to obtain the output features of shape $[C_{out}, D_{out}, 1]$;
[0084] Step S204: Update the input feature buffer;
[0085] Here, the shape of the buffer after adding the first data frame is $[C_{in}, D_{in}, 3]$; the first frame of feature data (the second data frame that is earliest in the time sequence) is removed from the buffer, and the remaining frames are retained.
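Steps S201 to S204 can be gathered into one stateful layer object. The class below is a minimal multi-channel sketch, not the implementation of this application: `StreamingConv2D` is a hypothetical name, the buffer is assumed to start zero-filled, and padding and bias are left out. Weights follow the [C_out, C_in, K_d, K_t] layout, and each frame is passed as [C_in][D_in] with the trailing time axis of size 1 dropped.

```python
class StreamingConv2D:
    """One streaming 2-D convolution layer (no padding, no bias)."""

    def __init__(self, weights, d_in, stride_d=1):
        # weights[o][c][i][j] has shape [C_out, C_in, K_d, K_t]
        self.w = weights
        self.c_in = len(weights[0])
        self.k_d = len(weights[0][0])
        self.k_t = len(weights[0][0][0])
        self.d_in = d_in
        self.stride_d = stride_d
        # Buffer of K_t - 1 past frames, zero-filled at start (assumption).
        self.buffer = [self._zero_frame() for _ in range(self.k_t - 1)]

    def _zero_frame(self):
        return [[0.0] * self.d_in for _ in range(self.c_in)]

    def step(self, frame):
        """S201: accept one new frame, given as frame[c][d]."""
        window = self.buffer + [frame]        # S202: K_t frames, oldest first
        out = []
        for w_o in self.w:                    # S203: one column per output channel
            col = []
            for top in range(0, self.d_in - self.k_d + 1, self.stride_d):
                acc = 0.0
                for c in range(self.c_in):
                    for i in range(self.k_d):
                        for j in range(self.k_t):
                            acc += window[j][c][top + i] * w_o[c][i][j]
                col.append(acc)
            out.append(col)
        self.buffer = window[1:]              # S204: drop the oldest frame
        return out                            # shape [C_out][D_out]


# One 3 x 3 all-ones kernel, one input channel, D_in = 7 (assumed values).
weights = [[[[1.0] * 3 for _ in range(3)]]]
layer = StreamingConv2D(weights, d_in=7)
frame = [[1.0] * 7]
outs = [layer.step(frame) for _ in range(3)]
```

With all-ones frames and weights, the three successive outputs grow as the zero-filled buffer is replaced by real data: every entry is 3.0 on the first step, 6.0 on the second, and 9.0 on the third.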
[0086] In this embodiment of the application, the above steps can avoid the full data calculation of 2D convolution, and only the latest input frame (i.e. the first data frame) and its related features (i.e. the first feature vector) are calculated. Moreover, the longer the horizontal dimension (i.e. the feature dimension), the more computing power is saved.
[0087] In the training process, this embodiment still uses the 2D convolution method in related technologies. The above steps S201 to S204 are only applicable to the inference part (i.e. the application processing of the convolutional neural network). This can improve the correctness and accuracy of the training process, and at the same time greatly improve the efficiency of inference operations.
[0088] This application embodiment takes into account the characteristic that speech is a time-series signal and divides the speech signal into frames; in the related technology, 2D convolution needs to perform shift operations in the horizontal and vertical dimensions (i.e., the time dimension) of the data each time it is operated, while this application embodiment only needs to perform shift operations in the horizontal dimension.
[0089] In this embodiment of the application, when performing calculations in the horizontal dimension, it is not necessary to calculate from the left boundary to the right boundary of the 2D data; it is only necessary to calculate the newly added frame of data.
[0090] In this embodiment of the application, the network does not output a 2D feature map each time, but instead outputs a feature vector corresponding to the data frame, which meets the requirements of time serialization.
[0091] In this embodiment, instead of calculating an entire 2D feature in each operation, a new frame unit is added, which greatly reduces the amount of computation. The benefit is even greater when the 2D convolutional network is larger.
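The size of the saving can be estimated by counting multiply-accumulate operations (MACs). The sketch below is illustrative arithmetic only and does not reproduce any measured figure: it assumes stride 1 and no padding, reuses the kernel shape [16, 8, 7, 3] and feature size 128 from the earlier example, and compares one streaming step against recomputing a hypothetical 20-column window on every new frame.

```python
def macs_full(c_out, c_in, k_d, k_t, d_out, w_out):
    """MACs for a 2-D convolution producing a d_out x w_out map per output channel."""
    return c_out * c_in * k_d * k_t * d_out * w_out


def macs_streaming(c_out, c_in, k_d, k_t, d_out):
    """MACs for one streaming step, which produces a single output column."""
    return macs_full(c_out, c_in, k_d, k_t, d_out, 1)


d_out = 128 - 7 + 1      # feature positions, assuming stride 1 and no padding
window = 20              # hypothetical number of columns recomputed per frame
w_out = window - 3 + 1   # output columns for that window

per_step = macs_streaming(16, 8, 7, 3, d_out)
per_window = macs_full(16, 8, 7, 3, d_out, w_out)
ratio = per_window // per_step
```

Under these assumptions the ratio equals the number of output columns in the recomputed window, so the saving grows linearly with the window length.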
[0092] Compared with the 1D convolutional networks in related technologies, although the input vector in this embodiment is also a one-dimensional vector each time, 1D convolution itself is different from 2D convolution in terms of dimension. The scope of this embodiment is for streaming processing of 2D convolution.
[0093] In related technologies, 2D convolutional networks are typically used to compute 2D feature maps. This involves a convolution process in which the target convolution kernel slides along both the segmentation dimension and the feature dimension of the input feature vector, essentially performing a computation that covers the full data to obtain the output feature vector. In this embodiment, by contrast, a new one-dimensional vector is fed into a streaming 2D convolutional network each time. A 2D convolutional network is still used, but full data coverage is not required. This embodiment divides the vertical dimension into a corresponding number of parallel convolution queues based on the number of vertical convolutions. Each convolution queue buffers data covering the horizontal dimension of the convolution kernel. When a new frame of data is received, the entire 2D convolutional network is updated: the oldest frame is discarded on the left side of the buffer, and the new frame is added on the right side. Each convolution queue then moves to the right once and performs one convolution, and the results of all convolution queues are output to form a one-dimensional convolution vector. The overall convolution process is shown in Figure 5: after a new first data frame 51 is received, the earliest frame is discarded at the far left of the buffer 52, and the new first data frame 51 is added on the right. The convolution queues in the buffer move to the right and perform one convolution to form the output feature vector 54.
[0094] Finally, the output one-dimensional vector (i.e., the output feature vector) can be decoded to obtain the current output frame; or it can be passed into the next layer of the convolutional neural network. In this embodiment, the above convolution processing method can avoid the full data computation of 2D convolution, and the longer the horizontal dimension, the more computational power is saved.
[0095] Figure 6 is a flowchart of a convolution processing method according to an embodiment of this application. As shown in Figure 6, the method includes the following steps:
[0096] Step 601: Initialize streaming 2D convolution;
[0097] Here, the kernel size of the 2D convolution and the number of vertical convolution queues can be initialized;
[0098] Step 602: Feed the new frame of data into the streaming 2D convolutional network;
[0099] Step 603: Update the entire 2D convolutional network;
[0100] Step 604: Each vertical convolution queue moves to the right, and the convolution is calculated once;
[0101] Step 605: Assemble the outputs of each convolution queue into a one-dimensional output vector;
[0102] Step 606: Decode the one-dimensional output vector to obtain the predicted frame (i.e., the output frame).
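Steps 601 to 606 can be sketched as a small class. This is an illustration under stated assumptions rather than the implementation of this application: the class name `StreamingConv2D`, the zero-filled initial history, and the argmax `decode` placeholder are all hypothetical, and the real decoder depends on the task (for example speech recognition or voice wake-up).

```python
from collections import deque


class StreamingConv2D:
    """One streaming 2D convolution layer that keeps only kernel-width frames."""

    def __init__(self, kernel, num_features, stride_v=1):
        # Step 601: initialize the kernel size and the number of vertical queues.
        self.kernel = kernel
        self.kw, self.kh = len(kernel), len(kernel[0])
        self.stride_v = stride_v
        self.num_queues = (num_features - self.kh) // stride_v + 1
        # Assumption: the history starts as all-zero frames.
        self.buffer = deque(([0] * num_features for _ in range(self.kw)),
                            maxlen=self.kw)

    def push(self, frame):
        # Steps 602-603: append the new frame on the right; because of
        # maxlen, the oldest frame on the left is dropped automatically.
        self.buffer.append(list(frame))
        # Steps 604-605: each vertical queue convolves once and the results
        # are assembled into a one-dimensional output vector.
        return [self._queue_output(q) for q in range(self.num_queues)]

    def _queue_output(self, q):
        top = q * self.stride_v
        return sum(self.kernel[i][j] * self.buffer[i][top + j]
                   for i in range(self.kw) for j in range(self.kh))


def decode(vector):
    # Step 606 placeholder (assumption): index of the largest response.
    return max(range(len(vector)), key=vector.__getitem__)


layer = StreamingConv2D(kernel=[[1, 1], [1, 1]], num_features=4)
out1 = layer.push([1, 2, 3, 4])
out2 = layer.push([0, 1, 0, 1])
predicted = decode(out2)
```

A bounded `deque` is used here because appending to a full one discards the leftmost element, which matches the "discard oldest, add newest" buffer update described for Figure 5.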
[0103] The process of performing convolution operations via NPU in this embodiment of the application may include the following steps:
[0104] Step S301: First, inform the NPU of the network structure of the streaming 2D convolutional network, i.e. how many streaming 2D convolutional layers there are, and determine the input and output parameters of each layer;
[0105] As shown in Figure 7, the network structure 71 of the 2D convolutional network needs to be provided to the NPU.
[0106] Step S302: Notify the NPU of the number of convolution kernels for each 2D convolution layer, and pass the weight parameters of each convolution kernel to the NPU, so that the NPU can request the corresponding number of computing units and initialize according to the passed weight parameters of each convolution kernel;
[0107] Step S303: Notify each computing unit of the NPU of the number of vertical convolutions in each convolutional layer, and the computing unit of the NPU performs parallel data loading for each convolutional layer;
[0108] Specifically, as shown in Figure 7, the number of vertical convolutions 73 of the convolutional layer 72 can be communicated to each computing unit of the NPU.
[0109] Step S304: After feature processing of a frame of data, a one-dimensional vector of the frame of data is obtained. After quantization of the one-dimensional vector, it is passed to the NPU.
[0110] As shown in Figure 7, a data frame 741 can be selected from the data frame sequence 74 in chronological order, and a one-dimensional vector can be obtained by feature processing of the data frame. The one-dimensional vector (i.e., the first feature vector) 741 is then quantized and passed to the NPU.
[0111] Step S305: After the NPU receives a new frame of data, for example the Nth frame, it first feeds the Nth frame into the first layer of the streaming 2D convolutional network, which outputs one-dimensional data to the second layer. At the same time, the second layer outputs the data of the (N-1)th frame to the third layer of the streaming 2D convolutional network, and so on.
[0112] In this way, the data 75 of the (N-M)th frame can be output from the Mth convolutional layer.
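Step S304 above quantizes the one-dimensional vector before it is passed to the NPU, but the quantization scheme is not specified. The sketch below assumes symmetric 8-bit quantization with a single per-vector scale, which is one common choice; the function names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(vector):
    """Symmetric per-vector quantization to the integer range [-127, 127].

    Assumption: one scale per frame vector; the actual scheme used on the
    NPU is not stated in this application.
    """
    peak = max((abs(v) for v in vector), default=0.0)
    scale = peak / 127.0 if peak > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in vector]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate real values."""
    return [q * scale for q in quantized]


features = [0.5, -1.0, 0.25, 0.0]
q, s = quantize_int8(features)
restored = dequantize(q, s)
```

With this scheme the round-trip error of each element is at most half of the scale, so the scale directly bounds the precision lost before the convolution.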
[0113] This approach differs significantly from the NPU computing method in related technologies, which devotes as many computing units as possible to one layer at a time. Because the computational load is uneven across layers (some layers are heavy while others are light), that method wastes a significant amount of computing resources. The embodiments of this application instead keep the computing units of all layers working at the same time, which maximizes the utilization of NPU computing units and improves computational efficiency.
[0114] Two points should be noted. First, the 2D convolution operators available on NPUs in related technologies cannot support the streaming 2D convolution operation of this embodiment, or at least cannot run it efficiently. Second, 2D convolution on NPUs in related technologies is a full data coverage operation, whereas this embodiment only needs to compute the part corresponding to the latest frame; because speech is a streaming sequence, the NPU operation is likewise designed to be streaming.
[0115] In this embodiment of the application, the parallel convolution kernel operation of the NPU (Neural-network Processing Unit, embedded neural network processor) can be used;
[0116] First, a number of NPU computing units corresponding to the number of convolution kernels used in each layer of the 2D convolutional network is allocated. Within each computing unit, the number of vertical convolutions of the 2D convolutional network is derived from the vertical convolution stride. All the data is then packaged according to the number of convolution kernels and the number of vertical convolutions within each convolution kernel and sent to the NPU. The NPU performs the calculations according to the previously agreed computing rules and can quickly obtain the results.
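The bookkeeping in this paragraph can be sketched as follows. The sketch assumes no padding in the feature dimension and one computing unit per convolution kernel, as the paragraph suggests; the names `LayerSpec` and `plan_units` and the example layer sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LayerSpec:
    num_kernels: int      # convolution kernels in this layer
    kernel_h: int         # kernel size along the feature (vertical) dimension
    num_features: int     # length of the layer's input vector
    stride_v: int = 1     # vertical convolution stride

    @property
    def vertical_convs(self):
        # Assumption: unpadded convolution along the feature dimension.
        return (self.num_features - self.kernel_h) // self.stride_v + 1


def plan_units(layers):
    """One computing unit per kernel; each unit runs its vertical
    convolutions in parallel."""
    return [
        {
            "units": spec.num_kernels,
            "vertical_convs": spec.vertical_convs,
            "outputs_per_frame": spec.num_kernels * spec.vertical_convs,
        }
        for spec in layers
    ]


plan = plan_units([
    LayerSpec(num_kernels=8, kernel_h=3, num_features=40, stride_v=1),
    LayerSpec(num_kernels=16, kernel_h=3, num_features=38, stride_v=2),
])
```

A larger vertical stride reduces the number of vertical convolutions per unit, so the stride directly controls how much parallel work each computing unit receives.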
[0117] Figure 8 is a flowchart of a convolution processing method according to an embodiment of this application. As shown in Figure 8, the method includes the following steps:
[0118] Step 801: Input a new frame of data;
[0119] Step 802: Inform the NPU of the network structure and establish the NPU pipeline;
[0120] Step 803: Inform the NPU of the network structure parameters for each convolutional layer, including the number of convolutional kernels and the number of vertical convolutions;
[0121] Step 804: NPU completes network construction;
[0122] Step 805: Feed the data frame into the convolutional layer;
[0123] As shown in Figure 7, the new Nth frame is fed into the first convolutional layer, the data of the (N-1)th frame is output to the second convolutional layer, the data of the (N-2)th frame is output from the second convolutional layer to the third convolutional layer, and so on.
[0124] Step 806: Allocate computing units for each convolutional layer;
[0125] In each layer of the streaming 2D network, computing units are allocated to the convolution kernels and the computation within each convolution kernel is performed in parallel, as shown in Figure 7.
[0126] Step 807: Decode the output feature vector;
[0127] Specifically, the data of the (N-M)th frame can be output from the Mth convolutional layer, and the output results can be assembled and decoded to obtain the current predicted frame.
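The layer-to-layer hand-off described for step 805 can be sketched as a simple pipeline in which every layer works during the same tick, each on a different frame. The timing convention is an assumption chosen for illustration: each layer reads what the previous layer produced in the previous tick, so the result leaving an M-layer pipeline lags the newest input by M frames. The class name `PipelinedLayers` is hypothetical, and the layers are stateless functions here for brevity, whereas each real layer would be a streaming 2D convolution with its own buffer.

```python
class PipelinedLayers:
    """All layers compute in the same tick, each on a different frame."""

    def __init__(self, layer_fns):
        self.layer_fns = layer_fns
        # Output register of each layer: (frame_id, data) or None if empty.
        self.regs = [None] * len(layer_fns)

    def tick(self, frame_id, frame):
        # Layer 1 takes the new frame; layer k takes what layer k-1
        # produced in the previous tick.
        inputs = [(frame_id, frame)] + self.regs[:-1]
        # Result completed by the last layer during the previous tick.
        emitted = self.regs[-1]
        self.regs = [
            None if item is None else (item[0], fn(item[1]))
            for fn, item in zip(self.layer_fns, inputs)
        ]
        return emitted


# Three placeholder layers that each add 1 to every element.
pipeline = PipelinedLayers([lambda v: [x + 1 for x in v]] * 3)
results = [pipeline.tick(n, [10 * n]) for n in range(1, 6)]
```

In this sketch no layer sits idle once the pipeline has filled, which is the property the description relies on to keep the computing units busy.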
[0128] It should be noted that, in the embodiments of this application, if the above-described data processing method is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of this application, or the part that contributes to the related technology, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause an electronic device (which may be a mobile phone, tablet computer, desktop computer, personal digital assistant, navigator, digital phone, video phone, television, sensing device, etc.) to execute all or part of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, mobile hard drives, read-only memory (ROM), magnetic disks, or optical disks. Thus, the embodiments of this application are not limited to any specific hardware and software combination.
[0129] Figure 9 is a schematic diagram of the composition structure of a data processing device according to an embodiment of this application. As shown in Figure 9, the device 900 includes: a first acquisition module 901, a concatenation module 902, and a first processing module 903, wherein:
[0130] The first acquisition module 901 is used to acquire the first feature vector corresponding to the first data frame;
[0131] The concatenation module 902 is used to concatenate the first feature vector with a historical vector sequence containing multiple second feature vectors along the segmentation dimension of the first data frame to obtain an input feature vector; wherein each second feature vector corresponds to a second data frame, and the acquisition time of the second data frame is earlier than that of the first data frame;
[0132] The first processing module 903 is used to perform convolution processing on the input feature vector using the target convolution kernel to obtain the output feature vector.
[0133] In some embodiments, the number of dimensions of the output feature vector is the same as the number of dimensions of the first data frame; and / or, the number of dimensions of the target convolutional kernel is greater than the number of dimensions of the first data frame.
[0134] In some embodiments, the apparatus further includes: a second acquisition module, configured to acquire a first time-series signal to be identified; a segmentation module, configured to segment the first time-series signal according to time sequence in the time dimension to obtain multiple first data frames; the first acquisition module 901 includes: a first acquisition submodule, configured to acquire a first feature vector corresponding to one first data frame at a time; the first processing module includes: a first processing submodule, configured to perform convolution processing by sliding the target convolution kernel along the feature dimension of the input feature vector to obtain an output feature vector.
[0135] In some embodiments, the apparatus further includes: a first determining module, configured to determine the difference between the size of the target convolutional kernel in the segmentation dimension and the number of the first data frames as the number of the second data frames.
[0136] In some embodiments, the number of the first data frames is a first quantity, and the apparatus further includes: a storage module for storing the first quantity of the first feature vectors in a historical vector sequence; and a removal module for removing the first quantity of the second feature vectors that are earlier in the time sequence from the historical vector sequence.
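The relationship between the kernel size, the number of new frames, and the history maintained by the storage and removal modules can be sketched as below. The function names (`history_length`, `build_input`, `update_history`) and the toy one-element vectors are hypothetical.

```python
def history_length(kernel_w, num_new):
    """Number of second (historical) data frames: the kernel size in the
    segmentation dimension minus the number of first (new) data frames."""
    if num_new > kernel_w:
        raise ValueError("more new frames than the kernel spans")
    return kernel_w - num_new


def build_input(history, new_vectors):
    """Concatenate history and new vectors along the segmentation dimension."""
    return history + new_vectors


def update_history(history, new_vectors):
    """Store the new vectors and remove the same number of oldest ones,
    so the history keeps a constant length."""
    return (history + new_vectors)[len(new_vectors):]


history = [[1], [2], [3]]          # three historical feature vectors
new = [[4]]                        # one newly acquired feature vector
assert history_length(kernel_w=4, num_new=1) == len(history)

input_vectors = build_input(history, new)
history = update_history(history, new)
```

Keeping the history at exactly `kernel_w - num_new` vectors means the concatenated input always matches the kernel's extent in the segmentation dimension, so the kernel only needs to slide along the feature dimension.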
[0137] In some embodiments, the apparatus further includes: a second processing module for performing the convolution processing using a convolutional neural network model; a third acquisition module for acquiring a second time-series signal for training; an extraction module for extracting features from the second time-series signal to obtain a first feature matrix corresponding to the second time-series signal, wherein the first feature matrix includes a third feature vector corresponding to each of the multiple time points in the segmentation dimension; and a third processing module for performing convolution processing by sliding the target convolutional kernel across the feature dimension and segmentation dimension of the third feature vector to train the convolutional neural network.
[0138] In some embodiments, the apparatus further includes: a second determining module, configured to determine the number of computational units in each convolutional layer of the network structure based on the size of the network structure parameters of the convolutional neural network; an allocation module, configured to allocate a corresponding number of computational units to each convolutional layer; wherein the network structure parameters include at least the number of convolutional layers; and a second processing module, comprising: a second processing submodule, configured to perform the convolution processing on the corresponding convolutional layer using the corresponding number of computational units of the convolutional layer.
[0139] In some embodiments, the network structure parameters further include at least one of the number of convolutional kernels corresponding to each convolutional layer, the number of second data frames, and the number of convolutions in the feature dimension; the second determining module includes: a determining submodule, configured to determine the number of computational units in each convolutional layer of the network structure based on at least one of the number of convolutional layers, the number of convolutional kernels corresponding to each convolutional layer, the number of second data frames, and the number of convolutions in the feature dimension; the apparatus further includes: a third determining module, configured to determine the number of convolutions in the feature dimension of the corresponding convolutional layer based on the convolution stride of each convolutional layer in the feature dimension.
[0140] The descriptions of the above device embodiments are similar to those of the above method embodiments, and have similar beneficial effects. For technical details not disclosed in the device embodiments of this application, please refer to the descriptions of the method embodiments of this application for understanding.
[0141] Correspondingly, embodiments of this application provide a neural network processor chip. Figure 10 is a schematic diagram of a hardware entity of a neural network processor chip according to an embodiment of this application. As shown in Figure 10, the hardware entity of the chip 1000 includes: multiple computation units 1001 and a processor 1002;
[0142] The processor 1002 is configured to obtain the size of the network structure parameters of the convolutional neural network, determine the number of computation units in each convolutional layer of the network structure based on the size of the network structure parameters, and allocate a corresponding number of computation units 1001 to each convolutional layer; wherein the network structure parameters include at least the number of convolutional layers.
[0143] The corresponding number of computation units 1001 of each convolutional layer are used to perform the convolution processing for that convolutional layer.
[0144] Correspondingly, embodiments of this application provide an electronic device. Figure 11 is a schematic diagram of a hardware entity of an electronic device according to an embodiment of this application. As shown in Figure 11, the hardware entity of the device 1100 includes a memory 1101 and a processor 1102. The memory 1101 stores a computer program that can run on the processor 1102. When the processor 1102 executes the program, it implements the steps in the data processing method in the above embodiments.
[0145] The memory 1101 is configured to store instructions and applications executable by the processor 1102, and can also cache data to be processed or already processed by the processor 1102 and the various modules in the device 1100 (e.g., image data, audio data, voice communication data and video communication data), which can be implemented by flash memory or random access memory (RAM).
[0146] Correspondingly, embodiments of this application provide a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps in the data processing method provided in the above embodiments.
[0147] It should be noted that the descriptions of the storage medium and device embodiments above are similar to those of the method embodiments above, and have similar beneficial effects as the device embodiments. For technical details not disclosed in the storage medium and method embodiments of this application, please refer to the descriptions of the device embodiments of this application for understanding.
[0148] It should be understood that the phrase "one embodiment" or "an embodiment" throughout the specification means that a specific feature, structure, or characteristic related to the embodiment is included in at least one embodiment of this application. Therefore, "in one embodiment" or "in an embodiment" appearing throughout the specification does not necessarily refer to the same embodiment. Furthermore, these specific features, structures, or characteristics can be combined in any suitable manner in one or more embodiments. It should be understood that in the various embodiments of this application, the sequence numbers of the above-described processes do not imply a sequential order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application. The sequence numbers of the above-described embodiments are merely descriptive and do not represent the superiority or inferiority of the embodiments.
[0149] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.
[0150] In the several embodiments provided in this application, it should be understood that the disclosed devices and methods can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of units is only a logical functional division, and in actual implementation, there may be other division methods, such as: multiple units or components can be combined, or integrated into another system, or some features can be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the various components shown or discussed can be through some interfaces, and the indirect coupling or communication connection between devices or units can be electrical, mechanical, or other forms.
[0151] The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units; some or all of the units may be selected to achieve the purpose of this embodiment according to actual needs. Furthermore, the functional units in the embodiments of this application may all be integrated into one processing unit, or each unit may be a separate unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in hardware or in a combination of hardware and software functional units.
[0152] Those skilled in the art will understand that all or part of the steps of the above method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When the program is executed, it performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as mobile storage devices, read-only memory (ROM), magnetic disks, or optical disks. Alternatively, if the integrated units of this application are implemented as software functional modules and sold or used as independent products, they can also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application, or the parts that contribute to the related technology, can be embodied in the form of software products. These computer software products are stored in a storage medium and include several instructions to cause computer devices (which may be mobile phones, tablets, desktops, personal digital assistants, navigators, digital phones, video phones, televisions, sensing devices, etc.) to execute all or part of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as mobile storage devices, ROMs, magnetic disks, or optical disks.
[0153] The methods disclosed in the several method embodiments provided in this application can be arbitrarily combined to obtain new method embodiments without conflict. The features disclosed in the several product embodiments provided in this application can be arbitrarily combined to obtain new product embodiments without conflict. The features disclosed in the several method or device embodiments provided in this application can be arbitrarily combined to obtain new method embodiments or device embodiments without conflict.
[0154] The above description is merely an embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A data processing method, the method comprising: obtaining a first feature vector corresponding to a first data frame; concatenating the first feature vector with a historical vector sequence containing a set number of second feature vectors along the segmentation dimension of the first data frame to obtain an input feature vector, wherein each of the second feature vectors corresponds to a second data frame, and the acquisition time of the second data frame is earlier than that of the first data frame; and sliding a target convolution kernel along the feature dimension of the input feature vector to perform convolution processing, resulting in an output feature vector with the same number of dimensions as the first data frame.
2. The method according to claim 1, wherein the number of dimensions of the target convolution kernel is greater than the number of dimensions of the first data frame.
3. The method according to claim 1, wherein the method further includes: acquiring a first time-series signal to be identified; and dividing the first time-series signal in the time dimension, in time order, into multiple first data frames; and wherein the step of obtaining the first feature vector corresponding to the first data frame includes: obtaining the first feature vector corresponding to one first data frame at a time.
4. The method according to claim 3, wherein the method further includes: determining the difference between the size of the target convolution kernel in the segmentation dimension and the number of the first data frames as the number of the second data frames.
5. The method according to claim 1, wherein the number of the first data frames is a first quantity, and the method further includes: storing the first quantity of the first feature vectors in the historical vector sequence; and removing, from the historical vector sequence, the first quantity of the second feature vectors that are earliest in the time sequence.
6. The method according to any one of claims 1 to 5, wherein the convolution processing is performed using a convolutional neural network model, and the training method for the convolutional neural network model includes: acquiring a second time-series signal for training; performing feature extraction on the second time-series signal to obtain a first feature matrix corresponding to the second time-series signal, wherein the first feature matrix includes a third feature vector corresponding to each of multiple time points in the segmentation dimension; and sliding the target convolution kernel across the feature dimension and the segmentation dimension of the third feature vector to perform convolution processing in order to train the convolutional neural network.
7. The method according to claim 6, wherein the method further includes: determining the number of computation units in each convolutional layer of the network structure based on the size of the network structure parameters of the convolutional neural network; and allocating a corresponding number of computation units to each convolutional layer, wherein the network structure parameters include at least the number of convolutional layers; and wherein performing the convolution processing via the convolutional neural network model includes: using the corresponding number of computation units of each convolutional layer to perform the convolution processing for that convolutional layer.
8. The method according to claim 7, wherein the network structure parameters further include at least one of the following: the number of convolution kernels corresponding to each convolutional layer, the number of second data frames, and the number of convolutions in the feature dimension; the step of determining the number of computation units in each convolutional layer of the network structure based on the size of the network structure parameters of the convolutional neural network includes: determining the number of computation units in each convolutional layer of the network structure based on at least one of the following: the number of convolutional layers, the number of convolution kernels corresponding to each convolutional layer, the number of second data frames, and the number of convolutions in the feature dimension; and the method further includes: determining the number of convolutions in the feature dimension of the corresponding convolutional layer based on the convolution stride of each convolutional layer in the feature dimension.
9. A data processing apparatus, the apparatus comprising: a first acquisition module, used to acquire a first feature vector corresponding to a first data frame; a concatenation module, used to concatenate the first feature vector with a historical vector sequence containing a set number of second feature vectors along the segmentation dimension of the first data frame to obtain an input feature vector, wherein each second feature vector corresponds to a second data frame, and the acquisition time of the second data frame is earlier than that of the first data frame; and a first processing module, used to perform convolution processing by sliding a target convolution kernel along the feature dimension of the input feature vector to obtain an output feature vector with the same number of dimensions as the first data frame.
10. A neural network processor chip, the chip comprising: multiple computation units and a processor; wherein the processor is configured to acquire the size of the network structure parameters of a convolutional neural network, determine the number of computation units in each convolutional layer of the network structure based on the size of the network structure parameters, and allocate a corresponding number of computation units to each convolutional layer, the network structure parameters including at least the number of convolutional layers; and wherein the corresponding number of computation units of each convolutional layer are used to perform convolution processing for that convolutional layer, the convolution processing being applied in the data processing method of any one of claims 1-8.