# Neural network design and optimization method based on software and hardware joint learning

## A neural network design and optimization technology, applied in the field of neural network architecture search, that addresses the problems of excessive parameters, low efficiency, and difficult manual design, achieving a balance of precision and speed with high accuracy and throughput

Pending Publication Date: 2022-01-07

UNIV OF ELECTRONIC SCI & TECH OF CHINA


## AI-Extracted Technical Summary

### Problems solved by technology

This design approach is inefficient, and it is difficult to design a network that far outperforms existing state-of-the-art networks.

Moreover, there are many structural parameters that can be adju...

### Method used

Analysis of the 13 selected neural network models shows that introducing an attention mechanism can improve model accuracy to a certain extent.

[0075] Since the nodes in a structure block can be arranged horizontally, this has already been discussed in the section on structure-block width; the nodes can also be arranged vertically to form multiple layers, which has been discussed in the section on structure-block depth. The total number of nodes is counted without distinguishing width and depth, and since the tot...

## Abstract

The invention discloses a neural network design and optimization method based on joint software-hardware learning. The method comprises the following steps: collecting statistics on neural network structural regularities; performing FPGA hardware-characteristic prediction; designing an FPGA-oriented neural network search space; and applying a joint software-hardware learning method in the search space, combining random search and block-supervised search to obtain a backbone neural network. Based on the design characteristics of neural networks and the hardware characteristics of the FPGA, a search space with prior information is constructed, which gives the search a direction; meanwhile, by combining random search and block-supervised search with FPGA performance prediction, an efficient neural network model balancing accuracy and speed is obtained. On a Xilinx ZCU102, the model achieves a Top-1 accuracy of 77.2% on the ImageNet dataset at 327.67 FPS.

Application Domain

Neural architectures; Physical realisation

Technology Topic

Neural net architecture; Machine learning +12


## Examples

- Experimental program (1)

### Example Embodiment

[0027] The specific embodiments and the operating principle of the present invention are further described below with reference to the accompanying drawings.

[0028] To address three major problems, namely that the search space of neural network structures is too large, that the time and computation cost of search is huge, and that hardware-software co-design is complicated by missing FPGA information, the present invention proposes a neural network design and optimization method based on joint software-hardware learning. The method searches for and optimizes neural networks by joint software-hardware learning, and includes the following steps:

[0029] S1) Neural network structure regularity statistics: examine the relationships among node count, number of structure blocks, number of channels, input image resolution, parameter count, etc., and collect statistics on the regularities of layer count, input image resolution, and width across different network structures.

[0030] S2) FPGA hardware-characteristic prediction: compare ordinary convolution and depthwise separable convolution on the FPGA to identify the better-suited convolution, and, based on the FPGA's performance parameters (including delay clock cycles, FF, and LUT), construct a performance prediction function.

[0031] S3) FPGA neural network search-space design: based on the structural regularity statistics of step S1) and the FPGA hardware-characteristic prediction of step S2), summarize the statistical regularities and derive the search space.

[0032] S4) Based on the search space obtained in step S3), apply the joint software-hardware learning method within the search space, combining random search and block-supervised search, to obtain an efficient backbone neural network.

[0033] The step S1) specifically includes the following steps:

[0034] Step S11: Determine the main parameters of the neural network structure:

[0035] Focus on the relationships among node count, number of structure blocks, number of channels, input image resolution, parameter count, etc., and collect statistics on layer count, input image resolution, and width across different network structures.

[0036] Step S12: Based on the main parameters determined in S11, summarize the regularities as follows:

[0037] 1. Parameter count

[0038] First, from publicly published model papers, based on 13 existing models (VGG16, SqueezeNet, EfficientNet-B0, EfficientNet-B4, ResNet-50, ResNeXt-50, ResNeSt-50, CSP-ResNeXt-50, RepVGG, MobileNet-V2, MnasNet-A3, MixNet-M, ECANet-50), 101 models structurally similar to these 13 were further selected, and the relationship between each model's parameter count (in millions, abbreviated M) and Top-1 accuracy was compared across the 101 models, as shown in Figure 1.

[0039] The names of the 101 models are listed in Table 1.

[0040] Table 1 Name of 101 models


[0043] It can be seen that, over the large scale range, as model parameter count increases, Top-1 accuracy first rises and then levels off. For models with more than 10M parameters, accuracy no longer improves significantly as the parameter count grows. The correlation coefficient between parameter count and Top-1 accuracy is 0.222; the correlation coefficient is computed as:

[0044] $$R = \frac{\sum_{j''=1}^{N}\left(X_{j''}-\bar{X}\right)\left(Y_{j''}-\bar{Y}\right)}{\sqrt{\sum_{j''=1}^{N}\left(X_{j''}-\bar{X}\right)^{2}}\sqrt{\sum_{j''=1}^{N}\left(Y_{j''}-\bar{Y}\right)^{2}}} \quad (1)$$

[0045] where $X_{j''}$ is the independent variable (here, the parameter count), $Y_{j''}$ is the dependent variable (here, the Top-1 accuracy), $\bar{X}$ and $\bar{Y}$ are the mean parameter count and mean Top-1 accuracy, $R$ is the resulting correlation coefficient, and $N$ is the total number of models.
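Formula (1) is the standard Pearson correlation coefficient. A minimal pure-Python sketch follows; the model data below is illustrative only, not the patent's 101-model table:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences,
    as in formula (1): e.g. parameter count vs. Top-1 accuracy."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative (params in millions, Top-1 %) pairs; not the patent's data.
params = [1.2, 3.4, 5.3, 7.8, 25.6, 60.2]
top1 = [56.0, 65.1, 71.8, 75.0, 76.1, 76.3]
r = pearson_r(params, top1)
```

Running this on the real 101-, 60-, and 34-model subsets would reproduce the 0.222, 0.552, and 0.680 coefficients reported in the text.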

[0046] Keeping only the models among the above 101 with fewer than 40M parameters yields 60 models; the relationship between parameter count and Top-1 accuracy across these 60 models is shown in Figure 2. At this scale the parameter count is positively related to Top-1 accuracy; using formula (1), the correlation coefficient between parameter count and Top-1 accuracy is 0.552.

[0047] Keeping only the models among the above 60 with fewer than 10M parameters yields 34 models; the relationship between parameter count and Top-1 accuracy across these 34 models is shown in Figure 3. Using formula (1), the correlation coefficient between parameter count and Top-1 accuracy is 0.680.

[0048] It can be seen across the different scales that parameter count matters to neural network model design, but beyond a certain point the accuracy gain from increasing parameters diminishes. At lower parameter counts, a small increase in parameters can yield a significant accuracy gain; that is, the smallest parameter growth brings the largest accuracy benefit. At the same time, a low parameter count meets the lightweight design requirements of autonomous driving, so the present invention restricts the parameter count to (0, 10M) when designing the network.

[0049] 2. Number of network layers

[0050] The number of network layers refers to the total number of weighted layers, including convolution layers and fully connected layers. In the present invention, the 13 selected neural network models of different types (VGG16, SqueezeNet, EfficientNet-B0, EfficientNet-B4, ResNet-50, ResNeXt-50, ResNeSt-50, CSP-ResNeXt-50, RepVGG, MobileNet-V2, MnasNet-A3, MixNet-M, ECANet-50) are labeled a, b, c, d, e, f, g, h, i, j, k, l, m. The relationship between each model's total number of layers and its Top-1 accuracy is plotted in Figure 4.

[0051] Since models j, k, and d use three similar model structures, when the number of layers is small, accuracy increases significantly as layers are added, but further increasing the layer count brings no significant accuracy improvement. Similarly, from the four models l, i, g, and d, the accuracy gain from increasing the layer count is limited. Overall, a lightweight model requires a sufficient total number of layers, but as the layer count grows, the accuracy return diminishes. Accordingly, based on the plotted relationship between total layer count and Top-1 accuracy, the present invention restricts the total number of layers of the neural network model to [25, 90].

[0052] 3. Input image resolution

[0053] Neural network models generally use a common input image resolution of 224 × 224. Across the 13 selected models a through m, the influence of input image resolution on final Top-1 accuracy was found to be only weakly correlated. Therefore, the present invention follows the common input resolution of the dataset used and does not adjust the input image resolution during search.

[0054] 4. Structure block characteristics

[0055] 4.1 Structure block width

[0056] The present invention divides width into three sub-concepts: first, the number of output channels of the structure block; second, the maximum number of output channels of the structure block; and third, the maximum number of horizontal nodes of the structure block.

[0057] The structure block's number of output channels denotes the number of filters of each structure block. The maximum number of output channels accounts for the fact that, within a structure block, the number of channels may be expanded; the maximum output channel count therefore characterizes the structure block's width in the neural network. The maximum number of horizontal nodes denotes the number of internal branches of a structure block; the number of internal branches multiplied by the number of channels per branch gives the structure block's number of output channels. Because changes in the width of a neural network's intermediate layers are usually the most pronounced, the statistics below mainly compare the first four structure blocks of each of the 13 models (VGG16, SqueezeNet, EfficientNet-B0, EfficientNet-B4, ResNet-50, ResNeXt-50, ResNeSt-50, CSP-ResNeXt-50, RepVGG, MobileNet-V2, MnasNet-A3, MixNet-M, and ECANet-50 have 5, 8, 7, 7, 4, 4, 4, 4, 5, 7, 7, 6, and 4 structure blocks respectively, 72 structure blocks in total).

[0058] a. Considering only the number of output channels of the structure blocks: the present invention relates the output channel counts of the first four structure blocks of the 13 models to Top-1 accuracy; the results are shown in Figure 5, and the correlation coefficient between the four structure blocks' output channel counts and Top-1 accuracy is 0.2. However, because these neural networks use different design strategies, for some of the 13 models, Top-1 accuracy does increase somewhat as the structure blocks' output channel count grows.

[0059] The average correlation coefficients between the output channel counts of the four structure blocks and Top-1 accuracy across the 13 models are shown in Table 2. Considering only the structure blocks' output channel count, the correlation is low.

[0060] Table 2: Average correlation coefficients between structure-block output channel count and Top-1 accuracy


[0062] b. Considering only the maximum number of output channels of the structure blocks, taking the fourth structure block as an example, the results are shown in Figure 6. Substituting into formula (1), the correlation coefficient between the structure blocks' maximum output channel count and Top-1 accuracy is 0.28. The overall behavior is the same as for the output channel count: for part of the top-performing models, Top-1 accuracy increases somewhat as the maximum output channel count grows, and the correlation is higher than when considering only the output channel count.

[0063] Examining the relationship between the maximum output channel counts of the first four structure blocks of the 13 models and Top-1 accuracy gives the results in Table 3. Considering only the structure blocks' maximum output channel count, the correlation is higher than for the output channel count, but still low overall.

[0064] Table 3: Correlation coefficients between structure-block maximum output channel count and Top-1 accuracy


[0066] c. Considering only the maximum number of horizontal nodes of the structure blocks: the correlation coefficient between the maximum horizontal node count of the four structure blocks and Top-1 accuracy is 0.17, lower than the correlations for the first two width measures (output channel count and maximum output channel count).

[0067] d. Examining whether the structure block expands the number of output channels relative to the output channel count, i.e. whether an inverted residual structure is present: the results show that the presence of expansion has a low correlation with Top-1 accuracy, with a correlation coefficient of 0.23.

[0068] e. Examining the width expansion ratio between structure blocks: for each of the 13 selected neural network models, the ratios of output channel counts between consecutive pairs of the first four structure blocks are averaged; the results are shown in Figure 7. The correlation coefficient between the width expansion ratio between structure blocks and Top-1 accuracy is 0.61.

[0069] In summary, the present invention designs the searched neural networks' width through the number of output channels and the ratio of the maximum output channel count to the output channel count (i.e., the expansion multiple), and does not constrain the maximum number of horizontal nodes. Since the width expansion ratio between structure blocks has a higher correlation with Top-1 accuracy than the three width measures and the presence of expansion, the present invention sets the width expansion ratio between structure blocks within [1.5, 2].
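The expansion-ratio statistic above is simply the ratio of consecutive blocks' output channel counts, checked against the [1.5, 2] design range. A minimal sketch, with illustrative channel counts (not taken from the patent's tables):

```python
def expansion_ratios(channels):
    """Width expansion ratio between consecutive structure blocks:
    each block's output channel count over the previous block's."""
    return [b / a for a, b in zip(channels, channels[1:])]

def within_design_range(ratios, lo=1.5, hi=2.0):
    """Check the design constraint that every ratio lies in [1.5, 2]."""
    return all(lo <= r <= hi for r in ratios)

# Illustrative output-channel counts for four structure blocks.
channels = [24, 40, 64, 112]
ratios = expansion_ratios(channels)
ok = within_design_range(ratios)
```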

[0070] 4.2 Structure block depth

[0071] A structure block resembles a small network; here the influence of a structure block's internal depth on the neural network is examined. Taking the depth of each of the first four structure blocks of the 13 models, the statistical relationship to Top-1 accuracy is given in Table 4. A certain correlation between structure-block depth and Top-1 accuracy can be seen: making a structure block deeper yields some accuracy gain, similar to the influence of the network's total layer count on Top-1 accuracy.

[0072] Table 4: Correlation coefficients between structure-block depth and Top-1 accuracy


[0074] 4.3 Total number of nodes in a structure block

[0075] Since the nodes in a structure block can be arranged horizontally, which was discussed in the section on structure-block width, and also vertically to form multiple layers, which was discussed in the section on structure-block depth, the total node count is here tallied without distinguishing width and depth. Because the total node counts of similar structure blocks are alike across the neural networks, only the fourth structure block is selected; the overall results are shown in Figure 8. Within the range [4, 14], accuracy rises as the total number of nodes increases.

[0076] 4.4 Skip connections in structure blocks

[0077] The correlations between the number of skip-connection operations in the first four structure blocks of the 13 models and Top-1 accuracy are shown in Table 5; almost no correlation can be seen.

[0078] Table 5: Correlation coefficients between structure-block skip-connection operations and Top-1 accuracy


[0080] 4.5 Attention mechanism in structure blocks

[0081] Analysis of the 13 selected neural network models shows that introducing an attention mechanism improves model accuracy to some extent.

[0082] 5. Convolution kernel characteristics

[0083] There is no clear conclusion on the relationship between different convolution kernels' feature-extraction capacity and hardware characteristics, so among the many convolution options the desired type is left to be chosen by the neural network search. In addition, the convolution kernel size is also an open choice; kernel size has a certain correlation with the network structure's characteristics, and is therefore also selected by the neural network search.

[0084] The step S2) comprises the following sub-steps:

[0085] Step S21: Compare the performance of ordinary convolution and depthwise separable convolution on the FPGA, as follows:

[0086] The two convolutions are compared by software simulation and on-board testing. The software simulation results are shown in Table 6. The simulation and synthesis tool is Xilinx Vivado HLS, and the target board is the ZCU102. To save simulation resources, the operation uses an input image resolution of 112 × 112, 3 input channels, 16 output channels, and a 3 × 3 convolution kernel; the results are shown in Table 6, with the FPGA running at 100 MHz.

[0087] Table 6: FPGA results for ordinary convolution and depthwise separable convolution

[0088]

| Convolution type | Running time (s, at 100 MHz) |
| --- | --- |
| Ordinary convolution 3 × 3 (112, 112, 3, 16) | 1.423 |
| Depthwise separable 3 × 3 (112, 112, 3, 16) | 0.260 |

[0089] From the simulation results, there is a certain correlation between LUT (lookup table) usage, FF (flip-flop) usage, and the neural network's parameter count, as shown in Figures 9 and 10 respectively; the correlation coefficients are 0.419 and 0.396. The resources required by this part are estimated using a lookup-table approach.

[0090] From the simulation results, the neural network's parameter count is weakly correlated with the clock-cycle count, as shown in Figure 11. Substituting into formula (1), the correlation coefficient is 0.371.

[0091] A linear relationship exists between FLOPS (the number of floating-point operations executed per second) and the clock-cycle count, as shown in Figure 12. Substituting into formula (1), the correlation coefficient is 0.999. The present invention therefore predicts the clock-cycle count by modeling it directly from FLOPS.
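With a 0.999 correlation, predicting clock cycles from FLOPs reduces to fitting a line. A minimal least-squares sketch; the (FLOPs, cycles) pairs below are illustrative, not the patent's measurements:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b, as would be used to
    predict FPGA delay clock cycles directly from a layer's FLOPs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Illustrative (FLOPs in millions, measured clock cycles) pairs.
flops = [10.0, 50.0, 120.0, 300.0]
cycles = [1.1e6, 5.2e6, 12.1e6, 30.4e6]
a, b = fit_line(flops, cycles)
predicted = a * 200.0 + b  # estimated cycles for a 200 MFLOPs layer
```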

[0092] The main FPGA performance parameters (delay clock cycles, FF, and LUT) have been examined experimentally; since depthwise separable convolution runs faster, it is adopted in the FPGA design. Further, since FLOPs are highly correlated with FPGA run time, the present invention, following the common computation budget of lightweight networks, limits FLOPs to within 500M (unit: million, abbreviated M).

[0093] Step S22: On the basis of step S21, a performance prediction function is proposed, as follows:

[0094] For the main FPGA performance parameters (delay clock cycles, FF, LUT), combined with the results of step S21, the following performance prediction function is proposed:

[0095] $$\mathrm{Performance}\left(conv_{k'}\right) = \left[\alpha\,\mathrm{LAT}\left(conv_{k'}\right),\ \beta\,\mathrm{RES}\left(conv_{k'}\right)\right] \quad (2)$$

[0096] where $conv_{k'}$ denotes the $k'$-th convolution operation of the neural network, $\mathrm{LAT}(conv_{k'})$ denotes the overall latency of $conv_{k'}$, $\mathrm{RES}(conv_{k'})$ denotes the resource consumption of $conv_{k'}$, $\alpha$ and $\beta$ are importance coefficients for $\mathrm{LAT}(conv_{k'})$ and $\mathrm{RES}(conv_{k'})$, and $[x_1, x_2]$ denotes a vector.

[0097] The overall latency is estimated from the number of delay cycles and the data-transfer time of each operation. The results of step S21 indicate that the delay-cycle count is highly correlated with FLOPs, so the delay cycles are computed directly from FLOPs. The additional data transfer is divided into two parts: loading the neural network model's parameters, and transferring intermediate results. The former is reflected by the model's parameter count; the transfer of intermediate data relates to the output size of each structure block, which is already reflected in FLOPs and is not counted again. The overall latency of convolution operation $conv_{k'}$ is therefore:

[0098] $$\mathrm{LAT}\left(conv_{k'}\right) = \left[\mu\,\mathrm{FLOPS}\left(conv_{k'}\right),\ \sigma\,\mathrm{Params}\left(conv_{k'}\right)\right] \quad (3)$$

[0099] where $conv_{k'}$ denotes the $k'$-th convolution operation of the neural network, $\mathrm{FLOPS}(conv_{k'})$ denotes the floating-point operation count of $conv_{k'}$, $\mathrm{Params}(conv_{k'})$ denotes its parameter count, and $\mu$, $\sigma$ are designer-preset importance coefficients for $\mathrm{FLOPS}(conv_{k'})$ and $\mathrm{Params}(conv_{k'})$ ($[x_1, x_2]$ denotes a vector).

[0100] Since resource consumption mainly comprises LUT and FF usage, the resource consumption of convolution operation $conv_{k'}$ is expressed as:

[0101] $$\mathrm{RES}\left(conv_{k'}\right) = \left[\varepsilon\,\mathrm{LUT}\left(conv_{k'}\right),\ \tau\,\mathrm{FF}\left(conv_{k'}\right)\right] \quad (4)$$

[0102] where $conv_{k'}$ denotes the $k'$-th convolution operation of the neural network, $\mathrm{LUT}(conv_{k'})$ denotes the lookup-table usage of $conv_{k'}$, $\mathrm{FF}(conv_{k'})$ denotes its flip-flop resource consumption, and $\varepsilon$, $\tau$ are designer-preset importance coefficients for $\mathrm{LUT}(conv_{k'})$ and $\mathrm{FF}(conv_{k'})$ ($[x_1, x_2]$ denotes a vector).

[0103] The overall FPGA performance of a neural network model is expressed as:

[0104] $$\mathrm{Performance}(A) = \sum_{k'=1}^{N'}\left[\gamma\,\mathrm{FLOPS}\left(conv_{k'}\right),\ \eta\,\mathrm{Params}\left(conv_{k'}\right),\ \theta\,\mathrm{LUT}\left(conv_{k'}\right),\ \iota\,\mathrm{FF}\left(conv_{k'}\right)\right] \quad (5)$$

[0105] where $N'$ denotes the total number of convolution operations in the neural network model, $A$ denotes a neural network structure, $\gamma$ and $\eta$ are the coefficients obtained from $\mu$ and $\sigma$ multiplied by $\alpha$, and $\theta$ and $\iota$ are those obtained from $\varepsilon$ and $\tau$ multiplied by $\beta$, each representing the importance of its part; when a part requires focused optimization, the desired result can be obtained by adjusting the corresponding coefficient ($[x_1, x_2, x_3, x_4]$ denotes a vector). Note that, because the quantities in the formula have different dimensions, the coefficients must be normalized per quantity.
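Formulas (2) through (5) amount to a per-convolution weighted sum over four measured quantities. A minimal sketch, assuming equal (un-normalized) importance coefficients and illustrative per-convolution measurements; the coefficient name `iota` for the fourth coefficient is this sketch's assumption, since the source text is garbled there:

```python
def performance(convs, gamma=1.0, eta=1.0, theta=1.0, iota=1.0):
    """Sketch of the overall FPGA performance vector of formula (5):
    per-convolution FLOPs, parameter count, LUT and FF usage, each
    scaled by an importance coefficient and summed over all N' convs.
    `convs` is a list of dicts with keys flops/params/lut/ff."""
    tot = [0.0, 0.0, 0.0, 0.0]
    for c in convs:
        tot[0] += gamma * c["flops"]
        tot[1] += eta * c["params"]
        tot[2] += theta * c["lut"]
        tot[3] += iota * c["ff"]
    return tot

# Illustrative measurements for two convolution operations.
convs = [
    {"flops": 30.0, "params": 0.5, "lut": 1200, "ff": 900},
    {"flops": 45.0, "params": 0.8, "lut": 1500, "ff": 1100},
]
perf = performance(convs)
```

Raising one coefficient emphasizes the corresponding quantity during screening, matching the text's note that a part requiring focused optimization gets a larger coefficient.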

[0106] The contents of step S3) include:

[0107] On the basis of the performance prediction function proposed in step S22, the search space is described; the statistical regularities are summarized as follows:

[0108] (1) The neural network's parameter count is controlled within (0, 10M); from the simulation results on the FPGA, controlling the parameter count also controls the FPGA run time.

[0109] (2) The number of network layers in the neural network is controlled within [25, 90].

[0110] (3) The input image resolution is set to the common resolution and is not adjusted.

[0111] (4) In the structure-block settings, the width expansion ratio between structure blocks is set within [1.5, 2], and the multiple of the maximum output channel count within a structure block is set in {1, 3, 6}. The maximum number of horizontal nodes is not constrained. Attention is introduced in the structure blocks.

[0112] (5) The convolution kernel size may be 3 × 3, 5 × 5, or 7 × 7.

[0113] (6) The convolution kernels use depthwise separable convolution.
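The six design rules above can be sketched as a random sampler over the search space. The concrete knob values come from rules (2), (4), (5), and (6); the dictionary structure and the way layers are drawn are this sketch's assumptions:

```python
import random

def sample_block(rng):
    """Randomly sample one structure block under the stated rules:
    width expansion ratio in [1.5, 2], channel multiple in {1, 3, 6},
    3x3/5x5/7x7 depthwise-separable kernels, attention enabled."""
    return {
        "expansion_ratio": rng.uniform(1.5, 2.0),
        "max_channel_multiple": rng.choice([1, 3, 6]),
        "kernel_size": rng.choice([3, 5, 7]),
        "depthwise_separable": True,
        "attention": True,
    }

def sample_network(rng, n_blocks=5):
    """Sample a candidate backbone: the layer budget [25, 90] and the
    five searched blocks follow the text; how layers are distributed
    across blocks is left out of this sketch."""
    return {
        "total_layers": rng.randint(25, 90),
        "blocks": [sample_block(rng) for _ in range(n_blocks)],
    }

rng = random.Random(0)  # seeded for reproducibility
net = sample_network(rng)
```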

[0114] The sub-steps of step S4) include the following:

[0115] Step S41: Determine the roles of random search and block-supervised search, as follows:

[0116] Random search mainly refers to randomly combining the various parameters to obtain the structure of a neural network model. On the one hand, random search guarantees the diversity of structures; on the other, as a verification method, it helps ensure the robustness of the final neural network model.

[0117] Among the different neural network models produced by random search, fully training each one is inefficient. The present invention proposes a block-supervised selection method that trains each structure block individually, to speed up neural network model training.

[0118] A neural network is essentially a collection of operations, which can be written as:

[0119] $$x^{(j')} = \sum_{i' < j'} o^{(i',j')}\left(x^{(i')}\right) \quad (6)$$

[0120] where $x^{(j')}$ denotes the $j'$-th layer obtained after a series of operations $o^{(i',j')}$, $x^{(i')}$ denotes the $i'$-th layer, and $o^{(i',j')}$ denotes the operations from the $i'$-th layer to the $j'$-th layer of the neural network model.

[0121] Letting $\sum_{i'<j'} o^{(i',j')}\left(x^{(i')}\right) = f^{(i',j')}\left(x^{(i')}\right)$, the output of a structure block can be regarded as:

[0122] $$x^{(out)} = f^{(out-1,\,out)}\left(\cdots f^{(in+1,\,in+2)}\left(f^{(in,\,in+1)}\left(x^{(in)}\right)\right)\right) \quad (7)$$

[0123] where $f^{(in,in+1)}, \ldots, f^{(out-1,out)}$ are all instances of $f^{(i',j')}$, differing only in the indices $i'$ and $j'$; $f^{(i',j')}$ is equivalent to a series of operations on its input, so if the intermediate $f^{(i',j')}$ can be made lighter, a lightweight effect is achieved. The present invention therefore fixes $x^{(in)}$ and $x^{(out)}$ and obtains the intermediate layers by random search, thereby obtaining each structure block of the neural network model. The dimensions of $x^{(in)}$ and $x^{(out)}$ are taken from a well-trained supervision model, the previously mentioned MnasNet; that is, $x^{(in)}$ and $x^{(out)}$ have the same dimensions as in MnasNet. MnasNet has 7 structure blocks; the first structure block of the model to be obtained keeps the same definition as MnasNet, and the number of structure blocks to be searched is 5. The number of internal modules in each structure block is 2 to 5, and the internal modules are generated by random search. With an average of 17 modules in total, the total search space is about $10^{16}$.
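Formula (7) is just nested function composition. A minimal sketch with toy per-layer transforms standing in for the real convolutions:

```python
def block_output(x, ops):
    """Formula (7): a structure block's output is the nested composition
    f_(out-1,out)(... f_(in,in+1)(x)) of its per-layer transforms,
    applied in order from the input side outward."""
    for f in ops:
        x = f(x)
    return x

# Toy per-layer transforms; real ones would be convolution layers
# whose input/output dimensions are fixed by the supervision model.
ops = [lambda v: v * 2, lambda v: v + 1, lambda v: v * 3]
y = block_output(5, ops)  # ((5*2)+1)*3 = 33
```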

[0124] Random search and block-supervised search are combined as follows: first select the supervision model for the block-supervised search, and obtain the input and output sizes of each searched structure block from the supervision model. Then, by random search, determine the number of layers in the structure block, the ratio of the maximum output channel count to the output channel count, and the convolution kernel size. Train the randomly searched structure blocks and sort them by error. One structure block is selected from the candidate structure blocks of each stage, and the structure blocks are combined to yield a complete candidate neural network model.

[0125] Step S42: On the basis of random search and structure-block supervised search, the main flow of the joint software-hardware learning method is proposed, as shown in Figure 13; the specific method is:

[0126] Based on the statistics of model and hardware characteristics, the search space is constructed, and Blocks (structure blocks) are obtained by random search sampling; the random Blocks are then subjected to block-supervised search. For each Block Xi of the supervision network there are multiple random Blocks, which are sorted by the loss value computed with formula (8); the three random Blocks with the smallest loss enter Block Xi's candidate Block set. Through the FPGA hardware-characteristic predictor, the relevant parameters of each candidate Block of Block Xi are obtained. The candidate Blocks then enter the final structure selection to obtain the final optimal model.

[0127] The block-supervised search part proceeds as follows. First, to address the large size of the ImageNet dataset, a reduced-data training method is used: 30% of each category in ImageNet is selected as the training set, called ImageNet-MID. Random search combines samples of the internal modules of each structure block into complete structure blocks, which are then supervised by MnasNet's outputs on ImageNet-MID. The three structure blocks with the smallest loss enter the candidate set. For the $i''$-th random Block of structure block Block Xi, given its input and the label output, the loss function during search training is defined as follows:

[0128] $$Loss_{train}\left(W_{i''}, A_{i''}\right) = \frac{1}{N'}\sum_{n'=1}^{N'}\left\|\hat{y}_{n'}\left(W_{i''}, A_{i''}\right) - y_{n'}\right\|^{2} \quad (8)$$

[0129] where $W_{i''}$ denotes the full set of weights of the $i''$-th random Block of Block Xi, $A_{i''}$ denotes the structure parameters of the $i''$-th random Block of Block Xi, and $N'$ denotes the number of output neurons. Each training step automatically updates $W_{i''}$ by gradient descent; $A_{i''}$, the structure parameters of the random Block, need not be updated by the formula;


[0131] For each random Block of Block Xi, the three random Blocks with the smallest $Loss_{train}$ (the loss function during search training) are taken as candidate Blocks (i.e., candidate Block Xi). Each time, one candidate Block is selected for each of Block X1 through Block Xn, and the selected candidate Blocks of Block X1 through Block Xn are combined in order (the order is fixed in advance, like the carriages of a train whose positions are determined from the start: the input and output matrix sizes are fixed, and the search fills in each carriage). This yields a complete neural network model structure, which then needs to be further screened by FPGA hardware characteristics. The convolutions in the candidate Blocks are statistically tallied, and the Xilinx HLS simulation results are generated automatically; during FPGA hardware-characteristic prediction, the corresponding items are matched accordingly.
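The per-block supervision and top-3 selection described above can be sketched as follows. The exact form of the patent's loss is not fully legible in this extract; a mean-squared error against the supervising block's output, consistent with the surrounding description, is assumed here:

```python
def block_supervision_loss(pred, label):
    """Assumed form of the search-training loss: mean squared error
    between a random block's output and the supervising (MnasNet)
    block's output, averaged over the N' output neurons."""
    n = len(pred)
    return sum((p - t) ** 2 for p, t in zip(pred, label)) / n

def top3_candidates(losses):
    """Keep the indices of the three random blocks with the smallest
    loss as the candidate set for one supervised Block Xi."""
    return sorted(range(len(losses)), key=lambda i: losses[i])[:3]

# Illustrative losses of five random blocks for one Block Xi.
losses = [0.42, 0.13, 0.77, 0.09, 0.30]
keep = top3_candidates(losses)
```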

[0132] Finally, the potentially optimal structure A is screened out, as follows:

[0133] $$A = \arg\min_{A} J, \qquad J = \left\|\sum_{i=1}^{n}\left[\rho\,Loss_{train}\left(A_{i}\right) + \mathrm{Performance}\left(A_{i}\right)\right]\right\|_{2}^{2} \quad (10)$$

[0134] $$\text{s.t.}\quad \mathrm{LUT}(A) \leq C_{1}$$

[0135] $$\mathrm{FF}(A) \leq C_{2}$$

[0136] where $\rho$ is a constant controlling the proportion of the loss-function value in the overall objective; $Loss_{train}(A_i)$ denotes the loss function of any candidate Block of Block Xi; $C_1$ and $C_2$ are the resource limits of the target FPGA, both constants; $n$ denotes the total number of Blocks Xi; $A_i$ denotes the structure parameters of any candidate Block of Block Xi; $\mathrm{Performance}(A_i)$ denotes the FPGA performance of any candidate Block of Block Xi; $\mathrm{LUT}(A)$ and $\mathrm{FF}(A)$ are the lookup-table and flip-flop usages, representing resource consumption; $\|\cdot\|$ denotes the 2-norm and $\|\cdot\\|_{2}^{2}$ its square; minimizing the squared 2-norm leaves room for a subsequent object-detection part; $J$ denotes the loss function of the complete neural network model structure, i.e. the loss function augmented with latency and other influencing factors; and $A$ denotes the optimal neural network model structure obtained when $J$ attains its minimum. Note that, because the quantities in formula (10) have different dimensions, the coefficients must be normalized per item.

[0137] The optimal-structure screening method requires adjusting the coefficients for each network (since choosing different candidate Blocks for each Block Xi yields multiple networks, which are compared by the size of $J$ to find the best one), and then selects the network with the smallest $J$ value.
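The screening step amounts to a constrained minimization over candidate networks: discard those exceeding the LUT/FF budgets, then pick the smallest $J$. A minimal sketch with a simplified scalar $J$ and hypothetical candidate records (names, fields, and numbers are illustrative, not from the patent):

```python
def screen(candidates, rho, lut_budget, ff_budget):
    """Sketch of the final screening of formula (10): among candidate
    networks, drop those violating the LUT/FF resource constraints,
    then pick the one minimizing J = sum_i(rho * loss_i**2 + perf_i),
    a simplified scalar form of the objective."""
    best, best_j = None, float("inf")
    for c in candidates:
        if c["lut"] > lut_budget or c["ff"] > ff_budget:
            continue  # violates the s.t. resource constraints
        j = sum(rho * l ** 2 + p for l, p in zip(c["loss"], c["perf"]))
        if j < best_j:
            best, best_j = c, j
    return best, best_j

# Hypothetical candidate networks with per-block losses and performance.
candidates = [
    {"name": "net-a", "loss": [0.1, 0.2], "perf": [1.0, 1.2], "lut": 900, "ff": 700},
    {"name": "net-b", "loss": [0.05, 0.1], "perf": [1.5, 1.6], "lut": 1200, "ff": 800},
    {"name": "net-c", "loss": [0.3, 0.4], "perf": [0.8, 0.9], "lut": 800, "ff": 650},
]
best, j = screen(candidates, rho=1.0, lut_budget=1000, ff_budget=750)
```

Here net-b is rejected for exceeding the LUT budget even though its loss is lowest, mirroring how the hardware constraints can override pure accuracy during selection.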

[0138] The present invention proposes a neural network design and optimization method based on joint software-hardware learning. Based on the design characteristics of neural networks and the hardware characteristics of the FPGA, the method constructs a search space with prior information, which gives the search a direction. Meanwhile, by combining random search and block-supervised search with FPGA model prediction, a highly efficient neural network model balancing accuracy and speed is obtained. The model reaches 77.2% Top-1 accuracy and 327.67 FPS (frames per second) on the ZCU102.

[0139] The above describes only specific embodiments of the present invention. Unless otherwise specifically stated, any feature disclosed in this specification may be replaced by an alternative feature; all of the features, or all of the steps in any method or process, except mutually exclusive features and/or steps, may be combined in any way. Any non-essential addition or replacement made by those skilled in the art according to the technical features of the present invention falls within the protection scope of the present invention.


## Similar technology patents

## A Face Keypoint Detection Method Based on Sampling Convolution

Owner:CHENDU PINGUO TECH
