The invention discloses a structured network model compression and acceleration method based on multistage pruning, and belongs to the technical field of model compression and acceleration.
model compression acceleration. The method comprises the following steps: obtaining a pre-training model, and training to obtain an initial complete
network model; measuring the sensitivity of the
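As an illustrative sketch of this initialization step (assuming a PyTorch/torchvision implementation; the resnet50 backbone, the number of epochs and the train_loader are examples rather than details fixed by the method), the initial complete network model might be obtained as follows:

    # Illustrative sketch only: PyTorch/torchvision are assumed, and the
    # backbone, learning rate and train_loader are hypothetical choices.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    def build_initial_model(train_loader, num_classes, epochs=10, lr=1e-3, device="cuda"):
        # Start from a pre-trained backbone, then train it to obtain the
        # initial complete network model used by the later pruning stages.
        model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        model.fc = nn.Linear(model.fc.in_features, num_classes)
        model = model.to(device)
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        model.train()
        for _ in range(epochs):
            for images, labels in train_loader:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()
        return model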
Second, the sensitivity of each convolution layer is measured, and a sensitivity-versus-pruning-rate curve is obtained for each convolution layer by controlling variables, that is, by pruning one layer at a time at different rates while leaving all other layers unchanged.
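A minimal sketch of this controlled-variable measurement, assuming PyTorch, an L1-norm filter ranking within each layer, and a hypothetical evaluate() accuracy helper (none of which are fixed by the abstract), could look like this:

    # Controlled-variable sensitivity measurement: one convolution layer is pruned
    # at a time, the accuracy drop is recorded for each candidate pruning rate, and
    # the points form that layer's sensitivity-versus-pruning-rate curve.
    # The L1-norm ranking and evaluate() are assumptions for illustration.
    import copy
    import torch
    import torch.nn as nn

    @torch.no_grad()
    def mask_lowest_filters(conv, rate):
        # Zero the fraction `rate` of filters with the smallest L1 norm.
        n_prune = int(conv.out_channels * rate)
        if n_prune == 0:
            return
        norms = conv.weight.abs().sum(dim=(1, 2, 3))
        idx = torch.argsort(norms)[:n_prune]
        conv.weight[idx] = 0
        if conv.bias is not None:
            conv.bias[idx] = 0

    @torch.no_grad()
    def sensitivity_curves(model, val_loader, rates=(0.1, 0.3, 0.5, 0.7, 0.9)):
        base_acc = evaluate(model, val_loader)       # hypothetical accuracy helper
        curves = {}
        for name, module in model.named_modules():
            if not isinstance(module, nn.Conv2d):
                continue
            curves[name] = []
            for rate in rates:
                trial = copy.deepcopy(model)         # only this one layer is changed
                mask_lowest_filters(dict(trial.named_modules())[name], rate)
                acc = evaluate(trial, val_loader)
                curves[name].append((rate, base_acc - acc))   # accuracy drop
        return curves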
Third, single-layer pruning is carried out layer by layer in order of increasing sensitivity, and the network model is fine-tuned and re-trained.
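A sketch of this sensitivity-ordered pass, reusing the mask_lowest_filters helper from the previous sketch and assuming a hypothetical fine_tune() re-training helper and an accuracy-drop budget for choosing each layer's pruning rate, is shown below:

    # Sensitivity-ordered single-layer pruning: layers are processed from least to
    # most sensitive, each is pruned at a rate read off its curve, and the network
    # is fine-tuned (re-trained briefly) after every layer. choose_rate(), the
    # accuracy-drop budget and fine_tune() are illustrative assumptions.
    import torch.nn as nn

    def choose_rate(curve, max_drop=0.01):
        # Largest pruning rate whose measured accuracy drop stays within the budget.
        feasible = [rate for rate, drop in curve if drop <= max_drop]
        return max(feasible) if feasible else 0.0

    def sensitivity_ordered_pruning(model, curves, train_loader):
        # Rank layers by the accuracy drop at the largest tested rate (low to high).
        order = sorted(curves, key=lambda name: curves[name][-1][1])
        for name in order:
            rate = choose_rate(curves[name])
            if rate == 0.0:
                continue
            conv = dict(model.named_modules())[name]
            mask_lowest_filters(conv, rate)            # helper from the previous sketch
            fine_tune(model, train_loader, epochs=1)   # hypothetical re-training step
        return model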
Fourth, a set of samples is selected as a verification set, and the information entropy of each filter's output feature map is measured on it. Fifth, iterative flexible (soft) pruning is performed in order of output-entropy magnitude, and the network model is fine-tuned and re-trained. Finally, hard pruning is applied: the network model is re-trained to recover its performance, and the resulting lightweight model is obtained and stored.
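As a sketch of the entropy-driven soft-pruning stage (assuming PyTorch; the histogram-based entropy estimator, the per-layer pruning rate and the fine_tune() helper are illustrative assumptions), the lowest-entropy filters are zeroed but kept in the network so that fine-tuning can still update them:

    # Entropy-driven soft pruning: the information entropy of each filter's output
    # feature map is estimated on the verification samples, filters are ranked by
    # entropy, and the lowest-entropy filters are zeroed but kept in the graph so
    # fine-tuning can still update them; the cycle repeats for several iterations.
    import torch
    import torch.nn as nn

    @torch.no_grad()
    def filter_entropy(model, layer_name, val_loader, bins=64, device="cuda"):
        conv = dict(model.named_modules())[layer_name]
        feats = []
        hook = conv.register_forward_hook(lambda m, i, o: feats.append(o.detach().cpu()))
        model.eval()
        for images, _ in val_loader:
            model(images.to(device))
        hook.remove()
        maps = torch.cat(feats, dim=0)                       # [N, C, H, W]
        entropies = []
        for c in range(maps.shape[1]):
            hist = torch.histc(maps[:, c].float(), bins=bins)
            p = hist / hist.sum()
            p = p[p > 0]
            entropies.append(float(-(p * p.log()).sum()))    # Shannon entropy
        return torch.tensor(entropies)

    @torch.no_grad()
    def soft_prune_layer(conv, entropies, rate):
        # Zero the lowest-entropy filters; the weights stay in place (soft pruning).
        idx = torch.argsort(entropies)[: int(conv.out_channels * rate)]
        conv.weight[idx] = 0
        if conv.bias is not None:
            conv.bias[idx] = 0
        return idx

    def iterative_soft_pruning(model, conv_names, train_loader, val_loader,
                               rate=0.3, iterations=4):
        for _ in range(iterations):
            for name in conv_names:
                ent = filter_entropy(model, name, val_loader)
                soft_prune_layer(dict(model.named_modules())[name], ent, rate)
            fine_tune(model, train_loader, epochs=1)          # hypothetical helper
        return model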
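The final hard-pruning step can be sketched as physically removing the filters left at zero, re-training to recover accuracy, and storing the slimmed model; adjusting the following layer's input channels and any batch-norm parameters is omitted here for brevity, and hard_prune_conv and finalize_lightweight_model are hypothetical names:

    # Hard pruning and export: filters left at zero by the soft-pruning stage are
    # physically removed, the slimmed network is re-trained to recover accuracy,
    # and the lightweight model is stored. Adjusting the next layer's input
    # channels and any batch-norm parameters is omitted for brevity.
    import torch
    import torch.nn as nn

    @torch.no_grad()
    def hard_prune_conv(conv, keep_idx):
        # Build a smaller Conv2d that contains only the kept filters, e.g.
        # keep_idx = (conv.weight.abs().sum(dim=(1, 2, 3)) > 0).nonzero().flatten()
        new_conv = nn.Conv2d(conv.in_channels, len(keep_idx),
                             kernel_size=conv.kernel_size, stride=conv.stride,
                             padding=conv.padding, bias=conv.bias is not None)
        new_conv.weight.copy_(conv.weight[keep_idx])
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias[keep_idx])
        return new_conv

    def finalize_lightweight_model(model, train_loader, path="pruned_model.pt"):
        fine_tune(model, train_loader, epochs=5)   # hypothetical recovery re-training
        torch.save(model.state_dict(), path)       # store the lightweight model
        return model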
According to the method, a large-scale convolutional neural network can be compressed while maintaining the original network performance; the local memory footprint of the network is reduced, the floating-point operations (FLOPs) and GPU (video) memory occupation at run time are reduced, and a lightweight network is realized.