The invention discloses a data-stream-based weight-gradient optimization method for convolutional neural networks, and provides a configurable data-stream architecture for weight-gradient optimization, so that convolution operations of different sizes in the weight-gradient calculation can be supported, with a degree of parallelism K × K times that of serial input (K being the convolution kernel side length). The method improves the training performance of the whole convolutional neural network and solves the problem that convolution operations of different sizes are difficult to realize in the weight-gradient calculation. Compared with the prior art, the method has the following advantages: (1) the acceleration effect is significant: for the weight-gradient calculation, the degree of parallelism is improved by a factor of K × K over the original serial scheme, and the data-input transmission time is markedly reduced, thereby accelerating the training of the whole network; in addition, input storage can be reduced by a fraction of 1 − 1/(K × K) compared with a general matrix multiplication (GEMM) scheme; and (2) applicability and universality are satisfied simultaneously.
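The source of the K × K parallelism can be illustrated with a minimal sketch (the function name and array layout are assumptions for illustration, not the patented architecture): for a stride-1 convolution, the weight gradient at each of the K × K kernel positions is an independent correlation between a shifted input window and the output gradient, so all K × K accumulations can proceed in parallel instead of serially.

```python
import numpy as np

def weight_gradient(x, dy, K):
    """Weight gradient of a single-channel, stride-1 convolution.

    x  : input feature map, shape (H, W)
    dy : gradient w.r.t. the convolution output, shape (H-K+1, W-K+1)
    K  : kernel side length

    Each (i, j) iteration below is independent of the others, so the
    K*K loop body could be mapped onto K*K parallel lanes of a
    dataflow architecture.
    """
    H_out = x.shape[0] - K + 1
    W_out = x.shape[1] - K + 1
    dW = np.zeros((K, K))
    for i in range(K):          # these K*K iterations have no data dependence
        for j in range(K):
            # correlate the (i, j)-shifted input window with dy
            dW[i, j] = np.sum(x[i:i + H_out, j:j + W_out] * dy)
    return dW
```

A serial scheme would execute these K × K reductions one after another; evaluating them concurrently is what yields the K × K parallelism claimed above.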