The invention provides a speech recognition method based on a convolutional neural network. The method is better at extracting high-level features, has a simple modeling process, is easy to train, yields a model with better generalization performance, and can be applied more widely across speech recognition scenarios. The method comprises the following steps: S1, preprocessing the input original speech signal; S2, extracting the key feature parameters that reflect the characteristics of the speech signal to form a feature vector sequence; S3, based on a deep convolutional neural network (DCNN) model, using connectionist temporal classification (CTC) as the loss function, constructing an end-to-end acoustic model; S4, training the acoustic model to obtain the trained acoustic model; S5, inputting the feature vector sequence to be recognized, obtained in step S2, into the trained acoustic model to obtain a recognition result; and S6, performing a subsequent operation on the recognition result obtained in step S5 to obtain the word string that the speech signal outputs with maximum probability, that is, the text recognized from the original speech.
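The preprocessing in step S1 and the conversion of acoustic model output into text in steps S5 and S6 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the pre-emphasis coefficient of 0.97, the frame length and hop parameters, and the use of greedy best-path decoding are all hypothetical choices that the text above does not specify. Greedy decoding only approximates the maximum-probability word string; a beam search, optionally combined with a language model, would be a closer match to step S6.

```python
from typing import List, Sequence


def pre_emphasize(signal: Sequence[float], alpha: float = 0.97) -> List[float]:
    """Assumed S1 step: first-order pre-emphasis y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    out = [signal[0]]  # the first sample has no predecessor and is kept as-is
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def split_frames(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Assumed S1 step: cut the signal into overlapping fixed-length frames.

    A trailing partial frame is dropped rather than padded.
    """
    last_start = len(signal) - frame_len
    return [list(signal[s:s + frame_len]) for s in range(0, last_start + 1, hop)]


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      alphabet: Sequence[str],
                      blank: int = 0) -> str:
    """Assumed S5/S6 step: best-path CTC decoding.

    Takes the most probable label in each frame, collapses consecutive
    repeats, then removes the blank label. alphabet[blank] is a placeholder
    and never appears in the output.
    """
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for idx in best_path:
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)
```

Here `frame_probs` stands in for the per-frame label distribution that a trained acoustic model would produce; the blank label is what allows a repeated character to be emitted twice, since the two occurrences must be separated by at least one blank frame.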