The invention relates to a speech enhancement method based on a deep neural network and a residual long short-term memory network (DNN-CLSTM). In the method, speech magnitude features obtained by spectral subtraction and Mel-frequency cepstral coefficient (MFCC) features obtained via the fast Fourier transform are fed into a DNN-CLSTM network model to enhance the speech.

The method comprises the following steps. First, time-frequency masking and windowed framing are applied to the noisy speech, the magnitude and phase of the noisy speech are obtained by the fast Fourier transform (FFT), and the noise magnitude is estimated. Second, the estimated noise magnitude is subtracted from the noisy-speech magnitude, and the resulting spectrally subtracted magnitude is used as the first input feature of the neural network. Third, an FFT is applied to the noisy speech and the spectral line energy is computed to obtain the MFCC features of the noisy speech, which serve as the second input feature. Fourth, the two features are input into the DNN-CLSTM network for training to obtain a network model, whose effectiveness is evaluated with a minimum mean square error (MMSE) loss. Finally, the actual noisy speech set is input into the trained speech enhancement model, which predicts the enhanced magnitude and MFCC features, and the final enhanced speech signal is obtained by an inverse Fourier transform. The enhanced speech produced by the method has high fidelity.
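The signal-processing stages around the network can be illustrated with a minimal pure-Python sketch. It is not the patented implementation. The abstract does not specify the frame length, window, noise estimator, filter-bank size or number of coefficients, so the following are all assumptions: a periodic Hann window at half-frame hop, noise estimated as the mean magnitude of the first frames (assumed speech-free), 8 triangular mel filters and 5 coefficients. A naive DFT stands in for the FFT, and time-frequency masking is omitted. All function names are hypothetical. The DNN-CLSTM itself is not shown. In the described method, its predicted magnitude would replace the spectral-subtraction magnitude passed to `overlap_add`.

```python
import cmath
import math

# Hypothetical helpers: frame length, window, noise estimator and filter-bank
# size are assumptions, not values taken from the patent abstract.

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames, each multiplied by a periodic Hann window."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    return [[x[s + n] * window[n] for n in range(frame_len)]
            for s in range(0, len(x) - frame_len + 1, hop)]


def dft(frame):
    """Naive O(N^2) DFT standing in for an FFT."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; keeps the real part because the spectra come from real frames."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtraction(frames, noise_frames=2, floor=0.0):
    """Feature 1: noisy magnitude minus estimated noise magnitude, plus the noisy phase.

    Assumption: the first `noise_frames` frames contain noise only, so their
    mean magnitude is used as the noise estimate.
    """
    spectra = [dft(f) for f in frames]
    mags = [[abs(c) for c in s] for s in spectra]
    phases = [[cmath.phase(c) for c in s] for s in spectra]
    n_bins = len(mags[0])
    noise = [sum(m[k] for m in mags[:noise_frames]) / noise_frames for k in range(n_bins)]
    # Negative differences are clamped to `floor`.
    clean = [[max(m[k] - noise[k], floor) for k in range(n_bins)] for m in mags]
    return clean, phases


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def triangle(k, lo, mid, hi):
    """Weight of FFT bin k in a triangular mel filter spanning bins lo..hi."""
    if lo < k < mid:
        return (k - lo) / (mid - lo)
    if k == mid:
        return 1.0
    if mid < k < hi:
        return (hi - k) / (hi - mid)
    return 0.0


def mfcc(frame_mag, sample_rate, n_filters=8, n_coeffs=5):
    """Feature 2: spectral line energy -> mel filter bank -> log -> DCT-II."""
    n_fft = len(frame_mag)
    power = [m * m for m in frame_mag[:n_fft // 2 + 1]]  # spectral line energy
    top = hz_to_mel(sample_rate / 2)
    edges = [math.floor((n_fft + 1) * mel_to_hz(top * i / (n_filters + 1)) / sample_rate)
             for i in range(n_filters + 2)]
    log_e = [math.log(sum(p * triangle(k, edges[j], edges[j + 1], edges[j + 2])
                          for k, p in enumerate(power)) + 1e-10)
             for j in range(n_filters)]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters) for j, e in enumerate(log_e))
            for c in range(n_coeffs)]


def overlap_add(mags, phases, hop):
    """Combine magnitudes with the noisy phase, inverse-transform and overlap-add."""
    frame_len = len(mags[0])
    out = [0.0] * (hop * (len(mags) - 1) + frame_len)
    for i, (m, p) in enumerate(zip(mags, phases)):
        frame = idft([m[k] * cmath.exp(1j * p[k]) for k in range(frame_len)])
        for n, v in enumerate(frame):
            out[i * hop + n] += v
    return out
```

The periodic Hann window at a hop of half the frame length sums to one across overlapping frames. As a result, overlap-add reproduces the input in the interior (samples covered by two frames) without extra normalization when the magnitudes are left unmodified. Feeding the spectrally subtracted magnitudes instead gives a plain spectral-subtraction baseline.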