The invention provides a single-channel speech enhancement method based on joint dictionary learning and sparse representation. A dual-tree complex wavelet transform is applied to clean speech to obtain a group of sub-band signals, and a short-time Fourier transform is applied to each sub-band signal to obtain its time-frequency spectrum. A joint dictionary of the clean speech is learned from the amplitude, the real part and the imaginary part of the sub-band spectra together with the sparsity of speech, and a joint dictionary of pure noise is learned in the same way. The dual-tree complex wavelet transform and the short-time Fourier transform are then applied to the noisy speech to obtain the time-frequency spectrum of each sub-band signal. The phase and the signs of the real and imaginary parts are retained, while the amplitude and the absolute values of the real and imaginary parts are extracted and projected onto the clean-speech and noise joint dictionaries to obtain the sparse representation coefficients of the speech and the noise. The final estimate of each sub-band speech time-frequency spectrum is obtained from these coefficients, the time-frequency spectrum phase, the signs of the real and imaginary parts, a mask, weights and the like, and the enhanced speech signal is obtained by an inverse short-time Fourier transform followed by an inverse dual-tree complex wavelet transform, so that the speech enhancement capability is improved.
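
The projection and recombination step for a single sub-band spectrum can be illustrated with a minimal NumPy sketch. It is not the claimed implementation, and everything specific in it is an assumption made for illustration:

- The function names `stack_features`, `sparse_code` and `enhance_subband` are hypothetical.
- The "joint" dictionary is interpreted as stacking amplitude, |real| and |imaginary| features so that they share one coefficient vector.
- Sparse coding uses non-negative multiplicative updates with an L1 penalty.
- The mask is a ratio mask.
- The final estimate is combined with a single scalar `weight`.
- The dictionaries are assumed to be non-negative and already learned.
- The dual-tree complex wavelet transform and the short-time Fourier transform are omitted.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def stack_features(Z):
    """Stack amplitude, |real| and |imag| of a complex spectrum Z (freq x frames)."""
    return np.vstack([np.abs(Z), np.abs(Z.real), np.abs(Z.imag)])


def sparse_code(Y, D, sparsity=0.1, n_iter=200):
    """Non-negative sparse coding of Y on a fixed non-negative dictionary D.

    Multiplicative updates for 0.5 * ||Y - D X||_F^2 + sparsity * sum(X), X >= 0.
    (Assumed solver; the source does not name one.)
    """
    X = np.ones((D.shape[1], Y.shape[1]))
    DtY = D.T @ Y
    DtD = D.T @ D
    for _ in range(n_iter):
        X *= DtY / (DtD @ X + sparsity + EPS)
    return X


def enhance_subband(Z, D_speech, D_noise, sparsity=0.1, weight=0.5):
    """Estimate the speech spectrum of one noisy sub-band spectrum Z."""
    n_freq = Z.shape[0]

    # Retain the phase and the signs of the real and imaginary parts.
    phase = np.angle(Z)
    sign_re, sign_im = np.sign(Z.real), np.sign(Z.imag)

    # Project the stacked non-negative features onto [D_speech, D_noise].
    Y = stack_features(Z)
    D = np.hstack([D_speech, D_noise])
    X = sparse_code(Y, D, sparsity)
    k = D_speech.shape[1]
    S_hat = D_speech @ X[:k]   # speech part of the reconstruction
    N_hat = D_noise @ X[k:]    # noise part of the reconstruction

    # Ratio mask (assumed form) applied to the noisy features.
    mask = S_hat / (S_hat + N_hat + EPS)
    est = mask * Y
    mag, re, im = est[:n_freq], est[n_freq:2 * n_freq], est[2 * n_freq:]

    # Two estimates: amplitude with retained phase, and |re|, |im| with retained signs.
    from_mag = mag * np.exp(1j * phase)
    from_parts = sign_re * re + 1j * sign_im * im

    # Weighted combination (a scalar weight is an assumption).
    return weight * from_mag + (1.0 - weight) * from_parts, mask
```

In a full pipeline this function would be called once per sub-band between the forward and inverse transforms. Stacking the three feature types into one matrix forces them to share sparse coefficients, which is one plausible reading of "joint dictionary".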