The invention discloses a voice conversion method based on deep learning and belongs to the technical field of voice signal processing. The method uses the AHOcoder speech codec as both the feature extraction end and the speech synthesis end. Voice features are trained with a deep learning method to obtain deep features of the source speaker and of the target speaker separately, together with the ability to decode those deep features back into the original features; a BP neural network then maps the source speaker's deep features to those of the target speaker, thereby realizing voice conversion. The method splices the original voice features, and the resulting joint feature parameters are regarded as containing the dynamic characteristics of the speaker's voice features. Pre-training the deep autoencoders accelerates training of the deep neural network, and because conversion is performed on the deep features, the method yields good-quality converted voice even when little training speech is available. The method also supports offline learning, which saves computing resources and memory on terminal devices.
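The data flow described above can be illustrated with a minimal NumPy sketch. It is not the claimed implementation and rests on several assumptions: random arrays stand in for AHOcoder features, source and target frames are taken to be time-aligned, "splicing" is read as concatenating each frame with its neighbours, each autoencoder has a single hidden layer rather than a deep pre-trained stack, and the names `splice_frames`, `train_mlp`, `encode`, and `decode`, as well as all sizes and learning rates, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def splice_frames(feats, context=1):
    """Concatenate each frame with `context` neighbours on each side
    (assumed reading of "splicing"; edge frames are repeated as padding)."""
    n_frames = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + n_frames] for i in range(2 * context + 1)])

def train_mlp(X, Y, hidden, steps=200, lr=0.01):
    """One-hidden-layer network (tanh hidden, linear output) trained by
    backpropagation on squared error. With Y = X it acts as an autoencoder."""
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)              # hidden activations
        err = (H @ W2 + b2) - Y               # output error
        dH = (err @ W2.T) * (1.0 - H ** 2)    # error propagated back through tanh
        W2 -= lr * (H.T @ err) / len(X)
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * (X.T @ dH) / len(X)
        b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2

def encode(params, X):
    """Original (spliced) features -> deep features."""
    W1, b1, _, _ = params
    return np.tanh(X @ W1 + b1)

def decode(params, H):
    """Deep features -> original (spliced) features."""
    _, _, W2, b2 = params
    return H @ W2 + b2

# Placeholder data: 50 aligned frames of 4-dimensional features per speaker.
src = splice_frames(rng.normal(size=(50, 4)))   # shape (50, 12)
tgt = splice_frames(rng.normal(size=(50, 4)))   # shape (50, 12)

# One autoencoder per speaker yields speaker-specific deep features.
ae_src = train_mlp(src, src, hidden=8)
ae_tgt = train_mlp(tgt, tgt, hidden=8)

# BP network maps source deep features to target deep features.
mapper = train_mlp(encode(ae_src, src), encode(ae_tgt, tgt), hidden=8)

# Conversion: encode with the source model, map, decode with the target model.
mapped = decode(mapper, encode(mapper, encode(ae_src, src)))
converted = decode(ae_tgt, mapped)
```

In this sketch `converted` has the same shape as the spliced target features; in a real system those frames would be un-spliced and passed to the vocoder for synthesis, a step omitted here.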