The invention relates to the technical field of speech synthesis, speech recognition and voice cloning, and provides a voice cloning scheme based on bottleneck features (linguistic features of audio) that combines speech synthesis, speech recognition and transfer learning. The invention includes a training system and a training method. Using only a small number of samples, it provides a TTS service with high naturalness and similarity that carries the characteristics of the target user, and thereby addresses the large sample size, long production cycle and high labor cost of speech synthesis services. The training system comprises a data acquisition module, an acoustic feature extraction module, a speech recognition module, a prosody module, a multi-speaker acoustic module and a speech synthesis module. The invention further provides a training method based on the system. The training method comprises the steps of training corpus preparation, acoustic feature extraction, training and fine-tuning of all modules, and speech synthesis.
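The six modules and four method steps listed above can be laid out as a simple ordered plan. The following is a minimal sketch only, not the implementation of the invention: every identifier (`MODULES`, `Step`, `build_training_plan`) is hypothetical, and the assignment of modules to steps is an assumption, since the abstract names the modules and the steps but does not state which module takes part in which step. The module called "rhythm module" in some translations appears here as `prosody`, and the "multi-person voice acoustic module" as `multi_speaker_acoustic`.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Module names follow the abstract; the identifiers themselves are hypothetical.
MODULES: Tuple[str, ...] = (
    "data_acquisition",
    "acoustic_feature_extraction",
    "speech_recognition",
    "prosody",
    "multi_speaker_acoustic",
    "speech_synthesis",
)


@dataclass
class Step:
    name: str
    modules: List[str]  # modules assumed (not stated) to take part in this step


def build_training_plan() -> List[Step]:
    """Return the four method steps in the order the abstract lists them."""
    return [
        Step("corpus_preparation", ["data_acquisition"]),
        Step("acoustic_feature_extraction", ["acoustic_feature_extraction"]),
        # Assumption: the trainable modules are trained first and then
        # fine-tuned on the small set of target-speaker samples
        # (the transfer-learning part of the scheme).
        Step(
            "train_and_fine_tune",
            ["speech_recognition", "prosody", "multi_speaker_acoustic"],
        ),
        Step("speech_synthesis", ["speech_synthesis"]),
    ]


def check_plan(plan: List[Step]) -> None:
    """Verify that every module is used by exactly one step of the plan."""
    used = [m for step in plan for m in step.modules]
    if sorted(used) != sorted(MODULES):
        raise ValueError("plan does not cover each module exactly once")


if __name__ == "__main__":
    plan = build_training_plan()
    check_plan(plan)
    for index, step in enumerate(plan, start=1):
        print(f"{index}. {step.name}: {', '.join(step.modules)}")
```

Keeping the plan as data rather than as hard-coded calls makes the ordering explicit and lets a different module-to-step mapping be substituted without changing the checking logic.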