The invention discloses a speech synthesis method and device, electronic equipment, and a storage medium, and relates to the technical field of artificial intelligence, such as deep learning and speech technology. The method comprises the following steps: when performing speech synthesis on a to-be-synthesized text, obtaining the timbre features corresponding to the user identifier carried in the speech synthesis request, and obtaining at least one group of candidate rhythm features of the to-be-synthesized text based on the user identifier; selecting one group from the at least one group of candidate rhythm features as the rhythm features of the to-be-synthesized text; and performing speech synthesis according to the timbre features, the to-be-synthesized text, and the rhythm features to obtain synthesized audio corresponding to the to-be-synthesized text. Because the audio is synthesized from the timbre features corresponding to the user identifier together with the to-be-synthesized text and the selected rhythm features, the synthesized audio carries the voice characteristics of the user corresponding to the user identifier, sounds more real and natural, and the speech synthesis effect is improved.
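
The steps of the method can be sketched in Python as follows. This is only an illustrative outline, not the implementation claimed by the invention: the names `SynthesisRequest`, `synthesize`, `timbre_by_user`, `candidate_rhythms`, `select`, and `acoustic_model` are hypothetical, features are represented as plain lists of floats, and the toy stand-ins at the bottom take the place of trained models, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = List[float]  # assumption: features are plain float vectors


@dataclass(frozen=True)
class SynthesisRequest:
    user_id: str  # user identifier carried in the synthesis request
    text: str     # to-be-synthesized text


def synthesize(
    request: SynthesisRequest,
    timbre_by_user: Dict[str, Features],
    candidate_rhythms: Callable[[str, str], List[Features]],
    select: Callable[[List[Features]], Features],
    acoustic_model: Callable[[Features, str, Features], List[float]],
) -> List[float]:
    # Step 1: obtain the timbre features for the user identifier.
    timbre = timbre_by_user[request.user_id]

    # Step 2: obtain at least one group of candidate rhythm features
    # for the text, also based on the user identifier.
    candidates = candidate_rhythms(request.user_id, request.text)
    if not candidates:
        raise ValueError("expected at least one group of candidate rhythm features")

    # Step 3: select one group as the rhythm features of the text.
    rhythm = select(candidates)

    # Step 4: synthesize audio from timbre, text, and rhythm features.
    return acoustic_model(timbre, request.text, rhythm)


# Toy stand-ins (hypothetical) so the sketch runs end to end.
def toy_candidates(user_id: str, text: str) -> List[Features]:
    # Two candidate groups, one value per character of the text.
    return [[1.0] * len(text), [0.5] * len(text)]


def toy_model(timbre: Features, text: str, rhythm: Features) -> List[float]:
    # Scales the rhythm values by the mean timbre value; not a real vocoder.
    gain = sum(timbre) / len(timbre)
    return [gain * r for r in rhythm]


audio = synthesize(
    SynthesisRequest(user_id="user-1", text="hi"),
    timbre_by_user={"user-1": [0.2, 0.4]},
    candidate_rhythms=toy_candidates,
    select=lambda groups: groups[0],  # assumption: pick the first group
    acoustic_model=toy_model,
)
```

Passing the selection rule and the models in as callables keeps the sketch neutral on how candidates are ranked, which the abstract leaves open.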