The invention provides a Chinese speech synthesis method based on phonemes and rhythm structures. The method is divided into a training stage and a synthesis stage. In the training stage, rhythm structure features are extracted from the rhythm labeling information in a to-be-processed text according to linguistic knowledge, and a rhythm model is trained on these features; the to-be-processed text and the corresponding audio are preprocessed to obtain a pinyin sequence containing rhythm information and the corresponding acoustic features, on which an acoustic model is trained; and the trained rhythm model and acoustic model are deployed to the background. In the synthesis stage, the rhythm model produces a text containing rhythm information for the input text; this text is converted into a pinyin sequence with rhythm information and input into the acoustic model to obtain a linear frequency spectrum; and the linear frequency spectrum is converted into audio. With this Chinese speech synthesis method, the synthesized speech sounds more natural, and pause positions can be determined even in longer clauses. In addition, because the models are already deployed to the background when synthesis begins, model loading time is saved and the speech synthesis speed is increased.
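
As an illustration only, the conversion of a rhythm-labeled text into a pinyin sequence with rhythm information might be sketched as follows. The abstract does not specify a label format or a lexicon, so the boundary markers `#1` to `#3`, the five-character `TOY_LEXICON`, and the function name `to_pinyin_sequence` are all assumptions made for this sketch, not part of the described method.

```python
import re

# Hypothetical toy lexicon (character -> pinyin with tone number).
# A real system would use a full pronunciation dictionary.
TOY_LEXICON = {"今": "jin1", "天": "tian1", "气": "qi4", "很": "hen3", "好": "hao3"}

# Assumed rhythm boundary markers: "#1", "#2", "#3", with larger numbers
# standing for stronger pauses. The source does not define this format.
RHYTHM_MARK = re.compile(r"#[1-3]")


def to_pinyin_sequence(labeled_text, lexicon=TOY_LEXICON):
    """Convert rhythm-labeled Chinese text into a list of pinyin syllables,
    keeping the rhythm markers in place as separate tokens."""
    tokens = []
    pos = 0
    while pos < len(labeled_text):
        match = RHYTHM_MARK.match(labeled_text, pos)
        if match:
            # Keep the rhythm marker so the acoustic model can see it.
            tokens.append(match.group())
            pos = match.end()
        else:
            # Look up one character; raises KeyError if it is not in the lexicon.
            tokens.append(lexicon[labeled_text[pos]])
            pos += 1
    return tokens


if __name__ == "__main__":
    print(" ".join(to_pinyin_sequence("今天#1天气#2很好#3")))
```

In this sketch the rhythm markers stay in the sequence as ordinary tokens, which is one simple way for an acoustic model to receive pause information together with the pronunciation.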