An embodiment of the present invention discloses an acoustic model establishment method, a speech synthesis method, and a corresponding device, equipment and storage medium. The acoustic model establishment method comprises: obtaining phoneme sequence samples of a plurality of training samples from a corpus, and obtaining a context feature and a duration of each phoneme in the phoneme sequence samples, wherein each erhua (rhotacized, "Erhuayin") phoneme in a phoneme sequence sample is split into two phonemes; extracting acoustic features from the training samples; and training an acoustic model with the phoneme sequence samples and the context feature and duration of each phoneme as the input of the acoustic model and the acoustic features as its output, to obtain a pre-trained acoustic model. Splitting the erhua phonemes in this way improves the modelling of Erhuayin, allows Erhuayin to be synthesized better, allows Erhuayin that does not appear in the corpus to be synthesized, and reduces the cost of recording the corpus.
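The splitting step can be illustrated with a minimal sketch. The abstract does not specify the phoneme notation or the two resulting symbols, so everything here is an assumption made for illustration: phonemes are toneless pinyin-like strings, an erhua final is written as its base final followed by "r" (for example "uar"), and the rhotic part is given the hypothetical symbol "rr". The function name `split_erhua` is likewise hypothetical.

```python
from typing import List

# Hypothetical symbol for the rhotic half of a split erhua final (not from the source).
ERHUA_SUFFIX = "rr"
# Phonemes that end in "r" but are not erhua finals under this toy notation:
# the initial "r" and the ordinary final "er".
NON_ERHUA = {"r", "er"}


def split_erhua(phonemes: List[str]) -> List[str]:
    """Replace each erhua final with two phonemes: its base final and ERHUA_SUFFIX."""
    result: List[str] = []
    for phoneme in phonemes:
        if phoneme.endswith("r") and phoneme not in NON_ERHUA:
            # Assumed convention: the erhua final is the base final plus a trailing "r".
            result.extend([phoneme[:-1], ERHUA_SUFFIX])
        else:
            result.append(phoneme)
    return result


print(split_erhua(["h", "uar"]))  # ['h', 'ua', 'rr']
```

Under these assumptions, the model sees the base final and the rhotic symbol as separate units, which is one plausible reading of how erhua combinations absent from the corpus could still be synthesized. How the duration of the original phoneme is divided between the two parts is not stated in the abstract and is left out of the sketch.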