
Chinese and English mixed speech synthesis method and device, electronic equipment and storage medium

A Chinese-English mixed speech synthesis technology, applied in the fields of speech synthesis, speech analysis, and instruments. It addresses the problems of speaker influence on the model, inconsistent phonemes being fed into the model, and flawed synthesis results.

Pending Publication Date: 2021-09-10
携程科技(上海)有限公司

AI Technical Summary

Problems solved by technology

Although Chinese and English text is fed to the model in the same character form, the phonemes fed to the model are not unified when Chinese and English speech produce the same sound. As a result, the model is affected by the speaker and the synthesis quality is flawed.



Examples


[0032] Figure 1 shows the main steps of the Chinese-English mixed speech synthesis method in one embodiment. Referring to Figure 1, the method includes:
Step S110: regularize the initial text, which contains Chinese text and English text, converting the Chinese text into pinyin with tones and the English text into words.
Step S120: align the regularized text with the corresponding initial audio to obtain an aligned text with a pause rhythm.
Step S130: perform phoneme conversion on the aligned text, converting the pinyin and the words in the aligned text into the corresponding Carnegie Mellon University (CMU) phonemes.
Step S140: convert each CMU phoneme into a phoneme vector and input it into the acoustic model to obtain the Mel spectrum feature corresponding to the initial text.
Step S150: input the Mel spectrum feature into the vocoder to synthesize the target audio.
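The text-side steps S110 and S130 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the three lookup tables, the function names `normalize` and `to_cmu`, and in particular the pinyin-to-CMU entries (including the idea of carrying the tone as a digit on the vowel) are assumptions made up for the example. The alignment of step S120 needs audio and is left out.

```python
import re

# Toy lookup tables. Every entry is an illustrative assumption; the source
# does not publish its pinyin-to-CMU mapping.
HANZI_TO_PINYIN = {"你": "ni3", "好": "hao3"}
PINYIN_TO_CMU = {
    "ni3": ["N", "IY3"],    # hypothetical: tone digit attached to the vowel
    "hao3": ["HH", "AW3"],  # hypothetical
}
WORD_TO_CMU = {"hello": ["HH", "AH0", "L", "OW1"]}


def normalize(text):
    """Step S110 sketch: Chinese characters -> toned pinyin, English -> words."""
    tokens = []
    # Match either a run of Latin letters or a single CJK character.
    for match in re.finditer(r"[A-Za-z]+|[\u4e00-\u9fff]", text):
        piece = match.group()
        if piece.isascii():
            tokens.append(("word", piece.lower()))
        else:
            tokens.append(("pinyin", HANZI_TO_PINYIN[piece]))
    return tokens


def to_cmu(tokens):
    """Step S130 sketch: map pinyin and words into one shared phoneme set."""
    phonemes = []
    for kind, value in tokens:
        table = PINYIN_TO_CMU if kind == "pinyin" else WORD_TO_CMU
        phonemes.extend(table[value])
    return phonemes


tokens = normalize("你好 Hello")
print(tokens)
print(to_cmu(tokens))
```

Tagging each token as `pinyin` or `word` keeps the two lookup tables separate, so a pinyin syllable is never mistaken for an English word that happens to share its spelling.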

[0033] The above speech synthesis meth...


Abstract

The invention relates to the technical field of language processing, and provides a Chinese and English mixed speech synthesis method and device, electronic equipment and a storage medium. The speech synthesis method comprises the following steps: regularizing an initial text containing a Chinese text and an English text, converting the Chinese text into pinyin with tones, and converting the English text into words; aligning the regularized text with the corresponding initial audio to obtain an aligned text with a pause rhythm; carrying out phoneme conversion on the aligned text, and respectively converting pinyin and words in the aligned text into corresponding CMU phonemes; converting each CMU phoneme into a phoneme vector, inputting the phoneme vector into an acoustic model, and obtaining a Mel spectrum feature corresponding to the initial text; and inputting the Mel spectrum features into a vocoder to synthesize a target audio. By converting Chinese and English into unified CMU phonemes, Chinese and English pronunciations are mapped to the same pronunciation space, and the synthesis effect of the Chinese and English mixed speech is effectively improved.
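The phoneme-to-vector step can be illustrated with a plain embedding lookup. This is only a sketch under assumptions: the source does not say how phoneme vectors are produced, so the randomly initialised table, the vector size of 4, and the helper names `build_vocab`, `make_embedding_table`, and `embed` are all hypothetical. In a trained system the table would be learned together with the acoustic model.

```python
import random


def build_vocab(phonemes):
    """Assign one integer id to each distinct phoneme symbol."""
    return {sym: idx for idx, sym in enumerate(sorted(set(phonemes)))}


def make_embedding_table(vocab_size, dim, seed=0):
    """Randomly initialised table standing in for learned phoneme vectors."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(phonemes, vocab, table):
    """Look up one vector per phoneme, in sequence order."""
    return [table[vocab[p]] for p in phonemes]


# Hypothetical phoneme sequence: the first four symbols stand for a Chinese
# syllable pair, the last four for an English word.
sequence = ["N", "IY3", "HH", "AW3", "HH", "AH0", "L", "OW1"]
vocab = build_vocab(sequence)
table = make_embedding_table(len(vocab), dim=4)
vectors = embed(sequence, vocab, table)

# "HH" occurs once in the Chinese part and once in the English part;
# both occurrences resolve to the same vector.
print(vectors[2] == vectors[4])
```

Because both languages draw from one symbol inventory, a sound shared by Chinese and English uses a single row of the table, which is one way to read the claim that both pronunciations are mapped into the same pronunciation space.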

Description

Technical field

[0001] The invention relates to the technical field of language processing, and in particular to a Chinese-English mixed speech synthesis method and device, electronic equipment, and a storage medium.

Background

[0002] Large online travel service companies have a large number of users who need service. Building an outbound-call robot from speech synthesis combined with speech recognition, dialogue management, natural language understanding, and natural language generation can save labor costs and serve users efficiently. The broadcast quality of the speech synthesis plays a vital role in giving users a better service experience.

[0003] With the continuous expansion of the tourism business, a large number of overseas services and overseas users need to be connected, and a large amount of mixed Chinese and English information needs to be broadcast. Based on this, a Chinese-English mixed speech synthesis model (hereinaf...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/02, G10L13/08, G10L13/04, G10L25/24, G10L25/30
CPC: G10L13/02, G10L13/08, G10L13/086, G10L13/04, G10L25/24, G10L25/30, G10L2013/083
Inventor: 陈子浩, 罗超, 周明康, 邹宇, 李巍, 严丽
Owner: 携程科技(上海)有限公司