Audio synthesis method and device based on duration information and terminal equipment

An audio synthesis method based on duration information, applied in speech synthesis, speech analysis, instruments, etc. It addresses the tendency of existing speech synthesis to drop or insert words and its low efficiency and accuracy, and it reduces alignment problems.

Pending Publication Date: 2022-05-13
UBTECH ROBOTICS CORP LTD

AI Technical Summary

Problems solved by technology

[0004] The embodiments of the present application provide an audio synthesis method, device, terminal device, and readable storage medium based on duration information, which address the tendency of related methods to drop or insert characters and their low speech synthesis efficiency and accuracy.

Embodiment Construction

[0056] In the following description, specific details such as particular system structures and technologies are presented for the purpose of illustration rather than limitation, to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

[0057] It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

[0058] It should ...

Abstract

The invention belongs to the technical field of audio synthesis and provides an audio synthesis method and device based on duration information, and terminal equipment. The method comprises: obtaining text data; inputting the text data into a pre-trained duration model to obtain a mapping relationship between the text data and acoustic feature information; and inputting the text data and the mapping relationship into a pre-trained acoustic model to obtain audio data. The duration model performs feature extraction on the text data, determines the duration information of the text data, and determines the mapping relationship between the text data and the corresponding acoustic feature information based on that duration information, so that the acoustic model can produce the audio data corresponding to the text data. This provides a strong supervision signal for aligning the text data with the speech data, markedly reduces alignment problems in speech synthesis, and outputs audio data whose rhythm is close to that of the human voice.
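
The abstract does not say how the duration information is turned into a text-to-acoustic-feature mapping. As a minimal sketch only, the following assumes a common approach in duration-based speech synthesis: each text unit is assigned an integer number of acoustic frames, and its feature vector is repeated for that many frames. The function names `alignment_spans` and `expand_by_duration`, the integer-frame representation, and the sample values are all hypothetical and are not taken from the patent.

```python
from typing import List, Sequence, Tuple


def alignment_spans(durations: Sequence[int]) -> List[Tuple[int, int]]:
    """Map each text unit to its (start_frame, end_frame) span, end exclusive.

    Assumption: durations are non-negative integer frame counts, one per unit.
    """
    spans = []
    start = 0
    for d in durations:
        if d < 0:
            raise ValueError("durations must be non-negative")
        spans.append((start, start + d))
        start += d
    return spans


def expand_by_duration(
    unit_features: Sequence[Sequence[float]], durations: Sequence[int]
) -> List[List[float]]:
    """Repeat each unit's feature vector once per frame of its duration."""
    if len(unit_features) != len(durations):
        raise ValueError("need exactly one duration per text unit")
    frames: List[List[float]] = []
    for feat, (start, end) in zip(unit_features, alignment_spans(durations)):
        # Copy the vector for every frame so frames do not share one list object.
        frames.extend(list(feat) for _ in range(end - start))
    return frames


# Hypothetical input: three text units with 2-dimensional features.
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2, 1, 3]  # frames per unit, as a duration model might predict

spans = alignment_spans(durations)
frames = expand_by_duration(features, durations)
```

Because every frame is tied to exactly one text unit and every unit receives an explicit frame count, a hard mapping of this kind cannot skip or repeat a unit, which is one way to read the abstract's claim that duration information reduces alignment problems.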

Description

Technical field

[0001] The present application belongs to the technical field of audio synthesis, and in particular relates to an audio synthesis method, device and terminal equipment based on duration information.

Background

[0002] In recent years, with the development of human society and the advancement of science and technology, more and more everyday scenarios require the support of speech synthesis technology, such as AI voice navigation, intelligent interaction with home robots, and widespread electronic customer service. How to synthesize a voice signal that is closer to real human speech has become an ongoing goal in this field. The main factors that make mechanical speech differ from natural human speech include noise, pronunciation accuracy, and the rhythm of the sound. Among them, the rhythm of the sound can be quantified as the duration of each pronunciation unit. Whether the rhythm is natural or not plays a decisive role in the sound quality, which ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/033, G10L25/18
CPC: G10L13/033, G10L25/18
Inventor: 丁万, 黄东延, 梁景俊
Owner: UBTECH ROBOTICS CORP LTD