Voice processing method and device, electronic equipment and storage medium

A speech processing technology, applied in the computer field, that addresses the lack of generality in existing methods for extracting speech feature vectors, improving classification accuracy and offering better versatility.

Active Publication Date: 2020-06-23

TENCENT TECH (SHENZHEN) CO LTD

Cites: 11. Cited by: 12.

AI Technical Summary

Problems solved by technology

Existing speech classification methods require developers to have professional audio knowledge in order to decide which information to extract from the spectrogram as speech feature vectors, and the resulting feature extraction methods are not universal.

Embodiment Construction

[0069] In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application.

[0070] For ease of understanding, the terms used in the embodiments of the present application are explained below:

[0071] Tacotron: An end-to-end speech synthesis model proposed by Google, which is a major breakthrough in research on text-to-speech (TTS) based on deep neural networks. Tacotron simplifies the speech synthesis pipeline and generates natural speech, which helps to better realize human-computer interaction.

[0072] Prosody Embedding: The prosody embedding vector is a low-dimensional embedding of a speech segment, used to extend Tacotron with prosody modeling and prosody transfer. Prosody Emb...

Abstract

The invention relates to the technical field of computers and discloses a voice processing method and device, electronic equipment and a storage medium. It relates to artificial intelligence technology: voice classification is performed using machine learning. The method comprises: converting the to-be-processed voice into a prosody embedding vector; decomposing the prosody embedding vector into a preset number of base embeddings (global style tokens, GST); obtaining, from the preset number of GSTs, a style embedding vector representing the prosodic features of the voice; and obtaining a classification result for the to-be-processed voice according to the style embedding vector. The voice processing method and device, electronic equipment and storage medium provided by the embodiments of the invention can improve the accuracy of voice classification and offer better universality.
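The steps in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the patent's implementation: the prosody encoder is replaced by a random stand-in vector, the tokens and classifier weights are random rather than trained, the attention is plain scaled dot-product, and all names (`style_embedding`, `classify`, `DIM`, `NUM_TOKENS`, `NUM_CLASSES`) are hypothetical. It also assumes the prosody embedding and the tokens share one dimension, so no projection layers are needed.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def style_embedding(prosody_vec, tokens):
    """Attention-weighted sum of the tokens, queried by the prosody embedding."""
    scale = math.sqrt(len(prosody_vec))
    weights = softmax([dot(prosody_vec, t) / scale for t in tokens])
    dim = len(tokens[0])
    style = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]
    return style, weights


def classify(style_vec, class_weights):
    """Linear layer followed by softmax over the classes."""
    return softmax([dot(style_vec, w) for w in class_weights])


# Hypothetical sizes; the patent only says the number of tokens is preset.
DIM, NUM_TOKENS, NUM_CLASSES = 8, 4, 3
rng = random.Random(0)
tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]
class_weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_CLASSES)]
prosody_vec = [rng.gauss(0, 1) for _ in range(DIM)]  # stand-in for the encoder output

style, weights = style_embedding(prosody_vec, tokens)
probs = classify(style, class_weights)
predicted = max(range(NUM_CLASSES), key=lambda k: probs[k])
print("token weights:", weights)
print("class probabilities:", probs, "predicted class:", predicted)
```

In this sketch the style embedding is a convex combination of the tokens, so the classifier only ever sees points in the space the tokens span; in a trained system the tokens and the classifier would be learned jointly rather than sampled at random.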

Description

Technical field

[0001] The present application relates to the field of computer technology, and in particular to a voice processing method, device, electronic equipment and storage medium.

Background

[0002] Existing neural network-based speech classification methods usually apply a short-time Fourier transform to the speech data to convert it into the corresponding spectrogram, and then extract a set of frequency-domain information from the spectrogram based on the engineering characteristics of the audio. This set is used as the speech feature vector input to the neural network, which then produces the classification result. As a result, existing speech classification methods require developers to have professional audio knowledge in order to decide which information to extract from the spectrogram as speech feature vectors, and the resulting feature extraction methods are not universal.

Contents of the invention

[0003] Embodiments o...
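For contrast, the conventional pipeline described in the Background (short-time Fourier transform, spectrogram, then a hand-picked frequency-domain feature) can be sketched as follows. This is an illustrative assumption, not code from the patent: it uses a naive DFT with a Hann window from the standard library, a synthetic sine tone instead of real speech, and the peak frequency bin as one example of a hand-crafted feature. The names `stft_magnitude`, `SAMPLE_RATE`, `FRAME_LEN` and `TONE_HZ` are hypothetical.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one list of bins 0..frame_len//2 per frame."""
    # Periodic Hann window to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT; a real system would use an FFT library.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


SAMPLE_RATE = 8000
FRAME_LEN = 64
TONE_HZ = 1000  # falls exactly on bin TONE_HZ * FRAME_LEN / SAMPLE_RATE = 8

signal = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE) for n in range(512)]
spec = stft_magnitude(signal, FRAME_LEN, hop=32)

# One hand-crafted feature: the strongest frequency bin of the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print("frames:", len(spec), "bins per frame:", len(spec[0]), "peak bin:", peak_bin)
```

Choosing the peak bin (rather than, say, band energies or a spectral centroid) is exactly the kind of audio-specific decision the Background says developers must make by hand.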

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L13/02, G10L13/04, G10L13/047, G10L13/10, G10L25/03

CPC: G10L13/02, G10L13/04, G10L13/047, G10L25/03, G10L13/10

Inventor: 林炳怀, 王丽园, 邓锦

Owner: TENCENT TECH (SHENZHEN) CO LTD
