Voice processing method and device, electronic equipment and storage medium

A speech processing technology, applied in the computer field, that addresses the lack of generality in existing methods for extracting speech feature vectors, improving classification accuracy and offering good versatility.

Active Publication Date: 2020-06-23
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

Existing speech classification methods require developers to have professional audio knowledge to decide which information to extract from the spectrogram as speech feature vectors, and the resulting method of extracting speech feature vectors is not universal.

Embodiment Construction

[0069] In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application.

[0070] For ease of understanding, the terms used in the embodiments of the present application are explained below:

[0071] Tacotron: An end-to-end speech synthesis model proposed by Google, which is a major breakthrough in research on text-to-speech (TTS) based on deep neural networks. Tacotron simplifies the speech synthesis pipeline and generates natural speech, which helps to better realize human-computer interaction.

[0072] Prosody Embedding: A prosody embedding vector is a low-dimensional embedding of a speech segment, used to extend Tacotron to support prosody modeling and prosody transfer. Prosody Emb...

Abstract

The invention relates to the technical field of computers and discloses a voice processing method and device, electronic equipment and a storage medium. It relates to artificial intelligence technology, in which voice classification is performed using machine learning. The method comprises: converting the to-be-processed voice into a prosody embedding vector; decomposing the prosody embedding vector into a preset number of basic embeddings (GSTs); obtaining, from the preset number of GSTs, a style embedding vector representing the prosodic features of the voice; and obtaining a classification result for the to-be-processed voice according to the style embedding vector. The voice processing method and device, electronic equipment and storage medium provided by the embodiments of the invention can improve the accuracy of voice classification and are more universal.
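The steps in the abstract (prosody embedding, a preset number of GSTs, style embedding, classification) can be illustrated with a minimal sketch. The abstract does not say how the decomposition is computed; the scaled dot-product attention used here is an assumption borrowed from the global style token (GST) approach in the Tacotron literature, and the function names, dimensions and random parameters are hypothetical stand-ins for trained components.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def style_embedding(prosody, tokens):
    """Express a prosody embedding as a weighted combination of GSTs.

    ASSUMPTION: the weights come from scaled dot-product attention;
    the source only states that a preset number of GSTs is used.
    """
    scale = math.sqrt(len(prosody))
    weights = softmax([dot(t, prosody) / scale for t in tokens])
    style = [sum(w * t[i] for w, t in zip(weights, tokens))
             for i in range(len(prosody))]
    return style, weights

def classify(style, W, b):
    """Hypothetical linear classifier with softmax on the style embedding."""
    probs = softmax([dot(row, style) + bias for row, bias in zip(W, b)])
    return probs.index(max(probs)), probs

# Random values stand in for a trained encoder, token bank and classifier.
random.seed(0)
d, k, n_classes = 8, 4, 3  # embedding size, preset GST count, class count
prosody = [random.gauss(0, 1) for _ in range(d)]
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(k)]
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_classes)]
b = [0.0] * n_classes

style, weights = style_embedding(prosody, tokens)
label, probs = classify(style, W, b)
print(label, [round(p, 3) for p in probs])
```

In this sketch the classifier sees only the style embedding, so no hand-picked spectrogram features are needed; whether the patented method uses this exact attention form is not stated in the text above.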

Description

Technical field

[0001] The present application relates to the field of computer technology, and in particular to a voice processing method and device, electronic equipment and a storage medium.

Background

[0002] Existing neural-network-based speech classification methods usually apply a short-time Fourier transform to the speech data to obtain the corresponding spectrogram, and then extract a set of frequency-domain information from the spectrogram based on the engineering characteristics of the audio. This set is used as the speech feature vector input to the neural network, which then produces the classification result. Therefore, the existing speech classification methods require developers to have professional audio knowledge to decide which information to extract from the spectrogram as speech feature vectors, and the method for extracting speech feature vectors is not universal.

Summary of the invention

[0003] Embodiments o...
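The conventional pipeline described in the background (short-time Fourier transform, spectrogram, hand-picked frequency-domain feature) can be sketched as follows. This is only an illustration under stated assumptions: the frame length, hop size, Hann window, synthetic test tone and the choice of "peak frequency" as the engineered feature are all hypothetical and do not come from the patent text.

```python
import cmath
import math

def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude spectrogram via a naive per-frame DFT (fine for tiny inputs)."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spectrum = []
        for k in range(n_bins):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

# Synthetic 100 Hz tone sampled at 800 Hz (hypothetical test input).
sample_rate, tone_hz, frame_len = 800, 100, 64
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(400)]

spec = stft_magnitude(signal, frame_len=frame_len, hop=32)

# One hand-picked feature: the frequency of the strongest bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / frame_len
print(len(spec), len(spec[0]), peak_hz)
```

Choosing which such features to compute is the step that, according to the background, demands audio expertise and does not generalize across tasks.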

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/02, G10L13/04, G10L13/047, G10L13/10, G10L25/03
CPC: G10L13/02, G10L13/04, G10L13/047, G10L25/03, G10L13/10
Inventor: 林炳怀, 王丽园, 邓锦
Owner: TENCENT TECH (SHENZHEN) CO LTD