51 results about "Speech Acoustics" patented technology

The acoustic aspects of speech in terms of frequency, intensity, and time.

Small data speech acoustic modeling method in speech recognition

The invention belongs to the technical field of signal processing in the electronics industry and aims to solve the problem that an acoustic model of a target language with only a small amount of labeled data has poor discrimination performance. To this end, the invention provides a small-data speech acoustic modeling method for speech recognition, comprising the steps of: adversarially training the acoustic features of multiple languages through a language adversarial discriminator to build a multilingual adversarial bottleneck network model; taking the acoustic features of the target language as the input of this model to extract language-independent bottleneck features; fusing the language-independent bottleneck features with the acoustic features of the target language to obtain fusion features; and training on the fusion features to build an acoustic model of the target language. The method overcomes a defect of the prior art, in which bottleneck features contaminated with language-dependent information yield little improvement in target-language recognition, or even negative transfer, thereby improving the speech recognition accuracy of the target language.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI
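The fusion step the abstract describes can be illustrated with a minimal NumPy sketch. All dimensions, the toy MLP, and the assumption that its narrowest layer yields language-independent features are illustrative, not the patented implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, weights):
    """Forward pass through a toy MLP; the narrowest hidden layer
    plays the role of the bottleneck feature extractor."""
    h = x
    activations = []
    for W in weights:
        h = np.tanh(h @ W)
        activations.append(h)
    return activations

# Hypothetical dimensions: 40-dim acoustic features, 16-dim bottleneck.
n_frames, acoustic_dim, bottleneck_dim = 100, 40, 16
weights = [rng.standard_normal((acoustic_dim, 64)) * 0.1,
           rng.standard_normal((64, bottleneck_dim)) * 0.1]  # bottleneck layer

target_feats = rng.standard_normal((n_frames, acoustic_dim))
bottleneck = mlp_forward(target_feats, weights)[-1]  # assumed language-independent

# Fusion: concatenate bottleneck features with the target-language
# acoustic features to form the training input for the acoustic model.
fused = np.concatenate([target_feats, bottleneck], axis=1)
print(fused.shape)  # (100, 56)
```

In the patent the bottleneck network is additionally trained against a language discriminator so the bottleneck carries no language identity; the sketch only shows the extraction-and-fusion data flow.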

Speech recognition method and device, electronic equipment and storage medium

The invention relates to the technical field of speech recognition, and in particular to a speech recognition method and device, electronic equipment, and a storage medium applicable to scenarios such as cloud technology, artificial intelligence, intelligent traffic, and assisted driving, for efficiently and accurately recognizing speech in multiple dialects of a target language. The method comprises the following steps: acquiring speech data of a target language to be recognized; extracting the acoustic features corresponding to each frame of the speech data; performing deep feature extraction on the acoustic features to obtain dialect embedding features; encoding the acoustic features to obtain acoustic encoding features; and, based on the dialect embedding features and the acoustic encoding features, performing dialect speech recognition on the speech data to obtain the corresponding target text and target dialect category. By jointly learning from the combined dialect embedding features and acoustic encoding features, the method can efficiently and accurately recognize speech in multiple dialects.
Owner:TENCENT TECH (SHENZHEN) CO LTD
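The combination of an utterance-level dialect embedding with frame-level acoustic encodings can be sketched as follows. The linear projections, pooling choice, and all dimensions are assumptions for illustration; the patent does not specify them:

```python
import numpy as np

rng = np.random.default_rng(1)

n_frames, feat_dim, embed_dim, enc_dim = 200, 80, 32, 128
frames = rng.standard_normal((n_frames, feat_dim))  # per-frame acoustic features

W_embed = rng.standard_normal((feat_dim, embed_dim)) * 0.05
W_enc = rng.standard_normal((feat_dim, enc_dim)) * 0.05

# Dialect embedding: deep features mean-pooled over the whole utterance.
dialect_embedding = np.tanh(frames @ W_embed).mean(axis=0)

# Acoustic encoding: one vector per frame.
acoustic_encoding = np.tanh(frames @ W_enc)

# Combine: tile the utterance-level dialect embedding onto every frame,
# giving the joint representation a decoder would consume.
joint = np.concatenate(
    [acoustic_encoding, np.tile(dialect_embedding, (n_frames, 1))], axis=1)
print(joint.shape)  # (200, 160)
```

The joint representation lets a single decoder condition its text hypothesis on which dialect it believes it is hearing.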

Multi-speaker voice synthesis method based on variational auto-encoder

The invention discloses a multi-speaker speech synthesis method based on variational auto-encoders. The method comprises the following steps: extracting phoneme-level duration parameters and frame-level acoustic parameters from the clean speech of the speaker to be synthesized; inputting the normalized phoneme-level duration parameters into a first variational auto-encoder, which outputs a duration speaker label; inputting the normalized frame-level acoustic parameters into a second variational auto-encoder, which outputs an acoustic speaker label; extracting frame-level and phoneme-level linguistic features from the speech signals to be synthesized, which cover multiple speakers; inputting the duration speaker label and the normalized phoneme-level linguistic features into a duration prediction network, which outputs the predicted duration of the current phoneme; deriving the frame-level linguistic features of the phoneme from its predicted duration and inputting them, together with the acoustic speaker label, into an acoustic parameter prediction network, which outputs the normalized acoustic parameters of the predicted speech; and inputting the normalized predicted acoustic parameters into a vocoder, which outputs the synthesized speech signal.
Owner:INST OF ACOUSTICS CHINESE ACAD OF SCI +1
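The two VAE "speaker label" steps can be sketched with the standard reparameterization trick. The toy linear encoder, mean-pooling, and latent dimension are assumptions; only the mu + sigma * eps sampling pattern is standard VAE machinery:

```python
import numpy as np

rng = np.random.default_rng(2)

def vae_speaker_label(params, latent_dim=8):
    """Encode normalized parameters into a latent 'speaker label'
    via the VAE reparameterization trick (toy linear encoder)."""
    d = params.shape[1]
    W_mu = rng.standard_normal((d, latent_dim)) * 0.1
    W_logvar = rng.standard_normal((d, latent_dim)) * 0.1
    pooled = params.mean(axis=0)          # utterance-level summary
    mu = pooled @ W_mu
    logvar = pooled @ W_logvar
    eps = rng.standard_normal(latent_dim)
    return mu + np.exp(0.5 * logvar) * eps

duration_params = rng.standard_normal((50, 5))    # phoneme-level durations
acoustic_params = rng.standard_normal((500, 60))  # frame-level acoustics

# First VAE -> duration speaker label; second VAE -> acoustic speaker label.
duration_label = vae_speaker_label(duration_params)
acoustic_label = vae_speaker_label(acoustic_params)
print(duration_label.shape, acoustic_label.shape)  # (8,) (8,)
```

Keeping separate labels for duration and acoustics lets the duration network and the acoustic-parameter network each be conditioned on the aspect of speaker identity relevant to it.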

Controllable emotion speech synthesis method and system based on emotion category labels

The invention discloses a controllable emotional speech synthesis method and system based on emotion category labels. The method comprises: a text feature extraction step, extracting speech text features from an input phoneme sequence; a voice style feature extraction step, receiving the acoustic features of a target voice corresponding to the phoneme sequence and extracting voice style features from them; a voice style memorization step, obtaining the emotional style features of the target voice from the voice style features; and an acoustic feature prediction step, predicting the acoustic features of synthetic emotional speech from the speech text features and the emotional style features. The method improves the decoupling of voice style features from voice text features, so that the style of the synthesized speech can be regulated independently of the text content, improving its controllability and flexibility; it also makes effective use of the emotion labels and emotional data distribution in the corpus, so that the voice style features of each emotion can be extracted more efficiently.
Owner:SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV
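A common way to realize a "style memorization" step is attention over a bank of learned style tokens, one region of which comes to represent each emotion category. The sketch below assumes that token-bank design; the token count, dimensions, and dot-product attention are illustrative choices, not the patent's specification:

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

n_tokens, style_dim = 4, 16  # e.g. one token per emotion category (assumption)
style_tokens = rng.standard_normal((n_tokens, style_dim))  # the "style memory"

# Voice style feature extracted from the reference acoustics.
voice_style = rng.standard_normal(style_dim)

# Memorization step: attend over the token bank; the weighted sum is the
# emotional style feature that conditions the acoustic-feature predictor.
attn = softmax(style_tokens @ voice_style)
emotion_style = attn @ style_tokens
print(emotion_style.shape)  # (16,)
```

Because the emotional style feature is built only from the token bank, it carries no text content, which is the decoupling property the abstract emphasizes.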

Method, device, computer equipment and storage medium for establishing voiceprint model

Active · CN108806696B · Reduces the voice recognition error rate · Topics: Speech recognition, Neural learning methods, Activation function, Speech Acoustics
The present application discloses a method, device, computer equipment, and storage medium for establishing a voiceprint model. The method includes: dividing the input voice signal of a target user into frames and extracting the speech acoustic features of each framed signal; inputting the speech acoustic features into a deep learning model trained on a neural network and assembling them into at least one cluster structure; calculating the mean and standard deviation of each cluster structure; applying a coordinate transformation to the mean and standard deviation and computing an activation function to obtain feature vector parameters; and inputting the feature vector parameters together with the target user's identity verification result into a preset basic model to obtain a voiceprint model corresponding to the target user. Because the extracted speech acoustic features are grouped into cluster structures during deep neural network training, and the cluster statistics then undergo coordinate mapping and activation-function computation, the resulting voiceprint model achieves a lower voice recognition error rate.
Owner:PING AN TECH (SHENZHEN) CO LTD
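The per-cluster statistics pipeline can be sketched in NumPy. The cluster assignments here are random stand-ins for what the deep model would produce, and the "coordinate transformation" is assumed to be a linear map followed by a tanh activation; the patent does not disclose the exact transform:

```python
import numpy as np

rng = np.random.default_rng(4)

n_frames, feat_dim, n_clusters, out_dim = 300, 24, 3, 8
frames = rng.standard_normal((n_frames, feat_dim))  # framed acoustic features
labels = rng.integers(0, n_clusters, size=n_frames)  # stand-in cluster assignments

vectors = []
for k in range(n_clusters):
    cluster = frames[labels == k]
    mu, sigma = cluster.mean(axis=0), cluster.std(axis=0)
    stats = np.concatenate([mu, sigma])  # per-cluster mean and std
    # "Coordinate transformation" as a linear map, then an activation function.
    W = rng.standard_normal((stats.size, out_dim)) * 0.1
    vectors.append(np.tanh(stats @ W))

embedding = np.concatenate(vectors)  # feature vector parameters for the voiceprint model
print(embedding.shape)  # (24,)
```

The fixed-length embedding, paired with the user's identity-verification result, is what the abstract feeds into the preset basic model to produce the final voiceprint model.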