
37 results about how to "Improve speech synthesis" patented technology

Voice synthesis method and device

The invention discloses a voice synthesis method and device. The method comprises: performing text feature extraction on a text to be synthesized to obtain context feature information; obtaining a pre-generated model, which is trained on the context feature information of training samples and on converted acoustic parameters that include a plurality of rhythm-level (prosody-level) fundamental frequency (F0) parameters; determining, according to the model, the model output parameters corresponding to the context feature information, which likewise include a plurality of rhythm-level F0 parameters; reconstructing the F0 from these rhythm-level F0 parameters; and synthesizing speech from the reconstructed F0 together with the other model output parameters. The method can improve the quality of the synthesized speech.
Owner:BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
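The abstract does not say how the several rhythm-level (prosody-level) F0 parameters are recombined. The following is a minimal sketch, assuming an additive decomposition in the log-F0 domain (for example a phrase-level component plus a syllable-level component). The `(num_frames, log_f0_offset)` segment representation and the function names are hypothetical, not taken from the patent.

```python
import math
from typing import List, Tuple

# Hypothetical representation: each prosody level is a list of
# (num_frames, log_f0_offset) segments covering the whole utterance.
Level = List[Tuple[int, float]]


def expand_level(level: Level) -> List[float]:
    """Hold each segment's log-F0 offset constant over its frames."""
    frames: List[float] = []
    for num_frames, offset in level:
        frames.extend([offset] * num_frames)
    return frames


def reconstruct_f0(levels: List[Level], base_log_f0: float) -> List[float]:
    """Sum the per-level log-F0 components frame by frame and return F0 in Hz."""
    expanded = [expand_level(level) for level in levels]
    lengths = {len(frames) for frames in expanded}
    if len(lengths) != 1:
        raise ValueError("all levels must cover the same number of frames")
    num_frames = lengths.pop()
    return [
        math.exp(base_log_f0 + sum(frames[i] for frames in expanded))
        for i in range(num_frames)
    ]


phrase = [(4, 0.1)]               # phrase-level offset spanning 4 frames
syllable = [(2, 0.0), (2, -0.1)]  # second syllable lowered
f0 = reconstruct_f0([phrase, syllable], base_log_f0=math.log(120.0))
```

Working in log-F0 makes each level act as a multiplicative factor on pitch, which is why the sum is exponentiated only once at the end.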

Text processing method and device, model training method and device, storage medium and computer equipment

The invention relates to a text processing method and device, a model training method and device, a storage medium and computer equipment. The text processing method comprises: obtaining a target text; vectorizing each word of the target text to obtain a word vector sequence; inputting the word vector sequence into a rhythm prediction model, whose hidden layer processes the word vectors in their order in the sequence to obtain a semantic vector sequence; and outputting a predicted rhythm tag sequence obtained by mapping the semantic vector sequence through the model's output layer. The scheme improves text processing efficiency.
Owner:TENCENT TECH (SHENZHEN) CO LTD
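The abstract does not name the hidden-layer architecture. As an illustration only, the sketch below uses the simplest sequence model, an Elman-style recurrent layer: each word vector is combined with the previous hidden state to form a "semantic vector", and an output layer maps it to a tag. The weights, the tag set (`#0` for no break, `#1` for a break) and all names are hypothetical; a real model would be trained.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Sequence[float]) -> List[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def predict_prosody_tags(
    word_vectors: Sequence[Sequence[float]],
    w_in: Matrix,   # hidden_size x embedding_size
    w_rec: Matrix,  # hidden_size x hidden_size
    w_out: Matrix,  # num_tags x hidden_size
    tags: Sequence[str],
) -> List[str]:
    """Elman-style recurrent pass: one prosody tag per word, in sequence order."""
    hidden = [0.0] * len(w_rec)
    predicted: List[str] = []
    for vector in word_vectors:
        # Hidden layer: combine the current word with the previous state.
        pre = [a + b for a, b in zip(matvec(w_in, vector), matvec(w_rec, hidden))]
        hidden = [math.tanh(v) for v in pre]  # "semantic vector" for this word
        # Output layer: score every tag and keep the highest-scoring one.
        scores = matvec(w_out, hidden)
        predicted.append(tags[max(range(len(scores)), key=scores.__getitem__)])
    return predicted
```

Because the hidden state is carried forward, the tag for a word can depend on the words before it, which is the point of processing the vectors "according to their sequence".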

Speech synthesis method and speech synthesis system

Status: Active | Publication: CN106297765A | Keywords: Subjective sense of hearing, Improve speech synthesis, Speech synthesis, Voice data, Audiometry
The invention discloses a speech synthesis method and system. The method comprises: preprocessing a text to be synthesized to obtain its sequence of units to be synthesized and the context-related information of each unit; retrieving the optimal candidate speech units for each unit from a speech database according to that context information, and concatenating them to obtain candidate speech data for the unit sequence; obtaining listening-test results for the candidate speech data from listeners; training correction models for different acoustic features according to those results; re-selecting the optimal candidate speech units from the speech database according to the correction models and the context information, and concatenating them to obtain optimized speech data; and finally outputting the optimized speech data as the synthesized speech of the text. Because listeners' subjective perception is incorporated into the synthesis result, the speech synthesis effect is improved.
Owner:IFLYTEK CHANGJIANG INFORMATION TECH CO LTD (科大讯飞长江信息科技有限公司)
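Selecting the "optimal" candidate (alternative) units for a whole unit sequence is usually done with a Viterbi search that minimizes target cost plus concatenation (join) cost. This is a minimal sketch of that generic search, not iFLYTEK's implementation; the cost functions are hypothetical, and under this framing the patent's listening-test correction models would enter as adjustments to those costs.

```python
from typing import Callable, List, Sequence, TypeVar

U = TypeVar("U")


def select_units(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    join_cost: Callable[[U, U], float],
) -> List[U]:
    """Viterbi search for the unit sequence minimizing target + join cost."""
    if not candidates:
        return []
    # best[i][j]: lowest total cost of a path ending at candidate j of position i.
    best = [[target_cost(0, u) for u in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row: List[float] = []
        ptr: List[int] = []
        for u in candidates[i]:
            costs = [best[i - 1][k] + join_cost(prev, u)
                     for k, prev in enumerate(candidates[i - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_best] + target_cost(i, u))
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path: List[U] = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return path[::-1]
```

For example, with candidate pitch values `[[100, 200], [110, 210]]`, a target cost of distance from `[100, 110]` and a join cost of absolute pitch difference, the search returns `[100, 110]`.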

Speech synthesis method and system

Status: Active | Publication: CN106297766A | Keywords: Add prosodic features, Subjective sense of hearing, Speech synthesis, Hearing perception
The invention discloses a speech synthesis method and system. The method includes: receiving a text to be synthesized; preprocessing it to obtain its sequence of units to be synthesized and the context-related information of those units; obtaining the optimal candidate speech data for the unit sequence from a speech database according to the context-related information; obtaining monitoring results for the candidate speech data from monitoring personnel; expanding the speech database according to the monitoring results; obtaining the optimal candidate speech data for the unit sequence again from the expanded speech database, as optimized speech data; and outputting the optimized speech data as the synthesized speech of the text. The method and system can accurately incorporate listeners' subjective perception into the synthesis result, improving the speech synthesis effect.
Owner:IFLYTEK CO LTD

Front end design-based speech synthesis method

The invention provides a front end design-based speech synthesis method in the technical field of speech synthesis. The method addresses the data dependence and uncontrollable synthesis quality of current speech synthesis methods. It includes the following steps: step 1, preprocessing Chinese text data; step 2, extracting linguistic features of the Chinese text; step 3, extracting at least two acoustic features from an audio file; step 4, training a duration model and an acoustic model on the linguistic features and the acoustic features; step 5, calling the duration model from step 4 to obtain duration information for the Chinese text to be synthesized (already processed in steps 1 and 2), then feeding the linguistic features and the duration information into the acoustic model to obtain the corresponding acoustic features; and step 6, synthesizing the acoustic features from step 5 into audio data with a vocoder.
Owner:SICHUAN CHANGHONG ELECTRIC CO LTD
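Step 5 joins the two models: the duration model decides how many frames each linguistic unit lasts, and the acoustic model then runs frame by frame. A minimal sketch of that expansion, assuming one linguistic feature vector per unit and a duration model that returns a frame count; the relative-position input is a common choice in frame-level parametric systems but is an assumption here, and the acoustic model and vocoder (step 6) are omitted.

```python
from typing import Callable, List, Sequence


def build_acoustic_inputs(
    linguistic_features: Sequence[Sequence[float]],
    duration_model: Callable[[Sequence[float]], float],
) -> List[List[float]]:
    """Predict frame counts, then repeat each unit's features once per frame."""
    frames: List[List[float]] = []
    for features in linguistic_features:
        num_frames = max(1, round(duration_model(features)))  # at least one frame per unit
        for k in range(num_frames):
            # Hypothetical extra input: relative position of the frame inside the unit.
            frames.append(list(features) + [(k + 0.5) / num_frames])
    return frames
```

With two units whose features are `[1.0]` and `[2.0]` and a toy duration model `lambda f: 2 * f[0]`, the result has 2 + 4 = 6 frames, and the first frame is `[1.0, 0.25]`.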

Corpus expansion and speech synthesis system construction method and device based on artificial intelligence

The invention discloses an artificial intelligence-based method and device for corpus expansion and speech synthesis system construction. The method comprises: training a WaveNet model on the corpus in a small-sample sound library; using the WaveNet model to generate speech waveforms for given texts; adding the corpus corresponding to the generated waveforms to the small-sample sound library to obtain a large-sample sound library; and constructing a statistical parametric speech synthesis system from the corpus in the large-sample sound library. The scheme improves the speech synthesis effect while saving labor, material and time costs.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD

Speech synthesis method, device and system, and storage medium

The invention provides a speech synthesis method, device and system, and a storage medium. The method comprises: determining current scene information; acquiring all candidate speakers consistent with the current scene information; ranking the candidate speakers according to a preset rule to obtain a candidate speaker list; determining a target speaker from the candidate speaker list; and converting text information into target speech in the voice of the target speaker. A speaker consistent with the scene is thus selected automatically according to the received text and scene attributes, so the speech can be synthesized in the voice of the most suitable speaker for each scene; the synthesized speech sounds more realistic, the speech synthesis effect is improved, and the user experience is better.
Owner:BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
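The filter-rank-pick flow above can be sketched as follows. The abstract does not say what the "preset rule" is, so this sketch assumes a hypothetical per-speaker `rating` score with ties broken by name; the `Speaker` type and scene labels are also illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Speaker:
    name: str
    scenes: Set[str] = field(default_factory=set)  # scenes this voice suits
    rating: float = 0.0                            # hypothetical ranking score


def rank_candidates(speakers: List[Speaker], scene: str) -> List[Speaker]:
    """Keep speakers matching the scene, ranked by rating (ties broken by name)."""
    candidates = [s for s in speakers if scene in s.scenes]
    return sorted(candidates, key=lambda s: (-s.rating, s.name))


def choose_speaker(speakers: List[Speaker], scene: str) -> Optional[Speaker]:
    """Pick the top-ranked candidate, or None if no speaker fits the scene."""
    ranked = rank_candidates(speakers, scene)
    return ranked[0] if ranked else None
```

Returning `None` when no speaker matches leaves the fallback (for example a default voice) to the caller, which the abstract does not specify.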

Voice synthesis model training method, and voice synthesis method and device

The embodiment of the invention provides a voice synthesis model training method, and a voice synthesis method and device. The training method comprises: obtaining a first character vector sequence for a Chinese character sentence used in encoder training; encoding the first character vector sequence with an encoding module to obtain a first linguistic encoding feature; decoding that feature with a linguistic feature decoding module to obtain a linguistic decoding feature; and adjusting the model parameters of the encoding module of the voice synthesis model according to the linguistic feature loss between the linguistic decoding feature and a reference linguistic decoding feature, until the loss meets a linguistic feature loss threshold, thereby obtaining the trained encoding module. The training method, voice synthesis method and device reduce the complexity of voice synthesis while improving the training accuracy of the encoder, thereby ensuring the quality of the synthesized voice.
Owner:BEIJING XINTANG SICHUANG EDUCATIONAL TECH CO LTD
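The distinctive control flow here is "adjust the encoder until the linguistic feature loss meets a threshold". A minimal sketch of that stopping rule only, assuming the encoder and decoder collapse to a single linear weight trained by gradient descent on mean squared error; the real modules are neural networks, and `train_until_threshold`, its defaults and the toy data are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # (input feature, reference linguistic feature)


def mse(weight: float, pairs: List[Pair]) -> float:
    """Mean squared error between weight * x and the reference y."""
    return sum((weight * x - y) ** 2 for x, y in pairs) / len(pairs)


def train_until_threshold(
    pairs: List[Pair], lr: float = 0.1, threshold: float = 1e-4, max_steps: int = 10_000
) -> Tuple[float, float]:
    """Gradient descent that stops once the loss meets the threshold (or max_steps)."""
    weight = 0.0
    loss = mse(weight, pairs)
    steps = 0
    while loss > threshold and steps < max_steps:
        grad = sum(2 * (weight * x - y) * x for x, y in pairs) / len(pairs)
        weight -= lr * grad
        loss = mse(weight, pairs)
        steps += 1
    return weight, loss
```

The `max_steps` cap is an added safeguard: without it, a threshold the model can never reach would loop forever.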

Speech synthesis method and device, electronic equipment and storage medium

The invention discloses a speech synthesis method and device, electronic equipment and a storage medium, and relates to artificial intelligence fields such as deep learning and speech technology. The method comprises: when synthesizing a text, obtaining timbre features corresponding to the user identifier in the voice synthesis request, and obtaining at least one group of candidate rhythm features of the text based on that user identifier; selecting one group from the candidate rhythm features as the rhythm features of the text; and performing voice synthesis from the timbre features, the text and the rhythm features to obtain synthesized audio for the text. Because the audio is synthesized by combining the timbre features of the user identifier, the text and the rhythm features, it carries the voice characteristics of the corresponding user, sounds more realistic and natural, and the voice synthesis effect is improved.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD

Speech synthesis method and device, equipment and storage medium

The invention relates to artificial intelligence speech synthesis technology and provides a speech synthesis method and device, equipment and a storage medium. The method comprises: recognizing the phoneme sequence contained in a text and extracting context information from it; matching the length of the phoneme sequence against a preset Mel spectrum according to the context information, and judging from the matching result whether the phoneme sequence needs to be expanded; if so, preprocessing the text, determining the alignment information corresponding to the text, and expanding the phoneme sequence based on the alignment information until its length equals that of the preset Mel spectrum, yielding a target phoneme sequence; and synthesizing speech corresponding to the text from the target phoneme sequence. Because the phoneme sequence is expanded according to its context information in the recognized text, the synthesized speech has realistic rises, falls and pauses, and the speech synthesis effect is improved.
Owner:PING AN TECH (SHENZHEN) CO LTD
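The length check and expansion can be sketched with a length-regulator-style repeat, assuming the alignment information arrives as hypothetical per-phoneme frame counts (for example from a forced aligner). The abstract expands "until" the lengths match; with complete durations this happens in one pass, and a mismatch is reported instead of silently padded.

```python
from typing import List, Sequence


def needs_expansion(phonemes: Sequence[str], mel_frames: int) -> bool:
    """A phoneme sequence shorter than the Mel spectrum must be stretched."""
    return len(phonemes) < mel_frames


def expand_to_mel(phonemes: Sequence[str], durations: Sequence[int], mel_frames: int) -> List[str]:
    """Repeat each phoneme for its aligned number of frames."""
    if len(phonemes) != len(durations):
        raise ValueError("one duration per phoneme is required")
    if not needs_expansion(phonemes, mel_frames):
        return list(phonemes)
    expanded = [p for p, d in zip(phonemes, durations) for _ in range(d)]
    if len(expanded) != mel_frames:
        raise ValueError("alignment durations must sum to the Mel length")
    return expanded
```

For instance, `expand_to_mel(["n", "i", "h", "ao"], [2, 3, 2, 3], 10)` yields a 10-element sequence in which each phoneme occupies as many frames as the alignment assigns it.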

Word segmentation method and device

The invention provides a word segmentation method and device. The method may include: sending a text to be synthesized to a search engine and preprocessing the text; acquiring the search results returned by the search engine for that text, and acquiring a dictionary or model corresponding to the search results; and performing word segmentation on the preprocessed text based on that dictionary or model. Because the text to be synthesized is used to find a better-matching segmentation dictionary or model, word segmentation improves, and so does the quality of the synthesized voice.
Owner:BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
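Why the choice of dictionary matters can be shown with a dictionary-driven segmenter. The abstract does not name a segmentation algorithm; this sketch assumes forward maximum matching, and the two small dictionaries are hypothetical stand-ins for a general dictionary and one matched to search results. The test sentence 南京市长江大桥 ("Nanjing Yangtze River Bridge") is a classic ambiguity case.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy left-to-right segmentation that prefers the longest dictionary word."""
    max_len = max([len(w) for w in dictionary] + [1])
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing longer is in the dictionary.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


general = {"南京", "市长", "江", "大桥"}
place_names = {"南京市", "长江大桥"}
```

With `general` the sentence splits into 南京 / 市长 / 江 / 大桥 ("Nanjing / mayor / Jiang / bridge"); with `place_names` it splits correctly into 南京市 / 长江大桥, which in turn gives the synthesizer correct word boundaries for prosody.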

Prosodic mimic method and apparatus

A method and apparatus for synthesizing audible phrases (words) that includes capturing a spoken utterance, which may be a word, extracting prosodic information (parameters) therefrom, and applying the prosodic parameters to a synthesized (nominal) word to produce a prosodic mimic word corresponding to the spoken utterance and the nominal word.
Owner:NUANCE COMM INC
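A minimal sketch of "applying the prosodic parameters to a nominal word", assuming the extracted prosody is a frame-level F0 contour whose length also fixes the duration, and that the nominal word is a hypothetical sequence of spectral frames (plain labels here). A production system would align with dynamic time warping and resynthesize through a vocoder; nearest-frame retiming is a simplification chosen for brevity.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def mimic_prosody(
    spoken_f0: Sequence[float], nominal_frames: Sequence[Frame]
) -> List[Tuple[Frame, float]]:
    """Retime the nominal word to the spoken duration and impose the spoken pitch."""
    if not nominal_frames:
        raise ValueError("the nominal word needs at least one frame")
    n, m = len(spoken_f0), len(nominal_frames)
    # Map each output frame k to nominal frame floor(k * m / n): stretches or compresses.
    retimed = [nominal_frames[k * m // n] for k in range(n)]
    return list(zip(retimed, spoken_f0))
```

A four-frame spoken contour applied to a two-frame nominal word `["a", "b"]` stretches each nominal frame over two output frames, each carrying the spoken pitch value for that frame.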

Speech synthesis method and device based on rhythm feature prediction, terminal and medium

The invention discloses a speech synthesis method based on rhythm feature prediction. The method comprises: obtaining a text to be synthesized; inputting the text into a preset rhythm prediction model to obtain its rhythm features as first rhythm features, and determining target rhythm features from the first rhythm features, where the rhythm features of the text include rhythm word features, rhythm phrase features and rhythm intonation phrase features; and performing voice synthesis according to the target rhythm features to generate target speech for the text. The invention also discloses a speech synthesis device based on rhythm feature prediction, an intelligent terminal and a computer readable storage medium. The method and device improve the accuracy of text rhythm feature prediction and thereby the speech synthesis effect.
Owner:UBTECH ROBOTICS CORP LTD
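The three feature levels (rhythm word, rhythm phrase, intonation phrase) are nested, so one plausible way to turn per-level predictions into target features is to keep, at each word boundary, the highest level whose probability passes a threshold. This rule, the threshold and the probability inputs are assumptions, not the patent's stated method; the `#1`/`#2`/`#3` markup follows a convention common in Mandarin TTS corpora.

```python
from typing import List, Sequence


def break_indices(
    p_word: Sequence[float],
    p_phrase: Sequence[float],
    p_intonation: Sequence[float],
    threshold: float = 0.5,
) -> List[int]:
    """Pick, for each word boundary, the highest level whose probability passes the threshold."""
    levels: List[int] = []
    for pw, pp, ip in zip(p_word, p_phrase, p_intonation):
        if ip >= threshold:
            levels.append(3)  # intonation phrase boundary (implies the lower levels)
        elif pp >= threshold:
            levels.append(2)  # rhythm (prosodic) phrase boundary
        elif pw >= threshold:
            levels.append(1)  # rhythm (prosodic) word boundary
        else:
            levels.append(0)  # no break
    return levels


def annotate(words: Sequence[str], levels: Sequence[int]) -> str:
    """Render the breaks as '#1/#2/#3' markers after each word."""
    return "".join(w + (f"#{lv}" if lv else "") for w, lv in zip(words, levels))
```

Checking the highest level first guarantees a consistent hierarchy: an intonation-phrase boundary is never emitted as a lower-level break.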

Chinese and English mixed speech synthesis method and device, electronic equipment and storage medium

Status: Pending | Publication: CN113380221A | Keywords: Improve speech synthesis, Improve user interaction experience, Speech synthesis, Transformation of text, Acoustic model
The invention relates to the technical field of language processing, and provides a Chinese and English mixed speech synthesis method and device, electronic equipment and a storage medium. The speech synthesis method comprises the following steps: regularizing an initial text containing a Chinese text and an English text, converting the Chinese text into pinyin with tones, and converting the English text into words; aligning the regularized text with the corresponding initial audio to obtain an aligned text with a pause rhythm; carrying out phoneme conversion on the aligned text, and respectively converting pinyin and words in the aligned text into corresponding CMU phonemes; converting each CMU phoneme into a phoneme vector, inputting the phoneme vector into an acoustic model, and obtaining a Mel spectrum feature corresponding to the initial text; and inputting the Mel spectrum features into a vocoder to synthesize a target audio. By converting Chinese and English into unified CMU phonemes, Chinese and English pronunciations are mapped to the same pronunciation space, and the synthesis effect of the Chinese and English mixed speech is effectively improved.
Owner:CTRIP TECH (SHANGHAI) CO LTD (携程科技(上海)有限公司)
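The core idea, mapping Chinese pinyin and English words into one CMU phoneme space, can be sketched as two lookups feeding a single phoneme list. The lexicons are tiny hypothetical stand-ins: a real system would use a full hanzi-to-pinyin converter and the CMU Pronouncing Dictionary. Where the tone goes is not stated in the abstract; here it is assumed to ride on the vowel like a CMU stress digit. The alignment, acoustic model and vocoder steps are omitted.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicons standing in for a pinyin converter and CMUdict.
HANZI_TO_PINYIN: Dict[str, str] = {"你": "ni3", "好": "hao3"}
# Hypothetical pinyin-to-ARPAbet table; the tone digit is attached to the vowel.
PINYIN_TO_CMU: Dict[str, List[str]] = {"ni3": ["N", "IY3"], "hao3": ["HH", "AW3"]}
# CMUdict-style entry (CMUdict lists HELLO as HH AH0 L OW1).
CMU_DICT: Dict[str, List[str]] = {"HELLO": ["HH", "AH0", "L", "OW1"]}

# One Han character, or one run of Latin letters/apostrophes.
TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def is_han(token: str) -> bool:
    return len(token) == 1 and "\u4e00" <= token <= "\u9fff"


def to_cmu_phonemes(text: str) -> List[str]:
    """Map mixed Chinese/English text onto one shared CMU-style phoneme sequence."""
    phonemes: List[str] = []
    for token in TOKEN.findall(text):
        if is_han(token):
            phonemes.extend(PINYIN_TO_CMU[HANZI_TO_PINYIN[token]])
        else:
            phonemes.extend(CMU_DICT[token.upper()])
    return phonemes
```

Spaces and punctuation are skipped by the tokenizer, and any word missing from the toy lexicons raises `KeyError`, which makes gaps in coverage obvious rather than silently dropping sounds.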

Speech synthesis method and device, equipment and medium

The invention provides a speech synthesis method and device, equipment and a medium. The method comprises: obtaining semantic features, phoneme features and acoustic features of a target text; performing a first alignment operation on the semantic features and the acoustic features to obtain a first alignment result; performing a second alignment operation on the phoneme features and the acoustic features to obtain a second alignment result; fusing features according to the first and second alignment results to obtain a fused feature; and generating synthetic speech for the target text based on the fused feature. The invention can effectively improve the speech synthesis effect.
Owner:BEIJING CENTURY TAL EDUCATION TECH CO LTD

Speech synthesis model training method and device, electronic equipment and storage medium

The embodiment of the invention relates to the technical field of natural language processing and discloses a speech synthesis model training method and device, electronic equipment and a storage medium. The model comprises a generator and a discriminator, and the training method comprises: obtaining a plurality of pieces of first audio data labeled with text information to build a training sample set; inputting the text information into the generator to obtain second audio data output by the generator; inputting the first audio data and the second audio data into the discriminator to obtain a discrimination result representing the degree of similarity between them; and iteratively training the speech synthesis model according to the discrimination result and a preset loss function. Because no intermediate conversion step is involved, accumulated errors are eliminated and the speech synthesis effect of the model is improved; the model can also be trained to convergence with only a small number of training samples, reducing training cost.
Owner:CLOUDMINDS BEIJING TECH CO LTD

Speech synthesis method, speech synthesis device and intelligent equipment

Status: Pending | Publication: CN112530404A | Keywords: Improve speech synthesis, Guaranteed Speech Synthesis, Speech synthesis, Word list, Chinese word
The invention discloses a speech synthesis method and device, intelligent equipment and a computer readable storage medium. The method comprises: performing word segmentation on an input text based on a preset word segmentation algorithm to obtain a Chinese word list and an English word list; determining the pinyin of each Chinese word in the Chinese word list; searching for the phonemes of each English word in the English word list based on a preset word prefix dictionary; if a target English word exists, inputting it into a grapheme-to-phoneme model to obtain the phonemes that the model outputs for it; and performing speech synthesis of the input text according to the pinyin of each Chinese word and the phonemes of each English word. The scheme improves the speech synthesis effect of intelligent equipment on mixed Chinese and English text.
Owner:UBTECH ROBOTICS CORP LTD
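The English branch of this scheme is a dictionary lookup with a model fallback. In this sketch a "target English word" is taken to mean a word missing from the lexicon, which is an interpretation of the abstract; `spell_out` is a hypothetical placeholder for a trained grapheme-to-phoneme model, and the lexicon entry is illustrative.

```python
from typing import Callable, Dict, List, Sequence

Lexicon = Dict[str, List[str]]


def english_phonemes(
    words: Sequence[str], lexicon: Lexicon, g2p: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    """Look each English word up in the lexicon; send misses to the G2P model."""
    result: Dict[str, List[str]] = {}
    for word in words:
        key = word.lower()
        # A word missing from the lexicon is treated as the "target English word".
        result[word] = lexicon[key] if key in lexicon else g2p(word)
    return result


def spell_out(word: str) -> List[str]:
    """Placeholder for a trained grapheme-to-phoneme model (hypothetical)."""
    return list(word.upper())
```

Looking words up case-insensitively but keying the result by the original spelling keeps the output aligned with the English word list produced by segmentation.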

Speech synthesis method and device, computer equipment, storage medium and product

The embodiment of the invention discloses a speech synthesis method and device, computer equipment, a storage medium and a product, and the method comprises the steps: obtaining a text of a to-be-synthesized speech, and determining the type of the to-be-synthesized speech; performing fusion processing on the reference audio feature information corresponding to the voice type and a text unit in the text to obtain text voice feature information; determining a target duration prediction network according to the voice type; predicting audio duration information corresponding to the text unit according to the target duration prediction network and the text voice feature information; performing duration matching processing on the text voice feature information according to the audio duration information to obtain matched text voice feature information; and performing speech synthesis processing according to the matched text speech feature information to obtain target speech. According to the scheme, accurate text speech feature information can be extracted, and the corresponding duration prediction network is adopted according to the speech type, so that the synthesized target speech keeps the tone, rhythm and other information of the speech type, and the speech synthesis effect is improved.
Owner:TENCENT TECH (SHENZHEN) CO LTD

Speech synthesis method, speech synthesis device and intelligent equipment

Status: Pending | Publication: CN112530402A | Keywords: Guaranteed Speech Synthesis, Improve speech synthesis, Speech synthesis, Word list, Chinese word
The invention discloses a speech synthesis method and device, intelligent equipment and a computer readable storage medium. The method comprises the steps: performing word segmentation processing on an input text based on a preset word segmentation algorithm to obtain a Chinese word list and an English word list; determining pinyin corresponding to each Chinese word in the Chinese word list, and searching phonemes respectively corresponding to each English word in the English word list based on a preset word prefix dictionary; if the target English word exists, determining a target phoneme acquisition mode according to the occurrence frequency of the target English word in the input text; obtaining phonemes corresponding to the target English words based on the target phoneme acquisition mode; and performing speech synthesis of the input text according to the pinyin of each Chinese word and the phoneme of each English word. Through the scheme of the invention, the speech synthesis effect of the intelligent equipment facing the Chinese and English mixed text can be improved.
Owner:UBTECH ROBOTICS CORP LTD

Voice synthesis method and device and electronic equipment

Status: Active | Publication: CN110047463A | Keywords: Improving the effectiveness of traditional learning, Improve speech synthesis, Speech synthesis, Synthesis methods, Statistical learning
The embodiment of the invention provides a voice synthesis method and device and electronic equipment. In the technical scheme, deep learning technology is moderately introduced into the unit-selection concatenative synthesis path without fully abandoning traditional statistical learning technology, so the advantages of both are obtained. The key innovation is using a deep learning model to generate simulated data that is fed back into the training of the traditional statistical learning model, improving traditional learning in terms of both algorithms and data and thereby improving the voice synthesis effect.
Owner:BEIJING SINOVOICE TECH CO LTD

Voice information acquisition method and device, equipment and storage medium

Status: Pending | Publication: CN112927676A | Keywords: Improve speech synthesis, In line with pronunciation habits, Speech synthesis, Speech sound
The embodiment of the invention discloses a voice information obtaining method and device, equipment and a storage medium, and the method comprises the steps: obtaining a text corpus of a first language, and judging whether the text corpus comprises foreign words of a second language or not; if the text corpus comprises foreign words of the second language, obtaining phoneme information of the foreign words in the second language; and obtaining the phoneme information of the foreign words in the first language according to the phoneme association relationship between the first language and the second language and the phoneme information of the foreign words in the second language. According to the technical scheme disclosed by the embodiment of the invention, the finally obtained phoneme information of the foreign word is close to the pronunciation of the vocabulary in the source language system and accords with the pronunciation habit in the current language system, so that the speech synthesis effect of converting the text corpus into the speech information is improved.
Owner:BEIJING YOUZHUJU NETWORK TECH CO LTD
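The final step, converting a foreign word's second-language phonemes into first-language phonemes via a phoneme association relationship, can be sketched as a table lookup. The association table below (English ARPAbet symbols to pinyin-style Mandarin units) and the optional fallback symbol are hypothetical; entries map to lists so that one source phoneme can become several target phonemes.

```python
from typing import Dict, List, Optional, Sequence


def map_foreign_phonemes(
    source_phonemes: Sequence[str],
    association: Dict[str, List[str]],
    fallback: Optional[str] = None,
) -> List[str]:
    """Rewrite second-language phonemes as first-language phonemes."""
    mapped: List[str] = []
    for phoneme in source_phonemes:
        if phoneme in association:
            mapped.extend(association[phoneme])  # one-to-many mappings allowed
        elif fallback is not None:
            mapped.append(fallback)
        else:
            raise KeyError(f"no mapping for phoneme {phoneme!r}")
    return mapped


# Hypothetical association table: English ARPAbet -> Mandarin-style units.
assoc = {"OW": ["ou"], "K": ["k"], "EY": ["ei"]}
```

Mapping the English word "OK" (`OW K EY`) through this table gives `["ou", "k", "ei"]`, a pronunciation close to the source word but expressed in the target language's phoneme inventory.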

Speech synthesis method and related device and equipment

The invention discloses a speech synthesis method, a related device and equipment, and the method comprises the steps: obtaining the phoneme, pitch and phoneme duration of a target object; obtaining a to-be-synthesized object, determining a Mel spectrum of the to-be-synthesized object, and extracting a timbre characteristic matrix of the to-be-synthesized object based on the Mel spectrum; the phoneme, the pitch and the phoneme duration of the target object are coded through a speech synthesis model, and coded data are obtained through the speech synthesis model; and decoding the timbre characteristic matrix and the coded data to obtain synthetic speech of the object to be synthesized. According to the scheme, speech synthesis is performed on the to-be-synthesized object based on the phoneme, the pitch and the phoneme duration of the target object, and the tone of the to-be-synthesized object can be restored in the synthesized speech based on the phoneme, the pitch and the phoneme duration of the target object, so that the speech synthesis precision and effect are improved.
Owner:GUANGZHOU HUYA TECH CO LTD

Speech synthesis method and device, computer readable storage medium and terminal equipment

The invention belongs to the technical field of voice processing, and particularly relates to a voice synthesis method and device, a computer readable storage medium and terminal equipment. The method comprises the steps of obtaining a target text to be subjected to speech synthesis, and performing text analysis on the target text to obtain a joint feature of the target text; performing feature mapping on the joint feature by using a preset acoustic model to obtain an acoustic feature corresponding to the target text; performing voice synthesis on the acoustic features by using a preset vocoder to obtain target voice corresponding to the target text; wherein the acoustic model and the vocoder are obtained through joint training of a preset training data set in advance. According to the speech synthesis method and device, the acoustic model and the vocoder are obtained through joint training instead of being obtained through separate training, and therefore the final speech synthesis effect can be improved on the whole.
Owner:UBTECH ROBOTICS CORP LTD

Speech data amplification method and system for speech synthesis

The embodiment of the invention provides a voice data amplification method for voice synthesis. The method comprises: inputting collected real voices of multiple speakers into a style extraction system to obtain a style representation of each speaker; using the text data and each speaker's style representation as the input of a speech synthesis system for end-to-end modeling, and outputting synthesized speech data carrying that style representation; and using an audio identification system to select, as amplified voice data, the synthesized voice data judged to come from real scenes. The embodiment of the invention also provides a voice data amplification system for voice synthesis. In the embodiment, a large amount of multi-speaker, multi-language, multi-scene real-world data, rather than studio-recorded high-quality data, is used for speech synthesis modeling, which enriches the modeling capability of the speech synthesis system; a style extraction module is introduced to strengthen the system's ability to model real-scene data; and the overall speech synthesis effect of the system is improved.
Owner:AISPEECH CO LTD

Speech synthesis model training method, speech synthesis method and device

The embodiment of the present invention provides a speech synthesis model training method, a speech synthesis method and a device. The training method includes: obtaining a first word vector sequence corresponding to a Chinese character sentence used for encoder training; encoding the first word vector sequence with an encoding module to obtain a first linguistic coding feature; decoding the first linguistic coding feature with a linguistic feature decoding module to obtain a linguistic decoding feature; and, according to the linguistic feature loss between the linguistic decoding feature and a reference linguistic decoding feature, adjusting the model parameters of the encoding module of the speech synthesis model until the linguistic feature loss meets a linguistic feature loss threshold, thereby obtaining the trained encoding module. The speech synthesis model training method, speech synthesis method and related devices can reduce the complexity of speech synthesis, improve the training accuracy of the encoder, and ensure the quality of the synthesized speech.
Owner:BEIJING XINTANG SICHUANG EDUCATIONAL TECH CO LTD

Speech synthesis method and device based on low-resource language, equipment and storage medium

The invention relates to artificial intelligence technology and discloses a speech synthesis method based on a low-resource language, comprising: determining a low-resource language and a high-resource language, and obtaining text in the low-resource language as a low-resource language text; converting the low-resource language text into a low-resource language phoneme text; translating the low-resource language phoneme text into a high-resource language phoneme text using a translation model trained with dual learning; and performing speech synthesis on the high-resource language phoneme text with a pre-trained speech synthesis model to obtain speech in the language. The invention also relates to blockchain technology: the low-resource language text can be stored in a node of a blockchain. The invention further provides a speech synthesis device based on the low-resource language, electronic equipment and a computer readable storage medium. The invention provides a speech synthesis method for low-resource languages that improves the speech synthesis effect.
Owner:PING AN TECH (SHENZHEN) CO LTD