Speech conversion method and device, electronic equipment and readable storage medium

A speech conversion technology, applied in speech analysis, speech synthesis, instruments, and related fields. It addresses problems such as slow computation, degraded speech sound quality, and high system performance requirements, with the effect of reducing discontinuity, increasing computation speed, and saving computing resources.

Active Publication Date: 2017-12-22
XIAMEN MEITUZHIJIA TECH

AI Technical Summary

Problems solved by technology

TTS (Text-to-Speech) is a technology that converts text, either generated by the computer itself or input externally, into intelligible and fluent spoken output. However, speech synthesized by TTS generally has two problems: first, the timbre is limited to a small number of announcer samples, which cannot meet individual needs; second, the rhythm is unnatural, and the traces of synthesis are obvious.
The disadvantage of the prior-art method is that each K-nearest-neighbor feature vector search must traverse the entire target feature dictionary, so computation is slow and the demands on system performance are high.
In addition, the connection cost is calculated with a single frame as the unit, without considering the smoothness between speech frames. This loses instantaneous speech information and produces discontinuous synthetic speech, which greatly degrades sound quality.

Examples

Example 1

[0057] Please refer to Figure 2, which is a flow chart of the steps of the speech conversion method provided by the preferred embodiment of the present invention. The method is applied to the electronic device 100 described above, and the steps of the speech conversion method are described in detail below.

[0058] Step S110: Segment the speech to be converted of the speaker to be converted into a plurality of frame units to be converted, based on a preset segmentation rule.

[0059] In this embodiment, the range of speech to be converted can be selected by marking. Optionally, the speech to be converted can be selected from the speech of the speaker to be converted by calling an automatic speech marking tool.

[0060] After the marked speech to be converted is obtained, the speech to be converted is segmented using a preset segmentation rule, so that each segmented frame unit includes a plurality of continuous speech frames.
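
As an illustration of the segmentation in step S110, the following is a minimal sketch, not the patented rule itself. The source does not state the preset segmentation rule, so the fixed frame length, the hop sizes, and the function names `split_into_frames` and `group_into_units` are all assumptions made for this example.

```python
from typing import List, Sequence

Frame = List[float]


def split_into_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[Frame]:
    """Cut a signal into fixed-length speech frames, advancing `hop` samples each time.

    Assumption: fixed-length, overlapping frames; the source does not give the rule.
    """
    return [list(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, hop)]


def group_into_units(frames: List[Frame], frames_per_unit: int, unit_hop: int = 1) -> List[List[Frame]]:
    """Group consecutive frames so that each frame unit holds several continuous frames."""
    return [frames[i:i + frames_per_unit]
            for i in range(0, len(frames) - frames_per_unit + 1, unit_hop)]


# Toy signal of 20 samples standing in for the marked speech to be converted.
signal = [float(n) for n in range(20)]
frames = split_into_frames(signal, frame_len=4, hop=2)
units = group_into_units(frames, frames_per_unit=3)
```

Each element of `units` is one frame unit made of three continuous frames, which is the property paragraph [0060] requires. Real values for the frame length and the number of frames per unit would come from the preset segmentation rule.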

[0061] Step S120, extracting...

Example 2

[0119] Please refer to Figure 9, which is a structural block diagram of the speech conversion device 300 provided by the preferred embodiment of the present invention. The speech conversion device 300 includes a segmentation module 310, an extraction module 320, a calculation module 330, a matching module 340, and a processing module 350.

[0120] The segmentation module 310 is configured to segment the voice to be converted of the speaker to be converted into multiple frame units to be converted based on preset segmentation rules, wherein each frame unit to be converted includes multiple continuous voice frames.

[0121] The extraction module 320 is used to extract the Mel cepstrum feature of each frame unit to be converted.

[0122] In this embodiment, the extraction module 320 extracts the Mel cepstrum feature of the frame unit to be converted as follows:

[0123] Perform a time-frequency transform on the frame unit to be converted to obtain spectrum information ...
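
The time-frequency transform in [0123] can be sketched as follows. This covers only the step stated above (obtaining spectrum information per frame); the later steps of Mel cepstrum extraction are elided in the source and are not reproduced here. The use of a direct discrete Fourier transform without windowing, and the names `magnitude_spectrum` and `unit_spectra`, are assumptions for illustration.

```python
import cmath
import math
from typing import List


def magnitude_spectrum(frame: List[float]) -> List[float]:
    """Return |X[k]| for k = 0..N/2 using a direct DFT (O(N^2), adequate for a sketch)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum


def unit_spectra(frame_unit: List[List[float]]) -> List[List[float]]:
    """Spectrum information for a frame unit: one magnitude spectrum per continuous frame."""
    return [magnitude_spectrum(frame) for frame in frame_unit]


# Toy frame unit: three identical 16-sample frames holding a cosine at DFT bin 2.
n = 16
tone = [math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
frame_unit = [tone, tone, tone]
spectra = unit_spectra(frame_unit)
peak_bin = max(range(len(spectra[0])), key=spectra[0].__getitem__)
```

For the toy cosine, the energy concentrates in bin 2 of each frame's spectrum. A production implementation would normally use an FFT routine and a window function; both are omitted to keep the sketch dependency-free.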

Abstract

The invention provides a speech conversion method and device, electronic equipment, and a readable storage medium. The speech conversion method comprises the steps of: segmenting the speech to be converted into a plurality of frame units to be converted based on a preset segmentation rule; extracting the Mel cepstrum features of each frame unit to be converted; calculating a plurality of candidate frame units according to a phoneme dictionary and the Mel cepstrum features of each frame unit to be converted; matching according to the correspondence between frame units of the speaker to be converted and frame units of the target-timbre speaker to obtain target frame units; calculating the conversion cost to obtain an optimal path; and processing the target frame units on the optimal path to obtain the target speech. Because the candidate frame units are calculated within the phoneme dictionary, the method saves computing resources and increases computation speed compared with the prior art, which searches the whole target feature dictionary. In addition, the traditional single-frame calculation is replaced by multi-frame calculation, which greatly alleviates the problems of discontinuous synthetic speech and poor sound quality.
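
The candidate search and optimal-path steps summarized above can be sketched as follows. This is an illustration under stated assumptions, not the patent's formulas: the Euclidean distance, the definition of the conversion cost as target cost plus connection cost, the layout of `phoneme_dict` as (source feature, target feature) pairs per phoneme, and the names `candidates` and `optimal_path` are all hypothetical. Each feature vector stands in for the Mel cepstrum features of one multi-frame unit.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]  # (source-speaker feature, matched target-speaker feature)


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance (an assumed cost measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def candidates(feature: Vector, phoneme: str,
               phoneme_dict: Dict[str, List[Pair]], k: int) -> List[Pair]:
    """K nearest entries, searched only among the entries filed under one phoneme."""
    return sorted(phoneme_dict[phoneme], key=lambda p: distance(feature, p[0]))[:k]


def optimal_path(feats: List[Vector], cands: List[List[Pair]]) -> List[Vector]:
    """Dynamic programming over candidates; returns the target features on the cheapest path."""
    cost = [distance(feats[0], c[0]) for c in cands[0]]
    back: List[List[int]] = []
    for t in range(1, len(cands)):
        new_cost, pointers = [], []
        for src, tgt in cands[t]:
            # Connection cost: discontinuity between consecutive target units.
            prev = min(range(len(cands[t - 1])),
                       key=lambda j: cost[j] + distance(cands[t - 1][j][1], tgt))
            step = distance(cands[t - 1][prev][1], tgt) + distance(feats[t], src)
            new_cost.append(cost[prev] + step)
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [cands[t][j][1] for t, j in enumerate(path)]


# Toy data: one-dimensional features, two phonemes.
phoneme_dict = {
    "a": [((0.0,), (10.0,)), ((1.0,), (11.0,)), ((5.0,), (30.0,))],
    "b": [((2.0,), (12.0,)), ((2.1,), (40.0,))],
}
units = [((0.1,), "a"), ((2.05,), "b")]
feats = [f for f, _ in units]
cands = [candidates(f, ph, phoneme_dict, k=2) for f, ph in units]
converted = optimal_path(feats, cands)
```

In the toy data, the entry closest to the first unit is not the one selected: the path through target features 11.0 and 12.0 has a lower total cost because the connection cost penalizes the jump from 10.0 to 12.0. Restricting `candidates` to one phoneme's entries is what avoids traversing the whole dictionary on every search.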

Description

Technical field

[0001] The present invention relates to the technical field of speech information processing, and in particular to a speech conversion method and device, electronic equipment, and a readable storage medium.

Background art

[0002] After nearly half a century of development, speech synthesis technology has achieved fruitful results and plays an extremely important role in artificial intelligence and other fields. Among these technologies, TTS (Text-to-Speech) converts text, either generated by the computer itself or input externally, into intelligible and fluent spoken output. However, speech synthesized by TTS generally has two problems: first, the timbre is limited to a small number of announcer samples, which cannot meet individual needs; second, the rhythm is unnatural, and the traces of synthesis are obvious.

[0003] Timbre conversion (also known as voice conversion) is a technology that directly conver...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L21/013, G10L13/033, G10L25/48, G10L25/24
CPC: G10L13/033, G10L13/08, G10L21/013, G10L25/24, G10L25/48, G10L2021/0135
Inventor: 方博伟, 张康, 卓鹏鹏, 张伟, 尤嘉华
Owner: XIAMEN MEITUZHIJIA TECH