
Speech alignment method, device, electronic device and storage medium

A speech-frame-based alignment technique, applied in the field of information technology, that addresses the difficulty, labor cost, and heavy workload of manual phoneme labeling.

Active Publication Date: 2021-04-30
BEIJING CENTURY TAL EDUCATION TECH CO LTD
8 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Because voice files contain a large amount of data, manual alignment and labeling is laborious and difficult, and it incurs high labor costs.



Examples


Detailed Description of the Embodiments

[0060] In the following, only some exemplary embodiments are briefly described. As those skilled in the art would realize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present application. Accordingly, the drawings and descriptions are to be regarded as illustrative in nature and not restrictive.

[0061] Figure 1 is a flowchart of a speech alignment method according to an embodiment of the present application. As shown in Figure 1, the speech alignment method may include:

[0062] Step S110: use a speech alignment algorithm to obtain the predicted phoneme boundary point of the speech signal to be processed;

[0063] Step S120: expand the predicted phoneme boundary point to obtain a first frame expansion result, which comprises a plurality of consecutive speech frames including the speech frame where the predicted phoneme boundary point is located;

[0064]...
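
The frame expansion of step S120 can be illustrated with a minimal sketch. The function name `expand_boundary_frame`, the symmetric `radius` parameter, and the clipping at the signal edges are assumptions made for illustration; the source states only that the expansion result is a run of consecutive speech frames containing the predicted boundary frame.

```python
from __future__ import annotations


def expand_boundary_frame(boundary_frame: int, radius: int, num_frames: int) -> list[int]:
    """Return consecutive frame indices around boundary_frame (hypothetical helper).

    Assumption: a symmetric window of `radius` frames on each side,
    clipped to the valid index range [0, num_frames - 1].
    """
    if not 0 <= boundary_frame < num_frames:
        raise ValueError("boundary_frame is outside the signal")
    start = max(0, boundary_frame - radius)
    end = min(num_frames - 1, boundary_frame + radius)
    return list(range(start, end + 1))


# Predicted boundary at frame 10 of a 100-frame signal, expanded by 2 frames per side.
window = expand_boundary_frame(10, radius=2, num_frames=100)
print(window)
```

A symmetric window is only one possible choice; an asymmetric or adaptive window would satisfy the same description.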


Abstract

The present application proposes a speech alignment method, device, electronic device and storage medium. The specific implementation is as follows: use a speech alignment algorithm to obtain the predicted phoneme boundary point of the speech signal to be processed; expand the predicted phoneme boundary point to obtain a first frame expansion result, which includes the speech frame where the predicted phoneme boundary point is located; calculate the difference between the short-time average amplitude of each speech frame in the first frame expansion result and a preset short-time average amplitude threshold; and determine the speech frame with the minimum difference as the precise phoneme boundary point of the speech signal to be processed. Embodiments of the present application can reduce the error between the phoneme boundary point obtained by machine alignment and the actual phoneme boundary point, and reduce the time cost of manual labeling.
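
The selection step described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, frames are assumed to be non-overlapping with a fixed `frame_length`, the short-time average amplitude is taken as the mean absolute sample value of a frame, and the "difference" is taken as an absolute difference.

```python
from __future__ import annotations


def short_time_average_amplitude(samples: list[float], frame_index: int, frame_length: int) -> float:
    """Mean absolute sample value of one frame (assumes non-overlapping frames)."""
    start = frame_index * frame_length
    frame = samples[start:start + frame_length]
    if not frame:
        raise ValueError("frame_index is outside the signal")
    return sum(abs(s) for s in frame) / len(frame)


def refine_boundary(samples: list[float], candidate_frames: list[int],
                    frame_length: int, threshold: float) -> int:
    """Pick the candidate frame whose amplitude is closest to the threshold.

    Assumption: "difference" means the absolute difference between the
    frame's short-time average amplitude and the preset threshold.
    """
    def gap(frame_index: int) -> float:
        amplitude = short_time_average_amplitude(samples, frame_index, frame_length)
        return abs(amplitude - threshold)

    return min(candidate_frames, key=gap)


# Toy signal: four frames of four samples with amplitudes 0.8, 0.5, 0.1, 0.6.
samples = [0.8, -0.8, 0.8, -0.8,
           0.5, -0.5, 0.5, -0.5,
           0.1, -0.1, 0.1, -0.1,
           0.6, -0.6, 0.6, -0.6]

# Candidate frames 1..3 stand in for the first frame expansion result.
refined = refine_boundary(samples, [1, 2, 3], frame_length=4, threshold=0.05)
print(refined)
```

With a low threshold, the quietest candidate frame is selected, which matches the intuition that a phoneme boundary tends to fall where the signal energy dips.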

Description

Technical field

[0001] The present application relates to the field of information technology, and in particular to a speech alignment method, device, electronic device and storage medium.

Background

[0002] With the development of deep learning, speech synthesis has gradually matured and become widely used. In speech synthesis, the phoneme sequence of each sample speech signal and the corresponding phoneme boundary points must be labeled before the deep learning model is trained. A phoneme boundary point is the start or end time of a phoneme. The accuracy of the phoneme boundary points labeled in the sample speech signals directly affects the accuracy of the trained model, which in turn affects the quality of the synthesized speech.

[0003] In traditional speech synthesis, phoneme boundary points are generally obtained through machine annotation, that is, forced speech alignment...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L13/033, G10L13/10
CPC: G10L13/033, G10L13/10
Inventors: 郭立钊, 杨嵩, 王莎, 刘子韬
Owner: BEIJING CENTURY TAL EDUCATION TECH CO LTD