Speech training data set enhancement method and device, equipment and storage medium

A speech training data enhancement technology, applied in speech analysis, speech recognition, instruments, and related fields. It addresses over-fitting of speech models caused by a lack of speech training data, improving model robustness and widening the scope of application.

Pending Publication Date: 2021-08-10

PING AN TECH (SHENZHEN) CO LTD

0 Cites, 2 Cited by


Problems solved by technology

[0003] The main purpose of the present invention is to provide a method, device, equipment and storage medium for enhancing a speech training data set, aiming to solve the problem that a speech model is prone to over-fitting during training due to a lack of speech training data.


Embodiment Construction

[0051] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0052] It should be noted that all directional indications (such as up, down, left, right, front, back, etc.) in the embodiments of the present invention are only used to explain the relative relationship between the components in a certain posture (as shown in the accompanying drawings). If the positional relationship, movement conditions, etc. change, the directional indication will also change accordingly, and the connection may be a direct connecti...


Abstract

The invention provides a speech training data set enhancement method and device, equipment and a storage medium. The method comprises: extracting the Mel spectrogram corresponding to each piece of speech training data; performing pixel rearrangement on it to obtain a temporary Mel spectrogram; setting, for each temporary Mel spectrogram, an erasure region area together with a shape parameter, a change parameter or a random erasure coefficient of the erasure region, thereby obtaining a plurality of extended Mel spectrograms; and converting each extended Mel spectrogram into corresponding target speech training data, which completes the supplementation of the speech training data. The method addresses the problem that a speech model is prone to overfitting during training when little speech training data is available. It improves the robustness of the speech model, helps keep it from overfitting, and widens its range of application.
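The erasure step in the abstract can be illustrated with a minimal sketch. This is not the patented procedure: the function names `random_erase` and `expand`, the parameters `area_ratio`, `aspect` and `fill`, and the sampling ranges are all assumptions made for illustration. The sketch takes a Mel spectrogram as a 2-D NumPy array and omits the Mel extraction, the pixel rearrangement, and the conversion back to audio, none of which the abstract specifies in detail.

```python
import numpy as np


def random_erase(mel, area_ratio=0.1, aspect=1.0, fill=None, rng=None):
    """Return a copy of `mel` with one random rectangle overwritten.

    area_ratio: erased area as a fraction of the spectrogram (assumed parameter).
    aspect:     height/width ratio of the rectangle (assumed "shape parameter").
    fill:       value written into the rectangle; defaults to the minimum of `mel`.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_mels, n_frames = mel.shape
    area = area_ratio * n_mels * n_frames
    # Derive rectangle height and width from the target area and aspect ratio,
    # then clamp them to the spectrogram bounds.
    h = min(max(int(round(np.sqrt(area * aspect))), 1), n_mels)
    w = min(max(int(round(np.sqrt(area / aspect))), 1), n_frames)
    # Pick the top-left corner so that the rectangle fits entirely inside.
    top = int(rng.integers(0, n_mels - h + 1))
    left = int(rng.integers(0, n_frames - w + 1))
    out = mel.copy()
    out[top:top + h, left:left + w] = mel.min() if fill is None else fill
    return out


def expand(mel, n_copies=4, seed=0):
    """Produce several erased variants of one spectrogram (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return [
        random_erase(
            mel,
            area_ratio=rng.uniform(0.02, 0.2),  # illustrative range, not from the source
            aspect=rng.uniform(0.5, 2.0),       # illustrative range, not from the source
            rng=rng,
        )
        for _ in range(n_copies)
    ]
```

Each call to `expand` yields several variants of the same spectrogram that differ only in where and how much is erased, which is the sense in which the abstract speaks of obtaining a plurality of extended spectrograms from one training sample.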

Description

Technical field

[0001] The invention relates to the field of artificial intelligence, in particular to a method, device, equipment and storage medium for enhancing a speech training data set.

Background

[0002] Speech recognition is a multidisciplinary field, closely connected with acoustics, phonetics, linguistics, digital signal processing theory, information theory, computer science and other disciplines. Its goal is to enable computers to take "dictation" of continuous speech from different speakers. Also known as a "speech dictation machine", it is a technology that converts "sound" into "text". Because speech training data is scarce, a speech model is prone to overfitting during training. An overfitted model achieves good results only on the training set and performs poorly on other data; that is, it lacks generalization ability.

Contents of the invention

[0003] The main purpose o...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G10L15/06

CPC: G10L15/063

Inventor: 唐彦玺, 王健宗, 瞿晓阳

Owner: PING AN TECH (SHENZHEN) CO LTD
