Speech training data set enhancement method and device, equipment and storage medium

A speech training data enhancement technology, applied in speech analysis, speech recognition, instruments, etc. It addresses speech-model overfitting caused by a lack of speech training data, improving model robustness and widening the model's range of application.

Pending Publication Date: 2021-08-10
PING AN TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0003] The main purpose of the present invention is to provide a method, device, equipment and storage medium for enhancing a speech training data set, aiming to solve the problem that a speech model is prone to over-fitting during training due to a lack of speech training data.

Embodiment Construction

[0051] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0052] It should be noted that all directional indications (such as up, down, left, right, front, back, etc.) in the embodiments of the present invention are only used to explain the relative relationship between the components in a certain posture (as shown in the accompanying drawings). If the positional relationship, movement conditions, etc. change, the directional indication will also change accordingly, and the connection may be a direct connecti...

Abstract

The invention provides a speech training data set enhancement method, device, equipment and storage medium. The method comprises: extracting the Mel spectrogram corresponding to each piece of speech training data; performing pixel rearrangement on it to obtain a temporary Mel spectrogram; setting the area of an erasure region and, for each temporary Mel spectrogram, setting a shape parameter, a change parameter or a random erasure coefficient of the erasure region, thereby obtaining a plurality of extended Mel spectrograms; and converting each extended Mel spectrogram into corresponding target speech training data, which completes the supplementation of the speech training data. This addresses the tendency of a speech model to overfit during training when speech training data is scarce, improves the robustness of the model, and widens its range of application.
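The erasure step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `area_frac` and `aspect` parameters, and the use of a small cyclic time shift as a stand-in for the "pixel rearrangement" step are all assumptions made for illustration, since the text available here does not specify those details. The input is assumed to be a Mel spectrogram already held as a 2-D array of shape (Mel bands, time frames).

```python
import numpy as np


def rearrange_pixels(mel, rng, max_shift=4):
    """Stand-in for the 'pixel rearrangement' step (an assumption):
    cyclically shift the spectrogram by a few frames along the time axis."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(mel, shift, axis=1)


def random_erase(mel, rng, area_frac=0.05, aspect=1.0, fill=None):
    """Erase one rectangle covering roughly `area_frac` of the spectrogram.

    `aspect` is the height/width ratio of the rectangle (a hypothetical
    'shape parameter'). Erased cells are set to `fill`, or to the
    spectrogram minimum when `fill` is None.
    """
    n_mels, n_frames = mel.shape
    area = area_frac * n_mels * n_frames
    # Derive rectangle height and width from the target area and aspect ratio.
    h = min(max(int(round(np.sqrt(area * aspect))), 1), n_mels)
    w = min(max(int(round(np.sqrt(area / aspect))), 1), n_frames)
    # Pick a random top-left corner so the rectangle fits inside the array.
    top = int(rng.integers(0, n_mels - h + 1))
    left = int(rng.integers(0, n_frames - w + 1))
    out = mel.copy()
    out[top:top + h, left:left + w] = mel.min() if fill is None else fill
    return out


def augment(mel, n_copies=3, seed=0, **erase_kwargs):
    """Produce several extended spectrograms from one input spectrogram."""
    rng = np.random.default_rng(seed)
    return [
        random_erase(rearrange_pixels(mel, rng), rng, **erase_kwargs)
        for _ in range(n_copies)
    ]


if __name__ == "__main__":
    # Synthetic 80-band, 200-frame array standing in for a real Mel spectrogram.
    mel = np.random.default_rng(1).random((80, 200)) + 1.0
    extended = augment(mel, n_copies=3, area_frac=0.05, aspect=1.0, fill=0.0)
    print(len(extended), extended[0].shape)
```

In a full pipeline, the Mel spectrogram would first be extracted from audio and each extended spectrogram converted back to a waveform afterwards; an audio library such as librosa could plausibly handle both conversions, but that part is outside this sketch.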

Description

Technical field

[0001] The invention relates to the field of artificial intelligence, in particular to a method, device, equipment and storage medium for enhancing a voice training data set.

Background

[0002] Speech recognition is a multidisciplinary field closely connected with acoustics, phonetics, linguistics, digital signal processing theory, information theory, computer science and other disciplines. Its goal is to enable a computer to "take dictation" of continuous speech spoken by different people; such a system, also known as a "speech dictation machine", converts "sound" into "text". Because speech training data is scarce, a speech model is prone to overfitting during training. An overfitted model achieves good results only on the training set, performs poorly on other data, and lacks generalization ability.

Contents of the invention

[0003] The main purpose o...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/06
CPC: G10L15/063
Inventor: 唐彦玺, 王健宗, 瞿晓阳
Owner: PING AN TECH (SHENZHEN) CO LTD