Speech recognition method, device, system and storage medium
A speech recognition and speech recognition model technology, applied in speech recognition, speech analysis, instruments, etc., that addresses the problem of low speech recognition accuracy and achieves the effect of improving algorithm accuracy and speech recognition performance.
Examples
Embodiment 1
[0025] This embodiment of the present invention provides a speech recognition method, which mainly comprises:
[0026] S1. Collect a voice sample data set;
[0027] Specifically, a sound pickup can be used to collect sounds in various workplaces or social environments. In actual operation, speech at different decibel levels and in different languages can be collected as needed.
[0028] S2. Obtain the speech feature image of the speech sample data set;
[0029] Specifically, before the speech feature image of the speech sample data set is acquired, the speech sample data set is preprocessed. The preprocessing includes operations such as noise reduction, pre-emphasis, framing, and windowing. The purpose of these operations is to eliminate the influence of aliasing, high-order harmonic distortion, high frequency and other factors, caused by the sound itself and by the equipment for collecting voice signals, on the quality...
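The pre-emphasis, framing and windowing steps named above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not say what kind of feature image is used, so the log-power spectrogram, the Hamming window, the coefficient `alpha=0.97`, the frame length of 400 samples and the hop of 160 samples are all assumptions, and noise reduction is omitted.

```python
import numpy as np


def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of frame_len samples, hop samples apart."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def feature_image(signal, frame_len=400, hop=160):
    """Return a log-power spectrogram of shape (frequency bins, frames).

    The spectrogram stands in for the "speech feature image"; the source
    does not specify which image representation is actually used.
    """
    emphasized = pre_emphasis(np.asarray(signal, dtype=np.float64))
    frames = frame_signal(emphasized, frame_len, hop) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10).T
```

With these assumed settings, one second of audio sampled at 16 kHz yields an image with 201 frequency rows and 98 time columns.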
Embodiment 2
[0058] Corresponding to the above embodiments, this embodiment provides a speech recognition device, which includes:
[0059] A speech processing unit 1, used to extract the speech feature image of the speech sample data set;
[0060] A calibration unit 2, used to calibrate the speech feature image, use the classification task loss to judge the category information of the recognition target, and use the target detection method to predict the position of the recognition target;
[0061] A model training unit 3, configured to use the training network to train on the calibrated speech feature image to obtain a speech recognition model.
[0062] The functions performed by each component of the device provided in this embodiment are described in detail in Embodiment 1, so details are not repeated here.
[0063] In the embodiment of the present invention, by extracting the image features of the voice signal, the start position, end position and corresponding category information ...
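One way to picture the calibration output is a record holding a start position, an end position and a category on the time axis of the feature image. The sketch below is hypothetical: the `Annotation` type, the `annotate` helper, the 16 kHz sample rate and the hop of 160 samples are assumptions made for illustration and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical label for one recognition target on the feature image."""
    start_col: int  # first time column covered by the target
    end_col: int    # time column at which the target ends
    category: str   # class label of the target


def annotate(start_s, end_s, category, sample_rate=16000, hop=160):
    """Map start/end times in seconds to time-column positions on the image."""
    if end_s <= start_s:
        raise ValueError("end time must be later than start time")
    cols_per_second = sample_rate / hop  # assumed: one image column per hop
    return Annotation(
        start_col=round(start_s * cols_per_second),
        end_col=round(end_s * cols_per_second),
        category=category,
    )
```

For example, under these assumptions `annotate(0.5, 1.25, "hello")` marks columns 50 to 125 as the target labelled "hello".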
Embodiment 3
[0065] Corresponding to the above-mentioned embodiments, this embodiment provides a speech recognition system, which includes: at least one processor 5 and at least one memory 4;
[0066] The memory 4 is used to store one or more program instructions;
[0067] The processor 5 is used to run one or more program instructions to execute a speech recognition method.
[0068] In the embodiment of the present invention, by extracting the image features of the voice signal, the start position, end position and category information corresponding to the start and end time information of the recognition target are obtained. In the time dimension, the recognition targets do not overlap, and a repulsion loss function is used to solve the overlap between a prediction frame and the adjacent real target frame, as well as the overlap between prediction frames, thereby improving the accuracy of the algorithm and making t...
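The source does not give the formula of its repulsion loss. As a rough sketch, the two repulsion terms can be written for one-dimensional time intervals by following the general shape of the repulsion loss known from object detection: one term penalises a prediction frame for overlapping a real target frame other than its own, and the other penalises overlap between prediction frames assigned to different targets. The function names, the `sigma=0.5` default and the simple averaging are assumptions, and the attraction term that pulls each prediction toward its own target is left out.

```python
import math


def overlap(a, b):
    """Length of the overlap between two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def iou(a, b):
    """Intersection over union of two intervals."""
    inter = overlap(a, b)
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def iog(pred, gt):
    """Intersection over the length of the real target interval."""
    length = gt[1] - gt[0]
    return overlap(pred, gt) / length if length > 0 else 0.0


def smooth_ln(x, sigma=0.5):
    """Penalty that grows with overlap x in [0, 1]; linear above sigma (0 <= sigma < 1)."""
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)


def rep_gt(preds, targets, gts):
    """Penalise each prediction for overlapping its most-overlapped non-target real frame.

    preds[i] is assigned to the real frame gts[targets[i]].
    """
    total = 0.0
    for pred, t in zip(preds, targets):
        others = [g for j, g in enumerate(gts) if j != t]
        if others:
            nearest = max(others, key=lambda g: iou(pred, g))
            total += smooth_ln(iog(pred, nearest))
    return total / max(len(preds), 1)


def rep_box(preds, targets):
    """Penalise overlap between predictions assigned to different real frames."""
    total, pairs = 0.0, 0
    for i in range(len(preds)):
        for j in range(i + 1, len(preds)):
            if targets[i] != targets[j]:
                total += smooth_ln(iou(preds[i], preds[j]))
                pairs += 1
    return total / max(pairs, 1)
```

Both terms are zero when every prediction frame stays clear of neighbouring targets and of the other predictions, which matches the stated premise that recognition targets do not overlap in the time dimension.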