Speech recognition method, device, system and storage medium

A speech recognition and speech recognition model technology, applied in speech recognition, speech analysis, instruments and similar fields, that addresses the problem of low speech recognition accuracy by improving algorithm accuracy and recognition performance.

Active Publication Date: 2022-03-01

北京探境科技有限公司

Problems solved by technology

[0005] The purpose of the embodiments of the present invention is to provide a speech recognition method, device, system and storage medium that solve the problem of low accuracy in existing speech recognition.

Examples

Embodiment 1

[0025] This embodiment of the present invention provides a speech recognition method, which mainly comprises:

[0026] S1. Collect a speech sample data set;

[0027] Specifically, a sound pickup can be used to collect sounds in various workplaces or social environments. In actual operation, speech at different decibel levels and in different languages can be collected as needed.

[0028] S2. Obtain the speech feature image of the speech sample data set;

[0029] Specifically, before the speech feature image of the speech sample data set is acquired, the speech sample data set is preprocessed. The preprocessing includes operations such as noise reduction, pre-emphasis, framing, and windowing. The purpose of these operations is to eliminate the influence of sound, aliasing, high-order harmonic distortion, high-frequency effects and other factors caused by the equipment used to collect voice signals on the quality...
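The preprocessing steps named in [0029] can be illustrated with a minimal sketch. It is not the patent's implementation: the function names, the pre-emphasis coefficient 0.97, the Hamming window, and the 400-sample frame with a 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sample rate) are all common defaults chosen here for illustration. Noise reduction is omitted because the source does not say which method is used.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1] to boost high frequencies (alpha is an assumed value)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def frame_signal(signal, frame_len, hop_len):
    """Split the signal into overlapping frames; a trailing partial frame is dropped."""
    if len(signal) < frame_len:
        return []
    count = 1 + (len(signal) - frame_len) // hop_len
    return [signal[i * hop_len:i * hop_len + frame_len] for i in range(count)]


def hamming_window(frame_len):
    """Return the symmetric Hamming window coefficients for one frame."""
    if frame_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]


def preprocess(signal, frame_len=400, hop_len=160, alpha=0.97):
    """Pre-emphasize, frame, and window a list of samples (hypothetical helper)."""
    window = hamming_window(frame_len)
    frames = frame_signal(pre_emphasis(signal, alpha), frame_len, hop_len)
    return [[s * w for s, w in zip(frame, window)] for frame in frames]
```

Each windowed frame would then typically be transformed into one column of a time-frequency image, although the source text shown here does not state which transform produces the speech feature image.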

Embodiment 2

[0058] Corresponding to the above embodiments, this embodiment provides a speech recognition device, which includes:

[0059] A speech processing unit 1, used to extract the speech feature image of the speech sample data set;

[0060] The calibration unit 2 is used to calibrate the speech feature image, use the classification task loss to judge the category information of the recognition target, and use the target detection method to predict the position of the recognition target.

[0061] The model training unit 3 is configured to use the training network to train the calibrated speech feature image to obtain a speech recognition model.

[0062] The functions performed by each component of the device provided in this embodiment are described in detail in Embodiment 1, so details are not repeated here.
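As a hedged illustration of what the calibration unit produces, the sketch below represents one calibrated recognition target as a time interval plus a category and maps it onto column indices of the speech feature image. The `Target` class, the `to_frame_box` helper, the 16 kHz sample rate, and the 160-sample hop are assumptions made for this example; the source does not specify the annotation format.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """One calibrated recognition target (hypothetical annotation format)."""
    start_s: float   # start time in seconds
    end_s: float     # end time in seconds
    category: str    # category label of the target


def to_frame_box(target, sample_rate=16000, hop_len=160):
    """Map a time interval to [start_col, end_col) columns of the feature image."""
    frames_per_second = sample_rate / hop_len
    # Rounding guards against floating-point noise before floor/ceil.
    start_col = math.floor(round(target.start_s * frames_per_second, 6))
    end_col = math.ceil(round(target.end_s * frames_per_second, 6))
    return start_col, end_col, target.category
```

Under these assumptions a target spanning 0.5 s to 1.25 s maps to columns 50 through 125, which is the kind of one-dimensional box a detection-style network could be trained to predict.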

[0063] In the embodiment of the present invention, by extracting the image features of the voice signal, the start position, end position and corresponding category information ...

Embodiment 3

[0065] Corresponding to the above-mentioned embodiments, this embodiment provides a speech recognition system, which includes: at least one processor 5 and at least one memory 4;

[0066] The memory 4 is used to store one or more program instructions;

[0067] The processor 5 is used to run one or more program instructions to execute a speech recognition method.

[0068] In the embodiment of the present invention, by extracting the image features of the voice signal, the start position, end position and category information corresponding to the start and end times of the recognition target are obtained. In the time dimension, the recognition targets do not overlap, and a repulsion loss function is used to solve the overlap between a prediction frame and an adjacent real target frame, as well as the overlap between one prediction frame and another, thereby improving the accuracy of the algorithm and making t...
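The overlap that [0068] describes can be made concrete with a small sketch over one-dimensional time intervals. The source does not give the loss formula, so the penalty below is an assumption: it follows the general shape of a repulsion term (pick the non-matched real target with the largest IoU, then penalize the fraction of it that the prediction covers) and should not be read as the patent's exact function. All function names are hypothetical.

```python
import math


def overlap(a, b):
    """Length of the intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def iou(a, b):
    """Intersection over union of two time intervals."""
    inter = overlap(a, b)
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def iog(pred, truth):
    """Intersection over the length of the real target interval."""
    length = truth[1] - truth[0]
    return overlap(pred, truth) / length if length > 0 else 0.0


def repulsion_penalty(pred, truths):
    """Penalize a prediction for covering a real target other than its best match."""
    if len(truths) < 2:
        return 0.0
    best = max(range(len(truths)), key=lambda i: iou(pred, truths[i]))
    others = [t for i, t in enumerate(truths) if i != best]
    repelled = max(others, key=lambda t: iou(pred, t))
    # -ln(1 - x) grows without bound as coverage approaches 1; clamp for stability.
    return -math.log(1.0 - min(iog(pred, repelled), 1.0 - 1e-7))
```

With real targets at (0.0, 1.0) and (1.0, 2.0), a prediction of (0.1, 0.9) incurs no penalty, while a prediction of (0.2, 1.5) is penalized because it spills over half of the adjacent target.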

Abstract

The embodiment of the present invention discloses a speech recognition method, device, system, and storage medium, which relate to the field of speech processing. The method includes: collecting a speech sample data set; acquiring a speech feature image of the speech sample data set; calibrating the speech feature image; training the calibrated speech feature image with a training network to obtain a speech recognition model; and recognizing the speech information to be recognized with the speech recognition model. The embodiments of the present invention can improve the accuracy of speech recognition technology.

Description

Technical field

[0001] The embodiments of the present invention relate to the field of speech processing, and in particular to a speech recognition method, device, system and storage medium.

Background

[0002] Speech has long attracted attention as an ability unique to human beings. It is the most important tool and channel through which people communicate and obtain external information. With the continuous development of the mobile Internet, increasing attention has been paid to free interaction between people and computers and between people and mobile devices. As an important human communication tool, speech was among the first candidates for integration into the mobile Internet field. Speech technology mainly comprises three areas: speech recognition, speech coding and speech synthesis. Among them, speech recognition refers to the conversion of speech into text. It is an important component of the human-computer interaction branch...

Application Information

Patent Type & Authority: Patents (China)

IPC: IPC(8): G10L15/02, G10L15/06

CPC: G10L15/02, G10L15/063

Inventor: 崔潇潇, 郎芬玲

Owner: 北京探境科技有限公司
