Voice recognition method and device

A speech recognition technology for voice data to be recognized, applied in speech recognition, speech analysis, instruments, etc. It addresses the low performance and robustness of speech recognition and improves both.

Pending Publication Date: 2020-07-28

ALIBABA GRP HLDG LTD

Cites: 13 | Cited by: 1

AI Technical Summary

Problems solved by technology

[0008] This application provides a speech recognition method to solve the prior-art problem of low speech recognition performance and robustness in strongly noisy environments.

Method used

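Voice data to be recognized and the image data corresponding to it are acquired; an acoustic feature extraction sub-network extracts acoustic features from the voice data, and a visual feature extraction sub-network extracts the corresponding visual features from the image data; an acoustic score prediction sub-network computes the acoustic score of the voice data from at least these acoustic and visual features; and the text sequence corresponding to the voice data is determined from the acoustic score.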

Examples

Example 1

[0066] Please refer to Figure 1, which is a flowchart of an embodiment of the speech recognition method provided by the present application; the method is executed by a speech recognition device. The speech recognition method provided by the present application includes:

[0067] Step S101: Obtain voice data to be recognized and image data corresponding to the voice data.
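Step S101 pairs the voice data with image data covering the same time span. As a minimal sketch, assuming the voice data is split into frames with a 10 ms shift and the image data is a 25 fps video (both values, and the function name `align_audio_to_video`, are hypothetical; the source does not state them), each audio frame can be mapped to the video frame that covers its start time:

```python
from typing import List


def align_audio_to_video(num_audio_frames: int, num_video_frames: int,
                         frame_shift_ms: int = 10, fps: int = 25) -> List[int]:
    """Map each audio frame to the index of the video frame covering its start time.

    frame_shift_ms and fps are hypothetical defaults, not values from the patent.
    """
    if num_video_frames <= 0:
        raise ValueError("at least one video frame is required")
    mapping = []
    for i in range(num_audio_frames):
        start_ms = i * frame_shift_ms
        # Integer arithmetic: video frame k spans [1000*k/fps, 1000*(k+1)/fps) ms.
        video_index = start_ms * fps // 1000
        # Clamp in case the audio runs slightly longer than the video.
        mapping.append(min(video_index, num_video_frames - 1))
    return mapping


print(align_audio_to_video(10, 3))  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
```

Integer millisecond arithmetic keeps frame boundaries exact instead of letting floating-point rounding shift them; a real system might instead interpolate or repeat visual features to match the audio frame rate.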

[0068] First, the voice data to be recognized and how it is obtained are described below.

[0069] The voice data is a time-ordered sequence of samples of the voice signal. The magnitude of each sample represents the energy of the voice signal at that sampling point: the energy of silent segments is small, while that of active speech segments is relatively large. The speech signal is a one-dimensional continuous function with time as the independent variable. In the voice signal, the amplitude of the sound wave in silent segments is very small, ...
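The energy contrast described above is what simple silence/speech separation relies on. A minimal sketch, assuming energy is measured as the mean squared sample value over fixed-length frames and using an arbitrary, hypothetical threshold (the source gives neither the frame length nor the threshold):

```python
from typing import List, Sequence


def short_time_energy(samples: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Mean squared amplitude of each frame (frames that would run past the end are dropped)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def label_frames(samples: Sequence[float], frame_len: int = 4, hop: int = 4,
                 threshold: float = 0.01) -> List[str]:
    """Label each frame 'speech' if its energy exceeds a hypothetical threshold, else 'silence'."""
    return ["speech" if e > threshold else "silence"
            for e in short_time_energy(samples, frame_len, hop)]


samples = [0.001, -0.002, 0.001, 0.0,   # quiet
           0.5, -0.4, 0.6, -0.5,        # loud
           0.002, 0.0, -0.001, 0.001]   # quiet
print(label_frames(samples))  # ['silence', 'speech', 'silence']
```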

Example 2

[0125] Please refer to Figure 5, which is a schematic diagram of an embodiment of the speech recognition device of the present application. Because the device embodiment is essentially similar to the method embodiment, it is described briefly; for related details, refer to the corresponding parts of the method embodiment description. The device embodiments described below are merely illustrative.

[0126] The present application additionally provides a speech recognition device, including:

[0127] A data acquisition unit 501, configured to acquire voice data to be recognized and image data corresponding to the voice data;

[0128] A feature extraction unit 502, configured to extract the acoustic features of the voice data through the acoustic feature extraction sub-network included in the acoustic model, and to extract, through the visual feature extraction sub-network included in the acoustic model, visual features corresponding to the voic...
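To make the unit structure concrete, here is a minimal sketch of a feature extraction unit holding the acoustic model's two sub-networks. Each sub-network is stood in for by a single dense layer with ReLU; this is an assumption for illustration only, since the passage does not specify the sub-network architectures, and the names `DenseReLU` and `FeatureExtractionUnit` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


@dataclass
class DenseReLU:
    """One fully connected layer with ReLU; weights[j] holds the input weights of output unit j."""
    weights: List[Vector]
    bias: Vector

    def __call__(self, x: Vector) -> Vector:
        return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.weights, self.bias)]


@dataclass
class FeatureExtractionUnit:
    acoustic_subnet: DenseReLU  # placeholder for the acoustic feature extraction sub-network
    visual_subnet: DenseReLU    # placeholder for the visual feature extraction sub-network

    def extract(self, audio_frames: List[Vector],
                video_frames: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
        # Each modality goes through its own sub-network, frame by frame.
        acoustic = [self.acoustic_subnet(f) for f in audio_frames]
        visual = [self.visual_subnet(f) for f in video_frames]
        return acoustic, visual


unit = FeatureExtractionUnit(
    acoustic_subnet=DenseReLU([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    visual_subnet=DenseReLU([[0.5, 0.5, 0.0]], [0.0]),
)
a, v = unit.extract([[0.5, -0.25]], [[0.5, 0.25, 0.9]])
print(a, v)  # [[0.5, 0.0]] [[0.375]]
```

Keeping the two sub-networks separate lets each modality use its own input dimension (two values per audio frame and three per video frame in this toy run).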

Example 3

[0142] Please refer to Figure 7, which is a schematic diagram of an embodiment of the electronic device of the present application. Because the device embodiment is essentially similar to the method embodiment, it is described briefly; for related details, refer to the corresponding parts of the method embodiment description. The device embodiments described below are merely illustrative.

[0143] The electronic device in this embodiment includes a processor 701 and a memory 702. The memory stores a program implementing the speech recognition method; after the device is powered on and runs the program through the processor, it performs the following steps: acquiring the voice data to be recognized and the image data corresponding to the voice data; extracting the acoustic features of the voice data through the acoustic feature extraction sub-network included in the acoustic model; and, through the visual featur...

Abstract

The invention discloses a voice recognition method and device. The voice recognition method comprises the following steps: acquiring voice data to be recognized and image data corresponding to the voice data; extracting acoustic features of the voice data through an acoustic feature extraction sub-network; extracting visual features corresponding to the voice data from the image data through a visual feature extraction sub-network; obtaining an acoustic score of the voice data, at least according to the acoustic features and the visual features, through an acoustic score prediction sub-network; and determining a text sequence corresponding to the voice data according to the acoustic score. With this approach, audio and video are given distinct weights in voice recognition, and acoustic modeling fuses the features of the two modalities, which effectively improves the performance and robustness of acoustic modeling and, in turn, of voice recognition.
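The abstract's pipeline (fuse the two modalities with distinct weights, predict an acoustic score, then derive a text sequence) can be sketched as below. Everything here is an assumption for illustration: the modality weights come from a softmax over two hypothetical gate logits, the acoustic score prediction sub-network is reduced to one linear layer with log-softmax, and the text sequence is read off with CTC-style greedy decoding, a swapped-in technique, since the abstract does not say how the text sequence is derived from the scores. All function names and the toy units are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)
    log_total = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_total for x in xs]


def fuse(acoustic: Vector, visual: Vector, gate_logits: Sequence[float]) -> Vector:
    """Weighted sum of per-frame audio and visual features (same dimension assumed)."""
    w_audio, w_video = softmax(gate_logits)  # hypothetical learned gate, audio first
    return [w_audio * a + w_video * v for a, v in zip(acoustic, visual)]


def acoustic_scores(fused: Vector, weights: List[Vector]) -> Vector:
    """Per-frame log-probabilities over output units: one linear layer + log-softmax."""
    logits = [sum(w * x for w, x in zip(row, fused)) for row in weights]
    return log_softmax(logits)


def greedy_ctc_decode(frame_scores: List[Vector], units: Sequence[str],
                      blank: int = 0) -> List[str]:
    """Take the best unit per frame, merge consecutive repeats, drop blanks."""
    out: List[str] = []
    prev = None
    for scores in frame_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != prev and best != blank:
            out.append(units[best])
        prev = best
    return out


units = ["<blank>", "ni", "hao"]
weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy score-layer weights
audio = [[3.0, 0.0], [3.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
video = [[1.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 3.0]]
gate = [0.0, 0.0]  # equal audio/video weights in this toy run

scores = [acoustic_scores(fuse(a, v, gate), weights) for a, v in zip(audio, video)]
print(greedy_ctc_decode(scores, units))  # ['ni', 'hao']
```

Shifting the gate logits toward the video entry (for example `[0.0, 2.0]`) makes the visual features dominate the fused vector; this is one way to picture the distinct audio/video weights the abstract describes.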

Description

Technical field

[0001] The present application relates to the technical field of speech recognition, in particular to a speech recognition system, method and device, an acoustic model construction method and device, and electronic equipment.

Background

[0002] With the advent of the era of artificial intelligence, a significant change is that more and more smart Internet of Things (IoT) devices appear in daily life, such as smart TVs, subway voice ticket machines, and ordering machines. Smart IoT devices greatly facilitate daily life, but they also raise a question: how to interact with these devices more conveniently. Voice is the most convenient way for people to interact with each other, so it is the first choice for interacting with IoT devices.

[0003] For an intelligent voice interaction system, voice commands can be used to control smart devices through modules such as voice recognition, semantic understanding, and voice synth...

Application Information

IPC (8): G10L15/02; G10L15/06; G10L15/16

CPC: G10L15/02; G10L15/063; G10L15/16

Inventors: 张仕良 (Zhang Shiliang), 雷鸣 (Lei Ming)

Owner: ALIBABA GRP HLDG LTD
