
Speech recognition processing method and device and electronic equipment

A speech recognition processing method, applied in the computer field, that addresses decreased speech recognition accuracy, loss of speech information, and the inability to perform effective speech recognition, with the effect of improving the accuracy rate.

Pending Publication Date: 2022-06-21
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

Although this method improves the signal-to-noise ratio, misjudgments by the speech enhancement system lead to the loss of speech information. This may reduce the accuracy of speech recognition and, when the loss is serious, may even make effective speech recognition impossible.



Examples


Embodiment 1

[0038] Figure 3 is a schematic flowchart of a speech recognition processing method according to an embodiment of the present invention. The method can be applied to terminal devices such as mobile phones, computers, and vehicle-mounted voice devices. It can also be applied to cloud servers, which obtain the original sound signal reported by a terminal device and perform the subsequent recognition processing on it. Specifically, the method may include:

[0039] S101: Based on the filtering model, extract voice features from the original sound signals of multiple channels to generate voice feature data in multiple directions. The original sound signal of multiple channels mentioned here refers to the original sound signal collected by multiple microphone units of the microphone array. Each microphone unit can have a different orientation and collect the original sound signal independently, thereby forming a multi-channel original sound signal. The origina...
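As a rough illustration of S101, the sketch below applies one set of per-channel FIR filters per look direction to a multi-channel signal (a filter-and-sum arrangement) and computes per-frame log energies as features. The source states only that a filtering model extracts speech features in multiple directions from the multi-channel signal; the filter-and-sum form, the log-energy features, the hand-picked taps, and all names (`fir`, `filter_and_sum`, `directional_features`, `frame_len`) are assumptions made for illustration. In the end-to-end setting described by the patent, the filter parameters would presumably be learned rather than fixed.

```python
import math

def fir(signal, taps):
    """Causal FIR filter: y[n] = sum_k taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * signal[n - k]
        out.append(acc)
    return out

def filter_and_sum(channels, taps_per_channel):
    """Filter each microphone channel with its own taps, then sum across channels."""
    filtered = [fir(ch, taps) for ch, taps in zip(channels, taps_per_channel)]
    return [sum(samples) for samples in zip(*filtered)]

def log_energy_frames(signal, frame_len):
    """Split into non-overlapping frames and return the log energy of each frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        feats.append(math.log(sum(x * x for x in frame) + 1e-8))
    return feats

def directional_features(channels, filter_bank, frame_len=4):
    """Return one feature sequence per look direction.

    filter_bank[d][c] holds the (hypothetical) taps for direction d, channel c.
    """
    return [log_energy_frames(filter_and_sum(channels, taps), frame_len)
            for taps in filter_bank]

# Toy input: 2 microphone channels, 8 samples each.
# Channel 1 is channel 0 delayed by one sample.
channels = [
    [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0],
    [0.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0],
]
# Two look directions; the taps are hand-picked here, not learned.
filter_bank = [
    [[0.0, 0.5], [0.5, 0.0]],  # delays channel 0 by one sample to align it with channel 1
    [[0.5, 0.0], [0.5, 0.0]],  # sums the channels without alignment
]
features = directional_features(channels, filter_bank)
```

In this toy case the direction whose filters align the two channels yields higher frame energy than the unaligned one, which is the kind of directional contrast a later fusion step can exploit.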

Embodiment 2

[0050] Figure 4 is a schematic structural diagram of the speech recognition processing device according to an embodiment of the present invention. The device can be applied to terminal devices such as mobile phones, computers, and vehicle-mounted voice devices. It can also be applied to cloud servers, which obtain the original sound signal reported by a terminal device and perform the subsequent recognition processing on it. Specifically, the device may include:

[0051] The speech feature extraction module 11 is configured to perform speech feature extraction on the original sound signals of multiple channels based on the filtering model, and generate speech feature data in multiple directions. The original sound signal of multiple channels refers to the original sound signal collected by multiple microphone units of the microphone array. Each microphone unit can have a different orientation and independently collect the original sound signal, thereby form...

Embodiment 3

[0057] The foregoing embodiments describe the processing flow and the device structure of the speech recognition processing method. The functions of the above method and device can be implemented by an electronic device. Figure 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, which specifically includes a memory 110 and a processor 120.

[0058] The memory 110 is used to store programs.

[0059] In addition to the above-described programs, the memory 110 may also be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, etc.

[0060] Memory 110 may be implemented by any type of volatile or non-volatile storage device or combination thereof, such as static ...


Abstract

The embodiment of the invention provides a speech recognition processing method, a device, and electronic equipment. The method comprises: performing speech feature extraction on the original sound signals of a plurality of channels to generate speech feature data in a plurality of directions; performing pooling processing on the speech feature data for the purpose of speech enhancement to generate speech feature data fused across the directions; and performing speech recognition on the speech feature data to generate a recognition text. The filtering model, the pooling network model, and the speech recognition model as a whole adopt an end-to-end model architecture. Because speech enhancement is performed by the filtering model based on a deep learning mechanism and by the pooling network model, and because the end-to-end architecture unifies the optimization targets of speech enhancement and speech recognition, the accuracy of speech recognition can be effectively improved, which in turn improves the user experience. Through effective data training, the method can also adapt to a variety of complex speech recognition environments.
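The pooling step in the abstract can be sketched as follows. The source does not say which pooling operation the pooling network model uses, so both variants here (an elementwise maximum over directions and a softmax-weighted average) are assumptions, as are the function names and the `scores` parameter, which stands in for per-direction relevance values that a trained model would produce.

```python
import math

def max_pool_directions(features):
    """Fuse features[d][t][f] over directions d by taking the elementwise maximum."""
    n_frames = len(features[0])
    n_dims = len(features[0][0])
    return [[max(direction[t][f] for direction in features) for f in range(n_dims)]
            for t in range(n_frames)]

def softmax_pool_directions(features, scores):
    """Fuse by a softmax-weighted average over directions.

    scores[d] is a hypothetical relevance score for direction d.
    """
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    n_frames = len(features[0])
    n_dims = len(features[0][0])
    return [[sum(w * direction[t][f] for w, direction in zip(weights, features))
             for f in range(n_dims)]
            for t in range(n_frames)]

# Toy features: 3 directions, 2 frames, 2 feature dimensions.
features = [
    [[0.1, 0.9], [0.4, 0.2]],
    [[0.7, 0.3], [0.5, 0.1]],
    [[0.2, 0.2], [0.9, 0.8]],
]
fused_max = max_pool_directions(features)
fused_soft = softmax_pool_directions(features, scores=[0.0, 0.0, 0.0])
```

Either variant reduces the direction axis, so the fused output has the shape of a single-direction feature sequence and can be passed to a recognition model unchanged. With equal scores, the softmax variant reduces to a plain average over directions.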

Description

Technical field

[0001] The present application relates to a speech recognition processing method, device and electronic device, belonging to the technical field of computers.

Background technique

[0002] In work and life, we often encounter scenarios that require far-field speech recognition. The so-called far-field speech recognition scene means that the sound source has a certain distance from the microphone, and is accompanied by a certain amount of ambient noise. Common scenarios include conference rooms, vehicle scenarios, and smart homes. Far-field speech recognition generally uses a microphone array to collect speech to form multiple channels of original sound signals.

[0003] At present, most far-field speech recognition systems are composed of two subsystems, namely, a speech enhancement system based on signal processing and a speech recognition system for speech recognition. These two subsystems are usually optimized independently and have different optimizati...


Application Information

IPC(8): G10L15/02, G10L15/06, G10L15/26, G10L21/02, G10L21/0216, G10L25/30
CPC: G10L15/02, G10L15/063, G10L15/26, G10L21/0216, G10L25/30, G10L2021/02166
Inventor: 赵冬迪, 李锦珂, 朱磊, 卢璐, 聂再清
Owner ALIBABA GRP HLDG LTD