
Speech processing method, speech encoder, speech decoder and speech recognition system

A speech-signal and speech-feature technology, applied in the field of data processing. It addresses the high complexity of speech recognition models, which degrades the quality and efficiency of speech signal recognition, and aims to ensure practicability while improving quality and efficiency.

Active Publication Date: 2021-09-28
ALIBABA GRP HLDG LTD
12 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] However, when the Transformer model uses the text-related self-attention mechanism to model the long-term correlation of speech, the large number of text-related parameters makes the speech recognition model complex to build and harder to optimize, which greatly affects the quality and efficiency of speech signal recognition.


Examples

Embodiment Construction

[0063] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0064] Terms used in the embodiments of the present invention are only for the purpose of describing specific embodiments, and are not intended to limit the present invention. The singular forms "a", "said" and "the" used in the embodiments of the present invention and the appended claims are also intended to include plural forms, unless the conte...

Abstract

The embodiment of the invention provides a speech processing method, a speech encoder, a speech decoder and a speech recognition system. The method comprises the following steps: acquiring a speech signal to be processed; processing the speech signal through a first neural network and a second neural network to obtain first feature information and second feature information corresponding to the speech signal, wherein the computational efficiency of the first neural network is higher than that of the second neural network, and the accuracy of the second feature information output by the second neural network is higher than that of the first feature information output by the first neural network; and determining, according to the first feature information and the second feature information, target feature information that represents the semantics of the speech signal. In this technical scheme, the two pieces of feature information are obtained through two different neural networks and complement each other in processing efficiency and quality, which improves the accuracy and reliability of the target feature information.
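
The two-branch idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the stand-in networks (`fast_network`, a per-frame linear layer, and `accurate_network`, a layer that also looks at neighbouring frames), the sum used in `fuse`, the random weights and all sizes are hypothetical. The source does not say which architectures or which combination rule are used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 6 frames of 8-dimensional acoustic features, 4 output dims.
FRAMES, FEAT_DIM, OUT_DIM = 6, 8, 4

# Hypothetical random weights standing in for trained parameters.
W_FAST = rng.standard_normal((FEAT_DIM, OUT_DIM)) * 0.1
W_ACC = rng.standard_normal((3 * FEAT_DIM, OUT_DIM)) * 0.1


def fast_network(speech):
    """Cheaper branch (assumed): one linear projection per frame."""
    return np.tanh(speech @ W_FAST)


def accurate_network(speech):
    """Costlier branch (assumed): each frame also sees its two neighbours."""
    padded = np.pad(speech, ((1, 1), (0, 0)))  # zero-pad one frame on each side
    context = np.concatenate([padded[:-2], padded[1:-1], padded[2:]], axis=1)
    return np.tanh(context @ W_ACC)


def fuse(first, second):
    """Assumed combination rule: element-wise sum of the two feature sets."""
    return first + second


def encode(speech):
    """Run both branches and combine them into the target feature information."""
    first = fast_network(speech)       # first feature information
    second = accurate_network(speech)  # second feature information
    return fuse(first, second)


speech_signal = rng.standard_normal((FRAMES, FEAT_DIM))
target = encode(speech_signal)
print(target.shape)
```

The sum in `fuse` is only one option; concatenation or a learned gate would fit the same description, and the abstract does not distinguish between them.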

Description

Technical Field

[0001] The invention relates to the technical field of data processing, in particular to a speech processing method, a speech encoder, a speech decoder and a speech recognition system.

Background

[0002] Speech recognition technology converts the speech waveform spoken by a person into text that a machine can recognize. The speech recognition rate is an important indicator for evaluating speech recognition performance. In 2017, Google proposed a Transformer model that can perform speech recognition. Specifically, the Transformer model can use a text-related self-attention mechanism to model the long-term correlation of speech and obtain a speech recognition model, and the speech recognition operation is then carried out with the established model.

[0003] However, when the Transformer model uses the text-related self-attention mechanism to model the long-term correlat...
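
For reference, the self-attention operation mentioned in the background can be sketched as single-head scaled dot-product attention over speech frames. This is a generic illustration of the mechanism, not the model described in the patent; the function name `self_attention`, the random weight matrices and all dimensions are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(frames, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a frame sequence."""
    q, k, v = frames @ w_q, frames @ w_k, frames @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # one score per pair of frames
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
T, D = 5, 8  # hypothetical: 5 frames, 8-dimensional features
frames = rng.standard_normal((T, D))
w_q, w_k, w_v = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))

output, weights = self_attention(frames, w_q, w_k, w_v)
print(output.shape, weights.shape)
```

Every frame attends to every other frame, so the weight matrix has one entry per pair of frames. That is how the mechanism captures long-range correlation, and it also means the cost grows with the square of the utterance length.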

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/16, G10L15/26
CPC: G10L15/16, G10L15/26, G06F40/30, G10L15/02, G10L15/1815
Inventors: 张仕良, 高志付, 雷鸣
Owner: ALIBABA GRP HLDG LTD