A Customizable Chinese-English Mixed Speech Recognition End-to-End System
A customizable end-to-end system for Chinese-English mixed speech recognition, applicable to speech recognition, speech analysis, natural language data processing, and related fields. It addresses problems such as performance degradation, the complexity of statistical language models, and the inability to train end-to-end models effectively, with the effects of reducing dependencies and improving recognition accuracy.
Examples
Embodiment 1
[0034] The first aspect of the present invention discloses a customizable Chinese-English mixed speech recognition end-to-end system. Figure 1 is a structural diagram of a customizable Chinese-English mixed speech recognition end-to-end system according to an embodiment of the present invention. As shown in Figure 1, the system 100 includes:
[0035] An acoustic encoder 101, an English vocabulary encoder 102, and a decoder 103.
[0036] The acoustic encoder 101 extracts acoustic features from the speech waveform to obtain an acoustic feature sequence, performs convolution and re-encoding operations on that sequence to obtain a down-sampled and re-encoded feature sequence, and then feeds the down-sampled and re-encoded feature sequence into the multi-head self-attention module of the acoustic encoder to obtain a sequence of high-dimensional representations of the acoustic features;
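The multi-head self-attention step described in [0036] can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the function name `multi_head_self_attention` is hypothetical, and the learned query, key, value, and output projections of a real encoder are replaced by identity mappings (an assumption made to keep the example short), so each head simply attends over its own slice of the feature vector.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(seq, num_heads):
    """Scaled dot-product self-attention over a feature sequence.

    seq: list of T feature vectors, each a list of d floats.
    num_heads: number of heads; d must be divisible by it.
    Assumption: identity projections stand in for the learned
    W_q, W_k, W_v and W_o matrices of a real encoder.
    """
    d = len(seq[0])
    if d % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    d_head = d // num_heads
    out = [[] for _ in seq]
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        # Queries, keys and values for this head are the same slice.
        head = [vec[lo:hi] for vec in seq]
        for t, q in enumerate(head):
            scores = [
                sum(a * b for a, b in zip(q, k)) / math.sqrt(d_head)
                for k in head
            ]
            weights = softmax(scores)
            context = [
                sum(w * v[i] for w, v in zip(weights, head))
                for i in range(d_head)
            ]
            # Concatenate the heads' outputs along the feature axis.
            out[t].extend(context)
    return out
```

Each output vector is a weighted average of the input vectors, so the output sequence has the same length and dimension as the input; in a trained encoder the learned projections are what make the representation higher-level.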
[0037] I...
Embodiment 2
[0062] As shown in Figure 1, the system 100 includes:
[0063] An acoustic encoder 101, an English vocabulary encoder 102, and a decoder 103.
[0064] The acoustic encoder 101 extracts acoustic features from the speech waveform to obtain an acoustic feature sequence, performs convolution and re-encoding operations on that sequence to obtain a down-sampled and re-encoded feature sequence, and then feeds the down-sampled and re-encoded feature sequence into the multi-head self-attention module of the acoustic encoder to obtain a sequence of high-dimensional representations of the acoustic features;
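The convolutional down-sampling in [0064] shortens the feature sequence along the time axis before attention is applied. The sketch below shows that effect with a strided one-dimensional convolution in pure Python. The kernel size of 3, the stride of 2, and the use of two stacked layers are assumptions chosen for illustration, since the text does not state them, and the single averaging kernel shared across all feature dimensions is a simplification of a learned multi-channel convolution. The name `conv1d_time` is hypothetical.

```python
def conv1d_time(seq, kernel, stride):
    """Strided convolution along the time axis, without padding.

    seq: list of T frames, each a list of floats of equal length.
    kernel: list of K scalar weights applied over K consecutive frames
            (shared across feature dimensions; a simplification).
    stride: number of frames to advance between output positions.
    Returns a list of (T - K) // stride + 1 frames.
    """
    k = len(kernel)
    dim = len(seq[0])
    out = []
    for start in range(0, len(seq) - k + 1, stride):
        window = seq[start:start + k]
        out.append([
            sum(w * frame[i] for w, frame in zip(kernel, window))
            for i in range(dim)
        ])
    return out


# Usage: 100 frames of 80-dimensional features, two assumed layers.
features = [[float(t)] * 80 for t in range(100)]
kernel = [1.0 / 3.0] * 3
layer1 = conv1d_time(features, kernel, stride=2)
layer2 = conv1d_time(layer1, kernel, stride=2)
print(len(features), len(layer1), len(layer2))
```

Under these assumed settings the sequence length drops by roughly a factor of four, which reduces the cost of the self-attention that follows, since that cost grows quadratically with sequence length.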
[0065] In some embodiments, the acoustic features of the speech waveform are extracted as follows: the waveform is divided into frames of 25 milliseconds each, with an overlap of 10 milliseconds between adjacent frames, and after framing, 80-dimensional fbank (filter-bank) features are extracted as the acoustic features;...
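The framing step in [0065] can be sketched as follows. The 16 kHz sample rate and the function name `frame_signal` are assumptions, not taken from the text. The sketch follows the wording literally (25 ms frames with a 10 ms overlap, hence a 15 ms hop); many speech toolkits instead default to a 25 ms window with a 10 ms frame shift, which `overlap_ms=15` would reproduce here. Computing the 80-dimensional fbank features from each frame (windowing, FFT, mel filter bank, log) is left out.

```python
def frame_signal(samples, sample_rate=16000, frame_ms=25, overlap_ms=10):
    """Split a waveform into fixed-length, overlapping frames.

    samples: sequence of audio samples.
    sample_rate: samples per second (16000 is an assumption).
    frame_ms: frame length in milliseconds.
    overlap_ms: overlap between adjacent frames in milliseconds.
    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = sample_rate * frame_ms // 1000
    hop = sample_rate * (frame_ms - overlap_ms) // 1000
    if hop <= 0:
        raise ValueError("overlap must be shorter than the frame length")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


# Usage: one second of placeholder audio at the assumed 16 kHz rate.
waveform = list(range(16000))
frames = frame_signal(waveform)
print(len(frames), len(frames[0]))
```

With these settings each frame holds 400 samples and consecutive frames start 240 samples apart; each frame would then be mapped to one 80-dimensional fbank vector.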
Embodiment 3
[0086] The invention discloses an electronic device including a memory and a processor. The memory stores a computer program, and when the processor executes the computer program, it implements the steps of the customizable Chinese-English mixed speech recognition end-to-end method of any one of the first aspect of the present disclosure.
[0087] Figure 3 is a structural diagram of an electronic device according to an embodiment of the present invention. As shown in Figure 3, the electronic device includes a processor, a memory, a communication interface, a display screen, and an input device connected through a system bus. The processor of the electronic device provides computing and control capabilities. The memory of the electronic device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and computer programs. The internal memory provides an environment for the operatio...