Audio visual speech module based on residual network and bidirectional gating recurrent units

A recurrent-unit speech model, applicable to speech analysis, speech recognition, and related instruments, that addresses the problem of low recognition accuracy.

Status: Inactive
Publication date: 2018-09-28

SHENZHEN WEITESHI TECH

Problems solved by technology

[0004] To address low recognition accuracy under strong noise, the object of the present invention is to provide an audio-visual speech model based on a residual network and bidirectional gated recurrent units (BGRU). In each stream, the temporal dynamics are modeled by a BGRU. The BGRU outputs of the two signal streams are then concatenated and passed to the classification layer for fusion, where their temporal dynamics are jointly modeled. The final output comes from a Softmax layer: the Softmax layer labels each frame, and the sequence is assigned the label with the highest average probability.
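The fusion and labeling steps in [0004] can be illustrated with a minimal NumPy sketch. It is not the patent's implementation: the joint temporal model is replaced here by a single linear classification layer, and the function name `label_sequence`, the random weights, and all sizes (29 frames, 8 features per stream, 5 classes) are hypothetical values chosen only for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def label_sequence(visual_feats, audio_feats, weight, bias):
    """Fuse two streams by concatenation and label the whole sequence.

    visual_feats, audio_feats: (frames, features) per-frame stream outputs.
    weight, bias: parameters of a hypothetical linear classification layer
    (an assumption standing in for the joint temporal model).
    """
    # Concatenate the two streams along the feature axis: (frames, Dv + Da).
    fused = np.concatenate([visual_feats, audio_feats], axis=1)
    # One class distribution per frame.
    frame_probs = softmax(fused @ weight + bias)
    # Average the per-frame distributions and pick the most probable class.
    mean_probs = frame_probs.mean(axis=0)
    return int(mean_probs.argmax()), frame_probs

rng = np.random.default_rng(0)
frames, dv, da, classes = 29, 8, 8, 5   # illustrative sizes only
visual = rng.normal(size=(frames, dv))
audio = rng.normal(size=(frames, da))
W = rng.normal(size=(dv + da, classes))
b = np.zeros(classes)

label, probs = label_sequence(visual, audio, W, b)
```

Averaging the per-frame distributions before taking the maximum means that a few confidently mislabeled frames do not decide the sequence label on their own.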

Embodiment Construction

[0022] It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined with each other. The present invention will be further described in detail below in conjunction with the drawings and specific embodiments.

[0023] Figure 1 is a system framework diagram of the audio-visual speech model of the present invention, based on a residual network and bidirectional gated recurrent units. The model mainly comprises a visual stream, an audio stream, a classification layer, and audio-visual fusion.

[0024] The visual stream consists of a spatio-temporal convolution followed by a 34-layer residual network (ResNet-34) and a 2-layer bidirectional gated recurrent unit (BGRU); the ResNet-34 used here is the identity-mapping version. The main process is as follows: the residual network gradually reduces the spatio-temporal dimensions until the output at each time step becomes a one-dimensional tensor; finally, the output of the 34-layer residual...
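As a rough illustration of the recurrent part of the stream, the following NumPy sketch runs a 2-layer bidirectional GRU over a sequence of per-frame feature vectors. It assumes the convolutional front end has already produced one feature vector per frame and omits that front end entirely. The gate equations follow one common GRU convention, and the helper names, random weights, and sizes (29 frames, 16 input features, 4 hidden units) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(rng, in_dim, hid_dim):
    """Random parameters for the update (z), reset (r) and candidate (h) paths."""
    p = {}
    for g in "zrh":
        p["W" + g] = rng.normal(scale=0.1, size=(hid_dim, in_dim))
        p["U" + g] = rng.normal(scale=0.1, size=(hid_dim, hid_dim))
        p["b" + g] = np.zeros(hid_dim)
    return p

def gru_step(x, h, p):
    """One GRU update for input x and previous hidden state h."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])             # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])             # reset gate
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])  # candidate state
    return (1.0 - z) * h + z * h_cand

def run_gru(seq, p, hid_dim):
    """Run a GRU over seq (frames, in_dim); return all states (frames, hid_dim)."""
    h = np.zeros(hid_dim)
    states = []
    for x in seq:
        h = gru_step(x, h, p)
        states.append(h)
    return np.stack(states)

def bgru_layer(seq, fwd, bwd, hid_dim):
    """Bidirectional layer: forward pass plus a pass over the reversed sequence."""
    forward = run_gru(seq, fwd, hid_dim)
    backward = run_gru(seq[::-1], bwd, hid_dim)[::-1]  # re-align to frame order
    return np.concatenate([forward, backward], axis=1)

rng = np.random.default_rng(0)
frames, feat_dim, hid_dim = 29, 16, 4   # illustrative sizes only
features = rng.normal(size=(frames, feat_dim))

layer1 = (init_params(rng, feat_dim, hid_dim), init_params(rng, feat_dim, hid_dim))
layer2 = (init_params(rng, 2 * hid_dim, hid_dim), init_params(rng, 2 * hid_dim, hid_dim))

h1 = bgru_layer(features, *layer1, hid_dim)   # (frames, 2 * hid_dim)
h2 = bgru_layer(h1, *layer2, hid_dim)         # (frames, 2 * hid_dim)
```

Each output frame of `h2` combines context from both earlier and later frames, which is the reason for using a bidirectional layer rather than a forward-only one.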

Abstract

The invention provides an audio-visual speech model based on a residual network and bidirectional gated recurrent units. The model mainly comprises a visual stream, an audio stream, a classification layer, and audio-visual fusion. The audio-visual fusion proceeds as follows: in the visual stream and the audio stream, the temporal dynamics are each modeled by two layers of bidirectional gated recurrent units (BGRU); the BGRU outputs of the two signal streams are then concatenated and passed to the classification layer for fusion, where the temporal dynamics are jointly modeled; finally, the output comes from a Softmax layer, which labels each frame, and the sequence is assigned the label with the highest average probability. The model extracts features directly from pixels and audio waveforms at the same time and performs text recognition on a large public context dataset. Under high noise levels, its classification accuracy is markedly higher than that of a traditional audio-visual speech model.

Description

Technical field

[0001] The invention relates to the field of audio-visual speech recognition, in particular to an audio-visual speech model based on a residual network and bidirectional gated recurrent units.

Background

[0002] With the substantial improvement in the performance of personal computers, human-computer interaction technology has gradually shifted from computer-centered to human-centered interaction. Against this background, audio-visual speech recognition technology has also developed rapidly. It is mainly used in telephone and communication systems, where people can easily query and extract relevant information from remote database systems through voice commands; it is also widely used in user-interactive machines, voice notepads, business self-service platforms, and other equipment, greatly reducing labor costs; in terms of public security criminal investi...

Application Information

Patent type & authority: Applications (China)

IPC(8): G10L15/06

CPC: G10L15/063, G10L2015/0631

Inventor: 夏春秋

Owner: SHENZHEN WEITESHI TECH
