
Audio-vision collaborative lip language recognition method and system

A lip language recognition method and technology, applied in the fields of visual speech recognition and lip language recognition. It addresses two problems: limited training data can hardly cover all the different situations, which increases the difficulty of lip language recognition, and recognition under low-resource conditions is rarely considered. The effects are improved feature extraction ability and good classification performance.

Pending Publication Date: 2021-11-16
INST OF COMPUTING TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0008] In addition, factors such as speaking speed and speaker appearance (including posture, age, makeup, and personal habits) cause large differences between samples of the same speech content.
These problems further increase the difficulty of lip language recognition, because limited training data can hardly cover samples of all the different situations.
Existing lip language recognition methods are mostly built on large-scale lip language recognition datasets, and give little consideration to how to obtain a better-performing model for lip language recognition under low-resource conditions.



Embodiment Construction

[0083] The present invention exploits audio-visual synchronization to propose a lip language recognition method based on audio-visual collaborative learning. The method designs three levels of metric learning: visual-visual, audio-audio, and visual-audio. Learning the three metrics simultaneously not only shortens the training time and reduces the number of training stages, but also enables better collaborative learning between the visual and audio modalities. With the help of audio information, the visual model of the present invention can extract more discriminative features, thereby improving the performance of the lip language recognition model. The present invention includes the following key technical points:

[0084] Key point 1: the present invention proposes an audio-visual collaborative learning mechanism that uses audio to assist the learning of the visual model, and at the same time designs three levels of metric learning (audio-audio, visual-visual, and audio-visual), so that the model c...
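The three-level metric learning described above can be illustrated with a minimal sketch. The excerpt does not state the loss function, so this sketch assumes a margin-based contrastive loss over L2-normalized embeddings with equal weights on the three terms; the function names (`three_level_loss`, `contrastive`), the `margin` parameter, and the pairing scheme are hypothetical and are not taken from the patent.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def contrastive(a, b, same, margin=1.0):
    """Assumed pairwise loss: pull same-class pairs together and push
    different-class pairs at least `margin` apart."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    if same:
        return sq
    return max(0.0, margin - math.sqrt(sq)) ** 2


def three_level_loss(visual, audio, labels, margin=1.0):
    """visual[i] and audio[i] are embeddings of clip i; labels[i] is its class.

    Returns the visual-visual, audio-audio and visual-audio terms plus their
    unweighted sum (equal weighting is an assumption of this sketch).
    """
    v = [l2_normalize(x) for x in visual]
    a = [l2_normalize(x) for x in audio]
    n = len(labels)
    vv = aa = va = 0.0
    within = cross = 0
    for i in range(n):
        # Within-modality pairs: each unordered pair counted once.
        for j in range(i + 1, n):
            same = labels[i] == labels[j]
            vv += contrastive(v[i], v[j], same, margin)
            aa += contrastive(a[i], a[j], same, margin)
            within += 1
        # Cross-modality pairs: every visual embedding against every audio one.
        for j in range(n):
            same = labels[i] == labels[j]
            va += contrastive(v[i], a[j], same, margin)
            cross += 1
    vv /= max(within, 1)
    aa /= max(within, 1)
    va /= max(cross, 1)
    return {"vv": vv, "aa": aa, "va": va, "total": vv + aa + va}


if __name__ == "__main__":
    # Toy embeddings: clips 0 and 1 share a class, clip 2 does not.
    visual = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    audio = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(three_level_loss(visual, audio, labels=[0, 0, 1]))
```

Because all three terms are computed in one pass over the same batch, they can be minimized jointly in a single training stage, which is the property the description attributes to learning the three metrics simultaneously.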


Abstract

The invention provides an audio-vision collaborative lip language recognition method and system based on three levels of metric learning: vision-vision, audio-audio, and vision-audio. The three metric learning mechanisms are carried out simultaneously, which shortens the training time, reduces the number of training stages, and enables better collaborative learning between the visual and audio modalities. With the help of the audio information, the visual model of the invention can extract more discriminative features, thereby improving the performance of the lip language recognition model.

Description

Technical field

[0001] The invention relates to the fields of speech recognition and computer vision, especially visual speech recognition and lip language recognition.

Background technique

[0002] Lip language recognition, also known as visual speech recognition, refers to the technology of interpreting what a speaker says by watching the speaker's facial and lip movements during speech. This technology can supplement audio-based speech recognition, making up for the shortcomings of audio-based speech recognition models in high-noise environments. It can also be used independently in silent environments to transmit spoken content efficiently. Therefore, this technology has great application value in human-computer interaction systems; at the same time, in recent years, with the emergence of large-scale lip recognition data sets and the wide application of deep learning technology in computer vision, natural la...


Application Information

IPC(8): G10L15/02, G10L15/06, G10L15/25
CPC: G10L15/02, G10L15/25, G10L15/063
Inventor: 杨双, 罗明双, 山世光, 陈熙霖
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI