Audio-vision collaborative lip language recognition method and system

A lip language recognition technology, applied in the fields of visual speech recognition and lip language recognition. It addresses two problems that make lip language recognition harder: limited training data can hardly cover all the different situations, and the low-resource setting is rarely considered. It aims to improve feature extraction ability and classification performance.

Pending Publication Date: 2021-11-16

INST OF COMPUTING TECH CHINESE ACAD OF SCI


AI Technical Summary

Problems solved by technology

[0008] In addition, factors such as speaking speed and speaker appearance (including posture, age, makeup, and personal habits) cause large differences between samples of the same spoken content.

These problems further increase the difficulty of lip language recognition, because limited training data can hardly cover samples of all these different situations.

Existing lip language recognition methods mostly rely on large-scale lip language recognition datasets, and little consideration has been given to how to obtain a better-performing model for lip language recognition under low-resource conditions.

Embodiment Construction

[0083] The present invention exploits audio-visual synchronization to propose a lip language recognition method based on audio-visual collaborative learning. The method defines three levels of metric learning: visual-visual, audio-audio, and visual-audio. Learning the three metrics simultaneously not only shortens training time and reduces the number of training stages, but also enables better collaborative learning between the visual and audio modalities. With the help of audio information, the visual model of the present invention can extract more discriminative features, thereby improving the performance of the lip language recognition model. The present invention includes the following key technical points:

[0084] Key point 1: the present invention proposes an audio-visual collaborative learning mechanism that uses audio to assist the learning of the visual model, and designs three levels of metric learning (audio-audio, visual-visual, and audio-visual) so that the model c...
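The idea of optimizing three metric-learning terms at the same time can be illustrated with a small sketch. The excerpt does not state which loss the patent uses, so the contrastive loss, the Euclidean distance, the margin, the equal weighting of the three terms, and every function name below are assumptions made for illustration only; they are not the patented formulation.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def contrastive_term(xs, ys, labels, margin=1.0, skip_diagonal=True):
    """Mean contrastive loss over all pairs (xs[i], ys[j]).

    Assumed loss form (not taken from the patent text):
    same-label pairs contribute d^2, pulling them together;
    different-label pairs contribute max(0, margin - d)^2, pushing them apart.
    """
    total, count = 0.0, 0
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if skip_diagonal and i == j:
                continue  # a sample paired with itself carries no information
            d = euclidean(x, y)
            if labels[i] == labels[j]:
                total += d ** 2
            else:
                total += max(0.0, margin - d) ** 2
            count += 1
    return total / count if count else 0.0


def three_level_metric_loss(visual, audio, labels, margin=1.0):
    """Compute the three metric terms in one pass and sum them.

    visual[i] and audio[i] are embeddings of the same synchronized clip,
    and labels[i] is its word label.
    """
    vv = contrastive_term(visual, visual, labels, margin)  # visual-visual
    aa = contrastive_term(audio, audio, labels, margin)    # audio-audio
    # Cross-modal term keeps i == j: the synchronized pair is a positive pair.
    va = contrastive_term(visual, audio, labels, margin, skip_diagonal=False)
    return {"vv": vv, "aa": aa, "va": va, "total": vv + aa + va}


# Hypothetical toy embeddings for three clips of two words.
labels = ["yes", "yes", "no"]
visual = [[0.1, 0.0], [0.0, 0.2], [2.0, 2.0]]
audio = [[0.0, 0.1], [0.1, 0.1], [2.1, 1.9]]
print(three_level_metric_loss(visual, audio, labels))
```

Because all three terms are summed into a single objective, one training stage updates both encoders together, which is one plausible reading of how simultaneous learning could reduce the number of training stages.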


Abstract

The invention provides an audio-visual collaborative lip language recognition method and system based on metric learning at three levels: visual-visual, audio-audio, and visual-audio. The three metric learning mechanisms are carried out simultaneously, which shortens training time, reduces the number of training stages, and enables better collaborative learning between the visual and audio modalities. With the help of audio information, the visual model of the invention can extract more discriminative features, thereby improving the performance of the lip language recognition model.

Description

Technical field

[0001] The invention relates to the fields of speech recognition and computer vision, especially visual speech recognition and lip language recognition.

Background

[0002] Lip recognition, also known as visual speech recognition, is the technology of interpreting what a speaker says by watching the speaker's facial and lip movements during speech. It can supplement audio-based speech recognition, compensating for the shortcomings of audio-based models in high-noise environments. It can also be used on its own in silent environments to convey spoken words efficiently. The technology therefore has great application value in human-computer interaction systems. At the same time, in recent years, with the emergence of large-scale lip recognition datasets and the wide application of deep learning technology in computer vision, natural la...


Application Information


IPC(8): G10L15/02, G10L15/06, G10L15/25

CPC: G10L15/02, G10L15/25, G10L15/063

Inventors: 杨双, 罗明双, 山世光, 陈熙霖

Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI
