
Lip language recognition method and system based on cross-modal attention enhancement

A recognition method and attention technology, applied in the field of computer vision and pattern recognition, which can solve problems such as changes in the lip region of the image and their impact on visual feature extraction, and achieve the effect of improving recognition accuracy.

Active Publication Date: 2021-09-24
HUNAN UNIV

AI Technical Summary

Problems solved by technology

Third, complex imaging conditions and changes in the speaker's pose will cause obvious changes in the lip region of the image, which affects the extraction of visual features.



Examples


Detailed Description of the Embodiments

[0054] As shown in Figure 1, the lip language recognition method based on cross-modal attention enhancement in this embodiment includes:

[0055] 1) Extract the lip region image sequence Va from the input images containing the key points of the speaker's face, and extract the optical flow map sequence Vo from the lip region image sequence Va. Input the lip region image sequence Va and the optical flow map sequence Vo separately into the pre-trained feature extractor to obtain the lip feature sequence Hv and the inter-lip motion feature sequence Ho. Apply positional encoding to the lip feature sequence Hv and the inter-lip motion feature sequence Ho separately to obtain the lip feature sequence Hvp and the lip motion feature sequence Hop with position information; together these form the position-encoded feature sequences X ∈ {Hvp, Hop}.
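The patent text does not disclose the concrete optical flow method, feature extractor, or encoding; the sketch below only illustrates this step, assuming OpenCV's Farneback dense optical flow for Vo and the standard sinusoidal positional encoding for Hvp/Hop. The function names and parameter values are illustrative assumptions, not the patented implementation.

```python
import cv2
import numpy as np

def extract_optical_flow(lip_frames):
    """Compute dense optical flow between adjacent lip-region frames.

    lip_frames: list of H x W grayscale (uint8) images, the sequence Va.
    Returns a list of H x W x 2 flow maps, the sequence Vo.
    """
    flows = []
    for prev, nxt in zip(lip_frames[:-1], lip_frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(
            prev, nxt, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        flows.append(flow)
    return flows

def positional_encoding(seq_len, d_model):
    """Standard sinusoidal positional encoding, shape (seq_len, d_model)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

# Hv, Ho: (T, d) feature sequences produced by the pre-trained extractor (not shown).
# Position information is introduced additively (an assumption):
#   Hvp = Hv + positional_encoding(T, d)
#   Hop = Ho + positional_encoding(T, d)
```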

[0056] 2) The feature sequences X with position information ...
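Step 2 is truncated in the source; according to the abstract, the position-encoded sequences are fed into a cross-modal attention network in which the lip features are enhanced by the motion features. The PyTorch sketch below illustrates a generic cross-modal attention block of that kind; the module name, dimensions, and residual design are assumptions rather than the patent's exact formulation.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Generic cross-modal attention block (sketch): the lip feature
    sequence Hvp queries the motion feature sequence Hop, and the
    attended motion information is fused back into the lip features."""

    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, hvp, hop):
        # hvp, hop: (batch, T, d_model) position-encoded feature sequences
        attended, _ = self.attn(query=hvp, key=hop, value=hop)
        return self.norm(hvp + attended)  # residual enhancement of lip features

# Example usage with illustrative sizes (2 clips, 29 frames, 512-d features):
hvp = torch.randn(2, 29, 512)
hop = torch.randn(2, 29, 512)
enhanced = CrossModalAttention()(hvp, hop)  # (2, 29, 512) enhanced lip features
```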


Abstract

The invention discloses a lip language recognition method and system based on cross-modal attention enhancement. The method comprises the steps of: extracting a lip image sequence and lip motion information; obtaining the corresponding lip feature sequence and lip motion sequence through a pre-trained feature extractor; inputting the obtained feature sequences into a cross-modal attention network to obtain a lip enhancement feature sequence; and, through a multi-branch attention mechanism, establishing the temporal relevance of the intra-modal feature sequence and selecting the relevant information in the input at the output end. The method takes the relevance between temporal information into account: optical flow is computed on adjacent frames to obtain the motion information between the visual features, the lip visual features are fused with and enhanced by this motion information so that the context information within the modality is fully utilized, and finally the correlation representation and selection of the intra-modal features are carried out through the multi-branch attention mechanism, thereby improving the lip reading recognition accuracy.
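The abstract does not spell out the form of the multi-branch attention mechanism; a common reading is multi-head self-attention that models temporal relevance within each modality, followed by attention-based selection of the relevant time steps at the output end. The sketch below follows that reading; the class name, layer counts, and the 500-class output are illustrative assumptions, not the patented design.

```python
import torch
import torch.nn as nn

class IntraModalTemporalModel(nn.Module):
    """Sketch: multi-head ("multi-branch") self-attention establishes the
    temporal relevance within the enhanced intra-modal feature sequence;
    an attention-pooling layer then selects the relevant information from
    the input at the output end before classification."""

    def __init__(self, d_model=512, n_heads=8, n_classes=500):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(d_model, 1)        # per-frame relevance score
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, x):
        # x: (batch, T, d_model) cross-modally enhanced lip feature sequence
        h = self.encoder(x)                        # intra-modal temporal modelling
        w = torch.softmax(self.score(h), dim=1)    # (batch, T, 1) attention weights
        pooled = (w * h).sum(dim=1)                # weighted selection over time
        return self.classifier(pooled)             # (batch, n_classes) logits
```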

Description

Technical field

[0001] The invention relates to computer vision and pattern recognition technology, and in particular to a lip language recognition method and system based on cross-modal attention enhancement.

Background technique

[0002] Lip language recognition refers to understanding what a speaker is saying by capturing the movement information of the speaker's lips, which carries a large amount of useful speech information. In practical human-computer interaction environments, facial motion information is acquired through video and is not affected by complex environmental noise, so lip recognition can serve as an effective solution for recognizing speech content when no audio input is available or when the noise level is high. A lip reading system has a variety of valuable applications: it can assist speech recognition, help resolve the problem of multiple speakers talking simultaneously, and enable more intelligent and robust human-computer interaction; i...


Application Information

IPC (8): G06K9/00, G06K9/62
CPC: G06F18/22
Inventor: 李树涛 (Li Shutao), 宋启亚 (Song Qiya), 孙斌 (Sun Bin)
Owner: HUNAN UNIV