
Lip language recognition method and system based on cross-modal attention enhancement

A recognition method and attention technology, applied in the field of computer vision and pattern recognition, which can solve problems such as changes in the lip region of the image and their impact on visual feature extraction, and achieve the effect of improving recognition accuracy.

Active Publication Date: 2021-09-24
HUNAN UNIV

AI Technical Summary

Problems solved by technology

Third, complex imaging conditions and changes in the speaker's pose will cause obvious changes in the lip region of the image, which affects the extraction of visual features.



Examples


Detailed Description of the Embodiments

[0054] As shown in Figure 1, the lip language recognition method based on cross-modal attention enhancement in this embodiment includes:

[0055] 1) Extract the lip region image sequence Va from the input images containing the key points of the speaker's face, and extract the optical flow map sequence Vo from the lip region image sequence Va. Input the lip region image sequence Va and the optical flow map sequence Vo separately into the pre-trained feature extractor to obtain the lip feature sequence Hv and the inter-lip motion feature sequence Ho. Apply positional encoding to the lip feature sequence Hv and the inter-lip motion feature sequence Ho separately to obtain the lip feature sequence Hvp and the lip motion feature sequence Hop with position information; together these form the position-encoded feature sequences X ∈ {Hvp, Hop}.
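The patent text does not disclose the concrete optical flow method, feature extractor, or encoding; the sketch below only illustrates this step, assuming OpenCV's Farneback dense optical flow for Vo and the standard sinusoidal positional encoding for Hvp/Hop. The function names and parameter values are illustrative assumptions, not the patented implementation.

```python
import cv2
import numpy as np

def extract_optical_flow(lip_frames):
    """Compute dense optical flow between adjacent lip-region frames.

    lip_frames: list of H x W grayscale (uint8) images, the sequence Va.
    Returns a list of H x W x 2 flow maps, the sequence Vo.
    """
    flows = []
    for prev, nxt in zip(lip_frames[:-1], lip_frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(
            prev, nxt, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        flows.append(flow)
    return flows

def positional_encoding(seq_len, d_model):
    """Standard sinusoidal positional encoding, shape (seq_len, d_model)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

# Hv, Ho: (T, d) feature sequences produced by the pre-trained extractor (not shown).
# Position information is introduced additively (an assumption):
#   Hvp = Hv + positional_encoding(T, d)
#   Hop = Ho + positional_encoding(T, d)
```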

[0056] 2) The feature sequences X with position information ...
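Step 2 is truncated in the source; according to the abstract, the position-encoded sequences are fed into a cross-modal attention network in which the lip features are enhanced by the motion features. The PyTorch sketch below illustrates a generic cross-modal attention block of that kind; the module name, dimensions, and residual design are assumptions rather than the patent's exact formulation.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Generic cross-modal attention block (sketch): the lip feature
    sequence Hvp queries the motion feature sequence Hop, and the
    attended motion information is fused back into the lip features."""

    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, hvp, hop):
        # hvp, hop: (batch, T, d_model) position-encoded feature sequences
        attended, _ = self.attn(query=hvp, key=hop, value=hop)
        return self.norm(hvp + attended)  # residual enhancement of lip features

# Example usage with illustrative sizes (2 clips, 29 frames, 512-d features):
hvp = torch.randn(2, 29, 512)
hop = torch.randn(2, 29, 512)
enhanced = CrossModalAttention()(hvp, hop)  # (2, 29, 512) enhanced lip features
```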


Abstract

The invention discloses a lip language recognition method and system based on cross-modal attention enhancement. The method comprises the steps of: extracting a lip image sequence and lip motion information; obtaining the corresponding lip feature sequence and lip motion sequence through a pre-trained feature extractor; inputting the obtained feature sequences into a cross-modal attention network to obtain a lip enhancement feature sequence; and, through a multi-branch attention mechanism, establishing the temporal relevance of the intra-modal feature sequence and selecting the relevant information in the input at the output end. The method takes the relevance between temporal information into account: optical flow is computed on adjacent frames to obtain the motion information between the visual features, the lip visual features are fused with and enhanced by this motion information so that the context information within the modality is fully utilized, and finally the correlation representation and selection of the intra-modal features are carried out through the multi-branch attention mechanism, thereby improving the lip reading recognition accuracy.
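The abstract does not spell out the form of the multi-branch attention mechanism; a common reading is multi-head self-attention that models temporal relevance within each modality, followed by attention-based selection of the relevant time steps at the output end. The sketch below follows that reading; the class name, layer counts, and the 500-class output are illustrative assumptions, not the patented design.

```python
import torch
import torch.nn as nn

class IntraModalTemporalModel(nn.Module):
    """Sketch: multi-head ("multi-branch") self-attention establishes the
    temporal relevance within the enhanced intra-modal feature sequence;
    an attention-pooling layer then selects the relevant information from
    the input at the output end before classification."""

    def __init__(self, d_model=512, n_heads=8, n_classes=500):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(d_model, 1)        # per-frame relevance score
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, x):
        # x: (batch, T, d_model) cross-modally enhanced lip feature sequence
        h = self.encoder(x)                        # intra-modal temporal modelling
        w = torch.softmax(self.score(h), dim=1)    # (batch, T, 1) attention weights
        pooled = (w * h).sum(dim=1)                # weighted selection over time
        return self.classifier(pooled)             # (batch, n_classes) logits
```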

Description

Technical field

[0001] The invention relates to computer vision and pattern recognition technology, and in particular to a lip language recognition method and system based on cross-modal attention enhancement.

Background technique

[0002] Lip language recognition refers to understanding what a speaker is saying by capturing the movement information of the speaker's lips, which carries a large amount of useful speech information. In practical human-computer interaction environments, facial motion information is acquired through video and is not affected by complex environmental noise, so lip recognition can serve as an effective solution for recognizing speech content when no audio input is available or when the noise level is high. A lip reading system has a variety of valuable applications: it can assist speech recognition, help resolve the problem of multiple speakers talking simultaneously, and enable more intelligent and robust human-computer interaction; i...


Application Information

IPC (8): G06K9/00, G06K9/62
CPC: G06F18/22
Inventor: 李树涛 (Li Shutao), 宋启亚 (Song Qiya), 孙斌 (Sun Bin)
Owner: HUNAN UNIV