Cross-modal lip language recognition method

A cross-modal recognition method, applied in the field of recognition, that addresses three problems of existing approaches: reliance on video input alone, the high cost of labeled data, and failure to learn well-separated visual features. Its stated effects are better visual features, good generalization and robustness, and improved performance.

Pending Publication Date: 2021-12-28

西安电子科技大学广州研究院


AI Technical Summary

Problems solved by technology

[0003] Traditional lip language recognition methods focus only on video input and, without guidance from additional prior knowledge, cannot learn well-separated visual features. As a result, they usually rely on large amounts of accurately labeled data, which is prohibitively expensive to obtain in practice.

Embodiment Construction

[0042] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention.

[0043] The invention provides a cross-modal lip language recognition method, comprising:

[0044] S1, data preprocessing:

[0045] For video data, first detect 68 facial key points, then normalize each face image to a frontal view through an affine transformation, and finally crop out the lip region;

[0046] For audio data, first down-sample to 16 kHz and convert to Mel-frequency cepstral coefficient (MFCC) features; then normalize the MFCC vectors at all time steps and stack them in time order into a feature matrix;
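The two preprocessing branches of S1 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation. It assumes the common 68-point landmark ordering in which indices 36 to 47 cover the eyes and 48 to 67 the mouth (the source names no detector), it substitutes a two-point similarity transform (a restricted affine transform) for the unspecified affine normalization, and it assumes per-coefficient mean and variance normalization for the audio features. All function names, target coordinates and the margin value are hypothetical. Only landmark coordinates are transformed here; warping the image pixels and computing the MFCCs themselves would need an image or audio library.

```python
from statistics import fmean, pstdev

# Assumed 68-point landmark index ranges (not specified in the source).
EYE_A = range(36, 42)
EYE_B = range(42, 48)
MOUTH = range(48, 68)


def centroid(points, idx):
    """Centre of the selected landmarks, stored as a complex number x + yj."""
    return complex(fmean(points[i][0] for i in idx),
                   fmean(points[i][1] for i in idx))


def align_landmarks(points,
                    eye_a_target=complex(30, 40),
                    eye_b_target=complex(70, 40)):
    """Map landmarks into a canonical frame with z -> a*z + b.

    The complex factor a encodes rotation and scale, b the translation.
    Both are fixed by sending the two eye centres to hypothetical targets.
    """
    src_a, src_b = centroid(points, EYE_A), centroid(points, EYE_B)
    a = (eye_b_target - eye_a_target) / (src_b - src_a)
    b = eye_a_target - a * src_a
    aligned = []
    for x, y in points:
        z = a * complex(x, y) + b
        aligned.append((z.real, z.imag))
    return aligned


def lip_box(points, margin=0.2):
    """Bounding box (x0, y0, x1, y1) around the mouth landmarks, with padding."""
    xs = [points[i][0] for i in MOUTH]
    ys = [points[i][1] for i in MOUTH]
    pad_x = margin * (max(xs) - min(xs))
    pad_y = margin * (max(ys) - min(ys))
    return (min(xs) - pad_x, min(ys) - pad_y, max(xs) + pad_x, max(ys) + pad_y)


def normalize_mfcc(frames):
    """Normalize each coefficient over time; rows stay in time order (T x D)."""
    columns = list(zip(*frames))
    means = [fmean(c) for c in columns]
    # A constant coefficient has zero deviation; divide by 1.0 in that case.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(frame, means, stds)]
            for frame in frames]
```

Calling `lip_box(align_landmarks(points))` on one frame's landmarks gives the crop rectangle in the canonical frame, and `normalize_mfcc` takes one coefficient vector per time step and returns the normalized feature matrix.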

[0047] S2, ...


Abstract

The invention provides a cross-modal lip language recognition method comprising: S1, data preprocessing: acquiring the lip region of the video data and a feature matrix of the audio data; S2, model training: sequentially performing speaker recognition task training, cross-modal contrastive learning, and standardization of model parameters and lip language features, among other steps, until the model converges; and S3, model deployment: taking as input only a video sequence to be recognized that is not in the training data, obtaining the speaker's lip features with the visual recognition branch, standardizing the lip language features, and finally mapping the lip language features to characters. The method can extract more discriminative visual features without additional manual annotation, generalizes better and is more robust in feature extraction, can be used across speakers, and does not need a separate set of model parameters trained for each category of samples.
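The abstract does not state which loss the cross-modal contrastive learning step uses. As one common possibility, and purely as an assumption, the sketch below computes an InfoNCE-style loss that pulls each video embedding toward the audio embedding of the same clip and pushes it away from the other clips in the batch. The function names and the temperature value are hypothetical, and the embeddings are plain lists rather than outputs of the patent's network branches.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(video_embs, audio_embs, temperature=0.1):
    """Mean cross-entropy of matching video i to audio i within a batch.

    Similarity is the cosine similarity divided by a temperature; the
    matching pair (same index) is treated as the positive, all others
    in the batch as negatives.
    """
    v = [l2_normalize(e) for e in video_embs]
    a = [l2_normalize(e) for e in audio_embs]
    loss = 0.0
    for i, vi in enumerate(v):
        logits = [sum(x * y for x, y in zip(vi, aj)) / temperature for aj in a]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(v)
```

With such a loss, a batch whose video and audio embeddings agree pair by pair scores lower than the same batch with the audio order shuffled, which is the signal that lets the audio branch guide the visual branch without manual labels. At deployment only the visual branch would be needed, consistent with S3.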

Description

Technical field

[0001] The invention relates to the field of recognition, in particular to a cross-modal lip language recognition method.

Background

[0002] Lip language recognition is a visual language recognition technology that mainly uses the lip movement information in video, combined with language prior knowledge and context information. It plays an important role in both language understanding and communication, and is often used when effective audio information is not available. It also has high application value and can be applied to the treatment of speech-impaired patients, security, military equipment and human-computer interaction.

[0003] The limitation of traditional lip language recognition methods is that they focus only on video input and cannot learn well-separated visual features without the guidance of additional prior knowledge. Therefore, these m...


Application Information


IPC(8): G10L15/25, G10L15/16, G10L15/20, G06K9/00, G06N3/04, G06N3/08

CPC: G10L15/25, G10L15/16, G10L15/20, G06N3/084, G06N3/088, G06N3/045

Inventor: 梁雪峰, 黄奕洋

Owner: 西安电子科技大学广州研究院
