Multi-mode lip reading method based on facial physiological information

A multi-modal lip reading technology based on facial physiological information, applied in 3D modeling, image data processing, computer parts and other directions. It addresses problems of existing methods such as not involving the human internal vocal mechanism, lip movement feature extraction remaining at the level of surface observation, and the internal relationships of the three-dimensional point cloud being unclear.

Pending Publication Date: 2019-08-09
TIANJIN UNIV

AI Technical Summary

Problems solved by technology

Although the lip movement feature extraction method that integrates depth information has largely made up for the shortcomings of feature extraction based on two-dimensional image information, some problems remain to be solved, such as the internal relationships of the three-dimensional point cloud being unclear.



Examples


Embodiment 1

[0052] An embodiment of the present invention provides a multimodal lip reading method based on facial physiological information (see Figure 1). The method includes the following steps:

[0053] 101: Kinect-based multi-modal data acquisition and preprocessing;

[0054] 102: Facial muscle model establishment;

[0055] 103: Lip movement feature extraction based on depth information;

[0056] 104: Lip reading recognition based on DenseNet.

[0057] In one embodiment, step 101 synchronously collects audio data, color image data, and depth data during the speaker's lip movement, and then preprocesses the collected data. The specific method is as follows:

[0058] Use the P2FA tool to force-align the audio, and segment the color images and 3D depth information according to the alignment results. For color image data, first use a cascade classifier based on the OpenCV vision library to detect the face in the image and determine the position of the speaker's face.
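For illustration, the following is a minimal sketch of this face-detection and lip-positioning step, assuming OpenCV's bundled frontal-face Haar cascade (the patent names an OpenCV cascade classifier but not a specific model); the locate_lip_region helper and the lower-third mouth crop are hypothetical.

```python
import cv2

# Minimal sketch of step 101's face detection and lip-area positioning,
# assuming OpenCV's bundled frontal-face Haar cascade. The lower-third
# mouth crop is an illustrative heuristic, not the patent's exact region.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def locate_lip_region(bgr_frame):
    """Detect the largest face in a BGR frame and return a rough mouth crop."""
    gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                                   # no face in this frame
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])    # keep largest face
    return bgr_frame[y + 2 * h // 3 : y + h, x : x + w]   # lower third ~ mouth
```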

Embodiment 2

[0064] The scheme in Embodiment 1 is further described below in conjunction with specific calculation formulas and examples; see the following description for details:

[0065] 201: After the multimodal data is collected, the data must first be preprocessed: the audio is force-aligned, and the color images and 3D depth information are segmented according to the alignment result;

[0066] 202: Perform face detection, lip area positioning, and data expansion on color image data;

[0067] Here, brightness variation is used for data expansion; the embodiment of the present invention uses the gamma transformation to correct the color image information, as shown in formula (1).

[0068] s = c·g^γ    (1)

[0069] In the formula, c and γ are both positive real numbers, g represents the gray value of the input image, and s represents the transformed gray value. If γ is greater than 1, the grayscale of the brighter areas in the image is stretched and the grayscale of the darker areas is compressed, so the image becomes darker overall; if γ is less than 1, the opposite holds and the image becomes brighter.
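As a worked example, the following is a minimal sketch of applying formula (1) per pixel in Python with NumPy; the gamma_transform helper and the default values c = 1.0 and γ = 1.5 are illustrative assumptions, since the patent only requires c and γ to be positive.

```python
import numpy as np

# Minimal sketch of formula (1), s = c * g**gamma, applied per pixel to a
# grayscale image normalized to [0, 1]. The values c = 1.0 and gamma = 1.5
# are illustrative; the patent only requires c and gamma to be positive.
def gamma_transform(gray_u8, c=1.0, gamma=1.5):
    g = gray_u8.astype(np.float32) / 255.0     # input gray values g in [0, 1]
    s = c * np.power(g, gamma)                 # formula (1)
    return np.clip(s * 255.0, 0, 255).astype(np.uint8)

# Varying gamma above and below 1 darkens or brightens the frame, which is
# how brightness-changed copies can be generated for data expansion.
```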

Embodiment 3

[0129] Below, in conjunction with concrete experimental data, the feasibility of the schemes in Embodiments 1 and 2 is verified; see the following description for details:

[0130] The embodiment of the present invention uses DenseNet for lip reading recognition with temporal continuity for the first time, and proposes a new method of preserving the temporal continuity of images by splicing. Color image data of 8 speakers pronouncing the vowels /a/, /o/, /e/, /i/ and /u/ is used to demonstrate the feasibility of the network model for lip reading recognition and the effectiveness of the splicing method for preserving temporal continuity.

[0131] The obtained classification results are shown in Figure 7. The recognition rate over the five vowels reached 99.17%, and the recognition rates of the syllables /a/ and /e/ both reached 100%. This result shows that part of the temporal information can be preserved through image splicing. In addition, the DenseNet network structure used in the present invention...
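To make the splicing idea concrete, the following is a minimal sketch that tiles consecutive lip frames into a single grid image and feeds it to a DenseNet classifier; torchvision's densenet121, the 2×2 grid, the 112×112 crop size, and the splice_frames helper are stand-in assumptions, not the patent's exact network or its modified fully connected fusion layer.

```python
import torch
import torchvision

# Minimal sketch of the splicing idea: tile four consecutive lip frames into
# one 2x2 grid so a 2D DenseNet sees them jointly, preserving some temporal
# order. densenet121, the grid size, and the crop size are assumptions.
def splice_frames(frames):
    """frames: four [3, H, W] tensors -> one [3, 2H, 2W] grid image."""
    top = torch.cat(frames[:2], dim=2)       # first two frames side by side
    bottom = torch.cat(frames[2:], dim=2)    # last two frames side by side
    return torch.cat([top, bottom], dim=1)   # stack the two rows vertically

model = torchvision.models.densenet121(num_classes=5)   # 5 vowel classes
frames = [torch.randn(3, 112, 112) for _ in range(4)]   # dummy lip crops
logits = model(splice_frames(frames).unsqueeze(0))      # shape [1, 5]
```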



Abstract

The invention discloses a multi-mode lip reading method based on facial physiological information, which comprises the following steps: collecting color images and 3D depth information with Kinect and preprocessing them; establishing a facial muscle model, and mapping the starting point and end point of each muscle onto a standard three-dimensional face model through feature point matching, in combination with the position information of the six muscles; extracting geometric features and angle features of the lips based on the depth information; extracting, according to the facial muscle model, muscle length features that characterize muscle stretching and muscle physiological features that characterize how cooperation between muscles affects feature point displacement; and performing multi-modal speech recognition based on DenseNet, improving the fully connected layer of the DenseNet to fuse the color images and depth information, and classifying the features. The method effectively overcomes the defects of traditional feature extraction methods based on two-dimensional images.

Description

Technical Field

[0001] The invention relates to the field of computer intelligent recognition, in particular to multimodal data collection, feature extraction based on depth information, and multimodal speech recognition, and specifically to a multimodal lip reading method based on facial physiological information.

Background Technique

[0002] Lip reading research mainly consists of three modules: lip area detection and positioning, lip movement feature extraction, and training and recognition.

[0003] For lip area positioning, early methods roughly determined the lip area from the geometric characteristics of the face, that is, the average standard ratio of face length to face width. The currently popular method is based on color information; its core is to use a color space transformation to separate chromaticity and then segment the image according to color range information. Satisfactory detection rates were obtained using the HSV space...
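As a rough illustration of this color-based approach, the following is a minimal sketch of HSV range thresholding; the lip_mask_hsv helper and the color bounds are hypothetical, not values taken from the patent or the cited literature.

```python
import cv2
import numpy as np

# Minimal sketch of the color-based lip localization described above:
# transform to HSV to separate chromaticity, then threshold a color range.
# The hue/saturation/value bounds are illustrative guesses for reddish lip
# tones, not values from the patent.
def lip_mask_hsv(bgr_frame):
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 60, 60], dtype=np.uint8)     # assumed lower bound
    upper = np.array([12, 255, 255], dtype=np.uint8)  # assumed upper bound
    return cv2.inRange(hsv, lower, upper)   # binary mask of lip-colored pixels
```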


Application Information

IPC(8): G06K9/00; G06K9/62; G06T3/40; G06T7/41; G06T17/00
CPC: G06T17/00; G06T3/4038; G06T7/41; G06T2207/10024; G06T2207/30201; G06T2200/04; G06V20/64; G06V40/171; G06V40/20; G06V10/751; G06F18/253
Inventors: 徐天一, 朱雨朦, 高洁, 刘志强, 赵满坤, 王建荣, 李雪威, 杨帆
Owner: TIANJIN UNIV