Multi-mode lip reading method based on facial physiological information

A multi-modal lip reading technology based on facial physiological information, applied in 3D modeling, image data processing, computer parts and other directions. It addresses problems of existing methods such as not involving the human internal vocal mechanism, lip movement feature extraction remaining at the level of surface observation, and the internal relationships of the three-dimensional point cloud being unclear.

Pending Publication Date: 2019-08-09
TIANJIN UNIV

AI Technical Summary

Problems solved by technology

Although the lip movement feature extraction method that integrates depth information has largely made up for the shortcomings of feature extraction based on two-dimensional image information, some problems remain to be solved, such as the internal relationships of the three-dimensional point cloud being unclear.



Examples


Embodiment 1

[0052] An embodiment of the present invention provides a multimodal lip reading method based on facial physiological information (see Figure 1). The method includes the following steps:

[0053] 101: Kinect-based multi-modal data acquisition and preprocessing;

[0054] 102: Facial muscle model establishment;

[0055] 103: Lip movement feature extraction based on depth information;

[0056] 104: Lip reading recognition based on DenseNet.

[0057] In one embodiment, step 101 synchronously collects audio data, color image data, and depth data during the speaker's lip movement, and then preprocesses the collected data. The specific method is as follows:

[0058] Use the P2FA tool to force-align the audio, and segment the color images and 3D depth information according to the alignment results. For color image data, first use a cascade classifier based on the OpenCV vision library to detect the face in the image and determine the position of the speaker's face.
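For illustration, the following is a minimal sketch of this face-detection and lip-positioning step, assuming OpenCV's bundled frontal-face Haar cascade (the patent names an OpenCV cascade classifier but not a specific model); the locate_lip_region helper and the lower-third mouth crop are hypothetical.

```python
import cv2

# Minimal sketch of step 101's face detection and lip-area positioning,
# assuming OpenCV's bundled frontal-face Haar cascade. The lower-third
# mouth crop is an illustrative heuristic, not the patent's exact region.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def locate_lip_region(bgr_frame):
    """Detect the largest face in a BGR frame and return a rough mouth crop."""
    gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                                   # no face in this frame
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])    # keep largest face
    return bgr_frame[y + 2 * h // 3 : y + h, x : x + w]   # lower third ~ mouth
```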

Embodiment 2

[0064] The scheme in Embodiment 1 is further described below in conjunction with specific calculation formulas and examples; see the following description for details:

[0065] 201: After the multimodal data is collected, the data must first be preprocessed: the audio is force-aligned, and the color images and 3D depth information are segmented according to the alignment result;

[0066] 202: Perform face detection, lip area positioning, and data expansion on color image data;

[0067] Here, brightness variation is used for data expansion; the embodiment of the present invention uses the gamma transformation to correct the color image information, as shown in formula (1).

[0068] s = c·g^γ    (1)

[0069] In the formula, c and γ are both positive real numbers, g represents the gray value of the input image, and s represents the transformed gray value. If γ is greater than 1, the grayscale of the brighter areas in the image is stretched and the grayscale of the darker areas is compressed, so the image becomes darker overall; if γ is less than 1, the opposite holds and the image becomes brighter.
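As a worked example, the following is a minimal sketch of applying formula (1) per pixel in Python with NumPy; the gamma_transform helper and the default values c = 1.0 and γ = 1.5 are illustrative assumptions, since the patent only requires c and γ to be positive.

```python
import numpy as np

# Minimal sketch of formula (1), s = c * g**gamma, applied per pixel to a
# grayscale image normalized to [0, 1]. The values c = 1.0 and gamma = 1.5
# are illustrative; the patent only requires c and gamma to be positive.
def gamma_transform(gray_u8, c=1.0, gamma=1.5):
    g = gray_u8.astype(np.float32) / 255.0     # input gray values g in [0, 1]
    s = c * np.power(g, gamma)                 # formula (1)
    return np.clip(s * 255.0, 0, 255).astype(np.uint8)

# Varying gamma above and below 1 darkens or brightens the frame, which is
# how brightness-changed copies can be generated for data expansion.
```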

Embodiment 3

[0129] Below, in conjunction with concrete experimental data, the feasibility of the schemes in Embodiments 1 and 2 is verified; see the following description for details:

[0130] The embodiment of the present invention uses DenseNet for lip reading recognition with temporal continuity for the first time, and proposes a new method of preserving the temporal continuity of images by splicing. Color image data of 8 speakers pronouncing the vowels /a/, /o/, /e/, /i/ and /u/ is used to demonstrate the feasibility of the network model for lip reading recognition and the effectiveness of the splicing method for preserving temporal continuity.

[0131] The obtained classification results are shown in Figure 7. The recognition rate over the five vowels reached 99.17%, and the recognition rates of the syllables /a/ and /e/ both reached 100%. This result shows that part of the temporal information can be preserved through image splicing. In addition, the DenseNet network structure used in the present invention...
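To make the splicing idea concrete, the following is a minimal sketch that tiles consecutive lip frames into a single grid image and feeds it to a DenseNet classifier; torchvision's densenet121, the 2×2 grid, the 112×112 crop size, and the splice_frames helper are stand-in assumptions, not the patent's exact network or its modified fully connected fusion layer.

```python
import torch
import torchvision

# Minimal sketch of the splicing idea: tile four consecutive lip frames into
# one 2x2 grid so a 2D DenseNet sees them jointly, preserving some temporal
# order. densenet121, the grid size, and the crop size are assumptions.
def splice_frames(frames):
    """frames: four [3, H, W] tensors -> one [3, 2H, 2W] grid image."""
    top = torch.cat(frames[:2], dim=2)       # first two frames side by side
    bottom = torch.cat(frames[2:], dim=2)    # last two frames side by side
    return torch.cat([top, bottom], dim=1)   # stack the two rows vertically

model = torchvision.models.densenet121(num_classes=5)   # 5 vowel classes
frames = [torch.randn(3, 112, 112) for _ in range(4)]   # dummy lip crops
logits = model(splice_frames(frames).unsqueeze(0))      # shape [1, 5]
```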



Abstract

The invention discloses a multi-mode lip reading method based on facial physiological information, which comprises the following steps: collecting color images and 3D depth information with Kinect and preprocessing them; establishing a facial muscle model, and mapping the starting point and end point of each muscle onto a standard three-dimensional face model through feature point matching, in combination with the position information of the six muscles; extracting geometric features and angle features of the lips based on the depth information; extracting, according to the facial muscle model, muscle length features that characterize muscle stretching and muscle physiological features that characterize how cooperation between muscles affects feature point displacement; and performing multi-modal speech recognition based on DenseNet, improving the fully connected layer of the DenseNet to fuse the color images and depth information, and classifying the features. The method effectively overcomes the defects of traditional feature extraction methods based on two-dimensional images.

Description

Technical Field

[0001] The invention relates to the field of computer intelligent recognition, in particular to multimodal data collection, feature extraction based on depth information, and multimodal speech recognition, and specifically to a multimodal lip reading method based on facial physiological information.

Background Technique

[0002] Lip reading research mainly consists of three modules: lip area detection and positioning, lip movement feature extraction, and training and recognition.

[0003] For lip area positioning, early methods roughly determined the lip area from the geometric characteristics of the face, that is, the average standard ratio of face length to face width. The currently popular method is based on color information; its core is to use a color space transformation to separate chromaticity and then segment the image according to color range information. Satisfactory detection rates were obtained using the HSV space...
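As a rough illustration of this color-based approach, the following is a minimal sketch of HSV range thresholding; the lip_mask_hsv helper and the color bounds are hypothetical, not values taken from the patent or the cited literature.

```python
import cv2
import numpy as np

# Minimal sketch of the color-based lip localization described above:
# transform to HSV to separate chromaticity, then threshold a color range.
# The hue/saturation/value bounds are illustrative guesses for reddish lip
# tones, not values from the patent.
def lip_mask_hsv(bgr_frame):
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 60, 60], dtype=np.uint8)     # assumed lower bound
    upper = np.array([12, 255, 255], dtype=np.uint8)  # assumed upper bound
    return cv2.inRange(hsv, lower, upper)   # binary mask of lip-colored pixels
```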


Application Information

IPC(8): G06K9/00; G06K9/62; G06T3/40; G06T7/41; G06T17/00
CPC: G06T17/00; G06T3/4038; G06T7/41; G06T2207/10024; G06T2207/30201; G06T2200/04; G06V20/64; G06V40/171; G06V40/20; G06V10/751; G06F18/253
Inventors: 徐天一, 朱雨朦, 高洁, 刘志强, 赵满坤, 王建荣, 李雪威, 杨帆
Owner: TIANJIN UNIV