Cross-modal lip language recognition method
A cross-modal lip language recognition method in the field of recognition technology. It addresses three problems of existing methods: they rely only on video input, they are costly, and they fail to learn sufficiently discriminative (visually separable) features. The method learns good visual features, improves recognition performance, and achieves good generalization and robustness.
Embodiment Construction
[0042] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention.
[0043] The invention provides a cross-modal lip recognition method, comprising the following steps:
[0044] S1, data preprocessing:
[0045] For video data, 68 facial landmarks are first detected, each face image is normalized to a frontal view by an affine transformation, and the lip region is then cropped;
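The video step can be illustrated with a minimal NumPy sketch. It assumes the 68 landmarks come from an external detector (not shown) and are given as (x, y) pixel coordinates, that a frontal "template" landmark layout is available, that the affine matrix is fitted by least squares, that the image is warped with nearest-neighbour sampling, and that the mouth is points 48 to 67 of the common 68-point layout. The output size (256 × 256) and the 88 × 88 lip crop are hypothetical choices; the source specifies none of these details.

```python
import numpy as np

# Mouth points in the common 68-point landmark layout (assumption, 0-indexed).
MOUTH = slice(48, 68)

def estimate_affine(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 2x3 affine matrix A such that dst ~= src @ A[:, :2].T + A[:, 2]."""
    X = np.hstack([src, np.ones((src.shape[0], 1))])  # (N, 3) homogeneous points
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)      # (3, 2)
    return sol.T                                       # (2, 3)

def apply_affine(A: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map (N, 2) points through the 2x3 affine matrix A."""
    return pts @ A[:, :2].T + A[:, 2]

def warp_image(img: np.ndarray, A: np.ndarray, out_shape) -> np.ndarray:
    """Warp img by A: each output pixel is inverse-mapped and sampled nearest-neighbour."""
    Minv = np.linalg.inv(np.vstack([A, [0.0, 0.0, 1.0]]))
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # (3, h*w) as (x, y, 1)
    src = Minv @ coords
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros((h * w,) + img.shape[2:], dtype=img.dtype)  # out-of-range pixels stay 0
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape((h, w) + img.shape[2:])

def crop_lips(img, landmarks, template, out_shape=(256, 256), crop=(88, 88)):
    """Align the face to the frontal template, then crop a fixed box around the mouth."""
    A = estimate_affine(landmarks, template)
    aligned = warp_image(img, A, out_shape)
    cx, cy = apply_affine(A, landmarks)[MOUTH].mean(axis=0)  # mouth centre after alignment
    ch, cw = crop
    top = min(max(int(round(cy - ch / 2)), 0), out_shape[0] - ch)   # clamp box inside image
    left = min(max(int(round(cx - cw / 2)), 0), out_shape[1] - cw)
    return aligned[top:top + ch, left:left + cw]
```

Fitting a full affine matrix to all 68 points (rather than a few anchor points) spreads landmark detection noise across the fit; a similarity transform would be a stricter alternative if shear should be excluded.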
[0046] For audio data, the signal is first down-sampled to 16 kHz and converted into Mel-frequency cepstral coefficient (MFCC) features; the MFCC vectors of all time steps are then normalized and stacked in temporal order to form a feature matrix;
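The audio step can be sketched in plain NumPy as follows. Several details are assumptions not stated in the source: FFT-based (band-limited) resampling, a 25 ms Hamming window with a 10 ms hop, a 512-point FFT, 40 triangular mel filters, 13 coefficients from an orthonormal DCT-II, and per-coefficient mean/variance normalization over time as the meaning of "normalized". The function names are hypothetical.

```python
import numpy as np

TARGET_SR = 16_000  # target sampling rate stated in the source

def resample_fft(x: np.ndarray, sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Band-limited resampling by truncating (or zero-padding) the spectrum (assumed method)."""
    n_new = int(round(len(x) * target_sr / sr))
    spec = np.fft.rfft(x)
    n_bins = n_new // 2 + 1
    spec = spec[:n_bins] if n_bins <= len(spec) else np.pad(spec, (0, n_bins - len(spec)))
    return np.fft.irfft(spec, n=n_new) * (n_new / len(x))  # rescale to keep amplitude

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels: int, n_fft: int, sr: int) -> np.ndarray:
    """Triangular filters equally spaced on the mel scale, shape (n_mels, n_fft//2 + 1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_matrix(n_out: int, n_in: int) -> np.ndarray:
    """Orthonormal DCT-II basis, shape (n_out, n_in)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    mat = np.cos(np.pi / n_in * (n + 0.5) * k) * np.sqrt(2.0 / n_in)
    mat[0] /= np.sqrt(2.0)
    return mat

def mfcc_matrix(x, sr, n_mfcc=13, n_mels=40, n_fft=512, win=400, hop=160):
    """Return a (T, n_mfcc) matrix: one normalized MFCC vector per frame, in time order."""
    x = np.asarray(x, dtype=np.float64)
    if sr != TARGET_SR:
        x = resample_fft(x, sr)
    if len(x) < win:
        x = np.pad(x, (0, win - len(x)))
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(win)                          # (T, win)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, TARGET_SR).T + 1e-10)
    coeffs = log_mel @ dct_matrix(n_mfcc, n_mels).T            # (T, n_mfcc)
    # Per-coefficient mean/variance normalization over time (assumed form of "normalized").
    return (coeffs - coeffs.mean(axis=0)) / (coeffs.std(axis=0) + 1e-8)
```

For one second of 44.1 kHz audio, this resamples to 16 000 samples and yields a 98 × 13 matrix, one row per 10 ms frame.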
[0047] S2, ...