Face liveness detection method, apparatus, device and storage medium based on a multimodal large language model
The face image to be identified and a target auxiliary-modality image are fused in a multimodal large language model, and the fused features are processed with a visual token mask. This addresses the insufficient generalization ability and poor interpretability of traditional face liveness detection methods, and enables more efficient attack-region localization and a more comprehensive response to attacks.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: CREATOR CHINA TCH CO
- Filing Date: 2025-03-20
- Publication Date: 2026-06-19
AI Technical Summary
Traditional face liveness detection methods generalize poorly, are hard to interpret, and cannot localize attack regions from coarse to fine granularity.
The method performs face liveness detection with a multimodal large language model. The face image to be identified and the target auxiliary-modality image are input into the model and fused. A visual token mask then randomly masks features of the fused image to strengthen the model's representation ability. The traditional detection task is extended to four sub-tasks: coarse-grained classification, fine-grained classification, reasoning, and attack localization.
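The fusion and random masking steps can be illustrated with a minimal Python sketch. It is not the patented implementation: the summary does not say how fusion is performed or what the mask ratio is, so this sketch assumes fusion is a simple concatenation of the two token sequences and that masked tokens are replaced by zero vectors. The names `fuse_tokens`, `random_token_mask`, and `mask_ratio` are hypothetical.

```python
import random


def fuse_tokens(face_tokens, aux_tokens):
    """Fuse two visual token sequences.

    Assumption: fusion is plain concatenation; the source does not
    specify the actual fusion mechanism.
    """
    return list(face_tokens) + list(aux_tokens)


def random_token_mask(tokens, mask_ratio=0.3, rng=None):
    """Randomly replace a fraction of tokens with zero vectors.

    tokens     : list of feature vectors (lists of floats)
    mask_ratio : fraction of tokens to mask (hypothetical value)
    rng        : optional random.Random for reproducibility
    Returns the masked sequence and the sorted masked indices.
    """
    rng = rng or random.Random()
    n_mask = int(round(len(tokens) * mask_ratio))
    masked_idx = set(rng.sample(range(len(tokens)), n_mask))
    masked = [
        [0.0] * len(tok) if i in masked_idx else list(tok)
        for i, tok in enumerate(tokens)
    ]
    return masked, sorted(masked_idx)


if __name__ == "__main__":
    # Toy tokens: 6 from the face image, 4 from the auxiliary modality.
    face = [[1.0, 2.0] for _ in range(6)]
    aux = [[3.0, 4.0] for _ in range(4)]
    fused = fuse_tokens(face, aux)
    masked, idx = random_token_mask(fused, mask_ratio=0.3, rng=random.Random(0))
    print("masked positions:", idx)
```

In a training setup, the masked sequence would be passed to the language model in place of the full fused sequence, so the model has to rely on the remaining tokens from both modalities. How the masked tokens are actually consumed is not described in the summary.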
The method improves the security and reliability of face liveness detection: it copes with a wider range of attack methods and improves the model's generalization ability and interpretability.
Smart Images

Figure CN120388404B_ABST