Face liveness detection method, apparatus, device, and storage medium based on a multimodal large language model

The face image to be identified and a target auxiliary modality image are fused in a multimodal large language model and processed with a visual token mask. This addresses the insufficient generalization ability and poor interpretability of traditional face liveness detection methods, and enables more efficient attack region localization and a more comprehensive response to attacks.

CN120388404B · Active · Publication Date: 2026-06-19 · CREATOR CHINA TCH CO +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: CREATOR CHINA TCH CO
Filing Date: 2025-03-20
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Traditional face liveness detection methods suffer from insufficient model generalization ability, poor model interpretability, and a lack of coarse-to-fine granular localization of attack regions.

Method used

A face liveness detection method based on a multimodal large language model is adopted. The face image to be identified and the target auxiliary modality image are input into the multimodal large language model for fusion. A visual token mask is used to randomly mask the features of the fused image to enhance the model's representation ability. The traditional detection task is extended to four sub-tasks: coarse-grained classification, fine-grained classification, reasoning, and attack localization.
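The random masking step described above can be illustrated with a minimal sketch. It assumes the fused image is already represented as a list of token feature vectors; the function name `mask_visual_tokens`, the `mask_ratio` and `mask_value` parameters, and the choice to overwrite masked tokens with a constant are all hypothetical and are not taken from the patent, which does not specify how the mask is implemented.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mask_visual_tokens(
    tokens: Sequence[Sequence[float]],
    mask_ratio: float = 0.3,          # assumed hyperparameter, not from the patent
    mask_value: float = 0.0,          # assumed fill value for masked tokens
    rng: Optional[random.Random] = None,
) -> Tuple[List[List[float]], List[int]]:
    """Randomly overwrite a fraction of visual tokens with a constant vector.

    Returns a new token list (the input is not modified) and the sorted
    indices of the tokens that were masked.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be between 0 and 1")
    rng = rng or random.Random()

    n = len(tokens)
    num_masked = int(round(n * mask_ratio))
    # Sample distinct token positions without replacement.
    masked_idx = set(rng.sample(range(n), num_masked))

    out: List[List[float]] = []
    for i, tok in enumerate(tokens):
        if i in masked_idx:
            out.append([mask_value] * len(tok))  # masked token
        else:
            out.append(list(tok))                # copy of the original token
    return out, sorted(masked_idx)
```

In a training loop, such a mask would typically be applied only during training so that the model learns representations that do not depend on any single region of the fused features; whether the patented method does this is not stated in the summary.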

Benefits of technology

The method improves the security and reliability of face liveness detection, copes with a wider range of attack methods, and strengthens the model's generalization ability and interpretability.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN120388404B_ABST
Patent Text Reader

Abstract

This application discloses a method, apparatus, device, and storage medium for face liveness detection based on a multimodal large language model, relating to the field of image detection technology. The method includes: inputting a face image to be identified, a target auxiliary modality image, and a user command into a trained multimodal large language model to obtain a multi-task output result for the user command. The multimodal large language model includes a visual token mask, and the multi-task output result includes at least one of coarse-grained classification results, fine-grained classification results, causal reasoning results, and attack localization results. This application can improve the security and reliability of face liveness detection.
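The multi-task output result described in the abstract can be pictured as a record with four optional fields, at least one of which must be present. The sketch below is only an illustration under that reading: the class name `MultiTaskOutput`, the field names, the example label strings, and the bounding-box representation of the attack location are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MultiTaskOutput:
    """Hypothetical container for the four sub-task results."""

    coarse_label: Optional[str] = None   # e.g. "live" or "attack" (assumed labels)
    fine_label: Optional[str] = None     # e.g. a specific attack type (assumed)
    reasoning: Optional[str] = None      # free-text causal reasoning result
    attack_box: Optional[Tuple[int, int, int, int]] = None  # assumed (x1, y1, x2, y2)

    def __post_init__(self) -> None:
        # The abstract requires at least one of the four results.
        if not self.present_tasks():
            raise ValueError("at least one sub-task result is required")

    def present_tasks(self) -> List[str]:
        """Names of the sub-task fields that carry a result."""
        fields = {
            "coarse_label": self.coarse_label,
            "fine_label": self.fine_label,
            "reasoning": self.reasoning,
            "attack_box": self.attack_box,
        }
        return [name for name, value in fields.items() if value is not None]
```

For example, `MultiTaskOutput(coarse_label="attack", attack_box=(10, 20, 50, 60))` would represent a response to a user command that asked only for a coarse classification and a localization.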