Face liveness detection method, apparatus, device, and storage medium based on a multimodal large language model

The face image to be identified and a target auxiliary modality image are fed into a multimodal large language model for fusion, and a visual token mask is applied to the fused features. This addresses the insufficient generalization ability and poor interpretability of traditional face liveness detection methods, and achieves more efficient attack region localization and more comprehensive attack response capabilities.

CN120388404B · Active · Publication Date: 2026-06-19 · CREATOR CHINA TCH CO +1


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CREATOR CHINA TCH CO
Filing Date: 2025-03-20
Publication Date: 2026-06-19

Application Information

Patent Timeline

20 Mar 2025: Application
19 Jun 2026: Publication, CN120388404B

IPC: G06V40/16; G06V10/774; G06V10/46; G06V10/50; G06V10/80; G06V10/82; G06N3/096; G06V40/40

AI Tagging

Application Domain

Biological models; Spoof detection

AI Technical Summary

Technical Problem

Traditional face liveness detection methods suffer from insufficient model generalization ability, poor model interpretability, and a lack of coarse-to-fine granular localization of attack regions.

Method used

A face liveness detection method based on a multimodal large language model is adopted. The face image to be identified and the target auxiliary modality image are input into the multimodal large language model for fusion. A visual token mask is used to randomly mask the features of the fused image to enhance the model's representation ability. The traditional detection task is extended to four sub-tasks: coarse-grained classification, fine-grained classification, reasoning, and attack localization.
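
The random masking step described above can be illustrated with a minimal sketch. The summary does not state the mask ratio, how a masked token is represented, or when masking is applied, so everything here is an assumption: the function name `mask_visual_tokens`, the `mask_ratio` parameter, and the choice to replace masked tokens with zero vectors are hypothetical and are not taken from the patent.

```python
import random
from typing import List, Optional, Sequence


def mask_visual_tokens(
    tokens: Sequence[Sequence[float]],
    mask_ratio: float = 0.3,
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Return a copy of `tokens` with a random subset replaced by zero vectors.

    Assumptions (not from the patent): masked tokens are zeroed, and
    `mask_ratio` is the fraction of tokens to mask.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = random.Random(seed)  # local RNG so the sketch is reproducible
    n_masked = round(len(tokens) * mask_ratio)
    masked_idx = set(rng.sample(range(len(tokens)), n_masked))
    # Masked positions become zero vectors; the rest are copied unchanged.
    return [
        [0.0] * len(tok) if i in masked_idx else list(tok)
        for i, tok in enumerate(tokens)
    ]


# Ten hypothetical fused tokens, each a 4-dimensional feature vector.
fused_tokens = [[1.0, 1.0, 1.0, 1.0] for _ in range(10)]
masked_tokens = mask_visual_tokens(fused_tokens, mask_ratio=0.3, seed=0)
n_zeroed = sum(1 for tok in masked_tokens if not any(tok))
print(f"{n_zeroed} of {len(masked_tokens)} tokens masked")
```

A real system might instead substitute a learned mask embedding or drop the masked tokens entirely; zeroing is used here only because it keeps the sketch dependency-free.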

Benefits of technology

The method improves the security and reliability of face liveness detection, copes more comprehensively with various attack methods, and enhances the model's generalization ability and interpretability.

✦ Generated by Eureka AI based on patent content.

Smart Images

Figure CN120388404B_ABST

Patent Text Reader

Abstract

This application discloses a method, apparatus, device, and storage medium for face liveness detection based on a multimodal large language model, relating to the field of image detection technology. The method includes: inputting a face image to be identified, a target auxiliary modality image, and a user command into a trained multimodal large language model to obtain a multi-task output result for the user command. The multimodal large language model includes a visual token mask, and the multi-task output result includes at least one of coarse-grained classification results, fine-grained classification results, causal reasoning results, and attack localization results. This application can improve the security and reliability of face liveness detection.
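
One way to picture the multi-task output described in the abstract is as a record with four optional fields, of which at least one must be present. The sketch below is illustrative only: the class name `LivenessResult`, the field names, the example labels, and the bounding-box format `(x, y, width, height)` are all hypothetical, since the abstract does not specify an output schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class LivenessResult:
    """Hypothetical container for the four sub-task results."""

    coarse_label: Optional[str] = None  # assumed labels, e.g. "live" / "attack"
    fine_label: Optional[str] = None    # assumed attack type, e.g. "print"
    reasoning: Optional[str] = None     # free-text causal reasoning
    attack_box: Optional[Tuple[int, int, int, int]] = None  # assumed (x, y, w, h)

    def __post_init__(self) -> None:
        # The abstract says the output includes at least one of the four results.
        if not self.present_tasks():
            raise ValueError("at least one sub-task result is required")

    def present_tasks(self) -> List[str]:
        """Names of the sub-task fields that carry a result."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]


result = LivenessResult(coarse_label="attack", attack_box=(40, 60, 120, 150))
print(result.present_tasks())
```

Making every field optional mirrors the "at least one of" wording: which sub-tasks are answered would depend on the user command passed to the model.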
