A living body detection method and device based on deep learning

A neural network is constructed and iteratively trained with the loss functions LossDense and lossEmbed, so that its optimized parameters yield discriminative embedding vectors. This addresses the inability of existing technologies to distinguish real faces from fake ones and achieves high-accuracy liveness detection.

CN115700834B · Active · Publication Date: 2026-06-26 · FUJIAN JOYUSING TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: FUJIAN JOYUSING TECHNOLOGY CO LTD
Filing Date: 2021-07-15
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing technologies cannot effectively distinguish between real faces and various fake faces, especially images, videos, and 3D models, resulting in low accuracy in liveness detection.

Method used

A neural network is constructed and iteratively trained using the loss functions LossDense and lossEmbed. With the optimized parameters, parallel embedding layers map high-dimensional feature vectors to low-dimensional embedding vectors, and parallel fully connected layers compute a confidence score for each embedding vector. Training with lossEmbed increases the inter-class distance of the embedding vectors and reduces their intra-class distance. The authenticity of the image is judged by the confidence scores.

Benefits of technology

It achieves high-accuracy liveness detection for various forged images, improves the detection accuracy of neural networks, and can comprehensively and accurately reflect the differences between real and forged images.



Abstract

The application relates to a deep learning-based liveness detection method comprising the following steps: constructing a neural network for liveness detection of infrared face images; training the neural network; and inputting the infrared face image to be detected into the trained neural network, where the input layer and hidden layers map the infrared face image to a feature vector, the feature vector is divided into several feature sub-vectors, and the feature sub-vectors are respectively input to the parallel embedding layers. Each parallel embedding layer maps its feature sub-vector to an embedding vector and inputs it to the corresponding parallel fully connected layer. The parallel fully connected layers output the confidence of each embedding vector; if every confidence is greater than the first threshold, the infrared face image to be detected passes liveness detection. The image features extracted by the neural network constructed in the application reflect the difference between real and fake images more comprehensively and accurately.

Description

Technical Field

[0001] This invention relates to a deep learning-based method and device for liveness detection, belonging to the field of liveness detection.

Background Technology

[0002] While current facial recognition technology can identify a person's identity, it cannot accurately distinguish between genuine and fake faces, leaving it vulnerable to spoofing attacks. Common methods of facial spoofing include three types: images containing the user's face, videos containing the user's face, and 3D models or masks created using the user's face. Therefore, liveness detection is necessary to determine whether the identified face comes from a real user or from an image, video, 3D model, or 3D mask containing the user's face.

[0003] Currently, liveness detection methods are mainly divided into three categories: microtexture-based, motion-information-based, and multispectral-based. Microtexture-based methods are sensitive to lighting and resolution and are vulnerable to video spoofing. Motion-information-based methods require user interaction, resulting in a poor user experience. Multispectral-based methods use the differences in spectral reflectance between skin and other materials for detection, mostly with near-infrared light. They have strong recognition capabilities and high accuracy, but they recognize printed images made of certain materials poorly. Currently, most near-infrared liveness detection methods rely on constructing and training deep learning models.

[0004] Existing technologies, such as the patent with publication number CN110956056A, entitled "A Method and System for Facial Liveness Detection," train a neural network model by inputting various forged images into it. The features extracted by the resulting neural network model can characterize the difference between real and forged images. However, because forged images are of diverse types, the extracted features cannot adequately characterize the difference between each category of forged image and real images, resulting in low detection accuracy for at least one category of forged images.

Summary of the Invention

[0005] To address the problems existing in the prior art, this invention provides a deep learning-based liveness detection method that can more comprehensively and accurately reflect the differences between real images and various types of fake images.

[0006] The technical solution of the present invention is as follows:

[0007] Technical Solution 1:

[0008] A deep learning-based liveness detection method includes the following steps:

[0009] A neural network for liveness detection of infrared face images is constructed. The neural network includes an input layer, several hidden layers and an output layer. The output layer includes several parallel embedding layers and several parallel fully connected layers that correspond one-to-one with each parallel embedding layer.

[0010] Training the neural network involves: collecting real infrared face images and multiple categories of fake infrared face images as training samples; constructing a loss function; and iteratively training the neural network using the training samples and the loss function.

[0011] An infrared face image to be detected is input into a trained neural network. The input layer and hidden layer map the infrared face image into a feature vector, divide the feature vector into several feature sub-vectors, and input the feature sub-vectors into each parallel embedding layer. Each parallel embedding layer maps each feature sub-vector into an embedding vector and inputs it into the corresponding parallel fully connected layer. The parallel fully connected layer outputs the confidence score of each embedding vector. If each confidence score is greater than a preset first threshold, the infrared face image to be detected passes the liveness detection.

[0012] Furthermore, the method also includes: when the difference between the L2 norm of the embedding vectors obtained from the real infrared face image and the fake infrared face image is greater than a preset second threshold, the neural network ends the iterative training.

[0013] Furthermore, the method for acquiring the infrared face image to be detected is as follows:

[0014] The first image is acquired using an infrared camera; the human face region in the first image is located using a multi-task cascaded convolutional neural network; if the human face region is successfully located, the human face region is cropped to obtain the infrared face image to be detected.

[0015] Furthermore, the loss function LossDense is constructed, which is expressed by the formula:

[0016]

[0017] $$\mathrm{lossdense}_j = -\bigl(y \log p + (1 - y)\log(1 - p)\bigr)$$

[0018] Where k represents the number of parallel fully connected layers; p represents the confidence output by the parallel fully connected layer; y represents the label value of the image input to the neural network, which takes the value 0 or 1; and $\mathrm{lossdense}_j$ represents the loss value calculated between the label value of the input image and the confidence output by the j-th parallel fully connected layer.
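The per-layer loss can be illustrated with a minimal Python sketch. It uses the standard binary cross-entropy form that also appears in Example 3. The formula that combines the k per-layer losses into LossDense is not reproduced in this text, so the plain sum in `loss_dense` is an assumption, as are the function names and the clamping constant `eps`.

```python
import math


def lossdense_j(y: int, p: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy between label y (0 or 1) and confidence p."""
    # Clamp p away from 0 and 1 so log() stays finite (eps is an assumption).
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def loss_dense(labels, confidences) -> float:
    """ASSUMPTION: combine the k per-layer losses by a plain sum."""
    return sum(lossdense_j(y, p) for y, p in zip(labels, confidences))


# k = 4 parallel fully connected layers, one real image (y = 1 for each layer).
total = loss_dense([1, 1, 1, 1], [0.9, 0.8, 0.95, 0.7])
```

A confident correct output (p close to y) contributes a loss near 0, while a confident wrong output is penalized heavily.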

[0019] Furthermore, it also includes constructing another loss function, lossEmbed, and iteratively training the neural network by alternating between lossEmbed and LossDense. lossEmbed is expressed by the formula:

[0020] $$\mathrm{lossEmbed} = \mathrm{lossEmbed}_{alive} + \mathrm{lossEmbed}_{unalive}$$

[0021] When [condition not reproduced in this text], $\mathrm{lossEmbed}_{alive} = 0$; when [condition not reproduced in this text],

[0022] [formula not reproduced in this text]

[0023] When [condition not reproduced in this text], $\mathrm{lossEmbed}_{unalive} = 0$;

[0024] when [condition not reproduced in this text],

[0025] [formula not reproduced in this text]

[0026] [formula not reproduced in this text]

[0027] Where $V_i^{alive}$ represents the i-th live embedding vector, which is an embedding vector obtained by mapping a real infrared face image through a first parallel embedding layer; $V_i^{unalive}$ represents the i-th non-live embedding vector, which is an embedding vector obtained by mapping a forged infrared face image through a first parallel embedding layer; $V_c^{alive}$ represents the center vector obtained from the $n_1$ live embedding vectors; $\|x\|_2$ represents the L2 norm of vector x; abs(x) represents the absolute value of x; $\mathrm{lossEmbed}_{alive}$ represents the loss value calculated between the $n_1$ live embedding vectors and the center vector; $\mathrm{lossEmbed}_{unalive}$ represents the loss value calculated between the $n_2$ non-live embedding vectors and the center vector; and c and d represent constants.
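Because the piecewise formulas for lossEmbed are not reproduced in this text, the following Python sketch is only one plausible reading and not the patented formula. It assumes a hinge-style center loss: live embedding vectors are penalized only when they lie farther than c from the live center vector, and non-live embedding vectors are penalized only when they lie closer than d to it. The averaging over n1 and n2, the use of `max(0, ...)` in place of the source's abs(x), and all function names are assumptions.

```python
import math


def l2_norm(v):
    """L2 norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def center_vector(vectors):
    """Component-wise mean of the live embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def loss_embed(alive, unalive, c, d):
    """ASSUMED hinge-style reading of lossEmbed = alive term + unalive term."""
    v_c = center_vector(alive)

    def dist(v):
        return l2_norm([a - b for a, b in zip(v, v_c)])

    # Live term: zero while a live vector stays within distance c of the center.
    loss_alive = sum(max(0.0, dist(v) - c) for v in alive) / len(alive)
    # Non-live term: zero once a non-live vector is at least d from the center.
    loss_unalive = sum(max(0.0, d - dist(v)) for v in unalive) / len(unalive)
    return loss_alive + loss_unalive
```

Under this reading, minimizing the first term reduces the intra-class distance and minimizing the second increases the inter-class distance, which matches the effect the description attributes to lossEmbed.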

[0028] Furthermore, the multi-category fake infrared face images include: images containing face pictures, 3D face mask images, and 3D face model images.

[0029] Technical Solution 2:

[0030] A deep learning-based liveness detection device includes a memory and a processor, wherein the memory stores instructions adapted to be loaded by the processor to execute the method described in Technical Solution 1.

[0031] The present invention has the following beneficial effects:

[0032] 1. This invention utilizes the LossDense loss function to train a neural network until the LossDense value is less than a third threshold. At this point, the parameters of the neural network are optimized, and each parallel embedding layer maps high-dimensional feature vectors to low-dimensional embedding vectors containing image classification information. The parallel fully connected layers obtain confidence scores based on these embedding vectors, thus distinguishing between real faces and various categories of forged images. The confidence score represents the similarity between the input infrared face image and the real image and various types of forged images. A higher confidence score indicates a higher similarity between the input infrared face image and the real image.

[0033] 2. The present invention also alternately uses the loss function lossEmbed for iterative training until the L2 difference between the embedding vectors obtained from real infrared face images (hereinafter referred to as real images) and fake infrared face images (hereinafter referred to as fake images) is greater than the second threshold, thereby increasing the inter-class distance of embedding vectors of different categories, reducing the intra-class distance of embedding vectors of the same category, improving the classification accuracy of embedding vectors, and thus improving the detection accuracy of the neural network.

[0034] In summary, the image features (i.e., embedding vectors) extracted by the neural network constructed in this invention can more comprehensively and accurately reflect the differences between real images and various types of fake images. Therefore, based on the embedding vectors, a high detection accuracy for multiple categories of fake images can be achieved.

Attached Figure Description

[0035] Figure 1 is a flowchart of the present invention;

[0036] Figure 2 is a schematic diagram of the embedding vectors;

[0037] Figure 3 is a diagram of the neural network structure.

Detailed Implementation

[0038] The present invention will now be described in detail with reference to the accompanying drawings and specific embodiments.

[0039] Example 1

[0040] As shown in Figure 1, a deep learning-based liveness detection method includes the following steps:

[0041] A neural network for liveness detection of infrared face images is constructed. The neural network includes an input layer, several hidden layers, and an output layer. The output layer includes four parallel embedding layers and four parallel fully connected layers that correspond one-to-one with the four parallel embedding layers.

[0042] Training and testing the neural network: Collect real infrared face images and multi-class fake infrared face images as a sample set, and divide the sample set into a training sample set and a test sample set. Construct the loss functions lossEmbed and LossDense. Input the training sample set into the neural network, and use the Adadelta optimizer to iteratively train the neural network by alternating between the loss functions lossEmbed and LossDense, optimizing the neural network parameters until the LossDense value is less than the third threshold and the L2 difference between the embedding vectors obtained from real infrared face images and fake infrared face images is greater than the second threshold. As shown in Figure 2, in this embodiment, iterative training uses the center vector as a reference to make the L2 difference between the live embedding vectors and the non-live embedding vectors greater than the second threshold of 3.2. The trained neural network is then tested using the test sample set.

[0043] Input the infrared face image to be detected into the neural network after testing.

[0044] The input layer and hidden layers of the neural network map the input infrared face image into a feature vector through multiple convolution operations. The feature vector is divided into four feature sub-vectors, which are respectively input to the four parallel embedding layers. In this embodiment, the 512-dimensional feature vector is divided into four 128-dimensional feature sub-vectors. For example, if the array a[0]…a[511] in the liveness detection device represents the 512-dimensional feature vector, then a[0]…a[127] is input to the first parallel embedding layer, a[128]…a[255] is input to the second parallel embedding layer, and so on.
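The division of the feature vector described in this embodiment can be sketched in Python as follows. The function name `split_feature_vector` and the use of a plain list for the array a are illustrative assumptions; only the 512-to-4×128 split comes from the text.

```python
def split_feature_vector(a, parts=4):
    """Split a feature vector into `parts` contiguous, equal-length sub-vectors."""
    if len(a) % parts != 0:
        raise ValueError("feature vector length must be divisible by parts")
    size = len(a) // parts
    return [a[i * size:(i + 1) * size] for i in range(parts)]


# Stand-in 512-dimensional feature vector whose values equal their indices.
a = list(range(512))
sub_vectors = split_feature_vector(a)
# sub_vectors[0] holds a[0]..a[127] for the first parallel embedding layer,
# sub_vectors[1] holds a[128]..a[255] for the second, and so on.
```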

[0045] Each parallel embedding layer then maps its feature sub-vector to an embedding vector and inputs it into the corresponding parallel fully connected layer.

[0046] The parallel fully connected layer obtains the confidence level (value between 0 and 1) of each embedding vector through the sigmoid activation function. If each confidence level is greater than the preset first threshold of 0.6, the infrared face image to be detected passes the liveness detection.
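The decision rule of this embodiment can be sketched in Python as follows. The pre-activation values passed in are made-up numbers for illustration, and the function names are assumptions; the sigmoid activation and the first threshold of 0.6 come from the text.

```python
import math


def sigmoid(z: float) -> float:
    """Map a pre-activation value to a confidence between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def passes_liveness(pre_activations, first_threshold: float = 0.6):
    """Pass only if every parallel fully connected layer is confident enough."""
    confidences = [sigmoid(z) for z in pre_activations]
    passed = all(p > first_threshold for p in confidences)
    return passed, confidences


# Hypothetical outputs of the four parallel fully connected layers.
ok, scores = passes_liveness([2.0, 1.5, 3.0, 1.0])
```

Requiring every confidence to exceed the threshold means that a single layer flagging its category of forgery is enough to reject the image.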

[0047] The beneficial effects of this embodiment are as follows:

[0048] The neural network is trained using the LossDense loss function until the LossDense value is less than a third threshold. At this point, the parameters of the neural network are optimized, and each parallel embedding layer maps high-dimensional feature vectors to low-dimensional embedding vectors containing image classification information. The parallel fully connected layers then obtain confidence scores based on these embedding vectors, thus distinguishing between real faces and various categories of forged images. The confidence score represents the similarity between the input infrared face image and the real image, as well as various types of forged images. A higher confidence score indicates a higher similarity between the input infrared face image and the real image.

[0049] Furthermore, during the training process, the loss function lossEmbed is used alternately for iterative training until the L2 difference between the embedding vectors obtained from real infrared face images (hereinafter referred to as real images) and fake infrared face images (hereinafter referred to as fake images) is greater than the second threshold. This expands the inter-class distance of embedding vectors of different categories, reduces the intra-class distance of embedding vectors of the same category, improves the classification accuracy of embedding vectors, and thus improves the detection accuracy of the neural network.

[0050] In summary, the image features (i.e., embedding vectors) extracted by the neural network constructed in this invention can more comprehensively and accurately reflect the differences between real images and various types of fake images. Therefore, based on the embedding vectors, a high detection accuracy for multiple categories of fake images can be achieved.

[0051] Example 2

[0052] A neural network for liveness detection of infrared face images is pre-built and trained;

[0053] The first image is acquired using an infrared camera; the human face region in the first image is located using a multi-task cascaded convolutional neural network (MTCNN algorithm). If the human face region is successfully located, it is cropped to obtain an infrared face image; otherwise, the first image does not pass the liveness detection.

[0054] An infrared face image is input into a neural network. The neural network outputs multiple confidence scores. If all confidence scores are greater than a first threshold, the first image passes the liveness detection; otherwise, the first image fails the liveness detection.
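The control flow of this example can be sketched in Python as follows. The callables `locate_face` and `score_face` are hypothetical stand-ins for the MTCNN face locator and the trained neural network; neither is an actual library API. The default threshold of 0.6 is borrowed from Example 1, since this example does not state a value.

```python
from typing import Any, Callable, Optional, Sequence


def check_liveness(
    first_image: Any,
    locate_face: Callable[[Any], Optional[Any]],
    score_face: Callable[[Any], Sequence[float]],
    first_threshold: float = 0.6,
) -> bool:
    """Return True only if a face is located and every confidence passes."""
    # Hypothetical locator: returns the cropped face region, or None on failure.
    face = locate_face(first_image)
    if face is None:
        return False  # no face located: the first image fails liveness detection
    # Hypothetical network: returns one confidence per parallel output layer.
    confidences = score_face(face)
    return all(p > first_threshold for p in confidences)
```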

[0055] Example 3

[0056] As shown in Figure 3, a neural network for liveness detection of infrared face images is constructed. The neural network includes an input layer, several hidden layers, and an output layer. In this embodiment, the lightweight neural network MobileNet is used to construct the input layer and the hidden layers of the neural network. The output layer includes a first parallel embedding layer, a first parallel fully connected layer corresponding to the first parallel embedding layer, a second parallel embedding layer, a second parallel fully connected layer corresponding to the second parallel embedding layer, a third parallel embedding layer, a third parallel fully connected layer corresponding to the third parallel embedding layer, a fourth parallel embedding layer, and a fourth parallel fully connected layer corresponding to the fourth parallel embedding layer.

[0057] Obtain a sample set containing real infrared face images and three categories of fake infrared face images (i.e., images containing face pictures, 3D face mask images, and 3D face model images). Divide the sample set into a training sample set and a test sample set in a 2:8 ratio.

[0058] Construct the loss function lossEmbed:

[0059] $$\mathrm{lossEmbed} = \mathrm{lossEmbed}_{alive} + \mathrm{lossEmbed}_{unalive}$$

[0060] When [condition not reproduced in this text], $\mathrm{lossEmbed}_{alive} = 0$;

[0061] when [condition not reproduced in this text],

[0062] [formula not reproduced in this text]

[0063] When [condition not reproduced in this text], $\mathrm{lossEmbed}_{unalive} = 0$; when [condition not reproduced in this text],

[0064] [formula not reproduced in this text]

[0065] [formula not reproduced in this text]

[0066] Where $V_i^{alive}$ represents the i-th live embedding vector; $V_i^{unalive}$ represents the i-th non-live embedding vector (for ease of distinction, the embedding vector obtained by mapping a real infrared face image through the first parallel embedding layer is called the live embedding vector, and the embedding vector obtained by mapping a fake infrared face image through the first parallel embedding layer is called the non-live embedding vector); $V_c^{alive}$ represents the center vector obtained from the $n_1$ live embedding vectors; $\|x\|_2$ represents the L2 norm of x; abs(x) represents the absolute value of x; $\mathrm{lossEmbed}_{alive}$ represents the loss value calculated between the $n_1$ live embedding vectors and the center vector; and $\mathrm{lossEmbed}_{unalive}$ represents the loss value calculated between the $n_2$ non-live embedding vectors and the center vector.

[0067] Loss function LossDense:

[0068]

[0069] $$\mathrm{lossdense}_j = -\bigl(y \log p + (1 - y)\log(1 - p)\bigr)$$

[0070] Where k represents the number of parallel fully connected layers; p represents the confidence level of the output of the parallel fully connected layers;

[0071] lossdense1 represents the loss value calculated between the label value of the input image of the neural network and the output confidence of the first parallel fully connected layer. If the input infrared face image is any type of fake infrared face image, then y = 0; if the input infrared face image is a real infrared face image, then y = 1.

[0072] lossdense2 represents the loss value calculated between the input image label value and the output confidence of the second parallel fully connected layer. If the input infrared face image is an image containing a face picture, then y = 0; if the input infrared face image is a real infrared face image, then y = 1.

[0073] lossdense3 represents the loss value calculated between the input image label value and the output confidence of the third parallel fully connected layer. If the input infrared face image is a 3D face mask image, then y = 0; if the input infrared face image is a real infrared face image, then y = 1.

[0074] lossdense4 represents the loss value calculated between the label value of the input image of the neural network and the output confidence of the fourth parallel fully connected layer. If the input infrared face image is a 3D face model image, then y = 0; if the input infrared face image is a real infrared face image, then y = 1.
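The label assignment for the four per-layer losses can be sketched in Python as follows. The category names and the dictionary layout are illustrative assumptions. The text does not say what label layers 2 to 4 use for a fake category other than their own, so this sketch marks those cases as `None` (no label) as an explicit assumption.

```python
# Fake categories for which each parallel fully connected layer uses y = 0.
LAYER_NEGATIVES = {
    1: {"photo", "mask_3d", "model_3d"},  # any type of fake image
    2: {"photo"},                          # images containing face pictures
    3: {"mask_3d"},                        # 3D face mask images
    4: {"model_3d"},                       # 3D face model images
}


def layer_labels(category: str) -> dict:
    """Return the label y for each of the four layers given an image category."""
    labels = {}
    for layer, negatives in LAYER_NEGATIVES.items():
        if category == "real":
            labels[layer] = 1  # real images are y = 1 for every layer
        elif category in negatives:
            labels[layer] = 0
        else:
            labels[layer] = None  # ASSUMPTION: case not specified in the text
    return labels
```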

[0075] The training sample set is input into the neural network (each training iteration takes four types of images as input: the first is a real infrared face image, the second is an image containing a face picture, the third is a 3D face mask image, and the fourth is a 3D face model image). The Adadelta optimizer is used to iteratively train the neural network by alternating between the loss functions lossEmbed and LossDense until the neural network accuracy exceeds 99%, the number of iterations exceeds 300,000, the LossDense value is less than the third threshold of 0.01, and the L2 difference between the embedding vectors obtained from the real infrared face image and the fake infrared face image is greater than the second threshold of 3.2. During training, the loss values calculated by each loss function are backpropagated to optimize the neural network parameters, enabling each parallel embedding layer to map high-dimensional feature vectors into low-dimensional embedding vectors containing image classification information.
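The termination conditions of this embodiment can be collected into a single check, sketched in Python below. The function and parameter names are assumptions, and the sketch treats the four conditions as jointly required, following the wording of the text. How the L2 difference itself is computed is not detailed in the text, so it is passed in as a precomputed number.

```python
def training_finished(
    accuracy: float,
    iterations: int,
    loss_dense_value: float,
    l2_difference: float,
    min_accuracy: float = 0.99,      # accuracy must exceed 99%
    min_iterations: int = 300_000,   # iterations must exceed 300,000
    third_threshold: float = 0.01,   # LossDense must fall below this value
    second_threshold: float = 3.2,   # L2 difference must exceed this value
) -> bool:
    """Return True once all four stopping conditions hold at the same time."""
    return (
        accuracy > min_accuracy
        and iterations > min_iterations
        and loss_dense_value < third_threshold
        and l2_difference > second_threshold
    )
```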

[0076] In this embodiment, the first parallel embedding layer extracts features (i.e., image classification information) that distinguish the forged image from the real image during the mapping process, obtaining a first embedding vector; the first parallel fully connected layer calculates the confidence score of the first embedding vector. The higher the confidence score, the higher the similarity between the input image and the real image, and the lower the similarity between the input image and the forged infrared face image.

[0077] The second to fourth parallel embedding layers extract features (i.e., image classification information) that distinguish a certain category of forged images from real images. Each parallel fully connected layer calculates a confidence score based on the input embedding vector. A higher confidence score indicates a higher similarity between the input image and the real image, and a lower similarity to a certain category of forged images. For example, the fourth parallel embedding layer extracts features that distinguish a 3D face model image from a real image during the mapping process, obtaining a fourth embedding vector; the fourth parallel fully connected layer calculates the confidence score of the fourth embedding vector. A higher confidence score indicates a higher similarity between the input image and the real image, and a lower similarity to the 3D face model image.

[0078] Example 4

[0079] A deep learning-based liveness detection device includes a memory and a processor. The memory stores instructions adapted to be loaded by the processor to execute the steps described in Embodiments 1 to 3.

[0080] The above description is merely an embodiment of the present invention and does not limit the patent scope of the present invention. Any equivalent structural or procedural transformations made based on the content of the present invention's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of the present invention.

Claims

1. A deep learning-based liveness detection method, characterized in that it includes the following steps: A neural network for liveness detection of infrared face images is constructed. The neural network includes an input layer, several hidden layers and an output layer. The output layer includes several parallel embedding layers and several parallel fully connected layers that correspond one-to-one with the parallel embedding layers. Training the neural network involves: collecting real infrared face images and multiple categories of fake infrared face images as training samples; constructing a loss function LossDense; and iteratively training the neural network using the training samples and the loss function. The loss function LossDense is expressed by the following formula: [formula not reproduced in this text], where k represents the number of parallel fully connected layers; p represents the confidence output by the parallel fully connected layers; y represents the label value of the input image of the neural network, which is 0 or 1; and $\mathrm{lossdense}_j$ represents the loss value calculated between the label value of the input image of the neural network and the output confidence of the j-th parallel fully connected layer. Another loss function, lossEmbed, is also constructed, and the neural network is trained iteratively by alternating between lossEmbed and LossDense. lossEmbed is expressed by the formula: [formula not reproduced in this text], where $V_i^{alive}$ represents the i-th live embedding vector, which is an embedding vector obtained by mapping a real infrared face image through a parallel embedding layer; $V_i^{unalive}$ represents the i-th non-live embedding vector, which is an embedding vector obtained by mapping a forged infrared face image through a parallel embedding layer; $V_c^{alive}$ represents the center vector obtained from the $n_1$ live embedding vectors; $\|x\|_2$ represents the L2 norm of vector x; abs(x) represents the absolute value of x; $\mathrm{lossEmbed}_{alive}$ represents the loss value calculated between the $n_1$ live embedding vectors and the center vector; $\mathrm{lossEmbed}_{unalive}$ represents the loss value calculated between the $n_2$ non-live embedding vectors and the center vector; and c and d represent constants. An infrared face image to be detected is input into the trained neural network. The input layer and hidden layers map the infrared face image into a feature vector, divide the feature vector into several feature sub-vectors, and input the feature sub-vectors into the parallel embedding layers. Each parallel embedding layer maps its feature sub-vector into an embedding vector and inputs it into the corresponding parallel fully connected layer. The parallel fully connected layers output the confidence score of each embedding vector. If each confidence score is greater than a preset first threshold, the infrared face image to be detected passes the liveness detection. When the difference between the L2 norm of the embedding vectors obtained from a real infrared face image and a fake infrared face image is greater than a preset second threshold, the neural network terminates its iterative training.

2. The deep learning-based liveness detection method according to claim 1, characterized in that the infrared face image to be detected is acquired as follows: a first image is acquired using an infrared camera; the human face region in the first image is located using a multi-task cascaded convolutional neural network; if the human face region is successfully located, the human face region is cropped to obtain the infrared face image to be detected.

3. The deep learning-based liveness detection method according to claim 1, characterized in that the multiple categories of fake infrared face images include: images containing face pictures, 3D face mask images, and 3D face model images.

4. A deep learning-based liveness detection device, characterized in that the device includes a memory and a processor, the memory storing instructions adapted to be loaded by the processor and executed as described in any one of claims 1 to 3: A neural network for liveness detection of infrared face images is constructed. The neural network includes an input layer, several hidden layers and an output layer. The output layer includes several parallel embedding layers and several parallel fully connected layers that correspond one-to-one with the parallel embedding layers. Training the neural network involves: collecting real infrared face images and multiple categories of fake infrared face images as training samples; constructing a loss function; and iteratively training the neural network using the training samples and the loss function. An infrared face image to be detected is input into the trained neural network. The input layer and hidden layers map the infrared face image into a feature vector, divide the feature vector into several feature sub-vectors, and input the feature sub-vectors into the parallel embedding layers. Each parallel embedding layer maps its feature sub-vector into an embedding vector and inputs it into the corresponding parallel fully connected layer. The parallel fully connected layers output the confidence score of each embedding vector. If each confidence score is greater than a preset first threshold, the infrared face image to be detected passes the liveness detection.

5. The deep learning-based liveness detection device according to claim 4, characterized in that it further includes: when the difference between the L2 norm of the embedding vectors obtained from the real infrared face image and the fake infrared face image is greater than a preset second threshold, the processor terminates the iterative training of the neural network.

6. The deep learning-based liveness detection device according to claim 5, characterized in that it further includes an infrared camera; the infrared camera acquires a first image and sends the first image to the processor; the processor uses a multi-task cascaded convolutional neural network to locate the human face region in the first image. If the human face region is successfully located, the human face region is cropped to obtain the infrared face image to be detected.