Methods, apparatus, media and equipment for training convolutional neural networks and face liveness detection
By constructing triple sample sets and training convolutional neural networks with asymmetric loss functions, the problem of low accuracy in face liveness detection under multiple spoofing attacks is solved, and higher detection accuracy is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING TECHSHINO TECHNOLOGY CO LTD
- Filing Date
- 2021-11-11
- Publication Date
- 2026-06-30
AI Technical Summary
In existing technologies, single-model face liveness detection algorithms cannot accommodate multiple spoofing attack methods, leading to difficulties in training and optimization and low accuracy.
A convolutional neural network is trained using triplet sample groups, each including an anchor sample, a positive sample, and a negative sample. An asymmetric loss function is designed to reduce the distance between anchor samples and positive samples and increase the distance between anchor samples and negative samples. The asymmetry between real-person samples and fake (prosthetic) samples is exploited for training.
It improves the accuracy of liveness detection and effectively solves the training and optimization difficulties caused by the complexity of prostheses.
Figure CN116129479B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of face recognition, and in particular to a method, apparatus, medium, and device for training convolutional neural networks and for face liveness detection.
Background Technology
[0002] With the large-scale application of deep learning in biometrics, the accuracy of image recognition tasks such as face recognition has improved significantly. At the same time, a large number of attack techniques have emerged. In face recognition systems, for example, some users attack the system by displaying face videos on mobile phones or iPads, by presenting printed face images, or by wearing 3D masks, which creates security risks.
[0003] Typical single-model face liveness detection algorithms can effectively identify only specific attack methods and cannot generalize to all attack types. A major reason is the vast variety of fake images, which are far more complex than real human images. Existing technologies usually use only a binary classification convolutional neural network for liveness detection, treating real human images and fake images as two equivalent categories. However, because fake images come from a wide variety of sources, such as face videos captured by smart terminals, printed photos, and 3D masks, treating the two as equivalent categories leads to optimization difficulties when training the binary classification convolutional neural network.
Summary of the Invention
[0004] To address the problems of existing technologies, this invention provides a method, apparatus, medium, and device for training convolutional neural networks and detecting liveness, which effectively solves the training and optimization difficulties caused by complex spoofed samples and improves the accuracy of liveness detection.
[0005] The technical solution provided by this invention is as follows:
[0006] In a first aspect, the present invention provides a method for training a convolutional neural network, wherein the convolutional neural network is used for face liveness detection, the method comprising:
[0007] Several sets of triple samples for training are input into the binary classification convolutional neural network to be trained. Each set of triple samples includes an anchor sample, a positive sample, and a negative sample. The anchor sample and the positive sample are real human samples, and the negative sample is a fake human sample.
[0008] During the forward propagation process, the loss is calculated using the following formula:
[0009] L=max(d(a,p)-d(a,n)+margin,0)
[0010] Where L is the loss, a is the anchor sample, p is the positive sample, n is the negative sample, d(a,p) is the distance between the anchor sample and the positive sample, d(a,n) is the distance between the anchor sample and the negative sample, and margin is a constant greater than 0.
[0011] The backpropagation process is performed based on the loss, adjusting the parameters of the binary classification convolutional neural network to reduce the distance between the anchor sample and the positive sample and increase the distance between the anchor sample and the negative sample; then the process returns to the forward propagation step until the binary classification convolutional neural network training is completed.
[0012] Furthermore, the triplet sample groups include difficult-to-separate sample groups and general sample groups, wherein:
[0013] For a difficult-to-separate sample group, d(a,n) < d(a,p);
[0014] For a general sample group, d(a,p) < d(a,n) < d(a,p) + margin.
[0015] Furthermore, the difficult-to-separate sample groups and the general sample groups are obtained through the following method:
[0016] Prepare a set of real human samples and a set of prosthetic samples;
[0017] Randomly draw samples from the set of real human samples to obtain anchor samples and positive samples;
[0018] Search the prosthetic sample set for difficult-to-separate samples and general samples. Combine each difficult-to-separate sample found with the drawn anchor and positive samples to form a difficult-to-separate sample group, and combine each general sample found with the drawn anchor and positive samples to form a general sample group.
[0019] In a second aspect, the present invention provides a convolutional neural network training device, wherein the convolutional neural network is used for face liveness detection, and the device comprises:
[0020] The input module is used to input several triple sample groups for training into the binary classification convolutional neural network to be trained. Each triple sample group includes an anchor sample, a positive sample, and a negative sample. The anchor sample and the positive sample are real human samples, and the negative sample is a fake sample.
[0021] The forward propagation module is used to perform the forward propagation process and calculates the loss using the following formula:
[0022] L=max(d(a,p)-d(a,n)+margin,0)
[0023] Where L is the loss, a is the anchor sample, p is the positive sample, n is the negative sample, d(a,p) is the distance between the anchor sample and the positive sample, d(a,n) is the distance between the anchor sample and the negative sample, and margin is a constant greater than 0.
[0024] The backpropagation module is used to perform a backpropagation process based on the loss, adjust the parameters of the binary classification convolutional neural network so that the distance between the anchor sample and the positive sample decreases and the distance between the anchor sample and the negative sample increases; and return to the step of performing the forward propagation process until the binary classification convolutional neural network training is completed.
[0025] Furthermore, the triplet sample groups include difficult-to-separate sample groups and general sample groups, wherein:
[0026] For a difficult-to-separate sample group, d(a,n) < d(a,p);
[0027] For a general sample group, d(a,p) < d(a,n) < d(a,p) + margin.
[0028] Furthermore, the difficult-to-separate sample groups and the general sample groups are obtained through the following process:
[0029] Prepare a set of real human samples and a set of prosthetic samples;
[0030] Randomly draw samples from the set of real human samples to obtain anchor samples and positive samples;
[0031] Search the prosthetic sample set for difficult-to-separate samples and general samples. Combine each difficult-to-separate sample found with the drawn anchor and positive samples to form a difficult-to-separate sample group, and combine each general sample found with the drawn anchor and positive samples to form a general sample group.
[0032] In a third aspect, the present invention provides a computer-readable storage medium for training convolutional neural networks, including a memory for storing processor-executable instructions, which, when executed by the processor, implement the steps of the convolutional neural network training method described in the first aspect.
[0033] In a fourth aspect, the present invention provides an apparatus for training a convolutional neural network, comprising at least one processor and a memory storing computer-executable instructions, wherein the processor executes the instructions to implement the steps of the convolutional neural network training method described in the first aspect.
[0034] In a fifth aspect, the present invention provides a method for face liveness detection, the method comprising:
[0035] Face liveness detection is performed using a binary classification convolutional neural network trained using the convolutional neural network training method described in the first aspect.
[0036] Furthermore, the step of performing face liveness detection using a binary classification convolutional neural network trained by the convolutional neural network training method described in the first aspect includes:
[0037] The image to be detected is input into a binary classification convolutional neural network trained by the convolutional neural network training method described in the first aspect, and the image features are output.
[0038] The output image features are input into a classifier to obtain the liveness detection results.
[0039] Furthermore, the classifiers include the softmax classifier and the SVM classifier.
[0040] In a sixth aspect, the present invention provides a face liveness detection device, the device comprising:
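As an illustration of the two-stage inference described above (image features from the trained network, then a classifier), the following minimal sketch applies a softmax classifier to a feature vector. The function names, the linear layer's weights and biases, and the convention that class index 0 means "live" are all hypothetical and are not specified in this document.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_liveness(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> Tuple[str, List[float]]:
    """Map image features to a liveness label.

    Assumption: a single linear layer (two rows of weights) followed by
    softmax; index 0 is treated as "live" and index 1 as "fake".
    """
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    label = "live" if probs[0] >= probs[1] else "fake"
    return label, probs
```

In practice the feature vector would come from the trained convolutional neural network; an SVM could take the place of the softmax stage in the same position.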
[0041] A liveness detection module is used to perform face liveness detection using a binary classification convolutional neural network trained by the convolutional neural network training device described in the second aspect.
[0042] Furthermore, the liveness detection module is used to: input the image to be detected into a binary classification convolutional neural network trained by the convolutional neural network training device described in the second aspect, and output image features; input the output image features into a classifier to obtain the result of liveness detection.
[0043] Furthermore, the classifiers include the softmax classifier and the SVM classifier.
[0044] In a seventh aspect, the present invention provides a computer-readable storage medium for face detection, including a memory for storing processor-executable instructions, which, when executed by the processor, implement the steps of the face detection method described in the fifth aspect.
[0045] In an eighth aspect, the present invention provides an apparatus for face detection, comprising at least one processor and a memory storing computer-executable instructions, wherein the processor executes the instructions to implement the steps of the face detection method described in the fifth aspect.
[0046] The present invention has the following beneficial effects:
[0047] This invention addresses the asymmetry between real and fake faces in face liveness detection by designing an asymmetric metric learning method. It constructs triplet sample groups comprising anchor samples, positive samples, and negative samples. These triplet sample groups are not the classical symmetric construction of existing technologies: the anchor samples and positive samples are both real samples, while the negative samples are fake samples. An asymmetric loss function is then designed on top of these asymmetric triplet sample groups. During iterative optimization, it reduces the distance between the anchor sample and the positive sample while increasing the distance between the anchor sample and the negative sample. This keeps the intra-class distance of real samples as small as possible, while the intra-class distance of fake samples does not require strong constraints; only the inter-class distance between real and fake samples needs to be enlarged. The invention thus more closely reflects the essence of the liveness detection problem, effectively solves the training and optimization difficulties caused by the complexity of fakes, and improves the accuracy of liveness detection.
Attached Figure Description
[0048] Figure 1 is a classification diagram illustrating the softmax-based loss function of existing technologies;
[0049] Figure 2 is a flowchart of the convolutional neural network training method of the present invention;
[0050] Figure 3 is a schematic diagram of the convolutional neural network training device of the present invention;
[0051] Figure 4 is a flowchart of the face liveness detection method of the present invention;
[0052] Figure 5 is a schematic diagram of the face detection device of the present invention.
Detailed Implementation
[0053] To make the technical problems, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below in conjunction with the accompanying drawings and specific embodiments. Obviously, the described embodiments are only a part of the embodiments of this invention, and not all of them. The components of the embodiments of this invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without inventive effort are within the scope of protection of this invention.
[0054] Embodiment 1:
[0055] During their research, the inventors found that training a binary classification convolutional neural network with a softmax-based loss function aims to distinguish real samples from fake samples by clustering samples of the same class and pushing samples of different classes apart. However, such global optimization makes it difficult to account for the complex features of fake samples during training.
[0056] The purpose of using a binary classification convolutional neural network for liveness detection is to group real images together, group fake images together, and keep real and fake images far apart. In other words, the closer images of the same class are to each other, the better, and the farther apart images of different classes are, the better. However, because fake images come from a wide variety of sources, such as facial videos captured by smart devices, printed photos, and 3D masks, it is difficult to truly group fake images together.
[0057] Figure 1 illustrates liveness detection with a softmax-based loss function during the training of a binary classification convolutional neural network. The letter A represents the set of real human images, and the letter B represents the set of fake images. Because of the large variety of fake images, their clustering is quite scattered and often fails to reach the same compactness as that of real human images. Correspondingly, in Figure 1, sets A and B lie relatively close to the dividing line.
[0058] Based on the above findings, embodiments of the present invention provide a convolutional neural network training method, wherein the convolutional neural network is used for face liveness detection. As shown in Figure 2, the method includes:
[0059] S100: Input several triple sample groups for training into the binary classification convolutional neural network to be trained. Each triple sample group includes an anchor sample, a positive sample, and a negative sample. The anchor sample and the positive sample are real human samples, and the negative sample is a fake human sample.
[0060] The binary classification convolutional neural network of this invention takes a set of triplets as input during training. This set of triplets includes an anchor sample a, a positive sample p, and a negative sample n. Anchor sample a is a real human sample. Positive sample p belongs to the same class as anchor sample a and is also a real human sample. Negative sample n does not belong to the same class as anchor sample a and is a fake sample. Similarity calculation between samples is achieved by optimizing the distance between the anchor sample and the positive sample to be less than the distance between the anchor sample and the negative sample. The final optimization goal is to shorten the distance between anchor sample a and positive sample p, and increase the distance between anchor sample a and negative sample n.
[0061] Triplet sample groups consisting of anchor sample a, positive sample p, and negative sample n can be divided into three categories:
[0062] Easy-to-separate sample group, d(a,p) + margin < d(a,n), where d(a,p) is the distance between the anchor sample and the positive sample, d(a,n) is the distance between the anchor sample and the negative sample, and margin is a constant greater than 0. The distance between the anchor sample a and the positive sample p in the easy-to-separate sample group plus a constant greater than 0 is less than the distance between the anchor sample a and the negative sample n, indicating that the optimization goal has been achieved and no further optimization is required.
[0063] Difficult-to-separate sample group, d(a,n) < d(a,p), indicating that the distance from the anchor sample a to the negative sample n is closer than the distance from the anchor sample a to the positive sample p, and thus requires focused optimization.
[0064] General sample group, d(a,p) < d(a,n) < d(a,p) + margin. The distance from the anchor sample a to the negative sample n is farther than the distance from the anchor sample a to the positive sample p, but it is not yet sufficient to distinguish clearly, indicating that there is separability but it is not obvious, and thus also requires optimization.
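The three categories above can be sketched as a small function of the two distances and the margin. The function name and the string labels are hypothetical. The document defines the categories with strict inequalities only, so assigning the boundary cases (d(a,n) equal to d(a,p), or equal to d(a,p) + margin) to the general category is an assumption made here.

```python
def classify_triplet(d_ap: float, d_an: float, margin: float) -> str:
    """Categorize one triplet from d(a,p), d(a,n) and the margin.

    Returns "easy", "hard" (difficult-to-separate) or "general".
    """
    if margin <= 0:
        raise ValueError("margin must be a constant greater than 0")
    if d_ap + margin < d_an:
        return "easy"      # optimization goal already met
    if d_an < d_ap:
        return "hard"      # negative is closer to the anchor than the positive
    # Assumption: ties on either boundary are treated as general triplets.
    return "general"       # separable, but not yet by the full margin
```

For instance, with margin 0.5, the distances (d(a,p), d(a,n)) = (0.2, 1.0) fall in the easy category, (0.8, 0.3) in the difficult-to-separate category, and (0.4, 0.6) in the general category.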
[0065] When constructing a triplet sample group, it can be done through the method of the following example:
[0066] S110: Prepare a set of real-person samples and a set of prosthetic samples.
[0067] In this step, first construct a set of real-person samples (a1, a2, ..., am, p1, p2, ..., pm) and a set of prosthetic samples (n1, n2, ..., nm).
[0068] In the set of real-person samples, a1, a2, ..., am represent real-person samples, and p1, p2, ..., pm represent samples of the same category as a1, a2, ..., am, that is, also real-person samples; n1, n2, ..., nm are samples of a different category from a1, a2, ..., am, that is, prosthetic samples.
[0069] The real-person samples can include live face images collected from multiple users under different environments, lighting conditions, sizes, pose changes, occlusion situations, expressions, etc.
[0070] The prosthetic samples can include videos recaptured by smart terminals, recaptured photos, 3D mask images, background images, etc. The background images can include background images of different environments, different lighting conditions, and different sizes.
[0071] Compared with real-person image samples, the recaptured videos, recaptured photos, 3D mask images, and background images are more complex and are difficult to cluster as tightly as real-person images.
[0072] In practice, in an image containing a face region, the image can be segmented into multiple blocks, with the face region used as a real-person sample and the background region used as a prosthetic sample. Alternatively, the image containing the face region can be cropped to obtain the face region and the background region, with the face region used as a real-person sample and the background region used as a prosthetic sample.
[0073] S120: Randomly draw samples in the real-person sample set to obtain anchor samples and positive samples.
[0074] In this step, several pairs of real-person samples (a1, p1), (a2, p2), …, (am, pm) are randomly drawn from the real-person sample set.
[0075] S130: Search for difficult-to-separate samples and general samples in the prosthetic sample set, and form a difficult-to-separate sample group by combining the searched difficult-to-separate samples with the drawn anchor samples and positive samples, and form a general sample group by combining the searched general samples with the drawn anchor samples and positive samples.
[0076] Here, the difficult-to-separate samples are those prosthetic samples whose distance to an anchor sample among a1, a2, …, am is less than the distance between that anchor sample and its positive sample among p1, p2, …, pm; the general samples are those prosthetic samples whose distance to the anchor sample is greater than the distance between the anchor sample and its positive sample, and less than that distance plus margin.
[0077] Specifically, the distance d(a, p) between the anchor sample and the positive sample can be calculated, and the distance d(a, n) between each prosthetic sample and each anchor sample is calculated by searching one by one in the prosthetic sample set, and it is judged whether the prosthetic sample is a difficult-to-separate sample or a general sample according to d(a, p) and d(a, n).
[0078] For difficult-to-separate samples, d(a, n) < d(a, p), the distance from a to n is closer than the distance from a to p, which needs to be focused on for optimization and solution. For general samples, d(a, p) < d(a, n) < d(a, p) + margin, there is separability, but it is not obvious, and optimization is also needed.
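The one-by-one search of S130 can be sketched as follows. This is a minimal illustration under stated assumptions: samples are represented as feature vectors, the distance d is Euclidean (the document does not name a specific distance), and the names `euclidean` and `mine_triplets` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]


def euclidean(u: Vector, v: Vector) -> float:
    """Assumed distance d: Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def mine_triplets(
    pairs: Sequence[Tuple[Vector, Vector]],
    fakes: Sequence[Vector],
    margin: float,
) -> Tuple[List[Triplet], List[Triplet]]:
    """Search the prosthetic set for hard and general negatives.

    pairs: (anchor, positive) pairs drawn from the real-person set.
    fakes: the prosthetic sample set.
    Returns (hard_groups, general_groups); easy negatives are skipped.
    """
    hard: List[Triplet] = []
    general: List[Triplet] = []
    for a, p in pairs:
        d_ap = euclidean(a, p)
        for n in fakes:                      # search one by one
            d_an = euclidean(a, n)
            if d_an < d_ap:
                hard.append((a, p, n))       # d(a,n) < d(a,p)
            elif d_ap < d_an < d_ap + margin:
                general.append((a, p, n))    # within the margin band
    return hard, general
```

The exhaustive double loop costs one distance evaluation per (anchor, prosthetic sample) pair, which matches the one-by-one search described above but may need batching for large sample sets.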
[0079] After constructing the triplet sample group, a binary classification convolutional neural network is constructed, the parameters of the binary classification convolutional neural network are initialized, and the triplet sample group is input into the binary classification convolutional neural network for training.
[0080] S200: Perform the forward propagation process and calculate the loss through the following formula:
[0081] L = max(d(a, p) - d(a, n) + margin, 0)
[0082] Where L is the loss, a is the anchor sample, p is the positive sample, n is the negative sample, d(a,p) is the distance between the anchor sample and the positive sample, d(a,n) is the distance between the anchor sample and the negative sample, and margin is a constant greater than 0.
[0083] During forward propagation, the triplet sample groups are input into the binary classification convolutional neural network with its initial parameters, features are extracted layer by layer, and the loss is calculated.
[0084] This invention, based on the asymmetry between real and prosthetic samples, constructs triplet sample groups comprising anchor samples, positive samples, and negative samples. These triplet sample groups are not the classic symmetric construction of existing technologies; instead, the anchor samples and positive samples are both real samples, and the negative samples are prosthetic samples. An asymmetric loss function is then designed on top of these asymmetric triplet sample groups. During training, the intra-class distance of real samples is minimized, while the intra-class distance of prosthetic samples does not require strong constraints; only the inter-class distance between real and prosthetic samples needs to be enlarged. This approach more closely reflects the essence of the liveness detection problem and effectively solves the training optimization difficulties caused by the complexity of prosthetics.
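The loss formula of S200 can be written directly as a short function. This is a minimal sketch assuming that d is the Euclidean distance between feature vectors produced by the network; the document does not specify the distance, and the default margin of 0.5 is a hypothetical value.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Assumed distance d: Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(
    a: Sequence[float],
    p: Sequence[float],
    n: Sequence[float],
    margin: float = 0.5,  # hypothetical value; any constant > 0 is allowed
) -> float:
    """L = max(d(a,p) - d(a,n) + margin, 0) for one triplet."""
    return max(euclidean(a, p) - euclidean(a, n) + margin, 0.0)
```

The loss is zero exactly when d(a,p) + margin <= d(a,n), so easy-to-separate triplets contribute no gradient, while difficult-to-separate and general triplets produce a positive loss.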
[0085] S300: Perform backpropagation based on the loss, adjust the parameters of the binary classification convolutional neural network to reduce the distance between the anchor sample and the positive sample, and increase the distance between the anchor sample and the negative sample; then return to S200, until the binary classification convolutional neural network training is complete.
[0086] This step performs backpropagation based on the loss and iteratively optimizes the network with stochastic gradient descent, reducing the distance between anchor samples and positive samples and thereby minimizing the intra-class distance of real samples. For fake samples, the intra-class distance does not require strong constraints; it is sufficient to increase the distance between anchor samples and negative samples, i.e., to increase the inter-class distance between real samples and fake samples. This process continues until the set number of iterations is reached (e.g., 10,000), or until the loss is less than a set value, completing the training of the binary classification convolutional neural network.
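The iterate-until-done structure of S200 and S300 can be illustrated with a toy sketch. It is not the network described here: a single linear projection w stands in for the convolutional neural network, the distance is the squared difference of the projected values, and the learning rate, tolerance, and all function names are hypothetical. Only the stopping rules (a maximum number of iterations or a loss below a set value) and the use of stochastic gradient descent come from the text above.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def sq_dist(w: Vector, x: Vector, y: Vector) -> float:
    """Squared distance between the 1-D embeddings w.x and w.y (assumption)."""
    return (dot(w, x) - dot(w, y)) ** 2


def triplet_loss(w: Vector, a: Vector, p: Vector, n: Vector, margin: float) -> float:
    return max(sq_dist(w, a, p) - sq_dist(w, a, n) + margin, 0.0)


def mean_loss(w: Vector, triplets: Sequence[Triplet], margin: float) -> float:
    return sum(triplet_loss(w, a, p, n, margin) for a, p, n in triplets) / len(triplets)


def train(triplets: Sequence[Triplet], w: Vector, margin: float = 0.5,
          lr: float = 0.05, max_iters: int = 10000, tol: float = 1e-6,
          seed: int = 0) -> List[float]:
    """Toy SGD loop: forward pass, loss, gradient step, repeat."""
    rng = random.Random(seed)
    w = list(w)                                   # do not modify the caller's list
    for _ in range(max_iters):                    # stop rule 1: iteration budget
        if mean_loss(w, triplets, margin) < tol:  # stop rule 2: loss below set value
            break
        a, p, n = rng.choice(triplets)            # one random triplet per step
        if triplet_loss(w, a, p, n, margin) == 0.0:
            continue                              # easy triplet: zero gradient
        u = [ai - pi for ai, pi in zip(a, p)]
        v = [ai - ni for ai, ni in zip(a, n)]
        wu, wv = dot(w, u), dot(w, v)
        # Gradient of (w.u)^2 - (w.v)^2 with respect to w.
        grad = [2 * wu * ui - 2 * wv * vi for ui, vi in zip(u, v)]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w
```

Each active step shrinks the projected anchor-positive gap and widens the projected anchor-negative gap, which mirrors the stated goal of the backpropagation step; a real implementation would update all network parameters through automatic differentiation instead of this hand-written gradient.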
[0088] Embodiment 2:
[0089] This invention provides a convolutional neural network training device, wherein the convolutional neural network is used for face liveness detection. As shown in Figure 3, the device includes:
[0090] The input module 100 is used to input a number of triple sample groups for training into the binary classification convolutional neural network to be trained. Each triple sample group includes an anchor sample, a positive sample, and a negative sample. The anchor sample and the positive sample are real human samples, and the negative sample is a fake human sample.
[0091] Forward propagation module 200 is used to perform the forward propagation process and calculates the loss using the following formula:
[0092] L=max(d(a,p)-d(a,n)+margin,0)
[0093] Where L is the loss, a is the anchor sample, p is the positive sample, n is the negative sample, d(a,p) is the distance between the anchor sample and the positive sample, d(a,n) is the distance between the anchor sample and the negative sample, and margin is a constant greater than 0.
[0094] The backpropagation module 300 is used to perform a backpropagation process based on the loss, adjust the parameters of the binary classification convolutional neural network so that the distance between the anchor sample and the positive sample decreases and the distance between the anchor sample and the negative sample increases; and return to the step of performing the forward propagation process until the binary classification convolutional neural network training is completed.
[0096] In one implementation of the present invention, the triplet sample groups include difficult-to-separate sample groups and general sample groups, wherein:
[0097] For a difficult-to-separate sample group, d(a,n) < d(a,p);
[0098] For a general sample group, d(a,p) < d(a,n) < d(a,p) + margin.
[0099] The difficult-to-separate sample groups and the general sample groups are obtained through the following process:
[0100] Prepare a set of real human samples and a set of prosthetic samples.
[0101] Randomly draw samples from the set of real human samples to obtain anchor samples and positive samples.
[0102] Search the prosthetic sample set for difficult-to-separate samples and general samples. Combine each difficult-to-separate sample found with the drawn anchor and positive samples to form a difficult-to-separate sample group, and combine each general sample found with the drawn anchor and positive samples to form a general sample group.
[0103] The device provided in this embodiment of the invention has the same implementation principle and technical effects as the aforementioned method embodiment 1. For the sake of brevity, any parts not mentioned in this device embodiment can be referred to the corresponding content in the aforementioned method embodiment 1. Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the aforementioned device and unit can all be referred to the corresponding processes in the aforementioned method embodiment 1, and will not be repeated here.
[0104] Embodiment 3:
[0105] The method described in Embodiment 1 of this invention can implement business logic through a computer program and record it on a storage medium. This storage medium can be read and executed by a computer, achieving the effects of the scheme described in Embodiment 1 of this specification. Therefore, this invention also provides a computer-readable storage medium for training convolutional neural networks, including a memory for storing processor-executable instructions. When these instructions are executed by the processor, they implement the steps of the convolutional neural network training method of Embodiment 1.
[0107] The storage medium may include a physical device for storing information, which typically digitizes the information and then stores it by electrical, magnetic, or optical means. The storage medium may include: devices that store information using electrical energy, such as various types of memory, for example RAM, ROM, and USB flash drives; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic core memory, and bubble memory; and devices that store information optically, such as CDs or DVDs. Other readable storage media, such as quantum memories and graphene memories, may also be used.
[0108] The storage medium described above may also be implemented in other ways consistent with the description of method Embodiment 1. The implementation principle and technical effects of this embodiment are the same as those of method Embodiment 1; for details, refer to the description of method Embodiment 1, which is not repeated here.
[0109] Example 4:
[0110] The present invention also provides an apparatus for training convolutional neural networks. The apparatus may be a standalone computer, or it may include a practical operating device that uses one or more of the methods or embodiments described in this specification. The convolutional neural network training apparatus may include at least one processor and a memory storing computer-executable instructions. When the processor executes the instructions, it implements the steps of any one or more of the convolutional neural network training methods described in Embodiment 1.
[0111] This invention addresses the asymmetry between real and fake faces in face liveness detection with an asymmetric metric learning method. It constructs triple sample groups, each comprising an anchor sample, a positive sample, and a negative sample. Unlike the classical symmetric construction used in existing technologies, the anchor sample and the positive sample are both real samples, while the negative sample is a fake sample. An asymmetric loss function is designed on the basis of these asymmetric triple sample groups. During iterative optimization, it reduces the distance between the anchor sample and the positive sample while increasing the distance between the anchor sample and the negative sample. As a result, the intra-class distance of real samples is made as small as possible, the intra-class distance of fake samples is not strongly constrained, and only the inter-class distance between real and fake samples is enlarged. This formulation is closer to the essence of the liveness detection problem, effectively resolves the training and optimization difficulties caused by the complexity of fakes, and improves the accuracy of liveness detection.
[0112] The apparatus described above may also be implemented in other ways consistent with the description of method Embodiment 1. The implementation principle and technical effects of this embodiment are the same as those of method Embodiment 1; for details, refer to the description of method Embodiment 1, which is not repeated here.
[0113] Example 5:
[0114] This invention provides a method for face liveness detection. As shown in Figure 4, the method includes:
[0115] S10: Perform face liveness detection using a binary classification convolutional neural network trained by the convolutional neural network training method described in Example 1.
[0116] The aforementioned Example 1 describes the process of training a convolutional neural network, learning from training samples to optimize the parameters of the binary classification convolutional neural network. This example describes the process of using the trained binary classification convolutional neural network for inference, i.e., performing face liveness detection to determine whether a face is real or fake.
[0117] One specific implementation of face liveness detection is as follows:
[0118] The image to be detected is input into a binary classification convolutional neural network trained by the convolutional neural network training method described in Example 1, and the image features are output.
[0119] The output image features are input into a classifier to obtain the liveness detection results.
[0120] The classifiers include the softmax classifier and the SVM classifier.
[0121] This embodiment trains the convolutional neural network using the training method described in Embodiment 1 and therefore includes all the contents of Embodiment 1. For brevity, anything not described in this embodiment can be found in the corresponding content of method Embodiment 1 and is not repeated here.
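The two inference steps above (feature extraction, then classification) can be sketched as follows for the softmax case. This is only an illustration under stated assumptions: the feature vector stands in for the output of the trained network, and the linear weights, biases, label order, and the function name `classify_features` are hypothetical values chosen for the example, not parameters given in this specification.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_features(features, weights, biases, labels=("fake", "real")):
    """Apply a linear layer and softmax to the extracted image features.

    Returns the predicted label and the class probabilities.
    """
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical 3-D feature vector standing in for the network output.
features = [0.2, -0.1, 0.9]
# Hypothetical classifier parameters: row 0 scores "fake", row 1 scores "real".
weights = [[-1.0, 0.5, -2.0],
           [1.0, -0.5, 2.0]]
biases = [0.0, 0.0]

label, probs = classify_features(features, weights, biases)
```

An SVM classifier would replace the softmax step with a decision function over the same features; the feature extraction step is unchanged.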
[0122] Example 6:
[0123] This invention provides a face liveness detection device. As shown in Figure 5, the device includes:
[0124] The liveness detection module 10 is used to perform face liveness detection using a binary classification convolutional neural network trained by the convolutional neural network training device described in Example 2.
[0125] In one example, the liveness detection module is used to: input the image to be detected into a binary classification convolutional neural network trained by the convolutional neural network training device described in Example 2, and output image features; input the output image features into a classifier to obtain the result of liveness detection.
[0126] The classifiers include the softmax classifier and the SVM classifier.
[0127] This embodiment trains the convolutional neural network using the training device described in Embodiment 2 and therefore includes all the contents of Embodiment 2. For brevity, anything not described in this embodiment can be found in the corresponding content of device Embodiment 2 and is not repeated here.
[0128] Example 7:
[0129] The method described in Embodiment 5 of this invention can be implemented as a computer program recorded on a storage medium. The storage medium can be read and executed by a computer, achieving the effects of the scheme described in Embodiment 5 of this specification. Therefore, this invention also provides a computer-readable storage medium for face liveness detection, including a memory for storing processor-executable instructions. When executed by a processor, the instructions implement the steps of the face liveness detection method of Embodiment 5.
[0130] Example 8:
[0131] The present invention also provides a device for face liveness detection. The device may be a standalone computer, or it may include a practical operating device that uses one or more of the methods or embodiments described in this specification. The face liveness detection device may include at least one processor and a memory storing computer-executable instructions. When the processor executes the instructions, it implements the steps of any one or more of the face liveness detection methods described in Embodiment 5.
[0132] Finally, it should be noted that the above embodiments are merely specific implementations of the present invention, intended to illustrate its technical solutions rather than to limit them, and the scope of protection of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed by the present invention, they can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and all of them should be covered by the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A method for training a convolutional neural network, characterized in that the convolutional neural network is used for face liveness detection, and the method includes:
inputting several triple sample groups for training into the binary classification convolutional neural network to be trained, wherein each triple sample group includes an anchor sample, a positive sample, and a negative sample, the anchor sample and the positive sample are real human samples, and the negative sample is a fake sample;
performing a forward propagation process in which the loss is calculated using the following formula: [formula not reproduced in this text], where L is the loss, a is the anchor sample, p is the positive sample, n is the negative sample, d(a, p) is the distance between the anchor sample and the positive sample, d(a, n) is the distance between the anchor sample and the negative sample, and margin is a constant greater than 0;
performing a backpropagation process based on the loss, adjusting the parameters of the binary classification convolutional neural network so that the distance between the anchor sample and the positive sample decreases, the distance between the anchor sample and the negative sample increases, and the intra-class distance of the negative samples is not strongly constrained; and returning to the step of performing the forward propagation process until training of the binary classification convolutional neural network is completed;
wherein the triple sample groups include difficult-to-distinguish sample groups and general sample groups, where for a difficult-to-distinguish sample group: [condition not reproduced in this text]; and for a general sample group: [condition not reproduced in this text];
and the difficult-to-distinguish sample groups and the general sample groups are obtained as follows:
preparing a set of real human samples and a set of fake samples;
obtaining anchor samples and positive samples by randomly selecting samples from the set of real human samples;
searching the set of fake samples for difficult-to-distinguish samples and general samples, combining the difficult-to-distinguish samples found with the selected anchor samples and positive samples to form difficult-to-distinguish sample groups, and combining the general samples found with the selected anchor samples and positive samples to form general sample groups.
2. A method for face liveness detection, characterized in that, The method includes: Face liveness detection is performed using a binary classification convolutional neural network trained using the convolutional neural network training method described in claim 1.
3. The face liveness detection method according to claim 2, characterized in that, The method of using a binary classification convolutional neural network trained according to the convolutional neural network training method of claim 1 for face liveness detection includes: The image to be detected is input into a binary classification convolutional neural network trained by the convolutional neural network training method described in claim 1, and the image features are output. The output image features are input into a classifier to obtain the liveness detection results.
4. The face liveness detection method according to claim 3, characterized in that, The classifiers include the softmax classifier and the SVM classifier.
5. A convolutional neural network training device, characterized in that the convolutional neural network is used for face liveness detection, and the device includes:
an input module, used to input several triple sample groups for training into the binary classification convolutional neural network to be trained, wherein each triple sample group includes an anchor sample, a positive sample, and a negative sample, the anchor sample and the positive sample are real human samples, and the negative sample is a fake sample;
a forward propagation module, used to perform the forward propagation process and calculate the loss using the following formula: [formula not reproduced in this text], where L is the loss, a is the anchor sample, p is the positive sample, n is the negative sample, d(a, p) is the distance between the anchor sample and the positive sample, d(a, n) is the distance between the anchor sample and the negative sample, and margin is a constant greater than 0;
a backpropagation module, used to perform a backpropagation process based on the loss, adjust the parameters of the binary classification convolutional neural network so that the distance between the anchor sample and the positive sample decreases, the distance between the anchor sample and the negative sample increases, and the intra-class distance of the negative samples is not strongly constrained, and return to the step of performing the forward propagation process until training of the binary classification convolutional neural network is completed;
wherein the triple sample groups include difficult-to-distinguish sample groups and general sample groups, where for a difficult-to-distinguish sample group: [condition not reproduced in this text]; and for a general sample group: [condition not reproduced in this text];
and the difficult-to-distinguish sample groups and the general sample groups are obtained as follows:
preparing a set of real human samples and a set of fake samples;
obtaining anchor samples and positive samples by randomly selecting samples from the set of real human samples;
searching the set of fake samples for difficult-to-distinguish samples and general samples, combining the difficult-to-distinguish samples found with the selected anchor samples and positive samples to form difficult-to-distinguish sample groups, and combining the general samples found with the selected anchor samples and positive samples to form general sample groups.
6. A face liveness detection device, characterized in that, The device includes: A liveness detection module is used to perform face liveness detection using a binary classification convolutional neural network trained by the convolutional neural network training device described in claim 5.
7. A computer-readable storage medium for training convolutional neural networks, characterized in that, It includes a memory for storing processor-executable instructions, which, when executed by the processor, implement the steps of the convolutional neural network training method of claim 1.
8. An apparatus for training a convolutional neural network, characterized in that, It includes at least one processor and a memory storing computer-executable instructions, wherein the processor executes the instructions to implement the steps of the convolutional neural network training method of claim 1.