Method, system and device for detecting living body
By training a liveness detection model in a face recognition system using perturbed image samples combined with knowledge distillation, the problems of liveness attacks and privacy leaks are solved, the model's sensitivity to perturbations and feature extraction capabilities are improved, and the liveness detection capabilities of terminal devices are enhanced.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ALIPAY (HANGZHOU) INFORMATION TECH CO LTD
- Filing Date
- 2023-05-09
- Publication Date
- 2026-06-26
AI Technical Summary
Existing technologies are insufficient to effectively address liveness attacks and privacy breaches in facial recognition systems, especially in scenarios such as facial recognition payment and access control, where the risks of liveness attacks and privacy breaches are high.
A liveness detection model is trained by combining perturbed image samples with knowledge distillation. The pre-set liveness detection teacher model is trained by introducing perturbed image samples to obtain the liveness detection teacher model, and the pre-set liveness detection student model is subjected to knowledge distillation to generate a liveness detection model with high sensitivity.
The liveness detection model has improved its sensitivity to various disturbances and enhanced its feature extraction capabilities, effectively addressing liveness attacks and privacy breaches, and improving the liveness detection performance of terminal detection devices.
Figure CN116469181B_ABST
Abstract
Description
Technical Field
[0001] This specification relates to the field of biometric technology, and in particular to a training method for a liveness detection model, a liveness detection method, and a system.
Background Technology
[0002] With the widespread application of facial recognition technology (such as facial recognition payment, facial recognition access control, and facial recognition station entry), the security issues of facial recognition technology have received increasing attention. Among various security issues, liveness detection attacks and privacy leaks are two major challenges. Liveness detection attacks directly threaten the security of facial recognition systems and are the most common and threatening security risk they have long faced. Privacy leaks occur during the transmission and acquisition of user images, and in the long run, such leaks can pose risks to users' financial security.
[0003] Therefore, there is a need to provide a training method for a liveness detection model, a liveness detection method, and a system that can simultaneously address liveness attacks and privacy breaches.
Summary of the Invention
[0004] The main purpose of this specification is to provide a training method, a liveness detection method, and a system for a liveness detection model.
[0005] In a first aspect, this specification provides a training method for a liveness detection model, comprising: obtaining a first image sample set, the first image sample set including N first initial image samples and M first perturbation image samples, wherein the M first perturbation image samples are obtained by perturbing the N first initial image samples, and M and N are both natural numbers greater than 1; training a preset liveness detection teacher model based on the first image sample set to obtain a liveness detection teacher model; and performing knowledge distillation on a preset liveness detection student model based on the first image sample set and the liveness detection teacher model to obtain a liveness detection model.
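As a rough illustration of how such a first image sample set might be assembled, the sketch below perturbs each initial sample with additive Gaussian noise and records the perturbation type and intensity as annotations. The noise-based perturbation, the `Sample` record, and the function names are assumptions made for illustration only; in the specification the perturbed samples come from a trained sample perturbation model.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    pixels: List[float]            # flattened image, values in [0, 1]
    is_live: bool                  # liveness annotation
    perturb_type: Optional[str]    # None for an unperturbed initial sample
    perturb_intensity: float       # 0.0 for an unperturbed initial sample


def perturb(sample: Sample, sigma: float, rng: random.Random) -> Sample:
    """Return a noisy copy of `sample` (hypothetical Gaussian perturbation)."""
    noisy = [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in sample.pixels]
    # The liveness annotation is inherited; type and intensity become new labels.
    return Sample(noisy, sample.is_live, "gaussian_noise", sigma)


def build_first_sample_set(initial: List[Sample], sigmas: List[float],
                           seed: int = 0) -> List[Sample]:
    """Combine N initial samples with M = N * len(sigmas) perturbed samples."""
    rng = random.Random(seed)
    perturbed = [perturb(s, sigma, rng) for s in initial for sigma in sigmas]
    return initial + perturbed


initial_samples = [
    Sample([0.2, 0.4, 0.6, 0.8], True, None, 0.0),
    Sample([0.9, 0.1, 0.5, 0.3], False, None, 0.0),
]
sample_set = build_first_sample_set(initial_samples, sigmas=[0.05, 0.1])
```

Here N = 2 and M = 4, so the resulting set holds six samples, each carrying the annotations that the teacher model's liveness and perturbation heads would later be trained against.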
[0006] In some embodiments, the M first perturbation image samples are obtained by perturbing the N first initial image samples using a sample perturbation model; and the training method of the sample perturbation model includes: based on a preset sample perturbation model, determining at least one first perturbation image sample corresponding to each of the N first initial image samples, determining at least one perturbation feature loss information between the at least one first perturbation image sample and the current first initial image sample, and converging the preset sample perturbation model based on multiple perturbation feature loss information corresponding to the N first initial image samples to obtain the sample perturbation model.
[0007] In some embodiments, determining at least one perturbation feature loss information between the at least one first perturbation image sample and the current first initial image sample includes: determining at least one perturbation feature difference information between the at least one first perturbation image sample and the current first initial image sample; obtaining a preset perturbation feature difference threshold; and determining the at least one perturbation feature loss information based on the preset perturbation feature difference threshold and the at least one perturbation feature difference information.
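The specification does not state how the threshold and the feature difference are combined. One plausible reading, sketched below, is a hinge-style loss that is zero while the perturbed sample's features stay within the threshold of the original and grows linearly beyond it. The mean-absolute-difference metric, the hinge form, and the direction of the penalty are all assumptions; penalizing differences that fall below the threshold would be equally consistent with the text.

```python
from typing import List


def feature_difference(original: List[float], perturbed: List[float]) -> float:
    """Mean absolute difference between two feature vectors of equal length."""
    if len(original) != len(perturbed):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(original, perturbed)) / len(original)


def perturbation_feature_loss(original: List[float], perturbed: List[float],
                              threshold: float) -> float:
    """Assumed hinge loss: zero while the difference stays within the threshold."""
    return max(0.0, feature_difference(original, perturbed) - threshold)


original_features = [0.5, 0.5, 0.5, 0.5]
mild = [0.55, 0.45, 0.5, 0.5]     # small drift from the original features
strong = [0.9, 0.1, 0.9, 0.1]     # large drift from the original features

loss_mild = perturbation_feature_loss(original_features, mild, threshold=0.1)
loss_strong = perturbation_feature_loss(original_features, strong, threshold=0.1)
```

Under this assumed form, the mild perturbation incurs no loss while the strong one is penalized in proportion to how far it exceeds the threshold.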
[0008] In some embodiments, training a preset liveness detection teacher model based on the first image sample set to obtain a liveness detection teacher model includes: for each first image sample in the first image sample set: inputting the current first image sample into the preset liveness detection teacher model to obtain a first liveness detection result and a first perturbation detection result; determining teacher detection loss information corresponding to the current first image sample based on the first liveness detection result and the first perturbation detection result; and converging the preset liveness detection teacher model based on multiple teacher detection loss information corresponding to the first image sample set to obtain the liveness detection teacher model.
[0009] In some embodiments, the preset liveness detection teacher model includes a preset liveness feature extraction teacher network, a preset liveness detection teacher network, and a preset perturbation perception teacher network; and the step of inputting the current first image sample into the preset liveness detection teacher model to obtain a first liveness detection result and a first perturbation detection result includes: inputting the current first image sample into the preset liveness feature extraction teacher network to obtain a first liveness feature map, inputting the first liveness feature map into the preset liveness detection teacher network to obtain the first liveness detection result, and inputting the first liveness feature map into the preset perturbation perception teacher network to obtain the first perturbation detection result.
[0010] In some embodiments, determining the teacher detection loss information corresponding to the current first image sample based on the first liveness detection result and the first perturbation detection result includes: obtaining first liveness annotation information and first perturbation annotation information of the current first image sample; determining first liveness classification loss information corresponding to the current first image sample based on the first liveness detection result and the first liveness annotation information; and determining first perturbation perception loss information corresponding to the current first image sample based on the first perturbation detection result and the first perturbation annotation information, wherein the teacher detection loss information includes the first liveness classification loss information and the first perturbation perception loss information.
[0011] In some embodiments, the first perturbation detection result includes a first perturbation type prediction result and a first perturbation intensity prediction result, and the first perturbation annotation information includes a first perturbation annotation type and a first perturbation annotation intensity; and the step of determining the first perturbation perception loss information corresponding to the current first image sample based on the first perturbation detection result and the first perturbation annotation information includes: determining the first perturbation type loss information corresponding to the current first image sample based on the first perturbation type prediction result and the first perturbation annotation type, and determining the first perturbation intensity loss information corresponding to the current first image sample based on the first perturbation intensity prediction result and the first perturbation annotation intensity, wherein the first perturbation perception loss information includes the first perturbation type loss information and the first perturbation intensity loss information.
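The composition of the teacher detection loss described above can be sketched as follows. The choice of cross-entropy for the liveness and perturbation-type terms, squared error for the intensity term, and an unweighted sum are assumptions; the specification only names the loss components, not their functional form or weighting. All function and parameter names are hypothetical.

```python
import math
from typing import Dict, List


def cross_entropy(probs: List[float], target_index: int) -> float:
    """Negative log-likelihood of the annotated class."""
    return -math.log(max(probs[target_index], 1e-12))


def teacher_detection_loss(
    liveness_probs: List[float],   # [p(attack), p(live)] from the liveness head
    type_probs: List[float],       # distribution over perturbation types
    intensity_pred: float,         # predicted perturbation intensity
    liveness_label: int,           # liveness annotation
    type_label: int,               # perturbation annotation type
    intensity_label: float,        # perturbation annotation intensity
) -> Dict[str, float]:
    liveness_loss = cross_entropy(liveness_probs, liveness_label)
    type_loss = cross_entropy(type_probs, type_label)
    intensity_loss = (intensity_pred - intensity_label) ** 2
    # Perturbation perception loss = type loss + intensity loss.
    perception_loss = type_loss + intensity_loss
    return {
        "liveness": liveness_loss,
        "perception": perception_loss,
        "total": liveness_loss + perception_loss,
    }


losses = teacher_detection_loss(
    liveness_probs=[0.2, 0.8],
    type_probs=[0.1, 0.7, 0.2],
    intensity_pred=0.3,
    liveness_label=1,
    type_label=1,
    intensity_label=0.5,
)
```

In practice the per-sample totals would be aggregated over the first image sample set and minimized until the preset teacher model converges.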
[0012] In some embodiments, the step of performing knowledge distillation on a preset liveness detection student model based on the first image sample set and the liveness detection teacher model to obtain a liveness detection model includes: for each first image sample in the first image sample set: inputting the current first image sample into the liveness detection teacher model to obtain a teacher detection result, and inputting the current first image sample into the preset liveness detection student model to obtain a student detection result; determining student detection loss information corresponding to the current first image sample based on the first liveness annotation information of the current first image sample, the teacher detection result, and the student detection result; and converging the preset liveness detection student model based on multiple student detection loss information corresponding to the first image sample set to obtain the liveness detection model.
[0013] In some embodiments, the liveness detection teacher model includes a liveness feature extraction teacher network and a perturbation perception teacher network; and the step of inputting the current first image sample into the liveness detection teacher model to obtain a teacher detection result includes: inputting the current first image sample into the liveness feature extraction teacher network to obtain a second liveness feature map, and inputting the second liveness feature map into the perturbation perception teacher network to obtain a second perturbation detection result, wherein the teacher detection result includes the second liveness feature map and the second perturbation detection result.
[0014] In some embodiments, the preset liveness detection student model includes a preset liveness feature extraction student network, a preset liveness detection student network, and a preset perturbation perception student network; and the step of inputting the current first image sample into the preset liveness detection student model to obtain a student detection result includes: inputting the current first image sample into the preset liveness feature extraction student network to obtain a third liveness feature map, inputting the third liveness feature map into the preset liveness detection student network to obtain a second liveness detection result, and inputting the third liveness feature map into the preset perturbation perception student network to obtain a third perturbation detection result, wherein the student detection result includes the third liveness feature map, the second liveness detection result, and the third perturbation detection result.
[0015] In some embodiments, determining the student detection loss information corresponding to the current first image sample based on the first liveness annotation information of the current first image sample, the teacher detection result, and the student detection result includes: determining the feature distillation loss information corresponding to the current first image sample based on the second liveness feature map and the third liveness feature map; determining the second liveness classification loss information corresponding to the current first image sample based on the first liveness annotation information and the second liveness detection result; and determining the second perturbation perception loss information corresponding to the current first image sample based on the second perturbation detection result and the third perturbation detection result, wherein the student detection loss information includes the feature distillation loss information, the second liveness classification loss information, and the second perturbation perception loss information.
[0016] In some embodiments, the second perturbation detection result includes a second perturbation type prediction result and a second perturbation intensity prediction result, and the third perturbation detection result includes a third perturbation type prediction result and a third perturbation intensity prediction result; and determining the second perturbation perception loss information corresponding to the current first image sample based on the second perturbation detection result and the third perturbation detection result includes: determining the second perturbation type loss information corresponding to the current first image sample based on the second perturbation type prediction result and the third perturbation type prediction result, and determining the second perturbation intensity loss information corresponding to the current first image sample based on the second perturbation intensity prediction result and the third perturbation intensity prediction result, wherein the second perturbation perception loss information includes the second perturbation type loss information and the second perturbation intensity loss information.
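A minimal sketch of the student detection loss follows. It assumes mean squared error between teacher and student feature maps (flattened to equal length), cross-entropy against the liveness annotation, KL divergence between the teacher's and student's perturbation-type distributions, and squared error between their intensity predictions, all summed without weights. These functional forms and every name below are assumptions; the specification only identifies which quantities each loss term compares.

```python
import math
from typing import List


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two equally sized, flattened feature maps."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def kl_divergence(teacher: List[float], student: List[float]) -> float:
    """KL(teacher || student) for two discrete distributions."""
    eps = 1e-12
    return sum(t * math.log((t + eps) / (s + eps)) for t, s in zip(teacher, student))


def student_detection_loss(
    teacher_features: List[float], student_features: List[float],
    student_liveness_probs: List[float], liveness_label: int,
    teacher_type_probs: List[float], student_type_probs: List[float],
    teacher_intensity: float, student_intensity: float,
) -> float:
    # Feature distillation: student feature map should match the teacher's.
    feature_distillation = mse(teacher_features, student_features)
    # Liveness classification: supervised by the annotation, not the teacher.
    liveness_classification = -math.log(
        max(student_liveness_probs[liveness_label], 1e-12))
    # Perturbation perception: the teacher's predictions act as soft targets.
    type_loss = kl_divergence(teacher_type_probs, student_type_probs)
    intensity_loss = (teacher_intensity - student_intensity) ** 2
    return feature_distillation + liveness_classification + type_loss + intensity_loss


matched = student_detection_loss(
    [0.1, 0.9], [0.1, 0.9], [0.1, 0.9], 1, [0.2, 0.8], [0.2, 0.8], 0.4, 0.4)
mismatched = student_detection_loss(
    [0.1, 0.9], [0.5, 0.5], [0.1, 0.9], 1, [0.2, 0.8], [0.6, 0.4], 0.4, 0.1)
```

When the student reproduces the teacher's features and perturbation predictions exactly, only the liveness classification term remains; any mismatch adds to the loss.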
[0017] In some embodiments, the training method of the liveness detection model further includes: obtaining a second image sample set, the second image sample set including L second image samples, where L is a natural number greater than 1; and training a preset hierarchical pruning model based on the second image sample set and the liveness detection model to obtain a hierarchical pruning model, wherein the hierarchical pruning model is configured to dynamically prune the network layers of the liveness detection model according to the input image.
[0018] In some embodiments, the liveness detection model includes K initial network layers, where K is a natural number greater than 1; and the step of training a preset hierarchical pruning model based on the second image sample set and the liveness detection model to obtain a hierarchical pruning model includes: for each of the L second image samples: inputting the current second image sample into the preset hierarchical pruning model to obtain the selection probability of each of the K initial network layers; pruning the liveness detection model based on the selection probability to obtain a liveness detection pruning model; inputting the current second image sample into the liveness detection pruning model to obtain a third liveness detection result; determining the pruning loss information corresponding to the current second image sample based on the selection probability and the third liveness detection result; and converging the preset hierarchical pruning model based on the L pruning loss information corresponding to the L second image samples to obtain the hierarchical pruning model.
[0019] In some embodiments, determining the pruning loss information corresponding to the current second image sample based on the selection probability and the third liveness detection result includes: obtaining a preset probability sparse threshold and second liveness annotation information of the current second image sample; determining the pruning probability sparse loss information corresponding to the current second image sample based on the selection probability and the preset probability sparse threshold; and determining the pruning liveness classification loss information corresponding to the current second image sample based on the third liveness detection result and the second liveness annotation information, wherein the pruning loss information includes the pruning probability sparse loss information and the pruning liveness classification loss information.
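The two-part pruning loss can be illustrated with the sketch below. It assumes that the probability sparse loss penalizes the mean layer-selection probability only when it exceeds the preset sparse threshold, and that the classification term is a cross-entropy against the liveness annotation. Both forms, the unweighted sum, and all names are assumptions; the specification does not give the formulas.

```python
import math
from typing import List


def pruning_loss(selection_probs: List[float], sparse_threshold: float,
                 liveness_probs: List[float], liveness_label: int) -> float:
    """Assumed pruning loss = probability sparse loss + liveness classification loss."""
    mean_keep = sum(selection_probs) / len(selection_probs)
    # Sparse term: push the average selection probability down to the threshold.
    sparse_loss = max(0.0, mean_keep - sparse_threshold)
    # Classification term: the pruned model must still classify liveness correctly.
    classification_loss = -math.log(max(liveness_probs[liveness_label], 1e-12))
    return sparse_loss + classification_loss


# Same prediction quality, different numbers of layers kept.
loss_dense = pruning_loss([0.9, 0.9, 0.9, 0.9], 0.5, [0.2, 0.8], 1)
loss_sparse = pruning_loss([0.9, 0.1, 0.9, 0.1], 0.5, [0.2, 0.8], 1)
```

With equal classification quality, the configuration that keeps fewer layers has the lower loss, which is the trade-off the hierarchical pruning model is trained to balance.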
[0020] In a second aspect, this specification provides a training system for a liveness detection model, comprising: at least one storage medium storing at least one instruction set for training the liveness detection model; and at least one processor communicatively connected to the at least one storage medium, wherein, when the training system for the liveness detection model is running, the at least one processor reads the at least one instruction set and, according to the instructions of the at least one instruction set, executes the training method for the liveness detection model described in the first aspect.
[0021] In a third aspect, this specification provides a liveness detection method applied to a terminal detection device, comprising: obtaining a target image of a target object; performing liveness detection on the target object using a liveness detection model based on the target image to obtain a liveness detection result, wherein the liveness detection model is trained using the training method for the liveness detection model described in the first aspect; and outputting the liveness detection result.
[0022] In some embodiments, the step of performing liveness detection on the target object based on the target image using a liveness detection model to obtain a liveness detection result includes: pruning the liveness detection model using a hierarchical pruning model based on the target image to obtain a liveness detection pruning model; and inputting the target image into the liveness detection pruning model to obtain the liveness detection result of the target object.
[0023] In some embodiments, the liveness detection model includes K initial network layers, where K is a natural number greater than 1; and the step of pruning the liveness detection model using a hierarchical pruning model based on the target image to obtain a liveness detection pruning model includes: inputting the target image into the hierarchical pruning model to obtain the selection probability of each of the K initial network layers, and determining S initial network layers that satisfy preset conditions based on the selection probabilities, where S is a natural number greater than 1 and S is less than K, and the liveness detection pruning model includes the S initial network layers.
[0024] In some embodiments, the preset condition includes: the selection probability is greater than or equal to a preset probability threshold.
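The threshold-based layer selection at inference time can be sketched as follows. The scalar callables stand in for network layers and are purely hypothetical, as are the function names; the sketch also assumes that any subset of layers can be chained, which in a real network would require compatible input and output shapes (for example, residual blocks).

```python
from typing import Callable, List, Sequence

Layer = Callable[[float], float]


def select_layers(selection_probs: Sequence[float], prob_threshold: float) -> List[int]:
    """Indices of layers whose selection probability meets the preset threshold."""
    return [i for i, p in enumerate(selection_probs) if p >= prob_threshold]


def run_pruned(layers: Sequence[Layer], kept: Sequence[int], x: float) -> float:
    """Apply only the kept layers, in their original order."""
    for i in kept:
        x = layers[i](x)
    return x


# K = 4 toy "layers"; the hierarchical pruning model would output selection_probs.
layers = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: x - 3.0, lambda x: x * x]
selection_probs = [0.9, 0.2, 0.7, 0.5]

kept = select_layers(selection_probs, prob_threshold=0.5)   # S = 3 layers remain
output = run_pruned(layers, kept, 4.0)
```

Because the selection probabilities depend on the input image, a different target image may keep a different subset of the K layers.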
[0025] In a fourth aspect, this specification provides a liveness detection system, comprising: at least one storage medium storing at least one instruction set for performing liveness detection; and at least one processor communicatively connected to the at least one storage medium, wherein, when the liveness detection system is running, the at least one processor reads the at least one instruction set and, according to the instructions of the at least one instruction set, executes the liveness detection method described in the third aspect.
[0026] As can be seen from the above technical solutions, this specification provides a training method for a liveness detection model, a liveness detection method, and a system for executing the above methods. The method and system employ a combination of perturbation and knowledge distillation to train a liveness detection model that can be deployed on a terminal detection device. During the training of preset liveness detection teacher and student models, perturbed image samples are introduced for liveness detection. Compared to traditional methods that train models based solely on liveness detection results, this provides more reference information for updating network parameters during model training. This results in a liveness detection model with higher sensitivity to various perturbations and stronger feature extraction capabilities from input image samples, thereby improving liveness detection performance. Deploying the liveness detection model trained using the methods and systems provided in this specification on a terminal detection device can simultaneously address the issues of liveness attacks and privacy leaks.
[0027] Other functions of the training method for the liveness detection model, the liveness detection method, and the system provided in this specification will be partially set forth in the following description. From that description, the content presented by the figures and examples below will be apparent to those skilled in the art. The inventive aspects of the training method for the liveness detection model, the liveness detection method, and the system provided in this specification can be fully explained through practice or use of the methods, apparatus, and combinations described in the detailed examples below.
Brief Description of the Drawings
[0028] To more clearly illustrate the technical solutions in the embodiments of this specification, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this specification. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0029] Figure 1 is a schematic diagram of an application scenario of a system for the training method for a liveness detection model and the liveness detection method according to some embodiments of this specification;
[0030] Figure 2 is a schematic diagram of the structure of a computing device according to some embodiments of this specification;
[0031] Figure 3 is a flowchart of a training method for a liveness detection model according to some embodiments of this specification;
[0032] Figure 4 is a schematic diagram of the data flow in a preset liveness detection teacher model according to some embodiments of this specification;
[0033] Figure 5 is a schematic diagram of the data flow during a knowledge distillation process according to some embodiments of this specification;
[0034] Figure 6 is a flowchart of a training method for a liveness detection model according to some embodiments of this specification; and
[0035] Figure 7 is a flowchart of a liveness detection method according to some embodiments of this specification.
Detailed Description
[0036] The following description provides specific application scenarios and requirements for this specification, intended to enable those skilled in the art to make and use the contents of this specification. Various modifications to the disclosed embodiments will be apparent to those skilled in the art, and the general principles defined herein can be applied to other embodiments and applications without departing from the spirit and scope of this specification. Therefore, this specification is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
[0037] The terminology used herein is for the purpose of describing particular exemplary embodiments only and is not restrictive. For example, unless the context clearly indicates otherwise, the singular forms “a,” “an,” and “the” used herein may also include the plural forms. When used in this specification, the terms “comprising,” “including,” and / or “containing” mean that the associated integers, steps, operations, elements, and / or components are present, but do not exclude the presence of one or more other features, integers, steps, operations, elements, components, and / or groups, or that other features, integers, steps, operations, elements, components, and / or groups may be added to the system / method.
[0038] These and other features of this specification, as well as the operation and function of the related structural components and the economy of combination and manufacture of parts, will become more apparent upon consideration of the following description with reference to the accompanying drawings, all of which form part of this specification. However, it should be clearly understood that the drawings are for illustrative and descriptive purposes only and are not intended to limit the scope of this specification. It should also be understood that the drawings are not drawn to scale.
[0039] The flowcharts used in this specification illustrate operations implemented according to some embodiments of this specification. It should be clearly understood that the operations in the flowcharts need not be performed in the order shown. Instead, the operations may be performed in reverse order or simultaneously. Furthermore, one or more additional operations may be added to the flowcharts, and one or more operations may be removed from the flowcharts.
[0040] For ease of description, the terms that will appear in the following descriptions will be explained as follows.
[0041] Knowledge distillation: A model compression method based on the Teacher-Student framework, with the aim of transferring the knowledge learned from a large model or multiple model ensembles to another lightweight model.
[0042] Mixup: An unconventional data augmentation method that uses linear interpolation to construct new training samples and labels. Mixup is a class-mixing augmentation algorithm that can combine images from different classes to expand the training dataset.
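The linear interpolation behind mixup can be shown in a few lines. The sketch below mixes two flattened images and their one-hot labels with a fixed coefficient `lam`; in common practice the coefficient is drawn from a Beta distribution (for example with `random.betavariate`), which is omitted here to keep the result deterministic. The function name and the toy data are illustrative assumptions.

```python
from typing import List, Tuple


def mixup(x1: List[float], y1: List[float],
          x2: List[float], y2: List[float],
          lam: float) -> Tuple[List[float], List[float]]:
    """Blend two samples and their one-hot labels with the same coefficient."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


# Labels are one-hot over [attack, live].
live_image, live_label = [1.0, 0.0, 1.0, 0.0], [0.0, 1.0]
attack_image, attack_label = [0.0, 1.0, 0.0, 1.0], [1.0, 0.0]

mixed_image, mixed_label = mixup(
    live_image, live_label, attack_image, attack_label, lam=0.75)
```

The mixed label is a soft label: with `lam=0.75` the new sample counts as 75% live and 25% attack, matching the proportions in which the two images were blended.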
[0043] Structured pruning: This typically involves pruning filters or entire network layers as the basic unit. When a filter is pruned, its preceding and following feature maps will change accordingly, but the model's structure remains intact, allowing for acceleration via GPUs or other hardware.
[0044] Dynamic pruning: This involves dynamically pruning the model during training based on its performance. For example, pruning can be performed in smaller batches to fully utilize data noise and achieve better generalization performance. Dynamic pruning methods retain all weight parameters, assessing their importance based on the different input data each time, and ignoring unimportant weight parameters during calculation, thus achieving dynamic pruning.
[0045] Figure 1 is a schematic diagram of an application scenario of a system 100 for the training method for a liveness detection model and the liveness detection method according to some embodiments of this specification. The system 100 for the training method for the liveness detection model and the liveness detection method (hereinafter referred to as system 100) may include a training terminal 110, a terminal detection device 120, an integrated development platform server 130, and a database 150.
[0046] The training terminal 110 may store data or instructions for executing the training method of the liveness detection model described in this specification, and may execute or be used to execute the data or instructions for executing the training method of the liveness detection model. The training terminal 110 may include hardware devices with data processing capabilities and the necessary programs to drive the hardware devices. The training terminal 110 may include at least one mobile terminal device, which may be a computer or similar device. The training terminal 110 may also include an application program on at least one mobile terminal device. Multiple developers can perform program development or train the liveness detection model on the training terminal 110.
[0047] In this specification, the training terminal 110 can communicate with the integrated development platform server 130. The integrated development platform server 130 (hereinafter referred to as server 130) is equipped with an integrated development platform 140. The integrated development platform 140, also known as an integrated development environment (IDE), is an application program used to provide a program development environment, generally including tools such as a code editor, compiler, debugger, and graphical user interface. Developers can write program code (i.e., program development) on the integrated development platform 140 through the training terminal 110. The server 130 can be a computing device on the integrated development platform 140 specifically used to process the training method of the liveness detection model. In this specification, multiple developers can perform program development on the integrated development platform 140 through the training terminal 110. The server 130 can store data or instructions for executing the training method of the liveness detection model described in this specification, and can execute or be used to execute said data and / or instructions. The server 130 may include hardware devices with data processing capabilities and the necessary programs required to drive the hardware devices. Of course, server 130 may simply be a hardware device with data processing capabilities, or simply a program running on the hardware device. In some embodiments, server 130 may also be deployed as a plug-in on training terminal 110.
[0048] Database 150 may store data and/or instructions. In some embodiments, database 150 may store data and/or instructions executed by server 130 or used to execute the training method for the liveness detection model described herein. Training terminal 110 and server 130 may have access to database 150, and may access data or instructions stored in database 150 via a network. In some embodiments, database 150 may be directly connected to training terminal 110 and server 130. In some embodiments, database 150 may be part of server 130. In some embodiments, database 150 may include mass storage, removable storage, volatile read-write memory, read-only memory (ROM), or the like, or any combination thereof. Exemplary mass storage may include non-transitory storage media such as magnetic disks, optical discs, and solid-state drives. Exemplary removable storage may include flash drives, floppy disks, optical discs, memory cards, zip disks, magnetic tapes, etc. Exemplary volatile read-write memory may include random access memory (RAM). Exemplary RAM may include dynamic RAM (DRAM), double data rate synchronous dynamic RAM (DDR SDRAM), static RAM (SRAM), thyristor RAM (T-RAM), and zero-capacitor RAM (Z-RAM), etc. Exemplary ROM may include mask ROM (MROM), programmable ROM (PROM), erasable programmable ROM (PEROM), electrically erasable programmable ROM (EEPROM), compact disc ROM (CD-ROM), and digital versatile disc ROM, etc.
[0049] It should be understood that the number of training terminals 110 and servers 130 shown in Figure 1 is merely illustrative. Depending on implementation needs, there can be any number of training terminals 110 and servers 130.
[0050] It should be noted that the training method of the liveness detection model can be executed entirely on the training terminal 110, entirely on the server 130, or partially on the training terminal 110 and partially on the server 130.
[0051] For ease of description, the following descriptions will use the training method of the liveness detection model executed on the training terminal 110 as an example to describe the technical solutions involved in this specification.
[0052] In this specification, the training terminal 110 can communicate with the terminal detection device 120. Specifically, the training terminal 110 can transmit data or instructions of the liveness detection method corresponding to the trained liveness detection model to the terminal detection device 120. The terminal detection device 120 can store data or instructions for executing the liveness detection method described in this specification, and can execute or be used to execute data or instructions for the liveness detection method. The terminal detection device 120 may include hardware devices with image acquisition capabilities, hardware devices with data information processing capabilities, and necessary programs for driving the above hardware devices. For example, the terminal detection device 120 may include at least one mobile terminal device with image acquisition capabilities, which may be a smartphone, tablet, or other similar device.
[0053] It should be understood that the number of terminal detection devices 120 shown in Figure 1 is merely illustrative. Any number of terminal detection devices 120 can be used depending on implementation requirements.
[0054] In this specification, the liveness detection method is performed on the terminal detection device 120.
[0055] Figure 2 is a schematic diagram of the structure of a computing device 200 provided according to some embodiments of this specification. The computing device 200 can be a general-purpose computer or a special-purpose computer. For example, the computing device 200 can be a server, a personal computer, a portable computer (such as a laptop computer, tablet computer, etc.), or another electronic device with computing capabilities. The computing device 200 can also be the training terminal 110, the terminal detection device 120, or the server 130 shown in Figure 1.
[0056] As shown in Figure 2, the computing device 200 may include a COM port 250 connected to a network to facilitate data communication. The computing device 200 may also include a processor 220, in the form of one or more processors such as a central processing unit (CPU), for executing program instructions. The computing device 200 may also include an internal communication bus 210 and various forms of program storage media and data storage media, such as a disk 270 (non-transitory memory), read-only memory (ROM) 230, or random access memory (RAM) 240, for storing various data files to be processed and/or transmitted. The storage media may be local to the computing device 200 or shared by the computing device 200 (e.g., as shown in Figure 1). The computing device 200 may also include program instructions stored in ROM 230, RAM 240, and/or other types of non-transitory storage media to be executed by processor 220. The computing device 200 may also include I/O components 260 to support data communication with other computing devices in the distributed computing system 100. The computing device 200 may also receive programming and data via network communication.
[0057] For illustrative purposes only, only one processor 220 is described in the computing device 200. However, those skilled in the art will understand that the computing device 200 in this specification may also include multiple processors. Therefore, the methods / steps / operations performed by one processor as described in this specification may also be performed jointly or separately by multiple processors. For example, in this specification, the processor of the computing device 200 may simultaneously execute step A and step B. It should be understood that step A and step B may also be performed jointly by two different processors. For example, a first processor executes step A, a second processor executes step B, or a first processor and a second processor jointly execute steps A and B.
[0058] Figure 3 shows a flowchart of a method 300 for training a liveness detection model according to some embodiments of this specification. The technical solution is described below with reference to Figure 3. The entity implementing the technical solution may be at least one of the training terminal 110 and the server 130 shown in Figure 1. Specifically, the training terminal 110 and/or server 130 may have the structure shown in Figure 2. That is, the training terminal 110 and/or server 130 can be a device for training a liveness detection model, comprising at least one storage medium and at least one processor. The at least one storage medium includes at least one instruction set for executing the training method of the liveness detection model. The at least one processor is communicatively connected to the at least one storage medium. When the system is running, the at least one processor can read the at least one instruction set and, as instructed by it, execute the method 300 shown in Figure 3. For illustrative purposes only, this specification describes method 300 using the training terminal 110 as an example. Method 300 may include:
[0059] S310, obtain a first image sample set, which includes N first initial image samples and M first perturbation image samples. The M first perturbation image samples are obtained by perturbing the N first initial image samples, and M and N are both natural numbers greater than 1.
[0060] In this specification, the first initial image sample can be understood as an existing real image sample. The first initial image sample can be an image of a human body part with specific biometric characteristics; for example, it can be a face image, a palm print image, a palm vein image, etc. For ease of description, this specification uses a face image as the first initial image sample. It should be noted that the N first initial image samples involved in the first image sample set in this specification have been authorized by the user.
[0061] Based on this, the training terminal 110 can perturb the N initial image samples at least once, thereby obtaining M perturbed image samples. This results in the first image sample set including (N+M) first image samples. By perturbing the N initial image samples, data augmentation of the N initial image samples can be achieved, increasing the number of samples in the first image sample set, thus providing more image samples for the subsequent training of the teacher model and student model.
[0062] The training terminal 110 can perturb the N first initial image samples in various ways, for example by adding noise (such as Gaussian noise or salt-and-pepper noise), by applying motion blur, or by mixup. Adding noise and/or applying motion blur does not change the liveness label of a first initial image sample. If the liveness label of the first initial image sample is live, the first perturbed image sample obtained by adding noise and/or applying motion blur is also labeled live. Similarly, if the liveness label of the first initial image sample is not live (attack), the resulting first perturbed image sample is also labeled not live.
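As a minimal sketch of the label-preserving perturbations described above, the following Python code adds Gaussian or salt-and-pepper noise to a small grayscale image stored as nested lists. The function names, the dictionary layout of a sample, and the 0 to 255 pixel range are assumptions made for illustration; the specification does not prescribe any of them.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel and clip to [0, 255]."""
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]


def add_salt_and_pepper(image, amount, rng):
    """Replace each pixel with 0 or 255 with probability `amount`."""
    out = []
    for row in image:
        new_row = []
        for p in row:
            if rng.random() < amount:
                new_row.append(255.0 if rng.random() < 0.5 else 0.0)
            else:
                new_row.append(float(p))
        out.append(new_row)
    return out


def perturb_sample(sample, kind, strength, rng):
    """Return a perturbed copy of `sample`; the liveness label is carried over."""
    if kind == "gaussian":
        image = add_gaussian_noise(sample["image"], strength, rng)
    elif kind == "salt_pepper":
        image = add_salt_and_pepper(sample["image"], strength, rng)
    else:
        raise ValueError(f"unknown perturbation kind: {kind}")
    return {"image": image, "label": sample["label"]}


rng = random.Random(0)  # fixed seed so the sketch is repeatable
original = {"image": [[10.0, 200.0], [128.0, 64.0]], "label": "live"}
noisy = perturb_sample(original, "gaussian", 5.0, rng)
speckled = perturb_sample(original, "salt_pepper", 0.25, rng)
```

In this sketch both `noisy` and `speckled` keep the label of `original`, which mirrors the statement that noise and motion blur leave the liveness label unchanged. Motion blur is omitted only to keep the example short.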
[0063] When the training terminal 110 uses mixup to perturb at least two of the N first initial image samples, the liveness label of the resulting first perturbed image sample depends on the first initial image samples involved in the mixup. For example, suppose the training terminal 110 selects two first initial image samples for mixup, named initial image sample A and initial image sample B for ease of description, where the liveness label of initial image sample A is live and the liveness label of initial image sample B is not live. The first perturbed image sample obtained by mixup contains initial image sample A and initial image sample B in some proportions. If initial image sample A accounts for 10% of the first perturbed image sample and initial image sample B accounts for 90%, then the liveness label of the first perturbed image sample is: 10% live + 90% not live.
[0064] Similarly, suppose the training terminal 110 selects three first initial image samples (initial image sample A, initial image sample B, and initial image sample C) from the N first initial image samples, where initial image samples A and B are labeled live and initial image sample C is labeled not live. If, in the first perturbation image sample obtained by mixup, initial image sample A accounts for 20%, initial image sample B for 40%, and initial image sample C for 40%, then the liveness label of the first perturbation image sample is: 20% live + 40% live + 40% not live (equivalent to 60% live + 40% not live).
[0065] At this point, the training terminal 110 can determine the liveness label of the obtained first perturbation image based on a preset liveness probability threshold. If the liveness probability of the obtained first perturbation image sample is greater than or equal to the preset liveness probability threshold, the liveness label of the obtained first perturbation image sample is live; otherwise, if the liveness probability of the obtained first perturbation image sample is less than the preset liveness probability threshold, the liveness label of the obtained first perturbation image sample is not live.
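The mixup labeling in the preceding paragraphs can be sketched as follows. The code blends images pixel by pixel, sums the mixing weights of the live samples to get a liveness probability, and applies a threshold. The function names and the threshold value of 0.5 are assumptions for illustration; the specification only says the threshold is preset.

```python
def mixup_images(images, weights):
    """Pixel-wise weighted blend of equally sized grayscale images."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixup weights must sum to 1")
    height, width = len(images[0]), len(images[0][0])
    return [[sum(w * img[r][c] for img, w in zip(images, weights))
             for c in range(width)]
            for r in range(height)]


def mixup_live_probability(labels, weights):
    """Liveness probability = total mixing weight of the live samples."""
    return sum(w for label, w in zip(labels, weights) if label == "live")


def threshold_label(live_probability, threshold):
    """Hard liveness label: live if the probability reaches the threshold."""
    return "live" if live_probability >= threshold else "not live"


# Two-sample example from the text: 10% live sample A + 90% not-live sample B.
p_two = mixup_live_probability(["live", "not live"], [0.1, 0.9])

# Three-sample example: 20% live + 40% live + 40% not live.
p_three = mixup_live_probability(["live", "live", "not live"], [0.2, 0.4, 0.4])

ASSUMED_THRESHOLD = 0.5  # hypothetical preset liveness probability threshold
label_two = threshold_label(p_two, ASSUMED_THRESHOLD)
label_three = threshold_label(p_three, ASSUMED_THRESHOLD)

blended = mixup_images([[[0.0]], [[100.0]]], [0.1, 0.9])
```

Under the assumed threshold, the two-sample mix is labeled not live and the three-sample mix is labeled live. A different preset threshold would move that boundary.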
[0066] In this specification, the M first perturbation image samples are obtained by perturbing the N first initial image samples using a sample perturbation model. The sample perturbation model can be understood as a network model capable of performing various perturbation methods (such as adding noise, motion blur, or mixup). The input to the sample perturbation model is a first initial image sample, and the output can be the corresponding first perturbation image sample. The training terminal 110 can use the sample perturbation model to perturb the N first initial image samples to obtain the M first perturbation image samples. Specifically, for each of the N first initial image samples, the training terminal 110 can input the first initial image sample into the sample perturbation model, which adds noise and/or applies motion blur to it to obtain the corresponding first perturbation image sample. For at least two of the N first initial image samples, the training terminal 110 can input the at least two first initial image samples into the sample perturbation model, which performs mixup on them to obtain the corresponding first perturbation image sample. Along with each first perturbation image sample, the sample perturbation model outputs the corresponding perturbation annotation information, which can include the perturbation annotation type and the perturbation annotation intensity.
[0067] In this specification, the sample perturbation model can be a perturbation network with a single perturbation function. In this case, the training terminal 110 can select different sample perturbation models to perturb the N first initial image samples as needed.
[0068] In this specification, the sample perturbation model may also include multiple perturbation networks with different perturbation functions. Each perturbation network can run independently or jointly. The training terminal 110 can control the running state of each perturbation network in the sample perturbation model, thereby controlling the perturbation processing method for the N first initial image samples.
[0069] In this specification, the training terminal 110 can train a preset sample perturbation model to obtain the sample perturbation model. The training method of the sample perturbation model may include: for each of the N first initial image samples, determining at least one first perturbation image sample corresponding to the current first initial image sample based on the preset sample perturbation model; determining at least one perturbation feature loss information between the at least one first perturbation image sample and the current first initial image sample; and converging the preset sample perturbation model based on the multiple perturbation feature loss information corresponding to the N first initial image samples to obtain the sample perturbation model.
[0070] Specifically, for each of the N first initial image samples, the training terminal 110 can input the first initial image sample into the preset sample perturbation model and use the preset sample perturbation model to perturb it at least once (by adding noise and/or applying motion blur) to obtain at least one first perturbed image sample. For example, the training terminal 110 can input first initial image sample A into the preset sample perturbation model and perturb it four times (each perturbation being at least one of adding noise or applying motion blur), thereby obtaining four first perturbed image samples. For ease of description, these four samples are denoted first perturbed image sample A1, first perturbed image sample A2, first perturbed image sample A3, and first perturbed image sample A4. The at least one first perturbed image sample corresponding to first initial image sample A therefore includes first perturbed image samples A1, A2, A3, and A4.
[0071] It should be understood that each time the training terminal 110 perturbs first initial image sample A using the preset sample perturbation model, it obtains a different first perturbed image sample.
[0072] Furthermore, for at least two of the N first initial image samples, the training terminal 110 can input the at least two first initial image samples into the preset sample perturbation model together and use the preset sample perturbation model to perform at least one mixup on them. Each mixup yields a first perturbation image sample that contains the first initial image samples in different proportions. For example, the training terminal 110 can input first initial image sample A and first initial image sample B into the preset sample perturbation model together and perform two mixups on them, thereby obtaining two first perturbation image samples. For ease of description, these are denoted first perturbation image sample C and first perturbation image sample D. First perturbation image samples C and D can then serve as the first perturbation image samples corresponding to first initial image sample A, and they can also serve as the first perturbation image samples corresponding to first initial image sample B.
[0073] It should be understood that each time the training terminal 110 uses the preset sample perturbation model to mix up first initial image sample A and first initial image sample B, it obtains a different first perturbation image sample.
[0074] After determining the at least one first perturbation image sample corresponding to each of the N first initial image samples, the training terminal 110 can determine at least one perturbation feature loss information between the at least one first perturbation image sample and the current first initial image sample.
[0075] In this specification, determining at least one perturbation feature loss information between the at least one first perturbation image sample and the current first initial image sample may include: determining at least one perturbation feature difference information between the at least one first perturbation image sample and the current first initial image sample; obtaining a preset perturbation feature difference threshold; and determining the at least one perturbation feature loss information based on the preset perturbation feature difference threshold and the at least one perturbation feature difference information.
[0076] Each time the training terminal 110 perturbs the first initial image sample, it obtains a corresponding first perturbed image sample, and accordingly generates a perturbation feature loss information. The perturbation feature loss information can be understood as the loss information formed by the difference between the first perturbed image sample and its corresponding first initial image sample. The difference between the first perturbed image sample and its corresponding first initial image sample can include differences in perturbation type and perturbation intensity.
[0077] As mentioned earlier, the first initial image sample can be understood as an existing real image sample. In this specification, the training terminal 110 perturbs the first initial image sample to obtain the first perturbed image sample. The first initial image sample is therefore the original, unperturbed image sample and has no perturbation type or perturbation intensity of its own. For ease of description, this specification sets the perturbation type of the first initial image sample to no perturbation and its perturbation intensity to zero. The perturbation feature loss information can thus be understood in terms of the perturbation type and perturbation intensity that the training terminal 110 applies to the first initial image sample.
[0078] In this specification, the training terminal 110 uses a sample perturbation model to perturb the first initial image sample each time. The resulting first perturbed image sample should maintain a certain difference from the first initial image sample as much as possible, so that the perturbed first perturbed image sample can provide more effective information in the subsequent training process of the liveness detection model.
[0079] In this specification, when training the preset sample perturbation model, the training terminal 110 can introduce a preset perturbation feature difference threshold to guide convergence. Specifically, for a given first initial image sample and its corresponding first perturbation image sample, the training terminal 110 can compare the two to obtain the perturbation feature difference information between them. The perturbation feature difference information can be the sample probability distribution difference value between the first perturbation image sample and the first initial image sample, the feature difference value between them, or other numerically quantifiable difference information. The training terminal 110 can then compare the perturbation feature difference information with the preset perturbation feature difference threshold, where perturbation feature loss information = perturbation feature difference information - preset perturbation feature difference threshold. The training terminal 110 can converge the preset sample perturbation model based on the perturbation feature loss information in various ways. For example, the training terminal 110 can use gradient descent or another parameter update algorithm to update the network parameters of the preset sample perturbation model based on the perturbation feature loss information. The training terminal 110 can then return to the step of determining at least one first perturbation image sample corresponding to each of the N first initial image samples based on the preset sample perturbation model, and repeat until the preset sample perturbation model converges, thereby obtaining the trained sample perturbation model.
[0080] In this specification, to ensure a sufficient difference between the first perturbation image sample and the first initial image sample, the perturbation feature loss information should be greater than or equal to 0 when the training terminal 110 converges the preset sample perturbation model based on it.
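The relation "perturbation feature loss information = perturbation feature difference information - preset perturbation feature difference threshold" can be illustrated with a small Python sketch. The difference measure used here, a mean absolute pixel difference, is an assumption chosen for simplicity; the specification allows a probability distribution difference, a feature difference, or any other quantifiable difference, and does not spell out how the sign of the loss enters the parameter update.

```python
def mean_abs_difference(perturbed, initial):
    """Assumed difference measure: mean absolute pixel difference."""
    total = 0.0
    count = 0
    for row_p, row_i in zip(perturbed, initial):
        for p, i in zip(row_p, row_i):
            total += abs(p - i)
            count += 1
    return total / count


def perturbation_feature_loss(perturbed, initial, threshold):
    """Loss = difference information minus the preset difference threshold."""
    return mean_abs_difference(perturbed, initial) - threshold


def is_sufficiently_different(perturbed, initial, threshold):
    """True when the loss is >= 0, i.e. the difference reaches the threshold."""
    return perturbation_feature_loss(perturbed, initial, threshold) >= 0.0


initial_image = [[10.0, 20.0], [30.0, 40.0]]
perturbed_image = [[14.0, 20.0], [30.0, 48.0]]

# Per-pixel differences are 4, 0, 0, 8, so the mean difference is 3.0.
loss_loose = perturbation_feature_loss(perturbed_image, initial_image, 2.0)
loss_strict = perturbation_feature_loss(perturbed_image, initial_image, 5.0)
```

With a threshold of 2.0 the loss is non-negative, so the perturbed sample counts as sufficiently different; with a threshold of 5.0 it does not.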
[0081] S320, train the preset liveness detection teacher model based on the first image sample set to obtain the liveness detection teacher model.
[0082] In this specification, the training terminal 110 can train a teacher model that can be used for liveness detection, i.e., a liveness detection teacher model, based on N first initial image samples and M first perturbation image samples contained in the first image sample set.
[0083] Specifically, S320 may include:
[0084] S321, for each first image sample in the first image sample set: input the current first image sample into the preset liveness detection teacher model to obtain the first liveness detection result and the first perturbation detection result, and determine the teacher detection loss information corresponding to the current first image sample based on the first liveness detection result and the first perturbation detection result.
[0085] In this specification, the first image sample set includes (M+N) first image samples, where each first image sample can be either a first initial image sample or a first perturbation image sample. The training terminal 110 can input the first image samples into a preset liveness detection teacher model. After prediction by the preset liveness detection teacher model, the first liveness detection result and the first perturbation detection result corresponding to the first image sample are obtained. The first liveness detection result predicted by the preset liveness detection teacher model may be the same as or different from the actual liveness label of the first image sample. Similarly, the first perturbation detection result predicted by the preset liveness detection teacher model may be the same as or different from the actual perturbation of the first image sample. The purpose of training the preset liveness detection teacher model by the training terminal 110 is to continuously adjust the network parameters of the preset liveness detection teacher model so that the first liveness detection result and the first perturbation detection result predicted by the preset liveness detection teacher model are consistent with the liveness label and perturbation of the first image sample itself input into the preset liveness detection teacher model, thereby obtaining the trained liveness detection teacher model.
[0086] In this specification, the teacher detection loss information can be understood as the loss information formed by the difference between the annotation information (including liveness label information and perturbation information) of the first image sample input to the preset liveness detection teacher model and the result predicted by the preset liveness detection teacher model (including the first liveness detection result and the first perturbation detection result). The teacher detection loss information is used to ensure that the result predicted by the preset liveness detection teacher model remains consistent with the annotation information of the first image sample.
[0087] Figure 4 shows a schematic diagram of the data flow in a preset liveness detection teacher model provided according to some embodiments of this specification.
[0088] As shown in Figure 4, in this specification the preset liveness detection teacher model includes a preset liveness feature extraction teacher network, a preset liveness detection teacher network, and a preset perturbation perception teacher network.
[0089] The step of inputting the current first image sample into the preset liveness detection teacher model to obtain a first liveness detection result and a first perturbation detection result may include: inputting the current first image sample into the preset liveness feature extraction teacher network to obtain a first liveness feature map; inputting the first liveness feature map into the preset liveness detection teacher network to obtain the first liveness detection result; and inputting the first liveness feature map into the preset perturbation perception teacher network to obtain the first perturbation detection result.
[0090] In this specification, after the training terminal 110 inputs the first image sample into the preset liveness detection teacher model, it can use the preset liveness feature extraction teacher network to extract features from the first image sample to obtain the first liveness feature map. Then, the training terminal 110 can perform liveness classification prediction on the first image sample based on the first liveness feature map and the preset liveness detection teacher network to obtain the first liveness detection result. At the same time, the training terminal 110 can also perform perturbation perception on the first image sample based on the first liveness feature map and the preset perturbation perception teacher network to obtain the first perturbation detection result.
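The data flow just described, one shared feature extraction network feeding two separate heads, can be sketched structurally in Python. The class name and the three toy functions are hypothetical stand-ins; they use trivial hand-written rules rather than neural networks and exist only to show that the feature map is computed once and reused by both heads.

```python
class TeacherModel:
    """Shared feature extractor feeding a liveness head and a perturbation head."""

    def __init__(self, feature_extractor, liveness_head, perturbation_head):
        self.feature_extractor = feature_extractor
        self.liveness_head = liveness_head
        self.perturbation_head = perturbation_head

    def forward(self, image):
        # The first liveness feature map is computed once and shared.
        feature_map = self.feature_extractor(image)
        liveness_result = self.liveness_head(feature_map)
        perturbation_result = self.perturbation_head(feature_map)
        return liveness_result, perturbation_result


# Toy stand-ins for the three networks (hypothetical, illustration only).
calls = {"extractor": 0}


def toy_extractor(image):
    calls["extractor"] += 1
    flat = [p for row in image for p in row]
    return [sum(flat) / len(flat), max(flat) - min(flat)]  # [mean, spread]


def toy_liveness_head(feature_map):
    return "live" if feature_map[0] >= 100.0 else "not live"


def toy_perturbation_head(feature_map):
    kind = "noise" if feature_map[1] > 50.0 else "none"
    return {"type": kind, "intensity": feature_map[1] / 255.0}


teacher = TeacherModel(toy_extractor, toy_liveness_head, toy_perturbation_head)
liveness_result, perturbation_result = teacher.forward(
    [[120.0, 130.0], [110.0, 140.0]]
)
```

In a real system each of the three callables would be a trained network; the wiring between them is the only part this sketch is meant to convey.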
[0091] In this specification, when the training terminal 110 performs liveness classification prediction on the first image sample based on the preset liveness detection teacher model, it can use a binary classification method, that is, the first liveness detection result includes live and non-live (attack). During the process of performing liveness classification prediction on the first image sample based on the preset liveness detection teacher model, the training terminal 110 can also output the probabilities of live and non-live, and the sum of the probabilities of live and non-live is 1.
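One common way to obtain two class probabilities that sum to 1 is a softmax over two raw scores. The specification does not name the function used, so the softmax below and the example scores are assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the classes (live, not live).
p_live, p_not_live = softmax([2.0, 0.5])
predicted = "live" if p_live >= p_not_live else "not live"
```

Whatever the raw scores are, the two outputs are positive and add up to 1, which matches the binary classification output described above.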
[0092] After the training terminal 110 obtains the first liveness detection result and the first perturbation detection result of the first image sample based on the preset liveness detection teacher model, it can determine the teacher detection loss information corresponding to the first image sample based on the first liveness detection result and the first perturbation detection result. Specifically, determining the teacher detection loss information corresponding to the current first image sample based on the first liveness detection result and the first perturbation detection result can include: obtaining the first liveness annotation information and the first perturbation annotation information of the current first image sample; determining the first liveness classification loss information corresponding to the current first image sample based on the first liveness detection result and the first liveness annotation information; and determining the first perturbation perception loss information corresponding to the current first image sample based on the first perturbation detection result and the first perturbation annotation information, wherein the teacher detection loss information includes the first liveness classification loss information and the first perturbation perception loss information.
[0093] In this specification, the first liveness annotation information can be understood as the original liveness label of the first image sample. The first liveness annotation information can be a liveness label or a non-liveness label. The first perturbation annotation information can be understood as the original perturbation label information of the first image sample. The first perturbation annotation information is related to whether the first image sample has been perturbed by a sample perturbation model. When the first image sample is the first initial image sample, the first perturbation annotation information of the first image sample is blank or zero. When the first image sample is the first perturbation image sample, the first perturbation annotation information of the first image sample is determined according to the output result (perturbation annotation information) of the sample perturbation model.
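The annotation scheme in this paragraph can be sketched as a small data structure. The class names, field names, and sample identifiers below are hypothetical; the only points taken from the text are that every sample carries a liveness annotation and a perturbation annotation (type and intensity), and that an unperturbed initial sample has a blank or zero perturbation annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerturbationAnnotation:
    kind: str         # e.g. "none", "gaussian_noise", "motion_blur", "mixup"
    intensity: float  # 0.0 for unperturbed samples


# Annotation assigned to first initial image samples (no perturbation).
NO_PERTURBATION = PerturbationAnnotation(kind="none", intensity=0.0)


@dataclass(frozen=True)
class AnnotatedSample:
    image_id: str
    liveness: str  # "live" or "not live"
    perturbation: PerturbationAnnotation = NO_PERTURBATION


def build_sample_set(initial_samples, perturbed_samples):
    """Combine N initial and M perturbed samples into one (N+M)-sample set."""
    return list(initial_samples) + list(perturbed_samples)


initial = [AnnotatedSample("img_a", "live"), AnnotatedSample("img_b", "not live")]
perturbed = [
    AnnotatedSample("img_a_noise", "live",
                    PerturbationAnnotation("gaussian_noise", 0.2)),
    AnnotatedSample("img_b_blur", "not live",
                    PerturbationAnnotation("motion_blur", 0.5)),
]
first_image_sample_set = build_sample_set(initial, perturbed)
```

Here the perturbation annotation of each perturbed sample would come from the output of the sample perturbation model, while initial samples fall back to the zero annotation.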
[0094] In this specification, teacher detection loss information can be understood as the loss information (including first liveness classification loss information and first perturbation perception loss information) formed by the difference between the teacher detection results (including the first liveness detection result and the first perturbation detection result) predicted by the preset liveness detection teacher model and the original annotation information (including the first liveness annotation information and the first perturbation annotation information) of the first image sample. Teacher detection loss information is used to ensure that the teacher detection results predicted by the preset liveness detection teacher model remain consistent with the original annotation information.
[0095] The first liveness classification loss information can be understood as the loss information formed by the difference between the first liveness detection result and the first liveness annotation information. This first liveness classification loss information is used to ensure that the first liveness detection result predicted by the preset liveness detection teacher model remains consistent with the first liveness annotation information. The first perturbation perception loss information can also be understood as the loss information formed by the difference between the first perturbation detection result and the first perturbation annotation information. This first perturbation perception loss information is used to ensure that the first perturbation detection result predicted by the preset liveness detection teacher model remains consistent with the first perturbation annotation information.
[0096] In this specification, the first perturbation detection result may include the first perturbation type prediction result and the first perturbation intensity prediction result, and the first perturbation annotation information may include the first perturbation annotation type and the first perturbation annotation intensity.
[0097] In this specification, the preset perturbation-aware teacher network included in the preset liveness detection teacher model can perceive the perturbation type and the corresponding perturbation intensity of the first image sample based on the input first liveness feature map. Therefore, the first perturbation detection result output by the preset perturbation-aware teacher network can include a first perturbation type prediction result and a first perturbation intensity prediction result. Furthermore, as mentioned above, the perturbation annotation information output by the sample perturbation model can include a perturbation annotation type and a perturbation annotation intensity. Correspondingly, the first perturbation annotation information can include a first perturbation annotation type and a first perturbation annotation intensity.
[0098] Therefore, the training terminal 110 can determine the first perturbation perception loss information of the preset liveness detection teacher model based on the first perturbation detection result output by the preset perturbation perception teacher network and the first perturbation annotation information corresponding to the first image sample output by the sample perturbation model.
[0099] Specifically, determining the first perturbation perception loss information of the preset liveness detection teacher model based on the first perturbation detection result and the first perturbation annotation information may include: determining the first perturbation type loss information corresponding to the current first image sample based on the first perturbation type prediction result and the first perturbation annotation type, and determining the first perturbation intensity loss information corresponding to the current first image sample based on the first perturbation intensity prediction result and the first perturbation annotation intensity, wherein the first perturbation perception loss information includes the first perturbation type loss information and the first perturbation intensity loss information.
[0100] The first perturbation type loss information can be understood as the loss information formed by the difference between the first perturbation type prediction result and the first perturbation annotation type. It is used to constrain the first perturbation type prediction result of the preset liveness detection teacher model to remain consistent with the first perturbation annotation type. The first perturbation intensity loss information can be understood as the loss information formed by the difference between the first perturbation intensity prediction result and the first perturbation annotation intensity. It is used to constrain the first perturbation intensity prediction result of the preset liveness detection teacher model to remain consistent with the first perturbation annotation intensity.
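As a minimal sketch of how the two parts of the first perturbation perception loss information could be combined for one image sample: the specification does not name concrete loss functions, so the cross-entropy for the perturbation type, the squared error for the perturbation intensity, the `intensity_weight` parameter, and all function names below are illustrative assumptions.

```python
import math


def cross_entropy(probs, label_index):
    """Negative log-probability assigned to the annotated class."""
    eps = 1e-12  # guards against log(0)
    return -math.log(max(probs[label_index], eps))


def perturbation_perception_loss(type_probs, intensity_pred,
                                 type_label, intensity_label,
                                 intensity_weight=1.0):
    """Return (total, type_loss, intensity_loss) for one image sample.

    Assumed, not taken from the specification: cross-entropy for the type,
    squared error for the intensity, and a scalar weight between them.
    """
    type_loss = cross_entropy(type_probs, type_label)
    intensity_loss = (intensity_pred - intensity_label) ** 2
    total = type_loss + intensity_weight * intensity_loss
    return total, type_loss, intensity_loss


# Hypothetical sample: three perturbation types, annotated type 1, intensity 0.6.
total, type_loss, intensity_loss = perturbation_perception_loss(
    type_probs=[0.2, 0.7, 0.1], intensity_pred=0.5,
    type_label=1, intensity_label=0.6)
```

Both terms shrink toward zero as the predictions approach the annotations, which is the consistency constraint described above.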
[0101] S322, the preset liveness detection teacher model is converged based on multiple teacher detection loss information corresponding to the first image sample set to obtain the liveness detection teacher model.
[0102] In this specification, after determining the teacher detection loss information, the training terminal 110 can converge the preset liveness detection teacher model based on the teacher detection loss information, thereby obtaining the trained liveness detection teacher model. The training terminal 110 can converge the preset liveness detection teacher model based on the teacher detection loss information in various ways. For example, the training terminal 110 can use gradient descent or other parameter update algorithms to update the network parameters of the preset liveness detection teacher model based on the teacher detection loss information; then, the training terminal 110 can return to the step of inputting the current first image sample into the preset liveness detection teacher model until the preset liveness detection teacher model converges, thereby obtaining the trained liveness detection teacher model.
[0103] In this specification, during the training process of the preset liveness detection teacher model, the training terminal 110 introduces perturbed image samples for liveness detection and also performs perceptual prediction of the perturbation of the input first image sample. The training terminal 110 uses both the perceptual prediction result of the first image sample and the liveness detection result as sources of loss information during model training. Compared with the traditional method of training the model based on the liveness detection result, this provides more reference information for updating the network parameters during model training, making the trained liveness detection teacher model more sensitive to various perturbations and more capable of extracting features from the input image samples, thereby improving the liveness detection performance of the liveness detection teacher model.
[0104] S330, based on the first image sample set and the liveness detection teacher model, perform knowledge distillation on the preset liveness detection student model to obtain a liveness detection model.
[0105] In this specification, after training the preset liveness detection teacher model, the training terminal 110 obtains a liveness detection teacher model that is highly sensitive to various perturbations and also has good liveness detection performance. At this point, the training terminal 110 can generate a lightweight student network (i.e., the liveness detection model) from the liveness detection teacher model through knowledge transfer (i.e., knowledge distillation). This liveness detection model has liveness detection and perturbation perception capabilities similar or nearly identical to those of the liveness detection teacher model. Furthermore, compared with the liveness detection teacher model, the liveness detection model generated through knowledge transfer contains fewer model parameters, requires less computing power, and is more suitable for deployment on the terminal detection device 120. Deploying the liveness detection model on the terminal detection device 120 can simultaneously address the issues of liveness attacks and privacy leaks.
[0106] Figure 5 shows a schematic diagram of the data flow during a knowledge distillation process according to some embodiments of this specification.
[0107] Specifically, as shown in Figure 5, S330 may include:
[0108] S331, for each first image sample in the first image sample set: input the current first image sample into the liveness detection teacher model to obtain the teacher detection result, and input the current first image sample into the preset liveness detection student model to obtain the student detection result. Based on the first liveness annotation information of the current first image sample, the teacher detection result and the student detection result, determine the student detection loss information corresponding to the current first image sample.
[0109] In this specification, the training terminal 110 can use the first image sample set as the training set for the preset liveness detection student model. During knowledge distillation, the training terminal 110 inputs the same first image sample into both the preset liveness detection student model and the liveness detection teacher model. Taking first image sample A as an example, the training terminal 110 can input first image sample A into both the liveness detection teacher model and the preset liveness detection student model. After inputting first image sample A into the liveness detection teacher model, the training terminal 110 obtains the teacher detection result corresponding to first image sample A; after inputting first image sample A into the preset liveness detection student model, the training terminal 110 obtains the student detection result corresponding to first image sample A. Since the preset liveness detection student model must learn the knowledge of the liveness detection teacher model so that the student detection result is consistent with the teacher detection result, the loss information involved in training the preset liveness detection student model (i.e., the student detection loss information) can be determined from both the teacher detection result and the student detection result.
[0110] In this specification, the training terminal 110 can also use the first liveness annotation information of the first image sample as one of the reference factors for determining the student detection loss information, and comprehensively consider the student detection results and the teacher detection results to finally obtain the student detection loss information. That is, the training terminal 110 can determine the student detection loss information corresponding to the first image sample based on the first liveness annotation information of the first image sample, the teacher detection results, and the student detection results.
[0111] As described above, the preset liveness detection teacher model in this specification includes a preset liveness feature extraction teacher network, a preset liveness detection teacher network, and a preset perturbation perception teacher network. The liveness detection teacher model obtained after training the preset liveness detection teacher model also has the same network structure. For ease of distinction, the networks included in the liveness detection teacher model are named the liveness feature extraction teacher network, the liveness detection teacher network, and the perturbation perception teacher network, respectively.
[0112] In this specification, the liveness detection teacher model may include a liveness feature extraction teacher network and a perturbation perception teacher network. The step of inputting the current first image sample into the liveness detection teacher model to obtain a teacher detection result may include: inputting the current first image sample into the liveness feature extraction teacher network to obtain a second liveness feature map, and inputting the second liveness feature map into the perturbation perception teacher network to obtain a second perturbation detection result, wherein the teacher detection result includes the second liveness feature map and the second perturbation detection result.
[0113] In this specification, while the training terminal 110 trains the preset liveness detection student model, the output of the liveness detection teacher model (the teacher detection result) can be understood as soft-targets.
[0114] As shown in Figure 5, after the training terminal 110 inputs the first image sample into the liveness detection teacher model, it can use the liveness feature extraction teacher network to extract features from the first image sample, obtaining a second liveness feature map. Then, the training terminal 110 can perform liveness classification prediction on the first image sample based on the second liveness feature map and the liveness detection teacher network, obtaining a teacher liveness detection result. Simultaneously, the training terminal 110 can also perform perturbation perception on the first image sample based on the second liveness feature map and the perturbation perception teacher network, obtaining a second perturbation detection result. The training terminal 110 can use the second liveness feature map and the second perturbation detection result as the teacher detection result and incorporate them into the training process of the preset liveness detection student model.
[0115] It should be understood that the training end 110 can also perform liveness classification prediction on the first image sample based on the second liveness feature map and the liveness detection teacher network to obtain the teacher liveness detection result of the first image sample.
[0116] In this specification, the preset liveness detection student model includes a preset liveness feature extraction student network, a preset liveness detection student network, and a preset perturbation perception student network. The step of inputting the current first image sample into the preset liveness detection student model to obtain a student detection result may include: inputting the current first image sample into the preset liveness feature extraction student network to obtain a third liveness feature map; inputting the third liveness feature map into the preset liveness detection student network to obtain a second liveness detection result; and inputting the third liveness feature map into the preset perturbation perception student network to obtain a third perturbation detection result, wherein the student detection result includes the third liveness feature map, the second liveness detection result, and the third perturbation detection result.
[0117] As shown in Figure 5, after the training terminal 110 inputs the first image sample into the preset liveness detection student model, it can use the preset liveness feature extraction student network to extract features from the first image sample, obtaining a third liveness feature map. Then, the training terminal 110 can perform liveness classification prediction on the first image sample based on the third liveness feature map and the preset liveness detection student network, obtaining a second liveness detection result. Simultaneously, the training terminal 110 can also perform perturbation perception on the first image sample based on the third liveness feature map and the preset perturbation perception student network, obtaining a third perturbation detection result. Thus, after the training terminal 110 inputs the first image sample into the preset liveness detection student model, the output student detection result includes the third liveness feature map, the second liveness detection result, and the third perturbation detection result.
[0118] In this specification, the first liveness annotation information of the first image sample can be understood as hard-targets. As shown in Figure 5, determining the student detection loss information corresponding to the current first image sample based on the first liveness annotation information of the current first image sample, the teacher detection result, and the student detection result may include: determining the feature distillation loss information corresponding to the current first image sample based on the second liveness feature map and the third liveness feature map; determining the second liveness classification loss information corresponding to the current first image sample based on the first liveness annotation information and the second liveness detection result; and determining the second perturbation perception loss information corresponding to the current first image sample based on the second perturbation detection result and the third perturbation detection result, wherein the student detection loss information includes the feature distillation loss information, the second liveness classification loss information, and the second perturbation perception loss information.
[0119] In this specification, each first image sample corresponds to one piece of student detection loss information, so multiple first image samples correspond to multiple pieces of student detection loss information. The feature distillation loss information can be understood as the loss information formed by the difference between the third liveness feature map, obtained when the preset liveness detection student model extracts features from the first image sample, and the second liveness feature map, obtained when the liveness detection teacher model extracts features from the same first image sample. The feature distillation loss information is used to constrain the third liveness feature map extracted by the preset liveness detection student model to remain consistent with the second liveness feature map extracted by the liveness detection teacher model.
[0120] The second perturbation perception loss information can be understood as the loss information formed by the difference between the third perturbation detection result and the second perturbation detection result. The second perturbation perception loss information is used to constrain the third perturbation detection result to be consistent with the second perturbation detection result.
[0121] In this specification, the second perturbation detection result includes a second perturbation type prediction result and a second perturbation intensity prediction result, and the third perturbation detection result includes a third perturbation type prediction result and a third perturbation intensity prediction result. The liveness detection teacher model has the same network structure as the preset liveness detection teacher model. The perturbation perception teacher network of the liveness detection teacher model can perceive the perturbation type of the first image sample based on the input second liveness feature map (corresponding to the output of the second perturbation type prediction result) and the perturbation intensity corresponding to the perturbation type (corresponding to the output of the second perturbation intensity prediction result). Therefore, the second perturbation detection result can include the second perturbation type prediction result and the second perturbation intensity prediction result.
[0122] Furthermore, since the preset liveness detection student model and the liveness detection teacher model have similar network structures, the preset perturbation-aware student network of the preset liveness detection student model can also perceive the perturbation type of the first image sample based on the input third liveness feature map (corresponding to the output third perturbation type prediction result) and the perturbation intensity corresponding to the perturbation type (corresponding to the output third perturbation intensity prediction result). Therefore, the third perturbation detection result can include the third perturbation type prediction result and the third perturbation intensity prediction result.
[0123] In this specification, determining the second perturbation perception loss information corresponding to the current first image sample based on the second perturbation detection result and the third perturbation detection result may include:
[0124] Based on the second perturbation type prediction result and the third perturbation type prediction result, determine the second perturbation type loss information corresponding to the current first image sample.
[0125] Based on the second perturbation intensity prediction result and the third perturbation intensity prediction result, determine the second perturbation intensity loss information corresponding to the current first image sample, wherein the second perturbation perception loss information includes the second perturbation type loss information and the second perturbation intensity loss information.
[0126] The second perturbation type loss information can be understood as the loss information formed by the difference between the prediction result of the third perturbation type and the prediction result of the second perturbation type. The second perturbation type loss information is used to ensure that the prediction result of the third perturbation type by the preset liveness detection student model is consistent with the prediction result of the second perturbation type by the liveness detection teacher model.
[0127] The second perturbation intensity loss information can be understood as the loss information formed by the difference between the third perturbation intensity prediction result and the second perturbation intensity prediction result. The second perturbation intensity loss information is used to ensure that the third perturbation intensity prediction result predicted by the preset liveness detection student model is consistent with the second perturbation intensity prediction result predicted by the liveness detection teacher model.
[0128] In this specification, both the feature distillation loss information and the second perturbation perception loss information are determined based on the soft-targets corresponding to the liveness detection teacher model. During knowledge distillation, introducing the soft-targets of the liveness detection teacher model allows the probability distribution P corresponding to the student detection result (including the third liveness feature map and the third perturbation detection result) output by the preset liveness detection student model to approximate, as closely as possible, the probability distribution Q corresponding to the teacher detection result (including the second liveness feature map and the second perturbation detection result) output by the liveness detection teacher model. The training terminal 110 can use KL divergence (also known as relative entropy) to measure the distance between P and Q. The closer P and Q are, the closer the KL loss value is to zero. For example, denote the probability distribution corresponding to the third liveness feature map as P1, that corresponding to the third perturbation detection result as P2, that corresponding to the second liveness feature map as Q1, and that corresponding to the second perturbation detection result as Q2. The training terminal 110 can use KL divergence to determine the distance between P1 and Q1 and the distance between P2 and Q2. During training, backpropagation updates the network parameters of the preset liveness detection student model so that both distances approach 0.
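The KL divergence comparison described above can be sketched as follows. The specification does not state how raw network outputs are turned into probability distributions or in which direction the divergence is taken; the softmax normalization, the `temperature` parameter, the teacher-to-student direction KL(Q || P), and the example logits are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution (assumed normalization)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target, approx):
    """KL(target || approx) for two discrete distributions of equal length."""
    eps = 1e-12  # guards against division by zero and log(0)
    return sum(t * math.log(max(t, eps) / max(a, eps))
               for t, a in zip(target, approx) if t > 0)


# Hypothetical perturbation-type logits from the teacher (Q2) and student (P2).
q2 = softmax([2.0, 1.0, 0.1])
p2 = softmax([1.5, 1.2, 0.3])

distance = kl_divergence(q2, p2)       # positive while the student differs
self_distance = kl_divergence(q2, q2)  # zero when the distributions coincide
```

The same function could be applied to distributions derived from the flattened second and third liveness feature maps (Q1 and P1).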
[0129] Compared with hard-targets, which contain only the predicted category (e.g., liveness category, perturbation category), soft-targets also include the probability corresponding to each predicted category. Soft-targets can therefore provide more knowledge and information for training the preset liveness detection student model. In this specification, the training terminal 110 uses soft-targets when determining the feature distillation loss information and the second perturbation perception loss information. This improves the sensitivity of the trained liveness detection student model to various perturbations, enhances its feature extraction capability, and thus improves the liveness detection accuracy of the liveness detection student model.
[0130] In this specification, the second liveness classification loss information can be understood as the loss information formed by the difference between the second liveness detection result and the first liveness annotation information. The second liveness classification loss information is used to constrain the second liveness detection result to maintain consistency with the first liveness annotation information. At this time, the second liveness classification loss information can be determined based on the hard-targets corresponding to the first image samples.
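A minimal sketch of how the three components of the student detection loss information might be combined for one image sample. The specification only states that the student detection loss information includes the three components; the binary cross-entropy form of the hard-target term, the label encoding (1 for live, 0 for attack), the weighted sum, and the example values are assumptions.

```python
import math


def liveness_classification_loss(live_prob, label):
    """Binary cross-entropy between the student's predicted probability of
    'live' and the hard target (assumed encoding: 1 = live, 0 = attack)."""
    eps = 1e-12
    p = min(max(live_prob, eps), 1.0 - eps)  # keep log() arguments valid
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def student_detection_loss(feature_distill_loss, liveness_cls_loss,
                           perturbation_loss, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three components; the weights are hypothetical."""
    w_feat, w_cls, w_pert = weights
    return (w_feat * feature_distill_loss
            + w_cls * liveness_cls_loss
            + w_pert * perturbation_loss)


# Hypothetical values: the two soft-target terms are assumed to be precomputed.
cls_loss = liveness_classification_loss(live_prob=0.9, label=1)
total = student_detection_loss(feature_distill_loss=0.05,
                               liveness_cls_loss=cls_loss,
                               perturbation_loss=0.02)
```

In this arrangement, only the classification term depends on the annotation (hard-target); the other two terms depend on the teacher's outputs (soft-targets).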
[0131] It should be understood that the second liveness classification loss information can also be determined based on the teacher liveness detection results output by the liveness detection teacher network in the liveness detection teacher model. Specifically, the training end 110 can use the loss information formed by the difference between the second liveness detection result and the teacher liveness detection result as the second liveness classification loss information. In this case, the student detection loss information is determined entirely based on the soft-targets corresponding to the liveness detection teacher model.
[0132] Since the training terminal 110 also uses the first image sample set as the training set when training the preset liveness detection teacher model, the first liveness annotation information corresponding to the first image samples in the first image sample set is known and can be reused. The training terminal 110 directly compares the second liveness detection result with the first liveness annotation information to obtain the second liveness classification loss information, which can ensure the accuracy of the training results. Hard-targets contain only the liveness category (liveness or attack), whereas soft-targets also contain the probability of each liveness category; soft-targets can therefore provide more knowledge and information for training the preset liveness detection student model, but determining loss information from soft-targets requires more computing power. In this specification, using hard-targets to determine the second liveness classification loss information can, to a certain extent, reduce the computing power the training terminal 110 needs for this step while still ensuring the accuracy of the training results.
[0133] S332, the preset liveness detection student model is converged based on multiple student detection loss information corresponding to the first image sample set to obtain the liveness detection model.
[0134] In this specification, after determining the student detection loss information, the training terminal 110 can converge the preset liveness detection student model based on the student detection loss information, thereby obtaining the trained liveness detection student model (i.e., the liveness detection model). The training terminal 110 can converge the preset liveness detection student model based on the student detection loss information in various ways. For example, the training terminal 110 can use gradient descent or other parameter update algorithms to update the network parameters of the preset liveness detection student model based on the student detection loss information. Afterward, the training terminal 110 can return to the steps of inputting the current first image sample into the liveness detection teacher model to obtain the teacher detection result and inputting the current first image sample into the preset liveness detection student model to obtain the student detection result, until the preset liveness detection student model converges, thereby obtaining the trained liveness detection model.
[0135] Figure 6 shows a flowchart of a method 300 for training a liveness detection model according to some embodiments of this specification.
[0136] In this specification, as shown in Figure 6, the method 300 may further include:
[0137] S340, obtain a second image sample set, which includes L second image samples, where L is a natural number greater than 1.
[0138] In this specification, the L second image samples contained in the second image sample set can be the same as the (M+N) first image samples contained in the first image sample set, that is, the second image sample set and the first image sample set can be from the same source.
[0139] S350, based on the second image sample set and the liveness detection model, a preset hierarchical cropping model is trained to obtain a hierarchical cropping model, wherein the hierarchical cropping model is configured to dynamically crop the number of network layers of the liveness detection model according to the input image.
[0140] In this specification, the training terminal 110 can add an L1 norm to the scaling parameters of the batch normalization (BN) layer in the liveness detection model for regularized sparsity. The importance of the network layer (channel) corresponding to the BN layer is measured by the size of the sparsified BN layer scaling parameter; the larger the BN layer scaling parameter, the more important the network layer (channel) corresponding to the BN layer. The training terminal 110 can dynamically prune the network layers in the liveness detection model based on the sparsified BN layer scaling parameter, thereby adaptively pruning the liveness detection model according to the different second image samples input each time, further reducing the computational requirements of the liveness detection model on the terminal detection device 120. It should be noted that although the dynamic pruning of the liveness detection model by the training terminal 110 can reduce the computational requirements of the liveness detection model on the terminal detection device 120, at the same time, since the number of network layers and the corresponding network parameters of the pruned liveness detection model are reduced, it may affect the liveness detection results.
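The L1 sparsity regularization on BN scaling parameters can be sketched as below. The `strength` coefficient, the example scaling values, and the use of absolute magnitude as the importance measure are assumptions; the specification only states that an L1 norm is added and that a larger scaling parameter indicates a more important layer (channel).

```python
def l1_sparsity_penalty(bn_scales, strength):
    """L1 penalty added to the training loss to push BN scaling parameters toward zero."""
    return strength * sum(abs(g) for g in bn_scales)


def l1_subgradient(bn_scales, strength):
    """Subgradient of the penalty with respect to each scaling parameter."""
    def sign(x):
        return (x > 0) - (x < 0)
    return [strength * sign(g) for g in bn_scales]


def rank_by_importance(bn_scales):
    """Indices of layers (channels), most important first, by scale magnitude."""
    return sorted(range(len(bn_scales)),
                  key=lambda i: abs(bn_scales[i]), reverse=True)


# Hypothetical scaling parameters of four BN layers after sparse training.
scales = [0.9, 0.02, -0.4, 0.0]
penalty = l1_sparsity_penalty(scales, strength=1e-3)
gradient = l1_subgradient(scales, strength=1e-3)
ranking = rank_by_importance(scales)
```

Layers at the end of `ranking` have scaling parameters near zero and are the natural candidates for pruning.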
[0141] In this specification, the training terminal 110 can train a hierarchical cropping model. By training the preset hierarchical cropping model, the obtained hierarchical cropping model can predict what kind of model cropping will be performed after different second image samples are input into the liveness detection model. This can ensure that the liveness detection results are not affected, and can also crop the network layers in the liveness detection model that are not important to the liveness detection results to the greatest extent, thereby reducing the computing power requirements of the liveness detection model on the terminal detection device 120.
[0142] During the training of the preset hierarchical cropping model, the scaling parameters of each network layer in the liveness detection model can be queried or obtained online for each second image sample input into the model. Based on these scaling parameters, the network layers in the liveness detection model are pruned to obtain the corresponding hierarchical cropping model. It should be understood that different input images result in different scaling parameters for each network layer in the liveness detection model, so different network layers are pruned and, consequently, different hierarchical cropping models are obtained.
[0143] In this specification, the liveness detection model includes K initial network layers, where K is a natural number greater than 1. Each of the K initial network layers corresponds to a batch normalization (BN) layer. Accordingly, for each second image sample in the second image sample set, after the training terminal 110 inputs the second image sample into the liveness detection model or the preset hierarchical cropping model, the scaling parameters corresponding to each initial network layer may be different.
[0144] It should be understood that, since the preset hierarchical cropping model can query or obtain online the scaling parameters of each network layer in the liveness detection model, the preset hierarchical cropping model can be understood as a network model with the same network structure and the same network parameters as the liveness detection model. In addition, the preset hierarchical cropping model has a separate query or online acquisition network for querying or obtaining the scaling parameters of each network layer, and a pruning function network that can prune the model according to the scaling parameters of each network layer.
[0145] In this specification, the preset hierarchical cropping model can also be understood as a network module in the liveness detection model. It has a query or online acquisition network for querying or obtaining the scaling parameters of each network layer of the liveness detection model, and a pruning function network for pruning the model based on those scaling parameters.
[0146] Specifically, the step of training a preset hierarchical cropping model based on the second image sample set and the liveness detection model to obtain a hierarchical cropping model may include:
[0147] S351, for each of the L second image samples: input the current second image sample into the preset hierarchical cropping model to obtain the selection probability of each of the K initial network layers, and prune the liveness detection model based on the selection probabilities to obtain a liveness detection pruning model; input the current second image sample into the liveness detection pruning model to obtain a third liveness detection result, and determine the pruning loss information corresponding to the current second image sample based on the selection probabilities and the third liveness detection result.
[0148] In this specification, the training terminal 110 can input a second image sample into the preset hierarchical cropping model to obtain the selection probability (scaling parameter) of each of the K initial network layers in the liveness detection model. Then, based on the selection probability of each initial network layer, the training terminal 110 can prune the liveness detection model using the preset hierarchical cropping model. Specifically, the training terminal 110 can sort the selection probabilities of the initial network layers from high to low, retain the initial network layers with higher selection probabilities according to a certain proportion, and prune the remaining initial network layers with lower selection probabilities. In this case, the purpose of training the preset hierarchical cropping model is to enable the trained hierarchical cropping model to determine an appropriate cropping ratio for different second image samples.
[0149] The training end 110 can also pre-set a selection probability threshold and compare the selection probability of each initial network layer with the selection probability threshold, retaining the initial network layers whose selection probability is greater than or equal to the selection probability threshold, and pruning the initial network layers whose selection probability is less than the selection probability threshold. In this case, the purpose of training the pre-set hierarchical cropping model by the training end 110 is to enable the trained hierarchical cropping model to determine an appropriate selection probability threshold for different second image samples.
[0150] Regarding the specific pruning method, the training end 110 can set a mask for each of the K initial network layers. The mask uses 0 or 1 to indicate whether the initial network layer has been pruned. 0 indicates that the initial network layer has been pruned, and 1 indicates that the initial network layer has not been pruned. The initial mask value for all K initial network layers of the liveness detection model is 1. Therefore, the training end 110 can change the mask value of the initial network layers that were not retained due to low selection probability from 1 to 0, while keeping the mask value of the initial network layers that were retained due to high selection probability unchanged.
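A minimal sketch of the mask update, assuming the threshold-based retention rule and plain Python lists, is given below. The names `update_masks` and `threshold` are hypothetical and are used only for illustration.

```python
def update_masks(masks, selection_probs, threshold):
    """Set the mask of each layer below the threshold to 0; keep the others unchanged."""
    if len(masks) != len(selection_probs):
        raise ValueError("one mask and one selection probability are needed per layer")
    return [m if p >= threshold else 0 for m, p in zip(masks, selection_probs)]


K = 5
initial_masks = [1] * K  # every initial network layer starts unpruned
probs = [0.9, 0.2, 0.7, 0.4, 0.8]  # illustrative selection probabilities
new_masks = update_masks(initial_masks, probs, threshold=0.5)
retained_layers = [i for i, m in enumerate(new_masks) if m == 1]
print(new_masks, retained_layers)
```

Here a mask value of 0 marks a pruned layer and 1 marks a retained layer, matching the convention stated above.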
[0151] After the training end 110 prunes the liveness detection model, the number of initial network layers in the resulting liveness detection pruned model is less than K, and the mask value of each initial network layer in the liveness detection pruned model is 1. Then, the training end 110 can input the same second image sample into the liveness detection pruned model, and use the liveness detection pruned model to predict the liveness of the second image sample, obtaining a third liveness detection result.
[0152] In this specification, for each second image sample, the pruning loss information can be understood as the loss information formed by the difference between the result predicted by the liveness detection pruning model (obtained by dynamically pruning the liveness detection model using the preset hierarchical pruning model) and the result predicted by the liveness detection model. The pruning loss information is used to constrain the third liveness detection result predicted by the liveness detection pruning model to remain consistent with the original liveness label of the second image sample. Since the third liveness detection result depends on the initial network layers contained in the liveness detection pruning model, and those layers depend on the selection probability, the pruning loss information is related to the selection probability. In this specification, the training end 110 can connect the preset hierarchical pruning model and the liveness detection model through a data interface, and train the preset hierarchical pruning model during the backward parameter update process based on the pruning loss information, thereby obtaining the trained hierarchical pruning model.
[0153] In this specification, the training end 110 can determine the pruning loss information corresponding to the current second image sample based on the selection probability and the third liveness detection result. Specifically, this can include: obtaining a preset probability sparsity threshold and the second liveness annotation information of the current second image sample; determining the pruning probability sparsity loss information corresponding to the current second image sample based on the selection probability and the preset probability sparsity threshold; and determining the pruning liveness classification loss information corresponding to the current second image sample based on the third liveness detection result and the second liveness annotation information, wherein the pruning loss information includes the pruning probability sparsity loss information and the pruning liveness classification loss information.
[0154] In this specification, the second liveness annotation information can be understood as the original liveness label of the second image sample. The second liveness annotation information can be either a liveness label or a non-liveness label. The pruning liveness classification loss information can be understood as the loss information formed by the difference between the third liveness detection result predicted by the liveness detection pruning model and the second liveness annotation information. It should be noted that in this specification, during the training process of the preset hierarchical pruning model, the network parameters of the liveness detection pruning model remain fixed. Therefore, the pruning liveness classification loss information is used to constrain the dynamic pruning strategy of the preset hierarchical pruning model, ensuring that the third liveness detection result predicted by the liveness detection pruning model remains consistent with the second liveness annotation information.
[0155] In this specification, the preset probability sparsity threshold can be understood as a value, preset by the training end 110, for the sum of the selection probability values of all initial network layers included in the liveness detection pruning model. During the training process of the preset hierarchical pruning model, the preset hierarchical pruning model can retain the initial network layers with higher selection probability values according to the selection probability of each initial network layer, thereby obtaining the liveness detection pruning model. The training end 110 can add the selection probability values of the initial network layers included in the liveness detection pruning model to obtain a selection probability sum, and compare the selection probability sum with the preset probability sparsity threshold to obtain the pruning probability sparsity loss information. The calculation formula for the pruning probability sparsity loss information can be expressed as: pruning probability sparsity loss information = preset probability sparsity threshold - selection probability sum.
[0156] Generally speaking, the smaller the selection probability sum, the fewer the initial network layers in the liveness detection pruning model, meaning the greater the extent to which the liveness detection model is pruned. Consequently, the computing power required of the terminal detection device 120 during liveness detection is lower. Therefore, during the training of the preset hierarchical pruning model, provided that consistency between the third liveness detection result and the second liveness annotation information is ensured, a smaller selection probability sum is preferable.
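For illustration only, the two components of the pruning loss information can be sketched as below. The sparsity term follows the formula as written in paragraph [0155]. The use of binary cross-entropy for the classification term, the weighted-sum combination, and the names `sparsity_loss`, `classification_loss`, `pruning_loss`, and `sparsity_weight` are assumptions made for this sketch and are not specified in this specification.

```python
import math


def sparsity_loss(retained_probs, sparsity_threshold):
    """Preset probability sparsity threshold minus the selection probability sum."""
    return sparsity_threshold - sum(retained_probs)


def classification_loss(live_prob, label):
    """Binary cross-entropy between a predicted liveness probability and a 0/1 label.

    Assumption: the specification does not name the classification loss.
    """
    eps = 1e-12  # keeps log() finite at the boundaries
    p = min(max(live_prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def pruning_loss(retained_probs, sparsity_threshold, live_prob, label, sparsity_weight=1.0):
    """Combine both terms; the weighted sum is an assumed way of combining them."""
    return classification_loss(live_prob, label) + sparsity_weight * sparsity_loss(
        retained_probs, sparsity_threshold
    )


# Illustrative values: three retained layers, a live sample predicted with probability 0.5.
total = pruning_loss([0.9, 0.8, 0.7], 3.0, live_prob=0.5, label=1)
print(total)
```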
[0157] S352, converge the preset hierarchical pruning model based on the L pruning loss information corresponding to the L second image samples to obtain the hierarchical pruning model.
[0158] In this specification, each second image sample corresponds to one piece of pruning loss information, and L second image samples correspond to L pieces of pruning loss information. After determining the pruning loss information, the training end 110 can converge the preset hierarchical pruning model based on the pruning loss information, thereby obtaining the trained hierarchical pruning model. The training end 110 can converge the preset hierarchical pruning model based on the pruning loss information in various ways. For example, the training end 110 can use gradient descent or other parameter update algorithms to update the network parameters of the preset hierarchical pruning model based on the pruning loss information; then, the training end 110 can return to the step of inputting the current second image sample into the preset hierarchical pruning model until the preset hierarchical pruning model converges, thereby obtaining the trained hierarchical pruning model.
[0159] In this specification, the trained hierarchical pruning model and the liveness detection model can both be deployed on the terminal detection device 120. The hierarchical pruning model can dynamically prune the initial network layers of the liveness detection model according to the input image, thereby reducing the computing power requirements of the terminal detection device 120.
[0160] Figure 7 shows a flowchart of a liveness detection method 400 according to some embodiments of this specification. The technical solution of the liveness detection method is described below with reference to Figure 7. The liveness detection method can be performed by the terminal detection device 120 in Figure 1. Specifically, the terminal detection device 120 may have the structure shown in Figure 2; that is, the terminal detection device 120 can be a device for the liveness detection method, comprising at least one storage medium and at least one processor. The at least one storage medium includes at least one instruction set for executing the liveness detection method. The at least one processor is communicatively connected to the at least one storage medium. When the system is running, the at least one processor can read the at least one instruction set and, as instructed by the at least one instruction set, execute the method 400 shown in Figure 7. For illustrative purposes only, this specification will describe the method 400 using the terminal detection device 120 as an example. The method 400 may include:
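The update-until-convergence procedure can be illustrated with a toy gradient descent loop. This is a generic sketch, not the training procedure of this specification: the finite-difference gradient, the stopping tolerance, the names `numerical_gradient` and `train_until_converged`, and the quadratic stand-in loss are all assumptions chosen so that the example runs without any machine learning library.

```python
def numerical_gradient(loss_fn, params, eps=1e-6):
    """Central finite-difference estimate of the gradient of loss_fn at params."""
    grads = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grads.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
    return grads


def train_until_converged(loss_fn, params, lr=0.1, tol=1e-8, max_steps=10_000):
    """Repeat gradient descent updates until the loss stops changing by more than tol."""
    previous = loss_fn(params)
    current = previous
    for _ in range(max_steps):
        grads = numerical_gradient(loss_fn, params)
        params = [p - lr * g for p, g in zip(params, grads)]
        current = loss_fn(params)
        if abs(previous - current) < tol:
            break
        previous = current
    return params, current


# Stand-in for the L per-sample losses: mean squared distance to per-sample targets.
targets = [1.0, 2.0, 3.0]


def mean_loss(params):
    return sum((params[0] - t) ** 2 for t in targets) / len(targets)


trained_params, final_loss = train_until_converged(mean_loss, [0.0])
print(trained_params, final_loss)
```

In practice the loss would be the pruning loss information accumulated over the L second image samples, and the parameters would be those of the preset hierarchical pruning model.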
[0161] S410, Obtain the target image of the target object.
[0162] In this specification, the target object can be understood as the user or object undergoing liveness detection. Correspondingly, the target image can be understood as an image of the user or object undergoing liveness detection. The target image can be an image of a human body part with specific biometric features, such as a face image, palm print image, or palm vein image. For ease of description, the liveness detection process in this specification will be described using a face image as the target image.
[0163] S420, based on the target image, a liveness detection model is used to perform liveness detection on the target object to obtain a liveness detection result, wherein the liveness detection model is trained using the method 300.
[0164] In this specification, the terminal detection device 120 can directly input the target image into the liveness detection model trained by the above method 300 to obtain the liveness detection result of the target object.
[0165] As mentioned above, the method 300 also trains a hierarchical pruning model, which can be deployed together with the liveness detection model on the terminal detection device 120. Therefore, the terminal detection device 120 can also combine the hierarchical pruning model with the liveness detection model to complete the liveness detection of the target object.
[0166] Specifically, S420 may include:
[0167] S421, Based on the target image, a hierarchical pruning model is used to prune the liveness detection model to obtain a liveness detection pruning model.
[0168] In this specification, for a given target image, the terminal detection device 120 can input the target image into the hierarchical pruning model. After the target image is input, the hierarchical pruning model can obtain the selection probability of each network layer online, and prune the liveness detection model according to the selection probability values of the network layers to obtain a liveness detection pruning model. The terminal detection device 120 can then input the target image into the liveness detection pruning model to obtain the liveness detection result.
[0169] In this specification, the liveness detection model includes K initial network layers, where K is a natural number greater than 1. The number K of initial network layers in the liveness detection model is a fixed value. Each of the K initial network layers corresponds to a batch normalization (BN) layer, and each BN layer generates a scaling parameter (selection probability). For the same target image, the selection probabilities corresponding to the K initial network layers may differ from one another. For any one of the K initial network layers, the corresponding selection probability may differ when different target images are input.
[0170] In this specification, S421 may include: inputting the target image into the hierarchical pruning model to obtain the selection probability of each of the K initial network layers, and determining S initial network layers that satisfy preset conditions based on the selection probabilities, wherein S is a natural number greater than 1 and S is less than K, and the liveness detection pruning model includes the S initial network layers.
[0171] After the terminal detection device 120 inputs the target image into the hierarchical pruning model, it can use the hierarchical pruning model to determine the selection probability of each of the K initial network layers contained in the liveness detection model, and select S initial network layers as the network layers of the liveness detection pruning model. Here, the S initial network layers can be understood as the network layers of the liveness detection pruning model predicted by the hierarchical pruning model and adapted to the input target image. The value of S may differ when different target images are input into the hierarchical pruning model. Correspondingly, the number of initial network layers contained in the resulting liveness detection pruning model may differ when different target images are input into the hierarchical pruning model.
[0172] In this specification, the selection probability of each of the S initial network layers, as determined by the terminal detection device 120 using the hierarchical pruning model, satisfies a preset condition. Specifically, the preset condition may include: the selection probability is greater than or equal to a preset probability threshold. The preset condition can be understood as a pruning strategy that the hierarchical pruning model obtains by predicting, for different target images, how the liveness detection model should be pruned. If the selection probability of an initial network layer is greater than or equal to the preset probability threshold, the initial network layer is retained. If the selection probability of an initial network layer is less than the preset probability threshold, the initial network layer is pruned.
[0173] S422, Input the target image into the liveness detection pruning model to obtain the liveness detection result of the target object.
[0174] In this specification, after the terminal detection device 120 performs model pruning on the liveness detection model to obtain the liveness detection pruning model, the target image previously input into the hierarchical pruning model can be re-input into the liveness detection pruning model, and the liveness detection pruning model can be used to predict the liveness detection result of the target object.
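Steps S421 and S422 can be sketched end to end as follows, purely as an illustration. The sketch models each initial network layer as a plain function and the target image as a single number, which is a heavy simplification; `detect_liveness`, `toy_pruner`, `toy_layers`, and `decision_threshold` are hypothetical names that do not come from this specification.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[float], float]


def detect_liveness(
    image_value: float,
    layers: Sequence[Layer],
    pruner: Callable[[float], List[float]],
    prob_threshold: float,
    decision_threshold: float = 0.5,
) -> Tuple[str, int]:
    """Prune layers for this input (S421), then run the pruned model on it (S422)."""
    selection_probs = pruner(image_value)
    if len(selection_probs) != len(layers):
        raise ValueError("the pruner must return one selection probability per layer")
    # S421: keep only layers whose selection probability meets the preset threshold.
    retained = [layer for layer, p in zip(layers, selection_probs) if p >= prob_threshold]
    # S422: feed the same input through the retained layers in order.
    x = image_value
    for layer in retained:
        x = layer(x)
    label = "live" if x >= decision_threshold else "attack"
    return label, len(retained)


# Toy stand-ins for K = 3 initial network layers and a pruning model.
toy_layers = [lambda x: x + 0.1, lambda x: x * 2.0, lambda x: x - 0.05]


def toy_pruner(image_value: float) -> List[float]:
    return [0.9, 0.3, 0.8]  # fixed illustrative selection probabilities


result = detect_liveness(0.5, toy_layers, toy_pruner, prob_threshold=0.5)
print(result)
```

In this toy run the second layer is pruned, so S equals 2 while K equals 3; a different input could yield a different S if the pruner's output depended on it.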
[0175] S430, output the liveness detection result.
[0176] The liveness detection result indicates whether the target object is live or not live (an attack). When the terminal detection device 120 determines that the target object's liveness detection result is live, it can output a prompt message indicating that the liveness detection has passed. When the terminal detection device 120 determines that the target object's liveness detection result is an attack, it can output an alarm prompt message.
[0177] In summary, after reading this detailed disclosure, those skilled in the art will understand that the foregoing detailed disclosure is presented by way of example only and is not restrictive. Although not explicitly stated herein, those skilled in the art will understand that this specification is intended to encompass various reasonable changes, improvements, and modifications to the embodiments. These changes, improvements, and modifications are intended to be suggested by this specification and are within the spirit and scope of the exemplary embodiments described herein.
[0178] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are possible or may be advantageous.
[0179] Furthermore, certain terms in this specification have been used to describe embodiments of this specification. For example, "one embodiment," "an embodiment," and / or "some embodiments" mean that a particular feature, structure, or characteristic described in connection with that embodiment may be included in at least one embodiment of this specification. Therefore, it is to be emphasized and understood that two or more references to "an embodiment," "one embodiment," or "an alternative embodiment" in various parts of this specification do not necessarily refer to the same embodiment. Moreover, specific features, structures, or characteristics may be suitably combined in one or more embodiments of this specification.
[0180] It should be understood that in the foregoing description of the embodiments of this specification, for the purpose of simplifying the description and to aid in understanding a feature, various features are sometimes combined in a single embodiment, drawing, or description thereof. Alternatively, various features may be distributed across multiple embodiments of this specification. However, this does not mean that the combination of these features is necessary, and those skilled in the art, upon reading this specification, may extract some features as individual embodiments for understanding. That is, the embodiments in this specification can also be understood as an integration of multiple sub-embodiments. It is also valid when each sub-embodiment contains fewer than all the features of a single foregoing disclosed embodiment.
Claims
1. A training method for a liveness detection model, comprising: obtaining a first image sample set, which includes N first initial image samples and M first perturbation image samples, wherein the M first perturbation image samples are obtained by perturbing the N first initial image samples, and M and N are both natural numbers greater than 1; training a preset liveness detection teacher model based on the first image sample set to obtain a liveness detection teacher model, wherein the liveness detection teacher model includes at least a perturbation-aware teacher network, and the perturbation-aware teacher network is configured to obtain the perturbation detection result of an image sample based on the liveness feature map of the image sample input to the preset liveness detection teacher model; and performing knowledge distillation on a preset liveness detection student model based on the first image sample set and the liveness detection teacher model to obtain a liveness detection model, wherein the preset liveness detection student model includes at least a preset perturbation-aware student network, which is configured to obtain the perturbation detection result of an image sample based on the liveness feature map of the image sample input to the preset liveness detection student model, and the knowledge distillation includes: determining perturbation-aware loss information based on the perturbation detection results output by the perturbation-aware teacher network and the preset perturbation-aware student network, and converging the preset liveness detection student model based on student detection loss information to obtain the liveness detection model, wherein the student detection loss information includes at least the perturbation-aware loss information.
2. The training method for the liveness detection model as described in claim 1, wherein, The M first perturbation image samples are obtained by perturbing the N first initial image samples using a sample perturbation model; as well as The training method for the sample perturbation model includes: Based on a preset sample perturbation model, at least one first perturbation image sample is determined that corresponds to the N first initial image samples. Determine at least one perturbation feature loss information between the at least one first perturbation image sample and the first initial image sample, and The preset sample perturbation model is converged based on the loss information of multiple perturbation features corresponding to the N first initial image samples to obtain the sample perturbation model.
3. The training method for the liveness detection model as described in claim 2, wherein, Determining the at least one perturbation feature loss information between the at least one first perturbation image sample and the first initial image sample includes: Determine at least one perturbation feature difference information between the at least one first perturbation image sample and the first initial image sample; Obtain the preset perturbation feature difference threshold; and The at least one perturbation feature loss information is determined based on the preset perturbation feature difference threshold and the at least one perturbation feature difference information.
4. The training method for the liveness detection model as described in claim 1, wherein, The step of training the preset liveness detection teacher model based on the first image sample set to obtain the liveness detection teacher model includes: For each first image sample in the first image sample set: The current first image sample is input into the preset liveness detection teacher model to obtain a first liveness detection result and a first perturbation detection result. Based on the first liveness detection result and the first perturbation detection result, the teacher detection loss information corresponding to the current first image sample is determined. The preset liveness detection teacher model is converged based on multiple teacher detection loss information corresponding to the first image sample set to obtain the liveness detection teacher model.
5. The training method for the liveness detection model as described in claim 4, wherein, The preset liveness detection teacher model includes a preset liveness feature extraction teacher network, a preset liveness detection teacher network, and a preset perturbation perception teacher network; as well as The step of inputting the current first image sample into the preset liveness detection teacher model to obtain the first liveness detection result and the first perturbation detection result includes: The current first image sample is input into the preset liveness feature extraction teacher network to obtain the first liveness feature map. The first liveness feature map is input into the preset liveness detection teacher network to obtain the first liveness detection result, and The first liveness feature map is input into the preset perturbation perception teacher network to obtain the first perturbation detection result.
6. The training method for the liveness detection model as described in claim 5, wherein the step of determining the teacher detection loss information corresponding to the current first image sample based on the first liveness detection result and the first perturbation detection result includes: obtaining the first liveness annotation information and the first perturbation annotation information of the current first image sample; determining, based on the first liveness detection result and the first liveness annotation information, the first liveness classification loss information corresponding to the current first image sample; and determining, based on the first perturbation detection result and the first perturbation annotation information, the first perturbation perception loss information corresponding to the current first image sample, wherein the teacher detection loss information includes the first liveness classification loss information and the first perturbation perception loss information.
7. The training method for the liveness detection model as described in claim 6, wherein the first perturbation detection result includes a first perturbation type prediction result and a first perturbation intensity prediction result, and the first perturbation annotation information includes a first perturbation annotation type and a first perturbation annotation intensity; and the step of determining the first perturbation perception loss information corresponding to the current first image sample based on the first perturbation detection result and the first perturbation annotation information includes: determining, based on the first perturbation type prediction result and the first perturbation annotation type, the first perturbation type loss information corresponding to the current first image sample; and determining, based on the first perturbation intensity prediction result and the first perturbation annotation intensity, the first perturbation intensity loss information corresponding to the current first image sample, wherein the first perturbation perception loss information includes the first perturbation type loss information and the first perturbation intensity loss information.
8. The training method for the liveness detection model as described in claim 1, wherein, The step of performing knowledge distillation on a preset liveness detection student model based on the first image sample set and the liveness detection teacher model to obtain a liveness detection model includes: For each first image sample in the first image sample set: The current first image sample is input into the liveness detection teacher model to obtain the teacher detection result, and the current first image sample is input into the preset liveness detection student model to obtain the student detection result. Based on the first liveness annotation information of the current first image sample, the teacher detection result, and the student detection result, the student detection loss information corresponding to the current first image sample is determined. The preset liveness detection student model is converged based on multiple student detection loss information corresponding to the first image sample set to obtain the liveness detection model.
9. The training method for the liveness detection model as described in claim 8, wherein, The liveness detection teacher model also includes a liveness feature extraction teacher network; as well as The step of inputting the current first image sample into the liveness detection teacher model to obtain the teacher detection result includes: The current first image sample is input into the liveness feature extraction teacher network to obtain a second liveness feature map, and The second liveness feature map is input into the perturbation-aware teacher network to obtain the second perturbation detection result. The teacher detection results include the second liveness feature map and the second perturbation detection results.
10. The training method for the liveness detection model as described in claim 9, wherein, The preset liveness detection student model also includes a preset liveness feature extraction student network and a preset liveness detection student network; as well as The step of inputting the current first image sample into the preset liveness detection student model to obtain the student detection result includes: The current first image sample is input into the preset liveness feature extraction student network to obtain the third liveness feature map. The third liveness feature map is input into the preset liveness detection student network to obtain the second liveness detection result, and The third liveness feature map is input into the preset perturbation-aware student network to obtain the third perturbation detection result. The student detection results include the third liveness feature map, the second liveness detection results, and the third disturbance detection results.
11. The training method for the liveness detection model as described in claim 10, wherein, The step of determining the student detection loss information corresponding to the current first image sample based on the first liveness annotation information of the current first image sample, the teacher detection result, and the student detection result includes: Based on the second live feature map and the third live feature map, the feature distillation loss information corresponding to the current first image sample is determined; Based on the first liveness annotation information and the second liveness detection result, determine the second liveness classification loss information corresponding to the current first image sample; and Based on the second disturbance detection result and the third disturbance detection result, the second disturbance perception loss information corresponding to the current first image sample is determined. The student detection loss information includes the feature distillation loss information, the second liveness classification loss information, and the second perturbation perception loss information.
12. The training method for the liveness detection model as described in claim 11, wherein, The second disturbance detection result includes a second disturbance type prediction result and a second disturbance intensity prediction result, and the third disturbance detection result includes a third disturbance type prediction result and a third disturbance intensity prediction result; as well as The step of determining the second perturbation perception loss information corresponding to the current first image sample based on the second perturbation detection result and the third perturbation detection result includes: Based on the prediction results of the second perturbation type and the third perturbation type, the second perturbation type loss information corresponding to the current first image sample is determined, and Based on the second and third perturbation intensity prediction results, the second perturbation intensity loss information corresponding to the current first image sample is determined. The second disturbance perception loss information includes the second disturbance type loss information and the second disturbance intensity loss information.
13. The training method for the liveness detection model as described in claim 1, wherein, Also includes: Obtain a second image sample set, which includes L second image samples, where L is a natural number greater than 1; as well as Based on the second image sample set and the liveness detection model, a preset hierarchical cropping model is trained to obtain a hierarchical cropping model. The hierarchical cropping model is configured to dynamically crop the network layers of the liveness detection model according to the input image.
14. The training method for the liveness detection model as described in claim 13, wherein, The liveness detection model includes K initial network layers, where K is a natural number greater than 1; as well as The step of training the preset hierarchical cropping model based on the second image sample set and the liveness detection model to obtain the hierarchical cropping model includes: For each second image sample in the L sets of second image samples: The current second image sample is input into the preset hierarchical cropping model to obtain the selection probability of each of the K initial network layers. Based on the selection probabilities, the liveness detection model is pruned to obtain a liveness detection pruning model. The current second image sample is then input into the liveness detection pruning model to obtain a third liveness detection result. Based on the selection probabilities and the third liveness detection result, the pruning loss information corresponding to the current second image sample is determined. The preset hierarchical pruning model is converged based on the L pruning loss information corresponding to the L second image samples to obtain the hierarchical pruning model.
15. The training method for the liveness detection model as described in claim 14, wherein the step of determining the pruning loss information corresponding to the current second image sample based on the selection probabilities and the third liveness detection result includes: obtaining a preset probability sparsity threshold and second liveness annotation information of the current second image sample; determining, based on the selection probabilities and the preset probability sparsity threshold, pruning probability sparsity loss information corresponding to the current second image sample; and determining, based on the third liveness detection result and the second liveness annotation information, pruned liveness classification loss information corresponding to the current second image sample, wherein the pruning loss information includes the pruning probability sparsity loss information and the pruned liveness classification loss information.
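The pruning loss of claims 14 and 15 can be pictured with a short sketch. It is illustrative only: the claims do not state how the sparsity loss or the classification loss is computed. The hinge on the mean selection probability, the binary cross-entropy, the default threshold and weight, and all names below are assumptions.

```python
import math


def pruning_probability_sparsity_loss(selection_probs, sparsity_threshold):
    """Assumed form: penalize the mean selection probability above the threshold."""
    mean_prob = sum(selection_probs) / len(selection_probs)
    return max(0.0, mean_prob - sparsity_threshold)


def pruned_liveness_classification_loss(live_prob, label, eps=1e-12):
    """Assumed form: binary cross-entropy between the predicted live
    probability and the annotation (1 = live, 0 = attack)."""
    p = min(max(live_prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def pruning_loss(selection_probs, live_prob, label,
                 sparsity_threshold=0.5, sparsity_weight=1.0):
    """Pruning loss for one second image sample: sparsity term plus
    classification term (the weighting is a hypothetical choice)."""
    l_sparse = pruning_probability_sparsity_loss(selection_probs, sparsity_threshold)
    l_cls = pruned_liveness_classification_loss(live_prob, label)
    return {
        "sparsity": l_sparse,
        "classification": l_cls,
        "total": sparsity_weight * l_sparse + l_cls,
    }
```

Under these assumptions the sparsity term pushes the hierarchical pruning model to keep fewer layers on average, while the classification term keeps the pruned model's liveness prediction close to the annotation.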
16. A training system for a liveness detection model, comprising: at least one storage medium storing at least one instruction set for training a liveness detection model; and at least one processor communicatively connected to the at least one storage medium, wherein, when the training system is running, the at least one processor reads the at least one instruction set and executes the training method for the liveness detection model according to any one of claims 1-15.
17. A liveness detection method, applied to a terminal detection device, comprising: obtaining a target image of a target object; performing liveness detection on the target object based on the target image using a liveness detection model to obtain a liveness detection result, wherein the liveness detection model is trained using the training method described in claim 1; and outputting the liveness detection result.
18. The liveness detection method as described in claim 17, wherein the step of performing liveness detection on the target object based on the target image using the liveness detection model to obtain the liveness detection result includes: pruning the liveness detection model based on the target image using a hierarchical pruning model to obtain a pruned liveness detection model; and inputting the target image into the pruned liveness detection model to obtain the liveness detection result of the target object.
19. The liveness detection method as described in claim 18, wherein the liveness detection model includes K initial network layers, where K is a natural number greater than 1; and the step of pruning the liveness detection model based on the target image using the hierarchical pruning model to obtain the pruned liveness detection model includes: inputting the target image into the hierarchical pruning model to obtain a selection probability for each of the K initial network layers; and determining, based on the selection probabilities, S initial network layers that satisfy a preset condition, where S is a natural number greater than 1 and less than K, and the pruned liveness detection model includes the S initial network layers.
20. The liveness detection method as described in claim 19, wherein the preset condition includes: the selection probability is greater than or equal to a preset probability threshold.
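The layer selection described in claims 18 to 20 can be sketched as follows. This is a toy illustration under stated assumptions: network layers are modeled as plain Python callables, the threshold value is arbitrary, and the function names are hypothetical. The sketch does not enforce the claimed constraint that S is greater than 1 and less than K.

```python
def select_layers(selection_probs, prob_threshold=0.5):
    """Return indices of layers whose selection probability is greater
    than or equal to the preset probability threshold."""
    return [i for i, p in enumerate(selection_probs) if p >= prob_threshold]


def run_pruned_model(layers, kept_indices, x):
    """Apply only the kept layers, in their original order, to input x."""
    for i in kept_indices:
        x = layers[i](x)
    return x


# Toy stand-ins for K = 4 network layers (assumption for illustration).
toy_layers = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 3,
    lambda x: x * 10,
]

kept = select_layers([0.9, 0.2, 0.5, 0.7], prob_threshold=0.5)
result = run_pruned_model(toy_layers, kept, 1)
```

Here the second layer falls below the threshold and is skipped, so the pruned model consists of the remaining S = 3 layers.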
21. A liveness detection system, comprising: at least one storage medium storing at least one instruction set for liveness detection; and at least one processor communicatively connected to the at least one storage medium, wherein, when the liveness detection system is running, the at least one processor reads the at least one instruction set and executes the liveness detection method according to any one of claims 17-20.