Method for training live body detection model, live body detection method and related equipment
By constructing a cross-modal supervised loss to adjust the parameters of the liveness detection model, the problem of inaccurate liveness detection results in the existing technology is solved, and a higher accuracy of liveness discrimination is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG DAHUA TECH CO LTD
- Filing Date
- 2023-01-05
- Publication Date
- 2026-06-12
AI Technical Summary
Existing liveness detection methods do not provide accurate liveness determination results.
By acquiring training image pairs, the first and second feature extraction branches of the liveness detection model are used to extract features from the first and second modal images, respectively. A cross-modal supervised loss is constructed, and the parameters of the liveness detection model are adjusted based on this loss to achieve mutual supervision and enhancement of cross-modal feature extraction.
This improves the ability of the features extracted by the liveness detection model to represent the liveness category, and enhances the accuracy of liveness discrimination results obtained based on features during the application stage.
Figure CN116052286B_ABST
Description
Technical Field
[0001] This application relates to the field of target recognition technology, and in particular to a training method for a liveness detection model, a liveness detection method, an electronic device, and a computer-readable storage medium.
Background Technology
[0002] Target recognition technology can be applied to many fields such as turnstiles, attendance machines, and facial recognition payment. The general process of target recognition technology involves acquiring an image of the target and then performing recognition based on that image. However, target recognition technology carries the risk of attacks using disguised targets. That is, the acquired image may not be a photograph of the real target (a living object), but rather an image of a disguised target (a non-living object). Disguised targets could be, for example, paper photographs of the target, electronic images of the target, or 3D models.
[0003] Therefore, it is necessary to perform liveness detection on the target image during the target recognition process to determine whether the target is a living object. However, the liveness detection methods in the prior art do not provide accurate liveness determination results.
Summary of the Invention
[0004] This application provides a training method for a liveness detection model, a liveness detection method, an electronic device, and a computer-readable storage medium, which can solve the problem that the liveness detection results obtained by existing liveness detection methods are not accurate enough.
[0005] To address the aforementioned technical problems, this application provides a technical solution: a training method for a liveness detection model. The method includes: acquiring training image pairs, each pair comprising a first modality image and a second modality image, the first and second modality images originating from training objects of the same liveness category; extracting features from the first modality image using a first feature extraction branch of the liveness detection model to obtain a first feature, the first feature including a first cross-modal supervised feature; extracting features from the second modality image using a second feature extraction branch of the liveness detection model to obtain a second feature, the second feature including a second cross-modal supervised feature; constructing a cross-modal supervised loss based on the first and second cross-modal supervised features; and adjusting the parameters of the liveness detection model based at least on the cross-modal supervised loss.
[0006] To address the aforementioned technical problems, this application provides another technical solution: a liveness detection method. The method includes: acquiring an image of an object to be detected, the image including a first modality image and/or a second modality image of that object; extracting features from the image using a liveness detection model to obtain liveness discrimination features; and obtaining a liveness discrimination result for the object based on the liveness discrimination features; wherein the liveness detection model is trained using the training method described above.
[0007] To solve the above-mentioned technical problems, another technical solution adopted in this application is: to provide an electronic device, which includes a processor and a memory connected to the processor, wherein the memory stores program instructions; the processor is used to execute the program instructions stored in the memory to implement the above-mentioned method.
[0008] To solve the above-mentioned technical problems, another technical solution adopted in this application is to provide a computer-readable storage medium storing program instructions that, when executed, can implement the above-mentioned method.
[0009] Through the above method, this application obtains training image pairs, which include first modality images and second modality images from training objects of the same liveness category. Features are extracted from the first modality image using the first feature extraction branch of the liveness detection model to obtain first cross-modal supervised features. Features are extracted from the second modality image using the second feature extraction branch of the liveness detection model to obtain second cross-modal supervised features. A cross-modal supervised loss is constructed based on the first and second cross-modal supervised features, and the parameters of the liveness detection model are adjusted based on the cross-modal supervised loss. Since the first and second modality images contain information about training objects of the same liveness category in different modalities, training the liveness detection model using the cross-modal supervised loss enables cross-modal mutual supervision of the feature extraction processes of the first and second feature extraction branches. This achieves mutual supplementation and enhancement of information between the features extracted by the first and second feature extraction branches, thereby improving the expressive power of the features extracted by the liveness detection model for the liveness category and improving the accuracy of the liveness discrimination results obtained based on the features during the application stage.
Attached Figure Description
[0010] Figure 1 is a flowchart illustrating an embodiment of the training method for the liveness detection model of this application;
[0011] Figure 2 is a flowchart illustrating another embodiment of the training method for the liveness detection model of this application;
[0012] Figure 3 is a schematic diagram of the training structure of a liveness detection model;
[0013] Figure 4 is a schematic diagram of central difference convolution;
[0014] Figure 5 is a schematic flowchart of an embodiment of the liveness detection method of this application;
[0015] Figure 6 is a schematic diagram of the structure of an embodiment of the electronic device of this application;
[0016] Figure 7 is a schematic diagram of the structure of an embodiment of the computer-readable storage medium of this application.
Detailed Implementation
[0017] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of the embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.
[0018] The terms "first," "second," and "third" used in this application are for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Therefore, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of that feature. In the description of this application, "multiple" means at least two, and "several" means one or more, unless otherwise explicitly specified.
[0019] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein may be combined with other embodiments without conflict.
[0020] Figure 1 is a schematic flowchart of one embodiment of the training method for the liveness detection model of this application. It should be noted that, provided substantially the same result is obtained, this embodiment is not limited to the process sequence shown in Figure 1. As shown in Figure 1, this embodiment may include:
[0021] S11: Obtain training image pairs.
[0022] The training image pair includes a first modality image and a second modality image, which are from training objects of the same liveness category.
[0023] The execution subject of the training method embodiment of this application is a training device, which can be an electronic device in the form of a computer, mobile phone, server, etc.
[0024] The training objects can be live objects, such as humans or animals (e.g., dogs), or human or animal faces; they can also be non-live objects, such as paper photographs, electronic images, or 3D models masquerading as live objects. Image modalities available for liveness detection include, but are not limited to, the near-infrared, visible light, and depth modalities. The first modality can be any one of these, and so can the second modality, provided the two are different. For example, the first modality image might be a visible light image, and the second modality image might be a near-infrared image. "First" and "second" are relative terms and can be interchanged according to actual needs.
[0025] Multiple training image pairs are used to train the liveness detection model. The first modality image and the second modality image in the same training image pair come from training objects of the same liveness category. The liveness category includes live and non-live objects. Training objects of the same liveness category can be further divided into training objects of the same liveness category and the same ID, and training objects of the same liveness category but different IDs. If the first modality image and the second modality image come from training objects of the same liveness category and the same ID, then the first modality image and the second modality image can be captured at the same time or at different times. At least two training image pairs come from training objects of different liveness categories. For simplicity, this application embodiment only uses one training image pair for illustration.
[0026] In some embodiments, the training image pairs may also include liveness labels that characterize the true liveness category of the training objects, depending on the training requirements.
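The constraints on a training image pair described in [0022] to [0026] can be sketched as a small data structure. This is a minimal illustration only; the class name `TrainingPair`, its field names, and the modality strings are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TrainingPair:
    """One training image pair (hypothetical structure for illustration)."""

    first_image: List[List[float]]   # pixel grid of the first modality image
    first_modality: str              # e.g. "visible_light"
    second_image: List[List[float]]  # pixel grid of the second modality image
    second_modality: str             # e.g. "near_infrared"
    is_live: Optional[bool] = None   # optional liveness label shared by both images

    def __post_init__(self) -> None:
        # The two modalities of a pair must differ.
        if self.first_modality == self.second_modality:
            raise ValueError("first and second modality must be different")


pair = TrainingPair([[0.1]], "visible_light", [[0.3]], "near_infrared", is_live=True)
```

A single `is_live` field is used because both images in a pair come from training objects of the same liveness category.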
[0027] S12: Use the first feature extraction branch of the liveness detection model to extract features from the first modality image to obtain the first feature.
[0028] The first feature includes the first cross-modal supervision feature.
[0029] The first cross-modal supervision feature can be used for the cross-modal supervision training task of the first modality image. It may include a supervising feature of the first modality (a feature used to supervise the second feature extraction branch) and/or a supervised feature of the first modality (a feature that is supervised by the second modality). In some embodiments, the supervising feature of the first modality is a frequency domain feature, and the supervised feature of the first modality is a convolutional feature.
[0030] S13: Use the second feature extraction branch of the liveness detection model to extract features from the second modality image to obtain the second feature.
[0031] The second feature includes the second cross-modal supervision feature.
[0032] The second cross-modal supervision feature can be used for the cross-modal supervision training task of the second modality image. It may include a supervising feature of the second modality (a feature used to supervise the first feature extraction branch) and/or a supervised feature of the second modality (a feature that is supervised by the first modality). In some embodiments, both the supervising feature and the supervised feature of the second modality are convolutional features.
[0033] S14: Construct cross-modal supervision loss based on the first cross-modal supervision features and the second cross-modal supervision features.
[0034] Cross-modal supervision loss can be, but is not limited to, L2 regression loss.
[0035] The cross-modal supervision loss can include the cross-modal supervision loss corresponding to the first feature extraction branch and/or the cross-modal supervision loss corresponding to the second feature extraction branch. The loss corresponding to the first feature extraction branch can be constructed from the supervised feature of the first modality (the feature being supervised) and the supervising feature of the second modality (the feature providing the supervision). The loss corresponding to the second feature extraction branch can be constructed from the supervising feature of the first modality and the supervised feature of the second modality.
[0036] In some embodiments, when the supervising feature of the first modality is a frequency domain feature, S14 may include: converting the supervised feature of the second modality to the frequency domain; and constructing the cross-modal supervision loss corresponding to the second feature extraction branch based on the difference between the supervising feature of the first modality and the converted supervised feature of the second modality.
[0037] In some embodiments, S14 may include: translating the supervised feature of the first modality into the second modality; and constructing the cross-modal supervision loss corresponding to the first feature extraction branch based on the difference between the translated supervised feature of the first modality and the supervising feature of the second modality.
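The frequency-domain case in [0036] can be sketched as follows. This is a minimal sketch under stated assumptions: features are flat lists of numbers, the conversion to the frequency domain is a naive one-dimensional discrete Fourier transform whose magnitudes are compared, and the L2 regression loss of [0034] is taken as a mean of squared differences. The function names are hypothetical, and the conversion layer of an actual model may be quite different.

```python
import cmath
from typing import List


def dft_magnitude(values: List[float]) -> List[float]:
    """Naive 1-D discrete Fourier transform; returns the magnitude of each bin."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values)))
        for k in range(n)
    ]


def l2_loss(a: List[float], b: List[float]) -> float:
    """Mean squared difference between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def second_branch_loss(first_freq_feature: List[float], second_feature: List[float]) -> float:
    """Convert the second-modality feature to the frequency domain, then compare it
    with the frequency-domain feature extracted from the first modality."""
    return l2_loss(dft_magnitude(second_feature), first_freq_feature)
```

Because the first-modality frequency feature acts as the target here, the resulting loss is the one used to adjust the second feature extraction branch.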
[0038] S15: Adjust the parameters of the liveness detection model based at least on cross-modal supervised loss.
[0039] The parameters of the first feature extraction branch can be adjusted based on the cross-modal supervised loss corresponding to the first feature extraction branch, and the parameters of the second feature extraction branch can be adjusted based on the cross-modal supervised loss corresponding to the second feature extraction branch.
[0040] Through the implementation of this embodiment, this application obtains training image pairs, which include a first modality image and a second modality image from training objects of the same liveness category. Features are extracted from the first modality image using the first feature extraction branch of the liveness detection model to obtain first cross-modal supervised features. Features are extracted from the second modality image using the second feature extraction branch of the liveness detection model to obtain second cross-modal supervised features. A cross-modal supervised loss is constructed based on the first and second cross-modal supervised features, and the parameters of the liveness detection model are adjusted based on the cross-modal supervised loss. Since the first and second modality images contain information about training objects of the same liveness category in different modalities, training the liveness detection model using the cross-modal supervised loss enables cross-modal mutual supervision of the feature extraction processes of the first and second feature extraction branches. This achieves mutual supplementation and enhancement of information between the features extracted by the first and second feature extraction branches, thereby improving the expressive power of the features extracted by the liveness detection model for the liveness category, and ultimately improving the accuracy of the liveness discrimination results obtained based on the features during the application stage.
[0041] Furthermore, in some embodiments, the first cross-modal supervision feature can also be used for training tasks such as liveness detection of training objects. In some embodiments, the first feature may also include a first liveness detection feature, the first cross-modal supervision feature is focused on the training task of cross-modal supervision, and the first liveness detection feature is applied to the training task of liveness detection.
[0042] In some embodiments, the first feature extraction branch may include at least two separate first feature extraction sub-branches, where "separate" means not sharing network layers. Each first feature extraction sub-branch may include several sequentially connected first feature extraction layers, with each subsequent layer extracting features from the output of the preceding layer. Different features in the first feature may be extracted by different first feature extraction sub-branches or by different first feature extraction layers. For example, the supervised feature of the first modality (the feature supervised by the second modality) and the first liveness detection feature are extracted by the same first feature extraction sub-branch, while the supervising feature of the first modality (the feature used to supervise the second feature extraction branch) and the supervised feature of the first modality are extracted by different first feature extraction sub-branches. The extraction order of the first liveness detection feature and the supervised feature of the first modality is not restricted: the supervised feature can be extracted before the first liveness detection feature, which is then extracted from it, or after the first liveness detection feature and based on it. When the supervising feature of the first modality is a frequency domain feature, the corresponding sub-branch can extract it through methods such as the Fourier transform. For example, the first modality image can be interpolated and then downsampled, or LBP or HOG feature extraction can be performed first, followed by downsampling; the downsampling result is then Fourier transformed to obtain the frequency domain feature. When the supervised feature of the first modality and the first liveness detection feature are convolutional features, the corresponding sub-branch can use a structure such as VGG or ResNet.
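The "LBP first, then downsampling" preprocessing mentioned in [0042] can be illustrated with the sketch below. It assumes the conventional 3×3 local binary pattern (each of the eight neighbours contributes one bit when it is at least as large as the centre pixel) and a simple strided downsampling; the function names and the stride of 2 are hypothetical choices, and the subsequent Fourier transform step is omitted.

```python
from typing import List

Grid = List[List[int]]


def lbp_codes(image: Grid) -> Grid:
    """Return 8-bit local binary pattern codes for the interior pixels of a 2-D grid."""
    # Neighbour offsets, clockwise starting from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for r in range(1, len(image) - 1):
        row_codes = []
        for c in range(1, len(image[0]) - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def downsample(grid: Grid, stride: int = 2) -> Grid:
    """Keep every `stride`-th row and column."""
    return [row[::stride] for row in grid[::stride]]
```

The downsampled LBP map would then be passed to a Fourier transform to obtain the frequency domain feature.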
[0043] In some embodiments, the second cross-modal supervision feature can also be used for training tasks such as liveness detection of training objects. In some embodiments, the second feature may also include a second liveness detection feature, wherein the second cross-modal supervision feature is focused on cross-modal supervision training tasks, and the second liveness detection feature is applied to liveness detection training tasks.
[0044] In some embodiments, the second feature extraction branch may include a plurality of sequentially connected second feature extraction layers, wherein different features in the second feature are extracted by different second feature extraction layers. For example, the second cross-modal supervision feature includes a supervising feature of the second modality (used to supervise the first feature extraction branch) and a supervised feature of the second modality (supervised by the first modality), and the supervising feature, the supervised feature, and the second liveness detection feature are each extracted by a different second feature extraction layer.
[0045] Based on this, the following extensions can be made to the aforementioned embodiments:
[0046] Figure 2 is a schematic flowchart of another embodiment of the training method for the liveness detection model of this application. It should be noted that, provided substantially the same result is obtained, this embodiment is not limited to the process sequence shown in Figure 2. In this embodiment, steps S21 to S23 are steps that may be included before S15, and S24 is a further extension of S15. Content already covered in the above embodiments is not repeated here. The first feature also includes a first liveness detection feature, the second feature also includes a second liveness detection feature, and the training image pair also includes a liveness label representing the true liveness category of the training object. As shown in Figure 2, this embodiment may include:
[0047] S21: The first liveness detection feature and the second liveness detection feature are fused to obtain the liveness detection feature.
[0048] The first cross-modal supervision feature and the first liveness detection feature can be the same feature, or they can be extracted by different sub-branches of the first feature extraction branch, or by different first feature extraction layers of the same first feature extraction sub-branch. The same applies to the second cross-modal supervision feature and the second liveness detection feature. Fusion methods include, but are not limited to, multiplication and addition.
[0049] S22: Perform liveness detection on the training objects based on liveness detection features to obtain the liveness detection results of the training objects.
[0050] The liveness detection result represents the predicted liveness category of the training object and is consistent with the liveness label format. Specifically, the liveness detection result can include the probability that the training object is alive and the probability that it is not alive. If the probability of being alive is greater than the probability of being not alive, then the predicted liveness category of the training object is alive; otherwise, the predicted liveness category of the training object is not alive.
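Steps S21 and S22 can be sketched as follows. This is an illustrative sketch only: element-wise addition and multiplication stand in for the fusion methods named in [0048], a softmax turns two hypothetical discriminator scores into probabilities, and the decision rule follows [0050]. All function names are assumptions.

```python
import math
from typing import List


def fuse(first: List[float], second: List[float], mode: str = "add") -> List[float]:
    """Fuse two liveness detection features element-wise."""
    if len(first) != len(second):
        raise ValueError("features must have the same length")
    if mode == "add":
        return [a + b for a, b in zip(first, second)]
    if mode == "multiply":
        return [a * b for a, b in zip(first, second)]
    raise ValueError(f"unknown fusion mode: {mode}")


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_category(prob_live: float, prob_not_live: float) -> str:
    """Predict "live" only when its probability is strictly greater."""
    return "live" if prob_live > prob_not_live else "non-live"
```

Note that a tie is resolved as "non-live", which matches the "otherwise" branch of the rule in [0050].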
[0051] S23: Construct a liveness discrimination loss based on the difference between the liveness discrimination result and the liveness label.
[0052] S24: Adjust the parameters of the liveness detection model based on liveness discrimination loss and cross-modal supervision loss.
[0053] The liveness detection training task (liveness detection loss) and the cross-modal supervision training task (cross-modal supervision loss) can be used to train the liveness detection model simultaneously or in stages, with no restriction on the order. In simultaneous training, for example, the liveness detection loss and the cross-modal supervision loss can be weighted, and the parameters of the liveness detection model adjusted based on the weighted result. The weights of the liveness detection loss and the cross-modal supervision loss can be the same or different; for example, the weight of the liveness detection loss can be larger. In staged training, for example, the parameters of the liveness detection model can first be adjusted based on the cross-modal supervision loss to obtain a model that meets the cross-modal supervision conditions; the parameters of that model can then be adjusted based on the liveness detection loss to obtain a model that meets both the cross-modal supervision conditions and the liveness detection conditions. The liveness detection conditions and cross-modal supervision conditions can include, but are not limited to, reaching the expected number of training iterations, reaching the expected training time, and reaching the expected training results.
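The two training schedules in [0053] can be contrasted with a short sketch. It assumes that the stage boundary is a fixed iteration count (only one of the conditions the text allows) and uses example weights in which the liveness detection loss is weighted more heavily; the function names and default values are hypothetical.

```python
def simultaneous_loss(
    liveness_loss: float,
    cross_modal_loss: float,
    liveness_weight: float = 0.6,
    cross_modal_weight: float = 0.4,
) -> float:
    """Simultaneous training: both losses contribute to every update."""
    return liveness_weight * liveness_loss + cross_modal_weight * cross_modal_loss


def staged_loss(
    step: int,
    liveness_loss: float,
    cross_modal_loss: float,
    stage_one_steps: int,
) -> float:
    """Staged training: cross-modal supervision first, liveness detection afterwards."""
    if step < stage_one_steps:
        return cross_modal_loss
    return liveness_loss
```

In the staged variant the second stage starts from the parameters reached at the end of the first, so the cross-modal supervision acts as a form of pre-training.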
[0054] Unlike other embodiments, this embodiment combines information from features of two different modalities for liveness detection, constructs a liveness detection loss based on the liveness detection results, and trains the liveness detection model based on the liveness detection loss. This can further improve the ability of the features extracted by the liveness detection model to express the liveness category, and further improve the accuracy of the liveness detection results obtained based on the features during the application stage.
[0055] The training method provided in this application is illustrated below with an example:
[0056] Refer also to Figure 3, which is a schematic diagram of the training structure of a liveness detection model. As shown in Figure 3, the liveness detection model includes a first feature extraction branch, a second feature extraction branch, a translation layer, a conversion layer, and a liveness discriminator. The first feature extraction branch includes two separate first feature extraction sub-branches: a frequency domain feature extraction sub-branch and a convolutional feature extraction sub-branch. The convolutional feature extraction sub-branch includes several first convolutional feature extraction layers M1i, and the second feature extraction branch includes several second convolutional feature extraction layers M2i.
[0057] 1) Acquire training image pairs, including visible light and near-infrared images from the same face, and liveness tags for the face. The first mode is the visible light mode, and the second mode is the near-infrared mode.
[0058] 2) Input the visible light image into the frequency domain feature extraction sub-branch and the convolution feature extraction sub-branch included in the first feature extraction branch, and input the near-infrared image into the second feature extraction branch.
[0059] 3) Processing of the first feature extraction branch: The frequency domain feature extraction sub-branch extracts the frequency domain features of the visible light image (the supervising feature of the first modality, used to supervise the second branch); M11 of the convolutional feature extraction sub-branch processes the visible light image and outputs the supervised feature of the first modality (the feature supervised by the second modality); M12 then processes the supervised feature of the first modality and outputs the first liveness detection feature. For example, M11 includes a basic feature extraction layer and a higher-level feature extraction layer. The basic feature extraction layer can use central difference convolution to process the visible light image to obtain fine-grained features, and the higher-level feature extraction layer can process the fine-grained features to obtain the supervised feature of the first modality. See Figure 4 for an example of central difference convolution processing.
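The application does not spell out the central difference convolution used by the basic feature extraction layer. The sketch below assumes the commonly published formulation, in which the response at a position is the ordinary convolution response minus `theta` times the centre pixel multiplied by the sum of the kernel weights, so that `theta = 1` aggregates pure differences from the centre. The function name, the single-patch interface, and the `theta` parameter are illustrative assumptions.

```python
from typing import List

Patch = List[List[float]]


def central_difference_conv(patch: Patch, kernel: Patch, theta: float = 1.0) -> float:
    """Central difference convolution response for one patch (assumed formulation).

    Equivalent to sum(w * (x - theta * center)) over the patch.
    """
    center = patch[len(patch) // 2][len(patch[0]) // 2]
    # Ordinary convolution term: weighted sum of the patch values.
    vanilla = sum(w * x for p_row, k_row in zip(patch, kernel) for x, w in zip(p_row, k_row))
    # Central difference term: centre value scaled by the total kernel weight.
    kernel_sum = sum(sum(row) for row in kernel)
    return vanilla - theta * center * kernel_sum
```

With `theta = 0` the function reduces to an ordinary convolution, while with `theta = 1` a perfectly flat patch yields a zero response, which is why the operator emphasises fine-grained local variation.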
[0060] 4) Processing of the second feature extraction branch: M21 processes the near-infrared image and outputs one of the two cross-modal supervision features of the second modality (the supervising feature and the supervised feature); M22 then processes that output and produces the other; M23 then processes the output of M22 and outputs the second liveness detection feature. The network structure of M21 can be similar to that of M11.
[0061] 5) Processing of the translation layer: The supervised feature of the first modality (the convolutional feature output by M11) is translated from the first modality into the second modality to obtain the translated supervised feature of the first modality.
[0062] 6) Processing of the conversion layer: The supervised feature of the second modality is converted to the frequency domain to obtain the converted supervised feature of the second modality.
[0063] 7) Based on the difference between the supervising feature of the first modality (the frequency domain feature) and the converted supervised feature of the second modality, construct the cross-modal supervision loss corresponding to the second feature extraction branch; based on the difference between the translated supervised feature of the first modality and the supervising feature of the second modality, construct the cross-modal supervision loss corresponding to the first feature extraction branch. The specific formula can be as follows:
[0064] $$L_{GD1} = \frac{1}{n}\sum_{j=1}^{n}\left(\hat{s}^{(2)}_{j} - u^{(1)}_{j}\right)^{2}$$
[0065] $$L_{GD2} = \frac{1}{n}\sum_{j=1}^{n}\left(\hat{s}^{(1)}_{j} - u^{(2)}_{j}\right)^{2}$$
[0066] Here, $L_{GD1}$ and $L_{GD2}$ represent the cross-modal supervision losses for the second feature extraction branch and the first feature extraction branch, respectively, and $n$ represents the total number of dimensions. $\hat{s}^{(2)}_{j}$ represents the j-th dimension of the converted supervised feature of the second modality; $u^{(1)}_{j}$ represents the j-th dimension of the supervising feature of the first modality (the frequency domain feature); $\hat{s}^{(1)}_{j}$ represents the j-th dimension of the translated supervised feature of the first modality; and $u^{(2)}_{j}$ represents the j-th dimension of the supervising feature of the second modality.
[0067] 8) The first and second liveness detection features are fused to obtain liveness detection features. The liveness discriminator obtains the liveness detection result based on the liveness detection features. Based on the difference between the liveness detection result and the liveness label, a liveness detection loss is constructed. The specific formula can be as follows:
[0068] $$L_{cls} = -\sum_{j=1}^{C} p_{j}\log \hat{p}_{j}$$
[0069] Here, $L_{cls}$ represents the liveness detection loss, and $C$ represents the total number of liveness categories. $p_{j}$ represents the probability of the j-th liveness category in the liveness label, and $\hat{p}_{j}$ represents the probability of the j-th liveness category in the liveness detection result.
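A minimal sketch of the liveness detection loss follows, assuming it takes the common cross-entropy form over the C liveness categories, with the liveness label given as a probability vector (for example one-hot). The function name and the `eps` clamp that guards against taking the logarithm of zero are assumptions added for illustration.

```python
import math
from typing import List


def liveness_detection_loss(
    label_probs: List[float],
    predicted_probs: List[float],
    eps: float = 1e-12,
) -> float:
    """Cross-entropy between the liveness label and the predicted probabilities."""
    if len(label_probs) != len(predicted_probs):
        raise ValueError("label and prediction must cover the same categories")
    # Clamp predictions away from zero so that log() is always defined.
    return -sum(y * math.log(max(p, eps)) for y, p in zip(label_probs, predicted_probs))
```

With two categories (live and non-live), a one-hot label of `[1.0, 0.0]` and a prediction of `[0.5, 0.5]` give a loss of log 2, and the loss falls towards zero as the predicted probability of the true category approaches 1.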
[0070] 9) Weight each loss term to obtain the final loss of the liveness detection model; adjust the parameters of the liveness detection model based on the final loss. The specific formula for the final loss can be as follows:
[0071] $L_{all} = \gamma_1 L_{GD1} + \gamma_2 L_{GD2} + \gamma_3 L_{cls}$;
[0072] $\gamma_1$, $\gamma_2$, and $\gamma_3$ can be set according to requirements. For example, $\gamma_1$ and $\gamma_2$ can be set to 0.4, and $\gamma_3$ to 0.6.
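As a worked illustration of the weighting, take the example weights above together with purely hypothetical loss values $L_{GD1} = 0.5$, $L_{GD2} = 0.25$, and $L_{cls} = 1.0$ (these numbers are invented for the arithmetic and are not results from the application):

```latex
\begin{aligned}
L_{all} &= \gamma_1 L_{GD1} + \gamma_2 L_{GD2} + \gamma_3 L_{cls} \\
        &= 0.4 \times 0.5 + 0.4 \times 0.25 + 0.6 \times 1.0 \\
        &= 0.2 + 0.1 + 0.6 \\
        &= 0.9
\end{aligned}
```

In this example the liveness detection term contributes two thirds of the final loss, reflecting its larger weight.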
[0073] It should be noted that in some embodiments, the number of modalities in the training image pairs can be adaptively increased, that is, the training image pairs can be expanded into training image groups, and the training image groups include images of at least three modalities. Correspondingly, the feature extraction branch of the liveness detection model can be adaptively increased, that is, the liveness detection model includes at least three feature extraction branches. The processing after the increase can be derived based on the case of two modalities, and will not be elaborated here.
[0074] The liveness detection model trained using any of the above embodiments can be used in liveness detection methods. Specifically, it can be as follows:
[0075] Figure 5 is a schematic flowchart of an embodiment of the liveness detection method of this application. It should be noted that, provided substantially the same result is obtained, this embodiment is not limited to the process sequence shown in Figure 5. As shown in Figure 5, this embodiment may include:
[0076] S31: Obtain the image of the object to be detected.
[0077] The image of the object to be detected includes a first modality image to be detected and/or a second modality image to be detected.
[0078] The execution subject of the liveness detection method embodiment of this application is an application device, which can be an electronic device in the form of a computer, mobile phone, server, etc.
[0079] S32: Use a liveness detection model to extract features from the image of the object to be detected, and obtain liveness detection features.
[0080] The liveness detection features include a first liveness detection feature and / or a second liveness detection feature of the object to be detected. The first liveness detection feature can be obtained by using the first feature extraction branch to extract features from the first-modality image to be detected, and the second liveness detection feature can be obtained by using the second feature extraction branch to extract features from the second-modality image to be detected.
[0081] S33: Obtain the liveness detection result of the object to be detected based on the liveness detection features.
[0082] When the liveness detection features include a first liveness detection feature and a second liveness detection feature, the first liveness detection feature and the second liveness detection feature can be fused, and the fused result can be used to perform liveness detection to obtain the liveness detection result.
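Steps S31 to S33 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the names `detect_liveness`, `fuse`, `toy_branch`, and `toy_classifier` are hypothetical, the toy branches and classifier stand in for the trained feature extraction branches and classification head, and element-wise averaging is an assumed fusion rule, since this embodiment does not fix a particular fusion operation.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(first_feat, second_feat):
    """Fuse the two liveness detection features.

    Assumption: element-wise averaging when both modalities are present;
    when only one modality is available, its feature is used directly.
    """
    if first_feat is None and second_feat is None:
        raise ValueError("at least one modality image is required")
    if first_feat is None:
        return list(second_feat)
    if second_feat is None:
        return list(first_feat)
    return [(a + b) / 2.0 for a, b in zip(first_feat, second_feat)]


def detect_liveness(first_img, second_img, branch1, branch2, classifier):
    """S31: images are passed in; S32: extract features; S33: classify."""
    f1 = branch1(first_img) if first_img is not None else None
    f2 = branch2(second_img) if second_img is not None else None
    fused = fuse(f1, f2)
    return softmax(classifier(fused))


# Toy stand-ins (hypothetical) for the trained branches and classifier.
def toy_branch(img):
    flat = [p for row in img for p in row]
    return [sum(flat) / len(flat), max(flat) - min(flat)]


def toy_classifier(feat):
    # Returns raw scores for two assumed categories: [live, non-live].
    return [feat[0] - feat[1], feat[1] - feat[0]]


visible = [[0.2, 0.4], [0.6, 0.8]]  # first-modality image to be detected
nir = [[0.1, 0.3], [0.5, 0.7]]      # second-modality image to be detected
probs = detect_liveness(visible, nir, toy_branch, toy_branch, toy_classifier)
print(probs)
```

Averaging keeps the fused feature the same length whether one or both modalities are supplied, which lets a single classifier serve the "and / or" cases of S31; other fusion rules, such as concatenation, would require a classifier sized for each case.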
[0083] For further detailed descriptions of this embodiment, please refer to the preceding embodiments, which will not be repeated here.
[0084] In this embodiment, because the liveness detection model is trained using the training method of the preceding embodiments, the liveness detection features it extracts have strong expressive power and high accuracy, so the resulting liveness detection result is highly accurate.
[0085] Figure 6 is a schematic diagram of the structure of an embodiment of the electronic device of this application. As shown in Figure 6, the electronic device includes a processor 21 and a memory 22 coupled to the processor 21.
[0086] The memory 22 stores program instructions for implementing the methods of any of the above embodiments; the processor 21 executes the program instructions stored in the memory 22 to implement the steps of the above method embodiments. The processor 21 may also be referred to as a CPU (Central Processing Unit). The processor 21 may be an integrated circuit chip with signal processing capabilities. The processor 21 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. The general-purpose processor may be a microprocessor or any conventional processor.
[0087] In this embodiment, the electronic device may be the aforementioned training device or application device.
[0088] Figure 7 is a schematic diagram of the structure of an embodiment of the computer-readable storage medium of this application. As shown in Figure 7, the computer-readable storage medium 30 of this embodiment stores program instructions 31 which, when executed, implement the methods provided in the above embodiments of this application. The program instructions 31 can form a program file and be stored in the computer-readable storage medium 30 in the form of a software product, so that a computer device (which may be a personal computer, server, network device, etc.) or a processor can execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned computer-readable storage medium 30 includes various media capable of storing program code, such as a USB flash drive, portable hard drive, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk, or terminal devices such as computers, servers, mobile phones, and tablets.
[0089] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of units is only a logical functional division, and other divisions are possible in actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of other forms.
[0090] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated units described above can be implemented in hardware or as software functional units. The above are merely embodiments of this application and do not limit the patent scope of this application. Any equivalent structural or procedural transformations made based on the description and drawings of this application, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of this application.
Claims
1. A training method for a liveness detection model, characterized in that it comprises: acquiring a training image pair, the training image pair including a first modality image and a second modality image, the first modality image and the second modality image being derived from training objects of the same liveness category; extracting features from the first modality image using a first feature extraction branch of the liveness detection model to obtain a first feature, the first feature including a first cross-modal supervision feature, the first cross-modal supervision feature including a supervision feature of the first modality, the supervision feature of the first modality being a frequency domain feature; extracting features from the second modality image using a second feature extraction branch of the liveness detection model to obtain a second feature, the second feature including a second cross-modal supervision feature, the second cross-modal supervision feature including a supervised feature of the second modality; constructing a cross-modal supervision loss based on the first cross-modal supervision feature and the second cross-modal supervision feature, including: transforming the supervised feature of the second modality to the frequency domain, and constructing the cross-modal supervision loss corresponding to the second feature extraction branch based on the difference between the supervision feature of the first modality and the transformed supervised feature of the second modality; and adjusting the parameters of the liveness detection model based at least on the cross-modal supervision loss.
2. The method according to claim 1, characterized in that the first cross-modal supervision feature includes a supervised feature of the first modality, and the second cross-modal supervision feature includes a supervision feature of the second modality; and constructing the cross-modal supervision loss based on the first cross-modal supervision feature and the second cross-modal supervision feature includes: translating the supervised feature of the first modality from the first modality into the second modality; and constructing the cross-modal supervision loss corresponding to the first feature extraction branch based on the difference between the translated supervised feature of the first modality and the supervision feature of the second modality.
3. The method according to claim 2, characterized in that the first modality image is a visible light image, and the second modality image is a near-infrared image.
4. The method according to claim 1, characterized in that the first feature further includes a first liveness detection feature, the second feature further includes a second liveness detection feature, the training image pair further includes a liveness label representing the true liveness category of the training object, and the method further includes: fusing the first liveness detection feature and the second liveness detection feature to obtain a liveness detection feature; performing liveness detection on the training object based on the liveness detection feature to obtain a liveness detection result of the training object; and constructing a liveness detection loss based on the difference between the liveness detection result and the liveness label; wherein adjusting the parameters of the liveness detection model based at least on the cross-modal supervision loss includes: adjusting the parameters of the liveness detection model based on the liveness detection loss and the cross-modal supervision loss.
5. The method according to claim 4, characterized in that adjusting the parameters of the liveness detection model based on the liveness detection loss and the cross-modal supervision loss includes: adjusting the parameters of the liveness detection model based on the cross-modal supervision loss to obtain a liveness detection model that satisfies a cross-modal supervision condition; and adjusting the parameters of the liveness detection model that satisfies the cross-modal supervision condition based on the liveness detection loss to obtain a liveness detection model that satisfies both the cross-modal supervision condition and a liveness detection condition.
6. The method according to claim 5, characterized in that the first feature extraction branch includes two separate first feature extraction sub-branches; the first cross-modal supervision feature further includes a supervised feature of the first modality; the supervised feature of the first modality and the first liveness detection feature are extracted by the same first feature extraction sub-branch; and the frequency-domain supervision feature of the first modality and the supervised feature of the first modality are extracted by different first feature extraction sub-branches; and / or the second feature extraction branch includes several second feature extraction layers connected in sequence; the second cross-modal supervision feature further includes a supervision feature of the second modality; and the supervision feature of the second modality, the supervised feature of the second modality, and the second liveness detection feature are extracted by different second feature extraction layers.
7. A liveness detection method, characterized in that it comprises: acquiring an image of an object to be detected, wherein the image of the object to be detected includes a first-modality image to be detected and / or a second-modality image to be detected; extracting features from the image of the object to be detected using a liveness detection model to obtain liveness detection features; and obtaining a liveness detection result of the object to be detected based on the liveness detection features; wherein the liveness detection model is trained using the method of any one of claims 1-6.
8. An electronic device, characterized in that it includes a processor and a memory connected to the processor, wherein the memory stores program instructions, and the processor is configured to execute the program instructions stored in the memory to implement the method of any one of claims 1-7.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program instructions that can be executed by a processor to implement the method of any one of claims 1-7.
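For illustration only, the frequency-domain cross-modal supervision loss recited in claim 1 can be sketched in Python as follows. The sketch rests on assumptions that the claims do not fix: features are flattened to one-dimensional lists, the frequency-domain transform is a discrete Fourier transform reduced to magnitudes, and the "difference" is measured as a mean squared error. The names `dft_magnitude` and `cross_modal_supervision_loss` are hypothetical.

```python
import cmath


def dft_magnitude(signal):
    """Naive discrete Fourier transform; returns the magnitude spectrum.

    Assumption: the transform to the frequency domain is a DFT. The claims
    only require that the feature be transformed to the frequency domain.
    """
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def cross_modal_supervision_loss(supervision_freq, supervised_feat):
    """Loss for the second feature extraction branch.

    supervision_freq: frequency-domain feature from the first modality.
    supervised_feat: feature from the second modality, not yet transformed.
    Assumption: the difference is measured as a mean squared error.
    """
    transformed = dft_magnitude(supervised_feat)
    if len(transformed) != len(supervision_freq):
        raise ValueError("features must have the same length")
    squared = [(a - b) ** 2 for a, b in zip(supervision_freq, transformed)]
    return sum(squared) / len(squared)


# A unit impulse has a flat magnitude spectrum, so a flat supervision
# feature of ones yields a loss of zero in this toy example.
loss = cross_modal_supervision_loss([1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0])
print(loss)
```

In an actual training setup this computation would be expressed with differentiable tensor operations so that the loss can be used to adjust the parameters of the second feature extraction branch.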