A liveness detection method, apparatus and device
By generating perturbation source domain images and adjusting the network parameters of the liveness detection model, the problems of low accuracy and poor cross-domain applicability in liveness detection are solved, achieving high-accuracy liveness detection and cross-domain generalization.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HANGZHOU HIKVISION DIGITAL TECHNOLOGY CO LTD
- Filing Date
- 2023-04-26
- Publication Date
- 2026-06-23
AI Technical Summary
Existing liveness detection technologies have low accuracy in verifying whether the current operation is a real live operation, and are prone to model degradation or failure in cross-domain situations, making them unsuitable for all scenarios.
By acquiring source domain images and natural images, frequency domain transformation is performed to generate perturbed source domain images. The network parameters of the initial liveness detection model are adjusted to obtain the target liveness detection model, which is then used for liveness detection.
It improves the accuracy of liveness detection, enhances the model's generalization ability in cross-domain scenarios, and can maintain a high detection accuracy even with limited data in multiple scenarios.
[Figure CN116486493B_ABST: abstract figure]
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, and in particular to a method, apparatus and equipment for liveness detection.
Background Technology
[0002] With the development of identity verification technology, identity verification systems are used for identification in various application scenarios such as banks, train stations, and airports. However, identity verification systems are vulnerable to attacks, leading to security risks. To ensure the stability and security of identity verification systems, liveness detection is gradually becoming more widespread.
[0003] Liveness detection is a method to determine a user's true physiological characteristics. Liveness detection can verify whether the current operation is a real live operation by using technologies such as key point localization and key point detection through combined actions such as blinking, opening the mouth, shaking the head, and nodding. It can effectively resist attacks such as photos, videos, face swaps, masks, occlusions, 3D animations, and screen replays, thereby helping users identify attack behaviors and protecting their interests.
[0004] However, when techniques such as key point localization and key point detection are combined with actions such as blinking, opening the mouth, shaking the head, and nodding to verify whether the current operation is a real live operation, the accuracy of liveness detection is relatively low; that is, it cannot be accurately determined whether the current operation is a real live operation.
Summary of the Invention
[0005] This application provides a method for detecting liveness, the method comprising:
[0006] Acquire source domain images and natural images, wherein the source domain images include live objects and / or non-live objects;
[0007] A source domain spectrum map is obtained by performing a frequency domain transformation on the source domain image, and a natural spectrum map is obtained by performing a frequency domain transformation on the natural image. A mixed spectrum map is generated based on the source domain spectrum map and the natural spectrum map, and a perturbed source domain image is obtained by performing an inverse frequency domain transformation on the mixed spectrum map.
[0008] The network parameters of the initial liveness detection model are adjusted based on the source domain image and the perturbed source domain image to obtain the adjusted liveness detection model, and a target liveness detection model is determined based on the adjusted liveness detection model; wherein, the target liveness detection model is used to perform liveness detection on the image to be detected.
[0009] This application provides a liveness detection device, the device comprising:
[0010] An acquisition module is used to acquire source domain images and natural images; wherein, the source domain images include live objects and / or non-live objects;
[0011] The processing module is configured to perform frequency domain transformation on the source domain image to obtain a source domain spectrum map, perform frequency domain transformation on the natural image to obtain a natural spectrum map, generate a mixed spectrum map based on the source domain spectrum map and the natural spectrum map, and perform inverse frequency domain transformation on the mixed spectrum map to obtain a perturbed source domain image;
[0012] The training module is used to adjust the network parameters of the initial liveness detection model based on the source domain image and the perturbed source domain image to obtain the adjusted liveness detection model, and to determine the target liveness detection model based on the adjusted liveness detection model; the target liveness detection model is used to perform liveness detection on the image to be detected.
[0013] This application provides an electronic device, including: a processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions that can be executed by the processor; the processor is used to execute the machine-executable instructions to implement the liveness detection method of the above example.
[0014] As can be seen from the above technical solutions, in this embodiment a large number of simulated perturbed source domain images are constructed from source domain images and natural images. The initial liveness detection model is then trained using the source domain images and the perturbed source domain images to obtain the target liveness detection model. After an image to be detected is acquired, it is input into the target liveness detection model, which performs liveness detection on it and thereby verifies, with relatively high accuracy, whether the current operation is a genuine live operation. When multi-scene data is limited, perturbed source domain images for multiple scenes can be simulated from a small number of source domain images and a large number of natural images, which enriches the source domain data. Training on a large number of simulated perturbed source domain images improves the domain generalization ability of the target liveness detection model. This addresses the cross-domain (cross-scene, cross-device) problem of liveness detection, in which a model is prone to degradation or even complete failure when applied across domains: even when the target domain is unknown, the target liveness detection model can still maintain relatively high accuracy in the target domain.
Attached Figure Description
[0015] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments of this application or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this application. For those skilled in the art, other drawings can be obtained based on these drawings of the embodiments of this application.
[0016] Figure 1 is a flowchart of a liveness detection method according to one embodiment of this application;
[0017] Figure 2 is a schematic diagram of the simulated generation of a perturbed source domain image in one embodiment of this application;
[0018] Figure 3 is a schematic diagram of the structure of a liveness detection model in one embodiment of this application;
[0019] Figure 4 is a schematic diagram of the structure of a dynamic module network in one embodiment of this application;
[0020] Figure 5 is a schematic diagram of the structure of a dynamic module network in one embodiment of this application;
[0021] Figure 6 is a schematic diagram of the structure of a dynamic module network in one embodiment of this application;
[0022] Figure 7 is a schematic diagram of the structure of a liveness detection device according to one embodiment of this application;
[0023] Figure 8 is a hardware structure diagram of an electronic device according to one embodiment of this application.
Detailed Implementation
[0024] The terminology used in the embodiments of this application is for the purpose of describing particular embodiments only and is not intended to limit the application. The singular forms “a,” “said,” and “the” as used in this application and claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and / or” as used herein refers to any and all possible combinations comprising one or more of the associated listed items.
[0025] It should be understood that although the terms first, second, third, etc., may be used to describe various information in embodiments of this application, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" may also be interpreted as "at the time of," "when," or "in response to determining."
[0026] This application proposes a liveness detection method. As shown in Figure 1, the method may include:
[0027] Step 101: Obtain source domain image and natural image; wherein, source domain image may include live objects and / or non-live objects.
[0028] Step 102: Perform frequency domain transformation on the source domain image to obtain the source domain spectrum map, perform frequency domain transformation on the natural image to obtain the natural spectrum map, generate a mixed spectrum map based on the source domain spectrum map and the natural spectrum map, and perform inverse frequency domain transformation on the mixed spectrum map to obtain the perturbed source domain image.
[0029] For example, generating a mixed spectrum map based on the source domain spectrum map and the natural spectrum map may include, but is not limited to, the following. If the source domain spectrum map includes a source domain amplitude spectrum and a source domain phase spectrum, and the natural spectrum map includes a natural amplitude spectrum and a natural phase spectrum, then a perturbation amplitude spectrum can be generated based on the source domain amplitude spectrum, a first perturbation coefficient of the source domain amplitude spectrum, the natural amplitude spectrum, and a second perturbation coefficient of the natural amplitude spectrum; wherein the sum of the first perturbation coefficient and the second perturbation coefficient is a fixed value, and the second perturbation coefficient is determined based on a configured perturbation intensity. The mixed spectrum map is then generated based on the perturbation amplitude spectrum and the source domain phase spectrum.
[0030] Step 103: Adjust the network parameters of the initial liveness detection model based on the source domain image and the perturbed source domain image to obtain the adjusted liveness detection model, and determine the target liveness detection model based on the adjusted liveness detection model; wherein, the target liveness detection model is used to perform liveness detection on the image to be detected.
[0031] For example, the initial liveness detection model may include a feature extraction network, a dynamic module network, and a classifier network. The network parameters of the initial liveness detection model are adjusted based on the source domain image and the perturbed source domain image to obtain an adjusted liveness detection model. This may include, but is not limited to: inputting the source domain image into the feature extraction network to obtain source domain image features; inputting the source domain image features into the dynamic module network to obtain source domain dynamic features; inputting the source domain dynamic features into the classifier network to obtain source domain output features; determining a first loss value based on the source domain output features; determining a target loss value based on the first loss value; and adjusting the network parameters of the feature extraction network, the dynamic module network, and the classifier network based on the target loss value to obtain an intermediate liveness detection model. The perturbation source domain image can be input into the feature extraction network of the intermediate liveness detection model to obtain perturbation source domain image features; the perturbation source domain image features can be input into the dynamic module network of the intermediate liveness detection model to obtain perturbation source domain dynamic features; the perturbation source domain dynamic features can be input into the classifier network of the intermediate liveness detection model to obtain perturbation source domain output features; a second loss value can be determined based on the perturbation source domain output features, and the network parameters of the feature extraction network, dynamic module network and classifier network of the intermediate liveness detection model can be adjusted based on the second loss value to obtain the adjusted liveness detection model.
[0032] For example, the dynamic module network may include a domain-invariant branch network and a domain-specific branch network. Inputting source domain image features into the dynamic module network yields source domain dynamic features, which may include, but is not limited to: inputting the source domain image features into the domain-invariant branch network to obtain domain-invariant features; wherein, the domain-invariant features are common features across multiple domains determined based on the source domain image features. Inputting the source domain image features into the domain-specific branch network to obtain domain-specific features; wherein, the domain-specific features are unique features of a single domain determined based on the source domain image features. Based on the domain-invariant features and the domain-specific features, source domain dynamic features are generated.
[0033] For example, a domain-specific branch network may include a dynamic adapter and K convolutional layers, where K is a positive integer greater than 1. Inputting source domain image features into the domain-specific branch network to obtain domain-specific features may include, but is not limited to: inputting source domain image features into the dynamic adapter, which generates K weight values corresponding to the K convolutional layers, with each weight value corresponding to one of the K convolutional layers; inputting source domain image features into each convolutional layer, which generates convolutional features based on the source domain image features and its corresponding weight values; and generating domain-specific features based on the K convolutional features corresponding to the K convolutional layers.
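The domain-specific branch described above can be sketched in NumPy as follows. This is a minimal illustration under several assumptions that the text does not state: the dynamic adapter is modeled as global average pooling followed by a linear map and a softmax, each of the K convolutional layers is a 1x1 convolution, and the K weighted convolutional features are combined by summation. The names `dynamic_adapter`, `domain_specific_branch`, `conv_kernels`, and `w_adapter` are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def dynamic_adapter(features, w_adapter):
    """Map (C, H, W) features to K weights (assumed: pool -> linear -> softmax)."""
    pooled = features.mean(axis=(1, 2))          # global average pool, shape (C,)
    return softmax(w_adapter @ pooled)           # one weight per conv layer, shape (K,)

def domain_specific_branch(features, conv_kernels, w_adapter):
    """conv_kernels has shape (K, C_out, C_in): K assumed 1x1 convolutions."""
    weights = dynamic_adapter(features, w_adapter)
    # K convolutional features, one per convolutional layer: (K, C_out, H, W)
    conv_out = np.einsum("koc,chw->kohw", conv_kernels, features)
    # Assumed combination: weighted sum of the K convolutional features
    specific = np.einsum("k,kohw->ohw", weights, conv_out)
    return specific, weights

rng = np.random.default_rng(0)
K, c_in, c_out = 3, 4, 5
features = rng.standard_normal((c_in, 6, 6))     # stand-in for source domain image features
conv_kernels = rng.standard_normal((K, c_out, c_in))
w_adapter = rng.standard_normal((K, c_in))
specific, weights = domain_specific_branch(features, conv_kernels, w_adapter)
```

Because the weights depend on the input features, different inputs select different mixtures of the K convolutional layers, which is one way to read "dynamic" here.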
[0034] For example, determining the target loss value based on the first loss value may include, but is not limited to: determining the minimum entropy loss value and the difference regularization loss value based on K weight values; determining the third loss value based on the minimum entropy loss value and the difference regularization loss value; and determining the target loss value based on the first loss value and the third loss value.
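This passage does not give the formulas for the two loss terms. As a hedged illustration only, the sketch below assumes that the minimum entropy loss is the Shannon entropy of the K weight values (treated as a probability vector), so that minimizing it pushes the adapter toward a confident selection among the K convolutional layers. The difference regularization loss and the way the loss values are combined are not sketched because their form is not specified here. The name `entropy_of_weights` is hypothetical.

```python
import numpy as np

def entropy_of_weights(weights, eps=1e-12):
    """Shannon entropy of K weights that are assumed to be non-negative and sum to 1."""
    w = np.asarray(weights, dtype=float)
    return float(-np.sum(w * np.log(w + eps)))

uniform = np.full(4, 0.25)                  # undecided adapter: maximal entropy
peaked = np.array([1.0, 0.0, 0.0, 0.0])     # confident adapter: entropy near zero
h_uniform = entropy_of_weights(uniform)
h_peaked = entropy_of_weights(peaked)
```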
[0035] For example, the domain-invariant branch network may include convolutional layers, instance normalization layers, and activation layers. Inputting source domain image features into the domain-invariant branch network to obtain domain-invariant features may include, but is not limited to: inputting the source domain image features into the convolutional layer to obtain convolutional features, inputting the convolutional features into the instance normalization layer to obtain normalized features, and inputting the normalized features into the activation layer to obtain activated features. Based on these activated features, domain-invariant features can be generated.
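The convolution, instance normalization, and activation sequence above can be sketched as follows. The sketch assumes a 1x1 convolution, instance normalization without learnable scale and shift, and ReLU as the activation; none of these specifics is fixed by the text, and the names `instance_norm` and `domain_invariant_branch` are hypothetical.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each channel of one (C, H, W) sample over its spatial axes."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def domain_invariant_branch(features, kernel):
    """kernel has shape (C_out, C_in): an assumed 1x1 convolution."""
    conv = np.einsum("oc,chw->ohw", kernel, features)   # convolutional features
    normalized = instance_norm(conv)                    # normalized features
    return np.maximum(normalized, 0.0)                  # activated features (ReLU assumed)

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 6, 6))    # stand-in for source domain image features
kernel = rng.standard_normal((3, 4))
invariant = domain_invariant_branch(features, kernel)
```

Instance normalization removes per-image, per-channel mean and variance, which are statistics that tend to carry style rather than content; this is a common reason for using it in a branch intended to be domain-invariant.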
[0036] For example, determining the target liveness detection model based on the adjusted liveness detection model may include, but is not limited to: if the adjusted liveness detection model has converged, then the adjusted liveness detection model can be used as the target liveness detection model; if the adjusted liveness detection model has not converged, then the adjusted liveness detection model can be used as the initial liveness detection model, and the operation of adjusting the network parameters of the initial liveness detection model based on the source domain image and the perturbed source domain image can be performed to obtain the adjusted liveness detection model.
[0037] For example, after training the target liveness detection model, the target liveness detection model can be deployed to the terminal device. After the terminal device acquires the image to be detected, it can input the image to be detected into the target liveness detection model so that the target liveness detection model can perform liveness detection on the image to be detected, thereby verifying whether the current operation is a real live operation. There are no restrictions on this liveness detection process.
[0038] As can be seen from the above technical solutions, in this embodiment a large number of simulated perturbed source domain images are constructed from source domain images and natural images. The initial liveness detection model is then trained using the source domain images and the perturbed source domain images to obtain the target liveness detection model. After an image to be detected is acquired, it is input into the target liveness detection model, which performs liveness detection on it and thereby verifies, with relatively high accuracy, whether the current operation is a genuine live operation. When multi-scene data is limited, perturbed source domain images for multiple scenes can be simulated from a small number of source domain images and a large number of natural images, which enriches the source domain data. Training on a large number of simulated perturbed source domain images improves the domain generalization ability of the target liveness detection model. This addresses the cross-domain (cross-scene, cross-device) problem of liveness detection, in which a model is prone to degradation or even complete failure when applied across domains: even when the target domain is unknown, the target liveness detection model can still maintain relatively high accuracy in the target domain.
[0039] The technical solutions described above in the embodiments of this application will be explained below in conjunction with specific application scenarios.
[0040] Liveness detection is a method to determine a user's true physiological characteristics, verifying whether the current operation is a genuine live operation. To implement liveness detection, a liveness detection model can be trained, and liveness detection can be performed based on this model, thereby verifying whether the current operation is a genuine live operation.
[0041] To make a liveness detection model applicable to all scenarios, it's necessary to collect datasets for each scenario and train the model based on these datasets. However, due to the vast number of real-world scenarios, collecting datasets for each scenario is extremely time-consuming, resulting in long training times and low efficiency. Furthermore, collecting datasets for each scenario requires significant resources, leading to high resource consumption. Even with substantial time and resources, it's impossible to collect data for all scenarios, resulting in poor performance of the liveness detection model and its inability to be applied to all situations. In some scenarios, the accuracy of liveness detection is relatively low when using the liveness detection model.
[0042] In view of the above, in this embodiment, when multi-scene data is limited, only a small number of source domain images (i.e., datasets for a small number of scenes) need to be collected, together with a large number of natural images (for example, natural images from public datasets that cover a sufficiently rich range of scenes). Perturbed source domain images for multiple scenes (i.e., datasets for a large number of scenes) can then be generated by simulation, thereby enriching the source domain data. Because only a small number of source domain images need to be collected, data collection takes little time and few resources, so the liveness detection model can be trained efficiently. Training the liveness detection model on a large number of perturbed source domain images yields a high-performance model applicable to all or most scenes. This improves the domain generalization ability of the liveness detection model and achieves high accuracy in liveness detection.
[0043] Figure 2 is a schematic diagram of the process of simulating the generation of a perturbed source domain image. This process may include:
[0044] Step 201: Obtain source domain image and natural image; wherein, source domain image may include living objects and / or non-living objects.
[0045] For example, for a large number of scenarios, only a small number of scenario datasets can be collected. These datasets are called source domain datasets, and the data in the source domain datasets are called source domain images. Source domain images can be labeled images, which are sample images used to train the liveness detection model.
[0046] For example, although it is difficult to obtain live samples from different domains (different scenes), a large number of natural images from public datasets, such as the ImageNet dataset and / or the COCO dataset, can be obtained. These natural images can cover a wide range of scenes and have rich domain style information, which helps to improve the generalization ability of the liveness detection model.
[0047] Step 202: Perform frequency domain transformation on the source domain image to obtain a source domain spectrum, which may include a source domain amplitude spectrum and a source domain phase spectrum; perform frequency domain transformation on the natural image to obtain a natural spectrum, which may include a natural amplitude spectrum and a natural phase spectrum.
[0048] For example, some or all of the source domain images can be selected, and a frequency domain transformation can be performed on each selected source domain image (hereinafter referred to as source domain image $X_s$). For example, the FFT algorithm can be used to perform a Fourier transform on the source domain image $X_s$ to obtain the source domain spectrum corresponding to $X_s$, which may include the source domain amplitude spectrum $A(X_s)$ and the source domain phase spectrum $P(X_s)$.
[0049] For example, some or all of the natural images can be selected, and a frequency domain transformation can be performed on each selected natural image (hereinafter referred to as natural image $X_n$). For example, the FFT algorithm can be used to perform a Fourier transform on the natural image $X_n$ to obtain the natural spectrum corresponding to $X_n$, which may include the natural amplitude spectrum $A(X_n)$ and the natural phase spectrum $P(X_n)$.
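As a minimal sketch of this decomposition, the following uses NumPy's 2-D FFT to split an image into an amplitude spectrum and a phase spectrum and then recombines them. The (H, W, C) array layout, the random stand-in image, and the helper names `to_amplitude_phase` and `from_amplitude_phase` are assumptions for illustration.

```python
import numpy as np

def to_amplitude_phase(image):
    """Return (amplitude, phase) of the 2-D FFT taken over the two spatial axes."""
    spectrum = np.fft.fft2(image, axes=(0, 1))
    return np.abs(spectrum), np.angle(spectrum)

def from_amplitude_phase(amplitude, phase):
    """Recombine amplitude and phase into a spectrum and invert the FFT."""
    spectrum = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(spectrum, axes=(0, 1)))

rng = np.random.default_rng(0)
x_s = rng.random((8, 8, 3))                  # stand-in for a source domain image X_s
amp_s, pha_s = to_amplitude_phase(x_s)       # A(X_s) and P(X_s)
x_rec = from_amplitude_phase(amp_s, pha_s)   # unperturbed round trip
```

With no perturbation applied, the round trip reproduces the original image up to floating-point error, which confirms that the amplitude and phase together carry all of the image information.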
[0050] Step 203: Generate a perturbation amplitude spectrum based on the source domain amplitude spectrum, the first perturbation coefficient of the source domain amplitude spectrum, the natural amplitude spectrum, and the second perturbation coefficient of the natural amplitude spectrum; wherein, the sum of the first perturbation coefficient and the second perturbation coefficient can be a fixed value (such as 1), and the second perturbation coefficient can be determined based on the configured perturbation intensity, or the first perturbation coefficient can be determined based on the configured perturbation intensity.
[0051] For example, the phase component of the Fourier spectrum obtained by a Fourier transform preserves the high-level semantics of the original signal, while the amplitude component contains low-level statistical information; that is, the phase component has a semantics-preserving property. To make better use of the diverse data distributions in a large number of natural images, this semantics-preserving property of the frequency domain transform (such as the Fourier transform) can be used to perturb the source domain image so as to simulate possible distribution changes in different target domains, that is, to simulate the domain shift of different target domains. On this basis, the source domain image can be perturbed using natural images; for example, the source domain amplitude spectrum can be perturbed using the natural amplitude spectrum.
[0052] For example, in order to utilize the diverse styles and statistical information of the natural amplitude spectrum $A(X_n)$, the perturbation amplitude spectrum can be obtained by linear interpolation between the natural amplitude spectrum $A(X_n)$ and the source domain amplitude spectrum $A(X_s)$. Formula (1) below is one example of determining the perturbation amplitude spectrum; there is no restriction on this.
[0053] $$\hat{A}(X_s) = (1-\lambda)\,A(X_s) + \lambda\,A(X_n) \qquad (1)$$
[0054] In formula (1), $\hat{A}(X_s)$ denotes the perturbation amplitude spectrum, $\lambda$ is the second perturbation coefficient, applied to the natural amplitude spectrum $A(X_n)$, and $1-\lambda$ is the first perturbation coefficient, applied to the source domain amplitude spectrum $A(X_s)$. The sum of the first perturbation coefficient and the second perturbation coefficient is therefore 1.
[0055] In formula (1), $\lambda$ can be greater than 0 and less than $\eta$, where $\eta$ controls the perturbation intensity. The value of $\eta$ can be configured empirically, which means the value of $\lambda$ can also be configured empirically; there is no restriction on this. Once the second perturbation coefficient $\lambda$ is obtained, the first perturbation coefficient $1-\lambda$ follows. Alternatively, the first perturbation coefficient $1-\lambda$ can be configured first, and the second perturbation coefficient $\lambda$ can then be derived from it.
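The linear interpolation and the role of the perturbation intensity can be illustrated with a short sketch. It assumes that the second perturbation coefficient is drawn uniformly below the configured intensity (the text only says the value is configured empirically) and that the intensity lies in (0, 1]; the function name `mix_amplitude` is hypothetical.

```python
import numpy as np

def mix_amplitude(amp_src, amp_nat, eta, rng):
    """Interpolate two amplitude spectra with a coefficient drawn below eta."""
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta is assumed to lie in (0, 1]")
    # Second perturbation coefficient; uniform() samples the half-open range [0, eta)
    lam = rng.uniform(0.0, eta)
    # First coefficient is 1 - lam, so the two coefficients sum to 1
    mixed = (1.0 - lam) * amp_src + lam * amp_nat
    return mixed, lam

rng = np.random.default_rng(0)
amp_src = np.ones((4, 4))            # toy source domain amplitude spectrum
amp_nat = 3.0 * np.ones((4, 4))      # toy natural amplitude spectrum
amp_hat, lam = mix_amplitude(amp_src, amp_nat, eta=0.5, rng=rng)
```

A larger intensity allows the natural amplitude spectrum to contribute more, which changes the low-level statistics of the result more strongly.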
[0056] For example, perturbing the source domain image with a natural image modifies the low-level statistical information of the source domain image, making it richer, without affecting the semantic information of the source domain image.
[0057] Step 204: Generate a mixed spectrum map based on the perturbation amplitude spectrum and the source domain phase spectrum.
[0058] For example, after the perturbation amplitude spectrum is obtained, it can be recombined with the source domain phase spectrum $P(X_s)$ from the source domain spectrum map to obtain the mixed spectrum map.
[0059] Step 205: Perform an inverse frequency domain transformation on the mixed spectrum map to obtain the perturbed source domain image.
[0060] For example, after the mixed spectrum map is obtained, an inverse frequency domain transformation can be performed on it to simulate a perturbed source domain image with a rich domain style, that is, a perturbed source domain image whose statistical information has been changed. For instance, the inverse fast Fourier transform (iFFT) can be applied to the mixed spectrum map to obtain the perturbed source domain image. If the above perturbation process is performed on M source domain images and N natural images, M*N perturbed source domain images can be obtained, so a large number of perturbed source domain images is available.
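Steps 201 to 205 can be put together in a short NumPy sketch. It assumes that every source domain image and natural image has already been resized to the same shape and that a fresh coefficient is drawn uniformly for each pair; the function names `perturb` and `build_perturbed_set` are hypothetical.

```python
import itertools
import numpy as np

def perturb(x_src, x_nat, lam):
    """Mix the amplitude spectra, keep the source phase, and invert the FFT."""
    f_src = np.fft.fft2(x_src, axes=(0, 1))
    f_nat = np.fft.fft2(x_nat, axes=(0, 1))
    amp_mix = (1.0 - lam) * np.abs(f_src) + lam * np.abs(f_nat)
    mixed = amp_mix * np.exp(1j * np.angle(f_src))   # mixed spectrum with source phase
    # Pixel values may leave the original range; clipping is left to the caller
    return np.real(np.fft.ifft2(mixed, axes=(0, 1)))

def build_perturbed_set(sources, naturals, eta, rng):
    """Pair every source image with every natural image: M * N outputs."""
    return [
        perturb(s, n, rng.uniform(0.0, eta))
        for s, n in itertools.product(sources, naturals)
    ]

rng = np.random.default_rng(0)
sources = [rng.random((8, 8)) for _ in range(2)]    # M = 2 source domain images
naturals = [rng.random((8, 8)) for _ in range(3)]   # N = 3 natural images
perturbed = build_perturbed_set(sources, naturals, eta=0.5, rng=rng)
```

Each perturbed image inherits the label of its source domain image, since only the amplitude spectrum is altered and the phase spectrum, which carries the semantics, is kept.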
[0061] For example, a pre-configured initial liveness detection model can be obtained. After obtaining the perturbation source domain image, the network parameters of the initial liveness detection model can be adjusted based on the source domain image and the perturbation source domain image to obtain the adjusted liveness detection model. The target liveness detection model can then be determined based on the adjusted liveness detection model.
[0062] For example, a source domain image is input into an initial liveness detection model to obtain source domain output features. A loss value is determined based on these features, and the network parameters of the initial liveness detection model are adjusted accordingly to obtain an intermediate liveness detection model. Similarly, a perturbed source domain image is input into the intermediate liveness detection model to obtain perturbed source domain output features. A loss value is determined based on these perturbed features, and the network parameters of the intermediate liveness detection model are adjusted accordingly to obtain an adjusted liveness detection model.
[0063] For example, a perturbation source domain image is input into an initial liveness detection model to obtain perturbation source domain output features. A loss value is determined based on these perturbation source domain output features, and the network parameters of the initial liveness detection model are adjusted based on this loss value to obtain an intermediate liveness detection model. Then, the source domain image is input into the intermediate liveness detection model to obtain source domain output features. A loss value is determined based on these source domain output features, and the network parameters of the intermediate liveness detection model are adjusted based on this loss value to obtain an adjusted liveness detection model.
[0064] For example, if the adjusted liveness detection model has converged, it can be used as the target liveness detection model. If the adjusted liveness detection model has not converged, it can be used as the initial liveness detection model. Then, the operation of adjusting the network parameters of the initial liveness detection model based on the source domain image and the perturbed source domain image is performed to obtain the adjusted liveness detection model.
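The alternating update and the convergence check can be sketched with a toy model. In the sketch below, a logistic regression on feature vectors stands in for the liveness detection network, plain gradient descent stands in for the optimizer, and convergence is assumed to mean that the loss on the perturbed data changes by less than a tolerance between rounds. All of these are illustrative assumptions, and the names `bce_loss_and_grad` and `train` are hypothetical.

```python
import numpy as np

def bce_loss_and_grad(w, x, y):
    """Binary cross-entropy of a logistic model and its gradient with respect to w."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = x.T @ (p - y) / len(y)
    return loss, grad

def train(w, x_src, x_pert, y, lr=0.5, tol=1e-4, max_rounds=500):
    prev = np.inf
    for _ in range(max_rounds):
        _, g1 = bce_loss_and_grad(w, x_src, y)            # loss on source domain data
        w_mid = w - lr * g1                               # intermediate model
        loss2, g2 = bce_loss_and_grad(w_mid, x_pert, y)   # loss on perturbed data
        w = w_mid - lr * g2                               # adjusted model
        if abs(prev - loss2) < tol:                       # assumed convergence criterion
            break                                         # adjusted model becomes the target model
        prev = loss2                                      # otherwise it becomes the next initial model
    return w

rng = np.random.default_rng(0)
x_src = rng.standard_normal((64, 3))                      # stand-in for source domain samples
y = (x_src[:, 0] > 0).astype(float)                       # toy live / non-live labels
x_pert = x_src + 0.05 * rng.standard_normal((64, 3))      # stand-in for perturbed samples
w = train(np.zeros(3), x_src, x_pert, y)
final_loss, _ = bce_loss_and_grad(w, x_src, y)
```

Swapping the two updates inside the loop gives the alternative order described above, in which the perturbed data is used first and the source domain data second.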
[0065] In summary, this method simulates the generation of numerous perturbed source domain images from a small number of source domain images and a large number of natural images, using frequency domain transformation to give the perturbed source domain images rich domain styles. Based on meta-learning, it simulates the practical application of liveness detection, achieving meta-learning-based domain generalization for liveness detection. This enables the liveness detection model to achieve good domain generalization results, such as cross-device and cross-scene liveness detection performance, even with limited scene data, thus improving the domain generalization capability of the liveness detection model. Domain generalization (DG) refers to training on data from multiple source domains so that the model generalizes to unknown / unseen target domains. When samples from only a single domain are available during training, this can be called single-domain generalization.
[0066] See Figure 3 The diagram shown illustrates the structure of a liveness detection model (such as an initial liveness detection model or a target liveness detection model). This liveness detection model can include a feature extraction network, a dynamic module network, and a classifier network. Of course, Figure 3 This is merely an example of a liveness detection model and is not intended to be restrictive.
[0067] During model training, the network parameters of the feature extraction network, dynamic module network, and classifier network of the initial liveness detection model can be adjusted based on the source domain image and the perturbed source domain image to obtain the adjusted liveness detection model, and the target liveness detection model can be determined based on the adjusted liveness detection model.
[0068] For example, the source domain image is input into a feature extraction network to obtain source domain image features; the source domain image features are input into a dynamic module network to obtain source domain dynamic features; the source domain dynamic features are input into a classifier network to obtain source domain output features; a first loss value is determined based on the source domain output features, and a target loss value is determined based on the first loss value. The network parameters of the feature extraction network, dynamic module network, and classifier network are adjusted based on the target loss value to obtain an intermediate liveness detection model. Then, a perturbed source domain image is input into the feature extraction network of the intermediate liveness detection model to obtain perturbed source domain image features; the perturbed source domain image features are input into the dynamic module network of the intermediate liveness detection model to obtain perturbed source domain dynamic features; the perturbed source domain dynamic features are input into the classifier network of the intermediate liveness detection model to obtain perturbed source domain output features; a second loss value is determined based on the perturbed source domain output features, and the network parameters of the feature extraction network, dynamic module network, and classifier network of the intermediate liveness detection model are adjusted based on the second loss value to obtain an adjusted liveness detection model.
[0069] For example, the perturbation source domain image is input into a feature extraction network to obtain perturbation source domain image features; the perturbation source domain image features are input into a dynamic module network to obtain perturbation source domain dynamic features; the perturbation source domain dynamic features are input into a classifier network to obtain perturbation source domain output features; a first loss value is determined based on the perturbation source domain output features, and a target loss value is determined based on the first loss value. The network parameters of the feature extraction network, dynamic module network, and classifier network are adjusted based on the target loss value to obtain an intermediate liveness detection model. Then, the source domain image is input into the feature extraction network of the intermediate liveness detection model to obtain source domain image features; the source domain image features are input into the dynamic module network of the intermediate liveness detection model to obtain source domain dynamic features; the source domain dynamic features are input into the classifier network of the intermediate liveness detection model to obtain source domain output features; a second loss value is determined based on the source domain output features, and the network parameters of the feature extraction network, dynamic module network, and classifier network of the intermediate liveness detection model are adjusted based on the second loss value to obtain an adjusted liveness detection model.
[0070] For example, if the adjusted liveness detection model has converged, it can be used as the target liveness detection model. If the adjusted liveness detection model has not converged, it can be used as the initial liveness detection model. Then, the operation of adjusting the network parameters of the initial liveness detection model based on the source domain image and the perturbed source domain image is performed to obtain the adjusted liveness detection model.
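The alternating update and convergence check described above can be sketched as follows. This is only a minimal illustration under stated assumptions: a toy two-parameter logistic model stands in for the feature extraction, dynamic module, and classifier networks; plain gradient descent is used; and the convergence criterion (change in total loss below `tol`) is hypothetical. The names `train`, `loss_and_grad`, and `sgd_step` and all data values are illustrative and do not come from this application.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grad(params, samples):
    """Binary cross-entropy and its gradient for the toy model p = sigmoid(w*x + b)."""
    w, b = params
    loss, gw, gb = 0.0, 0.0, 0.0
    for x, y in samples:
        p = sigmoid(w * x + b)
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
        gw += (p - y) * x
        gb += (p - y)
    n = len(samples)
    return loss / n, (gw / n, gb / n)

def sgd_step(params, grad, lr):
    return tuple(p - lr * g for p, g in zip(params, grad))

def train(source, perturbed, lr=0.5, tol=1e-6, max_rounds=5000):
    params = (0.0, 0.0)  # toy "initial liveness detection model"
    prev_total = float("inf")
    for _ in range(max_rounds):
        # Step 1: loss on source domain samples -> intermediate model.
        _, grad = loss_and_grad(params, source)
        intermediate = sgd_step(params, grad, lr)
        # Step 2: loss on perturbed samples, evaluated on the intermediate
        # model -> adjusted model.
        _, grad = loss_and_grad(intermediate, perturbed)
        adjusted = sgd_step(intermediate, grad, lr)
        # Assumed convergence criterion: total loss has stopped changing.
        total = (loss_and_grad(adjusted, source)[0]
                 + loss_and_grad(adjusted, perturbed)[0])
        params = adjusted  # the adjusted model becomes the next initial model
        if abs(prev_total - total) < tol:
            break
        prev_total = total
    return params  # toy "target liveness detection model"

# (feature, label) pairs; label 1 = live, 0 = non-live. Values are made up and
# deliberately overlap so that the optimum is finite.
source = [(1.0, 1), (2.0, 1), (-0.5, 1), (-1.0, 0), (-2.0, 0), (0.5, 0)]
perturbed = [(1.2, 1), (1.7, 1), (-0.6, 1), (-0.8, 0), (-2.3, 0), (0.4, 0)]
target_model = train(source, perturbed)
```

Swapping the two steps gives the other ordering described above, in which the perturbed source domain image is used first and the source domain image second.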
[0071] During model usage, liveness detection can be performed on the image to be detected based on the target liveness detection model. For example, after training the target liveness detection model, it can be deployed to a terminal device. After the terminal device acquires the image to be detected, it can input the image to be detected into the target liveness detection model to perform liveness detection on the image to be detected, thereby verifying whether the current operation is a real live operation. There are no restrictions on this liveness detection process.
[0072] For example, the image to be detected is input into the feature extraction network to obtain the image features; the image features are input into the dynamic module network to obtain the dynamic features; the dynamic features are input into the classifier network to obtain the output features; and the liveness detection result is determined based on the output features. The liveness detection result is used to indicate whether the current operation is a real liveness operation or not a real liveness operation.
[0073] Figure 4 shows a schematic structure of a dynamic module network, which can include a domain-invariant branch network and a domain-specific branch network. Of course, Figure 4 is merely an example of a dynamic module network, and no limitations are imposed. Regarding the feature extraction network and the classifier network, this embodiment does not impose any restrictions, as long as the feature extraction network can perform feature extraction and the classifier network can perform classification.
[0074] For example, after obtaining the source domain image features (or perturbed source domain image features), the source domain image features (or perturbed source domain image features) are input into a domain-invariant branch network to obtain domain-invariant features. These domain-invariant features are common features across multiple domains determined based on the source domain image features (or perturbed source domain image features). The source domain image features (or perturbed source domain image features) are then input into a domain-specific branch network to obtain domain-specific features. These domain-specific features are unique features of a single domain determined based on the source domain image features (or perturbed source domain image features). Based on the domain-invariant features and domain-specific features, source domain dynamic features (or perturbed source domain dynamic features) are generated.
[0075] Figure 5 shows a schematic structure of a dynamic module network. The domain-specific branch network can include a dynamic adapter and K convolutional layers, where K is a positive integer greater than 1. Based on this, source domain image features (or perturbed source domain image features) can be input to the dynamic adapter, which generates K weight values corresponding to the K convolutional layers. Each convolutional layer can then generate its own convolutional features based on the source domain image features (or perturbed source domain image features) and its corresponding weight value. Domain-specific features can then be generated based on the K convolutional features corresponding to the K convolutional layers.
[0076] Figure 6 shows a schematic structure of a dynamic module network. The domain-invariant branch network can include convolutional layers, instance normalization layers, and activation layers. Based on this, source domain image features (or perturbed source domain image features) can be input into the convolutional layers to obtain convolutional features; the convolutional features are input into the instance normalization layers to obtain normalized features; and the normalized features are input into the activation layers to obtain activated features. After obtaining the activated features, domain-invariant features can be generated based on these activated features.
[0077] The following describes the processing of the source domain image with reference to the dynamic module network shown in Figure 6. The processing of the perturbed source domain image is similar to that of the source domain image and will not be repeated here.
[0078] With reference to Figure 3 and Figure 6, the source domain image can be input into the feature extraction network to obtain source domain image features, which are then input into the domain-invariant branch network and the domain-specific branch network, respectively; that is, the dynamic module network is divided into these two branches. In the domain-invariant branch network, features are aligned to a domain-agnostic space to learn domain-invariant features. This shared feature space can retain good generalization ability in unknown domains, and domain-invariant features are universal across different domains. The domain-specific branch network is motivated by the observation that domain-specific features can still help improve performance in their respective domains. Each domain, or each sample, has unique characteristics and can be regarded as a latent space. Imposing invariance constraints on these spaces can enhance the generalization ability of features in unknown domains, but it also discards some discriminative information in the latent space that is effective for the target task. To make better use of this complementary, domain-related information in the data, a domain-specific branch network can be used to extract domain-specific features, which supplement the domain-invariant features. In other words, domain-invariant features and domain-specific features are extracted through the domain-invariant branch network and the domain-specific branch network, respectively.
[0079] The domain-invariant branch network (BIB) consists of convolutional layers, instance normalization (IN) layers, and activation layers. The convolutional and IN layers are used to remove sample-specific features; in particular, the IN layers eliminate style variations between samples, which reduces domain differences and thus enhances the generalization ability of the BIB. Based on this, after the source domain image features are obtained, the BIB performs a convolution operation on them to obtain convolutional features, the IN layers normalize the convolutional features to obtain normalized features, and the activation layers activate the normalized features (e.g., using the ReLU activation function) to obtain activated features. The activated features can then serve as the domain-invariant features, that is, features common to multiple domains.
[0080] For example, the learning method for domain-invariant features can be: $F_{inv} = \mathrm{ReLU}(\mathrm{IN}(f_{3\times3}(F)))$. In the above formula, $F$ represents the source domain image features, $f_{3\times3}$ represents a convolution operation with a kernel size of 3×3, IN represents instance normalization, ReLU represents activation, and $F_{inv}$ represents the domain-invariant features.
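As a rough illustration of the convolution, instance normalization, and ReLU chain, the following sketch processes a single-channel feature map in plain Python. It assumes zero padding, one channel, no learnable scale or shift in the normalization, and made-up kernel and feature values; none of these details are specified in this application. It also shows that rescaling the input (a crude stand-in for a style change) leaves the output nearly unchanged.

```python
import math

def conv3x3(fmap, kernel):
    """3x3 convolution (cross-correlation) with zero padding, single channel."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += fmap[ii][jj] * kernel[di + 1][dj + 1]
            out[i][j] = acc
    return out

def instance_norm(fmap, eps=1e-5):
    """Normalize one channel of one sample to zero mean and unit variance."""
    vals = [v for row in fmap for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    scale = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * scale for v in row] for row in fmap]

def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def domain_invariant_features(F, kernel):
    """F_inv = ReLU(IN(conv3x3(F)))."""
    return relu(instance_norm(conv3x3(F, kernel)))

# Illustrative values only.
F = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.0],
     [0.0, 1.0, 0.0, 1.0],
     [5.0, 3.0, 1.0, 2.0]]
kernel = [[0.0, 1.0, 0.0],
          [1.0, -2.0, 1.0],
          [0.0, 1.0, 0.0]]

F_inv = domain_invariant_features(F, kernel)
# The same features with a different global contrast.
F_scaled = [[3.0 * v for v in row] for row in F]
F_inv_scaled = domain_invariant_features(F_scaled, kernel)
```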
[0081] In a domain-specific branch network, the network consists of a dynamic adapter and K convolutional layers. By using the dynamic adapter to adjust the weights of each convolutional layer, the convolutional weights are adjusted for each sample, making the parameters of the domain-specific branch network not fixed and less prone to degradation in unseen domains.
[0082] After obtaining the source domain image features, the domain-specific branch network processes these features using the dynamic adapter to obtain K weight values corresponding to the K convolutional layers. The K weight values correspond one-to-one with the K convolutional layers; that is, the dynamic adapter predicts the weight of each convolutional layer based on the source domain image features. For example, $W = d(F)$, where $d$ represents the dynamic adapter applied to the source domain image features $F$, and $W$ represents the weight values, that is, the K weight values corresponding to the K convolutional layers, where K can be a positive integer greater than 1.
[0083] The structure of the dynamic adapter can be Pooling-FC-ReLU-FC-Softmax, where Pooling represents a pooling operation, FC represents a fully connected operation, ReLU represents an activation operation, and Softmax represents a normalized exponential operation that maps its inputs to non-negative values summing to 1. That is, the source domain image features F are processed through Pooling, FC, ReLU, FC, and Softmax in turn to obtain the K weight values.
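The Pooling-FC-ReLU-FC-Softmax chain can be sketched in plain Python as below. The sketch assumes global average pooling for the Pooling step and uses made-up layer sizes (2 channels, 3 hidden units, K = 3) and parameter values; the application does not specify the pooling type or any dimensions.

```python
import math

def global_avg_pool(F):
    """F: list of C channels, each an HxW list of lists -> vector of length C."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in F]

def fc(x, weight, bias):
    """Fully connected layer; weight has shape [out][in]."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(x):
    m = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]

def dynamic_adapter(F, w1, b1, w2, b2):
    """W = d(F): one weight per convolutional layer, predicted per sample."""
    return softmax(fc(relu(fc(global_avg_pool(F), w1, b1)), w2, b2))

# Illustrative feature map with 2 channels of size 2x2.
F = [[[1.0, 3.0], [5.0, 7.0]],
     [[0.0, 2.0], [2.0, 0.0]]]
# Hypothetical parameters: 2 -> 3 hidden units -> K = 3 weights.
w1 = [[0.5, -1.0], [0.1, 0.2], [-0.3, 0.4]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [-1.0, 0.5, 0.0]]
b2 = [0.0, 0.1, 0.0]

W = dynamic_adapter(F, w1, b1, w2, b2)
```

Because the last step is a softmax, the K weights are positive and sum to 1, so they act as a per-sample mixing distribution over the K convolutional layers.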
[0084] The source domain image features F can be input into each convolutional layer, which then generates its own convolutional features based on the source domain image features F and its corresponding weights. After obtaining the convolutional features for each convolutional layer, domain-specific features can be generated based on the K convolutional features from the K convolutional layers.
[0085] For example, the learning method for domain-specific features can be: $F_{spec} = \sum_{k=1}^{K} w_k \cdot f^{k}_{3\times3}(F)$. Here, $F$ represents the source domain image features, $f^{k}_{3\times3}$ represents the convolution operation with a kernel size of 3×3 performed by the k-th convolutional layer, and $w_k$ represents the weight value corresponding to the k-th convolutional layer, where k takes values from 1 to K. When k is 1, $w_1$ represents the weight value corresponding to the first convolutional layer, $f^{1}_{3\times3}(F)$ indicates that the first convolutional layer performs a convolution operation on the source domain image features $F$, and $w_1 \cdot f^{1}_{3\times3}(F)$ represents the convolutional feature of the first convolutional layer. Following this pattern, K convolutional features corresponding to the K convolutional layers are obtained. Summing these K convolutional features gives the domain-specific features $F_{spec}$, which can be the unique characteristics of a domain.
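The weighted sum of K convolutional features can be illustrated with the following sketch. It assumes a single-channel feature map, zero padding, K = 2, and hand-picked kernels and weights; in practice the weights would come from the dynamic adapter and the kernels would be learned.

```python
def conv3x3(fmap, kernel):
    """3x3 convolution (cross-correlation) with zero padding, single channel."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += fmap[ii][jj] * kernel[di + 1][dj + 1]
            out[i][j] = acc
    return out

def domain_specific_features(F, kernels, weights):
    """F_spec = sum over k of w_k * conv_k(F)."""
    h, w = len(F), len(F[0])
    out = [[0.0] * w for _ in range(h)]
    for kernel, w_k in zip(kernels, weights):
        conv = conv3x3(F, kernel)  # convolutional feature of the k-th layer
        for i in range(h):
            for j in range(w):
                out[i][j] += w_k * conv[i][j]
    return out

# Illustrative values only.
F = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0],
     [2.0, 1.0, 1.0]]
kernels = [
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],  # identity kernel
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # 3x3 box-sum kernel
]
weights = [0.25, 0.75]  # stand-in for the dynamic adapter output
F_spec = domain_specific_features(F, kernels, weights)
```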
[0086] Based on the domain-invariant features and the domain-specific features, the source domain dynamic features can be generated. For example, the dynamic features can be obtained by adding the domain-invariant and domain-specific features, or by concatenating them. There are no restrictions on this approach. Because the source domain dynamic features are built from both kinds of features, the instance normalization layers provide domain invariance while the dynamic adapter adjusts the network for each sample. The source domain dynamic features can therefore represent both domain-invariant and domain-specific information, improving generalization ability.
[0087] With reference to Figure 3 and Figure 6, after the source domain dynamic features are obtained, they can be input into the classifier network, which classifies them to obtain the source domain output features.
[0088] In one possible implementation, during the training of the liveness detection model, a first loss function, a second loss function, and a third loss function can be constructed. The input of the first loss function is the source domain output features, and the output of the first loss function is the first loss value. There are no restrictions on the first loss function.
[0089] For example, the first loss function can be the cross-entropy loss function. An example of the first loss function can be found in Equation (2). Of course, Equation (2) is just an example and is not a limitation.
[0090]
[0091] In formula (2), $X_i$ represents the i-th source domain image, F represents the feature extraction network, D represents the dynamic module network, and C represents the classifier network; passing the source domain image $X_i$ through F, D, and C yields the source domain output features. $Y_i$ represents the label corresponding to the i-th source domain image, c represents the classification category (there are C categories in total), and $L_{Cls}(S)$ represents the output of the first loss function, i.e., the first loss value.
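Since formula (2) itself is not reproduced in this text, the following is only a sketch of a standard cross-entropy loss consistent with the symbol definitions above. It assumes that the source domain output features are unnormalized class scores, that labels are given as class indices (equivalent to one-hot $Y_i$), and that the loss is averaged over the N images; these choices are assumptions, not details confirmed by the application.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(outputs, labels):
    """Assumed form: L_Cls = -(1/N) * sum_i log softmax(output_i)[label_i]."""
    total = 0.0
    for logits, y in zip(outputs, labels):
        total -= math.log(softmax(logits)[y])
    return total / len(outputs)

# Illustrative output features for N = 3 images and 2 categories
# (index 1 = live, index 0 = non-live; the indexing is an assumption).
outputs = [[0.2, 2.1], [1.5, -0.3], [0.0, 0.0]]
labels = [1, 0, 1]
first_loss = cross_entropy(outputs, labels)
```

The same function applied to the perturbed source domain output features would give a loss of the kind described for the second loss function.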
[0092] The input to the second loss function is the output feature of the perturbation source domain, and the output of the second loss function is the second loss value. There are no restrictions on the second loss function. For example, the second loss function can be the cross-entropy loss function, as shown in Equation (3). Of course, Equation (3) is just an example and there are no restrictions on it.
[0093]
[0094] In formula (3), $X_i$ represents the i-th perturbation source domain image, F represents the feature extraction network, D represents the dynamic module network, and C represents the classifier network; passing $X_i$ through F, D, and C yields the perturbation source domain output features. $Y_i$ represents the label corresponding to the i-th perturbation source domain image, c represents the classification category (there are C categories in total), and $L_{Cls}(S^{+})$ represents the second loss value.
[0095] The input to the third loss function is the K weight values output by the dynamic adapter, and the output of the third loss function is the third loss value. There are no restrictions on this third loss function. For example, in order to enhance the difference between different convolutions, the Information Maximization loss function is introduced as the third loss function. The Information Maximization loss function maximizes the mutual information between the input features and the dynamic weights. The Information Maximization loss function can be composed of the Entropy Minimization loss function and the Difference Regularization loss function. See Equation (4) for an example of the Information Maximization loss function. There are no restrictions on this.
[0096]
[0097] In formula (4), $\theta_F$ represents the network parameters of the feature extraction network, and $\theta_{D(X)}$ represents the network parameters of the dynamic module network. $L_{ent}$ represents the minimum entropy loss value under network parameters $\theta_F$ and $\theta_{D(X)}$; $L_{div}$ represents the difference regularization loss value under network parameters $\theta_F$ and $\theta_{D(X)}$; and $L_{IM}$ represents the information maximization loss value under network parameters $\theta_F$ and $\theta_{D(X)}$, i.e., the third loss value. Here, $L_{ent}$ corresponds to the entropy minimization loss function, $L_{div}$ corresponds to the difference regularization loss function, and $L_{IM}$ corresponds to the information maximization loss function.
[0098] In formula (4), $W_i$ represents the K weight values output by the dynamic adapter for the i-th source domain image, where k takes values from 1 to K. When k is 1, the corresponding entry of $W_i$ is the first weight value output by the dynamic adapter, and so on.
[0099] Formula (4) also uses the average weight values over the N source domain images, that is, for each k, the average of the N weight values output by the dynamic adapter for the N source domain images. The value of k ranges from 1 to K. When k is 1, the term is the average of the first weight values of the N source domain images, and so on.
[0100] In summary, the minimum entropy loss value $L_{ent}$ is determined based on the K weight values of each source domain image. The difference regularization loss value $L_{div}$ is determined based on the K average weight values of all source domain images (such as the average of the first weight value of all source domain images, the average of the second weight value of all source domain images, and so on). The sum of the minimum entropy loss value $L_{ent}$ and the difference regularization loss value $L_{div}$ is used as the third loss value $L_{IM}$.
[0101] The minimum entropy loss value $L_{ent}$ makes the dynamic weights predicted by the network more certain, while the difference regularization loss value $L_{div}$ makes the predicted weights more diverse overall. Thus, the domain-specific branch network can dynamically adjust the network based on individual samples and, through information maximization, learn more diverse feature representations.
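Formula (4) is not reproduced in this text, so the sketch below uses a common information-maximization formulation that matches the description: $L_{ent}$ is the mean per-sample entropy of the K weights, $L_{div}$ is the negative entropy of the averaged weights, and $L_{IM}$ is their sum. The exact form, the sign convention for $L_{div}$, and the small `eps` constant are assumptions.

```python
import math

def entropy_loss(weights, eps=1e-12):
    """L_ent: mean per-sample entropy of the K weights (lower = more certain)."""
    n = len(weights)
    return -sum(w * math.log(w + eps) for W_i in weights for w in W_i) / n

def diversity_loss(weights, eps=1e-12):
    """L_div: negative entropy of the K averaged weights (lower = more diverse)."""
    n, k = len(weights), len(weights[0])
    mean_w = [sum(W_i[j] for W_i in weights) / n for j in range(k)]
    return sum(m * math.log(m + eps) for m in mean_w)

def information_maximization_loss(weights):
    """L_IM = L_ent + L_div (the third loss value)."""
    return entropy_loss(weights) + diversity_loss(weights)

# Three illustrative batches of adapter outputs (N = 2 images, K = 2 weights).
uniform = [[0.5, 0.5], [0.5, 0.5]]    # uncertain weights for every image
collapsed = [[1.0, 0.0], [1.0, 0.0]]  # certain, but every image picks the same layer
diverse = [[1.0, 0.0], [0.0, 1.0]]    # certain and different across images

loss_uniform = information_maximization_loss(uniform)
loss_collapsed = information_maximization_loss(collapsed)
loss_diverse = information_maximization_loss(diverse)
```

Under this assumed form, only the batch whose weights are both certain per image and varied across images reaches the lowest loss, which is the behavior described for $L_{ent}$ and $L_{div}$ above.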
[0102] In summary, we can obtain the first loss value, the second loss value, and the third loss value. Then, based on the first loss value, the second loss value, and the third loss value, we can adjust the network parameters of the feature extraction network, the dynamic module network, and the classifier network to obtain the adjusted liveness detection model.
[0103] For example, first determine the target loss value based on the first and third loss values, such as the sum of the first and third loss values. Then, adjust the network parameters of the feature extraction network, dynamic module network, and classifier network in the initial liveness detection model based on the target loss value to obtain an intermediate liveness detection model. Next, adjust the network parameters of the feature extraction network, dynamic module network, and classifier network in the intermediate liveness detection model based on the second loss value to obtain the adjusted liveness detection model.
[0104] When adjusting the network parameters of the feature extraction network, dynamic module network, and classifier network in the initial liveness detection model using the target loss value, gradient descent or other methods can be used without restriction. When adjusting the network parameters of the feature extraction network, dynamic module network, and classifier network in the intermediate liveness detection model using the second loss value, gradient descent or other methods can be used without restriction.
[0105] For example, an example of adjustment using gradient descent can be found in equation (5).
[0106]
[0107] In formula (5), β represents the learning rate, which is a pre-configured known value, and μ represents the balancing weight of the information maximization loss value $L_{IM}$, which is also a pre-configured known value. That is, the gradient with respect to $\theta_D$ is calculated based on the first loss value $L_{Cls}(S)$, and the network parameters of the dynamic module network are updated to obtain the updated network parameters of the dynamic module network. For the meaning of other parameters, please refer to the above embodiment.
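Formula (5) is not reproduced in this text. One form consistent with the definitions of β and μ above, and with the target loss value being built from the first and third loss values, would be the following gradient descent step. This is an assumed reconstruction for illustration, with $\theta_D'$ introduced here to denote the updated network parameters of the dynamic module network:

$$\theta_D' = \theta_D - \beta \, \nabla_{\theta_D} \left( L_{Cls}(S) + \mu \, L_{IM} \right)$$

The parameters of the feature extraction network and the classifier network would be updated in the same way with respect to their own parameters, and the second loss value $L_{Cls}(S^{+})$ would then be evaluated on the model carrying the updated parameters.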
[0108] As can be seen from the above technical solutions, in the embodiments of this application, when multi-scene data is limited, perturbation source domain images in multiple scenes can be simulated and generated using a small number of source domain images and a large number of natural images. This enriches the source domain data. Training the target liveness detection model using a large number of perturbation source domain images improves the domain generalization ability of the target liveness detection model, achieving better domain generalization performance and effect. By designing a dynamic modular network, the network parameters can be dynamically adjusted based on the characteristics of each sample, and domain-invariant and domain-specific features can be learned, enabling the liveness detection model to adapt to changes in the characteristics of each sample, thus improving the generalization ability of the liveness detection model. Through meta-learning optimization, the simulation of liveness detection in practical applications is optimized, achieving good domain generalization of the liveness detection model. This allows the liveness detection model to adaptively adjust network parameters even in unknown, rich target domains, meaning the target liveness detection model can still maintain a high accuracy in the target domain, thereby greatly improving the generalization ability of the liveness detection model.
[0109] Based on the same concept as the above method, this application proposes a liveness detection device. Figure 7 is a schematic structural diagram of the liveness detection device, which may include:
[0110] The acquisition module 71 is used to acquire source domain images and natural images; wherein, the source domain images include live objects and / or non-live objects;
[0111] Processing module 72 is used to perform frequency domain transformation on the source domain image to obtain a source domain spectrum map, perform frequency domain transformation on the natural image to obtain a natural spectrum map, generate a hybrid spectrum map based on the source domain spectrum map and the natural spectrum map, and perform inverse frequency domain transformation on the hybrid spectrum map to obtain a perturbed source domain image;
[0112] Training module 73 is used to adjust the network parameters of the initial liveness detection model based on the source domain image and the perturbed source domain image to obtain the adjusted liveness detection model, and to determine the target liveness detection model based on the adjusted liveness detection model. The target liveness detection model is used to perform liveness detection on the image to be detected.
[0113] For example, when the processing module 72 generates a hybrid spectrum based on the source domain spectrum and the natural spectrum, it is specifically used to: if the source domain spectrum includes a source domain amplitude spectrum and a source domain phase spectrum, and the natural spectrum includes a natural amplitude spectrum and a natural phase spectrum, then generate a perturbation amplitude spectrum based on the source domain amplitude spectrum, a first perturbation coefficient of the source domain amplitude spectrum, the natural amplitude spectrum, and a second perturbation coefficient of the natural amplitude spectrum; wherein, the sum of the first perturbation coefficient and the second perturbation coefficient is a fixed value, and the second perturbation coefficient is determined based on the configured perturbation intensity; and generate the hybrid spectrum based on the perturbation amplitude spectrum and the source domain phase spectrum.
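The amplitude mixing with preserved source phase can be illustrated with the sketch below. It is a simplified stand-in: it works on a 1-D signal with a naive discrete Fourier transform rather than on a 2-D image, assumes the two perturbation coefficients sum to 1, and takes the second coefficient to be equal to the configured perturbation intensity. The function names and sample values are hypothetical.

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def perturb(source, natural, intensity):
    """Mix the amplitude spectra, keep the source phase, and transform back."""
    lam = intensity  # second perturbation coefficient (assumed = intensity)
    src_spec, nat_spec = dft(source), dft(natural)
    hybrid = []
    for s_k, n_k in zip(src_spec, nat_spec):
        # Perturbation amplitude: (1 - lam) * source amplitude + lam * natural amplitude.
        amp = (1.0 - lam) * abs(s_k) + lam * abs(n_k)
        # Hybrid spectrum: perturbed amplitude combined with the source phase.
        hybrid.append(cmath.rect(amp, cmath.phase(s_k)))
    # For real-valued inputs the imaginary part is expected to be numerical noise.
    return [v.real for v in idft(hybrid)]

# Illustrative 1-D "images".
source = [1.0, 2.0, 4.0, 3.0, 5.0, 2.0, 1.0, 0.0]
natural = [0.0, 5.0, 0.0, 5.0, 0.0, 5.0, 0.0, 5.0]

unchanged = perturb(source, natural, 0.0)  # intensity 0 returns the source
perturbed = perturb(source, natural, 0.3)
```

Keeping the source phase is what preserves the content of the source image, while the mixed amplitude carries appearance statistics borrowed from the natural image.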
[0114] For example, the initial liveness detection model includes a feature extraction network, a dynamic module network, and a classifier network. When the training module 73 adjusts the network parameters of the initial liveness detection model based on the source domain image and the perturbed source domain image to obtain the adjusted liveness detection model, it is specifically used to: input the source domain image into the feature extraction network to obtain source domain image features; input the source domain image features into the dynamic module network to obtain source domain dynamic features; input the source domain dynamic features into the classifier network to obtain source domain output features; determine a first loss value based on the source domain output features, and determine a target loss value based on the first loss value; adjust the network parameters of the feature extraction network, the dynamic module network, and the classifier network based on the target loss value to obtain an intermediate liveness detection model; input the perturbed source domain image into the feature extraction network of the intermediate liveness detection model to obtain perturbed source domain image features; input the perturbed source domain image features into the dynamic module network of the intermediate liveness detection model to obtain perturbed source domain dynamic features; input the perturbed source domain dynamic features into the classifier network of the intermediate liveness detection model to obtain perturbed source domain output features; determine a second loss value based on the perturbed source domain output features; and adjust the network parameters of the feature extraction network, the dynamic module network, and the classifier network of the intermediate liveness detection model based on the second loss value to obtain an adjusted liveness detection model.
[0115] For example, the dynamic module network includes a domain-invariant branch network and a domain-specific branch network. When the training module 73 inputs the source domain image features into the dynamic module network to obtain source domain dynamic features, it specifically performs the following steps: inputting the source domain image features into the domain-invariant branch network to obtain domain-invariant features, wherein the domain-invariant features are common features of multiple domains determined based on the source domain image features; inputting the source domain image features into the domain-specific branch network to obtain domain-specific features, wherein the domain-specific features are individual features of a domain determined based on the source domain image features; and generating source domain dynamic features based on the domain-invariant features and the domain-specific features.
[0116] For example, the domain-specific branch network includes a dynamic adapter and K convolutional layers, where K is a positive integer greater than 1. When the training module 73 inputs the source domain image features into the domain-specific branch network to obtain domain-specific features, it specifically performs the following steps: inputting the source domain image features into the dynamic adapter, which generates K weight values corresponding to the K convolutional layers, with each weight value corresponding to one of the K convolutional layers; inputting the source domain image features into each convolutional layer, which generates convolutional features based on the source domain image features and the corresponding weight values; and generating the domain-specific features based on the K convolutional features corresponding to the K convolutional layers.
[0117] For example, when the training module 73 determines the target loss value based on the first loss value, it is specifically used to: determine the minimum entropy loss value and the difference regularization loss value based on the K weight values; determine the third loss value based on the minimum entropy loss value and the difference regularization loss value; and determine the target loss value based on the first loss value and the third loss value.
[0118] For example, based on the first loss value, the training module 73 determines the target loss value using the following formula:
[0119]
[0120]
[0121]
[0122] Where $\theta_F$ represents the network parameters of the feature extraction network; $\theta_{D(X)}$ represents the network parameters of the dynamic module network; $L_{ent}$ represents the minimum entropy loss value under network parameters $\theta_F$ and $\theta_{D(X)}$; $L_{div}$ represents the difference regularization loss value under network parameters $\theta_F$ and $\theta_{D(X)}$; $L_{IM}$ represents the third loss value under network parameters $\theta_F$ and $\theta_{D(X)}$; $W_i$ represents the K weight values for the i-th source domain image; and the averaged weight term represents the K average weight values corresponding to the N source domain images.
[0123] For example, the domain-invariant branch network includes convolutional layers, instance normalization layers, and activation layers. When the training module 73 inputs the source domain image features into the domain-invariant branch network to obtain domain-invariant features, it specifically performs the following steps: inputting the source domain image features into the convolutional layer to obtain convolutional features; inputting the convolutional features into the instance normalization layer to obtain normalized features; inputting the normalized features into the activation layer to obtain activated features; and generating the domain-invariant features based on the activated features.
[0124] Based on the same concept as the above method, this application proposes an electronic device. As shown in Figure 8, the electronic device includes a processor 81 and a machine-readable storage medium 82, the machine-readable storage medium 82 storing machine-executable instructions that can be executed by the processor 81; the processor 81 is used to execute the machine-executable instructions to implement the liveness detection method disclosed in the above examples of this application.
[0125] Based on the same concept as the above method, this application also provides a machine-readable storage medium storing a plurality of computer instructions, which, when executed by a processor, can implement the liveness detection method disclosed in the above examples of this application.
[0126] The aforementioned machine-readable storage medium can be any electronic, magnetic, optical, or other physical storage device that can contain or store information, such as executable instructions, data, etc. For example, machine-readable storage media can be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, storage drives (such as hard disk drives), solid-state drives, any type of storage disk (such as optical discs, DVDs, etc.), or similar storage media, or combinations thereof.
[0127] The systems, devices, modules, or units described in the above embodiments can be implemented by a computer entity or by a product with a certain function. A typical implementation device is a computer, which can be a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email sending and receiving device, game console, tablet computer, wearable device, or any combination of these devices.
[0128] For ease of description, the above devices are described as various units divided by function. Of course, in implementing this application, the functions of each unit can be implemented in one or more pieces of software and/or hardware.
[0129] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of this application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0130] This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of this application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing device, produce a means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0131] Furthermore, these computer program instructions can also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0132] These computer program instructions may also be loaded onto a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0133] The above description is merely an embodiment of this application and is not intended to limit the scope of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of the claims of this application.
Claims
1. A liveness detection method, characterized in that the method includes: acquiring source domain images and natural images, wherein the source domain images include live objects and/or non-live objects; performing a frequency domain transformation on the source domain image to obtain a source domain spectrum map, and performing a frequency domain transformation on the natural image to obtain a natural spectrum map; generating a mixed spectrum map based on the source domain spectrum map and the natural spectrum map, and performing an inverse frequency domain transformation on the mixed spectrum map to obtain a perturbed source domain image; adjusting the network parameters of an initial liveness detection model based on the source domain image and the perturbed source domain image to obtain an adjusted liveness detection model, wherein the initial liveness detection model includes a feature extraction network, a dynamic module network, and a classifier network, and the adjusting includes: inputting the source domain image into the feature extraction network to obtain source domain image features; inputting the source domain image features into the dynamic module network to obtain source domain dynamic features; inputting the source domain dynamic features into the classifier network to obtain source domain output features; determining a first loss value based on the source domain output features, and determining a target loss value based on the first loss value; adjusting the network parameters of the feature extraction network, the dynamic module network, and the classifier network based on the target loss value to obtain an intermediate liveness detection model; inputting the perturbed source domain image into the feature extraction network of the intermediate liveness detection model to obtain perturbed source domain image features; inputting the perturbed source domain image features into the dynamic module network of the intermediate liveness detection model to obtain perturbed source domain dynamic features; inputting the perturbed source domain dynamic features into the classifier network of the intermediate liveness detection model to obtain perturbed source domain output features; determining a second loss value based on the perturbed source domain output features; and adjusting the network parameters of the feature extraction network, the dynamic module network, and the classifier network of the intermediate liveness detection model based on the second loss value to obtain the adjusted liveness detection model; and determining a target liveness detection model based on the adjusted liveness detection model, wherein the target liveness detection model is used to perform liveness detection on the image to be detected.
2. The method according to claim 1, characterized in that generating the mixed spectrum map based on the source domain spectrum map and the natural spectrum map includes: if the source domain spectrum map includes a source domain amplitude spectrum and a source domain phase spectrum, and the natural spectrum map includes a natural amplitude spectrum and a natural phase spectrum, generating a perturbation amplitude spectrum based on the source domain amplitude spectrum, a first perturbation coefficient of the source domain amplitude spectrum, the natural amplitude spectrum, and a second perturbation coefficient of the natural amplitude spectrum, wherein the sum of the first perturbation coefficient and the second perturbation coefficient is a fixed value, and the second perturbation coefficient is determined based on a configured perturbation intensity; and generating the mixed spectrum map based on the perturbation amplitude spectrum and the source domain phase spectrum.
3. The method according to claim 1, characterized in that the dynamic module network includes a domain-invariant branch network and a domain-specific branch network, and inputting the source domain image features into the dynamic module network to obtain source domain dynamic features includes: inputting the source domain image features into the domain-invariant branch network to obtain domain-invariant features, wherein the domain-invariant features are common features of multiple domains determined based on the source domain image features; inputting the source domain image features into the domain-specific branch network to obtain domain-specific features, wherein the domain-specific features are unique features of a single domain determined based on the source domain image features; and generating the source domain dynamic features based on the domain-invariant features and the domain-specific features.
4. The method according to claim 3, characterized in that the domain-specific branch network includes a dynamic adapter and K convolutional layers, where K is a positive integer greater than 1, and inputting the source domain image features into the domain-specific branch network to obtain domain-specific features includes: inputting the source domain image features into the dynamic adapter, which generates K weight values corresponding one-to-one with the K convolutional layers; inputting the source domain image features into each convolutional layer, which generates its corresponding convolutional features based on the source domain image features and the weight value corresponding to that convolutional layer; and generating the domain-specific features based on the K convolutional features corresponding to the K convolutional layers.
5. The method according to claim 4, characterized in that determining the target loss value based on the first loss value includes: determining a minimum entropy loss value and a difference regularization loss value based on the K weight values; determining a third loss value based on the minimum entropy loss value and the difference regularization loss value; and determining the target loss value based on the first loss value and the third loss value.
6. The method according to claim 5, characterized in that, based on the first loss value, the target loss value is determined using the following formula: where θ_F represents the network parameters of the feature extraction network; θ_{D(X)} represents the network parameters of the dynamic module network; L_ent represents the minimum entropy loss value under network parameters θ_F and θ_{D(X)}; L_div represents the difference regularization loss value under network parameters θ_F and θ_{D(X)}; L_IM represents the third loss value under network parameters θ_F and θ_{D(X)}; W_i represents the weight values for the i-th source domain image; the k-th component of W_i is the k-th weight value for the i-th source domain image; and the mean of that component over the N source domain images is the average weight value of the k-th weight value.
7. The method according to claim 3, characterized in that the domain-invariant branch network includes a convolutional layer, an instance normalization layer, and an activation layer, and inputting the source domain image features into the domain-invariant branch network to obtain domain-invariant features includes: inputting the source domain image features into the convolutional layer to obtain convolutional features; inputting the convolutional features into the instance normalization layer to obtain normalized features; inputting the normalized features into the activation layer to obtain activated features; and generating the domain-invariant features based on the activated features.
8. A liveness detection device, characterized in that the device includes: an acquisition module, used to acquire source domain images and natural images, wherein the source domain images include live objects and/or non-live objects; a processing module, used to perform a frequency domain transformation on the source domain image to obtain a source domain spectrum map, perform a frequency domain transformation on the natural image to obtain a natural spectrum map, generate a mixed spectrum map based on the source domain spectrum map and the natural spectrum map, and perform an inverse frequency domain transformation on the mixed spectrum map to obtain a perturbed source domain image; and a training module, used to adjust the network parameters of an initial liveness detection model based on the source domain image and the perturbed source domain image to obtain an adjusted liveness detection model, and to determine a target liveness detection model based on the adjusted liveness detection model, wherein the target liveness detection model is used to perform liveness detection on the image to be detected; wherein the initial liveness detection model includes a feature extraction network, a dynamic module network, and a classifier network, and when the training module adjusts the network parameters of the initial liveness detection model based on the source domain image and the perturbed source domain image to obtain the adjusted liveness detection model, it is specifically used to: input the source domain image into the feature extraction network to obtain source domain image features; input the source domain image features into the dynamic module network to obtain source domain dynamic features; input the source domain dynamic features into the classifier network to obtain source domain output features; determine a first loss value based on the source domain output features, and determine a target loss value based on the first loss value; adjust the network parameters of the feature extraction network, the dynamic module network, and the classifier network based on the target loss value to obtain an intermediate liveness detection model; input the perturbed source domain image into the feature extraction network of the intermediate liveness detection model to obtain perturbed source domain image features; input the perturbed source domain image features into the dynamic module network of the intermediate liveness detection model to obtain perturbed source domain dynamic features; input the perturbed source domain dynamic features into the classifier network of the intermediate liveness detection model to obtain perturbed source domain output features; determine a second loss value based on the perturbed source domain output features; and adjust the network parameters of the feature extraction network, the dynamic module network, and the classifier network of the intermediate liveness detection model based on the second loss value to obtain the adjusted liveness detection model.
9. The device according to claim 8, characterized in that, when the processing module generates the mixed spectrum map based on the source domain spectrum map and the natural spectrum map, it is specifically used to: if the source domain spectrum map includes a source domain amplitude spectrum and a source domain phase spectrum, and the natural spectrum map includes a natural amplitude spectrum and a natural phase spectrum, generate a perturbation amplitude spectrum based on the source domain amplitude spectrum, a first perturbation coefficient of the source domain amplitude spectrum, the natural amplitude spectrum, and a second perturbation coefficient of the natural amplitude spectrum, wherein the sum of the first perturbation coefficient and the second perturbation coefficient is a fixed value, and the second perturbation coefficient is determined based on a configured perturbation intensity; and generate the mixed spectrum map based on the perturbation amplitude spectrum and the source domain phase spectrum. The dynamic module network includes a domain-invariant branch network and a domain-specific branch network; when the training module inputs the source domain image features into the dynamic module network to obtain source domain dynamic features, it is specifically used to: input the source domain image features into the domain-invariant branch network to obtain domain-invariant features, which are common features of multiple domains determined based on the source domain image features; input the source domain image features into the domain-specific branch network to obtain domain-specific features, which are unique features of a single domain determined based on the source domain image features; and generate the source domain dynamic features based on the domain-invariant features and the domain-specific features. The domain-specific branch network includes a dynamic adapter and K convolutional layers, where K is a positive integer greater than 1; when the training module inputs the source domain image features into the domain-specific branch network to obtain domain-specific features, it is specifically used to: input the source domain image features into the dynamic adapter, which generates K weight values corresponding one-to-one with the K convolutional layers; input the source domain image features into each convolutional layer, which generates its corresponding convolutional features based on the source domain image features and its corresponding weight value; and generate the domain-specific features based on the K convolutional features corresponding to the K convolutional layers. When the training module determines the target loss value based on the first loss value, it is specifically used to: determine a minimum entropy loss value and a difference regularization loss value based on the K weight values; determine a third loss value based on the minimum entropy loss value and the difference regularization loss value; and determine the target loss value based on the first loss value and the third loss value. Based on the first loss value, the training module determines the target loss value using the following formula: where θ_F represents the network parameters of the feature extraction network; θ_{D(X)} represents the network parameters of the dynamic module network; L_ent represents the minimum entropy loss value under network parameters θ_F and θ_{D(X)}; L_div represents the difference regularization loss value under network parameters θ_F and θ_{D(X)}; L_IM represents the third loss value under network parameters θ_F and θ_{D(X)}; W_i represents the weight values for the i-th source domain image; the k-th component of W_i is the k-th weight value for the i-th source domain image; and the mean of that component over the N source domain images is the average weight value of the k-th weight value. The domain-invariant branch network includes a convolutional layer, an instance normalization layer, and an activation layer; when the training module inputs the source domain image features into the domain-invariant branch network to obtain domain-invariant features, it is specifically used to: input the source domain image features into the convolutional layer to obtain convolutional features; input the convolutional features into the instance normalization layer to obtain normalized features; input the normalized features into the activation layer to obtain activated features; and generate the domain-invariant features based on the activated features.
10. An electronic device, characterized in that it includes: a processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions that can be executed by the processor; wherein the processor is configured to execute the machine-executable instructions to implement the method of any one of claims 1-7.
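As an illustration of the spectrum mixing recited in claims 1 and 2, the following Python sketch perturbs a small grayscale source domain image with the amplitude spectrum of a natural image while keeping the source domain phase spectrum. It is a sketch under stated assumptions rather than the claimed implementation: the fixed sum of the two perturbation coefficients is taken to be 1, the second perturbation coefficient is set equal to a hypothetical `strength` parameter, both images are assumed to have the same size, and a naive discrete Fourier transform stands in for whatever frequency domain transformation is actually used.

```python
import cmath


def dft2(img, inverse=False):
    """Naive 2-D discrete Fourier transform of a list-of-lists image (small inputs only)."""
    h, w = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = []
    for u in range(h):
        row = []
        for v in range(w):
            s = 0j
            for y in range(h):
                for x in range(w):
                    angle = sign * 2 * cmath.pi * (u * y / h + v * x / w)
                    s += img[y][x] * cmath.exp(1j * angle)
            row.append(s / (h * w) if inverse else s)
        out.append(row)
    return out


def perturb(source, natural, strength):
    """Return a perturbed source domain image.

    strength: assumed second perturbation coefficient in [0, 1];
    the first coefficient is assumed to be 1 - strength.
    """
    src_spec = dft2(source)    # source domain spectrum map
    nat_spec = dft2(natural)   # natural spectrum map
    mixed = []
    for src_row, nat_row in zip(src_spec, nat_spec):
        mixed_row = []
        for s, n in zip(src_row, nat_row):
            # Perturbation amplitude spectrum: blend of the two amplitude spectra.
            amp = (1 - strength) * abs(s) + strength * abs(n)
            # Mixed spectrum: perturbed amplitude with the source domain phase.
            mixed_row.append(cmath.rect(amp, cmath.phase(s)))
        mixed.append(mixed_row)
    # Inverse transform; the imaginary part is numerical noise for real inputs.
    return [[z.real for z in row] for row in dft2(mixed, inverse=True)]


source = [[0.0, 1.0], [2.0, 3.0]]
natural = [[1.0, 1.0], [1.0, 1.0]]
print(perturb(source, natural, 0.0))  # strength 0 reproduces the source image
print(perturb(source, natural, 0.5))
```

Keeping the source domain phase preserves the spatial structure of the face, while the blended amplitude changes low-level appearance statistics; a larger `strength` moves the result further from the original capture conditions.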