An image processing method, device, storage medium and electronic equipment

By using an image detection model trained with asymmetric angular interval loss and network self-supervised loss in biometric technology, the robustness and versatility of liveness attack detection are improved, solving the problems of low model robustness and insufficient cross-domain generalization in existing technologies, and achieving more efficient prevention of liveness attacks.

CN115798057B · Active · Publication Date: 2026-06-23 · ALIPAY (HANGZHOU) INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ALIPAY (HANGZHOU) INFORMATION TECH CO LTD
Filing Date
2022-11-03
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

In existing biometric technologies, the models for detecting liveness attacks have low robustness and insufficient cross-domain generalization, making it difficult to effectively prevent various liveness attack methods, especially under low-cost devices and complex application scenarios.

Method used

An initial image detection model containing a first feature processing network and a second feature processing network is adopted. The model is trained by asymmetric angular interval loss and network self-supervised loss, and the model parameters are adjusted to improve the image feature processing quality and cross-domain robustness, thereby achieving end-to-end generalization optimization.

Benefits of technology

It improves the robustness and versatility of liveness attack detection, ensures image detection performance in complex application scenarios, avoids the problem of low model robustness, and enhances the ability to prevent liveness attacks.

✦ Generated by Eureka AI based on patent content.

Abstract

The specification discloses an image processing method, device, storage medium and electronic equipment, wherein the method comprises: inputting acquired living body sample images and attack sample images into an initial image detection model for model training, controlling the initial image detection model to adopt a first feature processing network to perform first feature processing on the living body sample images and a second feature processing network to perform second feature processing on the attack sample images, and adjusting model parameters of the initial image detection model based on asymmetric angle interval losses and network self-supervision losses corresponding to the first feature processing network and the second feature processing network, so as to obtain an image detection model corresponding to the initial image detection model.

Description

Technical Field

[0001] This specification relates to the field of computer technology, and in particular to an image processing method, apparatus, storage medium, and electronic device.

Background Technology

[0002] With the rapid development of computer technology, biometric technology has been widely applied to people's production and daily lives. For example, facial recognition payment, facial access control, facial attendance, and facial recognition station entry all rely on biometrics. However, as biometric technology becomes more widely used, the need for liveness detection in biometric scenarios is becoming increasingly prominent. While facial attendance, facial recognition station entry, and facial recognition payment have been widely adopted, they have also brought new risks and challenges. The most common means of threatening the security of biometric systems is liveness attacks, which involve attempting to bypass image biometric verification through means such as device screens or printed photos. To detect liveness attacks, liveness prevention technology has become an essential component of biometric scenarios.

Summary of the Invention

[0003] This specification provides an image processing method, apparatus, storage medium, and electronic device, the technical solutions of which are as follows:

[0004] Firstly, this specification provides an image processing method, the method comprising:

[0005] Acquire liveness sample images and attack sample images for the initial image detection model, wherein the initial image detection model includes a first feature processing network and a second feature processing network;

[0006] The live sample image and the attack sample image are input into the initial image detection model for model training, so as to control the initial image detection model to perform first feature processing on the live sample image using a first feature processing network, and to control the initial image detection model to perform second feature processing on the attack sample image using a second feature processing network;

[0007] Obtain the asymmetric angular interval loss and network self-supervised loss corresponding to the first feature processing network and the second feature processing network, and adjust the model parameters of the initial image detection model based on the asymmetric angular interval loss and network self-supervised loss to obtain the image detection model corresponding to the initial image detection model.

[0008] Secondly, this specification provides an image processing apparatus, the apparatus comprising:

[0009] The image acquisition module is used to acquire liveness sample images and attack sample images for the initial image detection model, wherein the initial image detection model includes a first feature processing network and a second feature processing network.

[0010] The model training module is used to input the live sample image and the attack sample image into the initial image detection model for model training, so as to control the initial image detection model to perform first feature processing on the live sample image using a first feature processing network, and to control the initial image detection model to perform second feature processing on the attack sample image using a second feature processing network;

[0011] The parameter adjustment module is used to obtain the asymmetric angle interval loss and network self-supervised loss corresponding to the first feature processing network and the second feature processing network, and adjust the model parameters of the initial image detection model based on the asymmetric angle interval loss and network self-supervised loss to obtain the image detection model corresponding to the initial image detection model.

[0012] Thirdly, this specification provides a computer storage medium storing a plurality of instructions adapted for loading by a processor and executing the above-described method steps.

[0013] Fourthly, this specification provides an electronic device that may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to execute the above-described method steps.

[0014] The beneficial effects of the technical solutions provided in some embodiments of this specification include at least the following:

[0015] In one or more embodiments of this specification, the electronic device inputs acquired liveness sample images and attack sample images into an initial image detection model for model training. The initial image detection model is controlled to perform first feature processing on the liveness sample images using a first feature processing network and second feature processing on the attack sample images using a second feature processing network, and the model parameters are adjusted based on the asymmetric angular interval loss and network self-supervised loss corresponding to the first and second feature processing networks, so that an image detection model corresponding to the initial image detection model is obtained. This avoids low robustness in the model used for image detection. The asymmetric angular interval loss and network self-supervised loss help the model obtain high-quality image features during the feature processing stage while improving its cross-domain robustness at the data level. Furthermore, the constraint imposed by the asymmetric angular interval loss helps the model achieve end-to-end generalization optimization. As a result, the image detection performance of the model in complex application scenarios is improved, ensuring the robustness and universality of liveness attack detection.

Brief Description of the Drawings

[0016] To more clearly illustrate the technical solutions in this specification or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this specification. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0017] Figure 1 is a scene diagram of an image processing system provided in this specification;

[0018] Figure 2 is a flowchart of an image processing method provided in this specification;

[0019] Figure 3 is a schematic diagram of a sample collection scenario provided in this specification;

[0020] Figure 4 is a flowchart of another image processing method provided in this specification;

[0021] Figure 5 is a scene diagram illustrating the extraction of a model object region provided in this specification;

[0022] Figure 6 is a schematic diagram of a model processing scenario provided in this specification;

[0023] Figure 7 is a schematic diagram of the structure of an image processing device provided in this specification;

[0024] Figure 8 is a schematic diagram of the structure of a model training module provided in this specification;

[0025] Figure 9 is a schematic diagram of the structure of a parameter adjustment module provided in this specification;

[0026] Figure 10 is a schematic diagram of the structure of an electronic device provided in this specification;

[0027] Figure 11 is a schematic diagram of the operating system and user space provided in this specification;

[0028] Figure 12 is an architecture diagram of the Android operating system in Figure 11;

[0029] Figure 13 is an architecture diagram of the iOS operating system in Figure 11.

Detailed Implementation

[0030] The technical solutions in this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this specification, and not all embodiments. Based on the embodiments in this specification, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this specification.

[0031] In the description of this specification, it should be understood that the terms "first," "second," etc., are used for descriptive purposes only and should not be construed as indicating or implying relative importance. In the description of this specification, it should be noted that, unless otherwise expressly specified and limited, "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products, or devices. Those skilled in the art can understand the specific meaning of the above terms in this specification based on the specific circumstances. Furthermore, in the description of this specification, unless otherwise stated, "multiple" means two or more. "And / or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A alone, A and B simultaneously, and B alone. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship.

[0032] In related technologies, image liveness detection scenarios, such as liveness detection and interactive recognition detection, often combine multimodal image data to achieve accurate results. These methods add more modalities to the camera, such as adding NIR and 3D modalities, or even thermal imaging modalities, to the RGB modality. Adding multiple modalities significantly enhances the performance of the entire liveness detection system and improves its ability to defend against various types of attacks. However, these methods have drawbacks: the overall cost of image liveness detection increases significantly, and the equipment requirements also increase, making them unsuitable for low-cost scenarios or scenarios with lower equipment requirements. Furthermore, these techniques may require highly cooperative user actions, such as shaking the head or blinking when prompted, to achieve accurate image liveness detection, and detection is often performed in less than ideal environments. Therefore, these image liveness detection technologies have significant limitations.

[0033] In related technologies, with the continuous development of facial recognition systems in recent years, "liveness attack detection" has become an indispensable part of facial recognition systems. "Liveness attack detection" can effectively intercept non-liveness attack samples, including mobile phone attacks, paper attacks, head models, etc. With the continuous improvement of object recognition scenarios such as facial recognition, there are more and more recognition objects such as user faces and their application scenarios. The types of changes and combinations of object attributes are also increasing. At the same time, various different acquisition devices and attack materials have also generated a variety of image texture features. This has brought great challenges to the generalization and accuracy of image liveness attack detection in related technologies. In the practical application stage, image liveness attack detection often faces problems such as low model robustness and insufficient cross-domain generalization. It can be seen that there are certain limitations in image liveness attack detection in related technologies.

[0034] The present specification will now be described in detail with reference to specific embodiments.

[0035] Please refer to Figure 1, which is a scene diagram of an image processing system provided in this specification. As shown in Figure 1, the image processing system may include at least a client cluster and a service platform 100.

[0036] The client cluster may include at least one client. As shown in Figure 1, it specifically includes client 1 corresponding to user 1, client 2 corresponding to user 2, ..., client n corresponding to user n, where n is an integer greater than 0.

[0037] Each client in a client cluster can be an electronic device with communication capabilities, including but not limited to: wearable devices, handheld devices, personal computers, tablets, in-vehicle devices, smartphones, computing devices, or other processing devices connected to a wireless modem. Electronic devices may have different names in different networks, such as: user equipment, access terminal, user unit, user station, mobile station, remote station, remote terminal, mobile device, user terminal, terminal, wireless communication equipment, user agent or user device, cellular phone, cordless phone, personal digital assistant (PDA), and electronic devices in 5G networks or future evolved networks.

[0038] The service platform 100 can be a standalone server device, such as a rack-mount, blade, tower, or cabinet-type server device, or a workstation, mainframe, or other hardware device with strong computing power; or it can be a server cluster composed of multiple servers. The servers in the service cluster can be composed in a symmetrical manner, wherein each server is functionally and hierarchically equivalent in the transaction chain, and each server can provide services independently. The independent provision of services can be understood as not requiring the assistance of other servers.

[0039] In one or more embodiments of this specification, the service platform 100 can establish a communication connection with at least one client in the client cluster, and complete the data interaction during the image processing process based on the communication connection. For example, the service platform 100 can deploy the image detection model obtained by the image processing method of this specification online to several clients in multiple transaction environments, and the clients can perform image recognition based on the image detection model. As another example, the service platform 100 can obtain from the client the target detection image to be detected in the corresponding transaction environment (such as a liveness detection transaction environment), input the target detection image into the image detection model to extract the target image features corresponding to the target detection image, perform image detection based on the target image features, and output the image detection category for the target detection image. The image detection category is one of a liveness image category and an attack image category, and the service platform 100 can also distribute the image detection category to the client.

[0040] It should be noted that the service platform 100 establishes a communication connection with at least one client in the client cluster via a network for interactive communication. This network can be a wireless network or a wired network. Wireless networks include, but are not limited to, cellular networks, wireless LANs, infrared networks, or Bluetooth networks. Wired networks include, but are not limited to, Ethernet, universal serial bus (USB), or controller area networks. In one or more embodiments of the specification, technologies and / or formats including Hyper Text Markup Language (HTML), Extensible Markup Language (XML), etc., are used to represent data exchanged over the network (such as target compressed packets). Furthermore, conventional encryption technologies such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec) can be used to encrypt all or some links. In other embodiments, customized and / or dedicated data communication technologies can be used to replace or supplement the aforementioned data communication technologies.

[0041] The image processing system embodiments provided in this specification and the image processing methods described in one or more embodiments belong to the same concept. The execution entity corresponding to the image processing method involved in one or more embodiments of this specification can be the aforementioned service platform 100; the execution entity corresponding to the image processing method involved in one or more embodiments of this specification can also be a client, specifically determined based on the actual application environment. The implementation process of the image processing system embodiments can be detailed in the following method embodiments, and will not be repeated here.

[0042] Based on the scene diagram shown in Figure 1, the image processing method provided by one or more embodiments of this specification is described in detail below.

[0043] Please refer to Figure 2, which is a flowchart of an image processing method according to one or more embodiments of this specification. This method can be implemented using a computer program and can run on an image processing device based on the von Neumann architecture. The computer program can be integrated into an application or run as a standalone utility application. The image processing device can be a service platform.

[0044] Specifically, the image processing method includes:

[0045] S102: Obtain liveness sample images and attack sample images for the initial image detection model, wherein the initial image detection model includes a first feature processing network and a second feature processing network;

[0046] In business environments such as liveness detection, facial recognition, and interactive image recognition, machine learning-based image detection models are often used to further process image data generated in the corresponding business environment.

[0047] In one or more embodiments of this specification, the image detection model is applied to different image detection scenarios based on the liveness detection task. That is, the image detection model can be a model applicable to image recognition scenarios under different machine vision settings. In actual image detection or recognition, the image detection model can determine whether the current image data to be detected is liveness data or attack data. For example, the current image data to be detected can be facial image data collected from a user's face in a facial recognition scenario. Based on the facial image data, the image detection model can determine whether the image belongs to the liveness category corresponding to a real user or to the attack category. Attack category image data mainly includes images forged to imitate a real living subject using attack methods such as photos, mobile phones, screens, and masks.

[0048] In related technologies, image detection for the purpose of liveness detection is a method to determine the true physiological characteristics of an object in some identity verification scenarios. In facial recognition applications, liveness detection often uses combinations of actions such as blinking, opening the mouth, shaking the head, and nodding to verify whether the user is a real, living person. Liveness attack detection tasks need to effectively resist common liveness attack methods such as photos, face swapping, masks, occlusion, and screen replays, thereby helping users identify fraudulent behavior and protecting their interests. Based on this, the image detection model obtained by the image detection method in this specification can be applied to the liveness attack detection task to improve the liveness detection effect.

[0049] In practical applications, an initial image detection model can be created based on a machine learning model in response to a liveness attack detection task. A large amount of sample data can be obtained in advance. The sample data can be divided into multiple liveness class sample images and attack class sample images. The initial image detection model is trained on the liveness class sample images and attack class sample images until the initial image detection model meets the model termination condition, and a trained image detection model can be obtained.

[0050] To illustrate, liveness-type and attack-type sample images can cover various material domain texture combinations arising from different acquisition devices (such as mobile phones, cameras, and laptops) and different object image materials (such as paper material containing the object, real image material containing the object, film material containing the object, and display screen material containing the object). Different combinations produce different effects, such as lighting, sharpness, and reflection effects. Furthermore, data collected under different material domain texture combinations can be divided into attack-type sample images and liveness-type sample images, as shown in Figure 3, which is a schematic diagram of a sample collection scenario. In Figure 3, the device column lists the devices used for image acquisition in different application scenarios, such as mobile phones, cameras, and laptops. The material column lists the material types that may appear in different application scenarios, such as real image material containing the object, film material containing the object, and display screen material containing the object. By randomly combining these device and material type items, various material domain texture combinations can be obtained, and images with these material domain texture combinations can then be classified into live sample images and attack sample images.
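
The device and material pairing described above can be sketched in a few lines of Python. This is only an illustration: the identifier names, the exhaustive (rather than random) enumeration, and the assumption that only the real image material yields a live sample are not stated in the specification.

```python
from itertools import product

# Hypothetical identifiers for the example devices and materials named in the text.
DEVICES = ["mobile_phone", "camera", "laptop"]

# Assumption: only "real_image" material yields a live sample; the rest are attacks.
MATERIAL_LABELS = {
    "real_image": "live",
    "paper": "attack",
    "film": "attack",
    "display_screen": "attack",
}


def material_domain_combinations(devices, material_labels):
    """Return (device, material, label) for every device/material pairing."""
    return [
        (device, material, label)
        for device, (material, label) in product(devices, material_labels.items())
    ]


combos = material_domain_combinations(DEVICES, MATERIAL_LABELS)
live_combos = [c for c in combos if c[2] == "live"]
attack_combos = [c for c in combos if c[2] == "attack"]
```

A random subset, as the text suggests, could be drawn from `combos` with `random.sample`.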

[0051] S104: Input the live sample image and the attack sample image into the initial image detection model for model training, so as to control the initial image detection model to perform first feature processing on the live sample image using a first feature processing network, and control the initial image detection model to perform second feature processing on the attack sample image using a second feature processing network;

[0052] The initial image detection model can be created by fitting one or more machine learning models, such as Convolutional Neural Network (CNN), Deep Neural Network (DNN), Recurrent Neural Networks (RNN), embedding models, Gradient Boosting Decision Tree (GBDT) models, and Logistic Regression (LR) models.

[0053] Understandably, initial image detection models typically perform image feature processing on input sample images to extract image features. These extracted features are then used for subsequent model detection and classification. In practical applications, it is expected that the image features extracted by the model will exhibit high quality, strong generalization, adaptability to complex image environments, and cross-domain capabilities. In practical applications, the model's ability to extract high-quality image features from sample images carries significant weight for image detection. Typically, high-quality image features are essential for accurate identification of liveness attacks.

[0054] During the model training phase, the initial image detection model may contain multiple feature processing networks. Illustratively, the initial image detection model may contain a first feature processing network and a second feature processing network for performing feature processing on the input sample image.

[0055] Illustratively, the first feature processing network is used to perform feature processing such as feature extraction on the live sample image in each round to obtain the first image feature corresponding to the live sample image. The first image feature can be represented in the form of a feature vector, feature map, etc. The first feature processing network can be composed of multiple sub-neural networks based on machine learning models. The specific network architecture of the first feature processing network is not specifically limited here, and can be any form of image feature processing for live sample images based on related technologies.

[0056] In illustrative terms, the second feature processing network is used to perform feature processing such as feature extraction on the attack sample images in each round to obtain the second image features corresponding to the attack sample images. The second image features can be represented in the form of feature vectors, feature maps, etc. The second feature processing network can be composed of multiple sub-neural networks based on machine learning models. The specific network architecture of the second feature processing network is not specifically limited here, and can be any form of image feature processing for attack sample images based on related technologies.

[0057] In each round of model training, liveness sample images and / or attack sample images are input into the initial image detection model. For liveness sample images, the initial image detection model is controlled to use the first feature processing network to perform first feature processing, obtaining the first image features corresponding to the liveness sample images; liveness attack detection is then performed based on the first image features to obtain the sample image detection category. For attack sample images, the initial image detection model is controlled to use the second feature processing network to perform second feature processing, obtaining the second image features corresponding to the attack sample images; liveness attack detection is then performed based on the second image features to obtain the sample image detection category. The above is the forward propagation process based on sample images during model training.

[0058] The sample image detection category includes either the liveness image category or the attack image category.
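
The forward propagation described above, in which each sample is routed to the network matching its sample type, might look roughly like the following. The toy networks and classifier are hypothetical stand-ins chosen only to make the sketch runnable; the specification does not fix any architecture.

```python
def forward_pass(batch, first_network, second_network, classify):
    """Route each (image, label) pair to the network for its sample type."""
    results = []
    for image, label in batch:
        # Live samples use the first network, attack samples the second.
        network = first_network if label == "live" else second_network
        features = network(image)
        results.append(
            {"label": label, "features": features, "predicted": classify(features)}
        )
    return results


# Hypothetical stand-ins: an "image" is a flat list of pixel intensities in [0, 1].
def toy_first_network(image):
    return [sum(image) / len(image), max(image)]


def toy_second_network(image):
    return [sum(image) / len(image), min(image)]


def toy_classify(features):
    return "live" if features[0] >= 0.5 else "attack"


batch = [([0.9, 0.7, 0.8], "live"), ([0.1, 0.3, 0.2], "attack")]
results = forward_pass(batch, toy_first_network, toy_second_network, toy_classify)
```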

[0059] S106: Obtain the asymmetric angular interval loss and network self-supervised loss corresponding to the first feature processing network and the second feature processing network, and adjust the model parameters of the initial image detection model based on the asymmetric angular interval loss and network self-supervised loss to obtain the image detection model corresponding to the initial image detection model.

[0060] In each round of model training, backpropagation learning is used to obtain the asymmetric angular interval loss and network self-supervised loss corresponding to the first feature processing network and the second feature processing network. The model parameters of the initial image detection model are adjusted based on the asymmetric angular interval loss and network self-supervised loss until the model training termination condition of the initial image detection model is met, thus obtaining the image detection model corresponding to the initial image detection model.

[0061] In one or more embodiments of this specification, the network self-supervised loss can be an independent loss measure for the first feature processing network and the second feature processing network, respectively. That is, during the backpropagation learning process, the first network self-supervised loss corresponding to the first feature processing network and the second network self-supervised loss corresponding to the second feature processing network are determined respectively. The network self-supervised loss can be robust to issues such as image forgery and hyperparameter changes.

[0062] In one or more embodiments of this specification, the network self-supervised loss can be an overall loss measure for the first feature processing network and the second feature processing network. That is, during the backpropagation learning process, the network self-supervised loss corresponding to both the first and second feature processing networks is determined. The network self-supervised loss can be robust to issues such as image forgery and hyperparameter changes.

[0063] Optionally, the network self-supervised loss can be calculated using a time-based self-supervised loss function from related technologies, or it can be calculated using a contrast-based self-supervised loss function from related technologies.
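
As one possible instance of a contrast-based self-supervised loss, the sketch below uses an InfoNCE-style formulation. The choice of InfoNCE, the cosine similarity, and the `temperature` value are assumptions for illustration; the specification does not name a specific function.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: low when the anchor is closer to the positive
    (e.g. another view of the same image) than to any negative."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum - logits[0]


aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Here `aligned` is close to zero because the positive matches the anchor, while `misaligned` is large because a negative matches it instead.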

[0064] The asymmetric angular interval loss employs different angular interval parameters for the first and second feature processing networks, using an asymmetric constraint to measure the loss of these networks. This loss maximizes the angular interval between the image features extracted from live sample images and those extracted from attack sample images, while minimizing the angular interval among features within the live sample image category and among features within the attack sample image category. This makes the extracted features more discriminative, facilitating subsequent live / attack detection classification. In some scenarios, the asymmetric angular interval loss can be viewed as an asymmetric loss constraint. By introducing an asymmetric constraint into the loss function, constraints are not imposed on the live and attack categories alike in the live / attack detection task. Instead, the loss function and the configured angular interval parameters are applied only to specific cases, thus improving the model's cross-domain robustness.
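
To make the idea of class-dependent angular interval parameters concrete, the following is a minimal sketch built on an additive angular margin softmax. The formulation, the class centers, the margin values, and the `scale` factor are all assumptions; the specification does not give the exact loss expression.

```python
import math


def _unit(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def asymmetric_angular_margin_loss(feature, label, centers, margins, scale=16.0):
    """Softmax loss on cosines where the true class's angle is widened by a
    margin that differs per class (the asymmetric part)."""
    f = _unit(feature)
    logits = {}
    for cls, center in centers.items():
        cos_theta = sum(a * b for a, b in zip(f, _unit(center)))
        cos_theta = max(-1.0, min(1.0, cos_theta))
        if cls == label:
            # Penalize the true class by adding its own margin to the angle.
            theta = min(math.pi, math.acos(cos_theta) + margins[cls])
            cos_theta = math.cos(theta)
        logits[cls] = scale * cos_theta
    top = max(logits.values())
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits.values()))
    return log_sum - logits[label]


# Hypothetical class centers and margins; a larger margin constrains attacks harder.
centers = {"live": [1.0, 0.0], "attack": [0.0, 1.0]}
margins = {"live": 0.2, "attack": 0.5}

live_loss = asymmetric_angular_margin_loss([1.0, 0.2], "live", centers, margins)
attack_loss = asymmetric_angular_margin_loss([0.2, 1.0], "attack", centers, margins)
```

The two features sit at the same angle from their own class centers, so the difference between `live_loss` and `attack_loss` comes only from the margins. Setting one margin to zero would give a one-sided constraint, which is one reading of applying the parameters "only to specific cases".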

[0065] Illustratively, the initial image detection model is trained using offline sample object data. That is, each item of sample object data is input into the initial image detection model for training. During the model training process, the asymmetric angular interval loss and the network self-supervised loss corresponding to the first feature processing network and the second feature processing network are determined. Based on the asymmetric angular interval loss and the network self-supervised loss, the total model loss (which may also be called the model comprehensive loss in some embodiments) can be obtained. Based on the total model loss, the backpropagation algorithm is used to adjust the model parameters of the initial image detection model until the initial image detection model meets the model termination condition, at which point the trained image detection model is obtained. The image detection model can then be deployed online to the actual image detection scenario to perform image detection on the corresponding target image data.

[0066] In one or more embodiments of this specification, the model termination condition may include, for example, the value of the loss function being less than or equal to a preset loss function threshold, or the number of iterations reaching a preset threshold. The specific model termination condition can be determined based on actual circumstances and is not specifically limited here.
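The two example termination conditions above can be sketched as a simple stopping check. This is a minimal illustration only: the function names, the threshold values, and the toy update step whose loss halves on each iteration are assumptions made for the sketch and do not come from this specification.

```python
def should_stop(loss_value, iteration, loss_threshold=0.05, max_iterations=1000):
    """Return True when either example termination condition is met."""
    return loss_value <= loss_threshold or iteration >= max_iterations


def train(step_fn, loss_threshold=0.05, max_iterations=1000):
    """Call step_fn (one parameter update that returns the total loss) until stop."""
    iteration = 0
    loss_value = float("inf")
    while not should_stop(loss_value, iteration, loss_threshold, max_iterations):
        loss_value = step_fn(iteration)
        iteration += 1
    return iteration, loss_value


# Toy stand-in for a real update step: the loss halves on every iteration.
iterations_run, final_loss = train(lambda i: 1.0 / (2 ** i))
```

In this toy run the loss threshold is reached first; with a loss that never falls, the iteration limit ends training instead.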

[0067] Understandably, once the model termination condition is met and model training ends, the image detection model is deployed to the target transaction environment, where the target detection image is obtained.

[0068] The target transaction environment can be an environment that requires liveness attack detection, such as liveness detection, facial recognition, or interactive image recognition.

[0069] The target detection image is an image in the target transaction environment that is to be checked for liveness attacks, such as facial image data collected from a user's face in a facial recognition scenario. Based on the facial image data, the image detection model can determine whether the image belongs to the liveness category corresponding to a real user or to the attack category. Image data of the attack category mainly includes images forged to imitate a real live subject using attack media such as photos, mobile phones, screens, and masks.

[0070] Furthermore, the electronic device inputs the target detection image into the image detection model to extract the target image features corresponding to the target detection image and performs image detection based on the target image features, and outputs the image detection category for the target detection image, wherein the image detection category includes one of a liveness image category and an attack image category.

[0071] In one or more embodiments of this specification, the electronic device inputs acquired liveness sample images and attack sample images into an initial image detection model for model training. By controlling the initial image detection model to perform first feature processing on the liveness sample images using a first feature processing network and second feature processing on the attack sample images using a second feature processing network, and adjusting the model parameters based on the asymmetric angle interval loss and network self-supervised loss corresponding to the first and second feature processing networks, an image detection model corresponding to the initial image detection model can be obtained. This avoids the phenomenon of low robustness of the model used for image detection, and can better assist the model in obtaining high-quality image features during the feature processing stage based on the asymmetric angle interval loss and network self-supervised loss, while improving the cross-domain robustness of the model at the data level. Furthermore, the constraint method of using the asymmetric angle interval loss assists the model in achieving end-to-end generalization optimization. Finally, it can improve the image detection effect of the model in complex application scenarios, ensuring the robustness and universality of liveness attack detection.

[0072] Refer to Figure 4, which is a schematic flowchart of another embodiment of an image processing method proposed in one or more embodiments of this specification. Specifically:

[0073] S202: Obtain liveness sample images and attack sample images for the initial image detection model. The initial image detection model includes a first feature processing network and a second feature processing network. Input the liveness sample images and the attack sample images into the initial image detection model for model training.

[0074] For details, please refer to the method steps of other embodiments described herein, which will not be repeated here.

[0075] S204: Control the initial image detection model to extract object regions from the live sample image to obtain at least one set of first object region blocks, and use a first feature processing network to perform first feature extraction processing on the first object region blocks;

[0076] Understandably, during the model training phase, for each liveness sample image, an object region extraction method is used to extract multiple object region blocks from the liveness sample image. These object region blocks are image blocks containing a portion of the entire liveness sample image. As shown in Figure 5, which is a scene diagram illustrating model object region extraction, the liveness sample image is a facial image. During the training process, the initial image detection model can extract object regions from the liveness sample facial image, obtaining three sets of first object region blocks. Each object region block can represent the image texture characteristics of a fixed region in the entire liveness sample image. These image texture characteristics can reflect potential image characteristics, which helps the feature processing network, through model training, to extract deep-level image features that are important for distinguishing between the liveness and attack categories.

[0077] The first object region block is a block image of a certain region obtained after performing object region extraction on the live sample image. Generating multiple image texture feature regions will improve the overall robustness of the model and ensure that the model does not converge to certain specific regions when directly extracting features from the entire image.

[0078] Furthermore, the initial image detection model extracts the object region from the live sample image to obtain at least one set of first object region blocks, and then uses the first feature processing network to extract image features from each set of first object region blocks to obtain first image features;

[0079] Indicatively, each block can represent the image texture characteristics of a fixed region. Generating regions with multiple image texture characteristics will improve the overall robustness of the model, preventing the model from converging to certain specific regions.

[0080] In one feasible implementation, the initial image detection model can be controlled to randomly extract the object region of the live sample image using a random block extraction method to obtain multiple sets of first object region blocks.

[0081] As an illustration, three different groups of face regions are generated for each sample by randomly extracting blocks;

[0082] In one feasible implementation, the initial image detection model can be controlled to extract object regions from the live sample image using a fixed-position block extraction method to obtain multiple sets of first object region blocks. For example, by setting a fixed position for the image object, the image block at that fixed position is extracted from the input live sample image.
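The random and fixed-position block extraction methods described above can be sketched as follows. This is an illustrative sketch only: the image is modeled as a plain list of pixel rows, and the helper names, block size, block count, and preset positions are all hypothetical choices rather than values given in this specification.

```python
import random


def extract_block(image, top, left, size):
    """Crop a size x size block from a 2D image given as a list of rows."""
    return [row[left:left + size] for row in image[top:top + size]]


def random_blocks(image, size, count, rng):
    """Random block extraction: crop `count` blocks at random positions."""
    height, width = len(image), len(image[0])
    blocks = []
    for _ in range(count):
        top = rng.randint(0, height - size)   # randint is inclusive at both ends
        left = rng.randint(0, width - size)
        blocks.append(extract_block(image, top, left, size))
    return blocks


def fixed_blocks(image, size, positions):
    """Fixed-position block extraction: crop blocks at preset (top, left) positions."""
    return [extract_block(image, top, left, size) for top, left in positions]


# Toy 8x8 "image" whose pixel value encodes its own row and column.
image = [[10 * r + c for c in range(8)] for r in range(8)]
random_set = random_blocks(image, size=4, count=3, rng=random.Random(0))
fixed_set = fixed_blocks(image, size=4, positions=[(0, 0), (0, 4), (4, 2)])
```

Both helpers return the same kind of output (a list of equally sized blocks), so either could feed the feature processing network without changing the downstream steps.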

[0083] S206: Obtain a first image feature of the first feature processing network for at least one of the live sample images, wherein the first image feature is an image feature generated by the first feature processing network performing a first feature extraction process on the live sample image;

[0084] In one or more embodiments of this specification, the first feature processing network can directly perform first feature processing on the input liveness sample image of the initial image detection model to obtain the first image features;

[0085] In one or more embodiments of this specification, a first feature processing network can perform first feature processing on each group of first object region blocks of the initial image detection model to obtain first image features;

[0086] S208: Control the initial image detection model to extract object regions from the attack-type sample image to obtain at least one set of second object region blocks, and use a second feature processing network to perform second feature extraction processing on the second object region blocks.

[0087] Understandably, during the model training phase, for each attack-type sample image, an object region extraction method is used to extract multiple object region blocks from the attack-type sample image. These object region blocks are image blocks containing a portion of the entire attack-type sample image. As shown in Figure 5, which is a scene diagram illustrating model object region extraction, the attack-type sample image is a facial image. During the training process, the initial image detection model can extract object regions from the attack-type sample facial image, obtaining three sets of second object region blocks. Each object region block can represent the image texture characteristics of a fixed region in the entire attack-type sample image. These image texture characteristics can reflect potential image characteristics, which helps the feature processing network, through model training, to extract deep-level image features that are important for distinguishing between the liveness and attack categories.

[0088] The second object region block is a block image of a certain region obtained after performing object region extraction on the attack-type sample image. Generating multiple image texture feature regions will improve the overall robustness of the model and ensure that the model does not converge to certain specific regions when directly extracting features from the entire image.

[0089] Furthermore, the initial image detection model extracts object regions from attack-type sample images to obtain at least one set of second object region blocks. Then, the second feature processing network extracts image features from each set of second object region blocks to obtain second image features.

[0090] Indicatively, each block can represent the image texture characteristics of a fixed region. Generating regions with multiple image texture characteristics will improve the overall robustness of the model, preventing the model from converging to certain specific regions.

[0091] In one feasible implementation, the initial image detection model can be controlled to randomly extract the object region of the attack sample image using a random block extraction method to obtain multiple sets of second object region blocks.

[0092] As an illustration, three different groups of face regions are generated for each sample by randomly extracting blocks;

[0093] In one feasible implementation, the initial image detection model can be controlled to extract object regions from the attack sample image using a fixed-position block extraction method to obtain multiple sets of second object region blocks. For example, by setting a fixed position for the image object, the image block at that fixed position is extracted from the input attack sample image.

[0094] S210: Obtain second image features of the second feature processing network for at least one of the attack-type sample images, wherein the second image features are image features generated by the second feature processing network performing second feature extraction processing on the attack-type sample images;

[0095] In one or more embodiments of this specification, the second feature processing network can directly perform second feature processing on the input attack sample image of the initial image detection model to obtain second image features;

[0096] In one or more embodiments of this specification, the second feature processing network can perform second feature processing on each group of second object region blocks of the initial image detection model to obtain second image features;

[0097] S212: Determine a first network self-supervised loss for the first feature processing network based on the first image features, and determine a second network self-supervised loss for the second feature processing network based on the second image features;

[0098] According to some embodiments, the network self-supervised loss can be an independent loss measure for the first feature processing network and the second feature processing network, respectively. That is, during the backpropagation learning process, the first network self-supervised loss corresponding to the first feature processing network and the second network self-supervised loss corresponding to the second feature processing network are determined respectively. The network self-supervised loss can be robust to image forgery and hyperparameter changes.

[0099] In one feasible implementation, the network self-supervised loss for the first feature processing network can be calculated based on any pair of first image features.

[0100] Schematic illustration: During model training, a fourth image feature corresponding to each third image feature can be determined from all first image features, wherein the third image feature is one of all first image features, and the fourth image feature is an image feature other than the third image feature among all first image features; and a first network self-supervised loss for the first feature processing network is determined based on the third image feature and the fourth image feature using a first loss calculation formula.

[0101] The first loss calculation formula satisfies the following formula:

[0102]

[0103] Wherein L1 is the first network self-supervised loss, f_a is the third image feature, f_b is the fourth image feature, and n1 is the total number of the first image features.

[0104] In some embodiments, model training is typically performed in multiple batches of samples, where n1 can be the total number of first image features corresponding to the current batch of samples. In some embodiments, a live sample image can correspond to multiple first object region blocks, while the feature extraction network typically performs feature processing on the multiple first object region blocks to correspond to only one first image feature.

[0105] Understandably, the self-supervised similarity form shown in the first loss calculation formula is used to constrain the distance between any two extracted image features in the angular space for the first feature processing network, thereby refining the similarity supervision between sample images into similarity supervision between region blocks.
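As a rough illustration of pairwise similarity supervision in angular space, the sketch below averages one minus the cosine similarity over every ordered pair of distinct features. The exact form of the first loss calculation formula is not reproduced in this text, so the choice of cosine similarity, the normalization by n(n - 1), and the function names are assumptions of the sketch, not the formula of this specification.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def pairwise_self_supervised_loss(features):
    """Assumed form: mean of (1 - cosine similarity) over all pairs a != b."""
    n = len(features)
    if n < 2:
        return 0.0
    total = 0.0
    for a in range(n):
        for b in range(n):
            if a != b:
                total += 1.0 - cosine_similarity(features[a], features[b])
    return total / (n * (n - 1))


# Features pointing the same way give zero loss; spread-out features are penalized.
aligned = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
loss_aligned = pairwise_self_supervised_loss(aligned)
loss_spread = pairwise_self_supervised_loss(spread)
```

Under this assumed form, minimizing the loss pulls the features produced by one network toward a common direction, which matches the stated goal of constraining the angular distance between any two extracted features.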

[0106] In one feasible implementation, the network self-supervised loss for the second feature processing network can be calculated based on any pair of second image features.

[0107] The step of determining the second network self-supervised loss for the second feature processing network based on the second image features can be:

[0108] Determine a sixth image feature corresponding to each fifth image feature from all the second image features, wherein the fifth image feature is one of all the second image features, and the sixth image feature is an image feature other than the fifth image feature from all the second image features; determine a second network self-supervised loss for the second feature processing network based on the fifth image feature and the sixth image feature using a second loss calculation formula;

[0109] The second loss calculation formula satisfies the following formula:

[0110]

[0111] Wherein L2 is the second network self-supervised loss, f_c is the fifth image feature, f_d is the sixth image feature, and n2 is the total number of the second image features.

[0112] In some embodiments, model training is typically performed in multiple batches of samples, where n2 can be the total number of second image features corresponding to the current batch of samples. In some embodiments, attack-type sample images can correspond to multiple second object region blocks, while the feature extraction network typically performs feature processing on the multiple second object region blocks to correspond to only one second image feature.

[0113] Understandably, the self-supervised similarity form shown in the second loss calculation formula above is used to constrain the distance in the angular space between any two image features extracted by the second feature processing network, thereby refining the similarity supervision between sample images into similarity supervision between region blocks. This form of network self-supervised loss can be robust to image forgery and hyperparameter changes.

[0114] S214: Determine the asymmetric angle interval loss corresponding to the first feature processing network and the second feature processing network based on the first image features and the second image features.

[0115] The asymmetric angular interval loss can be determined from the first asymmetric angular interval loss corresponding to the first feature processing network and the second asymmetric angular interval loss corresponding to the second feature processing network.

[0116] In one or more embodiments of this specification, the first image feature and the second image feature can be in the form of a feature map, such as "C*H*W", where C is a channel parameter, H represents a length or height parameter, and W represents a width parameter; in some embodiments, the first image feature and the second image feature can be understood as high-dimensional feature vectors mapped to angular space.

[0117] Understandably, during the model training phase, the initial image detection model employs an asymmetric classification training method, using asymmetric angular interval loss to adjust model parameters. The first and second feature processing networks of the initial image detection model are trained on sample images of different classes, respectively. Compared to related technologies that train on different classes of sample images based on the same model structure for feature processing, this approach can significantly improve the model's generalization ability and training efficiency. This method of training separate feature processing model structures based on different classes of sample images achieves an end-to-end training framework for different classes of images, greatly optimizing the existing training framework. In this way, during the actual training phase, the liveness and attack image features extracted by the model can be more discriminative. At the same time, the asymmetric angular interval loss constrains the model parameters. During the model training process, the integrated parameters of the feature processing part can achieve good cross-domain robustness.

[0118] The following explains the calculation process for asymmetric angular spacing loss:

[0119] A2: The electronic device can acquire at least one live sample image corresponding to a first category label and a first image feature, determine the live class angle interval loss and the live class angle interval loss bias corresponding to the first category label and the first image feature in the angle space, and determine a first asymmetric angle interval loss based on the live class angle interval loss and the live class angle interval loss bias.

[0120] The first category label can be understood as the sample label annotated for the live sample image during the model training phase. The first category label has been determined during the sample collection phase, and the first category label at least indicates that the classification category of the live sample image is live.

[0121] The liveness-class angular spacing loss is further obtained by mapping the first category label and the first image features to the same angular space and calculating their angular distance based on the spacing loss calculation formula.

[0122] The live class angular interval loss bias is equivalent to a loss bias for the live class angular interval, which is used to perturb the model during the model training and learning process to improve the robustness of the model.

[0123] Indicatively, the following can be achieved: obtaining the liveness class angular interval parameters and angular scaling hyperparameters; determining the first label parameter features corresponding to the first category label based on the first feature processing network; determining the liveness class angular interval loss corresponding to the first category label and the first image features in the angular space using a third interval loss calculation formula based on the liveness class angular interval parameters, angular scaling hyperparameters, and the first label parameter features; and determining the liveness class angular interval loss bias corresponding to the first category label and the first image features in the angular space using a third interval bias loss calculation formula based on the liveness class angular interval parameters, angular scaling hyperparameters, and the first label parameter features.

[0124] The formula for calculating the third interval loss satisfies the following formula:

[0125]

[0126] Wherein L_3a is the liveness class angular interval loss; s is the angular scaling hyperparameter; y_i is the first category label of the i-th liveness sample image corresponding to the first image feature; T is the feature parameter in the angular space; f_i is the first image feature corresponding to the i-th liveness sample image; the first label parameter feature is the one corresponding to the i-th first category label in the angular space; the label classification angle is the angle in the angular space between the first image feature of the liveness class image and its first category label; and m_l is the liveness class angular interval parameter;

[0127] Understandably, the liveness class angular interval parameter and the angular scaling hyperparameter have already been determined for the model training and learning process and are used for the loss constraint.

[0128] The liveness class angular interval parameter m_l is used to control the image features between categories during the model parameter adjustment phase, with the loss constrained by the interval m_l; the angular scaling hyperparameter s is equivalent to an angular scaling factor, which is used to effectively distinguish distribution differences and improve convergence speed.

[0129] The first label parameter features are the model parameters of the first feature processing network when the first feature processing network measures the characteristics of the label and the first image features, such as model weight parameters, neuron feature parameters, etc.

[0130] Understandably, the first feature processing network takes as input a liveness sample image, or several block regions extracted from a liveness sample image, and outputs the first image feature. During the calculation of the model loss, the first image feature and the first category label are mapped to the angular space through the initial image detection model. The first label parameter feature corresponding to the first category label in the angular space can also be obtained, so that the label classification angle between the first label parameter feature and the first image feature in the angular space can be calculated, and the liveness class angular interval loss can then be determined based on the above third interval loss calculation formula.
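One common way to obtain an angle between a label parameter vector and an image feature is the arccosine of their normalized dot product. The sketch below illustrates that approach on a flattened C*H*W feature map. Treating the label parameter feature as a single weight vector, and the specific toy values, are assumptions of this sketch rather than details confirmed by this specification.

```python
import math


def flatten_feature_map(feature_map):
    """Flatten a C x H x W nested list into a single feature vector."""
    return [v for channel in feature_map for row in channel for v in row]


def label_classification_angle(label_weight, feature):
    """Angle in radians between a label parameter vector and an image feature."""
    dot = sum(w * f for w, f in zip(label_weight, feature))
    norm_w = math.sqrt(sum(w * w for w in label_weight))
    norm_f = math.sqrt(sum(f * f for f in feature))
    cosine = dot / (norm_w * norm_f)
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))


# Toy feature map with C=2, H=2, W=2 and a hypothetical label parameter vector.
feature_map = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
feature = flatten_feature_map(feature_map)
liveness_weight = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
theta = label_classification_angle(liveness_weight, feature)
```

Because both vectors are normalized before the arccosine, the angle depends only on direction, not on feature magnitude, which is what allows an interval parameter to be applied directly in angular space.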

[0131] Furthermore, the electronic device uses a third interval bias loss calculation formula to determine the live class angle interval loss bias corresponding to the first category label and the first image feature in the angle space based on the live class angle interval parameter, the angle scaling hyperparameter, and the first label parameter features.

[0132] The formula for calculating the third interval bias loss satisfies the following formula:

[0133]

[0134] Wherein L_3b is the liveness class angular interval loss bias; j is the index of the j-th label among all the first category labels; the first label parameter feature is the one corresponding to the j-th first category label in the angular space; and the label classification angle is the angle in the angular space between the j-th first category label and the i-th liveness sample image;

[0135] Furthermore, during model training, after determining the liveness class angular interval loss and the liveness class angular interval loss bias, the electronic device performs the step of determining the first asymmetric angular interval loss based on the liveness class angular interval loss and the liveness class angular interval loss bias, which may be:

[0136] The first asymmetric angle interval loss is determined using the first asymmetric calculation formula based on the aforementioned living-type angle interval loss and the living-type angle interval loss bias.

[0137] The first asymmetric calculation formula satisfies the following formula:

[0138]

[0139] Wherein L_3a is the liveness class angular interval loss, L_3b is the liveness class angular interval loss bias, and L_5a is the first asymmetric angular interval loss;

[0140] A4: The electronic device can acquire at least one second category label and second image feature corresponding to the attack type sample image, determine the attack type angle interval loss and attack type angle interval loss bias corresponding to the second category label and the second image feature in the angle space, and determine the second asymmetric angle interval loss based on the attack type angle interval loss and attack type angle interval loss bias.

[0141] The second category label can be understood as the sample label annotated for the attack-type sample image during the model training phase. The second category label has been determined during the sample collection phase, and the second category label at least indicates that the attack-type sample image belongs to the attack category.

[0142] The attack-type angular interval loss is obtained by further calculating the angular distance between the second category label and the second image features by mapping them to the same angular space using the interval loss calculation formula.

[0143] The attack-type angle interval loss bias is equivalent to a loss bias for the attack-type angle interval, which is used to perturb the model during the model training and learning process to improve the robustness of the model.

[0144] In a schematic manner, the attack class angle interval parameter and angle scaling hyperparameter are obtained; the second label parameter feature corresponding to the second category label is determined based on the second feature processing network; the attack class angle interval loss corresponding to the second category label and the second image feature in the angle space is determined by the fourth interval loss calculation formula based on the attack class angle interval parameter, angle scaling hyperparameter and the second label parameter feature; and the attack class angle interval loss bias corresponding to the second category label and the second image feature in the angle space is determined by the fourth interval bias loss calculation formula based on the attack class angle interval parameter, angle scaling hyperparameter and the second label parameter feature.

[0145] The formula for calculating the fourth interval loss satisfies the following formula:

[0146]

[0147] Wherein L_4a is the attack class angular interval loss; s is the angular scaling hyperparameter; Y_i is the second category label of the i-th attack class sample image corresponding to the second image feature; T is the feature parameter in the angular space; F_i is the second image feature corresponding to the i-th attack class sample image; the second label parameter feature is the one corresponding to the i-th second category label in the angular space; the label classification angle is the angle in the angular space between the second image feature of the attack class image and its second category label; and m_S is the attack class angular interval parameter;

[0148] Understandably, the attack class angular interval parameter and the angular scaling hyperparameter have already been determined for the model training and learning process and are used for the loss constraint.

[0149] The attack class angular interval parameter m_S is used to control the image features between categories during the model parameter adjustment phase, with the loss constrained by the interval m_S; the angular scaling hyperparameter s is equivalent to an angular scaling factor, which is used to effectively distinguish different classes and the distribution differences between classes, thereby improving the convergence speed of the network.

[0150] The second label parameter features are the model parameters of the second feature processing network when the second feature processing network measures the characteristics of the label and the second image features, such as model weight parameters, neuron feature parameters, etc.

[0151] Furthermore, the formula for calculating the fourth interval bias loss satisfies the following formula:

[0152]

[0153] Wherein L_4b is the attack class angular interval loss bias; J is the index of the J-th label among all the second category labels; the second label parameter feature is the one corresponding to the J-th second category label in the angular space; and the label classification angle is the angle in the angular space between the J-th second category label and the i-th attack class sample image.

[0154] Furthermore, during model training, after determining the attack class angular interval loss and the attack class angular interval loss bias, the electronic device performs the step of determining the second asymmetric angular interval loss based on the attack class angular interval loss and the attack class angular interval loss bias, which may be:

[0155] The second asymmetric angle interval loss is determined using the second asymmetric calculation formula based on the attack-type angle interval loss and the attack-type angle interval loss bias.

[0156] The second asymmetric calculation formula satisfies the following formula:

[0157]

[0158] Wherein L_4a is the attack class angular interval loss, L_4b is the attack class angular interval loss bias, and L_5b is the second asymmetric angular interval loss;

[0159] A6: The electronic device determines the asymmetric angle interval loss corresponding to the first feature processing network and the second feature processing network based on the first asymmetric angle interval loss and the second asymmetric angle interval loss.

[0160] In one feasible implementation, the electronic device performing the determination of the asymmetric angle interval loss corresponding to the first feature processing network and the second feature processing network based on the first asymmetric angle interval loss and the second asymmetric angle interval loss can be:

[0161] Based on the first asymmetric angle interval loss corresponding to the live sample image and the second asymmetric angle interval loss corresponding to the attack sample image, the sixth asymmetric loss calculation formula is used to determine the asymmetric angle interval loss for the first feature processing network and the second feature processing network.

[0162] The sixth asymmetric loss calculation formula satisfies the following formula:

[0163]

[0164] Wherein L6 is the asymmetric angle interval loss, N is the total number of sample images across the liveness class sample images and the attack class sample images, L_5a is the first asymmetric angle interval loss, and L_5b is the second asymmetric angle interval loss.

[0165] In the above steps, an asymmetric angle interval loss is introduced for the first feature processing network and the second feature processing network. The asymmetric angle interval loss calculated in this manner constrains the training of both networks in the feature processing stage and improves the model's ability to distinguish image features within and between classes. The resulting high-quality image features help the remaining parts of the initial image detection model (such as the classifier) classify accurately, thereby improving the model's cross-domain generalization ability.
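The formula images for the angle interval losses are not reproduced in this text, so the following is only an illustrative sketch under stated assumptions. It uses a generic additive angular margin loss, in which a margin term for the labelled class plays the role of the "angle interval loss" and a sum over the other classes plays the role of the "loss bias", with a different margin for the liveness class and the attack class. The function names, the exact functional form, and the averaging over samples are assumptions and are not taken from the specification.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def margin_loss(feature, label, label_params, margin, scale):
    """Additive angular margin loss for one sample (assumed form)."""
    # Angle cosines between the feature and each label parameter feature.
    cosines = [max(-1.0, min(1.0, cosine(w, feature))) for w in label_params]
    theta = math.acos(cosines[label])
    # Term for the labelled class, with the class-specific margin added to the angle.
    interval_term = math.exp(scale * math.cos(theta + margin))
    # Sum over all other labels (no margin applied).
    bias_term = sum(math.exp(scale * c) for j, c in enumerate(cosines) if j != label)
    return -math.log(interval_term / (interval_term + bias_term))


def asymmetric_margin_loss(live, attack, live_params, attack_params,
                           m_live, m_attack, scale):
    """Mean per-sample loss, using m_live for liveness and m_attack for attack samples."""
    losses = [margin_loss(f, y, live_params, m_live, scale) for f, y in live]
    losses += [margin_loss(f, y, attack_params, m_attack, scale) for f, y in attack]
    return sum(losses) / len(losses)
```

In this sketch, choosing m_live different from m_attack is what makes the loss asymmetric: the class with the larger margin must be separated from the other class by a wider angle before its loss becomes small.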

[0166] S216: Adjust the model parameters of the initial image detection model based on the asymmetric angle interval loss and network self-supervised loss to obtain the image detection model corresponding to the initial image detection model.

[0167] In one or more embodiments of this specification, the network self-supervised loss includes a first network self-supervised loss and a second network self-supervised loss.

[0168] During model training, the asymmetric angle interval loss and the network self-supervised loss of the initial image detection model are calculated, and the overall model loss of the initial image detection model is determined based on at least these two losses. Based on this overall model loss, the backpropagation algorithm is used to adjust the model parameters of the initial image detection model until the model meets the model termination condition. The trained image detection model is then obtained and can be deployed in an actual image detection scenario to perform image detection on the corresponding target image data.

[0169] Illustratively, the electronic device may adjust the model parameters of the initial image detection model based on the asymmetric angle interval loss and the network self-supervised loss as follows:

[0170] Obtain a first weighting factor for the asymmetric angle interval loss and a second weighting factor for the network self-supervised loss. Based on the first weighting factor, the second weighting factor, the asymmetric angle interval loss, and the network self-supervised loss, use the seventh loss calculation formula to determine the model comprehensive loss. Based on the model comprehensive loss, adjust the model parameters of the initial image detection model.

[0171] The seventh loss calculation formula satisfies the following formula:

[0172] L7 = A · L_ASym + B · L_ASim

[0173] Wherein L7 is the model comprehensive loss, A is the first weighting factor, L_ASym is the asymmetric angle interval loss, B is the second weighting factor, and L_ASim is the network self-supervised loss.
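As a small worked example of the seventh loss calculation formula, the weighted sum can be sketched as follows. The function name and the numeric values are hypothetical and chosen only for illustration; the specification does not state particular weighting factors.

```python
def comprehensive_loss(asym_loss, self_sup_loss, weight_a, weight_b):
    """Model comprehensive loss: L7 = A * L_ASym + B * L_ASim."""
    return weight_a * asym_loss + weight_b * self_sup_loss


# Hypothetical values: A = 1.0, B = 0.5, L_ASym = 0.8, L_ASim = 0.2.
example = comprehensive_loss(0.8, 0.2, weight_a=1.0, weight_b=0.5)
```

With these assumed values the asymmetric term contributes 0.8 and the self-supervised term contributes 0.1 to the comprehensive loss.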

[0174] In one or more embodiments of this specification, the first and second feature processing networks of the initial image detection model are independently trained using liveness sample images and attack sample images, respectively. During model training, the two independently trained first and second feature processing networks can share model weight parameters. Independent training can quickly enable the feature processing network to accurately process features of images of a certain class. Sharing model weight parameters only needs to be performed during a certain period of model training to enable the network to have cross-domain processing capabilities, thereby improving the model's generalization ability and training efficiency. In this way, during the actual training phase, the liveness image features and attack image features extracted by the model can be more discriminative. At the same time, the model parameters are constrained by the asymmetric angle interval loss. During the model training process, the feature processing parameters are integrated through weight parameter sharing, which can achieve good cross-domain robustness.

[0175] Illustratively, as shown in Figure 6, which is a schematic diagram of a model processing scenario involved in this specification, different types of attack class sample images and liveness class sample images are collected and input into the initial image detection model. The initial image detection model extracts several region blocks from the attack class sample images and the liveness class sample images. The first feature processing network and the second feature processing network of the initial image detection model are then controlled to train independently, by category, on the region blocks from the liveness class sample images and the attack class sample images, respectively. During model training, the asymmetric angle interval loss and the network self-supervised loss of the first and second feature processing networks are calculated, and the overall model loss of the initial image detection model is determined based on at least these two losses. In addition, the model weight parameters of the two independently trained feature processing networks are shared at the reference training rounds. Based on the overall model loss, the backpropagation algorithm is used to adjust the model parameters of the initial image detection model until the model meets the model termination condition. The trained image detection model can then be deployed online in a real image detection scenario to perform image detection on the corresponding target image data.

[0176] Furthermore, during model training, at least one reference training epoch can be determined for the first feature processing network and the second feature processing network, and model weight parameter sharing processing can be performed on the first feature processing network and the second feature processing network based on the reference training epoch.

[0177] The reference training round is used to indicate when model weight parameters are shared between the first feature processing network and the second feature processing network. Assuming that the determined reference training round is k (k is a natural number), the model weight parameters of the first feature processing network and the second feature processing network are shared during the k-th round of model training. There may be multiple reference training rounds.

[0178] Optionally, the reference training rounds can be fixed, that is, preset in advance. For example, the first feature processing network and the second feature processing network can perform model weight parameter sharing processing at rounds 20, 30 and 60.

[0179] Optionally, the reference training epoch can be dynamic. It can be determined based on at least one of the current model's overall loss, asymmetric angle-margin loss, and network self-supervised loss. For example, a loss value reference range or threshold can be set for at least one of the model's overall loss, asymmetric angle-margin loss, and network self-supervised loss. If at least one of the current epoch's model's overall loss, asymmetric angle-margin loss, and network self-supervised loss meets the loss value reference range or threshold, the next epoch is determined to be a reference training epoch. Conversely, if at least one of the current epoch's model's overall loss, asymmetric angle-margin loss, and network self-supervised loss does not meet the loss value reference range or threshold, the next epoch is determined not to be a reference training epoch. In practical applications, at least one of the current epoch's model's overall loss, asymmetric angle-margin loss, and network self-supervised loss can be obtained in real time and then matched with the loss value reference range or threshold to determine whether the next epoch is a reference training epoch.
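The fixed and dynamic ways of determining a reference training round can be sketched as follows. This is a minimal illustration, assuming that a loss "meets" its threshold when it is at or below that threshold; the function names, the dictionary keys and the threshold values are hypothetical.

```python
def is_fixed_reference_round(current_round, fixed_rounds=(20, 30, 60)):
    """Fixed scheme: a round is a reference round if it is in the preset list."""
    return current_round in fixed_rounds


def next_round_is_reference(current_losses, thresholds):
    """Dynamic scheme: the next round is a reference round if at least one
    monitored loss of the current round is at or below its threshold."""
    return any(
        name in current_losses and current_losses[name] <= limit
        for name, limit in thresholds.items()
    )


# Hypothetical monitoring of the comprehensive loss only.
share_next = next_round_is_reference(
    {"comprehensive": 0.4, "asymmetric": 0.9},
    {"comprehensive": 0.5},
)
```

A reference range instead of a single threshold could be handled the same way by replacing the comparison with a lower and upper bound check.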

[0180] Illustratively, the electronic device may perform the model weight parameter sharing processing on the first feature processing network and the second feature processing network based on the reference training rounds as follows:

[0181] If the number of training rounds of the initial image detection model matches the number of reference training rounds, then the first network feature parameters corresponding to the first feature processing network and the second network feature parameters corresponding to the second feature processing network are obtained, and the model weight parameter sharing processing of the first feature processing network and the second feature processing network is performed based on the first network feature parameters and the second network feature parameters.

[0182] The number of training rounds can be understood as the current number of training rounds of the initial image detection model during the model training process.

[0183] The first network feature parameters can be understood as the model parameters of the first feature processing network, such as weights, gradient values, network unit state values, etc. The second network feature parameters can be understood as the model parameters of the second feature processing network, such as weights, gradient values, network unit state values, etc.

[0184] Optionally, the model weight parameter sharing process can be as follows: perform parameter averaging on the first network feature parameters and the second network feature parameters to obtain the processed reference network feature parameters, and then update the parameters of the first feature processing network and the second feature processing network using the reference network feature parameters.
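The parameter-averaging variant of weight sharing, triggered when the current training round matches a reference training round, can be sketched as follows. This is a simplified illustration that represents each network's parameters as a dictionary of flat lists of floats; that representation and the function names are assumptions made for the example.

```python
def average_parameters(first_params, second_params):
    """Element-wise mean of two parameter dictionaries with matching names and sizes."""
    if first_params.keys() != second_params.keys():
        raise ValueError("parameter names differ between the two networks")
    averaged = {}
    for name in first_params:
        a, b = first_params[name], second_params[name]
        if len(a) != len(b):
            raise ValueError(f"size mismatch for parameter {name!r}")
        averaged[name] = [(x + y) / 2.0 for x, y in zip(a, b)]
    return averaged


def share_weights(first_params, second_params, current_round, reference_rounds):
    """Return updated parameters for both networks; unchanged outside reference rounds."""
    if current_round not in reference_rounds:
        return first_params, second_params
    reference = average_parameters(first_params, second_params)
    # Give each network its own copy so that later updates stay independent.
    return ({k: list(v) for k, v in reference.items()},
            {k: list(v) for k, v in reference.items()})
```

Copying the averaged parameters into each network, rather than letting both networks point at the same object, keeps the two networks independently trainable after the sharing round.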

[0185] Optionally, the model weight parameter sharing process can be as follows: sum the first network feature parameters and the second network feature parameters to obtain the processed summed network feature parameters, then downsample the summed network feature parameters to obtain the processed reference network feature parameters, and then update the parameters of the first feature processing network and the second feature processing network using the reference network feature parameters.

[0186] In one or more embodiments of this specification, the electronic device inputs acquired liveness sample images and attack sample images into an initial image detection model for model training. By controlling the initial image detection model to perform first feature processing on the liveness sample images using a first feature processing network and second feature processing on the attack sample images using a second feature processing network, and adjusting the model parameters based on the asymmetric angle interval loss and network self-supervised loss corresponding to the first and second feature processing networks, an image detection model corresponding to the initial image detection model can be obtained. This avoids the phenomenon of low robustness of the model used for image detection, and can better assist the model in obtaining high-quality image features during the feature processing stage based on the asymmetric angle interval loss and network self-supervised loss, while improving the cross-domain robustness of the model at the data level. Furthermore, the constraint method of using the asymmetric angle interval loss assists the model in achieving end-to-end generalization optimization. Finally, it can improve the image detection effect of the model in complex application scenarios, ensuring the robustness and universality of liveness attack detection.

[0187] The image processing apparatus provided in this specification is described in detail below with reference to Figure 7. It should be noted that the image processing apparatus shown in Figure 7 is used to execute the methods of the embodiments shown in Figures 1-6 of this specification. For ease of explanation, only the parts relevant to this specification are shown; for specific technical details not disclosed, please refer to the embodiments shown in Figures 1-6 of this specification.

[0188] Please refer to Figure 7, which is a schematic structural diagram of the image processing apparatus described in this specification. The image processing apparatus 1 can be implemented as all or part of a user terminal through software, hardware, or a combination of both. According to some embodiments, the image processing apparatus 1 includes an image acquisition module 11, a model training module 12, and a parameter adjustment module 13, specifically used for:

[0189] Image acquisition module 11 is used to acquire liveness sample images and attack sample images for the initial image detection model, wherein the initial image detection model includes a first feature processing network and a second feature processing network;

[0190] Model training module 12 is used to input the live sample image and the attack sample image into the initial image detection model for model training, so as to control the initial image detection model to perform first feature processing on the live sample image using a first feature processing network, and to control the initial image detection model to perform second feature processing on the attack sample image using a second feature processing network;

[0191] The parameter adjustment module 13 is used to obtain the asymmetric angle interval loss and network self-supervised loss corresponding to the first feature processing network and the second feature processing network, and adjust the model parameters of the initial image detection model based on the asymmetric angle interval loss and network self-supervised loss to obtain the image detection model corresponding to the initial image detection model.

[0192] Optionally, as shown in Figure 8, the model training module 12 includes:

[0193] The first network processing unit 121 is used to control the initial image detection model to extract object regions from the live sample image to obtain at least one set of first object region blocks, and to use a first feature processing network to perform first feature extraction processing on the first object region blocks.

[0194] The second network processing unit 122 is configured to: control the initial image detection model to extract object regions from the attack-type sample image to obtain at least one set of second object region blocks, and use a second feature processing network to perform second feature extraction processing on the second object region blocks;

[0195] Optionally, the first network processing unit 121 is configured to: control the initial image detection model to randomly extract object regions from the live sample image using a random block extraction method, thereby obtaining three sets of first object region blocks;

[0196] Optionally, the second network processing unit 122 is configured to: control the initial image detection model to randomly extract object regions from the attack sample image using a random block extraction method, thereby obtaining three sets of second object region blocks.
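The random block extraction described above can be sketched as follows. This is a minimal illustration that treats an image as a 2D list of pixel values and crops square blocks at random positions. The block size, the square shape, and the function name are assumptions, since the specification only states that three sets of object region blocks are randomly extracted; locating the object region itself is not shown.

```python
import random


def extract_random_blocks(image, block_size, num_blocks=3, rng=None):
    """Crop num_blocks square blocks of side block_size at random positions."""
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    if block_size > height or block_size > width:
        raise ValueError("block_size exceeds the image dimensions")
    blocks = []
    for _ in range(num_blocks):
        # randint is inclusive on both ends, so every block fits inside the image.
        top = rng.randint(0, height - block_size)
        left = rng.randint(0, width - block_size)
        blocks.append([row[left:left + block_size]
                       for row in image[top:top + block_size]])
    return blocks
```

Passing a seeded `random.Random` instance as `rng` makes the extraction reproducible, which can be convenient when comparing training runs.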

[0197] Optionally, as shown in Figure 9, the parameter adjustment module 13 includes:

[0198] The image feature acquisition unit 131 is configured to acquire first image features of the first feature processing network for at least one of the live sample images, wherein the first image features are image features generated by the first feature processing network performing a first feature extraction process on the live sample images; and to acquire second image features of the second feature processing network for at least one of the attack sample images, wherein the second image features are image features generated by the second feature processing network performing a second feature extraction process on the attack sample images.

[0199] The self-supervised loss determination unit 132 is used to determine a first network self-supervised loss for the first feature processing network based on the first image features, and to determine a second network self-supervised loss for the second feature processing network based on the second image features.

[0200] The asymmetric loss determination unit 133 is used to determine the asymmetric angle interval loss corresponding to the first feature processing network and the second feature processing network based on the first image features and the second image features.

[0201] Optionally, the self-supervised loss determination unit 132 is used for:

[0202] Determine a fourth image feature corresponding to each third image feature from all the first image features, wherein the third image feature is one of all the first image features and the fourth image feature is an image feature other than the third image feature from all the first image features; determine a first network self-supervised loss for the first feature processing network based on the third image feature and the fourth image feature using a first loss calculation formula;

[0203] The first loss calculation formula satisfies the following formula:

[0204]

[0205] Wherein L1 is the first network self-supervised loss, f_a is the third image feature, f_b is the fourth image feature, and n1 is the total number of the first image features.

[0206] Optionally, the self-supervised loss determination unit 132 is used for:

[0207] Determine a sixth image feature corresponding to each fifth image feature from all the second image features, wherein the fifth image feature is one of all the second image features, and the sixth image feature is an image feature other than the fifth image feature from all the second image features; determine a second network self-supervised loss for the second feature processing network based on the fifth image feature and the sixth image feature using a second loss calculation formula;

[0208] The second loss calculation formula satisfies the following formula:

[0209]

[0210] Wherein L2 is the second network self-supervised loss, f_c is the fifth image feature, f_d is the sixth image feature, and n2 is the total number of the second image features.
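The first and second loss calculation formulas are not reproduced in this text. As an illustration only, one common self-supervised form that is consistent with the quantities named above (each feature paired with the other features from the same network) is a pairwise cosine-similarity consistency loss. The functional form and the function names below are assumptions, not the formulas of this specification.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_consistency_loss(features):
    """Mean of (1 - cosine similarity) over all ordered pairs of distinct features."""
    n = len(features)
    if n < 2:
        raise ValueError("at least two features are required")
    total = 0.0
    for a in range(n):
        for b in range(n):
            if a != b:
                total += 1.0 - cosine_similarity(features[a], features[b])
    return total / (n * (n - 1))
```

Under this assumed form, the same function would be applied separately to the first image features (giving a value in the role of L1) and to the second image features (in the role of L2); the loss is zero when all features from one network point in the same direction.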

[0211] Optionally, the asymmetric loss determination unit 133 is used for:

[0212] Obtain at least one first category label and first image feature corresponding to the live sample image, determine the live angular interval loss and live angular interval loss bias corresponding to the first category label and the first image feature in angular space, and determine the first asymmetric angular interval loss based on the live angular interval loss and live angular interval loss bias.

[0213] Obtain at least one second category label and second image feature corresponding to the attack type sample image, determine the attack type angle interval loss and attack type angle interval loss bias corresponding to the second category label and the second image feature in the angle space, and determine the second asymmetric angle interval loss based on the attack type angle interval loss and attack type angle interval loss bias.

[0214] Based on the first asymmetric angle interval loss and the second asymmetric angle interval loss, the asymmetric angle interval loss corresponding to the first feature processing network and the second feature processing network is determined.

[0215] Optionally, the asymmetric loss determination unit 133 is used for:

[0216] The method involves obtaining the liveness class angle interval parameters and angle scaling hyperparameters, determining the first label parameter features corresponding to the first category label based on the first feature processing network, determining the liveness class angle interval loss corresponding to the first category label and the first image features in the angle space using the third interval loss calculation formula based on the liveness class angle interval parameters, angle scaling hyperparameters, and the first label parameter features, and determining the liveness class angle interval loss bias corresponding to the first category label and the first image features in the angle space using the third interval bias loss calculation formula based on the liveness class angle interval parameters, angle scaling hyperparameters, and the first label parameter features.

[0217] The formula for calculating the third interval loss satisfies the following formula:

[0218]

[0219] Wherein L_3a is the liveness class angle interval loss; s is the angle scaling hyperparameter; y_i is the first category label of the i-th liveness sample image corresponding to the first image feature; T is the feature parameter in angular space; f_i is the first image feature corresponding to the i-th liveness sample image; m_l is the liveness class angle interval parameter; and the two remaining symbols in the formula denote, respectively, the first label parameter feature corresponding to the i-th first category label in angular space, and the label classification angle in angular space between the liveness image corresponding to the first image feature and the first category label;

[0220] The formula for calculating the third interval bias loss satisfies the following formula:

[0221]

[0222] Wherein L_3b is the liveness class angle interval loss bias, and j is the label index of the j-th label among all the first category labels. The two remaining symbols in the formula denote, respectively, the first label parameter feature corresponding to the j-th first category label in angular space, and the label classification angle in angular space between the j-th first category label and the i-th liveness sample image;

[0223] Optionally, the asymmetric loss determination unit 133 is used for:

[0224] The attack class angle interval parameter and angle scaling hyperparameter are obtained. Based on the second feature processing network, the second label parameter feature corresponding to the second category label is determined. Based on the attack class angle interval parameter, angle scaling hyperparameter and the second label parameter feature, the attack class angle interval loss corresponding to the second category label and the second image feature in the angle space is determined by the fourth interval loss calculation formula. Based on the attack class angle interval parameter, angle scaling hyperparameter and the second label parameter feature, the attack class angle interval loss bias corresponding to the second category label and the second image feature in the angle space is determined by the fourth interval bias loss calculation formula.

[0225] The formula for calculating the fourth interval loss satisfies the following formula:

[0226]

[0227] Wherein L_4a is the attack class angle interval loss; s is the angle scaling hyperparameter; Y_i is the second category label of the i-th attack class sample image corresponding to the second image feature; T is the feature parameter in angular space; F_i is the second image feature corresponding to the i-th attack class sample image; m_S is the attack class angle interval parameter; and the two remaining symbols in the formula denote, respectively, the second label parameter feature corresponding to the i-th second category label in angular space, and the label classification angle in angular space between the attack class image corresponding to the second image feature and the second category label;

[0228] The formula for calculating the fourth interval bias loss satisfies the following formula:

[0229]

[0230] Wherein L_4b is the attack class angle interval loss bias, and J is the label index of the J-th label among all the second category labels. The two remaining symbols in the formula denote, respectively, the second label parameter feature corresponding to the J-th second category label in angular space, and the label classification angle in angular space between the J-th second category label and the i-th attack class sample image.

[0231] Optionally, the asymmetric loss determination unit 133 is used for:

[0232] The first asymmetric angle interval loss is determined using the first asymmetric calculation formula based on the aforementioned living-type angle interval loss and the living-type angle interval loss bias.

[0233] The first asymmetric calculation formula satisfies the following formula:

[0234]

[0235] Wherein L_3a is the liveness class angle interval loss, L_3b is the liveness class angle interval loss bias, and L_5a is the first asymmetric angle interval loss;

[0236] Optionally, the asymmetric loss determination unit 133 is used for:

[0237] The second asymmetric angle interval loss is determined using the second asymmetric calculation formula based on the attack-type angle interval loss and the attack-type angle interval loss bias.

[0238] The second asymmetric calculation formula satisfies the following formula:

[0239]

[0240] Wherein L_4a is the attack class angle interval loss, L_4b is the attack class angle interval loss bias, and L_5b is the second asymmetric angle interval loss.

[0241] Optionally, the asymmetric loss determination unit 133 is used for:

[0242] Based on the first asymmetric angle interval loss corresponding to the live sample image and the second asymmetric angle interval loss corresponding to the attack sample image, the sixth asymmetric loss calculation formula is used to determine the asymmetric angle interval loss for the first feature processing network and the second feature processing network.

[0243] The sixth asymmetric loss calculation formula satisfies the following formula:

[0244]

[0245] Wherein L6 is the asymmetric angle interval loss, N is the total number of sample images across the liveness class sample images and the attack class sample images, L_5a is the first asymmetric angle interval loss, and L_5b is the second asymmetric angle interval loss.

[0246] Optionally, the parameter adjustment module 13 is used for:

[0247] Obtain a first weighting factor for the asymmetric angle interval loss and a second weighting factor for the network self-supervised loss. Based on the first weighting factor, the second weighting factor, the asymmetric angle interval loss, and the network self-supervised loss, use the seventh loss calculation formula to determine the model comprehensive loss. Based on the model comprehensive loss, adjust the model parameters of the initial image detection model.

[0248] The seventh loss calculation formula satisfies the following formula:

[0249] L7 = A · L_ASym + B · L_ASim

[0250] Wherein L7 is the model comprehensive loss, A is the first weighting factor, L_ASym is the asymmetric angle interval loss, B is the second weighting factor, and L_ASim is the network self-supervised loss.

[0251] Optionally, the device 1 is configured to: during model training, determine at least one reference training epoch for the first feature processing network and the second feature processing network, and perform model weight parameter sharing processing on the first feature processing network and the second feature processing network based on the reference training epoch.

[0252] Optionally, the device 1 is configured to: if the number of training rounds of the initial image detection model matches the number of reference training rounds, then obtain the first network feature parameters corresponding to the first feature processing network and the second network feature parameters corresponding to the second feature processing network, and perform model weight parameter sharing processing on the first feature processing network and the second feature processing network based on the first network feature parameters and the second network feature parameters.

[0253] Optionally, the device 1 is used to: deploy the image detection model to the target transaction environment and acquire the target detection image in the target transaction environment;

[0254] The target detection image is input into the image detection model to extract the target image features corresponding to the target detection image, image detection is performed based on the target image features, and the image detection category for the target detection image is output, wherein the image detection category is either a liveness image category or an attack image category.

[0255] It should be noted that the image processing apparatus provided in the above embodiments is only illustrated by the division of the above functional modules when executing the image processing method. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the image processing apparatus and the image processing method embodiments provided in the above embodiments belong to the same concept, and the implementation process can be found in the method embodiments, which will not be repeated here.

[0256] The serial numbers in this specification are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.

[0257] In one or more embodiments of this specification, the electronic device inputs acquired liveness sample images and attack sample images into an initial image detection model for model training. By controlling the initial image detection model to perform first feature processing on the liveness sample images using a first feature processing network and second feature processing on the attack sample images using a second feature processing network, and adjusting the model parameters based on the asymmetric angle interval loss and network self-supervised loss corresponding to the first and second feature processing networks, an image detection model corresponding to the initial image detection model can be obtained. This avoids the phenomenon of low robustness of the model used for image detection, and can better assist the model in obtaining high-quality image features during the feature processing stage based on the asymmetric angle interval loss and network self-supervised loss, while improving the cross-domain robustness of the model at the data level. Furthermore, the constraint method of using the asymmetric angle interval loss assists the model in achieving end-to-end generalization optimization. Finally, it can improve the image detection effect of the model in complex application scenarios, ensuring the robustness and universality of liveness attack detection.

[0258] This specification also provides a computer storage medium that can store a plurality of instructions, the instructions being adapted to be loaded by a processor to execute the image processing method of the embodiments shown in Figures 1-6 above. For the specific execution process, refer to the description of the embodiments shown in Figures 1-6, which is not repeated here.

[0259] This specification also provides a computer program product that stores at least one instruction, the at least one instruction being loaded by a processor to execute the image processing method of the embodiments shown in Figures 1-6. For the detailed execution process, refer to the description of the embodiments shown in Figures 1-6, which is not repeated here.

[0260] Please refer to Figure 10, which shows a structural block diagram of an electronic device provided in an exemplary embodiment of this specification. The electronic device in this specification may include one or more components such as a processor 110, a memory 120, an input device 130, an output device 140, and a bus 150. The processor 110, memory 120, input device 130, and output device 140 may be connected via the bus 150.

[0261] Processor 110 may include one or more processing cores. Processor 110 connects to various parts of the electronic device via various interfaces and lines, and performs various functions and processes data of electronic device 100 by running or executing instructions, programs, code sets, or instruction sets stored in memory 120, and by calling data stored in memory 120. Optionally, processor 110 may be implemented using at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA), or programmable logic array (PLA). Processor 110 may integrate one or more of the following: central processing unit (CPU), graphics processing unit (GPU), and modem. The CPU primarily handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the displayed content; and the modem handles wireless communication. It is understood that the modem may also not be integrated into processor 110 and may be implemented separately through a communication chip.

[0262] The memory 120 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 120 may include a non-transitory computer-readable storage medium. The memory 120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as touch functionality, sound playback functionality, image playback functionality, etc.), instructions for implementing the various method embodiments described below, etc. The operating system may be the Android system, including systems deeply developed based on the Android system, the iOS system developed by Apple Inc., including systems deeply developed based on the iOS system, or other systems. The data storage area may also store data created by the electronic device during use, such as phonebook data, audio and video data, chat log data, etc.

[0263] As shown in Figure 11, the memory 120 can be divided into operating system space and user space. The operating system runs in the operating system space, while native and third-party applications run in the user space. To ensure that different third-party applications can achieve good running performance, the operating system allocates corresponding system resources for each application. However, different application scenarios within the same third-party application have different requirements for system resources. For example, in local resource loading scenarios, third-party applications have high requirements for disk read speed; in animation rendering scenarios, third-party applications have high requirements for GPU performance. Since the operating system and third-party applications are independent of each other, the operating system often cannot promptly perceive the current application scenario of a third-party application, and therefore cannot adapt system resources to that specific scenario.

[0264] In order for the operating system to distinguish the specific application scenarios of third-party applications, it is necessary to establish data communication between the third-party applications and the operating system. This would allow the operating system to obtain the current scenario information of the third-party applications at any time, and then perform targeted system resource adaptation based on the current scenario.

[0265] Taking the Android operating system as an example, the programs and data stored in memory 120 are as shown in Figure 12. The memory 120 can store the Linux kernel layer 320, the system runtime library layer 340, the application framework layer 360, and the application layer 380. The Linux kernel layer 320, system runtime library layer 340, and application framework layer 360 belong to the operating system space, while the application layer 380 belongs to the user space. The Linux kernel layer 320 provides low-level drivers for various hardware components of the electronic device, such as display drivers, audio drivers, camera drivers, Bluetooth drivers, Wi-Fi drivers, and power management. The system runtime library layer 340 provides support for key features of the Android system through several C/C++ libraries. For example, the SQLite library provides database support, the OpenGL ES library provides 3D graphics support, and the WebKit library provides browser kernel support. The system runtime library layer 340 also provides the Android runtime library, which mainly provides core libraries that allow developers to write Android applications using the Java language. The application framework layer 360 provides various APIs that may be used when building applications, such as activity management, window management, view management, notification management, content provider, package management, call management, resource management, and location management. Developers can also use these APIs to build their own applications. At least one application runs in the application layer 380. These applications can be native applications that come with the operating system, such as contacts, SMS, clock, and camera apps, or third-party applications developed by third-party developers, such as games, instant messaging, and photo editing apps.

[0266] Taking the iOS operating system as an example, the programs and data stored in memory 120 are as shown in Figure 13. The iOS system includes: Core OS layer 420, Core Services layer 440, Media layer 460, and Cocoa Touch layer 480. Core OS layer 420 includes the operating system kernel, drivers, and low-level program frameworks. These low-level program frameworks provide hardware-level functionality for use by the program frameworks located in Core Services layer 440. Core Services layer 440 provides system services and/or program frameworks required by applications, such as the Foundation framework, account framework, advertising framework, data storage framework, network connectivity framework, geolocation framework, motion framework, etc. Media layer 460 provides applications with audiovisual interfaces, such as interfaces related to graphics and images, audio technology, video technology, and AirPlay (a wireless audio and video transmission technology). Cocoa Touch layer 480 provides various commonly used interface-related frameworks for application development and is responsible for user touch interaction on electronic devices. Examples include local notification services, remote push services, advertising frameworks, game tool frameworks, message user interface (UI) frameworks, UIKit frameworks, map frameworks, and so on.

[0267] The frameworks shown in Figure 13 include, but are not limited to, the Foundation framework in the Core Services layer 440 and the UIKit framework in the Cocoa Touch layer 480. The Foundation framework provides many basic object classes and data types, offers the most basic system services to all applications, and is independent of the UI. The UIKit framework, on the other hand, provides a basic UI class library for creating touch-based user interfaces. iOS applications can build their UI on the UIKit framework, which supplies the application infrastructure for building user interfaces, drawing, handling user interaction events, responding to gestures, and so on.

[0268] The methods and principles for implementing data communication between third-party applications and the operating system in the iOS system are analogous to those of the Android system and will not be repeated here.

[0269] The input device 130 is used to receive input instructions or data, and includes, but is not limited to, a keyboard, mouse, camera, microphone, or touch device. The output device 140 is used to output instructions or data, and includes, but is not limited to, a display device and a speaker. In one example, the input device 130 and the output device 140 can be combined into a touch screen, which is used to receive touch operations from the user using a finger, stylus, or any suitable object on or near it, and to display the user interface of various applications. The touch screen is usually located on the front panel of the electronic device. The touch screen can be designed as a full-screen, curved screen, or irregularly shaped screen. The touch screen can also be designed as a combination of a full-screen and a curved screen, or a combination of an irregularly shaped screen and a curved screen; this specification does not limit this.

[0270] In addition, those skilled in the art will understand that the structure of the electronic device shown in the above figures does not constitute a limitation on the electronic device. The electronic device may include more or fewer components than shown, or combine certain components, or have different component arrangements. For example, the electronic device may also include radio frequency circuits, input units, sensors, audio circuits, wireless fidelity (WiFi) modules, power supplies, Bluetooth modules, etc., which will not be described in detail here.

[0271] In this specification, the entity executing each step can be the electronic device described above. Optionally, the entity executing each step can be the operating system of the electronic device. The operating system can be Android, iOS, or other operating systems; this specification does not limit this.

[0272] The electronic device described in this specification may also be equipped with a display device. This display device can be any device capable of displaying information, such as a cathode ray tube (CRT) display, a light-emitting diode (LED) display, an e-ink screen, a liquid crystal display (LCD), or a plasma display panel (PDP). Users can use the display device on electronic device 101 to view displayed text, images, videos, and other information. The electronic device may be a smartphone, tablet computer, gaming device, AR (Augmented Reality) device, automobile, data storage device, audio playback device, video playback device, laptop, desktop computing device, or wearable device such as an electronic watch, electronic glasses, electronic helmet, electronic bracelet, electronic necklace, or electronic clothing.

[0273] In the electronic device shown in Figure 10, the processor 110 can be used to call the application program stored in the memory 120 and specifically perform the following operations:

[0274] Acquire liveness sample images and attack sample images for the initial image detection model, wherein the initial image detection model includes a first feature processing network and a second feature processing network;

[0275] The live sample image and the attack sample image are input into the initial image detection model for model training, so as to control the initial image detection model to perform first feature processing on the live sample image using a first feature processing network, and to control the initial image detection model to perform second feature processing on the attack sample image using a second feature processing network;

[0276] Obtain the asymmetric angular interval loss and network self-supervised loss corresponding to the first feature processing network and the second feature processing network, and adjust the model parameters of the initial image detection model based on the asymmetric angular interval loss and network self-supervised loss to obtain the image detection model corresponding to the initial image detection model.

[0277] In one embodiment, the processor 110, when executing the command to control the initial image detection model to perform first feature processing on the live sample image using a first feature processing network, performs the following steps:

[0278] The initial image detection model is controlled to extract object regions from the live sample image to obtain at least one set of first object region blocks, and a first feature processing network is used to perform first feature extraction processing on the first object region blocks;

[0279] The control of the initial image detection model to perform second feature processing on the attack-type sample image using a second feature processing network includes:

[0280] The initial image detection model is controlled to extract object regions from the attack-type sample image to obtain at least one set of second object region blocks. A second feature processing network is then used to perform second feature extraction processing on the second object region blocks.

[0281] In one embodiment, when the processor 110 executes the control of the initial image detection model to extract object regions from the live sample image and obtains at least one set of first object region blocks, it performs the following steps:

[0282] The initial image detection model is controlled to randomly extract the object region of the live sample image using a random block extraction method, resulting in three sets of first object region blocks;

[0283] Controlling the initial image detection model to extract object regions from the attack sample image to obtain at least one set of second object region blocks includes:

[0284] The initial image detection model is controlled to randomly extract the object region of the attack sample image using a random block extraction method, resulting in three sets of second object region blocks.
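The random block extraction described in paragraphs [0282] and [0284] can be illustrated with a minimal sketch. The specification does not state the block size or the sampling distribution, so the function name `random_region_blocks`, the rectangular block shape, the uniform sampling of block positions, and the use of a nested list as a stand-in for the object region are all assumptions made for illustration only.

```python
import random


def random_region_blocks(region, block_h, block_w, num_blocks=3, seed=None):
    """Crop `num_blocks` random rectangular blocks from a 2D object region.

    `region` is a list of equal-length rows standing in for an object
    (for example, face) region. Block size and uniform position sampling
    are illustrative assumptions, not taken from the specification.
    """
    rng = random.Random(seed)
    height, width = len(region), len(region[0])
    if block_h > height or block_w > width:
        raise ValueError("block does not fit inside the region")
    blocks = []
    for _ in range(num_blocks):
        # Pick the top-left corner so the block stays inside the region.
        top = rng.randint(0, height - block_h)
        left = rng.randint(0, width - block_w)
        blocks.append([row[left:left + block_w]
                       for row in region[top:top + block_h]])
    return blocks


# Hypothetical 8x8 object region; three 4x4 blocks are drawn from it.
region = [[r * 8 + c for c in range(8)] for r in range(8)]
blocks = random_region_blocks(region, block_h=4, block_w=4, seed=0)
```

The same routine would be applied separately to liveness sample images and attack sample images, with each resulting group of blocks passed to its own feature processing network.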

[0285] In one embodiment, the processor 110 performs the following steps when acquiring the asymmetric angular interval loss and network self-supervised loss corresponding to the first feature processing network and the second feature processing network:

[0286] The first image feature of the first feature processing network for at least one of the live sample images is obtained, wherein the first image feature is an image feature generated by the first feature processing network performing a first feature extraction process on the live sample image; and the second image feature of the second feature processing network for at least one of the attack sample images is obtained, wherein the second image feature is an image feature generated by the second feature processing network performing a second feature extraction process on the attack sample image.

[0287] A first network self-supervised loss is determined for the first feature processing network based on the first image features, and a second network self-supervised loss is determined for the second feature processing network based on the second image features.

[0288] Based on the first image features and the second image features, determine the asymmetric angle interval loss corresponding to the first feature processing network and the second feature processing network.

[0289] In one embodiment, the processor 110 performs the following steps when executing the process of determining a first network self-supervised loss for the first feature processing network based on the first image features:

[0290] Determine a fourth image feature corresponding to each third image feature from all the first image features, wherein the third image feature is one of all the first image features and the fourth image feature is an image feature other than the third image feature from all the first image features; determine a first network self-supervised loss for the first feature processing network based on the third image feature and the fourth image feature using a first loss calculation formula;

[0291] The first loss calculation formula satisfies the following formula:

[0292]

[0293] Wherein $L_1$ is the first network self-supervised loss, $f_a$ is the third image feature, $f_b$ is the fourth image feature, and $n_1$ is the total number of the first image features.

[0294] The step of determining the second network self-supervised loss for the second feature processing network based on the second image features involves the following steps:

[0295] Determine a sixth image feature corresponding to each fifth image feature from all the second image features, wherein the fifth image feature is one of all the second image features, and the sixth image feature is an image feature other than the fifth image feature from all the second image features; determine a second network self-supervised loss for the second feature processing network based on the fifth image feature and the sixth image feature using a second loss calculation formula;

[0296] The second loss calculation formula satisfies the following formula:

[0297]

[0298] Wherein $L_2$ is the second network self-supervised loss, $f_c$ is the fifth image feature, $f_d$ is the sixth image feature, and $n_2$ is the total number of the second image features.
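The first and second loss calculation formulas are not reproduced in this text, so the exact form of $L_1$ and $L_2$ cannot be restated here. The following is only a sketch of one plausible pairwise self-supervised loss under a stated assumption: each feature is compared with every other feature from the same network, and the loss is the mean of one minus the cosine similarity over all such pairs. The function names and the choice of cosine similarity are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


def pairwise_self_supervised_loss(features):
    """Mean of (1 - cosine similarity) over all ordered pairs (f_a, f_b), a != b.

    ASSUMPTION: the patent's actual formula is not available; this form is
    chosen only because it pulls features of the same class together.
    """
    n = len(features)
    if n < 2:
        raise ValueError("at least two features are required")
    total = 0.0
    for a in range(n):
        for b in range(n):
            if a != b:
                total += 1.0 - cosine_similarity(features[a], features[b])
    return total / (n * (n - 1))


# Hypothetical features: the first set is perfectly aligned, the second is not.
aligned = pairwise_self_supervised_loss([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
orthogonal = pairwise_self_supervised_loss([[1.0, 0.0], [0.0, 1.0]])
```

Under this assumption the same function would be evaluated once on the first image features (with $n_1$ features) and once on the second image features (with $n_2$ features).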

[0299] In one embodiment, the processor 110 performs the following steps when determining the asymmetric angular interval loss corresponding to the first feature processing network and the second feature processing network based on the first image features and the second image features:

[0300] Obtain at least one first category label and first image feature corresponding to the live sample image, determine the live angular interval loss and live angular interval loss bias corresponding to the first category label and the first image feature in angular space, and determine the first asymmetric angular interval loss based on the live angular interval loss and live angular interval loss bias.

[0301] Obtain at least one second category label and second image feature corresponding to the attack type sample image, determine the attack type angle interval loss and attack type angle interval loss bias corresponding to the second category label and the second image feature in the angle space, and determine the second asymmetric angle interval loss based on the attack type angle interval loss and attack type angle interval loss bias.

[0302] Based on the first asymmetric angle interval loss and the second asymmetric angle interval loss, the asymmetric angle interval loss corresponding to the first feature processing network and the second feature processing network is determined.

[0303] In one embodiment, the processor 110, when performing the steps of determining the liveness class angular interval loss and the liveness class angular interval loss bias corresponding to the first category label and the first image feature in angular space, and determining the first asymmetric angular interval loss based on the liveness class angular interval loss and the liveness class angular interval loss bias, executes the following steps:

[0304] The method involves obtaining the liveness class angle interval parameters and angle scaling hyperparameters, determining the first label parameter features corresponding to the first category label based on the first feature processing network, determining the liveness class angle interval loss corresponding to the first category label and the first image features in the angle space using the third interval loss calculation formula based on the liveness class angle interval parameters, angle scaling hyperparameters, and the first label parameter features, and determining the liveness class angle interval loss bias corresponding to the first category label and the first image features in the angle space using the third interval bias loss calculation formula based on the liveness class angle interval parameters, angle scaling hyperparameters, and the first label parameter features.

[0305] The formula for calculating the third interval loss satisfies the following formula:

[0306]

[0307] Wherein $L_{3a}$ is the liveness class angular interval loss, $s$ is the angle scaling hyperparameter, $y_i$ is the first category label of the i-th liveness sample image corresponding to the first image feature, $T$ is the feature parameter in angular space, $f_i$ is the first image feature corresponding to the i-th liveness sample image, and $m_l$ is the liveness class angular interval parameter. The remaining two symbols in the formula denote, respectively, the first label parameter feature corresponding to the i-th first category label in angular space, and the label classification angle in angular space between the liveness sample image corresponding to the first image feature and the first category label;

[0308] The formula for calculating the third interval bias loss satisfies the following formula:

[0309]

[0310] Wherein $L_{3b}$ is the liveness class angular interval loss bias, and $j$ is the label index of the j-th label among all the first category labels. The remaining two symbols in the formula denote, respectively, the first label parameter feature corresponding to the j-th first category label in angular space, and the label classification angle in angular space between the j-th first category label and the i-th liveness sample image;

[0311] The steps to determine the attack-type angle interval loss and attack-type angle interval loss bias corresponding to the second category label and the second image feature in the angle space, and to determine the second asymmetric angle interval loss based on the attack-type angle interval loss and attack-type angle interval loss bias, are as follows:

[0312] The attack class angle interval parameter and angle scaling hyperparameter are obtained. Based on the second feature processing network, the second label parameter feature corresponding to the second category label is determined. Based on the attack class angle interval parameter, angle scaling hyperparameter and the second label parameter feature, the attack class angle interval loss corresponding to the second category label and the second image feature in the angle space is determined by the fourth interval loss calculation formula. Based on the attack class angle interval parameter, angle scaling hyperparameter and the second label parameter feature, the attack class angle interval loss bias corresponding to the second category label and the second image feature in the angle space is determined by the fourth interval bias loss calculation formula.

[0313] The formula for calculating the fourth interval loss satisfies the following formula:

[0314]

[0315] Wherein $L_{4a}$ is the attack class angular interval loss, $s$ is the angle scaling hyperparameter, $Y_i$ is the second category label of the i-th attack sample image corresponding to the second image feature, $T$ is the feature parameter in angular space, $F_i$ is the second image feature corresponding to the i-th attack sample image, and $m_S$ is the attack class angular interval parameter. The remaining two symbols in the formula denote, respectively, the second label parameter feature corresponding to the i-th second category label in angular space, and the label classification angle in angular space between the attack sample image corresponding to the second image feature and the second category label;

[0316] The formula for calculating the fourth interval bias loss satisfies the following formula:

[0317]

[0318] Wherein $L_{4b}$ is the attack class angular interval loss bias, and $J$ is the label index of the J-th label among all the second category labels. The remaining two symbols in the formula denote, respectively, the second label parameter feature corresponding to the J-th second category label in angular space, and the label classification angle in angular space between the J-th second category label and the i-th attack sample image.

[0319] In one embodiment, the processor 110 performs the following steps when determining the first asymmetric angular interval loss based on the liveness class angular interval loss and the liveness class angular interval loss bias:

[0320] The first asymmetric angle interval loss is determined using the first asymmetric calculation formula based on the aforementioned living-type angle interval loss and the living-type angle interval loss bias.

[0321] The first asymmetric calculation formula satisfies the following formula:

[0322]

[0323] Wherein $L_{3a}$ is the liveness class angular interval loss, $L_{3b}$ is the liveness class angular interval loss bias, and $L_{5a}$ is the first asymmetric angular interval loss;

[0324] The second asymmetric angle interval loss is determined based on the attack-type angle interval loss and the attack-type angle interval loss bias, and the following steps are performed:

[0325] The second asymmetric angle interval loss is determined using the second asymmetric calculation formula based on the attack-type angle interval loss and the attack-type angle interval loss bias.

[0326] The second asymmetric calculation formula satisfies the following formula:

[0327]

[0328] Wherein $L_{4a}$ is the attack class angular interval loss, $L_{4b}$ is the attack class angular interval loss bias, and $L_{5b}$ is the second asymmetric angular interval loss.

[0329] In one embodiment, the processor 110, when executing the process of determining the asymmetric angle interval loss corresponding to the first feature processing network and the second feature processing network based on the first asymmetric angle interval loss and the second asymmetric angle interval loss, performs the following steps:

[0330] Based on the first asymmetric angle interval loss corresponding to the live sample image and the second asymmetric angle interval loss corresponding to the attack sample image, the sixth asymmetric loss calculation formula is used to determine the asymmetric angle interval loss for the first feature processing network and the second feature processing network.

[0331] The sixth asymmetric loss calculation formula satisfies the following formula:

[0332]

[0333] Wherein $L_6$ is the asymmetric angular interval loss, $N$ is the total number of sample images across the liveness sample images and the attack sample images, $L_{5a}$ is the first asymmetric angular interval loss, and $L_{5b}$ is the second asymmetric angular interval loss.
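Because the third to sixth formulas are not reproduced in this text, the following sketch only illustrates the general idea of an angular interval loss with class-dependent margins. It assumes an ArcFace-style additive angular margin (a swapped-in, well-known technique, not confirmed as the patent's formula), a single shared set of label parameter vectors `class_weights`, and a plain average over all $N$ samples. The function names and the example margin values are hypothetical.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def angular_margin_loss(feature, label, class_weights, margin, scale):
    """Per-sample loss -log(target / (target + others)), ArcFace-style.

    target = exp(scale * cos(theta_label + margin))
    others = sum of exp(scale * cos(theta_j)) over the other labels j
    ASSUMPTION: this split only loosely mirrors the "loss" and "loss bias"
    terms named in the text.
    """
    f = _normalize(feature)
    cosines = [sum(a * b for a, b in zip(f, _normalize(w)))
               for w in class_weights]
    theta = math.acos(max(-1.0, min(1.0, cosines[label])))
    target_logit = scale * math.cos(theta + margin)
    logits = [target_logit] + [scale * c for j, c in enumerate(cosines)
                               if j != label]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - target_logit


def asymmetric_angular_loss(live_batch, attack_batch, class_weights,
                            m_live, m_attack, scale):
    """Average the per-sample losses over N = live + attack samples,
    using a different angular margin for each class of sample."""
    losses = [angular_margin_loss(f, y, class_weights, m_live, scale)
              for f, y in live_batch]
    losses += [angular_margin_loss(f, y, class_weights, m_attack, scale)
               for f, y in attack_batch]
    return sum(losses) / len(losses)


# Hypothetical two-label setup: label 0 = liveness, label 1 = attack.
class_weights = [[1.0, 0.0], [0.0, 1.0]]
loss = asymmetric_angular_loss(
    live_batch=[([1.0, 0.0], 0)],
    attack_batch=[([0.0, 1.0], 1)],
    class_weights=class_weights,
    m_live=0.5, m_attack=0.2, scale=10.0,
)
```

The margin values 0.5 and 0.2 are arbitrary; the specification states only that the liveness and attack classes use separate angular interval parameters, not which one is larger.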

[0334] In one embodiment, the processor 110 performs the following steps when adjusting the model parameters of the initial image detection model based on the asymmetric angular interval loss and the network self-supervised loss:

[0335] Obtain a first weighting factor for the asymmetric angular interval loss and a second weighting factor for the network self-supervised loss. Based on the first weighting factor, the second weighting factor, the asymmetric angular interval loss, and the network self-supervised loss, use the seventh loss calculation formula to determine the model comprehensive loss. Based on the model comprehensive loss, adjust the model parameters of the initial image detection model.

[0336] The seventh loss calculation formula satisfies the following formula:

[0337] $L_7 = A \cdot L_{ASym} + B \cdot L_{ASim}$

[0338] Wherein $L_7$ is the model comprehensive loss, $A$ is the first weighting factor, $L_{ASym}$ is the asymmetric angular interval loss, $B$ is the second weighting factor, and $L_{ASim}$ is the network self-supervised loss.
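As a worked illustration of the seventh loss calculation formula, the sketch below combines the two loss terms with their weighting factors. The function name and the numeric values are hypothetical, and forming the network self-supervised loss as the sum of the first and second network self-supervised losses is an assumption; the specification states only that it includes both.

```python
def model_comprehensive_loss(asym_loss, self_sup_loss, weight_a, weight_b):
    """L7 = A * L_ASym + B * L_ASim."""
    return weight_a * asym_loss + weight_b * self_sup_loss


# Hypothetical values for one training step.
l_asym = 2.0               # asymmetric angular interval loss
l_asim = 0.5 + 0.25        # ASSUMPTION: first + second self-supervised loss
l7 = model_comprehensive_loss(l_asym, l_asim, weight_a=1.0, weight_b=0.5)
# l7 = 1.0 * 2.0 + 0.5 * 0.75 = 2.375
```

The resulting scalar is what the model parameters of the initial image detection model would be adjusted against.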

[0339] In one embodiment, the processor 110, while executing the image processing method, also performs the following steps:

[0340] During model training, at least one reference training epoch is determined for the first feature processing network and the second feature processing network, and model weight parameter sharing processing is performed on the first feature processing network and the second feature processing network based on the reference training epoch.

[0341] In one embodiment, the processor 110 performs the following steps when executing the model weight parameter sharing processing of the first feature processing network and the second feature processing network based on the reference training epochs:

[0342] If the current training epoch of the initial image detection model matches a reference training epoch, then the first network feature parameters corresponding to the first feature processing network and the second network feature parameters corresponding to the second feature processing network are obtained, and the model weight parameter sharing processing of the first feature processing network and the second feature processing network is performed based on the first network feature parameters and the second network feature parameters.
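The epoch-triggered weight sharing of paragraphs [0340] to [0342] can be sketched as follows. The specification does not say how the two sets of parameters are combined, so averaging them element-wise and writing the result back to both networks is an assumption, as are the function names and the dictionary-of-lists representation of model parameters.

```python
def share_weights(first_params, second_params):
    """Average matching parameters of the two networks.

    ASSUMPTION: element-wise averaging is one possible sharing rule; the
    specification does not define the operation.
    """
    shared = {
        name: [(a + b) / 2.0
               for a, b in zip(first_params[name], second_params[name])]
        for name in first_params
    }
    # Return independent copies so later updates to one network
    # do not silently modify the other.
    return ({k: list(v) for k, v in shared.items()},
            {k: list(v) for k, v in shared.items()})


def maybe_share(epoch, reference_epochs, first_params, second_params):
    """Share weights only when the current epoch is a reference epoch."""
    if epoch in reference_epochs:
        return share_weights(first_params, second_params)
    return first_params, second_params


# Hypothetical parameters and reference training epochs.
first = {"w": [1.0, 3.0]}
second = {"w": [3.0, 5.0]}
kept_first, kept_second = maybe_share(4, {5, 10}, first, second)
new_first, new_second = maybe_share(5, {5, 10}, first, second)
```

In a training loop, `maybe_share` would be called once per epoch, leaving the two networks to evolve independently between reference epochs.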

[0343] In one embodiment, after obtaining the image detection model corresponding to the initial image detection model, the processor 110 further performs the following steps:

[0344] The image detection model is deployed to the target transaction environment to obtain the target detection image in the target transaction environment;

[0345] The target detection image is input into the image detection model, which extracts the target image features corresponding to the target detection image and performs image detection based on those features. The model then outputs the image detection category for the target detection image, which is either a liveness image category or an attack image category.

[0346] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, optical disk, read-only memory, or random access memory, etc.

[0347] It should be noted that the information (including but not limited to user device information, user personal information, etc.), data (including but not limited to data used for analysis, stored data, displayed data, etc.), and signals involved in the embodiments of this specification are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the live sample images, attack sample images, and target detection images involved in this specification were all obtained under full authorization.

[0348] The above-disclosed embodiments are merely preferred embodiments of this specification and should not be construed as limiting the scope of this specification. Therefore, any equivalent variations made in accordance with the claims of this specification shall still fall within the scope of this specification.

Claims

1. An image processing method, the method comprising:
acquiring liveness sample images and attack sample images for an initial image detection model, wherein the initial image detection model includes a first feature processing network and a second feature processing network;
inputting the liveness sample images and the attack sample images into the initial image detection model for model training, so as to control the initial image detection model to perform first feature processing on the liveness sample images using the first feature processing network, and to control the initial image detection model to perform second feature processing on the attack sample images using the second feature processing network;
obtaining a first image feature of the first feature processing network for at least one of the liveness sample images, wherein the first image feature is an image feature generated by the first feature processing network performing first feature extraction processing on the liveness sample image, and obtaining a second image feature of the second feature processing network for at least one of the attack sample images, wherein the second image feature is an image feature generated by the second feature processing network performing second feature extraction processing on the attack sample image;
determining a first network self-supervised loss for the first feature processing network based on the first image features, and determining a second network self-supervised loss for the second feature processing network based on the second image features;
obtaining at least one first category label and first image feature corresponding to the liveness sample images, determining the liveness class angular interval loss and liveness class angular interval loss bias corresponding to the first category label and the first image feature in angular space, and determining a first asymmetric angular interval loss based on the liveness class angular interval loss and the liveness class angular interval loss bias;
obtaining at least one second category label and second image feature corresponding to the attack sample images, determining the attack class angular interval loss and attack class angular interval loss bias corresponding to the second category label and the second image feature in angular space, and determining a second asymmetric angular interval loss based on the attack class angular interval loss and the attack class angular interval loss bias;
determining the asymmetric angular interval loss corresponding to the first feature processing network and the second feature processing network based on the first asymmetric angular interval loss and the second asymmetric angular interval loss; and
adjusting the model parameters of the initial image detection model based on the asymmetric angular interval loss and the network self-supervised loss to obtain the image detection model corresponding to the initial image detection model, wherein the network self-supervised loss includes the first network self-supervised loss and the second network self-supervised loss.

2. The method according to claim 1, wherein controlling the initial image detection model to perform first feature processing on the liveness sample images using the first feature processing network comprises:
controlling the initial image detection model to extract object regions from the liveness sample image to obtain at least one set of first object region blocks, and using the first feature processing network to perform first feature extraction processing on the first object region blocks;
and wherein controlling the initial image detection model to perform second feature processing on the attack sample images using the second feature processing network comprises:
controlling the initial image detection model to extract object regions from the attack sample image to obtain at least one set of second object region blocks, and using the second feature processing network to perform second feature extraction processing on the second object region blocks.

3. The method according to claim 2, wherein controlling the initial image detection model to extract object regions from the liveness sample image to obtain at least one set of first object region blocks comprises:
controlling the initial image detection model to randomly extract the object region of the liveness sample image using a random block extraction method, obtaining three sets of first object region blocks;
and wherein controlling the initial image detection model to extract object regions from the attack sample image to obtain at least one set of second object region blocks comprises:
controlling the initial image detection model to randomly extract the object region of the attack sample image using a random block extraction method, obtaining three sets of second object region blocks.
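The random block extraction of claim 3 can be illustrated with a short sketch. The claim fixes only the number of sets (three); the block size, the number of blocks per set, the uniform sampling of block positions, and the function name `extract_random_blocks` are all assumptions made for illustration, and the input is assumed to be an already-cropped object region stored as a list of pixel rows.

```python
import random


def extract_random_blocks(image, block_size, num_sets=3, blocks_per_set=4, seed=None):
    """Crop random square blocks from a 2-D image given as a list of rows.

    Returns `num_sets` lists, each holding `blocks_per_set` blocks.
    Only `num_sets=3` comes from the claim; the other defaults are assumptions.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    if block_size > height or block_size > width:
        raise ValueError("block_size exceeds the image dimensions")

    block_sets = []
    for _ in range(num_sets):
        blocks = []
        for _ in range(blocks_per_set):
            # Assumed sampling rule: top-left corner drawn uniformly at random.
            top = rng.randint(0, height - block_size)
            left = rng.randint(0, width - block_size)
            block = [row[left:left + block_size] for row in image[top:top + block_size]]
            blocks.append(block)
        block_sets.append(blocks)
    return block_sets


# Toy 8x8 "object region" whose pixel value encodes its position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
block_sets = extract_random_blocks(image, block_size=4, seed=0)
```

Each set of blocks would then be passed to the corresponding feature processing network for feature extraction.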

4. The method according to claim 1, wherein determining the first network self-supervised loss for the first feature processing network based on the first image features comprises:
determining, from all the first image features, a fourth image feature corresponding to each third image feature, wherein the third image feature is one of the first image features and the fourth image feature is one of the first image features other than the third image feature; and determining the first network self-supervised loss for the first feature processing network based on the third image feature and the fourth image feature using a first loss calculation formula;
the first loss calculation formula satisfies the following formula: […]
wherein $L_1$ is the first network self-supervised loss, $f_a$ is the third image feature, $f_b$ is the fourth image feature, and $n_1$ is the total number of the first image features;
and wherein determining the second network self-supervised loss for the second feature processing network based on the second image features comprises:
determining, from all the second image features, a sixth image feature corresponding to each fifth image feature, wherein the fifth image feature is one of the second image features and the sixth image feature is one of the second image features other than the fifth image feature; and determining the second network self-supervised loss for the second feature processing network based on the fifth image feature and the sixth image feature using a second loss calculation formula;
the second loss calculation formula satisfies the following formula: […]
wherein $L_2$ is the second network self-supervised loss, $f_c$ is the fifth image feature, $f_d$ is the sixth image feature, and $n_2$ is the total number of the second image features.
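The loss formulas of claim 4 are not reproduced in this text, so the following is only a sketch under stated assumptions: each feature $f_a$ is paired with the next feature in the list (one possible way to pick "a feature other than" itself), and the loss is assumed to be the mean of one minus the cosine similarity over the $n$ features. Both the pairing rule and the cosine form are hypothetical; the claim itself fixes only the inputs ($f_a$, $f_b$, $n_1$) and the output ($L_1$). The same function would apply to the second network's features.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def self_supervised_loss(features):
    """Assumed form: mean over a of (1 - cos(f_a, f_b)), with f_b != f_a."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two features to form a pair")
    total = 0.0
    for a in range(n):
        b = (a + 1) % n  # assumed pairing rule: the next feature, wrapping around
        total += 1.0 - cosine_similarity(features[a], features[b])
    return total / n


# Features pointing in the same direction give a loss near zero.
aligned_loss = self_supervised_loss([[1.0, 2.0], [2.0, 4.0], [0.5, 1.0]])
# Mutually orthogonal features give a loss of one.
orthogonal_loss = self_supervised_loss([[1.0, 0.0], [0.0, 1.0]])
```

Under this assumed form, minimizing the loss pulls features produced by the same network for the same sample class toward a common direction.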

5. The method according to claim 1, wherein determining the liveness-class angular interval loss and the liveness-class angular interval loss bias corresponding to the first category label and the first image feature in angular space, and determining the first asymmetric angular interval loss based on the liveness-class angular interval loss and the liveness-class angular interval loss bias, comprises:
obtaining a liveness-class angular interval parameter and an angle scaling hyperparameter, and determining, based on the first feature processing network, the first label parameter feature corresponding to the first category label;
determining, using a third interval loss calculation formula and based on the liveness-class angular interval parameter, the angle scaling hyperparameter, and the first label parameter feature, the liveness-class angular interval loss corresponding to the first category label and the first image feature in angular space; and
determining, using a third interval bias loss calculation formula and based on the liveness-class angular interval parameter, the angle scaling hyperparameter, and the first label parameter feature, the liveness-class angular interval loss bias corresponding to the first category label and the first image feature in angular space;
the third interval loss calculation formula satisfies the following formula: […]
wherein $L_{3a}$ is the liveness-class angular interval loss, $s$ is the angle scaling hyperparameter, $y_i$ is the first category label annotated for the i-th liveness sample image corresponding to the first image feature, $T$ is the feature parameter in angular space, $f_i$ is the first image feature corresponding to the i-th liveness sample image, […] is the first label parameter feature corresponding to the i-th first category label in angular space, […] is the label classification angle in angular space between the liveness sample image corresponding to the first image feature and the first category label, and $m_l$ is the liveness-class angular interval parameter;
the third interval bias loss calculation formula satisfies the following formula: […]
wherein $L_{3b}$ is the liveness-class angular interval loss bias, $j$ is the label number of the j-th label among all the first category labels, […] is the first label parameter feature corresponding to the j-th first category label in angular space, and […] is the label classification angle in angular space corresponding to the j-th first category label and the i-th liveness sample image;
and wherein determining the attack-class angular interval loss and the attack-class angular interval loss bias corresponding to the second category label and the second image feature in angular space, and determining the second asymmetric angular interval loss based on the attack-class angular interval loss and the attack-class angular interval loss bias, comprises:
obtaining an attack-class angular interval parameter and the angle scaling hyperparameter, and determining, based on the second feature processing network, the second label parameter feature corresponding to the second category label;
determining, using a fourth interval loss calculation formula and based on the attack-class angular interval parameter, the angle scaling hyperparameter, and the second label parameter feature, the attack-class angular interval loss corresponding to the second category label and the second image feature in angular space; and
determining, using a fourth interval bias loss calculation formula and based on the attack-class angular interval parameter, the angle scaling hyperparameter, and the second label parameter feature, the attack-class angular interval loss bias corresponding to the second category label and the second image feature in angular space;
the fourth interval loss calculation formula satisfies the following formula: […]
wherein $L_{4a}$ is the attack-class angular interval loss, $s$ is the angle scaling hyperparameter, $Y_i$ is the second category label annotated for the i-th attack sample image corresponding to the second image feature, $T$ is the feature parameter in angular space, $F_i$ is the second image feature corresponding to the i-th attack sample image, […] is the second label parameter feature corresponding to the i-th second category label in angular space, […] is the label classification angle in angular space between the attack sample image corresponding to the second image feature and the second category label, and $m_S$ is the attack-class angular interval parameter;
the fourth interval bias loss calculation formula satisfies the following formula: […]
wherein $L_{4b}$ is the attack-class angular interval loss bias, $J$ is the label number of the J-th label among all the second category labels, […] is the second label parameter feature corresponding to the J-th second category label in angular space, and […] is the label classification angle in angular space corresponding to the J-th second category label and the i-th attack sample image.

6. The method according to claim 5, wherein determining the first asymmetric angular interval loss based on the liveness-class angular interval loss and the liveness-class angular interval loss bias comprises:
determining the first asymmetric angular interval loss using a first asymmetric calculation formula based on the liveness-class angular interval loss and the liveness-class angular interval loss bias;
the first asymmetric calculation formula satisfies the following formula: […]
wherein $L_{3a}$ is the liveness-class angular interval loss, $L_{3b}$ is the liveness-class angular interval loss bias, and $L_{5a}$ is the first asymmetric angular interval loss;
and wherein determining the second asymmetric angular interval loss based on the attack-class angular interval loss and the attack-class angular interval loss bias comprises:
determining the second asymmetric angular interval loss using a second asymmetric calculation formula based on the attack-class angular interval loss and the attack-class angular interval loss bias;
the second asymmetric calculation formula satisfies the following formula: […]
wherein $L_{4a}$ is the attack-class angular interval loss, $L_{4b}$ is the attack-class angular interval loss bias, and $L_{5b}$ is the second asymmetric angular interval loss.

7. The method according to claim 5, wherein determining the asymmetric angular interval loss corresponding to the first feature processing network and the second feature processing network based on the first asymmetric angular interval loss and the second asymmetric angular interval loss comprises:
determining the asymmetric angular interval loss for the first feature processing network and the second feature processing network using a sixth asymmetric loss calculation formula, based on the first asymmetric angular interval loss corresponding to the liveness sample images and the second asymmetric angular interval loss corresponding to the attack sample images;
the sixth asymmetric loss calculation formula satisfies the following formula: […]
wherein $L_6$ is the asymmetric angular interval loss, $N$ is the total number of sample images across the liveness sample images and the attack sample images, $L_{5a}$ is the first asymmetric angular interval loss, and $L_{5b}$ is the second asymmetric angular interval loss.
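The formulas referenced in claims 5 to 7 are not reproduced in this text. The sketch below is therefore an assumption, not the claimed formulas: it uses an ArcFace-style additive angular margin, treats the "interval loss" as the target-class term $e^{s\cos(\theta_{y}+m)}$, treats the "interval loss bias" as the sum of $e^{s\cos\theta_j}$ over the other labels, combines them per sample as a negative log ratio, and averages over all $N$ samples. The asymmetry comes from using a different margin for liveness samples than for attack samples. The function names, the label encoding (0 for liveness, 1 for attack), and the numeric values are hypothetical.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def asymmetric_margin_loss(features, labels, class_weights, margins, scale):
    """Assumed ArcFace-style loss with a per-class angular margin.

    labels: 0 = liveness, 1 = attack (assumed encoding).
    margins: dict mapping label -> angular margin in radians.
    """
    total = 0.0
    for feature, label in zip(features, labels):
        f_hat = normalize(feature)
        cosines = []
        for w in class_weights:
            cos_theta = sum(a * b for a, b in zip(normalize(w), f_hat))
            cosines.append(max(-1.0, min(1.0, cos_theta)))  # guard acos domain
        theta_y = math.acos(cosines[label])
        # Assumed "interval loss": target term with the class-specific margin.
        target = math.exp(scale * math.cos(theta_y + margins[label]))
        # Assumed "interval loss bias": sum over all other labels.
        bias = sum(math.exp(scale * c) for j, c in enumerate(cosines) if j != label)
        total += -math.log(target / (target + bias))
    return total / len(features)  # assumed: mean over all N samples


features = [[1.0, 0.1], [0.1, 1.0]]
labels = [0, 1]
class_weights = [[1.0, 0.0], [0.0, 1.0]]
loss_with_margins = asymmetric_margin_loss(
    features, labels, class_weights, margins={0: 0.5, 1: 0.2}, scale=10.0)
loss_without_margins = asymmetric_margin_loss(
    features, labels, class_weights, margins={0: 0.0, 1: 0.0}, scale=10.0)
```

With these toy inputs, adding margins raises the loss for the same features, which is the pressure that pushes each class toward a tighter angular cluster during training.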

8. The method according to claim 1, wherein adjusting the model parameters of the initial image detection model based on the asymmetric angular interval loss and the network self-supervised loss comprises:
obtaining a first weighting factor for the asymmetric angular interval loss and a second weighting factor for the network self-supervised loss;
determining a model comprehensive loss using a seventh loss calculation formula based on the first weighting factor, the second weighting factor, the asymmetric angular interval loss, and the network self-supervised loss; and
adjusting the model parameters of the initial image detection model based on the model comprehensive loss;
the seventh loss calculation formula satisfies the following formula:
$L_7 = A \cdot L_{ASym} + B \cdot L_{ASim}$
wherein $L_7$ is the model comprehensive loss, $A$ is the first weighting factor, $L_{ASym}$ is the asymmetric angular interval loss, $B$ is the second weighting factor, and $L_{ASim}$ is the network self-supervised loss.
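The seventh loss calculation formula is a plain weighted sum, sketched below. How the first and second network self-supervised losses are merged into $L_{ASim}$ is not stated in the claims; adding them is an assumption, as are the function name and the default weights.

```python
def comprehensive_loss(asym_loss, first_self_sup_loss, second_self_sup_loss,
                       first_weight=1.0, second_weight=0.5):
    """L7 = A * L_ASym + B * L_ASim, with L_ASim assumed to be L1 + L2."""
    self_sup_loss = first_self_sup_loss + second_self_sup_loss  # assumed merge
    return first_weight * asym_loss + second_weight * self_sup_loss


# Example: A = 1.0, B = 0.5, L_ASym = 2.0, L1 = 1.0, L2 = 3.0.
total_loss = comprehensive_loss(2.0, 1.0, 3.0, first_weight=1.0, second_weight=0.5)
```

The resulting scalar is what a training loop would differentiate to adjust the model parameters.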

9. The method according to claim 1, further comprising: during model training, determining at least one reference training epoch for the first feature processing network and the second feature processing network, and performing model weight parameter sharing processing on the first feature processing network and the second feature processing network based on the reference training epoch.

10. The method according to claim 9, wherein performing model weight parameter sharing processing on the first feature processing network and the second feature processing network based on the reference training epoch comprises: if the current training epoch of the initial image detection model matches a reference training epoch, obtaining the first network feature parameters corresponding to the first feature processing network and the second network feature parameters corresponding to the second feature processing network, and performing the model weight parameter sharing processing on the first feature processing network and the second feature processing network based on the first network feature parameters and the second network feature parameters.
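Claims 9 and 10 state when weight sharing happens (at reference training epochs) but not how the two parameter sets are combined. The sketch below assumes element-wise averaging, which is only one possible sharing rule; the function name `maybe_share_weights` and the dictionary-of-lists parameter layout are hypothetical.

```python
def maybe_share_weights(epoch, reference_epochs, first_params, second_params):
    """Share weights between the two networks at reference epochs.

    Assumed sharing rule: element-wise average of matching parameters.
    Parameters are dicts mapping a layer name to a list of floats.
    """
    if epoch not in reference_epochs:
        return first_params, second_params  # no sharing at this epoch

    shared = {
        name: [(a + b) / 2.0 for a, b in zip(first_params[name], second_params[name])]
        for name in first_params
    }
    # Return independent copies so the networks can diverge again afterwards.
    return shared, {name: list(values) for name, values in shared.items()}


first = {"conv1": [1.0, 3.0]}
second = {"conv1": [3.0, 5.0]}
shared_first, shared_second = maybe_share_weights(2, {2, 4}, first, second)
kept_first, kept_second = maybe_share_weights(3, {2, 4}, first, second)
```

Between reference epochs each network keeps training on its own sample class, so periodic sharing lets them stay specialized while remaining close enough to be merged into one detection model.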

11. The method according to any one of claims 1-10, wherein after obtaining the image detection model corresponding to the initial image detection model, the method further comprises:
deploying the image detection model to a target transaction environment, and obtaining a target detection image in the target transaction environment;
inputting the target detection image into the image detection model to extract target image features corresponding to the target detection image and to perform image detection based on the target image features; and
outputting an image detection category for the target detection image, wherein the image detection category is either a liveness image category or an attack image category.

12. An image processing apparatus, the apparatus comprising:
an image acquisition module, configured to acquire liveness sample images and attack sample images for an initial image detection model, wherein the initial image detection model includes a first feature processing network and a second feature processing network;
a model training module, configured to input the liveness sample images and the attack sample images into the initial image detection model for model training, so as to control the initial image detection model to perform first feature processing on the liveness sample images using the first feature processing network, and to perform second feature processing on the attack sample images using the second feature processing network; and
a parameter adjustment module, configured to:
obtain a first image feature of the first feature processing network for at least one of the liveness sample images, wherein the first image feature is an image feature generated by the first feature processing network performing first feature extraction processing on the liveness sample image; and obtain a second image feature of the second feature processing network for at least one of the attack sample images, wherein the second image feature is an image feature generated by the second feature processing network performing second feature extraction processing on the attack sample image;
determine a first network self-supervised loss for the first feature processing network based on the first image features, and determine a second network self-supervised loss for the second feature processing network based on the second image features;
obtain at least one first category label and first image feature corresponding to the liveness sample images; determine the liveness-class angular interval loss and the liveness-class angular interval loss bias corresponding to the first category label and the first image feature in angular space; and determine a first asymmetric angular interval loss based on the liveness-class angular interval loss and the liveness-class angular interval loss bias;
obtain at least one second category label and second image feature corresponding to the attack sample images; determine the attack-class angular interval loss and the attack-class angular interval loss bias corresponding to the second category label and the second image feature in angular space; and determine a second asymmetric angular interval loss based on the attack-class angular interval loss and the attack-class angular interval loss bias;
determine the asymmetric angular interval loss for the first feature processing network and the second feature processing network based on the first asymmetric angular interval loss and the second asymmetric angular interval loss; and
adjust the model parameters of the initial image detection model based on the asymmetric angular interval loss and the network self-supervised loss to obtain the image detection model corresponding to the initial image detection model, wherein the network self-supervised loss includes the first network self-supervised loss and the second network self-supervised loss.

13. A computer storage medium storing a plurality of instructions adapted for loading by a processor and executing the method steps of any one of claims 1 to 11.

14. A computer program product storing at least one instruction, said at least one instruction being loaded by a processor and executing the method steps of any one of claims 1 to 11.

15. An electronic device comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor to execute the method steps of any one of claims 1 to 11.