Attack detection method, electronic device, and computer-readable medium

By extracting block features from the face image under test and processing them according to rows and columns to generate row features and column features, the method solves the problems of low accuracy and slow speed in the existing technology for detecting attacks on face images with edge differences, and achieves more efficient attack detection.

CN115620359B · Active · Publication Date: 2026-06-26 · YUANLI JINZHI (CHONGQING) TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
YUANLI JINZHI (CHONGQING) TECHNOLOGY CO LTD
Filing Date
2022-09-23
Publication Date
2026-06-26


Abstract

Embodiments of the application disclose an attack detection method, an electronic device and a computer readable medium. An embodiment of the method comprises: inputting a to-be-tested face image into an attack detection model; dividing the to-be-tested face image into a plurality of image blocks by the attack detection model, extracting block features of the image blocks, processing the block features of the rows and columns of the image blocks, obtaining row features and column features of the to-be-tested face image, processing the row features and the column features, and obtaining an attack detection result; the attack detection result is used to indicate whether the to-be-tested face image is an attack face image that has a difference in image edges from a real face image. The embodiment can effectively defend against edge difference type face attacks and improve the speed and accuracy of attack detection.

Description

Technical Field

[0001] This application relates to the field of computer technology, specifically to attack detection methods, electronic devices, and computer-readable media.

Background Technology

[0002] With the development of computer technology, the application scenarios of facial recognition and related technologies are becoming increasingly diverse. Often, criminals attempt to bypass facial recognition systems by forging facial images. Based on this, liveness detection technology has emerged.

[0003] In existing technologies, attack detection models can be trained for attack detection. However, these models can only effectively identify attack data such as screen-replay images and mask-like images. Attack images with edge differences obtained by processing real facial images (referred to as edge-difference attack images) cannot be effectively identified, because their facial regions are similar to those of real facial images and show no obvious editing. This results in low attack detection accuracy. Furthermore, the computational cost of attack detection models is high, leading to slow attack detection speed.

Summary of the Invention

[0004] This application provides an attack detection method, electronic device, and computer-readable medium to address the technical problems of low speed and accuracy in attack detection in the prior art.

[0005] In a first aspect, embodiments of this application provide an attack detection method, which includes: inputting a face image to be tested into an attack detection model; dividing the face image to be tested into multiple image blocks through the attack detection model, extracting block features of each image block, and processing the block features of each row and each column of image blocks to obtain row features and column features of the face image to be tested; processing the row features and the column features to obtain an attack detection result; the attack detection result is used to indicate whether the face image to be tested is an attack face image whose image edges differ from those of a real face image.

[0006] In a second aspect, embodiments of this application provide an electronic device, including: one or more processors; and a storage device having one or more programs stored thereon, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in the first aspect.

[0007] In a third aspect, embodiments of this application provide a computer-readable medium having a computer program stored thereon that, when executed by a processor, implements the method described in the first aspect.

[0008] In a fourth aspect, embodiments of this application provide a computer program product, including a computer program that, when executed by a processor, implements the method described in the first aspect.

[0009] The attack detection method, electronic device, and computer-readable medium provided in this application divide the face image to be tested into multiple image blocks using an attack detection model, extract the block features of each image block, and then process the block features of each row and each column of image blocks to obtain the row features and column features of the face image to be tested. Finally, the row features and column features are processed to obtain the attack detection result. On the one hand, given that edge-difference type attack face images do not have obvious editing in the face region, this application embodiment extracts block features from the face image to be tested by dividing it into blocks and combining the block features according to rows and columns. This allows the attack detection model to focus on the relationship between each image block and other image blocks in the face image to be tested. Compared with attack detection models that only focus on the global features of the image or the features of the face region, this is more in line with the characteristics of edge-difference type attack face images, thus effectively defending against edge-difference type face attacks and improving the accuracy of attack detection. On the other hand, dividing the face image to be tested into blocks reduces the amount of image data calculated each time, thereby greatly reducing the amount of computation and effectively improving the running speed of the model when deployed and used.

Attached Figure Description

[0010] Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings:

[0011] Figure 1 is a flowchart of an embodiment of the attack detection method according to this application;

[0012] Figure 2 is a structural diagram of the attack detection model according to this application;

[0013] Figure 3 is a schematic diagram of a process for generating an attack face sample image according to this application;

[0014] Figure 4 is a schematic diagram illustrating yet another generation process of the attack face sample image according to this application;

[0015] Figure 5 is a schematic diagram illustrating the process of generating an attack face sample image according to this application;

[0016] Figure 6 is a schematic diagram illustrating the processing procedure of the attack detection method according to this application;

[0017] Figure 7 is a schematic diagram of one embodiment of the attack detection device according to this application;

[0018] Figure 8 is a schematic diagram of the structure of a computer system used to implement the electronic device of the present application.

Detailed Implementation

[0019] The present application will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and not intended to limit it. Furthermore, it should be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings.

[0020] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other. This application will now be described in detail with reference to the accompanying drawings and embodiments.

[0021] It should be noted that all actions involving the acquisition of signals, information, or data in this application are carried out in compliance with the relevant data protection laws and policies of the country where the application is located, and with the authorization granted by the owner of the relevant device.

[0022] In recent years, significant progress has been made in research on technologies based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing, and image recognition. Artificial intelligence (AI) is an emerging science and technology that studies and develops theories, methods, technologies, and application systems to simulate and extend human intelligence. AI is a comprehensive discipline involving numerous technologies, including chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, and neural networks. Computer vision, as an important branch of AI, specifically enables machines to recognize the world. Computer vision technologies typically include face recognition, attack detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, object detection, image processing, image recognition, image semantic understanding, image retrieval, text recognition, video processing, video content recognition, 3D reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, and robot navigation and localization. With the research and advancement of artificial intelligence technology, this technology has been applied in numerous fields, such as urban management, traffic management, building management, park management, facial recognition access control, facial recognition attendance, logistics management, warehouse management, robotics, intelligent marketing, computational photography, mobile imaging, cloud services, smart homes, wearable devices, autonomous driving, smart healthcare, facial payment, facial unlocking, fingerprint unlocking, identity verification, smart screens, smart TVs, cameras, mobile internet, live streaming, beauty filters, cosmetics, medical aesthetics, and intelligent temperature measurement.

[0023] Furthermore, biometric technology has been widely applied to various terminal devices and electronic devices. Biometric recognition technologies include, but are not limited to, fingerprint recognition, palm print recognition, vein recognition, iris recognition, face recognition, liveness detection, and anti-counterfeiting technologies. Among them, fingerprint recognition typically includes optical fingerprint recognition, capacitive fingerprint recognition, and ultrasonic fingerprint recognition. With the rise of full-screen technology, fingerprint recognition modules can be placed in a partial or complete area under the display screen, thus forming under-display optical fingerprint recognition; alternatively, the optical fingerprint recognition module can be partially or completely integrated into the display screen of the electronic device, thus forming in-display optical fingerprint recognition. The aforementioned display screen can be an organic light-emitting diode (OLED) display screen or a liquid crystal display (LCD), etc. Fingerprint recognition methods typically include steps such as fingerprint image acquisition, preprocessing, feature extraction, and feature matching. Some or all of the above steps can be implemented using traditional computer vision (CV) algorithms or deep learning algorithms based on artificial intelligence (AI). Fingerprint recognition technology can be applied to portable or mobile terminals such as smartphones, tablets, and gaming devices, as well as other electronic devices such as smart door locks, cars, and bank ATMs, for fingerprint unlocking, fingerprint payment, fingerprint attendance, and identity authentication.

[0024] In attack detection scenarios, an attack detection model is typically pre-trained and used for attack detection. However, existing attack detection models can only effectively defend against attacks such as screen replays and mask attacks, and cannot effectively identify facial images with edge differences, resulting in low attack detection accuracy. Furthermore, the computational cost of these attack detection models is high, leading to slow detection speed. This application provides an attack detection method that can effectively defend against facial attacks with edge differences, improving both the speed and accuracy of attack detection.

[0025] Please refer to Figure 1, which illustrates a flow 100 of an embodiment of the attack detection method according to this application. This attack detection method can be applied to various electronic devices, including but not limited to smartphones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptops, in-vehicle computers, PDAs, desktop computers, set-top boxes, smart TVs, wearable devices, and display panels. The attack detection method includes the following steps:

[0026] Step 101: Input the face image to be tested into the attack detection model.

[0027] Step 102: Divide the face image to be tested into multiple image blocks using the attack detection model, extract the block features of each image block, and process the block features of each row and each column of image blocks to obtain the row features and column features of the face image to be tested. Process the row features and column features to obtain the attack detection result.

[0028] In this embodiment, an attack detection model for attack detection can be pre-stored in the electronic device. This model can be trained using machine learning methods (e.g., supervised learning). The model can be used to divide a test face image into multiple image blocks, extract block features from each image block, and process the block features of each row and column of image blocks to obtain row and column features of the test face image. Furthermore, it can be used to process the row and column features to obtain an attack detection result. The attack detection result indicates whether the test face image is an attack face image whose image edges differ from those of a real face image.

[0029] In some alternative implementations, see Figure 2, which shows the structure of an attack detection model. This attack detection model may include, but is not limited to, a feature extraction network, a sequence processing network, and a classification network.

[0030] The aforementioned feature extraction network can be used to divide the image into blocks and to extract block features from each resulting image block. Block features can be represented as feature maps. In practice, the feature extraction network can employ a convolutional neural network (CNN), which may include at least one convolutional layer. The specific network structure of the feature extraction network is not limited here.

[0031] The sequence processing network described above can be used to process the block features of each row of image blocks to obtain the row features of the face image under test, and to process the block features of each column of image blocks to obtain the column features of the face image under test. Both row and column features can be represented in the form of feature vectors. In practice, the sequence processing network can employ neural networks for processing sequence data, such as, but not limited to, recurrent neural networks (RNNs) and gated recurrent unit (GRU) networks. The sequence processing network can process features in sequence form and output features in another sequence form.

[0032] The aforementioned classification network can be used to process row and column features to output attack detection results. These results can indicate whether an image input to the attack detection model is an attack image of the edge difference type. In practice, the classification network can be a network structure used to implement binary classification; for example, it may include at least one linear layer (e.g., a fully connected layer).

[0033] The execution unit of the attack detection method (such as the processing device in the aforementioned electronic device) can first divide the face image to be tested into multiple image blocks. For example, if the original image is 64×64 pixels, it can be divided into 16 image blocks of 16×16 pixels. It should be noted that the size of the image blocks can be set as needed and is not limited here.
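The block division in this example can be sketched as follows. This is a minimal illustration that assumes a single-channel image stored as a list of pixel rows and side lengths that are exact multiples of the block size; the function name split_into_blocks is hypothetical and does not come from the application.

```python
def split_into_blocks(image, block_size):
    """Split a 2D image (a list of pixel rows) into a grid of square blocks.

    Returns grid[r][c], where each entry is a block_size x block_size block.
    Assumption: both image sides are exact multiples of block_size.
    """
    height, width = len(image), len(image[0])
    if height % block_size or width % block_size:
        raise ValueError("image sides must be multiples of block_size")
    grid = []
    for top in range(0, height, block_size):
        row_of_blocks = []
        for left in range(0, width, block_size):
            # Cut one block out of the rows top..top+block_size.
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            row_of_blocks.append(block)
        grid.append(row_of_blocks)
    return grid


# A 64x64 image whose pixel value encodes its position (y * 64 + x).
image = [[y * 64 + x for x in range(64)] for y in range(64)]
grid = split_into_blocks(image, 16)
print(len(grid), len(grid[0]))              # 4 4
print(len(grid[0][0]), len(grid[0][0][0]))  # 16 16
```

With a 64×64 input and a block size of 16, the grid holds 4 rows and 4 columns of blocks, 16 blocks in total, matching the example above.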

[0034] After dividing the face image to be tested into multiple image blocks, block features of each image block can be extracted based on the feature extraction network in the pre-trained attack detection model. Then, the block features of each row and each column of image blocks can be processed based on the sequence processing network in the attack detection model to obtain the row and column features of the face image to be tested. Finally, the row and column features can be processed based on the classification network in the attack detection model to obtain the attack detection result. Optionally, the classification network may include a feature concatenation layer and a fully connected layer. The row and column features can be input into the classification network, where the feature concatenation layer concatenates them to obtain the target feature. The target feature is then input into the fully connected layer to obtain the attack detection result.
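The optional classification network described here (feature concatenation followed by a fully connected layer) can be sketched as below. The sigmoid output, the 0.5 decision threshold, the function name classify, and all weight values are assumptions made for illustration; the application only states that the network performs binary classification.

```python
import math


def classify(row_feature, column_feature, weights, bias):
    """Concatenate row and column features, then apply one linear layer.

    The sigmoid that turns the logit into a score in (0, 1) is an
    assumption; the source does not name an output activation.
    """
    target_feature = row_feature + column_feature  # feature concatenation
    if len(weights) != len(target_feature):
        raise ValueError("weights must match the concatenated feature length")
    logit = sum(w * x for w, x in zip(weights, target_feature)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Placeholder values, not trained parameters.
row_feature = [0.2, -0.1, 0.4]
column_feature = [0.0, 0.3, -0.5]
weights = [0.5, -1.0, 0.25, 1.0, 0.75, -0.5]
score = classify(row_feature, column_feature, weights, bias=0.1)
is_attack = score >= 0.5  # hypothetical decision threshold
print(is_attack)
```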

[0035] The method provided in the above embodiments of this application divides the face image to be tested into multiple image blocks using an attack detection model, extracts the block features of each image block, and then processes the block features of each row and column of image blocks to obtain the row features and column features of the face image to be tested. Finally, the row features and column features are processed to obtain the attack detection result. On the one hand, given that edge-difference type attack face images do not have obvious editing in the face region, the embodiments of this application extract block features from the face image to be tested by dividing it into blocks and combining the block features according to rows and columns. This allows the attack detection model to focus on the relationship between each image block and other image blocks in the face image to be tested. Compared with attack detection models that only focus on the global features of the image or the features of the face region, this is more in line with the characteristics of edge-difference type attack face images, thereby effectively defending against edge-difference type face attacks and improving the accuracy of attack detection. On the other hand, dividing the face image to be tested into blocks reduces the amount of image data calculated each time, thereby greatly reducing the amount of computation and effectively improving the running speed of the model when deployed and used.

[0036] In some alternative embodiments, as shown in Figure 2, the sequence processing network may include a first branch and a second branch. The first and second branches may employ the same network structure, for example, both using an RNN or GRU network structure. This network structure can process features in sequence form (e.g., feature sequences composed of the block features of the image blocks in each row, and feature sequences composed of the block features of the image blocks in each column), and learn the relationships between features in the feature sequence (e.g., block features of different image blocks in the same row, and block features of different image blocks in the same column). Since the first and second branches are independent, they can have different network parameters. Based on this structure, when performing step 102, the block features of each row of image blocks can be processed based on the first branch to obtain the row features of the face image to be tested, and the block features of each column of image blocks can be processed based on the second branch to obtain the column features of the face image to be tested.

[0037] In some alternative implementations, based on the first branch, the row features of the face image to be tested can be generated according to the following sub-steps S11 to S14:

[0038] Sub-step S11 involves concatenating the block features of each row of image blocks to obtain the row-level concatenated features corresponding to each row. Here, the block feature of each image block can be represented as a feature map. For each row of image blocks, the row-level concatenated features can be obtained by concatenating (concat) the block features of the image blocks in that row. For example, if the original image size is 64×64 and the size of each image block is 16×16, then 4 rows of image blocks and 4 columns of image blocks can be obtained. The block features of the first row of image blocks can be concatenated to obtain the row-level concatenated features corresponding to the first row of image blocks; the block features of the second row of image blocks can be concatenated to obtain the row-level concatenated features corresponding to the second row of image blocks; and so on. If the size of each feature map is represented as c×w×h, then after concatenating the block features, the size of each row-level concatenated feature can be represented as 4c×w×h.

[0039] Sub-step S12 involves pooling the row-level concatenated features of each row to obtain the initial combined features corresponding to each row of image blocks. Pooling is a downsampling process that can be implemented based on a non-linear pooling function. In practice, average pooling, max pooling, and other pooling methods can be used. Each initial combined feature can be a feature vector. Continuing the example above, each initial combined feature can be a 4c-dimensional feature vector. In practice, pooling layers corresponding to each row of image blocks can be included between the feature extraction network and the sequence processing network. The row-level concatenated features of each row can be pooled using these pooling layers.

[0040] Sub-step S13 involves inputting the initial combined features corresponding to each row of image patches into the first branch. Based on the first branch, the initial combined features corresponding to each row of image patches are processed sequentially to obtain the target combined features corresponding to each row of image patches. Here, the target combined features can also be feature vectors, and their length can be n. Continuing the example above, since there are 4 rows of image patches, the input to the first branch is 4 4c-dimensional feature vectors, and the output of the first branch is 4 n-dimensional feature vectors.

[0041] Sub-step S14 involves pooling the target combination features corresponding to each row of image patches to obtain the row features of the face image to be tested. Here, a pooling layer can be set between the first branch and the classification network. This pooling layer can be used to pool the target combination features corresponding to each row of image patches. Continuing the example above, after pooling the four n-dimensional feature vectors output by the first branch, an n-dimensional feature vector can be obtained, which is the row feature of the face image to be tested.
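Sub-steps S11 to S14 can be traced on a toy example to check the shapes. The sketch below makes several assumptions that go beyond the text: random numbers stand in for real block features, average pooling is used for both pooling steps, and a plain tanh RNN with random weights stands in for the first branch (the application allows an RNN or a GRU). All function names are hypothetical.

```python
import math
import random


def concat_channels(block_features):
    """S11: concatenate c x h x w feature maps along the channel axis."""
    return [channel for feature_map in block_features for channel in feature_map]


def global_average_pool(feature_map):
    """S12: average every channel over its spatial positions."""
    return [sum(sum(r) for r in channel) / (len(channel) * len(channel[0]))
            for channel in feature_map]


def simple_rnn(sequence, w_in, w_hidden):
    """S13: a tanh RNN that returns one hidden state per sequence step."""
    n = len(w_hidden)
    hidden = [0.0] * n
    outputs = []
    for x in sequence:
        hidden = [math.tanh(sum(w_in[j][i] * x[i] for i in range(len(x)))
                            + sum(w_hidden[j][k] * hidden[k] for k in range(n)))
                  for j in range(n)]
        outputs.append(hidden)
    return outputs


def average_over_sequence(vectors):
    """S14: average the per-row vectors element-wise."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]


rng = random.Random(0)
c, h, w, n = 2, 3, 3, 5  # toy sizes; n is the output length of the branch
# grid[r][col] is a random stand-in for one c x h x w block feature.
grid = [[[[[rng.random() for _ in range(w)] for _ in range(h)]
          for _ in range(c)] for _ in range(4)] for _ in range(4)]
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(4 * c)] for _ in range(n)]
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]

row_concat = [concat_channels(row) for row in grid]        # 4 maps, 4c x h x w
initial = [global_average_pool(fm) for fm in row_concat]   # 4 vectors, length 4c
target = simple_rnn(initial, w_in, w_hidden)               # 4 vectors, length n
row_feature = average_over_sequence(target)                # 1 vector, length n
print(len(row_concat[0]), len(initial[0]), len(target), len(row_feature))
```

The printed sizes are 4c, 4c, the number of rows, and n, which for these toy values are 8, 8, 4, and 5.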

[0042] Similarly, in some alternative implementations, based on the second branch, the column features of the face image to be tested can be generated according to the following sub-steps S21 to S24:

[0043] Sub-step S21 involves concatenating the block features of each column of image blocks to obtain the column-concatenated features corresponding to each column of image blocks. Here, for each column of image blocks, the column-concatenated features corresponding to that column can be obtained by concatenating (concat) the block features of the image blocks in that column. For example, if the original image size is 64×64 and the size of each image block is 16×16, then 4 rows of image blocks and 4 columns of image blocks can be obtained. The block features of the first column of image blocks can be concatenated to obtain the column-concatenated features corresponding to the first column of image blocks; the block features of the second column of image blocks can be concatenated to obtain the column-concatenated features corresponding to the second column of image blocks; and so on. If the size of each feature map is represented as c×w×h, then after concatenating the block features, the size of each column-concatenated feature can be represented as 4c×w×h.

[0044] Sub-step S22 involves pooling the column-concatenated features of each column to obtain the initial combined features corresponding to each column of image blocks. Each initial combined feature can be a feature vector. Continuing the example above, each initial combined feature can be a 4c-dimensional feature vector. In practice, pooling layers corresponding to each column of image blocks can be included between the feature extraction network and the sequence processing network. The column-concatenated features of each column can be pooled using these pooling layers.

[0045] Sub-step S23 inputs the initial combined features corresponding to each column of image patches into the second branch to obtain the target combined features corresponding to each column of image patches. Here, the target combined features can also be feature vectors, and their length can be n. Continuing the above example, since there are 4 columns of image patches, the input to the second branch is 4 4c-dimensional feature vectors, and the output of the second branch is 4 n-dimensional feature vectors.

[0046] Sub-step S24 involves performing column pooling on the target combination features corresponding to each column of image patches to obtain the column features of the face image to be tested. Here, a pooling layer can be set between the second branch and the classification network. This pooling layer can be used to perform column pooling on the target combination features corresponding to each column of image patches. Continuing with the above example, after performing column pooling on the four n-dimensional feature vectors output by the second branch, an n-dimensional feature vector can be obtained, which is the column feature of the face image to be tested.
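Sub-steps S21 to S24 differ from the row case only in how the blocks are grouped. A small sketch of that regrouping, assuming the block features are held in a row-major grid[r][c]; the helper name columns_of is hypothetical, and string labels stand in for feature maps so the result is easy to read.

```python
def columns_of(grid):
    """Regroup a row-major grid[r][c] into a list of columns."""
    return [list(column) for column in zip(*grid)]


# Labels stand in for block features; "b12" is the block in row 1, column 2.
grid = [[f"b{r}{c}" for c in range(4)] for r in range(4)]
columns = columns_of(grid)
print(columns[0])  # ['b00', 'b10', 'b20', 'b30']
```

Each entry of columns can then be concatenated, pooled, passed through the second branch, and pooled again in the same way as a row.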

[0047] By segmenting the face image under test into blocks, extracting block features, and combining these features according to rows and columns, followed by sequential processing, the attack detection model can focus on the relationship between each image block and other image blocks in the face image under test. Compared to attack detection models that only focus on global features or facial region features, this is more consistent with the characteristics of edge-difference type attack face images, thus effectively defending against edge-difference type face attacks and improving attack detection accuracy. Simultaneously, segmenting the face image under test reduces the amount of image data required for each computation (e.g., convolution calculation), significantly reducing computational load and effectively improving the model's running speed during deployment.

[0048] In some optional embodiments, since the number of attack face sample images is limited, attack face sample images can be generated by edge-filling processing of real face sample images to defend against face attacks of edge difference type. Specifically, the sample set generation process and the training process of the attack detection model can be seen in the following sub-steps S31 to S32:

[0049] Sub-step S31: Obtain the sample set. Here, the sample set may include real face sample images and attack face sample images corresponding to the real face sample images. Attack face sample images are images with different image edges obtained by processing real face sample images. Furthermore, real face sample images may be labeled with a first label, and attack face sample images may be labeled with a second label. The first and second labels can be used to represent the type of sample image and can serve as supervision signals during supervised training.

[0050] Sub-step S32 involves training the aforementioned attack detection model based on the sample set. Here, sample images from the sample set can be used to perform supervised training on the base model, and the trained base model can be used as the attack detection model. The aforementioned base model is the training model that includes the aforementioned feature extraction network, sequence processing network, and classification network.

[0051] In some scenarios, because the attacker's facial image must meet a certain aspect ratio when input into a facial recognition system, directly stretching the image will cause facial distortion, while directly cropping the facial image may result in reduced resolution or incomplete facial areas. Therefore, attackers typically pad the image with solid-color areas around it to make the facial image meet the aspect ratio requirement without affecting the image quality of the face. In view of this, in some optional implementations of this embodiment, the above-mentioned type of attack can be referred to as a proportional padding attack. An attack facial sample image is synthesized by performing edge padding processing on a real facial sample image using the following synthesis method to simulate proportional padding attack data.

[0052] Specifically, as shown in Figure 3, the measured aspect ratio of the real face sample image can be determined first, and a target aspect ratio can be randomly selected from a preset aspect ratio set. Then, the measured aspect ratio and the target aspect ratio are compared. If the measured aspect ratio is not equal to the target aspect ratio, a background image with the target background color can be generated based on the target aspect ratio, and the shorter side of the background image can be made equal to the shorter side of the real face sample image. The real face sample image is then pasted into the center of the background image to obtain the attack face sample image. If the measured aspect ratio is equal to the target aspect ratio, a new target aspect ratio can be randomly selected from the preset aspect ratio set.
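The proportional padding synthesis can be sketched roughly as follows. The aspect ratio set, the function names, and the single-channel list-of-rows image are all assumptions for illustration. One further assumption should be stated plainly: the sketch sizes the background as the smallest canvas of the target ratio that fully contains the face image, which is one possible reading of how the side lengths are matched and is not confirmed by the application.

```python
import random

# Hypothetical preset set of (width, height) aspect ratios.
ASPECT_RATIOS = [(4, 3), (3, 4), (16, 9), (9, 16), (1, 1)]


def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def choose_target_ratio(width, height, ratios, rng):
    """Randomly pick a ratio that differs from the measured width:height."""
    candidates = [(rw, rh) for rw, rh in ratios if width * rh != height * rw]
    if not candidates:
        raise ValueError("no preset ratio differs from the measured one")
    return rng.choice(candidates)


def pad_to_ratio(image, ratio, background_color):
    """Paste the image at the center of a solid canvas with the target ratio.

    Assumption: the canvas is the smallest one of that ratio (rounded up to
    whole pixels) that contains the whole image.
    """
    height, width = len(image), len(image[0])
    rw, rh = ratio
    canvas_w = max(width, ceil_div(height * rw, rh))
    canvas_h = max(height, ceil_div(width * rh, rw))
    top = (canvas_h - height) // 2
    left = (canvas_w - width) // 2
    canvas = [[background_color] * canvas_w for _ in range(canvas_h)]
    for y, row in enumerate(image):
        canvas[top + y][left:left + width] = row
    return canvas


rng = random.Random(0)
real_face = [[255] * 30 for _ in range(40)]  # placeholder: 30 wide, 40 high
ratio = choose_target_ratio(30, 40, ASPECT_RATIOS, rng)
attack_sample = pad_to_ratio(real_face, ratio, background_color=0)
print(ratio, len(attack_sample[0]), len(attack_sample))
```

Filtering out ratios equal to the measured one before sampling has the same effect as re-selecting whenever the two ratios match, without the need for a retry loop.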

[0053] It should be noted that, to ensure the synthesized attack face sample images (i.e., attack data) have sufficient randomness and better reflect real attack characteristics, the aspect ratio set can include image aspect ratios commonly used in face data acquisition hardware, and is not restricted to horizontal or vertical orientations. The target background color can be randomly selected from the color set or selected according to a certain proportion. For example, a certain proportion of black background can be fixed according to the actual attack characteristics, while other colors are randomly selected.
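One way to combine a fixed share of black backgrounds with random selection of other colors is sketched below. The color set, the 50% share of black, and the function name choose_background_color are illustrative assumptions; the application does not specify concrete values.

```python
import random

# Hypothetical color set (RGB); the application does not list concrete colors.
COLOR_SET = [(255, 255, 255), (128, 128, 128), (0, 0, 255), (0, 128, 0)]
BLACK = (0, 0, 0)


def choose_background_color(rng, black_fraction=0.5):
    """Return black with probability black_fraction, else a random set color."""
    if rng.random() < black_fraction:
        return BLACK
    return rng.choice(COLOR_SET)


rng = random.Random(0)
colors = [choose_background_color(rng) for _ in range(1000)]
black_share = colors.count(BLACK) / len(colors)
print(round(black_share, 2))
```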

[0054] By synthesizing attack face sample images (i.e., attack data) through proportional edge padding, the attack face sample images (i.e., attack data) can be made sufficiently close to real attack data, resulting in excellent defense capabilities of the trained attack detection model. Furthermore, because the selection of target background color and target aspect ratio has a certain degree of randomness and diversity, the trained attack detection model has strong generalization ability; even if the real attack method changes slightly, the defense performance of the attack detection model will not fluctuate drastically.

[0055] In some scenarios, attackers often use a large proportion of the face area in a face image to maintain high resolution. If such a face image is directly input into a face recognition system, the face area will be stretched or incomplete and cannot be correctly detected. Therefore, attackers will repeatedly copy face images, shrink them, and stack them sequentially, so that the face area is roughly located in the center of the final image and has a suitable area proportion. In view of this, in some optional implementations of this embodiment, the above-mentioned type of attack can be referred to as an original image copying attack. By performing edge filling processing on a real face sample image through the following synthesis method, an attack face sample image is synthesized to simulate the original image copying attack data.

[0056] Specifically, as shown in Figure 4, a background image can first be generated based on a real face sample image. Then, the following processing steps are performed: First, the background image is subjected to image degradation processing and image reduction processing respectively, to obtain a degraded image and a reduced image. Then, the reduced image is pasted onto the degraded image to obtain the target image. Next, it is determined whether the target image meets the target conditions. For example, it is determined whether the first area proportion of the face region in the target image is less than or equal to a first threshold (e.g., any value greater than or equal to 1/3 and less than or equal to 1/2), and whether the face region in the target image is located within a target region (e.g., a predetermined area in the center). If the first area proportion of the face region in the target image is less than or equal to the first threshold, and the face region in the target image is located within the target region, then the target image can be used as an attack face sample image. It should be noted that if the first area proportion is greater than the first threshold, or if the face region in the target image is located outside the target region, then the target image can be used as the background image, and the processing steps continue until the target conditions are met.
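The loop described in paragraph [0056] might be sketched as follows. This is a minimal sketch under stated assumptions: the image is a nested list of grayscale values, degradation is reduced to a brightness reduction, reduction uses nearest-neighbour sampling, and the reduced image is always pasted at the center so that the sketch is deterministic (the embodiment also allows a random paste location). All names and default parameters are hypothetical.

```python
def degrade(img, percent=60):
    # Stand-in degradation: reduce brightness to `percent` of the original.
    return [[p * percent // 100 for p in row] for row in img]

def shrink(img, num, den):
    # Nearest-neighbour reduction by the factor num/den (num < den).
    h, w = len(img), len(img[0])
    nh, nw = max(1, h * num // den), max(1, w * num // den)
    return [[img[min(h - 1, y * den // num)][min(w - 1, x * den // num)]
             for x in range(nw)] for y in range(nh)]

def synthesize_copy_attack(img, face_box, scale=(7, 10), max_ratio=0.5,
                           center_frac=0.5, max_iters=20):
    """Repeat degrade/shrink/paste until the face region is small and centered.

    face_box is (x0, y0, x1, y1) in pixels; returns (target_image, face_box).
    """
    h, w = len(img), len(img[0])
    num, den = scale
    x0, y0, x1, y1 = face_box
    background = img
    for _ in range(max_iters):
        degraded = degrade(background)
        small = shrink(background, num, den)
        sh, sw = len(small), len(small[0])
        ox, oy = (w - sw) // 2, (h - sh) // 2  # center paste (assumption)
        target = [row[:] for row in degraded]
        for y in range(sh):
            target[oy + y][ox:ox + sw] = small[y]
        # Track where the face region ends up inside the pasted copy.
        x0, x1 = ox + x0 * num / den, ox + x1 * num / den
        y0, y1 = oy + y0 * num / den, oy + y1 * num / den
        ratio = (x1 - x0) * (y1 - y0) / (w * h)  # first area proportion
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        centered = (abs(cx - w / 2) <= center_frac * w / 2
                    and abs(cy - h / 2) <= center_frac * h / 2)
        if ratio <= max_ratio and centered:
            return target, (x0, y0, x1, y1)
        background = target  # otherwise reuse the target as the new background
    raise RuntimeError("target conditions not met within max_iters")
```

The `max_iters` guard is an addition of the sketch; the paragraph above simply continues until the target conditions are met.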

[0057] In generating the initial background image based on the real face sample image, a second area proportion of the face region in the real face sample image can be determined first. If the second area proportion is less than a second threshold, a sub-image of an expanded face region, which contains the face region, can be cropped from the real face sample image and used as the background image. If the second area proportion is greater than or equal to the second threshold, the real face sample image itself is used as the background image. This ensures that the face region in the background image is sufficiently large.

[0058] In the process of cropping sub-images, the facial regions in the real facial sample image are first detected. Then, the boundaries of the detected facial regions are expanded to obtain the expanded facial region. Finally, a sub-image of the expanded facial region is cropped from the real facial sample image.
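The background-image selection of paragraphs [0057] and [0058] might look like the following sketch. The face box is assumed to come from some face detector that is not shown here; the function name, the threshold value, and the random expansion range are hypothetical choices for illustration.

```python
import random

def make_background(img, face_box, min_ratio=0.5, max_expand=0.3, rng=random):
    """Return the initial background image for the copy-attack synthesis.

    img: list of pixel rows; face_box: (x0, y0, x1, y1) from a face detector.
    min_ratio plays the role of the second threshold.
    """
    h, w = len(img), len(img[0])
    x0, y0, x1, y1 = face_box
    ratio = (x1 - x0) * (y1 - y0) / (w * h)  # second area proportion
    if ratio >= min_ratio:
        return img  # the face is already large enough
    # Expand the detected box by a random fraction, clamped to the image.
    e = rng.uniform(0, max_expand)
    dx, dy = int((x1 - x0) * e), int((y1 - y0) * e)
    cx0, cy0 = max(0, x0 - dx), max(0, y0 - dy)
    cx1, cy1 = min(w, x1 + dx), min(h, y1 + dy)
    # Crop the sub-image of the expanded face region.
    return [row[cx0:cx1] for row in img[cy0:cy1]]
```

Because the crop always contains the whole face box, the face occupies a larger share of the returned background than of the original image.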

[0059] It should be noted that, to ensure the synthesized attack face sample images (i.e., attack data) have sufficient randomness and better reflect the characteristics of realistic attacks, the expansion of the facial region boundaries in the real face sample images can be done randomly. That is, a certain proportion can be randomly expanded. Furthermore, during image degradation processing, different degradation methods can be randomly combined (e.g., blurring, increasing noise, reducing transparency, reducing brightness, reducing contrast, etc.), and the parameters used in the degradation process can be randomized (e.g., random blur radius, random noise, random transparency, random brightness, random contrast, etc.). When resizing the image, a random reduction ratio can be used. When pasting the resized image onto the degraded image, the pasting area can be randomly selected.

[0060] By synthesizing attack face sample images (i.e., attack data) through image duplication and overlay, the types of attack face sample images (i.e., attack data) can be enriched, improving the attack detection model's ability to defend against real attacks and enhancing its anti-interference capabilities.

[0061] In some scenarios, in the facial images possessed by attackers, a portion of the facial area is not centered or is tilted at an angle, which does not conform to the habits of real people using facial recognition systems. Therefore, attackers often use affine transformations to rotate and center the face in the image, mimicking the shooting habits of real people. In this case, irregular edges naturally form around the face image due to the affine transformation. Therefore, in some optional implementations of this embodiment, the above-mentioned type of attack can be referred to as an affine transformation attack. The following synthesis method is used to perform edge filling processing on a real facial sample image to synthesize an attack facial sample image, simulating affine transformation attack data.

[0062] Specifically, as shown in Figure 5, facial keypoint detection can first be performed on a real face sample image to obtain the measured facial keypoint coordinates. The number of facial keypoints detected during facial keypoint detection can be set as needed, for example to 5, 68, or 81, without limitation here. Then, an affine transformation matrix is generated based on the measured facial keypoint coordinates and the standard facial keypoint coordinates. An affine transformation, also known as an affine mapping, is a geometric mapping between two vector spaces that consists of a non-singular linear transformation followed by a translation. The measured and standard facial keypoint coordinates can be represented in matrix form, and the matrix relationship can be solved using common affine transformation solving algorithms to obtain the affine transformation matrix. Then, based on the affine transformation matrix, an affine transformation is performed on each pixel in the real face sample image to obtain an affine transformed image. Finally, image filling processing is performed on the irregular areas of the affine transformed image to generate an attack face sample image.
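One common way to solve for the affine transformation matrix from keypoint pairs is a least-squares fit, sketched below with only the standard library. This is an illustrative choice, not necessarily the solving algorithm used in the embodiment; the function names are hypothetical, and the pixel warping and irregular-region filling steps are not shown.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            raise ValueError("keypoints are collinear; no unique affine fit")
        a[c], a[p] = a[p], a[c]
        for r in range(3):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [a[i][3] / a[i][i] for i in range(3)]

def fit_affine(measured, standard):
    """Least-squares 2x3 matrix mapping measured keypoints to standard ones."""
    m = [[0.0] * 3 for _ in range(3)]
    vx, vy = [0.0] * 3, [0.0] * 3
    for (x, y), (u, v) in zip(measured, standard):
        p = (x, y, 1.0)
        for i in range(3):
            vx[i] += p[i] * u
            vy[i] += p[i] * v
            for j in range(3):
                m[i][j] += p[i] * p[j]  # normal-equation matrix
    return [solve3(m, vx), solve3(m, vy)]

def apply_affine(matrix, point):
    """Apply the linear part and then the translation to one (x, y) point."""
    x, y = point
    return tuple(r[0] * x + r[1] * y + r[2] for r in matrix)
```

With at least three non-collinear keypoints the fit is unique; with 5, 68, or 81 keypoints the system is over-determined and the least-squares solution averages out detection noise.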

[0063] It should be noted that, to ensure the randomness and diversity of the synthesized attack face sample images (i.e., attack data), during the facial keypoint detection step, the real face sample image can first be randomly cropped to update the real face sample image, ensuring that the updated real face sample image includes the complete face region; then, facial keypoint detection is performed on the updated real face sample image. This alters the aspect ratio of the original image and the relative position of the face within the image, improving the randomness and diversity of the attack face sample images (i.e., attack data). Furthermore, when filling irregular regions of the affine transformed image, different content can be used. For example, this can include, but is not limited to, solid color filling, image stringing filling, image mirroring filling, and filling with other images.
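The random crop that keeps the complete face region can be sketched as choosing each crop edge between the image border and the corresponding face-box edge. The function name and the box convention `(x0, y0, x1, y1)` are assumptions of this sketch.

```python
import random

def random_crop_keep_face(width, height, face_box, rng=random):
    """Pick a random crop rectangle that still contains the whole face box."""
    x0, y0, x1, y1 = face_box
    left = rng.randint(0, x0)         # anywhere left of the face
    top = rng.randint(0, y0)          # anywhere above the face
    right = rng.randint(x1, width)    # anywhere right of the face
    bottom = rng.randint(y1, height)  # anywhere below the face
    return left, top, right, bottom
```

Each call changes both the aspect ratio of the cropped image and the relative position of the face within it, which is the source of diversity described above.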

[0064] The attack data synthesized in this implementation can generate more diverse attack data using richer synthesis methods, while covering actual attack methods, thus making the model more robust.

[0065] As an example, Figure 6 illustrates the processing steps of the attack detection method. As shown in Figure 6, edge-difference type facial attacks can include proportional edge-padding attacks, original image copying attacks, and affine transformation attacks. For each type of facial attack, sample images can be synthesized using a different synthesis method to simulate the attack data corresponding to that type. Then, an attack detection model can be trained based on the sample images synthesized using each method. Finally, the trained attack detection model can be used to perform attack detection on the face image to be tested and output the attack detection result. The attack detection result indicates whether the face image to be tested is an edge-difference type attack image.

[0066] The attack detection model trained on the aforementioned sample set can effectively defend against facial attacks with edge differences, and can be deployed in parallel with existing liveness detection models. Furthermore, due to the randomness and diversity of the training data, the attack detection model's defensive capabilities can remain at a high level over a long period, without rapidly declining due to subtle changes in attack methods. This achieves proactive defense against facial attacks with edge differences without relying on massive amounts of real attack data.

[0067] In some alternative implementations, the attack detection model in this application embodiment can be deployed together with other types of attack detection models (such as light-based liveness detection models, motion liveness detection models, screen capture liveness detection models, etc.) on the same device. Multiple attack detection models can be used to detect the collected target video in parallel. If any attack detection model detects that the target video or the target object is not alive, the detection result that the target object is not alive is output. By deploying multiple different types of attack detection models, the accuracy of attack detection can be improved.
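The fusion rule in paragraph [0067] (any single model reporting "not alive" decides the outcome) might be sketched as follows. Each detector is assumed to be a callable returning `True` when it considers the target alive; the thread pool is one possible way to run the models in parallel and is an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def fuse_liveness(detectors, sample):
    """Run all detectors in parallel; the target is alive only if all agree."""
    with ThreadPoolExecutor() as pool:
        verdicts = list(pool.map(lambda detect: detect(sample), detectors))
    # A single "not alive" verdict is enough to reject the sample.
    return all(verdicts)
```

In a deployment, the edge-difference attack detection model would be one entry in `detectors`, alongside the other liveness detection models.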

[0068] With further reference to Figure 7, as an implementation of the methods shown in the above figures, this application provides an embodiment of an attack detection device. This device embodiment corresponds to the method embodiment shown in Figure 1, and the device can be specifically applied to various electronic devices.

[0069] As shown in Figure 7, the attack detection device 700 of this embodiment includes: an input unit 701, used to input a face image to be tested into an attack detection model; and a detection unit 702, used to divide the face image to be tested into multiple image blocks through the attack detection model, extract the block features of each image block, process the block features of each row and each column of image blocks to obtain the row features and column features of the face image to be tested, and process the row features and the column features to obtain an attack detection result. The attack detection result is used to indicate whether the face image to be tested is an attack face image whose image edges differ from those of a real face image.

[0070] In some optional implementations, the attack detection model includes a sequence processing network, which includes a first branch and a second branch; the detection unit 702 is further configured to perform sequence processing on the block features of each row of image blocks based on the first branch to obtain the row features of the face image to be tested, and to process the block features of each column of image blocks based on the second branch to obtain the column features of the face image to be tested.

[0071] In some optional implementations, the detection unit 702 is further configured to concatenate the block features of each row of image blocks to obtain row concatenation features corresponding to each row of image blocks; perform pooling processing on each row of concatenation features to obtain initial combined features corresponding to each row of image blocks; input the initial combined features corresponding to each row of image blocks into the first branch, perform sequence processing on the initial combined features corresponding to each row of image blocks based on the first branch to obtain target combined features corresponding to each row of image blocks; and perform pooling processing on the target combined features corresponding to each row of image blocks to obtain the row features of the face image to be tested.

[0072] In some optional implementations, the detection unit 702 is further configured to concatenate the block features of each column of image blocks to obtain column concatenation features corresponding to each column of image blocks; perform pooling processing on each column concatenation features to obtain initial combination features corresponding to each column of image blocks; input the initial combination features corresponding to each column of image blocks to the second branch, perform sequence processing on the initial combination features corresponding to each column of image blocks based on the second branch to obtain target combination features corresponding to each column of image blocks; and perform pooling processing on the target combination features corresponding to each column of image blocks to obtain the column features of the face image to be tested.
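The row and column pipelines of paragraphs [0070] to [0072] follow the same pattern: combine the block features of each row (or column), pool them into one vector, run the resulting sequence through a branch, and pool the branch output. The sketch below makes several assumptions: concatenation is modeled as stacking the block vectors, both pooling steps are mean pooling, and `toy_sequence_branch` is an untrained stand-in for the sequence processing network, whose actual architecture is not specified here.

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def toy_sequence_branch(seq, alpha=0.5):
    """Placeholder sequence model: an exponential moving average over steps."""
    state = [0.0] * len(seq[0])
    out = []
    for v in seq:
        state = [alpha * s + (1 - alpha) * x for s, x in zip(state, v)]
        out.append(state)
    return out

def line_feature(lines, branch):
    # lines[i] holds the stacked block features of the i-th row (or column).
    initial = [mean_pool(blocks) for blocks in lines]  # initial combined features
    target = branch(initial)                           # target combined features
    return mean_pool(target)                           # row (or column) feature

def row_and_column_features(grid, row_branch=toy_sequence_branch,
                            col_branch=toy_sequence_branch):
    """grid[r][c] is the block feature vector of the block at row r, column c."""
    columns = [list(col) for col in zip(*grid)]
    return line_feature(grid, row_branch), line_feature(columns, col_branch)
```

For a 2x2 grid with one-dimensional block features 1, 3, 5, 7, this sketch yields a row feature of 2.25 and a column feature of 2.375; the two differ because the branch sees the blocks in a different order.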

[0073] In some optional implementations, the attack detection model further includes a classification network, which includes a feature concatenation layer and a fully connected layer; the detection unit 702 is further used to input the row features and column features into the feature concatenation layer to obtain target features; and to input the target features into the fully connected layer to obtain the attack detection result.
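The classification network of paragraph [0073] can be illustrated with a plain concatenation followed by one fully connected layer. The softmax output, the convention that class 1 means "attack", and the weight values used in any call are assumptions of this sketch; in practice the weights would be learned during training.

```python
import math

def classify(row_feat, col_feat, weights, bias):
    """Concatenate row and column features and apply one dense layer.

    weights: one weight vector per class; bias: one bias per class.
    Returns (is_attack, class_probabilities), with class 1 meaning "attack".
    """
    target = list(row_feat) + list(col_feat)  # feature concatenation layer
    logits = [sum(w * x for w, x in zip(ws, target)) + b
              for ws, b in zip(weights, bias)]  # fully connected layer
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs[1] > probs[0], probs
```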

[0074] In some optional implementations, the attack detection model described above is trained through the following steps: obtaining a sample set, which includes real face sample images and attack face sample images corresponding to the real face sample images, wherein the attack face sample images are images with different image edges obtained by processing the real face sample images; and training the attack detection model based on the sample set.

[0075] In some optional implementations, the aforementioned attack face sample image is generated through the following steps: determining the measured aspect ratio of the aforementioned real face sample image, and randomly selecting a target aspect ratio from the set of aspect ratios; if the aforementioned measured aspect ratio is not equal to the aforementioned target aspect ratio, then generating a background image with a target background color based on the aforementioned target aspect ratio, pasting the aforementioned real face sample image into the center of the aforementioned background image to obtain the attack face sample image, wherein the shorter side length of the aforementioned background image is equal to the shorter side length of the aforementioned real face sample image.

[0076] In some optional implementations, the aforementioned attack face sample image is generated through the following steps: a background image is generated based on the aforementioned real face sample image; the following processing steps are performed: the background image is subjected to image degradation processing and image reduction processing respectively to obtain a degraded image and a reduced image; the reduced image is pasted into the degraded image to obtain a target image; if the first area ratio of the face region in the target image is less than or equal to a first threshold, and the face region in the target image is located within the target region, then the target image is used as the attack face sample image; if the first area ratio is greater than the first threshold, or the face region in the target image is located outside the target region, then the target image is used as the background image, and the aforementioned processing steps are continued.

[0077] In some optional implementations, generating a background image based on the real face sample image includes: determining a second area ratio of the face region in the real face sample image; if the second area ratio is less than a second threshold, then cropping a sub-image of the face extension region in the real face sample image, the face extension region including the face region, and using the sub-image as the background image; if the second area ratio is greater than or equal to the second threshold, then using the real face sample image as the background image.

[0078] In some optional implementations, the above-mentioned cropping of the sub-image of the extended facial region in the real facial sample image includes: detecting the facial region in the real facial sample image; extending the boundary of the detected facial region to obtain the extended facial region; and cropping the sub-image of the extended facial region in the real facial sample image.

[0079] In some optional implementations, the aforementioned attack face sample image is generated through the following steps: facial keypoint detection is performed on the aforementioned real face sample image to obtain the measured facial keypoint coordinates; an affine transformation matrix is generated based on the measured facial keypoint coordinates and the standard facial keypoint coordinates; an affine transformation is performed on each pixel in the aforementioned real face sample image based on the aforementioned affine transformation matrix to obtain an affine transformed image; and image filling processing is performed on the irregular regions of the aforementioned affine transformed image to generate the attack face sample image.

[0080] In some optional implementations, the above-mentioned facial landmark detection on the real face sample image includes: randomly cropping the real face sample image to update the real face sample image, wherein the updated real face sample image includes a complete face region; and performing facial landmark detection on the updated real face sample image.

[0081] The apparatus provided in the above embodiments of this application divides the face image to be tested into multiple image blocks using an attack detection model, extracts the block features of each image block, and then processes the block features of each row and column of image blocks to obtain the row features and column features of the face image to be tested. Finally, the row features and column features are processed to obtain the attack detection result. On the one hand, given that edge-difference type attack face images do not have obvious editing in the face region, the embodiments of this application extract block features from the face image to be tested by dividing it into blocks and combining the block features according to rows and columns. This allows the attack detection model to focus on the relationship between each image block and other image blocks in the face image to be tested. Compared with attack detection models that only focus on the global features of the image or the features of the face region, this is more in line with the characteristics of edge-difference type attack face images, thereby effectively defending against edge-difference type face attacks and improving the accuracy of attack detection. On the other hand, dividing the face image to be tested into blocks reduces the amount of image data calculated each time, thereby greatly reducing the amount of computation and effectively improving the running speed of the model when deployed and used.

[0082] This application also provides an electronic device, including one or more processors and a storage device storing one or more programs thereon. When the one or more programs are executed by the one or more processors, the one or more processors implement the above-described attack detection method.

[0083] Reference is now made to Figure 8, which shows a schematic diagram of the structure of an electronic device used to implement some embodiments of this application. The electronic device shown in Figure 8 is merely an example and should not be construed as limiting the functionality and scope of the embodiments of this application.

[0084] As shown in Figure 8, the electronic device 800 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage device 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the electronic device 800. The processing device 801, ROM 802, and RAM 803 are interconnected via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

[0085] Typically, the following devices can be connected to the I/O interface 805: input devices 806 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 807 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 808 including, for example, disks, hard disks, etc.; and communication devices 809. The communication device 809 allows the electronic device 800 to communicate with other devices, wirelessly or by wire, to exchange data. Although Figure 8 shows an electronic device 800 with various devices, it should be understood that it is not required to implement or possess all of the devices shown. More or fewer devices may alternatively be implemented or possessed. Each box shown in Figure 8 can represent one device or multiple devices as needed.

[0086] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the above-described attack detection method.

[0087] In particular, according to some embodiments of this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, some embodiments of this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via a communication device 809, or installed from a storage device 808, or installed from a ROM 802. When the computer program is executed by the processing device 801, it performs the functions defined in the methods of some embodiments of this application.

[0088] This application also provides a computer-readable medium having a computer program stored thereon, which, when executed by a processor, implements the above-described attack detection method.

[0089] It should be noted that, in some embodiments of this application, the computer-readable medium described above can be a computer-readable signal medium, a computer-readable storage medium, or any combination thereof. A computer-readable storage medium can be, for example,—but not limited to—an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In some embodiments of this application, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In some embodiments of this application, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wires, optical fibers, RF (radio frequency), etc., or any suitable combination thereof.

[0090] In some implementations, clients and servers can communicate using any currently known or future-developed network protocol, such as HTTP (Hypertext Transfer Protocol), and can interconnect with digital data communication (e.g., communication networks) of any form or medium. Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.

[0091] The aforementioned computer-readable medium may be included in the aforementioned electronic device; or it may exist independently without being assembled into the electronic device. The aforementioned computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: input a test face image into an attack detection model; divide the test face image into multiple image blocks using the attack detection model, extract block features of each image block, and process the block features of each row and column of image blocks to obtain row and column features of the test face image; process the row and column features to obtain an attack detection result; the attack detection result is used to indicate whether the test face image is an attack face image whose image edges differ from those of a real face image.

[0092] Computer program code for performing operations of some embodiments of this application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0093] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0094] The units described in some embodiments of this application can be implemented in software or hardware. The described units can also be housed in a processor; for example, a processor may be described as including a first determining unit, a second determining unit, a selecting unit, and a third determining unit. The names of these units do not necessarily limit the specific unit itself.

[0095] The functions described above in this document can be performed, at least in part, by one or more hardware logic components. For example, exemplary types of hardware logic components that can be used include, without limitation: Field Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems-on-Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and so on.

[0096] The above description is merely a selection of preferred embodiments of this application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the embodiments of this application is not limited to technical solutions formed by specific combinations of the above-described technical features, but should also cover other technical solutions formed by arbitrary combinations of the above-described technical features or their equivalents without departing from the above-described inventive concept. For example, it also covers technical solutions formed by interchanging the above-described features with (but not limited to) technical features with similar functions disclosed in the embodiments of this application.

Claims

1. An attack detection method, characterized in that the method includes: inputting a face image to be tested into an attack detection model; and, through the attack detection model, dividing the face image to be tested into multiple image blocks, extracting the block features of each image block, processing the block features of each row and each column of image blocks to obtain the row features and column features of the face image to be tested, and processing the row features and the column features to obtain an attack detection result; wherein the attack detection result is used to indicate whether the face image to be tested is an attack face image whose image edges differ from those of a real face image.

2. The method according to claim 1, characterized in that the attack detection model includes a sequence processing network, which includes a first branch and a second branch; and the processing of the block features of each row and each column of image blocks to obtain the row features and column features of the face image to be tested includes: processing the block features of each row of image blocks based on the first branch to obtain the row features of the face image to be tested; and processing the block features of each column of image blocks based on the second branch to obtain the column features of the face image to be tested.

3. The method according to claim 2, characterized in that the processing of the block features of each row of image blocks based on the first branch to obtain the row features of the face image to be tested includes: concatenating the block features of each row of image blocks to obtain the row concatenation features corresponding to each row of image blocks; performing pooling processing on each row concatenation feature to obtain the initial combined features corresponding to each row of image blocks; inputting the initial combined features corresponding to each row of image blocks into the first branch, and performing sequence processing on the initial combined features corresponding to each row of image blocks based on the first branch to obtain the target combined features corresponding to each row of image blocks; and performing pooling processing on the target combined features corresponding to each row of image blocks to obtain the row features of the face image to be tested.

4. The method according to claim 2 or 3, characterized in that the processing, based on the second branch, of the block features of each column of image blocks to obtain the column features of the face image to be tested comprises: concatenating the block features of each column of image blocks to obtain a column concatenation feature corresponding to each column of image blocks; pooling each column concatenation feature to obtain an initial combined feature corresponding to each column of image blocks; inputting the initial combined features corresponding to the columns of image blocks into the second branch, and processing the initial combined features sequentially based on the second branch to obtain a target combined feature corresponding to each column of image blocks; and pooling the target combined features corresponding to the columns of image blocks to obtain the column features of the face image to be tested.

5. The method according to any one of claims 1-3, characterized in that the attack detection model further comprises a classification network, the classification network comprising a feature concatenation layer and a fully connected layer; and the processing of the row features and the column features to obtain the attack detection result comprises: inputting the row features and the column features into the feature concatenation layer to obtain a target feature; and inputting the target feature into the fully connected layer to obtain the attack detection result.
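A minimal sketch of the classification network of claim 5 is given below. It assumes a two-class output ordered as (real, attack) and a softmax over the fully connected layer's output; neither detail is stated in the claim, and `classify`, `weight` and `bias` are hypothetical names.

```python
import numpy as np

def classify(row_feat, col_feat, weight, bias):
    """Concatenate row/column features and apply one fully connected layer."""
    # Feature concatenation layer -> target feature.
    target = np.concatenate([row_feat, col_feat])
    # Fully connected layer, followed by a softmax (assumption).
    logits = weight @ target + bias
    exp = np.exp(logits - logits.max())   # subtract max for numerical stability
    return exp / exp.sum()

probs = classify(np.ones(3), np.zeros(2), np.zeros((2, 5)), np.array([0.0, 10.0]))
is_attack = bool(probs.argmax() == 1)     # index 1 = "attack" (assumption)
```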

6. The method according to any one of claims 1-3, characterized in that the attack detection model is trained through the following steps: obtaining a sample set, the sample set comprising real face sample images and attack face sample images corresponding to the real face sample images, the attack face sample images being images with image edge differences obtained by processing the real face sample images; and training the attack detection model based on the sample set.

7. The method according to claim 6, characterized in that the attack face sample image is generated through the following steps: determining a measured aspect ratio of the real face sample image, and randomly selecting a target aspect ratio from an aspect ratio set; and if the measured aspect ratio is not equal to the target aspect ratio, generating a background image of a target background color based on the target aspect ratio, and pasting the real face sample image into the center of the background image to obtain the attack face sample image, wherein the short side length of the background image is equal to the short side length of the real face sample image.
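The padding step of claim 7 might be sketched as follows. This sketch assumes that aspect ratios are expressed as (width, height) integer pairs, that images are 3-channel arrays, and that any part of the sample image extending beyond the background is center-cropped; the claim does not say how overflow is handled. `ASPECT_RATIOS`, `pad_to_aspect` and `random_padded_attack` are hypothetical names.

```python
import random
import numpy as np

ASPECT_RATIOS = [(1, 1), (4, 3), (3, 4), (16, 9)]  # hypothetical (width, height) set

def pad_to_aspect(image, target, color=(0, 0, 0)):
    """Paste `image` at the center of a solid background with ratio `target`.

    Returns None when the measured ratio already equals the target ratio.
    """
    h, w = image.shape[:2]
    tw, th = target
    if w * th == h * tw:              # measured ratio equals target ratio
        return None
    short = min(h, w)                 # background short side = image short side
    if tw >= th:
        bg_h, bg_w = short, round(short * tw / th)
    else:
        bg_h, bg_w = round(short * th / tw), short
    bg = np.empty((bg_h, bg_w, 3), dtype=image.dtype)
    bg[:] = color
    # Center paste; overflow is center-cropped (assumption of this sketch).
    ch, cw = min(h, bg_h), min(w, bg_w)
    sy, sx = (h - ch) // 2, (w - cw) // 2
    dy, dx = (bg_h - ch) // 2, (bg_w - cw) // 2
    bg[dy:dy + ch, dx:dx + cw] = image[sy:sy + ch, sx:sx + cw]
    return bg

def random_padded_attack(image, ratios=ASPECT_RATIOS, color=(0, 0, 0)):
    """Randomly pick a target ratio from the set, then pad."""
    return pad_to_aspect(image, random.choice(ratios), color)

face = np.full((100, 100, 3), 255, dtype=np.uint8)
attack = pad_to_aspect(face, (4, 3))   # 100 x 133 image with solid side bands
```

Comparing ratios by cross-multiplication avoids floating-point equality tests when checking whether the measured ratio equals the target ratio.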

8. The method according to claim 6 or 7, characterized in that the attack face sample image is generated through the following steps: generating a background image based on the real face sample image; and performing the following processing steps: performing image degradation processing and image reduction processing on the background image respectively to obtain a degraded image and a reduced image; pasting the reduced image into the degraded image to obtain a target image; if a first area ratio of the face region in the target image is less than or equal to a first threshold, and the face region in the target image is located within a target region, using the target image as the attack face sample image; and if the first area ratio is greater than the first threshold, or the face region in the target image is located outside the target region, using the target image as the background image and continuing to perform the processing steps.
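The loop in claim 8 can be illustrated with the sketch below. The degradation operator (pixelation), the reduction operator (subsampling by an integer step), the centered paste position, the threshold value, the target region expressed as fractions of the image size, and the iteration cap are all assumptions of this sketch; `degrade`, `shrink` and `make_attack_sample` are hypothetical names.

```python
import numpy as np

def degrade(img, factor=4):
    """Hypothetical degradation: pixelate by subsampling and repeating pixels."""
    small = img[::factor, ::factor]
    big = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return big[:img.shape[0], :img.shape[1]]

def shrink(img, step=2):
    """Hypothetical reduction: keep every `step`-th pixel."""
    return img[::step, ::step]

def make_attack_sample(background, face_box, max_ratio=0.1,
                       region=(0.25, 0.25, 0.75, 0.75), step=2, max_iters=10):
    """face_box is (x0, y0, x1, y1) in pixels of the background image."""
    h, w = background.shape[:2]
    x0, y0, x1, y1 = face_box
    for _ in range(max_iters):
        degraded = degrade(background)
        reduced = shrink(background, step)
        rh, rw = reduced.shape[:2]
        oy, ox = (h - rh) // 2, (w - rw) // 2      # centered paste (assumption)
        target = degraded.copy()
        target[oy:oy + rh, ox:ox + rw] = reduced
        # Track where the face region lands inside the target image.
        x0, y0, x1, y1 = (ox + x0 / step, oy + y0 / step,
                          ox + x1 / step, oy + y1 / step)
        ratio = (x1 - x0) * (y1 - y0) / (h * w)    # first area ratio
        inside = (region[0] * w <= x0 and region[1] * h <= y0
                  and x1 <= region[2] * w and y1 <= region[3] * h)
        if ratio <= max_ratio and inside:
            return target, (x0, y0, x1, y1)
        background = target                        # otherwise iterate again
    raise RuntimeError("no valid attack sample within max_iters")

rng = np.random.default_rng(0)
bg = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
sample, box = make_attack_sample(bg, face_box=(8, 8, 56, 56))
```

With these example values the first pass leaves a face area ratio above the threshold, so the target image is fed back as the background and the second pass produces the sample.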

9. The method according to claim 8, characterized in that the generating of the background image based on the real face sample image comprises: determining a second area ratio of the face region in the real face sample image; if the second area ratio is less than a second threshold, extracting a sub-image of a face extension region in the real face sample image, the face extension region including the face region, and using the sub-image as the background image; and if the second area ratio is greater than or equal to the second threshold, using the real face sample image as the background image.
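The background selection of claim 9 could be sketched as below. The threshold value and the way the face extension region is derived (scaling the face box about its center by a factor `expand` and clipping to the image) are assumptions; `make_background` is a hypothetical name.

```python
import math
import numpy as np

def make_background(image, face_box, min_ratio=0.3, expand=1.5):
    """face_box is (x0, y0, x1, y1) in pixels."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = face_box
    ratio = (x1 - x0) * (y1 - y0) / (h * w)       # second area ratio
    if ratio >= min_ratio:
        return image                               # face already large enough
    # Face extension region: the face box enlarged about its center (assumption).
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) * expand / 2, (y1 - y0) * expand / 2
    left, top = max(0, int(cx - half_w)), max(0, int(cy - half_h))
    right = min(w, math.ceil(cx + half_w))
    bottom = min(h, math.ceil(cy + half_h))
    return image[top:bottom, left:right]

photo = np.zeros((100, 100, 3), dtype=np.uint8)
cropped = make_background(photo, (40, 40, 60, 60))   # small face: crop around it
kept = make_background(photo, (10, 10, 90, 90))      # large face: keep full image
```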

10. The method according to claim 6, characterized in that the attack face sample image is generated through the following steps: performing facial key point detection on the real face sample image to obtain measured coordinates of the facial key points; generating an affine transformation matrix based on the measured coordinates of the facial key points and standard coordinates of the facial key points; performing, based on the affine transformation matrix, an affine transformation on each pixel in the real face sample image to obtain an affine-transformed image; and filling the irregular regions of the affine-transformed image to generate the attack face sample image.
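One way to picture the steps of claim 10 is the sketch below. It assumes the affine matrix is fitted by least squares, that the warp is computed by inverse mapping with nearest-neighbor sampling (a common implementation choice, not something the claim states), and that the irregular regions are the output pixels with no source pixel, filled with a constant value. Key point detection itself is out of scope here; `estimate_affine` and `warp_and_fill` are hypothetical names.

```python
import numpy as np

def estimate_affine(measured_pts, standard_pts):
    """Least-squares 2x3 affine matrix mapping measured to standard key points."""
    src = np.asarray(measured_pts, dtype=float)
    dst = np.asarray(standard_pts, dtype=float)
    a = np.hstack([src, np.ones((len(src), 1))])       # rows of [x, y, 1]
    params, *_ = np.linalg.lstsq(a, dst, rcond=None)   # shape (3, 2)
    return params.T                                     # shape (2, 3)

def warp_and_fill(image, m, out_shape, fill=0):
    """Apply affine matrix `m`; fill pixels with no source using `fill`."""
    h, w = image.shape[:2]
    out_h, out_w = out_shape
    inv = np.linalg.inv(np.vstack([m, [0.0, 0.0, 1.0]]))
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = inv @ coords                                  # source coords per output pixel
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.full((out_h * out_w,) + image.shape[2:], fill, dtype=image.dtype)
    out[valid] = image[sy[valid], sx[valid]]
    mask = valid.reshape(out_h, out_w)                  # False marks filled regions
    return out.reshape((out_h, out_w) + image.shape[2:]), mask

measured = [(0, 0), (1, 0), (0, 1)]
standard = [(2, 1), (3, 1), (2, 2)]                     # measured shifted by (2, 1)
matrix = estimate_affine(measured, standard)
gray = np.arange(16, dtype=np.uint8).reshape(4, 4)
warped, mask = warp_and_fill(gray, matrix, (4, 4), fill=255)
```

In this toy case the fitted matrix is a pure translation, so the top row and the two leftmost columns of `warped` have no source pixels and receive the fill value.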

11. An electronic device, characterized by comprising: one or more processors; and a storage device on which one or more programs are stored, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-10.

12. A computer-readable medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, the method according to any one of claims 1-10 is implemented.

13. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, the method according to any one of claims 1-10 is implemented.