Face liveness detection method, apparatus, device and medium
By using the baseline distance constraint method to determine facial feature points in the image coordinate system of a binocular camera, the problem of inaccurate disparity calculation in the existing technology is solved, the positioning accuracy of facial feature points is improved, and the effect of face liveness detection is enhanced.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HANGZHOU HIKVISION DIGITAL TECHNOLOGY CO LTD
- Filing Date
- 2023-11-20
- Publication Date
- 2026-06-26
AI Technical Summary
In existing technologies, face liveness detection suffers from poor detection results due to positioning errors between matching feature points. In particular, when matching images from binocular cameras, the default equality of the vertical coordinates leads to inaccurate matching and large errors in disparity calculation.
By using the baseline distance constraint method to determine the horizontal correction line in the image coordinate system of a binocular camera, facial feature points are accurately located, improving the positioning accuracy of facial feature points and thus enhancing the accuracy of disparity calculation.
It improves the positioning accuracy of facial feature point coordinates, enhances the accuracy of disparity calculation, and strengthens the effect of face liveness detection.
Figure CN117765584B_ABST
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, and in particular to a face liveness detection method, apparatus, device and medium.
Background Technology
[0002] Face anti-spoofing technology can be applied to attendance software, payment software, social media software, etc. However, some unauthorized users may attempt to bypass face liveness detection by using photos, videos, or head models of legitimate users. In related technologies, liveness detection requires pixel matching of two images captured by a binocular camera to identify matching feature points, and then performs detection based on the disparity information between these matching feature points.
[0003] However, the related technology ignores the positioning error between matching feature points and forces the ordinates of the matching points in the two images to be equal, that is, it assumes that they lie on the same epipolar line, which inevitably leads to inaccurate matching and large disparity calculation errors.
Summary of the Invention
[0004] The main objective of this application is to provide a face liveness detection method, apparatus, device, and medium, aiming to solve the technical problem that the face liveness detection effect is poor due to the positioning error between matching feature points in related technologies.
[0005] To achieve the above objectives, in a first aspect, this application provides a face liveness detection method, the method comprising:
[0006] Acquire the first and second images of the target to be detected simultaneously using a binocular camera;
[0007] Facial feature points are extracted from the first image and the second image to obtain at least one first initial facial feature point and at least one second initial facial feature point;
[0008] In the image coordinate system of the binocular camera, for each feature point category, a horizontal correction line is determined between the first horizontal line where the first initial facial feature point is located and the second horizontal line where the second initial facial feature point is located.
[0009] For each feature point category, the corrected first facial feature points located on the horizontal correction line are determined from the first image to obtain a first face semantic map including all corrected first facial feature points. The corrected second facial feature points located on the horizontal correction line are determined from the second image to obtain a second face semantic map including all corrected second facial feature points.
[0010] Based on the disparity information of each feature point group between the first and second face semantic maps, the face liveness detection result is obtained; wherein, the feature point group includes the corrected first facial feature point and the corrected second facial feature point belonging to the same feature point category.
[0011] In one possible embodiment of this application, for each feature point category, corrected first facial feature points located on a horizontal correction line are determined from a first image to obtain a first face semantic map including all corrected first facial feature points; corrected second facial feature points located on a horizontal correction line are determined from a second image to obtain a second face semantic map including all corrected second facial feature points, including:
[0012] For each feature point category, determine multiple first pixels that the horizontal correction line passes through from the first image, and determine multiple second pixels that the horizontal correction line passes through from the second image;
[0013] For each feature point category, a corrected first facial feature point is determined from multiple first pixels, and a corrected second facial feature point is determined from multiple second pixels.
[0014] The first initial face semantic map is updated based on all corrected first facial feature points to obtain the first face semantic map. The second initial face semantic map is updated based on all corrected second facial feature points to obtain the second face semantic map.
[0015] In one possible embodiment of this application, for each feature point category, determining a plurality of first pixels through which the horizontal correction line passes from a first image, and determining a plurality of second pixels through which the horizontal correction line passes from a second image, includes:
[0016] Extract the first pixel region where each first initial facial feature point is located from the first image to obtain the first face semantic sparse image;
[0017] Extract the second pixel region where each second initial facial feature point is located from the second image to obtain the second face semantic sparse image; and for each feature point category, the first pixel region and the second pixel region are the same size.
[0018] For each feature point category, multiple first pixels through which the horizontal correction line passes are determined from the first pixel region, and multiple second pixels through which the horizontal correction line passes are determined from the second pixel region.
[0019] In one possible embodiment of this application, determining a horizontal correction line between a first horizontal line where the first initial facial feature point is located and a second horizontal line where the second initial facial feature point is located includes:
[0020] Use the line parallel to and midway between the first and second horizontal lines as the horizontal correction line.
[0021] In one possible embodiment of this application, for each feature point category, a corrected first facial feature point is determined from a plurality of first pixels, and a corrected second facial feature point is determined from a plurality of second pixels, including:
[0022] For each feature point category, the first pixel corresponding to the largest first probability value among multiple first pixels is taken as the corrected first facial feature point, and the second pixel corresponding to the largest second probability value among multiple second pixels is taken as the corrected second facial feature point; wherein, each first pixel has a first probability value of being identified as a first initial facial feature point, and each second pixel has a second probability value of being identified as a second initial facial feature point.
[0023] In one possible embodiment of this application, a face liveness detection result is obtained based on the disparity information of each feature point group between the first face semantic map and the second face semantic map, including:
[0024] Perform face liveness detection on the disparity information of those feature point groups, among all feature point groups, that correspond to the main facial parts, to obtain the subject disparity classification result;
[0025] Perform face liveness detection on the disparity information of those feature point groups, among all feature point groups, that correspond to a single organ, to obtain the local disparity classification result;
[0026] Based on the subject disparity classification results and the local disparity classification results, the face liveness detection results are obtained.
[0027] In one possible embodiment of this application, a face liveness detection result is obtained based on the subject disparity classification result and the local disparity classification result, including:
[0028] Based on the subject disparity classification result, the first weight of the subject disparity classification result, the local disparity classification result, and the second weight of the local disparity classification result, the face liveness detection result is obtained.
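The weighted combination described above can be sketched as follows. This is only an illustration: the text does not state how the two weights are applied, so the weighted sum, the score range of [0, 1], the default weights, the threshold, and the name `fuse_liveness` are all assumptions.

```python
def fuse_liveness(subject_score: float, local_score: float,
                  w_subject: float = 0.6, w_local: float = 0.4,
                  threshold: float = 0.5):
    """Combine two classification scores into one liveness decision.

    Assumption: each score is the probability (0..1) that the target is a
    live face, and the two results are fused by a weighted sum.
    """
    score = w_subject * subject_score + w_local * local_score
    # Hypothetical decision rule: live if the fused score reaches the threshold.
    return score >= threshold, score


is_live, fused = fuse_liveness(0.9, 0.2)
print(is_live, round(fused, 2))
```

With these hypothetical weights, a strong subject result can outvote a weak local result; other fusion rules are equally compatible with the text.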
[0029] In a second aspect, this application also provides a face liveness detection apparatus, the apparatus comprising:
[0030] The image acquisition module is used to acquire the first and second images captured by the binocular camera at the same time for the target to be detected;
[0031] The key point extraction module is used to extract facial feature points from the first image and the second image to obtain at least one first initial facial feature point and at least one second initial facial feature point.
[0032] The correction line generation module is used to determine, in the image coordinate system of the binocular camera, a horizontal correction line between the first horizontal line where the first initial facial feature point is located and the second horizontal line where the second initial facial feature point is located, for each feature point category.
[0033] The feature point correction module is used to determine the corrected first facial feature points located on the horizontal correction line from the first image for each feature point category, so as to obtain a first face semantic map including all corrected first facial feature points, and to determine the corrected second facial feature points located on the horizontal correction line from the second image, so as to obtain a second face semantic map including all corrected second facial feature points.
[0034] The liveness detection module is used to obtain face liveness detection results based on the disparity information of each feature point group between the first face semantic map and the second face semantic map; wherein, the feature point group includes the corrected first facial feature point and the corrected second facial feature point belonging to the same feature point category.
[0035] In a third aspect, this application also provides a face liveness detection device, including: a processor, a memory, and a computer program stored in the memory, wherein the computer program is executed by the processor to implement the face liveness detection method as described in the first aspect.
[0036] In a fourth aspect, this application also provides a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the face liveness detection method as described in the first aspect.
[0037] This application proposes a face liveness detection method, which includes: acquiring a first image and a second image simultaneously captured by a binocular camera targeting a target; extracting facial feature points from the first image and the second image to obtain at least one first initial facial feature point and at least one second initial facial feature point; determining a horizontal correction line between a first horizontal line where the first initial facial feature point is located and a second horizontal line where the second initial facial feature point is located, for each feature point category, in the image coordinate system of the binocular camera; determining corrected first facial feature points located on the horizontal correction line from the first image for each feature point category to obtain a first face semantic map including all corrected first facial feature points; determining corrected second facial feature points located on the horizontal correction line from the second image to obtain a second face semantic map including all corrected second facial feature points; and obtaining a face liveness detection result based on the disparity information of each feature point group between the first face semantic map and the second face semantic map; wherein the feature point group includes corrected first facial feature points and corrected second facial feature points belonging to the same feature point category.
[0038] Therefore, compared to the related technologies that assume the vertical coordinates of the left and right images are equal, the embodiments of this application, based on the invariance of the face structure and using facial feature points identified by facial semantic recognition as the matching benchmark for the left and right images, utilize the baseline distance constraint method. That is, they determine the corrected first facial feature point and the corrected second facial feature point on the horizontal correction line between the first horizontal line and the second horizontal line, so as to accurately locate the identified facial feature points, thereby improving the positioning accuracy of the facial feature point coordinates and thus improving the disparity calculation accuracy.
Attached Figure Description
[0039] Figure 1 is a schematic diagram of the face liveness detection device of this application;
[0040] Figure 2 is a flowchart illustrating the first embodiment of the face liveness detection method of this application;
[0041] Figure 3 is a schematic diagram illustrating the feature point localization and correction of the face liveness detection method of this application;
[0042] Figure 4 is a schematic diagram of the first and second horizontal lines of this application;
[0043] Figure 5 is a schematic diagram illustrating the correction of the nth facial feature point in this application;
[0044] Figure 6 is a flowchart illustrating the second embodiment of the face liveness detection method of this application;
[0045] Figure 7 is a schematic diagram of the branching process in the second embodiment of the face liveness detection method of this application;
[0046] Figure 8 is a schematic diagram of the face liveness detection device of this application.
[0047] The realization of the purpose, functional features and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed Implementation
[0048] It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to limit this application.
[0049] Face anti-spoofing technology can be applied to attendance software, payment software, social media software, etc. However, some unauthorized users may attempt to bypass face liveness detection by using illegitimate means such as photos, videos, or head models of legitimate users.
[0050] In related technologies, liveness detection requires pixel matching of two images captured by a stereo camera. Specifically, feature points are first located in each stereo image. Then, based on the feature points in the first image, the SAD (Sum of Absolute Differences) method is used to search for matching feature points around the corresponding feature points in the second image. This determines the matching feature points in the two images. Finally, disparity is calculated, and liveness detection is performed. The SAD algorithm is an image matching algorithm commonly used for image patch matching. It sums the absolute values of the differences between corresponding pixel values to evaluate the similarity between two image patches.
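The SAD matching used by the related technology can be sketched as follows. This is a minimal illustration, not the implementation of any particular product: the function names `sad` and `sad_search`, the 5×5 patch size (`half=2`), and the search radius are assumptions, and the reference patch is assumed to lie fully inside the first image.

```python
import numpy as np


def sad(patch_a: np.ndarray, patch_b: np.ndarray) -> float:
    """Sum of absolute differences between two equally sized patches."""
    diff = patch_a.astype(np.int64) - patch_b.astype(np.int64)
    return float(np.abs(diff).sum())


def sad_search(first, second, point, half=2, radius=3):
    """Search `second` around `point` (row, col) for the patch that best
    matches the patch centred on `point` in `first`.

    Returns ((row, col), cost) of the best match in `second`.
    """
    r, c = point
    ref = first[r - half:r + half + 1, c - half:c + half + 1]
    best, best_cost = None, float("inf")
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            rr, cc = r + dr, c + dc
            # Skip candidates whose patch would leave the image.
            if (rr - half < 0 or cc - half < 0
                    or rr + half >= second.shape[0]
                    or cc + half >= second.shape[1]):
                continue
            cand = second[rr - half:rr + half + 1, cc - half:cc + half + 1]
            cost = sad(ref, cand)
            if cost < best_cost:
                best, best_cost = (rr, cc), cost
    return best, best_cost
```

A lower cost means the two patches are more similar; a cost of zero means the pixel values are identical.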
[0051] However, in the field of binocular stereo vision, the constraints required for calculating disparity include epipolar constraints. This means that for a point in the left image, its corresponding matching point in the right image must lie on the same straight line, called the epipolar line, which is parallel to the baseline of the binocular camera. However, due to manufacturing and installation errors in the binocular camera, the projection centers of the binocular cameras in the image coordinate system are not necessarily located on the same straight line. This leads to positioning errors between matching feature points during disparity calculation. Related technologies ignore this difference and force the ordinates of the two images to be equal, assuming they lie on the same epipolar line, inevitably resulting in inaccurate matching and large disparity calculation errors.
[0052] To address this, this application provides a solution that, based on the invariance of facial structure and using facial feature points identified through facial semantic recognition as the matching benchmark for left and right images, utilizes a baseline distance constraint method. Specifically, it determines the corrected first and second facial feature points from the horizontal correction line between the first and second horizontal lines to accurately locate the identified facial feature points, thereby improving the positioning accuracy of the facial feature point coordinates and thus enhancing the accuracy of disparity calculation.
[0053] The inventive concept of this application is further illustrated below with reference to some specific embodiments and implementation methods.
[0054] The following explains some technical terms used in the embodiments of this application:
[0055] Parallax: When the same target is imaged by two cameras at a certain distance, it appears as a difference in offset on the pixel plane. The closer the target is to the camera, the greater the offset, and vice versa. Therefore, distance information of each target can be extracted based on parallax.
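The inverse relationship between distance and parallax can be written out for the idealized case. The formula below is the standard relation for a rectified stereo pair with parallel optical axes, not a formula taken from this application; the symbols are assumptions introduced here: $x_L$ and $x_R$ are the horizontal pixel coordinates of the same point in the left and right images, $f$ is the focal length in pixels, $B$ is the baseline length, and $Z$ is the distance of the point from the cameras.

```latex
d = x_L - x_R, \qquad Z = \frac{f \, B}{d}
```

A flat photo or screen yields nearly constant $d$ across all facial feature points, whereas a real face yields a larger $d$ at the nose tip than at the facial contour.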
[0056] Referring to Figure 1, Figure 1 is a schematic structural diagram of the face liveness detection device in the hardware operating environment involved in the embodiments of this application.
[0057] As shown in Figure 1, the face liveness detection device may include: a processor 1001, such as a CPU, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable communication between these components. The user interface 1003 may be a display screen, an input unit such as a keyboard, etc. The memory 1005 may be high-speed RAM or stable non-volatile memory, such as disk storage. Optionally, the memory 1005 may be a storage device independent of the aforementioned processor 1001.
[0058] Understandably, the face liveness detection device may also include a network interface 1004, which may optionally include a standard wired interface or a wireless interface (such as a Wi-Fi interface). Optionally, the face liveness detection device may also include RF (Radio Frequency) circuitry, sensors, audio circuitry, a Wi-Fi module, etc.
[0059] Those skilled in the art will understand that the structure of the face liveness detection device shown in Figure 1 does not constitute a limitation on the face liveness detection device, which may include more or fewer components than shown, combine certain components, or have a different component arrangement.
[0060] Based on the above structure, a first embodiment of the face liveness detection method of this application is proposed. Referring to Figure 2, Figure 2 is a flowchart illustrating the first embodiment of the face liveness detection method of this application.
[0061] It should be noted that although the logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than that shown here.
[0062] In this embodiment, the face liveness detection method includes:
[0063] Step S100: Acquire a first image and a second image of the target to be detected, captured simultaneously by the binocular camera.
[0064] Specifically, the execution subject in this embodiment is a face liveness detection device. This face liveness detection device can be a user equipment (UE) such as a smart mobile terminal, laptop computer, personal digital assistant (PDA), or tablet computer (PAD), a server, or a virtual server integrated in the cloud.
[0065] The face liveness detection device can read a pre-stored first and second image from a database. Alternatively, the face liveness detection device can also establish a connection with a client to obtain the first and second images uploaded by the client in real time. Understandably, the client can be a smart mobile terminal, tablet, access control system, or other device equipped with a binocular camera to provide authentication functionality.
[0066] The first image and the second image are both images captured simultaneously by a stereo camera targeting the object to be detected. The first image is one of two images captured by the stereo camera at the same time, and the second image is the other of the two images. A stereo camera is a camera system consisting of two parallel cameras. By calculating the parallax between the two cameras, it can obtain the depth information of objects in a scene, thereby achieving 3D reconstruction and depth perception. Of course, in this embodiment, the first image and the second image can be images from the same source, such as both being RGB images. Alternatively, they can be images from different sources, such as the first image being an IR image and the second image being an RGB image.
[0067] The target to be detected can be a real person, a portrait shown in a video or image, or even a head model. In this embodiment, it is necessary to identify whether the target to be detected is a real person, a face shown in a video or image, or a head model.
[0068] Step S200: Extract facial feature points from the first image and the second image to obtain at least one first initial facial feature point and at least one second initial facial feature point.
[0069] Specifically, after obtaining the first and second images, neural networks such as facial semantic recognition models can be used to annotate facial feature points in both images. Facial feature points are key feature points pre-defined according to facial physiological characteristics, such as the corners of the eyes, the tip of the nose, the corners of the mouth, and the facial contours, automatically located based on the input facial data. When annotating facial feature points in the first and second images, a 68-point annotation scheme can be used for extraction. Alternatively, to improve detection accuracy, the more facial feature points annotated, the better; for example, in one instance, more than 1000 facial feature points were annotated.
[0070] It is worth mentioning that the same facial feature point recognition strategy can be used when annotating facial feature points. For example, if the 68-point annotation scheme is used to annotate facial feature points in the first image, the same 68-point annotation scheme can be used to annotate facial feature points in the second image at the same time, so that the first facial feature points on the first image and the second facial feature points on the second image correspond one-to-one with each other.
[0071] Furthermore, the first and second images may include both facial and non-facial regions. For example, when a user authenticates their identity using a smart mobile terminal, the terminal captures the user's portrait, which includes not only the facial region but also non-facial regions such as the neck and background. In this case, the face liveness detection device first performs face detection on the first and second images to extract their respective facial regions, and then performs the aforementioned facial feature point annotation steps on each of these facial regions.
[0072] After marking facial feature points on the first and second images respectively, each facial feature point can be extracted to generate the corresponding first and second initial facial semantic maps. It is understandable that each feature point in the first and second initial facial semantic maps has feature point category and coordinate information. For example, in one example, for the facial feature point at the left corner of the mouth, its feature point category is "left corner of the mouth," and its coordinate information is its coordinate information in the image coordinate system of the stereo camera.
[0073] Understandably, the first and second initial facial semantic maps are sparser images compared to the original first and second images, thus saving processor computing power. Furthermore, the extracted first and second initial facial features are all original facial feature points obtained by the neural network.
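An initial face semantic map of this kind can be sketched as a mapping from feature point category to coordinates. This is a minimal sketch under assumptions: the type aliases, the function name `feature_point_groups`, and the category strings are hypothetical, and coordinates are taken to be (x, y) in the shared image coordinate system.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]       # (x, y) in the shared image coordinate system
SemanticMap = Dict[str, Point]    # feature point category -> coordinates


def feature_point_groups(first_map: SemanticMap,
                         second_map: SemanticMap) -> Dict[str, Tuple[Point, Point]]:
    """Pair the feature points of both maps that share a category."""
    shared = sorted(set(first_map) & set(second_map))
    return {cat: (first_map[cat], second_map[cat]) for cat in shared}


first_map = {"left_mouth_corner": (120.0, 200.0), "nose_tip": (150.0, 160.0)}
second_map = {"left_mouth_corner": (100.0, 203.0), "nose_tip": (128.0, 162.0)}
groups = feature_point_groups(first_map, second_map)
print(groups["left_mouth_corner"])
```

Because only a few entries per image are stored, later steps operate on far less data than the full images.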
[0074] Step S300: In the image coordinate system of the binocular camera, for each feature point category, determine the horizontal correction line located between the first horizontal line where the first initial facial feature point is located and the second horizontal line where the second initial facial feature point is located.
[0075] After obtaining the first and second initial face semantic maps, the first and second initial face semantic maps can be generated in the same image coordinate system based on the camera intrinsic parameters of each lens in the stereo camera and the interrelationships between the cameras. It can be understood that in this image coordinate system, the horizontal axis is the baseline distance direction, while the vertical axis is the direction perpendicular to the baseline distance.
[0076] Referring to Figure 3, feature point localization and correction are then performed for the feature point group corresponding to each feature point category. That is, for the first and second initial facial feature points corresponding to the nth feature point category, a first horizontal line extending along the baseline distance direction and passing through the first initial facial feature point is generated, and a second horizontal line extending along the baseline distance direction and passing through the second initial facial feature point is generated. It can be understood that, generally speaking, the first and second horizontal lines are spaced apart from each other in the vertical axis direction.
[0077] Referring to Figure 4, Area_n1 is the region where the first initial facial feature point belonging to the nth facial feature point category is located, and Area_n2 is the region where the second initial facial feature point belonging to the nth facial feature point category is located. For the nth facial feature point category, the first horizontal line and the second horizontal line together define a strip-shaped region Area_l, and a horizontal correction line can be determined within the strip-shaped region Area_l.
[0078] Understandably, referring to Figure 6, as an option in this embodiment, the horizontal correction line can be the line parallel to and midway between the first and second horizontal lines. That is, the horizontal correction line is equidistant from the first and second horizontal lines.
[0079] Alternatively, as another option in this embodiment, the specific position of the horizontal correction line can also be determined based on the relative relationship between the cameras. For example, the distance between the horizontal correction line and the first horizontal line or the second horizontal line can be determined based on the deviation of the projection centers of the left and right cameras, thereby determining the horizontal correction line.
[0080] Of course, in addition to the above methods, other methods can be used to determine the horizontal correction line between the first and second horizontal lines, and this embodiment does not limit this.
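The options above for placing the horizontal correction line can be sketched with a single helper. This is an illustration only: the function name and the `weight` parameter are assumptions. A weight of 0.5 gives the equidistant line; any other weight in [0, 1] stands in for an offset derived from the deviation of the projection centers, whose exact computation the text does not specify.

```python
def horizontal_correction_line(y_first: float, y_second: float,
                               weight: float = 0.5) -> float:
    """Return the vertical coordinate of the horizontal correction line.

    y_first / y_second are the vertical coordinates of the first and second
    horizontal lines in the shared image coordinate system.
    weight = 0.5 places the line midway between them; 0.0 and 1.0 place it
    on the first or second horizontal line respectively.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1] to stay inside the strip")
    return (1.0 - weight) * y_first + weight * y_second


print(horizontal_correction_line(100.0, 104.0))        # midline
print(horizontal_correction_line(100.0, 104.0, 0.25))  # closer to the first line
```

Restricting the weight to [0, 1] keeps the line inside the strip-shaped region bounded by the two horizontal lines.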
[0081] Step S400: For each feature point category, determine the corrected first facial feature points located on the horizontal correction line from the first image to obtain a first face semantic map including all corrected first facial feature points; determine the corrected second facial feature points located on the horizontal correction line from the second image to obtain a second face semantic map including all corrected second facial feature points.
[0082] After generating a horizontal correction line in the image coordinate system, facial feature points that can be identified as belonging to a certain feature point category are re-determined from the area through which the horizontal correction line passes in the first image. These are the corrected first facial feature points. Similarly, facial feature points that can be identified as belonging to a certain feature point category can be re-determined from the area through which the horizontal correction line passes in the second image. These are the corrected second facial feature points.
[0083] As an option in this embodiment, facial feature points can be re-annotated, and when re-annotating facial feature points, facial feature points of a certain feature point category can be located on a specific horizontal correction line as a constraint condition, thereby obtaining each corrected first facial feature point and each corrected second facial feature point.
[0084] Alternatively, as another option in this embodiment, step S400 specifically includes:
[0085] Step S410: For each feature point category, determine multiple first pixels through which the horizontal correction line passes from the first image, and determine multiple second pixels through which the horizontal correction line passes from the second image.
[0086] That is, after generating the horizontal correction line in the image coordinate system, the equation of the horizontal correction line and the camera intrinsic parameters of the left and right cameras can be used to determine the multiple first pixels that the horizontal correction line passes through in the first image, i.e., the pixels located on the horizontal correction line, and the multiple second pixels that the horizontal correction line passes through in the second image, i.e., the pixels located on the horizontal correction line.
[0087] Furthermore, in one specific embodiment, when the face liveness detection device performs step S400, it may extract the first pixel region where each first initial facial feature point is located from the first image to obtain a first face semantic sparse image; extract the second pixel region where each second initial facial feature point is located from the second image to obtain a second face semantic sparse image; for each feature point category, determine multiple first pixels through which the horizontal correction line passes from the first pixel region, and determine multiple second pixels through which the horizontal correction line passes from the second pixel region. Wherein, for each feature point category, the first pixel region and the second pixel region are of the same size.
[0088] Specifically, when the neural network annotates facial feature points in the first and second images, it needs to calculate the probability of each pixel being a facial feature point of a certain category, and annotate the pixel corresponding to the highest probability value as a facial feature point of that category. Understandably, for a specific feature point category, in the first pixel region where the first initial facial feature point is ultimately identified (e.g., a 5×5 pixel region), the first initial facial feature point is located in the center of the first pixel region, and its first probability value for being identified as the first initial facial feature point is greater than the first probability values for the other 24 pixels. Similarly, for a specific feature point category, in the second pixel region where the second initial facial feature point is ultimately identified (e.g., a 5×5 pixel region), the second initial facial feature point is located in the center of the second pixel region, and its second probability value for being identified as the second initial facial feature point is greater than the second probability values for the other 24 pixels.
[0089] When determining the first and second pixels corresponding to a certain feature point category, it is not necessary to traverse all pixels of the first image and the second image; it suffices to traverse the first pixel region where the first initial facial feature point is located and the second pixel region where the second initial facial feature point is located.
[0090] In other words, for all feature point categories, a first face semantic sparse image can be obtained from the first image, consisting of the first pixel regions where each first initial facial feature point is located. Similarly, a second face semantic sparse image can be obtained from the second image, consisting of the second pixel regions where each second initial facial feature point is located. When determining the first and second pixels corresponding to any feature point category, this can be done solely from the first and second face semantic sparse images, thus saving computational resources.
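As an illustration of restricting the search to the pixel region, the following is a minimal sketch rather than the implementation of this application. It assumes the per-category probabilities are available as a 2D list, that the region is a square window centered on the initial facial feature point, and that the row of the horizontal correction line is already known. The function name `pixels_on_correction_line` and the sample values are hypothetical.

```python
def pixels_on_correction_line(prob_map, center, line_row, size=5):
    """Return (row, col, probability) for each pixel of the size x size region
    around `center` that lies on the horizontal correction line `line_row`.

    prob_map is a 2D list of per-pixel probabilities for one feature point
    category; center is the (row, col) of the initial facial feature point.
    """
    height, width = len(prob_map), len(prob_map[0])
    half = size // 2
    # Clip the window so that it never leaves the image.
    top, bottom = max(center[0] - half, 0), min(center[0] + half, height - 1)
    left, right = max(center[1] - half, 0), min(center[1] + half, width - 1)
    if not top <= line_row <= bottom:
        return []  # the correction line does not cross this region
    return [(line_row, col, prob_map[line_row][col])
            for col in range(left, right + 1)]


# Hypothetical 10 x 10 probability map whose peak (the initial point) is at (4, 4).
prob_map = [[0.0] * 10 for _ in range(10)]
prob_map[4][4] = 0.9
prob_map[3][3] = 0.5
first_pixels = pixels_on_correction_line(prob_map, center=(4, 4), line_row=3)
```

Only the pixels of the 5×5 window that sit on the correction line are visited, so the cost per category depends on the window size and not on the image size.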
[0091] Step S420: For each feature point category, determine the corrected first facial feature point from multiple first pixels and determine the corrected second facial feature point from multiple second pixels.
[0092] After determining the first and second pixels located on the horizontal correction line, the corrected first facial feature points and the corrected second facial feature points can be obtained from them.
[0093] In one example, when the face liveness detection device performs step S420, it may select, from among all the first pixels and second pixels involved, the first pixel and the second pixel whose first probability value and second probability value are closest to each other (i.e., have the smallest difference) as the corrected first facial feature point and the corrected second facial feature point, respectively.
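The closest-probability selection can be sketched as follows. This is an illustrative sketch only: it assumes each candidate pixel is represented as a `(row, col, probability)` tuple, and the function name `select_closest_pair` and the sample values are hypothetical.

```python
from itertools import product


def select_closest_pair(first_pixels, second_pixels):
    """Pick the (first pixel, second pixel) pair whose probability values
    differ the least. Each pixel is a (row, col, probability) tuple."""
    return min(
        product(first_pixels, second_pixels),
        key=lambda pair: abs(pair[0][2] - pair[1][2]),
    )


# Hypothetical candidates lying on the same horizontal correction line (row 3).
first_pixels = [(3, 2, 0.10), (3, 3, 0.60), (3, 4, 0.30)]
second_pixels = [(3, 7, 0.90), (3, 8, 0.58), (3, 9, 0.20)]
corrected_first, corrected_second = select_closest_pair(first_pixels, second_pixels)
```

When several pairs share the same smallest difference, this sketch keeps the first one encountered; the text does not specify a tie-breaking rule.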
[0094] Alternatively, in another example, when performing step S420, the face liveness detection device may also, for each feature point category, take the first pixel corresponding to the largest first probability value among multiple first pixels as the corrected first facial feature point, and take the second pixel corresponding to the largest second probability value among multiple second pixels as the corrected second facial feature point.
[0095] Please refer to Figure 5. The parallel line between the first horizontal line Line_1 and the second horizontal line Line_2 is used as the horizontal correction line Line_check. For the first initial facial feature point P1, after facial feature point correction, the corrected first facial feature point P2 is located in the row above it, namely at the second pixel from the left among the 5 pixels on the horizontal correction line Line_check.
[0096] It is easy to see that the corrected second facial feature points and the corrected first facial feature points determined by this method are closer to the semantic recognition results, thus having higher semantic recognition accuracy. Moreover, since the two are located on the same epipolar line, that is, on the same horizontal line, they also have higher positioning accuracy.
[0097] In one example, for the nth feature point category, the first probability values of the 5 pixels that the horizontal correction line passes through in sequence are p1, p2, p3, p4 and p5, where pmax = p1. Therefore, the first of these 5 pixels can be used as the corrected first facial feature point for category n.
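The maximum-probability selection in this example can be sketched as below. The numeric values standing in for p1 to p5 are hypothetical, chosen only so that p1 is the largest, and the `(row, col, probability)` tuple layout is likewise an assumption.

```python
def select_max_probability(pixels):
    """Return the (row, col, probability) pixel with the largest probability."""
    return max(pixels, key=lambda pixel: pixel[2])


# Five pixels crossed by the horizontal correction line, in order (p1..p5).
line_pixels = [
    (3, 2, 0.42),  # p1, the largest value in this illustration
    (3, 3, 0.31),  # p2
    (3, 4, 0.15),  # p3
    (3, 5, 0.08),  # p4
    (3, 6, 0.04),  # p5
]
corrected_point = select_max_probability(line_pixels)
```

The same function can be applied to the second pixels to obtain the corrected second facial feature point.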
[0098] Step S430: Based on all corrected first facial feature points, obtain a first face semantic map; based on all corrected second facial feature points, obtain a second face semantic map.
[0099] After obtaining each corrected first facial feature point and each corrected second facial feature point, the first initial face semantic map and the second initial face semantic map can be updated respectively, thereby obtaining the first face semantic map and the second face semantic map.
[0100] It is worth mentioning that the first horizontal line and the second horizontal line may be collinear. In this case, no correction is required, and the first initial facial feature point and the second initial facial feature point can be directly used as the corrected first facial feature point and the corrected second facial feature point, respectively.
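The collinear shortcut can be sketched as follows. This sketch assumes the horizontal correction line is taken as the midline between the two horizontal lines, rounded down to a whole pixel row; the text only requires a parallel line lying between them, so the midline is one illustrative choice and not necessarily the one used in this application. The function name `correction_row` is hypothetical.

```python
def correction_row(first_row, second_row):
    """Return the pixel row of the horizontal correction line, or None when
    the two horizontal lines are collinear and no correction is required."""
    if first_row == second_row:
        return None
    # Assumption: use the midline between the two rows, rounded down.
    return (first_row + second_row) // 2


rows = [correction_row(10, 10), correction_row(10, 14), correction_row(11, 10)]
```

A return value of `None` signals that the initial facial feature points can be used directly as the corrected ones.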
[0101] Step S500: Based on the disparity information of each feature point group between the first face semantic map and the second face semantic map, obtain the face liveness detection result.
[0102] The feature point group includes the first facial feature point and the second facial feature point, which belong to the same feature point category.
[0103] After obtaining the first and second face semantic maps, facial feature points belonging to the same feature point category are grouped into a single feature point group, thus obtaining multiple one-to-one pairs of facial feature points. Since the feature points in both images correspond one-to-one to form facial feature point pairs (e.g., the left corner of the mouth feature point in the first face semantic map corresponds to the left corner of the mouth feature point in the second face semantic map), pixel matching between the first and second face semantic maps is unnecessary. Disparity calculation can be directly performed on two facial feature points of the same feature point category to obtain a sparse disparity map. Then, based on the obtained sparse disparity map, the face liveness detection result is calculated.
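A minimal sketch of building the sparse disparity map from paired feature points is shown below. It assumes the first image is the left view, that each semantic map is a dictionary from a feature point category to a `(row, col)` coordinate, and that disparity is the difference in horizontal coordinates, as in the usual stereo convention. The category names and coordinates are hypothetical.

```python
def sparse_disparity(first_points, second_points):
    """Compute one disparity value per feature point category.

    Both arguments map a category name to the (row, col) of the corrected
    facial feature point. No pixel matching is needed because points of the
    same category already form a pair.
    """
    disparities = {}
    for category, (row1, col1) in first_points.items():
        if category not in second_points:
            continue  # category missing in the other view, no pair to form
        row2, col2 = second_points[category]
        if row1 != row2:
            raise ValueError(f"{category}: points are not on the same row")
        disparities[category] = col1 - col2
    return disparities


first_map = {"nose_tip": (50, 120), "mouth_left": (80, 100)}
second_map = {"nose_tip": (50, 100), "mouth_left": (80, 84)}
disparity_map = sparse_disparity(first_map, second_map)
```

The row check reflects the fact that corrected points of one category lie on the same horizontal correction line.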
[0104] It is easy to understand that in related technologies, the lack of facial texture in the two images captured by a binocular camera leads to poor pixel matching. Furthermore, due to differences in light source position and imaging characteristics, the local features of the two images often differ significantly, making pixel matching impossible. However, in this embodiment, the facial feature points obtained through semantic recognition from the first and second images (left and right images) are directly used as the basis for calculating disparity between them. Pixel matching is unnecessary, which avoids the problem that significant differences in local pixel features, caused by differences in imaging characteristics between the two images, prevent pixel matching. This also reduces the computational load.
[0105] Furthermore, it is easy to see that, compared to the related technologies that assume that the ordinates of the two matching points in the left and right images are the same, in this embodiment, based on the invariance of the face structure and using the facial feature points identified by facial semantic recognition as the matching benchmark for the left and right images, the baseline distance constraint method is used to determine the corrected first facial feature point and the corrected second facial feature point from the horizontal correction line between the first and second horizontal lines, so as to accurately locate the identified facial feature points, improve the positioning accuracy of the facial feature point coordinates, and thus improve the accuracy of disparity calculation.
[0106] Furthermore, based on the above embodiments, a second embodiment of the face liveness detection method of this application is proposed. See Figure 6, which is a flowchart illustrating the second embodiment of the face liveness detection method of this application.
[0107] It should be noted that although the logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than that shown here.
[0108] In this embodiment, step S500 specifically includes:
[0109] Step S510: Perform face liveness detection on the disparity information of the facial main feature group set, i.e., the feature groups among all feature groups that correspond to the main body of the face, to obtain the subject disparity classification result.
[0110] Step S520: Perform face liveness detection on the disparity information of the local facial feature group set, i.e., the feature groups among all feature groups that correspond to a single organ, to obtain the local disparity classification result.
[0111] Step S530: Based on the subject disparity classification results and the local disparity classification results, obtain the face liveness detection results.
[0112] Specifically, please refer to Figure 7. In this embodiment, after obtaining the first and second face semantic maps, the disparity information of each feature group is calculated, thereby generating a sparse disparity map. When using the sparse disparity map for liveness detection, two branches are used: one branch is for main body disparity classification, and the other branch is for local disparity classification.
[0113] The "main body" refers to all the major features of the face, including but not limited to the set of facial feature points such as the tip of the nose, the center point of the left eye, the center point of the right eye, the center point of the forehead, the center point of the chin, the center point of the left cheek, and the center point of the right cheek. The set of multiple feature groups corresponding to the aforementioned facial feature point categories, i.e., the set of facial main feature groups, is input into a pre-trained face liveness detection model for detection, thereby obtaining the subject disparity classification result.
[0114] It is easy to see that main body disparity classification works from a global or near-global perspective, making it better suited to distinguishing real people from planar attacks such as photos.
[0115] In this context, "local" refers to one or more facial feature points corresponding to a single organ in the face. For example, when the organ in question is the lips, the set of all feature groups corresponding to all feature point categories belonging to the lips is taken as the local facial feature group set. This local facial feature group set is then input into a pre-trained face liveness detection model for detection, thereby obtaining the local disparity classification result.
[0116] It is easy to see that the local facial feature group set used for local disparity classification includes more detailed depth information of the local organ, which helps distinguish real people from stereoscopic attacks such as low-precision head models.
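The split of the sparse disparity map into the two branch inputs can be sketched as follows. This is an illustration under assumptions: the disparity map is a dictionary keyed by feature point category, and the category names in `MAIN_BODY_CATEGORIES` and `LIP_CATEGORIES` are hypothetical stand-ins for whatever labels the semantic recognition network produces.

```python
# Hypothetical category names for the main body of the face.
MAIN_BODY_CATEGORIES = {
    "nose_tip", "left_eye_center", "right_eye_center", "forehead_center",
    "chin_center", "left_cheek_center", "right_cheek_center",
}
# Hypothetical category names belonging to a single organ (the lips).
LIP_CATEGORIES = {"mouth_left", "mouth_right", "upper_lip", "lower_lip"}


def split_feature_groups(disparities, main_categories, local_categories):
    """Split a category -> disparity map into the main body set and the
    local (single organ) set used by the two classification branches."""
    main_set = {c: d for c, d in disparities.items() if c in main_categories}
    local_set = {c: d for c, d in disparities.items() if c in local_categories}
    return main_set, local_set


disparities = {"nose_tip": 20, "chin_center": 17, "mouth_left": 16, "upper_lip": 18}
main_set, local_set = split_feature_groups(
    disparities, MAIN_BODY_CATEGORIES, LIP_CATEGORIES
)
```

Each resulting set would then be passed to its own classification branch; the classifiers themselves are outside the scope of this sketch.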
[0117] Of course, it is worth mentioning that the face liveness detection model is not limited to the aforementioned face liveness detection model.
[0118] After obtaining the local disparity classification results and the main body disparity classification results, the final face liveness detection result can be obtained by combining the two results.
[0119] As a specific implementation method, when performing step S530, the face liveness detection model can obtain the face liveness detection result based on the subject disparity classification result, the first weight of the subject disparity classification result, the local disparity classification result, and the second weight of the local disparity classification result.
[0120] Specifically, both the subject disparity classification result and the local disparity classification result can be the probability value (or confidence value) of liveness. In this case, the sum of the product of the subject disparity classification result and the first weight and the product of the local disparity classification result and the second weight is taken as the face liveness detection result.
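The weighted combination described here can be sketched as below. The function name `fuse_liveness` and the numeric values are hypothetical; the text does not state whether the weights are normalized, so this sketch does not enforce that.

```python
def fuse_liveness(subject_result, local_result, first_weight, second_weight):
    """Weighted sum of the two branch outputs, each a liveness probability."""
    return subject_result * first_weight + local_result * second_weight


# Equal weights for one scenario.
balanced = fuse_liveness(0.9, 0.6, first_weight=0.5, second_weight=0.5)
# A larger first weight emphasizes the subject disparity classification result.
subject_heavy = fuse_liveness(0.9, 0.6, first_weight=0.8, second_weight=0.2)
```

Because only the two weights change between scenarios, the branch classifiers do not need to be retrained to shift the emphasis.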
[0121] In different usage scenarios, the subject disparity classification results and the local disparity classification results have different confidence levels, that is, their weights are different. Therefore, the first weight of the subject disparity classification results and the second weight of the local disparity classification results can be adjusted according to the usage scenario, so that the final face liveness detection results can better adapt to the usage scenario.
[0122] For different populations in different regions, adaptation can be achieved by adjusting the first weight of the subject disparity classification result and the second weight of the local disparity classification result, without having to retrain the face liveness detection model, thereby improving scene adaptability.
[0123] Alternatively, for populations with greater differences in facial contours, the value of the first weight can be increased to enhance the weight of the subject disparity classification result, thereby improving the final face liveness detection quality.
[0124] Furthermore, in this embodiment, performing disparity classification through separate main body disparity classification branches and local disparity classification branches helps enhance the interpretability of the overall face liveness detection model. Understandably, improved interpretability leads to stronger robustness and prevents overfitting.
[0125] See Figure 8. Based on the same inventive concept, in a second aspect, embodiments of this application also provide a face liveness detection device, comprising:
[0126] The image acquisition module is used to acquire the first and second images captured by the binocular camera at the same time for the target to be detected;
[0127] The key point extraction module is used to extract facial feature points from the first image and the second image to obtain at least one first initial facial feature point and at least one second initial facial feature point.
[0128] The correction line generation module is used to determine, in the image coordinate system of the binocular camera, a horizontal correction line between the first horizontal line where the first initial facial feature point is located and the second horizontal line where the second initial facial feature point is located, for each feature point category.
[0129] The feature point correction module is used to determine the corrected first facial feature points located on the horizontal correction line from the first image for each feature point category, so as to obtain a first face semantic map including all corrected first facial feature points, and to determine the corrected second facial feature points located on the horizontal correction line from the second image, so as to obtain a second face semantic map including all corrected second facial feature points.
[0130] The liveness detection module is used to obtain face liveness detection results based on the disparity information of each feature point group between the first face semantic map and the second face semantic map; wherein, the feature point group includes the corrected first facial feature point and the corrected second facial feature point belonging to the same feature point category.
[0131] It should be noted that the various implementation methods of the face liveness detection device in this embodiment and the technical effects they achieve can be referred to the various implementation methods of the face liveness detection method in the foregoing embodiments, and will not be repeated here.
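The way the five modules hand data to one another can be sketched as a simple pipeline. This is only a structural illustration: the class name `LivenessPipeline`, the callable signatures, and the stub values are all hypothetical, and each callable stands in for the corresponding module without implementing it.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class LivenessPipeline:
    acquire_images: Callable[[], tuple]            # image acquisition module
    extract_keypoints: Callable[[Any, Any], tuple] # key point extraction module
    make_correction_lines: Callable[[Any, Any], Any]  # correction line generation module
    correct_points: Callable[[Any, Any, Any], tuple]  # feature point correction module
    detect_liveness: Callable[[Any, Any], Any]     # liveness detection module

    def run(self):
        first_image, second_image = self.acquire_images()
        first_init, second_init = self.extract_keypoints(first_image, second_image)
        lines = self.make_correction_lines(first_init, second_init)
        first_map, second_map = self.correct_points(first_image, second_image, lines)
        return self.detect_liveness(first_map, second_map)


# Stub callables that only show the data flow between modules.
pipeline = LivenessPipeline(
    acquire_images=lambda: ("img1", "img2"),
    extract_keypoints=lambda a, b: ({"nose_tip": (50, 120)}, {"nose_tip": (52, 100)}),
    make_correction_lines=lambda p, q: {"nose_tip": 51},
    correct_points=lambda a, b, lines: ({"nose_tip": (51, 120)}, {"nose_tip": (51, 100)}),
    detect_liveness=lambda m1, m2: m1["nose_tip"][1] - m2["nose_tip"][1],
)
result = pipeline.run()
```

In this stub the last stage simply returns a disparity value; a real liveness detection module would classify the sparse disparity map instead.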
[0132] Furthermore, embodiments of this application also propose a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the steps of the face liveness detection method described above. Therefore, it will not be repeated here. Additionally, the beneficial effects of using the same method will not be repeated here either. For technical details not disclosed in the embodiments of the computer-readable storage medium involved in this application, please refer to the description of the method embodiments of this application. As an example, program instructions can be deployed to execute on a single computing device, or on multiple computing devices located in one location, or on multiple computing devices distributed across multiple locations and interconnected via a communication network.
[0133] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc.
[0134] It should also be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Furthermore, in the accompanying drawings of the device embodiments provided in this application, the connection relationships between modules indicate that they have communication connections, which can be specifically implemented as one or more communication buses or signal lines. Those skilled in the art can understand and implement this without any creative effort.
[0135] Through the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus necessary general-purpose hardware, or it can be implemented by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memory, special-purpose components, etc. Generally, any function performed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structure used to implement the same function can also be diverse, such as analog circuits, digital circuits, or special-purpose circuits. However, for this application, software program implementation is more often the preferred implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, mobile hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods of the various embodiments of this application.
[0136] The above are merely preferred embodiments of this application and do not limit the patent scope of this application. Any equivalent structural or procedural transformations made using the content of this application's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of this application.
Claims
1. A method for detecting human face liveness, characterized in that, The method includes: Acquire the first and second images of the target to be detected simultaneously using a binocular camera; Facial feature points are extracted from the first image and the second image to obtain at least one first initial facial feature point and at least one second initial facial feature point; In the image coordinate system of the binocular camera, for each feature point category, a horizontal correction line is determined between the first horizontal line where the first initial facial feature point is located and the second horizontal line where the second initial facial feature point is located; For each of the aforementioned feature point categories, corrected first facial feature points located on the horizontal correction line are determined from the first image to obtain a first face semantic map including all the corrected first facial feature points. Corrected second facial feature points located on the horizontal correction line are determined from the second image to obtain a second face semantic map including all the corrected second facial feature points. Based on the disparity information of each feature point group between the first face semantic map and the second face semantic map, the face liveness detection result is obtained; wherein, the feature point group includes the corrected first facial feature point and the corrected second facial feature point belonging to the same feature point category.
2. The face liveness detection method according to claim 1, characterized in that, For each of the aforementioned feature point categories, determining corrected first facial feature points located on the horizontal correction line from the first image to obtain a first facial semantic map including all the corrected first facial feature points, and determining corrected second facial feature points located on the horizontal correction line from the second image to obtain a second facial semantic map including all the corrected second facial feature points, includes: For each feature point category, a plurality of first pixels through which the horizontal correction line passes are determined from the first image, and a plurality of second pixels through which the horizontal correction line passes are determined from the second image; For each feature point category, a corrected first facial feature point is determined from a plurality of first pixels, and a corrected second facial feature point is determined from a plurality of second pixels; Based on all the corrected first facial feature points, a first face semantic map is obtained, and based on all the corrected second facial feature points, a second face semantic map is obtained.
3. The face liveness detection method according to claim 2, characterized in that, For each feature point category, determining multiple first pixels traversed by the horizontal correction line from the first image and multiple second pixels traversed by the horizontal correction line from the second image includes: Extract the first pixel region where each of the first initial facial feature points is located from the first image to obtain a first face semantic sparse image; Extract the second pixel region where each of the second initial facial feature points is located from the second image to obtain a second face semantic sparse image; and for each feature point category, the size of the first pixel region and the second pixel region is the same. For each of the aforementioned feature point categories, a plurality of first pixels through which the horizontal correction line passes are determined from the first pixel region, and a plurality of second pixels through which the horizontal correction line passes are determined from the second pixel region.
4. The face liveness detection method according to claim 2, characterized in that, For each feature point category, determining a corrected first facial feature point from a plurality of first pixels and a corrected second facial feature point from a plurality of second pixels includes: For each feature point category, the first pixel corresponding to the largest first probability value among multiple first pixels is taken as the corrected first facial feature point, and the second pixel corresponding to the largest second probability value among multiple second pixels is taken as the corrected second facial feature point; wherein, each first pixel has a first probability value of being identified as the first initial facial feature point, and each second pixel has a second probability value of being identified as the second initial facial feature point.
5. The face liveness detection method according to claim 1, characterized in that, Determining the horizontal correction line between the first horizontal line where the first facial feature point is located and the second horizontal line where the second facial feature point is located includes: The horizontal correction line is defined as the parallel line between the first horizontal line and the second horizontal line.
6. The face liveness detection method according to any one of claims 1 to 5, characterized in that, Based on the disparity information of each feature point group between the first face semantic map and the second face semantic map, the face liveness detection result is obtained, including: Perform face liveness detection on the disparity information of the face feature point set corresponding to the main face part in all the feature point sets to obtain the subject disparity classification result; Face liveness detection is performed on the disparity information of the set of local facial feature points corresponding to a single organ in all the feature point sets to obtain local disparity classification results; Based on the subject disparity classification result and the local disparity classification result, the face liveness detection result is obtained.
7. The face liveness detection method according to claim 6, characterized in that, The process of obtaining face liveness detection results based on the subject disparity classification results and the local disparity classification results includes: Based on the subject disparity classification result, the first weight of the subject disparity classification result, the local disparity classification result, and the second weight of the local disparity classification result, the face liveness detection result is obtained.
8. A face liveness detection device, characterized in that, The device includes: The image acquisition module is used to acquire the first and second images captured by the binocular camera at the same time for the target to be detected; The key point extraction module is used to extract facial feature points from the first image and the second image to obtain at least one first initial facial feature point and at least one second initial facial feature point. The correction line generation module is used to determine, for each feature point category, a horizontal correction line located between the first horizontal line where the first initial facial feature point is located and the second horizontal line where the second initial facial feature point is located in the image coordinate system of the binocular camera. The feature point correction module is used to determine, for each of the feature point categories, the corrected first facial feature points located on the horizontal correction line in the first image to obtain a first face semantic map including all the corrected first facial feature points, and to determine, the corrected second facial feature points located on the horizontal correction line in the second image to obtain a second face semantic map including all the corrected second facial feature points. The liveness detection module is used to obtain face liveness detection results based on the disparity information of each feature point group between the first face semantic map and the second face semantic map; wherein, the feature point group includes corrected first facial feature points and corrected second facial feature points belonging to the same feature point category.
9. A face liveness detection device, characterized in that, The device includes: A processor, a memory, and a computer program stored in the memory, the computer program being executed by the processor to implement the face liveness detection method as described in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that, when executed by a processor, implements the face liveness detection method as described in any one of claims 1 to 7.