Identity authentication method and device, storage medium and electronic device
By acquiring and analyzing multiple frames of facial images, and combining recognition models and networks to determine the real face and its action state, the problem of low accuracy of key point localization in existing technologies for identity authentication is solved, achieving higher accuracy and efficiency in identity authentication.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG DAHUA TECH CO LTD
- Filing Date
- 2022-12-27
- Publication Date
- 2026-06-23
AI Technical Summary
Existing identity authentication technologies rely too heavily on the accuracy of key point positioning, resulting in low accuracy and efficiency, and are prone to failure, especially in large pose scenarios.
By acquiring K frames of face images and inputting them into a pre-trained recognition model, the system determines whether each frame is a real face and whether its parts are in the target state. The system then combines an activity judgment network and a part processing network to generate authentication results.
It improves the accuracy and efficiency of identity authentication, effectively handles identity authentication in large pose scenarios, and resists deepfake video attacks.
Description
Technical Field
[0001] The present invention relates to the field of computers, and more specifically, to an identity authentication method, apparatus, storage medium, and electronic device.
Background Technology
[0002] With the development of information technology, identity security authentication technology is being applied in more and more fields, and demand for it is growing in access control systems, online payment, real-name authentication, and similar applications. Current identity security authentication technology relies too heavily on detected key point positions: it judges whether an action has occurred from an index converted from those positions. However, most key points are obtained through camera calibration and conversion and therefore do not generalize. In addition, judging the target action from 3D-model-based key point detection introduces errors and cannot handle large-pose scenes, so authentication may fail when the user shakes or nods their head.
[0003] There is currently no effective solution to the problem that identity authentication in related technologies relies too heavily on the accuracy of key point positioning, resulting in low accuracy and efficiency.
Summary of the Invention
[0004] This invention provides an identity authentication method, apparatus, storage medium, and electronic device to at least solve the problem in related technologies where identity authentication relies too heavily on the accuracy of key point positioning, resulting in low accuracy and efficiency of identity authentication.
[0005] According to an embodiment of the present invention, an identity authentication method is provided, comprising: in response to a playback command associated with a target action, acquiring K frames of face images to be processed, wherein the K frames of face images are face images acquired by an image acquisition device from a face to be identified, the K frames of face images are used to authenticate the face to be identified, the playback command is used to prompt the face to be identified to perform the target action, and K is an integer greater than 1; inputting the K frames of face images into a pre-trained recognition model to obtain K recognition results, wherein each of the K recognition results includes the probability that an input face image frame is a real face and the probability that a target part of the face image frame is in a target state, the target part including different face parts performing different target actions; authenticating the face to be identified according to the K recognition results to obtain a target authentication result, wherein the target authentication result includes whether the face to be identified is a real face and whether the face to be identified has performed the target action.
[0006] According to another embodiment of the present invention, an identity authentication device is provided, comprising:
[0007] The acquisition module is used to acquire K frames of face images to be processed in response to a playback command associated with a target action. The K frames of face images are face images acquired by an image acquisition device from the face to be identified and are used to authenticate the face to be identified. The playback command is used to prompt the face to be identified to perform the target action. K is an integer greater than 1.
[0008] The first processing module is used to input the K frames of face images into a pre-trained recognition model to obtain K recognition results. Each of the K recognition results includes the probability that the input face image is a real face and the probability that the target part of the face image is in a target state. The target part includes different face parts that perform different target actions.
[0009] The second processing module is used to authenticate the face to be identified based on the K recognition results to obtain a target authentication result, wherein the target authentication result includes whether the face to be identified is a real face and whether the face to be identified has performed the target action.
[0010] Optionally, the apparatus is further configured to: perform frame-by-frame detection on the K frames of face images to obtain K groups of local face images, wherein one group of local face images in the K groups represents a set of local face images detected from a frame of face images in the K frames of face images, and each local face image corresponds to a target part; input the K groups of local face images into a pre-trained recognition model to obtain the K recognition results, wherein each of the K recognition results includes whether the input group of local face images is a real face and whether the target part in the group of local face images is in the target state.
[0011] Optionally, the apparatus is further configured to: input the K groups of local face images into a pre-trained recognition model, and obtain the K recognition results in the following manner, wherein each input local face image is regarded as a target local face image, and the obtained recognition result is regarded as a target recognition result: perform a target feature extraction operation on the target local face image to obtain target feature information; input the target feature information into an activity judgment network to obtain an activity judgment result, wherein the activity judgment result indicates whether the target local face image is a real face; input the target feature information into a target part processing network to obtain a target state result, wherein the target state result is used to indicate whether the target part is in the target state in the target local face image; and generate the target recognition result based on the activity judgment result and the target state result.
[0012] Optionally, the device is further configured to: when the target part includes a first part and a second part, perform a first extraction operation and a second extraction operation on the target feature information respectively to obtain first part feature information and second part feature information, wherein the first extraction operation is used to extract the first part feature information associated with the first part, the second extraction operation is used to extract the second part feature information associated with the second part, and the first part and the second part perform different target actions; determine, based on the first part feature information, a first probability that the first part is in a first state in the target local face image, and generate a first state result; determine, based on the second part feature information, a second probability that the second part is in a second state in the target local face image, and generate a second state result; generate the target state result based on the first state result and the second state result, wherein the target state result includes: when the first probability satisfies a first preset condition and the second probability does not satisfy a second preset condition, the target state result indicates that the first part is in the first state and the second part is not in the second state; or
[0013] When the first probability satisfies the first preset condition and the second probability satisfies the second preset condition, the target state result indicates that the first part is in the first state and the second part is in the second state; or
[0014] If the first probability does not meet the first preset condition and the second probability meets the second preset condition, the target state result indicates that the first part is not in the first state and the second part is in the second state; or
[0015] If the first probability does not meet the first preset condition and the second probability does not meet the second preset condition, the target state result indicates that the first part is not in the first state and the second part is not in the second state.
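The four cases above amount to evaluating each probability against its own condition independently. A minimal sketch in Python, assuming that each "preset condition" is a simple threshold comparison (the source does not specify the conditions; the names `PartState` and `combine_part_states` and the default thresholds are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartState:
    """Target state result for two parts (hypothetical container)."""
    first_in_state: bool
    second_in_state: bool


def combine_part_states(first_prob: float, second_prob: float,
                        first_threshold: float = 0.5,
                        second_threshold: float = 0.5) -> PartState:
    """Map the two probabilities to one of the four possible state results."""
    # Assumption: a "preset condition" is satisfied when the probability
    # reaches its threshold. The source leaves the exact condition open.
    return PartState(
        first_in_state=first_prob >= first_threshold,
        second_in_state=second_prob >= second_threshold,
    )
```

Because the two comparisons are independent, the four enumerated outcomes are exactly the four combinations of the two boolean fields.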
[0016] Optionally, the device is further configured to: determine the target authentication result as successful when the Nth identification result indicates that the target part is not in the target state and the Mth identification result indicates that the target part is in the target state, wherein N is less than M, M is less than or equal to K, and N and M are both integers greater than 0; determine the target authentication result as unsuccessful when the Nth identification result indicates that the target part is not in the target state and the (N+1)th to (K)th identification results all indicate that the target part is not in the target state; determine the target authentication result as successful when the Nth identification result indicates that the target part is in the target state and the Mth identification result indicates that the target part is not in the target state; and determine the target authentication result as unsuccessful when the Nth identification result indicates that the target part is in the target state and the (N+1)th to (K)th identification results all indicate that the target part is in the target state.
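The rule above can be read as: authentication succeeds when the state of the target part differs between an earlier frame and a later frame, and fails when the state never changes. A minimal sketch in Python, assuming each recognition result has already been reduced to a boolean "in target state" flag (the function name `action_performed` is hypothetical):

```python
from typing import Sequence


def action_performed(in_target_state: Sequence[bool]) -> bool:
    """Return True if the target part changes state somewhere in the K results."""
    if len(in_target_state) < 2:
        raise ValueError("K must be an integer greater than 1")
    first = in_target_state[0]
    # If any later result differs from the first one, there is a pair
    # N < M whose states differ, in either direction.
    return any(state != first for state in in_target_state[1:])
```

Comparing every later result with the first one is sufficient: if any two results differ, at least one of them differs from the first.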
[0017] Optionally, the device is further configured to: determine that the target authentication result is successful when, among the K recognition results, the number of recognition results indicating that the input frame of face image is a real face meets a preset number threshold, the Nth recognition result indicates that the target part is not in the target state, and the Mth recognition result indicates that the target part is in the target state; determine that the target authentication result is unsuccessful when that number meets the preset number threshold but the Nth and Mth recognition results indicate the same state of the target part (both not in the target state, or both in the target state); and determine that the target authentication result is unsuccessful when that number does not meet the preset number threshold.
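The combined decision above can be sketched as a liveness count check followed by a state change check. This is an illustrative Python sketch, assuming that "meets a preset number threshold" means "is at least the threshold" and that each result is a pair of booleans; the name `authenticate` and the tuple layout are hypothetical:

```python
from typing import Sequence, Tuple


def authenticate(results: Sequence[Tuple[bool, bool]],
                 min_real_frames: int) -> bool:
    """results[i] is (is_real_face, part_in_target_state) for frame i."""
    # Step 1: enough frames must be judged to show a real face.
    real_count = sum(1 for is_real, _ in results if is_real)
    if real_count < min_real_frames:
        return False
    # Step 2: the target part must be seen both in and out of the
    # target state, i.e. its state changes between two frames.
    states = {in_state for _, in_state in results}
    return len(states) > 1
```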
[0018] Optionally, the device is further configured to: determine the target authentication result as successful when the Nth identification result indicates that the target part is not in the target state, the Mth identification result indicates that the target part is in the target state, and the Pth identification result indicates that the target part is not in the target state, wherein P is greater than M and less than or equal to K, and P is an integer greater than 0; and determine the target authentication result as successful when the Nth identification result indicates that the target part is in the target state, the Mth identification result indicates that the target part is not in the target state, and the Pth identification result indicates that the target part is in the target state.
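The three-index rule above describes a full cycle: the target part leaves its initial state and then returns to it (for example closed, open, closed). One way to detect this, sketched in Python under the assumption that each recognition result is already a boolean flag, is to collapse consecutive equal states into runs and require at least three runs (the name `completed_full_cycle` is hypothetical):

```python
from itertools import groupby
from typing import Sequence


def completed_full_cycle(in_target_state: Sequence[bool]) -> bool:
    """Return True if indices N < M < P exist with states A, not A, A."""
    # groupby merges consecutive equal values, so each run marks one
    # stable phase of the action; three runs mean two state changes.
    runs = [state for state, _ in groupby(in_target_state)]
    return len(runs) >= 3
```

The source only states when authentication succeeds under this rule, so the sketch reports whether the pattern was observed and leaves the handling of other sequences to the caller.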
[0019] According to yet another embodiment of the present invention, a computer-readable storage medium is also provided, wherein a computer program is stored therein, wherein the computer program is configured to perform the steps in any of the above method embodiments when executed.
[0020] According to yet another embodiment of the present invention, an electronic device is also provided, including a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.
[0021] This invention responds to a playback command associated with a target action by acquiring K frames of face images to be processed, inputs the K frames into a pre-trained recognition model to obtain K recognition results, and then authenticates the face to be identified based on these K results to obtain the target authentication result. This addresses the problem in related technologies where identity authentication relies too heavily on the accuracy of key point localization, resulting in low accuracy and efficiency, and achieves the technical effect of improving both the efficiency and accuracy of identity authentication.
Attached Figure Description
[0022] Figure 1 is a hardware structure block diagram of a mobile terminal for an identity authentication method according to an embodiment of the present invention;
[0023] Figure 2 is a flowchart of an identity authentication method according to an embodiment of the present invention;
[0024] Figure 3 is a schematic diagram illustrating a specific example of an identity authentication method according to an embodiment of the present invention;
[0025] Figure 4 is a flowchart of a model generation method for an identity authentication method according to an embodiment of the present invention;
[0026] Figure 5 is a schematic diagram illustrating a specific example of an identity authentication method according to an embodiment of the present invention;
[0027] Figure 6 is a schematic diagram illustrating a specific example of an identity authentication method according to an embodiment of the present invention;
[0028] Figure 7 is a structural block diagram of an identity authentication device according to an embodiment of the present invention.
Detailed Implementation
[0029] The embodiments of the present invention will be described in detail below with reference to the accompanying drawings and examples.
[0030] It should be noted that the terms "first," "second," etc., in the specification, claims, and drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
[0031] The method embodiments provided in this application can be executed on a mobile terminal, computer terminal, or similar computing device. Taking running on a mobile terminal as an example, Figure 1 is a hardware structure block diagram of a mobile terminal for an identity authentication method according to an embodiment of the present invention. As shown in Figure 1, the mobile terminal may include one or more processors 102 (only one is shown in Figure 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. The mobile terminal may further include a transmission device 106 for communication functions and an input/output device 108. Those skilled in the art will understand that the structure shown in Figure 1 is for illustrative purposes only and does not limit the structure of the mobile terminal described above. For example, the mobile terminal may include more or fewer components than shown in Figure 1, or have a configuration different from that shown in Figure 1.
[0032] The memory 104 can be used to store computer programs, such as application software programs and modules, like the computer program corresponding to the authentication method in this embodiment of the invention. The processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, thereby implementing the above-described method. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory remotely located relative to the processor 102, and these remote memories can be connected to the mobile terminal via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
[0033] The transmission device 106 is used to receive or send data via a network. Specific examples of the network described above may include a wireless network provided by the mobile terminal's communication provider. In one example, the transmission device 106 includes a Network Interface Controller (NIC), which can connect to other network devices via a base station to communicate with the Internet. In another example, the transmission device 106 may be a Radio Frequency (RF) module, used for wireless communication with the Internet.
[0034] This embodiment provides an identity authentication method. Figure 2 is a flowchart of an identity authentication method according to an embodiment of the present invention. As shown in Figure 2, the process includes the following steps:
[0035] S202, in response to a playback command associated with the target action, acquire K frames of face images to be processed, wherein the K frames of face images are face images acquired by the image acquisition device from the face to be identified, the K frames of face images are used to authenticate the face to be identified, the playback command is used to prompt the face to be identified to perform the target action, and K is an integer greater than 1;
[0036] Optionally, in this embodiment, the target action may include, but is not limited to, blinking, closing the mouth, turning the head to the right, or lip reading, or any combination of these actions. It should be noted that any such combination presupposes that the user is able to complete the combined action, for example blinking and opening the mouth at the same time.
[0037] It should be noted that the above K-frame face images can be acquired continuously in chronological order, or they can be acquired at certain time intervals.
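Both acquisition modes can be expressed with a single sampling step. The following Python sketch is illustrative only: it assumes the captured frames are already available as an in-memory sequence, and the function name `sample_frames` and the `step` parameter are hypothetical rather than taken from the source.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def sample_frames(frames: Sequence[Frame], k: int, step: int = 1) -> List[Frame]:
    """Pick K frames in chronological order, taking every `step`-th frame.

    step == 1 corresponds to continuous acquisition; step > 1 corresponds
    to acquisition at a fixed interval.
    """
    if k <= 1:
        raise ValueError("K must be an integer greater than 1")
    if step < 1:
        raise ValueError("step must be at least 1")
    selected = list(frames[::step][:k])
    if len(selected) < k:
        raise ValueError("not enough frames for the requested K and step")
    return selected
```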
[0038] Optionally, in this embodiment, the playback instructions may include, but are not limited to, prompts in the form of text, animation, voice, video, etc., to instruct the user to make facial expressions or body movements corresponding to the prompts.
[0039] For example, Figure 3 is a schematic diagram of the identity authentication method. As shown in Figure 3, on the client interface 301, the user receives a playback instruction 302: "Start identity authentication, please open your mouth." Here, "open your mouth" is the target action that the acquisition device is to capture.
[0040] Optionally, in this embodiment, the K-frame face images to be processed may include, but are not limited to, K face images captured during the process of the user performing the target action after receiving the playback command. For example, the playback command instructs the user to open their mouth and blink at the same time. During the process of the user opening their mouth and blinking at the same time after receiving the playback command, K consecutive face images are acquired, wherein each frame contains the user's facial features and the current facial action state.
[0041] S204, input the K frames of face images into the pre-trained recognition model to obtain K recognition results, wherein each of the K recognition results includes the probability that the input face image is a real face and the probability that the target part of the face image is in the target state, and the target part includes different face parts performing different target actions;
[0042] Optionally, in this embodiment, the pre-trained recognition model may include, but is not limited to, a pre-trained model that can be used to recognize facial features or ongoing action states.
[0043] Optionally, in this embodiment, the target part may include, but is not limited to, parts of the human face such as the eyes and mouth. The target part may include only the eyes, or it may include multiple parts such as the eyes and mouth. The target state may include, but is not limited to, state changes made when performing a target action according to the playback command, such as opening the mouth, closing the mouth, closing the eyes, and opening the eyes.
[0044] It should be noted that the statement that the target part includes different face parts performing different target actions can be understood as follows. When there are multiple target actions, for example opening the eyes and opening the mouth, the target part includes the eyes performing the eye-opening action and the mouth performing the mouth-opening action. In other words, the target action can be an action completed by a single face part, in which case the target part is the face part performing that action; for example, the target parts for the actions of opening the mouth and closing the eyes are the mouth and the eyes, respectively. Alternatively, the target action can be completed by multiple face parts, in which case the target part is the set of face parts performing it; for example, for blinking with the mouth open, or for closing the eyes and mouth together, the target parts are the eyes and the mouth.
[0045] Optionally, in this embodiment, each of the above K recognition results may indicate, but is not limited to indicating, whether the input is a real face and whether the action state has changed as prompted by the playback command. The K recognition results are obtained by inputting the above K frames of face images into the pre-trained recognition model, and each result can be expressed as probabilities. For example, one result may be: the probability that the input frame is a real face is 70%, and the probability that the target part is in the target state is 80%.
[0046] The above is merely an example, and this application does not impose any specific limitations.
[0047] S206, authenticate the face to be identified based on the K recognition results to obtain the target authentication result, wherein the target authentication result includes whether the face to be identified is a real face and whether the face to be identified has performed the target action.
[0048] Optionally, in this embodiment, the above-mentioned target authentication results may include, but are not limited to: the currently identified face is a real face, the currently identified face is not a real face, the currently identified face has performed the target action, the currently identified face has not performed the target action, the currently identified face is a real face and has performed the target action, the currently identified face is a real face and has not performed the target action, the currently identified face is not a real face and has performed the target action, and the currently identified face is not a real face and has not performed the target action.
[0049] This application's embodiments employ a method that responds to a playback command associated with a target action, acquires K frames of face images to be processed, then inputs each of the K frames into a pre-trained recognition model to obtain K recognition results, and finally authenticates the face to be identified based on these K results to obtain the target authentication result. This method can determine whether a face is real and whether the target action has been completed by recognizing K frames of face images. It solves the problem in related technologies where identity authentication relies too heavily on the accuracy of key point localization, leading to low accuracy and efficiency. This achieves the technical effect of improving identity authentication efficiency and accuracy.
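Steps S202 to S206 can be summarized in a short Python sketch. It is a simplified illustration, not the patented implementation: the recognition model is passed in as an arbitrary callable, probabilities are turned into decisions with an assumed threshold, and the names `RecognitionResult`, `AuthenticationResult`, `authenticate_face`, `min_real_frames`, and `prob_threshold` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class RecognitionResult:
    real_face_prob: float      # probability that the frame shows a real face
    target_state_prob: float   # probability that the target part is in the target state


@dataclass(frozen=True)
class AuthenticationResult:
    is_real_face: bool
    action_performed: bool


def authenticate_face(frames: Sequence[object],
                      model: Callable[[object], RecognitionResult],
                      min_real_frames: int,
                      prob_threshold: float = 0.5) -> AuthenticationResult:
    """Run the K acquired frames (S202) through S204 and S206."""
    if len(frames) < 2:
        raise ValueError("K must be an integer greater than 1")
    # S204: one recognition result per frame.
    results: List[RecognitionResult] = [model(frame) for frame in frames]
    # S206: aggregate the K results. Thresholding is an assumption here.
    real_count = sum(r.real_face_prob >= prob_threshold for r in results)
    states = {r.target_state_prob >= prob_threshold for r in results}
    return AuthenticationResult(
        is_real_face=real_count >= min_real_frames,
        # The action counts as performed if the part is seen both in
        # and out of the target state across the K frames.
        action_performed=len(states) > 1,
    )
```

No key point coordinates appear anywhere in this flow; the decision depends only on the per-frame probabilities, which is the point the paragraph above makes.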
[0050] In an exemplary embodiment, K frames of face images are input into a pre-trained recognition model to obtain K recognition results, including: performing frame-by-frame detection on the K frames of face images to obtain K sets of local face images, wherein one set of local face images in the K sets of local face images represents a set of local face images detected from one frame of face images in the K frames of face images, and each local face image corresponds to a target part; inputting the K sets of local face images into the pre-trained recognition model to obtain K recognition results, wherein each of the K recognition results includes whether the input set of local face images is a real face and whether the target part in the set of local face images is in the target state.
[0051] Optionally, in this embodiment, the aforementioned K sets of partial face images may include, but are not limited to, K sets of images of the eyes, K sets of images of the mouth, K sets of images of the nose, and other facial features, or any combination of multiple parts of the facial features, such as K sets of images of the mouth and nose, K sets of images of the eyes, nose, and mouth, etc.
[0052] It should be noted that the above K sets of local face images can be K sets of face images of the same part, or they can be a collection of K sets of face images of different parts.
[0053] It should be noted that there are many methods to obtain K sets of local face images by performing frame-by-frame detection on K frames of face images. For example, a corresponding face detector can be used to locate, cut out, and combine the face region and local feature images frame by frame into K sets of local face images; another method is to locate the approximate position of the local region by key points and then perform rectangular cropping with the maximum range to obtain local face images; yet another method is to utilize significant gradient features and use template matching to locate local regions and obtain local face images.
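As an illustration of the second method (key point localization followed by rectangular cropping), the Python sketch below crops the axis-aligned rectangle that covers all key points of a part, optionally enlarged by a margin. It interprets "maximum range" as the bounding rectangle of the key points, which is an assumption; the image is modeled as a plain list of pixel rows, and the name `crop_around_key_points` is hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # rows of pixel values, image[y][x]


def crop_around_key_points(image: Image,
                           key_points: Sequence[Tuple[int, int]],
                           margin: int = 0) -> Image:
    """Crop the rectangle spanned by (x, y) key points, clamped to the image."""
    if not key_points:
        raise ValueError("at least one key point is required")
    height, width = len(image), len(image[0])
    xs = [x for x, _ in key_points]
    ys = [y for _, y in key_points]
    # Expand the bounding rectangle by the margin, then clamp it so the
    # crop never reaches outside the image.
    left = max(min(xs) - margin, 0)
    right = min(max(xs) + margin, width - 1)
    top = max(min(ys) - margin, 0)
    bottom = min(max(ys) + margin, height - 1)
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

Since the key points are used only to place a generous crop, small localization errors shift the rectangle slightly rather than changing the downstream decision directly.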
[0054] Optionally, in this embodiment, the above K recognition results may include, but are not limited to, the currently recognized set of partial face images being real faces, the currently recognized set of partial face images not being real faces, the currently recognized set of partial face images being in the target state, the currently recognized set of partial face images being real faces and in the target state, the currently recognized set of partial face images being real faces and not in the target state, the currently recognized set of partial face images not being real faces and in the target state, and the currently recognized set of partial face images not being real faces and not in the target state, etc.
[0055] It should be noted that the basis for determining whether the above-mentioned target part is in the target state can be, for example, when the user receives the playback instruction: "Please open your mouth", the target part is in the target state when the detector detects that the state of the K groups of mouth images changes from closed to open.
[0056] In an exemplary embodiment, inputting the K sets of local face images into the pre-trained recognition model to obtain K recognition results includes obtaining the K recognition results in the following manner, wherein each input local face image is regarded as a target local face image and the obtained recognition result is regarded as a target recognition result: performing target feature extraction on the target local face image to obtain target feature information; inputting the target feature information into a liveness detection network to obtain a liveness detection result, wherein the liveness detection result indicates whether the target local face image is a real face; inputting the target feature information into a target part processing network to obtain a target state result, wherein the target state result indicates whether the target part is in the target state in the target local face image; and generating the target recognition result based on the liveness detection result and the target state result.
[0057] Optionally, in this embodiment, the target local face image may be, but is not limited to, a local face image that is obtained by detecting the K frames of face images and is then input into the pre-trained recognition model. For example, when a set of eye images is input into the pre-trained recognition model, that set of eye images is the target local face image. Likewise, when a set of images containing the eyes, nose, and mouth is input into the pre-trained recognition model, that set of images is the target local face image.
[0058] Optionally, in this embodiment, the target feature information may include, but is not limited to, information such as the category, state, and texture of the target local face image. For example, when the input target local face image is a set of eye images, feature extraction is performed on the set of eye images to obtain the feature information of the set of eye images.
[0059] Optionally, in this embodiment, the above-mentioned liveness detection network (activity judgment network) may include, but is not limited to, a network that can be used to determine whether the target local face image corresponding to the input feature information is a real face.
[0060] Optionally, in this embodiment, the liveness detection result (activity judgment result) may take, but is not limited to, forms such as yes, no, 30%, or 0.3, each of which can indicate whether the target local face image is a real face. No specific limitation is placed on the form of the result here.
[0061] Optionally, in this embodiment, the target part processing network may include, but is not limited to, a network that can be used to determine whether the target part corresponding to the input feature information is in the target state in the target local face image. For example, the mouth in the first frame is in a closed state, the mouth in the second frame is in an open state, and the mouth in the third frame is in a closed state.
[0062] Optionally, in this embodiment, the target recognition result may include, but is not limited to, the liveness detection result and the target state result. For example, it may be expressed as: the liveness detection result is yes, and the target state result is yes; or it may be expressed as: the probability of being a real face is 80%, and the probability of being in the target state is 90%. No specific restrictions are placed on the form of the target recognition result here.
[0063] In this embodiment, K sets of local face images are input into a pre-trained recognition model to obtain K recognition results. Specifically, for each target local face image, target feature extraction is performed to obtain target feature information. This target feature information is input into the activity judgment network (that is, a liveness detection network) to obtain an activity judgment result, and is also input into the target part processing network to obtain a target state result. The corresponding recognition result is generated from the activity judgment result and the target state result, yielding K recognition results in total. Integrating liveness detection into the identity authentication process can capture attack clues during verification, further ensuring the accuracy of identity authentication. It also supports judging the occurrence of multiple target actions, achieving the technical effect of improving the accuracy of identity authentication.
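The per-frame data flow described above can be illustrated with a minimal sketch. It is not the trained model of this application: the names RecognitionResult, recognize_frame, and recognize_sequence, the placeholder feature type, and the toy callables are all hypothetical and stand in for the feature extraction operation, the activity judgment network, and the target part processing network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Features = List[float]  # placeholder feature type (assumption)


@dataclass(frozen=True)
class RecognitionResult:
    real_face_prob: float               # activity judgment result
    part_state_probs: Dict[str, float]  # per-part probability of the target state


def recognize_frame(
    local_images: Sequence[object],
    extract_features: Callable[[Sequence[object]], Features],
    activity_head: Callable[[Features], float],
    part_head: Callable[[Features], Dict[str, float]],
) -> RecognitionResult:
    """Extract features once, then feed the same features to both heads."""
    features = extract_features(local_images)
    return RecognitionResult(activity_head(features), part_head(features))


def recognize_sequence(groups, extract_features, activity_head, part_head):
    """Produce one recognition result per set of local face images."""
    return [
        recognize_frame(group, extract_features, activity_head, part_head)
        for group in groups
    ]


# Toy stand-ins for the trained networks (hypothetical, fixed outputs).
def toy_extract(images):
    return [float(len(images))]


def toy_activity(features):
    return 0.8


def toy_parts(features):
    return {"eyes": 0.9, "mouth": 0.1}


results = recognize_sequence(
    [["eye0", "mouth0"], ["eye1", "mouth1"]], toy_extract, toy_activity, toy_parts
)
```

With K = 2 sets of local face images, the sketch yields two recognition results, each carrying both the activity judgment result and the per-part state probabilities.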
[0064] In an exemplary embodiment, inputting the target feature information into the target part processing network to obtain the target state result includes:
[0065] When the target part includes a first part and a second part, the target feature information is subjected to a first extraction operation and a second extraction operation respectively to obtain the first part feature information and the second part feature information. The first extraction operation is used to extract the first part feature information associated with the first part, and the second extraction operation is used to extract the second part feature information associated with the second part. The first part and the second part perform different target actions.
[0066] Based on the feature information of the first part, determine the first probability that the first part is in the first state in the target local face image, and generate the first state result;
[0067] Based on the feature information of the second part, determine the second probability that the second part is in the second state in the target local face image, and generate the second state result;
[0068] A target state result is generated based on the first state result and the second state result, wherein the target state result includes:
[0069] If the first probability satisfies the first preset condition and the second probability does not satisfy the second preset condition, the target state result indicates that the first part is in the first state and the second part is not in the second state; or
[0070] If the first probability satisfies the first preset condition and the second probability satisfies the second preset condition, the target state result indicates that the first part is in the first state and the second part is in the second state; or
[0071] If the first probability does not meet the first preset condition and the second probability meets the second preset condition, the target state result indicates that the first part is not in the first state and the second part is in the second state; or
[0072] If the first probability does not meet the first preset condition and the second probability does not meet the second preset condition, the target state result indicates that the first part is not in the first state and the second part is not in the second state.
[0073] Optionally, in this embodiment, the first part may include, but is not limited to, one of the mouth, eyes, nose, etc., and the second part may include, but is not limited to, one of the mouth, eyes, nose, etc. It should be noted that the first part and the second part are different parts. For example, when the first part is the eyes, the second part is the mouth.
[0074] Optionally, in this embodiment, the first extraction operation may include, but is not limited to, the extraction of features such as shape, color, texture, and motion state of the first part, and the second extraction operation may include, but is not limited to, the extraction of features such as shape, color, texture, and motion state of the second part.
[0075] Optionally, in this embodiment, the first part feature information may include, but is not limited to, the shape, color, texture, current state, and other feature information of the first part, and the second part feature information may include, but is not limited to, the shape, color, texture, current state, and other feature information of the second part.
[0076] For example, when the target part includes the first part being the eyes and the second part being the mouth, the first extraction operation and the second extraction operation are performed on the eyes and the mouth in the target feature information respectively to obtain the feature information of the eyes (corresponding to the feature information of the first part mentioned above) and the feature information of the mouth (corresponding to the feature information of the second part mentioned above).
[0077] It should be noted that the first part and the second part perform different target actions. For example, when the first part is the eyes and the second part is the mouth, the target action performed by the first part may be closing the eyes while the target action performed by the second part is opening the mouth, or the target action performed by the first part may be closing the eyes while the target action performed by the second part is closing the mouth. The two target actions may also be performed at the same time, for example closing the eyes and closing the mouth simultaneously, or closing the eyes and opening the mouth simultaneously.
[0078] Optionally, in this embodiment, the first state may include, but is not limited to, the state of the first part in the target local face image, such as open eyes, closed eyes, open mouth, and closed mouth. The first state is the state corresponding to the first part. If the first part is the eyes, the first state may include, but is not limited to, open eyes and closed eyes. If the first part is the mouth, the first state may include, but is not limited to, open mouth and closed mouth.
[0079] Optionally, in this embodiment, the second state may include, but is not limited to, the state of the second part in the target local face image, such as open eyes, closed eyes, open mouth, and closed mouth. The second state is the state corresponding to the second part. If the second part is the eyes, the second state may include, but is not limited to, open eyes and closed eyes. If the second part is the mouth, the second state may include, but is not limited to, open mouth and closed mouth.
[0080] Optionally, in this embodiment, the first probability may include, but is not limited to, a numerical value in the form of 0.9, 90%, etc., representing the likelihood that the first part is in a first state in the target local face image. The result obtained by analyzing the first probability is the first state result. For example, it can be set that when the first probability exceeds 50%, the first state result is "the first state has occurred".
[0081] Optionally, in this embodiment, the aforementioned second probability may include, but is not limited to, the probability that the second part is in the second state in the target local face image, expressed as a numerical value in the form of 0.9, 90%, etc. The result obtained by analyzing the second probability is the second state result. For example, it can be set that when the second probability exceeds 50%, the second state result is "the second state has occurred".
[0082] Optionally, in this embodiment, the first preset condition may include, but is not limited to, a first probability greater than 50% or a first probability greater than or equal to 0.5. The first preset condition may be set in advance by relevant technical personnel based on prior experience, and may be adjusted accordingly during use based on actual conditions.
[0083] Optionally, in this embodiment, the above-mentioned second preset condition may include, but is not limited to, a second probability greater than 50% or a second probability greater than or equal to 0.5. The second preset condition may be set in advance by relevant technical personnel based on prior experience, and may be adjusted accordingly during use based on actual conditions.
[0084] It should be noted that generating the target state result based on the first state result and the second state result can be understood as the target state result being determined by both the first state result and the second state result.
[0085] For example, if the first part is the eyes and the second part is the mouth, and the first preset condition is that the probability of opening the eyes is greater than 50%, and the second preset condition is that the probability of opening the mouth is greater than 50%, then when the probability of opening the eyes is greater than 50% and the probability of opening the mouth is less than or equal to 50%, the target state result indicates that the eyes are open and the mouth is not open; or
[0086] When the probability of opening the eyes is greater than 50% and the probability of opening the mouth is greater than 50%, the target state result indicates that the eyes are open and the mouth is open; or
[0087] When the probability of opening the eyes is less than or equal to 50% and the probability of opening the mouth is greater than 50%, the target state result indicates that the eyes are not open and the mouth is open; or
[0088] When the probability of opening the eyes is less than or equal to 50% and the probability of opening the mouth is less than or equal to 50%, the target state result indicates that the eyes are not open and the mouth is not open.
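The four cases above can be expressed compactly as two independent threshold tests. The following is a minimal sketch assuming the "greater than 50%" form of both preset conditions used in the example; the names TargetStateResult and target_state_result are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetStateResult:
    first_in_state: bool   # e.g. eyes are open
    second_in_state: bool  # e.g. mouth is open


def target_state_result(
    first_prob: float,
    second_prob: float,
    first_threshold: float = 0.5,
    second_threshold: float = 0.5,
) -> TargetStateResult:
    """Apply each preset condition (probability strictly above its threshold)."""
    return TargetStateResult(
        first_in_state=first_prob > first_threshold,
        second_in_state=second_prob > second_threshold,
    )


# Eyes open with probability 0.8, mouth open with probability 0.3.
example = target_state_result(0.8, 0.3)
```

In this example the result indicates that the eyes are open and the mouth is not open. Because each part is tested separately, the four combinations listed above follow without enumerating them.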
[0089] The comprehensive judgment method adopted in this application's embodiments, based on local feature information and combining target state and activity judgment results, can better handle deepfake video attacks compared to liveness verification based solely on interactive actions.
[0090] In an exemplary embodiment, the face to be identified is authenticated based on K recognition results to obtain the target authentication result, including:
[0091] If the Nth identification result indicates that the target part is not in the target state, and the Mth identification result indicates that the target part is in the target state, the target authentication result is determined to be successful, where N is less than M, M is less than or equal to K, and N and M are both integers greater than 0.
[0092] If the Nth identification result indicates that the target part is not in the target state, and the N+1 to Kth identification results all indicate that the target part is not in the target state, then the target authentication result is determined to be authentication failure.
[0093] If the Nth identification result indicates that the target part is in the target state, and the Mth identification result indicates that the target part is not in the target state, the target authentication result is determined to be successful.
[0094] If the Nth identification result indicates that the target part is in the target state, and the N+1 to Kth identification results all indicate that the target part is in the target state, the target authentication result is determined to be authentication failure.
[0095] Optionally, in this embodiment, the Nth recognition result may include, but is not limited to, any one of the K recognition results except the last one, and the Mth recognition result may be any one of the K recognition results that follows the Nth recognition result, since N is less than M.
[0096] It should be noted that the above target authentication result of "authentication passed" can be understood as the target part undergoing a corresponding state change in the K recognition results, that is, the state of the target part is not completely the same in the K recognition results.
[0097] For example, when the target part is the eyes, the target state is eyes open, K=3, N=1, M=2, the first recognition result indicates that the eyes are not open, while the second recognition result indicates that the eyes are open. In this case, the authentication pass condition is met, i.e., authentication passes.
[0098] Alternatively, if the first recognition result indicates that the eyes are open, while the second recognition result indicates that the eyes are not open, the authentication pass condition is also met, and authentication passes.
[0099] Conversely, if the first recognition result indicates that the eyes are not open, and the second and third recognition results also indicate that the eyes are not open, then the authentication condition is not met, meaning the authentication fails.
[0100] If the first recognition result indicates that the eyes are open, and the second and third recognition results also indicate that the eyes are open, then the authentication condition is not met, and the authentication fails.
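The pass and fail conditions above reduce to one question: does the state of the target part differ between any two of the K recognition results? A minimal sketch follows, assuming each recognition result has already been reduced to a boolean (True meaning the target part is in the target state); the function name is hypothetical.

```python
from typing import Sequence


def authenticate_by_state_change(in_target_state: Sequence[bool]) -> bool:
    """Pass when some earlier and later result disagree about the target state.

    An N < M with differing states exists exactly when the K results are
    not all identical, so comparing the set of observed states is enough.
    """
    return len(set(in_target_state)) > 1


# K = 3, target state "eyes open".
passes_opening = authenticate_by_state_change([False, True, True])
passes_closing = authenticate_by_state_change([True, False, False])
fails_never_open = authenticate_by_state_change([False, False, False])
fails_always_open = authenticate_by_state_change([True, True, True])
```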
[0101] By using the embodiments of this application to determine the real state of faces in different frames and to combine the target state judgment, the accuracy of identity authentication can be effectively improved when faces undergo state changes, including large and small poses, thus achieving the technical effect of improving the accuracy of identity authentication.
[0102] In one exemplary embodiment, the method further includes:
[0103] If the number of recognition results indicating that the input frame of face image is a real face meets the preset number threshold, and the Nth recognition result indicates that the target part is not in the target state, and the Mth recognition result indicates that the target part is in the target state, then the target authentication result is determined to be successful.
[0104] If the number of recognition results in the K recognition results that indicate that the input frame of face image is a real face meets the preset number threshold, and the Nth recognition result and the Mth recognition result both indicate the same state of the target part (both indicate that the target part is not in the target state, or both indicate that it is in the target state), then the target authentication result is determined to be authentication failure.
[0105] If the number of recognition results indicating that the input frame of face image is a real face does not meet the preset threshold among the K recognition results, the target authentication result is determined to be authentication failure.
[0106] Optionally, in this embodiment, the preset quantity threshold may include, but is not limited to, greater than 3, greater than or equal to 3, etc. The preset quantity threshold may include, but is not limited to, being set in advance by relevant technical personnel based on actual application needs and historical experience, and can be used as a condition for judging whether a K-frame face image is a real face.
[0107] For example, when K=5, the target state is open eyes, and the preset number threshold is greater than 3: if 3 out of 5 frames are real faces, the preset number threshold is not met, and the target authentication result is authentication failure. If 4 out of 5 frames are real faces, the second recognition result indicates that the eyes are not open, and the fourth recognition result indicates that the eyes are open, the target authentication result is determined to be successful. If 4 out of 5 frames are real faces, and all 5 recognition results indicate the same eye state (all open or all not open), the target authentication result is determined to be authentication failure.
[0108] It should be noted that when the number of recognition results indicating that the input face image frame is a real face in the K recognition results meets the preset threshold, the operation can be performed only on images belonging to real faces to determine whether the target part in the real face image is in the target state. Alternatively, the operation can be performed on all face images to determine whether the target part in both real and non-real face images is in the target state. When the number of recognition results indicating that the input face image frame is a real face does not meet the preset threshold, the determination of whether the target part is in the target state is not performed, and the result is directly output as authentication failed.
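The threshold gate and the two alternatives described above (evaluating only real-face frames, or all frames) can be sketched as follows. This is an illustration under stated assumptions: the "greater than 3" form of the preset number threshold from the example, and hypothetical names (FrameResult, authenticate_with_liveness_gate, real_frames_only).

```python
from typing import Sequence, Tuple

# (is_real_face, target_part_in_target_state) for one frame
FrameResult = Tuple[bool, bool]


def authenticate_with_liveness_gate(
    results: Sequence[FrameResult],
    min_real_exclusive: int = 3,
    real_frames_only: bool = True,
) -> bool:
    """Fail immediately unless enough frames are real; then look for a state change."""
    real_count = sum(1 for is_real, _ in results if is_real)
    if real_count <= min_real_exclusive:
        return False  # state judgment is skipped entirely
    considered = [
        in_state
        for is_real, in_state in results
        if is_real or not real_frames_only
    ]
    return len(set(considered)) > 1


# K = 5, target state "eyes open", threshold "greater than 3".
three_real = [(True, False), (True, True), (True, False), (False, True), (False, False)]
four_real_changed = [(True, False), (True, False), (False, False), (True, True), (True, True)]
four_real_static = [(True, True), (True, True), (True, True), (True, True), (False, True)]
```

The early return mirrors the efficiency argument of this embodiment: when the threshold is not met, no state judgment is performed.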
[0109] This application's embodiments introduce a preset threshold to determine the number of recognition results for real faces in a face image. Only when the number of recognition results indicating that the input face image frame is a real face meets the preset threshold is the determination made regarding whether the target part in the face image is in the target state. This further simplifies the identity authentication scheme; cases where the preset threshold is not met will be directly determined as authentication failure, achieving a higher level of security and improved identity authentication efficiency.
[0110] In one exemplary embodiment, the method further includes:
[0111] If the Nth identification result indicates that the target part is not in the target state, the Mth identification result indicates that the target part is in the target state, and the Pth identification result indicates that the target part is not in the target state, the target authentication result is determined to be successful, where P is greater than M and less than or equal to K, and P is an integer greater than 0.
[0112] If the Nth identification result indicates that the target part is in the target state, the Mth identification result indicates that the target part is not in the target state, and the Pth identification result indicates that the target part is in the target state, then the target authentication result is determined to be successful.
[0113] It should be noted that determining the target authentication result as successful can be understood through the following example:
[0114] For example, when K=3, N=1, M=2, P=3, the target part is the eyes, and the target state is open eyes, if the first recognition result indicates that the eyes are not open, the second recognition result indicates that the eyes are open, and the third recognition result indicates that the eyes are not open, then it is determined that the eyes have completed the blinking action, and the target authentication result is determined to be successful. Alternatively, if the first recognition result indicates that the eyes are open, the second recognition result indicates that the eyes are not open, and the third recognition result indicates that the eyes are open, the target authentication result is determined to be successful.
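The three-result condition (indices N < M < P whose states alternate) can be checked by collapsing consecutive identical states into runs. The sketch below assumes each recognition result has been reduced to a boolean; the function name is hypothetical.

```python
from itertools import groupby
from typing import Sequence


def completed_round_trip(in_target_state: Sequence[bool]) -> bool:
    """True when the part leaves its initial state and later returns to it.

    Consecutive duplicates are collapsed into runs. Since the states are
    boolean, adjacent runs alternate, so three or more runs guarantee
    indices N < M < P with state[N] == state[P] != state[M].
    """
    runs = [state for state, _ in groupby(in_target_state)]
    return len(runs) >= 3


# K = 3, target state "eyes open".
blink_from_closed = completed_round_trip([False, True, False])
blink_from_open = completed_round_trip([True, False, True])
only_opened = completed_round_trip([False, True, True])
```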
[0115] Optionally, in this embodiment, the method further includes determining that the target authentication result is successful when the Nth identification result indicates that the target part is in a first sub-state, the Mth identification result indicates that the target part is in a second sub-state, and the Pth identification result indicates that the target part is in a first sub-state.
[0116] It should be noted that the first, second, and third sub-states mentioned above may include, but are not limited to, the lip state when reading lip movements. The mouth may be closed or open, but the shape of the lips and the corresponding lip movements may be different or the same.
[0117] Through the embodiments of this application, the occurrence of multiple actions such as blinking and lip reading can be determined simultaneously. Combined with the activity determination process, malicious attacks can be better prevented.
[0118] The present invention will be described in detail below with reference to specific embodiments:
[0119] Figure 4 is a flowchart of an identity authentication method. The identity authentication scheme of this application includes, but is not limited to, the following steps:
[0120] 1. For a randomly given action instruction (corresponding to the playback instruction above), such as opening the mouth, blinking, opening the mouth and blinking at the same time, etc., obtain the video frame sequence V of the action (V = (F0, F1, ..., Fk), k is the number of video frames) (corresponding to the K frames of face images above), each frame contains the face target and the current action state;
[0121] 2. Using a detector, locate the face region and local feature images frame by frame, crop them out, and combine them into an event sequence set S (S = (S0, S1, ..., Si, ..., Sk), where k is the number of video frames and Si is the set of local images in the i-th frame, including the eyes, mouth, etc.);
[0122] 3. Input the event sequence set S into the shared feature network and extract the shared local features Feat (Feat = (Feat0, Feat1, ..., Featk), where k is the number of video frames);
[0123] 4. The shared local features Feat are fed into the activity judgment network, the mouth state feature layer, and the eye state feature layer, respectively. The activity judgment network integrates the shared local features and determines whether the verifier is a real face. The mouth (eye) state feature layer extracts action state features Am (Ae) based on the shared local features of the mouth (eyes) to assist in judging the mouth (eye) action, as shown in Figure 5;
[0124] 5. In the perception layer (for example, a deep neural network such as a Long Short-Term Memory (LSTM) network), use the temporal characteristics reflected by the action state features Am (Ae) of the continuous event sequence to determine the occurrence state and confidence level of the action;
[0125] 6. Finally, based on the combined results of activity assessment, action occurrence status, and confidence level, it is determined whether the specified action was performed by a real human face. That is, if the activity assessment indicates that it is a real human face, and the mouth (eye) perception layer outputs that the action has occurred or the confidence level has reached the preset threshold, then the interactive liveness authentication is considered successful; otherwise, the action verification is considered to have failed.
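The final combination in step 6 can be sketched as below. The wording "the action has occurred or the confidence level has reached the preset threshold" is ambiguous, so this sketch adopts one reading as an explicit assumption: the perception layer reports an action state together with its confidence in that state, and the action counts as confirmed only when the state is "occurred" and the confidence reaches the threshold. The names PerceptionOutput and interactive_liveness_passed, and the default threshold of 0.5, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerceptionOutput:
    action_state: int  # 1 = action occurred, 0 = action did not occur
    confidence: float  # confidence in the reported action_state


def interactive_liveness_passed(
    is_real_face: bool,
    perception: PerceptionOutput,
    confidence_threshold: float = 0.5,  # assumed value, not from the source
) -> bool:
    """Combine the activity judgment with the perception layer output."""
    action_confirmed = (
        perception.action_state == 1
        and perception.confidence >= confidence_threshold
    )
    return is_real_face and action_confirmed


passed = interactive_liveness_passed(True, PerceptionOutput(1, 0.9))
no_action = interactive_liveness_passed(True, PerceptionOutput(0, 0.9))
spoofed = interactive_liveness_passed(False, PerceptionOutput(1, 0.9))
```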
[0126] Furthermore, the following explanations are provided regarding video frames, local feature images, event sequence sets, shared feature networks, activity determination, mouth (eye) feature layers, mouth (eye) perception layers, and comprehensive judgment:
[0127] 1. Explanation of video frames
[0128] The video frame data collected here must be continuous, and the video frame data must be at least two frames; these two frames must contain a human face.
[0129] 2. Explanation of local image acquisition
[0130] Local image acquisition refers to the process of obtaining local facial regions, such as the mouth (eye) region, after a face detection algorithm has captured a face. There are many methods available, but the current mainstream approach is to locate the approximate position of the local region using key points and then crop the largest enclosing rectangle to obtain the target region. Other methods exist, such as training a detector to directly detect the target region's location in the image and obtaining the target image based on the detection results, or using significant gradient features and template matching to locate the target region. This invention will not describe these methods in detail, but all of the above approaches can be used to obtain the target region.
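The key-point approach mentioned above can be illustrated with a minimal sketch that computes a crop rectangle around a group of key points. The relative margin, the coordinate convention, and the function name are assumptions made for illustration; the source does not specify them.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]      # (x, y) in pixel coordinates
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop_box_from_keypoints(
    points: Iterable[Point],
    image_width: int,
    image_height: int,
    margin: float = 0.25,  # assumed padding, as a fraction of the box size
) -> Box:
    """Bounding rectangle of the key points, padded and clamped to the image."""
    pts = list(points)
    if not pts:
        raise ValueError("at least one key point is required")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    pad_x = (max(xs) - min(xs)) * margin
    pad_y = (max(ys) - min(ys)) * margin
    left = max(0, int(min(xs) - pad_x))
    top = max(0, int(min(ys) - pad_y))
    right = min(image_width, int(max(xs) + pad_x) + 1)
    bottom = min(image_height, int(max(ys) + pad_y) + 1)
    return left, top, right, bottom


# Three hypothetical mouth key points in a 100 x 100 image.
mouth_box = crop_box_from_keypoints([(10, 20), (30, 20), (20, 28)], 100, 100)
```

Clamping to the image bounds keeps the crop valid when the face is near the image border, which is more likely in large pose scenarios.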
[0131] 3. Explanation of event sequence sets
[0132] In this invention, the event sequence set S (S = (S0, S1, ..., Si, ..., Sk), where k is the number of video frames and Si is the set of local images of the i-th frame, including the eyes, mouth, etc.) refers to a set of continuous frame images of a specific action event (such as opening the mouth, blinking, or opening the mouth and blinking at the same time). Each element Si of the set consists of the detector outputs for that frame, including the eye image Si0, the mouth image Si1, and so on, and the face pose conforms to the angle requirements described for continuous video frames.
[0133] 4. Explanation of shared feature networks, activity determination, and mouth (eye) feature layers
[0134] The shared feature network, the liveness detection network, and the mouth (eye) feature layer network are all part of the same multi-task network. The shared feature network consists of convolutional, pooling, and normalization layers, used to extract shallow features. The liveness detection network extracts fraud clue features (i.e., non-liveness clues) based on shallow features. These clues are usually generated by the differences between real faces and attack methods in imaging, and are usually more obvious at edge features such as eyes and mouth. Based on this, the liveness detection branch network can effectively determine the authenticity of the verifier's identity in consecutive video frames. The mouth (eye) feature layer extracts local texture semantic features based on shallow features. These features are used to express the motion state of the mouth (eye) at the current frame. Such features in a single frame cannot describe the event state, but as the local motion state changes in each frame of the event sequence set, the occurrence and confidence of the action event can be determined. The capture of motion state changes is the responsibility of the perception layer.
[0135] 5. Explanation of the mouth (eye) perception layer
[0136] The mouth (eye) perception layer is the second-stage network following the feature layer. Based on the features extracted by the feature layer, it analyzes the continuous motion state of a specific region across an event sequence set and expresses whether an action event has occurred. Taking the eye as an example, as shown in Figure 6, t0 represents the eye image in frame 0. After passing through the feature layer, the semantic features of the eye texture are sent to the perception layer. At this time, the perception layer outputs an action state of 0 and a confidence score of 0.9, indicating that the blinking event has not occurred and that the confidence of non-occurrence is 0.9. As the eye completes a full blinking cycle from open (t0) to closed (ti) and back to open (tk), the output of the perception layer changes from action state 0 to action state 1, and the confidence score changes from 0.9 for the blinking action not occurring to 0.9 for the blinking action occurring.
[0137] Currently, the perception layer network can use a Long Short-Term Memory (LSTM) network or another deep neural network to determine the occurrence of action events.
[0138] 6. Explanation regarding comprehensive judgment
[0139] The target's activity and action completion status can be obtained frame by frame through the feature layer, perception layer, and activity judgment network. The present invention can simultaneously judge the occurrence of multiple actions such as blinking, mouth opening, and lip reading numbers. Therefore, they can be used in combination, such as "blinking accompanied by mouth opening", "blinking only without mouth opening", "blinking accompanied by lip reading numbers", etc. Combined with the activity judgment process, it can better prevent malicious attacks. In actual verification, the above actions are randomly prompted, which can properly deal with the behavior of attackers who prepare corresponding attack materials in advance.
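The random prompting described above can be sketched as a draw from a fixed set of action combinations. The combination list mirrors the examples in the text; the identifiers, and the choice of the operating system's random source so that prompts are hard to predict, are assumptions of this sketch.

```python
import random
import secrets
from typing import Optional, Tuple

# Action combinations taken from the examples above (identifiers are hypothetical).
ACTION_PROMPTS: Tuple[Tuple[str, ...], ...] = (
    ("blink", "open_mouth"),   # blinking accompanied by mouth opening
    ("blink",),                # blinking only, without mouth opening
    ("blink", "read_digits"),  # blinking accompanied by lip reading of numbers
)


def pick_action_prompt(rng: Optional[random.Random] = None) -> Tuple[str, ...]:
    """Choose the action combination the verifier will be asked to perform."""
    if rng is None:
        # OS-backed randomness, so an attacker cannot predict the prompt.
        rng = secrets.SystemRandom()
    return rng.choice(ACTION_PROMPTS)


prompt = pick_action_prompt()
```

Passing a seeded random.Random instance makes the choice reproducible for testing, while the default keeps the prompt unpredictable in actual verification.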
[0140] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods according to the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods of the various embodiments of the present invention.
[0141] This embodiment also provides an identity authentication device for implementing the above embodiments and preferred embodiments; details already described will not be repeated. As used below, the term "module" can refer to a combination of software and / or hardware that performs a predetermined function. Although the device described in the following embodiments is preferably implemented in software, hardware implementation, or a combination of software and hardware, is also possible and contemplated.
[0142] Figure 6 is a structural block diagram of an identity authentication device according to an embodiment of the present invention. As shown in Figure 6, the device includes:
[0143] The acquisition module is used to acquire K-frame face images to be processed in response to a playback command associated with the target action. The K-frame face images are face images acquired by the image acquisition device from the face to be identified. The K-frame face images are used to authenticate the face to be identified. The playback command is used to prompt the face to be identified to perform the target action. K is an integer greater than 1.
[0144] The first processing module is used to input K frames of face images into a pre-trained recognition model to obtain K recognition results. Each of the K recognition results includes the probability of whether the input face image is a real face and the probability that the target part of the face image is in the target state. The target part includes different face parts that perform different target actions.
[0145] The second processing module is used to authenticate the face to be identified based on K recognition results to obtain the target authentication result. The target authentication result includes whether the face to be identified is a real face and whether the face to be identified has performed the target action.
[0146] In an exemplary embodiment, the above-described apparatus is further configured to: perform frame-by-frame detection on the K frames of face images to obtain K sets of local face images, wherein one set of local face images in the K sets represents the set of local face images detected from one frame of the K frames of face images, and each local face image corresponds to a target part; and input the K sets of local face images into the pre-trained recognition model to obtain the K recognition results, wherein each of the K recognition results includes whether the input set of local face images is a real face and whether the target part in the set of local face images is in the target state.
[0147] In an exemplary embodiment, the apparatus is further configured to input the K sets of local face images into the pre-trained recognition model to obtain the K recognition results in the following manner, wherein each input local face image is regarded as a target local face image, and the obtained recognition result is regarded as a target recognition result: performing a target feature extraction operation on the target local face image to obtain target feature information; inputting the target feature information into an activity judgment network to obtain an activity judgment result, wherein the activity judgment result indicates whether the target local face image is a real face; inputting the target feature information into a target part processing network to obtain a target state result, wherein the target state result is used to indicate whether the target part is in the target state in the target local face image; and generating the target recognition result based on the activity judgment result and the target state result.
[0148] In an exemplary embodiment, the apparatus is further configured to: when the target part includes a first part and a second part, perform a first extraction operation and a second extraction operation on the target feature information respectively to obtain first part feature information and second part feature information, wherein the first extraction operation is used to extract the first part feature information associated with the first part, the second extraction operation is used to extract the second part feature information associated with the second part, and the first part and the second part perform different target actions; determine, based on the first part feature information, a first probability that the first part is in a first state in the target local face image, and generate a first state result; determine, based on the second part feature information, a second probability that the second part is in a second state in the target local face image, and generate a second state result; and generate the target state result based on the first state result and the second state result, wherein: if the first probability satisfies a first preset condition and the second probability does not satisfy a second preset condition, the target state result indicates that the first part is in the first state and the second part is not in the second state; or
[0149] If the first probability satisfies the first preset condition and the second probability satisfies the second preset condition, the target state result indicates that the first part is in the first state and the second part is in the second state; or
[0150] If the first probability does not satisfy the first preset condition and the second probability satisfies the second preset condition, the target state result indicates that the first part is not in the first state and the second part is in the second state; or
[0151] If the first probability does not satisfy the first preset condition and the second probability does not satisfy the second preset condition, the target state result indicates that the first part is not in the first state and the second part is not in the second state.
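The four combinations listed above reduce to two independent checks, one per part. The following sketch assumes that each "preset condition" is a simple probability threshold; the threshold values and the names `TargetStateResult` and `target_state_result` are hypothetical.

```python
from typing import NamedTuple


class TargetStateResult(NamedTuple):
    first_in_first_state: bool
    second_in_second_state: bool


def target_state_result(
    first_probability: float,
    second_probability: float,
    first_threshold: float = 0.5,    # assumed form of the first preset condition
    second_threshold: float = 0.5,   # assumed form of the second preset condition
) -> TargetStateResult:
    """Evaluate each part against its own condition; the pair covers all four cases."""
    return TargetStateResult(
        first_in_first_state=first_probability >= first_threshold,
        second_in_second_state=second_probability >= second_threshold,
    )
```

For example, if the first part is the eyes and the second part is the mouth, `target_state_result(0.9, 0.1)` reports that the first part is in its state while the second part is not.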
[0152] In an exemplary embodiment, the apparatus is further configured to: determine that the target authentication result is authentication success when the Nth recognition result indicates that the target part is not in the target state and the Mth recognition result indicates that the target part is in the target state, wherein N is less than M, M is less than or equal to K, and N and M are both integers greater than 0; determine that the target authentication result is authentication failure when the Nth recognition result indicates that the target part is not in the target state and the (N+1)th to Kth recognition results all indicate that the target part is not in the target state; determine that the target authentication result is authentication success when the Nth recognition result indicates that the target part is in the target state and the Mth recognition result indicates that the target part is not in the target state; and determine that the target authentication result is authentication failure when the Nth recognition result indicates that the target part is in the target state and the (N+1)th to Kth recognition results all indicate that the target part is in the target state.
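Taken together, these four conditions mean that authentication succeeds when the target part changes state somewhere in the sequence of K recognition results and fails when the state stays the same throughout. A minimal sketch, in which the function name `action_performed` is hypothetical:

```python
from typing import Sequence


def action_performed(states: Sequence[bool]) -> bool:
    """Return True if an earlier and a later recognition result disagree.

    states[i] is True when the (i+1)th recognition result indicates that the
    target part is in the target state. If any two results differ, at least
    one adjacent pair differs, so comparing neighbours is sufficient.
    """
    return any(a != b for a, b in zip(states, states[1:]))
```

For K = 3, the sequence `[False, False, True]` (for example, mouth closed, closed, open) yields success, while `[False, False, False]` yields failure.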
[0153] In an exemplary embodiment, the apparatus is further configured to: determine that the target authentication result is authentication success when the number of recognition results, among the K recognition results, indicating that the input face image frame is a real face meets a preset number threshold, the Nth recognition result indicates that the target part is not in the target state, and the Mth recognition result indicates that the target part is in the target state; determine that the target authentication result is authentication failure when the number of recognition results, among the K recognition results, indicating that the input face image frame is a real face meets the preset number threshold, and the Nth and Mth recognition results either both indicate that the target part is not in the target state or both indicate that the target part is in the target state; and determine that the target authentication result is authentication failure when the number of recognition results, among the K recognition results, indicating that the input face image frame is a real face does not meet the preset number threshold.
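The combined decision can be sketched as a two-stage check: first the real-face count, then the state change. The sketch assumes that "meets a preset number threshold" means the count is greater than or equal to the threshold; the names `authenticate` and `min_real_faces` are hypothetical.

```python
from typing import Sequence, Tuple

# Each recognition result is modelled as (is_real_face, in_target_state).
Result = Tuple[bool, bool]


def authenticate(results: Sequence[Result], min_real_faces: int) -> bool:
    """Succeed only if enough frames show a real face AND the part changes state."""
    real_count = sum(1 for is_real, _ in results if is_real)
    if real_count < min_real_faces:  # assumed reading of the preset number threshold
        return False
    states = [in_state for _, in_state in results]
    # A change of state between any earlier and later result implies an adjacent change.
    return any(a != b for a, b in zip(states, states[1:]))
```

A replayed photo that never passes the real-face check therefore fails even if the target part appears to change state, and a real face that never performs the action also fails.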
[0154] In an exemplary embodiment, the apparatus is further configured to: determine that the target authentication result is authentication success when the Nth recognition result indicates that the target part is not in the target state, the Mth recognition result indicates that the target part is in the target state, and the Pth recognition result indicates that the target part is not in the target state, wherein P is greater than M and less than or equal to K, and P is an integer greater than 0; and determine that the target authentication result is authentication success when the Nth recognition result indicates that the target part is in the target state, the Mth recognition result indicates that the target part is not in the target state, and the Pth recognition result indicates that the target part is in the target state.
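This embodiment corresponds to a complete action cycle, such as a blink, in which the target part leaves its initial state and then returns to it. Such a pattern exists for some N < M < P exactly when the state changes at least twice across the sequence. The sketch below relies on that observation; the function name `completed_round_trip` is hypothetical, and returning False when no such pattern exists is an assumption, since the paragraph above only states the success conditions.

```python
from typing import Sequence


def completed_round_trip(states: Sequence[bool]) -> bool:
    """Return True if the state changes at least twice, e.g. open -> closed -> open.

    states[i] is True when the (i+1)th recognition result indicates that the
    target part is in the target state.
    """
    changes = sum(1 for a, b in zip(states, states[1:]) if a != b)
    return changes >= 2
```

For example, `[False, True, False]` satisfies the pattern with N = 1, M = 2, P = 3, whereas `[False, False, True, True]` contains only one change and does not.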
[0155] It should be noted that the above modules can be implemented in software or hardware. In the latter case, this can be achieved in, but is not limited to, the following ways: all of the above modules are located in the same processor; or the above modules are distributed across different processors in any combination.
[0156] Embodiments of the present invention also provide a computer-readable storage medium storing a computer program, wherein the computer program is configured to perform the steps in any of the above method embodiments when executed.
[0157] In one exemplary embodiment, the aforementioned computer-readable storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), portable hard disk, magnetic disk, or optical disk.
[0158] Embodiments of the present invention also provide an electronic device including a memory and a processor, the memory storing a computer program and the processor being configured to run the computer program to perform the steps in any of the above method embodiments.
[0159] In one exemplary embodiment, the electronic device may further include a transmission device and an input / output device, wherein the transmission device is connected to the processor and the input / output device is connected to the processor.
[0160] Specific examples in this embodiment can be found in the examples described in the above embodiments and exemplary implementations, and will not be repeated here.
[0161] It is obvious to those skilled in the art that the modules or steps of the present invention described above can be implemented using general-purpose computing devices. They can be centralized on a single computing device or distributed across a network of multiple computing devices. They can be implemented using computer-executable program code, and thus can be stored in a storage device for execution by a computing device. In some cases, the steps shown or described can be performed in a different order than those described herein, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any particular combination of hardware and software.
[0162] The above are merely preferred embodiments of the present invention and are not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, or improvements made within the principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. An identity authentication method, characterized by comprising: in response to a playback command associated with a target action, acquiring K frames of face images to be processed, obtained by an image acquisition device capturing the face to be identified, and performing frame-by-frame detection on the K frames of face images to obtain K groups of local face images, wherein the K frames of face images are used to authenticate the face to be identified, the playback command is used to prompt the face to be identified to perform the target action, K is an integer greater than 1, one group of local face images among the K groups represents the set of local face images detected from one frame among the K frames of face images, each local face image corresponds to a target part, and the target part includes different face parts that perform different target actions; sequentially performing a target feature extraction operation on the K groups of local face images to obtain target feature information, and inputting the target feature information into an activity judgment network and a target part processing network respectively to obtain an activity judgment result and a target state result, wherein the target feature information indicates whether the target local face image is a real face, the activity judgment result indicates whether the target local face image is a real face, and the target state result indicates whether the target part is in the target state in the target local face image; generating K recognition results based on the activity judgment result and the target state result; and authenticating the face to be identified based on the K recognition results to obtain a target authentication result, wherein each recognition result includes whether the corresponding local face image is a real face and whether the target part therein is in the target state, and the target authentication result includes whether the face to be identified is a real face and whether the face to be identified has performed the target action.
2. The method according to claim 1, characterized in that the step of inputting the target feature information into the activity judgment network and the target part processing network respectively to obtain the activity judgment result and the target state result includes: when the target part includes a first part and a second part, performing a first extraction operation and a second extraction operation on the target feature information respectively to obtain first part feature information and second part feature information, wherein the first extraction operation is used to extract the first part feature information associated with the first part, the second extraction operation is used to extract the second part feature information associated with the second part, and the first part and the second part perform different target actions; determining, based on the first part feature information, a first probability that the first part is in a first state in the target local face image, and generating a first state result; determining, based on the second part feature information, a second probability that the second part is in a second state in the target local face image, and generating a second state result; and generating the target state result based on the first state result and the second state result, wherein: if the first probability satisfies a first preset condition and the second probability does not satisfy a second preset condition, the target state result indicates that the first part is in the first state and the second part is not in the second state; or if the first probability satisfies the first preset condition and the second probability satisfies the second preset condition, the target state result indicates that the first part is in the first state and the second part is in the second state; or if the first probability does not satisfy the first preset condition and the second probability satisfies the second preset condition, the target state result indicates that the first part is not in the first state and the second part is in the second state; or if the first probability does not satisfy the first preset condition and the second probability does not satisfy the second preset condition, the target state result indicates that the first part is not in the first state and the second part is not in the second state.
3. The method according to claim 1, characterized in that the step of authenticating the face to be identified based on the K recognition results to obtain the target authentication result includes: if the Nth recognition result indicates that the target part is not in the target state and the Mth recognition result indicates that the target part is in the target state, determining that the target authentication result is authentication success, wherein N is less than M, M is less than or equal to K, and N and M are both integers greater than 0; if the Nth recognition result indicates that the target part is not in the target state and the (N+1)th to Kth recognition results all indicate that the target part is not in the target state, determining that the target authentication result is authentication failure; if the Nth recognition result indicates that the target part is in the target state and the Mth recognition result indicates that the target part is not in the target state, determining that the target authentication result is authentication success; and if the Nth recognition result indicates that the target part is in the target state and the (N+1)th to Kth recognition results all indicate that the target part is in the target state, determining that the target authentication result is authentication failure.
4. The method according to claim 3, characterized in that the method further includes: if the number of recognition results, among the K recognition results, indicating that the input frame of face image is a real face meets a preset number threshold, the Nth recognition result indicates that the target part is not in the target state, and the Mth recognition result indicates that the target part is in the target state, determining that the target authentication result is authentication success; if the number of recognition results, among the K recognition results, indicating that the input frame of face image is a real face meets the preset number threshold, and the Nth recognition result and the Mth recognition result either both indicate that the target part is not in the target state or both indicate that the target part is in the target state, determining that the target authentication result is authentication failure; and if the number of recognition results, among the K recognition results, indicating that the input frame of face image is a real face does not meet the preset number threshold, determining that the target authentication result is authentication failure.
5. The method according to claim 3, characterized in that the method further includes: if the Nth recognition result indicates that the target part is not in the target state, the Mth recognition result indicates that the target part is in the target state, and the Pth recognition result indicates that the target part is not in the target state, determining that the target authentication result is authentication success, wherein P is greater than M and less than or equal to K, and P is an integer greater than 0; and if the Nth recognition result indicates that the target part is in the target state, the Mth recognition result indicates that the target part is not in the target state, and the Pth recognition result indicates that the target part is in the target state, determining that the target authentication result is authentication success.
6. An identity authentication device, characterized by comprising: an acquisition module, used to acquire, in response to a playback command associated with a target action, K frames of face images to be processed, obtained by an image acquisition device capturing the face to be identified, wherein the K frames of face images are used to authenticate the face to be identified, the playback command is used to prompt the face to be identified to perform the target action, and K is an integer greater than 1; and a first processing module, used to perform frame-by-frame detection on the K frames of face images to obtain K groups of local face images, wherein one group of local face images among the K groups represents the set of local face images detected from one frame among the K frames of face images, each local face image corresponds to a target part, and the target part includes different face parts that perform different target actions; wherein the device is further configured to: sequentially perform a target feature extraction operation on the K groups of local face images to obtain target feature information, and input the target feature information into an activity judgment network and a target part processing network respectively to obtain an activity judgment result and a target state result, wherein the target feature information indicates whether the target local face image is a real face, the activity judgment result indicates whether the target local face image is a real face, and the target state result indicates whether the target part is in the target state in the target local face image; and wherein the device is further configured to: generate K recognition results based on the activity judgment result and the target state result, and authenticate the face to be identified based on the K recognition results to obtain a target authentication result, wherein each recognition result includes whether the corresponding local face image is a real face and whether the target part therein is in the target state, and the target authentication result includes whether the face to be identified is a real face and whether the face to be identified has performed the target action.
7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.