Image processing method and related device
A multi-feature joint detection method extracts both semantic and detail features from the palm image. This addresses the insufficient robustness and accuracy of existing palm liveness detection and enables efficient, secure palm liveness detection.
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Applications
- Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date: 2025-12-02
- Publication Date: 2026-07-02
AI Technical Summary
Existing palm liveness detection methods rely on a limited set of features and are easily affected by changes in lighting and shooting angle. They lack robustness and accuracy, making it difficult to effectively prevent attacks such as screen recapture and paper printing.
A multi-feature joint detection method is adopted to extract semantic features and detail features of the palm image, covering information such as palm shape, texture, edges, and background. The high-dimensional semantic features and the low-dimensional detail features complement each other, and a liveness detection model makes the overall determination.
The method improves the robustness and accuracy of palm liveness detection, effectively distinguishes live palms from non-live palms, enhances the security and reliability of palmprint recognition, and helps prevent spoofing attacks.
Description
Image processing methods and related equipment
[0001] This application claims priority to Chinese Patent Application No. 202411977176.2, filed on December 27, 2024, entitled "An Image Processing Method and Related Equipment", the entire contents of which are incorporated herein by reference.
Technical Field
[0002] This application relates to the field of computer technology, and in particular to an image processing method, an image processing apparatus, a computer device, a computer-readable storage medium, and a computer program product.
Background Technology
[0003] With the development of artificial intelligence technology, biometric recognition technology has shown broad application prospects in various business scenarios requiring identity verification, such as mobile payment, access control, public transportation, and mobile phone unlocking. Palmprint recognition technology, as a type of biometric recognition, has significant advantages in privacy protection because it does not require direct contact with the device or the collection of sensitive personal biometric information. However, in practical applications, palmprint recognition also faces various attacks (such as screen-recapture attacks and paper-print attacks). Liveness detection of the palm in the image is a crucial line of defense for palmprint recognition security.
[0004] Technical content
[0005] This application provides an image processing method and related equipment that can improve the robustness and accuracy of liveness detection of a hand in an image.
[0006] This application provides an image processing method, the method comprising:
[0007] Obtain the palm image to be processed; the palm image contains a palm.
[0008] Semantic features are extracted from the palm image, and liveness detection is performed on the palm in the palm image based on the semantic features to obtain a first detection result; wherein, the semantic features are used to characterize the semantic information of the palm image;
[0009] At least one detail feature is extracted from the palm image, and liveness detection is performed on the palm in the palm image based on each of the at least one detail feature to obtain at least one second detection result; wherein, each of the at least one detail feature is used to characterize a detail information of the palm image;
[0010] Based on the first detection result and at least one second detection result, a liveness detection is performed on the palm in the palm image to obtain a liveness detection result of the palm in the palm image, wherein the liveness detection result is used to indicate whether the palm in the palm image is a live palm.
[0011] This application provides an image processing apparatus, which includes:
[0012] An acquisition unit is configured to acquire the palm image to be processed; the palm image contains a palm.
[0013] A processing unit is configured to extract semantic features from a palm image, and perform liveness detection on the palm in the palm image based on the semantic features to obtain a first detection result; wherein the semantic features are used to characterize the semantic information of the palm image; extract at least one detail feature from the palm image, and perform liveness detection on the palm in the palm image based on each of the at least one detail feature to obtain at least one second detection result; wherein each of the at least one detail feature is used to characterize a detail information of the palm image; perform liveness detection on the palm in the palm image based on the first detection result and at least one second detection result to obtain a liveness detection result of the palm in the palm image, wherein the liveness detection result is used to indicate whether the palm in the palm image is a live palm.
[0014] This application provides a computer device, which includes: a processor adapted to execute a computer program; and a computer-readable storage medium storing the computer program, which, when executed by the processor, implements the image processing method described above.
[0015] This application provides a computer-readable storage medium storing a computer program which, when loaded and executed by a processor, implements the image processing method described above.
[0016] This application provides a computer program product, which includes a computer program or computer instructions. When the computer program or computer instructions are executed by a processor, they implement the above-described image processing method.
[0017] Brief description of the attached figures
[0018] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application. Those skilled in the art can obtain other drawings based on these drawings without creative effort.
[0019] Figure 1 is an architecture diagram of an image processing system provided by some exemplary embodiments of this application;
[0020] Figure 2 is a flowchart illustrating an image processing method provided by some exemplary embodiments of this application;
[0021] Figure 3 is a schematic diagram of an image processing flow provided by some exemplary embodiments of this application;
[0022] Figure 4 is a flowchart illustrating another image processing method provided by some exemplary embodiments of this application;
[0023] Figure 5a is a schematic diagram of a hand detection frame and the detection results of key points of the hand provided by some exemplary embodiments of this application;
[0024] Figure 5b is a schematic diagram of cropping a palm image provided by some exemplary embodiments of this application;
[0025] Figure 5c is a schematic diagram of an image of a paper cutout attached to a hand, provided by some exemplary embodiments of this application;
[0026] Figure 6 is a schematic diagram of the processing flow of a liveness detection model provided by some exemplary embodiments of this application;
[0027] Figure 7 is a schematic diagram of the training process of a first model provided by some exemplary embodiments of this application;
[0028] Figure 8 is a schematic diagram of the structure of an image processing apparatus provided by some exemplary embodiments of this application;
[0029] Figure 9 is a schematic diagram of the structure of a computer device provided by some exemplary embodiments of this application.
Detailed Implementation
[0030] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0031] In this application, the terms "first," "second," etc., are used to distinguish identical or similar items with substantially the same function. It should be understood that there is no logical or temporal dependency between "first," "second," and "nth," nor does it limit the quantity or execution order. The term "at least one" refers to one or more, and "multiple" means two or more; for example, "at least one image" refers to one, two, or more images. The terms "module" or "unit" refer to a computer program or part of a computer program with a predetermined function, working together with other related parts to achieve a predetermined goal, and can be implemented wholly or partially using software, hardware (such as processing circuitry or memory), or a combination thereof. Similarly, a processor (or multiple processors or memory) can be used to implement one or more modules or units. Furthermore, each module or unit can be part of a larger module or unit that contains the functionality of that module or unit.
[0032] Some palm liveness detection methods use a single feature and are easily affected by factors such as changes in lighting and shooting angle, leading to misjudgments. Their robustness and accuracy need improvement. This application proposes a multi-feature joint palm liveness detection technique. This technique extracts semantic features (high-dimensional features with rich semantic information) from the palm image, and also extracts detail features (low-dimensional features with additional detail information). Through the complementarity between high-dimensional and low-dimensional features, it can more accurately distinguish whether the palm in the palm image is a live palm or a non-live palm, improving the robustness and accuracy of palm liveness detection and thus enhancing the security of palmprint recognition.
[0033] In this embodiment, high-dimensional features refer to feature vectors with a large number of dimensions that can express rich semantic information. For example, high-dimensional features may include global information such as the overall texture and shape of the palm. Low-dimensional features refer to feature vectors with a smaller number of dimensions that focus on local details, such as the fine lines, edges, or vein distribution of palm prints. Low-dimensional features are computationally efficient and can effectively reflect local details.
[0034] By transforming raw image pixels into structured feature vectors (semantic features), machine-readable input features can be provided for subsequent operations. For example, the semantic features of a palm image can include palm attribute information and associated auxiliary information. Palm attributes can include geometric features of the palm, palmprint features, etc.; associated auxiliary information can include background information, border information, etc.
[0035] The image processing solution provided in this application can be applied to the following business scenarios, including but not limited to: door locks, access control, palm payment, palm registration, public transportation, and identity verification. For example, in the palm registration scenario, when a user registers for a palm payment application using their palmprint, the terminal (such as a mobile phone or palm-printing device) can take a photo of the palm as the palm image to be processed. The palm image is then analyzed according to the liveness detection process provided in this application to determine whether the palm in the image is a live palm. This determines whether the palm image is a real palm image taken by the user or a fake palm image that is copied or printed on paper, thus ensuring the security of palmprint registration.
[0036] It is understood that in the specific embodiments of this application, data related to palm images, palm prints, etc. are involved. When the above embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use and processing of related data must comply with relevant laws, regulations and standards.
[0037] The architecture of the image processing system provided in the embodiments of this application will now be described with reference to the accompanying drawings.
[0038] Please refer to Figure 1, which is an architecture diagram of an image processing system provided by some exemplary embodiments of this application. As shown in Figure 1, the image processing system includes a terminal device 101, a computer device 102, and a database 103; both the terminal device 101 and the database 103 can establish a communication connection with the computer device 102 via wired or wireless means. The computer device 102 is used to execute the image processing flow, and both the terminal device 101 and the database 103 can be used to provide data support for the image processing process of the computer device 102. Depending on the deployment location, the database 103 can be a local database of the computer device 102, or a cloud database that can establish a connection with the computer device 102. Depending on its attributes, the database 103 can be a public database, i.e., a database open to all computer devices; or it can be a private database, i.e., a database open only to a specific computer device (such as computer device 102). Computer device 102 may include any one or both of terminal devices and servers. Terminal devices include, but are not limited to, smartphones, tablets, smart wearable devices, smart voice interaction devices, smart home appliances, personal computers, in-vehicle terminals, smart cameras, etc., and this application does not limit their number. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms, but is not limited to these. The number of servers is not limited.
[0039] The image processing flow performed by computer device 102 generally includes the following:
① Acquiring a palm image to be processed. This palm image contains a palm. In some embodiments, terminal device 101 can capture an image of the target object's palm in a corresponding business scenario to obtain an original palm image, and send this original palm image to computer device 102. Computer device 102 can use this original palm image as the palm image to be processed. In some embodiments, terminal device 101 can also crop the original palm image and send the cropped original palm image to computer device 102. Computer device 102 can use this cropped original palm image as the palm image to be processed. This ensures a uniform size of the palm images processed by computer device 102 and saves bandwidth resources for image transmission. In some embodiments, terminal device 101 can also record a video of the target object's palm in a business scenario to obtain a palm video, select a palm image frame from the palm video according to preset rules (such as selecting a specified frame, selecting the frame with the highest quality score, or random selection), and send it to computer device 102.
② Performing semantic feature extraction on the palm image to obtain semantic features that represent the semantic information of the palm image. The semantic information represented by these features may include foreground and background information of the palm image. The foreground information is the information of the entire palm, and the background information is the scene in which the palm is located.
③ Performing detail feature extraction on the palm image to obtain at least one detail feature, each of which represents one kind of detail information of the palm image. These detail features may include image detail features related to the palm in the palm image. For example, detail features may include texture features, which can be used to represent the texture information of the palm image, including the texture of the palm. One detail feature can be used to intercept one type of attack; if multiple detail features are extracted from the palm image, different detail features can be used to intercept different types of attacks.
④ Performing liveness detection on the palm in the palm image based on the semantic features and the at least one detail feature to obtain a detection result. This detection result indicates whether the palm in the palm image is a live palm. In some implementations, classification prediction can be performed on the palm in the palm image based on the semantic features and on each detail feature separately, where the classification prediction predicts whether the palm in the palm image is a live palm or a non-live palm. After the respective detection results are obtained, they are combined to obtain the final detection result. The semantic features and the at least one detail feature provide multiple perspectives for determining whether the palm in the palm image is a live palm. If the semantic features determine that the palm is non-live, or if any detail feature determines that the palm is non-live, then the detection result indicates that the palm in the palm image is non-live, thus identifying the palm image as an attack image, i.e., a fake palm image.
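The combination rule described in ④ can be sketched as follows. This is a minimal illustration, not the application's implementation: the `DetectionResult` structure and the names `fuse_results` and `rejecting_sources` are hypothetical, and each check is assumed to have already been reduced to a live/non-live verdict.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class DetectionResult:
    """Outcome of one liveness check (hypothetical structure)."""
    source: str    # which feature produced it, e.g. "semantic" or "texture"
    is_live: bool  # True if this check considers the palm live


def fuse_results(first: DetectionResult,
                 seconds: Sequence[DetectionResult]) -> bool:
    """Return True only if every check considers the palm live.

    A single non-live verdict, from the semantic check or from any
    detail check, marks the image as an attack image.
    """
    if not seconds:
        raise ValueError("at least one detail-feature result is required")
    return first.is_live and all(r.is_live for r in seconds)


def rejecting_sources(first: DetectionResult,
                      seconds: Sequence[DetectionResult]) -> List[str]:
    """List the checks that voted non-live, useful for logging."""
    return [r.source for r in (first, *seconds) if not r.is_live]
```

Under this rule each detail feature acts as an independent veto, which matches the idea that different detail features intercept different attack types.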
[0040] In some embodiments, a liveness detection model is deployed in the computer device 102. The computer device 102 can invoke the liveness detection model to perform the aforementioned semantic feature extraction processing, detail feature extraction processing, and liveness detection. This liveness detection model can be applied to a palmprint recognition system. If the detection result indicates that the palm in the palm image is a live palm, the palmprint in the palm image can be recognized to obtain a recognition result, which can be used to indicate the identity to which the palmprint belongs. Thus, based on the detection result given by the liveness detection model, it can be ensured that palmprint recognition is performed on the palmprint of a live palm, thereby ensuring the accuracy and security of palmprint recognition.
[0041] In some embodiments, the computer device 102 can associate and store the palm image and the detection results obtained from processing the palm image in a database. This allows the associated detection results to be directly retrieved from the database when the same or highly similar palm images are subsequently identified. Based on the indication of the detection results, the computer device 102 can process the palm image accordingly. For example, if the detection results indicate that the palm in the palm image is not a living hand, the palm image can be intercepted, including outputting an alarm message to the terminal device 101 and stopping the corresponding business process to ensure business security. If the computer device includes a server and a terminal device, the server can execute the image processing flow shown in ①-④, or the terminal device can execute ①, and the server can execute the flow shown in ②-④.
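As a rough sketch of the store-and-reuse behavior described above, the following keeps detection results keyed by a digest of the image bytes. The class and function names are hypothetical, an in-memory dictionary stands in for the database, and only byte-identical images are matched; recognizing "highly similar" images would need a similarity measure that the text does not specify and that is not shown here.

```python
import hashlib
from typing import Callable, Dict


class DetectionCache:
    """Stores liveness results keyed by a SHA-256 digest of the image bytes."""

    def __init__(self) -> None:
        self._results: Dict[str, bool] = {}

    @staticmethod
    def _key(image_bytes: bytes) -> str:
        return hashlib.sha256(image_bytes).hexdigest()

    def get_or_detect(self, image_bytes: bytes,
                      detect: Callable[[bytes], bool]) -> bool:
        """Reuse a stored result if present; otherwise run detection and store it."""
        key = self._key(image_bytes)
        if key not in self._results:
            self._results[key] = detect(image_bytes)
        return self._results[key]


def handle(is_live: bool) -> str:
    """Map a detection result to an action (labels are illustrative)."""
    # "intercept" stands for raising an alarm and stopping the business flow.
    return "continue" if is_live else "intercept"
```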
[0042] In the image processing system provided in this application embodiment, a computer device can acquire a palm image to be processed. By performing semantic feature extraction processing on the palm image, semantic features can be extracted. By performing detail feature extraction processing on the palm image, detail features can be extracted. Semantic features are high-dimensional features containing rich semantic information in the palm image, while detail features are low-dimensional features containing additional detailed information. Liveness detection is performed on the palm image based on semantic and detail features. The two features are complementary to accurately detect whether the palm in the palm image is a live palm, improving the robustness and accuracy of liveness detection. Through this process, the image processing system can effectively improve the security and reliability of palmprint recognition.
[0043] The image processing method provided in the embodiments of this application will be described next.
[0044] Please refer to Figure 2, which is a flowchart illustrating an image processing method provided by some exemplary embodiments of this application. This image processing method can be executed by a computer device (computer device 102 in the image processing system shown in Figure 1), and the image processing method may include the following:
[0045] S201, Obtain the palm image to be processed; the palm image contains a palm.
[0046] The palm image contains a palm. Depending on whether the palm possesses vital characteristics and biological activity, it can be a live palm or a non-live palm. A live palm refers to a real palm with vital characteristics and biological activity, possessing body temperature, natural texture and elasticity, and capable of natural hand gestures (such as opening and closing fingers), like a real human hand; the palm print of a live palm is genuine. A non-live palm refers to a palm lacking vital characteristics and biological activity, such as a palm in a screenshot, a forged palm model, a printed or digitally synthesized palm, or a palm replica; the palm print of a non-live palm is fake. The palm image may also contain other information, such as the fingers and the scene in which the palm is situated.
[0047] In some embodiments, the palm image may include any of the following: an original palm image (obtained directly by capturing the palm with a capture device), a cropped original palm image, a palm image selected from a palm video (e.g., a randomly selected palm image, the palm image with the highest quality score selected from the palm video, or a specified palm image selected from the palm video), or a cropped palm image selected from a palm video.
[0048] In some implementations, the method for acquiring the palm image to be processed may include the following steps 1.1-1.4:
[0049] Step 1.1: Obtain palm video, which includes multiple frames of palm images.
[0050] This palm video can be obtained in the following business scenarios: scenarios where the target object uses their palm to register their identity, and scenarios where the target object uses their palm to verify their identity. For example, in a user palmprint registration scenario, the target object can use a mobile phone to film their own palm to obtain a palm video.
[0051] Step 1.2: Calculate the quality of each frame of the palm image in the palm video to obtain the quality score of each frame of the palm image.
[0052] The computer device can call the image quality filtering module to calculate the quality score of each frame of the palm image in the palm video. Based on the quality score, the image quality filtering module can also filter out palm images with problems, such as dirty palm lines, overexposed images, dark lighting, and palm tilt angles that are too large, to prevent false positives in liveness detection due to image quality defects.
[0053] Step 1.3: Select the palm image with the highest quality score from the palm video as the candidate palm image;
[0054] Based on the quality score of each palm image in the palm video, the palm image with the highest quality score can be selected as the candidate palm image. That is, the candidate palm image is the palm image with the highest quality score in the palm video. The candidate palm image can be used to determine the palm image to be processed, ensuring the image quality of liveness detection and helping to improve the accuracy of liveness detection.
[0055] Step 1.4: Determine the candidate palm image as the palm image to be processed; or, crop the candidate palm image according to the preset cropping size to obtain the cropped candidate palm image, and determine the cropped candidate palm image as the palm image to be processed.
[0056] After step 1.4, the palm image to be processed is either the candidate palm image or the cropped candidate palm image. During cropping, key point information of the palm in the candidate palm image can be obtained. This key point information can include multiple key points of the palm identified by a palm key point detection algorithm, such as five key points representing the five fingers. A Region of Interest (ROI) detector can also be used to automatically detect the position of the palm in the candidate palm image and calculate the detection box (also called the bounding box) containing the palm. Following the palm cropping algorithm, the candidate palm image can be cropped based on the key point information of the palm, the detection box containing the palm, and a preset cropping size. The preset cropping size and the detection box containing the palm are used to determine the size of the cropping box, while the key point information of the palm is used to calculate a circle center, which locates the position of the cropping box. The candidate palm image is cropped according to the cropping box to obtain the cropped candidate palm image, which, unlike the original candidate palm image, has the preset image size.
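One possible way to compute such a cropping box is sketched below. The text does not give the exact formulas, so two assumptions are made and should be read as illustrative only: the center is taken as the mean of the key points, and the side of the (square) cropping box is the longer edge of the detection box. The function names are hypothetical, and the final resize to the preset size is represented only by a scale factor.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop_box(keypoints: List[Point], det_box: Box,
             image_size: Tuple[int, int]) -> Box:
    """Square cropping box centered on the centroid of the key points.

    Assumptions (not from the source): the center is the mean of the key
    points, and the side is the longer edge of the detection box.
    """
    if not keypoints:
        raise ValueError("need at least one key point")
    cx = sum(x for x, _ in keypoints) / len(keypoints)
    cy = sum(y for _, y in keypoints) / len(keypoints)
    left, top, right, bottom = det_box
    width, height = image_size
    # The box can never be larger than the image itself.
    side = min(max(right - left, bottom - top), width, height)
    # Shift the box so that it stays inside the image.
    x0 = min(max(int(round(cx - side / 2)), 0), width - side)
    y0 = min(max(int(round(cy - side / 2)), 0), height - side)
    return (x0, y0, x0 + side, y0 + side)


def scale_to_preset(box: Box, preset_size: int) -> float:
    """Resize factor that maps the cropped square to the preset size."""
    return preset_size / (box[2] - box[0])
```

Clamping the box to the image bounds keeps the crop valid when the palm sits near an edge, at the cost of the palm no longer being exactly centered.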
[0057] By pre-cropping, even if the original palm images captured by different terminal devices have different sizes, cropping can achieve a uniform size, facilitating subsequent processing. Appropriate cropping can also remove some background interference during shooting, improving the efficiency and accuracy of liveness detection. If the computer device is a server, step 1.1 can be performed by the terminal device, while steps 1.2-1.4 can be performed by the server; alternatively, steps 1.1-1.4 can be performed by the terminal device, which has a dedicated program executing these steps before sending the palm image to the server. Cropping saves bandwidth resources for image transmission and allows for efficient transmission to the computer device for processing.
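The frame selection in steps 1.2 and 1.3 can be sketched as below. The scoring function is passed in as a parameter because the text does not define how the quality score is computed; `select_candidate` and the `min_score` threshold are hypothetical names used only for this illustration.

```python
from typing import Callable, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def select_candidate(frames: Sequence[Frame],
                     score: Callable[[Frame], float],
                     min_score: float = 0.0) -> Optional[Frame]:
    """Return the highest-scoring frame, or None if no frame reaches min_score."""
    best: Optional[Frame] = None
    best_score: Optional[float] = None
    for frame in frames:
        s = score(frame)
        if s < min_score:
            continue  # filtered out as low quality (e.g. overexposed, too dark)
        if best_score is None or s > best_score:
            best, best_score = frame, s
    return best
```

Returning `None` when every frame is filtered out lets the caller ask for a new capture instead of running liveness detection on a defective image.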
[0058] In some implementations, the method for obtaining the palm image to be processed may include the following steps: obtaining an original palm image, which is obtained by directly capturing the palm with a capture device, or can be obtained in identity registration or identity recognition business scenarios; determining the original palm image as the palm image to be processed, or cropping the original palm image according to a preset cropping size, and determining the cropped original palm image as the palm image to be processed; that is, the palm image can be the original palm image or the cropped original palm image. The cropping process for the original palm image can refer to the cropping process for the candidate palm images described above, and will not be elaborated here.
[0059] S202, extract semantic features from the palm image, and perform liveness detection on the palm in the palm image based on the semantic features to obtain a first detection result.
[0060] In some embodiments, a liveness detection model can be invoked to perform semantic feature extraction processing on the palm image to obtain semantic features of the palm image. Semantic feature extraction processing refers to using computer vision technology to understand and interpret the content of the palm image. In some embodiments, the liveness detection model includes a first model, which can be a multimodal large model obtained by pre-training and fine-tuning on a massive number of image-text pairs (each consisting of an image and text related to the image). By performing semantic feature extraction processing through the first model, features related to the semantics of the palm image (i.e., semantic features) can be accurately extracted. Semantic features are used to characterize the semantic information of the palm image, which includes, but is not limited to, palm attributes and associated auxiliary information. Palm attributes include, but are not limited to, the shape of the palm, the width-to-length ratio of the palm, the palm print (such as the texture details of the palm print, the palm print bifurcation points, the palm print lines), the position of the palm, etc. In the embodiments of this application, associated auxiliary information, also called associated background information, refers to environmental information other than the palm itself that can be used to assist in determining the source of the image, and may include background information, screen borders, print borders, etc. Background information refers to the scene objects or environmental features surrounding the palm in the image. Screen borders refer to the physical borders of the device screen and UI elements (such as status bars and virtual buttons) that may appear at the image edge when the palm image is displayed on an electronic screen. Print borders refer to the cut edges of the printed paper that may be included in the image if the image originates from a print medium.
[0061] S203, extract at least one detail feature from the palm image, and perform liveness detection on the palm in the palm image according to each of the at least one detail feature to obtain at least one second detection result.
[0062] In some embodiments, a liveness detection model can be invoked to extract detail features from the palm image, resulting in at least one detail feature of the palm image. Each detail feature is used to characterize one type of detail information in the palm image, including but not limited to: edge (e.g., screen border, print border), texture, imaging (e.g., color, pixels), and other detail information.
[0063] In some implementations, at least one detail feature may include, but is not limited to, border features, texture features, and imaging features. The border feature characterizes border detail information in the palm image, which may include planar borders (such as screen borders or paper borders). The texture feature characterizes texture detail information in the palm image, where the texture of a live palm differs from the texture of a non-live palm (such as a palm in a screen photograph or a forged palm model). The imaging feature characterizes imaging detail information in the palm image and includes, but is not limited to, reflection features, moiré pattern features, and blurring features. The imaging detail information includes, but is not limited to: screen reflection information characterized by the reflection feature (i.e., reflections appearing when photographing a screen), screen moiré pattern information characterized by the moiré pattern feature (i.e., water-ripple-like stripes appearing when photographing a screen), and low-resolution screen information characterized by the blurring feature. The detail information represented by detail features overlaps with the semantic information represented by semantic features. If some information is missed or not extracted accurately during semantic feature extraction, the determination of whether the palm in the palm image is a live palm may not be accurate enough, so an attack image may not be intercepted by semantic features alone. By adding detail features alongside the semantic features, the detail features can also be used to determine whether the palm in the palm image is a live palm, and thus whether the palm image is an attack image. This provides multiple perspectives for detecting whether the palm image is an attack image, thereby improving the accuracy of liveness detection.
[0064] S204, perform liveness detection on the palm in the palm image based on the first detection result and at least one second detection result to obtain the liveness detection result of the palm in the palm image.
[0065] In some embodiments, liveness detection can be performed on the palm in the palm image based on semantic features and at least one detail feature, yielding separate detection results. The final liveness detection result can be determined based on these results. Alternatively, in some embodiments, a liveness detection model can be invoked to perform liveness detection on the palm in the palm image based on semantic features and at least one detail feature, yielding a detection result. The detection result indicates whether the palm in the palm image is a live hand.
[0066] Based on whether the detection result indicates that the palm in the palm image is a live palm, the palm image falls into one of two categories: real palm image or fake palm image. The palm in a real palm image is a live palm, and the palm in a fake palm image is a non-live palm. A fake palm image is an attack image (or simply an attack) and includes at least one of the following: a paper-printed image (a paper printing attack), a screen re-photographed image (a screen re-photographing attack), and a paper-cut hand image, that is, an image of cut-out paper attached to a hand (also a paper printing attack). A paper-printed image refers to a palm image printed on a piece of paper for display; a screen re-photographed image refers to a palm image obtained by photographing a palm image displayed on a screen; and a paper-cut hand image refers to a palm image obtained by attaching a cut-out palm-shaped piece of paper to a real palm and then photographing it. These fake palm images are all obtained by photographing non-live hands. Real palm images can be obtained by the computer device photographing a live hand, or by another device photographing a live hand and then sending the photograph to the computer device. Therefore, by analyzing the palm image and combining semantic and detail features to determine whether the palm in the image is a live hand, it is possible to determine whether the palm image is a real palm image or a fake palm image.
[0067] In some embodiments, based on the detection results obtained by performing liveness detection on the palm in the palm image, the following ① or ② can be performed.
[0068] ① If the detection result indicates that the hand in the palm image is not a live hand, the palm image is intercepted. In some implementations, the palm image can be determined to be a fake palm image based on the detection result. Continuing the corresponding business process with this palm image would put business security at risk, so the palm image can be intercepted. The interception processing includes at least one of the following: deleting the palm image, outputting alarm information based on the detection result of the palm image, rejecting the business request corresponding to the palm image, and adding the source address of the palm image to a high-risk database. The alarm information can be used to indicate that the palm image is invalid. If the palm image is used for identity registration, the corresponding business request can be an identity registration request, and the corresponding business process is an identity registration process; if the palm image is used for identity recognition, the corresponding business request can be an identity recognition request, and the corresponding business process is an identity recognition process. When the hand in the palm image is not a live hand, rejecting the corresponding business request stops the business process and avoids security issues such as identity forgery and impersonation. When a palm image is determined to be an attack image, its source address (such as the IP address of the palm video to which the image belongs) can be treated as a malicious address. By adding this address to the high-risk database, all palm images or videos originating from it can be intercepted, improving processing efficiency and enhancing security. Because it is based on liveness detection, this solution offers strong interception capabilities against different types of attacks. Applying this solution to payment, access control, and other business scenarios can significantly improve security.
[0069] ② If the detection result indicates that the palm in the palm image is a live palm, approval processing is performed on the palm image. In some implementations, the palm image can be determined to be a real palm image based on the detection result, and the computer device can perform approval processing on the palm image to advance the corresponding business process. The approval processing includes at least one of the following: adding the palm image to an image library, outputting approval information based on the detection result of the palm image, and accepting the business request corresponding to the palm image. Once the palm image has been added to the image library, a later business request can be handled efficiently by directly checking whether the same or a similar palm image already exists in the image library. The approval information can be used to indicate that the palm image is legitimate and to allow the object (such as a user) to proceed to the next step. Accepting the business request advances the business process and ensures that it runs smoothly and securely.
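The handling in ① and ② can be illustrated with a minimal sketch. The names `Outcome` and `route_palm_image`, the action labels, and the choice to apply every listed action at once are assumptions made for illustration; the application only requires at least one of the listed actions in each branch.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Outcome:
    """Hypothetical record of how one palm image was routed."""
    accepted: bool
    actions: List[str] = field(default_factory=list)


def route_palm_image(is_live: bool, source_address: str,
                     high_risk_sources: Set[str]) -> Outcome:
    """Intercept or approve a palm image based on the liveness result."""
    # Anything arriving from an address already in the high-risk
    # database is intercepted without further processing.
    if source_address in high_risk_sources:
        return Outcome(False, ["reject_request"])
    if not is_live:
        # Branch (1): interception. Here all listed actions are applied,
        # including flagging the source address as high risk.
        high_risk_sources.add(source_address)
        return Outcome(False, ["delete_image", "output_alarm",
                               "reject_request", "flag_source"])
    # Branch (2): approval processing.
    return Outcome(True, ["add_to_image_library", "output_approval",
                          "accept_request"])
```

In this sketch a source address stays blocked after its first intercepted image, which mirrors the described use of the high-risk database.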
[0070] In some embodiments, regardless of what the detection result for the palm image indicates, the palm image and the corresponding detection result can be associated and stored in the database. In this way, when the obtained palm image is the same as or has a similarity to the palm image stored in the database that reaches a preset similarity threshold, the associated detection result can be directly obtained from the database. By reusing the detection result, the liveness detection efficiency of palm images that are the same or have a high degree of similarity can be improved, and computing resources can be saved to a certain extent.
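The reuse of stored detection results can be sketched as follows. This is only an illustration: the application does not say how similarity between palm images is computed, so the sketch assumes each image is summarized by an embedding vector compared with cosine similarity, and the class name `DetectionResultStore` and the threshold value 0.98 are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class DetectionResultStore:
    """Associates image embeddings (an assumption) with detection results."""

    def __init__(self, similarity_threshold: float = 0.98) -> None:
        self.similarity_threshold = similarity_threshold
        self._entries: List[Tuple[List[float], bool]] = []

    def store(self, embedding: Sequence[float], is_live: bool) -> None:
        self._entries.append((list(embedding), is_live))

    def lookup(self, embedding: Sequence[float]) -> Optional[bool]:
        """Return the result of the most similar stored image, or None
        if no stored image reaches the preset similarity threshold."""
        best_result: Optional[bool] = None
        best_similarity = self.similarity_threshold
        for stored, is_live in self._entries:
            similarity = cosine_similarity(embedding, stored)
            if similarity >= best_similarity:
                best_similarity = similarity
                best_result = is_live
        return best_result
```

When `lookup` returns `None`, the caller would run the full liveness detection and then call `store` with the new result.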
[0071] Based on the above embodiments, a schematic diagram of the image processing flow shown in Figure 3 can be provided. As shown in Figure 3, the front end (such as a terminal) can acquire the original palm image. Taking the user's palmprint registration scenario as an example, the original palm image can be a registration photo obtained by the user using a mobile phone to take a picture of their palm. The preprocessing program deployed in the front end can automatically detect the position of the palm in the original palm image, thereby calculating the palm region (represented by the detection box where the palm is located (referred to as the palm detection box)). Subsequently, based on the original palm image and the palm detection bounding box, keypoint detection (or registration point detection) can be performed on the palm to identify key points. Then, based on the key points and the detection bounding box (i.e., the palm region), the original palm image is cropped to obtain a cropped original palm image. This cropped original palm image is then transmitted to the backend (such as a server). The backend can perform liveness detection processing on the cropped original palm image, including: extracting semantic features and detail features of the original palm image, and detecting whether the palm is a live palm based on the semantic features and detail features. If a live palm is detected, it indicates that the liveness detection has passed, and the original palm image can be added to the database (such as a registry). If a non-live palm is detected, it indicates that the liveness detection has failed, and the backend can return an alarm message to the frontend.
[0072] The image processing method provided in this application can obtain semantic features of a palm image through semantic feature extraction and detail features of the palm image through detail feature extraction. Based on the semantic features and the detail features, liveness detection is performed on the palm in the palm image to determine whether the palm is a live palm. By combining high-dimensional and low-dimensional features in this way, the complementarity between the features improves the accuracy and robustness of liveness detection. Furthermore, if the detection result indicates that the palm in the palm image is not a live palm, the palm image is intercepted, which can effectively block various attacks and ensure the security of palmprint recognition.
[0073] Please refer to Figure 4, which is another schematic flowchart of an image processing method provided by some exemplary embodiments of this application. This image processing method can be executed by a computer device (computer device 102 in the image processing system shown in Figure 1), and the image processing method may include the following:
[0074] S401, Obtain the palm image to be processed; the palm image contains a hand.
[0075] In some implementations, the image processing method provided in this application can be executed by invoking a liveness detection model, which is responsible for semantic feature extraction, detail feature extraction, and liveness detection. The liveness detection model includes a first model and at least one second model, and each second model has fewer network layers than the first model. The first model is a large model, which can be a multimodal large model (e.g., a deep learning classification model based on CLIP, a contrastive language-image pre-training multimodal large model). It is trained on a large-scale dataset and can extract high-dimensional features rich in semantic information. The second model is a small model, which can be a lightweight deep learning model (e.g., EfficientNet) or another neural network model. Because it has fewer network layers, the second model focuses more on the detail features of the image, and the authenticity of palm images can be determined efficiently through it. In the field of artificial intelligence, large models and small models are two categories of models classified according to parameter scale and capability. Large models typically refer to deep learning models with a large parameter scale; trained on large-scale datasets, they possess powerful generalization capabilities and can handle complex tasks. Small models refer to deep learning models with relatively few parameters and relatively simple structures; they are typically used to handle specific tasks and have lower computational complexity and resource consumption.
[0076] In this embodiment, the first model may include an image encoder and a classifier. The image encoder may be based on ResNet and used to convert the input image into a high-dimensional vector representation. The classifier may employ a fully connected neural network, used to input the high-dimensional vector representation (feature vector) output by the encoder into the fully connected layer, calculate the class score through softmax, and complete the classification task of judging whether the image is true or false. The image encoder in the first model can be pre-trained through contrastive learning.
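The classifier stage described above (a fully connected layer followed by softmax) can be sketched in plain Python. The function names, the two-class ordering (live first, non-live second), and the toy weights are assumptions for illustration; the feature vector stands in for the output of the image encoder, which is not modeled here.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fully_connected(features: Sequence[float],
                    weights: Sequence[Sequence[float]],
                    bias: Sequence[float]) -> List[float]:
    """One fully connected layer: one logit per output class."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def classify(features: Sequence[float],
             weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> Dict[str, float]:
    """Map an encoder feature vector to class scores.
    Assumed class order: index 0 is live, index 1 is non-live."""
    probs = softmax(fully_connected(features, weights, bias))
    return {"live": probs[0], "non_live": probs[1]}
```

A real first model would use learned weights and a much higher-dimensional feature vector; only the flow from feature vector to class scores is shown.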
[0077] The second model may include an image encoder (e.g., including an input layer and a convolutional layer) and an output layer, where the image encoder is used to extract image features, and the output layer (also called a classifier) is used to receive the output of the image encoder and generate classification results.
[0078] S402, extract semantic features of the palm image using the first model.
[0079] The first model is used to perform semantic feature extraction processing. Because the first model has strong semantic feature extraction capabilities and robustness, this application can leave the palm image uncropped and input it directly into the image encoder of the first model. This avoids the loss of semantic information caused by cropping, such as the loss of screen borders in screen re-photographed images, or the loss of paper border information in paper-printed images or paper-cut hand images. Without cropping, the first model can combine all the information in the palm image to comprehensively determine whether the palm in the palm image is a live palm, and thus whether the palm image is a fake palm image. The first model can be used to intercept various types of fake palm images.
[0080] S403, extract at least one detailed feature from the palm image using at least one second model.
[0081] At least one second model is used to perform detail feature extraction processing; there are N types of detail features, and N second models in total, each dedicated to extracting a specific detail feature; N is a positive integer. For example, the second models include second model M1 and second model M2. Second model M1 can be called to perform detail feature extraction processing on the palm image to obtain detail feature A1, and second model M2 can be called to perform detail feature extraction processing on the palm image to obtain detail feature A2. Each detail feature can be used independently to determine whether the palm in the palm image is a live palm, and thus, a comprehensive determination can be made as to whether the palm image is an attack image.
[0082] In some embodiments, based on the detailed features of the palm image and the second model included in the liveness detection model, when the computer device executes S403, it includes the following contents as shown in (1)-(3):
[0083] (1) The detailed features of the palm image include the border features, and the liveness detection model includes the second model M1; the computer device can call the second model M1 to extract the border features of the palm image.
[0084] The second model M1 determines whether a hand in a palm image is a live hand by identifying screen borders and / or paper borders. This second model M1 can be used to intercept paper printing attacks and / or screen re-photographing attacks. Since borders are usually located at the image edges, and cropping may cause the loss of border information, the computer device does not crop the palm image when it is processed by the second model M1. By extracting the border features of the palm image, if the border information represented by the border features is a screen border or a paper border, it can be determined that the hand in the palm image is not a live hand. Furthermore, the attack type can be determined based on the type of border information: if the border information is a screen border, then the palm image can be determined to be a screen re-photographed image (a type of screen re-photographing attack); if the border information is a paper border, then the palm image can be determined to be a paper printing image or a paper-cut hand image (a type of paper printing attack).
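The mapping from border type to the decision of second model M1 can be written out as a small sketch. The enum `Border` and the function `m1_decision` are hypothetical names; how the border type is recognized from the border features is not modeled.

```python
from enum import Enum
from typing import Optional, Tuple


class Border(Enum):
    """Border information assumed to be recognized from border features."""
    NONE = "none"
    SCREEN = "screen"
    PAPER = "paper"


def m1_decision(border: Border) -> Tuple[bool, Optional[str]]:
    """Return (is_live_according_to_M1, attack_type).
    A screen border maps to a screen re-photographing attack and a
    paper border to a paper printing attack, as described above."""
    if border is Border.SCREEN:
        return False, "screen re-photographing attack"
    if border is Border.PAPER:
        return False, "paper printing attack"
    # No border found: M1 raises no objection. Other models may still
    # classify the palm as non-live.
    return True, None
```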
[0085] (2) The detailed features of the palm image include texture features, and the liveness detection model includes the second model M2; the computer device can: crop the palm image according to the first cropping size to obtain the first cropped palm image; and call the second model M2 to extract the texture features from the first cropped palm image.
[0086] The second model, M2, primarily determines whether a hand in a palm image is a live hand by recognizing detailed features of the paper. This second model, M2, can be used to intercept paper-printing attacks (including printed paper images or images of cut-out paper pieces attached to a hand). Considering that liveness detection is subject to a large amount of complex background information interference in practical applications (e.g., the background may contain LED screen features), to reduce background information interference, the palm image can be cropped so that the second model, M2, focuses on texture information. If the texture details represented by the extracted texture features are the texture information of a printed paper piece, then it can be determined that the hand in the palm image is not a live hand, and the palm image is a printed paper image rather than a real palm image.
[0087] In some implementations, steps 2.1-2.3 can be performed to crop the palm image according to a first cropping size, thereby obtaining a first cropped palm image:
[0088] Step 2.1: Perform region detection on the palm image to obtain the palm region in the palm image.
[0089] A computer device can invoke a palm region of interest detector to detect the area containing the palm (i.e., the palm region) in a palm image. This palm region can be identified by a detection box. The detection box is a bounding box that encloses the entire palm, including the palm and all fingers. It can be represented by {x, y, w, h}, where (x, y) represents the location of the detection box (e.g., the coordinates of the top-left corner), and (w, h) represents the size of the detection box (e.g., w represents the width, and h represents the height). The detection box can be square, circular, or irregular in shape; for image processing efficiency, a square detection box is used in this application.
[0090] Step 2.2: Obtain multiple key points of the palm in the palm image; multiple key points are used to indicate the shape of the palm.
[0091] The computer device can use a palm keypoint detector to perform detection on the palm region in the palm image, obtaining keypoint detection results that include the key points of the palm in the image. For example, the palm keypoint detector can obtain 21 key points. The keypoint detection results can be represented as a set of key points $\{p_k\}$, where $p_k$ represents the coordinates of the k-th keypoint. For example, Figure 5a illustrates the hand detection box and the hand keypoint detection results: the detection box containing the hand in the palm image is square, and the detected hand keypoints include keypoints on each finger and keypoints on the edge of the palm. In some embodiments, the multiple key points of the palm can be some or all of the keypoints in the keypoint detection results that indicate the shape of the palm; the boundary of the palm can be located based on these multiple key points.
[0092] Step 2.3: According to the first cropping size, crop the palm image based on multiple key points and palm areas to obtain the first cropped palm image.
[0093] In some implementations, this application can use a palm-center tracking algorithm to crop the palm region, further reducing background interference with liveness detection. The computer device can determine the palm cropping box based on the multiple key points of the palm and the size of the detection box, adjust the palm cropping box according to the first cropping size, and crop the palm image based on the adjusted palm cropping box to obtain the first cropped palm image. When determining the palm cropping box, the position of the center of the palm circle, $\{x_c, y_c\}$, can be calculated based on the multiple key points of the palm; then, with the dimensions {w, h} of the detection box as the reference and centered on $\{x_c, y_c\}$, the palm cropping box is calculated. The palm cropping box can be circular, square, or irregular in shape. In actual deployment, the palm cropping box can be resized, that is, expanded or shrunk. The first cropping size refers to the size of the image that remains after cropping, and the size of the first cropped palm image is the same as the first cropping size. This application can provide a first expansion size margin1, determine the first cropping size based on margin1 and the size of the detection box, and adjust the palm cropping box according to the first cropping size. The adjusted palm cropping box is centered on $\{x_c, y_c\}$ and has the first cropping size, and it is used to crop the palm image to obtain the first cropped palm image. The first cropped palm image contains the complete palm print but may not contain complete fingers, which keeps the focus on the important palm print information while reducing interference from other information. For example, as shown in Figure 5b, a schematic diagram of palm image cropping, multiple key points on the edge of the palm can be selected from the keypoint detection results, including key points $p_1$, $p_2$, $p_3$, $p_6$, $p_{10}$, $p_{14}$, and $p_{18}$. Based on these key points, the palm circle can be calculated, and the box externally tangent to the palm circle is the palm cropping box. The first cropped palm image can be obtained by cropping based on this palm cropping box, or the palm cropping box can be adjusted first and the first cropped palm image obtained by cropping based on the adjusted box.
[0094] Following the cropping logic shown in steps 2.1-2.3 above, areas unrelated to the palm print can be accurately cropped out, allowing the first cropped palm image to be input into the second model M2 for detailed feature extraction. This model can focus on the texture analysis of the palm print and accurately extract the texture features of the palm.
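Steps 2.1-2.3 can be sketched as follows. Several details are assumptions, because the application does not fix them: the palm center is taken as the centroid of the selected palm-edge key points, the cropping size is taken as the detection box size scaled by the expansion margin, and the box is clamped to the image bounds. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def palm_center(edge_keypoints: List[Point]) -> Point:
    """Assumed palm center (x_c, y_c): centroid of the palm-edge key points."""
    if not edge_keypoints:
        raise ValueError("at least one key point is required")
    xs = [p[0] for p in edge_keypoints]
    ys = [p[1] for p in edge_keypoints]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def palm_crop_box(edge_keypoints: List[Point],
                  det_box: Tuple[float, float, float, float],
                  margin: float,
                  image_size: Tuple[int, int]) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the adjusted palm cropping box.

    det_box is {x, y, w, h}; only w and h are used as the size reference.
    Assumed cropping size: (w * margin, h * margin), centered on the palm
    center. Clamping means the crop can be smaller near the image edge."""
    _, _, w, h = det_box
    xc, yc = palm_center(edge_keypoints)
    crop_w, crop_h = w * margin, h * margin
    img_w, img_h = image_size
    left = max(0, round(xc - crop_w / 2))
    top = max(0, round(yc - crop_h / 2))
    right = min(img_w, round(xc + crop_w / 2))
    bottom = min(img_h, round(yc + crop_h / 2))
    return left, top, right, bottom
```

The returned tuple follows the common (left, top, right, bottom) convention, so it could be passed to an image library's crop routine.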
[0095] (3) The detailed features of the palm image include imaging features, and the liveness detection model includes the second model M3; the computer device can: crop the palm image according to the second cropping size to obtain the second cropped palm image; and call the second model M3 to extract the imaging features from the second cropped palm image.
[0096] In some implementations, the second model M3 primarily determines whether the hand in the palm image is a live hand by recognizing detail features of screen re-photographing. The second model M3 can be used to intercept borderless screen re-photographing attacks. To reduce background interference, the palm image can be cropped so that the second model M3 can focus on the imaging details of the re-photographed screen.
[0097] In some implementations, the second cropping size refers to the size of the image that remains after cropping, and the size of the second cropped palm image is the same as the second cropping size. The second cropping size can be determined based on a given second expansion size margin2 and the size of the detection box. The second cropping size can be used to adjust the palm cropping box, and the adjusted palm cropping box is centered on $\{x_c, y_c\}$ and has the second cropping size. For the specific implementation of cropping the palm image according to the second cropping size, refer to steps 2.1-2.3 above, which describe the process of cropping the palm image according to the first cropping size. The second cropping size is larger than the first cropping size. For example, margin1 = 1.0 and margin2 = 2.0; the first cropping size determined based on margin1 and the size of the detection box is denoted as C1, the second cropping size determined based on margin2 and the size of the detection box is denoted as C2, and C2 is greater than C1. Adjusting the palm cropping box according to these two cropping sizes yields adjusted palm cropping boxes of different sizes, and cropping according to them yields cropped images of different sizes. The size of the second cropped palm image is larger than the size of the first cropped palm image.
[0098] In some implementations, both the first and second cropped palm images can include borders. This is because borders (whether screen borders or paper borders) can be intercepted redundantly; that is, both the first model and the second models can process the borders in the palm image. Therefore, the borders in the palm image can be preserved during cropping for easy recognition. The first cropping size can be understood as a paper cropping size, and the second cropping size as a screen cropping size. During cropping, the paper cropping size is smaller than the screen cropping size. An important reason is that, in addition to intercepting full-sheet paper prints, the model also needs to intercept cut-out paper attached to a hand, which occupies only a small area. To encourage the model to focus on the area where the cut-out paper is attached to the hand and to avoid background interference, the paper cropping size can be set slightly smaller. For example, in the schematic diagram of a paper-cut hand image shown in Figure 5c, the image includes the "paper-cut hand" area, that is, the area where the cut-out paper fits against the real hand. This area is located in the middle of the image, so the effective area is relatively small, and a small cropping size makes the model focus on it. By contrast, screen re-photographing and full-sheet printing are more alike in that the entire image carries the re-photographing or printing details, so the cropped area can be larger.
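The per-model preprocessing described in (1)-(3) can be summarized in a small configuration sketch. The margins 1.0 and 2.0 are the example values given above; the dictionary `MODEL_MARGINS`, the function name, and the rule that the cropping size equals the detection box size scaled by the margin are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

# None means the model receives the uncropped palm image (first model, M1).
MODEL_MARGINS: Dict[str, Optional[float]] = {
    "first_model": None,
    "M1": None,
    "M2": 1.0,  # margin1: smaller paper cropping size
    "M3": 2.0,  # margin2: larger screen cropping size
}


def cropping_size_for(model_name: str,
                      det_size: Tuple[int, int]) -> Optional[Tuple[int, int]]:
    """Return the (width, height) to crop for a model, or None for no crop.
    Assumed rule: cropping size = detection box size * margin."""
    margin = MODEL_MARGINS[model_name]
    if margin is None:
        return None
    w, h = det_size
    return round(w * margin), round(h * margin)
```

With a 120 x 120 detection box, this gives C1 = (120, 120) for M2 and C2 = (240, 240) for M3, so C2 is greater than C1 as required.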
[0099] In some embodiments, if the liveness detection model includes a first model and a single second model, the detail feature extraction processing performed by that second model may include at least one of: border feature extraction processing, texture feature extraction processing, and imaging feature extraction processing. The computer device can invoke the second model to perform the above detail feature extraction processing on the palm image to obtain at least one detail feature. In this way, one second model can be responsible for extracting multiple detail features, which reduces the deployment and training costs of the model.
[0100] In some embodiments, the first model and the at least one second model collaboratively perform liveness detection. They can run in parallel, each predicting whether the hand in the palm image is a live hand, which improves the efficiency of liveness detection. Based on the models contained in the liveness detection model, when the computer device calls the liveness detection model to perform liveness detection, it can perform the following steps S404-S406.
[0101] S404, The first model performs liveness detection on the palm in the palm image based on semantic features to obtain the first detection result.
[0102] The liveness detection (also known as category prediction processing) mentioned in this application is used to predict whether a hand in a palm image is a live hand or a non-live hand. In some embodiments, the first model includes a classifier, which can be invoked to perform liveness detection. The computer device can perform the following steps: invoke the first model to predict a score of the palm image based on semantic features to obtain a first category prediction score, which is used to indicate the probability that the hand in the palm image is a live hand; if the first category prediction score is greater than or equal to a first preset score threshold, then generate a first detection result (also known as a predicted category) indicating that the hand in the palm image is a live hand; if the first category prediction score is less than the first preset score threshold, then generate a first detection result indicating that the hand in the palm image is a non-live hand.
[0103] In some implementations, the first preset score threshold is set based on the security level requirements of the business scenario corresponding to the palm image. The security level requirements indicate the security level required by the business scenario. The security level indicated by the security level requirements is positively correlated with the first preset score threshold: the higher the security level indicated by the security level requirements, the higher the first preset score threshold is set; the lower the security level indicated by the security level requirements, the lower the first preset score threshold is set.
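The threshold comparison and its dependence on the security level can be sketched as follows. The three security levels and their threshold values are hypothetical; the application only requires that a higher security level corresponds to a higher first preset score threshold.

```python
from typing import Dict

# Hypothetical mapping from security level to first preset score threshold.
# Thresholds increase with the level (positive correlation).
SECURITY_LEVEL_THRESHOLDS: Dict[int, float] = {1: 0.5, 2: 0.7, 3: 0.9}


def first_detection_result(live_score: float, security_level: int) -> str:
    """Compare the first category prediction score (probability that the
    hand is live) with the threshold for the given security level."""
    threshold = SECURITY_LEVEL_THRESHOLDS[security_level]
    return "live" if live_score >= threshold else "non-live"
```

The same score can therefore pass in a low-security scenario and be rejected in a high-security one, for example a score of 0.8 at levels 2 and 3.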
[0104] In other embodiments, the computer device may perform the following steps: predicting a score for a palm image based on semantic features to obtain at least one category prediction score for the palm image, where each category prediction score corresponds to a category; the categories here include the following two categories: live palm and non-live palm; representing the palm image as a real palm image or a fake palm image, respectively; and determining the category corresponding to the highest category prediction score among the at least one category prediction scores as the first detection result. Further, in some implementations, if the highest category prediction score is greater than or equal to a preset score, then the category corresponding to the highest category prediction score is determined as the first detection result; otherwise, liveness detection can be performed again on the palm image.
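The alternative described in this paragraph, picking the highest-scoring category and re-running detection when that score is too low, can be sketched as below. The function name and the convention of returning `None` to signal that detection should be repeated are assumptions.

```python
from typing import Dict, Optional


def predict_category(category_scores: Dict[str, float],
                     preset_score: float) -> Optional[str]:
    """Return the category with the highest prediction score if that score
    reaches the preset score; otherwise return None, meaning the caller
    should perform liveness detection on the palm image again."""
    if not category_scores:
        raise ValueError("at least one category score is required")
    best = max(category_scores, key=category_scores.get)
    if category_scores[best] >= preset_score:
        return best
    return None
```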
[0105] S405, each second model performs liveness detection on the palm in the palm image based on the detailed features extracted by each model, and obtains at least one second detection result.
[0106] In some implementations, calling each second model to perform liveness detection on the palm image based on its extracted detail features may include at least one of the following: calling second model M1 to perform liveness detection on the palm image based on border features to obtain a second detection result P1; calling second model M2 to perform liveness detection on the palm image based on texture features to obtain a second detection result P2; and calling second model M3 to perform liveness detection on the palm image based on imaging features to obtain a second detection result P3. Each second model includes a classifier, and the classifier in the second model can be called to perform liveness detection. The logic for liveness detection by each second model can refer to the process of obtaining the first detection result described above, for example: performing score prediction based on the corresponding detail features to obtain a second category prediction score S1; and generating the second detection result P1 based on the relationship between the second category prediction score S1 and a second preset score threshold Th1.
[0107] Both the first and second detection results can be used to indicate whether the palm in the palm image is a live palm. However, the first and second detection results are outputs of different models, and multiple second detection results are also outputs of different second models. Therefore, the first and second detection results may be the same or different, and any two second detection results may be the same or different. The first detection result and at least one second detection result can be used to jointly determine the detection result.
[0108] In some implementations, the liveness detection model integrates multiple models, each of which can be configured with a corresponding preset score threshold. For example, the first model may have a first preset score threshold Th0, which is the score threshold used by the first model to determine whether a hand in a palm image is a live hand. Each second model may have its own second preset score threshold: second model M1 may have a second preset score threshold Th1, second model M2 may have a second preset score threshold Th2, and second model M3 may have a second preset score threshold Th3. Each second preset score threshold can also be set based on the business requirements of the business scenario corresponding to the palm image. Furthermore, each preset score threshold can be flexibly adjusted according to those business requirements, which improves the practicality of the liveness detection model in real-world applications. For example, if the business requirements call for intercepting a specific type of attack, such as attacks that exhibit a screen border, then the first preset score threshold Th0 of the first model and the second preset score threshold Th1 of the second model M1 can be reset so that the liveness detection model detects that specific type of attack and intercepts it once detected.
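The per-model thresholds Th0-Th3 and their adjustment for a business scenario can be sketched as a small configuration. The default value 0.5, the override value 0.8, and the function names are hypothetical; only the idea that each model has its own adjustable threshold comes from the text.

```python
from typing import Dict

# Hypothetical defaults: Th0 for the first model, Th1-Th3 for M1-M3.
DEFAULT_THRESHOLDS: Dict[str, float] = {
    "first": 0.5, "M1": 0.5, "M2": 0.5, "M3": 0.5,
}


def with_overrides(base: Dict[str, float],
                   overrides: Dict[str, float]) -> Dict[str, float]:
    """Return a new threshold table with scenario-specific overrides."""
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"unknown model names: {sorted(unknown)}")
    return {**base, **overrides}


def apply_thresholds(scores: Dict[str, float],
                     thresholds: Dict[str, float]) -> Dict[str, bool]:
    """Turn each model's live-probability score into a live/non-live flag."""
    return {name: score >= thresholds[name] for name, score in scores.items()}
```

For the screen-border example above, a deployment might raise only Th0 and Th1, for instance `with_overrides(DEFAULT_THRESHOLDS, {"first": 0.8, "M1": 0.8})`, leaving Th2 and Th3 unchanged.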
[0109] S406, determine the liveness detection result of the palm in the palm image based on the first detection result and at least one second detection result.
[0110] In some embodiments, the liveness detection result includes a live hand or a non-live hand. The computer device can perform the following: if either a first detection result or at least one second detection result indicates that the hand in the palm image is a non-live hand, then the liveness detection result is determined to be a non-live hand; if both the first detection result and at least one second detection result indicate that the hand in the palm image is a live hand, then the liveness detection result is determined to be a live hand.
[0111] A detection result indicating that the hand in the palm image is a non-living hand may come from the first detection result, from one or more of the at least one second detection result, or from both. For example, if the first detection result indicates that the hand in the palm image is a living hand, and the at least one second detection result includes a second detection result P1 and a second detection result P2, where second detection result P1 indicates that the hand in the palm image is a non-living hand and second detection result P2 indicates that the hand in the palm image is a living hand, then the liveness detection result can be determined as a non-living hand. In other words, among the detection results given by the multiple models, as long as at least one detection result indicates that the hand in the palm image is a non-living hand, the hand in the palm image can be determined to be a non-living hand. Conversely, the first detection result and every second detection result must indicate that the hand in the palm image is a living hand for the hand to be determined to be a living hand. For example, if the first detection result indicates that the hand in the palm image is a living hand, and both second detection results P1 and P2 indicate that the hand in the palm image is a living hand, then the liveness detection result can be determined to be a living hand.
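The decision rule of step S406 can be sketched as a short function. This is a minimal illustration; the function name `fuse_detection_results` and the boolean encoding (True for a living hand, False for a non-living hand) are assumptions of the sketch.

```python
from typing import Iterable


def fuse_detection_results(first_is_live: bool, second_are_live: Iterable[bool]) -> bool:
    """Return True (living hand) only if every model reports a living hand.

    A single non-living verdict from any model makes the final result non-living.
    """
    second = list(second_are_live)
    if not second:
        raise ValueError("at least one second detection result is required")
    return first_is_live and all(second)
```

With the example from the paragraph above, `fuse_detection_results(True, [False, True])` yields a non-living result, while `fuse_detection_results(True, [True, True])` yields a living result.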
[0112] As can be seen, each model can output a detection result, which is also a prediction result obtained by the corresponding model based on the relevant features of the palm image for liveness detection. Each detection result can also be used to determine whether the palm image is an attack image. Based on these detection results, the final liveness detection result can be determined. For example, at least one second model includes a second model M1 and a second model M2. The first model, the second model M1, and the second model M2 can perform liveness detection in parallel. The first model outputs a first detection result P0, the second model M1 outputs a second detection result P1, and the second model M2 outputs a second detection result P2. If any of the first detection result P0, the second detection result P1, and the second detection result P2 indicates that the palm in the palm image is not a live hand, it means that there is a detection result that determines that the palm image is an attack image. Then, the palm image can be determined to be an attack image, and the attack image can be intercepted.
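The parallel execution mentioned above could, for instance, be arranged with a thread pool. The sketch below uses stub detectors in place of the first model and the second models M1 and M2; the names `run_detectors` and `is_attack_image`, the stub verdicts, and the file name are hypothetical and serve only to show the control flow.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# A detector maps a palm image to True (live) or False (not live).
Detector = Callable[[object], bool]


def run_detectors(image: object, detectors: Dict[str, Detector]) -> Dict[str, bool]:
    """Run every detector on the same image in parallel and collect the verdicts."""
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        futures = {name: pool.submit(fn, image) for name, fn in detectors.items()}
        return {name: future.result() for name, future in futures.items()}


def is_attack_image(results: Dict[str, bool]) -> bool:
    """An image is treated as an attack image if any verdict is 'not live'."""
    return not all(results.values())


# Stub detectors standing in for the first model (P0) and second models (P1, P2).
stub_detectors: Dict[str, Detector] = {
    "P0": lambda img: True,
    "P1": lambda img: False,
    "P2": lambda img: True,
}
results = run_detectors("palm.png", stub_detectors)
```

Here the stub for P1 reports a non-live palm, so `is_attack_image(results)` is true and the image would be intercepted.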
[0113] Based on the above embodiments, the processing flow of the liveness detection model can be illustrated by the schematic diagram in Figure 6. As shown in Figure 6, the liveness detection model involves a first model and multiple second models working collaboratively to intercept different types of attacks. Since each model identifies different attack types, different image cropping schemes are used. No cropping is required for the first model and the second model M1, while different degrees of cropping are needed for the second models M2 and M3. Furthermore, because the fitting ability of a second model is limited, a single second model can hardly defend simultaneously against multiple attacks with significantly different distributions (such as paper printing and screen replay); multiple second models (including second models M1, M2, and M3) are therefore used instead of a single second model. The final detection result is determined by combining the results of processing the palm image with the multiple models to decide whether an attack has occurred. This multi-model collaborative approach reduces the training difficulty of the second models and effectively intercepts various attacks, improving the overall attack interception effect.
[0114] The image processing method provided in this application can use a first model with strong feature generalization for semantic feature extraction. Semantic features are less affected by lighting and shooting angle, which makes the interception of simple attack samples more robust. The method can also extract detail features to supplement the semantic features and combine the two kinds of features for liveness detection, so that difficult attack samples that closely resemble real images can also be intercepted accurately. Attack interception based on the joint use of the first and second models is a multi-feature joint interception mechanism. Different features provide different perspectives for determining whether a palm image is an attack image, thus providing better interception capability and robustness. The first model is responsible for interception based on high-dimensional features. Because the first model can extract high-dimensional features rich in semantic information, it can improve the interception capability against attacks. The second model focuses on interception based on low-dimensional features, using its fitting ability to capture subtle changes in palm prints and provide additional detail information, which can enhance the robustness of liveness detection.
[0115] In some embodiments, the various models included in the liveness detection model are trained independently. That is, the first model and the second model support independent training and optimization. Based on the independent training and optimization of each model, the liveness detection model has good scalability and adaptability, and can flexibly adjust the included models or model-related data (such as the score threshold used by the model to distinguish categories) according to different application scenarios. By fully utilizing the advantages of the first and second models, the shortcomings of palmprint recognition systems in liveness detection can be effectively addressed, improving the security and robustness of palmprint recognition. Below, any second model is denoted as the second model Mi, and the training method for the second model Mi can be shown in steps a-d as follows:
[0116] Step a: Obtain the first sample image. The first sample image contains a sample hand and carries a reference category label, which indicates whether the sample hand in the first sample image is a live hand. The reference category label carried by the first sample image is a real label; if the first sample image is a fake hand image (belonging to attack sample data), then the reference category label is used to indicate that the sample hand in the first sample image is a non-live hand; if the first sample image is a real hand image (belonging to real person sample data), then the reference category label is used to indicate that the sample hand in the first sample image is a live hand.
[0117] In some implementations, the first sample image is any image in the training set Ti, which is functionally matched with the second model Mi. This includes: if the second model Mi is used to perform texture feature extraction processing, then the training set Ti includes paper print sample images and paper cutout hand sample images; if the second model Mi is used to perform border feature extraction processing, then the training set Ti includes screen capture sample images, paper print sample images, and paper cutout hand sample images; if the second model Mi is used to perform imaging feature extraction processing, then the training set Ti includes screen capture sample images. The training set Ti includes at least one first sample image, and each first sample image carries a corresponding reference category label to indicate whether the hand in the first sample image is a living hand.
[0118] In some implementations, the computer device can acquire the original sample image and perform data augmentation on the original sample image to obtain an augmented sample image, which is then used as the first sample image. Data augmentation includes, but is not limited to, random cropping, rotation, and flipping. By applying data augmentation techniques during training, the diversity of training data can be increased, enhancing the generalization ability of the second model.
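The augmentation operations named above (random cropping, rotation, and flipping) can be illustrated on a small image represented as a nested list of pixel values. This is a sketch only: the helper names, the 90-degree rotation angle, the crop size, and the 0.5 application probabilities are assumptions, and a real pipeline would typically rely on an image library.

```python
import random
from typing import List

Image = List[List[int]]  # rows of pixel values


def horizontal_flip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def random_crop(img: Image, height: int, width: int, rng: random.Random) -> Image:
    """Cut a height x width window at a random position inside the image."""
    top = rng.randint(0, len(img) - height)
    left = rng.randint(0, len(img[0]) - width)
    return [row[left:left + width] for row in img[top:top + height]]


def augment(img: Image, rng: random.Random) -> Image:
    """Apply a random crop, then a flip and a rotation with probability 0.5 each."""
    out = random_crop(img, len(img) - 1, len(img[0]) - 1, rng)
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    if rng.random() < 0.5:
        out = rotate_90_clockwise(out)
    return out
```

Each call to `augment` with a differently seeded `random.Random` yields a different variant of the original sample image, and each variant keeps the reference category label of the original.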
[0119] In some implementations, the method for obtaining the training set Ti during the training phase includes: acquiring sample palm videos, performing quality filtering on the sample palm videos using a quality filtering module to obtain at least one candidate sample image, and adding each candidate sample image to the training set Ti. Each candidate sample image is a real palm image that meets the quality requirements, which ensures the quality of the first sample images used during the training phase. Optionally, each candidate sample image can be cropped to obtain a corresponding first cropped sample image, which is then added to the training set Ti. Uniform palm cropping reduces the differences between different real palm images, making it easier for the model to converge during training. Optionally, fake sample images can also be acquired. Based on the attack category intercepted by the second model Mi (such as screen replay attacks or paper printing attacks), the fake sample images and the first cropped sample images are mixed to obtain the training set Ti. Mixing real sample data and attack sample data according to attack category or attack features (such as border features, screen imaging features, and paper texture features) can improve the second model's ability to identify different types of attacks.
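The mixing of real and attack samples according to the function of the second model Mi could be sketched as follows. The mapping from feature type to attack categories mirrors the description of the training set Ti above; the string identifiers, the label encoding (1 for a live hand, 0 for a non-live hand), and the function name `build_training_set` are assumptions of this sketch.

```python
import random
from typing import Dict, List, Set, Tuple

# Attack categories included in Ti, keyed by the feature type that Mi extracts.
ATTACKS_BY_FEATURE: Dict[str, Set[str]] = {
    "texture": {"paper_print", "paper_cutout"},
    "border": {"screen_capture", "paper_print", "paper_cutout"},
    "imaging": {"screen_capture"},
}

Sample = Tuple[str, int]  # (image identifier, reference label: 1 live, 0 non-live)


def build_training_set(
    feature: str,
    real_ids: List[str],
    fake_by_attack: Dict[str, List[str]],
    seed: int = 0,
) -> List[Sample]:
    """Mix real samples with the attack samples that match the model's function."""
    wanted = ATTACKS_BY_FEATURE[feature]
    samples: List[Sample] = [(image_id, 1) for image_id in real_ids]
    for attack, ids in fake_by_attack.items():
        if attack in wanted:
            samples.extend((image_id, 0) for image_id in ids)
    random.Random(seed).shuffle(samples)  # shuffle so classes are interleaved
    return samples
```

For a model that extracts imaging features, only screen capture samples are mixed in; paper print samples supplied in `fake_by_attack` are left out.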
[0120] Step b: Invoke the second model Mi to extract sample detail features from the first sample image. The second model relies more on image detail features (e.g., texture features) than on semantic features. The detail feature extraction can be any one or more of the following: border feature extraction, texture feature extraction, and imaging feature extraction. Sample detail features can include one or more of the following: sample border features, sample texture features, and sample imaging features. A specific data preprocessing scheme and training set are set for each second model so that it focuses on intercepting specific attack types. The data preprocessing scheme includes: if the second model Mi is used to perform border feature extraction, the first sample image can be left uncropped; if the second model Mi is used to perform texture feature extraction, the first sample image can be cropped according to a first cropping size to obtain a first cropped sample image, and the second model Mi can be invoked to perform texture feature extraction on the first cropped sample image; if the second model Mi is used to perform imaging feature extraction, the first sample image can be cropped according to a second cropping size to obtain a second cropped sample image, and the second model Mi can be invoked to perform imaging feature extraction on the second cropped sample image.
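One possible way to realize the per-model preprocessing scheme is to derive a crop box from the palm key points and then scale it according to the cropping size. In the sketch below, the scale factors in `CROP_SCALE`, the interpretation of a "cropping size" as a scale factor, and the helper names are all assumptions; the application does not give concrete cropping sizes.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

# Hypothetical scale factors; None means the image is left uncropped.
CROP_SCALE: Dict[str, Optional[float]] = {"border": None, "texture": 1.0, "imaging": 0.5}


def crop_box(key_points: List[Point], image_size: Tuple[int, int], scale: float) -> Box:
    """Box centered on the key points, scaled, then clamped to the image."""
    xs = [x for x, _ in key_points]
    ys = [y for _, y in key_points]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    half_w = (max(xs) - min(xs)) * scale / 2
    half_h = (max(ys) - min(ys)) * scale / 2
    width, height = image_size
    left = max(0, int(round(cx - half_w)))
    top = max(0, int(round(cy - half_h)))
    right = min(width, int(round(cx + half_w)))
    bottom = min(height, int(round(cy + half_h)))
    return (left, top, right, bottom)


def preprocess_box(feature: str, key_points: List[Point], image_size: Tuple[int, int]) -> Optional[Box]:
    """Return the crop box for the given feature type, or None for no cropping."""
    scale = CROP_SCALE[feature]
    if scale is None:
        return None
    return crop_box(key_points, image_size, scale)
```

Under these assumed factors, a model that extracts border features receives the full image, while the models for texture and imaging features receive progressively tighter crops around the palm.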
[0121] Step c: Invoke the second model Mi to perform liveness detection based on the sample detail features, obtaining the predicted category label for the first sample image. The predicted category label for the first sample image indicates whether the sample hand in the first sample image is a live hand. The predicted category label is a prediction result, and the content it indicates may be the same as or different from the content indicated by the reference category label. For example, the reference category label may indicate that the sample hand in the first sample image is a non-live hand, while the predicted category label indicates that the sample hand in the first sample image is a live hand.
[0122] Step d: Train the second model Mi based on the predicted category label and the reference category label. In some implementations, a classification loss can be calculated based on the predicted category label and the reference category label, and the model parameters of the second model Mi can be adjusted in the direction that reduces this classification loss. For example, the classification loss can be a cross-entropy loss, which measures the difference between the predicted result and the true label, and the model parameters of the second model Mi can be updated through the backpropagation algorithm.
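Steps c and d can be illustrated with a deliberately small stand-in: a linear classifier with a sigmoid output trained by gradient descent on a cross-entropy loss. This is not the second model Mi itself, whose architecture is not specified here; the feature vectors, learning rate, and function names are assumptions made for the sketch.

```python
import math
from typing import List, Tuple

Example = Tuple[List[float], int]  # (sample detail features, reference label)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(p: float, label: int) -> float:
    """Binary cross-entropy between predicted live probability and the label."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


def train_step(weights: List[float], bias: float, batch: List[Example], lr: float):
    """One gradient-descent update; returns new weights, new bias, and mean loss."""
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    loss = 0.0
    for features, label in batch:
        p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        loss += cross_entropy(p, label)
        err = p - label  # gradient of the loss with respect to the logit
        for i, x in enumerate(features):
            grad_w[i] += err * x
        grad_b += err
    n = len(batch)
    new_weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return new_weights, bias - lr * grad_b / n, loss / n


# Toy batch with made-up features: label 1 is a live hand, 0 a non-live hand.
toy_batch: List[Example] = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
w, b = [0.0, 0.0], 0.0
losses = []
for _ in range(50):
    w, b, step_loss = train_step(w, b, toy_batch, lr=0.5)
    losses.append(step_loss)
```

The mean loss recorded in `losses` falls over the iterations, which is the behavior that adjusting parameters "in the direction that reduces the classification loss" is meant to produce.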
[0123] In some embodiments, the first model is an image processing model obtained by fine-tuning a pre-trained model according to a palmprint liveness detection task. The first model can be a multimodal large model, including an image feature extractor (i.e., an image encoder) and a classifier, which can be a single fully connected neural network (also called a fully connected layer). In some embodiments, the image feature extractor performs semantic feature extraction on the palm image to obtain semantic features of the palm image. The classifier can then predict the category based on the semantic features to obtain a first predicted category. Based on the first predicted category, the authenticity of the palm image can be determined. In some implementations, the image feature extractor includes an image linear layer and an attention layer. During the pre-training phase, the image linear layer and attention layer can be converted into low-rank matrices. For example, the original weight matrix W0, with dimensions m×n, can be decomposed into two low-rank matrices A (dimension m×r) and B (dimension r×n). The parameters of the first model are fine-tuned using these low-rank matrices, which contain all the model parameters of the first model. This is a low-rank adaptation (LoRA) method, which can generalize knowledge to specific downstream tasks by learning external modules and reduces the overhead of model fine-tuning and storage through the design of learnable rank decomposition matrices. During the fine-tuning phase, the low-rank transformed image feature extractor (which can be called a low-rank encoder) can be invoked to extract semantic features from the second sample image (containing the sample palm). This yields sample semantic features, which are feature vectors. The classifier is then invoked to predict category scores from these sample semantic features, obtaining a predicted score for each category. Based on the predicted scores, the predicted category label is determined. A classification loss (such as a cross-entropy loss) is calculated based on the predicted category label and the true category label carried by the second sample image. The low-rank matrices in the first model and the parameters of the classifier are updated by backpropagating the gradient of the classification loss. For example, the training process of the first model is shown in Figure 7. The second sample image first undergoes feature extraction via the image feature extractor of the large model, and then the classifier determines the authenticity of the second sample image. Alternatively, the image linear layer and attention layer of the image feature extractor of the multimodal large model can be converted into low-rank matrices. These low-rank matrices are then used to convert the second sample image into a feature vector, which is input into the classifier to determine the authenticity of the second sample image. In practical applications, the result can be used to intercept or pass the processed palm image.
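To make the saving from the low-rank matrices concrete, the sketch below compares parameter counts and forms an effective weight. It assumes the commonly used LoRA formulation in which the original matrix W0 is kept fixed and the trainable product A·B is added to it; the text above describes the matrices only as a decomposition, so this formulation, the matrix sizes, and the function names are assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x r) and b (r x n)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def lora_param_counts(m: int, n: int, r: int) -> Tuple[int, int]:
    """Return (parameters in W0, trainable parameters in A and B)."""
    return m * n, r * (m + n)


def effective_weight(w0: Matrix, a: Matrix, b: Matrix) -> Matrix:
    """W0 + A·B, with A of shape m x r and B of shape r x n (assumed formulation)."""
    return add(w0, matmul(a, b))


# Hypothetical sizes: a 1024 x 1024 layer adapted with rank r = 8.
full_count, low_rank_count = lora_param_counts(1024, 1024, 8)
```

For these assumed sizes, the low-rank matrices hold 16,384 trainable values compared with 1,048,576 in W0, which illustrates why fine-tuning and storage overhead are reduced.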
[0124] By utilizing the first model to learn robust semantic features from large-scale, diverse data, and employing a low-rank fine-tuning method, the semantic features of the first model are preserved while being appropriately fine-tuned to better adapt to the palmprint liveness detection task. The semantic knowledge learned by the first model enhances the robustness and generalization of the model, thus achieving good results even with relatively limited data.
[0125] Based on the above introduction to the training process of the first and second models included in the liveness detection model, the effectiveness verification of the liveness detection model can be seen in the data shown in Table 1 below.
[0126] Table 1 shows the pass rate of real palms and the false pass rate for different attack types on the self-built dataset.
[0127] Table 1 compares this application with other methods in the field of palmprint liveness detection on different types of data. The evaluation index is the pass rate. For real human palms, a higher pass rate is better. For attacks such as screen replay, complete paper prints, and cut-out paper prints, a lower false pass rate is better. The comparison shows that the liveness detection model of this application performs far better than the other models: the pass rate of real human palms is higher, and the false pass rate of attacks is much lower.
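The evaluation index can be computed per sample type as the fraction of samples judged to be live. In the sketch below, the records are made-up toy data used only to show the calculation and are not results from Table 1; the type names and the function name `pass_rates_by_type` are likewise assumptions.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, bool]  # (sample type, judged live by the model?)


def pass_rates_by_type(records: List[Record]) -> Dict[str, float]:
    """Fraction of samples judged live, grouped by sample type.

    For real palms this is the pass rate (higher is better); for attack
    types it is the rate at which attacks are wrongly passed (lower is better).
    """
    totals: Dict[str, int] = {}
    passed: Dict[str, int] = {}
    for sample_type, judged_live in records:
        totals[sample_type] = totals.get(sample_type, 0) + 1
        passed[sample_type] = passed.get(sample_type, 0) + int(judged_live)
    return {t: passed[t] / totals[t] for t in totals}


# Toy records for illustration only; these are not measured results.
toy_records: List[Record] = [
    ("real", True), ("real", True), ("real", False), ("real", True),
    ("screen", False), ("screen", True),
    ("paper", False), ("paper", False),
]
rates = pass_rates_by_type(toy_records)
```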
[0128] Please refer to Figure 8, which is a schematic diagram of the structure of an image processing apparatus provided by some exemplary embodiments of this application. This image processing apparatus can be disposed in a computer device executing embodiments of this application. The image processing apparatus 800 shown in Figure 8 can be a computer program (including program code) running in the computer device. The image processing apparatus 800 can be used to execute some or all of the steps in the method embodiments shown in Figures 2 and 4. Referring to Figure 8, the image processing apparatus 800 may include the following units:
[0129] The acquisition unit 801 is used to acquire a palm image to be processed; the palm image contains a hand.
[0130] The processing unit 802 is configured to: extract semantic features of a palm image, and perform liveness detection on the palm in the palm image based on the semantic features to obtain a first detection result; the semantic features are used to characterize the semantic information of the palm image; extract at least one detail feature of the palm image, and perform liveness detection on the palm in the palm image based on each of the at least one detail feature to obtain at least one second detection result; each of the at least one detail feature is used to characterize a detail information of the palm image; perform liveness detection on the palm in the palm image based on the first detection result and at least one second detection result to obtain a liveness detection result of the palm in the palm image, the liveness detection result being used to indicate whether the palm in the palm image is a live palm.
[0131] In some embodiments, each model in the liveness detection model is trained independently; any second model is denoted as second model Mi, and the image processing device 800 further includes: a training unit 803, configured to: acquire a first sample image, the first sample image containing a sample palm, the first sample image carrying a reference category label, and the reference category label being used to indicate whether the sample palm in the first sample image is a live palm; call the second model Mi to perform detail feature extraction processing on the first sample image to obtain sample detail features of the first sample image; call the second model Mi to perform category prediction processing based on the sample detail features to obtain a predicted category label of the first sample image; the predicted category label of the first sample image is used to indicate whether the sample palm in the first sample image is a live palm; and train the second model Mi based on the predicted category label and the reference category label.
[0132] It is understood that the specific functions of each unit of the image processing apparatus described in the embodiments of this application can be specifically implemented according to the methods in the above method embodiments, and the specific implementation process can be referred to the relevant descriptions in the above method embodiments, which will not be repeated here. In addition, the beneficial effects of using the same method will not be repeated here either.
[0133] This application also provides a schematic diagram of the structure of a computer device, which is shown in Figure 9. The computer device may include a processor 901, an input device 902, an output device 903, and a memory 904. The processor 901, input device 902, output device 903, and memory 904 are connected via a bus. The memory 904 includes a computer-readable storage medium that stores a computer program. The processor 901 executes the computer program stored in the memory 904.
[0134] In some embodiments, the processor 901 performs the following operations by running a computer program in the memory 904: acquiring a palm image to be processed; the palm image contains a hand; performing semantic feature extraction processing on the palm image to obtain semantic features of the palm image, the semantic features being used to characterize the semantic information of the palm image; performing detail feature extraction processing on the palm image to obtain at least one detail feature of the palm image, each detail feature being used to characterize a detail information of the palm image; performing liveness detection on the hand in the palm image based on the semantic features and at least one detail feature to obtain a detection result, the detection result being used to indicate whether the hand in the palm image is a live hand.
[0135] It should be understood that the computer device described in the embodiments of this application can execute the image processing method described in the corresponding embodiments above, and can also implement the functions of the image processing apparatus described in the corresponding embodiments above, which will not be repeated here. In addition, the beneficial effects of using the same method will not be repeated here either.
[0136] Furthermore, it should be noted that this application embodiment also provides a computer-readable storage medium, which stores a computer program, and the computer program includes program instructions. When the processor executes the above program instructions, it can execute the methods in the embodiments corresponding to Figures 2 and 4 above. Therefore, it will not be described again here.
[0137] According to one aspect of this application, a computer program product is provided, comprising a computer program stored in a computer-readable storage medium. A processor of a computer device reads the computer program from the computer-readable storage medium and executes the computer program, enabling the computer device to perform the methods described in the embodiments corresponding to Figures 2 and 4 above; therefore, further details will not be provided here.
[0138] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. This computer program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the methods described above. The computer-readable storage medium can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc.
[0139] The above-disclosed embodiments are merely preferred embodiments of this application and should not be construed as limiting the scope of this application. Those skilled in the art will understand that all or part of the processes for implementing the above embodiments and equivalent variations made in accordance with the claims of this application are still within the scope of this application.
Claims
1. An image processing method, performed by an electronic device, the method comprising: obtaining a palm image to be processed, the palm image containing a hand; extracting semantic features from the palm image, and performing liveness detection on the palm in the palm image based on the semantic features to obtain a first detection result, the semantic features being used to characterize the semantic information of the palm image; extracting at least one detail feature from the palm image, and performing liveness detection on the palm in the palm image based on each of the at least one detail feature to obtain at least one second detection result, each of the at least one detail feature being used to characterize a piece of detail information of the palm image; and obtaining, based on the first detection result and the at least one second detection result, a liveness detection result of the palm in the palm image, the liveness detection result being used to indicate whether the palm in the palm image is a live palm.
2. The method as described in claim 1, wherein, The method is executed by calling a liveness detection model, which includes a first model and at least one second model, wherein the number of network layers in each second model is less than the number of network layers in the first model. The first model is used to extract semantic features of the palm image and perform liveness detection on the palm in the palm image based on the semantic features; each of the at least one second model is used to extract a detail feature of the palm image and perform liveness detection on the palm in the palm image based on the detail feature.
3. The method as described in claim 1 or 2, wherein, The semantic features of the palm image include: palm attribute information and associated auxiliary information; wherein, the palm attribute information includes the geometric shape features and palm print features of the palm; and the associated auxiliary information includes background information and border information.
4. The method as described in any one of claims 1 to 3, wherein, The at least one detail feature includes one or more of the following: border feature, texture feature, and imaging feature; wherein, the border feature is used to characterize the border details in the palm image; the texture feature is used to characterize the texture details in the palm image; and the imaging feature is used to characterize the imaging details in the palm image.
5. The method of claim 4, wherein, The border details include screen borders or paper borders; the texture details include the texture of a living hand or a non-living hand; the imaging details include one or more of reflective features, moiré patterns, and blurring features.
6. The method according to any one of claims 2 to 5, wherein, The at least one second model includes one or more of the following: second model M1, second model M2, and second model M3; The extraction of at least one detailed feature from the palm image includes: The second model M1 is invoked to extract the border features of the palm image; The second model M2 is invoked to extract the texture features of the palm image; The second model M3 is invoked to extract the imaging features of the palm image.
7. The method of claim 6, further comprising: Before calling the second model M2 to extract the texture features of the palm image, the palm image is cropped according to the first cropping size to obtain the first cropped palm image; The step of calling the second model M2 to extract the texture features of the palm image includes: calling the second model M2 to extract the texture features of the first cropped palm image.
8. The method of claim 6, further comprising: Before calling the second model M3 to extract the imaging features of the palm image, the palm image is cropped according to the second cropping size to obtain the second cropped palm image; The step of calling the second model M3 to extract the imaging features of the palm image includes: calling the second model M3 to extract the imaging features of the second cropped palm image.
9. The method of claim 7, wherein, The step of cropping the palm image according to the first cropping size to obtain the first cropped palm image includes: Region detection is performed on the palm image to obtain the palm region in the palm image; Obtain multiple key points of the palm in the palm image; the multiple key points are used to indicate the shape of the palm; According to the first cropping size, the palm image is cropped based on multiple key points of the palm and the palm area to obtain the first cropped palm image.
10. The method of claim 9, wherein, The step of cropping the palm image according to a first cropping size, based on multiple key points of the palm and the palm region, to obtain a first cropped palm image includes: The palm cutting frame is determined based on multiple key points of the palm and the size of the palm area; Adjust the palm cutting frame according to the first cutting size; The palm image is cropped based on the adjusted palm cropping frame to obtain a first cropped palm image.
11. The method according to any one of claims 1 to 10, wherein, The first detection result includes: whether the palm in the palm image is a living palm or a non-living palm; Each of the at least one second detection result includes: the palm in the palm image is a living palm or a non-living palm.
12. The method as claimed in any one of claims 1 to 11, wherein, The step of performing liveness detection on the palm image based on the semantic features to obtain a first detection result includes: The palm image is scored based on the semantic features to obtain a first category prediction score, which is used to indicate the probability that the palm in the palm image is a living palm. If the predicted score of the first category is greater than or equal to the first preset score threshold, a first prediction result is generated to indicate that the palm in the palm image is a living palm. If the predicted score of the first category is less than the first preset score threshold, a first prediction result is generated to indicate that the palm in the palm image is a non-living palm. The first preset score threshold is set based on the security level requirements of the business scenario corresponding to the palm image. The security level requirements are used to indicate the security level required by the business scenario. The higher the security level indicated by the security level requirements, the higher the first preset score threshold is set. The lower the security level indicated by the security level requirements, the lower the first preset score threshold is set.
13. The method as claimed in any one of claims 1 to 12, wherein, The step of obtaining the liveness detection result of the palm in the palm image based on the first detection result and the at least one second detection result includes: If either the first detection result or the at least one second detection result indicates that the palm in the palm image is a non-living palm, then the liveness detection result is determined to be a non-living palm. If both the first detection result and the at least one second detection result indicate that the palm in the palm image is a live palm, then the liveness detection result is determined to be a live palm.
14. The method of any one of claims 1 to 13, further comprising: If the liveness detection result indicates that the palm in the palm image is not a live hand, then the palm image is intercepted. The interception process includes at least one of the following: deleting the palm image, outputting alarm information based on the liveness detection result, and rejecting the service request corresponding to the palm image. If the liveness detection result indicates that the palm in the palm image is a live palm, then the palm image is processed for approval. The approval process includes at least one of the following: adding the palm image to the image library, outputting approval information based on the liveness detection result of the palm image, and accepting the service request corresponding to the palm image.
15. The method according to any one of claims 1 to 14, wherein the step of acquiring the palm image to be processed includes: acquiring a palm video that includes multiple frames of palm images; calculating the quality of each frame of palm image in the palm video to obtain a quality score for each frame of palm image; selecting the palm image with the highest quality score from the palm video as a candidate palm image; and determining the candidate palm image as the palm image to be processed, or cropping the candidate palm image to obtain a cropped candidate palm image and determining the cropped candidate palm image as the palm image to be processed.
16. The method as claimed in any one of claims 2 to 15, wherein the first model and the at least one second model in the liveness detection model are each trained independently, and the method further includes, for each of the at least one second model: obtaining a first sample image, wherein the first sample image contains a sample palm and carries a reference category label, the reference category label indicating whether the sample palm in the first sample image is a live palm; invoking the second model to extract sample detail features from the first sample image; invoking the second model to perform category prediction processing based on the sample detail features to obtain a predicted category label of the first sample image, the predicted category label indicating whether the sample palm in the first sample image is predicted to be a live palm; and training the second model based on the predicted category label and the reference category label.
17. An image processing apparatus, comprising: an acquisition unit configured to acquire a palm image to be processed, the palm image containing a palm; and a processing unit configured to: extract semantic features from the palm image, and perform liveness detection on the palm in the palm image based on the semantic features to obtain a first detection result, the semantic features being used to characterize the semantic information of the palm image; extract at least one detail feature from the palm image, and perform liveness detection on the palm in the palm image based on each of the at least one detail feature to obtain at least one second detection result, each of the at least one detail feature being used to characterize one type of detail information of the palm image; and perform liveness detection on the palm in the palm image based on the first detection result and the at least one second detection result to obtain a liveness detection result of the palm in the palm image, the liveness detection result being used to indicate whether the palm in the palm image is a live palm.
18. A computer device, comprising: a processor configured to execute a computer program; and a computer-readable storage medium storing a computer program which, when executed by the processor, performs the image processing method as described in any one of claims 1-16.
19. A computer-readable storage medium storing a computer program that, when executed by a processor, performs the image processing method as described in any one of claims 1-16.
20. A computer program product comprising a computer program or computer instructions which, when executed by a processor, implement the image processing method as described in any one of claims 1-16.
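The decision flow recited in claims 12 to 15 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The threshold values, the security level names, the frame representation, and the function names `select_best_frame`, `first_detection`, `fuse`, and `handle` are all assumptions made for illustration; the claims only require that a higher security level corresponds to a higher threshold and that a single non-living result makes the overall result non-living.

```python
from typing import Callable, Sequence, TypeVar

Frame = TypeVar("Frame")

# Hypothetical thresholds. The claims only require that a higher security
# level maps to a higher threshold; these numbers are illustrative.
SCORE_THRESHOLDS = {"low": 0.5, "medium": 0.7, "high": 0.9}


def select_best_frame(frames: Sequence[Frame],
                      quality: Callable[[Frame], float]) -> Frame:
    """Claim 15: return the frame with the highest quality score."""
    return max(frames, key=quality)


def first_detection(score: float, security_level: str) -> bool:
    """Claim 12: True (live) when the score reaches the scenario's threshold."""
    return score >= SCORE_THRESHOLDS[security_level]


def fuse(first_result: bool, second_results: Sequence[bool]) -> bool:
    """Claim 13: live only if every detection result indicates a live palm."""
    return first_result and all(second_results)


def handle(is_live: bool) -> str:
    """Claim 14: pick one approval or interception action (names are illustrative)."""
    return "accept_service_request" if is_live else "reject_service_request"


if __name__ == "__main__":
    # Frames are stand-in dictionaries; "sharpness" is an assumed quality score.
    frames = [{"id": 0, "sharpness": 0.2}, {"id": 1, "sharpness": 0.8}]
    best = select_best_frame(frames, lambda f: f["sharpness"])
    semantic_live = first_detection(0.85, "medium")
    # One detail-feature model reports a non-living palm, so the fusion fails.
    result = fuse(semantic_live, [True, False])
    print(best["id"], semantic_live, result, handle(result))
```

Because the fusion is a logical AND, adding further detail-feature models in this sketch can only make the overall decision stricter, never more permissive.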
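The independent training of one second model recited in claim 16 can likewise be sketched in a few lines of standard-library Python. Everything concrete here is an assumption rather than part of the application: the detail feature (`edge_feature`, a mean absolute horizontal gradient), the single-weight logistic classifier, the cross-entropy gradient update, and the toy samples in which a textured patch stands for a live palm and a flat patch for a spoof. The sketch only mirrors the claimed sequence: extract a sample detail feature, predict a category label, and update the model from the difference between the prediction and the reference label.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def edge_feature(image: Image) -> float:
    """Hypothetical detail feature: mean absolute horizontal gradient."""
    diffs = [abs(row[i + 1] - row[i])
             for row in image for i in range(len(row) - 1)]
    return sum(diffs) / len(diffs)


def predict_prob(w: float, b: float, feature: float) -> float:
    """Probability that the sample palm is live (logistic function)."""
    return 1.0 / (1.0 + math.exp(-(w * feature + b)))


def predict_label(w: float, b: float, image: Image) -> int:
    """Predicted category label: 1 = live palm, 0 = non-living palm."""
    return 1 if predict_prob(w, b, edge_feature(image)) >= 0.5 else 0


def train(samples: List[Tuple[Image, int]],
          epochs: int = 500, lr: float = 0.5) -> Tuple[float, float]:
    """Fit the classifier by gradient descent on the cross-entropy loss."""
    w, b = 0.0, 0.0
    feats = [(edge_feature(img), label) for img, label in samples]
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in feats:
            err = predict_prob(w, b, x) - y  # prediction minus reference label
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(feats)
        b -= lr * grad_b / len(feats)
    return w, b


def toy_samples() -> List[Tuple[Image, int]]:
    """Illustrative data only: textured patch = live (1), flat patch = spoof (0)."""
    live = [[0.0, 1.0, 0.0, 1.0] for _ in range(4)]
    spoof = [[0.5, 0.5, 0.5, 0.5] for _ in range(4)]
    return [(live, 1), (spoof, 0)]


if __name__ == "__main__":
    w, b = train(toy_samples())
    for image, reference in toy_samples():
        print(reference, predict_label(w, b, image))
```

Training each second model separately in this way means a new detail feature can be added by fitting one more model, without retraining the first model or the other second models.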