Age deception detector
The method of segmenting facial images and comparing age estimates from different parts, combined with inpainting and VLMs, addresses the vulnerability of age estimation systems to spoofing attacks, improving detection accuracy and reliability.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- YOTI HLDG LTD
- Filing Date
- 2025-12-19
- Publication Date
- 2026-06-25
Abstract
Description
[0001] Age Deception Detector
[0002] Technical Field
[0003]
[0001] The present invention is in the field of age verification, and has particular applications in the context of security to prevent spoofing attacks based on users attempting to appear to be a different age.
[0004] Background
[0005] [2] Some activities are age-restricted. Historically, physical identity documents have been used to prove age. More recently, with the development of machine learning and artificial intelligence, automated age estimation and verification systems have been implemented.
[0006] [3] Facial age estimation models can be used to estimate an age of a human present in an image. These models analyse facial characteristics or features to predict age. They typically work on individual frames using a deep convolutional neural network to map the input pixels of the image to a predicted age (or age bands). These models out-perform humans; however, unlike humans, they are typically more susceptible to simplistic spoofing or deception attacks.
[0007] Summary
[0008] [4] With any computer vision model that is used to process human faces, a key vulnerability lies in bad actors attempting to spoof the system using synthetic facial props. In the case of age estimation a bad actor may wear a prosthetic beard, for example, with the hope of pretending to be older than they really are, thus obtaining access to otherwise restricted goods / services.
[0009] [5] Typical attack vectors against a facial age estimation model are likely to rely on a human bad actor using props, exaggerated facial expressions or make-up that they believe may encourage the model to assign a higher age value. Obvious examples include intentionally creating wrinkles on the forehead, smiling or the use of a fake beard / bald cap. These types of attacks are sometimes unreasonably effective against computer vision machine learning models, and would rarely work against a human operator. One of the reasons for this is that humans have the ability to determine when a task is “out of domain”, which by default machine learning models lack.
[0010] [6] One simple approach to identify these types of attack is to train a model to detect these instances. However, these models are trained for specific instances and so may not generalise to new attack types and very high-quality prosthetics. [7] Provided herein is a general method to detect instances where a user may be using such prosthetics to fool an age estimation model.
[0011] [8] According to a first aspect, there is provided a computer-implemented method for detecting a spoofing attack in an image, the method comprising: receiving an image of a user; obtaining a first image portion from the image, the first image portion being less than the whole image of the user; processing the first image portion to generate a first output indicative of a first estimated human age; processing the image of the user to generate a second output indicative of a second estimated human age; and based on the first output and the second output, determining whether the image of the user comprises the spoofing attack.
[0012] [9] In some embodiments, processing the first image portion or the image may comprise: detecting a structure in the first image portion or the image having human characteristics; and determining the respective estimated human age based on the human characteristics of the structure.
[0013]
[0010] According to a second aspect, there is provided a computer-implemented method for detecting a spoofing attack in an image, the method comprising: receiving an image of a user; obtaining, from the image, a first image portion and a second image portion, wherein each of the first image portion and the second image portion comprises an area of the image of the user which is not in the second image portion and the first image portion respectively; processing the first image portion to determine a first output indicative of a first estimated human age; processing the second image portion to determine a second output indicative of a second estimated human age; and based on the first output and the second output, determining whether the image of the user comprises the spoofing attack.
[0014]
[0011] In some embodiments, processing the first image portion or the second image portion may comprise: detecting a structure in the first image portion or the second image portion having human characteristics; and determining the respective estimated human age based on the human characteristics of the structure.
[0015]
[0012] In some embodiments, determining whether the image of the user comprises the spoofing attack may comprise: determining a similarity measure based on the first output and the second output; wherein the determination is based on the similarity measure.
[0016]
[0013] In some embodiments, the first output comprises the first estimated human age and wherein the second output comprises the second estimated human age, wherein the similarity measure is an estimated age difference between the first estimated human age and the second estimated human age, wherein determining whether the image of the user comprises the spoofing attack may further comprise: comparing the estimated age difference to a threshold age difference; if the estimated age difference exceeds the threshold age difference, determining that the image comprises a spoofing attack; and if the estimated age difference does not exceed the threshold age difference, determining that the image does not comprise a spoofing attack.
[0017]
[0014] In some embodiments, the threshold age difference may be a predefined threshold age difference.
[0018]
[0015] In some embodiments, the threshold age difference may be derived based on a predefined allowable percentage age difference and one of the first estimated age and the second estimated age.
[0019]
[0016] In some embodiments, processing the image portion and / or the image may comprise providing the image portion and / or the image as input to a trained age estimation neural network, the trained age estimation neural network trained to estimate an age of a human captured in an image.
[0020]
[0017] In some embodiments, the method may further comprise: inpainting the first image portion to obtain a reconstructed image of the user, wherein the first image portion is inpainted so as to preserve an age of the user in the first image portion; wherein processing the first image portion comprises processing the reconstructed image of the user to obtain the first estimated human age.
[0021]
[0018] In some embodiments, the first image portion may be inpainted by providing the first image portion to a trained age-preserving inpainting model, wherein the trained age-preserving inpainting model is trained to preserve the age of a user in the first image portion when inpainting.
[0022]
[0019] In some embodiments, the method may further comprise: inpainting the second image portion to obtain a second reconstructed image of the user, wherein the second image portion is inpainted so as to preserve an age of the user in the second image portion; wherein processing the second image portion comprises processing the second reconstructed image of the user to obtain the second estimated human age.
[0023]
[0020] In some embodiments, obtaining the first image portion may comprise extracting the first image portion from the image of the user.
[0024]
[0021] In some embodiments, obtaining the first image portion may comprise masking all portions of the image of the user except the first image portion.
[0022] In some embodiments, obtaining the first image portion may comprise: identifying a facial feature in the image of the user; and determining that the facial feature is to be included in, or excluded from, the first image portion; and obtaining the first image portion based on the determination.
[0025]
[0023] In some embodiments, obtaining the second image portion may comprise: determining an area of the image which is not in the first image portion; and obtaining the second image portion based on the area which is not in the first image portion.
[0026]
[0024] In some embodiments, the method may further comprise: determining that the image of the user comprises an age deception object; wherein the first image portion is obtained and processed in response to determining that the image of the user comprises the age deception object.
[0027]
[0025] In some embodiments, the first image portion may be a portion of the image of the user which does not comprise the age deception object.
[0028]
[0026] In some embodiments, the image may be a facial image.
[0029]
[0027] According to a third aspect, there is provided a method of training an age-preserving inpainting model for use in an age estimation system, the method comprising: obtaining a ground truth image of a human; obtaining a ground truth age of the human in the ground truth image; obtaining an image portion of the ground truth image, wherein the image portion comprises an incomplete image of the human; providing the image portion to an untrained age-preserving inpainting model to obtain a reconstructed image; providing the reconstructed image to a trained age estimation model to obtain an estimated age; computing an age loss using an age loss function and based on the estimated age and the ground truth age; and backpropagating the age loss.
[0030]
[0028] In some embodiments, the trained age estimation model may have frozen weights.
[0031]
[0029] In some embodiments, the method may further comprise: computing a reconstruction loss using a reconstruction loss function and based on the ground truth image and the reconstructed image; and backpropagating the reconstruction loss.
[0032]
[0030] In some embodiments, the method may further comprise executing each of the steps for a second ground truth image.
[0031] In some embodiments, the estimated age and the ground truth age may be scalar values, wherein the age loss function is a p-norm.
[0033]
[0032] In some embodiments, the estimated age may be an estimated age probability distribution, wherein the ground truth age is an age probability distribution, wherein the age loss function is a statistical distance between the distributions.
[0034]
[0033] In some embodiments, the method may comprise computing a total loss based on the reconstruction loss and the age loss.
[0035]
[0034] In some embodiments, the total loss may be further computed based on a perceptual similarity loss between the ground truth image and the reconstructed image.
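By way of a hedged illustration only, the loss terms set out above may be sketched as follows. The use of a mean absolute pixel error as the reconstruction loss, the Kullback-Leibler divergence as the statistical distance, and the weights `w_rec`, `w_age`, and `w_perc` are assumptions made for this sketch and are not specified in the present disclosure. Backpropagation is omitted, as it would typically be handled by a machine learning framework.

```python
import math
from typing import Sequence


def reconstruction_loss(truth: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean absolute error over flattened pixel values (an assumed choice)."""
    return sum(abs(t - r) for t, r in zip(truth, reconstructed)) / len(truth)


def p_norm_age_loss(estimated: Sequence[float], truth: Sequence[float], p: float = 2.0) -> float:
    """p-norm of the difference between estimated and ground truth ages over a batch."""
    return sum(abs(e - t) ** p for e, t in zip(estimated, truth)) ** (1.0 / p)


def kl_age_loss(truth_dist: Sequence[float], estimated_dist: Sequence[float],
                eps: float = 1e-12) -> float:
    """KL(truth || estimated): one possible statistical distance between age distributions."""
    return sum(t * math.log((t + eps) / (q + eps))
               for t, q in zip(truth_dist, estimated_dist) if t > 0)


def total_loss(rec: float, age: float, perceptual: float = 0.0,
               w_rec: float = 1.0, w_age: float = 1.0, w_perc: float = 1.0) -> float:
    """Weighted sum of the loss terms; the weights are hypothetical hyperparameters."""
    return w_rec * rec + w_age * age + w_perc * perceptual
```

In such a sketch the age loss penalises a reconstruction whose estimated age drifts from the ground truth age, while the reconstruction loss keeps the inpainted pixels close to the ground truth image.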
[0036]
[0035] According to a fourth aspect, there is provided a computer system comprising: a memory storing computer-readable instructions; and one or more processors which, when executing the computer-readable instructions, are configured to execute any of the methods set out herein.
[0037]
[0036] In some embodiments, the one or more processors may be configured to execute: a trained age estimation model to determine the first output and the second output; and a trained age-preserving inpainting model, trained as set out herein, to inpaint the first image portion to obtain the reconstructed image of the user.
[0038]
[0037] According to a fifth aspect, there is provided herein a computer program which, when executed on one or more processors, is configured to execute any of the methods provided herein.
[0039] Brief Description of the Drawings
[0040]
[0038] To aid understanding of the present invention, and to show how the same may be carried into effect, reference is made by way of example to the following figures, in which:
[0041]
[0039] Figure 1 provides a first example method for determining if a user has attempted to spoof an age estimation system;
[0042]
[0040] Figures 1A-1D provide variations of methods for determining if a user has attempted to spoof an age estimation system;
[0043]
[0041] Figure 2 provides a second example method for determining if a user has attempted to spoof an age estimation system;
[0044]
[0042] Figure 3 illustrates image segmentation according to the present invention;
[0043] Figure 4 provides a third example method using inpainting for determining if a user has attempted to spoof an age estimation system;
[0045]
[0044] Figure 4A provides an alternative method using inpainting for determining if a user has attempted to spoof an age estimation system;
[0046]
[0045] Figure 5 provides an example method for training an inpainting model to preserve age;
[0047]
[0046] Figure 6 provides an example system for detecting age estimation spoofing;
[0048]
[0047] Figure 7 provides an example method for determining whether to grant a user access to an age-restricted activity;
[0049]
[0048] Figure 8 shows example data for defining age difference thresholds; and
[0050]
[0049] Figure 9 shows further example data for defining age difference thresholds.
[0051] Detailed Description
[0052]
[0050] Certain facial features or characteristics are indicative of age. Such features include moustaches, wrinkles, and baldness. Certain make-up may also be suggestive of an age. While the presence of any one of these age-indicating features may cause an overall image of a person to be assigned one age, smaller or less prominent features captured in the image may provide further insight into the actual age of the user.
[0053]
[0051] For example, a person with a large beard may appear to be one age, while they are actually much younger. The person’s eyes and forehead may be more aligned with their actual age.
[0054]
[0052] The idea that some specific features can be used to alter the perceived age of a person can be used to spoof an age estimation system.
[0055]
[0053] In order to capture the age of features which may otherwise be overlooked, methods are provided herein which analyse a portion of an image of a human to determine the estimated age from that portion. This estimated age is compared to the estimated age derived from the whole image, or from other portions of the image. Discrepancies in the estimated ages are used to identify instances in which a user has attempted to appear to be a different age, thereby spoofing the age estimation system.
[0056]
[0054] Figure 1 shows a first example method 100 for identifying an age spoofing attack.
[0055] An image 102 of a user is received. The image 102 is an image of the user’s face. It may also include other parts of the user’s body.
[0057]
[0056] A first image portion 104 of the image 102 is obtained. The first image portion 104 can be seen to include the top half of the user’s face, and therefore shows their hair, forehead, and eyes. It will be appreciated that the first image portion 104 may comprise other features of the user instead or in addition, as discussed in more detail later.
[0058]
[0057] The first image portion 104 is provided as input to an age estimation model 106 to determine a first estimated human age 108a of the user. In this example, the top part of the user’s face as captured in the image is estimated to be that of a 16-year-old.
[0059]
[0058] The image 102 which includes all portions of the user captured in the image is also provided as input to the age estimation model 106 to determine a second estimated human age 108b. Here, the user as presented in the whole image is estimated to be 25.
[0060]
[0059] The difference between the age estimated from the whole image, that is the overall impression given by the user, and the age estimated from a smaller portion of the user indicates that the user has used some prop or other age-modifying mechanism to make themselves appear to be a different age. For example, the user may have used make-up to apply a fake beard to their chin to make them appear older, and thus eligible to access an age restricted activity, such as an 18+ website.
[0061]
[0060] The first and second estimated ages 108a, 108b are provided as input to an age estimation comparison module 110. This module determines an estimated age difference, that is the difference between the estimated ages 108a, 108b, which in the example of Figure 1 is 9 years. The age estimation comparison module 110 compares the age difference to a predefined threshold age difference. If the estimated age difference exceeds the threshold, it is determined that the user has attempted to spoof the age estimation system. However, if the estimated age difference does not exceed the threshold, it is determined that no spoofing has been attempted.
[0062]
[0061] In the example of Figure 1, the estimated age difference exceeds the threshold, and therefore the age estimation comparison module 110 provides an output 112 indicating that spoofing has been attempted.
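As a purely illustrative sketch, the comparison performed by the age estimation comparison module 110 may be expressed as follows. The function name `is_spoof`, its parameters, and the example thresholds of 5 years and 25% are assumptions made for this sketch; no threshold values are given in the present disclosure. The percentage-based variant derives the threshold from the first estimated age, which is one of the options contemplated in the summary.

```python
from typing import Optional


def is_spoof(first_age: float, second_age: float,
             threshold_years: Optional[float] = None,
             allowed_fraction: Optional[float] = None) -> bool:
    """Return True when the two age estimates differ by more than the threshold."""
    difference = abs(first_age - second_age)
    if threshold_years is None:
        if allowed_fraction is None:
            raise ValueError("provide threshold_years or allowed_fraction")
        # Threshold derived from an allowable percentage of the first estimate.
        threshold_years = allowed_fraction * first_age
    return difference > threshold_years


# Figure 1 values: 16 from the top half of the face, 25 from the whole image.
print(is_spoof(16.0, 25.0, threshold_years=5.0))    # True: 9 years exceeds 5
print(is_spoof(16.0, 25.0, allowed_fraction=0.25))  # True: 9 years exceeds 4
```

A difference equal to the threshold is treated as not exceeding it, so no spoofing would be reported in that case.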
[0063]
[0062] The estimated age difference may be considered a similarity measure.
[0064]
[0063] The output 112 can be used by an age verification system to determine whether the user satisfies an age criterion. Taking the example above, the age verification system would not grant the user access to the 18+ website because the age spoofing is detected in the image 102 of the user.
[0065]
[0064] As set out above, the age of the image 102 or image portion 104 is estimated by an age estimation model. Such models are known in the art and will not be described in depth herein. Any model which is capable of estimating the age of a human present in an image may be used in the methods provided herein.
[0066]
[0065] In essence, an age estimation model is a neural network which has been trained to estimate the age of a human present in an image. The neural network is a deep neural network, and in many instances a convolutional neural network. The age estimation model 106 processes images by first detecting a structure on the image which has human characteristics and using the characteristics to determine an estimated age. The age estimation model 106 used in the examples herein is a facial age estimation model, which estimates the age of a human based on facial features.
[0067]
[0066] The age estimation model may have as output many neurons, for example if it predicts the probability of each age, or it may be a binary model, for example if it predicts the probability of being above or below an age.
[0068]
[0067] Herein, the age estimation model 106 is referred to as determining an estimated age. The estimated ages used in the present examples are scalar values, that is a single estimated age value, also referred to as a float. However, an age estimation model may be chosen for use which outputs an estimated age probability distribution as the estimated age. In this case, determinations may be made based on the distribution or based on a scalar value derived from the distribution, e.g. the mode of the probability distribution. Alternatively, the age estimation model may generate a binary estimated age classification, e.g. over 18 or under 18, which is provided as an output. Within the context of the present disclosure, each of the types of outputs of the age estimation models may be considered an estimated age or indications of an estimated age.
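As a minimal sketch of how a scalar value or a binary classification might be derived from an estimated age probability distribution, consider the following. The function names, the convention that index i of the distribution corresponds to age `min_age + i`, and the example probabilities are all assumptions made for illustration.

```python
from typing import Sequence


def mode_age(probabilities: Sequence[float], min_age: int = 0) -> int:
    """Return the age with the highest probability (the mode of the distribution)."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return min_age + best_index


def probability_at_least(probabilities: Sequence[float], age: int, min_age: int = 0) -> float:
    """Return the probability mass assigned to ages greater than or equal to `age`."""
    return sum(p for i, p in enumerate(probabilities) if min_age + i >= age)


# Illustrative distribution over ages 15 to 19.
distribution = [0.1, 0.5, 0.2, 0.1, 0.1]
scalar_estimate = mode_age(distribution, min_age=15)                       # 16
is_over_18 = probability_at_least(distribution, 18, min_age=15) >= 0.5     # False
```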
[0069]
[0068] Figure 1 shows two instances of the age estimation model 106, with the image 102 and first image portion 104 being processed in parallel. It will be appreciated that there may only be a single instance of the age estimation model 106, with the image 102 and first image portion 104 being provided as input to the age estimation model 106 independently.
[0070]
[0069] Figure 2 provides an alternative method 200 for determining if the user has attempted to spoof the age estimation system.
[0070] In the example of Figure 2, the image of the user 102 is received and divided into two portions: the first image portion 104 including the top part of the user’s face, and a second image portion 204 comprising the bottom part of the user’s face. The bottom part includes the user’s mouth and chin, while the top part includes the user’s forehead, hair, and eyes.
[0071]
[0071] In this example, the second image portion 204 is provided as input to the age estimation model 106 to determine the estimated age 108b of the user instead of the image 102 as in method 100.
[0072]
[0072] Here, the estimated age 108b is 29. The higher age in this example is due to the user’s features which have not been modified, e.g. eyes, being removed for the purpose of the age estimation, thus emphasising the age increase caused by the prop. However, it will be appreciated that the age difference may not be increased by using the method 200.
[0073]
[0073] As in method 100, the estimated ages 108a, 108b are provided to the age estimation comparison module 110, which uses the estimated ages to determine if spoofing is present in the image 102, generating output 112.
[0074]
[0074] In the methods 100, 200 above, the face of the user captured in the image 102 is split in half to obtain the first image portion 104, and in method 200 the second image portion 204. However, it will be appreciated that the image 102 may be divided in other ways.
[0075]
[0075] Figure 3 provides some further examples of ways in which the image 102 may be segmented.
[0076]
[0076] In one example, the image 102 is split into quarters 304a, 304b, 304c, 304d. Each quarter 304a, 304b, 304c, 304d comprises a different portion of the user’s face. The top quarters 304a, 304b comprise the user’s right and left eye respectively. The bottom quarters 304c, 304d comprise the right and left half of the user’s mouth respectively.
[0077]
[0077] In another example, the image 102 is split into a forehead portion 306, a centre portion 308 comprising the eyes and nose, and a mouth portion 310. That is, the image 102 is split so as to provide significant features which are indicative of age in individual portions.
[0078]
[0078] Other features which may be extracted into their own portion may also be defined. For example, the facial image could be segmented to individually identify each of the user’s eyes, mouth, nose, skin, eyebrows, and hair, or any subset of these. These can be individually processed by the age estimation model 106.
[0079]
[0079] As another example, blocks or pixels within the image 102 may be chosen at random to be excluded from the image portion 104. This may reduce the significance of any prop in the image portion 104.
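The random exclusion of blocks may be sketched as follows, purely by way of example. The image is represented here as a list of rows of grey-level values, and the names `drop_random_blocks`, `block`, `drop_fraction`, and `fill` are hypothetical; the disclosure does not specify a block size or the proportion of blocks excluded.

```python
import random
from typing import List

Image = List[List[int]]  # rows of grey-level pixel values (assumed representation)


def drop_random_blocks(image: Image, block: int, drop_fraction: float,
                       seed: int = 0, fill: int = 0) -> Image:
    """Return a copy of `image` with randomly chosen square blocks overwritten by `fill`."""
    rng = random.Random(seed)
    out = [row[:] for row in image]  # leave the input image untouched
    height, width = len(image), len(image[0])
    for top in range(0, height, block):
        for left in range(0, width, block):
            if rng.random() < drop_fraction:
                for r in range(top, min(top + block, height)):
                    for c in range(left, min(left + block, width)):
                        out[r][c] = fill
    return out
```

Because the excluded blocks are chosen independently of the image content, a prop covering a contiguous region is likely to be only partly visible in the resulting image portion.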
[0080]
[0080] It will be appreciated that in the context of facial age estimation, the face of the user as displayed in the image 102 is divided into portions. That is, if the whole of the user’s face is present in the top half of the image 102, the image 102 is not itself divided in half, but instead the image of the user’s face. This enables smaller portions of the user’s face to be isolated for processing by the age estimation model 106.
[0081]
[0081] Where multiple image portions are processed, as in Figure 2, the image portions may comprise some overlapping areas of the image. Each portion, however, comprises at least some area of the image which is not in the other portion(s). For example, if the facial image is split in two, the top portion may include the forehead, eyes, and nose, and the bottom portion may include the nose, mouth, and chin. The nose is present in both portions, but each portion comprises some features which are not in the other portion.
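A minimal sketch of splitting a face crop into top and bottom portions with a shared band of rows is given below. The list-of-rows image representation and the parameter `overlap_rows` are assumptions for illustration; the check on `overlap_rows` ensures that each portion retains at least some rows which are not in the other portion.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of grey-level pixel values (assumed representation)


def split_top_bottom(face: Image, overlap_rows: int = 0) -> Tuple[Image, Image]:
    """Split a face crop at its midline; each portion extends `overlap_rows` past it."""
    mid = len(face) // 2
    if not 0 <= overlap_rows < mid:
        raise ValueError("overlap_rows must leave each portion with unique rows")
    top = face[: mid + overlap_rows]
    bottom = face[mid - overlap_rows:]
    return top, bottom
```

With a six-row crop and `overlap_rows=1`, the top portion holds rows 0 to 3 and the bottom portion holds rows 2 to 5, so rows 2 and 3 (for example, the nose) appear in both.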
[0082]
[0082] A number of techniques may be used to ensure that the image 102 is divided such that the image portions, also referred to as patches, comprise only a portion of the user’s face.
[0083]
[0083] In one example, the user may be required to fill a predefined area with their face when capturing the image 102. A user interface may be displayed which renders an oval or other location indicator which the user must fill or align with their face before the image 102 is captured. Such user interfaces are known in the art. Since the location within the image 102 which comprises the user’s face is predefined, the locations of patches within the image 102 can also be predefined.
[0084]
[0084] In another example, the image 102 may be processed to locate facial features in the image 102 before the image portions are obtained. For example, the image 102 may be processed to locate the user’s face, eyes, nose, and mouth. Once these locations have been determined within the image 102, the required portions can be extracted based on the determined feature locations.
[0085]
[0085] The image portions may be obtained from the image 102 in any suitable manner. For example, a mask may be applied to the image 102 to cover all parts of the image other than the portion to be extracted. This may also be referred to as patch occlusion. As another example, the image 102 may be cropped or cut to remove all parts of the image 102 except the required portion. In yet another example, a copy of only the parts of the image to be included in the image portion may be made. Other suitable methods may be apparent to the skilled person.
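The difference between masking and cropping may be illustrated with the following sketch, which assumes a list-of-rows image and a rectangular portion given by hypothetical `top`, `bottom`, `left`, and `right` bounds. Masking preserves the dimensions of the image 102, whereas cropping returns only the required portion.

```python
from typing import List

Image = List[List[int]]  # rows of grey-level pixel values (assumed representation)


def mask_outside(image: Image, top: int, bottom: int, left: int, right: int,
                 fill: int = 0) -> Image:
    """Patch occlusion: keep pixels inside the box and overwrite all others with `fill`."""
    return [[px if top <= r < bottom and left <= c < right else fill
             for c, px in enumerate(row)]
            for r, row in enumerate(image)]


def crop(image: Image, top: int, bottom: int, left: int, right: int) -> Image:
    """Return a copy of only the pixels inside the box."""
    return [row[left:right] for row in image[top:bottom]]
```

Which of the two is preferable may depend on whether the age estimation model 106 expects inputs of a fixed size; that consideration is not addressed in the disclosure and is noted here only as a practical assumption.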
[0086]
[0086] It will be appreciated that the division of the image 102 as described with reference to Figure 3 may apply equally to any of the methods for obtaining the image portions.
[0087]
[0087] Where more than two portions are obtained from the image 102, the method 200 may be modified to provide each of the portions to the age estimation model 106. The age estimation comparison module 110 may then determine the largest age difference between the estimated ages, and base the spoofing determination thereon.
[0088]
[0088] Alternatively, some other similarity metric may be used. For example, the age estimation comparison module 110 may determine a distribution of estimated ages, and base the spoofing determination on the distribution.
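The two options above may be sketched as follows. The portion names, the estimated age of 18 for the centre portion, and the use of the population standard deviation as a distribution-based measure are assumptions made for this illustration; the ages 16 and 29 follow the example of Figure 2.

```python
from statistics import pstdev
from typing import Dict, Tuple


def largest_age_difference(ages: Dict[str, float]) -> Tuple[float, str, str]:
    """Return the largest pairwise difference and the portions that produce it."""
    youngest = min(ages, key=ages.get)
    oldest = max(ages, key=ages.get)
    return ages[oldest] - ages[youngest], youngest, oldest


def age_spread(ages: Dict[str, float]) -> float:
    """Population standard deviation of the per-portion estimates (one possible measure)."""
    return pstdev(list(ages.values()))


estimates = {"forehead": 16.0, "centre": 18.0, "mouth": 29.0}
difference, youngest, oldest = largest_age_difference(estimates)  # 13.0, forehead, mouth
```

Either value could then be compared against a threshold in the same manner as the two-input case.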
[0089]
[0089] In some embodiments, not all portions of the image 102 are provided as input to the age estimation model 106. For example, only a forehead portion 306 and mouth portion 310 may be processed to determine estimated ages 108. The centre portion 308 may not be processed. In another example, the user’s body may not be included in the processed image portions. Such an embodiment may be used if it is determined that a part of the user does not provide useful information for the purpose of estimating the user’s age or detecting age spoofing, and where resource restrictions benefit from processing less image data.
[0090]
[0090] In some embodiments, both first and second image portions 104, 204 and the image 102 are processed by the age estimation model 106. The age estimation comparison module 110 may use all estimated ages to determine if there is spoofing in the image 102. As set out above, the maximum age difference may be pre-determined or a distribution of the estimated ages may be used.
[0091]
[0091] Figures 1A-1D provide further methods which may be used to identify spoofing attacks based on estimated ages of a user derived from an image and image portion. These alternative methods are provided relative to the method 100 of Figure 1; however, it will be appreciated that the methods could also be applied in a similar way to method 200 of Figure 2, i.e. by deriving the estimated ages from multiple image portions. As above, the methods are shown to process two image inputs; however, these examples are illustrative and there may be more image inputs and / or image inputs comprising different image portions as discussed with reference to Figure 3.
[0092] Methods 100A, 100B, and 100C of Figures 1A-1C respectively comprise a Vision-Language Model (VLM) 120. A VLM 120 is a type of machine learning model designed to process and relate visual data (e.g., images, video frames) with natural language text. VLMs typically include a visual encoder that converts visual inputs into feature representations, a language encoder that converts text into semantic embeddings, and a mechanism for combining these representations into a shared multimodal space. This enables tasks such as image captioning, visual question answering, and cross-modal retrieval. VLMs are generally trained on paired image-text datasets using objectives that align visual and linguistic features, and are often general-purpose models.
[0092]
[0093] Figure 1A shows an example method 100A in which the estimated ages 108a, 108b are processed by the VLM 120, rather than the age estimation comparison module 110, to determine if the image 102 comprises a spoofing attack. Although shown to replace the age estimation comparison module 110, the VLM 120 may be a component of the age estimation comparison module 110.
[0093]
[0094] The estimated ages 108a, 108b are derived in the same way as in method 100, by providing the image portion 104 and image 102 to the age estimation model 106. The estimated ages 108a, 108b, and the full image 102 are provided as input to a VLM 120. A prompt 122 is also provided as input to the VLM 120. The prompt causes the VLM to determine, based on the inputs, whether the image 102 comprises a spoofing attack.
[0094]
[0095] The prompt 122 may comprise a system prompt and a user prompt. The system prompt may be a predefined prompt which defines what a spoof attack is, and is provided as input to the VLM 120 each time it is used to interpret the estimated ages 108a, 108b. This system prompt is used by the VLM 120 to interpret the inputs.
[0095]
[0096] The user prompt is a prompt provided by a user for a given execution of the method 100A. The user defines a question for the VLM 120 to answer. For example, the user may ask the VLM 120 if there is a spoofing attack. It will be appreciated that the user prompt may also be predefined, such that the same user prompt is provided to the VLM 120 with each execution of the method 100A.
[0096]
[0097] As an example, the prompt 122 may comprise:
[0097] System prompt: <explanation of spoof attack>
[0098] User prompt: <whole image tokens> According to our expert, the upper part of the face depicted in this image is estimated to have a human age of <first prediction> and the lower part of the face is estimated to have a human age of <second prediction>. Is the difference between predictions likely an indication that this is a spoof attack?
[0099]
[0098] The VLM 120 processes the full image 102 and the estimated ages 108a, 108b based on the prompt 122 to determine if there is a spoof attack, and provides output 112 to indicate the results.
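One way in which the prompt 122 could be assembled from the estimated ages is sketched below. The function name `build_prompt`, the dictionary layout with `system` and `user` keys, and the number formatting are assumptions for this sketch; the content of the system prompt is not given in the disclosure and is therefore left as a placeholder, as is the manner in which the image tokens are supplied to the VLM 120.

```python
from typing import Dict


def build_prompt(spoof_explanation: str, first_age: float, second_age: float) -> Dict[str, str]:
    """Fill the example user prompt with the two estimated ages."""
    user_prompt = (
        "<whole image tokens> According to our expert, the upper part of the face "
        f"depicted in this image is estimated to have a human age of {first_age:g} "
        f"and the lower part of the face is estimated to have a human age of {second_age:g}. "
        "Is the difference between predictions likely an indication that this is a spoof attack?"
    )
    return {"system": spoof_explanation, "user": user_prompt}


# Ages from the example of Figure 2; the system prompt text remains a placeholder.
prompt = build_prompt("<explanation of spoof attack>", 16.0, 29.0)
```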
[0100]
[0099] Figure 1B provides another method 100B in which a VLM 120 is used to process the estimated ages 108a, 108b to determine if a spoofing attack is present in the image 102. In this method, the estimated ages 108a, 108b are derived by the age estimation model 106, and processed by the age estimation comparison module 110. In this embodiment, the age estimation comparison module 110 comprises a large error estimator, such that the output of the age estimation comparison module 110 comprises an age estimation error.
[0101]
[0100] The age estimation error output 124 may be determined by the age estimation comparison module 110 using patch-wise inference. Patch-wise inference refers to a technique for detecting significant errors or anomalies within large datasets by dividing the input (e.g., images, time-series signals) into smaller, localized patches. Each patch is independently processed by an encoder, such as a transformer or language model, to generate feature representations. A reconstruction or prediction mechanism then estimates expected values for each patch, and deviations between actual and predicted outputs are computed to identify errors. This localized approach enables efficient detection of large-scale anomalies without requiring full-sequence or full-image processing, improving scalability and accuracy in systems handling high-dimensional or multimodal data.
[0102]
[0101] The age estimation error is determined based on age distributions generated by the age estimation model 106.
[0103]
[0102] The age estimation error output 124 is then provided as input to a VLM 120. The estimated ages 108a, 108b and image 102 are also provided as input. The prompt 122 provided to the VLM 120 instructs the VLM 120 to determine if the error is correct, thus outputting a final age estimation error output 126. The prompt 122 may comprise a system prompt and a user prompt. As an example, the prompt may comprise:
[0104] System prompt: <explanation of spoof attack>
[0105] User prompt: <whole image tokens> According to our age-estimation expert, the person in the image is <prediction 1> years old. According to our error detector, the expert is <prediction 3: accurate / inaccurate>. Is the age-estimation expert accurate? Or is the age-estimation expert making a large error?
[0106]
[0103] Here, the age estimation expert refers to the age estimation model 106.
[0107]
[0104] If it is determined that there is an error in the estimated ages, a user may be prompted to attempt the age estimation process again. This is because the estimated ages 108a, 108b are unreliable for the purpose of determining the user’s age, and / or whether there is a spoof attack. If no error is detected, the system trusts the estimated ages 108a, 108b, and so uses the estimated ages 108a, 108b to determine if there is a spoof attack. Further, if the estimated ages 108a, 108b are determined to be accurate, they can be used to determine if the user meets an age criterion.
[0108]
[0105] The final age estimation error output 126 output by the VLM 120 is more accurate than the age estimation error output 124 output by the age estimation comparison module 110. Therefore, the error determined by the VLM 120 is considered the correct error and may override the error determined by the age estimation comparison module 110.
[0109]
[0106] In some embodiments, the estimated ages 108a, 108b are only provided to the VLM 120 for analysis if the age estimation error output 124 indicates that there is an error in the age estimation. In this case, the VLM 120 is used to verify the error. If no error is detected by the age estimation comparison module 110, then no data is sent to the VLM 120 for processing. Instead, the age estimation comparison module 110 determines if there is a spoof attack based on the estimated ages 108a, 108b.
[0110]
[0107] Figure 1 C provides example method 100C, in which a VLM 120 is used in place of a specifically trained age estimation model 106.
[0111]
[0108] In this embodiment, the image portion 104 and the image 102 are each independently provided to the VLM 120 with a prompt 122. The prompt 122 asks the VLM 120 to determine the age of the user based on the input image or image portion. In this way, the VLM 120 acts in a similar way to the age estimation model 106. The VLM 120 outputs an estimated age 108 for each input image or image portion.
[0112]
[0109] While Figure 1 C only shows two image inputs (image 102 and image portion 104) being provided to the VLM 120, multiple different views, or portions, of the image 102 may be provided to the VLM 120 to obtain multiple age estimations 108. In addition, permutations of the system and / or user prompts may be generated and provided to the VLM 120.
[0110] The different views of the image 102 may be referred to as variations of the image 102, and may include different crops, masked portions, and / or colour augmentations. In addition to the estimated ages 108 generated by the VLM 120 when provided with these variations, logarithmic probabilities (logprobs) are output by the VLM 120 which provide a probability of the estimated ages over the variations provided as input. The logprobs are also provided to the age estimation comparison module 110 for processing. In some embodiments, only some of the logprobs are provided to, or processed by, the age estimation comparison module 110.
[0113]
[0111] Based on the outputs of the VLM 120 for the different image portions and prompts, the variance of the estimated ages 108 is computed by the age estimation comparison module 110. Within the context of machine learning, a higher variance is an indication that a hallucination is detected. In the method 100C of Figure 1 C, deception detection is framed as hallucination detection using the VLM 120. Thus, a higher variance is an indication that there is a spoof attack.
[0114]
[0112] The age estimation comparison module 110 takes the logprobs of the estimated ages 108 as input, determines a variance between the logprobs, and determines based on the variance if there is a spoofing attack. The age estimation comparison module 110 may only consider the logprobs for the most likely predicted age for a given variation of the image. The variance between these logprobs is then computed and used to identify a spoof attack.
[0115]
[0113] There may be a predefined threshold variance, above which the age estimation comparison module 110 determines that there is a spoof attack, for example. The age estimation comparison module 110 generates output 112 indicating whether the image 102 comprises a spoof attack. The predefined threshold variance may be tuned for each application domain. This enables high risk activities to have more stringent anti-spoofing checks, while lower risk activities may have less stringent checks.
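The variance test described above can be sketched in a few lines of Python. This is a minimal sketch only: the input layout (one dictionary of age to logprob per image variation), the function names, and the use of the population variance are assumptions, and the threshold value would be tuned per application domain as noted above.

```python
import statistics


def top_logprobs(logprobs_per_variation):
    """Keep, for each image variation, the logprob of its most likely age."""
    return [max(lp.values()) for lp in logprobs_per_variation]


def is_spoof_by_variance(logprobs_per_variation, threshold):
    """Flag a spoof attack when the variance of the top logprobs exceeds
    the predefined threshold. Returns (flag, variance)."""
    tops = top_logprobs(logprobs_per_variation)
    variance = statistics.pvariance(tops)
    return variance > threshold, variance
```

For example, three variations whose top logprobs are all close to -0.1 give a variance near zero, whereas top logprobs of -0.1, -1.5 and -0.9 give a much larger variance and would be flagged under a small threshold.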
[0116]
[0114] Figure 1 D provides a further method 100D for identifying a spoof attack. In method 100D, the image 102 and image portion 104 are provided to age estimation models 106 to obtain estimated ages 108a, 108b. The age estimation model 106 also outputs, for each input image or image portion, an activation output. The activation output may be a class activation map (CAM). The activation output indicates the contribution of areas, or pixels, of the image or image portion input to the estimated age 108.
[0117]
[0115] In the example of Figure 1 D, the activation outputs are processed to compute a heatmap for the CAM, such that the contribution of each pixel of the image is represented as a heatmap 126a, 126b. While these heatmaps 126a, 126b are shown in Figure 1 D as being generated prior to input to the age estimation comparison module 110, it will be appreciated that the activation output may be provided to the age estimation comparison module 110, which generates therefrom the heatmaps 126a, 126b.
[0118]
[0116] It can be seen in the example of Figure 1 D that the heatmap 126b derived from the full image 102 comprises an area of high activation 128 around the chin of the user captured in the image 102. This indicates that the estimated age 108b is derived predominantly from the pixels in this area. In contrast, the heatmap 126a derived from the image portion 104 has a uniform activation across the portion of the user present in the image portion 104, thus indicating that the estimated age 108a is consistent in the image portion 104.
[0119]
[0117] The area of high activation 128 in heatmap 126b indicates an age deception object, such as a fake beard.
[0120]
[0118] The age estimation comparison module 110 processes the estimated ages 108a, 108b and the heatmaps 126a, 126b to determine if there is a spoof attack. In this example, not only is there a large difference in the estimated ages 108a, 108b, but there is also a high activation area 128 in the heatmap 126b derived from the image 102, which indicates that the estimated age 108b derived from the image 102 may not be accurate. Thus, the age estimation comparison module 110 can determine that the estimated age 108b derived from the image 102 should not be trusted. As such, the age estimation comparison module 110 determines that there is a spoof attack, and generates the output 112.
[0121]
[0119] In some embodiments, the heatmaps 126a, 126b may be used to verify that the correct image portion 104 has been used. In the example of Figure 1 D, the age estimation comparison module 110 can determine that the bottom portion of the image 102 corresponding to the area of the high activation area 128 should be masked for processing the image portion 104. The age estimation comparison module 110 can then verify that this area of the image has been masked in image portion 104, and thus determine that the age estimation 108a derived from the image portion 104 is suitable for identifying a spoofing attack.
[0122]
[0120] In another embodiment, the heatmaps 126a, 126b may be used to determine the image portion 104 or masking to be used. For example, the image 102 may be processed first by the age estimation model 106 to obtain the heatmap 126b. Based on the heatmap 126b, the portion of the image to be masked is determined. That is, the area of the image 102 comprising the high activation area 128 is to be masked. Thus, the top image portion 104 is obtained for processing by the age estimation model 106.
[0121] The same methodology may be applied to the background of the image 102. That is, if the estimated age 108 is influenced by the background, or by the user’s clothes for example, the system may determine that these areas should be masked in further image portions. Alternatively, the system may flag that the background has contributed to the age, and as a result limit the extent to which the image is used for estimating the user age. As an example, a threshold of 30% may be predefined, such that the image 102 is flagged if more than 30% of the heatmap energy is located in the background.
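The two heatmap-driven decisions above (which part of the image to mask, and whether the background contributes too much) can be illustrated with a short Python sketch. The heatmap is assumed here to be a list of rows of non-negative activation values, "energy" is assumed to mean the sum of those values, and `face_mask` is a hypothetical boolean map marking face pixels; none of these representations is prescribed by the method.

```python
def half_to_mask(heatmap):
    """Return 'top' or 'bottom': the half of the image holding more activation."""
    mid = len(heatmap) // 2
    top_energy = sum(sum(row) for row in heatmap[:mid])
    bottom_energy = sum(sum(row) for row in heatmap[mid:])
    return "bottom" if bottom_energy > top_energy else "top"


def background_energy_fraction(heatmap, face_mask):
    """Fraction of total heatmap energy lying outside the face region."""
    total = sum(sum(row) for row in heatmap)
    if total == 0:
        return 0.0
    background = sum(
        value
        for h_row, m_row in zip(heatmap, face_mask)
        for value, is_face in zip(h_row, m_row)
        if not is_face
    )
    return background / total


def flag_background_influence(heatmap, face_mask, threshold=0.30):
    """Flag the image if more than `threshold` of the energy is in the background."""
    return background_energy_fraction(heatmap, face_mask) > threshold
```

With a heatmap concentrated around the chin, `half_to_mask` would return "bottom", so the top image portion would be kept for age estimation.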
[0123]
[0122] It will be appreciated that the activation outputs may be processed in a form other than heatmaps 126a, 126b. Heatmaps are provided here by way of example only.
[0124]
[0123] Further, the age estimation model 106 may be replaced by a VLM 120 as described with reference to Figure 1 C. Other combinations of the methods set out in Figures 1 A-1 D may be implemented as desired to achieve different variations of the spoof attack identification.
[0125]
[0124] Figure 4 provides a further method 400 for determining if the user has attempted to spoof the age estimation system. The method 400 provides a modification to method 100, however it will be appreciated that the same modification could be made to method 200 or any variation of methods 100 and 200.
[0126]
[0125] In method 400, the image 102 of the user is received and the top (first) image portion 104 obtained. Here, the bottom half of the image 102 is masked to obtain the first image portion 104.
[0127]
[0126] The top image portion 104 is provided as input to an inpainting model 402. The inpainting model 402 is trained to reconstruct the masked or removed portion of the image 102 to generate a reconstructed image 404 of the user so as to conserve the age of the user in the top image portion 104. The inpainting model 402 may be referred to as an age-preserving inpainting model.
[0128]
[0127] The inpainting model 402 and its training will be described in more detail below.
[0129]
[0128] The reconstructed image 404 is provided to the age estimation model 106 to determine the first estimated age 108a. This is compared by the age estimation comparison module 110 to the second estimated age 108b as determined by the age estimation model 106 based on the image 102.
[0130]
[0129] In the example of Figure 4, the age estimation comparison module 110 determines that there is spoofing in the image 102.
[0130] The inpainting model 402 could be any known model, for example a U-Net, or a CNN with downsampling and upsampling filters that takes in an image, downsamples it with an encoder and upsamples it with a decoder to produce a new image as output.
[0131]
[0131] Method 400 shown in Figure 4 requires the additional step of inpainting, and thus is less computationally efficient than the methods 100, 200 of Figures 1 and 2. However, the method 400 has the advantage over the previous methods 100, 200 that the reconstructed image 404 more accurately represents the images of the training data set used to train the age estimation model 106 than the image portions 104, 204. This is because the training data set comprises images of complete human faces.
[0132]
[0132] That is, the inpainting creates "in-distribution" samples, i.e. samples that are closer to the natural faces in the training dataset. A face with the top / bottom part masked as used in methods 100 and 200 does not lie on the same image-manifold as the original image. The age estimation model 106 is able to estimate an age from the image portions 104, 204 because they are close to the image-manifold of the training data. However, the accuracy of the age estimation results is improved if the input image includes the whole face, rather than just a portion, as in the training data.
[0133]
[0133] Figure 5 illustrates how the inpainting model 402 is trained.
[0134]
[0134] In summary, to train the inpainting model 402, a facial image is masked to obtain a first image portion 104. The first image portion 104 is provided as input to the inpainting model 402, which inpaints to generate a reconstructed image 502a. A frozen age-estimation model is then called to obtain an estimated age or an age distribution 504a on the inpainted image 502a. The inpainting model is penalised for both differences in the predicted age and differences in the image itself.
[0135]
[0135] In this context, the frozen age-estimation model is a model that is pre-trained to predict an age, or distribution of ages, from a facial image, such as the age estimation model 106 used in the methods discussed above. The term “frozen” refers to its weights not being updated during training, i.e. this model is not trained but rather is used as a signal to train the inpainting model.
[0136]
[0136] The training of the inpainting model 402 will now be described in more detail. It will be appreciated that the specific implementation described is provided by way of example only, and that suitable substitutes may be used instead, as will be apparent to the skilled person.
[0137] Training the inpainting model 402 may start with a pre-trained general inpainting model with pre-trained weights, or the training may start from scratch with an untrained model with random weights.
[0137]
[0138] First the images of the training set are aligned and cropped to the input image size of the inpainting model 402, here (256,256,3), where the 3 indicates that the image is an RGB image. Each ground truth image is denoted X and the ground truth (known) age is denoted y.
[0138]
[0139] Next, for each (image, label) = (X,y) in the training set, part of the image 102 is masked. Masking may be achieved by using black pixels, white noise, or the mean of the training data, to obtain a new image Z, also referred to as the first image portion 104. The masking may be applied to the top part of the image 102, bottom part of the image 102, or any combination of patches of the image 102, either at random or specifically chosen.
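The masking step can be sketched as follows. This is an illustrative Python sketch under simplifying assumptions: the image is represented as a list of rows of pixel values, only the top-half and bottom-half cases are shown, and the fill value defaults to black. The function name `mask_region` is hypothetical.

```python
def mask_region(image, region="bottom", fill=0):
    """Return a copy of `image` with the top or bottom half replaced by `fill`.

    `fill` may be a scalar for a single-channel image, or e.g. (0, 0, 0)
    for an RGB image stored as rows of pixel tuples.
    """
    height = len(image)
    mid = height // 2
    rows_to_mask = range(mid, height) if region == "bottom" else range(0, mid)
    masked = [list(row) for row in image]  # copy so the ground truth X is kept
    for r in rows_to_mask:
        masked[r] = [fill for _ in image[r]]
    return masked
```

Masking with white noise or with the mean of the training data would follow the same pattern, with the fill value computed per pixel.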
[0139]
[0140] The image portion 104 is then passed through the inpainting model 402 to inpaint it and the inpainted image 502a, 502b is obtained. The inpainted image 502a, 502b is denoted X*.
[0140]
[0141] The inpainting model 402 inpaints the whole image, that is, it generates a complete new image. In some embodiments, the inpainted image 502a, 502b is the whole inpainted image. In other embodiments, the inpainted image 502a, 502b is a combination of the inpainted patches with the remaining ground truth (un-masked) patches. This second embodiment, which is preferred, trains the model 402 in a manner similar to teacher forcing. Which of these embodiments is used may depend on the method used at inference. That is, if in use the image processed by the age-estimation model 106 is a combination of the ground truth and the reconstructed image, the image used in training is also a combination.
[0141]
[0142] Since it is the masked part of the image 102 which is of interest for deception detection, the combination image may be preferable in such use cases.
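The combination image can be illustrated with a short Python sketch, assuming the ground truth image, the inpainted image and a boolean mask (True where pixels were masked) are all stored as lists of rows of equal size. The function name `composite` and this representation are assumptions for illustration.

```python
def composite(ground_truth, inpainted, mask):
    """Take inpainted pixels where `mask` is True and ground truth pixels elsewhere."""
    return [
        [inp if masked else gt for gt, inp, masked in zip(gt_row, inp_row, m_row)]
        for gt_row, inp_row, m_row in zip(ground_truth, inpainted, mask)
    ]
```

In this sketch only the masked region is taken from the output of the inpainting model, so any age difference measured on the result is attributable to the reconstructed part of the face.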
[0142]
[0143] Once the inpainted image X* 502a, 502b has been generated, it is passed through the frozen age model to obtain the predicted age y*. The predicted age y* may be either a float (e.g. a specific age) or a distribution of ages depending on the choice of age estimation model.
[0143]
[0144] The total loss is a weighted combination of a reconstruction loss L_rec 508a, 508b and an age loss L_age 506a, 506b, where L_rec(X, X*) = dist_A(X - X*) and L_age(y, y*) = dist_B(y, y*). Here dist_A is any p-norm or metric applied to the vector X - X*, and dist_B is a loss comparing the distance between y and y*:
[0144] • If y, y* are distributions of ages, then dist_B is a statistical distance between the distributions, for example the KL-divergence or Earth mover’s distance (EMD) between the two distributions.
[0145] • If y, y* are scalar ages, then dist_B can be any distance metric between them (L1, L2, or Lp in general).
[0146]
[0145] LPIPS is the perceptual similarity loss, which is a metric used to measure the similarity between two images.
[0147]
[0146] In some embodiments, the LPIPS loss may not be included in the total loss equation. In other embodiments, the LPIPS loss may be included instead of L_rec(X, X*), and in such an embodiment may be referred to as the reconstruction loss as it measures the image similarity between the ground truth image and the reconstructed image.
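One reading of the total loss that is consistent with the definitions above is sketched below. The weights λ_rec, λ_age and λ_LPIPS are assumed symbols introduced here for illustration, and the L2 norm is shown only as one possible choice of dist_A; setting λ_LPIPS to zero, or λ_rec to zero, corresponds to the two embodiments just described.

```latex
\begin{aligned}
L_{\text{total}} &= \lambda_{\text{rec}}\, L_{\text{rec}}(X, X^{*})
                  + \lambda_{\text{age}}\, L_{\text{age}}(y, y^{*})
                  + \lambda_{\text{LPIPS}}\, \mathrm{LPIPS}(X, X^{*}) \\
L_{\text{rec}}(X, X^{*}) &= \mathrm{dist}_A(X - X^{*}),
  \quad \text{e.g. } \lVert X - X^{*} \rVert_2 \\
L_{\text{age}}(y, y^{*}) &= \mathrm{dist}_B(y, y^{*})
\end{aligned}
```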
[0148]
[0147] The losses 506, 508 set out above are backpropagated to train the inpainting model. The inpainting model trained based on these losses is capable of accurately reconstructing the image in an age-preserving way.
[0149]
[0148] In the example above, y is the ground truth age. This is the known age of the person shown in the ground truth image. Where a float is used, this is the age used. However, where a distribution is needed, a Gaussian distribution is generated around the ground truth age y. The standard deviation used may be, for example 3 years, but other standard deviations will be apparent.
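As an illustration of the distribution case, the following Python sketch builds a discrete Gaussian target around a ground truth age and compares two such distributions with a one-dimensional Earth mover's distance. The integer age bins from 0 to 100, the function names, and the choice of EMD over KL-divergence are assumptions made for the sketch.

```python
import math


def gaussian_age_distribution(age, std=3.0, min_age=0, max_age=100):
    """Discrete Gaussian over integer ages, normalised to sum to 1."""
    weights = [
        math.exp(-0.5 * ((a - age) / std) ** 2)
        for a in range(min_age, max_age + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def earth_movers_distance(p, q):
    """1-D EMD between two distributions defined on the same unit-width bins.

    Computed as the sum of absolute differences of the cumulative sums.
    """
    cumulative = 0.0
    distance = 0.0
    for p_i, q_i in zip(p, q):
        cumulative += p_i - q_i
        distance += abs(cumulative)
    return distance
```

Under these assumptions, two targets centred on ages 25 and 35 are separated by an EMD of approximately 10 years, so this choice of dist_B grows with the size of the age error.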
[0150]
[0149] In some embodiments, instead of using the known age of the user as the ground truth age, the ground truth image 102 may be passed through the frozen age-estimation model 106 to derive the age or age distribution.
[0150] The training data set may be that used to train the age-estimation model 106.
[0151] The same training data set may be used because the age-estimation model 106 is being queried to determine if the image has been inpainted in an age-preserving way.
[0152]
[0151] The data set may comprise a set of facial images of people over a large range of ages. For example, the people may be between 3 and 80 years old. It will be appreciated that other age ranges may be used, for example 5 to 70 years, 1 to 90 years, etc. The images may be in any image format. The people in the images may be facing the camera, or substantially facing the camera. The people may display a range of different features such as facial hair, glasses, different skin colour, etc. The purpose of including people exhibiting different features is to expose the model to all types of features that it may come across in use. The model may be trained on millions of images.
[0153]
[0152] The training data set may be sampled for training. That is, the training data set may comprise different numbers of images of people at different ages. For the purpose of training, the training data set may be sampled to select the same number of images of people at different ages, so that each age is represented by the same number of images.
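A minimal Python sketch of such age-balanced sampling is given below. The dataset is assumed to be a list of (image identifier, age) pairs, and ages with fewer than the requested number of images are sampled with replacement; both the data layout and that oversampling choice are assumptions, as is the function name `balanced_sample`.

```python
import random
from collections import defaultdict


def balanced_sample(dataset, per_age, seed=0):
    """Return `per_age` (image_id, age) pairs for every age in `dataset`."""
    rng = random.Random(seed)
    by_age = defaultdict(list)
    for item in dataset:
        by_age[item[1]].append(item)
    sample = []
    for age in sorted(by_age):
        items = by_age[age]
        if len(items) >= per_age:
            sample.extend(rng.sample(items, per_age))  # without replacement
        else:
            sample.extend(rng.choices(items, k=per_age))  # assumed: oversample rare ages
    return sample
```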
[0154]
[0153] The batch size for training may be 512. The initial learning rate may be 0.003, which reduces to 0.001 after plateauing.
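The learning rate reduction can be sketched as a simple plateau rule. In this Python sketch, "plateauing" is assumed to mean that the validation loss has not improved for a fixed number of consecutive epochs (`patience`); that criterion, the class name and the single-step reduction are assumptions, since only the two learning rate values are given above.

```python
class PlateauSchedule:
    """Drop the learning rate from 0.003 to 0.001 once the loss plateaus."""

    def __init__(self, initial_lr=0.003, reduced_lr=0.001, patience=3):
        self.lr = initial_lr
        self.reduced_lr = reduced_lr
        self.patience = patience  # assumed number of stale epochs tolerated
        self.best = float("inf")
        self.stale_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss and return the current rate."""
        if val_loss < self.best:
            self.best = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.lr = self.reduced_lr
        return self.lr
```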
[0155]
[0154] Once trained, the age-preserving inpainting model 402 can be used to create augmentations of the training dataset, by generating a larger number of images for which the age is known. The noise used during inpainting can be controlled, and so a wide range of images with the ground truth age can be generated from a much smaller number of original images. The inpainting model 402 could also be used to add props to the images as augmentations in a way that ensures we maintain the age of the person with high fidelity. These augmentations can help improve the age-estimation model 106 in the future, using the more diverse training set comprising reconstructed images which have not altered the ground truth age.
[0156]
[0155] This is also useful for asking what-if questions (counterfactuals) and making claims about test distribution from the inpainted dataset.
[0157]
[0156] The inpainting method of Figure 4 may be modified to use a VLM 120 in place of the inpainting model 402, as shown in Figure 4A. This removes the requirement to train the inpainting model 402.
[0157] In method 400A, the image portion 104 is provided to the VLM 120. A prompt 122 is also provided, which instructs the VLM 120 to inpaint the image in an age-invariant way. For example, the prompt 122 may be “please fill in the missing part of the face to preserve the age of the person”. The VLM 120 then outputs the reconstructed image 404 for use by the age estimation model 106 in estimating the user’s age. The method 400A is otherwise the same as that of method 400.
[0158]
[0158] In both methods 400 and 400A, the methods may be modified to include other features of method 100A-100D as desired.
[0159]
[0159] As an example, the age estimation models 106 may be replaced by VLMs as described with reference to Figure 1 C. The variance derived based on the reconstructed image 404 may be used to determine the quality of the inpainting by the VLM 120. For example, the inpainting, and so reconstructed image 404, may not be trusted if the hallucination metrics are high, or over a predefined threshold. However, if the hallucination metrics are low, or under the threshold, then the inpainting is trusted.
[0160]
[0160] Figure 6 provides an example system for implementing the methods provided above.
[0161]
[0161] An age estimation system 608 is provided, which comprises a deception object detection module 610, an age estimation module 612, and an age deception detection module 614. The age estimation system 608 communicates with a user device 604 of a user 602 via a network 606 such as the Internet.
[0162]
[0162] The age estimation system 608 may comprise a server with one or more processors for executing programs to deliver the functionality of each of the modules 610, 612, 614. The programs may be stored in a memory at the age estimation system 608. The server may be a cloud-based server.
[0163]
[0163] The age estimation system 608 may be provided over multiple servers. For example, each module 610, 612, 614 may be provided by a different server. The servers can communicate with each other via a network.
[0164]
[0164] The server of the age estimation system 608 may also be used to train the age-preserving inpainting model, as described with reference to Figure 5. Alternatively, the training may take place at another server comprising one or more processors, and the model provided to the age estimation system 608 once trained.
[0165]
[0165] The user 602 captures their image 102 at the user device 604 using a camera of the user device 604. The image 102 is sent by the user device 604 via the internet 606 to the age estimation system 608 for processing. The age estimation system 608 may provide an age estimation result to the user device 604. The age estimation result may be an indication of an estimated age or age range, or an indication that spoofing has been identified.
[0166]
[0166] The age estimation system 608 may instead provide the age estimation result to another system for which the user’s age is required, such as an age-restricted website.
[0167]
[0167] The age estimation module 612 comprises the age estimation model 106. The age estimation module 612 processes the image 102 of the user and / or image portions 104, 204, 304, 306, 308, 310, and / or the reconstructed image 404 to estimate an age of the user 602 as depicted in the image or image portion.
[0168]
[0168] The age deception detection module 614 comprises the inpainting model 402. The age deception detection module 614 receives the image 102 of the user and obtains the image portion or reconstructed image for providing to the age estimation module 612.
[0169]
[0169] The age deception detection module 614 also determines if the image 102 comprises a spoofing attack based on the estimated ages received from the age estimation module 612.
[0170]
[0170] The deception object detection module 610 may be used to determine whether the image 102 comprises a feature which may be a deception object, and thus whether the methods 100, 200, 400 are to be used.
[0171]
[0171] A deception object as referred to herein is an object or feature which may be used to alter the user’s age, thereby deceiving or spoofing the age estimation system 608. Such objects include fake beards and moustaches, bald caps to imitate baldness, and hats. Other objects which may have the same effect will be apparent.
[0172]
[0172] The deception object detection module 610 comprises an age deception object detection model (not shown) which is trained to identify features within the image 102 which may be a deception object. The age deception object detection model may be trained, for example, to identify fake beards and moustaches, bald caps, hats, glasses, exaggerated or drawn-on wrinkles, and fringes. Models trained to identify such features in images are known in the art, and as such will not be described in more detail herein.
[0173]
[0173] If any of these features are identified, the deception object detection module 610 indicates to the age deception detection module 614 that the image 102 may comprise a spoofing attack. This causes the age deception detection module 614 to obtain the image portion or reconstructed image for providing to the age estimation module 612.
[0174] The deception object detection module 610 may indicate to the age deception detection module 614 which feature has been identified. This allows the age deception detection module 614 to obtain a first image portion 104 which does not include the identified feature. In this way, the first image portion 104 more accurately represents the age of the user 602.
[0174]
[0175] If, however, no features which may be deception objects are identified in the image 102, the image 102 may not be provided and / or processed by the age deception detection module 614. Instead, only the image 102 is processed by the age estimation module 612 and the age of the user 602 estimated without any additional image processing or age estimation.
[0175]
[0176] While in some cases the features identified by the deception object detection module 610 are real or not used to deceive the system, in other cases the features are fake and used as a spoofing mechanism. By identifying all images 102 which contain a feature which may cause a change in age to be estimated, the system 608 can check using the methods set out herein whether the features have caused the estimated age of the user to be altered. In this way, the system 608 is able to more accurately identify spoofing attacks for a larger range of spoofing mechanisms.
[0176]
[0177] By including the deception object detection module 610 in the age estimation system 608, only images 102 which are identified as possible spoof attacks are processed by one of the methods 100, 200, 400 provided herein. The identification of the possible deception object may be a less computationally intensive process than age estimation. The additional steps of obtaining the first image portion, generating the reconstructed image, and age comparison are also avoided where no possible deception objects are detected. As such, by first processing the image 102 to identify such objects, the efficiency of the system is improved.
[0177]
[0178] The deception object detection model can more easily detect age deception objects where there are more present in an image. However, the masking methods set out herein can more easily detect age deception objects where only one is used due to the larger difference in estimated ages. As such, it is advantageous to use the two mechanisms together, thereby identifying a wider range of spoofing attacks.
[0178]
[0179] In some embodiments, a VLM 120 may be used for deception object detection instead of a specifically trained deception object detection model. A prompt 122 is provided to the VLM asking it if there is a deception object present, or to identify the deception object. The system prompt may provide an explanation of deception objects.
[0180] The method 100C may be modified to identify deception objects, with these user and system prompts. The variance and logprobs output by the VLM 120 may be used to identify deception objects in a similar way.
[0179]
[0181] In some embodiments, the image 102 is passed to the age deception detection module 614 if there is a low confidence associated with the identified age deception object by the deception object detection module 610. In this embodiment, where there is a high confidence that there is an age deception object present in the image 102, the system 608 determines that there is a spoofing attack and so no further processing of the image 102 is required.
[0180]
[0182] However, where there is a low confidence that there is an age deception object present in the image, there is a level of uncertainty in the system as to whether or not there is a spoofing attack in the image 102, and therefore further processing of the image is required.
[0181]
[0183] If the confidence level associated with the age deception object is too low, the system may determine that no age deception object is present in the image 102, and as such the image 102 is only processed subsequently by the age estimation module 612.
[0182]
[0184] The cut-off confidence levels for when to determine there is a spoofing attack based only on the output of the deception object detection module 610, when to require further processing by the age deception detection module 614, and when to accept there is no age deception object may be chosen based on the accuracy of the deception object detection model and / or the use case. As an example, a confidence above 80% may result in a determination that there is a spoofing attack, a confidence between 20% and 80% requires further processing by the age deception detection module 614, and a confidence level below 20% requires no further deception detection processing. It will be apparent to a skilled person how to select such confidence levels.
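The three-way routing on detector confidence can be sketched as below, using the example cut-offs of 80% and 20%. The function name, the returned labels, and the treatment of values exactly at a cut-off (routed to further processing) are assumptions for illustration.

```python
def route_by_confidence(confidence, high=0.80, low=0.20):
    """Decide the next step from the deception object detector's confidence."""
    if confidence > high:
        return "spoof_attack"  # determined from the detector output alone
    if confidence < low:
        return "no_deception_check"  # age estimation on the full image only
    return "further_processing"  # pass to the age deception detection module
```

For a higher-risk use case the `high` cut-off might be lowered so that more images are rejected outright, while the `low` cut-off might be lowered so that fewer images skip the additional checks.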
[0183]
[0185] In an age estimation system 608 in which the deception object detection module 610 is not included, it is preferable to estimate the user’s age based on two or more image portions 104, 204. In this way, at least one image portion 104, 204 is likely to be free of age deception objects.
[0184]
[0186] In some embodiments, one or more models 106, 402 are replaced with VLMs 120, not shown in Figure 6. There may be a single VLM 120 provided in the age estimation system 608, such that the same VLM 120 is used irrespective of the purpose of its use. This is possible because the VLM 120 is a general-purpose model, the function of which is defined by the prompt 122 provided to the VLM 120 in use.
[0187] Thus, the VLM 120 may be provided as a separate module in the age estimation system 608. Alternatively, the modules 610, 612, 614 may comprise one or more VLMs which are used for their specific purpose. In another embodiment, the VLM 120 is not provided by the age estimation system 608, but rather by an external component with which the age estimation system 608 interacts.
[0185]
[0188] The methods provided herein are not limited to being provided by an age estimation system 608. In some instances, a machine learning model, such as a VLM 120 may be queried to determine if a spoof attack is present. The VLM 120 may call the functions or models described herein to execute the methods presented herein.
[0189] Figure 7 provides an example method 700 for granting or refusing access of a user to an age-restricted activity. The age-restricted activity may be access to an age-restricted website or an age-restricted sale, for example.
[0190] The method of Figure 7 may be implemented by any suitable system or combination of systems. For example, the method may be executed by a system for providing the age-restricted activity, where the age estimation system 608 is a subsystem within said system. Alternatively, the age estimation system 608 may be a separate system to the system for providing the age-restricted activity, where the two systems communicate to allow the user access or not.
[0191] In the example below, the method is referred to as being implemented by an access system. The access system determines whether to allow a user to access the age-restricted activity based on age estimation, and comprises the components of the age estimation system 608 described above.
[0192] An access request for requesting access to the age-restricted activity is received at the access system at step S702. An image 102 may be received with the request. Alternatively, the access system may send an image request to the user device in response to the access request. The user device responds to the image request by transmitting the image 102 to the access system.
[0193] At step S704, it is determined whether there is a possible age deception object present in the image 102.
[0194] If no possible age deception object is present, the method proceeds to step S706 where the image 102 is provided as input to the age estimation model. As discussed above, since there is no feature present in the image 102 which may have been included so as to alter the perceived age of the user, the image 102 can be processed without considering an age of the user as presented in only a portion of the image.
[0195] The method then proceeds to step S720, where it is determined if the estimated age is an allowable age for accessing the age-restricted activity. If it is, the access request is accepted at step S724, or if it is not, the access request is denied at step S722.
[0196] However, if a possible age deception object is present in the image 102, the method proceeds to step S708, where the first image portion 104 is extracted from the image 102. The location within the image 102 of the image portion 104 may be based on the location of the age deception object, such that the first image portion 104 does not contain the age deception object. This may be identified based on the output of the deception detection model or based on an activation output, for example.
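As a non-limiting sketch of step S708, the first image portion could be chosen as the larger full-width band of the image lying entirely above or entirely below the detected age deception object. The helper names, the row-based bounding box and the list-of-rows image representation are all assumptions made for illustration; other ways of locating the portion, such as an activation output, are equally possible.

```python
from typing import List, Tuple


def band_excluding_object(height: int, obj_top: int, obj_bottom: int) -> Tuple[int, int]:
    """Return (start_row, end_row) of the larger full-width band that avoids the object.

    obj_top is the first row and obj_bottom the one-past-last row of a
    hypothetical bounding box reported for the age deception object.
    """
    if not 0 <= obj_top < obj_bottom <= height:
        raise ValueError("bounding box must lie inside the image")
    above = (0, obj_top)          # rows above the object
    below = (obj_bottom, height)  # rows below the object
    # Prefer whichever band keeps more of the image; ties go to the upper band.
    return max(above, below, key=lambda band: band[1] - band[0])


def extract_portion(image: List[List[int]], obj_top: int, obj_bottom: int) -> List[List[int]]:
    """Crop the image (a list of pixel rows) to the band without the object."""
    start, end = band_excluding_object(len(image), obj_top, obj_bottom)
    return image[start:end]
```

With a 200-row image and a false beard detected in rows 120 to 189, `band_excluding_object(200, 120, 190)` selects rows 0 to 119, that is, the top portion of the face.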
[0197] It will be appreciated that a further option for rejecting the request could follow a positive determination that an age deception object is present in the image at step S704, if the system is suitably confident that there is a spoofing object present in the image 102, as discussed with reference to Figure 6. One advantage of not including this step is that the user may nonetheless be of an age which is authorised to access the age-restricted activity.
[0198] The first image portion 104 is then provided as input to the age estimation model 106 at step S710 to obtain the first estimated age 108a. The image 102 is provided as input to the age estimation model at step S712 to obtain the second estimated age 108b.
[0199] At step S714, the estimated age difference between the first and second estimated ages is determined. At step S716, the estimated age difference is compared to an age difference threshold to determine if the difference in estimated ages is indicative of a spoofing attack.
[0200] If the estimated age difference is not above the threshold, this indicates that there is no spoofing attack present in the image 102. In this case, the method moves to step S720, where it is determined if the estimated age is an allowed age for accessing the age-restricted activity, that is, if the estimated age meets an age criterion. The estimated age used in this determination may be the estimated age 108a, 108b which falls closest to the age threshold for the age-restricted activity. In some embodiments, both the first estimated age 108a and the second estimated age 108b are assessed with respect to the age criterion and the age criterion is only determined to be met if both estimated ages 108a, 108b meet the age criterion.
[0201] If the estimated age is an allowable age, the access request is accepted, step S724, and the user is granted access to the age-restricted activity. If instead the estimated ages are not allowable, the access request is rejected at step S722 and the user is denied access to the age-related activity.
[0202] If, at step S716, it is determined that the estimated age difference is above the threshold, the access request may be denied. However, as an alternative and as shown in Figure 7, the method may move to step S718, where it is determined if the estimated ages are near the age threshold for the age-restricted activity. If they are near the age threshold, then the request is rejected at step S722. For example, if the age threshold is 18, and the estimated ages are 21 and 25, the estimated ages may be determined to be too close to the age threshold and as such the system cannot be suitably satisfied that the user is over the age threshold.
[0203] If, at step S718, it is determined that the estimated age is not near the age threshold, the method moves to step S720 where the estimated age is compared to the age criterion to determine if it is allowable. As above, if the estimated age is allowable, the access request is granted at step S724, and if not, it is rejected at step S722.
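The decision flow of steps S704 to S724 can be summarised in a short sketch. This is one possible reading only: the function name `decide_access` and all default values are hypothetical, the age difference is taken as an absolute value, and the variant in which both estimated ages must meet the age criterion is assumed.

```python
from typing import Optional


def decide_access(
    full_age: float,
    portion_age: Optional[float] = None,
    *,
    age_threshold: float = 18.0,
    diff_threshold: float = 4.0,
    near_margin: float = 10.0,
) -> bool:
    """Return True to accept the access request (S724), False to deny it (S722).

    portion_age is None when no possible age deception object was found at
    S704, in which case only the whole-image estimate is used (S706).
    """
    if portion_age is None:
        return full_age >= age_threshold                      # S706 -> S720

    ages = (portion_age, full_age)                            # S710, S712
    if abs(full_age - portion_age) > diff_threshold:          # S714, S716
        closest = min(ages, key=lambda a: abs(a - age_threshold))
        if abs(closest - age_threshold) < near_margin:        # S718
            return False                                      # S722
    # S720: assumed variant in which every estimate must meet the criterion.
    return all(a >= age_threshold for a in ages)
```

With an age threshold of 18, estimated ages of 21 and 25 and an assumed difference threshold of 3 years, `decide_access(25.0, 21.0, diff_threshold=3.0)` denies the request, because the ages differ by more than the threshold and the closer age lies within the assumed 10-year margin.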
[0204] The term “near” may refer to any age gap which is suitable for providing sufficient certainty for a given implementation. For example, the estimated age may be considered near to the threshold age if the age closest to the threshold age for the age-related activity is less than the threshold age difference from the threshold age. Alternatively, the nearness may be determined based on an accuracy of the age estimation model. For example, the estimated age may be considered near to the threshold age if the estimated age is within one standard deviation of the threshold age for the given age estimation model 106. Other differences in the ages may be used as appropriate. For example, a difference in ages may be predefined, such as 10 years, where the estimated age is considered near to the threshold age for the age-related activity if it is within 10 years of the threshold age.
[0205] Figure 8 provides two graphs which may be used to determine estimated age difference thresholds. In the examples, the results are based on the detection of beards or moustaches, but it will be appreciated that they can be extended to other age deception objects.
[0206] The left-hand graph is a receiver operating characteristics (ROC) graph, plotting the false positive rate (FPR) against the true positive rate (TPR), where positive is a prediction of an attack.
[0207] The ROC graph is used to identify, for one of the methods 100, 200, 400, the estimated age difference threshold which provides a suitable attack detection rate. The ROC graph applies to the specific method 100, 200, 400 it is generated for.
[0208] In this example, the graph relates to a deception object detection model which is trained to detect fake beards and moustaches. A set of images are processed by the deception object detection model to determine if they contain a fake beard or moustache. Any number of images may be used to derive the graph, however it will be appreciated that the graph more accurately represents the accuracy of the method and model where the set of images is larger. Each of the images is also processed using one of the methods 100, 200, 400, where two (or more) ages are estimated and the estimated age difference calculated. In this example, method 100 is used, and the age difference is calculated by subtracting the age estimated for the full-face image from the age estimated from the top part of the image.
[0209] In the context of the example of Figure 8, a true positive is a correctly detected fake beard or moustache (a correctly identified attack), and a false positive is an incorrectly detected fake beard or moustache (an incorrectly identified attack). The graph provides the TPR and the FPR plotted for different estimated age difference thresholds. The graph can be seen to have an area under curve (AUC) of 0.778.
[0210] The estimated age difference threshold can be chosen to obtain a required accuracy. For example, it may be decided that a FPR of 5% is acceptable, that is, that 5% of images which do not contain a spoofing attack will be incorrectly identified as attacks. Other acceptable FPRs may be chosen, and the choice of FPR may be dependent on the intended use of the age estimation system. Looking at the ROC graph, it can be seen that this allowable FPR corresponds to an estimated age difference threshold of -4 years. That is, the age estimated for the full face image 102 is 4 years older than that for the top portion of the face.
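One way such a threshold could be read off programmatically is sketched below. The helper names are hypothetical and the data in the usage note is a toy set invented purely for illustration, not results from Figure 8. The sketch assumes the sign convention of the example (age of the top portion minus age of the full face), so that an image is flagged as an attack when its difference is at or below the threshold.

```python
from typing import List, Optional, Sequence, Tuple


def roc_points(diffs: Sequence[float], is_attack: Sequence[bool]) -> List[Tuple[float, float, float]]:
    """Return (threshold, FPR, TPR) for every candidate threshold, in ascending order."""
    positives = sum(is_attack)
    negatives = len(is_attack) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one attack and one genuine image")
    points = []
    for t in sorted(set(diffs)):
        flagged = [d <= t for d in diffs]  # predicted attack
        tp = sum(f and a for f, a in zip(flagged, is_attack))
        fp = sum(f and not a for f, a in zip(flagged, is_attack))
        points.append((t, fp / negatives, tp / positives))
    return points


def pick_threshold(diffs, is_attack, max_fpr: float) -> Optional[Tuple[float, float, float]]:
    """Pick the least strict threshold whose FPR stays within max_fpr.

    FPR and TPR both grow as the threshold rises, so the last allowed point
    gives the best attack detection rate at the accepted FPR.
    """
    allowed = [p for p in roc_points(diffs, is_attack) if p[1] <= max_fpr]
    return allowed[-1] if allowed else None
```

For instance, with the toy differences `[-12, -9, -6, -5, -3, -1, 0, 1, 2, 3]`, of which the images at -12, -9, -6 and -3 are attacks, `pick_threshold(..., max_fpr=0.0)` selects a threshold of -6 years, which detects three of the four attacks without flagging any genuine image.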
[0211] Since the ROC graph is plotted using results generated by the models and methods that the chosen estimated age difference threshold is intended to be used for, it will be appreciated that the ROC graph plotted for a different combination of models and / or methods may be different, and therefore the estimated age difference threshold for an acceptable FPR may be different. The ROC graph in Figure 8 is provided by way of example only.
[0212] Other methods may be used to define the estimated age difference threshold. For example, the right-hand graph provides a graph plotting estimated age difference against the likelihood of a spoofing attack. The threshold age difference can be chosen by defining an allowable likelihood of attack and inferring from the graph the corresponding estimated age difference. As with the ROC graph, this graph is plotted based on results obtained using the model(s) and method for which the chosen estimated age difference threshold is to be used.
[0213] The graphs of Figure 8 are obtained by using the age estimation model 106 to estimate the age of an image and corresponding top (first) image portion for a set of images. The set of images contain people of different ages and genders, and some images contain age deception objects while others do not.
[0214] The ROC graph shown in Figure 8 is used in the context of a method which first attempts to identify an age deception object, and then uses age estimation of image portion(s) to determine if there is spoofing.
[0215] However, a similar ROC graph can be used to determine an estimated age difference threshold for use with each of methods 100, 200, and 400 without first detecting a false beard or moustache. Figure 9 provides an example ROC graph for this purpose on the left-hand side, and a magnified portion of the ROC graph on the right-hand side.
[0216] The ROC graph of Figure 9 shows the TPR and FPR for detecting large errors in age estimations. Large in the context of Figure 9 is 20 years or more, however any other error could be used instead, for example 5 years, 10 years or 30 years.
[0217] The ROC graph of Figure 9 is derived by processing a set of images by the age estimation model to estimate an age. In this instance, each image is of a human under the age of 16. The set of images comprises both images of humans who are not attempting to spoof the age estimation system, that is they are not using any age deception objects, and images of humans who are attempting to spoof the age estimation system, that is they are using age deception objects.
[0218] Where the estimated age is 20 years or more above the ground truth age, the image from which the estimated age is derived is processed using one of the methods 100, 200, 400 set out above, to obtain first and second estimated ages and the estimated age difference. In the example of Figure 9, method 100 is used, such that the lower portion of the face is masked to process the top portion of the face, and the estimated age of the top portion of the face is compared to the age estimation for the whole face.
[0219] The results are plotted in the ROC graph. Here, as above, a true positive is a correct detection that the human in the image is attempting to spoof the age estimation system, and a false positive is an incorrect detection that the human in the image is attempting to spoof the age estimation system.
[0220] Once the ROC graph has been plotted, showing TPR and FPR for different estimated age differences, an estimated age difference threshold can be selected. For example, a FPR of 0.2% may be deemed to be acceptable. This corresponds to a TPR of around 45%. The estimated age difference threshold is -18 years.
[0221] This allows mispredictions by the age estimation model to be identified.
[0222] In the examples set out above, the similarity measure is a difference in the estimated ages defined by a scalar quantity. This may be derived based on a scalar output or a probability distribution output from the age estimation model. However, it will be appreciated that other similarity measures may be used.
[0223] For example, if the model outputs a distribution of estimated age, the age distributions output by the age estimation model 106 can be compared with various known distribution measures, for example KL divergence, EMD, etc. These may each also be considered similarity measures. Any combination of similarity measures may be used to determine if a spoofing attack is present in the image.
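As an illustrative sketch, the two named measures could be computed for discrete age distributions as follows. It is assumed here that both distributions are probability vectors over the same equally spaced age bins; the function names and the small smoothing constant `eps` are not taken from the description.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL divergence D(p || q) between two discrete age distributions.

    eps avoids division by zero where q assigns no mass to a bin.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def earth_movers_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """1-D earth mover's distance, in units of bin width.

    For histograms on the same ordered bins this equals the sum of the
    absolute differences between the two cumulative distributions.
    """
    total, cumulative = 0.0, 0.0
    for pi, qi in zip(p, q):
        cumulative += pi - qi
        total += abs(cumulative)
    return total
```

For two distributions over four consecutive age bands, `[0.1, 0.6, 0.3, 0.0]` for the image portion and `[0.0, 0.1, 0.3, 0.6]` for the whole image, the earth mover's distance is 1.3 bin widths, indicating that the whole-image estimate is shifted substantially towards the older bands.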
[0224] If the model outputs a float number that represents the estimated age, scalar functions can be used for comparison, for example the L2 distance or other p-norms, etc.
[0225] Further, if the age estimation model outputs a binary age classifier, such as over or under some predefined age, the similarity measure may also be binary. That is, the similarity measure indicates if the binary classification is the same for both, or all, outputs. In this embodiment, an image for which both binary classifications are output when processed by one of the methods set out above may be determined to be a spoofing attack, and / or any access request may be rejected.
[0226] While the estimated age difference is compared to a predefined threshold to determine if the age difference is allowable in the examples above, other mechanisms for determining if the difference is allowable may be used.
[0227] For example, an allowable percentage difference may be defined. For example, the allowed percentage difference may be set to 10% or 25%. The chosen percentage difference may correlate to results of passing the images and image portions by the age estimation model as described with reference to Figures 8 and 9. Alternatively, the percentage difference may be defined based on known speeds of visual aging over certain ages, and / or allowable tolerances. The percentage age difference may be based on the estimated age derived from the whole facial image 102.
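A minimal sketch of a percentage-based check is given below, assuming the percentage is taken relative to the age estimated from the whole facial image and that the difference is treated as an absolute value. The function name is hypothetical and the 10% default is simply one of the example values mentioned above.

```python
def exceeds_percentage_difference(full_age: float, portion_age: float, allowed_pct: float = 10.0) -> bool:
    """Return True if the two estimates differ by more than allowed_pct percent.

    The percentage is computed relative to full_age, the estimate derived
    from the whole facial image.
    """
    if full_age <= 0:
        raise ValueError("full_age must be positive")
    pct = abs(full_age - portion_age) / full_age * 100.0
    return pct > allowed_pct
```

With a whole-image estimate of 20 and a portion estimate of 15, the difference is 25% of the whole-image estimate, which exceeds an allowed 10% but not an allowed 25%.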
[0228] In some embodiments, one or more further parameters are used to determine if the outputs of the age estimation model indicate that there is a spoofing attack. For example, the output may comprise an estimated age and an uncertainty. In such an example, a higher uncertainty may cause an otherwise allowable estimated age difference to be associated with a spoofing attack. This is just one example of a secondary parameter which may be used to determine if there is a spoofing attack in an image.
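One simple way in which an uncertainty could act as a secondary parameter is sketched below. The rule itself (flagging an image when either the age difference or the uncertainty exceeds its own limit), the function name and both default limits are assumptions made for illustration; other ways of combining the two quantities are equally possible.

```python
def indicates_spoofing(
    age_difference: float,
    uncertainty: float,
    *,
    diff_threshold: float = 4.0,
    uncertainty_limit: float = 5.0,
) -> bool:
    """Flag a possible spoofing attack from an age difference and an uncertainty.

    A large age difference is flagged on its own; an otherwise allowable
    difference is still flagged if the model's uncertainty is too high.
    """
    if abs(age_difference) > diff_threshold:
        return True
    return uncertainty > uncertainty_limit
```

Under these placeholder limits, an age difference of 2 years with an uncertainty of 8 years would be flagged, even though the difference alone would be allowable.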
[0229] In the examples set out above, the methods are used to determine if a user has attempted to look older than they actually are. This may be achieved by using an aging prop such as a false moustache.
[0230] It will be appreciated that the methods may be used to determine if a user has attempted to look younger. The age-related activity in this case may be a student or young-adult forum for example. Here, other age-reducing props may be used. The methods disclosed herein can be used in the same way, with the age criterion being that the estimated age falls below an age threshold for the age-related activity, or within a predefined age range.
[0231] It will be appreciated that the examples described above are illustrative rather than exhaustive. The user device can take various forms, including a mobile device, personal computer, wearable device, media controller etc. The age-restricted activity may be embodied as program instructions stored in memory of the client device and executed on a processor (e.g. CPU) of the client device. References to a processor include multiple processors. The figures provided herein merely depict one example deployment in which the present techniques can be implemented, which is not exhaustive. In general, the functional backend components of the age estimation system 608 can be implemented in one or more computing devices at one or more locations within a localized or distributed computer system. A computer system comprises computing hardware which may be configured to execute any of the steps or functions taught herein. The term computing hardware encompasses any form / combination of hardware configured to execute steps or functions taught herein. Such computing hardware may comprise one or more processors, which may be programmable or non-programmable, or a combination of programmable and non-programmable hardware may be used. Examples of suitable programmable processors include general purpose processors based on an instruction set architecture, such as CPUs, GPUs / accelerator processors etc. Such general-purpose processors typically execute computer readable instructions held in memory coupled to the processor and carry out the relevant steps in accordance with those instructions. Other forms of programmable processors include field programmable gate arrays (FPGAs) having a circuit configuration programmable through circuit description code. Examples of non-programmable processors include application specific integrated circuits (ASICs). Code, instructions etc. may be stored as appropriate on transitory or non-transitory media (examples of the latter including solid state, magnetic and optical storage device(s) and the like).
Claims
1. A computer-implemented method for detecting a spoofing attack in an image, the method comprising: receiving an image of a user; obtaining a first image portion from the image, the first image portion being less than the whole image of the user; processing the first image portion to generate a first output indicative of a first estimated human age; processing the image of the user to generate a second output indicative of a second estimated human age; and based on the first output and the second output, determining whether the image of the user comprises the spoofing attack.
2. The method of claim 1, wherein processing the first image portion or the image comprises: detecting a structure in the first image portion or the image having human characteristics; and determining the respective estimated human age based on the human characteristics of the structure.
3. A computer-implemented method for detecting a spoofing attack in an image, the method comprising: receiving an image of a user; obtaining, from the image, a first image portion and a second image portion, wherein each of the first image portion and the second image portion comprise an area of the image of the user which is not in the second image portion and the first image portion respectively; processing the first image portion to determine a first output indicative of a first estimated human age; processing the second image portion to determine a second output indicative of a second estimated human age; and based on the first output and the second output, determining whether the image of the user comprises the spoofing attack.
4. The method of claim 3, wherein processing the first image portion or the second image portion comprises: detecting a structure in the first image portion or the second image portion having human characteristics; and determining the respective estimated human age based on the human characteristics of the structure.
5. The method of any preceding claim, wherein determining whether the image of the user comprises the spoofing attack comprises: determining a similarity measure based on the first output and the second output; wherein the determination is based on the similarity measure.
6. The method of claim 5, wherein the first output comprises the first estimated human age and wherein the second output comprises the second estimated human age, wherein the similarity measure is an estimated age difference between the first estimated human age and the second estimated human age, wherein determining whether the image of the user comprises the spoofing attack further comprises: comparing the estimated age difference to a threshold age difference; if the estimated age difference exceeds the threshold age difference, determining that the image comprises a spoofing attack; and if the estimated age difference does not exceed the threshold age difference, determining that the image does not comprise a spoofing attack.
7. The method of claim 6, wherein the threshold age difference is a predefined threshold age difference.
8. The method of claim 6, wherein the threshold age difference is derived based on a predefined allowable percentage age difference and one of the first estimated age and the second estimated age.
9. The method of any preceding claim, wherein processing the image portion and / or the image comprises providing the image portion and / or the image as input to a trained age estimation neural network, the trained age estimation neural network trained to estimate an age of a human captured in an image.
10. The method of any preceding claim, wherein the method further comprises: inpainting the first image portion to obtain a reconstructed image of the user, wherein the first image portion is inpainted so as to preserve an age of the user in the first image portion; wherein processing the first image portion comprises processing the reconstructed image of the user to obtain the first estimated human age.
11. The method of claim 10, wherein the first image portion is inpainted by providing the first image portion to a trained age-preserving inpainting model, wherein the trained age-preserving inpainting model is trained to preserve the age of a user in the first image portion when inpainting.
12. The method of claim 3 and claim 10, wherein the method further comprises: inpainting the second image portion to obtain a second reconstructed image of the user, wherein the second image portion is inpainted so as to preserve an age of the user in the second image portion; wherein processing the second image portion comprises processing the second reconstructed image of the user to obtain the second estimated human age.
13. The method of any preceding claim, wherein obtaining the first image portion comprises extracting the first image portion from the image of the user.
14. The method of any of claims 1 to 12, wherein obtaining the first image portion comprises masking all portions of the image of the user except the first image portion.
15. The method of any preceding claim, wherein obtaining the first image portion comprises: identifying a facial feature in the image of the user; and determining that the facial feature is to be included in, or excluded from, the first image portion; and obtaining the first image portion based on the determination.
16. The method of claim 3 and claim 15, wherein obtaining the second image portion comprises: determining an area of the image which is not in the first image portion; and obtaining the second image portion based on the area which is not in the first image portion.
17. The method of any preceding claim, wherein the method further comprises: determining that the image of the user comprises an age deception object; wherein the first image portion is obtained and processed in response to determining that the image of the user comprises the age deception object.
18. The method of claim 17, wherein the first image portion is a portion of the image of the user which does not comprise the age deception object.
19. The method of any preceding claim, wherein the second output comprises an activation indicating a contribution of areas of the processed image to the second estimated human age, wherein the method further comprises: determining an area of the image with a high contribution.
20. The method of claim 17 and claim 19, wherein the method further comprises: determining that the image of the user comprises an age deception object based on the area with a high contribution.
21. The method of claim 19, wherein the first output comprises an activation indicating a contribution of areas of the processed image to the first estimated human age.
22. The method of any preceding claim, wherein the method further comprises: determining, based on the first output and the second output, an age estimation error; and determining whether the image of the user comprises the spoofing attack based on the age estimation error.
23. The method of any preceding claim, wherein the image is a facial image.
24. A method of training an age-preserving inpainting model for use in an age estimation system, the method comprising: obtaining a ground truth image of a human; obtaining a ground truth age of the human in the ground truth image; obtaining an image portion of the ground truth image, wherein the image portion comprises an incomplete image of the human; providing the image portion to an untrained age-preserving inpainting model to obtain a reconstructed image; providing the reconstructed image to a trained age estimation model to obtain an estimated age; computing an age loss using an age loss function and based on the estimated age and the ground truth age; and backpropagating the age loss.
25. The method of claim 24, wherein the trained age estimation model has frozen weights.
26. The method of claim 24 or claim 25, wherein the method further comprises: computing a reconstruction loss using a reconstruction loss function and based on the ground truth image and the reconstructed image; and backpropagating the reconstruction loss.
27. The method of any of claims 24 to 26, wherein the method further comprises executing each of the steps for a second ground truth image.
28. The method of any of claims 24 to 27, wherein the estimated age and the ground truth age are scalar values, wherein the age loss function is a p-norm.
29. The method of any of claims 24 to 27, wherein the estimated age is an estimated age probability distribution, wherein the ground truth age is an age probability distribution, wherein the age loss function is a statistical distance between the distributions.
30. The method of claim 27 or any claim dependent thereon, wherein the method comprises computing a total loss based on the reconstruction loss and the age loss.
31. The method of claim 30, wherein the total loss is further computed based on a perceptual similarity loss between the ground truth image and the reconstructed image.
32. A computer system comprising: a memory storing computer-readable instructions; and one or more processors which, when executing the computer-readable instructions, are configured to execute the method of any of claims 1 to 31.
33. The computer system of claim 32 configured to execute the method of claim 10, wherein the one or more processors is configured to execute: a trained age estimation model to determine the first output and the second output; and a trained age-preserving inpainting model, trained as claimed in claim 20, to inpaint the first image portion to obtain the reconstructed image of the user.
34. A computer program which, when executed on one or more processors, is configured to execute the method of any of claims 1 to 31.