High-resolution controllable face aging with spatially-aware conditional GANs

A patch-based training method and a spatially aware conditional generative adversarial network (GAN), combined with population-specific aging information and weak spatial supervision, provide fine-grained control over facial aging in high-resolution images and generate realistic aging results, addressing the insufficient quality of aging results in existing technologies.

CN116097319B · Active · Publication Date: 2026-06-12 · LOREAL SA


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: LOREAL SA
Filing Date: 2021-06-29
Publication Date: 2026-06-12


IPC: G06V40/16; G06V10/82; G06N3/0475

CPC: G06T11/00; G06V40/171; G06V10/50; G06V10/454; G06V10/82; G06V10/765; G06V10/764; G06V10/774

AI Tagging

Application Domain

Image enhancement; Image analysis



Abstract

Computing devices and methods are provided to controllably transform images of faces, including high-resolution images, to simulate continuous aging. Population-specific aging information and weak spatial supervision are used to guide an aging process defined by training a model including a GAN-based generator. An aging map presents the population-specific aging information as skin marker scores or apparent age values. The scores are located in the map in association with the respective locations of the facial regions associated with the skin markers. Patch-based training, in particular combined with location information to distinguish similar patches from different parts of the face, is used to train on the high-resolution images while minimizing resource usage.


Description

[0001] Cross-reference to related applications

[0002] This application claims priority to and/or the benefit of U.S. Provisional Application No. 63/046,011, filed June 30, 2020, entitled “High-Resolution Controllable Face Aging with Spatially-Aware Conditional GANs,” and French Patent Application No. 2009199, filed September 11, 2020, entitled “High-Resolution Controllable Face Aging with Spatially-Aware Conditional GANs,” the entire contents of which are incorporated herein by reference, where permitted.

Technical Field

[0003] This disclosure relates to image processing, and more specifically, to high-resolution controllable facial aging using spatially aware conditional generative adversarial networks (GANs).

Background

[0004] Facial aging is an image synthesis task in which a reference image must be transformed to give the impression of a person at a different age while preserving the subject's identity and key facial features. When done correctly, the process can be used in various fields, from predicting a person's future appearance to entertainment and educational purposes. Focus can be placed on achieving high-resolution facial aging, as it is a useful step in capturing the subtle details of aging (fine lines, pigmentation, etc.). In recent years, GANs [14] have enabled learning-based approaches to this task. However, the results are often of poor quality and offer only limited aging options. Popular models such as StarGAN [10] fail to produce convincing results without additional fine-tuning and modifications. This stems in part from the choice of reducing aging to true or apparent age [1]. In addition, current methods treat aging as a stepwise process, dividing age into bins (30-40, 40-50, 50+, etc.) [2, 16, 28, 30, 32].

[0005] In reality, aging is a continuous process that can take many forms depending on genetic factors (such as facial features and population) as well as lifestyle choices (smoking, spa treatments, sunburn, etc.) or behaviors. Notably, habitual facial expressions contribute to expression wrinkles, which may become prominent on the forehead, the upper lip, or the corners of the eyes (crow's feet). Furthermore, aging is subjective, as it depends on the cultural background of the person assessing the age. These factors call for a more granular approach to aging.

[0006] Existing methods and datasets for facial aging produce mean-biased results, where individual variations and expression wrinkles are often invisible or ignored in favor of global patterns such as facial fattening. Furthermore, they offer little control over the aging process and may be difficult to scale to large images, which prevents their use in many real-world applications.

Summary of the Invention

[0007] According to the techniques described herein, various embodiments of computing devices and methods are provided for controllably transforming facial images (including high-resolution images) to simulate continuous aging. In an embodiment, population-specific aging information and weak spatial supervision are used to guide the aging process defined by training a model including a GAN-based generator. In an embodiment, an aging map presents the population-specific aging information as skin marker scores or apparent age values. In an embodiment, the scores are located in the map in association with the respective locations of the skin marker regions on the face. In an embodiment, patch-based training (in particular combined with location information to distinguish similar patches from different parts of the face) is used to train on high-resolution images while minimizing computational resource usage.

Brief Description of the Drawings

[0008] Figure 1 is an array of high-resolution images showing two faces, one in each row, aged in a continuous manner according to an embodiment.

[0009] Figures 2A, 2B, 2C and 2D show images of some of the aging markers of the face shown in Figure 2E. Figure 2E also shows an aging map of the face, constructed from the associated aging marker scores of facial regions according to an embodiment. The regions (a)-(d) shown in Figures 2A to 2D are magnified relative to Figure 2E.

[0010] Figures 3A and 3B show horizontal and vertical gradient location maps, respectively.

[0011] Figures 4 and 5 are diagrams illustrating training workflows according to an embodiment.

[0012] Figure 6 is an array of aged images illustrating a comparison of aging according to previous methods and according to an embodiment.

[0013] Figure 7 is an array of rejuvenated, original, and aged images of six faces of different ages and populations from the FFHQ dataset, produced using an embodiment of the described method.

[0014] Figure 8 is an image array 800 showing model results in group 802 (without defined skin marker values) and group 804 (with defined skin marker values) according to an embodiment.

[0015] Figure 9 shows an array of images of faces aged in a continuous manner according to an embodiment.

[0016] Figures 10A, 10B, 10C, 10D, 10E and 10F show images of the same face, namely the original image and aged images each produced with a respective aging map according to an embodiment.

[0017] Figure 11 shows two image arrays of two respective faces according to an embodiment, illustrating the rejuvenation result for one face and the aging result for the second face using three different patch sizes on a 1024×1024 image.

[0018] Figure 12 shows arrays of images illustrating aging effects, wherein a first array shows aging using a (patch-trained) model trained without the location maps, and a second array shows aging using a (patch-trained) model trained with the location maps, each model being trained according to an embodiment.

[0019] Figure 13 shows arrays of images illustrating aging effects according to an embodiment, wherein a first array shows aging using a (patch-trained) model trained with uniform feature maps, and a second array shows aging using a (patch-trained) model trained with aging maps.

[0020] Figure 14 is a block diagram of a computer system including multiple computing devices according to an embodiment.

[0021] Figure 15 is a flowchart of operations of a method according to an embodiment.

[0022] The accompanying drawings containing facial images are masked for presentation purposes in this disclosure and are not masked when used.

Detailed Description

[0023] According to the techniques described herein, various embodiments provide systems and methods for obtaining high-resolution facial aging results by creating models capable of individually transforming local aging markers. Figure 1 shows an array 100 of high-resolution images of two faces, in respective rows 102 and 104, aged in a continuous manner according to an embodiment.

[0024] In an embodiment, curated high-resolution datasets are used in combination with novel techniques to produce detailed, state-of-the-art aging results. Clinical aging markers and weak spatial supervision allow for fine-grained control over the aging process.

[0025] In an embodiment, a patch-based approach is introduced to enable inference on high-resolution images while keeping the computational cost of training the model low. This allows the model to deliver state-of-the-art aging results at a scale four times larger than previous methods.

[0026] Related work

[0027] Conditional generative adversarial networks (conditional GANs): GANs [14] use the principle of an adversarial loss to force samples generated by the generative model to be indistinguishable from real samples. This approach has yielded impressive results, especially in the field of image generation. GANs can be extended to generate images based on one or more conditions. The resulting conditional GAN is trained to generate images that satisfy both realism and conditional criteria.

[0028] Unpaired image-to-image conditional GANs are powerful tools for image-to-image translation [18] tasks, where an input image is fed to a model to synthesize a transformed image. StarGAN [10] introduced a way to specify the desired transformation using an additional condition. It proposed feeding the input condition to the generator in the form of feature maps concatenated with the input image [10], but newer methods use more sophisticated mechanisms, such as AdaIN [20] or its 2D extension SPADE [22], to condition the generator more effectively. Whereas previous techniques required pixel-aligned training images in the different domains, recent work such as CycleGAN [34] and StarGAN [10] introduced a cycle consistency loss to enable unpaired training across discrete domains. This has been extended in [23] to allow translation across continuous domains.

[0029] Facial aging

[0030] To perform facial aging analysis from a single image, traditional methods use training data of one image [2, 16, 30, 32, 33] or multiple images [26, 28] of the same person, along with the person's age at the time the image was taken. Using longitudinal data with multiple photographs of the same person offers less flexibility because it creates severe time-dependent constraints on the dataset.

[0031] Age is typically binned (e.g., grouped) into discrete age groups (20-30, 30-40, 40-50, 50+, etc.) [2, 16, 30, 32], which simplifies the problem formulation but limits control over the aging process and does not allow training to take advantage of the ordered nature of the groups. The disclosure in [33] addresses this limitation by treating age as a continuous value. Aging is not objective, because different skin types age differently and different populations look for different aging markers. Focusing on apparent age as the guide for aging therefore freezes the subjective point of view. Such methods cannot be tailored to the perspective of a population without additional age estimation data gathered from that population's point of view.

[0032] To improve the quality and level of detail of the generated images, [32] used the attention mechanism from the generator in [23]. However, the generated samples were low-resolution images that were too coarse for real-world applications. Working at this scale hides some of the difficulties in generating realistic images, such as the overall sharpness of skin texture, fine lines, and details.

[0033] Method

[0034] Problem Formulation

[0035] In an embodiment, the goal is to train a model capable of generating realistic, high-resolution (e.g., 1024×1024) aged faces from a single unpaired image, where fine-grained aging markers are continuously controlled to create a smooth transition between the original and transformed images. This is a more intuitive approach, because aging is a continuous process and age-group domains do not explicitly enforce a logical order.

[0036] In an embodiment, the use of population-specific skin atlases [4-7, 13] incorporates the population dimension of clinical aging markers. These atlases define numerous clinical markers, such as under-eye wrinkles, lower facial sagging, and the density of pigmentation spots on the cheeks. Each marker is linked to a specific region of the face and scored according to a population-specific scale. Using these labels, in addition to age, allows for a more comprehensive representation of aging and enables the transformation of images with various combinations of clinical markers and scores.

[0037] In an embodiment, Figures 2A, 2B, 2C and 2D show images of respective aging marker regions (a)-(d) (202, 204, 206, and 208) of the face 212 shown in Figure 2E. Other marker regions are used but are not shown. Figure 2E also shows an aging map 210 for face 212. According to an embodiment, the aging map 210 is constructed from the associated aging marker scores for all regions of face 212. It should be understood that the regions (a)-(d) shown in Figures 2A to 2D are magnified relative to the face 212 in Figure 2E. In this embodiment, the skin markers represent "age," "forehead wrinkles," "nasolabial folds," "under-eye wrinkles," "frown lines," "interocular wrinkles," "corner of the mouth wrinkles," "upper lip," and "lower facial sagging." In embodiments, other skin markers are used where sufficient training data is available.

[0038] In aging map 210, the brightness of each pixel represents the normalized score of a local clinical marker (e.g., nasolabial folds (a), under-eye wrinkles (b), nasolabial folds (c), interocular wrinkles (d), etc.). Where no aging marker score is available (defined), the apparent age value is used.

[0039] In other words, in an embodiment, the aging target is passed to the network in the form of an aging map (e.g., 210) for a specific facial image (e.g., 212). For this purpose, facial feature points are computed, and a relevant region is defined for each aging marker (see, e.g., Figures 2A to 2D). Each region (e.g., the forehead, which is not shown as a region in Figures 2A to 2D) is then filled with the score of the corresponding marker (e.g., forehead wrinkles). In the embodiment of Figures 2A to 2D, the skin aging marker values for the applicable regions are (a) 0.11; (b) 0.36; (c) 0.31; and (d) 0.40. In an embodiment, apparent age (via an estimator) or actual age (if available) is used to fill in the blanks where clinical markers are not defined. Finally, a coarse mask is applied to the background of the image.

[0040] In the implementation, skin aging markers (and apparent age, if used) are normalized on a scale between 0 and 1.
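The construction of an aging map described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `build_aging_map`, the rectangular region boxes, the example coordinates, and the fill value of 0.5 are assumptions made for illustration. Only the normalized scores in [0, 1] and the use of an age value where no marker is defined come from the description. Background masking is omitted.

```python
import numpy as np

def build_aging_map(size, regions, fill_value):
    """Paint normalized marker scores into rectangular facial regions.

    size: (height, width) of the map.
    regions: list of ((top, left, bottom, right), score), score in [0, 1].
    fill_value: normalized age value used where no marker is defined.
    """
    aging_map = np.full(size, fill_value, dtype=np.float32)
    total = np.zeros(size, dtype=np.float32)
    count = np.zeros(size, dtype=np.float32)
    for (top, left, bottom, right), score in regions:
        total[top:bottom, left:right] += score
        count[top:bottom, left:right] += 1.0
    covered = count > 0
    # Where regions overlap, this sketch stores the average of their scores.
    aging_map[covered] = total[covered] / count[covered]
    return aging_map

# Hypothetical boxes; real regions would be derived from facial feature points.
example_regions = [((300, 200, 380, 420), 0.11), ((420, 250, 470, 400), 0.36)]
example_map = build_aging_map((1024, 1024), example_regions, fill_value=0.5)
```

Storing one score per pixel keeps the condition spatial, which is what lets a decoder consume it at several scales.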

[0041] Processing the entire image at once would be ideal, but training the model with 1024×1024 images requires significant computational resources. In an embodiment, a patch-based training approach is used, in which the model is trained using only a portion of the image and the corresponding patch of the aging map. Patch-based training reduces the context available for the task (i.e., global information), but it also reduces the computational resources required to process high-resolution images in large batches, as recommended in [8]. Large batch sizes are used on small patches of 128×128, 256×256, or 512×512 pixels. In an embodiment, a random patch is sampled each time an image is seen during training (approximately 300 times per image in such training).
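The patch sampling described in this paragraph might look like the following sketch. The function name `sample_patch` and the use of NumPy's random generator are assumptions for illustration; the description only states that a random patch is taken from the image together with the matching patch of the aging map.

```python
import numpy as np

def sample_patch(image, aging_map, patch_size, rng):
    """Crop the same random square window from an image and its aging map."""
    height, width = aging_map.shape
    top = int(rng.integers(0, height - patch_size + 1))
    left = int(rng.integers(0, width - patch_size + 1))
    rows = slice(top, top + patch_size)
    cols = slice(left, left + patch_size)
    # The offset is returned so that location information can be cropped too.
    return image[rows, cols], aging_map[rows, cols], (top, left)

rng = np.random.default_rng(0)
image = np.zeros((1024, 1024, 3), dtype=np.float32)   # placeholder image
aging_map = np.zeros((1024, 1024), dtype=np.float32)  # placeholder map
image_patch, map_patch, offset = sample_patch(image, aging_map, 256, rng)
```

Because a new window is drawn every time an image is visited, the model eventually sees most of each face even though each step only processes a small crop.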

[0042] The main drawback of patch-based training is that small patches may look similar (e.g., forehead and cheek) yet must be aged differently (e.g., with horizontal and vertical wrinkles, respectively). Referring to Figures 3A and 3B, in an embodiment, to avoid wrinkles that are an average over these ambiguous regions, which look unnatural, the generator is provided with two patches, one from a horizontal gradient location map 300 and the other from a vertical gradient location map 302. This allows the model to know the position of a patch and thereby distinguish between potentially ambiguous regions.
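A minimal sketch of the two gradient location maps follows. The function name `location_maps` and the normalization of coordinates to [0, 1] are assumptions; the description only specifies one horizontal and one vertical gradient.

```python
import numpy as np

def location_maps(height, width):
    """Return horizontal and vertical linear gradient maps with values in [0, 1]."""
    x = np.linspace(0.0, 1.0, width, dtype=np.float32)
    y = np.linspace(0.0, 1.0, height, dtype=np.float32)
    x_map = np.tile(x, (height, 1))          # varies left to right
    y_map = np.tile(y[:, None], (1, width))  # varies top to bottom
    return x_map, y_map

x_map, y_map = location_maps(1024, 1024)
# Two patches of similar appearance receive different location patches:
upper_patch_y = y_map[100:356, 384:640]  # hypothetical forehead window
lower_patch_y = y_map[500:756, 100:356]  # hypothetical cheek window
```

Cropping these maps with the same window as the image gives each patch an unambiguous position signal while the inputs remain ordinary 2D maps.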

[0043] Network architecture

[0044] In an embodiment, the training process is based on the StarGAN [10] framework. The generator is a fully convolutional encoder-decoder derived from [11], where SPADE [22] residual blocks in the decoder incorporate the aging and location maps. This allows the model to utilize the spatial information present in the aging map and to use it at multiple scales in the decoder. To avoid learning unnecessary details, the attention mechanism from [23] is used to force the generator to transform the image only where necessary. The discriminator is a modified version of [10] and produces the outputs for the WGAN [3] objective (given for image i and aging map a in Equation 1), an estimate of the coordinates of the patch, and a low-resolution estimate of the aging map.

[0045]

[0046] In an embodiment, Figures 4 and 5 show patch-based training workflows 400 and 500, where Figure 4 shows training of the generator (G) 402 and Figure 5 shows training of the discriminator (D) 502 of the GAN-based model.

[0047] Referring to Figure 4, the generator (G) 402 includes an encoder section 402A and a decoder section 402B, wherein the decoder section 402B is configured with SPADE residual blocks to incorporate the aging map and location maps. Workflow operations 400 begin by cropping a patch from each of the image I (404), the aging map A (406), and the location maps X and Y (408, 410) to define an image patch I_p (412), an aging map patch A_p (414), and location map patches X_p and Y_p (416, 418). Generator 402 transforms the image patch I_p 412 according to the map patch 414 and the positions (map patches 416, 418), via SPADE configuration 420, to generate an image. As mentioned, for a 1024×1024 training image, the patch size can be 128×128, 256×256, or 512×512 pixels.

[0048] The attention mechanism 424 from [23] is used to force the generator 402 to transform the image (patch 412) only where needed, giving the result 426.
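One common way to realize such an attention mechanism is to blend the generated pixels with the input pixels using a per-pixel mask. The sketch below assumes that form. The function name `blend_with_attention` and the convention that an attention value of 1 selects the generated pixel are assumptions, and the source does not state the exact formula.

```python
import numpy as np

def blend_with_attention(input_patch, generated_patch, attention):
    """Blend per pixel: attention 1 keeps the generated pixel, 0 keeps the input.

    input_patch, generated_patch: arrays of shape (H, W, C).
    attention: array of shape (H, W); this convention is an assumption.
    """
    weight = np.clip(attention, 0.0, 1.0)[..., None]
    return weight * generated_patch + (1.0 - weight) * input_patch

source = np.zeros((4, 4, 3), dtype=np.float32)
generated = np.ones((4, 4, 3), dtype=np.float32)
mask = np.zeros((4, 4), dtype=np.float32)
mask[:2, :] = 1.0  # only the top half is allowed to change
result = blend_with_attention(source, generated, mask)
```

With a mask like this, regions that do not need aging pass through unchanged, which is the behavior the paragraph attributes to the attention mechanism.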

[0049] Referring to Figure 5 and workflow operations 500, discriminator (D) 502 produces a real/fake output 504, an estimated position (x, y) of the patch 506, and an estimated aging map 508. These outputs (504, 506, and 508) are penalized by the WGAN objective, location, and aging map loss functions (510, 512, and 516), respectively. The location and aging map loss functions are described further below.

[0050] Result 426 is used in cycle-GAN-based model training to produce a further result from generator 402. Cycle consistency loss 520 ensures that the transformation preserves the key features of the original image patch 412.

[0051] Aging map

[0052] In an embodiment, to avoid penalizing the model (e.g., generator G) for failing to place the bounding boxes with pixel precision, the aging map is blurred to smooth the edges, and the discriminator regression loss is computed on 10×10 downsampled maps. This formulation allows the information to be packed in a more compact and meaningful way than with individual uniform feature maps [10, 28, 32, 33]. The approach requires multiple feature maps only when there is significant overlap between markers (e.g., forehead pigmentation and forehead wrinkles). In an embodiment, small overlaps are generally handled with a single aging map, where the value in the overlapping region is the average of the two markers. If the regions overlap too much (e.g., forehead wrinkles and forehead pigmentation), then in an embodiment the aging map comprises two layers (in this case, one layer for wrinkles and one layer for pigmentation).
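The downsampled regression target can be illustrated with the sketch below. The area-averaging scheme, the absolute-error form of the loss, and the names `downsample_map` and `aging_map_loss` are assumptions; the description only states that the loss is computed on 10×10 downsampled maps. The blurring step is omitted.

```python
import numpy as np

def downsample_map(aging_map, out_size=10):
    """Average an (H, W) map over an out_size x out_size grid of cells."""
    height, width = aging_map.shape
    row_edges = np.linspace(0, height, out_size + 1).astype(int)
    col_edges = np.linspace(0, width, out_size + 1).astype(int)
    out = np.empty((out_size, out_size), dtype=np.float32)
    for r in range(out_size):
        for c in range(out_size):
            cell = aging_map[row_edges[r]:row_edges[r + 1],
                             col_edges[c]:col_edges[c + 1]]
            out[r, c] = cell.mean()
    return out

def aging_map_loss(target_map, estimated_small_map):
    """Mean absolute error against the downsampled target (assumed norm)."""
    return float(np.mean(np.abs(downsample_map(target_map) - estimated_small_map)))

target = np.full((256, 256), 0.3, dtype=np.float32)
loss = aging_map_loss(target, np.zeros((10, 10), dtype=np.float32))
```

Comparing coarse maps means a region boundary that is a few pixels off barely changes the loss, which matches the stated motivation.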

[0053] Considering image patch i and aging map patch a, the loss is given in Equation 2.

[0054]

[0055] Location map

[0056] In an embodiment, two orthogonal gradients (location map patches 416, 418) are used to help generator 402 apply the relevant aging transformation to a given patch (e.g., 412). The X and Y coordinates of patch 412 could be provided to generator 402 as two numbers instead of linear gradient maps, but doing so would prevent the model from being used on the full-scale image, because it would break the fully convolutional nature of the model. Considering image patch i and aging map patch a located at coordinates (x, y), the loss is given in Equation 3.

[0057]

[0058] Training

[0059] In an embodiment, the model is trained using the Adam [21] optimizer with β1 = 0, β2 = 0.99, a learning rate of 7 × 10⁻⁵ for G, and a learning rate of 2 × 10⁻⁴ for D. Following the two time-scale update rule [17], both models are updated at each step. Furthermore, the learning rates of both G and D are linearly decayed to zero during training. To enforce cycle consistency, the perceptual loss [31] is used with λ_Cyc = 100. For the regression tasks, λ_Loc = 50 is used for predicting the (x, y) coordinates of the patch, and λ_Age = 100 for estimating the downsampled aging map. The discriminator is penalized using the original gradient penalty presented in [15], with λ_GP = 10. The complete loss objective function is given in Equation 4:
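As an illustration of how the weights above could combine the individual terms, the sketch below forms a plain weighted sum and a linear learning-rate decay. It is a reading of the description, not a reproduction of Equation 4: the function names, the dictionary keys, and the omission of the generator/discriminator sign conventions are all assumptions.

```python
# Weights as stated in the description (cycle, location, aging map, gradient penalty).
LOSS_WEIGHTS = {"cyc": 100.0, "loc": 50.0, "age": 100.0, "gp": 10.0}

def total_objective(wgan, cyc, loc, age, gp, weights=LOSS_WEIGHTS):
    """Weighted sum of already-computed scalar loss terms (assumed form)."""
    return (wgan
            + weights["cyc"] * cyc
            + weights["loc"] * loc
            + weights["age"] * age
            + weights["gp"] * gp)

def linear_decay(base_lr, step, total_steps):
    """Learning rate that falls linearly from base_lr to zero over training."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

example_total = total_objective(wgan=1.0, cyc=0.1, loc=0.2, age=0.3, gp=0.4)
halfway_lr = linear_decay(2.0, step=50, total_steps=100)
```

In a real adversarial setup the generator and discriminator would each optimize their own subset of these terms with opposite signs on the WGAN term.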

[0060]

[0061] Inference

[0062] For inference, in an embodiment, the trained (generator) model G can be optimized for stability, for example by defining the inference model G as the exponential moving average of the parameters of G [29]. Due to the fully convolutional nature of the network and the use of continuous 2D aging maps, the trained generator can be used directly on 1024×1024 images regardless of the patch size used during training.
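An exponential moving average of generator parameters can be kept with a small update applied after each training step. The sketch below shows the idea on a dictionary of NumPy arrays. The function name `ema_update`, the parameter name `"conv1.weight"`, and the decay of 0.999 are assumptions, since the source does not give a decay value.

```python
import numpy as np

def ema_update(ema_params, params, decay=0.999):
    """Return new averaged parameters: decay * old average + (1 - decay) * current."""
    return {name: decay * ema_params[name] + (1.0 - decay) * params[name]
            for name in params}

# Toy example with a single hypothetical parameter tensor.
averaged = {"conv1.weight": np.zeros(3)}
current = {"conv1.weight": np.ones(3)}
averaged = ema_update(averaged, current, decay=0.9)
```

The averaged copy changes slowly, so it smooths out step-to-step oscillations of adversarial training and is the copy one would use for inference.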

[0063] In this implementation, the target aging map is created manually. In this implementation, facial feature points and target scores are used to create the target aging map.

[0064] In an embodiment, the user is prompted to input the target aging in an application interface, and the application is configured to use the target aging as the aging map values when defining an aging map (and location maps, if necessary).

[0065] In an embodiment, instead of an absolute age, the user is prompted to input an age difference (e.g., an increment such as 3 years younger or 10 years older). In an embodiment, the application then analyzes the received image to determine apparent age or skin marker values, and subsequently defines an aging map relative to that analysis, modifying the apparent age / skin marker values to suit the user's request. The application is configured to use this map to define a modified image illustrating the aging.

[0066] In one implementation, a method (e.g., a computing device method) includes:

[0067] receiving a "selfie" image provided by a user;

[0068] analyzing the image to generate "current" skin marker values (automatic skin marker analysis is illustrated and described in U.S. Patent Publication No. 2020/0170564 A1, entitled "Automatic image-based diagnostics using deep learning", dated June 4, 2020, the entire contents of which are incorporated herein by reference);

[0069] presenting to the user (via a display device) an annotated selfie showing the user's analyzed skin markers overlaid on the facial regions associated with the respective markers;

[0070] receiving user input (via a graphical or other user interface) to adjust one or more skin marker scores. For example, the input is a skin marker adjustment value (e.g., a target or an increment). As another example, the input is a selection of a product and/or service associated with a region (or more than one region), where the product and/or service is associated with a skin marker score adjustment value (e.g., an increment);

[0071] defining an aging map using the current skin marker scores and the skin marker score adjustment values;

[0072] using the map with the generator G to define a modified image; and

[0073] presenting the modified image to the user (e.g., via a display device), for example to show what the user might look like after using the product and/or service.
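The step that combines current scores with adjustment values can be sketched as below. The marker names, the example increments, and the clipping to the normalized range [0, 1] are assumptions for illustration; the function name `target_scores` is hypothetical.

```python
def target_scores(current, adjustments):
    """Apply per-marker increments to current normalized scores, clipped to [0, 1]."""
    target = dict(current)
    for marker, delta in adjustments.items():
        target[marker] = min(1.0, max(0.0, current[marker] + delta))
    return target

# Hypothetical analysis result and user-selected increments.
current = {"forehead_wrinkles": 0.40, "nasolabial_folds": 0.31, "under_eye_wrinkles": 0.36}
adjustments = {"forehead_wrinkles": -0.10, "nasolabial_folds": 0.90}
targets = target_scores(current, adjustments)
```

The resulting scores would then be painted into the corresponding facial regions to form the target aging map passed to the generator.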

[0074] Experiments

[0075] Experimental setup

[0076] Most facial aging datasets [9, 24, 25] lack diversity in terms of population [19] and concentrate on low-resolution images (up to 250×250 pixels). This is insufficient to capture the details associated with skin aging. Furthermore, they often fail to normalize facial poses and expressions (smiling, frowning, raising eyebrows), leading to the presence of wrinkles unrelated to aging (primarily nasolabial folds, crow's feet, forehead wrinkles, and under-eye wrinkles). Finally, the lack of fine-grained information on aging markers causes other methods to capture unwanted correlated features, such as facial fattening, as observed in datasets such as IMDB-Wiki [25]. Figure 6 illustrates these effects.

[0077] Figure 6 shows an image array 600, comprising original images in the first column 602 and aged images in the remaining columns, to illustrate a comparison between previous aging methods and the method taught herein. Images produced by the previous aging methods of [28], [16], [26], and [2] are presented in rows 604, 606, 608, and 610, respectively. Images produced by the method taught herein are presented in row 612.

[0078] Previous methods operate on low-resolution images and suffer from a lack of dynamic range for wrinkles, especially expression wrinkles (row 604). They are also prone to color shifts and artifacts (606, 608 and 610), as well as unwanted correlated features such as facial fattening (610).

[0079] To address these issues, the model according to the teachings herein is tested on two curated high-resolution datasets, using manually generated aging maps or uniform aging maps to highlight rejuvenation / aging.

[0080] FFHQ

[0081] The FFHQ dataset [20] was used for testing. In an embodiment, to minimize problems with lighting, pose, and facial expression, a simple heuristic was applied to select a better-quality subset of the dataset. For this purpose, facial feature points were extracted from all faces and used to remove all images in which the head was tilted too far to the left, right, up, or down. In addition, all images with open mouths were removed to limit artificial nasolabial folds and under-eye wrinkles. Finally, the HOG [12] feature descriptor was used to remove images in which hair covered the face. This selection reduced the dataset from 70k+ images to 10k+ images. Due to the great diversity of the FFHQ dataset, the remaining images are still far from perfect, especially in terms of lighting color, direction, and exposure.

[0082] To obtain scores for individual aging markers on these images, in an embodiment, an aging marker estimation model based on the ResNet [27] architecture is used, trained on the high-quality standardized dataset described below (i.e., 6,000 high-resolution 3000×3000 images). Finally, feature points are used as the basis for coarse bounding boxes to generate ground-truth aging maps. The model is trained on 256×256 patches randomly selected from 1024×1024 faces.

[0083] High-quality standardized datasets

[0084] To achieve better performance, in this implementation, a dataset of 6,000 high-resolution (3,000 × 3,000) facial images was collected, with faces centered and aligned, spanning most ages, genders, and populations. Population-specific clinical aging marker atlases [4–7, 13] were used to label the images and score them on markers covering most of the face (apparent age, forehead wrinkles, nasolabial wrinkles, under-eye wrinkles, upper lip wrinkles, corner-of-the-mouth wrinkles, and lower facial drooping).

[0085] Results

[0086] FFHQ dataset

[0087] Despite the complexity of the dataset and the lack of real-age values, the patch-based model was able to transform individual wrinkles on the face in a continuous manner.

[0088] Figure 7 is an image array 700 showing original (column 702), rejuvenated (column 704), and aged (column 706) images of six faces of different ages and populations from the FFHQ dataset, produced using an embodiment of the present teachings. Figure 7 shows how the model can transform different wrinkles despite the complexity of patch-based training, the large variations in lighting in the dataset, and the imbalance between clinical marker / age levels, with a large majority of young subjects having few wrinkles. Figure 8 is an image array 800 showing model results in group 802 (without defined skin marker values) and group 804 (with defined skin marker values) according to an embodiment. In the absence of defined markers, the map is filled with the age value. This helps the model learn global features, such as graying hair (group 802). Using individual clinical markers in the aging map allows all markers to be aged while keeping the appearance of the hair intact (group 804), which highlights the model's control over individual markers and allows the face to be aged in a controlled manner that is not feasible with age as the sole marker.

[0089] High-quality standardized datasets

[0090] On more standardized images, and with better coverage of populations and aging markers, the model demonstrates state-of-the-art performance with a high level of detail and realism and no visible artifacts (see, e.g., Figures 1 and 9). Figure 9 shows an array 900 of images of four faces aged in a continuous manner in respective rows 902, 904, 906, and 908 according to an embodiment. No marker is left unchanged, not even the forehead or the sagging of the lower part of the face. The complementary age information used to fill gaps can be seen in the thinning or graying of the eyebrows.

[0091] The aging process taught herein is successful along the continuous spectrum of aging map values, allowing realistic images to be generated for a diverse set of marker severity values. In an embodiment, as shown in Figures 10A to 10F, this realistic and continuous aging is illustrated on the same face using respectively defined aging maps. Figure 10A shows image 1002 of the face before aging is applied. Figure 10B shows image 1004 of the face, produced with a map that rejuvenates all markers except the nasolabial, corner-of-the-mouth, and under-eye wrinkles on the right side of the face. Figure 10C shows image 1006, in which only the bottom of the face is aged. Figure 10D shows image 1008, in which only the top of the face is aged. Figure 10E shows image 1010, produced with a map defined to age only the under-eye wrinkles. Figure 10F shows image 1012, produced with a map defined to age the face asymmetrically, namely the right under-eye wrinkles and the left nasolabial fold.

[0092] Evaluation metrics

[0093] For a face aging task to be considered successful, it must meet three criteria: the image must be realistic, the subject's identity must be preserved, and the face must be aged. During training these are enforced by the WGAN objective function, the cycle consistency loss, and the aging map estimation loss, respectively. A single metric therefore cannot guarantee that all criteria are met. For example, a model could leave the input image unchanged and still succeed on realism and identity. Conversely, a model might succeed in aging but fail on realism and / or identity. If one model is not superior to another on every metric, a trade-off has to be made.

[0094] Experiments on FFHQ and on the high-quality standardized dataset have never shown any problem in preserving the subject's identity. In the implementation, the quantitative evaluation therefore focuses on the realism and aging criteria. Because the method herein treats aging as a combination of aging markers rather than relying solely on age, the accuracy of the target age is not used as a metric. Instead, the Fréchet Inception Distance (FID) [17] is used to assess the realism of the images, and the mean absolute error (MAE) is used to assess the accuracy with respect to the target aging markers.

[0095] For this purpose, one half of the dataset was used as a reference for the real images, and the other half as the images to be transformed by the model. The aging maps used to transform these images were randomly selected from the real labels to ensure that the generated images followed the distribution of the original dataset. A dedicated aging marker estimation model based on the ResNet [27] architecture was used to estimate the individual scores on all generated images. As a reference for the FID scores, the FID was also computed between the two halves of the real image dataset. Note that the dataset size prevented computing the FID on the recommended 50k+ images [17, 20], which leads to an overestimation of the value. This can be seen when the FID is computed between real images only, which gives a baseline FID of 49.0. The results are presented in Table 1.
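
The aging-accuracy metric described above can be illustrated with a minimal sketch. It assumes that MAE denotes the mean absolute error, that marker scores are normalized, and that the estimator has already produced one score per marker for each generated image. The function name, marker names, and values are hypothetical and not taken from the disclosure.

```python
from statistics import mean


def marker_mae(target_scores, estimated_scores):
    """Mean absolute error over all images and all aging markers.

    Both arguments are lists of dicts mapping a marker name to a score;
    entry i of each list refers to the same generated image.
    """
    if len(target_scores) != len(estimated_scores):
        raise ValueError("one estimate is needed per target")
    errors = [
        abs(target[name] - estimate[name])
        for target, estimate in zip(target_scores, estimated_scores)
        for name in target
    ]
    return mean(errors)


# Hypothetical normalized scores for two generated images.
targets = [
    {"forehead_wrinkles": 0.50, "nasolabial_fold": 0.25},
    {"forehead_wrinkles": 1.00, "nasolabial_fold": 0.75},
]
estimates = [
    {"forehead_wrinkles": 0.25, "nasolabial_fold": 0.25},
    {"forehead_wrinkles": 0.75, "nasolabial_fold": 0.50},
]
print(marker_mae(targets, estimates))  # 0.1875
```

A lower value indicates that the markers estimated on the generated images are closer to the targets placed in the aging map.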

[0096]

[0097] Comparison between age and clinical markers

[0098] In the implementation, when trained without clinical markers, using only age to create a uniform aging map, the model still yields convincing results, with a low FID and a low MAE on the estimated age. Table 2 shows the Fréchet Inception Distance and the mean absolute error for the model with clinical markers and for the model with age only.

[0099]

[0100] However, comparing the aged faces with those from the age-only approach shows that some wrinkles do not reach their full dynamic range with the age-only model. This is because not every aging marker needs to be maximized to reach the age limit of the dataset. In fact, the 150 oldest individuals in the standardized dataset (65 to 80 years old) show a median standard deviation of 0.18 across their normalized aging markers, highlighting the many possible combinations of aging markers in older people. This is a problem for age-only models, which offer only one way of aging a face. For example, markers such as forehead wrinkles depend strongly on the subject's facial expressions and are an integral part of the aging process. When only the age of the subjects in the dataset is considered, the distribution of these clinical aging markers cannot be controlled.
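
As an illustration of the statistic quoted above, the following sketch computes a median standard deviation over subjects. It assumes the standard deviation is taken per subject across that subject's normalized marker scores and that the population form is used; the disclosure does not state either point. The data are invented for the example.

```python
from statistics import median, pstdev


def median_marker_spread(marker_scores_per_subject):
    """Median, over subjects, of the standard deviation of each subject's scores."""
    spreads = [pstdev(scores) for scores in marker_scores_per_subject]
    return median(spreads)


# Hypothetical normalized marker scores for three older subjects.
oldest_subjects = [
    [0.0, 1.0, 0.0, 1.0],      # very uneven aging: spread 0.5
    [0.75, 0.75, 0.75, 0.75],  # every marker equally advanced: spread 0.0
    [0.5, 1.0, 0.5, 1.0],      # intermediate case: spread 0.25
]
print(median_marker_spread(oldest_subjects))  # 0.25
```

A median well above zero means that subjects of similar age carry quite different marker combinations, which a single age value cannot express.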

[0101] In contrast, in this implementation, aging a face with an aging map provides greater control over the aging process. By controlling each individual aging marker, it is possible to choose whether or not to apply these expression wrinkles. A natural extension of this benefit is skin pigmentation, which is regarded as a marker of aging in some Asian countries. An age-based model cannot produce aging for these countries without re-estimating age from the local perspective. The method disclosed herein, by contrast, once trained with each relevant aging marker, can provide a face aging experience tailored to the perspectives of different countries, all within a single model and without additional labels.

[0102] Ablation study

[0103] Impact of patch size: In the implementation, for a given target image resolution (1024×1024 pixels in the experiments), the size of the patches used to train the model can be selected. The larger the patch, the more context the model has to perform the aging task. However, for the same computational power, larger patches lead to smaller batch sizes, which hinders training [8]. Experiments were conducted using patches of 128×128, 256×256, and 512×512 pixels. Figure 11 shows an image array 1100 illustrating rejuvenation and aging results on 1024×1024 facial images, as taught herein. Array 1100 includes a first image array 1102 and a second image array 1104 for two respective faces. Array 1102 shows rejuvenation results for one face and array 1104 shows aging results for the second face. Rows 1106, 1108, and 1110 show the results obtained with the respective patch sizes: 128×128 in row 1106, 256×256 in row 1108, and 512×512 in row 1110.
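
The trade-off between patch size and batch size can be made concrete with a rough arithmetic sketch. It assumes, purely for illustration, that the resources needed per training step grow in proportion to the number of pixels in the batch. The budget value and the function name are hypothetical.

```python
def max_batch_size(pixel_budget, patch_side):
    """Largest batch of square patches that fits a fixed per-step pixel budget."""
    if patch_side <= 0:
        raise ValueError("patch_side must be positive")
    return pixel_budget // (patch_side * patch_side)


# Hypothetical budget: what a device could hold as four 512x512 patches.
budget = 4 * 512 * 512
for side in (128, 256, 512):
    print(side, max_batch_size(budget, side))
# 128 64
# 256 16
# 512 4
```

Under this assumption, doubling the patch side divides the available batch size by four.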

[0104] Figure 11 shows that, in the implementation, all patch sizes succeed in aging the high-resolution face, but with varying degrees of realism. The smallest patch size suffers most from the lack of context and produces inferior results compared to the other two, with visible texture artifacts. The 256×256 patches yield convincing results, with only slight imperfections that become visible when compared to the 512×512 patches. These results suggest that the technique can be applied to higher resolutions, such as 512×512 patches on a 2048×2048 image.

[0105] Location map:

[0106] In this implementation, to assess the contribution of the location maps, the model was trained with and without them. As expected, the effect of the location maps is more pronounced at small patch sizes, where ambiguity is high. Figure 12 shows how, when the patch size is small and location information is missing, the model cannot distinguish similar patches from different parts of the face. Figure 12 shows an image array 1200 illustrating aging effects in two arrays 1202 and 1204, obtained with two respective (patch-trained) models as taught herein. Array 1202 shows a face aged at the smallest patch size without location maps, and array 1204 shows a face aged at the smallest patch size with location maps. Each array shows the aged face and its difference from the original image. When trained (on patches) without location maps, the model cannot add wrinkles consistent with their location and generates generic diagonal ripples. This effect is less pronounced at larger patch sizes because the location of a patch is less ambiguous. The location maps eliminate the diagonal texture artifacts, including on the forehead, where horizontal wrinkles can then appear.
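
A minimal sketch of location maps follows, assuming they are horizontal and vertical coordinate gradients over the full image that are cropped at the same position as each patch. Normalizing the coordinates to the range 0 to 1 and the use of nested lists are assumptions made for the example, as are the function names.

```python
def location_maps(height, width):
    """Return (x_map, y_map): horizontal and vertical gradients over the image."""
    if height < 2 or width < 2:
        raise ValueError("image must be at least 2x2")
    x_row = [col / (width - 1) for col in range(width)]
    x_map = [list(x_row) for _ in range(height)]
    y_map = [[row / (height - 1)] * width for row in range(height)]
    return x_map, y_map


def crop(grid, top, left, size):
    """Cut a size x size patch whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in grid[top:top + size]]


x_map, y_map = location_maps(5, 5)
# Two patches that could look alike in pixel content, for example skin from
# the upper left and the lower right of the face, get distinct location patches.
upper_left = (crop(x_map, 0, 0, 2), crop(y_map, 0, 0, 2))
lower_right = (crop(x_map, 3, 3, 2), crop(y_map, 3, 3, 2))
print(upper_left)   # ([[0.0, 0.25], [0.0, 0.25]], [[0.0, 0.0], [0.25, 0.25]])
print(lower_right)  # ([[0.75, 1.0], [0.75, 1.0]], [[0.75, 0.75], [1.0, 1.0]])
```

Because the two location patches differ even when the image patches look alike, a model that receives them has what it needs to tell the two face regions apart.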

[0107] Spatialization of information:

[0108] The aging map proposed herein is compared with a baseline way of formatting the conditions, in which every marker score is given as an individual uniform feature map. Since not every marker is present in a particular patch, especially when the patch size is small, much of the information processed is of no use to the model. The aging map is a simple way of giving the model only the labels present in the patch, together with their spatial extent and location. Figure 13 highlights the effect of the aging map. Figure 13 shows an image array 1300 illustrating aging effects according to the teachings herein, in which a first array 1302 illustrates aging using a (patch-trained) model trained with uniform feature maps, and a second array 1304 illustrates aging using a (patch-trained) model trained with an aging map.

[0109] On small or medium-sized patches (e.g., 128×128 or 256×256 pixels), the model struggles to produce realistic results, and the aging map helps reduce the complexity of the problem. Figure 13 shows, in arrays 1302 and 1304 respectively, a face aged at a large patch size (e.g., 512×512) using individual uniform condition feature maps and using the proposed aging map, together with the difference from the original image in each array. A large patch size need not be one half of the original image size (e.g., 800×800 would be large without being the full size of a 1024×1024 image). Owing to its denser spatial information, the aging map makes training more efficient and produces more realistic aging. The difference images highlight small, unrealistic wrinkles produced by the baseline technique.

[0110] Alternatively, in implementations, a different approach such as that of StarGAN is used, in which all marker values are given to the model for each patch, even values for markers that are not present in the patch.
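
The difference between the two conditioning schemes discussed in this subsection can be sketched on a toy example. The 4×4 map, the marker names, and the scores are hypothetical, and nested lists stand in for feature maps.

```python
def crop(grid, top, left, size):
    """Cut a size x size patch whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in grid[top:top + size]]


def uniform_condition(scores, size):
    """Baseline: one constant size x size map per marker, for every patch."""
    return {name: [[score] * size for _ in range(size)]
            for name, score in scores.items()}


# Hypothetical 4x4 face: forehead score on top, lower-face sagging score below.
scores = {"forehead_wrinkles": 0.8, "lower_face_sagging": 0.3}
aging_map = [[0.8] * 4, [0.8] * 4, [0.3] * 4, [0.3] * 4]

# A 2x2 patch taken from the top-left corner only contains forehead skin.
baseline = uniform_condition(scores, 2)  # still carries the sagging score
spatial = crop(aging_map, 0, 0, 2)       # carries only what the patch shows
print(sorted(baseline))  # ['forehead_wrinkles', 'lower_face_sagging']
print(spatial)           # [[0.8, 0.8], [0.8, 0.8]]
```

The baseline hands the model a sagging score for a patch that shows no lower face, whereas the cropped aging map carries only the forehead target and where it applies.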

[0111] Application

[0112] In implementations, the disclosed techniques and methodologies include developer-related methods and systems for defining (e.g., by tuning) a model having a generator that provides image-to-image translation for age simulation. The generator exhibits continuous control over a plurality of age-related skin markers to create smooth transformations between an original image and a transformed image (e.g., of a face). The generator is trained using individual, unpaired training images, each with an aging map that identifies the facial landmarks associated with the respective age-related skin markers, providing weak spatial supervision to guide the aging process. In implementations, the age-related skin markers represent population-specific dimensions of aging.

[0113] In one implementation, a GAN-based model with a generator for image-to-image translation for age simulation is incorporated into a computer-implemented method (e.g., an application) or a computing device or system to provide a virtual reality, augmented reality, and / or modified reality experience. The application is configured to let a user take a selfie (or video) with a camera-equipped smartphone or tablet, and the generator G applies the desired effect, such as for playback or other presentation by the smartphone or tablet.

[0114] In implementations, the generator G taught herein is configured to load and execute on a typically available consumer smartphone or tablet (e.g., the target device). An exemplary configuration includes a device with the following hardware specifications: an Intel® Xeon® CPU E5-2686 v4 @ 2.30 GHz, profiled with only one core and one thread. In implementations, the generator G is configured to load and execute on a computing device with more resources, including a server, desktop, gaming computer, or other device such as one with multiple cores and executing in multiple threads. In implementations, the generator G is provided as a (cloud-based) service.

[0115] In implementations, in addition to the developer aspects (used at training time) and the target computing device aspects (used at inference time), those skilled in the art will understand that computer program product aspects are disclosed, in which instructions are stored in a non-transitory storage device (e.g., a memory, CD-ROM, DVD-ROM, disk, etc.) to configure a computing device to perform any of the methods disclosed herein.

[0116] Figure 14 is a block diagram of a computer system 1400 according to an embodiment. The computer system 1400 includes multiple computing devices (1402, 1406, 1408, 1410, and 1450), including servers, developer computers (PCs, desktops, etc.), and typical user computers (e.g., PCs, desktops, and small form factor (personal) mobile devices such as smartphones and tablets). In this embodiment, computing device 1402 provides a network model training environment 1412, which includes hardware and software according to the teachings herein to define a model for providing image-to-image translation with continuous aging. Components of the network model training environment 1412 include a model trainer component 1414 for defining and configuring, for example by tuning, a model including a generator G 1416 and a discriminator D 1418. As is well known, the generator G is used to define the model used at inference to perform image-to-image translation, while the discriminator D 1418 is a construct used for training.

[0117] In this implementation, conditioning is performed according to a training workflow such as that of Figure 4 and Figure 5. The workflow trains on patches of high-resolution images (e.g., 1024×1024 pixels or higher). Training uses skin marker values, or the apparent age, for the respective regions of the face in which such skin markers are located. This is achieved by providing dense, spatialized information about these features, such as by using aging maps. In this implementation, the location of each patch is provided, for example, to avoid ambiguity, using the location information to distinguish similar patches from different parts of the face. In this implementation, to keep the processing fully convolutional, the location information is provided using gradient location maps of the (x, y) coordinates within the training image. In this implementation, the model and the discriminator have the form, provide the outputs, and are conditioned using the objective functions (e.g., loss functions) described above.

[0118] In this implementation, because training uses patches, aging maps, and location maps, additional components of environment 1412 include an image patch (Ip) maker component 1420, an aging map (Ap) maker component 1422, and a location map (Xp, Yp) maker component 1424. Other components are not shown. In this embodiment, a data server (e.g., 1404) or other form of computing device stores an image dataset 1426 of (high-resolution) images for training and other purposes, and is coupled via one or more networks, shown representatively as network 1428, which couples any of computing devices 1402, 1404, 1406, 1408, and 1410. Network 1428 may be, for example, wireless or otherwise, public or otherwise, etc. It will also be understood that system 1400 is simplified. At least any of the services may be implemented by more than one computing device.
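
One way the patch maker components might cooperate is sketched below, under the assumption that the image patch Ip, aging map patch Ap, and location patches Xp and Yp are all cropped at one shared random position. The function names, the toy 8×8 inputs, and the use of integer coordinates are hypothetical and chosen only to make the alignment easy to check.

```python
import random


def crop(grid, top, left, size):
    """Cut a size x size patch whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in grid[top:top + size]]


def make_training_patch(image, aging_map, x_map, y_map, size, rng):
    """Crop Ip, Ap, Xp and Yp at one shared, randomly chosen position."""
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("patch does not fit in the image")
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return tuple(crop(grid, top, left, size)
                 for grid in (image, aging_map, x_map, y_map))


# Toy 8x8 single-channel inputs; each pixel value encodes its own position.
side = 8
image = [[row * side + col for col in range(side)] for row in range(side)]
aging_map = [[0.5] * side for _ in range(side)]
x_map = [[col for col in range(side)] for _ in range(side)]
y_map = [[row] * side for row in range(side)]

ip, ap, xp, yp = make_training_patch(image, aging_map, x_map, y_map, 4,
                                     random.Random(0))
# The location patches identify where the image patch came from.
assert ip[0][0] == yp[0][0] * side + xp[0][0]
```

Cropping all four inputs with the same coordinates keeps each pixel of the patch aligned with its aging target and its position in the full image.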

[0119] Once trained, generator 1416 can be further defined as desired and provided as an inference-time model (generator G_IT) 1430. In accordance with the techniques and methodologies herein, in implementations, the inference-time model (generator G_IT) 1430 is made available for use in different ways. In one way, as shown in Figure 14, generator G_IT 1430 is provided as a cloud service 1432 or other software as a service (SaaS) offered via cloud server 1408. A user application, such as augmented reality (AR) application 1434, is defined for use with cloud service 1432 to provide an interface to generator G_IT 1430. In an implementation, AR application 1434 is provided for distribution (e.g., via download) from an application distribution service 1436 provided by server 1406.

[0120] Although not shown, in this embodiment, AR application 1434 is developed using an application developer's computing device for a specific target device having specific hardware and software (in particular, a specific operating system configuration). In this embodiment, AR application 1434 is a native application configured to execute in a specific native environment, such as one defined for a specific operating system (and / or hardware). A native application is typically distributed via an application distribution service 1436 configured as an e-commerce "store" operated by a third-party service, although this is not required. In another embodiment, AR application 1434 is a browser-based application, for example, configured to execute in the browser environment of the target user's device.

[0121] AR application 1434 is provided for distribution to (e.g., download by) a user device such as mobile device 1410. In an embodiment, AR application 1434 is configured to provide an augmented reality experience to a user (e.g., via an interface). For example, an effect is applied to an image through processing by the inference-time generator 1430. The mobile device has a camera (not shown) to capture an image (e.g., captured image 1438), which in an embodiment is a still image such as a selfie. The effect is applied to captured image 1438 using image processing techniques that provide image-to-image translation. An aged image 1440 is defined and displayed on a display device (not shown) of mobile device 1410 to simulate the effect on captured image 1438. The position of the camera can be changed, and the effect applied in response to a further captured image, to simulate augmented reality. It will be understood that the captured image defines a source or original image, while the aged image defines a transformed or altered image, or an image to which an effect is applied.

[0122] In the cloud service example of the embodiment of Figure 14, captured image 1438 is provided to cloud service 1432, where it is processed by generator G_IT 1430 to perform image-to-image translation with continuous aging to define aged image 1440. Aged image 1440 is then transmitted to mobile device 1410 for display, storage, sharing, etc.

[0123] In one implementation, AR application 1434 provides an interface (not shown) for operating AR application 1434, such as a graphical user interface (GUI), which may be voice-enabled. The interface is configured to enable image capture, communication with the cloud service, and display, saving, and / or sharing of the transformed image (e.g., aged image 1440). In an implementation, the interface is configured to let the user provide input to the cloud service, such as to define an aging map. As previously mentioned, in respective implementations the input includes a target age, an age increment, or a product / service selection.

[0124] In an implementation in Figure 14, AR application 1434 or another application (not shown) provides access (e.g., via communication) to computing device 1450, which provides an e-commerce service 1452. E-commerce service 1452 includes a recommendation component 1454 to provide (personalized) recommendations for products, services, or both. In this implementation, such products and / or services are anti-aging or rejuvenation products and / or services, etc. In this implementation, such products and / or services are associated, for example, with specific skin markers. A captured image from device 1410 is provided to e-commerce service 1452. According to one implementation, a skin marker analysis is performed, for example by a deep learning skin marker analyzer model 1456. The trained model performs image processing analysis of the skin (e.g., the regions of the face associated with specific skin markers) to generate a skin analysis that includes scores for at least some of the skin markers. The individual scores for an image can be generated using a (dedicated) aging marker estimation model (e.g., of a classifier type) based on the ResNet [27] architecture, such as the one previously described for analyzing the training set data.

[0125] In this implementation, the skin markers (e.g., their scores) are used to generate personalized recommendations. For example, a respective product (or service) is associated with one or more skin markers and with a specific score (or score range) for such markers. In this implementation, the information is stored in a database (e.g., 1460) for use by e-commerce service 1452, such as through a suitable lookup that matches user data to product and / or service data. In this implementation, rule-based matching can be used to select one or more products / services, and / or to rank the products / services associated with a specific score (or score range) for such markers. In this implementation, additional user data for use by recommendation component 1454 includes any of gender, demographic, and location data. For example, location data may relate to any of the product / brand, formulation, regulatory requirement, format (e.g., size, etc.), labeling, or SKU (stock keeping unit) that is available for, or otherwise associated with, the user's location. In this implementation, any such gender, demographic, and / or location data may also help to select and / or rank products / services or to filter them (e.g., to remove products / services not sold in or for a location). In this implementation, location data is used to determine an available retailer / service provider (e.g., one with a physical business location such as a store, salon, or office) so that the user can purchase the product / service locally.
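
The rule-based matching described above might look like the following sketch. The `Product` record, its field names, the catalog entries, the region codes, and the choice to rank by descending marker score are all assumptions introduced for illustration; the disclosure does not specify a schema or a ranking rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    marker: str          # skin marker the product is associated with
    min_score: float     # lower bound of the matching score range
    max_score: float     # upper bound of the matching score range
    regions: frozenset   # locations where the product is sold


def recommend(scores, catalog, user_region):
    """Select products whose score range matches, filter by location, then rank."""
    matches = []
    for product in catalog:
        score = scores.get(product.marker)
        if score is None:
            continue  # marker was not analyzed for this user
        if not product.min_score <= score <= product.max_score:
            continue  # score falls outside the product's range
        if user_region not in product.regions:
            continue  # not sold in or for the user's location
        matches.append((score, product))
    matches.sort(key=lambda match: match[0], reverse=True)
    return [product.name for _, product in matches]


catalog = [
    Product("eye cream", "under_eye_wrinkles", 0.5, 1.0, frozenset({"FR", "US"})),
    Product("forehead serum", "forehead_wrinkles", 0.25, 1.0, frozenset({"FR"})),
    Product("firming mask", "lower_face_sagging", 0.5, 1.0, frozenset({"US"})),
]
scores = {"under_eye_wrinkles": 0.75, "forehead_wrinkles": 0.5,
          "lower_face_sagging": 0.75}
print(recommend(scores, catalog, "FR"))  # ['eye cream', 'forehead serum']
```

The firming mask is dropped for the "FR" user only because of the location filter, which mirrors the removal of products not sold in or for a location.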

[0126] In this embodiment, the skin marker scores from the user's captured image are provided from the e-commerce service for display, such as in an AR application interface via AR application 1434. In this embodiment, the skin marker scores are used to define an aging map, which is provided to cloud service 1432 for use by generator G_IT 1430 to define the transformed image. For example, in this embodiment, the skin marker scores generated by model 1456 are used as originally generated from the image to define the aging map values for some skin markers, while other skin marker scores, as originally generated, are modified to define the aging map values for other skin markers. In this embodiment, for example, the user can modify some of the generated scores via the interface (e.g., only for skin markers around the eyes). In this embodiment, for example, scores are modified by other means, such as by applying rules or other code. In this embodiment, the modification represents rejuvenation, aging, or any combination thereof for selected skin markers. As previously described, apparent age values can be used for some skin markers in place of skin marker scores.

[0127] In a non-limiting implementation, the user receives personalized product recommendations, such as those provided by e-commerce service 1452. The user selects a specific product or service. The selection triggers a modification of the user's skin marker score for the skin marker associated with the product or service. The modification adjusts the score to simulate use of the product or service. The skin marker scores, as originally generated or as modified, are used in an aging map that is provided to cloud service 1432 to receive an aged image. As previously described herein, skin marker scores for different markers can be combined in the map, and generator G_IT can age different markers differently. Thus, in this embodiment, an aging map is defined in which some markers keep their originally generated scores while others have modified scores.
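
The score modification triggered by a product selection can be sketched as follows. The additive modifier, the clamping to a normalized 0 to 1 range, and all names and values are assumptions made for the example; the disclosure only states that the score is adjusted to simulate use of the product or service.

```python
def apply_modifiers(current_scores, modifiers, low=0.0, high=1.0):
    """Return aging targets: current scores, adjusted where a modifier applies."""
    targets = dict(current_scores)
    for marker, delta in modifiers.items():
        if marker in targets:
            # Clamp so the target stays inside the assumed score range.
            targets[marker] = min(high, max(low, targets[marker] + delta))
    return targets


# Hypothetical analyzer output and the modifier attached to a selected product.
current = {"under_eye_wrinkles": 0.75, "forehead_wrinkles": 0.5}
selected_product_modifier = {"under_eye_wrinkles": -0.25}

print(apply_modifiers(current, selected_product_modifier))
# {'under_eye_wrinkles': 0.5, 'forehead_wrinkles': 0.5}
```

Only the marker tied to the selected product changes; the other marker keeps its originally generated score, so the resulting aging map mixes modified and unmodified targets.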

[0128] In this implementation in Figure 14, e-commerce service 1452 is configured with a purchasing component 1458 to facilitate the purchase of products or services. The products or services include cosmetic products or services, or others. Although not shown, e-commerce service 1452 and / or AR application 1434 provide image processing of captured images to simulate cosmetic products or services, such as applying makeup to a captured image to produce an image with the effect applied.

[0129] Although the captured image is used as the source image for processing in the above embodiments, other source images (e.g., from sources other than the camera of device 1410) are used in other embodiments. Embodiments may use captured images or other source images. Whether captured or otherwise, in embodiments such an image is a high-resolution image, which improves the user experience because generator G_IT 1430 was trained on such images. Although not shown, in this embodiment, the image used by the skin marker analyzer model is downscaled for analysis. Further image preprocessing is performed for this analysis.

[0130] In one implementation, AR application 1434 may guide the user with respect to image quality features (e.g., lighting, centering, background, hair occlusion, etc.) to improve performance. In another implementation, AR application 1434 may reject images that do not meet certain minimum requirements or are deemed unsuitable.

[0131] Although shown in Figure 14 as a mobile device, computing device 1410 may, in embodiments, have a different form factor, as stated. Instead of (or in addition to) being provided as a cloud service, generator G_IT 1430 can be hosted and executed locally on a particular computing device having sufficient storage and processing resources.

[0132] Therefore, in embodiments, a computing device (e.g., device 1402, 1408, or 1410) is provided, including: a processing unit configured to: receive an original image of a face and use an age simulation generator to generate a transformed image for presentation; wherein the generator simulates aging with continuous control over a plurality of age-related skin markers between the original image and the transformed image of the face, and the generator is configured to transform the original image using respective aging targets for the skin markers. It will be understood that such a computing device (e.g., device 1402, 1408, or 1410) is configured to perform related method aspects according to the embodiments, for example, as described herein with reference to Figure 15. It will be understood that embodiments of the computing device aspect have corresponding embodiments of the method aspect. Similarly, the computing device and method aspects have corresponding computer program product aspects. A computer program product aspect includes a (e.g., non-transitory) storage device storing instructions that, when executed by a processor of a computing device, configure the computing device to perform a method such as one according to any of the embodiments herein.

[0133] In an implementation, the generator is based on a conditional GAN. In an implementation, the aging targets are provided to the generator as an aging map identifying the regions of the face associated with respective skin markers, wherein each region in the aging map is filled with the aging target of the associated skin marker. In an implementation, the aging map represents the aging target of an associated skin marker using a score value of that skin marker. In an implementation, the aging map represents the aging target using an apparent age value for the associated skin marker. In an implementation, the aging map uses the score value of the associated skin marker when available and the apparent age value when the score value is unavailable. In an implementation, the aging map represents the aging targets using pixel intensities.
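
A minimal sketch of aging map construction follows, covering the case in which a region is filled with the marker score when available and with the apparent age value otherwise. Rectangular regions, normalized values, and the function and marker names are simplifications assumed for the example; in practice the regions would follow the face.

```python
def build_aging_map(height, width, regions, scores, apparent_age):
    """Fill each marker region with its score, else with the apparent age value."""
    # Start from the apparent age value everywhere, so gaps are covered too.
    aging_map = [[apparent_age] * width for _ in range(height)]
    for name, (top, left, bottom, right) in regions.items():
        value = scores.get(name)
        if value is None:
            continue  # no score available: keep the apparent age value
        for row in range(top, bottom):
            for col in range(left, right):
                aging_map[row][col] = value
    return aging_map


# Hypothetical 4x4 layout: forehead on the top half, lower face on the bottom.
regions = {
    "forehead_wrinkles": (0, 0, 2, 4),
    "lower_face_sagging": (2, 0, 4, 4),
}
scores = {"forehead_wrinkles": 0.8}  # no sagging score is available
for row in build_aging_map(4, 4, regions, scores, apparent_age=0.5):
    print(row)
# [0.8, 0.8, 0.8, 0.8]
# [0.8, 0.8, 0.8, 0.8]
# [0.5, 0.5, 0.5, 0.5]
# [0.5, 0.5, 0.5, 0.5]
```

Each value in the map acts as a pixel intensity, so a single-channel image carries every aging target together with its location.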

[0134] In this implementation, the aging map masks out the background of the original image.

[0135] In one implementation, the generator is configured by training on individual training images and associated aging maps, wherein the associated aging maps provide weak spatial supervision to guide the aging transformation of individual skin landmarks. In another implementation, skin landmarks represent population-specific dimensions of aging. Specifically, skin landmarks represent one or more of the following: "age," "forehead wrinkles," "nasolabial folds," "under-eye wrinkles," "frown lines," "interocular wrinkles," "corner of the mouth wrinkles," "upper lip," and "lower facial sagging."

[0136] In one implementation, the generator is a fully convolutional encoder-decoder whose decoder includes residual blocks that incorporate the aging targets in the form of an aging map. In another implementation, the generator is configured using patch-based training, which uses a portion of a specific training image and the corresponding patch of the associated aging map. In yet another implementation, the residual blocks also incorporate location information indicating the respective location of the portion of the specific training image and of the corresponding patch of the associated aging map. In yet another implementation, the location information is provided using respective X and Y coordinate maps defined from horizontal and vertical gradient maps related to the height and width (H×W) of the original image. In one implementation, the specific training image is a high-resolution image, and the patch is a portion of it. In another implementation, the patch size is half the size of the high-resolution image or less.

[0137] In the implementation, the generator is configured via an attention mechanism to limit the generator to transforming age-related skin markers while minimizing additional transformations to be applied.

[0138] In one implementation, the processing unit (e.g., of device 1410) is configured to communicate with a second computing device (e.g., 1408) that provides the generator for use, the processing unit transmitting the original image and receiving the transformed image.

[0139] In this implementation, the original image is a high-resolution image of 1024×1024 pixels or higher.

[0140] In one embodiment, the processing unit (e.g., of computing device 1410) is also configured to provide an augmented reality application that simulates aging using the transformed image. In another embodiment, the computing device includes a camera, and the processing unit receives the original image from the camera.

[0141] In an implementation, the processing unit is configured to provide at least one of the following: a recommendation function that recommends at least one of a product and a service; and an e-commerce function that allows the purchase of at least one of a product and a service. In this context, the operation of "providing" includes communicating with a web-based or other network-based service provided by another computing device (e.g., 1450) to facilitate recommendations and / or purchases.

[0142] In this implementation, the product includes one of the following: a rejuvenation product, an anti-aging product, and a cosmetic product.

[0143] In this implementation, the service includes one of the following: a rejuvenation service, an anti-aging service, and a cosmetic service.

[0144] Figure 15 is a flowchart of operations 1500 of a method aspect according to an embodiment, for example as performed by computing device 1402 or 1408. At step 1502, the operations receive an original image of a face, and at step 1504, an age simulation generator is used to generate a transformed image for presentation; wherein the generator simulates aging with continuous control over a plurality of age-related skin markers between the original image and the transformed image of the face, and the generator is configured to transform the original image using respective aging targets for the skin markers. As noted, the related computing device aspects have corresponding method embodiments.

[0145] In one implementation, the network model training environment provides a computing device configured to perform a method, such as a method of configuring a (GAN-based) age simulation generator by tuning. In another implementation, the method includes: defining an age simulation generator having continuous control over a plurality of age-related skin markers between an original image and a transformed image of a face, wherein the defining includes training the generator using individual, unpaired training images, each of which is associated with aging targets for at least some of the skin markers; and providing the generator for transforming images.

[0146] In this implementation, the generator is based on a conditional GAN.

[0147] In one implementation, the method includes defining aging targets as aging maps that identify regions of the face associated with respective skin landmarks, wherein each region in the aging map is filled with the aging target corresponding to its associated skin landmark.
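
As an illustration only, the aging map of paragraph [0147] can be sketched in Python with NumPy. The function name `build_aging_map`, the boolean region masks, and the normalization of targets to [0, 1] are assumptions made for this sketch and are not taken from the disclosure; pixels outside every region are simply left at zero.

```python
import numpy as np


def build_aging_map(region_masks, targets, height, width):
    """Fill each facial region with the aging target of its skin landmark.

    region_masks: dict of landmark name -> boolean (height, width) mask
                  (hypothetical input; how regions are located is not shown).
    targets:      dict of landmark name -> target value, assumed in [0, 1].
    """
    aging_map = np.zeros((height, width), dtype=np.float32)
    for name, mask in region_masks.items():
        if mask.shape != (height, width):
            raise ValueError(f"mask for {name!r} has shape {mask.shape}")
        # Every pixel of the region carries the same target value.
        aging_map[mask] = targets[name]
    return aging_map


# Toy 8x8 example with two rectangular regions.
forehead = np.zeros((8, 8), dtype=bool)
forehead[0:2, 2:6] = True
nasolabial = np.zeros((8, 8), dtype=bool)
nasolabial[4:6, 1:3] = True

example_map = build_aging_map(
    {"forehead wrinkles": forehead, "nasolabial folds": nasolabial},
    {"forehead wrinkles": 0.8, "nasolabial folds": 0.3},
    height=8,
    width=8,
)
```

Because the map is an ordinary single-channel image, changing one entry of `targets` changes only the pixels of that region, which is what makes per-landmark control possible in this representation.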

[0148] In one embodiment, a computing device is provided, including a facial effects unit comprising processing circuitry configured to apply at least one facial effect to a source image and generate a virtual instance of the source image with the applied effect on an interface. The facial effects unit utilizes a generator to continuously control multiple age-related skin markers between the original and transformed images of the face to simulate aging. The generator is configured to transform the original image using individual aging targets of the skin markers. In this embodiment, the interface is an e-commerce interface, for example, one capable of enabling purchases or access to products / services.

[0149] In one embodiment, the computing device includes a recommendation unit comprising processing circuitry configured to present recommendations for products and / or services and to receive selections of products and / or services, wherein the products and / or services are associated with an aging target modifier for at least one of the skin markers. In another embodiment, the interface is an e-commerce interface, for example, one that enables purchase of the recommended products / services. The facial effects unit is configured to generate respective aging targets for the skin markers using the aging target modifier in response to the selection, thereby simulating the effect of the products and / or services on the source image. In another embodiment, the recommendation unit is configured to obtain recommendations by: invoking a skin marker analyzer to determine a current skin marker score using the source image; and using the current skin marker score to determine the products and / or services. In another embodiment, the skin marker analyzer is configured to analyze the source image using a deep learning model. In yet another embodiment, the aging targets are defined by the current skin marker score and the aging target modifier.
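
One way to read the last embodiment is that the aging target for a skin marker is the current score shifted by the modifier of the selected product or service. The sketch below assumes an additive modifier, scores normalized to [0, 1], and clamping to that range; none of these choices is specified by the disclosure, and `apply_modifiers` is a hypothetical name.

```python
def apply_modifiers(current_scores, modifiers, lo=0.0, hi=1.0):
    """Combine current skin marker scores with aging target modifiers.

    current_scores: dict of marker name -> current score (assumed in [lo, hi]).
    modifiers:      dict of marker name -> signed offset (assumed additive).
    Markers without a modifier keep their current score as the target.
    """
    targets = {}
    for marker, score in current_scores.items():
        delta = modifiers.get(marker, 0.0)
        # Clamp so that a strong modifier cannot leave the valid score range.
        targets[marker] = min(hi, max(lo, score + delta))
    return targets


example_targets = apply_modifiers(
    {"forehead wrinkles": 0.6, "nasolabial folds": 0.1, "age": 0.5},
    {"forehead wrinkles": -0.2, "nasolabial folds": -0.5},
)
```

The resulting targets could then be written into an aging map, so that the simulated effect is confined to the regions of the markers the product or service addresses.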

[0150] Conclusion

[0151] This disclosure presents a method for creating aging maps for facial aging using clinical markers. State-of-the-art results are demonstrated on high-resolution images with complete control over the aging process. In this implementation, a patch-based approach allows for training of conditional GANs on large images while maintaining a large batch size.
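
The patch-based approach can be illustrated with a short NumPy sketch: a random square patch is cut at the same position from the image, from its aging map, and from X and Y coordinate maps that tell the network where on the face the patch came from. The [0, 1] range of the coordinate gradients, the uniform random sampling, and the names `coordinate_maps` and `sample_patch` are assumptions of this sketch rather than details taken from the disclosure.

```python
import numpy as np


def coordinate_maps(height, width):
    """Return X and Y maps as horizontal and vertical linear gradients."""
    xs = np.linspace(0.0, 1.0, width, dtype=np.float32)
    ys = np.linspace(0.0, 1.0, height, dtype=np.float32)
    x_map = np.tile(xs, (height, 1))          # varies left to right
    y_map = np.tile(ys[:, None], (1, width))  # varies top to bottom
    return x_map, y_map


def sample_patch(image, aging_map, patch_size, rng):
    """Cut one aligned patch from the image, aging map and coordinate maps."""
    height, width = aging_map.shape
    if patch_size > min(height, width):
        raise ValueError("patch_size exceeds image size")
    x_map, y_map = coordinate_maps(height, width)
    top = int(rng.integers(0, height - patch_size + 1))
    left = int(rng.integers(0, width - patch_size + 1))
    rows = slice(top, top + patch_size)
    cols = slice(left, left + patch_size)
    # The same window is applied to every input so they stay aligned.
    return image[rows, cols], aging_map[rows, cols], x_map[rows, cols], y_map[rows, cols]


rng = np.random.default_rng(0)
image = np.zeros((1024, 1024, 3), dtype=np.float32)
aging_map = np.zeros((1024, 1024), dtype=np.float32)
img_p, map_p, x_p, y_p = sample_patch(image, aging_map, 512, rng)
```

A 512 × 512 patch holds a quarter of the pixels of a 1024 × 1024 image, so roughly four times as many samples fit in a batch for the same memory, while the coordinate patches preserve the position that cropping would otherwise discard.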

[0152] Actual implementations may include any or all of the features described herein. These and other aspects, features, and different combinations may be expressed as methods, apparatus, systems, means for performing functions, program products, and other combinations of the features described herein. Multiple implementations have been described. However, it should be understood that various modifications may be made without departing from the spirit and scope of the processes and techniques described herein. Furthermore, additional steps may be provided, or steps may be eliminated from the described process, and other components may be added to or removed from the described system. Therefore, other implementations are within the scope of the appended claims.

[0153] Throughout this specification and the claims, the terms "comprise" and "contain," and variations thereof, mean "including but not limited to," and are not intended to exclude other components, integers, or steps. Throughout this specification, unless the context requires otherwise, the singular includes the plural. Specifically, where the indefinite article is used, the specification should be understood as contemplating plurality as well as singularity, unless the context requires otherwise.

[0154] Features, integers, characteristics, or groups described in connection with a particular aspect, implementation, or embodiment of the invention should be understood to be applicable to any other aspect, implementation, or embodiment, unless incompatible therewith. All features disclosed herein (including any appended claims, abstract, and drawings) and / or all steps of any disclosed method or process can be combined in any combination, except where at least some of such features and / or steps are mutually exclusive. The invention is not limited to the details of any of the foregoing embodiments or implementations. The invention extends to any novel feature or any novel combination of features disclosed in this specification (including any appended claims, abstract, and drawings), or to any novel step or any novel combination of steps of any disclosed method or process.

[0155] 1. Agustsson, E., Timofte, R., Escalera, S., Baro, IEEE (2017)

[0156] 2. Antipov, G., Baccouche, M., Dugelay, J.L.: Face aging with conditional generative adversarial networks. In: 2017 IEEE international conference on image processing (ICIP). pp. 2089–2093. IEEE (2017)

[0157] 3. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein gan. arXiv preprint arXiv:1701.07875 (2017)

[0158] 4. Bazin, R., Doublet, E.: Skin aging atlas. volume 1, caucasian type. MED’COM publishing (2007)

[0159] 5. Bazin, R., Flament, F.: Skin aging atlas. volume 2, asian type (2010)

[0160] 6. Bazin, R., Flament, F., Giron, F.: Skin aging atlas. volume 3, afro-american type. Paris: Med’com (2012)

[0161] 7. Bazin, R., Flament, F., Rubert, V.: Skin aging atlas. volume 4, indian type (2015)

[0162] 8. Brock, A., Donahue, J., Simonyan, K.: Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096 (2018)

[0163] 9. Chen, B.C., Chen, C.S., Hsu, W.H.: Cross-age reference coding for age-invariant face recognition and retrieval. In: European conference on computer vision. pp. 768–783. Springer (2014)

[0164] 10. Choi, Y., Choi, M., Kim, M., Ha, J.W., Kim, S., Choo, J.: Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 8789–8797 (2018)

[0165] 11. Choi, Y., Uh, Y., Yoo, J., Ha, J.W.: Stargan v2: Diverse image synthesis for multiple domains. arXiv preprint arXiv:1912.01865 (2019)

[0166] 12. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05). vol. 1, pp. 886–893. IEEE (2005)

[0167] 13. Flament, F., Bazin, R., Qiu, H.: Skin aging atlas. volume 5, photo-aging face & body (2017)

[0168] 14. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in neural information processing systems. pp. 2672–2680 (2014)

[0169] 15. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein gans. In: Advances in neural information processing systems. pp. 5767–5777 (2017)

[0170] 16. Heljakka, A., Solin, A., Kannala, J.: Recursive chaining of reversible image-to-image translators for face aging. In: International Conference on Advanced Concepts for Intelligent Vision Systems. pp. 309–320. Springer (2018)

[0171] 17. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: Gans trained by a two time-scale update rule converge to a local nash equilibrium. In: Advances in neural information processing systems. pp. 6626–6637 (2017)

[0172] 18. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1125–1134 (2017)

[0173] 19. Kärkkäinen, K., Joo, J.: Fairface: Face attribute dataset for balanced race, gender, and age. arXiv preprint arXiv:1908.04913 (2019)

[0174] 20. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4401–4410 (2019)

[0175] 21. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

[0176] 22. Park, T., Liu, M.Y., Wang, T.C., Zhu, J.Y.: Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2337–2346 (2019)

[0177] 23. Pumarola, A., Agudo, A., Martinez, A.M., Sanfeliu, A., Moreno-Noguer, F.: Ganimation: Anatomically-aware facial animation from a single image. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 818–833 (2018)

[0178] 24. Ricanek, K., Tesafaye, T.: Morph: A longitudinal image database of normal adult age-progression. In: 7th International Conference on Automatic Face and Gesture Recognition (FGR06). pp. 341–345. IEEE (2006)

[0179] 25. Rothe, R., Timofte, R., Van Gool, L.: Dex: Deep expectation of apparent age from a single image. In: Proceedings of the IEEE international conference on computer vision workshops. pp. 10–15 (2015)

[0180] 26. Song, J., Zhang, J., Gao, L., Liu, X., Shen, H.T.: Dual conditional gans for face aging and rejuvenation. In: IJCAI. pp. 899–905 (2018)

[0181] 27. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-resnet and the impact of residual connections on learning. In: Thirty-first AAAI conference on artificial intelligence (2017)

[0182] 28. Wang, Z., Tang, X., Luo, W., Gao, S.: Face aging with identity-preserved conditional generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 7939–7947 (2018)

[0183] 29. Yazici, Y., Foo, C.S., Winkler, S., Yap, K.H., Piliouras, G., Chandrasekhar, V.: The unusual effectiveness of averaging in gan training. arXiv preprint arXiv:1806.04498 (2018)

[0184] 30. Zeng, H., Lai, H., Yin, J.: Controllable face aging. arXiv preprint arXiv:1912.09694 (2019)

[0185] 31. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 586–595 (2018)

[0186] 32. Zhu, H., Huang, Z., Shan, H., Zhang, J.: Look globally, age locally: Face aging with an attention mechanism. arXiv preprint arXiv:1910.12771 (2019)

[0187] 33. Zhu, H., Zhou, Q., Zhang, J., Wang, J.Z.: Facial aging and rejuvenation by conditional multi-adversarial autoencoder with ordinal regression. arXiv preprint arXiv:1804.02740 (2018)

[0188] 34. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision. pp. 2223–2232 (2017)

Claims

1. A computing device, comprising: a processing unit configured to receive an original image of a face and to use an age simulation generator to generate a transformed image for presentation; wherein the generator simulates aging with continuous control over multiple age-related skin markers between the original image and the transformed image of the face, and the generator is configured to transform the original image using respective aging targets for the skin markers; wherein the generator is a fully convolutional encoder-decoder that includes residual blocks in the decoder to incorporate the aging targets in the form of an aging map; wherein the generator is configured using patch-based training, which uses a portion of a specific training image and a corresponding patch of an associated aging map; wherein the aging targets are provided to the generator as an aging map identifying regions of the face associated with respective ones of the skin markers, each region in the aging map being filled with the aging target corresponding to the associated skin marker; and wherein the residual blocks further incorporate location information indicating the respective location of the portion of the specific training image and of the corresponding patch of the associated aging map, the location information being provided using respective X and Y coordinate maps defined from horizontal and vertical gradient maps associated with the height and width (H × W) size of the original image.

2. The computing device according to claim 1, wherein the generator is based on a conditional GAN.

3. The computing device according to claim 1 or 2, wherein the aging map represents a specific aging target for the associated skin marker by using a score value for the associated skin marker.

4. The computing device according to claim 1 or 2, wherein the aging map represents a specific aging target for the associated skin marker by using an apparent age value for the associated skin marker.

5. The computing device according to claim 1 or 2, wherein, when a score value for the associated skin marker is available, the aging map represents a specific aging target for the associated skin marker using the score value, and when the score value is unavailable, the aging map represents the specific aging target for the associated skin marker using an apparent age value.

6. The computing device according to claim 1 or 2, wherein the aging map is defined to represent the aging targets using pixel intensity.

7. The computing device according to claim 1 or 2, wherein the aging map masks out the background of the original image.

8. The computing device according to claim 1 or 2, wherein the generator is configured through training using respective training images and associated aging maps, wherein the associated aging maps provide weak spatial supervision to guide the aging transformation of each of the skin markers.

9. The computing device according to claim 1 or 2, wherein the skin markers represent specific dimensions of aging in the population.

10. The computing device according to claim 1 or 2, wherein the skin markers refer to one or more of the following: "age", "forehead wrinkles", "nasolabial folds", "under-eye wrinkles", "frown lines", "interocular wrinkles", "corner of the mouth wrinkles", "upper lip" and "lower facial sagging".

11. The computing device according to claim 1 or 2, wherein the specific training image is a high-resolution image and the patch size is a portion of the high-resolution image.

12. The computing device according to claim 11, wherein the patch size is half the size of the high-resolution image or smaller.

13. The computing device according to claim 1 or 2, wherein the generator is configured via an attention mechanism to limit the generator to transforming the age-related skin markers while minimizing any additional transformations applied.

14. The computing device according to claim 1 or 2, wherein the processing unit is configured to communicate with a second computing device that provides the generator for use, the processing unit transmitting the original image and receiving the transformed image.

15. The computing device according to claim 1 or 2, wherein the original image is a high-resolution image of 1024×1024 pixels or higher.

16. The computing device according to claim 1 or 2, wherein the processing unit is further configured to provide an augmented reality application that uses the transformed image to simulate aging.

17. The computing device according to claim 16, comprising a camera, wherein the processing unit receives the original image from the camera.

18. The computing device according to claim 1 or 2, wherein the processing unit is configured to provide at least one of the following: a recommendation function that recommends at least one of a product and a service; and an e-commerce function that allows the purchase of at least one of the product and the service.

19. The computing device according to claim 18, wherein the product includes one of the following: a recovery product, an anti-aging product, and a cosmetic product.

20. The computing device according to claim 18, wherein the service includes one of the following: a restoration service, an anti-aging service, and a cosmetic service.

21. A method for processing an image, comprising: defining an age simulation generator having continuous control over multiple age-related skin markers between an original image and a transformed image of a face, wherein the defining includes training the generator using individual, unpaired training images, each training image being associated with an aging target for at least some of the skin markers; and providing the generator for transforming images; wherein the generator is a fully convolutional encoder-decoder that includes residual blocks in the decoder to incorporate the aging targets in the form of an aging map; wherein the generator is configured using patch-based training, which uses a portion of a specific training image and a corresponding patch of an associated aging map; wherein the aging targets are provided to the generator as an aging map identifying regions of the face associated with respective ones of the skin markers, each region in the aging map being filled with the aging target corresponding to the associated skin marker; and wherein the residual blocks further incorporate location information indicating the respective location of the portion of the specific training image and of the corresponding patch of the associated aging map, the location information being provided using respective X and Y coordinate maps defined from horizontal and vertical gradient maps associated with the height and width (H × W) size of the original image.

22. The method according to claim 21, wherein the generator is based on a conditional GAN.

23. A computing device, comprising: a facial effects unit including processing circuitry configured to apply at least one facial effect to a source image and to generate a virtual instance of the source image with the applied effect on an interface; wherein the facial effects unit utilizes a generator to simulate aging with continuous control over multiple age-related skin markers between an original image and a transformed image of a face, the generator being configured to transform the original image using respective aging targets for the skin markers; wherein the generator is a fully convolutional encoder-decoder that includes residual blocks in the decoder to incorporate the aging targets in the form of an aging map; wherein the generator is configured using patch-based training, which uses a portion of a specific training image and a corresponding patch of an associated aging map; wherein the aging targets are provided to the generator as an aging map identifying regions of the face associated with respective ones of the skin markers, each region in the aging map being filled with the aging target corresponding to the associated skin marker; and wherein the residual blocks further incorporate location information indicating the respective location of the portion of the specific training image and of the corresponding patch of the associated aging map, the location information being provided using respective X and Y coordinate maps defined from horizontal and vertical gradient maps associated with the height and width (H × W) size of the original image.

24. The computing device according to claim 23, further comprising: a recommendation unit including processing circuitry configured to present recommendations for products and / or services and to receive selections of the products and / or services, wherein the products and / or services are associated with an aging target modifier for at least one of the skin markers; and wherein the facial effects unit is configured to generate, in response to the selection, the respective aging targets for the skin markers using the aging target modifier, thereby simulating the effect of the products and / or services on the source image.

25. The computing device according to claim 24, wherein the recommendation unit is configured to obtain the recommendations by: invoking a skin marker analyzer to determine a current skin marker score using the source image; and using the current skin marker score to determine the products and / or services.

26. The computing device according to claim 25, wherein the skin marker analyzer is configured to use a deep learning model to analyze the source image.

27. The computing device according to claim 25 or 26, wherein the aging targets are defined by the current skin marker score and the aging target modifier.

28. The computing device according to any one of claims 23 to 26, wherein the generator is based on a conditional GAN.

29. The computing device according to claim 23, wherein the aging map represents a specific aging target for the associated skin marker by using a score value for the associated skin marker.

30. The computing device according to claim 23, wherein the aging map represents a specific aging target for the associated skin marker by using an apparent age value for the associated skin marker.

31. The computing device according to claim 23, wherein, when a score value for the associated skin marker is available, the aging map represents a specific aging target for the associated skin marker using the score value, and when the score value is unavailable, the aging map represents the specific aging target for the associated skin marker using an apparent age value.

32. The computing device according to any one of claims 23 to 26, wherein the aging map is defined to represent the aging targets using pixel intensity.

33. The computing device according to any one of claims 23 to 26, wherein the aging map masks out the background of the source image.

34. The computing device according to any one of claims 23 to 26, wherein the skin markers refer to one or more of the following: "age", "forehead wrinkles", "nasolabial folds", "under-eye wrinkles", "frown lines", "interocular wrinkles", "corner of the mouth wrinkles", "upper lip" and "lower facial sagging".

35. The computing device according to any one of claims 23 to 26, wherein the original image is a high-resolution image of 1024×1024 pixels or higher.

36. The computing device according to any one of claims 23 to 26, comprising a camera, wherein the computing device is configured to generate the original image from the camera.

37. The computing device according to any one of claims 24 to 26, wherein the products include one of the following: recovery products, anti-aging products, and cosmetic products.

38. The computing device according to any one of claims 24 to 26, wherein the services include one of the following: restoration services, anti-aging services, and cosmetic services.

39. The computing device according to any one of claims 23 to 26, wherein the interface includes an e-commerce interface that enables the purchase of any of the products and services.