Three-dimensional face shape generation method, device, equipment and storage medium
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- GUANGZHOU SHIYUAN ELECTRONICS CO LTD
- Filing Date
- 2021-11-22
- Publication Date
- 2026-06-19
AI Technical Summary
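However, when constructing 3D expression templates, existing technologies usually build templates with the same expression deformation for every person, ignoring the differences in the same expression between different people. This lack of individualized representation affects the accurate fitting and driving of facial expressions.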
[Figure: abstract drawing, CN116152399B]
Description
Technical Field
[0001] This application belongs to the field of image processing technology, and specifically relates to a method, apparatus, device, and storage medium for generating three-dimensional face shapes.
Background Technology
[0002] With the development of technology, people's online social life is becoming increasingly rich, with applications such as live streaming, online education, video conferencing, and virtual reality emerging one after another. In order to enable people to interact better in the online world, virtualization technologies such as facial motion capture and driving have been vigorously developed. In particular, facial motion capture and driving technology based on images captured by ordinary cameras, such as those on portable mobile terminals like mobile phones and tablets, has received widespread attention because it does not require special equipment.
[0003] Facial motion capture and driving technology typically involves capturing facial expressions and head poses from 2D (two-dimensional) face images and transferring them to a 3D (three-dimensional) virtual character to drive the 3D face. However, when constructing 3D expression templates, existing technologies usually build templates with the same expression deformation for every person, ignoring the differences in the same expression between different people. This lack of individualized representation affects the accurate fitting and driving of facial expressions.
Summary of the Invention
[0004] This application proposes a method, apparatus, device, and storage medium for generating three-dimensional face shapes. It first obtains a rough three-dimensional face shape from an input two-dimensional face image using the first 3DMM parameters predicted by a first parameter estimation network model, and then obtains a fine three-dimensional face shape from the rough three-dimensional face shape using the second 3DMM parameters predicted by a second parameter estimation network model.
[0005] The first aspect of this application proposes a method for generating a three-dimensional human face shape, including:
[0006] The first 3DMM parameters corresponding to the input two-dimensional face image are calculated using the trained first parameter estimation network model.
[0007] The rough three-dimensional face shape corresponding to the two-dimensional face image is determined based on the first 3DMM parameters and the preset 3DMM model.
[0008] Based on the two-dimensional face image and the rough three-dimensional face shape, the second 3DMM parameters corresponding to the two-dimensional face image are calculated using a trained second parameter estimation network model.
[0009] The fine three-dimensional face shape corresponding to the two-dimensional face image is determined based on the second 3DMM parameters and the preset 3DMM model.
[0010] In some embodiments of this application, based on the two-dimensional face image and the rough three-dimensional face shape, a second 3DMM parameter corresponding to the two-dimensional face image is calculated using a trained second parameter estimation network model, including:
[0011] The three-dimensional expression deformation of the rough three-dimensional face shape relative to a preset standard face is determined, the preset standard face being selected from the preset standard face set of the 3DMM model based on the two-dimensional face image;
[0012] Based on the rough three-dimensional face shape and the three-dimensional expression deformation, the second 3DMM parameters corresponding to the two-dimensional face image are calculated using a trained second parameter estimation network model.
[0013] In some embodiments of this application, determining the three-dimensional expression deformation of the rough three-dimensional face shape relative to a preset standard face includes:
[0014] The rough three-dimensional face shape is mapped to the UV space by UV mapping to obtain the rough two-dimensional UV map corresponding to the rough three-dimensional face shape;
[0015] The two-dimensional expression deformation of the rough two-dimensional UV map relative to the two-dimensional UV map of the preset standard face is determined;
[0016] Based on the two-dimensional expression deformation, the three-dimensional expression deformation of the rough three-dimensional face shape relative to the preset standard face is determined by a mapping network.
[0017] In some embodiments of this application, after determining the two-dimensional expression deformation of the rough two-dimensional UV map relative to the two-dimensional UV map of the preset standard face, the method further includes:
[0018] Calculate the Euclidean distance of each vertex of the two-dimensional expression deformation, and form an attention mask for the two-dimensional expression deformation based on the Euclidean distance. The attention mask is greater than or equal to 0 and less than or equal to 1.
[0019] In some embodiments of this application, the first 3DMM parameters include identity coefficient, expression coefficient, texture coefficient, lighting coefficient, and pose coefficient; the second 3DMM parameters include expression coefficient, texture coefficient, lighting coefficient, and pose coefficient.
[0020] In some embodiments of this application, before calculating the first 3DMM parameters corresponding to the input two-dimensional face image through the first parameter estimation network model, the method further includes:
[0021] Obtain the first training set; the first training set includes multiple face sample images, and each face sample image corresponds to a set of coarse 3DMM parameters;
[0022] The first parameter estimation network model is trained based on the first training set.
[0023] In some embodiments of this application, training a first parameter estimation network model based on the first training set includes:
[0024] Each face sample image in the first training set is input into the first parameter estimation network model to obtain the 3DMM parameters corresponding to the face sample image;
[0025] The first parameter estimation network model is trained by a preset first loss function, so that the 3DMM parameters obtained based on the face sample image are equal to the corresponding coarse 3DMM parameters.
[0026] In some embodiments of this application, the preset first loss function is:
[0027] $L_{com} = \lambda_{pho} L_{pho} + \lambda_{per} L_{per} + \lambda_{lm} L_{lm} + \lambda_{reg} L_{reg} + \lambda_{sp} L_{sp}$
[0028] where $L_{pho}$, $L_{per}$, $L_{lm}$, $L_{reg}$ and $L_{sp}$ are the loss values calculated using the image reconstruction loss function, image perception loss function, keypoint reconstruction loss function, coefficient regularization loss function, and expression sparsity regularization loss function, respectively; $\lambda_{pho}$, $\lambda_{per}$, $\lambda_{lm}$, $\lambda_{reg}$ and $\lambda_{sp}$ are all greater than 0 and are the hyperparameters of the corresponding loss functions.
[0029] In some embodiments of this application, before calculating the second 3DMM parameters corresponding to the two-dimensional face image using a trained second parameter estimation network model based on the two-dimensional face image and the rough three-dimensional face shape, the method further includes:
[0030] Obtain a second training set, which includes multiple face sample images and the coarse 3D face sample shape and fine 3DMM parameters corresponding to each face sample image;
[0031] The second parameter estimation network model is trained based on the second training set.
[0032] In some embodiments of this application, training the second parameter estimation network model based on the second training set includes:
[0033] The 3D expression sample deformation of the rough 3D face sample shape corresponding to each face sample image in the second training set, relative to the preset standard face, is determined. The preset standard face is selected from the preset standard face set of the 3DMM model based on the 2D face image.
[0034] The second parameter estimation network model is trained by a preset second loss function, so that the 3DMM parameters obtained based on the face sample image and the corresponding coarse 3D face shape are equal to the corresponding fine 3DMM parameters.
[0035] In some embodiments of this application, the preset second loss function is:
[0036] $L = L_{com} + \lambda_{gra} L_{gra}$
[0037] where $L_{com}$ is the preset first loss function, $L_{gra}$ is the expression gradient loss function, and $\lambda_{gra} > 0$ is the hyperparameter of the expression gradient loss function.
[0038] In some embodiments of this application, the facial expression gradient loss function is:
[0039]
[0040] where $G_{a \to b}$ represents the gradient of the deformed 3D face image b with respect to the original 3D face image a.
[0041] In some embodiments of this application, determining the rough 3D face shape corresponding to the 2D face image based on the first 3DMM parameters and a preset 3DMM model includes:
[0042] Based on the first 3DMM parameters and the preset 3DMM model, a first set of three-dimensional expression templates corresponding to the two-dimensional face image is determined. The first set of three-dimensional expression templates includes multiple rough three-dimensional face shapes with different expressions.
[0043] Determining the fine 3D face shape corresponding to the 2D face image based on the second 3DMM parameters and the preset 3DMM model includes:
[0044] Based on the second 3DMM parameters and the preset 3DMM model, a second set of three-dimensional expression templates corresponding to the two-dimensional face image is determined. The second set of three-dimensional expression templates includes multiple fine three-dimensional face shapes with different expressions.
[0045] An embodiment of the second aspect of this application provides a three-dimensional face shape generation apparatus, comprising:
[0046] The first parameter calculation module is used to calculate the first 3DMM parameters corresponding to the input two-dimensional face image through the first parameter estimation network model.
[0047] The rough shape determination module is used to determine the rough three-dimensional face shape corresponding to the two-dimensional face image based on the first 3DMM parameters and the preset 3DMM model.
[0048] The second parameter calculation module is used to calculate the second 3DMM parameters corresponding to the two-dimensional face image based on the two-dimensional face image and the rough three-dimensional face shape through the second parameter estimation network model.
[0049] The fine shape determination module is used to determine the fine three-dimensional face shape corresponding to the two-dimensional face image based on the second 3DMM parameters and the preset 3DMM model.
[0050] An embodiment of the third aspect of this application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the method described in the first aspect above.
[0051] An embodiment of the fourth aspect of this application provides a computer-readable storage medium having a computer program stored thereon, the program being executed by a processor to implement the method described in the first aspect above.
[0052] The technical solutions provided in this application embodiment have at least the following technical effects or advantages:
[0053] The three-dimensional face shape generation method provided in this application first calculates the first 3DMM parameters corresponding to the input two-dimensional face image through a first parameter estimation network model, and determines the coarse three-dimensional face shape corresponding to the two-dimensional face image based on the first 3DMM parameters and a preset 3DMM model. Then, based on the two-dimensional face image and the coarse three-dimensional face shape, the second 3DMM parameters corresponding to the two-dimensional face image are calculated through a second parameter estimation network model, and the fine three-dimensional face shape corresponding to the two-dimensional face image is determined based on the second 3DMM parameters and the preset 3DMM model. In this way, personalized three-dimensional face shapes are generated through two stages from coarse to fine, which focuses on the construction of personalized 3D face shapes for different people and fully considers the specificity of each person under the same expression, thereby improving the accuracy of 3D face reconstruction and expression fitting and enhancing the effect of face-driven processing.
[0054] Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.
Description of the Drawings
[0055] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the scope of this application. Furthermore, the same reference numerals denote the same parts throughout the drawings.
[0056] In the drawings:
[0057] Figure 1 shows a flowchart of a three-dimensional face shape generation method according to an embodiment of this application;
[0058] Figure 2 shows a schematic diagram of the generation process of a rough facial expression template provided in an embodiment of this application;
[0059] Figure 3 shows a schematic diagram of the generation process of a fine facial expression template provided in an embodiment of this application;
[0060] Figure 4 shows a schematic structural diagram of a three-dimensional face shape generation apparatus provided in an embodiment of this application;
[0061] Figure 5 shows a schematic structural diagram of an electronic device according to an embodiment of this application;
[0062] Figure 6 shows a schematic diagram of a storage medium provided in an embodiment of this application.
Detailed Implementation
[0063] Exemplary embodiments of this application will now be described in more detail with reference to the accompanying drawings. While exemplary embodiments of this application are shown in the drawings, it should be understood that this application may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided to enable a more thorough understanding of this application and to fully convey the scope of this application to those skilled in the art.
[0064] It should be noted that, unless otherwise stated, the technical or scientific terms used in this application shall have the ordinary meaning as understood by one of ordinary skill in the art to which this application pertains.
[0065] The following description, in conjunction with the accompanying drawings, describes a method, apparatus, device, and storage medium for generating a three-dimensional human face shape according to embodiments of this application.
[0066] Currently, related technologies generally rely on 3D face generation techniques based on 3D facial deformation models (3DMM models) to capture facial expressions and head poses from 2D (two-dimensional) face images and transfer them to 3D virtual characters to drive 3D faces. However, when constructing 3D expression templates (3D face shapes), a 3D face shape with the same expression changes is usually constructed for each person, ignoring the differences in the same expression between different people. This lack of individualized representation affects the accurate fitting and driving of facial expressions.
[0067] Based on this, this application provides a method for generating three-dimensional face shapes, which can be applied to any server or other electronic device (such as a computer, tablet, or mobile phone) capable of image processing. The method is based on the relatively mature 3DMM (3D Morphable Model, a statistical model of three-dimensional face deformation). First, a coarse three-dimensional face shape corresponding to the input two-dimensional face image is determined using a trained first parameter estimation network model and a preset 3DMM model. Then, based on this coarse three-dimensional face shape and the input two-dimensional face image, a fine three-dimensional face shape is determined using a trained second parameter estimation network model and the preset 3DMM model. The method fully considers the differences in the same expression between different people and constructs different refined expression shapes for each person, reflecting the individuality of each person. When fitting the expression of a two-dimensional face image, the fine three-dimensional face shape generated by this method captures the expression of the target face more accurately, thereby improving the accuracy of face-driven processing.
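The coarse-to-fine control flow described above can be sketched as follows. This is a minimal illustration rather than the implementation of this application: the function name `generate_face_shapes` and the three callables (`coarse_net`, `fine_net`, `morphable_model`) are hypothetical stand-ins for the two trained parameter estimation networks and the preset 3DMM model.

```python
from typing import Any, Callable, Tuple


def generate_face_shapes(
    image: Any,
    coarse_net: Callable[[Any], Any],
    fine_net: Callable[[Any, Any], Any],
    morphable_model: Callable[[Any], Any],
) -> Tuple[Any, Any]:
    """Run the two-stage pipeline and return (coarse_shape, fine_shape)."""
    # Stage 1: first 3DMM parameters are estimated from the 2D image alone.
    first_params = coarse_net(image)
    coarse_shape = morphable_model(first_params)
    # Stage 2: second 3DMM parameters use the image plus the coarse shape.
    second_params = fine_net(image, coarse_shape)
    fine_shape = morphable_model(second_params)
    return coarse_shape, fine_shape


# Toy stand-ins (plain arithmetic) so the data flow can be exercised
# without trained networks; they carry no geometric meaning.
coarse, fine = generate_face_shapes(
    image=2.0,
    coarse_net=lambda img: img + 1.0,
    fine_net=lambda img, shape: shape + img,
    morphable_model=lambda params: params * 10.0,
)
```

The same preset 3DMM model is used in both stages; only the parameters fed into it change.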
[0068] A 3DMM model is a relatively basic statistical model for three-dimensional faces. It can reconstruct and generate a three-dimensional face shape based on an input two-dimensional face image. By adjusting the parameters of the 3DMM model (i.e., the 3DMM parameters), a three-dimensional face shape that most closely approximates the input two-dimensional face image can be obtained. For each person's two-dimensional face image, there is a corresponding set of 3DMM parameters under which the three-dimensional face shape generated by the 3DMM model is most similar to the two-dimensional face image.
[0069] A 3DMM model can be composed of a mesh, typically a triangular mesh. A triangular mesh consists of vertices in 3D space and the triangular faces between them. Each vertex, in addition to its position coordinates, can also contain information such as color and normals. 3DMM parameters can include, but are not limited to, identity coefficients, expression coefficients, texture (color and brightness) coefficients, lighting coefficients, and head pose coefficients. These can be understood as weighted values related to identity, expression, texture, lighting, and head pose in the 3DMM model. Each coefficient in a 3DMM model controls the local variations of the face.
[0070] In this embodiment, the 3DMM model can be represented by the following formulas (1) and (2). The identity basis $B_{id}$ and texture basis $B_{tex}$ are obtained by principal component analysis (PCA) of the Basel Face Model dataset. The expression basis $B_{exp}$ is taken from the FaceWarehouse dataset, whose expression templates carry clear semantic information (such as wide-eyed, closed-eye, frowning, and raised-eyebrow expressions): it consists of 46 expression template offsets (i.e., the deformation of the 3D face shape relative to the preset standard face shape of the 3DMM model) based on the Facial Action Coding System (FACS). The 3DMM model and the face shape generation method provided in this embodiment are described in detail below using these bases as an example.
[0071] $S(\alpha, \beta) = \bar{S} + B_{id}\alpha + B_{exp}\beta \quad (1)$
[0072] $T(\delta) = \bar{T} + B_{tex}\delta \quad (2)$
[0073] where $\bar{S} \in \mathbb{R}^{3n}$ and $\bar{T} \in \mathbb{R}^{3n}$ represent the vertex coordinates of the average face shape (i.e., the preset standard face shape) and the texture pixel values of the 3DMM model, respectively; n is the number of vertices of the 3D face, and $\mathbb{R}^{3n}$ denotes a vector containing n three-dimensional coordinates. $S(\alpha, \beta)$ represents the three-dimensional coordinates of the vertices of the 3D face, and $T(\delta)$ represents the RGB vertex color values of the 3D face. $B_{id}$, $B_{exp}$ and $B_{tex}$ represent the identity basis, expression basis, and texture basis, respectively (the expression basis has 46 columns, the number of FACS-based expression templates in the FaceWarehouse dataset, and the texture basis has 80 columns, the color dimension). $\alpha$, $\beta$ and $\delta$ are the corresponding 3DMM coefficients.
[0074] The 3DMM coefficients α, β, and δ predicted by the parameter estimation network can be combined with the 3DMM bases through formulas (1) and (2) above to reconstruct the shape and texture of the 3D face.
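As an illustration of how the coefficients combine with the bases, the following NumPy sketch assumes the usual linear 3DMM form (mean plus basis times coefficient). The function name `reconstruct_face`, the toy vertex count, the random bases, and the identity dimension of 80 are assumptions made only for this example; only the 46 expression templates and the 80-dimensional texture basis come from the description.

```python
import numpy as np


def reconstruct_face(mean_shape, mean_tex, B_id, B_exp, B_tex, alpha, beta, delta):
    """Linear 3DMM: returns (n, 3) vertex positions and (n, 3) vertex colors."""
    # Shape: average face plus identity and expression offsets.
    shape = mean_shape + B_id @ alpha + B_exp @ beta
    # Texture: average texture plus texture offset.
    texture = mean_tex + B_tex @ delta
    return shape.reshape(-1, 3), texture.reshape(-1, 3)


# Toy data with random bases (real bases would come from a face dataset).
rng = np.random.default_rng(0)
n = 4  # number of vertices in this toy mesh
mean_shape = rng.normal(size=3 * n)
mean_tex = rng.uniform(size=3 * n)
B_id = rng.normal(size=(3 * n, 80))   # identity dimension 80 is an assumption
B_exp = rng.normal(size=(3 * n, 46))  # 46 expression templates
B_tex = rng.normal(size=(3 * n, 80))  # 80-dimensional texture basis

# With all coefficients at zero the model returns the average face.
vertices, colors = reconstruct_face(
    mean_shape, mean_tex, B_id, B_exp, B_tex,
    np.zeros(80), np.zeros(46), np.zeros(80),
)
```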
[0075] Because the expression basis $B_{exp}$ used in this embodiment consists of the offsets of 46 3D expression templates from the FaceWarehouse model (i.e., the deformation of each expression template relative to the expressionless face template), after the parameter estimation network predicts the identity coefficient α from the input face image, formula (1) can be rewritten as:
[0076] $S(\beta) = B_0 + B_{exp}\beta \quad (3)$
[0077] where $B_0$ (the average face shape plus the identity term $B_{id}\alpha$) represents the preset standard face reconstructed for the input face image, i.e., a 3D face without expression. Furthermore, formula (3) can be rewritten as:
[0078] $S(\beta) = B_0 + \sum_{i=1}^{46} \beta_i (B_i - B_0) \quad (4)$
[0079] where $B_i$ represents the i-th 3D expression template.
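The equivalence between the basis form and the template form can be checked numerically. The sketch below assumes that each column of the expression basis is the offset of one expression template from the neutral face; the function names and the random toy data are hypothetical.

```python
import numpy as np


def expression_templates(neutral, B_exp):
    """Template i is the neutral face plus the i-th expression offset."""
    return neutral[:, None] + B_exp  # shape (3n, m)


def blend_expression(neutral, templates, beta):
    """Blend templates by weighting their offsets from the neutral face."""
    offsets = templates - neutral[:, None]
    return neutral + offsets @ beta


rng = np.random.default_rng(1)
neutral = rng.normal(size=12)        # toy neutral face with 4 vertices
B_exp = rng.normal(size=(12, 46))    # 46 expression offsets
beta = rng.uniform(size=46)          # expression coefficients

templates = expression_templates(neutral, B_exp)
shape_from_templates = blend_expression(neutral, templates, beta)
shape_from_basis = neutral + B_exp @ beta
```

Both routes give the same shape, which shows why a shared expression basis applies identical deformations to every person's neutral face.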
[0080] As can be seen from the above formula for constructing 3D expression templates, for any face image, the deformation of its 3D expression template relative to a neutral 3D face is the same. However, in reality, the same expression template for different people will have certain differences, and the expression template constructed using this method cannot model these differences, resulting in some loss of accuracy when fitting facial expressions. Therefore, in this embodiment, before calculating the first 3DMM parameters corresponding to the input 2D face image through the first parameter estimation network model, the first parameter estimation network model is trained first. Before calculating the second 3DMM parameters corresponding to the 2D face image through the trained second parameter estimation network model, the second parameter estimation network model is trained first. This improves the accuracy of the first and second parameter estimation network models in predicting 3DMM parameters.
[0081] In one embodiment, the operation of training the first parameter estimation network model described above may specifically include the following steps:
[0082] Step A1: Obtain the first training set; the first training set includes multiple face sample images, and each face sample image corresponds to a set of coarse 3DMM parameters.
[0083] Several face images can be selected directly from existing image datasets (such as the CelebA celebrity dataset, the Public Figures Face Database from Columbia University, the color FERET Database, the MTFL dataset, the VoxCeleb2 celebrity interview video dataset, etc.), or multiple face images can be captured as needed. Experiments can be conducted to obtain, for each selected face image, a set of 3DMM parameters that generates the 3D face shape most similar to that image. The selected face images are used as face sample images, and the set of 3DMM parameters corresponding to each face sample image is used as the coarse 3DMM parameters for that face sample image. Each face sample image and its coarse 3DMM parameters form a sample pair, and multiple such sample pairs form the first training set, which is used to train the first parameter estimation network model so that the 3DMM coefficients it predicts are more accurate.
[0084] Step A2: Train the first parameter estimation network model based on the first training set.
[0085] The above sample pairs are selected from the first training set. The number of sample pairs obtained from the training set in each training cycle can be multiple. The face sample images from the sample pairs are input into the first parameter estimation network model. The first parameter estimation network model can be any model that can estimate the 3DMM parameters corresponding to the input face image through deep learning.
[0086] In one embodiment, step A2 may specifically include the following steps: Step A21, inputting each face sample image in the first training set into the first parameter estimation network model to obtain the 3DMM parameters corresponding to the face sample images. Step A22, training the first parameter estimation network model using a preset first loss function, so that the 3DMM parameters corresponding to the face sample images are equal to the corresponding coarse 3DMM parameters.
[0087] Specifically, the aforementioned preset first loss function is:
[0088] $L_{com} = \lambda_{pho} L_{pho} + \lambda_{per} L_{per} + \lambda_{lm} L_{lm} + \lambda_{reg} L_{reg} + \lambda_{sp} L_{sp} \quad (5)$
[0089] where $L_{pho}$, $L_{per}$, $L_{lm}$, $L_{reg}$ and $L_{sp}$ are the loss values calculated using the image reconstruction loss function, image perception loss function, keypoint reconstruction loss function, coefficient regularization loss function, and expression sparsity regularization loss function, respectively; $\lambda_{pho}$, $\lambda_{per}$, $\lambda_{lm}$, $\lambda_{reg}$ and $\lambda_{sp}$ are all greater than 0 and are the hyperparameters of the corresponding loss functions. In image reconstruction, the image reconstruction loss usually has a larger impact on each face image than the other losses; correspondingly, $\lambda_{pho}$ can be relatively large, significantly larger than the other parameters, for example any value between 1 and 10. The regularization loss has a relatively small impact on each face image compared with the other losses; correspondingly, $\lambda_{reg}$ can be relatively small, significantly smaller than the other parameters, for example $10^{-k}$ with k greater than or equal to 3. $\lambda_{lm}$ and $\lambda_{sp}$ can take values between 0 and 1. For example, $\lambda_{pho}$, $\lambda_{per}$, $\lambda_{lm}$, $\lambda_{reg}$ and $\lambda_{sp}$ can be set to 1.9, 0.2, 0.1, 0.0001, and 0.1, respectively.
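The weighted sum in formula (5) can be sketched as follows. The sketch assumes that the five example values listed above map, in order, to the weights for the image reconstruction, image perception, keypoint, regularization, and sparsity terms; the dictionary keys and the function name `combined_loss` are hypothetical.

```python
# Example weights, assumed to be listed in the order pho, per, lm, reg, sp.
LOSS_WEIGHTS = {"pho": 1.9, "per": 0.2, "lm": 0.1, "reg": 1e-4, "sp": 0.1}


def combined_loss(losses, weights=LOSS_WEIGHTS):
    """Return the weighted sum of the individual loss values."""
    missing = set(weights) - set(losses)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * losses[name] for name in weights)


# With every individual loss equal to 1.0 the result is the sum of the weights.
total = combined_loss({"pho": 1.0, "per": 1.0, "lm": 1.0, "reg": 1.0, "sp": 1.0})
```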
[0090] This embodiment uses the image reconstruction loss to calculate the pixel error between the input image I and the corresponding rendered image. The image reconstruction loss function is shown in the following formula (6):
[0091]
[0092] where i is the pixel index and ranges over the face rendering region of the 3D face in the image. A is the face mask detected by an existing face segmentation algorithm; it can be understood as the probability that the current pixel lies on a face, taking the value 1 where the pixel is facial skin and 0 otherwise.
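A masked pixel error of this kind might be computed as in the sketch below. The exact norm and normalization of formula (6) are not recoverable from the text here, so the per-pixel Euclidean norm and the division by the total mask weight are assumptions, as are the function and argument names.

```python
import numpy as np


def photometric_loss(image, rendered, skin_mask, render_region):
    """Mask-weighted mean pixel error.

    image, rendered: (H, W, 3) float arrays.
    skin_mask: (H, W) values in [0, 1] from a face segmentation step.
    render_region: (H, W) boolean array, True where the 3D face was rendered.
    """
    weights = skin_mask * render_region
    # Per-pixel Euclidean norm over the color channels (an assumption).
    per_pixel = np.linalg.norm(image - rendered, axis=-1)
    total = weights.sum()
    if total == 0:
        return 0.0
    return float((weights * per_pixel).sum() / total)


# Toy check: every pixel differs by 1 in each of the three channels.
image = np.zeros((2, 2, 3))
rendered = np.ones((2, 2, 3))
full_mask = np.ones((2, 2))
region = np.ones((2, 2), dtype=bool)
loss_value = photometric_loss(image, rendered, full_mask, region)
```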
[0093] This embodiment uses a pre-trained face recognition network to extract deep features from the input image I and from the corresponding rendered image, and calculates the similarity between the two feature vectors using the cosine distance; this is the image perception loss. The loss is defined as shown in the following formula (7):
[0094]
[0095] Where f(·) represents the deep features extracted from the face recognition network, and <·,·> represents the vector inner product.
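A cosine-distance loss between two feature vectors can be sketched as follows. The feature extractor itself is not modeled; `feat_input` and `feat_rendered` are hypothetical names for the deep features of the input image and the rendered image, and the small `eps` term is an assumption added to avoid division by zero.

```python
import numpy as np


def perceptual_loss(feat_input, feat_rendered, eps=1e-8):
    """One minus the cosine similarity of two feature vectors."""
    inner = np.dot(feat_input, feat_rendered)
    norms = np.linalg.norm(feat_input) * np.linalg.norm(feat_rendered)
    return float(1.0 - inner / (norms + eps))


same = perceptual_loss(np.array([1.0, 0.0]), np.array([1.0, 0.0]))
orthogonal = perceptual_loss(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

Identical features give a loss close to 0, and orthogonal features give a loss of 1.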
[0096] The keypoint reconstruction loss is defined as the mean square error between the real face keypoints Q detected by a 3D keypoint detector and the keypoints projected from the 3D face. The keypoints of the 3D face refer to the key regions of the face, including the eyebrows, eyes, nose, mouth, and facial contour; they can be obtained through vertex indexing and projected onto the 2D image plane through a projection model. The loss is defined as shown in the following formula (8):
[0097]
[0098] where i is the keypoint index; n is the number of facial keypoints, which is determined by the keypoint detector and can be 68, 81, 106, etc.; and $\omega_i$ is the weight of the i-th keypoint. In this scheme, the weight of the facial contour keypoints is set to 1, and the weights of the other facial keypoints are set to natural numbers greater than 1, such as 10.
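A weighted mean square error over keypoints can be sketched as below. The function name and the toy coordinates are hypothetical; the weights follow the example above, with 1 for a contour keypoint and 10 for the others.

```python
import numpy as np


def landmark_loss(detected, projected, weights):
    """Weighted mean of squared distances between matching 2D keypoints.

    detected, projected: (n, 2) arrays; weights: (n,) array.
    """
    squared = np.sum((detected - projected) ** 2, axis=1)
    return float(np.mean(weights * squared))


detected = np.zeros((3, 2))
projected = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
weights = np.array([1.0, 10.0, 10.0])  # first keypoint treated as a contour point
lm_value = landmark_loss(detected, projected, weights)
```

Squared distances are 1, 4, and 0, so the weighted mean is (1 + 40 + 0) / 3.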
[0099] To prevent the shape and texture of 3D faces from degrading, resulting in meaningless 3D faces, this embodiment applies a regularized loss constraint to the coefficients of face shape and texture. This constraint is defined as shown in the following formula (9):
[0100]
[0101] where $\lambda_\alpha$ and $\lambda_\delta$ are the hyperparameters for the coefficients α and δ, set to 1 and 0.001, respectively.
[0102] Furthermore, this embodiment also uses regularization loss to promote the sparse representation of expression coefficients, which is defined as shown in the following formula (10):
[0103]
[0104] where m is the number of expression templates (e.g., m = 46), i is the index of the expression template, $\lambda_\alpha$ is the hyperparameter for the coefficient α, and $\beta_i$ is the expression coefficient of the i-th 3D expression template.
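The two regularization terms can be sketched as follows. The exact forms of formulas (9) and (10) are not recoverable from the text here, so this sketch assumes a squared L2 penalty on the identity and texture coefficients and an L1 penalty on the expression coefficients, which is a common way to encourage sparsity. Function names are hypothetical.

```python
import numpy as np


def coefficient_regularization(alpha, delta, lambda_alpha=1.0, lambda_delta=1e-3):
    """Squared L2 penalty on identity (alpha) and texture (delta) coefficients."""
    return float(lambda_alpha * np.sum(alpha ** 2) + lambda_delta * np.sum(delta ** 2))


def expression_sparsity(beta):
    """L1 penalty on the expression coefficients (assumed form)."""
    return float(np.sum(np.abs(beta)))


reg_value = coefficient_regularization(np.array([1.0, 2.0]), np.array([10.0]))
sp_value = expression_sparsity(np.array([0.5, -0.25, 0.0]))
```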
[0105] In this embodiment, to train the parameter estimation network, a textured 3D face is rendered onto the image plane. This process incorporates the illumination coefficient γ and pose coefficient p of a 3DMM model. The reconstructed 3D face texture T is further processed using a spherical harmonics lighting model (incorporating the illumination coefficient γ) to model the ambient lighting of the face image. To project the 3D face onto the image plane, this embodiment employs a perspective projection camera model (incorporating the pose coefficient p). Finally, the illuminated 3D face is rendered onto a 2D image using a projection model to obtain a rendered image on the image plane, which is then used in the network training.
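The projection step can be illustrated with a basic pinhole camera. This is a sketch under stated assumptions: the pose coefficient p is taken to decompose into a rotation matrix and a translation vector, and the focal length and image center are hypothetical parameters not given in the text. Lighting is not modeled here.

```python
import numpy as np


def project_perspective(vertices, rotation, translation, focal, center):
    """Project (n, 3) vertices to (n, 2) image coordinates with a pinhole camera."""
    # Rigid transform into camera space using the assumed pose decomposition.
    cam = vertices @ rotation.T + translation
    # Perspective divide by depth, then shift to the image center.
    x = focal * cam[:, 0] / cam[:, 2] + center[0]
    y = focal * cam[:, 1] / cam[:, 2] + center[1]
    return np.stack([x, y], axis=1)


points_2d = project_perspective(
    vertices=np.array([[1.0, 2.0, 0.0]]),
    rotation=np.eye(3),
    translation=np.array([0.0, 0.0, 10.0]),
    focal=100.0,
    center=(50.0, 50.0),
)
```

With depth 10 and focal length 100, the vertex (1, 2, 0) lands at pixel (60, 70).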
[0106] In another embodiment, the operation of training the second parameter estimation network model described above may specifically include the following steps:
[0107] Step B1: Obtain the second training set, which includes multiple face sample images and the rough 3D face sample shape and fine 3DMM parameters corresponding to each face sample image.
[0108] The method for obtaining the face sample images and corresponding fine 3DMM parameters in the second training set can be the same as that for the first training set, and will not be repeated here. The coarse 3D face sample shape in the second training set is the 3D face shape output by the 3DMM model for the selected face sample image, with the 3DMM parameters corresponding to that image used as the model's input coefficients. The selected face sample images, the corresponding fine 3DMM parameters, and the coarse 3D face sample shapes are then combined into sample groups. Multiple sample groups constitute the second training set, which is used to train the second parameter estimation network model so that the fine 3DMM parameters it predicts are more accurate.
[0109] Step B2: Train the second parameter estimation network model based on the second training set.
[0110] The above sample groups are selected from the second training set. The number of sample groups obtained from the training set in each training cycle can be multiple. The face sample images and coarse 3D face sample shapes from the sample groups are input into the second parameter estimation network model. The second parameter estimation network model can be any model that can estimate the fine 3DMM parameters corresponding to the input face images and coarse 3D face sample shapes through deep learning.
[0111] In another embodiment, step B2 above may specifically include the following steps:
[0112] Step B21: Determine the 3D expression sample deformation of the rough 3D face sample shape corresponding to each face sample image in the second training set relative to the preset standard face. The preset standard face is selected from the preset standard face set of the 3DMM model based on the 2D face image.
[0113] In this embodiment, to generate refined 3D face shapes, personalized expression variations are learned for different individuals based on the coarse 3D face shapes. Specifically, firstly, the coarse 3D face sample shape is mapped to UV space through UV mapping to obtain a 2D sample UV map corresponding to the coarse 3D face sample shape. Then, the 2D expression sample deformation of the 2D sample UV map relative to the 2D UV map of a preset standard face is determined, and based on the 2D expression sample deformation, the 3D expression sample deformation of the coarse 3D face sample shape relative to the preset standard face is determined through a mapping network.
[0114] Specifically, after the 2D expression sample deformation of the 2D sample UV map relative to the 2D UV map of the preset standard face has been determined, the per-vertex Euclidean distance of the deformation $B_i - B_0$ of the rough 3D face shape relative to the preset standard face can also be calculated. A threshold of 0.001 is set, and vertex positions whose distance falls below this threshold are set to 0. The deformation values are then normalized to the range 0 to 1 and used as the attention mask $A_i$ ($0 \le A_i \le 1$). The attention mask $A_i$ reflects the importance of each local region for the 2D expression deformation, so the personalized 2D expression deformation to be learned can be constrained to local regions similar to those of the coarse 3D shape. Therefore, the attention mask $A_i$ can be applied to the personalized 2D expression deformation $\Delta_i$. Letting F be the function that maps from UV space to 3D space, the 3D expression sample deformation can be expressed as $F(A_i \Delta_i)$.
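The construction of the attention mask can be sketched as follows. This is only an illustration: it assumes that the normalization to the range 0 to 1 is a division by the largest per-vertex distance, which the text does not specify, and the function name is hypothetical.

```python
import numpy as np

def attention_mask(B_i, B_0, threshold=0.001):
    """Per-vertex attention mask from the deformation B_i - B_0, both of shape (N, 3)."""
    dist = np.linalg.norm(B_i - B_0, axis=1)        # Euclidean distance of each vertex
    dist = np.where(dist < threshold, 0.0, dist)    # suppress deformations below the threshold
    peak = dist.max()
    # Assumed normalization: divide by the maximum so that values lie in [0, 1].
    return dist / peak if peak > 0 else dist
```

Vertices that barely move in the coarse template receive a mask value of 0, so any personalized deformation learned for them is switched off.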
[0115] Step B22: Train the second parameter estimation network model by using a preset second loss function, so that the 3DMM parameters obtained based on the face sample image and the corresponding coarse 3D face shape are equal to the corresponding fine 3DMM parameters.
[0116] To better utilize neural networks for training, the coarse 3D face shape can first be mapped to a 2D UV space for learning, and then mapped back to 3D space from the UV space. The function for mapping from UV space to 3D space can be defined as F. The refined 3D face shape is then represented by the following formula (9):
[0117] $B'_i = B_i + F(A_i \Delta_i)$ (9)
[0118] Here, $F(A_i \Delta_i)$ denotes the 3D expression sample deformation of the i-th face image. According to formula (9), the fine 3D face shape $B'_i = B_i + F(A_i \Delta_i)$ adds personalized expression deformation to the rough 3D face shape $B_i$.
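As a hedged illustration of the relation $B'_i = B_i + F(A_i \Delta_i)$, the sketch below replaces the learned mapping F with a simple nearest-pixel lookup of the UV map at each vertex's UV coordinate. That lookup, the array layouts and the function names are assumptions made for the example and do not represent the mapping network of this embodiment.

```python
import numpy as np

def uv_to_3d(uv_map, vert_uv):
    """Stand-in for F: read per-vertex offsets from an (H, W, 3) UV map by nearest-pixel lookup.

    vert_uv holds per-vertex (u, v) coordinates in [0, 1], shape (N, 2).
    """
    h, w, _ = uv_map.shape
    cols = np.clip(np.rint(vert_uv[:, 0] * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip(np.rint(vert_uv[:, 1] * (h - 1)).astype(int), 0, h - 1)
    return uv_map[rows, cols]                       # shape (N, 3)

def refine_template(B_i, A_i, delta_i, vert_uv):
    """Fine template B'_i = B_i + F(A_i * delta_i), with A_i (H, W) and delta_i (H, W, 3)."""
    masked = A_i[..., None] * delta_i               # attention mask applied element-wise
    return B_i + uv_to_3d(masked, vert_uv)
```

Where the mask is 0 the fine template coincides with the coarse one, which is how the mask confines the personalized deformation to local regions.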
[0119] In the process of calculating fine 3DMM parameters, compared with the stage of calculating coarse 3DMM parameters, the formulations of the other 3DMM components (such as the texture model, lighting model, and projection model) are the same; only the shape representation of the 3D face differs.
[0120] In the fine 3D face shape generation stage, the training method of the coarse 3D face shape generation stage can also be adopted. By performing a self-supervised training paradigm in the 2D image space, the training of the second parameter estimation network and the mapping network can be supervised.
[0121] Specifically, to prevent the addition of personalized expressions from altering the semantic information of the original expression template, an additional expression template gradient loss is introduced. Therefore, the definition of the aforementioned preset second loss function can be shown in the following formula (10):
[0122] $L = L_{com} + \lambda_{gra} L_{gra}$ (10)
[0123] Here, $L_{com}$ is the preset first loss function, $L_{gra}$ is the expression gradient loss function, and $\lambda_{gra} > 0$ is the hyperparameter of the expression gradient loss function.
[0124] If free deformation is added directly to a rough 3D face shape, the semantics of the facial expression may still change even after an attention mask has confined the deformation to a local region. For facial expression driving, the facial expressions of different people should carry the same semantic information. Therefore, the expression template gradient loss can be used to keep the gradient of the deformed fine 3D face shape close to the gradient of the rough 3D face shape. The expression gradient loss function is defined as shown in the following formula (11):
[0125]
[0126] Here, $G_{a \to b}$ denotes the gradient of the fine 3D face b with respect to the coarse 3D face a, both corresponding to the same face image.
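The way the two loss terms are combined in formula (10) can be sketched as below. The gradient term in this sketch is only an assumed proxy and is not the definition given by formula (11): it compares mesh edge vectors of the fine and coarse shapes, which captures the stated goal of keeping the gradient of the fine shape close to that of the coarse shape. The function names and the edge-list representation are hypothetical.

```python
import numpy as np

def edge_gradient_loss(fine, coarse, edges):
    """Assumed proxy for L_gra: mean squared difference between corresponding edge vectors.

    fine, coarse: (N, 3) vertex arrays; edges: (E, 2) integer vertex index pairs.
    """
    e_fine = fine[edges[:, 1]] - fine[edges[:, 0]]
    e_coarse = coarse[edges[:, 1]] - coarse[edges[:, 0]]
    return float(np.mean(np.sum((e_fine - e_coarse) ** 2, axis=1)))

def second_loss(l_com, l_gra, lam_gra):
    """Formula (10): L = L_com + lambda_gra * L_gra, with lambda_gra > 0."""
    if lam_gra <= 0:
        raise ValueError("lam_gra must be greater than 0")
    return l_com + lam_gra * l_gra
```

Under this proxy a rigid translation of the whole fine shape costs nothing, while local stretching of the mesh is penalized.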
[0127] It should be noted that since the second parameter estimation network model is trained based on the rough face shape obtained by the first parameter estimation network model, it can be determined that the identity coefficient predicted by the second parameter estimation network model is the same as that predicted by the first parameter estimation network model. Therefore, during the training of the second parameter estimation network model, only the expression coefficient, texture (color brightness) coefficient, illumination coefficient, and head pose coefficient need to be trained.
[0128] After the first and second parameter estimation network models have been trained as described above, as shown in Figure 1, the three-dimensional face shape is generated using the first parameter estimation network model and the second parameter estimation network model through the following steps:
[0129] Step S1: Calculate the first 3DMM parameters corresponding to the input two-dimensional face image using the trained first parameter estimation network model.
[0130] The execution entity of this 3D face shape generation method can be a server. The server receives any 2D face image input from the terminal and can calculate the first 3DMM parameters corresponding to the input 2D face image through the first parameter estimation network model trained above. The specific calculation process can refer to the training process of the first parameter estimation network model mentioned above, and will not be repeated here.
[0131] Step S2: Determine the rough three-dimensional face shape corresponding to the two-dimensional face image based on the first 3DMM parameters and the preset 3DMM model.
[0132] After calculating the first 3DMM parameter, the server can assign the calculated first 3DMM parameter to a preset 3DMM model to form a specific coarse 3DMM model corresponding to the input two-dimensional face image. By inputting the two-dimensional face image into the specific coarse 3DMM model, the server can output the coarse three-dimensional face shape corresponding to the two-dimensional face image.
[0133] Step S3: Based on the two-dimensional face image and the rough three-dimensional face shape, calculate the second 3DMM parameters corresponding to the two-dimensional face image using the trained second parameter estimation network model.
[0134] After generating a rough 3D face shape, the server can first determine the 3D expression deformation of the rough 3D face shape relative to the preset standard face. The preset standard face is selected from the preset standard face set of the 3DMM model based on the 2D face image.
[0135] Specifically, to ease the learning of personalized 3D expression deformation, the coarse 3D face shape can be mapped to a 2D UV space using UV mapping, and a convolutional neural network can then be used to learn the fine 3D expression deformation in the UV space. That is, a coarse 2D UV map corresponding to the coarse 3D face shape is first obtained. Then, the 2D expression deformation of the coarse 2D UV map relative to the 2D UV map of a preset standard face is determined. Based on the 2D expression deformation, the 3D expression deformation of the coarse 3D face shape relative to the preset standard face is determined through a mapping network.
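The UV-space step can be sketched as follows, assuming that each vertex carries a fixed (u, v) coordinate in [0, 1] and that the UV map stores 3D positions at the nearest pixel. A practical implementation would rasterize triangles and interpolate between vertices; the function names and the map size are hypothetical.

```python
import numpy as np

def to_uv_map(vertices, vert_uv, size):
    """Scatter (N, 3) vertex positions into a (size, size, 3) UV position map (nearest pixel)."""
    uv_map = np.zeros((size, size, 3))
    cols = np.clip(np.rint(vert_uv[:, 0] * (size - 1)).astype(int), 0, size - 1)
    rows = np.clip(np.rint(vert_uv[:, 1] * (size - 1)).astype(int), 0, size - 1)
    uv_map[rows, cols] = vertices
    return uv_map

def expression_deformation_2d(coarse_vertices, standard_vertices, vert_uv, size=256):
    """2D expression deformation: coarse UV map minus the UV map of the standard face."""
    return to_uv_map(coarse_vertices, vert_uv, size) - to_uv_map(standard_vertices, vert_uv, size)
```

The resulting image-like array is what a convolutional network could consume in place of the raw mesh.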
[0136] In one embodiment, after determining the 2D expression deformation of the coarse 2D UV map relative to the 2D UV map of a preset standard face, the method may further include: calculating the Euclidean distance of each vertex of the 2D expression deformation, and forming an attention mask for the 2D expression deformation based on the Euclidean distance, wherein the attention mask is greater than or equal to 0 and less than or equal to 1.
[0137] It should be noted that the above method for determining three-dimensional facial expression deformation through UV space is only a preferred implementation of this embodiment. This embodiment is not limited to this. For example, other 3D networks can also be used to learn personalized facial expression template deformation in 3D space.
[0138] After the server determines the 3D expression deformation of the coarse 3D face shape relative to the preset standard face, it can calculate the second 3DMM parameters corresponding to the 2D face image based on the coarse 3D face shape and the 3D expression deformation, using the trained second parameter estimation network model. The specific calculation process can refer to the training process of the second parameter estimation network model described above, and will not be repeated here.
[0139] Step S4: Determine the fine three-dimensional face shape corresponding to the two-dimensional face image based on the second 3DMM parameters and the preset 3DMM model.
[0140] After calculating the second 3DMM parameters, the server can assign the calculated second 3DMM parameters to a preset 3DMM model to form a specific fine 3DMM model corresponding to the input two-dimensional face image. By inputting the two-dimensional face image into the specific fine 3DMM model, the server can output the fine three-dimensional face shape corresponding to the two-dimensional face image.
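Steps S1 to S4 can be summarized in a short sketch. The two networks and the 3DMM model are passed in as plain callables, and the parameter dictionaries and their keys are hypothetical placeholders rather than interfaces defined by this application. The identity coefficient from the first stage is reused in the second stage, in line with the note that the second network does not re-estimate it.

```python
from typing import Callable, Dict, Tuple
import numpy as np

Params = Dict[str, np.ndarray]

def generate_face_shapes(
    image: np.ndarray,
    first_net: Callable[[np.ndarray], Params],
    second_net: Callable[[np.ndarray, np.ndarray], Params],
    morphable_model: Callable[[Params], np.ndarray],
) -> Tuple[np.ndarray, np.ndarray]:
    """Two-stage, coarse-to-fine generation following steps S1 to S4."""
    first_params = first_net(image)                      # S1: first 3DMM parameters
    coarse_shape = morphable_model(first_params)         # S2: coarse 3D face shape
    second_params = second_net(image, coarse_shape)      # S3: second 3DMM parameters
    # Assumption: the identity coefficient is carried over from the first stage.
    fine_params = {**second_params, "identity": first_params["identity"]}
    fine_shape = morphable_model(fine_params)            # S4: fine 3D face shape
    return coarse_shape, fine_shape
```

Keeping the networks and the model as injected callables makes it possible to exercise the control flow with simple stubs before any trained model is available.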
[0141] In one embodiment, the 3D face shape generation method can be applied to face-driven technology. The process of generating the 3D face shape can be the process of generating expression templates in face-driven technology. Accordingly, the method may further include: determining a first set of 3D expression templates corresponding to a 2D face image based on first 3DMM parameters and a preset 3DMM model, the first set of 3D expression templates including multiple coarse 3D face shapes with different expressions; and determining a second set of 3D expression templates corresponding to a 2D face image based on second 3DMM parameters and a preset 3DMM model, the second set of 3D expression templates including multiple fine 3D face shapes with different expressions. By generating expression templates through this method, different expression templates are generated for different people, fully considering the specificity of each person under the same expression, which can improve the accuracy of 3D face reconstruction and expression fitting, thereby enhancing the effect of face-driven technology and making the face-driven process more sensitive.
[0142] It should be noted that the use of the FaceWarehouse dataset, Basel Face Model dataset, CelebA celebrity dataset, and Voxceleb2 celebrity interview video dataset in the above embodiments is only for describing the three-dimensional face shape generation method in detail, and is not a limitation of this embodiment. Other datasets can also be used in the specific implementation of the three-dimensional face shape generation method, as long as they can achieve the three-dimensional face shape generation method.
[0143] To facilitate understanding of the methods provided in the embodiments of this application, the following description is provided in conjunction with the accompanying drawings. As shown in Figure 2 and Figure 3, after receiving the input 2D face image, the server can predict the coarse 3DMM coefficients (including identity coefficient α, expression coefficient β, texture coefficient δ, illumination coefficient γ, and head pose coefficient p) of the 2D face image through the trained first parameter estimation network. Then, based on the coarse 3DMM coefficients and the preset 3DMM model, 46 coarse 3D expression templates corresponding to the 2D face image can be generated, and the coarse 3D face shape of the 2D face image can be reconstructed (the reconstructed 3D face is rendered onto the input 2D face image, and the training of the first parameter estimation network can be supervised through multiple self-supervised training loss functions). Then, both the coarse 3D expression templates and the original input 2D face image can be input into the second parameter estimation network to predict the fine 3DMM coefficients of the 2D face image (including expression coefficient β, texture coefficient δ, illumination coefficient γ, and head pose coefficient p). Based on the fine 3DMM coefficients and the preset 3DMM model, 46 fine 3D expression templates corresponding to the 2D face image can be generated, and the fine 3D face shape of the 2D face image can be reconstructed (the reconstructed 3D face is rendered onto the input 2D face image, and the training of the second parameter estimation network can be supervised through multiple self-supervised training loss functions; the aforementioned fine 3D expression templates can be used to train the mapping network).
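How a set of 46 expression templates and an expression coefficient β might be combined into a single face shape can be sketched as below. The sketch assumes the common delta-blendshape form, in which each template contributes its offset from a neutral face weighted by its coefficient; this form is an assumption for illustration and is not quoted from the formulas of this application.

```python
import numpy as np

def blend_templates(neutral, templates, beta):
    """Combine expression templates into one shape.

    neutral: (N, 3) neutral face; templates: (m, N, 3) expression templates;
    beta: (m,) expression coefficients. Assumed form: neutral + sum_i beta_i * (B_i - neutral).
    """
    offsets = templates - neutral[None]              # per-template offsets from the neutral face
    return neutral + np.tensordot(beta, offsets, axes=1)
```

The same function applies to coarse and fine templates alike; only the template array passed in changes between the two stages.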
[0144] The 3D face shape generation method provided in this embodiment first calculates the first 3DMM parameters corresponding to the input 2D face image through a first parameter estimation network model, and determines the coarse 3D face shape corresponding to the 2D face image based on the first 3DMM parameters and a preset 3DMM model. Then, based on the 2D face image and the coarse 3D face shape, the second parameter estimation network model calculates the second 3DMM parameters corresponding to the 2D face image, and determines the fine 3D face shape corresponding to the 2D face image based on the second 3DMM parameters and the preset 3DMM model. In this way, personalized 3D face shapes are generated through two stages from coarse to fine, focusing on the construction of personalized 3D face shapes for different people, and fully considering the specificity of each person under the same expression, thereby improving the accuracy of 3D face reconstruction and expression fitting, and enhancing the effect of face-driven processing.
[0145] Based on the same concept as the above embodiments, this application also provides a three-dimensional face shape generation apparatus, which is used to execute the three-dimensional face shape generation method provided in any of the above embodiments. As shown in Figure 4, the apparatus includes:
[0146] The first parameter calculation module is used to calculate the first 3DMM parameters corresponding to the input two-dimensional face image through the first parameter estimation network model.
[0147] The rough shape determination module is used to determine the rough three-dimensional face shape corresponding to the two-dimensional face image based on the first 3DMM parameters and the preset 3DMM model.
[0148] The second parameter calculation module is used to calculate the second 3DMM parameters corresponding to the two-dimensional face image based on the two-dimensional face image and the rough three-dimensional face shape, through the second parameter estimation network model.
[0149] The fine shape determination module is used to determine the fine three-dimensional face shape corresponding to the two-dimensional face image based on the second 3DMM parameters and the preset 3DMM model.
[0150] In one embodiment, the second parameter calculation module is specifically used for:
[0151] Determine the 3D expression deformation of the rough 3D face shape relative to the preset standard face. The preset standard face is selected from the preset standard face set of the 3DMM model based on the 2D face image.
[0152] Based on the rough 3D face shape and the 3D expression deformation, the second 3DMM parameters corresponding to the 2D face image are calculated through the trained second parameter estimation network model.
[0153] In another embodiment, the second parameter calculation module is further configured to:
[0154] By mapping the rough 3D face shape to the UV space, a rough 2D UV map corresponding to the rough 3D face shape is obtained.
[0155] Determine the two-dimensional expression deformation of the coarse two-dimensional UV map relative to the two-dimensional UV map of a preset standard face;
[0156] Based on the two-dimensional expression deformation, the three-dimensional expression deformation of the rough three-dimensional face shape relative to the preset standard face is determined by a mapping network.
[0157] In another embodiment, the second parameter calculation module is further used for:
[0158] Calculate the Euclidean distance of each vertex of the 2D expression deformation. Based on the Euclidean distance, form an attention mask for the 2D expression deformation. The attention mask is greater than or equal to 0 and less than or equal to 1.
[0159] The 3D face shape generation device also includes a first training module, which is used for:
[0160] Obtain the first training set; the first training set includes multiple face sample images, and each face sample image corresponds to a set of coarse 3DMM parameters;
[0161] The first parameter estimation network model is trained based on the first training set.
[0162] In another embodiment, the first training module is specifically used for:
[0163] Each face sample image in the first training set is input into the first parameter estimation network model to obtain the 3DMM parameters corresponding to the face sample images;
[0164] The first parameter estimation network model is trained by means of a preset first loss function, so that the 3DMM parameters obtained based on the face sample images are equal to the corresponding coarse 3DMM parameters.
[0165] In another embodiment, the three-dimensional face shape generation device further includes a second training module, which is used for:
[0166] Obtain the second training set, which includes multiple face sample images and the coarse 3D face sample shape and fine 3DMM parameters corresponding to each face sample image;
[0167] The second parameter estimation network model is trained based on the second training set.
[0168] In another embodiment, the second training module is specifically used for:
[0169] Determine the 3D expression sample deformation of the rough 3D face sample shape corresponding to each face sample image in the second training set relative to the preset standard face. The preset standard face is selected from the preset standard face set of the 3DMM model based on the 2D face image.
[0170] The second parameter estimation network model is trained by means of a preset second loss function, so that the 3DMM parameters obtained based on the face sample image and the corresponding coarse 3D face shape are equal to the corresponding fine 3DMM parameters.
[0171] In another embodiment, the rough shape determination module is specifically used for:
[0172] Based on the first 3DMM parameters and the preset 3DMM model, the first set of three-dimensional expression templates corresponding to the two-dimensional face image is determined. The first set of three-dimensional expression templates includes multiple rough three-dimensional face shapes with different expressions.
[0173] The fine shape determination module is specifically used for:
[0174] Based on the second 3DMM parameters and the preset 3DMM model, a second set of three-dimensional expression templates corresponding to the two-dimensional face image is determined. The second set of three-dimensional expression templates includes multiple fine three-dimensional face shapes with different expressions.
[0175] The image processing apparatus provided in the above embodiments of this application and the three-dimensional face shape generation method provided in the embodiments of this application are based on the same inventive concept and have the same beneficial effects as the methods adopted, run or implemented by the applications stored therein.
[0176] This application also provides an electronic device for performing the above-described three-dimensional face shape generation method. Please refer to Figure 5, which shows a schematic diagram of an electronic device provided by some embodiments of this application. As shown in Figure 5, the electronic device 8 includes: a processor 800, a memory 801, a bus 802 and a communication interface 803. The processor 800, the communication interface 803 and the memory 801 are connected through the bus 802. The memory 801 stores a computer program that can run on the processor 800. When the processor 800 runs the computer program, it executes the three-dimensional face shape generation method provided in any of the foregoing embodiments of this application.
[0177] The memory 801 may include high-speed random access memory (RAM) or non-volatile memory, such as at least one disk storage device. Communication between the network element of this device and at least one other network element is achieved through at least one communication interface 803 (which can be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, etc.
[0178] Bus 802 can be an ISA bus, PCI bus, or EISA bus, etc. Buses can be divided into address buses, data buses, control buses, etc. Memory 801 is used to store programs. After receiving execution instructions, processor 800 executes the programs. The three-dimensional face shape generation method disclosed in any of the aforementioned embodiments of this application can be applied to processor 800, or implemented by processor 800.
[0179] The processor 800 may be an integrated circuit chip with signal processing capabilities. In implementation, each step of the above method can be completed by the integrated logic circuitry in the hardware of the processor 800 or by instructions in software form. The processor 800 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be directly embodied in the execution of a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules may reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media well established in the art. The storage medium is located in memory 801. Processor 800 reads the information in memory 801 and, in conjunction with its hardware, completes the steps of the above method.
[0180] The electronic device provided in this application embodiment and the three-dimensional face shape generation method provided in this application embodiment are based on the same inventive concept and have the same beneficial effects as the methods they adopt, operate or implement.
[0181] This application also provides a computer-readable storage medium corresponding to the three-dimensional face shape generation method provided in the foregoing embodiments. Please refer to Figure 6: the computer-readable storage medium shown is an optical disc 30, on which a computer program (i.e., a program product) is stored. When the computer program is run by a processor, it executes the three-dimensional face shape generation method provided in any of the aforementioned embodiments.
[0182] It should be noted that examples of computer-readable storage media may also include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other optical and magnetic storage media, which will not be elaborated here.
[0183] The computer-readable storage medium provided in the above embodiments of this application and the three-dimensional face shape generation method provided in the embodiments of this application are based on the same inventive concept and have the same beneficial effects as the methods adopted, run or implemented by the applications stored therein.
[0184] It should be noted that:
[0185] Numerous specific details are set forth in the specification provided herein. However, it will be understood that embodiments of this application may be practiced without these specific details. In some instances, well-known structures and techniques have not been shown in detail so as not to obscure the understanding of this specification.
[0186] Similarly, it should be understood that, for the sake of brevity and to aid in understanding one or more of the various inventive aspects, in the above description of exemplary embodiments of this application, various features of this application are sometimes grouped together in a single embodiment, figure, or description thereof. However, this manner of disclosure should not be construed as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as reflected in the following claims, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into that detailed description, wherein each claim itself is a separate embodiment of this application.
[0187] Furthermore, those skilled in the art will understand that although some embodiments herein include certain features included in other embodiments but not others, combinations of features from different embodiments are intended to be within the scope of this application and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
[0188] The above are merely preferred embodiments of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A three-dimensional face shape generation method, characterized by comprising: The first 3DMM parameters corresponding to the input two-dimensional face image are calculated using the trained first parameter estimation network model. The rough three-dimensional face shape corresponding to the two-dimensional face image is determined based on the first 3DMM parameters and the preset 3DMM model. Based on the two-dimensional face image and the rough three-dimensional face shape, the second 3DMM parameters corresponding to the two-dimensional face image are calculated using a trained second parameter estimation network model. The fine three-dimensional face shape corresponding to the two-dimensional face image is determined based on the second 3DMM parameters and the preset 3DMM model. The step of calculating the second 3DMM parameters corresponding to the two-dimensional face image based on the two-dimensional face image and the rough three-dimensional face shape, using a trained second parameter estimation network model, includes: Determining the 3D expression deformation of the rough 3D face shape relative to a preset standard face, which includes: mapping the rough 3D face shape to UV space through UV mapping to obtain a rough 2D UV map corresponding to the rough 3D face shape; determining the 2D expression deformation of the rough 2D UV map relative to the 2D UV map of the preset standard face; determining the 3D expression deformation of the rough 3D face shape relative to the preset standard face through a mapping network based on the 2D face image; the preset standard face is selected from a preset standard face set of the 3DMM model based on the 2D face image. Based on the rough 3D face shape and the 3D expression deformation, the second 3DMM parameters corresponding to the 2D face image are calculated using a trained second parameter estimation network model.
2. The method of claim 1, wherein, After determining the two-dimensional expression deformation of the coarse two-dimensional UV map relative to the two-dimensional UV map of a preset standard face, the method further includes: Calculate the Euclidean distance of each vertex of the two-dimensional expression deformation, and form an attention mask for the two-dimensional expression deformation based on the Euclidean distance. The attention mask is greater than or equal to 0 and less than or equal to 1.
3. The method of claim 1, wherein, The first 3DMM parameter includes identity coefficient, expression coefficient, texture coefficient, lighting coefficient, and pose coefficient; the second 3DMM parameter includes expression coefficient, texture coefficient, lighting coefficient, and pose coefficient.
4. The method according to any one of claims 1 to 3, characterized in that, Before calculating the first 3DMM parameters corresponding to the input two-dimensional face image using the trained first parameter estimation network model, the method further includes: Obtain the first training set; the first training set includes multiple face sample images, and each face sample image corresponds to a set of coarse 3DMM parameters; The first parameter estimation network model is trained based on the first training set.
5. The method of claim 4, wherein, The step of training the first parameter estimation network model based on the first training set includes: Each face sample image in the first training set is input into the first parameter estimation network model to obtain the 3DMM parameters corresponding to the face sample image; The first parameter estimation network model is trained by a preset first loss function, so that the 3DMM parameters obtained based on the face sample image are equal to the corresponding coarse 3DMM parameters.
6. The method according to claim 5, characterized in that, The preset first loss function is: wherein the four loss terms are the loss values calculated using the image reconstruction loss function, the image perception loss function, the keypoint reconstruction loss function, and the regularization loss function, respectively, and the four corresponding weights are all greater than 0 and represent the hyperparameters of the corresponding loss functions.
7. The method of claim 5, wherein, before calculating, based on the two-dimensional face image and the coarse three-dimensional face shape, the second 3DMM parameters corresponding to the two-dimensional face image using the trained second parameter estimation network model, the method further includes: obtaining a second training set, the second training set including multiple face sample images and, for each face sample image, the corresponding coarse three-dimensional face sample shape and fine 3DMM parameters; and training the second parameter estimation network model based on the second training set.
8. The method of claim 7, wherein the step of training the second parameter estimation network model based on the second training set includes: determining the three-dimensional expression deformation, relative to the preset standard face, of the coarse three-dimensional face sample shape corresponding to each face sample image in the second training set, the preset standard face being selected from the preset standard face set of the 3DMM model based on the two-dimensional face image; and training the second parameter estimation network model with a preset second loss function, so that the 3DMM parameters obtained based on the face sample image and the corresponding coarse three-dimensional face shape are equal to the corresponding fine 3DMM parameters.
9. The method of claim 8, wherein the preset second loss function is: $L_2 = L_1 + \lambda_{grad} L_{grad}$, where $L_1$ is the preset first loss function, $L_{grad}$ is the facial expression gradient loss function, and $\lambda_{grad}$ is the hyperparameter of the facial expression gradient loss function.
10. The method of claim 9, wherein the facial expression gradient loss function is: wherein, denotes the gradient of the deformed three-dimensional face image b with respect to the original three-dimensional face image a.
11. The method according to any one of claims 1 to 3, characterized in that the step of determining the coarse three-dimensional face shape corresponding to the two-dimensional face image based on the first 3DMM parameters and the preset 3DMM model includes: determining, based on the first 3DMM parameters and the preset 3DMM model, a first set of three-dimensional expression templates corresponding to the two-dimensional face image, the first set of three-dimensional expression templates including multiple coarse three-dimensional face shapes with different expressions; and the step of determining the fine three-dimensional face shape corresponding to the two-dimensional face image based on the second 3DMM parameters and the preset 3DMM model includes: determining, based on the second 3DMM parameters and the preset 3DMM model, a second set of three-dimensional expression templates corresponding to the two-dimensional face image, the second set of three-dimensional expression templates including multiple fine three-dimensional face shapes with different expressions.
12. A three-dimensional face shape generation apparatus, characterized by including: a first parameter calculation module, used to calculate the first 3DMM parameters corresponding to an input two-dimensional face image through a trained first parameter estimation network model; a coarse shape determination module, used to determine the coarse three-dimensional face shape corresponding to the two-dimensional face image based on the first 3DMM parameters and a preset 3DMM model; a second parameter calculation module, used to calculate, based on the two-dimensional face image and the coarse three-dimensional face shape, the second 3DMM parameters corresponding to the two-dimensional face image through a trained second parameter estimation network model; and a fine shape determination module, used to determine the fine three-dimensional face shape corresponding to the two-dimensional face image based on the second 3DMM parameters and the preset 3DMM model; wherein the second parameter calculation module is specifically used for: determining the three-dimensional expression deformation of the coarse three-dimensional face shape relative to a preset standard face, which includes: mapping the coarse three-dimensional face shape to UV space through UV mapping to obtain a coarse two-dimensional UV map corresponding to the coarse three-dimensional face shape; determining the two-dimensional expression deformation of the coarse two-dimensional UV map relative to the two-dimensional UV map of the preset standard face; and determining the three-dimensional expression deformation of the coarse three-dimensional face shape relative to the preset standard face through a mapping network based on the two-dimensional face image, the preset standard face being selected from a preset standard face set of the 3DMM model based on the two-dimensional face image; and calculating, based on the coarse three-dimensional face shape and the three-dimensional expression deformation, the second 3DMM parameters corresponding to the two-dimensional face image using the trained second parameter estimation network model.
13. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor executes the computer program to implement the method as described in any one of claims 1 to 11.
14. A computer-readable storage medium having a computer program stored thereon, characterized in that the program is executed by a processor to implement the method as described in any one of claims 1 to 11.
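For illustration only, the attention mask of claim 2 might be sketched as follows. The claim only requires per-vertex Euclidean distances and mask values in [0, 1]; representing each UV map as a list of (x, y, z) positions and scaling by the largest distance are assumptions of this sketch, and the function names `uv_deformation` and `attention_mask` are hypothetical.

```python
import math


def uv_deformation(coarse_uv, standard_uv):
    """Per-vertex displacement between two UV position maps.

    Each map is a list of (x, y, z) positions, one per UV sample
    (an assumed layout, not taken from the claims).
    """
    return [
        tuple(c - s for c, s in zip(p_coarse, p_standard))
        for p_coarse, p_standard in zip(coarse_uv, standard_uv)
    ]


def attention_mask(deformation, eps=1e-8):
    """Euclidean norm of each displacement, scaled into [0, 1].

    Dividing by the largest norm is an assumed normalization; the
    claim only states that the mask lies between 0 and 1.
    """
    norms = [math.sqrt(sum(d * d for d in vec)) for vec in deformation]
    peak = max(norms, default=0.0)
    if peak < eps:
        # No measurable deformation anywhere: the mask is all zeros.
        return [0.0 for _ in norms]
    return [n / peak for n in norms]


# Toy data: three UV samples of a standard face and of a coarse face.
standard = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
coarse = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (0.0, 1.0, 2.5)]

mask = attention_mask(uv_deformation(coarse, standard))
print(mask)  # [0.0, 1.0, 0.5]
```

Under this normalization, samples that move the most receive a weight of 1 and static samples receive 0, so a downstream network could emphasize strongly deforming regions.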
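Claims 6 and 9 describe losses that are combined through hyperparameters. Assuming this means a plain weighted sum, and that the second loss adds one weighted gradient term to the first, the combination might look like the sketch below. The term names, the numeric values, and the functions `weighted_loss` and `second_loss` are hypothetical; the individual loss terms themselves are not implemented here.

```python
def weighted_loss(losses, weights):
    """Weighted sum of named loss values (assumed form of the first loss)."""
    if set(losses) != set(weights):
        raise ValueError("losses and weights must name the same terms")
    if any(w <= 0 for w in weights.values()):
        # Claim 6 states that every hyperparameter is greater than 0.
        raise ValueError("every hyperparameter must be greater than 0")
    return sum(weights[name] * value for name, value in losses.items())


def second_loss(first, gradient_loss, gradient_weight):
    """First loss plus a weighted expression gradient term (assumed form)."""
    return first + gradient_weight * gradient_loss


# Placeholder loss values and weights, chosen only for the example.
losses = {"image": 0.5, "perceptual": 0.25, "keypoint": 2.0, "regularization": 4.0}
weights = {"image": 1.0, "perceptual": 0.5, "keypoint": 0.25, "regularization": 0.125}

l1 = weighted_loss(losses, weights)
l2 = second_loss(l1, gradient_loss=0.25, gradient_weight=2.0)
print(l1, l2)  # 1.625 2.125
```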
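The expression template sets of claim 11 can be pictured with the standard linear 3DMM formulation, in which a shape is the mean face plus weighted identity and expression bases. The sketch below assumes that each template is the identity-adjusted neutral face with exactly one expression basis activated at unit strength; the claims do not state how the templates are derived, so this construction, the flat coordinate layout, and the names `linear_combination` and `expression_templates` are assumptions.

```python
def linear_combination(mean, basis, coeffs):
    """Return mean + sum_k coeffs[k] * basis[k] over flat coordinate lists."""
    out = list(mean)
    for vec, c in zip(basis, coeffs):
        for i, v in enumerate(vec):
            out[i] += c * v
    return out


def expression_templates(mean, id_basis, exp_basis, id_coeffs):
    """One personalized shape per expression basis vector (assumed scheme)."""
    # The neutral face of this person: mean shape plus identity offset.
    neutral = linear_combination(mean, id_basis, id_coeffs)
    # Activate each expression basis vector alone at unit strength.
    return [linear_combination(neutral, [vec], [1.0]) for vec in exp_basis]


# Toy model: two vertices stored as [x0, y0, z0, x1, y1, z1].
mean = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
id_basis = [[1.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
exp_basis = [
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],   # raises the first vertex
    [0.0, 0.0, 0.0, 0.0, -1.0, 0.0],  # lowers the second vertex
]

templates = expression_templates(mean, id_basis, exp_basis, id_coeffs=[0.5])
for t in templates:
    print(t)
```

In a two-stage setup like the one claimed, the same routine could be called once with the coarse parameters and once with the fine parameters to produce the first and second template sets.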