Model training method, digital human driving method and related device

By transferring the parametric basis of 3DMM to the topology of the target digital human, and combining shape transfer method and RBF technology, the facial expression capture model is automatically trained, which solves the problem of poor expression driving effect in existing technologies and achieves efficient and accurate facial expression restoration and complex animation effects.

CN117132713B · Active · Publication Date: 2026-06-12 · BEIJING BAIDU NETCOM SCI & TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING BAIDU NETCOM SCI & TECH CO LTD
Filing Date
2023-09-07
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

In existing technologies, it is difficult to efficiently and cost-effectively reproduce the facial expressions of real humans onto the target digital human model using 3D facial expression capture technology. Furthermore, the quality of blendshapes created by different modelers is difficult to standardize, resulting in poor expression-driven effects for digital humans.

Method used

By using 3DMM's parametric basis transfer to the target digital human topology, and combining shape transfer methods and RBF technology, the facial expression capture model is automatically trained, avoiding manual blendshape creation and directly adapting to the target digital human topology.

Benefits of technology

It achieves efficient and accurate capture of realistic human facial expressions under the target digital human topology, saving manpower and time costs, and improving the reproduction quality of digital human facial expressions and complex animation effects.

✦ Generated by Eureka AI based on patent content.

Abstract

The present disclosure provides a model training method, a digital human driving method and related devices, relates to the technical field of artificial intelligence, and in particular to the technical field of computer vision, augmented reality, virtual reality, deep learning and the like. The specific implementation scheme is: inputting a sample image into a three-dimensional face reconstruction model to obtain a three-dimensional face reconstruction coefficient; reconstructing a three-dimensional face model of a target object under a target digital human topology based on the three-dimensional face reconstruction coefficient and a target basis; obtaining a two-dimensional face image of the three-dimensional face model of the target object; and adjusting parameters of the three-dimensional face reconstruction model based on a loss between the sample image and the two-dimensional face image to obtain a facial expression capture model. In the embodiment of the present disclosure, by migrating the basis under the 3DMM topology to the target digital human topology, the target digital human topology can be adapted, and a facial expression capture model that can accurately capture the facial expression suitable for the target digital human topology can be trained to drive the digital human.

Description

Technical Field

[0001] This disclosure relates to the field of artificial intelligence technology, particularly to the fields of computer vision, augmented reality, virtual reality, and deep learning, and can be applied to scenarios such as artificial intelligence content generation and digital humans.

Background Technology

[0002] In recent years, with the development of technologies such as artificial intelligence and machine learning, the concept of digital humans has gradually attracted widespread attention. A digital human can be understood as a virtual human figure, a digitized character created using digital technology that closely resembles a human. Digital humans can exhibit appearances and movements similar to real humans in the digital world. Compared to traditional image animation, digital human technology can achieve more complex animation effects, including physical appearance and facial expressions.

Summary of the Invention

[0003] This disclosure provides a model training method, a digital human driving method, and related devices.

[0004] According to one aspect of this disclosure, a model training method is provided, comprising:

[0005] Input the sample image into the 3D face reconstruction model to obtain the 3D face reconstruction coefficients of the target object in the sample image;

[0006] Based on the 3D face reconstruction coefficients and the target basis of the target digital human topology, a 3D face model of the target object under the target digital human topology is reconstructed; the target basis is obtained by transferring the parameterized basis of the 3D face deformation statistical model 3DMM to the target digital human topology.

[0007] The three-dimensional face model of the target object is projected into two-dimensional space to obtain a two-dimensional face image of the target object;

[0008] Based on the loss between the sample image and the 2D face image, the parameters of the 3D face reconstruction model are adjusted; when the training convergence condition is met, training ends and the facial expression capture model is obtained.

[0009] According to another aspect of this disclosure, a digital human driving method is provided, comprising:

[0010] Get the source image;

[0011] The source image is input into the facial expression capture model to obtain the expression coefficients output by the facial expression capture model;

[0012] Based on the expression coefficients and the expression base of the digital human to be driven, control the expression of the digital human to be driven;

[0013] The expression base of the digital human to be driven has the same topology as the target digital human.

[0014] According to another aspect of this disclosure, a model training apparatus is provided, comprising:

[0015] The first input module is used to input the sample image into the 3D face reconstruction model in order to obtain the 3D face reconstruction coefficients of the target object in the sample image.

[0016] The reconstruction module is used to reconstruct the 3D face model of the target object under the target digital human topology based on the 3D face reconstruction coefficients and the target basis of the target digital human topology; the target basis is obtained by transferring the parameterized basis of the 3D face deformation statistical model 3DMM to the target digital human topology;

[0017] The projection module is used to project the 3D face model of the target object into a 2D space to obtain a 2D face image of the target object.

[0018] The adjustment module is used to adjust the parameters of the 3D face reconstruction model based on the loss between the sample images and the 2D face images, so as to end the training and obtain the facial expression capture model when the training convergence condition is met.

[0019] According to another aspect of this disclosure, a digital human driving device is provided, comprising:

[0020] The second acquisition module is used to acquire the source image;

[0021] The second input module is used to input the source image into the facial expression capture model and obtain the expression coefficients output by the facial expression capture model.

[0022] The control module is used to control the expressions of the digital human to be driven based on the expression coefficients and the expression base of the digital human to be driven;

[0023] The expression base of the digital human to be driven has the same topology as the target digital human.

[0024] According to another aspect of this disclosure, an electronic device is provided, comprising:

[0025] At least one processor; and

[0026] The memory is communicatively connected to the at least one processor; wherein,

[0027] The memory stores instructions that can be executed by the at least one processor to enable the at least one processor to perform the methods of any embodiment of the present disclosure.

[0028] According to another aspect of this disclosure, a non-transitory computer-readable storage medium is provided storing computer instructions, wherein the computer instructions are used to cause the computer to perform a method according to any embodiment of this disclosure.

[0029] According to another aspect of this disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements a method according to any embodiment of this disclosure.

[0030] In this embodiment of the disclosure, by transferring the basis under the 3DMM topology to the target digital human topology, it is possible to adapt to the target digital human topology and train a facial expression capture model that can accurately capture facial expressions suitable for the target digital human topology to drive the digital human.

[0031] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.

Attached Figure Description

[0032] The accompanying drawings are provided to better understand this solution and do not constitute a limitation of this disclosure. Wherein:

[0033] Figure 1 is a flowchart of a model training method according to an embodiment of this disclosure;

[0034] Figure 2 is a schematic diagram of an RBF neural network model according to another embodiment of the present disclosure;

[0035] Figure 3 is a schematic diagram of the overall process of a model training method according to another embodiment of the present disclosure;

[0036] Figure 4 is a flowchart of a digital human driving method according to another embodiment of the present disclosure;

[0037] Figure 5 is a schematic diagram of the structure of a model training device according to another embodiment of the present disclosure;

[0038] Figure 6 is a schematic diagram of the structure of a digital human driving device according to another embodiment of the present disclosure;

[0039] Figure 7 is a block diagram of an electronic device used to implement the model training method or digital human driving method of the embodiments of this disclosure.

Detailed Implementation

[0040] The exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding, and should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope of this disclosure. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.

[0041] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this disclosure, "multiple" means two or more, unless otherwise explicitly specified.

[0042] 3D (three-dimensional) facial expression capture technology plays a crucial role in digital human technology. Its main application is to capture the facial expressions of a real human performer, that is, the real person whose expressions drive the digital human. Once captured, these expressions are used to drive the face of a target digital human, so that the target digital human shows the same expression changes as the performer.

[0043] In related technologies, a target digital human can be modeled by optically scanning a real human, while simultaneously capturing the facial expressions of the real human and transferring those expressions onto the target digital human model.

[0044] However, considering implementation costs, 3D facial expression capture technology typically requires training a 3DMM (3D Morphable Model of the face). Since a 3D face has a shape and a texture and, depending on the application, must also exhibit expressions, the 3DMM used for facial expression capture is generally trained by combining shape, texture, and expression bases. A 3DMM can be understood as a 3D face structure obtained by weighting and combining (adding or multiplying) many independent facial components. The coefficients used to construct the 3D face structure, such as shape coefficients, texture coefficients, and expression coefficients, can be obtained from the 3DMM. Based on the 3DMM approach, an accurate 3D face can be reconstructed from a 2D image.

[0045] To achieve facial expression capture, training a 3DMM requires modelers to manually create blendshapes (blend deformers) for the 3DMM that can be applied to the topology of the target digital human. This allows the expression coefficients output by the 3DMM to be applied directly to the topology of the target digital human, thus driving its facial expressions. Blendshapes can be understood as the expression base. A single set of expression blendshapes may contain hundreds of individual blendshapes, each corresponding to a different facial expression. Each blendshape is expensive to produce: modelers need a strong aesthetic sense and operational skill, and the production process is lengthy. Furthermore, the quality of blendshapes created by different modelers is difficult to standardize, making it hard to capture the real human performer's facial expressions with high quality through the 3DMM. Consequently, the target digital human cannot accurately reproduce the performer's expression changes, and its facial expressions are difficult to drive effectively.

[0046] In view of this, embodiments of this disclosure provide a model training method. Figure 1 shows a flowchart of the model training method in an embodiment of this disclosure, which includes:

[0047] S101, input the sample image into the 3D face reconstruction model to obtain the 3D face reconstruction coefficients of the target object in the sample image.

[0048] The sample image can be any image capable of outputting 3D face reconstruction coefficients by the 3D face reconstruction model, and can be any 2D image containing a face; this disclosure does not limit this. The sample image contains a target object, which is an image of a real human being for whom facial expression capture is required.

[0049] Before the sample images are input into the 3D face reconstruction model, they can be preprocessed so that the model trains better on the preprocessed images. The preprocessing may include: detecting key points in the sample image; aligning and cropping the face region to a uniform size, such as 256*256, based on the key points; and then normalizing the cropped image, for example by dividing each pixel value by 255 so that values fall in [0,1], or by dividing by 127.5 and then subtracting 1 so that values fall in [-1,1].
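The preprocessing described above can be sketched as follows. This is a minimal illustration using NumPy only, not the disclosed implementation: the function name `preprocess_face`, the square crop around the key-point bounding box, the `margin` parameter, and the nearest-neighbor resize are all assumptions standing in for the alignment step, which the disclosure does not specify.

```python
import numpy as np

def preprocess_face(image, keypoints, size=256, margin=0.1, signed=True):
    """Crop the face region around the key points, resize, and normalize.

    image: (H, W, 3) uint8 array; keypoints: (K, 2) array of (x, y) pixels.
    Returns a (size, size, 3) float32 array in [-1, 1] (signed) or [0, 1].
    """
    h, w = image.shape[:2]
    x0, y0 = keypoints.min(axis=0)
    x1, y1 = keypoints.max(axis=0)
    # Expand the key-point bounding box to a square with a small margin
    # (hypothetical choice; a real pipeline would align by similarity transform).
    half = 0.5 * max(x1 - x0, y1 - y0) * (1.0 + margin)
    cx, cy = 0.5 * (x0 + x1), 0.5 * (y0 + y1)
    # Nearest-neighbor sampling grid, clipped to the image borders.
    xs = np.clip(np.round(np.linspace(cx - half, cx + half, size)), 0, w - 1).astype(int)
    ys = np.clip(np.round(np.linspace(cy - half, cy + half, size)), 0, h - 1).astype(int)
    crop = image[ys[:, None], xs[None, :]].astype(np.float32)
    scaled = crop / 255.0                      # values in [0, 1]
    return scaled * 2.0 - 1.0 if signed else scaled
```

With `signed=True` the output lies in [-1, 1]; with `signed=False` it stays in [0, 1], matching the two ranges mentioned in the paragraph.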

[0050] 3D face reconstruction models include general encoders such as ResNet (residual neural network) and FaRL (pre-trained large model for face tasks), which generally contain convolutional layers and pooling layers.

[0051] S102: Based on the 3D face reconstruction coefficients and the target basis of the target digital human topology, reconstruct the 3D face model of the target object under the target digital human topology.

[0052] The target basis is obtained by migrating the parametric basis of 3DMM to the topology of the target digital human. The parametric basis of 3DMM includes the shape basis and texture basis of 3DMM.

[0053] The target digital human topology is a three-dimensional structure composed of point clouds and facets with a certain structure. Points in the point cloud can be numbered, with points of the same number representing the same semantic meaning. For example, in the 2017 version of the BFM (Basel Face Model) database, the 2217th vertex of the face shape base represents the semantic meaning of the left outer corner of the eye. During implementation, the number of vertices and facets can be determined according to actual needs.
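As an illustration of step S102, the sketch below assumes the usual linear 3DMM formulation, in which a face is a mean plus a coefficient-weighted sum of basis vectors. The disclosure does not spell out this formula, so the function name `reconstruct_face`, the array layouts, and the presence of mean shape and mean texture arrays are assumptions.

```python
import numpy as np

def reconstruct_face(mean_shape, shape_basis, shape_coeffs,
                     mean_texture, texture_basis, texture_coeffs):
    """Linear reconstruction on the target digital human topology.

    mean_shape, mean_texture: (V, 3) arrays for V vertices.
    shape_basis: (V, 3, Ks); texture_basis: (V, 3, Kt), i.e. the target basis.
    shape_coeffs: (Ks,); texture_coeffs: (Kt,), as predicted by the network.
    """
    # Each vertex position is the mean position plus a weighted sum of basis offsets.
    vertices = mean_shape + shape_basis @ shape_coeffs
    # Per-vertex colors follow the same pattern with the texture basis.
    colors = mean_texture + texture_basis @ texture_coeffs
    return vertices, colors
```

Because the basis arrays are indexed by the vertices of the target topology, the reconstructed mesh is already expressed in that topology. An expression basis, if used, would be added in the same way.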

[0054] S103, Project the three-dimensional face model of the target object into a two-dimensional space to obtain a two-dimensional face image of the target object.
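The disclosure does not state which camera model is used for the projection in S103. As a hedged sketch, the following assumes a simple pinhole camera; `project_vertices`, `focal`, and `center` are hypothetical names. Producing an actual 2D face image would additionally require rasterizing the projected mesh (typically with a differentiable renderer), which is not shown here.

```python
import numpy as np

def project_vertices(vertices, rotation, translation, focal, center):
    """Pinhole projection of (V, 3) vertices to (V, 2) pixel coordinates.

    rotation: (3, 3) matrix; translation: (3,) vector; focal: scalar focal
    length in pixels; center: (2,) principal point in pixels.
    """
    cam = vertices @ rotation.T + translation   # world -> camera coordinates
    # Perspective divide by depth, then scale and shift into pixel space.
    return focal * cam[:, :2] / cam[:, 2:3] + center
```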

[0055] S104: Based on the loss between the sample image and the 2D face image, adjust the parameters of the 3D face reconstruction model; when the training convergence condition is met, end the training and obtain the facial expression capture model.

[0056] The loss between the sample image and the 2D face image can be calculated using at least one of the following loss functions: L1 Loss (mean absolute error), L2 Loss (mean squared error), or Wing Loss (keypoint loss). Of course, other losses can be introduced according to actual needs, such as perceptual loss between the two images; this disclosure does not limit this.
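The three named losses can be written compactly as below. This is a sketch rather than the disclosed implementation: the Wing loss follows its commonly used definition for landmark regression, and the default values `w=10.0` and `eps=2.0` are assumptions, not values given in the disclosure.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error."""
    return np.mean(np.abs(pred - target))

def l2_loss(pred, target):
    """Mean squared error."""
    return np.mean((pred - target) ** 2)

def wing_loss(pred, target, w=10.0, eps=2.0):
    """Wing loss: logarithmic for small errors, linear for large ones."""
    diff = np.abs(pred - target)
    # Constant that makes the two pieces meet continuously at diff == w.
    c = w - w * np.log(1.0 + w / eps)
    return np.mean(np.where(diff < w, w * np.log(1.0 + diff / eps), diff - c))
```

L1 and L2 would typically be applied to pixel values, while the Wing loss is usually applied to 2D key-point coordinates. The losses can be combined as a weighted sum.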

[0057] In this embodiment, the 3D face reconstruction model outputs the 3D face reconstruction coefficients, and the parameterized basis of the 3DMM is transferred to the topology of the target digital human to obtain the target basis. The 3D face reconstruction model can therefore be trained directly against the topology of the target digital human, so that the coefficients output by the trained facial expression capture model can be applied directly to that topology. The facial expression capture model removes the need to manually create numerous blendshapes in order to capture a real human performer's facial expressions, saving significant manpower and time. It also makes it easier to reproduce the performer's expressions on the target digital human's face, enabling the target digital human to achieve complex animation effects.

[0058] Since the topology of 3DMM and the target digital human are usually different, their shape bases cannot generally be directly interchanged. Directly using the parameterized basis of 3DMM will affect the driving effect of the digital human. The topological space of 3DMM and the spatial representation of the target digital human's topology also differ, and there is usually no fixed mapping relationship. To avoid distortion and shape anomalies caused by unanchored transferred shape bases, this embodiment requires registration of the 3DMM shape base and the target digital human topology before shape base transfer to prevent deformation of the 3DMM shape base after transfer. The specific registration implementation steps are as follows:

[0059] A1: Obtain a reference face from multiple faces in 3DMM.

[0060] The reference face is any one of the multiple faces in 3DMM.

[0061] A2, register the parameterized shape base of the reference face with the digital human template under the target digital human topology to obtain the reference shape base of the reference face under the target digital human topology.

[0062] Since the target digital human topology is merely a 3D face model composed of point clouds and facets with a specific structure, lacking a parametric basis, a digital human template with a parametric basis under the target digital human topology is needed for registration with the reference face. Specifically, this involves registering the parametric shape basis of the reference face with the parametric shape basis of the digital human template under the target digital human topology. For example, the ICP (iterative closest point) algorithm based on point cloud matching can be used to achieve this registration process. Alternatively, the modeler can manually adjust the model to ensure the reference face fits snugly against the digital human template's face, thus completing the registration.
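For reference, a basic rigid ICP loop looks like the sketch below. It assumes point-to-point ICP with brute-force nearest neighbors and a Kabsch (SVD) alignment step; the disclosure names ICP only as one option and does not describe a variant, so the function names and the absence of scale or non-rigid terms are assumptions.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with R @ src_i + t ~= dst_i."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    # Guard against reflections so that det(R) == +1.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = dst_c - r @ src_c
    return r, t

def icp(source, target, iterations=30):
    """Rigidly align source (N, 3) to target (M, 3); returns the moved source."""
    moved = source.copy()
    for _ in range(iterations):
        # Brute-force nearest neighbor in the target for every source point.
        d2 = ((moved[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        nearest = target[d2.argmin(axis=1)]
        r, t = best_rigid_transform(moved, nearest)
        moved = moved @ r.T + t
    return moved
```

ICP needs a reasonable initial pose to converge. For dense face meshes, a spatial index such as a k-d tree would replace the brute-force distance computation.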

[0063] After registering the parametric shape basis of the reference face with the digital human template under the target digital human topology, a reference shape basis is obtained. This reference shape basis is the representation of the reference face under the target digital human topology, and is obtained by changing the coordinates of the points in the point cloud of the parametric shape basis of the reference face.

[0064] A3: Based on the reference shape base, use a shape transfer method to transfer the shape base and texture base in the parametric base of the 3DMM to the target digital human topology, so as to construct the target base.

[0065] In this embodiment, the 3DMM has multiple shape bases and multiple texture bases. The target digital human topology lacks a rich set of shape and texture bases. To accurately drive the digital human within the target topology, the shape and texture bases of the 3DMM need to be migrated to the target digital human topology. Each shape base in the 3DMM represents the facial features; the facial features remain unchanged or approximate before and after migration, only the topological structure is migrated from the 3DMM to the target digital human topology.

[0066] In this embodiment, once the parameterized shape base of the reference face has been registered with the digital human template under the target digital human topology, a shape transfer method is used to transfer the shape base of the 3DMM to the target digital human topology. Completing registration first simplifies the transfer to a certain extent and makes it easier to implement. It also helps the shape base of the 3DMM change in a reasonable way when transferred, so that the original 3DMM shape base and the transferred shape base differ only in that the same shape base is expressed under different topologies.

[0067] In some embodiments, based on a reference shape base, a shape transfer method is used to transfer the shape base and texture base in the parametric base of the 3DMM to the target digital human topology, which can be implemented as follows:

[0068] B1: Based on multiple reference vertices in the reference shape base, determine the multiple target vertices in the digital human template that correspond to the reference vertices.

[0069] Since the number of points in the reference shape base is generally less than the number of points in the digital human template, multiple or all points in the reference shape base can be used as reference vertices. These reference vertices are then mapped to points in the digital human template, and the points in the digital human template corresponding to these reference vertices can be determined as target vertices. For example, the point in the digital human template closest to a reference vertex can be used as the target vertex. This disclosure does not limit this approach. Based on these target vertices, it can be determined that the parameterized shape base of the reference face and the digital human template under the target digital human topology have been registered.
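The closest-point rule mentioned as an example in this paragraph can be sketched as follows. The function name `find_target_vertices` is hypothetical, and the brute-force search is chosen for clarity only.

```python
import numpy as np

def find_target_vertices(reference_vertices, template_vertices):
    """For each reference vertex (R, 3), return the index of the closest
    vertex in the digital human template (T, 3)."""
    diff = reference_vertices[:, None, :] - template_vertices[None, :, :]
    d2 = (diff ** 2).sum(axis=2)          # (R, T) squared distances
    return d2.argmin(axis=1)
```

The returned indices select the target vertices in the template. Several reference vertices may map to the same template vertex, so deduplicating the indices may be needed before they are used as RBF centers.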

[0070] B2: Based on the multiple target vertices, use a shape transfer method to transfer the shape base and texture base of the 3DMM to the target digital human topology.

[0071] In this embodiment of the disclosure, the shape basis and texture basis in the parametric basis of 3DMM are transferred to the target digital human topology. This facilitates the automatic transfer of the parametric basis of 3DMM to the target digital human topology based on the shape transfer methodology, thereby improving the efficiency and accuracy of the transfer and enabling the reconstruction of a three-dimensional face under the digital human topology.

[0072] The aforementioned shape transfer method can be implemented using techniques such as RBF (Radial Basis Function) and deformation transfer.

[0073] In the RBF technique, given a point x and a set of RBF basis functions, the value of f(x) can be calculated using RBF interpolation, as shown in equation (1) below:

[0074] $$f(x) = \sum_{i} w_i \, \varphi\left(\lVert x - x'_i \rVert\right) \tag{1}$$

[0075] Here, $\varphi$ is the radial basis function and $w_i$ is the weight of the $i$-th RBF basis function, which is the unknown to be solved. The center $x'_i$ of each RBF basis function is set to a target vertex in the target digital human topology, obtained after registering the parameterized shape base of the reference face with the digital human template. The input $x$ is set to the position of an original vertex in the target digital human topology, and the computed value $f(x)$ is the offset of that vertex. The position of the original vertex after the offset is therefore $x + f(x)$.
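A small sketch of RBF interpolation used as a vertex-offset field is given below. It assumes a Gaussian radial function and known offsets at the centers; neither the kernel choice nor the names `fit_rbf` and `apply_rbf` come from the disclosure.

```python
import numpy as np

def gaussian_kernel(r, sigma=1.0):
    """Gaussian radial function (an assumed choice of kernel)."""
    return np.exp(-(r / sigma) ** 2)

def pairwise_dist(a, b):
    """Euclidean distances between rows of a (N, 3) and rows of b (M, 3)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

def fit_rbf(centers, offsets, sigma=1.0):
    """Solve Phi @ W = offsets for the weights W, one column per coordinate."""
    phi = gaussian_kernel(pairwise_dist(centers, centers), sigma)
    return np.linalg.solve(phi, offsets)

def apply_rbf(x, centers, weights, sigma=1.0):
    """Return x + f(x), where f(x) = sum_i w_i * phi(||x - x'_i||)."""
    phi = gaussian_kernel(pairwise_dist(x, centers), sigma)
    return x + phi @ weights
```

The weights are fitted so that each center moves exactly by its known offset. Every other vertex then receives a smoothly interpolated offset.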

[0076] Taking the RBF technique as an example, suppose there are three known points and one unknown point, and the parameter values of the unknown point must be derived from the parameter values of the same type at the known points. These parameter values can be, for example, the shape basis parameters or texture parameters of the point cloud. Using RBF, a first distance matrix is calculated from the distances between the known points, and a parameter value matrix is assembled from the same-type parameter values of each known point. The inverse of the first distance matrix is then multiplied by the parameter value matrix to obtain the weight matrix. Next, the distances between the unknown point and each known point are calculated to obtain a second distance matrix. Finally, the second distance matrix is multiplied by the weight matrix to obtain the parameter values of the unknown point. The following describes, in conjunction with an embodiment of this disclosure, a specific implementation that uses RBF as the shape transfer method to transfer the shape basis and texture basis of a 3DMM to the target digital human topology.

[0077] In some embodiments, based on multiple target vertices, a shape transfer method is used to transfer the shape basis of the 3DMM to the topology of the target digital human, which can be implemented as follows:

[0078] For the vertices to be transferred in the shape base of a 3DMM, perform the following operations:

[0079] C1: Based on the position information of the multiple target vertices in the target digital human topology, determine the distances between the target vertices to obtain the first distance matrix.

[0080] C2: Based on the shape basis parameter matrix of the multiple target vertices in the target digital human topology and the first distance matrix, calculate the first weight matrix.

[0081] The shape basis parameters serve as the parameter values in the RBF technique.

[0082] C3: Determine the distances between the vertex to be transferred in the shape basis and each of the multiple target vertices to obtain the second distance matrix.

[0083] C4: Based on the second distance matrix and the first weight matrix, determine the shape basis parameters of the vertex to be transferred in the target digital human topology.

[0084] In this embodiment of the disclosure, the shape basis of 3DMM is transferred to the target digital human topology through a shape transfer method, so as to enrich the shape basis under the target digital human topology. This is beneficial for reconstructing three-dimensional faces directly based on the shape basis in the parameterized basis of 3DMM under the target digital human topology.
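Steps C1 to C4 can be sketched as four matrix operations. The sketch takes the text literally and uses the raw Euclidean distance as the radial function; the function name `transfer_parameters` and the argument names are hypothetical.

```python
import numpy as np

def transfer_parameters(target_positions, target_params, query_positions):
    """Steps C1-C4 with the raw Euclidean distance as the radial function.

    target_positions: (N, 3) positions of the target vertices.
    target_params:    (N, K) basis parameters known at the target vertices.
    query_positions:  (M, 3) positions of the vertices to be transferred.
    Returns an (M, K) array of interpolated parameters.
    """
    # C1: first distance matrix between the target vertices, shape (N, N).
    d1 = np.linalg.norm(target_positions[:, None] - target_positions[None, :], axis=2)
    # C2: first weight matrix W = D1^{-1} P, computed with a linear solve
    # rather than an explicit inverse.
    weights = np.linalg.solve(d1, target_params)
    # C3: second distance matrix, query vertices vs. target vertices, shape (M, N).
    d2 = np.linalg.norm(query_positions[:, None] - target_positions[None, :], axis=2)
    # C4: parameters of the vertices to be transferred.
    return d2 @ weights
```

Passing texture basis parameters as `target_params` gives the analogous texture transfer. The target vertices must be distinct, otherwise the first distance matrix is singular.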

[0085] In some embodiments, based on multiple target vertices, a shape transfer method is used to transfer the texture base of the 3DMM to the topology of the target digital human, which can be implemented as follows:

[0086] For the vertices to be transferred in the texture base of a 3DMM, perform the following operations:

[0087] D1: Based on the position information of the multiple target vertices in the target digital human topology, determine the distances between the target vertices to obtain the first distance matrix.

[0088] D2: Based on the texture basis parameter matrix of the multiple target vertices in the target digital human topology and the first distance matrix, calculate the second weight matrix.

[0089] The texture basis parameters serve as the parameter values in the RBF technique.

[0090] D3: Determine the distances between the vertex to be transferred in the texture basis and each of the multiple target vertices to obtain the third distance matrix.

[0091] D4: Based on the third distance matrix and the second weight matrix, determine the texture basis parameters of the vertex to be transferred in the target digital human topology.

[0092] In this embodiment of the disclosure, the texture base of 3DMM is transferred to the target digital human topology through a shape transfer method to enrich the texture base under the target digital human topology, which is beneficial for reconstructing three-dimensional faces directly based on the texture base in the parameterized base of 3DMM under the target digital human topology.

[0093] In addition, based on the registered target vertices, the texture parameters in the parameterized base of 3DMM can be directly mapped to the texture parameters of the target vertices under the target digital human topology.

[0094] In summary, this disclosure does not limit the specific implementation of the shape transfer method.

[0095] In some embodiments, based on multiple target vertices, the shape transfer method is used to transfer the shape base and texture base of the 3DMM to the target digital human topology, which can also be implemented as follows:

[0096] E1: Construct a shape basis transfer model and a texture basis transfer model based on the multiple target vertices.

[0097] This process requires using multiple target vertices as training labels. For example, a first RBF neural network model can be constructed and trained to obtain a shape basis transfer model. Similarly, a second RBF neural network model can be constructed and trained to obtain a texture basis transfer model. Taking the first RBF neural network model as an example, its input is the 3DMM shape basis parameters of the reference face, and through training, its output continuously approaches the shape basis corresponding to the target vertices. Thus, the shape basis transfer model is obtained.

[0098] For the second RBF neural network model, its input is the texture basis of a 3DMM, and through training, its output is continuously made to approach the texture basis corresponding to the target vertex. Thus, the texture basis transfer model is obtained.

[0099] Furthermore, for the second RBF neural network model, its input is the texture basis of 3DMM, and its output is the predicted texture basis under the digital human topology. The predicted texture basis is used by a discriminator to determine whether it belongs to the real texture or the fake texture under the target digital human topology. Thus, the parameters of the second RBF are optimized by the discriminator, and through training, its output continuously approaches the texture basis under the real target digital human topology. Thus, the texture basis transfer model is obtained.

[0100] The first RBF neural network model and the second RBF neural network model can each be simplified as shown in Figure 2: the model includes an input layer, hidden layers, and an output layer. The model parameters are adjusted by continuously optimizing the weights of the hidden layers and dynamically optimizing the centers of the radial basis functions (RBFs). The hidden layers are constructed from the centers of multiple RBFs, and the initial values of these centers can be set to the values of the target vertices.
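As a non-limiting illustration, the forward pass of such an RBF network can be sketched as follows. The Gaussian kernel, the `gamma` width, the single hidden layer, and all function and variable names are assumptions made for illustration; the disclosure does not fix the kernel or the layer sizes. The centers are initialized from hypothetical target vertices, as described above.

```python
import numpy as np

def rbf_forward(x, centers, weights, gamma=1.0):
    """Forward pass of a single-hidden-layer RBF network (illustrative only).

    x:       (n, d) input points
    centers: (k, d) RBF centers, one per hidden unit
    weights: (k, m) output-layer weights
    gamma:   Gaussian width (assumed; not specified in the source)
    """
    # Squared Euclidean distance between every input and every center: (n, k)
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    hidden = np.exp(-gamma * d2)   # Gaussian radial basis activations
    return hidden @ weights        # linear output layer: (n, m)

# Hypothetical target vertices used as the initial values of the centers.
target_vertices = np.array([[0.0, 0.0, 0.0],
                            [1.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0],
                            [0.0, 0.0, 1.0]])
centers = target_vertices.copy()
weights = np.zeros((4, 3))         # trainable; zero before any training
out = rbf_forward(target_vertices, centers, weights)
```

In training, both `weights` and `centers` would be updated so that the output approaches the basis corresponding to the target vertices.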

[0101] E2: input the shape basis of the 3DMM into the shape basis transfer model to transfer the shape basis of the 3DMM to the target digital human topology; and,

[0102] E3: input the texture basis of the 3DMM into the texture basis transfer model to transfer the texture basis of the 3DMM to the target digital human topology.

[0103] The shape base and texture base of the reference face in 3DMM are respectively used as inputs to the shape base transfer model and the texture base transfer model.

[0104] In this embodiment of the disclosure, the shape basis and texture basis of the 3DMM are respectively input into the shape basis transfer model and texture basis transfer model constructed based on multiple target vertices, so as to automatically realize the transfer of the shape basis and texture basis of the 3DMM, which helps to simplify the transfer process and improve the transfer efficiency.

[0105] A 3DMM contains multiple faces, and any single one of them is unlikely to represent the average level of the parametric shape basis of the 3DMM. Therefore, the average face of the 3DMM is usually calculated from these multiple faces, and the parametric shape basis of the average face is used as the parametric shape basis that represents the 3DMM.

[0106] In some embodiments, the parameterized shape basis of the reference face is the average face of a 3DMM.

[0107] Furthermore, considering the potentially significant differences in parametric shape bases among multiple faces in a 3DMM, the calculated average face in a 3DMM may not accurately represent the average level of the parametric shape base features of the 3DMM. Therefore, faces in a 3DMM can be roughly classified based on factors such as age group and gender. The average face in a 3DMM can be calculated separately for each category, and the average face in a category similar to the digital human template can be used as the parametric shape base for the reference face. This mitigates the impact of potentially large differences in the parametric shape bases of different faces on the calculation of the average face in a 3DMM.
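As a non-limiting illustration, the per-category average face can be computed as sketched below. The array layout `(n_faces, n_vertices, 3)`, the category labels, and the function name are assumptions made for illustration.

```python
import numpy as np

def category_mean_faces(faces, labels):
    """Return {category: mean face} for faces shaped (n_faces, n_vertices, 3)."""
    faces = np.asarray(faces, dtype=float)
    labels = np.asarray(labels)
    # Average only over the faces that belong to each category.
    return {str(lab): faces[labels == lab].mean(axis=0)
            for lab in np.unique(labels)}

# Hypothetical toy data: three faces with two vertices each.
faces = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[2.0, 0.0, 0.0], [3.0, 0.0, 0.0]],
    [[10.0, 0.0, 0.0], [10.0, 0.0, 0.0]],
])
labels = ["adult_female", "adult_female", "child"]
means = category_mean_faces(faces, labels)

# Pick the category closest to the digital human template (assumed here).
reference_face = means["adult_female"]
```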

[0108] In this embodiment of the disclosure, using the average face of the 3DMM as the parameterized shape basis of the reference face is more representative, which is beneficial for obtaining the overall shape basis features of the 3DMM based on the average face of the 3DMM.

[0109] In some embodiments, the target base includes a target shape base, a target texture base, and an expression base under the target digital human topology. This expression base refers to the original blendshapes under the target digital human topology.

[0110] Specifically, the shape basis of 3DMM is transferred to the target digital human topology as the target shape basis; the texture basis of 3DMM is transferred to the target digital human topology as the target texture basis. The target base thus includes not only the shape and texture bases from the parametric base of 3DMM, but also the expression base of the digital human template under the target digital human topology, which serves as the expression base of the target base. Compared with the expression base in 3DMM, the expression base of the digital human template under the target digital human topology generally has richer expression content, i.e., richer blendshapes.

[0111] In this embodiment of the disclosure, since the expression base of the target digital human itself is richer than that of the 3DMM, it can better capture the facial expressions of the real human. Therefore, adopting the expression base under the topology of the target digital human itself is beneficial to train the three-dimensional face reconstruction model based on the target base to obtain the facial expression capture model, and output the expression parameters that can more accurately reflect the facial expressions of the real human through the facial expression capture model.

[0112] In some embodiments, when the number of vertices in the shape basis of the 3DMM is less than the number of vertices in the target digital human topology, the shape basis parameters of the extra vertices in the target digital human topology, i.e., the vertices with no counterpart in the shape basis of the 3DMM, remain unchanged in the target basis.

[0113] The 3DMM covers only the visible facial features. The target digital human topology, in addition to the visible facial features, also contains parts that are invisible or do not affect expression, such as the eyes, tongue, mouth, ears, and teeth. Therefore, the number of vertices in the target digital human topology is generally greater than the number of vertices in the shape base of 3DMM. The extra vertices are the vertices of the aforementioned eyes, tongue, mouth, ears, and teeth.

[0114] In this embodiment of the disclosure, during the transfer of the shape basis of 3DMM to the target digital human topology, the extra vertices of the target digital human topology (those with no counterpart in the shape basis of 3DMM) do not participate in the calculations of the shape basis transfer method. This helps to avoid the extra vertices affecting the quality of the shape basis transfer, and also helps to avoid unnecessary changes to the shape basis parameters of the extra vertices.
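As a non-limiting illustration, leaving the extra vertices untouched can be sketched with an index list that selects only the vertices covered by the 3DMM shape basis. The function name, the index list `face_idx`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def apply_transfer(template_vertices, transferred, face_idx):
    """Write transferred values only at face_idx; all other rows stay as-is."""
    result = template_vertices.copy()
    result[face_idx] = transferred
    return result

# Hypothetical template with 5 vertices; only the first 3 are covered by the 3DMM.
template = np.arange(15, dtype=float).reshape(5, 3)
face_idx = np.array([0, 1, 2])
transferred = np.full((3, 3), -1.0)   # stand-in for the transferred shape basis
updated = apply_transfer(template, transferred, face_idx)
```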

[0115] In some embodiments, during the process of migrating the shape base of a 3DMM to the target digital human topology, it is necessary to keep the position of the target anchor point unchanged before and after the migration.

[0116] Here, the target anchor point refers to, for example, the back of the head or the neck point in the target digital human topology. Since no operations using the target anchor point are required during the transfer of the 3DMM shape base to the target digital human topology, it is necessary to keep the position of the target anchor point unchanged before and after the shape base transfer.

[0117] In this embodiment, keeping the position of the target anchor point unchanged before and after the shape basis of the 3DMM is transferred to the target digital human topology helps to reduce the computational burden during the shape basis transfer process to some extent. This also helps ensure that, apart from changes in the shape basis of the target digital human, the overall body structure of the target digital human remains unaffected, thus ensuring that the mesh shape of the target digital human topology obtained after shape transfer is not abnormal.

[0118] In some embodiments, the 3D face reconstruction coefficients include shape vectors, expression vectors, pose vectors, and texture vectors.

[0119] In 3D face reconstruction coefficients, shape vectors, expression vectors, pose vectors, and texture vectors are all indispensable. Pose vectors are used to adjust the pose of the target object when rendering the 3D face model to 2D space, i.e., the angle of the 3D face model. Each of these vectors has its own corresponding dimension. For example, a 3D face reconstruction model might output a 500-dimensional shape vector, a 150-dimensional expression vector, a 12-dimensional pose vector, and a 200-dimensional texture vector; the dimensions of these vectors are determined by the parametric basis of the 3DMM.
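As a non-limiting illustration, if the 3D face reconstruction model emits a single concatenated coefficient vector, it can be split by dimension as sketched below. The concatenation itself, the ordering of the parts, and the names used are assumptions made for illustration; only the example dimensions come from the paragraph above.

```python
import numpy as np

# Example dimensions from the text; the ordering is an assumption.
DIMS = {"shape": 500, "expression": 150, "pose": 12, "texture": 200}

def split_coefficients(coeffs, dims=DIMS):
    """Split a concatenated coefficient vector into named parts."""
    coeffs = np.asarray(coeffs)
    total = sum(dims.values())
    if coeffs.shape[-1] != total:
        raise ValueError(f"expected {total} coefficients, got {coeffs.shape[-1]}")
    parts, start = {}, 0
    for name, size in dims.items():
        parts[name] = coeffs[..., start:start + size]
        start += size
    return parts

parts = split_coefficients(np.arange(862))
```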

[0120] Specifically, the 3D face model of the target object is reconstructed under the target digital human topology from the 3D face reconstruction coefficients using the following equations (2) and (3):

[0121] $S = S_{mean} + c_i I_{base} + c_e E_{base}$ (2)

[0122] $T = T_{mean} + c_t T_{base}$ (3)

[0123] Here, $S_{mean}$ represents the average face shape in 3DMM, $T_{mean}$ represents the average face texture in 3DMM, $I_{base}$ represents the face shape after PCA dimensionality reduction, $T_{base}$ represents the face texture after PCA dimensionality reduction, $E_{base}$ represents the expression base of the digital human template under the target digital human topology, $c_i$ represents the shape vector, $c_t$ represents the texture vector, and $c_e$ represents the expression vector.
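As a non-limiting illustration, equations (2) and (3) amount to two linear combinations and can be sketched as follows. The flattened array layout (one row per basis component) and the toy values are assumptions made for illustration.

```python
import numpy as np

def reconstruct(S_mean, I_base, E_base, T_mean, T_base, c_i, c_e, c_t):
    """Evaluate equations (2) and (3) with row-wise bases.

    S_mean, T_mean: (n,) flattened mean shape / texture
    I_base, E_base, T_base: (k, n) with one basis component per row
    c_i, c_e, c_t: (k,) coefficient vectors
    """
    S = S_mean + c_i @ I_base + c_e @ E_base   # equation (2)
    T = T_mean + c_t @ T_base                  # equation (3)
    return S, T

# Hypothetical toy bases with n = 3.
S, T = reconstruct(
    S_mean=np.zeros(3),
    I_base=np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    E_base=np.array([[0.0, 0.0, 1.0]]),
    T_mean=np.ones(3),
    T_base=np.array([[1.0, 1.0, 1.0]]),
    c_i=np.array([2.0, 3.0]),
    c_e=np.array([4.0]),
    c_t=np.array([0.5]),
)
```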

[0124] In this embodiment of the disclosure, the three-dimensional face reconstruction coefficients include various types of vectors, which is beneficial for expressing the three-dimensional face from different angles, so as to make the reconstructed three-dimensional face more realistic.

[0125] In some embodiments, projecting a three-dimensional face model of a target object into a two-dimensional space to obtain a two-dimensional face image of the target object can be implemented as follows: applying differentiable rendering to project the three-dimensional face model of the target object into a two-dimensional space to obtain a two-dimensional face image of the target object.

[0126] The purpose of obtaining the two-dimensional face image of the target object is to calculate the loss between the two-dimensional face image and the sample image. The two-dimensional face image and the sample image are respectively input into any neural network model for emotion recognition, and the loss between the output vectors of the emotion recognition model corresponding to the two images is calculated as the perceptual expression loss.
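As a non-limiting illustration, once both images have been passed through an emotion recognition network, the perceptual expression loss can be computed from the two output vectors as sketched below. The mean squared error is an assumed choice of distance, since the disclosure does not fix one, and the feature vectors here are hypothetical stand-ins for real network outputs.

```python
import numpy as np

def perceptual_expression_loss(feat_rendered, feat_sample):
    """Mean squared difference between two emotion-feature vectors (assumed metric)."""
    a = np.asarray(feat_rendered, dtype=float)
    b = np.asarray(feat_sample, dtype=float)
    if a.shape != b.shape:
        raise ValueError("feature vectors must have the same shape")
    return float(np.mean((a - b) ** 2))

# Hypothetical outputs of an emotion recognition model for the two images.
loss = perceptual_expression_loss([0.0, 0.0], [1.0, 1.0])
```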

[0127] In this embodiment of the disclosure, differentiable rendering is used to project the 3D face model of the target object into a 2D space, which helps to ensure the accuracy of the obtained 2D face image of the target object. This facilitates the calculation of perceptual loss based on the 2D face image and the sample image, so as to adjust the parameters of the 3D face reconstruction model.

[0128] In some embodiments, before differentiable rendering, it is necessary to remove the invisible points from the 3D face model of the target object.

[0129] The purpose of differentiable rendering is to obtain the projection of the 3D face model of the target object into a 2D space, thus obtaining a 2D face image of the target object. This 2D face image is used to calculate the loss with the sample image, which is also a 2D image. Therefore, this process does not require the invisible points in the 3D face model of the target object, and these invisible points need to be removed.

[0130] In this embodiment of the disclosure, removing the invisible points in the 3D face model of the target object before differentiable rendering helps to avoid spending rendering computation on invisible points and increases rendering efficiency.
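As a non-limiting illustration, one simple way to discard invisible points is back-face culling: a vertex is kept only if its normal faces the camera. This criterion is an assumption made for illustration; it ignores occlusion by other geometry, and the disclosure does not specify how invisible points are identified.

```python
import numpy as np

def visible_mask(normals, view_dir):
    """True where the vertex normal faces the camera (back-face culling only)."""
    view_dir = np.asarray(view_dir, dtype=float)
    view_dir = view_dir / np.linalg.norm(view_dir)
    # A normal pointing against the viewing direction faces the camera.
    return np.asarray(normals, dtype=float) @ view_dir < 0.0

# Hypothetical vertices and normals; the camera looks along -z.
vertices = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0], [1.0, 0.0, 0.0]])
normals = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0], [1.0, 0.0, 0.0]])
mask = visible_mask(normals, view_dir=[0.0, 0.0, -1.0])
visible_vertices = vertices[mask]
```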

[0131] To facilitate a better understanding of the model training method in this disclosure, the overall process of the above model training method is explained below with reference to Figure 3. In this process, the sample image in the lower left corner is input into the 3D face reconstruction model. Then, the parametric shape basis and parametric texture basis of the 3DMM are transferred to the target digital human topology, and these are used as the target shape basis and target texture basis, respectively. Further, the target shape basis and target texture basis are combined with the expression basis of the digital human template under the target digital human topology, as well as the 3D face reconstruction coefficients output by the 3D face reconstruction model (for lighting, pose, etc.), to reconstruct the 3D face model under the target digital human topology, obtaining the 3D face model of the target object. Then, the 3D face model is differentiably rendered to obtain a 2D face image of the target object. Next, the loss between the 2D face image and the corresponding original sample image of the target object is calculated, and the parameters of the 3D face reconstruction model are adjusted based on this loss. Finally, a facial expression capture model is obtained based on the converged 3D face reconstruction model.

[0132] In summary, a facial expression capture model can be obtained based on the aforementioned model training method. This facial expression capture model is not only applicable to the target digital human topology, but also, because it is trained on the expression basis under the target digital human topology, it can be applied to digital humans with the same expression basis, i.e., blendshapes. Therefore, the facial expression capture model trained in this embodiment can be reused and replicated for digital humans with the same blendshapes.

[0133] Based on the same technical concept, this disclosure also provides a digital human driving method. Figure 4 is a flowchart of the digital human driving method in an embodiment of this disclosure, which includes:

[0134] S401, Obtain the source image.

[0135] S402, input the source image into the above facial expression capture model to obtain the expression coefficients output by the facial expression capture model.

[0136] S403, control the expression of the digital human to be driven based on the expression coefficients and the expression base of the digital human to be driven.

[0137] The expression base of the digital human to be driven is the same as the expression base under the target digital human topology.

[0138] In this embodiment, the source image is input into a facial expression capture model, and the expression of the digital human to be driven is controlled based on the expression coefficients output by the facial expression capture model and the expression base of the digital human to be driven. Using a trained facial expression capture model to obtain expression coefficients helps to make the obtained expression coefficients more accurate and reliable. Applying the expression coefficients to the expression base of the digital human to be driven, based on the rich expression base of the digital human itself, helps to make the control of the expression of the digital human to be driven more rigorous and detailed, thereby making the digital human to be driven more realistic in its reproduction of the facial expression changes of a real human.
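As a non-limiting illustration, applying expression coefficients to an expression base can be sketched as a weighted sum of blendshape offsets added to a neutral mesh. The delta-blendshape formulation, the array layout, and the toy values are assumptions made for illustration; the disclosure does not specify how the blendshapes are stored.

```python
import numpy as np

def drive_expression(neutral, blendshape_deltas, coeffs):
    """Return neutral + sum_k coeffs[k] * blendshape_deltas[k].

    neutral:           (n_vertices, 3) neutral mesh
    blendshape_deltas: (k, n_vertices, 3) per-blendshape vertex offsets
    coeffs:            (k,) expression coefficients from the capture model
    """
    coeffs = np.asarray(coeffs, dtype=float)
    # Contract the coefficient axis with the blendshape axis.
    return neutral + np.tensordot(coeffs, blendshape_deltas, axes=1)

# Hypothetical mesh with 2 vertices and 2 blendshapes.
neutral = np.zeros((2, 3))
deltas = np.stack([np.ones((2, 3)), 2.0 * np.ones((2, 3))])
driven = drive_expression(neutral, deltas, coeffs=[0.5, 0.25])
```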

[0139] Based on the same technical concept, this disclosure also provides a model training device 500. As shown in Figure 5, the device includes:

[0140] The first input module 501 is used to input the sample image into the three-dimensional face reconstruction model to obtain the three-dimensional face reconstruction coefficients of the target object in the sample image.

[0141] Reconstruction module 502 is used to reconstruct the 3D face model of the target object under the target digital human topology based on the 3D face reconstruction coefficients and the target basis of the target digital human topology; the target basis is obtained by transferring the parameterized basis of the 3D face deformation statistical model 3DMM to the target digital human topology;

[0142] The projection module 503 is used to project the three-dimensional face model of the target object into a two-dimensional space to obtain a two-dimensional face image of the target object;

[0143] The adjustment module 504 is used to adjust the parameters of the 3D face reconstruction model based on the loss between the sample image and the 2D face image, so as to end the training and obtain the facial expression capture model when the training convergence condition is met.

[0144] In some embodiments, it also includes:

[0145] The first acquisition module is used to acquire a reference face from multiple faces in 3DMM;

[0146] The registration module is used to register the parameterized shape base of the reference face with the digital human template under the target digital human topology to obtain the reference shape base of the reference face under the target digital human topology.

[0147] The migration module is used to migrate the shape base and texture base in the parametric base of 3DMM to the target digital human topology based on the reference shape base using the shape migration method, so as to construct the target base.

[0148] In some embodiments, the parameterized shape basis of the reference face is the average face of a 3DMM.

[0149] In some embodiments, the migration module includes:

[0150] A determination unit is used to determine multiple target vertices corresponding to multiple reference vertices in a digital human template based on multiple reference vertices in a reference shape base.

[0151] The migration unit is used to migrate the shape base and texture base of 3DMM to the target digital human topology based on multiple target vertices using a shape migration method.
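As a non-limiting illustration, the determination of target vertices can be sketched as a nearest-point search, in which each reference vertex is matched to the closest point of the digital human template. The brute-force search, the function name, and the toy coordinates are assumptions made for illustration.

```python
import numpy as np

def nearest_target_vertices(reference_vertices, template_vertices):
    """For each reference vertex, return the index and position of the closest template vertex."""
    ref = np.asarray(reference_vertices, dtype=float)
    tmpl = np.asarray(template_vertices, dtype=float)
    # Pairwise Euclidean distances: (n_reference, n_template)
    dist = np.linalg.norm(ref[:, None, :] - tmpl[None, :, :], axis=-1)
    idx = dist.argmin(axis=1)
    return idx, tmpl[idx]

# Hypothetical reference shape base vertices and template vertices.
reference = np.array([[0.1, 0.0, 0.0], [0.9, 1.0, 0.0]])
template = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [5.0, 5.0, 5.0]])
idx, targets = nearest_target_vertices(reference, template)
```

For large meshes a spatial index would normally replace the dense distance matrix; the dense form is used here only for brevity.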

[0152] In some embodiments, the migration unit is specifically used for:

[0153] For the vertices to be transferred in the shape base of a 3DMM, perform the following operations:

[0154] Based on the position information of the multiple target vertices in the target digital human topology, the distances between the multiple target vertices are determined to obtain the first distance matrix;

[0155] Based on the shape basis parameter matrix of the multiple target vertices under the target digital human topology and the first distance matrix, the first weight matrix is calculated;

[0156] The distances between the vertex to be migrated in the shape basis and the multiple target vertices are determined respectively to obtain the second distance matrix;

[0157] Based on the second distance matrix and the first weight matrix, the shape basis parameters of the vertices to be migrated under the target digital human topology are determined.
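As a non-limiting illustration, the four operations above can be sketched as a closed-form RBF interpolation. The linear kernel (the distance itself is used as the basis function), the use of a direct linear solve, and all names are assumptions made for illustration; the kernel actually used may differ.

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distances between rows of a (p, d) and rows of b (q, d): (p, q)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def rbf_transfer(target_vertices, target_params, query_vertices):
    """Interpolate basis parameters at query_vertices from the target vertices."""
    d1 = pairwise_distances(target_vertices, target_vertices)  # first distance matrix
    w = np.linalg.solve(d1, target_params)                     # first weight matrix
    d2 = pairwise_distances(query_vertices, target_vertices)   # second distance matrix
    return d2 @ w                                              # parameters of the vertices to migrate

# Hypothetical target vertices and their shape basis parameters.
tv = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
tp = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
query = np.array([[0.5, 0.5, 0.0]])
migrated = rbf_transfer(tv, tp, query)
```

With distinct target vertices the first distance matrix is invertible, so evaluating the interpolant at the target vertices themselves reproduces their parameters exactly. The same structure applies to the texture basis with the second weight matrix and the third distance matrix.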

[0158] In some embodiments, the migration unit is specifically used for:

[0159] For the vertices to be transferred in the texture base of a 3DMM, perform the following operations:

[0160] Based on the position information of the multiple target vertices in the target digital human topology, the distances between the multiple target vertices are determined to obtain the first distance matrix;

[0161] Based on the texture basis parameter matrix of the multiple target vertices under the target digital human topology and the first distance matrix, the second weight matrix is calculated;

[0162] The distances between the vertex to be transferred in the texture base and multiple target vertices are determined to obtain the third distance matrix;

[0163] Based on the third distance matrix and the second weight matrix, the texture base parameters of the vertices to be migrated in the target digital human topology are determined.

[0164] In some embodiments, the migration unit is specifically used for:

[0165] Construct shape-based transfer models and texture-based transfer models based on multiple target vertices;

[0166] The shape basis of the 3DMM is input into the shape basis transfer model to transfer the shape basis of the 3DMM to the target digital human topology; and,

[0167] The texture base of 3DMM is input into the texture base transfer model to transfer the texture base of 3DMM to the target digital human topology.

[0168] In some embodiments, the target base includes a target shape base, a target texture base, and an expression base under the target digital human topology;

[0169] Wherein, the shape basis of 3DMM is transferred to the topology of the target digital human and serves as the target shape basis.

[0170] The texture base of 3DMM is transferred to the topology of the target digital human and used as the target texture base.

[0171] In some embodiments, when the number of vertices in the shape basis of the 3DMM is less than the number of vertices in the target digital human topology, the shape basis parameters of the extra vertices in the target digital human topology, i.e., the vertices with no counterpart in the shape basis of the 3DMM, remain unchanged in the target basis.

[0172] In some embodiments, it also includes:

[0173] The retention module is used to maintain the position of the target anchor point before and after the migration of the 3DMM shape base to the target digital human topology.

[0174] In some embodiments, the 3D face reconstruction coefficients include shape vectors, expression vectors, pose vectors, and texture vectors.

[0175] In some embodiments, the projection module is specifically used to project a three-dimensional face model of the target object into a two-dimensional space using differentiable rendering to obtain a two-dimensional face image of the target object.

[0176] In some embodiments, it also includes:

[0177] The culling module is used to remove invisible points from the 3D face model of the target object before differentiable rendering.

[0178] Based on the same technical concept, this disclosure also provides a digital human driving device 600, applied to a facial expression capture model obtained based on the aforementioned device. As shown in Figure 6, the device includes:

[0179] The second acquisition module 601 is used to acquire the source image;

[0180] The second input module 602 is used to input the source image into the facial expression capture model to obtain the expression coefficients output by the facial expression capture model;

[0181] Control module 603 is used to control the expressions of the digital human to be driven based on expression coefficients and the expression base of the digital human to be driven;

[0182] The expression base of the digital human to be driven is the same as the expression base under the target digital human topology.

[0183] The specific functions and examples of each unit and subunit of the apparatus in this disclosure embodiment can be found in the relevant descriptions of the corresponding steps in the above method embodiments, and will not be repeated here.

[0184] According to embodiments of this disclosure, this disclosure also provides an electronic device, a readable storage medium, and a computer program product.

[0185] Figure 7 shows a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present disclosure described and / or claimed herein.

[0186] As shown in Figure 7, device 700 includes a computing unit 701, which can perform various appropriate actions and processes based on a computer program stored in read-only memory (ROM) 702 or a computer program loaded from storage unit 708 into random access memory (RAM) 703. RAM 703 may also store various programs and data required for the operation of device 700. The computing unit 701, ROM 702, and RAM 703 are interconnected via bus 704. Input / output (I / O) interface 705 is also connected to bus 704.

[0187] Multiple components in device 700 are connected to I / O interface 705, including: input unit 706, such as keyboard, mouse, etc.; output unit 707, such as various types of monitors, speakers, etc.; storage unit 708, such as disk, optical disk, etc.; and communication unit 709, such as network card, modem, wireless transceiver, etc. Communication unit 709 allows device 700 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.

[0188] The computing unit 701 can be various general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the various methods and processes described above, such as model training methods and digital human driving methods. For example, in some embodiments, the model training methods and digital human driving methods can be implemented as computer software programs tangibly contained in a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program can be loaded and / or installed on device 700 via ROM 702 and / or communication unit 709. When the computer program is loaded into RAM 703 and executed by the computing unit 701, one or more steps of the model training methods and digital human driving methods described above can be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform model training methods or digital human driving methods by any other suitable means (e.g., by means of firmware).

[0189] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0190] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0191] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0192] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0193] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with embodiments of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

[0194] Computer systems can include clients and servers. Clients and servers are generally located far apart and typically interact via communication networks. Client-server relationships are created by computer programs running on the respective computers and having a client-server relationship with each other. Servers can be cloud servers, servers in distributed systems, or servers incorporating blockchain technology.

[0195] It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution disclosed in this disclosure can be achieved, and this is not limited herein.

[0196] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the principles of this disclosure should be included within the scope of protection of this disclosure.

Claims

1. A model training method, comprising: inputting a sample image into a three-dimensional face reconstruction model to obtain three-dimensional face reconstruction coefficients of a target object in the sample image; reconstructing a three-dimensional face model of the target object under a target digital human topology based on the three-dimensional face reconstruction coefficients and a target basis of the target digital human topology, wherein the target basis is obtained by transferring a parameterized basis of a three-dimensional morphable model (3DMM) to the target digital human topology; projecting the three-dimensional face model of the target object into a two-dimensional space to obtain a two-dimensional face image of the target object; and adjusting parameters of the three-dimensional face reconstruction model based on a loss between the sample image and the two-dimensional face image, and ending the training when a training convergence condition is met, to obtain a facial expression capture model; the method further comprising: obtaining a reference face from multiple faces in the 3DMM; registering a parameterized shape basis of the reference face with a digital human template under the target digital human topology to obtain a reference shape basis of the reference face under the target digital human topology, wherein the reference shape basis is the representation of the reference face under the target digital human topology and is obtained by changing the coordinates of points in the point cloud of the parameterized shape basis of the reference face; determining multiple points or all points in the reference shape basis as multiple reference vertices, and taking, for each reference vertex, the point in the digital human template closest to that reference vertex as a target vertex, to obtain multiple target vertices; and transferring, based on the multiple target vertices, the shape basis and the texture basis of the 3DMM to the target digital human topology using a shape transfer method; wherein the target basis of the target digital human topology includes a target shape basis, a target texture basis, and an expression basis under the target digital human topology; the shape basis of the 3DMM, once transferred to the target digital human topology, serves as the target shape basis; and the texture basis of the 3DMM, once transferred to the target digital human topology, serves as the target texture basis.
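
As an illustration of the target-vertex determination recited in claim 1, the following is a minimal sketch, assuming the reference vertices and the digital human template are available as NumPy arrays of 3D coordinates. The function name `nearest_target_vertices` and the brute-force distance search are assumptions made for illustration and are not taken from the disclosure.

```python
import numpy as np


def nearest_target_vertices(reference_vertices, template_vertices):
    """Return, for each reference vertex, the index of the closest template vertex.

    reference_vertices: (R, 3) array, points of the registered reference shape basis.
    template_vertices:  (T, 3) array, points of the digital human template.
    """
    # Pairwise squared Euclidean distances, shape (R, T).
    diff = reference_vertices[:, None, :] - template_vertices[None, :, :]
    sq_dist = np.einsum("ijk,ijk->ij", diff, diff)
    # The closest template point to each reference vertex is its target vertex.
    return np.argmin(sq_dist, axis=1)
```

For large meshes a spatial index would normally replace the dense distance matrix; the dense form is used here only to keep the sketch short.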

2. The method according to claim 1, wherein the parameterized shape basis of the reference face is the average face of the 3DMM.

3. The method according to claim 1, wherein transferring the shape basis of the 3DMM to the target digital human topology based on the multiple target vertices using the shape transfer method comprises performing the following operations for each vertex to be transferred in the shape basis of the 3DMM: determining the distances between the multiple target vertices based on position information of the multiple target vertices in the target digital human topology, to obtain a first distance matrix; calculating a first weight matrix based on the first distance matrix and a shape basis parameter matrix of the multiple target vertices under the target digital human topology; determining the distances between the vertex to be transferred in the shape basis and each of the multiple target vertices, to obtain a second distance matrix; and determining the shape basis parameters of the vertex to be transferred in the target digital human topology based on the second distance matrix and the first weight matrix.
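
The four operations of claim 3 follow the usual pattern of radial basis function (RBF) interpolation, which can be sketched as below. This is a minimal sketch under stated assumptions: the Gaussian kernel, the `epsilon` width parameter, and all function names are illustrative choices, since the claim fixes neither the kernel nor how the weight matrix is solved.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distances between rows of a (M, 3) and rows of b (K, 3)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt(np.einsum("ijk,ijk->ij", diff, diff))


def gaussian_kernel(dist, epsilon=1.0):
    """Assumed kernel; any other radial basis function could be substituted."""
    return np.exp(-(epsilon * dist) ** 2)


def rbf_transfer(target_positions, target_params, query_positions, epsilon=1.0):
    """Interpolate basis parameters from K target vertices to M query vertices.

    target_positions: (K, 3) positions of the target vertices.
    target_params:    (K, C) basis parameter matrix known at the target vertices.
    query_positions:  (M, 3) positions of the vertices to be transferred.
    """
    # First distance matrix: distances between the target vertices, (K, K).
    d1 = pairwise_distances(target_positions, target_positions)
    # Weight matrix: solve kernel(d1) @ weights = target_params, (K, C).
    weights = np.linalg.solve(gaussian_kernel(d1, epsilon), target_params)
    # Second distance matrix: query vertices to target vertices, (M, K).
    d2 = pairwise_distances(query_positions, target_positions)
    # Basis parameters of the query vertices, (M, C).
    return gaussian_kernel(d2, epsilon) @ weights
```

Evaluating the interpolant at the target vertices themselves returns their known parameters, which is a convenient sanity check. The texture transfer of claim 4 has the same structure, with the texture basis parameter matrix passed as `target_params`.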

4. The method according to claim 1, wherein transferring the texture basis of the 3DMM to the target digital human topology based on the multiple target vertices using the shape transfer method comprises performing the following operations for each vertex to be transferred in the texture basis of the 3DMM: determining the distances between the multiple target vertices based on position information of the multiple target vertices in the target digital human topology, to obtain a first distance matrix; calculating a second weight matrix based on the first distance matrix and a texture basis parameter matrix of the multiple target vertices under the target digital human topology; determining the distances between the vertex to be transferred in the texture basis and each of the multiple target vertices, to obtain a third distance matrix; and determining the texture basis parameters of the vertex to be transferred in the target digital human topology based on the third distance matrix and the second weight matrix.

5. The method according to claim 1, wherein transferring the shape basis and the texture basis of the 3DMM to the target digital human topology based on the multiple target vertices using the shape transfer method comprises: constructing a shape basis transfer model and a texture basis transfer model based on the multiple target vertices; inputting the shape basis of the 3DMM into the shape basis transfer model to transfer the shape basis of the 3DMM to the target digital human topology; and inputting the texture basis of the 3DMM into the texture basis transfer model to transfer the texture basis of the 3DMM to the target digital human topology.

6. The method according to claim 1, wherein, when the number of vertices in the shape basis of the 3DMM is less than the number of vertices in the target digital human topology, the shape basis parameters of the vertices of the target digital human topology in excess of those in the shape basis of the 3DMM remain unchanged in the target basis.

7. The method according to claim 1, further comprising: during the transfer of the shape basis of the 3DMM to the target digital human topology, keeping the position of a target anchor point unchanged before and after the transfer.

8. The method according to any one of claims 1-7, wherein the three-dimensional face reconstruction coefficients include shape vectors, expression vectors, pose vectors, and texture vectors.
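
To show how coefficients of the kind listed in claim 8 are conventionally combined with a basis, the following sketch uses the standard linear 3DMM formulation (mean shape plus weighted shape and expression bases, followed by a rigid pose). The claims do not spell out this formula, so the linear form, the representation of pose as a rotation matrix and a translation, and all names are assumptions; texture is omitted for brevity.

```python
import numpy as np


def reconstruct_vertices(mean_shape, shape_basis, expr_basis,
                         shape_vec, expr_vec, rotation, translation):
    """Reconstruct posed mesh vertices from reconstruction coefficients.

    mean_shape:  (N, 3) mean face under the target topology.
    shape_basis: (3N, Ks) target shape basis; expr_basis: (3N, Ke) expression basis.
    shape_vec:   (Ks,) shape coefficients; expr_vec: (Ke,) expression coefficients.
    rotation:    (3, 3) rotation matrix; translation: (3,) offset (assumed pose form).
    """
    # Linear combination of the bases around the mean face.
    flat = mean_shape.reshape(-1) + shape_basis @ shape_vec + expr_basis @ expr_vec
    vertices = flat.reshape(-1, 3)
    # Apply the rigid pose to every vertex (row vectors, hence rotation.T).
    return vertices @ rotation.T + translation
```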

9. The method according to any one of claims 1-7, wherein projecting the three-dimensional face model of the target object into the two-dimensional space to obtain the two-dimensional face image of the target object comprises: projecting the three-dimensional face model of the target object into the two-dimensional space using differentiable rendering to obtain the two-dimensional face image of the target object.
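
The geometric core of the projection in claim 9 can be sketched as a pinhole perspective projection of mesh vertices onto the image plane. This is only an illustrative sketch: it is not a differentiable renderer (which would also rasterize and shade the mesh), and the pinhole camera model, `focal_length`, and `principal_point` are assumptions not stated in the claims.

```python
import numpy as np


def project_vertices(vertices, focal_length, principal_point):
    """Project (N, 3) camera-space vertices to (N, 2) pixel coordinates.

    Assumes a pinhole camera looking along +z with all vertices at z > 0.
    """
    x, y, z = vertices[:, 0], vertices[:, 1], vertices[:, 2]
    # Perspective divide, then shift to the image center.
    u = focal_length * x / z + principal_point[0]
    v = focal_length * y / z + principal_point[1]
    return np.stack([u, v], axis=1)
```

Because each output coordinate is a smooth function of the vertex positions, gradients of an image-space loss can flow back to the reconstruction coefficients, which is the property that differentiable rendering relies on.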

10. The method according to claim 9, further comprising: removing invisible points from the three-dimensional face model of the target object before the differentiable rendering.
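
Claim 10 does not say how the points to be removed are identified. One common heuristic, shown here purely as an assumption, treats a vertex as not visible when its normal points away from the camera. The function name and the use of per-vertex normals are hypothetical, and occlusion by other parts of the mesh is not handled.

```python
import numpy as np


def visible_vertex_mask(vertices, normals, camera_position):
    """Return a boolean mask that is True for vertices facing the camera.

    vertices, normals: (N, 3) arrays; camera_position: (3,) array.
    """
    # Direction from each vertex toward the camera.
    view_dirs = camera_position[None, :] - vertices
    # A positive dot product means the surface at that vertex faces the camera.
    return np.einsum("ij,ij->i", normals, view_dirs) > 0.0
```

Vertices where the mask is False would be dropped before the mesh is passed to the renderer.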

11. A digital human driving method, applied to a facial expression capture model obtained by the method according to any one of claims 1-10, comprising: acquiring a source image; inputting the source image into the facial expression capture model to obtain expression coefficients output by the facial expression capture model; and controlling the expression of a digital human to be driven based on the expression coefficients and an expression basis of the digital human to be driven, wherein the expression basis of the digital human to be driven has the same topology as the target digital human topology.
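
The control step of claim 11 can be illustrated with a conventional blendshape combination, in which each expression coefficient weights a per-vertex offset from the neutral face. This is a sketch under assumptions: the claim does not state the combination rule, and the offset (delta) representation of the expression basis and the function name are illustrative.

```python
import numpy as np


def drive_expression(neutral_vertices, expression_basis, expression_coeffs):
    """Deform a neutral mesh with weighted expression offsets.

    neutral_vertices:  (N, 3) neutral face of the digital human to be driven.
    expression_basis:  (K, N, 3) per-expression vertex offsets (assumed delta form).
    expression_coeffs: (K,) coefficients output by the capture model.
    """
    # Weighted sum over the K expression offsets, giving an (N, 3) displacement.
    displacement = np.tensordot(expression_coeffs, expression_basis, axes=1)
    return neutral_vertices + displacement
```

Because the basis and the driven mesh share one topology, the offsets apply vertex by vertex without any retargeting step.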

12. A model training device, comprising: a first input module, configured to input a sample image into a three-dimensional face reconstruction model to obtain three-dimensional face reconstruction coefficients of a target object in the sample image; a reconstruction module, configured to reconstruct a three-dimensional face model of the target object under a target digital human topology based on the three-dimensional face reconstruction coefficients and a target basis of the target digital human topology, wherein the target basis is obtained by transferring a parameterized basis of a three-dimensional morphable model (3DMM) to the target digital human topology; a projection module, configured to project the three-dimensional face model of the target object into a two-dimensional space to obtain a two-dimensional face image of the target object; and an adjustment module, configured to adjust parameters of the three-dimensional face reconstruction model based on a loss between the sample image and the two-dimensional face image, so that the training ends when a training convergence condition is met and a facial expression capture model is obtained; the device further comprising: a first acquisition module, configured to acquire a reference face from multiple faces in the 3DMM; a registration module, configured to register a parameterized shape basis of the reference face with a digital human template under the target digital human topology to obtain a reference shape basis of the reference face under the target digital human topology, wherein the reference shape basis is the representation of the reference face under the target digital human topology and is obtained by changing the coordinates of points in the point cloud of the parameterized shape basis of the reference face; and a migration module, including: a determining unit, configured to determine multiple points or all points in the reference shape basis as multiple reference vertices, and to take, for each reference vertex, the point in the digital human template closest to that reference vertex as a target vertex, to obtain multiple target vertices; and a migration unit, configured to transfer, based on the multiple target vertices, the shape basis and the texture basis of the 3DMM to the target digital human topology using a shape transfer method; wherein the target basis of the target digital human topology includes a target shape basis, a target texture basis, and an expression basis under the target digital human topology; the shape basis of the 3DMM, once transferred to the target digital human topology, serves as the target shape basis; and the texture basis of the 3DMM, once transferred to the target digital human topology, serves as the target texture basis.

13. The device according to claim 12, wherein the parameterized shape basis of the reference face is the average face of the 3DMM.

14. The device according to claim 12, wherein the migration unit is specifically configured to perform the following operations for each vertex to be transferred in the shape basis of the 3DMM: determine the distances between the multiple target vertices based on position information of the multiple target vertices in the target digital human topology, to obtain a first distance matrix; calculate a first weight matrix based on the first distance matrix and a shape basis parameter matrix of the multiple target vertices under the target digital human topology; determine the distances between the vertex to be transferred in the shape basis and each of the multiple target vertices, to obtain a second distance matrix; and determine the shape basis parameters of the vertex to be transferred in the target digital human topology based on the second distance matrix and the first weight matrix.

15. The device according to claim 12, wherein the migration unit is specifically configured to perform the following operations for each vertex to be transferred in the texture basis of the 3DMM: determine the distances between the multiple target vertices based on position information of the multiple target vertices in the target digital human topology, to obtain a first distance matrix; calculate a second weight matrix based on the first distance matrix and a texture basis parameter matrix of the multiple target vertices under the target digital human topology; determine the distances between the vertex to be transferred in the texture basis and each of the multiple target vertices, to obtain a third distance matrix; and determine the texture basis parameters of the vertex to be transferred in the target digital human topology based on the third distance matrix and the second weight matrix.

16. The device according to claim 12, wherein the migration unit is specifically configured to: construct a shape basis transfer model and a texture basis transfer model based on the multiple target vertices; input the shape basis of the 3DMM into the shape basis transfer model to transfer the shape basis of the 3DMM to the target digital human topology; and input the texture basis of the 3DMM into the texture basis transfer model to transfer the texture basis of the 3DMM to the target digital human topology.

17. The device according to claim 12, wherein, when the number of vertices in the shape basis of the 3DMM is less than the number of vertices in the target digital human topology, the shape basis parameters of the vertices of the target digital human topology in excess of those in the shape basis of the 3DMM remain unchanged in the target basis.

18. The device according to claim 12, further comprising: a retention module, configured to keep the position of a target anchor point unchanged before and after the transfer during the transfer of the shape basis of the 3DMM to the target digital human topology.

19. The device according to any one of claims 12-18, wherein the three-dimensional face reconstruction coefficients include shape vectors, expression vectors, pose vectors, and texture vectors.

20. The device according to any one of claims 12-18, wherein the projection module is specifically configured to project the three-dimensional face model of the target object into the two-dimensional space using differentiable rendering to obtain the two-dimensional face image of the target object.

21. The device according to claim 20, further comprising: a removal module, configured to remove invisible points from the three-dimensional face model of the target object before the differentiable rendering.

22. A digital human driving device, applied to a facial expression capture model obtained by the device according to any one of claims 12-21, comprising: a second acquisition module, configured to acquire a source image; a second input module, configured to input the source image into the facial expression capture model to obtain expression coefficients output by the facial expression capture model; and a control module, configured to control the expression of a digital human to be driven based on the expression coefficients and an expression basis of the digital human to be driven, wherein the expression basis of the digital human to be driven has the same topology as the target digital human topology.

23. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-11.

24. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-11.

25. A computer program product comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1-11.