A method and system for fusing facial, jaw, and dental three-dimensional images

By fusing intraoral scan data, cone-beam CT data, and 3D facial images captured under different facial expressions with a multimodal registration algorithm, the invention addresses the lack of tooth-face integration in existing orthodontic solutions, producing an integrated visual model of teeth and face and improving orthodontic planning.

CN122244334A · Pending · Publication Date: 2026-06-19 · SICHUAN UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: SICHUAN UNIV
Filing Date: 2026-05-20
Publication Date: 2026-06-19

Abstract

This invention discloses a method and system for fusing three-dimensional images of the face, jawbone, and teeth, relating to the fields of digital oral medicine and orthodontics. The method includes: obtaining a tooth surface model and a jawbone structure model based on intraoral scan data and cone-beam CT data, respectively; registering the tooth surface model and the jawbone structure model to obtain a tooth-jawbone three-dimensional model; obtaining a registration coordinate system based on the tooth-jawbone three-dimensional model and resting lip-closed three-dimensional facial image data; registering the resting lip-closed three-dimensional facial image data and a maximum occlusal smile three-dimensional facial image data based on the registration coordinate system to obtain a second spatial transformation matrix; fusing the tooth-jawbone three-dimensional model and the maximum occlusal smile three-dimensional facial image data based on the second spatial transformation matrix to obtain a tooth-face model; and obtaining orthodontic data based on the tooth-face model. This addresses the problem that existing orthodontic three-dimensional model fusion methods lack integration of facial aesthetic background, resulting in a lack of integrated tooth-face visualization functionality.

Description

Technical Field

[0001] This invention relates to the field of digital oral medicine and orthodontics, specifically to a method and system for fusing three-dimensional images of the face, jawbone, and teeth.

Background Technology

[0002] Invisible aligner technology, as a modern orthodontic technique, relies on high-precision three-dimensional digital models for accurate orthodontic planning. Currently, this field commonly uses a fusion of crown surface data acquired through intraoral scanning (IOS) and jawbone and root data acquired through cone-beam computed tomography (CBCT) to construct a digital model containing root-bone information. By replacing the distorted crown portions in CBCT images with high-precision IOS crown data, this model can simultaneously and accurately present the three-dimensional morphology of the jawbone, alveolar bone, root, and crown, providing crucial skeletal anatomical references for the final position of tooth movement.

[0003] However, existing orthodontic 3D planning schemes centered on root-bone models have a significant drawback: the lack of integration with facial aesthetic context. Many key aesthetic attributes of teeth, such as midline harmony, smile line curvature, tooth exposure, and gingival aesthetics, can only be accurately defined, evaluated, and optimized within the context of the patient's personalized 3D facial contours. This means that current mainstream digital orthodontic schemes are essentially biomechanical planning confined to the oral cavity, unable to integrate tooth and jawbone models with the patient's 3D facial morphology, lacking integrated tooth-face visualization capabilities. In other words, when planning the final tooth position, it is difficult to intuitively assess and precisely adjust the spatial matching relationship between the dentition and facial soft tissue contours in 3D space, limiting the formulation of orthodontic solutions to the intraoral environment and failing to ensure overall harmony with the external facial morphology.

Summary of the Invention

[0004] To address the problem that existing orthodontic 3D model fusion methods lack integration of facial aesthetic backgrounds, resulting in a lack of integrated teeth-face visualization, this invention provides a fusion method for facial, jawbone, and teeth 3D images, the method comprising:

[0005] Acquire intraoral scan data, cone-beam CT data, and resting closed-lip three-dimensional facial images and maximum occlusal tooth-exposing smile three-dimensional facial images of the target object;

[0006] Based on the intraoral scan data and the cone-beam CT data, a tooth surface model and a jawbone structure model are obtained, respectively;

[0007] The tooth surface model and the jawbone structure model are registered to obtain a three-dimensional tooth-jawbone model;

[0008] The facial soft tissue surface data reconstructed by cone-beam CT in the three-dimensional model of the teeth and jawbone are registered with the resting lip-closed three-dimensional facial image data to obtain the first spatial transformation matrix; based on the first spatial transformation matrix, the resting lip-closed three-dimensional facial image data are transformed and aligned to the coordinate system of the three-dimensional model of the teeth and jawbone to obtain the registration coordinate system;

[0009] In the registration coordinate system, the transformed and aligned three-dimensional facial image data of the resting lip closed face is registered with the three-dimensional facial image data of the maximum occlusal smile, and the second spatial transformation matrix is obtained. Based on the second spatial transformation matrix, the three-dimensional facial image data of the maximum occlusal smile is transformed and aligned to the three-dimensional facial image data of the resting lip closed face, so that the three-dimensional facial image data of the maximum occlusal smile is transformed and aligned to the registration coordinate system.

[0010] Based on the registration coordinate system, the tooth-jawbone three-dimensional model is fused with the aligned maximum occlusal tooth-showing smile three-dimensional facial image data to obtain a tooth-face model;

[0011] Orthodontic data are obtained based on the aforementioned tooth-face model.

[0012] The core technical challenge in achieving fusion of 3D images of teeth and face lies in how to accurately anatomically register a 3D dental arch model in occlusion with 3D facial scan data. Currently, the two commonly acquired 3D facial scans fail to meet this requirement: one is a facial scan in a resting state, where the lips are closed and the teeth are almost invisible, making it impossible to establish a correspondence with the dental arch model; the other is a facial scan in a typical smiling state, where although the teeth are visible, the upper and lower jaws are usually in a non-occlusal contact state, resulting in a fundamental difference between the dental arch morphology and the dental arch model reconstructed from IOS data acquired in an occlusal state, leading to insufficient accuracy in direct fusion.

[0013] To address the aforementioned issues, this invention introduces a specific facial expression state: the maximum occlusal tooth-showing smile. This expression requires the user to keep the upper and lower teeth in maximum close contact in centric occlusion while voluntarily contracting the levator labii superioris and other perioral muscles to the maximum extent. This exposes the maximum range of teeth and gingiva under a stable and repeatable jaw relationship. Because the teeth are in close occlusion in this state, the dentition matches the user's dentition state during IOS and CBCT data acquisition, so the expression provides a unique and reliable common anatomical benchmark for accurately registering the dental model with the 3D facial scan. It can thus serve as a key medium for incorporating facial appearance context into digital orthodontic solutions.

[0014] This method introduces 3D facial images of a maximum occlusal tooth-showing smile into digital orthodontic modeling. Using four types of 3D image data from the same user, namely intraoral scan (IOS), cone-beam computed tomography (CBCT), a resting lip-closed 3D facial image (Face0), and a maximum occlusal tooth-showing smile 3D facial image (Face1), it proposes a three-step IOS-CBCT-Face0-Face1 fusion strategy in which a staged multimodal registration algorithm fuses these data. The first step fuses IOS and CBCT to obtain a 3D digital model including the crown and jawbone anatomy. The second step fuses the 3D digital model obtained in the first step with the resting lip-closed 3D facial image, placing the user's static closed-mouth facial appearance and the tooth-skeleton model in the same coordinate system. The third step, within that coordinate system, first aligns the maximum occlusal tooth-showing smile 3D facial image with the resting lip-closed 3D facial image, and then fuses the aligned maximum occlusal tooth-showing smile 3D facial image with the 3D digital model obtained in the first step to obtain the tooth-face model. This integrated visualization model fuses hard dental tissue data with facial soft tissue features, so that facial morphological elements are integrated into the digital orthodontic model and teeth and face can be visualized together in three dimensions during orthodontic planning. In the second and third steps, the resting lip-closed 3D facial image serves as an intermediary, which avoids directly registering the CBCT image with the maximum occlusal tooth-showing smile 3D facial image. This supports the stability and reliability of the registration process and addresses the fact that the soft tissue state in a maximum occlusal tooth-showing smile differs significantly from that in the CBCT image (at rest or with a slightly closed mouth), which precludes direct registration. Furthermore, the three-step fusion process is automated, reducing manual operations and errors and improving efficiency and accuracy. This addresses the problem that current methods for fusing maximum occlusal tooth-showing smile images with three-dimensional dental models rely on tedious and time-consuming manual operations that align multi-source data step by step, resulting in low efficiency and accuracy and making them unsuitable for large-scale applications.

[0015] Furthermore, the specific steps for obtaining the three-dimensional model of the tooth-jawbone include:

[0016] The first preset surface of the tooth surface model and the second preset surface of the jawbone structure model are spatially aligned.

[0017] The first crown data of the jawbone structure model is updated with the second crown data of the tooth surface model to obtain the three-dimensional tooth-jawbone model.

[0018] The high-precision IOS tooth surface is registered with the reconstructed tooth / jawbone surface in CBCT, so that the crown data of the oral scan can accurately replace the distorted crown part in CBCT. This results in a high-precision tooth-skeleton model with crown and jawbone anatomy in a unified coordinate system. This process ensures the accuracy of the dental arch model and provides a stable skeletal benchmark for subsequent fusion with facial data.
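The crown replacement described above can be sketched as follows. This is a minimal illustration, assuming both surfaces are available as point arrays, that the CBCT points carry a per-point crown flag from segmentation, and that the rigid transform from IOS to CBCT coordinates has already been estimated. The names `apply_rigid`, `replace_crowns`, and `T_ios_to_cbct` are hypothetical and do not come from the patent text.

```python
import numpy as np

def apply_rigid(T, pts):
    """Apply a 4x4 homogeneous rigid transform to an (N, 3) point array."""
    return pts @ T[:3, :3].T + T[:3, 3]

def replace_crowns(cbct_pts, cbct_is_crown, ios_crown_pts, T_ios_to_cbct):
    """Drop the CBCT crown points and insert the aligned IOS crown points.

    cbct_is_crown is a boolean mask (assumed output of the segmentation).
    """
    kept = cbct_pts[~cbct_is_crown]                      # roots and bone stay
    ios_in_cbct = apply_rigid(T_ios_to_cbct, ios_crown_pts)
    return np.vstack([kept, ios_in_cbct])

# Toy data: 3 CBCT points (1 crown), 2 IOS crown points offset by 10 mm in x.
cbct = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -5.0], [0.0, 0.0, 5.0]])
is_crown = np.array([False, False, True])
ios = np.array([[-10.0, 0.0, 5.0], [-10.0, 1.0, 5.0]])
T = np.eye(4)
T[:3, 3] = [10.0, 0.0, 0.0]                              # assumed registration result

fused = replace_crowns(cbct, is_crown, ios, T)
```

In this toy case the two non-crown CBCT points are kept and the two IOS crown points land in the CBCT coordinate system, so the fused set has four points.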

[0019] Furthermore, the specific steps for obtaining the jawbone structure model include:

[0020] Obtain the coarse segmentation model and the fine segmentation model;

[0021] The cone-beam CT data is input into the coarse segmentation model to obtain initial segmentation data; based on the region of interest, the tooth-bone sub-region and facial soft tissue surface data are extracted from the initial segmentation data; the tooth-bone sub-region is input into the fine segmentation model for fine segmentation to obtain the second preset surface and bone data;

[0022] The jawbone structure model is obtained based on the second preset surface, the bone data, and the facial soft tissue surface data.

[0023] Through a two-stage segmentation process of coarse and fine segmentation, teeth, bones, and facial soft tissue surfaces in CBCT are automatically extracted. This process ensures the accuracy of the dental arch model and provides a stable skeletal benchmark for subsequent fusion with facial data.
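The two-stage flow above can be sketched as follows. This is an illustration only: the segmentation models are passed in as plain callables, and simple intensity thresholds stand in for the trained networks. The label ids, the `margin` parameter, and all function names are assumptions, and surface reconstruction of the soft tissue mask is omitted.

```python
import numpy as np

TOOTH_BONE = 1      # hypothetical label ids for the coarse output
SOFT_TISSUE = 2

def roi_bounds(mask, margin, shape):
    """Axis-aligned bounding box of a non-empty boolean mask, padded by `margin` voxels."""
    idx = np.argwhere(mask)
    lo = np.maximum(idx.min(axis=0) - margin, 0)
    hi = np.minimum(idx.max(axis=0) + 1 + margin, shape)
    return tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))

def coarse_to_fine(volume, coarse_model, fine_model, margin=2):
    """Coarse multi-class pass, crop the tooth-bone region, refine only the crop."""
    coarse = coarse_model(volume)
    soft_tissue = coarse == SOFT_TISSUE
    roi = roi_bounds(coarse == TOOTH_BONE, margin, volume.shape)
    fine = np.zeros_like(coarse)
    fine[roi] = fine_model(volume[roi])       # paste refined labels back
    return fine, soft_tissue, roi

def toy_coarse(vol):
    """Stand-in for the coarse network: intensity thresholds."""
    labels = np.zeros(vol.shape, dtype=int)
    labels[vol > 50] = SOFT_TISSUE
    labels[vol > 500] = TOOTH_BONE
    return labels

def toy_fine(crop):
    """Stand-in for the fine network: 1 = bone, 2 = tooth."""
    labels = np.zeros(crop.shape, dtype=int)
    labels[crop > 500] = 1
    labels[crop > 1500] = 2
    return labels

vol = np.zeros((8, 8, 8))
vol[1:7, 1:7, 1:7] = 100.0      # soft tissue
vol[3:5, 3:5, 3:5] = 800.0      # bone
vol[3, 3, 3] = 2000.0           # one tooth voxel
fine, soft_tissue, roi = coarse_to_fine(vol, toy_coarse, toy_fine)
```

Restricting the fine pass to the cropped sub-region is what lets the second model work at full resolution on a small volume. The sketch assumes the coarse pass finds at least one tooth-bone voxel.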

[0024] Furthermore, the specific steps for obtaining the registration coordinate system include:

[0025] Based on the first stable region, the facial soft tissue surface data reconstructed by cone-beam CT in the three-dimensional model of the teeth and jaws are registered with the resting closed-lip three-dimensional facial image data, and the first spatial transformation matrix is obtained. Based on the first spatial transformation matrix, the resting closed-lip three-dimensional facial image data is transformed and aligned to the coordinate system of the three-dimensional model of the teeth and jaws to obtain a registration coordinate system based on the coordinate system of the three-dimensional model of the teeth and jaws.

[0026] Since the user's facial soft tissue surface can be extracted from the CBCT scan, and the resting closed-lip 3D facial image contains the corresponding facial surface, this method selects a stable region common to the two for alignment. The alignment solves for a first spatial transformation matrix (comprising a rotation and a translation vector) and applies it to the resting closed-lip 3D facial image data, transforming the data into the coordinate system of the tooth-jawbone 3D model (the CBCT coordinate system). By matching stable regions, the method overlaps the facial soft tissue surface reconstructed from CBCT with the corresponding soft tissue surface of the resting closed-lip 3D facial image, so that the teeth and jawbone structures and the resting closed-lip 3D facial image share the CBCT coordinate system. This fusion places the user's static closed-mouth face and the tooth-bone model in the same coordinate system, laying the foundation for the next step of fusing the maximum occlusal tooth-showing smile.
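As an illustration of how a spatial transformation matrix made of a rotation and a translation is applied to facial scan vertices, the following sketch packs the two into a 4x4 homogeneous matrix. The numeric values and the names `make_transform`, `transform_points`, `R1`, and `T1` are made up for the example and are not taken from the patent.

```python
import numpy as np

def make_transform(R, t):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 matrix."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def transform_points(T, pts):
    """Apply a 4x4 homogeneous transform to an (N, 3) array of points."""
    homo = np.hstack([pts, np.ones((len(pts), 1))])
    return (homo @ T.T)[:, :3]

# Hypothetical first transformation: 90 degrees about z, then 5 mm along x.
R1 = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
T1 = make_transform(R1, [5.0, 0.0, 0.0])

face0 = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 3.0]])    # toy Face0 vertices
face0_in_cbct = transform_points(T1, face0)

# The inverse matrix maps the points back to the scanner frame.
face0_back = transform_points(np.linalg.inv(T1), face0_in_cbct)
```

Storing the result as a single 4x4 matrix makes it easy to apply the same transformation to every vertex of the facial mesh.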

[0027] Furthermore, the specific steps for obtaining the teeth-face model include:

[0028] In the registration coordinate system, based on the second stable region, the resting lip-closed three-dimensional facial image data and the maximum occlusal tooth-showing smile three-dimensional facial image data are registered, and the second spatial transformation matrix is obtained; based on the second spatial transformation matrix, the maximum occlusal tooth-showing smile three-dimensional facial image data is transformed and aligned to the resting lip-closed three-dimensional facial image data.

[0029] Based on the registration coordinate system, the tooth-jawbone 3D model is fused with the aligned 3D facial image data of the maximum occlusal tooth-showing smile to obtain the tooth-face model.

[0030] A stable region in the upper part of the face is selected as the matching basis, and the 3D facial image of a maximum occlusal, tooth-revealing smile is aligned with the 3D facial image of a resting, closed-lip face. During this alignment process, a second spatial transformation matrix is obtained and applied to the 3D facial image data of the maximum occlusal, tooth-revealing smile, transforming it to the registration coordinate system of the resting, closed-lip face image data. Using the resting, closed-lip face image as an intermediate bridge, the 3D facial image of the maximum occlusal, tooth-revealing smile is fused with CBCT, resolving the issue that the facial soft tissue state of the maximum occlusal, tooth-revealing smile differs significantly from that obtained from CBCT (resting expression or slight closure), making direct registration impossible.

[0031] Furthermore, the first stable region and the second stable region are obtained based on a stable region approach, wherein the stable region approach is as follows:

[0032] The first input data and the second input data are segmented to obtain several first sub-regions and several second sub-regions, respectively.

[0033] Obtain the heatmap coordinates of facial key points in the first input data and the second input data;

[0034] A stable region is obtained based on the first sub-region, the second sub-region, and the heatmap coordinates.

[0035] Specifically, when obtaining the first stable region, the first input data is the facial soft tissue surface data reconstructed by CBCT in the three-dimensional model of the teeth and jawbone, and the second input data is the resting lip-closed three-dimensional facial image data; when obtaining the second stable region, the first input data is the resting lip-closed three-dimensional facial image data, and the second input data is the maximum occlusal tooth-revealing smile three-dimensional facial image data.

[0036] This invention introduces an artificial intelligence visual model to automatically identify stable regions and key alignment markers on the face. It employs a dual-branch deep neural network architecture: one branch segments different facial regions; the other predicts the heatmap coordinates of key facial points. By combining the two outputs, regions that maintain a relatively stable shape under different expressions are identified in both the resting lip-closed 3D facial image and the maximum occlusion tooth-showing smile 3D facial image, serving as stable region references for subsequent registration. Automatic identification of stable regions ensures that registration between expressions focuses on reliable areas, thereby improving the robustness and accuracy of the registration.
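The text does not specify how the two branch outputs are combined, so the following is only one plausible reading, stated as an assumption: a vertex is treated as stable when the segmentation branch assigns it a region label from a chosen stable set and it lies within a given radius of a landmark predicted by the heatmap branch. Per-vertex heatmaps, the label ids, the radius, and all function names are hypothetical.

```python
import numpy as np

def landmark_coords(vertices, heatmaps):
    """For each landmark heatmap (one row per landmark), take the top-scoring vertex."""
    return vertices[np.argmax(heatmaps, axis=1)]

def stable_mask(vertices, region_labels, heatmaps, stable_labels, radius):
    """Assumed rule: stable label AND within `radius` of any predicted landmark."""
    in_stable_region = np.isin(region_labels, list(stable_labels))
    lm = landmark_coords(vertices, heatmaps)                          # (K, 3)
    dist = np.linalg.norm(vertices[:, None, :] - lm[None, :, :], axis=2)
    near_landmark = dist.min(axis=1) <= radius
    return in_stable_region & near_landmark

# Toy face: six vertices along the vertical (y) axis, in mm.
verts = np.array([[0.0, y, 0.0] for y in (80.0, 40.0, 35.0, 0.0, -30.0, -60.0)])
labels = np.array([0, 0, 0, 1, 2, 3])      # 0 forehead, 1 nose, 2 lips, 3 chin
heat = np.array([[0.0, 0.1, 0.9, 0.2, 0.0, 0.0],     # landmark near the brow
                 [0.0, 0.0, 0.1, 0.8, 0.3, 0.0]])    # landmark at the nose tip

mask = stable_mask(verts, labels, heat, stable_labels={0, 1}, radius=10.0)
```

In the toy data, the lips and chin are excluded by label, and the topmost forehead vertex is excluded because it is far from every landmark. The same function would be run on each of the two inputs being registered.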

[0037] Furthermore, based on the first stable region and the registration method, the facial soft tissue surface data reconstructed by the cone-beam CT in the three-dimensional model of the teeth and jaws are spatially aligned with the resting lip-closed three-dimensional facial image data. The registration method is as follows:

[0038] Based on the stable region, the first fast point feature histogram feature of the first point cloud of the third input data is obtained, and the second fast point feature histogram feature of the second point cloud of the fourth input data is obtained.

[0039] Based on the first fast point feature histogram feature and the second fast point feature histogram feature, the first point cloud and the second point cloud are matched to obtain the initial feature point correspondence.

[0040] Based on the random sample consensus (RANSAC) algorithm, an initial rigid body transformation matrix is estimated from the initial feature point correspondence;

[0041] Based on the initial rigid body transformation matrix, the fourth input data is aligned to the third input data to obtain a coarse registration result;

[0042] Based on the coarse registration result, the third input data and the fourth input data are spatially aligned;

[0043] The third input data is the facial soft tissue surface data reconstructed by cone-beam CT in the three-dimensional model of the teeth and jawbone, and the fourth input data is the resting lip-closed three-dimensional facial image data; the first point cloud is the point cloud corresponding to the third input data, and the second point cloud is the point cloud corresponding to the fourth input data.

[0044] The multimodal registration algorithm first uses FPFH (Fast Point Feature Histogram) feature matching combined with the RANSAC algorithm to estimate the initial rigid body transformation between the resting closed-lip 3D facial image and the CBCT soft tissue surface, achieving global coarse registration. Because this feature matching with random sampling does not depend on an initial pose, it reduces the risk of converging to an incorrect registration at a local extremum.
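The coarse registration steps can be sketched as follows. Computing real FPFH descriptors is out of scope here, so random vectors stand in for them; the sketch only shows nearest-descriptor matching followed by RANSAC over 3-point samples with a least-squares rigid fit (the Kabsch method, chosen here as an assumption since the patent does not name the estimator). The iteration count, inlier tolerance, and all names are illustrative.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid fit: returns R, t with dst ≈ src @ R.T + t."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, c_dst - R @ c_src

def match_features(feat_src, feat_dst):
    """For each source descriptor, index of the nearest target descriptor."""
    d = np.linalg.norm(feat_src[:, None, :] - feat_dst[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def ransac_rigid(src, dst, n_iter=200, inlier_tol=1.0, seed=0):
    """RANSAC over 3-point samples of putative pairs src[i] <-> dst[i]."""
    rng = np.random.default_rng(seed)
    best_mask, best_count = None, -1
    for _ in range(n_iter):
        pick = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[pick], dst[pick])
        resid = np.linalg.norm(src @ R.T + t - dst, axis=1)
        mask = resid < inlier_tol
        if mask.sum() > best_count:
            best_mask, best_count = mask, int(mask.sum())
    R, t = kabsch(src[best_mask], dst[best_mask])    # refit on all inliers
    return R, t, best_mask

# Toy data: the facial scan is a rigidly moved, shuffled copy of the CBCT surface.
rng = np.random.default_rng(1)
n = 30
cbct_pts = rng.uniform(-50.0, 50.0, size=(n, 3))     # target point cloud
cbct_feat = rng.normal(size=(n, 8))                  # stand-in descriptors, not FPFH
a = 0.5
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([10.0, -5.0, 3.0])
perm = rng.permutation(n)
face_pts = ((cbct_pts - t_true) @ R_true)[perm]      # source point cloud
face_feat = cbct_feat[perm].copy()
face_feat[:5] = rng.normal(size=(5, 8))              # 5 corrupted descriptors

idx = match_features(face_feat, cbct_feat)
R_est, t_est, inliers = ransac_rigid(face_pts, cbct_pts[idx])
```

Here the source plays the role of the fourth input data and the target the third input data. The five corrupted descriptors produce wrong matches, which RANSAC is expected to reject as outliers before the final refit.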

[0045] Furthermore, based on the coarse registration result, the specific steps for spatially aligning the third input data and the fourth input data include:

[0046] Based on the iterative closest point (ICP) algorithm, the regional weighting rule, and the coarse registration result, the first point cloud of the third input data and the second point cloud of the fourth input data are spatially aligned.

[0047] The region weighting rule is as follows: the weight of the stable region is higher than the weight of the non-stable region.

[0048] In the fine registration stage, stable-region-weighted ICP is introduced: the iterative closest point (ICP) algorithm is performed with respect to the aforementioned stable region, using a region weighting strategy (i.e., assigning higher weights to corresponding points within the stable region). This improves registration accuracy, addresses the large registration errors of facial soft tissue under different expression states, and improves the accuracy and robustness of the fusion.
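A minimal sketch of region-weighted ICP follows, assuming a weighted least-squares rigid update (weighted Kabsch) and brute-force nearest-neighbour search. The weight values 1.0 and 0.1, the toy grid, and the 3 mm deformation of the lower face are illustrative assumptions; the source only states that stable-region weights are higher than non-stable ones.

```python
import numpy as np

def weighted_kabsch(src, dst, w):
    """Rigid fit minimising sum_i w_i * ||R @ src_i + t - dst_i||^2."""
    w = w / w.sum()
    c_src, c_dst = w @ src, w @ dst                  # weighted centroids
    H = (src - c_src).T @ (w[:, None] * (dst - c_dst))
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, c_dst - R @ c_src

def weighted_icp(src, dst, weights, R0, t0, n_iter=20):
    """ICP started from a coarse result (R0, t0) with per-source-point weights."""
    R, t = R0, t0
    for _ in range(n_iter):
        moved = src @ R.T + t
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nearest = dst[np.argmin(d2, axis=1)]         # closest target points
        R, t = weighted_kabsch(src, nearest, weights)
    return R, t

def stable_rms(R, t, src, dst, stable):
    """RMS distance over the stable region after applying (R, t)."""
    diff = src[stable] @ R.T + t - dst[stable]
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

# Toy target: a 20 mm grid; the upper half (y > 0) is treated as stable.
dst = np.array([[x, y, z] for x in (-30, -10, 10, 30)
                for y in (-30, -10, 10, 30) for z in (0, 20)], dtype=float)
stable = dst[:, 1] > 0
deformed = dst.copy()
deformed[~stable, 1] += 3.0                          # expression change, lower face
a = 0.05
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -0.5, 0.5])
src = (deformed - t_true) @ R_true                   # source scan in its own frame

R0, t0 = np.eye(3), np.zeros(3)                      # stand-in coarse result
w_region = np.where(stable, 1.0, 0.1)
R_w, t_w = weighted_icp(src, dst, w_region, R0, t0)
R_u, t_u = weighted_icp(src, dst, np.ones(len(src)), R0, t0)
err_weighted = stable_rms(R_w, t_w, src, dst, stable)
err_uniform = stable_rms(R_u, t_u, src, dst, stable)
```

In this toy case, down-weighting the deformed lower face pulls the solution toward the stable region, so the residual measured on the stable points is smaller with region weights than with uniform weights.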

[0049] Furthermore, in the registration coordinate system, based on the second stable region and the registration method, the resting closed-lip three-dimensional facial image data and the maximum occlusal tooth-revealing smile three-dimensional facial image data are registered and aligned, and the second spatial transformation matrix is obtained by solving.

[0050] The present invention also provides a fusion system for three-dimensional images of face, jawbone, and teeth, the system comprising:

[0051] Data unit: used to acquire intraoral scan data, cone-beam CT data, resting lip-closed 3D facial image data, and maximum occlusal tooth-exposing smile 3D facial image data of the target object;

[0052] Model unit: used to obtain tooth surface model and jawbone structure model based on the intraoral scan data and the cone-beam CT data, respectively;

[0053] Registration unit: used to register the tooth surface model and the jawbone structure model to obtain a three-dimensional tooth-jawbone model;

[0054] And to register the facial soft tissue surface data reconstructed by cone-beam CT in the three-dimensional model of the teeth and jawbone with the resting closed-lip three-dimensional facial image data, and solve for the first spatial transformation matrix; based on the first spatial transformation matrix, transform and align the resting closed-lip three-dimensional facial image data to the coordinate system of the three-dimensional model of the teeth and jawbone, and obtain the registration coordinate system;

[0055] And for registering the transformed and aligned three-dimensional facial image data of the resting lip closed face with the three-dimensional facial image data of the maximum occlusal smile under the registration coordinate system, and solving for the second spatial transformation matrix; based on the second spatial transformation matrix, transforming and aligning the three-dimensional facial image data of the maximum occlusal smile to the three-dimensional facial image data of the resting lip closed face, so that the three-dimensional facial image data of the maximum occlusal smile is transformed and aligned to the registration coordinate system;

[0056] And for fusing the tooth-jawbone three-dimensional model with the aligned maximum occlusal tooth-showing smile three-dimensional facial image data based on the registration coordinate system to obtain a tooth-face model;

[0057] Results unit: used to obtain orthodontic data based on the teeth-face model.

[0058] The principle and effect of this system are similar to those of this method, so no further details will be provided for this system.

[0059] One or more technical solutions provided by this invention have at least the following technical effects or advantages:

[0060] This invention utilizes four types of 3D image data from the same user: intraoral scan (IOS), cone-beam computed tomography (CBCT), a resting 3D facial image, and a 3D facial image with maximum occlusion and a tooth-showing smile. By employing a staged multimodal registration algorithm, these data are fused. Using the resting lip-closed 3D facial image as an intermediary, the 3D facial image with maximum occlusion and a tooth-showing smile is fused with the CBCT, achieving the fusion of 3D digital dentition (IOS) and the facial image with maximum occlusion and a tooth-showing smile. This results in an integrated teeth-face visualization model, developing a method that can automatically, accurately, and efficiently achieve... This technology integrates intraoral scan (IOS) dental data, cone-beam computed tomography (CBCT) jawbone data, and facial scan data under maximum occlusion with a tooth-showing smile into a three-dimensional fusion system. This allows for the seamless integration of facial morphological elements into a digital orthodontic model, enabling users to visually view tooth alignment and smile effects on their real three-dimensional face. It also allows for the planning of tooth movement positions within the context of facial appearance, improving the quality of visually guided decision-making in orthodontic planning. This lays the technical foundation for truly achieving morphology-driven precision orthodontic planning and fills the gap in the appearance dimension of existing digital dental models.

Attached Figure Description

[0061] The accompanying drawings, which are provided to further illustrate embodiments of the invention and constitute a part of this invention, are not intended to limit the scope of the invention.

[0062] Figure 1 is a flowchart illustrating a method for fusing three-dimensional images of a face, jawbone, and teeth according to the present invention;

[0063] Figure 2 is a comparison of the positional relationship between the face and teeth in a normal posed smile and in a maximum occlusal tooth-showing smile;

[0064] Figure 3 is a simplified flowchart illustrating the four-modal, three-step fusion method of IOS, CBCT, resting lip-closed 3D facial image, and maximum occlusal tooth-showing smile 3D facial image;

[0065] Figure 4 is a schematic diagram of the specific process of the four-modal, three-step fusion method of IOS, CBCT, resting lip-closed 3D facial image, and maximum occlusal tooth-showing smile 3D facial image, where (A) represents the segmentation process of the CBCT data, (B) represents the facial mesh preprocessing of the resting lip-closed 3D facial image and the maximum occlusal tooth-showing smile 3D facial image, (C) represents the automatic recognition of the upper facial stable region in the resting lip-closed 3D facial image and the maximum occlusal tooth-showing smile 3D facial image, and (D) represents the specific process of obtaining the tooth-face model.

Detailed Implementation

[0066] To better understand the above-mentioned objectives, features, and advantages of the present invention, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where there is no conflict, the embodiments of the present invention and the features thereof can be combined with each other.

[0067] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and therefore the scope of protection of the invention is not limited to the specific embodiments disclosed below.

[0068] Example 1

[0069] Referring to Figures 1-4, this embodiment provides a method for fusing three-dimensional images of the face, jawbone, and teeth. The method includes:

[0070] Acquire intraoral scan data, cone-beam CT data, and resting closed-lip three-dimensional facial images and maximum occlusal tooth-exposing smile three-dimensional facial images of the target object;

[0071] Based on the intraoral scan data and the cone-beam CT data, a tooth surface model and a jawbone structure model are obtained, respectively;

[0072] The tooth surface model and the jawbone structure model are registered to obtain a three-dimensional tooth-jawbone model;

[0073] The specific steps for obtaining the three-dimensional model of the tooth-jawbone include:

[0074] The first preset surface of the tooth surface model and the second preset surface of the jawbone structure model are spatially aligned; in this embodiment, the first preset surface can be the tooth surface and the second preset surface can be the tooth / jawbone surface.

[0075] The first crown data of the jawbone structure model is updated with the second crown data of the tooth surface model to obtain the tooth-jawbone three-dimensional model, that is, a three-dimensional digital model containing the crown and jawbone anatomy.

[0076] The three-dimensional tooth-jawbone model also includes facial soft tissue surface data from the jawbone structure model.

[0077] The specific steps for obtaining the jawbone structure model include:

[0078] Obtain a coarse segmentation model and a fine segmentation model; wherein, the coarse segmentation model and the fine segmentation model can be obtained by training based on labeled CBCT training data, the CBCT training data including CBCT volume data and its corresponding segmentation labels for anatomical structures such as teeth, jawbones and facial soft tissues; the coarse segmentation model is used to output multi-class initial segmentation results, and the fine segmentation model is used to refine the segmentation of the region of interest and output fine segmentation results for teeth / bones; the segmentation results can be further processed through surface reconstruction and post-processing to obtain the second preset surface and the surface data of the facial soft tissue.

[0079] The cone-beam CT data is input into the coarse segmentation model to obtain initial segmentation data; based on the region of interest, the tooth-bone sub-region and facial soft tissue surface data are extracted from the initial segmentation data; the tooth-bone sub-region is input into the fine segmentation model for fine segmentation to obtain the second preset surface and bone data;

[0080] The jawbone structure model is obtained based on the second preset surface, the bone data, and the facial soft tissue surface data;

[0081] The segmented CBCT tooth surface (i.e. the second preset surface) is rigidly registered with the IOS mesh, so that the high-precision crown data of IOS is fused with the root / bone data of CBCT to obtain a high-precision tooth-bone model in a unified coordinate system.

[0082] The facial soft tissue surface data reconstructed by cone-beam CT in the three-dimensional model of the teeth and jawbone are registered with the resting lip-closed three-dimensional facial image data to obtain the first spatial transformation matrix; based on the first spatial transformation matrix, the resting lip-closed three-dimensional facial image data are transformed and aligned to the coordinate system of the three-dimensional model of the teeth and jawbone to obtain the registration coordinate system;

[0083] In the registration coordinate system, the transformed and aligned three-dimensional facial image data of the resting lip closed face is registered with the three-dimensional facial image data of the maximum occlusal smile, and the second spatial transformation matrix is obtained. Based on the second spatial transformation matrix, the three-dimensional facial image data of the maximum occlusal smile is transformed and aligned to the three-dimensional facial image data of the resting lip closed face, so that the three-dimensional facial image data of the maximum occlusal smile is transformed and aligned to the registration coordinate system.

[0084] The specific steps for obtaining the registration coordinate system include:

[0085] Based on the first stable region (mainly located in the midface, such as the nose and forehead), the facial soft tissue surface data reconstructed by cone-beam CT in the three-dimensional model of the teeth and jaws are spatially aligned with the resting lip-closed three-dimensional facial image data, and the first spatial transformation matrix is obtained. Based on the first spatial transformation matrix, the resting lip-closed three-dimensional facial image data is transformed and aligned to the coordinate system of the three-dimensional model of the teeth and jaws, and a registration coordinate system with the coordinate system of the three-dimensional model of the teeth and jaws as the reference is obtained.

[0086] Based on the registration coordinate system, the tooth-jawbone three-dimensional model is fused with the aligned maximum occlusal tooth-showing smile three-dimensional facial image data to obtain a tooth-face model;

[0087] The specific steps for obtaining the teeth-face model include:

[0088] In the registration coordinate system, based on the second stable region, the resting lip-closed three-dimensional facial image data and the maximum occlusal tooth-showing smile three-dimensional facial image data are registered, and the second spatial transformation matrix is obtained; based on the second spatial transformation matrix, the maximum occlusal tooth-showing smile three-dimensional facial image data is transformed and aligned to the resting lip-closed three-dimensional facial image data.

[0089] Based on the registration coordinate system, the tooth-jawbone 3D model is fused with the aligned 3D facial image data of the maximum occlusal tooth-showing smile to obtain the tooth-face model.

[0090] Based on the registration coordinate system, the three-dimensional tooth-jaw model and the three-dimensional facial image data of the maximum occlusal smile are fused to obtain the tooth-face model. By first solving the first spatial transformation matrix and applying it to the resting lip-closed three-dimensional facial image data to transform it to the CBCT coordinate system, and then solving the second spatial transformation matrix and applying it to the maximum occlusal smile three-dimensional facial image data to transform it to the registration coordinate system, the maximum occlusal smile three-dimensional facial image data is finally transformed to the CBCT coordinate system via the resting lip-closed three-dimensional facial image, thus achieving the fusion of the maximum occlusal smile three-dimensional facial image with the IOS / CBCT tooth-jaw model. Orthodontic data is obtained based on the tooth-face model. The orthodontic data refers to the measurement and analysis results calculated based on the tooth-face model for orthodontic treatment plan design, appliance design, and efficacy evaluation. These data may include, but are not limited to, the spatial position / posture parameters of teeth and dentition (such as tooth axis angle, torsion angle, tooth displacement vector, etc.), occlusal relationship parameters (such as overbite, overjet, cusp-fossa relationship, occlusal contact area or contact point distribution, etc.), and facial aesthetic parameters (such as lip position, smile arc, tooth exposure, gum exposure, nasolabial angle, facial proportion, etc.), as well as the differences / changes between these parameters and the treatment goals or the state before and after treatment.

[0091] The first stable region and the second stable region are obtained based on a stable region method, wherein the stable region method is as follows:

[0092] The first input data and the second input data are segmented to obtain several first sub-regions and several second sub-regions, respectively.

[0093] Obtain the heatmap coordinates of facial key points in the first input data and the second input data;

[0094] A stable region is obtained based on the first sub-region, the second sub-region, and the heatmap coordinates.

[0095] Specifically, when obtaining the first stable region, the first input data is the facial soft tissue surface data reconstructed by CBCT in the three-dimensional model of the teeth and jawbone, and the second input data is the resting lip-closed three-dimensional facial image data; when obtaining the second stable region, the first input data is the resting lip-closed three-dimensional facial image data, and the second input data is the maximum occlusal tooth-revealing smile three-dimensional facial image data.

[0096] The facial key points include at least the inner or outer canthus points, the subnasal point, the nasal tip point, the mouth corner points, and the upper lip peak point. In the maximum occlusal tooth-revealing smile 3D facial image data, several crown cusp or incisal edge points can also be selected as auxiliary key points. The heatmap coordinates are the coordinate values derived from the probability heatmap output for each key point: taking the maximum response position (argmax) on the heatmap yields integer pixel / voxel coordinates, while soft-argmax yields sub-pixel / sub-voxel coordinates. If key point detection is performed on a 2D projected image, the 2D coordinates are back-projected onto the 3D mesh or point cloud, using the camera parameters / projection mapping relationship that generated the projection, to obtain the corresponding 3D coordinates.
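As a non-limiting illustration, the two ways of reading a key point position from a heatmap can be sketched as follows. The sketch assumes a single-channel 2D heatmap whose values are treated as logits; the heatmap size, the temperature parameter, and the synthetic peak are assumptions for this example.

```python
import numpy as np

def argmax_2d(heatmap):
    """Integer (row, col) of the maximum response."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return float(row), float(col)

def soft_argmax_2d(heatmap, temperature=1.0):
    """Softmax-weighted expected (row, col), which can fall between pixels."""
    h, w = heatmap.shape
    logits = heatmap / temperature
    logits = logits - logits.max()          # subtract the maximum for numerical stability
    p = np.exp(logits)
    p /= p.sum()
    exp_row = float((p.sum(axis=1) * np.arange(h)).sum())
    exp_col = float((p.sum(axis=0) * np.arange(w)).sum())
    return exp_row, exp_col

# Synthetic heatmap: a Gaussian-shaped response centred between pixels.
rows, cols = np.mgrid[0:32, 0:48]
true_rc = (10.3, 20.7)
heatmap = -((rows - true_rc[0]) ** 2 + (cols - true_rc[1]) ** 2) / (2 * 2.0 ** 2)

peak = argmax_2d(heatmap)
refined = soft_argmax_2d(heatmap)
print(peak, refined)
```

The argmax result snaps to the nearest pixel, whereas the soft-argmax result recovers the off-grid centre of the synthetic response.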

[0097] For example, an existing pre-trained dual-branch deep neural network architecture can be used: one branch is based on a multi-view mask classification model and segments the different regions of the face; the other branch is based on a high-resolution network and predicts the heatmap coordinates of the facial key points. By combining the two outputs, regions that maintain a relatively stable shape under different expressions in the resting lip-closed 3D facial image and the maximum occlusal tooth-showing smile 3D facial image are identified as stable regions for subsequent registration.

[0098] Specifically, based on the first stable region and the registration method, the facial soft tissue surface data reconstructed by cone-beam CT in the three-dimensional model of the teeth and jawbone are spatially aligned with the resting closed-lip three-dimensional facial image data. The registration method is as follows:

[0099] Based on the stable region, the first fast point feature histogram feature of the first point cloud of the third input data is obtained, and the second fast point feature histogram feature of the second point cloud of the fourth input data is obtained.

[0100] Based on the first fast point feature histogram feature and the second fast point feature histogram feature, the first point cloud and the second point cloud are matched to obtain the initial feature point correspondence.

[0101] Based on the random sample consensus (RANSAC) algorithm, an initial rigid body transformation matrix is estimated from the initial feature point correspondences;

[0102] Based on the initial rigid body transformation matrix, the fourth input data is aligned to the third input data to obtain a coarse registration result;

[0103] Based on the coarse registration result, the third input data and the fourth input data are spatially aligned;

[0104] The third input data is the facial soft tissue surface data reconstructed by cone-beam CT in the three-dimensional model of the teeth and jawbone, and the fourth input data is the resting lip-closed three-dimensional facial image data; the first point cloud is the point cloud corresponding to the third input data, and the second point cloud is the point cloud corresponding to the fourth input data. For example, FPFH (Fast Point Feature Histogram) feature matching combined with the RANSAC algorithm is used to calculate the initial rigid body transformation estimate between the resting lip-closed three-dimensional facial image and the CBCT, achieving global coarse registration. Because this feature-based random sampling approach does not depend on the initial pose of the two data sets, it reduces the risk of the registration converging to an incorrect local optimum.
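As a non-limiting illustration, the coarse registration steps (descriptor matching, RANSAC estimation, and rigid alignment) can be sketched as follows. Computing the FPFH descriptors themselves is outside this sketch: random 33-dimensional vectors stand in for them, and the function names, iteration count, and inlier tolerance are assumptions.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst is approximately src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, dst_c - R @ src_c

def match_descriptors(desc_src, desc_dst):
    """Mutual nearest-neighbour matches in descriptor space, as (src_idx, dst_idx) rows."""
    d2 = ((desc_src[:, None, :] - desc_dst[None, :, :]) ** 2).sum(axis=2)
    src_to_dst, dst_to_src = d2.argmin(axis=1), d2.argmin(axis=0)
    idx = np.arange(len(desc_src))
    keep = dst_to_src[src_to_dst] == idx
    return np.stack([idx[keep], src_to_dst[keep]], axis=1)

def ransac_rigid(src, dst, matches, n_iter=500, inlier_tol=1.0, seed=0):
    """Estimate a 4x4 rigid transform from putative matches using 3-point samples."""
    rng = np.random.default_rng(seed)
    p, q = src[matches[:, 0]], dst[matches[:, 1]]
    best = np.zeros(len(matches), dtype=bool)
    for _ in range(n_iter):
        sample = rng.choice(len(matches), size=3, replace=False)
        R, t = kabsch(p[sample], q[sample])
        inliers = np.linalg.norm(p @ R.T + t - q, axis=1) < inlier_tol
        if inliers.sum() > best.sum():
            best = inliers
    R, t = kabsch(p[best], q[best])          # refit on all inliers of the best model
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T, best

# Toy data: the face scan (fourth input data) is aligned to the CBCT surface (third input data).
rng = np.random.default_rng(0)
face_scan = rng.uniform(-50.0, 50.0, size=(60, 3))
a, b = np.deg2rad(30.0), np.deg2rad(20.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, np.cos(b), -np.sin(b)], [0.0, np.sin(b), np.cos(b)]])
R_true, t_true = Rz @ Rx, np.array([10.0, -5.0, 3.0])
cbct_surface = face_scan @ R_true.T + t_true

desc_scan = rng.normal(size=(60, 33))                      # stand-in for FPFH descriptors
desc_cbct = desc_scan + 0.01 * rng.normal(size=(60, 33))
desc_cbct[:15] = rng.normal(size=(15, 33))                 # corrupt a quarter of them

matches = match_descriptors(desc_scan, desc_cbct)
T_coarse, inliers = ransac_rigid(face_scan, cbct_surface, matches)
print(len(matches), int(inliers.sum()))
```

In practice the descriptors would be computed only on points inside the stable region, and the resulting matrix would serve as the starting pose for the fine registration.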

[0105] The specific steps for spatially aligning the third input data and the fourth input data based on the coarse registration result include:

[0106] Based on the iterative closest point (ICP) algorithm, the regional weighting rule, and the coarse registration result, the first point cloud of the third input data and the second point cloud of the fourth input data are spatially aligned.

[0107] The region weighting rule is as follows: the weight of the stable region is higher than the weight of the non-stable region.

[0108] In the fine registration stage, the iterative closest point (ICP) algorithm is executed within the aforementioned stable region, and a regional weighting strategy (i.e., assigning higher weights to corresponding points within the stable region) is introduced to improve registration accuracy.
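As a non-limiting illustration, a region-weighted ICP can be sketched as follows. The weight values (1.0 for stable points, 0.1 for non-stable points), the iteration count, the brute-force nearest-neighbour search, and the toy deformation are assumptions chosen for brevity; a spatial index such as a k-d tree would normally replace the brute-force search.

```python
import numpy as np

def weighted_kabsch(src, dst, w):
    """Weighted least-squares rigid transform (R, t) with dst approximately src @ R.T + t."""
    w = w / w.sum()
    src_c = (w[:, None] * src).sum(axis=0)
    dst_c = (w[:, None] * dst).sum(axis=0)
    H = (w[:, None] * (src - src_c)).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, dst_c - R @ src_c

def weighted_icp(moving, fixed, moving_weights, T_init=None, n_iter=30):
    """Align moving to fixed; each moving point contributes according to its weight."""
    T = np.eye(4) if T_init is None else T_init.copy()
    for _ in range(n_iter):
        moved = moving @ T[:3, :3].T + T[:3, 3]
        d2 = ((moved[:, None, :] - fixed[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)                  # closest fixed point per moving point
        R, t = weighted_kabsch(moved, fixed[nearest], moving_weights)
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        T = step @ T                                 # accumulate the incremental update
    return T

# Toy data: the lower part of the "face" deforms, the rest is stable.
rng = np.random.default_rng(1)
fixed = rng.uniform(-50.0, 50.0, size=(200, 3))
stable = fixed[:, 2] > -25.0
deformed = fixed.copy()
deformed[~stable] += [0.0, 3.0, 0.0]                 # expression-like shift

# Small residual misalignment, as would remain after a coarse registration.
a = np.deg2rad(1.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.3, 0.4])
moving = (deformed - t_true) @ R_true                # so that moving @ R_true.T + t_true == deformed

weights = np.where(stable, 1.0, 0.1)                 # stable region weighted higher
T_fine = weighted_icp(moving, fixed, weights)
aligned = moving @ T_fine[:3, :3].T + T_fine[:3, 3]
stable_rms = float(np.sqrt(((aligned[stable] - fixed[stable]) ** 2).sum(axis=1).mean()))
print(stable_rms)
```

Down-weighting the non-stable points limits how far the deformed region can pull the solution away from the fit of the stable region.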

[0109] Specifically, in the registration coordinate system, based on the second stable region and the registration method, the resting closed-lip three-dimensional facial image data and the maximum occlusal tooth-revealing smile three-dimensional facial image data are registered and aligned, and the second spatial transformation matrix is obtained by solving.

[0110] Example 2

[0111] Referring to Figures 1-4, and based on Example 1, this example specifically illustrates the process of obtaining the teeth-face model:

[0112] Taking the data of an adult orthodontic user as an example, the user simultaneously possesses the following four types of 3D image data: high-precision intraoral scan dental model (IOS), CBCT images covering the jawbone, 3D facial scan of the user's static natural closed-mouth expression (resting lip closed 3D facial image), and 3D facial scan of the user in the state of maximum spontaneous smile (maximum occlusal tooth-showing smile 3D facial image).

[0113] Referring to Figure 4, part (A) illustrates the two-stage segmentation process of CBCT images:

[0114] 1. Data Preparation and CBCT Segmentation: The user's raw CBCT data is sequentially input into the coarse segmentation model and the fine segmentation model. First, preliminary segmentation of multiple anatomical structures is performed on the CBCT data to obtain the approximate regions of teeth, bones, and soft tissues. Subsequently, fine segmentation is performed on the regions of interest containing teeth and alveolar bone to obtain high-precision 3D reconstructions of teeth, jawbones, and facial surfaces. The segmentation results are post-processed to obtain the 3D mesh surfaces of teeth and bones.

[0115] Referring to Figure 4, part (D) illustrates the process of performing coarse registration and weighted ICP fine registration within the stable region, aligning the resting closed-lip 3D image with the CBCT and the maximum occlusal tooth-showing smile 3D image with the resting closed-lip 3D image, and ultimately outputting a fused tooth-face model:

[0116] 2. IOS and CBCT Fusion: The CBCT tooth / bone surfaces obtained in step 1 are registered with the user's IOS dental mesh. The existing Iterative Closest Point (ICP) algorithm or one of its variants is used to automatically establish the correspondence between the IOS and CBCT dental arches and to calculate a rigid transformation that aligns them. During registration, ICP iteratively minimizes the matching error so that the partially overlapping crown surfaces of the IOS and the CBCT coincide. After fusion, a 3D tooth-bone composite model of the user is output, including high-precision crowns and precisely matched tooth roots and jawbone structures.

[0117] Referring to Figure 4, parts (B) and (C) illustrate the preprocessing of the facial meshes of the resting lip-closed 3D facial image and the maximum occlusal tooth-showing smile 3D facial image, respectively, as well as the automatic identification of the upper facial stable region:

[0118] 3. Fusion of CBCT and Resting Lip-Closed 3D Facial Image: The facial soft tissue surfaces reconstructed by CBCT in the 3D dental-bone composite model obtained in step 2 are registered with the facial mesh of the resting lip-closed 3D facial image. First, a pre-trained semantic recognition model automatically marks stable regions (mainly located in the midface, such as the nose and forehead) on the resting lip-closed 3D facial image. Then, FPFH feature matching and RANSAC algorithms are used to calculate the initial alignment pose (coarse registration) of the resting lip-closed 3D facial image relative to the 3D dental-bone composite model. Next, the ICP algorithm is run within the identified stable regions for fine alignment, accurately fitting the stable facial regions of the resting lip-closed 3D facial image with the corresponding surfaces of the facial soft tissue reconstructed by CBCT. This step unifies the user's static facial appearance with their dental-bone model in the same coordinate system.

[0119] 4. Fusion of the Resting Lip-Closed 3D Facial Image and the Maximum Occlusal Tooth-Showing Smile 3D Facial Image: The facial mesh of the maximum occlusal tooth-showing smile 3D facial image is fused with the mesh of the resting lip-closed 3D facial image aligned in the previous step. First, stable regions of the upper face (such as the area from around the eye sockets to the forehead) are identified on the maximum occlusal tooth-showing smile 3D facial image using an existing pre-trained deep learning network. Then, with the resting lip-closed 3D facial image as the reference, rigid body registration and ICP fine-tuning are performed on the maximum occlusal tooth-showing smile 3D facial image. Because matching is restricted to the stable regions, the faces of the two expressions are accurately superimposed. As a result, the maximum occlusal tooth-showing smile 3D facial image acquires an indirect, global registration relationship with the 3D tooth-bone composite model (maximum occlusal tooth-showing smile 3D facial image → resting lip-closed 3D facial image → CBCT). After this step is completed, a tooth-face model of the maximum occlusal tooth-showing smile is output, i.e., a visualization model in which the IOS dental arch is embedded in the maximum occlusal tooth-showing smile 3D facial image. Through this model, doctors can see how the alignment of the teeth coordinates with the facial features when the user smiles, such as the offset of the maxillary midline relative to the facial midline and the amount of gum exposure when smiling, which provides an intuitive basis for the aesthetic evaluation of orthodontic treatment plans.
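As a non-limiting illustration of one measurement that could be read from the fused model, the offset of the maxillary dental midline from the facial midline can be sketched as a signed point-to-plane distance. The way the facial midsagittal plane is defined here (perpendicular to the line joining a bilateral landmark pair, through its midpoint), the landmark coordinates, and the millimetre unit are all assumptions for this sketch.

```python
import numpy as np

def midsagittal_plane(left_landmark, right_landmark):
    """Plane through the midpoint of a bilateral landmark pair; normal points left to right."""
    left = np.asarray(left_landmark, dtype=float)
    right = np.asarray(right_landmark, dtype=float)
    normal = right - left
    return (left + right) / 2.0, normal / np.linalg.norm(normal)

def signed_midline_deviation(dental_midline_point, plane_point, plane_normal):
    """Signed distance of the dental midline point from the plane (positive toward the right landmark)."""
    offset = np.asarray(dental_midline_point, dtype=float) - plane_point
    return float(np.dot(offset, plane_normal))

# Hypothetical coordinates in the registration coordinate system (assumed to be in mm).
plane_point, plane_normal = midsagittal_plane([-45.0, 0.0, 0.0], [45.0, 0.0, 0.0])
deviation_mm = signed_midline_deviation([1.5, -60.0, 20.0], plane_point, plane_normal)
print(deviation_mm)
```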

[0120] Example 3

[0121] Referring to Figures 1-4, and based on the above embodiments, this embodiment also provides a fusion system for three-dimensional images of face, jawbone, and teeth, the system comprising:

[0122] Data Unit: Used to acquire IOS data of the target object via an intraoral scanner, CBCT data via a cone beam computed tomography (CBCT) system, resting lip-closed 3D facial image data via a 3D facial scanner (such as the existing 3dMDface system, Artec Eva, and EinScan H), and 3D facial image data of a maximum occlusal smile via a 4D dynamic facial capture system (such as the existing DI4D system and 3dMDdynamic system). The data is then transmitted to a display device (such as an existing medical-grade grayscale display, color display, 3Shape Dental System, and Dolphin Imaging) for visualization.

[0123] The above data is input into the processor, which has a built-in tooth surface reconstruction model (used to reconstruct tooth surface models), coarse segmentation model, fine segmentation model, registration model (used to register data), and fusion model (used to fuse data); the tooth surface reconstruction model, the registration model, and the fusion model can each be obtained by training with existing artificial intelligence and machine learning algorithms. The processor is connected to the display so that the processing results of the above models are displayed in real time.

[0124] Model unit: used to obtain, using the tooth surface reconstruction model, a tooth surface model and a jawbone structure model based on the IOS data and the CBCT data, respectively;

[0125] Registration unit: used to register the tooth surface model and the jawbone structure model based on the registration model to obtain a three-dimensional tooth-jawbone model;

[0126] The specific steps for obtaining the three-dimensional model of the tooth-jawbone include:

[0127] The first preset surface of the tooth surface model and the second preset surface of the jawbone structure model are spatially aligned; in this embodiment, the first preset surface can be the tooth surface and the second preset surface can be the tooth / jawbone surface.

[0128] The first crown data of the jawbone structure model is updated with the second crown data of the tooth surface model to obtain the tooth-jawbone three-dimensional model, that is, a three-dimensional digital model containing the crown and jawbone anatomy.

[0129] The specific steps for obtaining the jawbone structure model include:

[0130] Obtain a coarse segmentation model and a fine segmentation model (which can be a pre-trained model), wherein the coarse segmentation model and the fine segmentation model are three-dimensional medical image segmentation models obtained based on deep learning training;

[0131] The cone-beam CT data is input into the coarse segmentation model to obtain initial segmentation data; based on the region of interest, the tooth-bone sub-region and facial soft tissue surface data are extracted from the initial segmentation data; the tooth-bone sub-region is input into the fine segmentation model for fine segmentation to obtain the second preset surface and bone data;

[0132] The segmented CBCT tooth surface (i.e. the second preset surface) is rigidly registered with the IOS mesh, so that the high-precision crown data of IOS is fused with the root / bone data of CBCT to obtain a high-precision tooth-bone model in a unified coordinate system.

[0133] The jawbone structure model is obtained based on the second preset surface, the bone data, and the facial soft tissue surface data.

[0134] The registration unit is further used to register the facial soft tissue surface data reconstructed by cone-beam CT in the three-dimensional model of the teeth and jawbone with the resting closed-lip three-dimensional facial image data, and to solve for the first spatial transformation matrix; based on the first spatial transformation matrix, the resting closed-lip three-dimensional facial image data is transformed and aligned to the coordinate system of the three-dimensional model of the teeth and jawbone to obtain the registration coordinate system;

[0135] The specific steps for obtaining the registration coordinate system include:

[0136] Based on the first stable region (mainly located in the midface, such as the nose and forehead), the facial soft tissue surface data reconstructed by cone-beam CT in the three-dimensional model of the teeth and jaws are spatially aligned with the resting closed-lip three-dimensional facial image data to obtain the first spatial transformation matrix; based on the first spatial transformation matrix, the resting closed-lip three-dimensional facial image data is transformed to the coordinate system of the three-dimensional model of the teeth and jaws to obtain the registration coordinate system.

[0137] Based on the registration coordinate system, the tooth-jawbone three-dimensional model is fused with the aligned maximum occlusal tooth-showing smile three-dimensional facial image data to obtain a tooth-face model;

[0138] The registration unit is further used to fuse the tooth-jawbone three-dimensional model with the aligned maximum occlusal tooth-showing smile three-dimensional facial image data based on the registration coordinate system to obtain a tooth-face model;

[0139] Results unit: used to obtain orthodontic data based on the teeth-face model.

[0140] The specific steps for obtaining the teeth-face model include:

[0141] In the registration coordinate system, based on the second stable region, the resting closed-lip 3D facial image data and the maximum occlusal tooth-revealing smile 3D facial image data are registered, and the second spatial transformation matrix is obtained. Based on the second spatial transformation matrix, the maximum occlusal tooth-revealing smile 3D facial image is precisely aligned with the resting closed-lip 3D facial image. In this embodiment, since the maximum occlusal tooth-revealing smile 3D facial image and the resting closed-lip 3D facial image are facial scans of the same user under different expressions, the upper facial regions such as the root of the nose, the middle part of the bridge of the nose, and the forehead show no obvious expression-induced deformation and can be regarded as facial stable regions. Therefore, the stable region of the upper face can be selected as the second stable region and used as the matching basis.

[0142] Based on the registration coordinate system, the three-dimensional model of the teeth and jawbone and the three-dimensional facial image data of the maximum occlusal smile are fused to obtain the teeth-face model. The first spatial transformation matrix is solved and applied to the resting lip-closed three-dimensional facial image data to transform it to the CBCT coordinate system. Then, the second spatial transformation matrix is solved and applied to the maximum occlusal smile three-dimensional facial image data to transform it to the registration coordinate system. Finally, the maximum occlusal smile three-dimensional facial image data is transformed to the CBCT coordinate system via the resting lip-closed three-dimensional facial image, thus achieving the fusion of the maximum occlusal smile three-dimensional facial image with the IOS / CBCT teeth and jaw model.

[0143] Orthodontic data is obtained based on the tooth-face model, and the tooth-face model is output to a display device for visualization.

[0144] The first stable region and the second stable region are obtained based on a stable region method, wherein the stable region method is as follows:

[0145] The first input data and the second input data are segmented to obtain several first sub-regions and several second sub-regions, respectively.

[0146] Obtain the heatmap coordinates of key facial points (such as the outer canthi of the left and right eyes, the tip of the nose, the peak of the upper lip, and several crown apex points) in the first input data and the second input data;

[0147] A stable region is obtained based on the first sub-region, the second sub-region, and the heatmap coordinates.

[0148] Specifically, when obtaining the first stable region, the first input data is the facial soft tissue surface data reconstructed by CBCT in the three-dimensional model of the teeth and jawbone, and the second input data is the resting lip-closed three-dimensional facial image data; when obtaining the second stable region, the first input data is the resting lip-closed three-dimensional facial image data, and the second input data is the maximum occlusal tooth-revealing smile three-dimensional facial image data.

[0149] For example, an existing pre-trained dual-branch deep neural network architecture can be used: one branch is based on a multi-view mask classification model and segments the different regions of the face; the other branch is based on a high-resolution network and predicts the heatmap coordinates of the facial key points. By combining the two outputs, regions that maintain a relatively stable shape under different expressions in the resting lip-closed 3D facial image and the maximum occlusal tooth-showing smile 3D facial image are identified as stable regions for subsequent registration.

[0150] The heatmap coordinates can be obtained by taking the maximum response position (argmax) on the key point heatmap or by using the soft-argmax expectation. When key point detection is implemented based on a two-dimensional rendered projection image, the two-dimensional coordinates can be back-projected to the corresponding three-dimensional mesh / point cloud surface according to the projection mapping relationship to obtain the three-dimensional coordinates.
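As a non-limiting illustration, back-projecting a key point detected in a rendered 2D view onto the 3D surface can be sketched as follows. The sketch assumes a pinhole camera with intrinsic matrix K, a known camera pose, and a depth value for the pixel (for example from the renderer's depth buffer); these inputs, the numeric values, and the function names are assumptions for this example.

```python
import numpy as np

def project_point(p_world, K, cam_to_world):
    """Project a 3D world point to pixel (u, v) and return its depth along the camera z axis."""
    world_to_cam = np.linalg.inv(cam_to_world)
    p_cam = world_to_cam[:3, :3] @ p_world + world_to_cam[:3, 3]
    uvw = K @ p_cam
    return uvw[0] / uvw[2], uvw[1] / uvw[2], p_cam[2]

def backproject_pixel(u, v, depth, K, cam_to_world):
    """Back-project pixel (u, v) with known depth to a 3D point in world coordinates."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # ray with z = 1 in the camera frame
    p_cam = ray * depth
    return cam_to_world[:3, :3] @ p_cam + cam_to_world[:3, 3]

# Hypothetical camera and mesh vertices.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
cam_to_world = np.eye(4)
cam_to_world[:3, 3] = [0.0, 0.0, -500.0]             # camera placed 500 units in front of the face
vertices = np.array([[10.0, -20.0, 0.0],
                     [12.0, -20.0, 1.0],
                     [-30.0, 15.0, 5.0]])

# Round trip: render vertex 0 to a pixel, then recover its 3D position.
u, v, depth = project_point(vertices[0], K, cam_to_world)
p_world = backproject_pixel(u, v, depth, K, cam_to_world)

# Snap the back-projected point to the nearest mesh vertex.
nearest_vertex = int(np.argmin(np.linalg.norm(vertices - p_world, axis=1)))
print(u, v, p_world, nearest_vertex)
```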

[0151] Specifically, based on the first stable region and the registration method, the facial soft tissue surface data reconstructed by cone-beam CT in the three-dimensional model of the teeth and jawbone are spatially aligned with the resting closed-lip three-dimensional facial image data. The registration method is as follows:

[0152] Based on the stable region, the first fast point feature histogram feature of the first point cloud of the third input data is obtained, and the second fast point feature histogram feature of the second point cloud of the fourth input data is obtained.

[0153] Based on the first fast point feature histogram feature and the second fast point feature histogram feature, the first point cloud and the second point cloud are matched to obtain the initial feature point correspondence.

[0154] Based on the random sample consensus (RANSAC) algorithm, an initial rigid body transformation matrix is estimated from the initial feature point correspondences;

[0155] Based on the initial rigid body transformation matrix, the fourth input data is aligned to the third input data to obtain a coarse registration result;

[0156] Based on the coarse registration result, the third input data and the fourth input data are spatially aligned;

[0157] The third input data is the facial soft tissue surface data reconstructed by cone-beam CT in the three-dimensional model of the teeth and jawbone, and the fourth input data is the resting lip-closed three-dimensional facial image data; the first point cloud is the point cloud corresponding to the third input data, and the second point cloud is the point cloud corresponding to the fourth input data.

[0158] By combining Fast Point Feature Histogram (FPFH) feature matching with the RANSAC algorithm, the initial rigid body transformation estimate between the resting closed-lip 3D facial image and the CBCT can be calculated, achieving global coarse registration. Because this feature-based random sampling approach does not depend on the initial pose of the two data sets, it reduces the risk of the registration converging to an incorrect local optimum.

[0159] The specific steps for spatially aligning the third input data and the fourth input data based on the coarse registration result include:

[0160] Based on the iterative closest point (ICP) algorithm, the regional weighting rule, and the coarse registration result, the first point cloud of the third input data and the second point cloud of the fourth input data are spatially aligned.

[0161] The region weighting rule is as follows: the weight of the stable region is higher than the weight of the non-stable region.

[0162] In the fine registration stage, the iterative closest point (ICP) algorithm is executed within the aforementioned stable region, and a regional weighting strategy (i.e., assigning higher weights to corresponding points within the stable region) is introduced to improve registration accuracy.

[0163] Specifically, in the registration coordinate system, based on the second stable region and the registration method, the resting closed-lip three-dimensional facial image data and the maximum occlusal tooth-revealing smile three-dimensional facial image data are registered and aligned, the second spatial transformation matrix is obtained, and the maximum occlusal tooth-revealing smile three-dimensional facial image data is transformed to the registration coordinate system based on the second spatial transformation matrix.

[0164] Specifically, based on the registration method, the first preset surface of the tooth surface model and the second preset surface of the jawbone structure model are spatially aligned.

[0165] Although preferred embodiments of the invention have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including both the preferred embodiments and all changes and modifications falling within the scope of the invention.

[0166] Obviously, those skilled in the art can make various modifications and variations to this invention without departing from its spirit and scope. Therefore, if these modifications and variations fall within the scope of the claims of this invention and their equivalents, this invention also intends to include these modifications and variations.

Claims

1. A method for fusing three-dimensional images of face, jawbone, and teeth, characterized in that the method includes:
acquiring intraoral scan data, cone-beam CT data, resting lip-closed three-dimensional facial image data, and maximum occlusal tooth-showing smile three-dimensional facial image data of the target object;
obtaining a tooth surface model and a jawbone structure model based on the intraoral scan data and the cone-beam CT data, respectively;
registering the tooth surface model and the jawbone structure model to obtain a tooth-jawbone three-dimensional model;
registering the facial soft tissue surface data reconstructed by cone-beam CT in the tooth-jawbone three-dimensional model with the resting lip-closed three-dimensional facial image data to obtain a first spatial transformation matrix, and, based on the first spatial transformation matrix, transforming and aligning the resting lip-closed three-dimensional facial image data to the coordinate system of the tooth-jawbone three-dimensional model to obtain a registration coordinate system;
in the registration coordinate system, registering the transformed and aligned resting lip-closed three-dimensional facial image data with the maximum occlusal tooth-showing smile three-dimensional facial image data to obtain a second spatial transformation matrix, and, based on the second spatial transformation matrix, transforming and aligning the maximum occlusal tooth-showing smile three-dimensional facial image data to the resting lip-closed three-dimensional facial image data, so that the maximum occlusal tooth-showing smile three-dimensional facial image data is transformed and aligned to the registration coordinate system;
fusing, based on the registration coordinate system, the tooth-jawbone three-dimensional model with the aligned maximum occlusal tooth-showing smile three-dimensional facial image data to obtain a tooth-face model; and
obtaining orthodontic data based on the tooth-face model.

2. The method for fusing three-dimensional images of face, jawbone, and teeth according to claim 1, characterized in that the specific steps for obtaining the tooth-jawbone three-dimensional model include: spatially aligning the first preset surface of the tooth surface model and the second preset surface of the jawbone structure model; and updating the first crown data of the jawbone structure model with the second crown data of the tooth surface model to obtain the tooth-jawbone three-dimensional model.

3. The method for fusing three-dimensional images of face, jawbone, and teeth according to claim 2, characterized in that the specific steps for obtaining the jawbone structure model include: obtaining a coarse segmentation model and a fine segmentation model; inputting the cone-beam CT data into the coarse segmentation model to obtain initial segmentation data; extracting, based on the region of interest, the tooth-bone sub-region and the facial soft tissue surface data from the initial segmentation data; inputting the tooth-bone sub-region into the fine segmentation model for fine segmentation to obtain the second preset surface and bone data; and obtaining the jawbone structure model based on the second preset surface, the bone data, and the facial soft tissue surface data.

4. The method for fusing three-dimensional images of the face, jawbone, and teeth according to claim 1, characterized in that obtaining the registration coordinate system comprises:
registering, based on a first stable region, the facial soft tissue surface data reconstructed from the cone-beam CT data in the tooth-jawbone three-dimensional model with the resting lip-closed three-dimensional facial image data, and solving for the first spatial transformation matrix; and
transforming and aligning, based on the first spatial transformation matrix, the resting lip-closed three-dimensional facial image data to the coordinate system of the tooth-jawbone three-dimensional model, so as to obtain the registration coordinate system, which takes the coordinate system of the tooth-jawbone three-dimensional model as its reference.

5. The method for fusing three-dimensional images of the face, jawbone, and teeth according to claim 4, characterized in that obtaining the tooth-face model comprises:
registering, in the registration coordinate system and based on a second stable region, the resting lip-closed three-dimensional facial image data with the maximum occlusal tooth-showing smile three-dimensional facial image data, and solving for the second spatial transformation matrix;
transforming and aligning, based on the second spatial transformation matrix, the maximum occlusal tooth-showing smile three-dimensional facial image data to the resting lip-closed three-dimensional facial image data; and
fusing, based on the registration coordinate system, the tooth-jawbone three-dimensional model with the aligned maximum occlusal tooth-showing smile three-dimensional facial image data to obtain the tooth-face model.

6. The method for fusing three-dimensional images of the face, jawbone, and teeth according to claim 5, characterized in that the first stable region and the second stable region are each obtained by a stable region method, the stable region method comprising:
segmenting first input data and second input data to obtain a plurality of first sub-regions and a plurality of second sub-regions, respectively;
obtaining heatmap coordinates of facial key points in the first input data and the second input data; and
obtaining a stable region based on the first sub-regions, the second sub-regions, and the heatmap coordinates;
wherein, when the first stable region is obtained, the first input data is the facial soft tissue surface data reconstructed from the cone-beam CT data in the tooth-jawbone three-dimensional model, and the second input data is the resting lip-closed three-dimensional facial image data; and when the second stable region is obtained, the first input data is the resting lip-closed three-dimensional facial image data, and the second input data is the maximum occlusal tooth-showing smile three-dimensional facial image data.
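
Claim 6 does not state how the sub-regions and heatmap coordinates are combined, so the following is only one possible reading. It assumes 2D label maps and 2D key-point heatmaps (for example, from a rendered view of the 3D data), decodes each heatmap by taking its peak, and treats a sub-region as stable when every key point inside it keeps its region label and moves by no more than a threshold. The selection rule, the threshold `max_shift`, and all function names are hypothetical.

```python
import numpy as np

def heatmap_to_coords(heatmaps):
    """Decode (K, H, W) heatmaps into (K, 2) integer (row, col) peak coordinates."""
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1).argmax(axis=1)
    return np.stack(np.unravel_index(flat, (h, w)), axis=1)

def stable_subregions(labels_a, labels_b, coords_a, coords_b, max_shift=1.5):
    """Assumed rule: keep region ids whose key points all stay put between inputs."""
    region_a = labels_a[coords_a[:, 0], coords_a[:, 1]]
    region_b = labels_b[coords_b[:, 0], coords_b[:, 1]]
    shift = np.linalg.norm(coords_a - coords_b, axis=1)
    ok = (region_a == region_b) & (shift <= max_shift)
    stable = set()
    for region in np.unique(region_a):
        if ok[region_a == region].all():
            stable.add(int(region))
    return stable

def peaks_to_heatmaps(peaks, shape=(6, 6)):
    """Build toy one-hot heatmaps with a single peak per key point."""
    maps = np.zeros((len(peaks),) + shape)
    for i, (r, c) in enumerate(peaks):
        maps[i, r, c] = 1.0
    return maps

# Toy label map: region 1 = upper face, region 2 = mouth area (assumed layout).
labels = np.ones((6, 6), dtype=int)
labels[3:, :] = 2

coords_a = heatmap_to_coords(peaks_to_heatmaps([(1, 2), (4, 2)]))  # first input
coords_b = heatmap_to_coords(peaks_to_heatmaps([(1, 3), (5, 5)]))  # second input

stable = stable_subregions(labels, labels, coords_a, coords_b)
```

In this toy case the upper-face key point barely moves while the mouth key point shifts considerably, so only region 1 is returned as stable.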

7. The method for fusing three-dimensional images of the face, jawbone, and teeth according to claim 6, characterized in that, based on the first stable region and a registration method, the facial soft tissue surface data reconstructed from the cone-beam CT data in the tooth-jawbone three-dimensional model is spatially aligned with the resting lip-closed three-dimensional facial image data, the registration method comprising:
obtaining, based on the stable region, a first fast point feature histogram (FPFH) feature of a first point cloud of third input data and a second FPFH feature of a second point cloud of fourth input data;
matching the first point cloud with the second point cloud based on the first FPFH feature and the second FPFH feature to obtain initial feature point correspondences;
estimating an initial rigid-body transformation matrix from the initial feature point correspondences using a random sample consensus (RANSAC) algorithm;
aligning the fourth input data to the third input data based on the initial rigid-body transformation matrix to obtain a coarse registration result; and
spatially aligning the third input data and the fourth input data based on the coarse registration result;
wherein the third input data is the facial soft tissue surface data reconstructed from the cone-beam CT data in the tooth-jawbone three-dimensional model, and the fourth input data is the resting lip-closed three-dimensional facial image data; the first point cloud is the point cloud corresponding to the third input data, and the second point cloud is the point cloud corresponding to the fourth input data.
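
The coarse registration stage of claim 7 can be sketched with NumPy alone. This sketch does not compute FPFH features: random vectors stand in for the descriptors, and a few of them are deliberately mismatched to create outlier correspondences. The RANSAC loop and the Kabsch rigid fit are standard techniques, but the function names, the tolerance, the iteration count, and the synthetic data are assumptions.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid fit: returns R, t such that dst ~ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c

def match_descriptors(desc_src, desc_dst):
    """Nearest-neighbour matching in descriptor space; returns (M, 2) index pairs."""
    d2 = ((desc_src[:, None, :] - desc_dst[None, :, :]) ** 2).sum(axis=2)
    return np.stack([np.arange(len(desc_src)), d2.argmin(axis=1)], axis=1)

def ransac_rigid(src, dst, pairs, iters=200, tol=0.05, seed=0):
    """Estimate a rigid transform from putative correspondences with RANSAC."""
    rng = np.random.default_rng(seed)
    p, q = src[pairs[:, 0]], dst[pairs[:, 1]]
    best = np.zeros(len(pairs), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(pairs), size=3, replace=False)
        R, t = kabsch(p[sample], q[sample])
        inliers = np.linalg.norm(p @ R.T + t - q, axis=1) < tol
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("RANSAC found no consistent set of correspondences")
    return kabsch(p[best], q[best])  # refit on all inliers

# Synthetic data (assumed): `moving` plays the fourth input, `fixed` the third.
rng = np.random.default_rng(1)
moving = rng.uniform(-1.0, 1.0, size=(30, 3))
angle = np.deg2rad(25.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.3, -0.2, 0.1])
fixed = moving @ R_true.T + t_true

# Placeholder descriptors; rows 0..4 are reversed to produce wrong matches.
desc_moving = rng.normal(size=(30, 8))
desc_fixed = desc_moving.copy()
desc_fixed[:5] = desc_moving[4::-1]

pairs = match_descriptors(desc_moving, desc_fixed)
R_est, t_est = ransac_rigid(moving, fixed, pairs)
```

RANSAC is useful at this stage because descriptor matching alone typically yields some wrong correspondences, and a plain least-squares fit over all pairs would be pulled away from the true transform by them.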

8. The method for fusing three-dimensional images of the face, jawbone, and teeth according to claim 7, characterized in that spatially aligning the third input data and the fourth input data based on the coarse registration result comprises:
spatially aligning the first point cloud of the third input data and the second point cloud of the fourth input data based on an iterative closest point (ICP) algorithm, a region weighting rule, and the coarse registration result;
wherein the region weighting rule is that the weight of the stable region is higher than the weight of the non-stable region.
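
A region-weighted ICP refinement in the spirit of claim 8 could be sketched as below. The claim only requires that stable points weigh more than non-stable ones; the specific weights (1.0 and 0.1), the brute-force nearest-neighbour search, the fixed iteration count, and the synthetic grid data are assumptions. The identity transform stands in for the coarse registration result.

```python
import numpy as np

def weighted_kabsch(src, dst, w):
    """Weighted least-squares rigid fit: dst ~ src @ R.T + t."""
    w = w / w.sum()
    src_c = (w[:, None] * src).sum(axis=0)
    dst_c = (w[:, None] * dst).sum(axis=0)
    H = (w[:, None] * (src - src_c)).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c

def weighted_icp(moving, fixed, moving_is_stable, R0, t0,
                 iters=20, w_stable=1.0, w_other=0.1):
    """ICP in which stable-region points carry a higher weight than the rest."""
    weights = np.where(moving_is_stable, w_stable, w_other)
    R, t = R0, t0
    for _ in range(iters):
        moved = moving @ R.T + t
        d2 = ((moved[:, None, :] - fixed[None, :, :]) ** 2).sum(axis=2)
        nearest = fixed[d2.argmin(axis=1)]   # closest fixed point per moving point
        R, t = weighted_kabsch(moving, nearest, weights)
    return R, t

# Synthetic data (assumed): a regular grid as the fixed cloud.
axis = np.array([-0.75, -0.25, 0.25, 0.75])
fixed = np.array([[x, y, z] for x in axis for y in axis for z in axis])
angle = np.deg2rad(2.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.02, -0.01, 0.015])
moving = (fixed - t_true) @ R_true           # so that fixed = moving @ R_true.T + t_true
is_stable = fixed[:, 2] > 0                  # upper half treated as the stable region
moving[~is_stable] += np.array([0.0, 0.05, 0.0])  # non-rigid change in the lower half

def stable_rms(R, t):
    """Root-mean-square residual over the stable region only."""
    moved = moving[is_stable] @ R.T + t
    return float(np.sqrt(((moved - fixed[is_stable]) ** 2).sum(axis=1).mean()))

R_w, t_w = weighted_icp(moving, fixed, is_stable, np.eye(3), np.zeros(3))
R_u, t_u = weighted_icp(moving, fixed, is_stable, np.eye(3), np.zeros(3), w_other=1.0)
```

Comparing `stable_rms(R_w, t_w)` with `stable_rms(R_u, t_u)` shows the intent of the weighting rule: down-weighting the region that changes shape keeps it from dragging the rigid fit away from the region that does not.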

9. The method for fusing three-dimensional images of the face, jawbone, and teeth according to claim 8, characterized in that, in the registration coordinate system and based on the second stable region and the registration method, the resting lip-closed three-dimensional facial image data and the maximum occlusal tooth-showing smile three-dimensional facial image data are registered and aligned, and the second spatial transformation matrix is solved.

10. A system for fusing three-dimensional images of the face, jawbone, and teeth, characterized in that the system comprises:
a data unit, configured to acquire intraoral scan data, cone-beam CT data, resting lip-closed three-dimensional facial image data, and maximum occlusal tooth-showing smile three-dimensional facial image data of a target object;
a model unit, configured to obtain a tooth surface model based on the intraoral scan data and a jawbone structure model based on the cone-beam CT data;
a registration unit, configured to register the tooth surface model with the jawbone structure model to obtain a tooth-jawbone three-dimensional model;
to register the facial soft tissue surface data reconstructed from the cone-beam CT data in the tooth-jawbone three-dimensional model with the resting lip-closed three-dimensional facial image data and solve for a first spatial transformation matrix, and, based on the first spatial transformation matrix, to transform and align the resting lip-closed three-dimensional facial image data to the coordinate system of the tooth-jawbone three-dimensional model to obtain a registration coordinate system;
to register, in the registration coordinate system, the transformed and aligned resting lip-closed three-dimensional facial image data with the maximum occlusal tooth-showing smile three-dimensional facial image data and solve for a second spatial transformation matrix, and, based on the second spatial transformation matrix, to transform and align the maximum occlusal tooth-showing smile three-dimensional facial image data to the resting lip-closed three-dimensional facial image data, so that the maximum occlusal tooth-showing smile three-dimensional facial image data is brought into the registration coordinate system;
and to fuse, based on the registration coordinate system, the tooth-jawbone three-dimensional model with the aligned maximum occlusal tooth-showing smile three-dimensional facial image data to obtain a tooth-face model; and
a results unit, configured to obtain orthodontic data based on the tooth-face model.