Intraoperative danger zone generation system based on preoperative and intraoperative three-dimensional mesh fusion
By fusing preoperative and intraoperative 3D meshes, the system generates intraoperative danger zones. This addresses the inability of existing technologies to plan surgical paths in advance, and supports more precise path planning and improved safety.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HEFEI UNIV OF TECH
- Filing Date
- 2023-04-14
- Publication Date
- 2026-06-26
AI Technical Summary
Current technology cannot identify and plan the intraoperative danger areas of minimally invasive surgery before the doctor performs the procedure, resulting in insufficient surgical path planning and affecting the safety and efficiency of the surgery.
The system, based on the fusion of preoperative and intraoperative 3D meshes, uses a registration module to register the preoperative 3D mesh model with the intraoperative 3D mesh model. Combined with the areas to be avoided and the danger distance marked by the doctor, the system generates and displays the intraoperative danger area. The system uses depth estimation and multi-mode registration fusion algorithms to display the danger area in real time.
It improves the precision and safety of surgery, helps doctors plan surgical paths in advance, and enhances surgical efficiency.
Figure CN116421311B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of minimally invasive surgery technology, specifically to a system for generating intraoperative danger zones based on preoperative and intraoperative three-dimensional mesh fusion.
Background Technology
[0002] Compared to traditional open surgery, minimally invasive surgery (such as endoscopic surgery) has advantages such as smaller incisions, less bleeding, and faster recovery, and is gradually being widely adopted.
[0003] Setting up a danger zone during surgery is an important measure for ensuring surgical safety. For example, Chinese patent CN115624382A discloses a holmium laser position warning system, method, device, and medium for pyeloscopes. It defines the danger zone in terms of the distance between the holmium laser and the lens of the pyeloscope. By setting auxiliary scale icons and alarms, it prevents the holmium laser from getting too close to or too far from the lens of the pyeloscope, thus preventing damage to the lens or the human body and further improving the safety of the surgery.
[0004] However, danger zone marking based on the positional relationship between instruments and tissues is mainly aimed at the doctor's operating behavior. This type of technology cannot mark the danger zone before the doctor operates, and therefore cannot support planning the surgical path in advance.
Summary of the Invention
[0005] (I) Technical problems to be solved
[0006] To address the shortcomings of existing technologies, this invention provides an intraoperative danger zone generation system based on preoperative and intraoperative three-dimensional mesh fusion, which solves the technical problem of not being able to mark and plan the surgical path in advance before the doctor's operation.
[0007] (II) Technical Solution
[0008] To achieve the above objectives, the present invention provides the following technical solution:
[0009] A system for generating intraoperative danger zones based on preoperative and intraoperative 3D mesh fusion, characterized in that it includes:
[0010] The registration module is used to register the preoperative 3D mesh model and the intraoperative 3D mesh model, and obtain the coordinates of all vertices of the preoperative 3D mesh model after registration.
[0011] The preoperative three-dimensional mesh model contains tissue semantic information;
[0012] The intraoperative three-dimensional mesh model is obtained based on the depth value of the specified binocular endoscopic image frame;
[0013] The receiving module is used to receive the regions to be avoided marked by the doctor on the region of interest of the preoperative 3D mesh model after registration, as well as the set danger distance;
[0014] The generation module is used to generate and display the three-dimensional mesh model corresponding to the intraoperative danger area based on the three-dimensional mesh model corresponding to the area to be avoided and the danger distance.
[0015] Preferably, the registration module includes:
[0016] The first modeling unit is used to obtain a preoperative three-dimensional mesh model with tissue semantic information;
[0017] The second modeling unit is used to obtain an intraoperative three-dimensional mesh model based on the depth value of a specified binocular endoscopic image frame.
[0018] The feature extraction unit is used to obtain corresponding multi-level features based on the preoperative three-dimensional mesh model and the intraoperative three-dimensional mesh model, respectively.
[0019] The overlap prediction unit is used to obtain the overlapping area of the preoperative 3D mesh model and the intraoperative 3D mesh model based on the multi-level features, and to obtain the pose transformation relationship of the vertices of the preoperative 3D mesh model within the overlapping area.
[0020] The global fusion unit is used to obtain the coordinates of all vertices of the preoperative 3D mesh model after registration, based on the coordinates and pose transformation relationship of vertices in the overlapping region and the coordinates of vertices in the non-overlapping region of the preoperative 3D mesh model.
[0021] The information display unit is used to display the internal tissue information of the preoperative three-dimensional mesh model in the intraoperative three-dimensional mesh model according to the coordinates of all vertices after registration of the preoperative three-dimensional mesh model.
[0022] Preferably, the feature extraction unit uses Chebyshev spectral graph convolution to extract multi-level features of the preoperative 3D mesh model and the intraoperative 3D mesh model:
[0023] $F_{pre}^{n+1} = \sum_{b=0}^{B} T_b(\tilde{L}_{pre})\, F_{pre}^{n}\, \theta_{b}^{n}$
[0024] $F_{in}^{n+1} = \sum_{b=0}^{B} T_b(\tilde{L}_{in})\, F_{in}^{n}\, \theta_{b}^{n}$
[0025] where the preoperative three-dimensional mesh model is defined as $M_{pre} = (V_{pre}, E_{pre})$, with $V_{pre}$ the spatial coordinates of the vertices of the preoperative 3D mesh model and $E_{pre}$ the edges between those vertices; the intraoperative 3D mesh model is defined as $M_{in} = (V_{in}, E_{in})$, with $V_{in}$ the spatial coordinates of the vertices of the intraoperative 3D mesh model and $E_{in}$ the edges between those vertices;
[0026] $F_{pre}^{n+1}$ and $F_{pre}^{n}$ denote the downsampling-scale features of the (n+1)-th and n-th layers of the preoperative tissue model, respectively, with $F_{pre}^{0}$ initialized to $V_{pre}$; $F_{in}^{n+1}$ and $F_{in}^{n}$ denote the features of the (n+1)-th and n-th layers of the intraoperative tissue model, respectively, with $F_{in}^{0}$ initialized to $V_{in}$;
[0027] $T_b(\cdot)$ is the b-th order Chebyshev polynomial, computed from each vertex and its B-ring neighborhood; $\tilde{L}_{in}$ and $\tilde{L}_{pre}$ are the scaled Laplacian matrices computed from the edges $E_{in}$ and $E_{pre}$, respectively; $\theta_{b}^{n}$ are the learnable parameters of the neural network;
[0028] And / or the overlap prediction unit is specifically used for:
[0029] An attention mechanism is used to obtain the overlapping region between the preoperative 3D mesh model and the intraoperative 3D mesh model, including:
[0030]
[0031]
[0032] where $O_{pre}$ denotes the mask of the overlapping region of the preoperative 3D mesh model $M_{pre}$; $O_{in}$ denotes the mask of the overlapping region of the intraoperative 3D mesh model $M_{in}$; self and cross denote the self-attention and cross-attention operations, respectively; and $F_{pre}^{m}$ and $F_{in}^{m}$ denote the m-th downsampling-scale features of the vertices of the preoperative and intraoperative 3D mesh models, respectively.
[0033] According to the masks $O_{pre}$ and $O_{in}$, the vertices within the overlapping region and their features are obtained, and a multilayer perceptron (MLP) is used to compute, for each such vertex of the preoperative 3D mesh model $M_{pre}$, the corresponding point:
[0034]
[0035] where the result is the vertex of the intraoperative 3D mesh model $M_{in}$ that corresponds to the given vertex of the preoperative 3D mesh model $M_{pre}$; the similarity operator denotes the computation of cosine similarity; and the encoding operator denotes the position-encoding operation applied to the vertices of the intraoperative 3D mesh model that lie within the overlapping region;
[0036] A local neighborhood of each vertex is constructed using the k-nearest neighbors (KNN) algorithm, and the rotation matrix is solved using singular value decomposition (SVD), as shown in the following formula:
[0037]
[0038] where the left-hand side denotes the rotation matrix of the vertex; the KNN operator denotes the local neighborhood of the vertex constructed with the KNN algorithm; the source points are the neighborhood points of the vertex in the preoperative 3D mesh model; and the target points are the vertices of the intraoperative 3D mesh model corresponding to those neighborhood points;
[0039] The rotation matrix is used to transform the point cloud coordinates, and an MLP is then used to predict the displacement vector of each vertex, as given by the following formula:
[0040]
[0041] where the result is the displacement vector of a vertex in the overlapping region of the preoperative 3D mesh model; together with the rotation matrix, it constitutes the pose transformation relationship;
[0042] And / or the global fusion unit is specifically used for:
[0043] The rotation matrices and translation vectors of all vertices of the preoperative 3D mesh model are obtained using MLP regression:
[0044]
[0045] where $R_{pre}$ and $t_{pre}$ denote the rotation matrix and translation vector of all vertices of the preoperative 3D mesh model, respectively, and the weights are computed from the distances between the vertices within the overlapping region and all vertices $v_{pre}$ of the preoperative 3D mesh model;
[0046]
[0047] where the left-hand side denotes the coordinates of all vertices of the preoperative 3D mesh model after registration.
[0048] Preferably, during the training phase of the intraoperative danger zone generation system, a training set is generated based on real data:
[0049] Based on the specified feature point pairs between the binocular endoscopic image frames and the preoperative 3D mesh model, a non-rigid algorithm is used to register the preoperative and intraoperative 3D mesh models based on the feature points. For any feature point, we have:
[0050]
[0051] where Non_rigid_ICP denotes the non-rigid ICP registration algorithm; $v_{pre,a}$ denotes the a-th feature point of the preoperative 3D mesh model used for non-rigid registration, and its counterpart is the corresponding feature point of the intraoperative 3D mesh model; $T_G$ denotes the global transfer matrix of the preoperative 3D mesh model; and $T_{l,a}$ denotes the local deformation transfer matrix of feature point $v_{pre,a}$;
[0052] The local deformation transfer matrices $T_l$ of all vertices of the preoperative 3D mesh model are obtained by quaternion interpolation, and the registered coordinate labels of the vertices $v_{pre}$ of the preoperative 3D mesh model are then obtained by applying these transformations.
[0053] Preferably, during the training phase of the intraoperative danger zone generation system, the following supervised loss function is constructed:
[0054]
[0055] where $Loss_s$ denotes the supervised loss function used in the training phase;
[0056] $\beta_s$ and $\gamma_s$ denote the coefficients of the supervised loss terms;
[0057] $N_1$ denotes the number of vertices of the preoperative 3D mesh model $M_{pre}$;
[0058] the first term is the L2 ground-truth loss based on the manually labeled dataset, computed from the coordinates of all vertices of the preoperative 3D mesh model after registration;
[0059] $I_C + II_C + III_C$ denotes the Cauchy-Green invariants, used to constrain the degree of tissue deformation in vivo: $I_C$ constrains the arc length between two points on the surface to remain constant, $II_C$ constrains the surface area of the tissue to remain constant, and $III_C$ constrains the volume of the tissue to remain constant.
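As an illustration of the three invariants named above, the following minimal sketch computes them from a 3x3 deformation gradient F using the standard continuum-mechanics definitions for the right Cauchy-Green tensor C = FᵀF. The function name and the use of a per-element deformation gradient are assumptions for illustration; the text does not state how the invariants are evaluated on the mesh or how they enter the loss.

```python
import numpy as np

def cauchy_green_invariants(F):
    """Return (I_C, II_C, III_C) of the right Cauchy-Green tensor C = F^T F.

    F is a 3x3 deformation gradient (assumed input, not specified by the source).
    """
    F = np.asarray(F, dtype=float)
    C = F.T @ F
    trace_c = np.trace(C)
    I_c = trace_c                                      # related to length change
    II_c = 0.5 * (trace_c ** 2 - np.trace(C @ C))      # related to area change
    III_c = np.linalg.det(C)                           # related to volume change
    return I_c, II_c, III_c

# An undeformed element (F = identity) gives the rest values (3, 3, 1).
rest = cauchy_green_invariants(np.eye(3))
# Stretching by a factor of 2 along x changes all three invariants.
stretched = cauchy_green_invariants(np.diag([2.0, 1.0, 1.0]))
```

A rigid rotation leaves all three invariants at their rest values, which is why penalizing their deviation restricts deformation without penalizing rigid motion.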
[0060] Preferably, the registration module further includes:
[0061] The precision fine-tuning unit is used to introduce an unsupervised loss fine-tuning network to assist the global fusion unit in obtaining the coordinates of all vertices after the preoperative 3D mesh model registration.
[0062] And / or the unsupervised loss fine-tuning network described above, during application, constructs the following unsupervised loss function:
[0063]
[0064] where $Loss_u$ denotes the unsupervised loss function;
[0065] $\beta_u$ and $\gamma_u$ denote the coefficients of the unsupervised loss terms; the registered coordinates are the vertex coordinates of the preoperative 3D mesh model after registration during unsupervised training; for each registered vertex of the preoperative 3D mesh model, the loss uses the closest point in the intraoperative 3D mesh model and the Euclidean distance between the two; for each vertex $v_{in}$ of the intraoperative 3D mesh model, the loss uses the closest point in the registered preoperative 3D mesh model and the Euclidean distance between the two;
[0066] $N_1$ denotes the number of vertices of the preoperative 3D mesh model $M_{pre}$, and $N_2$ denotes the number of vertices of the intraoperative 3D mesh model $M_{in}$;
[0067] the remaining terms are the Cauchy-Green invariants during unsupervised training, which respectively constrain the arc length between two points on the surface, the surface area of the tissue, and the volume of the tissue to remain constant.
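The closest-point part of the unsupervised loss described above can be sketched as a bidirectional nearest-neighbor distance between the registered preoperative vertices and the intraoperative vertices. This is only an illustration under assumptions: the function names are hypothetical, the two directions are averaged over N1 and N2 and simply summed, and the Cauchy-Green terms and the coefficients are omitted because their exact form is not recoverable from the text.

```python
import numpy as np

def closest_point_distances(source, target):
    """For each source point, Euclidean distance to its closest target point."""
    diff = source[:, None, :] - target[None, :, :]      # shape (Ns, Nt, 3)
    return np.linalg.norm(diff, axis=2).min(axis=1)     # shape (Ns,)

def bidirectional_closest_point_term(registered_pre, intra):
    """Mean pre->intra distance plus mean intra->pre distance (assumed weighting)."""
    pre_to_in = closest_point_distances(registered_pre, intra).mean()   # over N1
    in_to_pre = closest_point_distances(intra, registered_pre).mean()   # over N2
    return pre_to_in + in_to_pre

registered_pre = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
intra = np.array([[0.0, 0.0, 0.0]])
term = bidirectional_closest_point_term(registered_pre, intra)
```

The brute-force pairwise distance matrix is adequate for small meshes; for dense intraoperative point clouds a spatial index would normally replace it.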
[0068] Preferably, the generation module includes:
[0069] The estimation unit is used to obtain and normalize the normal vector of each surface vertex based on the surface vertices of the three-dimensional mesh model corresponding to the region to be avoided, using the isonormal estimation method.
[0070] The expansion unit, based on the spatial coordinates of each surface vertex and its normalized normal vector, combined with the danger distance, expands outward to obtain the surface vertices of the tissue mesh model corresponding to the danger zone;
[0071] $v_{danger} = d_{evade} \times Normal_{evade} + v_{evade}$
[0072] where $v_{danger}$ denotes a surface vertex of the tissue mesh model corresponding to the danger zone; $d_{evade}$ denotes the danger distance; and $Normal_{evade}$ denotes the normalized normal vector corresponding to the surface vertex $v_{evade}$ of the 3D mesh model corresponding to the region to be avoided;
[0073] The connection unit connects the surface vertices of the tissue mesh model corresponding to the danger zone according to the connection relationship between the surface vertices of the preoperative three-dimensional mesh model, and generates and displays the three-dimensional mesh model corresponding to the intraoperative danger zone.
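The estimation, expansion, and connection steps above can be sketched as follows. This is a minimal illustration, assuming a triangle mesh with outward-oriented faces; the normal estimation used here (area-weighted averaging of adjacent face normals) is a common choice swapped in for illustration and is not necessarily the estimation method intended by the text. Function names are hypothetical.

```python
import numpy as np

def vertex_normals(vertices, faces):
    """Unit normal per vertex, from area-weighted adjacent face normals."""
    v = np.asarray(vertices, dtype=float)
    f = np.asarray(faces, dtype=int)
    # Cross product length is twice the triangle area, so this is area-weighted.
    face_n = np.cross(v[f[:, 1]] - v[f[:, 0]], v[f[:, 2]] - v[f[:, 0]])
    normals = np.zeros_like(v)
    for corner in range(3):
        np.add.at(normals, f[:, corner], face_n)
    lengths = np.linalg.norm(normals, axis=1, keepdims=True)
    return normals / np.where(lengths > 0, lengths, 1.0)

def expand_danger_zone(vertices, faces, danger_distance):
    """v_danger = d_evade * Normal_evade + v_evade; connectivity is reused."""
    v = np.asarray(vertices, dtype=float)
    danger_vertices = v + danger_distance * vertex_normals(v, faces)
    return danger_vertices, np.asarray(faces, dtype=int)

# Toy region to avoid: a unit octahedron with outward-oriented faces.
octa_vertices = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                          [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
octa_faces = np.array([[0, 2, 4], [2, 1, 4], [1, 3, 4], [3, 0, 4],
                       [2, 0, 5], [1, 2, 5], [3, 1, 5], [0, 3, 5]])
danger_vertices, danger_faces = expand_danger_zone(octa_vertices, octa_faces, 0.5)
```

Reusing the face list unchanged corresponds to the connection unit: the danger-zone mesh keeps the connectivity of the original surface and only its vertices move.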
[0074] Preferably, the second modeling unit uses an online self-supervised learning depth estimation method based on the binocular endoscope to obtain the depth value of the specified binocular endoscope image frame; the binocular depth estimation network used by this method is able to overfit quickly and can continuously adapt to new scenes using self-supervised information;
[0075] In real-time reconstruction mode, the second modeling unit is specifically used to overfit continuous video frames to obtain the depth value of a specified binocular endoscopic image frame, including:
[0076] The extraction subunit is used to acquire binocular endoscope images and to extract multi-scale features of the current frame image using the encoder network of the current binocular depth estimation network;
[0077] The fusion subunit is used to fuse multi-scale features using the decoder network of the current binocular depth estimation network to obtain the disparity of each pixel in the current frame image;
[0078] The conversion subunit is used to convert disparity into depth based on the camera intrinsic and extrinsic parameters and to output the depth as the result for the current frame image.
[0079] The first estimation subunit is used to update the parameters of the current binocular depth estimation network using a self-supervised loss, without introducing external ground truth, for depth estimation of the next frame image.
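The conversion subunit's disparity-to-depth step can be illustrated with the standard relation for a rectified stereo pair, depth = focal length × baseline / disparity. This is a sketch under assumptions: the images are rectified, the focal length is in pixels, the baseline comes from the stereo extrinsics, and the function and parameter names are hypothetical.

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline, min_disparity=1e-6):
    """Convert a disparity map (pixels) to depth (same unit as the baseline).

    Pixels with disparity at or below min_disparity get depth = inf,
    since they carry no usable depth information.
    """
    d = np.asarray(disparity, dtype=float)
    depth = np.full(d.shape, np.inf)
    valid = d > min_disparity
    depth[valid] = focal_length_px * baseline / d[valid]
    return depth

# Example: 800 px focal length, 4 mm baseline (illustrative values only).
disparity = np.array([[10.0, 20.0],
                      [0.0, 40.0]])
depth = disparity_to_depth(disparity, focal_length_px=800.0, baseline=4.0)
```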
[0080] Preferably, in the precise measurement mode, the second modeling unit is specifically used to overfit key video frames, including:
[0081] The second estimation subunit is used, without introducing external ground truth, to take the binocular depth estimation network obtained in real-time reconstruction mode for the frame preceding the specified binocular endoscopic image frame, update its parameters using the self-supervised loss corresponding to the specified frame until convergence, and then use the converged network to accurately estimate the depth of the specified frame, thereby obtaining the depth value of the specified binocular endoscopic image frame.
[0082] (III) Beneficial Effects
[0083] This invention provides a system for generating intraoperative danger zones based on preoperative and intraoperative 3D mesh fusion. Compared with existing technologies, it has the following advantages:
[0084] This invention combines depth estimation and multi-modal registration fusion algorithms, allowing doctors to flexibly select specific human tissues according to their needs, set danger distances, and obtain and display dangerous areas in real time, thus improving the accuracy and safety of surgery. Because this method alerts doctors to dangerous areas before the actual operation, it helps them plan the surgical path in advance, greatly improving surgical efficiency.
Attached Figure Description
[0085] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0086] Figure 1 is a structural block diagram of an intraoperative danger zone generation system based on preoperative and intraoperative three-dimensional mesh fusion provided in an embodiment of the present invention;
[0087] Figure 2 is a schematic diagram illustrating the technical framework of an online self-supervised learning depth estimation method based on binocular endoscopy, provided in an embodiment of the present invention.
Detailed Implementation
[0088] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0089] This application provides a system for generating intraoperative danger zones based on preoperative and intraoperative three-dimensional mesh fusion, which solves the technical problem of not being able to mark and plan surgical paths in advance before the doctor's operation.
[0090] To solve the above technical problem, the general idea of the technical solution in this application is as follows:
[0091] The embodiments of this invention are primarily applied to, but not limited to, surgical endoscopic scenarios, such as laparoscopic surgery. Specifically, the provided intraoperative danger zone generation system based on preoperative and intraoperative 3D mesh fusion includes a registration module, a receiving module, and a generation module. Wherein:
[0092] The registration module is used to register the preoperative 3D mesh model and the intraoperative 3D mesh model, and obtain the coordinates of all vertices of the preoperative 3D mesh model after registration; the receiving module is used to receive the regions to be avoided marked by the doctor on the region of interest of the registered preoperative 3D mesh model, as well as the set danger distance; the generation module is used to generate and display the 3D mesh model corresponding to the intraoperative danger area based on the 3D mesh model corresponding to the region to be avoided and the danger distance.
[0093] Based on intraoperative reconstruction and multi-modal fusion, and according to the surgeon's operational needs, the tissue boundary of the area to be avoided is expanded by the normal vectors of the surface vertices of the 3D mesh model corresponding to the area to be avoided, generating the target intraoperative danger area, assisting the surgeon in performing the operation, and effectively improving the safety of the operation.
[0094] Furthermore, an intraoperative 3D mesh model can be obtained based on the depth values of specified binocular endoscopic image frames. Specifically, an online self-supervised learning depth estimation method based on binocular endoscopy can be used to obtain the depth values of the specified binocular endoscopic image frames. The binocular depth estimation network used in this method is able to overfit quickly and can continuously adapt to new scenarios using self-supervised information. Moreover, the method provides two modes for determining the depth values of the specified binocular endoscopic image frames: a real-time reconstruction mode and a precise measurement mode.
[0095] The dual-mode switching depth estimation can provide real-time point clouds of intraoperative anatomical structures to help doctors intuitively understand the intraoperative three-dimensional structure. It can also achieve high-precision reconstruction of the binocular endoscopic image frames specified by the doctor based on single-frame overfitting, providing a basis for subsequent processing and balancing speed and accuracy in application.
[0096] To better understand the above technical solutions, the following will provide a detailed explanation of the technical solutions in conjunction with the accompanying drawings and specific implementation methods.
[0097] Example:
[0098] As shown in Figure 1, an intraoperative danger zone generation system based on preoperative and intraoperative three-dimensional mesh fusion provided by an embodiment of the present invention includes:
[0099] The registration module is used to register the preoperative 3D mesh model and the intraoperative 3D mesh model, and obtain the coordinates of all vertices of the preoperative 3D mesh model after registration.
[0100] The preoperative three-dimensional mesh model contains tissue semantic information;
[0101] The intraoperative three-dimensional mesh model is obtained based on the depth value of the specified binocular endoscopic image frame;
[0102] The receiving module is used to receive the regions to be avoided marked by the doctor on the region of interest of the preoperative 3D mesh model after registration, as well as the set danger distance;
[0103] The generation module is used to generate and display the three-dimensional mesh model corresponding to the intraoperative danger area based on the three-dimensional mesh model corresponding to the area to be avoided and the danger distance.
[0104] This invention combines depth estimation and multi-modal registration fusion algorithms, allowing doctors to flexibly select specific human tissues according to their needs, set danger distances, and obtain and display dangerous areas in real time, thus improving surgical accuracy and safety. Because this method alerts doctors to dangerous areas before the actual operation, it helps them plan the surgical path in advance, significantly improving surgical efficiency.
[0105] The following section will detail each component module of the above technical solution:
[0106] The registration module is used to register the preoperative 3D mesh model and the intraoperative 3D mesh model, and obtain the coordinates of all vertices of the preoperative 3D mesh model after registration; wherein the preoperative 3D mesh model contains tissue semantic information; and the intraoperative 3D mesh model is obtained according to the depth value of the specified binocular endoscopic image frame.
[0107] The registration module includes a first modeling unit, a second modeling unit, a feature extraction unit, an overlap prediction unit, a global fusion unit, and a precision fine-tuning unit. Specifically:
[0108] The first modeling unit is used to obtain a preoperative 3D mesh model with tissue semantic information.
[0109] For example, this unit uses software such as 3D Slicer to reconstruct tissues from CT / MRI data and obtain a three-dimensional mesh model. Then, deep learning algorithms such as DeepLab, or manual segmentation, are used to segment tissues such as blood vessels and the liver, ultimately forming a preoperative three-dimensional mesh model with tissue semantic information, $M_{pre} = (V_{pre}, E_{pre})$, where $V_{pre}$ denotes the spatial coordinates of the model vertices and $E_{pre}$ denotes the edges between vertices.
[0110] The second modeling unit is used to obtain an intraoperative three-dimensional mesh model based on the depth value of a specified binocular endoscopic image frame.
[0111] For example, this unit uses an online self-supervised learning depth estimation method based on binocular endoscopy (see below for details) to estimate the depth value D of a pixel, and calculates the spatial coordinates of the pixel in the camera coordinate system using a pinhole camera model:
[0112] $x = \dfrac{(u - c_x)\, D}{f_x}$
[0113] $y = \dfrac{(v - c_y)\, D}{f_y}$
[0114] $z = D$
[0115] where D is the depth estimate of the pixel; (u, v) are the pixel coordinates in the image; and x, y, and z are the coordinates in the camera coordinate system.
[0116] $c_x$, $c_y$, $f_x$, and $f_y$ are the corresponding parameters of the intrinsic parameter matrix of the left or right camera of the binocular endoscope. These parameters are used to convert the image into a point cloud $V_{in} = \{v_{in,a} \mid a = 1, 2, \ldots, N1\}$, where $v_{in,a}$ denotes the spatial coordinates of the a-th pixel.
[0117] Finally, Delaunay triangulation is used to generate the adjacency edges $E_{in}$ of the point cloud $V_{in}$, ultimately forming the intraoperative three-dimensional mesh model $M_{in} = (V_{in}, E_{in})$.
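The back-projection from a depth map to a point cloud described above can be sketched as follows. This is a minimal illustration of the pinhole model, assuming pixel coordinates (u, v) with u along image columns and v along image rows; the function name is hypothetical, and the triangulation step is left out.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project an HxW depth map to an (H*W, 3) point cloud in camera coordinates."""
    depth = np.asarray(depth, dtype=float)
    h, w = depth.shape
    # u indexes columns, v indexes rows (assumed pixel convention).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    z = depth
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Tiny 2x2 depth map with constant depth 2 (illustrative intrinsics).
points = backproject_depth(np.full((2, 2), 2.0), fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

Because the points come from a regular pixel grid, the edges could be obtained by triangulating the 2D pixel coordinates (for example with `scipy.spatial.Delaunay`) and reusing that connectivity for the 3D points; whether the source triangulates in 2D or 3D is not stated.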
[0118] The feature extraction unit is used to obtain corresponding multi-level features based on the preoperative three-dimensional mesh model and the intraoperative three-dimensional mesh model, respectively.
[0119] Specifically, the feature extraction unit uses Chebyshev spectral graph convolution to extract multi-level features from the preoperative and intraoperative 3D mesh models:
[0120] $F_{pre}^{n+1} = \sum_{b=0}^{B} T_b(\tilde{L}_{pre})\, F_{pre}^{n}\, \theta_{b}^{n}$
[0121] $F_{in}^{n+1} = \sum_{b=0}^{B} T_b(\tilde{L}_{in})\, F_{in}^{n}\, \theta_{b}^{n}$
[0122] where the preoperative three-dimensional mesh model is defined as $M_{pre} = (V_{pre}, E_{pre})$, with $V_{pre}$ the spatial coordinates of the vertices of the preoperative 3D mesh model and $E_{pre}$ the edges between those vertices; the intraoperative 3D mesh model is defined as $M_{in} = (V_{in}, E_{in})$, with $V_{in}$ the spatial coordinates of the vertices of the intraoperative 3D mesh model and $E_{in}$ the edges between those vertices;
[0123] $F_{pre}^{n+1}$ and $F_{pre}^{n}$ denote the downsampling-scale features of the (n+1)-th and n-th layers of the preoperative tissue model, respectively, with $F_{pre}^{0}$ initialized to $V_{pre}$; $F_{in}^{n+1}$ and $F_{in}^{n}$ denote the features of the (n+1)-th and n-th layers of the intraoperative tissue model, respectively, with $F_{in}^{0}$ initialized to $V_{in}$;
[0124] $T_b(\cdot)$ is the b-th order Chebyshev polynomial, computed from each vertex and its B-ring neighborhood; $\tilde{L}_{in}$ and $\tilde{L}_{pre}$ are the scaled Laplacian matrices computed from the edges $E_{in}$ and $E_{pre}$, respectively; $\theta_{b}^{n}$ are the learnable parameters of the neural network.
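For readers unfamiliar with Chebyshev spectral graph convolution, the following sketch shows one layer in its standard textbook form: a scaled Laplacian is built from the mesh edges, and the Chebyshev terms are accumulated with the recurrence T_0 = I, T_1 = L, T_b = 2 L T_{b-1} - T_{b-2}. It is an illustration under assumptions, not the patented network: the normalized Laplacian, the absence of an activation or downsampling step, and all function names and parameter shapes are choices made here.

```python
import numpy as np

def scaled_laplacian(num_vertices, edges):
    """Scaled Laplacian 2L/lambda_max - I built from undirected mesh edges."""
    A = np.zeros((num_vertices, num_vertices))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    # Symmetric normalized Laplacian (assumed normalization).
    L = np.eye(num_vertices) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()
    return 2.0 * L / lam_max - np.eye(num_vertices)

def chebyshev_conv(features, L_scaled, theta):
    """One Chebyshev graph convolution layer.

    features: (N, C_in) vertex features; theta: (B + 1, C_in, C_out) weights.
    Order b mixes information from up to the b-ring neighborhood of each vertex.
    """
    T_prev = features                        # T_0(L) F = F
    out = T_prev @ theta[0]
    if len(theta) > 1:
        T_curr = L_scaled @ features         # T_1(L) F = L F
        out = out + T_curr @ theta[1]
        for b in range(2, len(theta)):
            T_next = 2.0 * (L_scaled @ T_curr) - T_prev
            out = out + T_next @ theta[b]
            T_prev, T_curr = T_curr, T_next
    return out

# Toy mesh: a single triangle, features initialized to vertex coordinates.
tri_vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
tri_L = scaled_laplacian(3, [(0, 1), (1, 2), (0, 2)])
tri_theta = np.random.default_rng(0).normal(size=(3, 3, 2))   # B = 2, C_out = 2
tri_features = chebyshev_conv(tri_vertices, tri_L, tri_theta)
```

The dense matrices keep the example short; a real mesh would use sparse matrices, since the Laplacian has only one nonzero per edge endpoint plus the diagonal.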
[0125] The overlap prediction unit is used to obtain the overlapping region of the preoperative 3D mesh model and the intraoperative 3D mesh model based on the multi-level features, and to obtain the pose transformation relationship of the vertices of the preoperative 3D mesh model within the overlapping region.
[0126] Specifically, the overlap prediction unit is used for:
[0127] An attention mechanism is used to obtain the overlapping region between the preoperative 3D mesh model and the intraoperative 3D mesh model, including:
[0128]
[0129]
[0130] where $O_{pre}$ denotes the mask of the overlapping region of the preoperative 3D mesh model $M_{pre}$; $O_{in}$ denotes the mask of the overlapping region of the intraoperative 3D mesh model $M_{in}$; self and cross denote the self-attention and cross-attention operations, respectively; and $F_{pre}^{m}$ and $F_{in}^{m}$ denote the m-th downsampling-scale features of the vertices of the preoperative and intraoperative 3D mesh models, respectively.
[0131] According to the masks $O_{pre}$ and $O_{in}$, the vertices within the overlapping region and their features are obtained, and a multilayer perceptron (MLP) is used to compute, for each such vertex of the preoperative 3D mesh model $M_{pre}$, the corresponding point:
[0132]
[0133] where the result is the vertex of the intraoperative 3D mesh model $M_{in}$ that corresponds to the given vertex of the preoperative 3D mesh model $M_{pre}$; the similarity operator denotes the computation of cosine similarity; and the encoding operator denotes the position-encoding operation applied to the vertices of the intraoperative 3D mesh model that lie within the overlapping region;
[0134] A local neighborhood of each vertex is constructed using the k-nearest neighbors (KNN) algorithm, and the rotation matrix is solved using singular value decomposition (SVD), as shown in the following formula:
[0135]
[0136] where the left-hand side denotes the rotation matrix of the vertex; the KNN operator denotes the local neighborhood of the vertex constructed with the KNN algorithm; the source points are the neighborhood points of the vertex in the preoperative 3D mesh model; and the target points are the vertices of the intraoperative 3D mesh model corresponding to those neighborhood points;
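Solving a rotation from corresponding neighborhood points with SVD is commonly done with the Kabsch procedure, sketched below. This is an illustration under assumptions: the correspondences between preoperative and intraoperative points are taken as given (here, row i of one array matches row i of the other), the neighborhood size k is arbitrary, and the function names are hypothetical. The source formula itself is not reproduced in the text, so this may differ from it in detail.

```python
import numpy as np

def knn_indices(points, query, k):
    """Indices of the k points closest to the query point (brute force)."""
    dists = np.linalg.norm(points - query, axis=1)
    return np.argsort(dists)[:k]

def kabsch_rotation(source, target):
    """Rotation R minimizing sum ||R (s_i - mean_s) - (t_i - mean_t)||^2 via SVD."""
    src = source - source.mean(axis=0)
    tgt = target - target.mean(axis=0)
    H = src.T @ tgt                               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that det(R) = +1 (no reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

# Synthetic check: intraoperative points are a rotated, shifted copy.
rng = np.random.default_rng(0)
pre_points = rng.normal(size=(20, 3))
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
intra_points = pre_points @ R_true.T + np.array([1.0, -2.0, 0.5])

neighborhood = knn_indices(pre_points, pre_points[0], k=6)
R_local = kabsch_rotation(pre_points[neighborhood], intra_points[neighborhood])
```

Centering both point sets removes the translation, so only the rotation is estimated here; in the described system the remaining per-vertex displacement is predicted separately.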
[0137] The rotation matrix is used to transform the point cloud coordinates, and an MLP is then used to predict the displacement vector of each vertex, as given by the following formula:
[0138]
[0139] where the result is the displacement vector of a vertex in the overlapping region of the preoperative 3D mesh model.
[0140] The global fusion unit is used to obtain the coordinates of all vertices of the preoperative 3D mesh model after registration, based on the coordinates and pose transformation relationship of vertices in the overlapping region of the preoperative 3D mesh model and the coordinates of vertices in the non-overlapping region.
[0141] Specifically, the global fusion unit is used for:
[0142] The rotation matrices and translation vectors of all vertices of the preoperative 3D mesh model are obtained using MLP regression:
[0143]
[0144] where $R_{pre}$ and $t_{pre}$ denote the rotation matrix and translation vector of all vertices of the preoperative 3D mesh model, respectively, and the weights are computed from the distances between the vertices within the overlapping region and all vertices $v_{pre}$ of the preoperative 3D mesh model (where all vertices include both the vertices in overlapping regions and the vertices in non-overlapping regions);
[0145]
[0146] where the left-hand side denotes the coordinates of all vertices of the preoperative 3D mesh model after registration.
[0147] Accordingly, the multi-modal fusion network proposed in this embodiment of the invention operates on mesh data. It predicts the overlapping region and its displacement field through the overlap prediction unit, and applies Cauchy-Green invariant constraints to the non-rigid deformation of the preoperative three-dimensional mesh model, which makes the multi-modal fusion more physically reasonable and reduces fusion errors.
[0148] The information display unit is used to display the internal tissue information of the preoperative three-dimensional mesh model in the intraoperative three-dimensional mesh model according to the coordinates of all vertices after registration of the preoperative three-dimensional mesh model.
[0149] For example, in this unit, VR glasses can be used to display the two registered 3D models in a unified coordinate system, or the registered preoperative 3D mesh model can be superimposed on the endoscopic image based on the basic principles of camera imaging. Both of these optional display methods can present internal tissue information to doctors, assist doctors in making clinical decisions, reduce surgical risks, and improve surgical efficiency.
[0150] The precision fine-tuning unit is used to introduce an unsupervised loss fine-tuning network to assist the global fusion unit in obtaining the coordinates of all vertices after the preoperative 3D mesh model registration.
[0151] The reason for introducing the precision fine-tuning unit is that, in this embodiment of the invention, when registering a specified binocular endoscopic image frame, the reconstructed intraoperative 3D mesh model may differ from the dataset due to differences in endoscopic lighting and individual patient characteristics. These differences may lead to a decrease in registration accuracy. Using an unsupervised loss fine-tuning network can improve the registration accuracy.
[0152] Therefore, in the application of the unsupervised loss fine-tuning network, the following unsupervised loss function needs to be constructed:
[0153]
[0154] where $Loss_u$ denotes the unsupervised loss function;
[0155] $\beta_u$ and $\gamma_u$ denote the coefficients of the unsupervised loss terms; the registered-vertex symbol denotes the coordinates of a vertex of the preoperative 3D mesh model after registration during unsupervised training; for each such vertex, the nearest-point symbol denotes the point of the intraoperative 3D mesh model closest to it, and the corresponding distance term is the Euclidean distance between the two; conversely, for each vertex $v_{in}$ of the intraoperative 3D mesh model, the nearest-point symbol denotes the closest point of the registered preoperative 3D mesh model, and the corresponding distance term is the Euclidean distance between $v_{in}$ and that point;
[0156] N1 denotes the number of vertices of the preoperative 3D mesh model $M_{pre}$, and N2 denotes the number of vertices of the intraoperative 3D mesh model $M_{in}$;
[0157] the invariant term denotes the Cauchy-Green invariants during unsupervised training: the first constrains the arc length between two points on the surface to remain constant, the second constrains the surface area of the tissue to remain unchanged, and the third constrains the volume of the tissue to remain constant.
[0158] This invention constructs an unsupervised fine-tuning mechanism that uses the bidirectional nearest-neighbor distance as the loss function, achieving accurate fusion of the preoperative 3D mesh model and the intraoperative 3D mesh model for the specified binocular endoscopic image frame.
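The bidirectional nearest-neighbor term can be illustrated with a minimal NumPy sketch. It is not the patented loss: the function name `bidirectional_nn_loss` and the weights `w_pre` and `w_in` are hypothetical, the nearest point on the other mesh is approximated by its nearest vertex using a brute-force search, each direction is averaged over its own vertex count, and the Cauchy-Green regularization term is omitted.

```python
import numpy as np


def bidirectional_nn_loss(v_reg, v_in, w_pre=1.0, w_in=1.0):
    """Symmetric nearest-vertex distance between two vertex sets.

    v_reg: (N1, 3) registered preoperative vertices.
    v_in:  (N2, 3) intraoperative vertices.
    w_pre, w_in: hypothetical weights for the two directions.
    """
    v_reg = np.asarray(v_reg, dtype=float)
    v_in = np.asarray(v_in, dtype=float)
    # Pairwise Euclidean distances, shape (N1, N2).
    dists = np.linalg.norm(v_reg[:, None, :] - v_in[None, :, :], axis=-1)
    # Registered preoperative vertex -> closest intraoperative vertex.
    pre_to_in = dists.min(axis=1).mean()
    # Intraoperative vertex -> closest registered preoperative vertex.
    in_to_pre = dists.min(axis=0).mean()
    return w_pre * pre_to_in + w_in * in_to_pre
```

The brute-force distance matrix costs O(N1·N2) memory; for meshes of realistic size a spatial index (for example a k-d tree) would normally replace it.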
[0159] It should be noted that, compared with the virtual registration datasets constructed by biomechanical models in the prior art, the embodiments of the present invention use real endoscopic images and medical test data to construct a dataset that is tailored to the characteristics of the flexible dynamic environment in vivo. The network trained on this dataset has higher registration accuracy.
[0160] Specifically, during the training phase of the registration module, a training set is generated based on real data, including:
[0161] Based on feature point pairs between the specified binocular endoscopic image frame and the preoperative 3D mesh model, a non-rigid algorithm is used to register the preoperative and intraoperative 3D mesh models at the feature points. For any feature point:
[0162]
[0163] where Non_rigid_ICP denotes the non-rigid ICP registration algorithm; $v_{pre,a}$ denotes the a-th feature point of the preoperative 3D mesh model used for non-rigid registration, and its counterpart denotes the corresponding feature point of the intraoperative 3D mesh model; $T_G$ denotes the global transformation matrix of the preoperative 3D mesh model; and $T_{l,a}$ denotes the local deformation transformation matrix belonging to feature point $v_{pre,a}$;
[0164] The local deformation transformation matrices $T_l$ of all vertices of the preoperative 3D mesh model are obtained by quaternion interpolation, and the registered coordinate labels of the vertices $v_{pre}$ of the preoperative 3D mesh model are then obtained by applying these transformations.
[0165] Accordingly, during the training phase of the registration module, the following supervised loss function needs to be constructed:
[0166]
[0167] where $Loss_s$ denotes the supervised loss function used during the training phase;
[0168] $\beta_s$ and $\gamma_s$ denote the coefficients of the supervised loss terms;
[0169] N1 denotes the number of vertices of the preoperative 3D mesh model $M_{pre}$;
[0170] the first term denotes the L2 ground-truth loss based on the manually labeled dataset, in which the predicted coordinates are those of all vertices of the preoperative 3D mesh model after registration;
[0171] $I_c + II_c + III_c$ denotes the Cauchy-Green invariants, used to constrain the degree of deformation of in vivo tissue: $I_c$ constrains the arc length between two points on the surface to remain constant, $II_c$ constrains the surface area of the tissue to remain unchanged, and $III_c$ constrains the volume of the tissue to remain constant.
[0172] The receiving module is used to receive the regions to be avoided marked by the doctor on the region of interest of the preoperative 3D mesh model after registration, as well as the set danger distance.
[0173] Since the preoperative 3D mesh model contains tissue semantic information, for example, different colors (blue, green, etc.) are used to distinguish and display different regions (blood vessels, tumors, etc.) in the tissue, the registered preoperative 3D mesh model also contains tissue semantic information.
[0174] Under the above conditions, the area to be avoided refers to the 3D mesh model $M_{evade} = (V_{evade}, E_{evade})$ that the doctor selects by choosing the corresponding color, where $V_{evade}$ denotes the spatial coordinates of the surface vertices of the 3D mesh model corresponding to the region to be avoided and is a subset of the registered vertex set of the preoperative 3D mesh model; $E_{evade}$ denotes the connectivity between the surface vertices of the 3D mesh model corresponding to the region to be avoided and is a subset of $E_{pre}$.
[0175] Define the danger distance $d_{evade} \in \mathbb{R}$.
[0176] The generation module is used to generate and display the three-dimensional mesh model corresponding to the intraoperative danger area based on the three-dimensional mesh model corresponding to the area to be avoided and the danger distance.
[0177] The generation module includes:
[0178] The estimation unit applies a normal estimation method to the surface vertices $v_{evade}$ of the 3D mesh model $M_{evade}$ corresponding to the area to be avoided, obtaining the normalized normal vector $Normal_{evade} \in \mathbb{R}^3$ of each surface vertex $v_{evade}$;
[0179] The expansion unit, based on the spatial coordinates of each surface vertex and its normalized normal vector, combined with the danger distance, expands the surface to obtain the surface vertices $v_{danger}$ of the tissue mesh model corresponding to the danger region:
[0180] $v_{danger} = d_{evade} \times Normal_{evade} + v_{evade}$
[0181] The connection unit connects the surface vertices of the tissue mesh model corresponding to the danger zone according to the connectivity between the surface vertices of the preoperative 3D mesh model (specifically, the subset $E_{evade}$ of $E_{pre}$), thereby generating and displaying the 3D mesh model $M_{danger} = (V_{danger}, E_{evade})$ corresponding to the intraoperative danger zone.
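The estimate, expand, and connect steps of the generation module can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the mesh is assumed to be a triangle mesh with consistently outward-oriented faces (the text itself only specifies vertices and edges), normals are estimated by area-weighted averaging of adjacent face normals, and the function names `vertex_normals` and `expand_danger_mesh` are hypothetical.

```python
import numpy as np


def vertex_normals(vertices, faces):
    """Area-weighted unit normal per vertex of a triangle mesh."""
    v = np.asarray(vertices, dtype=float)
    f = np.asarray(faces, dtype=int)
    # Face normals; their length equals twice the triangle area,
    # so summing them weights each face by its area.
    fn = np.cross(v[f[:, 1]] - v[f[:, 0]], v[f[:, 2]] - v[f[:, 0]])
    normals = np.zeros_like(v)
    for corner in range(3):
        # Unbuffered accumulation so shared vertices receive every face.
        np.add.at(normals, f[:, corner], fn)
    lengths = np.linalg.norm(normals, axis=1, keepdims=True)
    return normals / np.where(lengths > 0, lengths, 1.0)


def expand_danger_mesh(v_evade, faces, d_evade):
    """Offset each vertex along its normal: v_danger = d_evade * n + v_evade.

    The connectivity is reused unchanged, mirroring M_danger = (V_danger, E_evade).
    """
    normals = vertex_normals(v_evade, faces)
    v_danger = d_evade * normals + np.asarray(v_evade, dtype=float)
    return v_danger, faces
```

Because connectivity is reused, the danger mesh has the same topology as the avoided region. Large offsets on strongly concave surfaces can produce self-intersections, which this sketch does not handle.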
[0182] For example, VR glasses can be used to display the intraoperative danger area in three dimensions, or the danger area can be superimposed on the binocular endoscopic image and displayed to the doctor based on the basic principles of camera imaging.
[0183] It is readily understood that the danger zone generated by the embodiments of the present invention can assist doctors during operation in at least the following ways:
[0184] (1) In the traditional binocular endoscopic surgery scenario, if the instrument moves into the set danger zone when the doctor is performing the surgery, the system will issue a text or sound prompt to the doctor to remind the doctor to operate with caution.
[0185] (2) In the scenario of binocular endoscopy assisted by surgical robot, when the instrument is close to the edge of the danger zone, the system can also apply a force away from the danger zone to the doctor, reminding the doctor that he is about to enter the danger zone. The doctor can move the surgical instrument into the danger zone by applying more force.
[0186] In addition to the factors mentioned above that may affect the fusion accuracy, how the second modeling unit obtains the depth value of the specified binocular endoscope image frame is also a key factor, as it directly affects the accuracy of the intraoperative 3D mesh model.
[0187] Based on this, the second modeling unit adopts an online self-supervised depth estimation method based on the binocular endoscope to obtain the depth value of the specified binocular endoscope image frame; the binocular depth estimation network used by this method can overfit quickly and can continuously adapt to new scenes using self-supervised information;
[0188] In real-time reconstruction mode, the second modeling unit is specifically used to overfit consecutive video frames to obtain the depth value of the specified binocular endoscopic image frame, and includes:
[0189] The extraction subunit is used to acquire binocular endoscope images and to extract multi-scale features of the current frame image using the encoder network of the current binocular depth estimation network;
[0190] The fusion subunit is used to fuse the multi-scale features using the decoder network of the current binocular depth estimation network to obtain the disparity of each pixel in the current frame image;
[0191] The conversion subunit is used to convert disparity into depth based on the camera intrinsic and extrinsic parameters and output it as the result for the current frame image;
[0192] The first estimation subunit is used to update the parameters of the current stereo depth estimation network using self-supervised loss without introducing external ground truth, for depth estimation of the next frame image.
[0193] This depth estimation scheme utilizes the similarity of consecutive frames to extend the overfitting idea from a pair of binocular images to overfitting over time series. By continuously updating the model parameters through online learning, it can obtain high-precision tissue depth in various binocular endoscopic surgical environments.
[0194] The pre-training stage of the stereo depth estimation network abandons the traditional training mode and adopts the idea of meta-learning: the network learns the depth of one image in order to predict the depth of another image, and the resulting loss is used to update the network. This effectively promotes the network's generalization to new scenes, improves its robustness to low-texture and complex-lighting conditions, and significantly reduces the time required for subsequent overfitting.
[0195] As shown in part b of Figure 2, the initial model parameters of the stereo depth estimation network are obtained through meta-learning training, specifically including:
[0196] S100: randomly select an even number of stereo image pairs $\{e_1, e_2, \ldots, e_{2K}\}$ and divide them equally into a support set and a query set; images from the support set and the query set are randomly paired to form K tasks;
[0197] S200, inner-loop training: for each task, the loss is calculated on the support-set image and used to perform one parameter update;
[0198]
[0199] where the updated-parameter symbol denotes the network parameters after the inner-loop update; $\nabla$ denotes the derivative; $\alpha$ is the learning rate of the inner loop; the support-image symbol denotes the support-set image of the k-th task; the loss term is the loss calculated with the initial model parameters $\phi_m$; and $f$ denotes the stereo depth estimation network;
[0200] S300, outer-loop training: the meta-learning loss is calculated on the query-set image of each task using the updated model, and the initial model parameters $\phi_m$ are directly updated to $\phi_{m+1}$;
[0201]
[0202] where $\beta$ is the learning rate of the outer loop, the query-image symbol denotes the query-set image of the k-th task, and the loss term is the meta-learning loss.
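The inner-loop and outer-loop structure of steps S100 to S300 follows the familiar model-agnostic meta-learning pattern. The toy sketch below shows that pattern on a one-parameter model $f(w; x) = w \cdot x$ with a squared-error loss, so that the derivative through the inner update can be written analytically. It is a stand-in for illustration only: the actual network, the self-supervised loss, and whether task losses are summed or averaged are not specified here, and the names `loss_and_grad` and `meta_step` are hypothetical.

```python
import numpy as np


def loss_and_grad(w, x, y):
    """Mean squared error of the toy model f(w; x) = w * x, and dLoss/dw."""
    residual = w * x - y
    loss = float(np.mean(residual ** 2))
    grad = float(2.0 * np.mean(residual * x))
    return loss, grad


def meta_step(w, tasks, alpha=0.05, beta=0.05):
    """One outer-loop update of the initial parameter w.

    tasks: list of ((x_support, y_support), (x_query, y_query)) pairs.
    alpha: inner-loop learning rate; beta: outer-loop learning rate.
    Task gradients are summed here, which is an assumption.
    """
    meta_grad = 0.0
    for (xs, ys), (xq, yq) in tasks:
        # Inner loop: one gradient step on the support sample.
        _, g_support = loss_and_grad(w, xs, ys)
        w_k = w - alpha * g_support
        # Outer loop: query loss evaluated at the adapted parameter w_k,
        # differentiated with respect to the initial parameter w.
        _, g_query = loss_and_grad(w_k, xq, yq)
        dwk_dw = 1.0 - alpha * 2.0 * float(np.mean(xs ** 2))
        meta_grad += g_query * dwk_dw
    return w - beta * meta_grad
```

In a real network the factor `dwk_dw` becomes a second-order term that an automatic differentiation framework would compute; first-order approximations that drop it are also common.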
[0203] The following is a detailed description of each sub-unit included in the second modeling unit:
[0204] As shown in part a of Figure 2, the extraction subunit acquires binocular endoscopic images and uses the encoder network of the current binocular depth estimation network to extract multi-scale features of the current frame image.
[0205] For example, the encoder of the stereo depth estimation network in this subunit uses a ResNet18 network to extract feature maps at five scales for the current frame image (left and right eyes) respectively.
[0206] As shown in part a of Figure 2, the fusion subunit uses the decoder network of the current binocular depth estimation network to fuse the multi-scale features and obtain the disparity of each pixel in the current frame image; specifically:
[0207] The decoder network processes the coarse-scale feature map through a convolutional block and upsampling, concatenates it with the fine-scale feature map, and then performs feature fusion through another convolutional block. Each convolutional block is built from reflection padding, a convolutional layer, and an ELU nonlinear activation unit.
[0208] Calculate the disparity directly based on the output with the highest network resolution:
[0209] $d = k \cdot (\mathrm{sigmoid}(\mathrm{conv}(Y)) - TH)$
[0210] Where d represents the disparity estimate of a pixel; k is the preset maximum disparity range; Y is the highest resolution output; TH represents a parameter related to the type of binocular endoscope, which is 0.5 when the endoscope image has negative disparity and 0 when all endoscope images have positive disparity; conv is a convolutional layer; and sigmoid performs range normalization.
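As a small worked illustration of this output mapping, the sketch below applies the sigmoid normalization and the offset TH element-wise to an array that stands in for the convolution output conv(Y). The function name `disparity_from_logits` is hypothetical, and the convolution itself is not modeled.

```python
import numpy as np


def disparity_from_logits(conv_out, k, th):
    """Map raw convolution outputs to disparity: d = k * (sigmoid(conv_out) - th).

    conv_out: array standing in for conv(Y).
    k: preset maximum disparity range.
    th: 0.5 if negative disparities can occur, 0 if all disparities are positive.
    """
    conv_out = np.asarray(conv_out, dtype=float)
    sigmoid = 1.0 / (1.0 + np.exp(-conv_out))
    return k * (sigmoid - th)
```

Since the sigmoid lies strictly between 0 and 1, the disparity lies in (-k/2, k/2) when TH = 0.5 and in (0, k) when TH = 0.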
[0211] The conversion subunit converts disparity into depth based on the camera intrinsic and extrinsic parameters and outputs it as the result for the current frame image.
[0212] In this subunit, disparity is converted to depth as follows:
[0213]
[0214] where $c_{x1}$ and its right-eye counterpart are the corresponding parameters in the intrinsic parameter matrices of the left-eye and right-eye cameras of the binocular endoscope; if $f_x$ takes the corresponding intrinsic parameter of the left-eye camera, then $d$ is the disparity estimate of the left-eye pixel and $D$ is the depth estimate of the left-eye pixel; if $f_x$ takes the corresponding intrinsic parameter of the right-eye camera, then $d$ is the disparity estimate of the right-eye pixel and $D$ is the depth estimate of the right-eye pixel; $b$ is the baseline length, i.e., the extrinsic parameter of the binocular camera.
[0215] As shown in part b of Figure 2, the first estimation subunit uses the self-supervised loss to update the parameters of the current stereo depth estimation network without introducing external ground truth, for depth estimation of the next frame image.
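The conversion formula itself is not reproduced in this text, so the sketch below uses the standard rectified-stereo relation as an assumption: depth equals focal length times baseline divided by disparity, with the disparity corrected by the difference between the two principal-point abscissas. The sign of that correction depends on the rectification convention and is assumed here; the function name `disparity_to_depth` and the parameter `c_x2` (the right-eye counterpart of `c_x1`) are hypothetical.

```python
import numpy as np


def disparity_to_depth(d, f_x, b, c_x1=0.0, c_x2=0.0, eps=1e-6):
    """Assumed relation: D = f_x * b / (d - (c_x1 - c_x2)).

    d: disparity in pixels (scalar or array).
    f_x: focal length in pixels; b: baseline length.
    Pixels whose corrected disparity is numerically zero are returned as NaN.
    """
    d = np.asarray(d, dtype=float)
    denom = d - (c_x1 - c_x2)
    valid = np.abs(denom) > eps
    # Substitute 1.0 in invalid positions to avoid division warnings,
    # then overwrite those positions with NaN.
    return np.where(valid, f_x * b / np.where(valid, denom, 1.0), np.nan)
```

The depth is returned in the same unit as the baseline `b`, since `f_x` and the disparity are both measured in pixels.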
[0216] It is easy to understand that the "external truth value" mentioned in the embodiments of the present invention is the label (or "supervision information"), which is a well-known expression in the art.
[0217] In this subunit, as shown in part b of Figure 2, the self-supervised loss is expressed as:
[0218]
[0219] where $L_{self}$ denotes the self-supervised loss; α1, α2, α3, and α4 are hyperparameters; the subscript l corresponds to the left image and r to the right image.
[0220] Since both eyes observe the same scene, the values of corresponding pixels in the left and right depth maps should be equal once transformed into the same coordinate system. Geometric consistency losses are therefore introduced for the left and right images.
[0221] (1) Geometric consistency loss for the left image:
[0222]
[0223] where P1 denotes the first set of valid pixels (i.e., the valid pixels of the right eye); the first depth term denotes, for a valid pixel p, the left-eye depth obtained by transforming the right-eye depth map through the camera pose; and $D'_l(p)$ denotes, for a valid pixel p, the left-eye depth obtained by sampling the left-eye depth map using the predicted right-image disparity $Dis_R$.
[0224] (2) Geometric consistency loss for the right image:
[0225]
[0226] where P2 denotes the second set of valid pixels (i.e., the valid pixels of the left eye); the first depth term denotes, for a valid pixel p, the right-eye depth obtained by transforming the left-eye depth map through the camera pose; and $D'_r(p)$ denotes, for a valid pixel p, the right-eye depth obtained by sampling the right-eye depth map using the predicted left-image disparity $Dis_L$.
[0227] By incorporating geometric consistency constraints into the training loss, the network's general applicability to hardware is ensured, enabling it to autonomously adapt to unconventional binocular images such as surgical endoscopes.
[0228] Assuming constant brightness and spatial smoothness in endoscopic surgery, one view can be reconstructed from the other by reprojection between the left-eye and right-eye images. A structural similarity loss is additionally introduced, which normalizes and compares the brightness, contrast, and structure of the two images, giving photometric losses for the left and right images.
[0229] (3) Photometric loss for the left image:
[0230]
[0231] where $I_L(p)$ denotes the left image; $I'_L(p)$ denotes the left-eye endoscope image reconstructed from the right image using the predicted left-image disparity $Dis_L(p)$; $\lambda_i$ and $\lambda_s$ are balancing parameters; and $SSIM_{LL'}(p)$ denotes the image structural similarity between $I_L(p)$ and $I'_L(p)$;
[0232] (4) Photometric loss for the right image:
[0233]
[0234] where $I_R(p)$ denotes the right image; $I'_R(p)$ denotes the right-eye endoscope image reconstructed from the left image using the predicted right-image disparity $Dis_R(p)$; and $SSIM_{RR'}(p)$ denotes the image structural similarity between $I_R(p)$ and $I'_R(p)$.
[0235] In low-texture and single-color tissue regions, a smoothness prior is used to aid inference and regularize the depth, introducing smoothness losses for the left and right images.
[0236] (5) Smoothness loss for the left image:
[0237]
[0238] where the depth term denotes the normalized left-eye depth map, and the two derivative operators denote the first derivatives along the horizontal and vertical directions of the image;
[0239] (6) Smoothness loss for the right image:
[0240]
[0241] where the depth term denotes the normalized right-eye depth map, and the two derivative operators denote the first derivatives along the horizontal and vertical directions of the image.
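A smoothness term built from a normalized depth map and its first derivatives can be sketched as below. The exact formula is not reproduced in this text, so several choices here are assumptions: the depth map is normalized by its mean, the derivatives are forward differences, the penalty is a plain L1 norm without edge-aware weighting, and the function name `smoothness_loss` is hypothetical.

```python
import numpy as np


def smoothness_loss(depth, eps=1e-7):
    """L1 penalty on first derivatives of a mean-normalized depth map.

    depth: (H, W) array with H >= 2 and W >= 2.
    """
    depth = np.asarray(depth, dtype=float)
    # Mean normalization makes the penalty independent of the depth scale.
    norm = depth / (depth.mean() + eps)
    dx = np.abs(np.diff(norm, axis=1))  # horizontal first derivative
    dy = np.abs(np.diff(norm, axis=0))  # vertical first derivative
    return float(dx.mean() + dy.mean())
```

The same function applies unchanged to the left-eye and right-eye depth maps, giving the two smoothness terms of the combined loss.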
[0242] Specifically, the process of obtaining the first set of valid pixels P1 and the second set of valid pixels P2 is as follows:
[0243] Denote the left-eye disparity and the right-eye disparity predicted by the current binocular depth estimation network by their respective symbols; the cross-validation masks for the left and right eyes are then expressed as follows:
[0244]
[0245]
[0246] where the two masks are used to determine whether the pixel at position (i, j) in the left-eye and right-eye images is within the stereo matching range; i takes any integer value in [1, W]; j takes any integer value in [1, H]; W denotes the image width and H the image height;
[0247] Let c take the value L or R; when the cross-validation mask for view c takes the value 1 under the above calculation, the pixel at position (i, j) is within the stereo matching range; otherwise, it is not.
[0248] Projection is performed using the camera model, the binocular pose transformation, and the predicted depth to obtain a valid-region mask based on 3D points, which takes the value 0 or 1; when it takes the value 1 under the above calculation, the pixel at position (i, j) is within the stereo matching range; otherwise, it is not.
[0249] The final valid-region mask is then obtained:
[0250]
[0251] The pixels p that satisfy the final valid-region mask form the valid pixel sets: when c is R, the first set of valid pixels P1 is obtained; when c is L, the second set of valid pixels P2 is obtained.
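The mask formulas are not reproduced in this text, so the sketch below only illustrates one common way to build such masks, under explicit assumptions: a pixel passes the cross-validation check when its disparity-shifted column still falls inside the other image (left pixels shift toward smaller columns, right pixels toward larger ones), and the final mask is the element-wise product of that check with a 3D-projection mask supplied by the caller. The function names `in_range_mask` and `final_valid_mask` are hypothetical, and columns are 0-based.

```python
import numpy as np


def in_range_mask(disp, view):
    """Return 1 where the matched column lies inside the other image, else 0.

    disp: (H, W) predicted disparity for the given view.
    view: "L" for the left image, "R" for the right image.
    """
    disp = np.asarray(disp, dtype=float)
    _, w = disp.shape
    cols = np.arange(w)[None, :]
    # Assumed sign convention: a left pixel at column i matches column
    # i - disp in the right image; a right pixel matches i + disp.
    target = cols - disp if view == "L" else cols + disp
    return ((target >= 0) & (target <= w - 1)).astype(np.uint8)


def final_valid_mask(disp, view, mask_3d):
    """Combine the in-range check with a 0/1 mask from 3D-point projection."""
    return in_range_mask(disp, view) * np.asarray(mask_3d, dtype=np.uint8)
```

Under this convention, the pixels where `final_valid_mask(..., "R", ...)` equals 1 would form P1 and those where `final_valid_mask(..., "L", ...)` equals 1 would form P2.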
[0252] In rectified stereo images, the additional regions caused by the viewpoint shift have no matching pixels. However, this embodiment of the invention recognizes that the low texture and uneven illumination of in vivo tissue make local features less distinctive, so pixels in these invalid regions often find similar pixels in neighboring regions. Therefore, as described above, this embodiment proposes a cross-validation-based binocular valid-region recognition algorithm, which eliminates the misleading effect that the self-supervised loss of invalid-region pixels has on network learning and improves the accuracy of depth estimation.
[0253] In addition, to avoid insufficient robustness of depth estimation in textureless or low-light scenes, a sparse optical flow loss is also introduced.
[0254] (7) Sparse optical flow loss:
[0255]
[0256] where $Dis_L(p)$ denotes the predicted left-eye disparity map and $OF_L(p)$ denotes the left-eye sparse disparity map; $Dis_R(p)$ denotes the predicted right-eye disparity map and $OF_R(p)$ denotes the right-eye sparse disparity map; P3 denotes the third set of valid pixels, taken from the left-eye sparse disparity map $OF_L(p)$; P4 denotes the fourth set of valid pixels, taken from the right-eye sparse disparity map $OF_R(p)$; γ1 and γ2 are balancing parameters, both non-negative and not both equal to 0.
[0257] Specifically, the process for obtaining the third set of valid pixels P3 and the fourth set of valid pixels P4 is as follows:
[0258] Using the Lucas-Kanade (LK) optical flow solution algorithm, sparse optical flow (Δx, Δy) is calculated every n pixels in the row and column directions, where Δx represents the offset of the pixel point in the horizontal direction and Δy represents the offset of the pixel point in the vertical direction;
[0259] When solving the optical flow from the left image to the right image, the disparity at a pixel position is retained as Δx only when the condition involving the threshold KT is satisfied and Δx > thd1, where KT and thd1 are the corresponding preset thresholds. The disparity at positions that do not meet these conditions, or where sparse optical flow is not calculated, is set to 0 to obtain the final sparse disparity map $OF_L(p)$; the pixels with $OF_L(p) \neq 0$ constitute the third set of valid pixels P3;
[0260] When solving the optical flow from the right image to the left image, the disparity at a pixel position is retained as Δx only when the condition involving the threshold KT is satisfied and Δx < thd2, where thd2 is the corresponding preset threshold. The disparity at positions that do not meet these conditions, or where sparse optical flow is not calculated, is set to 0 to obtain the final sparse disparity map $OF_R(p)$; the pixels with $OF_R(p) \neq 0$ constitute the fourth set of valid pixels P4.
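The filtering of sparse flow into a sparse disparity map, and a loss built on it, can be sketched as follows. Several details are assumptions because the corresponding formulas are not reproduced in this text: the KT condition is taken to be |Δy| < KT (a nearly horizontal flow, as expected for rectified images), the loss is a mean absolute difference over the valid pixels, and the names `sparse_disparity` and `sparse_flow_loss` are hypothetical. The flow arrays are taken as given; they could come from any Lucas-Kanade solver.

```python
import numpy as np


def sparse_disparity(flow_dx, flow_dy, computed, kt, thd, left_to_right=True):
    """Keep dx as disparity where the flow passes the threshold tests, else 0.

    flow_dx, flow_dy: (H, W) horizontal and vertical flow components.
    computed: (H, W) boolean map of positions where flow was actually solved.
    kt: assumed bound on |dy|; thd: thd1 (left to right) or thd2 (right to left).
    """
    dx = np.asarray(flow_dx, dtype=float)
    dy = np.asarray(flow_dy, dtype=float)
    horizontal_ok = dx > thd if left_to_right else dx < thd
    keep = np.asarray(computed, dtype=bool) & (np.abs(dy) < kt) & horizontal_ok
    return np.where(keep, dx, 0.0)


def sparse_flow_loss(dis_l, of_l, dis_r, of_r, gamma1=1.0, gamma2=1.0):
    """gamma1 * mean|Dis_L - OF_L| over P3 + gamma2 * mean|Dis_R - OF_R| over P4."""

    def masked_l1(pred, sparse):
        pred = np.asarray(pred, dtype=float)
        sparse = np.asarray(sparse, dtype=float)
        valid = sparse != 0  # nonzero entries form the valid pixel set
        if not valid.any():
            return 0.0
        return float(np.abs(pred - sparse)[valid].mean())

    return gamma1 * masked_l1(dis_l, of_l) + gamma2 * masked_l1(dis_r, of_r)
```

Because only nonzero entries of the sparse maps contribute, positions where the flow was rejected or never computed leave the network unconstrained by this term.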
[0261] As mentioned above, the embodiment of the present invention introduces traditional Lucas-Kanade optical flow to deduce the sparse disparity between binocular images, giving the network a reasonable learning direction, improving the fast learning ability and reducing the probability of falling into local optimum.
[0262] It should be emphasized that, in addition to the real-time reconstruction mode, the online self-supervised depth estimation method adopted by the second modeling unit in the embodiment of the present invention also provides a precise measurement mode. As shown in part b of Figure 2, in the precise measurement mode, the second modeling unit is specifically used to overfit key video frames, including:
[0263] The second estimation subunit, without introducing external ground truth, according to the binocular depth estimation network obtained in the real-time reconstruction mode from the previous frame image of the specified binocular endoscope image frame, updates the parameters of the foregoing binocular depth estimation network using the self-supervised loss corresponding to the specified binocular endoscope image frame until convergence, and uses the converged binocular depth estimation network for precise depth estimation of the specified binocular endoscope image frame to obtain the depth value of the specified binocular endoscope image frame.
[0264] It should be noted that the technical details of the precise measurement mode, such as the depth estimation network, the self-supervised loss function, the valid-region mask calculation, and the meta-learning pre-training method, are all consistent with those described for the real-time reconstruction mode and are not repeated here.
[0265] In summary, compared with existing technologies, the present invention has the following beneficial effects:
[0266] 1. Based on intraoperative reconstruction and multi-modal fusion, and according to the surgeon's operational needs, the tissue boundary of the area to be avoided is expanded by the normal vector of the surface vertex of the 3D mesh model corresponding to the area to be avoided, generating the target intraoperative danger area, assisting the surgeon in performing the operation, and effectively improving the safety of the operation.
[0267] 2. Because this method can indicate the dangerous areas of the surgical procedure as needed before the actual operation, it can help doctors plan the surgical path in advance, which greatly improves the efficiency of the operation.
[0268] 3. This invention provides an online self-supervised learning depth estimation method based on binocular endoscopy, whose beneficial effects include at least:
[0269] 3.1 The switchable depth estimation modes can provide a real-time point cloud of intraoperative anatomical structures to help doctors intuitively understand the intraoperative three-dimensional structure, and can also achieve high-precision reconstruction of key frames selected by doctors based on single-frame overfitting, providing a basis for subsequent measurements, so that speed and accuracy are balanced in application.
[0270] 3.2 By utilizing the similarity of consecutive frames, the overfitting concept on a pair of binocular images is extended to overfitting on time series. Through online learning, the model parameters are continuously updated, enabling high-precision tissue depth measurements to be obtained in various binocular endoscopic surgical environments.
[0271] 3.3 The pre-training stage of the network model abandons the traditional training mode and adopts the idea of meta-learning, which allows the network to learn the depth of one image to predict the depth of another image, thereby calculating the loss and updating the network. This can effectively promote the generalization of the network to new scenes and improve its robustness to low-texture complex lighting, while significantly reducing the time required for subsequent overfitting.
[0272] 3.4 By incorporating geometric consistency constraints into the training loss, the network's general applicability to hardware is ensured, enabling it to autonomously adapt to unconventional binocular images such as surgical endoscopes.
[0273] 3.5. Depth estimation of each frame of stereo image is treated as an independent task, and high-precision models suitable for the current frame are obtained through real-time overfitting; and new scenes can be learned quickly through online learning to obtain high-precision depth estimation results.
[0274] 3.6. The cross-validation-based binocular valid-region recognition algorithm eliminates the misleading effect that the self-supervised loss of invalid-region pixels has on network learning, improving the accuracy of depth estimation.
[0275] 3.7. Introducing the traditional Lucas-Kanade optical flow to derive the sparse disparity between binocular images provides the network with a reasonable learning direction, improves its rapid learning ability, and reduces the probability of getting trapped in local optima.
[0276] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0277] The above embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A system for generating intraoperative danger zones based on preoperative and intraoperative three-dimensional mesh fusion, characterized in that it comprises:
a registration module, used to register the preoperative 3D mesh model with the intraoperative 3D mesh model and obtain the coordinates of all vertices of the preoperative 3D mesh model after registration, wherein the preoperative 3D mesh model contains tissue semantic information and the intraoperative 3D mesh model is obtained based on the depth value of the specified binocular endoscopic image frame;
a receiving module, used to receive the regions to be avoided marked by the doctor on the region of interest of the preoperative 3D mesh model after registration, as well as the set danger distance;
a generation module, used to generate and display the 3D mesh model corresponding to the intraoperative danger area based on the 3D mesh model corresponding to the area to be avoided and the danger distance;
wherein the registration module includes:
a first modeling unit, used to obtain a preoperative 3D mesh model with tissue semantic information;
a second modeling unit, used to obtain an intraoperative 3D mesh model based on the depth value of the specified binocular endoscopic image frame;
a feature extraction unit, used to obtain corresponding multi-level features from the preoperative 3D mesh model and the intraoperative 3D mesh model, respectively;
an overlap prediction unit, used to obtain the overlapping region of the preoperative 3D mesh model and the intraoperative 3D mesh model based on the multi-level features, and to obtain the pose transformation relationship of the vertices of the preoperative 3D mesh model within the overlapping region;
The global fusion unit, used to obtain the coordinates of all vertices of the preoperative 3D mesh model after registration, based on the coordinates and pose transformation relationship of the vertices in the overlapping region and the coordinates of the vertices in the non-overlapping region of the preoperative 3D mesh model;
The information display unit, used to display the internal tissue information of the preoperative 3D mesh model in the intraoperative 3D mesh model according to the coordinates of all vertices of the preoperative 3D mesh model after registration;
The feature extraction unit uses Chebyshev spectral graph convolution to extract multi-level features from the preoperative and intraoperative 3D mesh models, the features of layer n+1 of each model being computed from the features of layer n of the same model, where:
the preoperative 3D mesh model is defined by the spatial coordinates of its vertices and the edges between those vertices;
the intraoperative 3D mesh model is defined by the spatial coordinates of its vertices and the edges between those vertices;
the downsampling-scale features of layers n+1 and n are defined separately for the preoperative tissue model and for the intraoperative tissue model, each with its own initial value;
the b-th order Chebyshev polynomials are calculated from the respective vertices and their B-ring neighborhoods;
the scaled Laplacian matrices are calculated from the edges of the respective models;
the remaining coefficients are the learnable parameters of the neural network;
And/or the overlap prediction unit is specifically used to:
obtain, using an attention mechanism, the overlapping region between the preoperative 3D mesh model and the intraoperative 3D mesh model, in which self-attention and cross-attention operations are applied to the downsampling-scale features of the vertices of the preoperative and intraoperative 3D mesh models at a given level, yielding a mask of the overlapping region for the preoperative 3D mesh model and a mask of the overlapping region for the intraoperative 3D mesh model;
obtain, according to the two masks, the vertices located within the overlapping region and their features, and use a multilayer perceptron to obtain, for each such vertex of the preoperative 3D mesh model, the corresponding vertex of the intraoperative 3D mesh model, the computation involving the cosine similarity of the features and a position encoding of the vertices of the intraoperative 3D mesh model that lie within the overlapping region;
construct a local neighborhood for each vertex using the k-nearest neighbors (KNN) algorithm, and solve the rotation matrix of that vertex by singular value decomposition (SVD) over the neighborhood points of the preoperative 3D mesh model and the vertices of the intraoperative 3D mesh model corresponding to those neighborhood points;
transform the point cloud coordinates using the rotation matrix, and predict the displacement vector of each vertex from the transformed coordinates;
the displacement vectors of the vertices in the overlapping region of the preoperative 3D mesh model, together with the rotation matrices, constitute the pose transformation relationship;
And/or the global fusion unit is specifically used to:
regress the rotation matrices and translation vectors of all vertices of the preoperative 3D mesh model, using weights calculated from the distances between the vertices within the overlapping region and all vertices of the preoperative 3D mesh model;
obtain, from these rotation matrices and translation vectors, the coordinates of all vertices of the preoperative 3D mesh model after registration.
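The Chebyshev spectral graph convolution named in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed network: it assumes the symmetric normalized graph Laplacian with its largest eigenvalue approximated by 2, dense matrices, and vertex coordinates as the initial features. The names `scaled_laplacian`, `cheb_conv`, `matmul`, and `add` are hypothetical helpers introduced only for this example.

```python
import math

def scaled_laplacian(num_vertices, edges):
    """Dense scaled Laplacian L~ = 2L/lambda_max - I built from mesh edges.

    Assumptions: each undirected edge is listed once, L is the symmetric
    normalized Laplacian, and lambda_max is approximated by 2, which gives
    L~ = -D^{-1/2} A D^{-1/2}.
    """
    degree = [0] * num_vertices
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    lap = [[0.0] * num_vertices for _ in range(num_vertices)]
    for i, j in edges:
        weight = -1.0 / math.sqrt(degree[i] * degree[j])
        lap[i][j] = weight
        lap[j][i] = weight
    return lap

def matmul(a, b):
    """Dense matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def cheb_conv(lap, feats, thetas):
    """Return sum over b of T_b(L~) X theta_b.

    Uses the recurrence T_0 X = X, T_1 X = L~ X,
    T_b X = 2 L~ (T_{b-1} X) - T_{b-2} X.
    """
    t_prev = feats
    out = matmul(t_prev, thetas[0])
    if len(thetas) == 1:
        return out
    t_curr = matmul(lap, feats)
    out = add(out, matmul(t_curr, thetas[1]))
    for theta in thetas[2:]:
        lt = matmul(lap, t_curr)
        t_next = [[2.0 * x - p for x, p in zip(rl, rp)] for rl, rp in zip(lt, t_prev)]
        out = add(out, matmul(t_next, theta))
        t_prev, t_curr = t_curr, t_next
    return out

# Toy mesh: a single triangle whose initial features are its vertex coordinates.
edges = [(0, 1), (1, 2), (0, 2)]
coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Three weight matrices (orders 0..2), each mapping 3 input features to 1 output.
thetas = [[[0.5], [0.5], [0.5]], [[0.1], [0.1], [0.1]], [[0.2], [0.2], [0.2]]]
features = cheb_conv(scaled_laplacian(3, edges), coords, thetas)
print(features)
```

In a trained network the `thetas` would be learned and the same layer would be applied to both the preoperative and the intraoperative mesh; the fixed values here only exercise the recurrence.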
2. The intraoperative danger zone generation system as described in claim 1, characterized in that, during the training phase of the intraoperative danger zone generation system, a training set is generated from real data:
Based on specified feature point pairs between the binocular endoscopic image frames and the preoperative 3D mesh model, a non-rigid registration algorithm registers the preoperative and intraoperative 3D mesh models at the feature points, such that each feature point of the preoperative 3D mesh model used for non-rigid registration is mapped to its corresponding feature point of the intraoperative 3D mesh model by the global transfer matrix of the preoperative 3D mesh model and the local deformation transfer matrix belonging to that feature point;
The local deformation transfer matrices of all vertices of the preoperative 3D mesh model are obtained by quaternion interpolation, and the registered coordinate label of each vertex of the preoperative 3D mesh model is obtained through the resulting transformation relationship.
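Claim 2 relies on quaternion interpolation to spread the deformation known at feature points to the remaining vertices. The sketch below shows only the interpolation primitive, spherical linear interpolation (slerp) between two unit quaternions, in plain Python. How the claimed system weights several feature points per vertex is not stated in the claim, so the single blend factor `t` and the function names `slerp` and `rotate` are assumptions made for illustration.

```python
import math

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to follow the shorter arc.
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: linear blend avoids dividing by ~0.
        q = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        q = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q = (w, x, y, z)."""
    w, u = q[0], q[1:]
    t = tuple(2.0 * c for c in cross(u, v))
    ut = cross(u, t)
    return tuple(v[k] + w * t[k] + ut[k] for k in range(3))

# Two feature-point rotations: identity and a 90 degree turn about the z axis.
q_identity = (1.0, 0.0, 0.0, 0.0)
q_quarter_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))

# A vertex halfway between the two feature points receives the halfway rotation.
q_mid = slerp(q_identity, q_quarter_z, 0.5)
rotated = rotate(q_mid, (1.0, 0.0, 0.0))
print(rotated)
```

Interpolating in quaternion space keeps every intermediate result a valid rotation, which a direct element-wise blend of rotation matrices would not guarantee.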
3. The intraoperative danger zone generation system as described in claim 2, characterized in that, during the training phase of the intraoperative danger zone generation system, a supervised loss function is constructed, the supervised loss function involving:
the coefficients of the supervised loss terms;
the number of vertices of the preoperative 3D mesh model;
an L2 ground-truth loss based on a manually labeled dataset, computed on the coordinates of all vertices of the preoperative 3D mesh model after registration;
Cauchy-Green invariant terms used to constrain the degree of tissue deformation within the body, which respectively constrain the arc distance between two points on the surface to remain constant, the surface area of the tissue to remain unchanged, and the volume of the tissue to remain constant.
4. The intraoperative danger zone generation system as described in claim 1, characterized in that the registration module also includes:
The precision fine-tuning unit, used to introduce an unsupervised loss fine-tuning network to assist the global fusion unit in obtaining the coordinates of all vertices of the preoperative 3D mesh model after registration;
And/or the unsupervised loss fine-tuning network, during application, constructs an unsupervised loss function involving:
the coefficients of the unsupervised loss terms;
the vertex coordinates of the preoperative 3D mesh model after registration during unsupervised training;
for each vertex of the registered preoperative 3D mesh model, the closest point in the intraoperative 3D mesh model and the Euclidean distance between the two;
for each vertex of the intraoperative 3D mesh model, the closest point in the registered preoperative 3D mesh model and the Euclidean distance between the two;
the number of vertices of the preoperative 3D mesh model and the number of vertices of the intraoperative 3D mesh model;
Cauchy-Green invariant terms during unsupervised training, which respectively constrain the arc distance between two points on the surface to remain constant, the surface area of the tissue to remain unchanged, and the volume of the tissue to remain constant.
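The two nearest-point terms described in claim 4 resemble a bidirectional closest-point (Chamfer-style) distance. The following plain-Python sketch makes several assumptions: both terms use the unsquared Euclidean distance, each is averaged over its own vertex count, the two terms are weighted equally, and the Cauchy-Green terms are left out. The function names are hypothetical.

```python
import math

def nearest_distance(point, cloud):
    """Euclidean distance from `point` to its closest point in `cloud`."""
    return min(math.dist(point, other) for other in cloud)

def bidirectional_nearest_point_loss(registered_pre, intra):
    """Mean closest-point distance in both directions between two vertex sets.

    registered_pre: vertices of the preoperative mesh after registration.
    intra: vertices of the intraoperative mesh.
    """
    pre_to_intra = sum(nearest_distance(p, intra) for p in registered_pre)
    intra_to_pre = sum(nearest_distance(q, registered_pre) for q in intra)
    return pre_to_intra / len(registered_pre) + intra_to_pre / len(intra)

# Toy vertex sets: two registered preoperative vertices, one intraoperative vertex.
registered_pre = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
intra = [(0.0, 0.0, 1.0)]
loss = bidirectional_nearest_point_loss(registered_pre, intra)
print(loss)
```

This brute-force search costs one distance evaluation per vertex pair; for full-size meshes a spatial index such as a k-d tree would normally replace the inner `min`.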
5. The intraoperative danger zone generation system as described in any one of claims 1 to 4, characterized in that the generation module includes:
The estimation unit, used to obtain and normalize the normal vector of each surface vertex, based on the surface vertices of the 3D mesh model corresponding to the region to be avoided, using the isonormal estimation method;
The expansion unit, used to expand each surface vertex according to its spatial coordinates, its normalized normal vector, and the danger distance, thereby obtaining the surface vertices of the tissue mesh model corresponding to the danger zone;
The connection unit, used to connect the surface vertices of the tissue mesh model corresponding to the danger zone according to the connection relationship between the surface vertices of the preoperative 3D mesh model, and to generate and display the 3D mesh model corresponding to the intraoperative danger zone.
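The estimate, expand, and connect steps of claim 5 can be sketched in plain Python as a normal-offset of a triangle mesh. Two assumptions apply: vertex normals are estimated by area-weighted averaging of adjacent face normals, which stands in for the normal estimation method named in the claim, and each vertex is moved along its unit normal by exactly the danger distance. The helper names and the tetrahedron test mesh are hypothetical.

```python
import math

def sub(a, b):
    """Component-wise difference of two 3-vectors."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def vertex_normals(vertices, faces):
    """Unit normal per vertex from area-weighted adjacent face normals."""
    sums = [[0.0, 0.0, 0.0] for _ in vertices]
    for a, b, c in faces:
        # The cross product length is twice the face area, so it weights by area.
        n = cross(sub(vertices[b], vertices[a]), sub(vertices[c], vertices[a]))
        for idx in (a, b, c):
            for k in range(3):
                sums[idx][k] += n[k]
    normals = []
    for n in sums:
        length = math.sqrt(sum(c * c for c in n))
        normals.append(tuple(c / length for c in n) if length > 0 else (0.0, 0.0, 0.0))
    return normals

def danger_zone_mesh(vertices, faces, danger_distance):
    """Offset every vertex along its unit normal; the face list is reused as-is."""
    normals = vertex_normals(vertices, faces)
    offset = [tuple(v[k] + danger_distance * n[k] for k in range(3))
              for v, n in zip(vertices, normals)]
    return offset, faces

# Region to be avoided: a regular tetrahedron with outward-facing triangles.
avoid_vertices = [(1.0, 1.0, 1.0), (1.0, -1.0, -1.0), (-1.0, 1.0, -1.0), (-1.0, -1.0, 1.0)]
avoid_faces = [(1, 3, 2), (0, 2, 3), (0, 3, 1), (0, 1, 2)]
danger_vertices, danger_faces = danger_zone_mesh(avoid_vertices, avoid_faces, 0.5)
print(danger_vertices)
```

Reusing the face list mirrors the connection unit, which connects the expanded vertices using the connectivity of the original mesh. The sketch assumes consistently outward-oriented faces; inconsistent winding would push some vertices inward.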
6. The intraoperative danger zone generation system as described in claim 1, characterized in that the second modeling unit uses an online self-supervised learning depth estimation method based on a binocular endoscope to obtain the depth values of the specified binocular endoscopic image frame; the binocular depth estimation network used by the online self-supervised learning depth estimation method is able to overfit quickly and can continuously adapt to new scenes using self-supervised information;
In real-time reconstruction mode, the second modeling unit is specifically used to overfit continuous video frames to obtain the depth values of a specified binocular endoscopic image frame, and includes:
The extraction subunit, used to acquire binocular endoscope images and to extract multi-scale features of the current frame image using the encoder network of the current binocular depth estimation network;
The fusion subunit, used to fuse the multi-scale features using the decoder network of the current binocular depth estimation network to obtain the disparity of each pixel in the current frame image;
The conversion subunit, used to convert disparity into depth based on the camera intrinsic and extrinsic parameters and to output it as the result for the current frame image;
The first estimation subunit, used to update the parameters of the current binocular depth estimation network using a self-supervised loss, without introducing external ground truth, for depth estimation of the next frame image.
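The conversion subunit in claim 6 turns per-pixel disparity into depth. For a rectified stereo pair the standard relation is depth = focal length × baseline / disparity. The short Python sketch below assumes rectified images, a focal length in pixels (an intrinsic parameter), and a baseline in millimetres (an extrinsic parameter). The function names and the sample values are illustrative and not taken from the patent.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_mm):
    """Depth in millimetres for one pixel of a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # No valid stereo match: report the point as infinitely far away.
        return float("inf")
    return focal_length_px * baseline_mm / disparity_px

def depth_map(disparity_map, focal_length_px, baseline_mm):
    """Apply the disparity-to-depth conversion to every pixel of a 2D map."""
    return [[disparity_to_depth(d, focal_length_px, baseline_mm) for d in row]
            for row in disparity_map]

# Hypothetical 2x2 disparity map, 1000 px focal length, 5 mm stereo baseline.
depths = depth_map([[50.0, 25.0], [0.0, 100.0]], focal_length_px=1000.0, baseline_mm=5.0)
print(depths)
```

Because depth is inversely proportional to disparity, a fixed disparity error produces a larger depth error for distant tissue than for tissue close to the endoscope.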
7. The intraoperative danger zone generation system as described in claim 6, characterized in that, in precise measurement mode, the second modeling unit is specifically used to overfit key video frames, and includes:
The second estimation subunit, which, without introducing external ground truth, starts from the binocular depth estimation network obtained in real-time reconstruction mode for the frame preceding the specified binocular endoscopic image frame, updates the parameters of that network using the self-supervised loss corresponding to the specified binocular endoscopic image frame until convergence, and uses the converged binocular depth estimation network to accurately estimate the depth of the specified binocular endoscopic image frame, thereby obtaining the depth values of the specified binocular endoscopic image frame.