An assisted recognition method for bladder neck transection in minimally invasive radical prostatectomy
By performing multi-scale feature extraction and Bezier curve fitting on the laparoscopic video stream, the probability distribution of the optimal and farthest cutting lines of the bladder neck is generated, solving the problem of bladder neck identification in minimally invasive radical prostatectomy, improving surgical accuracy and safety, and ensuring postoperative functional recovery and tumor radical cure.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHENGDU WITHAI INNOVATION TECH CO LTD
- Filing Date
- 2026-04-16
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies make it difficult to accurately identify the optimal detachment point of the bladder neck in minimally invasive radical prostatectomy, and cannot provide clear cutting line guidance, leading to surgical deviations and affecting the recovery of postoperative urinary continence and the radical resection of the tumor.
By acquiring laparoscopic video streams, preprocessing them, extracting multi-scale features, performing positional encoding and deep feature extraction, generating a global feature map, fitting a Bézier curve, outputting the probability distribution of the optimal and furthest cutting lines of the bladder neck, performing organ segmentation, and generating a Gaussian heatmap, we can provide surgeons with precise cutting and positioning guidance.
It enables precise identification of the bladder neck during minimally invasive radical prostatectomy, improves surgical accuracy and safety, helps restore postoperative urinary control, enhances the radical treatment of the tumor, and reduces both reliance on the surgeon's subjective judgment and the risk of operational deviation.
Figure CN122048936B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of image recognition technology, specifically to an auxiliary identification method for bladder neck transection during minimally invasive radical prostatectomy.
Background Technology
[0002] Radical prostatectomy is one of the standard surgical procedures for treating localized prostate cancer. During the surgery, bladder neck transection is a crucial step, requiring the surgeon to accurately identify the anatomical boundary between the bladder and prostate and transect it at the correct location. The accuracy of the bladder neck transection directly affects the recovery of postoperative urinary continence and the completeness of tumor removal.
[0003] However, accurately identifying the bladder neck demarcation point presents significant challenges due to factors such as the limited surgical field in minimally invasive surgery, significant individual differences in anatomical structures, and the influence of anatomical fat in the prostate-bladder region. Traditional methods rely heavily on the surgeon's experience and judgment, which is easily influenced by subjective factors, leading to deviations in the resection position. This may result in insufficient preservation or excessive resection of the bladder neck, affecting postoperative functional recovery.
[0004] Currently, there are no existing methods for predicting the bladder neck transection guideline, and related research mainly focuses on organ segmentation. For example, U-Net (a type of convolutional neural network) is used for automatic segmentation based on prostate cancer magnetic resonance imaging to segment the prostate, prostate tumor, seminal vesicle, rectus muscle, neurovascular bundle, and dorsal vein complex; and semantic segmentation of surgical instruments, bladder, prostate, and seminal vesicle-vas deferens is performed directly on the intraoperative raw images.
[0005] However, the existing technologies have obvious limitations: existing organ segmentation methods can only identify the general areas of the bladder and prostate, and cannot predict the optimal cutting line and safe boundary for bladder neck transection. In other words, existing technologies can only tell the surgeon "where the bladder and prostate are", but cannot directly indicate "where to transect them", which makes it difficult to meet the actual needs of precise cutting and positioning in clinical surgery.
[0006] Therefore, we propose an auxiliary identification method that can automatically identify the optimal bladder neck dissection location and provide clear cutting line guidance.
Summary of the Invention
[0007] The purpose of this invention is to provide an auxiliary identification method for bladder neck transection in minimally invasive radical prostatectomy, which solves the problems of difficulty in identifying the optimal transection position of the bladder neck and the inability to provide clear cutting line guidance in traditional methods.
[0008] This invention is achieved through the following technical solution:
[0009] A method for assisting in the identification of bladder neck transection during minimally invasive radical prostatectomy, specifically including:
[0010] Acquire the video stream from the endoscope and perform preprocessing;
[0011] Extract multi-scale features from real-time images in the preprocessed video stream to generate multi-scale feature maps;
[0012] A global feature map is obtained by performing position encoding and deep feature extraction based on the multi-scale feature maps.
[0013] The global feature map is mapped to the control points of a Bézier curve, and then a smooth cutting line is fitted.
[0014] The global feature map and control points are processed separately to generate a Gaussian heatmap, outputting the probability distribution of the optimal and furthest cutting lines of the bladder neck, as well as the segmentation results of the prostate and bladder.
[0015] Furthermore, the multi-scale feature maps include feature maps with resolutions of 1 / 4, 1 / 8, 1 / 16, and 1 / 32.
[0016] Furthermore, the step of performing position encoding and deep feature extraction based on multi-scale feature maps to obtain a global feature map specifically includes:
[0017] Multi-scale feature maps are fused, and the fused feature maps are then normalized layer by layer.
[0018] Introduce position coordinate encoding to the normalized feature map;
[0019] Deep feature extraction is performed on the position-encoded features using stacked Transformer encoder blocks to obtain a global feature map.
[0020] Furthermore, the step of introducing position coordinate encoding into the normalized feature map involves the following steps:
[0021] Position codes are generated using a sine-cosine function;
[0022] The position codes in the X and Y directions are concatenated and then mapped to the feature channel dimension via linear projection.
[0023] The mapped positional encoding is added to the normalized feature map.
[0024] Furthermore, the specific steps for mapping the global feature map to control points of a Bézier curve are as follows:
[0025] Merge the height and width dimensions of the global feature map to obtain sequence features;
[0026] Global average pooling is used to aggregate sequence features into a global feature vector;
[0027] A multilayer perceptron is used to convert the global feature vector into coordinate regression points;
[0028] Finally, the coordinate regression points were reorganized into multiple control points.
[0029] Furthermore, the process of fitting a smooth cutting line involves the following steps:
[0030] The control points are fitted using cubic Bézier curves, and the relevant calculation equation is as follows:
[0031] $B(t) = (1-t)^{3} P_0 + 3(1-t)^{2} t\, P_1 + 3(1-t)\, t^{2} P_2 + t^{3} P_3, \quad t \in [0, 1]$
[0032] where $P_0$, $P_1$, $P_2$, and $P_3$ are the coordinates of the control points;
[0033] A curvature constraint loss function is introduced during the training of Bézier curves. The formula for calculating the loss function is as follows:
[0034] $L_{\text{curv}} = \max\bigl(0,\ \kappa(t) - \kappa_{\max}\bigr)$
[0035] where $\kappa(t)$ is the curvature of the Bézier curve at the point with parameter $t$, and $\kappa_{\max}$ is the preset maximum curvature threshold.
[0036] Furthermore, the global feature map and control points are processed separately to generate a Gaussian heatmap, outputting the probability distribution of the optimal and furthest cutting lines of the bladder neck, as well as the segmentation results of the prostate and bladder. The specific steps are as follows:
[0037] A segmentation decoder is used to decode the global feature map to generate segmentation contours of the prostate and bladder;
[0038] The control points on the optimal cutting line and the farthest cutting line are sampled and decoded respectively to generate two sets of corresponding curve feature sequences;
[0039] The decoded global feature map is concatenated with the curve feature sequence, and multimodal feature fusion is performed using a Transformer encoder;
[0040] A Gaussian heatmap is generated based on the fused features, and the probability distributions of the best cutting line and the farthest cutting line are output.
[0041] Furthermore, the step of using a segmentation decoder to decode the global feature map and generate segmentation contours of the prostate and bladder involves the following steps:
[0042] Map the features in the global feature map to the class space to obtain the segmentation logits;
[0043] The Softmax activation function is applied to the segmentation logits to obtain the segmentation probability of each pixel;
[0044] For multi-class segmentation, the class with the highest probability among the segmentation results of each pixel is taken as the segmentation result for that pixel.
[0045] Furthermore, the sampling and decoding of control points on the optimal cutting line and the farthest cutting line to generate two corresponding sets of curve feature sequences specifically includes:
[0046] Uniform sampling is performed on N points on the optimal cutting line and the farthest cutting line to obtain a sampling point sequence;
[0047] The two sets of sampling point sequences are encoded into curve feature sequences using a multilayer perceptron.
[0048] Furthermore, the step of generating a Gaussian heatmap based on the fused features and outputting the probability distributions of the optimal and furthest cutting lines specifically includes:
[0049] Separate the image features from the features obtained after multimodal fusion;
[0050] The separated image features are reconstructed into a spatial feature map;
[0051] A 1×1 convolution is used to map the spatial feature map into a Gaussian heatmap, which represents the probability distribution of the cutting line position.
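The three heatmap steps above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the image tokens are taken to occupy the first H×W positions of the fused sequence, the head is given two output channels (one per cutting line), and the sizes and the names `heatmap_head`, `weight`, and `bias` are hypothetical. The text does not state whether an activation follows the 1×1 convolution, so none is applied here.

```python
import numpy as np

def heatmap_head(fused, h, w, weight, bias):
    """Map fused tokens (h*w + extra, C) to heatmaps of shape (n_maps, h, w)."""
    img_tokens = fused[: h * w]                   # separate the image tokens (assumed to come first)
    spatial = img_tokens.T.reshape(-1, h, w)      # rebuild the spatial feature map (C, h, w)
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    return np.einsum("mc,chw->mhw", weight, spatial) + bias[:, None, None]

rng = np.random.default_rng(0)
h, w, channels, n_curve_tokens = 12, 20, 256, 64   # hypothetical sizes
fused = rng.normal(size=(h * w + n_curve_tokens, channels))
weight = rng.normal(0.0, 0.02, size=(2, channels))  # two maps: optimal and farthest line (assumption)
bias = np.zeros(2)
heatmaps = heatmap_head(fused, h, w, weight, bias)
print(heatmaps.shape)
```

Writing the 1×1 convolution as an `einsum` over the channel axis keeps the sketch free of any deep learning framework dependency.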
[0052] The technical solution of the present invention has at least the following advantages and beneficial effects:
[0053] This invention discloses an auxiliary identification method for bladder neck transection in minimally invasive radical prostatectomy. By performing real-time processing and multi-scale feature extraction on the endoscopic video stream, it can quickly analyze intraoperative anatomical information and output the optimal and farthest cutting lines of the bladder neck end-to-end. This provides precise and timely cutting and positioning guidance for minimally invasive radical prostatectomy, effectively improving the accuracy and safety of surgical operation, helping to restore postoperative urinary continence function and enhancing the radical tumor removal effect.
[0054] Furthermore, by employing multi-scale feature fusion, layer normalization, and sine-cosine positional coding, the feature distribution can be stabilized and the spatial positional awareness can be enhanced, improving adaptability to individual anatomical differences and intraoperative visual field changes, and ensuring stable and reliable prediction results. In addition, by using cubic Bézier curve fitting and applying curvature constraints, smooth and continuous cutting lines that fit the clinical operation path can be generated, improving the practicality of surgical navigation. Moreover, by leveraging global feature aggregation and control point regression, multimodal feature fusion, and Gaussian heatmap mapping, organ segmentation and cutting line prediction can be completed collaboratively, with intuitive and interpretable output results. The overall method is highly efficient, accurate in positioning, and robust, significantly reducing intraoperative judgment dependence and operational deviation risks.
Attached Figure Description
[0055] Figure 1 is a schematic diagram of the method for assisting in the identification of bladder neck transection during minimally invasive radical prostatectomy according to the present invention;
[0056] Figure 2 is a schematic diagram of the bladder neck transection auxiliary identification system for minimally invasive radical prostatectomy according to the present invention;
[0057] Figure 3 is a schematic diagram of an electronic device structure according to the present invention.
Detailed Implementation
[0058] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.
[0059] Example 1
[0060] As shown in Figure 1, an auxiliary identification method for bladder neck transection during minimally invasive radical prostatectomy specifically includes:
[0061] S1. Acquire the video stream from the endoscope and perform preprocessing;
[0062] The video stream resolution was set to 1080P, and the frame rate was maintained at 30fps to ensure the real-time performance and clarity of the intraoperative images. The acquired video stream was preprocessed, and the following operations were performed sequentially:
[0063] ① Frame extraction: Real-time images are extracted from the video stream frame by frame according to the frame rate. The size of each frame image is uniformly adjusted to 640×384 pixels by the Resize method to avoid the size difference affecting the feature extraction accuracy.
[0064] ② Noise reduction: A Gaussian filter is used to denoise each extracted frame. The Gaussian kernel size is set to 3×3, and the standard deviation is... This filters out noise caused by intraoperative laparoscopic light reflection and instrument shadows while preserving the anatomical details of the bladder and prostate regions.
[0065] ③ Normalization: The pixel values of the denoised image are normalized to the [0,1] interval using the min-max normalization formula to eliminate the influence of pixel scale differences on model training and feature extraction. The normalization formula is as follows:
[0066] $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$
[0067] In the formula, $x$ is the original pixel value, $x_{\min}$ is the minimum pixel value in a single frame of the image, $x_{\max}$ is the maximum pixel value in a single frame of the image, and $x'$ is the normalized pixel value;
[0068] ④ Data augmentation: To address potential intraoperative scenarios such as visual field shifts and lighting changes, the preprocessed images are randomly horizontally flipped, slightly rotated (±5°), and their brightness adjusted (±10%) to improve the model's adaptability to complex intraoperative scenarios and ensure the stability of feature extraction.
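A minimal sketch of the normalization and augmentation steps above, assuming NumPy and frames stored as H×W×C arrays. The function names and the small `eps` guard are illustrative additions and do not appear in the original text. The rotation step is omitted to keep the sketch dependency-free, and the Gaussian filter is left out because its standard deviation is not given here.

```python
import numpy as np

def min_max_normalize(frame: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale the pixel values of one frame to [0, 1] using its own min and max."""
    frame = frame.astype(np.float32)
    lo, hi = frame.min(), frame.max()
    # eps is an added guard (not in the source) against a constant frame.
    return (frame - lo) / (hi - lo + eps)

def augment(frame: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random horizontal flip and a brightness change within +/-10%."""
    if rng.random() < 0.5:
        frame = frame[:, ::-1]            # flip along the width axis
    scale = rng.uniform(0.9, 1.1)         # brightness factor
    return np.clip(frame * scale, 0.0, 1.0)

rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(384, 640, 3), dtype=np.uint8)  # resized frame, H x W x C
norm = min_max_normalize(raw)
aug = augment(norm, rng)
print(norm.min(), norm.max(), aug.shape)
```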
[0069] S2. Extract multi-scale features from the real-time images in the preprocessed video stream and generate a multi-scale feature map;
[0070] Specifically, the Swin Transformer is used as the backbone network for feature extraction. Multi-scale feature extraction is performed on the preprocessed single-frame image to generate feature maps with four resolutions: 1 / 4, 1 / 8, 1 / 16, and 1 / 32.
[0071] By leveraging the window attention mechanism and multi-stage downsampling capability of the Swin Transformer, detailed and semantic features of the image are extracted hierarchically, solving the problems of detail loss and semantic ambiguity in traditional feature extraction. This enables comprehensive feature coverage of the bladder and prostate regions from superficial details to deep semantics, providing sufficient feature support for subsequent bladder neck location identification and cutting line fitting. The specific implementation process is as follows:
[0072] ① The Swin Transformer is used as the backbone network. This network is based on the window attention mechanism and realizes multi-scale feature extraction through shifting windows. Specifically, it includes a feature extraction module in four stages. Each stage consists of multiple Swin Transformer blocks. Each Swin Transformer block includes layer normalization, multi-head self-attention, a 2-layer perceptron (MLP) and residual connections. Each stage realizes resolution downsampling through a downsampling module (using Patch Merging) with a stride of 2.
[0073] ② After the first downsampling module, a 1 / 4 resolution feature map is output with 64 channels. This scale feature map mainly contains shallow detail features of the image (such as organ edges and textures), providing detailed support for subsequent organ contour segmentation.
[0074] After the second downsampling module, a 1 / 8 resolution feature map is output with 128 channels. This scale feature map contains mid-level semantic features, which can initially distinguish the bladder, prostate and background regions, providing a basis for coarse localization of organ regions.
[0075] After the third downsampling module, a 1 / 16 resolution feature map is output with 256 channels. This scale feature map contains deep semantic features, which can accurately identify the core features of the bladder neck region and provide key support for the generation of cutting line control points.
[0076] After the fourth downsampling module, a 1 / 32 resolution feature map is output with 512 channels. This scale feature map contains global semantic features, which can grasp the overall anatomical relationship of the bladder and prostate, and avoid recognition bias caused by local features.
[0077] ③ Skip Connection is used to fuse feature maps of different resolutions. The detailed features of high-resolution feature maps (1 / 4, 1 / 8) are combined with the semantic features of low-resolution feature maps (1 / 16, 1 / 32) to make up for the spatial detail information lost during downsampling. Finally, a fused multi-scale feature map is obtained, and the number of channels is uniformly adjusted to 256.
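To make the four scales concrete, the short sketch below computes the feature map sizes implied by the stated strides and channel counts for a 640×384 input. It assumes that 640 is the width and 384 the height; the helper name `pyramid_shapes` is hypothetical.

```python
def pyramid_shapes(height: int = 384, width: int = 640):
    """Return (channels, H, W) for the 1/4, 1/8, 1/16 and 1/32 feature maps."""
    strides = (4, 8, 16, 32)
    channels = (64, 128, 256, 512)
    return [(c, height // s, width // s) for c, s in zip(channels, strides)]

shapes = pyramid_shapes()
for stride, (c, h, w) in zip((4, 8, 16, 32), shapes):
    print(f"1/{stride}: channels={c}, size={h}x{w}")
```

Under these assumptions the coarsest map is 12×20, so a later self-attention stage over that map would handle 240 tokens per frame.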
[0078] S3. Based on multi-scale feature maps, perform positional encoding and deep feature extraction to obtain a global feature map, specifically including:
[0079] S31. Fuse the multi-scale feature maps and perform layer normalization on the fused feature maps to stabilize the feature distribution, avoid gradient vanishing or gradient exploding, and ensure the stability of subsequent feature extraction. The layer normalization formula is:
[0080] $F_{\text{norm}} = \gamma \cdot \frac{F - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta$
[0081] In the formula, $F$ is the fused multi-scale feature map, $\mu$ is the mean of the pixel values in the multi-scale feature map, $\sigma^{2}$ is the variance of the pixel values in the feature map, $\epsilon$ is a small constant that prevents a zero denominator (taken as 1e-5), and $\gamma$ and $\beta$ are the learnable scaling and bias coefficients, initialized to 1 and 0, respectively.
[0082] By normalizing and stabilizing the feature distribution, the gradient changes are smoother during subsequent deep feature extraction, thereby avoiding model training instability caused by differences in feature distribution and improving the accuracy and efficiency of deep feature extraction.
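A minimal NumPy sketch of the layer normalization in step S31. It assumes the statistics are computed over the channel axis of each spatial position, which is the usual convention in Transformer models but is not stated explicitly in the text. The feature map size is also illustrative.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize over the last (channel) axis, then scale and shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
feat = rng.normal(5.0, 3.0, size=(24 * 40, 256))   # (positions, channels), hypothetical size
gamma = np.ones(256)     # scaling coefficient initialized to 1
beta = np.zeros(256)     # bias coefficient initialized to 0
feat_norm = layer_norm(feat, gamma, beta)
print(feat_norm.mean(axis=-1)[:3], feat_norm.std(axis=-1)[:3])
```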
[0083] S32. Introduce position coordinate encoding to the normalized feature map. The specific steps are as follows:
[0084] S321. Use a sine-cosine function to generate position codes, generating position codes in the X direction (width direction) and Y direction (height direction) respectively;
[0085] First, for the normalized multi-scale feature map $F_{\text{norm}}$, normalized coordinates are generated for each spatial location $(i, j)$ in the feature map:
[0086] $x_{\text{norm}} = \frac{j}{W - 1}, \quad y_{\text{norm}} = \frac{i}{H - 1}$
[0087] where $x_{\text{norm}}$ and $y_{\text{norm}}$ are the normalized coordinates in the X direction and the Y direction, respectively, $i = 0, 1, \dots, H-1$, $j = 0, 1, \dots, W-1$, $H$ is the height of the feature map, and $W$ is the width of the feature map;
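[0088] Then, sine positional encoding is used for even indices. The position encoding in the X direction $PE_x$ and the position encoding in the Y direction $PE_y$ are, respectively:
[0089] $PE_x(2k) = \sin\left(\frac{x_{\text{norm}}}{10000^{2k/d}}\right)$
[0090] $PE_y(2k) = \sin\left(\frac{y_{\text{norm}}}{10000^{2k/d}}\right)$
[0091] Cosine positional encoding is used for odd indices. The position encoding in the X direction $PE_x$ and the position encoding in the Y direction $PE_y$ are, respectively:
[0092] $PE_x(2k+1) = \cos\left(\frac{x_{\text{norm}}}{10000^{2k/d}}\right)$
[0093] $PE_y(2k+1) = \cos\left(\frac{y_{\text{norm}}}{10000^{2k/d}}\right)$
[0094] In the formula, $d$ is the dimension of the position encoding and $k$ indexes the sine-cosine pairs.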
[0095] S322. The position codes in the X and Y directions are concatenated and then mapped to the feature channel dimension via linear projection;
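[0096] The position codes in the X and Y directions are concatenated along the channel dimension to obtain the concatenated position code $PE_{\text{cat}}$, whose dimensions are $H \times W \times 2d$. Through a learnable linear transformation matrix $W_p$ and bias matrix $b_p$, $PE_{\text{cat}}$ is mapped to the feature channel dimension (256) to obtain the mapped position code $PE_{\text{proj}}$. The calculation formulas are:
[0097] $PE_{\text{cat}} = \text{Concat}(PE_x, PE_y)$
[0098] $PE_{\text{proj}} = PE_{\text{cat}} W_p + b_p$
[0099] where $\text{Concat}(\cdot)$ denotes the concatenation operation;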
[0100] S323. The mapped positional encoding is added to the normalized feature map to obtain a feature map with positional information. The calculation formula is:
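[0101] $F_{\text{pos}} = F_{\text{norm}} + PE_{\text{proj}}$, where $F_{\text{norm}}$ is the normalized feature map, $PE_{\text{proj}}$ is the mapped position code, and $F_{\text{pos}}$ is the feature map with positional information.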
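Steps S321 to S323 can be sketched in NumPy as below. Several details are assumptions made for illustration and are not confirmed by the text: the frequency base of 10000, the normalization of coordinates by `W - 1` and `H - 1`, an encoding dimension of 64 per direction, and randomly initialized projection weights.

```python
import numpy as np

def sincos_1d(coords, d, base=10000.0):
    """Sine on even indices and cosine on odd indices, for coords of shape (n,)."""
    k = np.arange(d // 2)
    freqs = 1.0 / base ** (2 * k / d)             # (d/2,)
    angles = coords[:, None] * freqs[None, :]     # (n, d/2)
    pe = np.zeros((coords.shape[0], d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def positional_encoding_2d(h, w, d, proj_w, proj_b):
    """Build X/Y encodings, concatenate them, and project to the channel dimension."""
    x_norm = np.arange(w) / (w - 1)               # normalized X coordinates (assumed form)
    y_norm = np.arange(h) / (h - 1)               # normalized Y coordinates (assumed form)
    pe_x = np.broadcast_to(sincos_1d(x_norm, d)[None, :, :], (h, w, d))
    pe_y = np.broadcast_to(sincos_1d(y_norm, d)[:, None, :], (h, w, d))
    pe_cat = np.concatenate([pe_x, pe_y], axis=-1)    # (h, w, 2d)
    return pe_cat @ proj_w + proj_b                   # (h, w, channels)

rng = np.random.default_rng(0)
h, w, d, channels = 24, 40, 64, 256                   # hypothetical sizes
proj_w = rng.normal(0.0, 0.02, size=(2 * d, channels))
proj_b = np.zeros(channels)
feat_norm = rng.normal(size=(h, w, channels))         # stand-in for the normalized feature map
feat_pos = feat_norm + positional_encoding_2d(h, w, d, proj_w, proj_b)
print(feat_pos.shape)
```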
[0102] S33. Use stacked Transformer encoder blocks to perform deep feature extraction on the position-encoded features to obtain a global feature map;
[0103] Specifically, 6 stacked Transformer encoder blocks are used to perform deep feature extraction. Each Transformer encoder block contains a multi-head self-attention mechanism and a feedforward neural network. The self-attention mechanism captures the global correlation between features, the stacked structure progressively deepens the semantic expression of features, and the feedforward neural network enhances the nonlinear mapping capability of features. This enables the mining of the anatomical structural correlation between the bladder neck region and the bladder and prostate, extracting more discriminative global features and providing accurate support for subsequent cutting line fitting and organ segmentation. The specific implementation process is as follows:
[0104] ① The input is the feature map with position encoding $F_{\text{pos}} \in \mathbb{R}^{B \times C \times H \times W}$, where $B$ is the batch size, $C$ is the number of channels, and $H \times W$ is the feature map size;
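[0105] ② Through learnable linear transformation matrices $W_Q$, $W_K$, $W_V$ and bias matrices $b_Q$, $b_K$, $b_V$, the position-encoded feature map $F_{\text{pos}}$ is mapped to Query (Q), Key (K), and Value (V) respectively, and the calculation formulas are as follows:
[0106] $Q = F_{\text{pos}} W_Q + b_Q$
[0107] $K = F_{\text{pos}} W_K + b_K$
[0108] $V = F_{\text{pos}} W_V + b_V$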
[0109] ③ The attention weights are calculated using scaled dot-product attention, and the formula is as follows:
[0110] $A = \text{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right)$
[0111] where $d_k$ is the dimension of the Key (set to 64), and the softmax function is used to normalize the attention weights to the [0,1] interval. The softmax calculation formula is:
[0112] $\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}$
[0113] In the formula, $z_i$ is the $i$-th element of the attention weight vector and $n$ is the length of the weight vector. Scaled dot-product attention effectively measures the similarity between features, and softmax normalization keeps the attention weights well-formed, allowing the model to focus on the key features of the bladder neck region and suppress the interference of irrelevant background features.
[0114] ④ Multiply the attention weights by the Value to obtain the fused features. The calculation formula is:
[0115] $\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$
[0116] ⑤ Using 8 attention heads, the output features of each attention head are concatenated and then fused through a learnable output linear transformation matrix $W_O$ to obtain the multi-head self-attention output. The calculation formula is as follows:
[0117] $\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \dots, \text{head}_8)\, W_O$
[0118] Among them, a single attention head $\text{head}_i$ is calculated as follows:
[0119] $\text{head}_i = \text{Attention}\left(Q W_i^{Q},\ K W_i^{K},\ V W_i^{V}\right)$
[0120] In the formula, $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the query, key, and value mapping matrices of the $i$-th attention head, and $\text{Attention}(\cdot)$ is the attention function. Multi-head self-attention can capture feature associations from different angles, improve the expressive power of features, comprehensively capture multi-dimensional features of the bladder neck region, and avoid feature omissions caused by a single attention head.
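[0121] ⑥ The multi-head self-attention output is combined with the input features $F_{\text{pos}}$ through a residual connection followed by layer normalization to obtain the deep feature map $F_{\text{attn}}$. The calculation formula is as follows:
[0122] $F_{\text{attn}} = \text{LayerNorm}\bigl(F_{\text{pos}} + \text{MultiHead}(F_{\text{pos}})\bigr)$
[0123] where $\text{LayerNorm}(\cdot)$ is the layer normalization function and $\text{MultiHead}(\cdot)$ is the multi-head attention mechanism;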
[0124] ⑦ Input the above features into a feedforward neural network (FFN). The FFN contains two linear layers and a GELU activation function. The calculation formula is as follows:
[0125] $\text{FFN}(x) = \text{GELU}(x W_1 + b_1)\, W_2 + b_2$
[0126] In the formula, $W_1$ and $W_2$ are the linear transformation matrices, $b_1$ and $b_2$ are the bias matrices, and $\text{GELU}(\cdot)$ is the activation function.
[0127] ⑧ After processing by stacking 6 Transformer encoder blocks, the global feature map is output. This feature map contains global anatomical features and spatial location information of the bladder and prostate, and achieves accurate capture of bladder neck region features and comprehensive coverage of global anatomical structure, solving the recognition bias problem caused by local feature extraction.
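Steps ① to ⑧ can be sketched as a plain NumPy encoder block. This is an illustration under stated assumptions, not the patented implementation: the weights are random, the FFN hidden width of 1024 and the token count are hypothetical, layer normalization is applied without learnable parameters, and the block returns the FFN output directly because the text does not describe a second residual connection.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def init_params(rng, c=256, heads=8, d_k=64, hidden=1024, s=0.02):
    """Random weights for one block; `hidden` is a hypothetical FFN width."""
    p = {}
    for name in ("q", "k", "v"):
        p["w" + name] = rng.normal(0.0, s, (heads, c, d_k))
        p["b" + name] = np.zeros((heads, d_k))
    p["wo"] = rng.normal(0.0, s, (heads * d_k, c))
    p["w1"], p["b1"] = rng.normal(0.0, s, (c, hidden)), np.zeros(hidden)
    p["w2"], p["b2"] = rng.normal(0.0, s, (hidden, c)), np.zeros(c)
    return p

def encoder_block(x, p, heads=8, d_k=64):
    """x: (tokens, C). Multi-head self-attention, residual + layer norm, then FFN."""
    outs = []
    for i in range(heads):
        q = x @ p["wq"][i] + p["bq"][i]
        k = x @ p["wk"][i] + p["bk"][i]
        v = x @ p["wv"][i] + p["bv"][i]
        attn = softmax(q @ k.T / np.sqrt(d_k))       # (tokens, tokens) attention weights
        outs.append(attn @ v)                        # weighted sum of values
    multi = np.concatenate(outs, axis=-1) @ p["wo"]  # concatenate heads, project back to C
    x = layer_norm(x + multi)                        # residual connection + layer norm
    return gelu(x @ p["w1"] + p["b1"]) @ p["w2"] + p["b2"]   # two-layer FFN with GELU

rng = np.random.default_rng(0)
tokens = rng.normal(size=(12 * 20, 256))     # flattened position-encoded features (hypothetical size)
out = tokens
for _ in range(6):                           # six stacked encoder blocks
    out = encoder_block(out, init_params(rng))
print(out.shape)
```

With 8 heads of key dimension 64, the concatenated head output is 512 wide, so the output matrix is what brings the features back to the 256 channels stated earlier.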
[0128] S4. Map the global feature map to the control points of a Bézier curve, and then fit a smooth cutting line. The specific steps are as follows:
[0129] S41. The height (H) and width (W) dimensions of the global feature map $F_{\text{global}} \in \mathbb{R}^{B \times C \times H \times W}$ are merged to obtain the sequence features $F_{\text{seq}} \in \mathbb{R}^{B \times HW \times C}$, where $HW = H \times W$ is the total number of pixels in the feature map.
[0130] S42. Global average pooling is used to aggregate the sequence features into a global feature vector. The formula is:
[0131] $f_{\text{global}}^{(c)} = \frac{1}{HW} \sum_{n=1}^{HW} F_{\text{seq}}^{(n, c)}$
[0132] In the formula, $f_{\text{global}} \in \mathbb{R}^{B \times C}$, and $F_{\text{seq}}^{(n, c)}$ is the feature value of the sequence features at position $n$ in channel $c$; global average pooling aggregates global feature information, eliminates spatial dimensional differences, and extracts the core information of the global features, avoiding control point deviations caused by local features;
[0133] S43. A multilayer perceptron (MLP) is used to convert the global feature vector into coordinate regression points. The multilayer perceptron contains two hidden layers and one output layer. The activation function is GELU, and the calculation formula is as follows:
[0134] $P_{\text{reg}} = \text{GELU}\bigl(\text{GELU}(f_{\text{global}} W_1 + b_1)\, W_2 + b_2\bigr)\, W_3 + b_3$
[0135] In the formula, $W_1$, $W_2$, and $W_3$ are the linear transformation matrices of the MLP, $b_1$, $b_2$, and $b_3$ are the bias matrices, and the output is $P_{\text{reg}} \in \mathbb{R}^{16}$ (a 16-dimensional vector). The MLP provides a non-linear mapping from feature vectors to coordinate vectors, and the GELU activation function enhances the flexibility of the mapping, thereby transforming abstract global features into specific coordinate regression points;
[0136] S44. Finally, the coordinate regression points are reshaped into multiple control points, that is, the 16-dimensional regression point vector is reshaped into 8 two-dimensional control points. The first four control points ($P_0$ to $P_3$) are used to fit the optimal cutting line, and the last four control points ($P_4$ to $P_7$) are used to fit the farthest cutting line. Two sets of control points are set according to clinical surgical needs so that the optimal and farthest cutting lines are fitted separately. This graded fitting of the cutting line meets the needs of different surgical scenarios, gives surgeons a more flexible operating reference, and supports surgical safety and accuracy;
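Steps S41 to S44 can be sketched as follows. The hidden layer widths (128 and 64), the random weights, and the function name `control_point_head` are hypothetical; only the pooling, the two hidden layers with GELU, the 16-dimensional output, and the split into two groups of four points come from the text.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def control_point_head(feat_map, params):
    """feat_map: (C, H, W) -> control points for the optimal and farthest lines."""
    c, h, w = feat_map.shape
    seq = feat_map.reshape(c, h * w).T           # merge H and W: (H*W, C) sequence features
    pooled = seq.mean(axis=0)                    # global average pooling -> (C,)
    a1 = gelu(pooled @ params["w1"] + params["b1"])    # hidden layer 1
    a2 = gelu(a1 @ params["w2"] + params["b2"])        # hidden layer 2
    coords = a2 @ params["w3"] + params["b3"]          # 16 regression values
    points = coords.reshape(8, 2)                      # eight 2D control points
    return points[:4], points[4:]                      # optimal line, farthest line

rng = np.random.default_rng(0)
params = {
    "w1": rng.normal(0.0, 0.1, (256, 128)), "b1": np.zeros(128),
    "w2": rng.normal(0.0, 0.1, (128, 64)), "b2": np.zeros(64),
    "w3": rng.normal(0.0, 0.1, (64, 16)), "b3": np.zeros(16),
}
feat_map = rng.normal(size=(256, 12, 20))        # stand-in for the global feature map
ctrl_opt, ctrl_far = control_point_head(feat_map, params)
print(ctrl_opt.shape, ctrl_far.shape)
```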
[0137] The next step is to fit a smooth cutting line, and the specific steps are as follows:
[0138] The control points are fitted using cubic Bézier curves, and the relevant calculation equation is as follows:
[0139] $B(t) = (1-t)^{3} P_0 + 3(1-t)^{2} t\, P_1 + 3(1-t)\, t^{2} P_2 + t^{3} P_3, \quad t \in [0, 1]$
[0140] In the formula, $t$ is the parameter of the Bézier curve, and $P_0$, $P_1$, $P_2$, and $P_3$ are the coordinates of the control points. Substituting the first four control points and the last four control points, respectively, gives the optimal cutting line $B_{\text{opt}}(t)$ and the farthest cutting line $B_{\text{far}}(t)$. The optimal cutting line corresponds to the most suitable bladder neck transection position in clinical surgery, while the farthest cutting line corresponds to the limit position for safe transection. Together they give surgeons a clear operating range, avoiding over-cutting or under-cutting.
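The cubic Bézier evaluation can be checked numerically with a few lines of NumPy. The control points below are made-up values chosen only to show that the curve starts at the first control point, ends at the last one, and passes through the expected midpoint.

```python
import numpy as np

def cubic_bezier(ctrl, t):
    """ctrl: (4, 2) control points; t: (n,) parameters in [0, 1] -> (n, 2) curve points."""
    t = np.asarray(t, dtype=float)[:, None]
    p0, p1, p2, p3 = ctrl
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

ctrl = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]])  # illustrative control points
curve = cubic_bezier(ctrl, np.linspace(0.0, 1.0, 5))
print(curve[0], curve[2], curve[-1])   # start point, point at t = 0.5, end point
```

At t = 0.5 the curve equals (P0 + 3·P1 + 3·P2 + P3) / 8, which is (2, 1.5) for these points.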
[0141] In the training process of Bézier curves, a curvature constraint loss function is introduced to limit the curvature of the cutting line, avoid sharp inflection points, and ensure that the cutting line is smooth. The formula for calculating the loss function is as follows:
[0142] $L_{\text{curv}} = \max\bigl(0,\ \kappa(t) - \kappa_{\max}\bigr)$
[0143] In the formula, $\kappa(t)$ is the curvature of the Bézier curve at the point with parameter $t$, and $\kappa_{\max}$ is the preset maximum curvature threshold, set to 0.05 based on clinical surgical experience. $\max(0, \cdot)$ takes the value $\kappa(t) - \kappa_{\max}$ when $\kappa(t) > \kappa_{\max}$ and 0 otherwise, which penalizes curvature above the threshold. This curvature constraint makes the cutting line conform to the natural arc of the human anatomy, avoids sharp inflection points that could damage the tissues around the bladder and prostate, and improves the safety and standardization of the surgery.
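A minimal sketch of the curvature penalty, assuming the standard planar curvature formula $\kappa = |x'y'' - y'x''| / (x'^2 + y'^2)^{3/2}$ computed from the analytic derivatives of the cubic Bézier curve. Averaging the hinge term over 50 sampled parameter values is an assumption made here for illustration, since the text specifies only the hinge term itself.

```python
import numpy as np

def bezier_curvature(ctrl, t):
    """Curvature of a cubic Bezier curve with control points ctrl (4, 2) at parameters t."""
    t = np.asarray(t, dtype=float)[:, None]
    p0, p1, p2, p3 = ctrl
    # First and second derivatives of the cubic Bezier curve.
    d1 = 3 * (1 - t) ** 2 * (p1 - p0) + 6 * (1 - t) * t * (p2 - p1) + 3 * t ** 2 * (p3 - p2)
    d2 = 6 * (1 - t) * (p2 - 2 * p1 + p0) + 6 * t * (p3 - 2 * p2 + p1)
    cross = d1[:, 0] * d2[:, 1] - d1[:, 1] * d2[:, 0]
    speed = np.sqrt((d1 ** 2).sum(axis=1))
    return np.abs(cross) / np.maximum(speed ** 3, 1e-12)   # guard against zero speed

def curvature_loss(ctrl, kappa_max=0.05, n=50):
    """Mean hinge penalty on curvature above kappa_max (aggregation is an assumption)."""
    kappa = bezier_curvature(ctrl, np.linspace(0.0, 1.0, n))
    return np.maximum(0.0, kappa - kappa_max).mean()

straight = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
bent = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]])
print(curvature_loss(straight), curvature_loss(bent))
```

A straight line has zero curvature everywhere and incurs no penalty, while the bent example exceeds the 0.05 threshold and is penalized. The threshold only has meaning relative to the coordinate units, which the text does not specify.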
[0144] S5. Process the global feature map and control points separately to generate a Gaussian heatmap, outputting the probability distribution of the optimal and furthest cutting lines of the bladder neck, as well as the segmentation results of the prostate and bladder. The specific steps are as follows:
[0145] S51. The global feature map is decoded using a segmentation decoder to generate segmentation contours of the prostate and bladder. Utilizing the semantic discriminative power of the global feature map, precise segmentation of the organ region and background is achieved through convolutional mapping and probability calculation. This accurately identifies the contour boundaries of the bladder and prostate, providing a clear anatomical reference for the bladder neck transection location and avoiding miscutting. The specific steps are as follows:
[0146] S511. Use a 1×1 convolution to map the number of channels in the global feature map $F_{\text{global}}$ to the class space of size $K$ (set to 3, corresponding to background, prostate, and bladder respectively), to obtain the segmentation logits $Z$. The calculation formula is:
[0147] $Z = \text{Conv}_{1 \times 1}(F_{\text{global}})$
[0148] In the formula, $\text{Conv}_{1 \times 1}(\cdot)$ is the 1×1 convolution operation. The 1×1 convolution transforms the dimensionality of the feature channels, mapping global features to the class space so that features match the class labels and providing a foundation for the subsequent probability calculation;
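[0149] S512. The Softmax activation function is applied to the segmentation logits $Z$ to obtain the segmentation probability of each pixel, calculated using the following formula:
[0150] $p_k(i, j) = \frac{\exp\bigl(Z_k(i, j)\bigr)}{\sum_{k'=1}^{K} \exp\bigl(Z_{k'}(i, j)\bigr)}$, where $p_k(i, j)$ is the probability that the pixel at location $(i, j)$ belongs to class $k$.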
[0151] S513. For multi-class segmentation, the class with the highest probability in the segmentation results of each pixel is taken as the segmentation result of that pixel, and finally the segmentation contours of the prostate and bladder are obtained, realizing accurate identification of organ regions; this operation can accurately distinguish the bladder, prostate and background regions, clarify the anatomical location of the bladder neck, provide a clear reference boundary for the positioning of the cutting line, and improve the accuracy of the cutting line.
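Steps S511 to S513 amount to a per-pixel linear map, a softmax over classes, and an argmax. The sketch below uses random weights and a hypothetical 12×20 feature map; the class order (0 background, 1 prostate, 2 bladder) follows the order in which the text lists them.

```python
import numpy as np

def segment(feat_map, weight, bias):
    """feat_map: (C, H, W); weight: (K, C); bias: (K,) -> probabilities (K, H, W), labels (H, W)."""
    # A 1x1 convolution is a linear map over channels applied at every pixel.
    logits = np.einsum("kc,chw->khw", weight, feat_map) + bias[:, None, None]
    logits = logits - logits.max(axis=0, keepdims=True)   # stabilize the softmax
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=0, keepdims=True)          # per-pixel class probabilities
    labels = probs.argmax(axis=0)                         # most probable class per pixel
    return probs, labels

rng = np.random.default_rng(0)
feat_map = rng.normal(size=(256, 12, 20))       # stand-in for the global feature map
weight = rng.normal(0.0, 0.05, size=(3, 256))   # K = 3 classes
bias = np.zeros(3)
probs, labels = segment(feat_map, weight, bias)
print(probs.shape, labels.shape, np.unique(labels))
```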
[0152] S52. Sample and decode the control points on the optimal cutting line and the farthest cutting line respectively to generate two sets of corresponding curve feature sequences, specifically including:
[0153] S521. Uniformly sample N points on the optimal cutting line and the farthest cutting line respectively to obtain a sampling point sequence;
[0154] Among them, the number of sampling points is set. Uniform sampling parameters ( The two Bézier curves were sampled separately to obtain the sampling point sequence:
[0155]
[0156]
[0157] In the formula, the first sequence is the sampling point sequence of the optimal cutting line and the second is the sampling point sequence of the farthest cutting line; both have the same dimensions. Uniform sampling fully captures the morphological features of the cutting line, avoids the feature omissions caused by sparse sampling, and preserves the integrity of the curve features;
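The uniform sampling in S521 might look like the sketch below. It assumes cubic Bézier curves with four 2D control points (the curve degree stated in claim 5) and a parameter sampled uniformly on [0, 1] including both endpoints. The function name, the control point values, and `N = 32` are illustrative assumptions.

```python
import numpy as np

def sample_cubic_bezier(control_points, n_samples):
    """Sample a cubic Bezier curve at uniformly spaced parameter values.

    control_points: (4, 2) array of control point coordinates
    returns:        (n_samples, 2) array of points on the curve
    """
    p = np.asarray(control_points, dtype=float)
    t = np.linspace(0.0, 1.0, n_samples)[:, None]          # (N, 1)
    # Cubic Bernstein basis, one column per control point.
    basis = np.hstack([
        (1 - t) ** 3,
        3 * (1 - t) ** 2 * t,
        3 * (1 - t) * t ** 2,
        t ** 3,
    ])                                                      # (N, 4)
    return basis @ p                                        # (N, 2)

# Hypothetical control points in normalized image coordinates.
optimal_ctrl = np.array([[0.1, 0.5], [0.3, 0.7], [0.6, 0.7], [0.9, 0.5]])
farthest_ctrl = np.array([[0.1, 0.6], [0.3, 0.9], [0.6, 0.9], [0.9, 0.6]])
N = 32
optimal_pts = sample_cubic_bezier(optimal_ctrl, N)
farthest_pts = sample_cubic_bezier(farthest_ctrl, N)
```

With this parameterization the first and last samples coincide with the first and last control points, which gives a quick sanity check on the sampler.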
[0158] S522. Two sets of sampling point sequences are encoded into curve feature sequences using a multilayer perceptron (MLP). The MLP contains a linear layer and a GELU activation function, and the calculation formula is as follows:
[0159]
[0160]
[0161] In the formula, the two outputs are the feature sequences of the optimal cutting line and the farthest cutting line, respectively, and the operator denotes the multilayer perceptron (MLP) operation;
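A possible reading of S522 is sketched below: each sampled point is passed through one linear layer followed by GELU. The tanh approximation of GELU, the shared weights for both curves, the feature width `D = 8`, and all names are assumptions made for illustration.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of GELU (an assumption; the exact erf form also works).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def encode_curve(points, weight, bias):
    """Encode sampled curve points into a curve feature sequence.

    points: (N, 2) sampled coordinates
    weight: (2, D) linear layer weight
    bias:   (D,)   linear layer bias
    returns (N, D) curve feature sequence
    """
    return gelu(points @ weight + bias)

# Hypothetical inputs: N = 32 points per curve, feature width D = 8.
rng = np.random.default_rng(0)
optimal_points = rng.uniform(size=(32, 2))
farthest_points = rng.uniform(size=(32, 2))
mlp_weight = rng.normal(size=(2, 8))
mlp_bias = np.zeros(8)

# Assumption: both curves share the same MLP weights.
optimal_feats = encode_curve(optimal_points, mlp_weight, mlp_bias)
farthest_feats = encode_curve(farthest_points, mlp_weight, mlp_bias)
```

The feature width is chosen to match the channel count of the image feature sequence so that the two can later be concatenated along the sequence length dimension.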
[0162] S53. The decoded global feature map is concatenated with the curve feature sequence, and multimodal feature fusion is performed using a Transformer encoder;
[0163] First, layer normalization is applied to the global feature map (as in step S31), and the normalized feature map is then flattened into sequence features. Converting the spatial features into sequence features matches the sequence format of the curve features, so that image features and curve features share one format and can be concatenated and fused.
[0164] Then, the image feature sequence and the curve feature sequences are concatenated along the sequence length dimension to construct a unified multimodal sequence. The calculation formula is:
[0165]
[0166] In the formula, the total sequence length is the sum of the lengths of the image feature sequence and the curve feature sequences;
[0167] Finally, three stacked Transformer encoder blocks (consistent with the Transformer block structure in step 3.3) are applied to the multimodal sequence. The self-attention mechanism lets the image features, the optimal cutting line features, and the farthest cutting line features interact, yielding enhanced fused features. This fusion integrates image features and curve features so that the result carries both organ anatomical information and cutting line morphology information, which improves the accuracy of the subsequent Gaussian heatmap generation and of the cutting line probability distribution.
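The flatten, concatenate, and fuse sequence of S53 can be sketched as follows. This is a deliberately simplified stand-in: each "block" is a single-head self-attention layer with a residual connection, without the feed-forward sublayer or multi-head structure that a full Transformer encoder block would have. All names and sizes are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last (channel) axis; learnable scale/shift omitted.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_block(seq, wq, wk, wv):
    # Simplified pre-norm, single-head self-attention with a residual connection.
    x = layer_norm(seq)
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return seq + attn @ v

def fuse(global_feat, curve_opt, curve_far, blocks):
    """global_feat: (C, H, W); curve_opt, curve_far: (N, C)."""
    c, h, w = global_feat.shape
    # Layer-normalize over channels, then flatten H x W into a sequence.
    img_seq = layer_norm(global_feat.transpose(1, 2, 0)).reshape(h * w, c)
    # Concatenate along the sequence length dimension: length H*W + 2N.
    seq = np.concatenate([img_seq, curve_opt, curve_far], axis=0)
    for wq, wk, wv in blocks:
        seq = attention_block(seq, wq, wk, wv)
    return seq

# Hypothetical sizes: C = 8 channels, 4 x 5 feature map, N = 6 points per curve.
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 5))
opt = rng.normal(size=(6, 8))
far = rng.normal(size=(6, 8))
blocks = [tuple(0.1 * rng.normal(size=(8, 8)) for _ in range(3)) for _ in range(3)]
fused = fuse(feat, opt, far, blocks)
```

Because the image tokens come first in the concatenated sequence, they can later be recovered by slicing the leading H×W elements.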
[0168] S54. Generate a Gaussian heatmap based on the fused features, and output the probability distributions of the optimal cutting line and the farthest cutting line, specifically including:
[0169] S541. Extract the image feature part from the fused features, that is, take the leading elements of the sequence that correspond to the image feature sequence. The calculation formula is:
[0170]
[0171] S542. Reshape the extracted image features into a spatial feature map to restore the spatial structure of the image. Restoring the sequence features to spatial features meets the requirements of Gaussian heatmap generation and ensures that positions in the heatmap correspond to positions in the original endoscope image, making the guidance more intuitive. The reshaping formula is:
[0172]
[0173] S543. Use a 1×1 convolution to map the spatial feature map to a Gaussian heatmap. In the reshaping formula of S542, the reshape operator is the matrix dimension reconstruction function. In the heatmap, each pixel value represents the probability that the location lies on a cutting line; the higher the probability value, the more suitable the location is as a cutting point. The 1×1 convolution maps the fused features to a probability distribution, and the Gaussian distribution highlights high-probability regions, thus presenting the optimal position and safe range of the cutting line in a form that surgeons can identify quickly. The formula for calculating the Gaussian heatmap is:
[0174]
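Steps S541 to S543 can be sketched as a slice, a reshape, and a per-pixel linear map. The sigmoid used to squash values into (0, 1), the choice of two output channels (one per cutting line), and all names are assumptions; how the Gaussian shape of the heatmap is obtained is not modeled here.

```python
import numpy as np

def heatmaps_from_fused(fused, h, w, weight, bias):
    """Sketch of S541-S543.

    fused:  (H*W + L, C) fused multimodal sequence, image tokens first
    weight: (K, C) 1x1 convolution kernel (K = 2 heatmaps is an assumption)
    bias:   (K,)
    returns (K, H, W) heatmaps with values in (0, 1)
    """
    img_tokens = fused[: h * w]                    # S541: leading H*W elements
    spatial = img_tokens.reshape(h, w, -1)         # S542: back to (H, W, C)
    scores = spatial @ weight.T + bias             # S543: 1x1 conv as a linear map
    heat = 1.0 / (1.0 + np.exp(-scores))           # sigmoid (assumed activation)
    return heat.transpose(2, 0, 1)

# Hypothetical sizes: 4 x 5 feature map, 12 curve tokens, C = 8 channels.
rng = np.random.default_rng(0)
H, W = 4, 5
fused_seq = rng.normal(size=(H * W + 12, 8))
head_w = rng.normal(size=(2, 8))
head_b = np.zeros(2)
heat = heatmaps_from_fused(fused_seq, H, W, head_w, head_b)
optimal_heat, farthest_heat = heat[0], heat[1]
```

The reshape is only valid if the image tokens were flattened in the same row-major order before fusion.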
[0175] The final output includes three types of results: the segmentation contours of the prostate and bladder (with organ boundaries marked), the Gaussian heatmap of the optimal cutting line (probability distribution), and the Gaussian heatmap of the farthest cutting line (probability distribution). These results are overlaid in real time onto the laparoscopic video stream, providing surgeons with intuitive guidance for bladder neck transection. Delivering organ segmentation and cutting line localization together gives surgeons comprehensive intraoperative assistance, reduces the difficulty of the procedure, improves surgical precision and safety, and supports postoperative urinary continence recovery and radical tumor resection.
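The overlay of a heatmap onto a video frame could be done with simple alpha blending, as in the sketch below. The blending rule, the `max_alpha` cap, and the overlay color are illustrative assumptions; the heatmap is assumed to have already been resized to the frame resolution.

```python
import numpy as np

def overlay_heatmap(frame, heat, color, max_alpha=0.6):
    """Blend a probability heatmap onto an RGB frame.

    frame: (H, W, 3) uint8 image
    heat:  (H, W) values in [0, 1]
    color: RGB triple used to tint high-probability pixels
    """
    alpha = (max_alpha * heat)[..., None]          # per-pixel opacity
    tint = np.asarray(color, dtype=float)
    blended = (1.0 - alpha) * frame.astype(float) + alpha * tint
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)

# Hypothetical 4 x 4 black frame with one high-probability pixel.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
heat = np.zeros((4, 4))
heat[1, 2] = 1.0
out = overlay_heatmap(frame, heat, color=(0, 255, 0))
```

Pixels with zero probability keep their original color, so the overlay leaves the rest of the surgical field unobscured.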
[0176] Example 2
[0177] As shown in Figure 2, the bladder neck transection auxiliary identification system for minimally invasive radical prostatectomy specifically includes:
[0178] The data acquisition module is used to acquire the video stream from the endoscope and perform preprocessing.
[0179] The feature extraction module is used to extract multi-scale features from the real-time images in the preprocessed video stream and generate multi-scale feature maps.
[0180] The multi-scale feature fusion module performs position encoding and deep feature extraction based on the multi-scale feature map to obtain a global feature map.
[0181] The Bézier curve fitting module is used to map the global feature map to the control points of a Bézier curve and then fit a smooth cutting line.
[0182] The hyperbolic fusion decoding module processes the global feature map and the control points separately and finally generates a Gaussian heatmap, outputting the probability distributions of the optimal and farthest cutting lines of the bladder neck, as well as the segmentation results of the prostate and bladder.
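The data flow between the five modules can be summarized in a small sketch. The class name, field names, and stub stages below are hypothetical placeholders chosen only to show the order of the stages and that the decoding stage receives both the global feature map and the control points.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class BladderNeckAssistPipeline:
    acquire: Callable[[Any], Any]            # data acquisition + preprocessing
    extract_features: Callable[[Any], Any]   # multi-scale feature extraction
    fuse_scales: Callable[[Any], Any]        # position encoding + deep features
    fit_curves: Callable[[Any], Any]         # Bezier control points / cutting lines
    decode: Callable[[Any, Any], Any]        # segmentation + Gaussian heatmaps

    def run(self, raw_frame):
        frame = self.acquire(raw_frame)
        multi_scale = self.extract_features(frame)
        global_feat = self.fuse_scales(multi_scale)
        control_points = self.fit_curves(global_feat)
        # The decoder consumes both the global feature map and the control points.
        return self.decode(global_feat, control_points)

# Stub stages that only record the call order (placeholders, not real models).
calls = []

def stage(name, fn):
    def wrapped(*args):
        calls.append(name)
        return fn(*args)
    return wrapped

pipeline = BladderNeckAssistPipeline(
    acquire=stage("acquire", lambda raw: raw),
    extract_features=stage("extract", lambda frame: [frame]),
    fuse_scales=stage("fuse", lambda maps: maps[0]),
    fit_curves=stage("fit", lambda feat: {"optimal": [], "farthest": []}),
    decode=stage("decode", lambda feat, ctrl: {"segmentation": feat, "curves": ctrl}),
)
result = pipeline.run("frame-0")
```

Keeping each module behind a plain callable makes it possible to swap in a real model for one stage without touching the others.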
[0183] Example 3
[0184] As shown in Figure 3, an electronic device includes:
[0185] a processor, a memory, and a communication interface;
[0186] The memory is used to store the executable instructions of the processor;
[0187] The processor is configured to execute the aforementioned bladder neck transection auxiliary identification method for minimally invasive radical prostatectomy by executing the executable instructions.
[0188] A readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the above-described method for assisted identification of bladder neck transection in minimally invasive radical prostatectomy.
[0189] The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A method for auxiliary identification of bladder neck transection during minimally invasive radical prostatectomy, characterized in that it specifically includes: acquiring the video stream from the endoscope and performing preprocessing; extracting multi-scale features from real-time images in the preprocessed video stream to generate multi-scale feature maps; performing position encoding and deep feature extraction based on the multi-scale feature maps to obtain a global feature map; mapping the global feature map to the control points of a Bézier curve, and then fitting the optimal and farthest cutting lines, wherein the specific steps for mapping the global feature map to the control points of a Bézier curve are as follows: merging the height and width dimensions of the global feature map to obtain sequence features; aggregating the sequence features into a global feature vector using global average pooling; converting the global feature vector into coordinate regression points using a multilayer perceptron; and reorganizing the coordinate regression points into multiple control points; processing the global feature map and the control points separately to generate a Gaussian heatmap, and outputting the probability distributions of the optimal and farthest cutting lines of the bladder neck, as well as the segmentation results of the prostate and bladder.
The specific steps are as follows: decoding the global feature map using a segmentation decoder to generate segmentation contours of the prostate and bladder; sampling and decoding the control points on the optimal cutting line and the farthest cutting line respectively to generate two corresponding sets of curve feature sequences; concatenating the decoded global feature map with the curve feature sequences, and performing multimodal feature fusion using a Transformer encoder; and generating a Gaussian heatmap based on the fused features and outputting the probability distributions of the optimal cutting line and the farthest cutting line, specifically including: extracting the image features from the features obtained after multimodal fusion; reshaping the extracted image features into a spatial feature map; and using a 1×1 convolution to map the spatial feature map to a Gaussian heatmap, which represents the probability distribution of the cutting line position.
2. The method for auxiliary identification of bladder neck transection in minimally invasive radical prostatectomy according to claim 1, characterized in that: the multi-scale feature maps include feature maps with resolutions of 1/4, 1/8, 1/16, and 1/32.
3. The method for auxiliary identification of bladder neck transection in minimally invasive radical prostatectomy according to claim 1, characterized in that: the step of performing position encoding and deep feature extraction based on the multi-scale feature maps to obtain a global feature map specifically includes: fusing the multi-scale feature maps and applying layer normalization to the fused feature map; introducing position coordinate encoding into the normalized feature map; and performing deep feature extraction on the position-encoded features using stacked Transformer encoder blocks to obtain the global feature map.
4. The method for auxiliary identification of bladder neck transection in minimally invasive radical prostatectomy according to claim 3, characterized in that: the step of introducing position coordinate encoding into the normalized feature map is as follows: position encodings are generated using sine and cosine functions; the position encodings in the X and Y directions are concatenated and then mapped to the feature channel dimension via linear projection; and the mapped position encoding is added to the normalized feature map.
5. The method for auxiliary identification of bladder neck transection in minimally invasive radical prostatectomy according to claim 1, characterized in that: the step of fitting the optimal cutting line and the farthest cutting line is as follows: the control points are fitted using a cubic Bézier curve, $B(t) = (1-t)^3 P_0 + 3(1-t)^2 t P_1 + 3(1-t) t^2 P_2 + t^3 P_3$, where $t \in [0, 1]$ is the parameter of the Bézier curve and $P_0$ to $P_3$ are the coordinates of the control points; a curvature constraint loss function is introduced during the training of the Bézier curves, the loss function being defined in terms of the curvature of the Bézier curve at a point on the curve and the preset maximum curvature threshold.
6. The method for auxiliary identification of bladder neck transection in minimally invasive radical prostatectomy according to claim 1, characterized in that: the step of using a segmentation decoder to decode the global feature map and generate segmentation contours of the prostate and bladder is as follows: mapping the features in the global feature map to the class space to obtain the segmentation logits (log odds); applying the Softmax activation function to the segmentation logits to obtain the segmentation probability of each pixel; and, for multi-class segmentation, taking the class with the highest probability at each pixel as the segmentation result of that pixel.
7. The method for auxiliary identification of bladder neck transection in minimally invasive radical prostatectomy according to claim 1, characterized in that: the process of sampling and decoding the control points on the optimal cutting line and the farthest cutting line to generate two corresponding sets of curve feature sequences specifically includes: uniformly sampling N points on the optimal cutting line and the farthest cutting line respectively to obtain the sampling point sequences; and encoding the two sets of sampling point sequences into curve feature sequences using a multilayer perceptron.