High-robustness visual marker recognition method, system, device, and media for augmented reality surgical navigation

Combines deep-learning detection with a sub-pixel corner refinement network to solve the robustness and accuracy problems of visual marker recognition in the highly dynamic operating-room environment, achieving high-precision pose calculation under complex conditions and supporting accurate positioning in AR surgical navigation.

CN122199876A · Pending · Publication Date: 2026-06-12 · HUAQIAO UNIVERSITY +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
HUAQIAO UNIVERSITY
Filing Date
2026-02-11
Publication Date
2026-06-12

Figure CN122199876A_ABST (abstract figure)

Abstract

The application provides a high-robustness visual marker recognition method, system, device, and medium for augmented reality surgical navigation, in the field of computer vision. The method comprises: acquiring a surgical scene image captured intraoperatively by an acquisition device; detecting candidate regions of a marker in the surgical scene image with a deep-learning object detection network to obtain bounding-box information of the marker region; precisely locating the marker corner points within the marker region with a sub-pixel corner refinement network to obtain refined corner coordinates; and calculating the six-degree-of-freedom pose of the marker relative to the acquisition device from the refined corner coordinates and the predefined physical size of the marker. The application overcomes the insufficient robustness, limited corner localization accuracy, and poor generalization of existing visual marker recognition technology in the highly dynamic imaging environment of the operating room, and stably outputs pose results that meet the accuracy requirements of AR surgical navigation in a highly dynamic surgical environment.

Description

Technical Field

[0001] This invention relates to the fields of computer vision, digital image processing, and augmented reality (AR) surgical navigation technology, and specifically to a highly robust visual marker recognition method, system, device, and medium for augmented reality surgical navigation.

Background Technology

[0002] With the development of precision medicine and minimally invasive surgery, augmented reality surgical navigation has shown significant application value in clinical scenarios such as puncture biopsy, tumor ablation, and interventional therapy. In current clinical practice, visual positioning based on artificial visual markers (such as planar two-dimensional code markers) is widely used for real-time estimation of equipment or patient pose during surgery because of its low cost and ease of deployment.

[0003] Current mainstream solutions typically employ the following technical approach: First, threshold segmentation and contour analysis are performed on the acquired images to detect marker regions; then, corner coordinates are obtained using corner detection and interpolation optimization methods, and camera pose is calculated using a perspective geometry model. Some studies have also attempted to introduce deep learning models to improve the robustness of marker detection.

[0004] However, the aforementioned existing technologies generally have the following shortcomings in real operating room environments:

[0005] 1. The assumed image degradation is too idealized. Most existing methods implicitly assume that marker edges are sharp and that corner responses are concentrated and unimodal. In the operating room, however, specular reflections and local shadows caused by the strong directional illumination of surgical lights, together with rapid camera displacement caused by movement of the operator or patient, often produce images with compound degradation: strong illumination changes combined with significant motion blur. Under these conditions, traditional gradient- and threshold-based detection and corner localization methods are prone to failure.

[0006] 2. Corner optimization methods are highly sensitive to initial errors. Existing sub-pixel corner refinement techniques (such as iterative optimization based on grayscale gradients) typically assume that the initial corner estimate lies close to the true position, so their convergence range is limited. When the initial detection error reaches several pixels under strong degradation, these methods tend to diverge or become trapped in local optima, making corner localization unstable.

[0007] 3. Existing deep learning solutions do not explicitly model compound degradation. Although some studies have introduced deep learning models (such as DeepTag) to improve detection performance, their training data mostly come from ideal or single-degradation scenarios. They do not systematically model the physical imaging mechanism in which strong light, blur, and occlusion coexist in the operating room, so their generalization ability and localization accuracy in real clinical scenarios remain limited.

[0008] In summary, existing technologies have not effectively solved the core technical problem of continuous, stable, and high-precision recognition of visual markers in highly dynamic, heavily disturbed surgical environments with stringent precision requirements, and are therefore insufficient for the sub-millimeter spatial registration accuracy that AR surgical navigation requires in practice.

Summary of the Invention

[0009] In view of this, embodiments of this application provide a highly robust visual marker recognition method, system, device, and medium for augmented reality surgical navigation, to overcome the insufficient robustness, limited corner localization accuracy, and insufficient generalization ability of existing visual marker recognition technologies in the highly dynamic imaging environment of the operating room.

[0010] This application provides the following technical solution: a highly robust visual marker recognition method for augmented reality surgical navigation, comprising: acquiring surgical scene images captured intraoperatively by a visible-light acquisition device; detecting candidate regions of visual markers in the surgical scene images with a deep-learning-based object detection network to obtain bounding-box information of the marker regions, wherein the object detection network is trained on a synthetic degradation dataset that explicitly models the physical mechanism of operating-room imaging; locating the marker corner points in each marker region at the sub-pixel level with a sub-pixel corner refinement network to obtain refined corner coordinates; and calculating, from the refined corner coordinates and the predefined physical dimensions of the marker, the six-degree-of-freedom pose of the marker relative to the acquisition device.

[0011] According to one embodiment of this application, constructing the synthetic degradation dataset includes: superimposing artificial visual markers on natural-scene background images with complex texture distributions, and introducing at least one degradation factor into the superimposed images to obtain the synthetic degradation dataset.

[0012] According to one embodiment of this application, the degradation factors include non-uniform illumination variation, motion blur, noise, color shift, and local occlusion, which are used to simulate the complex degradation distribution caused by the coexistence of strong directional lighting and movement of the acquisition equipment or instruments in the operating room.

[0013] According to one embodiment of this application, locating the marker corner points in the marker region at the sub-pixel level with the sub-pixel corner refinement network includes: cropping, within the marker region, a local image patch of a set size around each marker corner point; and inputting the local image patch into the sub-pixel corner refinement network, which models the position of the corner point within the patch as a discrete probability distribution and determines the sub-pixel position of the corner point by multi-class probability prediction, thereby obtaining the refined corner coordinates.

[0014] According to one embodiment of this application, the sub-pixel corner refinement network is a deep neural network with a VGG backbone.

[0015] According to one embodiment of this application, calculating the six-degree-of-freedom pose of the marker relative to the acquisition device from the refined corner coordinates and the predefined physical dimensions of the marker includes: pairing the refined corner coordinates with the known three-dimensional coordinates of the corners in the marker coordinate system; calculating the rotation and translation parameters of the marker relative to the coordinate system of the acquisition device with a perspective geometry solution method; and outputting the obtained parameters to the augmented reality rendering system in real time.

[0016] According to one embodiment of this application, the method further includes: outputting the pose results to the augmented reality rendering system in real time; and, when a set number of consecutive frames fail to yield a valid pose result meeting a set criterion, or the confidence of the marker corner points falls below a preset threshold, determining that the current marker is lost and outputting a viewpoint adjustment prompt.

[0017] This application also provides a highly robust visual marker recognition system for augmented reality surgical navigation, comprising: an image acquisition module, configured to acquire surgical scene images captured intraoperatively by the visible-light acquisition device; a robust detection module, configured to detect candidate regions of visual markers in the surgical scene images with a deep-learning-based object detection network and obtain bounding-box information of the marker regions, wherein the object detection network is trained on a synthetic degradation dataset that explicitly models the physical mechanism of operating-room imaging; a sub-pixel corner refinement module, configured to locate the marker corner points in the marker regions at the sub-pixel level with the sub-pixel corner refinement network and obtain refined corner coordinates; and a pose estimation module, configured to calculate the six-degree-of-freedom pose of the marker relative to the acquisition device from the refined corner coordinates and the predefined physical dimensions of the marker.

[0018] This application also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the above-described highly robust visual marker recognition method for augmented reality surgical navigation.

[0019] This application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described highly robust visual marker recognition method for augmented reality surgical navigation.

[0020] Compared with the prior art, at least one of the technical solutions adopted in the embodiments of this specification achieves at least the following beneficial effects: 1. Visual markers can still be detected stably under combined interference such as strong light, motion blur, and occlusion, significantly improving recognition continuity in real intraoperative scenarios. 2. Corner-level sub-pixel refinement with discrete probability modeling effectively eliminates the uncertainty caused by dispersed corner responses in degraded images, significantly improving corner localization accuracy. 3. Because corner error is significantly reduced, the overall stability and accuracy of pose calculation improve, so that pose results meeting the accuracy requirements of AR surgical navigation are output stably even in highly dynamic surgical environments, beyond the expectations of those skilled in the art.

[0021] This invention provides a visual marker recognition and pose calculation technology that achieves highly stable detection and sub-pixel corner localization even under combined degradation such as strong directional illumination, rapid camera movement, motion blur, and local occlusion, to support high-precision AR surgical navigation.

Attached Figure Description

[0022] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0023] Figure 1 is a schematic flowchart of a highly robust visual marker recognition method according to an embodiment of the present invention; Figure 2 is an example diagram of the construction of the synthetic degradation dataset in the method of this embodiment; Figure 3 is an example diagram of the corner refinement model dataset in the method of this embodiment; Figure 4 is a schematic diagram of the offline deployment and online execution process of the method of this embodiment; Figure 5 is a schematic diagram of a highly robust visual marker recognition system according to an embodiment of the present invention; Figure 6 is an overall architecture diagram of the highly robust visual marker recognition system according to an embodiment of the present invention; Figure 7 is a schematic diagram of the implementation effect of an embodiment of the present invention; Figure 8 is a schematic structural diagram of the computer device of the present invention.

Detailed Implementation

[0024] The embodiments of this application will now be described in detail with reference to the accompanying drawings.

[0025] The following specific examples illustrate the implementation of this application. Those skilled in the art can easily understand other advantages and effects of this application from the content disclosed in this specification. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. This application can also be implemented or applied through other different specific embodiments, and the details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of this application. It should be noted that, in the absence of conflict, the following embodiments and features in the embodiments can be combined with each other. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0026] As shown in Figure 1, this embodiment of the invention provides a highly robust visual marker recognition method for augmented reality surgical navigation, comprising: S101. acquiring surgical scene images captured by a visible-light acquisition device during the operation; S102. detecting candidate regions of visual markers in the surgical scene images with a deep-learning-based object detection network and obtaining bounding-box information of the marker regions, wherein the object detection network is trained on a synthetic degradation dataset that explicitly models the physical mechanism of operating-room imaging; S103. locating the marker corner points in the marker regions at the sub-pixel level with the sub-pixel corner refinement network to obtain refined corner coordinates; S104. calculating, from the refined corner coordinates and the predefined physical dimensions of the marker, the six-degree-of-freedom pose of the marker relative to the acquisition device.
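Steps S101 to S104 can be read as a per-frame pipeline. The sketch below only illustrates that data flow, assuming the detector, refiner, and pose solver are available as plain Python callables; `recognize_frame`, `MarkerResult`, and the three callable parameters are hypothetical names, not part of the disclosed implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class MarkerResult:
    box: Box                # S102 output
    corners: List[Point]    # S103 output: refined sub-pixel corners
    pose: Optional[Any]     # S104 output, or None if the solver failed


def recognize_frame(
    image: Any,                                          # one frame from S101
    detect: Callable[[Any], Sequence[Box]],              # hypothetical detector wrapper
    refine: Callable[[Any, Box], List[Point]],           # hypothetical refiner wrapper
    solve_pose: Callable[[List[Point]], Optional[Any]],  # hypothetical pose solver
) -> List[MarkerResult]:
    """Run S102-S104 on a single frame and collect one result per marker."""
    results = []
    for box in detect(image):            # S102: robust detection
        corners = refine(image, box)     # S103: sub-pixel corner refinement
        pose = solve_pose(corners)       # S104: 6-DoF pose from 2D-3D pairs
        results.append(MarkerResult(box, corners, pose))
    return results
```

Keeping the three stages as injected callables makes each one replaceable and testable on its own, which matches the modular split of the system described later.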

[0027] In real surgical environments, improving only the detection stage or only corner accuracy is insufficient to obtain high-precision pose results reliably. Therefore, to overcome the insufficient robustness, limited corner localization accuracy, and inadequate generalization of existing visual marker recognition technologies in the highly dynamic imaging environment of the operating room, this invention addresses the complex degradation mechanisms of the operating room with a holistic design: a two-level recognition structure that combines robust detection with sub-pixel corner refinement. The two levels form an inseparable technical whole and achieve the following objectives: 1. visual markers are still detected stably and continuously under combined degradation such as strong light changes, motion blur, and occlusion; 2. by eliminating the dispersion and multi-peak uncertainty of corner responses in degraded images, sub-pixel corner localization is achieved; 3. without complex parameter tuning for a specific operating room, high-precision six-DOF pose information is output stably to meet the clinical accuracy requirements of AR surgical navigation.

[0028] In some embodiments of the present invention, the first stage, robust detection, uses a deep-learning-based object detection network. Unlike existing methods, this network is not trained solely with conventional data augmentation but on a synthetic degradation dataset that explicitly models the physical mechanisms of operating-room imaging. Constructing the synthetic degradation dataset includes: overlaying artificial visual markers on natural-scene background images with complex texture distributions, and simultaneously introducing at least one degradation factor into the overlaid images to obtain the synthetic degradation dataset. The degradation factors include non-uniform illumination variation, motion blur, noise, color shift, and local occlusion, which simulate the complex degradation distribution caused by the coexistence of strong directional lighting and movement of the acquisition equipment or instruments in the operating room.

[0029] In a specific implementation, as shown in Figure 2, constructing the synthetic degradation dataset includes: 1. overlaying artificial visual markers on natural-scene background images with complex texture distributions; 2. introducing one or more degradation factors into the overlaid images, including non-uniform illumination variation, motion blur, noise, color shift, and local occlusion; 3. applying these factors jointly rather than independently, so as to simulate the complex degradation distribution caused by the coexistence of strong directional lighting and camera or instrument movement in the operating room.
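As an illustrative sketch of item 3 (joint rather than independent application of degradation factors), the snippet below draws a random non-empty subset of the five factors for each synthetic image and samples parameters for each. The function name, recipe format, and the ranges marked "assumed" are hypothetical; only the color-shift range (±20) and the occlusion area ratio (10%–70%) come from the preferred embodiment described later.

```python
import random
from typing import Dict

FACTORS = ("illumination", "motion_blur", "noise", "color_shift", "occlusion")


def sample_degradation(rng: random.Random) -> Dict[str, dict]:
    """Draw one joint degradation recipe for a synthetic training image.

    A random non-empty subset of factors is chosen, so several factors act on
    the same image together instead of one at a time.
    """
    chosen = rng.sample(FACTORS, rng.randint(1, len(FACTORS)))
    recipe: Dict[str, dict] = {}
    for name in chosen:
        if name == "illumination":
            # assumed range: global gain plus an optional spotlight centre (normalized coords)
            spot = (rng.random(), rng.random()) if rng.random() < 0.5 else None
            recipe[name] = {"gain": rng.uniform(0.4, 1.8), "spotlight": spot}
        elif name == "motion_blur":
            # assumed range: odd kernel length and arbitrary direction
            recipe[name] = {"length": rng.choice(range(3, 16, 2)),
                            "angle_deg": rng.uniform(0.0, 180.0)}
        elif name == "noise":
            # assumed range: Gaussian noise standard deviation in gray levels
            recipe[name] = {"sigma": rng.uniform(1.0, 10.0)}
        elif name == "color_shift":
            # per-channel bias in [-20, +20], as in the preferred embodiment
            recipe[name] = {"delta_rgb": [rng.randint(-20, 20) for _ in range(3)]}
        else:  # "occlusion"
            # rectangle or ellipse covering 10%-70% of the area, as in the preferred embodiment
            recipe[name] = {"shape": rng.choice(("rectangle", "ellipse")),
                            "area_ratio": rng.uniform(0.10, 0.70)}
    return recipe
```

Sampling the subset size first means single-factor and fully combined cases both appear in the training set, rather than only one extreme.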

[0030] A detection network trained in this way can still reliably recall marker targets in a real, high-interference intraoperative environment, providing reliable input for the subsequent refinement stage.

[0031] In some embodiments of the present invention, a corner-level sub-pixel refinement network is further introduced to mitigate the localization uncertainty caused by dispersed corner responses in degraded images. Locating the marker corners in the marker region at the sub-pixel level with this network includes: cropping, within the marker region, a local image patch of a set size around each marker corner point; and inputting each patch into the sub-pixel corner refinement network, which models the position of the corner point within the patch as a discrete probability distribution and determines the sub-pixel position of the corner point by multi-class probability prediction, thereby obtaining the refined corner coordinates.

[0032] In a specific implementation, as shown in Figure 3, fixed-size local image patches are cropped around each corner point of the detected marker region and input into the sub-pixel corner refinement network. Instead of directly regressing continuous coordinates, the refinement network models the position of the corner point within the local region as a discrete probability distribution and determines the most likely sub-pixel position through multi-class probability prediction. This discrete probabilistic formulation handles the multi-peak or diffuse corner responses that arise under motion blur or low contrast, avoids the high sensitivity of traditional continuous regression or gradient-based optimization to initial estimation errors, and therefore converges stably even under strong degradation.
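A toy one-dimensional example may help show why the discrete formulation copes with multi-peak responses. Assuming the network outputs logits over candidate positions, taking the most probable class (argmax) returns one of the peaks, whereas a probability-weighted mean, which is roughly what a coordinate regressor trained with a squared-error loss tends toward on ambiguous inputs, can land in the low-probability valley between peaks. The logits are invented for illustration, and the real network predicts over a two-dimensional grid.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax_position(probs: Sequence[float]) -> int:
    """Discrete (classification) estimate: the single most probable bin."""
    return max(range(len(probs)), key=lambda i: probs[i])


def mean_position(probs: Sequence[float]) -> float:
    """Regression-like estimate: probability-weighted average of bin indices."""
    return sum(i * p for i, p in enumerate(probs))


# Two competing corner responses (e.g. a blurred edge plus its ghost) at bins 2 and 6.
probs = softmax([0, 0, 4, 0, 0, 0, 3.5, 0, 0])
```

Here `argmax_position(probs)` is 2, one of the genuine peaks, while `mean_position(probs)` is about 3.55, between the peaks where the predicted probability is close to zero.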

[0033] In a preferred embodiment of the present invention, the sub-pixel corner refinement network is a deep neural network with a VGG backbone.

[0034] In some embodiments of the present invention, calculating the six-degree-of-freedom pose of the marker relative to the acquisition device from the refined corner coordinates and the predefined physical dimensions of the marker includes: pairing the refined corner coordinates with the known three-dimensional coordinates of the corners in the marker coordinate system; calculating the rotation and translation parameters of the marker relative to the coordinate system of the acquisition device with a robust perspective geometry solution method; and outputting the obtained parameters to the augmented reality rendering system in real time.
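A minimal sketch of the 2D–3D pairing, assuming the predefined physical dimension is the side length of a square marker and that the marker's corners lie in its z = 0 plane. The corner order matches the order OpenCV documents for its `SOLVEPNP_IPPE_SQUARE` solver. The pinhole projection and RMS reprojection error are generic tools for checking a candidate pose, not the patent's specific "robust" solver, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def marker_object_points(side_length: float) -> List[Vec3]:
    """Corners of a square marker in its own frame (z = 0), in the order
    OpenCV documents for SOLVEPNP_IPPE_SQUARE."""
    h = side_length / 2.0
    return [(-h, h, 0.0), (h, h, 0.0), (h, -h, 0.0), (-h, -h, 0.0)]


def project(points: Sequence[Vec3], R: Sequence[Sequence[float]], t: Vec3,
            fx: float, fy: float, cx: float, cy: float) -> List[Vec2]:
    """Pinhole projection (no lens distortion) of marker-frame points under pose (R, t)."""
    out = []
    for p in points:
        # camera-frame coordinates: Xc = R * p + t
        xc, yc, zc = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        out.append((fx * xc / zc + cx, fy * yc / zc + cy))
    return out


def reprojection_rmse(observed: Sequence[Vec2], predicted: Sequence[Vec2]) -> float:
    """Root-mean-square pixel distance between observed and reprojected corners."""
    sq = [(u - pu) ** 2 + (v - pv) ** 2 for (u, v), (pu, pv) in zip(observed, predicted)]
    return math.sqrt(sum(sq) / len(sq))
```

With OpenCV installed, one option would be to pass these object points, the refined corners, the camera matrix, and the distortion coefficients to `cv2.solvePnP(..., flags=cv2.SOLVEPNP_IPPE_SQUARE)`, then accept the pose only if the reprojection error of the refined corners stays small.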

[0035] In some embodiments of the present invention, the method further includes monitoring the output results, specifically: outputting the pose results to the augmented reality rendering system in real time; and, when a set number of consecutive frames fail to yield a valid pose result meeting a set criterion, or the confidence of the marker corner points falls below a preset threshold, determining that the current marker is lost and outputting a viewpoint adjustment prompt.

[0036] In one specific embodiment, as shown in Figure 4, the robust visual marker recognition method of the present invention mainly includes the following steps. Step S1: Image acquisition. A visible-light camera mounted on the augmented reality device captures the surgical scene in real time, and the images serve as input to the recognition process.

[0037] Step S2: Robust marker detection. The acquired images are input into the object detection network for the first stage, robust detection. Under strong illumination changes, motion blur, and occlusion, the network detects the artificial visual markers and outputs candidate bounding boxes for them.

[0038] Step S3: Corner region extraction. For each detected bounding box, the positions of the corresponding marker corners are determined, and a local image patch of a preset size is cropped around each corner from the original-resolution image.
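A minimal cropping sketch, assuming an initial corner estimate is already available for each marker (how it is derived from the bounding box is not specified here) and that the image is a 2-D array of gray values. The 24-pixel default follows the preferred embodiment. Border pixels are replicated so corners near the image edge still yield full patches, and with this centering the rounded corner pixel lands at index 12, inside the central 8×8 region. The function name and border policy are assumptions.

```python
from typing import List, Sequence, Tuple


def crop_corner_patch(image: Sequence[Sequence[float]], corner: Tuple[float, float],
                      size: int = 24) -> Tuple[List[List[float]], Tuple[int, int]]:
    """Crop a size x size patch centred on a corner, replicating border pixels.

    Returns the patch and the (x, y) image position of its top-left pixel, which
    is needed later to map a refined in-patch position back to the full image.
    """
    h, w = len(image), len(image[0])
    x0 = int(round(corner[0])) - size // 2
    y0 = int(round(corner[1])) - size // 2
    patch = []
    for dy in range(size):
        y = min(max(y0 + dy, 0), h - 1)                      # clamp row index
        row = image[y]
        patch.append([row[min(max(x0 + dx, 0), w - 1)]      # clamp column index
                      for dx in range(size)])
    return patch, (x0, y0)
```

Returning the patch origin alongside the patch keeps the later coordinate mapping explicit, including for corners close to the image border where the origin is negative.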

[0039] Step S4: Sub-pixel corner refinement. Each local image patch is input into the sub-pixel corner refinement network, which determines the most likely sub-pixel position of the corner within the local region by discrete probability prediction, yielding high-precision corner coordinates.

[0040] Step S5: Pose calculation. The corner coordinates obtained in step S4 are paired with the physical size parameters of the visual marker, and a pose estimation algorithm calculates the six-degree-of-freedom pose of the camera relative to the marker.

[0041] Step S6: Result output and monitoring. The pose results are continuously output to the augmented reality display system. When no valid pose is obtained for several consecutive frames (e.g., 10 frames) or the corner confidence falls below a preset threshold (e.g., 0.5), the system determines that the marker is temporarily lost and prompts the operator to adjust the viewpoint.
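The monitoring rule of step S6 can be sketched as a small state machine. The class name and its interface (a validity flag plus an optional corner confidence per frame) are hypothetical, and the defaults use the example values of 10 frames and 0.5 given above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MarkerLossMonitor:
    max_missed_frames: int = 10      # example value from step S6
    min_confidence: float = 0.5      # example value from step S6
    missed: int = 0                  # consecutive frames without a valid pose

    def update(self, pose_valid: bool, corner_confidence: Optional[float] = None) -> bool:
        """Feed one frame; return True when a viewpoint-adjustment prompt is due."""
        self.missed = 0 if pose_valid else self.missed + 1
        # the confidence check only applies when corners were actually produced
        low_conf = corner_confidence is not None and corner_confidence < self.min_confidence
        return self.missed >= self.max_missed_frames or low_conf
```

Resetting the counter on every valid frame means only consecutive failures trigger the prompt, so a single dropped frame during fast motion does not interrupt the surgeon.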

[0042] This embodiment also includes offline model training and deployment. The model parameters of both the object detection network and the sub-pixel corner refinement network are obtained by offline training. During training, artificial visual markers are superimposed on complex background images and multi-factor compound degradation is introduced to approximate real operating-room imaging conditions, improving the models' generalization in clinical use. After training, the model parameters are deployed to an augmented reality device or surgical navigation system, which then runs stably without additional dedicated hardware.

[0043] As shown in Figure 5, an embodiment of the present invention provides a highly robust visual marker recognition system 200 for augmented reality surgical navigation, comprising: an image acquisition module 201, configured to acquire surgical scene images captured by a visible-light acquisition device during the operation; a robust detection module 202, configured to detect candidate regions of visual markers in the surgical scene images with a deep-learning-based object detection network and obtain bounding-box information of the marker regions, wherein the object detection network is trained on a synthetic degradation dataset that explicitly models the physical mechanism of operating-room imaging; a sub-pixel corner refinement module 203, configured to locate the marker corner points in the marker regions at the sub-pixel level with the sub-pixel corner refinement network and obtain refined corner coordinates; and a pose estimation module 204, configured to calculate the six-degree-of-freedom pose of the marker relative to the acquisition device from the refined corner coordinates and the predefined physical dimensions of the marker.

[0044] As shown in Figure 6, the recognition system of this embodiment includes: 1. Image input module: receives surgical scene images captured by an intraoperative visible-light camera. These images may suffer from combined interference such as strong light changes, motion blur, and local occlusion.

[0045] 2. First-stage robust detection module: stably detects candidate regions of visual markers in the degraded images and outputs the bounding-box information of the markers.

[0046] 3. Second-stage sub-pixel corner refinement module: locates, at the sub-pixel level, the marker corners within the marker regions output by the detection module.

[0047] 4. Pose estimation module: calculates the six-DOF pose of the marker relative to the camera from the refined corner coordinates and the known physical dimensions of the marker.

[0048] The above modules are connected in sequence to form a complete perception link from raw image input to high-precision pose output.

[0049] In one specific embodiment, the highly robust visual marker recognition system for augmented reality surgical navigation of the present invention operates in the following main steps: 1. Image acquisition and preprocessing.

[0050] In one embodiment, an image acquisition module acquires a raw image containing visual markers. The source of this image can be an industrial camera, a mobile terminal camera, or an embedded imaging device, and the image resolution can be, for example, 640×480, 1280×720, or higher. After acquiring the raw image, the system performs preprocessing operations, including but not limited to: grayscale conversion; normalization; and optional image downsampling or scale normalization. These preprocessing steps can be adjusted according to the specific application scenario.
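A pure-Python sketch of the three preprocessing operations, assuming 8-bit pixels stored in BGR order as OpenCV loads them. The grayscale weights are the ITU-R BT.601 luma coefficients that OpenCV documents for `COLOR_BGR2GRAY`. Scaling to [0, 1] and 2×2 block averaging are one possible choice of normalization and downsampling, not necessarily the ones used in the embodiment, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

BGR = Tuple[int, int, int]


def to_gray(image: Sequence[Sequence[BGR]]) -> List[List[float]]:
    """Grayscale with BT.601 weights: Y = 0.299 R + 0.587 G + 0.114 B."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (b, g, r) in row] for row in image]


def normalize01(gray: Sequence[Sequence[float]], max_value: float = 255.0) -> List[List[float]]:
    """Scale 8-bit gray levels to [0, 1]."""
    return [[v / max_value for v in row] for row in gray]


def downsample2x(gray: Sequence[Sequence[float]]) -> List[List[float]]:
    """Optional 2x downsampling by averaging 2x2 blocks (an odd last row/column is dropped)."""
    h, w = len(gray) // 2 * 2, len(gray[0]) // 2 * 2
    return [
        [(gray[y][x] + gray[y][x + 1] + gray[y + 1][x] + gray[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]
```

In a deployed system, `cv2.cvtColor` and `cv2.resize` would perform the same operations far faster; the loops here only make the arithmetic explicit.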

[0051] 2. Composite degradation modeling and data augmentation.

[0052] To improve the robustness of the system under complex imaging conditions, this invention introduces a composite degradation modeling mechanism during the training phase.

[0053] In one embodiment, the degradation model includes one or more of the following: (1) Gaussian blur, with a kernel size chosen as an odd value from 3 to 15 pixels; (2) motion blur, whose direction, angle, and length are randomly sampled within a preset range; (3) imaging noise, such as Gaussian or Poisson noise, whose intensity varies within a preset range; (4) brightness, contrast, or local occlusion disturbances.
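Motion blur of the kind in item (2) is commonly produced by convolving the image with a normalized line kernel. The sketch below builds such a kernel for a given length and angle; the construction (oversampled nearest-pixel rasterization of a segment through the kernel centre) and the function name are assumptions, and the resulting kernel could be applied with, for example, `cv2.filter2D`.

```python
import math
from typing import List


def motion_blur_kernel(length: int, angle_deg: float) -> List[List[float]]:
    """Normalized linear motion-blur kernel of size length x length (length odd).

    Pixels crossed by a segment of `length` pixels through the kernel centre at
    `angle_deg` receive equal weight; all weights sum to 1.
    """
    if length < 1 or length % 2 == 0:
        raise ValueError("length must be a positive odd integer")
    kernel = [[0.0] * length for _ in range(length)]
    c = length // 2
    theta = math.radians(angle_deg)
    dx, dy = math.cos(theta), math.sin(theta)
    steps = 4 * length                      # oversample so no pixel on the line is skipped
    for i in range(steps + 1):
        s = -c + 2 * c * i / steps          # signed distance along the segment, in [-c, c]
        x, y = int(round(c + s * dx)), int(round(c + s * dy))
        kernel[y][x] = 1.0
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]
```

Normalizing the weights to sum to 1 preserves mean brightness, so blur and illumination changes remain separate factors in the degradation model.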

[0054] Generating varied degradation samples in these ways allows the trained model to adapt to a range of complex imaging conditions.

[0055] In a preferred embodiment, 250 images are randomly selected from the MS COCO 2017 natural-scene dataset. Each image is converted to grayscale with OpenCV's `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)`, its median brightness is calculated, and all images are divided into 10 equal intervals of median brightness; 50 images are then selected uniformly from each bin to balance the illumination distribution. For each background image, 1–20 ArUco markers are randomly generated, with IDs drawn from `DICT_6X6_250`, sizes randomized between 30 and 200 pixels, and randomly generated positions, rotation angles, and perspective projection parameters (simulating different camera viewpoints). The markers are projected onto the background with OpenCV's `cv2.warpPerspective`, and the marker regions in the composite image are then adjusted to the local background brightness. Next, compound degradation augmentation is applied: each image is randomly subjected to one or more of the following: illumination adjustment (random grayscale changes, spotlight effects); Gaussian blur using `cv2.GaussianBlur(ksize=(k,k), sigmaX=σ)` with $k \in \{3, 5, 7, 9, 11, 13, 15\}$ and $\sigma \in [0.5, 2.0]$; color shift, adding a bias $\Delta c \in [-20, +20]$ to each of the R, G, and B channels; and local occlusion with randomly generated rectangular or elliptical masks covering 10%–70% of the area. Finally, 2500 training images with bounding-box annotations are generated. In addition, 24×24 corner sub-images are cropped from all labeled regions, with the ground-truth corner located within the central 8×8 region, as shown in Figure 3; these are used to train RefineNet.
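The illumination-balanced background selection can be sketched as follows. The sketch assumes that "10 bins of equal magnitude" means equal-width bins over the 0–255 gray range and takes each image's median gray level as input. Function names are hypothetical, and `per_bin` simply caps the draw when a bin holds fewer images than requested.

```python
import random
import statistics
from typing import Dict, List, Sequence


def median_brightness(gray: Sequence[Sequence[float]]) -> float:
    """Median gray level of one grayscale image (values assumed in 0..255)."""
    return float(statistics.median(v for row in gray for v in row))


def balanced_selection(medians: Dict[str, float], n_bins: int = 10,
                       per_bin: int = 50, seed: int = 0) -> List[str]:
    """Pick up to `per_bin` image ids at random from each equal-width
    median-brightness bin over the assumed range [0, 256)."""
    bins: List[List[str]] = [[] for _ in range(n_bins)]
    for img_id in sorted(medians):                    # sorted for reproducibility
        idx = int(medians[img_id] * n_bins / 256.0)
        bins[min(max(idx, 0), n_bins - 1)].append(img_id)
    rng = random.Random(seed)
    picked: List[str] = []
    for members in bins:
        picked.extend(rng.sample(members, min(per_bin, len(members))))
    return picked
```

Drawing the same number of images per brightness bin keeps very dark and very bright backgrounds from being under-represented, which is the stated goal of balancing the illumination distribution.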

[0056] 3. Robust marker detection.

[0057] The robust detection module is implemented based on a deep learning-based object detection network. Its function is to stably identify candidate regions of artificial visual markers from the aforementioned degraded images and output the corresponding bounding box information.

[0058] The key to this module lies not in the choice of a specific network structure but in how it is trained: during training, the detection network learns from a synthetic degradation dataset that explicitly models the physical characteristics of operating-room imaging, which enables it to maintain a high recall rate in the complex environment of real surgery.

[0059] In a preferred embodiment, the YOLOv8s pre-trained weights provided by Ultralytics are fine-tuned for 100 epochs on the aforementioned synthetic dataset, with an input size of 640×640, a batch size of 32, the AdamW optimizer, and an initial learning rate of 1e-3. At runtime, the detection module performs forward inference on the input image and outputs the set of marker bounding boxes whose confidence exceeds a preset threshold (set to 0.6).
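The fine-tuning and thresholding in [0059] might look as follows with the Ultralytics Python package. This sketch assumes the `ultralytics` package and its `yolov8s.pt` checkpoint are available and that the synthetic dataset is described by a YOLO-format dataset YAML; the file name `markers_synthetic.yaml` and the helper names are hypothetical. The threshold is passed to `predict` and also re-applied in a small pure-Python filter so that the selection rule ("confidence exceeds 0.6") is explicit.

```python
# Hyper-parameters from [0059]; the dataset YAML path below is a placeholder.
TRAIN_ARGS = dict(epochs=100, imgsz=640, batch=32, optimizer="AdamW", lr0=1e-3)
CONF_THRESHOLD = 0.6


def finetune(data_yaml="markers_synthetic.yaml"):
    """Fine-tune YOLOv8s on the synthetic degradation dataset (requires `ultralytics`)."""
    from ultralytics import YOLO  # imported lazily so the helpers below work without it

    model = YOLO("yolov8s.pt")  # Ultralytics pre-trained YOLOv8s weights
    model.train(data=data_yaml, **TRAIN_ARGS)
    return model


def detect(model, image):
    """Run inference and return (x1, y1, x2, y2, conf) tuples above the threshold."""
    result = model.predict(image, conf=CONF_THRESHOLD, verbose=False)[0]
    boxes = [
        (*xyxy, c)
        for xyxy, c in zip(result.boxes.xyxy.tolist(), result.boxes.conf.tolist())
    ]
    return filter_by_confidence(boxes)


def filter_by_confidence(boxes, threshold=CONF_THRESHOLD):
    """Keep only boxes whose confidence strictly exceeds the threshold."""
    return [b for b in boxes if b[4] > threshold]
```

Setting `optimizer="AdamW"` explicitly matters here: with the default `optimizer="auto"`, Ultralytics may choose its own optimizer and learning rate instead of honoring `lr0`.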

[0060] 4. Sub-pixel corner refinement.

[0061] The sub-pixel corner refinement module is connected to the detection module and further refines the detected marker regions.

[0062] Specifically, this module crops fixed-size local image patches around each corner point of each marker bounding box output by the detection module, normalizes them, and inputs them into the corner refinement network. The corner refinement network models the corner position within the local region as a discrete probability distribution and predicts it by multi-class classification, thereby determining the precise sub-pixel position of the corner point. Compared with traditional corner optimization methods that rely on local gray-level gradients, this module maintains stable output even under strong blur and low contrast.
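Patch extraction as described in [0062] can be sketched with NumPy. The coarse corner position is taken as given (the text derives it from the detector's bounding box); edge replication at image borders and zero-mean, unit-variance normalization are assumptions, since the text only states that patches are normalized. The function name is hypothetical. The returned offset allows refined patch coordinates to be mapped back to the full image.

```python
import numpy as np

PATCH = 24  # patch side length in pixels (preferred embodiment)


def crop_corner_patch(gray, corner_xy, size=PATCH):
    """Crop a size x size patch centred on a coarse corner estimate.

    Pixels outside the image are filled by edge replication. Returns the
    normalized patch and the (x0, y0) image offset of its top-left pixel.
    """
    half = size // 2
    x0 = int(round(corner_xy[0])) - half
    y0 = int(round(corner_xy[1])) - half
    off = half + size  # border wide enough for any corner lying inside the image
    padded = np.pad(gray, off, mode="edge")
    patch = padded[y0 + off : y0 + off + size, x0 + off : x0 + off + size]
    patch = patch.astype(np.float32)
    # Assumed normalization: zero mean, unit variance (epsilon guards flat patches).
    patch = (patch - patch.mean()) / (patch.std() + 1e-6)
    return patch, (x0, y0)
```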

[0063] In a preferred embodiment, for each detected marker bounding box, a 24×24-pixel image patch containing a single corner point is cropped and used as input to the refinement network. The refinement network (called RefineNet) is a deep neural network with a VGG backbone. It performs 8× super-resolution modeling of the central 8×8-pixel region of the input patch, mapping it onto a 64×64 sub-pixel grid, and predicts the most likely sub-pixel location of the corner point through a 4096-way softmax classification. To reduce computational overhead, a bottleneck layer is introduced before the final mapping, compressing the 128-dimensional features to 8 dimensions. For a standard ArUco marker (at most 4 corner points), refinement requires 4 independent forward passes; this overhead is manageable, and the process can be completed in real time on an embedded GPU.
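The way a 4096-way output maps back to a sub-pixel coordinate can be sketched as follows. Only the decoding of the logits is shown, not the network itself. The row-major class layout, reporting the centre of each 1/8-pixel cell, and the continuous-coordinate convention in which patch pixel i spans [i, i+1) are all assumptions of this sketch, and the function names are hypothetical.

```python
import math

PATCH = 24                      # input patch side (pixels)
REGION = 8                      # central region modeled at sub-pixel resolution
SCALE = 8                       # super-resolution factor
GRID = REGION * SCALE           # 64 cells per axis, GRID * GRID = 4096 classes
OFFSET = (PATCH - REGION) / 2   # the central 8 x 8 region starts at patch pixel 8


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_to_patch_xy(idx):
    """Map a class index in [0, 4096) to the (x, y) centre of its cell, in patch pixels."""
    row, col = divmod(idx, GRID)
    return OFFSET + (col + 0.5) / SCALE, OFFSET + (row + 0.5) / SCALE


def refine_corner(logits, patch_origin):
    """Take the most probable cell and return (x, y, probability) in image coordinates."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    px, py = class_to_patch_xy(idx)
    return patch_origin[0] + px, patch_origin[1] + py, probs[idx]
```

With 8× super-resolution each class covers 1/8 pixel, so when the correct cell is selected the reported cell centre lies within 1/16 pixel of the true corner along each axis.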

[0064] 5. Output the results.

[0065] The pose estimation module receives the corner image coordinates output by the subpixel corner refinement module and pairs them with the predefined physical size parameters of the visual markers.

[0066] The pose estimation module calculates the rotation and displacement parameters of the visual markers relative to the camera coordinate system based on the perspective geometry model, and transmits the calculated pose results to the augmented reality rendering system in real time for accurate alignment of the virtual model with the real surgical scene, as shown in Figure 7.
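The perspective-geometry step in [0065]–[0066] can be illustrated with a planar homography decomposition, a standard technique used here as a stand-in for whichever solver the applicants employ. The sketch assumes an undistorted pinhole camera with known intrinsic matrix K, a square marker of side length `side` centred on its own origin in the Z = 0 plane, and the corner order top-left, top-right, bottom-right, bottom-left. That is also the order OpenCV's `cv2.solvePnP` expects with the `cv2.SOLVEPNP_IPPE_SQUARE` flag, which would be a practical drop-in alternative. The function names are hypothetical, and no DLT coordinate normalization or iterative refinement is performed, so noisy real data would benefit from both.

```python
import numpy as np


def marker_object_points(side):
    """3-D corners of a square marker on the Z = 0 plane (TL, TR, BR, BL order assumed)."""
    h = side / 2.0
    return np.array([[-h, h, 0.0], [h, h, 0.0], [h, -h, 0.0], [-h, -h, 0.0]])


def homography_dlt(src_xy, dst_xy):
    """Direct linear transform: H with dst ~ H @ [x, y, 1] from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src_xy, dst_xy):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value


def pose_from_corners(image_pts, side, K):
    """Rotation R and translation t such that X_cam = R @ X_marker + t."""
    obj = marker_object_points(side)
    H = homography_dlt(obj[:, :2], image_pts)
    M = np.linalg.solve(K, H)  # K^-1 H = s * [r1 r2 t] for an unknown scale s
    lam = 2.0 / (np.linalg.norm(M[:, 0]) + np.linalg.norm(M[:, 1]))
    if M[2, 2] * lam < 0:  # choose the sign that puts the marker in front of the camera
        lam = -lam
    r1, r2, t = lam * M[:, 0], lam * M[:, 1], lam * M[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R)  # project onto the nearest rotation matrix
    R = U @ Vt
    if np.linalg.det(R) < 0:
        R = U @ np.diag([1.0, 1.0, -1.0]) @ Vt
    return R, t
```

The returned `R` and `t` map marker coordinates into camera coordinates, which is the form a renderer needs to place the virtual model relative to the camera.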

[0067] This invention overcomes the problems of insufficient robustness, limited corner positioning accuracy, and insufficient generalization ability of existing visual marker recognition technology in the highly dynamic imaging environment of the operating room, and enables pose results that meet the accuracy requirements of AR surgical navigation to be output stably even in a highly dynamic surgical environment.

[0068] In one embodiment, a computer device is provided. As shown in Figure 8, it includes a memory 301, a processor 302, and a computer program stored in the memory 301 and executable on the processor 302. When the processor 302 executes the computer program, it implements the above-described highly robust visual marker recognition method for augmented reality surgical navigation.

[0069] Specifically, the computer device can be a computer terminal, a server, or a similar computing device.

[0070] In this embodiment, a computer-readable storage medium is provided, which stores a computer program that performs the above-described highly robust visual marker recognition method for augmented reality surgical navigation.

[0071] Specifically, computer-readable storage media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer-readable storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable storage media do not include transitory media, such as modulated data signals and carrier waves.

[0072] Obviously, those skilled in the art should understand that the modules or steps of the above-described embodiments of the present invention can be implemented using general-purpose computing devices. They can be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they can be implemented using computer-executable program code, thereby storing them in a storage device for execution by a computing device. In some cases, the steps shown or described can be performed in a different order than those presented here, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Thus, the embodiments of the present invention are not limited to any particular hardware and software combination.

[0073] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A highly robust visual marker recognition method for augmented reality surgical navigation, characterized in that it comprises: acquiring surgical scene images captured intraoperatively by a visible-light acquisition device; detecting candidate regions of visual markers in the surgical scene image using a deep learning-based object detection network to obtain bounding box information of the marker regions, wherein the object detection network is trained on a synthetic degradation dataset that explicitly models the physical mechanism of operating-room imaging; performing, based on a sub-pixel corner refinement network, sub-pixel-level precise localization of the marker corner points in the marker regions to obtain refined corner coordinates; and calculating, based on the refined corner coordinates and the predefined physical dimensions of the marker corner points, the six-degree-of-freedom pose of the marker corner points relative to the acquisition device.

2. The highly robust visual marker recognition method according to claim 1, characterized in that the process of constructing the synthetic degradation dataset comprises: superimposing artificial visual markers on natural-scene background images with complex texture distributions, and introducing at least one degradation factor into the superimposed images to obtain the synthetic degradation dataset.

3. The highly robust visual marker recognition method according to claim 2, characterized in that the degradation factors include non-uniform illumination variation, motion blur, noise, color shift, and local occlusion, which are used to simulate the complex degradation distribution caused by the coexistence of strong directional lighting and movement of the acquisition equipment or instruments in the operating room.

4. The highly robust visual marker recognition method according to claim 1, characterized in that performing, based on the sub-pixel corner refinement network, sub-pixel-level precise localization of the marker corner points in the marker region comprises: cropping, within the marker region, a local image patch of a set size around each marker corner point; and inputting the local image patch into the sub-pixel corner refinement network, which models the position of the marker corner point within the local image patch as a discrete probability distribution and determines the precise sub-pixel position of the marker corner point by multi-class probability prediction, thereby obtaining the refined corner coordinates.

5. The highly robust visual marker recognition method according to claim 1, characterized in that the sub-pixel corner refinement network is a deep neural network based on a VGG backbone network.

6. The highly robust visual marker recognition method according to claim 1, characterized in that calculating, based on the refined corner coordinates and the predefined physical dimensions of the marker corner points, the six-degree-of-freedom pose of the marker corner points relative to the acquisition device comprises: pairing the refined corner coordinates with the known three-dimensional coordinates of the corners in the marker coordinate system; calculating, by a perspective geometry solution method, the rotation and translation parameters of the marker corner points relative to the coordinate system of the acquisition device; and outputting the obtained parameters to the augmented reality rendering system in real time.

7. The highly robust visual marker recognition method according to claim 1, characterized in that the method further comprises: outputting the pose results to the augmented reality rendering system in real time; and, when no valid pose result meeting a set criterion is obtained for a set number of consecutive frames, or when the confidence of the marker corner points is lower than a preset threshold, determining that the current marker is lost and outputting a view-adjustment prompt.

8. A highly robust visual marker recognition system for augmented reality surgical navigation, characterized in that it comprises: an image acquisition module, configured to acquire surgical scene images captured by a visible-light acquisition device during the operation; a robust detection module, configured to detect candidate regions of visual markers in the surgical scene image based on a deep learning-based object detection network and to obtain bounding box information of the marker regions, wherein the object detection network is trained on a synthetic degradation dataset that explicitly models the physical mechanism of operating-room imaging; a sub-pixel corner refinement module, configured to perform sub-pixel-level precise localization of the marker corner points in the marker region based on a sub-pixel corner refinement network and to obtain refined corner coordinates; and a pose estimation module, configured to calculate the six-degree-of-freedom pose of the marker corner points relative to the acquisition device based on the refined corner coordinates and the predefined physical dimensions of the marker corner points.

9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the highly robust visual marker recognition method for augmented reality surgical navigation according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that performs the highly robust visual marker recognition method for augmented reality surgical navigation according to any one of claims 1 to 7.