Camera fusion positioning method, device, system and nonvolatile storage medium

By fusing multiple pose data and combining robotic arm motion trends and similarity calculations, the problem of inaccurate positioning of the endoscope's end-effector camera was solved, improving the accuracy and safety of endoscopic examinations.

CN119863518B · Active · Publication Date: 2026-06-19 · HANGZHOU BRONCUS MEDICAL CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: HANGZHOU BRONCUS MEDICAL CO LTD
Filing Date: 2024-12-26
Publication Date: 2026-06-19

IPC: G06T7/73; G06T7/246; G06V10/74

AI Tagging

Application Domain

Image analysis

AI Technical Summary

Technical Problem

Existing methods for endoscopic end-camera positioning are prone to failure, resulting in significant discrepancies between virtual and real physiological channel images, making it impossible to accurately determine the camera's position.

Method used

A multi-pose data fusion method is adopted. The camera pose is determined by combining the initial pose data and motion state data of the robotic arm. The target pose of the camera is estimated by combining the motion trend and similarity calculation of the robotic arm.

Benefits of technology

This improves the accuracy of camera positioning, reduces the probability of tracking failure, and enhances the accuracy and safety of endoscopic examinations.

✦ Generated by Eureka AI based on patent content.

Abstract

This application discloses a camera fusion localization method, apparatus, system, and non-volatile storage medium. The camera fusion localization method includes: acquiring first pose data and at least one second pose data of a camera in a physiological channel when capturing a target image, wherein the pose data corresponding to the first pose data and the second pose data are determined in different ways; determining a first motion trend corresponding to the first pose data and a second motion trend corresponding to the second pose data; determining a first virtual image and a second virtual image captured by a virtual camera corresponding to the camera in a physiological channel model of the physiological channel, and determining a first similarity between the first virtual image and the target image, and a second similarity between the second virtual image and the target image; and determining target pose data of the camera when capturing the target image based on the first motion trend, the second motion trend, the first similarity, and the second similarity.

Description

Technical Field

[0001] This application relates to the field of navigation and positioning, and in particular to a camera fusion positioning method, apparatus, system, and non-volatile storage medium.

Background Technology

[0002] In related technologies, a single positioning method is typically used when locating and tracking a camera at the end of an endoscope, which easily leads to tracking failures. In other words, during real-time tracking, there is a significant discrepancy between the virtual physiological channel image displayed in the window and the actual physiological channel image, making it impossible to accurately determine the camera's position within the physiological channel.

Summary of the Invention

[0003] This application provides a camera fusion positioning method, apparatus, system, and non-volatile storage medium to at least solve the technical problem that camera tracking and positioning is prone to failure due to the use of a single positioning method in related technologies.

[0004] This application provides a camera fusion positioning method, comprising: acquiring first pose data and at least one second pose data of a camera in a physiological channel when capturing a target image, wherein the pose data corresponding to the first pose data and the second pose data are determined in different ways, and the first pose data is camera pose data determined based on the initial pose data and motion state data of a robotic arm, the robotic arm being used to control the motion state of the camera; determining a first motion trend corresponding to the first pose data and a second motion trend corresponding to the second pose data, wherein the trend types of the first motion trend and the second motion trend include forward and backward; determining a first virtual image and a second virtual image acquired by a virtual camera corresponding to the camera in a physiological channel model of the physiological channel, and determining a first similarity between the first virtual image and the target image, and a second similarity between the second virtual image and the target image, wherein the first virtual image is an image acquired by the virtual camera under the pose indicated by the first pose data, and the second virtual image is an image acquired by the virtual camera under the pose indicated by the second pose data; and determining target pose data of the camera when capturing the target image based on the first motion trend, the second motion trend, the first similarity, and the second similarity.

[0005] Optionally, each type of second pose data corresponds to a second motion trend and a second similarity; determining the target pose data of the camera when capturing the target image based on the first motion trend, the second motion trend, the first similarity, and the second similarity includes: selecting candidate second pose data from at least one type of second pose data, wherein the candidate second pose data is the second pose data whose corresponding second motion trend is consistent with the first motion trend; and determining the target pose data based on the candidate second pose data, the first pose data, the second similarity corresponding to the candidate second pose data, and the first similarity.

[0006] Optionally, determining the target pose data based on the candidate second pose data, the first pose data, the second similarity corresponding to the candidate second pose data, and the first similarity includes: using the first pose data as the default pose data and the first similarity as the default similarity; determining the candidate second pose data with the largest corresponding second similarity as the target candidate second pose data; if the second similarity corresponding to the target candidate second pose data is greater than the default similarity, updating the default similarity to the second similarity corresponding to the target candidate second pose data and updating the default pose data to the target candidate second pose data; determining the default pose data as the target pose data and determining the default similarity as the target similarity when the camera captures the target image.
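
The selection logic of paragraphs [0005] and [0006] can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the `PoseEstimate` class, its field names, and the `fuse` function are hypothetical, and the similarity score is assumed to be expressed so that a larger value means a closer match (an RMSE-based score would need to be negated or inverted first).

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PoseEstimate:
    """One pose estimate with its motion trend and similarity (hypothetical fields)."""
    source: str            # illustrative label, e.g. "arm", "optical_flow", "shape"
    pose: Sequence[float]  # pose data in the preset coordinate system
    trend: str             # "forward" or "backward"
    similarity: float      # assumption: larger means more similar


def fuse(first: PoseEstimate, seconds: List[PoseEstimate]) -> PoseEstimate:
    """Pick the target pose from the first pose and trend-consistent second poses."""
    # Candidates are second poses whose trend agrees with the arm-derived first pose.
    candidates = [s for s in seconds if s.trend == first.trend]
    best = first  # default pose data and default similarity
    if candidates:
        top = max(candidates, key=lambda s: s.similarity)
        # Replace the default only if the best candidate is strictly more similar.
        if top.similarity > best.similarity:
            best = top
    return best
```

In this sketch a second pose with a very high similarity is still ignored when its motion trend contradicts the first pose, which mirrors the candidate filtering step described above.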

[0007] Optionally, determining the first similarity between the first virtual image and the target image, and the second similarity between the second virtual image and the target image, includes: dividing the first virtual image into multiple partially overlapping first virtual image blocks, dividing the second virtual image into multiple partially overlapping second virtual image blocks, and dividing the target image into multiple partially overlapping real image blocks; determining the first virtual feature value of each first virtual image block, the second virtual feature value of each second virtual image block, and the real feature value of each real image block, wherein the first virtual feature value, the second virtual feature value, and the real feature value all include the standard deviation of pixel intensity values within the image block, and a whiteness value used to represent the proportion of specified pixels in the image block; deleting the first virtual image blocks, the second virtual image blocks, and the real image blocks whose whiteness value is greater than a preset whiteness value; arranging the retained first virtual image blocks in descending order of the standard deviation of pixel intensity values to obtain a first virtual image block sequence, arranging the retained second virtual image blocks in descending order of the standard deviation of pixel intensity values to obtain a second virtual image block sequence, and arranging the retained real image blocks in descending order of the standard deviation of pixel intensity values to obtain a first real image block sequence; selecting a predetermined number of virtual image blocks from the first virtual image block sequence in front-to-back order to obtain a third virtual image block sequence, selecting a predetermined number of virtual image blocks from the second virtual image block sequence in front-to-back order to obtain a fourth virtual image block sequence, and selecting a predetermined number of real image blocks from the first real image block sequence to obtain a second real image block sequence; and determining a first root mean square error (RMSE) value between the third virtual image block sequence and the second real image block sequence, and a second RMSE value between the fourth virtual image block sequence and the second real image block sequence, wherein the first RMSE value is used to characterize the first similarity, and the second RMSE value is used to characterize the second similarity.
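
The block-based comparison in paragraph [0007] might look roughly like the NumPy sketch below. Several details are assumptions rather than statements of the patent: the block size, stride, whiteness limit, and number of selected blocks are placeholder values; the whiteness mask is taken as a precomputed boolean image; and the RMSE is computed over the ranked block standard deviations, since the text does not say which quantity the error is taken over.

```python
import numpy as np


def split_blocks(img, size=32, stride=16):
    """Split a 2-D image into partially overlapping square blocks (stride < size)."""
    h, w = img.shape
    return [img[r:r + size, c:c + size]
            for r in range(0, h - size + 1, stride)
            for c in range(0, w - size + 1, stride)]


def ranked_stds(img, white_mask, max_whiteness=0.5, top_n=8, size=32, stride=16):
    """Top-N block standard deviations, in descending order, after whiteness filtering."""
    blocks = split_blocks(np.asarray(img, dtype=float), size, stride)
    masks = split_blocks(np.asarray(white_mask, dtype=float), size, stride)
    # Drop blocks whose share of "specified" (whitish) pixels exceeds the limit.
    kept = [b.std() for b, m in zip(blocks, masks) if m.mean() <= max_whiteness]
    return np.array(sorted(kept, reverse=True)[:top_n])


def block_rmse(virtual_img, virtual_mask, real_img, real_mask, **kw):
    """RMSE between ranked block features of a virtual image and a real image."""
    v = ranked_stds(virtual_img, virtual_mask, **kw)
    r = ranked_stds(real_img, real_mask, **kw)
    n = min(len(v), len(r))
    if n == 0:
        return float("inf")  # nothing left to compare after filtering
    return float(np.sqrt(np.mean((v[:n] - r[:n]) ** 2)))
```

A lower RMSE indicates a closer match in this sketch, so any code that expects "larger is more similar" would have to convert the value first.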

[0008] Optionally, a specified pixel in an image block is determined by: determining the saturation and brightness values of the pixels in the image block; and determining the pixel as the specified pixel when the saturation value of the pixel is less than a preset saturation value and the brightness value of the pixel is greater than a preset brightness value.
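
As a rough illustration of paragraph [0008], the following sketch marks a pixel as "specified" when it is both weakly saturated and bright, then reports the share of such pixels in a block. It assumes 8-bit RGB input and reads "brightness" as the value channel of the HSV colour model; the function names and both threshold defaults are hypothetical.

```python
import colorsys


def is_specified_pixel(rgb, max_saturation=0.2, min_brightness=0.8):
    """True if an 8-bit RGB pixel has low saturation and high brightness."""
    r, g, b = (channel / 255.0 for channel in rgb)
    _, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    return saturation < max_saturation and value > min_brightness


def whiteness(block_pixels, **thresholds):
    """Proportion of specified pixels among the pixels of one image block."""
    pixels = list(block_pixels)
    if not pixels:
        return 0.0
    hits = sum(is_specified_pixel(p, **thresholds) for p in pixels)
    return hits / len(pixels)
```

For example, `whiteness([(255, 255, 255), (200, 30, 30)])` counts only the white pixel, because the red pixel is strongly saturated.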

[0009] Optionally, determining the first motion trend corresponding to the first pose data and the second motion trend corresponding to the second pose data includes: determining a reference image and reference pose data of the camera when capturing the reference image, wherein the reference image and the target image are two consecutive frames captured by the camera in the physiological channel, and the camera first acquires the reference image and then acquires the target image; determining a first translation increment of the camera in a preset direction relative to the reference image when capturing the target image based on the reference pose data and the first pose data, and determining a second translation increment of the camera in a preset direction relative to the reference image when capturing the target image based on the reference pose data and the second pose data; determining the trend type of the first motion trend based on the sign of the first translation increment, and determining the trend type of the second motion trend based on the sign of the second translation increment.

[0010] Optionally, the preset direction includes the extension direction of the camera lens when capturing the reference image.

[0011] Optionally, after determining the target pose data when the camera captures the target image, the camera fusion localization method further includes: acquiring several frames of real images captured by the camera in chronological order, and the third similarity corresponding to each frame of real images, wherein the target image is the last frame of the several frames of real images, and the third similarity of a real image is a similarity determined based on the first and second similarities corresponding to that real image; determining the tracking status index of the target image based on the third similarity corresponding to each frame of real images, wherein the tracking status index includes a poor state and a good state; and updating the target pose data when the target image is determined to be a keyframe image and the corresponding tracking status index is the poor state.

[0012] Optionally, determining the tracking status index of the target image based on the third similarity corresponding to each frame of real images includes: calculating the sum of the third similarities corresponding to each frame of real images; if the sum of the third similarities is not less than a preset threshold, determining the tracking status index as a good state; if the sum of the third similarities is less than the preset threshold, determining the tracking status index as a poor state.
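
The threshold test in paragraphs [0011] and [0012] can be sketched as a sliding-window monitor. The class name, window length, and threshold below are hypothetical, and how the third similarity is derived from the first and second similarities is left to the caller, since the text does not spell it out.

```python
from collections import deque


class TrackingMonitor:
    """Sums the third similarities of the most recent frames (illustrative only)."""

    def __init__(self, window=5, threshold=3.0):
        self.scores = deque(maxlen=window)  # oldest frame drops out automatically
        self.threshold = threshold

    def update(self, third_similarity):
        """Add the newest frame's similarity and return 'good' or 'poor'."""
        self.scores.append(third_similarity)
        return "good" if sum(self.scores) >= self.threshold else "poor"


def should_update_pose(status, is_keyframe):
    """The target pose is updated only for keyframes tracked in the poor state."""
    return is_keyframe and status == "poor"
```

One consequence of this design is that the first few frames are reported as poor until the window has accumulated enough similarity, so a real system would probably treat the warm-up period separately.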

[0013] Optionally, at least one second pose data includes second pose data determined based on a target image and a reference image, wherein the reference image and the target image are two frames of images continuously acquired by the camera in the physiological channel, and the camera acquires the reference image first and then the target image.

[0014] Optionally, at least one second pose data includes second pose data determined based on the shape of a flexible instrument, wherein the flexible instrument is connected to the camera.

[0015] An embodiment of this application also provides a camera fusion positioning device, including: a pose calculation module, used to determine first pose data and at least one second pose data of a camera in a physiological channel when capturing a target image, wherein the first pose data and the second pose data are determined in different ways, and the first pose data is camera pose data determined based on the initial pose data and motion state data of a robotic arm, the robotic arm being used to control the motion state of the camera; a motion trend calculation module, used to determine a first motion trend corresponding to the first pose data and a second motion trend corresponding to the second pose data, wherein the trend types of the first motion trend and the second motion trend include forward and backward; a similarity calculation module, used to determine the first and second virtual images acquired by the virtual camera corresponding to the camera in the physiological channel model of the physiological channel, and to determine the first similarity between the first virtual image and the target image, and the second similarity between the second virtual image and the target image, wherein the first virtual image is an image acquired by the virtual camera in the pose indicated by the first pose data, and the second virtual image is an image acquired by the virtual camera in the pose indicated by the second pose data; and a processing module, used to determine the target pose data of the camera when capturing the target image based on the first motion trend, the second motion trend, the first similarity, and the second similarity.

[0016] This application embodiment also provides a camera fusion positioning system, including a camera, an electronic device, and a robotic arm. The electronic device is used to determine first pose data and at least one second pose data of the camera in the physiological channel when capturing a target image. The first pose data and the second pose data are determined in different ways, and the first pose data is the camera pose data determined based on the initial pose data and motion state data of the robotic arm, which controls the camera's motion state. The system also determines a first motion trend corresponding to the first pose data and a second motion trend corresponding to the second pose data, wherein the trend types of the first and second motion trends include forward and backward movements. Furthermore, the system determines a first virtual image and a second virtual image acquired by a virtual camera corresponding to the camera in the physiological channel model of the physiological channel, and determines a first similarity between the first virtual image and the target image, and a second similarity between the second virtual image and the target image. The first virtual image is an image acquired by the virtual camera under the pose indicated by the first pose data, and the second virtual image is an image acquired by the virtual camera under the pose indicated by the second pose data. Finally, the system determines the target pose data of the camera when capturing the target image based on the first motion trend, the second motion trend, the first similarity, and the second similarity.

[0017] This application also provides a non-volatile storage medium storing a computer program, which, when executed by a processor, implements a camera fusion positioning method.

[0018] This application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement a camera fusion positioning method.

[0019] This application also provides a computer program product, including a computer program that implements a camera fusion positioning method when executed by a processor.

[0020] Based on the above scheme, it can be seen that this application estimates the target pose data of the camera when capturing the target image by fusing the camera pose data determined by multiple positioning methods. This improves the positioning accuracy of the camera and solves the technical problem that the tracking and positioning of the camera is prone to failure when using a single positioning method in related technologies.

Description of the Drawings

[0021] To more clearly illustrate the technical solutions in the embodiments or related technologies of this application, the accompanying drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0022] Figure 1 is a schematic diagram of the structure of a camera fusion positioning system provided in an exemplary embodiment of this application;

[0023] Figure 2 is a schematic flowchart of a camera fusion positioning method provided in an exemplary embodiment of this application;

[0024] Figure 3 is a schematic diagram of the physiological channel model provided in an exemplary embodiment of this application;

[0025] Figure 4 is a flowchart illustrating the similarity calculation process provided in an exemplary embodiment of this application;

[0026] Figure 5 is a schematic diagram of an image sub-block provided in an exemplary embodiment of this application;

[0027] Figure 6 is a schematic flowchart of the camera fusion positioning process provided in an exemplary embodiment of this application;

[0028] Figure 7 is a schematic flowchart of a real-time camera tracking process provided according to an embodiment of this application;

[0029] Figure 8 is a flowchart illustrating a camera fusion positioning procedure according to an embodiment of this application;

[0030] Figure 9 is a schematic diagram of the camera fusion positioning device provided in an exemplary embodiment of this application;

[0031] Figure 10 is a schematic diagram of the structure of an electronic device provided in an exemplary embodiment of this application.

Detailed Description

[0032] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0033] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0034] The technical solutions of this application will be described in detail below with specific embodiments. The following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments.

[0035] Endoscopic navigation is a commonly used auxiliary technique during endoscopic examinations of physiological passages such as the bronchi, designed to improve the accuracy and safety of the procedure. Common endoscopes include bronchoscopes, anoscopes, sigmoidoscopes, nasal endoscopes, and laryngoscopes. Taking bronchoscopy as an example, it is a medical procedure used to examine the trachea and bronchi, typically for the diagnosis and treatment of respiratory diseases such as lung cancer, bronchitis, and emphysema.

[0036] The background behind endoscopic navigation mainly involves the following aspects:

[0037] The need for precise localization: During endoscopic examinations, it is necessary to accurately locate and access specific bronchial branches in order to observe and collect tissue samples. Some bronchial branches may be small or located deep within the bronchial cavity, making accurate localization difficult using conventional endoscopy.

[0038] Improving examination accuracy: By introducing navigation technology, the anatomical structure of the bronchi can be displayed in real-time images, guiding the endoscope more accurately and thus improving the accuracy of the examination. This is crucial for the early detection and diagnosis of lesions.

[0039] Reducing patient discomfort: Traditional endoscopy may require repeated insertion of the endoscope, which can lead to patient discomfort and the risk of complications. Endoscopic navigation can locate the target position more quickly, reducing the number of insertions and thus alleviating patient discomfort.

[0040] Technological advancements: With the continuous development of medical imaging and navigation technologies, endoscopic navigation has become possible. Computer-aided navigation systems can combine real-time images, computer-reconstructed three-dimensional anatomical structures, and electromagnetic or optical positioning technologies to provide more comprehensive information, helping doctors better navigate the bronchi.

[0041] Overall, endoscopic navigation technology can improve the effectiveness of endoscopic examinations and enhance the patient experience. By leveraging modern technology, it provides doctors with more information and tools to perform endoscopic examinations more accurately and safely.

[0042] However, existing endoscopic navigation technologies typically employ a single positioning method when locating the camera at the end of the endoscope. This can easily lead to tracking failures and result in significant discrepancies between the virtual bronchial image rendered from the pose data obtained by the endoscopic navigation technology and the real bronchial image. This situation generally occurs when the real endoscope moves too fast, hits a wall, or the endoscope captures too few image features (such as airway openings, bifurcations, and folds).

[0043] To address the aforementioned problems, the technical solutions provided in the various embodiments of this specification are described in detail below with reference to the accompanying drawings.

[0044] Figure 1 is a schematic diagram of a camera fusion positioning system provided in an embodiment of this application. As can be seen from Figure 1, the system includes a camera 10, an electronic device 12, and a robotic arm 14. The electronic device 12 is used to: acquire first pose data and at least one second pose data of the camera 10 in the physiological channel when capturing a target image, wherein the first pose data and the second pose data are determined in different ways, the first pose data is the pose data of the camera 10 determined based on the initial pose data and motion state data of the robotic arm 14, and the robotic arm 14 is used to control the motion state of the camera 10; determine a first motion trend corresponding to the first pose data and a second motion trend corresponding to the second pose data, wherein the trend types of the first motion trend and the second motion trend include forward and backward; determine the first virtual image and the second virtual image acquired by the virtual camera corresponding to the camera 10 in the physiological channel model of the physiological channel, and determine the first similarity between the first virtual image and the target image, and the second similarity between the second virtual image and the target image, wherein the first virtual image is the image acquired by the virtual camera in the pose indicated by the first pose data, and the second virtual image is the image acquired by the virtual camera in the pose indicated by the second pose data; and determine the target pose data of the camera 10 when capturing the target image based on the first motion trend, the second motion trend, the first similarity, and the second similarity.

[0045] Figure 2 is a flowchart of a camera fusion positioning method provided in an embodiment of this application, which specifically includes the following steps:

[0046] Step S202: Obtain the first pose data and at least one second pose data of the camera in the physiological channel when capturing the target image. The first pose data and the second pose data are determined in different ways. The first pose data is the camera pose data determined based on the initial pose data and motion state data of the robotic arm. The robotic arm is used to control the motion state of the camera.

[0047] In the technical solution provided in step S202, at least one second pose data includes second pose data determined based on the target image and the reference image, wherein the reference image and the target image are two frames of images continuously acquired by the camera in the physiological channel, and the camera first acquires the reference image and then acquires the target image.

[0048] As an optional implementation, the determination of second pose data based on the target image and the reference image can be achieved as follows: A sparse pyramid optical flow algorithm is used to determine matching point pairs between the reference image and the target image. The displacement information of the matching point pairs in the reference and target images is calculated, and the matching point pairs are filtered using this displacement information to determine the 3D points corresponding to the filtered matching point pairs. Then, multiple iterative calculations are performed using these 3D points to determine the transformation relationship between the pose data of the reference image and the pose data corresponding to the target image. Finally, using the known pose data corresponding to the reference image and the transformation relationship, the pose data of the camera in a preset 3D spatial coordinate system (such as the CT coordinate system) when the target image is acquired is calculated; this is the second pose data.
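
The final step of paragraph [0048], combining the known reference pose with the estimated frame-to-frame transformation, can be illustrated with homogeneous 4x4 matrices. This sketch covers only that composition, not the optical flow matching or the iterative estimation. The function names are hypothetical, and the convention is an assumption: each pose maps camera coordinates into the preset (for example CT) coordinate system, and the relative transform maps target-camera coordinates into reference-camera coordinates.

```python
import numpy as np


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def propagate_pose(T_ct_ref, T_ref_target):
    """Pose of the camera at the target frame, expressed in the preset frame.

    T_ct_ref:     reference-camera -> preset coordinate system (known)
    T_ref_target: target-camera -> reference-camera (estimated from the images)
    """
    return T_ct_ref @ T_ref_target
```

Under a different convention (for example world-to-camera matrices) the multiplication order would be reversed, so the convention should be fixed before reusing this pattern.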

[0049] In some embodiments of this application, the target image and a set of virtual images can be matched to determine the virtual image with the highest similarity to the target image. Then, the second pose data is determined based on the pose of the virtual camera corresponding to the most similar virtual image. The set of virtual image data includes different virtual images rendered by the virtual camera within the physiological channel under various pose data.

[0050] As an optional implementation, at least one second pose data includes second pose data determined based on the shape of a flexible instrument, wherein the flexible instrument is connected to a camera. That is, the endoscope includes a flexible instrument and a camera, and the camera is positioned at the end of the flexible instrument.

[0051] Optionally, when the flexible instrument is made of optical fiber, the following process can be used to determine the second pose data based on the shape of the flexible instrument.

[0052] Initially, a first coordinate system and a second coordinate system are established. The first coordinate system is a spatial rectangular coordinate system with its origin located within the endoscope, and the second coordinate system is a spatial rectangular coordinate system with its origin located within the physiological channel. The endoscope includes a camera and a flexible instrument, with the camera positioned at the end of the flexible instrument. Based on the first and second coordinate systems, a rotation matrix and a translation matrix are determined between them. The first target part model of the flexible instrument is then stitched onto the first physiological channel model of the physiological channel using the rotation and translation matrices, resulting in a stitched second physiological channel model. The first target part model is based on the first target part of the flexible instrument, and its two endpoints correspond to the first physiological feature point (epiglottis) and the second physiological feature point (carina) of the physiological channel, respectively. Based on the stitched second physiological channel model and the fiber optic signal, ICP or SVD algorithms are used for registration to determine the second pose data of the camera at the end of the endoscope when capturing the target image. The fiber optic signal includes data such as the wavelength and curvature of each grating point.
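
Paragraph [0052] names ICP or SVD for the registration step without giving details. As a generic illustration of the SVD route, the sketch below implements the standard Kabsch solution for a rigid transform between two point sets with known one-to-one correspondences. It is not the patent's registration procedure: the function name is hypothetical, and ICP would additionally have to re-estimate the correspondences on each iteration.

```python
import numpy as np


def rigid_register(source, target):
    """Least-squares R, t with R @ source[i] + t ~= target[i] (Kabsch, via SVD)."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    src_c, tgt_c = source.mean(axis=0), target.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets.
    H = (source - src_c).T @ (target - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    # Force det(R) = +1 so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t
```

The returned rotation and translation play the role of the rotation matrix and translation matrix between the two coordinate systems, provided the two point sets really describe the same physical points.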

[0053] In some embodiments of this application, when determining the first pose data, the initial pose data of the robotic arm connected to the camera can be obtained in advance, wherein the camera is positioned at the end of the endoscope. Subsequently, the initial pose data and the motion state data of the robotic arm (for example, forward distance, rotation angle, and turning angle) can be used to obtain the first pose data of the camera in a preset coordinate system (such as the CT coordinate system) under the current motion state of the robotic arm.

[0054] Step 204: Determine the first motion trend corresponding to the first pose data and the second motion trend corresponding to the second pose data, wherein the trend types of the first motion trend and the second motion trend include forward and backward.

[0055] In the technical solution provided in step S204, the steps of determining the first motion trend corresponding to the first pose data and the second motion trend corresponding to the second pose data include: determining a reference image and the reference pose data of the camera when capturing the reference image, wherein the reference image and the target image are two consecutive frames captured by the camera in the physiological channel, and the camera first acquires the reference image and then acquires the target image; determining the first translation increment of the camera in a preset direction relative to the reference image when capturing the target image based on the reference pose data and the first pose data, and determining the second translation increment of the camera in a preset direction relative to the reference image when capturing the target image based on the reference pose data and the second pose data; determining the trend type of the first motion trend based on the positive or negative value of the first translation increment, and determining the trend type of the second motion trend based on the positive or negative value of the second translation increment.

[0056] In some embodiments of this application, the formula for calculating the translation increment is as follows:

[0057] $$\Delta t = R_{ref}^{-1}\,(t_{tar} - t_{ref})$$

[0058] In the above formula, $\Delta t$ indicates the translation increment. $t_{tar}$ and $t_{ref}$ represent the translation matrix when the camera captures the target image in a preset coordinate system (such as the CT coordinate system) and the translation matrix when capturing the reference image, respectively. $R_{ref}^{-1}$ represents the inverse of the rotation matrix in the preset coordinate system when the camera captures the reference image. The translation matrix of the camera when capturing the target image can be determined based on the pose data of the camera when capturing the target image. When the translation matrix is determined based on the first pose data, the above translation increment is the first translation increment. When the translation matrix is determined based on the second pose data, the above translation increment is the second translation increment.

[0059] After determining the translation increment, the trend type of the specific movement trend can be further determined based on the sign of the translation increment value. When the first translation increment value is positive, it is assigned the value true, and the trend type of the first movement trend is determined to be forward. When the first translation increment value is negative or zero, it is assigned the value false, and the trend type of the first movement trend is determined to be backward.

[0060] Similarly, when the value of the second translation increment is positive, it is assigned the value true, and the trend type of the second movement trend is determined to be forward. When the value of the second translation increment is negative or zero, it is assigned the value false, and the trend type of the second movement trend is determined to be backward.

[0061] As an optional implementation, the preset direction includes the extension direction of the camera lens when capturing a reference image.

[0062] In some embodiments of this application, the aforementioned preset direction may also be the positive Z-axis direction of the camera coordinate system, where the Z-axis of the camera coordinate system is the lens axis of the camera. The aforementioned camera coordinate system refers to a three-dimensional Cartesian coordinate system established with the camera as a reference. For example, a three-dimensional Cartesian coordinate system can be established with the lens axis of the camera as the Z-axis and the optical center of the camera (i.e., the lens center) as the origin.
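The trend determination described above can be sketched as follows. This is only an illustration under stated assumptions: the rotation matrix of the reference pose is assumed to map camera-frame vectors into the preset (for example, CT) coordinate system, so that its inverse expresses the displacement in the reference camera frame, and the preset direction is taken to be the positive Z-axis of the camera coordinate system. The function names are hypothetical.

```python
import numpy as np

def translation_increment(R_ref, t_ref, t_target):
    """Displacement from the reference pose to the target pose,
    expressed in the reference camera frame (assumed convention)."""
    R_ref = np.asarray(R_ref, dtype=float)
    delta = np.asarray(t_target, dtype=float) - np.asarray(t_ref, dtype=float)
    return np.linalg.inv(R_ref) @ delta

def is_forward(R_ref, t_ref, t_target):
    """True (forward) if the increment along the camera Z-axis is positive;
    False (backward) if it is negative or zero."""
    return bool(translation_increment(R_ref, t_ref, t_target)[2] > 0.0)
```

The same function would be called once with the translation from the first pose data and once with the translation from each second pose data, which gives the first and second motion trends.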

[0063] Step 206: Determine the first virtual image and the second virtual image acquired by the virtual camera corresponding to the camera in the physiological channel model of the physiological channel, and determine the first similarity between the first virtual image and the target image, and the second similarity between the second virtual image and the target image, wherein the first virtual image is the image acquired by the virtual camera in the pose indicated by the first pose data, and the second virtual image is the image acquired by the virtual camera in the pose indicated by the second pose data.

[0064] In some embodiments of this application, when the physiological channel is the bronchus, the physiological channel model is as shown in Figure 3.

[0065] In the technical solution provided in step S206, the steps of determining the first similarity between the first virtual image and the target image, and the second similarity between the second virtual image and the target image, include: dividing the first virtual image into multiple partially overlapping first virtual image blocks, dividing the second virtual image into multiple partially overlapping second virtual image blocks, and dividing the target image into multiple partially overlapping real image blocks; determining the first virtual feature value of the first virtual image block, the second virtual feature value of the second virtual image block, and the real feature value of the real image block, wherein the first virtual feature value, the second virtual feature value, and the real feature value all include the standard deviation of pixel intensity values within the image block, and a whiteness value used to reflect the proportion of specified pixels in the image block; deleting the first virtual image blocks, the second virtual image blocks, and the real image blocks whose whiteness values are greater than a preset whiteness value; and arranging the retained first virtual image blocks in descending order of the standard deviation of pixel intensity values to obtain a sequence of first virtual image blocks. 
The steps further include: arranging the retained second virtual image blocks in descending order of the standard deviation of pixel intensity values to obtain a second virtual image block sequence; arranging the retained real image blocks in descending order of the standard deviation of pixel intensity values to obtain a first real image block sequence; selecting a predetermined number of virtual image blocks from the first virtual image block sequence in front-to-back order to obtain a third virtual image block sequence; selecting a predetermined number of virtual image blocks from the second virtual image block sequence in front-to-back order to obtain a fourth virtual image block sequence; selecting a predetermined number of real image blocks from the first real image block sequence to obtain a second real image block sequence; determining a first root mean square error (RMSE) between the third virtual image block sequence and the second real image block sequence; and determining a second RMSE between the fourth virtual image block sequence and the second real image block sequence, wherein the first RMSE is used to characterize the first similarity, and the second RMSE is used to characterize the second similarity.

[0066] In some embodiments of this application, a specified pixel in an image block is determined in the following way:

[0067] Determine the saturation and brightness values of pixels in an image block; when the saturation value of a pixel is less than a preset saturation value and the brightness value of a pixel is greater than a preset brightness value, determine the pixel as a specified pixel.

[0068] Optionally, the process for determining similarity is as shown in Figure 4. The process can be divided into stages such as image segmentation, feature value calculation, sub-block selection, and similarity calculation.

[0069] Taking the target image as an example, in the image segmentation stage, the target image can be divided into overlapping image blocks, for example as shown in Figure 5. Overlapping image blocks are used to avoid small structures in the physiological channel being divided into different image blocks. The image blocks are defined as:

[0070]

[0071]

[0072] where the quantities concerned must meet the following constraints:

[0073] and

[0074] In the above formulas, the symbols denote, respectively: the width and height of the target image; the number of divisions along the width and the height of the image; the ranges of values of an image block along the horizontal axis (width) and the vertical axis (height); and the index of the block along the horizontal axis and along the vertical axis.
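To make the idea of overlapping image blocks concrete, here is a minimal sketch. The exact block ranges and constraints used in this application are not reproduced here; the layout below (each block overlapping its neighbour by half a block) is purely an assumption for illustration, and the function name `overlapping_blocks` is hypothetical.

```python
def overlapping_blocks(width, height, n_x, n_y):
    """Return (x0, x1, y0, y1) pixel ranges for n_x * n_y blocks.

    Assumed layout: the stride is half the block size, so adjacent
    blocks share half of their area along each axis.
    """
    step_x = width / (n_x + 1)   # stride along the horizontal axis
    step_y = height / (n_y + 1)  # stride along the vertical axis
    blocks = []
    for j in range(n_y):         # block index on the vertical axis
        for i in range(n_x):     # block index on the horizontal axis
            x0, x1 = round(i * step_x), round((i + 2) * step_x)
            y0, y1 = round(j * step_y), round((j + 2) * step_y)
            blocks.append((x0, x1, y0, y1))
    return blocks
```

With such a layout, a small structure lying on the boundary of one block falls in the interior of a neighbouring block, which is the purpose of the overlap described above.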

[0075] In the feature value calculation stage, the feature values of each image block $D$ include the standard deviation of its intensity values and its whiteness value. The standard deviation $\sigma_D$ of the intensity values of image block $D$ is calculated as follows:

[0076] $$\sigma_D = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(I_i - \bar{I}\right)^2}$$

[0077] where $N$ is the number of pixels inside sub-block $D$, $I_i$ is the intensity value of the $i$-th pixel, and $\bar{I}$ is the average intensity of all pixels within that sub-block.

[0078] The whiteness value $w_D$ of image block $D$ is calculated as follows:

[0079] $$w_D = \frac{1}{N}\sum_{i=1}^{N} f(i)$$

[0080] In the above formula, $N$ is the number of pixels in the image block and $f$ is a function used to determine whether a pixel is a specified pixel. The function is described as follows:

[0081] $$f(i) = \begin{cases} 1, & S_i \le T_S \ \text{and} \ V_i > T_V \\ 0, & \text{otherwise} \end{cases}$$

[0082] In the formula, $S_i$ and $V_i$ represent the saturation and brightness values of pixel $i$ in the HSV color space. When the saturation value $S_i$ is less than or equal to a certain threshold $T_S$ and the brightness value $V_i$ is greater than a certain threshold $T_V$, the pixel is a specified pixel and the function $f$ takes the value 1.

[0083] After dividing the image into blocks, image blocks need to be selected based on their feature values. Optionally, all image blocks are first added to a candidate list. Then, the sub-blocks whose whiteness values are greater than a certain threshold are removed from the candidate list. Next, the remaining image blocks are sorted in descending order of the standard deviation of their intensity values. Finally, only the leading image blocks of the sorted list are kept and the other image blocks are deleted, where the number of image blocks kept is the product of a proportionality coefficient and the total number of sub-blocks into which the entire image is divided.

[0084] The above method allows similarity to be calculated only on image blocks with characteristic structures, such as airway bifurcations or folds. These characteristic structures exhibit a high standard deviation, but sub-blocks containing alveoli also have a high standard deviation. Therefore, the saturation and brightness values of the sub-blocks are also considered in order to eliminate the influence of alveoli; that is, sub-blocks whose whiteness value exceeds a certain threshold are removed to exclude this special case.
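The feature value calculation and sub-block selection described above can be sketched roughly as follows. This is an illustration only: the threshold values, the use of the mean of the RGB channels as pixel intensity, and the function names are assumptions not taken from this application. Blocks are assumed to be RGB arrays with values in [0, 1].

```python
import numpy as np

def block_features(rgb_block, sat_max=0.2, val_min=0.8):
    """Return (intensity standard deviation, whiteness) of one RGB block."""
    rgb = np.asarray(rgb_block, dtype=float)
    v = rgb.max(axis=-1)                          # HSV brightness (value)
    c_min = rgb.min(axis=-1)
    s = (v - c_min) / np.where(v > 0, v, 1.0)     # HSV saturation, 0 for black
    intensity = rgb.mean(axis=-1)                 # assumed grey-level intensity
    specified = (s <= sat_max) & (v > val_min)    # low saturation, high brightness
    return float(intensity.std()), float(specified.mean())

def select_blocks(blocks, whiteness_max=0.5, keep_ratio=0.5):
    """Indices of the blocks kept for similarity calculation."""
    feats = [block_features(b) for b in blocks]
    # Remove blocks whose whiteness exceeds the threshold.
    candidates = [i for i, (_, w) in enumerate(feats) if w <= whiteness_max]
    # Sort the remaining blocks by standard deviation, largest first.
    candidates.sort(key=lambda i: feats[i][0], reverse=True)
    # Keep a share of the total number of blocks in the image.
    n_keep = int(keep_ratio * len(blocks))
    return candidates[:n_keep]
```

The same selection would be applied to the target image and to each virtual image, so that the retained sequences can then be compared.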

[0085] It should be noted that the same processing procedure as the target image will be used for the first and second virtual images, which will not be elaborated here.

[0086] When calculating similarity, taking the first virtual image and the target image as an example, the formula for calculating the root mean square error between the first virtual image and the target image is as follows:

[0087]

[0088] In the above formula, the symbols denote, respectively: the number of image blocks in the second real image block sequence and in the third virtual image block sequence; the number of pixels in any image block; the average intensities of the corresponding image blocks D in the second real image block sequence and in the third virtual image block sequence; and the intensity values of each pixel in image block D of the second real image block sequence and of the third virtual image block sequence.

[0089] It should be noted that the root mean square error between the second virtual image and the target image is calculated in the same way as that between the first virtual image and the target image, and will not be repeated here.
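A possible form of the root mean square error between two block sequences is sketched below. Since the description refers to both per-pixel intensities and per-block average intensities, this sketch assumes that each block's mean is subtracted before the pixel-wise difference is taken; that choice, the pairing of blocks by position in the sequence, and the function name are assumptions for illustration.

```python
import numpy as np

def block_sequence_rmse(real_blocks, virtual_blocks):
    """RMSE over paired blocks after removing each block's mean intensity.

    Both inputs are arrays of shape (M, h, w): M blocks of h*w pixels.
    """
    real = np.asarray(real_blocks, dtype=float)
    virt = np.asarray(virtual_blocks, dtype=float)
    if real.shape != virt.shape:
        raise ValueError("block sequences must have the same shape")
    # Subtract the per-block average intensity (assumed normalisation).
    real_c = real - real.mean(axis=(1, 2), keepdims=True)
    virt_c = virt - virt.mean(axis=(1, 2), keepdims=True)
    # Average the squared differences over all blocks and pixels.
    return float(np.sqrt(np.mean((real_c - virt_c) ** 2)))
```

Under this assumption, a uniform brightness difference between the rendered and the real image does not affect the result, and a smaller value indicates a closer match.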

[0090] Step 208: Determine the target pose data of the camera when capturing the target image based on the first motion trend, the second motion trend, the first similarity and the second similarity.

[0091] In the technical solution corresponding to step 208, each type of second pose data corresponds to a second motion trend and a second similarity; the step of determining the target pose data of the camera when capturing the target image based on the first motion trend, the second motion trend, the first similarity, and the second similarity includes: selecting candidate second pose data from at least one type of second pose data, wherein the candidate second pose data is the second pose data whose corresponding second motion trend is consistent with the first motion trend; and determining the target pose data based on the candidate second pose data, the first pose data, the second similarity corresponding to the candidate second pose data, and the first similarity.

[0092] As an optional implementation, the step of determining the target pose data based on the candidate second pose data, the first pose data, the second similarity corresponding to the candidate second pose data, and the first similarity includes: using the first pose data as the pose data default value and using the first similarity as the similarity default value; determining the candidate second pose data with the largest corresponding second similarity as the target candidate second pose data; if the second similarity corresponding to the target candidate second pose data is greater than the similarity default value, updating the similarity default value to the second similarity corresponding to the target candidate second pose data and updating the pose data default value to the target candidate second pose data; determining the pose data default value as the target pose data and determining the similarity default value as the target similarity when the camera captures the target image.
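The selection logic described above can be sketched as follows. Each pose is represented here as a `(pose, forward, similarity)` tuple, which is an assumed data layout for illustration, and the function name `fuse_poses` is hypothetical. A larger similarity value is treated as a better match, as in the description.

```python
def fuse_poses(first, seconds):
    """Pick the target pose and target similarity.

    first:   (pose, forward, similarity) for the first pose data
    seconds: list of (pose, forward, similarity) for the second pose data
    """
    # The first pose data and first similarity serve as default values.
    best_pose, first_forward, best_sim = first
    # Candidates are second poses whose trend matches the first trend.
    candidates = [c for c in seconds if c[1] == first_forward]
    if candidates:
        pose, _, sim = max(candidates, key=lambda c: c[2])
        # Replace the defaults only if the best candidate is more similar.
        if sim > best_sim:
            best_pose, best_sim = pose, sim
    return best_pose, best_sim
```

For example, `fuse_poses(("arm", True, 0.6), [("flow", True, 0.7), ("shape", False, 0.9)])` keeps the optical-flow pose: the shape-based pose has the higher similarity but is discarded because its motion trend disagrees with that of the robotic arm.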

[0093] In some embodiments of this application, the process for camera fusion positioning, as summarized above, is as shown in Figure 6 and includes the following steps:

[0094] Step 602: Simultaneously determine the first pose data and multiple second pose data of the camera when capturing the target image;

[0095] In the scheme provided in step 602, the various second pose data include second pose data determined according to the shape of the flexible device and the physiological channel, and second pose data determined according to a reference image and a target image. The reference image is an image captured by the camera before capturing the target image, and the reference image and the target image are two consecutively captured images.

[0096] Step 604: Determine the first similarity between the target image and the virtual image based on the first pose data, and determine the second similarity between the target image and the virtual image based on various second pose data respectively, and determine the first motion trend index corresponding to the first pose data and the second motion trend index corresponding to various second pose data.

[0097] It should be noted that the virtual image is an image rendered based on the first or second pose data and combined with the physiological channel model.

[0098] Step 606: Based on the motion trend indicators and similarity corresponding to each pose data, fuse various pose data to obtain target pose data and target similarity of the target image.

[0099] Optionally, the first motion trend and first similarity determined by the robotic arm control signal are used as benchmark values, and the second motion trend and second similarity corresponding to each second pose data are compared to determine the final target pose data and target similarity. Since the robotic arm moves under the instruction of the robotic arm control signal, the initial pose data and motion state data of the robotic arm can be quickly and accurately determined through the robotic arm control signal, thus allowing the determination of the first motion trend based on the robotic arm control signal.

[0100] As an alternative implementation, target similarity can be used to determine the tracking state corresponding to the target image, thereby determining whether further optimization of the target pose data is needed.

[0101] Optionally, in some embodiments of this application, after determining the target pose data when the camera captures the target image, the camera fusion positioning method further includes: acquiring a number of real images captured by the camera in chronological order, and a third similarity corresponding to each real image, wherein the target image is the last image among the number of real images, and the third similarity of a real image is a similarity determined based on the first and second similarities corresponding to that real image; determining the tracking state index of the target image based on the third similarity corresponding to each real image, wherein the tracking state index is either a poor state or a good state; and updating the target pose data when the target image is determined to be a keyframe image and the corresponding tracking state index is the poor state.

[0102] As an optional implementation, determining the tracking status index of the target image based on the third similarity corresponding to each frame of real images includes: calculating the sum of the third similarities corresponding to each frame of real images; if the sum of the third similarities is not less than a preset threshold, determining the tracking status index as a good state; if the sum of the third similarities is less than the preset threshold, determining the tracking status index as a poor state.

[0103] In some embodiments of this application, it can be determined whether the target image is a keyframe image by inputting the target image into a pre-trained image classification neural network and obtaining the judgment result output by the image classification neural network.

[0104] As an optional implementation, when the tracking status index corresponding to the target image is poor and the target image is a keyframe image, the target pose data can be optimized and the optimized data can be used as the new target pose data. However, in cases where the tracking status is good and it is a keyframe, the tracking status is good and it is a non-keyframe, or the tracking status is poor and it is a non-keyframe, the target pose data is not updated.
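The tracking state judgment and update condition can be summarized in a short sketch. The keyframe decision is taken here as a boolean input, since it comes from the image classification neural network; the function names and the threshold value used in the example are assumptions.

```python
def tracking_is_good(third_similarities, threshold):
    """Good state if the sum of the third similarities is not less than the threshold."""
    return sum(third_similarities) >= threshold

def should_update_pose(third_similarities, threshold, is_keyframe):
    """Update the target pose only for a keyframe image in a poor tracking state."""
    return is_keyframe and not tracking_is_good(third_similarities, threshold)
```

With a threshold of 2.0, for instance, the similarities `[0.5, 0.6, 0.7]` sum to 1.8, so the state is poor and the pose is updated only if the target image is also a keyframe.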

[0105] In some embodiments of this application, when optimizing the target pose data, the target pose data can be used as the initial pose data, and different pose offsets can be added to the initial pose data to obtain several different candidate pose data. Then, based on each candidate pose data, a virtual image acquired by a virtual camera in the pose indicated by the candidate pose data is rendered, thereby determining the similarity between each virtual image and the target image. Finally, the candidate pose data corresponding to the highest similarity is selected as the optimized target pose data.
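The offset-based optimization can be sketched as a simple search over candidate poses. In this sketch the pose is assumed to be a numeric vector to which offsets can be added element-wise, and `render` and `similarity` are hypothetical callables standing in for the virtual camera renderer and the similarity measure; none of these names come from this application.

```python
import numpy as np

def optimize_pose(pose, offsets, render, similarity, target_image):
    """Return the candidate pose with the highest similarity and that similarity.

    render(pose) -> virtual image; similarity(virtual, target) -> float,
    where a larger value means a closer match (assumed interfaces).
    """
    best_pose, best_sim = None, float("-inf")
    for offset in offsets:
        # Candidate pose = initial pose plus one pose offset.
        candidate = np.asarray(pose, dtype=float) + np.asarray(offset, dtype=float)
        sim = similarity(render(candidate), target_image)
        if sim > best_sim:
            best_pose, best_sim = candidate, sim
    return best_pose, best_sim
```

Whether the unmodified initial pose is itself among the candidates depends on whether a zero offset is included in `offsets`.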

[0106] As can be seen from the above, the embodiments of this application adopt the camera fusion localization and real-time tracking process illustrated in Figure 7. A physiological channel model is generated offline, and a virtual image corresponding to the target image captured by the camera is acquired based on the physiological channel model. Then, in the fusion localization stage, the camera pose data determined by various localization methods are fused based on the similarity between the virtual image and the target image, as well as the motion trends corresponding to the various localization methods, to estimate the target pose data of the camera when capturing the target image, thus improving the localization accuracy of the camera. Furthermore, in the real-time tracking stage, the pose of the virtual camera in the physiological channel model is updated in real time, and the target image and virtual image are displayed synchronously, providing navigation information for the operator or doctor. This solves the technical problem in the related art that camera tracking and localization easily fail when a single localization method is used.

[0107] In some embodiments of this application, a camera fusion positioning program is also provided. The execution flow of this program is as shown in Figure 8 and includes:

[0108] Step 802: Determine the first pose data and at least one second pose data when the camera captures the target image;

[0109] Specifically, the first pose data $P_1$ is determined based on the initial pose data and motion state data of the robotic arm. The second pose data $P_2$ is obtained by processing image data containing the reference image and the target image according to the sparse pyramid optical flow algorithm. The second pose data $P_3$ is determined based on the shape of the flexible device and the shape of the physiological channel, where the shape of the flexible device can be obtained through flexible device signals.

[0110] Step 804: Determine the motion trend $T_1$ and similarity $S_1$ corresponding to the first pose data, the motion trend $T_2$ and similarity $S_2$ corresponding to the second pose data determined by the sparse pyramid optical flow algorithm, and the motion trend $T_3$ and similarity $S_3$ corresponding to the second pose data determined based on the shape of the flexible device;

[0111] Step 806: Use $P_1$ as the baseline pose data P and $S_1$ as the baseline similarity S; update the baseline pose data and baseline similarity according to the motion trends and similarities corresponding to the various second pose data and the first pose data; and use the final baseline pose data as the target pose data and the final baseline similarity as the target similarity.

[0112] Specifically, first compare whether $T_2$ and $T_1$ are consistent; if they are, determine whether $S_2$ is greater than the baseline similarity S. If both conditions are met, the baseline pose data P is updated to $P_2$ and the baseline similarity S is updated to $S_2$.

[0113] If $T_2$ and $T_1$ are inconsistent, or $S_2$ is not greater than the baseline similarity S, continue by judging whether $T_3$ and $T_1$ are consistent and whether $S_3$ is greater than the baseline similarity S. If $T_3$ and $T_1$ are consistent and $S_3$ is greater than S, update S to $S_3$ and update P to $P_3$.

[0114] If $T_2$ and $T_1$ are consistent and $S_2$ is greater than S, first update S to $S_2$ and update P to $P_2$. Then further judge whether $T_3$ and $T_1$ are consistent and whether $S_3$ is greater than the baseline similarity S, that is, whether $S_3$ is greater than $S_2$. If $T_3$ and $T_1$ are consistent and $S_3$ is greater than S, that is, greater than $S_2$, update S to $S_3$ and update P to $P_3$.

[0115] In some embodiments of this application, P after the last update can be determined as the target pose data, and S after the last update can be determined as the target similarity.

[0116] Step 808: Determine whether to update the target pose data P;

[0117] Specifically, we can acquire n consecutive real images (image 1, image 2, image 3, ..., image n) captured by the camera in chronological order, where n is an integer greater than 0. Image n is the target image. Then, we can determine whether the target image is a keyframe image K, and whether the tracking state determined by the sum of the target similarities of the n consecutively captured images is poor. When determining the tracking state, we can compare the sum of the target similarities with a preset threshold. If the sum of the target similarities is not less than the preset threshold, the tracking state is considered good; otherwise, it is considered poor.

[0118] If the target image is determined to be a keyframe image and the tracking state is poor, the target pose data P is updated to optimized pose data. Otherwise, the target pose data P can be directly used as the final pose data of the target image.

[0119] As an optional implementation, when determining the optimized pose data, several different pose data can be automatically generated based on P by adding different random offsets. Then, each generated pose data is used to render a corresponding virtual image, and each rendered virtual image is compared with the target image to determine their similarity. The pose data of the virtual image with the highest similarity is then used as the optimized pose data.

[0120] Figure 8 is a structural block diagram of a camera fusion positioning device provided by this application. The device includes: a pose calculation module 80, used to determine first pose data and at least one second pose data of a camera in a physiological channel when capturing a target image, wherein the first pose data and the second pose data are determined in different ways, the first pose data is camera pose data determined based on the initial pose data and motion state data of a robotic arm, and the robotic arm controls the camera's motion state; a motion trend calculation module 82, used to determine a first motion trend corresponding to the first pose data and a second motion trend corresponding to the second pose data, wherein the trend types of the first motion trend and the second motion trend include forward and backward; a similarity calculation module 84, used to determine the first virtual image and the second virtual image acquired by the virtual camera corresponding to the camera in the physiological channel model of the physiological channel, and to determine the first similarity between the first virtual image and the target image, and the second similarity between the second virtual image and the target image, wherein the first virtual image is the image acquired by the virtual camera in the pose indicated by the first pose data, and the second virtual image is the image acquired by the virtual camera in the pose indicated by the second pose data; and a processing module 86, used to determine the target pose data of the camera when capturing the target image based on the first motion trend, the second motion trend, the first similarity and the second similarity.

[0121] In some embodiments of this application, at least one second pose data includes second pose data determined based on a target image and a reference image, wherein the reference image and the target image are two frames of images continuously acquired by a camera in a physiological channel, and the camera acquires the reference image first and then the target image.

[0122] In some embodiments of this application, at least one second pose data includes second pose data determined based on the shape of a flexible device, wherein the flexible device is connected to a camera.

[0123] In some embodiments of this application, the motion trend calculation module 82 determines the first motion trend corresponding to the first pose data and the second motion trend corresponding to the second pose data by the following steps: determining a reference image and reference pose data of the camera when capturing the reference image, wherein the reference image and the target image are two consecutive frames captured by the camera in the physiological channel, and the camera first captures the reference image and then captures the target image; determining a first translation increment of the camera in a preset direction relative to the reference image when capturing the target image based on the reference pose data and the first pose data, and determining a second translation increment of the camera in a preset direction relative to the reference image when capturing the target image based on the reference pose data and the second pose data; determining the trend type of the first motion trend based on the sign of the first translation increment, and determining the trend type of the second motion trend based on the sign of the second translation increment.

[0124] In some embodiments of this application, the preset direction includes the extension direction of the camera lens when capturing a reference image.

[0125] In some embodiments of this application, the steps of the similarity calculation module 84 in determining the first similarity between the first virtual image and the target image, and the second similarity between the second virtual image and the target image, include: dividing the first virtual image into multiple partially overlapping first virtual image blocks, dividing the second virtual image into multiple partially overlapping second virtual image blocks, and dividing the target image into multiple partially overlapping real image blocks; determining the first virtual feature value of the first virtual image block, the second virtual feature value of the second virtual image block, and the real feature value of the real image block, wherein the first virtual feature value, the second virtual feature value, and the real feature value all include the standard deviation of pixel intensity values within the image block, and a whiteness value used to reflect the proportion of specified pixels in the image block; deleting the first virtual image blocks, the second virtual image blocks, and the real image blocks whose whiteness values are greater than a preset whiteness value; arranging the retained first virtual image blocks in descending order of the standard deviation of pixel intensity values to obtain a first virtual image block sequence; arranging the retained second virtual image blocks in descending order of the standard deviation of pixel intensity values to obtain a second virtual image block sequence; arranging the retained real image blocks in descending order of the standard deviation of pixel intensity values to obtain a first real image block sequence; selecting a predetermined number of virtual image blocks from the first virtual image block sequence in front-to-back order to obtain a third virtual image block sequence; selecting a predetermined number of virtual image blocks from the second virtual image block sequence in front-to-back order to obtain a fourth virtual image block sequence; selecting a predetermined number of real image blocks from the first real image block sequence to obtain a second real image block sequence; determining a first root mean square error (RMSE) value between the third virtual image block sequence and the second real image block sequence; and determining a second RMSE value between the fourth virtual image block sequence and the second real image block sequence, wherein the first RMSE value is used to characterize the first similarity, and the second RMSE value is used to characterize the second similarity.

[0126] In some embodiments of this application, a designated pixel in an image block is determined by: determining the saturation value and brightness value of a pixel in the image block; and determining the pixel as a designated pixel when the saturation value of the pixel is less than a preset saturation value and the brightness value of the pixel is greater than a preset brightness value.

[0127] In some embodiments of this application, each type of second pose data corresponds to a second motion trend and a second similarity; the step of the processing module 86 in determining the target pose data of the camera when capturing the target image based on the first motion trend, the second motion trend, the first similarity and the second similarity includes: selecting candidate second pose data from at least one type of second pose data, wherein the candidate second pose data is the second pose data whose corresponding second motion trend is consistent with the first motion trend; and determining the target pose data based on the candidate second pose data, the first pose data, the second similarity corresponding to the candidate second pose data, and the first similarity.

[0128] In some embodiments of this application, the step of processing module 86 determining target pose data based on candidate second pose data, first pose data, second similarity corresponding to candidate second pose data, and first similarity includes: using the first pose data as a default pose data value and using the first similarity as a default similarity value; determining the candidate second pose data with the largest corresponding second similarity as the target candidate second pose data; when the second similarity corresponding to the target candidate second pose data is greater than the default similarity value, updating the default similarity value to the second similarity corresponding to the target candidate second pose data and updating the default pose data value to the target candidate second pose data; determining the default pose data value as the target pose data and determining the default similarity value as the target similarity value when the camera captures the target image.
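
The selection logic of the two preceding paragraphs can be sketched as follows. The `Candidate` type and `select_target_pose` function are hypothetical names, the pose is left as an opaque value, and the similarity is assumed to be a score for which a larger value means a closer match (an RMSE value would first have to be converted into such a score).

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Candidate:
    pose: Any          # pose representation is left open (assumption)
    trend: str         # "forward" or "backward"
    similarity: float  # larger is assumed to mean more similar


def select_target_pose(first, seconds):
    """Return (target pose, target similarity) from the first and second pose data."""
    # Keep only second pose data whose motion trend agrees with the first.
    candidates = [s for s in seconds if s.trend == first.trend]
    # The first pose data and first similarity act as the default values.
    default_pose, default_sim = first.pose, first.similarity
    if candidates:
        best = max(candidates, key=lambda s: s.similarity)
        if best.similarity > default_sim:
            default_pose, default_sim = best.pose, best.similarity
    return default_pose, default_sim
```

For example, with a first pose of trend "forward" and similarity 0.6, a second pose of trend "forward" and similarity 0.8 replaces the default, whereas a second pose of trend "backward" is never considered, whatever its similarity.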

[0129] In some embodiments of this application, after determining the target pose data when the camera captures the target image, the camera fusion positioning device is further configured to: acquire a number of real images captured by the camera in chronological order, and a third similarity corresponding to each real image, wherein the target image is the last of the real images, and the third similarity of a real image is a similarity determined based on the first and second similarities corresponding to that real image; determine the tracking state index of the target image based on the third similarity corresponding to each real image, wherein the tracking state index includes a poor state and a good state; and update the target pose data when the target image is determined to be a keyframe image and the corresponding tracking state index is the poor state.

[0130] In some embodiments of this application, the step of the camera fusion positioning device determining the tracking state index of the target image based on the third similarity corresponding to each real image includes: calculating the sum of the third similarities corresponding to the real images; determining the tracking state index as the good state when the sum of the third similarities is not less than a preset threshold; and determining the tracking state index as the poor state when the sum of the third similarities is less than the preset threshold.
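
The threshold rule above, together with the keyframe condition of the preceding paragraph, can be sketched as follows. The function names and the string labels "good" and "poor" are illustrative assumptions; how the third similarity is derived from the first and second similarities is not specified here and is therefore taken as given input.

```python
def tracking_state(third_similarities, threshold):
    """Good state when the sum of third similarities is not less than the threshold."""
    return "good" if sum(third_similarities) >= threshold else "poor"


def should_update_pose(third_similarities, threshold, is_keyframe):
    """The target pose data is updated only for a keyframe whose tracking state is poor."""
    return is_keyframe and tracking_state(third_similarities, threshold) == "poor"
```

For instance, `should_update_pose([0.1, 0.2], 1.0, True)` evaluates to `True`, while the same similarities on a non-keyframe image leave the target pose data unchanged.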

[0131] According to an embodiment of this application, a non-volatile storage medium is also provided, which stores a computer program. When executed by a processor, the computer program implements the following camera fusion positioning method: acquiring first pose data and at least one second pose data of a camera in a physiological channel when capturing a target image, wherein the pose data corresponding to the first pose data and the second pose data are determined in different ways, and the first pose data is camera pose data determined based on the initial pose data and motion state data of a robotic arm, the robotic arm being used to control the motion state of the camera; determining a first motion trend corresponding to the first pose data and a second motion trend corresponding to the second pose data, wherein the trend types of the first motion trend and the second motion trend include forward and backward; determining a first virtual image and a second virtual image acquired by a virtual camera corresponding to the camera in a physiological channel model of the physiological channel, and determining a first similarity between the first virtual image and the target image, and a second similarity between the second virtual image and the target image, wherein the first virtual image is an image acquired by the virtual camera under the pose indicated by the first pose data, and the second virtual image is an image acquired by the virtual camera under the pose indicated by the second pose data; and determining target pose data of the camera when capturing the target image based on the first motion trend, the second motion trend, the first similarity, and the second similarity.

[0132] According to an embodiment of this application, a computer program product is provided, including a computer program that, when executed by a processor, implements the following camera fusion positioning method: acquiring first pose data and at least one second pose data of a camera in a physiological channel when capturing a target image, wherein the pose data corresponding to the first pose data and the second pose data are determined in different ways, and the first pose data is camera pose data determined based on the initial pose data and motion state data of a robotic arm, the robotic arm being used to control the motion state of the camera; determining a first motion trend corresponding to the first pose data and a second motion trend corresponding to the second pose data, wherein the trend types of the first motion trend and the second motion trend include forward and backward; determining a first virtual image and a second virtual image acquired by a virtual camera corresponding to the camera in a physiological channel model of the physiological channel, and determining a first similarity between the first virtual image and the target image, and a second similarity between the second virtual image and the target image, wherein the first virtual image is an image acquired by the virtual camera under the pose indicated by the first pose data, and the second virtual image is an image acquired by the virtual camera under the pose indicated by the second pose data; and determining target pose data of the camera when capturing the target image based on the first motion trend, the second motion trend, the first similarity, and the second similarity.

[0133] According to an embodiment of this application, a surgical procedure is provided, comprising the following steps:

[0134] The first step is to determine the location of the lesion within the physiological channel;

[0135] The second step involves acquiring the first pose data and at least one second pose data of the camera in the physiological channel as it moves toward the lesion location. The pose data corresponding to the first pose data and the second pose data are determined in different ways. The first pose data is the camera pose data determined based on the initial pose data and motion state data of the robotic arm. The robotic arm is used to control the motion state of the camera.

[0136] The third step is to determine the first motion trend corresponding to the first pose data and the second motion trend corresponding to the second pose data. The trend types of the first motion trend and the second motion trend include forward and backward.

[0137] The fourth step is to determine the first virtual image and the second virtual image acquired by the virtual camera corresponding to the camera in the physiological channel model of the physiological channel, and to determine the first similarity between the first virtual image and the target image, and the second similarity between the second virtual image and the target image. The first virtual image is the image acquired by the virtual camera in the pose indicated by the first pose data, and the second virtual image is the image acquired by the virtual camera in the pose indicated by the second pose data.

[0138] The fifth step is to determine the target pose data of the camera when capturing the target image based on the first motion trend, the second motion trend, the first similarity and the second similarity.

[0139] The sixth step is to run the preset surgical procedure after determining the location of the camera at the lesion based on the target pose data.
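
The motion trend used in the third step can be illustrated with a short sketch. It follows the approach set out in the claims, in which the trend type is taken from the sign of the camera's translation increment along a preset direction between the reference image and the target image. The function names are hypothetical, positions and the direction are assumed to be 3-component tuples in a common coordinate frame, and mapping a positive (or zero) increment to "forward" is an assumption of this sketch.

```python
def translation_increment(ref_position, position, direction):
    """Signed displacement from the reference position, projected onto `direction`."""
    return sum((p - r) * d for p, r, d in zip(position, ref_position, direction))


def motion_trend(ref_position, position, direction):
    """Classify by the sign of the increment; zero is treated as forward (assumption)."""
    increment = translation_increment(ref_position, position, direction)
    return "forward" if increment >= 0 else "backward"
```

Applying `motion_trend` once with the position from the first pose data and once with the position from each second pose data, against the same reference pose, yields the first and second motion trends that are then compared for consistency.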

[0140] Figure 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in Figure 10, the electronic device includes a memory 1001 and a processor 1002.

[0141] The memory 1001 stores an executable computer program 1003. The processor 1002, coupled to the memory 1001, calls the executable computer program 1003 stored in the memory to execute the camera fusion positioning method provided in the above embodiment.

[0142] For example, the computer program 1003 can be divided into one or more modules / units, which are stored in the memory 1001 and executed by the processor 1002 to complete this application. The one or more modules / units may include various modules in the camera fusion positioning device in the above embodiments, such as: pose calculation module, motion trend calculation module, similarity calculation module, and processing module.

[0143] Furthermore, the device also includes at least one input device and at least one output device.

[0144] The processor 1002, memory 1001, input devices, and output devices mentioned above can be connected via a bus.

[0145] The input device can be a camera, touch panel, physical buttons, or mouse, etc. The output device can be a display screen.

[0146] Furthermore, the device may include more components than illustrated, or combine certain components, or different components, such as network access devices, sensors, etc.

[0147] The processor 1002 can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor can be a microprocessor or any conventional processor.

[0148] The memory 1001 can be, for example, a hard disk drive, non-volatile memory (such as flash memory or other electronically programmable erasure-restricted memory used to form a solid-state drive), volatile memory (such as static or dynamic random access memory), etc., and this application embodiment is not limited thereto. Specifically, the memory 1001 can be an internal storage unit of the electronic device, such as the hard disk or RAM of the electronic device. The memory 1001 can also be an external storage device of the electronic device, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc., equipped on the electronic device. Further, the memory 1001 can also include both internal storage units and external storage devices of the electronic device. The memory 1001 is used to store computer programs and other programs and data required by the terminal. The memory 1001 can also be used to temporarily store data that has been output or will be output.

[0149] Furthermore, embodiments of this application also provide a computer-readable storage medium, which may be disposed in the electronic device of the above embodiments, and the computer-readable storage medium may be the memory 1001 in the embodiment shown in Figure 10. A computer program is stored on the computer-readable storage medium. When executed by a processor, the program implements the image-based target position adjustment method described in the foregoing embodiments. Furthermore, the computer-readable storage medium can also be various media capable of storing program code, such as a USB flash drive, external hard drive, read-only memory (ROM), RAM, magnetic disk, or optical disk.

[0150] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. The apparatus embodiments described above are merely illustrative; for instance, the division of modules is only a logical functional division, and in actual implementation there may be other division methods. For example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or modules, and may be electrical, mechanical, or in other forms.

[0151] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0152] Furthermore, the functional modules in the various embodiments of this application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module. The integrated modules described above can be implemented in hardware or as software functional modules.

[0153] If the integrated module is implemented as a software functional module and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to related technologies, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a readable storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned readable storage medium includes various media capable of storing program code, such as USB flash drives, external hard drives, ROM, RAM, magnetic disks, or optical disks.

[0154] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that this application is not limited to the described order of actions, as some steps may be performed in other orders or simultaneously according to this application. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to this application.

[0155] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0156] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.

[0157] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of this application.

Claims

1. A camera fusion positioning method, characterized in that it comprises: acquiring first pose data and at least one second pose data of a camera in a physiological channel when capturing a target image, wherein the first pose data and the second pose data are determined in different ways, the first pose data is camera pose data determined based on the initial pose data and motion state data of a robotic arm, the robotic arm is used to control the motion state of the camera, and the physiological channel includes a bronchus; determining a first motion trend corresponding to the first pose data and a second motion trend corresponding to the second pose data, wherein the trend types of the first motion trend and the second motion trend include forward and backward; determining a first virtual image and a second virtual image acquired by a virtual camera corresponding to the camera in a physiological channel model of the physiological channel, and determining a first similarity between the first virtual image and the target image and a second similarity between the second virtual image and the target image, wherein the first virtual image is an image acquired by the virtual camera in the pose indicated by the first pose data, and the second virtual image is an image acquired by the virtual camera in the pose indicated by the second pose data; and determining, based on the first motion trend, the second motion trend, the first similarity, and the second similarity, the target pose data of the camera when capturing the target image.

2. The camera fusion positioning method according to claim 1, characterized in that each type of second pose data corresponds to a second motion trend and a second similarity; and determining the target pose data of the camera when capturing the target image based on the first motion trend, the second motion trend, the first similarity, and the second similarity includes: selecting candidate second pose data from the at least one second pose data, wherein the candidate second pose data is second pose data whose corresponding second motion trend is consistent with the first motion trend; and determining the target pose data based on the candidate second pose data, the first pose data, the second similarity corresponding to the candidate second pose data, and the first similarity.

3. The camera fusion positioning method according to claim 2, wherein determining the target pose data based on the candidate second pose data, the first pose data, the second similarity corresponding to the candidate second pose data, and the first similarity includes: using the first pose data as a default pose data value, and using the first similarity as a default similarity value; determining the candidate second pose data with the largest corresponding second similarity as target candidate second pose data; when the second similarity corresponding to the target candidate second pose data is greater than the default similarity value, updating the default similarity value to the second similarity corresponding to the target candidate second pose data, and updating the default pose data value to the target candidate second pose data; and determining the default pose data value as the target pose data, and determining the default similarity value as the target similarity when the camera captures the target image.

4. The camera fusion positioning method of claim 1, wherein determining a first similarity between the first virtual image and the target image, and a second similarity between the second virtual image and the target image, includes: dividing the first virtual image into multiple partially overlapping first virtual image blocks, dividing the second virtual image into multiple partially overlapping second virtual image blocks, and dividing the target image into multiple partially overlapping real image blocks; determining the first virtual feature value of each first virtual image block, the second virtual feature value of each second virtual image block, and the real feature value of each real image block, wherein the first virtual feature value, the second virtual feature value, and the real feature value each include the standard deviation of the pixel intensity values within the image block and a whiteness value reflecting the proportion of designated pixels in the image block; deleting the first virtual image blocks, the second virtual image blocks, and the real image blocks whose whiteness values are greater than a preset whiteness value; arranging the retained first virtual image blocks in descending order of the standard deviation of the pixel intensity values of each first virtual image block to obtain a first virtual image block sequence, arranging the retained second virtual image blocks in descending order of the standard deviation of the pixel intensity values of each second virtual image block to obtain a second virtual image block sequence, and arranging the retained real image blocks in descending order of the standard deviation of the pixel intensity values of each real image block to obtain a first real image block sequence; selecting a preset number of virtual image blocks from the first virtual image block sequence, from front to back, to obtain a third virtual image block sequence, and selecting the preset number of virtual image blocks from the second virtual image block sequence, from front to back, to obtain a fourth virtual image block sequence; selecting the preset number of real image blocks from the first real image block sequence to obtain a second real image block sequence; and determining a first root mean square error value between the third virtual image block sequence and the second real image block sequence, and a second root mean square error value between the fourth virtual image block sequence and the second real image block sequence, wherein the first root mean square error value is used to characterize the first similarity, and the second root mean square error value is used to characterize the second similarity.

5. The camera fusion positioning method according to claim 4, characterized in that the designated pixels in the image block are determined as follows: determining the saturation value and brightness value of a pixel in the image block; and determining the pixel as a designated pixel when the saturation value of the pixel is less than a preset saturation value and the brightness value of the pixel is greater than a preset brightness value.

6. The camera fusion positioning method of claim 1, wherein determining the first motion trend corresponding to the first pose data and the second motion trend corresponding to the second pose data includes: determining a reference image and reference pose data of the camera when capturing the reference image, wherein the reference image and the target image are two consecutive frames captured by the camera in the physiological channel, and the camera captures the reference image before the target image; determining, based on the reference pose data and the first pose data, a first translation increment of the camera in a preset direction when capturing the target image relative to when capturing the reference image, and determining, based on the reference pose data and the second pose data, a second translation increment of the camera in the preset direction when capturing the target image relative to when capturing the reference image; and determining the trend type of the first motion trend based on the sign of the first translation increment, and the trend type of the second motion trend based on the sign of the second translation increment.

7. The camera fusion positioning method according to claim 6, characterized in that the preset direction includes the extension direction of the camera lens when capturing the reference image.

8. The camera fusion positioning method of claim 1, wherein, after determining the target pose data of the camera when capturing the target image, the camera fusion positioning method further includes: acquiring a number of real images captured by the camera in chronological order, and a third similarity corresponding to each real image, wherein the target image is the last of the real images, and the third similarity of a real image is a similarity determined based on the first similarity and the second similarity corresponding to that real image; determining the tracking state index of the target image based on the third similarity corresponding to each real image, wherein the tracking state index includes a poor state and a good state; and updating the target pose data when the target image is determined to be a keyframe image and the corresponding tracking state index is the poor state.

9. The camera fusion positioning method according to claim 8, wherein determining the tracking state index of the target image based on the third similarity corresponding to each real image includes: calculating the sum of the third similarities corresponding to the real images; determining the tracking state index as the good state when the sum of the third similarities is not less than a preset threshold; and determining the tracking state index as the poor state when the sum of the third similarities is less than the preset threshold.

10. The camera fusion positioning method of claim 1, wherein the at least one second pose data includes second pose data determined based on the target image and a reference image, wherein the reference image and the target image are two consecutive frames captured by the camera in the physiological channel, and the camera captures the reference image before the target image.

11. The camera fusion positioning method of claim 1, wherein the at least one second pose data includes second pose data determined based on the shape of a flexible device, wherein the flexible device is connected to the camera.

12. A camera fusion positioning apparatus, characterized in that it comprises: a pose calculation module, used to determine first pose data and at least one second pose data of a camera in a physiological channel when capturing a target image, wherein the first pose data and the second pose data are determined in different ways, the first pose data is camera pose data determined based on the initial pose data and motion state data of a robotic arm, the robotic arm is used to control the motion state of the camera, and the physiological channel includes a bronchus; a motion trend calculation module, used to determine a first motion trend corresponding to the first pose data and a second motion trend corresponding to the second pose data, wherein the trend types of the first motion trend and the second motion trend include forward and backward; a similarity calculation module, used to determine a first virtual image and a second virtual image acquired by a virtual camera corresponding to the camera in a physiological channel model of the physiological channel, and to determine a first similarity between the first virtual image and the target image, and a second similarity between the second virtual image and the target image, wherein the first virtual image is an image acquired by the virtual camera in the pose indicated by the first pose data, and the second virtual image is an image acquired by the virtual camera in the pose indicated by the second pose data; and a processing module, used to determine the target pose data of the camera when capturing the target image based on the first motion trend, the second motion trend, the first similarity, and the second similarity.

13. A camera fusion positioning system, characterized in that it comprises a camera, an electronic device, and a robotic arm, wherein the electronic device is configured to: determine first pose data and at least one second pose data of the camera in a physiological channel when capturing a target image, wherein the first pose data and the second pose data are determined in different ways, and the first pose data is camera pose data determined based on the initial pose data and motion state data of the robotic arm, the robotic arm being used to control the motion state of the camera; determine a first motion trend corresponding to the first pose data and a second motion trend corresponding to the second pose data, wherein the trend types of the first motion trend and the second motion trend include forward and backward; determine a first virtual image and a second virtual image acquired by a virtual camera corresponding to the camera in a physiological channel model of the physiological channel, and determine a first similarity between the first virtual image and the target image, and a second similarity between the second virtual image and the target image, wherein the first virtual image is an image acquired by the virtual camera under the pose indicated by the first pose data, and the second virtual image is an image acquired by the virtual camera under the pose indicated by the second pose data; and determine the target pose data of the camera when capturing the target image based on the first motion trend, the second motion trend, the first similarity, and the second similarity.

14. A non-volatile storage medium storing a computer program that, when executed by a processor, implements the method described in any one of claims 1 to 10.

15. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the method according to any one of claims 1 to 11.

16. A computer program product comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 11.