DSA image-based target detection methods, devices, electronic equipment, and media
By using a target recognition model based on DSA images, efficient detection of the positional relationship between microguidewires and catheters was achieved, solving the problem of low detection efficiency in existing technologies and improving the operation response time and sensitivity of vascular interventional surgery robots.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HANGLOK-TECH CO LTD
- Filing Date
- 2023-11-21
- Publication Date
- 2026-06-30
AI Technical Summary
In existing technologies, determining whether the microguidewire has extended from the catheter tip relies on the force-tactile module combined with manual monitoring of DSA images. This is inefficient and degrades the interventional operation response time and sensitivity of the vascular interventional surgery robot.
A pre-trained target recognition model detects the positional relationship between the microguidewire and the catheter from DSA images. The model combines Backbone, Transformer, Neck, and Detect modules. Feature extraction, concatenation, and convolutional compression produce prediction boxes for the microguidewire and the catheter, and a non-maximum suppression algorithm determines the target recognition result.
It improves the efficiency of real-time target detection and positioning of catheters and guidewires, reduces the interventional operation response time of vascular interventional surgery robots, and enhances operational sensitivity.
Figure CN117710948B_ABST
Description
Technical Field
[0001] This application relates to, but is not limited to, the field of image processing technology, and in particular to a target detection method, apparatus, electronic device, and medium based on DSA images.
Background Technology
[0002] During the operation of a vascular interventional surgical robot, real-time monitoring of DSA images acquired by a Digital Subtraction Angiography (DSA) device is necessary for operational guidance. Once the microguidewire is detected extending from the catheter tip, its bending and position information needs to be monitored in real time. Furthermore, to ensure interventional safety, the microguidewire's force-tactile module must be activated. However, relying solely on the force-tactile module's response signal, combined with manual real-time monitoring of DSA images, to determine whether the microguidewire has extended from the catheter tip is inefficient. This inefficiency can negatively impact the response time of the vascular interventional surgical robot and reduce its operational sensitivity.
Summary of the Invention
[0003] This application provides a target detection method, device, electronic device, and medium based on DSA images, which can effectively improve the efficiency of real-time target detection and positioning of catheters and guidewires, thereby reducing the response time of interventional operations of vascular interventional surgery robots.
[0004] In a first aspect, embodiments of this application provide a target detection method based on DSA images, including:
[0005] Acquire DSA images;
[0006] A pre-trained target recognition model is obtained, and a DSA image is input into the target recognition model to obtain a target recognition result. The target recognition result includes a reference DSA image, which carries a microguidewire prediction box and a catheter prediction box. The microguidewire prediction box corresponds to first identification information, and the catheter prediction box corresponds to second identification information.
[0007] The positional relationship between the microguidewire and the catheter is determined based on the reference DSA image, the first identification information, and the second identification information.
[0008] In some embodiments, the target recognition model includes a Backbone module, a Transformer module, a Neck module, and a Detect module. The step of inputting the DSA image into the target recognition model to obtain the target recognition result includes:
[0009] The DSA image is input into the Backbone module for feature extraction to obtain a first feature image, a second feature image, and a third feature image, wherein the image dimensions of the first feature image, the second feature image, and the third feature image are different from each other;
[0010] The first feature image, the second feature image, and the third feature image are downsampled respectively, and the downsampled first feature image, second feature image, and third feature image are stitched together to obtain the fourth feature image;
[0011] The fourth feature image is input into the Transformer module for convolutional compression processing to obtain the fifth feature image;
[0012] The first feature image and the fifth feature image are concatenated to obtain the sixth feature image;
[0013] The third feature image is concatenated with the fifth feature image to obtain the seventh feature image;
[0014] The fifth feature image, the sixth feature image, and the seventh feature image are input into the Neck module to obtain three eighth feature images with prediction boxes at different scales;
[0015] The eighth feature image carrying the prediction box is input into the Detect module to obtain the target recognition result.
[0016] In some embodiments, the Backbone module includes a first convolutional layer, a second convolutional layer, and a first residual connection module. The step of inputting the DSA image into the Backbone module for feature extraction to obtain a first feature image, a second feature image, and a third feature image includes:
[0017] The DSA image is input into the first convolutional layer, the second convolutional layer, and the first residual connection module connected in sequence, and the first intermediate image is output.
[0018] The first intermediate image is input into the second convolutional layer and the first residual connection module connected in sequence, and the first feature image is output.
[0019] The first feature image is input into the first convolutional layer and the first residual connection module connected in sequence, and the second feature image is output.
[0020] The second feature image is input into the first convolutional layer, and the third feature image is output.
[0021] In some embodiments, the Transformer module includes a Flatten layer, a first Permute layer, a second Permute layer, a Linear layer, and a Transformer layer. The step of inputting the fourth feature image into the Transformer module for convolutional compression processing to obtain the fifth feature image includes:
[0022] The fourth feature image is input into the Flatten layer for dimensionality reduction processing to obtain the second intermediate image after dimensionality reduction.
[0023] The second intermediate image is input into the first Permute layer to rearrange the image dimensions, resulting in the third intermediate image;
[0024] The feature vector obtained after processing the third intermediate image through the Linear layer is added to the third intermediate image to calculate the fourth intermediate image;
[0025] The fourth intermediate image is input to the second Permute layer and the Transformer layer, which are connected in sequence, and the fifth feature image is output.
[0026] In some embodiments, the Neck module includes an upsampling layer, an SPPF module, a second residual connection module, a concat layer, and a third convolutional layer. The eighth feature image includes a first-scale image, a second-scale image, and a third-scale image. The step of inputting the fifth, sixth, and seventh feature images into the Neck module to obtain three eighth feature images carrying prediction boxes at different scales includes:
[0027] The seventh feature image is input into the third convolutional layer and the upsampling layer connected in sequence to obtain the fourth intermediate image;
[0028] The fourth intermediate image and the fifth feature image are stitched together, and the stitched feature image is input into the second residual connection module and the SPPF module connected in sequence to obtain the fifth intermediate image;
[0029] The fifth intermediate image is input to the upsampling layer, and the upsampled feature image and the sixth feature image are stitched together. The stitched feature image is then input to the second residual connection module, and the first scale image is output.
[0030] The first-scale image after passing through the third convolutional layer and the fifth intermediate image are stitched together to obtain the sixth intermediate image;
[0031] The sixth intermediate image is passed through the second residual connection module to output the second scale image;
[0032] The second-scale image is input into the third convolutional layer, the concat layer, and the second residual connection module, which are connected in sequence, and the third-scale image is output.
[0033] In some embodiments, the Detect module includes a reshape layer, and the step of inputting the eighth feature image carrying the predicted bounding box into the Detect module to obtain the target recognition result includes:
[0034] The eighth feature image carrying the predicted bounding box and the prior bounding box information are input into the reshape layer, and the eighth feature image carrying the candidate predicted bounding box is output. The prior bounding box information is obtained by clustering the predicted bounding boxes in the preset dataset.
[0035] The target prediction box is determined from the candidate prediction box using a non-maximum suppression algorithm to obtain the target recognition result. The target recognition result includes the reference DSA image carrying the target prediction box, and the target prediction box is the prediction box in the candidate prediction box that does not meet the window overlap condition.
[0036] In some embodiments, before inputting the DSA image into the target recognition model to obtain the target recognition result, the method further includes:
[0037] Obtain the preset image preprocessing rules;
[0038] The DSA image is preprocessed according to the image preprocessing rules.
[0039] In a second aspect, embodiments of this application provide a target detection device based on DSA images, comprising:
[0040] The image acquisition module is used to acquire DSA images;
[0041] The target recognition module is used to acquire a pre-trained target recognition model, input the DSA image into the target recognition model, and obtain a target recognition result. The target recognition result includes a reference DSA image, which carries a microguidewire prediction box and a catheter prediction box. The microguidewire prediction box corresponds to first identification information, and the catheter prediction box corresponds to second identification information.
[0042] The data processing module is used to determine the positional relationship between the microguidewire and the catheter based on the reference DSA image, the first identification information, and the second identification information.
[0043] In a third aspect, embodiments of this application provide an electronic device, including at least one control processor and a memory communicatively connected to the at least one control processor; the memory stores instructions executable by the at least one control processor, the instructions being executed by the at least one control processor to enable the at least one control processor to perform the DSA image-based target detection method as described in the first aspect.
[0044] In a fourth aspect, embodiments of this application also provide a computer-readable storage medium storing computer-executable instructions for performing the DSA image-based target detection method as described in the first aspect.
[0045] This application provides a target detection method, device, electronic device, and medium based on DSA images. The method includes: acquiring a digital subtraction angiography (DSA) image; acquiring a pre-trained target recognition model; inputting the DSA image into the target recognition model to obtain a target recognition result, wherein the target recognition result includes a reference DSA image carrying a microguidewire prediction box and a catheter prediction box, the microguidewire prediction box corresponding to first identification information, and the catheter prediction box corresponding to second identification information; and determining the positional relationship between the microguidewire and the catheter based on the reference DSA image, the first identification information, and the second identification information. According to the solution provided in this application, by using a pre-set target recognition model to replace manual detection combined with a force-tactile module to detect the positional relationship between the microguidewire and the catheter, the efficiency of real-time target detection and positioning of the catheter and guidewire can be effectively improved, thereby reducing the response time of interventional operations of vascular interventional surgery robots.
Attached Figure Description
[0046] Figure 1 is a flowchart of the steps of a target detection method based on DSA images provided in one embodiment of this application;
[0047] Figure 2 is a flowchart of the steps for obtaining target recognition results provided in another embodiment of this application;
[0048] Figure 3 is a flowchart of the steps for obtaining a first feature image, a second feature image, and a third feature image provided in another embodiment of this application;
[0049] Figure 4 is a flowchart of the steps for obtaining the fifth feature image provided in another embodiment of this application;
[0050] Figure 5 is a flowchart of the steps for obtaining three eighth feature images carrying prediction boxes at different scales provided in another embodiment of this application;
[0051] Figure 6 is a flowchart of the steps for obtaining target recognition results provided in another embodiment of this application;
[0052] Figure 7 is a flowchart of the steps for image preprocessing of a DSA image provided in another embodiment of this application;
[0053] Figure 8 is a model architecture diagram of a target detection model provided in another embodiment of this application;
[0054] Figure 9 is a schematic diagram of a target detection device based on DSA images provided in another embodiment of this application;
[0055] Figure 10 is a structural diagram of an electronic device provided in another embodiment of this application.
Detailed Implementation
[0056] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0057] It is understandable that although functional modules are divided in the device schematic diagram and a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than the module division in the device or the order in the flowchart. The terms "first," "second," etc., in the specification, claims, or the aforementioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
[0058] During the operation of a vascular interventional surgical robot, real-time monitoring of DSA images acquired by the DSA device is necessary for operational guidance. Once the microguidewire is detected extending from the catheter tip, its bending and position information needs to be monitored in real time. Furthermore, to ensure interventional safety, the microguidewire's force-tactile module must be activated. However, relying solely on the force-tactile module's response signal, combined with manual real-time monitoring of DSA images acquired by the DSA device, to determine whether the microguidewire has extended from the catheter tip is inefficient. This inefficiency can negatively impact the response time of the vascular interventional surgical robot and reduce its operational sensitivity.
[0059] To address the aforementioned problems, this application provides a target detection method, apparatus, electronic device, and medium based on DSA images. The method includes: acquiring a digital subtraction angiography (DSA) image; acquiring a pre-trained target recognition model; inputting the DSA image into the target recognition model to obtain a target recognition result, wherein the target recognition result includes a reference DSA image carrying a microguidewire prediction box and a catheter prediction box, the microguidewire prediction box corresponding to first identification information, and the catheter prediction box corresponding to second identification information; and determining the positional relationship between the microguidewire and the catheter based on the reference DSA image, the first identification information, and the second identification information. According to the solution provided in this application, by using a pre-set target recognition model to replace manual detection combined with a force-tactile module to detect the positional relationship between the microguidewire and the catheter, the efficiency of real-time target detection and positioning of the catheter and guidewire can be effectively improved, thereby reducing the response time of interventional operations by vascular interventional surgery robots.
[0060] The embodiments of this application will be further described below with reference to the accompanying drawings.
[0061] Referring to Figure 1, which is a flowchart of a target detection method based on DSA images provided by this application, the method includes, but is not limited to, the following steps:
[0062] Step S110: Obtain a DSA image;
[0063] Step S120: Obtain a pre-trained target recognition model, input the DSA image into the target recognition model, and obtain the target recognition result. The target recognition result includes a reference DSA image, which carries a microguidewire prediction box and a catheter prediction box. The microguidewire prediction box corresponds to first identification information, and the catheter prediction box corresponds to second identification information.
[0064] Step S130: Determine the positional relationship between the microguidewire and the catheter based on the reference DSA image, the first identification information, and the second identification information.
[0065] It should be noted that this embodiment does not limit the specific content of the first identification information and the second identification information. For example, the first identification information includes the position information of the microguidewire prediction box on the reference DSA image and the coordinates of the microguidewire pixel points in the microguidewire prediction box, and the second identification information includes the position information of the catheter prediction box on the reference DSA image and the coordinates of the catheter pixel points in the catheter prediction box.
[0066] It is understandable that using a pre-set target recognition model to replace manual detection combined with a force-tactile module to detect the positional relationship between the microguidewire and the catheter can effectively improve the efficiency of real-time target detection and positioning of the catheter and guidewire. This provides a valid data basis for determining whether the microguidewire has extended beyond the catheter, thereby reducing the response time of the interventional operation of the vascular interventional surgery robot. It should be noted that this embodiment does not limit the specific method of determining the positional relationship between the microguidewire and the catheter based on the reference DSA image, the first identification information, and the second identification information. For example, following the above description, the first pixel coordinates of the edge endpoint of the microguidewire in the microguidewire prediction box can be determined based on the first identification information, and the second pixel coordinates of the edge endpoint of the catheter nearest the microguidewire in the catheter prediction box can be determined based on the second identification information. The distance between the first and second pixel coordinates is then calculated to determine whether the microguidewire has extended beyond the catheter.
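The distance check described in paragraph [0066] can be sketched as follows. This is only an illustration under stated assumptions: the function name `guidewire_extends` and the pixel threshold `min_gap_px` are hypothetical, since the source does not specify how the computed distance is turned into a decision.

```python
import math


def guidewire_extends(wire_endpoint, catheter_endpoint, min_gap_px=5.0):
    """Return (distance, extended) for two (x, y) pixel coordinates.

    wire_endpoint: first pixel coordinates (edge endpoint of the microguidewire).
    catheter_endpoint: second pixel coordinates (catheter edge endpoint nearest the wire).
    min_gap_px: assumed decision threshold; not given in the source.
    """
    distance = math.hypot(
        wire_endpoint[0] - catheter_endpoint[0],
        wire_endpoint[1] - catheter_endpoint[1],
    )
    return distance, distance > min_gap_px


distance, extended = guidewire_extends((130, 240), (100, 200))
```

In practice the threshold would need to be calibrated against the pixel spacing of the DSA device, which the source does not discuss.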
[0067] It should be noted that the embodiments of this application do not limit the specific structure of the target recognition model. As shown in Figure 8, the target recognition model in this embodiment uses the YOLOv5 model, which offers a good balance between computational efficiency and accuracy, as its main framework. This target recognition model includes a Backbone module, a Transformer module, a Neck module, and a Detect module. Based on this structure, and referring to Figure 2, in some embodiments step S120 of Figure 1 includes, but is not limited to, the following steps:
[0068] Step S210: Input the DSA image into the Backbone module for feature extraction to obtain a first feature image, a second feature image, and a third feature image, wherein the image dimensions of the first feature image, the second feature image, and the third feature image are different from each other;
[0069] Step S220: Downsample the first feature image, the second feature image, and the third feature image respectively, and then stitch the downsampled first feature image, second feature image, and third feature image together to obtain the fourth feature image;
[0070] Step S230: Input the fourth feature image into the Transformer module for convolutional compression processing to obtain the fifth feature image;
[0071] Step S240: The first feature image and the fifth feature image are concatenated to obtain the sixth feature image;
[0072] Step S250: The third feature image and the fifth feature image are concatenated to obtain the seventh feature image;
[0073] Step S260: Input the fifth feature image, the sixth feature image, and the seventh feature image into the Neck module to obtain three eighth feature images with prediction boxes at different scales;
[0074] Step S270: Input the eighth feature image carrying the predicted bounding box into the Detect module to obtain the target recognition result.
[0075] It is understood that the DSA image is input into the Backbone module for feature extraction, resulting in a first feature image, a second feature image, and a third feature image. The first feature image has a dimension of 80x80, the second feature image has a dimension of 40x40, and the third feature image has a dimension of 20x20, corresponding to different receptive field sizes. Furthermore, this embodiment does not limit the specific structure of the Backbone module. The Backbone module in this embodiment may include a first convolutional layer, a second convolutional layer, and a first residual connection module. Based on this Backbone module structure, and referring to Figure 3, step S210 of Figure 2 includes, but is not limited to, the following steps:
[0076] Step S310: Input the DSA image into the first convolutional layer, the second convolutional layer and the first residual connection module connected in sequence, and output the first intermediate image;
[0077] Step S320: Input the first intermediate image into the second convolutional layer and the first residual connection module connected in sequence, and output the first feature image;
[0078] Step S330: Input the first feature image into the first convolutional layer and the first residual connection module connected in sequence, and output the second feature image;
[0079] Step S340: Input the second feature image into the first convolutional layer and output the third feature image.
[0080] Understandably, the first residual connection module follows the main-path-and-branch structure used in the YOLOv5 model, which makes it less prone to vanishing gradients as the network depth increases, thus making it easier to train and ensuring the recognition accuracy of the target recognition model. The first feature image, second feature image, and third feature image output by the Backbone module in this embodiment can provide an effective data foundation for the Transformer module's operations.
[0081] Additionally, it should be noted that this embodiment does not limit the specific structure of the Transformer module. The Transformer module in this embodiment includes a Flatten layer, a first Permute layer, a second Permute layer, a Linear layer, and a Transformer layer. Referring to Figure 4, in some embodiments, based on the Transformer module of this embodiment, step S230 of Figure 2 includes, but is not limited to, the following steps:
[0082] Step S410: Input the fourth feature image into the Flatten layer for dimensionality reduction to obtain the second intermediate image after dimensionality reduction;
[0083] Step S420: Input the second intermediate image into the first Permute layer to rearrange the image dimensions to obtain the third intermediate image;
[0084] Step S430: Add the feature vector obtained after processing the third intermediate image through the Linear layer to the third intermediate image to obtain the fourth intermediate image;
[0085] Step S440: Input the fourth intermediate image into the second Permute layer and the Transformer layer connected in sequence, and output the fifth feature image.
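Steps S410 to S430 can be illustrated with a small pure-Python sketch on nested lists. It assumes a single feature image of shape (C, H, W) with no batch dimension, and the helper names (`flatten_hw`, `permute`, `linear`, `embed_tokens`) are hypothetical. The second Permute layer and the Transformer layer of step S440 are not reproduced here.

```python
def flatten_hw(feature):
    """S410: (C, H, W) -> (C, H*W), merging the two spatial axes."""
    return [[v for row in channel for v in row] for channel in feature]


def permute(matrix):
    """S420: swap the two axes of a 2-D nested list, (C, N) -> (N, C)."""
    return [list(column) for column in zip(*matrix)]


def linear(tokens, weight, bias):
    """Apply y = W x + b to each token; weight has shape (C_out, C_in)."""
    return [
        [sum(w * x for w, x in zip(w_row, token)) + b for w_row, b in zip(weight, bias)]
        for token in tokens
    ]


def embed_tokens(feature, weight, bias):
    """S410-S430: flatten, permute, then add the Linear output to its own input."""
    tokens = permute(flatten_hw(feature))
    projected = linear(tokens, weight, bias)
    return [[p + t for p, t in zip(p_row, t_row)] for p_row, t_row in zip(projected, tokens)]


feature = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # C=2, H=2, W=2
identity = [[1, 0], [0, 1]]                      # placeholder weights for the demo
result = embed_tokens(feature, identity, [0, 0])
```

With identity weights and zero bias the Linear output equals its input, so each token is simply doubled, which makes the residual addition in step S430 easy to check by hand.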
[0086] Understandably, since the targets to be identified, namely catheters and microguidewires, are relatively small in the DSA image and have low contrast with the background, this embodiment adds a Transformer module to the YOLOv5 model to enhance the recognition of small targets and the model's ability to acquire global information. This embodiment downsamples feature maps of three different scales to 20x20 feature points and concatenates them as tokens input to the Transformer module. Based on a multi-head attention mechanism, the output learns more global and effective information. Deeper features, after compression by convolutional layers, have better expressive power and thus contain richer information, which is more conducive to learning. Adding the Transformer at this dimension also avoids excessive computational cost, effectively saving computational resources. Furthermore, the fifth feature image output by the Transformer module in this embodiment provides a valid data foundation for the Neck module's operations.
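The downsample-and-concatenate step can be sketched in pure Python as below. The source does not name the downsampling operator, so average pooling is an assumption here, as are the single-channel maps and the helper names `avg_pool` and `make_map`.

```python
def avg_pool(image, factor):
    """Downsample a 2-D map by averaging non-overlapping factor x factor blocks."""
    height, width = len(image), len(image[0])
    return [
        [
            sum(image[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(width // factor)
        ]
        for y in range(height // factor)
    ]


def make_map(size, value):
    """Constant single-channel feature map used as stand-in data."""
    return [[float(value)] * size for _ in range(size)]


f1, f2, f3 = make_map(80, 1), make_map(40, 2), make_map(20, 3)

# Bring all three scales to 20x20 and stack them along the channel axis.
stacked = [avg_pool(f1, 4), avg_pool(f2, 2), avg_pool(f3, 1)]   # (3, 20, 20)

# One token per spatial position, each holding the channels at that position.
tokens = [[stacked[c][y][x] for c in range(3)] for y in range(20) for x in range(20)]
```

This yields 400 tokens regardless of the input resolution of each scale, which is consistent with the point in the text that attention at the 20x20 level keeps the computational cost bounded.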
[0087] Additionally, it should be noted that this embodiment does not limit the specific structure of the Neck module. The Neck module in this embodiment includes an upsampling layer, an SPPF module, a second residual connection module, a concat layer, and a third convolutional layer. In some embodiments, the eighth feature image includes a first-scale image, a second-scale image, and a third-scale image. Based on the above Neck module structure, and referring to Figure 5, step S260 of Figure 2 includes, but is not limited to, the following steps:
[0088] Step S510: Input the seventh feature image into the third convolutional layer and the upsampling layer connected in sequence to obtain the fourth intermediate image;
[0089] Step S520: The fourth intermediate image and the fifth feature image are stitched together, and the stitched feature image is input into the second residual connection module and the SPPF module connected in sequence to obtain the fifth intermediate image;
[0090] Step S530: Input the fifth intermediate image into the upsampling layer, and stitch the upsampled feature image and the sixth feature image together. Input the stitched feature image into the second residual connection module and output the first scale image.
[0091] Step S540: The first scale image and the fifth intermediate image after passing through the third convolutional layer are stitched together to obtain the sixth intermediate image;
[0092] Step S550: The sixth intermediate image is passed through the second residual connection module to output the second scale image;
[0093] Step S560: Input the second-scale image into the third convolutional layer, the concat layer, and the second residual connection module connected in sequence, and output the third-scale image.
[0094] It is understandable that, as shown in Figure 8, the Neck module can decouple the fifth, sixth, and seventh feature images derived from the Transformer module output into feature maps of the original size. In this embodiment, the Neck module adopts an FPN+PAN structure. First, the feature maps are decoupled into 80x80 images and stitched with images of the same scale. Then, the 40x40 and 20x20 feature maps obtained through the upsampling layer are also stitched together. During this process, the number of features is kept consistent with the original scale image during stitching, resulting in three eighth feature images with prediction boxes at different scales, namely the first-scale image, the second-scale image, and the third-scale image, which provide an effective data foundation for the subsequent Detect module's operation.
[0095] It should be noted that in this embodiment, the SPPF module is used to replace the traditional SPP module: the max pooling layers of the original SPP module are replaced by multiple 5x5 max pooling layers through which the input passes in series. Since the serial max pooling layers avoid the repeated calculations caused by the large kernels in the original design, more computing resources can be saved while keeping the same receptive field size, thereby further improving the efficiency of target recognition.
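The claim that serial 5x5 pooling preserves the receptive field can be checked with a short pure-Python sketch. It assumes stride 1 with "same" padding, and compares against single 9x9 and 13x13 kernels; those larger kernel sizes are an assumption based on common SPP configurations, since the source does not list them.

```python
def max_pool(image, k):
    """k x k max pooling with stride 1; windows are clipped at the border,
    which is equivalent to padding with negative infinity."""
    height, width, radius = len(image), len(image[0]), k // 2
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            window = (
                image[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            )
            row.append(max(window))
        output.append(row)
    return output


# Deterministic stand-in feature map.
image = [[(7 * y + 13 * x) % 17 for x in range(12)] for y in range(12)]

once = max_pool(image, 5)
twice = max_pool(once, 5)      # same result as one 9x9 pooling
thrice = max_pool(twice, 5)    # same result as one 13x13 pooling
```

Each serial stage reuses the previous stage's output, so three 5x5 passes replace separate 5x5, 9x9, and 13x13 passes over the input.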
[0096] Furthermore, the embodiments of this application do not limit the specific structure of the Detect module. The Detect module in this embodiment includes a reshape layer. In some embodiments, referring to Figure 6, step S270 of Figure 2 includes, but is not limited to, the following steps:
[0097] Step S610: Input the eighth feature image carrying the predicted bounding box and the prior bounding box information into the reshape layer, and output the eighth feature image carrying the candidate predicted bounding box. The prior bounding box information is obtained by clustering the predicted bounding boxes in the preset dataset.
[0098] Step S620: Use the non-maximum suppression algorithm to determine the target prediction box from the candidate prediction box to obtain the target recognition result. The target recognition result includes a reference DSA image carrying the target prediction box, and the target prediction box is the prediction box in the candidate prediction box that does not meet the window overlap condition.
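Step S610 states that the prior bounding box information comes from clustering boxes in a preset dataset, without naming the algorithm. The sketch below uses plain k-means on (width, height) pairs with Euclidean distance purely as an assumed example; the function name `kmeans_boxes`, the initialisation scheme, and the sample sizes are all hypothetical.

```python
def kmeans_boxes(sizes, k, iterations=20):
    """Cluster (width, height) pairs into k prior box sizes with plain k-means."""
    ordered = sorted(sizes, key=lambda s: s[0] * s[1])
    # Deterministic initialisation: k sizes spread evenly over the area-sorted list.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for w, h in sizes:
            nearest = min(
                range(k),
                key=lambda i: (w - centers[i][0]) ** 2 + (h - centers[i][1]) ** 2,
            )
            groups[nearest].append((w, h))
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return sorted(centers, key=lambda c: c[0] * c[1])


sizes = [(10, 10), (12, 12), (11, 11), (50, 20), (52, 22), (51, 21),
         (100, 100), (102, 98), (98, 102)]
priors = kmeans_boxes(sizes, k=3)
```

An IoU-based distance is a common alternative for anchor clustering and may suit elongated targets such as guidewires better than Euclidean distance.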
[0099] Understandably, each eighth feature image output by the Neck module corresponds to prediction boxes with different aspect ratios. The smaller the dimension of the eighth feature image, the larger the receptive field. After the eighth feature image is processed by the reshape layer, the resulting feature information has the shape (bs, 3, H, W, 8), where bs is the number of feature image samples, 3 is the number of prediction boxes, and W and H are the width and height of the feature image, respectively. The final dimension of 8 comprises 4 dimensions of box information (x, y, h, w), that is, the center coordinates and the height and width of the prediction box; 1 dimension for the confidence level c; and 3 dimensions for the classification categories, namely 0: background, 1: catheter, and 2: microguidewire.
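The layout of the final dimension can be made concrete with a small decoding sketch. The ordering of the 8 values (box, then confidence, then class scores) is an assumption, as the source lists the components without fixing their positions, and `decode_prediction` is a hypothetical helper name.

```python
CLASS_NAMES = ("background", "catheter", "microguidewire")


def decode_prediction(vector):
    """Split one 8-value prediction into box, confidence, and class fields.

    Assumed order: x, y, h, w, c, then one score per class (0, 1, 2).
    """
    x, y, h, w = vector[0:4]
    confidence = vector[4]
    class_scores = vector[5:8]
    class_id = max(range(len(class_scores)), key=lambda i: class_scores[i])
    return {
        "box": (x, y, h, w),
        "confidence": confidence,
        "class_id": class_id,
        "label": CLASS_NAMES[class_id],
        # Class score weighted by the box confidence, used later for filtering.
        "score": confidence * class_scores[class_id],
    }


decoded = decode_prediction([120.0, 64.0, 8.0, 40.0, 0.5, 0.0, 0.25, 0.75])
```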
[0100] It should be noted that this embodiment does not limit the training method of the target recognition model. Training can be based on a regression loss function: the initial target recognition model is iteratively trained according to the model output and the true value until the loss function value meets the convergence condition or the number of iterations reaches the preset threshold.
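The two stopping conditions can be sketched with a generic loop. This is a toy example under stated assumptions: the convergence condition is taken to be a change in loss smaller than a tolerance, and a one-parameter least-squares fit stands in for the target recognition model. All names and values here are hypothetical.

```python
def train(step_fn, max_iters=1000, tol=1e-6):
    """Call step_fn() (one update, returns the loss) until a stop condition.

    Stops when the loss changes by less than tol (assumed convergence
    condition) or when max_iters updates have been performed.
    """
    prev = loss = float("inf")
    for it in range(1, max_iters + 1):
        loss = step_fn()
        if abs(prev - loss) < tol:
            return it, loss, "converged"
        prev = loss
    return max_iters, loss, "max_iters"


def make_toy_step(lr=0.05):
    """Toy stand-in for a model: fit y = w * x with a mean squared error loss."""
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true value of w is 2
    state = {"w": 0.0}

    def step():
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (state["w"] * x - y) * x for x, y in data) / len(data)
        state["w"] -= lr * grad
        return sum((state["w"] * x - y) ** 2 for x, y in data) / len(data)

    return step, state
```

Calling `train(step)` on the toy problem ends with the "converged" reason well before the iteration limit, while a very small `max_iters` ends with "max_iters" instead.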
[0101] Understandably, after the eighth feature image carrying the predicted bounding boxes is input into the Detect module, the category information of each predicted bounding box in the eighth feature image is multiplied by the confidence of that predicted bounding box to obtain its confidence score. Predicted bounding boxes with confidence scores lower than the preset score threshold are removed to obtain the candidate predicted bounding boxes. The non-maximum suppression algorithm is then used to determine the target predicted bounding boxes from the candidate predicted bounding boxes to obtain the target recognition result, thereby providing an effective data basis for determining the positional relationship between the microguidewire and the catheter.
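The scoring and thresholding described above can be sketched as follows. The dictionary keys, the use of the highest category value as the category information, and the default threshold of 0.25 are assumptions made for illustration; the source does not specify them.

```python
def filter_candidates(predictions, score_threshold=0.25):
    """Compute confidence scores and drop low-scoring predicted boxes.

    predictions: list of dicts with keys 'box', 'confidence' and
    'class_probs' (one value per category). The score of a box is its best
    category value multiplied by the box confidence; boxes whose score is
    lower than score_threshold are removed.
    """
    candidates = []
    for p in predictions:
        probs = p["class_probs"]
        class_id = max(range(len(probs)), key=lambda k: probs[k])
        score = probs[class_id] * p["confidence"]
        if score >= score_threshold:
            candidates.append(
                {"box": p["box"], "class_id": class_id, "score": score}
            )
    return candidates
```

The surviving candidates are what the non-maximum suppression step then reduces to the target predicted bounding boxes.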
[0102] Additionally, referring to Figure 7, in some embodiments, before step S220 of Figure 2 is executed, the target detection method based on DSA images in this embodiment further includes, but is not limited to, the following steps:
[0103] Step S710: Obtain preset image preprocessing rules;
[0104] Step S720: Perform image preprocessing on the DSA image according to the image preprocessing rules.
[0105] Understandably, the main purpose of image preprocessing for DSA images is to eliminate irrelevant information, restore useful real information, enhance the detectability of relevant information, and simplify the data to the greatest extent possible, thereby improving the reliability of subsequent DSA image applications.
[0106] It should be noted that the embodiments of this application do not limit the specific method of image preprocessing for DSA images. It can be random rotation, normalization, or binarization of the DSA image, among others. Those skilled in the art can select the specific image preprocessing method according to the actual situation.
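Steps S710 and S720 can be sketched as a rule list applied in order. This is a simplified illustration on images stored as nested lists: the rule names are hypothetical, normalization is assumed to be min-max scaling to [0, 1], and random rotation is restricted to multiples of 90 degrees to keep the example dependency-free.

```python
import random


def normalize(img):
    """Min-max scale pixel values to the range [0, 1]."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero on a constant image
    return [[(v - lo) / span for v in row] for row in img]


def binarize(img, threshold=0.5):
    """Map each pixel to 1 if it reaches the threshold, otherwise 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in img]


def rotate90(img, turns):
    """Rotate the image clockwise by turns * 90 degrees."""
    for _ in range(turns % 4):
        img = [list(row) for row in zip(*img[::-1])]
    return img


def preprocess(img, rules, rng=None):
    """Apply the preset preprocessing rules (S710) to the image in order (S720)."""
    rng = rng or random.Random(0)
    ops = {
        "normalize": normalize,
        "binarize": binarize,
        "random_rotate": lambda im: rotate90(im, rng.randrange(4)),
    }
    for name in rules:
        img = ops[name](img)
    return img
```

A rule list such as `["normalize", "binarize"]` plays the role of the preset image preprocessing rules; which operations to include is left to the practitioner, as noted above.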
[0107] Referring to Figure 8, this embodiment also provides a target detection device 800 based on DSA images, including:
[0108] Image acquisition module 810 is used to acquire DSA images;
[0109] The target recognition module 820 is used to acquire a pre-trained target recognition model, input the DSA image into the target recognition model, and obtain the target recognition result. The target recognition result includes a reference DSA image, which carries a microguidewire prediction box and a catheter prediction box. The microguidewire prediction box corresponds to first identification information, and the catheter prediction box corresponds to second identification information.
[0110] The data processing module 830 is used to determine the positional relationship between the microguidewire and the catheter based on the reference DSA image, the first identification information, and the second identification information.
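How the three modules of device 800 fit together can be sketched as below. The class and function names are hypothetical, the acquisition and recognition modules are passed in as plain callables, and the containment rule used to decide the positional relationship is only one possible rule assumed for illustration; this passage does not specify how the relationship is computed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


@dataclass
class Detection:
    box: Box
    label: str  # identification information: "microguidewire" or "catheter"


def positional_relationship(detections: List[Detection]) -> str:
    """Hypothetical rule: the wire has extended if its box leaves the catheter box."""
    wire = next((d for d in detections if d.label == "microguidewire"), None)
    cath = next((d for d in detections if d.label == "catheter"), None)
    if wire is None or cath is None:
        return "undetermined"
    inside = (
        wire.box[0] >= cath.box[0] and wire.box[1] >= cath.box[1]
        and wire.box[2] <= cath.box[2] and wire.box[3] <= cath.box[3]
    )
    return "inside catheter" if inside else "extended beyond catheter"


class TargetDetectionDevice:
    """Sketch of device 800 with modules 810, 820 and 830."""

    def __init__(self, acquire: Callable[[], object],
                 recognize: Callable[[object], List[Detection]]):
        self.acquire = acquire      # image acquisition module 810
        self.recognize = recognize  # target recognition module 820

    def run(self) -> str:
        image = self.acquire()
        detections = self.recognize(image)
        return positional_relationship(detections)  # data processing module 830
```

Passing the modules in as callables keeps the sketch independent of any particular DSA acquisition interface or model implementation.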
[0111] As shown in Figure 9, which is a structural diagram of an electronic device provided in one embodiment of this application, the present invention also provides an electronic device 900, comprising:
[0112] The processor 910 can be implemented using a general-purpose central processing unit (CPU), microprocessor, application specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this application.
[0113] The memory 920 can be implemented as a read-only memory (ROM), static storage device, dynamic storage device, or random access memory (RAM). The memory 920 can store the operating system and other application programs. When the technical solutions provided in the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 920 and is called by the processor 910 to execute the DSA image-based target detection method of the embodiments of this application.
[0114] The input / output interface 930 is used to implement information input and output;
[0115] The communication interface 940 is used to enable communication and interaction between this device and other devices. Communication can be achieved through wired means (such as USB, network cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).
[0116] Bus 950 transmits information between various components of the device (e.g., processor 910, memory 920, input / output interface 930, and communication interface 940);
[0117] The processor 910, memory 920, input / output interface 930 and communication interface 940 are connected to each other within the device via bus 950.
[0118] This application embodiment also provides a storage medium, which is a computer-readable storage medium, storing a computer program that, when executed by a processor, implements the above-described target detection method based on DSA images.
[0119] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory may optionally include memory remotely located relative to the processor, and these remote memories can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof. The device embodiments described above are merely illustrative, and the units described as separate components may or may not be physically separate, and may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.
[0120] It will be understood by those skilled in the art that all or some of the steps and systems in the methods disclosed above can be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components can be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software can be distributed on a computer-readable medium, which can include computer storage media (or non-transitory media) and communication media (or transient media). As is known to those skilled in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cartridges, magnetic tape, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and is accessible to a computer. Furthermore, as is known to those skilled in the art, communication media typically include computer-readable instructions, data structures, program modules, or other data in modulated data signals such as carrier waves or other transmission mechanisms, and may include any information delivery medium.
[0121] The above provides a detailed description of the preferred embodiments of the present invention. However, the present invention is not limited to the above embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention. All such equivalent modifications or substitutions are included within the scope defined by the claims of the present invention.
Claims
1. A method for target detection based on DSA images, characterized in that it includes: acquiring digital subtraction angiography (DSA) images; obtaining a pre-trained target recognition model, and inputting the DSA image into the target recognition model to obtain a target recognition result, wherein the target recognition result includes a reference DSA image, which carries a microguidewire prediction box and a catheter prediction box; the microguidewire prediction box corresponds to first identification information, and the catheter prediction box corresponds to second identification information; the target recognition model includes a Backbone module, a Transformer module, a Neck module, and a Detect module; the Transformer module includes a Flatten layer, a first Permute layer, a second Permute layer, a Linear layer, and a Transformer layer; the Neck module includes an upsampling layer, an SPPF module, a second residual connection module, a concat layer, and a third convolutional layer; the Neck module adopts an FPN+PAN structure; and determining the positional relationship between the microguidewire and the catheter based on the reference DSA image, the first identification information, and the second identification information.
The process of inputting the DSA image into the target recognition model to obtain the target recognition result includes: inputting the DSA image into the Backbone module for feature extraction to obtain a first feature image, a second feature image, and a third feature image, wherein the first feature image, the second feature image, and the third feature image have different image dimensions; performing downsampling processing on the first feature image, the second feature image, and the third feature image respectively, and concatenating the downsampled first feature image, the second feature image, and the third feature image to obtain a fourth feature image; inputting the fourth feature image into the Transformer module for convolutional compression processing to obtain a fifth feature image; concatenating the first feature image and the fifth feature image to obtain a sixth feature image; concatenating the third feature image and the fifth feature image to obtain a seventh feature image; inputting the fifth feature image, the sixth feature image, and the seventh feature image into the Neck module to obtain three eighth feature images with prediction boxes at different scales; and inputting the eighth feature image with prediction boxes into the Detect module to obtain the target recognition result. 
The step of inputting the fourth feature image into the Transformer module for convolutional compression to obtain the fifth feature image includes: inputting the fourth feature image into the Flatten layer for dimensionality reduction to obtain a second intermediate image after dimensionality reduction; inputting the second intermediate image into the first Permute layer for image dimension rearrangement to obtain a third intermediate image; adding the feature vector obtained after processing the third intermediate image through the Linear layer to the third intermediate image to obtain a fourth intermediate image; and inputting the fourth intermediate image into the second Permute layer and the Transformer layer connected in sequence to output the fifth feature image.
2. The target detection method based on DSA images according to claim 1, characterized in that, The Backbone module includes a first convolutional layer, a second convolutional layer, and a first residual connection module. The step of inputting the DSA image into the Backbone module for feature extraction to obtain a first feature image, a second feature image, and a third feature image includes: The DSA image is input into the first convolutional layer, the second convolutional layer, and the first residual connection module connected in sequence, and the first intermediate image is output. The first intermediate image is input into the second convolutional layer and the first residual connection module connected in sequence, and the first feature image is output. The first feature image is input into the first convolutional layer and the first residual connection module connected in sequence, and the second feature image is output. The second feature image is input into the first convolutional layer, and the third feature image is output.
3. The target detection method based on DSA images according to claim 1, characterized in that, The eighth feature image includes a first-scale image, a second-scale image, and a third-scale image. The fifth, sixth, and seventh feature images are input into the Neck module to obtain three eighth feature images with prediction boxes at different scales, including: The seventh feature image is input into the third convolutional layer and the upsampling layer connected in sequence to obtain the fourth intermediate image; The fourth intermediate image and the fifth feature image are stitched together, and the stitched feature image is input into the second residual connection module and the SPPF module connected in sequence to obtain the fifth intermediate image; The fifth intermediate image is input to the upsampling layer, and the upsampled feature image and the sixth feature image are stitched together. The stitched feature image is then input to the second residual connection module, and the first scale image is output. The first-scale image and the fifth intermediate image, which have passed through the third convolutional layer, are stitched together to obtain the sixth intermediate image. The sixth intermediate image is passed through the second residual connection module to output the second scale image; The second-scale image is input into the third convolutional layer, the concat layer, and the second residual connection module, which are connected in sequence, and the third-scale image is output.
4. The target detection method based on DSA images according to claim 1, characterized in that, The Detect module includes a reshape layer. The step of inputting the eighth feature image carrying the predicted bounding box into the Detect module to obtain the target recognition result includes: The eighth feature image carrying the predicted bounding box and the prior bounding box information are input into the reshape layer, and the eighth feature image carrying the candidate predicted bounding box is output. The prior bounding box information is obtained by clustering the predicted bounding boxes in the preset dataset. The target prediction box is determined from the candidate prediction box using a non-maximum suppression algorithm to obtain the target recognition result. The target recognition result includes the reference DSA image carrying the target prediction box, and the target prediction box is the prediction box in the candidate prediction box that does not meet the window overlap condition.
5. The target detection method based on DSA images according to claim 1, characterized in that, before inputting the DSA image into the target recognition model to obtain the target recognition result, the method further includes: obtaining preset image preprocessing rules; and performing image preprocessing on the DSA image according to the image preprocessing rules.
6. A target detection device based on DSA images, characterized in that it includes: an image acquisition module, used to acquire digital subtraction angiography (DSA) images; a target recognition module, used to acquire a pre-trained target recognition model, input the DSA image into the target recognition model, and obtain the target recognition result, wherein the target recognition result includes a reference DSA image, which carries a microguidewire prediction box and a catheter prediction box; the microguidewire prediction box corresponds to first identification information, and the catheter prediction box corresponds to second identification information; the target recognition model includes a Backbone module, a Transformer module, a Neck module, and a Detect module; the Transformer module includes a Flatten layer, a first Permute layer, a second Permute layer, a Linear layer, and a Transformer layer; the Neck module includes an upsampling layer, an SPPF module, a second residual connection module, a concat layer, and a third convolutional layer; the Neck module adopts an FPN+PAN structure; and a data processing module, used to determine the positional relationship between the microguidewire and the catheter based on the reference DSA image, the first identification information, and the second identification information.
The process of inputting the DSA image into the target recognition model to obtain the target recognition result includes: inputting the DSA image into the Backbone module for feature extraction to obtain a first feature image, a second feature image, and a third feature image, wherein the first feature image, the second feature image, and the third feature image have different image dimensions; performing downsampling processing on the first feature image, the second feature image, and the third feature image respectively, and concatenating the downsampled first feature image, the second feature image, and the third feature image to obtain a fourth feature image; inputting the fourth feature image into the Transformer module for convolutional compression processing to obtain a fifth feature image; concatenating the first feature image and the fifth feature image to obtain a sixth feature image; concatenating the third feature image and the fifth feature image to obtain a seventh feature image; inputting the fifth feature image, the sixth feature image, and the seventh feature image into the Neck module to obtain three eighth feature images with prediction boxes at different scales; and inputting the eighth feature image with prediction boxes into the Detect module to obtain the target recognition result. 
The step of inputting the fourth feature image into the Transformer module for convolutional compression to obtain the fifth feature image includes: inputting the fourth feature image into the Flatten layer for dimensionality reduction to obtain a second intermediate image after dimensionality reduction; inputting the second intermediate image into the first Permute layer for image dimension rearrangement to obtain a third intermediate image; adding the feature vector obtained after processing the third intermediate image through the Linear layer to the third intermediate image to obtain a fourth intermediate image; and inputting the fourth intermediate image into the second Permute layer and the Transformer layer connected in sequence to output the fifth feature image.
7. An electronic device, characterized in that it includes at least one control processor and a memory communicatively connected to the at least one control processor; the memory stores instructions executable by the at least one control processor, which, when executed by the at least one control processor, enable the at least one control processor to perform the DSA image-based target detection method as described in any one of claims 1 to 5.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions for causing a computer to perform the DSA image-based target detection method as described in any one of claims 1 to 5.