A medical image detection method, system, storage medium and electronic device

By combining a selective search algorithm and a multi-scale feature map extraction network with an image attention module, candidate boxes for medical images are automatically extracted and features are fused, solving the problem of doctor annotation dependence in existing technologies and achieving efficient and accurate unsupervised medical image detection.

CN117422671B · Active · Publication Date: 2026-06-26 · ZHEJIANG UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: ZHEJIANG UNIV
Filing Date: 2023-10-08
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing medical image detection methods rely on manual annotation by doctors, resulting in high training costs and detection accuracy depending on the precision of the doctor's annotations, making it impossible to achieve efficient detection without supervision.

Method used

By employing a selective search algorithm and a multi-scale feature map extraction network, combined with an image attention module and a multi-instance detection network, and using a pre-trained model to provide supervision signals, candidate boxes are automatically extracted and features are fused, thereby reducing training costs and improving detection accuracy.

Benefits of technology

It enables medical image detection without manual annotation by doctors, reduces training costs, improves feature extraction capabilities and detection accuracy, solves the focusing problem in traditional methods, and provides more global lesion feature detection.

✦ Generated by Eureka AI based on patent content.


Abstract

The application provides a medical image detection method, system, storage medium and electronic device. The method comprises the following steps: acquiring a medical image to be detected; inputting the medical image to be detected into a network detection model, and extracting candidate boxes of different levels from the medical image through a candidate box extractor based on a selective search algorithm; meanwhile, extracting feature maps from the medical image through a pre-trained model and obtaining the feature maps corresponding to the candidate boxes; scaling these feature maps to the same size through a pyramid pooling module and sending them into an image attention module, which extracts global information through information fusion, to obtain multi-scale feature maps; and obtaining a corresponding image detection result through a multi-instance detection network and a multi-level classification network. Through structural and module modifications of the network, the application sharpens the focus of the network, enlarges its receptive field, and reduces the focusing problem found in comparable methods. Meanwhile, a general-purpose large model is used as the encoder and fine-tuned, which effectively improves the feature extraction capability of the network, reduces the training cost, improves the generalization ability of the network, and improves the precision of medical image detection.

Description

Technical Field

[0001] This invention belongs to the field of computer vision technology, and specifically relates to a medical image detection method, system, storage medium, and electronic device.

Background Technology

[0002] Medical image detection is the process of locating lesions within a medical image based on certain similarity features (such as brightness, color, texture, area, shape, location, local statistical features, or spectral features). A common method involves defining the smallest bounding rectangle that completely encloses the lesion. Current medical image detection methods largely rely on manual annotations by doctors to provide supervision, resulting in significant investment of human and material resources in training the detection model, and the detection accuracy depends heavily on the precision of the doctor's annotations.

Summary of the Invention

[0003] The purpose of this invention is to provide a medical image detection method, system, storage medium, and electronic device that do not require manual annotation by doctors and that can still provide effective supervision signals for training the image detection model, thereby reducing doctors' working time.

[0004] In a first aspect, the present invention relates to a medical image detection method, characterized by comprising the following steps:

[0005] S1. Obtain medical images of the lung nodules to be detected;

[0006] S2. Input the medical image to be detected into the feature extraction module to obtain candidate boxes and feature maps, including:

[0007] S21. Construct a candidate box extractor to extract candidate boxes at different levels from the medical image to be detected; the candidate box extractor uses a selective search algorithm to obtain multiple target candidate boxes from the medical image to be detected;

[0008] S22. Construct a multi-scale feature map extraction network to extract multi-scale feature maps from the medical image to be detected; first, extract feature maps from the medical image through a pre-trained model; after extracting the feature maps of the corresponding candidate boxes through mapping, feed them into a pyramid pooling module to scale feature maps of different sizes to the same size and feed them into an image attention module to perform large-scale information fusion to extract global information; thus obtaining multi-scale feature maps;

[0009] S3. Multi-scale feature maps are passed through a multi-instance detection network for target detection, and the output feature vector is used as a supervision signal for downstream multi-level classification network training to adjust and optimize network parameters, thereby enhancing the network's feature extraction and detection capabilities. The output vectors of the multi-instance detection network are summed to obtain the candidate box scores, which are used as the detection results.

[0010] S4. Based on the threshold setting, display the candidate bounding boxes of targets that exceed the threshold score as the final target detection result. Each candidate bounding box is a rectangle that precisely defines the location of the lesion.

[0011] Furthermore, in the selective search algorithm described in step S21, the similarity calculation selects the color similarity in the grayscale space as a benchmark to merge similar candidate boxes, and finally generates multiple sparse target candidate boxes.
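The merging criterion described above can be illustrated with a minimal sketch. It assumes that "color similarity in the grayscale space" is measured by histogram intersection over gray-level histograms, as in the usual selective search formulation; the patent does not give the formula. The names `Region`, `BINS`, and `propose_boxes` are hypothetical, and the adjacency constraint of a full selective search (only neighboring regions are merged) is omitted for brevity.

```python
from dataclasses import dataclass

BINS = 8  # assumed number of gray-level histogram bins


@dataclass
class Region:
    box: tuple   # (x0, y0, x1, y1) in pixel coordinates
    size: int    # number of pixels in the region
    hist: list   # L1-normalised gray-level histogram


def gray_histogram(pixels, bins=BINS):
    """L1-normalised histogram of 8-bit gray values."""
    hist = [0.0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1.0
    total = float(len(pixels))
    return [h / total for h in hist]


def color_similarity(a, b):
    """Histogram intersection in [0, 1]; 1 means identical distributions."""
    return sum(min(x, y) for x, y in zip(a.hist, b.hist))


def merge(a, b):
    """Merge two regions: union bounding box, size-weighted histogram."""
    size = a.size + b.size
    hist = [(a.size * x + b.size * y) / size for x, y in zip(a.hist, b.hist)]
    box = (min(a.box[0], b.box[0]), min(a.box[1], b.box[1]),
           max(a.box[2], b.box[2]), max(a.box[3], b.box[3]))
    return Region(box, size, hist)


def propose_boxes(regions):
    """Greedily merge the most similar pair; every merged box is a candidate.

    Simplification: any two regions may merge, not only adjacent ones.
    """
    regions = list(regions)
    proposals = [r.box for r in regions]
    while len(regions) > 1:
        pairs = [(color_similarity(regions[i], regions[j]), i, j)
                 for i in range(len(regions))
                 for j in range(i + 1, len(regions))]
        _, i, j = max(pairs)
        merged = merge(regions[i], regions[j])
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged.box)
    return proposals
```

Each merge contributes one box at a coarser level, which is one plausible reading of "candidate boxes at different levels".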

[0012] Furthermore, in step S22, the pre-trained model uses a pre-trained deep learning image encoder to extract feature maps.

[0013] Furthermore, the image attention module described in step S2 divides the image into 16 equal parts and unfolds each part into a vector in sequence, which is then fed into the standard attention module as a feature vector for subsequent use. This image attention module can improve the information fusion ability between different features and reduce focusing problems.

[0014] Furthermore, the multi-instance detection network consists of two independent fully connected layers, each followed by a softmax classifier. The feature vectors are passed through the two fully connected layers and their softmax classifiers respectively to obtain two related vectors, which are then subjected to a dot product operation to serve as the output vector of the multi-instance detection network.

[0015] Furthermore, each level of the multi-level classification network consists of a fully connected layer and a softmax classifier, with the multi-scale feature map used as input. The supervision signal for the first level comes from the output of the dot product operation of the multi-instance detection network, and the supervision signal for each subsequent level comes from the output of the previous level. This multi-level classification network can enhance the feature extraction capability of the learning network.

[0016] In a second aspect, the present invention provides a medical image detection system, comprising an image acquisition module, an image detection module, and a result display module. The image acquisition module acquires a medical image to be detected. The image detection module inputs the medical image to be detected into a network-based detection model, and extracts candidate boxes at different levels from the medical image using a candidate box extractor based on a selective search algorithm. Simultaneously, the medical image is processed by a pre-trained model to extract feature maps. After extracting the feature maps of the corresponding candidate boxes, they are fed into a pyramid pooling module to scale them to the same size and then into an image attention module for information fusion to extract global information, resulting in multi-scale feature maps. Finally, the corresponding image detection results are obtained through a multi-instance detection network and a multi-level classification network. The result display module displays the image detection results.

[0017] In a third aspect, the present invention provides an electronic device comprising: a memory storing a computer program; and a processor communicatively connected to the memory, which implements the method described above when the computer program is invoked.

[0018] In a fourth aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, which, when executed by an electronic device, implements the method described above.

[0019] As described above, the medical image detection method, system, storage medium, and electronic device of the present invention have the following beneficial effects:

[0020] The advantages of this invention are: by using a general-purpose large model as a pre-trained model, the training time cost is reduced while achieving strong feature extraction capabilities. Simultaneously, an image attention module is used to fuse candidate box features, improving the global information fusion capability and enhancing the model's receptive field. Furthermore, a two-stage detection model is used to enhance the model's detection ability. This invention effectively compensates for the shortcomings of some unsupervised detection networks that have excellent overall performance but are prone to focusing problems, while reducing training costs and providing a medical image detection method that does not require annotation.

Attached Figure Description

[0021] Figure 1 is a flowchart of the medical image detection method according to an embodiment of the present invention.

[0022] Figure 2 shows the architecture of the image detection network in the medical image detection method according to an embodiment of the present invention.

[0023] Figure 3 is a schematic diagram of the image attention module in the medical image detection method according to an embodiment of the present invention.

[0024] Figure 4 is a schematic diagram of the principle structure of the medical image detection system according to an embodiment of the present invention.

[0025] Figure 5 is a schematic diagram of the principle structure of the electronic device according to an embodiment of the present invention.

[0026] Part Numbering Explanation

[0027] 100 Medical Image Detection System

[0028] 110 Image Acquisition Module

[0029] 120 Image Detection Module

[0030] 130 Result Display Module

[0031] 101 Electronic Device

[0032] 1001 Processor

[0033] 1002 Memory

[0034] S1 to S4 Steps

Detailed Implementation

[0035] The following specific examples illustrate the implementation of the present invention. Those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments, and various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of the present invention. It should be noted that, unless otherwise specified, the following embodiments and features described therein can be combined with each other.

[0036] It should be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of the present invention. Therefore, the drawings only show the components related to the present invention and are not drawn according to the actual number, shape and size of the components in the actual implementation. In the actual implementation, the form, quantity and proportion of each component can be arbitrarily changed, and the layout of the components may also be more complex.

[0037] The following embodiments of the present invention provide a medical image detection method, system, storage medium, and electronic device, which compensate for the shortcomings of detection networks that perform well overall but are prone to focusing problems, while reducing training costs and providing a medical image detection method that does not require annotation.

[0038] This embodiment provides a medical image detection method. Figure 1 is a schematic flowchart of the medical image detection method described in this embodiment. As shown in Figure 1, the medical image detection method of this embodiment includes the following steps:

[0039] S1. Obtain medical images of the lung nodules to be detected;

[0040] S2. Input the medical image to be detected into the feature extraction module to obtain candidate boxes and feature maps, including:

[0041] S21. Construct a candidate box extractor to extract candidate boxes at different levels from the medical image to be detected; the candidate box extractor uses a selective search algorithm to obtain multiple target candidate boxes from the medical image to be detected;

[0042] S22. Construct a multi-scale feature map extraction network to extract multi-scale feature maps from the medical image to be detected; first, extract feature maps from the medical image through a pre-trained model; after extracting the feature maps of the corresponding candidate boxes through mapping, feed them into a pyramid pooling module to scale feature maps of different sizes to the same size and feed them into an image attention module to perform large-scale information fusion to extract global information; thus obtaining multi-scale feature maps;

[0043] S3. Multi-scale feature maps are passed through a multi-instance detection network for target detection, and the output feature vector is used as a supervision signal for downstream multi-level classification network training to adjust and optimize network parameters, thereby enhancing the network's feature extraction and detection capabilities. The output vectors of the multi-instance detection network are summed to obtain the candidate box scores, which are used as the detection results.

[0044] S4. Based on the threshold setting, display the candidate bounding boxes of targets that exceed the threshold score as the final target detection result. Each candidate bounding box is a rectangle that precisely defines the location of the lesion.

[0045] The following provides a detailed description of steps S1 to S4 in the medical image detection method of this embodiment.

[0046] S1. Obtain medical images of the lung nodules to be detected;

[0047] S2. Input the medical image to be detected into the feature extraction module to obtain candidate boxes and feature maps, including:

[0048] S21. Construct a candidate box extractor to extract candidate boxes at different levels from the medical image to be detected; the candidate box extractor uses a selective search algorithm to obtain multiple target candidate boxes from the medical image to be detected;

[0049] S22. Construct a multi-scale feature map extraction network to extract multi-scale feature maps from the medical image to be detected; first, extract feature maps from the medical image through a pre-trained model; after extracting the feature maps of the corresponding candidate boxes through mapping, feed them into a pyramid pooling module to scale feature maps of different sizes to the same size and feed them into an image attention module to perform large-scale information fusion to extract global information; thus obtaining multi-scale feature maps;

[0050] S3. The multi-scale feature map is passed through a multi-instance detection network for target detection, and the output feature vector is used as a supervision signal for training the downstream multi-level classification network. This allows for the adjustment and optimization of network parameters, enhancing the network's feature extraction and detection capabilities. The summation of the output vectors of the multi-instance detection network yields the candidate box scores, which are used as the detection results. In this embodiment, the candidate box extractor employs a selective search algorithm for candidate box extraction. The similarity calculation uses color similarity in the grayscale space as the criterion for merging similar candidate boxes, generating multiple sparse target candidate boxes.

[0051] In this embodiment, the pre-trained model can use a publicly available deep learning image encoder for feature extraction.

[0052] In this embodiment, the corresponding feature map is obtained through the target candidate box, and a feature map of the same size is obtained through the pyramid pooling module.
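As a rough illustration of how a candidate box can be mapped onto the feature map and pooled to a fixed size, the sketch below uses a single-channel feature map stored as a nested list. The encoder stride, the pooling levels `(1, 2, 4)`, and the use of max pooling are assumptions made for the example, and the function names are hypothetical. Boxes are assumed to lie inside the image.

```python
import math


def map_box_to_features(box, stride):
    """Project an image-space box (x0, y0, x1, y1) onto the feature grid,
    keeping at least one feature cell in each direction."""
    x0, y0, x1, y1 = box
    fx0, fy0 = x0 // stride, y0 // stride
    return (fx0, fy0,
            max(math.ceil(x1 / stride), fx0 + 1),
            max(math.ceil(y1 / stride), fy0 + 1))


def adaptive_max_pool(fmap, out_size):
    """Max-pool a 2-D list of any size to an out_size x out_size grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(out_size):
        r0, r1 = (i * h) // out_size, math.ceil((i + 1) * h / out_size)
        row = []
        for j in range(out_size):
            c0, c1 = (j * w) // out_size, math.ceil((j + 1) * w / out_size)
            row.append(max(fmap[r][c]
                           for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


def pyramid_pool(fmap, box, stride, levels=(1, 2, 4)):
    """Crop the box from the feature map and pool it at several grid sizes.

    The output length is sum(n * n for n in levels), whatever the box size.
    """
    x0, y0, x1, y1 = map_box_to_features(box, stride)
    crop = [row[x0:x1] for row in fmap[y0:y1]]
    out = []
    for n in levels:
        for row in adaptive_max_pool(crop, n):
            out.extend(row)
    return out
```

Because the output length depends only on `levels`, boxes of very different sizes yield vectors of the same length and can be processed together by the later modules.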

[0053] In this embodiment, the image attention module divides the image into 16 equal parts and unfolds each part into a vector, which is then fed into the attention module as a feature vector for subsequent use. This attention module can improve the information fusion capability between different features and reduce focusing problems.

[0054] In this embodiment, the multi-instance detection network consists of two identical links. Each link comprises a fully connected layer and a softmax classifier, used to extract features with different feature vectors. The feature vectors are fused through the dot product of the outputs of these two links to obtain a feature vector that serves as a supervision signal for subsequent multi-level classification networks and also as a subsequent output.
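The two-link structure can be sketched as follows. The patent does not state along which axis each softmax is taken; the sketch assumes one link normalises over classes for each box and the other over boxes for each class, a common choice in weakly supervised detection, and it reads the "dot product" as an element-wise product. All names and weight shapes are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weight, bias):
    """Fully connected layer; weight has shape [out][in]."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def multi_instance_scores(features, w_cls, b_cls, w_det, b_det):
    """features: one vector per candidate box. Returns scores[box][class]."""
    cls_logits = [linear(f, w_cls, b_cls) for f in features]
    det_logits = [linear(f, w_det, b_det) for f in features]
    n_cls = len(b_cls)
    # Link 1 (assumed): softmax over classes, separately for each box.
    cls_prob = [softmax(row) for row in cls_logits]
    # Link 2 (assumed): softmax over boxes, separately for each class.
    det_prob = [softmax([row[c] for row in det_logits]) for c in range(n_cls)]
    # Element-wise product of the two links.
    return [[cls_prob[b][c] * det_prob[c][b] for c in range(n_cls)]
            for b in range(len(features))]


def image_level_scores(scores):
    """Sum the per-box scores over boxes; each entry lies in [0, 1]."""
    return [sum(row[c] for row in scores) for c in range(len(scores[0]))]
```

Under these assumptions the per-class sum over boxes is bounded by 1, so it can be compared with an image-level label without any box annotation.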

[0055] In this embodiment, the multi-level classification network is used to enhance the training of the attention module. The feature vector output by the attention module serves as the input vector for each level, passes through a fully connected layer and a softmax classifier to obtain the output signal, and is trained using the output vector of the previous level as the supervision signal. Specifically, the supervision signal for the first level comes from the dot product fusion result of the multi-instance detection network.
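The chained supervision between levels might look like the sketch below. It assumes that each level's target is the previous level's per-box class distribution, treated as a fixed soft label and compared with cross-entropy; the patent does not specify the loss or how the supervision signal is converted into a target, so these choices and all names are illustrative only.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weight, bias):
    """Fully connected layer; weight has shape [out][in]."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def normalise(row):
    total = sum(row)
    return [v / total for v in row]


def cross_entropy(target, pred, eps=1e-12):
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))


def multi_level_losses(features, first_supervision, levels):
    """levels: list of (weight, bias) pairs, one per classification level.

    Returns the mean cross-entropy of each level. Level 0 is supervised by
    first_supervision (the multi-instance output, one row per box); each
    later level is supervised by the output of the level before it.
    """
    targets = [normalise(row) for row in first_supervision]
    losses = []
    for weight, bias in levels:
        preds = [softmax(linear(f, weight, bias)) for f in features]
        total = sum(cross_entropy(t, p) for t, p in zip(targets, preds))
        losses.append(total / len(preds))
        targets = preds  # this level's output supervises the next level
    return losses
```

In training, the per-level losses would typically be summed with the multi-instance loss and minimised jointly; that weighting is likewise not specified in the source.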

[0056] This embodiment uses a pre-trained model to extract features, leveraging the excellent feature extraction capabilities of a pre-trained large model. This overcomes the problem of insufficient network generalization ability in scenarios with limited medical data, and improves the network's representation ability.

[0057] To obtain a better model, an image attention module is used to fuse candidate box features, increasing the scale of the receptive field and enabling more global fusion of image features, thus solving the focusing problem in traditional unsupervised methods. Simultaneously, a multi-instance detection network and a multi-level classification reinforcement network are used to repeatedly enhance the model parameters, improving the model's accuracy.

[0058] Specifically, traditional convolutional networks can only focus on information within the kernel size range and cannot establish information fusion for features across large scales. By employing an image attention module, which divides the candidate bounding box image into 16 equal parts and unfolds them into vectors, and then feeds them into the attention module as shown in the figure, the receptive field of the model can be effectively expanded, fusing global image information. This prevents the model from limiting detection boxes to obvious features of lesions, allowing it to focus more on the global features of the lesions, thus generating more complete feature boxes for the lesions.
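A minimal sketch of this idea is given below. It assumes the 16 equal parts form a 4 x 4 grid and uses plain scaled dot-product self-attention with the patch vectors serving directly as queries, keys, and values; a trained module would normally apply learned projections first. The function names are hypothetical.

```python
import math


def split_into_16(image):
    """Split an H x W image (H and W divisible by 4) into a 4 x 4 grid of
    patches and flatten each patch row-major into a vector."""
    h, w = len(image), len(image[0])
    ph, pw = h // 4, w // 4
    patches = []
    for gi in range(4):
        for gj in range(4):
            patches.append([image[gi * ph + r][gj * pw + c]
                            for r in range(ph) for c in range(pw)])
    return patches


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    Learned linear projections are omitted to keep the sketch short.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        out.append([sum(wgt * v[i] for wgt, v in zip(weights, tokens))
                    for i in range(d)])
    return out
```

Every output vector is a weighted mix of all 16 patch vectors, so each position can draw on the whole candidate box rather than only on a kernel-sized neighbourhood.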

[0059] S4. Based on the threshold setting, display the candidate bounding boxes of targets exceeding the threshold score as the final target detection result. Each candidate bounding box is a rectangle that precisely defines the location of the lesion. Specifically, the medical image and the detection bounding box are displayed on the screen.
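The thresholding in step S4 amounts to a simple filter, sketched here. The threshold value of 0.5 in the usage example is purely illustrative, since the patent does not give one, and `select_detections` is a hypothetical name.

```python
def select_detections(boxes, scores, threshold):
    """Keep boxes whose score strictly exceeds the threshold, best first."""
    kept = [(s, b) for b, s in zip(boxes, scores) if s > threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [(b, s) for s, b in kept]


# Usage with made-up boxes (x0, y0, x1, y1) and scores.
detections = select_detections(
    [(10, 10, 40, 40), (5, 60, 30, 90), (50, 50, 80, 80)],
    [0.35, 0.91, 0.62],
    threshold=0.5,
)
```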

[0060] Therefore, the medical image detection method in this embodiment inputs the medical image to be detected into a network-based detection model, and extracts candidate boxes at different levels from the medical image through a candidate box extractor of a selective search algorithm. Simultaneously, the medical image is processed by a pre-trained model to extract feature maps; after extracting the feature maps of the corresponding candidate boxes, they are fed into a pyramid pooling module to scale them to the same size and then fed into an image attention module for information fusion to extract global information; multi-scale feature maps are obtained; and the corresponding image detection results are obtained through a multi-instance detection network and a multi-level classification network.

[0061] The scope of protection of the medical image detection method described in this embodiment is not limited to the execution order of the steps listed in this embodiment. Any solution implemented by adding, subtracting, or replacing steps in the prior art based on the principle of this invention is included within the scope of protection of this invention.

[0062] This invention also provides a medical image detection system 100, which can implement the medical image detection method described in this invention. However, the implementation system of the medical image detection method described in this invention includes, but is not limited to, the structure of the medical image detection system 100 listed in this embodiment. All structural modifications and substitutions of the prior art made in accordance with the principles of this invention are included within the protection scope of this invention.

[0063] Figure 4 illustrates the principle structure of the medical image detection system 100 according to an embodiment of the present invention. As shown in Figure 4, the system provided in this embodiment includes an image acquisition module 110, an image detection module 120, and a result display module 130.

[0064] The image acquisition module 110 is used to acquire the medical image to be detected; the image detection module 120 is used to input the medical image to be detected into a network-based detection model, and extract candidate boxes at different levels from the medical image to be detected through a candidate box extractor based on a selective search algorithm. Simultaneously, the medical image is processed by a pre-trained model to extract feature maps; after extracting the feature maps of the corresponding candidate boxes, they are fed into a pyramid pooling module to scale them to the same size and then fed into an image attention module for information fusion to extract global information; multi-scale feature maps are obtained; and corresponding image detection results are obtained through a multi-instance detection network and a multi-level classification network; the result display module 130 is used to display the image detection results.
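The division into three modules can be pictured as a small composition of callables, as in the sketch below. The class name, field names, and type aliases are hypothetical, and the sketch fixes only the order in which the modules are invoked, not their internals.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]
Detection = Tuple[Box, float]


@dataclass
class MedicalImageDetectionSystem:
    acquire: Callable[[], list]                        # module 110
    detect: Callable[[list], List[Detection]]          # module 120
    display: Callable[[list, List[Detection]], None]   # module 130

    def run(self) -> List[Detection]:
        image = self.acquire()            # obtain the image to be detected
        detections = self.detect(image)   # run the detection model
        self.display(image, detections)   # show the image with its boxes
        return detections
```

Keeping the modules as interchangeable callables mirrors the statement that the functional division is only an example and may be rearranged as needed.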

[0065] In this embodiment, the candidate box extractor is based on a selective search algorithm, wherein the similarity calculation uses color similarity in the grayscale space as the criterion for generating multiple sparse target candidate boxes.

[0066] In this embodiment, the image attention module divides the image into 16 equal parts and unfolds each part into a vector in sequence, which is then fed into the attention module to obtain the feature vector for subsequent use.

[0067] In this embodiment, the pre-trained model can use an open-source pre-trained visual task encoder to extract features.

[0068] In this embodiment, the multi-instance detection network consists of two identical links. Each link comprises a fully connected layer and a softmax classifier, used to extract features with different feature vectors. The feature vectors are fused through the dot product of the outputs of these two links to obtain a feature vector that serves as a supervision signal for subsequent multi-level classification networks and also as a subsequent output.

[0069] In this embodiment, the multi-level classification network is used to enhance the training of the attention module. The feature vector output by the attention module serves as the input vector for each level, passes through a fully connected layer and a softmax classifier to obtain the output signal, and is trained using the output vector of the previous level as the supervision signal. Specifically, the supervision signal for the first level comes from the dot product fusion result of the multi-instance detection network.

[0070] This embodiment uses a pre-trained model to extract features, leveraging the excellent feature extraction capabilities of a pre-trained large model. This overcomes the problem of insufficient network generalization ability in scenarios with limited medical data, and improves the network's representation ability.

[0071] To obtain a better model, an image attention module is used to fuse candidate box features, increasing the scale of the receptive field and enabling more global fusion of image features, thus solving the focusing problem in traditional unsupervised methods. Simultaneously, a multi-instance detection network and a multi-level classification reinforcement network are used to repeatedly enhance the model parameters, improving the model's accuracy.

[0072] Specifically, traditional convolutional networks can only focus on information within the kernel-size range and cannot establish large-scale information fusion. By using an image attention module, which divides the candidate box image into 16 equal parts and unfolds them into vectors, and then feeds them into the attention module as shown in the figure, the receptive field of the model can be effectively expanded, and global image information can be fused.

[0073] Therefore, the medical image detection system 100 in this embodiment leverages a pre-trained large model to provide excellent feature extraction capabilities, overcoming the problem of insufficient network generalization ability in scenarios with limited medical data, and improving the network's representational ability. To obtain a better model, an image attention module is used to fuse candidate box features, increasing the scale of the receptive field and enabling more global fusion of image features. This solves the focusing problem in traditional unsupervised methods, preventing detection boxes from being limited to obvious features of lesions, and enabling more complete detection of the specific location of the entire lesion, providing more reliable evidence for doctors' diagnosis. Simultaneously, a multi-instance detection network and a multi-level classification reinforcement network are used to repeatedly strengthen the model parameters, improving the model's accuracy.

[0074] In this invention, the medical image detection system 100 can implement the medical image detection method described in this embodiment. Therefore, the specific implementation functions of each module of the medical image detection system 100 are described in detail in the medical image detection method, and will not be repeated here. However, the implementation system of the medical image detection method described in this invention includes, but is not limited to, the medical image detection system 100 listed in this embodiment. Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-mentioned division of functional units and modules is only used as an example. In actual applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the system / device can be divided into different functional units or modules to complete all or part of the functions described above.

[0075] In the embodiments provided by this invention, it should be understood that the disclosed systems, apparatuses, or methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For instance, the division of modules / units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple modules or units may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection of apparatuses or modules or units may be electrical, mechanical, or other forms.

[0076] The modules / units described as separate components may or may not be physically separate. The components shown as modules / units may or may not be physical modules; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules / units can be selected to achieve the objectives of the embodiments of the present invention, depending on actual needs. For example, the functional modules / units in the various embodiments of the present invention may be integrated into one processing module, or each module / unit may exist physically separately, or two or more modules / units may be integrated into one module / unit.

[0077] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0078] As shown in Figure 5, this embodiment of the invention provides an electronic device 101. The electronic device may be, for example, a computer including one or more processors 1001, one or more memories 1002, peripheral interfaces, RF circuits, audio circuits, speakers, microphones, input / output (I / O) subsystems, displays, other output or control devices, and external ports. The computer includes, but is not limited to, personal computers such as desktop computers, laptops, tablets, smartphones, smart TVs, and personal digital assistants (PDAs). In other embodiments, the electronic device may also be a server. The server may be deployed on one or more physical servers based on factors such as function and load, or it may consist of a distributed or centralized server cluster; this embodiment does not impose such limitations.

[0079] The electronic device 101 includes a processor 1001 and a memory 1002; the memory 1002 is used to store computer programs; the processor 1001 is used to execute the computer programs stored in the memory 1002, so that the electronic device 101 performs the steps of the medical image detection method as described in Embodiment 1. Since the specific implementation process of the steps of the medical image detection method has been described in detail in the embodiments, it will not be repeated here.

[0080] Processor 1001 is a Central Processing Unit (CPU). Memory 1002 is connected to processor 1001 via a system bus, over which the two communicate. Memory 1002 stores computer programs, and processor 1001 runs the computer programs to execute the aforementioned medical image detection method. Memory 1002 may include Random Access Memory (RAM) and may also include non-volatile memory, such as at least one disk drive.

[0081] This invention also provides a computer-readable storage medium. Those skilled in the art will understand that all or part of the steps in the methods of the above embodiments can be implemented by a program instructing a processor. The program can be stored in a computer-readable storage medium, which is a non-transitory medium, such as random access memory, read-only memory, flash memory, hard disk, solid-state drive, magnetic tape, floppy disk, optical disk, and any combination thereof. The storage medium can be any available medium accessible to a computer or a data storage device such as a server or data center that integrates one or more available media. This available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., digital video disc (DVD)), or a semiconductor medium (e.g., solid-state drive (SSD)).

[0082] Embodiments of the present invention may also provide a computer program product comprising one or more computer instructions. When the computer instructions are loaded and executed on a computing device, all or part of the processes or functions described in the embodiments of the present invention are generated. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, or data center to another via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.

[0083] When the computer program product is executed by a computer, the computer performs the method described in the foregoing method embodiments. The computer program product can be a software installation package; when the foregoing method is required, the computer program product can be downloaded and executed on the computer.

[0084] The descriptions of the processes or structures corresponding to the above figures each have their own emphasis. For parts of a process or structure that are not described in detail, please refer to the relevant descriptions of other processes or structures.

[0085] The above embodiments are merely illustrative of the principles and effects of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or alter the above embodiments without departing from the spirit and scope of the present invention. Therefore, all equivalent modifications or alterations made by those skilled in the art without departing from the spirit and technical concept disclosed in the present invention should still be covered by the claims of the present invention.

Claims

1. A medical image detection method, characterized in that it includes the following steps:

S1. Obtain medical images of the lung nodules to be detected;

S2. Input the medical image to be detected into the feature extraction module to obtain candidate boxes and feature maps, including:

S21. Construct a candidate box extractor to extract candidate boxes at different levels from the medical image to be detected; the candidate box extractor uses a selective search algorithm to obtain multiple target candidate boxes from the medical image to be detected;

S22. Construct a multi-scale feature map extraction network to extract multi-scale feature maps from the medical image to be detected; first, extract feature maps from the medical image through a pre-trained model; after extracting the feature maps of the corresponding candidate boxes through mapping, feed them into a pyramid pooling module to scale feature maps of different sizes to the same size, and then feed them into an image attention module for large-scale information fusion to extract global information, thus obtaining multi-scale feature maps; the image attention module divides the image into 16 equal parts and unfolds each part into a vector in sequence, which is then fed into a standard attention module as the feature vector for subsequent use. The image attention module improves the information fusion capability between different features and reduces the focusing problem;

S3. The multi-scale feature maps are passed through a multi-instance detection network for object detection. The output feature vector is used as a supervision signal for training a downstream multi-level classification network to adjust and optimize network parameters, enhancing the network's feature extraction and detection capabilities. The summation of the multi-instance detection network's output vectors yields the candidate box scores, which serve as the detection results. The multi-instance detection network consists of two independent fully connected layers and a softmax classifier. The feature vectors are passed through two different fully connected layers and softmax classifiers to obtain two related vectors, which are then subjected to a dot product operation to serve as the output vector of the multi-instance detection network. Each level of the multi-level classification network consists of one fully connected layer and one softmax classifier, with the multi-scale feature maps used as input. The supervision signal for the first level comes from the output of the multi-instance detection network's dot product operation, and the supervision signals for each subsequent level come from the output of the previous level. The feature extraction capability of the learning network is enhanced through the multi-level classification network;

S4. Based on the threshold setting, display the target candidate boxes that exceed the threshold score as the final target detection result; the candidate box is a rectangle that defines the specific location of the lesion.
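By way of illustration only, the two-branch scoring of step S3 and the thresholding of step S4 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the claims do not state over which axis each softmax is taken, so the sketch assumes one branch normalizes over classes and the other over candidate boxes, and it reads the "dot product operation" as an element-wise product. The names `midn_forward` and `select_boxes`, and all weights used with them, are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def linear(features, weights, bias):
    """Fully connected layer: features (R x D) times weights (D x C) plus bias (C)."""
    columns = list(zip(*weights))  # C columns, each of length D
    return [
        [sum(f * w for f, w in zip(row, col)) + b for col, b in zip(columns, bias)]
        for row in features
    ]


def midn_forward(features, w_cls, b_cls, w_det, b_det):
    """Return (box_scores, image_scores) for R candidate boxes and C classes.

    Assumption: the first branch applies softmax over classes for each box,
    the second applies softmax over boxes for each class.
    """
    cls_logits = linear(features, w_cls, b_cls)
    det_logits = linear(features, w_det, b_det)
    cls_probs = [softmax(row) for row in cls_logits]            # R x C
    det_by_class = [softmax(list(col)) for col in zip(*det_logits)]  # C x R
    det_probs = [list(row) for row in zip(*det_by_class)]       # R x C
    # Element-wise product of the two branches gives one score per box and class.
    box_scores = [
        [c * d for c, d in zip(c_row, d_row)]
        for c_row, d_row in zip(cls_probs, det_probs)
    ]
    # Summing over boxes gives an image-level score per class, each in [0, 1].
    image_scores = [sum(col) for col in zip(*box_scores)]
    return box_scores, image_scores


def select_boxes(box_scores, class_index, threshold):
    """Indices of candidate boxes whose score for one class exceeds the threshold."""
    return [i for i, row in enumerate(box_scores) if row[class_index] > threshold]
```

In this sketch the per-box scores are what a thresholding step would filter, while the summed scores could be compared against an image-level label during training; how the claimed summation maps onto these two quantities is not specified in the text, so both are returned.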

2. The method according to claim 1, characterized in that the selective search algorithm described in step S21 uses color similarity in grayscale space as the benchmark for similarity calculation to merge similar candidate boxes, and finally generates multiple sparse target candidate boxes.
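For illustration, the grayscale similarity criterion can be sketched as a histogram intersection, which is the color similarity measure commonly used in selective search. This is a sketch under assumptions rather than the claimed implementation: the 25-bin histogram, the 0 to 255 intensity range, and the function names `gray_histogram`, `gray_similarity`, and `merged_histogram` are all hypothetical choices not stated in the claims.

```python
def gray_histogram(pixels, bins=25):
    """L1-normalized intensity histogram of a region; pixels are ints in 0..255."""
    if not pixels:
        raise ValueError("region must contain at least one pixel")
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def gray_similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def merged_histogram(hist_a, size_a, hist_b, size_b):
    """Histogram of the union of two regions, weighted by region size in pixels."""
    total = size_a + size_b
    return [(size_a * a + size_b * b) / total for a, b in zip(hist_a, hist_b)]
```

A merging loop would repeatedly pick the most similar pair of neighboring regions, merge them, and record the bounding rectangle of each merged region as a candidate box. `merged_histogram` lets the histogram of the merged region be derived without revisiting its pixels.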

3. The method according to claim 1, characterized in that the pre-trained model described in step S22 is a pre-trained deep learning image encoder used to extract the feature maps.

4. A medical image detection system, characterized in that the system includes an image acquisition module, an image detection module, and a result display module;

The image acquisition module is used to acquire the medical image to be detected;

The image detection module is used to input the medical image to be detected into a network detection model. A candidate box extractor using a selective search algorithm extracts candidate boxes at different levels from the medical image to be detected. At the same time, the medical image is processed by a pre-trained model to extract feature maps. After the feature maps of the corresponding candidate boxes are extracted, they are fed into a pyramid pooling module to scale them to the same size and then fed into an image attention module for information fusion to extract global information, resulting in a multi-scale feature map. The image detection results are obtained through a multi-instance detection network and a multi-level classification network. The image attention module divides the image into 16 equal parts and unfolds each part into a vector, which is then fed into the standard attention module as the feature vector for subsequent use. The image attention module improves the information fusion ability between different features and reduces the focusing problem. The multi-instance detection network consists of two independent fully connected layers and a softmax classifier. The feature vectors are passed through two different fully connected layers and softmax classifiers to obtain two related vectors, which are then subjected to a dot product operation to serve as the output vector of the multi-instance detection network. Each level of the multi-level classification network consists of one fully connected layer and one softmax classifier, with the multi-scale feature map used as the input. The supervision signal of the first level comes from the output of the dot product operation of the multi-instance detection network, and the supervision signal of each subsequent level comes from the output of the previous level. The feature extraction capability of the learning network is enhanced through the multi-level classification network;

The result display module is used to display the image detection results.
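As an illustration of the image attention module described above, the following sketch splits a two-dimensional map into 16 equal parts, unfolds each part into a vector, and applies scaled dot-product self-attention across the 16 vectors. It rests on several assumptions not found in the claims: the 16 parts are taken as a 4 x 4 grid, each part is unfolded in row-major order, and the attention uses identity projections (queries, keys, and values all equal the input vectors) in place of the learned projections a trained module would have. The names `split_into_patches` and `self_attention` are hypothetical.

```python
import math


def split_into_patches(image, grid=4):
    """Split an H x W map (list of rows) into grid*grid equal parts, each flattened."""
    height, width = len(image), len(image[0])
    if height % grid or width % grid:
        raise ValueError("height and width must be divisible by the grid size")
    ph, pw = height // grid, width // grid
    patches = []
    for gy in range(grid):
        for gx in range(grid):
            patches.append([
                image[y][x]
                for y in range(gy * ph, (gy + 1) * ph)
                for x in range(gx * pw, (gx + 1) * pw)
            ])
    return patches


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (assumption)."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    fused = []
    for query in tokens:
        # Similarity of this part to every part, normalized into attention weights.
        weights = softmax(
            [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        )
        # Each output vector is a weighted mix of all 16 parts (global fusion).
        fused.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        )
    return fused
```

Because every output vector is a weighted combination of all 16 input vectors, each part receives information from the whole map, which is the sense in which the module is said to extract global information.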

5. An electronic device, characterized in that the electronic device includes: a memory that stores a computer program; and a processor, communicatively connected to the memory, which implements the method of any one of claims 1 to 4 when the computer program is invoked.

6. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by an electronic device, the program implements the method described in any one of claims 1 to 4.