A pest image multi-scale detection method, device, equipment and storage medium
By combining adaptive multi-scale image classification blocks and global-local feature pyramids, the problems of low efficiency and insufficient accuracy in traditional pest and disease detection methods are solved, and high-precision pest and disease target detection is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG EVOTRUE NET TECH CO LTD
- Filing Date
- 2025-07-30
- Publication Date
- 2026-06-23

Figure CN120953673B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computer technology, and more particularly to the field of pest and disease detection technology, specifically to a method, apparatus, device, and storage medium for multi-scale detection of pest and disease images.
Background Technology
[0002] As a core component of smart agriculture, crop disease and pest detection plays a crucial role in ensuring crop yield and quality.
[0003] Traditional methods for detecting crop diseases and pests rely mainly on manual inspection or instrumental analysis. Manual inspection is time-consuming and labor-intensive, and because different inspectors apply different judgment standards, the results are highly subjective. Instrumental analysis, for its part, is often complex and slow. Traditional methods therefore suffer from low efficiency, high subjectivity, and poor real-time performance. Existing automated pest and disease detection technologies, meanwhile, tend to lose accuracy in complex farmland environments because of factors such as changing light intensity and occlusion by leaves.
Summary of the Invention
[0004] This application provides a method, apparatus, device, and storage medium for multi-scale detection of pest and disease images, in order to improve the accuracy and reliability of pest and disease image detection.
[0005] According to one aspect of this application, a multi-scale detection method for pest and disease images is provided, the method comprising:
[0006] An adaptive multi-scale image classification block is used to segment the image to be processed into non-overlapping image blocks of at least two different sizes according to its image resolution, wherein the image to be processed is obtained by resizing an initial pest and disease image;
[0007] The adaptive multi-scale image classification block is used to extract features from the non-overlapping image blocks to obtain the initial feature sequence of the image to be processed;
[0008] Global feature extraction is performed on the initial feature sequence, and local feature extraction is performed on the image to be processed to obtain the global image features and local image features of the image to be processed.
[0009] Based on a multi-granularity feature pyramid, the key regions of the image to be processed are determined according to the global image features and the local image features; wherein, the multi-granularity feature pyramid is trained based on the global image features and the local image features;
[0010] Based on the multi-granularity feature pyramid, the target objects in the image to be processed are detected according to the key regions to obtain the target detection results.
[0011] According to another aspect of this application, a multi-scale detection device for pest and disease images is provided, the device comprising:
[0012] The image segmentation module is used to segment, using adaptive multi-scale image classification blocks, the image to be processed into non-overlapping image blocks of at least two different sizes based on its image resolution, wherein the image to be processed is obtained by resizing an initial pest and disease image;
[0013] The first feature extraction module is used to extract features from the non-overlapping image blocks using the adaptive multi-scale image classification blocks to obtain the initial feature sequence of the image to be processed.
[0014] The second feature extraction module is used to perform global feature extraction on the initial feature sequence and local feature extraction on the image to be processed, so as to obtain the global image features and local image features of the image to be processed.
[0015] A region determination module is used to determine the key regions of the image to be processed based on a multi-granularity feature pyramid, according to the global image features and the local image features; wherein, the multi-granularity feature pyramid is trained based on the global image features and the local image features;
[0016] The image detection module is used to detect target objects in the image to be processed based on the multi-granularity feature pyramid and the key regions, and obtain target detection results.
[0017] According to another aspect of this application, an electronic device is provided, the electronic device comprising:
[0018] One or more processors;
[0019] Memory, used to store one or more programs;
[0020] When the one or more programs are executed by the one or more processors, the one or more processors implement any of the multi-scale detection methods for pest and disease images provided in the embodiments of this application.
[0021] According to another aspect of this application, a computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a processor, implements any of the multi-scale detection methods for pest and disease images provided in the embodiments of this application.
[0022] According to another aspect of this application, a computer program product is provided, including a computer program that, when executed by a processor, implements any of the multi-scale detection methods for pest and disease images provided in the embodiments of this application.
[0023] This application uses adaptive multi-scale image classification blocks to segment the image to be processed into non-overlapping image blocks of at least two different sizes according to its image resolution, the image to be processed being obtained by resizing an initial pest and disease image. The adaptive multi-scale image classification blocks then extract features from the non-overlapping image blocks to produce an initial feature sequence for the image to be processed. Global features are extracted from the initial feature sequence and local features are extracted from the image to be processed, yielding its global and local image features. Based on a multi-granularity feature pyramid, which is trained on the global and local image features, key regions of the image to be processed are determined from these features, and target objects are then detected within the key regions to produce the target detection results. By fusing global features with multi-scale local features, this technical solution significantly improves detection accuracy for pest and disease targets of different sizes. In addition, the adaptive block segmentation strategy and the local-global attention mechanism allocate computational resources dynamically while preserving the representational capacity of the features.
Attached Figure Description
[0024] Figure 1 is a flowchart of a multi-scale detection method for pest and disease images according to Embodiment 1 of this application;
[0025] Figure 2 is a flowchart of a multi-scale detection method for pest and disease images according to Embodiment 2 of this application;
[0026] Figure 3 is a schematic structural diagram of a multi-scale detection device for pest and disease images according to Embodiment 3 of this application;
[0027] Figure 4 is a schematic structural diagram of an electronic device that implements the multi-scale detection method for pest and disease images according to Embodiment 4 of this application.
Detailed Implementation
[0028] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort should fall within the scope of protection of the present application.
[0029] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0030] Furthermore, it should be noted that the collection, storage, use, processing, transmission, provision, and disclosure of data related to image resolution and non-overlapping image blocks involved in the technical solution of this application all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
[0031] Embodiment 1
[0032] Figure 1 is a flowchart of a multi-scale detection method for pest and disease images according to Embodiment 1 of this application. This embodiment is applicable to real-time detection of pest and disease images, and the method can be executed by a multi-scale detection device for pest and disease images. The device can be implemented in hardware and/or software and can be configured in a computer device, such as a server. As shown in Figure 1, the method includes:
[0033] S110. Adaptive multi-scale image classification blocks are used to segment the image to be processed into non-overlapping image blocks of at least two different sizes according to its image resolution, wherein the image to be processed is obtained by resizing the initial pest and disease image.
[0034] In this embodiment, the adaptive multi-scale image classification block is obtained by improving the ViT (Vision Transformer) block. It segments the image according to resolution into image blocks of at least two different sizes: for example, small blocks are used for high-resolution images to capture the details of tiny targets, while larger blocks are used for medium- and low-resolution images to reduce computational redundancy. Because the block is based on dynamic mapping optimization between image resolution and target scale, it significantly improves feature extraction for small targets. The image to be processed is the initial pest and disease image resized to a preset size, and the non-overlapping image blocks are non-overlapping sub-regions of the image to be processed.
[0035] For example, the initial pest and disease images are uniformly resized to a preset size to obtain the image to be processed, which is then divided into non-overlapping blocks of at least two different sizes using adaptive multi-scale ViT blocks. High-resolution regions use small blocks (e.g., 16×16 pixels) to preserve the details of tiny targets, while low-resolution regions use large blocks (e.g., 32×32 pixels) to reduce computational redundancy.
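The block-splitting rule in [0035] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the description does not say how a "high-resolution region" is recognized, so the hypothetical function `adaptive_patchify` uses the pixel variance of each 32×32 cell as a stand-in detail measure and splits detailed cells into four 16×16 blocks. The value of `detail_threshold` is likewise assumed; only the 16×16 and 32×32 block sizes come from the text.

```python
import numpy as np

def adaptive_patchify(image, large=32, small=16, detail_threshold=100.0):
    """Split an image into non-overlapping blocks of two sizes.

    Hypothetical sketch: pixel variance stands in for "local resolution".
    Returns a list of (top, left, size, pixels) tuples.
    """
    h, w = image.shape[:2]
    if h % large or w % large:
        raise ValueError("image height and width must be multiples of `large`")
    if large % small:
        raise ValueError("`large` must be a multiple of `small`")
    patches = []
    for y in range(0, h, large):
        for x in range(0, w, large):
            cell = image[y:y + large, x:x + large]
            if float(cell.var()) > detail_threshold:
                # detailed cell: split into small blocks to keep fine structure
                for dy in range(0, large, small):
                    for dx in range(0, large, small):
                        patches.append((y + dy, x + dx, small,
                                        cell[dy:dy + small, dx:dx + small]))
            else:
                # flat cell: keep one large block to save tokens
                patches.append((y, x, large, cell))
    return patches

# Usage: only the top-left quadrant carries texture, so only it is split.
rng = np.random.default_rng(0)
demo = np.zeros((64, 64, 3))
demo[:32, :32] = rng.integers(0, 256, size=(32, 32, 3))
patches = adaptive_patchify(demo)
```

Because every 32×32 cell is either kept whole or split exactly into four 16×16 blocks, the resulting blocks never overlap and always tile the whole image, which matches the non-overlapping requirement of S110.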
[0036] S120. Adaptive multi-scale image classification blocks are used to extract local features from non-overlapping image blocks to obtain the initial feature sequence of the image to be processed.
[0037] In this embodiment, the initial feature sequence refers to the initial feature set formed by extracting local features from non-overlapping image patches and processing them.
[0038] Optionally, local spatial features are extracted from non-overlapping image blocks to obtain local spatial features of the non-overlapping image blocks; position encoding is performed on the non-overlapping image blocks to obtain relative position data of the non-overlapping image blocks; and the local spatial features and relative position data are fused to obtain the initial feature sequence of the image to be processed.
[0039] In this embodiment, local spatial features refer to information extracted from specific local regions (image patches) of an image. This information reflects the structure, shape, texture, color, and other features of the local region. Relative position data refers to positional data relative to other image patches or regions. It indicates the positional relationship between one image patch and other image patches. For example, if one image patch is located in the upper left and another image patch is located in the lower right, their relative positions can be used as part of information fusion to enhance the accuracy of image analysis. It should be noted that local spatial features can be represented using feature vectors.
[0040] For example, the local spatial features of each non-overlapping image block are extracted by the block encoding (patch embedding) module of the adaptive multi-scale ViT block, and positional encoding preserves the positional relationships between the non-overlapping image blocks, yielding the initial feature sequence.
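One way to picture the fusion of local spatial features and position data in [0038] to [0040] is the sketch below. It is hypothetical throughout: random matrices stand in for the learned block-encoding weights (one per block size, since 16×16 and 32×32 blocks flatten to vectors of different lengths), the position of each block is taken as its centre in pixels, and fusion is done by adding a 2-D sinusoidal encoding, a common ViT choice that the description does not specify. With sinusoidal encodings a shift in position corresponds to a fixed linear transformation of the encoding, which is one way to keep relative offsets between blocks recoverable.

```python
import numpy as np

def sincos_2d(cy, cx, dim):
    """2-D sinusoidal encoding of a block centre; dim must be a multiple of 4."""
    quarter = dim // 4
    freqs = 1.0 / (10000.0 ** (np.arange(quarter) / quarter))
    return np.concatenate([np.sin(cy * freqs), np.cos(cy * freqs),
                           np.sin(cx * freqs), np.cos(cx * freqs)])

def initial_feature_sequence(patches, dim=64, seed=0):
    """patches: list of (top, left, size, pixels). Returns an (n, dim) token array."""
    rng = np.random.default_rng(seed)
    projections = {}  # one projection per block size (random stand-in for learned weights)
    tokens = []
    for top, left, size, pixels in patches:
        flat = np.asarray(pixels, dtype=np.float64).reshape(-1)
        if size not in projections:
            projections[size] = rng.normal(0.0, 1.0 / np.sqrt(flat.size), (flat.size, dim))
        local_feature = flat @ projections[size]          # local spatial feature
        centre_y, centre_x = top + size / 2, left + size / 2
        position = sincos_2d(centre_y, centre_x, dim)     # position data
        tokens.append(local_feature + position)           # fuse by addition
    return np.stack(tokens)

# Usage: one 32×32 block on the left, four 16×16 blocks on the right.
demo_image = np.arange(32 * 64, dtype=np.float64).reshape(32, 64) / (32 * 64)
demo_patches = [(0, 0, 32, demo_image[:, :32])] + [
    (dy, 32 + dx, 16, demo_image[dy:dy + 16, 32 + dx:48 + dx])
    for dy in (0, 16) for dx in (0, 16)
]
sequence = initial_feature_sequence(demo_patches)
```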
[0041] S130. Perform global feature extraction on the initial feature sequence and local feature extraction on the image to be processed to obtain the global image features and local image features of the image to be processed.
[0042] In this embodiment, global image features refer to features extracted from the entire image that reflect the overall content of the image; typically, these features can capture macroscopic information of the image, such as color distribution, brightness variations, and shape features. Local image features refer to features extracted from small regions or specific objects in the image that reflect local details; local features can capture information such as specific shapes, corners, edges, and textures in the image.
[0043] For example, the initial feature sequence is input into an improved Transformer encoder, which extracts global features through a staged attention mechanism. In parallel, a GELAN (Generalized Efficient Layer Aggregation Network) module performs efficient local feature extraction, yielding the global image features and local image features of the image to be processed.
[0044] It should be noted that the improved Transformer encoder employs a local-global attention mechanism that adds local window attention to multi-head self-attention. Local window attention restricts each block to interacting only with the features of neighboring blocks, which reduces computational complexity, while global attention captures long-range dependencies between blocks, making the model more robust to complex backgrounds.
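The local-global attention described in [0044] can be illustrated with a single-head NumPy sketch. Assumptions: random matrices replace the learned query, key, and value projections; "neighboring blocks" means blocks whose centres lie within a hypothetical `radius`; and the local and global outputs are simply averaged, since the description does not state how the two paths are combined.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask=None):
    """Scaled dot-product attention; mask[i, j] is True where block i may attend to block j."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ v

def local_global_attention(tokens, centres, radius, seed=0):
    d = tokens.shape[1]
    rng = np.random.default_rng(seed)
    # random stand-ins for learned projection matrices (hypothetical)
    w_q, w_k, w_v = (rng.normal(0.0, 1.0 / np.sqrt(d), (d, d)) for _ in range(3))
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # local window: block i sees block j only if their centres are within `radius`
    dist = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=-1)
    local_out = attention(q, k, v, dist <= radius)  # diagonal is always True, so no row is empty
    global_out = attention(q, k, v)                  # every block sees every block
    return 0.5 * (local_out + global_out)

# Usage: six blocks on a 2×3 grid with 16-pixel spacing.
rng = np.random.default_rng(1)
demo_tokens = rng.normal(size=(6, 8))
demo_centres = np.array([[8, 8], [8, 24], [8, 40], [24, 8], [24, 24], [24, 40]], dtype=float)
mixed = local_global_attention(demo_tokens, demo_centres, radius=16.0)
```

In a real encoder the local path would be computed only inside each window, which is where the saving in computation comes from; the dense mask here just keeps the example short.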
[0045] S140. Based on the multi-granularity feature pyramid, the key regions of the image to be processed are determined according to the global image features and local image features; wherein, the multi-granularity feature pyramid is trained based on the global image features and local image features.
[0046] In this embodiment, the multi-granularity feature pyramid is a feature extraction technique based on different image scales. It extracts features at different levels of the image through a multi-layered and multi-scale approach. Typically, each layer uses different scales and details to extract image features in order to capture targets and objects of different sizes. Key regions refer to areas in the image that contain important targets or objects.
[0047] For example, a multi-granularity feature pyramid is used to fuse global and local image features, and the key regions of the image to be processed are determined based on the fused features.
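A minimal sketch of fusing global and local features in a feature pyramid follows. It assumes an FPN-style design that the description does not confirm: the fine local feature map is average-pooled to each pyramid level, the coarse global feature map is added at the top level, and context is propagated downward by nearest-neighbour upsampling and addition. The array shapes and the helper names (`fuse_pyramid`, `avg_pool`, `upsample_nearest`) are hypothetical.

```python
import numpy as np

def upsample_nearest(x, factor):
    """(C, H, W) -> (C, H*factor, W*factor) by repeating pixels."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def avg_pool(x, factor):
    """(C, H, W) -> (C, H/factor, W/factor) by block averaging."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def fuse_pyramid(global_map, local_map, levels=3):
    """Return a list of fused maps, finest first.

    Assumes local_map is (C, H, W) and global_map is (C, H/2**(levels-1), W/2**(levels-1)).
    """
    # bottom-up: pool the fine local map down to every pyramid level
    local_levels = [local_map]
    for _ in range(levels - 1):
        local_levels.append(avg_pool(local_levels[-1], 2))
    if global_map.shape != local_levels[-1].shape:
        raise ValueError("global_map must match the coarsest pyramid level")
    # top-down: inject global context at the top, then upsample and add level by level
    fused = [local_levels[-1] + global_map]
    for level in reversed(local_levels[:-1]):
        fused.append(level + upsample_nearest(fused[-1], 2))
    return fused[::-1]

# Usage: global context (all ones) reaches every level of an all-zero local map.
local_feats = np.zeros((4, 16, 16))
global_feats = np.ones((4, 4, 4))
pyramid = fuse_pyramid(global_feats, local_feats)
print([p.shape for p in pyramid])  # [(4, 16, 16), (4, 8, 8), (4, 4, 4)]
```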
[0048] S150. Based on the multi-granularity feature pyramid, target objects in the image to be processed are detected according to key regions to obtain target detection results.
[0049] In this embodiment, the target object refers to the object that needs to be identified in the pest and disease image. The target detection result refers to the final result of pest and disease image detection, which includes the identified object category and its position in the image (bounding box coordinates, etc.).
[0050] Optionally, based on the key regions, the image to be processed is sliced to obtain at least one sub-image; based on the multi-granularity feature pyramid, the target object in the at least one sub-image is detected to obtain the target object detection result.
[0051] In this embodiment, image slicing refers to the process of dividing an image into multiple small blocks or sub-images.
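The adaptive slicing in [0050] and [0051] can be sketched as choosing one window per key region. The padding `margin`, the minimum window size `min_size`, and the rule of shifting windows back inside the image instead of shrinking them are illustrative assumptions; the description only says that slice position and size are determined from the key regions.

```python
import numpy as np

def _fit(lo, hi, limit):
    """Shift [lo, hi) into [0, limit) without shrinking it, clipping only if it is too long."""
    if lo < 0:
        hi, lo = hi - lo, 0
    if hi > limit:
        lo, hi = max(0, lo - (hi - limit)), limit
    return lo, hi

def slice_windows(image_hw, key_regions, min_size=256, margin=32):
    """Return one (x0, y0, x1, y1) slice window per key region, in full-image pixels."""
    h, w = image_hw
    windows = []
    for rx0, ry0, rx1, ry1 in key_regions:
        # pad the key region with context, then grow it to at least min_size
        cx, cy = (rx0 + rx1) / 2, (ry0 + ry1) / 2
        half_w = max(rx1 - rx0 + 2 * margin, min_size) / 2
        half_h = max(ry1 - ry0 + 2 * margin, min_size) / 2
        x0, x1 = _fit(int(round(cx - half_w)), int(round(cx + half_w)), w)
        y0, y1 = _fit(int(round(cy - half_h)), int(round(cy + half_h)), h)
        windows.append((x0, y0, x1, y1))
    return windows

# Usage: two small key regions near opposite corners of an 800×1000 image.
image = np.zeros((800, 1000, 3), dtype=np.uint8)
windows = slice_windows(image.shape[:2], [(10, 10, 50, 50), (950, 750, 990, 790)])
sub_images = [image[y0:y1, x0:x1] for x0, y0, x1, y1 in windows]
print(windows)  # [(0, 0, 256, 256), (744, 544, 1000, 800)]
```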
[0052] Furthermore, based on the multi-granularity feature pyramid, target objects in at least one sub-image are detected to obtain candidate object detection results for at least one sub-image; a non-maximum suppression algorithm is used to integrate the candidate object detection results of at least one sub-image to obtain the target object detection result.
[0053] In this embodiment, candidate object detection results are the preliminarily detected potential target locations and categories. They are not necessarily the final detection results and must be filtered by a subsequent algorithm (such as non-maximum suppression) that removes duplicate or inaccurate detections. Non-maximum suppression is a technique for refining target detection results by removing redundant bounding boxes that detect the same object multiple times.
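Merging the per-slice candidates of [0052] and [0053] takes two steps: shifting each box from sub-image coordinates back to full-image coordinates, then removing duplicates with non-maximum suppression. The sketch below uses standard greedy, class-wise NMS; the IoU threshold of 0.5 and the `(window, boxes, scores, labels)` input layout are assumptions.

```python
import numpy as np

def box_iou(box, boxes):
    """IoU between one (x0, y0, x1, y1) box and an (n, 4) array of boxes."""
    ix0 = np.maximum(box[0], boxes[:, 0])
    iy0 = np.maximum(box[1], boxes[:, 1])
    ix1 = np.minimum(box[2], boxes[:, 2])
    iy1 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix1 - ix0, 0, None) * np.clip(iy1 - iy0, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold):
    """Greedy NMS: keep the best box, drop boxes overlapping it too much, repeat."""
    order = np.argsort(-scores)
    keep = []
    while order.size:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        order = rest[box_iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

def merge_slice_detections(per_slice, iou_threshold=0.5):
    """per_slice: list of (window, boxes, scores, labels), boxes in slice coordinates."""
    if not per_slice:
        return np.empty((0, 4)), np.empty(0), np.empty(0, dtype=int)
    boxes, scores, labels = [], [], []
    for (wx0, wy0, _, _), b, s, l in per_slice:
        offset = np.array([wx0, wy0, wx0, wy0], dtype=float)
        boxes.append(np.asarray(b, dtype=float).reshape(-1, 4) + offset)  # back to image coords
        scores.append(np.asarray(s, dtype=float))
        labels.append(np.asarray(l, dtype=int))
    boxes, scores, labels = (np.concatenate(x) for x in (boxes, scores, labels))
    keep = []
    for c in np.unique(labels):  # suppress only within the same class
        idx = np.flatnonzero(labels == c)
        keep.extend(int(idx[k]) for k in nms(boxes[idx], scores[idx], iou_threshold))
    keep.sort(key=lambda i: -scores[i])
    return boxes[keep], scores[keep], labels[keep]

# Usage: the same pest is seen by two overlapping slices and kept only once.
detections = [
    ((0, 0, 256, 256), [[200, 100, 240, 140]], [0.9], [0]),
    ((128, 0, 384, 256), [[72, 100, 112, 140], [150, 150, 190, 190]], [0.8, 0.7], [0, 1]),
]
final_boxes, final_scores, final_labels = merge_slice_detections(detections)
```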
[0054] For example, the SAHI (Slicing Aided Hyper Inference) strategy is adopted: slice positions and sizes are determined adaptively from the key regions, and the image to be processed is sliced accordingly to obtain at least one sub-image. The multi-granularity feature pyramid is applied to each sub-image for object detection, and the Inner-MPDIoU (Inner Minimum Point Distance IoU) loss function is used to optimize bounding box regression, in particular to strengthen small-target features in the feature pyramid, yielding candidate object detection results for each sub-image. Finally, the slice-level detection results are merged and redundant boxes are removed by non-maximum suppression to output the final detection result.
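For reference, the Inner-MPDIoU loss named in [0054] can be computed from the published MPDIoU and Inner-IoU formulations; the patent does not give its own formula, so the sketch below assumes those definitions. MPDIoU subtracts from IoU the squared distances between the two top-left corners and between the two bottom-right corners, each normalized by the squared image diagonal (img_w² + img_h²). Inner-IoU computes IoU on auxiliary boxes scaled by a `ratio` about each box centre. The combined loss is L_MPDIoU + IoU - IoU_inner, where L_MPDIoU = 1 - MPDIoU. The default `ratio=0.8` is a hypothetical value.

```python
def box_iou(a, b):
    """IoU of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def inner_box(box, ratio):
    """Auxiliary box scaled by `ratio` about the centre of `box` (Inner-IoU idea)."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    half_w, half_h = (box[2] - box[0]) * ratio / 2, (box[3] - box[1]) * ratio / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

def inner_mpdiou_loss(pred, target, img_w, img_h, ratio=0.8):
    iou = box_iou(pred, target)
    diag_sq = img_w ** 2 + img_h ** 2
    d1_sq = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2  # top-left corners
    d2_sq = (pred[2] - target[2]) ** 2 + (pred[3] - target[3]) ** 2  # bottom-right corners
    mpdiou = iou - d1_sq / diag_sq - d2_sq / diag_sq
    inner_iou = box_iou(inner_box(pred, ratio), inner_box(target, ratio))
    return (1.0 - mpdiou) + iou - inner_iou

# Usage: a perfect prediction costs nothing, a shifted one is penalized.
perfect = inner_mpdiou_loss((0, 0, 10, 10), (0, 0, 10, 10), 640, 640)
shifted = inner_mpdiou_loss((2, 2, 12, 12), (0, 0, 10, 10), 640, 640)
```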
[0055] This application uses adaptive multi-scale image classification blocks to segment the image to be processed into non-overlapping image blocks of at least two different sizes according to its image resolution, the image to be processed being obtained by resizing an initial pest and disease image. The adaptive multi-scale image classification blocks then extract features from the non-overlapping image blocks to produce an initial feature sequence for the image to be processed. Global features are extracted from the initial feature sequence and local features are extracted from the image to be processed, yielding its global and local image features. Based on a multi-granularity feature pyramid, which is trained on the global and local image features, key regions of the image to be processed are determined from these features, and target objects are then detected within the key regions to produce the target detection results. By fusing global features with multi-scale local features, this technical solution significantly improves detection accuracy for pest and disease targets of different sizes. In addition, the adaptive block segmentation strategy and the local-global attention mechanism allocate computational resources dynamically while preserving the representational capacity of the features.
[0056] Embodiment 2
[0057] Figure 2 is a flowchart of a multi-scale detection method for pest and disease images according to Embodiment 2 of this application. Building on the technical solutions of the above embodiment, this embodiment refines the step of "determining the key regions of the image to be processed based on a multi-granularity feature pyramid, according to the global image features and local image features" into two steps: "fusing the global image features and local image features based on the multi-granularity feature pyramid to obtain a multi-scale feature map of the image to be processed" and "determining the key regions of the image to be processed from the multi-scale feature map using a dual-path attention mechanism." For parts not detailed in this embodiment, refer to the relevant descriptions in the other embodiments. As shown in Figure 2, the method includes:
[0058] S210. Adaptive multi-scale image classification blocks are used to segment the image to be processed into non-overlapping image blocks of at least two different sizes according to its image resolution.
[0059] S220. Adaptive multi-scale image classification blocks are used to extract local features from non-overlapping image blocks to obtain the initial feature sequence of the image to be processed.
[0060] S230. Perform global feature extraction on the initial feature sequence and local feature extraction on the image to be processed to obtain the global image features and local image features of the image to be processed.
[0061] S240. Based on the multi-granularity feature pyramid, global image features and local image features are fused to obtain a multi-scale feature map of the image to be processed.
[0062] In this embodiment, a multi-scale feature map refers to an image feature map obtained through multi-level feature extraction, which contains image information at different scales (sizes).
[0063] S250: Based on the dual-path attention mechanism, the key regions of the image to be processed are determined according to the multi-scale feature maps.
[0064] In this embodiment, the dual-path attention mechanism refers to a technique based on attention mechanisms. It extracts global and local information through two paths respectively, and combines the outputs of the two paths to enhance the representation of important regions in the image. This mechanism can dynamically adjust the allocation of attention according to the features of the image, enabling the model to pay more attention to important regions.
[0065] For example, the dual-path attention mechanism is based on prior data of target size, such as aphids occupying between 5% and 10% of the image area. Through coarse-grained screening and fine-grained calculation, it dynamically focuses on key areas of pests and diseases, effectively suppressing background interference.
[0066] Optionally, the dual-path attention mechanism may include coarse-grained filtering and fine-grained filtering; correspondingly, based on a prior knowledge base, the multi-scale feature maps are divided into feature map regions to obtain at least one feature map region to be verified; wherein, the prior knowledge base is determined according to the size distribution of the target object; coarse-grained filtering is performed on the at least one feature map region to be verified to determine at least one candidate feature map region from the feature map region to be verified; fine-grained filtering is performed on the at least one candidate feature map region to determine the key region of the image to be processed from the candidate feature map region.
[0067] In this embodiment, coarse-grained screening refers to filtering a large area of the image feature map. This process focuses on a large range of potential target regions in the image, typically using lower-resolution feature maps for initial screening to reduce computation and quickly locate the approximate position of the target. Fine-grained screening refers to a more refined screening of target regions in the image after coarse-grained screening. This process relies on higher-resolution feature maps, helping the model to more accurately identify the target's location, shape, and boundaries. A prior knowledge base is a knowledge base containing information about the target object's size, location, shape, etc. By analyzing the target object's size distribution and morphological features, the prior knowledge base helps the model predict which regions might contain the target. Feature map region partitioning refers to dividing the multi-scale feature map into several regions and analyzing and processing each region separately. The feature map regions to be verified refer to several regions after partitioning the multi-scale feature map based on the object size provided in the prior knowledge base. Candidate feature map regions are potential target regions extracted from the feature map regions to be verified after coarse-grained screening; these regions are considered to contain the target after initial screening.
[0068] For example, the size distribution of targets in the training data is first analyzed to build a prior knowledge base. Based on this knowledge base, the multi-scale feature map is partitioned into at least one feature map region to be verified. Coarse-grained screening computes a saliency score for each region to be verified and keeps as candidate feature map regions those whose scores exceed a dynamic threshold; fine-grained screening is then applied to the candidate regions to determine the key pest and disease target regions.
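The coarse-to-fine screening of [0066] to [0068] might look like the sketch below. Every concrete choice here is an assumption: the prior knowledge base is reduced to an expected target area fraction (the 5% to 10% aphid figure from [0065]); regions to be verified are square tiles whose area matches the upper end of that range; the saliency score is the mean response of a 2-D map (for example the channel mean of one pyramid level); the dynamic threshold is the mean plus `k` standard deviations of the tile scores; and fine screening keeps the pixels above the candidate's own mean response. For brevity both paths read the same map, whereas the description has the coarse path use a lower-resolution map.

```python
import numpy as np

def dual_path_key_regions(saliency, area_fraction_range=(0.05, 0.10), k=1.0):
    """Coarse-to-fine screening of a 2-D saliency map; returns (x0, y0, x1, y1) boxes."""
    h, w = saliency.shape
    # prior knowledge base (hypothetical): tile side matches the largest expected target area
    side = max(1, int(round(np.sqrt(max(area_fraction_range) * h * w))))
    # coarse path: score each prior-sized tile by its mean response
    tiles, scores = [], []
    for y in range(0, h, side):
        for x in range(0, w, side):
            tiles.append((x, y, min(x + side, w), min(y + side, h)))
            scores.append(saliency[y:y + side, x:x + side].mean())
    scores = np.asarray(scores)
    threshold = scores.mean() + k * scores.std()  # dynamic, per-image threshold
    candidates = [t for t, s in zip(tiles, scores) if s > threshold]
    # fine path: tighten each candidate to the pixels that stand out inside it
    key_regions = []
    for x0, y0, x1, y1 in candidates:
        patch = saliency[y0:y1, x0:x1]
        ys, xs = np.nonzero(patch > patch.mean())
        if xs.size == 0:  # flat candidate: keep the coarse tile as is
            key_regions.append((x0, y0, x1, y1))
        else:
            key_regions.append((x0 + int(xs.min()), y0 + int(ys.min()),
                                x0 + int(xs.max()) + 1, y0 + int(ys.max()) + 1))
    return key_regions

# Usage: a single bright 4×4 blob on an otherwise empty 40×40 map.
saliency_map = np.zeros((40, 40))
saliency_map[5:9, 20:24] = 1.0
regions = dual_path_key_regions(saliency_map)
print(regions)  # [(20, 5, 24, 9)]
```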
[0069] Understandably, by combining a dual-path attention mechanism with a prior knowledge base, multi-scale feature maps are filtered and regions are divided, thereby improving the accuracy of object detection. Coarse-grained filtering is responsible for quickly locating potential target regions, while fine-grained filtering further refines the processing of target regions. The prior knowledge base provides information about target size and distribution, ensuring that the filtering process is more efficient and accurate. These technologies work together to enable accurate and efficient object detection in complex scenes.
[0070] S260. Based on the multi-granularity feature pyramid, target objects in the image to be processed are detected according to key regions to obtain target detection results.
[0071] In an optional implementation, the technical solution of this application can be deployed on edge devices. Specifically, network parameters are initialized through transfer learning, the shallow feature extraction layers are frozen, and the higher-level layers are fine-tuned for the pest and disease detection task. Mixed-precision training and structured pruning are used to improve the model's computational efficiency, and deployment on edge devices is finally achieved with the NVIDIA TensorRT inference acceleration engine.
[0072] Understandably, through transfer learning, networks can utilize low-level features (such as edges, colors, and textures) learned on large-scale datasets, thus avoiding training all network parameters from scratch, accelerating training speed, and improving performance. Freezing shallow feature layers reduces the number of parameters that need to be updated during training, saving computational resources and preventing the "forgetting" of these learned basic features in specific tasks. Through fine-tuning, the model can more accurately adapt to specific tasks (such as the types and characteristics of pests and diseases), thereby improving detection accuracy. Mixed-precision training can improve training speed and reduce hardware resource consumption without significantly sacrificing model accuracy. This method can significantly improve training efficiency, especially when training large models. Pruning can improve the computational efficiency of the model and reduce hardware requirements during deployment, enabling more efficient operation on edge devices. The TensorRT acceleration engine can convert the trained model into a format more suitable for edge device inference. After TensorRT optimization, the model's inference speed is significantly improved, making it suitable for deployment on devices with limited computing resources (such as embedded systems and edge devices).
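Of the deployment techniques in [0071] and [0072], structured pruning is the easiest to show without a deep learning framework. The sketch assumes L1-norm filter pruning, a common criterion that the description does not name: the convolution filters with the smallest absolute weight sums are removed, and the matching input channels of the following layer are dropped so that the two layers stay compatible. The function name and `keep_ratio` are hypothetical; biases and normalization layers, which would need the same indexing, are omitted.

```python
import numpy as np

def prune_conv_filters(weight, next_weight, keep_ratio=0.5):
    """L1-norm structured pruning of one conv layer.

    weight: (out_c, in_c, kh, kw); next_weight: (next_out, out_c, kh2, kw2).
    Returns the pruned weights and the indices of the kept filters.
    """
    out_c = weight.shape[0]
    n_keep = max(1, int(round(out_c * keep_ratio)))
    l1 = np.abs(weight).reshape(out_c, -1).sum(axis=1)  # importance of each filter
    keep = np.sort(np.argsort(-l1)[:n_keep])            # strongest filters, original order
    return weight[keep], next_weight[:, keep], keep

# Usage: filters 1 and 3 carry all the weight mass, so they survive.
conv_w = np.zeros((4, 2, 3, 3))
conv_w[1] = 1.0
conv_w[3] = 2.0
next_w = np.ones((5, 4, 1, 1))
pruned_w, pruned_next, kept = prune_conv_filters(conv_w, next_w)
```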
[0073] This application uses adaptive multi-scale image classification blocks to segment the image to be processed into non-overlapping image blocks of at least two different sizes according to its image resolution. These blocks then extract features from the non-overlapping image blocks, yielding an initial feature sequence for the image to be processed. Global features are extracted from the initial feature sequence and local features from the image to be processed, yielding its global and local image features. Based on a multi-granularity feature pyramid, the global and local image features are fused into a multi-scale feature map of the image to be processed, and a dual-path attention mechanism then determines the key regions of the image from the multi-scale feature map. Finally, based on the multi-granularity feature pyramid, target objects in the image to be processed are detected according to the key regions, yielding the target detection results. By fusing global features with multi-scale local features, this technical solution significantly improves detection accuracy for pests and diseases of different sizes, and the adaptive block segmentation strategy and local-global attention mechanism allocate computational resources dynamically while preserving the representational capacity of the features.
[0074] Embodiment 3
[0075] Figure 3 is a schematic structural diagram of a multi-scale detection device for pest and disease images according to Embodiment 3 of this application. The device is applicable to real-time detection of pest and disease images, can be implemented in hardware and/or software, and can be configured in a computer device, such as a server. As shown in Figure 3, the device includes:
[0076] The image segmentation module 310 is used to segment, using adaptive multi-scale image classification blocks, the image to be processed into non-overlapping image blocks of at least two different sizes according to its image resolution, wherein the image to be processed is obtained by resizing the initial pest and disease image;
[0077] The first feature extraction module 320 is used to extract features from non-overlapping image blocks using an adaptive multi-scale image classification block to obtain an initial feature sequence of the image to be processed.
[0078] The second feature extraction module 330 is used to perform global feature extraction on the initial feature sequence and local feature extraction on the image to be processed, so as to obtain the global image features and local image features of the image to be processed.
[0079] The region determination module 340 is used to determine the key regions of the image to be processed based on a multi-granularity feature pyramid, according to global image features and local image features; wherein, the multi-granularity feature pyramid is trained based on global image features and local image features;
[0080] The image detection module 350 is used to detect target objects in the image to be processed based on key regions using a multi-granularity feature pyramid, and obtain target detection results.
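The behavior of the image segmentation module 310 can be sketched as a two-level partition. This application does not fix the rule for choosing block sizes or for deciding where smaller blocks are used, so the sketch assumes, hypothetically, that the coarse block side is the largest power of two not exceeding one eighth of the shorter image side (minimum 4), that the fine block side is half of that, and that a coarse block is split into fine blocks when its pixel variance exceeds a threshold. The image is a grayscale 2-D list whose sides are assumed to be multiples of the coarse size, which the scaling step can guarantee. All function names are illustrative.

```python
from typing import List, Tuple

Image = List[List[float]]
Block = Tuple[int, int, int]  # (top, left, side length) in pixels


def choose_patch_sizes(height: int, width: int) -> Tuple[int, int]:
    """Hypothetical rule: coarse side = largest power of two <= shorter side / 8 (min 4)."""
    limit = min(height, width) // 8
    coarse = 4
    while coarse * 2 <= limit:
        coarse *= 2
    return coarse, coarse // 2


def block_variance(img: Image, top: int, left: int, size: int) -> float:
    vals = [img[r][c] for r in range(top, top + size) for c in range(left, left + size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def partition(img: Image, threshold: float = 10.0) -> List[Block]:
    """Split the image into non-overlapping coarse blocks, refining detailed ones into fine blocks."""
    height, width = len(img), len(img[0])
    coarse, fine = choose_patch_sizes(height, width)
    blocks: List[Block] = []
    # Assumes height and width are multiples of `coarse`; any remainder would be skipped.
    for top in range(0, height - coarse + 1, coarse):
        for left in range(0, width - coarse + 1, coarse):
            if block_variance(img, top, left, coarse) > threshold:
                # High-detail area: replace the coarse block by four fine blocks.
                for dt in range(0, coarse, fine):
                    for dl in range(0, coarse, fine):
                        blocks.append((top + dt, left + dl, fine))
            else:
                blocks.append((top, left, coarse))
    return blocks
```

Because fine blocks only ever subdivide a coarse block, the result always tiles the image without overlap while spending more blocks (and therefore more computation) on textured areas where small lesions or insects are likely.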
[0081] This application employs an adaptive multi-scale image classification block to segment the image to be processed into non-overlapping image blocks of at least two different sizes based on its image resolution, where the image to be processed is obtained by scaling an initial pest and disease image. The adaptive multi-scale image classification block is used to extract features from the non-overlapping image blocks, resulting in an initial feature sequence for the image to be processed. Global feature extraction and local feature extraction are then performed on the initial feature sequence to obtain global and local image features of the image to be processed. Based on a multi-granularity feature pyramid, which is trained based on the global and local image features, key regions of the image to be processed are determined according to the global and local image features. Based on the multi-granularity feature pyramid, target objects in the image to be processed are then detected according to the key regions, resulting in target detection results. This technical solution significantly improves the detection accuracy for pest and disease targets of different sizes by fusing global and multi-scale local features. Furthermore, the adaptive block segmentation strategy and the local-global attention mechanism dynamically optimize computational resource allocation while maintaining feature representation capability.
[0082] Optionally, the region determination module 340 includes:
[0083] The feature fusion unit is used to fuse global image features and local image features based on a multi-granularity feature pyramid to obtain a multi-scale feature map of the image to be processed.
[0084] The region determination unit is used to determine the key regions of the image to be processed based on the dual-path attention mechanism and multi-scale feature maps.
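A minimal sketch of the feature fusion unit, assuming an FPN-style top-down scheme that this application does not spell out: the global image feature is reduced to a single scalar that is added to every level of a single-channel local pyramid, and each coarser fused level is then upsampled by nearest-neighbor repetition and added to the next finer level. Real pyramids carry many channels and learned lateral convolutions; the function names here are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # single-channel feature map


def upsample2x(grid: Grid) -> Grid:
    """Nearest-neighbor upsampling: repeat every value twice along both axes."""
    return [[v for v in row for _ in range(2)] for row in grid for _ in range(2)]


def add_grids(a: Grid, b: Grid) -> Grid:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_pyramid(local_levels: List[Grid], global_feature: float) -> List[Grid]:
    """local_levels runs coarse -> fine, each level twice the resolution of the previous one."""
    # Inject the global context into every pyramid level.
    levels = [[[v + global_feature for v in row] for row in lvl] for lvl in local_levels]
    # Top-down pathway: upsample the fused coarser level and add it to the finer level.
    fused = [levels[0]]
    for lvl in levels[1:]:
        fused.append(add_grids(lvl, upsample2x(fused[-1])))
    return fused
```

The output keeps one map per granularity, so later stages can look for large targets on coarse levels and small targets on fine levels while every level still carries the global context.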
[0085] Optionally, the dual-path attention mechanism includes coarse-grained filtering and fine-grained filtering; correspondingly, the region determination unit is specifically used for:
[0086] Based on a prior knowledge base, the multi-scale feature map is divided into feature map regions to obtain at least one feature map region to be verified, wherein the prior knowledge base is determined according to the size distribution of the target objects;
[0087] Coarse-grained screening is performed on the at least one feature map region to be verified to determine at least one candidate feature map region from the feature map regions to be verified;
[0088] Fine-grained screening is performed on at least one candidate feature map region to determine the key regions of the image to be processed from the candidate feature map regions.
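The three steps above can be sketched as follows, under assumptions this application does not state: the prior knowledge base is reduced to a hypothetical list of square window sizes (in feature-map cells) derived from typical target sizes, the coarse-grained path keeps regions whose mean activation reaches `coarse_thresh`, and the fine-grained path ranks the survivors by peak activation and keeps at most `top_k` of those reaching `fine_thresh`. All thresholds and names are illustrative.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]
Region = Tuple[int, int, int]  # (top, left, side length) in feature-map cells


def propose_regions(fmap: FeatureMap, prior_sizes: List[int]) -> List[Region]:
    """Divide the map into square regions whose sizes come from the prior knowledge base."""
    height, width = len(fmap), len(fmap[0])
    return [(top, left, s)
            for s in prior_sizes
            for top in range(0, height - s + 1, s)
            for left in range(0, width - s + 1, s)]


def region_values(fmap: FeatureMap, region: Region) -> List[float]:
    top, left, s = region
    return [fmap[r][c] for r in range(top, top + s) for c in range(left, left + s)]


def screen_regions(fmap: FeatureMap, prior_sizes: List[int], coarse_thresh: float = 0.2,
                   fine_thresh: float = 0.8, top_k: int = 3) -> List[Region]:
    # Coarse path: cheap mean-activation test discards mostly-background regions.
    candidates = [r for r in propose_regions(fmap, prior_sizes)
                  if sum(region_values(fmap, r)) / (r[2] * r[2]) >= coarse_thresh]
    # Fine path: rank survivors by peak activation, keep those above the threshold, at most top_k.
    scored = sorted(((max(region_values(fmap, r)), r) for r in candidates),
                    key=lambda item: item[0], reverse=True)
    return [r for score, r in scored if score >= fine_thresh][:top_k]
```

Running the cheap test first means the more selective test only sees a small candidate set, which is one plausible way to realize the efficiency goal of the dual-path design.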
[0089] Optionally, the first feature extraction module 320 is specifically used for:
[0090] Local spatial features of the non-overlapping image blocks are extracted;
[0091] Position encoding is performed on non-overlapping image blocks to obtain their relative position data;
[0092] Local spatial features and relative position data are fused to obtain the initial feature sequence of the image to be processed.
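The three sub-steps of the first feature extraction module 320 can be sketched as follows. As hypothetical stand-ins for learned components, the local spatial features of a block are its pixel mean and standard deviation, the relative position data is the block center normalized to [0, 1] plus the block size relative to the longer image side, and fusion is plain concatenation. The function names are illustrative only.

```python
import math
from typing import List, Tuple

Image = List[List[float]]
Block = Tuple[int, int, int]  # (top, left, side length) in pixels


def local_spatial_features(img: Image, block: Block) -> List[float]:
    """Hypothetical stand-in for a learned patch embedding: mean and standard deviation."""
    top, left, size = block
    vals = [img[r][c] for r in range(top, top + size) for c in range(left, left + size)]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return [mean, std]


def relative_position(block: Block, height: int, width: int) -> List[float]:
    """Block center normalized to [0, 1] plus block size relative to the longer image side."""
    top, left, size = block
    return [(left + size / 2) / width, (top + size / 2) / height, size / max(height, width)]


def initial_feature_sequence(img: Image, blocks: List[Block]) -> List[List[float]]:
    height, width = len(img), len(img[0])
    # Raster order (top, then left) gives a deterministic token order.
    # Fusion here is plain concatenation of the two feature groups.
    return [local_spatial_features(img, b) + relative_position(b, height, width)
            for b in sorted(blocks)]
```

Including the size term in the position data lets tokens from blocks of different sizes share one sequence without losing track of how much of the image each token covers.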
[0093] Optionally, the image detection module 350 includes:
[0094] An image slicing unit is used to slice the image to be processed according to key regions to obtain at least one sub-image.
[0095] The image detection unit is used to detect target objects in at least one sub-image based on a multi-granularity feature pyramid, and obtain the target object detection result.
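Slicing by key regions can be sketched as cropping one sub-image per region. The sketch assumes that key regions arrive as (top, left, size) in feature-map cells, that each cell covers `stride` pixels, and that a hypothetical `margin` of extra pixels is kept around each region for context; crops are clamped to the image borders. Each crop is returned with its (y, x) pixel offset so that detections can later be mapped back to the image to be processed.

```python
from typing import List, Tuple

Image = List[List[float]]
Region = Tuple[int, int, int]  # (top, left, side length) in feature-map cells
Crop = Tuple[Tuple[int, int], Image]  # ((y offset, x offset), sub-image)


def slice_image(img: Image, key_regions: List[Region], stride: int, margin: int = 0) -> List[Crop]:
    """Cut one sub-image per key region, padded by `margin` pixels and clamped to the image."""
    height, width = len(img), len(img[0])
    crops: List[Crop] = []
    for top, left, size in key_regions:
        # Map feature-map cells back to pixels: one cell covers `stride` pixels.
        y0 = max(0, top * stride - margin)
        x0 = max(0, left * stride - margin)
        y1 = min(height, (top + size) * stride + margin)
        x1 = min(width, (left + size) * stride + margin)
        crops.append(((y0, x0), [row[x0:x1] for row in img[y0:y1]]))
    return crops
```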
[0096] Optionally, the image detection unit is specifically configured to:
[0097] Based on the multi-granularity feature pyramid, target objects in at least one sub-image are detected to obtain candidate object detection results for at least one sub-image.
[0098] A non-maximum suppression algorithm is used to integrate the candidate object detection results of at least one sub-image to obtain the target object detection result.
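The integration step can be sketched with standard greedy, class-wise non-maximum suppression. The sketch assumes that each sub-image carries the (y, x) pixel offset at which it was cut from the image to be processed, that boxes are (x1, y1, x2, y2) tuples in sub-image coordinates, and that an IoU threshold of 0.5 is a reasonable default; `merge_detections` and the label strings are illustrative, not part of this application.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float, str]        # (box, score, label)


def iou(a: Box, b: Box) -> float:
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(per_subimage: List[Tuple[Tuple[int, int], List[Detection]]],
                     iou_thresh: float = 0.5) -> List[Detection]:
    # Shift every box from sub-image coordinates to image coordinates.
    dets: List[Detection] = []
    for (y_off, x_off), detections in per_subimage:
        for (x1, y1, x2, y2), score, label in detections:
            dets.append(((x1 + x_off, y1 + y_off, x2 + x_off, y2 + y_off), score, label))
    # Greedy class-wise NMS: keep the highest-scoring box, drop same-label boxes overlapping it.
    dets.sort(key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for det in dets:
        if all(det[2] != k[2] or iou(det[0], k[0]) < iou_thresh for k in kept):
            kept.append(det)
    return kept
```

Shifting boxes back to image coordinates before suppression is what allows the same pest, detected twice in two overlapping sub-images, to be recognized as a duplicate and merged into one result.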
[0099] The multi-scale detection device for pest and disease images provided in this application can execute the multi-scale detection method for pest and disease images provided in any embodiment of this application, and has the corresponding functional modules and beneficial effects for executing each multi-scale detection method for pest and disease images.
[0100] According to embodiments of this application, this application also provides an electronic device, a readable storage medium, and a computer program product.
[0101] Embodiment 4
[0102] Figure 4 is a schematic structural diagram of an electronic device 410 implementing the multi-scale detection method for pest and disease images according to embodiments of this application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices (such as helmets, glasses, and watches), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present application described and/or claimed herein.
[0103] As shown in Figure 4, the electronic device 410 includes at least one processor 411 and a memory communicatively connected to the at least one processor 411, such as a read-only memory (ROM) 412 or a random access memory (RAM) 413. The memory stores computer programs executable by the at least one processor. The processor 411 can perform various appropriate actions and processes based on a computer program stored in the ROM 412 or loaded from the storage unit 418 into the RAM 413. The RAM 413 may also store various programs and data required for the operation of the electronic device 410. The processor 411, the ROM 412, and the RAM 413 are interconnected via a bus 414. An input/output (I/O) interface 415 is also connected to the bus 414.
[0104] Multiple components in the electronic device 410 are connected to the I/O interface 415, including: an input unit 416, such as a keyboard or a mouse; an output unit 417, such as various types of displays and speakers; a storage unit 418, such as a magnetic disk or an optical disk; and a communication unit 419, such as a network card, a modem, or a wireless transceiver. The communication unit 419 allows the electronic device 410 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.
[0105] Processor 411 can be a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of processor 411 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. Processor 411 performs the various methods and processes described above, such as multi-scale detection methods for pest and disease images.
[0106] In some embodiments, the multi-scale detection method for pest and disease images can be implemented as a computer program tangibly contained in a computer-readable storage medium, such as the storage unit 418. In some embodiments, part or all of the computer program can be loaded and/or installed on the electronic device 410 via the ROM 412 and/or the communication unit 419. When the computer program is loaded into the RAM 413 and executed by the processor 411, one or more steps of the multi-scale detection method for pest and disease images described above can be performed. Alternatively, in other embodiments, the processor 411 can be configured to perform the multi-scale detection method for pest and disease images by any other suitable means (e.g., by means of firmware).
[0107] Various embodiments of the systems and techniques described herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.
[0108] Computer programs used to implement the methods of this application may be written in any combination of one or more programming languages. These computer programs may be provided to the processor of a general-purpose computer, a special-purpose computer, or another programmable multi-scale detection device for pest and disease images, such that, when executed by the processor, they implement the functions/operations specified in the flowcharts and/or block diagrams. The computer programs may be executed entirely on a machine, partially on a machine, partially on a machine as a standalone software package and partially on a remote machine, or entirely on a remote machine or server.
[0109] In the context of this application, a computer-readable storage medium can be a tangible medium that may contain or store a computer program for use by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable storage medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. Alternatively, a computer-readable storage medium can be a machine-readable signal medium. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
[0110] To provide interaction with a user, the systems and techniques described herein can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the electronic device. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).
[0111] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as data servers), or computing systems that include middleware components (e.g., application servers), or computing systems that include frontend components (e.g., user computers with graphical user interfaces or web browsers through which users can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., communication networks). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.
[0112] A computing system can include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a hosting product in the cloud computing service system that addresses the shortcomings of traditional physical hosts and virtual private server (VPS) services, such as high management difficulty and weak business scalability.
[0113] It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this application can be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution of this application can be achieved; this is not limited herein.
[0114] The specific embodiments described above do not constitute a limitation on the scope of protection of this application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this application should be included within the scope of protection of this application.
Claims
1. A multi-scale detection method for pest and disease images, characterized in that it comprises: using an adaptive multi-scale image classification block to segment an image to be processed into non-overlapping image blocks of at least two different sizes according to the image resolution of the image to be processed, wherein the image to be processed is obtained by scaling the size of an initial pest and disease image; using the adaptive multi-scale image classification block to extract local features from the non-overlapping image blocks to obtain an initial feature sequence of the image to be processed; performing global feature extraction on the initial feature sequence and local feature extraction on the image to be processed to obtain global image features and local image features of the image to be processed; determining, based on a multi-granularity feature pyramid, key regions of the image to be processed according to the global image features and the local image features, wherein the multi-granularity feature pyramid is trained based on the global image features and the local image features; and detecting, based on the multi-granularity feature pyramid, target objects in the image to be processed according to the key regions to obtain target detection results; wherein extracting local features from the non-overlapping image blocks to obtain the initial feature sequence of the image to be processed comprises: extracting local spatial features of the non-overlapping image blocks; performing position encoding on the non-overlapping image blocks to obtain relative position data of the non-overlapping image blocks; and fusing the local spatial features and the relative position data to obtain the initial feature sequence of the image to be processed.
2. The method according to claim 1, characterized in that determining, based on the multi-granularity feature pyramid, the key regions of the image to be processed according to the global image features and the local image features comprises: fusing, based on the multi-granularity feature pyramid, the global image features and the local image features to obtain a multi-scale feature map of the image to be processed; and determining, based on a dual-path attention mechanism, the key regions of the image to be processed according to the multi-scale feature map.
3. The method according to claim 2, characterized in that the dual-path attention mechanism comprises coarse-grained filtering and fine-grained filtering; correspondingly, determining, based on the dual-path attention mechanism, the key regions of the image to be processed according to the multi-scale feature map comprises: dividing, based on a prior knowledge base, the multi-scale feature map into feature map regions to obtain at least one feature map region to be verified, wherein the prior knowledge base is determined according to the size distribution of the target objects; performing coarse-grained screening on the at least one feature map region to be verified to determine at least one candidate feature map region from the feature map regions to be verified; and performing fine-grained screening on the at least one candidate feature map region to determine the key regions of the image to be processed from the candidate feature map regions.
4. The method according to claim 1, characterized in that detecting, based on the multi-granularity feature pyramid, the target objects in the image to be processed according to the key regions to obtain the target detection results comprises: slicing the image to be processed based on the key regions to obtain at least one sub-image; and detecting, based on the multi-granularity feature pyramid, target objects in the at least one sub-image to obtain target object detection results.
5. The method according to claim 4, characterized in that detecting, based on the multi-granularity feature pyramid, the target objects in the at least one sub-image to obtain the target object detection results comprises: detecting, based on the multi-granularity feature pyramid, target objects in the at least one sub-image to obtain candidate object detection results for the at least one sub-image; and integrating the candidate object detection results of the at least one sub-image using a non-maximum suppression algorithm to obtain the target object detection results.
6. A multi-scale detection device for pest and disease images, characterized in that it comprises: an image segmentation module configured to segment, using an adaptive multi-scale image classification block, an image to be processed into non-overlapping image blocks of at least two different sizes based on the image resolution of the image to be processed, wherein the image to be processed is obtained by scaling the size of an initial pest and disease image; a first feature extraction module configured to extract features from the non-overlapping image blocks using the adaptive multi-scale image classification block to obtain an initial feature sequence of the image to be processed; a second feature extraction module configured to perform global feature extraction on the initial feature sequence and local feature extraction on the image to be processed to obtain global image features and local image features of the image to be processed; a region determination module configured to determine, based on a multi-granularity feature pyramid, key regions of the image to be processed according to the global image features and the local image features, wherein the multi-granularity feature pyramid is trained based on the global image features and the local image features; and an image detection module configured to detect, based on the multi-granularity feature pyramid, target objects in the image to be processed according to the key regions to obtain target detection results; wherein the first feature extraction module is specifically configured to: extract local spatial features of the non-overlapping image blocks; perform position encoding on the non-overlapping image blocks to obtain relative position data of the non-overlapping image blocks; and fuse the local spatial features and the relative position data to obtain the initial feature sequence of the image to be processed.
7. An electronic device, characterized in that it comprises: one or more processors; and a memory configured to store one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the multi-scale detection method for pest and disease images according to any one of claims 1-5.
8. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the multi-scale detection method for pest and disease images according to any one of claims 1-5.
9. A computer program product comprising a computer program that, when executed by a processor, implements the multi-scale detection method for pest and disease images according to any one of claims 1-5.