A crop disease and pest intelligent image recognition method and system based on a convolutional neural network

By using contour distance transformation and multi-scale feature fusion techniques from convolutional neural networks, the problem of insufficient pest and disease identification capabilities in traditional methods has been solved, achieving high-precision intelligent image recognition of crop pests and diseases.

CN122244534A · Pending · Publication Date: 2026-06-19 · SHANDONG BUSINESS INST +2

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHANDONG BUSINESS INST
Filing Date
2026-03-26
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Traditional methods for identifying crop diseases and pests are unable to effectively extract the subtle textures and boundary structures of lesions in complex field environments, resulting in insufficient identification capabilities and an inability to reliably distinguish similar diseases, thus failing to meet the precise diagnostic needs of agricultural production.

Method used

A convolutional neural network-based approach is adopted to enhance the geometric quantification of lesion boundaries through contour line distance transformation, multi-level multi-scale deep feature fusion, and spatial attention reweighting, thereby achieving high-precision pest and disease identification.

Benefits of technology

It significantly improves the model's ability to distinguish similar pests and diseases at a fine-grained level and its generalization robustness, achieving high-precision and high-efficiency cloud-based automated intelligent diagnosis.



Abstract

This invention provides a method and system for intelligent image recognition of crop diseases and pests based on convolutional neural networks, belonging to the field of intelligent agricultural information technology. The method includes: acquiring original images of crops containing symptoms of diseases and pests to be identified; preprocessing the images to obtain preprocessed images; passing the preprocessed images through a pre-trained convolutional neural network feature extraction backbone network; extracting the low-level basic features of the image through the primary convolutional layers at the front end of the backbone network to obtain a first intermediate feature map; processing the first intermediate feature map to geometrically quantify and enhance the spatial hierarchy between the edges and internal structures of lesion regions in the first intermediate feature map, generating an initial feature map that strengthens key boundary information. This invention achieves high-precision and highly robust automatic identification of diseases and pests through a classifier, thereby providing a fast and reliable intelligent diagnostic solution for agricultural production.

Description

Technical Field

[0001] This invention relates to the field of intelligent agricultural information technology, and in particular to an intelligent image recognition method and system for crop diseases and pests based on convolutional neural networks.

Background Technology

[0002] Automated image recognition of crop diseases and pests is a key technology for the development of smart agriculture. Before the widespread application of deep learning, the mainstream technical approach in this field relied on traditional image processing and machine learning methods. The core process of this approach typically includes: preprocessing the collected field images, then extracting visual features such as the color, texture, and shape of lesions using manually designed algorithms, and finally identifying these features.

[0003] While this technical approach has achieved some success in controlled environments, it has several drawbacks. Firstly, manually designed global features, such as color histograms and overall texture statistics, can obscure crucial, subtle pathological information on leaf surfaces, including lesion edge morphology and the arrangement of minute lesions. Secondly, generic handcrafted features are insensitive to subtle inter-class differences, such as the differences between bacterial angular leaf spot and fungal downy mildew in gloss and in the degree to which lesions are restricted by leaf veins. Furthermore, fixed feature sets struggle to characterize the intra-class diversity that arises as the same disease progresses from its early to late stages. These limitations lead to a severe deficiency in fine-grained identification, making it difficult to reliably differentiate between the many visually similar diseases commonly encountered in the field, and thus failing to meet the practical needs of precise disease and pest diagnosis in agricultural production.

Summary of the Invention

[0004] The technical problem to be solved by the present invention is to provide a method and system for intelligent image recognition of crop diseases and pests based on convolutional neural networks. The method and system achieve high-precision and robust automatic identification of diseases and pests through a classifier, thereby providing a fast and reliable intelligent diagnostic solution for agricultural production.

[0005] To solve the above-mentioned technical problems, the technical solution of the present invention is as follows: In a first aspect, a method for intelligent image recognition of crop diseases and pests based on convolutional neural networks is provided, the method comprising: Step 1: Obtain the original image of the crop containing the symptoms of the pests and diseases to be identified, and preprocess the image to obtain the preprocessed image; Step 2: Pass the preprocessed image through the pre-trained convolutional neural network feature extraction backbone network; extract the basic low-level features of the image through the primary convolutional layer group at the front end of the backbone network to obtain the first intermediate feature map; process the first intermediate feature map to geometrically quantify and enhance the spatial hierarchical relationship between the edge and internal structure of the lesion area in the first intermediate feature map, and generate an initial feature map with enhanced key boundary information; Step 3: Input the initial feature map into the deep convolutional layer group at the back end of the backbone network, where the deep convolutional kernels perform a step-by-step, iterative nonlinear combination and abstraction of the initial feature map in the feature space to generate a deep feature map containing high-level semantic information; Step 4: Perform cross-level feature fusion between the initial feature map and the deep feature map to construct a unified multi-level deep feature map containing information at different scales and levels of abstraction; Step 5: Based on the unified multi-level deep feature map, introduce an attention mechanism to calculate and generate a spatial attention weight matrix, and re-weight and optimize the feature map based on the matrix to obtain the optimized feature vector; Step 6: Input the optimized feature vector into the fully connected classifier, which calculates the probability of it belonging to each preset pest and disease category and obtains the final recognition result, thereby achieving intelligent image recognition of crop pests and diseases.

[0006] Furthermore, original images of crops containing symptoms of the pests and diseases to be identified are acquired, and the images are preprocessed to obtain preprocessed images, including: Step 1.1: The input original crop image is size-normalized by the cloud processor, and the image is uniformly scaled to a preset resolution to obtain a size-normalized image. Step 1.2: Based on the size-normalized image, perform color space conversion and automatic white balance correction to obtain a color-corrected image. Step 1.3: For the color-corrected image, apply an edge detection algorithm to extract edge information containing leaf and lesion areas, generating an initial edge map. Step 1.4: Perform polygon fitting on the initial edge map to close and smooth the edge contours, separating the foreground region of the leaf and generating a leaf foreground contour mask. Step 1.5: Using the leaf foreground contour mask, perform foreground extraction and background suppression processing on the color-corrected image, and use histogram equalization to enhance the contrast of the foreground region, generating a foreground-enhanced image. Step 1.6: Perform data standardization processing on the foreground-enhanced image to obtain a preprocessed image.

[0007] Furthermore, the preprocessed image is processed through a pre-trained convolutional neural network feature extraction backbone. The underlying basic features of the image are extracted through the primary convolutional layers at the front end of the backbone, resulting in a first intermediate feature map. A contour distance transform algorithm is then applied to process the first intermediate feature map, geometrically quantifying and enhancing the spatial hierarchy between the edges and internal structures of the lesion region in the first intermediate feature map, generating an initial feature map that strengthens key boundary information, including: Step 2.1: The preprocessed image is processed by the front-end primary convolutional layer group of the pre-trained convolutional neural network feature extraction backbone network to extract the low-level texture and edge features of the image and generate the first intermediate feature map. Step 2.2: Based on the first intermediate feature map, apply an edge detection algorithm in the cloud processor to extract the contour information of the lesion region in the first intermediate feature map, and obtain the contour feature map of the lesion region; Step 2.3: Based on the contour feature map of the lesion area, calculate the geometric distance from each pixel in the feature map to the nearest contour boundary, and generate a distance field feature map; Step 2.4: The distance field feature map and the first intermediate feature map are weighted and fused to generate an initial feature map that enhances the key boundary information.
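
Steps 2.3 and 2.4 can be illustrated with a minimal Python sketch. The patent text does not give the distance metric or the fusion formula, so the Euclidean metric, the proximity weight 1 / (1 + d), the `alpha` parameter, and the function names `distance_field` and `fuse` are all assumptions made for illustration only.

```python
import math

def distance_field(contour):
    """Brute-force Euclidean distance from every cell to the nearest contour cell.

    `contour` is a 2D list where truthy entries mark contour pixels (Step 2.2 output).
    """
    points = [(r, c) for r, row in enumerate(contour)
              for c, v in enumerate(row) if v]
    if not points:
        raise ValueError("contour map has no boundary pixels")
    return [[min(math.hypot(r - pr, c - pc) for pr, pc in points)
             for c in range(len(row))]
            for r, row in enumerate(contour)]

def fuse(feature, dist, alpha=0.5):
    """Weighted fusion (assumed form): boost responses that lie close to a contour.

    The boost is alpha / (1 + d), so it is largest on the contour (d = 0)
    and decays with distance. `alpha` is a hypothetical hyperparameter.
    """
    return [[f * (1.0 + alpha / (1.0 + d)) for f, d in zip(frow, drow)]
            for frow, drow in zip(feature, dist)]

if __name__ == "__main__":
    contour = [[0, 0, 0],
               [0, 1, 0],
               [0, 0, 0]]
    dist = distance_field(contour)
    feature = [[1.0] * 3 for _ in range(3)]
    for row in fuse(feature, dist):
        print([round(v, 3) for v in row])
```

The brute-force search is quadratic in the number of pixels and is only meant to make the definition concrete; a practical system would likely use a linear-time distance transform from an image processing library.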

[0008] Furthermore, the initial feature map is input into a group of deep convolutional layers at the back end of the backbone network. The deep convolutional kernels perform a progressive, iterative, non-linear combination and abstraction of the initial feature map in the feature space, generating a deep feature map containing high-level semantic information, including: Step 3.1: The initial feature map that enhances key boundary information is passed through the deep convolutional layers at the back end of the backbone network. The first set of deep convolutional kernels performs a high-dimensional nonlinear transformation on the initial feature map, extracts and combines more complex feature patterns, and generates the first transition feature map. Step 3.2: Based on the first transition feature map, iterative convolution, nonlinear activation and batch normalization operations are performed on it through a deeper set of convolution kernels to perform progressive abstraction and semantic refinement in the feature space, resulting in a second transition feature map containing mid-level semantic information. Step 3.3: Apply pooling operation to the second transition feature map to reduce the dimensionality of the feature map while retaining key feature information, and generate the third transition feature map. Step 3.4: Pass the third transition feature map through the terminal convolutional layer of the deep convolutional layer group to extract the high-level semantic features most relevant to the disease and pest category discrimination, and obtain a deep feature map containing high-level semantic information.

[0009] Furthermore, the initial feature map and the deep feature map are fused across different levels to construct a unified multi-level deep feature map containing information at different scales and levels of abstraction, including: Step 4.1: Obtain a deep feature map containing advanced semantic information and an initial feature map that enhances key boundary information; Step 4.2: Perform an upsampling operation based on the initial feature map to align the spatial dimensions of the initial feature map with the spatial dimensions of the deep feature map, generating an initial feature map with matching dimensions. Step 4.3: The initial feature map with size matching and the deep feature map are concatenated along the channel dimension. The concatenated result is then fused and dimensionality reduced to generate a preliminary fused feature map. Step 4.4: Apply nonlinear activation and feature recalibration operations to the initially fused feature map, and weightedly fuse complementary information from different levels to construct a unified multi-level deep feature map containing information of different scales and abstract levels.

[0010] Furthermore, based on the unified multi-level deep feature map, an attention mechanism is introduced to generate a spatial attention weight matrix, and the feature map is reweighted and optimized according to the matrix to obtain an optimized feature vector, including: Step 5.1: Receive the constructed unified multi-level deep feature map, and through two parallel paths of the spatial attention mechanism, perform global average pooling and global max pooling operations on the unified multi-level deep feature map in the channel dimension to generate a dual-channel feature description containing global spatial context information. Step 5.2: Based on the dual-channel feature description, calculate the relative importance of each spatial location to pest and disease identification through convolutional layers and nonlinear activation functions to generate an initial spatial attention weight distribution map; Step 5.3: Normalize the initial spatial attention weight distribution map to generate a spatial attention weight matrix; Step 5.4: Multiply the spatial attention weight matrix element-wise with the unified multi-level deep feature map to weight and enhance the local features in the feature map that are highly correlated with the identification of pests and diseases, and generate a reweighted and optimized feature map. Step 5.5: Perform global average pooling on the reweighted optimized feature map to compress and aggregate the spatial dimensions, thereby obtaining the optimized feature vector.
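
Steps 5.1 to 5.5 can be sketched as follows in plain Python. This is a simplified illustration, not the patented implementation: the convolution in Step 5.2 is reduced to a 1x1 convolution with two hypothetical weights (`w_avg`, `w_max`) and a `bias`, and the normalization in Step 5.3 is assumed to be a sigmoid; the patent text specifies neither.

```python
import math

def spatial_attention(fmap, w_avg=1.0, w_max=1.0, bias=0.0):
    """fmap: feature map as nested lists with shape [C][H][W].

    Returns (attention map [H][W], pooled feature vector of length C).
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])

    # Step 5.1: average- and max-pool across channels at each spatial location.
    avg = [[sum(fmap[c][i][j] for c in range(C)) / C for j in range(W)]
           for i in range(H)]
    mx = [[max(fmap[c][i][j] for c in range(C)) for j in range(W)]
          for i in range(H)]

    # Steps 5.2-5.3: 1x1 convolution over the two descriptors (assumed),
    # then a sigmoid so every weight lies in (0, 1).
    attn = [[1.0 / (1.0 + math.exp(-(w_avg * avg[i][j] + w_max * mx[i][j] + bias)))
             for j in range(W)] for i in range(H)]

    # Step 5.4: element-wise reweighting of every channel.
    weighted = [[[fmap[c][i][j] * attn[i][j] for j in range(W)]
                 for i in range(H)] for c in range(C)]

    # Step 5.5: global average pooling over the spatial dimensions.
    vector = [sum(sum(row) for row in channel) / (H * W) for channel in weighted]
    return attn, vector

if __name__ == "__main__":
    fmap = [[[0.0, 2.0]], [[0.0, 4.0]]]  # C=2, H=1, W=2
    attn, vec = spatial_attention(fmap)
    print(attn, vec)
```

In this toy input the second spatial location has stronger responses in both channels, so it receives the larger attention weight and dominates the pooled vector.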

[0011] Furthermore, the optimized feature vector is input into a fully connected classifier, which calculates the probability of it belonging to each preset pest and disease category to obtain the final recognition result, thereby achieving intelligent image recognition of crop pests and diseases, including: Step 6.1: Input the optimized feature vector into the first fully connected layer of the fully connected classifier, and map it to the high-dimensional classification feature space through linear transformation to generate the primary classification feature vector; Step 6.2: Apply a non-linear activation function to the primary classification feature vector to obtain the activated classification feature vector; Step 6.3: Based on the activated classification feature vector, nonlinear feature transformation and high-order semantic integration are performed through the second fully connected layer of the fully connected classifier to generate a high-level discriminative feature vector; Step 6.4: Pass the high-level discriminative feature vector through the output layer of the fully connected classifier, and calculate its matching degree with each preset pest and disease category through linear transformation to generate the original score vector for each category; Step 6.5: Apply the normalized exponential function to the original score vector to convert the scores of each category into a probability distribution and obtain the probability that the optimized feature vector belongs to each preset pest category. Step 6.6: Based on the probability distribution, select the category with the highest probability value to obtain the intelligent image recognition results of crop diseases and pests.
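
The classifier head in Steps 6.1 to 6.6 can be sketched in plain Python as below. The layer sizes, the weights, the choice of ReLU as the nonlinear activation, and the category labels are hypothetical; only the overall structure (fully connected layers, normalized exponential function, highest-probability selection) comes from the text above.

```python
import math

def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(v):
    """Nonlinear activation (ReLU is an assumption; the text names no function)."""
    return [max(0.0, a) for a in v]

def softmax(scores):
    """Normalized exponential function, shifted by the max for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(x, layers, labels):
    """layers: list of (weights, bias); the last entry is the output layer."""
    h = x
    for weights, bias in layers[:-1]:          # Steps 6.1-6.3: hidden layers
        h = relu(linear(h, weights, bias))
    scores = linear(h, *layers[-1])            # Step 6.4: raw score vector
    probs = softmax(scores)                    # Step 6.5: probability distribution
    best = max(range(len(probs)), key=probs.__getitem__)  # Step 6.6
    return labels[best], probs

if __name__ == "__main__":
    identity = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    output = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0])
    labels = ["healthy", "downy_mildew", "angular_leaf_spot"]  # hypothetical categories
    print(classify([1.0, 2.0], [identity, identity, output], labels))
```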

[0012] In a second aspect, an intelligent image recognition system for crop diseases and pests based on convolutional neural networks is provided, the system including: The acquisition module is used to acquire original images of crops containing symptoms of pests and diseases to be identified, and to preprocess the images to obtain preprocessed images. The processing module is used to pass the preprocessed image through a pre-trained convolutional neural network feature extraction backbone network, and to extract the low-level basic features of the image through the primary convolutional layer group at the front end of the backbone network to obtain a first intermediate feature map; the first intermediate feature map is processed to geometrically quantify and enhance the spatial hierarchical relationship between the edge and internal structure of the lesion area in the first intermediate feature map, and to generate an initial feature map that strengthens the key boundary information. The input module is used to input the initial feature map into the deep convolutional layer group at the back end of the backbone network, where the deep convolutional kernels perform step-by-step, iterative nonlinear combination and abstraction of the initial feature map in the feature space to generate a deep feature map containing high-level semantic information. The fusion module is used to perform cross-level feature fusion between the initial feature map and the deep feature map to construct a unified multi-level deep feature map containing information of different scales and abstraction levels. The computation module is used to generate a spatial attention weight matrix by introducing an attention mechanism based on the unified multi-level deep feature map, and to re-weight and optimize the feature map based on the matrix to obtain an optimized feature vector. The recognition module is used to input the optimized feature vector into the fully connected classifier, which calculates the probability of it belonging to each preset pest and disease category and obtains the final recognition result, thereby realizing intelligent image recognition of crop pests and diseases.

[0013] In a third aspect, a computing device is provided, including: one or more processors; and a storage device for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method.

[0014] In a fourth aspect, a computer-readable storage medium is provided, storing a program that, when executed by a processor, implements the method.

[0015] The above-described solution of the present invention has at least the following beneficial effects: By employing a series of synergistic techniques, including geometric quantification and enhancement of lesion boundaries based on contour distance transformation, cross-level multi-scale deep feature fusion, and spatial attention reweighting and focusing, this approach systematically overcomes the core technical problems of traditional methods, such as insufficient extraction of subtle texture and boundary structure features of lesions in complex field environments, inadequate utilization of multi-scale semantic information, and susceptibility to interference from cluttered background information. This significantly improves the model's ability to distinguish similar pests and diseases at a fine-grained level, enhances the generalization robustness of the system under different lighting conditions and growth stages, and ultimately achieves high-precision, high-efficiency, cloud-based automated intelligent diagnosis.

Attached Figure Description

[0016] Figure 1 is a flowchart illustrating an intelligent image recognition method for crop diseases and pests based on convolutional neural networks, provided by an embodiment of the present invention.

[0017] Figure 2 is a schematic diagram of an intelligent image recognition system for crop diseases and pests based on a convolutional neural network, provided by an embodiment of the present invention.

Detailed Implementation

[0018] Exemplary embodiments of the present disclosure will now be described in more detail with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

[0019] As shown in Figure 1, an embodiment of the present invention proposes an intelligent image recognition method for crop diseases and pests based on convolutional neural networks. The method includes the following steps: Step 1: Obtain the original image of the crop containing the symptoms of the pests and diseases to be identified, and preprocess the image to obtain the preprocessed image; Step 2: Pass the preprocessed image through the pre-trained convolutional neural network feature extraction backbone network; extract the basic low-level features of the image through the primary convolutional layer group at the front end of the backbone network to obtain the first intermediate feature map; process the first intermediate feature map to geometrically quantify and enhance the spatial hierarchical relationship between the edge and internal structure of the lesion area in the first intermediate feature map, and generate an initial feature map with enhanced key boundary information; Step 3: Input the initial feature map into the deep convolutional layer group at the back end of the backbone network, where the deep convolutional kernels perform a step-by-step, iterative nonlinear combination and abstraction of the initial feature map in the feature space to generate a deep feature map containing high-level semantic information; Step 4: Perform cross-level feature fusion between the initial feature map and the deep feature map to construct a unified multi-level deep feature map containing information at different scales and levels of abstraction; Step 5: Based on the unified multi-level deep feature map, introduce an attention mechanism to calculate and generate a spatial attention weight matrix, and re-weight and optimize the feature map based on the matrix to obtain the optimized feature vector; Step 6: Input the optimized feature vector into the fully connected classifier, which calculates the probability of it belonging to each preset pest and disease category and obtains the final recognition result, thereby achieving intelligent image recognition of crop pests and diseases.

[0020] In this embodiment of the invention, the present invention employs a series of interconnected and synergistic technical means, including contour separation and enhanced preprocessing for the leaf foreground, geometric quantization enhancement of lesion boundaries based on contour distance transformation, cross-level fusion of deep and shallow features, and attention reweighting and focusing based on a unified multi-level feature map. These techniques overcome the systemic technical problems of traditional methods, such as impure feature extraction due to background interference, difficulty in distinguishing similar diseases due to blurred boundary features, incomplete identification of polymorphic lesions due to single-scale features, and the submergence of key areas due to feature response averaging. As a result, the invention improves the model's accuracy and efficiency in capturing and identifying key features of pests and diseases, enhances the model's adaptability to complex field scenes and multi-scale lesions, and ultimately achieves the technical effect of high accuracy and high robustness in cloud-based automated intelligent identification.

[0021] In a preferred embodiment of the present invention, step 1 may include: Step 1.1: The input original crop images are size-normalized by a cloud processor, which scales them uniformly to a preset resolution to obtain size-normalized images. This process includes: receiving the original crop images collected from the field; reading the basic resolution parameters of each original image, such as the number of pixel rows and columns; confirming the preset target resolution, which is set based on the actual needs of crop leaf and lesion identification and matches the parameter requirements of subsequent image processing steps such as color correction and edge detection; performing a proportional scaling operation on each original image, adjusting the number of pixel rows and columns according to the aspect ratio of the original image; interpolating pixel values during scaling to fill the pixel gaps created by scaling, ensuring the continuity and visual integrity of the image; and finally, once all input original crop images have been proportionally scaled and interpolated, outputting size-normalized images of uniform size.
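
The proportional scaling and interpolation in Step 1.1 can be sketched in plain Python for a single-channel image. The patent does not name an interpolation method, so bilinear interpolation with corner alignment is an assumption here, as are the helper names `fit_size` and `resize_bilinear` and the rule of fitting the longer side to the target.

```python
def fit_size(height, width, target):
    """Scale so the longer side equals `target`, preserving the aspect ratio."""
    scale = target / max(height, width)
    return max(1, round(height * scale)), max(1, round(width * scale))

def resize_bilinear(img, new_h, new_w):
    """Resize a 2D list of numbers using bilinear interpolation (corners aligned)."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(new_h):
        # Map the output row back to a fractional source row.
        y = i * (h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(new_w):
            x = j * (w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            # Blend the four neighbouring source pixels.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

if __name__ == "__main__":
    print(fit_size(100, 200, 50))
    print(resize_bilinear([[0.0, 10.0]], 1, 3))
```

Because scaling is proportional, images with different aspect ratios end up with different shorter sides; a fixed network input size would additionally need padding or cropping, which the text above does not describe.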

[0022] Step 1.2: Based on the size-normalized image, perform color space conversion and automatic white balance correction to obtain a color-corrected image. Specifically, this includes: using the size-normalized image as the processing basis, first perform color space conversion to transform the original red-green-blue color space of the image into a color space more suitable for the color analysis of crop leaves and lesions. This mitigates the sensitivity of the red-green-blue color space to field lighting, allowing a clearer visual distinction between the green tones of normal leaf areas and the yellowish-brown, grayish-white, and other differing tones of lesion areas. After completing the color space conversion, perform automatic white balance correction. First, the converted image is scanned across the entire pixel area to identify neutral gray regions that are not affected by the color of leaves and lesions. The color parameters of these regions are used as the baseline value for color correction, to eliminate the overall color cast caused by different lighting conditions during field collection, such as strong light, weak light, cloudy days, and direct sunlight. Then, the gain of each color channel in the image is adjusted according to the baseline value to restore the natural, true color of the normal leaf area while accurately presenting the inherent color characteristics of the lesion area. After correction, a color-corrected image with accurate color restoration is generated.
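
The white balance correction in Step 1.2 can be sketched as below. This is a simplified illustration under stated assumptions: it operates directly on RGB tuples and omits the color space conversion, it treats a pixel as neutral gray when its channel spread is within a hypothetical tolerance `neutral_tol`, and it falls back to the gray-world assumption (all pixels) when no neutral pixel is found. None of these specifics are given in the patent text.

```python
def white_balance(pixels, neutral_tol=30):
    """pixels: list of (r, g, b) tuples with values in 0..255.

    Returns (corrected pixels, per-channel gains).
    """
    # Neutral candidates: pixels whose three channels are nearly equal.
    neutral = [p for p in pixels if max(p) - min(p) <= neutral_tol]
    reference = neutral or pixels  # gray-world fallback when nothing is neutral

    # Baseline: per-channel mean over the reference region.
    means = [sum(p[k] for p in reference) / len(reference) for k in range(3)]
    gray = sum(means) / 3

    # Gains pull each channel mean of the reference region toward the same gray level.
    gains = [gray / m if m > 0 else 1.0 for m in means]

    corrected = [tuple(min(255, round(p[k] * gains[k])) for k in range(3))
                 for p in pixels]
    return corrected, gains

if __name__ == "__main__":
    # The first pixel is a bluish gray; the second is saturated leaf green.
    corrected, gains = white_balance([(100, 110, 120), (20, 200, 30)])
    print(corrected, [round(g, 3) for g in gains])
```

Here the bluish-gray reference pixel is mapped to a neutral gray, and the same gains are applied to the green leaf pixel, which removes the cast without flattening its color.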

[0023] Step 1.3: For the color-corrected image, an edge detection algorithm is applied to extract edge information containing leaf and lesion areas to generate an initial edge map. Specifically, this includes: First, grayscale processing is performed on the color-corrected image. By fusing the pixel values ​​of each color channel of the image, the color-corrected image is converted into a single-channel grayscale image, reducing the interference of multiple color dimensions on edge detection and improving the computational efficiency and recognition accuracy of edge detection. Then, an edge detection algorithm adapted to the edge features of crop leaves and lesions is selected to calculate the grayscale value of the grayscale image pixel by pixel, obtain the grayscale value change gradient of the surrounding area of ​​each pixel, and capture pixels in the image where the grayscale value changes abruptly. These pixels are the overall outline edge of the leaf and the boundary edge between the lesion area and the normal leaf area. All detected edge pixels are highlighted and completely preserved, while non-edge pixels are weakened in grayscale value. Finally, all marked edge pixels are integrated according to their spatial distribution rules to generate an initial edge map that can clearly and completely present the overall outline of the leaf and the edge morphology of the lesion area. Step 1.4 involves performing polygon fitting on the initial edge image to close and smooth the edge contours, thereby separating the foreground region of the leaf and generating a leaf foreground contour mask. 
Specifically, this includes: first, reading the spatial coordinate information of all edge pixels in the initial edge image; performing polygon fitting on these discretely distributed edge pixels; and then, according to the spatial arrangement of the pixels, sequentially fitting and connecting the discrete edge points with continuous polygonal line segments to fill the gaps between edge points caused by image noise and detection bias, thus achieving complete closure of the leaf edge contour. Next, smoothing is performed on the closed edge contour. By calculating the neighborhood mean of the pixels, burrs, irregular protrusions, and depressions caused by field image noise are removed from the contour, making the leaf contour lines continuous, smooth, and conforming to the actual shape of the leaf. Then, using the processed closed and smooth contour as a clear boundary, the image is divided into the foreground region where the leaf body is located, and the background region, which is irrelevant to soil, weeds, and field facilities. Finally, based on the region division result, a leaf foreground contour mask is generated. In the mask, effective pixel labels are applied to the leaf foreground region, and masked pixel labels are applied to the background region.Step 1.5: Using a leaf foreground contour mask, the color-corrected image undergoes foreground extraction and background suppression processing. Histogram equalization is then used to enhance the contrast of the foreground region, generating an image with a strengthened foreground. 
Specifically, this involves: first, precisely overlaying the leaf foreground contour mask onto the color-corrected image; then, based on the pixel identifiers in the mask, accurately locating and extracting all pixel information of the leaf foreground region in the color-corrected image; and simultaneously, completely suppressing the background region in the image according to the mask's shielding identifiers, thoroughly eliminating interference from background factors such as soil and weeds on lesion feature analysis, retaining only the main leaf and the lesions attached to the leaf. In the spotted area, histogram equalization is then performed on the extracted pure leaf foreground area. First, the gray value distribution of all pixels in the area is statistically analyzed to determine whether the gray values ​​are concentrated in a narrow range. If this is the case, it indicates that the contrast between the lesion and the normal leaf area is low, and subtle pathological information is difficult to distinguish. Then, the concentrated gray values ​​are redistributed and uniformly mapped to a wider gray value range, expanding the difference in brightness between the lesion area and the normal leaf area, improving the contrast between the areas, and clearly displaying the edge morphology of the lesion, the arrangement of tiny lesion points, and other key local subtle pathological information. After processing, an image with enhanced foreground is generated. Step 1.6 involves performing data standardization on the image for enhancing the foreground to obtain a preprocessed image. 
This includes: first, reading the values of all pixels in the foreground-enhanced image and, based on these values, calculating the overall mean and standard deviation of the image's pixel values as the core reference parameters for data standardization; then, using the calculated mean and standard deviation, standardizing the value of each pixel one by one, so that the standardized pixel values have zero mean and unit standard deviation. This eliminates the overall pixel value offset caused by factors such as uneven field light intensity, differences in camera hardware parameters, and different shooting angles, ensuring that the pixel values of the image fall within a uniform quantization range. During standardization, the relative relationships between pixel values are strictly preserved, so that the key visual features of the lesions in the image, such as texture, shape, and edges, are not altered by the numerical adjustment. After all pixels in the image have been standardized, the final preprocessed image is output.
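
By way of non-limiting illustration, the foreground contrast enhancement of step 1.5 and the data standardization of step 1.6 might be sketched as follows. The sketch assumes NumPy, an 8-bit grayscale image, and a boolean foreground mask; the function names `equalize_foreground` and `standardize` are hypothetical and do not appear in the original disclosure.

```python
import numpy as np

def equalize_foreground(gray, mask):
    """Histogram-equalize only the masked (foreground) pixels of a uint8 image.

    Background pixels are suppressed to 0.
    """
    out = np.zeros_like(gray)
    fg = gray[mask]
    if fg.size == 0:
        return out
    hist = np.bincount(fg, minlength=256)      # gray value distribution
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0].min()               # first non-zero cumulative count
    denom = max(fg.size - cdf_min, 1)          # guard against a flat foreground
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    out[mask] = lut[fg]                        # spread values over the full range
    return out

def standardize(image, eps=1e-8):
    """Shift and scale pixel values to zero mean and unit standard deviation."""
    image = image.astype(np.float64)
    return (image - image.mean()) / (image.std() + eps)
```

In this sketch the mean and standard deviation are taken over the whole image, matching the "overall" statistics described above; computing them over the foreground only would be an equally plausible variant.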

[0024] In this embodiment of the invention, a series of collaborative image preprocessing techniques are employed, including size normalization, color space conversion and automatic white balance correction, edge detection and polygon fitting to generate a foreground mask, foreground extraction and background suppression based on the mask, local contrast enhancement, and data standardization. These techniques overcome the technical problems of inconsistent sizes, color casts due to lighting, cluttered backgrounds, blurred leaf outlines, and insufficient contrast between lesions and healthy tissues caused by differences in shooting conditions in the original images. As a result, the invention achieves unified input specifications, corrects color accuracy, accurately separates the main leaf body, and significantly highlights the details of the lesion area.

[0025] In a preferred embodiment of the present invention, step 2 above may include: Step 2.1 involves passing the preprocessed image through the front-end primary convolutional layer group of a pre-trained convolutional neural network feature extraction backbone to extract the low-level texture and edge features of the image, generating the first intermediate feature map. Specifically, this includes: firstly, adapting the pixel format and dimensions of the preprocessed image according to the input requirements of the convolutional neural network feature extraction backbone to ensure that the image can be effectively read by the network's front-end primary convolutional layer group. This primary convolutional layer group consists of multiple layers of convolutional kernels at different scales. The kernel size is set according to the feature scale of crop leaf texture and lesion edges, covering the detection range from tiny lesion points to the basic texture of the leaf. During network operation, the first layer of the primary convolutional layer group first performs pixel-by-pixel sliding convolution operations on the preprocessed image. The algorithm captures the local variation patterns of pixel grayscale values ​​in the image, initially identifying the basic texture of the leaf surface, including leaf vein patterns, fine lines on the leaf epidermis, and basic edge pixels of lesions and normal leaf areas. Multi-layer convolutional kernels sequentially perform secondary and tertiary convolution operations on the feature data output from the previous layer, gradually aggregating local feature information, eliminating minor noise interference remaining in the field image, and enhancing the signal strength of effective texture and edge features. 
During the convolution operation, the primary convolutional layer group integrates all extracted bottom-level texture features, including the rough texture of the lesion surface and the smooth texture of the normal leaf, and edge features, including the irregular edges of the lesions and the overall contour edges of the leaf, into a multi-channel feature matrix, generating a first intermediate feature map that can completely preserve the bottom-level visual features of the image.

[0026] Step 2.2: Based on the first intermediate feature map, an edge detection algorithm is applied in the cloud processor to extract the contour information of the lesion region in the first intermediate feature map, resulting in a contour feature map of the lesion region. Specifically, the first intermediate feature map is a high-dimensional feature representation output by a convolutional layer, typically a multi-channel structure with, for example, 64 or 128 channels of floating-point values. To meet the input requirements of the edge detection algorithm, it must first be preprocessed: key channels sensitive to edges and textures are selected according to the feature response intensity recorded during training, these key channels are fused into a single-channel feature map by weighted summation, and the resulting values are mapped to the [0,1] interval. The Canny edge detection algorithm is then selected for its strong noise suppression, accurate edge localization, and good contour continuity, and is adapted to the floating-point characteristics of the first intermediate feature map. First, a 5×5 Gaussian kernel with σ=1.4 is convolved with the normalized single-channel feature map. This weighted-average convolution smooths pixel-level feature fluctuations, suppressing noise while preserving the overall trend of the lesion edge. The Sobel operator in the x-direction, $S_x$, and the Sobel operator in the y-direction, $S_y$, are then convolved with the filtered feature map, and the gradient strength is calculated as $G=\sqrt{G_x^2+G_y^2}$, where $G$ represents the gradient intensity of a pixel, $G_x$ is the result of the convolution in the x-direction, and $G_y$ is the result of the convolution in the y-direction. The gradient direction $\theta=\arctan(G_y/G_x)$ is calculated at the same time and quantized into four principal directions (0°, 45°, 90°, 135°).
Then, the pixels in the feature map with non-zero gradient strength are traversed, and the gradient strength of each pixel is compared with that of its two adjacent pixels along its principal gradient direction. Only pixels whose gradient strength is a local maximum are retained, thinning wide edges into single-pixel-wide contours. Next, adaptive dual thresholds $T_{high}$ and $T_{low}$ are dynamically calculated from the average gradient strength of the feature map and its standard deviation, with a ratio $T_{high}:T_{low}$ of 2:1 or 3:1 (such as ...). Pixels with gradient strength greater than $T_{high}$ are marked as strong edge pixels and directly retained; pixels with gradient strength between $T_{low}$ and $T_{high}$ are marked as weak edge pixels, and only those directly connected to strong edge pixels are retained; pixels with gradient strength less than $T_{low}$ are directly suppressed. Finally, 8-neighborhood connectivity analysis is used to label the connected regions of the edge pixels remaining after double thresholding, filtering out small noise fragments whose area is below the preset threshold of 50 pixels and filling small breaks in the lesion contour, so as to generate continuous and complete lesion edge contours. The contours obtained after edge detection may still contain a small number of leaf vein texture fragments, residual background edges, and similar artifacts, so contours unrelated to lesions must be screened out morphologically, retaining only connected regions whose aspect ratio and area fall within a preset range determined from a statistical analysis of lesion morphology in the training set. Location-correlation screening is then applied, and contours with an overlap of more than 70% with high-response regions in the first intermediate feature map are further optimized.
Finally, a binarized lesion region contour feature map is generated, in which the contour pixels are white with a value of 1, and the background and non-lesion regions are black with a value of 0. This feature map satisfies three requirements: accuracy, meaning the deviation between the contour and the true boundary position of the lesion is less than 2 pixels; continuity, meaning the contour has no obvious breaks; and purity, meaning the contour contains no redundant interference. During algorithm execution, the algorithm iterates through all channels of the first intermediate feature map pixel by pixel, calculates the gradient change rate of the feature value within the neighborhood of each pixel, and identifies pixels whose gradient change rate exceeds a set threshold as candidate contour points. Then, it performs spatial continuity verification on all candidate contour points, removes isolated candidate points caused by feature noise, and retains the real lesion contour points, which are distributed continuously in space. After that, it connects all verified lesion contour points in order according to their spatial coordinates in the feature map to form a complete lesion region contour line. Pixels covered by the contour line are marked with high feature values, and the feature values of pixels in non-contour regions are attenuated. Finally, it integrates all marked contour information to generate a contour feature map that highlights only the contour of the lesion region.
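
By way of non-limiting illustration, part of the edge detection flow of step 2.2 (channel fusion, Sobel gradient, direction quantization, and adaptive double thresholding with hysteresis) might be sketched as follows. The sketch assumes NumPy and omits the Gaussian smoothing, non-maximum suppression, and morphological screening stages. The threshold formulas `T_high = mean + k * std` and `T_low = T_high / ratio` are assumptions chosen only to honor the 2:1 ratio mentioned above, and all function names are hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """Slide a 3x3 kernel over img (cross-correlation, edge-replicated padding)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def fuse_channels(fmap, weights):
    """Weighted sum of the (C, H, W) key channels, min-max mapped to [0, 1]."""
    fused = np.tensordot(weights, fmap, axes=1)
    span = fused.max() - fused.min()
    return (fused - fused.min()) / span if span > 0 else np.zeros_like(fused)

def gradient(img):
    """Gradient strength and direction quantized to 0, 45, 90 or 135 degrees."""
    gx, gy = filter3x3(img, SOBEL_X), filter3x3(img, SOBEL_Y)
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    direction = (np.round(angle / 45.0).astype(int) % 4) * 45
    return magnitude, direction

def double_threshold(magnitude, k=1.0, ratio=2.0):
    """Adaptive double thresholding; the threshold formulas are assumptions."""
    t_high = magnitude.mean() + k * magnitude.std()
    t_low = t_high / ratio
    strong = magnitude > t_high
    weak = (magnitude >= t_low) & ~strong
    edges = strong.copy()
    h, w = edges.shape
    while True:
        # Keep weak pixels that touch an already accepted edge (8-neighborhood).
        padded = np.pad(edges, 1, mode="constant")
        neighbor = np.zeros_like(edges)
        for dy in range(3):
            for dx in range(3):
                neighbor |= padded[dy:dy + h, dx:dx + w]
        grown = edges | (weak & neighbor)
        if (grown == edges).all():
            return grown.astype(np.uint8)
        edges = grown
```

The hysteresis loop here follows chains of weak pixels outward from strong pixels, as in the conventional Canny formulation; restricting it to a single pass would retain only weak pixels adjacent to a strong pixel.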

[0027] Step 2.3: Based on the contour feature map of the lesion area, calculate the geometric distance from each pixel in the feature map to the nearest contour boundary to generate a distance field feature map; specifically, this includes: first, taking the lesion contour boundary pixel in the contour feature map as the core reference benchmark, determining the calculation range of the distance transformation to the entire pixel area covered by the contour feature map, ensuring that all pixels inside and outside the lesion are included in the calculation. Then, the contour distance transformation algorithm is initiated, traversing the contour feature map pixel by pixel: For each pixel to be calculated, the algorithm retrieves the spatial coordinates of all surrounding contour boundary pixels and calculates the straight-line geometric distance between the pixel and the nearest contour boundary pixel in two-dimensional space. During the distance calculation process, distance information that reflects the internal structure of the lesion is retained first. For example, the distance value from the core region of the lesion to the contour boundary is larger, while the distance value from the transition region of the lesion edge to the contour boundary is smaller. The spatial hierarchical relationship between different regions and the edge inside the lesion is quantified by the difference in distance values. After completing the distance calculation for all pixels, the distance values ​​are normalized to map all distance values ​​to a unified numerical range. Finally, the normalized distance values ​​replace the feature values ​​of the corresponding pixels in the original contour feature map to generate a distance field feature map.
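
By way of non-limiting illustration, the contour distance transformation of step 2.3 might be sketched as a brute-force computation. The sketch assumes NumPy and a binary contour feature map in which contour pixels have the value 1; the function name `distance_field` is hypothetical, and normalization by the maximum distance is one assumed way to map the values to a unified range.

```python
import numpy as np

def distance_field(contour):
    """Normalized Euclidean distance from each pixel to the nearest contour pixel."""
    ys, xs = np.nonzero(contour)               # coordinates of contour pixels
    if ys.size == 0:
        raise ValueError("contour map contains no contour pixels")
    h, w = contour.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Squared distance from every pixel to every contour pixel: shape (H, W, N).
    d2 = (yy[..., None] - ys) ** 2 + (xx[..., None] - xs) ** 2
    dist = np.sqrt(d2.min(axis=-1))            # keep the nearest contour pixel
    peak = dist.max()
    return dist / peak if peak > 0 else dist   # map to [0, 1]
```

Memory use grows with the product of image size and contour length, so this form suits only small feature maps; a library routine such as `scipy.ndimage.distance_transform_edt` would be the usual choice at scale.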

[0028] Step 2.4: The distance field feature map and the first intermediate feature map are weighted and fused to generate an initial feature map that enhances key boundary information. Specifically, this includes: First, dimensional alignment processing is performed on the first intermediate feature map and the distance field feature map: If the size and number of channels of the two feature maps are inconsistent, they are adjusted to be perfectly matched in dimension by upsampling, downsampling, or channel expansion or compression to ensure that the spatial position of each pixel can correspond one-to-one. Then, the weighting coefficients of feature fusion are determined. Higher weight coefficients are assigned to the feature channels corresponding to lesion boundary information, and lower weight coefficients are assigned to the channels of non-key information such as normal leaf texture. After the weight setting is completed, a pixel-by-pixel weighted fusion operation is performed: The feature value of each pixel in the first intermediate feature map is multiplied by the corresponding weight, and the distance value of the same pixel in the distance field feature map is multiplied by the corresponding weight. The results of the two multiplications are summed to obtain the fused feature value. During the fusion process, the underlying texture information in the first intermediate feature map is retained. At the same time, the spatial hierarchy information of the lesion boundary is highlighted by the weighted superposition of the distance field feature map. This enhances the local key information such as the lesion edge morphology and the arrangement of small lesion points in the fused feature. After the fusion operation is completed, a non-linear activation process is performed on the fused feature values ​​of all pixels to further improve the distinguishability between key boundary features and background features. 
Finally, all fused feature values are integrated to generate an initial feature map that enhances the key boundary information.
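
By way of non-limiting illustration, the pixel-by-pixel weighted fusion of step 2.4 might be sketched as follows. The sketch assumes NumPy, a feature map of shape (C, H, W), and a distance field already resampled to (H, W); the weight values are free parameters, ReLU stands in for the unspecified non-linear activation, and the function name `fuse_with_distance_field` is hypothetical.

```python
import numpy as np

def fuse_with_distance_field(features, dist_field, w_feat, w_dist):
    """Weighted sum of a (C, H, W) feature map and an (H, W) distance field.

    w_feat holds one weight per feature channel; w_dist is a scalar or a
    per-channel vector applied to the distance field.
    """
    if features.shape[1:] != dist_field.shape:
        raise ValueError("spatial sizes must match; resample before fusing")
    channels = features.shape[0]
    w_feat = np.asarray(w_feat, dtype=np.float64).reshape(-1, 1, 1)
    w_dist = np.broadcast_to(np.asarray(w_dist, dtype=np.float64), (channels,))
    fused = w_feat * features + w_dist.reshape(-1, 1, 1) * dist_field[None]
    return np.maximum(fused, 0.0)              # ReLU as the assumed activation
```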

[0029] In this embodiment of the invention, a technique is employed to perform secondary edge detection on the primary convolutional feature map to extract the precise contour of lesions, apply a contour line distance transformation algorithm to quantify the internal spatial hierarchy of lesions, and finally perform weighted fusion of the generated geometric distance field with the original feature map. This technique overcomes the technical problems of traditional convolutional neural networks, which are not precise and structured enough in the initial extraction of lesion boundary features, and are difficult to quantify the geometric relationship between the internal diffusion morphology of lesions and the edges, resulting in insufficient recognition ability for lesions with complex shapes or blurred edges. This technique significantly enhances the expression of key boundaries and internal structures of lesions in the feature map, thereby improving the model's accuracy in identifying the shape and structure of pests and diseases.

[0030] In a preferred embodiment of the present invention, step 3 above may include: Step 3.1: The initial feature map that enhances key boundary information is passed through the deep convolutional layer group at the back end of the backbone network. The first group of deep convolutional kernels performs a high-dimensional nonlinear transformation on the initial feature map, extracting and combining more complex feature patterns to generate the first transition feature map. Specifically, this includes: first, adapting the initial feature map that enhances key boundary information to the input specifications of the deep convolutional layer group at the back end of the backbone network, ensuring that the number of channels and size of the feature map match the input requirements of the first group of deep convolutional kernels. This group of deep convolutional kernels has a larger receptive field, capable of covering the entire lesion area, and contains a greater number of convolutional kernel channels, allowing it to capture multiple different types of complex feature patterns simultaneously. During the convolution operation stage, the first group of deep convolutional kernels performs region-by-region sliding convolution on the initial feature map. No longer limited to basic visual features such as low-level texture and edges, this layer focuses on the combination relationships between features. For example, it integrates basic features such as the irregular shape of the lesion edge, the rough texture of the surface, and the spatial hierarchy reflected by the internal distance field to capture more complex feature patterns, such as jagged edges combined with flocculent internal texture, or smooth edges combined with dense micro-lesion points inside. After the convolution operation is completed, the output feature values are immediately subjected to the high-dimensional nonlinear transformation.
During the feature combination process, this layer will perform cross-channel information interaction on the multi-channel features of the convolution output, and fuse the local complex features captured by different channels into an overall feature pattern. For example, the contour distance features of the lesion are associated with the texture features to form a feature expression that can reflect the overall structure of the lesion. The first group of deep convolutional layers integrates all the extracted and combined complex feature patterns into a multi-dimensional feature matrix to generate the first transition feature map.

[0031] Step 3.2: Based on the first transition feature map, iterative convolution, nonlinear activation, and batch normalization operations are performed on it through a deeper convolutional kernel group, progressively abstracting and semantically refining the features to obtain a second transition feature map containing mid-level semantic information. Specifically, the first transition feature map is fed as input into a deeper convolutional kernel group. This group consists of multiple cascaded convolutional layers, each with a larger receptive field than the previous layer, covering a wider feature region and achieving gradual aggregation from local to global features. The first step of each iteration is convolution: each convolutional kernel performs sliding convolution on the feature map output from the previous layer, continuously extracting higher-order correlations between features. For example, from local features such as the serrated edges of lesions and the restriction of lesions by leaf veins, typical feature combinations of bacterial angular leaf spot are extracted; these features carry preliminary semantic information. The second step is nonlinear activation: after each convolution operation, the feature values are nonlinearly mapped through the activation function to enhance the differences between features of different pest and disease categories. For example, the feature value distributions of bacterial angular leaf spot and fungal downy mildew are clearly separated. The third step is batch normalization: the activated feature values are normalized to adjust the feature value distribution to a stable range. The above convolution, nonlinear activation, and batch normalization operations are iterated over multiple rounds.
Each round of iteration will abstract and semantically refine the features: from the initial concrete features such as edge morphology and texture combination, it will be gradually abstracted into mid-level semantic features such as the overall morphological features of lesions and the features of disease infection areas. For example, the intra-class diversity features such as the size change and morphological evolution of lesions from the early to the late stage will be integrated into a unified semantic expression. After multiple rounds of iteration, a second transition feature map that can represent the mid-level semantic information of pests and diseases is obtained.
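
By way of non-limiting illustration, the iterated chain of convolution, nonlinear activation, and normalization in step 3.2 might be sketched as follows. The sketch assumes NumPy, 3×3 kernels, ReLU as the activation, and a single input sample, so the normalization statistics are taken per channel over the spatial positions rather than over a training batch; all function names are hypothetical and the kernels would in practice be learned.

```python
import numpy as np

def conv3x3(x, kernels):
    """x: (C_in, H, W); kernels: (C_out, C_in, 3, 3); zero padding keeps H and W."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((kernels.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            window = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("oc,chw->ohw", kernels[:, :, dy, dx], window)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def normalize_channels(x, eps=1e-5):
    """Per-channel normalization over spatial positions (single-sample stand-in)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def refine(x, kernel_stack):
    """Apply convolution, activation and normalization once per kernel set."""
    for kernels in kernel_stack:
        x = normalize_channels(relu(conv3x3(x, kernels)))
    return x
```

Each additional 3×3 layer in `kernel_stack` widens the effective receptive field by two pixels in each direction, which is one simple way to obtain the layer-by-layer growth in receptive field described above.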

[0032] Step 3.3: Apply pooling to the second transition feature map to reduce its dimensionality while preserving key feature information, generating the third transition feature map. Specifically, this involves: first, determining the pooling method appropriate for pest and disease features. For crop pest and disease identification scenarios, a hybrid pooling strategy combining max pooling and average pooling is preferred: max pooling preserves key detail features such as abrupt changes at lesion edges and minute lesion points, while average pooling preserves the texture distribution features of the overall lesion area, balancing the integrity of both detailed and global features. Then, pooling is performed on the second transition feature map: the pooling window size and sliding step are set to match the feature map size and feature distribution, and the pooling window slides across the second transition feature map region by region according to the set step size. Max pooling takes the maximum feature value in the window as the representative value of the region, and average pooling takes the average feature value in the window as the representative value of the region. The two pooling results are fused according to a preset ratio and used as the final pooling output value. During the pooling process, the representative feature value in the window replaces all feature values in the original window. After the pooling operation is completed, a third transition feature map with reduced dimensionality, prominent core features, and enhanced spatial invariance is generated.
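
By way of non-limiting illustration, the hybrid pooling of step 3.3 might be sketched as follows. The sketch assumes NumPy and a feature map of shape (C, H, W); the parameter `alpha`, which stands for the preset fusion ratio, and the function name `hybrid_pool` are hypothetical.

```python
import numpy as np

def hybrid_pool(fmap, window=2, stride=2, alpha=0.5):
    """Blend max pooling (weight alpha) and average pooling (weight 1 - alpha)."""
    c, h, w = fmap.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    out = np.empty((c, out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = fmap[:, top:top + window, left:left + window]
            out[:, i, j] = (alpha * patch.max(axis=(1, 2))
                            + (1.0 - alpha) * patch.mean(axis=(1, 2)))
    return out
```

With `alpha` close to 1 the output favors isolated strong responses such as minute lesion points, whereas values close to 0 favor the overall texture distribution.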

[0033] Step 3.4: The third transition feature map is passed through the final convolutional layer of the deep convolutional layer group to extract the high-level semantic features most relevant to the disease and pest category discrimination, resulting in a deep feature map containing high-level semantic information. Specifically, the third transition feature map is first input into the final convolutional layer of the deep convolutional layer group. The convolutional kernels of this layer have been trained and optimized with a large number of labeled crop disease and pest samples. The weight parameters of the convolutional kernels are specifically adjusted for disease and pest category discrimination, which can accurately capture the feature information most relevant to the category discrimination. The number of convolutional kernels in the final convolutional layer matches the number of disease and pest categories to be identified. Each convolutional kernel corresponds to the core feature pattern of a disease and pest category. For example, the convolutional kernel for bacterial angular leaf spot focuses on the polygonal shape of the lesion restricted by leaf veins. For fungal downy mildew, the convolution kernel focuses on key features such as the absence of obvious leaf veins in the lesions and the frosty texture on the surface. During the convolution operation, each convolution kernel scans the third transition feature map region by region, selects and extracts feature information that highly matches the corresponding pest and disease category, and filters out redundant features that are irrelevant to category discrimination. After feature extraction, the feature values ​​output by all convolution kernels are integrated, and the scattered local high-level semantic features are aggregated into global high-level semantic features. 
For example, features such as polygonal lesions, leaf vein restriction, and glossiness are integrated into high-level semantic features of bacterial angular leaf spot, forming a feature expression that can directly support the discrimination of pest and disease categories, and finally generating a deep feature map containing high-level semantic information.

[0034] In this embodiment of the invention, a series of deep feature learning techniques are employed, including hierarchical nonlinear abstraction and combination of deep convolutional layers, iterative convolution and activation and batch normalization operation chains, pooling dimensionality reduction and spatial invariance enhancement, and discriminative semantic feature extraction of terminal convolutional layers. These techniques overcome the technical problems of traditional methods, such as insufficient semantic understanding of complex morphological and textural patterns of pests and diseases, oversensitivity of feature representation to subtle spatial changes, and difficulty in automatically focusing on the most discriminative high-level semantic information from massive features. As a result, the invention achieves the construction of deep feature representations containing rich high-level semantic information and significantly improves the model's ability to understand and distinguish complex lesion patterns.

[0035] In a preferred embodiment of the present invention, step 4 above may include: Step 4.1 involves obtaining a deep feature map containing advanced semantic information and an initial feature map that enhances key boundary information. Specifically, this includes: first, accurately retrieving the generated deep feature map containing advanced semantic information and the generated initial feature map that enhances key boundary information from the feature storage module of the cloud processor. During retrieval, the unique identifiers and generation timestamps of the two feature maps are verified to ensure that the retrieved feature maps are generated from the same original image of crop pests and diseases and under the same processing flow. Then, basic validity verification is performed on the two feature maps: checking the integrity of the advanced semantic features of the deep feature map to confirm that it contains core features strongly correlated with pest and disease classification; checking the integrity of the key boundary information of the initial feature map to confirm that the low-level spatial features such as the contour and distance field of lesions have not been lost or distorted. Simultaneously, the data format and numerical range of the two feature maps are unified to ensure that the feature value quantization standards are consistent. After completing the reading, verification, and format unification of the two feature maps, they are loaded into the feature processing memory of the cloud processor, awaiting size alignment.

[0036] Step 4.2: Perform upsampling based on the initial feature map, aligning the spatial dimensions of the initial feature map with those of the deep feature map to generate a size-matched initial feature map. Specifically, this involves: first, reading the spatial dimension parameters of the deep feature map and setting this dimension as the target size for upsampling the initial feature map; clarifying the core standard for size alignment to ensure a one-to-one correspondence between pixels in the spatial dimension of the two feature maps, avoiding loss of the correspondence between lesion boundaries and semantic features; and selecting an upsampling method suitable for crop pest and disease features for the initial feature map that strengthens key boundary information: prioritizing a hybrid upsampling strategy combining bilinear interpolation and transposed convolution. Bilinear interpolation ensures the smoothness of the overall spatial structure of the feature map, while transposed convolution accurately restores the subtle contour features of the lesion edges. Specifically, the initial feature map is first upsampled using a bilinear interpolation algorithm. The initial size is enlarged, and the feature value of each newly added pixel is calculated. Then, the interpolated feature map is finely adjusted through a transposed convolutional layer. The weight of the transposed convolutional kernel is set according to the distribution law of the boundary features of disease lesions. The focus is on restoring key details such as the small lesion points and jagged edge morphology of the lesions in the initial feature map, ensuring that the bottom boundary features are not lost or distorted during the upsampling process. After upsampling, the size of the generated feature map is verified to confirm that its height and width pixel values ​​are completely consistent with the deep feature map. 
At the same time, the core boundary features of the upsampled initial feature map and the original initial feature map are verified by feature value similarity comparison, such as the matching degree of lesion contour and distance field level, to ensure that the size enlargement process does not destroy the expression logic of the original features. Finally, an initial feature map with a size match is generated that is completely aligned with the spatial size of the deep feature map and retains the boundary features completely.
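
By way of non-limiting illustration, the bilinear interpolation stage of the size alignment in step 4.2 might be sketched as follows. The sketch assumes NumPy, a feature map of shape (C, H, W), and corner-aligned sampling; the subsequent transposed-convolution refinement relies on learned weights and is omitted, and the function name `bilinear_resize` is hypothetical.

```python
import numpy as np

def bilinear_resize(fmap, out_h, out_w):
    """Resize a (C, H, W) map to (C, out_h, out_w) by bilinear interpolation."""
    _, h, w = fmap.shape
    ys = np.linspace(0, h - 1, out_h)          # source row for each output row
    xs = np.linspace(0, w - 1, out_w)          # source column for each output column
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]              # vertical interpolation weights
    wx = (xs - x0)[None, None, :]              # horizontal interpolation weights
    rows0, rows1 = fmap[:, y0], fmap[:, y1]
    top = rows0[:, :, x0] * (1 - wx) + rows0[:, :, x1] * wx
    bottom = rows1[:, :, x0] * (1 - wx) + rows1[:, :, x1] * wx
    return top * (1 - wy) + bottom * wy
```

The size check described above then reduces to comparing the last two dimensions of the result with those of the deep feature map.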

[0037] Step 4.3: Concatenate the initial feature map and the deep feature map along the channel dimension, then perform channel fusion and dimensionality reduction on the concatenated result to generate a preliminary fused feature map. The specific process is as follows: First, the channel-dimension concatenation is performed: the size-matched initial feature map and the deep feature map are merged along the channel dimension. The initial feature map retains multi-channel information of low-level visual features such as lesion boundaries and textures, while the deep feature map retains multi-channel information of high-level abstract features such as pest and disease category semantics. After concatenation, a high-dimensional feature matrix is formed that contains both low-level boundary feature channels and high-level semantic feature channels, and thus encompasses both the subtle spatial structure information of lesions and the semantic information related to category discrimination. Channel fusion and dimensionality reduction are then performed by convolution: a weighted summation is performed on all channel feature values corresponding to each pixel. This weighting process automatically uncovers the correlation between low-level boundary features and high-level semantic features, for example the association between the combination of serrated edges and leaf vein constraints and bacterial angular leaf spot, and integrates complementary feature information scattered across different channels into unified channel features. Finally, the feature values output by the convolution are range-calibrated to ensure a stable feature value distribution, generating a preliminary fused feature map that combines low-level boundary details and high-level semantic information.
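
By way of non-limiting illustration, the channel concatenation and per-pixel weighted summation of step 4.3 might be sketched as follows. The sketch assumes NumPy and treats the weighted summation as a pointwise (1×1) convolution, which is an interpretation rather than a detail stated in the text; the range calibration is omitted, the weights would in practice be learned, and the function name `concat_and_reduce` is hypothetical.

```python
import numpy as np

def concat_and_reduce(initial, deep, weights, bias=None):
    """Concatenate two maps on the channel axis and mix channels per pixel.

    initial: (C1, H, W); deep: (C2, H, W); weights: (C_out, C1 + C2).
    """
    if initial.shape[1:] != deep.shape[1:]:
        raise ValueError("spatial sizes must match before concatenation")
    stacked = np.concatenate([initial, deep], axis=0)       # (C1 + C2, H, W)
    fused = np.einsum("oc,chw->ohw", weights, stacked)      # weighted channel sum
    if bias is not None:
        fused = fused + np.asarray(bias)[:, None, None]
    return fused
```

Choosing `C_out` smaller than `C1 + C2` gives the dimensionality reduction described above.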

[0038] Step 4.4: Nonlinear activation and feature recalibration operations are applied to the preliminary fused feature map, weighting and fusing complementary information from different levels to construct a unified multi-level deep feature map containing information at different scales and abstraction levels. Specifically, this includes: First, nonlinear activation is performed on the preliminary fused feature map using an activation function adapted to the distribution of crop pest and disease features. This function enhances the differences between features of different pest and disease categories while suppressing noise interference in the feature values. During the activation operation, a nonlinear mapping is applied to the feature values of each pixel in the preliminary fused feature map, amplifying the feature value signals of key lesion features and weakening the signals of irrelevant features such as normal leaf areas and background residues. Next, feature recalibration is performed, using a channel attention mechanism to achieve adaptive weighting: global average pooling is first applied to the activated feature map to extract the global feature statistics of each channel, which are used to measure the importance of different channels for disease and pest identification. For example, channels containing lesion boundary information and category semantic information are identified as high-importance channels, while channels containing redundant textures are identified as low-importance channels. Then, the importance of the channels is quantified through a fully connected layer to generate a weight coefficient for each channel. High-importance channels are given higher weights, and low-importance channels are given lower weights.
Finally, the weight coefficients are applied to the activated feature map channel by channel to adjust the feature values of each channel: the key features of lesions in high-weight channels are further enhanced, and the redundant information in low-weight channels is suppressed. On this basis, all weighted channel features are integrated to construct a unified multi-level deep feature map containing bottom-level lesion boundary features, mid-level spatial hierarchy features, and high-level category semantic features.
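
By way of non-limiting illustration, the activation and channel-attention recalibration of step 4.4 might be sketched as follows. The sketch assumes NumPy, ReLU as the activation, a single fully connected layer, and a sigmoid to map the layer output to per-channel weights in (0, 1); the sigmoid, the parameter names `fc_weight` and `fc_bias`, and the function name `channel_recalibrate` are assumptions, and the layer parameters would in practice be learned.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_recalibrate(fmap, fc_weight, fc_bias):
    """Reweight the channels of a (C, H, W) map by their global statistics.

    fc_weight: (C, C) and fc_bias: (C,) define the fully connected layer.
    Returns the recalibrated map and the per-channel weights.
    """
    activated = np.maximum(fmap, 0.0)                  # nonlinear activation
    stats = activated.mean(axis=(1, 2))                # global average pooling
    scale = sigmoid(fc_weight @ stats + fc_bias)       # channel importance
    return activated * scale[:, None, None], scale
```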

[0039] In this embodiment of the invention, a series of feature fusion techniques are employed, including upsampling to align feature map spatial dimensions, channel splicing feature fusion, and adaptive weighted integration of nonlinear activation and feature recalibration. These techniques overcome the technical problems in traditional neural networks, such as the difficulty in effectively combining deep semantic features and shallow detail features due to differences in scale and abstraction level, feature redundancy and interference caused by simple splicing of multi-source information, and the inability to adaptively highlight key level information. As a result, a unified deep feature representation rich in multi-scale details and high-level semantic information is constructed, significantly enhancing the model's ability to collaboratively perceive the overall morphology and local details of pests and diseases.

[0040] In a preferred embodiment of the present invention, step 5 above may include: Step 5.1: Receive the constructed unified multi-level deep feature map and, through two parallel paths of the spatial attention mechanism, perform global average pooling and global max pooling operations on the unified multi-level deep feature map in the channel dimension to generate a dual-channel feature description containing global spatial context information. Specifically, this includes: receiving the constructed unified multi-level deep feature map, which has fused the bottom-level lesion boundaries, the middle-level spatial hierarchy, and the high-level category semantic features; first, performing a full-dimensional integrity check to confirm that the number of channels, spatial size, and feature value distribution of the feature map are all normal, and that there is no feature loss or distortion. After passing the check, the feature map is fed synchronously into two independent parallel paths of the spatial attention mechanism. The two paths use identical computational precision and memory configuration and differ only in the preset pooling operation type, and the spatial pixel positions and channel order of the feature map are kept in strict correspondence between the two paths. In the first parallel path, the unified multi-level deep feature map undergoes global average pooling along the channel dimension. Using the entire feature map as a single pooling window, the average value of all pixel feature values is calculated for each channel, capturing the global spatial distribution pattern of pest and disease features in the feature map, such as the proportion of lesions within the overall leaf area and the overall trend of feature value changes. This overcomes the shortcoming of traditional hand-crafted feature extraction, which only focuses on local statistics. In the second parallel path, global max pooling is performed on the same feature map along the channel dimension. Again, using the entire feature map as the pooling window, the maximum feature value among all pixels is selected for each channel, accurately capturing local extreme features that play a key role in pest and disease identification, such as highly discriminative features in the core area of lesions and feature peaks at abrupt changes at the edge of lesions. After the pooling operations of the two paths are completed, the global mean feature channel obtained by global average pooling and the global extreme feature channel obtained by global max pooling are concatenated to form a dual-channel feature description containing global spatial context information.
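
A minimal sketch of the dual-channel description in Step 5.1 is given below. It assumes one particular reading of "pooling in the channel dimension": the mean and the maximum are taken across channels at each pixel, so that spatial positions survive for the per-location weight map of Step 5.2. That reading, the (C, H, W) array layout, and the helper name `dual_channel_description` are assumptions for illustration only.

```python
import numpy as np

def dual_channel_description(feature_map):
    """Build a (2, H, W) description from a (C, H, W) feature map.

    Assumption: pooling runs across the channel axis at each pixel,
    so the spatial layout is preserved.
    """
    avg_map = feature_map.mean(axis=0)   # mean over channels -> (H, W)
    max_map = feature_map.max(axis=0)    # max over channels  -> (H, W)
    return np.stack([avg_map, max_map], axis=0)

feature_map = np.arange(24, dtype=float).reshape(2, 3, 4)  # toy input with C=2
description = dual_channel_description(feature_map)
```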

[0041] Step 5.2: Based on the dual-channel feature description, the relative importance of each spatial location to pest and disease identification is calculated through convolutional layers and nonlinear activation functions to generate an initial spatial attention weight distribution map. Specifically, this includes: inputting the generated dual-channel feature description into a convolutional layer without loss of spatial dimensional information. The convolution focuses only on the information interaction and fusion between the two channels, mining the intrinsic correlation between the global mean features and the global extreme features, for example, the correlation between the lesion core region features corresponding to local extremes and the global mean background. After the convolution operation, the output fused feature values are input to the nonlinear activation function. This function is specifically adapted to the feature distribution characteristics of crop pests and diseases and amplifies the differences in feature values at different spatial locations. Spatial locations highly relevant to pests and diseases, such as the core region of lesions, the edge morphology region of lesions, and the arrangement region of small lesion points, are identified and their feature values are strengthened. For irrelevant spatial locations such as normal leaf areas and residual background areas, the feature values are weakened to further improve the distinction between effective and ineffective features. Subsequently, the cloud processor traverses each spatial location of the activated fused feature map and calculates the relative importance value of each pixel location to the pest category, based on the core requirements of pest and disease identification. The importance value is positively correlated with the feature value intensity at that location. For example, the vein-limited polygonal area of bacterial angular leaf spot and the frosty texture core area of downy mildew are assigned high importance values, while smooth, spotless areas of the leaf are assigned low importance values. Finally, the importance values of all spatial locations are arranged in the order of their original spatial coordinates to generate an initial spatial attention weight distribution map.
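
The convolution and activation of Step 5.2 can be illustrated with a small NumPy sketch. It assumes a single odd-sized kernel with zero padding, so the output keeps the input's spatial size, and a leaky ReLU as the nonlinear activation; the text names neither the kernel size nor the activation function, and `initial_attention_map` is a hypothetical helper.

```python
import numpy as np

def initial_attention_map(description, kernel, bias=0.0, slope=0.1):
    """Convolve a (2, H, W) description with a (2, k, k) kernel.

    Zero padding keeps the (H, W) size (k is assumed odd). The result
    is an unnormalized importance value per spatial location.
    Assumption: leaky ReLU activation with the given negative slope.
    """
    _, h, w = description.shape
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(description, ((0, 0), (pad, pad), (pad, pad)))
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            window = padded[:, i:i + k, j:j + k]
            out[i, j] = np.sum(window * kernel) + bias   # fuse both channels
    return np.where(out > 0, out, slope * out)

description = np.ones((2, 5, 5))   # toy dual-channel description
kernel = np.ones((2, 3, 3))        # placeholder kernel
attention_map = initial_attention_map(description, kernel)
```

In this toy input the interior positions see a full 3x3 window in both channels, while border positions see fewer nonzero values because of the padding.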

[0042] Step 5.3 involves normalizing the initial spatial attention weight distribution map to generate a spatial attention weight matrix. This includes receiving the generated initial spatial attention weight distribution map and performing normalization to ensure a unified quantification standard for the weight values. This normalization uses a non-linear normalization method adapted to the attention mechanism. Its core function is to uniformly map the weight values in the initial weight distribution map, which have no fixed range, to a value range of 0 to 1, while strictly maintaining the relative importance relationship between different spatial locations. In the specific calculation, the weight value of each spatial location in the initial weight distribution map is adjusted according to a preset non-linear mapping rule. The key lesion regions corresponding to the initial high-importance values are mapped to normalized values close to 1; the irrelevant regions corresponding to the initial low-importance values are mapped to normalized values close to 0. After normalization, the cloud processor performs double verification on the results. Firstly, it confirms that all weight values are strictly within the 0 to 1 range, with no abnormal values exceeding the range. Secondly, it verifies that the spatial dimensions of the generated spatial attention weight matrix are completely consistent with the unified multi-level deep feature map received in Step 5.1, ultimately generating a standardized spatial attention weight matrix.
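
The normalization and double verification of Step 5.3 might look like the sketch below. The sigmoid is an assumption: the text only requires a nonlinear mapping into the 0 to 1 range that preserves the relative order of the weights, and a sigmoid is one monotonic function with that property. The helper name `normalize_attention` is hypothetical.

```python
import numpy as np

def normalize_attention(initial_map, feature_shape):
    """Map unbounded importance values into (0, 1) with a sigmoid
    (assumed), then run the two checks described in Step 5.3."""
    weights = 1.0 / (1.0 + np.exp(-initial_map))
    # Check 1: every weight lies within the 0 to 1 range.
    if not np.all((weights >= 0.0) & (weights <= 1.0)):
        raise ValueError("attention weight outside [0, 1]")
    # Check 2: spatial size matches the feature map from Step 5.1.
    if weights.shape != tuple(feature_shape[-2:]):
        raise ValueError("attention matrix and feature map sizes differ")
    return weights

initial_map = np.array([[4.0, -4.0], [0.0, 1.0]])   # toy importance values
weights = normalize_attention(initial_map, feature_shape=(8, 2, 2))
```

Because the sigmoid is strictly increasing, a location that ranked higher before normalization still ranks higher afterwards.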

[0043] Step 5.4 involves element-wise multiplication of the spatial attention weight matrix and the unified multi-level deep feature map to weight and enhance the features of local regions in the feature map that are highly relevant to pest and disease identification, generating a reweighted and optimized feature map. Specifically, this includes: retrieving the generated spatial attention weight matrix and the initially received unified multi-level deep feature map; firstly, performing precise spatial alignment between the two; and secondly, calibrating the pixel coordinates to ensure that the weight value at each position in the weight matrix corresponds one-to-one with the feature value of the pixel at the same coordinate in the feature map. After alignment, element-wise multiplication is performed, iterating through all pixels in the feature map and multiplying the original feature value of each pixel by the normalized weight value at the corresponding position in the weight matrix. For local regions highly relevant to pest and disease identification, for example, the serrated morphology of lesion edges, the arrangement of tiny lesion points, and the glossy areas with significant inter-class differences, the feature values are effectively preserved after multiplication because their corresponding weight values are close to 1, and are thereby enhanced relative to the suppressed regions. This significantly improves the signal intensity of key local pathological information that is difficult to capture by traditional methods. For irrelevant areas such as normal leaf texture areas and background residue areas without pathological information, the corresponding weight values are close to 0, and the feature values are significantly suppressed or even approach 0 after multiplication. Throughout the weighting process, the focus is on solving the core problem of the insufficient fine-grained recognition ability of traditional methods: key features such as lesion edge morphology, the arrangement of tiny lesion points, and the positional relationship between lesions and leaf veins are emphasized. After the calculation is completed, all weighted pixel feature values are integrated to generate a reweighted optimized feature map.
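
The element-wise reweighting of Step 5.4 reduces to a broadcast multiplication. The sketch below uses toy values and assumes a (C, H, W) feature map with an (H, W) weight matrix; the array layout is an assumption for illustration.

```python
import numpy as np

feature_map = np.ones((3, 2, 2))                 # toy (C, H, W) feature map
attention = np.array([[1.0, 0.0], [0.5, 0.9]])   # toy (H, W) weight matrix

# Broadcasting applies the same spatial weight to every channel at
# the matching pixel coordinate.
reweighted = feature_map * attention[None, :, :]
```

A weight of 1 leaves the pixel's features unchanged and a weight of 0 removes them, so the map can preserve or attenuate features but never increases their absolute values.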

[0044] Step 5.5 involves performing a global average pooling operation on the reweighted optimized feature map to compress and aggregate the spatial dimension, resulting in an optimized feature vector. Specifically, this includes receiving the generated reweighted optimized feature map and performing a global average pooling operation on it to compress the spatial dimension and aggregate features, adapting it to the input requirements of the subsequent fully connected classifier. During the operation, the entire reweighted optimized feature map is used as a single pooling window, covering all spatial pixel regions of the feature map. The average of the feature values of all pixels is calculated for each channel, compressing each two-dimensional channel map into a single value and the whole feature map into a one-dimensional vector while retaining the core feature information corresponding to each channel. During the pooling process, key features after weighted optimization are strictly preserved, such as the enhanced lesion edge features and the channel mean values corresponding to local subtle pathological features, ensuring that the aggregated feature vector can still accurately represent the subtle inter-class differences and intra-class diversity of pests and diseases. After the pooling operation is completed, the generated one-dimensional vector is subjected to dimension verification and numerical calibration to confirm that the vector dimension meets the input requirements of the subsequent fully connected classifier, and that the feature value distribution is stable and without abnormal fluctuations, ultimately yielding the optimized feature vector.
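
Global average pooling as described in Step 5.5 can be sketched in a few lines. The finiteness check is a simple stand-in for the "numerical calibration" mentioned in the text, whose exact form is not specified; `global_average_pool` is a hypothetical helper name.

```python
import numpy as np

def global_average_pool(feature_map):
    """Collapse a (C, H, W) map to a length-C vector of channel means."""
    vector = feature_map.mean(axis=(1, 2))
    if not np.all(np.isfinite(vector)):   # stand-in for the numerical check
        raise ValueError("non-finite value in pooled vector")
    return vector

reweighted = np.arange(12, dtype=float).reshape(3, 2, 2)   # toy input
optimized_vector = global_average_pool(reweighted)
```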

[0045] In this embodiment of the invention, a series of collaborative attention mechanism techniques are employed, including dual-channel global context feature description based on global average pooling and global max pooling, calculation of spatial location importance through convolutional layers, generation of spatial attention weight matrix through nonlinear normalization, and element-wise weighted enhancement and global pooling compression. These techniques overcome the technical problems in traditional pest and disease identification methods, such as confusion between key discrimination regions and background regions in deep feature maps, the submergence of subtle pathological features by global average information, and the inability of the model to adaptively focus on the core morphology and edge details of lesions. As a result, the model can accurately enhance the local features most relevant to pest and disease identification, significantly improve the discriminability and robustness of feature maps, and output highly condensed and discriminative optimized feature vectors.

[0046] In a preferred embodiment of the present invention, step 6 above may include: Step 6.1: Input the optimized feature vector into the first fully connected layer of the fully connected classifier and map it to a high-dimensional classification feature space through a linear transformation to generate a primary classification feature vector. Specifically, this includes: First, in the classification operation module of the cloud processor, receiving the output reweighted optimized feature map and flattening it into a one-dimensional optimized feature vector, completing the format conversion from two-dimensional features to a one-dimensional vector to ensure compatibility with the input requirements of the fully connected classifier. Then, performing dimensionality verification and numerical normalization on the optimized feature vector to confirm that the vector dimension exactly matches the input dimension of the first fully connected layer and that the feature value range is stable. Finally, inputting the verified optimized feature vector into the first fully connected layer of the fully connected classifier. The layer contains a pre-trained and optimized weight matrix and bias term: the number of rows in the weight matrix is consistent with the dimension of the optimized feature vector, and the number of columns corresponds to the dimension of the high-dimensional classification feature space. This dimension is set according to the preset number of pest and disease categories and the feature complexity, so that it covers all feature dimensions required for pest and disease category discrimination. In the operation process, the optimized feature vector is first multiplied by the weight matrix to explore the linear correlation between the dimensions of the feature vector; the result is then added to the bias term to adjust the overall offset of the feature values. In this way, the optimized feature vector is linearly mapped from the original feature space to a high-dimensional classification feature space specifically used for category discrimination, generating a primary classification feature vector with higher dimension and richer feature correlation.
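
The linear transformation of Step 6.1 can be written out with the row and column convention stated in the text: weight rows match the input vector, weight columns match the target feature space. The parameter values below are placeholders and `linear_layer` is a hypothetical helper; a trained model would supply its own weights.

```python
import numpy as np

def linear_layer(x, weight, bias):
    """Compute x @ weight + bias for a vector x of length d_in.

    weight has shape (d_in, d_out): rows match the input vector,
    columns match the dimension of the target feature space.
    """
    if x.shape[0] != weight.shape[0]:
        raise ValueError("input dimension does not match weight rows")
    return x @ weight + bias

x = np.array([1.0, 2.0])                   # toy optimized feature vector
weight = np.array([[1.0, 0.0, -1.0],
                   [0.5, 1.0, 2.0]])       # placeholder parameters
bias = np.array([0.1, 0.2, 0.3])
primary = linear_layer(x, weight, bias)    # maps 2 dimensions to 3
```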

[0047] Step 6.2 involves applying a nonlinear activation function to the primary classification feature vector to obtain the activated classification feature vector. Specifically, this includes: first, selecting a nonlinear activation function suitable for crop pest and disease classification scenarios. This function must enhance subtle differences between classes and suppress invalid feature noise, effectively capturing the nonlinear feature differences between visually similar diseases such as bacterial angular leaf spot and fungal downy mildew. The primary classification feature vector is then input element-by-element into this activation function for nonlinear transformation. For feature dimensions in the vector that are highly correlated with pest and disease classification, such as the dimensions corresponding to the degree to which lesions are restricted by leaf veins or to surface gloss, the activation function amplifies the intensity of their feature value signals, strengthening the distinction between different categories. For redundant feature dimensions in the vector that have no discriminative value, or feature dimensions corresponding to normal leaf texture, the activation function weakens their feature value signals, or even maps invalid feature values to 0. The activation overcomes the limitations of linear transformation: a purely linear transformation cannot capture the nonlinear correlation between lesion features, while nonlinear activation can mine such correlations. After element-by-element activation, the activated classification feature vector is generated.

[0048] Step 6.3: Based on the activated classification feature vector, nonlinear feature transformation and high-order semantic integration are performed through the second fully connected layer of the fully connected classifier to generate a high-level discriminative feature vector. Specifically, the activated classification feature vector is input into the second fully connected layer of the fully connected classifier. The second fully connected layer first performs a linear transformation on the activated classification feature vector, that is, it multiplies the vector by the weight matrix and adds the bias term to expand the expression dimension of the features. Then, the built-in nonlinear activation function applies a nonlinear mapping to the linear transformation result to further enhance the nonlinear discriminative ability of the features. The core operation focuses on high-order semantic integration: cross-dimensional integration of feature information scattered in different dimensions, such as the shape, texture, spatial hierarchy, and gloss features of lesions, to explore the high-order correlation between features. For example, feature dimensions such as leaf vein restriction, polygonal outline, and glossy surface are integrated into the core discriminative feature of bacterial angular leaf spot, and feature dimensions such as no leaf vein restriction, frosty texture, and blurred edges are integrated into the core discriminative feature of fungal downy mildew, finally generating a high-level discriminative feature vector.
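
Steps 6.1 to 6.3 together form a small two-layer forward pass, sketched below. ReLU is assumed for both activations because the text does not name the function, and all parameter values are placeholders. `hidden_forward` is a hypothetical helper name.

```python
import numpy as np

def relu(v):
    """Element-wise ReLU: negative entries are mapped to 0."""
    return np.maximum(v, 0.0)

def hidden_forward(x, w1, b1, w2, b2):
    """Linear, activation, linear, activation (ReLU assumed)."""
    activated = relu(x @ w1 + b1)       # Steps 6.1 and 6.2
    return relu(activated @ w2 + b2)    # Step 6.3

x = np.array([1.0, -1.0])               # toy optimized feature vector
w1 = np.array([[1.0, -1.0],
               [0.0, 1.0]])             # placeholder parameters
b1 = np.zeros(2)
w2 = np.array([[2.0],
               [3.0]])
b2 = np.array([-1.0])
high_level = hidden_forward(x, w1, b1, w2, b2)
```

In this toy case the second hidden unit receives a negative pre-activation and is zeroed by the ReLU, which is the "maps invalid feature values to 0" behaviour described in Step 6.2.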

[0049] Step 6.4: Pass the high-level discriminative feature vector through the output layer of the fully connected classifier, and calculate its matching degree with each preset pest and disease category through linear transformation to generate the original score vector for each category. Specifically, this includes: inputting the high-level discriminative feature vector into the output layer of the fully connected classifier. The weight matrix of this layer has been specifically optimized: the number of columns in the matrix is exactly the same as the number of preset pest and disease categories, and each column of weight parameters corresponds to the core discriminative feature pattern of one pest and disease category. During the operation, a linear transformation is applied to the high-level discriminative feature vector using the output layer weight matrix. The result of the linear transformation directly reflects the original matching degree between the feature vector and each preset pest and disease category: the better the feature vector matches the core discriminative feature pattern of a given pest and disease category, the higher the value of the transformation result for that category; if the matching degree is low, the value is lower. The matching degree values of all preset pest and disease categories are arranged in category order to generate the original score vector, in which each element corresponds to the original matching score of one preset pest and disease category. Each value is a raw value without normalization. It only reflects the original matching degree between the feature vector and the category, and does not reflect the relative probability relationship between different categories. For example, the value for bacterial angular leaf spot in the original score vector is 8.5, and the value for fungal downy mildew is 1.2. This shows that the original matching degree between the feature vector and bacterial angular leaf spot is much higher than that for downy mildew, but it cannot directly reflect the probability ratio of the two.

[0050] Step 6.5 involves applying a normalized exponential function to the original score vector, transforming the scores of each category into a probability distribution to obtain the probability of the optimized feature vector belonging to each preset pest and disease category. Specifically, this includes: first, inputting the original score vector into the normalized exponential function, converting the unbounded original scores into probability values between 0 and 1, with the sum of the probability values for all pest and disease categories being 1, forming an intuitive category attribution probability distribution. The calculation process consists of two steps: the first step performs an exponential operation on each element in the original score vector, amplifying the differences between the scores of different categories. Categories with higher original scores show a greater increase in value after the exponential operation, further highlighting the advantage of the leading categories; the second step divides the exponentiated value of each category by the sum of the exponentiated values of all categories to complete the normalization. For example, in the original score vector, bacterial angular leaf spot is scored 8.5 and downy mildew is scored 1.2, which, after the exponential operation, become $e^{8.5}$ and $e^{1.2}$ respectively; each is then divided by the sum of the exponentiated values of all categories to yield the corresponding probability values for the two categories. After processing by this function, the original score vector is transformed into a probability distribution vector.
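
The two-step calculation in Step 6.5 is the standard softmax function. The sketch below applies it to the two scores quoted in the text, under the simplifying assumption that these are the only two categories; the real score vector would have one entry per preset category. Subtracting the maximum score before exponentiating is a common numerical-stability measure and does not change the result.

```python
import math

def softmax(scores):
    """Normalized exponential function over a list of raw scores."""
    shift = max(scores)                            # stability shift
    exps = [math.exp(s - shift) for s in scores]   # step 1: exponentiate
    total = sum(exps)
    return [e / total for e in exps]               # step 2: normalize

raw_scores = [8.5, 1.2]   # angular leaf spot, downy mildew (from the text)
probabilities = softmax(raw_scores)
```

With only these two categories the score gap of 7.3 translates into a probability above 0.999 for bacterial angular leaf spot, which shows how the exponential step magnifies differences between raw scores.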

[0051] Step 6.6: Based on the probability distribution, select the category with the highest probability value to obtain the intelligent image recognition result for crop diseases and pests. Specifically, this involves: first, traversing all elements in the probability distribution vector, comparing the probability values one by one, and determining the highest probability value and its corresponding disease or pest category. This category is the preliminary recognition result. If multiple categories share the same maximum probability value, the core discriminant features related to these categories in the feature vector are further retrieved, such as the leaf vein restriction feature of angular leaf spot and the frost-like texture feature of downy mildew. The feature matching confidence is compared, and the category with the higher confidence is selected as the final judgment result. The system then performs a confidence check on the highest probability value: if the probability value is higher than the preset confidence threshold (set according to the field recognition accuracy requirements, such as 90%), the category is confirmed as a valid recognition result; if it is lower than the threshold, a "cannot be accurately identified" message is output, along with the top three candidate categories and their corresponding probability values for manual review. Finally, the system generates and outputs the intelligent image recognition result for crop diseases and pests: the result includes the core information, namely the name of the identified disease or pest category and the corresponding probability value, and may include auxiliary information such as key discriminative features. For example, if the disease is identified as bacterial angular leaf spot, the core basis is that the lesions are polygonal due to leaf vein restriction and have a glossy surface.
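
The decision logic of Step 6.6 can be sketched as below. The 0.9 default mirrors the 90% example in the text, and the comparison is strict because the text says "higher than". The tie-breaking by feature matching confidence is omitted, since the text does not say how that confidence is computed. The function name `decide`, the result dictionary layout, and the last two category labels are hypothetical.

```python
def decide(probabilities, labels, threshold=0.9):
    """Pick the most probable category, or defer to manual review.

    Tie-breaking by feature-matching confidence is not implemented
    in this sketch.
    """
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    best_label, best_prob = ranked[0]
    if best_prob > threshold:
        return {"status": "identified",
                "category": best_label,
                "probability": best_prob}
    # Below the threshold: report the top three candidates for review.
    return {"status": "cannot be accurately identified",
            "candidates": ranked[:3]}

labels = ["bacterial angular leaf spot", "downy mildew",
          "powdery mildew", "healthy"]          # last two are hypothetical
confident = decide([0.95, 0.03, 0.01, 0.01], labels)
uncertain = decide([0.50, 0.30, 0.15, 0.05], labels)
```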

[0052] In this embodiment of the invention, a series of classifier techniques are employed, including alternating linear and nonlinear transformations of multi-layer fully connected layers, integration of high-order semantic features, linear calculation of category matching degree, and probabilistic normalization and decision selection. These techniques overcome the technical problems of complex mapping relationships between deep features and final classification targets that are difficult to fit directly, limited ability of simple classifiers to distinguish patterns of complex features, and lack of intuitive probabilistic interpretation of model output. As a result, the optimized deep features are accurately mapped to clear category discrimination results, the model's ability to distinguish complex and multi-category pests and diseases and the reliability of decision-making are significantly improved, and finally, interpretable recognition results with clear confidence are output.

[0053] As shown in Figure 2, embodiments of the present invention also provide an intelligent image recognition system for crop diseases and pests based on convolutional neural networks, comprising: an acquisition module, used to acquire original images of crops containing symptoms of the pests and diseases to be identified, and to preprocess the images to obtain preprocessed images; a processing module, used to pass the preprocessed image through a pre-trained convolutional neural network feature extraction backbone network, extract the low-level basic features of the image through the primary convolutional layer group at the front end of the backbone network to obtain a first intermediate feature map, and process the first intermediate feature map to geometrically quantify and enhance the spatial hierarchical relationship between the edge and internal structure of the lesion area in the first intermediate feature map, generating an initial feature map that strengthens the key boundary information; an input module, used to input the initial feature map into the deep convolutional layer group at the back end of the backbone network, where the deep convolutional kernels perform step-by-step, iterative nonlinear combination and abstraction of the initial feature map in the feature space to generate a deep feature map containing high-level semantic information; a fusion module, used to perform cross-level feature fusion between the initial feature map and the deep feature map to construct a unified multi-level deep feature map containing information of different scales and abstraction levels; a computation module, used to generate a spatial attention weight matrix by introducing an attention mechanism based on the unified multi-level deep feature map, and to re-weight and optimize the feature map based on the matrix to obtain an optimized feature vector; and a recognition module, used to input the optimized feature vector into the fully connected classifier, which calculates the probability of it belonging to each preset pest and disease category and obtains the final recognition result, realizing intelligent image recognition of crop pests and diseases.

[0054] It should be noted that this system is a system corresponding to the above method. All implementation methods in the above method embodiments are applicable to this embodiment and can achieve the same technical effect.

[0055] Embodiments of the present invention also provide a computing device, including: a processor and a memory storing a computer program, wherein the computer program, when executed by the processor, performs the method described above. All implementations in the above method embodiments are applicable to this embodiment and can achieve the same technical effects.

[0056] Embodiments of the present invention also provide a computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the method described above. All implementations in the above method embodiments are applicable to this embodiment and can achieve the same technical effects.

[0057] The above description represents the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A method for intelligent image recognition of crop diseases and insect pests based on a convolutional neural network, characterized in that, The method includes: Step 1: Obtain the original image of the crop containing the symptoms of the pests and diseases to be identified, and preprocess the image to obtain the preprocessed image; Step 2: The preprocessed image is processed through the pre-trained convolutional neural network feature extraction backbone network. The basic low-level features of the image are extracted through the primary convolutional layer group at the front end of the backbone network to obtain the first intermediate feature map. The first intermediate feature map is processed to quantify and enhance the spatial hierarchical relationship between the edge and internal structure of the lesion area in the first intermediate feature map in a geometric manner, and an initial feature map with enhanced key boundary information is generated. Step 3: Input the initial feature map into the deep convolutional layer group at the back end of the backbone network. The deep convolutional kernel performs a step-by-step, iterative nonlinear combination and abstraction of the initial feature map in the feature space to generate a deep feature map containing high-level semantic information. Step 4: Perform cross-level feature fusion between the initial feature map and the deep feature map to construct a unified multi-level deep feature map containing information at different scales and levels of abstraction. Step 5: Based on the unified multi-level deep feature map, an attention mechanism is introduced to calculate and generate a spatial attention weight matrix. The feature map is then re-weighted and optimized based on the matrix to obtain the optimized feature vector. Step 6: Input the optimized feature vector into the fully connected classifier. 
The classifier calculates the probability of it belonging to each preset pest and disease category and obtains the final recognition result to achieve intelligent image recognition of crop pests and diseases.

2. The crop disease and pest intelligent image recognition method based on a convolutional neural network according to claim 1, characterized in that, Obtain raw images of crops containing symptoms of pests and diseases to be identified, and preprocess the images to obtain preprocessed images, including: Step 1.1: The input original crop image is normalized by the cloud processor, and the image is uniformly scaled to a preset resolution to obtain a normalized image. Step 1.2: Based on the size-normalized image, perform color space conversion and automatic white balance correction to obtain a color-corrected image. Step 1.3: For the color-corrected image, apply an edge detection algorithm to extract edge information containing leaf and lesion areas, generating an initial edge map. Step 1.4: Perform polygon fitting on the initial edge map to close and smooth the edge contours, separating the foreground region of the leaf and generating a leaf foreground contour mask. Step 1.5: Using the leaf foreground contour mask, perform foreground extraction and background suppression processing on the color-corrected image, and use histogram equalization to enhance the contrast of the foreground region, generating an image with enhanced foreground. Step 1.6: Perform data standardization processing on the image with enhanced foreground to obtain a pre-processed image.

3. The crop disease and pest intelligent image recognition method based on a convolutional neural network according to claim 2, characterized in that, The preprocessed image is processed through a pre-trained convolutional neural network feature extraction backbone. The underlying basic features of the image are extracted using the primary convolutional layers at the front end of the backbone, resulting in a first intermediate feature map. This first intermediate feature map is then processed geometrically to quantify and enhance the spatial hierarchy between the edges and internal structures of lesion regions, generating an initial feature map that strengthens key boundary information. This process includes: Step 2.1: The preprocessed image is processed by the front-end primary convolutional layer group of the pre-trained convolutional neural network feature extraction backbone network to extract the low-level texture and edge features of the image and generate the first intermediate feature map. Step 2.2: Based on the first intermediate feature map, apply an edge detection algorithm in the cloud processor to extract the contour information of the lesion region in the first intermediate feature map, and obtain the contour feature map of the lesion region; Step 2.3: Based on the contour feature map of the lesion area, calculate the geometric distance from each pixel in the feature map to the nearest contour boundary, and generate a distance field feature map; Step 2.4: The distance field feature map and the first intermediate feature map are weighted and fused to generate an initial feature map that enhances the key boundary information.

4. The crop disease and pest intelligent image recognition method based on a convolutional neural network according to claim 3, characterized in that inputting the initial feature map into the deep convolutional layer group at the back end of the backbone network, where the deep convolutional kernels perform step-by-step, iterative nonlinear combination and abstraction of the initial feature map in the feature space to generate a deep feature map containing high-level semantic information, includes: Step 3.1: Pass the initial feature map that strengthens key boundary information through the deep convolutional layer group at the back end of the backbone network, where the first set of deep convolutional kernels performs a high-dimensional nonlinear transformation on the initial feature map, extracts and combines more complex feature patterns, and generates a first transition feature map; Step 3.2: Apply iterative convolution, nonlinear activation, and batch normalization operations to the first transition feature map through a deeper set of convolutional kernels, performing progressive abstraction and semantic refinement in the feature space to obtain a second transition feature map containing mid-level semantic information; Step 3.3: Apply a pooling operation to the second transition feature map to reduce its dimensionality while retaining key feature information, generating a third transition feature map; Step 3.4: Pass the third transition feature map through the terminal convolutional layer of the deep convolutional layer group to extract the high-level semantic features most relevant to discriminating disease and pest categories, and obtain the deep feature map containing high-level semantic information.
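The convolution, activation, and pooling pattern of Steps 3.2 and 3.3 in claim 4 can be shown on a single channel. This is a minimal sketch under several assumptions: one input channel, one hand-written kernel instead of learned weights, ReLU as the nonlinear activation, 2×2 max pooling, and batch normalization omitted for brevity. The function names are hypothetical.

```python
def conv2d_valid(fmap, kernel):
    """Single-channel 2D cross-correlation without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(fmap) - kh + 1, len(fmap[0]) - kw + 1
    return [[sum(fmap[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """Element-wise rectified linear activation."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; trailing rows/columns are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(out_w)]
            for r in range(out_h)]


def deep_block(fmap, kernel):
    """One convolution -> ReLU -> pooling stage (batch norm omitted)."""
    return max_pool(relu(conv2d_valid(fmap, kernel)))
```

Stacking several such stages, each with its own kernels, gives the progressive abstraction the claim describes: spatial resolution shrinks while each remaining value summarizes a larger region of the input.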

5. The crop disease and pest intelligent image recognition method based on a convolutional neural network according to claim 4, characterized in that performing cross-level feature fusion between the initial feature map and the deep feature map to construct a unified multi-level deep feature map containing information at different scales and levels of abstraction includes: Step 4.1: Obtain the deep feature map containing high-level semantic information and the initial feature map that strengthens key boundary information; Step 4.2: Perform an upsampling operation on the initial feature map to align its spatial dimensions with those of the deep feature map, generating a size-matched initial feature map; Step 4.3: Concatenate the size-matched initial feature map and the deep feature map along the channel dimension, then fuse the concatenated result and reduce its dimensionality to generate a preliminary fused feature map; Step 4.4: Apply nonlinear activation and feature recalibration operations to the preliminary fused feature map, and perform weighted fusion of the complementary information from different levels to construct the unified multi-level deep feature map containing information at different scales and levels of abstraction.
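Steps 4.2 to 4.4 of claim 5 can be sketched with feature maps stored as `[channel][row][column]` nested lists. The sketch assumes nearest-neighbor resampling for the size alignment, a 1×1 convolution for the fusion and dimensionality reduction, and ReLU for the nonlinear activation; none of these choices is specified in the claim, the feature recalibration step is not shown, and all function names are hypothetical.

```python
def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbor resampling of a [C][H][W] feature map."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [[[channel[r * in_h // out_h][c * in_w // out_w]
              for c in range(out_w)]
             for r in range(out_h)]
            for channel in fmap]


def concat_channels(a, b):
    """Stack two [C][H][W] maps with equal H and W along the channel axis."""
    return a + b


def conv1x1_relu(fmap, weights):
    """1x1 convolution followed by ReLU; weights has shape [C_out][C_in]."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [[[max(0.0, sum(w * channel[r][c]
                           for w, channel in zip(row, fmap)))
              for c in range(width)]
             for r in range(height)]
            for row in weights]


def fuse_levels(initial, deep, weights):
    """Align the initial map to the deep map, concatenate, then reduce."""
    target_h, target_w = len(deep[0]), len(deep[0][0])
    aligned = resize_nearest(initial, target_h, target_w)
    return conv1x1_relu(concat_channels(aligned, deep), weights)
```

Because `resize_nearest` maps output indices back to input indices, the same helper works whether the initial feature map is smaller or larger than the deep feature map.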

6. The crop disease and pest intelligent image recognition method based on a convolutional neural network according to claim 5, characterized in that generating a spatial attention weight matrix by introducing an attention mechanism based on the unified multi-level deep feature map, and reweighting and optimizing the feature map based on the matrix to obtain an optimized feature vector, includes: Step 5.1: Receive the constructed unified multi-level deep feature map and, through two parallel paths of the spatial attention mechanism, perform average pooling and max pooling operations on the unified multi-level deep feature map along the channel dimension to generate a dual-channel feature description containing global spatial context information; Step 5.2: Based on the dual-channel feature description, calculate the relative importance of each spatial location to pest and disease identification through convolutional layers and nonlinear activation functions to generate an initial spatial attention weight distribution map; Step 5.3: Normalize the initial spatial attention weight distribution map to generate the spatial attention weight matrix; Step 5.4: Multiply the spatial attention weight matrix element-wise with the unified multi-level deep feature map to weight and enhance the local features that are highly correlated with pest and disease identification, generating a reweighted and optimized feature map; Step 5.5: Perform global average pooling on the reweighted and optimized feature map to compress and aggregate the spatial dimensions, thereby obtaining the optimized feature vector.
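The spatial attention procedure of Steps 5.1 to 5.5 in claim 6 can be sketched as follows, again with `[channel][row][column]` nested lists. The sketch makes two simplifying assumptions that are not in the claim: the convolution of Step 5.2 is reduced to a 1×1 weighted sum of the two descriptors (with hypothetical parameters `w_avg`, `w_max`, and `bias`), and the normalization of Step 5.3 is taken to be a sigmoid.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def spatial_attention(fmap, w_avg=1.0, w_max=1.0, bias=0.0):
    """Return an [H][W] attention matrix with values in (0, 1)."""
    channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    attention = []
    for r in range(height):
        row = []
        for c in range(width):
            column = [fmap[k][r][c] for k in range(channels)]
            avg_desc = sum(column) / channels   # channel-wise average pooling
            max_desc = max(column)              # channel-wise max pooling
            row.append(sigmoid(w_avg * avg_desc + w_max * max_desc + bias))
        attention.append(row)
    return attention


def reweight_and_pool(fmap, attention):
    """Apply the attention matrix, then global-average-pool each channel."""
    height, width = len(attention), len(attention[0])
    return [sum(channel[r][c] * attention[r][c]
                for r in range(height) for c in range(width)) / (height * width)
            for channel in fmap]
```

The output of `reweight_and_pool` has one entry per channel, which matches the optimized feature vector that claim 7 feeds into the fully connected classifier.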

7. The crop disease and pest intelligent image recognition method based on a convolutional neural network according to claim 6, characterized in that inputting the optimized feature vector into a fully connected classifier, which calculates the probability of the vector belonging to each preset pest and disease category to obtain the final recognition result, thereby achieving intelligent image recognition of crop pests and diseases, includes: Step 6.1: Input the optimized feature vector into the first fully connected layer of the fully connected classifier, and map it to a high-dimensional classification feature space through a linear transformation to generate a primary classification feature vector; Step 6.2: Apply a nonlinear activation function to the primary classification feature vector to obtain an activated classification feature vector; Step 6.3: Based on the activated classification feature vector, perform nonlinear feature transformation and high-order semantic integration through the second fully connected layer of the fully connected classifier to generate a high-level discriminative feature vector; Step 6.4: Pass the high-level discriminative feature vector through the output layer of the fully connected classifier, and calculate its degree of match with each preset pest and disease category through a linear transformation to generate a raw score vector over the categories; Step 6.5: Apply the normalized exponential (softmax) function to the raw score vector to convert the category scores into a probability distribution, obtaining the probability that the optimized feature vector belongs to each preset pest and disease category; Step 6.6: Based on the probability distribution, select the category with the highest probability to obtain the intelligent image recognition result for crop diseases and pests.
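The classifier of Steps 6.1 to 6.6 in claim 7 can be sketched as a small forward pass. The sketch assumes ReLU as the nonlinear activation after both hidden layers and takes the layer weights as given inputs; the weights, labels, and function names are hypothetical placeholders rather than values from the patent.

```python
import math


def linear(x, weights, bias):
    """Fully connected layer: weights has shape [out][in]."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in x]


def softmax(scores):
    """Normalized exponential, shifted by the max score for stability."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(x, layers, labels):
    """Run hidden layers with ReLU, then the output layer and softmax.

    `layers` is a list of (weights, bias) pairs; the last pair is the
    output layer, whose result is the raw score vector.
    """
    hidden = x
    for weights, bias in layers[:-1]:
        hidden = relu(linear(hidden, weights, bias))
    scores = linear(hidden, *layers[-1])
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs
```

Subtracting the maximum score before exponentiation does not change the resulting probabilities but avoids overflow when raw scores are large.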

8. A crop disease and pest intelligent image recognition system based on a convolutional neural network, the system implementing the method of any one of claims 1 to 7, characterized in that the system includes: an acquisition module, used to acquire original images of crops containing symptoms of pests and diseases to be identified, and to preprocess the images to obtain preprocessed images; a processing module, used to pass the preprocessed image through a pre-trained convolutional neural network feature extraction backbone network, extract the low-level basic features of the image through the primary convolutional layer group at the front end of the backbone network to obtain a first intermediate feature map, and process the first intermediate feature map to geometrically quantify and enhance the spatial hierarchical relationship between the edges and internal structures of lesion regions in the first intermediate feature map, generating an initial feature map that strengthens key boundary information; an input module, used to input the initial feature map into the deep convolutional layer group at the back end of the backbone network, where the deep convolutional kernels perform step-by-step, iterative nonlinear combination and abstraction of the initial feature map in the feature space to generate a deep feature map containing high-level semantic information; a fusion module, used to perform cross-level feature fusion between the initial feature map and the deep feature map to construct a unified multi-level deep feature map containing information at different scales and levels of abstraction; a computation module, used to generate a spatial attention weight matrix by introducing an attention mechanism based on the unified multi-level deep feature map, and to reweight and optimize the feature map based on the matrix to obtain an optimized feature vector; and a recognition module, used to input the optimized feature vector into the fully connected classifier, which calculates the probability of the vector belonging to each preset pest and disease category and obtains the final recognition result, thereby achieving intelligent image recognition of crop pests and diseases.

9. A computing device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program that, when executed by a processor, implements the method according to any one of claims 1 to 7.