A poppy image recognition method and system based on a convolutional neural network
A cGAN generates synthetic data to enlarge the training set, and a lightweight multi-scale convolutional network that combines parallel dilated convolution with spatial attention optimization performs recognition. This addresses model overfitting and low accuracy in poppy image recognition and enables efficient, accurate poppy recognition and deployment.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- MINGTU TECHNOLOGY (ZHEJIANG) CO LTD
- Filing Date
- 2026-02-02
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies for poppy image recognition suffer from problems such as model overfitting, large number of parameters, difficulty in deployment on mobile devices, and low accuracy in small target detection in complex environments. Traditional methods are inefficient and have a high false alarm rate.
A cGAN is used to generate synthetic plant images that enhance the dataset, and a lightweight multi-scale convolutional network performs recognition. Features are extracted with parallel dilated convolution groups and a spatial attention optimization module, and depthwise separable convolutional blocks perform downsampling to output the final feature map.
The approach enables efficient and accurate identification of poppy images in complex environments and reduces computational complexity. It is suitable for deployment on mobile devices. It also improves both the ability to distinguish poppies from similar plants and identification efficiency.

Figure CN122244496A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of image recognition technology, and in particular to a poppy image recognition method and system based on convolutional neural networks.
Background Technology
[0002] Illegal poppy cultivation is a crucial concern in drug control efforts. Traditional identification methods rely primarily on manual patrols or remote sensing. Manual patrols are inefficient and have limited coverage. Multispectral remote sensing is susceptible to vegetation obstruction and weather interference, which results in a high false alarm rate. In recent years, computer vision-based target detection technologies (such as traditional SVM and HOG feature methods) have been applied in agricultural monitoring. However, these traditional methods have insufficient feature extraction capability and generalize poorly when they face challenges such as the following:
- the morphological similarity between poppies and similar plants (e.g., corn poppies and wild poppies);
- changes in lighting;
- complex backgrounds.
Although convolutional neural networks (CNNs) perform excellently in image classification, directly applying existing CNN models (such as ResNet and YOLO) has the following limitations: 1) the scarcity of poppy samples leads to model overfitting; 2) the large number of model parameters makes deployment on mobile law enforcement equipment difficult; 3) detection accuracy for small poppy targets in complex environments is low. Therefore, a lightweight deep learning solution designed specifically for poppy image recognition, balancing accuracy and efficiency, is needed.
Summary of the Invention
[0003] The purpose of this invention is to provide a poppy image recognition method and system based on convolutional neural networks, which can quickly and accurately identify poppies.
[0004] To achieve the above objectives, in a first aspect, the present invention provides a poppy image recognition method based on a convolutional neural network, comprising the following steps:
- obtaining poppy images and an initial negative sample library in a compliant manner, generating synthetic plant images using a cGAN, and enhancing and expanding the dataset;
- inputting the resulting hierarchical sample set into a lightweight multi-scale convolutional network for feature extraction and attention optimization, and outputting the final feature map after downsampling;
- converting the final feature map into a vector and outputting the classification confidence score and the corresponding coordinates.
[0005] The method further includes binding the classification confidence scores that meet the requirements to the corresponding coordinates to generate an early warning report.
[0006] Obtaining poppy images and an initial negative sample library in a compliant manner, generating synthetic plant images using a cGAN, and enhancing and expanding the dataset comprises:
- obtaining images of poppy cultivation and of similar plants in a compliant manner, and constructing an initial negative sample set from all images of the similar plants at different growth stages;
- inputting the plant category, its corresponding image, and environmental parameters into the generator, and introducing a morphological constraint loss function to generate complete plant images;
- enhancing the complete plant images and generating adversarial examples using the fast gradient sign method;
- combining the enhanced complete plant images, the poppy cultivation images, the initial negative sample set, and the adversarial examples into a hierarchical sample set.
[0007] Collecting the environmental parameters includes collecting the illumination type using a time-segmented acquisition method, and collecting and labeling the background category.
[0008] Inputting the resulting hierarchical sample set into a lightweight multi-scale convolutional network for feature extraction and attention optimization, and outputting the final feature map after downsampling, comprises:
- preprocessing and standardizing all images in the hierarchical sample set;
- processing all transformed images through parallel dilated convolution groups and concatenating the extracted features into a multi-scale feature map;
- inputting the multi-scale feature map into the spatial attention optimization module to generate a channel-spatial weight matrix, and multiplying this matrix with the multi-scale feature map to obtain an enhanced feature map;
- downsampling using depthwise separable convolutional blocks to output the final feature map.
[0009] The parallel dilated convolution group has three parallel branches, each using 32 3×3 convolution kernels, with dilation rates set to 1, 2, and 4 respectively. Specifically, dilation rate 1: receptive field 3×3, capturing local details of petal texture; dilation rate 2: receptive field 7×7, extracting flower morphological features; and dilation rate 4: receptive field 15×15, obtaining global contextual information of the plant.
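The receptive-field figures above can be checked with the standard formula for a dilated kernel. A k×k kernel with dilation d spans (k−1)·d+1 pixels along each axis. A single 3×3 layer therefore covers 3, 5 and 9 pixels at dilation rates 1, 2 and 4. The stated 7×7 and 15×15 values instead match the cumulative receptive field you get when the three dilated layers are applied one after another (1, then 2, then 4).

The sketch below computes both cases. It also computes the padding (equal to the dilation rate for a 3×3 kernel at stride 1) that keeps every parallel branch at the same spatial size, which is needed before the outputs can be concatenated along the channel axis. The function names are illustrative only.

```python
def dilated_span(kernel: int, dilation: int) -> int:
    """Pixels covered along one axis by a single dilated kernel."""
    return (kernel - 1) * dilation + 1


def stacked_receptive_field(kernel: int, dilations: list[int]) -> list[int]:
    """Cumulative receptive field after each layer of a stride-1 stack."""
    rf, out = 1, []
    for d in dilations:
        rf += (kernel - 1) * d  # each stride-1 layer widens the field by (k-1)*d
        out.append(rf)
    return out


def same_padding(kernel: int, dilation: int) -> int:
    """Padding that keeps H and W unchanged at stride 1 (odd kernels)."""
    return (kernel - 1) * dilation // 2


def conv_out(size: int, kernel: int, dilation: int, padding: int, stride: int = 1) -> int:
    """Standard convolution output-size formula along one axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


rates = [1, 2, 4]
print([dilated_span(3, d) for d in rates])          # [3, 5, 9]   parallel, one layer each
print(stacked_receptive_field(3, rates))            # [3, 7, 15]  layers applied in sequence
print([same_padding(3, d) for d in rates])          # [1, 2, 4]
print([conv_out(224, 3, d, same_padding(3, d)) for d in rates])  # [224, 224, 224]
```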
[0010] Specifically, inputting the multi-scale feature map into the spatial attention optimization module to generate a channel-spatial weight matrix, and multiplying it with the multi-scale feature map to obtain an enhanced feature map, comprises:
- applying global average pooling to the multi-scale feature map, passing the result through two fully connected layers, and generating the channel weight matrix with the Sigmoid activation function;
- performing max pooling and average pooling along the channel dimension, concatenating the results, and learning the spatial weight distribution through a 7×7 convolution to generate a spatial weight matrix;
- multiplying the channel weight matrix and the spatial weight matrix element-wise, then multiplying the result with the multi-scale feature map to obtain the enhanced feature map.
[0011] After downsampling using depthwise separable convolutional blocks and outputting the final feature map, the method further includes:
- performing global average pooling on the final feature map and inputting the resulting feature vector into a fully connected layer to obtain a classification branch;
- applying a 1×1 convolution to the final feature map to obtain a localization branch.
[0012] Converting the final feature map into a vector and outputting the classification confidence score and corresponding coordinates comprises:
- outputting the classification confidence score from the classification branch using the Sigmoid activation function;
- converting the center coordinates, width, and height of the bounding box output by the localization branch into original coordinate values using a linear activation function.
[0013] In a second aspect, the present invention provides a poppy image recognition system based on a convolutional neural network, which applies the poppy image recognition method provided in the first aspect. The system includes an image acquisition module, a feature extraction module, and a classification and recognition module:
- The image acquisition module acquires poppy images and an initial negative sample library in a compliant manner, generates synthetic plant images through a cGAN, and enhances and expands the dataset.
- The feature extraction module inputs the obtained hierarchical sample set into a lightweight multi-scale convolutional network for feature extraction and attention optimization, and outputs the final feature map after downsampling.
- The classification and recognition module converts the final feature map into a vector and then outputs the classification confidence score and the corresponding coordinates.
[0014] This invention discloses a poppy image recognition method and system based on convolutional neural networks. The system includes an image acquisition module, a feature extraction module, and a classification and recognition module. It acquires poppy images and an initial negative sample library in a compliant manner, generates synthetic plant images using a cGAN, and enhances and expands the dataset. The resulting hierarchical sample set is input into a lightweight multi-scale convolutional network for feature extraction and attention optimization, and a final feature map is output after downsampling. After the final feature map is converted into a vector, a classification confidence score and corresponding coordinates are output. Through multi-scale feature fusion and attention mechanisms, the system significantly improves both the ability to distinguish poppies from similar plants and the efficiency of doing so.
Brief Description of the Drawings
[0015] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below.
[0016] Figure 1 is a schematic diagram illustrating the steps of a poppy image recognition method based on a convolutional neural network according to the first embodiment of the present invention.
[0017] Figure 2 is a flowchart of a poppy image recognition method based on a convolutional neural network provided by the present invention.
[0018] Figure 3 is a structural block diagram of the lightweight multi-scale convolutional network provided by the present invention.
[0019] Figure 4 is a schematic structural diagram of a poppy image recognition system based on a convolutional neural network according to the second embodiment of the present invention.
[0020] Figure 5 is a schematic diagram of the electronic device of the present invention.
[0021] In the drawings: 101: image acquisition module; 102: feature extraction module; 103: classification and recognition module.
Detailed Description of the Embodiments
[0022] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application.
[0023] The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term “and/or” as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
[0024] It should be understood that although the terms first, second, third, etc., may be used in this application to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," or "in response to determining."
[0025] The first embodiment of this application is as follows. Referring to Figures 1-3, this invention provides a poppy image recognition method based on a convolutional neural network, comprising the following steps:
S101. Obtain poppy images and an initial negative sample library in a compliant manner, generate synthetic plant images through a cGAN, and enhance and expand the dataset.
[0026] Specifically, since opium poppies are prohibited items under regulation, they cannot be cultivated privately, and therefore, related images cannot be obtained arbitrarily. Thus, obtaining opium poppy image data must be done legally and compliantly. This can be achieved through cooperation with agricultural regulatory departments and plant research institutions to obtain compliant opium poppy cultivation images (already anonymized) collected during historical law enforcement processes, ensuring the legality of the data source. Images of poppy plants and similar species are extracted from publicly available plant image datasets (such as PlantVillage and iNaturalist). In a controlled laboratory environment, images of opium poppy-like plants (such as corn poppies and wild poppies) are photographed using a multi-angle imaging device (rotating platform + multispectral camera), covering different growth stages, namely seedling, flowering, and fruiting stages, to construct an initial negative sample set.
[0027] To support analysis of image data from various outdoor environments, environmental data must also be collected, including:
- Light-adaptive acquisition: a time-segmented acquisition strategy (early morning, noon, dusk) is combined with supplementary lighting equipment to simulate different lighting conditions, such as direct light, diffused light, and shadows.
- Multi-scale resolution settings: close-up shooting (pixel resolution ≥ 0.5 mm/pixel) captures flower texture details, and distant shooting (resolution ≤ 5 cm/pixel) simulates the drone's patrol perspective.
- Background diversity control: different backgrounds such as farmland, woodland, and desert are intentionally included and labeled with background complexity levels 1-5.

The grading standard jointly evaluates the texture complexity of the background, the color contrast with the foreground target, the proportion of occlusion, and the number of structural interferences. Specifically:
- Level 1 represents a clean, uniform background.
- Level 2 represents a simple background with slight texture or interference.
- Level 3 represents a medium-complexity background with interference of similar color and texture, or slight occlusion.
- Level 4 represents a complex background full of highly similar interference, or moderate occlusion.
- Level 5 represents an extremely complex background where the target is severely occluded or blends into the background.
[0028] A combination of a U-Net generator and a PatchGAN discriminator was used, with input conditional labels consisting of plant category (poppy / corn poppy / wild poppy) and environmental parameters (light type, background category). Morphological constraints such as petal count and stamen structure were added to the standard adversarial loss to ensure that the generated images conformed to botanical characteristics.
[0029] Morphological feature encoding: Number of petals: discrete variable (5-8 petals); Stamen structure: continuous vector (stamen density, distribution radius); Flower morphology: ellipse parameters (major axis, minor axis, tilt angle).
[0030] Texture feature encoding: Petal texture: Histogram of Oriented Gradients (HOG) feature vector; Color distribution: Statistical moment features of LAB color space.
[0031] A conditional generative adversarial network (cGAN) framework incorporating a morphological constraint loss function (L_morph) was employed. This framework not only generates high-fidelity plant images but also precisely controls their morphological features to conform to the biological characteristics of poppies and similar plants. The total loss function consists of three weighted components: adversarial loss, conditional loss, and morphological constraint loss, ensuring consistency in the generated images' fidelity, conditional relevance, and morphological accuracy.
[0032] Adversarial Loss: Standard GAN adversarial loss is employed to encourage generated image distributions that are difficult to distinguish from real image distributions. Wasserstein GAN with Gradient Penalty (WGAN-GP) is used to improve training stability.
[0033] Conditional loss: To ensure that the generated image is consistent with the input conditions, conditional information is added to the discriminator to make judgments and calculate the conditional adversarial loss.
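For reference, the standard WGAN-GP critic objective and one way of writing the three-part generator loss described above are sketched below. Here x̃ = G(z, c) is a generated sample, and c denotes the conditioning labels (plant category, lighting type, background category), which a conditional critic would also take as input. The weights λ_adv, λ_cond and λ_morph are placeholders, because the text does not give their values. The value λ_gp = 10 is the default used in the original WGAN-GP work, not a figure from this application. The exact form of the conditional term is also not specified, so it appears only symbolically.

```latex
% Standard WGAN-GP critic loss (minimized by the critic D)
\mathcal{L}_D = \mathbb{E}_{\tilde{x}}\big[D(\tilde{x})\big] - \mathbb{E}_{x}\big[D(x)\big]
  + \lambda_{gp}\,\mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad \hat{x} = \epsilon x + (1-\epsilon)\tilde{x},\quad \epsilon \sim \mathcal{U}[0,1]

% Generator objective: weighted sum of the three components named above
\mathcal{L}_G = \lambda_{adv}\,\mathcal{L}_{adv} + \lambda_{cond}\,\mathcal{L}_{cond} + \lambda_{morph}\,\mathcal{L}_{morph},
\qquad \mathcal{L}_{adv} = -\mathbb{E}_{\tilde{x}}\big[D(\tilde{x})\big]
```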
[0034] Morphological constraint loss is used to ensure the plausibility of the botanical structure in the generated image. It is further composed of three sub-constraint terms: The petal number consistency constraint uses a pre-trained petal instance segmentation model (such as PointRend) to segment the generated image and count its petal count. This count is then compared to the ideal petal count specified in the input conditions (e.g., 4-6 petals for poppies), and the absolute value of the difference between the predicted petal count of the generated image and the standard petal count for that flower category is calculated.
[0035] The stamen structure preservation constraint uses a keypoint detection model to predict the positions of the stamen center points. The Chamfer distance or an optimal transport distance (such as the Sinkhorn distance) between the predicted point set and the real stamen point set is calculated to measure the similarity of the stamen distributions. This loss term forces the generator to produce plausible texture and structure in the specified stamen region.
[0036] The overall morphological regularization constraint uses a pre-trained ArcFace or VGG network to extract high-level features from the generated image and from the real reference image. It then calculates the cosine similarity or mean squared error (MSE loss) between their feature maps.
[0037] A two-stage generation strategy is employed to achieve high-quality integration with the environment. In this strategy, the output of the first stage feeds directly into the second. The first stage receives semantic conditional information, such as species category and growth stage, combined with a random noise vector. It outputs a foreground image containing flowers and stems, along with its corresponding pixel-level segmentation mask. This stage uses a U-Net network structure to preserve fine spatial details.
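The three sub-terms of the morphological constraint loss can be combined as in the plain-Python sketch below. The petal counts, stamen keypoints and feature vectors would come from the pre-trained segmentation, keypoint and feature models named above; here they are passed in directly.

The design choices are assumptions:
- The stamen term uses the symmetric Chamfer distance: mean nearest-neighbour squared distance, computed in both directions.
- The global term uses 1 − cosine similarity.
- The weights w_petal, w_stamen and w_global are hypothetical, because the text does not give them.

A hard petal count is not differentiable, so a training implementation would need a differentiable proxy for that term. This sketch only evaluates the terms.

```python
import math


def petal_term(pred_count: int, standard_count: int) -> float:
    # |predicted petal count - standard count for the flower category|
    return float(abs(pred_count - standard_count))


def chamfer_distance(a: list[tuple[float, float]], b: list[tuple[float, float]]) -> float:
    # Symmetric Chamfer distance using squared Euclidean distances.
    def one_way(p, q):
        return sum(min((x - u) ** 2 + (y - v) ** 2 for u, v in q) for x, y in p) / len(p)
    return one_way(a, b) + one_way(b, a)


def cosine_term(f: list[float], g: list[float]) -> float:
    # 1 - cosine similarity between high-level feature vectors.
    dot = sum(x * y for x, y in zip(f, g))
    norm = math.sqrt(sum(x * x for x in f)) * math.sqrt(sum(y * y for y in g))
    return 1.0 - dot / norm


def morph_loss(pred_count, standard_count, pred_stamens, real_stamens,
               gen_feat, ref_feat, w_petal=1.0, w_stamen=1.0, w_global=1.0):
    # Hypothetical weights; the application does not specify them.
    return (w_petal * petal_term(pred_count, standard_count)
            + w_stamen * chamfer_distance(pred_stamens, real_stamens)
            + w_global * cosine_term(gen_feat, ref_feat))


loss = morph_loss(5, 4, [(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (1.0, 1.0)],
                  [1.0, 0.0], [1.0, 0.0])
print(loss)  # 2.0 (petal term 1.0 + Chamfer 1.0 + cosine term 0.0)
```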
[0038] The second stage combines the generated foreground subject with environmental parameters such as background type and lighting conditions, and achieves adaptive fusion through a gated convolution mechanism to output a complete scene image with coordinated lighting and shadow effects.
[0039] Physical lighting model integration: light transport is simulated with a simplified, physically based rendering model. The network learns the contribution weights of the three lighting components (diffuse reflection, specular reflection, and ambient light), which ensures that the lighting effects are physically plausible. Shadow consistency generation first estimates the direction and intensity of the virtual light source from the input lighting condition parameters. It then generates corresponding shadow regions based on the shape of the foreground objects. Finally, shadow consistency constraints ensure that the shadows are physically consistent with the lighting direction.
[0040] Intelligent background blending:
- Adaptive color transfer automatically adjusts the saturation, brightness, and hue of foreground objects based on the statistics of the background image's dominant color tone. This lets the foreground blend better into the background.
- Multi-scale edge blending uses Laplacian pyramid decomposition to fuse foreground and background at different scales, then reconstructs a synthetic image with natural transition edges.
- Perspective consistency maintenance estimates the perspective relationships of the background and adjusts the perspective distortion of foreground objects accordingly, so that foreground and background are consistent in perspective.
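The three lighting components named above (ambient, diffuse, specular) correspond to the classic Phong reflection model. Below is a minimal per-point sketch, assuming Phong's form. The weights k_a, k_d and k_s stand in for the contribution parameters that the network is said to learn. The shininess exponent and the unit light intensities are hypothetical values.

```python
import math


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def phong(normal, to_light, to_viewer, k_a, k_d, k_s, shininess=16.0,
          ambient=1.0, light=1.0):
    """Scalar Phong intensity at one surface point (weights are placeholders)."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_viewer)
    diffuse = max(0.0, dot(n, l))
    # Reflect the light direction about the normal: r = 2(n.l)n - l
    r = tuple(2 * dot(n, l) * nc - lc for nc, lc in zip(n, l))
    # No specular highlight when the light is behind the surface
    specular = max(0.0, dot(r, v)) ** shininess if diffuse > 0 else 0.0
    return k_a * ambient + light * (k_d * diffuse + k_s * specular)


# Light and viewer directly above a flat, upward-facing petal patch
print(phong((0, 0, 1), (0, 0, 1), (0, 0, 1), k_a=0.1, k_d=0.6, k_s=0.3))
```

When the light comes from below the surface, only the ambient term remains. A shadowed region can be approximated in the same way.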
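The adaptive color migration step can be approximated by matching per-channel statistics, in the spirit of Reinhard-style color transfer. Each foreground channel is shifted and scaled so that its mean and standard deviation match those of the background.

The sketch below works on flat lists of channel values. It assumes the image has already been converted to a color space such as LAB. It is an illustration, not the application's exact algorithm, and it does not cover Laplacian-pyramid blending or perspective correction.

```python
import statistics


def match_channel(fg: list[float], bg: list[float]) -> list[float]:
    """Shift and scale foreground values so their mean and std match the background."""
    mu_f, mu_b = statistics.fmean(fg), statistics.fmean(bg)
    sd_f, sd_b = statistics.pstdev(fg), statistics.pstdev(bg)
    scale = sd_b / sd_f if sd_f > 0 else 1.0  # flat foreground: shift only
    return [(x - mu_f) * scale + mu_b for x in fg]


def transfer(fg_channels: list[list[float]], bg_channels: list[list[float]]) -> list[list[float]]:
    """Apply per-channel matching, e.g. to the L, a and b channels."""
    return [match_channel(f, b) for f, b in zip(fg_channels, bg_channels)]


# Toy single-channel example: foreground values pulled toward background statistics
fg = [[10.0, 20.0, 30.0]]
bg = [[50.0, 60.0, 70.0, 80.0]]
out = transfer(fg, bg)
print(out[0])
```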
[0041] A latent variable space for the growth period is constructed. By adjusting the dimensionality parameter of the latent vector z, complete plant images of the same plant at different growth stages are continuously generated. Specifically, the complete growth and development process of poppies is discretized into six typical stages: seedling stage, vegetative growth stage, bud formation stage, full bloom stage, fruiting stage, and senescence stage. A variational autoencoder structure is used to learn the continuous vector representation of each growth stage, encoding the discrete growth stages as continuous points in a high-dimensional feature space. Smooth stage transitions are achieved through linear interpolation calculations in the continuous vector space, realizing a natural and smooth transition between different growth stages and simulating a continuous growth and development process.
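Linear interpolation between two growth-stage codes in the learned latent space can be sketched as follows. The stage vectors are hypothetical placeholders; in the described scheme they would come from the variational autoencoder's encoder, and each interpolated code would be passed to the generator. Spherical interpolation is a common alternative for Gaussian latent spaces. Linear interpolation is shown here because the text names it.

```python
def lerp(a: list[float], b: list[float], t: float) -> list[float]:
    """Point at fraction t (0..1) on the segment from a to b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def stage_transition(a: list[float], b: list[float], steps: int) -> list[list[float]]:
    """`steps` evenly spaced latent codes from stage a to stage b, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(a, b, i / (steps - 1)) for i in range(steps)]


# Hypothetical 3-D codes for two adjacent stages, e.g. bud formation -> full bloom
bud = [0.0, 1.0, -0.5]
bloom = [1.0, 0.0, 0.5]
for z in stage_transition(bud, bloom, 5):
    print(z)
```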
[0042] Image enhancement is then performed, adding spectral simulation and occlusion simulation to traditional enhancement methods:
- Traditional enhancement includes morphological transformation and photometric transformation. Morphological transformation includes random rotation (±30°), scaling (0.8-1.2×), and shearing (±10°). Photometric transformation includes contrast-limited adaptive histogram equalization (CLAHE), Gaussian noise injection, and simulated cloud and fog occlusion.
- Spectral simulation converts the color space (RGB→LAB) and adjusts the a/b channel parameters in a targeted way to simulate poppy color variations under different soil conditions.
- Occlusion simulation randomly adds fallen leaves, mud spots, insects, and other obstructions (occlusion rate ≤15%) to improve robustness to occlusion.
[0043] FGSM (Fast Gradient Sign Method) is used to add small perturbations to the original images to generate adversarial examples, improving the model's robustness.
[0044] To ensure data quality, thresholds are set for image quality metrics (sharpness, contrast, information entropy), and low-quality images are automatically discarded. The realism of the generated images is evaluated using the Inception Score (IS) and the Fréchet Inception Distance (FID), and poorly scoring samples are removed.
[0045] All the data are then integrated into a hierarchical sample library that includes at least:
- real poppy images: 3,000 (authorized sources);
- cGAN-generated images: 5,000 (covering different growth stages and environmental variants);
- negative samples: 8,000 (corn poppy, wild poppy and other similar plants);
- adversarial samples: 500.
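The geometric ranges above (rotation ±30°, scaling 0.8-1.2×, shear ±10°) and the 15% occlusion cap can be turned into sampled parameters, as sketched below. Two details are assumptions: the composition order of the affine matrix (rotation × scale × shear) and the rectangular shape of the occluder. The warp itself would be applied with an image library's affine-transform routine, which is not shown.

```python
import math
import random


def sample_affine(rng: random.Random) -> list[list[float]]:
    """2x2 linear part of a random rotation * scale * shear transform."""
    theta = math.radians(rng.uniform(-30.0, 30.0))
    s = rng.uniform(0.8, 1.2)
    shear = math.tan(math.radians(rng.uniform(-10.0, 10.0)))
    c, si = math.cos(theta), math.sin(theta)
    # R(theta) @ (s * [[1, shear], [0, 1]]); determinant equals s**2
    return [[s * c, s * (c * shear - si)],
            [s * si, s * (si * shear + c)]]


def sample_occluder(rng: random.Random, width: int, height: int,
                    max_fraction: float = 0.15) -> tuple[int, int, int, int]:
    """Random axis-aligned box (x, y, w, h) covering at most max_fraction of the image."""
    area = rng.uniform(0.01, max_fraction) * width * height
    aspect = rng.uniform(0.5, 2.0)
    w = min(width, max(1, int(math.sqrt(area * aspect))))
    h = min(height, max(1, int(area / w)))  # int() rounds down, so w * h <= area
    x = rng.randint(0, width - w)
    y = rng.randint(0, height - h)
    return x, y, w, h


rng = random.Random(0)
print(sample_affine(rng), sample_occluder(rng, 448, 448))
```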
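FGSM perturbs each input by ε·sign(∇ₓL). Computing that gradient for a CNN requires an autodiff framework. The sketch below therefore uses a tiny logistic-regression "classifier" whose input gradient is analytic: (σ(w·x+b) − y)·w. The weights, inputs and ε = 0.03 are hypothetical. In practice the same update would use the recognition network's gradient, as computed by the training framework. Results are clipped to the valid [0, 1] pixel range.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(w, b, x, y):
    """Binary cross-entropy of a logistic model on one example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def fgsm(w, b, x, y, eps):
    """x_adv = clip(x + eps * sign(dL/dx), 0, 1) for a logistic model."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]          # analytic gradient of BCE w.r.t. x
    sign = [(g > 0) - (g < 0) for g in grad]   # -1, 0 or +1 per component
    return [min(1.0, max(0.0, xi + eps * si)) for xi, si in zip(x, sign)]


w, b = [0.8, -1.2, 0.5], 0.1
x, y = [0.4, 0.3, 0.6], 1  # hypothetical input labelled "poppy"
x_adv = fgsm(w, b, x, y, eps=0.03)
print(bce_loss(w, b, x, y), bce_loss(w, b, x_adv, y))  # loss rises on x_adv
```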
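The per-image quality filter can be sketched with three common measures:
- information entropy of the grey-level histogram;
- RMS contrast (standard deviation of grey levels);
- sharpness, as the variance of the Laplacian.

The specific formulas and threshold values are assumptions; the text names the metrics but not their definitions. IS and FID, by contrast, are computed over sets of images, so they are not reproduced here. Images are plain 2-D lists of 8-bit grey values.

```python
import math
from collections import Counter


def entropy(img: list[list[int]]) -> float:
    """Shannon entropy (bits) of the grey-level histogram."""
    counts = Counter(v for row in img for v in row)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def contrast(img: list[list[int]]) -> float:
    """RMS contrast: standard deviation of grey levels."""
    vals = [v for row in img for v in row]
    mean = sum(vals) / len(vals)
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))


def sharpness(img: list[list[int]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels."""
    lap = [img[i - 1][j] + img[i + 1][j] + img[i][j - 1] + img[i][j + 1] - 4 * img[i][j]
           for i in range(1, len(img) - 1) for j in range(1, len(img[0]) - 1)]
    mean = sum(lap) / len(lap)
    return sum((v - mean) ** 2 for v in lap) / len(lap)


def passes_quality(img, min_sharp=50.0, min_contrast=20.0, min_entropy=4.0) -> bool:
    # Hypothetical thresholds; the application does not state them.
    return (sharpness(img) >= min_sharp and contrast(img) >= min_contrast
            and entropy(img) >= min_entropy)
```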
[0046] The challenges of acquiring sensitive data are addressed through institutional collaboration and simulated environmental imaging. Continuous growth generation is achieved through latent variable control, overcoming the limitations of single-phase samples. Traditional enhancement methods are combined with spectral simulation and occlusion simulation to comprehensively improve data diversity.
[0047] S102. Input the obtained hierarchical sample set into a lightweight multi-scale convolutional network for feature extraction and attention optimization, and output the final feature map after downsampling.
[0048] Specifically, all input images are resized to a uniform 448×448 pixels using bilinear interpolation while maintaining the aspect ratio, and the remaining area is padded with gray. An adaptive white balance algorithm removes color casts caused by lighting, so that poppy petal color stays consistent under different lighting conditions. Pixel values are normalized to the range [0,1], then standardized using the mean and standard deviation calculated over the dataset.
[0049] The parallel dilated convolution group has three parallel branches. Each branch uses 32 3×3 convolution kernels and produces the same output dimensions: a 224×224×32 feature map. The dilation rates are set to 1, 2, and 4, respectively:
- Dilation rate 1 has a receptive field of 3×3 and captures local details such as petal texture.
- Dilation rate 2 has a receptive field of 7×7 and extracts flower morphological features.
- Dilation rate 4 has a receptive field of 15×15 and obtains global contextual information about the plant.

The three outputs are concatenated along the channel dimension to generate a 224×224×96 fused feature map. A 1×1 convolution then performs channel compression and feature reshaping to output a 224×224×64 multi-scale feature map. In addition, depthwise convolution (3×3 kernels, 64 groups) processes spatial features, and pointwise convolution (1×1 kernels, 128 channels) performs channel fusion, reducing the number of parameters to about 1/8 of that of a standard convolution. The ReLU6 activation function is adopted to improve stability in low-precision computation, and batch normalization is used to accelerate training convergence.
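The aspect-preserving resize described above ("letterboxing") can be sketched as a small parameter computation. The sketch returns the scale factor, the resized size and the padding on each side, plus the two-step pixel standardization. Two choices are assumptions:
- The padding is split evenly, with any odd pixel going to the right or bottom.
- The gray fill value is not specified in the text, so it is omitted.

The dataset mean and std would be computed from the training images.

```python
def letterbox_params(w: int, h: int, target: int = 448):
    """Scale factor, resized size and (left, top, right, bottom) padding."""
    scale = min(target / w, target / h)
    new_w, new_h = round(w * scale), round(h * scale)
    pad_x, pad_y = target - new_w, target - new_h
    # Split padding between the two sides; the extra pixel (if any) goes right/bottom
    pads = (pad_x // 2, pad_y // 2, pad_x - pad_x // 2, pad_y - pad_y // 2)
    return scale, (new_w, new_h), pads


def standardize(pixel: int, mean: float, std: float) -> float:
    """Map an 8-bit value to [0, 1], then standardize with dataset statistics."""
    return (pixel / 255.0 - mean) / std


scale, size, pads = letterbox_params(1920, 1080)
print(size, pads)  # (448, 252) (0, 98, 0, 98)
```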
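The "about 1/8" figure can be checked by counting weights for the layer sizes given above (3×3 kernels, 64 input channels, 128 output channels). Biases and batch-normalization parameters are ignored in this count. In general, the depthwise separable cost relative to a standard convolution is 1/C_out + 1/k², which here is 1/128 + 1/9 ≈ 0.119.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    depthwise = k * k * c_in   # one k x k filter per input channel (groups = c_in)
    pointwise = c_in * c_out   # 1 x 1 convolution mixing channels
    return depthwise + pointwise


std = standard_conv_params(3, 64, 128)          # 73728
sep = depthwise_separable_params(3, 64, 128)    # 8768
print(std, sep, sep / std)                      # ratio about 0.119, roughly 1/8
```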
[0050] The multi-scale feature map is input into the spatial attention optimization module to generate a channel-spatial weight matrix. This matrix is then multiplied with the multi-scale feature map to obtain an enhanced feature map. Specifically, global average pooling is performed on the input multi-scale feature map to generate a 1×1×C channel descriptor. Channel dependencies are learned through two fully connected layers (dimensionality reduction ratio r=16). A sigmoid activation function is used to generate a channel weight matrix, focusing on enhancing poppy-related channels. Then, max pooling and average pooling are performed along the channel dimension to generate two H×W×1 feature maps. These maps are concatenated and then subjected to a 7×7 convolution to learn the spatial weight distribution. The generated spatial weight matrix highlights the poppy region and suppresses background interference. The channel weight matrix and the spatial weight matrix are multiplied element-wise and then multiplied with the multi-scale feature map to obtain the enhanced feature map. The output size is maintained at 112×112×128. The 7×7 convolution can capture a wider range of contextual information, which is beneficial for recognizing the overall morphology of the poppy.
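The channel and spatial branches described above resemble the CBAM attention module (CBAM itself applies the two branches sequentially). The pure-Python sketch below follows the ordering given in the text:
- both weight maps are computed from the same multi-scale feature map;
- they are multiplied element-wise (broadcast to C×H×W);
- the result is applied to the map.

Several details are assumptions for illustration: the [C][H][W] layout, the random weights, the toy sizes, r = 4 and zero padding for the 7×7 convolution. Biases are omitted.

```python
import math
import random


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def channel_weights(fmap, w1, w2):
    """GAP -> FC (C -> C/r) -> ReLU -> FC (C/r -> C) -> sigmoid; one weight per channel."""
    gap = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    hidden = [max(0.0, sum(w * g for w, g in zip(row, gap))) for row in w1]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]


def spatial_weights(fmap, kernel):
    """Channel-wise max and mean -> 2-channel map -> 7x7 conv (zero pad) -> sigmoid."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    mx = [[max(fmap[k][i][j] for k in range(c)) for j in range(w)] for i in range(h)]
    av = [[sum(fmap[k][i][j] for k in range(c)) / c for j in range(w)] for i in range(h)]
    pooled, pad = [mx, av], len(kernel[0]) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for p in range(2):
                for di in range(-pad, pad + 1):
                    for dj in range(-pad, pad + 1):
                        y, x = i + di, j + dj
                        if 0 <= y < h and 0 <= x < w:
                            acc += kernel[p][di + pad][dj + pad] * pooled[p][y][x]
            out[i][j] = sigmoid(acc)
    return out


def attend(fmap, w1, w2, kernel):
    """Enhanced map: F * (channel weight x spatial weight), element by element."""
    cw, sw = channel_weights(fmap, w1, w2), spatial_weights(fmap, kernel)
    return [[[fmap[k][i][j] * cw[k] * sw[i][j] for j in range(len(fmap[0][0]))]
             for i in range(len(fmap[0]))] for k in range(len(fmap))]


rng = random.Random(0)
C, H, W, r = 16, 8, 8, 4  # toy sizes; the text uses r = 16 on far more channels
fmap = [[[rng.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.gauss(0, 0.1) for _ in range(C)] for _ in range(C // r)]
w2 = [[rng.gauss(0, 0.1) for _ in range(C // r)] for _ in range(C)]
kernel = [[[rng.gauss(0, 0.1) for _ in range(7)] for _ in range(7)] for _ in range(2)]
out = attend(fmap, w1, w2, kernel)
```

Because both weights lie in (0, 1), the module can only attenuate activations. It highlights regions by suppressing everything else relatively less.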
[0051] Downsampling is performed using depthwise separable convolutional blocks to output the final feature map. This process involves three downsampling stages: the first stage uses depthwise separable convolutions with a stride of 2, downsampling the feature map size from 112×112 to 56×56 and expanding the number of channels to 256; the second stage repeats the depthwise separable convolution downsampling, downsampling the feature map size from 56×56 to 28×28 while maintaining the 256 channels; the third stage outputs a final feature map size of 14×14×256, preserving sufficient spatial information for classification and localization. Residual connections are added to each downsampling block to avoid the vanishing gradient problem and ensure the stability of deep network training.
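The spatial sizes of the three downsampling stages follow from the convolution output-size formula with a 3×3 kernel, stride 2 and padding 1. The padding value is an assumption, since the text gives only the stride. The sketch traces the shapes. The text does not say how the residual shortcut is matched to a halved, wider output. A common choice, assumed here and not shown, is a 1×1 convolution with stride 2 on the shortcut.

```python
def conv_out(size: int, k: int = 3, stride: int = 2, pad: int = 1) -> int:
    """Output size of a convolution along one axis."""
    return (size + 2 * pad - k) // stride + 1


def trace(size: int, channels: list[int]) -> list[tuple[int, int, int]]:
    """(H, W, C) after each stride-2 depthwise separable stage."""
    shapes = []
    for c in channels:
        size = conv_out(size)  # stride-2 3x3 depthwise conv halves H and W
        shapes.append((size, size, c))
    return shapes


print(trace(112, [256, 256, 256]))
# [(56, 56, 256), (28, 28, 256), (14, 14, 256)]
```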
[0052] By employing a lightweight architecture and attention mechanism, efficient feature extraction from poppy images is achieved, providing rich and accurate feature representations for subsequent classification and localization tasks. The entire process maintains high accuracy while significantly reducing computational complexity, meeting the deployment requirements of edge devices.
[0053] S103. After converting the final feature map into a vector, output the classification confidence score and the corresponding coordinates.
[0054] Specifically, global average pooling is performed on the final feature map to generate a 1×1×256 feature vector for the classification task. The pooled feature vector is transformed through two fully connected layers, from 256→128→64 dimensions, and a Dropout layer (dropout rate 0.3) is used to prevent overfitting. Finally, a further fully connected layer outputs a 2-dimensional vector, which forms the classification branch. The Sigmoid activation function is used to output the classification confidence scores in the format [poppy probability, non-poppy probability].
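The head above is small. Counting the weights and biases of the 256→128→64→2 fully connected layers gives the total computed below; Dropout adds no parameters.

The sketch also shows one property of the stated output format. Sigmoid scores each of the two outputs independently, so the "poppy" and "non-poppy" values need not sum to 1. A softmax over the two logits, or a single sigmoid output, would guarantee complementary probabilities. The logits are hypothetical.

```python
import math


def linear_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out  # weights + biases


head = [(256, 128), (128, 64), (64, 2)]
print(sum(linear_params(i, o) for i, o in head))  # 41282


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs: list[float]) -> list[float]:
    m = max(zs)  # subtract the max for numerical stability
    e = [math.exp(z - m) for z in zs]
    total = sum(e)
    return [v / total for v in e]


logits = [2.0, 1.0]  # hypothetical [poppy, non-poppy] outputs
print([sigmoid(z) for z in logits], softmax(logits))
```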
[0055] The information entropy and edge density of the input image are calculated to assess the background complexity. The classification confidence threshold (range 0.85-0.95) is automatically adjusted based on the complexity. The output classification confidence score is compared with the classification confidence threshold. If the score is greater than or equal to the classification confidence threshold, it indicates that the plant in the current image is a poppy.
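The text does not give the complexity formula or the direction in which the threshold moves. The sketch below makes the following assumptions:
- Normalized entropy (divided by 8 bits, the maximum for 8-bit grey levels) and edge density are weighted equally.
- Edge density uses forward differences with a hypothetical gradient threshold of 30.
- Busier backgrounds get a higher threshold, which would reduce false alarms there.

Only the 0.85-0.95 range comes from the text.

```python
import math
from collections import Counter


def gray_entropy(img: list[list[int]]) -> float:
    """Shannon entropy (bits) of the grey-level histogram."""
    counts = Counter(v for row in img for v in row)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def edge_density(img: list[list[int]], thresh: float = 30) -> float:
    """Fraction of pixels whose forward-difference gradient magnitude exceeds thresh."""
    h, w = len(img), len(img[0])
    edges = 0
    for i in range(h - 1):
        for j in range(w - 1):
            gx = img[i][j + 1] - img[i][j]
            gy = img[i + 1][j] - img[i][j]
            edges += math.hypot(gx, gy) > thresh
    return edges / ((h - 1) * (w - 1))


def adaptive_threshold(img, low: float = 0.85, high: float = 0.95) -> float:
    """Map an assumed complexity score in [0, 1] linearly onto [low, high]."""
    complexity = 0.5 * gray_entropy(img) / 8.0 + 0.5 * edge_density(img)
    return low + (high - low) * min(1.0, complexity)


def is_poppy(score: float, img) -> bool:
    return score >= adaptive_threshold(img)
```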
[0056] The 14×14×256 spatial feature map is retained to provide spatial information for the target localization task. A 1×1 convolution maps it to a 14×14×4 localization branch. Based on the localization branch, the center coordinates, width, and height of the bounding box are output and converted into the original coordinate values using a linear activation function.
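The text does not say how a single box is selected from the 14×14 grid or how the four outputs are parameterized. The sketch makes two assumptions:
- Each cell emits (cx, cy, w, h) normalized to the 448×448 padded input.
- The aspect-preserving resize in [0048] is recorded as a scale factor plus left and top padding.

Under these assumptions, mapping a box back to the original image is a linear transformation. All names are hypothetical.

```python
def to_original(cx, cy, w, h, scale, pad_left, pad_top, input_size=448):
    """Map a normalized (cx, cy, w, h) box on the padded input back to the original image.

    Returns (x1, y1, x2, y2) in original-image pixels.
    """
    # Linear activation: outputs are used as-is, then scaled to input-image pixels
    cx, cy, w, h = (v * input_size for v in (cx, cy, w, h))
    x1, y1 = cx - w / 2, cy - h / 2
    x2, y2 = cx + w / 2, cy + h / 2
    # Undo the letterbox padding, then the resize
    return tuple((v - p) / scale
                 for v, p in ((x1, pad_left), (y1, pad_top), (x2, pad_left), (y2, pad_top)))


# 1920x1080 frame resized by 448/1920 with 98 px of padding at the top
print(to_original(0.5, 0.5, 0.25, 0.140625, 448 / 1920, 0, 98))
```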
[0057] The identification results are linked to geographical location information to generate an early warning report, which is then sent to the regulatory platform.
[0058] The second embodiment of this application is as follows. Referring to Figure 4, the present invention provides a poppy image recognition system based on a convolutional neural network, which applies the poppy image recognition method provided in the first embodiment. The system includes an image acquisition module 101, a feature extraction module 102, and a classification and recognition module 103:
- The image acquisition module 101 acquires poppy images and an initial negative sample library in a compliant manner, generates synthetic plant images through a cGAN, and enhances and expands the dataset.
- The feature extraction module 102 inputs the obtained hierarchical sample set into a lightweight multi-scale convolutional network for feature extraction and attention optimization, and outputs the final feature map after downsampling.
- The classification and recognition module 103 converts the final feature map into a vector and outputs the classification confidence score and the corresponding coordinates.
[0059] For the system in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiment and will not be repeated here.
[0060] Because the system embodiments essentially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant details. The device embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of this application. Those skilled in the art can understand and implement this without creative effort.
[0061] Accordingly, this application also provides an electronic device, comprising one or more processors and a memory for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the poppy image recognition method based on convolutional neural networks described above. Figure 5 shows a hardware structure diagram of a device with data processing capability in which the poppy image recognition system based on a convolutional neural network provided in an embodiment of the present invention is located. In addition to the processor, memory, and network interface shown in Figure 5, the device with data processing capability in the embodiment may also include other hardware depending on its actual function, which will not be described in detail here.
[0062] Accordingly, this application also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the poppy image recognition method based on a convolutional neural network as described above. The computer-readable storage medium can be an internal storage unit of any data-processing device as described in any of the foregoing embodiments, such as a hard disk or memory. The computer-readable storage medium can also be an external storage device, such as a plug-in hard disk, smart media card (SMC), SD card, flash card, etc., equipped on the device. Furthermore, the computer-readable storage medium can include both internal storage units of any data-processing device and external storage devices. The computer-readable storage medium is used to store the computer program and other programs and data required by the data-processing device, and can also be used to temporarily store data that has been output or will be output.
[0063] Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary techniques in the art not disclosed herein.
[0064] It should be understood that this application is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope.
Claims
1. A poppy image recognition method based on convolutional neural networks, characterized in that it includes the following steps: obtaining poppy images and an initial negative sample library in a compliant manner, generating synthetic plant images using a cGAN, and augmenting and expanding the dataset; inputting the obtained hierarchical sample set into a lightweight multi-scale convolutional network for feature extraction and attention optimization, and outputting the final feature map after downsampling; and converting the final feature map into a vector and outputting the classification confidence score and the corresponding coordinates.
2. The poppy image recognition method based on convolutional neural networks as described in claim 1, characterized in that the method further includes: binding the classification confidence scores that meet the requirements to the corresponding coordinates to generate an early warning report.
3. The poppy image recognition method based on convolutional neural networks as described in claim 1, characterized in that obtaining poppy images and an initial negative sample library in a compliant manner, generating synthetic plant images using a cGAN, and augmenting the dataset includes: obtaining, in a compliant manner, images of poppy cultivation and of similar plants, and constructing an initial negative sample set from all images of the similar plants at different growth stages; inputting the plant category, its corresponding image, and environmental parameters into the generator, and introducing a morphological constraint loss function to generate complete plant images; augmenting the complete plant images and generating adversarial examples based on the fast gradient sign method; and combining the augmented complete plant images, the poppy cultivation images, the initial negative sample set, and the adversarial examples to form a hierarchical sample set.
4. The poppy image recognition method based on convolutional neural networks as described in claim 3, characterized in that collecting the environmental parameters includes: collecting the illumination type by time-segmented acquisition, and labeling and collecting the background category.
5. The poppy image recognition method based on convolutional neural networks as described in claim 3, characterized in that inputting the obtained hierarchical sample set into a lightweight multi-scale convolutional network for feature extraction and attention optimization and outputting the final feature map after downsampling includes: preprocessing and standardizing all images in the hierarchical sample set; processing all transformed images in the hierarchical sample set through parallel dilated convolution groups and concatenating the extracted feature values into a multi-scale feature map; inputting the multi-scale feature map into the spatial attention optimization module to generate a channel-spatial weight matrix, and multiplying it with the multi-scale feature map to obtain an enhanced feature map; and performing downsampling using depthwise separable convolutional blocks to output the final feature map.
6. The poppy image recognition method based on convolutional neural networks as described in claim 5, characterized in that the parallel dilated convolution group has three parallel branches, each using 32 3×3 convolution kernels, with dilation rates of 1, 2, and 4 respectively: dilation rate 1, receptive field 3×3, capturing local details of petal texture; dilation rate 2, receptive field 7×7, extracting flower morphological features; and dilation rate 4, receptive field 15×15, obtaining global contextual information of the plant.
7. The poppy image recognition method based on convolutional neural networks as described in claim 5, characterized in that inputting the multi-scale feature map into the spatial attention optimization module to generate a channel-spatial weight matrix and multiplying it with the multi-scale feature map to obtain an enhanced feature map includes: applying global average pooling to the multi-scale feature map and, after two fully connected layers, generating the channel weight matrix with the Sigmoid activation function; performing max pooling and average pooling along the channel dimension, concatenating the results, and learning the spatial weight distribution through a 7×7 convolution to generate a spatial weight matrix; and multiplying the channel weight matrix and the spatial weight matrix element-wise, then multiplying the result with the multi-scale feature map to obtain the enhanced feature map.
8. The poppy image recognition method based on convolutional neural networks as described in claim 5, characterized in that, after downsampling using depthwise separable convolutional blocks and outputting the final feature map, the method further includes: performing global average pooling on the final feature map and inputting the resulting feature vector into a fully connected layer to obtain a classification branch; and applying a 1×1 convolution to the final feature map to obtain the localization branch.
9. The poppy image recognition method based on convolutional neural networks as described in claim 8, characterized in that converting the final feature map into a vector and outputting the classification confidence score and its corresponding coordinates includes: outputting the corresponding classification confidence score from the classification branch using the Sigmoid activation function; and converting the center coordinates, width, and height of the bounding box output by the localization branch into the original coordinate values using a linear activation function.
10. A poppy image recognition system based on a convolutional neural network, applied to the poppy image recognition method based on a convolutional neural network as described in claim 1, characterized in that the system includes an image acquisition module, a feature extraction module, and a classification and recognition module; the image acquisition module is used to acquire poppy images and an initial negative sample library in a compliant manner, generate synthetic plant images through a cGAN, and augment and expand the dataset; the feature extraction module is used to input the obtained hierarchical sample set into a lightweight multi-scale convolutional network for feature extraction and attention optimization, and to output the final feature map after downsampling; and the classification and recognition module is used to convert the final feature map into a vector and then output the classification confidence score and the corresponding coordinates.
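Claim 3 generates adversarial examples with the fast gradient sign method (FGSM). FGSM perturbs an input by x_adv = clip(x + ε·sign(∇x L), 0, 1). The numpy sketch below illustrates only that update rule. To stay self-contained, it swaps the recognition network for a logistic-regression stand-in whose input gradient is analytic. The stand-in model, the ε = 0.03 default and the [0, 1] pixel range are assumptions. With a real CNN, the gradient would come from automatic differentiation instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(x, y, w, b):
    """Binary cross-entropy of the stand-in model on a flattened image x."""
    p = sigmoid(x @ w + b)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def fgsm(x, y, w, b, eps=0.03):
    """One FGSM step: move every pixel by eps in the direction that increases the loss."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w                 # dL/dx for BCE on a linear logit
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)
```

Because every pixel moves by at most ε, the adversarial sample stays visually close to the original while pushing the model toward a wrong answer. This is what makes such samples useful for hardening the classifier.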
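For a single k×k kernel with dilation rate d, the effective extent is k + (k − 1)(d − 1). For k = 3 this gives 3×3, 5×5 and 9×9 for the three branches of claim 6. The 7×7 and 15×15 values stated there coincide with the cumulative receptive field obtained when the same three layers are applied in sequence (3, 7, 15). The short Python check below computes both readings, assuming stride 1 and no pooling.

```python
def effective_kernel(k: int, d: int) -> int:
    """Extent of a single k x k kernel with dilation rate d: k + (k - 1)(d - 1)."""
    return k + (k - 1) * (d - 1)

def stacked_receptive_fields(k: int, dilations) -> list:
    """Cumulative receptive field after each stride-1 layer applied in sequence."""
    rf, fields = 1, []
    for d in dilations:
        rf += effective_kernel(k, d) - 1  # each layer widens the field by (extent - 1)
        fields.append(rf)
    return fields

single = [effective_kernel(3, d) for d in (1, 2, 4)]  # three parallel branches
stacked = stacked_receptive_fields(3, (1, 2, 4))      # same three layers in series
print(single, stacked)  # [3, 5, 9] [3, 7, 15]
```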
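The module of claim 7 combines an SE-style channel weight (global average pooling, two fully connected layers, Sigmoid) with a CBAM-style spatial weight (channel-wise max and average pooling, 7×7 convolution). Unlike CBAM, the two weights are computed in parallel from the same multi-scale feature map and multiplied element-wise. The numpy sketch below follows that wording for a (C, H, W) feature map. Several details are assumptions not fixed by the claim:

- the ReLU between the two fully connected layers;
- the Sigmoid on the spatial branch;
- the zero padding of the 7×7 convolution;
- the example sizes (96 = 3 × 32 channels from claim 6, reduction ratio 16).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_weights(f, w1, b1, w2, b2):
    s = f.mean(axis=(1, 2))                      # global average pooling -> (C,)
    h = np.maximum(w1 @ s + b1, 0.0)             # FC 1 (+ ReLU, assumed)
    return sigmoid(w2 @ h + b2)[:, None, None]   # FC 2 + Sigmoid -> (C, 1, 1)

def spatial_weights(f, k7):
    pooled = np.stack([f.max(axis=0), f.mean(axis=0)])         # channel-wise max/avg -> (2, H, W)
    xp = np.pad(pooled, ((0, 0), (3, 3), (3, 3)))              # zero padding keeps H, W
    win = sliding_window_view(xp, (7, 7), axis=(1, 2))         # (2, H, W, 7, 7)
    return sigmoid(np.einsum("chwij,cij->hw", win, k7))[None]  # 7x7 conv + Sigmoid (assumed) -> (1, H, W)

def attention(f, params):
    weight = channel_weights(f, *params["fc"]) * spatial_weights(f, params["k7"])  # broadcast -> (C, H, W)
    return weight * f                                                              # enhanced feature map
```

Since both weight maps lie in (0, 1), the module can only attenuate features. It emphasises poppy-relevant channels and regions by suppressing the rest.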
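Claims 5 and 8 rely on depthwise separable convolutional blocks for downsampling, and the summary credits the design with lower computational complexity. A worked parameter count makes the saving concrete. The Python sketch below compares a standard 3×3 convolution with its depthwise separable replacement: a 3×3 depthwise convolution followed by a 1×1 pointwise convolution. The 256-channel width is borrowed from the 14×14×256 map of [0056] as an illustrative assumption, and bias terms are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """k x k depthwise (one filter per input channel) plus 1 x 1 pointwise channel mixing."""
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 256, 256)
dws = depthwise_separable_params(3, 256, 256)
print(std, dws, round(std / dws, 2))  # 589824 67840 8.69
```

The ratio equals 1/C_out + 1/k², so for 3×3 kernels the saving approaches 9× as the channel count grows. Using a stride of 2 in the depthwise step performs the downsampling without changing these counts.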