A field corn tassel detection method based on improved SSD algorithm

By cropping UAV images into a small-size image dataset, applying an annotation information update strategy, and optimizing the SSD network, the method addresses the time-consuming, labor-intensive nature of traditional methods and the large computational load of existing algorithms, and achieves real-time, high-precision detection of maize tassels.

CN115713697B · Active · Publication Date: 2026-06-16 · CHINA AGRI UNIV +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHINA AGRI UNIV
Filing Date: 2022-11-16
Publication Date: 2026-06-16

Application Information

IPC: G06V20/17; G06N3/0464; G06N3/08; G06V10/25; G06V10/774; G06V10/82

AI Tagging

Application Domain

Character and pattern recognition; Neural learning methods

AI Technical Summary

⚠Technical Problem

Traditional manual methods for detecting corn tassels are time-consuming, labor-intensive, and prone to errors. Existing deep learning algorithms for detecting corn tassels in the field suffer from problems such as large number of parameters and computational load, and difficulty in handling complex environments, thus failing to meet the requirements for real-time detection.

⚗Method used

An improved SSD algorithm is adopted, which constructs a small image dataset by cropping and updating the annotation information, removes unnecessary convolutional layers, constructs a feature pyramid multi-scale detection structure, optimizes the training process to reduce parameters and computation, and combines high-resolution images collected by UAVs to detect maize tassels in the field.

🎯Benefits of technology

It achieves real-time, high-precision detection of maize tassels, reduces manual annotation workload, lowers model parameters and computational load, adapts to complex field environments, and meets real-time detection requirements.

✦ Generated by Eureka AI based on patent content.

Abstract

The application provides a field corn tassel detection method based on an improved SSD algorithm. It proposes an annotation information update strategy that reduces the annotation workload, and it improves the one-stage deep learning model SSD by analyzing the width-height distribution of corn tassels in unmanned aerial vehicle images: the first two prediction layers of the original model are retained and the subsequent convolution operations are discarded. This simplifies the structure and parameters of the model while preserving accuracy and effective image feature extraction, enabling high-precision real-time detection of corn tassels in the field. The method has high application value and broad application prospects.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, specifically to a method for detecting maize tassels in the field based on an improved S-SSD algorithm.

Background Technology

[0002] Corn is an excellent feed, an important industrial raw material, and a high-quality cash crop. Increasing corn yield is crucial under the pressure of global climate change. The tasseling stage is a critical period for corn growth and development. Timely monitoring of the flowering status of corn tassels in the field, and timely topdressing and irrigation during this stage, can ensure stable corn yields. Traditional field-based tassel detection methods rely mainly on manual labor, which is time-consuming, labor-intensive, and prone to errors. Therefore, reducing manual labor and automating this process is essential.

[0003] With the improvement of computer performance and the development of image processing, computer vision plays an increasingly important role in agricultural analysis and applications, and has been widely used in the past few decades. Among these applications, real-time analysis of crop phenotypic information based on images acquired by drones, combined with computer vision technology, has become a novel means of rapidly understanding crop growth status. Currently, many studies use traditional machine learning algorithms to detect maize tassels, but these methods generally require manually designed recognition features, are limited to particular varieties and environments, and struggle to handle the complex conditions of large-scale fields. Meanwhile, with the continuous improvement of GPU computing power and big data processing technology, more and more deep learning algorithms have been applied to agriculture. Compared with traditional methods, deep neural network methods have powerful feature extraction and self-learning capabilities, and have been widely used in the identification and detection of field crops such as wheat, rice, and maize, achieving excellent results.

[0004] Object detection primarily involves two mainstream frameworks: two-stage networks represented by Faster R-CNN and single-stage networks represented by SSD / YOLO. However, two-stage object detection networks have a large number of parameters and computational costs, which is not conducive to real-time detection of maize tassels in the field. SSD, on the other hand, inherits the idea of transforming detection into regression from YOLO, completing object localization and classification in one step. Based on the anchors in Faster R-CNN, it proposes similar prior boxes, incorporating a feature pyramid-based detection method, that is, predicting the object on feature maps of different receptive fields. These designs enable simple end-to-end training, maintaining the fast detection speed of YOLO while possessing the high detection accuracy of Faster R-CNN. Therefore, compared to YOLO, the SSD algorithm better meets the accuracy and real-time requirements for maize tassel detection in the field.

[0005] However, in practical applications, hardware and network model limitations make it impossible to train directly on the large, high-resolution RGB images of the same breeding material collected by drones. Furthermore, occlusion among maize tassels and varying brightness conditions in the field pose challenges for SSD tassel detection. Additionally, because maize tassels are small in the images, the parts of the SSD model that detect with large prior boxes add parameters and computational load, hindering real-time detection of maize tassels in the field.

Summary of the Invention

[0006] To address existing problems, this invention provides a field maize tassel detection method based on an improved SSD algorithm. The core of the improved SSD algorithm lies in removing predictive feature maps from the SSD model that increase parameters and computational load, as the area of the prior boxes in these feature maps is larger than the maximum size of the tassel in the image. Simultaneously, high-resolution images acquired by UAVs are cropped to construct a small-size image dataset of maize tassels under different brightness conditions, and a labeling information update strategy is proposed to reduce the labeling workload.

[0007] To achieve the above objectives, the present invention adopts the following technical solution:

[0008] A method for detecting maize tassels in the field based on an improved SSD algorithm includes the following steps:

[0009] S1. Obtain the corn RGB image dataset;

[0010] Image annotation tools were used to annotate the corn tassels in each image in the dataset. The annotations included the tassel category and the coordinates of the bounding box surrounding the tassel. All annotated images were used as the first sample to form the VOC dataset.

[0011] S2. The RGB image dataset obtained in S1 is cropped to a uniform size to obtain a new dataset with a smaller size. At the same time, the annotation information update strategy is used to automatically update the annotation of the cropped image, and the image is enhanced. The enhanced data is divided into training set and test set.

[0012] S3. Calculate the length and width distribution of the tassel label boxes in all images;

[0013] S4. Based on the length and width distribution obtained in S3, construct an improved SSD network model. Input the training set obtained in S2 into the network and train the model using the SGD method. At the same time, set the hyperparameters and learning rate during training so that the total loss function converges to the optimal value. Finally, obtain the trained field maize tassel detection model.

[0014] S5. Use the trained improved SSD model to implement the task of detecting maize tassels in the field.

[0015] Furthermore, in step S1, corn images are acquired under different brightness conditions;

[0016] The brightness conditions include: slightly bright, normal brightness, and slightly dark.

[0017] Furthermore, in step S2, the cropped images are all the same size.

[0018] Furthermore, the annotation information update strategy adopted in step S2 includes:

[0019] S21. Calculate the distances i and j from the center point of the tassel labeling box to the cutting boundary within the labeling box in the horizontal and vertical directions, respectively;

[0020] S22. Compare i and j with a / 3 and b / 3, respectively, where a is the length of the annotation box and b is the width of the annotation box;

[0021] S23. If i is greater than a / 3 and j is greater than b / 3, the cropped annotation box is retained; otherwise, it is discarded, and the annotation information of the cropped image is updated accordingly.

[0022] Furthermore, step S2 uses horizontal flipping, random rotation, and Gridmask methods to enrich the data volume and enhance the robustness of the model.

[0023] Furthermore, the data division ratio in step S2 is as follows: training set : test set = 7:3.

[0024] Furthermore, in step S4, the prior box size set on each feature map is compared with the length and width distribution of the tassel annotation boxes obtained in S3. The prior box size is calculated as follows:

[0025] $$s_k = s_{\min} + \frac{s_{\max} - s_{\min}}{m - 1}(k - 1), \quad k \in [1, m]$$

[0026] where $m$ is the number of feature maps at different resolutions, $s_{\min}$ is 0.2, and $s_{\max}$ is 0.9. Five aspect ratios $a_r$ are set for the prior boxes: 1, 2, 3, 1 / 2, and 1 / 3. The prior box width $w_k^a$ and height $h_k^a$ are calculated as follows:

[0027] $$w_k^a = s_k \sqrt{a_r}, \quad h_k^a = \frac{s_k}{\sqrt{a_r}}$$

[0028] In addition, for an aspect ratio of 1, an extra prior box is added with side length $s'_k$, calculated as:

[0029] $$s'_k = \sqrt{s_k s_{k+1}}$$

[0030] Further, in step S4, Conv8, Conv9, Conv10 and Conv11 of the SSD network model are deleted, and two feature layers, Conv4-3 and Conv7, are selected to form a feature pyramid multi-scale detection structure.
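The comparison between prior box sizes and the tassel size distribution can be illustrated with a short Python sketch. It assumes the standard SSD linear scale rule with a minimum scale of 0.2 and a maximum of 0.9, six prediction feature maps in the order Conv4-3, Conv7, Conv8, Conv9, Conv10, Conv11, and a 512×512 input; the function names are hypothetical and the layer-to-scale mapping is an assumption, not something the patent text states explicitly.

```python
import math

def prior_scales(m=6, s_min=0.2, s_max=0.9):
    """Scale s_k for each of the m prediction feature maps (k = 1..m)."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

def prior_box_shapes(s_k, s_next, ratios=(1, 2, 3, 1 / 2, 1 / 3)):
    """(width, height) pairs for one feature map, as fractions of the input size."""
    shapes = [(s_k * math.sqrt(r), s_k / math.sqrt(r)) for r in ratios]
    # Extra square box for aspect ratio 1, with side sqrt(s_k * s_{k+1}).
    side = math.sqrt(s_k * s_next)
    shapes.append((side, side))
    return shapes

if __name__ == "__main__":
    # Assumed layer order; only the layer names come from the text.
    layers = ["Conv4-3", "Conv7", "Conv8", "Conv9", "Conv10", "Conv11"]
    input_size = 512
    for name, s in zip(layers, prior_scales()):
        # Base prior box side in pixels, to compare with the tassel box distribution.
        print(f"{name}: base prior side = {s * input_size:.1f} px")
```

Printing the base side in pixels for each layer makes it easy to see which feature maps carry prior boxes far larger than the measured tassel boxes, which is the motivation given for dropping the deeper layers.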

[0031] Furthermore, in step S4, feature extraction is performed on an input image with a size of 512×512;

[0032] Furthermore, the total loss function in step S4 is calculated as follows:

[0033] $$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

[0034] where $N$ is the number of prior boxes matched to ground-truth boxes, specifically the number of boxes whose intersection-over-union (IoU) is higher than 0.5; when $N = 0$, the loss is set to 0. $\alpha$ is the weighting coefficient, set to 1. $L_{conf}$ is the confidence loss function, and $c$ represents the probability that the prior box matches the corresponding label. $L_{loc}$ is the location loss function, $l$ is the coordinate information predicted for the prior box, and $g$ is the coordinates of the ground-truth box. The loss function $L$ is the weighted sum of $L_{conf}$ and $L_{loc}$. The confidence loss is calculated as follows:

[0035] $$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_i^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_i^{0}\right)$$

[0036] $$\hat{c}_i^{p} = \frac{\exp\left(c_i^{p}\right)}{\sum_{p} \exp\left(c_i^{p}\right)}$$

[0037] where $x_{ij}^{p}$ is an indicator of whether the $i$-th prior box matches the $j$-th ground-truth box of category $p$, taking the value 0 or 1; $\hat{c}_i^{p}$ is the classification score of the $i$-th prior box for category $p$, and $\hat{c}_i^{0}$ is the corresponding background confidence of the prior box. The positive and negative sample sets are denoted $Pos$ and $Neg$, respectively.

[0038] For the location loss function $L_{loc}$, the Smooth L1 loss is used, calculated as follows:

[0039] $$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{p}\, \mathrm{smooth}_{L1}\left(l_i^{m} - \hat{g}_j^{m}\right)$$

[0040] $$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}$$

[0041] $$\hat{g}_j^{w} = \log\left(\frac{g_j^{w}}{d_i^{w}}\right), \quad \hat{g}_j^{h} = \log\left(\frac{g_j^{h}}{d_i^{h}}\right)$$

[0042] $$\mathrm{smooth}_{L1}(z) = \begin{cases} 0.5 z^2, & |z| < 1 \\ |z| - 0.5, & \text{otherwise} \end{cases}$$

[0043] The location loss function only calculates the error for positive samples, where $x_{ij}^{p}$ takes the value 0 or 1. Because the predicted value $l$ is encoded, $\hat{g}$ is obtained by encoding the ground-truth box $g$ in the same way. $g_j$ denotes the size and position of the $j$-th ground-truth box, $d_i$ denotes the size and position of the $i$-th prior box, and $l_i$ represents the offset of the predicted box relative to the prior box.
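As an illustration of the location loss described above, the following Python sketch encodes a ground-truth box relative to its prior box and sums the Smooth L1 error over the positive samples. It assumes the standard SSD encoding with boxes given as (cx, cy, w, h); the function names and the flat list layout are hypothetical choices made for this example.

```python
import math

def smooth_l1(z):
    """Smooth L1: quadratic near zero, linear elsewhere."""
    az = abs(z)
    return 0.5 * z * z if az < 1.0 else az - 0.5

def encode(gt_box, prior_box):
    """Encode a ground-truth box (cx, cy, w, h) as offsets relative to a prior box."""
    g_cx, g_cy, g_w, g_h = gt_box
    d_cx, d_cy, d_w, d_h = prior_box
    return (
        (g_cx - d_cx) / d_w,
        (g_cy - d_cy) / d_h,
        math.log(g_w / d_w),
        math.log(g_h / d_h),
    )

def location_loss(pred_offsets, gt_boxes, prior_boxes):
    """Sum of Smooth L1 errors over matched (positive) prior boxes only.

    The three lists are aligned: entry n holds the predicted offsets, the
    matched ground-truth box, and the prior box of the n-th positive sample.
    """
    total = 0.0
    for pred, gt, prior in zip(pred_offsets, gt_boxes, prior_boxes):
        target = encode(gt, prior)
        total += sum(smooth_l1(l - g) for l, g in zip(pred, target))
    return total
```

Negative samples never enter `location_loss`, which mirrors the statement that only positive samples contribute a localization error.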

[0044] Compared with the prior art, the present invention has the following beneficial effects:

[0045] This invention provides a field maize tassel detection method based on an improved SSD algorithm. On the one hand, it automatically updates the annotations of cropped images through an annotation information update strategy, reducing the workload of manual annotation. On the other hand, it reduces the number of convolutional layers used for prediction in the SSD network model based on the features of image data collected by UAVs, thereby reducing the number of model parameters and computational load while maintaining accuracy, and realizing real-time high-precision detection of maize tassels in the field.

Attached Figure Description

[0046] Figure 1 is a schematic diagram of the basic process of the method in an embodiment of the present invention.

[0047] Figure 2 is a schematic diagram of the annotation information update strategy in an embodiment of the present invention.

[0048] Figure 3 is a schematic diagram of the specific structure of the improved SSD model according to an embodiment of the present invention.

Detailed Implementation

[0049] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0050] This invention proposes a field maize tassel detection method based on an improved SSD algorithm. As shown in Figure 1, it includes the following steps:

[0051] S1. Using a drone, images of corn tassels in the field under different weather conditions were collected at a flight altitude of 15 m above the ground, with a resolution of 5472×3648. Pixels with a grayscale value less than 40 were defined as dark. Based on the proportion of dark pixels being less than 12%, greater than or equal to 12% and less than 60%, and greater than or equal to 60%, the image dataset was divided into three categories: slightly bright, normal brightness, and slightly dark. The LabelImg annotation tool was used to annotate the bounding boxes of the tassels in the images, resulting in XML annotation files.
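The brightness grouping in S1 can be sketched in a few lines of Python. The thresholds (grayscale value below 40 counts as dark; 12% and 60% dark-pixel proportions separate the classes) come from the text; the function name and the use of a flat list of grayscale values are assumptions made for illustration.

```python
def brightness_category(gray_pixels, dark_threshold=40):
    """Classify an image by the proportion of dark pixels.

    gray_pixels: flat iterable of grayscale values (0-255) for one image.
    """
    values = list(gray_pixels)
    if not values:
        raise ValueError("image has no pixels")
    dark = sum(1 for v in values if v < dark_threshold)
    ratio = dark / len(values)
    if ratio < 0.12:
        return "slightly bright"
    if ratio < 0.60:
        return "normal brightness"
    return "slightly dark"
```

For example, an image in which half of the pixels fall below 40 would be grouped as normal brightness under this rule.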

[0052] S2. As shown in Figure 2, the original image is cropped into images with a resolution of 1024×1024. During cropping, a corn tassel may be cut into 2 or 4 parts that fall in different cropped images. Because the smaller parts of a cut tassel carry insufficient semantic information, they are discarded. In the horizontal and vertical directions, if the distance from the center of the tassel's bounding box to the cropping boundary exceeds a / 3 and b / 3, respectively (where a and b are the length and width of the bounding box), the bounding box containing the tassel's center point is retained and its coordinate information is updated; otherwise, the bounding box is discarded.
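The annotation update rule in S2 can be sketched as follows in Python. This is a minimal illustration under stated assumptions: boxes are (xmin, ymin, xmax, ymax) in original-image pixels, a is taken as the horizontal extent and b as the vertical extent, a direction in which the box is not cut is treated as passing the test, and the function name is hypothetical.

```python
def update_boxes_for_crop(boxes, crop_x0, crop_y0, crop_size=1024):
    """Return the boxes kept for one crop, in crop-local coordinates."""
    crop_x1 = crop_x0 + crop_size
    crop_y1 = crop_y0 + crop_size
    kept = []
    for xmin, ymin, xmax, ymax in boxes:
        cx = (xmin + xmax) / 2.0
        cy = (ymin + ymax) / 2.0
        # Only the crop that contains the box center may keep the box.
        if not (crop_x0 <= cx < crop_x1 and crop_y0 <= cy < crop_y1):
            continue
        a = xmax - xmin  # horizontal extent (assumed to be the "length")
        b = ymax - ymin  # vertical extent (assumed to be the "width")
        # Crop boundaries that actually pass through the box.
        cuts_x = [e for e in (crop_x0, crop_x1) if xmin < e < xmax]
        cuts_y = [e for e in (crop_y0, crop_y1) if ymin < e < ymax]
        # Discard when the center is within a/3 (or b/3) of a cutting boundary.
        if any(abs(cx - e) <= a / 3.0 for e in cuts_x):
            continue
        if any(abs(cy - e) <= b / 3.0 for e in cuts_y):
            continue
        # Clip to the crop and shift to crop-local coordinates.
        kept.append((
            max(xmin, crop_x0) - crop_x0,
            max(ymin, crop_y0) - crop_y0,
            min(xmax, crop_x1) - crop_x0,
            min(ymax, crop_y1) - crop_y0,
        ))
    return kept
```

Running this once per crop window over the original annotation list yields the updated annotation for every cropped image without any manual relabeling.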

[0053] S3. To avoid overfitting, the dataset was randomly augmented with a probability of 0.5 using the Rotation, Horizontal_flipping, and Gridmask methods during training. The Gridmask method generated a series of binary masks, which were squares with a pixel value of 0, evenly distributed on the image in a grid structure. The heights of the small square masks were set to 50, 150, 250, and 350, respectively. Finally, the maize tassel dataset was divided into training and test sets in a 7:3 ratio. The width and height distribution of the tassel bounding boxes in the maize images of the dataset were analyzed, mainly between 20 and 120 pixels.
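The Gridmask augmentation described in S3 can be sketched in plain Python. The text specifies square masks of value 0 laid out on a regular grid and an application probability of 0.5; the grid period (spacing between squares), the function names, and the nested-list image representation are assumptions made for this example.

```python
import random

def grid_mask(height, width, square, period):
    """Binary mask: 0 inside each masked square, 1 elsewhere.

    square: side length of each masked square in pixels.
    period: distance between the top-left corners of neighboring squares
            (an assumed parameter; it must be larger than `square`).
    """
    if not 0 < square < period:
        raise ValueError("square must be positive and smaller than period")
    return [
        [0 if (y % period < square and x % period < square) else 1
         for x in range(width)]
        for y in range(height)
    ]

def apply_grid_mask(image, mask):
    """Zero out the masked pixels of a single-channel image (list of rows)."""
    return [
        [pixel * keep for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]

def maybe_grid_mask(image, rng, square, period, p=0.5):
    """Apply the grid mask with probability p, otherwise return the image as is."""
    if rng.random() >= p:
        return image
    mask = grid_mask(len(image), len(image[0]), square, period)
    return apply_grid_mask(image, mask)
```

A call such as `maybe_grid_mask(image, random.Random(0), square=50, period=100)` would use one of the square sizes listed in the text; the period of 100 is purely illustrative.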

[0054] S4. Construct and train an improved SSD model based on the length-width distribution. During training, the initial learning rate is set to 0.02. Stochastic gradient descent (SGD) is used to optimize the loss function, and the total loss function is calculated as follows:

[0055] $$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

[0056] where $N$ is the number of prior boxes matched to ground-truth boxes, specifically the number of boxes with an intersection-over-union (IoU) higher than 0.5; when $N = 0$, the loss is set to 0. The weighting coefficient $\alpha$ is set to 1. $L_{conf}$ is the confidence loss function, and $c$ represents the probability that the prior box matches the corresponding label. $L_{loc}$ is the location loss function, $l$ is the coordinate information predicted for the prior box, and $g$ is the coordinates of the ground-truth box. The loss function $L$ is the weighted sum of $L_{conf}$ and $L_{loc}$. The confidence loss is calculated as follows:

[0057] $$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_i^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_i^{0}\right)$$

[0058] $$\hat{c}_i^{p} = \frac{\exp\left(c_i^{p}\right)}{\sum_{p} \exp\left(c_i^{p}\right)}$$

[0059] where $x_{ij}^{p}$ is an indicator of whether the $i$-th prior box matches the $j$-th ground-truth box of category $p$, taking the value 0 or 1; $\hat{c}_i^{p}$ is the classification score of the $i$-th prior box for category $p$, and $\hat{c}_i^{0}$ is the corresponding background confidence of the prior box. The positive and negative sample sets are denoted $Pos$ and $Neg$, respectively;

[0060] For the location loss function $L_{loc}$, the Smooth L1 loss is used, calculated as follows:

[0061] $$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{p}\, \mathrm{smooth}_{L1}\left(l_i^{m} - \hat{g}_j^{m}\right)$$

[0062] $$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}$$

[0063] $$\hat{g}_j^{w} = \log\left(\frac{g_j^{w}}{d_i^{w}}\right), \quad \hat{g}_j^{h} = \log\left(\frac{g_j^{h}}{d_i^{h}}\right)$$

[0064] $$\mathrm{smooth}_{L1}(z) = \begin{cases} 0.5 z^2, & |z| < 1 \\ |z| - 0.5, & \text{otherwise} \end{cases}$$

[0065] The location loss function only calculates the error for positive samples, where $x_{ij}^{p}$ takes the value 0 or 1. Because the predicted value $l$ is encoded, $\hat{g}$ is obtained by encoding the ground-truth box $g$ in the same way. $g_j$ denotes the size and position of the $j$-th ground-truth box, $d_i$ denotes the size and position of the $i$-th prior box, and $l_i$ represents the offset of the predicted box relative to the prior box.
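The confidence loss and its combination with the location loss can be sketched in Python as follows. This is a simplified illustration, assuming a softmax over raw class scores with index 0 as background and every negative sample included (the text does not describe any negative sampling scheme); the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one prior box's raw class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_loss(all_scores, labels):
    """Cross-entropy summed over prior boxes.

    all_scores: per prior box, raw scores [background, class 1, ...].
    labels: per prior box, the matched class index (0 means negative sample,
            so the background probability is penalized).
    """
    loss = 0.0
    for scores, label in zip(all_scores, labels):
        loss -= math.log(softmax(scores)[label])
    return loss

def total_loss(conf_loss, loc_loss, num_matched, alpha=1.0):
    """Weighted sum normalized by the number of matched prior boxes."""
    if num_matched == 0:
        return 0.0  # the loss is defined as 0 when nothing matches
    return (conf_loss + alpha * loc_loss) / num_matched
```

With a single tassel class, each prior box has two scores, so `labels` contains 1 for matched boxes and 0 for background boxes.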

[0066] The weight decay is 0.0001, and the momentum is 0.9. As shown in Figure 3, the model includes the following steps:

[0067] S41. The model input is an image with a resolution of 3×512×512. First, it enters the Conv 1_3 convolutional block, which contains two convolutional layers. The first layer has 64 3×3×3 convolutional kernels, and the second layer has 64 64×3×3 convolutional kernels. Then, it passes through the ReLU activation function, followed by the Conv 2_3, Conv 3_3, and Conv 4_3 convolutional blocks in sequence. The first convolutional layer in Conv 2_3 contains 128 64×3×3 convolutional kernels, and the second convolutional layer contains 128 128×3×3 convolutional kernels. Conv 3_3 and Conv 4_3 each have three convolutional layers, with 256 and 512 convolutional kernels in each block, respectively. The kernel sizes in Conv 3_3 are 128×3×3, 256×3×3, and 256×3×3, respectively, and the kernel sizes in Conv 4_3 are 256×3×3, 512×3×3, and 512×3×3, respectively;

[0068] S42. All convolutional layers are followed by ReLU activation functions. After ReLU activation, the features from the third convolutional layer in Conv 4_3 are split into two branches: one is taken out as a prediction feature map, and the other enters the Conv 5_3 convolutional block. This convolutional block contains three convolutional layers, each with 512 kernels of size 512×3×3. Each of the five convolutional blocks ends with a MaxPool layer. Subsequently, Conv 6 and Conv 7 are used for convolutional feature extraction. These two convolutional layers contain 1024 kernels of sizes 512×3×3 and 512×1×1, respectively. The predictions made on the resulting feature maps, together with those made on the feature maps extracted from Conv 4_3, are then subjected to non-maximum suppression (NMS) to remove overlapping or incorrect bounding boxes, generating the final set of predicted bounding boxes.

[0069] S5. Input the image into the trained improved SSD network model for recognition and detection. Then, the detection results of corn tassels in the field can be obtained, and the images with the detection results of tassels can be marked.
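The non-maximum suppression step that produces the final detections can be sketched in Python as a greedy procedure. The IoU threshold of 0.45 is an assumed value (the text does not give one), boxes are assumed to be (xmin, ymin, xmax, ymax), and the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix0 = max(box_a[0], box_b[0])
    iy0 = max(box_a[1], box_b[1])
    ix1 = min(box_a[2], box_b[2])
    iy1 = min(box_a[3], box_b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Because tassels in dense plots can partially occlude each other, the threshold trades off duplicate suppression against merging neighboring tassels, so it would need tuning on real data.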

[0070] Obviously, the above embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the implementation of the present invention. Those skilled in the art can make other variations or modifications based on the above description. It is neither necessary nor possible to exhaustively describe all embodiments here. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the scope of protection of the claims of the present invention.

Claims

1. A method for detecting maize tassels in the field based on an improved SSD algorithm, characterized in that the process is as follows: S1. Obtain the corn RGB image dataset; image annotation tools are used to annotate the corn tassels in each image in the dataset, the annotations include the tassel category and the coordinates of the bounding box surrounding the tassel, and all annotated images are used as the first sample to form the VOC dataset. S2. The RGB image dataset obtained in S1 is cropped to a uniform size to obtain a new dataset with a smaller size. At the same time, the annotation information update strategy is used to automatically update the annotations of the cropped images. The specific update strategy includes: S21. Calculate the distances i and j from the center point of the tassel annotation box to the cutting boundary within the annotation box in the horizontal and vertical directions, respectively; S22. Compare i and j with a / 3 and b / 3, respectively, where a is the length of the annotation box and b is the width of the annotation box; S23. If i is greater than a / 3 and j is greater than b / 3, the cropped annotation box is retained; otherwise, it is discarded, and the annotation information of the cropped image is updated accordingly. Next, the images are enhanced, and the enhanced data is divided into training and test sets. S3. Calculate the length and width distribution of the tassel annotation boxes in all images; S4. Construct an improved SSD network model based on the length and width distribution obtained in S3. Specifically, this includes comparing the prior box size set on each feature map with the length and width distribution of the tassel annotation boxes obtained in S3. The prior box size is calculated as $s_k = s_{\min} + \frac{s_{\max} - s_{\min}}{m - 1}(k - 1)$, $k \in [1, m]$, where $m$ is the number of feature maps at different resolutions, $s_{\min}$ is 0.2, and $s_{\max}$ is 0.9. Five aspect ratios $a_r$ are set for the prior boxes: 1, 2, 3, 1 / 2, and 1 / 3; the prior box width $w_k^a$ and height $h_k^a$ are calculated as $w_k^a = s_k \sqrt{a_r}$ and $h_k^a = s_k / \sqrt{a_r}$. In addition, for an aspect ratio of 1, an extra prior box is added with side length $s'_k = \sqrt{s_k s_{k+1}}$. Conv8, Conv9, Conv10, and Conv11 are deleted from the SSD network model, and two feature layers, Conv4-3 and Conv7, are selected to form a feature pyramid multi-scale detection structure. The training set obtained in S2 is input into the network, and the SGD method is used to train the model. During training, hyperparameters and learning rate are set to ensure the total loss function converges to the optimal value, ultimately obtaining a trained field maize tassel detection model. S5. Use the trained improved SSD model to implement the task of detecting maize tassels in the field.

2. The method for detecting maize tassels in the field based on the improved SSD algorithm according to claim 1, characterized in that, in step S1, corn images are acquired under different brightness conditions. The brightness conditions are defined by the proportion of pixels with a grayscale value less than 40 in the entire image: less than 12% is considered too bright; between 12% and 60% is considered normal brightness; and above 60% is considered too dark.

3. The method for detecting maize tassels in the field based on the improved SSD algorithm according to claim 1, characterized in that, In step S2, the cropped images are all the same size.

4. The method for detecting maize tassels in the field based on the improved SSD algorithm according to claim 1, characterized in that, In step S2, horizontal flipping, random rotation, and Gridmask methods are used to enrich the data volume and enhance the robustness of the model.

5. The method for detecting maize tassels in the field based on the improved SSD algorithm according to claim 1, characterized in that, The data partitioning ratio in step S2 is: training set : test set = 7:3.

6. The method for detecting maize tassels in the field based on the improved SSD algorithm according to claim 1, characterized in that, In step S4, feature extraction is performed on the input image with a size of 512×512.

7. The method for detecting maize tassels in the field based on the improved SSD algorithm according to claim 1, characterized in that the total loss function in step S4 is calculated as $L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$, where $N$ is the number of prior boxes matched to ground-truth boxes, specifically the number of boxes with an intersection-over-union (IoU) higher than 0.5; when $N = 0$, the loss is set to 0; the weighting coefficient $\alpha$ is set to 1; $L_{conf}$ is the confidence loss function, and $c$ represents the probability that the prior box matches the corresponding label; $L_{loc}$ is the location loss function, $l$ is the coordinate information predicted for the prior box, and $g$ is the coordinates of the ground-truth box; the loss function $L$ is the weighted sum of $L_{conf}$ and $L_{loc}$. The confidence loss is calculated as $L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_i^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_i^{0}\right)$ with $\hat{c}_i^{p} = \exp\left(c_i^{p}\right) / \sum_{p} \exp\left(c_i^{p}\right)$, where $x_{ij}^{p}$ is an indicator of whether the $i$-th prior box matches the $j$-th ground-truth box of category $p$, taking the value 0 or 1; $\hat{c}_i^{p}$ is the classification score of the $i$-th prior box for category $p$, and $\hat{c}_i^{0}$ is the corresponding background confidence of the prior box; the positive and negative sample sets are denoted $Pos$ and $Neg$, respectively. For the location loss function $L_{loc}$, the Smooth L1 loss is used: $L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{p}\, \mathrm{smooth}_{L1}\left(l_i^{m} - \hat{g}_j^{m}\right)$, with $\hat{g}_j^{cx} = (g_j^{cx} - d_i^{cx}) / d_i^{w}$, $\hat{g}_j^{cy} = (g_j^{cy} - d_i^{cy}) / d_i^{h}$, $\hat{g}_j^{w} = \log(g_j^{w} / d_i^{w})$, $\hat{g}_j^{h} = \log(g_j^{h} / d_i^{h})$, and $\mathrm{smooth}_{L1}(z) = 0.5 z^2$ if $|z| < 1$ and $|z| - 0.5$ otherwise. The location loss function only calculates the error for positive samples, where $x_{ij}^{p}$ takes the value 0 or 1; because the predicted value $l$ is encoded, $\hat{g}$ is obtained by encoding the ground-truth box $g$ in the same way; $g_j$ denotes the size and position of the $j$-th ground-truth box, $d_i$ denotes the size and position of the $i$-th prior box, and $l_i$ represents the offset of the predicted box relative to the prior box.