A Method for Rice Panicle Extraction and Yield Estimation Based on Semantic Segmentation and UAV Color and Multispectral Images

By acquiring color and multispectral images of rice using drones, and combining semantic segmentation technology and deep learning models, the problem of low efficiency in rice panicle detection has been solved. This enables intelligent extraction of rice panicles and yield estimation, thereby improving agricultural production efficiency and precision management.

CN118447414B · Active · Publication Date: 2026-06-30 · SOUTH CHINA AGRICULTURAL UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SOUTH CHINA AGRICULTURAL UNIVERSITY
Filing Date
2024-04-25
Publication Date
2026-06-30

Abstract

A method for rice panicle extraction and yield estimation based on semantic segmentation and UAV color and multispectral images involves acquiring images of rice seedlings using a UAV equipped with a camera; processing the color and multispectral images to calculate vegetation index, color features, and texture features, and optimizing pseudo-color images through different feature combinations; image preprocessing and manual annotation to construct a pseudo-color image dataset; extracting rice panicles using a semantic segmentation algorithm based on the pseudo-color images, and comparing the detection accuracy using different backbone networks; evaluating the accuracy of the semantic segmentation model on the color and pseudo-color image datasets, and selecting a suitable image dataset for panicle extraction; and fitting the proportion of pixels of the segmented panicles to the total number of pixels in the image with the actual yield to construct different yield estimation models and verify model performance. This invention achieves rice panicle extraction and yield estimation through image processing, belonging to the field of rice panicle extraction technology in rice images.

Description

Technical Field

[0001] This invention relates to the extraction of rice panicles from rice images, and specifically to a method for extracting rice panicles and estimating yield based on semantic segmentation and UAV color and multispectral images, as well as to computer equipment and storage media.

Background Technology

[0002] Rice is one of the world's most important food crops. As the world's population continues to grow, food production has become increasingly crucial, making the cultivation of high-yield rice an urgent need for alleviating food shortages. The rice panicle, the most important reproductive organ and a key phenotypic trait of rice, is closely related to crop yield and also plays a vital role in disease detection, crop organ detection, and growth stage identification. However, rice panicle detection currently relies mainly on manual methods, which suffer from low efficiency and high subjectivity. With the rapid development of image processing technology and the emergence of deep learning methods, numerous studies have applied these techniques to crop organ detection, identification, and diagnosis, aiming to achieve intelligent decision-making and management.

[0003] In recent years, unmanned aerial vehicles (UAVs) have played an increasingly important role in agricultural monitoring owing to their maneuverability and high spatiotemporal resolution. UAV platforms equipped with digital cameras and multispectral or hyperspectral sensors provide data with high spatial and temporal resolution and capture spectral, physiological, and structural information about vegetation, which better reflects its ecological status and changes. In precision agriculture, UAV multispectral imagery is used to monitor crop health and soil conditions and to detect anomalies, and it has been widely applied to crop classification, disease detection, yield estimation, and growth monitoring. Numerous studies have shown that empirical statistical models built on vegetation indices are an effective way to estimate and predict crop yields; because they are easy to calculate and simple in design, they are widely used in crop remote sensing prediction. With the development of deep learning, methods for semantic segmentation of images have been proposed, and research results show that they segment significantly better than methods based on manual feature extraction. In summary, multispectral and deep learning technologies have broad application prospects in agriculture: they can effectively improve the efficiency and yield of crop production and are expected to become important tools for precision agricultural management. However, the use of semantic segmentation for rice panicle extraction and yield estimation from multispectral images remains largely unexplored.

[0004] This invention uses color and multispectral images of rice acquired by drones, applies semantic segmentation to extract rice panicles from pseudo-color images of rice, and, using the segmentation results, designs a rice yield estimation method based on panicle density, thereby achieving accurate field yield estimation.

Summary of the Invention

[0005] To address the technical problems existing in the prior art, the purpose of this invention is to provide a method for extracting rice panicles and estimating yield based on semantic segmentation and UAV color and multispectral images.

[0006] A second objective of this invention is to provide a computer device.

[0007] A third objective of this invention is to provide a storage medium.

[0008] To achieve the above objectives, the present invention adopts the following technical solution:

[0009] A method for extracting rice panicles and estimating yield based on semantic segmentation and UAV color and multispectral images includes the following steps:

[0010] S1. Rice is planted in small plots. During the rice growth cycle, color and multispectral images of the rice panicles in each plot are collected by drones equipped with visible light and multispectral cameras. After the rice matures, the yield of each plot is measured manually. The images and yield data are used for subsequent rice yield modeling and prediction.

[0011] S2 processes color and multispectral images, calculates vegetation index, color features, and texture features, and selects the best pseudo-color image through different feature combinations;

[0012] S3 utilizes image preprocessing techniques and manual annotation to construct a pseudo-color image dataset;

[0013] S4. Based on the selected pseudo-color image, a semantic segmentation algorithm is used to extract rice ears, and the detection accuracy of different backbone networks is compared.

[0014] S5. The evaluation metrics are selected to measure the accuracy of the semantic segmentation model in extracting rice ears on color and pseudo-color image datasets. A suitable pseudo-color image dataset and the trained semantic segmentation model are selected for extracting rice ears.

[0015] S6. For the pseudo-color image of each plot, the proportion of pixels of the segmented rice ears to the total number of pixels in the image is used to fit with the actual yield of the plot measured manually, to construct different yield estimation models, and to verify the performance of the models.

[0016] As a preferred embodiment, step S1 includes:

[0017] S11, the rice seedling raising mode adopts different rice varieties, different seedling densities, and different sowing dates, and the rice seedlings are planted in a small plot according to the seedling raising mode;

[0018] S12: After the rice matures, the yield of each plot is sampled manually and the yield data is recorded.

[0019] S13 uses a drone equipped with a multispectral camera and a color camera to hover over the rice paddies and take aerial photos of the rice seedlings in the area from a vertical, overhead angle.

[0020] As a preferred option, in step S2, to generate a suitable pseudo-color image for extracting rice ears, it is necessary to process the color image and the multispectral image. The processing flow consists of the following three steps:

[0021] S21, Based on the collected color and multispectral image data, obtain images with different vegetation index features, color features, and texture features;

[0022] S22, based on different vegetation index features, color features, and texture features, different three-channel images are selected and recombined to generate different pseudo-color images;

[0023] S23. Based on the evaluation indicators and the manually labeled foreground and background image pixels of rice ears, analyze the rice ear pixels and background pixels of different pseudo-color images, and select the pseudo-color images with obvious rice ear features.

[0024] As a preferred option, the process of selecting a pseudo-color image includes:

[0025] (1) For images with different vegetation indices, color features, and texture features, select different three-channel images and recombine them to generate different pseudo-color images. Use the threshold segmentation method to select an appropriate threshold based on the color of the rice ears and extract the rice ears.

[0026] (2) Compare with manually labeled rice ear images, and use root mean square error and structural similarity index to quantify the similarity between automatic segmentation method and manual labeling, and select pseudo-color combinations with obvious rice ear features.

[0027] As a preferred option, in step S3, in order to construct the training set, the images need to be preprocessed. The preprocessing process consists of the following two steps:

[0028] S31, manually labeled rice ear images to construct a dataset of pseudo-color images;

[0029] S32 divides the pseudo-color images with obvious rice ear characteristics into training, validation and test sets in a ratio of 8:1:1.

[0030] As a preferred embodiment, step S4 includes:

[0031] S41 uses two different networks, ResNet and Xception, as the backbone networks of Deeplabv3+.

[0032] S42, input the images of the rice pseudo-color training set into different models for training, and modify their training parameters;

[0033] S43: Input the validation set and test set divided in S3 into the trained model to evaluate the performance of the two models;

[0034] S44. Input the rice pseudo-color image into the Deeplabv3+ model to extract the rice panicles.

[0035] As a preferred option, in step S5, evaluation metrics are selected to test the chosen network model, using three metrics: average intersection-over-union ratio, pixel accuracy, and average pixel accuracy.

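[0036] $$\mathrm{MIoU} = \frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k}p_{ij} + \sum_{j=0}^{k}p_{ji} - p_{ii}}$$

[0037] $$\mathrm{PA} = \frac{\sum_{i=0}^{k}p_{ii}}{\sum_{i=0}^{k}\sum_{j=0}^{k}p_{ij}}$$

[0038] $$\mathrm{MPA} = \frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k}p_{ij}}$$

[0039] In the formulas, $i$ denotes the actual class, $j$ denotes the predicted class, and $k$ denotes the category count, with classes indexed from 0 to $k$. $p_{ij}$ is the number of pixels of class $i$ predicted as class $j$, $p_{ii}$ is the number of pixels of class $i$ predicted as class $i$, and $p_{ji}$ is the number of pixels of class $j$ predicted as class $i$.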
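The three metrics can be computed from a pixel-level confusion matrix. The following is a minimal sketch, assuming NumPy and two small made-up label maps; the function names are illustrative and are not part of the invention.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """p[i, j] = number of pixels whose actual class is i and predicted class is j."""
    idx = num_classes * y_true.ravel().astype(int) + y_pred.ravel().astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def segmentation_metrics(p):
    """Return PA, MPA and MIoU; assumes every class occurs at least once."""
    p = p.astype(float)
    diag = np.diag(p)          # p_ii
    row = p.sum(axis=1)        # sum over j of p_ij (actual pixels per class)
    col = p.sum(axis=0)        # sum over j of p_ji (predicted pixels per class)
    return {
        "PA": diag.sum() / p.sum(),
        "MPA": float(np.mean(diag / row)),
        "MIoU": float(np.mean(diag / (row + col - diag))),
    }

# Made-up 2x4 label maps: 0 = background, 1 = panicle.
y_true = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 1]])
y_pred = np.array([[0, 1, 1, 1],
                   [0, 0, 1, 1]])
metrics = segmentation_metrics(confusion_matrix(y_true, y_pred, num_classes=2))
print(metrics)
```

In practice the confusion matrix would be accumulated over the whole test set before the metrics are computed, rather than averaged image by image.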

[0040] As a preferred option, the specific operation for yield estimation in step S6 is as follows:

[0041] S61, calculate the ratio of the pixels belonging to the rice ear category to the total number of pixels in the segmentation result obtained in S4;

[0042] S62. Select machine learning algorithms, including least squares fitting, random forest, and multilayer perceptron, and fit the panicle pixel proportion of each plot, extracted by semantic segmentation, against the actual yield data to obtain different yield estimation models.

[0043] S63 uses cross-validation to verify model performance.
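Steps S61 to S63 can be sketched as follows. This is a minimal illustration, assuming NumPy, a least-squares line as the yield model, and leave-one-out cross-validation; the ratio and yield values are synthetic numbers invented for the example, not data from the invention.

```python
import numpy as np

def panicle_ratio(mask, panicle_label=1):
    """S61: fraction of pixels in a segmentation mask that belong to the panicle class."""
    return float(np.mean(mask == panicle_label))

def fit_linear(x, y):
    """S62 (least-squares variant): fit yield = a * ratio + b."""
    a, b = np.polyfit(x, y, deg=1)
    return a, b

def loo_rmse(x, y):
    """S63: leave-one-out cross-validated RMSE of the linear model."""
    errors = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        a, b = fit_linear(x[keep], y[keep])
        errors.append(y[i] - (a * x[i] + b))
    return float(np.sqrt(np.mean(np.square(errors))))

# Synthetic plot-level values for illustration only.
ratios = np.array([0.10, 0.15, 0.20, 0.25, 0.30])
yields = np.array([4.1, 5.0, 6.1, 6.9, 8.0])

slope, intercept = fit_linear(ratios, yields)
cv_rmse = loo_rmse(ratios, yields)
example_mask = np.array([[0, 1], [1, 1]])
print(panicle_ratio(example_mask), slope, intercept, cv_rmse)
```

The random forest and multilayer perceptron models named in S62 would replace `fit_linear` while keeping the same cross-validation loop.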

[0044] A computer device includes a processor and a memory for storing a processor-executable program, wherein when the processor executes the program stored in the memory, it implements a method for extracting rice panicles and estimating yield based on semantic segmentation and UAV color and multispectral images.

[0045] A storage medium storing a program that, when executed by a processor, implements a method for extracting rice panicles and estimating yield based on semantic segmentation and UAV color and multispectral images.

[0046] The present invention has the following advantages:

[0047] This invention uses a drone platform equipped with color and multispectral cameras to acquire color and multispectral images of rice. Appropriate vegetation indices, color features, and texture features are combined to construct pseudo-color images. A pseudo-color image dataset is built through manual annotation, and image augmentation techniques are employed to expand the dataset. A deep learning model is trained on the pseudo-color image dataset to construct a rice panicle extraction model, and a yield estimation model is then built on top of it to estimate rice yield. Research on direct image processing of multispectral images is currently lacking. This invention achieves rice panicle extraction and yield estimation, forming a corresponding intelligent detection system. Applied to rice production and management, it is expected to help improve yields, and it is significant for agricultural meteorological research and services.

Attached Figure Description

[0048] Figure 1 is a flowchart of the method of the present invention.

Detailed Implementation

[0049] The present invention will now be described in further detail with reference to specific embodiments.

[0050] A method for extracting rice panicles and estimating yield based on semantic segmentation and UAV color and multispectral images includes steps S1-S6. A network model for intelligent extraction of rice panicles is constructed by training a semantic segmentation model on a pseudo-color image dataset built from multispectral images. Based on the panicle segmentation, the model is fitted with yield data to construct a rice yield estimation model. The crop used in this embodiment is rice.

[0051] The following will provide a detailed introduction to S1-S6.

[0052] 1. S1. Rice is planted in small plots. During the rice growth cycle, color and multispectral images of the rice panicles in each plot are collected by drones equipped with visible light and multispectral cameras. After the rice matures, the yield of each plot is measured manually. The images and yield data are used for subsequent rice yield modeling and prediction.

[0053] Step S1 includes:

[0054] S11, the rice seedling raising mode adopts different rice varieties, different seedling densities, and different sowing dates, and the rice seedlings are planted in a small plot according to the seedling raising mode;

[0055] S12: After the rice matures, the yield of each plot is sampled manually and the yield data is recorded.

[0056] S13 uses a drone equipped with a multispectral camera and a color camera to hover over the rice paddies and take aerial photos of the rice seedlings in the area from a vertical, overhead angle.

[0057] Preferably, the drone is equipped with a color camera (visible light camera) and five multispectral cameras to take aerial photos of the rice seedlings from a vertical, top-down angle, and to maintain the same altitude as much as possible when acquiring images. After the rice matures, the measured rice yield at preset sample points in the target plot is obtained, and the location information of each preset sample point is recorded.

[0058] This invention uses a double-cropping rice plot planting model in field trials. Each planting plot measures 10.8 m × 3.5 m. Different planting patterns were used in different plots to obtain rice datasets with significant differences in growth. Each experiment used three rice varieties, five nitrogen fertilizer application levels, and two planting densities, with each planting pattern repeated three times, for a total of 90 planting plots. Images were captured using a DJI Phantom 4 drone equipped with six 1/2.9-inch CMOS image sensors, comprising one color sensor for visible light imaging and five monochrome sensors for multispectral imaging, with an effective pixel count of 2.08 million (total pixel count of 2.12 million). The spectral bands of the five multispectral cameras were: Blue (B): 450 nm ± 16 nm; Green (G): 560 nm ± 16 nm; Red (R): 650 nm ± 16 nm; Red Edge (RE): 730 nm ± 16 nm; Near Infrared (NIR): 840 nm ± 26 nm. The maximum image resolution was 1600×1300 (4:3.25), and the supported image formats were JPEG (visible light imaging) and TIFF (multispectral imaging). Drone imagery is acquired during the key growth period of rice, with the impact of rainfall minimized. Photos are taken between 10:00 and 14:00, as close to vertically above the rice field as possible and at a constant shooting height. The drone's built-in GPS positioning system provides the geographic information of the image sequence.

[0059] 2. S2: Process color and multispectral images, calculate vegetation index, color features, and texture features, and optimize pseudo-color images through different feature combinations.

[0060] Data were acquired using a multispectral UAV. Drawing on previous research and on analysis of the experimental data, the spectral bands sensitive to rice were selected. Visible-light and multispectral vegetation indices were chosen, and the local binary pattern and gray-level co-occurrence matrix methods were used to extract image texture features. Color features were extracted in HSV space. These features were combined into three-channel pseudo-color images, and thresholding was then applied to segment the rice panicles based on their color. The similarity between the automatically segmented panicles and the manually labeled panicles was quantified using the root mean square error, and the combinations with prominent panicle features were selected for training the semantic segmentation model.

[0061] To generate a suitable pseudo-color image for extracting rice ears, it is necessary to process the color and multispectral images. The processing workflow consists of the following three steps:

[0062] S21. The multispectral camera of the DJI Phantom 4 drone has the following bands: blue ($\rho_{\mathrm{BLUE}}$): 450 nm ± 16 nm; green ($\rho_{\mathrm{GREEN}}$): 560 nm ± 16 nm; red ($\rho_{\mathrm{RED}}$): 650 nm ± 16 nm; red edge ($\rho_{\mathrm{RE}}$): 730 nm ± 16 nm; near-infrared ($\rho_{\mathrm{NIR}}$): 840 nm ± 26 nm. RGB images are acquired by the color camera. Based on the acquired color and multispectral image data, images with different vegetation index features, color features, and texture features are obtained.

[0063] S22. Based on different vegetation index features, color features, and texture features, different three-channel images are selected and recombined to generate different pseudo-color images.

[0064] S23. Based on the evaluation indicators and the manually labeled foreground and background image pixels of rice ears, analyze the rice ear pixels and background pixels of different pseudo-color images, and select the pseudo-color images with obvious rice ear features.

[0065] 1) The specific formulas for commonly used vegetation indices are as follows:

[0066] Vegetation indices can generally be divided into visible light band indices and multispectral band indices. This invention uses the calculated matrix to generate images and then uses them for subsequent pseudo-color image generation.

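[0067] (1) Visible light band indices

[0068] Excess Red Index:

[0069] $\mathrm{ExR} = 1.4 \times \rho_{\mathrm{RED}} - \rho_{\mathrm{GREEN}}$

[0070] Excess Blue Index:

[0071] $\mathrm{ExB} = 1.4 \times \rho_{\mathrm{BLUE}} - \rho_{\mathrm{GREEN}}$

[0072] Excess Green Index:

[0073] $\mathrm{ExG} = 2 \times \rho_{\mathrm{GREEN}} - \rho_{\mathrm{RED}} - \rho_{\mathrm{BLUE}}$

[0074] Excess Green minus Excess Red Index:

[0075] $\mathrm{ExGR} = (2 \times \rho_{\mathrm{GREEN}} - \rho_{\mathrm{RED}} - \rho_{\mathrm{BLUE}}) - (1.4 \times \rho_{\mathrm{RED}} - \rho_{\mathrm{GREEN}})$

[0076] Visible-band Difference Vegetation Index:

[0077] $\mathrm{VDVI} = \dfrac{2 \times \rho_{\mathrm{GREEN}} - \rho_{\mathrm{RED}} - \rho_{\mathrm{BLUE}}}{2 \times \rho_{\mathrm{GREEN}} + \rho_{\mathrm{RED}} + \rho_{\mathrm{BLUE}}}$

[0078] Color Index of Vegetation Extraction:

[0079] $\mathrm{CIVE} = 0.441 \times \rho_{\mathrm{RED}} - 0.811 \times \rho_{\mathrm{GREEN}} + 0.385 \times \rho_{\mathrm{BLUE}} + 18.78745$

[0080] Normalized Difference Index:

[0081] $\mathrm{NDI} = \dfrac{\rho_{\mathrm{GREEN}} - \rho_{\mathrm{RED}}}{\rho_{\mathrm{GREEN}} + \rho_{\mathrm{RED}}}$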

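[0082] (2) Multispectral band indices

[0083] Normalized Difference Vegetation Index:

[0084] $\mathrm{NDVI} = \dfrac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{RED}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{RED}}}$

[0085] Green Normalized Difference Vegetation Index:

[0086] $\mathrm{GNDVI} = \dfrac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{GREEN}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{GREEN}}}$

[0087] Enhanced Vegetation Index:

[0088] $\mathrm{EVI} = 2.5 \times \dfrac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{RED}}}{\rho_{\mathrm{NIR}} + 6 \times \rho_{\mathrm{RED}} - 7.5 \times \rho_{\mathrm{BLUE}} + 1}$

[0089] Normalized Difference Red Edge Index:

[0090] $\mathrm{NDRE} = \dfrac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{RE}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{RE}}}$

[0091] Leaf Chlorophyll Index:

[0092] $\mathrm{LCI} = \dfrac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{RE}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{RED}}}$

[0093] Optimized Soil-Adjusted Vegetation Index:

[0094] $\mathrm{OSAVI} = \dfrac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{RED}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{RED}} + 0.16}$

[0095] Here, $\rho_{\mathrm{RED}}$ represents the red band, with a center wavelength of 680 nm; $\rho_{\mathrm{GREEN}}$ represents the green band, with a center wavelength of 540 nm; $\rho_{\mathrm{BLUE}}$ represents the blue band, with a center wavelength of 465 nm; $\rho_{\mathrm{NIR}}$ represents the near-infrared band, with a center wavelength of 800 nm; and $\rho_{\mathrm{RE}}$ represents the red-edge band, with a center wavelength of 725 nm.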
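As an illustration of how such index images can be generated, the sketch below computes several of the indices whose formulas are written out above from per-band reflectance arrays. It assumes NumPy and reflectance values scaled to [0, 1]; the helper `safe_divide` and the sample reflectance values are hypothetical additions for the example.

```python
import numpy as np

def safe_divide(num, den, eps=1e-9):
    """Element-wise division that replaces near-zero denominators with eps."""
    den = np.where(np.abs(den) < eps, eps, den)
    return num / den

def vegetation_indices(blue, green, red, red_edge, nir):
    """Return a dict of index images, one 2-D array per index."""
    exg = 2 * green - red - blue
    exr = 1.4 * red - green
    return {
        "ExR": exr,
        "ExB": 1.4 * blue - green,
        "ExG": exg,
        "ExGR": exg - exr,
        "CIVE": 0.441 * red - 0.811 * green + 0.385 * blue + 18.78745,
        "NDVI": safe_divide(nir - red, nir + red),
        "GNDVI": safe_divide(nir - green, nir + green),
        "EVI": 2.5 * safe_divide(nir - red, nir + 6 * red - 7.5 * blue + 1),
        "NDRE": safe_divide(nir - red_edge, nir + red_edge),
        "LCI": safe_divide(nir - red_edge, nir + red),
        "OSAVI": safe_divide(nir - red, nir + red + 0.16),
    }

# Made-up 2x2 reflectance patches, constant per band for easy checking.
shape = (2, 2)
indices = vegetation_indices(
    blue=np.full(shape, 0.05),
    green=np.full(shape, 0.10),
    red=np.full(shape, 0.08),
    red_edge=np.full(shape, 0.30),
    nir=np.full(shape, 0.50),
)
print({name: float(img[0, 0]) for name, img in indices.items()})
```

Each resulting array can then be rescaled and used as one channel of a pseudo-color image.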

[0096] 2) Methods for extracting the texture features of rice panicles include:

[0097] (1) Gray-level co-occurrence matrix

[0098] The gray-level co-occurrence matrix (GLCM) is a method for statistically analyzing the gray-level information contained in an image by counting how often each combination of gray levels occurs. Its advantage is that it reflects comprehensive information about the direction, adjacent spacing, and variation amplitude of the image gray levels. Image texture is generated by gray-level combinations that recur in a certain pattern across the image, so there is a spatial correlation between the gray levels of pixels a certain distance apart. The GLCM is a matrix that describes this gray-level relationship between neighboring pixels in an image.

[0099] The GLCM essentially gives the probability that, starting from a pixel with gray level $i$, the pixel displaced from it by $(\Delta x, \Delta y)$ has gray level $j$. The values of $\Delta x$ and $\Delta y$ are determined by two parameters: the pixel spacing $d$ and the angle $\theta$. Here $i$ and $j$ denote the gray levels of the two pixels, $d$ is the distance between them, and $\theta$ is the direction along which the GLCM is generated.

[0100] The gray-level co-occurrence matrix can be used to statistically analyze the texture information of rice multispectral images. Texture is usually described by statistics derived from the matrix, including energy, contrast, entropy, uniformity, and correlation.

[0101] Energy: reflects the uniformity of the gray-level distribution within an image region.

[0102] $$\mathrm{Energy} = \sum_{i}\sum_{j} p(i,j)^{2}$$

[0103] Here, $p(i,j)$ is the frequency of pixel pairs with gray levels $i$ and $j$ under a specific positional relationship. Coarse textures have larger energy values, while fine textures have smaller energy values. The energy reaches its minimum when all $p(i,j)$ in the co-occurrence matrix are equal.

[0104] Contrast: reflects the sharpness of the image and the depth of the texture grooves.

[0105] $$\mathrm{Contrast} = \sum_{i}\sum_{j} (i-j)^{2}\, p(i,j)$$

[0106] In the formula, $(i-j)$ is the gray-level difference of the pixel pair. The greater the contrast, the deeper the texture grooves and the sharper the image; conversely, the smaller the contrast, the shallower the grooves and the blurrier the image.

[0107] Entropy: represents the randomness of the image content and reflects the information content and complexity of the image.

[0108] $$\mathrm{Entropy} = -\sum_{i}\sum_{j} p(i,j)\,\log p(i,j)$$

[0109] If the texture is complex, the entropy is large; conversely, if the gray levels in the image are uniform and the texture is relatively simple, the entropy is small.

[0110] Uniformity: a measure of the smoothness of the image gray-level distribution.

[0111] $$\mathrm{Uniformity} = \sum_{i}\sum_{j} \frac{p(i,j)}{1+(i-j)^{2}}$$

[0112] For regions where elements are uniformly distributed, the elements of the gray-level co-occurrence matrix are concentrated on the diagonal. Here $(i-j)^{2}$ is the squared gray-level difference of the pixel pair. A smaller uniformity value indicates a finer texture; conversely, a larger uniformity value indicates a coarser texture.

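[0113] Correlation: reflects the local gray-level correlation of the texture.

[0114] $$\mathrm{Correlation} = \frac{\sum_{i}\sum_{j} (i-\mu_{i})(j-\mu_{j})\, p(i,j)}{\sigma_{i}\,\sigma_{j}}$$

[0115] Here $\mu_{i}$, $\mu_{j}$ and $\sigma_{i}$, $\sigma_{j}$ are the means and standard deviations of the row and column marginal distributions of $p(i,j)$. The correlation is larger when the matrix elements are uniformly equal. If the image contains horizontal texture, the correlation of the horizontal matrix is greater than that of the matrices in other directions.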
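The GLCM and the five texture statistics can be sketched in a few lines. This is a minimal NumPy-only illustration, assuming an integer image already quantized to `levels` gray levels, a non-negative pixel offset, and the commonly used definitions of energy, contrast, entropy, uniformity (homogeneity), and correlation; the function names and the 4×4 test image are assumptions for the example.

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=4):
    """Normalized co-occurrence matrix for the offset (dx, dy), with dx, dy >= 0.

    image must hold integers in the range [0, levels).
    """
    h, w = image.shape
    src = image[0:h - dy, 0:w - dx]      # reference pixels
    dst = image[dy:h, dx:w]              # pixels displaced by (dx, dy)
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (src.ravel(), dst.ravel()), 1.0)
    return counts / counts.sum()

def glcm_features(p):
    """Texture statistics of a normalized GLCM p (non-constant image assumed)."""
    levels = p.shape[0]
    i, j = np.meshgrid(np.arange(levels), np.arange(levels), indexing="ij")
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    nonzero = p[p > 0]                   # skip zeros so that log() is defined
    return {
        "energy": float((p ** 2).sum()),
        "contrast": float((((i - j) ** 2) * p).sum()),
        "entropy": float(-(nonzero * np.log(nonzero)).sum()),
        "uniformity": float((p / (1.0 + (i - j) ** 2)).sum()),
        "correlation": float((((i - mu_i) * (j - mu_j)) * p).sum() / (sd_i * sd_j)),
    }

# Small made-up image with four gray levels.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])
p_horizontal = glcm(image, dx=1, dy=0, levels=4)
features = glcm_features(p_horizontal)
print(features)
```

Computing the same statistics in a sliding window, rather than once per image, yields per-pixel texture maps that can serve as pseudo-color channels.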

[0116] (2) Local binary pattern

[0117] The local binary pattern (LBP) is a method that reflects the local structure of a texture by comparing the gray value of a pixel with the gray values of its surrounding pixels. It derives patterns from the gray-level relationships between pixels and then uses the frequency of these patterns to represent the image's features. It combines structural and statistical methods and offers significant advantages such as gray-level invariance and rotation invariance.

[0118] The LBP operator is defined on a 3×3 window with point $c$ as the center. The gray value of the center pixel is $g_c$, and the gray values of the eight neighboring pixels are $g_0, g_1, g_2, g_3, g_4, g_5, g_6$, and $g_7$, respectively.

[0119] Texture $T$ can be approximated as:

[0120] $T \approx t(g_c,\ g_0 - g_c,\ g_1 - g_c,\ g_2 - g_c,\ g_3 - g_c,\ g_4 - g_c,\ g_5 - g_c,\ g_6 - g_c,\ g_7 - g_c)$

[0121] Here $t(g_c)$ represents the gray-level intensity of the center pixel in the local region, and the gray-level differences $g_p - g_c$ ($p = 0, 1, 2, \ldots, 7$) describe how the gray value changes between the center pixel and its neighbors. To make the descriptor insensitive to changes in illumination intensity, the gray-level distribution of the center point is discarded. The remaining joint distribution has a very large range, so the differences need to be quantized. The simplest method is to quantize each gray-level difference into two values, that is, to represent it in binary form:

[0122] $T \approx t(s(g_0 - g_c),\ s(g_1 - g_c),\ s(g_2 - g_c),\ s(g_3 - g_c),\ s(g_4 - g_c),\ s(g_5 - g_c),\ s(g_6 - g_c),\ s(g_7 - g_c))$

[0123] Here $s(x)$ is a binary function, defined by the following equation:

[0124] $$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
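A basic 3×3 LBP operator can be sketched as follows. The sketch assumes NumPy, the usual convention that s(x) is 1 for x ≥ 0 and 0 otherwise, and the common weighting of neighbor p by 2^p to turn the eight binary values into one code; the clockwise neighbor ordering starting from the top-left corner is an arbitrary choice made for this example.

```python
import numpy as np

# Neighbor offsets (dy, dx) for g0..g7, clockwise from the top-left corner (assumed order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_3x3(gray):
    """Return the LBP code of every interior pixel of a 2-D gray image."""
    g = np.asarray(gray).astype(int)
    h, w = g.shape
    center = g[1:h - 1, 1:w - 1]                      # g_c for each interior pixel
    codes = np.zeros_like(center)
    for p, (dy, dx) in enumerate(OFFSETS):
        neighbor = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]   # g_p aligned with g_c
        codes += (neighbor >= center).astype(int) << p       # s(g_p - g_c) * 2**p
    return codes

flat = np.full((3, 3), 5)
peak = np.array([[1, 1, 1], [1, 9, 1], [1, 1, 1]])
corner = np.array([[9, 1, 1], [1, 5, 1], [1, 1, 1]])
print(lbp_3x3(flat), lbp_3x3(peak), lbp_3x3(corner))
```

A histogram of the resulting codes over an image region is the usual LBP texture descriptor.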

[0125] 3) Methods for extracting the color characteristics of rice panicles include:

[0126] The HSV color model is supported by numerous image analysis algorithms, is closer to human visual perception, and is well suited to image analysis and processing. HSV is a relatively intuitive color space that separates an image into hue (H), saturation (S), and value, or lightness (V), components. Processing the lightness and saturation components does not affect the hue component, which avoids the color distortion problems found in the RGB color space. The HSV color model is therefore used to extract the color feature vector of rice panicles. Since the lower-order color moments carry most of the color information, six parameters are extracted as the color feature vector: the first-order moment $\mu$ (mean) and the second-order moment $\delta$ (standard deviation) of each of the H, S, and V channels of the rice image in HSV space.

[0127] The mean value reflects the overall color characteristics of the H, S, and V channels of the rice panicle image, and its formula is as follows:

[0128] $$\mu = \frac{1}{M N}\sum_{x=1}^{M}\sum_{y=1}^{N} P(x,y)$$

[0129]

[0130] The standard deviation reflects the dispersion of color value distribution, and its formula is as follows:

[0131] $$\delta = \sqrt{\frac{1}{M N}\sum_{x=1}^{M}\sum_{y=1}^{N} \bigl(P(x,y) - \mu\bigr)^{2}}$$

[0132]

[0133] M and N represent the pixel length and width of a single rice image, respectively, and P(x,y) represents the pixel value of the H, S, and V channel images at point (x,y).
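The six color-moment parameters can be computed as in the sketch below. It assumes NumPy, RGB values scaled to [0, 1], and the standard-library `colorsys` module for the RGB-to-HSV conversion (applied pixel by pixel, which is simple but slow for full-size images); the function names are illustrative.

```python
import colorsys
import numpy as np

def rgb_to_hsv_image(rgb):
    """Convert an (M, N, 3) RGB array with values in [0, 1] to HSV."""
    flat = rgb.reshape(-1, 3)
    hsv = np.array([colorsys.rgb_to_hsv(r, g, b) for r, g, b in flat])
    return hsv.reshape(rgb.shape)

def color_moments(hsv):
    """Return [mean_H, mean_S, mean_V, std_H, std_S, std_V].

    The standard deviation uses the 1 / (M * N) normalization.
    """
    means = hsv.mean(axis=(0, 1))
    stds = hsv.std(axis=(0, 1))
    return np.concatenate([means, stds])

# A uniformly red 2x3 test image: H = 0, S = 1, V = 1 everywhere.
rgb = np.zeros((2, 3, 3))
rgb[..., 0] = 1.0
moments = color_moments(rgb_to_hsv_image(rgb))
print(moments)
```

Because hue is an angle, its plain mean can be misleading for colors near the red wrap-around; the sketch ignores this and follows the simple per-channel definition.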

[0134] 4) The process of selecting the best pseudo-color image includes:

[0135] (1) For images with different vegetation indices, color features, and texture features, select different three-channel images and recombine them to generate different pseudo-color images. Use the threshold segmentation method to select an appropriate threshold based on the color of the rice ears and extract the rice ears.

[0136] (2) Compare with manually labeled rice ear images, and use indicators such as root mean square error (RMSE) and structural similarity index (SSIM) to quantify the similarity between the automatic segmentation method and manual labeling, and select pseudo-color combinations with obvious rice ear features.
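The selection procedure can be sketched as follows. This is only an illustration under several assumptions: NumPy is available, each feature map is rescaled to [0, 1], the threshold is applied to the mean of the three channels as a stand-in for the color-based threshold (whose exact rule is not specified here), and SSIM is computed over a single global window rather than with the usual sliding window. All feature maps are made-up arrays.

```python
import itertools
import numpy as np

def normalize(channel):
    """Rescale a feature map to [0, 1]; a constant map becomes all zeros."""
    channel = np.asarray(channel, dtype=float)
    lo, hi = channel.min(), channel.max()
    if hi == lo:
        return np.zeros_like(channel)
    return (channel - lo) / (hi - lo)

def make_pseudo_color(ch1, ch2, ch3):
    """Stack three normalized feature maps into an (H, W, 3) pseudo-color image."""
    return np.dstack([normalize(ch1), normalize(ch2), normalize(ch3)])

def rmse(a, b):
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def global_ssim(a, b, data_range=1.0):
    """Single-window SSIM over the whole image (simplified variant)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)
    return float(num / den)

def rank_combinations(features, manual_mask, threshold=0.5):
    """Score every 3-feature combination; lowest RMSE first."""
    scores = []
    for names in itertools.combinations(features, 3):
        pseudo = make_pseudo_color(*(features[n] for n in names))
        auto_mask = (pseudo.mean(axis=2) > threshold).astype(float)
        scores.append((rmse(auto_mask, manual_mask), names))
    return sorted(scores)

# Made-up 2x2 example: 1 marks panicle pixels in the manual label.
manual_mask = np.array([[1.0, 0.0], [0.0, 1.0]])
features = {
    "ExG": manual_mask * 2.0,
    "NDVI": manual_mask * 0.5 + 0.1,
    "LBP": manual_mask * 3.0,
    "H": np.array([[0.0, 1.0], [1.0, 1.0]]),
}
ranking = rank_combinations(features, manual_mask)
print(ranking[0], global_ssim(manual_mask, manual_mask))
```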

[0137] 3. S3: Using image preprocessing techniques and manual annotation, construct a pseudo-color image dataset.

[0138] To construct the training set, the images need to be preprocessed. The preprocessing process consists of the following two steps:

[0139] S31, manually labeled rice ear images to construct a dataset of pseudo-color images;

[0140] S32 divides the pseudo-color images with obvious rice ear characteristics into training, validation and test sets in a ratio of 8:1:1.
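An 8:1:1 split can be sketched with the standard library alone. The file names, the fixed random seed, and the function name are hypothetical; the sketch shuffles once and then slices, so the three subsets never overlap.

```python
import random

def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items and split them into train, validation and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test_items = items[n_train + n_val:]      # remainder goes to the test set
    return train, val, test_items

image_names = [f"pseudo_{i:03d}.png" for i in range(100)]   # hypothetical file names
train_set, val_set, test_set = split_dataset(image_names)
print(len(train_set), len(val_set), len(test_set))
```

When sub-images cropped from the same original photo exist, splitting by original photo rather than by sub-image avoids near-duplicate content leaking between the subsets.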

[0141] 1) The image preprocessing process includes:

[0142] (1) To improve the generalization ability of the detection model, the image data is enhanced by operations such as horizontal flipping, vertical flipping, blurring, and random transformation of hue and saturation, thereby expanding the dataset.

[0143] (2) Image cropping: The image obtained in step S21 is cropped into 4 sub-images along the horizontal and vertical center lines. The size of each original image is 1600×1300 pixels, while the size of the cropped sub-image is 800×650 pixels.

[0144] 2) The manual annotation process includes:

[0145] (1) The RGB image was manually annotated using the image visualization annotation tool Labelme. The edges of the rice ears were drawn with polygons and a JSON file was generated, which contained the coordinate information and annotation category of each point.

[0146] (2) Convert the JSON file into an integer mask format to accurately label the category of each pixel for training the semantic segmentation model and generate a rice spike dataset.
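The conversion from a Labelme JSON file to an integer mask can be sketched as below. The sketch assumes the Labelme fields `shapes`, `label`, `points`, `imageHeight`, and `imageWidth`, rasterizes polygons with a simple even-odd rule evaluated at pixel centers using NumPy, and uses a hypothetical label name `panicle` mapped to class 1; a production pipeline would more likely rely on Labelme's own export utilities.

```python
import json
import numpy as np

def polygon_mask(points, height, width):
    """Boolean mask of pixels whose centers lie inside the polygon (even-odd rule)."""
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5                     # pixel-center coordinates
    inside = np.zeros((height, width), dtype=bool)
    n = len(points)
    for k in range(n):
        (x1, y1), (x2, y2) = points[k], points[(k + 1) % n]
        if y1 == y2:
            continue                                # horizontal edges never cross the ray
        crosses = (y1 > py) != (y2 > py)            # edge spans this pixel row
        x_at_y = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (px < x_at_y)           # toggle on each crossing to the right
    return inside

def labelme_to_mask(annotation, class_ids):
    """Build an integer mask (0 = background) from a parsed Labelme annotation."""
    height, width = annotation["imageHeight"], annotation["imageWidth"]
    mask = np.zeros((height, width), dtype=np.uint8)
    for shape in annotation["shapes"]:
        mask[polygon_mask(shape["points"], height, width)] = class_ids[shape["label"]]
    return mask

# Minimal made-up annotation: one square polygon in a 4x4 image.
annotation = json.loads("""
{
  "imageHeight": 4,
  "imageWidth": 4,
  "shapes": [
    {"label": "panicle", "shape_type": "polygon",
     "points": [[1, 1], [3, 1], [3, 3], [1, 3]]}
  ]
}
""")
mask = labelme_to_mask(annotation, class_ids={"panicle": 1})
print(mask)
```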

[0147] In this embodiment, the acquired images are cropped to reduce data processing time and improve training efficiency during deep learning training. Each original image, 1600×1300 pixels in size, is divided into four 800×650-pixel sub-images along its horizontal and vertical center lines. To enrich the experimental dataset, data augmentation (horizontal flipping, vertical flipping, blurring, and random transformations of hue and saturation) is used to expand it. This reduces the dependence of the rice panicle extraction model on particular image attributes, reduces overfitting, and improves the stability and generalization ability of the model.
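The cropping and flipping steps can be sketched as follows, assuming NumPy arrays in (height, width, channels) layout. Blurring and the random hue and saturation changes are omitted for brevity, and the function names are illustrative.

```python
import numpy as np

def crop_quadrants(image):
    """Split an image into 4 sub-images along its horizontal and vertical center lines."""
    h, w = image.shape[:2]
    half_h, half_w = h // 2, w // 2
    return [
        image[:half_h, :half_w], image[:half_h, half_w:],
        image[half_h:, :half_w], image[half_h:, half_w:],
    ]

def flip_augment(image, mask):
    """Return the original pair plus horizontally and vertically flipped copies.

    The same flip is applied to the image and its mask so labels stay aligned.
    """
    return [
        (image, mask),
        (image[:, ::-1], mask[:, ::-1]),    # horizontal flip
        (image[::-1], mask[::-1]),          # vertical flip
    ]

original = np.zeros((1300, 1600, 3), dtype=np.uint8)   # 1600x1300-pixel placeholder image
tiles = crop_quadrants(original)
pairs = flip_augment(tiles[0], np.zeros(tiles[0].shape[:2], dtype=np.uint8))
print([t.shape for t in tiles], len(pairs))
```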

[0148] 4. Based on the selected pseudo-color image, a semantic segmentation algorithm is used to extract rice ears, and the detection accuracy of different backbone networks is compared.

[0149] The Deeplabv3+ algorithm was used to extract rice panicles on a pseudo-color image dataset. Two different backbone networks, ResNet and Xception, were trained to compare their effects on panicle extraction and their performance on color and pseudo-color datasets. A suitable dataset and backbone algorithm were selected for panicle segmentation. The deep learning-based rice panicle extraction model is mainly based on a semantic segmentation model, and the process consists of the following four steps:

[0150] S41: Use two different networks, ResNet and Xception, as the backbone networks of DeepLabv3+;

[0151] S42: Input the images of the rice pseudo-color training set into the different models for training, and adjust their training parameters;

[0152] S43: Input the validation set and test set divided in S3 into the trained models to evaluate the performance of the two models;

[0153] S44: Input the rice pseudo-color images into the DeepLabv3+ model to extract the rice panicles.

[0154] The convolutional neural network used in the deep learning-based rice panicle extraction model is DeepLabv3+. DeepLabv3+ is an evolution of the DeepLab series of models and uses an encoder-decoder structure to improve semantic segmentation performance. The encoder consists mainly of a backbone network and an atrous spatial pyramid pooling (ASPP) module. The input image first passes through the backbone network, such as ResNet or Xception, and the resulting feature map is fed to the ASPP module. For multi-scale processing, ASPP applies parallel atrous convolutions with different dilation rates, which lets DeepLabv3+ capture objects and details at different spatial scales in the image. ASPP also includes an image-level global average pooling branch that captures global information across the entire image and improves the understanding of the overall semantics. The decoder fuses the high-level semantic information from the deep layers with fine features from the shallow layers of the backbone, reducing jagged edges in the segmentation results and improving segmentation accuracy. Combined with common training practices such as data augmentation, this design makes the model more robust to noise and interference.
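As a rough illustration of the atrous (dilated) convolution on which the ASPP module is built, the following single-channel NumPy sketch shows how the dilation rate widens the receptive field without adding kernel weights. It is not the DeepLabv3+ implementation; the function name `atrous_conv2d` and the toy inputs are hypothetical.

```python
import numpy as np

def atrous_conv2d(image, kernel, rate):
    """Valid-mode 2-D atrous cross-correlation of one channel with one kernel."""
    kh, kw = kernel.shape
    # The effective kernel extent grows with the dilation rate.
    eff_h = (kh - 1) * rate + 1
    eff_w = (kw - 1) * rate + 1
    out_h = image.shape[0] - eff_h + 1
    out_w = image.shape[1] - eff_w + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for a in range(kh):
        for b in range(kw):
            # Kernel tap (a, b) samples the input at a stride of `rate` pixels.
            out += kernel[a, b] * image[a * rate: a * rate + out_h,
                                        b * rate: b * rate + out_w]
    return out

image = np.ones((9, 9))
kernel = np.ones((3, 3))
dense = atrous_conv2d(image, kernel, rate=1)   # 3 x 3 receptive field
sparse = atrous_conv2d(image, kernel, rate=3)  # 7 x 7 receptive field
```

Both calls use the same nine weights. Only the spacing between the sampled input pixels changes, which is how parallel branches with different rates can respond to structures of different sizes.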

[0155] ResNet achieved remarkable success in deep learning by introducing residual learning, which uses skip connections to mitigate the vanishing and exploding gradient problems. ResNet performs well in tasks including image classification, object detection, and semantic segmentation, and its variants, such as ResNet-50 and ResNet-101, are widely used as backbones of deep neural networks. Xception, an evolution of Google's Inception model, is built on depthwise separable convolutions. This architecture reduces the number of parameters while maintaining high performance by decomposing a standard convolution into two steps: a depthwise convolution and a pointwise convolution. This makes it particularly advantageous in resource-constrained environments, and Xception performs well in tasks such as image classification and object detection.
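The parameter saving from depthwise separable convolution can be checked with simple arithmetic. The layer sizes below (a 3×3 kernel with 256 input and 256 output channels) are illustrative assumptions, not values from the embodiment, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise convolution followed by a pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise

std = standard_conv_params(3, 256, 256)    # 589824
sep = separable_conv_params(3, 256, 256)   # 2304 + 65536 = 67840
reduction = std / sep                      # roughly 8.7 times fewer weights
```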

[0156] 5. S5: Select evaluation metrics to measure the accuracy of the semantic segmentation model in extracting rice panicles on the color and pseudo-color image datasets. Select a suitable pseudo-color image dataset and the trained semantic segmentation model for extracting rice panicles.

[0157] Using the ratio of extracted rice panicle pixels to original image pixels as the independent variable and the measured rice yield as the dependent variable, rice yield estimation models were constructed using a logarithmic function, an exponential function, a power function, univariate linear regression, and univariate quadratic regression, respectively. Appropriate indicators were used to measure the accuracy of the estimation.

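[0158] The selected network model is evaluated using three metrics: mean intersection over union (mIoU), pixel accuracy (PA), and mean pixel accuracy (mPA).

[0159] $$\mathrm{mIoU} = \frac{1}{k+1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}}$$

[0160] $$\mathrm{PA} = \frac{\sum_{i=0}^{k} p_{ii}}{\sum_{i=0}^{k} \sum_{j=0}^{k} p_{ij}}$$

[0161] $$\mathrm{mPA} = \frac{1}{k+1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij}}$$

[0162] In the formulas: $i$ represents the true value, $j$ represents the predicted value, $p_{ij}$ is the number of pixels of category $i$ predicted as category $j$, $k$ represents the categories (indexed from 0 to $k$), $p_{ii}$ is the number of pixels of category $i$ predicted as category $i$, and $p_{ji}$ is the number of pixels of category $j$ predicted as category $i$.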
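Using the p_ij notation defined above, the three metrics can be computed from a confusion matrix. This is a minimal sketch under the standard definitions of mIoU, PA, and mPA. The function names and the toy label arrays are hypothetical, and every category is assumed to occur at least once in the ground truth so that no denominator is zero.

```python
import numpy as np

def confusion_matrix(truth, pred, num_classes):
    """Entry [i, j] counts pixels of true category i predicted as j (p_ij)."""
    idx = truth.ravel() * num_classes + pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def segmentation_metrics(p):
    """Return (mIoU, PA, mPA) from a confusion matrix p."""
    tp = np.diag(p).astype(float)   # p_ii
    row = p.sum(axis=1)             # sum_j p_ij: pixels whose true category is i
    col = p.sum(axis=0)             # sum_j p_ji: pixels predicted as category i
    pa = tp.sum() / p.sum()
    mpa = np.mean(tp / row)
    miou = np.mean(tp / (row + col - tp))
    return float(miou), float(pa), float(mpa)

# Toy 2 x 4 label maps: 0 = background, 1 = panicle.
truth = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])
pred = np.array([[0, 0, 0, 1], [0, 1, 1, 1]])
p = confusion_matrix(truth, pred, num_classes=2)
miou, pa, mpa = segmentation_metrics(p)   # 0.6, 0.75, 0.75
```

In this toy case six of eight pixels are correct, so PA is 0.75, while each category has an IoU of 3/5. This shows why mIoU is usually the stricter of the three metrics.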

[0163] 6. For the pseudo-color image of each plot, the proportion of segmented rice panicle pixels to the total number of pixels in the image is fitted against the manually measured actual yield of the plot; different yield estimation models are constructed and their performance is verified.

[0164] The specific steps for yield estimation are as follows:

[0165] S61: Calculate the ratio of the pixels belonging to the rice panicle category to the total number of pixels in the segmentation result obtained in S4;

[0166] S62: Select machine learning algorithms (least squares fitting, random forest, multilayer perceptron, etc.) to fit the sum of the pixel values of the rice panicles in each plot, extracted by semantic segmentation, against the actual yield data to obtain different yield estimation models;

[0167] S63: Use cross-validation to verify model performance.
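Steps S61 to S63 can be sketched as follows, using the panicle pixel ratio from S61 as the single predictor and a least squares line as the model. The text does not specify the cross-validation scheme, so leave-one-out is an assumption here. The function names are hypothetical and the plot data are synthetic values chosen only to exercise the code.

```python
import numpy as np

def panicle_ratio(mask, panicle_id=1):
    """S61: share of pixels labeled as panicle in a segmentation mask."""
    return float(np.mean(mask == panicle_id))

def loo_cv_rmse(x, y):
    """S62 and S63: leave-one-out RMSE of a least squares line y = a*x + b."""
    errors = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        a, b = np.polyfit(x[keep], y[keep], 1)   # fit without plot i
        errors.append(y[i] - (a * x[i] + b))     # predict the held-out plot
    return float(np.sqrt(np.mean(np.square(errors))))

# Toy mask: 3 of 12 pixels are panicle.
mask = np.array([[0, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]])
ratio = panicle_ratio(mask)   # 0.25

# Synthetic per-plot data (not measurements): yield exactly linear in ratio.
ratios = np.array([0.05, 0.10, 0.15, 0.20, 0.25])
yields = 2.0 * ratios + 1.0
cv_error = loo_cv_rmse(ratios, yields)   # close to zero for exact linear data
```

With real plots the held-out error would be well above zero. Comparing it across the candidate algorithms is what step S63 is meant to do.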

[0168] The process of establishing the yield estimation model is as follows:

[0169] (1) Using the ratio of extracted rice panicle pixels to original image pixels as the independent variable and the measured rice yield as the dependent variable, rice yield estimation models were constructed using a logarithmic function, an exponential function, a power function, univariate linear regression, and univariate quadratic regression, respectively.

[0170] (2) The evaluation criteria for the rice yield estimation model are the coefficient of determination ($R^2$), the root mean square error (RMSE), and the relative root mean square error (MAE).

[0171]

[0172]

[0173]

[0174] In the formulas: $x_i$, $\bar{x}$, $y_i$, and $\bar{y}$ represent the measured value, measured mean, estimated value, and estimated mean, respectively, and $n$ represents the sample size. The closer $R^2$ is to 1 and the smaller the RMSE, the better the model fit.
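The five model forms and two of the evaluation criteria can be sketched as follows. Several points are assumptions: the exponential and power models are fitted by least squares after a log transform, which minimizes error in log space and requires positive inputs; $R^2$ is taken as $1 - SS_{res}/SS_{tot}$ and RMSE as the root of the mean squared difference, both common definitions that may differ from the formulas of the original filing; and the data are synthetic. All function names are hypothetical.

```python
import numpy as np

def fit_models(x, y):
    """Return {name: predict_function} for five single-variable model forms."""
    models = {}
    a, b = np.polyfit(x, y, 1)                      # y = a*x + b
    models["linear"] = lambda t, a=a, b=b: a * t + b
    q = np.polyfit(x, y, 2)                         # y = q0*x^2 + q1*x + q2
    models["quadratic"] = lambda t, q=q: np.polyval(q, t)
    a, b = np.polyfit(np.log(x), y, 1)              # y = a*ln(x) + b
    models["logarithmic"] = lambda t, a=a, b=b: a * np.log(t) + b
    b, ln_a = np.polyfit(x, np.log(y), 1)           # ln(y) = ln(a) + b*x
    models["exponential"] = lambda t, a=np.exp(ln_a), b=b: a * np.exp(b * t)
    b, ln_a = np.polyfit(np.log(x), np.log(y), 1)   # ln(y) = ln(a) + b*ln(x)
    models["power"] = lambda t, a=np.exp(ln_a), b=b: a * t ** b
    return models

def r_squared(measured, estimated):
    ss_res = np.sum((measured - estimated) ** 2)
    ss_tot = np.sum((measured - np.mean(measured)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(measured, estimated):
    return float(np.sqrt(np.mean((measured - estimated) ** 2)))

# Synthetic data (not measurements) that follows a power law exactly.
ratios = np.array([0.05, 0.10, 0.15, 0.20, 0.25])
yields = 3.0 * ratios ** 0.5
models = fit_models(ratios, yields)
scores = {name: (r_squared(yields, f(ratios)), rmse(yields, f(ratios)))
          for name, f in models.items()}
```

Ranking the entries of `scores` by $R^2$ and RMSE mirrors the selection criterion stated above: the preferred model has $R^2$ closest to 1 and the smallest RMSE.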

[0175] The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above embodiments. Any changes, modifications, substitutions, combinations, or simplifications made without departing from the spirit and principle of the present invention shall be considered equivalent substitutions and shall be included within the protection scope of the present invention.

Claims

1. A method for rice panicle extraction and yield estimation based on semantic segmentation and unmanned aerial vehicle (UAV) color and multispectral images, characterized in that the method includes the following steps: S1. Rice is planted in small plots; during the rice seedling growth cycle, color and multispectral images of the rice panicles in each plot are collected by drones equipped with visible light and multispectral cameras; after the rice matures, the yield of each plot is measured manually and used for subsequent rice yield modeling and prediction; S2. The color and multispectral images are processed, vegetation indices, color features, and texture features are calculated, and pseudo-color images are optimized through different feature combinations; S3. Image preprocessing techniques and manual annotation are used to construct a pseudo-color image dataset; S4. Based on the selected pseudo-color images, a semantic segmentation algorithm is used to extract rice panicles, and the detection accuracy of different backbone networks is compared; S5. Evaluation metrics are selected to measure the accuracy of the semantic segmentation model in extracting rice panicles on the color and pseudo-color image datasets, and a suitable pseudo-color image dataset and the trained semantic segmentation model are selected for extracting rice panicles; S6. For the pseudo-color image of each plot, the proportion of segmented rice panicle pixels to the total number of pixels in the image is fitted against the manually measured actual yield of the plot, to construct different yield estimation models and verify the performance of the models.
In step S2, to generate a suitable pseudo-color image for extracting rice panicles, the color image and the multispectral image are processed in the following three steps: S21. Based on the collected color and multispectral image data, obtain images with different vegetation index features, color features, and texture features; S22. Based on the different vegetation index features, color features, and texture features, select different three-channel images and recombine them to generate different pseudo-color images; S23. Based on the evaluation indicators and the manually labeled foreground and background pixels of rice panicles, analyze the panicle pixels and background pixels of the different pseudo-color images, and select the pseudo-color images in which rice panicle features are obvious. The process of selecting the best pseudo-color image includes: (1) for the images with different vegetation indices, color features, and texture features, select different three-channel images and recombine them to generate different pseudo-color images, then use threshold segmentation, with a threshold chosen according to the color of the rice panicles, to extract the panicles; (2) compare the result with the manually labeled rice panicle images, use the root mean square error and the structural similarity index to quantify the similarity between the automatic segmentation and the manual labeling, and select the pseudo-color combinations in which rice panicle features are obvious.

2. The method for rice panicle extraction and yield estimation based on semantic segmentation and UAV color and multispectral images according to claim 1, characterized in that step S1 includes: S11. The rice seedling raising mode adopts different rice varieties, different seedling densities, and different sowing dates, and the rice seedlings are planted in small plots according to the seedling raising mode; S12. After the rice matures, the yield of each plot is sampled manually and recorded; S13. A drone equipped with a multispectral camera and a color camera hovers over the rice paddies and photographs the rice seedlings in the area from a vertical, overhead angle.

3. The method for extracting rice panicles and estimating yield based on semantic segmentation and UAV color and multispectral images according to claim 1, characterized in that, in step S3, the images are preprocessed to construct the training set, and the preprocessing consists of the following two steps: S31. Manually label the rice panicle images to construct a dataset of pseudo-color images; S32. Divide the pseudo-color images with obvious rice panicle features into training, validation, and test sets in a ratio of 8:1:1.

4. The method for extracting rice panicles and estimating yield based on semantic segmentation and UAV color and multispectral images according to claim 3, characterized in that step S4 includes: S41. Use two different networks, ResNet and Xception, as the backbone networks of DeepLabv3+; S42. Input the images of the rice pseudo-color training set into the different models for training, and adjust their training parameters; S43. Input the validation set and test set divided in S3 into the trained models to evaluate the performance of the two models; S44. Input the rice pseudo-color images into the DeepLabv3+ model to extract the rice panicles.

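5. The method for extracting rice panicles and estimating yield based on semantic segmentation and UAV color and multispectral images according to claim 1, characterized in that, in step S5, evaluation metrics are selected to test the chosen network model. Three metrics are used: mean intersection over union (mIoU), pixel accuracy (PA), and mean pixel accuracy (mPA):

$$\mathrm{mIoU} = \frac{1}{k+1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}}, \qquad \mathrm{PA} = \frac{\sum_{i=0}^{k} p_{ii}}{\sum_{i=0}^{k} \sum_{j=0}^{k} p_{ij}}, \qquad \mathrm{mPA} = \frac{1}{k+1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij}}$$

In the formulas: $i$ represents the true value, $j$ represents the predicted value, $p_{ij}$ indicates that $i$ is predicted as $j$, $k$ indicates the categories (indexed from 0 to $k$), $p_{ii}$ indicates that $i$ is predicted as $i$, and $p_{ji}$ indicates that $j$ is predicted as $i$.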

6. The method for extracting rice panicles and estimating yield based on semantic segmentation and UAV color and multispectral images according to claim 1, characterized in that the specific operations for yield estimation in step S6 are as follows: S61. Calculate the ratio of the pixels belonging to the rice panicle category to the total number of pixels in the segmentation result obtained in S4; S62. Select machine learning algorithms, including least squares fitting, random forest, and the multilayer perceptron algorithm, and fit the sum of the pixel values of the rice panicles in each plot, extracted by semantic segmentation, against the actual yield data to obtain different yield estimation models; S63. Use cross-validation to verify model performance.

7. A computer device comprising a processor and a memory for storing a processor-executable program, characterized in that, when the processor executes the program stored in the memory, it implements the method for extracting rice panicles and estimating yield based on semantic segmentation and UAV color and multispectral images as described in any one of claims 1-6.

8. A storage medium storing a program, characterized in that, when the program is executed by a processor, it implements the method for extracting rice panicles and estimating yield based on semantic segmentation and UAV color and multispectral images as described in any one of claims 1-7.