A water surface algae plant identification method based on unmanned aerial vehicle images

By combining drone image acquisition with semantic segmentation of water surface areas and algae identification models, the problem of low algae identification rate in traditional methods has been solved, achieving high-precision algae detection and reducing manpower and material consumption.

CN118840680B · Active · Publication Date: 2026-06-19 · ZHONGZAI YUNTU TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ZHONGZAI YUNTU TECH CO LTD
Filing Date
2024-07-15
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Traditional methods for detecting algae on the water surface struggle to distinguish plants on the water from plants on land, which results in low identification rates and high consumption of manpower and resources, especially on large bodies of water where targets are hard to find.

Method used

A method for identifying aquatic algae based on UAV images is adopted. By constructing a semantic segmentation model of the water surface region and an aquatic algae identification model, and combining channel and spatial attention mechanisms, background interference is removed to identify aquatic algae.

Benefits of technology

It improves the accuracy of identifying surface algae, reduces missed and false identifications, lowers the consumption of manpower and resources, and is adaptable to surface algae targets of different sizes.


Smart Images

  • Figure CN118840680B_ABST

Abstract

This invention relates to the field of image recognition technology and discloses a method for identifying aquatic algae based on UAV images, comprising the following steps: Step S101, acquiring water surface images using a UAV and constructing a water area image dataset; Step S102, training a semantic segmentation model for water surface regions based on the water area image dataset; Step S103, constructing a water surface algae dataset; Step S104, training a water surface algae recognition model based on the water surface algae dataset; Step S105, inputting the background-removed water surface image into the trained water surface algae recognition model and outputting the water surface algae detection results. This invention adds feature receptive fields in both channel and spatial directions to the water surface region semantic segmentation model, thereby improving the recognition accuracy of water areas and reducing background interference. Furthermore, it adds a convolutional layer at the output of the water surface algae recognition model to increase the range of the receptive field, thereby improving the accuracy of algae recognition.

Description

Technical Field

[0001] This invention relates to the field of image recognition technology, and more specifically to a method for identifying algae on the water surface based on drone images.

Background Technology

[0002] Due to economic growth and human activities, an increasing number of rivers, lakes, and reservoirs are gradually transitioning from oligotrophic and mesotrophic states to eutrophic states. Accompanying this eutrophication is the proliferation of algae. When the water surface is covered by algae, oxygen deficiency in the lower and middle layers leads to a sharp drop in dissolved oxygen, causing the water to become smelly and black. The excessive growth of algae deteriorates water quality, impacting fisheries and posing risks to human health. Therefore, surface algae detection technology is of great significance for water quality management in rivers, lakes, and reservoirs.

[0003] Traditional methods for clearing algae from the water surface mainly rely on irregular manual patrols, followed by retrieval and removal of any algae found. While simple, this approach is resource-intensive and hard to carry out effectively, especially on large bodies of water where the algae are difficult to locate. With the advent of deep learning, many researchers have proposed deep learning-based methods for detecting algae on the water surface. However, because the aquatic environment is complex, such detection methods still have limitations: they cannot reliably distinguish aquatic plants from terrestrial plants, which results in low algae identification rates.

Summary of the Invention

[0004] This invention provides a method for identifying algae on the water surface based on UAV images, thereby solving the technical problems mentioned in the background section.

[0005] This invention provides a method for identifying algae on the water surface based on UAV images, comprising the following steps:

[0006] Step S101: Collect water surface images using a drone and construct a water area image dataset;

[0007] Step S102: Construct a semantic segmentation model for the water surface region and train the semantic segmentation model for the water surface region based on the water area image dataset;

[0008] Step S103: Input the water surface images from the water area image dataset into the trained water surface region semantic segmentation model, output the water surface images with the background removed, and construct the water surface algae dataset.

[0009] Step S104: Construct a water surface algae identification model and train the water surface algae identification model based on the water surface algae dataset.

[0010] Step S105: Input the background-removed water surface image into the trained water surface algae recognition model and output the water surface algae detection results.

[0011] The water surface algae detection results include: the type of water surface algae and the anchor box.

[0012] Furthermore, annotation software is used to annotate the water surface area in the water surface image, and annotation software is used to annotate the algae on the water surface image after removing the background.

[0013] Furthermore, the semantic segmentation model for the water surface region includes: an encoder, an attention module, and a decoder;

[0014] The encoder includes: 5 convolutional blocks, 5 first upsampling layers, and 1 first feature fusion layer;

[0015] Each convolutional block includes: a first convolutional layer, a rectified linear unit (ReLU) layer, and a pooling layer, wherein the kernel size and stride of the first convolutional layer are user-defined parameters;

[0016] The water surface image is input into the five convolutional blocks, which output five first feature maps, each with a different size.

[0017] The five first upsampling layers take the five first feature maps as input, one each, and output five second feature maps, all of the same size.

[0018] The first feature fusion layer takes five second feature maps as input, extracts the maximum value of each pixel in the five second feature maps, and outputs one third feature map, which is the same size as the second feature map.

[0019] The attention modules include: channel attention module and spatial attention module;

[0020] The channel attention module takes the third feature map as input and outputs the fourth feature map.

[0021] The spatial attention module takes the third feature map as input and outputs the fifth feature map.

[0022] The decoder consists of one second convolutional layer and one second upsampling layer;

[0023] The second feature fusion layer takes the third, fourth, and fifth feature maps as input, merges them by weighting, and upsamples the result to generate the sixth feature map. The sixth feature map is then binarized so that all background values are set to 0, and the resulting background-removed water surface image is output.

[0024] Furthermore, the channel attention module includes: a first global max pooling layer, a first global average pooling layer, a first unfolding layer, a first normalization layer, and a first linear layer;

[0025] The first global max pooling layer takes the third feature map as input and outputs the first intermediate feature map.

[0026] The first global average pooling layer takes the third feature map as input and outputs the second intermediate feature map.

[0027] The width and height of the first and second intermediate feature maps are both 1, and the number of channels in the first and second intermediate feature maps are the same as the number of channels in the third feature map.

[0028] The first unfolding layer is used to unfold the first intermediate feature map and the second intermediate feature map into vector representations;

[0029] The first normalization layer is used to normalize the vector representations of the first intermediate feature map and the second intermediate feature map.

[0030] The first linear layer is used to transpose the vector representation of the first intermediate feature map, multiply it by the vector representation of the second intermediate feature map, and then perform a sigmoid activation function to calculate the channel attention weight parameters, which are then multiplied by the third feature map to obtain the fourth feature map.

[0031] Furthermore, the spatial attention module includes: a second global max pooling layer, a second global average pooling layer, a second unfolding layer, a second normalization layer, and a second linear layer;

[0032] The second global max pooling layer takes the third feature map as input and outputs the third intermediate feature map.

[0033] The second global average pooling layer takes the third feature map as input and outputs the fourth intermediate feature map.

[0034] The number of channels in the third and fourth intermediate feature maps is 1, and the width and height of the third and fourth intermediate feature maps are the same as those of the third feature map.

[0035] The second unfolding layer is used to unfold the third and fourth intermediate feature maps into vector representations;

[0036] The second normalization layer is used to normalize the vector representations of the third and fourth intermediate feature maps.

[0037] The second linear layer is used to transpose the vector representation of the third intermediate feature map, multiply it by the vector representation of the fourth intermediate feature map, and then perform a sigmoid activation function to calculate the spatial attention weight parameters, which are then multiplied by the third feature map to obtain the fifth feature map.

[0038] Furthermore, the cross-entropy loss is specified as the loss function for the water surface region semantic segmentation model.

[0039] Furthermore, the water surface algae identification model includes: an anchor box module, a transfer connection module, and an object detection module. The anchor box module consists of a VGG16 network and convolutional layers and performs feature extraction, anchor generation, anchor refinement, and negative anchor filtering. It first determines whether a target object is present in the background-removed water surface image. If so, it generates anchor boxes, performs regression prediction to adjust their position and size, and discards erroneous anchor boxes. The transfer connection module consists of three convolutional layers and three ReLU activation layers, which deconvolve the features of the next layer. The object detection module includes a classification layer and a regression layer, consisting of 3×3 convolutional layers. The prediction layer outputs the category and coordinate information of the refined anchors, and the final predicted bounding boxes are selected by non-maximum suppression.

[0040] Furthermore, the loss function calculation formula for the water surface algae identification model includes:

[0041]

[0042] Where $N_a$ and $N_o$ are the numbers of positive-sample anchor boxes in the anchor box module and the object detection module, respectively; $p_i$ is the confidence of predicted anchor box $i$; $x_i$ is the coordinates of predicted box $i$ after refinement by the anchor box module; $c_i$ is the predicted object category of the target; and $t_i$ is the predicted bounding box coordinates of the target. The two remaining terms are the true category label of anchor box $i$ and the true position and size of anchor box $i$.

[0043] The beneficial effects of this invention are as follows. Water surface images acquired by UAVs are large, with complex backgrounds and small targets. This invention therefore identifies water areas with a lightweight semantic segmentation network and enlarges the feature receptive field in both the channel and spatial directions, which improves the recognition accuracy of water areas and further reduces background interference. In addition, algae targets on the water surface vary in size, and small targets are easily lost during feature extraction. A deep learning network is constructed in which the input layer of the backbone network is set to match the different sizes of algae targets on the water surface, and a convolutional layer is added at the output of the backbone network to enlarge the receptive field. This also allows high-level features to be better integrated with low-level features, reducing missed and false identifications.

Description of the Drawings

[0044] Figure 1 is a flowchart of a method for identifying water surface algae based on UAV images according to the present invention;

[0045] Figure 2 is a schematic diagram of the semantic segmentation model for water surface regions of the present invention;

[0046] Figure 3 is a schematic diagram of the water surface algae identification model of the present invention.

Detailed Implementation

[0047] The subject matter described herein will now be discussed with reference to exemplary embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and implement the subject matter described herein, and changes may be made to the function and arrangement of the elements discussed without departing from the scope of this specification. Various processes or components may be omitted, substituted, or added as needed in the examples. Furthermore, features described in some examples may be combined in other examples.

[0048] It should be noted that, unless otherwise defined, the technical or scientific terms used in one or more embodiments of the present invention should have the ordinary meaning understood by one of ordinary skill in the art to which this invention pertains. The terms "first," "second," and similar terms used in one or more embodiments of the present invention do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Terms such as "comprising" or "including" mean that the element or object preceding the word encompasses the elements or objects listed after the word and their equivalents, without excluding other elements or objects. Terms such as "connected" or "linked" are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect. Terms such as "upper," "lower," "left," and "right" are used only to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may also change accordingly.

[0049] As shown in Figures 1-3, a method for identifying algae on the water surface based on UAV images includes the following steps:

[0050] Step S101: Collect water surface images using a drone and construct a water area image dataset;

[0051] Step S102: Construct a semantic segmentation model for the water surface region and train the semantic segmentation model for the water surface region based on the water area image dataset;

[0052] Step S103: Input the water surface images from the water area image dataset into the trained water surface region semantic segmentation model, output the water surface images with the background removed, and construct the water surface algae dataset.

[0053] Step S104: Construct a water surface algae identification model and train the water surface algae identification model based on the water surface algae dataset.

[0054] Step S105: Input the background-removed water surface image into the trained water surface algae recognition model and output the water surface algae detection results.

[0055] The water surface algae detection results include: the type of water surface algae and the anchor box.

[0056] Therefore, the water surface images collected by the drone are first input into the water surface region semantic segmentation model to remove the background that is not related to the water surface, and then input into the water surface algae recognition model to output the water surface algae detection results, thereby directly removing the interference of ground plants on the detection of water surface plants.
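The two-stage inference path described above can be sketched as follows. This is a minimal illustration only: the function names, the nested-list image type, and the toy stand-in models are assumptions made for the example and do not come from the patent.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]                          # grayscale image as rows of pixels (assumed type)
Detection = Tuple[str, Tuple[int, int, int, int]]  # (algae type, anchor box x1, y1, x2, y2)


def identify_algae(
    image: Image,
    segment_water: Callable[[Image], Image],
    detect_algae: Callable[[Image], List[Detection]],
) -> List[Detection]:
    """Run the inference path: background removal, then algae detection."""
    # Stage 1: the segmentation model returns the image with non-water pixels zeroed.
    water_only = segment_water(image)
    # Stage 2: the recognition model only ever sees the background-removed image.
    return detect_algae(water_only)


# Hypothetical stand-in models, used only to make the sketch runnable.
def toy_segmenter(image: Image) -> Image:
    # Pretend the left column is land and zero it out.
    return [[0.0] + row[1:] for row in image]


def toy_detector(image: Image) -> List[Detection]:
    # Report one unit box for every pixel brighter than 0.5.
    return [
        ("algae", (x, y, x + 1, y + 1))
        for y, row in enumerate(image)
        for x, value in enumerate(row)
        if value > 0.5
    ]


frame = [[0.9, 0.1], [0.2, 0.8]]
detections = identify_algae(frame, toy_segmenter, toy_detector)
```

In this toy run the bright pixel in the "land" column is removed before detection, so only the bright pixel on the water side is reported.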

[0057] In one embodiment of the present invention, annotation software is used to label the water surface area in each water surface image and to label the algae in each background-removed water surface image. The annotation software may be Labelme or LabelImg; for example, the water surface area of a water surface image is labeled using Labelme. Each labeled water surface image yields a separate folder containing the original image file, the label image file, and the category name.

[0058] In one embodiment of the present invention, the water surface region semantic segmentation model includes: an encoder, an attention module, and a decoder;

[0059] The encoder includes: 5 convolutional blocks, 5 first upsampling layers, and 1 first feature fusion layer;

[0060] Each convolutional block includes: a first convolutional layer, a ReLU (rectified linear unit) layer, and a pooling layer. The kernel size and stride of the first convolutional layer are user-defined parameters. Preferably, the kernel size is set to 3×3 and the stride is set to 1.

[0061] The water surface image is input into the five convolutional blocks, which output five first feature maps, each with a different size.

[0062] The five first upsampling layers take the five first feature maps as input, one each, and output five second feature maps, all of the same size.
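To make the size bookkeeping concrete, the sketch below computes the side length of each first feature map and the scale factor each first upsampling layer would need. Several details are assumptions not stated in the text: a square 512-pixel input, padding of 1 so that the 3×3 stride-1 convolution preserves size, 2×2 pooling with stride 2, and a common target size equal to the largest first feature map.

```python
def encoder_sizes(input_size: int, num_blocks: int = 5, pool_stride: int = 2):
    """Return the side length of the feature map after each convolutional block."""
    sizes = []
    size = input_size
    for _ in range(num_blocks):
        # Assumed: the 3x3, stride-1, padding-1 convolution keeps the size;
        # only the pooling layer shrinks it.
        size = size // pool_stride
        sizes.append(size)
    return sizes


def upsampling_factors(sizes, target_size: int):
    """Scale factor each first upsampling layer needs to reach the common size."""
    return [target_size // s for s in sizes]


first_map_sizes = encoder_sizes(512)
factors = upsampling_factors(first_map_sizes, target_size=first_map_sizes[0])
```

Under these assumptions the five first feature maps have side lengths 256, 128, 64, 32, and 16, and the upsampling factors are 1, 2, 4, 8, and 16.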

[0063] The first feature fusion layer takes five second feature maps as input, extracts the maximum value of each pixel in the five second feature maps, and outputs one third feature map, which is the same size as the second feature map.

[0064] For example, if the values of the first pixel of the five second feature maps are 1.1, 1.2, 1.3, 1.4, and 1.5 respectively, then the value of the first pixel of the third feature map is 1.5;
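The pixel-wise maximum used by the first feature fusion layer can be illustrated with a short sketch. The function name and the nested-list representation of a feature map are assumptions chosen for the example.

```python
def fuse_by_max(feature_maps):
    """Element-wise maximum over equally sized 2-D feature maps."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [max(fm[r][c] for fm in feature_maps) for c in range(width)]
        for r in range(height)
    ]


# Five 1x2 second feature maps whose first pixels are 1.1 ... 1.5.
second_maps = [[[v, 0.0]] for v in (1.1, 1.2, 1.3, 1.4, 1.5)]
third_map = fuse_by_max(second_maps)
```

Here the first pixel of `third_map` is 1.5, matching the example in the text, and the output has the same size as each input map.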

[0065] The attention modules include: channel attention module and spatial attention module;

[0066] The channel attention module takes the third feature map as input and outputs the fourth feature map.

[0067] The spatial attention module takes the third feature map as input and outputs the fifth feature map.

[0068] The decoder consists of one second convolutional layer and one second upsampling layer;

[0069] The second feature fusion layer takes the third, fourth, and fifth feature maps as input, merges them by weighting, and upsamples the result to generate the sixth feature map. The sixth feature map is then binarized so that all background values are set to 0, and the resulting background-removed water surface image is output.
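A minimal sketch of the merge, binarize, and mask sequence follows. The equal merge weights and the 0.5 threshold are assumptions, since the text does not give values, and the upsampling step is omitted to keep the example small.

```python
def merge_maps(maps, weights):
    """Weighted sum of equally sized 2-D maps."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(w * m[r][c] for m, w in zip(maps, weights)) for c in range(cols)]
        for r in range(rows)
    ]


def binarize(score_map, threshold=0.5):
    """Return a mask in which 1 marks water and 0 marks background."""
    return [[1 if v >= threshold else 0 for v in row] for row in score_map]


def remove_background(image, mask):
    """Keep water pixels and set background pixels to 0."""
    return [
        [p * m for p, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


third = [[0.9, 0.2]]
fourth = [[0.8, 0.1]]
fifth = [[0.7, 0.3]]
sixth = merge_maps([third, fourth, fifth], weights=[1 / 3, 1 / 3, 1 / 3])  # assumed equal weights
mask = binarize(sixth)                                                      # assumed threshold 0.5
water_only = remove_background([[120, 200]], mask)
```

In this toy case the first pixel is kept as water and the second is zeroed as background.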

[0070] It should be noted that a typical neural network has only a local receptive field, which makes it difficult to capture global information. Adding spatial attention and channel attention mechanisms to the water surface region semantic segmentation model therefore helps capture contextual semantic information and improves recognition accuracy.

[0071] In one embodiment of the present invention, the channel attention module includes: a first global max pooling layer, a first global average pooling layer, a first unfolding layer, a first normalization layer, and a first linear layer;

[0072] The first global max pooling layer takes the third feature map as input and outputs the first intermediate feature map.

[0073] The first global average pooling layer takes the third feature map as input and outputs the second intermediate feature map.

[0074] The width and height of the first and second intermediate feature maps are both 1, and the number of channels in the first and second intermediate feature maps are the same as the number of channels in the third feature map.

[0075] The first unfolding layer is used to unfold the first intermediate feature map and the second intermediate feature map into vector representations;

[0076] The first normalization layer is used to normalize the vector representations of the first intermediate feature map and the second intermediate feature map.

[0077] The first linear layer is used to transpose the vector representation of the first intermediate feature map, multiply it by the vector representation of the second intermediate feature map, and then perform a sigmoid activation function to calculate the channel attention weight parameters, which are then multiplied by the third feature map to obtain the fourth feature map.

[0078] For example, if the size of the third feature map is c×h×w, global max pooling and global average pooling are performed to obtain the first and second intermediate feature maps with a size of c×1×1. Then, the first and second intermediate feature maps are expanded into vector representations. After normalizing the vector representations of the first and second intermediate feature maps, the vector representation of the first intermediate feature map is transposed to c×1 and multiplied by the vector representation of the second intermediate feature map (1×c) to obtain the c×c channel attention weight parameters.
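The channel attention computation can be sketched in plain Python as below. Two points are assumptions because the text leaves them open: the normalization is taken to be L2 normalization, and "multiplied by the third feature map" is read as a matrix product of the c×c weights with the feature map flattened to c×(h×w).

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumed form of the normalization layer)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature):
    """feature: c channels, each an h x w list of rows. Returns (weights, output)."""
    flat = [[v for row in ch for v in row] for ch in feature]    # c x (h*w)
    max_vec = l2_normalize([max(ch) for ch in flat])             # first intermediate map, c values
    avg_vec = l2_normalize([sum(ch) / len(ch) for ch in flat])   # second intermediate map, c values
    # Outer product (c x 1) times (1 x c), then sigmoid: c x c weight parameters.
    weights = [[sigmoid(m * a) for a in avg_vec] for m in max_vec]
    # Assumed application: (c x c) @ (c x h*w) -> c x (h*w), the fourth feature map.
    out = [
        [sum(weights[i][k] * flat[k][j] for k in range(len(flat)))
         for j in range(len(flat[0]))]
        for i in range(len(flat))
    ]
    return weights, out


feature = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 1.0], [0.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
weights, out = channel_attention(feature)
```

With c = 3 and h = w = 2, `weights` is a 3×3 matrix and `out` keeps the 3×4 flattened shape of the input.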

[0079] In one embodiment of the present invention, the spatial attention module includes: a second global max pooling layer, a second global average pooling layer, a second unfolding layer, a second normalization layer, and a second linear layer;

[0080] The second global max pooling layer takes the third feature map as input and outputs the third intermediate feature map.

[0081] The second global average pooling layer takes the third feature map as input and outputs the fourth intermediate feature map.

[0082] The number of channels in the third and fourth intermediate feature maps is 1, and the width and height of the third and fourth intermediate feature maps are the same as those of the third feature map.

[0083] The second unfolding layer is used to unfold the third and fourth intermediate feature maps into vector representations;

[0084] The second normalization layer is used to normalize the vector representations of the third and fourth intermediate feature maps.

[0085] The second linear layer is used to transpose the vector representation of the third intermediate feature map, multiply it by the vector representation of the fourth intermediate feature map, and then perform a sigmoid activation function to calculate the spatial attention weight parameters, which are then multiplied by the third feature map to obtain the fifth feature map.

[0086] For example, if the size of the third feature map is c×h×w, global max pooling and global average pooling are performed to obtain the third and fourth intermediate feature maps, each with 1 channel and a size of h×w. The third and fourth intermediate feature maps are then unfolded into vector representations of size 1×(h×w). After these vector representations are normalized, the vector representation of the third intermediate feature map is transposed to (h×w)×1 and multiplied by the vector representation of the fourth intermediate feature map, of size 1×(h×w), to obtain the (h×w)×(h×w) spatial attention weight parameters.
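The spatial counterpart can be sketched the same way. As before, L2 normalization is an assumption, and the helper `weight_matrix_entries` is a hypothetical name added only to show how quickly the (h×w)×(h×w) weight matrix grows.

```python
import math


def spatial_attention_weights(feature):
    """feature: c channels of h x w. Returns the (h*w) x (h*w) weight matrix."""
    c = len(feature)
    flat = [[v for row in ch for v in row] for ch in feature]            # c x (h*w)
    n = len(flat[0])
    # Pool across channels at every spatial position.
    max_vec = [max(flat[k][j] for k in range(c)) for j in range(n)]      # third intermediate map
    avg_vec = [sum(flat[k][j] for k in range(c)) / c for j in range(n)]  # fourth intermediate map

    def normalize(vec):
        # Assumed L2 normalization; the text does not name the scheme.
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    max_vec, avg_vec = normalize(max_vec), normalize(avg_vec)
    # Outer product followed by a sigmoid.
    return [[1.0 / (1.0 + math.exp(-m * a)) for a in avg_vec] for m in max_vec]


def weight_matrix_entries(h, w):
    """Number of entries in the (h*w) x (h*w) weight matrix."""
    return (h * w) ** 2


feature = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 3.0], [2.0, 1.0]]]
weights = spatial_attention_weights(feature)
entries_64 = weight_matrix_entries(64, 64)
```

For a 2×2 map the weight matrix is 4×4. For a 64×64 map it already holds 16,777,216 entries, so the cost grows with the square of the number of pixels.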

[0087] In one embodiment of the present invention, the cross-entropy loss is specified as the loss function of the water surface region semantic segmentation model.
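For a two-class water versus background segmentation, cross-entropy reduces to the per-pixel binary form sketched below. The function name, the averaging over pixels, and the clamping constant `eps` are assumptions for illustration; the text only names the loss.

```python
import math


def pixel_cross_entropy(predicted, target, eps=1e-7):
    """Mean binary cross-entropy between predicted water probabilities and 0/1 labels."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(predicted, target):
        for p, y in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


labels = [[1, 0]]  # first pixel is water, second is background
good = pixel_cross_entropy([[0.9, 0.1]], labels)
bad = pixel_cross_entropy([[0.1, 0.9]], labels)
```

A prediction that agrees with the labels (`good`) yields a much smaller loss than one that contradicts them (`bad`).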

[0088] In one embodiment of the present invention, the water surface algae identification model includes: an anchor box module, a transfer connection module (TCB), and an object detection module (ODM). The anchor box module consists of a VGG16 network and convolutional layers and performs feature extraction, anchor generation, anchor refinement, and negative anchor filtering. It first determines whether a target object is present in the background-removed water surface image. If so, it generates anchor boxes, performs regression prediction to adjust their position and size, and discards erroneous anchor boxes. The transfer connection module consists of three convolutional layers and three ReLU activation layers. It deconvolves the features of the next layer to obtain a higher resolution, fuses high-level and low-level features, and enriches the semantic information of the low-level features. The object detection module includes a classification layer and a regression layer, composed mainly of 3×3 convolutional layers. The prediction layer outputs the category and coordinate information of the refined anchors, and the final predicted bounding boxes are selected by non-maximum suppression.
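The final box selection step, non-maximum suppression, can be sketched as follows. This is the common greedy formulation; the 0.5 overlap threshold and the (x1, y1, x2, y2) box layout are assumptions, as the text does not specify them.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

The second box overlaps the first with an IoU of about 0.68 and is suppressed, while the distant third box is kept.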

[0089] In one embodiment of the present invention, the loss function calculation formula for the water surface algae identification model includes:

[0090]

[0091]

[0092] Where $N_a$ and $N_o$ are the numbers of positive-sample anchor boxes in the anchor box module and the object detection module, respectively; $p_i$ is the confidence of predicted anchor box $i$; $x_i$ is the coordinates of predicted box $i$ after refinement by the anchor box module; $c_i$ is the predicted object category of the target; and $t_i$ is the predicted bounding box coordinates of the target. The two remaining terms are the true category label of anchor box $i$ and the true position and size of anchor box $i$.
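The formula itself is not reproduced in this text. As a sketch only, a two-module loss of the kind commonly used by detectors with an anchor refinement stage followed by a detection stage would fit the definitions above. The symbols $l_i^*$ (true category label), $g_i^*$ (true position and size), $L_b$ (binary classification loss), $L_m$ (multi-class classification loss), and $L_r$ (box regression loss) are assumed names, and the exact form in the patent may differ.

```latex
L\big(\{p_i\},\{x_i\},\{c_i\},\{t_i\}\big)
  = \frac{1}{N_a}\Big(\sum_i L_b\big(p_i,\,[l_i^* \ge 1]\big)
      + \sum_i [l_i^* \ge 1]\, L_r\big(x_i,\, g_i^*\big)\Big)
  + \frac{1}{N_o}\Big(\sum_i L_m\big(c_i,\, l_i^*\big)
      + \sum_i [l_i^* \ge 1]\, L_r\big(t_i,\, g_i^*\big)\Big)
```

Here $[l_i^* \ge 1]$ equals 1 when anchor box $i$ is a positive sample and 0 otherwise, so the regression terms are counted only for positive anchors.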

[0093] The embodiments have been described above. However, the invention is not limited to the specific implementations described, which are merely illustrative and not restrictive. Those skilled in the art may derive many other forms under the guidance of these embodiments, all of which fall within their scope of protection.

Claims

1. A method for identifying algae on the water surface based on UAV images, characterized in that it includes the following steps:
Step S101: Collect water surface images using a drone and construct a water area image dataset;
Step S102: Construct a semantic segmentation model for the water surface region and train the semantic segmentation model for the water surface region based on the water area image dataset;
Step S103: Input the water surface images from the water area image dataset into the trained water surface region semantic segmentation model, output the water surface images with the background removed, and construct the water surface algae dataset;
Step S104: Construct a water surface algae identification model and train the water surface algae identification model based on the water surface algae dataset;
Step S105: Input the background-removed water surface image into the trained water surface algae recognition model and output the water surface algae detection results;
The water surface algae detection results include: the type of water surface algae and the anchor box;
The semantic segmentation model for water surface regions includes: an encoder, an attention module, and a decoder;
The encoder includes: 5 convolutional blocks, 5 first upsampling layers, and 1 first feature fusion layer;
Each convolutional block includes: a first convolutional layer, a rectified linear unit (ReLU) layer, and a pooling layer, wherein the kernel size and stride of the first convolutional layer are user-defined parameters;
The water surface image is input into the five convolutional blocks, which output five first feature maps, each with a different size;
The five first upsampling layers take the five first feature maps as input, one each, and output five second feature maps, all of the same size;
The first feature fusion layer takes the five second feature maps as input, extracts the maximum value of each pixel across the five second feature maps, and outputs one third feature map, which is the same size as the second feature maps;
The attention module includes: a channel attention module and a spatial attention module;
The channel attention module takes the third feature map as input and outputs the fourth feature map;
The spatial attention module takes the third feature map as input and outputs the fifth feature map;
The decoder consists of one second convolutional layer and one second upsampling layer;
The second feature fusion layer takes the third, fourth, and fifth feature maps as input, merges them by weighting, and upsamples the result to generate the sixth feature map; the sixth feature map is then binarized so that all background values are set to 0, and the resulting background-removed water surface image is output;
The channel attention module includes: a first global max pooling layer, a first global average pooling layer, a first unfolding layer, a first normalization layer, and a first linear layer;
The first linear layer is used to transpose the vector representation of the first intermediate feature map, multiply it by the vector representation of the second intermediate feature map, and apply a sigmoid activation function to calculate the channel attention weight parameters, which are then multiplied by the third feature map to obtain the fourth feature map;
The spatial attention module includes: a second global max pooling layer, a second global average pooling layer, a second unfolding layer, a second normalization layer, and a second linear layer;
The second linear layer is used to transpose the vector representation of the third intermediate feature map, multiply it by the vector representation of the fourth intermediate feature map, and apply a sigmoid activation function to calculate the spatial attention weight parameters, which are then multiplied by the third feature map to obtain the fifth feature map;
The water surface algae identification model includes: an anchor box module, a transfer connection module, and an object detection module. The anchor box module consists of a VGG16 network and convolutional layers and performs feature extraction, anchor generation, anchor refinement, and negative anchor filtering. It first determines whether a target object is present in the background-removed water surface image. If so, it generates anchor boxes, performs regression prediction to adjust their position and size, and discards erroneous anchor boxes. The transfer connection module consists of three convolutional layers and three ReLU activation layers, which deconvolve the features of the next layer. The object detection module includes a classification layer and a regression layer, consisting of 3×3 convolutional layers. The prediction layer outputs the category and coordinate information of the refined anchors, and the final predicted bounding boxes are selected by non-maximum suppression.

2. The water surface algae identification method based on UAV images according to claim 1, characterized in that the water surface area of the water surface image is labeled using annotation software, and the algae in the background-removed water surface image are also labeled using annotation software.

3. The water surface algae identification method based on UAV images of claim 1, wherein: the first global max pooling layer takes the third feature map as input and outputs the first intermediate feature map; the first global average pooling layer takes the third feature map as input and outputs the second intermediate feature map; the width and height of the first and second intermediate feature maps are both 1, and the number of channels in the first and second intermediate feature maps is the same as the number of channels in the third feature map; the first unfolding layer is used to unfold the first intermediate feature map and the second intermediate feature map into vector representations; and the first normalization layer is used to normalize the vector representations of the first intermediate feature map and the second intermediate feature map.

4. The water surface algae identification method based on UAV images of claim 1, wherein: the second global max pooling layer takes the third feature map as input and outputs the third intermediate feature map; the second global average pooling layer takes the third feature map as input and outputs the fourth intermediate feature map; the number of channels in the third and fourth intermediate feature maps is 1, and the width and height of the third and fourth intermediate feature maps are the same as those of the third feature map; the second unfolding layer is used to unfold the third and fourth intermediate feature maps into vector representations; and the second normalization layer is used to normalize the vector representations of the third and fourth intermediate feature maps.

5. The water surface algae identification method based on UAV images of claim 1, wherein the cross-entropy loss is specified as the loss function for the semantic segmentation model of the water surface region.

6. The water surface algae identification method based on UAV images of claim 5, wherein the formula for calculating the loss function of the water surface algae identification model includes terms in which $N_a$ and $N_o$ are the numbers of positive-sample anchor boxes in the anchor box module and the object detection module, respectively; $p_i$ is the confidence of predicted anchor box $i$; $x_i$ is the coordinates of predicted box $i$ after refinement by the anchor box module; $c_i$ is the predicted object category; and $t_i$ is the predicted bounding box coordinates of the target; the two remaining terms are the true category label of anchor box $i$ and the true position and size of anchor box $i$.

Citation Information

Patent Citations

  • Blue algae recognition and analysis system based on unmanned plane remote sensing data

    CN107527037A

  • Water surface object detection and classification method based on visual saliency

    CN112417931A