A lotus phenotype identification method and device based on a pseudo-label algorithm and a MobileNetV2 network

By combining a pseudo-label algorithm with the MobileNetV2 network, the problem of lotus phenotypic recognition was solved, achieving efficient and low-cost lotus phenotypic recognition and improving the model's accuracy and generalization ability.

CN117953281B · Active · Publication Date: 2026-06-26 · NANJING AGRICULTURAL UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
NANJING AGRICULTURAL UNIVERSITY
Filing Date
2024-01-19
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

There is a lack of effective methods for lotus phenotypic recognition in the existing technology, especially since there are many types of lotus and the sub-types have high similarity, which makes recognition difficult.

Method used

We adopted a method based on a pseudo-labeling algorithm and the MobileNetV2 network. A network model is constructed in which the SE attention mechanism is applied and a pseudo-labeling algorithm pseudo-labels the unlabeled lotus data. Training with the Adam optimizer improves the accuracy and generalization ability of the model.

Benefits of technology

It achieves efficient identification of lotus phenotypes, reduces the cost of manual annotation, improves the accuracy and generalization ability of the model, and reduces the computational load.

Abstract

The application discloses a lotus phenotype identification method and device based on a pseudo-label algorithm and a MobileNetV2 network. The method comprises the following steps. Step one: a network model is constructed; the model uses the MobileNetV2 network as the feature extraction network for lotus identification, applies an SE attention mechanism to the feature processing unit of MobileNetV2, and uses a pseudo-label algorithm to pseudo-label unlabeled lotus data. Step two: the model is trained; the training set of a lotus data set is used to pre-train the model and initialize the MobileNetV2 feature extraction network, the pre-trained model is then used to predict the unlabeled data, the predictions with minimum entropy, i.e. the highest confidence, are selected to pseudo-label the lotus data, and finally the model is retrained on all the labeled data to obtain an optimal model. Step three: the trained model is used to identify lotus phenotypes. The application can improve the expressive and generalization capabilities of the model, reduce the amount of labeling required for a large data set, and adapt better to unbalanced and complex data distributions.

Description

Technical Field

[0001] This invention belongs to the fields of computers and smart agriculture, and specifically relates to a method and device for lotus phenotypic recognition based on a pseudo-label algorithm and MobileNetV2 network.

Background Technology

[0002] With the continuous development of deep learning technology, image classification methods based on convolutional neural networks (CNN) have gradually been applied to various phenotype recognition tasks. However, because there are many lotus categories and different subcategories are highly similar, recognition is difficult, and there is currently no product specifically designed for lotus phenotypic recognition.

[0003] Currently, there are image classification and recognition technologies based on convolutional neural networks on the market, but there has been no extensive research in the field of lotus phenotypic recognition. Therefore, there is an urgent need to propose a new method for lotus phenotypic recognition.

Summary of the Invention

[0004] The purpose of this invention is to provide a method and apparatus for lotus phenotype recognition based on a pseudo-label algorithm and MobileNetV2 network, which can train a lightweight model based on a lotus phenotype dataset and recognize different types of lotus phenotype images.

[0005] To achieve the above objectives, the technical solution adopted by the present invention is as follows:

[0006] A lotus phenotypic recognition method based on a pseudo-label algorithm and MobileNetV2 network includes the following steps:

[0007] Step 1: Construct a network model. This model selects the MobileNetV2 network as the feature extraction network for lotus flower recognition and applies the SE attention mechanism to the feature processing unit of MobileNetV2, so that the network can more comprehensively consider the complex relationships between features. At the same time, it uses a pseudo-label algorithm to pseudo-label the unlabeled lotus flower data, reducing the need for labeling the lotus flower dataset.

[0008] Step 2: Train the model. Use the training set in the lotus dataset to pre-train the model to initialize the MobileNetV2 feature extraction network. Then, use the pre-trained model to predict the unlabeled data. Select the minimum entropy, i.e. the highest confidence, to pseudo-label the lotus data. Finally, retrain all the labeled data to obtain the optimal model.

[0009] Step 3: Recognize the lotus phenotype using the trained model.

[0010] Furthermore, in step two, the Adam algorithm is used as the optimizer, and the pre-training process is as follows:

[0011] (1) Initialize the MobileNetV2 feature extraction network using the pre-trained model from the training set in the lotus dataset;

[0012] (2) Obtain a batch of preprocessed training samples from the dataset through the input pipeline;

[0013] (3) Input the batch training samples obtained in step (2) into the network model, pass through the feature extraction network fused with the SE module and the fully connected layer, and finally calculate the probability of each category through softmax;

[0014] (4) Calculate the loss value of the network model using the category cross-entropy loss function;

[0015] (5) Compute the gradients and, using the Adam optimizer with a set initial learning rate, backpropagate the error through the entire network and update the parameters of the fully connected layer;

[0016] (6) Determine whether the specified number of iterations has been reached. If yes, the network is considered to have converged, and proceed to step (7). Otherwise, re-enter step (2).

[0017] (7) Use the above pre-trained model to predict unlabeled data;

[0018] (8) Obtain a batch of preprocessed training samples from the dataset through the input pipeline;

[0019] (9) Input the batch training samples obtained in step (8) into the pre-trained network model, pass through the feature extraction network fused with the SE module and the fully connected layer, and finally calculate the probability of each category through softmax.

[0020] (10) Calculate the loss value of the network model using the category cross-entropy loss function;

[0021] (11) Compute the gradients and, using the Adam optimizer with a set initial learning rate, backpropagate the error through the entire network and update the parameters of each layer of the network.

[0022] (12) Determine whether the minimum entropy condition has been met. If it has been met and the number of iterations has not been reached, then label the data with pseudo-labels and proceed to step (8). If the number of iterations has been reached, then proceed to step (13).

[0023] (13) Calculate the accuracy, precision, recall and F1 score of the final network model using the test set.

[0024] Furthermore, the feature extraction network uses the SE attention module and seven different inverted residual bottleneck layers to perform attention-enhanced feature processing on the extracted preliminary feature maps, and then obtains a new feature space representation based on the final fully connected layer of the model.

[0025] Furthermore, the feature extraction network incorporates an inverted residual structure, which first uses a 1×1 convolution to increase dimensionality, then a 3×3 depthwise separable convolution for feature extraction, and finally a 1×1 convolution to reduce dimensionality.

[0026] Furthermore, the SE attention mechanism is placed before and after the inverted residual bottleneck layer. After preprocessing the input lotus data, the SE module adaptively learns the importance of each channel and adjusts the feature representation on the feature map by weighting, thereby enhancing the model's expressive ability in specific tasks. The SE module mainly recalibrates the previously obtained features through three operations: compression, activation, and reweighting.

[0027] Furthermore, the compression involves fusing the feature information of each channel using global average pooling, resulting in an output dimension of 1×1×C. Global average pooling directly compresses the W×H×C feature map containing global information into a single 1×1×C feature vector, where the channel features of the C feature maps are compressed into a single value; as shown in the following formula:

[0028] $z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i,j)$

[0029] In the formula, U represents the input feature map, Z represents the output feature vector, H represents the height of U, W represents the width of U, and $u_c(i,j)$ represents the pixel in the i-th row and j-th column of the feature map $u_c$;

[0030] The excitation involves using the compressed global features to predict the importance of each channel through a fully connected layer. First, a fully connected (FC) dimensionality reduction layer with a reduction ratio of r is used for channel compression. Then, the ReLU function is applied, followed by an FC upscaling layer to increase the dimensionality of the previously compressed channels. Finally, the output s is obtained through a sigmoid function, as shown in the following equation:

[0031] $s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2\,\delta(W_1 z))$

[0032] In the formula, δ is the ReLU function, σ is the sigmoid function, and W1 and W2 are the weights of the two fully connected layers.

[0033] The reweighting involves taking the weights output by the excitation as the importance of each feature channel after feature selection, and then multiplying them channel by channel onto the previous features, thus completing the recalibration of the original features in the channel dimension; the implementation method is as follows:

[0034] $\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$

[0035] where $\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_C]$ and $F_{scale}(u_c, s_c)$ denotes channel-wise multiplication between the scalar $s_c$ and the feature map $u_c \in \mathbb{R}^{H \times W}$.

[0036] Furthermore, the pseudo-label algorithm uses the labels of unlabeled samples as variables and leverages the labels of labeled samples to propagate information, thereby predicting the labels of unlabeled samples. It achieves semi-supervised learning by minimizing the entropy during the label propagation process. During label propagation, the labels of labeled samples are propagated to unlabeled samples and adjusted according to similarity metrics until convergence is achieved. By minimizing entropy, the accuracy of the label propagation algorithm in predicting the labels of unlabeled samples is improved.

[0037] Furthermore, in the pseudo-label algorithm, the weight of the pseudo-label part is adjusted using α(t), as shown in the following formula:

[0038] $L = \frac{1}{n} \sum_{m=1}^{n} \sum_{i} L(y_i^m, f_i^m) + \alpha(t)\,\frac{1}{n'} \sum_{m=1}^{n'} \sum_{i} L(y_i'^m, f_i'^m)$

[0039] where n is the number of labeled samples and n′ is the number of unlabeled samples; $y_i^m$ and $f_i^m$ are the true value and the class probability of the m-th labeled sample for class i, respectively; $y_i'^m$ and $f_i'^m$ are the pseudo-label and the output of the m-th unlabeled sample for class i, respectively; and α(t) is the coefficient that balances the true-label term and the pseudo-label term.

[0040] Furthermore, the specific steps of the pseudo-label algorithm are as follows: first, input the unlabeled lotus image into the network, and use a pre-trained model to predict it. At this point, the prediction can be obtained at the end of the network, as shown in the following formula:

[0041]

[0042] where $X_t$ is the input data and $E_s$ is the pre-trained model;

[0043] Next, the prediction results are divided into K sets according to their categories, as shown in the following formula:

[0044]

[0045] The sample information gain is estimated using the BALD method with MCDropout, as shown in the following equation:

[0046]

[0047] Assume category i has $n_i$ samples. The samples are sorted in descending order according to the sum of confidence and information gain, and the top β samples are taken as reliable samples, as shown in the following formulas:

[0048]

[0049]

[0050] Where |·| represents the number of elements, and [·] represents the floor operation;

[0051] After traversing K categories, the pseudo-labels of the samples with high confidence and high probability are obtained as shown in the following formula:

[0052]

[0053] Finally, this label information is added to the images to obtain a lotus dataset with pseudo-labels. As training progresses, the pseudo-labels that the model classifier generates for the target samples become more and more accurate. Under the same threshold β, the number of samples that pass the screening increases, so β is continuously increased with each training epoch to obtain more high-quality samples. The relationship between the threshold β and the training epoch is as follows:

[0054] β = min(β0 + α*epoch, 1)

[0055] Where β0 and α are constants, representing the initial threshold and the growth rate of the number of pseudo-labels, respectively.

[0056] A lotus phenotypic recognition device based on a pseudo-label algorithm and a MobileNetV2 network, comprising:

[0057] The lotus image processing module is responsible for performing preliminary processing on the input lotus image to extract basic features from the image;

[0058] The feature processing module is used to further process the input feature map to improve its expressive power. This module uses the SE attention module and 7 different inverted residual bottleneck layers to perform attention-enhanced feature processing on the extracted preliminary feature map, and then obtains a new feature space representation based on the final fully connected layer of the model.

[0059] The classification processing module consists of three parts: a global average pooling layer, a dropout layer, and a fully connected layer. It is used to perform linear transformation on the feature map output by the feature processing module to obtain the output, thereby identifying and classifying lotus species.

[0060] Compared with the prior art, the beneficial effects of the present invention are as follows:

[0061] (1) Embedding an attention mechanism in the lightweight network helps the network learn the relationships between features, which to some extent compensates for the information loss caused by the simplified network design, thereby improving the accuracy and generalization ability of the network.

[0062] (2) The high-performance MobileNetV2 feature extraction network is used to extract features from lotus images, which greatly reduces the number of parameters and computational load.

[0063] (3) A pseudo-labeling strategy based on entropy minimization is used to pseudo-label the unlabeled lotus data, which reduces the cost of manual labeling and improves the generalization ability and performance of the model.

Description of the Drawings

[0064] Figure 1 shows the lotus phenotype recognition framework based on the pseudo-label algorithm and MobileNetV2.

[0065] Figure 2 is the preprocessing flowchart.

[0066] Figure 3 shows the difference between an ordinary residual structure and an inverted residual structure.

[0067] Figure 4 is the structure diagram of the SE module.

[0068] Figure 5 is a diagram of the pseudo-label training process.

Detailed Implementation

[0069] The present invention will now be described in detail with reference to the accompanying drawings and specific embodiments.

[0070] I. Network Model

[0071] This patent selects the MobileNetV2 network as the feature extraction network for lotus recognition and applies the SE (Squeeze-and-Excitation) attention mechanism, namely the SE block, to the feature processing unit of MobileNetV2. This allows the network to consider the complex relationships between features more comprehensively and improves the expressive power of the features. A semi-supervised learning method is used to pseudo-label unlabeled lotus data, which reduces the labeling required for the lotus dataset and thereby improves model performance. Figure 1 shows the structure of the lotus phenotypic analysis framework based on the pseudo-label algorithm and MobileNetV2 constructed in this patent. After an image is input into the network model, it first undergoes preprocessing, including centering, normalization, random cropping, and random horizontal flipping. The feature vectors of the image are then extracted by the SE module and the MobileNetV2 feature extraction network. Finally, a fully connected layer maps the global feature vectors to a new feature space to fully represent the information of each category. The trained model is then used to predict the unlabeled data, and the lotus data with minimum entropy, i.e., the highest confidence, are pseudo-labeled. Finally, the model is retrained on all labeled data to obtain the optimal model. This invention involves 94 lotus categories, including Red Lotus, Golden Autumn, Emerald Cloud, Jade Robe, Embroidered Uniform Guard, Thousand-Petal Lotus, Moling Autumn Colors, and Golden Butterfly.

[0072] 1. Lotus Image Processing Module

[0073] The image processing module is primarily responsible for the initial processing of the input lotus images to extract basic features. The lotus images are resized to 224×224 so that all images have the same size. Standardization is used so that lotus images of different varieties have similar scales and value ranges: the mean is subtracted from the pixel values of the lotus images and the result is divided by the standard deviation, which removes differences in brightness and contrast between images and makes the model more sensitive to image features. Image augmentation is performed through random cropping, rotation, flipping, and adjustments to brightness and contrast, which helps the trained model generalize better and improves its robustness. The preprocessing flow is shown in Figure 2.
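
As a minimal illustrative sketch of the standardization step, the following Python snippet subtracts the mean and divides by the standard deviation. It assumes the statistics are computed per image over a flat list of pixel values; the patent does not state whether per-image or dataset-wide statistics are used, and the function name `standardize` is hypothetical.

```python
import math

def standardize(pixels):
    """Return (x - mean) / std for a flat list of pixel values.

    Assumption: mean and std are computed from this one image.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in pixels) / n)
    if std == 0.0:
        # A constant image carries no contrast information.
        return [0.0 for _ in pixels]
    return [(x - mean) / std for x in pixels]

# Two images with the same structure but different brightness
dark = [10, 20, 30, 40]
bright = [110, 120, 130, 140]
print(standardize(dark))
print(standardize(bright))
```

Both calls print the same values, which illustrates how standardization removes a brightness offset between images.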

[0074] 2. Feature Extraction Network

[0075] The feature processing module is primarily responsible for further processing the input feature maps to enhance their expressive power. This module mainly utilizes the SE attention module and seven different inverted residual bottleneck layers to perform attention-enhanced feature processing on the extracted preliminary feature maps, and then obtains a new feature space representation based on the final fully connected layer of the model. The main function of this part is to change the number of channels in the feature maps and introduce non-linear transformations. Since each channel is independently convolved with other channels, the convolutional layers can capture the interrelationships between channels, introducing stronger feature expressive power.

[0076] 2.1 Inverted Residual Bottleneck Layer

[0077] Unlike a typical residual structure, which first reduces dimensionality and then increases it, the residual structure of this model works in reverse: it first increases dimensionality and then reduces it. Because depthwise convolution cannot change the number of channels, the number of output channels equals the number of input channels. If the input has few channels, the depthwise convolution can only operate in a low-dimensional space, which works poorly; channel expansion is therefore necessary. MobileNetV2 expands the channels first, then performs convolution and dimensionality reduction. As shown in Figure 3, the inverted residual structure first uses a 1×1 convolution to increase dimensionality, then a 3×3 depthwise separable convolution for feature extraction, and finally a 1×1 convolution for dimensionality reduction.
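
The expand, depthwise, project sequence can be made concrete with a small weight count. The sketch below is illustrative only: it counts convolution weights without biases or batch normalization, and the function names and the example channel numbers (24 channels, expansion factor 6) are assumptions chosen for the arithmetic, not values taken from the patent.

```python
def inverted_residual_params(c_in, c_out, t, k=3):
    """Count conv weights in one inverted residual block (no bias, no BN)."""
    hidden = c_in * t                   # channels after the 1x1 expansion
    expand = c_in * hidden              # 1x1 convolution, dimensionality increase
    depthwise = hidden * k * k          # one k x k filter per channel
    project = hidden * c_out            # 1x1 convolution, dimensionality reduction
    return {
        "hidden": hidden,
        "expand": expand,
        "depthwise": depthwise,
        "project": project,
        "total": expand + depthwise + project,
    }

def standard_conv_params(c_in, c_out, k=3):
    """Count weights of an ordinary k x k convolution for comparison."""
    return c_in * c_out * k * k

block = inverted_residual_params(c_in=24, c_out=24, t=6)
print(block)
print(standard_conv_params(block["hidden"], block["hidden"]))
```

With these example numbers the depthwise convolution works on 144 channels with 1,296 weights, whereas an ordinary 3×3 convolution on the same 144 channels would need 186,624 weights.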

[0078] 2.2 SE Attention Module

[0079] In the feature processing module of this model, the SE attention mechanism is placed before and after the inverted residual bottleneck layer. After preprocessing the input lotus data, the SE module adaptively learns the importance of each channel and adjusts the feature representation on the feature map by weighting, thereby enhancing the model's expressive ability in specific tasks. The SE module mainly recalibrates the previously obtained features through three operations: compression (Squeeze), excitation (Excitation), and reweighting (Scale). Figure 4 shows the structure of the SE module.

[0080] Compression transforms the two-dimensional feature map of each channel into a single real number. It can be understood as fusing the feature information of each channel, and it is implemented with global average pooling, giving an output of dimension 1×1×C. In the figure, $F_{sq}$ denotes this global average pooling, which directly compresses the W×H×C feature map containing global information into a 1×1×C feature vector. The feature map of each of the C channels is compressed into a single value, so the resulting channel-level statistics Z contain contextual information, which alleviates the channel dependency problem, as shown in formula (1).

[0081] $z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i,j)$ (1)

[0082] In the formula, U represents the input feature map, Z represents the output feature vector, H represents the height of U, W represents the width of U, and $u_c(i,j)$ represents the pixel in the i-th row and j-th column of the feature map $u_c$.

[0083] Excitation refers to predicting the importance of each channel by passing the compressed global features through fully connected layers. To reduce the number of channels and thus the amount of computation, the channels are first compressed by an FC dimensionality reduction layer with a reduction ratio of r, which improves the computational efficiency of the model. The ReLU function is then applied, after which an FC dimensionality increase layer restores the previously compressed channels to their original dimensionality. Finally, the output s is obtained by applying the sigmoid function, as shown in Equation (2).

[0084] $s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2\,\delta(W_1 z))$ (2)

[0085] In the formula, δ is the ReLU function, σ is the sigmoid function, and W1 and W2 are the weights of the two fully connected layers.

[0086] Reweighting refers to treating the weights output by the excitation as the importance of each feature channel after feature selection, and then multiplying them channel by channel onto the previous features, thus completing the recalibration of the original features in the channel dimension. The implementation method is shown in formula (3).

[0087] $\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$ (3)

[0088] where $\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_C]$ and $F_{scale}(u_c, s_c)$ denotes channel-wise multiplication between the scalar $s_c$ and the feature map $u_c \in \mathbb{R}^{H \times W}$.
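
The three SE operations described above can be sketched in plain Python as follows. This is a minimal illustration rather than the patented implementation: the feature map is a nested list of C channels, bias terms are omitted, and the weight matrices `w1` and `w2` in the example are made-up values for a reduction ratio of r = 2.

```python
import math

def squeeze(u):
    """Global average pooling: one mean value per channel (C x H x W -> C)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in u]

def matvec(w, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]

def excite(z, w1, w2):
    """FC reduction -> ReLU -> FC expansion -> sigmoid (biases omitted)."""
    hidden = [max(0.0, v) for v in matvec(w1, z)]
    return [1.0 / (1.0 + math.exp(-v)) for v in matvec(w2, hidden)]

def scale(u, s):
    """Multiply every pixel of channel c by its weight s[c]."""
    return [[[s_c * px for px in row] for row in ch] for ch, s_c in zip(u, s)]

def se_block(u, w1, w2):
    return scale(u, excite(squeeze(u), w1, w2))

# Example: C = 2 channels of size 2 x 2, reduced to C / r = 1 hidden unit
u = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 4.0]]]
w1 = [[0.5, -1.0]]      # shape (C / r) x C, hypothetical values
w2 = [[0.0], [2.0]]     # shape C x (C / r), hypothetical values
print(squeeze(u))       # channel means
print(se_block(u, w1, w2))
```

In this example the first channel receives weight 0.5 and the second a weight of about 0.88, so the block rescales each channel without changing the spatial layout.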

[0089] 3. Classification and processing module

[0090] The classification processing module consists of three parts: a global average pooling layer, a dropout layer, and a fully connected linear layer. It is responsible for performing linear transformations on the feature maps output by the feature processing module to obtain the output, thereby identifying and classifying lotus species.

[0091] Table 1 shows the final network structure parameters of the MobileNetV2-SE model, where t is the expansion factor, i.e. the factor by which the 1×1 convolution in the inverted residual structure increases the number of channels; c is the depth (number of channels) of the output feature matrix; n is the number of times the bottleneck is repeated; and s is the stride, which applies only to the depthwise (DW) convolution in the first bottleneck of each group, with s equal to 1 for the subsequent repeated bottlenecks.

[0092] Table 1. Network parameters of the MobileNetV2-SE lotus classification model

[0093]
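
To illustrate how the (t, c, n, s) parameters are read, the sketch below expands a configuration into individual bottleneck layers, applying the stride only to the first bottleneck of each group. The rows in `CONFIG` are the standard MobileNetV2 configuration and are an assumption here; the values in Table 1 of this patent may differ. The starting point of 32 channels at 112×112 likewise assumes the standard stride-2 stem convolution on a 224×224 input.

```python
# (t, c, n, s) rows: standard MobileNetV2 values, assumed for illustration
CONFIG = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]

def expand_config(config, c_in=32, size=112):
    """List every bottleneck as (c_in, c_out, t, stride, output_size)."""
    layers = []
    for t, c, n, s in config:
        for i in range(n):
            stride = s if i == 0 else 1   # s applies to the first repeat only
            size = size // stride
            layers.append((c_in, c, t, stride, size))
            c_in = c
    return layers

layers = expand_config(CONFIG)
for layer in layers:
    print(layer)
print(len(layers), "bottleneck layers in", len(CONFIG), "groups")
```

Under these assumed values the seven groups expand to 17 bottleneck layers and the spatial size falls from 112 to 7.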

[0094] II. Pseudo-label algorithm

[0095] Pseudo-labeling algorithms are commonly used in projects with large datasets to assign class labels. This algorithm uses the labels of unlabeled samples as variables and leverages the labels of labeled samples to propagate information, thereby predicting the labels of unlabeled samples. It achieves semi-supervised learning by minimizing the entropy during label propagation. During label propagation, the labels of labeled samples are propagated to unlabeled samples and adjusted according to similarity metrics until convergence is achieved. By minimizing entropy, the label propagation algorithm can improve the accuracy of label prediction for unlabeled samples. Since the number of pseudo-labeled and labeled samples may be imbalanced, this project addresses this imbalance by designing an overall loss function. The weight of the pseudo-label component is adjusted using α(t), as shown in Equation (4).

[0096] $L = \frac{1}{n} \sum_{m=1}^{n} \sum_{i} L(y_i^m, f_i^m) + \alpha(t)\,\frac{1}{n'} \sum_{m=1}^{n'} \sum_{i} L(y_i'^m, f_i'^m)$ (4)

[0097] where n is the number of labeled samples and n′ is the number of unlabeled samples; $y_i^m$ and $f_i^m$ are the true value and the class probability of the m-th labeled sample for class i, respectively; $y_i'^m$ and $f_i'^m$ are the pseudo-label and the output of the m-th unlabeled sample for class i, respectively; and α(t) is the coefficient that balances the true-label term and the pseudo-label term.
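
A minimal sketch of the weighted overall loss follows: the mean loss over labeled samples plus α(t) times the mean loss over pseudo-labeled samples. It assumes cross-entropy as the per-sample loss, consistent with the category cross-entropy loss used in training, and takes α(t) as a plain number because the patent does not give its schedule. The function names are hypothetical.

```python
import math

def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))

def total_loss(labeled, pseudo, alpha_t):
    """labeled / pseudo: lists of (one_hot_target, predicted_probs) pairs."""
    supervised = sum(cross_entropy(y, f) for y, f in labeled) / len(labeled)
    if not pseudo:
        return supervised
    unsupervised = sum(cross_entropy(y, f) for y, f in pseudo) / len(pseudo)
    return supervised + alpha_t * unsupervised

labeled = [([1, 0], [0.5, 0.5])]
pseudo = [([0, 1], [0.5, 0.5])]
print(total_loss(labeled, pseudo, alpha_t=0.0))   # labeled term only
print(total_loss(labeled, pseudo, alpha_t=0.5))   # pseudo-label term at half weight
```

Setting α(t) to 0 reduces the loss to ordinary supervised training, and larger values give the pseudo-labeled samples more influence.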

[0098] The main steps of the pseudo-label algorithm in this model are as follows. First, train the MobileNetV2-SE network using labeled lotus data to obtain a pre-trained model. Then, use the pre-trained model to make predictions for the unlabeled lotus samples, select the high-confidence predictions as pseudo-labels, and add them to the training set to retrain the model. If the iteration condition is not met, replace the pre-trained model with the retrained model and repeat the above process until the model performance no longer improves. The process is shown in Figure 5.
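
The selection of high-confidence predictions can be sketched with an entropy test on the softmax output: a low entropy corresponds to a confident prediction. The snippet is a simplified illustration; the entropy cut-off `max_entropy=0.3`, the sample identifiers, and the function names are hypothetical and not taken from the patent.

```python
import math

def entropy(probs, eps=1e-12):
    """Shannon entropy of a probability vector (natural logarithm)."""
    return -sum(p * math.log(p + eps) for p in probs)

def pseudo_label(predictions, max_entropy):
    """Return {sample_id: class_index} for low-entropy predictions only."""
    labels = {}
    for sample_id, probs in predictions.items():
        if entropy(probs) <= max_entropy:
            labels[sample_id] = max(range(len(probs)), key=probs.__getitem__)
    return labels

predictions = {
    "img_a": [0.97, 0.02, 0.01],   # confident prediction, low entropy
    "img_b": [0.40, 0.35, 0.25],   # uncertain prediction, high entropy
}
print(pseudo_label(predictions, max_entropy=0.3))
```

Only `img_a` receives a pseudo-label here; `img_b` stays unlabeled until a later round in which the retrained model may be more confident.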

[0099] Since high-confidence pseudo-labels are not necessarily correct, and the quality of pseudo-labels is closely related to the calibration error of the model, this study chooses to use high-confidence, low-information-gain predicted labels as pseudo-labels. The specific steps are as follows: first, input the unlabeled lotus image into the network, and then use the pre-trained model to predict it. At this point, the prediction can be obtained at the end of the network, as shown in formula (5):

[0100]

[0101] where $X_t$ is the input data and $E_s$ is the pre-trained model.

[0102] Next, the prediction results are divided into K sets according to their categories, as shown in formula (6):

[0103]

[0104] The sample information gain is estimated using the BALD method with MCDropout, as shown in formula (7):

[0105]
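
A minimal sketch of a BALD estimate with MC Dropout follows, assuming the usual mutual-information form: the entropy of the mean prediction minus the mean entropy of the individual predictions over T stochastic forward passes. The exact expression used in formula (7) may differ, and `bald_score` is a hypothetical name; the softmax vectors are supplied directly instead of being produced by a network with dropout enabled.

```python
import math

def entropy(probs, eps=1e-12):
    """Shannon entropy of a probability vector (natural logarithm)."""
    return -sum(p * math.log(p + eps) for p in probs)

def bald_score(mc_probs):
    """mc_probs: T softmax vectors from T forward passes with dropout active."""
    t = len(mc_probs)
    k = len(mc_probs[0])
    mean_probs = [sum(p[j] for p in mc_probs) / t for j in range(k)]
    predictive_entropy = entropy(mean_probs)                   # total uncertainty
    expected_entropy = sum(entropy(p) for p in mc_probs) / t   # per-pass uncertainty
    return predictive_entropy - expected_entropy

# Passes that agree give a score near zero
print(bald_score([[0.9, 0.1], [0.9, 0.1]]))
# Passes that disagree confidently give a high score
print(bald_score([[1.0, 0.0], [0.0, 1.0]]))
```

A score near zero means the dropout passes agree, while a high score means the model itself is uncertain about the sample.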

[0106] Assume category i has $n_i$ samples. The samples are sorted in descending order according to the sum of confidence and information gain, and the top β samples are taken as reliable samples, as shown in formulas (8) and (9):

[0107]

[0108]

[0109] Where |·| represents the number of elements, and [·] represents the floor operation.
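
The per-category selection can be sketched as follows, under the assumption that β is the fraction of each category to keep, so that ⌊β · n_i⌋ samples are retained for a category with n_i samples. That reading is inferred from the floor operation and from β being capped at 1; it is not stated explicitly in the text. The score of each sample is taken as given (the sum of confidence and information gain), and all names are hypothetical.

```python
import math

def select_reliable(samples, beta):
    """samples: list of (sample_id, predicted_class, score).

    Keeps the top floor(beta * n_i) samples of each class, ranked by score.
    """
    by_class = {}
    for sample_id, cls, score in samples:
        by_class.setdefault(cls, []).append((score, sample_id))
    selected = {}
    for cls, items in by_class.items():
        items.sort(key=lambda item: item[0], reverse=True)   # highest score first
        keep = math.floor(beta * len(items))
        for _score, sample_id in items[:keep]:
            selected[sample_id] = cls
    return selected

samples = [
    ("a", 0, 1.2), ("b", 0, 0.9), ("c", 0, 0.4), ("d", 0, 0.2),
    ("e", 1, 1.1), ("f", 1, 0.3),
]
print(select_reliable(samples, beta=0.5))
```

Selecting within each category, rather than over the whole pool, keeps categories with generally lower scores from being left out of the pseudo-labeled set.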

[0110] After traversing K categories, the pseudo-labels of the samples with high confidence and high probability are obtained as shown in formula (10):

[0111]

[0112] Finally, this label information is added to the images to obtain the lotus dataset with pseudo-labels. As training progresses, the pseudo-labels that the model classifier generates for the target samples become more and more accurate. Under the same threshold β, the number of samples that pass the screening increases, so β needs to be continuously increased as training progresses to obtain more high-quality samples. The relationship between the threshold β and the training epoch is shown in formula (11).

[0113] β=min(β0+α*epoch,1) (11)

[0114] Where β0 and α are constants, representing the initial threshold and the growth rate of the number of pseudo-labels, respectively.
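
The threshold schedule β = min(β0 + α·epoch, 1) is simple enough to write out directly. In the sketch below the constants β0 = 0.2 and α = 0.05 are hypothetical example values, since the patent does not state the ones it used.

```python
def beta_schedule(epoch, beta0, alpha):
    """Threshold for a given epoch: grows linearly and is capped at 1."""
    return min(beta0 + alpha * epoch, 1.0)

# Hypothetical constants, for illustration only
for epoch in (0, 4, 10, 20):
    print(epoch, beta_schedule(epoch, beta0=0.2, alpha=0.05))
```

With these example constants the threshold reaches its cap of 1 at epoch 16 and stays there for the remaining epochs.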

[0115] When using a pre-trained model for pseudo-label training, the choice of optimizer is also crucial. In the fine-grained lotus phenotypic recognition model based on pseudo-label algorithms and MobileNetV2, the Adaptive Moment Estimation (Adam) algorithm is used as the optimizer instead of other adaptive learning rate algorithms. This is because the Adam optimizer can adaptively adjust the learning rate for each parameter, calculating an appropriate learning rate based on the historical gradient of each parameter. This allows the Adam optimizer to better adapt to different gradient conditions for different parameters, thereby improving training performance. The overall training process is as follows:

[0116] (1) Construct a lotus phenotype recognition model based on pseudo-label algorithm and SE module, which includes MobileNetV2 as feature extraction network.

[0117] (2) Initialize the MobileNetV2 feature extraction network using the pre-trained model from the training set in the Lotus dataset.

[0118] (3) Obtain a batch of 16 preprocessed training samples from the dataset through the input pipeline. The image size is 224×224.

[0119] (4) Input the batch training samples obtained in (3) into the network model, pass through the feature extraction network fused with the SE module and the fully connected layer, and finally calculate the probability of each category through softmax.

[0120] (5) Use the category cross-entropy loss function to calculate the loss value of the network model.

[0121] (6) Compute the gradients and, using the Adam optimizer with an initial learning rate of 0.0009, backpropagate the error through the entire network to update the parameters of the fully connected layer.

[0122] (7) Determine whether the specified number of iterations of 100 has been reached. If yes, the network is considered to have converged, and proceed to step (8). Otherwise, re-enter step (3).

[0123] (8) Use the pre-trained model described above to make predictions on unlabeled data.

[0124] (9) Obtain a batch of 16 preprocessed training samples from the dataset through the input pipeline. The image size is 224×224.

[0125] (10) Input the batch training samples obtained in (9) into the pre-trained network model, pass through the feature extraction network fused with the SE module and the fully connected layer, and finally calculate the probability of each category through softmax.

[0126] (11) Use the category cross-entropy loss function to calculate the loss value of the network model.

[0127] (12) Compute the gradients and, using the Adam optimizer with an initial learning rate of 0.0009, backpropagate the error through the entire network to update the parameters of each layer of the network.

[0128] (13) Determine whether the minimum entropy condition has been reached. If it has and the iteration has not reached 100, then label the data with pseudo-labels and proceed to step (9). If the iteration count reaches 100, then proceed to step (14).

[0129] (14) Calculate the accuracy, precision, recall and F1 score of the final network model using the test set.
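
The evaluation in step (14) can be sketched as below. The snippet computes accuracy together with macro-averaged precision, recall, and F1 score, where macro averaging means an unweighted mean over categories. The labels in the example are made up, and `macro_metrics` is a hypothetical helper, not part of the patent.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 score."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Made-up labels for three categories
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(macro_metrics(y_true, y_pred))
```

Macro averaging gives every lotus category equal weight, which matters when some categories have far fewer test images than others.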

[0130] III. Environment Setup

[0131] The hardware and software parameters of the experimental environment are shown in Table 2.

[0132] Table 2 Experiment environment software and hardware parameters

[0133]

[0134] Finally, this invention compares the accuracy, precision, recall, and F1 score of the MobileNetV2 model without the SE module and pseudo-label algorithm, the MobileNetV2-2SE model with only the SE module, the MobileNetV2-PL model with only the pseudo-label algorithm, and the MobileNetV2-2SE-PL model with both the SE module and pseudo-label algorithm on the Lotus Fine-Grained Phenotyping Dataset, as shown in Table 3.

[0135] Table 3 Results

[0136]

[0137] As can be seen from the table, the fine-grained lotus phenotypic recognition model using MobileNetV2-2SE-PL proposed in this invention has the best performance, achieving an accuracy of 0.98, a macro precision of 0.98, a macro recall of 0.98, and a macro F1 score of 0.98. Moreover, it is about 2% to 8% higher than other methods in all indicators.

[0138] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the above embodiments do not limit the scope of protection of the present invention in any way, and all technical solutions obtained by equivalent substitution or other means fall within the scope of protection of the present invention. Parts not covered in this invention are the same as or can be implemented using existing technology.

Claims

1. A lotus phenotypic recognition method based on a pseudo-label algorithm and a MobileNetV2 network, characterized in that it includes the following steps:
Step 1: Construct a network model. The model selects the MobileNetV2 network as the feature extraction network for lotus recognition and applies the SE attention mechanism to the feature processing unit of MobileNetV2, so that the network considers the complex relationships between features more comprehensively. At the same time, a pseudo-label algorithm is used to pseudo-label the unlabeled lotus data, reducing the amount of labeling the lotus dataset requires. The feature extraction network uses the SE attention module and 7 different inverted residual bottleneck layers to perform attention-enhanced feature processing on the extracted preliminary feature maps, and then obtains a new feature space representation from the final fully connected layer of the model. In the pseudo-label algorithm, the weight of the pseudo-label portion is adjusted using [symbol missing], as shown in the following formula: [formula missing], where n is the number of labeled samples; [symbol missing] is the number of unlabeled samples; [symbol missing] and [symbol missing] are the true value and class probability of a labeled sample; [symbol missing] and [symbol missing] are the pseudo-label and output class of an unlabeled sample; and [symbol missing] is the coefficient between real labels and pseudo-labels.
The specific steps of the pseudo-label algorithm are as follows. First, input the unlabeled lotus image into the network and predict it with the pre-trained model; the prediction is obtained at the end of the network, as shown in the following formula: [formula missing], where [symbol missing] is the input data and [symbol missing] is the pre-trained model. Next, the prediction results are divided into K sets according to their categories, as shown in the following formula: [formula missing]. The sample information gain is estimated using the BALD method with MC Dropout, as shown in the following equation: [formula missing]. Assume category i has [symbol missing] samples. These samples are sorted from highest to lowest by the sum of their confidence score and information gain, and the top [symbol missing] samples are selected as reliable samples, as shown in the following formula: [formula missing], where [symbol missing] indicates the number of elements and [symbol missing] indicates the rounding-to-integer operation. After traversing the K categories, the pseudo-labels of the samples with high confidence and high probability are obtained, as shown in the following formula: [formula missing]. Finally, this label information is added to the images to obtain a lotus dataset with pseudo-labels. As training progresses, the pseudo-labels that the model classifier generates for the target samples become increasingly accurate, and the number of samples reaching the same threshold [symbol missing] increases continuously with the training epoch. To obtain more high-quality samples, the relationship between the threshold [symbol missing] and the training epoch is set as follows: [formula missing], where [symbol missing] and [symbol missing] are both constants, representing the initial threshold and the growth rate of the number of pseudo-labels, respectively.
Step 2: Train the model. Use the training set in the lotus dataset to pre-train the model, initializing the MobileNetV2 feature extraction network. Then use the pre-trained model to predict the unlabeled data, and select the predictions with minimum entropy, i.e. the highest confidence, to pseudo-label the lotus data. Finally, retrain on all the labeled data to obtain the optimal model.
Step 3: Recognize the lotus phenotype using the trained model.

2. The lotus phenotypic recognition method based on a pseudo-label algorithm and a MobileNetV2 network according to claim 1, characterized in that, in step two, the Adam algorithm is used as the optimizer, and the pre-training process is as follows:
(1) Initialize the MobileNetV2 feature extraction network using the model pre-trained on the training set of the lotus dataset;
(2) Obtain from the input pipeline training samples that have been drawn from the dataset and image-preprocessed;
(3) Input the batch training samples obtained in step (2) into the network model, pass them through the feature extraction network fused with the SE module and then through the fully connected layer, and finally compute the probability of each category with softmax;
(4) Calculate the loss value of the network model using the categorical cross-entropy loss function;
(5) Compute the gradients and, using the Adam optimizer with the set initial learning rate, backpropagate the error through the entire network and update the parameters of the fully connected layer;
(6) Determine whether the specified number of iterations has been reached. If so, the network is considered to have converged; proceed to step (7). Otherwise, return to step (2);
(7) Use the above pre-trained model to predict the unlabeled data;
(8) Obtain from the input pipeline training samples that have been drawn from the dataset and image-preprocessed;
(9) Input the batch training samples obtained in step (8) into the pre-trained network model, pass them through the feature extraction network fused with the SE module and then through the fully connected layer, and finally compute the probability of each category with softmax;
(10) Calculate the loss value of the network model using the categorical cross-entropy loss function;
(11) Compute the gradients and, using the Adam optimizer with the set initial learning rate, backpropagate the error through the entire network and update the parameters of each layer of the network;
(12) Determine whether the minimum entropy condition has been met. If it has been met and the number of iterations has not been reached, label the data with pseudo-labels and return to step (8). If the number of iterations has been reached, proceed to step (13);
(13) Calculate the accuracy, precision, recall and F1 score of the final network model using the test set.

3. The lotus phenotypic recognition method based on a pseudo-label algorithm and a MobileNetV2 network according to claim 1, characterized in that the feature extraction network incorporates an inverted residual structure: it first uses a [kernel size missing] convolution to increase dimensionality, then uses [kernel size missing] depthwise separable convolutions for feature extraction, and finally uses a [kernel size missing] convolution for dimensionality reduction.

4. The lotus phenotypic recognition method based on a pseudo-label algorithm and a MobileNetV2 network according to claim 1, characterized in that the SE attention mechanism is placed before and after the inverted residual bottleneck layer. After the input lotus data has been preprocessed, the SE module adaptively learns the importance of each channel and adjusts the feature representation on the feature map by weighting, thereby enhancing the model's expressive ability on the specific task. The SE module recalibrates the previously obtained features mainly through three operations: compression, excitation, and reweighting.

5. The lotus phenotypic recognition method based on a pseudo-label algorithm and a MobileNetV2 network according to claim 4, characterized in that the compression fuses the feature information from each channel using global average pooling, resulting in an output dimension of [dimensionality missing]. Global average pooling directly compresses the [dimensionality missing] feature map, which contains global information, into a [dimensionality missing] feature vector, that is, each of the C feature maps is compressed into a single value, as shown in the following formula: [formula missing], where [symbol missing] represents the input feature vector, [symbol missing] represents the output feature vector, [symbol missing] represents the height, [symbol missing] represents the width, and [symbol missing] represents the pixel in row [symbol missing], column [symbol missing] of the feature map.
The excitation uses the compressed global features to predict the importance of each channel through fully connected layers. It first compresses the channels using an FC (fully connected) dimension-reduction layer with reduction ratio r, then applies the ReLU function, then uses an FC dimension-raising layer to restore the previously compressed channels, and finally applies the Sigmoid function to obtain the output s, as shown in the following equation: [formula missing], where [symbol missing] is the ReLU function, and [symbol missing] and [symbol missing] are the two fully connected layers, with [dimensions missing].
The reweighting takes the weights output by the excitation as the importance of each feature channel after feature selection and multiplies them channel by channel onto the previous features, completing the recalibration of the original features in the channel dimension. It is implemented as follows: [formula missing], where [symbol missing] is the channel-wise multiplication between the scalar [symbol missing] and the feature map [symbol missing].

6. The lotus phenotypic recognition method based on a pseudo-label algorithm and a MobileNetV2 network according to claim 1, characterized in that the pseudo-label algorithm treats the labels of unlabeled samples as variables and propagates information from the labels of labeled samples, thereby predicting the labels of unlabeled samples. It achieves semi-supervised learning by minimizing the entropy during label propagation. During label propagation, the labels of labeled samples are propagated to unlabeled samples and adjusted according to the similarity metric until convergence. Minimizing the entropy improves the accuracy with which the label propagation algorithm predicts the labels of unlabeled samples.

7. A lotus phenotypic recognition device based on a pseudo-label algorithm and a MobileNetV2 network, the device being used to implement the lotus phenotypic recognition method according to any one of claims 1 to 6, characterized in that it includes:
a lotus image processing module, which performs preliminary processing on the input lotus image to extract basic features from the image;
a feature processing module, which further processes the input feature map to improve its expressive power, using the SE attention module and 7 different inverted residual bottleneck layers to perform attention-enhanced feature processing on the extracted preliminary feature map and then obtaining a new feature space representation from the final fully connected layer of the model; and
a classification processing module, which consists of three parts, namely a global average pooling layer, a dropout layer, and a fully connected layer, and which performs a linear transformation on the feature map output by the feature processing module to obtain the output, thereby identifying and classifying lotus species.
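
The compression, excitation, and reweighting operations recited in claim 5 can be illustrated with a dependency-free sketch. It follows the commonly published form of the SE block (global average pooling, an FC reduction layer with ReLU, an FC expansion layer with Sigmoid, then channel-wise scaling) and omits bias terms; the function name `se_block`, the weight matrices `w1` and `w2`, and the sample values are assumptions made for illustration, not parameters taken from this document.

```python
import math


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def se_block(feature_maps, w1, w2):
    """Apply squeeze, excitation, and reweighting to C feature maps.

    feature_maps: list of C channels, each an H x W nested list.
    w1: (C/r) x C weights of the dimension-reduction FC layer (assumed, no bias).
    w2: C x (C/r) weights of the dimension-raising FC layer (assumed, no bias).
    Returns the reweighted feature maps and the channel weights s.
    """
    # Compression: global average pooling reduces each H x W map to one value.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
         for ch in feature_maps]
    # Excitation: FC reduction -> ReLU -> FC expansion -> Sigmoid.
    hidden = [max(0.0, v) for v in matvec(w1, z)]
    s = [1.0 / (1.0 + math.exp(-v)) for v in matvec(w2, hidden)]
    # Reweighting: scale every pixel of channel c by its weight s[c].
    out = [[[s_c * px for px in row] for row in ch]
           for s_c, ch in zip(s, feature_maps)]
    return out, s


# Two 2 x 2 channels (C = 2) with reduction ratio r = 2, so the hidden size is 1.
maps = [[[1.0, 1.0], [1.0, 1.0]],
        [[2.0, 2.0], [2.0, 2.0]]]
w1 = [[0.5, 0.5]]      # 1 x 2 reduction weights (made up)
w2 = [[2.0], [-2.0]]   # 2 x 1 expansion weights (made up)
scaled, s = se_block(maps, w1, w2)
```

With these made-up weights the first channel is emphasized and the second suppressed, which is the channel recalibration effect the claim describes.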