A method for classifying fall armyworms based on an AGCNet network

The AGCNet network uses the DACEConv module to enhance feature extraction and the DG-IBN module to alleviate inter-domain differences, and combines them with AMG-ProtoNet for adaptive fusion. This addresses the weak discriminative power of cross-domain few-shot learning models in fall armyworm classification and yields high-precision multi-scale feature representation and stable recognition.

CN122289792A · Pending · Publication Date: 2026-06-26 · CHANGZHOU UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHANGZHOU UNIV
Filing Date
2026-04-21
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing cross-domain few-shot learning models have weak discriminative power when classifying fall armyworm: they cannot effectively identify the morphological differences between developmental stages or the pest's characteristics against complex backgrounds.

Method used

We employ the AGCNet network, combined with the DACEConv module to enhance feature extraction, the DG-IBN module to mitigate inter-domain differences, and adaptive fusion through AMG-ProtoNet to improve feature discriminativeness and generalization ability.

Benefits of technology

It achieves high-precision classification and detection of fall armyworm images, significantly improves the model's discriminative power for multi-scale morphological features and cross-domain generalization performance, and enhances recognition stability and robustness in complex scenes.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention relates to the field of image processing technology, and more particularly to a classification method for fall armyworm based on the AGCNet network. The method includes acquiring images of fall armyworms at each growth stage as target domain images; pre-training an AGCNet model on source domain images to obtain the corresponding weights; and classifying the target domain images using the AGCNet model with those weights loaded. The AGCNet model includes a feature extractor and a classifier. The feature extractor replaces the BN module of the ResNet10 network with a DG-IBN module, and replaces the second 3×3 convolutional module in the main branches of stage 3 and stage 4 of the ResNet10 network with a DACEConv module. This invention addresses the weak discriminative power of existing cross-domain few-shot learning models when classifying fall armyworms.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and in particular to a classification method for fall armyworm based on the AGCNet network.

Background Technology

[0002] Fall armyworm is difficult to control due to its high resistance to pesticides, voracious appetite, and strong ability to migrate long distances. In the integrated management system for fall armyworm, accurately identifying its developmental stage and instar is of guiding significance for implementing precise control.

[0003] However, in practical applications, achieving high-precision classification and recognition still faces the following challenges: First, the pest is hidden in its natural environment and its spatial and temporal distribution is uneven, making it difficult to obtain image samples; Second, its morphological characteristics show significant polymorphism with different developmental stages (egg, larva, pupa, adult) and different larval stages, which greatly increases the difficulty of classification and recognition.

[0004] Few-shot learning (FSL) has received widespread attention. Wei et al. proposed LitePlantProto, a lightweight few-shot learning model based on ShuffleNetV2 that integrates shallow fine-grained and deep semantic features through cross-scale adaptive spatial feature fusion and combines this with the lightweight attention mechanism LEAM. Traditional few-shot learning methods assume that the related task and the target task lie in the same domain. When samples are scarce or absent altogether, their task adaptation and generalization capabilities are significantly limited. In contrast, cross-domain few-shot learning (CD-FSL) can exploit source domain data from other domains that is easy to obtain and has high-quality annotations. Through feature transformation and domain adaptation techniques, it transfers what was learned there to the target domain pest data, thereby significantly improving the model's representation ability and cross-domain generalization performance in real agricultural scenarios. Examples include metric-based CD-FSL (CDFSL-BDC), optimization-based CD-FSL (CDFSL-MAML), and non-meta-learning CD-FSL (CDFSL-NML). However, research on cross-domain few-shot learning for agricultural pest classification is still in its early exploratory stage, with few related works and no systematic methodological framework or large-scale empirical validation. In addition, for polymorphic targets such as fall armyworm, morphological differences between developmental stages further increase intra-class variation and weaken the model's discriminative power, which places higher demands on cross-domain few-shot learning.

Summary of the Invention

[0005] To address the shortcomings of existing methods, this invention solves the problem of weak discriminative power in the classification of fall armyworm by existing cross-domain few-shot learning models.

[0006] The technical solution adopted in this invention is a classification method for fall armyworm based on the AGCNet network, comprising the following steps:

Step 1: Obtain images of the fall armyworm at each growth stage as the target domain images. In a preferred embodiment of the present invention, the growth stages of the fall armyworm include egg, larva, pupa, and adult.

[0007] Step 2: Pre-train the AGCNet model using the source domain image to obtain the corresponding weights; classify the target domain image using the AGCNet model with the weights loaded; the AGCNet model includes a feature extractor and a classifier; the feature extractor replaces the BN module of the ResNet10 network with the DG-IBN module, and replaces the second 3×3 convolutional module of the main branch of stage 3 and stage 4 of the ResNet10 network with the DACEConv module.

[0008] In a preferred embodiment of the present invention, the DG-IBN module operates as follows. A channel gate is introduced at the channel level; it is determined by learnable per-channel parameters together with a scalar obtained by applying global average pooling, a fully connected layer, and a nonlinear transformation to the input features, and its expression also contains a scaling factor and uses the Sigmoid function. For each sample, the feature map is globally average-pooled over the spatial dimensions to obtain a channel vector, which a convolution then maps to the spatial gate using per-channel learnable weights and biases. The gating coefficient is the product of the channel gate and the spatial gate, truncated to a fixed interval. The DG-IBN output fuses the BN-calibrated features and the IN-calibrated features according to this gating coefficient.

[0009] In a preferred embodiment of the present invention, the DACEConv module includes a fine-grained branch, a context branch, and a dynamic gating mechanism, wherein: the fine-grained branch includes depthwise separable convolution and a low-rank SE attention mechanism; the context branch retains the receptive field expansion and coordinate attention aggregation of RFCAConv and imposes a low-rank bottleneck constraint on the channel dimension; and the outputs of the two branches are fused by the dynamic gating mechanism, which generates input-adaptive weights through global average pooling, a linear transformation, and the Softmax function.

[0010] In a preferred embodiment of the present invention, the classifier includes ProtoNet.

[0011] In a preferred embodiment of the present invention, the classifier further includes AMG-ProtoNet, which consists of a global branch and a local branch.

[0012] In a preferred embodiment of the present invention, the global branch operates as follows in each task unit. For the c-th class, the category prototype is obtained by averaging the support set features, and the variance vector of the class is estimated on each feature dimension. Given the feature vector of a query sample, its diagonal Mahalanobis distance to the c-th class is computed. Within the same task unit, in addition to the variance of each category, a global variance vector is estimated from the support features of all categories, and a more robust variance estimate is obtained by weighted fusion of the two using a shrinkage coefficient. This estimate replaces the original class variance in the distance. The global branch classification score for each category is then calculated from the distance between the query sample and the prototype of that category.

[0013] In a preferred embodiment of the present invention, the local branch calculates the cosine similarity between each local descriptor of the query sample and all local descriptors in the support set of the target category, and uses top-k aggregation to select the most relevant local matching responses; the matching responses at all positions are then aggregated to obtain the local branch score for that category. A sample-level adaptive fusion strategy weights and combines the classification scores of the global branch and the local branch; the weights are obtained by converting the branches' confidence scores after temperature scaling.

[0014] As a preferred embodiment of the present invention, a fall armyworm classification system based on the AGCNet network includes: a memory for storing instructions executable by a processor; and a processor for executing the instructions to implement the fall armyworm classification method based on the AGCNet network.

[0015] In a preferred embodiment of the present invention, a computer-readable medium storing computer program code implements a fall armyworm classification method based on the AGCNet network when executed by a processor.

[0016] The beneficial effects of this invention are:

1. This invention proposes the AGCNet framework, which uses DAResNet as the feature extractor and AMG-ProtoNet as the classifier to achieve high-precision classification and detection of fall armyworm images.

2. In the feature extraction stage, the DACEConv module replaces the 3×3 convolution in the high-level residual blocks. This module enhances the perception of local details through receptive field expansion and a coordinate attention mechanism, and captures global context information by combining depthwise separable convolution and low-rank SE attention. The two branches are then dynamically fused by input-adaptive gating, which significantly improves the model's discriminative representation of multi-scale morphological features.

3. The traditional batch normalization layer is replaced by the DG-IBN module. This module adaptively fuses instance normalization and batch normalization through a dual-gating mechanism over channel and space, and introduces a normalization strategy fine-tuned by domain offset estimation, which effectively alleviates inter-domain differences in data distribution under different acquisition conditions and enhances the domain invariance of features.

4. In the metric inference stage, AMG-ProtoNet builds on the prototype network by integrating robust prototype metrics and dense local matching, and suppresses background interference through foreground response weighting; at the same time, it adopts an adaptive fusion strategy to dynamically balance global and local information, thereby improving the model's discriminative stability and generalization ability.

Attached Figure Description

[0017] Figure 1 is a flowchart of the fall armyworm classification method based on the AGCNet network of the present invention;
Figure 2 is a connection diagram of the measuring device system of the present invention;
Figure 3 is a structural diagram of the DACEConv module of the present invention;
Figure 4 is a structural diagram of the DG-IBN module of the present invention;
Figure 5 shows the t-SNE visualization results of the Baseline model and the model of this invention;
Figure 6 shows the Grad-CAM visualizations of the features extracted by the Baseline model and the model of this invention;
Figure 7 is a comparison of the confusion matrices output by the Baseline model and the improved model on the FAW dataset.

Detailed Implementation

[0018] The present invention will be further described below with reference to the accompanying drawings and embodiments. The drawings are simplified schematic diagrams, which only illustrate the basic structure of the present invention in a schematic manner, and therefore only show the components related to the present invention.

[0019] As shown in Figure 1, a classification method for fall armyworm based on the AGCNet network includes the following steps. Cross-domain few-shot learning (CDFSL) addresses the problem of generalizing a model trained on a source domain, which consists of a source image set and its corresponding label set and has a large number of labeled samples, to a target domain in which only a very small number of labeled samples are available. The problem is subject to two constraints: first, the data distributions of the source and target domains differ significantly; second, the category sets of the two domains do not overlap, that is, their intersection is empty.

[0020] Specifically, the model is first trained on the source domain dataset, learning general transferable representations by minimizing the cross-entropy loss and thus providing prior knowledge for few-shot tasks in the target domain. Subsequently, in the testing phase, typical N-way K-shot tasks are constructed by sampling from the target domain dataset. In each task, the support set contains N categories, each providing K labeled samples, and helps the model quickly adapt to the target domain distribution; the query set contains the samples to be classified and is used to evaluate the model's predictive performance on the task. After the model has adapted using the support set, it predicts the query set samples, and the predictions are compared with the true labels to calculate the classification accuracy. This process is repeated over multiple episodes, and the average accuracy is taken as the evaluation metric for the model's cross-domain generalization ability.
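The episode construction described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function names, the per-class query count `n_query`, and the toy label list are all assumptions introduced here.

```python
import random

def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Return (support, query) lists of (index, class) pairs for one N-way K-shot task."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    # Only classes with enough images can take part in the episode.
    eligible = sorted(c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query)
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(i, c) for i in picked[:k_shot]]   # K labeled samples per class
        query += [(i, c) for i in picked[k_shot:]]     # held-out samples to classify
    return support, query

def mean_accuracy(episode_accuracies):
    """Average accuracy over episodes, used as the evaluation metric."""
    return sum(episode_accuracies) / len(episode_accuracies)

# Toy target domain: 10 images for each of the four growth stages.
labels = [stage for stage in ("egg", "larva", "pupa", "adult") for _ in range(10)]
support, query = sample_episode(labels, n_way=4, k_shot=1, n_query=3,
                                rng=random.Random(0))
```

Support and query indices are drawn without replacement from the same class pool, so no image appears in both sets of an episode.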

[0021] In the image classification task of fall armyworm, traditional deep learning methods are often limited by fixed geometric perception paradigms and single domain statistical priors, lacking adaptive adjustment mechanisms for the content of input images. As a result, they are limited in their performance when faced with fine-grained morphological differences, complex field background noise, and feature distribution drift under multivariable imaging conditions.

[0022] To address the aforementioned issues, this invention proposes a novel image classification framework, AGCNet, aiming to overcome the limitations of existing methods in multi-scale dynamic representation, distribution offset statistical calibration, and noise-resistant metric inference. The framework's innovation lies in three aspects: First, in the feature extraction stage, the DACEConv module replaces the 3×3 convolution in the high-level residual block. This module employs a fine-grained-contextual collaborative dual-branch structure and achieves dynamic alignment and enhancement of multi-scale features through adaptive input weighting. Secondly, the batch normalization layer in all residual blocks of the feature extractor is replaced by the DG-IBN module. The statistical characteristics of instance normalization and batch normalization are dynamically fused by the dual gating mechanism of channel and space, thereby effectively alleviating style and scale shifts between cross-domain data. Finally, AMG-ProtoNet is introduced in the metric inference stage, which combines global prototype metric and local matching strategy on the basis of prototype network: on the one hand, it uses the diagonal Mahalanobis distance of the fusion variance shrinkage strategy to improve inter-class separability on the basis of calibrating statistical distribution bias; on the other hand, it uses the local similarity aggregation of dense features and combines foreground response weighting to suppress background interference; at the same time, an adaptive fusion mechanism is adopted to dynamically balance global and local information, so that the model can obtain more stable discrimination ability and stronger generalization performance under cross-domain small sample conditions.

[0023] Traditional prototype networks obtain image feature vectors through global average pooling and use them to construct mean prototypes for each category. Finally, classification is performed based on the Euclidean distance between the query features and these prototypes. However, this paradigm often reveals its limitations when faced with practical challenges such as significant intra-class differences, background interference, or scarce samples. First, global pooling weakens spatial structure and fine-grained differences, making the discrimination susceptible to the influence of background and non-critical regions, and making it difficult to fully utilize local information. Second, Euclidean distance treats each dimension equally, without considering intra-class fluctuations. Under small sample conditions, it is easily influenced by unstable dimensions, leading to metric bias and reduced prediction stability.

[0024] As shown in Figure 2, AMG-ProtoNet adopts a dual-path inference architecture consisting of global and local branches, and integrates the two types of discriminative information through an adaptive fusion mechanism. The global branch follows the basic idea of prototype classification, but because Euclidean distance treats every feature dimension equally and is therefore easily disturbed by unstable dimensions under few-shot conditions, the distance metric is replaced with the diagonal Mahalanobis distance.

[0025] Specifically, in each task unit, for the c-th class, the category prototype $\mu_c$ is obtained by averaging the support set features, and the variance vector $\sigma_c^2$ of the class is estimated on each feature dimension. Given the feature vector $q$ of a query sample, its diagonal Mahalanobis distance to the c-th class is defined as:

$$d(q, c) = \sum_{j=1}^{D} \frac{(q_j - \mu_{c,j})^2}{\sigma_{c,j}^2 + \varepsilon} \quad (1)$$

[0026] where $D$ denotes the feature dimension; $q_j$ is the j-th component of the vector $q$; $\mu_{c,j}$ is the j-th component of the category prototype $\mu_c$; $\sigma_{c,j}^2$ is the variance estimate of category c in the j-th dimension; and $\varepsilon > 0$ is a numerical stability term that avoids anomalies caused by an excessively small denominator.
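A minimal sketch of the prototype, per-dimension variance, and diagonal Mahalanobis distance may make the definitions concrete. It assumes the common form in which each squared difference is divided by the variance plus ε and the terms are summed without a square root; the function names and toy vectors are illustrative only.

```python
def class_prototype(support_features):
    """Mean of the support feature vectors of one class."""
    n = len(support_features)
    dim = len(support_features[0])
    return [sum(f[j] for f in support_features) / n for j in range(dim)]

def class_variance(support_features, prototype):
    """Per-dimension (biased) variance of one class around its prototype."""
    n = len(support_features)
    return [sum((f[j] - prototype[j]) ** 2 for f in support_features) / n
            for j in range(len(prototype))]

def diag_mahalanobis(query, prototype, variance, eps=1e-6):
    """Sum over dimensions of squared difference divided by (variance + eps)."""
    return sum((q - m) ** 2 / (v + eps)
               for q, m, v in zip(query, prototype, variance))

# Two support samples of one class; the second dimension has zero variance.
support_c = [[1.0, 0.0], [3.0, 0.0]]
mu_c = class_prototype(support_c)
var_c = class_variance(support_c, mu_c)
d_stable = diag_mahalanobis([4.0, 0.0], mu_c, var_c)    # differs only in dimension 1
d_unstable = diag_mahalanobis([2.0, 0.1], mu_c, var_c)  # differs only in dimension 2
```

With only two support samples the second dimension's variance collapses to zero, so a small offset of 0.1 in that dimension produces a far larger distance than an offset of 2.0 in the first. This is the few-shot instability that a more robust variance estimate is meant to counter.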

[0027] Because the number of support samples is small, the class variance estimate $\sigma_c^2$ can easily become too small or unstable, and using it directly leads to distance distortion. Therefore, a variance shrinkage strategy is introduced: within the same task unit, in addition to the variance $\sigma_c^2$ of each category, a global variance vector $\sigma_g^2$ is estimated from the support features of all categories, and a more robust variance estimate is obtained through weighted fusion:

[0028] $$\tilde{\sigma}_c^2 = \lambda\,\sigma_g^2 + (1 - \lambda)\,\sigma_c^2 \quad (2)$$

[0029] where $\lambda$ is the shrinkage coefficient, used to balance the category-level and global statistics. When $\lambda$ is large, the variance estimate relies more on global statistics to suppress small-sample noise; when $\lambda$ is small, more emphasis is placed on the distribution of the category itself. Finally, $\tilde{\sigma}_c^2$ replaces the original variance $\sigma_c^2$ in Equation (1), which avoids distance distortion caused by extreme variances and improves the robustness and consistency of the distance measurement in cross-task and cross-domain scenarios.
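The shrinkage step can be sketched as a per-dimension convex combination of the class variance and the global variance. The sketch below is illustrative only: it assumes the global variance is taken around the mean of all support features in the task, and the names `global_variance`, `shrink_variance`, and `lam` are introduced here.

```python
def global_variance(all_support_features):
    """Per-dimension variance over the support features of every class in the task."""
    n = len(all_support_features)
    dim = len(all_support_features[0])
    mean = [sum(f[j] for f in all_support_features) / n for j in range(dim)]
    return [sum((f[j] - mean[j]) ** 2 for f in all_support_features) / n
            for j in range(dim)]

def shrink_variance(class_var, global_var, lam):
    """Blend lam * global + (1 - lam) * class, dimension by dimension."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * g + (1.0 - lam) * c for c, g in zip(class_var, global_var)]

# Two classes with two support samples each.
all_support = [[1.0, 0.0], [3.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
var_global = global_variance(all_support)
var_class = [1.0, 0.0]                      # class variance with a collapsed dimension
var_shrunk = shrink_variance(var_class, var_global, lam=0.5)
```

The collapsed second dimension of the class variance is lifted away from zero after shrinkage, so it can no longer dominate the distance. Setting `lam` to 0 or 1 recovers the pure class or pure global estimate.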

[0030] Then, the global branch classification score for each category is calculated from the distance between the query sample and the prototype of that category, which yields the category response representation of the query sample on the global branch. The local branch aims to compensate for the loss of spatial detail caused by global pooling. Its core idea is to preserve the feature map during inference and represent the image as a set of local descriptors, one per spatial location, thereby enhancing the capture of fine-grained discriminative features.

[0031] Specifically, for each local location of the query sample, the local branch calculates the cosine similarity between its local descriptor and all local descriptors in the support set of the target category, and uses top-k aggregation to select the most relevant local matching responses; the matching responses at all locations are then aggregated to obtain the local branch score for that category. In addition, the local branch introduces a prototype-conditioned foreground weighting mechanism to further improve local matching. By calculating the similarity between each query position and the support class prototype, the model assigns different weights to positions during aggregation, emphasizing the target region, suppressing interference from the background, and enhancing the model's discrimination ability in complex scenarios.
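The local matching step can be sketched as follows. The text does not say how the top-k responses or the positions are aggregated, so this sketch assumes a mean over the top-k similarities and a weighted mean over positions; the foreground weights are passed in rather than computed, and all names are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def local_score(query_descs, support_descs, k, weights=None):
    """Score one class: top-k mean similarity per query position, then a weighted mean."""
    if weights is None:
        weights = [1.0] * len(query_descs)   # no foreground weighting
    total = 0.0
    for desc, w in zip(query_descs, weights):
        sims = sorted((cosine(desc, s) for s in support_descs), reverse=True)
        top = sims[:k]                        # the k most relevant matches
        total += w * sum(top) / len(top)
    return total / sum(weights)

# Two query positions matched against three support descriptors of one class.
q_descs = [[1.0, 0.0], [0.0, 1.0]]
s_descs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
score_k1 = local_score(q_descs, s_descs, k=1)
score_k2 = local_score(q_descs, s_descs, k=2)
score_fg = local_score(q_descs, s_descs, k=2, weights=[1.0, 0.0])
```

A smaller `k` keeps only the best match per position, while a larger `k` averages in weaker matches. Setting a position's weight to zero removes it from the score, which is how a foreground weighting would suppress background locations.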

[0032] Given that different target domains and different query samples have different degrees of dependence on global and local information, AMG-ProtoNet adopts a sample-level adaptive fusion strategy to weight and combine the classification scores of the global branch and the local branch, as shown in Equation (3).

[0033] $$s(q) = (1 - w_q)\, s^{\mathrm{g}}(q) + w_q\, s^{\mathrm{l}}(q) \quad (3)$$

[0034] where $s^{\mathrm{g}}(q)$ and $s^{\mathrm{l}}(q)$ are the category score vectors of the global branch and the local branch, respectively, for the query sample $q$.

[0035] $w_q$ indicates the degree to which the query sample $q$ depends on the local branch during inference. It is obtained by measuring how reliable the global-branch and local-branch predictions are for the query sample through their prediction confidence scores, which are converted into comparable weights after temperature scaling. This allows the fusion to adaptively favor the more reliable branch for each sample. To prevent an accidentally high confidence from producing an extreme fusion result, $w_q$ is further limited to a bounded range, which improves the stability and robustness of cross-domain testing.
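One way to realize the sample-level fusion is sketched below. The text does not give the exact confidence measure or the bounds, so several choices here are assumptions: confidence is taken as the largest temperature-scaled softmax probability of each branch, the local weight is the local share of the two confidences, and the bounds `w_min` and `w_max` are hypothetical.

```python
import math

def softmax(scores, temperature=1.0):
    """Temperature-scaled softmax, shifted by the maximum for numerical stability."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def local_weight(global_scores, local_scores, temperature=1.0, w_min=0.1, w_max=0.9):
    """Share of confidence held by the local branch, clipped to [w_min, w_max]."""
    conf_g = max(softmax(global_scores, temperature))
    conf_l = max(softmax(local_scores, temperature))
    w = conf_l / (conf_g + conf_l)
    return min(max(w, w_min), w_max)

def fuse(global_scores, local_scores, w):
    """Weighted combination: (1 - w) on the global branch, w on the local branch."""
    return [(1.0 - w) * g + w * l for g, l in zip(global_scores, local_scores)]

# The global branch is confident; the local branch is undecided.
g_scores, l_scores = [2.0, 0.0], [0.0, 0.0]
w = local_weight(g_scores, l_scores)
fused = fuse(g_scores, l_scores, w)
w_clipped = local_weight([0.0, 0.0], [100.0, 0.0], w_min=0.1, w_max=0.6)
```

When the global branch is the more confident one, `w` drops below 0.5 and the fused scores lean on the global branch. The clipping keeps a single over-confident branch from taking over the decision entirely.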

[0036] Overall, AMG-ProtoNet systematically improves the robustness and discriminative performance of the prototype network in cross-domain few-shot classification tasks through a collaborative design of dual-path inference and adaptive fusion. This architecture combines robust modeling capabilities for global metrics with detailed modeling capabilities for local fine-grained cues, and can dynamically balance the two types of feature information based on sample characteristics. This results in more stable and accurate predictions even under conditions of significant intra-class differences, strong background interference, and limited supporting samples.

[0037] In cross-domain small-sample classification tasks, sample scarcity and significant differences in cross-domain distribution greatly increase the difficulty of model generalization. Furthermore, the significant morphological variations exhibited by the fall armyworm during complete metamorphosis further amplify the recognition difficulty caused by these cross-domain differences, thus increasing the classification challenge. Traditional backbone networks, such as ResNet, typically employ standard convolutional operations. The feature representations learned in the source domain often fail to effectively capture the fine-grained local features and global contextual information of the fall armyworm at different life stages, such as the contrast between the pest and the background. Moreover, the static weight-sharing mechanism of standard convolution limits the model's adaptive modeling ability for multi-scale features, thereby affecting classification performance.

[0038] Therefore, to enhance the feature representation capability of convolutional layers in the backbone network, a dynamic attention coordinate enhanced convolution module, DACEConv (Dynamic Attentive Coordinate-Enhanced Convolution), was designed based on RFCAConv (Receptive-Field Coordinate Attention Convolution), as shown in Figure 3. RFCAConv is a coordinate-attention convolutional module oriented towards receptive field spatial features. Its core idea is to dynamically weight spatial features within the receptive field through a coordinate attention mechanism to alleviate the representational limitations caused by parameter sharing in traditional convolutions. However, directly applying RFCAConv to cross-domain few-shot pest classification still has shortcomings: on the one hand, its modeling of receptive field features is still largely static and lacks dynamic adaptation to input content; on the other hand, in complex scenarios, a single-branch structure struggles to capture local details and global contextual information at the same time, resulting in limited performance in multi-scale feature fusion.

[0039] Based on the above analysis, DACEConv adopts a fine-grained-context collaborative dual-branch architecture, which aims to meet the dual challenges of significant cross-domain differences and intra-stage morphological differences brought about by complete metamorphic characteristics, while taking into account the representation needs of micro-detail features and macro-semantic context. In the context branch, the module retains the core ideas of RFCAConv's receptive field expansion and coordinate attention aggregation, and achieves lightweight processing by applying low-rank bottleneck constraints in the channel dimension, thereby enhancing the response to local details with limited computational overhead. The contextual branch includes: after the input features are processed by GConv, Norm, ReLU and Adjust Shape, average pooling is performed along the horizontal and vertical directions respectively; after the two directional features are concatenated and fused by convolution, they are normalized and nonlinearly transformed to generate corresponding directional attention weights, and the features are reweighted. Finally, the contextual enhancement features are output through convolution.

[0040] In the fine-grained branch, a lightweight depthwise separable convolution and a low-rank SE attention mechanism are used to extract feature representations with global perception capabilities at a lower computational cost. Specifically, depthwise convolution performs spatial convolution independently within each channel, while pointwise convolution is responsible for cross-channel information integration, enabling the network to obtain a larger effective receptive field and smooth medium- to long-range context with a parameter count far lower than standard convolution. The subsequent low-rank SE unit models the inter-channel dependencies through a bottleneck structure of "dimensionality reduction-dimensionality increase," recalibrates the semantic channels related to category discrimination, effectively suppresses background interference and domain-related noise, and provides more robust contextual semantic support for subsequent classification. The fine-grained branch includes: the input features are processed sequentially by DWConv+Norm+ReLU and PWConv+Norm+ReLU, and are retained as the output of the main branch. On the other hand, attention weights are generated by GlobalPooling, FC, ReLU, FC and Sigmoid, and multiplied element-wise with the output of the main branch to achieve adaptive recalibration of fine-grained features.
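The low-rank SE recalibration at the end of the fine-grained branch (GlobalPooling, FC, ReLU, FC, Sigmoid, then element-wise scaling) can be sketched on plain nested lists. This is an illustrative sketch: biases are omitted, the weights are toy values rather than learned ones, and the name `se_recalibrate` is introduced here.

```python
import math

def se_recalibrate(x, w_down, w_up):
    """Scale each channel of x (C x H x W) by a gate from a reduce-then-expand bottleneck.

    w_down has shape r x C and w_up has shape C x r, with r < C as the low-rank bottleneck.
    """
    # Global average pooling: one mean per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Dimensionality reduction followed by ReLU.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_down]
    # Dimensionality increase followed by Sigmoid gives one gate per channel.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_up]
    scaled = [[[g * v for v in row] for row in ch] for ch, g in zip(x, gates)]
    return scaled, gates

# Two channels of size 1 x 1, bottleneck width r = 1.
x = [[[2.0]], [[4.0]]]
scaled, gates = se_recalibrate(x, w_down=[[0.0, 0.0]], w_up=[[1.0], [1.0]])
```

With zero reduction weights the hidden activation is zero, so every gate sits at the Sigmoid midpoint of 0.5 and each channel is halved. Learned weights would instead raise the gates of channels that matter for discrimination and lower the rest.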

[0041] The outputs of the two branches are fused through a dynamic gating mechanism, which generates adaptive weights for the input through global average pooling, linear transformation, and the Softmax function, enabling flexible weighting of fine-grained features and contextual features.
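The gating path (global average pooling, a linear transformation, then Softmax) can be sketched as below. The sketch assumes the gate produces one weight per branch, shared across all channels and positions; the patent text does not state the granularity, and the function names and toy tensors are illustrative.

```python
import math

def global_avg_pool(x):
    """x is a C x H x W nested list; returns one mean per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

def branch_weights(x, weight, bias):
    """Two branch weights from the pooled input: linear layer (2 x C) then Softmax."""
    pooled = global_avg_pool(x)
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(weight, bias)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def fuse_branches(fine, context, alphas):
    """Element-wise weighted sum of the fine-grained and context branch outputs."""
    a_fine, a_ctx = alphas
    return [[[a_fine * f + a_ctx * c for f, c in zip(fr, cr)]
             for fr, cr in zip(fch, cch)]
            for fch, cch in zip(fine, context)]

# One channel, 2 x 2 feature maps.
x = [[[1.0, 1.0], [1.0, 1.0]]]
fine = [[[2.0, 2.0], [2.0, 2.0]]]
context = [[[4.0, 4.0], [4.0, 4.0]]]
alphas = branch_weights(x, weight=[[0.0], [0.0]], bias=[0.0, 0.0])
fused = fuse_branches(fine, context, alphas)
```

Because Softmax makes the two weights sum to one, the fused output is always a convex combination of the branch outputs. The input decides only how far it leans towards detail or context.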

[0042] Given that the features in the third and fourth stages possess both large receptive fields and strong semantic representation, making them crucial layers for cross-domain alignment and final discrimination, this invention integrates the DACEConv module into the third and fourth stages of the ResNet-10 backbone network. Structural enhancement is achieved by replacing the second 3×3 convolutional unit in its high-level residual blocks, shown as the pink convolution in Figure 1. This enhances the model's ability to model and distinguish the features of the fall armyworm at multiple growth stages in cross-domain few-shot scenarios while keeping computational complexity under control.

[0043] In summary, DACEConv has multiple advantages in terms of structure and performance: First, the fine-grained-context dual-branch structure enables complementary modeling of receptive field-level detail information and large-scale environmental semantics; second, the dynamic gating mechanism allows the module to adaptively adjust the relative weights of the two branches according to the domain distribution and content complexity of the current image; finally, relying on the lightweight design of low-rank SE attention and depthwise separable convolution, the computational cost and parameter increase are significantly controlled, making DACEConv suitable for resource-constrained scenarios while maintaining high performance.

[0044] As shown in Figure 4, the DG-IBN module addresses the challenges of cross-domain few-shot classification with the fall armyworm as the research subject. Significant differences exist between the source and target domains in imaging conditions, style distribution, and class space, and labeled samples in the target domain are extremely scarce. This presents a dual challenge of domain bias and sample scarcity. Furthermore, the fall armyworm is a holometabolous pest, exhibiting significant stage-specific differences in body color, size, and texture between its eggs, larvae, pupae, and adults, further increasing the difficulty of discrimination. These factors collectively lead to a significant deviation between the feature distribution obtained by the model from the source domain and the true distribution of samples from different developmental stages of the fall armyworm, weakening the generalization performance of the classification algorithm. Normalization methods can map features to a uniform scale and are often used to suppress distribution bias; however, traditional normalization methods such as BN employ fixed statistical forms, making it difficult to ensure style invariance and class discriminability at the same time. This invention proposes DG-IBN (Dynamic Gated Instance-Batch Normalization) and uses it to uniformly rebuild the normalization steps of the convolutional paths and residual shortcuts at all levels of the backbone network. Through channel-space dual gating and a domain-adaptive offset, BN and IN are fused dynamically at the sample and channel levels, which effectively suppresses domain offset and style differences between samples and thereby improves cross-domain few-shot polymorphic classification of the fall armyworm.

[0045] Let the input features of a certain layer be x ∈ ℝ^(N×C×H×W), where N is the batch size, C is the number of channels, and H and W are the height and width of the feature map, respectively. A small positive constant ε is added to prevent the denominator from being zero. For the c-th channel, BN computes the mean and variance over all samples in the current batch and all spatial positions, and recalibrates the channel features accordingly.

[0046] IN keeps the same "channel-by-channel, cross-space" structure, but computes the mean and variance over the spatial dimensions of a single sample only, and therefore focuses more on eliminating style differences between samples. For a given sample and channel, the statistics are taken over that sample's H × W spatial positions.

[0047] BIN uses a learnable gating parameter to linearly fuse the normalized results of BN and IN, so that the two complement each other.

[0048] By using a channel-level gating parameter to interpolate linearly between IN and BN, BIN balances style invariance and category discriminability to some extent. However, this parameter is shared by all samples within the same channel and is independent of domain conditions, so it is still essentially a static weighting, which ultimately limits the model's ability to extract fine-grained discriminative structural features.
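For reference, the standard formulations of BN, IN, and BIN from the literature can be written as below. The superscripts B and I, the gating parameter ρ_c, and the affine parameters γ_c and β_c are notation assumed here for illustration and may differ from the symbols used in the original formulas.

```latex
\begin{aligned}
\mu^{B}_{c} &= \frac{1}{NHW}\sum_{n=1}^{N}\sum_{h=1}^{H}\sum_{w=1}^{W} x_{n,c,h,w},
&\quad
\bigl(\sigma^{B}_{c}\bigr)^{2} &= \frac{1}{NHW}\sum_{n=1}^{N}\sum_{h=1}^{H}\sum_{w=1}^{W}\bigl(x_{n,c,h,w}-\mu^{B}_{c}\bigr)^{2},
\\[4pt]
\mu^{I}_{n,c} &= \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W} x_{n,c,h,w},
&\quad
\bigl(\sigma^{I}_{n,c}\bigr)^{2} &= \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\bigl(x_{n,c,h,w}-\mu^{I}_{n,c}\bigr)^{2},
\\[4pt]
\hat{x}^{B}_{n,c,h,w} &= \frac{x_{n,c,h,w}-\mu^{B}_{c}}{\sqrt{\bigl(\sigma^{B}_{c}\bigr)^{2}+\epsilon}},
&\quad
\hat{x}^{I}_{n,c,h,w} &= \frac{x_{n,c,h,w}-\mu^{I}_{n,c}}{\sqrt{\bigl(\sigma^{I}_{n,c}\bigr)^{2}+\epsilon}},
\\[4pt]
y^{\mathrm{BIN}}_{n,c,h,w} &= \gamma_{c}\Bigl(\rho_{c}\,\hat{x}^{B}_{n,c,h,w} + (1-\rho_{c})\,\hat{x}^{I}_{n,c,h,w}\Bigr) + \beta_{c},
&\quad
\rho_{c} &\in [0,1].
\end{aligned}
```

In this form the static nature of BIN is visible directly: ρ_c carries a channel index only, so every sample in every domain receives the same BN/IN mixture for that channel.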

[0049] To overcome these limitations, this invention proposes Dynamic Gated Instance-Batch Normalization (DG-IBN). Building on BIN, this module introduces channel-space dual gating and a domain-adaptive offset mechanism, moving from fixed-weight fusion to dynamic adaptive fusion at the sample and channel levels.

[0050] Specifically, DG-IBN first introduces a channel gate at the channel level. The channel gate is determined jointly by a learnable per-channel parameter and the domain offset of the current batch.

[0051] Here, the domain offset is a scalar obtained by applying global average pooling, a fully connected layer, and a nonlinear transformation to the input features of this layer, and it describes the overall domain offset of the current batch. The gate additionally involves a scaling factor and the Sigmoid function. This mechanism adaptively adjusts the global weights of BN and IN according to the statistical characteristics of the source and target domains while maintaining the inherent preference of each channel, thereby achieving domain-adaptive offset adjustment.

[0052] On this basis, for each sample, DG-IBN performs global average pooling on the feature map over the spatial dimensions to obtain a channel vector, which is then mapped by a convolution into the spatial gate.

[0053] The channel vector of a sample has one entry per channel, and each channel has its own learnable weight and bias, which adaptively adjust the fusion ratio of BN and IN according to the overall appearance of the sample. The final gating coefficient is obtained by multiplying the channel gate by the spatial gate and truncating the product to a fixed interval.

[0054] Finally, the output of DG-IBN is obtained by fusing the BN-calibrated features and the IN-calibrated features according to the final gating coefficient.
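The gating steps described above can be sketched as follows. This is only one plausible reading: the exact functional forms are assumptions, namely a channel gate of the form sigmoid(rho_c + alpha · d), a sample-level gate of the form sigmoid(w_c · z + b_c) applied to the spatially pooled value z, and an output in which the final gate weights the BN branch. The names `rho`, `alpha`, `d`, `w`, `b`, `lo`, and `hi` are hypothetical, the domain offset `d` is passed in as a precomputed scalar, the truncation interval is left as a parameter, and the affine scale and shift are omitted.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def mean_var(values):
    """Population mean and variance of a flat list."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

def dg_ibn(x, rho, alpha, d, w, b, lo=0.0, hi=1.0, eps=1e-5):
    """x: [N][C][H][W] nested lists; returns a tensor of the same shape.

    rho[c]     : learnable channel parameter (assumed form)
    alpha, d   : scaling factor and scalar domain offset of the batch
    w[c], b[c] : per-channel weight and bias of the sample-level gate
    lo, hi     : hypothetical truncation interval for the final gate
    """
    n_samples, n_channels = len(x), len(x[0])
    out = [[None] * n_channels for _ in range(n_samples)]
    for c in range(n_channels):
        # BN statistics: all samples and all spatial positions of channel c.
        batch_vals = [v for n in range(n_samples) for row in x[n][c] for v in row]
        mu_b, var_b = mean_var(batch_vals)
        g_c = sigmoid(rho[c] + alpha * d)            # channel gate
        for n in range(n_samples):
            # IN statistics: spatial positions of one sample only.
            mu_i, var_i = mean_var([v for row in x[n][c] for v in row])
            # mu_i is also the globally pooled value of this sample/channel.
            s_nc = sigmoid(w[c] * mu_i + b[c])       # sample-level gate
            lam = min(max(g_c * s_nc, lo), hi)       # truncated final gate
            out[n][c] = [[lam * (v - mu_b) / math.sqrt(var_b + eps)
                          + (1.0 - lam) * (v - mu_i) / math.sqrt(var_i + eps)
                          for v in row] for row in x[n][c]]
    return out
```

Forcing the gate to 0 or 1 through `lo` and `hi` reduces the sketch to pure IN or pure BN respectively, which is a convenient sanity check on the fusion.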

[0055] In terms of mechanism, the channel gate adaptively allocates the weights of BN and IN at the channel scale by incorporating the domain offset, explicitly addressing the inconsistency of feature distributions between the source and target domains. The spatial gate refines the fusion ratio of BN and IN at the sample-channel scale: when illumination, background, and texture style change drastically, the IN weight is automatically increased to suppress domain-related style differences, while more BN statistics are retained in the channels that are most discriminative for the body shape and markings of the fall armyworm. The combination of channel-space dual gating and the domain-adaptive offset mitigates statistical shift and the intra-class morphological differences caused by polymorphic phenotypes, which improves the robustness and generalization of fall armyworm classification under cross-domain few-shot conditions.

[0056] Experimental procedure: To verify the performance advantages of the proposed algorithm framework (AGCNet), the miniImageNet dataset was used as the single source domain for training. The self-made fall armyworm dataset FAW and four public datasets (Places, CropDiseases, CUB, and ISIC) were selected as target domains for cross-domain evaluation, and multiple sets of experiments were designed. The FAW dataset contains nine categories across four developmental stages, namely eggs, larvae (1st–6th instar), pupae, and adults, with a total of 1,450 images. Data collection was divided into two parts. The first part was field collection, carried out from September to October 2024 in Xinkang Village Modern Agricultural Industrial Park, Changzhou City, Jiangsu Province, China (geographic coordinates: 31°48′44.1″N, 119°58′9.0″E), with images acquired using a vivo X100 PRO smartphone (equipped with a 1/0.98-inch Sony IMX989 sensor, 50 megapixels). During the field period the larvae were mainly in the 3rd–6th instar stages, so all images of these stages were taken in natural environments with complex backgrounds, including typical interference factors such as multi-angle leaf occlusion and uneven lighting. The second part was laboratory collection under controlled lighting (constant light intensity of 700–900 lux) using the same equipment, which supplemented the images of 1st–2nd instar larvae, pupae, and adults. Images of silkworm larvae (1st–5th instar), pupae, and adults were also collected to enhance the reference value of the cross-species comparative experiment. Egg images were obtained from publicly available internet resources to ensure that the dataset covers all categories.

[0057] miniImageNet is a subset of the ImageNet dataset containing natural color images. It includes 100 categories, approximately 600 images per category, with a uniform resolution of 84×84. Due to its moderate size and diverse categories, it has become a standard benchmark for few-shot learning tasks. Places365 is a large-scale scene recognition dataset covering 365 scene categories. This invention uses the standard version, Places365-Standard, which covers various scene types such as natural environments, architectural spaces, and daily life. It can be used to evaluate the model's generalization ability under complex backgrounds and diverse scene conditions. The CropDiseases dataset originates from agricultural scenes and contains 38 types of crop diseases, totaling approximately 54,306 leaf images, with approximately 1,000 images per category. This dataset mainly relies on leaf shape, texture, and abnormal patterns such as spots caused by diseases for differentiation, and can test the model's ability to represent fine-grained lesion features. The CUB dataset is a typical fine-grained bird recognition benchmark, containing 200 bird subclasses and a total of 11,788 images, covering different poses, backgrounds, and lighting conditions. Because the differences between categories are subtle and the discriminative cues are mostly concentrated in local morphological details, this dataset can effectively evaluate the model's ability to capture fine-grained discriminative features between highly similar categories. The ISIC dataset, provided by the International Skin Imaging Consortium, is used for skin lesion identification and contains 7 categories, with approximately 2,000 high-resolution dermoscopic images per category. Intra-class differences are significant, and the noise level is relatively high. Evaluations on these multi-domain datasets fully validate the robustness and universality of the proposed method in cross-domain transfer and small-sample scenarios.

[0058] The experiments were conducted on the following hardware platform: a 13th-generation Intel® Core™ i7-13650HX CPU (2.60 GHz), an NVIDIA GeForce RTX 4060 Laptop GPU (8 GB GDDR6 VRAM), and 24.0 GB of RAM. The algorithm was developed in Python 3.6.13 and implemented with the PyTorch 1.10.2 deep learning framework. In both the 5-way 1-shot and 5-way 5-shot settings, 16 images were randomly sampled from each class as the query set. During training, the learning rate was 0.001, the weight decay was 0.0005, and the batch size was 16. The Adam optimizer was used to optimize the entire network, and training ran for 200 epochs. Classification performance in the testing phase was measured as average accuracy (%), and results are reported with 95% confidence intervals.
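The episodic protocol described above can be sketched as follows: each task draws 5 classes, K support images and 16 query images per class, and the per-task accuracies are summarized by their mean and a 95% confidence interval. The function names are hypothetical, the number of test tasks is not stated in the text and is left to the caller, and the interval uses the common normal approximation (1.96 standard errors), which is an assumption.

```python
import math
import random
import statistics

def sample_episode(items_by_class, n_way=5, k_shot=1, n_query=16, rng=random):
    """Sample one N-way K-shot task.

    items_by_class maps a class label to a list of image identifiers.
    Returns (support, query), each a list of (item, label) pairs; the
    support and query items of a class never overlap.
    """
    classes = rng.sample(sorted(items_by_class), n_way)
    support, query = [], []
    for label in classes:
        picked = rng.sample(items_by_class[label], k_shot + n_query)
        support += [(item, label) for item in picked[:k_shot]]
        query += [(item, label) for item in picked[k_shot:]]
    return support, query

def mean_with_ci95(accuracies):
    """Mean accuracy and half-width of a normal-approximation 95% interval."""
    mean = statistics.mean(accuracies)
    half_width = 1.96 * statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return mean, half_width
```

A result would then be reported as `mean ± half_width`, with the half-width shrinking as more tasks are sampled.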

[0059] To demonstrate the performance of the proposed model in the CDFSL setting, it was compared with several representative and competitive CDFSL models, including Wave-SAN, ATA, AFA, and LDP-Net. Experiments were conducted under the most common FSL settings, namely 5-way 1-shot and 5-way 5-shot, with the model trained on the miniImageNet dataset and tested on the self-made FAW dataset and four public CDFSL datasets (CropDiseases, Places, ISIC, and CUB). The comparison results are shown in Tables 1 and 2. The proposed model outperforms most methods in test accuracy under both the 5-way 1-shot and 5-way 5-shot settings.

[0060] As shown in Table 1, under the 5-way 1-shot setting, AGCNet performs well across multiple target domains. On the CropDiseases, ISIC, and CUB datasets, the model reaches accuracies of 70.00%, 35.42%, and 46.68%, respectively, outperforming the second-best method by 0.82, 1.81, and 0.15 percentage points. On the FAW dataset, AGCNet achieves an accuracy of 57.06%, only 0.22 percentage points below the best method, AFA (57.28%). Because samples are extremely scarce in the 1-shot setting, the randomness of the feature distribution causes statistical fluctuations, and this slight gap is within a reasonable error range.

[0061] In the 5-way 5-shot setting, AGCNet's advantages become more apparent as the number of support samples increases. On the FAW dataset, it achieves an accuracy of 85.13%, 1.25 percentage points higher than the second-best method. On the CropDiseases, ISIC, and CUB datasets, the model outperforms the second-best method by 0.22, 1.89, and 1.05 percentage points, respectively. This indicates that when the support samples of each class cover intra-class variation more fully, AGCNet can extract more stable and representative class centers from them, reducing the bias caused by single-sample randomness.

[0062] On the Places dataset, AGCNet achieved accuracies of 50.32% and 70.48% in the 1-shot and 5-shot tests, respectively. These figures do not surpass the best method, Wave-SAN (51.32% and 70.75%), but the gap narrows from 1.00 to 0.27 percentage points. Places is a scene recognition task in which discrimination often relies on macroscopic layout and semantic co-occurrence information, whereas AGCNet's key advantage lies in the decoupled modeling of target morphological details and background interference, so its gain on this task is limited. However, as the support set grows, AGCNet can use more comprehensive intra-class statistics to stabilize similarity estimates, reducing the matching uncertainty caused by scene diversity and inter-class co-occurrence similarity. This improves the representativeness of the prototypes and the reliability of the decision boundary, allowing the model to approach the best method more closely and perform more stably in the 5-shot setting.

[0063] Table 1. Comparison of the present invention with other CDFSL methods under 5-way 1-shot conditions.

[0064] Table 2. Comparison results of the present invention with other CDFSL methods under 5-way 5-shot conditions.

[0065] To verify the effectiveness of each improved mechanism in cross-domain small-sample pest classification tasks, this invention conducted systematic ablation experiments. The original baseline model was used as the starting point (M0), and three improved mechanisms were introduced sequentially: S1 (AMG-ProtoNet) improves the discriminative power of feature metrics and the stability of classification decisions by adaptively fusing global prototype metrics with local feature matching; S2 (DACEConv dynamic convolution) utilizes a dual-branch structure and adaptive gating to fuse local fine-grained and global contextual features; and S3 (DG-IBN normalization) combines the advantages of instance and batch normalization and mitigates inter-domain distribution shifts through a gating mechanism. Experiments were conducted in 5-way 1-shot and 5-way 5-shot settings, and performance was evaluated on the self-made FAW dataset and four common target domain datasets (CropDiseases, Places, ISIC, and CUB).

[0066] Experimental results show that the improvement mechanisms generally bring performance gains in both the 5-way 1-shot and 5-way 5-shot settings, with the most significant gains in the 5-way 5-shot setting. Specifically, introducing S1 alone (M1) improved accuracy on FAW, CropDiseases, Places, ISIC, and CUB by 7.01, 2.33, 0.64, 1.41, and 2.47 percentage points over the baseline model, respectively. The improvement on FAW was particularly large, mainly because the shrinkage estimation statistically calibrates the metric space and, combined with the local feature matching mechanism, effectively suppresses noise from the complex field background. Introducing S2 alone (M2) yielded gains of 1.56, 1.29, 0.74, 1.97, and 4.15 percentage points on the same datasets, with a particularly large improvement on CUB. This confirms the advantage of the dual-branch architecture, which coordinates fine-grained and contextual features, in capturing subtle feather textures, and shows that the dynamic aggregation of multi-scale features enriches the representation of species morphology. Introducing S3 alone (M3) changed accuracy by −0.15, 1.85, 0.87, 7.05, and 4.24 percentage points on the respective datasets, that is, a marginal decrease on FAW and gains elsewhere, with particularly large increases on ISIC and CUB. This indicates that the domain-adaptive dual-gating mechanism can preserve structured features while suppressing non-semantic style biases (such as illumination and texture variations), thereby mitigating the large feature distribution shifts caused by differences in imaging modality.

[0067] When the three improvement mechanisms are integrated simultaneously (M4), the model improves by 7.67, 3.50, 2.43, 8.30, and 8.98 percentage points on FAW, CropDiseases, Places, ISIC, and CUB, respectively, compared to the baseline model. On every dataset the combined gain exceeds the gain of any single mechanism, although it exceeds the simple sum of the three individual gains only on Places (2.43 versus 2.25 percentage points). This result indicates that the three improvements, S1, S2, and S3, are complementary: S2 enhances structural and semantic representation, S3 strengthens cross-domain distribution alignment and style robustness, and S1 turns the improved representations into more stable few-shot decisions at the metric and matching levels. The three work together to achieve an overall enhancement from feature extraction to decision inference.
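The relation between the combined and individual gains can be checked directly from the figures quoted in the ablation paragraphs. The sketch below only re-adds the reported numbers; the variable names are illustrative.

```python
# Reported gains over the baseline, in percentage points.
datasets = ["FAW", "CropDiseases", "Places", "ISIC", "CUB"]
gain_s1 = [7.01, 2.33, 0.64, 1.41, 2.47]    # AMG-ProtoNet alone (M1)
gain_s2 = [1.56, 1.29, 0.74, 1.97, 4.15]    # DACEConv alone (M2)
gain_s3 = [-0.15, 1.85, 0.87, 7.05, 4.24]   # DG-IBN alone (M3)
gain_all = [7.67, 3.50, 2.43, 8.30, 8.98]   # all three combined (M4)

summary = {}
for name, a, b, c, full in zip(datasets, gain_s1, gain_s2, gain_s3, gain_all):
    summary[name] = {
        "sum_individual": round(a + b + c, 2),
        "best_single": max(a, b, c),
        "combined": full,
    }

for name, row in summary.items():
    print(name, row)
```

With these figures, the combined gain is larger than the best single-mechanism gain on all five datasets, and larger than the sum of the three individual gains on Places only.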

[0068] Table 3. Ablation experiment under the 5-way 1-shot setting.

[0069] Table 4. Ablation experiment under the 5-way 5-shot setting.

[0070] To further evaluate the feature representations of the AGCNet model, a t-SNE visualization experiment was conducted on five target domain datasets (FAW, CropDiseases, Places, ISIC, and CUB) to compare the feature distributions of the baseline and improved models. Under a fixed random seed, five classes were randomly selected in each target domain, and a fixed number of samples was drawn from each class: 50 images per class for FAW, 60 for CropDiseases, 80 for Places, 80 for ISIC, and 55 for CUB. Feature representations were first extracted with the pre-trained backbone network, and the discrimination scores for the five classes were computed according to the classification rule to serve as the representations to be visualized. These representations were then L2-normalized, standardized, and optionally reduced by PCA before being passed to t-SNE and mapped to a two-dimensional space, which makes the degree of intra-class aggregation and the inter-class separation boundaries under cross-domain conditions directly visible.

[0071] As shown in Figure 5, the feature embeddings of the baseline model are generally dispersed within classes and ambiguous at class boundaries, with obvious distribution shift and class overlap on datasets with large inter-domain differences (such as FAW and ISIC). This indicates that the basic residual convolution backbone and a simple metric structure alone cannot sufficiently alleviate feature distribution differences in cross-domain scenarios, which limits the model's domain adaptability and discriminative power. In contrast, after the three improvement mechanisms are added, the feature space distribution improves markedly in all target domains. On the FAW dataset, the improved model produces fewer outliers and wider inter-class margins, indicating a more stable discriminative representation of the different developmental stages of the fall armyworm. On datasets with large domain differences, such as CropDiseases and ISIC, the improved model reduces the overlap between classes and forms clearer class partitions, indicating stronger adaptability to distribution changes and the ability to suppress feature perturbations caused by differences in imaging conditions and textures. On the natural scene and fine-grained recognition datasets with relatively similar domains, such as Places and CUB, the improved model alleviates multi-class mixing and boundary blurring, making intra-class clusters more compact and further shrinking the overlap between adjacent classes, which reflects improved robustness in suppressing complex background interference and distinguishing fine-grained classes.
In summary, the t-SNE visualization results show that introducing multi-scale dynamic convolutions into the backbone network to improve the fusion of local and global features, combining them with adaptive gated normalization to alleviate inter-domain style shift, and using the improved AMG-ProtoNet metric for more robust similarity calculation in the testing phase together mitigate distribution inconsistency in cross-domain environments, thereby enhancing the model's feature discrimination and cross-domain generalization in few-shot tasks.

[0072] Grad-CAM visualization is used to compare the attention distributions of the baseline model and the improved model (Ours) and to evaluate the stability and consistency of their spatial focus. It shows the key regions the model relies on when predicting the true class. Specifically, Grad-CAM backpropagates the output of the target class and uses the gradient of this output with respect to the convolutional feature maps to measure the importance of each channel; the feature maps are then combined as a weighted sum to generate a class activation map. This activation map is passed through a nonlinearity and upsampled to the input scale to form a heatmap, which characterizes the spatial regions that contribute most to the model's prediction for that class.
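The Grad-CAM computation described above can be sketched in a few lines, given the feature maps of one convolutional layer and the gradients of the target-class output with respect to them. The sketch assumes the nonlinearity is a ReLU, which is the usual Grad-CAM choice, scales the map to [0, 1], and omits the upsampling to the input resolution; the function name is hypothetical.

```python
def grad_cam(activations, gradients):
    """Class activation map for one image and one target class.

    activations, gradients: [K][H][W] nested lists (K feature channels).
    Returns an [H][W] map scaled to [0, 1].
    """
    n_ch = len(activations)
    height, width = len(activations[0]), len(activations[0][0])
    # Channel importance: spatial average of the gradient of each channel.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    # Weighted sum of feature maps followed by ReLU.
    cam = [[max(0.0, sum(weights[k] * activations[k][i][j] for k in range(n_ch)))
            for j in range(width)] for i in range(height)]
    # Scale to [0, 1] so maps from different images are comparable.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```

Channels whose average gradient is negative lower the map where they are active, so positions dominated by such channels are clipped to zero by the ReLU.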

[0073] As shown in Figure 6, the baseline model's response is often scattered and easily drawn to background textures or non-discriminative regions, so that the region it attends to does not coincide with the target subject. In contrast, the improved model's high-response regions are more concentrated on the target subject or key lesion areas, and its spatial focus is more consistent across target domains. On the FAW dataset, for example, the baseline model often activates strongly on the leaf background while covering the insect insufficiently, whereas the improved model's response is concentrated on the insect and its outline, with significantly less background interference. On natural image datasets such as CUB and Places, the baseline model's attention often spreads to branches, the ground, or the surrounding environment, whereas the improved model focuses more stably on the target subject or on key structures related to the scene category. On specialized-domain datasets such as CropDiseases and ISIC, the baseline model is more prone to activation shifts or over-response to background textures, whereas the improved model tends to cover lesion areas and their boundaries. Overall, the Grad-CAM visualization shows that the improved model suppresses background interference, reduces attention dispersion, and focuses more stably on the target subject or lesion areas relevant to category discrimination across multiple target domains.

[0074] To quantitatively evaluate the separability of the model across the nine insect stages in the FAW target domain, and to identify the main sources of confusion in order to validate the improvement strategy, a confusion statistics experiment was designed. Specifically, 9×9 confusion matrices were constructed on the FAW dataset for both the baseline and improved algorithms. In each task sampling, predictions were made simultaneously for all nine categories, with 15 query samples per category. After 200 repeated samplings and cumulative statistics, 3,000 prediction counts were obtained for each true category. To make the recognition distributions of different categories comparable, the confusion matrix is row-normalized. The values in the figure give the proportion of a true category predicted as each category: diagonal elements are the proportion of that category correctly identified, and off-diagonal elements are the proportion misclassified as other categories. Darker colors indicate higher proportions.
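The accumulation and row normalization described above can be sketched as follows. The function names are hypothetical; labels are assumed to be integer class indices, and the counts of all sampled tasks are added into one matrix before each row is divided by its total.

```python
def accumulate(counts, true_labels, predicted_labels):
    """Add the predictions of one task to a square count matrix in place."""
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    return counts

def row_normalize(counts):
    """Divide each row by its total so that every row sums to 1."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized

def mean_diagonal(matrix):
    """Average per-class correct recognition rate."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)

# With 15 query samples per class and 200 sampled tasks, each row of the
# accumulated count matrix sums to 15 * 200 = 3000.
queries_per_class, n_tasks = 15, 200
row_total = queries_per_class * n_tasks
```

Row normalization makes each row a distribution over predicted classes for one true class, so the diagonal can be read directly as per-class recall.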

[0075] As shown in Figure 7, the improved algorithm generally increases the proportion of correctly classified insect life stages. Compared with the baseline algorithm, it improves the correct recognition rate in seven of the nine categories; for example, the rate for first-instar larvae increases from 0.72 to 0.87, for third-instar larvae from 0.69 to 0.84, and for fourth-instar larvae from 0.74 to 0.94. Furthermore, the overall mean of the diagonal elements increases from 0.70 to 0.79, indicating an improvement in the average correct recognition level across the nine insect life stages. This demonstrates that the proposed improvement strategy is effective for most categories and also enhances the model's overall ability to distinguish the life stages of the fall armyworm. Regarding misclassification, the improved algorithm markedly suppresses several key confusion relationships, and this improvement is concentrated between adjacent larval instars. For example, the proportion of fourth-instar larvae misclassified as third-instar larvae decreased from 0.20 to 0.05, and the proportion of third-instar larvae misclassified as fourth-instar larvae decreased from 0.13 to 0.06; the proportion of second-instar larvae misclassified as third-instar larvae decreased from 0.13 to 0.07, and the proportion of third-instar larvae misclassified as second-instar larvae decreased from 0.12 to 0.07. These results indicate that the improved algorithm narrows several major misclassification channels, most notably the confusion between adjacent instars.

[0076] This invention proposes AGCNet, an integrated representation-inference framework for the image classification of the fall armyworm. It aims to mitigate the feature statistical drift and local cue interference caused by differences in imaging conditions and background morphology, so that stable, separable discriminative representations and reliable category inference are maintained even with very few labeled samples. The framework integrates three improvements. First, it uses the DG-IBN module to achieve a dynamic balance between instance normalization and batch normalization, adaptively suppressing domain-related style shift while preserving discriminative statistics. Second, it introduces the DACEConv module, which captures multi-scale features through the complementary combination of fine-grained feature enhancement and contextual semantic relabeling. Third, it employs the AMG-ProtoNet strategy, which improves the reliability of category similarity estimation under complex backgrounds and fine-grained inter-class differences through collaborative inference over global distribution metrics and local feature matching, together with confidence-based adaptive fusion. Together, these three elements enhance the model's resistance to appearance changes and background noise from feature extraction through metric inference, and improve its focus on and use of key morphological cues. Experimental results show that the joint optimization of representation enhancement and metric inference improves the discriminative power and robustness of the features, enabling the model to outperform the baseline algorithm in multiple target domains. Furthermore, compared with several representative CDFSL methods, AGCNet shows good performance gains and remains competitive across multiple target domains.

[0077] This invention focuses on advanced cross-domain small-sample image classification methods, selecting the fall armyworm, a pest with polymorphic features, as the main classification object. A novel image classification algorithm framework, AGCNet (Adaptive Gated Calibration Network), is proposed. This framework uses GCA-ResNet (Gated Calibration Perceptual Residual Network) as the feature extractor and combines it with AMG-ProtoNet (Adaptive Multi-Granularity Prototype Network) as the classifier to complete the task. Specifically, GCA-ResNet significantly improves the model's sensitivity to multi-scale fine-grained morphological differences and the feature statistical stability under multi-morphological imaging conditions by introducing DACEConv (Detail-Aware Context-Enhanced Convolution) and DG-IBN (Dynamically Gated Instance-Batch Normalization). The DACEConv module jointly strengthens fine-grained discriminative cues and global context constraints during convolutional feature extraction and dynamically adjusts them using an input-adaptive fusion strategy to achieve flexible modeling of multi-scale features. The DG-IBN module employs a dual-gating mechanism of channel and space, adaptively fusing instance normalization and batch normalization to alleviate the distribution offset between the source and target domains. The two work together to enable the feature extractor to learn more discriminative and domain-invariant feature representations.

[0078] AMG-ProtoNet improves upon ProtoNet by introducing diagonal Mahalanobis distance based on shrinkage estimation, utilizing global statistical regularization of intra-class variance, and incorporating local similarity aggregation of dense features and foreground-guided weighting. This effectively suppresses background interference and strengthens key discriminative cues, thereby further enhancing the reliability and consistency of classification decisions. Furthermore, this study constructed a multi-instar image dataset (FAW) of fall armyworms based on a standard laboratory environment. This dataset strictly controls variables such as illumination and background, comprehensively covering the typical morphological characteristics of fall armyworms at each developmental stage.
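The global branch described above can be illustrated with a minimal sketch of a diagonal Mahalanobis score with shrinkage. It assumes that the per-class variance and a global variance computed from all support features are mixed linearly by a shrinkage coefficient, and that the class score is the negative distance to the prototype. The function names, the parameter `shrink`, the direction of the weighting, and the small constant `eps` are assumptions made for illustration; the local branch and the adaptive fusion are not covered.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def variance_vector(vectors, center):
    """Per-dimension population variance around a given center."""
    n = len(vectors)
    return [sum((v[d] - center[d]) ** 2 for v in vectors) / n
            for d in range(len(center))]

def global_branch_scores(query, support_by_class, shrink=0.5, eps=1e-6):
    """Score each class by negative diagonal Mahalanobis distance.

    support_by_class maps a label to the support feature vectors of that
    class within one task. `shrink` weights the global variance; the
    remainder weights the per-class variance.
    """
    all_feats = [v for feats in support_by_class.values() for v in feats]
    global_var = variance_vector(all_feats, mean_vector(all_feats))
    scores = {}
    for label, feats in support_by_class.items():
        proto = mean_vector(feats)                  # class prototype
        class_var = variance_vector(feats, proto)   # zero in the 1-shot case
        fused = [(1.0 - shrink) * cv + shrink * gv
                 for cv, gv in zip(class_var, global_var)]
        dist = sum((q - p) ** 2 / (f + eps)
                   for q, p, f in zip(query, proto, fused))
        scores[label] = -dist
    return scores
```

In the 1-shot case the per-class variance is zero, so the fused estimate is carried entirely by the global term; this is one reason a shrinkage estimate is useful when support samples are scarce.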

[0079] Experimental results show that, under the 5-way 5-shot setting, AGCNet achieves a classification accuracy of 85.13% on the FAW dataset, an improvement of 7.67 percentage points over the baseline algorithm (77.46%). Furthermore, compared with several representative cross-domain few-shot algorithms, the framework achieves competitive classification accuracy on multiple public datasets, demonstrating its generalization ability.

[0080] Based on the above-described preferred embodiments of the present invention, and through the foregoing description, those skilled in the art can make various changes and modifications without departing from the inventive concept. The technical scope of this invention is not limited to the contents of the specification, but must be determined according to the scope of the claims.

Claims

1. A classification method for fall armyworm based on the AGCNet network, characterized in that it includes the following steps: Step 1: obtain images of the fall armyworm at each growth stage as target domain images; Step 2: pre-train the AGCNet model using source domain images to obtain the corresponding weights; Step 3: use the AGCNet model loaded with the weights to classify the target domain images; wherein the AGCNet model includes a feature extractor and a classifier; the feature extractor replaces the BN module of the ResNet10 network with the DG-IBN module, and replaces the second 3×3 convolution module in the main branch of stage 3 and stage 4 of the ResNet10 network with the DACEConv module.

2. The fall armyworm classification method based on the AGCNet network according to claim 1, characterized in that the DG-IBN module includes: introducing a channel gate at the channel level, the channel gate being computed from learnable parameters, a scaling factor, the Sigmoid function, and a scalar obtained by applying global average pooling, a fully connected layer, and a nonlinear transformation to the input features; for the feature map of each sample, performing global average pooling over the spatial dimensions to obtain a channel vector, which is then mapped by a convolution into a spatial gate, each channel having a learnable weight and bias; obtaining the gating coefficient by multiplying the channel gate by the spatial gate and truncating the product to an interval; and producing the DG-IBN output from the BN-calibrated features and the IN-calibrated features according to the gating coefficient.

3. The fall armyworm classification method based on the AGCNet network according to claim 1, characterized in that the DACEConv module includes a fine-grained branch, a context branch, and a dynamic gating mechanism, wherein: the fine-grained branch includes depthwise separable convolution and a low-rank SE attention mechanism; the context branch retains the receptive field expansion and coordinate attention aggregation of RFCAConv and imposes a low-rank bottleneck constraint on the channel dimension; and the outputs of the two branches are fused by the dynamic gating mechanism, which generates input-adaptive weights through global average pooling, a linear transformation, and the Softmax function.

4. The fall armyworm classification method based on the AGCNet network according to claim 1, characterized in that the classifier includes ProtoNet.

5. The fall armyworm classification method based on the AGCNet network according to claim 1, characterized in that the classifier also includes AMG-ProtoNet, which consists of a global branch and a local branch.

6. The fall armyworm classification method based on the AGCNet network according to claim 5, characterized in that, in the global branch, within each task unit, the category prototype of the c-th class is obtained by averaging its support set, and the variance vector of that class is estimated on each feature dimension; given the feature vector of a query sample, the diagonal Mahalanobis distance between the query sample and the c-th class is calculated; within the same task unit, in addition to the variance of each category, a global variance vector is estimated from the support features of all categories, and a more robust variance estimate is obtained by weighted fusion of the two using a shrinkage coefficient; this fused estimate replaces the per-class variance, and the global branch classification score for each category is then calculated from the distance between the query sample and the prototype of each category.
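
The global branch of claim 6 can be sketched as below. The claim's formulas are not reproduced in this text, so several details are assumptions made for illustration: the shrinkage is written as `(1 - lam) * var_c + lam * var_g`, the global variance is taken about the mean of all support features, a small `eps` is added for numerical stability, and the score is the negative distance. The names `global_scores`, `lam`, and `eps` are hypothetical.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def variance(vectors):
    """Per-dimension (population) variance of a list of feature vectors."""
    mu = mean(vectors)
    n = len(vectors)
    return [sum((v[i] - mu[i]) ** 2 for v in vectors) / n
            for i in range(len(mu))]

def global_scores(query, support_by_class, lam=0.5, eps=1e-6):
    """Negative diagonal Mahalanobis distance to each class prototype."""
    all_feats = [v for s in support_by_class.values() for v in s]
    var_g = variance(all_feats)                  # global variance vector
    scores = {}
    for c, s in support_by_class.items():
        mu = mean(s)                             # class prototype
        var_c = variance(s)                      # per-class variance
        var = [(1.0 - lam) * vc + lam * vg       # assumed shrinkage form
               for vc, vg in zip(var_c, var_g)]
        d = sum((q - m) ** 2 / (v + eps)
                for q, m, v in zip(query, mu, var))
        scores[c] = -d                           # assumed score: -distance
    return scores
```

The point of the shrinkage is that a per-class variance estimated from a handful of support samples is noisy; pulling it toward the variance of all support features stabilises the distance when only a few shots per class are available.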

7. The fall armyworm classification method based on the AGCNet network according to claim 6, characterized in that, in the local branch, the cosine similarity between each local descriptor of the query sample and all local descriptors in the support set of the target category is calculated, and top-k aggregation is used to select the most relevant local matching responses; the matching responses at each position are then aggregated to obtain the local branch score of the category; a sample-level adaptive fusion strategy is adopted to weight and combine the classification scores of the global branch and the local branch; and the fusion weights are derived from the confidence scores after temperature scaling.
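
The local matching and fusion of claim 7 can be sketched as follows. The claim's formulas are not reproduced in this text, so the concrete choices are assumptions for illustration only: top-k responses are averaged, positions are aggregated by a mean, confidence is taken as the maximum temperature-scaled Softmax probability of each branch, and the fusion weight is the normalised global confidence. The names `local_score`, `fuse`, `k`, and `T` are hypothetical.

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb + 1e-12)

def local_score(query_desc, support_desc, k=2):
    """Mean over query positions of the mean top-k cosine similarity."""
    responses = []
    for q in query_desc:
        sims = sorted((cosine(q, s) for s in support_desc), reverse=True)[:k]
        responses.append(sum(sims) / len(sims))
    return sum(responses) / len(responses)

def softmax(z, T=1.0):
    m = max(z)
    e = [math.exp((v - m) / T) for v in z]
    s = sum(e)
    return [v / s for v in e]

def fuse(global_scores, local_scores, T=1.0):
    """Per-sample weighted sum of the two branches' class scores."""
    conf_g = max(softmax(global_scores, T))   # assumed confidence measure
    conf_l = max(softmax(local_scores, T))
    w = conf_g / (conf_g + conf_l)            # assumed weight form
    fused = [w * g + (1.0 - w) * l
             for g, l in zip(global_scores, local_scores)]
    return fused, w
```

Under these assumptions, a query for which the global branch is decisive and the local branch is ambiguous gets a weight above 0.5 on the global scores, and the reverse holds when the local branch is the more confident one.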

8. The fall armyworm classification method based on the AGCNet network according to claim 1, characterized in that the growth stages of the fall armyworm include: egg, larva, pupa, and adult.

9. A classification system for fall armyworm based on the AGCNet network, characterized in that it comprises: a memory for storing instructions executable by a processor; and a processor for executing the instructions to implement the fall armyworm classification method based on the AGCNet network as described in any one of claims 1-8.

10. A computer-readable medium storing computer program code, characterized in that the computer program code, when executed by a processor, implements the fall armyworm classification method based on the AGCNet network as described in any one of claims 1-8.