A method and apparatus for weed detection with direction sensitivity and high frequency enhancement

The lightweight weed detection model DHG-Net, constructed using techniques such as dynamic attention strip convolution and high-frequency interactive gating fusion modules, solves the problems of accuracy and robustness in weed detection under complex agricultural scenarios, achieving efficient weed detection.

CN122200367A · Pending · Publication Date: 2026-06-12 · XINJIANG AIR & EARTH INTEGRATION LABORATORY TECHNOLOGY CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
XINJIANG AIR & EARTH INTEGRATION LABORATORY TECHNOLOGY CO LTD
Filing Date
2026-05-06
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing deep learning-based weed detection technologies struggle to maintain high detection accuracy in complex agricultural scenarios with lightweight models, especially in cases of occlusion and overlap, where they are prone to missed detections and misclassifications. Furthermore, traditional models have difficulty distinguishing between weed species with similar morphologies and handling complex backgrounds.

Method used

By employing dynamic attention strip convolution blocks, high-frequency interactive gating fusion modules, and gating asymmetric reparameterizable modules, and through direction-sensitive strip convolution combinations, lightweight spatial attention and channel adaptation, explicit high-frequency detail enhancement, and asymmetric convolution branches, a lightweight weed detection model DHG-Net is constructed, achieving accurate detection of slender targets and robust anti-interference performance in complex backgrounds.

Benefits of technology

While maintaining the model's lightweight nature, it significantly improves the accuracy and robustness of weed detection, reduces computational complexity, and enhances the detection capability in complex scenes, especially the ability to capture slender weeds and suppress background noise.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses a direction-sensitive and high-frequency enhanced weed detection method and device. The method comprises the following steps: dividing a preprocessed corn weed dataset and writing a dataset label file, and taking the labeled dataset file as a dataset under a complex background; constructing a lightweight weed detection model DHG-Net on the basis of a YOLOv13 model from a dynamic attention strip convolution block, a high-frequency interactive gating fusion module and a gated asymmetric reparameterizable module; and detecting weeds in a corn field with the weed detection model DHG-Net. The device comprises a processor and a memory. The application achieves weed detection that is both lightweight and high-precision in complex scenes, improves detection precision through an optimized algorithm, and better adapts to actual scene requirements.

Description

Technical Field

[0001] This invention relates to the field of weed detection, and more particularly to a direction-sensitive and high-frequency enhanced weed detection method and apparatus.

Background Technology

[0002] Weeds are a key biological stressor that restricts healthy plant growth. They compete with crops for light, water, nutrients, and space, posing a serious threat to crop yield and quality. Therefore, achieving accurate and rapid weed detection is of great significance for improving crop yield, reducing herbicide overuse, and promoting the development of precision agriculture.

[0003] In recent years, deep learning-based weed detection technology has made significant progress, especially in automatic feature extraction and target recognition, demonstrating superior performance compared to traditional methods. However, maintaining high detection accuracy while ensuring lightweight models remains a key challenge in practical agricultural applications. Lightweight models typically reduce computational overhead by decreasing the number of network layers, pruning, or using depthwise separable convolutions to meet the deployment requirements of edge devices (e.g., drones, mobile robots). However, this often comes at the cost of sacrificing feature representation capabilities, leading to increased false negative rates for small or poorly shaped weeds.

[0004] Currently, many scholars are dedicated to improving the structure of deep learning algorithms, such as introducing attention mechanisms, multi-scale feature fusion, or knowledge distillation, to enhance the robustness and adaptability of weed detection. However, although these methods have achieved significant accuracy improvements in controlled environments or specific datasets, their generalization ability in real-world, complex agricultural scenarios remains insufficient. Specifically, factors such as different growth stages (seedling stage, vigorous growth stage, flowering stage), temperature fluctuations, light variations, nutrient gradients, and water differences can significantly alter the growth status and morphological performance of plants within a region, leading to significant phenotypic plasticity of the same weed species under different environmental conditions. This inter-individual variation makes refined feature detection extremely difficult.

[0005] Even more challenging is the fact that some weed species share highly similar morphological characteristics. For example, Chenopodium album and Chenopodium glaucum are extremely similar in leaf shape, color, and plant structure, making them difficult to distinguish using conventional visual features. When these similar species are combined with intra-individual differences (e.g., leaf curling and color variations caused by the environment), the difficulty of detection is further increased, and traditional models are prone to misclassification or localization bias.

[0006] Furthermore, existing research has made significant progress in detecting individual weeds, enabling the identification of independently growing targets to a certain extent. However, in real-world farmland environments, crops and weeds often overlap and intertwine, sometimes forming multi-layered structures. In this complex context, models not only need to identify targets but also address challenges such as local occlusion, blurred boundaries, and multi-scale overlap. Most current lightweight models struggle to simultaneously meet the requirements for high-resolution modeling and real-time inference of overlapping regions within the constraints of limited parameters and computational resources, often resulting in a significant decrease in accuracy.

[0007] In summary, future research on weed detection in complex farmland scenarios should focus on the following directions: constructing diverse datasets covering various environmental conditions, weed species, and occlusion / overlap situations; developing adaptive feature extraction networks that balance lightweight design with high expressive power; and exploring weakly supervised or self-supervised learning mechanisms to improve the model's generalization ability under conditions of varied morphology and cluttered backgrounds. Only by achieving breakthroughs in these areas can weed detection technology be effectively deployed and reliably applied in real-world agricultural systems.

Summary of the Invention

[0008] This invention provides a direction-sensitive and high-frequency enhanced weed detection method and device. The invention captures the long-range structure of slender targets through dynamic attention strip convolution with direction-sensitive strip combinations, and uses lightweight spatial attention and channel adaptation to focus on key regions. A high-frequency interactive gating fusion mechanism explicitly enhances low-level details and lets them interact bidirectionally with high-level semantics to generate discriminative fusion features. Furthermore, a gated asymmetric reparameterizable structure is used during training to introduce asymmetric convolution branches and sample-related channel gating, which enhances expressive power. During inference, strict reparameterization folds the structure into a single convolutional layer, so that weed detection in complex scenarios is both lightweight and high-precision. The core idea of this model is to improve detection accuracy through algorithm optimization while keeping the parameter count and the computational cost low, so that it better meets the needs of real-world scenarios. See the description below for details.

[0009] In a first aspect, a direction-sensitive and high-frequency enhanced weed detection method is provided, the method comprising:

[0010] Dividing a preprocessed corn weed dataset and writing a dataset label file, and using the labeled dataset file as a dataset under a complex background.

[0011] Based on the dynamic attention strip convolutional block, high-frequency interactive gating fusion module, and gating asymmetric reparameterizable module, a lightweight weed detection model DHG-Net is constructed on the basis of the YOLOv13 model.

[0012] Weed detection in cornfields is performed using the DHG-Net weed detection model.

[0013] The dynamic attention strip convolutional block is:

[0014] At the spatial modeling level, a combination of vertical strip convolution and horizontal strip convolution is used to independently extract long-range structural information in two orthogonal principal directions.

[0015] At the saliency recalibration level, a lightweight spatial attention mechanism is introduced after dynamic strip convolution. By combining attention calculation with Softmax weight mapping, a spatial attention distribution map is constructed.

[0016] At the channel adaptation level, the attention distribution is dynamically transformed into a gating factor that can participate in feature modulation, so that the attention signal can not only act on the spatial location, but also be differentiated according to the channel dimension.

[0017] The calculation formula for the dynamic attention strip convolutional block is as follows:

[0018] $$y = \mathrm{BN}\big(W_1 * \big[\mathrm{DropPath}\big(\mathrm{BN}(W_{out} * f_i)\big) + \mathrm{GeLU}\big(\mathrm{BN}(W_1 * x)\big)\big]\big) + x$$

[0019] Where $y$ is the final output feature map of the module; $x$ is the input feature map of the module; BN is batch normalization; $W_i$ is the weight of the main-path linear transformation or convolutional layer; $W_{out}$ is the output projection weight of the fusion path; $f_i$ is the feature of the i-th branch; GeLU is the Gaussian error linear unit activation function; DropPath is stochastic depth regularization.

[0020] The high-frequency interactive gating fusion module is as follows:

[0021] A Laplace high-pass filter is applied to the spatially enhanced features channel by channel, and then injected into the original features through residual superposition. A learnable scaling factor is introduced to control the enhancement intensity.

[0022] Based on two high-frequency enhancement features, two sets of channel gating coefficients are generated, one for lower-level guidance of higher-level and the other for higher-level inverse constraint of lower-level, to achieve dynamic bidirectional alignment of details and semantics during the fusion process.

[0023] The channel-level fusion weight map is generated from the bidirectional cross-referencing features to perform fine-grained recalibration of the fusion representation; at the same time, coordinate attention is introduced to recalibrate the orientation and position information.

[0024] The features after two gating recalibrations are concatenated along the channel dimension and then shaped by lightweight convolution to obtain the module output. This output can be directly used for skip connections in the decoder or for subsequent feature fusion units.

[0025] The gated asymmetric reparameterizable module is as follows:

[0026] In addition to the standard 3×3 and 1×1 convolution branches, 1×3 and 3×1 asymmetric convolution branches are introduced, and identity residual batch normalization is preserved under shape matching conditions. All branches are aligned with the stride of the backbone to support downsampling operations.

[0027] The input features are subjected to global average pooling, and channel-level dynamic gating coefficients are generated for each branch through a multilayer perceptron. During the training phase, the outputs of each branch are weighted channel by channel.

[0028] The gating coefficients of each branch are averaged over the batch dimension and updated synchronously to the corresponding EMA vector;

[0029] The outputs of each branch are weighted and summed according to the corresponding dynamic gating coefficients, then added to the identity residual path, and after passing through the activation function, an enhanced representation is obtained, realizing multi-branch collaborative expression;

[0030] Conv-BN fusion is performed on each branch, expanding the 1×1 convolutional kernel to 3×3 size through zero padding at the center, and padding the 1×3 and 3×1 asymmetric convolutional kernels to 3×3 in the height and width directions, respectively. At the same time, the identity residual BN is equivalent to a unit 3×3 convolutional kernel with a center value of 1 and BN fusion is completed. Then, the channel-level scaling of the fused convolutional kernels and biases of each branch is performed using the calibrated EMA gating coefficients. Finally, the convolutional kernels and biases of all branches are summed element-wise to obtain a single 3×3 convolutional kernel and bias.

[0031] The multi-branch dynamic gating structure during training is replaced losslessly with a single-layer 3×3 convolution.
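The folding described in [0030] and [0031] can be illustrated with a minimal single-channel sketch. It is not the implementation of this application: it assumes that Conv-BN fusion has already been applied to every branch, that each branch has one static gate (standing in for the calibrated EMA coefficient), that the identity path is not gated, and it leaves out biases and the activation function. The names `conv2d_same`, `pad_to_3x3` and `fold`, and all kernel and gate values, are hypothetical. Exact fractions are used so that the two results can be compared without rounding error.

```python
from fractions import Fraction


def conv2d_same(image, kernel):
    """Single-channel, stride-1 cross-correlation with zero padding (odd kernel sizes)."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(image), len(image[0])
    out = [[Fraction(0)] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < h and 0 <= cc < w:
                        out[r][c] += kernel[i][j] * image[rr][cc]
    return out


def pad_to_3x3(kernel):
    """Zero-pad a 1x1, 1x3, 3x1 or 3x3 kernel so that it sits centred in a 3x3 kernel."""
    kh, kw = len(kernel), len(kernel[0])
    top, left = (3 - kh) // 2, (3 - kw) // 2
    padded = [[0] * 3 for _ in range(3)]
    for i in range(kh):
        for j in range(kw):
            padded[top + i][left + j] = kernel[i][j]
    return padded


def fold(branches, gates):
    """Scale each padded kernel by its static gate, add the identity kernel, and sum."""
    folded = [[Fraction(0)] * 3 for _ in range(3)]
    folded[1][1] += 1  # identity path expressed as a 3x3 kernel with centre value 1
    for name, kernel in branches.items():
        padded = pad_to_3x3(kernel)
        for i in range(3):
            for j in range(3):
                folded[i][j] += gates[name] * padded[i][j]
    return folded


# Hypothetical kernels and static gates for the four convolution branches.
branches = {
    "3x3": [[1, 0, -1], [2, 0, -2], [1, 0, -1]],
    "1x1": [[3]],
    "1x3": [[1, -2, 1]],
    "3x1": [[1], [4], [1]],
}
gates = {"3x3": Fraction(1, 2), "1x1": Fraction(1, 4),
         "1x3": Fraction(3, 4), "3x1": Fraction(1, 8)}
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 8, 7, 6], [5, 4, 3, 2]]

# Training-style evaluation: run every branch, weight it, and add the identity path.
multi_branch_out = [[Fraction(v) for v in row] for row in image]
for name, kernel in branches.items():
    branch_out = conv2d_same(image, kernel)
    for r in range(len(image)):
        for c in range(len(image[0])):
            multi_branch_out[r][c] += gates[name] * branch_out[r][c]

# Inference-style evaluation: a single 3x3 convolution with the folded kernel.
folded_out = conv2d_same(image, fold(branches, gates))
print(multi_branch_out == folded_out)
```

The equality holds because convolution is linear in its kernel, which is why the folding can only be applied to the part of the block that precedes the activation function.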

[0032] In a second aspect, a direction-sensitive and high-frequency enhanced weed detection device, the device comprising: a processor and a memory, the memory storing program instructions, the processor calling the program instructions stored in the memory to cause the device to perform the method described in any one of the first aspects.

[0033] In a third aspect, a computer-readable storage medium storing a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to perform the method described in any one of the first aspects.

[0034] The beneficial effects of the technical solution provided by this invention are:

[0035] 1. This invention uses a sequential combination of vertical and horizontal strip convolutions to accurately focus on the strip-shaped distribution features of high aspect ratio targets such as weed leaves and stems, avoiding the background noise introduced by square convolutions while maintaining the inherent lightweight advantage of strip convolutions (the number of parameters is much smaller than that of square convolutions of the same size); it introduces a lightweight spatial attention mechanism to adaptively enhance significant target regions, suppress background noise, and improve semantic focusing ability; and it achieves pixel-wise and channel-wise precise matching of attention and feature channels through a channel expansion mapping mechanism, alleviating the mismatch problem in training and improving training stability and feature expression consistency.

[0036] 2. This invention explicitly enhances high-frequency details through Laplacian high-pass filtering and residual injection, combined with learnable scaling coefficients, making the module more sensitive to edges and fine-grained structures, effectively solving the problem that infrared small targets are easily overwhelmed by noise due to their reliance on high-frequency details; through a bidirectional interactive gating mechanism of low-level guiding high-level and high-level inversely constraining low-level, dynamic alignment of details and semantics is achieved, avoiding semantic mismatch and information redundancy caused by one-way injection or static splicing, significantly improving the discriminativeness of fused features; through joint recalibration of channel-level fusion maps and coordinate attention, target-related channels are enhanced, redundant channels are suppressed, and directional sensitivity and background suppression capabilities are improved, providing enhanced feature representations with clear details, semantic alignment, and stable background suppression for small target detection in complex infrared scenes;

[0037] 3. This invention significantly improves the modeling ability for fine-grained features in the horizontal and vertical directions and the directionality of edge textures by introducing 1×3 and 3×1 asymmetric convolution branches, thus compensating for the problem of insufficient representation of asymmetric directional information. Through sample-related dynamic gating generation and exponential moving average calibration, it enhances the module's adaptive adjustment capability to complex scenes and provides static scaling coefficients for inference deployment. Through rigorous training-state branch fusion and pre-deployment reparameterization folding, the multi-branch dynamic gating structure is losslessly converted into a single-layer 3×3 convolution. During the inference stage, there is no need for dynamic gating computation and multi-path overhead, maximizing inference speed while maintaining high-performance representation during training, thus solving the problem of inference deployment burden caused by multi-branch structures.

[0038] 4. After introducing the three modules DASConv, HF-IGate, and GARBlock, the detection models for weeds such as *Amaranthus retroflexus*, *Amaranthus alkekengi*, *Portulaca oleracea*, *Chenopodium album*, *Chenopodium glaucum*, and *Setaria viridis* achieved a balance between lightweight design and high performance. DASConv accurately focuses on the strip-like features of the slender leaves and stems of weeds through sequential combination of strip convolutions, avoiding background noise. It also adaptively enhances the target region by combining lightweight attention and channel expansion mechanisms, improving the stability of feature representation. HF-IGate enhances high-frequency details through high-pass filtering and residual injection, making weed edges more sensitive. It uses bidirectional gating to achieve dynamic alignment of details and semantics, and suppresses redundant channels and enhances target channels through joint recalibration of channel and coordinate attention. GARBlock improves the fine-grained modeling capability in the horizontal and vertical directions through asymmetric convolution branches. It enhances scene adaptability by combining dynamic gating and exponential moving average, and maximizes inference speed by converting multi-branch losslessly into single-layer convolution through reparameterized folding. The three technologies work together to improve detection accuracy by 1.3%, reduce the number of model parameters by 21%, and reduce computational complexity by 17%, significantly reducing the deployment burden of edge devices. At the same time, they enhance the detection accuracy of slender weeds, edge capture capabilities, and robustness against interference in complex backgrounds.

Attached Figure Description

[0039] Figure 1 is a flowchart of a direction-sensitive and high-frequency enhanced weed detection method;

[0040] Figure 2 is a network structure diagram of a direction-sensitive and high-frequency enhanced weed detection method;

[0041] Figure 3 is a schematic diagram of the DASConv module;

[0042] Figure 4 is a schematic diagram of the HF-IGate module;

[0043] Figure 5 is a schematic diagram of the GARBlock module.

Detailed Implementation

[0044] To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention will be described in further detail below.

[0045] In recent years, deep learning-based weed detection technology has made significant progress, but maintaining high detection accuracy while keeping it lightweight remains a major challenge. The YOLO series of algorithms is widely used due to its efficiency. However, existing YOLO algorithms struggle to achieve low parameter and computational costs while maintaining high accuracy when detecting occluded and overlapping weeds. This invention proposes a direction-sensitive and high-frequency enhanced weed detection model, DHG-Net.

[0046] First, we propose the Dynamic Attention Strip Convolutional Block (DASConv), which uses a combination of direction-sensitive vertical and horizontal strip convolutions to capture long-range structural features and introduces lightweight spatial attention and channel adaptation mechanisms to achieve adaptive enhancement and semantic focus of key regions. Second, we propose the High-Frequency Interactive Gated Fusion Module (HF-IGate), which strengthens the dynamic alignment of low-level details and high-level semantics through explicit high-frequency enhancement and bidirectional interactive gating mechanisms, and uses channel-level fusion maps and coordinate attention for fine-grained recalibration to generate fusion features with strong discriminative power and stable background suppression. Third, we propose the Gated Asymmetric Reparameterizable Module (GARBlock), which introduces asymmetric convolution branches and sample-related channel gating during training to enhance direction sensitivity and adaptive expressive power, and uses a strict reparameterization mechanism to fold all branches into a single 3×3 convolution without loss, achieving both high performance and real-time deployment during the inference stage.

[0047] The meanings of the terms used in the embodiments of this invention are given below:

[0048] 1. Strip Convolution: Strip convolution is a convolution operation that uses a non-square kernel, typically 1×N or N×1, instead of the traditional square kernel, making it more suitable for weed detection [1].

[0049] 2. Channel Expansion Mapping Mechanism: The channel expansion mapping mechanism increases the dimensionality of the spatial attention map to the same number of channels as the feature map through 1×1 convolution, generating channel-specific and spatially independent gating factors. This achieves precise matching between attention weights and each feature channel, thereby alleviating the mismatch problem during training and improving the consistency of feature representation [2].

[0050] 3. Laplacian High-Pass Filter: The Laplacian high-pass filter is a second-order differential operator that quickly detects high-frequency details such as edges and corners with drastic gray-level changes by calculating the second derivative of image pixels in the neighborhood, while suppressing slowly changing low-frequency background components [3].

[0051] 4. Residual Stacking: Residual stacking (also called residual superposition in this description) refers to the operation of adding the transformed features to the original input features element-wise. Its core purpose is to allow the network to focus on learning the "residual difference" between the input and output without losing the original information, thereby alleviating gradient vanishing and improving training stability [4].

[0052] 5. Channel-level fusion: Channel-level fusion refers to assigning independent weights to each feature channel, adaptively enhancing important channels and suppressing redundant channels [5].

[0053] 6. Multilayer Perceptron: A multilayer perceptron is a feedforward neural network in which each neuron in one layer is fully connected to the neurons of the adjacent layers. It introduces nonlinear transformation capabilities through nonlinear activation functions (such as ReLU and Sigmoid), and can be used for tasks such as feature extraction, classification and regression, or dynamic gating coefficient generation [6].

[0054] 7. Identity Residual Batch Normalization: Identity residual batch normalization refers to the process in residual connections where, when the input and output shapes match, the input features (identity mapping) are directly added to the features transformed by convolution and similar operations. Batch normalization (BN) is also introduced along this identity path to maintain scale consistency between the residual branch and the main branch during training, thereby stabilizing the gradient flow and accelerating convergence [7].

[0055] Example 1

[0056] A direction-sensitive and high-frequency enhanced weed detection method, comprising the following steps:

[0057] Step 101: Capture images of weeds in the cornfield, preprocess the collected image data, divide the preprocessed weed dataset and write a dataset label file, and use the labeled dataset file as dataset A under the complex background.
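The dataset division in Step 101 could be sketched as follows. This is only an illustration: the description does not state the split proportions or the label file format, so the 80/10/10 split, the fixed random seed, the file names and the function name `split_dataset` are all assumptions.

```python
import random


def split_dataset(image_names, percents=(80, 10, 10), seed=0):
    """Shuffle image names reproducibly and divide them into train/val/test subsets.

    `percents` are integer percentages (assumed values, not taken from the source).
    """
    names = sorted(image_names)
    random.Random(seed).shuffle(names)
    n_train = len(names) * percents[0] // 100
    n_val = len(names) * percents[1] // 100
    return {
        "train": names[:n_train],
        "val": names[n_train:n_train + n_val],
        "test": names[n_train + n_val:],
    }


# Hypothetical file names standing in for the collected cornfield images.
all_images = [f"corn_weed_{i:04d}.jpg" for i in range(100)]
subsets = split_dataset(all_images)
for subset_name, members in subsets.items():
    print(subset_name, len(members))
```

Each subset list can then be written to its own text file, one image path per line, in whatever layout the chosen training framework expects.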

[0058] Step 102: Based on the complex characteristics of dataset A, three innovative modules are proposed, namely:

[0059] ① Based on the direction-sensitive sequential combination design of vertical and horizontal strip convolution, it accurately captures the long-range structural features of slender targets while maintaining high efficiency, and introduces a lightweight spatial attention and channel adaptation mechanism to achieve adaptive enhancement and semantic focusing of key regions, thereby obtaining a dynamic attention strip convolution module.

[0060] ② Based on the design of explicit high-frequency enhancement and bidirectional interactive gating mechanism, the dynamic alignment and mutual modulation of low-level details and high-level semantics are strengthened, and fine-grained recalibration is performed using channel-level fusion graph and coordinate attention, thereby generating fusion features with strong discriminativeness and stable background suppression, and obtaining high-frequency interactive gating fusion module;

[0061] ③ Based on the introduction of asymmetric convolution branches and sample-related channel gating during the training period (statically calibrated by EMA) and the use of a strict reparameterization mechanism, all branches are folded into a single 3×3 convolution design without loss. This achieves both high performance and real-time deployment during the inference stage while enhancing orientation sensitivity and adaptive expressive ability, thus obtaining a gated asymmetric reparameterizable module.
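The EMA calibration mentioned in ③ can be sketched as below: the per-sample gating coefficients of one branch are averaged over the batch and blended into a running vector that later serves as the static scaling for deployment. The momentum value and the function names are assumptions, since the description does not give them.

```python
def batch_mean(gates_batch):
    """Average per-sample gate vectors over the batch dimension."""
    n = len(gates_batch)
    channels = len(gates_batch[0])
    return [sum(sample[ch] for sample in gates_batch) / n for ch in range(channels)]


def ema_update(ema, gates_batch, momentum=0.9):
    """Blend the batch-averaged gates into the running EMA vector (momentum is assumed)."""
    mean = batch_mean(gates_batch)
    return [momentum * e + (1.0 - momentum) * m for e, m in zip(ema, mean)]


# One branch with two channels, a batch of two samples (hypothetical values).
ema_gates = [0.5, 0.5]
ema_gates = ema_update(ema_gates, [[1.0, 0.0], [1.0, 0.0]], momentum=0.5)
print(ema_gates)
```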

[0062] Step 103: Based on the YOLOv13n model, and according to the combination of dynamic attention strip convolutional blocks, high-frequency interactive gating fusion modules, and gating asymmetric reparameterization modules proposed in Step 102, a lightweight weed detection model DHG-Net is constructed on the basis of YOLOv13.

[0063] Among them, the core objective of Dynamic Attention Strip Convolutional Block (DASConv) is to improve the model's ability to adaptively adjust to differences in input content, thereby accurately capturing the morphological features of slender targets under the premise of lightweight design. To this end, it has been designed in a coordinated manner from three dimensions: orientation-sensitive spatial modeling, saliency recalibration, and channel adaptation.

[0064] First, at the spatial modeling level, DASConv abandons the uniform receptive field of traditional square convolution and adopts a sequential combination of vertical and horizontal strip convolution to independently extract long-range structural information in two orthogonal principal directions. This direction-sensitive strip convolution combination not only maintains the inherent computational efficiency of strip convolution (the number of parameters is much smaller than that of square convolution of the same size), but also can accurately focus on the strip distribution characteristics of slender targets such as weed leaves and stems and high aspect ratio regions, effectively avoiding the drawback of square convolution introducing irrelevant background noise when modeling across regions.
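The parameter saving of strip convolution can be checked with simple arithmetic. The sketch below compares one k×k convolution with a k×1 convolution followed by a 1×k convolution, counting weights only (no bias). The channel count of 64 and the kernel length of 7 are hypothetical values chosen for illustration, not values disclosed in this description.

```python
def conv_params(c_in, c_out, kernel_h, kernel_w):
    """Number of weights in a dense convolution layer without bias."""
    return c_in * c_out * kernel_h * kernel_w


channels, k = 64, 7  # assumed values for illustration

square = conv_params(channels, channels, k, k)
strip_pair = (conv_params(channels, channels, k, 1)      # vertical strip
              + conv_params(channels, channels, 1, k))   # horizontal strip

print(square, strip_pair, strip_pair / square)
```

For equal input and output channel counts the ratio is 2/k, so the saving grows with the strip length.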

[0065] Secondly, at the saliency recalibration level, DASConv introduces a lightweight spatial attention mechanism after dynamic strip convolution. By combining attention calculation with Softmax weight mapping, a spatial attention distribution map is constructed with extremely low computational overhead. This strategy avoids the quadratic complexity problem in traditional self-attention mechanisms, but can achieve adaptive enhancement of key regions in the input image. It assigns higher response weights to weed targets with significant morphology and clear structure, while suppressing background or noise regions, thereby significantly improving the semantic focusing ability of the model, which is different from the uniform convolution method that treats all regions equally.
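A minimal sketch of a Softmax-based spatial attention map is given below. The description does not specify how the attention scores are computed, so scoring each location by its mean response over channels is an assumption, as is the function name. The point of the sketch is that the cost grows linearly with the number of pixels, unlike pairwise self-attention.

```python
import math


def spatial_softmax_attention(features):
    """features: list of C channel maps, each H x W (nested lists).

    Returns the H x W attention map and the reweighted features.
    """
    c, h, w = len(features), len(features[0]), len(features[0][0])
    # Score each location by its mean response over channels (illustrative choice).
    scores = [[sum(features[ch][r][col] for ch in range(c)) / c for col in range(w)]
              for r in range(h)]
    peak = max(max(row) for row in scores)  # subtract the maximum for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    attention = [[e / total for e in row] for row in exps]
    weighted = [[[features[ch][r][col] * attention[r][col] for col in range(w)]
                 for r in range(h)] for ch in range(c)]
    return attention, weighted


# Two channels of size 2 x 3 with a strong response at row 1, column 1.
features = [
    [[0.1, 0.2, 0.1], [0.0, 2.0, 0.1]],
    [[0.0, 0.1, 0.2], [0.1, 3.0, 0.0]],
]
attention, weighted = spatial_softmax_attention(features)
print(attention)
```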

[0066] Finally, at the channel adaptation level, in order to ensure that the spatial attention weights can achieve accurate pixel-by-pixel and channel-by-channel matching with the feature channels, DASConv designed a channel expansion mapping mechanism, which dynamically transforms the attention distribution into a gating factor that can participate in feature modulation. This allows the attention signal to not only act on the spatial location, but also to be differentiated according to the channel dimension. This mechanism effectively alleviates the mismatch problem that may occur between attention and features during training, and improves training stability and consistency of feature representation.
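The channel expansion mapping can be sketched as a 1×1 convolution from one attention channel to C gate channels, which reduces to one scale and one offset per output channel. The sigmoid that turns the result into a gating factor, the function names and the numeric values are assumptions made for illustration.

```python
import math


def expand_attention_to_gates(attention, weights, biases):
    """1x1 convolution from a single attention channel to C gate channels, then a sigmoid."""
    return [
        [[1.0 / (1.0 + math.exp(-(w_c * a + b_c))) for a in row] for row in attention]
        for w_c, b_c in zip(weights, biases)
    ]


def modulate(features, gates):
    """Multiply features by gates pixel by pixel and channel by channel."""
    return [
        [[f * g for f, g in zip(f_row, g_row)] for f_row, g_row in zip(f_map, g_map)]
        for f_map, g_map in zip(features, gates)
    ]


attention = [[0.1, 0.6], [0.2, 0.1]]   # H x W spatial attention map
features = [                           # C x H x W feature maps, C = 2
    [[2.0, 4.0], [6.0, 8.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
weights, biases = [0.0, 4.0], [0.0, -1.0]  # hypothetical learned 1x1 parameters

gates = expand_attention_to_gates(attention, weights, biases)
modulated = modulate(features, gates)
print(modulated)
```

With a zero weight the first channel ignores the attention map and is simply halved, while the second channel is gated more strongly where the attention is high, which is the channel-wise differentiation the paragraph describes.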

[0067] In summary, DASConv achieves efficient adaptive adjustment to differences in input content through the synergistic effect of three mechanisms: orientation-sensitive strip convolutional sequence combination, lightweight spatial attention-guided saliency recalibration, and attention dimension-aligned trainable channel adaptation mechanism. This significantly enhances the feature capture capability for slender weed targets while maintaining lightweight characteristics.

[0068] The formula for the DASConv module is shown below:

[0069] $$y = \mathrm{BN}\big(W_1 * \big[\mathrm{DropPath}\big(\mathrm{BN}(W_{out} * f_i)\big) + \mathrm{GeLU}\big(\mathrm{BN}(W_1 * x)\big)\big]\big) + x$$

[0070] Where $y$ is the final output feature map of the module; $x$ is the input feature map of the module; BN is batch normalization; $W_i$ is the weight of the main-path linear transformation or convolutional layer; $W_{out}$ is the output projection weight of the fusion path; $f_i$ is the specific feature of the i-th branch; GeLU is the Gaussian error linear unit activation function; DropPath is stochastic depth regularization.

[0071] Among them, the high-frequency interactive gating fusion module (HF-IGate) proposes a collaborative optimization fusion mechanism to address key issues in infrared small target detection, such as high-frequency details being easily submerged by background noise, lack of bidirectional transmission between low-level details and high-level semantics, and overly coarse-grained fusion weights.

[0072] First, in the input preparation stage, the high-frequency interactive gating fusion module upsamples the high-level features to the spatial size of the low-level features, and uses 1×1 convolution to map the two features to the same number of channels, establishing a fusionable common representation space; then, spatial attention maps are generated for the two features respectively to highlight the responses in key regions and provide a cleaner spatial prior for subsequent processing.

[0073] Based on this, the high-frequency interactive gating fusion module has achieved three core innovations:

[0074] Firstly, explicit modeling for high-frequency detail enhancement involves applying a Laplace high-pass filter to each channel of the spatially enhanced features to extract high-frequency components, and then injecting them into the original features through residual superposition. At the same time, a learnable scaling factor is introduced to control the enhancement intensity, making the module more sensitive to edges and fine-grained structures, balancing robustness and trainability, and effectively solving the problem that infrared small targets are easily overwhelmed by noise due to their reliance on high-frequency details.
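The explicit high-frequency enhancement can be sketched as x + α · Laplacian(x), applied channel by channel. The 4-neighbour kernel, the replicate (edge-clamping) border handling and the fixed value of α are assumptions; in the module α would be a learnable scaling factor.

```python
LAPLACIAN = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]  # 4-neighbour kernel, one common choice


def high_pass(channel):
    """3x3 Laplacian response of one channel with replicate border handling."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    rr = min(max(r + i - 1, 0), h - 1)  # clamp row index to the image
                    cc = min(max(c + j - 1, 0), w - 1)  # clamp column index to the image
                    acc += LAPLACIAN[i][j] * channel[rr][cc]
            out[r][c] = acc
    return out


def enhance(features, alpha):
    """Residual injection of the high-frequency component: x + alpha * Laplacian(x)."""
    enhanced = []
    for channel in features:
        detail = high_pass(channel)
        enhanced.append([[x + alpha * d for x, d in zip(x_row, d_row)]
                         for x_row, d_row in zip(channel, detail)])
    return enhanced


point = [[[0, 0, 0], [0, 1, 0], [0, 0, 0]]]  # one channel with a single bright pixel
flat = [[[2, 2], [2, 2]]]                    # one channel without any detail
print(enhance(point, alpha=0.5))
print(enhance(flat, alpha=0.5))
```

The bright pixel is amplified and its neighbours are pushed down, while the flat region is left unchanged, which is the edge-sharpening behaviour the paragraph relies on.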

[0075] Second, cross-layer mutual transmission and fusion with bidirectional interactive gating: based on the two high-frequency enhanced features, two sets of channel gating coefficients are generated, one by which the lower layer guides the higher-layer features and one by which the higher layer constrains the lower-layer features. This achieves dynamic bidirectional alignment of details and semantics during fusion, avoids the semantic mismatch and information redundancy caused by traditional one-way injection or static splicing, and significantly improves the discriminative power of the fused features.

[0076] Third, joint recalibration with a channel-level fusion map and coordinate attention: a channel-level fusion weight map is generated from the bidirectionally exchanged features and used for fine-grained recalibration of the fused representation, enhancing target-related channels and suppressing redundant ones. At the same time, coordinate attention is introduced to recalibrate orientation and position information, further improving the orientation sensitivity and background suppression ability of the fused features.

[0077] Finally, the features after two gated recalibrations are concatenated along the channel dimension and then shaped by lightweight convolution to obtain the module output. This output can be directly used for the skip connection of the decoder or subsequent feature fusion units, providing enhanced feature representations with clear details, semantic alignment, and stable background suppression for small target detection in complex infrared scenes.
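The high-frequency enhancement described in paragraph [0074] can be illustrated with a minimal single-channel sketch in plain Python. The 4-neighbor Laplacian kernel, the zero padding at the border, and the fixed value of `alpha` (standing in for the learnable scaling factor) are assumptions made for illustration; the source does not specify them.

```python
# Assumed 4-neighbor Laplacian high-pass kernel (not specified in the source).
LAPLACIAN = [[0, -1, 0],
             [-1, 4, -1],
             [0, -1, 0]]


def conv3x3_same(channel, kernel):
    """Cross-correlate one 2D channel with a 3x3 kernel using zero padding."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di + 1][dj + 1] * channel[y][x]
            out[i][j] = acc
    return out


def enhance_high_frequency(channel, alpha=0.5):
    """Residual injection: x + alpha * Laplacian(x), applied to one channel."""
    high = conv3x3_same(channel, LAPLACIAN)
    return [[x + alpha * hf for x, hf in zip(row, hrow)]
            for row, hrow in zip(channel, high)]


# A flat 5x5 patch and the same patch with one bright pixel in the center.
flat = [[1.0] * 5 for _ in range(5)]
spot = [row[:] for row in flat]
spot[2][2] = 2.0

flat_out = enhance_high_frequency(flat)
spot_out = enhance_high_frequency(spot)
```

In this sketch the interior of the flat patch is unchanged, while the isolated bright pixel is amplified and its immediate neighbors are attenuated, which is the edge-sharpening behavior the paragraph relies on.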

[0078] The Gated Asymmetric Reparameterizable Module (GARBlock) addresses key issues such as under-expression of asymmetric directional information, the inference deployment burden caused by multi-branch structures, the difficulty of reusing dynamic gating mechanisms, and a lack of rigor in the reparameterization folding procedure. It uses a gated asymmetric reparameterizable convolution (GARConv) design that balances enhanced expressiveness during training with lightweight, efficient inference.

[0079] The specific processing procedure consists of six steps:

[0080] The first step is to construct a multi-branch structure: in addition to the standard 3×3 and 1×1 convolution branches, 1×3 and 3×1 asymmetric convolution branches are introduced, and an identity residual batch normalization (BN) branch is retained when the input and output shapes match. All branches are aligned with the stride of the main branch to support downsampling, which significantly improves the modeling of fine-grained features in the horizontal and vertical directions and of the directionality of edge textures.

[0081] The second step is sample-dependent dynamic gating generation: the input features are subjected to global average pooling (GAP), and channel-level dynamic gating coefficients are generated for each branch by a multilayer perceptron (MLP). During the training phase, the output of each branch is weighted channel by channel to enhance the module's ability to adapt to complex scenes and diverse samples.

[0082] The third step is the Exponential Moving Average (EMA) calibration update—the gating coefficients of each branch are averaged over the batch dimension and synchronously updated to the corresponding EMA vector, providing static scaling calibration coefficients for the inference deployment phase.

[0083] The fourth step is to fuse the training branch outputs—the outputs of each branch are weighted and summed according to the corresponding dynamic gating coefficients, and then added to the identity residual path. After passing through the activation function, the enhanced representation is obtained, realizing multi-branch collaborative expression.

[0084] The fifth step is strict reparameterization folding before deployment. First, Conv-BN fusion is performed on each branch: the 1×1 convolutional kernel is placed at the center of a 3×3 kernel and zero-padded around it, and the 1×3 and 3×1 asymmetric convolutional kernels are zero-padded to 3×3 in the height and width directions respectively. At the same time, the identity residual BN is expressed as an equivalent 3×3 convolutional kernel whose center value is 1 and is fused with its BN. Then, the fused convolutional kernels and biases of each branch are scaled at the channel level using the calibrated EMA gating coefficients. Finally, the convolutional kernels and biases of all branches are summed element-wise to obtain a single 3×3 convolutional kernel and bias.

[0085] The sixth step is one-click deployment switching: a deployment switch function is called to replace the multi-branch dynamic gating structure used during training with a single-layer 3×3 convolution. No dynamic gating computation or multi-path overhead is incurred during inference. The expressive power gained during training is retained, inference speed is maximized, and the module is well suited to deployment on edge devices.
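The kernel padding, gate scaling, and summation of the fifth step can be checked numerically with a minimal sketch. It assumes a single input and output channel, stride 1, scalar gates standing in for the EMA-calibrated channel coefficients, and branches whose BN has already been fused into the kernel and bias; all numeric values and function names are hypothetical.

```python
def pad_to_3x3(kernel):
    """Zero-pad a 1x1, 1x3, 3x1 or 3x3 kernel to 3x3, keeping it centered."""
    kh, kw = len(kernel), len(kernel[0])
    top, left = (3 - kh) // 2, (3 - kw) // 2
    out = [[0.0] * 3 for _ in range(3)]
    for i in range(kh):
        for j in range(kw):
            out[top + i][left + j] = float(kernel[i][j])
    return out


def conv_same(x, kernel, bias=0.0):
    """Single-channel, stride-1 cross-correlation with centered zero padding."""
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[bias] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for a in range(kh):
                for b in range(kw):
                    y, xx = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= xx < w:
                        out[i][j] += kernel[a][b] * x[y][xx]
    return out


def fold(branches):
    """Fold (kernel, bias, gate) branches into one 3x3 kernel and one bias."""
    kernel = [[0.0] * 3 for _ in range(3)]
    bias = 0.0
    for k, b, g in branches:
        padded = pad_to_3x3(k)
        for i in range(3):
            for j in range(3):
                kernel[i][j] += g * padded[i][j]
        bias += g * b
    return kernel, bias


# Hypothetical BN-fused branches: (kernel, bias, calibrated gate).
branches = [
    ([[1, 0, -1], [2, 0, -2], [1, 0, -1]], 0.1, 0.5),  # 3x3 branch
    ([[2]], 0.2, 0.3),                                  # 1x1 branch
    ([[1, 2, 1]], 0.0, 0.8),                            # 1x3 branch
    ([[1], [0], [-1]], -0.1, 0.6),                      # 3x1 branch
    ([[1]], 0.0, 1.0),  # identity path: pads to a 3x3 kernel with center 1
]

x = [[float(i * 4 + j) for j in range(4)] for i in range(4)]

# Training-style output: gate-weighted sum of the separate branch outputs.
multi = [[0.0] * 4 for _ in range(4)]
for k, b, g in branches:
    y = conv_same(x, k, b)
    for i in range(4):
        for j in range(4):
            multi[i][j] += g * y[i][j]

# Deployment-style output: one 3x3 convolution with the folded kernel.
folded_kernel, folded_bias = fold(branches)
single = conv_same(x, folded_kernel, folded_bias)
max_diff = max(abs(multi[i][j] - single[i][j])
               for i in range(4) for j in range(4))
```

Because convolution is linear, `max_diff` stays at floating-point rounding level. The equivalence holds only before the activation function and only when the gates are fixed, which is why the procedure substitutes the EMA-calibrated coefficients for the sample-dependent ones.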

[0086] Referring to Figure 2, the DHG-Net model adopts a three-part Backbone-Neck-Head architecture, supplemented by a global feature enhancement module. The Backbone uses Conv as the initial convolutional layer, followed by efficient convolutional modules such as DARConv, DSConv, HF-A2C2f, and DS-C3k2, to extract multi-scale features from the input image layer by layer. The Neck relies on the FullIPAD Tunnel feature tunnel and fuses and enhances features at different scales through Concat feature concatenation, UpSample upsampling operations, and residual connections. In addition, the HyperACE global feature enhancement module generates the H5, H4, and H3 multi-scale enhanced features and injects them into the FullIPAD Tunnel to further strengthen feature expression. The Head has three detection branches, each responsible for targets at a different scale, and outputs the final detection results. Overall, the model's target detection performance is improved through the collaborative design of these modules.

[0087] Referring to Figure 3, the input features of the DASConv module are first processed by an initial Conv2d and then divided into three parallel branches. These branches use convolutional kernels of different sizes (3×3, 5×5, and 7×7) to extract features with multi-scale receptive fields. Each branch is then processed sequentially by BatchNorm normalization, GeLU activation, 1×1 convolution for dimensionality reduction, and a second BatchNorm operation. The output features of the three branches are concatenated. The concatenated features are compressed using Adaptive AvgPool, and channel attention weights are then generated through two Conv2d operations, one ReLU activation, and one Softmax activation. Finally, these weights are concatenated with the initial input features to output an enhanced feature map. This combination of multi-scale feature extraction and channel attention strengthens feature representation.

[0088] Referring to Figure 4, the HF-IGate input features are split into two parallel processing paths: x_low (low-scale) and x_high (high-scale). The two paths first extract features through a SAM spatial attention module and two CBR3×3 convolutions. One path generates SAM_low weights via a Sigmoid function, and the other generates SAM_high weights. These weights are multiplied with the original features of the opposite branch and then fused to obtain intermediate features. The intermediate features are processed by a Conv1×1 and an Efficient Attention module to generate the SAM_fuse global attention, and then by Adaptive AvgPool to obtain global weights. Simultaneously, the x_low and x_high branches each generate spatial detail weights through residual summation, CBR1×1, a Laplacian Filter, and Sigmoid. After multiplication with the global weights, the two paths are fused through Concat and a CBR1×1 convolution, which outputs the enhanced feature map. This achieves synergy between spatial attention, cross-scale interaction, detail enhancement, and global attention, strengthening the expression and fusion of multi-scale features.

[0089] Referring to Figure 5, the GARBlock input features are first processed by Conv2d, BatchNorm, SiLU activation, DSConv depthwise separable convolution, and the LWSA lightweight spatial attention module for initial feature extraction and enhancement. The features are then fed into five parallel heterogeneous convolutional branches (3×1, 1×3, 3×3, 5×1, 1×5), each followed by a 1×1 convolution for channel adjustment, to extract multi-directional, multi-scale spatial features. Simultaneously, the input features pass through a parallel global context branch consisting of Adaptive AvgPool adaptive average pooling, SiLU activation, two 1×1 convolutions, ReLU, and Sigmoid activation, which generates global channel attention weights. The outputs of the multi-scale branch and the global weight branch are concatenated and passed sequentially through an EMAGate gate and Softmax activation to output the enhanced feature map. This achieves synergy between multi-scale spatial features, global context attention, and gated feature selection, strengthening feature representation within a lightweight design.

[0090] Step 104: Detect weeds in cornfields using the DHG-Net weed detection model.

[0091] In summary, the embodiments of the present invention achieve both lightweight and high-precision weed detection in complex scenarios, better adapting to the needs of actual scenarios.

[0092] Example 2

[0093] This invention provides a direction-sensitive and high-frequency enhanced weed detection method. This invention addresses the dual challenges of limited recognition accuracy in computer vision-based cornfield weed detection due to complex backgrounds and occluded targets, and the difficulty of deploying high-performance models on resource-constrained devices. Details are provided below:

[0094] Step 201: Construct a lightweight cornfield weed dataset, take photos of cornfield weeds, perform lightweight adaptation preprocessing on the collected data, then divide the dataset and write label files;

[0095] This step includes:

[0096] Step (1.1): Collect image data of six types of plants: Amaranthus retroflexus, Amaranthus albiflorus, Chenopodium album, Chenopodium monnieri, Portulaca oleracea, and Setaria viridis. Data sources include actual field photography, public datasets, and agricultural research databases. During the collection process, samples with moderate background complexity, relatively complete targets, and representativeness are prioritized to reduce unnecessary reliance on high-resolution features in subsequent models.

[0097] Step (1.2): Perform lightweight annotation on the cornfield weed image data. Manual annotation is performed according to standards established by agricultural experts, including the type of weed. To reduce feature redundancy during model inference, the annotation focuses on areas with clear target outlines and high recognizability, avoiding the introduction of overly blurry or severely occluded samples, thereby reducing the learning burden on the lightweight model.

[0098] Step (1.3): Standardize the images and perform lightweight size adaptation. Crop and resize the images of the various cornfield weeds. Considering the computational and storage limitations of resource-constrained devices, the image size is uniformly set to 320×320 (instead of 640×640). This cuts the number of input pixels to one quarter while maintaining the recognizability of key features, thereby reducing the computational load and feature-map memory of subsequent convolutional layers.

[0099] Step (1.4): Divide the dataset by category to support a lightweight training strategy. Save the data to their respective folders according to the weed category, and randomly divide them into training, test, and validation sets in an 8:1:1 ratio. During the partitioning process, ensure that the target scale, lighting conditions, and background complexity distributions are similar in each set to improve the generalization ability of the lightweight model with limited parameters.

[0100] Step (1.5): Create a lightweight label file. The label file contains the dataset subset to which the image belongs, the cornfield weed category to which the image belongs, and the image data path. To facilitate efficient loading on mobile or embedded devices, the label file uses a concise key-value structure, avoids redundant fields, and can use integer-encoded category indexes as needed, reducing runtime parsing overhead.
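Steps (1.4) and (1.5) can be sketched as follows. This is a minimal illustration that splits each category 8:1:1 at random and emits one compact key-value record per image. The field names `split`, `class`, and `path`, the JSON encoding, and the file names are assumptions, and the sketch does not balance target scale, lighting, or background complexity across the subsets as the text requires.

```python
import json
import random


def split_by_category(paths_by_category, ratios=(8, 1, 1), seed=0):
    """Randomly split each category's image paths into train/test/val records."""
    rng = random.Random(seed)
    total = sum(ratios)
    records = []
    # Sorting the category names gives stable integer class indexes.
    for class_id, category in enumerate(sorted(paths_by_category)):
        paths = list(paths_by_category[category])
        rng.shuffle(paths)
        n_train = len(paths) * ratios[0] // total
        n_test = len(paths) * ratios[1] // total
        for idx, path in enumerate(paths):
            if idx < n_train:
                subset = "train"
            elif idx < n_train + n_test:
                subset = "test"
            else:
                subset = "val"
            records.append({"split": subset, "class": class_id, "path": path})
    return records


# Hypothetical file names: 20 images for each of two example categories.
data = {
    "setaria_viridis": [f"setaria_viridis/{i:03d}.jpg" for i in range(20)],
    "portulaca_oleracea": [f"portulaca_oleracea/{i:03d}.jpg" for i in range(20)],
}
records = split_by_category(data)

# One compact JSON object per line keeps parsing on the device simple.
label_lines = [json.dumps(r, separators=(",", ":")) for r in records]
```

With 20 images per category, each category contributes 16 training, 2 test, and 2 validation records.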

[0101] Step 202: Based on the YOLOv13n model, a direction-sensitive and high-frequency enhanced weed detection model DHG-Net is constructed by combining dynamic attention strip convolutional blocks, high-frequency interactive gating fusion modules, and gating asymmetric reparameterization modules to solve the above problems.

[0102] The YOLOv13n model, as the latest lightweight version of the YOLO series, has a small number of parameters, a low computational cost, and a fast detection speed. It performs well in target detection tasks with simple backgrounds, but in complex farmland environments, where weeds occlude each other, overlap with crops, and vary in morphology, its detection performance is still not ideal.

[0103] Specifically, in pursuit of extreme lightweightness and real-time performance, the YOLOv13n model employs a shallow backbone and a simplified neck structure, resulting in insufficient ability to extract local features from occluded targets. For example, when only the edges of some leaves are visible on a weed target, the model struggles to activate enough high-level semantic features for accurate identification. Furthermore, the model's multi-scale feature fusion capability is relatively weak. For partially occluded small targets, the information transfer between shallow details and deep semantics is insufficient, leading to a significant increase in the false negative rate in occluded scenarios. In addition, while the YOLOv13n model drastically reduces the number of channels and simplifies the attention mechanism, effectively reducing computational overhead, it also prevents the model from dynamically suppressing interfering features introduced by occluded regions. This results in insufficient boundary discrimination and instance discrimination capabilities for overlapping targets, easily leading to multiple targets being misdetected as a single target or targets being obscured by the background.

[0104] In summary, YOLOv13n performs well in ideal scenarios, but its lightweight design has key shortcomings when dealing with challenges such as occlusion, overlap and morphological variations in complex farmland environments. These shortcomings include insufficient feature extraction, weak multi-scale fusion capability and insufficient interference suppression capability. For example, the false detection rate increases when crops and weeds are similar in color.

[0105] Therefore, this invention addresses the occlusion and overlap issues by improving and optimizing the backbone and neck sections, respectively. First, a Dynamic Attention Strip Convolutional Block (DASConv) is designed, employing a direction-sensitive combination of vertical and horizontal strip convolutions to capture long-range structural features. It also introduces lightweight spatial attention and channel adaptation mechanisms to achieve adaptive enhancement and semantic focus in key regions. Second, a High-Frequency Interactive Gated Fusion Module (HF-IGate) is designed. This module enhances the dynamic alignment of low-level details and high-level semantics through explicit high-frequency enhancement and bidirectional interactive gating mechanisms. Fine-grained recalibration using channel-level fusion maps and coordinate attention generates highly discriminative and stable background suppression fusion features. Third, a Gated Asymmetric Reparameterizable Module (GARBlock) is designed. During training, asymmetric convolutional branches and sample-related channel gating are introduced to enhance direction sensitivity and adaptive expressive power. A strict reparameterization mechanism is used to seamlessly fold all branches into a single 3×3 convolution, achieving both high performance and real-time deployment during the inference phase.

[0106] Step (2.1): To address the insufficient extraction of elongated targets and local features in YOLOv13n, this embodiment of the invention proposes the Dynamic Attention Strip Convolutional Block (DASConv); see the network structure diagram in Figure 2. The core objective of DASConv is to improve the model's ability to adapt to differences in input content, so that it accurately captures the morphological features of slender targets while remaining lightweight. To this end, it employs a collaborative design across three dimensions: orientation-sensitive spatial modeling, saliency recalibration, and channel adaptation. First, at the spatial modeling level, DASConv abandons the uniform receptive field of traditional square convolutions and adopts a sequential combination of vertical and horizontal strip convolutions to independently extract long-range structural information along two orthogonal principal directions. This orientation-sensitive strip convolution combination retains the inherent computational efficiency of strip convolutions (with far fewer parameters than square convolutions of the same size) and accurately focuses on the strip-like distribution of slender targets such as weed leaves and stems, as well as high aspect ratio regions, avoiding the irrelevant background noise that square convolutions introduce when modeling across regions. Second, at the saliency recalibration level, DASConv introduces a lightweight spatial attention mechanism after the dynamic strip convolution. By combining attention computation with Softmax weight mapping, it constructs a spatial attention distribution map with extremely low computational overhead.
This strategy avoids the quadratic complexity problem of traditional self-attention mechanisms, while achieving adaptive enhancement of key regions in the input image—assigning higher response weights to salient, well-structured weed targets, while suppressing background or noisy regions, thereby significantly improving the model's semantic focusing ability, unlike uniform convolution which treats all regions equally. Finally, at the channel adaptation level, to ensure that the spatial attention weights can achieve accurate pixel-by-pixel and channel-by-channel matching with the feature channels, DASConv designs a channel expansion mapping mechanism. This dynamically transforms the attention distribution into a gating factor that can participate in feature modulation, allowing the attention signal to not only act on spatial location but also to be differentiated according to channel dimensions. This mechanism effectively alleviates the mismatch problem that may occur between attention and features during training, improving training stability and the consistency of feature representation. In summary, DASConv achieves efficient adaptive adjustment to differences in input content through the synergistic effect of three mechanisms: orientation-sensitive strip convolutional sequence combination, lightweight spatial attention-guided saliency recalibration, and attention dimension-aligned trainable channel adaptation mechanism. This significantly enhances the feature capture capability for slender weed targets while maintaining lightweight characteristics.
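The parameter saving of strip convolutions mentioned above follows from simple arithmetic: a k×1 convolution followed by a 1×k convolution has 2k weights per channel pair, whereas a k×k convolution has k². A minimal sketch, with a kernel size and channel count chosen only for illustration:

```python
def conv_params(kernel_h, kernel_w, channels_in, channels_out, bias=False):
    """Number of learnable parameters in a standard 2D convolution layer."""
    params = kernel_h * kernel_w * channels_in * channels_out
    return params + (channels_out if bias else 0)


def strip_pair_params(k, channels):
    """A k x 1 (vertical) convolution followed by a 1 x k (horizontal) one."""
    vertical = conv_params(k, 1, channels, channels)
    horizontal = conv_params(1, k, channels, channels)
    return vertical + horizontal


k, c = 7, 64  # illustrative values, not taken from the source
square = conv_params(k, k, c, c)
strips = strip_pair_params(k, c)
ratio = square / strips  # equals k / 2 when both layers keep c channels
```

For k = 7 and 64 channels this gives 200704 weights for the square kernel against 57344 for the strip pair, a factor of 3.5.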

[0107] As shown in Figure 2, in the orientation-sensitive spatial modeling stage, the input feature map first passes sequentially through vertical and horizontal strip convolutions, which independently extract long-range structural information along two orthogonal principal directions. This accurately focuses on the strip-like distribution of slender targets such as weed leaves and stems, while retaining the inherent computational efficiency of strip convolutions and avoiding the irrelevant background noise introduced by square convolutions. Next, in the saliency recalibration stage, the feature map processed by the strip convolutions is fed into a lightweight spatial attention mechanism. Through attention computation combined with Softmax weight mapping, a spatial attention distribution map is constructed with extremely low computational overhead. Higher response weights are assigned to key regions with salient morphology and clear structure, while background or noise regions are suppressed, achieving adaptive enhancement and semantic focus. Then, in the channel adaptation stage, to ensure accurate pixel-by-pixel and channel-by-channel matching between the spatial attention weights and the feature channels, the module uses a channel expansion mapping mechanism to dynamically transform the attention distribution into a gating factor that can participate in feature modulation. This allows the attention signal to act on spatial location and also to be differentiated along the channel dimension, alleviating the mismatch between attention and features during training. Finally, the feature map produced by these three stages is used as the output of DASConv. While remaining lightweight, it adapts efficiently to differences in input content and significantly strengthens feature capture for slender weed targets.
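The saliency recalibration and channel adaptation stages can be sketched in plain Python as follows. The scoring rule (mean response across channels), the form of the channel expansion (one weight per channel, equivalent to a 1×1 convolution from one channel to C channels), and all numeric values are assumptions for illustration; the source does not define these operations in detail.

```python
import math


def spatial_softmax(scores):
    """Softmax over all spatial positions of a 2D score map."""
    peak = max(max(row) for row in scores)
    exps = [[math.exp(v - peak) for v in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def attention_gate(features, channel_weights):
    """Modulate C x H x W features with a channel-expanded spatial attention map."""
    channels = len(features)
    h, w = len(features[0]), len(features[0][0])
    # Assumed scoring rule: mean response across channels at each position.
    scores = [[sum(features[c][i][j] for c in range(channels)) / channels
               for j in range(w)] for i in range(h)]
    attn = spatial_softmax(scores)
    # Channel expansion: one weight per channel turns the single attention
    # map into C gating maps that match the feature tensor element by element.
    gated = [[[features[c][i][j] * channel_weights[c] * attn[i][j]
               for j in range(w)] for i in range(h)] for c in range(channels)]
    return attn, gated


# Two channels on a 2x2 grid with one salient position at (1, 1).
features = [
    [[0.0, 0.0], [0.0, 4.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
attn, gated = attention_gate(features, channel_weights=[1.0, 0.5])
```

The attention map sums to one over the grid and concentrates on the salient position, and the per-channel weights let the same spatial map modulate each channel differently.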

[0108] Step (2.2): To address the issues of weak multi-scale feature fusion capability and insufficient interference suppression in YOLOv13n, this embodiment of the invention designs a high-frequency interactive gating fusion module (HF-IGate). This module addresses key problems in infrared small target detection, such as high-frequency details being easily submerged by background noise, lack of bidirectional transmission between low-level details and high-level semantics, and overly coarse-grained fusion weights. A collaborative optimization fusion mechanism is proposed. First, in the input preparation stage, the module upsamples high-level features to the spatial size of low-level features and uses 1×1 convolution to map both feature paths to the same number of channels, establishing a fusionable common representation space. Subsequently, spatial attention maps are generated for both feature paths to highlight key region responses, providing a cleaner spatial prior for subsequent processing. Building upon this foundation, the module achieves three core innovations: First, explicit modeling of high-frequency detail enhancement—a Laplace high-pass filter is applied channel-by-channel to extract high-frequency components from the spatially enhanced features, and these components are injected into the original features through residual superposition. Simultaneously, a learnable scaling factor is introduced to control the enhancement intensity, making the module more sensitive to edges and fine-grained structures, balancing robustness and trainability, and effectively solving the problem that infrared small targets are easily overwhelmed by noise due to their reliance on high-frequency details. 
Second, cross-layer mutual transmission and fusion with bidirectional interactive gating—based on two high-frequency enhanced features, two sets of channel gating coefficients are generated, one for lower-layer guidance of higher-layer features and the other for higher-layer inverse constraint of lower-layer features. This achieves dynamic bidirectional alignment of details and semantics during the fusion process, avoiding semantic mismatch and information redundancy caused by traditional unidirectional injection or static splicing, and significantly improving the discriminative power of the fused features. Third, joint recalibration of channel-level fusion map and coordinate attention—a channel-level fusion weight map is generated from the bidirectionally mutually derived features, and the fusion representation is recalibrated in a fine-grained manner to enhance target-related channels and suppress redundant channels; at the same time, coordinate attention is introduced to recalibrate the orientation and position information, further improving the orientation sensitivity and background suppression capability of the fusion features. Finally, the features recalibrated by the two gates are concatenated along the channel dimension and shaped by lightweight convolution to obtain the module output. This output can be directly used for skip connections in the decoder or subsequent feature fusion units, providing enhanced feature representations with clear details, semantic alignment, and stable background suppression for small target detection in complex infrared scenes.

[0109] First, in the input preparation stage, high-level features are upsampled to the spatial size of low-level features. Then, a 1×1 convolution is used to map both features to the same number of channels to establish a fusionable common representation space. Spatial attention maps are generated for each feature path to highlight responses in key regions, providing a cleaner spatial prior for subsequent processing. Next, in the high-frequency detail enhancement stage, a Laplacian high-pass filter is applied channel-by-channel to extract high-frequency components from the spatially enhanced features. These components are then injected into the original features through residual stacking. A learnable scaling factor is introduced to control the enhancement intensity, making the module more sensitive to edges and fine-grained structures. Then, in the bidirectional interactive gating fusion stage, two sets of channel gating coefficients are generated based on the two high-frequency enhanced features: one for guiding the high-level features and the other for inversely constraining the low-level features. This achieves dynamic bidirectional alignment of details and semantics during the fusion process, avoiding semantic mismatch and information redundancy caused by traditional unidirectional injection or static splicing. Next, the channel-level joint recalibration stage begins. Channel-level fusion weight maps are generated from the bidirectionally cross-referenced features. Fine-grained recalibration of the fused representation enhances target-related channels and suppresses redundant channels. Simultaneously, coordinate attention is introduced to recalibrate directional position information, further improving orientation sensitivity and background suppression capabilities. Finally, in the output shaping stage, the two gated recalibrated features are concatenated along the channel dimension and shaped using lightweight convolution to obtain the module output. 
This output can be directly used for skip connections in the decoder or subsequent feature fusion units.
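The bidirectional interactive gating stage can be illustrated with a minimal sketch in which each path is rescaled by channel gates derived from the other path, after which the two results are concatenated along the channel dimension. Deriving the gates from global average pooling followed by a sigmoid, with no learnable layers, is an assumption made to keep the example short; the source states only that two sets of channel gating coefficients are generated.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def global_average_pool(features):
    """Per-channel mean over all spatial positions of C x H x W features."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in features]


def scale_channels(features, gates):
    """Multiply every channel by its gating coefficient."""
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(features, gates)]


def bidirectional_gate(low, high):
    """Rescale each path with channel gates computed from the other path."""
    gate_from_low = [sigmoid(m) for m in global_average_pool(low)]
    gate_from_high = [sigmoid(m) for m in global_average_pool(high)]
    high_guided = scale_channels(high, gate_from_low)      # low guides high
    low_constrained = scale_channels(low, gate_from_high)  # high constrains low
    # List concatenation joins the two paths along the channel dimension.
    return low_constrained + high_guided


# Two channels per path on a 2x2 grid; both paths already share one shape.
low = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
high = [[[2.0, 2.0], [2.0, 2.0]], [[-2.0, -2.0], [-2.0, -2.0]]]
fused = bidirectional_gate(low, high)
```

In the module itself a lightweight convolution would follow the concatenation to shape the output; that step is omitted here.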

[0110] Step (2.3): To address the challenge of balancing lightweight design with expressive power in YOLOv13n, a gated asymmetric reparameterizable module (GARBlock) is proposed. This module addresses key issues such as under-expression of asymmetric directional information, the inference deployment burden caused by multi-branch structures, the difficulty of reusing dynamic gating mechanisms, and a lack of rigor in the reparameterization folding procedure. It uses a gated asymmetric reparameterizable convolution (GARConv) design that balances enhanced expressiveness during training with lightweight, efficient inference. The processing flow consists of six steps. The first step is to construct a multi-branch structure: in addition to the standard 3×3 and 1×1 convolution branches, 1×3 and 3×1 asymmetric convolution branches are introduced, and an identity residual batch normalization (BN) branch is retained when the input and output shapes match. All branches are aligned with the stride of the main branch to support downsampling, which significantly improves the modeling of fine-grained features in the horizontal and vertical directions and of the directionality of edge textures. The second step is sample-dependent dynamic gating generation: global average pooling (GAP) is applied to the input features, and a multilayer perceptron (MLP) generates channel-level dynamic gating coefficients for each branch. During the training phase, the output of each branch is weighted channel by channel to enhance the module's ability to adapt to complex scenes and diverse samples. The third step is the exponential moving average (EMA) calibration update: the gating coefficients of each branch are averaged over the batch dimension and synchronously updated into the corresponding EMA vector, providing static scaling calibration coefficients for the inference deployment phase. The fourth step is training-time branch fusion: the outputs of the branches are weighted and summed according to their dynamic gating coefficients and added to the identity residual path, and the enhanced representation is obtained after an activation function, achieving multi-branch collaborative expression. The fifth step is strict reparameterization folding before deployment. First, Conv-BN fusion is performed on each branch: the 1×1 convolutional kernel is placed at the center of a 3×3 kernel and zero-padded around it, and the 1×3 and 3×1 asymmetric convolutional kernels are zero-padded to 3×3 in the height and width directions respectively. At the same time, the identity residual BN is expressed as an equivalent 3×3 convolutional kernel whose center value is 1 and is fused with its BN. Then, the fused convolutional kernels and biases of each branch are scaled at the channel level using the calibrated EMA gating coefficients. Finally, the convolutional kernels and biases of all branches are summed element-wise to obtain a single 3×3 convolutional kernel and bias. The sixth step is one-click deployment switching: a deployment switch function is called to replace the multi-branch dynamic gating structure used during training with a single-layer 3×3 convolution. No dynamic gating computation or multi-path overhead is incurred during inference. The expressive power gained during training is retained, inference speed is maximized, and the module is well suited to deployment on edge devices.
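The EMA calibration update of the third step can be sketched as follows. The momentum of 0.9, the initial EMA value of 1.0, and the gate values are assumptions for illustration; the source does not state them.

```python
def batch_mean(gates):
    """Average per-sample gate vectors (batch x channels) over the batch."""
    n = len(gates)
    return [sum(sample[c] for sample in gates) / n
            for c in range(len(gates[0]))]


def ema_update(ema, gates, momentum=0.9):
    """Return momentum * ema + (1 - momentum) * batch mean of the gates."""
    mean = batch_mean(gates)
    return [momentum * e + (1.0 - momentum) * m for e, m in zip(ema, mean)]


ema = [1.0, 1.0]  # assumed initial value for a branch with two channels
batches = [
    [[0.2, 0.8], [0.4, 0.6]],  # batch mean: [0.3, 0.7]
    [[0.3, 0.7], [0.3, 0.7]],  # batch mean: [0.3, 0.7]
]
for gates in batches:
    ema = ema_update(ema, gates)
```

After these two updates the EMA vector is approximately [0.867, 0.943]. At deployment such a vector is the static per-channel scale applied to the fused kernel and bias of the branch in place of the sample-dependent gates.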

[0111] As shown in Figure 4, the input feature map is first fed in parallel into five branches: a standard 3×3 convolution, a 1×1 convolution, a 1×3 asymmetric convolution, a 3×1 asymmetric convolution, and the identity residual batch normalization branch. All branches are aligned with the stride of the main branch to support downsampling. Simultaneously, global average pooling is applied to the input features, and a two-layer multilayer perceptron generates channel-level dynamic gating coefficients for the five branches. The output features of each branch are then multiplied channel-wise by their corresponding gating coefficients. All weighted branch outputs are added to the identity residual path, and the enhanced representation is obtained after an activation function. Before inference deployment, the model performs strict reparameterization folding: it fuses the convolutional kernel and batch normalization of each branch, expands the 1×1, 1×3, and 3×1 convolutional kernels to 3×3 size, and expresses the identity residual batch normalization as an equivalent 3×3 convolutional kernel. It then uses the gating coefficients calibrated by the exponential moving average to scale the weights and biases of each branch at the channel level. Finally, it sums the convolutional kernels and biases of all branches element-wise to obtain a single 3×3 convolutional kernel and bias. After folding, the module switches to a single-layer 3×3 convolutional structure: the feature map passes through this one convolutional layer to produce the output directly, without any dynamic gating computation or multi-path overhead, maximizing inference speed.
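The Conv-BN fusion that precedes the folding can be checked for one output channel with the standard identity: with scale s = γ / sqrt(σ² + ε), the fused weights are s·w and the fused bias is β + (b − μ)·s. The sketch below uses a flattened 3×3 kernel and illustrative inference-mode BN statistics; all numeric values and function names are hypothetical.

```python
import math


def fuse_conv_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-mode BN statistics into one output channel's conv."""
    scale = gamma / math.sqrt(var + eps)
    fused_weights = [w * scale for w in weights]
    fused_bias = beta + (bias - mean) * scale
    return fused_weights, fused_bias


def conv_then_bn(patch, weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Reference path: one convolution output value followed by BN."""
    z = sum(w * p for w, p in zip(weights, patch)) + bias
    return gamma * (z - mean) / math.sqrt(var + eps) + beta


# Illustrative values: a flattened 3x3 kernel and the 3x3 patch it covers.
weights = [0.1, -0.2, 0.3, 0.0, 0.5, -0.1, 0.2, 0.4, -0.3]
patch = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
bn = dict(gamma=1.5, beta=0.2, mean=0.4, var=0.25)

reference = conv_then_bn(patch, weights, 0.05, **bn)
fused_w, fused_b = fuse_conv_bn(weights, 0.05, **bn)
fused = sum(w * p for w, p in zip(fused_w, patch)) + fused_b
```

The two results agree to rounding error, so each branch can be reduced to a plain kernel and bias before the kernels are padded to 3×3 and summed.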

[0112] Step 203: Set evaluation indicators and experimental parameters, specifically:

[0113] Step (3.1): Set evaluation metrics: To evaluate the target detection model, this embodiment of the invention uses multiple metrics, including: mean average precision averaged over IoU thresholds from 0.50 to 0.95 (mAP50~95), mean average precision at an IoU threshold of 0.50 (mAP50), precision, recall, computational cost in giga floating-point operations (GFLOPs), and number of parameters.

[0114] Step (3.2): Set experimental parameters: The experiments were run on an NVIDIA RTX 3050 Ti GPU with 4 GB of memory, in an environment configured with CUDA 11.7, PyTorch 1.12.0, and Torchvision 0.13.0. The learning rate followed a cosine decay schedule, the optimizer was stochastic gradient descent (SGD), and the weight decay factor was set to 0.0005. Given the GPU memory limitation, the batch size was set to 6, training ran for 500 epochs, and the input resolution was 640×640 pixels.
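As an illustration of the cosine schedule mentioned above, the following minimal sketch computes a learning rate per epoch. The batch size, epoch count, resolution, and weight decay come from the text; the initial and final learning rates (`lr_max`, `lr_min`) are not stated in the text and are assumed values, and the function name `cosine_lr` is hypothetical.

```python
import math

# Settings stated in the text. lr_max and lr_min below are NOT given there.
config = {
    "optimizer": "SGD",
    "weight_decay": 0.0005,
    "batch_size": 6,
    "epochs": 500,
    "img_size": 640,
}


def cosine_lr(epoch, total_epochs, lr_max=0.01, lr_min=0.0001):
    """Cosine decay from lr_max at epoch 0 to lr_min at the last epoch (assumed values)."""
    cos_factor = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return lr_min + (lr_max - lr_min) * cos_factor


for epoch in (0, 125, 250, 375, 500):
    print(epoch, round(cosine_lr(epoch, config["epochs"]), 6))
```

The schedule decays slowly at the start and end of training and fastest around the midpoint, which is the usual motivation for preferring it over a linear or step decay.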

[0115] Table 1

[0116]

[0117] Step 204: Design comparative experiments to verify the comprehensive performance advantages of the method proposed in this invention.

[0118] Step 4.1: Comparative experiments were conducted with multiple YOLO series models on public datasets, namely YOLOv5n, YOLOv8n, YOLOv9t, YOLOv10n, YOLOv11n, YOLOv12n, and YOLOv13n, to verify the performance of the proposed model in object detection tasks.

[0119] Step 4.2: To evaluate the effectiveness of the model improvement strategy in weed detection in cornfields, ablation experiments were conducted on the DASConv, HF-IGate, and GARBlock modules.

[0120] Step 4.3: To systematically evaluate the model deployment efficiency and accuracy of the pruning and distillation strategies in weed detection in cornfields, an ablation experiment was conducted on the pruning and distillation strategies.

[0121] In summary, this embodiment of the invention, through steps 201 to 204 above, builds upon the existing target detection model YOLOv13n by combining the proposed Dynamic Attention Strip Convolutional Block (DASConv), High Frequency Interactive Gated Fusion Module (HF-IGate), and Gated Asymmetric Reparameterizable Module (GARBlock) to propose a direction-sensitive and high-frequency enhanced weed detection method. This embodiment of the invention has achieved good results on both self-built datasets and publicly available cornfield weed datasets.

[0122] Example 3

[0123] The feasibility of the scheme in Example 1 is verified below using specific experimental data and calculation formulas, as detailed in the following description:

[0124] Comparative experiments were conducted with several YOLO series models on public datasets, namely YOLOv5n, YOLOv8n, YOLOv9t, YOLOv10n, YOLOv11n, YOLOv12n, and YOLOv13n, to verify the performance of the proposed model in object detection tasks.

[0125] Table 2

[0126]

[0127] To systematically evaluate the effectiveness of the model improvement strategy in weed detection in cornfields, ablation experiments were conducted on three modules: Dynamic Attention Strip Convolutional Block (DASConv), High Frequency Interactive Gated Fusion Module (HF-IGate), and Gated Asymmetric Reparameterizable Module (GARBlock).

[0128] Table 3

[0129]

[0130] Example 4

[0131] To evaluate the overall performance of the model, the evaluation metrics used in this experiment are average precision (AP), mean average precision (mAP), precision, recall, F1 score, GFLOPs, and parameters. AP, mAP, and F1 score consider both precision and recall. The calculation methods for these metrics are summarized in formulas (1)-(7), and the metrics are defined as follows:

[0132]

[0133]

[0134]

[0135]

[0136]

[0137]

[0138]

[0139] In this context, TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives. Precision is the proportion of true positive samples among all samples predicted as positive. Recall is the proportion of actual positive samples that are correctly identified. AP is the average precision; the precision-recall curve reflects the relationship between precision and recall for a classifier at different thresholds. F1-Score is a metric used to comprehensively evaluate the performance of classification models; it is the harmonic mean of precision and recall, aiming to balance the trade-off between the two. GFLOPs measures the computational complexity of the model and is used to evaluate its computational efficiency in practical applications. Parameters measures model complexity, reflecting the model's storage requirements and training difficulty. In the formulas, O represents constant order, K represents kernel size, C represents the number of channels, M represents the input image size, and i represents the number of iterations.
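For reference, the verbal definitions of precision, recall, and F1-Score above correspond to the standard formulas sketched below. This is a minimal illustration using the textbook definitions, not a reproduction of formulas (1)-(7); the counts are made-up numbers and the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts: 80 correct detections, 20 false alarms, 10 missed weeds.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

The guards for zero denominators return 0.0 when there are no predictions or no ground-truth objects, which is a common convention rather than something specified in the text.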

[0140] To demonstrate the accuracy of the method, a generalization comparison was conducted using the publicly available dataset CottonWeedDet12 and the self-built dataset Six weeds. The CottonWeedDet12 dataset was collected from common weeds in cotton fields in various southern states of the United States. These images were taken between February and October 2021 under natural light conditions using smartphones or handheld digital cameras, forming a large-scale weed dataset.

[0141] The results show that the model proposed in this embodiment of the invention exhibits superior performance on both the public and the self-built datasets. Furthermore, the model improves both the average detection accuracy and the F1-score on the different datasets, indicating that DHG-Net generalizes well across datasets.

[0142] Table 4

[0143]

[0144] The software of this invention is developed with Anaconda Navigator 2.14. The initial *.ui and *.py files are designed with its GUI functionality. The *.ui file is then edited with the pyside6-designer tool available in Anaconda Navigator 2.14, and the *.py file used for the interface design is generated from it.

[0145] The software offers multiple cornfield weed detection models for users to choose from. These models may be based on different algorithms or training data, allowing users to select the most suitable model for cornfield weed detection according to their needs and application scenarios. Each model may have different performance characteristics and feature recognition capabilities, allowing users to choose based on their specific circumstances.

[0146] The software currently supports detecting six common cornfield weeds. When the user inputs a photograph, the system will activate the offline detection function; when the user uses a camera for real-time detection, the system will activate the real-time detection function.

[0147] The software can not only detect the types of weeds in cornfields, but also count their numbers. Users can obtain information on the quantity of each type of weed in the detected area, making it easier to understand the distribution of weeds in cornfields.
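The per-class counting described above can be sketched with the standard library. This is only an illustration: the detection records, the field names `category` and `conf`, and the class labels `weed_A` to `weed_C` are hypothetical placeholders, not the six weed classes used by the software.

```python
from collections import Counter

# Hypothetical detections for one image; labels and fields are placeholders.
detections = [
    {"category": "weed_A", "conf": 0.91},
    {"category": "weed_B", "conf": 0.64},
    {"category": "weed_A", "conf": 0.77},
    {"category": "weed_C", "conf": 0.58},
]

counts = Counter(det["category"] for det in detections)
for category, n in sorted(counts.items()):
    print(f"{category}: {n}")
```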

[0148] To improve the accuracy and flexibility of detection, the software provides adjustable confidence level and IOU (Intersection over Union) parameters. Users can adjust these parameters as needed to meet the precise detection requirements of different scenarios and needs.
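The text does not say how the two parameters are applied. A common arrangement, assumed here, is that the confidence level discards low-scoring boxes and the IoU value serves as the threshold for non-maximum suppression. The sketch below follows that assumption; the default thresholds, the record layout, and the function names are hypothetical, and the suppression is class-agnostic for brevity.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, conf_thres=0.25, iou_thres=0.45):
    """Drop low-confidence boxes, then greedily suppress overlapping ones."""
    candidates = sorted(
        (d for d in dets if d["conf"] >= conf_thres),
        key=lambda d: d["conf"],
        reverse=True,
    )
    kept = []
    for det in candidates:
        if all(iou(det["box"], k["box"]) < iou_thres for k in kept):
            kept.append(det)
    return kept


dets = [
    {"box": (0, 0, 10, 10), "conf": 0.90},
    {"box": (1, 1, 11, 11), "conf": 0.80},    # overlaps the first box heavily
    {"box": (50, 50, 60, 60), "conf": 0.20},  # below the confidence threshold
    {"box": (20, 20, 30, 30), "conf": 0.60},
]
kept = filter_detections(dets)
print([d["box"] for d in kept])
```

Raising the confidence threshold trades recall for precision, while lowering the IoU threshold suppresses more overlapping boxes, which matters for densely growing or overlapping weeds.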

[0149] For offline detection, the system supports saving the detection results. After each offline detection, users can save the detection results by clicking the "Save" button. The saved data includes photos or videos with marked boxes after detection, as well as a *.csv file containing the weed category, confidence level, and coordinate location, which facilitates subsequent weed tracking.
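A result file of the kind described above could be written as follows. This is a minimal sketch: the column names and their order, the two-decimal confidence format, and the class labels are assumptions, since the text only states that the file contains the weed category, confidence level, and coordinate location. An in-memory buffer is used instead of a file on disk so the example has no side effects.

```python
import csv
import io


def write_detections_csv(detections, fh):
    """Write one row per detection: category, confidence, and box corners."""
    writer = csv.writer(fh)
    writer.writerow(["category", "confidence", "x1", "y1", "x2", "y2"])
    for det in detections:
        writer.writerow([det["category"], f"{det['conf']:.2f}", *det["box"]])


# Hypothetical detections; labels and pixel coordinates are placeholders.
detections = [
    {"category": "weed_A", "conf": 0.91, "box": (12, 30, 88, 140)},
    {"category": "weed_B", "conf": 0.64, "box": (150, 42, 210, 97)},
]

buffer = io.StringIO()
write_detections_csv(detections, buffer)
print(buffer.getvalue())
```

To write to disk instead, the same function can be called with a file opened as `open(path, "w", newline="")`, as the `csv` module documentation recommends.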

[0150] The software can present the detection results to users in the form of intuitive charts and data. Users can understand the distribution of different weed types in cornfields through statistical results, which can help them with agricultural research, ecological analysis, and other work.

[0151] Example 5

[0152] A direction-sensitive and high-frequency enhanced weed detection device includes a processor and a memory. The memory stores program instructions, and the processor invokes the program instructions stored in the memory to cause the device to perform the following method steps in Embodiment 1:

[0153] The collected cornfield weed data is preprocessed, the preprocessed weed dataset is divided, and a dataset label file is created.

[0154] Based on the YOLOv13n model, a direction-sensitive and high-frequency enhanced weed detection model DHG-Net is constructed by combining dynamic attention strip convolutional blocks, high-frequency interactive gating fusion modules, and gating asymmetric reparameterizable modules.

[0155] Weed detection is performed using a direction-sensitive and high-frequency enhanced weed detection model.

[0156] The core objective of the Dynamic Attention Strip Convolutional Block (DASConv) is to improve the model's adaptive adjustment capability to differences in input content, thereby accurately capturing the morphological features of slender targets while maintaining a lightweight design. To this end, it was designed collaboratively from three dimensions: orientation-sensitive spatial modeling, saliency recalibration, and channel adaptation. First, at the spatial modeling level, DASConv abandons the uniform receptive field of traditional square convolutions and adopts a sequential combination of vertical and horizontal strip convolutions to independently extract long-range structural information in two orthogonal principal directions. This orientation-sensitive strip convolution combination not only maintains the inherent computational efficiency of strip convolutions (the number of parameters is much smaller than that of square convolutions of the same size), but also accurately focuses on the strip distribution features of slender targets such as weed leaves and stems, as well as high aspect ratio regions, effectively avoiding the drawback of square convolutions introducing irrelevant background noise when modeling across regions. Secondly, at the saliency recalibration level, DASConv introduces a lightweight spatial attention mechanism after dynamic strip convolution. By combining attention computation with Softmax weight mapping, it constructs a spatial attention distribution map with extremely low computational overhead. This strategy avoids the quadratic complexity problem of traditional self-attention mechanisms, while achieving adaptive enhancement of key regions in the input image—assigning higher response weights to salient, well-structured weed targets, while suppressing background or noisy regions, thereby significantly improving the model's semantic focusing ability, unlike uniform convolution which treats all regions equally. 
Finally, at the channel adaptation level, to ensure that the spatial attention weights can achieve accurate pixel-by-pixel and channel-by-channel matching with the feature channels, DASConv designs a channel expansion mapping mechanism. This dynamically transforms the attention distribution into a gating factor that can participate in feature modulation, allowing the attention signal to not only act on spatial location but also to be differentiated according to channel dimensions. This mechanism effectively alleviates the mismatch problem that may occur between attention and features during training, improving training stability and the consistency of feature representation. In summary, DASConv achieves efficient adaptive adjustment to differences in input content through the synergistic effect of three mechanisms: orientation-sensitive strip convolutional sequence combination, lightweight spatial attention-guided saliency recalibration, and attention dimension-aligned trainable channel adaptation mechanism. This significantly enhances the feature capture capability for slender weed targets while maintaining lightweight characteristics.
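The parameter saving claimed for strip convolutions can be made concrete with a small count. The sketch below assumes dense (non-grouped) convolutions without bias, a kernel length of 7, and 64 input and output channels; none of these values are given in the text, and the helper name is hypothetical.

```python
def conv_params(c_in, c_out, kh, kw):
    """Weight count of a dense 2D convolution without bias."""
    return c_in * c_out * kh * kw


k, c = 7, 64  # assumed kernel length and channel count

square = conv_params(c, c, k, k)                          # one k x k convolution
strip = conv_params(c, c, 1, k) + conv_params(c, c, k, 1)  # 1 x k followed by k x 1

print("square:", square)
print("strip pair:", strip)
print("ratio:", strip / square)
```

Under these assumptions the strip pair needs 2/k of the weights of the square kernel, so the saving grows with the kernel length while the receptive field still spans k pixels in each principal direction.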

[0157] The High-Frequency Interactive Gated Fusion Module (HF-IGate) addresses key issues in infrared small target detection: high-frequency details are easily submerged by background noise, bidirectional transmission between low-level details and high-level semantics is lacking, and fusion weights are too coarse-grained. To address them, it proposes a collaborative optimization fusion mechanism. First, in the input preparation stage, the module upsamples the high-level features to the spatial size of the low-level features and uses a 1×1 convolution to map both sets of features to the same number of channels, establishing a common representation space in which they can be fused. Then, a spatial attention map is generated for each set of features to highlight responses in key regions, providing a cleaner spatial prior for subsequent processing. On this basis, the module introduces three core innovations. First, explicit modeling of high-frequency detail enhancement: a Laplacian high-pass filter is applied channel by channel to extract high-frequency components from the spatially enhanced features, and these components are injected into the original features through residual superposition. A learnable scaling factor is also introduced to control the enhancement intensity, making the module more sensitive to edges and fine-grained structures, balancing robustness and trainability, and addressing the problem that infrared small targets, which rely on high-frequency details, are easily submerged by noise. Secondly, bidirectional interactive gating for cross-layer mutual transmission and fusion: based on the two high-frequency enhanced features, two sets of channel gating coefficients are generated, one for lower-layer guidance of the higher-layer features and the other for higher-layer reverse constraint of the lower-layer features. 
This achieves dynamic bidirectional alignment of details and semantics during the fusion process, avoiding the semantic mismatch and information redundancy caused by traditional unidirectional injection or static concatenation, and significantly improving the discriminability of the fused features. Thirdly, joint recalibration by a channel-level fusion map and coordinate attention: a channel-level fusion weight map is generated from the bidirectionally transmitted features, and the fused representation is recalibrated in a fine-grained manner to enhance target-related channels and suppress redundant channels. At the same time, coordinate attention is introduced to recalibrate the directional position information, further improving the directional sensitivity and background suppression capability of the fused features. Finally, the features obtained from the two gating recalibrations are concatenated along the channel dimension and shaped by a lightweight convolution to obtain the module output. This output can be directly used for skip connections in the decoder or subsequent feature fusion units, providing enhanced feature representations with clear details, semantic alignment, and stable background suppression for small target detection in complex infrared scenes.
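The high-frequency enhancement step can be illustrated on a single feature channel. The sketch below is an assumption-laden toy: it uses the 4-neighbour Laplacian kernel with a positive center (the text does not specify which Laplacian kernel is used), zero padding at the borders, and a fixed scaling factor `alpha` standing in for the learnable one. The function names are hypothetical.

```python
# 4-neighbour Laplacian high-pass kernel (assumed form; not specified in the text).
LAPLACIAN = [[0.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 0.0]]


def high_pass(x):
    """Apply the Laplacian kernel to one 2D channel with zero padding."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for a in range(3):
                for b in range(3):
                    ii, jj = i + a - 1, j + b - 1
                    if 0 <= ii < h and 0 <= jj < w:
                        out[i][j] += x[ii][jj] * LAPLACIAN[a][b]
    return out


def enhance(x, alpha):
    """Residual injection: x + alpha * high_pass(x)."""
    hf = high_pass(x)
    return [[x[i][j] + alpha * hf[i][j] for j in range(len(x[0]))] for i in range(len(x))]


flat = [[2.0] * 5 for _ in range(5)]   # uniform region: no high-frequency content
spot = [[0.0] * 5 for _ in range(5)]   # isolated fine detail
spot[2][2] = 1.0

print("flat center:", enhance(flat, alpha=0.5)[2][2])
print("spot center:", enhance(spot, alpha=0.5)[2][2])
print("spot neighbour:", enhance(spot, alpha=0.5)[2][1])
```

Uniform interior regions pass through unchanged while the isolated detail is amplified and its immediate surroundings are pushed down, which is the edge-sharpening behaviour the scaling factor is meant to control.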

[0158] The gated asymmetric reparameterizable module (GARBlock) addresses key issues such as the under-expression of asymmetric directional information, the inference deployment burden caused by multi-branch structures, the difficulty of reusing dynamic gating mechanisms, and an imprecise reparameterization folding process. It proposes a gated asymmetric reparameterizable convolution (GARConv) design that balances enhanced expressiveness at training time with lightweight, efficient inference.

[0159] The specific processing procedure consists of six steps:

[0160] The first step is to construct a multi-branch structure—in addition to the standard 3×3 and 1×1 convolutional branches, 1×3 and 3×1 asymmetric convolutional branches are introduced, and identity residual batch normalization (BN) is preserved under shape matching conditions. All branches are aligned with the stride of the backbone, supporting downsampling operations, thereby significantly improving the modeling ability for fine-grained features in the horizontal and vertical directions and the directionality of edge textures. The second step is sample-related dynamic gating generation—Global average pooling (GAP) is performed on the input features, and channel-level dynamic gating coefficients are generated for each branch through two layers of multilayer perceptron (MLP). During the training phase, the outputs of each branch are weighted channel-wise, enhancing the module's adaptive adjustment capability to complex scenes and diverse samples. The third step is exponential moving average (EMA) calibration update—the gating coefficients of each branch are averaged in the batch dimension and synchronously updated to the corresponding EMA vector, providing static scaling calibration coefficients for the inference deployment phase. The fourth step is to fuse the training branch outputs—the outputs of each branch are weighted and summed according to their corresponding dynamic gating coefficients, then added to the identity residual path, and after passing through an activation function, an enhanced representation is obtained, realizing multi-branch collaborative expression. The fifth step is to strictly reparameterize and fold before deployment—first, Conv-BN fusion is performed on each branch, the 1×1 convolutional kernel is expanded to 3×3 size through zero padding at the center, and the 1×3 and 3×1 asymmetric convolutional kernels are padded to 3×3 in the height and width directions, respectively. 
At the same time, the identity residual BN is equivalent to a unit 3×3 convolutional kernel with a center value of 1 and BN fusion is completed; then, the channel-level scaling of the fused convolutional kernels and biases of each branch is performed using the calibrated EMA gating coefficients; finally, the element-wise summation of the convolutional kernels and biases of all branches is obtained to obtain a single 3×3 convolutional kernel and bias. Step 6, one-click deployment switch - call the deployment switch function to seamlessly replace the multi-branch dynamic gating structure during training with a single-layer 3×3 convolution. No dynamic gating computation or multi-path overhead is required during the inference stage. While maintaining high performance during training, it maximizes inference speed and is friendly to edge device deployment.
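Steps two and three (gate generation and EMA calibration) can be sketched for one branch as follows. This is a minimal illustration under stated assumptions: the hidden-layer ReLU, the sigmoid output, and the EMA momentum of 0.9 are not given in the text, the weights are made-up values, and all function names are hypothetical.

```python
import math


def global_avg_pool(x):
    """x is [channels][H][W]; return the mean of each channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def mlp_gates(pooled, w1, w2):
    """Two-layer MLP: ReLU hidden layer, sigmoid output (activations are assumptions)."""
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    return [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]


def ema_update(ema, batch_gates, momentum=0.9):
    """Average gates over the batch, then blend into the running EMA vector."""
    n = len(batch_gates)
    batch_mean = [sum(g[c] for g in batch_gates) / n for c in range(len(ema))]
    return [momentum * e + (1.0 - momentum) * m for e, m in zip(ema, batch_mean)]


# One sample with two channels of size 2x2 (made-up values).
sample = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [2.0, 2.0]]]
pooled = global_avg_pool(sample)

w1 = [[0.5, -0.25]]    # 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]   # 1 hidden unit -> 2 channel gates
gates = mlp_gates(pooled, w1, w2)
print("pooled:", pooled)
print("gates:", gates)

# Gates of a batch of two samples for this branch, blended into the EMA vector.
ema = ema_update([1.0, 1.0], [[0.2, 0.8], [0.4, 0.6]])
print("ema:", ema)
```

During training the per-sample gates weight the branch outputs directly; the EMA vector is what would be used as the static per-channel scale when the branches are folded into a single 3×3 kernel.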

[0161] It should be noted that the device descriptions in the above embodiments correspond to the method descriptions in the embodiments, and the embodiments of the present invention will not be repeated here.

[0162] The execution entities of the aforementioned processor and memory can be devices with computing functions such as computers, microcontrollers, and single-chip microcomputers. In specific implementations, the embodiments of the present invention do not limit the execution entities and can select them according to the needs of actual applications.

[0163] Data signals are transmitted between the memory and the processor via a bus, which will not be elaborated upon in this embodiment of the invention.

[0164] Based on the same inventive concept, embodiments of the present invention also provide a computer-readable storage medium, the storage medium including a stored program, which, when the program is running, controls the device where the storage medium is located to execute the method steps in the above embodiments.

[0165] The computer-readable storage medium includes, but is not limited to, flash memory, hard disk, solid-state drive, etc.

[0166] It should be noted that the description of the readable storage medium in the above embodiments corresponds to the description of the method in the embodiments, and the embodiments of the present invention will not be repeated here.

[0167] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product. A computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the flow or function according to the embodiments of the present invention is generated.

[0168] A computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. Computer instructions can be stored in or transmitted through a computer-readable storage medium. A computer-readable storage medium can be any available medium accessible to a computer or a data storage device such as a server or data center that integrates one or more available media. The available medium can be magnetic or semiconductor, etc.

[0169] Unless otherwise specified, the model numbers of the various devices in this embodiment of the invention are not limited, and any device that can perform the above functions is acceptable.

[0170] Those skilled in the art will understand that the accompanying drawings are merely schematic diagrams of a preferred embodiment, and the sequence numbers of the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.

[0171] References

[0172] [1] Wang Tian, Zhu Chenyang, Li Shengchen, et al.: Sound event detection based on multi-scale strip convolutional attention [J]. Intelligent Computer and Applications, 2025, 15(2):168-174.

[0173] [2] Ou Wei, Xiao Shitao, Pang Mengxue, Wang Fang, et al.: HOG pedestrian detection method based on color channel interval mapping reduction [J]. Computer Knowledge and Technology, 2025, 21(1):27-32,44.

[0174] [3] Yang Ning, Yang Zhijun, Wang Peisen et al.: Multifocal plane fusion imaging for fiber composition detection based on focal pixels [J]. Journal of Textile Research, 2024, 45(12):50-57.

[0175] [4] Jiang He, Sun Mang, Zheng Zhou, et al.: A single-image detail enhancement algorithm inspired by Kepler's laws [J]. Journal of Electronics and Information Technology, 2025, 47(12):5166-5177.

[0176] [5] Yang Wenjun, Jia Fangxiu, Xue Shangjie, et al.: A method for dehazing missile-borne images based on superpixel segmentation and dual-channel fusion [J]. Computer Applications and Software, 2025, 42(9):248-254.

[0177] [6] Shi Yinyan, Xin Yapeng, Wang Xiaochan, et al.: Fertilizer application strategy for rice and wheat bivariate precision fertilization machine based on multilayer perceptron model [J]. Transactions of the Chinese Society of Agricultural Engineering, 2025, 41(10):51-58.

[0178] [7] Zhao Xiaoqiang, Liu Yongyong, Hui Yongyong, et al.: Intermittent process quality prediction model based on improved temporal convolutional network and multi-head self-attention mechanism [J]. Computer Applications, 2025, 45(7):2245-2252.

Claims

1. A direction-sensitive and high-frequency enhanced weed detection method, characterized in that the method includes: dividing the preprocessed corn weed dataset and creating a dataset label file, the labeled dataset being used as a dataset in a complex background; constructing a lightweight weed detection model DHG-Net on the basis of the YOLOv13 model, based on the dynamic attention strip convolutional block, the high-frequency interactive gating fusion module, and the gating asymmetric reparameterizable module; and performing weed detection in cornfields using the DHG-Net weed detection model.

2. The method for detecting weeds with direction sensitivity and high frequency enhancement according to claim 1, characterized in that, The dynamic attention strip convolutional block is: At the spatial modeling level, a combination of vertical strip convolution and horizontal strip convolution is used to independently extract long-range structural information in two orthogonal principal directions. At the saliency recalibration level, a lightweight spatial attention mechanism is introduced after dynamic strip convolution. By combining attention calculation with Softmax weight mapping, a spatial attention distribution map is constructed. At the channel adaptation level, attention distribution is dynamically transformed into a gating factor that can participate in feature modulation, so that attention signals can not only act on spatial location, but also be differentiated according to channel dimension.

3. The direction-sensitive and high-frequency enhanced weed detection method according to claim 2, characterized in that the calculation formula for the dynamic attention strip convolution block is as follows: $y = \mathrm{BN}\left(W_1 * \left[\mathrm{DropPath}\left(\mathrm{BN}\left(W_{out} * f_i\right)\right) + \mathrm{GeLU}\left(\mathrm{BN}\left(W_1 * x\right)\right)\right]\right) + x$; where $y$ is the final output feature map of the module; $x$ is the input feature map of the module; BN is batch normalization; $W_i$ is the weight of the main-path linear transformation or convolutional layer; $W_{out}$ is the output projection weight of the fusion path; $f_i$ is the feature of the i-th branch; GeLU is the Gaussian error linear unit activation function; DropPath is stochastic depth regularization.

4. The direction-sensitive and high-frequency enhanced weed detection method according to claim 2, characterized in that, The high-frequency interactive gating fusion module is: A Laplace high-pass filter is applied to the spatially enhanced features channel by channel, and then injected into the original features through residual superposition. A learnable scaling factor is introduced to control the enhancement intensity. Based on two high-frequency enhancement features, two sets of channel gating coefficients are generated, one for lower-level guidance of higher-level and the other for higher-level inverse constraint of lower-level, to achieve dynamic bidirectional alignment of details and semantics during the fusion process. The channel-level fusion weight map is generated from the bidirectional cross-referencing features to perform fine-grained recalibration of the fusion representation; at the same time, coordinate attention is introduced to recalibrate the orientation and position information. The features after two gating recalibrations are concatenated along the channel dimension and then shaped by lightweight convolution to obtain the module output. This output can be directly used for skip connections in the decoder or for subsequent feature fusion units.

5. The direction-sensitive and high-frequency enhanced weed detection method according to claim 2, characterized in that, The gated asymmetric reparameterizable module is: In addition to the standard 3×3 and 1×1 convolution branches, 1×3 and 3×1 asymmetric convolution branches are introduced, and identity residual batch normalization is preserved under shape matching conditions. All branches are aligned with the stride of the backbone to support downsampling operations. The input features are subjected to global average pooling, and channel-level dynamic gating coefficients are generated for each branch through a multilayer perceptron. During the training phase, the outputs of each branch are weighted channel by channel. The gating coefficients of each branch are averaged over the batch dimension and updated synchronously to the corresponding EMA vector; The outputs of each branch are weighted and summed according to the corresponding dynamic gating coefficients, then added to the identity residual path, and after passing through the activation function, an enhanced representation is obtained, realizing multi-branch collaborative expression; Conv-BN fusion is performed on each branch, expanding the 1×1 convolutional kernel to 3×3 size through zero padding at the center, and padding the 1×3 and 3×1 asymmetric convolutional kernels to 3×3 in the height and width directions, respectively. At the same time, the identity residual BN is equivalent to a unit 3×3 convolutional kernel with a center value of 1 and BN fusion is completed. Then, the channel-level scaling of the fused convolutional kernels and biases of each branch is performed using the calibrated EMA gating coefficients. Finally, the convolution kernels and biases of all branches are summed element by element to obtain a single 3×3 convolution kernel and bias. The multi-branch dynamic gating structure during training is replaced losslessly with a single-layer 3×3 convolution.

6. A direction-sensitive and high-frequency enhanced weed detection device, characterized in that, The device includes a processor and a memory, the memory storing program instructions, the processor invoking the program instructions stored in the memory to cause the device to perform the method according to any one of claims 1-5.

7. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to perform the method described in any one of claims 1-5.