High-throughput densely adhered wheat grain target detection model based on LMA-DEIM
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HENAN INST OF SCI & TECH
- Filing Date
- 2026-04-08
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies suffer from insufficient detection accuracy, poor adaptability, and a lack of lightweight solutions in high-throughput detection of densely adhered wheat grains, making it difficult to achieve high-precision, real-time detection on embedded devices.
A high-throughput, densely adhered wheat grain target detection model based on LMA-DEIM was constructed, including the WSD dataset and model structure. The model employs a lightweight backbone network Lite-Mamba, a feature interaction module MIFI, and a lightweight downsampling module ADown, combined with data augmentation techniques to improve the model's generalization ability and adaptability.
It achieves high-precision, real-time detection of wheat grains on embedded devices, with a detection accuracy of 92.7%, a recall rate of 91.5%, a detection speed of 143.1 f/s, and a parameter reduction of 66.9%, meeting the requirements for high-throughput real-time counting.
Figure CN122244424A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of wheat grain detection technology, specifically a high-throughput densely adhered wheat grain target detection model based on LMA-DEIM.
Background Technology
[0002] Wheat is a vital global food crop, and thousand-grain weight is one of the core indicators for evaluating the yield potential and stability of wheat varieties. In the breeding and evaluation process, grain counting is a fundamental step in thousand-grain weight determination. However, current mainstream methods for determining thousand-grain weight still rely on manual counting, which is time-consuming, labor-intensive, subjective, and prone to significant errors, failing to meet the demands of modern breeding for high-throughput, high-precision automation. Therefore, researching high-throughput automated counting methods for densely clustered wheat grains is of great significance for improving crop breeding efficiency.
[0003] Early research mainly focused on traditional image processing algorithms, such as morphological operations and watershed algorithms. These methods have two major limitations: first, they rely on manually designed features, which result in insufficient generalization ability and robustness to complex stacking and adhesion situations; second, the algorithms have high computational complexity and low processing efficiency, making it difficult to meet the application requirements of high-throughput real-time detection.
[0004] In recent years, the development of deep learning technology has provided a new path to solve the above problems. Models represented by Convolutional Neural Networks (CNNs) and Transformers dispense with tedious manual feature design and learn more discriminative features directly from data. They perform better in complex scenarios such as seed adhesion and occlusion, and their inference speed has also greatly improved. However, existing deep learning models still face significant challenges in high-throughput detection scenarios with densely packed, heavily adhered seeds.
[0005] For example, some studies based on the improved YOLOv5 algorithm have achieved high detection accuracy by introducing a hybrid depthwise separable convolutional module and an attention mechanism. However, the cumulative counting error increases significantly when the number of seeds exceeds 300, making it difficult to meet the demands of high-throughput detection. Similarly, improved models based on YOLOv8, which simplify the detection head by sharing convolutional layers and introducing a deformable attention mechanism, still exhibit low counting accuracy in heavily contiguous scenarios, reflecting the limited scenario coverage of the training dataset and insufficient adaptability to complex real-world environments.
[0006] Furthermore, in practical breeding scenarios, testing equipment needs to be both portable and capable of real-time processing. Mobile and embedded platforms typically have limited computing resources, and existing deep learning models often have large parameter counts and high memory consumption, making deployment on embedded devices difficult. Therefore, model lightweighting has become a key research direction. Existing research shows that through reasonable lightweight design, the number of model parameters and computational load can be significantly reduced while maintaining high accuracy.
[0007] In summary, existing technologies for detecting and counting densely adhered wheat grains have the following shortcomings: traditional image processing methods rely on manual features, have weak generalization ability, and low processing efficiency; existing deep learning models have insufficient counting accuracy in high-density, heavily adhered scenarios and poor adaptability to complex environments; lightweight solutions for embedded deployment still lack an effective architecture that balances detection accuracy, inference speed, and low computational complexity; and the lack of large-scale public datasets covering multiple varieties, multiple backgrounds, and high-density adhered scenarios limits the training and evaluation of models.
[0008] To address the aforementioned issues, there is an urgent need for a method for detecting densely adhered wheat grains that can achieve high throughput, high precision, and real-time detection on embedded devices.
Summary of the Invention
[0009] The technical problem to be solved by the present invention is to overcome the existing defects and provide a high-throughput densely adhered wheat grain target detection model based on LMA-DEIM, which can effectively solve the problems in the background technology.
[0010] To achieve the above objectives, this invention discloses a high-throughput densely adhered wheat grain target detection model based on LMA-DEIM. The technical solution comprises the construction of the WSD dataset and the model structure. The WSD dataset provides the training and testing data: it contains more than 230,000 grain target boxes from five wheat varieties (Aikang 58, Bainong 607, Zhengmai 379, Tunmai 127, and Yunong 916) and covers various backgrounds, shooting devices, and high-density adhesion scenes. The model structure is based on the DEIM framework and includes the lightweight backbone network Lite-Mamba, the feature interaction module MIFI, and the lightweight downsampling module ADown. Lite-Mamba performs global context modeling of input images with linear computational complexity, enhancing the ability to distinguish densely clustered grain regions. MIFI performs global information interaction and modeling of deep feature maps based on a selective state space, replacing the feature interaction module based on the self-attention mechanism. ADown preserves key details during feature map scaling.
[0011] As a preferred technical solution of the present invention, the construction of the WSD dataset includes data acquisition and data augmentation. Data acquisition captures images with mobile devices, such as the Huawei Mate60 and Xiaomi 14 Pro, under different backgrounds at a shooting height of 20-30 cm. Data augmentation expands the original images and labels by flipping, translation, rotation, color changes, and the introduction of random Gaussian noise, constructing a dataset containing seed target boxes that is divided into a training set, a validation set, and a test set. This provides a high-quality data foundation for model training and evaluation, effectively improving the model's generalization ability and adaptability to complex real-world environments.
[0012] As a preferred embodiment of the present invention, the lightweight backbone network Lite-Mamba includes: a layer normalization and channel compression unit, used to perform layer normalization on the input feature map and compress the number of channels through 1×1 convolution; a parallel bidirectional scanning unit, which includes a horizontal scanning selective state space module and a vertical scanning selective state space module, used to capture long-range dependencies along the horizontal and vertical spatial directions, respectively; and a feature fusion unit, used to fuse the global modeling features with the local features preserved by parallel 1×1 convolution branches and to output an enhanced feature map. Among the compared backbone networks, it achieves the lowest parameter count and the fastest detection speed while maintaining the highest detection accuracy, providing an ideal lightweight solution for mobile and embedded platforms with limited computing resources.
[0013] As a preferred embodiment of the present invention, the Feature Interaction Module (MIFI) includes: a dimension transformation unit, used to expand the input feature map into sequence form; three independent linear transformation layers, used to dynamically generate the parameters B and C and the discretization step size Δ of the state-space model from the input sequence; an internally learnable static parameter A; a discretization computation unit, used to discretize the continuous-time state-space model based on the zero-order hold method; a parallel scanning computation unit, used to perform the state-space equation calculation at each time step in the sequence to obtain the global context output sequence; and a residual connection unit, used to fuse the output sequence with the input sequence.
[0014] As a preferred embodiment of the present invention, the discretization calculation unit performs discretization using the following formulas:
$$\bar{A} = \exp(\Delta A) \tag{1}$$
In the formula, $\exp(\cdot)$ is the matrix exponential operation; $\Delta$ is the discretization step size; $A$ is the continuous-time system matrix, which is a learnable parameter in the MIFI module; $\bar{A}$ is the discrete state transition matrix.
$$\bar{B} = (\Delta A)^{-1}\left(\exp(\Delta A) - I\right)\Delta B \tag{2}$$
In the formula, $B$ is the input matrix of the continuous-time system; $\bar{B}$ is the discrete input matrix after discretization; $I$ is the identity matrix.
[0015] As a preferred embodiment of the present invention, the state-space equation is expressed as:
$$h_t = \bar{A}_t h_{t-1} + \bar{B}_t x_t \tag{3}$$
$$y_t = C_t h_t \tag{4}$$
In the formula, $t$ is the time step index, taken as $t = 1, 2, \ldots, L$, where $L$ is the sequence length; $x_t$ is the input vector at the t-th time step; $h_t$ is the hidden state vector at time step t; $\bar{A}_t$ is the discrete state transition matrix at time step t; $\bar{B}_t$ is the discrete input matrix at time step t; $C_t$ is the output matrix at time step t; $y_t$ is the output scalar at the t-th time step.
[0016] As a preferred technical solution of the present invention, the ADown lightweight downsampling module adopts a dilated convolution and feature rearrangement mechanism to reduce information loss while reducing the resolution of the feature map.
[0017] As a preferred embodiment of the present invention, the model structure is deployed on an embedded device to achieve high-throughput, real-time grain detection and counting.
[0018] Compared with existing technologies, the beneficial effects of this invention are as follows: This invention constructs a WSD dataset containing over 230,000 grain bounding boxes from five wheat varieties: Aikang 58, Bainong 607, Zhengmai 379, Tunmai 127, and Yunong 916. This dataset covers various backgrounds, shooting devices, angles, and high-density scenes. Data augmentation is performed using methods such as flipping, translation, rotation, color transformation, and random Gaussian noise, resulting in 14,207 expanded images. This provides a high-quality data foundation for model training and evaluation, effectively improving the model's generalization ability and adaptability to complex real-world environments.
[0019] This invention, based on the DEIM framework, designs a lightweight backbone network, Lite-Mamba, to achieve global context modeling with linear computational complexity, enhancing the ability to distinguish densely clustered seed regions. It proposes a MIFI module based on selective state space to replace the original AIFI module, significantly reducing computational complexity while achieving efficient feature interaction. A lightweight downsampling module, ADown, is introduced to effectively mitigate detail loss during downsampling. The synergistic effect of these three modules enables the model to achieve precision, recall, and mAP@50 of 92.7%, 91.5%, and 92.0% respectively on the WSD test set, with a detection speed of 143.1 f/s. Compared to the original DEIM framework, the number of parameters is reduced by 66.9%, and the detection speed is increased by 240%, achieving an optimal balance between accuracy and efficiency.
[0020] The number of model parameters in this invention is only [value missing in source], the floating-point operation count is [value missing in source], and the video memory usage is 3.7 GB, far lower than other detection models at the same accuracy level. Verified on a Raspberry Pi 5 embedded platform, the model inference speed reaches 62 frames per second at an input image resolution of 640×640 pixels, meeting the requirements for high-throughput real-time counting. Compared with mainstream backbone networks such as ResNet50, EfficientNetV2, and Swin-T, it achieves the lowest parameter count and the fastest detection speed while maintaining the highest detection accuracy, providing an ideal lightweight solution for mobile and embedded platforms with limited computing resources.
[0021] For high-density test subsets with more than 300 seeds, the model of this invention achieved a mean absolute error of 17.41 seeds, a mean relative error as low as 3.08%, and a coefficient of determination (R²) as high as 0.9735, significantly outperforming mainstream models such as Faster R-CNN, YOLO11n, and RT-DETR. In the visual detection of heavily adhered grain images, it has the fewest missed detections, false detections, and counting errors, demonstrating stronger robustness and scene adaptability.
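The three counting metrics named above can be computed from per-image true and predicted grain counts. The following is a minimal sketch under stated assumptions: the function name `counting_metrics` and the sample counts are hypothetical, the mean relative error is taken as the mean of per-image relative errors, and R² is taken as 1 minus the ratio of residual to total sum of squares; the source does not state which definitions were used.

```python
import numpy as np

def counting_metrics(true_counts, pred_counts):
    """Return (MAE, MRE in percent, R^2) for per-image grain counts."""
    y = np.asarray(true_counts, dtype=float)
    p = np.asarray(pred_counts, dtype=float)
    abs_err = np.abs(p - y)
    mae = abs_err.mean()                      # mean absolute error, in grains
    mre = (abs_err / y).mean() * 100.0        # mean relative error, in percent
    ss_res = np.sum((y - p) ** 2)             # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                # coefficient of determination
    return mae, mre, r2

# Hypothetical counts for three images (not data from the WSD test set).
true_counts = [300, 400, 500]
pred_counts = [290, 410, 480]
mae, mre, r2 = counting_metrics(true_counts, pred_counts)
print(f"MAE={mae:.2f} grains, MRE={mre:.2f}%, R2={r2:.4f}")
```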
[0022] This invention develops a web-based wheat detection and counting system based on Python and the Django framework, and integrates it into embedded platforms such as the Raspberry Pi 5 and the RK3588 industrial control host to form a portable counting system and an integrated counting and weighing system. It has been verified in practical engineering applications. It can meet the needs of breeding and seed evaluation scenarios, such as thousand-grain weight determination, for high-throughput, real-time, and high-precision grain counting, and provides a reliable end-side deployment solution for improving the efficiency of automated seed evaluation.
Attached Figure Description
[0023] Figure 1 is an example diagram of the dataset of this invention;
Figure 2 is an example diagram of the data augmentation of this invention;
Figure 3 is a statistical graph of the categories and bounding boxes of the dataset of this invention;
Figure 4 is the LMA-DEIM network architecture of this invention;
Figure 5 is the MIFI module architecture of this invention;
Figure 6 is a graph of the mAP trend of this invention;
Figure 7 is a visualization and comparison of the Grad-CAM++ heatmaps of this invention;
Figure 8 is a visualization of the detection results of different models of this invention;
Figure 9 shows the results of the ablation test of this invention;
Figure 10 shows the performance test results of different target detection models of this invention;
Figure 11 is a graph of the relationship between the predicted and actual number of grains of this invention;
Figure 12 shows the grain counting metrics of each model of this invention.
Detailed Implementation
[0024] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0025] Example 1
As shown in Figures 1 to 5, this invention discloses a high-throughput densely adhered wheat grain target detection model based on LMA-DEIM. The technical solution adopted includes dataset construction and model construction.
Dataset construction: To enrich the data available for high-throughput detection of densely clustered wheat grains, a WSD dataset was constructed, covering five wheat varieties (Aikang 58, Bainong 607, Zhengmai 379, Tunmai 127, and Yunong 916), with a total of [value missing in source] seed target boxes.
[0026] Images were captured using a mobile device against two backgrounds: white A4 paper and corrugated paper (used to simulate the color of soil in a field). The shooting height was 25cm. Each image contained 100-500 samples with a high degree of adhesion. An example image and its annotation are shown in Figure 1.
[0027] The original dataset contains 2368 images with approximately [value missing in source] seed targets, at an image resolution of 4624 × 2080 pixels. The diversity of images and background conditions helps improve the generalization ability of models trained on the dataset.
[0028] To expand the dataset and enable the model to better learn the features of densely clustered seeds in various scenes, the original images and labels were augmented by flipping, translation, rotation, color changes, and the introduction of random Gaussian noise, as shown in Figure 2. The expanded image set contains 14,207 images. Seed targets were labeled using the LabelImg software in VOC label format, resulting in 230,051 seed target boxes. The class distribution and target box statistics of the dataset are shown in Figure 3. After annotation, the dataset was divided into a training set, a validation set, and a test set in a ratio of 8:1:1 (11,367 images in the training set and 1,420 images each in the validation and test sets).
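Two of the augmentation operations and the 8:1:1 split can be illustrated with a short NumPy sketch. This is only an illustration under stated assumptions, not the pipeline used to build WSD: the helper names `hflip_with_boxes`, `add_gaussian_noise`, and `split_counts` are hypothetical, boxes are treated as continuous (xmin, ymin, xmax, ymax) pixel coordinates, the noise level `sigma` is arbitrary, and the rounding remainder of the split is assumed to go to the training set.

```python
import numpy as np

def hflip_with_boxes(image, boxes):
    """Flip an H x W x C image left-right and mirror its boxes accordingly."""
    w = image.shape[1]
    flipped = image[:, ::-1, :].copy()
    boxes = np.asarray(boxes, dtype=float)
    new_boxes = boxes.copy()
    new_boxes[:, 0] = w - boxes[:, 2]   # new xmin comes from old xmax
    new_boxes[:, 2] = w - boxes[:, 0]   # new xmax comes from old xmin
    return flipped, new_boxes

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise; the boxes are unchanged by this step."""
    noisy = image.astype(float) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def split_counts(n_images, ratios=(8, 1, 1)):
    """Train/val/test sizes; the rounding remainder goes to the training set."""
    total = sum(ratios)
    n_val = n_images * ratios[1] // total
    n_test = n_images * ratios[2] // total
    return n_images - n_val - n_test, n_val, n_test

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)  # toy image
boxes = [[1, 0, 3, 2]]                                        # toy box
flipped, flipped_boxes = hflip_with_boxes(image, boxes)
noisy = add_gaussian_noise(image, sigma=5.0, rng=rng)
n_train, n_val, n_test = split_counts(14207)
print(flipped_boxes.tolist(), n_train, n_val, n_test)
```

With 14,207 images, this split rule reproduces the set sizes stated above (11,367 / 1,420 / 1,420).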
[0029] Model building: Currently, mainstream object detection frameworks can be divided into two categories: CNN frameworks (such as YOLO, FCOS, and CenterNet) and Transformer frameworks (such as DETR and RT-DETR). In dense grain detection, CNN frameworks rely on Non-Maximum Suppression (NMS) post-processing and the local receptive field of convolution, so they are prone to mistakenly deleting overlapping detection boxes or missing clustered grain targets. Transformer frameworks eliminate NMS and capture long-range dependencies through end-to-end design and global attention, but suffer from slow convergence and sparse supervision. To address these issues, the DEIM framework, built on the RT-DETR model, employs a Dense O2O (Dense One-to-One) matching strategy to strengthen the supervision signal and introduces a Matchability-Aware Loss (MAL) to improve convergence speed, providing a new approach for real-time detection of dense targets.
[0030] Based on the DEIM framework, a lightweight dense wheat grain detection model, LMA-DEIM, is proposed. The network structure is shown in Figure 4. First, a lightweight backbone network with linear-complexity global modeling, Lite-Mamba, is designed, which introduces a state-space equation to achieve efficient global modeling during visual feature extraction. Second, an efficient feature interaction module with linear time complexity, MIFI (Mamba-based Intra-scale Feature Interaction), is proposed to replace the AIFI (Attention-based Intra-scale Feature Interaction) module of the original framework, which significantly reduces the computational cost while enabling interaction within deep semantic feature maps. Third, a lightweight downsampling module, ADown (Adaptive Downsampling), is introduced to significantly reduce the accuracy loss caused by downsampling at the cost of a small increase in the number of parameters.
[0031] Lightweight backbone network Lite-Mamba: Wheat grains are tiny, densely packed, and prone to sticking and overlapping, so the boundaries of stacked grains can only be determined correctly from their relationships with multiple neighboring and distant grains. However, traditional convolution has limited ability to capture long-range dependencies between grains. Self-attention-based backbone networks can achieve global modeling, but their high computational complexity makes it difficult for them to meet the deployment and high-throughput detection requirements of embedded devices.
[0032] This invention proposes a lightweight backbone network, Lite-Mamba, which introduces a selective state-space model to achieve global context modeling with linear computational complexity, thereby improving the ability to distinguish between contiguous regions while maintaining efficiency.
[0033] The input feature map first passes through layer normalization to stabilize training, and a 1×1 convolution then compresses the channels to [value missing in source] to reduce the computational load. Because the adhesion boundaries of grains run in arbitrary directions, scanning along a single direction only would introduce directional bias. Following existing bidirectional scanning mechanisms, parallel horizontal and vertical scanning selective state space modules are used to capture long-range dependencies along the two spatial directions respectively, and their output features are concatenated along the channel dimension so that global contextual information is captured comprehensively. To reduce the loss of local features caused by compressing the channels of the input feature map, a parallel 1×1 convolutional branch runs alongside the state-space global modeling and expands the channels back to C, preserving as many of the original local features as possible. Finally, the multi-branch features from global modeling and local preservation are fused to output the enhanced feature map. Through the direction-separated scanning mechanism and the lightweight multi-branch design, Lite-Mamba achieves efficient modeling of the complex spatial relationships between densely clustered grains while significantly reducing the computational load.
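The direction-separated scanning idea can be sketched in NumPy as follows. This is a toy illustration, not the Lite-Mamba implementation: `scan_1d` is a hypothetical fixed-decay recurrence standing in for a selective state space module, and the assumption that the horizontal scan reads the map row by row (row-major) and the vertical scan column by column (column-major) is one plausible reading of the description above.

```python
import numpy as np

def scan_1d(seq, decay):
    """Toy causal recurrence h_t = decay * h_{t-1} + x_t along axis 0.

    Stand-in for a selective state space module; a real module would make
    the recurrence parameters depend on the input.
    """
    out = np.zeros_like(seq, dtype=float)
    h = np.zeros(seq.shape[1:], dtype=float)
    for t in range(seq.shape[0]):
        h = decay * h + seq[t]
        out[t] = h
    return out

def bidirectional_scan(feat, decay=0.9):
    """feat: (H, W, C) feature map. Returns (H, W, 2C)."""
    H, W, C = feat.shape
    # Horizontal scan: visit positions in row-major order.
    horiz = scan_1d(feat.reshape(H * W, C), decay).reshape(H, W, C)
    # Vertical scan: visit positions in column-major order, then restore layout.
    vert_seq = feat.transpose(1, 0, 2).reshape(W * H, C)
    vert = scan_1d(vert_seq, decay).reshape(W, H, C).transpose(1, 0, 2)
    # Concatenate the two directional outputs along the channel dimension.
    return np.concatenate([horiz, vert], axis=-1)

feat = np.arange(6, dtype=float).reshape(2, 3, 1)
out = bidirectional_scan(feat, decay=1.0)   # decay=1 gives running sums
print(out[..., 0])   # horizontal running sums
print(out[..., 1])   # vertical running sums
```

With `decay=1.0` each output is a running sum along its scan order, which makes it easy to see that the two branches accumulate context along different paths over the same feature map.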
[0034] Feature interaction module MIFI: The original DEIM framework uses the AIFI module for intra-scale interaction within deep semantic feature maps. However, because AIFI relies on a self-attention mechanism, its computational complexity is too high for deployment on embedded devices. Therefore, an efficient Mamba-based feature interaction module, MIFI, is designed. This module replaces the self-attention mechanism with a selective state-space equation and uses a state space model (SSM) with linear computational complexity to achieve global information interaction and modeling of deep feature maps. This significantly reduces the computational load while enhancing the semantic representation capability of deep features. The structure is shown in Figure 5.
[0035] Given the S5 feature map from the backbone network, with a spatial size of 20×20 and 256 channels, the feature map is flattened along the spatial dimensions and its dimensions are adjusted to form an input sequence of length L = 400 with 256 channels. Then, through three independent linear transformation layers, the parameters B and C and the discretization step size Δ are dynamically generated from the input sequence. At the same time, the learnable static parameter A is obtained from within the module.
[0036] Based on this, the continuous-time state-space model is discretized using the zero-order hold method, and the discretization formulas are as follows:
$$\bar{A} = \exp(\Delta A) \tag{1}$$
In the formula, $\exp(\cdot)$ is the matrix exponential operation; $\Delta$ is the discretization step size; $A$ is the continuous-time system matrix, which is a learnable parameter in the MIFI module; $\bar{A}$ is the discrete state transition matrix.
$$\bar{B} = (\Delta A)^{-1}\left(\exp(\Delta A) - I\right)\Delta B \tag{2}$$
In the formula, $B$ is the input matrix of the continuous-time system; $\bar{B}$ is the discrete input matrix after discretization; $I$ is the identity matrix.
After discretization, the state-space equations for a single time step can be expressed as:
$$h_t = \bar{A}_t h_{t-1} + \bar{B}_t x_t \tag{3}$$
$$y_t = C_t h_t \tag{4}$$
In the formula, $t$ is the time step index, taken as $t = 1, 2, \ldots, L$, where $L$ is the sequence length; $x_t$ is the input vector at the t-th time step; $h_t$ is the hidden state vector at time step t; $\bar{A}_t$ is the discrete state transition matrix at time step t; $\bar{B}_t$ is the discrete input matrix at time step t; $C_t$ is the output matrix at time step t; $y_t$ is the output scalar at the t-th time step.
[0037] By employing an efficient parallel scanning algorithm, the above calculations are performed on all time steps in the sequence to obtain an output sequence containing global context information. Finally, the output sequence is fused with the input sequence through residual connections to obtain the final output of the module. The MIFI module achieves efficient global modeling within feature maps with linear complexity, significantly reducing model computation and improving inference speed.
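The MIFI computation in formulas (1) to (4), followed by the residual connection, can be sketched in NumPy. This is a minimal sketch under several assumptions that the source does not confirm: A is taken to be diagonal with negative real entries (so the matrix exponential becomes elementwise), the zero-order hold input matrix is taken in its standard form (ΔA)⁻¹(exp(ΔA) − I)·ΔB, the step size is made positive with a softplus, and a plain sequential loop replaces the efficient parallel scan. The names `mifi_sketch`, `W_B`, `W_C`, and `W_dt` are hypothetical.

```python
import numpy as np

def softplus(z):
    """Numerically stable log(1 + exp(z)); keeps the step size positive."""
    return np.logaddexp(0.0, z)

def mifi_sketch(x, A, W_B, W_C, W_dt):
    """Selective state-space pass over a flattened feature map.

    x: (L, D) input sequence; A: (D, N) diagonal entries of the system matrix;
    W_B, W_C: (D, N) and W_dt: (D, D) are the three linear layers.
    Returns the output sequence fused with the input, shape (L, D).
    """
    L, D = x.shape
    B = x @ W_B                  # (L, N), input-dependent
    C = x @ W_C                  # (L, N), input-dependent
    dt = softplus(x @ W_dt)      # (L, D), input-dependent step size
    h = np.zeros((D, A.shape[1]))
    y = np.empty_like(x, dtype=float)
    for t in range(L):           # sequential stand-in for the parallel scan
        dA = dt[t][:, None] * A                    # Delta_t * A
        A_bar = np.exp(dA)                         # formula (1), diagonal A
        B_bar = (A_bar - 1.0) / A * B[t][None, :]  # formula (2), diagonal A
        h = A_bar * h + B_bar * x[t][:, None]      # formula (3)
        y[t] = h @ C[t]                            # formula (4)
    return y + x                                   # residual connection

# Scalar toy case: D = N = 1, constant input, step size ln 2, A = -1.
x = np.ones((3, 1))
out = mifi_sketch(x, A=np.array([[-1.0]]), W_B=np.array([[1.0]]),
                  W_C=np.array([[1.0]]), W_dt=np.array([[0.0]]))
print(out[:, 0])
```

For the 20×20×256 feature map described above, the input would be reshaped to (400, 256) before the call, for example with `feat.reshape(400, 256)` for a channels-last array.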
[0038] Lightweight downsampling module ADown: Traditional downsampling methods often lose feature map information when reducing resolution. Therefore, the ADown module is introduced to improve the model structure. By combining dilated convolution with a feature rearrangement mechanism, it effectively alleviates the loss of seed detail caused by downsampling while reducing computational complexity, thus improving detection performance.
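One way a rearrangement step can halve the resolution without discarding any values is a space-to-depth operation, sketched below in NumPy. This is an assumption made purely for illustration: the source does not specify how ADown rearranges features, the dilated convolution branch is omitted, and the function name `space_to_depth` is hypothetical.

```python
import numpy as np

def space_to_depth(feat, block=2):
    """Rearrange (H, W, C) into (H/block, W/block, C*block*block).

    Each block x block spatial neighbourhood is moved into the channel
    dimension, so the resolution drops but no value is thrown away.
    """
    H, W, C = feat.shape
    assert H % block == 0 and W % block == 0
    x = feat.reshape(H // block, block, W // block, block, C)
    x = x.transpose(0, 2, 1, 3, 4)   # group the two block axes together
    return x.reshape(H // block, W // block, block * block * C)

feat = np.arange(16).reshape(4, 4, 1)
down = space_to_depth(feat)
print(down.shape)        # spatial size halves, channel count quadruples
print(down[0, 0])        # the top-left 2x2 neighbourhood, now as channels
```

In contrast to strided convolution or pooling, every input value survives the rearrangement, which is the property the paragraph above attributes to ADown.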
[0039] Effect verification: This technical solution uses seven metrics to comprehensively evaluate model performance: precision (P), recall (R), mean average precision (mAP@50 and mAP@50-95), number of parameters (Param), floating-point operations (GFLOPs), detection speed (FPS), and model memory usage. Since the research focuses on high-throughput detection and counting of densely clustered grains, the emphasis is on the model's detection speed and mean average precision.
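For reference, precision and recall follow from counts of true positives, false positives, and false negatives, where a detection counts as a true positive at mAP@50 when its IoU with a ground-truth box is at least 0.5. The sketch below uses hypothetical helper names and made-up counts; it is not the evaluation code behind the reported results.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))   # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))   # overlap height
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

overlap = iou((0, 0, 2, 2), (1, 0, 3, 2))       # two boxes shifted by half a width
p, r = precision_recall(tp=90, fp=10, fn=30)    # made-up counts
print(round(overlap, 4), p, r)
```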
[0040] To verify the improvement brought by the proposed Lite-Mamba backbone network, ResNet50, EfficientNetV2-S, MobileNetV3-Small, PVTv2-B2, Swin-Tiny, EdgeNeXt-XXS, VMamba-Tiny, and Lite-Mamba were each used as the backbone network of the LMA-DEIM framework in comparative experiments under the same experimental conditions. The performance of the eight backbone networks on the WSD dataset is shown in Table 1.
[0041] Table 1. Performance test results of different backbone networks
As shown in Table 1, the model using the Lite-Mamba backbone network performs best in the densely clustered grain detection task, achieving an mAP@50 of 92.0% and a detection speed of 143.1 f/s, making it the fastest among all models, while consuming only 3.7 GB of GPU memory. This indicates that the Lite-Mamba backbone network possesses powerful feature extraction capabilities and excellent real-time performance, making it suitable for applications requiring high detection efficiency.
[0042] Ablation test: To verify the effectiveness of each improved module in the LMA-DEIM model, the DEIM framework was used as the baseline model, and the Lite-Mamba backbone network, the MIFI feature interaction module, and the ADown downsampling module were added in turn. Ablation experiments were conducted on the WSD test set, and the results are shown in Figure 9. Introducing the Lite-Mamba module into the model backbone improved the model's mAP@50 by 4.3 percentage points over the baseline, from 86.2% to 90.5%, while reducing the number of parameters by 67% and increasing the detection speed by 187%. This verifies the advantages of the lightweight backbone based on the selective state-space model in efficiently capturing global dependencies and distinguishing densely clustered targets. Replacing the AIFI module with the MIFI module improved mAP@50 by 1.0 percentage point while also increasing the detection speed, indicating that the MIFI module achieves more efficient feature interaction with linear time complexity and can effectively replace the original attention mechanism. Introducing the ADown lightweight downsampling module increased mAP@50 by 0.6 percentage points, verifying that optimizing the downsampling process can alleviate the loss of feature map details and has a positive effect on the detection accuracy of clustered seed targets.
[0043] When Lite-Mamba is used together with the MIFI module, the model's mAP@50 is further improved by 0.9 percentage points to 91.4%, building on the gain already brought by the lightweight backbone network. Detection speed also increases, indicating that the efficient backbone and the linear-complexity interaction module produce a significant synergistic effect, jointly optimizing the feature extraction and fusion process. Furthermore, the combination of Lite-Mamba and the ADown module outperforms each module used individually in both accuracy (mAP@50 of 91.1%) and speed (130.5 f/s), demonstrating that the improved downsampling strategy can effectively cooperate with the advanced global modeling backbone network, further ensuring the integrity of multi-scale features.
[0044] When all three improvements were integrated into the DEIM framework, LMA-DEIM achieved optimal performance. Compared to the original baseline model, the precision, recall, and mAP@50 improved by 3.8, 4.0, and 5.8 percentage points, respectively, ultimately reaching 92.7%, 91.5%, and 92.0%. Although the model's memory usage increased slightly, the computational load was significantly reduced: the number of parameters decreased by 66.9%, while the detection speed increased to 143.1 f/s, a speed improvement of 240%. This indicates that the modules of this invention significantly improve model accuracy while achieving model lightweighting and inference acceleration, thus achieving a better balance between accuracy, speed, and computational load, meeting the deployment requirements for high-throughput, real-time detection on embedded or mobile devices with limited computing resources.
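The ablation figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the derived baseline values are implied rather than reported, and the baseline speed assumes that "a speed improvement of 240%" means the final speed is 3.4 times the baseline speed.

```python
# mAP@50 values along the ablation path (percent).
baseline_map50 = 86.2
lite_mamba_map50 = round(baseline_map50 + 4.3, 1)          # Lite-Mamba only
lite_mamba_mifi_map50 = round(lite_mamba_map50 + 0.9, 1)   # Lite-Mamba + MIFI
final_map50 = round(baseline_map50 + 5.8, 1)               # all three modules

# Baseline precision and recall implied by the reported gains (percent).
implied_baseline_precision = round(92.7 - 3.8, 1)
implied_baseline_recall = round(91.5 - 4.0, 1)

# Baseline speed implied by a 240% increase to 143.1 f/s (assumption: x3.4).
implied_baseline_fps = 143.1 / (1 + 2.40)

# Fraction of the baseline parameter count that remains after a 66.9% cut.
remaining_param_fraction = round(1 - 0.669, 3)

print(lite_mamba_map50, lite_mamba_mifi_map50, final_map50)
print(implied_baseline_precision, implied_baseline_recall)
print(round(implied_baseline_fps, 1), remaining_param_fraction)
```

The intermediate and final mAP@50 values agree with the 90.5%, 91.4%, and 92.0% reported in the text.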
[0045] Performance validation of different target detection models: To verify the effectiveness of the model, several representative mainstream object detection models were selected for comparative experiments with the proposed LMA-DEIM model, including the YOLO series (YOLOv5s, YOLOv8n, YOLOv8s, YOLOv10n, YOLO11n, and YOLO12n), the DETR series (DETR-ResNet50, RT-DETR-R18, and RT-DETR-R50), and the two-stage model Faster R-CNN (ResNet50-FPN). The experiments were conducted on the WSD dataset using the same training and testing settings. The experimental results are shown in Figure 10, and the mAP trend on the validation set during training is shown in Figure 6.
[0046] In terms of detection accuracy, the LMA-DEIM model achieved the best performance in all three core metrics: precision, recall, and mean average precision (mAP@50), at 92.7%, 91.5%, and 92.0%, respectively. Compared to the two-stage model Faster R-CNN, LMA-DEIM improved mAP@50 by 4.8 percentage points, mainly because its end-to-end detection paradigm avoids localization errors that the region proposal network might introduce, which favors dense grain targets. Among the YOLO series models, YOLO11n achieved the best mAP@50 at 89.5%, but still lagged behind LMA-DEIM by 2.5 percentage points, indicating that the CNN-based YOLO architecture, when dealing with highly dense and severely adhered wheat grains, may still falsely suppress overlapping targets because of its NMS-dependent post-processing. Among the Transformer-based models, RT-DETR-R50 achieved an mAP@50 of 90.8%, demonstrating the advantage of the global attention mechanism in modeling complex relationships between targets. Building on the end-to-end detection advantages of the DEIM framework, LMA-DEIM further improves accuracy by 1.2 percentage points by introducing modules such as Lite-Mamba and MIFI, achieving the highest level among all models.
[0047] In terms of model efficiency and real-time performance, LMA-DEIM also performs well. Its detection speed reaches 143.1 f/s, the fastest of all models, a 3.5% improvement over the second-fastest model, YOLO11n, and a 21% improvement over YOLOv8n. Meanwhile, LMA-DEIM's floating-point computation is only [value missing], significantly lower than that of models with the same level of accuracy. In terms of model complexity, although LMA-DEIM's parameter count exceeds that of the ultra-lightweight YOLOv8n, YOLOv10n, and YOLO11n, it is significantly smaller than that of two-stage models and large DETR models, achieving a good balance between accuracy and model size. Its GPU memory usage of 3.4 GB is also relatively low, meeting the deployment constraints of edge devices. LMA-DEIM thus achieves the fastest detection speed while maintaining the highest detection accuracy, providing a well-suited technical solution for high-throughput, real-time wheat grain detection tasks.
[0048] In summary, the LMA-DEIM model achieves an optimal balance between accuracy and speed in the detection of densely clustered wheat grains. Compared with current mainstream target detection models, it demonstrates significant advantages in detection accuracy, detection speed, and computational complexity, validating the effectiveness and advancement of the model design. It is particularly suitable for high-throughput, real-time grain detection and counting on resource-constrained embedded or mobile platforms.
[0049] Lite-Mamba heatmap visualization. To intuitively evaluate the feature discrimination capability of the Lite-Mamba backbone network in the densely clustered grain detection task, the Grad-CAM++ algorithm is used to visualize and analyze the model's regions of interest on the input image. Based on the original DEIM framework, the ResNet50 backbone network is replaced with the proposed Lite-Mamba, and the corresponding class activation heatmaps are generated from the same input image. The results are shown in Figure 7. As can be seen from Figure 7, the activation response generated by the Lite-Mamba backbone network in dense grain regions is more concentrated at the grain centers, with weaker responses along the edges shared by adjacent grains, resulting in sharper edges and significantly improved boundary discrimination. In contrast, the original ResNet50-based DEIM framework exhibits a strong activation response at the grain edges that easily diffuses across adjacent boundaries, causing the model to confuse neighboring targets when separating adhered grains and thereby increasing the probability of missed and false detections. The Lite-Mamba backbone network can more effectively model the features of individual grains and suppress interference from adhered regions, thus improving the robustness and accuracy of detection.
[0050] Visualization of test results. To visually compare the detection performance of different models, Faster R-CNN, YOLO11n, RT-DETR, and the proposed LMA-DEIM model were evaluated on the test set. One image containing 342 heavily adhered grains was selected, as shown in Figure 8. Experimental statistics show that the numbers of missed grains for the four models are 5, 4, 1, and 1, respectively; the numbers of false detections (i.e., detecting multiple adhered grains as a single target) are 5, 3, 4, and 1, respectively; and the counting errors are -10, -7, -6, and -2, respectively. The proposed LMA-DEIM model has the fewest false detections and the smallest counting error, and ties with RT-DETR for the fewest missed detections, indicating better detection accuracy and robustness in severe adhesion scenarios and verifying the effectiveness of the proposed method.
[0051] Counting accuracy test. To evaluate the accuracy and robustness of the LMA-DEIM model on the task of counting densely adhered wheat grains, counting accuracy experiments were conducted on a test set not used for training. To comprehensively quantify model performance, the coefficient of determination (R²), mean absolute error (MAE), mean relative error (MRE), and average inference time were used as evaluation metrics. A more challenging high-density test subset containing 100 images was constructed from the original test set. The selected samples all contained more than 300 grains and exhibited significantly higher levels of adhesion and background complexity than the average of the test set, aiming to simulate extreme high-throughput detection conditions and more effectively verify the model's performance limits in real-world complex environments.
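The three counting metrics named above can be computed as in the following minimal sketch. It assumes the common definitions R² = 1 − SS_res / SS_tot, MAE as the mean of |predicted − actual|, and MRE as the mean of |predicted − actual| / actual; the source does not spell out its exact formulas, so these definitions are an assumption. The function name and the per-image counts are hypothetical and only illustrate the calculation.

```python
def counting_metrics(pred, true):
    """Return (R2, MAE, MRE) for paired predicted and ground-truth counts."""
    n = len(true)
    mean_true = sum(true) / n
    # Residual and total sums of squares for the coefficient of determination.
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, true))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    r2 = 1.0 - ss_res / ss_tot
    # Mean absolute error, in grains.
    mae = sum(abs(p - t) for p, t in zip(pred, true)) / n
    # Mean relative error, as a fraction of the true count (requires true > 0).
    mre = sum(abs(p - t) / t for p, t in zip(pred, true)) / n
    return r2, mae, mre


# Hypothetical per-image counts (not data from the experiments).
true_counts = [300, 320, 350, 400]
pred_counts = [297, 322, 345, 400]

r2, mae, mre = counting_metrics(pred_counts, true_counts)
print(f"R2 = {r2:.4f}, MAE = {mae:.2f} grains, MRE = {mre * 100:.2f}%")
```

Here the predicted count for an image would be the number of bounding boxes the detector outputs for that image.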
[0052] Experiments were conducted on the high-density test subset using Faster R-CNN, YOLO11n, RT-DETR, and the proposed LMA-DEIM model. The number of bounding boxes detected in each image was taken as the predicted grain count and compared with the actual number of labeled grains. The prediction results of each model and the actual labeled counts are shown in Figure 11, and the corresponding counting performance indicators are shown in Figure 12.
[0053] As can be seen from Figures 11 and 12, the LMA-DEIM model performs significantly better than the other mainstream models across the board in the high-density grain counting task. LMA-DEIM's coefficient of determination (R²) is as high as 0.9735, indicating a very high linear correlation and consistency between the predicted grain count and the actual value. Its MAE is 17.41 grains and its MRE is as low as 3.08%, the best among all models. This demonstrates that the improved model maintains very high detection completeness and counting accuracy even in extremely dense and adhered scenarios. Furthermore, the average processing time is 12.45 ms, reflecting the success of the model's lightweight design. The model combines the efficient global modeling capability of the state-space model with the end-to-end efficiency of the DEIM framework, achieving a balance between accuracy and speed.
[0054] Embedded device deployment. To enhance the engineering application value of the model, a web-based wheat grain detection and counting system was developed using Python and the Django framework. Deployed on a Raspberry Pi 5 embedded platform and integrated with a display screen and a document scanner, it forms a portable counting system. With an input image resolution of 640 × 640 pixels, the model's inference speed reaches 62 images per second, meeting the requirements for high-throughput real-time counting. It is suitable for integration into devices such as thousand-grain weight instruments to extend their counting functionality, demonstrating high practical value.
[0055] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A high-throughput densely adhered wheat grain target detection model based on LMA-DEIM, characterized in that: it comprises a constructed WSD dataset and a model structure; the WSD dataset is used to construct training and testing datasets containing target boxes for grains of various wheat varieties; the model structure is based on the DEIM framework and includes the lightweight backbone network Lite-Mamba, the feature interaction module MIFI, and the lightweight downsampling module ADown; the lightweight backbone network Lite-Mamba is used to perform global context modeling of input images with linear computational complexity, enhancing the ability to distinguish densely clustered grain regions; the feature interaction module MIFI is used to perform global information interaction and modeling of deep feature maps based on a selective state space, replacing the feature interaction module based on the self-attention mechanism; and the lightweight downsampling module ADown is used to preserve key details during feature map scaling.
2. The high-throughput densely adhered wheat grain target detection model based on LMA-DEIM according to claim 1, characterized in that: construction of the WSD dataset includes data acquisition and data augmentation; data acquisition includes capturing images using mobile devices under different backgrounds; data augmentation uses flipping, translation, rotation, color transformation, and the introduction of random Gaussian noise to expand the original images and labels, constructing a dataset containing grain target boxes, which is divided into a training set, a validation set, and a test set.
3. The high-throughput densely adhered wheat grain target detection model based on LMA-DEIM according to claim 1, characterized in that: the lightweight backbone network Lite-Mamba includes: a layer normalization layer and a channel compression unit, used to perform layer normalization on the input feature map and compress the number of channels through 1×1 convolution; a parallel bidirectional scanning unit, including a horizontal-scanning selective state space module and a vertical-scanning selective state space module, used to capture long-range dependencies along the horizontal and vertical spatial directions, respectively; and a feature fusion unit, used to fuse the global modeling features with the local features preserved by a parallel 1×1 convolution branch and output an enhanced feature map.
4. The high-throughput densely adhered wheat grain target detection model based on LMA-DEIM according to claim 1, characterized in that: the feature interaction module MIFI includes: a dimension transformation unit, used to expand the input feature map into sequence form; three independent linear transformation layers, used to dynamically generate the parameters B and C and the discretization step size Δ of the state-space model from the input sequence; an internally learnable static parameter A; a discretization computation unit, used to discretize the continuous-time state-space model based on the zero-order hold method; a parallel scanning computation unit, used to perform the state-space equation calculations at each time step of the sequence to obtain the global context output sequence; and a residual connection unit, used to fuse the output sequence with the input sequence.
5. The high-throughput densely adhered wheat grain target detection model based on LMA-DEIM according to claim 4, characterized in that: the discretization computation unit uses the following formulas for discretization: $\bar{A} = \exp(\Delta A)$ (1), where $\exp(\cdot)$ is the matrix exponential operation; $\Delta$ is the discretization step size; $A$ is the continuous-time system matrix, which is a learnable parameter in the MIFI module; and $\bar{A}$ is the discrete state transition matrix; $\bar{B} = (\Delta A)^{-1}\left(\exp(\Delta A) - I\right)\Delta B$ (2), where $B$ is the input matrix of the continuous-time system; $I$ is the identity matrix; and $\bar{B}$ is the discrete input matrix after discretization.
6. The high-throughput densely adhered wheat grain target detection model based on LMA-DEIM according to claim 4, characterized in that: the state-space equations are expressed as: $h_t = \bar{A}_t h_{t-1} + \bar{B}_t x_t$ (3), $y_t = C_t h_t$ (4), where $t$ is the time step index, running over the positions of the sequence, and $L$ is the sequence length; $x_t$ is the input vector at the t-th time step; $h_t$ is the hidden state vector at time step t; $\bar{A}_t$ is the discrete state transition matrix at time step t; $\bar{B}_t$ is the discrete input matrix at time step t; $C_t$ is the output matrix at time step t; and $y_t$ is the output scalar at the t-th time step.
7. The high-throughput densely adhered wheat grain target detection model based on LMA-DEIM according to claim 1, characterized in that: The ADown lightweight downsampling module employs dilated convolution and feature rearrangement mechanisms to reduce information loss while lowering the feature map resolution.
8. The high-throughput densely adhered wheat grain target detection model based on LMA-DEIM according to claim 1, characterized in that: The model structure is deployed on an embedded device to achieve high-throughput, real-time grain detection and counting.
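For readers who want to trace the discretization and state-space recurrence recited in claims 4 to 6, the following is a minimal sketch and not the patented implementation. It assumes the standard zero-order hold formulas, discrete A = exp(ΔA) and discrete B = (ΔA)⁻¹(exp(ΔA) − I)ΔB, followed by the recurrence h_t = Ā_t h_{t−1} + B̄_t x_t and y_t = C_t h_t. It further assumes a diagonal A with nonzero (negative) entries, a single scalar input channel, and a zero initial state. The recurrence is evaluated sequentially rather than with a parallel scan, and B, C, and Δ are passed in directly instead of being produced by linear layers. The function names `zoh_discretize` and `selective_scan` are hypothetical.

```python
import math


def zoh_discretize(a, b, delta):
    """Zero-order hold for a diagonal A, stored as one nonzero scalar per state.

    a_bar_i = exp(delta * a_i)
    b_bar_i = (exp(delta * a_i) - 1) / (delta * a_i) * (delta * b_i)
    """
    a_bar = [math.exp(delta * ai) for ai in a]
    b_bar = [(abar_i - 1.0) / (delta * ai) * (delta * bi)
             for abar_i, ai, bi in zip(a_bar, a, b)]
    return a_bar, b_bar


def selective_scan(x, a, b_seq, c_seq, delta_seq):
    """Sequentially evaluate h_t = A_bar_t h_{t-1} + B_bar_t x_t, y_t = C_t h_t."""
    h = [0.0] * len(a)  # assumed zero initial hidden state
    ys = []
    for x_t, b_t, c_t, d_t in zip(x, b_seq, c_seq, delta_seq):
        # B, C and the step size may differ at every position (input-dependent).
        a_bar, b_bar = zoh_discretize(a, b_t, d_t)
        h = [ab * hi + bb * x_t for ab, hi, bb in zip(a_bar, h, b_bar)]
        ys.append(sum(ci * hi for ci, hi in zip(c_t, h)))
    return ys


# Toy example: two hidden states, an impulse input of length 3.
a = [-1.0, -2.0]
x = [1.0, 0.0, 0.0]
b_seq = [[1.0, 1.0]] * 3
c_seq = [[1.0, 1.0]] * 3
delta_seq = [0.1] * 3

y = selective_scan(x, a, b_seq, c_seq, delta_seq)
print([round(v, 4) for v in y])
```

With a negative diagonal A, the response to the impulse decays from one step to the next, which is the mechanism by which earlier sequence positions influence later ones at a cost linear in the sequence length.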