A multi-scale tea quality detection method and system based on improved DEIM

CN122265985A · Pending · Publication Date: 2026-06-23 · NANJING UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
NANJING UNIV OF POSTS & TELECOMM
Filing Date
2026-01-28
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing tea testing methods are inefficient and susceptible to subjective factors in determining the age of tea, and cannot effectively capture the subtle differences in characteristics of tea over time. Traditional methods also lack robustness in complex contexts.

Method used

A multi-scale tea quality detection method based on the improved DEIM framework is adopted. The enhanced upsampling convolution module EUCB_SC, the Inception-style depthwise separable convolution block IDWB, and the windmill-shaped convolution PSConv are introduced. Combined with the dense one-to-one matching Dense O2O training strategy and the geometric calibration matching perceptual loss function GCMAL, the number of model parameters and computational complexity are optimized, thereby improving the accuracy of tea quality detection.

Benefits of technology

It significantly improves the average accuracy and model inference efficiency of tea quality testing, can accurately identify the age and quality grade of tea, adapts to complex backgrounds and multi-target detection, and realizes intelligent quality control in the tea industry.

✦ Generated by Eureka AI based on patent content.

Figure CN122265985A_ABST

Abstract

The application discloses a multi-scale tea quality detection method and system based on an improved DEIM, in the field of computer vision. A tea image dataset is constructed, and a multi-scale tea quality detection network model based on an improved DEIM framework is built: an Inception-style depthwise separable convolution block IDWB is introduced into the backbone network (Backbone), an enhanced upsampling convolution module EUCB_SC is introduced into the upsampling module, and a pinwheel convolution module PSConv is introduced into the downsampling module. The model is trained on the dataset using a dense one-to-one matching training strategy and a geometric calibration matching perception loss function GCMAL. The application effectively improves the average precision of tea quality detection.

Description

Technical Field

[0001] This invention belongs to the field of computer vision, specifically the area of intelligent agriculture. More specifically, it relates to a method and system for grading and detecting the age of tea leaves based on deep learning.

Background Technology

[0002] As a crucial link in the tea industry chain, tea quality testing is facing increasing demands for precision and efficiency in quality assessment due to the rapid development of modern agricultural intelligence. Traditional manual testing methods are no longer sufficient to meet the needs of large-scale industrial production.

[0003] In the early stages of tea quality testing technology development, the main method relied on experienced tea experts to manually grade tea leaves by observing their appearance, color, aroma, and other sensory characteristics. While this method had a certain degree of accuracy, it suffered from high subjectivity, low efficiency, and high labor costs. With the development of machine vision technology, tea detection methods based on traditional image processing began to emerge, classifying tea leaves by extracting features such as color, texture, and shape. However, these methods had poor robustness in complex environments, and the feature extraction process required a significant amount of manual design.

[0004] In recent years, with the rise of deep learning technology, tea classification methods based on convolutional neural networks (CNNs) have made significant progress in tea variety identification and pest and disease detection. Traditional CNN-based methods perform well in tea image classification tasks, but they have limitations when handling complex backgrounds and multi-object detection. Subsequent object detection algorithms, such as the R-CNN series and YOLO series, have provided new solutions for tea detection; however, these methods mainly focus on tea variety classification and disease detection, lacking in-depth research on tea age, a key factor affecting tea quality.

[0005] The age of tea leaves (i.e., the number of days since picking) is a crucial factor determining their quality and commercial value. Freshly picked tea leaves (1-2 days old) have the best brewing quality, while their quality gradually declines over time. Traditional tea age assessment relies primarily on manual experience, which is not only inefficient but also susceptible to subjective influences, leading to inaccurate grading. Existing machine learning-based tea assessment methods mostly focus on variety identification and disease detection, lacking sufficient ability to model the temporal characteristic of tea age and failing to effectively capture the subtle differences in tea characteristics over time.

Summary of the Invention

[0006] Objective: This invention addresses the challenges of feature extraction due to the complex vein texture of tea leaves and the low computational efficiency caused by redundant parameters in traditional detectors in current tea quality detection methods. Based on the DEIM framework, it proposes a multi-scale tea quality detection method using an improved DEIM. By designing an enhanced upsampling convolution module EUCB_SC, an Inception-style depthwise separable convolution block IDWB, and a windmill-shaped convolution PSConv, the method significantly reduces the number of model parameters and computational complexity while maintaining detection accuracy. This method effectively improves the average accuracy of tea quality detection, showing a significant improvement over existing mainstream methods. Furthermore, the model is more compact, and inference efficiency is optimized, providing a superior technical solution for intelligent quality control in the tea industry.

[0007] Technical solution: To achieve the above objectives, the technical solution adopted by this invention is as follows. A multi-scale tea quality detection method based on improved DEIM includes the following steps: Step 1: Obtain images of tea leaves of different ages and quality grades, and create a dataset from them through image processing. Step 2: Construct a multi-scale tea quality detection network model based on the improved DEIM framework. Step 3: Train the multi-scale tea quality detection network model based on the improved DEIM framework on the dataset. During training, a dense one-to-one matching (Dense O2O) training strategy is introduced, and the original matching perception loss function is improved into the geometric calibration matching perception loss function GCMAL, which strengthens the optimization of low-quality matched samples and improves the consistency between classification confidence and localization quality. The model is thus optimized end-to-end, yielding the trained multi-scale tea quality detection network model based on the improved DEIM framework. Step 4: Use the trained model to perform quality detection and age identification on the tea leaves to be tested.

[0008] Preferably, the multi-scale tea quality detection network model based on the improved DEIM framework includes a feature extraction backbone network DEIM-Backbone, an encoder, and a decoder connected in sequence. The backbone network includes an initialization module Stem, a first feature extraction module HGStage1, a second feature extraction module HGStage2, a third feature extraction module HGStage3, and a C2f_IDWB module connected in sequence. The C2f_IDWB module is formed by embedding Inception-style depthwise separable convolution blocks IDWB in a C2f module. The IDWB block employs a branching strategy to proportionally allocate input channels to four branches: identity mapping, square convolution, horizontal stripe convolution, and vertical stripe convolution. The encoder's upsampling module introduces an enhanced upsampling convolution module EUCB_SC, which uses the Shift_channel_mix operation to perform channel segmentation and spatial cyclic shifting, enhancing cross-channel information interaction. The encoder's downsampling module introduces a windmill-shaped convolution module PSConv, which uses four asymmetric padding modes to generate multi-directional receptive fields, enhancing the capture of tea leaf vein texture and irregular edge features.

[0009] Preferably, the encoder includes a first convolutional layer, a second convolutional layer, a third convolutional layer, an upsampling module, and a downsampling module. The upsampling module includes a Transformer Layer with self-attention, a second convolutional layer, a first enhanced upsampling convolutional module EUCB_SC, a second connection layer, a first reparameterizable multi-branch aggregation module RepNCSPELAN, a second convolutional layer, a second enhanced upsampling convolutional module EUCB_SC, and a second connection layer, all connected in sequence. The downsampling module includes a second reparameterizable multi-branch aggregation module RepNCSPELAN, a first windmill-shaped convolutional module PSConv, a third connection layer, a third reparameterizable multi-branch aggregation module RepNCSPELAN, a second windmill-shaped convolutional module PSConv, a third connection layer, and a fourth reparameterizable multi-branch aggregation module RepNCSPELAN, all connected in sequence. The Inception-style depthwise separable convolutional block IDWB, the first convolutional layer 1, and the self-attention layer Transformer Layer are connected in sequence. The third feature extraction module HGStage3, the first convolutional layer 2, and the second convolutional layer 1 are connected in sequence. The second feature extraction module HGStage2, the first convolutional layer 3, and the second convolutional layer 2 are connected in sequence. The second convolutional layer 1 is connected to the third connection layer 2, and the second convolutional layer 2 is connected to the third connection layer 1. The decoder comprises N decoder layers connected in sequence, and adjacent decoder layers are connected by self-distillation.
The output of the flatten layer is connected to the input of the first decoder layer, and the output of the last decoder layer is connected to the score prediction head ScorePred and the bounding box prediction head BoxPred, respectively.

[0010] Preferably, the geometric calibration matching perception loss function GCMAL in step 2 is as follows:

[0011] where the terms denote, in order: the geometric calibration matching perception loss; the predicted confidence for the matched class; the geometry-aware matching degree; the sample label; a focusing index; the focusing indices for positive and negative samples, respectively; and the weighting coefficient of the calibration term.

[0012] Preferably, introducing the dense one-to-one matching (Dense O2O) training strategy in step 2 and improving the original matching perception loss function into the geometric calibration matching perception loss function GCMAL includes the following. Before the training images are input into the network, mosaic stitching and blending augmentation is applied to each mini-batch with a preset probability to increase the number of ground-truth targets in a single training image. The augmented image and its set of ground-truth bounding boxes are then input into the multi-scale tea quality detection network model based on the improved DEIM framework to obtain the class confidences and bounding-box regression results of N predicted queries. Next, a matching cost matrix between the predictions and the ground-truth targets is constructed, where each cost is a weighted average of the classification error and the localization error, and the optimal one-to-one matching set Ω is obtained with the Hungarian algorithm. Matched pairs in Ω serve as positive samples for computing the classification and regression losses, while unmatched predictions serve as background samples and participate only in classification supervision. Classification uses the matching-aware loss function MAL, whose expression is:

[0013] where the terms denote the matching-aware classification loss, the predicted confidence for the matched class, the intersection-over-union (IoU) between the predicted and ground-truth bounding boxes, the sample label, and an adjustment coefficient. For any matched pair in the one-to-one matching set Ω obtained by the Hungarian algorithm, consisting of a predicted bounding box and its corresponding ground-truth bounding box, the geometry-aware matching degree is defined as:

[0014] where the terms denote the IoU between the predicted and ground-truth bounding boxes, the Euclidean distance between their center points, the diagonal length of the smallest rectangle enclosing both boxes, a distance penalty coefficient, and a truncation function. A low-quality enhancement weight is then introduced:

[0015] where the exponent is a focusing index. A confidence-quality consistency calibration term is then introduced:

[0016] where the terms denote the weighting coefficient of the calibration term and the predicted confidence for the matched class. Combining these terms yields the geometric calibration matching perception loss function GCMAL.
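The formulas for MAL, the geometry-aware matching degree, the low-quality weight, the calibration term, and GCMAL itself are not preserved in this text, so the sketch below is a hypothetical reading rather than the patent's definition. `mal` follows the matching-aware loss published with DEIM; `geometry_aware_degree` assumes a DIoU-style form, IoU minus β·ρ²/c², clipped to [0, 1]; and the weight 1 + (1 - s)^γ and squared calibration term λ(p - s)² inside `gcmal` are assumptions chosen only to match the stated intent (stronger optimization of low-quality matches, confidence pulled toward localization quality). All function names and default values are illustrative.

```python
import math

EPS = 1e-7


def _clamp(p: float) -> float:
    """Keep probabilities away from 0 and 1 so that log() stays finite."""
    return min(max(p, EPS), 1.0 - EPS)


def mal(p: float, q: float, y: int, gamma: float = 1.5) -> float:
    """Matching-aware loss in the form published with DEIM.

    p: predicted confidence for the matched class; q: IoU with the ground truth;
    y: 1 for a matched (positive) query, 0 for a background query.
    """
    p = _clamp(p)
    if y == 1:
        t = q ** gamma  # soft target derived from localization quality
        return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return -(p ** gamma) * math.log(1.0 - p)


def _area(b) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def geometry_aware_degree(pred, gt, beta: float = 1.0) -> float:
    """ASSUMED DIoU-style degree: clip(IoU - beta * rho^2 / c^2, 0, 1).

    Boxes are (x1, y1, x2, y2); rho is the distance between box centers and
    c is the diagonal of the smallest rectangle enclosing both boxes.
    """
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    union = _area(pred) + _area(gt) - inter
    iou = inter / union if union > 0 else 0.0
    rho2 = ((pred[0] + pred[2] - gt[0] - gt[2]) / 2) ** 2 \
        + ((pred[1] + pred[3] - gt[1] - gt[3]) / 2) ** 2
    c2 = (max(pred[2], gt[2]) - min(pred[0], gt[0])) ** 2 \
        + (max(pred[3], gt[3]) - min(pred[1], gt[1])) ** 2
    s = iou - beta * rho2 / c2 if c2 > 0 else iou
    return min(max(s, 0.0), 1.0)  # truncation function


def gcmal(p, s, y, gamma=2.0, gamma_pos=1.5, gamma_neg=1.5, lam=0.5):
    """HYPOTHETICAL GCMAL: the weight and calibration forms are guesses."""
    p = _clamp(p)
    if y == 0:
        return -(p ** gamma_neg) * math.log(1.0 - p)
    t = s ** gamma_pos                 # target from the geometry-aware degree
    w = 1.0 + (1.0 - s) ** gamma       # up-weights low-quality matches
    bce = -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    calib = lam * (p - s) ** 2         # pulls confidence toward match quality
    return w * bce + calib
```

With these assumed defaults, a positive query whose box only loosely matches its target (small s) gets a weight close to 2, so its loss is amplified, while a well-localized query keeps a weight near 1.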

[0017] Preferably, the enhanced upsampling convolution module EUCB_SC in step 2 is constructed as follows. First, an upsampling operator enlarges the input feature map in the spatial dimensions. The result then passes through a 3×3 depthwise separable convolution that extracts local spatial features, followed by batch normalization and a nonlinear activation function for feature enhancement. Next, a shift operation explicitly models spatial neighborhood relationships, and a channel shuffle operation shuffles and reorganizes the feature representation along the channel dimension to promote cross-channel information interaction. Finally, a 1×1 convolution compresses the number of channels to match the next stage. The computation of EUCB_SC is as follows:

[0018] where the terms denote the output feature map of EUCB_SC, the feature map input to EUCB_SC, an upsampling operator with a scale factor of 2, a 3×3 depthwise separable convolution, batch normalization, the activation function, and a 1×1 convolution. The Inception-style depthwise separable convolution block IDWB in step 2 is constructed as follows: first, the input features are split into four branches along the channel dimension:

[0019] where the first subset forms the identity-mapping branch, whose channels are retained directly to reduce computational redundancy, and the other three subsets are the inputs to the 3×3, 1×k, and k×1 convolution branches, respectively. These three subsets are fed into three separate depthwise convolution branches:

[0020] where the three outputs are the features obtained by processing the corresponding input subsets with their depthwise convolution branches, k denotes the length of the strip convolution kernel, and the convolution operator with an m×n kernel is a depthwise separable convolution. The resulting features are concatenated along the channel dimension.

[0021] The concatenated output is then normalized and passed through an MLP built from 1×1 convolutions for cross-channel feature interaction:

[0022] where the terms denote the final output feature of the IDWB module, the residual connection, the multilayer perceptron implemented with 1×1 convolutions, and the normalization operation.
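The four-way channel split resembles the Inception depthwise convolution of InceptionNeXt. As a hedged illustration, assuming InceptionNeXt-style defaults (branch ratio 0.125, a 3×3 square kernel, and strip length k = 11), none of which the patent states, the sketch below computes the channel allocation and the number of depthwise kernel weights in the three convolution branches. The function names are hypothetical.

```python
def idwb_split(channels: int, branch_ratio: float = 0.125) -> tuple:
    """Return channel counts for (identity, square, horizontal strip, vertical strip)."""
    g = int(channels * branch_ratio)  # channels given to each convolution branch
    return channels - 3 * g, g, g, g


def idwb_depthwise_weights(channels: int, branch_ratio: float = 0.125,
                           square_k: int = 3, band_k: int = 11) -> int:
    """Count depthwise kernel weights of the three convolution branches (no biases).

    The identity branch has no weights; a depthwise m x n branch holds one
    m x n filter per channel, i.e. channels * m * n weights.
    """
    _, sq, horiz, vert = idwb_split(channels, branch_ratio)
    return sq * square_k * square_k + horiz * 1 * band_k + vert * band_k * 1


print(idwb_split(64))               # (40, 8, 8, 8)
print(idwb_depthwise_weights(64))   # 248
print(64 * 3 * 3)                   # 576 for a 3x3 depthwise conv on all 64 channels
```

Under these assumptions most channels pass through the identity branch untouched, which is where the parameter savings come from, while the 1×11 and 11×1 strips still give the block a long-range receptive field.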

[0023] Preferably, the windmill-shaped convolution module PSConv in step 2 is constructed as follows. Through asymmetric padding and directional convolution, PSConv achieves a larger effective receptive field than a standard convolution while keeping the parameter count low. Given an input feature map, a convolution stride, and a number of output channels, the input is first padded asymmetrically in four directions, and a 1×k or k×1 convolution is applied to each padded copy:

[0024] where the four terms are the output feature maps of the four branches obtained by asymmetric padding and convolution in four directions; SiLU denotes the Sigmoid Linear Unit activation; BN denotes batch normalization; the padded input is the input feature map padded with the corresponding numbers of pixels on the left, right, top, and bottom; the four kernels are the parameters of the four directional branches, the first two being 1×k strip convolution kernels and the last two k×1 strip convolution kernels, each branch having its own number of output channels; the padding amounts give the numbers of pixels added on the left, right, top, and bottom; and the operator denotes convolution. The concatenated convolution result is:

[0025] where the result is the feature map obtained by concatenating the output features of the four directional branches along the channel dimension, the two spatial terms are the height and width of the concatenated feature map, and the operator denotes concatenation. A 2×2 convolution then fuses the result to give the final output:

[0026] where the result is the final output feature map of the PSConv module; SiLU denotes the Sigmoid Linear Unit activation; BN denotes batch normalization; the 2×2 convolution kernel parameters fuse the concatenated four-branch features and set the number of output channels; and the three dimensions are the height, width, and number of channels of the final output feature map.
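The padding layout and the resulting feature-map shapes can be checked with simple arithmetic. The sketch below assumes the layout of the publicly released pinwheel-shaped convolution (PConv) code, which the patent does not spell out: paddings (k, 0, 1, 0) and (0, k, 0, 1) for the two 1×k branches, (0, 1, k, 0) and (1, 0, 0, k) for the two k×1 branches (in `torch.nn.ZeroPad2d` order: left, right, top, bottom), a quarter of the output channels per branch, and a final 2×2 fusion convolution with stride 1 and no padding. Function names are hypothetical.

```python
# Padding per branch, in torch.nn.ZeroPad2d order: (left, right, top, bottom).
def psconv_pads(k: int):
    return [(k, 0, 1, 0), (0, k, 0, 1), (0, 1, k, 0), (1, 0, 0, k)]


def conv_out(size: int, pad_total: int, kernel: int, stride: int) -> int:
    """Output size of an unpadded convolution applied to a pre-padded input."""
    return (size + pad_total - kernel) // stride + 1


def psconv_shapes(h: int, w: int, c_out: int, k: int = 3, s: int = 1):
    """Return (branch shapes, concatenated shape, final shape) as (H, W, C) tuples."""
    kernels = [(1, k), (1, k), (k, 1), (k, 1)]  # (kernel_h, kernel_w): two 1xk, two kx1
    branches = []
    for (left, right, top, bottom), (kh, kw) in zip(psconv_pads(k), kernels):
        bh = conv_out(h, top + bottom, kh, s)
        bw = conv_out(w, left + right, kw, s)
        branches.append((bh, bw, c_out // 4))  # each branch gives a quarter of c_out
    # Channel concatenation requires identical spatial sizes in all branches.
    assert len({b[:2] for b in branches}) == 1
    cat_h, cat_w = branches[0][:2]
    concat = (cat_h, cat_w, sum(b[2] for b in branches))
    # Fusion: 2x2 convolution, stride 1, no padding, shrinks each side by one.
    final = (conv_out(cat_h, 0, 2, 1), conv_out(cat_w, 0, 2, 1), c_out)
    return branches, concat, final


print(psconv_shapes(32, 32, 64, k=3, s=1)[2])  # (32, 32, 64)
print(psconv_shapes(32, 32, 64, k=3, s=2)[2])  # (16, 16, 64)
```

The asymmetric padding makes every branch one pixel larger than a stride-aligned output, and the 2×2 fusion convolution removes exactly that pixel, so with stride 2 the module halves the resolution as a downsampling layer should.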

[0027] Preferably, creating a dataset from the obtained tea images through image processing in step 1 includes: Step 1.1: Remove blurry, overexposed, and underexposed images from the collected tea leaf pictures; Step 1.2: Apply data augmentation to each image, including random cropping, horizontal flipping, vertical flipping, random rotation, color jittering, and adding Gaussian noise; Step 1.3: Use annotation software to annotate the collected images in COCO format, ensuring that each bounding box accurately matches the outline of the tea leaves, and annotate the age category and quality grade of the tea leaves; Step 1.4: After labeling, divide the dataset into training, validation, and test sets according to preset proportions.

[0028] Preferably, the dense one-to-one matching training strategy and the geometric calibration matching perception loss function GCMAL are applied as follows. The Dense O2O matching mechanism uses data augmentation to increase the number of positive samples in a single image. At the same time, the original matching perception loss function is improved into GCMAL to strengthen the optimization of low-quality matched samples and improve the consistency between classification confidence and localization quality. This improves the model's ability to distinguish matches of different quality during training, speeds up convergence, and improves the accuracy of tea quality detection.
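The one-to-one matching step described in [0012] can be illustrated with a small, dependency-free sketch. It finds the minimum-cost assignment of ground-truth targets to predicted queries by exhaustive search, which is only practical for tiny inputs; in practice a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` returns the same optimum. The cost (one minus the true-class probability plus an L1 box distance) and its weights are hypothetical stand-ins for the patent's weighted classification and localization cost. Mosaic augmentation only enlarges `targets`, which is how Dense O2O raises the number of positives per image while keeping each target matched to exactly one query.

```python
from itertools import permutations

# Illustrative cost weights (hypothetical, not taken from the patent).
W_CLS, W_BOX = 2.0, 5.0


def match_cost(query, target):
    """Weighted classification + localization cost for one (query, target) pair.

    query = (class_probabilities, box); target = (class_id, box);
    boxes are (cx, cy, w, h) normalized to [0, 1].
    """
    probs, box = query
    cls, gt_box = target
    cls_err = 1.0 - probs[cls]  # low probability for the true class costs more
    loc_err = sum(abs(a - b) for a, b in zip(box, gt_box))  # L1 box distance
    return W_CLS * cls_err + W_BOX * loc_err


def one_to_one_match(queries, targets):
    """Minimum-cost one-to-one assignment by exhaustive search (tiny inputs only).

    Returns matched (query_index, target_index) pairs and background query indices.
    """
    best_total, best_pairs = float("inf"), []
    for chosen in permutations(range(len(queries)), len(targets)):
        total = sum(match_cost(queries[q], targets[t]) for t, q in enumerate(chosen))
        if total < best_total:
            best_total = total
            best_pairs = [(q, t) for t, q in enumerate(chosen)]
    matched = {q for q, _ in best_pairs}
    background = [i for i in range(len(queries)) if i not in matched]
    return best_pairs, background


queries = [
    ([0.9, 0.1], (0.2, 0.2, 0.1, 0.1)),
    ([0.2, 0.8], (0.7, 0.7, 0.2, 0.2)),
    ([0.5, 0.5], (0.5, 0.5, 0.5, 0.5)),
]
targets = [(1, (0.7, 0.7, 0.2, 0.2)), (0, (0.2, 0.2, 0.1, 0.1))]
pairs, background = one_to_one_match(queries, targets)
print(sorted(pairs), background)  # [(0, 1), (1, 0)] [2]
```

Matched queries become positives for both classification and box regression; query 2 is left unmatched and only receives background classification supervision.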

[0029] Another objective of this invention is to provide a multi-scale tea quality detection system based on improved DEIM, used to implement the multi-scale tea quality detection method based on improved DEIM, comprising an input unit, an image processing unit, a multi-scale tea quality detection network model unit based on the improved DEIM framework, a training unit, a detection unit, and an output unit, wherein: the input unit receives images of tea leaves of different ages and quality grades and is also used to input images of the tea leaves to be tested.

[0030] The image processing unit is used to create a dataset from the obtained tea images through image processing.

[0031] The multi-scale tea quality detection network model unit based on the improved DEIM framework is used to construct a multi-scale tea quality detection network model based on the improved DEIM framework. This model includes a feature extraction backbone network DEIM-Backbone, an encoder, and a decoder connected sequentially. The backbone network introduces Inception-style depthwise separable convolutional blocks (IDWB), employing a branching strategy to proportionally allocate input channels to four branches: identity mapping, square convolution, horizontal stripe convolution, and vertical stripe convolution. The encoder's upsampling module introduces an enhanced upsampling convolution module EUCB_SC, which uses the Shift_channel_mix operation to achieve channel segmentation and spatial cyclic shifting, enhancing cross-channel information interaction capabilities. The encoder's downsampling module introduces a windmill-shaped convolution module PSConv, utilizing four asymmetric padding modes to generate multi-directional receptive fields, enhancing the ability to capture tea leaf vein texture and irregular edge features.

[0032] The training unit is used to obtain a trained multi-scale tea quality detection network model based on the improved DEIM framework by introducing a dense one-to-one matching training strategy and the geometric calibration matching perceptual loss function GCMAL based on the dataset.

[0033] The detection unit uses the trained multi-scale tea quality detection network model based on the improved DEIM framework to perform quality detection and age recognition on images of the tea leaves to be detected.

[0034] The output unit is used to output the detected quality and age.

[0035] Compared with the prior art, the present invention has the following advantages: 1. This invention improves the DEIM framework for tea quality detection by introducing the EUCB_SC enhanced upsampling convolution module, the IDWB multi-branch feature extraction module, and the PSConv windmill-shaped convolution module. These modules integrate channel-shuffled spatially aware features, multi-scale depthwise separable features, and multi-directional asymmetric receptive field features, achieving efficient extraction and utilization of multi-scale information. Combined with the dense one-to-one matching strategy and the improved matching perception loss function, these changes significantly reduce the number of model parameters while further improving the ability to detect subtle quality differences in tea.

[0036] 2. The improved DEIM framework can effectively cope with complex texture interference (such as interlaced leaf veins and changes in surface texture), morphological differences between tea varieties, and real-time detection requirements in tea quality inspection, enabling accurate tea age identification and quality grade classification. Through its lightweight design, it balances accuracy and efficiency, which is of practical significance for intelligent quality control in the tea industry and automated sorting in tea processing.

Attached Figure Description

[0037] Figure 1 is a flowchart of the multi-scale tea quality detection method based on improved DEIM provided by the present invention.

[0038] Figure 2 is an overall architecture diagram of the improved DEIM framework provided by this invention.

[0039] Figure 3 is a diagram of the EUCB_SC module added to the DEIM framework.

[0040] Figure 4 is a diagram illustrating the Shift_channel_mix operation added to the DEIM framework.

[0041] Figure 5 is a diagram of the C2f_IDWB module added to the DEIM framework.

[0042] Figure 6 is a diagram of the PSConv module added to the DEIM framework.

Detailed Implementation

[0043] The present invention will be further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the invention. After reading this invention, any modifications of the invention in various equivalent forms by those skilled in the art will fall within the scope defined by the appended claims.

[0044] Example 1 This embodiment provides a multi-scale tea quality detection method based on improved DEIM. It constructs a standardized image dataset covering various tea ages and quality grades to ensure the diversity and representativeness of the training samples. An enhanced upsampling convolution block, EUCB_SC, is designed, which uses the Shift_channel_mix operation to perform channel segmentation and spatial cyclic shifting, enhancing cross-channel information interaction. An Inception-style depthwise separable convolution block, IDWB, is proposed, whose core consists of Inception depthwise convolution and a Conv-MLP. The block has four branches: identity mapping, square convolution, horizontal strip convolution, and vertical strip convolution, which significantly reduces the number of parameters while improving multi-scale feature extraction. A windmill-shaped convolution, PSConv, is introduced, which uses four asymmetric padding modes to generate multi-directional receptive fields, enhancing the capture of tea leaf vein texture and irregular edge features. The method integrates these three modules into an improved real-time Transformer detection framework and adopts a dense one-to-one matching training strategy and the geometric calibration matching perception loss function GCMAL to achieve end-to-end tea quality detection and classification, outputting tea age recognition results and quality grade assessments. As shown in Figures 1-6, the specific steps are as follows. Step 1: Obtain images of tea leaves of different ages and quality grades, and process the images to create a dataset.

[0045] Images of tea leaves of different ages and quality grades were collected through on-site photography. These images were then processed to create a dataset containing various tea age and quality grade classifications.

[0046] The specific method for collecting data through on-site photography is as follows: images of tea leaves are taken at different times, under different lighting conditions, and from different angles in the tea garden to ensure that the samples cover tea leaves of different varieties, different growth stages, and different environmental conditions, so as to improve the diversity and representativeness of the data and avoid model overfitting.

[0047] Methods for creating a dataset from the obtained tea images through image processing include: Step 1.1: Remove images that do not meet the standards, such as blurry, overexposed, or underexposed images, from the collected images. Step 1.2: Apply data augmentation to each image, such as random cropping, horizontal flipping, vertical flipping, random rotation, color jittering, and adding Gaussian noise.

[0048] Step 1.3: Use annotation software to annotate the collected images in COCO format to ensure that the bounding box accurately matches the outline of the tea leaves, and annotate the age category and quality grade of the tea leaves.
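A COCO-format annotation file stores `images`, `annotations`, and `categories` lists, with each box given as `[x_min, y_min, width, height]` in pixels. The patent does not say how age category and quality grade are encoded, so the sketch below assumes one hypothetical COCO category per (age, grade) pair; the category names, file name, and coordinates are illustrative placeholders, not data from the patent.

```python
import json

# HYPOTHETICAL label scheme: one COCO category per (age, quality grade) pair.
categories = [
    {"id": 1, "name": "age_1-2d_grade_A", "supercategory": "tea"},
    {"id": 2, "name": "age_3-4d_grade_B", "supercategory": "tea"},
]


def coco_annotation(ann_id, image_id, category_id, x, y, w, h):
    """One COCO object annotation; bbox is [x_min, y_min, width, height] in pixels."""
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x, y, w, h],
        "area": w * h,  # box area used as the object area (no segmentation mask)
        "iscrowd": 0,
    }


dataset = {
    "images": [{"id": 1, "file_name": "tea_0001.jpg", "width": 1920, "height": 1080}],
    "annotations": [coco_annotation(1, 1, 1, 412.0, 230.5, 96.0, 140.0)],
    "categories": categories,
}
text = json.dumps(dataset, indent=2)  # contents of a COCO-style annotation file
```

Encoding both attributes in a single category keeps the file compatible with standard COCO tooling; a two-head design would instead need an extra, non-standard field per annotation.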

[0049] Step 1.4: After labeling, the dataset is divided into training set, validation set and test set in a ratio of 7:2:1.
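A reproducible 7:2:1 split can be sketched as follows. The function name, the fixed random seed, and the choice to let the test set absorb rounding remainders are illustrative assumptions rather than details given in the patent.

```python
import random


def split_dataset(items, ratios=(7, 2, 1), seed=0):
    """Shuffle reproducibly and split into train/val/test by the given ratios.

    Rounding remainders go to the test set.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed -> same split on every run
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))  # 70 20 10
```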

[0050] Step 2: Construct a multi-scale tea quality detection network model based on the improved DEIM framework. As shown in Figure 2, the proposed model incorporates three core modules: the enhanced upsampling convolution module EUCB_SC, the Inception-style depthwise separable convolution block IDWB, and the windmill-shaped convolution PSConv. These modules fuse channel-mixed spatially aware features, multi-scale depthwise separable features, and multi-directional asymmetric receptive field features, achieving efficient extraction and utilization of multi-scale information. Furthermore, by introducing a dense one-to-one matching training strategy and the geometric calibration matching perception loss function GCMAL, the model improves its ability to detect subtle quality differences in tea leaves, enabling it to handle tea leaves of different ages and quality grades simultaneously. The lightweight design significantly reduces the number of parameters, improving the model's performance and efficiency. In the upsampling module, the enhanced upsampling convolution module EUCB_SC is introduced; it uses Shift_channel_mix to perform channel segmentation and spatial cyclic shifting, enhancing cross-channel information interaction. In the feature extraction stage, the Inception-style depthwise separable convolution block IDWB is introduced; its branching strategy achieves multi-scale feature extraction, significantly reducing the number of parameters while improving feature representation. In the downsampling module, the windmill-shaped convolution module PSConv is introduced; its asymmetric padding strategy generates multi-directional receptive fields, effectively capturing tea leaf vein texture and irregular edge features.
Finally, a dense one-to-one matching training strategy and a geometrically calibrated matching perceptual loss function, GCMAL, are introduced to further enhance the model's ability to detect subtle quality differences in tea leaves.

[0051] The multi-scale tea quality detection network model based on the improved DEIM framework includes a feature extraction backbone network DEIM-Backbone, an encoder, and a decoder connected sequentially. The backbone network introduces an Inception-style depthwise separable convolutional block (IDWB), employing a branching strategy to proportionally allocate input channels to four branches: identity mapping, square convolution, horizontal stripe convolution, and vertical stripe convolution. The encoder's upsampling module introduces an enhanced upsampling convolutional module EUCB_SC, which uses the Shift_channel_mix operation to achieve channel segmentation and spatial cyclic shifting, enhancing cross-channel information interaction. The encoder's downsampling module introduces a windmill-shaped convolutional module PSConv, utilizing four asymmetric padding modes to generate multi-directional receptive fields, enhancing the capture of tea leaf vein texture and irregular edge features.

[0052] The backbone network includes an initialization module Stem, a first feature extraction module HGStage1, a second feature extraction module HGStage2, a third feature extraction module HGStage3, and a C2f_IDWB module connected in sequence. The C2f_IDWB module is formed by embedding Inception-style depthwise separable convolutional blocks IDWB in a C2f module.

[0053] The encoder includes a first convolutional layer, a second convolutional layer, a third convolutional layer, an upsampling module, and a downsampling module. The upsampling module comprises, in sequence, a Transformer Layer (self-attention layer), a second convolutional layer, a first enhanced upsampling convolutional module (EUCB_SC), a second connection layer, a first reparameterizable multi-branch aggregation module (RepNCSPELAN), a second convolutional layer, a second enhanced upsampling convolutional module (EUCB_SC), and a second connection layer. The downsampling module comprises, in sequence, a second reparameterizable multi-branch aggregation module (RepNCSPELAN), a first windmill-shaped convolutional module (PSConv), a third connection layer, a third reparameterizable multi-branch aggregation module (RepNCSPELAN), a second windmill-shaped convolutional module (PSConv), a third connection layer, and a fourth reparameterizable multi-branch aggregation module (RepNCSPELAN).

[0054] The Inception-style depthwise separable convolutional block (IDWB), the first convolutional layer 1, and the self-attention layer (Transformer Layer) are connected in sequence. The third feature extraction module (HGStage3), the first convolutional layer 2, and the second convolutional layer 1 are connected in sequence. The second feature extraction module (HGStage2), the first convolutional layer 3, and the second convolutional layer 2 are connected in sequence. The second convolutional layer 1 is connected to the third connection layer 2, and the second convolutional layer 2 is connected to the third connection layer 1.

[0055] The Enhanced Upsampling Convolutional Module (EUCB_SC) combines upsampling, depthwise separable convolution, batch normalization, shift operation, channel shuffle, and 1×1 convolution to enhance feature representation. Specifically, the operation is as follows: First, the input feature map is enlarged using an upsampling operator; then, local features are extracted using depthwise separable convolution, and enhanced by batch normalization and the ReLU activation function; next, a shift operation is introduced to model spatial neighborhood relationships; then, a channel shuffle operation is used to shuffle and reassemble channel information; finally, a 1×1 convolution is used to compress the number of channels to match the next stage.

[0056] The IDWB module uses a branched depthwise convolution strategy, which works as follows. First, the input feature map is divided into four branches along the channel dimension. One branch is an identity-mapping branch that directly retains part of the channels to reduce computational redundancy; the other three branches are fed into convolution branches of different sizes. Each branch extracts features with its own convolution, and the branch outputs are concatenated along the channel dimension to form a new feature map. This feature map is normalized and then processed by a multilayer perceptron built from 1×1 convolutions for cross-channel feature interaction, and the result is added to the original input features through a residual connection to improve information flow. To improve training stability, an adjustment parameter is added before the multilayer perceptron. By combining partial-channel convolution with a multi-branch structure, the IDWB module effectively expands the receptive field while significantly reducing the computational and memory overhead of large-kernel depthwise convolutions.

[0057] The PSConv windmill-shaped convolutional module employs an asymmetric padding strategy, operating as follows: This module achieves high efficiency in parameter count while expanding the effective receptive field through asymmetric padding and directional convolution. First, the input feature map undergoes asymmetric padding in four directions, with each direction's convolution processed using kernels of different sizes. Each direction's convolution operation incorporates batch normalization and a non-linear activation function to enhance feature representation. Then, the convolution results from the four directions are concatenated along the channel dimension to obtain a new feature map. Next, a 2×2 convolution is used for feature fusion to obtain the final output. PSConv, through its windmill-shaped directional convolutional structure, significantly expands the receptive field while maintaining a low parameter count and enhances the response capability to small infrared targets, making it highly suitable for multi-scale tea quality detection tasks.

[0058] The enhanced upsampling convolutional module EUCB_SC is constructed as follows. First, the upsampling operator $\mathrm{Up}_2(\cdot)$ enlarges the input feature map in the spatial dimensions. The result then passes through a 3×3 depthwise separable convolution $\mathrm{DWC}_{3\times3}(\cdot)$ that extracts local spatial features, and batch normalization $\mathrm{BN}(\cdot)$ followed by the nonlinear activation function $\delta(\cdot)$ enhances these features. Next, an additional shift operation (comprising horizontal and vertical offsets) explicitly models spatial neighborhood relationships. A channel shuffle operation then shuffles and reorganizes the feature representation at the channel level, promoting cross-channel information interaction. Finally, a 1×1 convolution $\mathrm{C}_{1\times1}(\cdot)$ compresses the number of channels to match the next stage.

[0059] Compared to the original EUCB, EUCB_SC achieves lightweight spatial-channel hybrid enhancement through a joint design of Shift and Channel Shuffle. The Shift operation compensates for the shortcomings of Channel Shuffle in spatial information modeling, while Channel Shuffle improves the interactivity of features between channels. The combination of the two further enhances feature representation capabilities while maintaining low computational overhead. The computation process of the enhanced upsampling convolution module EUCB_SC is as follows:

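$$Y = \mathrm{C}_{1\times1}\Big(\mathrm{CS}\big(\mathrm{Shift}\big(\delta\big(\mathrm{BN}(\mathrm{DWC}_{3\times3}(\mathrm{Up}_2(X)))\big)\big)\big)\Big)$$

[0060] Here, $Y$ denotes the output feature map of the enhanced upsampling convolutional module EUCB_SC; $X$ denotes the feature map input to EUCB_SC; $\mathrm{Up}_2(\cdot)$ denotes an upsampling operator with a scaling factor of 2; $\mathrm{DWC}_{3\times3}(\cdot)$ denotes a 3×3 depthwise separable convolution; $\mathrm{BN}(\cdot)$ denotes batch normalization; $\delta(\cdot)$ denotes the activation function; $\mathrm{Shift}(\cdot)$ and $\mathrm{CS}(\cdot)$ denote the shift and channel shuffle operations described in [0058]; and $\mathrm{C}_{1\times1}(\cdot)$ denotes a 1×1 convolution used for channel compression.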
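The EUCB_SC data flow can be illustrated with a forward-only NumPy sketch. It is a sketch under stated assumptions, not the authors' implementation. Random weights stand in for learned ones. Nearest-neighbour upsampling is assumed for the upsampling operator and ReLU for the activation. A per-channel standardisation replaces batch normalization at batch size 1. The shift is modelled as a hypothetical four-group cyclic roll by one pixel (right, left, down, up), which is one reading of the "channel segmentation and spatial cyclic shift" attributed to Shift_channel_mix. All function names are illustrative.

```python
import numpy as np

def upsample_nearest2x(x):
    # Assumed Up_2: nearest-neighbour upsampling, (C, H, W) -> (C, 2H, 2W)
    return x.repeat(2, axis=1).repeat(2, axis=2)

def depthwise_conv3x3(x, w):
    # One 3x3 filter per channel (w: (C, 3, 3)), zero padding 1, stride 1
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += w[:, i, j][:, None, None] * xp[:, i:i + h, j:j + wd]
    return out

def channel_norm(x, eps=1e-5):
    # Stand-in for BN at batch size 1: per-channel standardisation, no learned affine
    mean = x.mean(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(x.var(axis=(1, 2), keepdims=True) + eps)

def shift_channel_mix(x):
    # Hypothetical shift: split channels into 4 groups and roll each group cyclically
    # by one pixel (right, left, down, up) so each position sees a neighbour's values
    groups = np.array_split(x, 4, axis=0)
    moves = [(1, 2), (-1, 2), (1, 1), (-1, 1)]  # (offset, axis)
    return np.concatenate([np.roll(g, k, axis=a) for g, (k, a) in zip(groups, moves)], axis=0)

def channel_shuffle(x, groups):
    # (C, H, W) -> (groups, C/groups, H, W) -> swap first two axes -> (C, H, W)
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

def conv1x1(x, w):
    # Pointwise convolution for channel compression, w: (C_out, C_in)
    return np.einsum("oc,chw->ohw", w, x)

def eucb_sc(x, dw_w, pw_w, groups=2):
    y = depthwise_conv3x3(upsample_nearest2x(x), dw_w)
    y = np.maximum(channel_norm(y), 0.0)  # BN stand-in followed by ReLU (assumed activation)
    y = channel_shuffle(shift_channel_mix(y), groups)
    return conv1x1(y, pw_w)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
out = eucb_sc(x, rng.standard_normal((8, 3, 3)), rng.standard_normal((4, 8)))
print(out.shape)  # (4, 32, 32)
```

The shift and shuffle steps only move values around. In this sketch they add no parameters, and the learnable cost stays in the depthwise and 1×1 convolutions.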

[0061] Inception-style depthwise separable convolutional block (IDWB). The IDWB is constructed as follows. First, the input feature $X$ is split into four branches along the channel dimension:

$$X_{id},\ X_{hw},\ X_{w},\ X_{h} = \mathrm{Split}(X)$$

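[0062] Here, $X_{id}$ is the identity-mapping branch, which directly retains part of the channels to reduce computational redundancy. $X_{hw}$, $X_{w}$ and $X_{h}$ are the feature subsets fed into the 3×3, 1×$k_b$ and $k_b$×1 convolutional branches, respectively. These subsets are processed by three different depthwise convolutional branches:

$$X'_{hw} = \mathrm{DWConv}_{3\times3}(X_{hw}),\quad X'_{w} = \mathrm{DWConv}_{1\times k_b}(X_{w}),\quad X'_{h} = \mathrm{DWConv}_{k_b\times1}(X_{h})$$

$$X' = \mathrm{Concat}\big(X_{id},\ X'_{hw},\ X'_{w},\ X'_{h}\big)$$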

[0063] Here, $X'_{hw}$, $X'_{w}$ and $X'_{h}$ are the output features obtained by processing the input subsets $X_{hw}$, $X_{w}$ and $X_{h}$ with the corresponding depthwise convolutional branch. $k_b$ denotes the length of the stripe convolution kernel, and $\mathrm{DWConv}_{m\times n}(\cdot)$ denotes a depthwise separable convolution with kernel size m×n. The resulting features are concatenated along the channel dimension to form $X'$.

[0064] The output $X'$ is then normalized and passed through an MLP built from 1×1 convolutions for cross-channel feature interaction:

$$Y = X + \mathrm{MLP}\big(\mathrm{Norm}(X')\big)$$

[0065] Here, $Y$ denotes the final output feature of the IDWB module; the addition of $X$ denotes the residual connection; $\mathrm{MLP}(\cdot)$ denotes a multilayer perceptron implemented with 1×1 convolutions; and $\mathrm{Norm}(\cdot)$ denotes the normalization operation.

[0066] By combining partial-channel convolution with Inception-style multi-branch decomposition, IDWB effectively expands the receptive field while significantly reducing the computational and memory-access overhead of large-kernel depthwise convolution, achieving a better trade-off between speed and accuracy.
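The IDWB forward pass in [0061] to [0065] can be sketched the same way in NumPy. Several values are illustrative assumptions, not values stated in this document:

- the branch ratio (1/8 of the channels per convolutional branch);
- the square kernel size 3;
- the stripe length $k_b = 11$;
- the MLP expansion factor 4;
- ReLU inside the MLP;
- the per-channel standardisation used for Norm.

The adjustment parameter mentioned in [0056] is omitted.

```python
import numpy as np

def dwconv(x, w):
    # Depthwise conv: per-channel kernels w of shape (C, kh, kw), odd kh and kw,
    # zero "same" padding, stride 1
    _, h, wd = x.shape
    kh, kw = w.shape[1:]
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            out += w[:, i, j][:, None, None] * xp[:, i:i + h, j:j + wd]
    return out

def channel_norm(x, eps=1e-5):
    # Stand-in for Norm(.): per-channel standardisation without learned affine
    mean = x.mean(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(x.var(axis=(1, 2), keepdims=True) + eps)

def make_params(c, rng, branch_ratio=0.125, ks=3, kb=11, expansion=4):
    # All sizes here are illustrative assumptions
    gc = int(c * branch_ratio)  # channels given to each convolutional branch
    return {
        "w_hw": rng.standard_normal((gc, ks, ks)) * 0.1,       # square ks x ks kernel
        "w_w": rng.standard_normal((gc, 1, kb)) * 0.1,         # horizontal 1 x kb stripe
        "w_h": rng.standard_normal((gc, kb, 1)) * 0.1,         # vertical kb x 1 stripe
        "fc1": rng.standard_normal((expansion * c, c)) * 0.1,  # 1x1 conv, expand
        "fc2": rng.standard_normal((c, expansion * c)) * 0.1,  # 1x1 conv, project back
    }

def idwb(x, p):
    c = x.shape[0]
    gc = p["w_hw"].shape[0]
    # Split(X): identity channels first, then one subset per convolutional branch
    x_id, x_hw, x_w, x_h = np.split(x, [c - 3 * gc, c - 2 * gc, c - gc], axis=0)
    x_cat = np.concatenate(
        [x_id, dwconv(x_hw, p["w_hw"]), dwconv(x_w, p["w_w"]), dwconv(x_h, p["w_h"])], axis=0
    )
    hidden = np.maximum(np.einsum("oc,chw->ohw", p["fc1"], channel_norm(x_cat)), 0.0)
    return x + np.einsum("oc,chw->ohw", p["fc2"], hidden)  # Y = X + MLP(Norm(X'))

rng = np.random.default_rng(0)
params = make_params(32, rng)
y = idwb(rng.standard_normal((32, 20, 20)), params)
print(y.shape)  # (32, 20, 20)
# Depthwise weights of the three branches vs. one 11x11 depthwise kernel on all 32 channels
print(sum(params[k].size for k in ("w_hw", "w_w", "w_h")), 11 * 11 * 32)  # 124 3872
```

The last line shows why the stripe decomposition is cheap under these assumed sizes. The three depthwise branches need 124 weights, whereas a single 11×11 depthwise kernel over all 32 channels would need 3,872.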

[0067] The windmill-shaped convolutional module PSConv is constructed as follows. Through asymmetric padding and directional convolution, PSConv obtains a larger effective receptive field than standard convolution while remaining parameter-efficient. Its calculation process is as follows. Let the input feature map be $X^{(h_1, w_1, c_1)}$, the convolution stride be $s$, and the number of output channels be $c_2$. First, the input is asymmetrically padded in four directions, and a 1×k or k×1 convolution is applied in each direction:

$$X_i^{(h', w', c')} = \mathrm{SiLU}\Big(\mathrm{BN}\big(X_{P(l_i, r_i, t_i, b_i)}^{(h_1, w_1, c_1)} \circledast W_i\big)\Big),\quad i = 1, 2, 3, 4$$

[0068] Here, $X_1$, $X_2$, $X_3$ and $X_4$ are the four branch output feature maps obtained by asymmetrically padding the input $X$ in four directions and convolving. $\mathrm{SiLU}(\cdot)$ is the Sigmoid Linear Unit activation function, and $\mathrm{BN}(\cdot)$ is batch normalization. $X_{P(l,r,t,b)}$ denotes the input feature map padded with $l$, $r$, $t$ and $b$ pixels on the left, right, top and bottom, respectively. $W_1$, $W_2$, $W_3$ and $W_4$ are the kernel parameters of the four directional branches: the first two are 1×k stripe kernels and the latter two are k×1 stripe kernels. $c'$ is the number of output channels of each branch, and $\circledast$ is the convolution operation. The concatenated convolution result is:

$$X' = \mathrm{Cat}(X_1, X_2, X_3, X_4) \in \mathbb{R}^{h'\times w'\times 4c'}$$

[0069] Here, $X'$ is the feature map obtained by concatenating the outputs of the four directional branches along the channel dimension, $h'$ and $w'$ are the height and width of the concatenated feature map, and $\mathrm{Cat}(\cdot)$ is the concatenation operation. A 2×2 convolution then fuses the result to yield the final output:

$$Y^{(h_2, w_2, c_2)} = \mathrm{SiLU}\Big(\mathrm{BN}\big(X' \circledast W^{(2,2,c_2)}\big)\Big)$$

[0070] Here, $Y$ is the final output feature map of the PSConv module; $\mathrm{SiLU}(\cdot)$ is the Sigmoid Linear Unit activation function; and $\mathrm{BN}(\cdot)$ is batch normalization. $W^{(2,2,c_2)}$ denotes the 2×2 convolution kernel parameters used to fuse the concatenated four-branch features, with $c_2$ output channels. $h_2$, $w_2$ and $c_2$ are the height, width and number of channels of the final output feature map.
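A shape-level NumPy sketch of PSConv follows. The document does not state the padding amounts. The per-branch padding P(l, r, t, b) used here is an assumption, chosen so that all four branches produce maps of the same size h' × w'. With that choice, the 2×2 fusion convolution returns an output of size H/s × W/s for the even sizes used below. Following [0068], each branch has its own kernel (two 1×k, two k×1). Further assumptions are c' = c2/4 channels per branch, random weights, and per-channel standardisation in place of batch normalization. Function names are illustrative.

```python
import numpy as np

def conv2d(x, w, stride=1):
    # Valid (unpadded) convolution: x (C_in, H, W), w (C_out, C_in, kh, kw)
    _, _, kh, kw = w.shape
    h_out = (x.shape[1] - kh) // stride + 1
    w_out = (x.shape[2] - kw) // stride + 1
    out = np.zeros((w.shape[0], h_out, w_out))
    for i in range(kh):
        for j in range(kw):
            patch = x[:, i:i + stride * h_out:stride, j:j + stride * w_out:stride]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out

def bn_silu(x, eps=1e-5):
    # BN stand-in (per-channel standardisation, batch size 1), then SiLU(x) = x * sigmoid(x)
    x = (x - x.mean(axis=(1, 2), keepdims=True)) / np.sqrt(x.var(axis=(1, 2), keepdims=True) + eps)
    return x / (1.0 + np.exp(-x))

def pad(x, left, right, top, bottom):
    return np.pad(x, ((0, 0), (top, bottom), (left, right)))

def psconv(x, ws, w_fuse, k=3, stride=1):
    # Assumed P(l, r, t, b) per branch; ws[0], ws[1] are 1 x k, ws[2], ws[3] are k x 1
    pads = [(k, 0, 1, 0), (0, k, 0, 1), (0, 1, k, 0), (1, 0, 0, k)]
    branches = [bn_silu(conv2d(pad(x, *p), w, stride)) for p, w in zip(pads, ws)]
    x_cat = np.concatenate(branches, axis=0)         # X' = Cat(X1, X2, X3, X4)
    return bn_silu(conv2d(x_cat, w_fuse, stride=1))  # 2x2 fusion convolution

rng = np.random.default_rng(0)
c1, c2, k = 4, 8, 3
cb = c2 // 4  # c': channels per branch (assumed c2 / 4)
ws = [rng.standard_normal((cb, c1, 1, k)) for _ in range(2)] + \
     [rng.standard_normal((cb, c1, k, 1)) for _ in range(2)]
w_fuse = rng.standard_normal((c2, 4 * cb, 2, 2))
x = rng.standard_normal((c1, 8, 8))
print(psconv(x, ws, w_fuse, k=k, stride=1).shape)  # (8, 8, 8)
print(psconv(x, ws, w_fuse, k=k, stride=2).shape)  # (8, 4, 4)
```

Each branch pads one side by k and the perpendicular side by 1, so every branch output is one pixel larger than the stride-s target in each dimension. The 2×2 fusion convolution removes that extra row and column.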

[0071] The windmill-shaped convolution module PSConv, through its windmill-shaped directional convolution structure, significantly expands the receptive field while maintaining a low parameter count and enhances the center response capability to small infrared targets, making it suitable for multi-scale tea quality detection tasks.

[0072] The decoder comprises N decoder layers connected in sequence, with adjacent decoder layers linked by self-distillation. The output of the flatten layer is connected to the input of the first decoder layer. The output of the last decoder layer is connected to both the score prediction head ScorePred and the bounding box prediction head BoxPred.

[0073] Step 3: Based on the target distribution characteristics of the tea quality dataset described in this invention, a dense one-to-one matching (Dense O2O) training strategy is introduced under the improved DEIM framework. The original matching-aware loss function is improved to a geometrically calibrated matching-aware loss function (GCMAL) to enhance the optimization strength of low-quality matching samples and improve the consistency between classification confidence and localization quality. This allows for end-to-end optimization training of the multi-scale tea quality detection network model, resulting in a trained multi-scale tea quality detection network model based on the improved DEIM framework.

[0074] The optimization training method based on the dense one-to-one matching strategy and the matching-aware loss function includes the following. A dense one-to-one matching (Dense O2O) mechanism is adopted and combined with data augmentation such as mosaic stitching and mixup. This increases the target density in a single training image, and thus the number of effective positive samples, while keeping the one-to-one assignment constraint. For the loss function, the original matching-aware loss MAL is improved to the geometrically calibrated matching-aware loss function GCMAL. GCMAL introduces geometric information such as IoU and center distance into the matching quality factor, and it sets a low-quality matching reinforcement weight and a confidence-quality consistency constraint. Classification confidence and localization quality are thus optimized jointly, strengthening the model's ability to distinguish and learn from samples of different matching quality. This improves training stability, accelerates convergence, and improves the accuracy of tea quality detection.
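The effect of mosaic stitching on target density can be illustrated with a simplified NumPy sketch. It assumes four square images of equal size tiled on a fixed 2×2 grid, with their boxes shifted accordingly. The random mosaic centre, cropping, and the mixup step mentioned in the text are not shown. The function name, box format and sample data are hypothetical.

```python
import numpy as np

def mosaic(images, box_lists):
    """Simplified 2x2 mosaic of four equally sized square images.

    images: four arrays of shape (S, S, 3); box_lists: four lists of (x1, y1, x2, y2)
    boxes in each tile's own pixel coordinates. No random centre, crop or rescale.
    """
    size = images[0].shape[0]
    canvas = np.zeros((2 * size, 2 * size, images[0].shape[2]), dtype=images[0].dtype)
    offsets = [(0, 0), (size, 0), (0, size), (size, size)]  # (dx, dy) of each tile
    boxes = []
    for (dx, dy), img, tile_boxes in zip(offsets, images, box_lists):
        canvas[dy:dy + size, dx:dx + size] = img
        boxes.extend((x1 + dx, y1 + dy, x2 + dx, y2 + dy) for x1, y1, x2, y2 in tile_boxes)
    return canvas, boxes

imgs = [np.full((64, 64, 3), v, dtype=np.uint8) for v in (10, 20, 30, 40)]
tile_boxes = [
    [(5, 5, 20, 20)],
    [(0, 0, 10, 10), (30, 30, 60, 60)],
    [(8, 8, 16, 16)],
    [(1, 1, 5, 5), (10, 10, 20, 20), (40, 40, 63, 63)],
]
canvas, boxes = mosaic(imgs, tile_boxes)
print(canvas.shape, len(boxes))  # (128, 128, 3) 7
print(boxes[1])                  # (64, 0, 74, 10): second tile is shifted right by 64
```

Four tiles carrying one to three targets each yield a single training image with seven targets. This is how denser images give the one-to-one matcher more positive samples while each ground truth is still assigned to exactly one query.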

[0075] After training, the improved multi-scale tea quality detection network under the DEIM framework is tested on the test set of the collected dataset. Detection performance is evaluated using average precision (AP), AP@IoU=0.50, AP@IoU=0.75, the number of parameters, and computational complexity. Using the training and validation sets obtained in step 1, the network model constructed in step 2 is trained end-to-end and its hyperparameters are tuned, through the following steps. Step 3.1: Based on the training and validation sets obtained in step 1, perform end-to-end training of the network model constructed in step 2 and complete hyperparameter tuning. During training, the training-set images are used as input, and multi-scale features are extracted through the improved DEIM framework to complete tea quality detection. The dense one-to-one matching strategy is adopted, and the geometrically calibrated matching-aware loss function GCMAL jointly optimizes classification and localization, yielding the trained model weight file.

[0076] Preferably, the dense one-to-one matching strategy in step 3.1 specifically includes the following process. Before the training-set images are input into the network, mosaic stitching and mixup are applied to each mini-batch with a preset probability to increase the number of real targets in a single training image. The augmented image and its set of ground-truth bounding boxes are input into the improved DEIM network to obtain the class confidences and bounding box regression results of N prediction queries. A matching cost matrix between the predicted and real targets is then constructed, the cost being a weighted average of the classification error and the localization error. The optimal one-to-one matching set Ω is obtained with the Hungarian algorithm. Matched pairs in Ω are used as positive samples to compute the classification and regression losses, while unmatched predictions are treated as background samples and participate only in classification supervision. The classification uses the matching-aware loss function MAL, whose expression is:

$$\mathcal{L}_{\mathrm{MAL}}(p, q, y) = \begin{cases} -q^{\gamma}\log(p) - (1 - q^{\gamma})\log(1 - p), & y = 1 \\ -p^{\gamma}\log(1 - p), & y = 0 \end{cases}$$

[0077] Here, $p$ is the predicted confidence for the matched category, $q$ is the IoU between the predicted bounding box and the ground-truth bounding box, $y$ is the sample label, and $\gamma$ is the adjustment coefficient. The IoU-dependent weighting $q^{\gamma}$ gives samples of different matching quality an adaptive penalty strength. This improves the optimizability of low-quality matches and the overall detection performance.
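The matching step and the MAL classification loss of [0076] and [0077] are sketched below in plain Python. To stay dependency-free, the assignment is found by exhaustive search. This returns the same minimum-cost one-to-one matching as the Hungarian algorithm but is only feasible for a handful of queries; DETR-style codebases typically call `scipy.optimize.linear_sum_assignment` instead. The following are hypothetical: the equal 0.5/0.5 cost weights, the value γ = 1.5, the function names, and all boxes and confidences.

```python
import math
from itertools import permutations

def box_iou(a, b):
    # IoU of two (x1, y1, x2, y2) boxes
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def one_to_one_match(cost):
    """Minimum-cost one-to-one assignment of ground truths (columns) to queries (rows).

    Exhaustive search: same optimum as the Hungarian algorithm, but only for tiny inputs.
    Assumes at least as many queries as ground truths.
    """
    n_queries, n_gts = len(cost), len(cost[0])
    best_total, best_rows = math.inf, None
    for rows in permutations(range(n_queries), n_gts):
        total = sum(cost[r][g] for g, r in enumerate(rows))
        if total < best_total:
            best_total, best_rows = total, rows
    return [(r, g) for g, r in enumerate(best_rows)]  # (query index, gt index)

def mal(p, q, y, gamma=1.5):
    # Matching-aware loss for one prediction: p = confidence, q = IoU, y = label (1 or 0)
    p = min(max(p, 1e-7), 1 - 1e-7)
    if y == 1:
        t = q ** gamma
        return -t * math.log(p) - (1 - t) * math.log(1 - p)
    return -(p ** gamma) * math.log(1 - p)

# Hypothetical queries: (confidence for the ground-truth class, predicted box)
preds = [(0.9, (10, 10, 50, 50)), (0.3, (12, 8, 48, 52)), (0.8, (60, 60, 100, 100))]
gts = [(10, 10, 50, 50), (62, 58, 98, 102)]
# Cost = equal-weight average of classification error (1 - p) and localization error (1 - IoU)
cost = [[0.5 * (1 - p) + 0.5 * (1 - box_iou(b, g)) for g in gts] for p, b in preds]
matches = one_to_one_match(cost)
print(matches)  # [(0, 0), (2, 1)]

matched = dict(matches)
losses = [
    mal(p, box_iou(b, gts[matched[i]]), 1) if i in matched else mal(p, 0.0, 0)  # background
    for i, (p, b) in enumerate(preds)
]
```

Query 1 overlaps the first ground truth almost as well as query 0. Because the matching is one-to-one, however, it is treated as background and receives only the negative term $-p^{\gamma}\log(1-p)$.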

[0078] Preferably, this invention proposes the geometrically calibrated matching-aware loss function GCMAL, which improves on the matching-aware loss function MAL and is used to improve the classification loss term. Its purpose is to further improve the optimizability of low-quality matched samples during dense one-to-one matching training and to strengthen the consistency between classification confidence and localization quality.

[0079] Specifically, consider any matched pair $(b, b^{gt})$ in the one-to-one matching set Ω obtained by the Hungarian algorithm, where $b$ is the predicted bounding box and $b^{gt}$ is the corresponding ground-truth bounding box. For each such pair, a geometry-aware matching degree $s$ is defined from the IoU and a center-distance penalty.

[0080] In this definition, $\mathrm{IoU}$ is the intersection over union between the predicted and ground-truth bounding boxes, and $\rho$ is the Euclidean distance between their center points. $c$ is the diagonal length of the smallest rectangle enclosing both boxes, and $\lambda$ is the distance penalty coefficient. $\mathrm{clip}(\cdot)$ is a truncation function that limits $s$ to the range [0, 1].

[0081] Furthermore, to increase the training contribution of low-quality matched samples, a low-quality reinforcement weight $w$ is introduced.

[0082] Here, $\gamma_w$ is the focusing exponent. The smaller $s$ is, the larger $w$ becomes, so a stronger optimization drive is applied to low-quality matched samples.

[0083] Meanwhile, a confidence-quality consistency calibration term $R$ is introduced. It suppresses overconfidence in predictions whose matching quality is low but whose confidence is high.

[0084] Here, $\beta$ is the weighting coefficient of the calibration term, and $p$ is the predicted confidence for the matched category.

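[0085] Finally, the classification loss of GCMAL is defined from the predicted confidence $p$, the matching degree $s$, the reinforcement weight $w$, and the calibration term $R$.

[0086] Here, $y$ is the sample label; $\gamma_{pos}$ and $\gamma_{neg}$ are the focusing exponents for positive and negative samples, respectively; and $\log(\cdot)$ is the logarithmic function.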
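The document does not give closed forms for $s$, $w$, $R$ or the final GCMAL expression. The plain-Python sketch below therefore uses hypothetical choices that only satisfy the properties stated in [0079] to [0086]:

- $s=\mathrm{clip}(\mathrm{IoU}-\lambda\rho^2/c^2,0,1)$, a DIoU-style penalty;
- $w=1+(1-s)^{\gamma_w}$, which grows as $s$ shrinks;
- $R=\beta(p-s)^2$, which penalizes confidence that departs from the matching quality;
- a positive term $w\cdot\mathrm{BCE}(p, s^{\gamma_{pos}})+R$;
- a MAL-style negative term $-p^{\gamma_{neg}}\log(1-p)$.

All coefficient values and function names are illustrative.

```python
import math

def box_iou(a, b):
    # IoU of two (x1, y1, x2, y2) boxes
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def matching_degree(pred, gt, lam=1.0):
    # Assumed form: s = clip(IoU - lam * rho^2 / c^2, 0, 1)
    iou = box_iou(pred, gt)
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
        + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2  # squared centre distance
    c2 = (max(pred[2], gt[2]) - min(pred[0], gt[0])) ** 2 \
        + (max(pred[3], gt[3]) - min(pred[1], gt[1])) ** 2       # squared enclosing diagonal
    s = iou - lam * rho2 / c2 if c2 > 0 else iou
    return min(max(s, 0.0), 1.0)

def gcmal(p, y, s=0.0, gamma_w=2.0, gamma_pos=1.5, gamma_neg=1.5, beta=0.5):
    # Hypothetical GCMAL for one prediction; every functional form here is an assumption
    p = min(max(p, 1e-7), 1 - 1e-7)
    if y == 0:
        return -(p ** gamma_neg) * math.log(1 - p)  # background term, MAL-style
    w = 1.0 + (1.0 - s) ** gamma_w                  # low-quality reinforcement weight
    t = s ** gamma_pos                              # soft target derived from s
    bce = -t * math.log(p) - (1 - t) * math.log(1 - p)
    r = beta * (p - s) ** 2                         # confidence-quality consistency term
    return w * bce + r

s_good = matching_degree((10, 10, 50, 50), (10, 10, 50, 50))  # 1.0
s_poor = matching_degree((10, 10, 50, 50), (30, 30, 70, 70))  # 1/7 - 1/9, about 0.032
# Same confidence 0.9: the poorly matched, over-confident prediction gets the larger loss
loss_good = gcmal(0.9, 1, s_good)
loss_poor = gcmal(0.9, 1, s_poor)
print(loss_good < loss_poor)  # True
```

Under these assumed forms, a poorly matched prediction with high confidence is penalized three times. The weight $w$ grows, the soft target $s^{\gamma_{pos}}$ drops toward zero, and $R$ adds a penalty for the gap between $p$ and $s$.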

[0087] By introducing the geometric calibration matching perception loss function GCMAL, this invention can further improve training stability and convergence speed, and improve the overall accuracy of tea quality detection, even when the one-to-one matching strategy results in a large number of low-quality matches.

[0088] Step 3.2: Using the weights obtained in training, the network model makes predictions on the validation set. It extracts multi-scale tea-leaf features through the three core modules EUCB_SC, IDWB and PSConv and obtains preliminary performance indicators such as average precision, detection accuracy, and model efficiency. Step 3.3: Adjust the model hyperparameters, such as learning rate, batch size, number of training epochs, and loss-function weights, based on the validation-set metrics. Repeat the training and validation steps above, and obtain the optimal weight file through multiple rounds of iterative optimization. Step 3.4: Use the optimal weights to test the model's generalization ability on the test set, and verify the effectiveness and feasibility of the improved DEIM framework for tea quality detection.

[0089] Step 4: Use the trained multi-scale tea quality detection network model based on the improved DEIM framework to perform quality detection and age identification on the tea leaves to be tested.

[0090] Another embodiment of the present invention provides a multi-scale tea quality detection system based on improved DEIM, used to implement the multi-scale tea quality detection method based on improved DEIM. The system comprises an input unit, an image processing unit, a multi-scale tea quality detection network model unit based on the improved DEIM framework, a training unit, a detection unit, and an output unit. The input unit receives images of tea leaves of different ages and quality grades and is also used to input images of the tea leaves to be tested.

[0091] The image processing unit is used to create a dataset from the obtained tea images through image processing.

[0092] The multi-scale tea quality detection network model unit based on the improved DEIM framework is used to construct a multi-scale tea quality detection network model based on the improved DEIM framework. This model includes a feature extraction backbone network DEIM-Backbone, an encoder, and a decoder connected in sequence. The backbone network introduces Inception-style depthwise separable convolutional blocks (IDWB), which use a branching strategy to allocate input channels proportionally to four branches: identity mapping, square convolution, horizontal stripe convolution, and vertical stripe convolution. The encoder's upsampling module introduces an enhanced upsampling convolution module EUCB_SC, which uses the Shift_channel_mix operation to perform channel segmentation and spatial cyclic shifting, enhancing cross-channel information interaction. The encoder's downsampling module introduces a windmill-shaped convolution module PSConv, which uses four asymmetric padding modes to generate multi-directional receptive fields, enhancing the ability to capture tea leaf vein texture and irregular edge features.

[0093] The training unit is used to train a multi-scale tea quality detection network model based on the improved DEIM framework by introducing a dense one-to-one matching training strategy and a geometric calibration matching perceptual loss function (GCMAL) based on the dataset, thereby obtaining a trained multi-scale tea quality detection network model based on the improved DEIM framework.

[0094] The detection unit uses the trained multi-scale tea quality detection network model based on the improved DEIM framework to perform quality detection and age recognition on images of the tea leaves to be tested.

[0095] The output unit is used to output the detected quality and age.

[0096] This invention enables end-to-end tea quality detection and classification, outputting tea age identification results and quality grade assessments. The method and system of this invention are specifically optimized for tea texture features, and combined with lightweight design and multi-scale feature enhancement, enabling the model to adapt to the intelligent quality control needs of different tea varieties. This invention can automatically learn the visual feature changes of tea at different age stages through deep neural networks, achieving accurate tea age identification and quality grading, improving the accuracy and automation level of tea grading, providing technical support for the intelligent upgrading and standardized production of the tea industry, and promoting the transformation of traditional agriculture to precision agriculture.

[0097] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A multi-scale tea quality detection method based on improved DEIM, characterized in that it comprises the following steps:

Step 1: obtain images of tea leaves of different ages and quality grades, and create a dataset from the obtained tea leaf images through image processing.

Step 2: construct a multi-scale tea quality detection network model based on the improved DEIM framework.

Step 3: train the multi-scale tea quality detection network model based on the improved DEIM framework using the dataset. A dense one-to-one matching Dense O2O training strategy is introduced into the model, and the original matching-aware loss function is improved to the geometrically calibrated matching-aware loss function GCMAL, to strengthen the optimization of low-quality matched samples and improve the consistency between classification confidence and localization quality. The model is thereby optimized end-to-end, and a trained multi-scale tea quality detection network model based on the improved DEIM framework is obtained.

Step 4: use the trained multi-scale tea quality detection network model based on the improved DEIM framework to perform quality detection and age identification on the tea leaves to be tested.

2. The multi-scale tea quality detection method based on improved DEIM according to claim 1, characterized in that:

- the multi-scale tea quality detection network model based on the improved DEIM framework includes a feature extraction backbone network DEIM-Backbone, an encoder, and a decoder connected in sequence;
- the backbone network includes an initialization module Stem, a first feature extraction module HGStage1, a second feature extraction module HGStage2, a third feature extraction module HGStage3, and a C2f_IDWB module connected in sequence, the C2f_IDWB module being formed by embedding Inception-style depthwise separable convolutional blocks IDWB in a C2f module;
- the Inception-style depthwise separable convolutional block IDWB introduced in the backbone network uses a branching strategy to allocate input channels proportionally to four branches: identity mapping, square convolution, horizontal stripe convolution, and vertical stripe convolution;
- the encoder's upsampling module introduces an enhanced upsampling convolutional module EUCB_SC, which uses the Shift_channel_mix operation to perform channel segmentation and spatial cyclic shifting, enhancing cross-channel information interaction;
- the encoder's downsampling module introduces a windmill-shaped convolutional module PSConv, which uses four asymmetric padding modes to generate multi-directional receptive fields, enhancing the capture of tea leaf vein texture and irregular edge features.

3. The multi-scale tea quality detection method based on improved DEIM according to claim 2, characterized in that the encoder includes a first convolutional layer, a second convolutional layer, a third convolutional layer, an upsampling module, and a downsampling module.

The upsampling module includes the following, all connected in sequence: a self-attention Transformer Layer, a second convolutional layer, a first enhanced upsampling convolutional module EUCB_SC, a second connection layer, a first reparameterizable multi-branch aggregation module RepNCSPELAN, a second convolutional layer, a second enhanced upsampling convolutional module EUCB_SC, and a second connection layer.

The downsampling module includes the following, all connected in sequence: a second reparameterizable multi-branch aggregation module RepNCSPELAN, a first windmill-shaped convolutional module PSConv, a third connection layer, a third reparameterizable multi-branch aggregation module RepNCSPELAN, a second windmill-shaped convolutional module PSConv, a third connection layer, and a fourth reparameterizable multi-branch aggregation module RepNCSPELAN.

The Inception-style depthwise separable convolutional block IDWB, the first convolutional layer 1, and the self-attention layer Transformer Layer are connected in sequence. The third feature extraction module HGStage3, the first convolutional layer 2, and the second convolutional layer 1 are connected in sequence. The second feature extraction module HGStage2, the first convolutional layer 3, and the second convolutional layer 2 are connected in sequence. The second convolutional layer 1 is connected to the third connection layer 2, and the second convolutional layer 2 is connected to the third connection layer 1.

The decoder comprises N decoder layers connected in sequence, with adjacent decoder layers linked by self-distillation. The output of the flatten layer is connected to the input of the first decoder layer, and the output of the last decoder layer is connected to both the score prediction head ScorePred and the bounding box prediction head BoxPred.

4. The multi-scale tea quality detection method based on improved DEIM according to claim 3, characterized in that the geometrically calibrated matching-aware loss function GCMAL in step 3 is defined in terms of the following quantities:

- $\mathcal{L}_{\mathrm{GCMAL}}$ denotes the geometrically calibrated matching-aware loss;
- $p$ denotes the predicted confidence for the matched category;
- $s$ is the geometry-aware matching degree;
- $y$ denotes the sample label;
- $\gamma_w$ is the focusing exponent;
- $\gamma_{pos}$ and $\gamma_{neg}$ are the focusing exponents for positive and negative samples, respectively;
- $\beta$ is the weighting coefficient of the calibration term.

5. The multi-scale tea quality detection method based on improved DEIM according to claim 4, characterized in that introducing the dense one-to-one matching Dense O2O training strategy in step 3, and improving the original matching-aware loss function to the geometrically calibrated matching-aware loss function GCMAL, comprises the following.

Before the training-set images are input into the network, mosaic stitching and mixup augmentation are applied to each mini-batch with a preset probability, to increase the number of real targets in a single training image.

The augmented image and its set of ground-truth bounding boxes are input into the multi-scale tea quality detection network model based on the improved DEIM framework, to obtain the class confidences and bounding box regression results of N prediction queries.

A matching cost matrix between the predicted and real targets is constructed, the cost being a weighted average of the classification error and the localization error, and the optimal one-to-one matching set Ω is obtained with the Hungarian algorithm.

The matched pairs in Ω are used as positive samples to compute the classification and regression losses, while unmatched predictions are treated as background samples and participate only in classification supervision.

The classification uses the matching-aware loss function MAL, whose expression is $\mathcal{L}_{\mathrm{MAL}}(p, q, y) = -q^{\gamma}\log(p) - (1 - q^{\gamma})\log(1 - p)$ for $y = 1$ and $\mathcal{L}_{\mathrm{MAL}}(p, q, y) = -p^{\gamma}\log(1 - p)$ for $y = 0$. Here $\mathcal{L}_{\mathrm{MAL}}$ is the matching-aware classification loss, $p$ is the predicted confidence for the matched category, $q$ is the IoU between the predicted and ground-truth bounding boxes, $y$ is the sample label, and $\gamma$ is the adjustment coefficient.

For any matched pair $(b, b^{gt})$ in the one-to-one matching set Ω obtained by the Hungarian algorithm, where $b$ is the predicted bounding box and $b^{gt}$ is the corresponding ground-truth bounding box, a geometry-aware matching degree $s$ is defined and truncated by a clipping function. It is defined from the IoU between the predicted and ground-truth bounding boxes, the Euclidean distance $\rho$ between their center points, the diagonal length $c$ of the smallest rectangle enclosing both boxes, and the distance penalty coefficient $\lambda$.

A low-quality reinforcement weight $w$ with focusing exponent $\gamma_w$ is introduced, as is a confidence-quality consistency calibration term $R$ with weighting coefficient $\beta$ and the predicted confidence $p$ for the matched category. The geometrically calibrated matching-aware loss function GCMAL is thereby obtained.

6. The multi-scale tea quality detection method based on improved DEIM according to claim 5, characterized in that: The process of constructing the enhanced upsampling convolution module EUCB_SC in step 2 is as follows: first, an upsampling operator enlarges the input feature map in the spatial dimensions; the result is then passed through a 3×3 depthwise separable convolution to extract local spatial features, followed by batch normalization and a nonlinear activation function for feature enhancement; next, a shift operation is introduced to explicitly model spatial neighborhood relationships; a channel shuffle operation then shuffles and reorganizes the feature representations at the channel level, promoting cross-channel information interaction; finally, a 1×1 convolution compresses the number of channels to match the next stage. The calculation process of the enhanced upsampling convolution module EUCB_SC is:

$$F_{out} = \mathrm{Conv}_{1\times1}(\mathrm{CS}(\mathrm{Shift}(\sigma(\mathrm{BN}(\mathrm{DWConv}_{3\times3}(\mathrm{Up}_{\times2}(F_{in})))))))$$

where $F_{out}$ is the output feature map of EUCB_SC; $F_{in}$ is the feature map input to EUCB_SC; $\mathrm{Up}_{\times2}$ is an upsampling operator with a scaling factor of 2; $\mathrm{DWConv}_{3\times3}$ is a 3×3 depthwise separable convolution; $\mathrm{BN}$ denotes batch normalization; $\sigma$ is the activation function; $\mathrm{Shift}$ is the shift operation; $\mathrm{CS}$ is the channel shuffle operation; and $\mathrm{Conv}_{1\times1}$ is a 1×1 convolution. The steps in step 2 for constructing the Inception-style depthwise separable convolution block IDWB are as follows: first, the input feature $X$ is divided into four branches along the channel dimension:

$$X_{id},\ X_{hw},\ X_{w},\ X_{h} = \mathrm{Split}(X)$$

where $X_{id}$ is the identity mapping branch, in which part of the channels are retained directly to reduce computational redundancy, and $X_{hw}$, $X_{w}$, $X_{h}$ are the feature subsets fed to the 3×3, 1×k and k×1 convolution branches, respectively. These three subsets are processed by three different depthwise convolution branches:

$$X'_{hw} = \mathrm{DWConv}_{3\times3}(X_{hw}),\quad X'_{w} = \mathrm{DWConv}_{1\times k}(X_{w}),\quad X'_{h} = \mathrm{DWConv}_{k\times1}(X_{h})$$

where $X'_{hw}$, $X'_{w}$, $X'_{h}$ are the output features of the corresponding depthwise convolution branches for the input subsets $X_{hw}$, $X_{w}$, $X_{h}$; $k$ is the length of the band-shaped convolution kernel; and $\mathrm{DWConv}_{m\times n}$ is a depthwise separable convolution with an m×n kernel. The resulting features are concatenated along the channel dimension:

$$X' = \mathrm{Concat}(X_{id}, X'_{hw}, X'_{w}, X'_{h})$$

The output $X'$ is then normalized and passed through an MLP built from 1×1 convolutions for cross-channel feature interaction:

$$Y = X + \mathrm{MLP}(\mathrm{Norm}(X'))$$

where $Y$ is the final output feature of the IDWB module, the addition of $X$ is the residual connection, $\mathrm{MLP}$ is a multilayer perceptron implemented with 1×1 convolutions, and $\mathrm{Norm}$ is the normalization operation.
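To make the data flow of claim 6 concrete, the NumPy sketch below traces EUCB_SC and IDWB on a single (C, H, W) feature map. It is an illustration under assumptions, not the applicant's implementation: nearest-neighbour upsampling stands in for the upsampling operator, only the per-channel (depthwise) spatial filtering of the depthwise separable convolution is shown, per-channel normalization over one image stands in for batch normalization, ReLU stands in for the unspecified activation, the shift pattern (four channel groups moved one pixel right, left, down and up with zero fill) is hypothetical, and IDWB splits the channels into four equal groups. All function names are hypothetical.

```python
import numpy as np

def dwconv(x, k):
    """Depthwise conv, stride 1, zero 'same' padding. x: (C, H, W); k: (C, m, n), m and n odd."""
    _, h, w = x.shape
    _, m, n = k.shape
    xp = np.pad(x, ((0, 0), (m // 2, m // 2), (n // 2, n // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(m):
        for j in range(n):
            out += k[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out

def pwconv(x, w):
    """1x1 convolution; w has shape (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', w, x)

def norm(x, eps=1e-5):
    """Per-channel normalization over H and W (single-image stand-in for BN)."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=(1, 2), keepdims=True) + eps)

def relu(x):
    return np.maximum(x, 0.0)

def shift(x):
    """Zero-filled one-pixel shift of four channel groups (hypothetical pattern)."""
    g = x.shape[0] // 4
    out = np.zeros_like(x)
    out[:g, :, 1:] = x[:g, :, :-1]                    # group 0 moves right
    out[g:2 * g, :, :-1] = x[g:2 * g, :, 1:]          # group 1 moves left
    out[2 * g:3 * g, 1:, :] = x[2 * g:3 * g, :-1, :]  # group 2 moves down
    out[3 * g:4 * g, :-1, :] = x[3 * g:4 * g, 1:, :]  # group 3 moves up
    out[4 * g:] = x[4 * g:]                           # leftover channels are not shifted
    return out

def channel_shuffle(x, groups):
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

def eucb_sc(x, k_dw, w_pw, groups=4):
    y = x.repeat(2, axis=1).repeat(2, axis=2)   # Up x2 (nearest neighbour, assumed)
    y = relu(norm(dwconv(y, k_dw)))             # 3x3 depthwise conv + BN + activation
    y = channel_shuffle(shift(y), groups)        # shift, then channel shuffle
    return pwconv(y, w_pw)                      # 1x1 conv to the next stage's width

def idwb(x, k_sq, k_w, k_h, w1, w2):
    x_id, x_hw, x_w, x_h = np.split(x, 4, axis=0)   # equal four-way split (assumed ratio)
    mixed = np.concatenate([x_id, dwconv(x_hw, k_sq), dwconv(x_w, k_w), dwconv(x_h, k_h)], axis=0)
    return x + pwconv(relu(pwconv(norm(mixed), w1)), w2)   # residual + 1x1-conv MLP

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5))
print(eucb_sc(x, rng.standard_normal((8, 3, 3)), rng.standard_normal((4, 8))).shape)  # (4, 10, 10)
print(idwb(rng.standard_normal((8, 6, 6)), rng.standard_normal((2, 3, 3)),
           rng.standard_normal((2, 1, 7)), rng.standard_normal((2, 7, 1)),
           rng.standard_normal((16, 8)), rng.standard_normal((8, 16))).shape)  # (8, 6, 6)
```

Here k = 7 is used for the 1×k and k×1 band kernels purely as an example value.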

7. The multi-scale tea quality detection method based on improved DEIM according to claim 6, characterized in that: The steps in step 2 for constructing the pinwheel-shaped convolution module PSConv are as follows: PSConv achieves a larger effective receptive field than standard convolution through asymmetric padding and directional convolution, while keeping the parameter count low. Its calculation process is as follows. Let the input feature map be $X \in \mathbb{R}^{h_1 \times w_1 \times c_1}$, the convolution stride be $s$, and the number of output channels be $c_2$. First, the input is asymmetrically padded in four directions, and a 1×k or k×1 convolution is applied in each direction:

$$X_i = \mathrm{SiLU}\big(\mathrm{BN}(X_{P(l_i, r_i, t_i, b_i)} \circledast W_i)\big), \quad i = 1, 2, 3, 4$$

where $X_1$, $X_2$, $X_3$, $X_4$ are the output feature maps of the four branches obtained by asymmetric padding and convolution of the input $X$ in the four directions; $\mathrm{SiLU}$ is the Sigmoid Linear Unit activation function; $\mathrm{BN}$ is batch normalization; $X_{P(l, r, t, b)}$ denotes the input $X$ padded with $l$, $r$, $t$ and $b$ pixels on the left, right, top and bottom, respectively; $W_1$, $W_2$, $W_3$, $W_4$ are the kernel parameters of the four directional branches, the first two being 1×k striped kernels $W_1, W_2 \in \mathbb{R}^{1 \times k \times c'}$ and the latter two being k×1 striped kernels $W_3, W_4 \in \mathbb{R}^{k \times 1 \times c'}$, where $c'$ is the number of output channels of each branch; and $\circledast$ is the convolution operation with stride $s$. The convolution results are concatenated:

$$X' = \mathrm{Cat}(X_1, X_2, X_3, X_4) \in \mathbb{R}^{h_2 \times w_2 \times 4c'}$$

where $X'$ is the feature map obtained by concatenating the outputs of the four directional branches along the channel dimension, $h_2$ and $w_2$ are its height and width, and $\mathrm{Cat}$ is the concatenation operation. A 2×2 convolution then fuses the result to give the final output:

$$Y = \mathrm{SiLU}\big(\mathrm{BN}(X' \circledast W)\big) \in \mathbb{R}^{h_3 \times w_3 \times c_2}$$

where $Y$ is the final output feature map of the PSConv module; $\mathrm{SiLU}$ is the Sigmoid Linear Unit activation function; $\mathrm{BN}$ is batch normalization; $W \in \mathbb{R}^{2 \times 2 \times c_2}$ is the convolution kernel that fuses the concatenated four-branch features, with $c_2$ output channels; and $h_3$, $w_3$, $c_2$ are the height, width and number of channels of the final output feature map, respectively.
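The PSConv data flow of claim 7 can be checked with the NumPy sketch below, which focuses on tensor shapes rather than learned weights. The claim does not fix the padding amounts, so the pattern used here, (left, right, top, bottom) = (k, 0, 1, 0), (0, k, 0, 1), (0, 1, k, 0) and (1, 0, 0, k), is an assumption chosen so that all four branches produce the same floor(h/s) + 1 by floor(w/s) + 1 map before the 2×2 fusion convolution reduces it to floor(h/s) by floor(w/s). Per-channel normalization stands in for batch normalization, and all function names are hypothetical.

```python
import numpy as np

def pad(x, left, right, top, bottom):
    """Zero-pad a (C, H, W) map asymmetrically."""
    return np.pad(x, ((0, 0), (top, bottom), (left, right)))

def conv2d(x, w, stride=1):
    """Unpadded convolution. x: (C_in, H, W); w: (C_out, C_in, kh, kw)."""
    _, h, wd = x.shape
    _, _, kh, kw = w.shape
    oh, ow = (h - kh) // stride + 1, (wd - kw) // stride + 1
    out = np.zeros((w.shape[0], oh, ow))
    for i in range(kh):
        for j in range(kw):
            patch = x[:, i:i + stride * (oh - 1) + 1:stride, j:j + stride * (ow - 1) + 1:stride]
            out += np.einsum('oc,chw->ohw', w[:, :, i, j], patch)
    return out

def bn_silu(x, eps=1e-5):
    """Per-channel normalization (stand-in for BN) followed by SiLU(x) = x * sigmoid(x)."""
    x = (x - x.mean(axis=(1, 2), keepdims=True)) / np.sqrt(x.var(axis=(1, 2), keepdims=True) + eps)
    return x / (1.0 + np.exp(-x))

def psconv(x, branch_w, fuse_w, k=3, s=1):
    """branch_w: two (c', C_in, 1, k) kernels then two (c', C_in, k, 1) kernels;
    fuse_w: (c2, 4c', 2, 2)."""
    pads = [(k, 0, 1, 0), (0, k, 0, 1), (0, 1, k, 0), (1, 0, 0, k)]  # assumed padding pattern
    branches = [bn_silu(conv2d(pad(x, *p), w, s)) for p, w in zip(pads, branch_w)]
    cat = np.concatenate(branches, axis=0)   # (4c', floor(h/s) + 1, floor(w/s) + 1)
    return bn_silu(conv2d(cat, fuse_w, 1))   # 2x2 fusion -> (c2, floor(h/s), floor(w/s))

rng = np.random.default_rng(0)
c1, c2, k = 3, 8, 3
cb = c2 // 4   # per-branch channels c' (assumed to be c2 / 4 here)
ws = [rng.standard_normal((cb, c1, 1, k)) for _ in range(2)] + \
     [rng.standard_normal((cb, c1, k, 1)) for _ in range(2)]
y = psconv(rng.standard_normal((c1, 8, 8)), ws, rng.standard_normal((c2, 4 * cb, 2, 2)), k=k, s=2)
print(y.shape)  # (8, 4, 4)
```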

8. The multi-scale tea quality detection method based on improved DEIM according to claim 7, characterized in that: The method for creating a dataset from the obtained tea images in step 1 through image processing includes: Step 1.1: remove blurry, overexposed and underexposed images from the collected tea leaf pictures; Step 1.2: expand the image data by applying data augmentation to each image, including random cropping, horizontal flipping, vertical flipping, random rotation, color jittering and the addition of Gaussian noise; Step 1.3: use annotation software to annotate the images in COCO format, ensuring that each bounding box closely matches the outline of the tea leaves, and label the age category and quality grade of the tea leaves; Step 1.4: after labeling, divide the dataset into a training set, a validation set and a test set according to preset proportions.
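Steps 1.1, 1.2 and 1.4 can be sketched in NumPy as follows. The variance of a Laplacian response is a common sharpness heuristic and mean brightness a simple exposure check, but both thresholds, the 8:1:1 split ratio and all function names are hypothetical. Only horizontal flipping and Gaussian noise are shown from the augmentation list; the flip helper also remaps COCO-format [x, y, w, h] boxes, which is needed whenever an image is flipped after it has been annotated.

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian; low values suggest a blurry image."""
    g = np.asarray(gray, dtype=float)
    lap = g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:] - 4.0 * g[1:-1, 1:-1]
    return float(lap.var())

def keep_image(gray, blur_thresh=100.0, dark=40.0, bright=215.0):
    """Step 1.1 sketch: drop blurry, under- or over-exposed images (hypothetical thresholds)."""
    mean = float(np.mean(gray))
    return laplacian_variance(gray) >= blur_thresh and dark <= mean <= bright

def hflip_with_boxes(img, boxes):
    """Flip an H x W (x C) image left-right and remap COCO boxes [x, y, w, h]."""
    width = img.shape[1]
    flipped = img[:, ::-1].copy()
    return flipped, [[width - x - w, y, w, h] for x, y, w, h in boxes]

def add_gaussian_noise(img, sigma=5.0, rng=None):
    """Additive Gaussian noise, clipped back to the 8-bit range."""
    if rng is None:
        rng = np.random.default_rng()
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0, 255)

def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Step 1.4 sketch: shuffle once, then cut into train/val/test (assumed ratios)."""
    idx = np.random.default_rng(seed).permutation(len(items))
    n_train = int(round(ratios[0] * len(items)))
    n_val = int(round(ratios[1] * len(items)))
    pick = lambda ids: [items[i] for i in ids]
    return pick(idx[:n_train]), pick(idx[n_train:n_train + n_val]), pick(idx[n_train + n_val:])
```

Shuffling with a fixed seed before splitting keeps the three subsets disjoint and reproducible across runs.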

9. The multi-scale tea quality detection method based on improved DEIM according to claim 8, characterized in that: The dense one-to-one matching training strategy and the geometric calibration matching perception loss function GCMAL comprise the following: the Dense O2O matching mechanism is adopted, which uses data augmentation to increase the number of targets, and hence the number of positive samples, in a single image; at the same time, the original matching-aware loss function is improved into GCMAL, which strengthens the optimization of low-quality matching samples and improves the consistency between classification confidence and localization quality. This improves the model's ability to distinguish matches of different quality during training, speeds up convergence, and improves the accuracy of tea quality detection.
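The effect of Dense O2O on the positive-sample count can be illustrated with a small pure-Python sketch. It assumes a 2×2 mosaic as the augmentation (the claim says only "data augmentation"), assumes each tile has already been resized to tile_size × tile_size, and uses hypothetical function names and a hypothetical default of 300 queries. Because one-to-one matching assigns each ground-truth box to exactly one query, packing four images' targets into one training image raises that image's positive count accordingly, as long as enough queries are available.

```python
def mosaic_boxes(tiles, tile_size):
    """Merge COCO boxes [x, y, w, h, cls] from four tiles into one 2x2 mosaic canvas.
    Tile order: top-left, top-right, bottom-left, bottom-right."""
    offsets = [(0, 0), (tile_size, 0), (0, tile_size), (tile_size, tile_size)]
    merged = []
    for (dx, dy), boxes in zip(offsets, tiles):
        for x, y, w, h, cls in boxes:
            merged.append([x + dx, y + dy, w, h, cls])   # shift box into its quadrant
    return merged

def o2o_positives(num_gt, num_queries=300):
    """One-to-one matching: each ground-truth box is matched to exactly one query."""
    return min(num_gt, num_queries)

tiles = [[[10, 10, 20, 20, 0]],
         [[5, 5, 8, 8, 1], [30, 40, 10, 10, 1]],
         [],
         [[0, 0, 50, 50, 2], [60, 60, 20, 20, 2], [100, 0, 30, 30, 0]]]
merged = mosaic_boxes(tiles, tile_size=320)
print(o2o_positives(len(tiles[0])), o2o_positives(len(merged)))  # 1 6
```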

10. A multi-scale tea quality detection system based on improved DEIM, characterized in that the system is configured to implement the multi-scale tea quality detection method based on the improved DEIM framework according to claim 1 and comprises an input unit, an image processing unit, a multi-scale tea quality detection network model unit based on the improved DEIM framework, a training unit, a detection unit and an output unit, wherein: the input unit is used to input images of tea leaves of different ages and quality grades, and also to input images of tea leaves to be tested; the image processing unit is used to generate a dataset from the obtained tea images through image processing; the multi-scale tea quality detection network model unit based on the improved DEIM framework is used to construct the multi-scale tea quality detection network model based on the improved DEIM framework; the training unit is used to train this network model on the dataset by introducing the dense one-to-one matching training strategy and the geometric calibration matching perception loss function GCMAL, thereby obtaining a trained multi-scale tea quality detection network model based on the improved DEIM framework; the detection unit is used to perform quality detection and age recognition on images of tea leaves to be tested using the trained model; and the output unit is used to output the detected quality and age.
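The unit structure of claim 10 can be sketched as a small Python dataclass that wires six callables together. The class name, field names and call signatures are hypothetical; in a real system each callable would wrap the corresponding step of the method (dataset construction, the improved DEIM model, training with Dense O2O and GCMAL, and detection).

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

@dataclass
class TeaQualitySystem:
    """Hypothetical wiring of the six units of claim 10; each unit is a plain callable."""
    input_unit: Callable[[str], Any]                             # loads one image
    image_processing_unit: Callable[[List[Any]], Any]            # builds the dataset
    model_unit: Callable[[], Any]                                # constructs the untrained model
    training_unit: Callable[[Any, Any], Any]                     # trains (Dense O2O + GCMAL)
    detection_unit: Callable[[Any, Any], List[Tuple[str, str]]]  # (age, grade) per detection
    output_unit: Callable[[List[Tuple[str, str]]], None]         # reports the results
    model: Any = None                                            # set by train()

    def train(self, paths):
        dataset = self.image_processing_unit([self.input_unit(p) for p in paths])
        self.model = self.training_unit(self.model_unit(), dataset)
        return self.model

    def detect(self, path):
        if self.model is None:
            raise RuntimeError("call train() before detect()")
        results = self.detection_unit(self.model, self.input_unit(path))
        self.output_unit(results)
        return results
```

Keeping each unit as an injected callable lets the same wiring be exercised with stub functions before the real model and training code exist.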