Field tobacco leaf segmentation method, device, equipment and medium

By using an improved UNet network, utilizing the ResNet50 residual structure and DC-ECA attention module, and combining a composite learning rate strategy, the problem of balancing robustness, accuracy, and real-time performance in field tobacco leaf segmentation was solved, achieving high-precision, real-time tobacco leaf segmentation suitable for intelligent tobacco harvesting equipment.

CN122244451A · Pending · Publication Date: 2026-06-19 · SOUTH CHINA AGRICULTURAL UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SOUTH CHINA AGRICULTURAL UNIVERSITY
Filing Date
2026-04-24
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies for tobacco leaf segmentation in the field suffer from insufficient robustness in complex scenarios, limited ability to extract fine-grained features, and difficulty in balancing accuracy and real-time performance, making it difficult to achieve high-precision, real-time tobacco leaf segmentation in complex environments.

Method used

An improved UNet network is adopted, which uses the ResNet50 residual structure to build the encoder and embeds the DC-ECA attention module in the decoder. Combined with a two-stage compound learning rate adjustment strategy, the model training process is optimized and the accuracy of tobacco leaf feature extraction and segmentation is enhanced.

Benefits of technology

It significantly improves the robustness and accuracy of tobacco leaf segmentation, enabling precise pixel-level segmentation of tobacco leaves of various shapes and sizes under conditions such as occlusion, abnormal lighting, and wrinkling in complex field environments. It is adapted to the actual operational needs of intelligent tobacco harvesting equipment, achieving a mean intersection over union (mIoU) of 99.02%, a 7.25% improvement over the original UNet network, and an inference speed of 56.53 FPS.



Abstract

This application relates to a method, apparatus, device, and medium for segmenting tobacco leaves in the field. The method includes: acquiring a tobacco field environment image containing target tobacco leaves; constructing a second tobacco leaf segmentation model by updating the encoder of a first tobacco leaf segmentation model to a ResNet50 residual structure and, in the decoder of the first tobacco leaf segmentation model, embedding a DC-ECA module before the upsampling operation in every upsampling layer except the first, wherein the ResNet50 residual structure includes an Identity Block and a Conv Block, and the DC-ECA module includes a 3×3 convolutional layer, a channel attention mechanism module, and another 3×3 convolutional layer connected in sequence; and inputting the tobacco field environment image into the second tobacco leaf segmentation model, which has been trained to convergence, and performing pixel-level segmentation through model inference to output the foreground region of the target tobacco leaves in the image. This application can significantly improve the accuracy of tobacco leaf edge segmentation and reduce missegmentation and missed segmentation.

Description

Technical Field

[0001] This application relates to the field of image processing, and more particularly to a method for segmenting tobacco leaves in a field, a corresponding apparatus, electronic equipment, and a computer-readable storage medium.

Background Technology

[0002] Tobacco is an important economic crop, and leaf harvesting is a crucial step in flue-cured tobacco production: harvest quality directly affects the quality of the tobacco leaves and the economic returns. Currently, the harvesting process still relies heavily on manual labor. Because the harvesting season is hot and humid, labor costs continue to rise and have become a core bottleneck restricting industrial upgrading. Tobacco harvesting therefore urgently needs industrial transformation. In mechanized and intelligent tobacco harvesting operations, accurate identification and segmentation of fresh tobacco leaves are the core prerequisites for automated harvesting.

[0003] Early tobacco leaf segmentation primarily relied on traditional image processing techniques. These methods depended on low-level visual features such as color, texture, and shape, and used algorithms like thresholding, region growing, and edge detection to extract leaf contours. While these traditional methods could achieve some segmentation results in simple scenarios with uniform backgrounds, they struggled to reliably capture complete leaf features in the complex real-world environment of tobacco fields. Factors such as leaf occlusion, drastic changes in natural light, and cluttered backgrounds made them highly susceptible to problems like missing contours, missegmentation, and missed segmentation. Their environmental adaptability and robustness were severely lacking, failing to meet the demands of actual production.

[0004] Currently, with the development of computer vision technology, end-to-end deep learning methods are being used more and more widely in the field of agricultural image segmentation, including networks such as UNet, PSPNet, and DeepLabv3+ and their variants. However, the following technical shortcomings still exist:

[0005] First, the models lack robustness in complex scenarios. They adapt poorly to leaf occlusion in the field, dynamic lighting changes, low light or overexposure, and interference from weeds and soil background, which can easily lead to a significant decrease in segmentation accuracy.

[0006] Second, the fine-grained feature extraction capability is limited. General networks struggle to balance shallow detailed features with deep semantic information, and fail to sufficiently capture key features such as tobacco leaf edges, textures, and contours, resulting in blurred boundaries and incomplete segmentation.

[0007] Third, it is difficult to balance accuracy and real-time performance. High-precision models have redundant structures and large computational loads, making it difficult to achieve real-time inference on embedded harvesting equipment; lightweight models, on the other hand, have weak feature representation capabilities and cannot adapt to the segmentation of tobacco leaves with diverse shapes.

[0008] Fourth, existing methods are not optimized for occlusion and complex backgrounds. They are mostly designed for simple scenarios and do not provide specific robustness improvements for typical problems such as overlapping and occlusion of tobacco leaves, leaf damage, and wrinkling. This can easily lead to missegmentation and missed segmentation.

[0009] In summary, existing field tobacco leaf segmentation models suffer from problems such as insufficient robustness in complex scenarios, limited fine-grained feature extraction capabilities, and difficulty in balancing accuracy and real-time performance. The applicant has made corresponding explorations to address these issues.

Summary of the Invention

[0010] The purpose of this application is to solve the above-mentioned problems by providing a method for segmenting tobacco leaves in the field, a corresponding device, an electronic device, and a computer-readable storage medium.

[0011] To achieve the various objectives of this application, the following technical solution is adopted:

[0012] A field tobacco leaf segmentation method proposed for one of the purposes of this application includes:

[0013] Acquire an image of the tobacco field environment containing the target tobacco leaves;

[0014] The encoder of the first tobacco leaf segmentation model is updated to a ResNet50 residual structure. In the decoder of the first tobacco leaf segmentation model, except for the first upsampling layer, a DC-ECA module is embedded before the upsampling operation is performed in each of the other upsampling layers to construct the second tobacco leaf segmentation model. The ResNet50 residual structure includes an Identity Block and a Conv Block. The DC-ECA module includes a 3×3 convolutional layer, a channel attention mechanism module, and a 3×3 convolutional layer connected in sequence.

[0015] The tobacco field environment image is input into the second tobacco leaf segmentation model that has been trained to convergence. The model performs pixel-level segmentation through inference to output the foreground region of the target tobacco leaf in the tobacco field environment image, thereby completing the segmentation of tobacco leaves in the field.

[0016] Optionally, the Conv Block includes a first 1×1 convolutional layer, a first batch normalization layer, a first activation layer, a first 3×3 convolutional layer, a second batch normalization layer, a second activation layer, a second 1×1 convolutional layer, and a third batch normalization layer, all connected in series. The Conv Block also includes a first skip connection path, which includes a third 1×1 convolutional layer and a fourth batch normalization layer connected in series. The output of the first skip connection path is added element by element to the output of the third batch normalization layer, and the sum is output through the third activation layer.

[0017] The Identity Block comprises a fourth 1×1 convolutional layer, a fifth batch normalization layer, a fourth activation layer, a second 3×3 convolutional layer, a sixth batch normalization layer, a fifth activation layer, a fifth 1×1 convolutional layer, and a seventh batch normalization layer, all connected in series. The Identity Block also includes a second skip connection path, which passes the input features through directly. The output of the second skip connection path is added element by element to the output of the seventh batch normalization layer, and the sum is output through the sixth activation layer.

[0018] Optionally, the step of inputting the tobacco field environment image into a second tobacco leaf segmentation model that has been trained to a convergent state, performing pixel-level segmentation through model inference, and outputting the foreground region of the target tobacco leaf in the tobacco field environment image includes:

[0019] The tobacco field environment image is input into the second tobacco leaf segmentation model that has been trained to convergence. When the image features of the tobacco field environment image are processed by the Conv Block, they are transformed sequentially through the first 1×1 convolutional layer, the first batch normalization layer, the first activation layer, the first 3×3 convolutional layer, the second batch normalization layer, the second activation layer, the second 1×1 convolutional layer, and the third batch normalization layer. At the same time, the spatial size and channel count of the input features are adjusted by the third 1×1 convolutional layer and the fourth batch normalization layer in the first skip connection path. The output features of the first skip connection path and the output features of the third batch normalization layer are added element by element, and the enhanced tobacco leaf features are then output through the third activation layer.

[0020] The enhanced tobacco leaf features are input into the Identity Block for processing. The main path features are extracted sequentially through the fourth 1×1 convolutional layer, the fifth batch normalization layer, the fourth activation layer, the second 3×3 convolutional layer, the sixth batch normalization layer, the fifth activation layer, the fifth 1×1 convolutional layer, and the seventh batch normalization layer. At the same time, the input features are directly transmitted through the second skip connection path. The output features of the second skip connection path are added element by element to the output features of the seventh batch normalization layer, and then the deep coding features of the tobacco leaf are output through the sixth activation layer.

[0021] Based on the deep coding features of the tobacco leaves, pixel-level segmentation is performed through subsequent model inference to output the foreground region of the tobacco leaves in the tobacco field environment image.
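The difference between the two residual blocks described above can be illustrated by tracing tensor shapes. The following Python sketch is illustrative only: the `Shape` class, the helper names, the channel counts (256 → 128 → 512), the 128×128 feature size, and the placement of the stride on the first 1×1 convolution are all assumptions that this application does not state. It shows why the Conv Block needs a 1×1 convolution on its skip path, while the Identity Block can pass its input through unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    channels: int
    height: int
    width: int


def conv(shape, out_channels, kernel, stride=1):
    """Output shape of a convolution padded with kernel // 2 pixels per side.

    Batch normalization and activation layers do not change the shape,
    so they are not modeled separately here.
    """
    pad = kernel // 2
    height = (shape.height + 2 * pad - kernel) // stride + 1
    width = (shape.width + 2 * pad - kernel) // stride + 1
    return Shape(out_channels, height, width)


def main_path(x, mid_channels, out_channels, stride):
    """1x1 conv -> 3x3 conv -> 1x1 conv, as in both block types."""
    y = conv(x, mid_channels, kernel=1, stride=stride)  # assumed stride position
    y = conv(y, mid_channels, kernel=3)
    return conv(y, out_channels, kernel=1)


def conv_block(x, mid_channels, out_channels, stride=2):
    """Skip path uses a 1x1 conv so its shape matches the main path."""
    main = main_path(x, mid_channels, out_channels, stride)
    skip = conv(x, out_channels, kernel=1, stride=stride)
    assert main == skip, "element-wise addition needs equal shapes"
    return main


def identity_block(x, mid_channels):
    """Skip path is the unchanged input, so the output shape equals the input."""
    main = main_path(x, mid_channels, x.channels, stride=1)
    assert main == x, "element-wise addition needs equal shapes"
    return main


features = Shape(channels=256, height=128, width=128)
enhanced = conv_block(features, mid_channels=128, out_channels=512)
deep = identity_block(enhanced, mid_channels=128)
```

In this sketch the Conv Block changes both the channel count and the spatial size, and the Identity Block leaves the shape untouched, which matches the roles described for the two skip connection paths.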

[0022] Optionally, the step of inputting the tobacco field environment image into a second tobacco leaf segmentation model that has been trained to a convergent state, performing pixel-level segmentation through model inference, and outputting the foreground region of the target tobacco leaf in the tobacco field environment image includes:

[0023] The deep coding features of tobacco leaves output by the encoder of the second tobacco leaf segmentation model are obtained and passed to the decoder for upsampling and decoding.

[0024] Before performing upsampling operations in each upsampling layer except the first upsampling layer, the current feature map is processed by the DC-ECA module. First, the local feature receptive field is enhanced by a convolutional layer with a kernel size of 3×3. Then, the key channel features related to the target tobacco leaf are learned and strengthened by the channel attention mechanism module. Finally, the attention-weighted features are refined and fused by a convolutional layer with a kernel size of 3×3 to obtain the optimized tobacco leaf decoding features.

[0025] Based on the optimized tobacco leaf decoding features, upsampling and pixel-level classification inference are performed to output the foreground region of tobacco leaves in the tobacco field environment image.
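The channel attention step in the middle of the DC-ECA module can be sketched as follows. This section does not spell out the internals of the channel attention mechanism module, so the sketch assumes the widely used efficient channel attention (ECA) pattern: global average pooling per channel, a small one-dimensional convolution across neighboring channels, a sigmoid gate, and channel-wise rescaling. The function name, the kernel size of 3, and the uniform weights (which would be learned in practice) are hypothetical, and the two surrounding 3×3 convolutions are omitted.

```python
import math


def eca_channel_attention(feature_map, kernel_size=3, weights=None):
    """Apply ECA-style channel attention to a C x H x W nested list.

    Returns the rescaled feature map and the per-channel gates.
    """
    if weights is None:
        # Placeholder weights; a trained model would learn these values.
        weights = [1.0 / kernel_size] * kernel_size
    channels = len(feature_map)
    half = kernel_size // 2

    # 1. Global average pooling: one descriptor per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

    # 2. 1D convolution across neighboring channels (zero padded).
    mixed = []
    for i in range(channels):
        acc = 0.0
        for k in range(kernel_size):
            j = i + k - half
            if 0 <= j < channels:
                acc += weights[k] * pooled[j]
        mixed.append(acc)

    # 3. Sigmoid turns each response into a gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-m)) for m in mixed]

    # 4. Rescale every channel by its gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Three 2x2 channels; only the middle channel carries a response.
example = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
scaled_example, gates_example = eca_channel_attention(example)
```

In the module described above, a step of this kind would sit between the two 3×3 convolutional layers, so that channels relevant to the target tobacco leaf are weighted more strongly before the features are refined and upsampled.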

[0026] Optionally, the steps for training the second tobacco leaf segmentation model include:

[0027] The second tobacco leaf segmentation model is trained using a two-stage compound learning rate adjustment strategy. During the warm-up phase of model training, the learning rate is increased nonlinearly, following a quadratic function.

[0028] During the annealing phase of model training, the cosine annealing algorithm is used to adaptively decay the learning rate, and the training is iteratively performed according to the two-stage learning rate update rule until the second tobacco leaf segmentation model reaches convergence.
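A minimal sketch of such a two-stage schedule is given below. The exact formulas, the function name `two_stage_lr`, and the values of `base_lr`, `min_lr`, and the step counts are assumptions for illustration; this section states only that the warm-up is quadratic and the decay uses cosine annealing.

```python
import math


def two_stage_lr(step, total_steps, warmup_steps, base_lr=1e-3, min_lr=1e-6):
    """Learning rate at a given training step (all defaults are hypothetical).

    Warm-up:   lr = base_lr * ((step + 1) / warmup_steps) ** 2
    Annealing: lr = min_lr + 0.5 * (base_lr - min_lr) * (1 + cos(pi * progress))
    """
    if not 0 < warmup_steps < total_steps:
        raise ValueError("require 0 < warmup_steps < total_steps")
    if step < warmup_steps:
        # Quadratic growth: small early updates, reaching base_lr at the end.
        return base_lr * ((step + 1) / warmup_steps) ** 2
    # Cosine annealing from base_lr down to min_lr.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


schedule = [two_stage_lr(s, total_steps=100, warmup_steps=10) for s in range(101)]
```

With these assumed settings the rate rises over the first 10 steps, reaches `base_lr` at the end of warm-up, and then decays smoothly to `min_lr` at step 100.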

[0029] Optionally, the basic network architecture of the first tobacco leaf segmentation model is the original UNet network, and the basic network architecture of the second tobacco leaf segmentation model is an improved UNet network.

[0030] Optionally, the tobacco leaf foreground region refers to an image region that is precisely separated at the pixel level from a complex tobacco field environment image, containing only the target tobacco leaf, and excluding soil, weeds, tobacco stalks, and other field background.

[0031] A field tobacco leaf segmentation device provided for another purpose of this application includes:

[0032] The tobacco field image acquisition module is configured to acquire images of the tobacco field environment containing the target tobacco leaves;

[0033] The segmentation model construction module is configured to update the encoder of the first tobacco leaf segmentation model to a ResNet50 residual structure. In the decoder of the first tobacco leaf segmentation model, except for the first upsampling layer, a DC-ECA module is embedded before the upsampling operation is performed in each of the other upsampling layers to construct the second tobacco leaf segmentation model. The ResNet50 residual structure includes an Identity Block and a Conv Block. The DC-ECA module includes a 3×3 convolutional layer, a channel attention mechanism module, and a 3×3 convolutional layer connected in sequence.

[0034] The field tobacco leaf segmentation module is configured to input the tobacco field environment image into a second tobacco leaf segmentation model that has been trained to a convergent state, perform pixel-level segmentation through model inference, and output the foreground region of the target tobacco leaf in the tobacco field environment image to complete the segmentation of the field tobacco leaves.

[0035] An electronic device provided for another purpose of this application includes a central processing unit and a memory, the central processing unit being configured to invoke and run a computer program stored in the memory to perform the steps of the field tobacco leaf segmentation method described in this application.

[0036] A computer-readable storage medium is provided for another purpose of this application, which stores, in the form of computer-readable instructions, a computer program implemented according to the field tobacco leaf segmentation method, which, when invoked by a computer, executes the steps included in the corresponding method.

[0037] Compared to existing technologies, this application addresses the problems of insufficient robustness in complex scenarios, limited fine-grained feature extraction capabilities, and difficulty in balancing accuracy and real-time performance in existing field tobacco leaf segmentation models. This application offers the following beneficial effects, including but not limited to:

[0038] Firstly, the improved UNet network in this application employs a ResNet50 residual structure to construct the encoder, enabling deep extraction of multi-scale texture and edge detail features of tobacco leaves and alleviating the gradient vanishing problem in deep networks. Simultaneously, a DC-ECA attention module is embedded in the decoder to effectively suppress redundant interference from the field background, enhance the expression of tobacco leaf target features, significantly improve the accuracy of tobacco leaf edge segmentation, reduce missegmentation and missed segmentation, and achieve pixel-level high-precision segmentation. The proposed method achieves a mean intersection over union (mIoU) of 99.02% for tobacco leaf segmentation in complex tobacco field environments, a 7.25% improvement over the original UNet network. Furthermore, the improved UNet network exhibits strong environmental robustness, significantly outperforming other semantic segmentation models in complex environments such as local occlusion and overexposure, and achieves an inference speed of 56.53 FPS.

[0039] Secondly, the improved UNet network in this application adopts a network structure that combines a ResNet50 residual encoder with a DC-ECA attention decoder, which significantly improves the robustness, fine-grained feature extraction capability, and segmentation accuracy of tobacco leaf segmentation in the field. At the same time, it takes the real-time performance of inference into account and can achieve accurate pixel-level segmentation of tobacco leaves under varied conditions such as occlusion, abnormal lighting, wrinkling, and damage in complex field environments. It can efficiently adapt to the actual operation requirements of intelligent tobacco harvesting equipment.

[0040] Thirdly, the improved UNet network in this application adopts a two-stage compound learning rate adjustment strategy. In the warm-up stage, the learning rate grows following a nonlinear quadratic function so that the parameters are updated smoothly, which suppresses gradient oscillations in the early stage of training. In the annealing stage, the learning rate is adaptively decayed by the cosine annealing algorithm, which helps the model escape local optima and converge to the global optimum, balancing training efficiency and model performance, shortening the training cycle, and improving the model segmentation accuracy.

[0041] Fourth, the improved UNet network in this application can be deployed in lightweight form on the Jetson Orin Nano embedded development board. The accompanying hardware system consists of a power supply, a depth camera, the Jetson Orin Nano embedded development board, and a display screen. This enables real-time acquisition, real-time inference, and visualization of segmentation results of fresh tobacco leaf images in the field. It can be directly adapted to intelligent tobacco harvesters, providing precise support for end-effector positioning and operation planning, and can be quickly transformed into practical operational capabilities.

Attached Figure Description

[0042] The above and / or additional aspects and advantages of this application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, wherein:

[0043] Figure 1 is an exemplary network architecture used in the field tobacco leaf segmentation method of this application;

[0044] Figure 2 is an exemplary architecture diagram of the field tobacco leaf segmentation hardware system in the embodiments of this application;

[0045] Figure 3 is an exemplary network architecture diagram of the improved UNet network in the embodiments of this application;

[0046] Figure 4 is an exemplary architecture diagram of the Identity Block and Conv Block in the embodiments of this application;

[0047] Figure 5 is a schematic diagram of the DC-ECA module in an embodiment of this application;

[0048] Figure 6 is a schematic diagram of the channel attention module in an embodiment of this application;

[0049] Figure 7 is a comparison diagram of the tobacco leaf segmentation effect under different working conditions in the embodiments of this application;

[0050] Figure 8 is a schematic diagram of the field tobacco leaf segmentation device in the embodiments of this application;

[0051] Figure 9 is a schematic diagram of the structure of the computer device in the embodiments of this application.

Detailed Implementation

[0052] The embodiments of this application are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain this application, and should not be construed as limiting this application.

[0053] Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms “a,” “an,” “said,” and “the” used herein may also include the plural forms. It should be further understood that the term “comprising” as used in this application means the presence of the stated features, integers, steps, operations, elements, and / or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and / or groups thereof. It should be understood that when an element is said to be “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, or there may be intermediate elements. Furthermore, “connected” or “coupled” as used herein can include wireless connections or wireless coupling. The term “and / or” as used herein includes all or any units and all combinations of one or more associated listed items.

[0054] Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains. It should also be understood that terms such as those defined in general dictionaries should be understood to have the same meaning as in the context of the prior art, and should not be interpreted in an idealized or overly formal sense unless specifically defined as herein.

[0055] Those skilled in the art will understand that the terms "client," "terminal," and "terminal device" as used herein include both devices that have only a wireless signal receiver without transmission capability, and devices that have receiving and transmitting hardware capable of bidirectional communication over a bidirectional communication link. Such devices may include: cellular or other communication devices, such as personal computers or tablets, with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service) devices that can combine voice, data processing, fax, and / or data communication capabilities; PDAs (Personal Digital Assistants) that may include radio frequency receivers, pagers, internet / intranet access, web browsers, notebooks, calendars, and / or GPS (Global Positioning System) receivers; and conventional laptops and / or handheld computers or other devices that have and / or include radio frequency receivers. As used herein, "client," "terminal," and "terminal device" can be portable, transportable, installed in a means of transportation (air, sea, and / or land), or suitable and / or configured to operate locally and / or in a distributed manner at any other location on Earth and / or in space. "Client," "terminal," and "terminal device" as used herein can also be a communication terminal, an internet access terminal, or a music / video playback terminal, such as a PDA, a MID (Mobile Internet Device), and / or a mobile phone with music / video playback capabilities, or a smart TV, set-top box, etc.

[0056] The hardware referred to by the names "server," "client," and "service node" in this application is essentially an electronic device with capabilities equivalent to those of a personal computer. It is a hardware device with the necessary components described by the von Neumann architecture, such as a central processing unit (including an arithmetic logic unit and a control unit), memory, input devices, and output devices. The computer program is stored in its memory, and the central processing unit loads the program from secondary storage into main memory to run it, executes the instructions in the program, and interacts with the input and output devices to complete specific functions.

[0057] It should be noted that the concept of "server" used in this application can also be extended to the case of server clusters. Based on the network deployment principles understood by those skilled in the art, the servers should be logically divided. Physically, these servers can be independent of each other but accessible through interfaces, or they can be integrated into a single physical computer or a computer cluster. Those skilled in the art should understand this flexibility and should not use it to constrain the implementation of the network deployment method in this application.

[0058] One or more of the technical features of this application, unless explicitly specified herein, can be deployed on a server and accessed by a client remotely calling the online service interface provided by the server, or can be directly deployed and run on a client for access.

[0059] Unless otherwise specified, the neural network models referenced or potentially referenced in this application may be deployed on a remote server and invoked remotely on the client, or deployed on a client with the capability to invoke directly. In some embodiments, when running on the client, the corresponding intelligence may be acquired through transfer learning in order to reduce the requirements on the client's hardware resources and avoid excessive consumption of the client's hardware resources.

[0060] Unless otherwise specified, all data involved in this application may be stored remotely on a server or on a local terminal device, as long as it is suitable for use by the technical solution of this application.

[0061] Those skilled in the art will understand that although the various methods in this application are described based on the same concept and thus present commonality among them, they can be performed independently unless otherwise specified. Similarly, the various embodiments disclosed in this application are all based on the same inventive concept; therefore, concepts expressed in the same way, as well as concepts that are appropriately changed for convenience but are expressed differently, should be understood equivalently.

[0062] Unless otherwise expressly stated, the various embodiments disclosed in this application can be combined in a cross-cutting manner to flexibly construct new embodiments, as long as such combination does not depart from the inventive spirit of this application and can meet the needs of the prior art or solve a certain deficiency in the prior art. Those skilled in the art should be aware of such modifications.

[0063] Referring to Figure 1, in one embodiment of the field tobacco leaf segmentation method of this application, the method includes:

[0064] Step S10: Obtain an image of the tobacco field environment containing the target tobacco leaves;

[0065] Referring to Figure 2, the depth camera in the field tobacco leaf segmentation hardware system acquires images of the tobacco field environment containing the target tobacco leaves. The hardware system mainly consists of a power supply, a depth camera, a Jetson Orin Nano development board, and a display screen. Its workflow is: the depth camera acquires tobacco field environment images in real time → the images are transmitted to the Jetson Orin Nano development board → the development board runs the second tobacco leaf segmentation model of this application to complete pixel-level inference → the field background is removed and the foreground region of the target tobacco leaf is extracted → the segmentation results are transmitted to the display screen in real time for output, providing an accurate positioning basis for the intelligent harvester.
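The step of removing the field background and keeping only the tobacco leaf foreground can be sketched in plain Python as follows. The function names are hypothetical, and `green_threshold_stub` is only a stand-in used to make the example runnable; it is not the segmentation model of this application, which would supply the mask in a real system.

```python
def extract_foreground(image, mask, background=(0, 0, 0)):
    """Keep pixels where mask is 1 and replace all others with `background`.

    image: H x W nested list of (r, g, b) tuples
    mask:  H x W nested list of 0/1 values from the segmentation model
    """
    if len(image) != len(mask) or any(
        len(img_row) != len(mask_row) for img_row, mask_row in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same height and width")
    return [
        [pixel if keep else background for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def process_frame(frame, segment):
    """One pass of the workflow: infer a mask, then remove the background."""
    mask = segment(frame)
    return extract_foreground(frame, mask)


def green_threshold_stub(frame):
    """Stand-in segmenter (assumption): marks green-dominant pixels as leaf."""
    return [[1 if g > r and g > b else 0 for (r, g, b) in row] for row in frame]


frame = [
    [(10, 200, 10), (120, 90, 60)],
    [(90, 80, 70), (20, 180, 30)],
]
result = process_frame(frame, green_threshold_stub)
```

In a deployed system the `segment` argument would be replaced by inference with the trained model, and the result would be sent to the display.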

[0066] In some embodiments, original environmental images of tobacco fields containing target fresh tobacco leaves are collected at real, complex tobacco field operation sites. To ensure the model's adaptability to real-world complex field conditions, the acquisition process actively covers typical challenging disturbance elements in tobacco fields, including:

[0067] (1) Local occlusion caused by tobacco leaves and branches covering each other and by overlapping leaves;

[0068] (2) Dynamic changes in outdoor field lighting, including various lighting scenarios such as uneven lighting, strong light exposure, and low light on cloudy days;

[0069] (3) Irregular leaf morphology such as wrinkling and curling of tobacco leaves caused by pests, diseases and drought;

[0070] (4) Structural damage to tobacco leaves caused by natural field damage, pests and diseases, and mechanical abrasion.

[0071] The collected image samples faithfully reproduce the real, complex background of the tobacco planting area and fully preserve original field environment features such as soil, weeds, stems, and shadows. Because the samples cover a highly diverse range of environments, they improve, at the data level, the segmentation robustness of subsequent models in real field operation scenarios.

[0072] In a further embodiment, the Labelme annotation tool is used to perform precise pixel-level semantic annotation of the target tobacco leaf regions in each image, generating a standardized JSON label file and a tobacco leaf segmentation mask, so that the tobacco leaf foreground is distinguished from the field background at the pixel level.
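As an illustration of how a Labelme JSON file can be turned into a binary mask, the following sketch rasterizes polygon annotations with a simple point-in-polygon test on pixel centers. It relies on the `shapes`, `label`, `points`, `shape_type`, `imageHeight`, and `imageWidth` fields of Labelme's JSON output; the label name `tobacco_leaf` and the function names are hypothetical, and this is not necessarily how the masks in this application were produced.

```python
import json


def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def labelme_to_mask(json_text, label="tobacco_leaf"):
    """Return an H x W mask (1 = leaf, 0 = background) from Labelme JSON text."""
    annotation = json.loads(json_text)
    height = annotation["imageHeight"]
    width = annotation["imageWidth"]
    mask = [[0] * width for _ in range(height)]
    for shape in annotation["shapes"]:
        if shape["label"] != label:
            continue
        if shape.get("shape_type", "polygon") != "polygon":
            continue  # other shape types are ignored in this sketch
        polygon = shape["points"]
        for row in range(height):
            for col in range(width):
                # Test the pixel center.
                if point_in_polygon(col + 0.5, row + 0.5, polygon):
                    mask[row][col] = 1
    return mask


example_json = json.dumps({
    "imageHeight": 4,
    "imageWidth": 4,
    "shapes": [
        {"label": "tobacco_leaf", "shape_type": "polygon",
         "points": [[1, 1], [3, 1], [3, 3], [1, 3]]},
        {"label": "weed", "shape_type": "polygon",
         "points": [[0, 0], [1, 0], [1, 1], [0, 1]]},
    ],
})
example_mask = labelme_to_mask(example_json)
```

The per-pixel loop is slow for full-size images; an imaging library would normally be used for rasterization, but the logic is the same.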

[0073] To further enhance the model's generalized segmentation capabilities across different lighting conditions, land parcels, and devices, a multi-dimensional data augmentation strategy is constructed to enrich the data content of the collected samples. This strategy includes various augmentation operations, such as:

[0074] (1) Brightness adjustment: The adjustment coefficient range is β∈[0.7,1.3];

[0075] (2) Contrast adjustment: Gain coefficient γ∈[0.7,1.3];

[0076] (3) Image flipping: Randomly flip the image horizontally or vertically;

[0077] (4) Noise injection: Salt-and-pepper noise injection (noise density η=0.02);

(5) Image rotation: Random rotation by one of several angles: 45°, 90°, 135°, or 180°.
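The augmentation operations listed above might be sketched as follows for a single-channel 8-bit image stored as nested lists. Several details are assumptions rather than statements from the application: brightness is treated as a multiplicative factor β, contrast as a gain γ around the image mean, and every helper name is hypothetical. Rotation by 45° or 135° needs interpolation and is left out of this sketch. Geometric operations are applied to the mask as well so that labels stay aligned with pixels.

```python
import random

def clip(value):
    """Clamp to the 8-bit range and round to an integer."""
    return max(0, min(255, int(round(value))))

def adjust_brightness(img, beta):
    """Scale every pixel by beta (assumed multiplicative brightness model)."""
    return [[clip(beta * p) for p in row] for row in img]

def adjust_contrast(img, gamma):
    """Stretch or compress pixel values around the image mean by gain gamma."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [[clip(mean + gamma * (p - mean)) for p in row] for row in img]

def flip(img, horizontal=True):
    """Mirror left-right if horizontal, otherwise top-bottom."""
    return [row[::-1] for row in img] if horizontal else img[::-1]

def salt_and_pepper(img, density, rng):
    """Set roughly `density` of the pixels to pure white or pure black."""
    out = [row[:] for row in img]
    for r in range(len(out)):
        for c in range(len(out[0])):
            if rng.random() < density:
                out[r][c] = 255 if rng.random() < 0.5 else 0
    return out

def augment(img, mask, rng):
    """Apply one random combination of the operations to an image/mask pair."""
    img = adjust_brightness(img, rng.uniform(0.7, 1.3))  # beta in [0.7, 1.3]
    img = adjust_contrast(img, rng.uniform(0.7, 1.3))    # gamma in [0.7, 1.3]
    if rng.random() < 0.5:
        horizontal = rng.random() < 0.5
        img, mask = flip(img, horizontal), flip(mask, horizontal)
    img = salt_and_pepper(img, 0.02, rng)                # eta = 0.02
    return img, mask
```

Photometric changes (brightness, contrast, noise) touch only the image, which is why `augment` passes the mask through `flip` alone.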

[0078] Finally, while avoiding geometric distortion of the images, all tobacco field environment images were uniformly scaled to a size of 512×512 pixels to complete the model input format alignment, providing high-quality standardized input data for subsequent model training and accurate pixel segmentation.
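The application does not state how geometric distortion is avoided during scaling. One common option, shown here purely as an assumption, is letterboxing: scale the image uniformly so that its longer side becomes 512 pixels, then pad the shorter side. The helper name `letterbox_params` is hypothetical.

```python
def letterbox_params(width, height, target=512):
    """Return the uniform scale, resized size, and padding for a square target."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x, pad_y = target - new_w, target - new_h
    left, top = pad_x // 2, pad_y // 2
    return {
        "scale": scale,
        "resized": (new_w, new_h),
        # Padding as (left, top, right, bottom) in pixels.
        "padding": (left, top, pad_x - left, pad_y - top),
    }

params = letterbox_params(1920, 1080)
```

For a 1920×1080 frame this gives a 512×288 resized image with 112 pixels of padding above and below. The annotation mask would need the same scale and padding.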

[0079] Step S20: Update the encoder of the first tobacco leaf segmentation model to a ResNet50 residual structure. In the decoder of the first tobacco leaf segmentation model, embed a DC-ECA module before the upsampling operation of each upsampling layer other than the first upsampling layer, to construct the second tobacco leaf segmentation model. The ResNet50 residual structure includes an Identity Block and a Conv Block. The DC-ECA module includes a 3×3 convolutional layer, a channel attention mechanism module, and a 3×3 convolutional layer connected in sequence.

[0080] Referring to Figure 3, after an image of the tobacco field environment containing the target tobacco leaves is acquired, the encoder of the first tobacco leaf segmentation model is updated to a ResNet50 residual structure. In the decoder of the first tobacco leaf segmentation model, a DC-ECA module is embedded before the upsampling operation of each upsampling layer other than the first upsampling layer, to construct the second tobacco leaf segmentation model. The ResNet50 residual structure includes an Identity Block and a Conv Block. The DC-ECA module includes a 3×3 convolutional layer, a channel attention mechanism module, and a 3×3 convolutional layer connected in sequence. The basic network architecture of the first tobacco leaf segmentation model is the original UNet network, and the basic network architecture of the second tobacco leaf segmentation model is an improved UNet network.

[0081] Referring to Figure 4, the second tobacco leaf segmentation model in this application uses a ResNet50 residual structure to construct the encoder, transforming the learning objective into a residual mapping, which significantly reduces the training difficulty of deep networks. The stacking strategy of the residual modules fully exploits the multi-scale texture features of tobacco leaf images through a multi-level feature reuse mechanism, enhancing the model's ability to deeply represent details such as leaf texture and edges, thereby constructing an encoder that meets the requirements of high-precision tobacco leaf segmentation.

[0082] To address the complex textures and fine leaf edges in tobacco field environmental images, this embodiment decomposes the residual structure into two basic core modules for customized construction: Conv Block (CB) and Identity Block (IB). The Conv Block comprises a first 1×1 convolutional layer, a first batch normalization layer, a first activation layer, a first 3×3 convolutional layer, a second batch normalization layer, a second activation layer, a second 1×1 convolutional layer, and a third batch normalization layer, all connected in series. The Conv Block also includes a first skip connection path, which comprises a third 1×1 convolutional layer and a fourth batch normalization layer connected in series. The output of the first skip connection path is added element-wise to the output of the third batch normalization layer, and the sum is passed through the third activation layer to produce the block output.

[0083] Specifically, the Conv Block, as the core unit of feature encoding, has an internal data flow that includes main path feature transformation and dimensionality adaptation of the first skip connection path. The input features are sequentially passed through the first 1×1 convolutional layer for channel dimensionality reduction, and then through the first batch normalization layer and the first activation layer to complete feature normalization and non-linear activation. Next, the local spatial texture features are extracted through the first 3×3 convolutional layer, and then the feature distribution is further optimized through the second batch normalization layer and the second activation layer. Finally, the channel dimension is increased through the second 1×1 convolutional layer, and the deep features of the main path are output through the third batch normalization layer.

[0084] Furthermore, to match the channel dimension changes of the main path, the input features are directly transformed through the third 1×1 convolutional layer, and then the dimension is adapted through the fourth batch normalization layer to obtain the first skip connection features.

[0085] Furthermore, the first skip connection features are added element-wise to the features output by the third batch normalization layer of the main path, and then the nonlinear mapping is completed through the third activation layer, finally outputting the tobacco leaf enhancement features after Conv Block processing.

[0086] In a further embodiment, the Identity Block comprises a fourth 1×1 convolutional layer, a fifth batch normalization layer, a fourth activation layer, a second 3×3 convolutional layer, a sixth batch normalization layer, a fifth activation layer, a fifth 1×1 convolutional layer, and a seventh batch normalization layer, all connected in series. The Identity Block also includes a second skip connection path, the output of which is added element-wise to the output of the seventh batch normalization layer; the sum is passed through the sixth activation layer to produce the block output.

[0087] Specifically, the Identity Block inherits the tobacco leaf enhancement features output by the Conv Block and further deepens the multi-scale feature fusion. Its internal data flow includes: the input tobacco leaf enhancement features are sequentially passed through the fourth 1×1 convolutional layer for channel dimensionality reduction, and then normalized and activated by the fifth batch normalization layer and the fourth activation layer; then, the fine leaf edge features are captured by the second 3×3 convolutional layer, and optimized by the sixth batch normalization layer and the fifth activation layer; finally, the channel dimension is restored by the fifth 1×1 convolutional layer, and the main path fusion features are output by the seventh batch normalization layer.

[0088] Furthermore, to preserve the integrity of the original shallow input features, the input features are passed directly through the second skip connection path without transformation to obtain the second skip connection features.

[0089] Furthermore, the second skip connection feature is added element-wise to the feature output by the main path through the seventh batch normalization layer, and then the deep coding feature of tobacco leaf with multi-scale information is output through the sixth activation layer and sent to the decoder for subsequent processing.

[0090] The aforementioned Conv Block and Identity Block achieve cross-layer fusion of shallow detailed features (such as leaf texture and edges) and deep semantic features (such as the overall morphology of tobacco leaves and diseased areas) through a skip connection mechanism. This skip connection mechanism not only effectively alleviates the gradient vanishing problem in deep network training, but also fully explores the multi-scale features of tobacco leaves in the tobacco field environment through a multi-level feature reuse mechanism, laying a high-quality feature foundation for subsequent pixel-level accurate segmentation.
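The data flow of the two blocks can be illustrated with a deliberately small, framework-free sketch. It is not the applicant's implementation: the functions `conv2d`, `batch_norm`, `conv_block`, and `identity_block` are hypothetical helpers, the activation is assumed to be ReLU, strides are fixed at 1, weights are random, and batch normalization is simplified to per-channel normalization over the spatial positions of a single image with no learned scale or shift. The point is only to show that the Conv Block needs a 1×1 projection on its skip path because the channel count changes, while the Identity Block can add its input directly.

```python
import random

def conv2d(x, w):
    """x: [C_in][H][W]; w: [C_out][C_in][k][k]; stride 1, zero 'same' padding."""
    h, wd = len(x[0]), len(x[0][0])
    k = len(w[0][0])
    pad = k // 2
    out = []
    for filt in w:
        plane = [[0.0] * wd for _ in range(h)]
        for i in range(h):
            for j in range(wd):
                for c, kernel in enumerate(filt):
                    for di in range(k):
                        for dj in range(k):
                            ii, jj = i + di - pad, j + dj - pad
                            if 0 <= ii < h and 0 <= jj < wd:
                                plane[i][j] += kernel[di][dj] * x[c][ii][jj]
        out.append(plane)
    return out

def batch_norm(x, eps=1e-5):
    """Simplified: normalize each channel over its spatial positions."""
    out = []
    for plane in x:
        vals = [v for row in plane for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        out.append([[(v - mean) / (var + eps) ** 0.5 for v in row] for row in plane])
    return out

def relu(x):
    return [[[max(0.0, v) for v in row] for row in plane] for plane in x]

def add(a, b):
    """Element-wise addition of two feature maps with identical shapes."""
    return [[[u + v for u, v in zip(ra, rb)] for ra, rb in zip(pa, pb)]
            for pa, pb in zip(a, b)]

def main_path(x, w_reduce, w_spatial, w_expand):
    y = relu(batch_norm(conv2d(x, w_reduce)))   # 1x1 conv: channel reduction
    y = relu(batch_norm(conv2d(y, w_spatial)))  # 3x3 conv: local texture
    return batch_norm(conv2d(y, w_expand))      # 1x1 conv: channel expansion

def conv_block(x, w_reduce, w_spatial, w_expand, w_skip):
    skip = batch_norm(conv2d(x, w_skip))        # projection matches channels
    return relu(add(main_path(x, w_reduce, w_spatial, w_expand), skip))

def identity_block(x, w_reduce, w_spatial, w_expand):
    return relu(add(main_path(x, w_reduce, w_spatial, w_expand), x))

def rand_w(rng, c_out, c_in, k):
    return [[[[rng.uniform(-1, 1) for _ in range(k)] for _ in range(k)]
             for _ in range(c_in)] for _ in range(c_out)]

rng = random.Random(0)
x = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(2)]
# Conv Block: 2 input channels -> 4 output channels.
cb = conv_block(x, rand_w(rng, 2, 2, 1), rand_w(rng, 2, 2, 3),
                rand_w(rng, 4, 2, 1), rand_w(rng, 4, 2, 1))
# Identity Block: 4 channels in, 4 channels out, input added unchanged.
ib = identity_block(cb, rand_w(rng, 2, 4, 1), rand_w(rng, 2, 2, 3),
                    rand_w(rng, 4, 2, 1))
```

In a real encoder these blocks would be built from a deep learning framework's convolution and normalization layers; the nested loops here only make the order of operations explicit.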

[0091] In some embodiments, in the decoder of the first tobacco leaf segmentation model, except for the first upsampling layer, a DC-ECA module is embedded before the upsampling operation is performed on each of the other upsampling layers. The DC-ECA module includes a convolutional layer with a kernel size of 3×3, a channel attention mechanism (ECA) module, and a convolutional layer with a kernel size of 3×3 connected in sequence.

[0092] Specifically, referring to Figure 5 and Figure 6, the original structure of the first upsampling layer of the decoder of the first tobacco leaf segmentation model is retained unchanged. For every upsampling layer other than the first, a DC-ECA module is embedded ahead of the layer to pre-optimize the current features before the upsampling operation is performed.

[0093] The DC-ECA module adopts a dedicated cascaded "convolution-attention-convolution" structure. Internally, it consists of a 3×3 convolutional layer, a channel attention mechanism (ECA) module, and another 3×3 convolutional layer connected in series. Its data processing flow is as follows:

[0094] Step S201: The decoded features to be processed before upsampling first enter the first 3×3 convolutional layer to complete the local receptive field expansion and enhance the ability to extract local contextual features such as tobacco leaf edges and texture details.

[0095] Step S202: The features are then passed to the intermediate channel attention mechanism module, which recalibrates the weights of all feature channels: it strengthens the key channel features that are strongly correlated with the target tobacco leaf foreground and suppresses redundant interference from channels dominated by complex field background such as soil, weeds, and shadows.

[0096] Step S203: The calibrated features are then passed through a second 3×3 convolutional layer to complete the deep fusion and refined nonlinear mapping of the attention-weighted features, resulting in optimized tobacco leaf decoding features.

[0097] Step S204: The features enhanced and optimized by the DC-ECA module are then sent to the corresponding upsampling layer to perform upsampling decoding.
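Steps S201 to S203 can be sketched in plain Python as below. This is an illustration under stated assumptions, not the applicant's code: the channel attention follows the usual ECA recipe (global average pooling, a 1D convolution across neighbouring channels, a sigmoid, then channel rescaling), the 1D kernel size of 3 is an assumed value, no normalization or activation is placed after the 3×3 convolutions because the application does not specify any, and all weights are random. The names `conv3x3`, `eca`, and `dc_eca` are hypothetical.

```python
import math
import random

def conv3x3(x, w):
    """x: [C_in][H][W]; w: [C_out][C_in][3][3]; stride 1, zero padding 1."""
    h, wd = len(x[0]), len(x[0][0])
    out = []
    for filt in w:
        plane = [[0.0] * wd for _ in range(h)]
        for i in range(h):
            for j in range(wd):
                for c, kernel in enumerate(filt):
                    for di in range(3):
                        for dj in range(3):
                            ii, jj = i + di - 1, j + dj - 1
                            if 0 <= ii < h and 0 <= jj < wd:
                                plane[i][j] += kernel[di][dj] * x[c][ii][jj]
        out.append(plane)
    return out

def eca(x, kernel):
    """Channel attention: pool -> 1D conv over channels -> sigmoid -> rescale."""
    channels = len(x)
    pooled = [sum(v for row in plane for v in row) / (len(plane) * len(plane[0]))
              for plane in x]
    k, pad = len(kernel), len(kernel) // 2
    weights = []
    for i in range(channels):
        s = sum(kernel[d] * pooled[i + d - pad]
                for d in range(k) if 0 <= i + d - pad < channels)
        weights.append(1.0 / (1.0 + math.exp(-s)))
    scaled = [[[v * wt for v in row] for row in plane]
              for plane, wt in zip(x, weights)]
    return scaled, weights

def dc_eca(x, w_first, eca_kernel, w_second):
    y = conv3x3(x, w_first)               # S201: first 3x3 convolution
    y, weights = eca(y, eca_kernel)       # S202: channel recalibration
    return conv3x3(y, w_second), weights  # S203: second 3x3 convolution

rng = random.Random(1)
def rand_w(c_out, c_in):
    return [[[[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]
             for _ in range(c_in)] for _ in range(c_out)]

features = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(3)]
eca_kernel = [rng.uniform(-0.5, 0.5) for _ in range(3)]
refined, channel_weights = dc_eca(features, rand_w(3, 3), eca_kernel, rand_w(3, 3))
```

The output `refined` has the same shape as the input and would be handed to the upsampling layer in step S204; `channel_weights` holds one value between 0 and 1 per channel.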

[0098] Embedding a DC-ECA module before the upsampling operation of every upsampling layer except the first avoids the loss of low-level global detail that would result from introducing attention too early, in the first upsampling layer. The DC-ECA modules in the subsequent layers then continuously filter and retain effective tobacco leaf features during the decoding stage and attenuate noise from the complex field environment. The combination of two 3×3 convolutions with channel attention significantly improves the model's robustness when segmenting scenes with occlusion, uneven lighting, and damaged tobacco leaves, ultimately ensuring accurate pixel-level segmentation of the tobacco leaf foreground region.
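The placement rule can be written down as a small configuration sketch. The number of upsampling layers (four, as in a standard UNet decoder) and the stage names are assumptions made for illustration.

```python
def build_decoder_plan(num_upsampling_layers=4):
    """List the operations of each decoder stage, in execution order."""
    plan = []
    for index in range(1, num_upsampling_layers + 1):
        operations = []
        if index > 1:
            # Every stage except the first refines its features before upsampling.
            operations.append("DC-ECA")
        operations.append("upsample")
        plan.append((f"up{index}", operations))
    return plan

decoder_plan = build_decoder_plan()
```

With four stages, only `up1` consists of a bare upsampling operation; `up2` to `up4` each run DC-ECA first.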

[0099] In some embodiments, the step of training the second tobacco leaf segmentation model includes:

[0100] Step S2001: The second tobacco leaf segmentation model is trained using a two-stage compound learning rate adjustment strategy. During the warm-up phase of model training, the learning rate is dynamically increased in a non-linear quadratic function manner.

[0101] Step S2002: In the annealing stage of model training, the cosine annealing algorithm is used to adaptively decay the learning rate, and the training is iteratively performed according to the two-stage learning rate update rule until the second tobacco leaf segmentation model reaches the convergence state.

[0102] Specifically, the second tobacco leaf segmentation model of this application employs a two-stage composite learning rate adjustment strategy for training. Dynamically adjusting the learning rate ensures that the model optimization process smoothly converges to the global optimum. The two-stage collaborative mechanism optimizes the training process, including:

[0103] The warm-up phase ensures the stability of initial parameter updates and effectively suppresses oscillations caused by gradient mutations. The annealing phase achieves adaptive decay of the learning rate through dynamic learning rate adjustment, which helps the model escape local optima and converge to the global optimum. It also maintains the minimum learning rate at the end of training to help the model converge stably, thereby improving the training efficiency and segmentation accuracy of the tobacco leaf segmentation model.

[0104] During the warm-up phase of model training, the learning rate is dynamically increased using a quadratic function with non-linear growth. The formula for calculating the non-linearly increasing learning rate is as follows:

[0105] $$\eta_t^{\mathrm{w}} = \eta_{\mathrm{init}} + \left(\eta_{\max} - \eta_{\mathrm{init}}\right)\left(\frac{t}{T_{\mathrm{w}}}\right)^{2}$$

[0106] where $\eta_t^{\mathrm{w}}$ is the learning rate at batch $t$ of the warm-up phase; $\eta_{\mathrm{init}}$ is the initial learning rate of the warm-up phase; $\eta_{\max}$ is the maximum learning rate of the warm-up phase; $T_{\mathrm{w}}$ is the total number of batches in the warm-up phase; and $t$ is the number of batches completed in the warm-up phase. In this application, $T_{\mathrm{w}}$ may be 3.

[0107] The learning rate is dynamically increased by using a quadratic function with non-linear growth. This achieves a smooth transition of the learning rate compared to the traditional linear growth method. While ensuring the stability of the gradient magnitude in the early stage of training, it can accelerate the warm-up process in the later stage and effectively alleviate the oscillation phenomenon in the parameter update process.
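A minimal sketch of a quadratic warm-up follows, assuming the common form in which the learning rate rises from an initial value to a maximum in proportion to the square of the completed fraction of the warm-up phase. The numeric values of `lr_init` and `lr_max` are placeholders, since the application gives none; only the warm-up length of 3 comes from the text.

```python
def warmup_lr(t, total=3, lr_init=1e-5, lr_max=1e-3):
    """Quadratic warm-up: lr_init at t = 0, lr_max at t = total."""
    fraction = t / total
    return lr_init + (lr_max - lr_init) * fraction ** 2

def linear_lr(t, total=3, lr_init=1e-5, lr_max=1e-3):
    """Linear warm-up, shown only for comparison."""
    return lr_init + (lr_max - lr_init) * t / total

warmup_values = [warmup_lr(t) for t in range(4)]
```

Partway through warm-up the quadratic curve sits below the linear one, so early updates are smaller, and it then climbs more steeply toward the maximum, which matches the behaviour described above.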

[0108] Further, after the warm-up phase, the learning rate is dynamically adjusted using the cosine annealing algorithm. The cosine annealing algorithm adaptively decays the learning rate, and its calculation formula is expressed as:

[0109] $$\eta_t^{\mathrm{a}} = \eta_{\min} + \frac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\frac{\pi t}{T_{\mathrm{a}}}\right)$$

[0110] where $\eta_t^{\mathrm{a}}$ is the learning rate at batch $t$ of the annealing stage; $\eta_{\max}$ and $\eta_{\min}$ are the maximum and minimum learning rates of the annealing stage; $T_{\mathrm{a}}$ is the total number of batches in the annealing stage; and $t$ is the number of batches completed in the annealing stage. In this application, $T_{\mathrm{a}}$ may be 92.

[0111] The cosine annealing algorithm is used to adaptively decay the learning rate. In the early stage of training, the learning rate decays rapidly to approach the local optimum of the model. Then, by slowing down the decay rate, the parameter space is refined for search. This helps the model get rid of local extremum constraints and converge to the global optimum.
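The two stages can be combined into a single schedule, sketched below under assumptions: the annealing stage uses the standard cosine annealing expression, the warm-up uses a quadratic ramp, and the learning rate values are placeholders. The stage lengths of 3 and 92 are the example values given in this application.

```python
import math

def cosine_lr(t, total=92, lr_max=1e-3, lr_min=1e-5):
    """Standard cosine annealing from lr_max (t = 0) to lr_min (t = total)."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / total))

def two_stage_lr(step, warmup_total=3, anneal_total=92,
                 lr_init=1e-5, lr_max=1e-3, lr_min=1e-5):
    """Quadratic warm-up for the first warmup_total steps, cosine decay afterwards."""
    if step < warmup_total:
        return lr_init + (lr_max - lr_init) * (step / warmup_total) ** 2
    # Hold the minimum learning rate once the annealing stage is complete.
    t = min(step - warmup_total, anneal_total)
    return cosine_lr(t, anneal_total, lr_max, lr_min)

schedule = [two_stage_lr(step) for step in range(3 + 92 + 1)]
```

The schedule peaks at the hand-over between the stages (step 3) and ends at the minimum learning rate, which is then held for any further steps.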

[0112] Once the second tobacco leaf segmentation model has been trained to convergence, it can be put into production use, enabling it to segment the foreground region of the target tobacco leaf from the tobacco field environment image.

[0113] Step S30: Input the tobacco field environment image into the second tobacco leaf segmentation model that has been trained to convergence, perform pixel-level segmentation through model inference, and output the foreground region of the target tobacco leaf in the tobacco field environment image to complete the segmentation of tobacco leaves in the field.

[0114] Referring to Figure 7, the encoder of the first tobacco leaf segmentation model is updated to a ResNet50 residual structure, and in the decoder of the first tobacco leaf segmentation model a DC-ECA module is embedded before the upsampling operation of each upsampling layer other than the first. After the second tobacco leaf segmentation model has been constructed in this way, the tobacco field environment image is input into the second tobacco leaf segmentation model, which has been trained to convergence. Pixel-level segmentation is performed through model inference to output the foreground region of the target tobacco leaf in the tobacco field environment image, thus completing the segmentation of the tobacco leaves in the field. The foreground region of the tobacco leaf is the image region separated at pixel level from the complex tobacco field environment image, containing only the target tobacco leaf and excluding soil, weeds, tobacco stalks, and other field background. The target tobacco leaf includes partially occluded tobacco leaves, tobacco leaves exposed to strong light, wrinkled tobacco leaves, and structurally damaged tobacco leaves.

[0115] In some embodiments, the step of inputting the tobacco field environment image into a second tobacco leaf segmentation model that has been trained to a convergent state, performing pixel-level segmentation through model inference, and outputting the foreground region of the target tobacco leaf in the tobacco field environment image includes:

[0116] Step S301: Input the tobacco field environment image into the second tobacco leaf segmentation model that has been trained to convergence. When the image features in the tobacco field environment image are processed by the Conv Block, they are sequentially transformed through the first 1×1 convolutional layer, the first batch normalization layer, the first activation layer, the first 3×3 convolutional layer, the second batch normalization layer, the second activation layer, the second 1×1 convolutional layer, and the third batch normalization layer. At the same time, the input features are adjusted in dimension and channel by the third 1×1 convolutional layer and the fourth batch normalization layer in the first skip connection path. The output features of the first skip connection path and the output features of the third batch normalization layer are added element by element, and then the tobacco leaf enhancement features are output through the third activation layer.

[0117] Step S302: The enhanced tobacco leaf features are input into the Identity Block for processing. The main path features are extracted sequentially through the fourth 1×1 convolutional layer, the fifth batch normalization layer, the fourth activation layer, the second 3×3 convolutional layer, the sixth batch normalization layer, the fifth activation layer, the fifth 1×1 convolutional layer, and the seventh batch normalization layer. At the same time, the input features are directly transmitted through the second skip connection path. The output features of the second skip connection path are added element by element to the output features of the seventh batch normalization layer, and then the deep coding features of the tobacco leaf are output through the sixth activation layer.

[0118] Step S303: Based on the deep coding features of the tobacco leaves, pixel-level segmentation is performed through subsequent model inference to output the foreground region of the tobacco leaves in the tobacco field environment image.

[0119] As can be seen from steps S301 to S303 above, this application extracts and fuses features of tobacco field environment images through the residual structure of the Conv Block and the Identity Block. This fully exploits the multi-scale detailed features of tobacco leaves, alleviates the gradient vanishing problem of deep networks, and retains leaf edge information through skip connections. The resulting deep coding features of tobacco leaves have stronger representation capability, which effectively improves the accuracy, completeness, and robustness of pixel-level segmentation of tobacco leaves in complex field environments, so that the foreground region of tobacco leaves is output accurately.

[0120] In a further embodiment, the step of inputting the tobacco field environment image into a second tobacco leaf segmentation model that has been trained to a convergent state, performing pixel-level segmentation through model inference, and outputting the foreground region of the target tobacco leaf in the tobacco field environment image includes:

[0121] Step S3001: Obtain the deep coding features of tobacco leaves output by the encoder of the second tobacco leaf segmentation model, and enter the decoder for upsampling decoding processing;

[0122] Step S3002: Before performing upsampling operations on each upsampling layer except the first upsampling layer, the current feature map is processed by the DC-ECA module. First, the local feature receptive field is enhanced by a convolutional layer with a kernel size of 3×3. Then, the key channel features related to the target tobacco leaf are learned and strengthened by the channel attention mechanism module. Subsequently, the attention-weighted features are refined and fused by a convolutional layer with a kernel size of 3×3 to obtain the optimized tobacco leaf decoding features.

[0123] Step S3003: Based on the optimized tobacco leaf decoding features, perform upsampling and pixel-level classification inference to output the foreground region of tobacco leaves in the tobacco field environment image.
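The final pixel-level classification can be sketched as a per-pixel comparison of two class score maps. This assumes a two-class output (background and tobacco leaf), which the application implies but does not spell out; the function names are hypothetical.

```python
def scores_to_mask(background_scores, leaf_scores):
    """Label a pixel 1 (leaf) when its leaf score exceeds its background score."""
    return [[1 if leaf > background else 0
             for background, leaf in zip(bg_row, leaf_row)]
            for bg_row, leaf_row in zip(background_scores, leaf_scores)]

def extract_foreground(image, mask, fill=0):
    """Keep pixels inside the mask and replace everything else with `fill`."""
    return [[pixel if keep else fill for pixel, keep in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]

background = [[2.0, 0.1], [1.5, -0.3]]
leaf = [[0.5, 1.2], [1.6, -0.9]]
image = [[10, 20], [30, 40]]
mask = scores_to_mask(background, leaf)
foreground = extract_foreground(image, mask)
```

The mask marks the leaf pixels, and `foreground` keeps the original image values only at those positions.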

[0124] As can be seen from steps S3001 to S3003 above, this application embeds a DC-ECA module before the upsampling operations of the decoder so that tobacco leaf features undergo local receptive field enhancement, channel attention weighting, and refined feature fusion. This effectively suppresses field background interference, enhances tobacco leaf edge and texture details, and improves the quality of the decoded features, thereby achieving high-precision and robust pixel-level segmentation of tobacco leaves in complex tobacco field environments.

[0125] As can be seen from the above embodiments, compared with the prior art, this application addresses the problems of insufficient robustness in complex scenarios, limited fine-grained feature extraction capability, and difficulty in balancing accuracy and real-time performance in existing field tobacco leaf segmentation models. This application has, but is not limited to, the following beneficial effects:

[0126] First, the improved UNet network in this application employs a ResNet50 residual structure to construct the encoder, enabling deep extraction of multi-scale texture and edge detail features of tobacco leaves and alleviating the gradient vanishing problem in deep networks. At the same time, a DC-ECA attention module is embedded in the decoder to effectively suppress redundant interference from the field background, enhance the expression of tobacco leaf target features, significantly improve the accuracy of tobacco leaf edge segmentation, reduce false and missed segmentation, and achieve high-precision pixel-level segmentation. The proposed method achieves a mean intersection over union (mIoU) of 99.02% for tobacco leaf segmentation in complex tobacco field environments, a 7.25% improvement over the original UNet network. Furthermore, the improved UNet network exhibits strong environmental robustness, significantly outperforming other semantic segmentation models in complex conditions such as local occlusion and overexposure, and achieves an inference speed of 56.53 FPS.
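For reference, mIoU is the per-class intersection over union averaged over the classes. The sketch below computes it for a two-class mask (0 = background, 1 = tobacco leaf) on toy data; it illustrates the metric only and does not reproduce the evaluation reported above.

```python
def iou(pred, target, cls):
    """Intersection over union of one class between two label maps."""
    intersection = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += (p == cls) and (t == cls)
            union += (p == cls) or (t == cls)
    return intersection / union if union else float("nan")

def mean_iou(pred, target, classes=(0, 1)):
    """Average the IoU over the classes that occur in pred or target."""
    values = [iou(pred, target, c) for c in classes]
    values = [v for v in values if v == v]  # drop NaN for absent classes
    return sum(values) / len(values)

prediction = [[1, 1, 0], [0, 0, 0]]
ground_truth = [[1, 0, 0], [0, 0, 0]]
score = mean_iou(prediction, ground_truth)
```

Here the leaf class has an IoU of 1/2 and the background class 4/5, giving an mIoU of 0.65.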

[0127] Second, the improved UNet network in this application adopts a network structure that combines a ResNet50 residual encoder with a DC-ECA attention decoder, which significantly improves the robustness, fine-grained feature extraction capability, and segmentation accuracy of tobacco leaf segmentation in the field. At the same time, it maintains real-time inference performance and achieves accurate pixel-level segmentation of tobacco leaves in varied conditions, such as occlusion, abnormal lighting, wrinkling, and damage, in complex field environments. It can efficiently meet the actual operational requirements of intelligent tobacco harvesting equipment.

[0128] Thirdly, the improved UNet network in this application adopts a two-stage composite learning rate adjustment strategy. In the warm-up stage, the parameters are smoothly updated through nonlinear quadratic function growth, which suppresses gradient oscillations in the early stage of training. In the annealing stage, the learning rate is adaptively decayed through cosine annealing algorithm, which helps the model escape local optima and converge to the global optimum, balancing training efficiency and model performance, shortening the training cycle and improving the model segmentation accuracy.

[0129] Fourth, the improved UNet network in this application can be deployed in lightweight form on the Jetson Orin Nano embedded development board. The accompanying hardware system consists of a power supply, a depth camera, the Jetson Orin Nano embedded development board, and a display screen. This enables real-time acquisition, real-time inference, and visualization of segmentation results for fresh tobacco leaf images in the field. It can be directly adapted to intelligent tobacco harvesters, providing precise support for end-effector positioning and operation planning, and can be quickly turned into practical operational capability.
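Throughput on a deployed device is usually checked by timing repeated inference calls. The sketch below shows one way to do so; `dummy_infer` is a stand-in for the real model call, and the frame data is synthetic. For orientation, 56.53 FPS corresponds to roughly 17.7 ms per frame.

```python
import time

def measure_fps(infer, frames, warmup=2):
    """Return frames per second for `infer`, excluding a few warm-up calls."""
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

def dummy_infer(frame):
    """Placeholder workload standing in for model inference."""
    return sum(sum(row) for row in frame)

frames = [[[1] * 64 for _ in range(64)] for _ in range(20)]
fps = measure_fps(dummy_infer, frames)
milliseconds_per_frame = 1000.0 / 56.53
```

Warm-up calls are excluded because the first inferences on embedded hardware often include one-off initialization costs.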

[0130] Referring to Figure 8, a field tobacco leaf segmentation device provided for one of the purposes of this application includes a tobacco field image acquisition module 1100, a segmentation model construction module 1200, and a field tobacco leaf segmentation module 1300. The tobacco field image acquisition module 1100 is configured to acquire an image of the tobacco field environment containing the target tobacco leaves. The segmentation model construction module 1200 is configured to update the encoder of the first tobacco leaf segmentation model to a ResNet50 residual structure and, in the decoder of the first tobacco leaf segmentation model, to embed a DC-ECA module before the upsampling operation of each upsampling layer other than the first upsampling layer, to construct a second tobacco leaf segmentation model. The ResNet50 residual structure includes an Identity Block and a Conv Block, and the DC-ECA module includes a 3×3 convolutional layer, a channel attention mechanism module, and a 3×3 convolutional layer connected in sequence. The field tobacco leaf segmentation module 1300 is configured to input the tobacco field environment image into the second tobacco leaf segmentation model that has been trained to convergence, perform pixel-level segmentation through model inference, and output the foreground region of the target tobacco leaf in the tobacco field environment image to complete the segmentation of the field tobacco leaves.

[0131] Based on any embodiment of this application, and referring to Figure 9, another embodiment of this application also provides an electronic device, which can be implemented by a computer device. Figure 9 shows the internal structure of the computer device. The computer device includes a processor, a computer-readable storage medium, a memory, and a network interface connected via a system bus. The computer-readable storage medium stores an operating system, a database, and computer-readable instructions. The database may store a sequence of control information. When the computer-readable instructions are executed by the processor, they enable the processor to implement a field tobacco leaf segmentation method. The processor of the computer device provides computational and control capabilities, supporting the operation of the entire computer device. The memory of the computer device may store computer-readable instructions. When these computer-readable instructions are executed by the processor, they enable the processor to execute the field tobacco leaf segmentation method described in this application. The network interface of the computer device is used for communication with a terminal. Those skilled in the art will understand that the structure shown in Figure 9 is merely a block diagram of the portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, combine certain components, or have different component arrangements.

[0132] In this embodiment, the processor is used to execute the specific functions of each module of the device in Figure 9, and the memory stores the program code and various data required to execute these modules. A network interface is used for data transmission between the user terminal and the server. In this embodiment, the memory stores the program code and data required to execute all modules in the field tobacco leaf segmentation device of this application, and the server can call the server's program code and data to execute the functions of all modules.

[0133] This application also provides a storage medium storing computer-readable instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps of the field tobacco leaf segmentation method described in any embodiment of this application.

[0134] This application also provides a computer program product, including a computer program or instructions that, when executed by one or more processors, implement the steps of the field tobacco leaf segmentation method described in any embodiment of this application.

[0135] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments of this application can be implemented by a computer program instructing related hardware. This computer program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the methods described above. The aforementioned storage medium can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc.

[0136] The above description is only a partial embodiment of this application. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of this application, and these improvements and modifications should also be considered within the scope of protection of this application.

Claims

1. A field tobacco leaf segmentation method, characterized in that it comprises: acquiring an image of the tobacco field environment containing the target tobacco leaves; updating the encoder of the first tobacco leaf segmentation model to a ResNet50 residual structure and, in the decoder of the first tobacco leaf segmentation model, embedding a DC-ECA module before the upsampling operation of each upsampling layer other than the first upsampling layer, to construct the second tobacco leaf segmentation model, wherein the ResNet50 residual structure includes an Identity Block and a Conv Block, and the DC-ECA module includes a 3×3 convolutional layer, a channel attention mechanism module, and a 3×3 convolutional layer connected in sequence; and inputting the tobacco field environment image into the second tobacco leaf segmentation model that has been trained to convergence, performing pixel-level segmentation through model inference, and outputting the foreground region of the target tobacco leaf in the tobacco field environment image, thereby completing the segmentation of tobacco leaves in the field.

2. The field tobacco leaf segmentation method according to claim 1, characterized in that the Conv Block comprises a first 1×1 convolutional layer, a first batch normalization layer, a first activation layer, a first 3×3 convolutional layer, a second batch normalization layer, a second activation layer, a second 1×1 convolutional layer, and a third batch normalization layer, all connected in series; the Conv Block also includes a first skip connection path, which comprises a third 1×1 convolutional layer and a fourth batch normalization layer connected in series; the output of the first skip connection path is added element-wise to the output of the third batch normalization layer, and the sum is output through the third activation layer; the Identity Block comprises a fourth 1×1 convolutional layer, a fifth batch normalization layer, a fourth activation layer, a second 3×3 convolutional layer, a sixth batch normalization layer, a fifth activation layer, a fifth 1×1 convolutional layer, and a seventh batch normalization layer, all connected in series; and the Identity Block also includes a second skip connection path, the output of which is added element-wise to the output of the seventh batch normalization layer, and the sum is output through the sixth activation layer.

3. The field tobacco leaf segmentation method according to claim 1, characterized in that the step of inputting the tobacco field environment image into the second tobacco leaf segmentation model that has been trained to convergence, performing pixel-level segmentation through model inference, and outputting the foreground region of the target tobacco leaves in the tobacco field environment image comprises: inputting the tobacco field environment image into the second tobacco leaf segmentation model that has been trained to convergence; when the image features of the tobacco field environment image are processed by the Conv Block, transforming them sequentially through the first 1×1 convolutional layer, the first batch normalization layer, the first activation layer, the first 3×3 convolutional layer, the second batch normalization layer, the second activation layer, the second 1×1 convolutional layer, and the third batch normalization layer while, in parallel, adjusting the dimension and channel count of the input features through the third 1×1 convolutional layer and the fourth batch normalization layer of the first skip connection path; adding the output features of the first skip connection path to the output features of the third batch normalization layer element-wise and passing the sum through the third activation layer to output enhanced tobacco leaf features; inputting the enhanced tobacco leaf features into the Identity Block, where main-path features are extracted sequentially through the fourth 1×1 convolutional layer, the fifth batch normalization layer, the fourth activation layer, the second 3×3 convolutional layer, the sixth batch normalization layer, the fifth activation layer, the fifth 1×1 convolutional layer, and the seventh batch normalization layer while, in parallel, the input features are passed unchanged along the second skip connection path; adding the output features of the second skip connection path to the output features of the seventh batch normalization layer element-wise and passing the sum through the sixth activation layer to output deep coding features of the tobacco leaves; and, based on the deep coding features of the tobacco leaves, performing pixel-level segmentation through subsequent model inference to output the foreground region of the target tobacco leaves in the tobacco field environment image.

4. The field tobacco leaf segmentation method according to claim 1, characterized in that the step of inputting the tobacco field environment image into the second tobacco leaf segmentation model that has been trained to convergence, performing pixel-level segmentation through model inference, and outputting the foreground region of the target tobacco leaves in the tobacco field environment image comprises: obtaining the deep coding features of the tobacco leaves output by the encoder of the second tobacco leaf segmentation model and feeding them into the decoder for upsampling and decoding; in every upsampling layer except the first upsampling layer, processing the current feature map with the DC-ECA module before the upsampling operation is performed, wherein the receptive field for local features is first enlarged by a convolutional layer with a 3×3 kernel, the key channel features related to the target tobacco leaves are then learned and strengthened by the channel attention mechanism module, and the attention-weighted features are finally refined and fused by a convolutional layer with a 3×3 kernel to obtain optimized tobacco leaf decoding features; and, based on the optimized tobacco leaf decoding features, performing upsampling and pixel-level classification inference to output the foreground region of the target tobacco leaves in the tobacco field environment image.
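The conv, channel attention, conv sequence of the DC-ECA module described in claim 4 can be sketched in PyTorch as below. The claims only recite "a channel attention mechanism module"; the sketch assumes an ECA-style attention (global average pooling, a 1D convolution across channels, then a sigmoid gate), which the module's name suggests but the claims do not confirm. The class names, the 1D kernel size of 3, the absence of normalization or activation after the 3×3 convolutions, and the bilinear upsampling in the usage note are likewise assumptions.

```python
import torch
from torch import nn


class EfficientChannelAttention(nn.Module):
    """ECA-style channel attention (assumed form of the attention module)."""

    def __init__(self, kernel_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        # 1D convolution that mixes each channel with its neighbours.
        self.conv = nn.Conv1d(1, 1, kernel_size=kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        w = self.pool(x)                        # (N, C, 1, 1)
        w = w.squeeze(-1).transpose(1, 2)       # (N, 1, C)
        w = torch.sigmoid(self.conv(w))         # per-channel weights in (0, 1)
        w = w.transpose(1, 2).unsqueeze(-1)     # (N, C, 1, 1)
        return x * w                            # re-weight each channel


class DCECA(nn.Module):
    """3x3 conv -> channel attention -> 3x3 conv, in the order of claim 4."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv_in = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.attention = EfficientChannelAttention()
        self.conv_out = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        x = self.conv_in(x)        # enlarge the local receptive field
        x = self.attention(x)      # strengthen leaf-related channels
        return self.conv_out(x)    # refine and fuse the weighted features
```

In a decoder stage other than the first, the module would run before the upsampling operation, for example `nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)(DCECA(512, 256)(features))`, where the channel counts are placeholders.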

5. The field tobacco leaf segmentation method according to claim 1, characterized in that the step of training the second tobacco leaf segmentation model comprises: training the second tobacco leaf segmentation model using a two-stage compound learning rate adjustment strategy, wherein, during the warm-up phase of model training, the learning rate is increased dynamically and non-linearly according to a quadratic function and, during the annealing phase of model training, the learning rate is decayed adaptively using a cosine annealing algorithm; and iterating the training according to this two-stage learning rate update rule until the second tobacco leaf segmentation model converges.
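One way to realize a quadratic warm-up followed by cosine annealing is sketched below in plain Python. The claim does not give the exact formulas, so the specific expressions, the function name `two_stage_lr`, and the parameters `warmup_steps`, `max_lr`, and `min_lr` are assumptions made for illustration.

```python
import math


def two_stage_lr(step, total_steps, warmup_steps, max_lr, min_lr=0.0):
    """Learning rate at a given step (hypothetical helper).

    Stage 1 (step < warmup_steps): quadratic rise from 0 towards max_lr.
    Stage 2 (step >= warmup_steps): cosine decay from max_lr to min_lr.
    """
    if not 0 < warmup_steps < total_steps:
        raise ValueError("require 0 < warmup_steps < total_steps")
    if step < warmup_steps:
        # Non-linear (quadratic) warm-up: slow at first, faster near the end.
        return max_lr * (step / warmup_steps) ** 2
    # Fraction of the annealing phase completed, clamped to 1.
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    # Cosine annealing between max_lr and min_lr.
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With these assumed formulas, halfway through the warm-up the rate is one quarter of `max_lr`, it reaches `max_lr` exactly when the warm-up ends, and it decays to `min_lr` at `total_steps`. The function could be wrapped in a scheduler such as PyTorch's `LambdaLR` by returning the ratio to the optimizer's base learning rate.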

6. The field tobacco leaf segmentation method according to any one of claims 1 to 5, characterized in that the basic network architecture of the first tobacco leaf segmentation model is the original UNet network, and the basic network architecture of the second tobacco leaf segmentation model is the improved UNet network.

7. The field tobacco leaf segmentation method according to any one of claims 1 to 5, characterized in that the foreground region of the target tobacco leaves is an image region separated at the pixel level from the complex tobacco field environment image, containing only the target tobacco leaves and excluding soil, weeds, tobacco stalks, and other field background.
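The notion of a foreground region that contains only leaf pixels can be illustrated with a small plain-Python sketch. It assumes the model's pixel-level output has already been converted to a binary mask (1 for target leaf, 0 for background) with the same height and width as the image; the function name `foreground_region` and the black fill value are hypothetical.

```python
def foreground_region(image, mask, background=(0, 0, 0)):
    """Keep pixels where mask is 1; replace all other pixels with `background`.

    image: list of rows, each row a list of (R, G, B) tuples.
    mask:  list of rows of 0/1 labels with the same height and width.
    """
    if len(image) != len(mask) or any(
        len(row) != len(mask_row) for row, mask_row in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same height and width")
    return [
        [pixel if keep else background for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]
```

Pixels labelled as soil, weeds, stalks, or other background are blanked out, so downstream steps only see the leaf pixels.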

8. A field tobacco leaf segmentation device, characterized in that it comprises: a tobacco field image acquisition module configured to acquire a tobacco field environment image containing target tobacco leaves; a segmentation model construction module configured to replace the encoder of a first tobacco leaf segmentation model with a ResNet50 residual structure and, in the decoder of the first tobacco leaf segmentation model, to embed a DC-ECA module before the upsampling operation of every upsampling layer except the first upsampling layer, so as to construct a second tobacco leaf segmentation model, wherein the ResNet50 residual structure includes an Identity Block and a Conv Block, and the DC-ECA module includes a 3×3 convolutional layer, a channel attention mechanism module, and a 3×3 convolutional layer connected in sequence; and a field tobacco leaf segmentation module configured to input the tobacco field environment image into the second tobacco leaf segmentation model that has been trained to convergence, perform pixel-level segmentation through model inference, and output the foreground region of the target tobacco leaves in the tobacco field environment image, thereby completing the segmentation of the field tobacco leaves.

9. An electronic device comprising a central processing unit and a memory, characterized in that the central processing unit is configured to invoke and run a computer program stored in the memory so as to perform the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that it stores, in the form of computer-readable instructions, a computer program implementing the method according to any one of claims 1 to 7, which, when invoked and run by a computer, executes the steps of that method.