A Joint Excitation Upsampling Network Calculation Method and Detection System for Cigarette Packet Ash Crack Segmentation
By employing a joint excitation upsampling network calculation method that combines the JEUNet encoder-decoder structure with a joint excitation upsampling module, the invention addresses the low crack segmentation accuracy of traditional detection methods and achieves efficient, accurate segmentation of crack regions, making it suitable for cigarette inspection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA NAT TOBACCO QUALITY SUPERVISION & TEST CENT
- Filing Date
- 2024-12-20
- Publication Date
- 2026-06-30
AI Technical Summary
Traditional methods for assessing the quality of cigarette pack ash struggle to measure the proportion of cracked area accurately, and existing image segmentation algorithms struggle to extract crack regions against complex backgrounds.
A joint excitation upsampling network computation method is adopted, which combines the JEUNet encoder-decoder structure and the joint excitation upsampling module to enhance the segmentation accuracy of the crack region through multi-level feature extraction and upsampling.
It improves the accuracy and efficiency of cigarette pack ash crack segmentation, reduces computational complexity, enhances the robustness and generalization ability of the model, and is suitable for industrial applications.
Figure CN119723293B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of cigarette detection technology, and in particular to a joint excitation upsampling network calculation method and detection system for segmenting ash cracks in cigarette packs.
Background Technology
[0002] Two important parameters for the quality of cigarette ash are the shrinkage of the ash column and the proportion of ash cracks. The shrinkage of the ash column mainly reflects the tobacco filling rate; cigarettes with a high filling rate have smaller internal pores, resulting in less ash column shrinkage after combustion. The formation of ash cracks is caused by a combination of factors, including the performance of the cigarette paper wrapping, the tobacco formula, the filling performance, and the thermal properties of combustion. The proportion of cracks in the ash column after combustion is positively correlated with the amount of ash flakes that fall during smoking. Scattered ash flakes not only cause environmental pollution but also affect the smoking experience for consumers.
[0003] Traditional methods for detecting the quality of cigarette ash typically rely on manual inspection or simple image processing techniques. While the degree of ash column shrinkage is relatively easy to measure with visual image detection methods, accurately detecting the proportion of the crack area remains a challenging problem. On the one hand, cigarette paper comes in a variety of colors and textures, resulting in a complex surface morphology after combustion. On the other hand, incomplete combustion leads to significant differences in the color of the ash column (varying between yellow, black, gray, and white), and the contrast between the crack and the background varies from location to location, making it difficult for conventional image segmentation algorithms to extract the crack area accurately.
[0004] In view of this, drawing on years of production and design experience in this and related fields, the inventor has designed, through repeated experiments, a joint excitation upsampling network calculation method and detection system for cigarette pack ash crack segmentation, in order to solve the problems existing in the prior art.
Summary of the Invention
[0005] The purpose of this invention is to provide a joint excitation upsampling network calculation method and detection system for segmenting ash cracks in cigarette packs, which can effectively improve the segmentation accuracy of ash cracks in cigarette packs.
[0006] To achieve the above-mentioned objectives, this invention proposes a joint excitation upsampling network calculation method and detection system for segmenting ash cracks in cigarette packs. The joint excitation upsampling network calculation method includes:
[0007] Acquire images of cracks during the combustion process of cigarettes and preprocess the crack images;
[0008] An encoder is used to extract features from the preprocessed crack image at multiple levels. Feature maps of different scales are obtained through four stages of convolution and pooling operations. Downsampling is performed between each two stages using 2×2 max pooling.
[0009] A joint excitation upsampling module is introduced into the decoder, which includes a spatial attention mechanism and a joint excitation channel mechanism.
[0010] The joint excitation upsampling module utilizes multi-level feature maps for joint upsampling, upsampling feature maps of different scales to the same size and performing cascade processing to generate feature maps containing multi-scale information.
[0011] This invention also proposes a detection system for ash cracks in cigarette packs, characterized in that the detection system implements the above-mentioned method, including:
[0012] The image acquisition module is used to acquire real-time image data of cracks during the combustion process of cigarettes;
[0013] The data preprocessing module is used to perform noise removal, grayscale calibration, and contrast enhancement on image data;
[0014] The segmentation module includes the JEUNet encoder-decoder structure and the joint excitation upsampling module, which combines upsampling operations on multi-level feature maps to achieve accurate segmentation of the crack region;
[0015] The output module is used to generate visual segmentation results and analysis reports of the cracked area.
[0016] The present invention also proposes a computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the above-described calculation method.
[0017] The present invention also proposes a computer device, including a memory and a processor, wherein the memory stores a computer program that can run on the processor, characterized in that the processor executes the computer program to implement the steps of the above-described calculation method.
[0018] Compared with the prior art, the present invention has the following features and advantages:
[0019] This invention proposes a joint excitation upsampling network calculation method and detection system for cigarette pack ash crack segmentation. By designing a symmetric encoder-decoder structure and introducing a joint excitation upsampling module in the decoder, it achieves joint upsampling of multi-scale feature maps together with spatial-channel excitation, thereby improving the accuracy and efficiency of crack segmentation. The spatial attention and channel excitation mechanisms enhance the segmentation accuracy of crack regions while reducing computational complexity and improving algorithm efficiency. Furthermore, the method employs a Binary Cross-Entropy loss function for model optimization and 10-fold cross-validation to improve the model's robustness and generalization ability. While maintaining high-precision crack segmentation results, it significantly reduces computational resource requirements and improves the multi-scale perception capability of the segmentation model, providing an efficient and accurate solution for cigarette ash crack detection with broad industrial application prospects.
Attached Figure Description
[0020] The accompanying drawings described herein are for illustrative purposes only and are not intended to limit the scope of the invention in any way. Furthermore, the shapes and proportions of the components in the drawings are merely illustrative to aid in understanding the invention and do not specifically limit the shapes and proportions of the components. Those skilled in the art, guided by the teachings of this invention, can select various possible shapes and proportions to implement the invention according to specific circumstances.
[0021] Figure 1 is a schematic diagram of the JEUNet network structure of the present invention;
[0022] Figure 2 is a schematic diagram of the Joint Excitation Upsampling (JEU) module structure of the present invention;
[0023] Figure 3 compares the accuracy and computational complexity of the JEUNet of the present invention with other advanced segmentation algorithms;
[0024] Figure 4 shows cigarette pack ash crack images, the ground truth, and a comparison of the segmentation results of multiple methods according to the present invention;
[0025] Figure 5 is a schematic diagram of the steps of the joint excitation upsampling network calculation method for cigarette pack ash crack segmentation according to the present invention.
Detailed Implementation
[0026] The details of the present invention can be more clearly understood by referring to the accompanying drawings and the description of specific embodiments. However, the specific embodiments of the present invention described herein are for illustrative purposes only and should not be construed as limiting the invention in any way. Under the teachings of this invention, those skilled in the art can conceive of any possible modifications based on the invention, and these should all be considered to fall within the scope of the invention.
[0027] This invention proposes a joint excitation upsampling network calculation method for cigarette pack ash crack segmentation. As shown in Figure 5, the joint excitation upsampling network calculation method includes:
[0028] Acquire images of cracks during the combustion process of cigarettes and preprocess the crack images;
[0029] An encoder is used to extract features from the preprocessed crack image at multiple levels. Feature maps of different scales are obtained through four stages of convolution and pooling operations. Downsampling is performed between each two stages using 2×2 max pooling.
[0030] A joint excitation upsampling module is introduced into the decoder, which includes a spatial attention mechanism and a joint excitation channel mechanism.
[0031] The joint excitation upsampling module utilizes multi-level feature maps for joint upsampling, upsampling feature maps of different scales to the same size and performing cascade processing to generate feature maps containing multi-scale information.
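The downsampling step in the encoder can be illustrated with a minimal pure-Python sketch. It is not the patented implementation (which the description says is built on PyTorch); the helper names `max_pool_2x2` and `encoder_scales` are hypothetical, and the sketch assumes that four 2×2 poolings separate the five scales and that odd trailing rows or columns are dropped.

```python
def max_pool_2x2(fmap):
    """Downsample a 2-D feature map (list of rows) with 2x2 max pooling, stride 2."""
    rows, cols = len(fmap), len(fmap[0])
    # Assumption: an odd trailing row/column is dropped (floor-mode pooling).
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def encoder_scales(size, num_poolings=4):
    """Side length at each scale: the input plus one entry per 2x2 pooling."""
    scales = [size]
    for _ in range(num_poolings):
        size //= 2
        scales.append(size)
    return scales


feature = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 1, 1],
]
pooled = max_pool_2x2(feature)   # [[4, 5], [6, 7]]
scales = encoder_scales(448)     # [448, 224, 112, 56, 28]
```

Each pooling halves the side length, so a 448 × 448 input (the size used in the experiments) would reach 28 × 28 at the deepest scale under these assumptions.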
[0032] This invention proposes a joint excitation upsampling network calculation method for cigarette pack ash crack segmentation. First, a preprocessing step ensures image quality, providing high-quality input for feature extraction. Second, the encoder's multi-level feature extraction enables the model to capture multi-scale information from shallow to deep layers, which is crucial for accurate identification of crack regions. The spatial attention mechanism and the joint excitation channel mechanism of the joint excitation upsampling module further enhance the model's ability to identify crack regions: by adaptively recalibrating the channel-wise feature responses and modeling the interdependencies between different channels, a refined response to crack region features is achieved. Finally, the joint upsampling and cascade processing of multi-level feature maps generates feature maps containing multi-scale information. This improves the multi-scale perception capability of the segmentation model and also increases the accuracy of crack segmentation and the speed of upsampling.
[0033] In an optional embodiment of the present invention, the spatial attention mechanism processes the feature map through max pooling and average pooling operations. This improves the accuracy and efficiency of cigarette pack ash crack segmentation. Through the spatial attention mechanism, the model can identify crack regions more accurately, particularly where they must stand out against complex backgrounds. The max pooling and average pooling operations enhance the ability to identify crack boundaries while preserving important spatial information in the feature map, thereby improving segmentation accuracy.
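One common way to build spatial attention descriptors from max pooling and average pooling is to pool across the channel axis at every pixel. The sketch below illustrates only that idea. The description does not state the pooling axis or what follows the pooling, so both the channel-axis choice and the helper name `channel_pool` are assumptions.

```python
def channel_pool(fmap):
    """Pool a feature map fmap[c][y][x] across channels at every pixel.

    Returns (max_map, avg_map), each indexed as [y][x].
    Assumption: pooling runs along the channel axis; the source only says
    that max pooling and average pooling are applied.
    """
    channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    max_map = [
        [max(fmap[c][y][x] for c in range(channels)) for x in range(width)]
        for y in range(height)
    ]
    avg_map = [
        [sum(fmap[c][y][x] for c in range(channels)) / channels for x in range(width)]
        for y in range(height)
    ]
    return max_map, avg_map


# Two channels of a 2x2 feature map.
fmap = [
    [[1.0, 4.0], [0.0, 2.0]],
    [[3.0, 2.0], [0.0, 6.0]],
]
max_map, avg_map = channel_pool(fmap)
# max_map == [[3.0, 4.0], [0.0, 6.0]], avg_map == [[2.0, 3.0], [0.0, 4.0]]
```

The two maps summarize, per pixel, the strongest and the typical response over all channels, which is the kind of information a spatial attention weight can be derived from.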
[0034] In an optional embodiment of the present invention, the joint excitation channel mechanism generates channel excitation vectors through global average pooling and adjusts the weights of the feature responses of different channels. By introducing the joint excitation channel mechanism, the calculation method of the present invention improves performance on the cigarette pack ash crack segmentation task. The channel excitation vectors generated by global average pooling enable the model to capture the importance of each channel and to strengthen the corresponding feature responses through weight adjustment.
[0035] In an optional embodiment, multi-level feature map joint upsampling is employed as follows: 2× upsampling is performed by a 2×2 deconvolution operation, and the upsampled feature maps are cascaded to generate multi-level feature maps containing different receptive fields, thereby improving the multi-scale perception capability of the segmentation model; after the multi-level feature maps are jointly upsampled, the feature map size is adjusted to the original image size through a 1×1 convolution. This joint upsampling of multi-level feature maps improves segmentation accuracy while maintaining computational efficiency, supporting efficient and accurate detection of ash cracks in cigarette packs.
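The 2× upsampling by a 2×2 deconvolution can be sketched for a single channel as follows. This is a minimal illustration, assuming a stride of 2 and no bias, so that each input pixel expands into one non-overlapping 2×2 output block. The function name `deconv_2x2_stride2` and the example kernel are hypothetical; in a trained network the kernel values are learned.

```python
def deconv_2x2_stride2(fmap, kernel):
    """Transposed convolution with a 2x2 kernel and stride 2 on one channel.

    Each input value fmap[r][c] is spread over the 2x2 output block whose
    top-left corner is (2r, 2c), scaled by the kernel. With stride equal to
    the kernel size the blocks do not overlap, so plain assignment suffices.
    """
    rows, cols = len(fmap), len(fmap[0])
    out = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for r in range(rows):
        for c in range(cols):
            for dr in range(2):
                for dc in range(2):
                    out[2 * r + dr][2 * c + dc] = fmap[r][c] * kernel[dr][dc]
    return out


small = [[1.0, 2.0], [3.0, 4.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]   # illustrative kernel, not a learned one
upsampled = deconv_2x2_stride2(small, ones)
# With an all-ones kernel this reduces to nearest-neighbour upsampling:
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Applying the operation once doubles the side length; applying it twice would bring a deeper feature map up by a factor of 4 before the maps are cascaded.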
[0036] Specifically, ash crack detection can be described as a pixel-level anomaly prediction task. The proposed joint excitation upsampling network, JEUNet, has a classic symmetric encoder-decoder structure; its network framework is shown in Figure 1. Yellow arrows indicate direct input; black arrows indicate 3×3 convolution operations with a stride of 1, which use zero padding to keep feature maps of the same level the same size; gray arrows indicate crop-and-concatenate operations, in which the left feature map is cropped to the size of the corresponding right feature map before concatenation; blue arrows indicate the 1×1 convolution operations used for final classification, whose final two outputs are the crack result and the background; red arrows indicate downsampling of the feature map using 2×2 max pooling; green arrows indicate 2× upsampling of the feature map using 2×2 deconvolution operations. k denotes the number of base channels of the convolutional feature maps (here, k = 32).
[0037] In an optional embodiment of the present invention, channel excitation paths are embedded in the multi-level feature maps of stages 2, 3, and 4 of the decoding phase. These channel excitation paths progressively refine the multi-scale crack features and model the dependencies between multiple channel features. By embedding channel excitation paths in the decoding phase, the model's ability to recognize multi-scale crack features is enhanced. Furthermore, by modeling the dependencies between channels, the representation of the feature maps is optimized, resulting in more accurate segmentation. The introduction of channel excitation paths allows the model to adaptively adjust the feature responses of different channels, thereby better capturing the detailed information of the crack region. The progressively refined feature processing and the modeling of dependencies between channels provide strong technical support for the accurate segmentation of cigarette pack ash cracks, improving the robustness and generalization ability of the segmentation model.
[0038] Specifically, the encoder of JEUNet is consistent with the backbone network of UNet. The original input image is encoded in four stages through consecutive convolutional layers, with 2×2 max pooling downsampling between each two stages, giving five scales in total including the original image. The feature maps at each scale contain information with different receptive fields: shallow feature maps mainly contain detailed texture information of local pixels, while deep feature maps contain local semantic information of the image. In the decoding process of JEUNet, the outputs of stages 2, 3, and 4 are fed into the JEU (Joint Excitation Up-sampling) block, which replaces the three-scale convolutional upsampling in the original UNet. The output of the JEU module is upsampled once and concatenated with the output of stage 1 of the encoder, followed by two 3×3 convolution operations at the same level. The result is then upsampled to the original image size and concatenated with the initial convolution result of the original image; after two further convolution operations, the prediction result is obtained. In the decoder, cascading the upsampled feature map with the shallow encoder features improves the local prediction accuracy of pixels.
[0039] The Joint Excitation Upsampling (JEU) module is a computational unit that integrates spatial attention into a joint upsampling network. On the one hand, three feature maps with different receptive fields are resized to the same size and concatenated into a joint feature map T. On the other hand, a transformation vector W is constructed to excite each channel in the tensor T. During upsampling, the JEU module considers both multi-scale feature information and channel-wise correlation, improving the accuracy of semantic segmentation while reducing the computational complexity of upsampling. Figure 2 shows the structure of the JEU block: the purple arrow indicates the joint upsampling path, and the red arrow indicates the feature map excitation path.
[0040] JEUNet uses joint upsampling to replace the first two layer-by-layer upsampling steps in the UNet decoder. Through embedding vector extraction, the multi-level feature maps output from stages 2, 3, and 4 are input into the JEU module. The feature maps of stage 4 (w×w) and stage 3 (2w×2w) are upsampled to the same size as stage 2 (4w×4w). The three jointly upsampled feature maps are then concatenated in sequence into a single multi-level feature map T of size 4w×4w×X, where X = 4k + 8k + 16k = 28k = 896. T contains 4w×4w embedding vectors of size 1×X, each carrying semantic information from three different receptive fields and corresponding one-to-one with a position (x, y) in the original image.
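The size of the joint feature map T can be checked with a short worked calculation. The sketch below assumes a 448 × 448 input (the size used in the experiments), four 2×2 poolings before stage 4, and that the terms 4k, 8k and 16k in X belong to stages 2, 3 and 4 respectively. The per-stage channel assignment is inferred from the formula rather than stated explicitly, and `joint_feature_shape` is a hypothetical helper name.

```python
def joint_feature_shape(input_size=448, k=32):
    """Return the shape of T and the upsampling factor needed per stage."""
    w = input_size // 16           # stage 4 side length after four 2x2 poolings
    stages = {                     # stage: (side length, channels), assumed mapping
        2: (4 * w, 4 * k),
        3: (2 * w, 8 * k),
        4: (w, 16 * k),
    }
    target = stages[2][0]          # every map is brought to the stage 2 size
    upsample_factors = {s: target // side for s, (side, _) in stages.items()}
    channels = sum(ch for _, ch in stages.values())
    return (target, target, channels), upsample_factors


shape, factors = joint_feature_shape()
# shape == (112, 112, 896): 4w = 112 and X = 128 + 256 + 512 = 896
# factors == {2: 1, 3: 2, 4: 4}: stage 3 is upsampled 2x and stage 4 is upsampled 4x
```

Under these assumptions T holds 112 × 112 embedding vectors, each of length 896.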
[0041] After joint upsampling, the JPU (Joint Pyramid Upsampling) block uses four dilated convolutions with different dilation rates to extract features and concatenates the results. The aim is to capture feature information at different scales in the multi-level feature maps and so ensure the accuracy of the upsampling decoder. However, each scale of the multi-level feature map already contains semantic information from a different receptive field, so the multiple dilated convolutions may be redundant with the multi-level embedding vectors themselves, yielding little improvement in segmentation accuracy. Furthermore, the multi-level feature map T contains X channels, each derived from convolution kernels with different parameters; the JPU block does not consider the correlation between these channels, which limits its improvement in prediction accuracy.
[0042] In one alternative embodiment of this implementation, for multi-level feature maps, a global average pooling function is used to compress the feature map of each channel into a single feature vector. By using the global average pooling function to compress each channel of the multi-level feature map into a single feature vector, efficient compression and feature extraction of the feature map are achieved. This not only reduces the dimensionality of the data but also preserves the key information within each channel, allowing the model to process the feature response of each channel more centrally. Simultaneously, it enhances the model's ability to capture global information from the feature map and improves the model's accuracy in recognizing the ash crack features of cigarette packs.
[0043] In one optional example of this implementation, the feature vector is mapped through two fully connected layers to form the final weight vector. This weight vector is then used to assign new response weights to each channel in the jointly upsampled multi-level feature map, ultimately resulting in a spatially channel-excited feature map. By mapping the feature vector through two fully connected layers to form the final weight vector, and using these weights to assign new response weights to each channel in the jointly upsampled multi-level feature map, the model can adaptively adjust the feature response of each channel, enhancing its ability to identify crack regions. Especially in complex backgrounds and diverse crack morphologies, the spatially channel-excited feature map allows the model to more accurately capture the detailed information of the crack.
[0044] Specifically, to adaptively recalibrate the channel-wise feature responses and model the interdependencies between different channels, the JEU block introduces an SE (Squeeze-and-Excitation) module, integrating spatial attention into the structure of the upsampling network. This is a significant difference between the JEU and JPU modules. As shown in Equation (1), for a tensor T of size 4w×4w×X, a global average pooling function is used to compress it into a 1×X feature vector F, whose x-th element F_x is calculated as follows:
[0045] $$F_x = \frac{1}{I \times J} \sum_{i=1}^{I} \sum_{j=1}^{J} T_x(i, j) \qquad (1)$$
[0046] In the formula, F_x corresponds to the x-th channel T_x of the tensor T, I and J are the length and width of the feature map of that channel (I = J = 4w), and i and j are the coordinates of each point on the feature map. To establish the correlation between the features of different channels, F is fed to two fully connected layers, which learn the contribution weights of the different channels during training and excite the corresponding channels of the feature map during prediction, as shown in Equation (2):
[0047] $$W = \sigma\left(w_2\, \delta\left(w_1 F\right)\right) \qquad (2)$$
[0048] In the formula, δ(·) represents the ReLU activation function, and σ(·) represents the Sigmoid activation function.
[0049] During the learning process, the feature vector F is mapped through two fully connected layers to form the final weight vector W. Each weight W_x in W predicts the importance of the corresponding channel x in the multi-level feature map T, thereby modeling the correlation between feature channels. Across the two fully connected layers, the vector F undergoes one reduction (to 1/2 of its length) and one restoration, which reduces the computational load. w1 is the mapping weight of the first fully connected layer, and w2 is the mapping weight of the second fully connected layer. The excited feature map of the x-th channel is obtained by multiplying the feature map T_x by its weight W_x. In this way, the JEU block uses the weight vector W to assign a new response weight to each channel x of the jointly upsampled multi-level feature map T, ultimately obtaining the multi-level feature map after spatial-channel excitation.
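The excitation path described above (global average pooling, two fully connected layers with a 1/2 reduction, ReLU and Sigmoid, then channel-wise reweighting) can be sketched in pure Python. The sketch assumes the usual Squeeze-and-Excitation form W = σ(w2 · δ(w1 · F)) without bias terms; the function names and the small hand-picked weights are illustrative only and are not taken from the patent.

```python
import math


def global_average_pool(tensor):
    """Squeeze: tensor[x][i][j] -> one mean value per channel x."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in tensor]


def fully_connected(weights, vec):
    """Matrix-vector product; bias terms are omitted (assumption)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def excitation_weights(F, w1, w2):
    """Reduce with w1, apply ReLU, restore with w2, apply Sigmoid."""
    hidden = [max(0.0, h) for h in fully_connected(w1, F)]
    return [1.0 / (1.0 + math.exp(-z)) for z in fully_connected(w2, hidden)]


def excite(tensor, W):
    """Scale every value of channel x by its weight W[x]."""
    return [[[W[x] * v for v in row] for row in ch] for x, ch in enumerate(tensor)]


# X = 4 channels of a 2x2 map; the hidden layer has X / 2 = 2 units.
T = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[0.0, 4.0], [0.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
w1 = [[0.5, 0.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.0]]          # 2 x 4, illustrative
w2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]    # 4 x 2, illustrative

F = global_average_pool(T)          # [1.0, 2.0, 2.0, 0.0]
W = excitation_weights(F, w1, w2)   # one weight in (0, 1) per channel
T_excited = excite(T, W)            # same shape as T, channels rescaled
```

Because the Sigmoid keeps every weight between 0 and 1, the excitation can only attenuate a channel relative to the others; which channels are kept strong is what the two fully connected layers learn during training.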
[0050] In an optional embodiment of the present invention, the joint excitation upsampling network calculation method further includes: constructing a cigarette pack ash crack segmentation model based on the feature maps, optimizing the model using a Binary Cross-Entropy loss function, and training the model using 10-fold cross-validation. Optimizing with the Binary Cross-Entropy loss improves the accuracy of the model in identifying crack regions, while cross-validation enhances its generalization ability across images with different crack features, ensuring the stability and reliability of the model in practical applications. Furthermore, this optimization and validation strategy helps to identify and mitigate overfitting, further improving the model's performance on unseen data.
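For reference, the Binary Cross-Entropy loss over a set of pixels can be written out as a short sketch. It assumes the predictions are per-pixel crack probabilities and the targets are 0 or 1; the clamping constant `eps` is an implementation detail added here to avoid log(0), not something specified in the description.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean of -[t*log(p) + (1 - t)*log(1 - p)] over all pixels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)   # clamp to keep the logarithms finite
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


targets = [1, 0, 1, 0]                      # 1 = crack pixel, 0 = background
confident = binary_cross_entropy([0.9, 0.1, 0.8, 0.2], targets)
uncertain = binary_cross_entropy([0.5, 0.5, 0.5, 0.5], targets)
# An uninformative prediction of 0.5 everywhere gives a loss of ln 2, and
# predictions closer to the targets give a smaller loss.
```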
[0051] This invention also proposes a detection system for ash cracks in cigarette packs, wherein the detection system implements the above-mentioned method, including:
[0052] The image acquisition module is used to acquire real-time image data of cracks during the combustion process of cigarettes;
[0053] The data preprocessing module is used to perform noise removal, grayscale calibration, and contrast enhancement on image data;
[0054] The segmentation module includes the JEUNet encoder-decoder structure and the joint excitation upsampling module, which combines upsampling operations on multi-level feature maps to achieve accurate segmentation of the crack region;
[0055] The output module is used to generate visual segmentation results and analysis reports of the cracked area.
[0056] The present invention also proposes a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the above-described calculation method.
[0057] The present invention also proposes a computer device, including a memory and a processor, wherein the memory stores a computer program that can run on the processor, wherein the processor executes the computer program to implement the steps of the above-described calculation method.
[0058] The specific implementation process of the present invention will now be described in detail with reference to the embodiments:
[0059] The computational platform for model training in this invention is as follows. JEUNet is implemented using the PyTorch library, and the image reading and adjustment functions call OpenCV. The hardware consists of a Core(R) i7-8700K CPU (3.70GHz x 12, Intel Corporation, USA) and a GeForce RTX 3080 graphics card (Nvidia Corporation, USA). The deep learning environment uses Python version 3.6, PyTorch version 1.10.1, OpenCV version 4.6, CUDA version 11.3, and cuDNN version 8.2. From the constructed cigarette pack ash crack dataset, 90 images with ground truth are randomly selected and split into training and validation sets by 10-fold cross-validation; the remaining 10 images with ground truth are used for testing. The input images are uniformly resized to 448 x 448 pixels. The SGD optimizer is used during training, with a momentum of 0.9 and a weight decay of 0.0001. The Binary Cross-Entropy loss function is employed, with a learning rate of 0.001, a batch size of 1, and 100 training epochs.
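The 10-fold split of the 90 training images can be sketched as below. This is one plausible way to form the folds, assuming the images are shuffled once and each fold serves as the validation set exactly once; the description does not say how the folds were drawn, and `k_fold_indices` and the seed are hypothetical.

```python
import random


def k_fold_indices(num_samples, k=10, seed=0):
    """Yield (train_indices, val_indices) for each of the k folds."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)          # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]     # k folds of near-equal size
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


splits = list(k_fold_indices(90))
train_sizes = {len(train) for train, _ in splits}   # {81}
val_sizes = {len(val) for _, val in splits}         # {9}
```

With 90 images and 10 folds, every round trains on 81 images and validates on 9, and each image is used for validation exactly once.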
[0060] This invention further carries out JEUNet optimization experiments. The JEUNet encoder is divided into Stage 1 to Stage 4 according to feature map size. To find the best combination, the joint upsampling of the JEU block is studied by selecting feature map combinations from different stages and combining them with the feature map excitation path, yielding different semantic segmentation models. Each model is trained for 100 epochs on 90 panoramic grayscale images and then tested on 10 panoramic grayscale images. The results of the optimization experiments are shown in Table 1. The best upsampling combination is Stage 4 + Stage 3 + Stage 2, with IoU and AUROC reaching 57.89% and 81.30%, respectively. After the excitation path is added, the IoU and AUROC of JEUNet improve to 65.73% and 88.07%, respectively. This gain of roughly 7 to 8 percentage points indicates that the channel attention mechanism of the JEU block plays a very important role in the upsampling process.
[0061] Table 1:
[0062]
[0063] As shown in Figure 3, to evaluate the performance of CSegNet and JEUNet, comparative experiments were conducted on a cigarette pack ash crack image dataset against classic methods such as Otsu, FCN, FastFCN, U-Net, U-Net++, and DeepLabV3+. Each learning-based method was trained on 90 images and tested on 10 images. The Otsu method is an automatic thresholding segmentation technique that selects a threshold by maximizing the between-class variance of the foreground and background. For crack segmentation, the Otsu method separates the crack (foreground) from non-crack regions (background). FCN is a pioneering work of deep learning in semantic segmentation: it replaces the fully connected layers at the end of a traditional CNN with convolutional layers to output heatmaps and uses upsampling to restore the image size. FastFCN significantly improves the upsampling performance of FCN by combining it with the classic Joint Pyramid Upsampling (JPU) block. U-Net, U-Net++, and DeepLabV3+ are encoder-decoder architectures with symmetric structures.
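The Otsu baseline mentioned above can be sketched directly from its definition: try every gray level as a threshold and keep the one that maximizes the between-class variance. The sketch assumes 8-bit integer pixel values given as a flat list; the sample pixel values are made up for illustration and do not come from the dataset.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form one class and pixels with value > t the other.
    Counts are used instead of probabilities; this scales the variance by a
    constant and does not change which t is best.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low, sum_low = 0, 0
    for t in range(levels):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue            # no pixels at or below t yet
        if w_high == 0:
            break               # every pixel is already in the low class
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        between_var = w_low * w_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


# Illustrative data: 60 dark pixels and 40 bright pixels.
pixels = [10] * 50 + [12] * 10 + [200] * 40
t = otsu_threshold(pixels)
mask = [1 if p > t else 0 for p in pixels]   # 1 marks the brighter class
```

Whether the crack corresponds to the brighter or the darker class depends on the imaging setup, so the direction of the final comparison is left as a choice here.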
[0064] Testing was performed on 10 randomly selected images from the constructed dataset and evaluated with the IoU, Dice, and AUROC metrics. The Otsu method was applied directly, while the other models were trained on 90 ground-truth images, each for 50–100 epochs until convergence. Furthermore, a complexity analysis was performed on the deep learning-based methods, including total test time (in seconds), floating-point operations (FLOPs, in giga (G)), and number of parameters (in millions (M)). For all deep learning methods, including the proposed method, the input tensor size was 10 × 448 × 448. The comprehensive evaluation results of the different methods are shown in Table 2 and Figure 4, from which the following conclusions can be drawn:
[0065] The proposed CSegNet outperforms the comparison methods in crack segmentation performance. CSegNet achieves the best pixel-level prediction accuracy (IoU and AUROC reach 66.01% and 89.16%, respectively), followed by JEUNet (IoU and AUROC reach 65.73% and 88.07%, respectively).
[0066] Compared to FastFCN, JEUNet improved IoU and AUROC by nearly 30% and 15% respectively, confirming the proposed view that "modeling the interdependencies between different channels can further improve the performance of semantic segmentation networks."
[0067] Compared with the advanced DeepLabV3+ network, CSegNet and JEUNet perform better on the dataset constructed here, with IoU and AUROC both higher by about 1% to 2%.
[0068] Although CSegNet achieved the best detection accuracy, it consumed the most computation, with test time, FLOPs, and parameter count all exceeding those of the other methods, making it not cost-effective.
[0069] While achieving detection results similar to those of CSegNet, JEUNet has a parameter count of 5.93 M, which is 1/6 that of DeepLabV3+ and 1/7 that of CSegNet. It has the shortest training and testing time and the lowest computational complexity and time cost among all the algorithms. From a cost-performance perspective, it is the most suitable method for detecting ash cracks.
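The IoU and Dice metrics used in the comparison can be computed from a predicted mask and a ground-truth mask as in the following sketch. It assumes binary masks flattened to lists of 0 and 1, and it returns 1.0 when both masks are empty, which is a convention chosen here rather than one stated in the description.

```python
def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two binary masks given as flat 0/1 lists."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)        # crack in both
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)    # predicted only
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)    # missed crack
    union = tp + fp + fn
    if union == 0:
        return 1.0, 1.0    # both masks empty: treated as a perfect match
    return tp / union, 2 * tp / (2 * tp + fp + fn)


pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
iou, dice = iou_and_dice(pred, truth)
# tp = 2, fp = 1, fn = 1, so IoU = 2/4 = 0.5 and Dice = 4/6
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank methods identically on a single image but can differ once averaged over several images.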
[0070] Table 2:
[0071]
[0072]
[0073] As shown in Figure 4, the segmentation results of the different methods are compared with the manual annotation on challenging samples. It can be observed that CSegNet and JEUNet best preserve the continuity of the thin burning lines, and their crack segmentation results are also the closest to the ground truth.
[0074] The detailed explanations of the above embodiments are intended only to explain the present invention so as to facilitate a better understanding of the present invention. However, these descriptions should not be construed as limiting the present invention for any reason. In particular, the various features described in different embodiments can be arbitrarily combined with each other to form other embodiments. Unless there is an explicit description to the contrary, these features should be understood to be applicable to any embodiment, and not limited to the described embodiments.
Claims
1. A joint excitation upsampling network calculation method for cigarette pack ash crack segmentation, characterized in that, The joint excitation upsampling network calculation method includes: Acquiring crack images during the combustion process of cigarettes, and preprocessing the crack images; The encoder of JEUNet, which has a symmetric encoder-decoder structure, is used to perform multi-level feature extraction on the preprocessed crack image. Feature maps of different scales are obtained through four stages of convolution and pooling operations, and downsampling is performed between every two adjacent stages using 2×2 max pooling. A joint excitation upsampling module is introduced into the decoder, which includes a spatial attention mechanism and a joint excitation channel mechanism. The joint excitation upsampling module utilizes multi-level feature maps for joint upsampling, upsampling feature maps of different scales to the same size and performing cascade processing to generate feature maps containing multi-scale information. The joint upsampling of the multi-level feature maps includes: performing 2× upsampling by a 2×2 deconvolution operation, and cascading each upsampled feature map to generate multi-level feature maps containing different receptive fields, so as to improve the multi-scale perception capability of the segmentation model; after the multi-level feature maps are jointly upsampled, the feature map size is adjusted to the original image size by 1×1 convolution. Channel excitation paths are embedded in the multi-level feature maps of stages 2, 3 and 4 of the decoding stage. The multi-scale crack features are refined step by step through the channel excitation paths, and the dependencies between multiple channel features are modeled. For the multi-level feature maps, a global average pooling function is used to compress the feature map of each channel into a corresponding feature vector.
2. The joint excitation upsampling network calculation method for cigarette pack ash crack segmentation as described in claim 1, characterized in that, The spatial attention mechanism processes the feature map through max pooling and average pooling operations.
3. The joint excitation upsampling network calculation method for cigarette pack ash crack segmentation as described in claim 1, characterized in that, The joint excitation channel mechanism generates channel excitation vectors through global average pooling and adjusts the weights of the feature responses of different channels.
4. The joint excitation upsampling network calculation method for cigarette pack ash crack segmentation as described in claim 1, characterized in that, Channel excitation paths are embedded in the multi-level feature maps of stages 2, 3, and 4 of the decoding phase. The multi-scale crack features are refined step by step through the channel excitation paths, and the dependencies between multiple channel features are modeled.
5. The joint excitation upsampling network calculation method for cigarette pack ash crack segmentation as described in claim 1, characterized in that, The feature vector is mapped through two fully connected layers to form the final weight vector. The weight vector is then used to assign new response weights to each channel in the multi-level feature map obtained by joint upsampling, ultimately resulting in the feature map after spatial channel excitation.
6. The joint excitation upsampling network calculation method for cigarette pack ash crack segmentation as described in claim 1, characterized in that, The joint excitation upsampling network calculation method further includes: A cigarette pack ash crack segmentation model is constructed based on the feature maps. The binary cross-entropy loss function is used to optimize the cigarette pack ash crack segmentation model, and 10-fold cross-validation is used for model training.
7. A detection system for ash cracks in cigarette packs, characterized in that, The detection system implements the method according to any one of claims 1-6 and includes: An image acquisition module, used to acquire real-time image data of cracks during the combustion process of cigarettes; A data preprocessing module, used to perform noise removal, grayscale calibration, and contrast enhancement on the image data; A segmentation module, which includes the JEUNet encoder-decoder structure and the joint excitation upsampling module and which jointly upsamples the multi-level feature maps to achieve accurate segmentation of the crack region; An output module, used to generate visual segmentation results and analysis reports of the crack region.
8. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by a processor, it implements the steps of the calculation method according to any one of claims 1 to 6.
9. A computer device comprising a memory and a processor, wherein a computer program capable of running on the processor is stored in the memory, characterized in that, When the processor executes the computer program, it implements the steps of the computation method according to any one of claims 1 to 6.
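As a non-limiting illustration of the channel excitation recited in claims 1 and 5 (global average pooling of each channel into a feature vector, followed by two fully connected layers that yield per-channel weights), a minimal Python sketch is given below. The ReLU and sigmoid activations, the layer sizes, the weight values, and all function names are assumptions made for illustration; the claims specify only the pooling step and the two fully connected layers.

```python
import math


def global_average_pool(feature_map):
    """Compress each H x W channel into a single value (its mean)."""
    return [sum(map(sum, channel)) / (len(channel) * len(channel[0]))
            for channel in feature_map]


def dense(vector, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def channel_excitation(feature_map, w1, b1, w2, b2):
    """Reweight the channels of a C x H x W feature map (nested lists)."""
    squeezed = global_average_pool(feature_map)
    # First fully connected layer, with ReLU (assumed activation).
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]
    # Second fully connected layer, with sigmoid (assumed activation),
    # giving one weight in (0, 1) per channel.
    weights = [1.0 / (1.0 + math.exp(-z)) for z in dense(hidden, w2, b2)]
    # Assign the new response weight to every pixel of each channel.
    excited = [[[weights[c] * v for v in row] for row in channel]
               for c, channel in enumerate(feature_map)]
    return excited, weights


# Hypothetical 2-channel, 2 x 2 feature map and hand-picked layer weights.
feature_map = [[[1.0, 2.0], [3.0, 4.0]],
               [[0.0, 0.0], [0.0, 4.0]]]
w1, b1 = [[1.0, -1.0]], [0.0]          # 2 channels -> 1 hidden unit
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]   # 1 hidden unit -> 2 channel weights

excited, weights = channel_excitation(feature_map, w1, b1, w2, b2)
print("channel weights:", weights)
```

In a trained network the two weight matrices would be learned together with the convolutions, so the values above serve only to show how the pooled vector is turned into per-channel scaling factors.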