A method for detecting open flames and smoke based on an improved YOLOv8
By improving the YOLOv8 network with C2f-KAN, FFM, and SimAM modules, and by using detail-enhanced convolution with shared parameters and group normalization in the detection head, the method addresses small flame targets and background occlusion, improving the accuracy and robustness of open flame and smoke detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- QINGDAO TUDA INTERNET INFORMATION TECH CO LTD
- Filing Date
- 2025-06-25
- Publication Date
- 2026-06-19
AI Technical Summary
Existing open flame and smoke detection technologies are prone to false alarms or missed detections in complex environments, particularly when flames are small or partially obscured by the background.
An improved YOLOv8 network is adopted: a C2f-KAN module is added to the backbone network, a feature fusion module (FFM) and an attention module (SimAM) are introduced into the neck network, and the detection head uses detail-enhanced convolution with shared parameters and group normalization.
The method improves the detection accuracy of open flames and smoke in complex backgrounds, especially for small flame targets, and strengthens robustness against background occlusion.
Smart Images

Figure CN122244778A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of target detection technology in computer vision, and more specifically, to a method for detecting open flames and smoke based on an improved YOLOv8.
Background Technology
[0002] With the acceleration of industrialization and urbanization, fire has become a major threat to human life and property. In the early stages of a fire, open flames and smoke are usually the most obvious signals, so detecting them rapidly is crucial. Developing efficient and accurate open flame and smoke detection technologies therefore has significant application value, enabling the relevant departments to take timely countermeasures and reduce disaster losses.
[0003] Traditional open flame and smoke detection technologies mainly rely on physical sensors and rule-based algorithms. For example, smoke detectors detect smoke concentration using photoelectric or ionization principles, but they are prone to false alarms or missed alarms in complex environments. Infrared thermal imaging technology can capture temperature changes of the fire source, but it is often affected by background temperature interference. Smoke optical imaging technology captures the scattered light from smoke particles using optical sensors, but its effectiveness is limited by lighting conditions and the diversity of smoke particles, and its performance is unstable in low light or complex scenes.
[0004] With the development of computer vision and deep learning, image-based open flame and smoke detection has gradually become mainstream. Deep learning models (such as convolutional neural networks and Transformers) process images acquired in real time from video surveillance systems, automatically extract features, and identify fire occurrences. Compared with traditional methods, deep learning-based detection is more robust: it can detect open flames and smoke efficiently and accurately in complex backgrounds, adapts to different environments, and is widely used in fire protection and industrial safety.
[0005] The inventors found that although existing target detection algorithms achieve high accuracy in detecting open flames and smoke, they still face several challenges. The first is detecting small flame targets: the detailed features of small flames (such as color gradations and edge shapes) are easily lost during image sampling or compression, making them difficult for models to extract. The second is background occlusion: in practical applications, flames often appear against complex backgrounds such as buildings, vegetation, or vehicles, which may obscure some or all of the flame's features. In some cases, reflections, shadows, or moving objects (such as pedestrians or vehicles) in the background may resemble a flame, leading to false positives or false negatives.
Summary of the Invention
[0006] The technical problem this invention aims to solve is to provide a flame and smoke detection method based on an improved YOLOv8. For the problem of detecting small target flames, this invention designs a feature fusion and allocation structure, utilizing two feature fusion modules (FFM) to perform deep fusion of multi-scale features. To address the false alarm problem caused by background occlusion, a feature extraction module (C2f-KAN) is designed using KAGN convolution and DBM. Furthermore, the detection head is improved by utilizing shared detail enhancement convolution and group normalization, further enhancing detection accuracy.
[0007] The present invention achieves its objective by employing the following technical solution: A method for detecting open flames and smoke based on an improved YOLOv8, characterized by the following steps: S1: Process and label the collected surveillance video data of open flames and smoke to create a dataset; S2: Data augmentation methods are used to expand the dataset, including mosaic enhancement, random left and right flipping, random scaling, and random changes in image brightness. The dataset is divided into training and test sets in a 4:1 ratio. S3: Improvements based on the YOLOv8 network: Add a C2f-KAN module to the YOLOv8 backbone network; add a feature fusion module to the neck network of YOLOv8 and design a feature fusion allocation structure; introduce detail enhancement convolution in the detection head of YOLOv8 and design a detail enhancement detection head using shared convolutional layers and group normalization. S4: Feed the training set data into the improved YOLOv8 network for training to obtain the model; S5: The model saves the weights from the last training session and the weight file that yielded the best results; S6: Use the test set for performance evaluation of the model.
[0008] As a further limitation of this technical solution, in S1: the data processing step samples the surveillance video data of open flames and smoke at an interval of 10 frames to generate a sequence of static images. The target annotation step combines the advantages of manual and automatic annotation: first, part of the data is annotated manually; the YOLOv8 network is then trained on the annotated data, and the resulting weight file is combined with the automatic annotation function of X-AnyLabeling-GPU to build an intelligent annotation process.
[0009] As a further limitation of this technical solution, the specific process of S1 is as follows: model inference is first used for batch automatic annotation, and manual review then corrects false and missed detections. This semi-automated approach improves annotation efficiency.
[0010] As a further limitation of this technical solution, S3 comprises: S31, improvements to the backbone network; S32, improvements to the neck network; and S33, improvements to the detection head.
[0011] As a further limitation of this technical solution, the specific process of S31 is as follows: the feature extraction module C2f in the YOLOv8 backbone network is replaced with the C2f-KAN module, which replaces the bottleneck module in C2f with a bottleneck-Kolmogorov-Arnold network (Bottleneck-KAN) module. Within the Bottleneck-KAN module, a detail enhancement module and a Kolmogorov-Arnold Gram polynomial network convolution (KAGN-Conv) are designed. The detail enhancement module consists of a depthwise separable convolution, a detail-enhanced convolution, and a convolutional attention fusion module. The depthwise separable convolution first extracts channel-independent spatial features from the input; these are normalized by batch normalization and then passed through the detail-enhanced convolution to strengthen the details of open flames and smoke. Finally, the self-attention mechanism of the convolutional attention fusion module models the relationship between any two positions in the features to capture global semantic features. KAGN-Conv adds a convolutional block attention module on top of the Kolmogorov-Arnold Gram polynomial network convolution. It has two parallel branches, a basic feature extraction branch and a multi-order feature extraction branch, whose outputs are summed at the end. After the basic convolution extracts local semantic features, the convolutional block attention module further enhances the representation of key channels and spatial locations. Through detail feature enhancement and the collaboration of global and local mechanisms, the resulting C2f-KAN module significantly improves the model's ability to distinguish between the background and open flame or smoke targets.
[0012] As a further limitation of this technical solution, the Kolmogorov-Arnold Gram polynomial network convolution is generated by combining Kolmogorov-Arnold network convolution with Gram polynomials. Gram polynomials are discrete Chebyshev polynomials, that is, orthogonal polynomials with good approximation properties. Their definition satisfies equation (1), in which the Gram polynomial is evaluated at x, the feature value of the input data; N denotes the order of the polynomial; n and m are integer indices; and the weighting function is discrete, usually a Dirac weighting function. The δ function is given by equation (2), in which r is an integer index giving the location of a discrete point. The Kolmogorov-Arnold Gram polynomial network convolutional layer is a Kolmogorov-Arnold network convolutional layer based on Gram polynomials: it replaces the traditional spline function in the Kolmogorov-Arnold network convolution with Gram polynomials. The resulting convolution is expressed by equation (3), in which the function defined is an activation function; equation (4), which contains an activation function of the neural network together with the trainable weights of the residual activation term; and equation (5), which contains the i-th Gram polynomial of order n together with trainable polynomial-coefficient weights.
[0013] As a further limitation of this technical solution, the specific process of S32 is as follows: the feature maps from the three levels of the backbone network first undergo deep feature fusion through an FFM (Feature Fusion Module). A series of upsampling and downsampling operations is then performed, and the C2f modules generate feature maps at three scales, which are fed into the second feature fusion module for further fusion of multi-scale features. Before the features are fed into the detection head, they pass through a simple attention module (SimAM) to further enhance the model's ability to focus on key information.
[0014] As a further limitation of this technical solution, the specific process of S33 is as follows: the 3×3 convolutions in the YOLOv8 detection head are replaced with detail-enhanced convolutions with shared parameters, and the normalization method in the detection head is changed from batch normalization to group normalization.
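For orientation, the block below writes out, as an assumption, a form commonly used for Gram-polynomial KAN layers. The symbols $P_n$, $x_r$, $M$, $\beta_n$, $w_b$, $w_i$, and $\phi$ are illustrative and are not claimed to match equations (1) to (5) term by term. It assumes $M$ equally spaced grid points, symmetric about 0, with unit Dirac weights; symmetry is what lets the monic recurrence drop its linear shift term.

```latex
% Assumed generic form, not a transcription of equations (1)-(5).
% Discrete orthogonality on M symmetric, equally spaced points x_r:
\sum_{r=1}^{M} P_n(x_r)\,P_m(x_r) = 0 \qquad (n \neq m)
% Monic three-term recurrence:
P_0(x) = 1, \qquad P_1(x) = x, \qquad
P_{n+1}(x) = x\,P_n(x) - \beta_n\,P_{n-1}(x), \qquad
\beta_n = \frac{\sum_{r} P_n(x_r)^2}{\sum_{r} P_{n-1}(x_r)^2}
% KAN-style activation: residual SiLU term plus a weighted Gram expansion of order N
\phi(x) = w_b\,\mathrm{SiLU}(x) + \sum_{i=0}^{N} w_i\,P_i(x), \qquad
\mathrm{SiLU}(x) = \frac{x}{1 + e^{-x}}
```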
[0015] The open flame and smoke detection method based on an improved YOLOv8 according to claim 8 is characterized in that, to solve the cross-scale consistency problem, a learnable scaling operation is introduced to dynamically adjust the bounding box regression components. The scaling operation is expressed as $Y = s \cdot X$ (6), where $X$ represents the input features and $s$ is the learnable scaling factor.
[0016] Compared with the prior art, the advantages and positive effects of the present invention are as follows. This invention provides a flame and smoke detection method based on an improved YOLOv8 architecture. A novel feature extraction module, C2f-KAN, is designed by fusing KAGN convolutions into the YOLOv8 backbone network, effectively enhancing the network's ability to distinguish between the background and flame or smoke targets. A feature fusion and allocation structure integrating FFM and SimAM is designed: FFM effectively aggregates multi-scale features, improving the network's ability to detect small flame targets, while SimAM improves the accuracy of flame and smoke detection without introducing any parameters. By applying parameter-sharing detail-enhanced convolutions and group normalization, the number of parameters in the detection head is reduced while detection accuracy still improves.
Attached Figure Description
[0017] Figure 1 is a schematic flowchart of the method of the present invention.
[0018] Figure 2 is a diagram of the overall network structure of the present invention.
[0019] Figure 3 is a schematic diagram of the backbone network portion of the present invention.
[0020] Figure 4 is a schematic diagram of the neck network portion of the present invention.
[0021] Figure 5 is a structural diagram of the detail-enhanced convolution of the present invention.
[0022] Figure 6 is a structural diagram of the detail-enhanced detection head of the present invention.
[0023] Figure 7 is a schematic diagram of detection result 1 of the present invention.
[0024] Figure 8 is a schematic diagram of detection result 2 of the present invention.
Detailed Implementation
[0025] The following detailed description of a specific embodiment of the present invention is provided in conjunction with the accompanying drawings. However, it should be understood that the scope of protection of the present invention is not limited to the specific embodiment.
[0026] This invention includes the following steps: S1: Process and label the collected surveillance video data of open flames and smoke to create a dataset.
[0027] In S1: the data processing step samples the surveillance video data of open flames and smoke at an interval of 10 frames to generate a sequence of static images. The target annotation step combines the advantages of manual and automatic annotation: first, part of the data is annotated manually; the YOLOv8 (You Only Look Once version 8) network is then trained on the annotated data, and the resulting weight file is combined with the automatic annotation function of X-AnyLabeling-GPU to build an intelligent annotation process.
[0028] The specific process of S1 is as follows: model inference is first used for batch automatic annotation, and manual review then corrects false and missed detections. This semi-automated approach improves annotation efficiency.
[0029] S2: The dataset is expanded using data augmentation methods, including mosaic augmentation (four images stitched together, applied with 100% probability), random horizontal flipping (50% of the images are flipped left to right), random scaling (scaling range ±50%), and random changes in image brightness (variation range [0, 0.4]). The dataset is divided into training and test sets in a 4:1 ratio.
[0030] S3: Improvements based on the YOLOv8 network: a C2f-KAN (Faster Implementation of CSP Bottleneck with 2 convolutions-Kolmogorov–Arnold Networks) module is added to the YOLOv8 backbone network; a Feature Fusion Module (FFM) is added to the neck network of YOLOv8, and a feature fusion allocation structure is designed; and Detail-Enhanced Convolution (DEConv) is introduced into the detection head of YOLOv8, where a detail-enhanced detection head is designed using shared convolutional layers and group normalization. The specific process of S3 is as follows. S31, improvements to the backbone network: the feature extraction module C2f-KAN is designed to replace C2f, and Bottleneck-KAN is constructed from KAGN-Conv and DBM to replace the Bottleneck in C2f. By introducing KAGN-Conv, C2f-KAN can map the input data to a higher-dimensional space, expanding the perceptual range of the input features, capturing more complex patterns and relationships, enhancing the network's ability to extract semantic features, and distinguishing between background, open flame, and smoke. The specific process of S31 is as follows: the feature extraction module C2f (Faster Implementation of CSP Bottleneck with 2 convolutions) in the YOLOv8 backbone network is replaced with the C2f-KAN module, which replaces the Bottleneck module in C2f with the Bottleneck-Kolmogorov–Arnold Networks (Bottleneck-KAN) module. Within the Bottleneck-KAN module, a DetailBoost Module (DBM) and a Kolmogorov–Arnold Gram polynomials Networks-Conv (KAGN-Conv) are designed. The DetailBoost Module consists of a depthwise separable convolution, a detail-enhanced convolution, and a convolutional attention fusion module.
The depthwise separable convolution first extracts channel-independent spatial features from the input; these are normalized by batch normalization and then passed through the detail-enhanced convolution to strengthen the details of open flame and smoke features. Finally, the self-attention mechanism of the Convolution and Attention Fusion Module (CAFM) models the relationship between any two positions in the features to capture global semantic features. KAGN-Conv introduces an attention mechanism, the Convolutional Block Attention Module (CBAM), on top of the Kolmogorov–Arnold Gram polynomials Networks (KAGN) convolution. KAGN-Conv has two parallel branches, a basic feature extraction branch and a multi-order feature extraction branch, whose outputs are summed at the end. After the basic convolution extracts local semantic features, CBAM further enhances the representation of key channels and spatial locations. Through detail feature enhancement and the collaboration of global and local mechanisms, the resulting C2f-KAN module significantly improves the model's ability to distinguish between the background and open flame or smoke targets.
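The 10-frame interval sampling described above can be sketched as keeping frames 0, 10, 20, and so on from any frame stream. This is a minimal illustration, assuming frames arrive as a Python iterable; the function name `sample_every_nth` is hypothetical, and the commented OpenCV reader is only one possible frame source, not the patent's tooling.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_every_nth(frames: Iterable[T], step: int = 10) -> Iterator[T]:
    """Yield frames 0, step, 2*step, ... from any iterable of frames."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return islice(frames, 0, None, step)


# Possible frame source (requires opencv-python; not executed here):
# import cv2
# def read_frames(path):
#     cap = cv2.VideoCapture(path)
#     while True:
#         ok, frame = cap.read()
#         if not ok:
#             break
#         yield frame
#     cap.release()

if __name__ == "__main__":
    # Stand-in for a 35-frame clip: frame indices only.
    print(list(sample_every_nth(range(35))))
```

Because `islice` is lazy, only the retained frames need to be written to disk as the static image sequence.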
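The 4:1 division in S2 can be sketched as a seeded shuffle followed by a cut at 80%. This is an illustration under assumptions: the seed value and the function name `split_train_test` are hypothetical, and items would in practice be image paths.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(items: Sequence[T], train_parts: int = 4, test_parts: int = 1,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly split items into train/test subsets in a train_parts:test_parts ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_train = round(len(shuffled) * train_parts / (train_parts + test_parts))
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    train, test = split_train_test([f"img_{i:04d}.jpg" for i in range(100)])
    print(len(train), len(test))
```

One design consideration: frames sampled from the same video clip can be near-duplicates, so splitting by clip rather than by frame may give a less optimistic test score; the text does not say which was done.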
[0031] The Kolmogorov-Arnold Gram polynomial network convolution is generated by combining Kolmogorov-Arnold Network (KAN) convolution with Gram polynomials. Gram polynomials are discrete Chebyshev polynomials, that is, orthogonal polynomials with good approximation properties. Their definition satisfies equation (1), in which the Gram polynomial is evaluated at x, the feature value of the input data; N denotes the order of the polynomial; n and m are integer indices; and the weighting function is discrete, usually a Dirac weighting function. The δ function is given by equation (2), in which r is an integer index giving the location of a discrete point. The Kolmogorov-Arnold Gram polynomial network convolutional layer is a Kolmogorov-Arnold network convolutional layer based on Gram polynomials: it replaces the traditional spline function in the Kolmogorov-Arnold network convolution with Gram polynomials. The resulting convolution is expressed by equation (3), in which the function defined is an activation function, and by equation (4), in which the activation function of the neural network is SiLU (Sigmoid Linear Unit). For large positive x, SiLU grows approximately linearly and approaches x; for large negative x, it approaches 0, suppressing negative inputs; at x = 0 the output is 0 and the derivative is 0.5, and the curve is smooth everywhere. The advantages of SiLU are that it alleviates the vanishing gradient problem, helping deep neural networks maintain gradient magnitude and propagate information; it has strong nonlinear expressive power and can flexibly fit complex functional relationships; and its smooth curve is beneficial for optimization.
[0032] Equation (4) also contains the trainable weights of the residual activation term. Equation (5) contains the i-th Gram polynomial; Gram polynomials are mutually orthogonal and can effectively capture higher-order features of the input data. Its remaining coefficients are trainable polynomial weights.
[0033] This new form allows KAN convolutions to leverage the orthogonality of Gram polynomials to enhance feature representations.
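A minimal pure-Python sketch of the ingredients above, under stated assumptions: the Gram polynomials are taken as the monic polynomials orthogonal on m equally spaced points with unit Dirac weights, built with a three-term recurrence whose β coefficients are computed from the grid so orthogonality can be checked. The tanh squashing, the grid size of 16, and the names `gram_betas`, `gram_polys`, and `kagn_activation` are hypothetical. Some open-source KAGN layers instead learn the β coefficients, and a real layer applies this per channel inside a convolution.

```python
import math
from typing import List


def silu(x: float) -> float:
    """SiLU (Sigmoid Linear Unit): x * sigmoid(x), written to avoid exp overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def gram_grid(m: int) -> List[float]:
    """m equally spaced points x_r = -1 + (2r - 1) / m, r = 1..m (symmetric about 0)."""
    return [-1.0 + (2 * r - 1) / m for r in range(1, m + 1)]


def gram_betas(grid: List[float], degree: int) -> List[float]:
    """Recurrence coefficients of the monic polynomials orthogonal on `grid`.

    Returns betas[0..degree-1]; betas[0] is unused. Requires degree < len(grid).
    """
    prev = [1.0] * len(grid)  # P_0 evaluated on the grid
    cur = list(grid)          # P_1 = x (grid mean is 0, so no shift is needed)
    betas = [0.0]
    for _ in range(1, degree):
        beta = sum(c * c for c in cur) / sum(p * p for p in prev)  # <P_n,P_n>/<P_n-1,P_n-1>
        betas.append(beta)
        prev, cur = cur, [x * c - beta * p for x, c, p in zip(grid, cur, prev)]
    return betas


def gram_polys(x: float, betas: List[float], degree: int) -> List[float]:
    """Evaluate P_0..P_degree at x via P_{n+1} = x * P_n - beta_n * P_{n-1}."""
    values = [1.0, x]
    for n in range(1, degree):
        values.append(x * values[n] - betas[n] * values[n - 1])
    return values[: degree + 1]


def kagn_activation(x: float, w_base: float, coeffs: List[float], betas: List[float]) -> float:
    """phi(x) = w_base * SiLU(x) + sum_i coeffs[i] * P_i(tanh(x)).

    tanh maps x into (-1, 1), where the grid lives (an assumption of this sketch).
    """
    polys = gram_polys(math.tanh(x), betas, len(coeffs) - 1)
    return w_base * silu(x) + sum(c * p for c, p in zip(coeffs, polys))


if __name__ == "__main__":
    grid = gram_grid(16)
    betas = gram_betas(grid, 4)
    table = [gram_polys(x, betas, 4) for x in grid]
    # Off-diagonal inner products on the grid should vanish (orthogonality).
    worst = max(abs(sum(row[n] * row[m] for row in table))
                for n in range(5) for m in range(n))
    print(f"max |<P_n, P_m>| for n != m: {worst:.2e}")
    print(kagn_activation(0.3, w_base=1.0, coeffs=[0.1, -0.2, 0.05], betas=betas))
```

The residual SiLU term keeps a well-behaved baseline response, while the orthogonal polynomial terms add higher-order shape without the coefficients fighting each other, which is the property the paragraph above attributes to Gram polynomials.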
[0034] Improvements to the S32 neck network: the neck network introduces a Feature Fusion Module (FFM) and an attention module (SimAM). FFM combines depthwise convolutions with a parallel Identity branch to fuse multi-scale features, ensuring dimensionality matching and flexibility, and uses DropPath regularization to improve training efficiency. In the neck network, two FFMs fuse the multi-scale features twice. This fusion enhances the detailed features of small targets in the context of global information, improving the accuracy of small target detection. The SimAM attention mechanism generates attention weights from an energy function that measures how distinct each pixel in a feature map channel is from the other pixels in that channel, improving the quality of the feature map representation without adding parameters.
[0035] The specific process of S32 is as follows: the feature maps from the three levels of the backbone network (P3, P4, and P5, obtained by downsampling the input image by 8, 16, and 32 times, respectively) first undergo deep feature fusion through an FFM. A series of upsampling and downsampling operations is then performed, and the C2f modules generate feature maps at three scales, which are fed into the second feature fusion module for further fusion of multi-scale features. The feature fusion module uses Adown downsampling convolution, which combines average pooling and max pooling and uses small convolution kernels and a branching structure to preserve important features while reducing computation during downsampling. Before the features are fed into the detection head, they pass through a Simple Attention Module (SimAM) to further enhance the model's ability to focus on key information.
[0036] Improvements to the S33 detection head: in the detail-enhanced detection head (DEDH), each of the three detection layers output by the neck network is first processed by a 1×1 convolution, and information is then aggregated through two detail-enhanced convolutional layers with shared parameters, which reduces redundant information and strengthens the learning of adjacent features, capturing target details more accurately. Replacing batch normalization with group normalization keeps training stable with small batches and keeps model accuracy consistent across different batch sizes.
[0037] The specific process of S33 is as follows: the 3×3 convolutions in the YOLOv8 detection head are replaced with detail-enhanced convolutions with shared parameters, and the normalization method in the detection head is changed from batch normalization to group normalization.
[0038] To address the cross-scale consistency problem, a learnable scaling operation is introduced to dynamically adjust the bounding box regression components. The scaling operation is expressed as $Y = s \cdot X$ (6), where $X$ represents the input features and $s$ is the learnable scaling factor, initialized to 1.0 and optimized automatically during training through backpropagation.
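The SimAM step above can be illustrated for a single channel with the closed-form energy used in SimAM's published reference code, as we understand it: each pixel's squared deviation from the channel mean, divided by four times the channel variance plus a small λ, plus 0.5, passed through a sigmoid. The λ value of 1e-4 and the function name `simam_channel` are assumptions; real code applies this to every channel of a tensor at once.

```python
import math
from typing import List

Grid = List[List[float]]


def simam_channel(x: Grid, e_lambda: float = 1e-4) -> Grid:
    """Parameter-free SimAM weighting for one H x W channel (needs at least 2 pixels)."""
    values = [v for row in x for v in row]
    n = len(values) - 1                          # H*W - 1 "other" neurons
    mu = sum(values) / len(values)
    sq = [[(v - mu) ** 2 for v in row] for row in x]
    var = sum(sum(row) for row in sq) / n
    out: Grid = []
    for row, sq_row in zip(x, sq):
        out_row = []
        for v, d in zip(row, sq_row):
            inv_energy = d / (4.0 * (var + e_lambda)) + 0.5  # larger for distinctive pixels
            out_row.append(v / (1.0 + math.exp(-inv_energy)))  # v * sigmoid(inv_energy)
        out.append(out_row)
    return out


if __name__ == "__main__":
    # The outlier pixel (5.0) receives a larger attention weight than its neighbours.
    print(simam_channel([[1.0, 1.0], [1.0, 5.0]]))
```

No weights are learned here, which matches the text's statement that SimAM adds no parameters.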
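Why group normalization removes the batch-size dependence can be seen from a per-sample sketch: mean and variance are computed over groups of channels within one image, never across images. This is an illustrative stdlib version that omits the learnable affine γ and β; in PyTorch the corresponding layer is `torch.nn.GroupNorm(num_groups, num_channels)`. The function name and the group count are hypothetical.

```python
import math
from typing import List


def group_norm(x: List[List[float]], num_groups: int, eps: float = 1e-5) -> List[List[float]]:
    """Group-normalize ONE sample; x[c] is the flattened spatial map of channel c.

    Statistics come from each group of channels in this sample only (no batch axis).
    """
    c = len(x)
    if c % num_groups:
        raise ValueError("channel count must be divisible by num_groups")
    per_group = c // num_groups
    out: List[List[float]] = []
    for g in range(num_groups):
        chans = x[g * per_group:(g + 1) * per_group]
        vals = [v for ch in chans for v in ch]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * inv_std for v in ch] for ch in chans])
    return out


if __name__ == "__main__":
    sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]  # 4 channels, 2 pixels each
    print(group_norm(sample, num_groups=2))
```

Batch normalization, by contrast, would estimate statistics across the batch, which becomes noisy at batch size 8 (the value used in S4).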
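The Scale operation in equation (6) is a single learnable scalar multiplied into the regression output. The sketch below, under assumptions, shows its initialization at 1.0 and how a gradient step moves it; the mean-squared-error loss and the manual gradient stand in for the real detection loss and autograd (in PyTorch one would typically hold the scalar as `nn.Parameter(torch.tensor(1.0))`). The class and method names are hypothetical.

```python
from typing import List


class Scale:
    """Learnable scalar s applied as y = s * x, initialized to 1.0."""

    def __init__(self, init: float = 1.0) -> None:
        self.s = init

    def forward(self, xs: List[float]) -> List[float]:
        return [self.s * x for x in xs]

    def sgd_step(self, xs: List[float], targets: List[float], lr: float) -> float:
        """One SGD step on mean((s*x - t)^2); returns the gradient used."""
        # d/ds mean((s*x - t)^2) = mean(2 * (s*x - t) * x)
        grad = sum(2.0 * (self.s * x - t) * x for x, t in zip(xs, targets)) / len(xs)
        self.s -= lr * grad
        return grad


if __name__ == "__main__":
    scale = Scale()
    print(scale.forward([1.0, 2.0]))   # identity at initialization
    for _ in range(50):
        scale.sgd_step([1.0, 2.0], [2.0, 4.0], lr=0.1)
    print(round(scale.s, 6))           # converges toward 2.0 for these toy targets
```

Starting at 1.0 means the module initially leaves the head's output unchanged, so it only departs from identity where training finds a per-level rescaling useful.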
[0039] In this model, the Scale module is used as an auxiliary adjustment in the bounding box regression channel to refine the output of the detection head and improve the target localization accuracy.
[0040] S4: Feed the training set data into the improved YOLOv8 network for training to obtain the model.
[0041] S5: The model saves the weights from the last training iteration and the weight file with the best performance (these weights are derived by comparing the evaluation results of different training rounds).
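S5 keeps two checkpoints: the most recent and the best by evaluation score. The text does not define "best"; this sketch assumes the fitness used by default in Ultralytics YOLOv8 (0.1 × mAP@0.5 + 0.9 × mAP@0.5:0.95), where the checkpoints are saved as `last.pt` and `best.pt`. The metric dictionary keys and function names are hypothetical.

```python
from typing import Dict, List, Tuple


def fitness(map50: float, map50_95: float) -> float:
    """Assumed Ultralytics-style fitness: mostly mAP@0.5:0.95, a little mAP@0.5."""
    return 0.1 * map50 + 0.9 * map50_95


def track_checkpoints(history: List[Dict[str, float]]) -> Tuple[int, int]:
    """Return (last_epoch, best_epoch) for per-epoch validation metrics.

    In a training loop, 'last' is overwritten every epoch and 'best' only
    when the fitness strictly improves.
    """
    best_epoch, best_fit = -1, float("-inf")
    for epoch, metrics in enumerate(history):
        f = fitness(metrics["map50"], metrics["map50_95"])
        if f > best_fit:
            best_epoch, best_fit = epoch, f
    return len(history) - 1, best_epoch
```

Weighting mAP@0.5:0.95 heavily favors checkpoints with tight boxes, not just correct detections.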
[0042] S6: Use the test set for performance evaluation of the model.
[0043] Example: S1: The acquired surveillance video data is split into static image sequences frame by frame. Image data is collected using the interval frame extraction method. Frames are extracted with a sampling step size of 10 frames. The X-AnyLabeling-GPU annotation tool is used to annotate open flame and smoke targets and export them in YOLO format.
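Exporting "in YOLO format" means one text line per box: the class index followed by the box center, width, and height, each normalized by the image size. A minimal conversion sketch, assuming boxes arrive as pixel corners (x1, y1, x2, y2); the class mapping (0 for fire, 1 for smoke) and the function name are hypothetical.

```python
from typing import Tuple

CLASS_IDS = {"fire": 0, "smoke": 1}  # hypothetical class mapping


def to_yolo_line(cls_id: int, box_xyxy: Tuple[float, float, float, float],
                 img_w: int, img_h: int) -> str:
    """Convert a pixel box (x1, y1, x2, y2) to 'class xc yc w h' with coordinates in [0, 1]."""
    x1, y1, x2, y2 = box_xyxy
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    print(to_yolo_line(CLASS_IDS["fire"], (100, 200, 300, 400), 640, 640))
```

Each image gets a `.txt` file with one such line per annotated flame or smoke region.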
[0044] S2: The dataset is augmented using color and geometric augmentation methods, including mosaic augmentation (four images stitched together, applied with 100% probability), random horizontal flipping (50% of the images are flipped left to right), random scaling (scaling range ±50%), and random changes in image brightness (variation range [0, 0.4]). The dataset is randomly divided into training and test sets in a 4:1 ratio.
[0045] S3: Build the improved YOLOv8 model. As shown in Figure 2, the model comprises a backbone network, a neck network, and a detection head. The backbone network is responsible for feature extraction, with the C2f-KAN module as its basic building block; residual connections and bottleneck structures reduce the network size and improve performance. Within the C2f-KAN module, KAGN-Conv and DBM (DetailBoost Module) enhance the network's ability to distinguish between background, open flame, and smoke. The neck network incorporates FFM (Feature Fusion Module) and SimAM modules for deep fusion of multi-scale information. The Detail Enhancement Detection Head (DEDH) improves detection accuracy while reducing the number of parameters by introducing two shared detail-enhanced convolutional layers and group normalization.
[0046] This invention designs the DBM and KAGN-Conv modules shown in Figure 3. DBM consists of a depthwise separable convolution, a detail-enhanced convolution (DEConv, whose structure is shown in Figure 5), and a convolutional attention fusion module (CAFM). The input features of DBM are first processed by the depthwise separable convolution to extract local spatial information; this convolution operates independently on each input channel without mixing channel information. Batch Normalization is then applied for regularization, followed by DEConv to extract detailed features of open flame and smoke targets. Finally, the self-attention in CAFM models the relationships between features, capturing long-range dependencies and global semantics. KAGN-Conv introduces a CBAM attention mechanism on top of the KAGN convolution. It contains two parallel branches, a basic feature extraction branch and a multi-order feature extraction branch, whose outputs are summed at the end. The input features are duplicated to retain the original data, and the SiLU activation function is applied to enhance nonlinear representation. The basic convolution extracts local features, CBAM enhances the representation of key channels and spatial locations, and stability is improved through tanh activation, normalization, and Dropout. The normalized features are fed to a Gram polynomial generator, which recursively generates multi-order features that are then processed by higher-order convolutions to extract global semantic information. Finally, the higher-order features are combined with the basic features, and an enhanced feature representation is obtained through layer normalization and SiLU activation. DBM and KAGN-Conv are combined to form Bottleneck-KAN, which replaces the Bottleneck in the C2f module, thus forming C2f-KAN.
This module design enables the model to synthesize information at multiple scales and levels, improving the model's ability to capture comprehensive information from the gradient flow and enhancing its ability to distinguish between background and open flames and smoke. KAGN convolution is generated by combining KAN convolution with Gram polynomials. Gram polynomials are discrete Chebyshev polynomials, which are orthogonal polynomials with good approximation.
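The detail-enhanced convolution (DEConv) used in DBM is commonly associated with DEA-Net, where, as we understand it, a vanilla convolution runs in parallel with several difference convolutions (central, angular, horizontal, vertical) and all are merged into one kernel for inference. The sketch below covers only the vanilla plus central-difference pair, to show why the merge is exact: the sum of k_i (x_i − x_c) equals a plain convolution whose center tap is reduced by the sum of the kernel. All function names are hypothetical.

```python
from typing import List

Mat = List[List[float]]


def conv2d_valid(x: Mat, k: Mat) -> Mat:
    """Single-channel cross-correlation without padding (what deep-learning 'conv' computes)."""
    kh, kw = len(k), len(k[0])
    return [[sum(k[i][j] * x[r + i][c + j] for i in range(kh) for j in range(kw))
             for c in range(len(x[0]) - kw + 1)]
            for r in range(len(x) - kh + 1)]


def central_diff_conv(x: Mat, k: Mat) -> Mat:
    """Central difference convolution: sum_i k_i * (x_i - x_center) over each window."""
    kh, kw = len(k), len(k[0])
    ci, cj = kh // 2, kw // 2
    return [[sum(k[i][j] * (x[r + i][c + j] - x[r + ci][c + cj])
                 for i in range(kh) for j in range(kw))
             for c in range(len(x[0]) - kw + 1)]
            for r in range(len(x) - kh + 1)]


def merge_kernels(k_vanilla: Mat, k_cd: Mat) -> Mat:
    """Re-parameterize vanilla conv + central difference conv into ONE kernel.

    sum_i k_i (x_i - x_c) = sum_i k_i x_i - (sum_i k_i) x_c, so the difference
    branch equals a plain kernel with sum(k_cd) subtracted at the center tap.
    """
    kh, kw = len(k_cd), len(k_cd[0])
    total = sum(sum(row) for row in k_cd)
    merged = [[k_vanilla[i][j] + k_cd[i][j] for j in range(kw)] for i in range(kh)]
    merged[kh // 2][kw // 2] -= total
    return merged
```

The training-time branches add gradient-like detail sensitivity, while the merged single kernel keeps inference cost equal to one ordinary 3×3 convolution.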
[0047] The KAGN convolutional layer is a KAN convolutional layer based on Gram polynomials. It combines KAN and Gram polynomials by replacing the traditional spline function in KAN convolution with Gram polynomials.
[0048] The FFM structure is shown in Figure 4. FFM combines depthwise convolutions with a parallel Identity branch to ensure dimensionality matching and flexibility, and improves training efficiency through DropPath regularization. Its core idea is to capture local information with small-kernel convolutions while extracting features at multiple scales through parallel depthwise convolutions. Specifically, feature map P3 (8x downsampling) is downsampled by the Adown module, with one part downsampled by a 3×3 convolution and the other part by max pooling followed by convolution, producing feature K3. By combining average pooling and max pooling with small convolution kernels and a branching structure, the Adown module preserves important features during downsampling. Feature P4 (16x downsampling) is used directly as K4, and feature P5 (32x downsampling) is upsampled and its channel count adjusted by a 1×1 convolution to generate K5. Finally, K3, K4, and K5 are concatenated along the channel dimension to form K, processed by four depthwise convolutional layers, and superimposed with K to generate a fused feature. The fused feature is further processed by a 1×1 convolution and added to K through a residual connection to produce the final output. Each feature map passes through the SimAM attention module before being fed into the detection head, where attention weights are computed to refine the feature representation.
[0049] The structure of DEDH is shown in Figure 6. DEDH combines shared detail-enhanced convolutions for feature extraction and information aggregation to balance lightweight design and detection accuracy. Each of the three detection layers in the detection head is processed by a 1×1 convolution and then by two shared-parameter detail-enhanced convolution layers for information aggregation, reducing redundant information and strengthening the learning of adjacent features, thereby capturing target details more accurately. To improve stability, the Batch Normalization (BN) layers are replaced by Group Normalization (GN) layers, which removes the dependence on batch size and keeps training stable and accuracy consistent with small batches. Detection layers at different scales differ in their sensitivity to target size: low-resolution layers are better suited to large targets, while high-resolution layers are better suited to small targets. To address this cross-scale consistency issue, a learnable scaling operation (Scale) is introduced to dynamically adjust the bounding box regression components.
[0050] S4: Set the input image size to 640×640, the number of training epochs to 300, and the batch size to 8, and train with the SGD optimizer.
[0051] S5: Weight saving: the model saves the weights from the last training epoch and the best-performing weights.
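The FFM alignment above brings P3, P4, and P5 to the P4 resolution before concatenation. The sketch tracks (channels, height, width) for a 640×640 input; the backbone channel counts (128, 256, 512) and the K3/K5 output widths are hypothetical, and integer floor division is a simplification of real padding behavior.

```python
from typing import Dict, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def ffm_shapes(img: int, backbone_ch: Tuple[int, int, int],
               k3_ch: int, k5_ch: int) -> Dict[str, Shape]:
    """Track (C, H, W) through FFM's input alignment for a square img x img input."""
    c3, c4, c5 = backbone_ch
    s3, s4, s5 = img // 8, img // 16, img // 32
    shapes: Dict[str, Shape] = {"P3": (c3, s3, s3), "P4": (c4, s4, s4), "P5": (c5, s5, s5)}
    shapes["K3"] = (k3_ch, s3 // 2, s3 // 2)  # Adown: stride-2 downsampling of P3
    shapes["K4"] = shapes["P4"]               # P4 is used directly
    shapes["K5"] = (k5_ch, s5 * 2, s5 * 2)    # 2x upsampling of P5, then 1x1 conv to k5_ch
    if not (shapes["K3"][1:] == shapes["K4"][1:] == shapes["K5"][1:]):
        raise ValueError("K3, K4, K5 must share one spatial size to be concatenated")
    shapes["K"] = (k3_ch + c4 + k5_ch, s4, s4)  # channel-wise concatenation
    return shapes


if __name__ == "__main__":
    for name, shape in ffm_shapes(640, (128, 256, 512), k3_ch=128, k5_ch=256).items():
        print(name, shape)
```

Fusing at the middle resolution lets small-target detail from P3 and context from P5 meet in one tensor before the depthwise layers.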
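The parameter saving from sharing the two detail-enhanced convolutions across the three detection layers can be checked with simple arithmetic. This sketch assumes C input and output channels per 3×3 convolution and ignores biases, normalization parameters, the per-level 1×1 convolutions, and the extra training-time branches of DEConv; the function names are hypothetical.

```python
def conv_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Weights of a k x k convolution (bias optional; norm-layer parameters ignored)."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def head_conv_params(channels: int, levels: int = 3, convs_per_level: int = 2,
                     shared: bool = True) -> int:
    """3x3 conv parameters when each detection level applies `convs_per_level` C->C convs.

    With sharing, one set of convolutions serves all levels.
    """
    per_set = convs_per_level * conv_params(channels, channels, 3)
    return per_set if shared else levels * per_set


if __name__ == "__main__":
    c = 64  # hypothetical head width
    print(head_conv_params(c, shared=False), head_conv_params(c, shared=True))
```

Under these assumptions sharing cuts the 3×3 weights by exactly the number of levels (3×), which is why a per-level Scale is then needed to restore level-specific regression magnitudes.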
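The S4 training settings and the S2 augmentation settings can be collected into one configuration. This sketch assumes an Ultralytics YOLOv8 code base, where these keys are training arguments; the augmentation values shown coincide with that framework's defaults, and mapping the brightness range to `hsv_v=0.4` is an assumption. The model and dataset YAML names in the comment are hypothetical.

```python
# S4 training settings plus S2 augmentation settings, as Ultralytics-style
# keyword arguments (assumption: an Ultralytics YOLOv8 code base).
TRAIN_ARGS = {
    "imgsz": 640,        # input size 640x640
    "epochs": 300,
    "batch": 8,
    "optimizer": "SGD",
    "mosaic": 1.0,       # 4-image mosaic, probability 100%
    "fliplr": 0.5,       # horizontal flip probability 50%
    "scale": 0.5,        # random scaling gain of +/-50%
    "hsv_v": 0.4,        # brightness (HSV value) variation fraction
}

# Hypothetical usage (requires the ultralytics package and custom YAML files):
# from ultralytics import YOLO
# model = YOLO("yolov8-c2fkan-ffm-dedh.yaml")        # hypothetical model config
# model.train(data="fire_smoke.yaml", **TRAIN_ARGS)  # hypothetical dataset config

if __name__ == "__main__":
    for key, value in TRAIN_ARGS.items():
        print(f"{key}={value}")
```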
[0052] S6: Use the test set for performance evaluation of the model.
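The text does not name the metrics used in S6. Assuming standard detection metrics, the basic building blocks are IoU and precision and recall at an IoU threshold of 0.5 (also the inputs to mAP). The sketch below handles one image and one class with greedy matching in descending confidence; the function names are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: Sequence[Tuple[Box, float]], gts: Sequence[Box],
                     thr: float = 0.5) -> Tuple[float, float]:
    """Greedy one-to-one matching: predictions in descending confidence each claim
    the unmatched ground truth with the highest IoU >= thr."""
    matched = [False] * len(gts)
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            if not matched[j]:
                v = iou(box, gt)
                if v >= best_iou:
                    best_j, best_iou = j, v
        if best_j >= 0:
            matched[best_j] = True
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall
```

A false positive from a flame-like reflection lowers precision, and a missed small flame lowers recall, which maps directly onto the two failure modes the background section describes.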
[0053] The above-disclosed embodiments are merely specific examples of the present invention. However, the present invention is not limited thereto, and any variations that can be conceived by those skilled in the art should fall within the protection scope of the present invention.
Claims
1. A method for detecting open flames and smoke based on an improved YOLOv8, characterized in that it comprises the following steps: S1: processing and labeling collected surveillance video data of open flames and smoke to create a dataset; S2: expanding the dataset with data augmentation methods, including mosaic augmentation, random horizontal flipping, random scaling, and random brightness changes, and dividing the dataset into training and test sets at a 4:1 ratio; S3: improving the YOLOv8 network by adding a C2f-KAN module to the backbone network, adding a feature fusion module to the neck network and designing a feature fusion allocation structure, and introducing detail-enhancement convolution into the detection head and designing a detail-enhancement detection head with shared convolutional layers and group normalization; S4: feeding the training set into the improved YOLOv8 network for training to obtain the model; S5: saving the weights from the final training epoch and the weight file that yields the best results; S6: evaluating the performance of the model on the test set.
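Two parts of S2 are simple enough to sketch: the 4:1 train/test split and the label update that random horizontal flipping requires. The sketch assumes YOLO-format labels (class, x_center, y_center, width, height, normalized to [0, 1]) and represents an image as a list of pixel rows; the fixed seed is a hypothetical choice for reproducibility. Mosaic, scaling, and brightness augmentation are not shown.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle reproducibly and split; train_fraction=0.8 gives the 4:1 ratio of S2."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_fraction)
    return items[:n_train], items[n_train:]


def hflip(image, labels):
    """Horizontally flip an image (list of pixel rows) and its YOLO labels.

    labels: (cls, x_center, y_center, width, height), all normalized to [0, 1].
    Only x_center changes: it is mirrored about the vertical centre line.
    """
    flipped_image = [row[::-1] for row in image]
    flipped_labels = [(c, 1.0 - xc, yc, w, h) for c, xc, yc, w, h in labels]
    return flipped_image, flipped_labels
```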
2. The method for detecting open flames and smoke based on improved YOLOv8 according to claim 1, characterized in that, in S1: the data processing part samples the surveillance video data of open flames and smoke at an interval of 10 frames to generate a sequence of still images; the target annotation part combines the advantages of manual and automatic annotation: part of the data is first annotated manually, the YOLOv8 network is trained on the labeled data, and the resulting weight file is combined with the automatic annotation function of X-AnyLabeling-GPU to build an intelligent annotation process.
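The 10-frame sampling interval can be sketched with `itertools.islice`, which keeps frames 0, 10, 20, and so on from any frame iterator. The video reader assumes OpenCV (`cv2.VideoCapture`, whose `read()` returns an `(ok, frame)` pair) and is imported lazily, so the sampling logic does not depend on it.

```python
from itertools import islice


def sample_frames(frames, interval=10):
    """Keep frames 0, interval, 2*interval, ... from any iterable of frames."""
    return islice(frames, 0, None, interval)


def read_video_frames(path):
    """Yield decoded frames from a video file (requires opencv-python)."""
    import cv2  # imported lazily so sample_frames works without OpenCV

    cap = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of stream or read error
                break
            yield frame
    finally:
        cap.release()
```

A possible usage, with a hypothetical file name: iterate over `enumerate(sample_frames(read_video_frames("fire.mp4")))` and write each frame with `cv2.imwrite(f"frame_{i:05d}.jpg", frame)`.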
3. The method for detecting open flames and smoke based on improved YOLOv8 according to claim 1, characterized in that the specific process of S1 is as follows: model inference is first used for batch automatic annotation, and manual review is then used to correct false and missed detections. This semi-automated approach improves annotation efficiency.
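In the automatic-annotation pass, each predicted box has to be written out as a label that reviewers can then correct. The sketch below assumes the YOLO text label format ("class x_center y_center width height", normalized by image size) and pixel boxes given as (x1, y1, x2, y2). The class ids (for example 0 for fire and 1 for smoke) are hypothetical.

```python
def to_yolo_line(cls_id, box, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) to a YOLO label line.

    YOLO format: "class x_center y_center width height", coordinates normalized by image size.
    """
    x1, y1, x2, y2 = box
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


def predictions_to_label_text(predictions, img_w, img_h):
    """predictions: iterable of (cls_id, box) from model inference, before manual review."""
    return "\n".join(to_yolo_line(c, b, img_w, img_h) for c, b in predictions)
```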
4. The method for detecting open flames and smoke based on improved YOLOv8 according to claim 1, characterized in that the specific process of S3 is as follows: S31, improving the backbone network; S32, improving the neck network; S33, improving the detection head.
5. The method for detecting open flames and smoke based on improved YOLOv8 according to claim 4, characterized in that the specific process of S31 is as follows: the feature extraction module C2f in the YOLOv8 backbone network is replaced with the C2f-KAN module. The C2f-KAN module replaces the bottleneck module in C2f with a bottleneck-Kolmogorov-Arnold network module, within which a detail enhancement module and a Kolmogorov-Arnold Gram polynomial network convolution are designed. The detail enhancement module consists of a depthwise separable convolution, a detail-enhancement convolution, and a convolutional attention fusion module. The depthwise separable convolution first extracts channel-independent spatial features from the input; these are normalized by batch normalization and then passed through the detail-enhancement convolution to strengthen the details of open flames and smoke. Finally, the self-attention mechanism of the convolutional attention fusion module models the relationship between any two positions in the features to capture global semantic features. The Kolmogorov-Arnold Gram polynomial network convolution used here adds a convolutional block attention module (CBAM) on top of the standard Kolmogorov-Arnold Gram polynomial network convolution, which has two parallel branches, a basic feature extraction branch and a multi-order feature extraction branch, whose outputs are summed at the end.
After local semantic features are extracted by the basic convolution, the convolutional block attention module further enhances the representation of key channels and spatial locations. Through detail feature enhancement and the cooperation of global and local mechanisms, the resulting C2f-KAN module significantly improves the model's ability to distinguish open flame and smoke targets from the background.
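The channel half of CBAM, which re-weights key channels after the basic convolution, can be sketched in pure Python: average-pooled and max-pooled channel descriptors pass through one shared two-layer MLP, are summed, and go through a sigmoid to give one weight per channel. The bias-free MLP and the explicit weight matrices are simplifying assumptions, and CBAM's spatial-attention half (a convolution over channel-pooled maps) is not shown.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def shared_mlp(v, w1, w2):
    """Two-layer MLP shared by both pooled descriptors: C -> C/r (ReLU) -> C."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feature, w1, w2):
    """CBAM channel attention: M_c = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))).

    feature: list of C channels, each a flat list of H*W values.
    w1: (C/r) x C weights, w2: C x (C/r) weights (bias-free, as an assumption).
    Returns the reweighted feature and the per-channel weights.
    """
    avg = [sum(ch) / len(ch) for ch in feature]
    mx = [max(ch) for ch in feature]
    logits = [a + m for a, m in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    weights = [_sigmoid(z) for z in logits]
    return [[wt * v for v in ch] for wt, ch in zip(weights, feature)], weights
```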
6. The method for detecting open flames and smoke based on improved YOLOv8 according to claim 5, characterized in that the Kolmogorov-Arnold Gram polynomial network convolution is obtained by combining Kolmogorov-Arnold network convolution with Gram polynomials. Gram polynomials are discrete Chebyshev polynomials, a family of orthogonal polynomials with good approximation properties, whose definition satisfies formula (1), in which the Gram polynomial is evaluated at x, the feature value of the input data; N denotes the order of the polynomial; n and m are integer indices; and the weighting term is a discrete weighting function, usually a Dirac weighting function, expressed by formula (2), in which r is an integer index giving the location of a discrete point. The Kolmogorov-Arnold Gram polynomial network convolutional layer is a Kolmogorov-Arnold network convolutional layer based on Gram polynomials: replacing the spline functions of the traditional Kolmogorov-Arnold network convolution with Gram polynomials combines the Kolmogorov-Arnold network with Gram polynomials. After this combination, the Kolmogorov-Arnold network convolution is expressed by formula (3), in which the applied function is an activation function given by formula (4), where one term is an activation function of the neural network weighted by the trainable weights of the residual activation term, and the other term is given by formula (5), in which the Gram polynomial of order n is weighted by trainable polynomial coefficients.
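Since formulas (1) to (5) did not survive extraction, the sketch below only illustrates the ingredients the claim names. It computes discrete Chebyshev (Gram) polynomials with their standard three-term recurrence, which are orthogonal on the grid x = 0, 1, ..., N − 1. It then combines them into a KAN-style edge function of the form "weighted activation plus weighted polynomial sum". The choice of SiLU as the activation and the exact form of `gram_kan_activation` are assumptions modelled on the usual KAN layout, not the patent's formulas. Mapping real feature values onto the polynomial domain is also not shown.

```python
import math
from fractions import Fraction


def gram_basis(x, degree, num_points):
    """Discrete Chebyshev (Gram) polynomials t_0(x) ... t_degree(x).

    They are orthogonal on the grid x = 0, 1, ..., num_points - 1 and follow
    (n + 1) t_{n+1} = (2n + 1)(2x - N + 1) t_n - n (N^2 - n^2) t_{n-1}.
    """
    N = num_points
    basis = [1 + 0 * x]              # t_0 = 1 (same numeric type as x)
    if degree >= 1:
        basis.append(2 * x - N + 1)  # t_1
    for n in range(1, degree):
        t_next = ((2 * n + 1) * (2 * x - N + 1) * basis[n]
                  - n * (N * N - n * n) * basis[n - 1]) / (n + 1)
        basis.append(t_next)
    return basis


def silu(x):
    return x / (1.0 + math.exp(-x))


def gram_kan_activation(x, w_b, coeffs, num_points):
    """Hypothetical KAN edge function: w_b * SiLU(x) + sum_n c_n * t_n(x)."""
    basis = gram_basis(x, len(coeffs) - 1, num_points)
    return w_b * silu(x) + sum(c * t for c, t in zip(coeffs, basis))
```

Passing `Fraction` grid points keeps the arithmetic exact, so the discrete orthogonality (the sum over the grid of t_n · t_m is zero for n ≠ m) can be checked without rounding error.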
7. The method for detecting open flames and smoke based on improved YOLOv8 according to claim 5, characterized in that the specific process of S32 is as follows: the feature maps from three levels of the backbone network first undergo deep feature fusion in a feature fusion module (FFM), followed by a series of upsampling and downsampling operations; C2f modules then generate feature maps at three scales, which are fed into a second feature fusion module for further fusion of multi-scale features. Before being fed into the detection head, the features pass through a simple attention module (SimAM) to further enhance the model's ability to focus on key information.
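SimAM adds no learnable weights: each activation is scaled by the sigmoid of an inverse "energy" that grows with its squared distance from the channel mean. The pure-Python sketch below follows that formulation, using the channel variance computed with H·W − 1 in the denominator and a small regularizer λ. The default of 1e-4 is an assumed, commonly used value; the patent does not state one.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simam(feature, e_lambda=1e-4):
    """Parameter-free SimAM attention.

    feature: list of channels, each a list of rows (H x W floats), H * W >= 2.
    For each activation t: inverse energy = (t - mu)^2 / (4 * (var + lambda)) + 0.5,
    where mu and var are the channel's mean and variance (divided by H*W - 1).
    Output = t * sigmoid(inverse energy).
    """
    out = []
    for channel in feature:
        values = [v for row in channel for v in row]
        n = len(values) - 1
        if n < 1:
            raise ValueError("SimAM needs at least two positions per channel")
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / n
        out.append([
            [v * _sigmoid((v - mu) ** 2 / (4 * (var + e_lambda)) + 0.5) for v in row]
            for row in channel
        ])
    return out
```

Activations that stand out from their channel, such as a small bright flame region against a uniform background, receive weights closer to 1, while typical background activations are damped.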
8. The method for detecting open flames and smoke based on improved YOLOv8 according to claim 7, characterized in that the specific process of S33 is as follows: the 3×3 convolutions in the YOLOv8 detection head are replaced with shared-parameter detail-enhancement convolutions, and the normalization in the detection head is changed from batch normalization to group normalization.
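The lightweight effect of sharing comes from reusing one convolution stack across all three pyramid levels instead of giving each level its own. The arithmetic below counts weights for a stack of two C→C 3×3 convolutions. It treats each detail-enhancement convolution as a plain 3×3 convolution for counting purposes, which is an assumption, and it ignores the per-level 1×1 convolutions and normalization parameters.

```python
def conv_params(c_in, c_out, k=3, bias=False):
    """Learnable weights in a standard k x k convolution (plus optional bias)."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def head_conv_params(channels, n_layers=2, n_levels=3, shared=True):
    """Parameters of the 3x3 conv stack of a head that serves n_levels feature maps.

    shared=True: one stack reused by every level; shared=False: one stack per level.
    """
    per_stack = n_layers * conv_params(channels, channels)
    return per_stack if shared else per_stack * n_levels
```

With 64 channels, one 3×3 convolution has 64 × 64 × 9 = 36,864 weights. The shared two-layer stack therefore costs 73,728 weights, against 221,184 for three separate stacks, a threefold reduction for this part of the head.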
9. The method for detecting open flames and smoke based on improved YOLOv8 according to claim 8, characterized in that, to address the cross-scale consistency problem, a learnable scaling operation is introduced to dynamically adjust the bounding box regression components. The scaling operation is expressed by formula (6), in which X represents the input features, which are multiplied by a learnable scaling factor.
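Because the convolutions are shared across levels, the per-level learnable scale is what lets each pyramid level keep its own regression magnitude. The sketch below applies one shared regression function to every level and then multiplies its output by that level's own scalar. `Scale`, `shared_head_outputs`, and the toy regression function in the usage note are hypothetical names; in a real network the scalar would be a trainable parameter updated by backpropagation, initialised here to 1.0 as an assumption.

```python
class Scale:
    """Learnable scalar that rescales one pyramid level's box-regression output."""

    def __init__(self, init=1.0):
        self.value = init  # trained jointly with the network in a real model

    def __call__(self, features):
        return [self.value * v for v in features]


def shared_head_outputs(level_features, shared_fn, scales):
    """Apply one shared regression function to every level, then that level's own Scale."""
    return [scale(shared_fn(f)) for f, scale in zip(level_features, scales)]
```

For instance, with the shared function `lambda f: [x + 1 for x in f]` and scales of 1.0, 2.0, and 0.5, identical inputs `[1.0]` on the three levels produce `[2.0]`, `[4.0]`, and `[1.0]`: the same shared computation is rescaled differently per level.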