A progressive multi-scale attention perception field model system for infrared dangerous and hazardous gas leakage visualization intelligent detection

By improving the YOLOv8 model and combining multi-scale sensing field convolution and attention mechanisms, a GAS-YOLO network was constructed, which solved the problems of accuracy and efficiency in gas detection in complex industrial environments and achieved efficient and accurate infrared gas leak detection.

CN118470634B · Active · Publication Date: 2026-06-19 · HEFEI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HEFEI UNIV
Filing Date
2024-05-09
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing gas detection methods have unstable detection accuracy in complex industrial environments, making it difficult to achieve rapid response and large-scale detection. They also rely on manual detection, resulting in low efficiency. Dynamic intelligent vision detection is easily affected by moving objects, and traditional feature extraction methods are difficult to capture the intrinsic characteristics of gases.

Method used

A progressive multi-scale attention-based perceptual field model system is adopted, combined with an improved YOLOv8 model. It uses a multi-scale perceptual field convolution module, separable convolutional attention, and a progressive feature pyramid network to enhance feature representation and information interaction, and constructs a GAS-YOLO network to adapt to gas leak detection in infrared images.

Benefits of technology

It significantly improves the efficiency and accuracy of gas leak detection, reduces reliance on manual operation, adapts to complex industrial environments, and enhances safety performance.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention proposes a progressive multi-scale attention-based perception field model system for intelligent visualization detection of infrared hazardous gas leaks. Image data of gas leaks in real chemical environments is captured using an infrared thermal imager and preprocessed. MRFGConv is introduced into the YOLOv8 model to preserve detailed information and address the spatial attention feature sharing problem; SCA attention is designed and integrated into the C2f module to effectively couple multi-scale features, balancing performance and efficiency; the original YOLOv8 Neck structure is reconstructed using an AFPN structure to promote information interaction between non-adjacent levels, achieving high-level feature fusion. The images are input into the improved model for training and validation to obtain the optimal model, which is then integrated into the intelligent detection system. The system takes an infrared gas leak image as input and outputs the result through forward computation. This invention ensures accurate, real-time, and safe location and identification of gas leaks in complex environments.

Description

Technical Field

[0001] This invention belongs to the field of hazardous chemical gas leak detection technology, and in particular relates to a progressive multi-scale attention perception field model system for infrared hazardous chemical gas leak visualization and intelligent detection.

Background Technology

[0002] Rapid global urbanization and industrialization have brought unprecedented opportunities to the petrochemical, coal-fired power, and steel industries. However, these industries have also generated environmental and safety problems during economic development, primarily due to the large-scale use of flammable, explosive, colorless, and toxic gases. These gaseous substances (such as volatile organic compounds) not only pose direct harm to human health and the environment but can also lead to serious safety accidents. Since most chemical gases originate from the evaporation and dispersion of raw materials, semi-finished products, and finished products, timely and accurate detection of gas leaks is particularly urgent. Therefore, taking effective measures to detect gas leaks in a timely manner is crucial. This not only complies with environmental regulations and safety standards but is also an important means of significantly reducing the emission of harmful chemicals. Especially in the areas of early warning and prevention of major safety production accidents, identification of hazardous sources in complex industrial production scenarios, and emergency response, it has profound and significant implications for safeguarding the lives and property of the people and for sustainable social development.

[0003] Current gas detection methods can be divided into two categories: contact and non-contact. Traditional contact gas detection methods based on the physicochemical properties of gases are affected by environmental factors such as temperature, humidity, and wind speed, resulting in large fluctuations in detection accuracy, limited detection range, and difficulty in achieving rapid response to large-scale dynamic changes. With the rapid development of computer vision technology, video surveillance equipment has gradually become an essential tool in the field of non-contact gas leak detection. However, current visual methods that mainly rely on manual inspection are inefficient. Methods relying solely on dynamic intelligent vision are affected by moving objects, leading to a high false detection rate. To address these issues, researchers often use feature extraction methods to capture representative static features, such as the spectrum, morphology, and texture of the leaking gas. However, these traditional methods have limitations in terms of effectiveness and time, because these feature operators depend on specific scenarios and empirical design, making it difficult to capture the intrinsic characteristics of the gas and achieve generalization.

[0004] In recent years, deep learning-based artificial intelligence methods have been widely applied across various industries, achieving significant progress in gas leak detection. Deep learning methods effectively bypass the complex manual design process of traditional feature extraction, enabling the learning of representative features of the target object. In the petrochemical, coal-fired power, and steel industries, the presence of large quantities of colorless and odorless gases makes it difficult for traditional visible light-based video surveillance equipment to produce effective images, hindering further identification and detection. Building on this, the introduction of infrared imaging, by capturing the infrared radiation of the target object, overcomes the problems of visible light images, offering advantages such as high contrast and minimal sensitivity to light sources. Therefore, infrared imaging technology has become a popular research direction in the field of gas leak detection. Infrared thermal imaging-based detection methods utilize the temperature difference between the gas and the background to successfully monitor harmful gases effectively. Compared with traditional non-imaging gas leak detection technologies such as catalytic oxidation, photoionization, and flame ionization, infrared thermal imaging-based gas leak detection has significant advantages such as fast response speed, long range, wide coverage, and dynamic visualization. It can quickly detect and locate leaks that may be difficult to detect using traditional methods. Therefore, the integration of infrared thermal imaging visual monitoring systems with deep learning technology has become an important direction for exploration in the petrochemical industry for large-scale, visual, and intelligent monitoring of flammable, explosive, colorless, and toxic gases.

[0005] Therefore, there is an urgent need for a gas leak detection method that is efficient, real-time, safe, and has a high detection rate in actual production environments.

Summary of the Invention

[0006] The purpose of this invention is to overcome the shortcomings of the existing technology by proposing a progressive multi-scale attention perception field model system for infrared hazardous gas leakage visualization and intelligent detection, which can adapt to complex and ever-changing actual industrial environments.

[0007] The technical solution adopted by this invention is as follows:

[0008] A progressive multi-scale attention-based sensing field model system for infrared-based intelligent visualization detection of hazardous gas leaks includes: a model selection module, a detection parameter adjustment module, an operation module, and a detection result output module; wherein:

[0009] Model selection module: It is used to select different detection models;

[0010] Detection parameter adjustment module: It is used to adjust the sensitivity of gas leak detection for different scenarios;

[0011] Operation module: It is used to select the input type, which is one of: uploading pictures, uploading videos, or transmitting video captured online in real time by a mobile device;

[0012] Detection result output module: It is used to input the images, videos, and real-time captured videos uploaded to the system into the final detection model and output the detection results.

[0013] A method for visualizing and intelligently detecting hazardous gas leaks based on the aforementioned progressive multi-scale attention-based field model system comprises the following steps:

[0014] Step S1: Obtain the hazardous chemical gas leak dataset and divide it into training set, validation set and test set;

[0015] Step S2: Construct a hazardous chemical gas leak detection model based on the improved YOLOv8;

[0016] Step S3: Train the hazardous chemical gas leak detection model using the training set. The model maps the input data to the output space, generates prediction results, and obtains the final intelligent detection system for hazardous chemical gas leaks.

[0017] Step S4: Input the test set, local video, and images acquired in real time from the mobile device into the intelligent detection system for hazardous chemical gas leaks, and output the detection results.

[0018] As a preferred embodiment of the present invention, step S1 of the hazardous gas leakage visualization intelligent detection method specifically includes the following steps:

[0019] Step S11: Capture video of a gas leak in a real chemical scene using an infrared thermal imager;

[0020] Step S12: Manually annotate the target gas frame by frame in the acquired video using VOTT software;

[0021] Step S13: Divide the dataset into 101 different scenarios, and randomly divide them into training set, validation set and test set according to scenario and proportion.
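The scenario-level split in step S13 can be sketched as follows. This is a minimal illustration only: the patent states that the 101 scenarios are divided randomly by scenario and proportion but does not give the proportions, so the 70/20/10 ratio, the seed, and the function name `split_by_scenario` are assumptions made for this example.

```python
import random

def split_by_scenario(scenario_ids, ratios=(0.7, 0.2, 0.1), seed=0):
    """Assign whole scenarios to train/val/test.

    Splitting by scenario (not by frame) keeps frames from one scene
    out of more than one subset. The ratios are hypothetical.
    """
    ids = sorted(scenario_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(len(ids) * ratios[0])
    n_val = round(len(ids) * ratios[1])
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # remainder goes to the test set
    }

# 101 scenarios, numbered 1..101 for illustration
splits = split_by_scenario(range(1, 102))
```

Every frame annotated in step S12 would then inherit the subset of the scenario it belongs to.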

[0022] As a preferred embodiment of the present invention, step S2 of the hazardous gas leakage visualization intelligent detection method specifically includes the following steps:

[0023] Step S21: Replace the CBS module in the YOLOv8 model with a multi-scale receptive field convolution (MRFGConv) module. This expands the receptive field during downsampling, enhances feature representation learning to some extent, extracts more detailed information, and solves the feature sharing problem. The formula for MRFGConv is as follows:

[0024] $F = \mathrm{ReLU}(\mathrm{Norm}(c^{3\times3}(r(A_r \times F_r))))$;

[0025]

[0026] $F_r = \mathrm{ReLU}(\mathrm{Norm}(g^{3\times3}(X)))$;

[0027] In the formula, $X$ represents the input feature map, $c^{k\times k}$ denotes a regular convolution with a kernel size of $k\times k$, $g^{k\times k}$ denotes a grouped convolution with a kernel size of $k\times k$, Norm is the normalization operation, $F_r$ is the receptive field feature map obtained by the 3×3 grouped convolution, $A_r$ is the sum of the multi-scale spatial features extracted by grouped convolutions with different kernel sizes, $r$ is the reshaping operation, and $F$ is the final output feature map.

[0028] Step S22: Separable Convolutional Attention (SCA) is coupled into the C2f module to form a new C2f-SCA component. Through the unique design of multi-scale convolution and a global aggregation method based on new dimensions, this module effectively integrates multi-scale features, further enhancing feature representation capabilities and achieving a balance between performance improvement and computational efficiency. The formula for SCA is as follows:

[0029] $\mathrm{Atten} = \mathrm{SiLU}(\mathrm{Norm}(c^{1\times1}(A_m + X^{*})))$;

[0030]

[0031] $X^{*} = \mathrm{split}(X)$;

[0032] In the formula, $X^{*}$ denotes the result of dividing the input feature map $X$ into two parts, $c^{k\times k}$ denotes a regular convolution with a kernel size of $k\times k$, Norm represents the normalization operation, $A_m$ represents the multi-scale attention extracted by convolution kernels of different sizes, $r$ is the reshaping operation, and $F$ is the final output feature map;

[0033] Step S23: Reconstruct the standard neck of YOLOv8 using a feature fusion network based on the Asymptotic Feature Pyramid Network (AFPN) structure. This helps to fuse semantic information effectively and reduce the large differences between non-adjacent levels, thus narrowing the semantic gap. The formula for the adaptive spatial fusion operation (three layers) in AFPN is as follows:

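[0034] $y_{ij}^{l} = \alpha_{ij}^{l} \cdot x_{ij}^{1\to l} + \beta_{ij}^{l} \cdot x_{ij}^{2\to l} + \gamma_{ij}^{l} \cdot x_{ij}^{3\to l}$

[0035] In the formula, the feature vector at position $(i,j)$ from layer $n$ to layer $l$ is denoted $x_{ij}^{n\to l}$; the final feature representation obtained through the multi-layer adaptive spatial fusion operation is $y_{ij}^{l}$; and $\alpha_{ij}^{l}$, $\beta_{ij}^{l}$, and $\gamma_{ij}^{l}$ are the spatial weights of the three layers' features at level $l$, subject to the constraint $\alpha_{ij}^{l} + \beta_{ij}^{l} + \gamma_{ij}^{l} = 1$.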

[0036] As a preferred embodiment of the present invention, step S3 in the visual intelligent detection method for hazardous gas leaks specifically includes the following steps:

[0037] Step S31: Perform forward propagation using the training set, input the image into the model, and obtain the prediction result;

[0038] Step S32: Calculate the loss between the model prediction and the true label using cross-entropy loss;

[0039] Step S33: Calculate the gradient of the loss function with respect to the model parameters using the backpropagation algorithm;

[0040] Step S34: Update the model's weights and biases using the gradient descent optimization algorithm;

[0041] Step S35: Repeat steps S31-S34, iterating multiple times until the model converges or reaches the predetermined number of training rounds.

[0042] Compared with the prior art, the beneficial effects of the present invention are as follows:

[0043] 1. This invention proposes the GAS-YOLO network based on YOLOv8. First, in the Backbone and Neck sections, most of the CBS modules are replaced with MRFGConv modules to preserve fine-grained information, prevent detail loss, and solve the spatial attention feature sharing problem. Second, the Neck structure is reconstructed, replacing the original PAN-FPN structure with an AFPN structure. This generates feature pyramids of different resolutions through adaptive spatial fusion operations, promoting information interaction between non-adjacent layers and preventing the loss and degradation of feature information during transmission and interaction, thus narrowing the semantic gap. Finally, the SCA attention mechanism is integrated into some C2f modules to construct a new C2f-SCA component. This not only compensates for YOLOv8's lack of an attention mechanism and enhances the model's focus on the target, but also effectively couples multi-scale features, balancing performance and efficiency.

[0044] 2. The progressive multi-scale attention perception field model system for infrared hazardous gas leak visualization and intelligent detection proposed in this invention can significantly improve the efficiency and accuracy of hazardous gas leak detection, while reducing reliance on manual operation, further improving safety performance, and adapting to complex and ever-changing actual industrial environments.

Attached Figure Description

[0045] Figure 1 is a flowchart of the intelligent visualization detection model algorithm for infrared hazardous gas leaks proposed in this invention.

[0046] Figure 2 is a schematic diagram of the MRFGConv module proposed in this invention.

[0047] Figure 3 is a schematic diagram of the SCA attention mechanism module proposed in this invention.

[0048] Figure 4 is a schematic diagram of the AFPN structure used in this invention alongside other mainstream feature fusion networks.

[0049] Figure 5 is a schematic diagram of the structure of the GAS-YOLO model proposed in this invention and of its sub-modules.

[0050] Figure 6 is an attention heatmap of the GAS-YOLO model proposed in this invention on an infrared gas leak image.

[0051] Figure 7 is a curve comparing the training accuracy of the GAS-YOLO model proposed in this invention with that of YOLOv8n.

[0052] Figure 8 is a module design diagram of the progressive multi-scale attention perception field model system for infrared hazardous gas leak visualization and intelligent detection designed in this invention.

Detailed Implementation

[0053] The present invention will be further described in detail below with reference to the embodiments and accompanying drawings.

[0054] This invention proposes a progressive multi-scale attention-based sensing field model system for the intelligent visualization detection of infrared hazardous gas leaks, which is explained in detail below with reference to Figures 1 to 8:

[0055] Figure 1 shows the flowchart of the intelligent visualization detection model algorithm for infrared hazardous gas leaks proposed in this invention. First, image data of gas leaks in real chemical scenarios is acquired using an infrared thermal imager and preprocessed. Second, the YOLOv8 model is improved to suit gas leak detection in infrared images of complex scenes. The improvements are as follows: the CBS modules are partially replaced with multi-scale receptive field convolution (MRFGConv) modules to retain fine-grained information, prevent detail loss, and solve the spatial attention feature sharing problem; a separable convolutional attention (SCA) module is designed and integrated into the C2f module, effectively coupling multi-scale features and balancing performance and efficiency; and an asymptotic feature pyramid network (AFPN) is used to reconstruct the original YOLOv8 Neck structure, promoting information interaction between non-adjacent levels and thereby achieving high-level feature fusion. Then, the image data is input into the improved YOLOv8 model at a fixed size of 640×640 for training and validation, and the model is evaluated on the test set to obtain the final detection model weight file. Finally, the final detection model is integrated into the intelligent detection system for hazardous chemical gas leaks. The infrared gas leak image data to be detected is input, and the results are output through forward computation.

[0056] Specifically, the steps of the intelligent visual detection method for hazardous gas leaks based on the progressive multi-scale attention perception field model system proposed in this invention are as follows:

[0057] Step S1: Obtain the hazardous chemical gas leak dataset and divide it into training set, validation set and test set;

[0058] Step S2: Construct a hazardous chemical gas leak detection model based on the improved YOLOv8;

[0059] Step S3: Train the hazardous chemical gas leak detection model using the training set. The model maps the input data to the output space, generates prediction results, and obtains the final intelligent detection system for hazardous chemical gas leaks.

[0060] Step S4: Input the test set, local video, and images acquired in real time from the mobile device into the intelligent detection system for hazardous chemical gas leaks, and output the detection results.

[0061] Step S1 above specifically includes the following steps:

[0062] Step S11: Capture video of a gas leak in a real chemical scene using an infrared thermal imager;

[0063] Step S12: Manually annotate the target gas frame by frame in the acquired video using VOTT software;

[0064] Step S13: Divide the dataset into 101 different scenarios, and randomly divide them into training set, validation set and test set according to scenario and proportion.

[0065] Step S2 above specifically includes the following steps:

[0066] Step S21: Replace the CBS module in the YOLOv8 model with a multi-scale receptive field convolution (MRFGConv) module. This expands the receptive field during downsampling, enhances feature representation learning to some extent, extracts more detailed information, and solves the feature sharing problem. The formula for MRFGConv is as follows:

[0067] $F = \mathrm{ReLU}(\mathrm{Norm}(c^{3\times3}(r(A_r \times F_r))))$;

[0068]

[0069] $F_r = \mathrm{ReLU}(\mathrm{Norm}(g^{3\times3}(X)))$;

[0070] In the formula, $X$ represents the input feature map, $c^{k\times k}$ denotes a regular convolution with a kernel size of $k\times k$, $g^{k\times k}$ denotes a grouped convolution with a kernel size of $k\times k$, Norm is the normalization operation, $F_r$ is the receptive field feature map obtained by the 3×3 grouped convolution, $A_r$ is the sum of the multi-scale spatial features extracted by grouped convolutions with different kernel sizes, $r$ is the reshaping operation, and $F$ is the final output feature map.

[0071] As shown in Figure 2, the MRFGConv module obtains receptive-field spatial attention weights for multi-scale feature maps using three kernel sizes: 3×3, 5×5, and 7×7. It uses a grouped convolution with a 3×3 kernel to extract receptive-field spatial features, which enhances feature representation learning, captures more detailed information, and improves operational efficiency. Finally, a reshaping operation transforms the feature map from 9C×H×W to C×3H×3W, which ensures that the extracted receptive-field spatial features are not duplicated. The learned attention map therefore aggregates the feature information of each receptive-field sliding window, solving the parameter sharing problem that arises when a large convolution kernel slides over the feature map.
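The reshaping from 9C×H×W to C×3H×3W can be illustrated with a small pure-Python sketch. The channel layout (each group of nine consecutive channels holds the 3×3 receptive-field features of one original channel) and the function name `unfold_receptive_fields` are assumptions made for illustration; the patent does not specify the ordering.

```python
def unfold_receptive_fields(x, k=3):
    """Rearrange a (k*k*C, H, W) nested list into (C, k*H, k*W).

    Assumed layout: input channel index = c * k*k + n1 * k + n2, where
    (n1, n2) is the position inside the k x k receptive field.
    """
    kk = k * k
    channels, height, width = len(x) // kk, len(x[0]), len(x[0][0])
    out = [[[0.0] * (k * width) for _ in range(k * height)]
           for _ in range(channels)]
    for c in range(channels):
        for n1 in range(k):
            for n2 in range(k):
                plane = x[c * kk + n1 * k + n2]
                for h in range(height):
                    for w in range(width):
                        # Pixel (h, w) expands into its own k x k block.
                        out[c][h * k + n1][w * k + n2] = plane[h][w]
    return out

# Toy input: C=1, H=W=2, so 9 planes; plane p is filled with the value p.
x = [[[float(p)] * 2 for _ in range(2)] for p in range(9)]
y = unfold_receptive_fields(x)  # shape 1 x 6 x 6
```

After this rearrangement each original pixel owns a separate 3×3 block, so a 3×3 convolution applied with stride 3 (an assumption here, since the stride is not stated in the text) would read every block exactly once.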

[0072] Step S22: Separable Convolutional Attention (SCA) is coupled into the C2f module to form a new C2f-SCA component. Through the unique design of multi-scale convolution and a global aggregation method based on new dimensions, this module effectively integrates multi-scale features, further enhancing feature representation capabilities and achieving a balance between performance improvement and computational efficiency. The formula for SCA is as follows:

[0073] $\mathrm{Atten} = \mathrm{SiLU}(\mathrm{Norm}(c^{1\times1}(A_m + X^{*})))$;

[0074]

[0075] $X^{*} = \mathrm{split}(X)$;

[0076] In the formula, $X^{*}$ denotes the result of dividing the input feature map $X$ into two parts, $c^{k\times k}$ denotes a regular convolution with a kernel size of $k\times k$, Norm represents the normalization operation, $A_m$ represents the multi-scale attention extracted by convolution kernels of different sizes, $r$ is the reshaping operation, and $F$ is the final output feature map;

[0077] As shown in Figure 3, this invention first divides the channels of the feature map into two parts. This division allows the two parts to be processed in parallel at different stages of the network, improving computational efficiency and accelerating feature extraction. Second, using a reshaping operation, a new dimension is extracted from one part of the feature map. Within this dimension, convolution kernels of two sizes (3×3 and 5×5) are used for feature extraction, and the results are merged to generate a multi-scale spatial attention map. Next, a reshaping operation restores the original shape, and the result is combined with the part that has not undergone convolution. This step lets channel and spatial information interact, i.e., it aggregates global information.

[0078] Step S23: Reconstruct the standard neck of YOLOv8 using a feature fusion network based on the Asymptotic Feature Pyramid Network (AFPN) structure. This helps to fuse semantic information effectively and reduce the large differences between non-adjacent levels, thus narrowing the semantic gap. The formula for the adaptive spatial fusion operation (three layers) in AFPN is as follows:

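[0079] $y_{ij}^{l} = \alpha_{ij}^{l} \cdot x_{ij}^{1\to l} + \beta_{ij}^{l} \cdot x_{ij}^{2\to l} + \gamma_{ij}^{l} \cdot x_{ij}^{3\to l}$

[0080] In the formula, the feature vector at position $(i,j)$ from layer $n$ to layer $l$ is denoted $x_{ij}^{n\to l}$; the final feature representation obtained through the multi-layer adaptive spatial fusion operation is $y_{ij}^{l}$; and $\alpha_{ij}^{l}$, $\beta_{ij}^{l}$, and $\gamma_{ij}^{l}$ are the spatial weights of the three layers' features at level $l$, subject to the constraint $\alpha_{ij}^{l} + \beta_{ij}^{l} + \gamma_{ij}^{l} = 1$.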
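The three-layer adaptive spatial fusion amounts to a per-position weighted sum of three feature vectors whose weights add up to 1. The sketch below assumes the weights come from a softmax over three learned scores, which is one common way to satisfy the constraint; the softmax and the names `adaptive_spatial_fusion` and `logits` are illustrative assumptions, not taken from the patent text.

```python
import math

def adaptive_spatial_fusion(x1, x2, x3, logits):
    """Fuse three same-length feature vectors at one position (i, j).

    `logits` holds three unnormalised scores; softmax turns them into
    weights (alpha, beta, gamma) that are positive and sum to 1.
    """
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    alpha, beta, gamma = (e / total for e in exps)
    fused = [alpha * a + beta * b + gamma * c
             for a, b, c in zip(x1, x2, x3)]
    return fused, (alpha, beta, gamma)

# Equal scores give equal weights of 1/3 each.
fused, weights = adaptive_spatial_fusion(
    [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0, 0.0]
)
```

In a full network the three inputs would first be resampled to the resolution of the target level, and the scores would be predicted per position rather than passed in by hand.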

[0081] Figure 4 illustrates the AFPN structure used in this invention alongside other mainstream feature fusion networks. By generating feature pyramids of different resolutions through adaptive spatial fusion operations, information interaction between non-adjacent layers is promoted, which prevents the loss and degradation of feature information during transmission and interaction and narrows the semantic gap. Because the gas target grows gradually during a leak, leaks in infrared images usually appear as small targets. However, the YOLOv8 model may lose information related to small targets during deep feature extraction, so gas leaks may not be detected in time. To solve this problem, this invention optimizes the PAN-FPN structure of YOLOv8. By comprehensively considering and repeatedly reusing multi-scale features, a more advanced feature fusion process is implemented, thereby enhancing the feature representation of small-scale gas targets in infrared images.

[0082] Therefore, this invention proposes a GAS-YOLO network based on YOLOv8. Figure 5 shows the structure of the proposed GAS-YOLO model and its sub-modules. First, in the Backbone and Neck sections, most of the CBS modules are replaced with MRFGConv modules to preserve fine-grained information, prevent detail loss, and address the spatial attention feature sharing problem. Second, the Neck structure is reconstructed, replacing the original PAN-FPN structure with an AFPN structure. This generates feature pyramids of different resolutions through adaptive spatial fusion operations, promoting information interaction between non-adjacent layers and preventing feature loss and degradation during transmission and interaction, thus narrowing the semantic gap. Finally, the SCA attention mechanism is integrated into some C2f modules, creating a new C2f-SCA component. This not only compensates for YOLOv8's lack of an attention mechanism and enhances the model's focus on the target, but also effectively couples multi-scale features, balancing performance and efficiency.

[0083] Step S3 above specifically includes the following steps:

[0084] Step S31: Perform forward propagation using the training set, input the image into the model, and obtain the prediction result;

[0085] Step S32: Calculate the loss between the model prediction and the true label using cross-entropy loss;

[0086] Step S33: Calculate the gradient of the loss function with respect to the model parameters using the backpropagation algorithm;

[0087] Step S34: Update the model's weights and biases using the gradient descent optimization algorithm;

[0088] Step S35: Repeat steps S31-S34, iterating multiple times until the model converges or reaches the predetermined number of training rounds.
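Steps S31 to S35 describe a standard supervised training loop. The toy sketch below walks through the same five steps on a one-feature logistic classifier with binary cross-entropy; it only illustrates the loop structure and is not the GAS-YOLO training code. The learning rate, epoch budget, tolerance, and sample data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, lr=0.5, max_epochs=500, tol=1e-6):
    """Toy S31-S35 loop for a one-feature logistic classifier."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(max_epochs):
        # S31: forward propagation
        preds = [sigmoid(w * x + b) for x in samples]
        # S32: binary cross-entropy loss, averaged over the batch
        loss = -sum(
            y * math.log(p) + (1 - y) * math.log(1 - p)
            for p, y in zip(preds, labels)
        ) / len(samples)
        history.append(loss)
        # S33: gradients of the loss w.r.t. w and b (sigmoid + BCE)
        grad_w = sum((p - y) * x
                     for p, y, x in zip(preds, labels, samples)) / len(samples)
        grad_b = sum(p - y for p, y in zip(preds, labels)) / len(samples)
        # S34: gradient-descent update of weight and bias
        w -= lr * grad_w
        b -= lr * grad_b
        # S35: stop on convergence or when the epoch budget is spent
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
    return w, b, history

w, b, history = train([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
```

A detection network would replace the scalar model with the full GAS-YOLO forward pass and obtain gradients by automatic differentiation, but the iteration structure is the same.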

[0089] To analyze the performance improvement of each strategy, ablation experiments were conducted to evaluate their effectiveness. For fair comparison, the same dataset and experimental environment were used in all experiments. YOLOv8-n was used as the baseline network, and the MRFGConv module, SCA attention mechanism, and AFPN structure were sequentially introduced to optimize and improve YOLOv8-n. The ablation experiment results are shown in Table 1. The mAP50 achieved by each improvement strategy reached 0.451, 0.459, and 0.472, respectively, all exceeding YOLOv8-n. Finally, the proposed GAS-YOLO model combining all improvement strategies achieved a precision of 0.671, a recall of 0.402, an mAP50 of 0.472, and an mAP50-95 of 0.227, while maintaining the low cost of YOLOv8-n, representing improvements of 3.3%, 2.7%, 5.8%, and 2.7% over YOLOv8-n, respectively.

[0090] Table 1: Ablation Experiment

[0091]

[0092] To verify the superiority of this invention over other advanced object detection algorithms, comparative experiments were added to study the performance of current mainstream object detection algorithms. The comparison results are shown in Table 2. The results show that the GAS-YOLO model of this invention has the highest mAP50-95, reaching 0.227. Overall, this model exhibits the best detection performance. It achieves a trade-off between detection accuracy and speed, and its model size is suitable for deployment on hardware modules, making it practically applicable.

[0093] Table 2: Performance Comparison of Each Model

[0094]

[0095] Figure 6 is an attention heatmap of the GAS-YOLO model proposed in this invention for gas leakage in infrared images. Brighter areas represent regions that the network attends to more strongly. The test results show that, compared to YOLOv8-n, GAS-YOLO covers the area of the gas target more accurately, and the coverage is brighter and more concentrated. This further validates the MRFGConv module and SCA attention mechanism proposed in this invention, as well as the introduction of the recently proposed AFPN fusion concept, which together enable the model to focus on the target more accurately and demonstrate its superiority in both efficiency and accuracy.

[0096] Figure 7 compares the mAP50-95 and precision (P) curves during training of the GAS-YOLO model proposed in this invention and YOLOv8-n. As can be seen from the graph, the model proposed in this invention outperforms YOLOv8-n in detection metrics after approximately 150 epochs of training, and it begins to stabilize after approximately 200 epochs. Compared to YOLOv8-n, the method of this invention has faster training speed and better detection performance.

[0097] Figure 8 is a module design diagram of the progressive multi-scale attention-based perception field model system for infrared-based intelligent visualization detection of hazardous chemical gas leaks designed in this invention. The system uses the deep-learning-based GAS-YOLO model and infrared images to detect hazardous chemical gas leaks. In the detection parameter settings module on the page, the confidence threshold and intersection-over-union (IoU) threshold of the model can be set. For daily use, a confidence threshold of 0.25 and an IoU threshold of 0.7 are sufficient. For tasks where leaks are difficult to detect reliably, these values can be adjusted appropriately to make the model more sensitive. The detection result module reports, in real time, the number of targets in the image, the time taken, and their locations in the image, and directly displays the gas leak detection result on the left side of the interface. The operation module allows users to select a detection mode and upload local images and videos for image and video detection. Online real-time video detection requires the PC and the mobile device to be on the same local area network. The RTSP video stream network address of the mobile device's camera is then entered on the PC to achieve real-time online detection, and the detection results are displayed on the page. The mobile device, such as a mobile phone, laptop, or tablet, executes the above method.
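The effect of the two thresholds in the detection parameter settings module can be sketched as follows. This example assumes that the IoU threshold is used for non-maximum suppression, as is usual for YOLO-style detectors; the patent does not state this explicitly, and the function names and sample boxes are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def filter_detections(dets, conf_thres=0.25, iou_thres=0.7):
    """dets: list of (box, score). Returns kept detections, best first."""
    # Drop low-confidence boxes, then visit the rest by descending score.
    candidates = sorted((d for d in dets if d[1] >= conf_thres),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap a kept box too much.
        if all(iou(box, k[0]) <= iou_thres for k in kept):
            kept.append((box, score))
    return kept

dets = [
    ((10, 10, 50, 50), 0.90),
    ((12, 12, 52, 52), 0.60),       # heavy overlap with the first box
    ((100, 100, 140, 140), 0.30),
    ((200, 200, 220, 220), 0.10),   # below the confidence threshold
]
kept = filter_detections(dets)
```

Under this assumption, lowering `conf_thres` lets weaker detections through, while `iou_thres` controls how much overlap two reported boxes may have.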

[0098] The above description is merely an example and illustration of the concept of the present invention. Those skilled in the art may make various modifications or additions to the specific embodiments described, or replace them with similar methods. Provided that these do not deviate from the concept of the invention or exceed the scope defined in the claims, they all fall within the protection scope of the present invention.

Claims

1. A progressive multi-scale attention perception field model system for intelligent visualization detection of infrared hazardous gas leakage, characterized in that the system comprises a model selection module, a detection parameter adjustment module, an operation module, and a detection result output module, wherein: the model selection module is used to select among different detection models; the detection parameter adjustment module is used to adjust the sensitivity of gas leak detection for different scenarios; the operation module is used to select the type of input, namely uploading pictures, uploading videos, or transmitting video captured online in real time by a mobile device; the detection result output module is used to input the pictures, videos, and real-time captured videos uploaded to the system into the final detection model and to output the detection results.

The final detection model is a hazardous chemical gas leak detection model built on an improved YOLOv8, specifically as follows. The CBS module in the YOLOv8 model is replaced with a multi-scale perceptual field convolution module (MRFGConv). The formula for MRFGConv is as follows: [formula not reproduced]. The terms of the formula denote, in order: the input feature map; an ordinary convolution with the stated kernel size; grouped convolutions with the stated kernel sizes; Norm, the normalization operation; the receptive-field feature map obtained after the grouped-convolution transformation; the sum of the multi-scale spatial features extracted through grouped convolutions with different kernel sizes; a reshaping operation; and the final output feature map.

Separable convolutional attention (SCA) is coupled to a portion of the C2f module to form a new C2f-SCA component. The formula for SCA is as follows: [formula not reproduced]. The terms of the formula denote, in order: an operation that splits the input feature map into two parts; an ordinary convolution with the stated kernel size; a normalization operation; the multi-scale attention extracted by convolution kernels of different sizes; a reshaping operation; and the final output feature map.

The standard neck of YOLOv8 is reconstructed using a feature fusion network based on the asymptotic feature pyramid network (AFPN) structure.

2. A method for intelligent visualization detection of hazardous gas leaks based on the progressive multi-scale attention perception field model system of claim 1, characterized in that the steps are as follows: Step S1: obtain the hazardous chemical gas leak dataset and divide it into a training set, a validation set, and a test set; Step S2: construct a hazardous chemical gas leak detection model based on the improved YOLOv8; Step S3: train the hazardous chemical gas leak detection model using the training set, whereby the model maps the input data to the output space and generates prediction results, to obtain the final intelligent detection system for hazardous chemical gas leaks; Step S4: input the test set, local videos, and images acquired in real time from the mobile device into the intelligent detection system for hazardous chemical gas leaks, and output the detection results.

3. The method of claim 2, wherein step S1 specifically includes the following steps: Step S11: capture video of a gas leak in a real chemical scene using an infrared thermal imager; Step S12: manually annotate the target gas frame by frame in the acquired video using VOTT software; Step S13: divide the dataset into multiple different scenarios, and randomly divide it into a training set, a validation set, and a test set according to scenario and proportion.

4. The method of claim 3, wherein the adaptive spatial fusion operation in the asymptotic feature pyramid network (AFPN) is given as follows: the feature vector at position $(i, j)$ passed from level $n$ to level $l$ is denoted $x_{ij}^{n \to l}$, and the final feature representation obtained through multi-level adaptive spatial fusion is

$$y_{ij}^{l} = \alpha_{ij}^{l} \cdot x_{ij}^{1 \to l} + \beta_{ij}^{l} \cdot x_{ij}^{2 \to l} + \gamma_{ij}^{l} \cdot x_{ij}^{3 \to l},$$

where $\alpha_{ij}^{l}$, $\beta_{ij}^{l}$, and $\gamma_{ij}^{l}$ denote the spatial weights of the features of the three levels at level $l$, subject to the constraint $\alpha_{ij}^{l} + \beta_{ij}^{l} + \gamma_{ij}^{l} = 1$.
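The adaptive spatial fusion of claim 4 can be sketched for a single spatial position as a weighted sum of three level features. The sketch assumes, as in the published AFPN design, that the three weights are non-negative and sum to 1, and it obtains them from a softmax over three raw scores. The softmax, the function names `softmax3` and `fuse_position`, and the sample vectors are assumptions introduced here, not details taken from the claim.

```python
import math
from typing import List, Sequence, Tuple


def softmax3(a: float, b: float, c: float) -> Tuple[float, float, float]:
    """Turn three raw scores into non-negative weights that sum to 1."""
    m = max(a, b, c)  # subtract the maximum for numerical stability
    ea, eb, ec = math.exp(a - m), math.exp(b - m), math.exp(c - m)
    s = ea + eb + ec
    return ea / s, eb / s, ec / s


def fuse_position(
    x1: Sequence[float],
    x2: Sequence[float],
    x3: Sequence[float],
    logits: Tuple[float, float, float],
) -> List[float]:
    """Fuse the feature vectors of three levels at one spatial position.

    x1, x2, x3 are the feature vectors already resampled to the target
    level; logits are hypothetical raw weight scores for this position.
    """
    alpha, beta, gamma = softmax3(*logits)  # alpha + beta + gamma == 1
    return [alpha * u + beta * v + gamma * w for u, v, w in zip(x1, x2, x3)]


# Equal scores give equal weights of 1/3, so the result is the plain mean.
fused = fuse_position([3.0, 0.0], [0.0, 3.0], [3.0, 3.0], (0.0, 0.0, 0.0))
print(fused)
```

Because the weights are computed per position, a level whose feature is more informative at one location can dominate there without dominating elsewhere in the map.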

5. The intelligent visualization detection method for hazardous gas leaks of claim 4, characterized in that step S3 specifically includes the following steps: Step S31: perform forward propagation using the training set, inputting the images into the model to obtain the prediction results; Step S32: calculate the loss between the model predictions and the true labels using cross-entropy loss; Step S33: calculate the gradient of the loss function with respect to the model parameters using the backpropagation algorithm; Step S34: update the model's weights and biases using the gradient descent optimization algorithm; Step S35: repeat steps S31-S34, iterating until the model converges or reaches the predetermined number of training rounds.
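Steps S31 to S35 follow the usual supervised training loop, which the toy sketch below walks through. A one-feature logistic classifier stands in for the detection network, and the data, learning rate, tolerance, and function names (`sigmoid`, `predict`, `train`) are all assumptions made for illustration; the actual GAS-YOLO training setup is not specified at this level of detail in the claim.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Forward pass of the stand-in model: a logistic classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(
    data: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,     # assumed learning rate
    epochs: int = 500,   # predetermined number of training rounds
    tol: float = 1e-6,   # assumed convergence tolerance
) -> Tuple[List[float], float, List[float]]:
    n = len(data)
    w = [0.0] * len(data[0])
    b = 0.0
    history: List[float] = []
    for _ in range(epochs):
        # S31: forward propagation over the training set.
        preds = [predict(w, b, x) for x in data]
        # S32: cross-entropy loss between predictions and true labels.
        eps = 1e-12
        loss = -sum(
            y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
            for p, y in zip(preds, labels)
        ) / n
        history.append(loss)
        # S33: gradient of the loss with respect to weights and bias
        # (derived analytically here; a network would use backpropagation).
        grad_w = [
            sum((p - y) * x[j] for p, y, x in zip(preds, labels, data)) / n
            for j in range(len(w))
        ]
        grad_b = sum(p - y for p, y in zip(preds, labels)) / n
        # S34: gradient descent update of weights and bias.
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
        # S35: stop once the loss has stopped changing (convergence);
        # otherwise continue until the round limit is reached.
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
    return w, b, history


# Hypothetical one-feature dataset: small values are class 0, large are class 1.
data = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]
w, b, history = train(data, labels)
print(f"first loss {history[0]:.3f}, last loss {history[-1]:.3f}")
```

The same five-step structure applies when the model is a deep network; only the forward pass and the gradient computation change.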