Campus fire detection method and system

Applying a neural network model with switchable-dilation-rate dynamic convolutional fusion and channel attention weighting to campus fire detection addresses the difficulty of detecting small flames against complex backgrounds, improving detection accuracy and lowering the false negative rate.

CN122244766A · Pending · Publication Date: 2026-06-19 · ANHUI POLYTECHNIC UNIV MECHANICAL & ELECTRICAL COLLEGE

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ANHUI POLYTECHNIC UNIV MECHANICAL & ELECTRICAL COLLEGE
Filing Date
2026-03-31
Publication Date
2026-06-19

Abstract

This invention discloses a campus fire detection method and system, belonging to the field of computer vision technology. The method includes: acquiring surveillance video images; inputting the images into an improved YOLOv5s network model for inference, wherein the backbone network employs a residual module based on switchable dilated convolution, dynamically fusing convolution outputs with different dilation rates through a switching function to adaptively adjust the receptive field; integrating an efficient channel attention module after the spatial pyramid pooling module, achieving local channel interaction through one-dimensional convolution to recalibrate feature responses; extending the P2 feature layer and adding an independent convolutional detector in the path aggregation network, fusing shallow high-resolution features with deep features for small target flame detection. This invention also provides a corresponding detection system. Through the synergistic effect of dynamic receptive field adaptation, lightweight channel recalibration, and shallow feature enhancement, this invention improves the detection capability of small target flames and is suitable for real-time fire monitoring on embedded platforms.

Description

Technical Field

[0001] This invention belongs to the technical field of computer vision, and in particular relates to a method and system for detecting campus fires.

Background Technology

[0002] Campus environments are characterized by dense buildings, concentrated populations, open spaces, and complex structures, resulting in high fire risks and potentially severe consequences. Traditional fire detection methods primarily rely on heat sensors or ionization smoke detectors, which collect physical signals in the environment (such as temperature and smoke concentration) and compare them with preset thresholds to determine if a fire has occurred. However, these sensors are easily affected by building obstructions, changes in lighting, airflow interference, and environmental clutter in open campus settings, leading to delayed detection responses and high false negative rates, making them unsuitable for campus security needs.

[0003] In recent years, deep learning-based visual target detection technology has made significant progress and has been gradually applied to the field of fire detection. For example, Shen et al. used YOLOv5s for fire detection; Saponara et al. designed an embedded fire monitoring system based on YOLOv2; Ding Jie et al. improved YOLOv3 to enhance its ability to extract dynamic flame patterns; and Wang Haiqun et al. proposed a lightweight flame detection algorithm based on YOLOv4. These methods have improved detection performance to some extent, but still have the following limitations: First, small flame targets have a high false negative rate because shallow feature maps are rich in detail but have a low semantic level, while deep feature maps have a high semantic level but low spatial resolution, and multi-scale fusion is not sufficient. Secondly, in complex contexts, flame features are easily confused with distracting objects of similar color, and existing models lack effective mechanisms to enhance the dynamic morphology and channel features of flames.

[0004] Therefore, this invention designs a campus fire detection method and system.

Summary of the Invention

[0005] The purpose of this invention is to solve the problems in the prior art, and to propose a campus fire detection method and system.

[0006] This invention first discloses a method for detecting fires on campus, comprising the following steps: acquiring the surveillance video image to be detected; inputting the image into a neural network model for inference, wherein the inference process includes: extracting multi-scale features through the backbone network, wherein at least one dynamic convolutional fusion operation based on a switchable dilation rate is performed on the extracted features; performing channel attention weighting on the high-level features after pyramid pooling; and performing multi-scale feature fusion and detection inference, which includes a parallel inference branch dedicated to small flame detection. This branch is generated as follows: a first feature map is obtained from the shallowest layer of the backbone network, which has the highest spatial resolution among the multi-scale features; the first feature map is upsampled to the same spatial size as a second feature map from a deeper layer of the network; the upsampled first feature map and the second feature map are added element by element to generate a small-target enhanced feature map; and the small-target enhanced feature map is input into an independent convolutional detector, which outputs the detection result for small flame targets. Based on the inference result, it is determined whether there is a flame in the image, and a comprehensive detection result is output.

[0007] In the aforementioned campus fire detection method, the dynamic convolutional fusion operation based on a switchable dilation rate specifically refers to the following: the input feature map is processed by a convolution with a first dilation rate and by a convolution with a second dilation rate greater than the first, yielding a first branch feature map and a second branch feature map respectively; an input-dependent switching function assigns a first dynamic weight to the first branch feature map and a second dynamic weight to the second branch feature map, the second dynamic weight being complementary to the first; and the product of the first branch feature map and the first dynamic weight is added to the product of the second branch feature map and the second dynamic weight to obtain the fused output feature map.

[0008] In the aforementioned campus fire detection method, the calculation of the channel weights in the channel attention weighting operation includes: performing global average pooling on the input features to obtain a one-dimensional channel description vector; and convolving the description vector with a one-dimensional convolution kernel, wherein the kernel size is adaptively determined from the total number of channels C of the input features and equals the odd number closest to (log2(C) + 1) / 2, that is, the base-2 logarithm of the total number of channels plus a constant 1, divided by 2.

[0009] In the above campus fire detection method, the spatial resolution of the first feature map is twice that of the second feature map.

[0010] In the above campus fire detection method, the independent convolutional detector consists of at least one convolutional layer, and its parameters are different from those of the detector used to detect regular-sized targets in the neural network model.

[0011] In a second aspect, the present invention discloses an electronic device, comprising: a memory for storing a computer program; and a processor for executing the computer program to implement the campus fire detection method described above.

[0012] In a third aspect, this invention discloses a campus fire detection system, comprising: an image acquisition module for acquiring video streams from the monitored area; a processing module, connected to the image acquisition module, for receiving and processing the video stream, the processing module including the aforementioned electronic device; and an alarm module, connected to the processing module, for triggering an alarm based on the fire detection results output by the processing module.

[0013] In the aforementioned campus fire detection system, the image acquisition module is a CSI interface camera, and the electronic device in the processing module is a Jetson TX2 embedded platform.

[0014] Fourthly, the present invention discloses a computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the above-mentioned campus fire detection method.

[0015] The beneficial effects of this invention are as follows: This application replaces the fixed receptive field convolution of the Bottleneck residual unit in the standard C3 module with a switchable dilated convolution unit, and dynamically weights and fuses the outputs of the two branches with dilation rates of 1 and 3 through a switching function. This design enables the network to adaptively select or fuse feature responses from different receptive fields based on the local content of the input feature map during inference. For non-rigid objects like flames with dynamically changing shapes, this mechanism allows the network to select a large receptive field in the flame edge region to capture contextual information and a small receptive field in the flame core region to preserve detailed texture, thereby improving the model's adaptability to changes in flame shape and enhancing the targeting of feature extraction.

[0016] This application integrates the ECA module after the SPP module to recalibrate the channels of high-level semantic features after multi-scale pooling. The ECA module uses one-dimensional convolutions instead of fully connected layers to achieve local channel interactions, avoiding the information loss caused by dimensionality reduction operations in SENet, while having extremely low parameter count. This design enables the network to enhance the feature channel responses related to flames and suppress the influence of complex backgrounds (such as similarly colored interfering objects) on the detection results without significantly increasing the computational burden, thereby improving the model's feature discrimination ability in complex scenes.

[0017] This application extends the P2 feature layer in PANet, introducing high-resolution feature maps from the shallowest layer of the backbone network into the detection process. These maps are then upsampled and fused with deeper feature maps before being input into a newly added independent convolutional detector. This design preserves and specifically utilizes small flame details (such as color, edges, and texture) that might otherwise be lost in deeper features. The new detector works in parallel with the original detector, forming an independent detection pathway specifically for small flame targets. This compensates for the original network's insufficient ability to capture small target features, thereby improving the model's ability to detect small flames in the early stages of a fire.

Attached Figure Description

[0018] Figure 1 is a flowchart of a campus fire detection method disclosed in this invention.

[0019] Figure 2 is a schematic diagram of a campus fire detection system disclosed in this invention.

[0020] Figure 3 is a schematic diagram of the structure of SAC_C3 in a campus fire detection system disclosed in this invention.

[0021] Figure 4 is a schematic diagram of the ECA module.

[0022] Figure 5 is a structure diagram of the improved flame detection algorithm.

[0023] Figure 6 is a schematic diagram of the experimental site and equipment operation.

[0024] Figure 7 is a comparison chart of the mAP curves for the improved model.

[0025] Figure 8 is a comparison chart of the loss curves for the improved model.

[0026] Figure 9 shows the comparative detection results in four campus scenarios.

[0027] Figure 10 shows the comparison of multi-point flame detection in a campus setting.

[0028] Figure 11 shows the test results of a simulated surveillance layout in a campus setting.

Detailed Implementation

To facilitate understanding of this application and to make the aforementioned objectives, features, and advantages of this application more apparent, a detailed description of specific embodiments of this application is provided below in conjunction with the accompanying drawings. Numerous specific details are set forth in the following description to provide a thorough understanding of this application, and preferred embodiments are shown in the accompanying drawings. However, this application can be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that the disclosure of this application is understood more thoroughly and completely. This application can be implemented in many other ways different from those described herein, and those skilled in the art can make similar modifications without departing from the spirit of this application; therefore, this application is not limited to the specific embodiments disclosed below.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this application, "a plurality of" means at least two, such as two, three, etc., unless otherwise explicitly specified. In the description of this application, "several" means at least one, such as one, two, etc., unless otherwise explicitly specified.

It should be noted that when an element is referred to as being "fixed to" another element, it can be directly attached to the other element or there may be an intervening element. When an element is referred to as being "connected to" another element, it can be directly connected to the other element or there may be an intervening element. The terms "vertical," "horizontal," "left," "right," and similar expressions used herein are for illustrative purposes only and do not represent the only possible implementations.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is only for describing particular implementations and is not intended to limit the scope of this application. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

[0029] Referring to Figures 1-11, a campus fire detection method includes the following steps: acquiring the surveillance video image to be detected; inputting the image into a neural network model for inference, wherein the inference process includes: extracting multi-scale features through the backbone network, wherein at least one dynamic convolutional fusion operation based on a switchable dilation rate is performed on the extracted features; performing channel attention weighting on the high-level features after pyramid pooling; and performing multi-scale feature fusion and detection inference, which includes a parallel inference branch dedicated to small flame detection. This branch is generated as follows: a first feature map is obtained from the shallowest layer of the backbone network, which has the highest spatial resolution among the multi-scale features; the first feature map is upsampled to the same spatial size as a second feature map from a deeper layer of the network; the upsampled first feature map and the second feature map are added element by element to generate a small-target enhanced feature map; and the small-target enhanced feature map is input into an independent convolutional detector, which outputs the detection result for small flame targets. Based on the inference result, it is determined whether there is a flame in the image, and a comprehensive detection result is output.

[0030] The neural network model can adopt the YOLOv5 network model structure, which is mainly composed of three parts: the backbone network, the PANet neck network, and the head detection network. The backbone network performs feature extraction: it extracts key features from the input image for use by the subsequent head detection part. It consists of multiple modules, primarily the Focus module, Conv convolutional layers, the C3 cross-stage partial module, and the SPP spatial pyramid pooling module. In the YOLOv5 framework, the C3 structure, also known as the Bottleneck CSP structure, is the core component of the model; its role is to improve information and gradient flow while reusing features sensibly. The neck network processes the feature maps extracted by the backbone network, fusing the extracted feature information to provide a more comprehensive and robust feature description to the prediction layer and thereby improve detection accuracy.

[0031] The prediction network head is the part of the YOLOv5 network model that generates the final prediction results. Its main function is to process the feature maps, predicting the location, class, and confidence score of each bounding box.

[0032] The dynamic convolutional fusion operation based on switchable dilation rate is specifically as follows: For the input feature map, convolution with a first dilation rate and a second dilation rate greater than the first dilation rate are used to process it, respectively, to obtain the corresponding first branch feature map and second branch feature map; A first dynamic weight is assigned to the first branch feature map and a second dynamic weight is assigned to the second branch feature map through an input-related switching function, wherein the second dynamic weight is complementary to the first dynamic weight. The product of the first branch feature map and the first dynamic weight is added to the product of the second branch feature map and the second dynamic weight to obtain the fused output feature map.

[0033] In practical implementation, existing technologies typically improve recognition accuracy by increasing the number of layers in the detection model. However, adding too many layers may degrade performance. To address this issue, this solution introduces a residual structure into the network model. This structure avoids the excessive computational cost and stagnating accuracy caused by overly deep models, achieving a good balance between accuracy and processing speed. To enhance the feature extraction performance of the model's backbone, this solution improves the residual module using switchable dilated convolution, known as Switchable Atrous Convolution (SAC). The mathematical expression of the SAC module is shown below:

$$y = S(x)\,D_1 + \bigl(1 - S(x)\bigr)\,D_2$$

[0034] In this expression, S(x) is a dynamic switching function, and D_1 and D_2 are the convolution outputs with dilation rates of 1 and 3, respectively. Figure 2 shows the SAC architecture used here. SAC employs two 3×3 dilated convolutions with dilation rates of 1 and 3 to widen the perceptual range of the features. The architecture consists of global context modules on both sides and a central SAC module. The input flame features are processed by the global context module, and an average pooling layer followed by a convolution computes the switching function S(x). Convolutions with the two dilation rates generate the corresponding features D1 and D2, which are multiplied by the switching weights to obtain T1 and T2 and then fused. The output is refined by the second global context module, which also helps the switching function vary stably. The SAC convolutional unit is used to optimize the C3 module, forming the improved structure SAC_C3: switchable dilated convolution is integrated into the C3 module so that the SAC unit replaces the convolutional part of the Bottleneck residual module in C3, yielding a new SAC_Bottleneck unit, as shown in Figure 3.
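The weighted fusion of the two dilated branches can be illustrated with a small pure-Python sketch on a single-channel feature map. It is only an illustration under stated assumptions: both branches share one 3×3 kernel, and `switch_weights` (a hypothetical helper) stands in for the average-pooling-plus-convolution switch by taking the sigmoid of a 5×5 local mean. Neither detail is specified in the text.

```python
import math

def dilated_conv3x3(x, kernel, dilation):
    """3x3 convolution over a 2-D list with zero padding; output size equals input size."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(3):
                for b in range(3):
                    ii = i + (a - 1) * dilation
                    jj = j + (b - 1) * dilation
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = acc
    return out

def switch_weights(x, window=5):
    """Hypothetical S(x): sigmoid of a local average (stand-in for avg-pool + 1x1 conv)."""
    h, w = len(x), len(x[0])
    r = window // 2
    s = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [x[ii][jj]
                    for ii in range(max(0, i - r), min(h, i + r + 1))
                    for jj in range(max(0, j - r), min(w, j + r + 1))]
            s[i][j] = 1.0 / (1.0 + math.exp(-sum(vals) / len(vals)))
    return s

def sac_fuse(x, kernel, d1=1, d2=3):
    """Fused output: S(x) * branch(d1) + (1 - S(x)) * branch(d2), per position."""
    s = switch_weights(x)
    b1 = dilated_conv3x3(x, kernel, d1)
    b2 = dilated_conv3x3(x, kernel, d2)
    return [[s[i][j] * b1[i][j] + (1.0 - s[i][j]) * b2[i][j]
             for j in range(len(x[0]))] for i in range(len(x))]

# Toy 8x8 feature map and an edge-like kernel (illustrative values only).
feature = [[float((i * 7 + j) % 5) for j in range(8)] for i in range(8)]
edge_kernel = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
fused = sac_fuse(feature, edge_kernel)
print(len(fused), len(fused[0]))
```

Because the two weights sum to one at every position, the output interpolates between the small and large receptive-field responses rather than stacking them.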

[0035] Furthermore, in the channel attention weighting operation, the calculation of the channel weights includes: performing global average pooling on the input features to obtain a one-dimensional channel description vector; and convolving the description vector with a one-dimensional convolution kernel, wherein the kernel size is adaptively determined from the total number of channels C of the input features and equals the odd number closest to (log2(C) + 1) / 2, that is, the base-2 logarithm of the total number of channels plus a constant 1, divided by 2.

[0036] In this scheme, the above steps are implemented with the Efficient Channel Attention (ECA) mechanism. ECA is an improvement on SENet that avoids dimensionality reduction and lets neighboring channels exchange local information quickly. It also adaptively selects the size of the one-dimensional convolutional kernel, which helps the network focus on the more valuable feature information over a wider range. The basic structure of the ECA module is shown in Figure 4. Starting from the input, one-dimensional feature information is first obtained through global average pooling (GAP). Feature interaction is then performed without reducing dimensionality: a value k determines the local cross-channel interaction between each channel and its k neighboring channels. Finally, the output feature map is generated by multiplying the resulting weights with the input features.
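The pooling, one-dimensional convolution, and reweighting steps just described can be sketched in pure Python as follows. The function names are hypothetical, the kernel values are placeholders rather than trained weights, and a sigmoid is assumed as the activation that turns each response into a weight.

```python
import math

def eca_channel_weights(feature, kernel):
    """feature: list of C channels, each an H x W list of lists; kernel: odd-length 1-D weights."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    # Global average pooling: one descriptor value per channel.
    desc = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]
    # 1-D convolution across neighbouring channels, zero-padded at both ends.
    pad = k // 2
    padded = [0.0] * pad + desc + [0.0] * pad
    logits = [sum(kernel[j] * padded[i + j] for j in range(k))
              for i in range(len(desc))]
    # Sigmoid (assumed activation) maps each response to a weight in (0, 1).
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def eca_apply(feature, kernel):
    """Scale every channel of the input by its attention weight."""
    weights = eca_channel_weights(feature, kernel)
    return [[[v * weights[c] for v in row] for row in ch]
            for c, ch in enumerate(feature)]

# Four 2x2 channels whose values equal the channel index (toy data).
channels = [[[float(c)] * 2 for _ in range(2)] for c in range(4)]
print(eca_channel_weights(channels, [0.25, 0.5, 0.25]))
```

The same kernel slides over all channels, so the number of learnable parameters equals the kernel length k regardless of the channel count.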

[0037] The ECA module captures channel attention efficiently while keeping model complexity low. After pooling, it does not reduce the channel dimension; instead it promotes interaction between local channels using shared parameters, which simplifies the network architecture and improves learning efficiency. The formula for calculating channel weights in the ECA module is shown below:

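$$\omega_i = \sigma\Bigl(\sum_{j=1}^{k} w^{j}\, y_i^{j}\Bigr), \qquad y_i^{j} \in \Omega_i^{k}$$

[0038] In this formula, $\Omega_i^{k}$ represents the set of k neighboring channels of $y_i$, where $y_i$ is the channel feature information after pooling; $\sigma$ is the activation function; and $\omega_i$ is the weight value of channel i.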

[0039] In the above formula, the selection of k is made adaptive to avoid tuning it manually through repeated cross-validation. The value of k determines the range of interaction between channels, and this range grows in proportion to the total channel dimension C. The relationship between C and k is shown in the following formula, where γ and b are constants:

$$C = \phi(k) = 2^{\gamma k - b}$$

[0040] The value of k is determined by an adaptive function of the total number of channels, where k is the number of neighboring channels involved in the interaction and C is the total number of channels. The specific formula for calculating k is shown below:

$$k = \psi(C) = \left|\frac{\log_2 C}{\gamma} + \frac{b}{\gamma}\right|_{\mathrm{odd}}$$

[0041] Here $|x|_{\mathrm{odd}}$ denotes the odd number closest to x. Experimental verification shows that the effect is ideal when γ = 2 and b = 1, in which case the above formula becomes:

$$k = \left|\frac{\log_2 C + 1}{2}\right|_{\mathrm{odd}}$$
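As a worked illustration of the adaptive kernel-size rule (the odd number closest to (log2 C + 1) / 2), the sketch below evaluates k for a few channel counts. The rounding step is an assumption: the value is truncated and then bumped to the next odd number when even, which is one common way to realize "nearest odd" in practice; the text itself does not spell out the tie-breaking.

```python
import math

def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive 1-D kernel size from the channel count C, with gamma = 2 and b = 1 by default."""
    if channels < 1:
        raise ValueError("channels must be a positive integer")
    t = int(abs((math.log2(channels) + b) / gamma))
    # Assumed rounding rule: truncate, then move even values up to the next odd number.
    return t if t % 2 == 1 else t + 1

for c in (64, 128, 256, 512):
    print(c, eca_kernel_size(c))
```

For example, C = 512 gives (9 + 1) / 2 = 5, so each channel interacts with a window of five neighboring channels.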

[0042] After a series of experimental comparisons, this paper concludes that placing the attention mechanism after the SPP module achieves the best performance. A schematic diagram of the YOLOv5s network model improved with the SAC-based Bottleneck residual module and the ECA attention mechanism is shown in Figure 5.

[0043] Furthermore, the independent convolutional detector consists of at least one convolutional layer, whose parameters differ from those of the detector used in the neural network model for detecting targets of normal size.

[0044] This is because in the early stages of a fire, the flames are small and spread slowly, making this a critical period for early warning and firefighting. Therefore, this solution focuses on smaller flame targets in a campus setting and designs a small-target flame detection module that exploits the rich detail features of flames in the lower convolutional layers to improve the model's ability to detect small flame targets. The module is marked by the red box in Figure 6, which gives a schematic diagram of the small-target flame detection module. Compared with existing technologies, a feature enhancement path to the P2 layer is added to the upsampling and downsampling steps of the PANet neck network. At the detection stage, an additional Conv2d detector is also designed to detect smaller flame targets, thereby improving the model's overall ability to identify and extract lower-level flame features.
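The fusion step of the small-target branch (bring two feature maps to a common spatial size, add them element by element, then pass the result to a separate detector) can be sketched as below. Several points are assumptions made for illustration: nearest-neighbor interpolation is used, the function resizes whichever of the two maps is spatially smaller, and `detect_head` is a hypothetical single-channel 1×1 detector with placeholder weight and bias.

```python
import math

def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor."""
    return [[v for v in row for _ in range(factor)]
            for row in fmap for _ in range(factor)]

def fuse_small_target(map_a, map_b):
    """Resize the smaller map to the larger one's size, then add element by element."""
    small, large = (map_a, map_b) if len(map_a) <= len(map_b) else (map_b, map_a)
    factor = len(large) // len(small)
    if len(small) * factor != len(large) or len(small[0]) * factor != len(large[0]):
        raise ValueError("map sizes must differ by one integer factor in both dimensions")
    up = upsample_nearest(small, factor)
    return [[u + v for u, v in zip(row_up, row_large)]
            for row_up, row_large in zip(up, large)]

def detect_head(fmap, weight=1.0, bias=0.0):
    """Hypothetical 1x1 convolutional detector: per-cell flame score in (0, 1)."""
    return [[1.0 / (1.0 + math.exp(-(weight * v + bias))) for v in row]
            for row in fmap]

# Toy maps: a 4x4 high-resolution map and a 2x2 map at half the resolution.
shallow = [[0.0, 1.0, 0.0, 0.0],
           [1.0, 2.0, 1.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0]]
deep = [[0.5, 0.0],
        [0.0, 0.0]]
enhanced = fuse_small_target(shallow, deep)
scores = detect_head(enhanced, weight=2.0, bias=-3.0)
print(enhanced[1][1], round(scores[1][1], 3))
```

The enhanced map keeps the fine spatial grid of the high-resolution input, which is what lets a dedicated detector respond to flames covering only a few cells.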

[0045] Secondly, this solution provides a campus fire detection system, including: an image acquisition module for acquiring video streams from the monitored area; a processing module, connected to the image acquisition module, for receiving and processing the video stream, the processing module including the electronic device described above; and an alarm module, connected to the processing module, for triggering an alarm based on the fire detection results output by the processing module.

[0046] In one feasible embodiment, the system uses a CSI camera as the image acquisition module, connects to a detection display screen via HDMI to display real-time detection values, and uses a Jetson TX2 as the processing module.
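The flow between the three modules (acquisition, processing, alarm) can be outlined with a minimal sketch. Everything named here is hypothetical: `run_monitoring`, the detection tuple layout, the 0.5 confidence threshold, and the stand-in detector are illustrative only and do not represent the camera, model, or alarm interfaces of the described system.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# (confidence, bounding box as x, y, width, height) -- an assumed layout.
Detection = Tuple[float, Tuple[int, int, int, int]]

def run_monitoring(frames: Iterable[Dict],
                   detect: Callable[[Dict], List[Detection]],
                   alarm: Callable[[int, List[Detection]], None],
                   threshold: float = 0.5) -> int:
    """Run the detector on each frame and trigger the alarm when a flame is found."""
    alarms = 0
    for index, frame in enumerate(frames):
        hits = [d for d in detect(frame) if d[0] >= threshold]
        if hits:
            alarm(index, hits)
            alarms += 1
    return alarms

def fake_detect(frame: Dict) -> List[Detection]:
    """Stand-in for the neural network: reads a precomputed score from the frame."""
    return [(frame["score"], (0, 0, 10, 10))] if frame["score"] > 0 else []

events: List[int] = []
stream = [{"score": 0.0}, {"score": 0.9}, {"score": 0.3}]
count = run_monitoring(stream, fake_detect, lambda i, hits: events.append(i))
print(count, events)
```

Passing the detector and the alarm as callables keeps the loop independent of the specific camera interface and embedded platform.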

[0047] In summary, this system includes: an improved Bottleneck residual module based on SAC fusion, an improved backbone network with the ECA attention mechanism, and a purpose-designed small-target detection layer. Comparison results against other existing neural networks are shown in Table 1 and Figures 7 to 8.

Table 1: Data Comparison Between the Improved Model and the Existing Technical Model

[0048] Comparing the original algorithm with the ablation experiments shows that the improved model achieves better mean average precision (mAP) and precision curves. Within the first 50 epochs the loss drops rapidly to a relatively small value, then decreases more slowly until it gradually converges at around 250 epochs. Compared with the original YOLOv5s and the two other ablation variants, the improved algorithm converges faster initially and also reaches a lower final loss value, which indicates that the improved model trains more effectively. Furthermore, compared with the original algorithm and the ablation variants, the improved algorithm shows an improvement of approximately 3.5% in mAP, a significant gain in accuracy over the original algorithm.

[0049] In the simulation experiment, an on-site test simulated the detection task in a real-world scenario. An open grassy area and an electric vehicle shed with a complex background were selected on campus for on-site testing to verify the simulation's realism. The aforementioned campus flame detection platform was deployed in the campus setting, and ignition experiments were conducted manually under safe conditions. The experimental site and equipment operation are shown in Figure 8. After the experimental setup was deployed, a field camera was connected and real-time detection was run on its video stream address. The detection results from the connected camera are shown in Figure 9, which covers four scenarios: two using the conventional YOLOv5s solution and two using the improved YOLOv5s solution. The data is as follows:

Table 2: Comparison of model detection results in different scenarios

[0050] The experimental results show that, in the selected campus scenarios of grassland and electric vehicle shed, the improved algorithm proposed in this paper achieves better detection accuracy than the original algorithm. In Figure 9, (a) and (c) are the detection results of the original algorithm, and (b) and (d) are the detection results of the improved flame detection algorithm. Compared with the original detection algorithm, the improved campus flame detection algorithm improves detection accuracy by about 3% on average.

[0051] Table 3: Comparison data of multi-point flame detection in campus scenes

[0052] Figure 10 (e)~(h) show the comparison of multi-point flame detection in the experimental grass and electric vehicle shed scenarios: (e) and (g) show the detection results of the original algorithm, and (f) and (h) show the detection results of the improved flame detection algorithm. The results show that in multi-point flame detection tasks across different scenarios, the original algorithm has low accuracy on small multi-point flames and misses some of them. A likely explanation is that the feature extraction network of the original YOLOv5s model is weak at extracting features of small flame targets, so small-target feature information is lost along the way and performance in real-world scenarios suffers. The improved campus flame detection model, through its dedicated small-target detection layer, strengthens the network's ability to extract features of small flame targets and significantly improves their detection in the scene. Multiple sets of comparative experiments confirm that the improved flame detection algorithm in this paper has higher accuracy in detecting small flame targets.

[0053] Figure 11 (i) and (j) show the test results of the campus flame detection model designed in this paper when simulating real-time flame detection in a campus security monitoring scene. The results indicate that when deployed in monitoring scenarios prone to fire, such as near carports or grassy areas, the model can effectively detect fires in real time, which shows that it can meet the needs of flame detection in actual campus monitoring applications. Furthermore, to verify the flame detection performance of the model after deployment on an embedded platform, the deployment results of the proposed campus flame detection model and the original algorithm model are shown in Table 3 below:

Table 3: Comparison of detection performance of different models on the Jetson TX2 embedded platform

[0054] As is known from common technical knowledge, this invention can be implemented through other embodiments that do not depart from its spirit or essential characteristics. Therefore, the disclosed embodiments described above are merely illustrative and not exhaustive. All modifications within the scope of this invention or its equivalents are included in this invention.

Claims

1. A method for detecting fires on campus, characterized in that it includes the following steps: acquiring the surveillance video image to be detected; inputting the image into a neural network model for inference, wherein the inference process includes: extracting multi-scale features through the backbone network, wherein at least one dynamic convolutional fusion operation based on a switchable dilation rate is performed on the extracted features; performing channel attention weighting on the high-level features after pyramid pooling; and performing multi-scale feature fusion and detection inference, which includes a parallel inference branch specifically for small-sized flame detection, the branch being generated as follows: a first feature map is obtained from the shallowest layer of the backbone network, the first feature map having the highest spatial resolution among the multi-scale features; the first feature map is upsampled so that it has the same spatial size as a second feature map from a deeper layer of the network; the upsampled first feature map and the second feature map are added element by element and fused to generate a small-target enhanced feature map; and the small-target enhanced feature map is input into an independent convolutional detector, which outputs the detection result for the small flame target; and determining, based on the inference result, whether there is a flame in the image and outputting a comprehensive detection result.
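The fusion step of the small-target branch in claim 1 (bring one feature map to the spatial size of the other, then add element by element) can be illustrated with a minimal pure-Python sketch. The function names `resize_nearest` and `fuse_by_addition`, the use of nearest-neighbour resampling, and the toy single-channel maps are all assumptions made for illustration; the claim does not specify the interpolation method.

```python
def resize_nearest(fmap, out_h, out_w):
    """Resize a 2D feature map (list of rows) by nearest-neighbour sampling."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def fuse_by_addition(first, second):
    """Bring `first` to the spatial size of `second`, then add element by element."""
    resized = resize_nearest(first, len(second), len(second[0]))
    return [
        [a + b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(resized, second)
    ]


# Toy single-channel maps; sizes and values are illustrative only.
first_map = [[1.0, 2.0],
             [3.0, 4.0]]
second_map = [[1.0] * 4 for _ in range(4)]

enhanced = fuse_by_addition(first_map, second_map)
for row in enhanced:
    print(row)
```

In a real network the same operation would run per channel on tensors, and the fused map would then be passed to the independent convolutional detector.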

2. The campus fire detection method according to claim 1, characterized in that the dynamic convolutional fusion operation based on the switchable dilation rate is specifically as follows: the input feature map is processed separately by a convolution with a first dilation rate and a convolution with a second dilation rate greater than the first dilation rate, to obtain a corresponding first branch feature map and second branch feature map; a first dynamic weight is assigned to the first branch feature map and a second dynamic weight is assigned to the second branch feature map through an input-dependent switching function, wherein the second dynamic weight is complementary to the first dynamic weight; and the product of the first branch feature map and the first dynamic weight is added to the product of the second branch feature map and the second dynamic weight to obtain the fused output feature map.
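The fusion described in claim 2 can be sketched on a single-channel map in pure Python. This is a minimal illustration under stated assumptions, not the patented implementation: the switching function is taken to be a sigmoid of the global mean of the input (the claim only requires that it depend on the input), the two branches share one 3x3 kernel, the dilation rates 1 and 3 are arbitrary, and all function names are hypothetical.

```python
import math


def dilated_conv2d(x, kernel, dilation):
    """3x3 'same' convolution with zero padding and the given dilation rate."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for ki in range(3):
                for kj in range(3):
                    ii = i + (ki - 1) * dilation
                    jj = j + (kj - 1) * dilation
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[ki][kj] * x[ii][jj]
            out[i][j] = acc
    return out


def switch_weight(x):
    """Input-dependent weight in (0, 1): sigmoid of the global mean (assumed form)."""
    mean = sum(map(sum, x)) / (len(x) * len(x[0]))
    return 1.0 / (1.0 + math.exp(-mean))


def switchable_dilated_fusion(x, kernel, rate1=1, rate2=3):
    """Return the fused map and the two complementary dynamic weights."""
    w1 = switch_weight(x)          # first dynamic weight
    w2 = 1.0 - w1                  # complementary second dynamic weight
    branch1 = dilated_conv2d(x, kernel, rate1)
    branch2 = dilated_conv2d(x, kernel, rate2)
    fused = [
        [w1 * a + w2 * b for a, b in zip(row1, row2)]
        for row1, row2 in zip(branch1, branch2)
    ]
    return fused, w1, w2


# Toy input: a 5x5 ramp and an identity kernel (illustrative values only).
x = [[float(i + j) for j in range(5)] for i in range(5)]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
fused, w1, w2 = switchable_dilated_fusion(x, identity)
print(round(w1 + w2, 6))
```

Because the two weights sum to one, the output is a convex combination of the small-receptive-field and large-receptive-field branches; with the identity kernel both branches reproduce the input, so the fused map equals the input.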

3. The campus fire detection method according to claim 1, characterized in that, in the channel attention weighting operation, the calculation process of the channel weights includes: performing global average pooling on the input features to obtain a one-dimensional channel description vector; and convolving the description vector with a one-dimensional convolution kernel, wherein the size of the one-dimensional convolution kernel is adaptively determined based on the total number of channels of the input features, and its value is the odd number closest to the sum of the base-2 logarithm of the total number of channels and a constant 1, divided by 2.
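The kernel-size rule and the pooling-plus-1D-convolution steps of claim 3 can be sketched as follows. Several details are assumptions: "closest odd number" is resolved by truncating (log2(C) + 1) / 2 to an integer and moving an even value up to the next odd one; the final sigmoid gate is a common choice for channel attention but is not stated in the claim; and the function names are hypothetical.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Odd kernel size derived from (log2(C) + b) / gamma.

    Tie and fraction handling is an assumption: truncate to an integer,
    then move an even value up to the next odd one.
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def channel_weights(features, kernel):
    """features: list of C maps (each a list of rows); kernel: odd-length 1D weights."""
    # Global average pooling: one scalar descriptor per channel.
    desc = [sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])) for fmap in features]
    pad = len(kernel) // 2
    padded = [0.0] * pad + desc + [0.0] * pad
    # 1D convolution across neighbouring channels (local cross-channel interaction).
    mixed = [
        sum(k * padded[c + i] for i, k in enumerate(kernel))
        for c in range(len(desc))
    ]
    # Sigmoid gate (assumed; the claim only specifies pooling and the 1D convolution).
    return [1.0 / (1.0 + math.exp(-m)) for m in mixed]


for c in (64, 256, 512):
    print(c, eca_kernel_size(c))
```

Under this convention, 64, 256 and 512 channels give kernel sizes 3, 5 and 5 respectively. Each input channel would then be multiplied by its weight to recalibrate the feature response.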

4. The campus fire detection method according to claim 1, characterized in that the spatial resolution of the first feature map is twice the spatial resolution of the second feature map.

5. The campus fire detection method according to claim 1, characterized in that the independent convolutional detector consists of at least one convolutional layer, and its parameters are different from those of the detector used in the neural network model for detecting targets of normal size.

6. An electronic device, characterized in that it includes: a memory for storing a computer program; and a processor for executing the computer program to implement the campus fire detection method as described in any one of claims 1 to 5.

7. A campus fire detection system, characterized in that it includes: an image acquisition module for acquiring video streams from the monitored area; a processing module, connected to the image acquisition module, for receiving and processing the video stream, the processing module including the electronic device as described in claim 6; and an alarm module, connected to the processing module, for triggering an alarm based on the fire detection results output by the processing module.
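The data flow between the three modules of claim 7 (acquisition, processing, alarm) can be sketched as a simple loop. Everything named here is hypothetical: `run_monitoring`, the stand-in frame source, the stand-in detector and the alarm callback are illustrative placeholders, and the rule "alarm on any frame with at least one detection" is an assumption, since the claim does not state the trigger condition.

```python
from typing import Callable, Iterable, List


def run_monitoring(
    frames: Iterable[object],
    detect: Callable[[object], List[dict]],
    raise_alarm: Callable[[int, List[dict]], None],
) -> int:
    """Feed frames to the detector and call the alarm for every frame with detections."""
    alarms = 0
    for index, frame in enumerate(frames):   # image acquisition module output
        detections = detect(frame)           # processing module
        if detections:                       # assumed trigger: any flame reported
            raise_alarm(index, detections)   # alarm module
            alarms += 1
    return alarms


# Stand-ins for the real modules (hypothetical, for illustration only).
fake_frames = ["no_fire", "fire", "no_fire", "fire"]
fake_detect = lambda frame: [{"label": "flame"}] if frame == "fire" else []
log = []
count = run_monitoring(fake_frames, fake_detect, lambda i, d: log.append(i))
print(count, log)
```

Passing the detector and alarm as callables keeps the loop independent of the camera interface and the model, so the same structure would apply whether the frames come from a file or a live stream.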

8. The campus fire detection system according to claim 7, characterized in that the image acquisition module is a CSI interface camera, and the electronic device in the processing module is a Jetson TX2 embedded platform.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the campus fire detection method as described in any one of claims 1 to 5.