Fire event detection method based on brain-like visual deconvolution coding and decoding model

By processing fire video images using a brain-like visual deconvolution encoding and decoding model, the real-time and accuracy issues of fire detection in complex scenarios are solved, enabling efficient detection of small target fires and obstructed fires, and meeting the requirements for rapid response in fire early warning.

CN121305447B · Active · Publication Date: 2026-06-19 · TIANJIN UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TIANJIN UNIV
Filing Date
2025-11-21
Publication Date
2026-06-19


Abstract

This invention relates to a fire event detection method based on a neuromorphic visual deconvolutional encoding and decoding model. The method includes the following steps: performing neuromorphic visual preprocessing on each frame of acquired fire video images; constructing a neuromorphic visual deconvolutional encoding and decoding model, which includes a neuromorphic visual hierarchical convolutional sparse encoder and a multi-scale deconvolutional feature decoder; the neuromorphic visual hierarchical convolutional sparse encoder includes four convolutional coding layers simulating the human brain's visual cortex V1-V4, progressively extracting features from low-level to high-level features from the input single-frame image; the multi-scale deconvolutional feature decoder upsamples and reconstructs the encoded features; and end-to-end training using a composite loss function to obtain a candidate fire event probability feature map, thereby determining the final fire event. This method ensures real-time detection and meets the response requirements of fire early warning systems.

Description

Technical Field

[0001] This invention relates to the fields of fire detection and computer vision technology, and specifically to a fire event detection method and system based on a neuromorphic visual deconvolution encoding and decoding model.

Background Technology

[0002] Fire is one of the major disasters threatening public safety. Timely and accurate fire detection is crucial for reducing casualties and property losses. Therefore, fire detection technology has always been a research hotspot in the field of public safety and security. Currently, fire detection technology mainly includes two aspects: sensor-based detection methods and deep learning computer vision-based detection methods. Traditional sensor-based detection methods rely on physical devices such as temperature sensors, smoke sensors, and recorders to collect environmental signals at the fire scene. Their drawbacks include limited spatial coverage, susceptibility to interference from environmental factors such as dust and humidity, and the inability to directly obtain the visual characteristics of the fire (such as flame shape and smoke diffusion process), making it difficult to meet the real-time and accurate detection needs in complex scenarios. Deep learning computer vision-based detection methods use general object detection models such as convolutional neural networks (CNN) and YOLO to identify the visual features of flames and smoke in fire images or videos. However, these methods lack brain-like hierarchical perception capabilities, cannot simulate the adaptive processing of human vision for dynamic targets, and are insufficient in extracting features from small-scale fires or fires obscured by smoke. Furthermore, in complex fire scenarios, traditional computer vision methods do not pay enough attention to key features and are easily affected by background interference, leading to false fire detections. Specifically, if there are many "fire-like" distractors in the scene, whose color and texture are highly similar to flames, the computer vision model is prone to misidentifying these distractions as fires. 
In addition, traditional model training often uses a single loss function, which struggles to account for both the dynamic and textural features of fires, resulting in insufficient detection accuracy and robustness. To improve detection accuracy, traditional computer vision models therefore stack a large number of convolutional layers to extract fire features accurately, which increases the number of model parameters, the computational complexity, and the inference latency. In scenarios requiring real-time response, a model becomes useless if its inference latency is too high; conversely, simplifying the model to pursue speed reduces accuracy, creating a dilemma between accuracy and speed.

[0003] The bottleneck of traditional computer vision methods lies in the insufficient synergy between encoding and decoding: encoding lacks adaptive perception capabilities, and decoding lacks precise reconstruction and feature selection capabilities. In recent years, neuromorphic computing models have developed rapidly in many fields. By simulating the pulse transmission mechanism of neurons in the human brain, they have demonstrated unique advantages in low power consumption, temporal information processing, and hierarchical perception, and can thus compensate for the perception shortcomings of the encoding stage. Deconvolution technology, through targeted upsampling reconstruction, can address the insufficient feature restoration of the traditional decoding stage. The combination of these two technologies provides a new path for breakthroughs in fire detection technology. Fire detection technology based on neuromorphic computing is still in its early stages, and how to deeply integrate neuromorphic perception with deconvolutional decoding, balancing lightweight design against high accuracy, remains an urgent problem.

Summary of the Invention

[0004] To address the shortcomings of existing technologies, this invention aims to provide a fire event detection method based on a neuromorphic visual deconvolutional encoding / decoding model. The method involves performing neuromorphic visual preprocessing on each frame of fire video images captured by surveillance cameras, constructing a neuromorphic visual hierarchical convolutional sparse encoder that simulates the functions of the human brain's visual cortex (V1-V4), and progressively extracting low-level to high-level features from the input single-frame image. The encoded features are then upsampled and reconstructed using a multi-scale deconvolutional feature decoder. The encoding / decoding model is trained end-to-end using a composite loss function. The candidate fire event probability feature map output from the decoder is verified based on probability thresholds and time series continuity to determine the fire event. This invention effectively enhances fire features while suppressing noise, improving feature recognition in complex fire environments. It also solves the problem of existing methods struggling to detect small-target fires and occluded fires. Furthermore, the lightweight model design ensures real-time detection, meeting the response requirements of fire early warning systems.

[0005] The technical solution adopted by the present invention to solve the aforementioned technical problem is as follows:

[0006] In a first aspect, the present invention provides a fire event detection method based on a brain-like visual deconvolutional encoding and decoding model, the detection method comprising the following steps:

[0007] Brain-like visual preprocessing is performed frame by frame on the acquired fire video images;

[0008] A brain-like visual deconvolutional encoding and decoding model is constructed, which includes a brain-like visual hierarchical convolutional sparse encoder and a multi-scale deconvolutional feature decoder.

[0009] The brain-like visual hierarchical convolutional sparse encoder includes four convolutional coding layers that simulate the V1-V4 layers of the human brain's visual cortex and extract features stepwise, from low-level to high-level, from the input single-frame image.

[0010] The multi-scale deconvolutional feature decoder includes a deconvolutional decoding network with a spatial-channel dual-gated mechanism to upsample and reconstruct the encoded features;

[0011] The brain-like visual deconvolutional encoding and decoding model is trained end-to-end using a composite loss function to obtain the probability feature map of candidate fire events. Candidate locations are then selected from this probability feature map using a probability threshold, and a location that passes time series continuity verification is taken as the final fire event.

[0012] The composite loss function includes a semantic loss, a dynamic loss, and a texture loss. The expression is:

[0013] (13)

[0014] where the three weighting coefficients, one for each loss, satisfy a constraint used to balance the contributions of the three core losses; the remaining terms are the sparse regularization coefficient and the sparse code corresponding to a given convolutional kernel of a given convolutional coding layer.

[0015] In a second aspect, the present invention provides a fire event detection system based on a neuromorphic visual deconvolutional encoding and decoding model, the system comprising:

[0016] The video frame acquisition module is used to acquire video image data of the fire scene;

[0017] The dynamic response adjustment module is used to adjust the pixel response of the image obtained by the video frame acquisition module through a nonlinear function, thereby enhancing the identification of fire features under low light conditions.

[0018] The lateral inhibition noise reduction module is used to calculate grayscale differences on the pixel response output by the dynamic response adjustment module. It removes environmental noise by suppressing redundant information while preserving the edge details of flames and smoke.

[0019] The neuromorphic visual hierarchical convolutional sparse encoder includes four convolutional coding layers corresponding to the V1-V4 layers of the human visual cortex. The image frames output by the lateral inhibition noise reduction module are used as the input of the V1 convolutional coding layer. In each convolutional coding layer, deformable convolution assigns an independent two-dimensional offset to each sampling point of the convolutional kernel, correcting the sampling positions so that they accurately cover the fire target area, and the output features of each convolutional kernel are obtained. The sparse codes corresponding to all convolutional kernels of the layer are then obtained by combining these outputs with the baseline activity. The sum of the sparse codes corresponding to all convolutional kernels of a convolutional coding layer is used as the output of that layer and as the input of the next convolutional coding layer.

[0020] The multi-scale deconvolutional feature decoder includes a channel gating module and a spatial gating module. The output of the last convolutional coding layer of the neuromorphic visual hierarchical convolutional sparse encoder is used as the input of the multi-scale deconvolutional feature decoder. In each decoder layer, the channel gating module and the spatial gating module calculate the channel gating weights and spatial gating weights of the current layer, respectively, and the input of that layer is weighted by both. A deconvolution operation then yields the input of the next decoder layer, and the output of the last decoder layer is used as the candidate fire event probability feature map.

[0021] The fire event determination and early warning module is used to determine candidate fire pixels in the candidate fire event probability feature map according to a set probability threshold, determine candidate fire event regions based on connected components, and calculate the area of each candidate fire event region. If, over consecutive frames, the ratio of the candidate region's area to the area of the input feature map of the neuromorphic visual deconvolutional encoding and decoding model for the corresponding frame is not less than the area ratio threshold, and the area growth rate is within a set range, a fire event is confirmed and an early warning is triggered.

[0022] Furthermore, the calculation formula for the dynamic response adjustment module is:

[0023] (1)

[0024] where the symbols denote, in order: the original brightness value of the pixel at the given coordinates; the scene baseline brightness, taken as the average brightness of all pixels in the entire frame; the response sensitivity coefficient; and the pixel response after dynamic response adjustment.

[0025] Furthermore, the calculation formula for the side-suppression noise reduction module is:

[0026] (2)

[0027] where the symbols denote, in order: the inhibition intensity; the set of 8 neighboring pixels of the current pixel; and the original brightness value of the neighboring pixel (a,b).

[0028] Furthermore, the convolution kernel size of the neuromorphic visual hierarchical convolutional sparse encoder is 3x3, and the number of convolution kernels in the convolutional coding layer is 3; the deconvolution kernel size of the first three decoding layers of the multi-scale deconvolutional feature decoder is 2x2, and the deconvolution kernel size of the last decoding layer is 1x1.

[0029] The image frames processed by the lateral inhibition noise reduction module are labeled to obtain the real (ground-truth) fire images;

[0030] The area ratio threshold is 0.35, and consecutive frames refer to ten consecutive frames.

[0031] Furthermore, the process of calculating the offset using deformable convolution is as follows:

[0032]

[0033] where the symbols denote, in order: the local feature map of the input feature map of the convolutional coding layer, whose extent is determined by the kernel size; the learned offset weights; and a bias term.

[0034] Furthermore, the channel gating module is used to calculate the mutual information of each feature channel in the feature map, filter feature channels that are strongly correlated with fire, and generate channel gating weights. The specific calculation process is as follows:

[0035] (8)

[0036] where the symbols denote, in order: the input feature map of the given layer of the multi-scale deconvolutional feature decoder; the global average pooling operation; the two channel gating parameters of that layer; and the sigmoid function;

[0037] The spatial gating module is used to calculate the spatial information entropy of the feature map, locate fire candidate regions, and generate spatial gating weights. The specific process is as follows:

[0038] (9)

[0039] where the two parameters are the spatial gating parameters of the given layer of the multi-scale deconvolutional feature decoder, and the remaining term is the spatial information entropy calculated from the feature map;

[0040] The input feature map of the given layer of the multi-scale deconvolutional feature decoder is weighted by the channel gating weights and the spatial gating weights to yield a weighted feature map:

[0041] (10)

[0042] A deconvolution operation is applied to the weighted feature map to obtain the output feature map of the layer:

[0043] (11)

[0044] where the symbols denote the deconvolution kernel of the given layer of the multi-scale deconvolutional feature decoder and the deconvolution operation.

[0045] Furthermore, candidate regions for fire events are determined from connected components as follows: a probability threshold is set; if the probability at a point in the candidate fire event probability feature map output by the decoder satisfies the threshold condition, the point is determined to be a candidate fire pixel; and if the number of candidate fire pixels within a connected region of the probability feature map satisfies the set count condition, that connected region is defined as a candidate region for fire events.
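The connected-component step described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `candidate_regions`, its parameter names, the use of 4-connectivity, and the direction of both comparisons (probability not less than the threshold, pixel count not less than a minimum) are assumptions, since the text does not state them.

```python
from collections import deque


def candidate_regions(prob_map, prob_threshold, min_pixels):
    """Return connected regions of candidate fire pixels as lists of (row, col).

    Assumptions: a pixel is a candidate when its probability is >= prob_threshold,
    regions use 4-connectivity, and a region is kept when it has >= min_pixels pixels.
    """
    h, w = len(prob_map), len(prob_map[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or prob_map[y][x] < prob_threshold:
                continue
            # Breadth-first flood fill from this unvisited candidate pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            region = []
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and prob_map[ny][nx] >= prob_threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_pixels:
                regions.append(region)
    return regions
```

The area of each returned region is simply its pixel count, which is the quantity the later time series check operates on.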

[0046] Compared with the prior art, the beneficial effects of the present invention are:

[0047] (1) Compared with the commonly used data preprocessing methods, the brain-like visual preprocessing module (including the dynamic response adjustment module and the lateral inhibition noise reduction module) effectively enhances the identification of fire features, suppresses environmental noise, and adapts to complex fire scene conditions through dynamic response adjustment and lateral inhibition mechanisms.

[0048] (2) The YOLO series models most commonly used in the field of computer vision use a fixed receptive field, generally achieve low accuracy on small target fires of ≤50x50 pixels, and have difficulty handling smoke-covered scenes. This invention combines deformable convolution, which adaptively matches the fire scale, with a brain-like hierarchical sparse coding network to fully learn fire features and adaptively extract them at different scales, providing a new approach to detecting small target fires and occluded fires.

[0049] (3) Models such as the YOLO series and Faster R-CNN lack targeted attention mechanisms and are easily affected by background interference such as red objects and lights. This invention strengthens the key features of fire with a spatial-channel dual gating mechanism (a channel gating module and a spatial gating module) and combines it with a multi-feature fusion loss function that accounts for semantic, dynamic, and texture features, effectively reducing the false alarm rate.

[0050] (4) The “black box” models such as YOLO series and Faster R-CNN cannot explain the source of the detection results. This invention constructs data preprocessing, hierarchical encoding and deconvolution decoding based on the brain-like vision principle. The learned convolution kernel can directly correspond to the physical feature pattern of fire (such as the high-frequency texture kernel of flame and the dynamic kernel of smoke diffusion). Moreover, the sparse code can clearly indicate the occurrence time and intensity of the fire event, providing a traceable decision basis for the detection results, and achieving both interpretability and performance.

[0051] (5) Through the brain-like visual hierarchical convolutional sparse encoder, the present invention adopts a simplified brain-like hierarchical structure. Under the same hardware conditions, its average detection latency is lower than that of the YOLO series models while high detection accuracy is maintained, balancing speed and accuracy and meeting the rapid response requirements of fire early warning.

Attached Figure Description

[0052] Figure 1 is a flowchart of the fire event detection method based on a neuromorphic visual deconvolutional encoding and decoding model provided by this invention.

[0053] Figure 2 shows the brain-like visual hierarchical convolutional coding network structure used in the fire event detection method provided by this invention.

[0054] Figure 3 shows the multi-scale deconvolution decoding network structure used in the fire event detection method provided by this invention.

Detailed Implementation

[0055] The present invention will be further explained below with reference to the embodiments and accompanying drawings, but this is not intended to limit the scope of protection of this application.

[0056] This invention provides a fire event detection method based on a neuromorphic visual deconvolutional encoding and decoding model, the method comprising the following steps:

[0057] Video frame acquisition and brain-like visual preprocessing: Continuous video frames of the target scene are acquired through a surveillance camera. Brain-like visual processing is performed on each frame. Image pixel characteristics are modeled based on the natural exponential family distribution. The preprocessing unit consists of a dynamic response adjustment module and a lateral suppression noise reduction module. These two modules are used to preprocess the acquired fire video images through dynamic response adjustment and lateral suppression noise reduction mechanisms.

[0058] Dynamic response adjustment simulates the adaptive gain encoding mechanism of retinal ganglion cells, using a nonlinear function to adjust the pixel response of the image and enhance the recognizability of fire features under low light conditions. The pixel response after dynamic response adjustment can be expressed as:

[0059] (1)

[0060] where the symbols denote, in order: the original brightness value of the pixel at the given coordinates; the scene baseline brightness (the average brightness of all pixels in the entire frame); and the response sensitivity coefficient.
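As a rough illustration of dynamic response adjustment, the sketch below applies a saturating gain curve normalized by the frame's mean brightness. The functional form I / (I + k * mean), the function name `dynamic_response`, and the parameter name `sensitivity` are assumptions chosen for illustration; formula (1) of the patent may take a different form.

```python
def dynamic_response(frame, sensitivity=1.0):
    """Assumed adaptive-gain curve: r = I / (I + sensitivity * mean(I)).

    frame is a 2D list of non-negative brightness values.
    """
    pixels = [v for row in frame for v in row]
    baseline = sum(pixels) / len(pixels)  # scene baseline: mean brightness of the frame
    floor = 1e-8  # avoids division by zero on an all-black frame
    return [
        [v / (v + sensitivity * baseline + floor) for v in row]
        for row in frame
    ]


# A dark frame with one brighter pixel: the bright pixel keeps the largest response,
# and all responses fall in [0, 1).
adjusted = dynamic_response([[2.0, 2.0], [2.0, 10.0]])
```

Because the gain is relative to the frame mean, the same absolute brightness produces a larger response in a dark scene than in a bright one, which is the low-light behaviour the paragraph describes.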

[0061] Lateral inhibition denoising is based on the principle of lateral inhibition in human vision: the grayscale difference between each pixel and its neighboring pixels is calculated, redundant information is suppressed to remove environmental noise, and the edge details of flames and smoke are preserved. The calculation formula is as follows:

[0062] (2)

[0063] where the symbols denote, in order: the inhibition intensity; the set of 8 neighboring pixels of the current pixel; and the original brightness value of the neighboring pixel (a,b);
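One plausible reading of the lateral inhibition step is that each pixel's response is reduced by a fraction of the mean of its 8 neighbors. The sketch below implements that reading only; the subtraction form, the name `lateral_inhibition`, the parameter `alpha` for the inhibition intensity, and the handling of border pixels (which simply use the neighbors that exist) are assumptions, and a single array stands in for both the adjusted response and the neighbor brightness.

```python
def lateral_inhibition(response, alpha=0.5):
    """Assumed form: out(x, y) = r(x, y) - alpha * mean(neighbors of (x, y))."""
    h, w = len(response), len(response[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Up to 8 neighbors; border pixels have fewer.
            neighbours = [
                response[b][a]
                for b in range(max(0, y - 1), min(h, y + 2))
                for a in range(max(0, x - 1), min(w, x + 2))
                if (a, b) != (x, y)
            ]
            out[y][x] = response[y][x] - alpha * sum(neighbours) / len(neighbours)
    return out
```

On a uniform patch every pixel is reduced by the same amount, while an isolated bright pixel keeps its full value and its surroundings are pushed down, so edges stand out against flat background.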

[0064] The preprocessed image is denoted by its frame index, and the preprocessed image frames are labeled to obtain the real (ground-truth) fire images;

[0065] A brain-like visual hierarchical convolutional sparse encoder is constructed, which simulates the V1-V4 layers of the human brain's visual cortex. The encoding network consists of four layers, corresponding to the functions of the V1-V4 visual cortex. The specific network parameters are as follows:

[0066]

[0067] After tensor transformation, the preprocessed image frame is converted into a feature map that can be computed by the hierarchical convolutional coding network and is used as the input to the V1 convolutional coding layer.

[0068] Each convolutional coding layer embeds a dynamic convolution adjustment unit, which uses deformable convolution to assign an independent two-dimensional offset to each sampling point of the convolutional kernel, correcting the sampling positions so that they accurately cover the fire target area.

[0069] The dynamic convolution adjustment unit works as follows. For a given convolution kernel size, each convolution kernel has a corresponding grid of sampling points, each with its own original relative coordinates. The offsets are obtained through adaptive learning of the convolutional kernel: each sampling point corresponds to a two-dimensional offset consisting of an offset along each of the two image axes. The offset is determined by deformable convolution from the following quantities:

[0070] the learned offset weights; the local feature map of the input feature map of the convolutional coding layer, whose extent is determined by the kernel size (for example, a 3x3 kernel gives a 3x3 local feature map); and a bias term.

[0071] For the current position on the input feature map of the coding layer, each original sampling location is shifted by its learned offset to give the final sampling position, so that the adjusted convolution region covers the fire area as fully as possible.
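The shifted sampling described above can be illustrated for a single output position of a 3x3 kernel. The sketch assumes bilinear interpolation for reading fractional positions, which is the usual choice in deformable convolution but is not stated in the text; the function names and the layout of `offsets` are hypothetical, and the offsets are passed in rather than learned.

```python
import math


def bilinear_sample(feature, y, x):
    """Read a 2D feature map at fractional (y, x); out-of-range taps contribute zero."""
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * feature[yy][xx]
    return value


def deformable_conv_at(feature, kernel, offsets, y, x):
    """3x3 deformable convolution response at position (y, x).

    offsets[i][j] is the (dy, dx) shift for the kernel tap at row i, column j;
    the final sampling position is original position + grid position + offset.
    """
    total = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            total += kernel[i][j] * bilinear_sample(
                feature, y + (i - 1) + dy, x + (j - 1) + dx
            )
    return total
```

With all offsets set to zero the function reduces to an ordinary 3x3 convolution at that position, which is a convenient sanity check.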

[0072] After processing with deformable convolution, the output feature value of each convolutional kernel is:

[0073] (4)

[0074] where the symbols denote the weight of the given convolution kernel of the V1 layer at the given sampling point, and the value of the feature map at the corresponding adjusted sampling position.

[0075] The output feature value of each convolution kernel is added to the baseline activity and passed through a nonlinear activation to obtain the sparse code corresponding to that convolutional kernel. The sum of the sparse codes of all convolutional kernels in the V1 convolutional coding layer is the output feature map of the V1 convolutional coding layer, which is also the input feature map of the V2 convolutional coding layer, as follows:

[0076] (6)

[0077] (7)

[0078] where the nonlinear activation function is the sigmoid function. In this embodiment, the number of convolution kernels in the V1 convolutional coding layer is 3.
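The two steps just described (add the baseline activity and apply the sigmoid, then sum over kernels) can be sketched as below. The function name `layer_output` and the data layout (one 2D response map per kernel, a scalar baseline) are assumptions for illustration.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def layer_output(kernel_responses, baseline):
    """Return (sparse codes per kernel, summed output map) for one coding layer.

    kernel_responses: list of M same-sized 2D maps, one per convolutional kernel.
    baseline: scalar baseline activity of the layer (assumed scalar here).
    """
    codes = [
        [[sigmoid(v + baseline) for v in row] for row in resp]
        for resp in kernel_responses
    ]
    h, w = len(codes[0]), len(codes[0][0])
    # Output of the layer = element-wise sum of the M sparse codes.
    summed = [[sum(code[y][x] for code in codes) for x in range(w)] for y in range(h)]
    return codes, summed
```

The summed map is what the next coding layer would receive as its input feature map.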

[0079] The output feature map of the V1 convolutional coding layer is used as the input feature map of the V2 convolutional coding layer.

[0080] In the hierarchical convolutional coding network, the output feature map of the k-th convolutional coding layer is used as the input feature map of the (k+1)-th coding layer. It satisfies the following form:

[0081] (8)

[0082] where the symbols denote the output feature value of a given convolutional kernel of the k-th layer and the baseline activity of the k-th layer of the hierarchical convolutional coding network; M is the number of convolutional kernels.

[0083] The multi-scale deconvolutional feature decoder is a deconvolutional decoding network that includes spatial-channel dual-gating modules. These modules, namely the channel gating module and the spatial gating module, are used while upsampling and reconstructing the feature maps obtained from the hierarchical convolutional coding network. The channel gating module calculates the mutual information of each feature channel in the feature map, filters feature channels strongly correlated with fire, and generates the channel gating weights:

[0084] (9)

[0085] where the symbols denote, in order: the input feature map of the given layer of the decoding network; the global average pooling operation; the two channel gating parameters of that layer; and the sigmoid function.
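From the quantities listed above (global average pooling, two gating parameters, a sigmoid), a channel gate can be sketched as a sigmoid applied to an affine map of the pooled channel means. That composition, the name `channel_gate`, and the shapes of `weight` and `bias` are assumptions; the sketch does not compute mutual information explicitly.

```python
import math


def channel_gate(feature, weight, bias):
    """Assumed form: g = sigmoid(weight @ GAP(feature) + bias).

    feature: list of C channel maps (each a 2D list).
    weight: C x C matrix; bias: length-C vector. Returns C gate values in (0, 1).
    """
    # Global average pooling: one mean per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    logits = [
        sum(w * p for w, p in zip(w_row, pooled)) + b
        for w_row, b in zip(weight, bias)
    ]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]
```

A channel with a strong average response receives a gate close to 1, while a silent channel stays at the neutral value set by its bias.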

[0086] The spatial gating module calculates the spatial information entropy of the feature map, locates fire candidate regions, and generates the spatial gating weights:

[0087] (10)

[0088] where the two parameters are the spatial gating parameters of the given layer of the decoding network, and the remaining term is the spatial information entropy calculated from the feature map. The input feature map of the given layer of the decoding network is weighted by the channel gating weights and the spatial gating weights to yield a weighted feature map:

[0089] (11)
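The weighting step can be sketched as an element-wise product of the feature map with both gates. The text does not define the spatial information entropy, so the sketch assumes a Shannon entropy over the normalized activation magnitudes across channels at each pixel, and a spatial gate of the form sigmoid(w_s * entropy + b_s); these definitions and all names below are hypothetical.

```python
import math


def spatial_entropy(feature):
    """Assumed definition: per-pixel Shannon entropy of |activations| across channels."""
    c, h, w = len(feature), len(feature[0]), len(feature[0][0])
    ent = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            mags = [abs(feature[k][y][x]) for k in range(c)]
            total = sum(mags)
            if total > 0:
                ent[y][x] = -sum((m / total) * math.log(m / total) for m in mags if m > 0)
    return ent


def apply_gates(feature, channel_gates, w_s, b_s):
    """Weighted map: out[k][y][x] = channel_gates[k] * spatial_gate[y][x] * feature[k][y][x]."""
    ent = spatial_entropy(feature)
    s_gate = [[1.0 / (1.0 + math.exp(-(w_s * e + b_s))) for e in row] for row in ent]
    return [
        [[channel_gates[k] * s_gate[y][x] * v for x, v in enumerate(row)]
         for y, row in enumerate(ch)]
        for k, ch in enumerate(feature)
    ]
```

The output has the same shape as the input, so it can be passed directly to the deconvolution of the same decoder layer.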

[0090] A deconvolution operation is applied to the weighted feature map to obtain the output feature map of the layer:

[0091] (12)

[0092] where the symbols denote the deconvolution kernel of the given layer of the decoding network and the deconvolution operation. The output feature map of each layer of the decoding network is used as the input feature map of the next layer, and features are recovered by progressive upsampling through successive deconvolution operations. The decoding network parameters are designed as follows:

[0093]
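To make the upsampling behaviour of a decoding layer concrete, the sketch below implements a single-channel transposed convolution without padding. The summary gives a 2x2 deconvolution kernel for the first three decoding layers; the stride is not stated, so the stride of 2 used in the example is an assumption, as are the function and parameter names.

```python
def transposed_conv2d(feature, kernel, stride):
    """Single-channel transposed convolution (deconvolution) with no padding.

    Each input value is multiplied by the kernel and accumulated into the
    output at a position scaled by the stride.
    """
    h, w = len(feature), len(feature[0])
    k = len(kernel)
    out_h, out_w = (h - 1) * stride + k, (w - 1) * stride + k
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(h):
        for x in range(w):
            for i in range(k):
                for j in range(k):
                    out[y * stride + i][x * stride + j] += feature[y][x] * kernel[i][j]
    return out


# A 2x2 kernel with stride 2 doubles the spatial resolution: 2x2 -> 4x4.
upsampled = transposed_conv2d([[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]], 2)
```

With a 2x2 kernel and stride 2 the kernel footprints do not overlap, so three such layers would enlarge a feature map by a factor of 8 per side under this assumption.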

[0094] A composite loss function with multi-feature fusion is constructed to perform end-to-end optimization training of the brain-like visual deconvolutional encoding and decoding model. Group sparsity regularization is included to improve the model's interpretability and generalization ability. The composite loss function includes a semantic loss, a dynamic loss, and a texture loss, and is expressed as follows:

[0095] (13)

[0096] where the three weighting coefficients, one for each loss, satisfy a constraint used to balance the contributions of the three core losses; the remaining terms are the sparse regularization coefficient and the sparse code corresponding to a given convolutional kernel of a given convolutional coding layer.
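A weighted sum of the three losses plus a group sparsity penalty could be combined as in the sketch below. The exact form of formula (13) is not recoverable from the text, so several points are assumptions: the weights are taken to sum to 1, the group penalty is the L2 norm of each kernel's sparse code summed over layers and kernels, and the default weights and `beta` are placeholder values, not values from the patent.

```python
import math


def composite_loss(l_sem, l_dyn, l_tex, sparse_codes,
                   weights=(0.5, 0.3, 0.2), beta=1e-3):
    """Assumed form: w1*L_sem + w2*L_dyn + w3*L_tex + beta * sum of group norms.

    sparse_codes: list of layers, each a list of flattened per-kernel codes.
    """
    w1, w2, w3 = weights
    # Group sparsity (assumed L2 norm per kernel code, summed over layers and kernels).
    group_penalty = sum(
        math.sqrt(sum(v * v for v in code))
        for layer in sparse_codes
        for code in layer
    )
    return w1 * l_sem + w2 * l_dyn + w3 * l_tex + beta * group_penalty
```

Treating each kernel's code as one group means the penalty tends to switch off whole kernels rather than individual activations, which matches the interpretability motivation given for the regularizer.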

[0097] The semantic loss is calculated using the cross-entropy loss method, specifically:

[0098] (14)

[0099] where the symbols denote, in order: the height and width of the image; the true label of a pixel of the s-th frame image (fire pixels are labeled 1 and background pixels 0); and the candidate fire event probability feature map output by the fourth layer of the decoding network.
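A per-pixel binary cross-entropy consistent with the quantities above (image height and width, 0/1 labels, predicted probabilities) can be sketched as follows. Averaging over all pixels of one frame and clipping probabilities with `eps` are assumptions of this sketch, as is the function name.

```python
import math


def semantic_loss(labels, probs, eps=1e-7):
    """Mean binary cross-entropy over an H x W frame.

    labels: 2D list of 0/1 ground-truth values (1 = fire, 0 = background).
    probs: 2D list of predicted fire probabilities in [0, 1].
    """
    h, w = len(labels), len(labels[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            p = min(max(probs[y][x], eps), 1 - eps)  # keep log() finite
            total -= labels[y][x] * math.log(p) + (1 - labels[y][x]) * math.log(1 - p)
    return total / (h * w)
```

An uninformative prediction of 0.5 everywhere gives a loss of ln 2 per pixel, and the loss approaches zero as the probability map matches the labels.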

[0100] The dynamic loss is calculated using the optical flow gradient loss method, and the expression is as follows:

[0101] (15)

[0102] where the symbols denote, in order: the total number of frames in the training video; the size of the consecutive-frame window, which is a set value; the operator for calculating the optical flow gradient; and the real fire images of the s-th frame and of the later frame it is compared with.

[0103] Texture loss is calculated using the structural similarity loss method, as shown in the following formula:

[0104] (16)

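[0105] where the structural similarity index (SSIM) is calculated as follows:

[0106] $\mathrm{SSIM}(x, y) = \dfrac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$ (17)

[0107] where $\mu_x$ and $\mu_y$ are the means of the real fire image $x$ and the predicted feature map $y$, $\sigma_x^2$ and $\sigma_y^2$ are their variances, $\sigma_{xy}$ is their covariance, and $C_1$ and $C_2$ are stabilizing constants.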
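The structural similarity index built from means, variances, and covariance can be computed globally over two flattened images as in the sketch below. The constants `c1` and `c2` use the common defaults for data in [0, 1], which are assumptions because the values are not legible in the text; taking the texture loss as 1 minus SSIM is likewise an assumption.

```python
def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Global (single-window) SSIM between two equal-length flattened images."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def texture_loss(real, predicted):
    """Assumed form of the texture loss: 1 - SSIM(real, predicted)."""
    return 1.0 - ssim_global(real, predicted)
```

Practical implementations usually average SSIM over local windows rather than computing it once per image; the global version is used here only to keep the example short.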

[0108] The model is optimized with the Adam optimizer; the learning rate, batch size, and number of iterations can be set according to the specific circumstances.

[0109] The brain-like visual deconvolutional encoding and decoding model outputs the candidate fire event probability feature map.

[0110] Fire event determination and early warning: a probability threshold is set. If the probability at a point in the candidate fire event probability feature map output by the decoder satisfies the threshold condition, the point is determined to be a candidate fire pixel. If the number of candidate fire pixels within a connected region of the probability feature map satisfies the set count condition, that connected region is defined as a candidate region for fire events and its area is calculated. A location that passes time series continuity verification is the final confirmed fire event. The time series continuity verification process is as follows:

[0111] A set number of consecutive frames is examined, and for each frame the proportion of the entire input feature map of the neuromorphic visual deconvolutional encoding and decoding model that is occupied by the fire candidate region is calculated. The proportion for the s-th frame is calculated as follows:

[0112] $r_s = A_s / A_{\text{in}}$ (18)

[0113] where $A_{\text{in}}$ is the area of the input feature map;

[0114] If, over the consecutive frames, the proportion is not less than the area ratio threshold and the area growth rate stays within the set range, a fire event is confirmed and an early warning is triggered;

[0115] Here $A_s$ is the area of the fire event candidate region in the s-th frame, and $A_{s-1}$ is the area of the fire event candidate region in frame s-1.
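The time series continuity check can be sketched as a test over the most recent window of per-frame candidate areas. The area ratio threshold of 0.35 and the window of ten consecutive frames come from the summary above; the growth rate formula (current area minus previous area, divided by previous area), the bounds in `growth_range`, and the function name are assumptions for illustration.

```python
def confirm_fire(areas, frame_area, ratio_threshold=0.35, window=10,
                 growth_range=(0.0, 0.5)):
    """Return True when the last `window` frames pass both continuity conditions.

    areas: candidate-region area per frame, oldest first.
    frame_area: area of the input feature map (must be positive).
    """
    if len(areas) < window:
        return False
    recent = areas[-window:]
    # Condition 1: area ratio not less than the threshold in every frame.
    if any(a / frame_area < ratio_threshold for a in recent):
        return False
    # Condition 2: frame-to-frame growth rate inside the allowed range.
    low, high = growth_range
    for prev, cur in zip(recent, recent[1:]):
        growth = (cur - prev) / prev  # prev > 0 is guaranteed by condition 1
        if not low <= growth <= high:
            return False
    return True
```

Requiring both a sustained area ratio and a bounded growth rate filters out brief flashes and static fire-colored objects that pass the probability threshold in isolated frames.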

[0116] Matters not described in this invention follow the existing technology.

Claims

1. A fire event detection method based on a neuromorphic visual deconvolutional encoding and decoding model, characterized in that the detection method includes the following steps:
performing neuromorphic visual preprocessing, frame by frame, on the acquired fire video images;
constructing a neuromorphic visual deconvolutional encoding and decoding model, which includes a neuromorphic visual hierarchical convolutional sparse encoder and a multi-scale deconvolutional feature decoder, wherein the neuromorphic visual hierarchical convolutional sparse encoder includes four convolutional coding layers that simulate areas V1-V4 of the human visual cortex and progressively extracts features, from low-level to high-level, from the input single-frame image; in each convolutional coding layer, deformable convolution assigns an independent two-dimensional offset to each sampling point of the convolution kernel, correcting the sampling positions so that they accurately cover the fire target area, and yields the output features of each convolution kernel; these output features are then combined with the baseline to obtain the sparse codes corresponding to all convolution kernels of the convolutional coding layer; the sum of the sparse codes corresponding to all convolution kernels of the convolutional coding layer is used as the output of the next convolutional coding layer; and the multi-scale deconvolutional feature decoder includes a deconvolutional decoding network with a spatial-channel dual-gating mechanism that upsamples and reconstructs the encoded features;
training the neuromorphic visual deconvolutional encoding and decoding model end to end with a composite loss function to obtain a candidate fire event probability feature map; and
applying a probability threshold to the candidate fire event probability feature map and selecting the locations that pass temporal continuity verification as the final fire events;
wherein the composite loss function comprises a semantic loss, a dynamic loss, and a texture loss and is expressed by formula (13), in which the three weighting coefficients satisfy a constraint that balances the contributions of the three core losses, and in which a sparse regularization coefficient is applied to the sparse code corresponding to each convolution kernel of each convolutional coding layer; the semantic loss is calculated using cross-entropy loss, the dynamic loss using optical flow gradient loss, and the texture loss using structural similarity loss.
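As an illustration of how the composite loss in claim 1 might be assembled, the following minimal Python sketch combines three precomputed loss values with a sparsity penalty. The symbols of formula (13) did not survive in this text, so the function name, the default weights, the sum-to-one constraint on the weights, and the use of an L1 norm for the sparse regularization term are all assumptions made for the sketch, not the patent's stated formula.

```python
def composite_loss(l_sem, l_dyn, l_tex, sparse_codes,
                   weights=(0.5, 0.3, 0.2), mu=0.01):
    """Weighted sum of three losses plus a sparsity penalty (assumed form).

    l_sem, l_dyn, l_tex: precomputed semantic, dynamic and texture losses.
    sparse_codes[l][k]: flat list of values of the sparse code for
    kernel k of coding layer l.
    weights, mu: hypothetical values chosen only for illustration.
    """
    w_sem, w_dyn, w_tex = weights
    # Assumption: the three weights are non-negative and sum to 1.
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Assumption: the sparse regularization term is an L1 norm summed
    # over every layer and every kernel.
    l1 = sum(abs(v) for layer in sparse_codes for code in layer for v in code)
    return w_sem * l_sem + w_dyn * l_dyn + w_tex * l_tex + mu * l1
```

In a real training loop the three loss values would come from cross-entropy, optical flow gradient, and structural similarity computations, which are outside the scope of this sketch.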

2. A fire event detection system based on a neuromorphic visual deconvolutional encoding and decoding model, characterized in that the system includes:
a video frame acquisition module, used to acquire video image data of the fire scene;
a dynamic response adjustment module, used to adjust, through a nonlinear function, the pixel response of the images obtained by the video frame acquisition module, thereby improving the identification of fire features under low-light conditions;
a lateral-inhibition noise reduction module, used to calculate the grayscale difference of the pixel response output by the dynamic response adjustment module, removing environmental noise by suppressing redundant information while preserving the edge details of flame and smoke;
a neuromorphic visual hierarchical convolutional sparse encoder, which includes four convolutional coding layers corresponding to areas V1-V4 of the human visual cortex, wherein the image frames output by the lateral-inhibition noise reduction module are the input of the V1 convolutional coding layer; in each convolutional coding layer, deformable convolution assigns an independent two-dimensional offset to each sampling point of the convolution kernel, correcting the sampling positions so that they accurately cover the fire target area, and yields the output features of each convolution kernel; these output features are then combined with the baseline to obtain the sparse codes corresponding to all convolution kernels of the convolutional coding layer; and the sum of the sparse codes corresponding to all convolution kernels of the convolutional coding layer is used as the output of the next convolutional coding layer;
a multi-scale deconvolutional feature decoder, which includes a channel gating module and a spatial gating module, wherein the output of the last convolutional coding layer of the neuromorphic visual hierarchical convolutional sparse encoder is the input of the multi-scale deconvolutional feature decoder; the channel gating module and the spatial gating module calculate the channel gating weight and the spatial gating weight of the current layer, respectively; the input of the corresponding decoder layer is weighted by the channel gating weight and the spatial gating weight; a deconvolution operation then yields the input of the next decoder layer; and the output of the last decoder layer is used as the candidate fire event probability feature map; and
a fire event determination and early warning module, used to determine candidate fire pixels in the candidate fire event probability feature map according to a set probability threshold, determine candidate fire event regions based on connected components, and calculate the area of each candidate fire event region; if, over consecutive frames, the ratio of the area of the candidate fire event region to the area of the input feature map of the corresponding frame of the neuromorphic visual deconvolutional encoding and decoding model is not less than the area ratio threshold, and the area growth rate is within a set range, the candidate fire event region is confirmed as a fire event and an early warning is triggered.
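The temporal check performed by the fire event determination and early warning module can be sketched as follows. This is a minimal Python illustration, not the patented implementation: the function name, the definition of growth rate as relative change between adjacent frames, and the default growth range are assumptions, while the 0.35 area ratio threshold and the ten-frame window are the values given in claim 5.

```python
def confirm_fire(areas, frame_area, ratio_threshold=0.35,
                 n_frames=10, growth_range=(0.0, 0.5)):
    """Return True if the most recent n_frames pass both checks.

    areas: candidate-region area (in pixels) for each frame, oldest first.
    frame_area: area (in pixels) of the model's input feature map.
    growth_range: hypothetical (min, max) bounds on relative area growth.
    """
    if len(areas) < n_frames:
        return False
    recent = areas[-n_frames:]
    # Check 1: area ratio must reach the threshold in every frame.
    if any(a / frame_area < ratio_threshold for a in recent):
        return False
    # Check 2 (assumed definition): relative growth between adjacent
    # frames must stay within the allowed range.
    lo, hi = growth_range
    for prev, cur in zip(recent, recent[1:]):
        if prev == 0:
            return False
        growth = (cur - prev) / prev
        if not lo <= growth <= hi:
            return False
    return True
```

A caller would append the candidate-region area of each new frame to `areas` and raise the early warning the first time `confirm_fire` returns `True`.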

3. The system of claim 2, wherein the dynamic response adjustment module is calculated by formula (1), in which the inputs are the original brightness value of the pixel at the given coordinates, the scene baseline brightness (the average brightness of all pixels in the entire frame), and the response sensitivity coefficient, and the output is the pixel response after dynamic response adjustment.

4. The system of claim 2, wherein the lateral-inhibition noise reduction module is calculated by formula (2), in which the terms are the inhibition strength, the 8-neighborhood pixel set of the pixel being processed, and the original luminance value of each neighborhood pixel (a, b).
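Since formula (2) is not legible in this text, the following Python sketch shows only one common form of lateral inhibition over an 8-neighborhood: each pixel's adjusted response is reduced by the inhibition strength times the mean original luminance of its neighbors. The function name, the use of a mean rather than a sum, the border handling, and the default `alpha` are assumptions for illustration.

```python
def lateral_inhibition(resp, lum, alpha=0.2):
    """Subtract alpha * mean 8-neighborhood luminance from each response.

    resp: 2D list of pixel responses after dynamic response adjustment.
    lum:  2D list of original luminance values, same shape as resp.
    alpha: inhibition strength (hypothetical default).
    """
    h, w = len(resp), len(resp[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighbors (a, b) of (y, x); border pixels have fewer than 8.
            neigh = [lum[a][b]
                     for a in range(max(0, y - 1), min(h, y + 2))
                     for b in range(max(0, x - 1), min(w, x + 2))
                     if (a, b) != (y, x)]
            mean = sum(neigh) / len(neigh) if neigh else 0.0
            out[y][x] = resp[y][x] - alpha * mean
    return out
```

Under this assumed form, a uniform region is lowered evenly, while an isolated bright pixel keeps its full response, which is the edge-preserving behavior the claim describes.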

5. The system of claim 2, wherein the convolution kernel size of the neuromorphic visual hierarchical convolutional sparse encoder is 3×3, and the number of convolution kernels in the convolutional coding layer is 3; the deconvolution kernel size of the first three decoding layers of the multi-scale deconvolutional feature decoder is 2×2, and the deconvolution kernel size of the last decoding layer is 1×1; the image frames processed by the lateral-inhibition noise reduction module are annotated to obtain ground-truth fire images; the area ratio threshold is 0.35; and consecutive frames means ten consecutive frames.

6. The system of claim 2, wherein the deformable convolution calculates the offset from the following quantities: the local feature map, which is taken from the input feature map of the convolutional coding layer according to the size of the convolution kernel; the learned offset weights; and a bias term.
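The offset calculation in claim 6 can be illustrated with a small Python sketch. It assumes the offsets are a linear function of the flattened local feature map (weights times patch plus bias), which matches the quantities the claim lists but is not confirmed by a legible formula; the function names, the 3×3 sampling grid, and the bilinear sampling with border clamping are choices made for the sketch.

```python
import math

def predict_offsets(patch, weights, bias):
    """Assumed linear offset head: one (dy, dx) pair per sampling point.

    patch: 2D local feature map (3x3 here).
    weights: 2*K rows, each with one weight per patch value.
    bias: 2*K bias terms.
    """
    flat = [v for row in patch for v in row]
    out = [sum(w * v for w, v in zip(w_row, flat)) + b
           for w_row, b in zip(weights, bias)]
    return [(out[i], out[i + 1]) for i in range(0, len(out), 2)]

def bilinear(img, y, x):
    """Sample img at fractional (y, x), clamping to the image border."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def deformable_response(img, cy, cx, kernel, offsets):
    """3x3 kernel response at (cy, cx) with per-point offsets applied."""
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    flat_k = [v for row in kernel for v in row]
    return sum(k * bilinear(img, cy + gy + oy, cx + gx + ox)
               for k, (gy, gx), (oy, ox) in zip(flat_k, grid, offsets))
```

With all offsets equal to zero, `deformable_response` reduces to an ordinary 3×3 correlation, so the learned offsets are the only thing that moves the sampling points toward the fire target area.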

7. The system according to claim 2, characterized in that:
the channel gating module is used to calculate the mutual information of each feature channel in the feature map, filter the feature channels that are strongly correlated with fire, and generate the channel gating weights according to formula (8), whose terms are the input feature map of the given layer of the multi-scale deconvolutional feature decoder, the global average pooling operation, the channel gating parameters of that layer, and the sigmoid function;
the spatial gating module is used to calculate the spatial information entropy of the feature map, locate fire candidate regions, and generate the spatial gating weights according to formula (9), whose terms are the spatial gating parameters of that layer and the spatial information entropy calculated from the feature map;
the input feature map of the given layer of the multi-scale deconvolutional feature decoder is weighted by the channel gating weight and the spatial gating weight to yield a weighted feature map, according to formula (10); and
the weighted feature map is passed through the deconvolution operation to obtain the output feature map of that layer, according to formula (11), whose terms are the deconvolution kernel of that layer and the deconvolution operation.
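The gating pipeline of claim 7 can be sketched in plain Python as below. Formulas (8) to (11) are not legible in this text, so this is only one plausible reading: a per-channel scale and bias on the global average pool for the channel gate, a per-pixel entropy of the channel-wise softmax for the spatial information entropy, elementwise multiplication for the weighting, and a stride-2 transposed convolution for the 2×2 deconvolution. All function and parameter names are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def channel_gate(feat, w, b):
    """One gate per channel from its global average pool (assumed form)."""
    gates = []
    for c, ch in enumerate(feat):
        gap = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gates.append(sigmoid(w[c] * gap + b[c]))
    return gates

def spatial_entropy(feat):
    """Per-pixel entropy of the softmax over channels (assumed definition)."""
    h, wd = len(feat[0]), len(feat[0][0])
    ent = [[0.0] * wd for _ in range(h)]
    for y in range(h):
        for x in range(wd):
            vals = [ch[y][x] for ch in feat]
            m = max(vals)
            exps = [math.exp(v - m) for v in vals]
            z = sum(exps)
            ent[y][x] = -sum((e / z) * math.log(e / z) for e in exps if e > 0)
    return ent

def spatial_gate(feat, w, b):
    """One gate per pixel from the spatial entropy map."""
    return [[sigmoid(w * e + b) for e in row] for row in spatial_entropy(feat)]

def gated_features(feat, gc, gs):
    """Weight every value by its channel gate and its spatial gate."""
    return [[[v * gc[c] * gs[y][x] for x, v in enumerate(row)]
             for y, row in enumerate(ch)] for c, ch in enumerate(feat)]

def deconv2x2(ch, kernel):
    """Stride-2 transposed convolution of one channel with a 2x2 kernel."""
    h, w = len(ch), len(ch[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            for ky in range(2):
                for kx in range(2):
                    out[2 * y + ky][2 * x + kx] += ch[y][x] * kernel[ky][kx]
    return out
```

Here `feat` is a list of channels, each a 2D list; a full decoder layer would also sum the transposed convolutions over input channels, which is omitted to keep the sketch short.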

8. The system according to claim 2, characterized in that the process of determining candidate fire event regions based on connected components is as follows: a probability threshold is set; if the probability of a point in the candidate fire event probability feature map output by the decoder satisfies the probability threshold, the point is determined to be a candidate fire pixel; and if the number of candidate fire pixels within a connected region of the candidate fire event probability feature map satisfies the pixel-count condition, that connected region is defined as a candidate fire event region.
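The thresholding and connected-component step of claim 8 can be sketched as a breadth-first flood fill. The comparison operators and the pixel-count bound are not legible in this text, so the sketch assumes "probability at least `p_thresh`" and "at least `min_pixels` candidate pixels"; the function name, both default values, and the use of 8-connectivity are likewise assumptions.

```python
from collections import deque

def candidate_regions(prob_map, p_thresh=0.5, min_pixels=3):
    """Return connected regions of candidate fire pixels.

    prob_map: 2D list of per-pixel fire probabilities.
    p_thresh, min_pixels: hypothetical thresholds for illustration.
    Each region is a list of (row, col) tuples.
    """
    h, w = len(prob_map), len(prob_map[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx] or prob_map[sy][sx] < p_thresh:
                continue
            # Flood-fill one 8-connected region starting at (sy, sx).
            queue, region = deque([(sy, sx)]), []
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx]
                                and prob_map[ny][nx] >= p_thresh):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Keep only regions with enough candidate pixels.
            if len(region) >= min_pixels:
                regions.append(region)
    return regions
```

The length of each returned region is its area in pixels, which is the quantity the determination and early warning module compares against the area ratio threshold.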