A method, equipment, and storage medium for detecting the risk of exterior wall facade detachment.
By combining a multi-stream 3D-ResNet spatiotemporal feature extraction module, a feature enhancement and multi-head spatiotemporal attention module, and a BiLSTM temporal risk modeling module, the method addresses the high false negative rate in detecting exterior facade detachment hazards and achieves high-precision risk detection and early warning.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: CCCC SHEC DONGMENG ENG CO LTD
- Filing Date: 2026-03-25
- Publication Date: 2026-06-30
AI Technical Summary
Existing technologies for detecting the risk of exterior wall detachment have a high rate of missed detections. In particular, equipment-based inspection relies on a single data source or on shallow splicing of multi-source data, so the information is incomplete and the multi-source data cannot be coordinated and used effectively.
A multi-stream 3D-ResNet spatiotemporal feature extraction module extracts independent spatiotemporal features from each modality of the multimodal input data. Combined with a feature enhancement and multi-head spatiotemporal attention module and a BiLSTM temporal risk modeling module, it achieves deep fusion of the different modalities and risk detection.
By deeply integrating multi-source data and applying time-series modeling, the method reduces the rate of missed hazard detections, improves detection accuracy and the comprehensiveness of risk warnings, and can identify and issue warnings for minor potential hazards.
Figure CN122309971A_ABST (abstract drawing)
Abstract
Description
Technical Field
[0001] This application relates to the field of building structure safety testing technology, and in particular to a method, equipment, and storage medium for detecting the risk of exterior wall facade detachment.
Background Technology
[0002] Common methods for detecting the risk of exterior wall detachment include manual inspection and equipment inspection. Manual inspection is the most widely used traditional technique. Inspectors approach the exterior wall using suspended platforms, scaffolding, or other high-altitude equipment. They first inspect the surface visually to identify obvious cracks, peeling, and hollow areas. They then verify suspected areas by tapping with a small hammer, measuring with a tape measure, and touching. Finally, they record the location, type, and level of each hazard on paper or in a spreadsheet to produce an inspection report. This method is mainly used for spot checks of small buildings or parts of exterior walls, and is common in minor repairs in older residential areas and partial building renovations. Equipment inspection, by contrast, uses specific equipment to capture certain features of the exterior wall. However, it either relies on a single data source (such as only RGB images or only radar data) or only superficially stitches multi-source data together (such as simply stacking infrared and visible-light data as channels). As a result, the data information is incomplete, the synergistic value of multi-source data is not realized, and the rate of missed hazard detection is high.
Summary of the Invention
[0003] This application aims to at least partially address one of the aforementioned technical problems in the prior art. To this end, embodiments of this application provide a method, device, and storage medium for detecting the risk of exterior wall facade detachment, which can effectively reduce the false negative rate of potential exterior wall facade detachment hazards.
[0004] In a first aspect, this application provides a method for detecting the risk of exterior wall facade detachment, characterized by comprising the following steps: S1: collect multimodal input data of the target building, the multimodal input data including dynamic time-series data and static attribute data; S2: use the multi-stream 3D-ResNet spatiotemporal feature extraction module to extract, independently for each modality, the spatiotemporal features of the dynamic time-series data in the multimodal input data, obtaining the dynamic spatiotemporal features of each modality; S3: fuse the extracted dynamic spatiotemporal features of all modalities to obtain preliminary fused features; S4: input the preliminary fused features into the feature enhancement and multi-head spatiotemporal attention module, which dynamically weights the spatiotemporal dimensions of the fused features through a multi-head attention mechanism to enhance risk-related features and suppress interfering information, thereby obtaining attention-optimized features; S5: input the attention-optimized features into the BiLSTM temporal risk modeling module to capture the evolution of hidden hazards over time, and combine the result with the static attribute data to generate the final risk detection result.
[0005] Based on the embodiments of the first aspect of this application, the dynamic time-series data are visual data, structural data, and environmental data; the static attribute data are building attribute data.
[0006] Based on the embodiments of the first aspect of this application, the multi-stream 3D-ResNet spatiotemporal feature extraction module is a network based on 3D convolution and residual connections; In step S2, spatiotemporal features are extracted for the dynamic temporal data of each modality through an independent 3D-ResNet sub-network. The 3D-ResNet sub-network includes a Conv3D block and an Identity block connected in sequence, and there is a residual connection between the Conv3D block and the Identity block.
[0007] Based on the embodiments of the first aspect of this application, a feature enhancement step is further included before step S4: The preliminary fused features are subjected to dimensionality reduction and global pooling operations to generate feature weight vectors; The feature weight vector is weighted and fused with the dimensionality-reduced features to obtain the enhanced features; In step S4, the enhanced features are input into the feature enhancement and multi-head spatiotemporal attention module for processing.
[0008] Based on an embodiment of the first aspect of this application, the feature enhancement and multi-head spatiotemporal attention module performs the following operations: The enhanced features are projected into query vectors, key vectors, and value vectors, respectively. The attention weights of the query vector and the key vector in the spatiotemporal dimension are calculated using multiple parallel attention heads. The value vector is weighted and summed based on the attention weights to obtain the output of each attention head; The outputs of all attention heads are concatenated and linearly transformed, and then added to the enhanced features through residual connections to output the attention-optimized features.
[0009] Based on the embodiments of the first aspect of this application, step S5 includes: the attention-optimized features are input into the BiLSTM temporal risk modeling module time step by time step, and forward and backward temporal dependency features are extracted by a forward LSTM and a backward LSTM, respectively. The final hidden states output by the forward LSTM and the backward LSTM are concatenated to obtain bidirectional temporal features.
[0010] Based on the embodiments of the first aspect of this application, step S5 further includes: the bidirectional temporal features are concatenated with the static features extracted from the static attribute data to obtain the final fused features; based on the final fused features, the output layer generates the risk detection result.
[0011] In optional or preferred embodiments, the risk detection result includes at least one of the following: The risk level of the target building or exterior wall area; And, the pixel-level risk mask for the risk areas on the exterior wall facade.
[0012] In a second aspect, this application provides an exterior wall facade detachment risk detection device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program is configured to implement the steps of any of the above-described exterior wall facade detachment risk detection methods.
[0013] In a third aspect, this application provides a storage medium, which is a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the steps of any of the above-described exterior wall facade detachment risk detection methods.
[0014] Based on the above technical solutions, the embodiments of this application have at least the following beneficial effects: this application designs independent feature extraction sub-networks for visual, structural, and environmental data to extract deep features from each modality. The BiLSTM temporal risk modeling module avoids risk misjudgment caused by relying solely on single-dimensional features, overcomes the information limitations of a single data source, and reduces the rate of missed hazard detection.
Attached Figure Description
[0015] The present application is further described below with reference to the accompanying drawings and embodiments. Figure 1 is a flowchart of this application; Figure 2 is a flowchart of the multi-stream 3D-ResNet spatiotemporal feature extraction module in an embodiment of this application; Figure 3 is a flowchart of the feature enhancement and multi-head spatiotemporal attention module in this application.
Detailed Implementation
[0016] It should be understood that the specific embodiments described herein are merely illustrative of the technical solutions of this application and are not intended to limit this application.
[0017] It should be noted that the executing entity in this embodiment can be a computing service device with data processing, network communication, and program execution functions, such as a tablet computer, personal computer, or mobile phone, or an electronic device or wall recognition device capable of performing the above functions. The following description uses a wall recognition device as an example to illustrate this embodiment and the subsequent embodiments.
[0018] Common methods for detecting the risk of exterior wall detachment include manual inspection and equipment inspection. Manual inspection is the most widely used traditional technique. Inspectors approach the exterior wall using suspended platforms, scaffolding, or other high-altitude equipment. They first inspect the surface visually to identify obvious cracks, peeling, and hollow areas. They then verify suspected areas by tapping with a small hammer, measuring with a tape measure, and touching. Finally, they record the location, type, and level of each hazard on paper or in a spreadsheet to produce an inspection report. This method is mainly used for spot checks of small buildings or parts of exterior walls, and is common in minor repairs in older residential areas and partial building renovations. Equipment inspection, by contrast, uses specific equipment to capture certain features of the exterior wall. However, it either relies on a single data source (such as only RGB images or only radar data) or only superficially stitches multi-source data together (such as simply stacking infrared and visible-light data as channels). As a result, the data information is incomplete, the synergistic value of multi-source data is not realized, and the rate of missed hazard detection is high.
[0019] Referring to Figures 1 to 3, this application provides a method for detecting the risk of exterior wall facade detachment, including the following steps: S1: collect multimodal input data of the target building, the multimodal input data including dynamic time-series data and static attribute data; S2: use the multi-stream 3D-ResNet spatiotemporal feature extraction module to extract, independently for each modality, the spatiotemporal features of the dynamic time-series data in the multimodal input data, obtaining the dynamic spatiotemporal features of each modality; S3: fuse the extracted dynamic spatiotemporal features of all modalities to obtain preliminary fused features; S4: input the preliminary fused features into the feature enhancement and multi-head spatiotemporal attention module, which dynamically weights the spatiotemporal dimensions of the fused features through a multi-head attention mechanism to enhance risk-related features and suppress interfering information, thereby obtaining attention-optimized features; S5: input the attention-optimized features into the BiLSTM temporal risk modeling module to capture the evolution of hidden hazards over time, and combine the result with the static attribute data to generate the final risk detection result.
[0020] The dynamic time-series data includes visual data, structural data, and environmental data; the static attribute data is building attribute data.
[0021] The method for detecting the risk of exterior wall facade detachment in this application is implemented through an end-to-end deep learning model, which includes: a multi-stream 3D-ResNet spatiotemporal feature extraction module, used to perform steps S2 and S3; a feature enhancement and multi-head spatiotemporal attention module, used to perform step S4; and a BiLSTM temporal risk modeling module, used to perform step S5.
[0022] The detailed flowchart is shown in Figure 1. The top layer contains three dynamic time-series data input boxes (visual data, structural data, and environmental data) and one static attribute data input box (building attributes). The middle layer consists of the "Multi-stream 3D-ResNet Spatiotemporal Feature Extraction Module", the "Feature Enhancement and Multi-head Spatiotemporal Attention Module", and the "BiLSTM Temporal Risk Modeling Module", with arrows indicating the direction of data flow. The bottom layer contains two output boxes (risk level and pixel-level risk mask).
[0023] In some embodiments, the multi-stream 3D-ResNet spatiotemporal feature extraction module is a network based on 3D convolution and residual connections.
[0024] In step S2, spatiotemporal features are extracted for the dynamic temporal data of each modality using an independent 3D-ResNet sub-network. Specifically, to fully extract features from visual data, structural data, environmental data, and static building attribute data, and to avoid interference between different modalities, the multi-stream 3D-ResNet spatiotemporal feature extraction module adopts a multi-stream independent sub-network architecture. 3D-ResNet sub-networks are designed separately for the three types of dynamic data: visual, structural, and environmental, while a separate 3D convolution branch is designed for static building attribute data. Each 3D-ResNet sub-network achieves joint extraction of spatiotemporal features through 3D convolution operations, capturing both spatial features such as the location of wall cracks and the shape of hollow areas, and temporal features such as the expansion of cracks from small to large and the expansion of hollow areas over time, overcoming the limitation of traditional techniques that can only extract single-dimensional features.
[0025] The 3D-ResNet subnetwork includes Conv3D blocks and Identity blocks connected in sequence, and there is a residual connection between the Conv3D blocks and the Identity blocks.
[0026] By adding residual connections between the Conv3D block and the Identity block in the sub-network, the gradient vanishing problem during deep network training can be avoided, ensuring that long-term features in time series data do not degrade and improving the ability to capture early minor problems.
[0027] In the 3D convolutional feature extraction process, the multi-stream 3D-ResNet spatiotemporal feature extraction module transforms the spatiotemporal dimensions of the input data through 3D convolution:
$$F(t,m,n)=\mathrm{ReLU}\Big(\mathrm{BatchNorm}\Big(\sum_{i=0}^{K-1}\sum_{j=0}^{K-1}\sum_{l=0}^{K-1}W(i,j,l)\,X(t+i,\,m+j,\,n+l)+b\Big)\Big)\tag{1}$$
In the formula, X represents one type of input data tensor (such as visual data), indexed by time t and spatial position (m, n); F is the resulting feature map; K is the size of the 3D convolution kernel; W and b are the weights and bias of the 3D convolution kernel, respectively; the BatchNorm operation keeps the data distribution stable to avoid gradient explosion; and the ReLU activation function introduces nonlinearity. This formula transforms the spatiotemporal correlation information of the input data into high-dimensional feature vectors, thereby preserving the spatiotemporal patterns of potential risks.
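Equation (1) can be illustrated with a minimal pure-Python sketch for a single input channel and a single cubic kernel. The tensor layout `[t][m][n]`, the unpadded stride-1 convolution, and the use of one feature map's own statistics in BatchNorm are simplifying assumptions for illustration; the application does not specify padding, stride, or channel counts, and a practical implementation would use a deep-learning framework.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # indexed [t][m][n]: time, then two spatial axes


def conv3d_valid(x: Tensor3, w: Tensor3, b: float) -> Tensor3:
    """Stride-1, unpadded 3D convolution of a single-channel tensor.

    Like deep-learning frameworks, this computes cross-correlation
    (the kernel is not flipped), matching the summation in equation (1).
    """
    T, H, W = len(x), len(x[0]), len(x[0][0])
    K = len(w)  # cubic kernel of size K x K x K
    out: Tensor3 = []
    for t in range(T - K + 1):
        plane = []
        for m in range(H - K + 1):
            row = []
            for n in range(W - K + 1):
                s = b
                for i in range(K):
                    for j in range(K):
                        for l in range(K):
                            s += w[i][j][l] * x[t + i][m + j][n + l]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


def batch_norm(y: Tensor3, gamma: float = 1.0, beta: float = 0.0, eps: float = 1e-5) -> Tensor3:
    """Normalize with the statistics of this one feature map (training-mode style).

    Simplification: a real BatchNorm layer computes per-channel statistics over
    a whole mini-batch and keeps running averages for inference.
    """
    flat = [v for plane in y for row in plane for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    scale = gamma / math.sqrt(var + eps)
    return [[[(v - mean) * scale + beta for v in row] for row in plane] for plane in y]


def relu(y: Tensor3) -> Tensor3:
    return [[[max(0.0, v) for v in row] for row in plane] for plane in y]


def conv_bn_relu(x: Tensor3, w: Tensor3, b: float) -> Tensor3:
    """Equation (1): ReLU(BatchNorm(Conv3D(X) + b))."""
    return relu(batch_norm(conv3d_valid(x, w, b)))


# Toy input: a 4x4x4 "clip" whose value grows along time and both spatial axes.
x = [[[float(t + m + n) for n in range(4)] for m in range(4)] for t in range(4)]
w = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]  # 2x2x2 kernel of ones
features = conv_bn_relu(x, w, 0.0)  # shape 3x3x3
```

With a 2×2×2 kernel the 4×4×4 input shrinks to 3×3×3, and after BatchNorm every position below the mean is zeroed by ReLU.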
[0028] The residual connection between the Conv3D block and the Identity block of the sub-network is:
$$Y=\mathcal{F}(X)+\mathcal{S}(X)\tag{2}$$
In the formula, $\mathcal{F}(X)$ is the output feature of the 3D convolutional block, $\mathcal{S}(X)$ is the output of the shortcut path, and $Y$ is the final output feature of the residual block.
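The residual connection of equation (2) can be sketched by using a shape-preserving ("same"-padded) convolution so that the shortcut output can be added element-wise. Choosing the branch as a single Conv3D followed by ReLU, and the shortcut as the identity, are illustrative assumptions; the application only states that a residual connection links the Conv3D block and the Identity block. All function names are hypothetical.

```python
from typing import Callable, List

Tensor3 = List[List[List[float]]]  # [t][m][n]


def conv3d_same(x: Tensor3, w: Tensor3, b: float) -> Tensor3:
    """Stride-1 3D convolution with zero padding; output keeps the input shape (odd K)."""
    T, H, W = len(x), len(x[0]), len(x[0][0])
    K = len(w)
    p = K // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for m in range(H):
            for n in range(W):
                s = b
                for i in range(K):
                    for j in range(K):
                        for l in range(K):
                            tt, mm, nn = t + i - p, m + j - p, n + l - p
                            if 0 <= tt < T and 0 <= mm < H and 0 <= nn < W:
                                s += w[i][j][l] * x[tt][mm][nn]
                out[t][m][n] = s
    return out


def add(a: Tensor3, c: Tensor3) -> Tensor3:
    return [[[u + v for u, v in zip(ra, rc)] for ra, rc in zip(pa, pc)] for pa, pc in zip(a, c)]


def residual_block(x: Tensor3, branch: Callable[[Tensor3], Tensor3],
                   shortcut: Callable[[Tensor3], Tensor3] = lambda t: t) -> Tensor3:
    """Equation (2): Y = F(X) + S(X); S defaults to the identity shortcut."""
    return add(branch(x), shortcut(x))


def conv_relu_branch(w: Tensor3, b: float) -> Callable[[Tensor3], Tensor3]:
    """Hypothetical residual branch F: one same-padded Conv3D followed by ReLU."""
    def f(x: Tensor3) -> Tensor3:
        y = conv3d_same(x, w, b)
        return [[[max(0.0, v) for v in row] for row in plane] for plane in y]
    return f


x = [[[float(t + m + n) for n in range(3)] for m in range(3)] for t in range(3)]
zero_kernel = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
y_identity = residual_block(x, conv_relu_branch(zero_kernel, 0.0))  # F(X) = 0, so Y == X

delta_kernel = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
delta_kernel[1][1][1] = 1.0  # centre tap only: the convolution copies its input
y_doubled = residual_block(x, conv_relu_branch(delta_kernel, 0.0))  # ReLU(X) + X == 2X here
```

With an all-zero kernel the branch contributes nothing and the block reduces to the identity mapping, which is the property that lets gradients and long-term features pass through deep stacks unchanged.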
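[0029] Equation (2) provides an additional path for gradient propagation, which supports stacking deeper networks and enhances the ability to extract complex spatiotemporal features. Finally, the features of the multi-source data are integrated through the multi-stream feature fusion formulas:
$$F_{dyn}=\mathrm{Concat}(F_{vis},F_{str},F_{env})\tag{3}$$
$$F_{sta}=\phi_{sta}(A)\tag{4}$$
[0030] In the formula, $F_{vis}$, $F_{str}$, and $F_{env}$ are the output features of the visual, structural, and environmental sub-networks, respectively; $\mathrm{Concat}(\cdot)$ is the tensor concatenation operation; $F_{dyn}$ is the result of dynamic feature fusion; $A$ is the static building attribute data; $\phi_{sta}(\cdot)$ denotes the separate convolution branch for static building attribute data; and $F_{sta}$ is the result of static feature processing.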
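The multi-stream fusion described above amounts to "independent extractors, then concatenation". The sketch below treats each stream's output as a flat feature vector and replaces each 3D-ResNet with a hypothetical one-parameter extractor (`make_stream`), only to show that the streams do not share parameters and that fusion is an ordered concatenation. The stream names, the flattening, and the affine-plus-ReLU stand-in are assumptions.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Extractor = Callable[[Vector], Vector]


def make_stream(weight: float, bias: float) -> Extractor:
    """Hypothetical stand-in for one modality's sub-network.

    Each stream owns its own parameters (nothing is shared between modalities);
    a real stream would be a 3D-ResNet, here it is an affine map plus ReLU.
    """
    def extract(x: Vector) -> Vector:
        return [max(0.0, weight * v + bias) for v in x]
    return extract


def fuse_dynamic(inputs: Dict[str, Vector], streams: Dict[str, Extractor],
                 order: Sequence[str]) -> Vector:
    """Run every stream independently, then concatenate the outputs in a fixed order."""
    fused: Vector = []
    for name in order:
        fused.extend(streams[name](inputs[name]))
    return fused


streams = {
    "visual": make_stream(1.0, 0.0),
    "structural": make_stream(2.0, 0.0),
    "environmental": make_stream(0.5, 1.0),
}
static_branch = make_stream(1.0, 0.0)  # separate branch for building attribute data

inputs = {
    "visual": [0.2, -0.4],
    "structural": [1.0],
    "environmental": [2.0, 4.0],
}
f_dyn = fuse_dynamic(inputs, streams, ["visual", "structural", "environmental"])
f_sta = static_branch([3.0, 5.0])
```

Keeping the concatenation order fixed matters: downstream layers learn weights per position, so every sample must place each modality's features in the same slots.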
[0031] In some embodiments, a feature enhancement step is included before step S4: The preliminary fused features are subjected to dimensionality reduction and global pooling operations to generate feature weight vectors; The feature weight vector is weighted and fused with the dimensionality-reduced features to obtain the enhanced features; In step S4, the enhanced features are input into the multi-head spatiotemporal attention module for processing.
[0032] Furthermore, the multi-head spatiotemporal attention module performs the following operations: The enhanced features are projected into query vectors, key vectors, and value vectors, respectively. The attention weights of the query vector and the key vector in the spatiotemporal dimension are calculated using multiple parallel attention heads. The value vector is weighted and summed based on the attention weights to obtain the output of each attention head; The outputs of all attention heads are concatenated and linearly transformed, and then added to the enhanced features through residual connections to output the attention-optimized features.
[0033] The output features of the multi-stream 3D-ResNet module contain redundant information such as wall background texture, their key risk features are susceptible to interference, and deep features may degrade. To address these issues, feature quality is optimized through a combination of feature enhancement, multi-head spatiotemporal attention, and residual connections.
[0034] The feature enhancement stage first reduces computational complexity through 2D convolutional dimensionality reduction, then extracts global features through global pooling to reduce the impact of local interference, and finally generates feature weights through a fully connected layer and fuses them with the dimensionality-reduced features to enhance the expression of key risk features such as crack edges and thermal anomalies in hollow areas. The multi-head spatiotemporal attention stage projects the enhanced features into queries (Q), keys (K), and values (V), and calculates the feature weights of the spatiotemporal dimensions through multiple independent attention heads to dynamically increase the risk feature weights and weaken interference information, thereby focusing on key risks. The residual connection establishes a path between the attention output and the enhanced features to avoid feature degradation caused by the attention module and ensure that deep risk features are not lost.
[0035] (1) Feature enhancement
1) 2D convolutional dimensionality reduction: a 1×1 2D convolution reduces the channel dimension of the preliminary fused features $F_{dyn}$, lowering the computational cost while preserving the core features:
$$F_{r}=\mathrm{Conv}_{1\times1}(F_{dyn})\tag{5}$$
2) Global average pooling: the spatial dimensions are compressed to extract global features and avoid local interference:
$$z=\mathrm{GAP}(F_{r})=\frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W}F_{r}(i,j)\tag{6}$$
In the formula, $H\times W$ is the spatial size after dimensionality reduction, $F_{r}(i,j)$ is the value of the dimensionality-reduced features at pixel $(i,j)$, and $z$ is the one-dimensional feature vector obtained after pooling.
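The 1×1 dimensionality reduction and the global average pooling can be made concrete with a small pure-Python sketch on a `[channel][row][col]` feature map. The layout, the channel counts, and the hand-picked reduction weights are illustrative assumptions; in the described module these weights are learned.

```python
from typing import List

Matrix = List[List[float]]
FeatureMap = List[Matrix]  # [channel][row][col]


def conv1x1(f: FeatureMap, weight: Matrix, bias: List[float]) -> FeatureMap:
    """1x1 2D convolution: mixes channels independently at each pixel.

    weight has shape [C_out][C_in]; choosing C_out < C_in reduces the channel
    dimension while the spatial size H x W is left unchanged.
    """
    H, W = len(f[0]), len(f[0][0])
    return [
        [[sum(weight[co][ci] * f[ci][i][j] for ci in range(len(f))) + bias[co]
          for j in range(W)] for i in range(H)]
        for co in range(len(weight))
    ]


def global_avg_pool(f: FeatureMap) -> List[float]:
    """Average each channel over its H x W positions, giving one value per channel."""
    H, W = len(f[0]), len(f[0][0])
    return [sum(sum(row) for row in ch) / (H * W) for ch in f]


# Four input channels on a 2x2 grid, reduced to two channels.
f_dyn = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
w_reduce = [[1.0, 0.0, 0.0, 0.0],   # output channel 0 copies input channel 0
            [0.0, 0.0, 0.5, 0.5]]   # output channel 1 averages channels 2 and 3
f_r = conv1x1(f_dyn, w_reduce, [0.0, 0.0])
z = global_avg_pool(f_r)
```

Here the four channels become two, the 2×2 spatial grid is untouched by the 1×1 convolution, and pooling turns each remaining channel into a single number.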
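[0036] 3) Feature weight fusion: a feature weight vector is generated and fused with the dimensionality-reduced features to strengthen key risk features and weaken redundant ones:
$$D_{x}=\sigma\big(\mathrm{FC}_{\mathrm{LeakyReLU}}(z)\big)\tag{7}$$
$$F_{enh}=D_{x}\odot F_{r}\tag{8}$$
[0037] In the formula, $z$ is the pooled feature vector; $F_{r}$ is the dimensionality-reduced feature tensor; $\mathrm{FC}_{\mathrm{LeakyReLU}}(\cdot)$ is a fully connected layer with LeakyReLU activation; $\sigma(\cdot)$ is the Sigmoid function, which constrains the weights to between 0 and 1; $D_{x}$ is the feature weight vector; $\odot$ denotes channel-wise multiplication broadcast over the spatial positions; and $F_{enh}$ is the enhanced feature tensor.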
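The feature weight fusion step above resembles a squeeze-and-excitation channel gate: a fully connected layer with LeakyReLU produces one score per channel, a Sigmoid squashes it into (0, 1) to form the weight vector D_x, and each channel of the reduced features is scaled by its weight. The sketch assumes a single fully connected layer, one weight per channel applied to every pixel, a LeakyReLU slope of 0.01, and hand-picked example weights; all of these are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]
FeatureMap = List[Matrix]  # [channel][row][col]


def leaky_relu(v: float, slope: float = 0.01) -> float:
    return v if v >= 0.0 else slope * v


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def feature_weights(z: List[float], fc_w: Matrix, fc_b: List[float]) -> List[float]:
    """D_x = Sigmoid(LeakyReLU(W z + b)): one weight in (0, 1) per channel."""
    return [sigmoid(leaky_relu(sum(w * zi for w, zi in zip(row, z)) + b))
            for row, b in zip(fc_w, fc_b)]


def reweight(f_r: FeatureMap, d_x: List[float]) -> FeatureMap:
    """Scale every pixel of channel c by d_x[c] (channel-wise multiplication)."""
    return [[[d * v for v in row] for row in ch] for ch, d in zip(f_r, d_x)]


f_r = [[[1.0, 2.0], [3.0, 4.0]], [[1.5, 1.5], [1.5, 1.5]]]  # reduced features
z = [2.5, 1.5]  # their global average pooling

d_x = feature_weights(z, [[0.4, 0.0], [0.0, -0.4]], [0.0, 0.0])
f_enh = reweight(f_r, d_x)
```

With these weights channel 0 receives about 0.73 and channel 1 about 0.50, so channel 0 is emphasised relative to channel 1. Because LeakyReLU precedes the Sigmoid, negative scores are heavily damped, so with these small weights no channel is pushed far below 0.5.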
[0038] (2) Multi-head spatiotemporal attention
1) Feature projection: the enhanced features $F_{enh}$ are transformed into a form suited to attention calculation, laying the foundation for weight calculation:
$$Q=F_{enh}W^{Q},\quad K=F_{enh}W^{K},\quad V=F_{enh}W^{V}\tag{9}$$
In the formula, $W^{Q}$, $W^{K}$, and $W^{V}$ are learnable projection weight matrices, and $Q$ (query), $K$ (key), and $V$ (value) are the projected feature tensors.
[0039] 2) Single-head attention calculation: each attention head focuses on one class of spatiotemporal features and computes the weights of that class as follows:
$$\mathrm{head}_{i}=\mathrm{softmax}\Big(\frac{Q_{i}K_{i}^{\top}}{\sqrt{d_{k}}}+M\Big)V_{i}\tag{10}$$
In the formula, $Q_{i}$, $K_{i}$, and $V_{i}$ are the projected features of the $i$-th attention head, $d_{k}$ is the dimension of each attention head, $M$ is the mask matrix, and $\mathrm{softmax}(\cdot)$ normalizes the weights.
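Equation (10) is standard scaled dot-product attention with an additive mask. A minimal sketch for a single head follows, using nested lists as matrices; treating rows as spatiotemporal positions and encoding blocked positions in M as `-math.inf` are assumptions about the layout and the mask convention.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]  # exp(-inf) == 0.0 for masked entries
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(q: Matrix, k: Matrix, v: Matrix, mask: Matrix) -> Matrix:
    """Equation (10): softmax(Q K^T / sqrt(d_k) + M) V for one head.

    Rows of q, k, v are spatiotemporal positions; mask holds 0.0 for allowed
    pairs and -math.inf for blocked ones (each row needs one allowed entry).
    """
    d_k = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) + mv for s, mv in zip(srow, mrow)])
               for srow, mrow in zip(scores, mask)]
    return matmul(weights, v)


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
mask = [[0.0, 0.0, -math.inf], [0.0, 0.0, -math.inf]]  # hide the third position
out = attention_head(q, k, v, mask)
```

Because the third key is masked, its large value vector never reaches the output, and each query leans towards the key it is most aligned with.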
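[0040] 3) Multi-head fusion and residual connection: the spatiotemporal risk features captured by the individual heads are integrated, and a residual connection keeps the features from degrading, which improves the accuracy of risk identification in complex scenarios:
$$\mathrm{MH}=\mathrm{Concat}(\mathrm{head}_{1},\dots,\mathrm{head}_{h})\,W^{O}\tag{11}$$
$$Z=F_{enh}+\mathrm{MH}\tag{12}$$
[0041] In the formula, $h$ is the number of attention heads, $F_{enh}$ is the enhanced feature tensor, $W^{O}$ is the fusion weight matrix, $\mathrm{MH}$ is the multi-head fused feature, and $Z$ is the final output feature of the multi-head spatiotemporal attention (MHSA) module.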
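Combining the projection (9), the per-head attention (10), the multi-head fusion, and the residual connection gives the usual multi-head self-attention block with a skip path. The sketch below omits the mask (M = 0), assumes the heads split the projected width evenly into consecutive column slices, and is written in pure Python for readability rather than speed; all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def columns(a: Matrix, start: int, stop: int) -> Matrix:
    return [row[start:stop] for row in a]


def mhsa(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix, w_o: Matrix, h: int) -> Matrix:
    """Unmasked multi-head self-attention with a residual skip: X + Concat(heads) W^O.

    x is [tokens][d]; each head uses d_k = d // h consecutive projected columns.
    """
    d = len(x[0])
    assert d % h == 0, "model width must be divisible by the number of heads"
    d_k = d // h
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)  # projection, eq. (9)
    heads = []
    for i in range(h):
        qi, ki, vi = (columns(m, i * d_k, (i + 1) * d_k) for m in (q, k, v))
        scores = [[sum(a * b for a, b in zip(qr, kr)) / math.sqrt(d_k) for kr in ki]
                  for qr in qi]
        heads.append(matmul([softmax(r) for r in scores], vi))  # eq. (10) with M = 0
    concat = [sum((hd[t] for hd in heads), []) for t in range(len(x))]
    mh = matmul(concat, w_o)  # multi-head fusion
    return [[a + b for a, b in zip(xr, mr)] for xr, mr in zip(x, mh)]  # residual add


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


x = [[1.0, 2.0, 3.0, 4.0]]  # one spatiotemporal token of width 4
eye = identity(4)
zeros = [[0.0] * 4 for _ in range(4)]
z_skip = mhsa(x, eye, eye, eye, zeros, h=2)  # zero fusion matrix: only the residual remains
z_one = mhsa(x, eye, eye, eye, eye, h=2)     # a lone token attends only to itself: 2X
```

The zero-fusion case shows why the residual path prevents degradation: even if the attention branch contributes nothing, the enhanced features pass through unchanged.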
[0042] In some embodiments, step S5 includes: the attention-optimized features are input into the BiLSTM temporal risk modeling module time step by time step, and forward and backward temporal dependency features are extracted by a forward LSTM and a backward LSTM, respectively. The final hidden states output by the forward LSTM and the backward LSTM are concatenated to obtain bidirectional temporal features.
[0043] In some embodiments, step S5 further includes: the bidirectional temporal features are concatenated with the static features extracted from the static attribute data to obtain the final fused features; based on the final fused features, the output layer generates the risk detection result.
[0044] In some embodiments, the risk detection result includes at least one of the following: The risk level of the target building or exterior wall area; And, the pixel-level risk mask for the risk areas on the exterior wall facade.
[0045] To address the problem that traditional technologies cannot model the long-term correlation between environmental factors, hazard evolution, and risk escalation, leading to delayed risk warnings, this module adopts a bidirectional LSTM architecture. It processes time-series data separately using forward LSTM and backward LSTM, simultaneously capturing both "forward time-series dependencies" (such as early precipitation leading to mid-term cracks and later risk escalation) and "backward time-series dependencies" (such as tracing later risk escalation back to early environmental impacts), thus fully reconstructing the temporal evolution pattern of hazards. Furthermore, the time-series features output by the bidirectional LSTM are concatenated with static building attribute features, combining "dynamic environmental impacts" and "static material properties" to avoid misjudgments caused by relying solely on single-dimensional features, thereby improving the comprehensiveness of risk assessment.
[0046] (1) Bidirectional LSTM hidden state calculation. Taking the forward LSTM as an example (the backward LSTM differs only in processing the time steps in the opposite direction; its formulas have the same structure), at each time step the gate structures retain effective temporal information and filter out random noise:
$$i_{t}=\sigma(W_{i}[h_{t-1},X_{t}]+b_{i})\tag{13}$$
$$f_{t}=\sigma(W_{f}[h_{t-1},X_{t}]+b_{f})\tag{14}$$
$$o_{t}=\sigma(W_{o}[h_{t-1},X_{t}]+b_{o})\tag{15}$$
$$c_{t}=f_{t}\odot c_{t-1}+i_{t}\odot\tanh(W_{c}[h_{t-1},X_{t}]+b_{c})\tag{16}$$
$$h_{t}=o_{t}\odot\tanh(c_{t})\tag{17}$$
[0047] In the formula, $X_{t}$ is the input feature for period $t$ (the slice of the MHSA module output at period $t$); $i_{t}$ (input gate), $f_{t}$ (forget gate), and $o_{t}$ (output gate) control feature input, discarding of invalid information, and output of valid features, respectively; $c_{t}$ is the cell state, which stores long-term temporal information; $h_{t}$ is the hidden state, which outputs the current temporal features; $W$ denotes the weight matrices and $b$ the bias terms; $\odot$ is element-wise multiplication; $\sigma(\cdot)$ is the Sigmoid activation function; and $\tanh(\cdot)$ is the hyperbolic tangent activation function.
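The gate equations of the forward LSTM correspond to one step of a standard LSTM cell. The pure-Python sketch below keeps one (W, b) pair per gate, each acting on the concatenation [h_{t-1}, X_t]; the dictionary layout and the gate keys `"i"`, `"f"`, `"o"`, `"c"` are hypothetical. With all-zero parameters every gate outputs 0.5 and the candidate value is 0, so the cell state simply halves, which makes the arithmetic easy to check.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def affine(w: Matrix, b: Vector, u: Vector) -> Vector:
    return [sum(wij * uj for wij, uj in zip(row, u)) + bi for row, bi in zip(w, b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, Tuple[Matrix, Vector]]) -> Tuple[Vector, Vector]:
    """One forward-LSTM time step.

    params maps gate name ("i", "f", "o", "c") to (W, b); each W acts on the
    concatenation [h_{t-1}, X_t].
    """
    u = h_prev + x_t  # list concatenation = [h_{t-1}, X_t]
    i_t = [sigmoid(v) for v in affine(*params["i"], u)]    # input gate
    f_t = [sigmoid(v) for v in affine(*params["f"], u)]    # forget gate
    o_t = [sigmoid(v) for v in affine(*params["o"], u)]    # output gate
    g_t = [math.tanh(v) for v in affine(*params["c"], u)]  # candidate cell content
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]  # new cell state
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]     # new hidden state
    return h_t, c_t


def zero_params(hidden: int, inp: int) -> Dict[str, Tuple[Matrix, Vector]]:
    return {g: ([[0.0] * (hidden + inp) for _ in range(hidden)], [0.0] * hidden)
            for g in ("i", "f", "o", "c")}


params = zero_params(hidden=1, inp=2)
h1, c1 = lstm_step([0.3, -0.7], h_prev=[0.0], c_prev=[1.0], params=params)
```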
[0048] (2) Feature fusion: the forward and backward temporal features are combined into a feature vector that contains the complete temporal evolution pattern of potential risks:
$$H_{bi}=\mathrm{Concat}(\overrightarrow{h}_{T},\overleftarrow{h}_{1})\tag{18}$$
In the formula, $\overrightarrow{h}_{T}$ is the last hidden state of the forward LSTM, $\overleftarrow{h}_{1}$ is the last hidden state of the backward LSTM (which processes the sequence from period $T$ back to period 1), and $H_{bi}$ is the result of bidirectional temporal feature fusion.
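Equation (18) only needs the final hidden state of each direction. The sketch below uses a compact LSTM cell (gate weights stacked in the assumed order i, f, o, c), runs it once over the sequence and once over the reversed sequence, and concatenates the two final states. The parameter layout and the example weights are assumptions; the backward direction reuses the forward weights here only to make a symmetry check easy, whereas a BiLSTM normally learns separate weights per direction.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Tuple[Matrix, Vector]  # stacked gate weights [4H][H+D] and biases [4H], order i, f, o, c


def lstm_final_state(xs: Sequence[Vector], params: Params, hidden: int) -> Vector:
    """Run a standard LSTM over xs from zero initial state and return the last hidden state."""
    w, b = params
    h, c = [0.0] * hidden, [0.0] * hidden
    for x_t in xs:
        u = h + list(x_t)  # [h_{t-1}, X_t]
        z = [sum(wij * uj for wij, uj in zip(row, u)) + bi for row, bi in zip(w, b)]
        i = [1 / (1 + math.exp(-v)) for v in z[0:hidden]]
        f = [1 / (1 + math.exp(-v)) for v in z[hidden:2 * hidden]]
        o = [1 / (1 + math.exp(-v)) for v in z[2 * hidden:3 * hidden]]
        g = [math.tanh(v) for v in z[3 * hidden:4 * hidden]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h


def bilstm_features(xs: Sequence[Vector], fwd: Params, bwd: Params, hidden: int) -> Vector:
    """Equation (18): concatenate the forward final state (after period T) with the
    backward final state (after reading back to period 1)."""
    h_fwd = lstm_final_state(xs, fwd, hidden)
    h_bwd = lstm_final_state(list(reversed(xs)), bwd, hidden)
    return h_fwd + h_bwd


fwd = ([[0.5, 1.0], [0.5, 0.2], [0.1, 0.3], [0.4, 0.9]], [0.0, 1.0, 0.0, 0.0])
seq = [[0.1], [0.8], [-0.3]]  # three periods of a one-dimensional feature
h_bi = bilstm_features(seq, fwd, fwd, hidden=1)
```

For a palindromic sequence with shared weights both directions see identical inputs, so the two halves of the result coincide; for ordinary sequences they differ, which is what lets the model use both forward and backward temporal context.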
[0049] (3) Fusion of static and dynamic features: combining the temporal evolution of hidden hazards under dynamic environmental influences with the differences in static building attributes provides comprehensive feature support for the output layer:
$$F_{final}=\mathrm{Concat}(H_{bi},F_{sta})\tag{19}$$
In the formula, $H_{bi}$ is the bidirectional temporal feature, $F_{sta}$ is the static building attribute feature output by the multi-stream 3D-ResNet module, and $F_{final}$ is the final fused feature tensor. This ensures that the model can both predict the development trend of potential hazards and assess risk accurately by taking building characteristics into account.
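Equation (19) concatenates the two feature groups before the output layer. The application does not describe the output layer itself, so the sketch below assumes, purely for illustration, a single linear layer followed by softmax over three hypothetical risk levels; the pixel-level risk mask branch is not modelled, and the weights and feature values are made up.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(v: Vector) -> Vector:
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def risk_level_head(h_bi: Vector, f_sta: Vector, w: Matrix, b: Vector) -> Vector:
    """Equation (19) followed by a hypothetical linear + softmax output layer.

    Returns one probability per risk level; the number of levels is len(w).
    """
    f_final = h_bi + f_sta  # Concat(H_bi, F_sta)
    logits = [sum(wi * xi for wi, xi in zip(row, f_final)) + bi for row, bi in zip(w, b)]
    return softmax(logits)


h_bi = [0.6, -0.2]   # bidirectional temporal feature
f_sta = [1.0]        # e.g. one encoded building attribute (hypothetical)
w_out = [[0.0, 0.0, 0.0],    # hypothetical level 0: low risk
         [1.0, 0.0, 0.5],    # hypothetical level 1: medium risk
         [2.0, -1.0, 1.0]]   # hypothetical level 2: high risk
probs = risk_level_head(h_bi, f_sta, w_out, [0.0, 0.0, 0.0])
level = max(range(len(probs)), key=probs.__getitem__)
```

The logits here are 0, 1.1, and 2.4, so the highest-risk level is selected; the static attribute contributes to every logit, which is how building properties can shift the assessment for the same temporal evidence.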
[0050] This application designs independent feature extraction sub-networks for visual, structural, and environmental data. Using 3D convolutional layers and normalization strategies adapted to the characteristics of each data source, it extracts deep features from each modality. It then integrates the multi-stream features through tensor concatenation to construct a unified multi-source feature space. This overcomes both the information limitations of a single data source and the feature conflicts of shallow fusion, fully exploits the synergistic value of multi-source data in detachment risk detection, and reduces the rate of missed hazard detection.
[0051] In the spatial dimension, small-sized 3D convolutional kernels combined with dilated convolutional structures are used to enhance the ability to capture microscopic features of walls, enabling accurate identification of millimeter-level minute hazards. In the temporal dimension, a BiLSTM temporal risk modeling module is integrated to perform bidirectional feature learning on multi-period temporal data, construct a temporal correlation model, and quantify the evolution of hazards over time. Through the synergistic effect of the multi-stream 3D-ResNet spatiotemporal feature extraction module and the BiLSTM temporal risk modeling module, an upgrade from static instantaneous detection to spatiotemporal linkage prediction is achieved, solving the problem of identifying minute hazards and enabling early warning of risks.
[0052] A multi-head attention mechanism dynamically weights the spatiotemporal features, automatically increasing the weights of key risk features such as cracks and hollow areas while weakening the impact of interfering information such as stains. A residual connection structure maintains gradient continuity during feature extraction and avoids feature degradation in deep networks. In addition, the initially extracted features are optimized and enhanced by combining global average/max pooling with fully connected layers, which strengthens the expressive power of key risk features, reduces misjudgment caused by interfering information, and improves detection accuracy in complex scenarios.
[0053] This application also provides an exterior wall facade detachment risk detection device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program is configured to implement the steps of any of the above-described exterior wall facade detachment risk detection methods.
[0054] The facade detachment risk detection equipment may include a processing unit (such as a central processing unit, graphics processing unit, etc.) that can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) or a program loaded from a storage device into random access memory (RAM). The RAM also stores various programs and data required for the operation of the facade detachment risk detection equipment. The processing unit, ROM, and RAM are interconnected via a bus, and an input/output (I/O) interface is also connected to the bus. Typically, the following devices can be connected to the I/O interface: input devices including, for example, touchscreens, touchpads, keyboards, mice, image sensors, microphones, accelerometers, and gyroscopes; output devices including, for example, liquid crystal displays (LCDs), speakers, and vibrators; storage devices including, for example, magnetic tape and hard disks; and communication devices. The communication device allows the facade recognition device to communicate wirelessly or by wire with other devices to exchange data.
[0055] This application also provides a storage medium, which is a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the steps of any of the above-described exterior wall facade detachment risk detection methods.
[0056] The computer program can be downloaded and installed from a network via a communication device, or installed from a storage device, or installed from a read-only memory. When the computer program is executed by a processing device, it performs the functions defined in the methods of the embodiments disclosed in this application.
[0057] It should be understood that the various parts disclosed in this application can be implemented using hardware, software, firmware, or a combination thereof. In the description of the above embodiments, specific features, structures, materials, or characteristics can be combined in any suitable manner in one or more embodiments or examples.
[0058] The embodiments of this application have been described in detail above with reference to the accompanying drawings. However, this application is not limited to the above embodiments. Within the scope of knowledge possessed by those skilled in the art, various changes can be made without departing from the spirit of this application.
Claims
1. A method for detecting the risk of exterior wall facade detachment, characterized in that it includes the following steps: S1: collect multimodal input data of the target building, the multimodal input data including dynamic time-series data and static attribute data; S2: use the multi-stream 3D-ResNet spatiotemporal feature extraction module to extract, independently for each modality, the spatiotemporal features of the dynamic time-series data in the multimodal input data, obtaining the dynamic spatiotemporal features of each modality; S3: fuse the extracted dynamic spatiotemporal features of all modalities to obtain preliminary fused features; S4: input the preliminary fused features into the feature enhancement and multi-head spatiotemporal attention module, which dynamically weights the spatiotemporal dimensions of the fused features through a multi-head attention mechanism to enhance risk-related features and suppress interfering information, thereby obtaining attention-optimized features; S5: input the attention-optimized features into the BiLSTM temporal risk modeling module to capture the evolution of hidden hazards over time, and combine the result with the static attribute data to generate the final risk detection result.
2. The method for detecting the risk of exterior wall facade detachment according to claim 1, characterized in that the dynamic time-series data includes visual data, structural data, and environmental data, and the static attribute data includes building attribute data.
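One possible in-memory layout for the multimodal input data of claim 2 is sketched below. The class `MultimodalSample`, its field names, and the example quantities mentioned in the comments (strain, temperature, building age) are hypothetical; the only constraint the sketch enforces is the assumption that all dynamic modalities cover the same number of time steps T.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MultimodalSample:
    """Hypothetical container for one observation window of a facade region."""
    visual: List[List[List[float]]]        # T frames, each H x W (e.g. RGB-derived or infrared intensity)
    structural: List[List[float]]          # T x k structural readings (hypothetical: strain, displacement)
    environmental: List[List[float]]       # T x m environmental readings (hypothetical: temperature, humidity)
    building: Dict[str, float] = field(default_factory=dict)  # static building attributes

    def __post_init__(self) -> None:
        # ASSUMPTION: dynamic modalities are resampled to a common time axis
        lengths = {len(self.visual), len(self.structural), len(self.environmental)}
        if len(lengths) != 1:
            raise ValueError(f"dynamic modalities must share T, got {sorted(lengths)}")

    @property
    def num_steps(self) -> int:
        return len(self.visual)

    def dynamic(self) -> Dict[str, list]:
        """Dynamic time-series data keyed by modality, ready for per-modality streams."""
        return {"visual": self.visual, "structural": self.structural,
                "environmental": self.environmental}
```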
3. The method for detecting the risk of exterior wall facade detachment according to claim 1 or 2, characterized in that the multi-stream 3D-ResNet spatiotemporal feature extraction module is a network based on 3D convolution and residual connections; in step S2, spatiotemporal features are extracted from the dynamic time-series data of each modality by an independent 3D-ResNet sub-network; and each 3D-ResNet sub-network includes a Conv3D block and an Identity block connected in sequence, with a residual connection between the Conv3D block and the Identity block.
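The residual structure of one 3D-ResNet stream can be illustrated with the following pure-Python sketch. It assumes a single channel, odd-sized kernels with zero padding, and the common ResNet convention in which the Conv3D block carries a convolutional shortcut while the Identity block adds its own input unchanged. The names `conv3d_same`, `conv_block`, `identity_block`, and `stream` are hypothetical, and a practical system would typically use a deep-learning framework with many channels and learned kernels.

```python
from typing import Dict, List, Tuple

Volume = List[List[List[float]]]  # indexed [t][y][x], single channel


def conv3d_same(x: Volume, k: Volume) -> Volume:
    """Single-channel 3D cross-correlation, stride 1, zero padding, same output size."""
    T, H, W = len(x), len(x[0]), len(x[0][0])
    kt, kh, kw = len(k), len(k[0]), len(k[0][0])
    pt, ph, pw = kt // 2, kh // 2, kw // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for y in range(H):
            for xx in range(W):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            ti, yi, xi = t + dt - pt, y + dy - ph, xx + dx - pw
                            if 0 <= ti < T and 0 <= yi < H and 0 <= xi < W:
                                acc += k[dt][dy][dx] * x[ti][yi][xi]
                out[t][y][xx] = acc
    return out


def relu(v: Volume) -> Volume:
    return [[[max(0.0, a) for a in row] for row in plane] for plane in v]


def add(a: Volume, b: Volume) -> Volume:
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(pa, pb)] for pa, pb in zip(a, b)]


def conv_block(x: Volume, k1: Volume, k2: Volume, k_short: Volume) -> Volume:
    """Conv3D block: conv -> ReLU -> conv, plus a convolutional shortcut (assumed layout)."""
    h = conv3d_same(relu(conv3d_same(x, k1)), k2)
    return relu(add(h, conv3d_same(x, k_short)))


def identity_block(x: Volume, k1: Volume, k2: Volume) -> Volume:
    """Identity block: conv -> ReLU -> conv, plus its unchanged input (residual)."""
    h = conv3d_same(relu(conv3d_same(x, k1)), k2)
    return relu(add(h, x))


def stream(x: Volume, params: Dict[str, Tuple[Volume, ...]]) -> Volume:
    """One modality stream: Conv3D block followed by Identity block."""
    return identity_block(conv_block(x, *params["conv"]), *params["identity"])


def delta_kernel(size: int = 3) -> Volume:
    """Kernel that passes its input through unchanged (handy for checking the wiring)."""
    k = [[[0.0] * size for _ in range(size)] for _ in range(size)]
    c = size // 2
    k[c][c][c] = 1.0
    return k
```

With pass-through kernels and a non-negative input, each residual addition doubles the signal, which makes the skip connections easy to verify.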
4. The method for detecting the risk of exterior wall facade detachment according to claim 1, characterized in that the method further comprises, before step S4, a feature enhancement step in which the preliminary fused features are subjected to dimensionality reduction and global pooling to generate a feature weight vector, and the feature weight vector is weighted and fused with the dimensionality-reduced features to obtain enhanced features; and in step S4, the enhanced features are input into the feature enhancement and multi-head spatiotemporal attention module for processing.
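The feature enhancement step of claim 4 resembles squeeze-and-excitation gating. A minimal sketch under stated assumptions: the dimensionality reduction is a linear projection `w_reduce`, the global pooling is an average over time steps, and the weight vector is produced by a sigmoid. None of these choices is fixed by claim 4, and `enhance` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def enhance(fused: Matrix, w_reduce: Matrix) -> Matrix:
    """fused: T x C preliminary fused features; w_reduce: D x C projection (D < C).

    Returns T x D enhanced features. ASSUMED: linear reduction, average pooling
    over time, sigmoid gating, element-wise weighting.
    """
    reduced = [matvec(w_reduce, x_t) for x_t in fused]            # dimensionality reduction, T x D
    T, D = len(reduced), len(w_reduce)
    pooled = [sum(r[d] for r in reduced) / T for d in range(D)]   # global average pooling over time
    weights = [sigmoid(p) for p in pooled]                        # feature weight vector in (0, 1)
    return [[w * r for w, r in zip(weights, r_t)] for r_t in reduced]  # weighted fusion
```

Channels whose pooled response is strong receive weights closer to 1, while weak channels are attenuated toward half their value or less.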
5. The method for detecting the risk of exterior wall facade detachment according to claim 4, characterized in that the feature enhancement and multi-head spatiotemporal attention module performs the following operations: projecting the enhanced features into query vectors, key vectors, and value vectors, respectively; computing, with multiple parallel attention heads, attention weights between the query vectors and the key vectors over the spatiotemporal dimensions, and weighting and summing the value vectors according to the attention weights to obtain the output of each attention head; and concatenating the outputs of all attention heads, applying a linear transformation, and adding the result to the enhanced features through a residual connection to output the attention-optimized features.
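The operations of claim 5 correspond to multi-head scaled dot-product attention followed by a residual connection. The sketch below assumes that the spatiotemporal positions have already been flattened into N tokens of width D, scales scores by the square root of the head width (a common choice that the claim does not state), and uses the hypothetical weight names `wq`, `wk`, `wv`, and `wo`.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """(n x k) @ (k x m)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def multi_head_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
                         wo: Matrix, heads: int) -> Matrix:
    """x: N x D enhanced features (N flattened spatiotemporal positions); weights D x D."""
    n, d = len(x), len(x[0])
    if d % heads:
        raise ValueError("feature width must be divisible by the number of heads")
    dh = d // heads
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)   # query/key/value projections
    concat: Matrix = [[] for _ in range(n)]
    for h in range(heads):
        lo, hi = h * dh, (h + 1) * dh                        # this head's slice of the width
        for i in range(n):
            scores = [sum(q[i][c] * k[j][c] for c in range(lo, hi)) / math.sqrt(dh)
                      for j in range(n)]
            attn = softmax(scores)                           # attention weights over positions
            concat[i].extend(sum(attn[j] * v[j][c] for j in range(n)) for c in range(lo, hi))
    projected = matmul(concat, wo)                           # linear transform of concatenated heads
    return [[p + r for p, r in zip(prow, xrow)] for prow, xrow in zip(projected, x)]  # residual
```

Setting `wo` to zero reduces the module to its residual path, and zero query/key weights yield uniform attention, i.e. each head returns the mean of the value vectors.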
6. The method for detecting the risk of exterior wall facade detachment according to claim 1, characterized in that step S5 includes: inputting the attention-optimized features into the BiLSTM temporal risk modeling module time step by time step; extracting forward and backward temporal dependency features with a forward LSTM and a backward LSTM, respectively; and concatenating the final hidden states output by the forward LSTM and the backward LSTM to obtain bidirectional temporal features.
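Claim 6 can be illustrated with a standard LSTM cell run forward and backward over the attention-optimized sequence, with the two final hidden states concatenated. The gate parameterization below (one weight matrix per gate acting on the concatenated input and previous hidden state) is the textbook LSTM formulation and is assumed here; `lstm_final_state` and `bilstm_features` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Params = Tuple[Dict[str, Matrix], Dict[str, List[float]]]  # (weights, biases) keyed by gate


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_final_state(seq: Sequence[Sequence[float]], params: Params) -> List[float]:
    """Run a textbook LSTM over seq (T x d_in) and return the final hidden state.

    ASSUMED layout: weights[gate] is H x (d_in + H), biases[gate] has length H,
    for gate in 'i' (input), 'f' (forget), 'g' (candidate), 'o' (output).
    """
    weights, biases = params
    hidden = len(biases["i"])
    h, c = [0.0] * hidden, [0.0] * hidden
    for x_t in seq:
        z = list(x_t) + h  # concatenated [current input, previous hidden state]
        pre = {gate: [sum(w * v for w, v in zip(row, z)) + bias
                      for row, bias in zip(weights[gate], biases[gate])]
               for gate in "ifgo"}
        i = [_sigmoid(a) for a in pre["i"]]
        f = [_sigmoid(a) for a in pre["f"]]
        g = [math.tanh(a) for a in pre["g"]]
        o = [_sigmoid(a) for a in pre["o"]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]  # cell state update
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]               # hidden state
    return h


def bilstm_features(seq: Sequence[Sequence[float]], forward: Params, backward: Params) -> List[float]:
    """Concatenate the final hidden states of a forward and a backward LSTM."""
    h_fwd = lstm_final_state(seq, forward)
    h_bwd = lstm_final_state(list(reversed(seq)), backward)
    return h_fwd + h_bwd
```

Because the backward LSTM reads the sequence in reverse, its final hidden state summarizes the earliest time steps, complementing the forward state, which summarizes the latest ones.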
7. The method for detecting the risk of exterior wall facade detachment according to claim 6, characterized in that step S5 further includes: concatenating the bidirectional temporal features with static features extracted from the static attribute data to obtain final fused features; and generating the risk detection result from the final fused features through an output layer.
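The output stage of claim 7 might be sketched as below. The static encoder (one linear layer with ReLU), the softmax output layer over discrete risk levels, and all function names are assumptions made for illustration; claim 7 only requires concatenation followed by an output layer.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def linear(w: Matrix, b: List[float], x: List[float]) -> List[float]:
    """Affine map w @ x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def risk_head(temporal: List[float], static_attrs: List[float],
              w_static: Matrix, b_static: List[float],
              w_out: Matrix, b_out: List[float]) -> Tuple[int, List[float]]:
    """Hypothetical output stage: returns (predicted risk level index, class probabilities)."""
    # ASSUMED static encoder: one linear layer followed by ReLU
    static_feat = [max(0.0, v) for v in linear(w_static, b_static, static_attrs)]
    final_fused = temporal + static_feat                 # concatenation of the two feature vectors
    probs = softmax(linear(w_out, b_out, final_fused))   # ASSUMED: one output class per risk level
    level = max(range(len(probs)), key=probs.__getitem__)
    return level, probs
```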
8. The method for detecting the risk of exterior wall facade detachment according to claim 1, characterized in that the risk detection result includes at least one of the following: a risk level of the target building or of an exterior wall area; and a pixel-level risk mask of risk areas on the exterior wall facade.
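The two result forms of claim 8 can be represented as follows. The `RiskResult` container, the 0.5 probability threshold used to binarize a per-pixel risk map, and the `flagged_fraction` summary are hypothetical; the claim does not specify how the mask is obtained.

```python
from dataclasses import dataclass
from typing import List, Optional

Mask = List[List[int]]


@dataclass
class RiskResult:
    """Hypothetical container: at least one of the two fields must be present."""
    level: Optional[int] = None   # discrete risk level of the building or wall area (scale assumed)
    mask: Optional[Mask] = None   # pixel-level risk mask, 1 = pixel inside a risk area

    def __post_init__(self) -> None:
        if self.level is None and self.mask is None:
            raise ValueError("a risk detection result needs a level, a mask, or both")


def mask_from_probabilities(prob_map: List[List[float]], threshold: float = 0.5) -> Mask:
    """Binarize a per-pixel risk probability map (the 0.5 threshold is an assumption)."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


def flagged_fraction(mask: Mask) -> float:
    """Share of facade pixels flagged as at risk."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total if total else 0.0
```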
9. A device for detecting the risk of exterior wall facade detachment, characterized in that the device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the method for detecting the risk of exterior wall facade detachment according to any one of claims 1 to 8.
10. A storage medium, characterized in that the storage medium is a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the method for detecting the risk of exterior wall facade detachment according to any one of claims 1 to 8.