A remote sensing change detection method for disaster building assessment
The frequency domain heterogeneous modulation fusion network FHFM-Net decouples frequency information with the Fourier transform and modulates the resulting features. This addresses the increased false changes and incomplete extraction of real change areas in remote sensing change detection and achieves high-precision identification of disaster-induced building changes.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA THREE GORGES UNIV
- Filing Date
- 2026-04-30
- Publication Date
- 2026-06-19
AI Technical Summary
Existing remote sensing change detection methods suffer from problems such as increased false changes, incomplete extraction of real change areas, blurred edges of changed targets, and omission of local areas in building disaster assessment, making it difficult to achieve high-precision identification in complex scenarios.
The frequency domain heterogeneous modulation fusion network FHFM-Net is adopted, and frequency information is decoupled through the Fourier transform. For the high-frequency components, a differential operation extracts change-sensitive signals. For the low-frequency components, negative cosine similarity characterizes the semantic consistency between features. In the spatial domain, cosine similarity modulates the feature difference and the feature concatenation to achieve complementary information fusion.
The detection performance and generalization ability of the remote sensing change detection model in disaster-stricken building assessment are improved. The model can accurately identify building damage outlines and local damage details in complex scenarios, which improves the completeness and accuracy of the detection results.

Figure CN122244684A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of remote sensing change detection. It concerns in particular the application of remote sensing change detection and artificial intelligence to the detection and assessment of disaster-stricken buildings, and specifically a remote sensing change detection method for assessing disaster-stricken buildings.
Background Technology
[0002] Remote sensing change detection (RSCD) aims to identify areas where surface features have changed by comparing and analyzing multi-temporal images of the same geographic area acquired at different times. As an important topic in the field of remote sensing, RSCD is of great significance in practical tasks such as urban planning, deforestation monitoring, disaster assessment, and environmental monitoring, and has therefore attracted widespread attention.
[0003] In recent years, with the development of deep learning, frequency domain modeling methods have been widely used in remote sensing and have achieved remarkable results. The paper "High-Resolution Remote Sensing Image Change Detection Based on Fourier Feature Interaction and Multiscale Perception" introduces frequency domain feature processing into remote sensing change detection, which improves the generalization ability of change detection models to a certain extent. However, two problems remain when this type of method is applied directly to the assessment of disaster-stricken buildings:

1) Disaster-stricken building scenes are structurally complex and involve diverse disaster types. They include significant changes such as building collapse, damage, and partial occlusion, often accompanied by smoke and dust cover, shadow shifts, and background disturbance. In such scenarios, existing methods reduce frequency domain processing to uniform global filtering or style alignment and fail to exploit the differences and complementarity carried by different frequency components. As a result, content information and style information are hard to decouple, and the real disaster-affected areas are hard to highlight accurately.

2) During bi-temporal feature fusion, existing methods typically rely on feature concatenation or element-wise subtraction. Concatenation mainly preserves the joint semantic context but does not explicitly emphasize change-sensitive difference information. Subtraction captures local changes but also tends to amplify appearance differences in unchanged areas. This introduces false alarms and limits the model's ability to perceive fine details at the edges of disaster-stricken buildings and in locally damaged areas.

[0004] These problems typically appear when multi-temporal remote sensing images differ markedly in style and the affected areas undergo complex structural changes. In disaster-affected building assessment they manifest as:
- more false changes;
- incomplete extraction of true change areas;
- blurred edges of changed targets;
- missed local areas.

Without effective differentiation and collaborative modeling of the different frequency components, content and style information are easily confused. This makes it difficult for the model to focus on the buildings' actual disaster-affected areas and reduces the completeness and accuracy of the detection results. At the same time, fusing bi-temporal features by concatenation or differencing alone easily leads to weak responses in disaster-affected areas or false detections in unaffected areas. Building damage outlines, local damage details, and true change information against complex backgrounds then become difficult to identify with high precision.

These phenomena show that existing frequency-domain remote sensing change detection methods remain deficient in three respects: decoupling content and style representations, exploiting the complementarity of different frequency information, and generalizing to complex disaster scenarios. A key open problem in remote sensing change detection for disaster-affected building assessment therefore remains: how to fully exploit multi-component frequency domain information while suppressing style-shift interference and achieving effective complementary fusion of bi-temporal features.
[0005] To address these problems, this invention proposes a remote sensing change detection method for urban disaster assessment. The method first decouples frequency information using the Fourier transform.
- For high-frequency components, differential operations extract change-sensitive signals. Modulation and filtering bottlenecks are then introduced to enhance the bi-temporal response of change regions and suppress noise.
- For low-frequency components, negative cosine similarity characterizes the semantic consistency between feature vectors, which reduces sensitivity to style changes.

The frequency domain features are then mapped back to the spatial domain. There, cosine similarity is further used to modulate the feature difference and the feature concatenation, enabling complementary fusion of the difference-based and concatenation-based strategies.
Summary of the Invention
[0006] The purpose of this invention is to address the following technical problems of existing remote sensing change detection methods used in building disaster assessment tasks, as mentioned in the background art: Due to significant differences in lighting, shadows, smoke and dust obstruction, imaging conditions, and background texture between pre-disaster and post-disaster remote sensing images, a large amount of false change information is easily introduced. Furthermore, existing methods are insufficient in expressing fine-grained disaster features such as building edge damage, structural fractures, and local texture damage, resulting in low accuracy in building disaster change detection, insufficient robustness in complex scenarios, and poor efficiency in identifying disaster-affected areas. Therefore, this invention proposes a remote sensing change detection method for disaster-affected building assessment.
[0007] To solve the above technical problems, the present invention adopts the following technical solution: a remote sensing change detection method for assessing disaster-affected buildings, comprising the following steps:
Step 1: Collect and organize remote sensing image samples containing information on disaster-induced changes to urban buildings;
Step 2: Preprocess and augment the image samples in the dataset by rotation, cropping, flipping, and brightness adjustment to expand the sample distribution and improve data diversity;
Step 3: Annotate the change targets in the augmented images with the CVAT annotation tool, and divide the dataset into training, validation, and test sets according to a preset ratio to support training and performance evaluation of the model at different stages;
Step 4: Construct the frequency domain heterogeneous modulation fusion network FHFM-Net for building disaster assessment;
Step 5: Input the preprocessed and annotated building disaster change dataset into FHFM-Net for training. The network parameters are optimized iteratively through forward and backward propagation so that the model learns the spatial and frequency domain features of building disaster changes, thereby improving the accuracy and generalization ability of change detection.
[0008] In step 1, the specific method is as follows: the image data includes the following two parts: one is the building disaster change detection sample in the existing public database, and the other is the image sample containing building disaster change information obtained through on-site collection.
[0009] The image samples collected in the field can cover scenarios with a lot of background interference, obvious target occlusion, and complex disaster types, thereby effectively expanding the sample types and improving the diversity of the dataset.
[0010] In step 2, the specific approach is as follows: preprocessing operations are performed on the building disaster change detection dataset to correct and enhance spatial offset, grayscale unevenness, and imaging differences, so as to reduce the distortion caused by changes in imaging conditions, thereby improving the efficiency of subsequent models in recognizing building disaster features and enhancing their robustness and environmental adaptability in complex backgrounds, occlusion interference, and multi-hazard factor coupling scenarios.
[0011] In step 3, the specific steps are as follows: when annotating the building disaster change dataset, the CVAT tool is used to accurately annotate the building disaster change areas, and the dataset is divided into training set, validation set and test set according to a preset ratio to ensure that the model can achieve effective training and objective evaluation under different data distribution conditions; the final constructed dataset covers disaster change target samples under different background complexities, providing a foundation for the training and evaluation of the subsequent building disaster assessment remote sensing change detection model.
[0012] In step 4, the frequency domain heterogeneous modulation fusion network FHFM-Net comprises a first encoder-decoder branch, a second encoder-decoder branch, multiple frequency domain heterogeneous modulation fusion modules (FHFM), and a change decoder. Its specific structure is as follows.

The first temporal image T1 is input to the first encoder, which extracts multi-scale features. The encoder's deepest output is connected to the input of the first-level decoder of the first encoder-decoder branch.
- The output of the first-level decoder of the first branch is connected both to the first input of the first-level feature fusion module of the first branch and to the first input of the first-level FHFM.
- The output of the first-level feature fusion module of the first branch is connected to the input of the second-level decoder of the first branch.
- The output of the second-level decoder is connected both to the first input of the second-level feature fusion module of the first branch and to the first input of the second-level FHFM.
- The pattern continues in the same way. In general, the output of the level-(i-1) feature fusion module of the first branch feeds the level-i decoder, whose output is connected both to the first input of the level-i feature fusion module of the first branch and to the first input of the level-i FHFM.

The second input of each feature fusion module of the first branch is connected to the skip-connection output of the first encoder at the corresponding scale, so that it receives shallow spatial features. The second temporal image T2 is input to the second encoder.
The second encoder extracts post-disaster multi-scale features, and its deepest output is connected to the input of the first-level decoder of the second encoder-decoder branch.
- The output of the first-level decoder of the second branch is connected both to the first input of the first-level feature fusion module of the second branch and to the second input of the first-level FHFM.
- The output of the first-level feature fusion module of the second branch is connected to the input of the second-level decoder of the second branch.
- The output of the second-level decoder is connected both to the first input of the second-level feature fusion module of the second branch and to the second input of the second-level FHFM.
- The pattern continues in the same way. In general, the output of the level-(i-1) feature fusion module of the second branch feeds the level-i decoder, whose output is connected both to the first input of the level-i feature fusion module of the second branch and to the second input of the level-i FHFM.

The second input of each feature fusion module of the second branch is connected to the output of the second encoder at the corresponding scale. This forms a skip connection that supplies the shallow spatial features of the post-disaster image.

Through this dual-branch, multi-level serial structure of decoding and feature fusion, deep semantic extraction and pixel-level spatial restoration of the pre-disaster and post-disaster images are performed at different spatial scales. As a result, small instances of house damage are not missed in complex disaster scenes, and large collapsed areas are covered completely.
In addition, the change decoder branch includes multiple multi-scale feature blocks and change fusion modules connected in series.
- The output of the first-level FHFM is connected to the input of the first multi-scale feature block of the change decoder.
- The output of the first multi-scale feature block is connected to the first input of the first change fusion module, and the output of the second-level FHFM is connected to its second input.
- The output of the first change fusion module is connected to the input of the second multi-scale feature block.
- The output of the second multi-scale feature block is connected to the first input of the second change fusion module, and the output of the third-level FHFM is connected to its second input.

Continuing in this way, the shallowest multi-scale feature block of the change decoder outputs the final change detection map.
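The wiring in [0012] can be traced with a shape-level sketch. All block functions below (`decoder_block`, `fusion_block`, `fhfm_block`, `msfb`, `change_fusion`) are hypothetical stand-ins that only upsample, add, or subtract. The learned layers, channel widths, and the 2x scale step between levels are assumptions, not details from the text. The sketch shows only three things: the level-i decoder output of each branch feeds both the next fusion module and the level-i FHFM; the deepest FHFM enters the first multi-scale feature block; and each change fusion module combines the previous block's output with the next FHFM level.

```python
import numpy as np

def up2(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# Hypothetical stand-ins for the learned blocks. They only do the resolution
# bookkeeping so that the wiring can be checked by array shape.
def decoder_block(x):            # level-i decoder: upsample to the next scale
    return up2(x)

def fusion_block(d, skip):       # level-i feature fusion module (skip input)
    return d + skip

def fhfm_block(d1, d2):          # placeholder for one FHFM level
    return np.abs(d1 - d2)

def msfb(x):                     # multi-scale feature block of the change decoder
    return up2(x)

def change_fusion(x, f):         # change fusion module
    return x + f

def decode_branch(deep, skips):
    """One encoder-decoder branch; skips are ordered deepest to shallowest.
    Returns every decoder output, i.e. the inputs of the FHFM levels."""
    taps, x = [], deep
    for skip in skips:
        d = decoder_block(x)         # level-i decoder
        taps.append(d)               # -> input of level-i FHFM
        x = fusion_block(d, skip)    # -> input of level-(i+1) decoder
    return taps

def fhfm_net_forward(deep1, skips1, deep2, skips2):
    taps1 = decode_branch(deep1, skips1)   # pre-disaster branch (T1)
    taps2 = decode_branch(deep2, skips2)   # post-disaster branch (T2)
    fused = [fhfm_block(a, b) for a, b in zip(taps1, taps2)]
    x = msfb(fused[0])                     # level-1 (deepest) FHFM -> first MSFB
    for f in fused[1:]:
        x = msfb(change_fusion(x, f))      # k-th change fusion: MSFB_k and FHFM_(k+1)
    return x                               # shallowest MSFB -> change map
```

In the real network the two branches have the same structure but independent parameters. The shared stand-ins here are only for shape checking.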
[0013] Through the above-mentioned "dual-branch independent decoding - cascaded lateral injection - single-branch change decoding" topology, it is possible to ensure that pre-disaster and post-disaster features at different resolution scales are accurately aligned and fused at the corresponding spatial levels. This enables the network to integrate global large-area collapse semantics and local small building crack features from bottom to top, greatly improving the scale adaptability of disaster building assessment.
[0014] All FHFM levels share the same internal structure. The decoded features from the first and second branches that are input to the FHFM are denoted D1 and D2, respectively. The connections are:
- D1 is input to the first two-dimensional Fourier transform unit, and D2 is input to the second two-dimensional Fourier transform unit.
- The output of the first Fourier transform unit (the frequency domain representation F1) is connected to the first input of the heterogeneous dual-branch modulation mechanism (HDMM). The output of the second Fourier transform unit (the frequency domain representation F2) is connected to the second input of the HDMM.
- The first output of the HDMM is connected to the input of the first two-dimensional inverse Fourier transform unit, and the second output is connected to the input of the second two-dimensional inverse Fourier transform unit.
- The outputs of both inverse Fourier transform units are connected to the inputs of the cosine-gated fusion module CGFM.

The output of the CGFM serves as the output of the corresponding FHFM and is connected to the subsequent change decoder.
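The per-level FHFM data flow in [0014] (2-D FFT, HDMM, 2-D inverse FFT, CGFM) can be sketched with NumPy. This is a minimal sketch under two assumptions: features are (C, H, W) arrays transformed over the spatial axes, and the real part of the inverse FFT is kept. `hdmm` and `cgfm` are passed in as callables. With identity callables the round trip returns D1 and D2 unchanged, which confirms the transforms are inverse to each other.

```python
import numpy as np

def fhfm(d1, d2, hdmm, cgfm):
    """Sketch of one FHFM level on (C, H, W) features d1, d2."""
    f1 = np.fft.fft2(d1, axes=(-2, -1))   # first 2-D Fourier transform unit -> F1
    f2 = np.fft.fft2(d2, axes=(-2, -1))   # second 2-D Fourier transform unit -> F2
    g1, g2 = hdmm(f1, f2)                 # heterogeneous dual-branch modulation
    # Assumption: the spatial features are taken as the real part of the inverse FFT.
    d1s = np.fft.ifft2(g1, axes=(-2, -1)).real
    d2s = np.fft.ifft2(g2, axes=(-2, -1)).real
    return cgfm(d1s, d2s)                 # cosine-gated fusion -> change decoder
```

For example, `fhfm(a, b, lambda x, y: (x, y), lambda x, y: (x, y))` returns `(a, b)` up to floating-point error.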
[0015] This closed-loop frequency domain conversion topology forces the network to undergo frequency domain filtering and reconstruction before spatial domain fusion in terms of physical connection, thereby completely isolating interference caused by low-frequency environmental changes such as light intensity and seasons before and after disasters, and ensuring the purity of the signal input to the fusion module.
[0016] The cosine-gated fusion module CGFM is structured as follows. The spatial features D1' and D2' (the outputs of the two inverse Fourier transform units) are fed in parallel to three units: the first channel concatenation unit (Concat), the cosine difference calculation unit, and the absolute difference unit (Sub). The connections are:
- The output of the first Concat unit is connected to the input of the channel attention unit (CA). The CA output is connected to the first input of the first multiplier and to the first input of the second multiplier.
- The output of the Sub unit is connected to the input of the convolutional refinement unit (Conv 3x3), whose output is connected to an input of the second channel concatenation unit (Concat).
- The output of the cosine difference calculation unit (the cosine difference map) is connected to the second input (control terminal) of the first multiplier. The output of the first multiplier is also connected to the input of the second Concat unit.
- The output of the second Concat unit is connected to the input of the CBAM module, and the CBAM output is connected to the second input of the second multiplier.
- The output of the second multiplier is connected to the change decoder.

In this cross-wired structure, the cosine difference map gates the semantic features against the difference features, and CBAM then applies spatial and channel attention as a second check. This design forces the network to adaptively focus on and refine pixels of genuinely collapsed buildings against cluttered post-disaster rubble backgrounds, which yields highly noise-resistant feature fusion.
[0017] The heterogeneous dual-branch modulation mechanism (HDMM) is implemented as follows. For the bi-temporal frequency domain representations $F_1$ and $F_2$, two fixed complementary masks are first constructed: a low-frequency mask $M_L$ and a high-frequency mask $M_H$. The low-frequency mask $M_L$ is centered on the origin of the spectrum and uses a preset radius threshold to delimit the low-frequency region. The high-frequency mask $M_H = 1 - M_L$ characterizes the high-frequency region outside it.

The masks decompose $F_1$ and $F_2$ into:
- the bi-temporal low-frequency components $F_1^L = M_L \odot F_1$ and $F_2^L = M_L \odot F_2$;
- the bi-temporal high-frequency components $F_1^H = M_H \odot F_1$ and $F_2^H = M_H \odot F_2$.

The high-frequency components are fed into the high-frequency modulation branch to highlight edge, texture, and local structural change information. The low-frequency components are fed into the low-frequency modulation branch to reduce pseudo-change responses caused by differences in lighting, season, and imaging style. Content information and style information are thereby modeled separately.
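The complementary masks $M_L$ and $M_H$ of [0017] can be built as follows. This is a minimal sketch, assuming an unshifted `np.fft.fft2` spectrum (zero frequency at index `[0, 0]`) and a radius measured in frequency bins. The radius threshold is not given in the text, so `radius` is a free parameter here. Distances are computed from signed frequency indices, so the disc wraps around the array corners. Applying `np.fft.fftshift` and measuring from the array center would be an equivalent alternative.

```python
import numpy as np

def frequency_masks(h, w, radius):
    """Complementary low/high-frequency masks for an unshifted fft2 spectrum.

    The low-frequency region is a disc of the given radius (in frequency
    bins) around the zero-frequency origin; the high mask is its complement.
    """
    fy = np.fft.fftfreq(h) * h     # signed frequency index along H
    fx = np.fft.fftfreq(w) * w     # signed frequency index along W
    dist = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    m_low = (dist <= radius).astype(np.float64)
    m_high = 1.0 - m_low           # M_H = 1 - M_L
    return m_low, m_high

def decompose(f, m_low, m_high):
    """Split a (C, H, W) spectrum into low- and high-frequency components."""
    return f * m_low, f * m_high   # masks broadcast over the channel axis
```

Because the masks are complementary, the low and high components always sum back to the original spectrum. A constant image has all its energy in the low-frequency component.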
[0018] The high-frequency modulation branch is implemented as follows:
1. The bi-temporal high-frequency components $F_1^H$ and $F_2^H$ are subtracted element by element and the absolute value is taken. This gives the difference feature $D_H = |F_1^H - F_2^H|$, which characterizes local differences between the two temporal phases.
2. $D_H$ is passed through a feature enhancement unit consisting of a convolutional layer, a batch normalization layer, and a PReLU activation, yielding the enhanced difference feature $\hat{D}_H$.
3. Channel-wise max pooling and average pooling are applied to $\hat{D}_H$. The pooled results are fed into a convolutional layer and a sigmoid activation to generate the frequency domain attention map $A_H$.
4. $A_H$ weights both high-frequency components, giving the modulated high-frequency features $\tilde{F}_1^H = A_H \odot F_1^H$ and $\tilde{F}_2^H = A_H \odot F_2^H$.

This strengthens high-frequency responses associated with real changes and suppresses noisy textures and background interference.

The low-frequency modulation branch is implemented as follows:
1. The bi-temporal low-frequency components $F_1^L$ and $F_2^L$ are normalized along the channel dimension to obtain $\bar{F}_1^L$ and $\bar{F}_2^L$.
2. The cosine similarity between the two temporal low-frequency components is computed from these normalized features and converted into a normalized difference weight $W_L$.
3. $W_L$ weights both low-frequency components, giving the modulated low-frequency features $\tilde{F}_1^L = W_L \odot F_1^L$ and $\tilde{F}_2^L = W_L \odot F_2^L$.

In this way the network attends to semantic structural changes in the low-frequency components rather than to style changes such as brightness, contrast, and seasonal differences, which reduces pseudo-change responses. Finally, for each temporal phase, the modulated high-frequency features ($\tilde{F}_1^H$, $\tilde{F}_2^H$) and the modulated low-frequency features ($\tilde{F}_1^L$, $\tilde{F}_2^L$) are recombined to form the reconstructed bi-temporal frequency domain representations $\hat{F}_1$ and $\hat{F}_2$.
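The two HDMM branches of [0018] are sketched below in NumPy. Several parts are hypothetical stand-ins rather than the patented layers:
- The learned Conv-BN-PReLU unit is an optional `enhance` callable.
- The convolution plus sigmoid that produces the attention map is a fixed weighted sum (`w_max`, `w_avg`, `bias`) of the channel max and mean maps.
- For complex features, cosine similarity is assumed to be the real part of the Hermitian inner product of the channel-normalized vectors.
- Similarity is assumed to map to the difference weight as $W_L = (1 - \cos)/2$.
- Recombination is assumed to be summation, which is natural because the masks are complementary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def high_freq_branch(h1, h2, w_max=1.0, w_avg=1.0, bias=0.0, enhance=None):
    """h1, h2: complex (C, H, W) high-frequency components."""
    d = np.abs(h1 - h2)                      # D_H = |F1^H - F2^H|
    if enhance is not None:                  # stand-in for the Conv-BN-PReLU unit
        d = enhance(d)
    mx = d.max(axis=0)                       # channel-wise max pooling -> (H, W)
    av = d.mean(axis=0)                      # channel-wise average pooling -> (H, W)
    attn = sigmoid(w_max * mx + w_avg * av + bias)  # stand-in for conv + sigmoid
    return attn * h1, attn * h2              # one attention map modulates both phases

def low_freq_branch(l1, l2, eps=1e-8):
    """l1, l2: complex (C, H, W) low-frequency components."""
    n1 = l1 / (np.sqrt((np.abs(l1) ** 2).sum(axis=0, keepdims=True)) + eps)
    n2 = l2 / (np.sqrt((np.abs(l2) ** 2).sum(axis=0, keepdims=True)) + eps)
    cos = np.real((np.conj(n1) * n2).sum(axis=0))  # per-location cosine similarity
    w = (1.0 - cos) / 2.0                    # assumed mapping to a [0, 1] weight W_L
    return w * l1, w * l2

def hdmm(f1, f2, m_low, m_high):
    h1, h2 = high_freq_branch(f1 * m_high, f2 * m_high)
    l1, l2 = low_freq_branch(f1 * m_low, f2 * m_low)
    return h1 + l1, h2 + l2                  # assumed recombination by summation
```

Because the low-frequency features are normalized per location, a pure brightness scaling (for example `l2 = 3 * l1`) yields a cosine similarity near 1 and therefore a weight near 0. This is the style-insensitivity the paragraph describes.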
[0019] The cosine-gated fusion module CGFM is implemented as follows:
1. The bi-temporal spatial features $D_1'$ and $D_2'$ obtained after the inverse Fourier transform are concatenated along the channel dimension to obtain the joint semantic feature $F_{cat}$.
2. Adaptive max pooling and adaptive average pooling are applied to $F_{cat}$, and channel attention weights are generated with a Sigmoid activation. $F_{cat}$ is then reweighted by these channel attention weights to obtain the enhanced feature $F_{ca}$.
3. In parallel, the element-wise absolute difference of $D_1'$ and $D_2'$ gives a local difference feature. A convolutional layer refines it into the difference feature $F_{diff}$.
4. The cosine similarity between $D_1'$ and $D_2'$ is computed and converted into a cosine difference map $C$. $C$ gates $F_{ca}$ to produce the difference-aware semantic feature $F_{gate}$.
5. $F_{gate}$ and $F_{diff}$ are concatenated along the channel dimension and fed into the convolutional block attention module (CBAM) and convolutional layers for joint modeling.

The result is the final fused representation $F_{out}$, which characterizes the change information between the two temporal images and serves as the input feature of the change detection decoder.
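The CGFM sequence in [0019] can be sketched in NumPy, following the order given in that paragraph. Several parts are hypothetical stand-ins:
- `channel_attention` sums the global max and mean of each channel and applies a sigmoid, replacing the learned channel attention layers.
- The 3x3 refinement convolution is an optional `refine` callable.
- CBAM plus the final convolutions are an optional `joint` callable.
- The similarity-to-difference conversion $C = (1 - \cos)/2$ is assumed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    """Stand-in CA: per-channel global max + mean pooling, then sigmoid."""
    w = sigmoid(x.max(axis=(1, 2)) + x.mean(axis=(1, 2)))   # (C,)
    return x * w[:, None, None]

def cosine_difference_map(a, b, eps=1e-8):
    """Cosine similarity along channels, mapped to (1 - cos) / 2 in [0, 1]."""
    num = (a * b).sum(axis=0)
    den = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0) + eps
    return (1.0 - num / den) / 2.0

def cgfm(d1, d2, refine=lambda x: x, joint=lambda x: x):
    """d1, d2: real (C, H, W) spatial features after the inverse FFT.

    refine: stand-in for the convolution applied to the absolute difference.
    joint:  stand-in for CBAM + convolution joint modeling.
    """
    f_cat = np.concatenate([d1, d2], axis=0)   # joint semantic feature (2C, H, W)
    f_ca = channel_attention(f_cat)            # channel-reweighted feature
    f_diff = refine(np.abs(d1 - d2))           # refined local difference feature
    c = cosine_difference_map(d1, d2)          # (H, W), near 0 where directions agree
    f_gate = c[None] * f_ca                    # difference-aware semantic feature
    return joint(np.concatenate([f_gate, f_diff], axis=0))
```

Identical inputs produce a near-zero output, because both the gate and the difference term vanish. Opposite inputs pass the channel-attended semantic features through almost unchanged. [0016] describes a slightly different wiring, in which the CA output is also multiplied with the CBAM output, and this sketch does not model it.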
[0020] Compared with the prior art, the present invention has the following technical effects:

1. The invention first decouples frequency information using the Fourier transform. For high-frequency components, differential operations extract change-sensitive signals, and modulation and filtering bottlenecks are introduced to enhance the bi-temporal response in changing regions and suppress noise. For low-frequency components, negative cosine similarity characterizes the semantic consistency between feature vectors, which reduces sensitivity to style changes. The frequency domain features are then mapped back to the spatial domain. There, cosine similarity is further used to modulate the feature difference and the feature concatenation, enabling complementary fusion of the two strategies.

2. The invention addresses two problems that arise when frequency-domain remote sensing change detection methods are applied to disaster-affected building assessment:
1) Existing methods typically reduce frequency domain processing to uniform global filtering or style alignment, so the difference and complementary information carried by different frequency components is insufficiently mined. Content and style information are consequently hard to decouple, which impairs accurate extraction of real disaster-induced change areas.
2) Existing methods generally rely on feature concatenation or element-wise subtraction for bi-temporal feature fusion. The former struggles to highlight change-sensitive information explicitly. The latter tends to amplify appearance differences in unchanged areas, which limits the model's ability to perceive the edges of disaster-affected buildings and locally damaged areas.
To this end, the invention proposes the frequency domain heterogeneous modulation fusion network FHFM-Net for disaster-affected building assessment.
This network maps bi-temporal features to the frequency domain and models the high-frequency and low-frequency components separately. It can focus more effectively on real disaster-induced change areas while suppressing style-shift interference. This significantly improves the detection performance and generalization ability of remote sensing change detection models in disaster-affected building assessment.

3. In scenarios with significant stylistic differences and complex structural changes in affected areas, disaster-affected building assessment is prone to increased false changes, incomplete extraction of true change areas, blurred building edges, and missed local areas. Confusion between content and style information further weakens the model's ability to focus on the actual disaster-affected areas and reduces the completeness and accuracy of the detection results. Existing frequency-domain remote sensing change detection methods remain deficient in decoupling content and style representations, exploiting the complementarity of different frequency information, and generalizing to complex disaster scenarios. This makes it difficult to identify change targets in various types of disaster-affected buildings with high precision.

To address these issues, the invention proposes a frequency domain heterogeneous modulation strategy. The strategy decouples the bi-temporal features with the Fourier transform and routes high-frequency and low-frequency information to different branches:
- The high-frequency branch extracts change-sensitive signals through differential operations and introduces modulation and filtering bottlenecks to enhance the response in changing regions and suppress background noise.
- The low-frequency branch uses negative cosine similarity to characterize the semantic consistency between features. This reduces the model's sensitivity to changes in lighting, shadows, smoke occlusion, and imaging style.

Together, the branches enhance the model's ability to perceive local and overall structural changes in complex disaster scenarios.

4. In remote sensing disaster building assessment, fusing bi-temporal features by concatenation or differencing alone easily leads to weak responses in real change areas or false detections in unaffected areas. The results are unclear building damage outlines, discontinuous local damage areas, blurred change boundaries, and missed disaster-affected targets. Some existing methods have improved change detection performance to some extent by introducing feature enhancement mechanisms, relationship modeling modules, or cross-layer interaction structures. However, current frequency-domain methods still lack an adequate fusion mechanism once the frequency domain features are mapped back to the spatial domain, and they cannot fully account for the complementary relationship between global semantic information and local difference information.

To this end, the invention further designs a cosine-gated fusion strategy. After the frequency domain features are mapped back to the spatial domain, cosine similarity jointly modulates the feature difference and the feature concatenation results, so that the two types of complementary information are fused synergistically. This preserves the joint semantic context of the bi-temporal features while strengthening change-sensitive local differences. It thereby improves the model's ability to detect the edges, outline damage, and local damage details of disaster-affected buildings, and improves the accuracy, completeness, and robustness of disaster area identification.
Attached Figure Description
[0021] The present invention is further described below with reference to the accompanying drawings and embodiments:
Figure 1 is a flowchart of the method of the present invention;
Figure 2 is a schematic diagram of the overall FHFM-Net model structure in step 4 of the present invention;
Figure 3 is a schematic diagram of the specific structure of HDMM in step 4 of the present invention.
Detailed Implementation
[0022] As shown in Figure 1, a remote sensing change detection method for assessing disaster-affected buildings includes the following steps:
Step 1: Collect and organize bi-temporal remote sensing image samples containing information on disaster-induced building damage;
Step 2: Preprocess and augment the image samples in the dataset, expanding the sample distribution by rotation, cropping, flipping, brightness adjustment, and normalization to improve data diversity;
Step 3: Annotate the change targets in the augmented images with the CVAT annotation tool, and divide the dataset into training, validation, and test sets according to a preset ratio to support training and performance evaluation of the model at different stages;
Step 4: Construct the frequency domain heterogeneous modulation fusion network FHFM-Net for building disaster assessment;
Step 5: Input the preprocessed and annotated building disaster change dataset into FHFM-Net for training. The network parameters are optimized iteratively through forward and backward propagation so that the model learns the spatial and frequency domain features of building disaster changes, thereby improving change detection accuracy and generalization ability.
[0023] In Step 1, the image data consist of two parts: first, building disaster change detection samples from existing public databases; second, bi-temporal image samples containing building disaster change information acquired through field collection, drone aerial photography, or satellite remote sensing. The field-collected samples can cover scenes with dense buildings, complex backgrounds, smoke and dust occlusion, shifting shadows, and complex disaster types, which broadens the range of sample types and improves the diversity of the dataset.
[0024] In Step 2, the building disaster change detection dataset is preprocessed: the bi-temporal images undergo geometric registration, size unification, grayscale normalization, noise suppression, and data augmentation. These operations reduce the distortion caused by differences in imaging conditions, which helps the subsequent model recognize building disaster features more efficiently and improves its robustness and adaptability under complex backgrounds, occlusion, and coupled multi-hazard conditions.
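Step 2 and the preprocessing above imply that every geometric augmentation must be applied identically to the pre-disaster image, the post-disaster image, and the change label; otherwise the pair is no longer co-registered. The NumPy sketch below illustrates this under its own assumptions: the helper `augment_pair`, the 224-pixel crop, the brightness range, and the fixed normalization constants are hypothetical choices, and registration, resizing, and denoising are omitted.

```python
import numpy as np

def augment_pair(pre, post, label, rng, crop=224):
    """Jointly augment a bi-temporal pair (H, W, 3), values in [0, 255], and its (H, W) label.

    Geometric ops (rotation, flip, crop) reuse the same random draw for all three
    arrays so pixels stay co-registered; brightness is jittered per image; images are
    then normalized with fixed constants (hypothetical: mean 0.5, std 0.5).
    """
    k = int(rng.integers(0, 4))                                    # rotate by k * 90 degrees
    pre, post, label = (np.rot90(a, k, axes=(0, 1)) for a in (pre, post, label))
    if rng.random() < 0.5:                                         # shared horizontal flip
        pre, post, label = (a[:, ::-1] for a in (pre, post, label))
    h, w = label.shape
    y = int(rng.integers(0, h - crop + 1))                         # shared random crop
    x = int(rng.integers(0, w - crop + 1))
    pre, post, label = (a[y:y + crop, x:x + crop] for a in (pre, post, label))

    def jitter_and_normalize(img):
        img = np.clip(img * rng.uniform(0.8, 1.2), 0.0, 255.0)    # per-image brightness jitter
        return (img / 255.0 - 0.5) / 0.5                           # map to [-1, 1]

    return jitter_and_normalize(pre), jitter_and_normalize(post), label.copy()
```

Fixed normalization constants are used on purpose: per-image standardization would cancel a multiplicative brightness jitter almost entirely.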
[0025] In Step 3, the CVAT tool is used to annotate the building disaster change areas precisely, and the dataset is divided into training, validation, and test sets according to a preset ratio so that the model can be trained effectively and evaluated objectively under different data distributions. The final dataset covers disaster change samples with varying background complexity, providing a foundation for training and evaluating the subsequent remote sensing change detection model for building disaster assessment.
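The source does not state the preset split ratio. The sketch below shows a reproducible shuffle-and-split; the helper `split_dataset` and its 8:1:1 default ratio are hypothetical, chosen only for illustration.

```python
import random

def split_dataset(sample_ids, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle sample IDs reproducibly and split them into train/val/test lists.

    The default ratios are hypothetical; the source only says a "preset ratio" is used.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)          # local RNG, so global random state is untouched
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]
```

Splitting at the level of image pairs (rather than individual images) keeps the pre- and post-disaster views of a scene in the same subset.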
[0026] In Step 4, the frequency domain heterogeneous modulation fusion network FHFM-Net for building disaster assessment is implemented as follows. As shown in Figure 2, FHFM-Net is an end-to-end bi-temporal remote sensing change detection network comprising two symmetrically arranged encoder-decoder branches, multiple frequency-domain heterogeneous modulation fusion modules (FHFM), and a change detection decoder. The input bi-temporal remote sensing images are denoted as the pre-disaster image $T_1$ and the post-disaster image $T_2$. $T_1$ and $T_2$ are fed into two encoder-decoder branches with identical structures and independent parameters to extract multi-scale feature information of the buildings before and after the disaster.
[0027] The two encoders perform hierarchical convolutional feature extraction on the pre-disaster image $T_1$ and the post-disaster image $T_2$, respectively, to obtain semantic features at different scales. Correspondingly, the two decoders progressively recover and reconstruct the encoded high-level semantic features and output a pair of bi-temporal decoded features at each decoding layer. For a decoding layer at any scale, the decoded feature output by the pre-disaster branch is denoted $F_1$ and that output by the post-disaster branch is denoted $F_2$. The bi-temporal decoded feature pair $(F_1, F_2)$ is fed into the frequency domain heterogeneous modulation fusion module (FHFM) to jointly model the structural, texture, and style differences between the pre- and post-disaster building areas.
[0028] The frequency domain heterogeneous modulation fusion module (FHFM) comprises a heterogeneous dual-branch modulation mechanism (HDMM) and a cosine-gated fusion module (CGFM). The HDMM first maps the bi-temporal decoded features to the frequency domain, then explicitly decomposes them into high-frequency and low-frequency components and modulates these heterogeneously. The CGFM maps the frequency-domain modulated features back to the spatial domain and performs difference-aware fusion to obtain the fused representation at the current scale. The fused representations at multiple scales are then fed into the change detection decoder, which, after progressive aggregation, upsampling, and convolutional recovery, outputs the final building disaster change detection map.
[0029] The input bi-temporal remote sensing images are first processed by the corresponding encoder-decoder branches to extract multi-scale decoded features. The pre- and post-disaster decoded features at each scale are then fed into the FHFM module, which extracts change-sensitive high-frequency information and style-related low-frequency information in the frequency domain and uses heterogeneous modulation to strengthen the response to real disaster-induced changes while suppressing pseudo-changes caused by non-disaster factors. The processed features are mapped back to the spatial domain, where the cosine-gated fusion module jointly models the concatenated and difference information to obtain fused features with both global semantic consistency and local change sensitivity. Finally, the change detection decoder uses the fused features at multiple scales to detect and locate the disaster-affected building areas.
[0030] Preferably, the heterogeneous dual-branch modulation mechanism (HDMM) is implemented as follows. First, the input bi-temporal decoded features $F_1$ and $F_2$ are mapped to the frequency domain by a two-dimensional Fourier transform, yielding the bi-temporal frequency-domain representations $X_1$ and $X_2$:
$$X_1 = \mathcal{F}(F_1), \qquad X_2 = \mathcal{F}(F_2),$$
where $\mathcal{F}(\cdot)$ denotes the two-dimensional Fourier transform.
[0031] To decouple the frequency components, a low-frequency mask $M_L$ and a high-frequency mask $M_H$ are constructed in the frequency domain. Let the coordinates of the spectrum center be $(u_0, v_0)$ and the preset radius threshold be $r$. The masks are
$$M_L(u, v) = \begin{cases} 1, & \sqrt{(u - u_0)^2 + (v - v_0)^2} \le r, \\ 0, & \text{otherwise}, \end{cases} \qquad M_H(u, v) = 1 - M_L(u, v).$$
Under these masks, the bi-temporal frequency-domain representations $X_1$ and $X_2$ are decomposed into low-frequency components $X_t^L$ and high-frequency components $X_t^H$:
$$X_t^L = X_t \odot M_L, \qquad X_t^H = X_t \odot M_H, \qquad t \in \{1, 2\},$$
where $\odot$ denotes element-wise multiplication. After decomposition, the low-frequency components mainly characterize brightness, contrast, shadow shift, smoke and dust occlusion, and differences in imaging style, while the high-frequency components mainly characterize detail information such as building edges, contour breaks, local texture damage, and structural destruction.
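The Fourier mapping and mask decomposition of [0030] and [0031] can be sketched in NumPy as below. This is an illustration under stated assumptions, not the authors' PyTorch code: the FFT is taken over the two spatial axes of a (C, H, W) feature map, the spectrum is shifted with `np.fft.fftshift` so that the "spectrum center" is the zero frequency, and the helper names and the radius value are hypothetical, since the preset threshold is not given in the source.

```python
import numpy as np

def low_high_masks(h, w, r):
    """Circular low-pass mask of radius r around the spectrum centre, and its complement."""
    u = np.arange(h)[:, None]
    v = np.arange(w)[None, :]
    u0, v0 = h // 2, w // 2                          # zero frequency after fftshift
    m_low = ((u - u0) ** 2 + (v - v0) ** 2 <= r ** 2).astype(np.float64)
    return m_low, 1.0 - m_low

def decompose(feat, r):
    """Split a real (C, H, W) feature map into low- and high-frequency spectra."""
    # 2-D FFT over the spatial axes only, then move the zero frequency to the centre
    spec = np.fft.fftshift(np.fft.fft2(feat, axes=(-2, -1)), axes=(-2, -1))
    m_low, m_high = low_high_masks(feat.shape[-2], feat.shape[-1], r)
    return spec * m_low, spec * m_high               # element-wise masking, broadcast over C
```

Because the high-frequency mask is the complement of the low-frequency mask, the two bands add back to the full spectrum exactly; a constant feature map, which has only a zero-frequency term, falls entirely in the low band.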
[0032] Preferably, the high-frequency branch is implemented as follows. The bi-temporal high-frequency components $X_1^H$ and $X_2^H$ are first differenced element-wise and the absolute value is taken, yielding the change-sensitive difference feature $D_H$:
$$D_H = \left| X_1^H - X_2^H \right|,$$
where $|\cdot|$ denotes the absolute value operation. $D_H$ roughly characterizes the degree of change in building edges, textures, and local structures before and after the disaster.
[0033] To reduce background noise and irrelevant high-frequency interference, the difference feature $D_H$ is fed into a feature enhancement unit composed of a convolutional layer, a batch normalization layer, and a PReLU activation function to obtain the enhanced high-frequency difference feature $\hat{D}_H$:
$$\hat{D}_H = \mathrm{PReLU}\big(\mathrm{BN}(\mathrm{Conv}(D_H))\big).$$
Global average pooling and global max pooling are then applied to $\hat{D}_H$ separately; the pooled results are concatenated along the channel dimension and passed through a convolutional layer and a Sigmoid activation function to generate the frequency-domain attention map $A_H$:
$$A_H = \sigma\Big(\mathrm{Conv}\big(\mathrm{Concat}[\mathrm{GAP}(\hat{D}_H), \mathrm{GMP}(\hat{D}_H)]\big)\Big),$$
where $\mathrm{GAP}(\cdot)$ denotes global average pooling, $\mathrm{GMP}(\cdot)$ denotes global max pooling, $\mathrm{Concat}[\cdot]$ denotes the concatenation operation, and $\sigma(\cdot)$ denotes the Sigmoid activation function.
[0034] Finally, the frequency-domain attention map $A_H$ is used to weight the bi-temporal high-frequency components $X_1^H$ and $X_2^H$, yielding the enhanced high-frequency features $\tilde{X}_1^H$ and $\tilde{X}_2^H$:
$$\tilde{X}_t^H = A_H \odot X_t^H, \qquad t \in \{1, 2\}.$$
Through this processing, the high-frequency branch focuses on strengthening the real changes caused by disasters, such as building contour damage, edge breakage, and abrupt local texture changes, and suppresses false responses caused by background texture, noise, and non-disaster factors.
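The high-frequency branch of [0032] to [0034] can be sketched in NumPy as follows. The learned layers are replaced by fixed, hypothetical stand-ins so that the data flow runs without training: per-channel standardization plus PReLU stands in for Conv + BN + PReLU, and a fixed 1x1 mix (weights `w`) stands in for the attention convolution. Pooling is done over the channel axis, producing a map over frequency positions; this CBAM-style reading of "global" pooling is an assumption, and pooling over spatial positions would be an equally plausible reading of the source.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def high_freq_branch(x1_h, x2_h, w=(1.0, 1.0), bias=0.0, prelu_slope=0.25):
    """Toy high-frequency branch on complex spectra of shape (C, H, W).

    Hypothetical stand-ins for learned layers:
      Conv + BN + PReLU -> per-channel standardization followed by PReLU;
      attention Conv     -> fixed 1x1 mix of the two pooled maps with weights `w`.
    """
    d = np.abs(x1_h - x2_h)                                          # |X1^H - X2^H|
    z = (d - d.mean(axis=(1, 2), keepdims=True)) / (d.std(axis=(1, 2), keepdims=True) + 1e-6)
    d_hat = np.where(z > 0, z, prelu_slope * z)                      # PReLU
    pooled = np.concatenate([d_hat.mean(axis=0, keepdims=True),      # average over channels
                             d_hat.max(axis=0, keepdims=True)],      # max over channels
                            axis=0)                                  # (2, H, W)
    logits = np.tensordot(np.asarray(w), pooled, axes=1) + bias      # 1x1 "conv" -> (H, W)
    a_h = sigmoid(logits)[None]                                      # attention map, (1, H, W)
    return a_h * x1_h, a_h * x2_h, a_h                               # weight both dates
```

With these stand-ins, frequency positions where the two dates differ strongly receive attention values closer to 1 than unchanged positions, which is the qualitative behavior [0034] describes.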
[0035] Preferably, the low-frequency branch is implemented as follows. Because the low-frequency components mainly reflect imaging-style differences rather than true semantic changes, the bi-temporal low-frequency components $X_1^L$ and $X_2^L$ are first L2-normalized separately to obtain the normalized low-frequency features $\bar{X}_1^L$ and $\bar{X}_2^L$:
$$\bar{X}_t^L = \frac{X_t^L}{\|X_t^L\|_2 + \epsilon}, \qquad t \in \{1, 2\},$$
where $\|\cdot\|_2$ denotes the L2 norm along the channel dimension and $\epsilon$ is a small smoothing constant that prevents the denominator from being zero.
[0036] The cosine similarity between the normalized low-frequency features of the two dates, $\bar{X}_1^L$ and $\bar{X}_2^L$, is then computed and further converted into a negative-cosine difference weight $W_L$. This weight reflects the degree of directional difference between the low-frequency components of the two dates: when the pre- and post-disaster components remain highly consistent in the low-frequency space, the change is more likely caused by style disturbance; when the directional difference is large, it more likely corresponds to a real semantic change.
[0037] The difference weight $W_L$ is then used to weight the bi-temporal low-frequency components $X_1^L$ and $X_2^L$, yielding the enhanced low-frequency features $\tilde{X}_1^L$ and $\tilde{X}_2^L$:
$$\tilde{X}_t^L = W_L \odot X_t^L, \qquad t \in \{1, 2\}.$$
Through this processing, the low-frequency branch effectively reduces the style disturbances caused by illumination changes, shadow shifts, smoke and dust occlusion, and sensor differences while preserving the overall semantic structure of the building, thereby reducing pseudo-change responses.
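A NumPy sketch of the low-frequency branch follows. Two points are assumptions, not taken from the source: complex channel vectors are compared as real vectors of their real and imaginary parts, and the negative-cosine difference weight (called `w_l` in the code) is taken as (1 - cos) / 2, one simple mapping that is near 0 when the two dates agree in direction and near 1 when they are opposed. The helper name is hypothetical.

```python
import numpy as np

def low_freq_branch(x1_l, x2_l, eps=1e-6):
    """Toy low-frequency branch on complex spectra of shape (C, H, W).

    Each complex channel vector is treated as a real vector [Re, Im] of length 2C.
    The weight w_l = (1 - cos) / 2 is a hypothetical choice of "negative cosine"
    weight: ~0 where both dates point the same way, ~1 where they are opposed.
    """
    def as_real(x):
        return np.concatenate([x.real, x.imag], axis=0)                  # (2C, H, W)

    def l2_normalize(x):
        return x / (np.linalg.norm(x, axis=0, keepdims=True) + eps)     # L2 along channels

    n1, n2 = l2_normalize(as_real(x1_l)), l2_normalize(as_real(x2_l))
    cos = np.sum(n1 * n2, axis=0, keepdims=True)                         # (1, H, W), in [-1, 1]
    w_l = (1.0 - cos) / 2.0
    return w_l * x1_l, w_l * x2_l, w_l
```

For example, passing `x1` and `3.0 * x1` (a uniform gain change, which leaves the direction of every channel vector unchanged) gives a weight close to 0 everywhere, whereas `x1` and `-x1` give a weight close to 1. This matches the intent of [0036]: style-like changes are suppressed, directional changes pass through.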
[0038] Preferably, frequency-domain feature reconstruction is implemented as follows. After the high-frequency and low-frequency branches, the enhanced high-frequency and low-frequency features of each date are recombined to obtain the reconstructed bi-temporal frequency-domain representations $\hat{X}_1$ and $\hat{X}_2$:
$$\hat{X}_t = \tilde{X}_t^H + \tilde{X}_t^L, \qquad t \in \{1, 2\}.$$
The reconstructed representations are then mapped back to the spatial domain by a two-dimensional inverse Fourier transform to obtain the modulated bi-temporal spatial features $\hat{F}_1$ and $\hat{F}_2$:
$$\hat{F}_t = \mathcal{F}^{-1}(\hat{X}_t), \qquad t \in \{1, 2\},$$
where $\mathcal{F}^{-1}(\cdot)$ denotes the two-dimensional inverse Fourier transform. In this way, the information separated, modeled, and enhanced in the frequency domain returns to the spatial domain, providing a more discriminative bi-temporal representation for the subsequent spatial-domain fusion.
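Because the high-frequency mask is the complement of the low-frequency mask, recombining the two bands and inverting the transform returns the original feature map when no modulation is applied. The NumPy sketch below illustrates the reconstruction step with a hypothetical helper `reconstruct`; it assumes a centred spectrum (hence `ifftshift` before `ifft2`), and keeping only the real part is an assumption, since the source does not say how any imaginary residue left by the modulation is handled.

```python
import numpy as np

def reconstruct(x_h, x_l):
    """Recombine centred high/low spectra of shape (C, H, W) and return to the spatial domain.

    Taking the real part is an assumption: after modulation the inverse FFT may
    carry a small imaginary residue.
    """
    spec = x_h + x_l                                         # recombine the two bands
    spec = np.fft.ifftshift(spec, axes=(-2, -1))             # undo the centring shift
    return np.fft.ifft2(spec, axes=(-2, -1)).real            # inverse 2-D FFT over H, W
```

A round trip without modulation is a useful sanity check when implementing the module: any mismatch points to an inconsistent shift or axis convention rather than to the learned branches.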
[0039] Preferably, the cosine-gated fusion module (CGFM) is implemented as follows. In the spatial domain, to account for both the joint semantic context and local difference responses, the modulated bi-temporal spatial features $\hat{F}_1$ and $\hat{F}_2$ are first concatenated along the channel dimension to obtain the joint semantic feature $F_{cat}$:
$$F_{cat} = \mathrm{Concat}[\hat{F}_1, \hat{F}_2],$$
where $\mathrm{Concat}[\cdot]$ denotes channel-wise concatenation.
[0040] Adaptive max pooling and adaptive average pooling are then applied to the joint semantic feature $F_{cat}$ separately, and channel attention weights $A_{\max}$ and $A_{\mathrm{avg}}$ are obtained through the Sigmoid activation function. $F_{cat}$ is then recalibrated with these channel attention weights by element-wise multiplication ($\odot$) to obtain the channel-enhanced feature $F_{ca}$.
[0041] In parallel, an absolute difference is taken between the bi-temporal spatial features $\hat{F}_1$ and $\hat{F}_2$ to extract local change information, which is refined by a convolutional layer to obtain the local difference feature $F_d$:
$$F_d = \mathrm{Conv}\big(|\hat{F}_1 - \hat{F}_2|\big).$$
The cosine similarity $S$ between the two spatial features is also computed,
$$S = \frac{\langle \hat{F}_1, \hat{F}_2 \rangle}{\|\hat{F}_1\|_2 \, \|\hat{F}_2\|_2},$$
and converted into a difference-gating map $G$. The gating map $G$ then modulates the channel-enhanced feature $F_{ca}$ to obtain the difference-aware semantic feature $F_{da}$:
$$F_{da} = G \odot F_{ca}.$$
Finally, $F_{da}$ and $F_d$ are concatenated along the channel dimension and passed sequentially through a convolutional block attention module (CBAM) and a convolutional layer for joint modeling, producing the fused feature $F_{out}$ at the current scale:
$$F_{out} = \mathrm{Conv}\Big(\mathrm{CBAM}\big(\mathrm{Concat}[F_{da}, F_d]\big)\Big).$$
$F_{out}$ retains the joint semantic context of the bi-temporal features and explicitly emphasizes the local differences that are sensitive to building damage, which improves the model's ability to identify building edges, outline damage, and local damage details.
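A shape-level NumPy sketch of CGFM is given below. The source does not specify how the two channel attention weights are combined or how the cosine similarity is mapped to the gating map, so the sketch assumes a sigmoid of the summed average- and max-pooled channel descriptors and a gate of (1 - cos) / 2, computed per pixel along channels. The refining convolution, CBAM, and the final convolution are omitted, so the hypothetical helper `cgfm_sketch` returns the concatenation that would be fed to them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cgfm_sketch(f1, f2, eps=1e-6):
    """Toy cosine-gated fusion on real spatial features f1, f2 of shape (C, H, W).

    Hypothetical stand-ins: channel attention = sigmoid(avg-pool + max-pool) per channel;
    gate = (1 - cos) / 2; the convolution on |f1 - f2|, CBAM and the output conv are omitted.
    """
    f_cat = np.concatenate([f1, f2], axis=0)                        # joint features, (2C, H, W)
    att = sigmoid(f_cat.mean(axis=(1, 2), keepdims=True)
                  + f_cat.max(axis=(1, 2), keepdims=True))          # channel weights, (2C, 1, 1)
    f_ca = att * f_cat                                              # channel recalibration
    f_d = np.abs(f1 - f2)                                           # local difference
    cos = np.sum(f1 * f2, axis=0, keepdims=True) / (
        np.linalg.norm(f1, axis=0, keepdims=True)
        * np.linalg.norm(f2, axis=0, keepdims=True) + eps)          # per-pixel cosine, (1, H, W)
    gate = (1.0 - cos) / 2.0                                        # open where the dates disagree
    f_da = gate * f_ca                                              # difference-aware semantics
    return np.concatenate([f_da, f_d], axis=0)                      # (3C, H, W), input to CBAM + conv
```

With identical inputs the output is close to zero, because the gate closes and the absolute difference vanishes; in regions where the two dates disagree, both the gated semantic features and the difference features pass through.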
[0042] Preferably, the change detection decoder is implemented as follows. The decoder receives the fused features from multiple scales and progressively fuses the multi-scale information through upsampling, skip connections, and convolutional recovery to form the final building disaster change detection map $P$:
$$P = \mathrm{Decoder}\big(\{F_{out}^{(i)}\}_{i}\big),$$
where $F_{out}^{(i)}$ is the fused feature at the $i$-th scale. The output map $P$ characterizes the location, extent, and severity of damage changes in building areas before and after the disaster.
[0043] This embodiment details the implementation process and data flow of the frequency domain heterogeneous modulation fusion network FHFM-Net. The network input consists of bi-temporal remote sensing images from a public dataset, denoted as the pre-disaster image $T_1$ and the post-disaster image $T_2$, each of dimension $C \times H \times W$, where $C$ is the number of channels and $H \times W$ is the grid size of the study area. The network consists of four main parts: a dual-branch encoder, a dual-branch decoder, the frequency domain heterogeneous modulation fusion module, and the change detection decoder. Each stage is processed as follows. The input flow of the dual-branch encoder is as follows: the bi-temporal images $T_1$ and $T_2$, each of size 3×256×256, are first fed into the first-layer encoding module, where convolution, normalization, and nonlinear activation extract initial shallow features and increase the number of channels from 3 to 32, giving output features of size 32×256×256. The two features then pass through further encoding blocks that extract higher-level semantic information while the number of channels progressively increases, producing highly semantic features in the deep encoding stage. In addition, channel-swapping operations are applied during mid- to high-level feature extraction so that the bi-temporal features interact and enhance each other, improving the network's ability to represent the differences between the two dates.
[0044] The input flow of the dual-branch decoder is as follows: the bi-temporal features output from the deepest encoder layer serve as the inputs of the two decoding branches. Through progressive upsampling and skip connections, the decoder fuses the corresponding encoder-stage features, gradually restoring and reconstructing deep semantic information and shallow detail information. In this embodiment, the decoding stage produces four levels of bi-temporal decoded features: the deepest has 512 channels, and the subsequent levels have 256, 128, and 64 channels, respectively. For each decoding layer, the feature output by the pre-disaster branch is denoted $F_1$ and that output by the post-disaster branch is denoted $F_2$. These bi-temporal decoded features are the input to the subsequent frequency domain heterogeneous modulation fusion module, which further models disaster-induced changes in the buildings.
[0045] The input flow of the frequency domain heterogeneous modulation fusion module (FHFM) is as follows: the bi-temporal decoded features $F_1$ and $F_2$ output by the dual-branch encoder-decoder at the corresponding scale are fed into the FHFM module together. The FHFM module consists of two parts, the heterogeneous dual-branch modulation mechanism (HDMM) and the cosine-gated fusion module (CGFM). Its overall function is to decouple the bi-temporal features in the frequency domain, modulate them heterogeneously, and fuse them in the spatial domain, thereby strengthening the response of truly disaster-affected areas and suppressing pseudo-change interference.
[0046] The processing flow of the heterogeneous dual-branch modulation mechanism (HDMM) is as follows: the input bi-temporal decoded features are first mapped from the spatial domain to the frequency domain to obtain the corresponding bi-temporal frequency-domain features. A preset frequency-domain mask then divides these features into high-frequency and low-frequency components. The high-frequency components mainly represent building edges, contour breaks, local texture damage, and detail changes, while the low-frequency components mainly represent brightness, contrast, shadow shift, smoke and dust occlusion, and imaging-style differences.
[0047] The data flow of the high-frequency branch is as follows: the bi-temporal high-frequency components are first differenced element-wise to extract change-sensitive high-frequency difference signals; the difference features are then fed into an enhancement unit consisting of convolution, normalization, and activation operations for preliminary denoising and enhancement; the enhanced difference features are further pooled and mapped by convolution to generate a frequency-domain attention map; finally, the attention map is used to weight the bi-temporal high-frequency components and output the enhanced high-frequency features. Through this process, the network better highlights real disaster-induced building changes, such as edge damage, contour breaks, and local texture anomalies, while suppressing high-frequency interference from background textures and irrelevant noise.
[0048] The data flow of the low-frequency branch is as follows: the dual-temporal low-frequency components are first normalized to reduce the impact of amplitude differences; then, the semantic similarity between the dual-temporal low-frequency features is calculated and converted into low-frequency difference weights; finally, these weights are used to perform weighted modulation on the dual-temporal low-frequency components to obtain enhanced low-frequency features. Through this process, the network can effectively reduce pseudo-change responses caused by illumination variations, shadow shifts, smoke and dust coverage, and differences in imaging style while preserving the overall semantic structure of the building.
[0049] The data flow of frequency-domain feature reconstruction is as follows: the enhanced high-frequency features from the high-frequency branch and the enhanced low-frequency features from the low-frequency branch are recombined for each date to form the reconstructed bi-temporal frequency-domain representations, which are then mapped back to the spatial domain to obtain the modulated bi-temporal spatial features. After this step, the features that were modeled separately in the frequency domain return to the spatial domain, providing more discriminative inputs for the subsequent difference-aware fusion in the spatial domain.
[0050] The processing flow of the cosine-gated fusion module CGFM is as follows: first, the modulated bi-temporal spatial features are concatenated along the channel dimension to obtain joint semantic features, and channel attention is applied to these joint features to strengthen the discriminative response related to building damage. In parallel, an absolute difference is taken between the bi-temporal spatial features and refined by a convolutional layer to extract local difference features. The cosine similarity between the bi-temporal spatial features is then computed and transformed into a difference-gating map, which modulates the enhanced joint semantic features to obtain difference-aware semantic features. Finally, the difference-aware semantic features and the local difference features are concatenated along the channel dimension and processed sequentially by a convolutional block attention module and a convolutional layer to output the fused features at the current scale. These fused features preserve the joint semantic context of the bi-temporal images and strengthen the local differences that are sensitive to building damage.
[0051] The input flow of the change detection decoder is as follows: the fused features output by the FHFM modules at all scales serve as the input of the change detection decoder. The decoder aggregates, upsamples, and convolutionally reconstructs these multi-scale fused features, combining deep semantic information with shallow edge detail to output the final building disaster change detection map. This map characterizes the location, extent, and severity of changes in building areas before and after the disaster, providing a basis for subsequent building damage identification, disaster assessment, and post-disaster emergency decision-making.
[0052] Example: This embodiment describes the specific implementation and experimental results of FHFM-Net in detail. 1. Experimental setup and parameter configuration: both training and inference were performed on an NVIDIA RTX 3090 GPU with 24 GB of VRAM. FHFM-Net was implemented in the PyTorch framework, and all experiments were run on a single GPU. The model was trained for 200 epochs with a batch size of 8, parameters were updated with the AdamW optimizer, the random seed was 42, and the initial learning rate and weight decay coefficient were also set. This configuration is intended to improve convergence efficiency while keeping training stable, providing reliable support for change detection in disaster-affected building assessment tasks.
[0053] 2. Dataset introduction: the proposed model is evaluated on two challenging public benchmark datasets, LEVIR-CD and WHU-CD.
[0054] LEVIR-CD dataset: LEVIR-CD is a building change detection dataset that provides 445, 64, and 128 image pairs for training, validation, and testing, respectively. Each pair consists of two bi-temporal images of 1024 × 1024 pixels. To control GPU memory usage and allow a fair comparison with existing methods, each large image is divided into 16 non-overlapping 256 × 256 sub-blocks, yielding 7,120, 1,024, and 2,048 sub-block pairs for the training, validation, and test sets, respectively.
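The sub-block counts above follow directly from tiling: a 1024 × 1024 image yields (1024 / 256)² = 16 non-overlapping 256 × 256 tiles. A short NumPy check reproduces the quoted totals; the helper `tile_image` is hypothetical.

```python
import numpy as np

def tile_image(img, tile=256):
    """Cut an (H, W, ...) array into non-overlapping tile x tile patches, row-major."""
    h, w = img.shape[:2]
    if h % tile or w % tile:
        raise ValueError("image size must be a multiple of the tile size")
    return [img[y:y + tile, x:x + tile]
            for y in range(0, h, tile)
            for x in range(0, w, tile)]

# Image-pair counts per split from the text, and the resulting sub-block pair counts
splits = {"train": 445, "val": 64, "test": 128}
tiles_per_image = len(tile_image(np.zeros((1024, 1024, 3), dtype=np.uint8)))
counts = {name: n * tiles_per_image for name, n in splits.items()}
```

Both images of a pair must be tiled with the same grid so that each sub-block pair stays co-registered with its label tile.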
[0055] WHU-CD dataset: WHU-CD consists of a pair of large bi-temporal aerial images covering Christchurch, New Zealand, with a spatial resolution of about 0.2 m. The images cover earthquake-affected areas and subsequent reconstruction, and mainly capture structural changes such as building construction and demolition. The large images are cropped into 256 × 256 pixel tiles, yielding 7,634 bi-temporal image pairs with binary building change annotations; 6,096, 762, and 762 pairs are assigned sequentially to the training, validation, and test sets, respectively.
[0056] During training, FHFM-Net is trained on the training sets of LEVIR-CD and WHU-CD to evaluate the effectiveness of the framework.
[0057] 3. Evaluation metrics: to quantitatively evaluate each method, Precision, Recall, Intersection over Union (IoU), and F1 score (F1) are used as evaluation metrics, where TP, FP, and FN denote the numbers of changed pixels correctly predicted as changed, unchanged pixels incorrectly predicted as changed, and changed pixels incorrectly predicted as unchanged, respectively.
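[0058] The metrics are defined as
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN},$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad \mathrm{IoU} = \frac{TP}{TP + FP + FN}.$$
IoU is the ratio of the intersection to the union of the predicted map and the ground-truth map, and the F1 score is the harmonic mean of precision and recall. High precision means that the pixels the algorithm predicts as changed are mostly truly changed; high recall means that the algorithm detects a large proportion of the truly changed pixels.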
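These metrics can be computed directly from binary change maps, as in the NumPy sketch below (the helper `change_metrics` is hypothetical). Tables I and II also report overall accuracy (OA), which is not defined above; the sketch uses the standard definition OA = (TP + TN) / (TP + TN + FP + FN), where TN counts unchanged pixels predicted as unchanged.

```python
import numpy as np

def change_metrics(pred, gt):
    """Precision, recall, F1, IoU and OA for binary change maps (1 = changed)."""
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    tp = np.sum(pred & gt)        # changed pixels predicted as changed
    fp = np.sum(pred & ~gt)       # unchanged pixels predicted as changed
    fn = np.sum(~pred & gt)       # changed pixels predicted as unchanged
    tn = np.sum(~pred & ~gt)      # unchanged pixels predicted as unchanged
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    oa = (tp + tn) / pred.size
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou, "oa": oa}
```

In change detection the unchanged class usually dominates, so OA is typically very high (above 99% in Tables I and II) and F1 and IoU are the more informative scores.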
[0059] 4. Analysis of experimental results: This invention compares the performance of several mainstream remote sensing change detection models on the LEVIR-CD and WHU-CD datasets.
[0060] The comparative results on the LEVIR-CD dataset are shown in Table I. FHFM-Net achieves the best performance among all compared methods, with F1, OA, Precision, Recall, and IoU of 92.45%, 99.24%, 93.31%, 91.60%, and 85.96%, respectively, and outperforms methods such as BIT, ChangeFormer, AMTNet, SFRINet, SEFINet, ELGCNet, AEGL-Net, CASP, and Change3D overall. Among the competing methods, Change3D and AEGL-Net perform best, yet FHFM-Net still improves F1 and IoU by 0.72 and 1.24 percentage points over Change3D and by 1.10 and 1.89 percentage points over AEGL-Net, respectively, indicating that the proposed method more effectively improves overall change detection performance. FHFM-Net also achieves the highest Precision and Recall, meaning that it identifies real change areas more accurately (fewer false positives) and extracts changed targets more completely (fewer false negatives). By contrast, some methods match or exceed others on individual metrics but struggle to balance Precision and Recall: SEFINet and SFRINet have high Precision but low Recall, which limits their F1 and IoU, while CASP and Change3D perform well in Recall but still have lower Precision than FHFM-Net, so their overall performance lags behind. In summary, FHFM-Net shows a better balance and stronger overall detection capability across all metrics, identifying real change areas in bi-temporal remote sensing images more accurately and completely. This verifies the effectiveness of the method for disaster-affected building change detection and its robustness in complex scenes.
[0061] TABLE I: Accuracy comparison of different models on the LEVIR-CD dataset; all scores are expressed as percentages (%).
[0062] The comparative results on the WHU-CD dataset are shown in Table II. FHFM-Net achieves the best performance among all compared methods, with F1, OA, Precision, Recall, and IoU of 94.72%, 99.59%, 96.33%, 93.17%, and 89.96%, respectively, and outperforms methods such as BIT, ChangeFormer, AMTNet, SFBINet, SEFINet, ELGCNet, AEGL-Net, CASP, and Change3D overall. Among the competing methods, Change3D and AEGL-Net perform best, yet FHFM-Net still improves F1, OA, Precision, Recall, and IoU by 0.30, 0.03, 0.36, 0.24, and 0.53 percentage points over Change3D and by 0.91, 0.07, 0.08, 1.67, and 1.61 percentage points over AEGL-Net, respectively, indicating that the proposed method further improves overall change detection performance. FHFM-Net also achieves the highest Precision and Recall, meaning that it identifies real change regions more accurately (fewer false detections) and extracts changed targets more completely (fewer false negatives). By contrast, some methods perform well on individual metrics but struggle to balance precision and recall. For example, SFBINet reaches a precision of 94.65% but a recall of only 88.60%, while CASP and Change3D achieve better recall (90.96% and 92.93%, respectively) but still have lower precision than FHFM-Net, leaving a gap in overall performance. Overall, FHFM-Net shows a better balance and stronger comprehensive detection capability across all metrics, identifying real change areas in bi-temporal remote sensing images more accurately and completely. This further validates the effectiveness of the method for detecting changes in disaster-affected buildings and its robustness in complex scenes.
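Because F1 and IoU are both functions of precision and recall, the reported scores can be cross-checked. With Precision = TP/(TP+FP), Recall = TP/(TP+FN), and IoU = TP/(TP+FP+FN), it follows that 1/IoU = 1/P + 1/R - 1, i.e. IoU = PR / (P + R - PR). The plain-Python sketch below recomputes F1 and IoU for FHFM-Net from the reported Precision and Recall; for both datasets the recomputed values agree with the reported ones to within a few hundredths of a percentage point, consistent with rounding of the inputs. The helper names are hypothetical.

```python
def f1_from_pr(p, r):
    """F1 as the harmonic mean of precision and recall (both in %)."""
    return 2 * p * r / (p + r)

def iou_from_pr(p, r):
    """IoU in % from precision and recall in %, using IoU = PR / (P + R - PR)."""
    p, r = p / 100, r / 100
    return 100 * p * r / (p + r - p * r)

reported = {  # dataset: (precision, recall, F1, IoU) reported for FHFM-Net, in %
    "LEVIR-CD": (93.31, 91.60, 92.45, 85.96),
    "WHU-CD": (96.33, 93.17, 94.72, 89.96),
}
for name, (p, r, f1, iou) in reported.items():
    print(f"{name}: F1 {f1_from_pr(p, r):.2f} (reported {f1}), "
          f"IoU {iou_from_pr(p, r):.2f} (reported {iou})")
```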
[0063] TABLE II: Accuracy comparison of different models on the WHU-CD dataset; all scores are expressed as percentages (%).
[0064] 5. Ablation analysis: Table III shows the effect of adding the FHFM module to the baseline model on the two datasets. To verify the effectiveness of the proposed frequency-domain heterogeneous modulation fusion strategy, ablation experiments were conducted on LEVIR-CD and WHU-CD, comparing the baseline model alone with the baseline equipped with the FHFM module.
[0065] The results show that adding the FHFM module clearly improves detection performance on both datasets. On LEVIR-CD, the baseline model's F1 score and IoU were 91.27% and 83.94%, respectively, rising to 92.45% and 85.96% with the FHFM module, gains of 1.18 and 2.02 percentage points. On WHU-CD, the baseline's F1 score and IoU were 93.71% and 88.16%, rising to 94.72% and 89.97% with the FHFM module, gains of 1.01 and 1.81 percentage points. These results indicate that the FHFM module effectively enhances the representation of changes in bi-temporal features. On the one hand, modeling the differences between high-frequency and low-frequency information in the frequency domain better highlights real change regions and suppresses spurious change responses; on the other hand, the subsequent fusion mechanism improves the model's perception of edge details and local structural changes. Therefore, combining the baseline model with the FHFM module yields higher F1 and IoU scores on both datasets, verifying the effectiveness and versatility of the proposed module for remote sensing change detection.
[0066] TABLE III: Ablation study results of FHFM-Net on two datasets. All metrics are expressed as percentages (%).
[0067] To further examine the effectiveness of FHFM from a structural perspective, an ablation study was conducted on its core component, CGFM; the results are shown in Table IV.
[0068] TABLE IV: Accuracy analysis of CGFM on the LEVIR-CD dataset; best results are indicated in bold; all metrics are reported as percentages (%).
[0069] Here, "NO CA" denotes removing the channel attention mechanism, "NO Concat+CA" denotes removing the attention-enhanced concatenation branch, "NO CD" denotes removing the cosine-difference-based modulation, and "NO Sub" denotes removing the difference branch. The complete CGFM achieves the best overall performance, with an F1 score of 92.45% and an IoU of 85.96%, while every simplified variant degrades to some degree. The channel attention mechanism and the attention-enhanced concatenation branch help maintain and strengthen the joint semantic context of the bi-temporal features, improving the robustness of the fused representation; the cosine-difference modulation introduces additional difference cues that complement the direct difference information; and the explicit difference branch plays an important role in representing locally changed regions. Overall, these results validate the contribution of each CGFM component to feature fusion quality and change detection performance.
[0070] In addition, this invention systematically compares several dual-temporal feature fusion strategies; the results are summarized in Table V.
TABLE V: Ablation experiments on feature fusion methods on LEVIR-CD; best results are highlighted in bold; all metrics are reported as percentages (%).
[0071] Here, "Add" denotes element-wise addition of the dual-temporal features, "Concat" denotes channel-wise concatenation, and "A−B" and "|A−B|" denote the direct feature difference and its absolute value, respectively. In addition, three representative classic fusion modules, TFAM, TFIM and DFC, are included to evaluate fusion effectiveness more comprehensively. As shown in Table V, the traditional fusion strategies (addition, concatenation, and the difference-based A−B and |A−B|) perform quite similarly overall, although using |A−B| alone leads to a significant drop in F1 score and IoU, indicating that a single difference representation is insufficient for accurate, fine-grained change recognition. In contrast, the classic fusion modules TFAM, TFIM and DFC improve performance to some extent, confirming the benefit of more complex feature interaction mechanisms. Nevertheless, the proposed FHFM achieves the best overall performance among all compared methods, with an F1 score of 92.45% and an IoU of 85.96%. These results show that integrating heterogeneous modulation with multi-strategy fusion overcomes the limitations of single-operation fusion strategies, better exploits the complementary semantic information in dual-temporal images, and suppresses pseudo-changes more strongly, thereby improving the stability and accuracy of change detection.
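The four baseline operators in Table V are simple tensor operations. A minimal NumPy sketch, assuming decoded features of shape (C, H, W) for each time phase; the function names are illustrative only:

```python
import numpy as np

# Baseline bi-temporal fusion operators; a and b are feature maps of shape (C, H, W).


def fuse_add(a, b):
    return a + b                              # "Add": element-wise sum, keeps C channels


def fuse_concat(a, b):
    return np.concatenate([a, b], axis=0)     # "Concat": channel concatenation, 2C channels


def fuse_sub(a, b):
    return a - b                              # "A-B": signed difference, order-dependent


def fuse_abs_sub(a, b):
    return np.abs(a - b)                      # "|A-B|": absolute difference, order-invariant
```

Add and |A−B| are symmetric in the two phases, A−B is antisymmetric, and Concat keeps both inputs intact at the cost of doubling the channel count. |A−B| retains only where the phases disagree and discards the content they share.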
Claims
1. A remote sensing change detection method for assessing disaster-affected buildings, characterized in that it includes the following steps: Step 1: Collect and organize remote sensing image samples containing information on disaster-induced changes to urban buildings; Step 2: Perform preprocessing and enhancement operations on the image samples in the dataset to expand the sample distribution; Step 3: Use the CVAT annotation tool to annotate the target information in the enhanced images, and divide the dataset into a training set, a validation set and a test set according to a preset ratio to support training and performance evaluation of the model at different stages; Step 4: Construct a frequency domain heterogeneous modulation fusion network FHFM-Net for building disaster assessment; Step 5: Input the preprocessed and labeled building disaster change dataset into the FHFM-Net model for training, iteratively optimizing the network parameters through forward and backward propagation so that the model fully learns the spatial-domain and frequency-domain features of building disaster changes, thereby improving the accuracy and generalization ability of change detection.
2. The method according to claim 1, characterized in that, in step 1, the image data consist of two parts: building disaster change detection samples from existing public databases, and image samples containing building disaster change information obtained through on-site collection.
3. The method according to claim 2, characterized in that the image samples collected in the field cover scenes with heavy background interference, pronounced target occlusion and complex disaster types, thereby expanding the range of sample types and improving the diversity of the dataset.
4. The method according to claim 1, characterized in that, in step 2, preprocessing operations are performed on the building disaster change detection dataset to correct and enhance images affected by spatial offsets, uneven grayscale and imaging differences, so as to reduce the distortion caused by changes in imaging conditions, thereby improving how efficiently the subsequent model recognizes building disaster features and enhancing its robustness and environmental adaptability in scenes with complex backgrounds, occlusion interference and coupled multi-hazard factors.
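The claims do not specify the correction or enhancement operations. As one hedged illustration, the sketch below assumes per-band mean/standard-deviation matching as the radiometric correction and random 90° rotations plus horizontal flips as the enhancement; both choices are hypothetical. The point it shows is the constraint specific to change detection: the same geometric transform must be applied to the pre-disaster image, the post-disaster image and the change mask.

```python
import numpy as np


def match_mean_std(t2, t1, eps=1e-6):
    """Hypothetical radiometric normalization: rescale each band of t2 (H, W, C)
    so that its mean and standard deviation match those of t1."""
    t1 = t1.astype(np.float64)
    t2 = t2.astype(np.float64)
    mu1, sd1 = t1.mean(axis=(0, 1)), t1.std(axis=(0, 1))
    mu2, sd2 = t2.mean(axis=(0, 1)), t2.std(axis=(0, 1))
    return (t2 - mu2) / (sd2 + eps) * sd1 + mu1


def paired_augment(t1, t2, mask, rng):
    """Apply the SAME random rotation/flip to both images and the change mask,
    so that pixel correspondences between the two time phases are preserved."""
    k = int(rng.integers(0, 4))           # number of 90-degree rotations
    flip = bool(rng.integers(0, 2))       # whether to flip horizontally
    out = []
    for x in (t1, t2, mask):
        x = np.rot90(x, k, axes=(0, 1))   # rotate in the (H, W) plane
        if flip:
            x = x[:, ::-1]                # flip along the width axis
        out.append(np.ascontiguousarray(x))
    return tuple(out)
```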
5. The method according to claim 1, characterized in that, in step 3, when annotating the building disaster change dataset, the CVAT tool is used to precisely annotate the building disaster change areas, and the dataset is divided into a training set, a validation set and a test set according to a preset ratio so that the model can be trained effectively and evaluated objectively under different data distribution conditions; the resulting dataset covers disaster change target samples under different background complexities, providing a foundation for training and evaluating the subsequent remote sensing change detection model for building disaster assessment.
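Keeping each annotated sample (the T1 image, the T2 image and its change mask) together and assigning it to exactly one subset avoids leakage between training and evaluation. A sketch in Python, where the 7:1:2 ratio and the fixed seed are hypothetical placeholders for the unspecified preset ratio:

```python
import random


def split_dataset(sample_ids, ratios=(0.7, 0.1, 0.2), seed=42):
    """Shuffle sample IDs (one ID per T1/T2/mask triple) and split them into
    disjoint train/val/test subsets. The default ratio is a hypothetical placeholder."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)      # reproducible shuffle
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]          # remainder goes to the test set
    return train, val, test
```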
6. The method according to claim 1, characterized in that, in step 4, the frequency domain heterogeneous modulation fusion network FHFM-Net includes a first encoder-decoder branch, a second encoder-decoder branch, multiple frequency domain heterogeneous modulation fusion modules (FHFM), and a change decoder, structured as follows: The first temporal image T1 is fed to the first encoder, which extracts multi-scale features; its deepest output is connected to the input of the first-level decoder of the first encoder-decoder branch. The output of the first-level decoder of the first encoder-decoder branch is connected both to the first input of the first-level feature fusion module of the first branch and to the first input of the first-level FHFM. The output of the first-level feature fusion module of the first branch is connected to the input of the second-level decoder of the first encoder-decoder branch; the output of the second-level decoder of the first encoder-decoder branch is connected both to the first input of the second-level feature fusion module of the first branch and to the first input of the second-level FHFM; and so on, so that the output of the i-th level decoder of the first encoder-decoder branch is connected to the first input of the i-th level feature fusion module of the first branch and to the first input of the i-th level FHFM. The second input of each level's feature fusion module in the first branch is connected to the skip-connection output of the first encoder at the corresponding scale, so as to receive shallow spatial features. The second temporal image T2 is fed to the second encoder.
The second encoder extracts post-disaster multi-scale features, and its deepest output is connected to the input of the first-level decoder of the second encoder-decoder branch. The output of the first-level decoder of the second encoder-decoder branch is connected both to the first input of the first-level feature fusion module of the second branch and to the second input of the first-level FHFM. The output of the first-level feature fusion module of the second branch is connected to the input of the second-level decoder of the second encoder-decoder branch; the output of the second-level decoder of the second encoder-decoder branch is connected both to the first input of the second-level feature fusion module of the second branch and to the second input of the second-level FHFM; and so on, so that the output of the i-th level decoder of the second encoder-decoder branch is connected to the first input of the i-th level feature fusion module of the second branch and to the second input of the i-th level FHFM. The second input of each level's feature fusion module in the second branch is connected to the output of the second encoder at the corresponding scale, forming a skip connection that supplies the shallow spatial features of the post-disaster image. In addition, the change decoder branch includes multiple multi-scale feature blocks and change fusion modules connected in series.
The output of the first-level FHFM is connected to the input of the first multi-scale feature block of the change decoder. The output of the first multi-scale feature block is connected to the first input of the first change fusion module, and the output of the second-level FHFM is connected to the second input of the first change fusion module. The output of the first change fusion module is connected to the input of the second multi-scale feature block of the change decoder; the output of the second multi-scale feature block is connected to the first input of the second change fusion module, and the output of the third-level FHFM is connected to the second input of the second change fusion module. This pattern continues level by level, and the shallowest multi-scale feature block of the change decoder outputs the final change detection map.
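The "and so on" wiring of claim 6 can be made explicit by generating the connections level by level. The sketch below is a hypothetical enumeration of (source, destination) edges, assuming level 1 is the deepest level and using illustrative node names (`enc`, `dec`, `fuse`, `FHFM`, `MSB` for the multi-scale feature blocks, `CFM` for the change fusion modules) that do not appear in the claims:

```python
def fhfm_net_wiring(num_levels):
    """Return (source, destination) edges of FHFM-Net as described in claim 6.
    Node names are hypothetical labels; level 1 is the deepest decoding level."""
    edges = []
    for b in (1, 2):                                          # the two encoder-decoder branches
        edges.append((f"enc{b}.deepest", f"dec{b}_1"))
        for i in range(1, num_levels + 1):
            edges.append((f"dec{b}_{i}", f"fuse{b}_{i}.in1"))     # decoder -> fusion module
            edges.append((f"dec{b}_{i}", f"FHFM_{i}.in{b}"))      # decoder -> FHFM input b
            edges.append((f"enc{b}.skip_{i}", f"fuse{b}_{i}.in2"))  # skip connection
            if i < num_levels:
                edges.append((f"fuse{b}_{i}", f"dec{b}_{i + 1}"))  # fusion -> next decoder
    # Change decoder: multi-scale feature blocks and change fusion modules in series.
    edges.append(("FHFM_1", "MSB_1"))
    for k in range(1, num_levels):
        edges.append((f"MSB_{k}", f"CFM_{k}.in1"))
        edges.append((f"FHFM_{k + 1}", f"CFM_{k}.in2"))
        edges.append((f"CFM_{k}", f"MSB_{k + 1}"))
    edges.append((f"MSB_{num_levels}", "change_map"))        # shallowest block -> output
    return edges
```

With `num_levels` levels this yields `num_levels` FHFM stages feeding `num_levels − 1` change fusion modules, which matches the pattern in which FHFM level k+1 enters change fusion module k.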
7. The method according to claim 6, characterized in that every level of the frequency domain heterogeneous modulation fusion module (FHFM) has the same internal structure, as follows: the decoded temporal features of the first branch and the second branch that enter the FHFM are denoted D1 and D2, respectively. Feature D1 is fed to the first two-dimensional Fourier transform unit, and feature D2 to the second two-dimensional Fourier transform unit. The output of the first two-dimensional Fourier transform unit is connected to the first input of the heterogeneous dual-branch modulation mechanism (HDMM), and the output of the second two-dimensional Fourier transform unit to the second input of the HDMM. The first output of the HDMM is connected to the input of the first two-dimensional inverse Fourier transform unit, and the second output of the HDMM to the input of the second two-dimensional inverse Fourier transform unit. The outputs of the first and second two-dimensional inverse Fourier transform units are connected to the inputs of the cosine-gated fusion module CGFM. The output of the CGFM serves as the output of the FHFM at that level and is connected to the corresponding module of the change decoder.
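The data flow of one FHFM stage can be sketched with NumPy's FFT routines. This is an illustration only: HDMM and CGFM are passed in as placeholder callables, and taking the real part after the inverse transform is an assumption about how the spatial features D1' and D2' are obtained.

```python
import numpy as np


def fhfm_forward(d1, d2, hdmm, cgfm):
    """Sketch of one FHFM stage. d1, d2: decoded features of shape (C, H, W).
    `hdmm` and `cgfm` are caller-supplied placeholder callables."""
    f1 = np.fft.fft2(d1, axes=(-2, -1))            # first 2-D Fourier transform unit
    f2 = np.fft.fft2(d2, axes=(-2, -1))            # second 2-D Fourier transform unit
    g1, g2 = hdmm(f1, f2)                          # heterogeneous dual-branch modulation
    d1p = np.fft.ifft2(g1, axes=(-2, -1)).real     # first inverse transform -> D1' (real part assumed)
    d2p = np.fft.ifft2(g2, axes=(-2, -1)).real     # second inverse transform -> D2'
    return cgfm(d1p, d2p)                          # cosine-gated fusion of D1' and D2'
```

With identity placeholders for HDMM and CGFM, the transform pair reproduces D1 and D2, which is a quick sanity check that the FFT is taken over the spatial axes only.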
8. The method according to claim 7, characterized in that the cosine-gated fusion module CGFM is specifically as follows: the spatial features D1' and D2' are fed in parallel to the first channel concatenation unit Concat, the cosine difference calculation unit, and the absolute difference unit Sub. The output of the first channel concatenation unit Concat is connected to the input of the channel attention unit CA; the output of CA is connected both to the first input of the first multiplier and to the first input of the second multiplier. The output of the absolute difference unit Sub is connected to the input of the convolutional refinement unit Conv, whose output is connected to the input of the second channel concatenation unit Concat. The output of the cosine difference calculation unit is connected to the second input of the first multiplier, and the output of the first multiplier is also connected to the input of the second channel concatenation unit Concat. The output of the second channel concatenation unit Concat is connected to the input of the CBAM module, and the output of the CBAM module is connected to the second input of the second multiplier. The output of the second multiplier is connected to the change decoder.
9. The method according to claim 7, characterized in that the heterogeneous dual-branch modulation mechanism (HDMM) is implemented as follows: for the bi-temporal frequency-domain representations produced by the two Fourier transform units, the HDMM first constructs two fixed, complementary masks, a low-frequency mask and a high-frequency mask. The low-frequency mask is defined as the region centered on the spectrum origin within a preset radius threshold, and the high-frequency mask characterizes the high-frequency region outside this low-frequency region. The low-frequency and high-frequency masks are used to decompose the bi-temporal frequency-domain representations into bi-temporal low-frequency components and bi-temporal high-frequency components. The high-frequency components obtained from the decomposition are fed to the high-frequency modulation branch to highlight changes in edges, textures and local structures, while the low-frequency components are fed to the low-frequency modulation branch to reduce pseudo-change responses caused by differences in illumination, season and imaging style, thereby achieving decoupled modeling of content information and style information.
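The fixed complementary masks of claim 9 can be built directly on the unshifted spectrum returned by `np.fft.fft2`, where the origin (zero frequency) sits at index (0, 0). A minimal sketch, assuming a circular low-frequency region measured in integer frequency-index units; the radius value is a hypothetical hyperparameter:

```python
import numpy as np


def frequency_masks(h, w, radius):
    """Fixed complementary masks for an (h, w) spectrum in np.fft.fft2 layout.
    Low-frequency mask: 1 within `radius` of the spectrum origin, 0 elsewhere.
    High-frequency mask: its complement."""
    fy = np.fft.fftfreq(h) * h                 # signed integer frequency index along rows
    fx = np.fft.fftfreq(w) * w                 # signed integer frequency index along columns
    dist = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    low = (dist <= radius).astype(np.float64)
    return low, 1.0 - low


def decompose(f, low, high):
    """Split a spectrum f of shape (..., h, w) into low- and high-frequency components."""
    return f * low, f * high
```

Because the masks sum to one and the Fourier transform is linear, the low- and high-frequency components add back to the original spectrum, so the decomposition itself loses no information. The masks are also symmetric about the origin, so the low-frequency part of a real feature map stays real after the inverse transform.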
10. The method according to claim 9, characterized in that the high-frequency modulation branch is implemented as follows: the bi-temporal high-frequency components are first subtracted element by element and the absolute value is taken, yielding difference features that characterize the local differences between the two time phases. The difference features are then passed through a feature enhancement unit consisting of a convolutional layer, a batch normalization layer and a PReLU activation function to obtain enhanced difference features. Channel-wise max pooling and average pooling are applied to the enhanced difference features, and the pooling results are fed into a convolutional layer followed by a sigmoid activation function to generate a frequency-domain attention map. Finally, the frequency-domain attention map is used to weight the bi-temporal high-frequency components, yielding the modulated high-frequency features; this enhances the high-frequency response related to real changes and suppresses noisy textures and background interference. The low-frequency modulation branch is implemented as follows: the bi-temporal low-frequency components are first normalized along the channel dimension to obtain normalized low-frequency features. The cosine similarity between the two temporal low-frequency components is then computed from the normalized low-frequency features and converted into normalized difference weights. Finally, the normalized difference weights are used to weight the bi-temporal low-frequency components, yielding the modulated low-frequency features.
In this way, the network focuses on semantic structural changes in the low-frequency components rather than on style changes related to brightness, contrast and seasonal differences, thereby reducing pseudo-change responses. The modulated high-frequency features and the modulated low-frequency features of each time phase are then combined to form the reconstructed bi-temporal frequency-domain representations.
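The two modulation branches of claim 10 can be sketched in NumPy as follows. Several details are assumptions rather than statements from the claims: the convolutions are reduced to hypothetical 1×1 channel-mixing weights (`w_enh`, `w_att`), batch normalization is replaced by per-channel standardization of a single sample, the PReLU slope is fixed at 0.25, the cosine similarity is computed on amplitude spectra, the difference weight is taken as (1 − cos)/2, both branches apply their weights in a residual form x·(1 + weight), and the final combination is a simple sum (natural here because the masks are complementary).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def high_freq_branch(h1, h2, w_enh, w_att, prelu_a=0.25, eps=1e-5):
    """h1, h2: complex high-frequency components, shape (C, H, W).
    w_enh: (C, C) hypothetical 1x1 conv weights of the feature enhancement unit.
    w_att: (2,) hypothetical 1x1 conv weights applied to the [max, avg] pooled maps."""
    d = np.abs(h1 - h2)                                   # |H1 - H2|: local difference features
    e = np.einsum("oc,chw->ohw", w_enh, d)                # conv layer (1x1 stand-in)
    mu = e.mean(axis=(1, 2), keepdims=True)
    var = e.var(axis=(1, 2), keepdims=True)
    e = (e - mu) / np.sqrt(var + eps)                     # BN-style per-channel standardization
    e = np.where(e > 0, e, prelu_a * e)                   # PReLU activation
    pooled = np.stack([e.max(axis=0), e.mean(axis=0)])    # channel max / average pooling -> (2, H, W)
    att = sigmoid(np.einsum("k,khw->hw", w_att, pooled))  # frequency-domain attention map in (0, 1)
    return h1 * (1.0 + att), h2 * (1.0 + att)             # residual weighting (assumed form)


def low_freq_branch(l1, l2, eps=1e-8):
    """l1, l2: complex low-frequency components, shape (C, H, W)."""
    a1, a2 = np.abs(l1), np.abs(l2)                       # amplitude spectra (assumption)
    n1 = a1 / (np.linalg.norm(a1, axis=0, keepdims=True) + eps)  # normalize along channels
    n2 = a2 / (np.linalg.norm(a2, axis=0, keepdims=True) + eps)
    cos = (n1 * n2).sum(axis=0)                           # cosine similarity per frequency, (H, W)
    w = (1.0 - cos) / 2.0                                 # normalized difference weight
    return l1 * (1.0 + w), l2 * (1.0 + w)                 # residual weighting (assumed form)


def recombine(high, low):
    """Sum the modulated high- and low-frequency parts of each time phase."""
    return high[0] + low[0], high[1] + low[1]
```

The residual form keeps unchanged content intact: when the two low-frequency components are identical, the cosine similarity is 1, the weight is 0, and the components pass through unmodified, while the high-frequency attention can at most double a component's magnitude.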