Method for rapid screening of geological disaster change area based on multi-temporal unmanned aerial vehicle image

By constructing a lightweight change detection network, using UAV POS data and a digital elevation model for coarse alignment, and explicitly modeling spectral difference features, the method addresses the real-time and efficiency problems of screening geological disaster change areas on UAV edge devices, enabling rapid screening and efficient inspection.

CN122244738A · Pending · Publication Date: 2026-06-19 · GUIZHOU COAL MINE DESIGN & RES INST +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GUIZHOU COAL MINE DESIGN & RES INST
Filing Date
2026-05-22
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to screen geological hazard change areas quickly and in real time on UAV edge devices. Their high computational complexity and sensitivity to registration errors make geological hazard inspection inefficient and make it difficult to meet the timeliness requirements of emergency monitoring.

Method used

A lightweight change detection network is constructed, which combines UAV POS data and digital elevation model for coarse alignment. By explicitly modeling spectral difference features through grouped convolution, the network can quickly eliminate areas without change, reducing the amount of data required for subsequent fine detection.

Benefits of technology

It enables real-time, rapid screening of geological disaster change areas on UAV edge devices, relaxes the requirement for accurate registration, improves inspection efficiency, reduces the amount of unchanged-area data passed to subsequent processing, and is suitable for lightweight deployment.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122244738A_ABST

Abstract

This invention discloses a rapid screening method for geological hazard change areas based on multi-temporal UAV imagery, belonging to the fields of remote sensing image processing and deep learning technology. The method first uses UAV POS data and DEM to coarsely align the current and reference images, allowing feature matching registration to be used when DEM accuracy is insufficient. Then, a sliding window extracts image patch pairs, which are input into a lightweight change detection network. The network sequentially includes shallow convolutions, grouped convolutions (used to explicitly model the band difference / ratio spectral features between the current and reference images), asymmetric depthwise separable convolutional combinations, and a global pooling layer, outputting change confidence scores. Finally, suspected change patches are retained based on thresholding, and the change area bounding boxes are output after merging based on the Euclidean distance between the patch centers. This invention can quickly eliminate a large number of unchanged image patches with low computational cost, reducing the amount of data required for subsequent fine-grained detection, and is suitable for real-time UAV inspection of geological hazards such as landslides and cracks.

Description

Technical Field

[0001] This invention relates to the fields of remote sensing image processing and deep learning technology, specifically to a method for rapidly screening geological hazard change areas based on multi-temporal UAV imagery, applicable to routine inspections and emergency monitoring of geological hazards such as landslides, collapses, and fissures.

Background Technology

[0002] Routine inspections and emergency monitoring of geological hazards (such as landslides, collapses, and fissures) are crucial for safeguarding people's lives and property. Drones, with their advantages of maneuverability, low cost, and ability to acquire high-resolution imagery, have become an important tool for geological hazard monitoring. In actual inspection operations, drones typically take regular photos of geologically hazardous areas along pre-set routes, generating hundreds to thousands of high-resolution images per inspection. However, the vast majority of these images show unchanged backgrounds (such as stable mountains, vegetation, roads, and buildings), with only a very small number showing surface changes caused by geological hazards (such as fissure expansion, landslide boundary creep, and collapse accumulation). Manually screening all images is not only inefficient and subjective, but also fails to meet the timeliness requirements of emergency monitoring.

[0003] Existing deep learning-based change detection methods, such as fully convolutional networks (FCN), U-Net, and Siamese networks, have achieved good accuracy, but their computational complexity is high, typically requiring high-performance GPU servers and making them difficult to deploy on UAV-borne edge computing units (such as the NVIDIA Jetson series). Furthermore, accurate registration between multi-temporal images is a prerequisite for change detection. Traditional methods rely on high-precision digital elevation models (DEMs) or a large number of ground control points, but areas prone to geological disasters often have complex terrain and difficulty in obtaining DEM data, leading to large registration errors and further affecting detection accuracy.

[0004] Therefore, there is an urgent need for a lightweight change area screening method that can run in real time on the UAV edge, tolerates a certain degree of registration error, and can quickly eliminate large unchanged areas, so as to reduce the amount of data for subsequent fine detection and improve the overall efficiency of geological disaster inspection.

Summary of the Invention

[0005] To address the shortcomings of existing technologies, this invention provides a rapid screening method for geological disaster change areas based on multi-temporal UAV imagery. By constructing a lightweight change detection network, coarse alignment is achieved by combining historical reference imagery with UAV POS data and digital elevation models (DEMs). Furthermore, spectral difference features are explicitly modeled through grouped convolution, thereby quickly eliminating areas without change and reducing the amount of data required for subsequent fine-grained detection.

[0006] To achieve the above objectives, the following technical solution is adopted:

[0007] This invention provides a method for rapid screening of geological hazard change areas based on multi-temporal UAV imagery, comprising the following steps:

[0008] S1: Acquire the current inspection image and historical reference image, and perform coarse alignment using UAV POS data and digital elevation model to obtain coarsely aligned image pairs;

[0009] S2: On the coarsely aligned image pairs, extract fixed-size image block pairs in a sliding window manner, and stitch the current block and the reference block in each image block pair together in the channel dimension to form an input tensor;

[0010] S3: Input the input tensor into a pre-trained lightweight change detection network and output the change confidence; wherein, the lightweight change detection network includes a shallow feature extraction module, a grouped convolutional layer for spectral difference modeling, a change feature extraction module, and a confidence prediction module;

[0011] S4: On the validation set, determine the decision threshold by maximizing the F1-score. Image blocks with a change confidence level lower than the decision threshold are judged as unchanged blocks and discarded. Image blocks with a change confidence level not lower than the decision threshold are judged as suspected changed blocks, and their block center positions are recorded.

[0012] S5: Based on the Euclidean distance between the block centers, merge suspected change blocks whose distance is less than half the side length of the image block into a connected region, and output the bounding box of the connected region.

[0013] Furthermore, in step S1, the historical reference image is an orthophoto generated from the previous inspection of the same area, which has been geometrically corrected and stored as tiles.

[0014] Furthermore, in step S1, the coarse alignment uses collinearity equations to project the current inspection image onto the coordinate system of the historical reference image; for areas with severe terrain undulations or insufficient accuracy of the digital elevation model, fine registration is further performed after coarse alignment using feature point matching methods based on SIFT or SuperPoint.

[0015] Furthermore, in step S2, the step size of the sliding window is 32 pixels, and the size of the image block is 32 pixels × 32 pixels or 64 pixels × 64 pixels; for geological hazard targets of different scales, two independent lightweight change detection networks are trained to process small-sized image blocks and large-sized image blocks respectively.

[0016] Furthermore, in step S3, the shallow feature extraction module is a 3×3 standard convolutional layer used to map the channels of the input tensor to M intermediate channels; the change feature extraction module adopts a depthwise separable convolutional combination layer, comprising a 1×3 depthwise convolution, a 3×1 depthwise convolution, and a 1×1 pointwise convolution connected in sequence, which extracts local texture differences between image block pairs and outputs M channels, where M is a positive integer.

[0017] Furthermore, the grouped convolutional layer for spectral difference modeling is a grouped 1×1 convolutional layer whose number of groups equals the number of bands in the image. It is used to calculate the difference or ratio features between corresponding bands of the current block and the reference block, and outputs a band difference feature map.

[0018] The band difference feature map is concatenated with the feature map output by the shallow feature extraction module in the channel dimension, or the band difference feature map is concatenated with the input tensor in the channel dimension, and the concatenated tensor is sent to the change feature extraction module.

[0019] Furthermore, the confidence prediction module includes a global average pooling layer and a fully connected layer. The global average pooling layer is used to pool the feature map output by the change feature extraction module into a feature vector, and the fully connected layer is used to map the feature vector into a single value and output the change confidence through the Sigmoid function.

[0020] Furthermore, in step S4, the threshold interval [0,1] is traversed on the validation set with a step size of 0.01, the F1-score under each threshold is calculated, and the threshold that maximizes the F1-score is selected as the decision threshold.

[0021] Further, step S5 includes: calculating the Euclidean distance between the centers of each pair of suspected change blocks; if the Euclidean distance is less than half the side length of the image block, grouping the two suspected change blocks into the same connected region; completing the grouping with a disjoint-set (union-find) data structure or a clustering algorithm based on Euclidean distance; and outputting the minimum bounding rectangle of each group as its bounding box.

[0022] Furthermore, the bounding boxes output in step S5 are fed into the subsequent fine detection module for geological disaster target identification and patch segmentation; the fine detection module is a lightweight semantic segmentation network that uses MobileNetV3 as the encoder and U-Net as the decoder.

[0023] Compared with the prior art, the present invention achieves the following beneficial effects:

[0024] (1) Lightweight design: depthwise separable convolution and grouped convolution keep the number of parameters below that of a conventional 3×3 convolutional network with 16 input and output channels. As the comparison benchmark, the conventional network has approximately 3×(3×3×16×16) = 6912 parameters. In this network, the change feature extraction part has approximately 1×3×16 + 3×1×16 + 16×16 = 352 parameters, the shallow convolution has 3×3×6×16 = 864, and the spectral difference grouped convolution together with the subsequent transformation convolution has approximately 150. The total is approximately 1366 parameters, a theoretical reduction of approximately 80% (1366/6912 ≈ 0.20), which makes the network suitable for edge deployment.
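The parameter arithmetic above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions. The helper `conv_weights` is hypothetical. Biases and the final 16→1 fully connected layer are excluded, which matches the tally in the text. The "3×" in the benchmark is read as three stacked 3×3 layers.

```python
def conv_weights(kh: int, kw: int, c_in: int, c_out: int, groups: int = 1) -> int:
    """Weight count of a 2-D convolution; bias terms are excluded."""
    assert c_in % groups == 0 and c_out % groups == 0
    return kh * kw * (c_in // groups) * c_out


# Benchmark, read as three stacked standard 3x3 convolutions (16 -> 16 channels).
baseline = 3 * conv_weights(3, 3, 16, 16)

shallow = conv_weights(3, 3, 6, 16)               # 3x3 conv, 6 -> 16 channels
spectral = conv_weights(1, 1, 6, 3, groups=3)     # grouped 1x1, one group per band
transform = conv_weights(1, 1, 9, 16)             # 1x1 conv, 9 -> 16 channels
change = (conv_weights(1, 3, 16, 16, groups=16)   # depthwise 1x3
          + conv_weights(3, 1, 16, 16, groups=16) # depthwise 3x1
          + conv_weights(1, 1, 16, 16))           # pointwise 1x1
total = shallow + spectral + transform + change

print(baseline, shallow, spectral + transform, change, total)  # 6912 864 150 352 1366
print(f"reduction: {1 - total / baseline:.1%}")                # reduction: 80.2%
```

Suppose every layer carries a bias and the 16→1 fully connected layer (16 weights plus 1 bias) is counted. Under that assumption, 100 parameters would be added, and the order of magnitude would not change.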

[0025] (2) Tolerance for registration error: By adding random translation (±15 pixels) during training, the network can tolerate misalignment of several pixels to more than ten pixels, reducing the requirement for accurate registration.

[0026] (3) Explicit modeling of spectral differences: band difference / ratio features computed by grouped convolution, combined with spatial features, increase sensitivity to land cover changes such as changes in vegetation cover and water bodies.

[0027] (4) Screening efficiency: The change confidence is output in units of image blocks. After threshold judgment, most of the blocks without change can be removed (the actual removal rate depends on the scene and threshold setting, and can reach 60%-80% in typical cases), thereby reducing the amount of data for subsequent fine detection.

[0028] It should be understood that the description in the Summary of the Invention is not intended to limit the key or essential features of the embodiments of the present invention, nor is it intended to restrict the scope of the invention. Other features of the invention will become readily apparent from the following description.

Attached Figure Description

[0029] The above and other features, advantages, and aspects of the various embodiments of the present invention will become more apparent from the accompanying drawings and the following detailed description. The drawings are provided for a better understanding of the invention and are not intended to limit the scope of the invention. In the drawings, the same or similar reference numerals denote the same or similar elements, wherein:

[0030] Figure 1 is a flowchart illustrating the rapid screening method for geological disaster change areas based on multi-temporal UAV imagery according to an embodiment of the present invention;

[0031] Figure 2 is a schematic diagram illustrating the principle of spatial coarse alignment of multi-temporal images in an embodiment of the present invention;

[0032] Figure 3 shows the overall structure of the lightweight change detection network in an embodiment of the present invention;

[0033] Figure 4 is a schematic diagram illustrating the principle of suspected change block merging and bounding box output in an embodiment of the present invention.

Detailed Implementation

[0034] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0035] Furthermore, the term "and / or" in this article is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. Additionally, the character " / " in this article generally indicates that the preceding and following related objects have an "or" relationship.

[0036] Figure 1 is a flowchart illustrating a method for rapid screening of geological hazard change areas based on multi-temporal UAV imagery, according to an embodiment of the present invention. Specifically, as shown in Figure 1, the rapid screening method for geological hazard change areas based on multi-temporal UAV imagery includes the following steps:

[0037] S1: Acquire the current inspection image and historical reference image, and perform coarse alignment using UAV POS data and digital elevation model to obtain coarsely aligned image pairs;

[0038] Step S1 is used to achieve image acquisition and coarse spatial alignment.

[0039] In one specific embodiment of the present invention, the hardware configuration is as follows: the drone is equipped with an RGB camera (3 channels), and the onboard edge computing unit is an NVIDIA Jetson Xavier NX (8GB memory). The flight altitude is 100m, and the ground resolution of the image is approximately 5cm / pixel. The historical reference image is an orthophoto generated from the previous inspection of the same area, pre-cropped into 1024×1024 tiles and stored. The area in this embodiment is a hilly region with relatively small terrain undulations. After coarse alignment using an SRTM 30m DEM, the measured offset is within the range of 5~12 pixels.

[0040] Acquire the current inspection image and its corresponding historical reference image. The historical reference image is an orthophoto generated from the previous inspection of the same area. This orthophoto has been geometrically corrected and stored as tiles in advance.

[0041] UAV POS data (latitude, longitude, altitude, yaw angle, pitch angle, roll angle) and the available DEM data (such as a 1m resolution DEM generated by airborne LiDAR, or the publicly available SRTM 30m resolution DEM) are used. As shown in Figure 2, a schematic diagram of the coarse alignment principle of multi-temporal images in an embodiment of the present invention, the collinearity equation is used to project the current inspection image onto the coordinate system of the historical reference image to obtain the coarsely aligned image pair. For areas with severe terrain undulations or insufficient accuracy of the digital elevation model, a feature point matching method based on SIFT or SuperPoint is further used for fine registration after coarse alignment.

[0042] This method is suitable for areas with relatively flat terrain (such as plains, hills, and plateaus) or scenes where a high-precision DEM (resolution better than 1m) has been obtained. For landslide areas with drastic terrain undulations, if the DEM accuracy is insufficient and the projection error is too large (more than tens of pixels), a fine registration method based on feature point matching (such as SIFT, SuperPoint) can be used first, and then subsequent steps can be performed; or this coarse alignment step can be replaced with a feature matching method, which is an equivalent substitution of this invention.

[0043] Due to errors in POS and DEM, the local offset after coarse alignment is typically within the range of 5-15 pixels. This offset is simulated during subsequent network training through data augmentation (random translation ±15 pixels), allowing the network to learn to tolerate a certain range of misalignment.

[0044] In one specific embodiment of the present invention, the POS data and SRTM DEM of the current image are acquired. The current image is projected onto the UTM coordinate system of the reference image using the collinearity equation, the expression of which is:

[0045]

$$x = -f\,\frac{a_1(X-X_S)+b_1(Y-Y_S)+c_1(Z-Z_S)}{a_3(X-X_S)+b_3(Y-Y_S)+c_3(Z-Z_S)},\qquad y = -f\,\frac{a_2(X-X_S)+b_2(Y-Y_S)+c_2(Z-Z_S)}{a_3(X-X_S)+b_3(Y-Y_S)+c_3(Z-Z_S)}$$

[0046] where $(x, y)$ are the image point coordinates; $f$ is the camera focal length; $(X_S, Y_S, Z_S)$ are the coordinates of the camera projection center in the ground coordinate system (provided by the POS data); $(X, Y, Z)$ are the coordinates of the ground point (with the elevation obtained by interpolation from the DEM); and $a_i, b_i, c_i$ ($i = 1, 2, 3$) are the nine elements of the rotation matrix composed of the UAV's attitude angles (yaw, pitch, roll). These elements describe the rotation between the ground coordinate system (such as UTM or a geodetic coordinate system) and the camera's image space coordinate system. In the collinearity equations, they transform the ground point coordinates into the camera coordinate system whose origin is the projection center, and the image point coordinates $(x, y)$ are calculated from that result. Here $a_1, a_2, a_3$ are the three components of the ground X-axis unit vector in the camera coordinate system; $b_1, b_2, b_3$ are the components of the ground Y-axis unit vector; and $c_1, c_2, c_3$ are the components of the ground Z-axis (elevation direction) unit vector.

[0047] The rotation matrix $R$ is calculated from the UAV's three attitude angles (yaw angle $\psi$, pitch angle $\theta$, roll angle $\varphi$) and is usually written in the following form:

[0048]

$$R = \begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}$$

[0049] The relationship between the elements and the attitude angles follows the ZXY rotation sequence common in aerial photogrammetry, taken here as $R = R_Z(\psi)\,R_X(\theta)\,R_Y(\varphi)$:

$$\begin{aligned}
a_1 &= \cos\psi\cos\varphi - \sin\psi\sin\theta\sin\varphi, & a_2 &= -\sin\psi\cos\theta, & a_3 &= \cos\psi\sin\varphi + \sin\psi\sin\theta\cos\varphi,\\
b_1 &= \sin\psi\cos\varphi + \cos\psi\sin\theta\sin\varphi, & b_2 &= \cos\psi\cos\theta, & b_3 &= \sin\psi\sin\varphi - \cos\psi\sin\theta\cos\varphi,\\
c_1 &= -\cos\theta\sin\varphi, & c_2 &= \sin\theta, & c_3 &= \cos\theta\cos\varphi.
\end{aligned}$$

[0050] Since $R$ is an orthogonal matrix, it satisfies $R^{\mathsf T}R = RR^{\mathsf T} = I$, and its determinant is $+1$. This ensures that the coordinate transformation is a pure rotation: it is conformal, with no scaling or reflection.

[0051] By solving pixel-by-pixel, the current image is resampled to the reference image coordinate system to obtain a coarsely aligned image pair. Since the terrain is relatively flat and the DEM error is within an acceptable range, no offset exceeding 15 pixels occurs after coarse alignment. In this specific embodiment, feature matching for fine registration is not used.
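The collinearity projection in step S1 can be sketched as follows. This is an illustrative sketch rather than the embodiment's code, and it rests on several assumptions. The rotation uses the ZXY product R = Rz(yaw)·Rx(pitch)·Ry(roll), with rows (a1, a2, a3), (b1, b2, b3), (c1, c2, c3). The focal length f and the image coordinates share one length unit. DEM interpolation, the conversion from image coordinates to pixels (principal point, pixel pitch), and resampling are omitted. The names `rotation_zxy` and `project` are hypothetical.

```python
import math


def rotation_zxy(yaw: float, pitch: float, roll: float) -> list[list[float]]:
    """R = Rz(yaw) @ Rx(pitch) @ Ry(roll); rows are (a1,a2,a3), (b1,b2,b3), (c1,c2,c3)."""
    cps, sps = math.cos(yaw), math.sin(yaw)
    cth, sth = math.cos(pitch), math.sin(pitch)
    cph, sph = math.cos(roll), math.sin(roll)
    return [
        [cps * cph - sps * sth * sph, -sps * cth, cps * sph + sps * sth * cph],
        [sps * cph + cps * sth * sph, cps * cth, sps * sph - cps * sth * cph],
        [-cth * sph, sth, cth * cph],
    ]


def project(ground, center, rot, f: float) -> tuple[float, float]:
    """Collinearity equations: ground point (X, Y, Z) -> image point (x, y)."""
    dx, dy, dz = (g - c for g, c in zip(ground, center))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rot
    den = a3 * dx + b3 * dy + c3 * dz
    x = -f * (a1 * dx + b1 * dy + c1 * dz) / den
    y = -f * (a2 * dx + b2 * dy + c2 * dz) / den
    return x, y


R = rotation_zxy(0.0, 0.0, 0.0)          # level camera, no yaw
f = 0.024                                # hypothetical 24 mm focal length, in metres
center = (500.0, 200.0, 100.0)           # projection centre 100 m above the ground point
x, y = project((510.0, 200.0, 0.0), center, R, f)
print(x, y)                              # x is about 0.0024 (10 m / 100 m x 24 mm), y is 0
R_tilted = rotation_zxy(0.3, -0.1, 0.05) # any attitude gives an orthogonal matrix
```

Resampling pixel by pixel would normally run this mapping in the inverse direction, from each reference-grid ground point to the current image, and then interpolate. That choice is not specified in the text.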

[0052] S2: On the coarsely aligned image pairs, extract fixed-size image block pairs in a sliding window manner, and stitch the current block and the reference block in each image block pair together in the channel dimension to form an input tensor;

[0053] Step S2 is used to extract image blocks.

[0054] On the coarsely aligned current image and reference image, a window is slid with a fixed step size (e.g., 32 pixels) to extract image block pairs of size P×P (P = 32 or 64), that is, image blocks of 32 pixels × 32 pixels or 64 pixels × 64 pixels. The current block and the reference block are concatenated along the channel dimension to form an input tensor of size P×P×2C, where C is the number of image bands (e.g., 3 for RGB, 4-8 for multispectral). For geological hazard targets of different scales (e.g., cracks and landslides), two independent lightweight change detection networks, whose weights are not shared, are trained to handle small image blocks (e.g., 32×32) and large image blocks (e.g., 64×64) respectively.

[0055] In one specific embodiment of the present invention, a 64×64 image block pair is extracted from the coarsely aligned image pair using a sliding window with a stride of 32 pixels. The current block and the reference block are concatenated along the channel dimension to obtain a 64×64×6 tensor. For crack detection (small targets), a separate 32×32 network is trained with a stride of 16, and the two networks operate independently.
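Block extraction in step S2 can be sketched with plain Python lists, where an H × W × C image is a list of rows of per-pixel band lists. This is a hypothetical sketch, and `window_origins` and `make_pair_tensor` are illustrative names. Only windows that fit entirely inside the image are kept, so a border strip is skipped whenever (H - size) is not a multiple of the stride. Padding or an extra last window would be a design choice that the text does not specify.

```python
def window_origins(height: int, width: int, size: int, stride: int) -> list[tuple[int, int]]:
    """Top-left corners of every full size x size window visited at the given stride."""
    return [(r, c)
            for r in range(0, height - size + 1, stride)
            for c in range(0, width - size + 1, stride)]


def make_pair_tensor(cur, ref, r: int, c: int, size: int):
    """Crop the same window from both dates and concatenate along channels.

    cur and ref are H x W x C nested lists; the result is size x size x 2C,
    with the C current bands first and the C reference bands after them.
    """
    return [[cur[i][j] + ref[i][j] for j in range(c, c + size)]
            for i in range(r, r + size)]


# A 1024 x 1024 tile with 64 x 64 blocks and a 32-pixel stride: 31 x 31 = 961 pairs.
origins = window_origins(1024, 1024, 64, 32)
print(len(origins), 500 * len(origins))   # 961 480500
```

Under the same rule, the separate crack network with 32×32 blocks and a stride of 16 would visit 63 × 63 = 3969 windows per 1024×1024 tile.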

[0056] S3: Input the input tensor into a pre-trained lightweight change detection network and output the change confidence; wherein, the lightweight change detection network includes a shallow feature extraction module, a grouped convolutional layer for spectral difference modeling, a change feature extraction module, and a confidence prediction module;

[0057] Step S3 is used to construct a lightweight change detection network.

[0058] As shown in Figure 3, which shows the overall structure of the lightweight change detection network in this embodiment of the invention, the network consists of four sequentially connected components: a shallow feature extraction module, a spectral difference modeling module, a change feature extraction module, and a confidence prediction module.

[0059] 3.1 Shallow Feature Extraction Module: A 3×3 standard convolution (stride 1, padding 1) is used to map the number of channels 2C of the input tensor to M intermediate channels (M=16 in this example). This layer is used to initially fuse low-level features from the current and reference images.

[0060] 3.2 Grouped Convolutional Layer for Spectral Difference Modeling: A grouped 1×1 convolutional layer is used. Specifically, after the shallow feature extraction module, a grouped 1×1 convolutional layer is inserted, with the number of groups G=C (i.e., grouped by band), to calculate the difference or ratio features between the corresponding bands of the current block and the reference block. The number of output channels of this grouped convolution is C (1 channel per group), resulting in a band difference feature map. This band difference feature map is then concatenated along the channel dimension with the output of the shallow feature extraction module (or with the original input tensor). For example, if the original input is 64×64×6, the grouped convolution output is 64×64×3, and concatenation with the original input yields 64×64×9. The concatenated tensor is then fed into the subsequent change feature extraction module. This design explicitly models spectral changes (such as the change in the near-infrared / red band ratio corresponding to vegetation cover changes), complementing spatial feature extraction.
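The band-wise behaviour of the grouped 1×1 convolution can be illustrated at a single pixel. Two points in this sketch are assumptions, not statements of the embodiment.

First, a grouped convolution splits the input channels into consecutive runs. For example, PyTorch's `nn.Conv2d(6, 3, kernel_size=1, groups=3)` gives each group two adjacent channels. With the input ordered as [current R, G, B, reference R, G, B], the groups would therefore see (cur R, cur G), (cur B, ref R), and (ref G, ref B). To avoid this, the sketch interleaves the channels so that each group sees one band from both dates.

Second, a 1×1 convolution is linear. It learns a weighted difference directly, but a ratio becomes a difference only if the inputs are log-transformed. The weights (1, -1) stand in for learned values. `interleave_bands` and `grouped_1x1` are hypothetical names.

```python
import math


def interleave_bands(cur_px: list[float], ref_px: list[float]) -> list[float]:
    """[c1..cC] and [r1..rC] -> [c1, r1, c2, r2, ...]: group g then sees band g of both dates."""
    out: list[float] = []
    for c, r in zip(cur_px, ref_px):
        out += [c, r]
    return out


def grouped_1x1(px: list[float], weights: list[list[float]], biases: list[float]) -> list[float]:
    """Grouped 1x1 convolution at one pixel: G groups, 2 inputs and 1 output per group."""
    return [w[0] * px[2 * g] + w[1] * px[2 * g + 1] + b
            for g, (w, b) in enumerate(zip(weights, biases))]


w_diff = [[1.0, -1.0]] * 3                                # illustrative stand-in for learned weights
zero = [0.0] * 3
cur_px, ref_px = [0.40, 0.35, 0.30], [0.20, 0.35, 0.10]   # hypothetical pixel values

diff = grouped_1x1(interleave_bands(cur_px, ref_px), w_diff, zero)
log_ratio = grouped_1x1(interleave_bands([math.log(v) for v in cur_px],
                                         [math.log(v) for v in ref_px]), w_diff, zero)
print([round(d, 3) for d in diff])         # [0.2, 0.0, 0.2]
print([round(v, 3) for v in log_ratio])    # [0.693, 0.0, 1.099]
```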

[0061] 3.3 Change Feature Extraction Module: An asymmetric combination of depthwise separable convolutions is employed, consisting of a 1×3 depthwise convolution, a 3×1 depthwise convolution, and a 1×1 pointwise convolution connected in sequence. More specifically, the 1×3 and 3×1 depthwise convolutions (stride 1, with appropriate padding) are applied in turn, followed by channel fusion via the 1×1 pointwise convolution, which outputs M channels. This structure achieves an approximate 3×3 receptive field with a small number of parameters and is used to extract local texture differences between image block pairs.
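The receptive-field claim can be checked by pushing an impulse through a 1×3 kernel and then a 3×1 kernel on one channel. This is a minimal single-channel sketch with hypothetical helper names, and the pointwise channel mixing is not shown. The cascade is equivalent to a rank-1 (separable) 3×3 kernel. It therefore uses 6 weights per channel instead of 9, but it cannot represent every 3×3 kernel, which is the sense in which it approximates a full 3×3 convolution.

```python
def conv2d_same(img: list[list[float]], kernel: list[list[float]]) -> list[list[float]]:
    """Single-channel 2-D cross-correlation, stride 1, zero padding that keeps H x W."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2                  # (0, 1) for 1x3, (1, 0) for 3x1
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    y, x = i + u - ph, j + v - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[u][v] * img[y][x]
            out[i][j] = acc
    return out


impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0
k_1x3 = [[1.0, 1.0, 1.0]]          # depthwise 1x3 (one channel shown)
k_3x1 = [[1.0], [1.0], [1.0]]      # depthwise 3x1
footprint = conv2d_same(conv2d_same(impulse, k_1x3), k_3x1)
support = sorted((i, j) for i in range(7) for j in range(7) if footprint[i][j] != 0.0)
print(support[0], support[-1], len(support))   # (2, 2) (4, 4) 9
```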

[0062] 3.4 Confidence Prediction Module: This module includes a global average pooling layer and a fully connected layer. The global average pooling layer pools the feature map output by the change feature extraction module into a feature vector. The fully connected layer maps the feature vector into a single numerical value and outputs the change confidence score via a sigmoid function. Specifically, the feature map output by the change feature extraction module is subjected to global average pooling to obtain a vector of length M. This vector is then passed through a fully connected layer (M→1) and a sigmoid function to output the change confidence score. This confidence level represents the probability that the current image patch has changed significantly relative to the reference image.
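The confidence prediction module reduces to a few arithmetic steps. The sketch below is illustrative, and the name `change_confidence` and the toy feature maps are hypothetical. Each of the M feature maps is averaged to one number, an M→1 linear layer produces a logit, and a sigmoid maps the logit to (0, 1). The sigmoid is written in a numerically stable two-branch form so that very negative logits do not overflow `math.exp`.

```python
import math


def change_confidence(feature_maps: list[list[list[float]]],
                      fc_weights: list[float], fc_bias: float) -> float:
    """GAP over each of the M maps, an M -> 1 fully connected layer, then a sigmoid."""
    pooled = [sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])) for fmap in feature_maps]
    logit = sum(w * p for w, p in zip(fc_weights, pooled)) + fc_bias
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)                # stable branch for negative logits
    return e / (1.0 + e)


# Two hypothetical 2 x 2 feature maps (M = 2) with means 1.0 and 3.0.
maps = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 4.0], [3.0, 3.0]]]
print(change_confidence(maps, [0.5, -0.5], 1.0))   # 0.5 (logit = 0.5 - 1.5 + 1.0)
```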

[0063] In a specific embodiment of the present invention, the specific parameters of the network structure are as follows:

[0064] (1) Shallow feature extraction module: 3×3 Conv, input channel 6, output channel 16, step size 1, padding 1.

[0065] (2) Grouped convolutional layer for spectral difference modeling: Grouped 1×1 convolution, number of groups G=3 (grouped by RGB bands), input channels 6, output channels 3 (1 channel output per group), to obtain the band difference feature map. The difference feature map is concatenated with the original input tensor (64×64×6) in the channel dimension to obtain 64×64×9. To control the computational load, a 1×1 convolution can be used to transform the 9 channels into 16 channels before feeding them into the subsequent change feature extraction module.

[0066] (3) Change feature extraction module: Depthwise Conv 1×3 (16 channels, stride 1, padding (0,1)) + Depthwise Conv 3×1 (16 channels, stride 1, padding (1,0)), then through 1×1 Pointwise Conv (input 16, output 16).

[0067] (4) Confidence prediction module: global average pooling followed by a fully connected layer (16→1) and Sigmoid.

[0068] S4: On the validation set, determine the decision threshold by maximizing the F1-score. Image blocks with a change confidence level lower than the decision threshold are judged as unchanged blocks and discarded. Image blocks with a change confidence level not lower than the decision threshold are judged as suspected changed blocks, and their block center positions are recorded.

[0069] Step S4 determines the decision threshold and selects image blocks. In step S4, the threshold interval [0,1] is traversed on the validation set with a step size of 0.01, the F1-score for each threshold is calculated, and the threshold that maximizes the F1-score is selected as the decision threshold. Specifically:

[0070] On the validation set, the decision threshold is determined by maximizing the F1-score. Candidate thresholds are traversed over [0, 1] with a certain step size (e.g., 0.01), and the F1-score under each candidate is calculated. The candidate with the maximum F1-score is selected as the decision threshold. If the change confidence of an image block is lower than the decision threshold, the block is judged as "no significant change" and discarded directly. If it is not lower than the decision threshold, the block is judged as a "suspected change" and its block center coordinates are recorded.

[0071] In one specific embodiment of the present invention, F1-scores at different thresholds are calculated on the validation set, and the threshold corresponding to the maximum F1-score is selected. On the self-built validation set of this embodiment, the measured decision threshold is 0.45 (this value varies with the dataset and needs to be recalculated by the implementer).
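The threshold search in step S4 can be sketched as a grid scan. This is a minimal sketch, and `f1_at`, `best_threshold`, and the validation values are hypothetical. It applies the rule "confidence not lower than t means suspected change" and defines F1 as 0 when there are no true positives. Ties go to the lowest threshold because `max` returns the first maximum. Grid values are rounded so that, for example, 41 × 0.01 compares as 0.41.

```python
def f1_at(scores: list[float], labels: list[int], t: float) -> float:
    """F1 of the rule 'confidence >= t means suspected change' (label 1 = changed)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(scores: list[float], labels: list[int], step: float = 0.01) -> float:
    """Scan t = 0.00, 0.01, ..., 1.00 and keep the first t with the highest F1."""
    candidates = [round(i * step, 10) for i in range(round(1 / step) + 1)]
    return max(candidates, key=lambda t: f1_at(scores, labels, t))


# Hypothetical validation confidences and block labels.
val_scores = [0.10, 0.20, 0.40, 0.50, 0.70, 0.90]
val_labels = [0, 0, 0, 1, 1, 1]
threshold = best_threshold(val_scores, val_labels)
print(threshold)   # 0.41: the lowest grid value that separates 0.40 from 0.50
```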

[0072] S5: Based on the Euclidean distance between the block centers, merge suspected change blocks whose distance is less than half the side length of the image block into a connected region, and output the bounding box of the connected region.

[0073] Step S5 is used to merge and output suspected blocks. Step S5 includes: calculating the Euclidean distance between the centers of every two suspected changed blocks; if the Euclidean distance is less than half the side length of the image block, the corresponding two suspected changed blocks are grouped into the same connected region; grouping is completed using a disjoint-set data structure algorithm or a clustering algorithm based on Euclidean distance, and the minimum bounding rectangle of each group is output as the bounding box. Specifically:

[0074] As shown in Figure 4, which illustrates the principle of merging suspected change blocks and outputting bounding boxes in an embodiment of the present invention, adjacent or overlapping suspected change image blocks are merged according to their spatial location. Specifically, the Euclidean distance between the centers of every two suspected blocks is calculated. If the distance is less than half the side length of the image block (e.g., less than 32 pixels for a 64×64 block), the two blocks are grouped into the same connected region. A disjoint-set data structure or clustering algorithm is used to complete the grouping, and the minimum bounding rectangle of each group is output as the bounding box. The merged regions are then fed into a subsequent fine-grained detection module (such as a semantic segmentation network) for geological hazard target identification and patch segmentation.

[0075] In a specific embodiment of the present invention, the center coordinates (x, y) of image blocks with change confidence ≥ 0.45 are recorded. Using a disjoint-set data structure, blocks whose centers are less than 32 pixels apart in Euclidean distance are grouped together, and the bounding rectangle of each group is output.
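The merging in step S5 can be sketched with a small union-find. This is a hypothetical sketch, not the embodiment's code, and it makes two assumptions worth noting.

First, consider a stride equal to half the block side (32 for 64×64 blocks, 16 for 32×32 blocks). Horizontally or vertically adjacent block centers are then exactly half a side apart, so the strict "less than" test as worded would not join them. The sketch therefore exposes a hypothetical `inclusive` flag that turns the comparison into "less than or equal".

Second, each output box is taken to cover the blocks' full extents (center ± half side). This is an assumption about what "minimum bounding rectangle of each group" covers.

```python
import math
from itertools import combinations


class DisjointSet:
    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]   # path halving
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def merge_blocks(centers, block_size: int, inclusive: bool = False):
    """Join centers closer than block_size / 2; return one (x0, y0, x1, y1) box per group."""
    limit = block_size / 2
    ds = DisjointSet(len(centers))
    for i, j in combinations(range(len(centers)), 2):
        d = math.dist(centers[i], centers[j])
        if (d <= limit) if inclusive else (d < limit):
            ds.union(i, j)
    groups: dict[int, list] = {}
    for i, c in enumerate(centers):
        groups.setdefault(ds.find(i), []).append(c)
    boxes = []
    for members in groups.values():
        xs = [x for x, _ in members]
        ys = [y for _, y in members]
        boxes.append((min(xs) - limit, min(ys) - limit, max(xs) + limit, max(ys) + limit))
    return boxes


centers = [(32.0, 32.0), (64.0, 32.0), (400.0, 400.0)]    # hypothetical block centers
print(len(merge_blocks(centers, 64)))                     # 3: adjacent centers are exactly 32 apart
print(merge_blocks(centers, 64, inclusive=True))          # [(0.0, 0.0, 96.0, 64.0), (368.0, 368.0, 432.0, 432.0)]
```

The pairwise loop is O(n²) in the number of suspected blocks. Because centers lie on a regular grid, a neighbour lookup over grid cells would scale better if many blocks survive thresholding.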

[0076] Furthermore, the bounding boxes output by the merging in step S5 are fed into the subsequent fine detection module for geological disaster target identification and patch segmentation. The fine detection module is a lightweight semantic segmentation network; preferably, this network uses MobileNetV3 as the encoder and U-Net as the decoder.

[0077] Network training and parameter settings:

[0078] Dataset: A self-built UAV inspection image change detection dataset was used. Approximately 500 orthophotos (1024×1024 pixels each) from two consecutive inspections of the same area were collected. Geological hazard experts labeled the changed areas (landslides, cracks, collapses, etc.) pixel by pixel, producing pixel-level change labels. 64×64 image block pairs were extracted with a sliding window of stride 32 pixels, giving approximately 480,500 block pairs in total (in practice, a subset of blocks can be randomly sampled to balance positive and negative samples). The block pairs are divided into training and validation sets at a ratio of 8:2.
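The block count in [0078] follows from the sliding-window geometry. This is a minimal sketch under two assumptions: the roughly 500 orthophotos are 500 current/reference image pairs, and windows are placed only where a full block fits (no border padding). `window_origins` and `blocks_per_image` are hypothetical helper names.

```python
from typing import Iterator, Tuple


def window_origins(image_size: int, block: int, stride: int) -> Iterator[Tuple[int, int]]:
    """Yield (row, col) top-left corners of blocks that fit fully inside a square image."""
    positions = range(0, image_size - block + 1, stride)
    for row in positions:
        for col in positions:
            yield row, col


def blocks_per_image(image_size: int, block: int, stride: int) -> int:
    """Closed-form count of the windows generated by window_origins."""
    per_axis = (image_size - block) // stride + 1
    return per_axis * per_axis


if __name__ == "__main__":
    per_image = blocks_per_image(1024, 64, 32)  # 31 positions per axis -> 961
    total = 500 * per_image                     # 480,500 block pairs
    n_val = total * 2 // 10                     # 8:2 split -> 96,100 validation pairs
    print(per_image, total, total - n_val, n_val)
```

With 31 positions per axis, each 1024×1024 pair yields 961 block pairs. The 8:2 split then leaves 96,100 validation pairs, which matches the "approximately 96,000 blocks" reported in [0084].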

[0079] Loss function: Binary cross-entropy. If the positive and negative samples are imbalanced, weighted cross-entropy or Focal Loss can be used. Specifically, with a positive to negative sample ratio of approximately 1:3, weighted cross-entropy is used (positive sample weight = 3, negative sample weight = 1).
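The weighted cross-entropy in [0079] can be written out per block as follows. This is a minimal stdlib sketch. It assumes block-level labels (1 = changed, 0 = unchanged), predicted probabilities after the Sigmoid, and a loss averaged over the number of samples; frameworks differ in how they normalize. The function name `weighted_bce` is hypothetical.

```python
import math
from typing import Sequence


def weighted_bce(
    probs: Sequence[float],
    labels: Sequence[int],
    pos_weight: float = 3.0,
    neg_weight: float = 1.0,
    eps: float = 1e-7,
) -> float:
    """Mean weighted binary cross-entropy over a batch of block-level predictions."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and of equal length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total -= pos_weight * math.log(p)        # changed block
        else:
            total -= neg_weight * math.log(1.0 - p)  # unchanged block
    return total / len(probs)
```

With positives and negatives at roughly 1:3, a positive weight of 3 makes the two classes contribute about the same total weight (1 × 3 versus 3 × 1). In PyTorch, a comparable effect can be obtained with the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`, which operates on logits rather than probabilities.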

[0080] Optimizer: Adam; initial learning rate [value missing]; the learning rate is multiplied by 0.5 every 20 epochs.

[0081] Batch size: 32.

[0082] Training epochs: 30, with early stopping: if the loss does not decrease for several consecutive epochs, training is stopped early.
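The schedule in [0080] to [0082] combines step decay with early stopping. This is a minimal sketch. Because the source does not give a legible initial learning rate, it is a parameter here. The "several consecutive epochs" are modeled by a hypothetical `patience` value, and the code does not fix which loss is monitored (typically the validation loss, but that is an assumption).

```python
def step_lr(initial_lr: float, epoch: int, step_size: int = 20, gamma: float = 0.5) -> float:
    """Learning rate for a 0-based epoch: multiplied by gamma every step_size epochs."""
    return initial_lr * gamma ** (epoch // step_size)


class EarlyStopping:
    """Stop when the monitored loss has not decreased for `patience` consecutive epochs."""

    def __init__(self, patience: int) -> None:
        self.patience = patience
        self.best = float("inf")
        self.epochs_without_improvement = 0

    def update(self, loss: float) -> bool:
        """Record one epoch's loss; return True if training should stop."""
        if loss < self.best:
            self.best = loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience
```

With 30 epochs and a step of 20, the learning rate is halved only once, for epochs 20 to 29 (0-based). In PyTorch, the same decay corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)`.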

[0083] Data augmentation: only augmentations that do not break the spatial correspondence between the current image and the reference image are used: horizontal flipping, vertical flipping, brightness and contrast adjustment, and random translation of ±15 pixels to cover coarse alignment errors. Random rotation and scaling are not used.
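The paired augmentation in [0083] can be sketched on single-band blocks stored as nested lists. Several details here are assumptions the source does not specify. Flips are applied identically to both blocks. Brightness/contrast jitter is drawn independently per block, using hypothetical ranges. The ±15-pixel translation is applied to the reference block only, so that it simulates residual coarse-alignment error. All function names are hypothetical. For multi-band blocks, the same geometric operation would be applied to every band.

```python
import random
from typing import List, Tuple

Block = List[List[float]]  # one band, rows of pixel values in [0, 1] (assumption)


def hflip(b: Block) -> Block:
    return [row[::-1] for row in b]


def vflip(b: Block) -> Block:
    return [row[:] for row in b[::-1]]


def translate(b: Block, dx: int, dy: int, fill: float = 0.0) -> Block:
    """Shift content right by dx and down by dy, filling exposed pixels with `fill`."""
    h, w = len(b), len(b[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = b[sy][sx]
    return out


def brightness_contrast(b: Block, rng: random.Random) -> Block:
    alpha = rng.uniform(0.8, 1.2)  # contrast factor (hypothetical range)
    beta = rng.uniform(-0.1, 0.1)  # brightness offset (hypothetical range)
    return [[min(1.0, max(0.0, alpha * v + beta)) for v in row] for row in b]


def augment_pair(cur: Block, ref: Block, rng: random.Random, max_shift: int = 15) -> Tuple[Block, Block]:
    # Flips are applied to BOTH blocks so corresponding pixels stay aligned.
    if rng.random() < 0.5:
        cur, ref = hflip(cur), hflip(ref)
    if rng.random() < 0.5:
        cur, ref = vflip(cur), vflip(ref)
    # Photometric jitter, drawn independently per block (assumption).
    cur, ref = brightness_contrast(cur, rng), brightness_contrast(ref, rng)
    # Relative shift of up to +/-max_shift px mimics residual coarse-alignment error.
    dx, dy = rng.randint(-max_shift, max_shift), rng.randint(-max_shift, max_shift)
    return cur, translate(ref, dx, dy)
```

Rotation and scaling are deliberately absent, as [0083] states. Flips keep the two blocks pixel-aligned because the same flip is applied to both.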

[0084] On the validation set (approximately 96,000 blocks), the change detection F1-score of this method varies with the threshold, with typical optimal values between 0.6 and 0.7 depending on scene complexity. It should be noted that these performance metrics are affected by factors such as the dataset, flight conditions, and disaster type; in practical applications, the model must be retrained and evaluated on the specific data.
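Claim 8 describes how the decision threshold behind these F1 values is chosen: traverse [0, 1] in steps of 0.01 and keep the threshold with the highest validation F1. This is a minimal sketch. It uses the rule from claim 1, under which a block is a suspected change when its confidence is not lower than the threshold. Ties are resolved by keeping the lowest such threshold, which is an assumption. `f1_at` and `select_threshold` are hypothetical names.

```python
from typing import Sequence, Tuple


def f1_at(threshold: float, scores: Sequence[float], labels: Sequence[int]) -> float:
    """F1 when blocks with score >= threshold are predicted as suspected changes."""
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
    denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
    return 2 * tp / denom if denom else 0.0


def select_threshold(
    scores: Sequence[float], labels: Sequence[int], step: float = 0.01
) -> Tuple[float, float]:
    """Traverse [0, 1] in increments of `step`; keep the first threshold with max F1."""
    best_t, best_f1 = 0.0, -1.0
    n_steps = round(1.0 / step)
    for k in range(n_steps + 1):
        t = k * step  # multiply rather than accumulate, to limit float drift
        f1 = f1_at(t, scores, labels)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Each pass over the validation scores is O(n), so the 101-step sweep stays cheap even for roughly 96,000 blocks.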

[0085] Multispectral image adaptation: when a 4-band multispectral camera is used (e.g., R, G, B, NIR), the number of input channels is 2C = 8. The input channels of the shallow convolution are changed to 8, the number of groups for spectral difference modeling is G = 4, and the resulting 4-channel difference features are concatenated with the 8-channel input tensor to form 12 channels. The rest of the structure is adjusted accordingly.
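The channel arithmetic in [0085] can be illustrated at a single pixel. This is a minimal sketch, not the trained network. The grouped 1×1 convolution here splits its input channels into contiguous groups, as common frameworks do (for example the `groups` argument of PyTorch's `Conv2d`). Consequently, a plain [current; reference] stacking would pair bands *within* one image. The sketch therefore assumes the channels are interleaved band by band, which the source does not state. The weights (1, −1) are chosen only to reproduce a plain band difference; in practice they would be learned. A ratio cannot be produced by a linear 1×1 convolution directly; one option (an assumption) is to feed log-transformed bands, since log a − log b = log(a/b). All names and pixel values are hypothetical.

```python
from typing import List, Optional, Sequence


def interleave(cur: Sequence[float], ref: Sequence[float]) -> List[float]:
    """[cur_1..cur_C] and [ref_1..ref_C] -> [cur_1, ref_1, ..., cur_C, ref_C]."""
    out: List[float] = []
    for c, r in zip(cur, ref):
        out.extend((c, r))
    return out


def grouped_conv1x1(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    groups: int,
    bias: Optional[Sequence[float]] = None,
) -> List[float]:
    """Grouped 1x1 convolution at one pixel, one output channel per group.

    Input channels are split into `groups` contiguous slices; output g is a
    weighted sum of slice g only, so different groups are never mixed."""
    if len(x) % groups:
        raise ValueError("number of input channels must be divisible by groups")
    k = len(x) // groups
    out = []
    for g in range(groups):
        acc = sum(w * v for w, v in zip(weights[g], x[g * k:(g + 1) * k]))
        out.append(acc + (bias[g] if bias is not None else 0.0))
    return out


if __name__ == "__main__":
    cur = [0.30, 0.40, 0.20, 0.60]  # hypothetical R, G, B, NIR at one pixel
    ref = [0.25, 0.40, 0.30, 0.50]
    diff = grouped_conv1x1(interleave(cur, ref), [(1.0, -1.0)] * 4, groups=4)
    features = cur + ref + diff     # 2C + C = 12 channels for C = 4
    print(len(features), [round(d, 2) for d in diff])
```

In the full network, the same operation runs at every pixel of the 64×64 block, producing a G-channel band difference map that is concatenated with the 2C-channel input tensor.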

[0086] The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

[0087] It should also be noted that, in the embodiments of this application, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0088] The above description of the disclosed embodiments enables those skilled in the art to make or use this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined in the embodiments of this application may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown in this application, but is to be accorded the widest scope consistent with the principles and novel features disclosed in the embodiments of this application.

Claims

1. A method for rapid screening of geological disaster change areas based on multi-temporal UAV images, characterized in that it includes the following steps: S1: acquiring the current inspection image and the historical reference image, and performing coarse alignment using UAV POS data and a digital elevation model to obtain a coarsely aligned image pair; S2: on the coarsely aligned image pair, extracting fixed-size image block pairs with a sliding window, and concatenating the current block and the reference block of each image block pair in the channel dimension to form an input tensor; S3: inputting the input tensor into a pre-trained lightweight change detection network and outputting the change confidence, wherein the lightweight change detection network includes a shallow feature extraction module, a grouped convolutional layer for spectral difference modeling, a change feature extraction module, and a confidence prediction module; S4: determining the decision threshold on a validation set by maximizing the F1-score; image blocks whose change confidence is lower than the decision threshold are judged as unchanged blocks and discarded, and image blocks whose change confidence is not lower than the decision threshold are judged as suspected change blocks, and their block center positions are recorded; S5: based on the Euclidean distance between block centers, merging suspected change blocks whose distance is less than half the side length of the image block into a connected region, and outputting the bounding box of the connected region.

2. The method of claim 1, wherein, in step S1, the historical reference image is an orthophoto generated from the previous inspection of the same area; this orthophoto has been geometrically corrected and is stored as tiles.

3. The method of claim 1 or 2, wherein, in step S1, the coarse alignment uses collinearity equations to project the current inspection image onto the coordinate system of the historical reference image; for areas with strong terrain relief or an insufficiently accurate digital elevation model, fine registration is further performed after coarse alignment using SIFT- or SuperPoint-based feature point matching.

4. The method of claim 1, wherein, in step S2, the stride of the sliding window is 32 pixels, and the image block size is 32 × 32 pixels or 64 × 64 pixels; for geological hazard targets of different scales, two independent lightweight change detection networks are trained to process the small and the large image blocks, respectively.

5. The method of claim 1, wherein, in step S3, the shallow feature extraction module is a 3×3 standard convolutional layer that maps the channels of the input tensor to M intermediate channels; the change feature extraction module is a depthwise separable convolution combination layer comprising a 1×3 depthwise convolution, a 3×1 depthwise convolution, and a 1×1 pointwise convolution connected in sequence, which extracts local texture differences between the image block pair and outputs M channels, where M is a positive integer.

6. The method of claim 1 or 5, wherein the grouped convolutional layer for spectral difference modeling is a grouped 1×1 convolutional layer whose number of groups equals the number of image bands; it computes the difference or ratio features between corresponding bands of the current block and the reference block and outputs a band difference feature map. The band difference feature map is concatenated in the channel dimension either with the feature map output by the shallow feature extraction module or with the input tensor, and the concatenated tensor is sent to the change feature extraction module.

7. The method of claim 6, wherein the confidence prediction module includes a global average pooling layer and a fully connected layer; the global average pooling layer pools the feature map output by the change feature extraction module into a feature vector, and the fully connected layer maps the feature vector to a single value, from which the change confidence is output through the Sigmoid function.

8. The method of claim 1, wherein, in step S4, the threshold interval [0, 1] is traversed on the validation set with a step size of 0.01, the F1-score is calculated for each threshold, and the threshold that maximizes the F1-score is selected as the decision threshold.

9. The method of claim 1, wherein step S5 includes: calculating the Euclidean distance between the centers of each pair of suspected change blocks; if the Euclidean distance is less than half the side length of the image block, grouping the two suspected change blocks into the same connected region; completing the grouping with a disjoint-set algorithm or a clustering algorithm based on Euclidean distance; and outputting the minimum bounding rectangle of each group as the bounding box.

10. The method of claim 9, wherein the bounding boxes output in step S5 are fed into a subsequent fine detection module for geological disaster target identification and patch segmentation; the fine detection module is a lightweight semantic segmentation network that uses MobileNetV3 as the encoder and U-Net as the decoder.