A real-time flood area estimation method and device based on large model edge computing

By deploying a lightweight deep learning network on edge nodes for image processing, the invention addresses the loss of real-time capability caused by communication link failures in drone-based flood emergency monitoring, achieving high-precision, real-time flood area estimation that meets the timeliness requirements of emergency rescue.

CN122244683A · Pending · Publication Date: 2026-06-19 · INST OF GEOGRAPHICAL SCI & NATURAL RESOURCE RES CAS

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
INST OF GEOGRAPHICAL SCI & NATURAL RESOURCE RES CAS
Filing Date
2026-04-30
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing drone-based flood emergency monitoring solutions rely on high-bandwidth communication links, making it difficult to meet the timeliness requirements for real-time flood area extraction after communication infrastructure is damaged.

Method used

A lightweight flood area estimation method based on large-model edge computing is adopted. Images are processed by a lightweight deep learning network deployed on edge nodes, comprising multiple convolutional layers, a feature alignment module, and a fully convolutional classification head, to achieve local offline inference and real-time flood area estimation.

Benefits of technology

High-precision, real-time flood area estimation is achieved with low latency in weak-network environments, meeting the timeliness requirements of emergency rescue and improving the accuracy and robustness of flood area estimation.

Abstract

This invention relates to the field of computer technology, and in particular to a method and apparatus for real-time flood area estimation based on large-model edge computing. The method includes: acquiring image data to be tested; inputting the image data to be tested into a trained flood area estimation model to obtain sub-image patches; and determining the flood area estimation result based on multiple sub-image patches. The flood area estimation model is a lightweight model deployed on edge nodes and is obtained by training a preset deep learning network. The deep learning network includes a first shallow convolutional layer, a second shallow convolutional layer, a first deep convolutional layer, a second deep convolutional layer, a third deep convolutional layer, a feature alignment module, a feature fusion module, and a fully convolutional classification head, connected in sequence. The accuracy of real-time flood area estimation can thus be improved while meeting the timeliness requirements of emergency rescue.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and in particular to a method and apparatus for real-time flood area estimation based on large-model edge computing.

Background Technology

[0002] Floods are among the most destructive natural disasters globally, causing numerous casualties and extensive infrastructure damage each year and seriously threatening human life, property, and socio-economic stability. In responding to sudden floods, the timely extraction and dynamic monitoring of the spatiotemporal distribution of affected areas during the "golden rescue period" directly determines the efficiency and soundness of the emergency response. Real-time, accurate identification of flood-affected areas is therefore the core task of flood emergency monitoring.

[0003] Currently, traditional techniques for extracting flood-inundated areas rely mainly on satellite remote sensing data. However, satellite remote sensing is limited by inherent factors such as long revisit cycles and susceptibility to cloud cover, making it difficult to meet the stringent real-time requirements of flood emergency monitoring and to provide timely, effective data support for emergency rescue. In contrast, drones offer maneuverability, operation below cloud cover, and rapid response. They can quickly acquire centimeter-resolution images of disaster areas during floods, effectively compensating for the shortcomings of satellite remote sensing, and have become an important data acquisition tool in flood emergency monitoring.

[0004] Current drone-based flood emergency monitoring solutions generally adopt the traditional "end-side acquisition—link transmission—cloud processing" model. This model heavily relies on high-bandwidth communication links, requiring the real-time transmission of massive amounts of high-resolution imagery collected by drones to cloud servers for analysis and processing, thereby enabling the extraction of flood-inundated areas. However, floods often severely damage regional communication infrastructure, leading to communication link paralysis. Even if some communication capabilities are restored through satellite communication, temporary base stations, etc., the high deployment costs, long processing times, and limited bandwidth make it impossible to support the real-time transmission of massive amounts of high-resolution drone imagery. Consequently, the extraction of flooded areas cannot be completed in a timely manner, failing to meet the timeliness requirements of emergency rescue.

[0005] Based on this, the present invention proposes a real-time flood area estimation method and device based on large-model edge computing to solve the above-mentioned technical problems.

Summary of the Invention

[0006] This invention describes a method and apparatus for estimating real-time flood area based on large-model edge computing, which can improve the accuracy of real-time flood area estimation while meeting the timeliness requirements of emergency rescue.

[0007] According to a first aspect, the present invention provides a real-time flood area estimation method based on large-model edge computing, comprising:
acquiring the image data to be tested;
inputting the image data to be tested into the trained flood area estimation model to obtain sub-image patches;
determining the flood area estimation result based on multiple sub-image patches.

The flood area estimation model is a lightweight model deployed on edge nodes and is obtained by training a preset deep learning network. The deep learning network comprises a first shallow convolutional layer, a second shallow convolutional layer, a first deep convolutional layer, a second deep convolutional layer, a third deep convolutional layer, a feature alignment module, a feature fusion module, and a fully convolutional classification head, connected in sequence.

The first shallow convolutional layer performs initial downsampling on the input image to extract first shallow features describing the low-level edges and textures of the image. The second shallow convolutional layer further downsamples the first shallow features and refines the shallow detail features to obtain second shallow features. The first deep convolutional layer extracts mid-level global contextual semantic features from the second shallow features, capturing the mid-scale structural information of flood patches, to obtain the first deep feature. The second deep convolutional layer expands the feature channel dimension, strengthens deep semantic feature modeling of large-scale flood areas, and enlarges the effective receptive field of the network, to obtain the second deep feature. The third deep convolutional layer performs the deepest level of feature extraction in the network, constructs a global semantic representation of the flood scene, and outputs the third deep feature.

The feature alignment module normalizes the sizes of the first, second, and third deep features and performs cross-layer element-wise alignment and fusion to obtain temporal features. The feature fusion module fuses the temporal features with the second shallow features to fill in missing feature information and obtain fused features. The fully convolutional classification head upsamples the fused features step by step to restore the original image resolution and outputs the sub-image patch.
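The data flow through the network can be illustrated by tracing tensor shapes. The following is a minimal sketch only: the text does not state strides or channel widths, so the stride of 2 per stage, the channel widths, and the choice of target sizes for alignment and fusion are all assumptions introduced for illustration, as are the names `Feature`, `trace_backbone`, and `trace_head`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    channels: int
    height: int
    width: int


STAGES = ("shallow1", "shallow2", "deep1", "deep2", "deep3")


def trace_backbone(height, width, widths=(16, 32, 64, 128, 256), stride=2):
    """Shapes after the five convolutional stages (widths and stride are assumed)."""
    features = {}
    for name, channels in zip(STAGES, widths):
        height, width = height // stride, width // stride
        features[name] = Feature(name, channels, height, width)
    return features


def trace_head(features):
    """Shapes after alignment and fusion under the assumptions in the lead-in."""
    deep1, shallow2 = features["deep1"], features["shallow2"]
    # Assumed: deep2 and deep3 are resized and projected to deep1's shape so
    # that the three deep features can be combined element-wise.
    aligned = Feature("temporal", deep1.channels, deep1.height, deep1.width)
    # Assumed: the aligned feature is upsampled to shallow2's shape for fusion.
    fused = Feature("fused", shallow2.channels, shallow2.height, shallow2.width)
    return aligned, fused


features = trace_backbone(512, 512)
aligned, fused = trace_head(features)
for feature in [*features.values(), aligned, fused]:
    print(f"{feature.name:9s} {feature.channels:4d} x {feature.height} x {feature.width}")
```

Under these assumed strides the deepest feature map is 32 times smaller per side than the input, and the classification head would need to upsample the fused feature by a factor of 4 to return to the input resolution.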

[0008] According to a second aspect, the present invention provides a real-time flood area estimation device based on large-model edge computing, comprising:
an acquisition unit configured to acquire the image data to be tested;
a first data processing unit configured to input the image data to be tested into a trained flood area estimation model to obtain sub-image patches;
a second data processing unit configured to determine the flood area estimation result based on multiple sub-image patches.

The flood area estimation model is a lightweight model deployed on edge nodes and is obtained by training a preset deep learning network. The deep learning network comprises a first shallow convolutional layer, a second shallow convolutional layer, a first deep convolutional layer, a second deep convolutional layer, a third deep convolutional layer, a feature alignment module, a feature fusion module, and a fully convolutional classification head, connected in sequence.

The first shallow convolutional layer performs initial downsampling on the input image to extract first shallow features describing the low-level edges and textures of the image. The second shallow convolutional layer further downsamples the first shallow features and refines the shallow detail features to obtain second shallow features. The first deep convolutional layer extracts mid-level global contextual semantic features from the second shallow features, capturing the mid-scale structural information of flood patches, to obtain the first deep feature. The second deep convolutional layer expands the feature channel dimension, strengthens deep semantic feature modeling of large-scale flood areas, and enlarges the effective receptive field of the network, to obtain the second deep feature. The third deep convolutional layer performs the deepest level of feature extraction in the network, constructs a global semantic representation of the flood scene, and outputs the third deep feature.

The feature alignment module normalizes the sizes of the first, second, and third deep features and performs cross-layer element-wise alignment and fusion to obtain temporal features. The feature fusion module fuses the temporal features with the second shallow features to fill in missing feature information and obtain fused features. The fully convolutional classification head upsamples the fused features step by step to restore the original image resolution and outputs the sub-image patch.

[0009] Thirdly, embodiments of this specification also provide an electronic device, including a memory and a processor, wherein the memory stores a computer program, and when the processor executes the computer program, it implements the method described in any embodiment of this specification.

[0010] Fourthly, embodiments of this specification also provide a computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the methods described in any embodiment of this specification.

[0011] In the real-time flood area estimation method and apparatus based on large-model edge computing provided by this invention, image data to be tested is acquired; the image data is input into a trained flood area estimation model to obtain sub-image patches; and the flood area estimation result is determined based on multiple sub-image patches. The flood area estimation model is a lightweight network model deployed entirely on edge computing nodes, enabling local offline inference without depending on cloud computing power. It offers low latency, low computing power consumption, and adaptability to weak-network environments in emergency scenarios, meeting the timeliness requirements of flood disaster emergency rescue. The model is obtained through multiple rounds of iterative supervised training of a preset deep learning network in which the following are connected in sequence: a first shallow convolutional layer, a second shallow convolutional layer, a first deep convolutional layer, a second deep convolutional layer, a third deep convolutional layer, a feature alignment module, a feature fusion module, and a fully convolutional classification head.

The first shallow convolutional layer performs initial spatial downsampling on the raw input image, compressing the initial resolution while extracting basic visual information such as low-level edge contours and surface texture details, and outputs the first shallow feature layer. The second shallow convolutional layer further downsamples and compresses the first shallow feature layer, refining the shallow spatial detail features and improving the representation of ground-feature edges, and outputs the second shallow feature layer. The first deep convolutional layer, based on an embedded lightweight convolutional module, mines mid-level global contextual semantic information from the second shallow feature layer, accurately capturing the mid-scale structural and morphological features of flooded patches, and outputs the first deep feature layer. The second deep convolutional layer expands the feature channel dimension, enhances deep semantic modeling of large-scale contiguous flood areas, broadens the network's effective receptive field, and improves the representation of large-scale scene features, and outputs the second deep feature layer. The third deep convolutional layer completes the extraction of the network's deepest global features, constructs a complete semantic representation of the entire flood scene, mines global correlation information between ground features, and outputs the high-dimensional, abstract third deep feature layer.

The feature alignment module normalizes the spatial sizes of the first, second, and third deep features, completes cross-layer element-wise alignment and fusion of the multi-level, multi-receptive-field deep semantic features, and aggregates global context information to obtain unified temporal fusion features. The feature fusion module, using upsampled spatial restoration information, performs cross-scale complementary fusion of the deep temporal fusion features and the second shallow detail features, supplementing the edge detail lost in the deep semantics, resolving the imbalance between semantic information and spatial detail, and producing a complete global fusion feature. The fully convolutional classification head upsamples the fused features step by step to the original resolution of the image under test, classifies each pixel as flooded or not, and outputs the disaster segmentation sub-image patches.

Through complementary fusion of deep and shallow features and a lightweight module architecture, this network achieves high-speed inference under the limited computing power of edge nodes. It balances the real-time requirements of emergency rescue against the accuracy of flood boundary recognition, effectively improving the accuracy and robustness of real-time flood area estimation in field scenarios.

Attached Figure Description

[0012] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0013] Figure 1 is a flowchart of a real-time flood area estimation method based on large-model edge computing according to one embodiment;
Figure 2 is a schematic block diagram of a real-time flood area estimation device based on large-model edge computing according to one embodiment;
Figure 3 is a schematic diagram of the structure of a flood area estimation model according to one embodiment;
Figure 4 is a comparison chart of the segmentation results of various models according to one embodiment.

Detailed Implementation

[0014] The solution provided by the present invention will now be described with reference to the accompanying drawings.

[0015] Figure 1 shows a flowchart of a real-time flood area estimation method based on large-model edge computing according to one embodiment. It is understood that this method can be executed by any device, equipment, platform, or cluster of devices with computing and processing capabilities. As shown in Figure 1, the method includes:
Step 100: obtain the image data to be tested;
Step 102: input the image data to be tested into the trained flood area estimation model to obtain sub-image patches;
Step 104: determine the flood area estimation result based on multiple sub-image patches.

The flood area estimation model is a lightweight model deployed on edge nodes and is obtained by training a preset deep learning network. The deep learning network comprises a first shallow convolutional layer, a second shallow convolutional layer, a first deep convolutional layer, a second deep convolutional layer, a third deep convolutional layer, a feature alignment module, a feature fusion module, and a fully convolutional classification head, connected in sequence.

The first shallow convolutional layer performs initial downsampling on the input image to extract first shallow features describing the low-level edges and textures of the image. The second shallow convolutional layer further downsamples the first shallow features and refines the shallow detail features to obtain second shallow features. The first deep convolutional layer extracts mid-level global contextual semantic features from the second shallow features, capturing the mid-scale structural information of flood patches, to obtain the first deep feature. The second deep convolutional layer expands the feature channel dimension, strengthens deep semantic feature modeling of large-scale flood areas, and enlarges the effective receptive field of the network, to obtain the second deep feature. The third deep convolutional layer performs the deepest level of feature extraction, constructs a global semantic representation of the flood scene, and outputs the third deep feature.

The feature alignment module normalizes the sizes of the first, second, and third deep features and performs cross-layer element-wise alignment and fusion to obtain temporal features. The feature fusion module fuses the temporal features with the second shallow features, filling in missing feature information to obtain fused features. The fully convolutional classification head upsamples the fused features step by step to restore the original image resolution and outputs the sub-image patches.
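Steps 100 to 104 can be sketched as tiling the image, segmenting each tile, and converting the flooded-pixel count into ground area. This is an illustrative sketch, not the patented procedure: the text does not say how patches are cut or how the per-patch results are combined, so non-overlapping tiles and a simple pixel count are assumed here. The function names are hypothetical, and the default ground sampling distance of 3.38 cm/pixel is the example value quoted later in the description.

```python
def tile_windows(height, width, patch):
    """Non-overlapping (top, left, bottom, right) windows; edge tiles are clipped."""
    return [
        (top, left, min(top + patch, height), min(left + patch, width))
        for top in range(0, height, patch)
        for left in range(0, width, patch)
    ]


def flood_area_m2(patch_masks, gsd_m=0.0338):
    """Sum flooded pixels (value 1) over all patch masks and convert to square metres."""
    flooded_pixels = sum(value for mask in patch_masks for row in mask for value in row)
    return flooded_pixels * gsd_m ** 2


# Two toy 2x2 binary masks standing in for the model's sub-image patch outputs.
masks = [
    [[1, 0], [1, 1]],
    [[0, 0], [0, 0]],
]
print(len(tile_windows(5, 4, 2)), "windows")
print(f"{flood_area_m2(masks):.6f} m^2")
```

Because the tiles do not overlap, each image pixel is counted once. An overlapping tiling would need the overlaps merged before counting.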

[0016] In this embodiment, image data to be tested is acquired; the image data is input into a trained flood area estimation model to obtain sub-image patches; and the flood area estimation result is determined based on multiple sub-image patches. The flood area estimation model is a lightweight network model deployed entirely on edge computing nodes, enabling local offline inference without relying on cloud computing power. In field emergency scenarios it offers low latency, low computing power consumption, and adaptability to weak-network environments, meeting the timeliness requirements of flood disaster emergency rescue. The model is obtained through multiple rounds of iterative supervised training of a preset deep learning network in which the following are connected in sequence: a first shallow convolutional layer, a second shallow convolutional layer, a first deep convolutional layer, a second deep convolutional layer, a third deep convolutional layer, a feature alignment module, a feature fusion module, and a fully convolutional classification head.

The first shallow convolutional layer performs initial spatial downsampling on the raw input image, compressing the initial resolution while extracting basic visual information such as low-level edge contours and surface texture details, and outputs the first shallow feature layer. The second shallow convolutional layer further downsamples and compresses the first shallow feature layer, refining the shallow spatial detail features and improving the representation of ground-feature edges, and outputs the second shallow feature layer. The first deep convolutional layer, based on an embedded lightweight convolutional module, mines mid-level global contextual semantic information from the second shallow feature layer, accurately capturing the mid-scale structural and morphological features of flooded patches, and outputs the first deep feature layer. The second deep convolutional layer expands the feature channel dimension, enhances deep semantic modeling of large-scale contiguous flood areas, broadens the network's effective receptive field, and improves the representation of large-scale scene features, and outputs the second deep feature layer. The third deep convolutional layer completes the extraction of the network's deepest global features, constructs a complete semantic representation of the entire flood scene, mines global correlation information between ground features, and outputs the high-dimensional, abstract third deep feature layer.

The feature alignment module normalizes the spatial sizes of the first, second, and third deep features, completes cross-layer element-wise alignment and fusion of the multi-level, multi-receptive-field deep semantic features, and aggregates global context information to obtain unified temporal fusion features. The feature fusion module, using upsampled spatial restoration information, performs cross-scale complementary fusion of the deep temporal fusion features and the second shallow detail features, supplementing the edge detail lost in the deep semantics, resolving the imbalance between semantic information and spatial detail, and producing a complete global fusion feature. The fully convolutional classification head upsamples the fused features step by step to the original resolution of the image under test, classifies each pixel as flooded or not, and outputs the disaster segmentation sub-image patches.
This network achieves high-speed inference under the limited computing power of edge nodes through deep and shallow feature complementary fusion and lightweight module architecture design, taking into account the real-time requirements of emergency rescue and the accuracy of flood boundary recognition, effectively improving the accuracy and robustness of real-time flood area estimation in field scenarios.

[0017] In one embodiment of the present invention, the first deep convolutional layer, the second deep convolutional layer, and the third deep convolutional layer each include a receptive field harmonization module and a peak perception attention module. The receptive field harmonization module is used to enhance the model's global perception of connected water bodies and to alleviate the structural fragmentation and missed detections of flooded water bodies caused by insufficient local receptive fields. The peak perception attention module is used to suppress the attention shifts that occur in scenes with noisy backgrounds.

[0018] In this embodiment, the first, second, and third deep convolutional layers each include a receptive field harmonization module and a peak perception attention module. The receptive field harmonization module, built on a lightweight, medium-scale convolutional structure with a large receptive field, expands the network's effective spatial receptive field and enhances the model's global contextual awareness of large connected flood bodies. It exploits the spatial correlation between waterlogged patches, mitigating problems caused by the limited local receptive field of traditional small-kernel convolutions, such as structural fragmentation of flood bodies, missed detection of small waterlogged areas, and incomplete segmentation of contiguous water bodies. The peak perception attention module optimizes the distribution of attention weights using a pixel peak normalization mechanism. In remote sensing scenes with complex surface background noise and multiple interfering ground features, it constrains attention weight drift, locks onto the flood body as the target area, suppresses interference from non-target background features, and ensures stable deep semantic feature extraction and a clean target representation.

[0019] In one embodiment of the present invention, the calculation logic of the receptive field harmonization module is determined by the following formulas:

[0020]

[0021]

[0022] In the formulas, the terms denote, respectively: the 5×5 depthwise separable convolution operation; the intermediate feature map after spatial feature extraction; the input features; the output features; the projected features after high-dimensional mapping and feature selection; the global response normalization operation; the 1×1 convolution; the layer normalization operation; and the activation function.
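The formulas themselves do not survive in this text. One arrangement that is consistent with the terms listed above and with the inverted-bottleneck description in this embodiment is sketched below. It is a hedged reconstruction, not the patent's own notation: the symbols X_in, X_mid, X_proj, and X_out, the ordering of the operations, and the residual addition in the last line are all assumptions.

```latex
\begin{aligned}
X_{\mathrm{mid}}  &= \mathrm{DWConv}_{5\times 5}\!\left(X_{\mathrm{in}}\right) \\
X_{\mathrm{proj}} &= \mathrm{Conv}_{1\times 1}\!\Big(\mathrm{GRN}\big(\sigma\big(\mathrm{Conv}_{1\times 1}\big(\mathrm{LN}(X_{\mathrm{mid}})\big)\big)\big)\Big) \\
X_{\mathrm{out}}  &= X_{\mathrm{in}} + X_{\mathrm{proj}} \quad \text{(residual connection assumed)}
\end{aligned}
```

In this reading the first 1×1 convolution performs the fourfold channel expansion and the second projects back to the input width, which is what allows the element-wise addition in the last line.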

[0023] In this embodiment, the receptive field harmonization module introduces a 5×5 depthwise separable convolution. The spatial area covered by a 5×5 kernel is approximately 2.8 times that of a standard 3×3 kernel (25 versus 9 positions), enabling more complete coverage of flood-affected area features. Because the convolution is depthwise separable, the computational complexity of the module is significantly lower than that of traditional modules built from stacked large-kernel convolutions. From a remote sensing physics perspective, at a ground sampling distance (GSD) of 3.38 cm/pixel, a 5×5 kernel covers approximately 16.9 cm × 16.9 cm on the ground, effectively capturing the local geometric features of waterlogged patches in urban flooding. On the downsampled deep feature maps, the equivalent ground coverage of the 5×5 kernel expands further to approximately 5.4 m × 5.4 m (about 32 times the single-kernel footprint per side), sufficient to cover small waterlogged patches entirely. In addition, to enhance nonlinear representation capability, the module adopts an inverted bottleneck structure: a 1×1 convolution expands the feature channels by a factor of 4 for high-dimensional mapping, and layer normalization and global response normalization (GRN) are introduced to strengthen competition between channels. GRN computes the L2 norm of each channel's feature map and normalizes across channels, encouraging different channels to learn complementary feature representations.
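The cost argument and the GRN step can be made concrete with a small pure-Python sketch. The parameter counts ignore biases, and the channel count of 64 is an arbitrary example. The `grn` function follows the commonly used formulation (per-channel L2 norm divided by the mean norm across channels, with a residual term). The text does not give the exact GRN expression, so that form, and the `gamma`, `beta`, and `eps` parameters, are assumptions.

```python
import math


def standard_conv_params(c_in, c_out, k):
    """Weights in a dense k x k convolution (biases ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    return c_in * k * k + c_in * c_out


def grn(channels, gamma=1.0, beta=0.0, eps=1e-6):
    """Global response normalization over a list of 2D channel maps (assumed form)."""
    norms = [math.sqrt(sum(v * v for row in ch for v in row)) for ch in channels]
    mean_norm = sum(norms) / len(norms)
    scales = [n / (mean_norm + eps) for n in norms]
    # Each channel is rescaled by its relative norm and added back to itself.
    return [
        [[gamma * v * s + beta + v for v in row] for row in ch]
        for ch, s in zip(channels, scales)
    ]


dense = standard_conv_params(64, 64, 5)
separable = depthwise_separable_params(64, 64, 5)
print(f"5x5 dense: {dense} weights, depthwise separable: {separable} weights")
print(f"kernel area ratio 5x5 / 3x3: {25 / 9:.2f}")
print(grn([[[3.0, 4.0]], [[0.0, 0.0]]]))
```

In the toy GRN call the channel with the larger norm is amplified relative to the silent one, which is the competition between channels that the paragraph describes.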

[0024] In this embodiment, to find the optimal balance between receptive field coverage and edge inference efficiency, this invention compares and analyzes three kernel sizes: 3×3, 5×5, and 7×7. From a remote sensing physics perspective, under the condition of a GSD of 3.38 cm / pixel, the single ground coverage ranges corresponding to the 3×3, 5×5, and 7×7 convolution kernels are 10.1 cm × 10.1 cm, 16.9 cm × 16.9 cm, and 23.7 cm × 23.7 cm, respectively. The typical scale of urban flooding patches is usually between 20 cm and several meters. The coverage range of the 5×5 kernel matches the scale of the local geometric features of the edge of the flooding patch and can capture complete patch boundary information in a single convolution. However, the coverage range of the 3×3 kernel is too small, requiring multiple layers to establish cross-patch context associations, which easily leads to boundary breaks. Although the 7×7 kernel has a larger coverage range, in the TensorRT edge inference engine, the operator optimization support for 7×7 depthwise separable convolution is limited, and the actual deployment efficiency is often lower than theoretical expectations. Therefore, the 5×5 kernel represents the optimal balance between receptive field coverage, parameter overhead, and edge operator optimization maturity.
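The footprint figures in this comparison follow directly from kernel size times GSD. The short sketch below reproduces them. The `downsample` argument is a hypothetical addition for illustration: a factor of 32 reproduces the roughly 5.4 m equivalent coverage quoted in paragraph [0023], although the text does not state the factor explicitly.

```python
def kernel_footprint_cm(kernel, gsd_cm=3.38, downsample=1):
    """Side length on the ground, in cm, covered by one k x k kernel application."""
    return kernel * gsd_cm * downsample


for kernel in (3, 5, 7):
    side = kernel_footprint_cm(kernel)
    print(f"{kernel}x{kernel}: {side:.1f} cm x {side:.1f} cm")

# Equivalent coverage on a feature map assumed to be downsampled 32 times.
print(f"5x5 at 1/32 resolution: {kernel_footprint_cm(5, downsample=32) / 100:.1f} m")
```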

[0025] In one embodiment of the present invention, the computational logic of the peak perception attention module is constructed using the following formula:

[0026]

[0027]

[0028] In the formulas, the terms denote, in order: the energy value of each neuron; t, the feature value of the current target neuron in the input feature map; the channel mean; the channel variance; λ, the coefficient that balances background-noise suppression against preservation of water-body detail; P, the peak factor; the pixel feature value at the current spatial location; X_max, the largest feature value in the channel space; the very small positive constant introduced to prevent division by zero; the activation function; and the final output feature map after modulation and enhancement by the peak perception attention module.
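The two formulas are not reproduced in this text. The sketch below is one reconstruction consistent with the terms listed above and with the description of the inner and outer activation functions in paragraph [0031]. The closed-form energy expression is the one commonly used in parameter-free energy-based attention and is an assumption here, as are the symbols E, μ, σ_c², x, ε, and X_out and the choice of a sigmoid for the activation σ(·). Only t, λ, P, and X_max appear in the surviving text.

```latex
\begin{aligned}
E &= \frac{4\left(\sigma_c^{2} + \lambda\right)}{(t - \mu)^{2} + 2\sigma_c^{2} + 2\lambda}
  && \text{(energy of the target neuron; assumed closed form)} \\
P &= \frac{x}{X_{\max} + \varepsilon}
  && \text{(peak factor)} \\
X_{\mathrm{out}} &= X \odot \sigma\!\Big(\sigma\!\big(1/E\big) \odot P\Big)
  && \text{(inner and outer activation, then modulation)}
\end{aligned}
```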

[0029] In this embodiment, the value of λ is determined by grid search on the validation set, and experiments show that when λ = … the model achieves the best balance between suppressing background noise and preserving water-body details. Too small a λ results in weak suppression and residual background noise; too large a λ results in over-suppression and weakens the response at water-body edges. The smaller the energy value, the greater the difference between the neuron and the surrounding background, and the higher its saliency. However, the energy function alone cannot distinguish highly reflective interfering objects: white car roofs or light-colored buildings, for example, may also exhibit high response values in the feature space. The peak enhancement branch therefore introduces a peak factor P that targets the high-response characteristics of water bodies. Its value is grounded in the spectral physical properties of water bodies in remote sensing images: it is computed as the ratio of the current pixel value to the maximum value X_max in the channel space, which explicitly increases the weight of high-confidence water-body areas.

[0030] In this embodiment, the very small positive constant introduced to prevent division by zero is 1×10⁻⁵. This value is much smaller than the typical range of the feature map (0 to 100), so its influence on the peak factor is negligible (relative error < 0.001%); it serves only to ensure numerical stability. The physical significance of the peak enhancement branch is as follows: in the visible band, the reflectivity of clean water is usually lower than that of vegetation and soil, but in turbid floodwater the enhanced spectral reflectance caused by suspended sediment forms local peaks in the feature space. By computing the ratio of the current feature value to the channel maximum, the model can focus with high confidence on the core area of the water body. This design based on physical priors complements general attention mechanisms that rely solely on statistical differences, enhancing the model's robustness against interference in complex backgrounds.
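The effect of the stabilizing constant can be checked numerically. In the sketch below the function names are illustrative, and the peak factor is assumed to take the form x / (X_max + ε), consistent with the ratio described in the text. The relative error of that ratio against the exact x / X_max is then ε / (X_max + ε), which stays below 0.001% whenever the channel maximum is at least about 1.

```python
def peak_factor(x, x_max, eps=1e-5):
    """Ratio of the current feature value to the channel maximum, stabilized by eps."""
    return x / (x_max + eps)


def relative_error(x_max, eps=1e-5):
    """Relative deviation of the stabilized ratio from the exact ratio x / x_max."""
    return eps / (x_max + eps)


for x_max in (1.0, 10.0, 100.0):
    print(f"x_max={x_max:6.1f}  relative error={relative_error(x_max):.2e}")

# An all-zero channel yields 0 instead of raising ZeroDivisionError.
print(peak_factor(0.0, 0.0))
```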

[0031] In this embodiment, the reciprocal of each neuron's energy value is taken in order to map energy to saliency: neurons with lower energy values (i.e., greater differences from the background) receive higher response weights after the reciprocal is taken. P is the peak factor. The inner activation function first normalizes the reciprocal of the energy to the (0,1) interval, and the result is then multiplied element-wise with the peak factor P to achieve joint calibration of the energy function and the peak factor; the outer activation function performs a second normalization on the joint result to generate the final attention mask. The final output is the feature map after modulation and enhancement by the peak-aware attention module. Without increasing the number of parameters, this module combines an energy function based on spatial suppression theory with a peak factor based on the spectral physical properties of water bodies, effectively suppressing non-water background and accurately locating the core region of high-response flooded water bodies amid complex background noise. The range of the reciprocal of the energy function is... The peak factor P has a range of (0,1]. Multiplication enables bidirectional modulation of the feature response: when the reciprocal of the energy is large (significant difference between the neuron and the background) and the peak factor is close to 1 (the current pixel is at the channel maximum), the joint factor approaches 1 and the feature is preserved; when the reciprocal of the energy is small (insignificant difference between the neuron and the background) or the peak factor approaches 0 (the current pixel is not in a high-response region), the joint factor approaches 0 and the feature is suppressed.
This mechanism is similar to gain control in biological vision systems: the energy suppression branch is equivalent to a "background suppression gain," and the peak enhancement branch is equivalent to a "target enhancement gain." The two are combined through multiplication to achieve nonlinear interaction, enabling the model to lock accurately onto the core region of the high-response water body in complex backgrounds. In contrast, an additive combination achieves only linear superposition and cannot form this "AND gate" type of selective enhancement, which makes it prone to attention shift in scenes with strong background noise.
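The two-branch calibration described above can be illustrated with a minimal NumPy sketch. The closed-form energy expression is not reproduced in this text, so the sketch assumes a SimAM-style energy function; the function name `peak_aware_attention`, the default `lam` value, and the use of sigmoid for both activation functions are likewise assumptions made for illustration and should not be read as the exact formulas of this embodiment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def peak_aware_attention(x, lam=1e-4, eps=1e-5):
    """Apply a parameter-free attention mask to x of shape (C, H, W).

    Assumes non-negative activations so the peak factor lies in [0, 1).
    lam is a placeholder; the embodiment selects its value by grid search.
    """
    # Energy suppression branch (assumed SimAM-style closed form):
    # a low energy means the neuron differs strongly from its channel background.
    mu = x.mean(axis=(1, 2), keepdims=True)    # channel mean
    var = x.var(axis=(1, 2), keepdims=True)    # channel variance
    energy = 4.0 * (var + lam) / ((x - mu) ** 2 + 2.0 * var + 2.0 * lam)

    # Peak enhancement branch: ratio of each value to the channel maximum.
    x_max = x.max(axis=(1, 2), keepdims=True)
    peak = x / (x_max + eps)

    # Inner sigmoid normalizes 1/energy, the product applies the "AND gate",
    # and the outer sigmoid normalizes the joint result into the final mask.
    mask = sigmoid(sigmoid(1.0 / energy) * peak)
    return x * mask, mask
```

In this sketch a pixel receives the largest mask value only when it is both distinctive within its channel and close to the channel maximum, which mirrors the multiplicative interaction of the two gains.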

[0032] In one embodiment of the present invention, the receptive field harmonic module includes a first input layer, a 5×5 mesoscale depth separable convolutional layer, a normalization layer, a first 1×1 convolutional layer, a global response normalization layer, a second 1×1 convolutional layer and a first output layer connected in sequence. The peak-aware attention module includes a second input layer, a first pooling branch, a sigmoid activation function, a second output layer, and a second pooling branch connected to the second input layer and the sigmoid activation function in sequence. The first pooling branch includes a global average pooling layer, a global channel statistics layer, and a global energy calculation layer connected in sequence. The second pooling branch includes a global max pooling layer, a spatial global maximum extraction layer, and a peak factor layer connected in sequence.

[0033] In this embodiment, the receptive field harmonic module has a first input layer, a 5×5 mesoscale depth-separable convolutional layer, a normalization layer, a first 1×1 convolutional layer, a global response normalization layer, a second 1×1 convolutional layer, and a first output layer, all cascaded together. The peak perception attention module has a first pooling branch and a second pooling branch arranged in parallel, and is cascaded together with a second input layer, a dual-branch fusion structure, a sigmoid activation function, and a second output layer. The first pooling branch is composed of a global average pooling layer, a global channel statistics layer, and a global energy calculation layer connected in sequence, and is used to complete the extraction of global semantic energy features. The second pooling branch is composed of a global max pooling layer, a spatial global maximum value extraction layer, and a peak factor layer connected in sequence, and is used to achieve target pixel peak normalization and edge feature enhancement.
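The cascade of the receptive field harmonization module can be sketched in NumPy as follows. This is only an illustration of the layer order listed above: the global response normalization is assumed to take the ConvNeXt V2 form, the activation is assumed to be GELU and to sit between the first 1×1 convolution and the normalization, and all function names and weight shapes are hypothetical.

```python
import numpy as np

def depthwise_conv5x5(x, kernels):
    """Per-channel 5x5 cross-correlation with zero padding; x is (C, H, W)."""
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (2, 2), (2, 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(5):
        for j in range(5):
            out += kernels[:, i, j][:, None, None] * padded[:, i:i + h, j:j + w]
    return out

def layer_norm(x, eps=1e-6):
    """Normalize each spatial position across the channel axis."""
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def conv1x1(x, weight):
    """Pointwise convolution; weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def gelu(x):
    """Tanh approximation of GELU (the activation choice is an assumption)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def grn(x, gamma=1.0, beta=0.0, eps=1e-6):
    """Global response normalization in the ConvNeXt V2 form (assumed)."""
    gx = np.sqrt((x ** 2).sum(axis=(1, 2), keepdims=True))  # per-channel L2 norm
    nx = gx / (gx.mean(axis=0, keepdims=True) + eps)        # relative channel response
    return gamma * (x * nx) + beta + x

def rfh_block(x, dw_kernels, w_expand, w_project):
    """5x5 depthwise conv -> norm -> 1x1 conv -> GRN -> 1x1 conv."""
    mid = depthwise_conv5x5(x, dw_kernels)
    proj = grn(gelu(conv1x1(layer_norm(mid), w_expand)))
    return conv1x1(proj, w_project)
```

The depthwise convolution handles spatial mixing with one kernel per channel, while the two 1×1 convolutions handle channel mixing, which keeps the parameter count low for edge deployment.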

[0034] In one embodiment of the present invention, the flood area estimation result is determined based on multiple sub-image patches, including: The coordinates of the sub-image blocks are corrected to obtain the error-corrected pixel coordinates; Multiple error-corrected pixel coordinates are mapped to the geographic coordinate system to generate corresponding spatial vector polygons; All polygons are subjected to union fusion and boundary smoothing, and noisy regions with areas smaller than a preset threshold are removed to obtain the flood area estimation result.

[0035] In this embodiment, firstly, to address the pixel offset errors generated during segmentation inference and image cropping of each sub-image patch, a preset coordinate correction algorithm is used to accurately correct the pixel coordinates of the sub-image patches, eliminating deviations caused by cropping offsets and inference distortions, and obtaining accurate pixel coordinates after error correction. Subsequently, all error-corrected pixel coordinates are mapped one by one to the UTM geographic coordinate system, converting the pixel-level segmentation results into real ground spatial coordinates, generating spatial vector polygons that accurately correspond to the flooded area. Finally, all generated spatial vector polygons are subjected to union fusion processing to eliminate overlapping areas caused by sub-image patch stitching. Simultaneously, the boundaries of the fused polygons are smoothed and optimized, removing small noise areas (such as isolated artifacts and mis-segmented small patches) with areas smaller than a preset threshold. Finally, accurate and reliable flooded area estimation results are obtained through vector polygon area calculation.
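The final noise removal and area calculation steps can be illustrated with a minimal Python sketch. It assumes that the polygons have already been merged by a geometry library, that each polygon is a simple ring of (easting, northing) vertices in meters, and that boundary smoothing is handled elsewhere; the function names and the threshold are hypothetical.

```python
def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula.

    vertices: list of (easting, northing) pairs in meters, in ring order.
    """
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def flood_area(polygons, min_area_m2):
    """Drop polygons smaller than the threshold and sum the remaining areas."""
    kept = [p for p in polygons if polygon_area(p) >= min_area_m2]
    total = sum(polygon_area(p) for p in kept)
    return total, kept
```

For example, a 10 m × 10 m square and a 0.5 m² triangular speck with a 1 m² threshold yield a total of 100 m² with one polygon kept.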

[0036] In one embodiment of the present invention, the error-corrected pixel coordinates are determined by the following formula:

[0037]

[0038]

[0039]

[0040] In the formulas, u is the abscissa and v is the ordinate of the global pixel coordinates restored to the full-size image coordinate system; c0 is the column offset and r0 is the row offset; GSD is the pixel ground resolution of the UAV imagery; X_world and Y_world are the abscissa and ordinate of the pixel coordinates after error correction; the rotation matrix R rotates the offset vector in the local image coordinate system counterclockwise around the origin by the yaw angle ψ; and Xc and Yc are the abscissa and ordinate of the image center point in the Universal Transverse Mercator projection coordinate system. The remaining quantities are the abscissa and ordinate of the pixel within the sub-image patch, the projection zone with the smallest projection error after conversion, the longitude of the center point of the original image, the physical ground offsets of a pixel relative to the image center in the east and north directions, and the width and height of the original image.

[0041] In this embodiment, since each sub-image patch is cropped from a different spatial location in the full-size image, the origin of its internal pixel coordinates is the upper left corner of the patch, not the upper left corner of the full-size image. Therefore, based on the cropping start position of each patch in the full-size image (row offset r0 and column offset c0), the local pixel coordinates within the patch must be restored to the global pixel coordinates (u, v) in the full-size image coordinate system.
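The restoration from patch-local to global pixel coordinates amounts to adding the cropping offsets, as in the following minimal sketch. The function and parameter names are hypothetical, and the pairing of the abscissa with the column offset and the ordinate with the row offset is an assumption based on the definitions given here.

```python
def to_global_pixel(x_local, y_local, r0, c0):
    """Map patch-local pixel coordinates to full-size image coordinates.

    x_local, y_local: column and row index inside the sub-image patch.
    r0, c0: row and column offset of the patch's upper left corner.
    """
    u = x_local + c0  # abscissa shifts by the column offset
    v = y_local + r0  # ordinate shifts by the row offset
    return u, v
```

A pixel at local position (10, 20) in a patch cropped at r0 = 512, c0 = 1024 therefore maps to (u, v) = (1034, 532).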

[0042] The center point coordinates in the raw XMP metadata of UAV imagery are WGS-84 coordinates expressed in latitude and longitude, which cannot be directly used for accurate calculation of planar area. Therefore, it is necessary to convert the coordinate system to the Universal Transverse Mercator projection coordinate system, which uses meters as the unit. The UTM projection divides the globe into 60 projection zones, each 6° wide, based on longitude. Each zone uses its central meridian as the projection reference to eliminate errors caused by curvature changes during projection. The projection zone with the smallest projection error after conversion can be calculated using the following formula:

[0043] In the formula, the input is the longitude of the center point of the original image. After projection transformation, the UTM coordinates of the center point of the original image are (Yc, Xc), in meters.
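For illustration, the standard rule for choosing a 6° UTM zone from a longitude can be sketched as follows. This is the commonly used zone formula, which is assumed here to correspond to the zone selection described above; the function names are hypothetical, and the special zone exceptions around Norway and Svalbard are ignored.

```python
import math

def utm_zone(lon_deg):
    """Return the UTM zone number (1 to 60) for a longitude in degrees."""
    zone = int(math.floor((lon_deg + 180.0) / 6.0)) + 1
    return min(max(zone, 1), 60)  # longitude 180 falls back into zone 60

def central_meridian(zone):
    """Central meridian of a UTM zone, in degrees east."""
    return 6 * zone - 183
```

An image centered at longitude 116.4°E, for example, falls in zone 50, whose central meridian is 117°E.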

[0044] After obtaining the original image center UTM coordinates (Yc, Xc), it is necessary to simultaneously map the coordinates of each sub-tile to the same coordinate system. A local coordinate system is established with the image center as the origin, and pixel offsets are converted into ground physical offsets through GSD scaling. GSD is the pixel ground resolution of the UAV image (unit: meters / pixel), which physically represents the actual ground distance corresponding to a single pixel in the image plane. According to the principle of photogrammetric collinearity equations, ignoring elevation errors caused by terrain undulations, there is a linear proportional relationship between image point coordinates and ground point coordinates. Therefore, the geographic offset of a pixel relative to the image center can be expressed as:

[0045] In the formula, the two computed quantities represent the physical offsets of the pixel relative to the image center in the east and north directions, respectively, and (u,v) are the pixel coordinates in the full-size image coordinate system. This transformation is based on the principle of similar triangles under the pinhole imaging model. It maps discrete pixel coordinates to continuous geographic space, thus establishing a geometrically consistent physical benchmark for subsequent area measurement.
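The GSD scaling can be sketched as below. The exact expression is not reproduced in this text, so the sketch assumes that the image center lies at (W/2, H/2) and that the row axis points downward, which flips the sign of the north offset; the function name and these conventions are assumptions.

```python
def ground_offset(u, v, width, height, gsd):
    """Convert global pixel coordinates to ground offsets from the image center.

    gsd is the ground sampling distance in meters per pixel.
    Returns (d_east, d_north) in meters, before any yaw correction.
    """
    d_east = (u - width / 2.0) * gsd
    d_north = (height / 2.0 - v) * gsd  # image rows grow downward
    return d_east, d_north
```

With a 4000 × 3000 image and a GSD of 0.05 m per pixel, the upper right corner lies about 100 m east and 75 m north of the image center in the image-aligned frame.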

[0046] However, the above mapping only considers translation and scaling, neglecting the influence of the UAV's yaw angle on the direction of the coordinate mapping. In actual flight, the heading angle (yaw angle) is the angle between the aircraft's longitudinal axis and geographic north, with clockwise as positive (0°~360°). When the UAV flies in a direction other than true north, there is a rotational deviation between the image coordinate system and the geographic coordinate system. Without correction, the same ground feature will show a systematic orientation misalignment in images with different headings, leading to spatial topology errors during subsequent polygon fusion. According to the principle of rigid body transformation in two-dimensional space, converting the local offset of a pixel in the image plane into absolute coordinates in the geographic coordinate system requires a rotation followed by a translation. Let the yaw angle be ψ (with true north as 0° and clockwise as positive); the geometric meaning of the rotation is then to rotate the offset vector in the local image coordinate system counterclockwise around the origin by the angle ψ so that it aligns with the geographic coordinate system. The rotation matrix is derived from the basis transformation of two-dimensional rotation: the projections of the basis vectors of the geographic coordinate system onto the local image coordinate system are [cos ψ, -sin ψ] and [sin ψ, cos ψ]. Therefore, a rotation matrix from local image coordinates to geographic coordinates can be constructed. Its mathematical form is as follows:

[0047] The rotation matrix R satisfies orthogonality (R⁻¹ = Rᵀ) and has determinant det(R) = 1, which ensures that distances and angles are preserved by the transformation, i.e., no geometric distortion occurs. In the formula, X_world and Y_world represent the pixel coordinates after yaw-angle error correction, and (Xc, Yc) are the coordinates (in meters) of the image center point in the Universal Transverse Mercator (UTM) coordinate system, obtained by projection transformation of the WGS84 latitude and longitude coordinates extracted from the image XMP metadata. This correction method is based on the exterior orientation element theory in photogrammetry. It ensures the consistency of spatial references among multiple images and provides an accurate geometric registration basis for area fusion.
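The rotation followed by translation can be sketched as follows. The matrix itself is not reproduced in this text, so the sketch builds it from the row vectors [cos ψ, -sin ψ] and [sin ψ, cos ψ] given in the derivation, and assumes that X is the easting and Y the northing; whether the sine terms carry these signs in a given pipeline depends on how the yaw sign and image axes are defined, so the function below is an assumption-laden illustration rather than the embodiment's formula.

```python
import math

def to_utm(d_east, d_north, yaw_deg, x_c, y_c):
    """Rotate a ground offset by the yaw angle and translate to UTM coordinates.

    d_east, d_north: offsets from the image center in meters (image-aligned).
    x_c, y_c: UTM coordinates of the image center in meters.
    """
    psi = math.radians(yaw_deg)
    cos_p, sin_p = math.cos(psi), math.sin(psi)
    # Rows of the rotation matrix: [cos, -sin] and [sin, cos].
    x_world = x_c + cos_p * d_east - sin_p * d_north
    y_world = y_c + sin_p * d_east + cos_p * d_north
    return x_world, y_world
```

Because the matrix is orthogonal with determinant 1, the distance between the image center and the transformed point equals the length of the original offset vector, which can be checked against a known ground control point before the convention is relied upon.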

[0048] As shown in Figure 3, in this embodiment the two blocks labeled 1 are, from left to right, the first shallow convolutional layer and the second shallow convolutional layer, and the three blocks labeled 2 are, from left to right, the first, second, and third deep convolutional layers. "add" is the feature alignment module, "FMM" is the feature fusion module, and "FCN" is the fully convolutional classification head. The input image is processed sequentially from Stage 1 to Stage 5, with no parallel branches or multi-path replication. As a result, only a single feature-map memory access overhead is incurred, which greatly reduces the high memory bandwidth overhead that dual-path parallelism imposes on edge devices in the multi-branch paradigm.

[0049] As shown in Figure 4, this embodiment presents the segmentation results of each model qualitatively. From left to right, the columns are the test image, the label, BiSeNetV1, BiSeNetV2, DDRNet, PIDNet, GCNet, STDC, and LFE-Net (the flood area estimation model proposed in this invention). Scenario 1 (narrow bifurcated flood body): as shown in row 1, because the target is extremely thin, PIDNet-S, GCNet-S, and STDC detect part of the water body but produce severely bloated, blurry, and discontinuous edges. In contrast, the extraction result of LFE-Net is closest to the ground-truth label, restoring the continuous topological structure of the bifurcation while accurately focusing on the flood body area. This shows that the differentiated channel pruning and feature fusion mechanism designed in this invention can effectively preserve shallow spatial details while reducing the amount of computation, ensuring the model's high-resolution geometric representation capability when processing small-area flood bodies. Scenario 2 (large-area connected water body): as shown in row 2, when the water body extends downward to the edge of the image, BiSeNetV2 exhibits severe missed detections and truncation and fails to segment the bottom of the water body completely; DDRNet-23-S, PIDNet-S, and GCNet also produce irregular noise and holes at this boundary. LFE-Net, on the other hand, extracts the complete bottom boundary accurately and smoothly, verifying the effectiveness of the large receptive field design in the receptive field harmonization module: expanding the receptive field avoids the fragmentation of large-scale targets caused by an insufficient local receptive field.
Scenarios 3-5 (complex background interference): at the junction of building shadows and flooded water bodies (rows 3 and 4) and in areas obscured by dense vegetation (row 5), DDRNet-23-S, PIDNet-S, and GCNet-S are prone to misjudgment under strong interference, with the water body edges showing jagged erosion and even isolated false detection patches. LFE-Net consistently maintains smooth edges and closely follows the real boundaries at these complex junctions, verifying that the peak-aware attention module can focus on the high-response areas of flooded water bodies under complex background interference and thereby suppress the erosion of segmentation results by interfering signals such as shadows and vegetation. However, in row 3, neither LFE-Net nor any of the comparison models extracts the extremely shallow water area covered by building shadows. This may be because UWD contains only visible light information, which leaves a perception blind spot in scenes with highly confused spectral features. The spectral reflectance characteristics of extremely shallow water and wet road surfaces are very similar, making them difficult to distinguish using a single optical modality.

[0050] The foregoing has described specific embodiments of the invention. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps described in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired results. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0051] According to another embodiment, the present invention provides a real-time flood area estimation device based on large model edge computing. Figure 2 shows a schematic block diagram of the device according to one embodiment. It will be understood that this device can be implemented by any apparatus, equipment, platform, or cluster of devices with computing and processing capabilities. As shown in Figure 2, the device includes an acquisition unit 200, a first data processing unit 202, and a second data processing unit 204. The main functions of each component are as follows: the acquisition unit 200 is configured to acquire image data to be tested; the first data processing unit 202 is configured to input the image data to be tested into a trained flood area estimation model to obtain sub-image patches; the second data processing unit 204 is configured to determine the flood area estimation result based on multiple sub-image patches. The flood area estimation model is a lightweight model deployed on edge nodes and is trained on a preset deep learning network. The deep learning network comprises a first shallow convolutional layer, a second shallow convolutional layer, a first deep convolutional layer, a second deep convolutional layer, a third deep convolutional layer, a feature alignment module, a feature fusion module, and a fully convolutional classification head, all connected sequentially. The first shallow convolutional layer performs initial downsampling on the input image to extract first shallow features of the image's low-level edges and textures. The second shallow convolutional layer further downsamples the first shallow features and refines the shallow detail features to obtain second shallow features.
The first deep convolutional layer extracts mid-level global contextual semantic features from the second shallow features, captures mid-scale structural information of flood patches, and obtains the first deep feature. The second deep convolutional layer is used to expand the feature channel dimension, strengthen the deep semantic feature modeling of large-scale flood areas, expand the effective receptive field of the network, and obtain the second deep feature. The third deep convolutional layer is used to complete the deepest level of feature extraction in the network, construct a global semantic representation of the flood scene, and output the third deep feature. The feature alignment module is used to normalize the sizes of the first, second, and third deep features and perform cross-layer element-wise alignment and fusion to obtain temporal features. The feature fusion module is used to fuse the temporal features with the second shallow features to fill in the missing feature information and obtain fused features. The fully convolutional classification head is used to upsample the fused features step by step, restore them to the original image resolution, and output the sub-image patches.

[0052] In one embodiment of the present invention, the first deep convolutional layer, the second deep convolutional layer, and the third deep convolutional layer each include a receptive field harmonization module and a peak perception attention module; The receptive field harmonization module is used to enhance the model's global perception of connected water bodies and alleviate the structural fractures and missed detections of flooded water bodies caused by insufficient local receptive fields. The peak perception attention module is used to suppress attention shifts that occur in noisy background scenes.

[0053] In one embodiment of the present invention, the calculation logic of the receptive field harmonic module is constructed using the following formula:

[0054]

[0055]

[0056] In the formulas, the symbols denote, respectively: the 5×5 depthwise separable convolution operation; the intermediate feature map after spatial feature extraction; the input features; the output features; the projected features after high-dimensional mapping and feature selection; the global response normalization operation; the 1×1 convolution; the layer normalization operation; and the activation function.

[0057] In one embodiment of the present invention, the calculation logic of the peak perception attention module is constructed by the following formula:

[0058]

[0059]

[0060] In the formulas, the symbols denote, respectively: the energy value of each neuron; t, the feature value of the current target neuron in the input feature map; the channel mean; the channel variance; λ, the coefficient that balances suppressing background noise against preserving water body details; P, the peak factor; the pixel feature value at the current spatial location; X_max, the largest feature value in the channel space; the very small positive constant introduced to prevent division by zero; the activation function; and the final output feature map after modulation and enhancement by the peak perception attention module.

[0061] In one embodiment of the present invention, the receptive field harmonic module includes a first input layer, a 5×5 mesoscale depth separable convolutional layer, a normalization layer, a first 1×1 convolutional layer, a global response normalization layer, a second 1×1 convolutional layer, and a first output layer connected in sequence. The peak perception attention module includes a second input layer, a first pooling branch, a sigmoid activation function, a second output layer, and a second pooling branch connected to the second input layer and the sigmoid activation function in sequence. The first pooling branch includes a global average pooling layer, a global channel statistics layer, and a global energy calculation layer connected in sequence. The second pooling branch includes a global max pooling layer, a spatial global maximum extraction layer, and a peak factor layer connected in sequence.

[0062] In one embodiment of the present invention, the second data processing unit 204 is configured to perform the following operation: correct the coordinates of the sub-image block to obtain the error-corrected pixel coordinates; Multiple error-corrected pixel coordinates are mapped to the geographic coordinate system to generate corresponding spatial vector polygons; All polygons are subjected to union fusion and boundary smoothing processing to remove noise regions with an area smaller than a preset threshold, thus obtaining the flood area estimation result.

[0063] In one embodiment of the present invention, the error-corrected pixel coordinates are determined by the following formula:

[0064]

[0065]

[0066]

[0067] In the formulas, u is the abscissa and v is the ordinate of the global pixel coordinates restored to the full-size image coordinate system; c0 is the column offset and r0 is the row offset; GSD is the pixel ground resolution of the UAV imagery; X_world and Y_world are the abscissa and ordinate of the pixel coordinates after error correction; the rotation matrix R rotates the offset vector in the local image coordinate system counterclockwise around the origin by the yaw angle ψ; and Xc and Yc are the abscissa and ordinate of the image center point in the Universal Transverse Mercator projection coordinate system. The remaining quantities are the abscissa and ordinate of the pixel within the sub-image patch, the projection zone with the smallest projection error after conversion, the longitude of the center point of the original image, the physical ground offsets of a pixel relative to the image center in the east and north directions, and the width and height of the original image.

[0068] According to another embodiment, a computer-readable storage medium is also provided, on which a computer program is stored; when executed in a computer, the program causes the computer to perform the method described in conjunction with Figure 1.

[0069] According to another embodiment, an electronic device is also provided, including a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, it implements the method described in conjunction with Figure 1.

[0070] The various embodiments in this invention are described in a progressive manner. Similar or identical parts between embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. In particular, the apparatus embodiments are basically similar to the method embodiments, so the description is relatively simple; relevant parts can be referred to the descriptions of the method embodiments.

[0071] Those skilled in the art will recognize that, in one or more of the examples above, the functions described in this invention can be implemented using hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.

[0072] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above description is only a specific embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made on the basis of the technical solution of the present invention should be included within the scope of protection of the present invention.

Claims

1. A real-time flood area estimation method based on large-model edge computing, characterized in that it includes: acquiring the image data to be tested; inputting the image data to be tested into a trained flood area estimation model to obtain sub-image patches; and determining the flood area estimation result based on multiple sub-image patches; wherein the flood area estimation model is a lightweight model deployed on edge nodes; the flood area estimation model is trained on a preset deep learning network; the deep learning network comprises a first shallow convolutional layer, a second shallow convolutional layer, a first deep convolutional layer, a second deep convolutional layer, a third deep convolutional layer, a feature alignment module, a feature fusion module, and a fully convolutional classification head, all connected sequentially. The first shallow convolutional layer performs initial downsampling on the input image to extract first shallow features of the image's low-level edges and textures. The second shallow convolutional layer further downsamples the first shallow features and refines the shallow detail features to obtain second shallow features. The first deep convolutional layer extracts mid-level global contextual semantic features from the second shallow features, captures mid-scale structural information of flood patches, and obtains the first deep feature. The second deep convolutional layer expands the feature channel dimension, strengthens deep semantic feature modeling of large-scale flood areas, expands the network's effective receptive field, and obtains the second deep feature. The third deep convolutional layer completes the deepest level of feature extraction, constructs a global semantic representation of the flood scene, and outputs the third deep feature. The feature alignment module performs size normalization and cross-layer element-wise alignment and fusion on the first, second, and third deep features to obtain temporal features.
The feature fusion module fuses the temporal features with the second shallow features, filling in missing feature information to obtain fused features. The fully convolutional classification head is used to upsample the fused features step by step, restore them to the original image resolution, and output the sub-image patches.

2. The method according to claim 1, characterized in that the first deep convolutional layer, the second deep convolutional layer, and the third deep convolutional layer each include a receptive field harmonization module and a peak perception attention module; the receptive field harmonization module is used to enhance the model's global perception of connected water bodies and alleviate the structural fractures and missed detections of flooded water bodies caused by insufficient local receptive fields; the peak perception attention module is used to suppress attention shifts that occur in noisy background scenes.

3. The method according to claim 2, characterized in that the computational logic of the receptive field harmonization module is constructed using the following formula, in which the symbols denote, respectively: the 5×5 depthwise separable convolution operation; the intermediate feature map after spatial feature extraction; the input features; the output features; the projected features after high-dimensional mapping and feature selection; the global response normalization operation; the 1×1 convolution; the layer normalization operation; and the activation function.

4. The method according to claim 2, characterized in that the computational logic of the peak perception attention module is constructed using the following formula, in which the symbols denote, respectively: the energy value of each neuron; t, the feature value of the current target neuron in the input feature map; the channel mean; the channel variance; λ, the coefficient that balances suppressing background noise against preserving water body details; P, the peak factor; the pixel feature value at the current spatial location; X_max, the largest feature value in the channel space; the very small positive constant introduced to prevent division by zero; the activation function; and the final output feature map after modulation and enhancement by the peak perception attention module.

5. The method according to claim 2, characterized in that the receptive field harmonization module includes a first input layer, a 5×5 mesoscale depthwise separable convolutional layer, a normalization layer, a first 1×1 convolutional layer, a global response normalization layer, a second 1×1 convolutional layer, and a first output layer connected in sequence. The peak perception attention module includes a second input layer, a first pooling branch, a sigmoid activation function, a second output layer, and a second pooling branch connected to the second input layer and the sigmoid activation function in sequence. The first pooling branch includes a global average pooling layer, a global channel statistics layer, and a global energy calculation layer connected in sequence. The second pooling branch includes a global max pooling layer, a spatial global maximum extraction layer, and a peak factor layer connected in sequence.

6. The method according to claim 1, characterized in that determining the flood area estimation result based on multiple sub-image patches includes: correcting the coordinates of each sub-image patch to obtain error-corrected pixel coordinates; mapping the error-corrected pixel coordinates to the geographic coordinate system to generate corresponding spatial vector polygons; and performing union fusion and boundary smoothing on all polygons and removing noise regions whose area is smaller than a preset threshold, thereby obtaining the flood area estimation result.
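
The final step of this claim, removing noise regions below an area threshold and reporting the remaining area, can be sketched as follows. This is a simplified illustration that assumes the polygons are already non-overlapping and expressed in a metric projected coordinate system; the union fusion and boundary smoothing steps are omitted and would normally be delegated to a geometry library. The function names and the threshold value are hypothetical.

```python
def polygon_area(ring):
    """Shoelace area of a simple polygon given as [(x, y), ...] in metres."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to close the ring.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def drop_small_polygons(polygons, min_area_m2):
    """Keep only polygons whose area reaches the preset threshold."""
    return [p for p in polygons if polygon_area(p) >= min_area_m2]


def total_flood_area(polygons, min_area_m2):
    """Sum the area of all polygons that survive the noise filter."""
    return sum(polygon_area(p) for p in drop_small_polygons(polygons, min_area_m2))


flood_patch = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
noise_speck = [(20.0, 20.0), (21.0, 20.0), (20.0, 21.0)]
area_m2 = total_flood_area([flood_patch, noise_speck], min_area_m2=1.0)
```

Filtering before summation keeps isolated misclassified specks from inflating the reported flood area.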

7. The method according to claim 6, characterized in that the error-corrected pixel coordinates are determined by the following formula: [formula omitted]. In the formula, the terms denote, in order: the abscissa of the global pixel coordinates restored to the full-size image coordinate system; the abscissa of the sub-image patch; the ordinate of the global pixel coordinates restored to the full-size image coordinate system; the column offset; the ordinate of the sub-image patch; the row offset; the projection zone with the smallest projection error after conversion; the longitude of the center point of the original image; the physical ground offset of a pixel relative to the image center in the east direction; the width of the original image; the ground resolution per pixel of the UAV imagery; the physical ground offset of a pixel relative to the image center in the north direction; the height of the original image; the abscissa of the error-corrected pixel coordinates; the ordinate of the error-corrected pixel coordinates; the angle by which the offset vector in the local image coordinate system is rotated counterclockwise around the origin; the abscissa of the image center point in the Universal Transverse Mercator projection coordinate system; and the ordinate of the image center point in the Universal Transverse Mercator projection coordinate system.
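
The chain of quantities defined in this claim can be sketched as a small coordinate pipeline. Since the claim's formula is not available here, the expressions below are assumptions built from the listed definitions: patch coordinates plus offsets give global pixel coordinates, the zone is chosen from the center longitude with the standard 6-degree rule, ground offsets are measured from the image center and scaled by the ground resolution, and the offset vector is rotated counterclockwise before being added to the center's projected coordinates. All function and parameter names are hypothetical, as is the convention that image rows increase downward.

```python
import math


def utm_zone(lon_deg):
    """Standard 6-degree UTM zone number for a longitude in degrees."""
    return int(math.floor((lon_deg + 180.0) / 6.0)) % 60 + 1


def patch_to_global(x_patch, y_patch, col_offset, row_offset):
    """Restore patch pixel coordinates to the full-size image frame."""
    return x_patch + col_offset, y_patch + row_offset


def pixel_to_utm(x_global, y_global, width, height, gsd_m,
                 rotation_deg, center_easting, center_northing):
    """Map a global pixel to projected coordinates (assumed convention)."""
    # Ground offsets from the image center; rows grow downward, north is up.
    d_east = (x_global - width / 2.0) * gsd_m
    d_north = (height / 2.0 - y_global) * gsd_m
    # Rotate the offset vector counterclockwise around the origin.
    theta = math.radians(rotation_deg)
    rot_east = d_east * math.cos(theta) - d_north * math.sin(theta)
    rot_north = d_east * math.sin(theta) + d_north * math.cos(theta)
    return center_easting + rot_east, center_northing + rot_north


zone = utm_zone(116.4)
gx, gy = patch_to_global(88, 144, 512, 256)
easting, northing = pixel_to_utm(gx, gy, 1000, 800, 0.1, 0.0, 500000.0, 4400000.0)
```

With zero rotation, a pixel 100 columns right of the image center at 0.1 m per pixel lands 10 m east of the center's easting.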

8. A real-time flood area estimation device based on large-model edge computing, characterized in that it includes:
an acquisition unit configured to acquire the image data to be measured;
a first data processing unit configured to input the image data to be measured into a trained flood area estimation model to obtain sub-image patches; and
a second data processing unit configured to determine the flood area estimation result based on multiple sub-image patches;
wherein the flood area estimation model is a lightweight model deployed on edge nodes and is trained on a preset deep learning network; the deep learning network comprises a first shallow convolutional layer, a second shallow convolutional layer, a first deep convolutional layer, a second deep convolutional layer, a third deep convolutional layer, a feature alignment module, a feature fusion module, and a fully convolutional integral classifier head, all connected sequentially;
the first shallow convolutional layer performs initial downsampling on the input image to extract first shallow features of the image's low-level edges and textures;
the second shallow convolutional layer further downsamples these features and refines the shallow details to obtain second shallow features;
the first deep convolutional layer employs a first deep feature extraction module on the second shallow features to extract global contextual semantic features, capture mid-scale structural information of flood patches, and output the first deep feature;
the second deep convolutional layer expands the feature channel dimension, strengthens deep semantic feature modeling of large-scale flood areas, and expands the network's effective receptive field;
the third deep convolutional layer completes the deepest-level feature extraction, constructs a global semantic representation of the flood scene, and outputs the third deep feature;
the feature alignment module performs size normalization and cross-layer element-wise alignment fusion on the first, second, and third deep features to obtain temporal features;
the feature fusion module fuses the temporal features with the second shallow features, filling in missing feature information to obtain fused features; and
the fully convolutional integral classifier head upsamples the fused features step by step to restore them to the original image resolution and outputs the sub-image patches.
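
The size normalization and element-wise fusion performed by the feature alignment module can be illustrated with a minimal single-channel sketch. It assumes nearest-neighbor resizing to the largest input size and fusion by element-wise addition; the claim does not specify the interpolation method or the fusion operator, so both choices, along with the function names, are hypothetical.

```python
def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbor resize of a 2-D feature map ([H][W] nested list)."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def align_and_fuse(features):
    """Bring all maps to the largest spatial size, then add element-wise."""
    out_h = max(len(f) for f in features)
    out_w = max(len(f[0]) for f in features)
    resized = [resize_nearest(f, out_h, out_w) for f in features]
    return [
        [sum(f[r][c] for f in resized) for c in range(out_w)]
        for r in range(out_h)
    ]


deep1 = [[1.0] * 4 for _ in range(4)]   # highest-resolution deep feature
deep2 = [[1.0, 2.0], [3.0, 4.0]]        # half resolution
deep3 = [[10.0]]                        # coarsest, global context
fused = align_and_fuse([deep1, deep2, deep3])
```

Each output position then combines fine structure from the shallower deep feature with the broader context carried by the coarser ones.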

9. An electronic device, characterized in that it includes a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the method described in any one of claims 1-7.

10. A computer-readable storage medium, characterized in that it stores a computer program that, when executed on a computer, causes the computer to perform the method described in any one of claims 1-7.