A large model-based damage assessment method
The method combines image segmentation and feature extraction with a Transformer encoder-decoder structure to generate a damage assessment report from images taken before and after the damage. It addresses the low efficiency and insufficient accuracy of existing assessment techniques, and offers stronger generalization ability and faster training.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HANGZHOU EBOYLAMP ELECTRONICS CO LTD
- Filing Date
- 2024-03-05
- Publication Date
- 2026-06-26
Figure CN118196425B_ABST
Description
Technical Field
[0001] This invention belongs to the field of damage assessment, and specifically relates to a damage assessment method based on a large model.
Background Technology
[0002] Damage assessment plays a crucial role in developing appropriate strategies and improving mission efficiency. Currently, the main methods for damage assessment involve obtaining real-time photographs of the target location and its surroundings using satellite remote sensing and aerial reconnaissance, comparing and analyzing the scenes before and after the attack, and obtaining damage assessment conclusions through expert evaluation. However, this method requires strong expert knowledge and is inefficient.
[0003] Some research has been conducted on automatic damage assessment methods, proposing approaches such as the analytic hierarchy process (AHP), the fuzzy comprehensive judgment method, Bayesian networks, and neural network nonlinear mapping. These methods primarily rely on mathematical modeling of damage assessment for specific scenarios. However, their applicability is limited and they handle complex damage assessment scenarios poorly, which leads to inaccurate assessment results.
Summary of the Invention
[0004] The purpose of this invention is to address the problems raised in the background art by proposing a damage assessment method based on a large model.
[0005] To achieve the above objectives, the technical solution adopted by the present invention is as follows:
[0006] This invention proposes a damage assessment method based on a large model, comprising:
[0007] Images before and after the damage are acquired and then scaled to obtain a first image and a second image, respectively.
[0008] Both the first and second images are input into a trained damage assessment network model to obtain a damage assessment report. The damage assessment network model includes an image segmentation model, a feature extraction structure, and a damage assessment generation network, wherein:
[0009] Both the first image and the second image are input into the image segmentation model to obtain a first segmentation binary image and a second segmentation binary image, respectively. A first fusion is then performed on the first segmentation binary image and the first image to obtain a first input image, and a second fusion is performed on the second segmentation binary image and the second image to obtain a second input image. Features are extracted from the first input image and the second input image by the feature extraction structure, and the results are fused and concatenated to obtain input features. Finally, the input features are used as input to the damage assessment generation network to obtain a damage assessment report.
[0010] Preferably, the step of scaling the images before and after the damage to obtain the first image and the second image respectively includes:
[0011] The images before and after the damage are scaled to a preset resolution to obtain the first image and the second image.
[0012] Preferably, the step of fusing and stitching together the first input image and the second input image after both have undergone feature extraction by the feature extraction structure to obtain the input features includes:
[0013] The first input image and the second input image are each divided into grid regions of a preset size, and each grid region is passed through the feature extraction structure to obtain a first image feature vector and a second image feature vector, respectively.
[0014] A third fusion is performed on the first image feature vector and a first position encoding vector, and a fourth fusion is then performed on the outcome and a second position encoding vector to obtain a first result. The first position encoding vector is the image feature vector of one of the grid regions of the first input image after passing through the feature extraction structure, and the second position encoding vector is the position, within the first input image, of the image feature vector in the first position encoding vector.
[0015] A fifth fusion is performed on the second image feature vector and a third position encoding vector, and a sixth fusion is then performed on the outcome and a fourth position encoding vector to obtain a second result. The third position encoding vector is the image feature vector of one of the grid regions of the second input image after passing through the feature extraction structure, and the fourth position encoding vector is the position, within the second input image, of the image feature vector in the third position encoding vector.
[0016] Then, the first result and the second result are concatenated to obtain the input features.
[0017] Preferably, the damage assessment generation network includes M Transformer encoder structures and M decoder structures, and the input features pass through the M encoder structures and then through the M decoder structures. Each decoder structure includes a self-attention structure, a cross self-attention structure, a normalization layer, and a feedforward neural network connected sequentially from input to output. A dimensionality reduction connection network structure is added to each decoder structure. The output of the self-attention structure is used as the input of the dimensionality reduction connection network structure. The output of the dimensionality reduction connection network structure and the output of the feedforward neural network undergo a seventh fusion to obtain the damage assessment report.
[0018] Preferably, both the first and second fusion methods are weighted summation methods, with the specific formulas as follows:
[0019] I = w1 * I1 + w2 * I2
[0020] Where I represents the first input image or the second input image. When I represents the first input image, I1 represents the first image, I2 represents the first segmentation binary image, w1 represents the weight of the first image, and w2 represents the weight of the first segmentation binary image. When I represents the second input image, I1 represents the second image, I2 represents the second segmentation binary image, w1 represents the weight of the second image, and w2 represents the weight of the second segmentation binary image, and w1+w2=1.0, 0≤w1≤1.0, 0≤w2≤1.0.
[0021] Preferably, the methods for the third, fourth, fifth, and sixth fusions are vector summation methods.
[0022] Preferably, the calculation formula for the seventh fusion is as follows:
[0023] F = F1 + w * F2
[0024] Where F represents the feature vector after the seventh fusion, F1 represents the output vector of the feedforward neural network, F2 represents the output vector of the dimensionality reduction connection network structure, w represents the weight of the output vector of the dimensionality reduction connection network structure, and 0≤w≤1.0.
[0025] Preferably, the dimensionality reduction connection network structure includes a first fully connected layer and a second fully connected layer connected sequentially from input to output.
[0026] Preferably, the feature extraction structure is a fully connected neural network.
[0027] Preferably, the training dataset for the damage assessment network model training process is established as follows:
[0028] Collect image pairs of the target area before and after damage, conduct manual damage assessment based on the collected image pairs, and write a first damage assessment report that meets the preset standards;
[0029] The collected images before and after the damage are used to train a generative adversarial network (GAN) model, and the trained GAN model is then used to augment the data:
[0030] The collected pre-damage image is used as input to the trained generative adversarial network model to generate the corresponding post-damage image;
[0031] The acquired damaged image is used as input to the trained generative adversarial network model to generate the corresponding undamaged image.
[0032] The images before and after damage generated by the generative adversarial network model are used as new image pairs for manual damage assessment. A second damage assessment report that meets preset standards is then written, with the data before and after augmentation used together as the training dataset.
[0033] Compared with the prior art, the beneficial effects of the present invention are as follows:
[0034] This large-model-based damage assessment method improves upon existing large-model approaches by adding a dimensionality reduction connection network structure to the decoder, resulting in a damage assessment network model. The images before and after damage are then input into this trained network model to obtain an accurate damage assessment report. The method makes effective use of the feature extraction capabilities that existing large models have learned from massive datasets, while retaining feature perception specific to the damage assessment domain, which gives it stronger generalization ability. Furthermore, the image segmentation model builds on existing techniques, which reduces the number of parameter updates and improves training speed.
Attached Figure Description
[0035] Figure 1 is a schematic diagram of the damage assessment network model in the damage assessment method based on a large model of the present invention;
[0036] Figure 2 is a schematic diagram of a decoder structure in the prior art;
[0037] Figure 3 is a schematic diagram of the improved decoder structure in this invention.
Detailed Implementation
[0038] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0039] It should be noted that when a component is referred to as being "connected" to another component, it can be directly connected to the other component or there may be an intervening component. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the specification of this application is for the purpose of describing particular embodiments only and is not intended to limit the application.
[0040] In one embodiment, as shown in Figures 1-3, a damage assessment method based on a large model includes:
[0041] Step 1: Obtain images before and after the damage, then scale the images to obtain the first and second images, as follows:
[0042] The images before and after the damage are scaled to a preset resolution to obtain a first image and a second image (the first image is obtained by scaling the image before the damage, and the second image is obtained by scaling the image after the damage). The preset resolution is not restricted; for example, it may be 224x224 pixels.
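As an illustration of Step 1, the following is a minimal sketch of scaling an image to the preset resolution. The function name `scale_to_preset` and the choice of nearest-neighbour sampling are assumptions made for this example; the text does not specify an interpolation method.

```python
import numpy as np

def scale_to_preset(image: np.ndarray, size: tuple = (224, 224)) -> np.ndarray:
    """Nearest-neighbour resize of an H x W (x C) array to the preset resolution.

    Nearest-neighbour sampling is an assumption; the source names no method.
    """
    out_h, out_w = size
    in_h, in_w = image.shape[:2]
    # Map every output row/column back to the source row/column it samples.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols]

# Stand-in for an image before the damage (random data, for shape only).
before = np.random.default_rng(0).random((512, 640, 3))
first_image = scale_to_preset(before)  # shape (224, 224, 3)
```

The same call applied to the image after the damage would give the second image.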
[0043] Step 2: Input both the first and second images into the trained damage assessment network model to obtain a damage assessment report. The damage assessment network model includes an image segmentation model, a feature extraction structure, and a damage assessment generation network, wherein, as shown in Figure 1:
[0044] Step 2.1: Input both the first image and the second image into the image segmentation model to obtain the first segmentation binary image and the second segmentation binary image, respectively (the first image yields the first segmentation binary image, and the second image yields the second segmentation binary image). The image segmentation model in this embodiment is the SAM image segmentation model based on ViT-G/14, but other image segmentation models can also be used.
[0045] Step 2.2: Then, perform a first fusion on the first segmentation binary image and the first image to obtain the first input image, and perform a second fusion on the second segmentation binary image and the second image to obtain the second input image. Both the first fusion and the second fusion are weighted summations, with the specific formula as follows:
[0046] I = w1 * I1 + w2 * I2
[0047] Wherein, I represents either the first input image or the second input image. When I represents the first input image, I1 represents the first image, I2 represents the first segmentation binary image, w1 represents the weight of the first image, and w2 represents the weight of the first segmentation binary image. When I represents the second input image, I1 represents the second image, I2 represents the second segmentation binary image, w1 represents the weight of the second image, and w2 represents the weight of the second segmentation binary image, and w1 + w2 = 1.0, 0 ≤ w1 ≤ 1.0, and 0 ≤ w2 ≤ 1.0. In this embodiment, w1 = 0.2 is taken to highlight the main target object and suppress irrelevant targets.
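The weighted summation of Step 2.2 can be sketched as follows. This is an illustrative example only: the function name `weighted_fusion`, the assumption that pixel values lie in [0, 1], and the broadcasting of a single-channel mask over the colour channels are not taken from the text.

```python
import numpy as np

def weighted_fusion(image: np.ndarray, mask: np.ndarray, w1: float = 0.2) -> np.ndarray:
    """Compute I = w1 * I1 + w2 * I2 with w2 = 1 - w1.

    image: I1, assumed to hold values in [0, 1].
    mask:  I2, the segmentation binary image with values 0 or 1.
    """
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1]")
    w2 = 1.0 - w1
    if image.ndim == 3 and mask.ndim == 2:
        mask = mask[..., None]  # assumption: apply the mask to every channel
    return w1 * image + w2 * mask

image = np.full((4, 4, 3), 0.5)
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0  # segmented target region
fused = weighted_fusion(image, mask, w1=0.2)
```

With w1 = 0.2, pixels inside the segmented region are pushed towards 1 and pixels outside it are attenuated, which is consistent with the stated aim of highlighting the main target object.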
[0048] Step 2.3: After both the first and second input images undergo feature extraction by the feature extraction structure, fuse and concatenate the results to obtain the input features, as detailed below:
[0049] The first and second input images are divided into grid regions of a preset size (e.g., 16×16 grid regions, i.e., 16 rows and 16 columns, totaling 256 grid regions). Then, each grid of the first and second input images is processed through a feature extraction structure to obtain a first image feature vector and a second image feature vector, respectively (the first input image is processed through the feature extraction structure to obtain the first image feature vector, and the second input image is processed through the feature extraction structure to obtain the second image feature vector). The feature extraction structure is a fully connected neural network with an output dimension of 768.
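The grid division and feature extraction described above might look like the sketch below. It assumes a 224x224 input with 3 channels, so each of the 256 grid regions covers 14x14 pixels (588 values), and it models the feature extraction structure as a single fully connected layer with randomly initialized weights. The names `split_into_grid` and `fully_connected`, the channel count, and the single-layer choice are all assumptions.

```python
import numpy as np

def split_into_grid(image: np.ndarray, grid: int = 16) -> np.ndarray:
    """Split an H x W x C image into grid*grid regions, one flattened row each."""
    h, w, c = image.shape
    ph, pw = h // grid, w // grid
    image = image[: ph * grid, : pw * grid]  # drop any remainder pixels
    cells = image.reshape(grid, ph, grid, pw, c).transpose(0, 2, 1, 3, 4)
    return cells.reshape(grid * grid, ph * pw * c)

def fully_connected(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """One fully connected layer: y = x W + b."""
    return x @ weight + bias

rng = np.random.default_rng(0)
input_image = rng.random((224, 224, 3))          # stand-in for a fused input image
cells = split_into_grid(input_image)             # (256, 588)
weight = rng.normal(scale=0.02, size=(cells.shape[1], 768))
bias = np.zeros(768)
features = fully_connected(cells, weight, bias)  # (256, 768)
```

Each row of `features` is the 768-dimensional image feature vector of one grid region, ordered row by row across the image.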
[0050] A third fusion is performed on the first image feature vector and the first position encoding vector, and a fourth fusion is then performed on the outcome and the second position encoding vector to obtain the first result. The first position encoding vector is the image feature vector of one of the grid regions of the first input image after passing through the feature extraction structure. The second position encoding vector is the position, within the first input image, of the image feature vector in the first position encoding vector. The first position encoding vector is a 768-dimensional vector whose first element is 1 and whose remaining elements are 0.
[0051] A fifth fusion is performed on the second image feature vector and the third position encoding vector, and a sixth fusion is then performed on the outcome and the fourth position encoding vector to obtain the second result. The third position encoding vector is the image feature vector of one of the grid regions of the second input image after passing through the feature extraction structure. The fourth position encoding vector is the position, within the second input image, of the image feature vector in the third position encoding vector. The third position encoding vector is a 768-dimensional vector whose second element is 1 and whose remaining elements are 0. Both the second and fourth position encoding vectors can be obtained using the following formulas:
[0052]
[0053]
[0054] Where PE represents the position encoding, pos represents the current parameter position number, d_index represents the element position in the position encoding vector divided by 2 and rounded down, and d represents the dimension of the position encoding vector (768 dimensions).
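The formulas of paragraphs [0052] and [0053] are not reproduced in this text. The symbol definitions in [0054] (pos, an element index halved and rounded down, and the dimension d) match the standard sinusoidal position encoding used in Transformer models, so the sketch below assumes that form. The base constant 10000 and the assignment of sine to even elements and cosine to odd elements are assumptions, not statements from the source.

```python
import numpy as np

def sinusoidal_position_encoding(num_positions: int, d: int = 768,
                                 base: float = 10000.0) -> np.ndarray:
    """Assumed form: PE(pos, k) = sin or cos of pos / base**(2 * d_index / d)."""
    pos = np.arange(num_positions)[:, None]   # position number of each grid region
    d_index = np.arange(d) // 2               # element position divided by 2, rounded down
    angle = pos / base ** (2 * d_index / d)   # shape (num_positions, d)
    pe = np.empty((num_positions, d))
    pe[:, 0::2] = np.sin(angle[:, 0::2])      # even elements: sine (assumption)
    pe[:, 1::2] = np.cos(angle[:, 1::2])      # odd elements: cosine (assumption)
    return pe

position_codes = sinusoidal_position_encoding(256)  # one row per grid region
```

Under this assumption, row pos of `position_codes` would serve as the second (or fourth) position encoding vector for the grid region at position pos.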
[0055] Then, the first result and the second result are concatenated to obtain the input features.
[0056] The third, fourth, fifth, and sixth fusions are all vector summations.
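A minimal sketch of the third to sixth fusions and the final concatenation follows. The feature vectors and per-region position codes are random placeholders, the helper names are hypothetical, and concatenating along the sequence axis (giving 512 vectors of 768 dimensions) is an assumption about how the two results are joined.

```python
import numpy as np

D, N = 768, 256  # feature dimension and number of grid regions per image

def image_marker(index: int, d: int = D) -> np.ndarray:
    """768-dimensional vector with a single 1: index 0 for the first image, 1 for the second."""
    marker = np.zeros(d)
    marker[index] = 1.0
    return marker

def branch_result(features: np.ndarray, marker: np.ndarray,
                  position_codes: np.ndarray) -> np.ndarray:
    """Third/fifth fusion (add the marker), then fourth/sixth fusion (add position codes)."""
    return (features + marker) + position_codes

rng = np.random.default_rng(0)
first_features = rng.normal(size=(N, D))    # placeholder first image feature vectors
second_features = rng.normal(size=(N, D))   # placeholder second image feature vectors
position_codes = rng.normal(size=(N, D))    # placeholder per-region position codes

first_result = branch_result(first_features, image_marker(0), position_codes)
second_result = branch_result(second_features, image_marker(1), position_codes)
# Assumption: the two results are joined along the sequence axis.
input_features = np.concatenate([first_result, second_result], axis=0)  # (512, 768)
```

Because every fusion here is a plain vector sum, the order in which the two encodings are added does not change the result.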
[0057] Step 2.4: Finally, use the input features as input to the damage assessment generation network to obtain the damage assessment report.
[0058] As shown in Figures 2-3, the damage assessment generation network includes M Transformer encoder structures and M decoder structures (this embodiment uses the M Transformer encoder and decoder structures of the VisualGLM-6B large model, which can process image and text features simultaneously). The input features pass through the M encoder structures and then through the M decoder structures. Each decoder structure includes a self-attention structure, a cross self-attention structure, a normalization layer, and a feedforward neural network (FFN) connected sequentially from input to output. A dimensionality reduction connection network structure is added to each decoder structure. The output of the self-attention structure serves as the input to the dimensionality reduction connection network structure, and the output of the dimensionality reduction connection network structure and the output of the feedforward neural network undergo a seventh fusion to obtain the damage assessment report. Figure 2 shows the decoder structure of the existing VisualGLM-6B large model, and Figure 3 shows the improved decoder structure that this method derives from it.
[0059] The calculation formula for the seventh fusion is as follows:
[0060] F = F1 + w * F2
[0061] Where F represents the feature vector after the seventh fusion, F1 represents the output vector of the feedforward neural network, F2 represents the output vector of the dimensionality reduction connection network structure, w represents the weight of the output vector of the dimensionality reduction connection network structure, and 0≤w≤1.0, for example w=1.0.
[0062] The dimensionality reduction connection network structure includes a first fully connected layer and a second fully connected layer connected sequentially from input to output. The input dimension of the first fully connected layer is consistent with the input feature dimension, i.e., 768 dimensions, and the output feature dimension is 48 dimensions, which is 1/16 of the input feature dimension. The input dimension of the second fully connected layer is consistent with the output dimension of the first fully connected layer, i.e., 48 dimensions, and the output feature dimension is the same as the output of the decoder structure.
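The dimensionality reduction connection network structure and the seventh fusion can be sketched as below. The class name, the random initialization scale, the absence of an activation between the two layers (none is mentioned in the text), and a decoder output dimension of 768 are assumptions made for illustration.

```python
import numpy as np

class DimReductionBranch:
    """Two fully connected layers: 768 -> 48 -> 768 (output size assumed to be 768)."""

    def __init__(self, d_model: int = 768, d_hidden: int = 48, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.02, size=(d_model, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.w2 = rng.normal(scale=0.02, size=(d_hidden, d_model))
        self.b2 = np.zeros(d_model)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        hidden = x @ self.w1 + self.b1     # first fully connected layer
        return hidden @ self.w2 + self.b2  # second fully connected layer

    def num_parameters(self) -> int:
        return sum(p.size for p in (self.w1, self.b1, self.w2, self.b2))

def seventh_fusion(f1: np.ndarray, f2: np.ndarray, w: float = 1.0) -> np.ndarray:
    """F = F1 + w * F2, with 0 <= w <= 1."""
    return f1 + w * f2

rng = np.random.default_rng(1)
self_attention_out = rng.normal(size=(512, 768))  # placeholder self-attention output
ffn_out = rng.normal(size=(512, 768))             # placeholder feedforward output
branch = DimReductionBranch()
fused = seventh_fusion(ffn_out, branch(self_attention_out), w=1.0)
```

Under these assumptions the added structure has 74,544 parameters per decoder (with bias terms), which is small compared with a single 768x768 weight matrix of 589,824 values.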
[0063] In another embodiment, the damage assessment network model is trained as follows:
[0064] First, a training dataset is established, and the training dataset for the damage assessment network model is established as follows:
[0065] Collect image pairs of the target area before and after damage. In this embodiment, the collected images include satellite remote sensing images and aerial reconnaissance images of the target area, and each of these two sources must include multiple images before and after damage. The two images of a pair can come from the same source or from different sources; for example, the image before damage may be a satellite remote sensing image while the image after damage is either an aerial reconnaissance image or a satellite remote sensing image. Perform manual damage assessment based on the collected image pairs and write a first damage assessment report that meets the preset standards. The report is written based on the prior knowledge of experts, and the preset standard can be, for example, that the number of words in the report is not less than a preset value N, such as N = 100;
[0066] The collected images before and after damage are used to train a generative adversarial network (GAN) model, which takes the image before damage as input and outputs the image after damage. Then, the trained GAN model is used to augment the data.
[0067] The collected pre-damage image is used as input to the trained generative adversarial network model to generate the corresponding post-damage image;
[0068] The acquired damaged image is used as input to the trained generative adversarial network model to generate the corresponding undamaged image.
[0069] The images before and after damage generated by the generative adversarial network model are used as new image pairs for manual damage assessment. A second damage assessment report that meets preset standards is then written. The data before and after the augmentation are used together as the training dataset (by augmenting the training dataset, the generalization ability of the network model is greatly improved).
[0070] During the training of the image segmentation model, the pre-trained SAM image segmentation model weights (prior art) are used as the initialization parameters of the image segmentation model. At least 50% of all images in the established training set, including images before and after damage, are selected and manually labeled to obtain ground truth segmentation images. These ground truth values are used as the training dataset for the image segmentation model. Training runs for no fewer than 100 epochs, with an initial learning rate no higher than 0.0001. If the training objective is achieved, training stops, and the trained image segmentation model is obtained. Because the pre-trained SAM image segmentation model weights are used as the initialization parameters, fewer parameter updates are needed to reach the optimal level during training, which reduces the number of training iterations and improves training speed.
[0071] During the training of the damage assessment generation network, all network parameters other than those of the dimensionality reduction connection network structure are initialized from the pre-trained VisualGLM-6B model (prior art) and locked, i.e., they are not updated during training. The parameters of the dimensionality reduction connection network structure are randomly initialized. The established training dataset, consisting of image pairs before and after damage and the corresponding damage assessment reports, is used as training data.
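The locking of pre-trained parameters can be illustrated with a single gradient descent step that updates only the dimensionality reduction connection network structure. The parameter names, the placeholder gradients, and the learning rate below are hypothetical and serve only to show the mechanism.

```python
import numpy as np

def sgd_step(params: dict, grads: dict, trainable: set, lr: float = 1e-4) -> dict:
    """Return updated parameters; names outside `trainable` stay locked."""
    return {
        name: value - lr * grads[name] if name in trainable else value
        for name, value in params.items()
    }

rng = np.random.default_rng(0)
# Hypothetical parameter names, for illustration only.
params = {
    "decoder.ffn.weight": np.ones((768, 768)),                      # pre-trained, locked
    "branch.fc1.weight": rng.normal(scale=0.02, size=(768, 48)),    # randomly initialized
    "branch.fc2.weight": rng.normal(scale=0.02, size=(48, 768)),    # randomly initialized
}
grads = {name: np.ones_like(value) for name, value in params.items()}  # placeholder gradients
trainable = {"branch.fc1.weight", "branch.fc2.weight"}

updated = sgd_step(params, grads, trainable)
```

Only the two `branch.*` entries change after the step; the pre-trained entry is returned unchanged.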
[0072] During the training of the damage assessment network model, the damage assessment reports (including the first and second damage assessment reports) in the established training dataset are compared with the predicted damage assessment reports. The loss function value is calculated based on the loss function, the parameter gradient is calculated using backpropagation, and the parameter values are updated using gradient descent. If the training objective is achieved, training stops. If the training objective is not achieved, it is checked whether the preset number of training iterations has been reached. If the number of training iterations has been reached, training stops. Otherwise, the above steps are repeated until the training ends and the final damage assessment network model is obtained.
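The stopping logic of this training procedure can be sketched as a loop with two exit conditions. The example treats the training objective as a loss threshold, which is an assumption, and it uses a toy one-parameter problem in place of the real network; `train` and `make_toy_step` are hypothetical names.

```python
def train(step_fn, target_loss: float, max_iterations: int):
    """Repeat training steps until the objective is met or the iteration limit is hit.

    step_fn performs one forward pass, backpropagation, and parameter update,
    and returns the loss value for that step.
    """
    loss = float("inf")
    for iteration in range(1, max_iterations + 1):
        loss = step_fn()
        if loss <= target_loss:  # training objective achieved
            return iteration, loss, "objective reached"
    return max_iterations, loss, "iteration limit reached"

def make_toy_step(lr: float = 0.1, target: float = 3.0):
    """Toy stand-in for the network: gradient descent on (x - target)**2."""
    state = {"x": 0.0}

    def step() -> float:
        grad = 2.0 * (state["x"] - target)  # gradient of the squared error
        state["x"] -= lr * grad             # gradient descent update
        return (state["x"] - target) ** 2

    return step

iterations, final_loss, reason = train(make_toy_step(), target_loss=1e-3, max_iterations=1000)
```

Calling `train` with a small `max_iterations` instead exercises the second exit, where training stops because the preset number of iterations has been reached.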
[0073] This large-model-based damage assessment method improves upon existing large-model approaches by adding a dimensionality reduction connection network structure to the decoder, resulting in a damage assessment network model. The images before and after damage are then input into this trained network model to obtain an accurate damage assessment report. The method makes effective use of the feature extraction capabilities that existing large models have learned from massive datasets, while retaining feature perception specific to the damage assessment domain, which gives it stronger generalization ability. Furthermore, the image segmentation model builds on existing techniques, which reduces the number of parameter updates and improves training speed.
[0074] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0075] The embodiments described above are merely specific and detailed examples of the embodiments described in this application, and should not be construed as limiting the scope of the patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the scope of protection of this application. Therefore, the scope of protection of this patent application should be determined by the appended claims.
Claims
1. A damage assessment method based on a large model, characterized in that the method includes: Images before and after the damage are acquired and then scaled to obtain a first image and a second image, respectively. Both the first and second images are input into a trained damage assessment network model to obtain a damage assessment report. The damage assessment network model includes an image segmentation model, a feature extraction structure, and a damage assessment generation network, wherein: The first image and the second image are both input into the image segmentation model to obtain a first segmentation binary image and a second segmentation binary image, respectively. A first fusion is then performed on the first segmentation binary image and the first image to obtain a first input image, and a second fusion is performed on the second segmentation binary image and the second image to obtain a second input image. Features are extracted from the first input image and the second input image by the feature extraction structure, and the results are fused and concatenated to obtain input features. Finally, the input features are used as input to the damage assessment generation network to obtain a damage assessment report. The step of extracting features from the first and second input images by the feature extraction structure and then fusing and concatenating the results to obtain the input features includes: The first input image and the second input image are each divided into grid regions of a preset size, and each grid region is passed through the feature extraction structure to obtain a first image feature vector and a second image feature vector, respectively. A third fusion is performed on the first image feature vector and a first position encoding vector, and a fourth fusion is then performed on the outcome and a second position encoding vector to obtain a first result. The first position encoding vector is the image feature vector of one of the grid regions of the first input image after passing through the feature extraction structure, and the second position encoding vector is the position, within the first input image, of the image feature vector in the first position encoding vector. A fifth fusion is performed on the second image feature vector and a third position encoding vector, and a sixth fusion is then performed on the outcome and a fourth position encoding vector to obtain a second result. The third position encoding vector is the image feature vector of one of the grid regions of the second input image after passing through the feature extraction structure, and the fourth position encoding vector is the position, within the second input image, of the image feature vector in the third position encoding vector. Then, the first result and the second result are concatenated to obtain the input features; The damage assessment generation network includes M Transformer encoder structures and M decoder structures. The input features pass through the M encoder structures and then through the M decoder structures. Each decoder structure includes a self-attention structure, a cross self-attention structure, a normalization layer, and a feedforward neural network connected sequentially from input to output. A dimensionality reduction connection network structure is added to each decoder structure. The output of the self-attention structure is used as the input of the dimensionality reduction connection network structure. The output of the dimensionality reduction connection network structure and the output of the feedforward neural network undergo a seventh fusion to obtain the damage assessment report. The dimensionality reduction connection network structure includes a first fully connected layer and a second fully connected layer connected sequentially from input to output.
2. The damage assessment method based on a large model as described in claim 1, characterized in that: The step of scaling the images before and after the damage to obtain the first image and the second image, respectively, includes: The images before and after the damage are scaled to a preset resolution to obtain the first image and the second image.
3. The damage assessment method based on a large model as described in claim 1, characterized in that: Both the first fusion and the second fusion are weighted summations, with the specific formula as follows: I = w1 * I1 + w2 * I2; where I represents either the first input image or the second input image; when I represents the first input image, I1 represents the first image, I2 represents the first segmentation binary image, w1 represents the weight of the first image, and w2 represents the weight of the first segmentation binary image; when I represents the second input image, I1 represents the second image, I2 represents the second segmentation binary image, w1 represents the weight of the second image, and w2 represents the weight of the second segmentation binary image; and w1 + w2 = 1.0, 0 ≤ w1 ≤ 1.0, 0 ≤ w2 ≤ 1.0.
4. The damage assessment method based on a large model as described in claim 1, characterized in that: The methods for the third, fourth, fifth, and sixth fusions are vector summation methods.
5. The damage assessment method based on a large model as described in claim 1, characterized in that: The calculation formula for the seventh fusion is as follows: F = F1 + w * F2; where F represents the feature vector after the seventh fusion, F1 represents the output vector of the feedforward neural network, F2 represents the output vector of the dimensionality reduction connection network structure, w represents the weight of the output vector of the dimensionality reduction connection network structure, and 0 ≤ w ≤ 1.0.
6. The damage assessment method based on a large model as described in claim 1, characterized in that: The feature extraction structure is a fully connected neural network.
7. The damage assessment method based on a large model as described in claim 1, characterized in that: The training dataset for the damage assessment network model is established as follows: Collect image pairs of the target area before and after damage, conduct manual damage assessment based on the collected image pairs, and write a first damage assessment report that meets the preset standards; The collected images before and after the damage are used to train a generative adversarial network (GAN) model, and the trained GAN model is then used to augment the data: The collected pre-damage image is used as input to the trained generative adversarial network model to generate the corresponding post-damage image; The collected post-damage image is used as input to the trained generative adversarial network model to generate the corresponding pre-damage image. The images before and after damage generated by the generative adversarial network model are used as new image pairs for manual damage assessment. A second damage assessment report that meets the preset standards is then written, with the data before and after augmentation used together as the training dataset.