Intelligent Analysis Method and System for Geological Hazards Based on Large-Scale Language Models
By combining large-scale language models with anomaly detection and visual language alignment technology, geological disaster analysis texts are automatically generated, solving the problems of manual dependence and deployment limitations in existing technologies, and realizing efficient and standardized disaster detection and analysis.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: 安徽明生恒卓科技有限公司
- Filing Date: 2025-09-15
- Publication Date: 2026-06-30
Abstract
Description
Technical Field
[0001] This invention relates to the fields of artificial intelligence, remote sensing monitoring, and geological disaster prevention and control, and specifically to an intelligent geological disaster analysis method based on a large-scale language model and an intelligent geological disaster analysis system based on a large-scale language model.
Background Technology
[0002] Geological disasters (such as landslides, debris flows, and collapses) are characterized by their suddenness, destructive power, and wide impact, posing a serious threat to people's lives and property and the operation of infrastructure. To improve disaster prevention and mitigation capabilities, government departments and research institutions widely use remote sensing technology for disaster monitoring and assessment. Remote sensing imagery, due to its large coverage area, rapid update speed, and ability to be acquired from multiple sources (such as optical, radar, and infrared), has become an important data source for geological disaster monitoring.
[0003] Traditional geological hazard detection methods largely rely on machine vision and deep learning technologies. [1] The typical process is as follows: first, image segmentation models are used to extract suspected disaster areas, and then human experts analyze the disaster type, scope, and severity based on the images and segmentation results. This method improves the automation level of disaster identification to some extent, but it still has significant shortcomings in practical applications: First, it is highly dependent on human intervention and inefficient, requiring experts to evaluate each segmentation result, leading to a large workload and slow response time; second, the results are not standardized enough, with different experts having different criteria for judging the same disaster, affecting the consistency of emergency command; third, automated expression is lacking, making it difficult to directly generate structured, machine-readable disaster analysis text, thus limiting seamless integration with emergency dispatch systems; fourth, deployment is limited, as high-precision deep learning models have high computational overhead and are not suitable for real-time deployment and inference at the edge under emergency conditions.
[0004] In response to the above problems, academia and industry have in recent years conducted extensive research on intelligent analysis of remote sensing images and visual-language fusion. [2] For example, large-scale language models (LLMs) have achieved significant breakthroughs in natural language understanding and generation. Multimodal models (Qwen-VL, Qwen2-VL) can process image and text inputs simultaneously, possess cross-modal information fusion and reasoning capabilities, and perform well in tasks such as knowledge question answering and image description. [3][4] However, the application of these technologies in geological disaster scenarios is still limited to general visual tasks and lacks deep integration with, and specialized adaptation to, remote sensing images and disaster segmentation results. Existing research mostly focuses on disaster area identification or simple text descriptions; it does not achieve automatic disaster discrimination and generation of standardized analysis text, nor does it solve the problem of lightweight deployment of high-precision models under emergency conditions. This invention proposes a geological disaster intelligent question-answering system based on a large-scale language model. The system first processes remote sensing images and UAV inspection images using an anomaly detection model to generate disaster detection results and pixel-level feature information; the pixel-level feature information is then input as prompt information into the public visual language model MiniGPT-4, [5] which automatically generates text descriptions conforming to the standards of the geological disaster field, thus constructing an image-text pair dataset. On this basis, image features and the semantic information of the generated text are input into a visual language alignment module to achieve accurate fusion and alignment of cross-modal features.
Further fine-tuning is achieved by combining a fully connected network with a large-scale language model, enabling the model to accurately understand the fine-grained semantic features of geological disasters. Ultimately, the system can not only automatically identify and classify multiple types of geological disasters, but also generate structured disaster analysis text and support intelligent question answering for disaster scenarios. Through lightweight network design and effective integration of domain knowledge, the system maintains accuracy and professionalism while running efficiently on edge devices, improving the automation, standardization, and intelligence of geological disaster detection and analysis.
[0005] References:
[0006] [1] Roth K, Pemula L, Zepeda J, et al. Towards total recall in industrial anomaly detection[C] // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 14318-14328.
[0007] [2] Li J, Li D, Xiong C, et al. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation[C] // International Conference on Machine Learning. PMLR, 2022: 12888-12900.
[0008] [3] Li C, Li Z, Jing C, et al. SearchLVLMs: A plug-and-play framework for augmenting large vision-language models by searching up-to-date internet knowledge[J]. Advances in Neural Information Processing Systems, 2024, 37: 64582-64603.
[0009] [4] Yu Y, Shi C, Tang J, et al. Qwen-VL2 Model with NEFTune technique For Medical Report Generation[C] // 2025 4th International Symposium on Computer Applications and Information Technology (ISCAIT). IEEE, 2025: 165-168.
[0010] [5] Zhu D, Chen J, Shen X, et al. MiniGPT-4: Enhancing vision-language understanding with advanced large language models[J]. arXiv preprint arXiv:2304.10592, 2023.
Summary of the Invention
[0011] (1) Technical problems to be solved
[0012] To address the technical problems of low discrimination efficiency, susceptibility to subjective influence, and lack of automated text analysis in existing disaster identification systems, this invention provides a geological disaster intelligent analysis method and system based on a large-scale language model.
[0013] (2) Technical solution
[0014] In a first aspect, this invention discloses an intelligent geological hazard analysis method based on a large-scale language model. The method inputs an image to be processed within a target area into a trained geological hazard analysis model, and outputs geological hazard analysis text conforming to geological hazard field standards, as well as visualized hazard detection images. The geological hazard analysis model includes:
[0015] The data preprocessing module is used to process the image to be processed into a calibrated image format;
[0016] The PatchCore anomaly detection model is used to output pixel-level features and visualized disaster detection images based on formatted images.
[0017] The visual language alignment model Q-Former is used to output visual features aligned with disaster information based on pixel-level features;
[0018] A fully connected network is used to output soft cue projection vectors based on visual features;
[0019] The large-scale language model Qwen2.0 is used to output the geological disaster analysis text based on soft cue projection vectors and calibrated cue templates.
[0020] As an improvement to the above scheme, the data preprocessing method of the data preprocessing module includes:
[0021] The image to be processed is resized so that its shorter side is proportionally scaled to the specified length.
[0022] Perform a center crop on the scaled image to obtain a fixed-resolution image region;
[0023] Perform color space unification and data type conversion on the cropped image to ensure that the image is in three-channel floating-point format;
[0024] The converted image is normalized by scaling the pixel values to the range [0, 1];
[0025] Based on the normalization results, the image is standardized according to the calibrated channel mean and standard deviation to make it conform to the model input standard;
[0026] The standardized image is converted into tensor form and the channel order is adjusted to generate a calibrated image that conforms to the input specifications of the anomaly detection model.
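The preprocessing steps above can be sketched as follows. This is a minimal illustration in plain Python, not the implementation of the invention: the function names, the nearest-neighbour resizing, the default sizes 256 and 224, and the ImageNet channel statistics in `MEAN` and `STD` are all assumptions, since the text only speaks of a "specified length" and a "calibrated channel mean and standard deviation".

```python
from typing import List

Image = List[List[List[float]]]  # height x width x channel (HWC)

# Assumed values: ImageNet channel statistics often used with pre-trained backbones.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def resize_shorter_side(img: Image, target: int) -> Image:
    """Nearest-neighbour resize so the shorter side equals `target` (aspect ratio kept)."""
    h, w = len(img), len(img[0])
    scale = target / min(h, w)
    new_h = max(target, round(h * scale))
    new_w = max(target, round(w * scale))
    return [
        [img[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))] for x in range(new_w)]
        for y in range(new_h)
    ]


def center_crop(img: Image, size: int) -> Image:
    """Cut a size x size window from the middle of the image."""
    h, w = len(img), len(img[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]


def to_model_input(img: Image, short_side: int = 256, crop: int = 224) -> Image:
    """Resize, crop, scale 8-bit values to [0, 1], standardize, and reorder HWC -> CHW."""
    img = center_crop(resize_shorter_side(img, short_side), crop)
    return [
        [[(px[c] / 255.0 - MEAN[c]) / STD[c] for px in row] for row in img]
        for c in range(3)
    ]
```

In practice an image library would replace the nested lists; the sketch only makes the order of operations explicit.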
[0027] As an improvement to the above scheme, the anomaly detection method of the PatchCore anomaly detection model includes:
[0028] The calibrated image is input into the feature extraction network to extract multi-level deep feature representations;
[0029] The extracted deep feature representations are subjected to dimensionality reduction and feature embedding to reduce redundancy and highlight key features;
[0030] The processed key features are compared with the core feature library constructed from normal samples to generate an anomaly score corresponding to each pixel position, thus obtaining the pixel-level features of the anomaly detection image.
[0031] Thresholding and spatial smoothing are performed on pixel-level features to obtain the detection results of abnormal regions;
[0032] The detection results are overlaid on the original image to generate a visualized disaster detection image.
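As a rough illustration of the comparison against the core feature library, the sketch below scores each patch feature by its distance to the nearest entry in a bank of normal-sample features, then smooths and thresholds the resulting map. The function names, the box filter (used here in place of whatever smoothing the embodiment applies), the smooth-then-threshold order, and the threshold `tau` are assumptions for illustration only.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def anomaly_score(feature: Vector, memory_bank: Sequence[Vector]) -> float:
    """Euclidean distance from one patch feature to its nearest normal-sample feature."""
    return min(math.dist(feature, m) for m in memory_bank)


def score_map(features: List[List[Vector]], memory_bank: Sequence[Vector]) -> List[List[float]]:
    """Anomaly score for every position of a 2-D grid of patch features."""
    return [[anomaly_score(f, memory_bank) for f in row] for row in features]


def box_smooth(scores: List[List[float]], radius: int = 1) -> List[List[float]]:
    """Mean filter over a (2 * radius + 1) window, clipped at the borders."""
    h, w = len(scores), len(scores[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                scores[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def threshold(scores: List[List[float]], tau: float) -> List[List[int]]:
    """Binary mask: 1 where the score exceeds tau, else 0."""
    return [[1 if s > tau else 0 for s in row] for row in scores]
```

A mask produced this way can then be blended over the original image to obtain the visualized detection result.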
[0033] As an improvement to the above scheme, the visual-language alignment method of the Q-Former visual-language alignment model includes:
[0034] Pixel-level features are input into the visual-language alignment model Q-Former, and contextual features of each pixel-level feature are extracted through a self-attention mechanism.
[0035] The extracted contextual features are interactively calculated with the trained disaster domain cue vector to achieve the association between pixel-level features and disaster domain knowledge;
[0036] The features obtained from interactive computation are weighted, fused, and linearly mapped to generate a low-dimensional and compact visual representation.
[0037] The visual representation is normalized to obtain visual features aligned with disaster information.
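The interaction between cue vectors and pixel-level features can be pictured as scaled dot-product cross-attention followed by normalization. The sketch below is a deliberate simplification under stated assumptions: it omits the self-attention step and all learned projection matrices, uses the features directly as keys and values, and the names `cross_attend` and `l2_normalize` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: Matrix, features: Matrix) -> Matrix:
    """Each cue (query) vector attends over all patch features and returns their weighted sum."""
    d = len(features[0])
    out = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, f)) / math.sqrt(d) for f in features]
        weights = softmax(logits)
        out.append([sum(w * f[k] for w, f in zip(weights, features)) for k in range(d)])
    return out


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]
```

The number of output vectors equals the number of cue vectors, regardless of how many patch features are supplied, which is what makes the resulting representation compact.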
[0038] As an improvement to the above scheme, the data processing method for fully connected networks includes:
[0039] Visual features aligned with disaster information are input into the input layer of a fully connected network and linearly mapped to adjust the feature dimensions.
[0040] The mapped features are subjected to nonlinear activation processing to enhance their expressive power.
[0041] The activated features are subjected to layer-by-layer linear transformation and normalization to form a compact feature vector;
[0042] After iterative processing through multiple fully connected layers, the feature vectors are finally linearly projected.
[0043] Based on the projection results, soft cue projection vectors are generated for large-scale language models, which can be used to guide the generation of geological disaster analysis text.
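A two-layer version of the fully connected network described above might look like the following sketch. The layer count, the choice of ReLU as the nonlinearity, layer normalization as the normalization step, and every name and weight shape are assumptions; the text does not specify them.

```python
import math
from typing import List

Matrix = List[List[float]]


def linear(x: List[float], weight: Matrix, bias: List[float]) -> List[float]:
    """y = W x + b, with one row of `weight` per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Zero-mean, unit-variance normalization across the feature dimension."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def project_to_soft_prompt(
    visual_feature: List[float], w1: Matrix, b1: List[float], w2: Matrix, b2: List[float]
) -> List[float]:
    """Linear map -> activation -> normalization -> final linear projection."""
    hidden = layer_norm(relu(linear(visual_feature, w1, b1)))
    return linear(hidden, w2, b2)
```

The output width of the final projection would have to match the embedding width of the language model so that the vectors can be placed alongside its token embeddings.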
[0044] As an improvement to the above scheme, the data processing method of the large-scale language model Qwen2.0 includes:
[0045] Input the soft cue projection vector and the calibrated cue template into the large-scale language model Qwen2.0;
[0046] In the model's encoding layer, the soft cue projection vector is fused with the cue template as contextual information to form an enhanced initial semantic representation;
[0047] Through self-attention and cross-layer information interaction, the enhanced initial semantic representation is progressively expanded with context and its features are propagated through the layers;
[0048] The decoder layer converts the context-expanded semantic representation into serialized text features;
[0049] During the generation process, the structured information in the prompt template and the domain knowledge constraints are combined to predict and select each output token;
[0050] Finally, the output is a geological hazard analysis text that conforms to the standards in the field of geological hazards.
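One common way to feed soft cue projection vectors to a language model is to place them in front of the embedded template tokens and then select output tokens one at a time. The sketch below illustrates that idea with toy data; the function names, the embedding table, and the use of an allowed-token set to stand in for "domain knowledge constraints" are assumptions, and nothing here reflects the actual Qwen2.0 interface.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def build_input_sequence(
    soft_prompts: Sequence[Sequence[float]],
    template_ids: Sequence[int],
    embedding_table: Dict[int, Sequence[float]],
) -> List[List[float]]:
    """Prepend soft prompt vectors to the embeddings of the prompt-template tokens."""
    template_embeddings = [list(embedding_table[t]) for t in template_ids]
    return [list(v) for v in soft_prompts] + template_embeddings


def constrained_greedy_token(
    logits: Sequence[float], allowed: Optional[Iterable[int]] = None
) -> int:
    """Pick the highest-scoring token id, optionally restricted to an allowed set."""
    candidates = range(len(logits)) if allowed is None else allowed
    return max(candidates, key=lambda i: logits[i])
```

Restricting the candidate set at each step is one simple way to keep generated fields, such as a hazard level, within a fixed vocabulary.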
[0051] As an improvement to the above scheme, the training method for the geological hazard analysis model includes:
[0052] Collect sample images of the disaster area, including normal images of the area before the disaster as the training set and abnormal images of the area after the disaster as the test set;
[0053] PatchCore anomaly detection model training: For normal image data, the PatchCore anomaly detection model is trained to achieve automatic detection of abnormal regions without manual annotation. Through this process, pixel-level features corresponding to abnormal images and their visualized detection results can be obtained.
[0054] Furthermore, the pixel-level features of the obtained anomalous images are input into a pre-trained MiniGPT-4 model to generate corresponding text descriptions, thereby constructing a dataset of "image-text pairs" of anomalous images to avoid relying on large-scale manual text annotation.
[0055] Training the visual-language alignment model Q-Former: The visual-language alignment model Q-Former is trained using the "image-text pair" dataset as training samples. Specifically, the pixel-level features of the abnormal image are fused with the initialized cue vector through a cross-attention mechanism to obtain a visual cue vector containing disaster features. Subsequently, the visual cue vector is aligned with the corresponding text features and optimized through contrastive loss to obtain a disaster domain cue vector with strong semantic relevance in the disaster domain.
[0056] Joint training of the visual-language alignment model Q-Former and the large-scale language model Qwen2.0: After obtaining the disaster domain cue vector, it is jointly trained with the large-scale language model Qwen2.0. Specifically, pixel-level features and the disaster domain cue vector optimized by the first training are fused again through a cross-attention mechanism to generate a low-dimensional and compact visual representation. After normalization of this visual representation, visual features aligned with disaster semantics are obtained and input into a fully connected network to generate soft cue projection vectors. Subsequently, the soft cue projection vectors and the calibrated cue templates are input into the large-scale language model Qwen2.0 to generate geological disaster analysis text. Based on the similarity constraints between the generated text and the reference text in the "image-text pair" dataset, Q-Former and the fully connected network are further optimized, thereby improving the model's generation accuracy in disaster analysis tasks.
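The contrastive alignment between visual cue vectors and text features can be illustrated with a symmetric InfoNCE-style loss over a batch of matched pairs. This is a sketch under assumptions: the text only states that a contrastive loss is used, so the cosine similarity, the temperature of 0.07, and the symmetric form are illustrative choices.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, returning 0 for zero-length vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def _cross_entropy_on_diagonal(sim: Matrix) -> float:
    """Mean cross-entropy where row i should pick column i."""
    total = 0.0
    for i, row in enumerate(sim):
        m = max(row)
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_z - row[i]
    return total / len(sim)


def contrastive_loss(visual: Matrix, text: Matrix, temperature: float = 0.07) -> float:
    """Symmetric contrastive loss; visual[i] and text[i] form the positive pair."""
    sim = [[cosine(v, t) / temperature for t in text] for v in visual]
    sim_t = [list(col) for col in zip(*sim)]
    return 0.5 * (_cross_entropy_on_diagonal(sim) + _cross_entropy_on_diagonal(sim_t))
```

The loss is small when each visual vector is most similar to its own text vector and large when the pairing is scrambled, which is the signal that drives the cue vectors toward disaster-related semantics.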
[0057] As an improvement to the above scheme, the images to be processed can be acquired through satellite imagery, drone inspection images, or aerial photography images. The geological disaster analysis text includes the disaster type, spatial distribution, affected area, potential impact, and hazard level.
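For integration with downstream systems, the five content items listed above could be carried in a simple machine-readable record. The following sketch assumes a hypothetical `HazardReport` structure and JSON serialization; the field names and the example values are illustrative and are not taken from the invention.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class HazardReport:
    """One analysis result with the five content items named in the text."""
    disaster_type: str
    spatial_distribution: str
    affected_area: str
    potential_impact: str
    hazard_level: str


def to_json(report: HazardReport) -> str:
    """Serialize the report; ensure_ascii=False keeps non-ASCII place names readable."""
    return json.dumps(asdict(report), ensure_ascii=False)
```

A record of this kind can be validated field by field before it is passed to an emergency dispatch system.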
[0058] In a second aspect, this invention discloses an intelligent geological disaster analysis system based on a large-scale language model, which uses the intelligent geological disaster analysis method disclosed in the first aspect.
[0059] (3) Beneficial effects
[0060] 1. This invention introduces an anomaly detection model to perform unsupervised feature modeling of disaster images, which can automatically identify potential anomaly areas and generate pixel-level feature information. This effectively alleviates the problems of scarce geological disaster samples and high annotation costs, improves the automation and generalization capabilities of disaster identification, and solves the problems of low discrimination efficiency, susceptibility to subjective influence, and lack of automated text analysis technology in existing disaster discrimination systems.
[0061] 2. This invention automatically generates text descriptions of disaster areas by inputting pixel-level features of disaster images into the MiniGPT-4 visual language model, thereby constructing an "image-text pair" dataset of geological disaster images. This method requires no manual annotation, can efficiently generate standardized training corpora at low cost, effectively alleviates the problem of scarce geological disaster samples, and improves the consistency and reliability of cross-disaster and multi-scenario analysis.
[0062] 3. This invention utilizes the visual-language alignment module Q-Former to learn the alignment between disaster image features and disaster text features. Supported by a learnable fully connected network, it integrates a frozen large-scale language model, Qwen2.0, to fine-tune the recognition and analysis process of geological disaster images, thereby establishing a high-precision mapping relationship between multimodal features and semantics. This method not only improves the model's recognition accuracy under complex terrain conditions but also achieves fine-grained feature representation and deep semantic understanding of disaster areas.
[0063] 4. The intelligent geological disaster analysis system constructed by this invention can automatically generate structured analysis text that conforms to the domain specifications from the input geological disaster images, and can be extended to various geological disaster scenarios such as landslides, debris flows, and collapses, realizing the systematization and automation of disaster identification and analysis, and has broad engineering application value.
Attached Figure Description
[0064] Figure 1 is a module diagram of a geological disaster intelligent analysis model based on a large-scale language model.
[0065] Figure 2 is a flowchart of the image preprocessing method.
[0066] Figure 3 is a flowchart of the anomaly detection method of the PatchCore anomaly detection model.
[0067] Figure 4 is a flowchart of the visual language alignment method of the Q-Former visual language alignment model.
[0068] Figure 5 is a flowchart of the data processing method for fully connected networks.
[0069] Figure 6 is a flowchart of the data processing method for the large-scale language model Qwen2.0.
[0070] Figure 7 is a flowchart of the training process for a geological disaster analysis model.
Detailed Implementation
[0071] The technical solutions in the embodiments of the present invention will be clearly and completely described below. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0072] It should be noted that when a component is said to be "installed on" another component, it can be directly on the other component, or an intervening component may be present. When a component is said to be "set on" another component, it can be directly set on the other component, or an intervening component may be present. When a component is said to be "fixed to" another component, it can be directly fixed to the other component, or an intervening component may be present.
[0073] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. The terminology used herein in the specification of this invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
[0074] This invention provides an intelligent geological disaster analysis method based on a large-scale language model. The operation steps of the method are as follows: the image to be processed within the target area is input into a trained geological disaster analysis model, which outputs geological disaster analysis text conforming to geological disaster field standards, as well as visualized disaster detection images. This geological disaster analysis method can be applied to real-time monitoring, disaster assessment, and emergency decision support for geological disasters. Referring to Figure 1, which is a module diagram of the geological disaster intelligent analysis model, the model includes:
[0075] I. The image acquisition module, used to acquire images to be processed within the target area;
[0076] II. The data preprocessing module, used to process the image to be processed into a calibrated image;
[0077] III. The anomaly detection model PatchCore, used to output pixel-level features and visualized disaster detection images based on the calibrated image;
[0078] IV. The visual language alignment model Q-Former, used to output visual features aligned with disaster information based on the pixel-level features;
[0079] V. The fully connected network, used to output soft cue projection vectors based on the visual features;
[0080] VI. The large-scale language model Qwen2.0, used to output the geological disaster analysis text based on the soft cue projection vectors and the calibrated cue template.
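The data flow between the modules listed above can be summarized as a chain of stages. The sketch below wires hypothetical callables together in that order; the class name, the field names, and the signatures are assumptions chosen for illustration, and each stage is left abstract.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class HazardAnalysisPipeline:
    """Chains the stages in the order given in the module list (acquisition is upstream)."""
    preprocess: Callable[[Any], Any]            # raw image -> calibrated image
    detect: Callable[[Any], Tuple[Any, Any]]    # calibrated image -> (pixel features, overlay)
    align: Callable[[Any], Any]                 # pixel features -> aligned visual features
    project: Callable[[Any], Any]               # visual features -> soft cue projection vectors
    generate: Callable[[Any, str], str]         # (soft cue vectors, template) -> analysis text

    def run(self, image: Any, prompt_template: str) -> Tuple[str, Any]:
        """Return the analysis text together with the visualized detection image."""
        calibrated = self.preprocess(image)
        pixel_features, overlay = self.detect(calibrated)
        visual = self.align(pixel_features)
        soft_prompt = self.project(visual)
        return self.generate(soft_prompt, prompt_template), overlay
```

Keeping each stage behind a plain callable makes it straightforward to swap a stage, for example to test the text generator with fixed soft cue vectors.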
[0081] This invention significantly improves the model's recognition accuracy in complex terrain and diverse disaster scenarios, reduces misjudgments and missed judgments caused by background interference and sample imbalance, and solves the technical problems of existing disaster discrimination systems that rely on human experience, lack cross-modal semantic understanding, and have insufficient robustness in recognizing new disasters.
[0082] The following sections will introduce each module in detail.
[0083] I. Image Acquisition Module.
[0084] The system acquires images of the target area to be processed, and the data sources may include satellite remote sensing platforms, UAV aerial photography systems, and aerial photography equipment. In this embodiment, the system acquires images of the target area from multiple sensing platforms, including satellites, UAVs, and aerial photography.
[0085] II. Data Preprocessing Module.
[0086] The acquired image first undergoes preprocessing operations, including but not limited to geometric correction, radiometric correction, denoising, and color balancing, to eliminate distortion and brightness differences caused by different sensors and imaging conditions and to ensure the accuracy of subsequent analysis. In this embodiment, because imaging conditions differ between sensors, the image to be processed may contain geometric distortion, radiometric bias, and noise. Through geometric correction, radiometric correction, denoising, and color balancing, the system standardizes the image to be processed and transforms it into a calibrated image, that is, an image that the anomaly detection model can receive. This ensures the stability and reliability of subsequent feature extraction. The principle of this stage is to use image processing algorithms to eliminate interference from non-hazard factors, so that the images received by the model have consistent statistical characteristics. Referring to Figure 2, which is a flowchart of the image preprocessing method, the method includes:
[0087] The image to be processed is resized so that its shorter side is proportionally scaled to the preset length.
[0088] Perform a center crop on the scaled image to obtain a fixed-resolution image region;
[0089] Perform color space unification and data type conversion on the cropped image to ensure that the image is in three-channel floating-point format;
[0090] The converted image is normalized by scaling the pixel values to the range [0, 1];
[0091] Based on the normalization results, the standardization process is performed according to the preset channel mean and standard deviation to make it conform to the model input distribution;
[0092] The standardized image is converted into tensor form, and the channel order is adjusted to generate a calibrated image that conforms to the input specifications of the anomaly detection model.
[0093] III. Anomaly Detection Model.
[0094] The calibrated image is input into the anomaly detection model. The model automatically identifies potentially anomalous geological image regions by constructing the regional feature distribution of normal geological images and detecting deviation patterns. The output includes pixel-level feature information and a visualized disaster image that annotates the location, boundaries, and morphological features of potential disasters. This step addresses the high cost of high-quality annotation and its heavy reliance on human experience, laying the foundation for fully automated analysis. The anomaly detection model used in this embodiment is PatchCore, which is based on high-dimensional feature representation and nearest neighbor retrieval and can automatically identify potential anomalous regions within the calibrated image. Specifically, PatchCore first extracts features from normal geological images to construct a local feature library. It then matches the features of the input image against their nearest neighbors in the feature library to calculate the anomaly score for each pixel or region, thereby generating pixel-level feature information and a visualized disaster detection image. Referring to Figure 3, which is a flowchart of the anomaly detection method of the PatchCore anomaly detection model, the method includes:
[0095] The formatted image is input into a pre-trained feature extraction network to extract multi-level deep feature representations;
[0096] The extracted features are subjected to dimensionality reduction and feature embedding to reduce redundancy and highlight key features;
[0097] In the reduced feature space, a core feature library is constructed based on existing normal samples. The processed features of the calibrated image are compared with the feature library to generate an anomaly score corresponding to each pixel position, thus obtaining the pixel-level features of the anomaly detection image.
[0098] Thresholding and spatial smoothing are performed on pixel-level features to obtain the detection results of abnormal regions;
[0099] The detection results are overlaid on the original image to generate a visualized disaster detection image.
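The PatchCore paper cited as [1] builds its memory bank by greedy coreset subsampling, which keeps a small set of features that still covers the full set well. Whether this embodiment constructs its core feature library the same way is not stated, so the sketch below is only an illustration of that technique; the function name and the choice of index 0 as the starting point are assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def greedy_coreset(features: List[Vector], k: int) -> List[Vector]:
    """Farthest-point (k-center greedy) selection of up to k representative features."""
    if not features or k <= 0:
        return []
    selected = [0]  # simplification: start from the first feature
    # Distance from every feature to its closest already-selected feature.
    min_dist = [math.dist(f, features[0]) for f in features]
    while len(selected) < min(k, len(features)):
        nxt = max(range(len(features)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [min(d, math.dist(f, features[nxt])) for d, f in zip(min_dist, features)]
    return [features[i] for i in selected]
```

Each round adds the feature that is currently worst covered, so near-duplicate normal patches contribute little to the size of the library.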
[0100] IV. Visual Language Alignment Model Q-Former.
[0101] The visual language alignment model transforms pixel-level feature information into visual features aligned with disaster information. In this embodiment, the visual language alignment module used is Q-Former. Referring to Figure 4, which is a flowchart of the visual language alignment method of the Q-Former visual language alignment model, the method includes:
[0102] Pixel-level features are input into the visual-language alignment model Q-Former, and the contextual information of each pixel is extracted through a self-attention mechanism.
[0103] The extracted contextual features are interactively calculated with the trained disaster domain cue vector to achieve the association between pixel-level features and disaster domain knowledge;
[0104] The features obtained from interactive computation are weighted, fused, and linearly mapped to generate a low-dimensional and compact visual representation.
[0105] The visual representation is normalized to obtain visual features aligned with disaster information.
[0106] V. Fully Connected Networks.
[0107] In this embodiment, a learnable fully connected network is used to convert visual features aligned with disaster information into soft cue vectors. The fully connected network is connected to both the Q-Former and the large-scale language model: it converts the visual disaster information generated by the Q-Former into soft cue vectors and then transmits these soft cue vectors to the large-scale language model. Referring to Figure 5, which is a flowchart of the data processing method for the fully connected network, the method includes:
[0108] Visual features aligned with disaster information are input into the input layer of a fully connected network and linearly mapped to adjust the feature dimensions.
[0109] The mapped features are subjected to nonlinear activation processing to enhance their expressive power.
[0110] The activated features are subjected to layer-by-layer linear transformation and normalization to form a compact feature vector;
[0111] After iterative processing through multiple fully connected layers, the feature vectors are finally linearly projected.
[0112] Based on the projection results, soft cue projection vectors are generated for large-scale language models, which can be used to guide the generation of geological disaster analysis text.
[0113] VI. Large-scale language model Qwen2.0.
[0114] Large-scale language models can generate structured analytical text conforming to the standards of the geological hazard field based on soft cue projections of the input and under the control of calibrated cue templates. The text content includes hazard type, spatial distribution, affected area, potential impact, and hazard level, and can generate brief reports or detailed analysis reports. The final generated analytical text can be directly applied to emergency command systems, risk assessment platforms, and disaster early warning systems to achieve intelligent identification, professional interpretation, and rapid response to geological hazards.
[0115] In this embodiment, the prompt template is an input text, which can be entered manually or written in advance. For example, the template may state that a value of 0 represents a non-flood area, which can be regarded as the background, and that a value of 1 represents a flood area; the output text is then an analysis of the disaster cause factors and the disaster impact.
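A prompt template of the kind described in this example could be assembled as follows. The wording of `PROMPT_TEMPLATE` and the helper `build_prompt` are hypothetical; only the label meanings (0 for non-flood background, 1 for flood) and the requested outputs come from the example in the text.

```python
from typing import Dict, List

# Hypothetical wording; the actual template text is not given in the source.
PROMPT_TEMPLATE = (
    "The input is an anomaly mask for a remote sensing image. "
    "{legend} "
    "Analyze the {outputs}."
)


def build_prompt(label_meanings: Dict[int, str], outputs: List[str]) -> str:
    """Fill the template with a legend for the mask values and the requested analyses."""
    legend = " ".join(
        f"A value of {value} represents {meaning}."
        for value, meaning in sorted(label_meanings.items())
    )
    return PROMPT_TEMPLATE.format(legend=legend, outputs=" and ".join(outputs))
```

For the flood example, `build_prompt({0: "a non-flood (background) area", 1: "a flood area"}, ["disaster cause factors", "disaster impact"])` yields a single instruction string that can be tokenized and combined with the soft cue vectors.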
[0116] This embodiment employs Qwen2.0 as the large-scale language model, mapping visual features to a language understanding space through a multimodal interface to achieve comprehensive analysis of geological hazards. The model can perform semantic reasoning based on image pattern information and knowledge of the geological hazard domain, identifying hazard types, assessing the affected area, determining potential impact zones, and analyzing hazard levels. In addition, the spatial constraints provided by pixel-level feature information make the judgment results more accurate, improving the model's robustness and reliability in complex terrain and diverse hazard scenarios. Referring to Figure 6, which is a flowchart of the data processing method for the large-scale language model Qwen2.0, the method includes:
[0117] Input the soft cue projection vector and the calibrated cue template into the large-scale language model Qwen2.0;
[0118] In the model's encoding layer, the soft cue projection vector is fused with the cue template as contextual information to form an enhanced initial semantic representation;
[0119] Through self-attention and cross-layer information interaction, the fused semantic representation is progressively expanded with context, and its features are passed from layer to layer;
[0120] The decoder layer converts the context-expanded semantic representation into serialized text features;
[0121] During generation, each output token is predicted and selected by combining the structured information in the cue template with the domain knowledge constraints;
[0122] Finally, the output is a geological hazard analysis text that conforms to the standards in the field of geological hazards.
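The first two steps above (inputting the soft cue projection vector together with the calibrated cue template, then fusing them as context) are commonly realized by prepending the projected vectors to the template's token embeddings before the first transformer layer. The following is a minimal sketch under that assumption, written in plain Python. The toy vocabulary, the random embedding table, and the function names `embed_template` and `fuse_soft_prompt` are hypothetical illustrations and are not taken from Qwen2.0 or from this embodiment.

```python
import random
from typing import List

EMBED_DIM = 4  # toy embedding width; a real model uses thousands of dimensions

# Hypothetical toy vocabulary and embedding table (random, for illustration only).
random.seed(0)
VOCAB = {"<unk>": 0, "describe": 1, "the": 2, "hazard": 3, "type": 4, "and": 5, "level": 6}
EMBED_TABLE = [[random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in VOCAB]


def embed_template(template: str) -> List[List[float]]:
    """Map each whitespace-separated template word to its embedding row."""
    ids = [VOCAB.get(word, VOCAB["<unk>"]) for word in template.lower().split()]
    return [list(EMBED_TABLE[i]) for i in ids]


def fuse_soft_prompt(soft_prompt: List[List[float]],
                     template_embeds: List[List[float]]) -> List[List[float]]:
    """Prepend the soft cue vectors to the template embeddings along the sequence axis."""
    for vec in soft_prompt:
        if len(vec) != EMBED_DIM:
            raise ValueError("soft cue vectors must match the embedding width")
    return [list(v) for v in soft_prompt] + template_embeds


# Two soft cue vectors standing in for the fully connected network's output.
soft_prompt = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
template_embeds = embed_template("Describe the hazard type and level")
fused = fuse_soft_prompt(soft_prompt, template_embeds)
print(len(fused), len(fused[0]))  # 8 4
```

The fused sequence has two soft cue positions followed by six template positions, so the language model attends to the visual evidence and the textual instruction in a single context.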
[0123] The original Qwen2.0 is a large-scale language model framework open-sourced by Alibaba Cloud, belonging to publicly available technology. Its basic structure, training methods, and model interfaces have all been disclosed through public channels (such as GitHub and technical documentation). This invention builds upon it with structural and functional optimizations. Specifically, at the input end, a new learnable fully connected network is added to map the fine-grained disaster feature vectors output by Q-Former to a unified semantic space, achieving the fusion of visual features and linguistic semantics. At the output end, a disaster analysis task head is added for disaster type classification, disaster area assessment, potential impact area analysis, and hazard level determination. Through these optimizations, the model can efficiently transform multimodal features into structured and professional geological disaster analysis text, achieving accurate identification and comprehensive assessment of geological disasters.
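As an illustration of the input-side fully connected network described above, the sketch below projects each Q-Former output vector into the language model's embedding width with two linear layers. It is a simplified sketch under stated assumptions, not the embodiment's actual network: the layer count, the GELU activation, the toy dimensions, and the class name `SoftPromptProjector` are all hypothetical, the normalization layers are omitted, and the random weights stand in for learned parameters.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Compute weight @ x + bias, with weight stored as one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def gelu(x: Vector) -> Vector:
    """Exact GELU activation, v * Phi(v), computed with the error function."""
    return [0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x]


def init_linear(rng: random.Random, in_dim: int, out_dim: int) -> Tuple[Matrix, Vector]:
    """Small random weights and zero bias (placeholders for learned parameters)."""
    weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


class SoftPromptProjector:
    """Two-layer fully connected network: Q-Former width -> LLM embedding width."""

    def __init__(self, qformer_dim: int, hidden_dim: int, llm_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1, self.b1 = init_linear(rng, qformer_dim, hidden_dim)
        self.w2, self.b2 = init_linear(rng, hidden_dim, llm_dim)

    def __call__(self, query_outputs: Matrix) -> Matrix:
        # Each Q-Former output vector is projected independently.
        return [linear(gelu(linear(q, self.w1, self.b1)), self.w2, self.b2)
                for q in query_outputs]


# Toy sizes; real widths are far larger.
projector = SoftPromptProjector(qformer_dim=6, hidden_dim=8, llm_dim=10)
queries = [[0.1 * (i + j) for j in range(6)] for i in range(3)]  # 3 output vectors
soft_prompt = projector(queries)
print(len(soft_prompt), len(soft_prompt[0]))  # 3 10
```

Keeping the projector small means that only a modest number of parameters need to be trained to connect the visual features to the language model's input space.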
[0124] The Qwen2.0 framework can be viewed as a highly modular, large-scale language model system. It centers on basic language models, encompassing different scales ranging from hundreds of millions to tens of billions of parameters (such as 7B, 13B, and 72B), capable of handling multilingual text understanding and generation tasks. Through pre-training, these basic models acquire rich linguistic knowledge and semantic relationships. In the instruction fine-tuning stage, the models further learn how to generate high-quality responses according to specific task requirements, such as question answering, dialogue, and summary generation. This hierarchical training approach enables the model to possess both broad general-purpose capabilities and excellent performance on specific tasks.
[0125] Beyond its basic language capabilities, Qwen2.0 introduces multimodal processing capabilities (Qwen2-VL), enabling it to accept visual input such as images and videos and combine them with text information for understanding and generation. For example, it can analyze image content to answer questions or generate corresponding image information based on text descriptions, which has significant advantages in scenarios such as visual question answering, document understanding, and image content generation. Simultaneously, Qwen2.0 employs a Mixture-of-Experts (MoE) architecture, dynamically selecting some experts to participate in computation during the inference process. This ensures computational efficiency while expanding model capacity, allowing large-scale models to run efficiently even on limited hardware resources.
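The idea of dynamically selecting some experts during inference can be illustrated with generic top-k routing. The sketch below is an assumption-laden illustration rather than Qwen2.0's actual router: the gate logits are supplied directly (a real model computes them from the input with a learned gating layer), and the toy experts, `make_expert`, and `moe_forward` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, gate_logits: Sequence[float],
                experts: Sequence[Callable[[Vector], Vector]],
                top_k: int = 2) -> Tuple[Vector, List[int]]:
    """Run only the top_k experts and mix their outputs with renormalized gate weights."""
    probs = softmax(gate_logits)
    ranked = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm
        expert_out = experts[i](x)  # experts outside `chosen` are never evaluated
        output = [o + weight * y for o, y in zip(output, expert_out)]
    return output, chosen


calls: List[int] = []  # records which experts actually ran


def make_expert(index: int) -> Callable[[Vector], Vector]:
    """Hypothetical toy expert that scales its input by (index + 1)."""
    def expert(x: Vector) -> Vector:
        calls.append(index)
        return [(index + 1) * v for v in x]
    return expert


experts = [make_expert(i) for i in range(4)]
output, chosen = moe_forward([1.0, 2.0], gate_logits=[0.0, 2.0, 1.0, -1.0], experts=experts)
print(chosen)  # [1, 2]
print(calls)   # [1, 2]
```

Only two of the four experts are evaluated for this input, which is how such an architecture keeps the compute per token well below what its total parameter count would suggest.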
[0126] The entire workflow begins at the input end. Whether it's plain text or multimodal information, it's first converted into tokens or feature sequences that the model can process through a unified tokenization and encoding interface (such as AutoTokenizer). Then, the model calculates the prediction or generation results. Model configuration (AutoConfig) and loading (AutoModel) provide flexible parameter management and device adaptation capabilities, enabling the same model to run efficiently on CPUs, GPUs, or multi-node clusters. Finally, through high-performance deployment frameworks (such as vLLM) or application integration tools (such as LangChain), these model capabilities are applied to real-world scenarios to realize various functions such as question answering systems, chatbots, document analysis, and multimodal content understanding.
[0127] Through the organic synergy of the above modules, the geological hazard analysis text generation process can achieve deep fusion and professional expression of multimodal information. The geological hazard analysis model used in this embodiment is a pre-trained geological hazard analysis model. Referring to Figure 7, which is a flowchart of the training process of the geological hazard analysis model, the training method for the geological hazard analysis model includes:
[0128] Collect sample images of the disaster area, including normal images of the area before the disaster as the training set and abnormal images of the area after the disaster as the test set;
[0129] Anomaly detection model PatchCore training: The PatchCore anomaly detection model is trained on normal image data to achieve automatic detection of anomaly regions without manual annotation. This process yields pixel-level features corresponding to anomaly images and their visualized detection results.
[0130] Furthermore, the pixel-level features of the obtained anomalous images are input into a pre-trained MiniGPT-4 model to generate corresponding text descriptions, thereby constructing a dataset of "image-text pairs" of anomalous images to avoid relying on large-scale manual text annotation.
[0131] Training the Q-Former visual-language alignment model: After obtaining the "image-text pair" dataset, the Q-Former visual-language alignment model is trained using it as training samples. Specifically, pixel-level features of the images are fused with the initialized cue vector through a cross-attention mechanism to obtain a visual cue vector containing disaster features. Subsequently, this visual cue vector is aligned with the corresponding text features and optimized using contrastive loss, thereby obtaining a disaster domain cue vector with strong semantic relevance in the disaster domain.
[0132] Joint Training of the Visual-Language Alignment Model Q-Former and the Large-Scale Language Model Qwen2.0: After obtaining the disaster domain cue vectors, they are jointly trained with the large-scale language model Qwen2.0. Specifically, pixel-level features are fused with the disaster domain cue vectors optimized in the first training stage again through a cross-attention mechanism to generate a low-dimensional and compact visual representation. After normalization of this visual representation, visual features aligned with disaster semantics are obtained and input into a fully connected network to generate soft cue projection vectors. Subsequently, the soft cue projection vectors and the calibrated cue templates are input into the large-scale language model Qwen2.0 to generate geological disaster analysis text. Based on the similarity constraints between the generated text and the reference text in the "image-text pair" dataset, Q-Former and the fully connected network are further optimized, thereby improving the model's generation accuracy in disaster analysis tasks.
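The Q-Former training step above aligns visual cue vectors with text features through a contrastive loss but does not state its exact form. One common choice is a symmetric InfoNCE loss over a batch of matched pairs, sketched below in plain Python. The temperature value 0.07, the toy two-dimensional vectors, and the function names are assumptions made for illustration only.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(image_vecs: List[Vector], text_vecs: List[Vector],
                     temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: pair i of images and texts is the positive, the rest are negatives."""
    img = [l2_normalize(v) for v in image_vecs]
    txt = [l2_normalize(v) for v in text_vecs]
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt] for i in img]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy batch: two visual cue vectors and their text features.
images = [[1.0, 0.0], [0.0, 1.0]]
texts_matched = [[0.9, 0.1], [0.1, 0.9]]
texts_swapped = [[0.1, 0.9], [0.9, 0.1]]

matched_loss = contrastive_loss(images, texts_matched)
swapped_loss = contrastive_loss(images, texts_swapped)
print(matched_loss < swapped_loss)  # True
```

The loss is small when each visual cue vector is closest to its own text feature and large when the pairing is wrong, which is the pressure that pulls the cue vectors toward disaster-domain semantics.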
[0133] Based on the above embodiments, this invention also proposes a geological disaster intelligent analysis system based on a large-scale language model, which uses the geological disaster intelligent analysis method based on a large-scale language model.
[0134] In summary, the key innovations of this invention are as follows:
[0135] 1. Anomaly detection and image-text pair generation: The PatchCore anomaly detection model is used to identify potential disaster areas in the target region, and the pixel-level features are input into the MiniGPT-4 visual language model to automatically generate corresponding text descriptions, thus constructing an image-text pair dataset to provide high-quality corpus for multimodal training.
[0136] 2. Multimodal feature fusion: The Q-Former visual language alignment module is used to train the image and text data to achieve fine-grained alignment of visual features with text semantics, generating semantically consistent disaster feature vectors and improving the accuracy and robustness of cross-modal information fusion.
[0137] 3. Fine-tuning around the large-scale language model: The trained Q-Former output is combined with a learnable fully connected network, and these components are fine-tuned against the frozen large-scale language model Qwen2.0 to achieve a unified mapping between visual features and language semantics, thereby optimizing the performance of disaster identification and analysis.
[0138] 4. Structured disaster text generation: Based on a fine-tuned large-scale language model, the disaster feature vectors fused from multiple modalities are transformed into structured analysis texts that conform to the standards of the geological disaster field, including disaster type, spatial distribution, affected area, potential impact and hazard level, so as to realize the generation of automated and standardized professional analysis reports.
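Innovation 1 above relies on PatchCore, whose core idea is to score each image patch by its distance to the nearest feature in a memory bank built from normal images only. The sketch below illustrates that scoring step in plain Python. It is deliberately simplified: the pretrained feature extractor and coreset subsampling are omitted, and the toy two-dimensional patch features, the threshold value, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_memory_bank(normal_patch_features: List[Vector]) -> List[Vector]:
    """Store patch features from normal images (coreset subsampling omitted)."""
    return [list(f) for f in normal_patch_features]


def anomaly_scores(patch_grid: List[List[Vector]],
                   memory_bank: List[Vector]) -> List[List[float]]:
    """Score each patch by its distance to the nearest normal feature."""
    return [[min(euclidean(p, m) for m in memory_bank) for p in row]
            for row in patch_grid]


def threshold_mask(scores: List[List[float]], threshold: float) -> List[List[int]]:
    """Mark patches whose score exceeds the threshold as anomalous (1)."""
    return [[1 if s > threshold else 0 for s in row] for row in scores]


# Toy features: three normal patches, then a 2x2 grid from a post-disaster image.
bank = build_memory_bank([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
grid = [
    [[0.1, 0.0], [0.9, 0.1]],
    [[0.0, 1.1], [5.0, 5.0]],  # the last patch lies far from every normal feature
]
scores = anomaly_scores(grid, bank)
mask = threshold_mask(scores, threshold=1.0)
print(mask)  # [[0, 0], [0, 1]]
```

Because the memory bank is built from normal images alone, no pixel-level annotation of disaster regions is needed to produce the mask.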
[0139] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0140] The embodiments described above are merely illustrative of several implementations of the present invention, and while the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the invention patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this invention patent should be determined by the appended claims.
Claims
1. A method for intelligent analysis of geological hazards based on a large-scale language model, characterized in that: the image to be processed within the target area is input into the trained geological hazard analysis model, which outputs geological hazard analysis text that conforms to the standards of the geological hazard field and a visualized disaster detection image. The geological hazard analysis model includes: a data preprocessing module, used to process the image to be processed into a calibrated image format; the PatchCore anomaly detection model, used to output pixel-level features and the visualized disaster detection image based on the formatted image; the visual-language alignment model Q-Former, used to output visual features aligned with disaster information based on the pixel-level features; a fully connected network, used to output soft cue projection vectors based on the visual features; and the large-scale language model Qwen2.0, used to output the geological hazard analysis text based on the soft cue projection vectors and the calibrated cue template. The training method for the geological hazard analysis model includes: Collecting sample images of the disaster area, including normal images of the area before the disaster as the training set and abnormal images of the area after the disaster as the test set. PatchCore anomaly detection model training: The PatchCore anomaly detection model is trained on the normal image data to achieve automatic detection of abnormal regions without manual annotation, yielding pixel-level features and visualized detection results corresponding to the abnormal images. The pixel-level features of the obtained abnormal images are then input into the pre-trained MiniGPT-4 model to generate corresponding text descriptions, thereby constructing an image-text pair dataset of abnormal images and avoiding reliance on large-scale manual text annotation.
Training the visual-language alignment model Q-Former: The visual-language alignment model Q-Former is trained using the image-text pair dataset as training samples. Specifically, the pixel-level features of the abnormal image are fused with the initialized cue vector through a cross-attention mechanism to obtain a visual cue vector containing disaster features. Subsequently, the visual cue vector is aligned with the corresponding text features and optimized through contrastive loss to obtain a disaster domain cue vector with strong semantic relevance in the disaster domain. Joint training of the visual-language alignment model Q-Former and the large-scale language model Qwen2.0: After the disaster domain cue vector is obtained, it is jointly trained with the large-scale language model Qwen2.0. Specifically, the pixel-level features and the disaster domain cue vector optimized in the first training stage are fused again through a cross-attention mechanism to generate a low-dimensional and compact visual representation. After this visual representation is normalized, visual features aligned with disaster semantics are obtained and input into the fully connected network to generate soft cue projection vectors. Subsequently, the soft cue projection vectors and the calibrated cue template are input into the large-scale language model Qwen2.0 to generate geological disaster analysis text. Based on the similarity constraints between the generated text and the reference text in the image-text pair dataset, Q-Former and the fully connected network are further optimized.
2. The intelligent geological disaster analysis method based on a large-scale language model according to claim 1, characterized in that the data preprocessing method of the data preprocessing module includes: resizing the image to be processed so that its shorter side is proportionally scaled to a specified length; performing a center crop on the scaled image to obtain a fixed-resolution image region; performing color space unification and data type conversion on the cropped image so that the image is in three-channel floating-point format; normalizing the converted image by scaling its pixel values to the 0-1 range; standardizing the normalized image according to the calibrated channel mean and standard deviation so that it conforms to the model input standard; and converting the standardized image into tensor form and adjusting the channel order to generate a calibrated image that conforms to the input specifications of the anomaly detection model.
3. The intelligent geological disaster analysis method based on a large-scale language model according to claim 1, characterized in that the anomaly detection method of the PatchCore anomaly detection model includes: inputting the calibrated image into the feature extraction network to extract multi-level deep feature representations; performing dimensionality reduction and feature embedding on the extracted deep feature representations to reduce redundancy and highlight key features; comparing the processed key features with the core feature library constructed from normal samples to generate an anomaly score for each pixel position, thereby obtaining the pixel-level features of the anomaly detection image; performing thresholding and spatial smoothing on the pixel-level features to obtain the detection results of abnormal regions; and overlaying the detection results on the original image to generate a visualized disaster detection image.
4. The intelligent geological disaster analysis method based on a large-scale language model according to claim 1, characterized in that the visual-language alignment method of the visual-language alignment model Q-Former includes: inputting the pixel-level features into the visual-language alignment model Q-Former and extracting contextual features of each pixel-level feature through a self-attention mechanism; performing interactive computation between the extracted contextual features and the trained disaster domain cue vector to associate the pixel-level features with disaster domain knowledge; weighting, fusing, and linearly mapping the features obtained from the interactive computation to generate a low-dimensional and compact visual representation; and normalizing the visual representation to obtain visual features aligned with disaster information.
5. The intelligent geological disaster analysis method based on a large-scale language model according to claim 1, characterized in that the data processing method of the fully connected network includes: inputting the visual features aligned with disaster information into the input layer of the fully connected network and linearly mapping them to adjust the feature dimensions; applying nonlinear activation to the mapped features to enhance their expressive power; applying layer-by-layer linear transformation and normalization to the activated features to form a compact feature vector; linearly projecting the feature vector after iterative processing through multiple fully connected layers; and generating, based on the projection results, soft cue projection vectors for the large-scale language model to guide the generation of geological disaster analysis text.
6. The intelligent geological disaster analysis method based on a large-scale language model according to claim 1, characterized in that the data processing method of the large-scale language model Qwen2.0 includes: inputting the soft cue projection vector and the calibrated cue template into the large-scale language model Qwen2.0; in the model's encoding layer, fusing the soft cue projection vector with the cue template as contextual information to form an enhanced initial semantic representation; through self-attention and cross-layer information interaction, progressively expanding the enhanced initial semantic representation with context and passing its features from layer to layer; converting, in the decoder layer, the context-expanded semantic representation into serialized text features; during generation, predicting and selecting each output token by combining the structured information in the cue template with the domain knowledge constraints; and finally outputting geological hazard analysis text that conforms to the standards of the geological hazard field.
7. The intelligent geological disaster analysis method based on a large-scale language model according to claim 1, characterized in that the images to be processed can be acquired through satellite imagery, drone inspection images, or aerial photography.
8. The intelligent geological disaster analysis method based on a large-scale language model according to claim 1, characterized in that the geological hazard analysis text includes hazard type, spatial distribution, affected area, potential impact, and hazard level.
9. A geological disaster intelligent analysis system based on a large-scale language model, characterized in that it employs the intelligent geological disaster analysis method based on a large-scale language model as described in any one of claims 1-8.