A landslide disaster susceptibility prediction method based on CatBoost-DNN

By combining the CatBoost model and deep neural networks, the problems of low efficiency and overfitting in landslide susceptibility prediction in existing technologies are solved, achieving high-precision landslide susceptibility prediction and visualization results, and improving the accuracy and reliability of prediction.

CN122241582A · Pending · Publication Date: 2026-06-19 · 重庆对外经贸学院

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
重庆对外经贸学院
Filing Date
2026-03-19
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing methods for predicting landslide hazard susceptibility suffer from low computational efficiency, long model training time, and a tendency to overfit when faced with complex nonlinear relationships and high-dimensional data. This results in insufficient generalization ability of the models in practical applications, making it difficult to fully utilize the complex relationships in multi-source data and affecting prediction accuracy and reliability.

Method used

The method combines a CatBoost model for feature extraction with a deep neural network (DNN). Multi-source data are preprocessed and checked for multicollinearity, features are extracted and fused, and a DNN is then constructed, trained, and used to predict landslide susceptibility.

Benefits of technology

It achieves high-precision prediction of landslide hazard susceptibility with strong generalization ability. The output prediction results are intuitive and clear, making them easy for decision-makers to understand and apply in landslide prevention and management.



Abstract

This invention discloses a landslide susceptibility prediction method based on CatBoost-DNN, comprising: acquiring multi-source data of a landslide study area; extracting disaster-causing factors from the multi-source data; performing a multicollinearity test on the disaster-causing factors to obtain non-collinear disaster-causing factors; based on the non-collinear disaster-causing factors, extracting features from the disaster-causing factors using a CatBoost model to construct a new feature set; constructing a deep neural network model and training it using the new feature set; and using the trained deep neural network model to predict the landslide susceptibility of the study area and outputting the landslide susceptibility prediction result. This invention achieves efficient and accurate assessment of landslide susceptibility by combining the feature extraction capability of the CatBoost model with the powerful fitting capability of deep neural networks.

Description

Technical Field

[0001] This invention belongs to the field of intelligent prediction of geological disasters, and in particular relates to a landslide disaster susceptibility prediction method based on CatBoost-DNN.

Background Technology

[0002] Landslides, due to their complex formation mechanisms, wide distribution, and destructive power, are among the most destructive geological hazards in the world. Existing methods for predicting landslide susceptibility are mainly based on traditional statistical methods or single machine learning models. Traditional methods, such as logistic regression and decision trees, are easy to operate and generally perform well, but often fail to achieve ideal predictive results when faced with complex nonlinear relationships and high-dimensional data. In recent years, with the rapid development of machine learning technology, its application in geological hazard assessment has become increasingly widespread. However, single machine learning models, such as random forests and support vector machines, may suffer from low computational efficiency and long training times when processing large-scale datasets, and are prone to overfitting, resulting in insufficient generalization ability in practical applications. Furthermore, existing technologies have shortcomings in feature extraction and model ensembling, making it difficult to fully utilize the complex relationships in multi-source data, which reduces the accuracy and reliability of landslide susceptibility prediction. Therefore, this invention aims to provide a landslide susceptibility prediction method based on CatBoost-DNN, which improves the accuracy and robustness of landslide susceptibility assessment through feature extraction and fusion, model training, and optimization.

Summary of the Invention

[0003] To address the aforementioned technical problems, this invention proposes a landslide hazard susceptibility prediction method based on CatBoost-DNN. This method combines the feature extraction capability of the CatBoost model with the powerful fitting capability of deep neural networks to achieve efficient and accurate assessment of landslide hazards.

[0004] To achieve the above objectives, this invention provides a landslide hazard susceptibility prediction method based on CatBoost-DNN, comprising: acquiring multi-source data of the landslide study area and extracting disaster-causing factors from the multi-source data; performing a multicollinearity test on the disaster-causing factors to obtain non-collinear disaster-causing factors; based on the non-collinear disaster-causing factors, using the CatBoost model to extract features from the disaster-causing factors and construct a new feature set; constructing a deep neural network model and training it using the new feature set; and using the trained deep neural network model to predict landslide susceptibility in the study area and outputting the landslide susceptibility prediction results.

[0005] Optionally, extracting disaster-causing factors from the multi-source data includes: The multi-source data is preprocessed to obtain preprocessed multi-source data; The preprocessed multi-source data is subjected to unified data processing to obtain unified data. The unified data processing includes unifying the spatial resolution, projected coordinate system, and geographic coordinate system of the preprocessed multi-source data. Disaster-causing factors are extracted from the unified data.

[0006] Optionally, the disaster-causing factors include: elevation, aspect, plane curvature, profile curvature, distance from the river, land use, normalized difference vegetation index (NDVI), rainfall, slope, and topographic wetness index (TWI).

[0007] Optionally, performing the multicollinearity test on the disaster-causing factors to obtain non-collinear disaster-causing factors includes: calculating the variance inflation factor (VIF) of each disaster-causing factor; determining that collinearity exists when the VIF is greater than a preset threshold; and eliminating the disaster-causing factors that exhibit collinearity to obtain the non-collinear disaster-causing factors.

[0008] Optionally, based on the noncollinearity-causing factors, a new feature set is constructed by extracting features from the factors using the CatBoost model, including: The CatBoost model is trained using the aforementioned noncollinear hazard factors; The importance of the disaster-causing factors is predicted using a trained CatBoost model, and the prediction probability is extracted as a new feature based on the importance. The new feature set is constructed by concatenating the new feature with the original feature, i.e., the disaster-causing factor.

[0009] Optionally, building a deep neural network model includes constructing a neural network structure containing multiple hidden layers, and setting a Dropout layer and a BatchNorm layer after each hidden layer.

[0010] Optionally, training the deep neural network model using the new feature set includes: The new feature set is divided into a training set and a validation set; The deep neural network model was trained using the Adam optimization algorithm and the cross-entropy loss function. An early stopping mechanism is used during training. Training is stopped when the loss value of the validation set does not decrease within a preset number of consecutive iterations.

[0011] Optionally, the landslide hazard susceptibility prediction results may be output as follows: The trained deep neural network model is used to predict the susceptibility probability value of each grid cell in the study area. Based on the aforementioned susceptibility probability values, the study area was divided into five levels using the natural breakpoint method, including extremely low, low, medium, high, and extremely high susceptibility levels. A landslide susceptibility prediction map was drawn based on the classification results.

[0012] The present invention also provides an electronic device comprising: a computer-readable storage medium storing computer-executable instructions; and one or more processors coupled to the computer-readable storage medium and configured to execute the computer-executable instructions such that the device performs the landslide hazard susceptibility prediction method based on CatBoost-DNN.

[0013] The present invention also provides a readable storage medium storing computer-executable instructions that, when executed by a processor, configure the processor to perform a landslide hazard susceptibility prediction method based on CatBoost-DNN.

[0014] Compared with the prior art, the present invention has the following advantages and technical effects: 1. High-precision prediction: Through feature extraction and fusion, the model can capture the complex nonlinear relationship between disaster-causing factors and landslide susceptibility, thereby achieving high-precision prediction.

[0015] 2. Strong generalization ability: The multi-layer structure and Dropout layer design of deep neural networks effectively prevent overfitting, enabling the model to have good generalization performance on different datasets.

[0016] 3. Comprehensive feature utilization: It not only utilizes the original disaster-causing factors, but also integrates the feature importance information extracted by the CatBoost model, making full use of the information in the data.

[0017] 4. Visualization Results: The output landslide susceptibility prediction map is intuitive and clear, making it easy for decision-makers to understand and apply, and providing strong support for the prevention and management of landslide disasters.

Attached Figure Description

[0018] The accompanying drawings, which form part of this application, are used to provide a further understanding of this application. The illustrative embodiments and descriptions of this application are used to explain this application and do not constitute an undue limitation of this application. In the drawings: Figure 1 is a flowchart of a landslide hazard susceptibility prediction method based on CatBoost-DNN according to an embodiment of the present invention; Figure 2 is a landslide susceptibility prediction map according to an embodiment of the present invention; Figure 3 is a schematic diagram of the ROC curve of an embodiment of the present invention; Figure 4 is a schematic diagram of the four performance indicators of the model in an embodiment of the present invention.

Detailed Implementation

[0019] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other. This application will now be described in detail with reference to the accompanying drawings and embodiments.

[0020] It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order than that shown here.

[0021] This embodiment proposes a landslide hazard susceptibility prediction method based on CatBoost-DNN. As shown in Figure 1, the specific steps include: acquiring multi-source data of the landslide study area and extracting disaster-causing factors from the multi-source data; performing a multicollinearity test on the disaster-causing factors to obtain non-collinear disaster-causing factors; based on the non-collinear disaster-causing factors, using the CatBoost model to extract features from the disaster-causing factors and construct a new feature set; constructing a deep neural network model and training it using the new feature set; and using the trained deep neural network model to predict landslide susceptibility in the study area and outputting the landslide susceptibility prediction results.

[0022] Specifically, S1: Acquire multi-source data of the landslide disaster research area and unify the spatial resolution, projected coordinate system and geographic coordinate system of the multi-source data; S2: Identify the disaster-causing factors related to landslides and extract the disaster-causing factors from the multi-source data through a GIS platform; S3: Use multicollinearity to determine whether there is collinearity among the disaster-causing factors; if so, eliminate the disaster-causing factors that have collinearity to obtain the non-collinear disaster-causing factors. S4: Use the CatBoost model to extract features from the disaster-causing factors, and then concatenate the extracted features with the original features, i.e., the disaster-causing factors, to construct a new feature set; S5: Build a deep neural network (DNN) model and train the DNN model using a new feature set; S6: Use the trained DNN model to predict the landslide susceptibility of the study area and output a landslide disaster susceptibility prediction map.

[0023] Furthermore, extracting disaster-causing factors from the multi-source data includes: The multi-source data is preprocessed to obtain preprocessed multi-source data; The preprocessed multi-source data is subjected to unified data processing to obtain unified data. The unified data processing includes unifying the spatial resolution, projected coordinate system, and geographic coordinate system of the preprocessed multi-source data. Disaster-causing factors are extracted from the unified data.

[0024] Furthermore, the disaster-causing factors include: elevation, aspect, plane curvature, profile curvature, distance from the river, land use, normalized difference vegetation index (NDVI), rainfall, slope, and topographic wetness index (TWI).

[0025] Furthermore, performing the multicollinearity test on the disaster-causing factors to obtain non-collinear disaster-causing factors includes: calculating the variance inflation factor (VIF) of each disaster-causing factor; determining that collinearity exists when the VIF is greater than a preset threshold; and eliminating the disaster-causing factors that exhibit collinearity to obtain the non-collinear disaster-causing factors.

[0026] Furthermore, based on the aforementioned noncollinearity-causing factors, a new feature set is constructed by extracting features from the factors using the CatBoost model, including: The CatBoost model is trained using the aforementioned noncollinear hazard factors; The importance of the disaster-causing factors is predicted using a trained CatBoost model, and the prediction probability is extracted as a new feature based on the importance. The new feature set is constructed by concatenating the new feature with the original feature.

[0027] Specifically, in step S4, the CatBoost model is used for feature extraction. The specific steps are as follows: S41: Use the CatBoost model to train the disaster-causing factors; S42: Use the trained CatBoost model to predict the importance of disaster-causing factors and extract the prediction probability as a new feature; S43: The extracted predicted probability features are concatenated with the original features to form a new feature set.

[0028] Furthermore, building a deep neural network model involves constructing a neural network structure with multiple hidden layers, and setting a Dropout layer and a BatchNorm layer after each hidden layer to prevent overfitting and accelerate model convergence.

[0029] Furthermore, training the deep neural network model using the new feature set includes: The new feature set is divided into a training set and a validation set; The deep neural network model was trained using the Adam optimization algorithm and the cross-entropy loss function. An early stopping mechanism is used during training. Training is stopped when the loss value of the validation set does not decrease within a preset number of consecutive iterations.

[0030] Furthermore, the output of the landslide hazard susceptibility prediction results includes: The trained deep neural network model is used to predict the susceptibility probability value of each grid cell in the study area. Based on the aforementioned susceptibility probability values, the study area was divided into five levels using the natural breakpoint method, including extremely low, low, medium, high, and extremely high susceptibility levels. A landslide susceptibility prediction map was drawn based on the classification results.

[0031] The following is a detailed description of this embodiment with reference to the accompanying drawings: 1. Multi-source data acquisition and preprocessing: 1.1 Data Acquisition: Remote sensing imagery, topographic data, basic geological data, hydrological and meteorological data, and historical landslide disaster records for the study area were collected. Data sources included the Geospatial Data Cloud, the China Geological Survey Geological Cloud, and land use type data published by Professor Huang Xin of Wuhan University.

[0032] 1.2 Data Preprocessing: The collected data undergoes preprocessing, including data cleaning, missing value imputation, and outlier handling, to ensure data quality and integrity. Excel spreadsheets are used to remove missing and outlier values, ensuring data integrity and accuracy.

[0033] 1.3: Data Unification: Unify the spatial resolution, projected coordinate system, and geographic coordinate system of multi-source data, and resample to create layers with the same raster size to ensure data consistency and comparability. Specifically, the collected data coordinate system is converted to WGS_1984_UTM_Zone_47N using ArcGIS 10.5's Data Management Tool—Projections and Transformations—Define Projection. Then, resampling is performed using Data Management Tools—Raster—Raster Processing—Resample, selecting the elevation raster as the reference image, setting the Output Cell Size to be the same as the elevation raster, and selecting "NEAREST" as the resample method.

[0034] 2. Extraction of disaster-causing factors: 2.1: Factor Determination: The influencing factors of landslide disasters are analyzed from three aspects: geological structure, meteorological and hydrological conditions, and human activities, to determine the disaster-causing factors related to landslides. These factors include elevation, aspect, plane curvature, profile curvature, distance from river, land use, normalized difference vegetation index (NDVI), rainfall, slope, topographic wetness index (TWI), lithology, and distance from road.

[0035] 2.2 Factor Extraction: Disaster-causing factors were extracted from multi-source data using a GIS platform. For example, digital elevation model (DEM) data was used to extract factors such as elevation, slope, and aspect; lithology and fault distance data came from the China Geological Survey Geological Cloud; road and hydrological data, rainfall, and NDVI data came from the Geospatial Data Cloud; and land use type data came from data published by Professor Huang Xin of Wuhan University.

[0036] 3. Multicollinearity test: 3.1: Multicollinearity Assessment: A multicollinearity test is used to determine whether collinearity exists among the disaster-causing factors by calculating the variance inflation factor (VIF). The VIF is calculated as $\mathrm{VIF}_j = \frac{1}{1 - R_j^2}$, where $R_j^2$ is the coefficient of determination obtained by regressing the j-th disaster-causing factor on the other disaster-causing factors.

[0037] 3.2 Factor Removal: If the VIF value is greater than 10, collinearity is considered to exist. Factors that exhibit collinearity are removed, leaving the non-collinear disaster-causing factors. The VIF values of the disaster-causing factors listed in Table 1 are all less than 10, indicating that there is no significant collinearity among these factors, and therefore they can all be used for subsequent susceptibility prediction.
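The VIF screening in steps 3.1 and 3.2 can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function names variance_inflation_factors and drop_collinear, the synthetic factor columns, and the choice to drop one factor at a time (the one with the largest VIF) are all assumptions. Only the VIF definition and the threshold of 10 come from the text.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for every column j of X."""
    X = np.asarray(X, dtype=float)
    n_samples, n_factors = X.shape
    vifs = np.empty(n_factors)
    for j in range(n_factors):
        y = X[:, j]
        # Regress factor j on all other factors plus an intercept.
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


def drop_collinear(X, names, threshold=10.0):
    """Assumed strategy: repeatedly drop the factor with the largest VIF above the threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vifs = variance_inflation_factors(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names


# Synthetic demo data (not from the patent): near_copy almost duplicates elevation.
rng = np.random.default_rng(0)
elevation = rng.normal(size=200)
slope = rng.normal(size=200)
near_copy = elevation + 0.01 * rng.normal(size=200)
X = np.column_stack([elevation, slope, near_copy])

kept_X, kept_names = drop_collinear(X, ["elevation", "slope", "near_copy"])
final_vifs = variance_inflation_factors(kept_X)
print(kept_names, final_vifs)
```

Dropping one factor per pass and recomputing is a common convention because removing a single collinear factor usually lowers the VIF of its partner; the text itself only states that factors with VIF greater than 10 are removed.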

[0038] Table 1
4. Feature extraction and fusion: 4.1: CatBoost Model Training: The CatBoost model is trained on the disaster-causing factors. The CatBoost model has advantages such as handling categorical variables, preventing overfitting, and automatic feature processing, and can effectively extract important features of the disaster-causing factors.

[0039] catboost_model = CatBoostClf(iterations=300, random_seed=42, verbose=0) initializes the CatBoost classification model, setting the number of iterations to 300 and the random seed to 42, and turning off detailed log output. catboost_model.fit(X_train, y_train) then trains the CatBoost model on the training data to fit the model parameters.

[0040] 4.2 Feature Extraction: The importance of disaster-causing factors is predicted using the trained CatBoost model, and the predicted probabilities are extracted as new features. The predicted probabilities of the CatBoost model can reflect the contribution of each disaster-causing factor to the occurrence of landslides.

[0041] 4.3 Feature Concatenation: The extracted predicted probability features are concatenated with the original features to form a new feature set. The new feature set not only contains information about the original disaster-causing factors but also incorporates the feature importance information extracted by the CatBoost model, thereby enriching the feature representation and providing more comprehensive data support for subsequent deep neural network training.
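Steps 4.1 to 4.3 amount to a form of stacking: a classifier's predicted landslide probability is appended to the original factors as one extra column. The sketch below illustrates this under stated assumptions. The helper names make_model and fuse_probability_feature, the stand-in CentroidClassifier, and the synthetic data are hypothetical. The text's CatBoostClf is assumed to refer to the catboost library's CatBoostClassifier, which is used when that package is installed; otherwise the stand-in is used so the sketch still runs.

```python
import numpy as np


class CentroidClassifier:
    """Hypothetical stand-in exposing fit / predict_proba, used only if catboost is missing."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.c0 = X[y == 0].mean(axis=0)  # centroid of non-landslide samples
        self.c1 = X[y == 1].mean(axis=0)  # centroid of landslide samples
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        d0 = np.linalg.norm(X - self.c0, axis=1)
        d1 = np.linalg.norm(X - self.c1, axis=1)
        p1 = d0 / (d0 + d1 + 1e-12)  # closer to the landslide centroid -> higher p1
        return np.column_stack([1.0 - p1, p1])


def make_model():
    try:
        from catboost import CatBoostClassifier
    except ImportError:
        return CentroidClassifier()
    # iterations, random_seed and verbose follow the text;
    # allow_writing_files=False is an added assumption to avoid log files on disk.
    return CatBoostClassifier(iterations=300, random_seed=42, verbose=0,
                              allow_writing_files=False)


def fuse_probability_feature(model, X_train, y_train, X):
    """Fit the model, then append its landslide probability to X as a new column."""
    model.fit(X_train, y_train)
    # predict_proba returns one column per class; column 1 is the positive class.
    proba = np.asarray(model.predict_proba(X))[:, 1]
    return np.column_stack([X, proba])


# Synthetic demo data (not from the patent): 120 samples, 12 factors.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 12))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

fused = fuse_probability_feature(make_model(), X[:84], y[:84], X)
print(fused.shape)
```

The fused array keeps the 12 original columns unchanged and adds the probability as a 13th column, which is the "new feature set" passed to the DNN.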

[0042] 5. Deep Neural Network Model Construction and Training: 5.1: Model Construction: Construct a DNN model. The model structure includes an input layer, multiple hidden layers, a Dropout layer, a BatchNorm layer, and an output layer. The specific structure is as follows: 5.2: Input layer: Receives new feature sets.

[0043] 5.3: Hidden Layers: Contains multiple fully connected layers, each followed by a Dropout layer and a BatchNorm layer. The Dropout layer prevents overfitting, and the BatchNorm layer accelerates model convergence. hidden_size = 256, fixing the hidden layer size to 256.

[0044] 5.4: Output layer: Outputs the probability that each sample belongs to different levels of landslide susceptibility.
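The structure described in 5.1 to 5.4 could be expressed in PyTorch roughly as follows. This is a sketch under assumptions, not the patent's network: the helper build_dnn, the number of hidden layers (3), the ReLU activation, the dropout rate (0.3), the two output classes, and the input width of 13 (12 factors plus one CatBoost probability) are illustrative choices. Only hidden_size = 256 and the Dropout and BatchNorm layers after each hidden layer come from the text.

```python
import torch
from torch import nn


def build_dnn(input_dim, hidden_size=256, num_hidden_layers=3,
              dropout=0.3, num_classes=2):
    """Fully connected network with Dropout and BatchNorm after each hidden layer."""
    layers = []
    in_features = input_dim
    for _ in range(num_hidden_layers):
        layers += [
            nn.Linear(in_features, hidden_size),
            nn.ReLU(),                    # activation is an assumption
            nn.Dropout(dropout),          # regularization against overfitting
            nn.BatchNorm1d(hidden_size),  # speeds up convergence
        ]
        in_features = hidden_size
    # The output layer returns raw logits; nn.CrossEntropyLoss applies
    # log-softmax internally during training.
    layers.append(nn.Linear(in_features, num_classes))
    return nn.Sequential(*layers)


torch.manual_seed(0)
model = build_dnn(input_dim=13)

# Inference on a dummy batch of 8 samples.
model.eval()
with torch.no_grad():
    logits = model(torch.randn(8, 13))
    probs = torch.softmax(logits, dim=1)  # per-class probabilities
print(tuple(probs.shape))
```

For training, the same model would be paired with torch.optim.Adam and nn.CrossEntropyLoss, matching the optimizer and loss named in the text.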

[0045] 5.5: Dataset Splitting: The new feature set is split into a training set and a validation set in a 7:3 ratio. The training set is used for model training, and the validation set is used for model validation and parameter tuning.

[0046] 5.6: Model Training: The DNN model is trained using the training set, employing the Adam optimization algorithm and cross-entropy loss function. An early stopping mechanism is used during training; training is stopped when the loss value on the validation set does not significantly decrease within a certain number of consecutive iterations to prevent overfitting.

[0047] 5.7: Parameter Tuning: The hyperparameters of the DNN model are tuned using Bayesian optimization algorithms, including the learning rate, weight decay coefficient, and dropout rate, to improve the model's predictive performance.

[0048] Specific parameter settings: scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100) defines a learning rate scheduler using a cosine annealing strategy, with T_max set to 100 to control the learning rate period. num_epochs = 1000 sets the maximum number of training epochs to 1000, meaning a maximum of 1000 complete training cycles. best_auc = 0.0 initializes the best AUC (Area Under Curve) value to 0 to record the model's best performance during training. best_model_path = 'best_model.pth' specifies the save path used to store the parameters of the best-performing model. early_stopping_patience = 300 sets the patience value of the early stopping mechanism to 300, meaning that training will stop if the model performance does not improve within 300 epochs. early_stopping_counter = 0 initializes the early stopping counter to 0 to record the number of consecutive epochs in which model performance does not improve.
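The interaction of these settings can be sketched without any deep learning framework. The example below is an assumption-laden illustration. cosine_annealing_lr reproduces the standard closed-form cosine annealing schedule, with an assumed base learning rate of 1e-3 and minimum of 0. run_early_stopping is a hypothetical helper that tracks validation AUC, as the best_auc variable suggests, although paragraph [0046] describes monitoring validation loss. The AUC history is made up for the demo.

```python
import math


def cosine_annealing_lr(epoch, base_lr, t_max=100, eta_min=0.0):
    """Closed-form cosine annealing: decays from base_lr to eta_min over t_max epochs."""
    return eta_min + (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / t_max)) / 2.0


def run_early_stopping(val_aucs, num_epochs=1000, patience=300):
    """Return (best_auc, best_epoch, last_epoch_run) for a sequence of validation AUCs."""
    best_auc = 0.0
    best_epoch = None
    early_stopping_counter = 0
    last_epoch = None
    for epoch, auc in enumerate(val_aucs[:num_epochs]):
        last_epoch = epoch
        if auc > best_auc:
            # A real loop would save the model weights here (e.g. to best_model.pth).
            best_auc, best_epoch = auc, epoch
            early_stopping_counter = 0
        else:
            early_stopping_counter += 1
            if early_stopping_counter >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_auc, best_epoch, last_epoch


# Made-up validation AUCs; a small patience keeps the demo short.
history = [0.70, 0.80, 0.85, 0.84, 0.83, 0.82, 0.86]
result = run_early_stopping(history, patience=3)
lrs = [cosine_annealing_lr(e, base_lr=1e-3) for e in (0, 50, 100)]
print(result, lrs)
```

With patience set to 3, the loop stops at epoch 5 and never sees the later improvement at epoch 6, which shows why a large patience such as 300 makes early stopping more conservative.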

[0049] 6. Landslide susceptibility prediction and results output: 6.1: Landslide susceptibility prediction: The trained DNN model is used to predict the landslide susceptibility of the study area, and the susceptibility probability value corresponding to each grid cell is output.

[0050] 6.2: Result Classification: The susceptibility probability values are arranged in ascending order and divided into five levels using the natural breakpoint method: extremely low, low, medium, high, and extremely high susceptibility levels. The natural breakpoint method can determine the threshold for level classification based on the natural distribution of the data, making the classification results more reasonable.
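The natural breakpoint classification can be illustrated with a small pure-Python version of the Fisher-Jenks dynamic program, which chooses class boundaries that minimize the total within-class sum of squared deviations. This is a sketch under assumptions: the function names jenks_breaks and classify and the sample probabilities are hypothetical, and the quadratic-time loop is only practical for modest sample sizes, so a full raster would typically be subsampled or handled by a GIS tool.

```python
import bisect

LEVELS = ["extremely low", "low", "medium", "high", "extremely high"]


def jenks_breaks(values, n_classes=5):
    """Return the upper bounds of the first n_classes - 1 natural-break classes."""
    data = sorted(values)
    n = len(data)
    if n < n_classes:
        raise ValueError("need at least as many values as classes")
    # Prefix sums give the sum of squared deviations of data[i:j] in O(1).
    s1, s2 = [0.0], [0.0]
    for v in data:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: minimal total SSD when data[:j] is split into c classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i
    # Backtrack to recover where each class starts.
    breaks, j = [], n
    for c in range(n_classes, 1, -1):
        i = split[c][j]
        breaks.append(data[i - 1])  # largest value of the class below
        j = i
    return sorted(breaks)


def classify(probability, breaks):
    """Map a probability to its susceptibility level (upper bounds are inclusive)."""
    return LEVELS[bisect.bisect_left(breaks, probability)]


# Made-up probabilities forming five obvious groups.
probs = [0.02, 0.03, 0.05, 0.20, 0.22, 0.25, 0.48, 0.50,
         0.52, 0.70, 0.72, 0.75, 0.93, 0.95, 0.97]
breaks = jenks_breaks(probs)
print(breaks, classify(0.5, breaks))
```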

[0051] 6.3: Output Results: A landslide susceptibility prediction map is generated based on the susceptibility levels to provide decision support for landslide prevention and management (see Figure 2). The prediction map visually illustrates the distribution of landslide susceptibility levels at different locations within the study area, which helps relevant departments formulate targeted disaster prevention and mitigation measures.

[0052] Experimental results show that this invention achieves excellent performance in predicting landslide susceptibility in Zogang County, with an AUC value reaching 0.8938 (see Figure 3). The accuracy was 0.8206, the F1 score was 0.8204, and the recall was 0.8206 (see Figure 4). These results demonstrate the effectiveness and reliability of the invention in practical applications.
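For reference, the four indicators reported above can be computed as in the following sketch. It uses toy labels and scores, not the patent's data, and the helper names roc_auc and binary_metrics are hypothetical. The definitions are the binary, positive-class versions with an assumed decision threshold of 0.5; the text does not state which averaging or threshold was used for the reported values.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (landslide, non-landslide) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true, scores, threshold=0.5):
    """Accuracy, recall, precision and F1 for the positive (landslide) class."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


# Toy example: two non-landslide and two landslide samples.
y_true = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
auc = roc_auc(y_true, scores)
metrics = binary_metrics(y_true, scores)
print(auc, metrics)
```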

[0053] This embodiment also provides an electronic device comprising: a computer-readable storage medium storing computer-executable instructions; and one or more processors coupled to the computer-readable storage medium and configured to execute the computer-executable instructions, such that the device performs the landslide hazard susceptibility prediction method based on CatBoost-DNN.

[0054] This embodiment also provides a readable storage medium storing computer-executable instructions, which, when executed by a processor, configure the processor to perform a landslide hazard susceptibility prediction method based on CatBoost-DNN.

[0055] The above are merely preferred embodiments of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A landslide hazard susceptibility prediction method based on CatBoost-DNN, characterized in that it comprises: acquiring multi-source data of the landslide study area and extracting disaster-causing factors from the multi-source data; performing a multicollinearity test on the disaster-causing factors to obtain non-collinear disaster-causing factors; based on the non-collinear disaster-causing factors, using the CatBoost model to extract features from the disaster-causing factors and construct a new feature set; constructing a deep neural network model and training it using the new feature set; and using the trained deep neural network model to predict landslide susceptibility in the study area and outputting the landslide susceptibility prediction results.

2. The landslide hazard susceptibility prediction method based on CatBoost-DNN according to claim 1, characterized in that, Extracting disaster-causing factors from the multi-source data includes: The multi-source data is preprocessed to obtain preprocessed multi-source data; The preprocessed multi-source data is subjected to unified data processing to obtain unified data. The unified data processing includes unifying the spatial resolution, projected coordinate system, and geographic coordinate system of the preprocessed multi-source data. Disaster-causing factors are extracted from the unified data.

3. The landslide hazard susceptibility prediction method based on CatBoost-DNN according to claim 1, characterized in that the disaster-causing factors include: elevation, aspect, plane curvature, profile curvature, distance from the river, land use, normalized difference vegetation index (NDVI), rainfall, slope, and topographic wetness index (TWI).

4. The landslide hazard susceptibility prediction method based on CatBoost-DNN according to claim 1, characterized in that performing the multicollinearity test on the disaster-causing factors to obtain non-collinear disaster-causing factors includes: calculating the variance inflation factor (VIF) of each disaster-causing factor; determining that collinearity exists when the VIF is greater than a preset threshold; and eliminating the disaster-causing factors that exhibit collinearity to obtain the non-collinear disaster-causing factors.

5. The landslide hazard susceptibility prediction method based on CatBoost-DNN according to claim 1, characterized in that, Based on the aforementioned noncollinearity-causing factors, features are extracted from these factors using the CatBoost model to construct a new feature set, including: The CatBoost model is trained using the aforementioned noncollinear hazard factors; The importance of the disaster-causing factors is predicted using a trained CatBoost model, and the prediction probability is extracted as a new feature based on the importance. The new feature set is constructed by concatenating the new feature with the original feature, i.e., the disaster-causing factor.

6. The landslide hazard susceptibility prediction method based on CatBoost-DNN according to claim 1, characterized in that, Building a deep neural network model involves constructing a neural network structure with multiple hidden layers, and setting a Dropout layer and a BatchNorm layer after each hidden layer.

7. The landslide hazard susceptibility prediction method based on CatBoost-DNN according to claim 1, characterized in that, Training the deep neural network model using the new feature set includes: The new feature set is divided into a training set and a validation set; The deep neural network model was trained using the Adam optimization algorithm and the cross-entropy loss function. An early stopping mechanism is used during training. Training is stopped when the loss value of the validation set does not decrease within a preset number of consecutive iterations.

8. The landslide hazard susceptibility prediction method based on CatBoost-DNN according to claim 1, characterized in that, The output of the landslide susceptibility prediction results includes: The trained deep neural network model is used to predict the susceptibility probability value of each grid cell in the study area. Based on the aforementioned susceptibility probability values, the study area was divided into five levels using the natural breakpoint method, including extremely low, low, medium, high, and extremely high susceptibility levels. A landslide susceptibility prediction map was drawn based on the classification results.

9. An electronic device, characterized in that it comprises: a computer-readable storage medium storing computer-executable instructions; and one or more processors coupled to the computer-readable storage medium and configured to execute the computer-executable instructions to cause the device to perform the method according to any one of claims 1-8.

10. A readable storage medium, characterized in that it stores computer-executable instructions that, when executed by a processor, configure the processor to perform the method according to any one of claims 1-8.