A town development boundary optimization method and device based on gradient boosting decision tree

By optimizing town boundaries using gradient boosting decision tree models and multi-source data, the applicability and interpretability issues of boundary identification in existing technologies are resolved, achieving accurate, continuous, and dynamic optimization of town boundaries and supporting planners' scientific decision-making.

CN122242835A · Pending · Publication Date: 2026-06-19 · GUANGZHOU URBAN PLANNING & DESIGN SURVEY RES INST

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GUANGZHOU URBAN PLANNING & DESIGN SURVEY RES INST
Filing Date
2026-02-24
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing AI-based town boundary optimization methods suffer from poor applicability and difficulty in accurately capturing nonlinear relationships between features, leading to fragmented development, overall imbalance, and unexplainable decision-making issues in boundary adjustments.

Method used

A gradient boosting decision tree model is adopted, combined with multi-source data and manually verified town boundaries. The model parameters are optimized by iterative fitting of residuals, and the nonlinear mapping relationship between grid cell features and town boundaries is learned to construct a town boundary identification model. Combined with rigid constraints, dynamic efficiency and spatial correlation indicators, the accurate identification and optimization of town boundaries are achieved.

Benefits of technology

It improves the accuracy and applicability of urban boundary identification, enhances the model's recognition precision, achieves continuity and overall balance of urban boundaries, supports incremental learning and the interpretability of decision-making, and meets the rationality judgment needs of planners.



Abstract

This application relates to the field of urban development technology and discloses a method and apparatus for optimizing urban development boundaries based on gradient boosting decision trees. The method includes acquiring the initial boundary range of the target area and multi-source data including land parcel vector data, ecological red line data, and remote sensing vacant data; generating grid cell feature vectors based on the multi-source data; constructing a gradient boosting decision tree model, using historically verified urban boundaries as training labels and corresponding period grid cell feature vectors as training samples; iteratively fitting residuals to optimize model parameters until the output accuracy meets a preset accuracy threshold, thus obtaining an urban boundary identification model; inputting the current period grid cell feature vectors into the urban boundary identification model, and outputting the urban boundaries located within the initial boundary range. This application improves the applicability of urban development boundary delineation and achieves higher model accuracy.

Description

Technical Field

[0001] This application relates to the field of urban development technology, and in particular to a method and apparatus for optimizing urban development boundaries based on gradient boosting decision trees.

Background Technology

[0002] The scientific nature of urban development boundary delineation directly impacts the efficiency of intensive land resource utilization, ecological security, and the quality of urban functional improvement. Traditional delineation methods often rely on manual experience and linear analysis, leading to applicability issues such as fragmented development, overall imbalance, and disconnect from actual development needs. Against this backdrop, optimizing development boundaries by integrating multi-source spatial data and artificial intelligence technologies has become a research hotspot.

[0003] In existing technologies, AI-powered urban boundary optimization algorithms mostly employ single models such as linear regression and random forests. These models cannot accurately capture the nonlinear relationships between features and also suffer from problems such as "optimization results touching legal limits" and "fragmented spatial layout." Furthermore, due to the "black box problem," the lack of a decision interpretation module makes it impossible to reconstruct the contribution logic of AI decision-making indicators, making it difficult for planners to judge the rationality of the proposed solutions and hindering the implementation of the technology.

[0004] In view of the technologies described above, the inventors found that existing AI-based town boundary optimization methods suffer from poor applicability.

Summary of the Invention

[0005] To improve the applicability of urban development boundary delineation, this application provides an urban development boundary optimization method and apparatus based on gradient boosting decision trees.

[0006] Firstly, this application provides a method for optimizing urban development boundaries based on gradient boosting decision trees.

[0007] This application is achieved through the following technical solution: A method for optimizing urban development boundaries based on gradient boosting decision trees includes the following steps: Obtain the initial boundary range of the target area; and obtain multi-source data including land parcel vector data, ecological red line data, and remote sensing vacant data; Based on the multi-source data, generate grid cell feature vectors; A gradient boosting decision tree model is constructed, using manually verified town boundaries from historical periods as training labels and grid cell feature vectors from corresponding periods as training samples. The model parameters are optimized by iteratively fitting residuals, enabling the gradient boosting decision tree model to learn the nonlinear mapping relationship between grid cell features and town boundaries until the output accuracy meets the preset accuracy threshold, thus obtaining a town boundary recognition model. The current grid cell feature vector is input into the town boundary identification model, and the town boundary located within the initial boundary range is output.

[0008] In a preferred embodiment, this application can be further configured to include the following steps: Obtain rigid constraint index vectors, dynamic performance index vectors, and spatial correlation index vectors as learning labels; A priority for reducing inefficient land use is preset. The feature vectors of grid cells from historical periods are input into a preset weak learner. The weak learner is iteratively trained by combining the corresponding learning labels and initial reduction weights, and the output is a reduction candidate pool list containing reduction priority scores. The lower the reduction priority score, the higher the priority given to removing the grid cell corresponding to that feature vector.

[0009] In a preferred embodiment, this application can be further configured to include the following steps: The potential for space expansion is preset; The feature vectors of grid cells from historical periods are input into a preset weak learner. The weak learner is then iteratively trained by combining the corresponding learning labels and initial expansion weights. The output is an expansion candidate pool list containing spatial expansion potential values. The larger the spatial expansion potential value, the greater the expansion value of the grid cell corresponding to that feature vector.

[0010] In a preferred embodiment, this application can be further configured such that the training steps of the gradient boosting decision tree model also include: setting optimization constraints so that the deviation between the reduced area and the expanded area of urban plots is less than a preset deviation threshold; and, based on these optimization constraints, optimizing the weak learner with a planning solver to output candidate pool lists of land parcels for reduction and expansion.
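The area-balance constraint can be illustrated with a minimal sketch. The source calls for a planning solver and does not specify one, so the greedy selection below is only a stand-in, and the function name `select_balanced`, the tuple layout, and the sample values are all hypothetical.

```python
def select_balanced(reduction_pool, expansion_pool, max_deviation):
    """Greedy stand-in for the planning solver (an assumption, not the source's solver).

    Each pool is a list of (cell_id, score, area) tuples. All reduction
    candidates are taken as given; expansion candidates are added in order of
    descending potential as long as the expanded area does not overshoot the
    reduced area by max_deviation or more.
    """
    reduced_area = sum(area for _, _, area in reduction_pool)
    chosen, expanded_area = [], 0.0
    for cell_id, _score, area in sorted(expansion_pool, key=lambda c: -c[1]):
        if expanded_area + area - reduced_area < max_deviation:
            chosen.append(cell_id)
            expanded_area += area
    # The constraint from the text: |reduced - expanded| < preset threshold.
    balanced = abs(reduced_area - expanded_area) < max_deviation
    return chosen, reduced_area, expanded_area, balanced


# Hypothetical pools: (cell id, score, area in hectares).
reduction_pool = [("r1", 12, 2.0), ("r2", 25, 3.0)]
expansion_pool = [("e1", 90, 2.5), ("e2", 85, 2.0), ("e3", 82, 1.5)]
chosen, reduced, expanded, balanced = select_balanced(
    reduction_pool, expansion_pool, max_deviation=0.6
)
```

A greedy pass can miss a feasible combination that an exact solver would find, so `balanced` should be checked before the result is used.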

[0011] In a preferred embodiment, this application can be further configured to include the following steps: The spatial morphological patterns of the compliant boundaries of the region are learned through a pre-defined generative adversarial network. By utilizing the spatial morphological patterns and combining them with the town boundary identification model, the feature vectors of the grid units in the current period are used to identify the boundaries, thereby generating continuous town boundaries located within the initial boundary range.

[0012] In a preferred embodiment, this application can be further configured to include the following steps: The town boundary identification model is used to predict the probability of construction land use based on the feature vectors of grid cells in the current period, and a probability distribution raster map is generated. Based on the probability distribution grid map, the LIME algorithm is used to make a local linear approximation interpretation of the output of the town boundary identification model, analyze the core indicators and contribution design logic, and visualize and generate an indicator contribution list.

[0013] In a preferred embodiment, this application can be further configured to include the following steps: When the number of iterations of the weak learner reaches a preset threshold, the learning weight of the rigid constraint index vector accounts for the largest proportion among the learned labels.

[0014] In a preferred embodiment, this application can be further configured such that when the number of iterations of the weak learner reaches a preset threshold, the learning weight of the dynamic performance index vector in the learned labels has the largest proportion.

[0015] Secondly, this application provides a device for optimizing urban development boundaries based on gradient boosting decision trees.

[0016] This application is achieved through the following technical solution: A town development boundary optimization device based on gradient boosting decision tree, comprising, The data module is used to obtain the initial boundary range of the target area; and to obtain multi-source data including land parcel vector data, ecological red line data and remote sensing vacant data; The preprocessing module is used to generate grid cell feature vectors based on the multi-source data; The modeling module is used to construct a gradient boosting decision tree model. It uses the manually verified town boundaries from historical periods as training labels and the grid cell feature vectors of the corresponding periods as training samples. The model parameters are optimized by iteratively fitting the residuals, so that the gradient boosting decision tree model learns the nonlinear mapping relationship between grid cell features and town boundaries until the output accuracy meets the preset accuracy threshold, thus obtaining a town boundary recognition model. The boundary optimization module is used to input the current grid cell feature vector into the town boundary recognition model and output the town boundary located within the initial boundary range.

[0017] Thirdly, this application provides a computer device.

[0018] This application is achieved through the following technical solution: A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of any of the above-described methods for optimizing urban development boundaries based on gradient boosting decision trees.

[0019] In summary, compared with the prior art, the beneficial effects of the technical solution provided in this application include at least the following: The initial boundary range of the target area is obtained as the application space boundary data of this scheme, which is convenient and efficient. Multi-source data, including land parcel vector data, ecological red line data, and remote sensing vacant data, is acquired to improve data quality and facilitate the improvement of the output accuracy of the training model later. Based on the multi-source data, grid cell feature vectors are generated to standardize the data and improve the efficiency of the later training model. A gradient boosting decision tree model is constructed, using manually verified town boundaries from historical periods as training labels and grid cell feature vectors from corresponding periods as training samples. The model parameters are optimized by iteratively fitting residuals, enabling the gradient boosting decision tree model to learn the nonlinear mapping relationship between grid cell features and town boundaries until the output accuracy meets the preset accuracy threshold, resulting in a town boundary recognition model. This model can accurately capture the nonlinear correlation between features, improving the applicability of the town boundary recognition model. The constructed gradient boosting decision tree model has the advantage of handling nonlinear correlations and multi-source heterogeneous data, and has higher recognition accuracy compared to traditional single models.

Attached Figure Description

[0020] Figure 1 is a schematic diagram of the main process of an urban development boundary optimization method based on gradient boosting decision tree, provided as an exemplary embodiment of this application.

[0021] Figure 2 is a visualization of the LIME interpretation of the urban boundary identification model output of an urban development boundary optimization method based on gradient boosting decision tree, provided as another exemplary embodiment of this application.

[0022] Figure 3 is a structural block diagram of an urban development boundary optimization device based on gradient boosting decision tree, provided as an exemplary embodiment of this application.

Detailed Implementation

[0023] This specific embodiment is merely an explanation of this application and is not intended to limit it. After reading this specification, those skilled in the art can make modifications to this embodiment without contributing any inventive step, but such modifications are protected by patent law as long as they fall within the scope of the claims of this application. To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0024] Furthermore, the term "and / or" in this article is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. Additionally, the character " / " in this article, unless otherwise specified, generally indicates that the preceding and following related objects have an "or" relationship.

[0025] Existing AI algorithms for urban development boundaries have the following main shortcomings:
(1) Difficulty handling the multiple objectives of "constraint-optimization-balance": They often use single models such as linear regression and random forest, which cannot accurately capture the nonlinear relationship between "ecological red line conflict-land use efficiency-spatial layout", nor can they solve the total balance constraint of "reduced area = expanded area". This leads to applicability problems such as "optimization results touching the legal bottom line" and "fragmentation of spatial layout".
(2) Ignoring the dynamic nature of planning and the interpretability of decision-making: Due to the "black box problem", there is no decision interpretation module, and the contribution logic of AI decision indicators cannot be reconstructed, making it difficult for planners to judge the rationality of the plan and hindering the implementation of the technology. At the same time, most models are trained statically and have no incremental learning mechanism. When the data is updated, for example after a planning revision or the implementation of a major project, the full set of samples must be retrained, resulting in poor adaptability.
(3) The indicator system lacks systematicity and computability: Existing methods mostly use single-dimensional indicators such as "current land use + population size", without constructing a three-dimensional collaborative system of "rigid constraints - dynamic effectiveness - spatial correlation". Moreover, indicator descriptions such as "transportation convenience" and "ecological suitability" are vague and lack clear data sources, quantitative rules, and core calculation logic, which leads to chaotic AI model input and evaluation results that deviate from actual planning needs.
(4) Failure to achieve synergistic integration of "technical analysis and human experience": The algorithm outputs a single "optimal solution", lacks a human-computer interaction adjustment interface, and cannot integrate human experience such as historical building protection and regional development strategy. As a result, the solution is difficult to implement because it is "technically reasonable but not feasible in practice".

[0026] Gradient boosting decision tree (GBDT) machine learning models have the advantage of handling nonlinear correlations and multi-source heterogeneous data, providing technical support for solving the problem of "constraint-efficiency-space" collaborative optimization.

[0027] To this end, this application uses gradient boosting decision trees, multi-objective optimization and interpretable AI technology to construct a full-process technical solution of "data preprocessing - indicator construction - model optimization - decision output" through core Python code, so as to achieve accurate delineation and dynamic optimization of urban development boundaries.

[0028] The embodiments of this application will now be described in further detail with reference to the accompanying drawings.

[0029] Referring to Figure 1, this application provides a method for optimizing urban development boundaries based on gradient boosting decision trees. The main steps of the method are described below.

[0030] S1: Obtain the initial boundary range of the target area; and obtain multi-source data including land parcel vector data, ecological red line data, and remote sensing vacancy data;
S2: Based on the multi-source data, generate grid cell feature vectors;
S3: Construct a gradient boosting decision tree model, using the manually verified town boundaries from historical periods as training labels and the grid cell feature vectors from the corresponding periods as training samples. Optimize the model parameters by iteratively fitting the residuals, enabling the gradient boosting decision tree model to learn the nonlinear mapping relationship between grid cell features and town boundaries until the output accuracy meets the preset accuracy threshold, thus obtaining a town boundary recognition model;
S4: Input the current grid cell feature vector into the town boundary recognition model, and output the town boundary located within the initial boundary range.

[0031] Specifically, using GIS spatial analysis functions, based on the preliminary development boundary of the city / region, an "optimization study area" is formed by extending 1-2 kilometers outward, serving as the initial boundary of the target area and clarifying the application space boundary of the technical solution.

[0032] Meanwhile, we collected multi-source data, including land parcel vector data, ecological red line data, tax data, and remote sensing vacancy data, and carried out basic data cleaning work such as missing value filling and coordinate unification (WGS84) to ensure that the data quality meets the needs of subsequent analysis.

[0033] Next, the cleaned multi-source data is mapped to a unified spatial grid and time slice. Data alignment, feature extraction and fusion are completed within the grid. Finally, a fixed-length vector is output for each grid cell to obtain the grid cell feature vector, which is used for tasks such as model prediction, clustering or retrieval.
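The grid mapping step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the application's code: the 500 m cell size, the two feature names, and the mean-based filling and aggregation are hypothetical, coordinates are assumed to be already projected to metres, and the time-slice dimension is omitted for brevity.

```python
from collections import defaultdict
from statistics import mean

CELL_SIZE_M = 500.0  # assumed grid resolution; the source does not fix one
FEATURES = ("vacancy_rate", "tax_per_area")  # hypothetical feature names


def cell_index(x, y, origin=(0.0, 0.0), size=CELL_SIZE_M):
    """Map a projected coordinate (metres) to an integer (col, row) grid index."""
    return (int((x - origin[0]) // size), int((y - origin[1]) // size))


def build_feature_vectors(records):
    """Aggregate per-record attributes into one fixed-length vector per cell.

    Each record is a dict with 'x', 'y' and the FEATURES keys; a missing
    (None) feature is filled with the mean of that feature over all records.
    """
    fill = {}
    for name in FEATURES:
        known = [r[name] for r in records if r.get(name) is not None]
        fill[name] = mean(known) if known else 0.0

    # Group records by the grid cell they fall into.
    grouped = defaultdict(list)
    for r in records:
        grouped[cell_index(r["x"], r["y"])].append(r)

    # One vector per cell: the per-feature mean of its (filled) records.
    vectors = {}
    for cell, rows in grouped.items():
        vectors[cell] = [
            mean(r[name] if r.get(name) is not None else fill[name] for r in rows)
            for name in FEATURES
        ]
    return vectors


records = [
    {"x": 120.0, "y": 80.0, "vacancy_rate": 0.40, "tax_per_area": 10.0},
    {"x": 300.0, "y": 450.0, "vacancy_rate": 0.20, "tax_per_area": None},
    {"x": 900.0, "y": 100.0, "vacancy_rate": 0.10, "tax_per_area": 30.0},
]
vectors = build_feature_vectors(records)
```

Every cell receives a vector of the same length, in the fixed order of `FEATURES`, which is what allows the vectors to be stacked into a training matrix later.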

[0034] A gradient boosting decision tree model is constructed using Python. Historically verified town boundaries are used as training labels, and corresponding grid cell feature vectors are used as training samples. The model parameters are optimized by iteratively fitting residuals. Through supervised training, the gradient boosting decision tree model learns the non-linear mapping relationship between grid cell features and town boundaries, outputting town boundaries located within the initial boundary range. This process continues until the output accuracy meets a preset accuracy threshold, resulting in a town boundary recognition model.

[0035] Input the current grid cell feature vector into the trained town boundary recognition model, and output the town boundary located within the initial boundary range.

[0036] The gradient boosting decision tree model is based on an initial simple model. It trains "weak decision trees" through multiple rounds of iteration. Each tree focuses on correcting the prediction error of the previous model, i.e., the "residual". By fitting the residual, all weak decision trees are superimposed according to their weights to optimize the model parameters and form a more accurate town boundary recognition model.

[0037] In one embodiment, a method for optimizing urban development boundaries based on gradient boosting decision trees further includes the following steps: Obtain rigid constraint index vectors, dynamic performance index vectors, and spatial correlation index vectors as learning labels; A priority for reducing inefficient land use is preset. The feature vectors of grid cells from historical periods are input into a preset weak learner. The weak learner is iteratively trained by combining the corresponding learning labels and initial reduction weights, and the output is a reduction candidate pool list containing reduction priority scores. The lower the reduction priority score, the higher the priority given to removing the grid cell corresponding to that feature vector.

[0038] In one embodiment, a method for optimizing urban development boundaries based on gradient boosting decision trees further includes the following steps: The potential for space expansion is preset; The feature vectors of grid cells from historical periods are input into a preset weak learner. The weak learner is then iteratively trained by combining the corresponding learning labels and initial expansion weights. The output is an expansion candidate pool list containing spatial expansion potential values. The larger the spatial expansion potential value, the greater the expansion value of the grid cell corresponding to that feature vector.

[0039] Specifically, based on the goals of "legal compliance, resource efficiency, and spatial coordination", a complete system of 3 primary indicators, 9 secondary indicators, and 28 tertiary indicators is constructed, covering three key indicators: rigid constraints, dynamic effectiveness, and spatial correlation. The weights are determined by combining the Analytic Hierarchy Process (AHP) with expert scoring, as shown in Table 1 below.

[0040] Table 1

[0041] In this embodiment, Category A indicators have a weight of 55%, strictly adhering to legal bottom lines. Category B and C indicators have a combined weight of 45%, focusing on efficiency optimization and achieving a balance between compliance and efficiency, offering the advantage of both rigidity and flexibility. Category B indicators can be quickly adjusted through real-time data updates, responding to urban development changes without requiring system reconstruction, thus achieving dynamic adaptation. All the above indicators clearly define their "data source and quantification rules," avoiding vague descriptions, directly adapting to AI model input, and possessing computability.

[0042] The town boundary identification model is designed as a two-stage model to solve two major problems: "identification of priority for reducing inefficient land use" and "identification of potential for expansion of high-quality space", so as to fully adapt to the "three-dimensional quantifiable index system" proposed in this scheme.

[0043] The independent variable set X is defined as the 28 tertiary indicators in the three-dimensional indicator system of "rigid constraints + dynamic effectiveness + spatial correlation". Every x_i is a defined quantitative score (0-100 points), which ensures standardized model input. For the two-stage model, two dependent variables are set, both of which are manually labeled evaluation results of 1-100 points: y_del, the inefficient land use reduction priority score, where a lower score means reduction should be given higher priority, used to construct a reduction candidate pool list; and y_exp, the high-quality space expansion potential score, where a higher score means higher expansion value, used to construct an expansion candidate pool list.

[0044] In this embodiment, grid cells with a reduction priority score ≤ 30 points are included in the "reduction candidate pool", such as idle factory buildings and illegal land plots; grid cells with a reduction priority score > 70 points are excluded from reduction.

[0045] In this embodiment, expansion potential scores ≥ 80 points are included in the "expansion candidate pool," such as efficient land use and compliant planning plots; expansion potential scores < 50 points are excluded from expansion.
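The two threshold rules can be expressed as a short sketch. The thresholds (≤ 30, > 70, ≥ 80, < 50) come from the text above; the function names, the `"undecided"` label for scores between the thresholds, and the sample scores are assumptions, since the source does not say how the middle bands are handled.

```python
def classify_reduction(score):
    """Reduction priority score: lower means reduce first."""
    if score <= 30:
        return "candidate"
    if score > 70:
        return "excluded"
    return "undecided"  # assumption: the source leaves the 30-70 band open


def classify_expansion(score):
    """Expansion potential score: higher means more expansion value."""
    if score >= 80:
        return "candidate"
    if score < 50:
        return "excluded"
    return "undecided"  # assumption: the source leaves the 50-80 band open


def build_pools(cells):
    """cells maps cell_id -> (reduction_score, expansion_score)."""
    reduction = sorted(
        (cid for cid, (d, _) in cells.items() if classify_reduction(d) == "candidate"),
        key=lambda cid: cells[cid][0],   # lowest score first
    )
    expansion = sorted(
        (cid for cid, (_, e) in cells.items() if classify_expansion(e) == "candidate"),
        key=lambda cid: -cells[cid][1],  # highest score first
    )
    return reduction, expansion


# Hypothetical scores for five grid cells.
cells = {"g1": (25, 40), "g2": (12, 30), "g3": (85, 92), "g4": (60, 81), "g5": (75, 45)}
reduction_pool, expansion_pool = build_pools(cells)
```

Sorting each pool by its own score keeps the ordering semantics of the two scores explicit: ascending for reduction, descending for expansion.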

[0046] During training, each iteration optimizes only the residual of the previous model, calculates the predicted residual, and feeds it back to the next iteration. The iteration formula is as follows:
F_t(X) = F_{t-1}(X) + η·h_t(X; θ_t)
In the formula: F_t(X) represents the model after the t-th iteration; F_{t-1}(X) represents the model after the (t-1)-th iteration; η·h_t(X; θ_t) represents the correction term of the t-th tree, and η is used to control the correction step size to avoid overfitting of the model.

[0047] The GBDT-del urban boundary identification model, used for prioritizing inefficient land use reduction, includes:
Input: rigid constraint layer (A11-A23) + dynamic performance layer (B11-B32) + spatial correlation layer (C11-C32) → standardized index vector X_del;
Initialization: F0_del(X) = the mean of y_del over all samples, where y_del is the manually labeled reduction priority score, from 1 to 100;
Iterative training, for t = 1 to T, where T is the number of weak learners:
① Calculate the residual: r_t_del = y_del - F_{t-1}_del(X), which is the difference between the actual score and the predicted value of the previous round;
② Train the weak learner: h_t_del(X; θ_t) is fitted to the residual r_t_del, where θ_t denotes the tree parameters (such as node splitting features and number of leaf nodes);
③ Calculate the weight: w_t_del = the weight of the weak learner obtained by minimizing the loss function (such as mean squared error);
④ Update the model: F_t_del(X) = F_{t-1}_del(X) + η×w_t_del×h_t_del(X; θ_t), where η is the learning rate;
Integrated output: F_del(X) = F_T_del(X), which is the final reduction priority score. The lower the score, the higher the priority for reduction.
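Steps ① to ④ can be traced in a small, dependency-free sketch. It is not the application's implementation: depth-1 trees (stumps) stand in for the weak learners, the weight in step ③ is found by a least-squares line search, and the function names and toy data are hypothetical.

```python
def fit_stump(X, r):
    """Fit a depth-1 regression tree to the residuals r by least squares."""
    n = len(X)
    best = None  # (sse, feature index, threshold, left value, right value)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r[i] for i in range(n) if X[i][j] <= thr]
            right = [r[i] for i in range(n) if X[i][j] > thr]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    if best is None:  # no split possible: fall back to a constant tree
        m = sum(r) / n
        return lambda row: m
    _, j, thr, lv, rv = best
    return lambda row: lv if row[j] <= thr else rv


def fit_gbdt(X, y, rounds=150, eta=0.1):
    """Return (F_0, [(w_t, h_t), ...], eta) following steps 1-4 of the text."""
    f0 = sum(y) / len(y)                 # initialization: label mean
    pred = [f0] * len(y)
    trees = []
    for _ in range(rounds):
        r = [yi - pi for yi, pi in zip(y, pred)]            # 1: residual
        h = fit_stump(X, r)                                 # 2: weak learner
        hx = [h(row) for row in X]
        denom = sum(v * v for v in hx)
        # 3: weight minimizing the squared error of r - w * h(X)
        w = sum(ri * v for ri, v in zip(r, hx)) / denom if denom > 0 else 0.0
        pred = [p + eta * w * v for p, v in zip(pred, hx)]  # 4: update
        trees.append((w, h))
    return f0, trees, eta


def predict(model, row):
    f0, trees, eta = model
    return f0 + sum(eta * w * h(row) for w, h in trees)


def mse(model, X, y):
    return sum((predict(model, row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)


# Toy data: one standardized indicator and a manually labeled score.
X = [[10.0], [20.0], [30.0], [80.0], [90.0]]
y = [15.0, 20.0, 25.0, 80.0, 85.0]
baseline = fit_gbdt(X, y, rounds=0)   # F_0 only
model = fit_gbdt(X, y, rounds=150, eta=0.1)
```

With squared-error loss and leaf values set to residual means, the line search in step ③ analytically yields w_t = 1, so in this sketch the shrinkage is carried entirely by η. A production setup would more likely use a library gradient boosting regressor with deeper trees.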

[0048] The town boundary identification model GBDT-exp, used for evaluating the potential for expansion into high-quality space, includes:
Input: same as X_del, i.e., the three-dimensional index vector, with emphasis on the contributions of the Class B dynamic performance layer and Class C spatial correlation layer indexes;
Initialization: F0_exp(X) = the mean of y_exp over all samples, where y_exp is the manually labeled expansion potential score, from 1 to 100;
Iterative training, for t = 1 to T:
① Calculate the residual: r_t_exp = y_exp - F_{t-1}_exp(X);
② Train the weak learner: h_t_exp(X; θ_t) is fitted to the residual r_t_exp;
③ Calculate the weight: w_t_exp = the weight of the weak learner obtained by minimizing the loss function (such as mean squared error);
④ Update the model: F_t_exp(X) = F_{t-1}_exp(X) + η×w_t_exp×h_t_exp(X; θ_t);
Integrated output: F_exp(X) = F_T_exp(X), which is the final expansion potential score. The higher the score, the higher the expansion value.

[0049] After two-stage training, the final town boundary recognition model is a "weighted sum of multiple decision trees":
F(X) = F0(X) + Σ_{n=1}^{T} w_n·h_n(X; θ_n)
In the formula: F0(X) is the initial model, i.e., the first tree, constructed from the average score of all samples; w_n is the weight of the n-th decision tree, determined by minimizing the loss function (the learning rate η of the per-round update is taken as folded into this weight); h_n(X; θ_n) is the n-th weak decision tree, where θ_n denotes the tree construction parameters, such as node splitting features and the number of leaf nodes. For reduction tasks, F_del(X) outputs a reduction priority score; for expansion tasks, F_exp(X) outputs an expansion potential score.

[0050] In one embodiment, for two-stage training, after the original values are calculated from the 28 indicators and the documented quantification rules, the scores are converted to 0-100 points through "linear normalization":
Standardization formula: x_std = (x_raw - x_min) / (x_max - x_min) × 100
In the formula, x_raw is the original value, and x_min / x_max are the extreme values of the indicator over the historical samples; for example, for A22 (degree of farmland occupancy), x_min = 0 and x_max = 100.
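The linear normalization can be written as a short helper. This is a sketch, assuming that values outside the historical range are clipped to [0, 100] and that a degenerate range is an error; the source specifies neither behaviour, and the function name is hypothetical.

```python
def normalize_0_100(x_raw, x_min, x_max):
    """Linearly rescale x_raw from [x_min, x_max] to a 0-100 score."""
    if x_max == x_min:
        raise ValueError("x_min and x_max must differ")
    score = (x_raw - x_min) / (x_max - x_min) * 100.0
    # Assumption: clip values that fall outside the historical extremes.
    return min(100.0, max(0.0, score))


# A22 example from the text: x_min = 0, x_max = 100.
a22_score = normalize_0_100(25.0, 0.0, 100.0)
```

Clipping keeps new samples within the 0-100 input range the model was trained on, at the cost of flattening values beyond the historical extremes.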

[0051] The specific categories and key indicators are shown in Table 2 below.

[0052] Table 2

[0053] To achieve "supervised learning," two types of labels need to be assigned to the historical land parcel sample data. The labeling rules and examples are shown in Table 3 below:

Table 3

[0054] Initial model construction (F0): The two sub-models are each initialized with the mean of their labels:
Reduction sub-model: F0_del = (1/N) Σ_{i=1}^{N} y_del,i (N is the number of training samples);
Expansion sub-model: F0_exp = (1/N) Σ_{i=1}^{N} y_exp,i;
The model configuration parameters are shown in Table 4 below.

Table 4

[0055] The feature vectors of grid cells from historical periods are input into a preset weak learner. Combined with the corresponding learning labels and initial reduction weights, the reduction priority is determined based on the quantized values of the 28 indicators. The weak learner is trained iteratively, and a reduction candidate pool list containing reduction priority scores is output. The lower the reduction priority score, the higher the priority given to removing the grid cell corresponding to that feature vector.

[0056] Similarly, the feature vectors of grid cells from historical periods are input into a preset weak learner. Combined with the corresponding learning labels and initial expansion weights, the spatial expansion potential level is determined based on the quantized values of the 28 indicators. The weak learner is iteratively trained, and an expansion candidate pool list containing spatial expansion potential values is output. The larger the spatial expansion potential value, the greater the expansion value of the grid cell corresponding to that feature vector.

[0057] The weak learners are trained over multiple rounds. Taking the reduction sub-model as an example, training runs for T=150 iterations; the procedure for the expansion sub-model is the same.

[0058] The final model outputs the final score via gbdt_del.predict(X), which is the weighted sum of all weak tree predictions.
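The iterative residual fitting and the final weighted-sum prediction can be sketched in plain Python as follows. This is only an illustration under stated assumptions: one-split regression stumps stand in for the weak trees, the learning rate of 0.1 is assumed, the two-indicator samples are invented, and a `predict(model, row)` function takes the place of the `gbdt_del.predict(X)` method named in the text.

```python
def fit_stump(X, residuals):
    """Find the single (feature, threshold) split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lm, rm)
    if best is None:  # all rows identical: fall back to a constant stump
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gbdt(X, y, rounds=150, learning_rate=0.1):
    """Start from the label mean, then fit each stump to the current residuals."""
    f0 = sum(y) / len(y)
    preds = [f0] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return f0, stumps, learning_rate


def predict(model, row):
    """Final score: initial value plus the weighted sum of all weak predictions."""
    f0, stumps, learning_rate = model
    return f0 + learning_rate * sum(stump_predict(s, row) for s in stumps)


# Hypothetical samples: [vacancy score, plan compliance score] -> deletion label.
X = [[90, 10], [80, 20], [20, 70], [10, 85], [50, 50], [30, 60]]
y_del = [85.0, 75.0, 25.0, 15.0, 50.0, 35.0]

gbdt_del = fit_gbdt(X, y_del, rounds=150)
print([round(predict(gbdt_del, row), 1) for row in X])
```

Each round can only lower the training squared error, because a stump fitted to the residuals is added with a step size between 0 and 1.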

[0059] In one embodiment, a method for optimizing urban development boundaries based on gradient boosting decision trees further includes the following steps: When the number of iterations of the weak learner reaches a preset threshold, the learning weight of the rigid constraint index vector accounts for the largest proportion among the learned labels.

[0060] In one embodiment, in a gradient boosting decision tree-based method for optimizing urban development boundaries, when the number of iterations of the weak learner reaches a preset threshold, the learning weight of the dynamic performance index vector in the learned labels has the largest proportion.

[0061] To distinguish between "reduction" and "expansion" objectives, task adaptation is achieved during training through indicator weight bias and label annotation logic.

[0062] In the evaluation strategy for prioritizing the reduction of inefficient land use, the training indicator weights are biased towards the Class A rigid constraint layer indicators, for example with a weight share of 55% (e.g., a 15% weight for A22 farmland occupancy). The training labels emphasize "violation and inefficiency," such as vacancy rate and ecological conflict. In this embodiment, the first 50 rounds of weak trees focus on fitting the residuals of the Class A indicators, which improves the accuracy of identifying illegal land parcels.

[0063] In the evaluation strategy for high-quality space expansion potential, the training indicator weights are biased towards the Class B dynamic efficiency layer indicators, for example with a weight share of 40% (e.g., a 6% weight for B12 output per unit area). The training labels prioritize "compliance and efficiency", such as overall planning compliance and facility coverage. In this embodiment, the first 50 rounds of weak trees focus on fitting the residuals of the Class B indicators, which improves the identification accuracy of high-efficiency plots. Using 5-fold cross-validation, this scheme achieves a deletion recognition accuracy of ≥90% and an expansion potential recognition accuracy of ≥88%.
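The 5-fold cross-validation mentioned above can be sketched as follows. The sketch only illustrates the evaluation procedure: a simple threshold classifier on a single made-up score stands in for the sub-models, and all data and function names are hypothetical, so the resulting accuracy says nothing about the accuracies reported in this embodiment.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples)
                     if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def fit_threshold(xs, ys):
    """Stand-in classifier: midpoint between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2.0


def predict_label(threshold, x):
    return 1 if x > threshold else 0


def cross_validated_accuracy(xs, ys, k=5):
    """Average the held-out accuracy over the k folds."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(ys), k):
        model = fit_threshold([xs[i] for i in train_idx],
                              [ys[i] for i in train_idx])
        hits = sum(predict_label(model, xs[i]) == ys[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / k


# Hypothetical deletion scores and labels (1 = plot should be deleted).
scores = [5, 95, 10, 90, 20, 80, 15, 85, 25, 75]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
accuracy = cross_validated_accuracy(scores, labels, k=5)
print(accuracy)
```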

[0064] The GBDT two-stage model accurately captures the nonlinear correlation of indicators, and the reduction / expansion recognition accuracy is about 20% higher than that of random forest.

[0065] Through iterative training, the model can gradually and accurately identify "inefficient land use," such as plots with a high vacancy rate (B13), and "high-quality expansion land," such as plots with high compliance with the master plan (A11) and plots with high output per unit area (B12).

[0066] In one embodiment, the training step of the gradient boosting decision tree model further includes: setting optimization constraints to ensure that the deviation between the reduced area and the expanded area of urban plots is less than the preset deviation threshold; and, based on these optimization constraints, optimizing the weak learner using a planning solver to output a candidate pool list for removing and expanding land parcels.

[0067] Specifically, a multi-objective integer programming (MOIP) optimization problem is constructed. Its objectives include maximizing expansion potential, minimizing ecological conflict, and minimizing spatial fragmentation, with weights consistent with the indicator system. The candidate plots output by the urban boundary identification model serve as the input, and optimization constraints are set, such as a plot reduction area deviation of ≤5% and a plot expansion area deviation of ≤5%. Python code is written to combine the multiple objectives by weighted summation into a 0-1 integer programming model, and a programming solver then solves for the optimal combination of plots to reduce and expand under the area balance constraints.
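A small sketch may help to illustrate the weighted-sum 0-1 model. It rests on several assumptions that are not taken from the embodiment: the candidate plots, their per-plot objective contributions, and the weights are invented; the area balance constraint is interpreted as a relative difference of at most 5% between the total reduced and total expanded area; and exhaustive enumeration is used in place of a programming solver, which is only practical for a handful of candidates.

```python
from itertools import product

# Hypothetical candidates:
# (name, kind, area_ha, d_potential, d_eco_conflict, d_fragmentation)
CANDIDATES = [
    ("D1", "del", 10.0, 0.0, -0.4, -0.1),
    ("D2", "del", 6.0, 0.0, -0.2, 0.1),
    ("E1", "exp", 10.2, 0.9, 0.1, 0.0),
    ("E2", "exp", 6.0, 0.6, 0.1, 0.2),
    ("E3", "exp", 4.0, 0.4, 0.5, 0.3),
]
WEIGHTS = (0.5, 0.3, 0.2)   # assumed weights: potential, conflict, fragmentation
MAX_DEVIATION = 0.05        # area balance tolerance (5%)


def objective(selection):
    """Weighted sum: reward potential, penalize conflict and fragmentation."""
    w_pot, w_eco, w_frag = WEIGHTS
    return sum(w_pot * c[3] - w_eco * c[4] - w_frag * c[5]
               for c, x in zip(CANDIDATES, selection) if x)


def is_balanced(selection):
    """Check that reduced and expanded areas differ by at most MAX_DEVIATION."""
    del_area = sum(c[2] for c, x in zip(CANDIDATES, selection)
                   if x and c[1] == "del")
    exp_area = sum(c[2] for c, x in zip(CANDIDATES, selection)
                   if x and c[1] == "exp")
    if del_area == 0 or exp_area == 0:
        return False
    return abs(del_area - exp_area) / max(del_area, exp_area) <= MAX_DEVIATION


def solve():
    """Enumerate every 0-1 assignment and keep the best feasible one."""
    feasible = (s for s in product((0, 1), repeat=len(CANDIDATES))
                if is_balanced(s))
    return max(feasible, key=objective, default=None)


best = solve()
chosen = [c[0] for c, x in zip(CANDIDATES, best) if x]
print(chosen)
```

With these made-up numbers, plot E3 is left out because its ecological conflict and fragmentation penalties outweigh its expansion potential, and adding it would also break the area balance.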

[0068] In one embodiment, a method for optimizing urban development boundaries based on gradient boosting decision trees further includes the following steps: The spatial morphological patterns of the compliant boundaries of the region are learned through a pre-defined generative adversarial network. By utilizing the spatial morphological patterns and combining them with the town boundary identification model, the feature vectors of the grid units in the current period are used to identify the boundaries, thereby generating continuous town boundaries located within the initial boundary range.

[0069] Specifically, using existing generative adversarial networks, based on multi-source data, we learn the spatial morphological patterns of regional compliance boundaries. Using the features of land parcels selected by MOIP as model input, we write Python code and use the trained town boundary recognition model to perform boundary recognition on the feature vectors of grid units in the current period. Under the total balance constraint, we generate continuous and compact town boundaries located within the initial boundary range to correct the land parcel discretization problem and achieve GAN spatial morphology optimization.

[0070] The GBDT+MOIP+GAN coupled algorithm achieves spatial optimization under total balance, improving computational efficiency by about 50%, avoiding fragmented development, and making the model more applicable.

[0071] In one embodiment, an algorithm closed loop is built according to the serial process of "GBDT screening of candidate plots - MOIP optimization of plot combination - GAN calibration of boundary morphology", Python code is written, and a result feedback mechanism is introduced to realize the dynamic adaptation of each module.

[0072] In one embodiment, a method for optimizing urban development boundaries based on gradient boosting decision trees further includes the following steps: The town boundary identification model is used to predict the probability of construction land use based on the feature vectors of grid cells in the current period, and a probability distribution raster map is generated. Based on the probability distribution grid map, the LIME algorithm is used to make a local linear approximation interpretation of the output of the town boundary identification model, analyze the core indicators and contribution design logic, and visualize and generate an indicator contribution list.

[0073] The town boundary identification model can calculate the contribution of each indicator (independent variable $x_i$) to the model (e.g., 15% for permanent basic farmland occupation and 15% for overall planning compliance) using the following formula: $\mathrm{Importance}(x_i) = \frac{1}{T}\sum_{t=1}^{T} \Delta L_t(x_i)$ where $\Delta L_t(x_i)$ is the reduction in the model's loss function in iteration round $t$ when $x_i$ is used as a splitting feature of the decision tree (the greater the reduction, the more important $x_i$ is), and the factor $1/T$ averages the contribution over the $T$ rounds to give the final importance of $x_i$. In this embodiment, the town boundary identification model directly outputs the importance ranking of the 28 indicators, providing algorithmic support for the "weight verification of the three-dimensional indicator system". The auxiliary explanation output of the town boundary identification model includes the following. The indicator importance $\mathrm{Importance}(x_i)$ is output in list form, showing the contribution of the 28 indicators, such as 15% for A22 farmland occupancy and 8% for B13 vacancy rate, to verify the rationality of the indicator weights. The residual changes during iteration are displayed as a residual convergence curve chart (e.g., residual < 5 points after 150 rounds), demonstrating that model training has converged. The weak learner weights (w_t) are displayed in list form, showing the contribution of each tree (e.g., the first 20 trees account for 60% of the weight), supporting model simplification and optimization.
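The importance calculation, i.e. the loss reduction attributed to each indicator averaged over the T rounds, can be illustrated with the following sketch. It assumes that the loss reductions of each round have already been recorded per splitting indicator; the record format, the function name, and the numbers are hypothetical.

```python
def indicator_importance(split_gains_per_round, indicator_ids):
    """Average, over all T rounds, the loss reduction credited to each indicator."""
    rounds = len(split_gains_per_round)
    totals = {name: 0.0 for name in indicator_ids}
    for gains in split_gains_per_round:
        # gains maps an indicator id to the loss reduction of its splits
        # in this round; indicators not used in the round contribute zero.
        for name, delta_loss in gains.items():
            totals[name] += delta_loss
    return {name: total / rounds for name, total in totals.items()}


# Hypothetical log of T = 4 rounds.
gains_log = [
    {"A22": 6.0},
    {"A22": 3.0, "B13": 1.5},
    {"B13": 1.5},
    {"A11": 2.0},
]
importance = indicator_importance(gains_log, ["A22", "B13", "A11"])
ranking = sorted(importance, key=importance.get, reverse=True)
print(importance, ranking)
```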

[0074] The LIME algorithm is used to analyze the decision-making logic of the town boundary identification model, for example by outputting the top 5 core impact indicators and their contributions for a single plot. The output includes the LIME interpretation results, namely an indicator contribution list and a decision visualization chart, and the adjusted results, namely the final list of plots to be deleted / expanded and the boundary data optimized by the GAN, as shown in Figure 2. The boundary data can be exported in a standard GIS format. Together, these outputs clearly present the key reasons why plots are included in the deletion or expansion scope, thus addressing the AI "black box" problem.
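The idea of a local linear approximation can be sketched in plain Python as follows. This is a simplified LIME-style surrogate written from scratch, not the LIME library itself: it perturbs one plot's indicator values, weights the perturbed samples by their proximity to the plot, and fits a weighted linear model whose slopes serve as local contributions. The black-box scoring function, the indicator names, and all parameter values are hypothetical.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def explain_locally(black_box, instance, n_samples=200, scale=5.0,
                    kernel_width=10.0, seed=0):
    """Return one local slope per indicator around the given instance."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        sample = [v + rng.gauss(0.0, scale) for v in instance]
        offsets = [s - v for s, v in zip(sample, instance)]
        dist_sq = sum(o * o for o in offsets)
        rows.append([1.0] + offsets)          # leading 1.0 is the intercept
        targets.append(black_box(sample))
        weights.append(math.exp(-dist_sq / kernel_width ** 2))
    k = len(instance) + 1
    # Weighted normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * t for r, t, w in zip(rows, targets, weights))
            for i in range(k)]
    return solve_linear_system(xtwx, xtwy)[1:]   # drop the intercept


def black_box(features):
    """Hypothetical nonlinear deletion score standing in for the trained model."""
    vacancy, compliance = features
    return 0.8 * vacancy - 0.3 * compliance - 0.005 * vacancy * compliance


names = ["B13 vacancy rate", "A11 plan compliance"]
slopes = explain_locally(black_box, [60.0, 30.0])
contributions = sorted(zip(names, slopes), key=lambda kv: abs(kv[1]),
                       reverse=True)
print(contributions)
```

The sign of each slope indicates whether raising that indicator pushes the plot's score up or down in the neighbourhood of the plot, which is the kind of information an indicator contribution list is meant to convey.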

[0075] Verification has shown that the LIME algorithm opens up the AI "black box," and the human-machine collaboration interface incorporates human experience, increasing the solution pass rate by about 60% compared to pure algorithm output, thus meeting the professional needs of planning and decision-making.

[0076] In one embodiment, an interactive mechanism of "AI recommendation - human intervention - model iteration" is established. On the one hand, it supports traditional land parcel intervention, such as the preservation of historical buildings. By adding historical protection weights to the parcel and triggering incremental model updates, the scheme can be optimized. On the other hand, it supports spatial intervention of GAN optimization boundaries. For example, when a conflict is found between the boundary and the municipal corridor, the boundary can be contracted through manual decision-making to avoid it. The adjusted boundary data can then be used to fine-tune the GAN model, avoiding full retraining.

[0077] In one embodiment, when planning data is updated, for example with the addition of major projects or inefficient land use survey data, incremental training is adopted. The model input is the new samples (X_new, y_new) plus the existing model parameters (such as the 150 trained trees). The training process computes the residuals of the new samples under the existing model, adds 10-20 weak trees to fit those residuals, and updates the weights. As a result, the adjustment is completed within 1-2 hours without full retraining, saving 80% of the time compared to full retraining. The incremental learning mechanism supports real-time data updates, can quickly adjust the plan, adapts to the full-cycle management requirements of "periodic evaluation of national land spatial planning", and meets the needs of "dynamic optimization".
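The incremental update can be sketched as follows, under simplifying assumptions: the existing model is represented as a dictionary holding an initial value, a learning rate, and a list of single-feature stumps; the existing stumps and the new samples are invented; and 15 additional rounds are used as one value within the 10-20 range given above.

```python
def fit_stump(xs, residuals):
    """Least-squares one-split stump; returns (threshold, left, right)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # all x values identical: constant stump
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]


def predict(model, x):
    total = model["f0"]
    for threshold, left_value, right_value in model["stumps"]:
        total += model["lr"] * (left_value if x <= threshold else right_value)
    return total


def incremental_update(model, x_new, y_new, extra_rounds=15):
    """Keep the existing trees and append stumps fitted to new-sample residuals."""
    updated = {"f0": model["f0"], "lr": model["lr"],
               "stumps": list(model["stumps"])}
    for _ in range(extra_rounds):
        residuals = [y - predict(updated, x) for x, y in zip(x_new, y_new)]
        updated["stumps"].append(fit_stump(x_new, residuals))
    return updated


# Hypothetical existing model (5 stumps stand in for the 150 trained trees).
existing = {"f0": 50.0, "lr": 0.1, "stumps": [(50.0, -30.0, 30.0)] * 5}

# Hypothetical new survey samples: indicator score -> label.
x_new = [10.0, 30.0, 70.0, 90.0]
y_new = [20.0, 30.0, 80.0, 95.0]

updated = incremental_update(existing, x_new, y_new, extra_rounds=15)
print(len(existing["stumps"]), len(updated["stumps"]))
```

Because the existing trees are left untouched and only the appended stumps are fitted, the cost of the update depends on the number of new samples and added rounds rather than on the full training history.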

[0078] The final output of the town boundary identification model includes three main categories: First, the importance ranking of indicators, generated based on the indicator importance analysis results of the GBDT model, clarifying the influence weight of various constraints and effectiveness indicators on decision-making; second, the boundary optimization scheme, including a list of deleted / expanded plots, an area balance table, a boundary vector map after GAN optimization, and a MOIP multi-objective weight table; and third, an interpretable report, including a LIME indicator contribution map, GAN boundary optimization effect analysis: compactness improvement rate, fragmentation index reduction rate, and model accuracy data, such as a deletion identification accuracy of 91% and a GAN boundary compliance rate of 88%.

[0079] In summary, a gradient boosting decision tree-based method for optimizing urban development boundaries offers several advantages. First, it obtains the initial boundary range of the target area, serving as the application space boundary data for this scheme. This method is convenient and efficient. Second, it acquires multi-source data, including land parcel vector data, ecological red line data, and remote sensing vacant data, to improve data quality and enhance the output accuracy of the training model. Third, it generates grid cell feature vectors based on the multi-source data, standardizing the data and improving the efficiency of subsequent model training. Finally, it constructs a gradient boosting decision tree model, using manually verified urban boundaries from historical periods as training labels and corresponding grid cell feature vectors as training samples. The model parameters are optimized through iterative fitting of residuals, enabling the gradient boosting decision tree model to learn the nonlinear mapping relationship between grid cell features and urban boundaries until the output accuracy meets a preset precision threshold. This results in an urban boundary recognition model that accurately captures the nonlinear correlation between features, improving the applicability of the urban boundary recognition model. The constructed gradient boosting decision tree model exhibits higher recognition accuracy compared to traditional single-model approaches.

[0080] A method for optimizing urban development boundaries based on gradient boosting decision trees is developed. A GBDT-driven "evaluation-optimization-verification" closed-loop algorithm engine is developed, which integrates multi-objective integer programming (MOIP) and generative adversarial network (GAN) to achieve accurate identification of inefficient land use, scientific assessment of expansion potential, and spatial optimization under total balance constraints, thus solving the pain points of "difficulty in nonlinear correlation processing and difficulty in total balance control".

[0081] A gradient boosting decision tree-based urban development boundary optimization method addresses the issues of "discrete plots and irregular boundary shapes" after MOIP optimization. MOIP handles the scientific selection under multi-objective constraints, i.e., determining "which plots to select". The GAN learns the regional spatial layout rules, that is, the real distribution of boundary shapes, in order to generate regular shapes, i.e., optimizing "how to draw the boundary". This generates continuous and compact development boundaries and improves the spatial adaptability of the scheme.

[0082] A method for optimizing urban development boundaries based on gradient boosting decision trees is proposed. The design incorporates an "explainable AI + human-machine collaboration" decision module, which uses the LIME algorithm to analyze the AI decision logic and establishes an interactive process of "AI recommendation - human intervention - model iteration" to solve the problems of "black box decision-making" and "experience fragmentation" and improve the scientific validity and feasibility of the solution.

[0083] A method for optimizing urban development boundaries based on gradient boosting decision trees also realizes the incremental learning capability of the model, adapts to the dynamic update requirements of planning data, and can quickly output adjustment schemes without full retraining, thus meeting the policy requirements of "full-cycle dynamic management" of urban development boundaries.

[0084] A method for optimizing urban development boundaries based on gradient boosting decision trees is proposed. Based on the principles of "prioritizing rigid constraints, optimizing efficiency, and adapting spatial coordination", this method constructs a quantifiable indicator system and an AI closed-loop algorithm to achieve accurate evaluation, dynamic optimization, and scientific delineation of development boundaries, providing technical support for the implementation of territorial spatial planning.

[0085] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.

[0086] Referring to Figure 3, this application also provides a town development boundary optimization device based on gradient boosting decision trees, which corresponds one-to-one with the town development boundary optimization method based on gradient boosting decision trees described in the above embodiments. This device includes: a data module, used to obtain the initial boundary range of the target area, and to obtain multi-source data including land parcel vector data, ecological red line data and remote sensing vacant data; a preprocessing module, used to generate grid cell feature vectors based on the multi-source data; a modeling module, used to construct a gradient boosting decision tree model, which uses the manually verified town boundaries from historical periods as training labels and the grid cell feature vectors of the corresponding periods as training samples, and optimizes the model parameters by iteratively fitting the residuals, so that the gradient boosting decision tree model learns the nonlinear mapping relationship between grid cell features and town boundaries until the output accuracy meets the preset accuracy threshold, thus obtaining a town boundary recognition model; and a boundary optimization module, used to input the current grid cell feature vector into the town boundary recognition model and output the town boundary located within the initial boundary range.

[0087] The town development boundary optimization device based on gradient boosting decision trees also includes a first-stage training submodule, which is used to preset the priority of inefficient land use reduction. The feature vectors of grid cells from historical periods are input into a preset weak learner. The weak learner is iteratively trained by combining the corresponding learning labels and initial reduction weights, and the output is a list of candidate reduction pools containing reduction priority scores. The lower the reduction priority score, the higher the priority for reducing the corresponding grid cell.

[0088] The town development boundary optimization device based on gradient boosting decision trees also includes a second-stage training submodule, which is used to input the feature vectors of grid cells from historical periods into a preset weak learner, combine the corresponding learning labels and initial expansion weights, iteratively train the weak learner, and output an expansion candidate pool list containing spatial expansion potential values. The larger the spatial expansion potential value, the greater the expansion value of the corresponding grid cell.

[0089] The town development boundary optimization device based on gradient boosting decision trees also includes a target optimization submodule, which is used to optimize the weak learner based on preset optimization constraints and with the help of a planning solver, and to output a candidate pool list for removing and expanding land parcels.

[0090] The town development boundary optimization device based on gradient boosting decision trees also includes a boundary correction submodule, which is used to identify the boundaries of the grid cell feature vectors in the current period by using the preset spatial morphology rules and the town boundary identification model, and to generate continuous town boundaries located within the initial boundary range.

[0091] The town development boundary optimization device based on gradient boosting decision trees also includes an interpretation submodule, which is used to predict the probability of construction land use based on the feature vector of the grid unit in the current period through the town boundary identification model, and to generate a probability distribution raster map. Based on the probability distribution raster map, the LIME algorithm is used to perform a local linear approximation interpretation of the output of the town boundary identification model, analyze the core indicators and contribution design logic, and visualize and generate an indicator contribution list.

[0092] For specific limitations on the urban development boundary optimization device based on gradient boosting decision tree, please refer to the limitations on the urban development boundary optimization method based on gradient boosting decision tree mentioned above, which will not be repeated here.

[0093] The modules in the aforementioned gradient boosting decision tree-based urban development boundary optimization device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the corresponding operations of each module.

[0094] In one embodiment, a computer device is provided, which may be a server. The computer device includes a processor, memory, a network interface, and a database connected via a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The network interface is used to communicate with external terminals via a network connection. When the computer program is executed by the processor, it implements any of the aforementioned gradient-boosting decision tree-based urban development boundary optimization methods.

[0095] In one embodiment, a computer-readable storage medium is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements any of the above-described methods for optimizing urban development boundaries based on gradient boosting decision trees.

[0096] In one embodiment, a computer program product is provided, comprising a computer program that, when executed by a processor, implements any of the above-described methods for optimizing urban development boundaries based on gradient boosting decision trees.

[0097] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. When executed, the computer program may include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided in this application may include non-volatile and / or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

[0098] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is used as an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the system can be divided into different functional units or modules to complete all or part of the functions described above.

Claims

1. A method for optimizing urban development boundaries based on gradient boosting decision trees, characterized in that, Includes the following steps, Obtain the initial boundary range of the target area; and obtain multi-source data including land parcel vector data, ecological red line data, and remote sensing vacant data; Based on the multi-source data, generate grid cell feature vectors; A gradient boosting decision tree model is constructed, using manually verified town boundaries from historical periods as training labels and grid cell feature vectors from corresponding periods as training samples. The model parameters are optimized by iteratively fitting residuals, enabling the gradient boosting decision tree model to learn the nonlinear mapping relationship between grid cell features and town boundaries until the output accuracy meets the preset accuracy threshold, thus obtaining a town boundary recognition model. The current grid cell feature vector is input into the town boundary identification model, and the town boundary located within the initial boundary range is output.

2. The urban development boundary optimization method based on gradient boosting decision tree according to claim 1, characterized in that, It also includes the following steps, Obtain rigid constraint index vectors, dynamic performance index vectors, and spatial correlation index vectors as learning labels; A preset priority for deleting inefficient land use is established. The feature vectors of grid cells from historical periods are input into a preset weak learner. The weak learner is iteratively trained by combining the corresponding learning labels and initial deleting weights, and the output is a list of candidate deleting units containing deleting priority scores. The lower the deleting priority score, the higher the priority is given to deleting the corresponding grid cell feature vector.

3. The urban development boundary optimization method based on gradient boosting decision tree according to claim 2, characterized in that, It also includes the following steps, The potential for space expansion is preset; The feature vectors of grid cells from historical periods are input into a preset weak learner. The weak learner is then iteratively trained by combining the corresponding learning labels and initial expansion weights. The output is a list of expansion candidate pools containing spatial expansion potential values. The larger the spatial expansion potential value, the greater the expansion value of the corresponding grid cell feature vector.

4. The urban development boundary optimization method based on gradient boosting decision tree according to claim 3, characterized in that, The training steps for the gradient boosting decision tree model also include, Set optimization constraints to ensure that the deviation between the reduced area and the expanded area of urban plots is less than the preset deviation threshold. Based on the aforementioned optimization constraints, the weak learner is optimized using a planning solver to output a candidate pool list for removing and expanding land parcels.

5. The urban development boundary optimization method based on gradient boosting decision tree according to any one of claims 1-4, characterized in that, It also includes the following steps, The spatial morphological patterns of the compliant boundaries of the region are learned through a pre-defined generative adversarial network. By utilizing the spatial morphological patterns and combining them with the town boundary identification model, the feature vectors of the grid units in the current period are used to identify the boundaries, thereby generating continuous town boundaries located within the initial boundary range.

6. The urban development boundary optimization method based on gradient boosting decision tree according to claim 5, characterized in that, It also includes the following steps, The town boundary identification model is used to predict the probability of construction land use based on the feature vectors of grid cells in the current period, and a probability distribution raster map is generated. Based on the probability distribution grid map, the LIME algorithm is used to make a local linear approximation interpretation of the output of the town boundary identification model, analyze the core indicators and contribution design logic, and visualize and generate an indicator contribution list.

7. The urban development boundary optimization method based on gradient boosting decision tree according to claim 2, characterized in that, It also includes the following steps, When the number of iterations of the weak learner reaches a preset threshold, the learning weight of the rigid constraint index vector accounts for the largest proportion among the learned labels.

8. The urban development boundary optimization method based on gradient boosting decision tree according to claim 3, characterized in that, When the number of iterations of the weak learner reaches a preset threshold, the learning weight of the dynamic performance index vector in the learned labels has the largest proportion.

9. A device for optimizing urban development boundaries based on gradient boosting decision trees, characterized in that, include, The data module is used to obtain the initial boundary range of the target area; and to obtain multi-source data including land parcel vector data, ecological red line data and remote sensing vacant data; The preprocessing module is used to generate grid cell feature vectors based on the multi-source data; The modeling module is used to construct a gradient boosting decision tree model. It uses the manually verified town boundaries from historical periods as training labels and the grid cell feature vectors of the corresponding periods as training samples. The model parameters are optimized by iteratively fitting the residuals, so that the gradient boosting decision tree model learns the nonlinear mapping relationship between grid cell features and town boundaries until the output accuracy meets the preset accuracy threshold, thus obtaining a town boundary recognition model. The boundary optimization module is used to input the current grid cell feature vector into the town boundary recognition model and output the town boundary located within the initial boundary range.

10. A computer device, characterized in that, it includes a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement the steps of the method according to any one of claims 1 to 8.