A method, medium and system for identifying construction waste according to satellite remote sensing images through deep learning

By constructing a multi-temporal spectral feature database and a multi-scale spatial relationship perception model, and combining it with an improved random forest ensemble learning algorithm, the problem of spectral feature confusion in construction waste identification was solved, and the identification accuracy was improved.

CN122244720A · Pending · Publication Date: 2026-06-19 · QINGDAO GUOCEN HAIYAO INFORMATION TECH CO LTD +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: QINGDAO GUOCEN HAIYAO INFORMATION TECH CO LTD
Filing Date: 2026-05-18
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

In existing technologies, the spectral characteristics of construction waste are often confused with other land features, leading to low identification accuracy.

Method used

By constructing a multi-temporal spectral feature database and establishing a temporal matrix to represent spectral change characteristics, a multi-scale spatial relationship perception model and an improved random forest ensemble learning algorithm are used to identify construction waste.

Benefits of technology

It effectively resolves the confusion between the spectral characteristics of construction waste and those of other ground features, thus improving identification accuracy.


Figure CN122244720A_ABST

Abstract

This invention provides a method, medium, and system for identifying construction waste from satellite remote sensing images based on deep learning, and belongs to the field of construction waste identification technology. The method acquires and preprocesses multi-temporal high-resolution satellite remote sensing images; constructs a spectral feature database and an endmember spectral library for construction waste; establishes a temporal matrix to characterize dynamic spectral changes; employs a multi-scale spatial relationship perception model to extract multi-level features, using graph convolutional networks to model spatial neighborhood relationships; designs an improved random forest ensemble learning algorithm to fuse multi-dimensional features for classification decisions; establishes a confidence assessment mechanism; and combines mixed pixel decomposition to obtain a spatial distribution probability map of construction waste. Finally, it generates vectorized identification results based on abundance thresholds and spatial clustering parameters, solving the technical problem of low identification accuracy caused by the confusion of the spectral features of construction waste with those of other ground features.

Description

Technical Field

[0001] This invention belongs to the field of construction waste identification technology, and specifically relates to a method, medium, and system for identifying construction waste based on deep learning from satellite remote sensing images.

Background Technology

[0002] Construction waste monitoring is a crucial component of urban environmental management. Traditional construction waste identification relies primarily on visual interpretation and manual inspection. With the development of remote sensing technology, automated identification methods based on satellite remote sensing imagery have gradually become mainstream. Widely used techniques include supervised classification based on spectral features, support vector machine classification, and other traditional machine learning methods. These methods are applied in scenarios such as monitoring urban waste dumps, identifying illegal dumping sites, and assessing environmental pollution. However, construction waste has a complex and diverse composition, including concrete blocks, bricks, metal components, and wood, and its spectral characteristics vary significantly across periods and environmental conditions. Furthermore, construction waste shares similar spectral characteristics with background features such as bare soil, roads, and buildings, making it difficult for traditional single-spectral-feature analysis methods to distinguish between them. In other words, existing technologies suffer from low identification accuracy caused by the confusion of the spectral characteristics of construction waste with those of other features.

Summary of the Invention

[0003] In view of this, the present invention provides a method, medium and system for identifying construction waste based on deep learning from satellite remote sensing images, which can solve the technical problem in the prior art where the spectral characteristics of construction waste are confused with other ground features, resulting in low identification accuracy.

[0004] The present invention is implemented as follows: The first aspect of the present invention provides a method for identifying construction waste based on deep learning from satellite remote sensing images, including acquiring multi-temporal satellite remote sensing image data of a target area, preprocessing the multi-temporal satellite remote sensing image data, constructing a spectral feature database of construction waste, identifying pure spectral endmembers using an endmember extraction algorithm, establishing a temporal matrix to characterize spectral change features, constructing a multi-scale spatial relationship perception model, designing an improved random forest ensemble learning algorithm as a classification decision module, establishing a confidence assessment mechanism for construction waste identification, performing mixed pixel decomposition processing, and generating identification results based on construction waste abundance thresholds and spatial clustering parameters.

[0005] The step of acquiring multi-temporal satellite remote sensing image data of the target area specifically involves selecting high-resolution images with a spatial resolution in the range of [0.5, 2] meters and setting the temporal interval to the range of [15, 30] days to ensure coverage of changes in the state of construction waste accumulation under different seasons and weather conditions.

[0006] The preprocessing steps for multi-temporal satellite remote sensing image data include atmospheric correction, geometric correction, and radiometric correction. A temporal registration algorithm is used to unify multi-temporal satellite remote sensing image data from different periods into the same coordinate system and establish a pixel-level temporal correspondence.

[0007] The steps involved in constructing a database of spectral characteristics of construction waste include: collecting spectral reflectance curves of known construction waste areas, extracting characteristic band combinations, calculating spectral similarity indices, and establishing a database of spectral difference patterns between construction waste and background features.

[0008] The step of identifying pure spectral endmembers using the endmember extraction algorithm specifically involves using the pure pixel index algorithm to screen pixels with high spectral purity and using spectral angle matching technology to determine the endmember spectra of the main land cover types, such as construction waste, vegetation, soil, and water bodies.

[0009] The step of establishing a temporal matrix to characterize spectral changes involves identifying dynamic spectral change patterns during the accumulation of construction waste through temporal difference analysis and trend detection, and extracting time-series feature vectors.

[0010] The steps involved in constructing a multi-scale spatial relationship perception model are as follows: using convolutional branches with multiple receptive fields to extract multi-level features from local details to global context; modeling spatial neighborhood relationships between pixels through graph convolutional networks; and using a time-folding network structure to encode time-series feature vectors.

[0011] The multi-scale spatial relationship perception model includes an end-to-end neural network architecture comprising a feature extraction layer, a multi-scale fusion layer, and a spatial relationship modeling layer. The feature extraction layer uses residual convolutional blocks to extract basic spectral and texture features. The multi-scale fusion layer uses three convolutional kernels of different sizes (3×3, 5×5, and 7×7) to process multiple branches in parallel. The spatial relationship modeling layer uses pixels as graph nodes to construct a spatial adjacency graph.

[0012] A second aspect of the present invention provides a computer-readable storage medium storing program instructions that, when executed in a computer, perform the aforementioned method for identifying construction waste based on deep learning from satellite remote sensing images.

[0013] A third aspect of the present invention provides a system for identifying construction waste based on deep learning from satellite remote sensing images, comprising the aforementioned computer-readable storage medium. The system can be any one of a computer, a server, or a microcontroller. The computer-readable storage medium is disposed within the system, and the system is provided with a microprocessor that executes the program instructions stored in the computer-readable storage medium.

[0014] This invention effectively solves the problem of spectral feature confusion between construction waste and other land cover features by constructing a multi-temporal spectral feature database, establishing a temporal matrix to represent dynamic spectral changes, extracting multi-level features using a multi-scale spatial relationship perception model, and combining an improved random forest ensemble learning algorithm for classification decisions. It identifies the dynamic spectral change patterns during the accumulation of construction waste through temporal difference analysis and trend detection, models spatial neighborhood relationships between pixels using graph convolutional networks, and obtains spatial distribution probability information of construction waste through mixed pixel decomposition, thereby overcoming the classification confusion caused by traditional methods that rely solely on single-temporal spectral features. In summary, this invention solves the technical problem mentioned in the background art of low identification accuracy caused by the confusion of the spectral features of construction waste with those of other land cover features.

Description of the Drawings

[0015] Figure 1 is a flowchart of the method of the present invention.

[0016] Figure 2 is a diagram illustrating the temporal variation characteristics of the spectral composition of construction waste in the embodiment.

[0017] Figure 3 is a schematic diagram of the mixed pixel decomposition process in the embodiment.

[0018] Figure 4 is a diagram showing the spatial distribution identification results of construction waste in the embodiment.

[0019] Figure 5 is a comparative analysis chart of the field verification results in the embodiment.

[0020] Figure 6 is a contour map of the spatial distribution of construction waste abundance in the embodiment.

[0021] Figure 7 is a performance analysis chart of the identification results under different abundance thresholds in the embodiment, comprising two sub-charts: (A) the number of identified regions under different abundance thresholds, and (B) the identification accuracy under different abundance thresholds.

Detailed Implementation

[0022] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings.

[0023] Figure 1 shows a flowchart of the method for identifying construction waste based on deep learning from satellite remote sensing images, provided by the first aspect of this invention. The method includes the following steps:

[0024] S01. Acquire multi-temporal satellite remote sensing image data of the target area, select high-resolution images with spatial resolution in the range of [0.5, 2] meters, and set the temporal interval to [15, 30] days to ensure coverage of changes in the state of construction waste accumulation under different seasons and weather conditions;

[0025] S02. Preprocess the acquired multi-temporal satellite remote sensing image data, including atmospheric correction, geometric correction and radiometric correction. Use the temporal registration algorithm to unify the multi-temporal satellite remote sensing image data from different periods into the same coordinate system and establish a pixel-level temporal correspondence.

[0026] S03. Construct a database of spectral characteristics of construction waste, collect spectral reflectance curves of known construction waste areas, extract characteristic band combinations, calculate spectral similarity index, and establish a database of spectral difference patterns between construction waste and background features.

[0027] S04. Use the endmember extraction algorithm to identify pure spectral endmembers from the multi-temporal satellite remote sensing image data, use the pure pixel index algorithm to screen pixels with high spectral purity, and use spectral angle matching technology to determine the endmember spectra of major land cover types such as construction waste, vegetation, soil, and water bodies.

[0028] S05. Establish a temporal matrix to characterize the spectral variation features of the same pixel location at different times. Through temporal difference analysis and trend detection, identify the dynamic spectral variation patterns during the accumulation of construction waste and extract time series feature vectors.

[0029] S06. Construct a multi-scale spatial relationship perception model, use multiple convolutional branches with different receptive fields to extract multi-level features from local details to global context, model the spatial neighborhood relationship between pixels through graph convolutional networks, and use a time-folding network structure to achieve efficient encoding of the time series feature vector.

[0030] S07. Design an improved random forest ensemble learning algorithm as a classification decision module. Employ a weighted voting strategy to fuse the output of the multi-scale spatial relationship perception model and traditional spectral texture features. Adaptively adjust the number of decision trees and splitting criteria according to sample complexity.

[0031] S08. Establish a confidence assessment mechanism for construction waste identification, calculate the classification probability and uncertainty index of each pixel, initiate the manual verification process when the confidence level is below 0.75, and increase the texture feature weight when the spectral similarity index is greater than 0.85.

[0032] S09. Perform mixed pixel decomposition processing, use the linear spectral mixing model to calculate the abundance information of construction waste in mixed pixels, combine spatial neighborhood consistency constraints to optimize the decomposition results, and output the spatial distribution probability map of construction waste.

[0033] S10. Generate the final identification result based on the abundance threshold and spatial aggregation parameter of construction waste. When the abundance is greater than 0.6 and the area of the connected region exceeds 100 square meters, the region is determined to be a construction waste area, and a vectorized boundary and an attribute statistics report are output.
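The decision rule in step S10 can be illustrated with a small sketch. It assumes the abundance map is a plain 2D list, that pixels are square with a known side length, and that "connected" means 4-connected; the function name `waste_regions` and its parameters are hypothetical, and vectorizing the boundaries is left out.

```python
from collections import deque

def waste_regions(abundance, pixel_size_m, min_abundance=0.6, min_area_m2=100.0):
    """Return connected regions (lists of (row, col)) that pass both S10 thresholds."""
    rows, cols = len(abundance), len(abundance[0])
    seen = [[False] * cols for _ in range(rows)]
    pixel_area = pixel_size_m ** 2
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or abundance[r][c] <= min_abundance:
                continue
            # Breadth-first flood fill over 4-connected neighbours (an assumption).
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and abundance[ny][nx] > min_abundance):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep the region only if its ground area exceeds the area threshold.
            if len(region) * pixel_area > min_area_m2:
                regions.append(region)
    return regions
```

With 2 m pixels, each pixel covers 4 square meters, so a region needs more than 25 connected high-abundance pixels to be kept.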

[0034] A temporal matrix is a two-dimensional matrix formed by arranging spectral data acquired at different times from the same geographical location in a time series. Rows represent the time dimension, and columns represent the spectral band dimension. It is used to capture the dynamic spectral characteristics of ground features changing over time. By calculating the spectral differences and rates of change between different time periods, the temporal matrix can effectively identify spectral change patterns during processes such as the accumulation, cleaning, and re-accumulation of construction waste.
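As a minimal sketch of this definition, the temporal matrix of one pixel can be held as a list of rows (one per acquisition) with one value per band. The function names and the use of acquisition days as the time axis are assumptions made for illustration.

```python
def temporal_differences(matrix):
    """Band-wise difference between each pair of adjacent acquisitions."""
    return [[later - earlier for earlier, later in zip(prev, curr)]
            for prev, curr in zip(matrix, matrix[1:])]

def rates_of_change(matrix, days):
    """Band-wise difference divided by the days elapsed between acquisitions."""
    diffs = temporal_differences(matrix)
    return [[d / (t1 - t0) for d in row]
            for row, t0, t1 in zip(diffs, days, days[1:])]

# Rows are acquisition times, columns are spectral bands (illustrative values).
pixel_matrix = [
    [0.25, 0.50],
    [0.50, 0.50],
    [0.75, 0.25],
]
print(temporal_differences(pixel_matrix))
print(rates_of_change(pixel_matrix, days=[0, 20, 40]))
```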

[0035] The spectral similarity index is a numerical indicator that quantifies the similarity between the spectral curve of a pixel to be identified and the spectral curve of standard construction waste. Its value lies in the range [0, 1], with values closer to 1 indicating greater spectral similarity. The spectral similarity index is obtained by calculating a weighted combination of spectral angle, spectral correlation coefficient, and Euclidean distance.
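One way such an index could be computed is sketched below. The text does not state how the three measures are scaled to [0, 1] or what the weights are, so the scalings (angle divided by π/2, correlation shifted to [0, 1], distance mapped through 1/(1+d)) and the default weights are assumptions chosen only for illustration.

```python
import math

def spectral_angle(x, ref):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(a * b for a, b in zip(x, ref))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in ref))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def correlation(x, ref):
    """Pearson correlation coefficient (undefined for constant spectra)."""
    mx, mr = sum(x) / len(x), sum(ref) / len(ref)
    cov = sum((a - mx) * (b - mr) for a, b in zip(x, ref))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sr = math.sqrt(sum((b - mr) ** 2 for b in ref))
    return cov / (sx * sr)

def similarity_index(x, ref, weights=(0.4, 0.3, 0.3)):
    """Weighted combination of three similarity scores, each scaled to [0, 1]."""
    # Assumed scaling: non-negative reflectance keeps the angle within [0, pi/2].
    angle_score = max(0.0, 1.0 - spectral_angle(x, ref) / (math.pi / 2))
    corr_score = (correlation(x, ref) + 1.0) / 2.0
    dist_score = 1.0 / (1.0 + math.dist(x, ref))
    w_angle, w_corr, w_dist = weights  # hypothetical weights summing to 1
    return w_angle * angle_score + w_corr * corr_score + w_dist * dist_score
```

Because the weights sum to 1 and every score is in [0, 1], the result stays in [0, 1], and identical spectra score 1.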

[0036] The Pure Pixel Index is an indicator for evaluating the spectral purity of pixels. It is used to screen pixels with relatively pure spectral features from mixed pixels as candidates for endmember extraction. The Pure Pixel Index is evaluated by calculating the convexity and smoothness of the pixel's spectral curve. The higher the value, the closer the pixel is to pure spectral features. Typically, pixels with a Pure Pixel Index greater than a set threshold are selected for endmember analysis.

[0037] The specific structure of the multi-scale spatial relationship perception model is an end-to-end neural network architecture comprising a feature extraction layer, a multi-scale fusion layer, and a spatial relationship modeling layer. The feature extraction layer uses residual convolutional blocks to extract basic spectral and texture features. The multi-scale fusion layer uses three convolutional kernels of different sizes (3×3, 5×5, and 7×7) to process multiple branches in parallel, with each branch outputting feature maps with different receptive fields. The features at different scales are adaptively weighted and fused using a channel attention mechanism. The spatial relationship modeling layer constructs a spatial adjacency graph using pixels as graph nodes and learns spatial dependencies between pixels using a graph convolutional network. The temporal folding network structure folds and compresses the temporal matrix in the time dimension and captures the changes in the time-series feature vectors through gated recurrent units. The model output layer uses fully connected layers and a Softmax activation function to generate probability distributions for each category.

[0038] The steps for establishing the training dataset of the multi-scale spatial relationship perception model specifically include: collecting high-resolution remote sensing image samples containing construction waste from different regions; establishing real label data through visual interpretation and field surveys; dividing the high-resolution remote sensing image samples into training patches of 256×256 pixels; labeling each training patch with category information such as construction waste, vegetation, buildings, roads, and water bodies; expanding the training patches using data augmentation techniques including rotation, flipping, and brightness adjustment; and dividing the dataset according to a 7:2:1 ratio of training set, validation set, and test set to ensure a basic balance in the number of samples of different categories.
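The 7:2:1 partition described above can be sketched as a shuffled split. The function name `split_dataset` and the fixed seed are assumptions; this sketch does not balance classes, which would additionally require splitting each category separately.

```python
import random

def split_dataset(patches, ratios=(7, 2, 1), seed=0):
    """Shuffle the patches and split them into train, validation and test lists."""
    shuffled = list(patches)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```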

[0039] The training steps of the multi-scale spatial relationship perception model specifically include updating parameters using a stochastic gradient descent optimizer, setting the initial learning rate to 0.001, the batch size to 32, and the training epochs to 200. A cross-entropy loss function combined with Focal Loss is used to address class imbalance. L2 regularization is introduced to prevent overfitting. An early stopping mechanism is implemented, stopping training when the validation set accuracy shows no improvement for 10 consecutive epochs. A learning rate decay strategy is used, multiplying the learning rate by 0.5 every 50 epochs. Mixed precision calculation is employed during training to improve efficiency. Finally, the model parameters with the best performance on the validation set are selected as the final model.
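The learning rate decay and early stopping rules in this paragraph can be written out as follows. This is a sketch of the schedule only, not of the training loop; the function names are hypothetical, and the numeric defaults are the values stated in the paragraph.

```python
def learning_rate(epoch, initial=0.001, factor=0.5, step=50):
    """Learning rate after multiplying by `factor` every `step` epochs."""
    return initial * factor ** (epoch // step)

def stopping_epoch(val_accuracies, patience=10):
    """Return (epoch at which training stops, epoch with the best validation accuracy)."""
    best_acc, best_epoch = float("-inf"), -1
    for epoch, acc in enumerate(val_accuracies):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            # `patience` consecutive epochs without improvement: stop here.
            return epoch, best_epoch
    return len(val_accuracies) - 1, best_epoch
```

The best epoch returned by `stopping_epoch` corresponds to the checkpoint that would be kept as the final model.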

[0040] The improved random forest ensemble learning algorithm constructs a forest structure by building multiple decision trees. Each decision tree randomly selects a subset of features and samples during training. Gini impurity is used as the splitting criterion, and the splitting threshold selection strategy is optimized for the construction waste identification task. The improved algorithm introduces a sample weighting mechanism, assigning higher weights to difficult samples. The quality of individual decision trees is evaluated using out-of-bag error, and poorly performing trees are removed. Finally, a weighted voting method is used to fuse the prediction results of all decision trees during prediction. The weights are determined based on the accuracy and diversity metrics of individual trees.
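For reference, the standard Gini impurity used as the splitting criterion can be computed as below. This is the unweighted textbook form; the sample weighting and the optimized threshold selection of the improved algorithm are not reproduced, and the function names are illustrative.

```python
from collections import Counter

def gini_impurity(labels):
    """1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted Gini impurity of the two child nodes of a candidate split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
```

A candidate split is better the lower its `split_impurity`; a split that separates the classes perfectly scores 0.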

[0041] Spectral angle matching is a method for evaluating spectral similarity by calculating the cosine of the angle between the measured spectral vector and the reference spectral vector. A smaller spectral angle indicates greater similarity between the two spectra. This technique is sensitive to changes in the shape of the spectral curves but relatively insensitive to changes in brightness, making it suitable for comparative spectral analysis under different imaging conditions.
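In symbols, for a measured spectrum x and a reference spectrum r with n bands, the spectral angle is commonly written as shown below (the notation is introduced here for illustration). The second identity shows why the measure is insensitive to brightness: scaling the spectrum by a positive constant cancels in the ratio.

```latex
\theta(x, r) = \arccos\!\left(
  \frac{\sum_{i=1}^{n} x_i r_i}
       {\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} r_i^{2}}}
\right),
\qquad
\theta(c\,x, r) = \theta(x, r) \quad \text{for any } c > 0
```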

[0042] Mixed pixel decomposition is the process of decomposing a mixed pixel containing information on multiple land features into its constituent land features and their relative proportions. The linear spectral mixing model assumes that the reflectance spectrum of a pixel is a linear combination of the spectra of its constituent land features. By solving a system of linear equations, the abundance information of each endmember in the mixed pixel is obtained. The sum of the abundance information is equal to 1, and each abundance information is non-negative.
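For the special case of exactly two endmembers, the constrained problem has a closed-form solution, which makes a compact illustration. The sketch below assumes a noise-free linear mixture and is not the general multi-endmember solver; the function name is hypothetical.

```python
def unmix_two_endmembers(pixel, endmember_a, endmember_b):
    """Return (abundance_a, abundance_b), both non-negative and summing to 1."""
    direction = [a - b for a, b in zip(endmember_a, endmember_b)]
    offset = [p - b for p, b in zip(pixel, endmember_b)]
    denom = sum(d * d for d in direction)
    if denom == 0:
        raise ValueError("endmember spectra must differ")
    # Least-squares fraction of endmember A, then clipped to [0, 1] so that
    # both abundances are non-negative and sum to 1.
    fraction = sum(o * d for o, d in zip(offset, direction)) / denom
    fraction = min(1.0, max(0.0, fraction))
    return fraction, 1.0 - fraction

waste = [0.30, 0.32, 0.35, 0.38]   # illustrative spectra, not measured values
soil = [0.15, 0.20, 0.28, 0.35]
mixed = [0.7 * w + 0.3 * s for w, s in zip(waste, soil)]
print(unmix_two_endmembers(mixed, waste, soil))
```

With more than two endmembers the same constraints apply, but a constrained least-squares solver is needed instead of this one-dimensional projection.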

[0043] Endmember extraction algorithms are algorithms that identify and extract spectral features representing pure ground features from hyperspectral or multispectral remote sensing data. These algorithms analyze the geometric distribution characteristics of pixel spectra to find vertex or boundary pixels in the spectral feature space; these vertex or boundary pixels typically represent pure ground feature types within the scene.

[0044] The time-folding network architecture is a neural network architecture that compresses and encodes time-series data along the time dimension. By folding features from adjacent time steps, the time-folding network reduces computational complexity in the time dimension while preserving important temporal variation information, thus achieving efficient processing of long-term data series.
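The text does not specify the folding operation itself. One plausible reading, shown here purely as an assumption, is that each group of adjacent time steps is concatenated into a single longer feature vector, which halves the sequence length for a fold of 2. The function name and the choice to drop an incomplete trailing group are hypothetical.

```python
def fold_time_steps(temporal_matrix, fold=2):
    """Concatenate each group of `fold` adjacent rows into one longer row."""
    # Assumption: a trailing group with fewer than `fold` rows is dropped.
    usable = len(temporal_matrix) - len(temporal_matrix) % fold
    return [
        [value for row in temporal_matrix[start:start + fold] for value in row]
        for start in range(0, usable, fold)
    ]

print(fold_time_steps([[1, 2], [3, 4], [5, 6], [7, 8]]))
```

The folded rows would then be the inputs to a recurrent encoder, which runs over half as many steps as the original series.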

[0045] Spatial neighborhood consistency constraint refers to the assumption that adjacent pixels should have similar attributes when performing pixel classification or decomposition. This constraint incorporates a spatial smoothing term into the objective function to maintain a certain consistency in the classification results or abundance values of adjacent pixels, reducing spatial fragmentation of the classification results.
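One common way to write such an objective for abundance estimation is given below. The quadratic form of the smoothing term, the weight λ, and the symbols (x_p for the spectrum of pixel p, E for the endmember matrix, a_p for the abundance vector, N for the set of adjacent pixel pairs) are assumptions for illustration, since the text does not give the formula.

```latex
\min_{\{a_p\}} \; \sum_{p} \lVert x_p - E\,a_p \rVert_2^{2}
  \;+\; \lambda \sum_{(p,q) \in \mathcal{N}} \lVert a_p - a_q \rVert_2^{2}
\quad \text{s.t.} \quad a_p \ge 0, \;\; \mathbf{1}^{\top} a_p = 1 \;\; \text{for all } p
```

The first term fits each pixel to the linear mixing model, and the second penalizes abundance differences between neighbors; a larger λ gives smoother, less fragmented maps.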

[0046] The specific implementation methods of the above steps are described in detail below.

[0047] The specific implementation of step S01 is as follows: First, determine the coverage area of the satellite remote sensing images to be acquired based on the geographical range of the target area. Select a commercial satellite data source with high spatial resolution, and keep the spatial resolution between 0.5 meters and 2 meters to ensure that the boundaries and texture features of the construction waste pile can be clearly identified. Set the temporal interval to 15 to 30 days to capture the dynamic changes in the construction waste accumulation process. When acquiring data, the lighting and weather conditions of different seasons must be considered. Acquiring multi-temporal data makes it possible to distinguish construction waste from other temporary accumulations. At the same time, use the satellite data management platform to perform quality checks on the acquired images, and remove image data with cloud cover exceeding 10% or with obvious stripe noise to ensure that the quality of the images used for subsequent processing meets the analysis requirements.
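The selection criteria above can be expressed as a simple filter. The scene records are plain dictionaries whose field names (`resolution_m`, `cloud_cover_pct`, `stripe_noise`) are hypothetical; a real data platform would expose its own metadata schema.

```python
def usable(scene):
    """True if a scene meets the resolution, cloud cover and noise criteria."""
    return (0.5 <= scene["resolution_m"] <= 2.0
            and scene["cloud_cover_pct"] <= 10.0
            and not scene["stripe_noise"])

def filter_scenes(scenes):
    """Keep only the scenes that pass the quality check."""
    return [scene for scene in scenes if usable(scene)]

def intervals_ok(acquisition_days, min_gap=15, max_gap=30):
    """True if every gap between consecutive acquisitions is 15 to 30 days."""
    return all(min_gap <= later - earlier <= max_gap
               for earlier, later in zip(acquisition_days, acquisition_days[1:]))
```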

[0048] The specific implementation of step S02 is as follows: First, atmospheric correction is performed on the acquired multi-temporal satellite remote sensing images to eliminate the influence of atmospheric scattering and absorption on the surface reflectivity. The attenuation degree of the atmosphere on electromagnetic waves of different bands is calculated using a radiative transfer model, and the apparent reflectivity is converted into surface reflectivity. This process can eliminate the radiometric inconsistency caused by differences in atmospheric conditions in images of different temporal phases. Then, geometric correction is performed to correct image distortion caused by changes in satellite attitude and terrain undulations. Geometric fine correction is performed using ground control points and digital elevation models to ensure that the planar position accuracy of the images reaches the sub-pixel level. Radiometric correction is performed to eliminate the differences in radiometric response between different sensors or between different periods of the same sensor through relative radiometric normalization. Temporal registration adopts an automatic registration algorithm based on feature point matching, extracting stable ground features such as building corner points as registration reference points. Through affine transformation or polynomial transformation, images of different periods are unified to the same coordinate system, establishing a precise correspondence at the pixel level. This step provides a spatial reference for subsequent temporal change analysis.

[0049] The specific implementation of step S03 is as follows: Samples of known construction waste areas are collected through a combination of field surveys and visual interpretation of high-resolution images. A portable ground object spectrometer is used to measure the spectral reflectance curves of different types of construction waste on-site, including the spectral characteristics of major components such as concrete blocks, bricks, and steel bars. The measurement data is matched and calibrated with the corresponding satellite image spectra to establish the conversion relationship between ground and satellite spectra. The spectral response characteristics of construction waste in the visible to near-infrared band range are analyzed, and the most sensitive band combinations for construction waste identification are extracted. By calculating characteristic parameters such as ratios, differences, and normalization indices between different bands, a feature space that enhances the spectral difference between construction waste and background features is constructed. The calculation of the spectral similarity index comprehensively considers three dimensions: spectral angle, correlation coefficient, and Euclidean distance. The spectral angle reflects the shape similarity of the spectral curve, the correlation coefficient characterizes the consistency of spectral change trends, and the Euclidean distance measures the absolute difference in spectral values. A comprehensive spectral similarity evaluation is obtained by weighted fusion of these three indicators. The weighting coefficients are optimized according to the spectral characteristics of different features and the requirements of the identification task. The established spectral difference pattern library can provide a basis for subsequent automatic identification.

[0050] The specific implementation of step S04 is as follows: An endmember extraction algorithm is used to identify spectral endmembers representing pure land features from the multi-temporal satellite remote sensing images. The algorithm is based on linear mixed pixel decomposition theory, which assumes that each mixed pixel spectrum in the image is a linear combination of the spectra of a few pure land features. First, principal component analysis is used to reduce the dimensionality of the high-dimensional spectral data. In the reduced feature space, convex geometry analysis is used to find extreme points located at the boundary of the data distribution; these extreme points correspond to the pure land feature spectra in the scene. The pure pixel index algorithm screens candidate endmembers by evaluating the convexity of each pixel's spectral curve, which reflects how close the pixel is to the optical characteristics of a pure endmember. A pixel is usually considered high-purity when its pure pixel index is 0.8 or higher. Spectral angle matching calculates the angle between the spectral vector of the pixel to be identified and that of an extracted endmember; when the angle is less than 0.1 radians, the two are considered to have high spectral similarity. In this way, standard endmember spectra of major land feature types such as construction waste, vegetation, soil, and water are determined. The extracted endmember spectral library provides the reference spectra for mixed pixel decomposition.
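The 0.1 radian matching rule can be sketched as a lookup against an endmember library. The library contents below are illustrative values rather than measured spectra, and the function names are hypothetical.

```python
import math

def spectral_angle(x, ref):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(a * b for a, b in zip(x, ref))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in ref))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def match_endmember(pixel, library, max_angle=0.1):
    """Return the closest endmember name, or None if no angle is below max_angle."""
    best_name, best_angle = None, max_angle
    for name, ref in library.items():
        angle = spectral_angle(pixel, ref)
        if angle < best_angle:
            best_name, best_angle = name, angle
    return best_name

library = {  # illustrative four-band spectra
    "construction_waste": [0.30, 0.32, 0.35, 0.38],
    "vegetation": [0.05, 0.08, 0.04, 0.50],
    "water": [0.06, 0.05, 0.03, 0.01],
}
brighter_waste = [1.2 * v for v in library["construction_waste"]]
print(match_endmember(brighter_waste, library))
```

The brightened spectrum still matches construction waste because the angle ignores overall scaling, while a spectrum unlike every library entry returns `None`.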

[0051] The specific implementation of step S05 is as follows: A temporal matrix is established to characterize the spectral evolution of the same spatial location at different times. The row dimension of this matrix represents the time series, and the column dimension represents the spectral band. For each pixel location, its spectral values in all temporal images are extracted to form the temporal matrix of that pixel. The dynamic change characteristics of ground objects are quantified by calculating the spectral difference and rate of change between adjacent temporal phases. The accumulation of construction waste is usually characterized by an increase in spectral reflectance in the visible band and relative stability in the near-infrared band, while the seasonal change of vegetation is characterized by significant fluctuations in the near-infrared band. Temporal difference analysis identifies areas with abnormal changes by comparing temporal phases. Trend detection uses linear regression or other trend analysis methods to evaluate how the spectrum evolves over time. The extracted time-series feature vector includes statistical parameters such as spectral mean, standard deviation, change amplitude, and change frequency. These features can effectively characterize the spectral change patterns of dynamic processes such as construction waste accumulation, cleaning, and re-accumulation, providing rich temporal discrimination information for subsequent deep learning models.
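A few of these per-band statistics can be computed as in the sketch below, which uses an ordinary least-squares slope as the trend measure. The feature names are illustrative, the slope is added as one possible trend statistic, and change frequency is omitted because the text does not define how it is counted.

```python
import statistics

def trend_slope(days, values):
    """Ordinary least-squares slope of reflectance against acquisition day."""
    mean_t, mean_v = statistics.fmean(days), statistics.fmean(values)
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(days, values))
    den = sum((t - mean_t) ** 2 for t in days)
    return num / den

def time_series_features(days, values):
    """Summary statistics for one band of one pixel across all acquisitions."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "amplitude": max(values) - min(values),
        "slope": trend_slope(days, values),
    }

# Visible-band reflectance rising steadily, as described for waste accumulation.
print(time_series_features([0, 20, 40, 60], [0.10, 0.20, 0.30, 0.40]))
```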

[0052] The specific implementation of step S06 is as follows: a multi-scale spatial relationship perception model is constructed to make full use of the spatial, spectral, and temporal information in remote sensing images. The model adopts an end-to-end deep neural network architecture. The feature extraction layer uses residual convolutional blocks for basic feature learning; residual connections alleviate the vanishing gradient problem in deep networks and promote feature reuse. The extracted basic features include the spectral response, texture pattern, and spatial distribution characteristics of pixels. The multi-scale fusion layer has three parallel convolutional branches using 3x3, 5x5, and 7x7 convolutional kernels, respectively. Small kernels focus on local detail such as the edges and textures of construction debris, while large kernels capture global context such as the overall shape of the pile and its surroundings. The outputs of the branches are adaptively weighted and fused through a channel attention mechanism, whose weights are adjusted dynamically according to the information content of the feature maps. The spatial relationship modeling layer treats each pixel in the image as a node in a graph structure and establishes edges between adjacent pixels to form a spatial adjacency graph. The graph convolutional network learns the spatial dependencies between nodes through a message passing mechanism, so that the feature representation of each pixel incorporates information from its neighboring pixels. This graph structure modeling captures the spatial continuity and shape regularity of construction waste piles. The temporal folding network structure processes the temporal dimension of the temporal matrix: by folding the features of adjacent time steps, it compresses the amount of data along the temporal dimension. A gated recurrent unit then encodes the temporal change pattern; its gating mechanism selectively retains long-term dependencies and forgets irrelevant information. The output layer maps the learned high-level features to the category space through a fully connected layer, and the Softmax activation function converts the output into a probability distribution over the categories. The entire model is optimized end to end through backpropagation, learning the feature representation and decision rules that map the original image to the classification result.
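The message passing step of the spatial relationship modeling layer can be illustrated with a small sketch. It assumes 4-neighbour adjacency with self-loops, uniform edge weights, and row normalisation of the adjacency matrix; these choices and the function names `grid_adjacency` and `graph_conv` are illustrative assumptions, not details given in the text.

```python
import numpy as np

def grid_adjacency(height, width):
    """Adjacency matrix of a pixel grid with 4-neighbour edges plus self-loops."""
    n = height * width
    adj = np.eye(n)
    for r in range(height):
        for c in range(width):
            i = r * width + c
            if c + 1 < width:            # right neighbour
                adj[i, i + 1] = adj[i + 1, i] = 1.0
            if r + 1 < height:           # neighbour below
                adj[i, i + width] = adj[i + width, i] = 1.0
    return adj

def graph_conv(features, adj, weight):
    """One message-passing step: average over neighbours, transform, ReLU."""
    norm_adj = adj / adj.sum(axis=1, keepdims=True)   # row-normalise
    return np.maximum(norm_adj @ features @ weight, 0.0)
```

On a 2x2 grid where only the top-left pixel carries a non-zero feature, one step spreads that feature to the two adjacent pixels and leaves the diagonal pixel at zero, which shows how each pixel's representation absorbs information from its neighbours.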

[0053] The specific implementation of step S07 is as follows: an improved random forest ensemble learning algorithm is designed as the final classification decision module. The algorithm constructs multiple decision trees to form a forest. During training, each decision tree draws a random subset of the training samples with replacement and randomly selects a subset of features for node splitting. This randomness increases the diversity among decision trees and thereby improves the ensemble effect. The splitting criterion uses the Gini impurity index: the optimal splitting feature and threshold are selected by evaluating the degree of class confusion in the child nodes after splitting. To suit the construction waste identification task, the splitting threshold selection strategy is optimized by partitioning boundary samples more finely, and a sample weighting mechanism assigns higher weights to samples that are difficult to classify, so that training pays greater attention to samples that are hard to distinguish. Out-of-bag (OOB) error evaluation tests each tree on the samples that were not used to train it, and weak classifiers with an accuracy below 70% are removed based on the evaluation results. A weighted voting strategy fuses the deep learning features output by the multi-scale spatial relationship perception model with traditional spectral and texture features. The weights are determined from performance on the validation set; the weight of the deep learning features is usually set to 0.6 to 0.8, and that of the traditional features to 0.2 to 0.4. The number of decision trees is adjusted adaptively according to sample complexity: 50 to 100 trees for simple samples and 150 to 200 trees for complex samples. The minimum number of samples required to split a node is adjusted dynamically according to the class distribution, with a smaller minimum set for rare classes to avoid over-pruning.
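The pruning and fusion rules can be sketched as below. The text does not specify how the two feature groups enter the vote, so this example assumes each group yields a class-probability vector and that the two vectors are combined linearly; the function names and the default weight of 0.7 (within the stated 0.6 to 0.8 range) are illustrative assumptions.

```python
import numpy as np

def prune_trees(oob_accuracy, min_accuracy=0.70):
    """Indices of trees whose out-of-bag accuracy reaches the threshold."""
    return [i for i, acc in enumerate(oob_accuracy) if acc >= min_accuracy]

def forest_probability(tree_votes, kept, n_classes):
    """Class frequencies among the votes of the retained trees."""
    votes = np.asarray(tree_votes)[kept]
    return np.bincount(votes, minlength=n_classes) / len(kept)

def fuse(p_deep, p_trad, w_deep=0.7):
    """Assumed fusion rule: linear mix of two class-probability vectors."""
    return w_deep * np.asarray(p_deep) + (1.0 - w_deep) * np.asarray(p_trad)
```

With OOB accuracies of 0.82, 0.65, 0.91, and 0.70, the second tree is dropped and the remaining three vote; the fused vector still sums to 1 because both inputs do.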

[0054] The specific implementation of step S08 is as follows: a confidence assessment mechanism for construction waste identification is established to quantify the reliability of the classification results. For each pixel, the posterior probability of belonging to each category is calculated. The classification confidence is defined as the difference between the highest and the second-highest probability; the larger the difference, the more certain the classification. When the confidence is below 0.75, the classification of that pixel is considered significantly uncertain and a manual verification process is initiated, in which interpreters examine the pixel using high-resolution imagery and other auxiliary data. The uncertainty index is calculated from the degree of voting divergence among the decision trees in the random forest: if most trees agree, the uncertainty is low; if the votes are dispersed, the uncertainty is high. The feature weights are adjusted by thresholding the spectral similarity index, which measures how similar the pixel to be identified is to the standard construction waste spectrum. A spectral similarity index greater than 0.85 indicates that the spectral features of the pixel are highly consistent with those of construction waste. In that case the weight of the texture features is increased from the initial 0.3 to 0.5, and the weight of the spectral features is reduced accordingly, so that spatial structure information is used further to improve recognition accuracy. This adaptive weight adjustment dynamically optimizes the classification decision according to the feature reliability of each pixel. The confidence assessment results are also used to generate a classification quality map, which displays the reliability of the recognition results across regions and serves as a reference for applying the results and for subsequent verification.
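The confidence rules in this step can be expressed compactly. In the sketch below, the margin confidence and the 0.75 and 0.85 thresholds follow the text; measuring voting divergence by normalised entropy, and assuming the spectral and texture weights sum to 1, are choices made only for the example.

```python
import numpy as np

def margin_confidence(probs):
    """Difference between the highest and second-highest class probability."""
    top2 = np.sort(np.asarray(probs, dtype=float))[-2:]
    return float(top2[1] - top2[0])

def needs_manual_check(probs, threshold=0.75):
    """True when the margin confidence falls below the verification threshold."""
    return margin_confidence(probs) < threshold

def vote_uncertainty(tree_votes, n_classes):
    """Assumed divergence measure: normalised entropy of the tree votes."""
    freq = np.bincount(tree_votes, minlength=n_classes) / len(tree_votes)
    nonzero = freq[freq > 0]
    return float(-(nonzero * np.log(nonzero)).sum() / np.log(n_classes))

def feature_weights(ssi, ssi_threshold=0.85):
    """(spectral, texture) weights; texture rises from 0.3 to 0.5 above the threshold."""
    texture = 0.5 if ssi > ssi_threshold else 0.3
    return 1.0 - texture, texture
```

A pixel with probabilities 0.5, 0.4, and 0.1 has a margin of 0.1 and is sent for manual verification, while unanimous tree votes give an uncertainty of 0 and an even three-way split gives 1.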

[0055] The specific implementation of step S09 is as follows: mixed pixel decomposition is performed to address the mixed pixel problem caused by limited spatial resolution. A mixed pixel is a pixel containing multiple land cover types. The linear spectral mixing model assumes that the spectrum of a mixed pixel is a linear weighted combination of the spectra of its constituent endmembers, with each weight representing the proportion of that endmember in the pixel. The abundance of each endmember is calculated by solving a system of linear equations, subject to the constraints that the abundances of all endmembers sum to 1 and that each abundance is non-negative. The constrained least squares method or a non-negative matrix factorization algorithm is used to solve the problem. During decomposition, a spatial neighborhood consistency constraint is introduced to improve the stability and plausibility of the result. This constraint rests on the assumption that land cover is spatially continuous, so the abundance values of adjacent pixels should vary smoothly. It is implemented by adding a spatial smoothing term to the objective function; the term measures the abundance difference between the current pixel and its neighboring pixels, and a smaller difference means better spatial consistency. The constraint strength is controlled by a regularization parameter, usually set between 0.1 and 0.5: too large a value leads to over-smoothing and loss of detail, while too small a value does not suppress noise effectively. The optimized decomposition outputs the abundance of construction waste in each pixel, a value from 0 to 1 representing the area proportion of construction waste in that pixel. The spatial distribution probability map uses this abundance information to visualize the distribution pattern and density of construction waste in the target area.
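The constrained least squares problem for a single pixel can be sketched with projected gradient descent, where each iterate is projected back onto the set of non-negative abundances that sum to 1. This solver is one possible choice and is not named in the text; the spatial smoothing term is omitted for brevity, and the function names are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def unmix(pixel, endmembers, n_iter=500):
    """Abundances minimising ||E a - pixel||^2 under both constraints.

    endmembers: shape (bands, n_endmembers), one endmember spectrum per column.
    """
    E = np.asarray(endmembers, dtype=float)
    y = np.asarray(pixel, dtype=float)
    step = 1.0 / np.linalg.norm(E, 2) ** 2        # 1 / Lipschitz constant of the gradient
    a = np.full(E.shape[1], 1.0 / E.shape[1])     # start from equal abundances
    for _ in range(n_iter):
        a = project_to_simplex(a - step * E.T @ (E @ a - y))
    return a
```

For a noise-free pixel mixed from three endmembers in proportions 0.6, 0.3, and 0.1, the routine recovers those proportions; adding the neighborhood smoothing term would couple the per-pixel problems and require solving them jointly.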

[0056] The specific implementation of step S10 is as follows: the final identification result is generated based on the abundance threshold and the spatial aggregation parameter of construction waste. The abundance threshold is set to 0.6, meaning that a pixel is identified as construction waste only when the abundance of construction waste in it exceeds 60%. This threshold takes into account both the spectral characteristics of mixed pixels and the accuracy required in practical applications: too low a threshold increases false positives, while too high a threshold misses some construction waste areas. The spatial aggregation parameter is implemented through connected component analysis. Eight-neighbor connectivity is used to determine whether adjacent construction waste pixels belong to the same pile, and a connected region is identified as a real construction waste dumping point only when its area exceeds 100 square meters. The area threshold filters out scattered misclassified pixels and small temporary accumulations, reducing noise. Vectorization converts the raster recognition results into vector polygons. Boundary extraction uses a contour tracing algorithm to draw closed curves along the outer pixels of each construction waste area, and the Douglas-Peucker algorithm simplifies the boundary, reducing the number of vertices while preserving its shape. Attribute statistics include geometric parameters such as the area, perimeter, shape index, and center coordinates of each construction waste pile, as well as physical parameters such as average spectral and texture features. The resulting statistical report is output as tables and charts, providing data support for supervision and cleanup by urban management departments.
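The thresholding and aggregation rules can be sketched with a plain flood fill. The 0.6 abundance threshold, eight-neighbor connectivity, and 100 square meter minimum follow the text; the function name `waste_regions` and the `pixel_area_m2` parameter (ground area of one pixel, which depends on the sensor) are assumptions for the example.

```python
from collections import deque

def waste_regions(abundance, pixel_area_m2, abundance_threshold=0.6, min_area_m2=100.0):
    """Return a list of (row, col) pixel lists, one per retained waste region."""
    rows, cols = len(abundance), len(abundance[0])
    mask = [[abundance[r][c] > abundance_threshold for c in range(cols)]
            for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill over the eight neighbours.
            seen[r0][c0] = True
            queue, region = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and mask[rr][cc] and not seen[rr][cc]):
                            seen[rr][cc] = True
                            queue.append((rr, cc))
            # Keep only regions larger than the minimum area.
            if len(region) * pixel_area_m2 > min_area_m2:
                regions.append(region)
    return regions
```

Because diagonal neighbours count as connected, a pixel touching a pile only at its corner is merged into that pile, whereas an isolated high-abundance pixel is discarded by the area filter.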

[0057] It should be noted that the multi-scale spatial relationship perception model comprises an input layer, a feature extraction layer, a multi-scale fusion layer, a spatial relationship modeling layer, a temporal coding layer, and an output layer. The input layer receives the preprocessed multi-temporal remote sensing image data and the extracted temporal matrix. The image data contains reflectance information for multiple spectral bands, and the temporal matrix contains temporal spectral variation features. The feature extraction layer adopts a residual convolutional neural network architecture consisting of multiple stacked residual blocks. Each residual block contains two convolutional layers, a batch normalization layer, and an activation function. The residual connection adds the input directly to the output through a cross-layer shortcut, alleviating the vanishing gradient and degradation problems in deep network training. The number of convolutional kernels in the first residual block is set to 64, and the number in subsequent residual blocks increases stepwise to 128, 256, and 512, enabling progressive extraction from low-level edge and texture features to high-level semantic features. The multi-scale fusion layer uses three parallel branches to process the output of the feature extraction layer. The first branch uses a 3x3 convolutional kernel to extract local detail, the second a 5x5 kernel to capture medium-scale context, and the third a 7x7 kernel to obtain global semantic features. The output feature maps of the three branches are concatenated along the channel dimension and fed into the channel attention module. This module first applies global average pooling and global max pooling to the concatenated feature map to obtain channel-level statistics. It then learns the non-linear relationship between channels through two fully connected layers and outputs a channel weight vector, which is multiplied with the original feature map to weight the channels. This attention mechanism adaptively emphasizes important feature channels and suppresses redundant information. The spatial relationship modeling layer transforms the 2D image structure into a graph representation in which each pixel is a node whose feature is the multi-scale fused feature vector of that pixel. Edges are established between spatially adjacent pixels, and edge weights are calculated from feature similarity and spatial distance. The graph convolutional network contains two graph convolutional layers. Each layer updates the representation of a node by aggregating the features of its neighboring nodes; the aggregation is a weighted summation whose weights are determined by the edge weights and a learnable transformation matrix. The output dimension of the first graph convolutional layer is set to 256 and that of the second to 128. Through graph convolution, the features of each pixel integrate spatial context from its local neighborhood, which strengthens the perception of the spatial continuity of construction waste piles. The temporal coding layer processes the time-series feature vectors extracted from the temporal matrix and uses a temporal folding network structure to compress long sequences: features from adjacent time steps are folded through convolution operations to shorten the temporal dimension. The folded features are then fed into a gated recurrent unit for temporal modeling. The gated recurrent unit has two gates, an update gate and a reset gate. The update gate controls how much historical information is retained, while the reset gate controls how strongly historical information influences the current state. With this gating design, the model selectively remembers long-term temporal dependencies and forgets irrelevant short-term fluctuations. The hidden dimension of the gated recurrent unit is set to 128, and its output is an encoded temporal feature vector. The output layer concatenates the outputs of the spatial relationship modeling layer and the temporal coding layer to obtain a feature representation that integrates spatial and temporal information. This representation is mapped to the category space through a fully connected layer whose number of neurons equals the number of classification categories. For construction waste recognition, this is typically 5 to 10 categories, including construction waste, vegetation, buildings, roads, and water bodies. The Softmax activation function converts the output of the fully connected layer into a probability distribution in which each category has a probability value and the values sum to 1. The loss function combines cross-entropy loss and Focal loss. Cross-entropy loss measures the difference between the predicted probability distribution and the true label distribution. Focal loss addresses class imbalance by reducing the weight of easily classified samples and increasing the weight of samples that are difficult to classify. The total loss is the weighted sum of the two, with the weight coefficient set to 0.5.
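The combined loss at the end of this paragraph can be written out as a short sketch. It assumes that the weight coefficient of 0.5 applies equally to both terms and that the Focal loss uses a focusing exponent of 2; the exponent and the function names are assumptions, since the text does not state them.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def combined_loss(logits, labels, ce_weight=0.5, gamma=2.0):
    """Weighted sum of cross-entropy and focal loss, averaged over samples."""
    probs = softmax(np.asarray(logits, dtype=float))
    p_true = probs[np.arange(len(labels)), labels]   # probability of the true class
    ce = -np.log(p_true)
    focal = (1.0 - p_true) ** gamma * ce             # down-weights easy samples
    return float((ce_weight * ce + (1.0 - ce_weight) * focal).mean())
```

For a sample that is already classified with high probability, the focal factor is close to zero, so that sample contributes little beyond its cross-entropy share; hard samples keep most of their weight.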

[0058] The training dataset for the multi-scale spatial relationship perception model is established as follows. First, high-resolution remote sensing image samples containing construction waste are collected from different geographical regions. Data sources include construction sites, demolition areas, and landfills in multiple cities to ensure geographical diversity and scene complexity. Ground-truth annotations are established through visual interpretation by professionals combined with field surveys. Interpreters identify construction waste areas based on the spectral characteristics, texture patterns, and spatial morphology in the images, and field surveys verify the interpretation results through GPS positioning and on-site photography. Annotation is performed by vectorization in geographic information system software. The annotation categories include construction waste, vegetation, buildings, roads, water bodies, and bare land, and the number of samples in each category should be roughly balanced to avoid category bias during training. The annotated images are then cut into 256x256 pixel training patches using a sliding window with a step of 128 pixels, which gives 50% overlap between adjacent patches. This overlap increases the number of training samples and provides more spatial context. For each training patch, the corresponding multi-temporal data and temporal matrix are extracted as model input, and the label is the pixel-level classification of that patch. Data augmentation is used to diversify the training samples, including random rotation, horizontal flipping, vertical flipping, brightness adjustment, contrast adjustment, and noise addition. The rotation angle is 90, 180, or 270 degrees. The brightness adjustment range is ±20%, the contrast adjustment range is ±15%, and the standard deviation of the added Gaussian noise is 0.01 to 0.05. Data augmentation expands the original sample size by 3 to 5 times, improving the generalization ability and robustness of the model. The dataset is divided into training, validation, and test sets in a ratio of 7:2:1. The training set is used for learning model parameters, the validation set for hyperparameter tuning and model selection, and the test set for final performance evaluation. The split uses stratified sampling so that the proportion of each category in each subset matches that of the overall sample. The subsets should also be geographically independent: the training and test sets should not contain samples from the same region, which prevents performance from being overestimated because of spatial autocorrelation.
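The patch layout and split sizes follow directly from the stated numbers. The helper names below are hypothetical, and the sketch covers only window placement and set sizes; stratified sampling and geographic separation would be handled when assigning individual patches to each set.

```python
def patch_origins(height, width, patch=256, stride=128):
    """Top-left corners of all full patches cut by a sliding window."""
    return [(r, c)
            for r in range(0, height - patch + 1, stride)
            for c in range(0, width - patch + 1, stride)]

def split_counts(n_samples, ratios=(7, 2, 1)):
    """Sizes of the training, validation, and test sets for a given ratio."""
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    return n_train, n_val, n_samples - n_train - n_val
```

A 512x512 image yields a 3x3 grid of overlapping patches, compared with 4 patches without overlap, and 1000 samples split into 700, 200, and 100.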

[0059] The multi-scale spatial relationship perception model described above is suited to construction waste identification because its design takes into account both the multi-dimensional nature of remote sensing images and the specific difficulties of identifying construction waste. Construction waste in remote sensing images typically exhibits irregular spatial distribution, complex spectral mixing, and dynamic temporal change. Traditional single-scale convolutional neural networks, such as ResNet or VGG, extract features from a fixed receptive field, which makes it difficult to capture both the local details and the global morphology of construction waste. In contrast, the multi-scale fusion layer of this invention uses parallel convolutional kernels of different sizes to extract features from local to global at the same level. This design enables the model to identify both the edge and texture details of construction waste piles and their overall spatial layout and relationship with the surrounding environment. Existing image classification methods, such as semantic segmentation based on fully convolutional networks, often treat pixels as independent individuals and ignore the spatial continuity and neighborhood relationships of ground features. The graph convolutional network introduced in this invention organizes pixels into a graph structure and learns the spatial dependencies between pixels through a message passing mechanism. This graph structure modeling exploits the spatial clustering of construction waste piles and avoids misclassification of isolated pixels. Compared with traditional spatial smoothing applied as post-processing, the graph convolutional network incorporates spatial relationship information during the feature learning stage and is therefore optimized end to end. In terms of temporal modeling, existing technologies mostly use simple temporal difference or change detection methods. These methods capture changes between only two periods and struggle to depict long-term evolutionary trends. The temporal folding network structure and gated recurrent unit of this invention handle multi-temporal sequence data and, through the gating mechanism, selectively retain important temporal patterns while forgetting short-term noise. This design enables the model to distinguish the continuous accumulation of construction waste from temporary surface changes, improving the accuracy and stability of identification. Compared with existing deep learning methods, this invention offers several advantages. Firstly, in terms of the comprehensiveness of feature extraction, the model considers the spectral, spatial, and temporal dimensions simultaneously, whereas existing methods typically focus on only one or two of them. This comprehensive feature representation better handles construction waste identification in complex scenarios. Secondly, the model architecture is specifically designed for the characteristics of remote sensing data. For example, the graph convolutional network exploits the spatial continuity of ground features, while the temporal folding network meets the processing requirements of long time series. These designs make the model better suited to remote sensing applications. Furthermore, regarding training strategy, several techniques are employed to address data imbalance and sample scarcity, such as the Focal loss function, data augmentation, and stratified sampling. These strategies improve the robustness of the model in practical applications. Finally, in terms of interpretability, the attention mechanisms and probability outputs allow the model to provide not only classification results but also confidence assessments, facilitating manual review and quality control.

[0060] It should be noted that the key technical ideas of this invention cover four aspects: multi-temporal spectral time series modeling, multi-scale spatial relationship perception, graph-structured spatial modeling, and an adaptive feature fusion mechanism. Multi-temporal spectral time series modeling characterizes the dynamic changes of construction waste by constructing a temporal matrix and a temporal folding network. Compared with traditional dual-temporal change detection methods, which can only identify the difference at a single point in time, this approach analyzes the evolution trend of the entire time series and distinguishes the continuous accumulation of construction waste from seasonal surface changes. The gated recurrent unit gives the model long-term memory, enabling it to capture accumulation patterns of construction waste that span multiple months or even a year. This time series modeling significantly improves the temporal stability and interference resistance of recognition and avoids misjudging short-term construction activities as construction waste dumping. Multi-scale spatial relationship perception achieves hierarchical feature extraction from local details to global semantics through parallel multi-scale convolutional branches and a channel attention mechanism. Traditional single-scale feature extraction either focuses on local texture while ignoring overall shape, or extracts global features while losing detail. By adaptively weighting and fusing features from different scales, this technology identifies the rough texture and material properties of construction waste surfaces and also captures the overall outline and spatial layout of the pile. This multi-scale perception enables the model to recognize construction waste piles of different sizes and shapes, overcoming the limited scale adaptability of traditional methods. Graph-structured spatial modeling transforms traditional raster data into a graph representation and uses graph convolutional networks to learn the spatial dependencies between pixels. Compared with conventional convolution, which extracts features only within local neighborhoods, graph convolution can propagate information over a larger spatial range and establish connections between distant pixels. This is particularly important for identifying construction waste that is spatially discontinuous but belongs to the same dumping site. Graph-structured modeling also exploits the spatial clustering and shape regularity of construction waste, reducing misclassification of isolated pixels and improving the spatial consistency of the classification results. Compared with traditional post-processing smoothing, the proposed method incorporates spatial relationships into the feature learning stage through graph convolution, achieving more natural and effective spatial modeling. The adaptive feature fusion mechanism dynamically adjusts the weights of deep learning features and traditional spectral and texture features using the confidence assessment and the spectral similarity index. Traditional methods typically employ fixed feature fusion strategies that cannot be adjusted to the characteristics of specific samples. This approach allocates weights adaptively based on the classification uncertainty and spectral feature reliability of each pixel: pixels with distinctive spectral features receive a higher spectral weight, while pixels with distinctive texture features receive a higher texture weight. This flexible fusion strategy exploits the complementary strengths of the different features and improves recognition accuracy in complex scenes.

[0061] Furthermore, a second aspect of the present invention provides a computer-readable storage medium storing program instructions, which, when executed in a computer, perform the aforementioned method for identifying construction waste based on deep learning from satellite remote sensing imagery. A third aspect of the present invention provides a system for identifying construction waste based on deep learning from satellite remote sensing imagery, comprising the aforementioned computer-readable storage medium. The system can be any one of a computer, a server, or a microcontroller. The computer-readable storage medium is disposed within the system, and the system includes a microprocessor that executes the program instructions stored in the computer-readable storage medium.

[0062] Specifically, the principle of this invention is as follows: the invention resolves the confusion between the spectral characteristics of construction waste and those of other land features through multi-dimensional feature fusion and dynamic change analysis. Traditional methods that rely solely on static spectral features are prone to confusion, whereas this invention constructs a temporal matrix to capture the dynamic spectral change patterns of construction waste during accumulation, cleanup, and re-accumulation. This temporal change feature is an important marker that distinguishes construction waste from other, stable land features. The multi-scale spatial relationship perception model extracts multi-level features from local details to global context through convolutional branches with different receptive fields, addressing the irregular spatial distribution of construction waste. The graph convolutional network models the spatial neighborhood relationships between pixels, further strengthening the spatial consistency constraint. The spectral similarity index improves the accuracy of spectral matching by quantifying similarity as a weighted combination of spectral angle, correlation coefficient, and Euclidean distance. When the spectral similarity index is greater than 0.85, increasing the weight of the texture features compensates for the limitations of the spectral features. The improved random forest ensemble learning algorithm uses a weighted voting strategy to fuse multi-scale features with traditional spectral and texture features, and its sample weighting mechanism assigns higher weights to difficult samples, improving classification in complex scenes. Mixed pixel decomposition uses a linear spectral mixture model to calculate the abundance of construction waste and applies spatial neighborhood consistency constraints to optimize the decomposition, achieving identification at the sub-pixel level. The confidence assessment mechanism calculates classification probability and uncertainty indicators; when the confidence is below 0.75, manual verification is initiated to ensure the reliability of the identification results.

[0063] The following provides a specific embodiment 1 of the present invention, and the specific implementation of each step in this embodiment 1 is described in detail below.

[0064] In this embodiment, the specific implementation of steps S01-S02 is the same as described above, and will not be repeated in detail here.

[0065] The specific implementation of step S03 is as follows: Samples of known construction waste areas are collected through a combination of field surveys and visual interpretation of high-resolution images. A portable ground object spectrometer is used to measure the spectral reflectance curves of different types of construction waste on-site, including the spectral characteristics of major components such as concrete blocks, bricks, and reinforcing bars. The measurement data is matched and calibrated with the corresponding satellite image spectra to establish the conversion relationship between ground and satellite spectra. The spectral response characteristics of construction waste in the visible to near-infrared band range are analyzed, and the most sensitive band combinations for construction waste identification are extracted. By calculating characteristic parameters such as ratios, differences, and normalization indices between different bands, a feature space that enhances the spectral difference between construction waste and background features is constructed. The calculation of the spectral similarity index comprehensively considers three dimensions: spectral angle, correlation coefficient, and Euclidean distance, as specifically expressed below:

[0066] ;

[0067] In the formula, $S$ is the spectral similarity index, with a value ranging from 0 to 1; $\theta$ is the angle between the spectral vector of the pixel to be identified and the spectral vector of standard construction waste, in radians; $\theta_{\max}$ is the maximum spectral angle, usually set to a preset value in radians; $\rho$ is the Pearson correlation coefficient between the spectral curve of the pixel to be identified and the standard spectral curve, with a value ranging from -1 to 1; $\rho_{\max}$ is the maximum value of the correlation coefficient, set to 1; $d$ is the normalized Euclidean distance, dimensionless; $d_{\max}$ is the maximum Euclidean distance in the sample set, expressed as a percentage; $d_{\min}$ is the minimum Euclidean distance in the sample set, usually close to 0, expressed as a percentage; $w_1$, $w_2$, $w_3$ are weighting coefficients corresponding to the spectral angle, correlation coefficient, and Euclidean distance, respectively, each ranging from 0 to 1 and satisfying $w_1 + w_2 + w_3 = 1$, dimensionless; $k_1$, $k_2$, $k_3$ are the corresponding standardization parameters used for dimensionless processing, all set to 1. The parameters are obtained as follows: $\theta$ is the angle between the spectral vector $\mathbf{x}$ of the pixel to be identified and the standard construction waste spectral vector $\mathbf{y}$, obtained by $\theta = \arccos\left(\frac{\mathbf{x}\cdot\mathbf{y}}{\lVert\mathbf{x}\rVert\,\lVert\mathbf{y}\rVert}\right)$, where $\mathbf{x}$ and $\mathbf{y}$ are both $n$-dimensional vectors and $n$ is the number of spectral bands, typically 4 to 8; $\rho$ is calculated using the Pearson correlation coefficient formula, i.e. $\rho = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$, where $x_i$ and $y_i$ are the spectral values of the $i$-th band, in percentage form, and $\bar{x}$ and $\bar{y}$ are the corresponding spectral mean values, in percentage form; $d$ is calculated using the normalized Euclidean distance formula, with distances in percentage units; the weighting coefficients $w_1$, $w_2$, $w_3$ are determined by a grid search on the classification performance of the validation samples, with typical values of 0.4, 0.35, and 0.25.
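Since the expression for the index itself does not survive in this text, the sketch below shows only one form that is consistent with the definitions above, and it should be read as an assumption rather than the claimed formula: each term is scaled so that a larger value means greater similarity, the three terms are combined with the typical weights 0.4, 0.35, and 0.25, and the standardization parameters (all 1) are omitted. The default maximum angle of π/2 and the min-max normalization of the distance are also assumptions.

```python
import math

def spectral_angle(x, y):
    """Angle in radians between two spectral vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norm)))   # clamp against rounding

def pearson(x, y):
    """Pearson correlation coefficient between two spectral curves."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    scale = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / scale

def similarity_index(x, y, d_min, d_max, theta_max=math.pi / 2,
                     weights=(0.4, 0.35, 0.25)):
    """Assumed form: w1*(1 - theta/theta_max) + w2*rho + w3*(1 - d_norm)."""
    theta = spectral_angle(x, y)
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    d_norm = (d - d_min) / (d_max - d_min)               # min-max normalisation (assumed)
    w1, w2, w3 = weights
    return w1 * (1 - theta / theta_max) + w2 * pearson(x, y) + w3 * (1 - d_norm)
```

Under this assumed form, a pixel whose spectrum is identical to the reference scores 1, and the score falls as the angle and distance grow or the correlation weakens.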

[0068] The specific implementation of step S04 is as follows: An endmember extraction algorithm is used to identify spectral endmembers representing pure ground features from multi-temporal satellite remote sensing images. This algorithm is based on the theory of mixed pixel linear decomposition. First, principal component analysis is used to reduce the dimensionality of the high-dimensional spectral data. In the reduced feature space, convex volume geometric analysis is used to find extreme points located at the data distribution boundary. The pure pixel index is used to screen candidate endmembers by evaluating the convexity characteristics of the spectral curve of each pixel. The calculation formula is as follows:

[0069] $PPI = \dfrac{\sum_{k=1}^{N} \delta_k - N_{\min}}{N_{\max} - N_{\min}}$;

[0070] In the formula, $PPI$ is the pure pixel index, a dimensionless value from 0 to 1; the larger the value, the closer the pixel is to pure spectral characteristics. $\delta_k$ is the extremum indicator of the pixel in the $k$-th random projection: $\delta_k = 1$ when the pixel lies at a projection extremum in the projection direction, otherwise $\delta_k = 0$; it is dimensionless. $N$ is the total number of random projections, typically set to 5000 to 10000. $N_{\max}$ is the theoretical maximum number of extremum determinations, equal to $N$, and $N_{\min}$ is the theoretical minimum number of extremum determinations, equal to 0. $\delta_k$ is obtained through the following steps: step 1, randomly generate a unit vector in the $d$-dimensional spectral feature space as the projection direction, where $d$ is the dimension of the feature space after dimensionality reduction, typically 3 to 5; step 2, project the spectral vectors of all pixels onto this direction; step 3, calculate the projection value of each pixel; step 4, determine the pixels with the largest or smallest projection values as extremum pixels, setting $\delta_k = 1$ for them and $\delta_k = 0$ for the remaining pixels; step 5, repeat steps 1 to 4 $N$ times. The pure pixel index threshold is set to 0.8, that is, pixels with $PPI$ greater than 0.8 are selected as candidate endmembers. Spectral angle matching then calculates the spectral angle between the measured spectral vector and the reference endmember spectral vector as $\theta = \arccos\left(\dfrac{\mathbf{x}\cdot\mathbf{e}}{\|\mathbf{x}\|\,\|\mathbf{e}\|}\right)$, where $\mathbf{x}$ is the spectral vector of the pixel to be measured and $\mathbf{e}$ is the endmember spectral vector. When $\theta$ is less than 0.1 radians, the two are considered to have high spectral similarity and are determined to be the same land cover type.
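
As a non-limiting illustration, the random-projection procedure of steps 1 to 5 can be sketched as follows. The function name, the Gaussian sampling of directions, and the small default number of projections are assumptions made for the example; the sketch counts a pixel as an extremum when it has either the largest or the smallest projection value and divides the count by the number of projections.

```python
import math
import random


def pure_pixel_index(pixels, n_projections=1000, seed=0):
    """Fraction of random projections in which each pixel is a projection extremum."""
    rng = random.Random(seed)
    dim = len(pixels[0])
    hits = [0] * len(pixels)
    for _ in range(n_projections):
        # Step 1: random unit vector (normalised Gaussian sample, an assumption).
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in direction))
        direction = [c / norm for c in direction]
        # Steps 2-3: projection value of every pixel on this direction.
        proj = [sum(p * d for p, d in zip(pixel, direction)) for pixel in pixels]
        # Step 4: mark the pixels with the largest and smallest projection.
        hits[proj.index(max(proj))] += 1
        hits[proj.index(min(proj))] += 1
    # Step 5 is the loop above; normalise the counts to the range 0..1.
    return [h / n_projections for h in hits]
```

For three pixels at the corners of a triangle and a fourth pixel at its centroid, the centroid is never a projection extremum, so its index stays at zero while the corners share all the hits.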

[0071] The specific implementation of step S05 is to establish a temporal phase matrix to characterize the spectral evolution characteristics of the same spatial location at different times. The formula for constructing the temporal phase matrix is ​​as follows:

[0072] $\mathbf{M} = \begin{bmatrix} \rho_{1,1} & \rho_{1,2} & \cdots & \rho_{1,B} \\ \rho_{2,1} & \rho_{2,2} & \cdots & \rho_{2,B} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{T,1} & \rho_{T,2} & \cdots & \rho_{T,B} \end{bmatrix}$;

[0073] In the formula, $\mathbf{M}$ is the temporal phase matrix at a given pixel location, with dimension $T \times B$; $\rho_{t,b}$ is the spectral reflectance of band $b$ in time phase $t$, in percent; $T$ is the number of time phases, determined by the monitoring cycle and the time phase interval, usually 12 to 24; $B$ is the number of spectral bands, typically 4 to 8 for multispectral imagery. The formula for calculating the phase difference is:

[0074] $\Delta_{t,b} = \dfrac{\rho_{t+1,b} - \rho_{t,b}}{\bar{\rho}_b}$;

[0075] In the formula, $\Delta_{t,b}$ is the normalized spectral difference of band $b$ between time phase $t$ and time phase $t+1$, dimensionless; $\rho_{t+1,b}$ is the spectral reflectance of band $b$ in time phase $t+1$, in percent; $\rho_{t,b}$ is the spectral reflectance of band $b$ in time phase $t$, in percent; $\bar{\rho}_b$ is the reference reflectance of band $b$, taken as the average reflectance of that band across all time phases, in percent. The formula for calculating the spectral variation rate is:

[0076] ;

[0077] In the formula, the spectral variation rate of each band is dimensionless; it is computed from the normalized spectral differences between adjacent time phases over the total number of time phases $T$, and uses the standard deviation $\sigma_b$ of the spectral reflectance of band $b$, in percent, for normalization. The parameters are obtained as follows: the reflectance $\rho_{t,b}$ of time phase $t$ and band $b$ is extracted directly from the preprocessed remote sensing images; the reference reflectance is the arithmetic mean of the band reflectance over all time phases, $\bar{\rho}_b = \dfrac{1}{T}\sum_{t=1}^{T}\rho_{t,b}$, in percent; $\sigma_b$ is calculated from the same reflectance series, in percent. The time-series feature vector consists of statistical parameters such as the spectral mean, standard deviation, amplitude of variation, and frequency of variation. Specifically, it comprises four dimensionless components: the normalized spectral mean over all bands and time phases, computed with reference to the maximum reflectance value across all time phases and all bands (in percent); the normalized global spectral standard deviation; the average variation rate across all bands; and the normalized number of peaks in the spectral curve, computed from the total number of detected peaks, where a peak is a time point whose value exceeds the values at both adjacent time points.
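
As a non-limiting illustration, the temporal phase matrix can be held as a list of rows (one row per time phase, one column per band), from which the per-band reference reflectance and the normalized differences between adjacent time phases follow directly. The function names are hypothetical, and the "later minus earlier" sign convention is an assumption.

```python
def band_means(phase_matrix):
    """Arithmetic mean reflectance of each band over all time phases."""
    n_phases = len(phase_matrix)
    n_bands = len(phase_matrix[0])
    return [sum(row[b] for row in phase_matrix) / n_phases for b in range(n_bands)]


def normalized_phase_differences(phase_matrix):
    """Difference between adjacent time phases, divided by the band mean."""
    means = band_means(phase_matrix)
    n_bands = len(means)
    return [
        [(later[b] - earlier[b]) / means[b] for b in range(n_bands)]
        for earlier, later in zip(phase_matrix, phase_matrix[1:])
    ]
```

A matrix with T time phases yields T - 1 rows of differences; a band whose reflectance does not change over time produces zeros in its column.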

[0078] The specific implementation method of step S06 is the same as described above, and will not be repeated in detail here.

[0079] The specific implementation of step S07 is as follows: an improved random forest ensemble learning algorithm is designed as the final classification decision module. This algorithm constructs multiple decision trees to form a forest structure. The splitting criterion adopts the Gini impurity index, and the calculation formula is as follows:

[0080] $G(t) = \dfrac{K}{K-1}\left(1 - \sum_{k=1}^{K} p_k(t)^2\right)$;

[0081] In the formula, $G(t)$ is the normalized Gini impurity of node $t$, a dimensionless value from 0 to 1, with smaller values indicating higher node purity; $K$ is the total number of categories; $p_k(t)$ is the proportion of samples of class $k$ in node $t$, calculated as $p_k(t) = \dfrac{n_k(t)}{n(t)}$, where $n_k(t)$ is the number of class-$k$ samples in the node and $n(t)$ is the total number of samples in the node; it is dimensionless. The formula for calculating the sample weight mechanism is:

[0082] $w_i = \dfrac{\exp(-\lambda a_i)}{\sum_{j=1}^{N}\exp(-\lambda a_j)}$;

[0083] In the formula, $w_i$ is the normalized weight of the $i$-th sample, dimensionless; $a_i$ is the cumulative classification accuracy of the $i$-th sample in the previous iterations, a dimensionless value from 0 to 1; $\lambda$ is the weighting adjustment parameter, usually set to 5 to 10, dimensionless; $N$ is the total number of training samples; $\exp$ is the natural exponential function. The fusion formula for the weighted voting strategy is:

[0084] $P(c) = \alpha\,\dfrac{P_{\mathrm{DL}}(c)}{P_{\mathrm{DL}}^{\max}} + \beta\,\dfrac{P_{\mathrm{TR}}(c)}{P_{\mathrm{TR}}^{\max}}$;

[0085] In the formula, $P(c)$ is the final normalized predicted probability of category $c$, dimensionless; $P_{\mathrm{DL}}(c)$ is the predicted probability of category $c$ given by the deep learning features output by the multi-scale spatial relationship perception model, dimensionless; $P_{\mathrm{TR}}(c)$ is the predicted probability of category $c$ given by the traditional spectral and texture features, dimensionless; $P_{\mathrm{DL}}^{\max}$ is the maximum value of the predicted probability for the deep learning features; $P_{\mathrm{TR}}^{\max}$ is the maximum value of the predicted probability for the traditional features; $\alpha$ and $\beta$ are the fusion weighting coefficients, each ranging from 0 to 1, with $\alpha$ typically set to 0.6 to 0.8 and $\beta$ typically set to 0.2 to 0.4, satisfying $\alpha + \beta = 1$, dimensionless; $P_{\mathrm{DL}}^{\max}$ and $P_{\mathrm{TR}}^{\max}$ are both set to 1 for normalization.
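
As a non-limiting illustration, the three calculations of step S07 can be sketched in plain Python. The function names are hypothetical. The normalization of the Gini impurity by its maximum value 1 - 1/K, the softmax-style form of the sample weights, and the default fusion weights of 0.7 and 0.3 are assumptions chosen to match the stated ranges; the normalization constants of the fusion formula are taken as 1.

```python
import math


def normalized_gini(class_counts):
    """Gini impurity of a node divided by its maximum value 1 - 1/K (K >= 2)."""
    total = sum(class_counts)
    k = len(class_counts)
    gini = 1.0 - sum((n / total) ** 2 for n in class_counts)
    return gini / (1.0 - 1.0 / k)


def sample_weights(accuracies, lam=5.0):
    """Exponential weights that favour samples with low cumulative accuracy."""
    raw = [math.exp(-lam * a) for a in accuracies]
    total = sum(raw)
    return [w / total for w in raw]


def fuse(p_deep, p_traditional, alpha=0.7, beta=0.3):
    """Weighted vote of deep-learning and traditional class probabilities."""
    return [alpha * d + beta * t for d, t in zip(p_deep, p_traditional)]
```

A node holding samples of a single class has impurity 0, an evenly split node has impurity 1, and a sample that has been classified correctly less often receives the larger weight.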

[0086] The specific implementation of step S08 is to establish a confidence assessment mechanism for construction waste identification to quantify the reliability of the classification results. The formula for calculating the classification confidence is as follows:

[0087] $C = \dfrac{P_{\max} - P_{\mathrm{second}}}{P_{\max} - P_{\min}}$;

[0088] In the formula, $C$ is the normalized classification confidence, a dimensionless value from 0 to 1; $P_{\max}$ is the maximum predicted probability among all categories, dimensionless; $P_{\mathrm{second}}$ is the second highest predicted probability, dimensionless; $P_{\min}$ is the minimum predicted probability, dimensionless. The uncertainty index is calculated using the following formula:

[0089] ;

[0090] In the formula, the normalized uncertainty index is a dimensionless value from 0 to 1; the larger the value, the higher the uncertainty. The remaining quantities are: the total number of decision trees in the random forest; the voting category code of each decision tree for the pixel, a dimensionless integer from 1 to the number of categories; the average of the category codes voted by all decision trees, dimensionless; the maximum category code, equal to the number of categories; and the minimum category code, equal to 1. The parameters are obtained as follows: the maximum and second highest predicted probabilities are obtained by sorting the probabilities of each category output by the random forest; the voting category code of each decision tree is the encoded value corresponding to the category predicted by that tree. When the classification confidence is below 0.75, a manual verification process is initiated; when the spectral similarity index is greater than 0.85, the texture feature weight is adjusted from the initial value of 0.3 to 0.5.
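
As a non-limiting illustration, a margin-style confidence and the manual verification trigger can be sketched as follows. The function names are hypothetical, and the specific ratio used here, the gap between the two highest probabilities divided by the gap between the highest and lowest, is an assumption consistent with the quantities listed above rather than a confirmed form of the formula.

```python
def classification_confidence(probabilities):
    """Margin between the top two classes, scaled by the overall probability spread."""
    ordered = sorted(probabilities, reverse=True)
    p_max, p_second, p_min = ordered[0], ordered[1], ordered[-1]
    spread = p_max - p_min
    if spread == 0.0:
        # All classes equally likely: no basis for a decision.
        return 0.0
    return (p_max - p_second) / spread


def needs_manual_check(probabilities, threshold=0.75):
    """True when the confidence falls below the verification threshold."""
    return classification_confidence(probabilities) < threshold
```

With class probabilities of 0.9, 0.06, and 0.04 the confidence is about 0.98 and no verification is triggered, whereas 0.4, 0.39, and 0.21 gives about 0.05 and the pixel is flagged.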

[0091] The specific implementation of step S09 is to decompose the mixed pixel problem caused by spatial resolution limitations, and the calculation formula of the linear spectral mixing model is as follows:

[0092] $\mathbf{x} = \sum_{i=1}^{M} a_i\,\mathbf{e}_i + \boldsymbol{\varepsilon}$;

[0093] In the formula, $\mathbf{x}$ is the observed spectral reflectance vector of the mixed pixel, with dimension $B \times 1$, where $B$ is the number of bands; $M$ is the number of endmembers, typically 4 to 8; $a_i$ is the abundance of the $i$-th endmember in the mixed pixel, a dimensionless value from 0 to 1; $\mathbf{e}_i$ is the spectral reflectance vector of the $i$-th endmember, with dimension $B \times 1$, in percent; $\boldsymbol{\varepsilon}$ is the model error term, with dimension $B \times 1$, in percent. The constraints are:

[0094] $\sum_{i=1}^{M} a_i = 1$, and $a_i \ge 0$, $i = 1, 2, \ldots, M$;

[0095] Abundance estimation is solved using the nonnegativity-constrained least squares method, with the objective function being:

[0096] ;

[0097] In the formula, the abundance vector to be solved has dimension $M \times 1$, where $M$ is the number of endmembers, and is dimensionless; $\|\cdot\|$ denotes the Euclidean norm; the spatial smoothing regularization parameter ranges from 0.1 to 0.5 and is dimensionless; the spatial neighborhood set is defined for each pixel; the abundance vector of the pixel and the abundance vector of each neighboring pixel both have dimension $M \times 1$ and are dimensionless; the number of neighboring pixels is the size of the neighborhood set; and the maximum abundance value is equal to 1. The parameters are obtained as follows: the spectral reflectance of the mixed pixel is extracted directly from the preprocessed image; the endmember spectra are provided by the endmember spectral library extracted in step S04; the abundance vector is solved by an iterative optimization algorithm starting from a preset initial value; the regularization parameter is determined by cross-validation on the decomposition accuracy of the validation samples, with a typical value of 0.3. The construction waste abundance is the component of the abundance vector corresponding to the construction waste endmember; it is dimensionless and is not a new variable independent of the objective function. The objective function optimizes the abundance vector as a whole by minimizing the fitting error between the observed spectrum of the mixed pixel and the abundance-weighted combination of the endmember spectra, together with the spatial neighborhood consistency constraint, to obtain the abundance proportion of each endmember in the mixed pixel. When an endmember in the endmember spectral library is identified as a construction waste endmember, the corresponding component of the solved abundance vector is the construction waste abundance, which is subsequently compared with the abundance threshold to determine whether the pixel belongs to a construction waste area. This arrangement avoids creating a separate construction waste abundance variable in the objective function, while ensuring that the construction waste abundance and the other endmember abundances are solved jointly under the same constraints, thereby improving the physical consistency and quantitative reliability of the mixed pixel decomposition results.
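
As a non-limiting illustration, constrained abundance estimation can be sketched with projected gradient descent onto the simplex of non-negative abundances that sum to 1. This is a swapped-in solver chosen for a dependency-free example, not the non-negativity-constrained least squares routine named in the text. The function names, the uniform initial abundance, the step-size rule, and the simplified objective (squared fitting error plus the regularization parameter times the mean squared difference to the neighbor abundances) are all assumptions.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {a : a_i >= 0, sum(a) = 1}."""
    ordered = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / k
        if value - candidate > 0.0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def unmix(pixel, endmembers, neighbour_abundances=(), lam=0.3, steps=500):
    """Estimate abundances of each endmember in a mixed pixel."""
    m, n_bands = len(endmembers), len(pixel)
    use_lam = lam if neighbour_abundances else 0.0
    # Conservative step: trace of the Gram matrix bounds its largest eigenvalue.
    trace = sum(value * value for e in endmembers for value in e)
    step = 1.0 / (2.0 * (trace + use_lam))
    a = [1.0 / m] * m  # assumed uniform starting point
    for _ in range(steps):
        residual = [
            sum(a[i] * endmembers[i][b] for i in range(m)) - pixel[b]
            for b in range(n_bands)
        ]
        grad = [
            2.0 * sum(residual[b] * endmembers[i][b] for b in range(n_bands))
            for i in range(m)
        ]
        for nb in neighbour_abundances:
            for i in range(m):
                grad[i] += 2.0 * use_lam * (a[i] - nb[i]) / len(neighbour_abundances)
        a = project_to_simplex([a[i] - step * grad[i] for i in range(m)])
    return a
```

Without neighbors the sketch recovers the mixing proportions of a noise-free pixel; supplying neighbor abundances pulls the estimate toward them, which is the effect of the spatial smoothing term.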

[0098] The specific implementation of step S10 is to generate the final identification result based on the abundance threshold and spatial aggregation parameter of construction waste, with the following judgment criteria:

[0099] $R = \begin{cases} 1, & A \ge T_A \ \text{and} \ S \ge T_S \\ 0, & \text{otherwise} \end{cases}$;

[0100] In the formula, $R$ is the determination result for the pixel, where 1 represents construction waste and 0 represents non-construction waste, dimensionless; $A$ is the abundance value of construction waste, dimensionless; $T_A$ is the abundance threshold, set to 0.6, dimensionless; $S$ is the area of the connected region, in square meters; $T_S$ is the area threshold, set to 100 square meters. The parameters are obtained as follows: $A$ is calculated in step S09; $S$ is obtained through eight-neighbor connectivity analysis, in which all adjacent pixels satisfying $A \ge T_A$ are grouped into the same connected region, and the area of the connected region is the number of pixels in the region multiplied by the actual ground area of a single pixel. The vectorized boundary is extracted by a contour tracing algorithm, and the attribute information includes geometric parameters such as the area, perimeter, and shape index of each pile.
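
As a non-limiting illustration, the two-threshold decision with eight-neighbor connectivity can be sketched with a breadth-first search over the abundance grid. The function name and the inclusive comparisons against both thresholds are assumptions.

```python
from collections import deque


def identify(abundance, pixel_area_m2, abundance_threshold=0.6, area_threshold_m2=100.0):
    """Binary map: 1 where a pixel passes the abundance and region-area tests."""
    rows, cols = len(abundance), len(abundance[0])
    result = [[0] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or abundance[r][c] < abundance_threshold:
                continue
            # Grow one connected region using all eight neighbours.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and abundance[ny][nx] >= abundance_threshold):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Keep the region only if its ground area reaches the threshold.
            if len(region) * pixel_area_m2 >= area_threshold_m2:
                for y, x in region:
                    result[y][x] = 1
    return result
```

For imagery with 0.8-meter pixels, each pixel covers 0.64 square meters, so a region needs at least 157 connected pixels to reach the 100-square-meter threshold.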

[0101] It should be further explained that the spectral similarity index formula is based on multi-dimensional spectral matching theory. It characterizes the similarity of spectral features by jointly evaluating the shape similarity, the consistency of variation trends, and the numerical closeness of spectral curves. The formula uses a weighted fusion strategy to integrate three complementary similarity metrics, namely the spectral angle term, the correlation coefficient term, and the Euclidean distance term, into a unified evaluation index. The spectral angle is sensitive to the shape of the spectral curve but not to changes in brightness; the correlation coefficient reflects the synchronicity of spectral change trends; and the Euclidean distance measures the absolute difference in spectral values. Combining the three metrics overcomes the limitations of a single index. Compared with traditional methods that use only the spectral angle or only the Euclidean distance, this formula can more accurately identify the spectral differences between construction waste and other ground features, and in particular maintains high recognition accuracy under varying lighting conditions or sensor response differences. Dimensionless processing normalizes each term so that parameters with different dimensions are comparable, giving the similarity index consistent discrimination ability across different scenarios and data sources and thereby improving the stability and reliability of construction waste identification.

[0102] The pure pixel index formula is based on convex geometry theory. It evaluates the spectral purity of a pixel by statistically analyzing the frequency of its extreme positions in multiple random projections. The core idea is that pure endmembers are usually located at the vertices or edges of the convex hull of the data distribution in the high-dimensional spectral feature space. When projected along random directions, these extreme points are more likely to become projection extrema than mixed pixels. The statistical results of a large number of random projections can effectively distinguish between pure pixels and mixed pixels. Compared with traditional endmember extraction methods based on spectral variance or clustering, this formula directly locates pure endmembers through geometric analysis without relying on band selection or clustering parameter settings, thus avoiding parameter sensitivity issues. Dimensionless processing normalizes the number of projections, making the index value uniformly interpretable. This provides reliable pure endmembers for construction waste identification. These pure endmembers serve as the basis for mixed pixel decomposition and directly affect the accuracy of subsequent abundance estimation, thereby improving the overall identification effect.

[0103] The spectral change rate formula further statistically analyzes the change amplitude of the entire time series. Standard deviation normalization ensures the comparability of the change rate. Compared with the traditional dual-temporal change detection method, which can only identify the differences between two periods, this set of formulas can comprehensively depict the temporal evolution of the construction waste accumulation process. In particular, by analyzing multi-temporal spectral change patterns, it can effectively distinguish between the continuous accumulation of construction waste and seasonal changes in vegetation or short-term construction activities, thereby improving the temporal stability and anti-interference ability of the identification. The dimensionless design makes the temporal features consistent in discrimination ability across different image sources and different geographical regions.

[0104] The Gini impurity formula and sample weighting mechanism constitute the core decision-making mechanism of the improved random forest. The splitting of the decision tree is guided by measuring the degree of class confusion in the nodes. Normalization ensures the comparability of indicators under different class numbers, enabling the algorithm to split effectively even in class imbalance scenarios. The sample weighting formula uses an exponential function to assign higher weights to samples that are difficult to classify. Compared to traditional equal-weighted random forests, this mechanism makes the model pay more attention to difficult-to-distinguish boundary samples during training, thereby improving the recognition ability of complex scenes. The weighted voting fusion formula integrates the prediction results of deep learning features and traditional features through normalization, avoiding fusion bias caused by differences in feature scales. Compared to simple average or maximum value fusion methods, this weighting strategy can adaptively adjust the contribution of different features, fully leveraging the high-level semantic expression ability of deep learning features and the physical interpretability of traditional features, thus achieving better classification performance in construction waste recognition tasks. Especially in cases of fuzzy spectral features or complex spatial structures, multi-feature collaborative decision-making can significantly reduce the misclassification rate.

[0105] When the predicted probabilities of multiple categories are close, the confidence level decreases, indicating the need for manual verification. The uncertainty index formula evaluates classification stability from the perspective of ensemble learning by statistically analyzing the degree of divergence in voting among different decision trees in a random forest. Normalization eliminates the influence of category encoding values, making the index consistent in its interpretability across different category systems. Compared to traditional methods that only output classification labels, this set of formulas provides a quantitative description of the result's credibility. In particular, by setting a confidence threshold to trigger a manual verification mechanism, the quality of construction waste identification can be effectively controlled, avoiding the misuse of low-confidence results. At the same time, the strategy of dynamically adjusting feature weights based on the spectral similarity index enables the system to optimize decisions based on the feature reliability of different pixels, thereby maintaining stable recognition performance in complex urban environments.

[0106] The linear spectral mixing model and its optimization objective function establish a quantitative decomposition framework for mixed pixels. Based on optical linear mixing theory, the model assumes that the reflectance spectrum of a pixel is an area-weighted linear combination of the endmember spectra. The sum-to-one and non-negativity constraints on the abundances ensure the physical rationality of the decomposition results. The optimization objective function consists of two main parts: the first is a spectral fitting term, which ensures consistency between the decomposition results and the observed spectra; the second is a spatial smoothing regularization term, which exploits the spatial continuity of ground features by penalizing abundance differences between adjacent pixels. The regularization parameter controls the balance between spectral fitting accuracy and spatial smoothness. Compared with traditional unconstrained least squares methods or decomposition methods that consider only the spectral dimension, this objective function integrates constraints from both the spectral and spatial dimensions. This allows it to suppress decomposition noise while maintaining spectral fitting accuracy, thus improving the stability of abundance estimation. In particular, for features such as construction waste, which have irregular boundaries but relatively uniform interiors, the spatial constraint effectively reduces fragmentation, resulting in more reliable information on the area proportion of construction waste. Dimensionless processing normalizes the fitting error and the spatial difference term, ensuring the consistency of the optimization objective across different pixels and scenarios and making the abundance estimation results more comparable and interpretable.

[0107] The judgment criterion formula integrates abundance information and spatial clustering into a binary recognition decision. This formula filters pixels dominated by construction waste by setting an abundance threshold and filters sporadic noise points by using an area threshold. The synergistic constraint of the two thresholds effectively balances the accuracy and completeness of the recognition. Compared with traditional methods based solely on spectral classification or abundance judgment, this criterion fully utilizes the spatial clustering characteristics of construction waste piles, avoiding misjudging isolated pixels or small temporary piles as target objects. At the same time, connectivity analysis ensures the spatial coherence of the recognition results, thereby achieving accurate positioning and boundary extraction of construction waste in complex urban environments. This judgment strategy based on abundance and spatial constraints makes the recognition results more consistent with the actual distribution patterns of construction waste, providing reliable data support for the supervision and cleanup work of urban management departments.

[0108] To better understand and implement this invention, the following is a specific application scenario of this invention, Example 2:

[0109] A technical team was tasked with identifying and monitoring construction waste in a 25-acre target area in the suburbs of a city. The team first acquired multi-temporal satellite remote sensing imagery of the area, selecting high-resolution imagery with a spatial resolution of 0.8 meters and setting the temporal interval to 20 days, for a total of 18 image periods covering 12 months. These images encompassed different weather conditions across the four seasons, ensuring that changes in construction waste accumulation were captured under varying environmental conditions. The total amount of raw imagery acquired reached 1.2 million cubic meters, and the imagery contains information from 8 spectral bands.

[0110] During the preprocessing stage, the technical team performed atmospheric, geometric, and radiometric corrections on the acquired multi-temporal satellite remote sensing image data. A temporal registration algorithm was used to unify the image data from 18 different time periods into the UTM coordinate system, establishing pixel-level temporal correspondences. The registration accuracy reached 0.3 pixels, ensuring the accuracy of subsequent temporal analysis. The preprocessed image data met the analytical requirements in terms of both spectral and geometric consistency.

[0111] When constructing the database of spectral characteristics of construction waste, the technical team collected spectral reflectance curves from known construction waste areas, gathering spectral data from 156 construction waste sample points. Analysis revealed that construction waste exhibits distinct spectral characteristics in the near-infrared band (760-900nm) and the short-wave infrared band (1550-1750nm). The team extracted 12 characteristic band combinations, calculated the spectral similarity index, and established a spectral difference pattern library encompassing major land cover types such as construction waste, vegetation, soil, and water bodies.

[0112] Using an endmember extraction algorithm, the team identified pure spectral endmembers from the multi-temporal image data. The pure pixel index algorithm was used to select 2847 pixels with high spectral purity, with the pure pixel index threshold set to 0.65, which effectively identified representative endmember spectra. Endmember spectra for six major land cover types were then determined using spectral angle matching. Among these, the construction waste endmember spectrum exhibited relatively low reflectance in the visible light band and high reflectance in the near-infrared band.

[0113] A temporal phase matrix was established to characterize the spectral variation features of the same pixel location at different times. The technical team constructed an 18×8-dimensional temporal phase matrix, where rows represent the time dimension and columns represent the spectral band dimension. Through temporal phase difference analysis and trend detection, the dynamic spectral change patterns during the accumulation of construction waste were identified. As shown in Figure 2, the analysis revealed that the spectral changes in the construction waste area exhibited distinct phased characteristics. Newly accumulated construction waste showed more dramatic spectral changes in the short term, which stabilized after weathering. The technical team extracted a 72-dimensional time-series feature vector, effectively capturing the temporal variation characteristics of the construction waste.

[0114] When constructing a multi-scale spatial relationship perception model, the technical team used a channel attention mechanism to adaptively weight and fuse features at different scales, with weight coefficients of 0.4, 0.35, and 0.25, respectively.

[0115] During the establishment of the training dataset, the technical team collected a total of 3,200 high-resolution remote sensing images containing construction waste from different regions. Real-world labeling data was established through visual interpretation and field surveys. The image samples were divided into 15,680 training patches of 256×256 pixels each. Each training patch was labeled with information on six types of land features: construction waste, vegetation, buildings, roads, water bodies, and bare land. Data augmentation techniques, including rotation, flipping, and brightness adjustment, were used to expand the number of training patches to 47,040. The dataset was then divided into training, validation, and test sets in a 7:2:1 ratio.
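
As a non-limiting illustration, the sample counts quoted above can be checked with simple arithmetic. The helper name is hypothetical, and the assumption is that the 7:2:1 split is applied to the augmented set of 47,040 patches.

```python
def split_counts(total, ratio):
    """Number of samples in each subset for an integer ratio such as (7, 2, 1)."""
    parts = sum(ratio)
    return [total * share // parts for share in ratio]


original_patches = 15680
augmented_patches = 47040

# Augmentation triples the number of training patches.
expansion_factor = augmented_patches // original_patches

# Training, validation, and test subset sizes under a 7:2:1 split.
train_count, val_count, test_count = split_counts(augmented_patches, (7, 2, 1))
```

Under these assumptions the augmentation corresponds to a threefold expansion, and the three subsets contain 32,928, 9,408, and 4,704 patches.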

[0116] As shown in Table 1, the distribution of the number of samples in different categories is basically balanced.

[0117] Table 1. Distribution of Sample Quantities for Each Category in the Training Dataset

[0118]

[0119] During model training, the technical team used a stochastic gradient descent optimizer for parameter updates, setting the initial learning rate to 0.001, the batch size to 32, and the training epochs to 200. Cross-entropy loss combined with Focal Loss was used to address class imbalance, with the Focal Loss focusing parameter γ set to 2.0 and the balancing parameter α set to 0.25. L2 regularization was introduced to prevent overfitting, with a regularization coefficient of 0.0001. An early stopping mechanism was implemented, halting training when the validation set accuracy showed no improvement for 10 consecutive epochs; actual training stopped at epoch 156. A learning rate decay strategy was employed, multiplying the learning rate by 0.5 every 50 epochs, and mixed precision calculations were used during training to improve efficiency.
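
As a non-limiting illustration, the step decay of the learning rate and the focal loss term for a single sample can be written out as follows. The function names and the zero-based epoch numbering are assumptions; the focal loss expression is the standard form with focusing parameter 2.0 and balancing parameter 0.25 as stated above.

```python
import math


def learning_rate(epoch, initial=0.001, decay=0.5, every=50):
    """Step decay: multiply the rate by `decay` once per `every` epochs."""
    return initial * decay ** (epoch // every)


def focal_loss(p_true, gamma=2.0, alpha=0.25):
    """Focal loss for one sample, given the predicted probability of its true class."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)
```

Under the zero-based assumption, the rate has been halved three times by epoch 156, the point at which training stopped, giving 0.000125. The focal loss shrinks quickly as the predicted probability of the true class approaches 1, so well-classified samples contribute little to the total loss.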

[0120] An improved random forest ensemble learning algorithm was designed as the classification decision module. The technical team constructed 300 decision trees to form a forest structure. During training, each decision tree randomly selected 70% of its feature subset and 80% of its sample subset, using Gini impurity as the splitting criterion, and setting the maximum tree depth to 15. A sample weighting mechanism was introduced, assigning a 1.5x weight to difficult samples. The quality of individual decision trees was evaluated using out-of-bag error, and 26 poorly performing decision trees were removed. For final prediction, a weighted voting method was used to fuse the prediction results of all decision trees, with weights determined based on the accuracy and diversity metrics of individual trees.

[0121] A confidence assessment mechanism for construction waste identification was established, with the technical team calculating the classification probability and uncertainty index for each pixel. A confidence threshold of 0.75 was set; when the confidence level fell below this threshold, a manual verification process was initiated. In practical application, 1847 pixels had confidence levels below the threshold, and manual verification confirmed that 1203 of these pixels indeed exhibited classification uncertainty. When the spectral similarity index exceeded 0.85, the weight of texture features was increased to 1.3 times the original weight, improving the accuracy of identifying complex areas.

[0122] When performing mixed pixel decomposition, the technical team used a linear spectral mixing model to calculate the abundance of construction waste in the mixed pixels. As shown in Figure 3, the decomposition process considered the contributions of six endmembers, with constraints requiring the endmember abundances to sum to 1 and each abundance to be non-negative. Spatial neighborhood consistency constraints were combined to optimize the decomposition results, with the spatial smoothing parameter set to 0.15, which effectively reduced the spatial fragmentation of the classification results. As shown in Figure 6, the abundance values of construction waste obtained from mixed pixel decomposition exhibit significant spatial differentiation across regions, with high-abundance areas mainly concentrated around industrial parks and demolition areas. The output spatial distribution probability map of construction waste clearly displays the location and abundance of each suspected area.

[0123] The final identification results were generated based on the abundance threshold and spatial aggregation parameters of construction waste. The technical team set the abundance threshold to 0.6 and the connected region area threshold to 100 square meters. As shown in Figure 4, a total of 42 construction waste accumulation areas were identified within the target area, covering a total area of 8.7 square kilometers. The largest area reached 1.2 and is located near a demolition site. As shown in Figure 7, an analysis of the recognition results under different abundance thresholds indicates that, as the threshold increases from 0.4 to 0.8, the number of identified construction waste areas decreases while the average accuracy increases. A threshold of 0.6 was finally selected as the optimal value, ensuring high accuracy while avoiding excessive omissions. The technical team output the vectorized boundaries and a statistical report of attribute information, providing precise spatial positioning information for subsequent monitoring and cleanup work.

[0124] As shown in Table 2, the distribution statistics of construction waste areas of different sizes reflect the characteristics of construction waste accumulation in the area.

[0125] Table 2. Statistical Table of Regional Distribution of Construction Waste

[0126]

[0127] To verify the reliability of the identification results, the technical team conducted on-site investigations. As shown in Figure 5, on-site verification was conducted on the 42 identified areas using a GPS positioning system. The results showed that 39 of these areas did indeed contain accumulated construction waste, an accuracy rate of 92.9%. The three misidentified areas were mainly due to the spectral similarity between bare land and construction waste. In addition, the on-site investigation uncovered two smaller construction waste sites (less than 100 square meters) that the system had not identified; both were between 50 and 80 square meters in area, so their omission is consistent with the set area threshold constraint.

[0128] It should be noted that the variables involved in this invention are explained in detail in Tables 3 and 4 below.

[0129] Table 3. Variable Explanation Table (Part 1)

[0130]

[0131] Table 4. Variable Explanation Table (Part 2)

[0132]

[0133] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the scope of protection of the present invention.

Claims

1. A method for identifying construction waste based on deep learning from satellite remote sensing imagery, characterized in that the method includes: acquiring multi-temporal satellite remote sensing image data of the target area, preprocessing the multi-temporal satellite remote sensing image data, constructing a database of spectral features of construction waste, identifying pure spectral endmembers using an endmember extraction algorithm, establishing a temporal matrix to represent spectral change features, constructing a multi-scale spatial relationship perception model, designing an improved random forest ensemble learning algorithm as a classification decision module, establishing a confidence assessment mechanism for construction waste identification, performing mixed pixel decomposition processing, and generating identification results based on construction waste abundance thresholds and spatial clustering parameters.

2. The method according to claim 1, characterized in that the step of acquiring multi-temporal satellite remote sensing image data of the target area comprises: selecting high-resolution images with a spatial resolution in the range of [0.5, 2] meters, and setting the temporal interval to the range of [15, 30] days, to ensure coverage of changes in the state of construction waste accumulation under different seasons and weather conditions.

3. The method according to claim 2, characterized in that the step of preprocessing the multi-temporal satellite remote sensing image data specifically includes atmospheric correction, geometric correction, and radiometric correction, wherein a temporal registration algorithm is used to unify the multi-temporal satellite remote sensing image data from different periods into the same coordinate system and to establish a pixel-level temporal correspondence.

4. The method according to claim 3, characterized in that the step of constructing a database of spectral characteristics of construction waste comprises: collecting spectral reflectance curves of known construction waste areas, extracting characteristic band combinations, calculating spectral similarity indices, and establishing a database of spectral difference patterns between construction waste and background features.

5. The method according to claim 4, characterized in that the step of identifying pure spectral endmembers using an endmember extraction algorithm comprises: using a pure pixel index algorithm to screen pixels with high spectral purity, and using spectral angle matching to determine the endmember spectra of the main land cover types, such as construction waste, vegetation, soil, and water bodies.

6. The method according to claim 5, characterized in that the step of establishing a temporal matrix to characterize spectral changes comprises: identifying, through temporal difference analysis and trend detection, the dynamic spectral change patterns during the accumulation of construction waste, and extracting time series feature vectors.

7. The method according to claim 6, characterized in that the step of constructing a multi-scale spatial relationship perception model comprises: using multiple convolutional branches with different receptive fields to extract multi-level features from local details to global context, modeling spatial neighborhood relationships between pixels through graph convolutional networks, and using a time-folding network structure to encode the time series feature vectors.

8. The method according to claim 7, characterized in that the multi-scale spatial relationship perception model comprises an end-to-end neural network architecture consisting of a feature extraction layer, a multi-scale fusion layer, and a spatial relationship modeling layer, wherein the feature extraction layer uses residual convolutional blocks to extract basic spectral and texture features, the multi-scale fusion layer forms multiple branches through parallel processing using three convolutional kernels of different sizes, and the spatial relationship modeling layer constructs a spatial adjacency graph using pixels as graph nodes.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program instructions which, when executed in a computer, perform the method for identifying construction waste based on deep learning from satellite remote sensing images as described in any one of claims 1-8.

10. A system for identifying construction waste based on deep learning from satellite remote sensing imagery, characterized in that the system comprises the computer-readable storage medium of claim 9, wherein the system is a computer, the computer-readable storage medium is disposed within the system, and the system is provided with a microprocessor that executes program instructions stored in the computer-readable storage medium.
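As an illustration of the spectral angle matching named in claim 5, the following sketch computes the standard spectral angle between a pixel spectrum and each entry of an endmember library and returns the closest match. It is a generic formulation, not the patented procedure: the function names, the four-band reflectance values, and the library contents are hypothetical.

```python
import math


def spectral_angle(pixel, endmember):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * e for p, e in zip(pixel, endmember))
    norm = (math.sqrt(sum(p * p for p in pixel))
            * math.sqrt(sum(e * e for e in endmember)))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def best_match(pixel, library):
    """Return the library key whose spectrum has the smallest angle to pixel."""
    return min(library, key=lambda name: spectral_angle(pixel, library[name]))


# Hypothetical four-band reflectance values, for illustration only.
library = {
    "construction_waste": [0.30, 0.32, 0.35, 0.38],
    "vegetation": [0.05, 0.08, 0.04, 0.50],
    "water": [0.08, 0.06, 0.04, 0.02],
}
pixel = [0.28, 0.31, 0.33, 0.37]
print(best_match(pixel, library))  # construction_waste
```

Because the angle depends only on the direction of the spectrum, uniformly brighter or darker pixels of the same material give nearly the same angle, which is why this measure is commonly used for endmember matching.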