Soil profile data self-adaptive cutting method based on deep learning

Multi-source soil profile data are aligned by depth, and models are trained with cross-modal contrastive learning and a mask sequence strategy to generate boundary probability heatmaps and uncertainty quantification maps. This addresses the problems of multimodal data fusion and data gaps, achieving high-precision adaptive cutting and reducing manual processing.

CN122244441A · Pending · Publication Date: 2026-06-19 · INST OF AGRI RESOURCES & REGIONAL PLANNING CHINESE ACADEMY OF AGRI SCI

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: INST OF AGRI RESOURCES & REGIONAL PLANNING CHINESE ACADEMY OF AGRI SCI
Filing Date: 2026-03-23
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to effectively integrate multimodal soil profile data and lack robustness to data incompleteness, resulting in limited cutting accuracy and requiring extensive manual post-processing.

Method used

Multi-source digital data is collected and aligned to construct a multimodal aligned dataset. A general feature encoder is pre-trained using cross-modal contrastive learning, and a cutting model is trained by combining it with a mask sequence strategy. The model outputs a boundary probability heatmap and an uncertainty quantification map, which are visualized so that manual correction instructions can be received and the final soil boundary set generated.

Benefits of technology

It achieves high-precision, adaptive soil profile cutting under conditions of multimodal data fusion and data loss, improving cutting accuracy and reducing the need for manual processing.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention discloses an adaptive soil profile data segmentation method based on deep learning, belonging to the field of soil informatics technology. The method includes: collecting multi-source digitized data of soil profiles and aligning them based on depth to obtain a multimodal aligned dataset; pre-training on the unlabeled data portion of the multimodal aligned dataset using cross-modal contrastive learning to obtain a general feature encoder; combining the general feature encoder with a task module and training the combined task module using a mask sequence strategy to obtain a soil profile segmentation model; inputting the original multimodal data of the soil profile to be segmented into the soil profile segmentation model for inference, and outputting a boundary probability heatmap and an uncertainty quantification map of the soil profile; and visualizing the boundary probability heatmap and uncertainty quantification map. This invention solves the problems of insufficient deep feature mining and sensitivity to incomplete data in existing technologies.

Description

Technical Field

[0001] This invention relates to the field of soil informatics technology, and in particular to an adaptive cutting method for soil profile data based on deep learning.

Background Technology

[0002] In the field of environmental monitoring, automatic analysis of soil profile data is a core component in building digital soil models. Existing technologies typically employ fixed threshold segmentation combined with sliding window analysis, which identifies boundaries by setting empirical thresholds for physicochemical indicators and detecting abrupt changes in local statistical characteristics. While these methods have some degree of automation, their performance heavily relies on preset thresholds, making it difficult to adapt to differences in soil properties across different geographical environments. Furthermore, they are sensitive to data noise and missing values, and are prone to misjudgment when the quality of field data is poor.

[0003] Existing technologies are essentially shallow data processing, failing to effectively uncover deep correlations in multimodal data and lacking model-level inference capabilities for incomplete data. Consequently, in real-world scenarios where labeled data is scarce and data quality is variable, existing methods have limited segmentation accuracy and still require extensive manual post-processing. Solving the problems of multimodal data fusion and robustness to missing data falls within the interdisciplinary fields of soil informatics and deep learning.

Summary of the Invention

[0004] In view of the aforementioned existing problems, the present invention is proposed.

[0005] Therefore, this invention provides an adaptive cutting method for soil profile data based on deep learning, to solve the problems that existing technologies have difficulty effectively integrating multimodal data and lack robust inference capabilities for missing data.

[0006] To solve the above-mentioned technical problems, the present invention provides the following technical solution. In a first aspect, the present invention provides an adaptive cutting method for soil profile data based on deep learning, which includes: collecting multi-source digital data of soil profiles and aligning them based on depth to obtain a multimodal aligned dataset; pre-training on the unlabeled data portion of the multimodal aligned dataset using cross-modal contrastive learning to obtain a general feature encoder; combining the general feature encoder with a task module and training the combined task module using a mask sequence strategy to obtain a soil profile cutting model; inputting the original multimodal data of the soil profile to be cut into the soil profile cutting model for inference, and outputting a boundary probability heatmap and an uncertainty quantification map of the soil profile; visualizing the boundary probability heatmap and uncertainty quantification map, receiving manual correction instructions, and generating the final soil layer boundary set; and, based on the final soil layer boundary set, resampling and organizing the original high-density data of the corresponding soil layers to generate a standardized soil profile stratified physicochemical property database.

[0007] As a preferred embodiment of the deep learning-based adaptive cutting method for soil profile data described in this invention, collecting multi-source digitized data of soil profiles and aligning them based on depth to obtain a multimodal aligned dataset includes the following steps: time synchronization is performed on the multi-source digital data of soil profiles to obtain time-synchronized multi-source digital data; spatial registration is performed on the time-synchronized multi-source digital data, in which the sampling points of each data stream are matched based on depth coordinates to obtain spatially registered multi-source digital data; the sampling interval of the spatially registered multi-source digital data is standardized to obtain multi-source digital data with standard intervals; and the multi-source digital data with standard intervals are integrated into a multimodal aligned dataset indexed by depth.

[0008] As a preferred embodiment of the deep learning-based adaptive cutting method for soil profile data described in this invention, pre-training on the unlabeled data portion of the multimodal aligned dataset using cross-modal contrastive learning to obtain a general feature encoder includes the following steps: from the unlabeled data portion of the multimodal aligned dataset, spectral sequences and elemental content sequences from the same profile are extracted as positive sample pairs, and sequences from different profiles are extracted as negative sample pairs, to construct a set of cross-modal contrastive learning sample pairs; a dual-branch neural network is trained on the set of cross-modal contrastive learning sample pairs with a contrastive loss function to obtain an optimized dual-branch neural network; and the parameters of the feature extraction part of the optimized dual-branch neural network are fixed to obtain a general feature encoder.

[0009] As a preferred embodiment of the deep learning-based adaptive cutting method for soil profile data described in this invention, combining the general feature encoder with a task module and training the combined task module using a mask sequence strategy to obtain a soil profile cutting model includes the following steps: the general feature encoder is connected to the task module to form an untrained soil profile cutting model; masked training data are generated by applying the mask sequence strategy to the labeled data portion of the multimodal aligned dataset; the masked training data are input into the untrained soil profile cutting model, and the parameters of the task module in the untrained soil profile cutting model are updated by minimizing the combined loss of boundary prediction loss and feature reconstruction loss; and after the training process is completed, the untrained soil profile cutting model becomes the soil profile cutting model.

[0010] As a preferred embodiment of the deep learning-based adaptive cutting method for soil profile data described in this invention, inputting the original multimodal data of the soil profile to be cut into the soil profile cutting model for inference and outputting the boundary probability heatmap and uncertainty quantification map of the soil profile includes the following steps: the original multimodal data of the soil profile to be cut is input into the soil profile cutting model, and multiple random forward propagation inferences are performed to obtain multiple boundary probability distribution samples; the probability values of the multiple boundary probability distribution samples at each depth point are aggregated, and their arithmetic mean is taken to generate the boundary probability heatmap of the soil profile; and the dispersion of the probability values of the multiple boundary probability distribution samples at each depth point is evaluated by calculating the statistical variance to generate the uncertainty quantification map of the soil profile.

[0011] As a preferred embodiment of the deep learning-based adaptive cutting method for soil profile data described in this invention, visualizing the boundary probability heatmap and uncertainty quantification map includes the following steps: the boundary probability heatmap of the soil profile is converted into a boundary probability heatmap visualization layer, and the uncertainty quantification map of the soil profile is converted into an uncertainty quantification map visualization layer; the two visualization layers are overlaid to generate a combined visualization layer; a depth axis is added to the combined visualization layer to create a visualization framework with coordinates; and the original multimodal data of the soil profile are rendered on the visualization framework with coordinates to generate an interactive visualization.

[0012] As a preferred embodiment of the deep learning-based adaptive cutting method for soil profile data described in this invention, receiving manual correction instructions and generating the final soil layer boundary set includes the following steps: manual correction instructions are captured through the interactive visualization and parsed into depth coordinate adjustment operations and boundary operation types; the depth coordinate adjustment operations and boundary operation types are applied to the initial boundary positions in the boundary probability heatmap of the soil profile to generate updated soil layer boundary positions; and the final soil layer boundary set is generated based on the updated soil layer boundary positions.

[0013] As a preferred embodiment of the deep learning-based adaptive cutting method for soil profile data described in this invention, resampling and organizing the original high-density data of the corresponding soil layers according to the final soil layer boundary set to generate a standardized soil profile stratified physicochemical property database includes the following steps: the depth range of each soil layer is determined based on the final soil layer boundary set, and the corresponding data segment is extracted from the original high-density data based on the depth range of each soil layer; for each soil layer, the data segment is resampled, and the physicochemical index values are calculated using cubic spline interpolation; the resampled physicochemical index values are organized into structured records of profile identifiers, depths, soil layer identifiers, and physicochemical index values; and the structured records are combined to form the standardized soil profile stratified physicochemical property database.

[0014] In a second aspect, the present invention provides a computer device including a memory and a processor, wherein the memory stores a computer program, wherein: when the computer program is executed by the processor, it implements any step of the deep learning-based adaptive cutting method for soil profile data as described in the first aspect of the present invention.

[0015] In a third aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, wherein: when the computer program is executed by a processor, it implements any step of the deep learning-based adaptive cutting method for soil profile data as described in the first aspect of the present invention.

[0016] The beneficial effects of this invention are as follows: by collecting multi-source digital data and constructing a multimodal dataset based on depth alignment, a general feature encoder is pre-trained from unlabeled data using cross-modal contrastive learning to mine deep multimodal correlations. Then, a cutting model is trained using a mask sequence strategy to improve robustness to missing data. Subsequently, boundary probability heatmaps and uncertainty quantification maps are output through model inference. Based on visualization, manual correction instructions are received to generate the final boundary set. The corresponding soil layer data is resampled and organized to generate a standardized database. This invention achieves high-precision, adaptive soil profile cutting under conditions of multimodal data fusion and missing data, solving the problems of insufficient deep feature mining and sensitivity to incomplete data in existing technologies.

Attached Figure Description

[0017] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0018] Figure 1 is a flowchart of a deep learning-based adaptive cutting method for soil profile data.

Detailed Implementation

[0019] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0020] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.

[0021] The term "one embodiment" or "example" as used herein refers to a specific feature, structure, or characteristic that may be included in at least one implementation of the invention. Appearances of "in one embodiment" in different places in this specification do not necessarily refer to the same embodiment, nor do they refer to separate or alternative embodiments that are mutually exclusive of other embodiments.

[0022] Referring to Figure 1, one embodiment of the present invention provides an adaptive cutting method for soil profile data based on deep learning, including the following steps:

S1. Collect multi-source digital data of soil profiles and align them based on depth to obtain a multimodal aligned dataset.

[0023] S1.1. Time synchronization of multi-source digital data of soil profile is performed to obtain time-synchronized multi-source digital data.

[0024] Furthermore, different sensors exhibit inherent startup delays and sampling period differences when acquiring data such as spectral reflectance, elemental content, and conductivity, resulting in misalignment of modal data from the same depth point across time series. Time synchronization ensures a consistent temporal correspondence between different physicochemical indicators from the same profile by identifying and aligning the timestamps of each data stream.

[0025] Specifically, eliminating timing misalignments caused by differences in instrument response speeds and unifying heterogeneous, independently acquired data streams into a common time reference frame is a prerequisite for any meaningful multimodal analysis. Time-synchronized multi-source digitized data provides temporally consistent input for spatial registration.

[0026] S1.2. Spatial registration is performed on the time-synchronized multi-source digital data. Based on the depth coordinates, the sampling points of each data stream are matched to obtain the spatially registered multi-source digital data.

[0027] Furthermore, the core of spatial registration for time-synchronized multi-source digital data lies in matching the sampling points of each data stream based on depth coordinates. Although the data is synchronized in time, the spatial positions of each sensor probe, sampling intervals, and even starting depths may have deviations ranging from micrometers to centimeters. Spatial registration uses the common spatial dimension of depth coordinates and interpolation to map each set of time-synchronized multi-source digital data onto a unified depth coordinate grid, so that at each specific depth value, a complete set of data on spectral reflectance, elemental content, conductivity, and other indicators can be obtained.

[0028] Specifically, by leveraging the inherent vertical variation of soil profile data and using depth as the absolute benchmark for spatial registration, the problem of data misalignment caused by inconsistent spatial deployment of multi-source sensors is solved, achieving precise data fusion in the physical spatial dimension. The spatially registered multi-source digitized data enables direct comparison and correlation analysis of different physicochemical indicators at the same depth location.

[0029] S1.3. Standardize the sampling interval of the spatially registered multi-source digital data to obtain multi-source digital data with standard intervals. Integrate the multi-source digital data with standard intervals into a multimodal aligned dataset indexed by depth.

[0030] Furthermore, spatially registered multi-source digitized data may still have different original sampling intervals. Sampling interval standardization, through resampling techniques, unifies all data streams to the same, preset depth sampling interval, such as a one-millimeter interval, ensuring that the data is uniform and consistent along the depth axis and avoiding analytical biases introduced by different sampling densities. Integrating all these multi-source digitized data with standard intervals according to the depth index forms a multimodal aligned dataset that is strictly aligned by depth.
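
The depth-based registration and interval standardization described in S1.2 and S1.3 can be illustrated with a minimal sketch. It assumes linear interpolation (the text only says "interpolation" and "resampling techniques"), a hypothetical in-memory layout of one (depths, values) pair per data stream, and hypothetical function names `interp_linear` and `align_by_depth`; it is not the patented implementation.

```python
from bisect import bisect_left


def interp_linear(depths, values, query):
    """Linearly interpolate `values` (sampled at sorted `depths`) at `query`."""
    if query <= depths[0]:
        return values[0]
    if query >= depths[-1]:
        return values[-1]
    i = bisect_left(depths, query)  # depths[i - 1] < query <= depths[i]
    d0, d1 = depths[i - 1], depths[i]
    w = (query - d0) / (d1 - d0)
    return values[i - 1] * (1 - w) + values[i] * w


def align_by_depth(streams, step_mm=1.0):
    """Resample every stream onto one shared, evenly spaced depth grid.

    streams: {name: (sorted depths in mm, values)}  (assumed layout)
    The grid covers only the depth range shared by all streams, so no
    stream is extrapolated beyond its own measurements.
    """
    start = max(d[0] for d, _ in streams.values())
    stop = min(d[-1] for d, _ in streams.values())
    n = int((stop - start) / step_mm + 1e-9) + 1
    grid = [start + k * step_mm for k in range(n)]
    aligned = {name: [interp_linear(d, v, g) for g in grid]
               for name, (d, v) in streams.items()}
    return grid, aligned


# Two streams with different starting depths and sample positions.
streams = {
    "reflectance": ([0.0, 2.0, 4.0], [0.10, 0.30, 0.50]),
    "conductivity": ([1.0, 3.0, 5.0], [10.0, 30.0, 50.0]),
}
grid, aligned = align_by_depth(streams, step_mm=1.0)
```

After this step every modality has exactly one value per grid depth, which is the depth-indexed structure the later training steps rely on.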

[0031] Specifically, the multi-source, heterogeneous, and non-uniform raw data is transformed into a standardized data cube with regular structure, depth alignment, and multi-feature parallelism. This provides high-quality, structured input that can be consumed directly by subsequent deep learning models, solving the challenge that multimodal data is difficult for models to effectively utilize due to its inconsistent format and scale. The multimodal aligned dataset is the direct foundation for cross-modal contrastive learning and model training.

S2. Pre-train on the unlabeled data portion of the multimodal aligned dataset using cross-modal contrastive learning to obtain a general feature encoder.

[0032] S2.1 Extract spectral sequences and elemental content sequences from the same profile as positive sample pairs from the unlabeled data portion of the multimodal alignment dataset, and extract sequences from different profiles as negative sample pairs to construct a cross-modal contrastive learning sample pair set.

[0033] Furthermore, for any soil profile, its spectral reflectance sequence and the elemental content sequence of the same profile are extracted to form a positive sample pair. Different physicochemical measurements of the same profile essentially describe the same set of soil layer spatial structures and should have a high degree of consistency in the expression of high-level features. Spectral sequences or elemental content sequences are randomly extracted from other different soil profiles and combined with any sequence of the current profile to form a negative sample pair. This forces the dual-branch neural network to learn to identify which feature patterns are intrinsic attributes of a specific profile structure and which are irrelevant associations generated by random combinations.

[0034] Specifically, for example, a well-developed sedimentary layer exhibits specific absorption characteristics in its spectrum, which may correspond to an enrichment of iron and aluminum oxides in its elemental composition. The dual-branch neural network needs to learn to correlate these cross-modal, soil-layer-related signals while ignoring spurious correlations caused by random pairings or surface noise. Because this approach does not rely on expensive soil layer boundary labels and uses only the correspondences (same profile) and non-correspondences (different profiles) that exist naturally within the data as supervisory signals, it can fully utilize massive amounts of unlabeled field survey data, making it possible for the model to learn the essential structural features of soil profiles. Constructing the set of cross-modal contrastive learning sample pairs is the foundation for initiating unsupervised pre-training.
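
The pair construction in S2.1 can be sketched as follows. The layout (a dictionary from a profile ID to its "spectral" and "element" sequences), the function name `build_pairs`, and the number of negatives per profile are all assumptions made for illustration. For brevity the sketch forms negatives only by pairing the current profile's spectral sequence with another profile's elemental sequence, which is one of the combinations the text allows.

```python
import random


def build_pairs(profiles, negatives_per_profile=1, seed=0):
    """Build cross-modal sample pairs from unlabeled profiles.

    profiles: {profile_id: {"spectral": seq, "element": seq}}  (assumed layout)
    Returns a list of (spectral_seq, element_seq, label), where label is 1
    when both sequences come from the same profile and 0 otherwise.
    """
    rng = random.Random(seed)
    ids = sorted(profiles)
    pairs = []
    for pid in ids:
        # Positive pair: two modalities measured on the same profile.
        pairs.append((profiles[pid]["spectral"], profiles[pid]["element"], 1))
        # Negative pairs: elemental sequence drawn from a different profile.
        others = [q for q in ids if q != pid]
        k = min(negatives_per_profile, len(others))
        for other in rng.sample(others, k):
            pairs.append((profiles[pid]["spectral"], profiles[other]["element"], 0))
    return pairs


profiles = {
    "P1": {"spectral": [0.11, 0.12], "element": [3.0, 3.1]},
    "P2": {"spectral": [0.21, 0.22], "element": [5.0, 5.1]},
    "P3": {"spectral": [0.31, 0.32], "element": [7.0, 7.1]},
}
pairs = build_pairs(profiles)
```

No boundary labels appear anywhere in this step; the only supervision is whether two sequences share a profile.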

[0035] S2.2 Train a dual-branch neural network on the set of cross-modal contrastive learning sample pairs with a contrastive loss function to obtain an optimized dual-branch neural network.

[0036] Furthermore, two structurally identical but independently initialized branches of the neural network process input sequences of different modalities, for example one branch processing spectral sequences and the other processing elemental composition sequences. The core function of the contrastive loss function is to minimize the distance between positive sample pairs in the feature space mapped by the dual-branch neural network while maximizing the distance between negative sample pairs in that space. During this training process, the dual-branch neural network is not directly told where the soil layer boundaries are, but is forced to explore and capture deep common feature factors that can simultaneously explain spectral and elemental composition variations. These factors are often closely related to soil processes such as organic matter accumulation, clay leaching, or iron oxide deposition.

[0037] Specifically, the goal of model optimization shifts from fitting specific labels to learning a metric that can determine whether two sets of data describe the same entity. This guides the network to extract cross-modal, invariant features crucial for soil layer identification, while filtering out sensor-specific noise or variations unrelated to soil structure. After sufficient optimization and training, the bi-branch neural network gains the ability to map the original multimodal data to a shared feature space rich in soil layer semantics. Obtaining the optimized bi-branch neural network marks the completion of cross-modal feature representation learning.
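
The text does not name a specific contrastive loss. As one common choice, the sketch below uses an InfoNCE-style loss with cosine similarity; the loss form, the temperature value, and the function names are assumptions, and the embeddings stand in for the outputs of the two branches.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(spec_emb, elem_emb, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    spec_emb[i] and elem_emb[i] are embeddings of the same profile (the
    positive pair); every other elem_emb[j] serves as a negative for i.
    """
    total = 0.0
    for i, s in enumerate(spec_emb):
        logits = [cosine(s, e) / temperature for e in elem_emb]
        m = max(logits)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]  # negative log-softmax of the positive
    return total / len(spec_emb)


spec = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = info_nce(spec, [[1.0, 0.0], [0.0, 1.0]])
loss_mismatched = info_nce(spec, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each spectral embedding is closest to the elemental embedding of its own profile and large when the closest match belongs to another profile, which is the behavior the training objective rewards.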

[0038] S2.3 Fix the parameters of the feature extraction part in the optimized dual-branch neural network to obtain a general feature encoder.

[0039] Furthermore, fixing the parameters of the feature extraction portion in the optimized dual-branch neural network to obtain a universal feature encoder is a crucial step in solidifying and transferring model knowledge. The optimized dual-branch neural network comprises two functional parts: a front-end encoder responsible for feature extraction and a back-end projection head that may be used for contrastive learning. Obtaining the universal feature encoder involves separately retaining the feature extraction encoder portions from both branches and locking all their parameters, preventing them from participating in gradient updates during subsequent training phases. This ensures that the valuable knowledge learned during unsupervised pre-training—regarding the essential structure of multimodal soil profile data, i.e., how to transform raw data into high-level features rich in soil layer semantics—is fully preserved and transferred to downstream tasks.

[0040] Specifically, the downstream soil profile cutting model no longer needs to learn these basic and universal feature representation capabilities from scratch. It can be trained directly for the boundary cutting task from a high-quality feature starting point, which reduces the downstream task's demand for labeled data and improves the stability and final performance of the soil profile cutting model. This is a key technique for realizing the pre-training and fine-tuning paradigm and addressing the problem of scarce labeled data.

[0041] S3. Combine the general feature encoder with the task module, and use the mask sequence strategy to train the combined task module to obtain the soil profile cutting model.

[0042] S3.1 Connect the general feature encoder to the task module to form an untrained soil profile cutting model.

[0043] Furthermore, the output of a pre-trained, parameter-fixed general feature encoder is connected in series with a task module specifically responsible for sequence-to-sequence mapping or boundary prediction. The general feature encoder is responsible for converting the input multimodal aligned data into high-level feature representations rich in soil layer semantics, while the task module performs probabilistic prediction of soil layer boundaries based on these feature representations. This constructs a feature knowledge transfer architecture. The general feature encoder, as a pre-feature extractor, has its parameters learned from the essential structural features of the soil profile during the pre-training stage and frozen, ensuring that valuable unsupervised learning knowledge is not destroyed. The task module, as a trainable, lightweight adapter, focuses on learning how to map the general feature representations to specific boundary prediction tasks.

[0044] Specifically, transfer learning is implemented, enabling the model to be trained efficiently on small-scale labeled data while possessing powerful feature representation capabilities, thus avoiding the massive amounts of labeled data required to train deep networks from scratch. The untrained soil profile cutting model formed by this connection is a trainable architecture that has a strong foundation in feature extraction but has not yet learned specific cutting rules.

[0045] S3.2. Apply a masking sequence strategy to the labeled data portion of the multimodal aligned dataset to generate masked training data.

[0046] Furthermore, consecutive data segments are randomly selected from labeled data samples, and the feature values of all modalities within these segments are either set to zero or replaced with specific mask identifiers. The masking sequence strategy aims to proactively disrupt complete, known data, artificially creating challenging scenarios of missing or corrupted data. For example, a segment containing spectral and elemental data of a suspected soil transition zone might be masked, forcing the soil profile cutting model to rely not only on simple local abrupt changes in the data of that region, but also on the unmasked contextual information (the soil features above and below that region) to infer the possible features of the masked portion and the overall soil structure.

[0047] Specifically, model training is transformed from a passive pattern matching process into an active, context-based inference-driven feature learning and reconstruction process. The generated masked training data can effectively simulate common issues encountered during field data collection, such as signal loss, sensor noise interference, or local data anomalies. This guides the soil profile cutting model to develop stronger robustness and contextual understanding capabilities, rather than simply memorizing surface patterns in the data. Generating masked training data is crucial for providing subsequent training with input that simulates real, incomplete scenarios.
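
A minimal sketch of the masking step in S3.2 is shown below. It masks a single contiguous depth span per sample and uses zero as the mask value; the span length, the single-span choice, and the name `mask_span` are assumptions, since the text only specifies randomly selected consecutive segments that are zeroed or replaced by a mask identifier.

```python
import random


def mask_span(sample, span_len, mask_value=0.0, rng=None):
    """Mask one contiguous depth span across all modalities.

    sample: {modality: list of values}, all lists of equal length.
    Returns (masked_sample, mask); mask[i] is True at masked depth points.
    The input sample is left unmodified so it can serve as the
    reconstruction target during training.
    """
    rng = rng or random.Random()
    n = len(next(iter(sample.values())))
    start = rng.randrange(0, n - span_len + 1)
    mask = [start <= i < start + span_len for i in range(n)]
    masked = {name: [mask_value if m else v for v, m in zip(values, mask)]
              for name, values in sample.items()}
    return masked, mask


sample = {"spectral": [1.0] * 10, "element": [2.0] * 10}
masked, mask = mask_span(sample, span_len=3, rng=random.Random(1))
```

Returning the boolean mask alongside the masked data lets a reconstruction loss be restricted to the hidden depth points.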

[0048] S3.3 Input the masked training data into the untrained soil profile cutting model, and update the parameters of the task module in the untrained soil profile cutting model by minimizing the combined loss of boundary prediction loss and feature reconstruction loss.

[0049] Furthermore, the masked training data is input into the untrained soil profile cutting model, and the task module parameters are updated by minimizing the combined loss of boundary prediction loss and feature reconstruction loss, forming the core mechanism of the training process. Boundary prediction loss directly measures the difference between the model's output boundary probability map and the actual soil layer boundary labels, driving the task module to learn accurate boundary localization. Feature reconstruction loss requires the model to attempt to predict the original feature values of the masked sequence fragments, thus encouraging the soil profile cutting model to recover the damaged data based on context. In optimizing the combined loss, the soil profile cutting model is not only trained to complete the explicit boundary cutting task but is also implicitly required to deeply understand the intrinsic generation logic and interrelationships of soil properties in the vertical sequence.

[0050] Specifically, for example, to accurately reconstruct masked calcium content data, it is necessary to understand its enrichment patterns in the sedimentary layer and its relationship with the layers above and below it. This understanding, in turn, enhances the accuracy and scientific rigor of soil layer boundary judgments, especially when data is incomplete. Through the backpropagation algorithm, the gradient of the combined loss only updates the parameters of the task modules in the untrained soil profile cutting model, while the parameters of the general feature encoder remain unchanged, ensuring the stability of pre-trained knowledge. This enables the soil profile cutting model to learn how to perform robust inference using deep features and contextual relationships even with incomplete data.
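
The combined loss in S3.3 can be sketched as a weighted sum of the two terms. The text does not fix the form of either term or how they are combined, so binary cross-entropy for boundary prediction, mean squared error over masked points for reconstruction, and the weight `recon_weight` are all assumptions for illustration.

```python
import math


def boundary_bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between boundary probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(pred)


def masked_mse(recon, original, mask):
    """Mean squared reconstruction error over masked depth points only."""
    errs = [(r - o) ** 2 for r, o, m in zip(recon, original, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0


def combined_loss(pred, target, recon, original, mask, recon_weight=1.0):
    """Boundary prediction loss plus weighted feature reconstruction loss."""
    return boundary_bce(pred, target) + recon_weight * masked_mse(recon, original, mask)


loss = combined_loss(
    pred=[0.5, 0.5, 0.5], target=[0, 1, 0],
    recon=[1.0, 5.0, 5.0], original=[9.0, 5.0, 7.0],
    mask=[False, True, True],
)
```

In a real training loop the gradient of this scalar would be applied only to the task module, with the encoder parameters held fixed as the paragraph above describes.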

[0051] S3.4 After the training process is completed, the untrained soil profile cutting model becomes the soil profile cutting model. At this point, the soil profile cutting model integrates the cross-modal feature understanding ability of the general feature encoder, which has been verified on massive unlabeled data, with the decision-making ability of the task module, which has been optimized by the mask sequence strategy and the combined loss function, and it has strong robustness for boundary prediction.

[0052] Specifically, the end of the training process signifies the convergence of model parameter optimization, and the value of the combined loss function stabilizes at a low level, indicating that the model has been able to effectively coordinate the two objectives of boundary prediction accuracy and feature context understanding on the training set. Thus, the soil profile cutting model has transformed from a network architecture awaiting training into a fully functional, deployable inference tool capable of accepting raw multimodal data input and outputting intelligent predictions of soil layer boundaries. Furthermore, its internal mechanisms make it inherently resistant to missing data and noise. The soil profile cutting model is the final carrier for performing the core cutting function in the method of this invention.

[0053] S4. Input the original multimodal data of the soil profile to be cut into the soil profile cutting model for inference, and output the boundary probability heat map and uncertainty quantification map of the soil profile.

[0054] S4.1 Input the original multimodal data of the soil profile to be cut into the soil profile cutting model, perform multiple random forward propagation inferences, and obtain multiple boundary probability distribution samples.

[0055] Furthermore, during the inference process, the random deactivation (dropout) units within the soil profile cutting model are kept active rather than being switched off as in conventional inference. Forward propagation is repeatedly performed on the same input data. Because units are deactivated at random, each forward propagation is equivalent to passing the data through a slightly different sub-network, so the boundary probability output may differ each time. This uses the randomness of the deep learning model as a tool to probe its prediction confidence. For soil layer boundaries with well-defined features that the deep learning model can clearly identify, the results of multiple inferences will be highly consistent; for depth regions that lie in transition zones, have ambiguous data, or are poorly understood by the model, the results of multiple inferences will diverge significantly.

[0056] Specifically, without altering the deep learning model structure or training method, the distribution predicted by the deep learning model can be obtained at low cost during the inference stage, providing a direct data foundation for subsequent uncertainty quantification. Obtaining multiple boundary probability distribution samples is a prerequisite for implementing uncertainty assessment.
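The repeated stochastic inference of S4.1 can be illustrated with a toy example. The single-layer logistic "model", the dropout rate, the number of passes, and all function names below are hypothetical stand-ins; the actual soil profile cutting model is a trained network whose architecture is not reproduced here.

```python
import math
import random

def stochastic_forward(features, weights, bias, drop_rate, rng):
    """One forward pass with dropout left active; returns one boundary probability per depth point."""
    keep = 1.0 - drop_rate
    probs = []
    for row in features:                 # one feature vector per depth point
        z = bias
        for x, w in zip(row, weights):
            if rng.random() < keep:      # this unit survives the current pass
                z += (x / keep) * w      # inverted-dropout scaling
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return probs

def sample_boundary_probabilities(features, weights, bias, n_passes=30, drop_rate=0.2, seed=0):
    """Run n_passes stochastic forward passes on the same input and collect every output."""
    rng = random.Random(seed)
    return [stochastic_forward(features, weights, bias, drop_rate, rng) for _ in range(n_passes)]
```

With `drop_rate=0.0` every pass returns the same values, which is the deterministic behavior of conventional inference; a non-zero rate produces the spread that the later steps turn into an uncertainty estimate.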

[0057] S4.2 Aggregate the probability values of multiple boundary probability distribution samples at each depth point, and generate a boundary probability heatmap of the soil profile by taking the arithmetic mean.

[0058] Furthermore, for each depth location, the probability values obtained from all inferences are summed and divided by the total number of inferences. This ensemble averaging improves the robustness of the prediction. A single inference may be affected by the randomly selected activation path within the model; averaging the results of multiple inferences smooths out the noise caused by this randomness and yields a probability estimate closer to the model's consensus or expectation.

[0059] Specifically, for example, at a certain depth point, even if a few inferences yield abnormally high or low boundary probabilities due to specific combinations of neuron inactivation, the arithmetic mean can mitigate the impact of these outliers on the final judgment. The generated probability heatmap is more stable and reliable than the output of any single inference, and can more clearly reveal the true peak location of the soil boundary probability, providing a less noisy and more credible predictive view for decision-making. The boundary probability heatmap of the generated soil profile provides the final probability estimate of the soil boundary location.

[0060] The expression for the average value is:

$$\bar{P}(z) = \frac{1}{T} \sum_{t=1}^{T} P_t(z)$$

where $\bar{P}(z)$ is the aggregated boundary probability average at depth $z$, $T$ is the total number of random forward propagation inferences, $P_t(z)$ is the boundary probability value output by the $t$-th inference at the target depth point $z$, $t$ is the index of the random inference, and $z$ is the target depth point.

S4.3 Evaluate the dispersion of the probability values of multiple boundary probability distribution samples at each depth point, and generate an uncertainty quantification map of the soil profile by calculating the statistical variance.

[0061] Furthermore, the dispersion of the probability values from multiple boundary probability distribution samples at each depth point is assessed: at each depth point, the average of the squared deviations of the probability sample values from their arithmetic mean is calculated, and the resulting statistical variance forms the uncertainty quantification map of the soil profile. The variance expression defines this calculation. The statistical variance is thereby given a new interpretation, as a quantitative measure of the epistemic (cognitive) uncertainty of the soil profile cutting model regarding its own predictions. The consistency of the results from multiple random inferences serves as an indicator of model confidence: the smaller the variance, the more similar the predictions from all inference paths; the larger the variance, the more the predictions from different inference paths differ, indicating that the model is uncertain.

[0062] Specifically, for example, in a region where soil properties change gradually, the model may be unable to determine precise boundaries, so the predictions from multiple inferences are scattered over a wide probability range, resulting in high variance. Rather than leaving the model output as a black box, the uncertainty quantification map provides a point-by-point confidence map that shows which parts of the prediction are reliable and which are questionable. Generating an uncertainty quantification map of the soil profile thus provides a transparent measure of the reliability of the model's cutting results.

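[0063] The variance expression is:

$$\sigma^2(z) = \frac{1}{T} \sum_{t=1}^{T} \left( P_t(z) - \bar{P}(z) \right)^2$$

where $\sigma^2(z)$ is the uncertainty variance at the target depth point $z$, $T$ is the total number of random forward propagation inferences, $P_t(z)$ is the boundary probability value output by the $t$-th inference at depth $z$, and $\bar{P}(z)$ is the arithmetic mean of those values at depth $z$.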
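As a minimal sketch of the aggregation in S4.2 and S4.3, the following plain-Python function computes the per-depth arithmetic mean and the per-depth variance (average squared deviation from that mean) from a list of inference samples. The function name and the nested-list layout `samples[t][j]` are assumptions chosen for illustration.

```python
def aggregate_samples(samples):
    """samples[t][j] is the boundary probability from inference pass t at depth point j.

    Returns (mean, variance), each a list with one value per depth point.
    """
    n_passes = len(samples)
    n_depths = len(samples[0])
    mean = [sum(s[j] for s in samples) / n_passes for j in range(n_depths)]
    variance = [
        sum((s[j] - mean[j]) ** 2 for s in samples) / n_passes  # divide by T, as in S4.3
        for j in range(n_depths)
    ]
    return mean, variance
```

The mean list plays the role of the boundary probability heatmap and the variance list plays the role of the uncertainty quantification map, both indexed by depth point.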

[0064] S5. Visualize the results based on boundary probability heatmaps and uncertainty quantification maps.

[0065] S5.1 Convert the boundary probability heatmap of the soil profile into a boundary probability heatmap visualization layer, and convert the uncertainty quantification map of the soil profile into an uncertainty quantification map visualization layer.

[0066] Furthermore, the probability value corresponding to each depth point in the boundary probability heatmap is mapped to a continuously changing color gradient or brightness value, thus forming an image layer distributed along the depth axis, using color or brightness to represent the probability of boundary existence. This transforms the abstract, numerical probability sequence into an intuitive, visually discernible pattern; for example, warm colors or bright areas represent high-probability boundary locations, while cool colors or dark areas represent low-probability locations. This allows for the rapid identification of potential soil layer boundaries identified by the model, much like observing a real cross-section, improving the intuitiveness and efficiency of information interpretation. The boundary probability heatmap visualization layer becomes one of the fundamental elements for subsequent overlay displays.

[0067] S5.2 Overlay the boundary probability heatmap visualization layer with the uncertainty quantification visualization layer to generate a combined visualization layer.

[0068] Furthermore, based on the variance or entropy value of each depth point in the uncertainty quantification map, another independent visual encoding dimension is assigned, such as using different color saturation, pattern density, or layer transparency. Areas with high uncertainty are given higher saturation, denser patterns, or higher translucency to visually highlight them. This distinguishes between what the soil profile cutting model predicts and how confident the model is in its predictions.

[0069] For example, even if a boundary probability value is high, if uncertainty is also high, that area will be displayed in the visualization with both a bright boundary indicator and a semi-transparent or flashing effect, warning that while the prediction is clear, its reliability is questionable. This avoids information clutter and allows experts to easily distinguish which boundaries are clear and certain, and which are ambiguous transition zones that are difficult for the soil profile model to determine, providing direct visual guidance for targeted manual review. The uncertainty quantification visualization layer is a key component of a complete decision view.
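One way to picture the two independent visual encodings is to turn each depth point into an RGBA color. The blue-to-red ramp for probability, the use of transparency for uncertainty, and the 0.7 scaling factor are all illustrative assumptions; the source leaves the concrete encoding open (saturation, pattern density, or transparency).

```python
def encode_depth_point(prob, variance, max_variance):
    """Return an (r, g, b, a) tuple with every channel in the range 0..1."""
    p = min(max(prob, 0.0), 1.0)
    u = 0.0 if max_variance <= 0 else min(max(variance / max_variance, 0.0), 1.0)
    r, g, b = p, 0.0, 1.0 - p      # cool (blue) for low probability, warm (red) for high
    alpha = 1.0 - 0.7 * u          # more uncertain -> more translucent
    return (r, g, b, alpha)

def build_combined_layer(mean_probs, variances):
    """Combine the probability layer and the uncertainty layer into one RGBA column."""
    vmax = max(variances) if variances else 0.0
    return [encode_depth_point(p, v, vmax) for p, v in zip(mean_probs, variances)]
```

A point with high probability and high variance comes out red but semi-transparent, which matches the "clear prediction, questionable reliability" case described above.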

[0070] S5.3 Add a depth axis to the combined visualization layers to form a visualization framework with coordinates.

[0071] Furthermore, a scale bar and tick marks precisely aligned with the depth axis are added to the side, or another designated location, of the combined visualization layer. The depth axis gives the visualization a precise spatial reference and establishes a strict correspondence between the visual pattern and the actual physical depth, which is an indispensable dimension for soil profile analysis. The visualization framework with coordinates provides a unified and accurate spatial benchmark for subsequently rendering any depth-related data onto the same view, guaranteeing the alignment and comparability of all visualization elements along the depth dimension.

[0072] S5.4 Render the raw multimodal data of the soil profile on the visualization framework with coordinates to generate an interactive visualization.

[0073] Furthermore, raw multimodal data such as spectral reflectance and elemental content are plotted as line graphs, curve clusters, or filled plots on a base map that already includes the combined visualization layers and the depth coordinate axis. During rendering, the depth position of each curve is kept strictly aligned with the coordinate axis scale. Interactivity is achieved through a graphical user interface that allows users to control layer display (for example, toggling the display of specific modal data), scale the depth range, and directly manipulate the predicted boundaries in the boundary probability heatmap by clicking or dragging. This constructs a collaborative workbench in which the model's interpretation is the focus, the raw data serve as the verification background, and the interface acts as the decision-making terminal. It juxtaposes machine-generated insights with the raw evidence and leaves the final decision to experts through intuitive interaction.

[0074] For example, when encountering a highly uncertain boundary highlighted in the model, one can immediately compare it with the actual shape of the original spectrum and element curves at that depth, and directly drag the virtual boundary line for adjustment. This integrates rendering and interaction into an efficient, evidence-driven, closed-loop human-machine collaborative segmentation workflow, organically combining the rapid screening capabilities of artificial intelligence with the domain knowledge and final decision-making authority of human experts. This improves overall efficiency while ensuring the scientific accuracy of the results. The generation of interactive visualizations completes the interface construction from automated reasoning to human-machine collaborative decision-making.

[0075] S6. Receive manual correction instructions and generate the final soil layer boundary set.

[0076] S6.1 Capture manual correction instructions from interactive visual displays, and parse the manual correction instructions into depth coordinate adjustment operations and boundary operation types.

[0077] Furthermore, by monitoring user interactions with the initial boundary position markers on the visual interface—such as dragging virtual boundary lines, clicking the delete icon, or clicking to add new boundary points—these raw interactive events are transformed into machine-processable semantic instructions. Depth coordinate adjustment operations are specifically recorded as the user dragging a boundary point from its original depth to a target depth value. Boundary operation types are defined as categories such as moving existing boundaries, deleting existing boundaries, or inserting new boundaries. This transforms qualitative decisions based on professional knowledge and visual judgment into quantitative operation instructions that the computer can precisely execute.

[0078] Specifically, for example, when an expert judges that a boundary should be moved downwards by a certain distance, the expert only needs to drag the corresponding line, without entering numbers directly. This interaction is parsed into a move operation type and a specific depth coordinate change. This lowers the barrier to human-computer interaction and injects domain knowledge into the digital process in a natural way, while ensuring that all corrective intentions are accurately captured and recorded. The parsed depth coordinate adjustment operation and boundary operation type provide a clear basis for executing specific modifications.
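The parsing in S6.1 can be sketched as a small translation from raw interface events to semantic operations. The event dictionary keys (`kind`, `from_depth`, `to_depth`, `depth`), the event names, and the `BoundaryOp` type are hypothetical; a real graphical toolkit would deliver its own event objects.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class BoundaryOp:
    op_type: str                    # "move", "delete", or "insert"
    source_depth: Optional[float]   # depth of the boundary acted on (None for insert)
    target_depth: Optional[float]   # new depth (None for delete)

def parse_event(event: dict) -> BoundaryOp:
    """Translate one raw interaction event into a boundary operation."""
    kind = event["kind"]
    if kind == "drag":
        return BoundaryOp("move", event["from_depth"], event["to_depth"])
    if kind == "click_delete":
        return BoundaryOp("delete", event["depth"], None)
    if kind == "click_add":
        return BoundaryOp("insert", None, event["depth"])
    raise ValueError(f"unsupported event kind: {kind!r}")
```

Keeping the operation type and the depth values in one immutable record makes every correction easy to log and replay.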

[0079] S6.2 Apply the depth coordinate adjustment operation and boundary operation type to the initial boundary position in the boundary probability heatmap of the soil profile to generate updated soil layer boundary positions.

[0080] Furthermore, based on the type of boundary operation, the initial boundary probability peak points or regions in the boundary probability heatmap are mathematically adjusted. For move operations, the probability distribution at the initial boundary position is shifted along the depth axis according to the depth coordinate adjustment; for delete operations, the probability value at that boundary position is set to zero; for insert operations, a new probability peak is added at the specified depth coordinate. This approach feeds the expert's corrections back into the model's output probability representation rather than simply overwriting a final value. It respects the nature of the probability heatmap as a soft output of the model by adjusting the probability distribution instead of rigidly setting boundaries. For example, shifting a boundary peak along the depth axis redistributes the probability mass at the original peak to the new depth location and leaves a smooth transition at the original position, which is more consistent with the logic of a probabilistic model than a plain delete followed by an insert. The updated soil layer boundary positions therefore exist in the form of an adjusted probability distribution. This incorporates expert judgment while keeping the data format consistent and usable for subsequent processing, and the updated soil layer boundary positions directly reflect the application of expert knowledge to the model's initial results.
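The three adjustments can be sketched on a heatmap stored as a list of probabilities over a regular depth grid. The window half-width, the Gaussian shape of an inserted peak, and the operation tuple layout `(kind, source_depth, target_depth)` are assumptions; this simplified version also clears the original window to zero instead of smoothing it.

```python
import math

def _index_of(depths, depth):
    """Index of the grid point closest to the requested depth."""
    return min(range(len(depths)), key=lambda i: abs(depths[i] - depth))

def apply_operation(probs, depths, op, half_width=2, peak_sigma=1.0):
    """Return a new probability list after one ("move" | "delete" | "insert", src, dst) operation."""
    out = list(probs)
    kind, src, dst = op
    if kind in ("move", "delete"):
        i = _index_of(depths, src)
        lo, hi = max(0, i - half_width), min(len(out), i + half_width + 1)
        window = out[lo:hi]
        for k in range(lo, hi):
            out[k] = 0.0                         # clear the original peak
        if kind == "move":
            shift = _index_of(depths, dst) - i
            for offset, value in enumerate(window):
                k = lo + offset + shift
                if 0 <= k < len(out):
                    out[k] = max(out[k], value)  # re-deposit the peak at the new depth
    elif kind == "insert":
        j = _index_of(depths, dst)
        for k in range(len(out)):
            bump = math.exp(-0.5 * ((k - j) / peak_sigma) ** 2)
            out[k] = max(out[k], bump)           # new peak with value 1.0 at index j
    else:
        raise ValueError(f"unknown operation: {kind!r}")
    return out
```

Because the function returns a new list, the model's original heatmap stays available for comparison with the corrected one.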

[0081] S6.3 Based on the updated soil layer boundary positions, generate the final soil layer boundary set.

[0082] Furthermore, the positions of all soil layer boundaries obtained after applying the depth coordinate adjustment operations and boundary operation types are sorted and confirmed. This includes checking that the updated boundaries are in a reasonable depth order, removing boundary points that have become too close together or contradictory as a result of the operations, and arranging all confirmed boundary points in ascending order of depth to form an ordered, definite list of depth values. This achieves convergence from an interactive, probabilistic intermediate state to a definite, structured final result.

[0083] Specifically, after a series of drag-and-drop operations, the final soil layer boundary set clearly lists the top and bottom depths of each soil layer. Any subsequent processing, such as data resampling, is based on this precise set, ensuring that the final output of the human-machine collaborative work is a high-quality digital stratification scheme that can be directly used for scientific analysis or engineering applications. This combines the rapid initial screening capability of the soil profile cutting model with the precise decision-making of human experts, thus achieving an optimal balance between efficiency and accuracy. The generation of the final soil layer boundary set marks the completion of the human-machine collaborative decision-making process in the adaptive soil profile cutting process.
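A minimal sketch of this consolidation step follows. The `min_gap` threshold used to decide that two boundaries are "too close", and the rule of keeping the shallower of two close boundaries, are assumptions; the source does not specify them.

```python
def finalize_boundaries(boundaries, profile_top, profile_bottom, min_gap=1.0):
    """Return (kept_boundaries, layers), where layers is a list of (top, bottom) depth pairs."""
    # Keep only boundaries strictly inside the profile, in ascending depth order.
    inside = sorted(b for b in boundaries if profile_top < b < profile_bottom)
    kept = []
    for b in inside:
        if not kept or b - kept[-1] >= min_gap:   # drop points too close to the previous one
            kept.append(b)
    edges = [profile_top] + kept + [profile_bottom]
    layers = list(zip(edges[:-1], edges[1:]))
    return kept, layers
```

For example, `finalize_boundaries([55.0, 20.0, 20.4, 120.0], 0.0, 100.0)` keeps the boundaries 20.0 and 55.0 and yields three layers covering the whole profile.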

[0084] S7. Based on the final soil layer boundary set, resample and organize the original high-density data of the corresponding soil layers to generate a standardized soil profile stratification physicochemical property database.

[0085] S7.1 Determine the depth range of each soil layer based on the final soil layer boundary set, and extract the corresponding data segment from the original high-density data based on the depth range of each soil layer.

[0086] Furthermore, using the top and bottom depths of each soil layer recorded in the final soil layer boundary set as the index range, all data points located between these two depth values are extracted from the original high-density data, forming a continuous data subset for each individual soil layer. This segments the data along precise physical boundaries: the boundaries, which carry clear soil science meaning and have been confirmed through human-machine collaboration, are turned into concrete data processing instructions, ensuring that all subsequent analyses are strictly limited to the correct soil genetic unit.

[0087] Specifically, for example, for a soil layer defined as an accretionary layer, its depth range is used to specifically extract all spectral and elemental data within that layer, avoiding the incorrect inclusion of attributes from upper or lower layers in the analysis. This ensures the consistency of data, knowledge, and the processed objects in physical space, laying a correct foundation for generating standardized data with true soil science interpretability. The corresponding data segments extracted from the original high-density data are the direct input for standardized resampling within the soil layer.
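The extraction in S7.1 amounts to slicing depth-sorted data by each layer's depth range. In this sketch the half-open interval convention (a point lying exactly on a boundary is assigned to the layer below it) is an assumption, as are the function and variable names.

```python
from bisect import bisect_left, bisect_right

def extract_layer_segments(depths, values, layers):
    """depths must be sorted ascending; returns {layer_index: [(depth, value), ...]}."""
    segments = {}
    last = len(layers) - 1
    for idx, (top, bottom) in enumerate(layers):
        start = bisect_left(depths, top)
        # Half-open [top, bottom) so a boundary point belongs to one layer only;
        # the deepest layer also keeps the point at its bottom depth.
        end = bisect_right(depths, bottom) if idx == last else bisect_left(depths, bottom)
        segments[idx] = list(zip(depths[start:end], values[start:end]))
    return segments
```

Using binary search keeps the extraction fast even for densely sampled profiles, since no per-point comparison loop is needed.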

[0088] S7.2 Resample the data segments corresponding to each soil layer and use cubic spline interpolation to calculate the physicochemical index values.

[0089] Furthermore, within the depth range of each soil layer, a series of target depth points is set at standard intervals. Then, using the non-uniformly spaced raw data points within that soil layer's data segment, a cubic spline interpolation curve passing through all the raw data points is constructed, and the physicochemical index values at each target depth point are calculated from this curve. The expression for the physicochemical index value describes the calculation within each spline segment. Cubic spline interpolation is chosen because it suits the natural characteristics of vertical variation in soil properties: the interpolation curve must not only pass through all raw data points but also have continuous first and second derivatives at the connection points, which makes the generated curve smooth.

[0090] Specifically, most soil physicochemical properties, such as clay content and organic carbon concentration, typically change gradually rather than abruptly in the vertical direction. The smoothness enforced by cubic spline interpolation matches this behavior and more realistically reflects the continuous variation of soil properties within a soil layer. It avoids the jagged artifacts that linear interpolation might produce and the excessive oscillations that high-order polynomial interpolation might cause. A continuous, smooth, and physically plausible property variation model is reconstructed from the original discrete data, and comparable physicochemical index values at standard depth points are generated from this model. The physicochemical index values calculated using cubic spline interpolation are the core values for constructing standardized records.
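The resampling of S7.2 can be sketched with a cubic spline written in plain Python. The source does not state the boundary condition, so a natural spline (zero second derivative at both ends) is assumed here; the function names and the `step` parameter are likewise illustrative.

```python
from bisect import bisect_right

def natural_cubic_spline(x, y):
    """Per-segment coefficients (a, b, c, d) of a natural cubic spline through the points (x, y)."""
    n = len(x) - 1
    if n < 1 or any(x[i + 1] <= x[i] for i in range(n)):
        raise ValueError("need at least two points with strictly increasing depths")
    a = list(y)
    h = [x[i + 1] - x[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * ((a[i + 1] - a[i]) / h[i] - (a[i] - a[i - 1]) / h[i - 1])
    # Forward sweep of the tridiagonal system for the quadratic coefficients.
    diag, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (x[i + 1] - x[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):           # back substitution
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (a[j + 1] - a[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])
    return a[:n], b, c[:n], d

def evaluate(x, coeffs, t):
    """Evaluate the spline at depth t using the segment that contains t."""
    a, b, c, d = coeffs
    i = min(max(bisect_right(x, t) - 1, 0), len(a) - 1)
    dz = t - x[i]
    return a[i] + b[i] * dz + c[i] * dz ** 2 + d[i] * dz ** 3

def resample_layer(x, y, top, bottom, step=1.0):
    """Return (target_depths, interpolated_values) at a standard interval within one layer."""
    coeffs = natural_cubic_spline(x, y)
    n_points = int((bottom - top) / step + 1e-9) + 1
    targets = [top + k * step for k in range(n_points)]
    return targets, [evaluate(x, coeffs, t) for t in targets]
```

In practice a tested library routine would normally replace the hand-written solver; the sketch only makes the per-segment polynomial evaluation explicit.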

[0091] The expression for the physicochemical index value is:

$$S(z) = a_i + b_i (z - z_i) + c_i (z - z_i)^2 + d_i (z - z_i)^3$$

where $S(z)$ is the interpolation result of the physicochemical indicator at the target depth point $z$, $z$ is the target depth point, $i$ is the spline segment index, $z_i$ is the depth value of the original sampling point of the $i$-th spline segment, $a_i$ is the constant term coefficient of the $i$-th spline segment, $b_i$ is the linear term coefficient of the $i$-th spline segment, $c_i$ is the quadratic term coefficient of the $i$-th spline segment, and $d_i$ is the cubic term coefficient of the $i$-th spline segment.

S7.3 Organize the resampled physicochemical index values into structured records of profile identifiers, depths, soil layer identifiers, and physicochemical index values.

[0092] Furthermore, the resampled physicochemical index values are organized into structured records of profile identifiers, depths, soil layer identifiers, and physicochemical index values, and formatted and encapsulated according to a predefined, fixed data schema. Each structured record is clearly associated with four key dimensions: the profile identifier indicates the specific spatial location of the data; the standard depth provides precise vertical coordinates; the soil layer identifier, based on the final soil layer boundary set, indicates the soil genetic unit to which the data point belongs; and the physicochemical index values are the specific attribute values obtained through interpolation at that location.

[0093] Specifically, a flat, self-describing data structure with clear relationships is constructed. The complex three-dimensional entity of a soil profile (a two-dimensional profile plus multiple attribute dimensions) is decomposed into atomic records. Each row carries all the basic information about its location, layer, and value, which improves data readability, exchangeability, and computability. Whether used for statistical analysis, spatial modeling, or machine learning, the records can be read and processed efficiently and unambiguously, making the format well suited to centralized data management and cross-project sharing. Structured records are the basic units for building the database.
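A flat record of this kind might look like the following sketch. The field names, the centimeter unit, and the extra `indicator` column (used so that several physicochemical indicators can share one table) are assumptions rather than the schema defined by the invention.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields

@dataclass
class ProfileRecord:
    profile_id: str    # which profile the value belongs to
    depth_cm: float    # standard depth of the resampled point
    layer_id: str      # soil layer from the final soil layer boundary set
    indicator: str     # name of the physicochemical indicator (hypothetical column)
    value: float       # interpolated value at that depth

def build_records(profile_id, layer_id, indicator, depths, values):
    """Create one atomic record per resampled depth point."""
    return [ProfileRecord(profile_id, d, layer_id, indicator, v) for d, v in zip(depths, values)]

def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(ProfileRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()
```

Each row is self-contained, so rows from different profiles and layers can simply be concatenated into one table.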

[0094] S7.4 Combine structured records to form a standardized database of soil profile stratification and physicochemical properties.

[0095] Furthermore, the structured records generated from all profiles, soil layers, and standard depth points are aggregated and organized into a complete dataset according to unified standards, for example by storing them in tables or formatted files. Through a complete technical chain of intelligent segmentation, expert interaction, interpolation, and structured organization, this process transforms raw, unstructured multi-source sensor data into a high-quality database that is standardized, semantically labeled by soil layer, and continuous in its attribute values. It integrates the initial judgment of artificial intelligence, the domain knowledge of human experts, and a mathematical treatment that respects the natural variation of soil properties, and it represents the final stage of information refinement in the method.

[0096] Specifically, for example, this database allows users to directly query, at 1-cm intervals, the data points at which the organic carbon content in the sedimentary layers of all profiles exceeds a specific value, providing a high-quality and consistent data foundation for standardized soil quality assessment, digital soil mapping, and process modeling. The formation of a standardized database of soil profile stratification and physicochemical properties marks a complete transformation from raw data to standardized information products that can directly serve scientific research and decision support.
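The kind of threshold query described above can be sketched with an in-memory SQLite table. The table name `soil_records`, its columns, and the sample layer and indicator labels are hypothetical and serve only to show the query pattern.

```python
import sqlite3

def build_database(rows):
    """Create an in-memory table and load (profile_id, depth_cm, layer_id, indicator, value) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE soil_records ("
        "profile_id TEXT, depth_cm REAL, layer_id TEXT, indicator TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO soil_records VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn

def query_above_threshold(conn, layer_id, indicator, threshold):
    """Return (profile_id, depth_cm, value) rows in one layer type whose value exceeds the threshold."""
    cursor = conn.execute(
        "SELECT profile_id, depth_cm, value FROM soil_records "
        "WHERE layer_id = ? AND indicator = ? AND value > ? "
        "ORDER BY profile_id, depth_cm",
        (layer_id, indicator, threshold),
    )
    return cursor.fetchall()
```

Because every record carries its own profile, layer, and depth, such a query needs no joins or per-profile preprocessing.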

[0097] This embodiment also provides a computer device applicable to the adaptive cutting method of soil profile data based on deep learning, including: a memory and a processor; the memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions to implement the adaptive cutting method of soil profile data based on deep learning as proposed in the above embodiment.

[0098] The computer device can be a terminal, comprising a processor, memory, communication interface, display screen, and input devices connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, carrier networks, NFC (Near Field Communication), or other technologies. The display screen can be an LCD screen or an e-ink screen. The input devices can be a touch layer covering the display screen, buttons, a trackball, or a touchpad on the computer device's casing, or an external keyboard, touchpad, or mouse.

[0099] This embodiment also provides a storage medium storing a computer program that, when executed by a processor, implements the deep learning-based adaptive cutting method for soil profile data as proposed in the above embodiments. The storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0100] In summary, this invention collects multi-source digital data and constructs a multimodal dataset based on depth alignment. It utilizes cross-modal contrastive learning to pre-train a general feature encoder from unlabeled data to mine deep multimodal correlations. Then, it combines a mask sequence strategy to train a cutting model to improve robustness to missing data. Subsequently, it outputs boundary probability heatmaps and uncertainty quantification maps through model inference. Based on visualization, it receives manual correction instructions to generate the final boundary set. It resamples and organizes the corresponding soil layer data to generate a standardized database. This invention achieves high-precision, adaptive soil profile cutting under conditions of multimodal data fusion and missing data, solving the problems of insufficient deep feature mining and sensitivity to incomplete data in existing technologies.

[0101] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.

Claims

1. A soil profile data self-adaptive cutting method based on deep learning, characterized in that the method comprises: collecting multi-source digitized data of soil profiles and aligning them based on depth to obtain a multimodal aligned dataset; performing pre-training on the unlabeled data portion of the multimodal aligned dataset using cross-modal contrastive learning to obtain a general feature encoder; combining the general feature encoder with a task module and training the combined task module using a mask sequence strategy to obtain a soil profile cutting model; inputting the original multimodal data of the soil profile to be cut into the soil profile cutting model for inference, and outputting a boundary probability heatmap and an uncertainty quantification map of the soil profile; visualizing the boundary probability heatmap and the uncertainty quantification map, receiving manual correction instructions, and generating a final soil layer boundary set; and, based on the final soil layer boundary set, resampling and organizing the original high-density data of the corresponding soil layers to generate a standardized soil profile stratification physicochemical property database.

2. The deep learning-based soil profile data adaptive cutting method of claim 1, wherein collecting multi-source digitized data of soil profiles and aligning them based on depth to obtain a multimodal aligned dataset includes the following steps: performing time synchronization on the multi-source digital data of soil profiles to obtain time-synchronized multi-source digital data; performing spatial registration on the time-synchronized multi-source digital data, in which the sampling points of each data stream are matched based on depth coordinates, to obtain spatially registered multi-source digital data; standardizing the sampling interval of the spatially registered multi-source digital data to obtain multi-source digital data with standard intervals; and integrating the multi-source digital data with standard intervals into a multimodal aligned dataset indexed by depth.

3. The deep learning-based soil profile data adaptive cutting method of claim 2, wherein performing pre-training on the unlabeled data portion of the multimodal aligned dataset using cross-modal contrastive learning to obtain a general feature encoder includes the following steps: from the unlabeled data portion of the multimodal aligned dataset, extracting spectral sequences and elemental content sequences from the same profile as positive sample pairs and sequences from different profiles as negative sample pairs, to construct a cross-modal contrastive learning sample pair set; training a dual-branch neural network on the cross-modal contrastive learning sample pair set with a contrastive loss function to obtain an optimized dual-branch neural network; and fixing the parameters of the feature extraction part of the optimized dual-branch neural network to obtain the general feature encoder.

4. The deep learning-based soil profile data adaptive cutting method of claim 3, wherein combining the general feature encoder with a task module and training the combined task module using a mask sequence strategy to obtain a soil profile cutting model includes the following steps: connecting the general feature encoder to the task module to form an untrained soil profile cutting model; applying the mask sequence strategy to the labeled data portion of the multimodal aligned dataset to generate masked training data; inputting the masked training data into the untrained soil profile cutting model, and updating the parameters of the task module in the untrained soil profile cutting model by minimizing the combined loss of boundary prediction loss and feature reconstruction loss; and, after the training process is completed, taking the trained model as the soil profile cutting model.

5. The deep learning-based soil profile data adaptive cutting method of claim 4, wherein inputting the original multimodal data of the soil profile to be cut into the soil profile cutting model for inference and outputting the boundary probability heatmap and uncertainty quantification map of the soil profile comprises the following steps: The original multimodal data of the soil profile to be cut is input into the soil profile cutting model, and multiple stochastic forward-propagation inferences are performed to obtain multiple boundary probability distribution samples. The probability values of the multiple boundary probability distribution samples at each depth point are aggregated, and their arithmetic mean is taken to generate the boundary probability heatmap of the soil profile. The dispersion of the probability values of the multiple boundary probability distribution samples at each depth point is evaluated by calculating the statistical variance, thereby generating the uncertainty quantification map of the soil profile.
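The aggregation in claim 5 reduces to a per-depth mean and variance over repeated stochastic passes. The sketch below assumes the stochasticity comes from something like dropout kept active at inference; `toy_forward` is a hypothetical stand-in for the trained model, and the use of population variance is also an assumption, since the claim says only "statistical variance".

```python
import random
import statistics

def mc_aggregate(forward, inputs, n_passes=30):
    """Run several stochastic forward passes and aggregate them per depth point.

    Returns (heatmap, uncertainty): the arithmetic mean and the population
    variance of the predicted boundary probability at each depth point.
    """
    samples = [forward(inputs) for _ in range(n_passes)]
    per_depth = list(zip(*samples))  # regroup: one tuple of samples per depth
    heatmap = [statistics.fmean(p) for p in per_depth]
    uncertainty = [statistics.pvariance(p) for p in per_depth]
    return heatmap, uncertainty

_rng = random.Random(0)

def toy_forward(base_probs):
    """Hypothetical stochastic model: jitters each probability, clipped to [0, 1]."""
    return [min(1.0, max(0.0, p + _rng.gauss(0.0, 0.05))) for p in base_probs]

base = [0.05, 0.10, 0.85, 0.20, 0.05]  # made-up boundary probabilities
heatmap, uncertainty = mc_aggregate(toy_forward, base)
```

Depth points where the passes disagree receive a high variance, which is what lets the uncertainty map flag boundaries that deserve manual review.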

6. The deep learning-based soil profile data adaptive cutting method of claim 5, wherein performing visualization based on the boundary probability heatmap and the uncertainty quantification map comprises the following steps: The boundary probability heatmap of the soil profile is converted into a boundary probability heatmap visualization layer, and the uncertainty quantification map of the soil profile is converted into an uncertainty quantification map visualization layer. The boundary probability heatmap visualization layer is overlaid with the uncertainty quantification map visualization layer to generate a combined visualization layer. A depth axis is added to the combined visualization layer to create a visualization framework with coordinates. The original multimodal data of the soil profile is rendered on the visualization framework with coordinates to generate an interactive visualization.

7. The deep learning-based soil profile data adaptive cutting method of claim 6, wherein receiving manual correction instructions and generating the final soil layer boundary set comprises the following steps: The manual correction instructions are captured through the interactive visualization and parsed into depth coordinate adjustment operations and boundary operation types. The depth coordinate adjustment operations and boundary operation types are applied to the initial boundary positions in the boundary probability heatmap of the soil profile to generate updated soil layer boundary positions. Based on the updated soil layer boundary positions, the final soil layer boundary set is generated.
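One way to picture the correction step in claim 7 is as a list of parsed operations applied to the initial boundary depths. The claim does not enumerate the boundary operation types, so the `add`, `delete`, and `move` types, the dictionary layout, and the nearest-boundary matching below are hypothetical.

```python
def apply_corrections(boundaries, corrections):
    """Apply parsed manual corrections to a list of boundary depths (cm).

    Each correction is a dict with a "type" ("add", "delete" or "move") and a
    "depth"; "move" also carries a signed "delta" (the depth adjustment).
    "delete" and "move" act on the existing boundary nearest to "depth".
    """
    result = sorted(boundaries)
    for op in corrections:
        kind = op["type"]
        if kind == "add":
            result.append(op["depth"])
        elif kind in ("delete", "move"):
            nearest = min(result, key=lambda d: abs(d - op["depth"]))
            result.remove(nearest)
            if kind == "move":
                result.append(nearest + op["delta"])
        else:
            raise ValueError(f"unknown boundary operation: {kind}")
        result.sort()
    return result

# Made-up initial boundaries and corrections.
final_boundaries = apply_corrections(
    [15.0, 32.0, 60.0],
    [
        {"type": "move", "depth": 32.0, "delta": -2.0},
        {"type": "add", "depth": 85.0},
        {"type": "delete", "depth": 14.0},
    ],
)
```

Keeping the boundary list sorted after every operation means the final set can be read directly as consecutive layer limits.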

8. The deep learning-based soil profile data adaptive cutting method of claim 7, wherein, based on the final soil layer boundary set, resampling the original high-density data of the corresponding soil layers to generate a standardized soil profile stratification physicochemical property database comprises the following steps: The depth range of each soil layer is determined based on the final soil layer boundary set, and the corresponding data segment is extracted from the original high-density data according to the depth range of each soil layer. For each soil layer, the data segment is resampled, and the physicochemical index values are calculated using cubic spline interpolation. The resampled physicochemical index values are organized into structured records containing the profile identifier, depth, soil layer identifier, and physicochemical index values. The structured records are combined to form the standardized soil profile stratification physicochemical property database.
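The resampling in claim 8 can be sketched with a small self-contained cubic spline. The claim specifies cubic spline interpolation but not the boundary conditions or the resampling grid, so the natural spline (zero second derivative at both ends), the evenly spaced points per layer, and the record field names are assumptions.

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return an evaluator for the natural cubic spline through (xs, ys)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives at the knots; both ends stay 0
    if n > 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n - 1)]
        for k in range(1, n - 2):            # forward elimination (Thomas)
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * (n - 2)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(n - 4, -1, -1):       # back substitution
            sol[k] = (rhs[k] - h[k + 1] * sol[k + 1]) / diag[k]
        m[1:n - 1] = sol

    def evaluate(t):
        # Points outside [xs[0], xs[-1]] reuse the end segment (extrapolation).
        i = min(max(bisect_right(xs, t) - 1, 0), n - 2)
        a, b = xs[i + 1] - t, t - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)
    return evaluate

def stratify(profile_id, depths, values, boundaries, points_per_layer=3):
    """Cut one profile at the boundaries and resample each layer into records."""
    edges = [depths[0]] + sorted(boundaries) + [depths[-1]]
    records = []
    for layer, (top, bottom) in enumerate(zip(edges, edges[1:]), start=1):
        segment = [(d, v) for d, v in zip(depths, values) if top <= d <= bottom]
        if len(segment) < 2:
            continue  # too few samples in this layer to interpolate
        spline = natural_cubic_spline([d for d, _ in segment], [v for _, v in segment])
        for k in range(points_per_layer):
            depth = top + (bottom - top) * (k + 0.5) / points_per_layer
            records.append({"profile": profile_id, "depth": depth,
                            "layer": layer, "value": spline(depth)})
    return records

# Made-up profile: 0-100 cm sampled every 10 cm with a linear index trend.
depths = [10.0 * i for i in range(11)]
values = [2.0 * d + 1.0 for d in depths]
records = stratify("P1", depths, values, [30.0, 60.0], points_per_layer=2)
```

Fitting the spline separately inside each layer keeps values from one soil layer from influencing the resampled values of its neighbours, at the cost of slight extrapolation when a boundary falls between two sampling points.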

9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, the steps of the deep learning-based soil profile data adaptive cutting method of any one of claims 1 to 8 are implemented.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, the steps of the deep learning-based soil profile data adaptive cutting method of any one of claims 1 to 8 are implemented.