Soil heavy metal content estimation method, device, equipment and medium

By collecting soil samples and determining heavy metal content, Vis-NIR and XRF spectral data were obtained. The LV competition model and NSGA-II algorithm were used to screen band features and construct a machine learning model, which solved the accuracy problem of soil heavy metal content inversion and achieved high-precision heavy metal estimation.

CN122306734A · Pending · Publication Date: 2026-06-30 · NORTHEASTERN UNIV CHINA +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NORTHEASTERN UNIV CHINA
Filing Date
2026-06-01
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, it is difficult to accurately invert the content of heavy metals in soil by relying solely on single-band extraction or simple statistical models. In particular, the direct spectral response signal in the Vis-NIR band is weak and easily masked by the spectral signals of other components such as soil organic matter and clay minerals.

Method used

Soil samples were collected and heavy metal content was determined. Visible-near-infrared hyperspectral reflectance and X-ray fluorescence spectral data were obtained. The competition coefficient between heavy metal elements was calculated using the LV competition model, and the competition relationships were grouped. The optimal band ratio feature set was selected by combining the band importance of random forest and NSGA-II algorithm. A machine learning regression model was constructed for estimation.

Benefits of technology

It enables high-precision and rapid estimation of soil heavy metal content in the context of multi-metal coexistence, improves the accuracy and stability of the estimation results, and provides an efficient and reliable monitoring method for multi-metal contamination areas such as ion-adsorption rare earth mining areas.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122306734A_ABST
Patent Text Reader

Abstract

This application relates to the field of metal mining technology and proposes a method, apparatus, equipment, and medium for estimating the heavy metal content in soil. The method includes: collecting soil samples from a target area and determining the content of multiple heavy metal elements; acquiring visible-near-infrared hyperspectral reflectance data and X-ray fluorescence spectral data of the soil samples; calculating the competition coefficients between heavy metal elements using the L-V competition model to determine the set of competing elements for the target heavy metal element; constructing a multi-objective optimization function based on the band importance of random forests, with the goal of maximizing the spectral response differences of the target heavy metal element and minimizing the spectral response differences of its competing elements; combining wavelength spacing constraints; using the NSGA-II algorithm to select the optimal set of band ratio features; training a machine learning regression model; and estimating the heavy metal content in the soil of the target area. This scheme significantly improves the inversion accuracy and stability of the model.

Description

Technical Field

[0001] This application relates to the field of metal mining technology, and in particular to a method, apparatus, equipment and medium for estimating the heavy metal content in soil.

Background Technology

[0002] During the mining of ion-adsorption rare earth minerals, leaching agents such as ammonium sulfate are injected into the weathered crust ore body through injection wells to replace rare earth ions. Leakage of the leaching solution can easily lead to soil acidification, which in turn activates heavy metal elements such as Pb, Cu, Co, and Cr associated with the ore body, causing them to desorb and migrate to the soil around the mining area, resulting in regional soil heavy metal contamination. Therefore, efficient and accurate monitoring of the heavy metal content in the soil of the mining area has become an urgent need for the ecological environmental protection of the mining area.

[0003] Visible-near-infrared (Vis-NIR) hyperspectral technology has become the mainstream technique for investigating and monitoring heavy metal contamination in soil due to its advantages of low cost, convenient sampling, and rapid non-destructive operation. However, since most heavy metals in soil are trace elements, their direct spectral response signals in the Vis-NIR band are weak, and their characteristic absorption peaks are easily masked by the spectral signals of other components such as soil organic matter and clay minerals. Therefore, it is difficult to accurately invert the content of heavy metals in soil by relying solely on single-band extraction or simple statistical models.

Summary of the Invention

[0004] This application provides a method, apparatus, equipment, and medium for estimating soil heavy metal content, aiming to solve the technical problems in related technologies, such as the difficulty in accurately inverting soil heavy metal content by relying solely on single-band extraction or simple statistical models.

[0005] In a first aspect, embodiments of this application provide a method for estimating the heavy metal content in soil, the method comprising: collecting soil samples from a target area and determining the content of multiple heavy metal elements in the soil samples; obtaining visible-near-infrared hyperspectral reflectance data and X-ray fluorescence spectral data of the soil samples; based on the X-ray fluorescence spectral data, calculating the competition coefficients between heavy metal elements using the LV competition model, and grouping the heavy metal elements according to the competition coefficients to determine the set of competing elements that compete with the target heavy metal element; based on the competition grouping results and on random forest band importance, constructing a multi-objective optimization function that maximizes the spectral response difference of the target heavy metal element and minimizes the spectral response difference of its competing elements, and, combined with a wavelength spacing constraint, using the NSGA-II algorithm to screen the optimal band ratio feature set from the visible-near-infrared hyperspectral reflectance data; training a machine learning regression model based on the optimal band ratio feature set and the heavy metal element content; and estimating the heavy metal content in the soil of the area to be tested using the machine learning regression model.

[0006] In one embodiment, optionally, calculating the competition coefficients between heavy metal elements from the X-ray fluorescence spectral data using the LV competition model includes: extracting, from the X-ray fluorescence spectral data, the first characteristic peak intensity corresponding to a first heavy metal element and the second characteristic peak intensity corresponding to a second heavy metal element, and defining them as the first and second system state variables, respectively; substituting the first and second system state variables into the equations of the LV competition model; assuming that the two elements are in dynamic equilibrium in the soil, fitting the equations of the LV competition model by partial least squares to obtain a first competition coefficient, of the second heavy metal element on the first, and a second competition coefficient, of the first heavy metal element on the second; and subjecting the fitted first and second competition coefficients to a significance test, wherein, when either competition coefficient is greater than zero and passes the test at the preset significance level, the first and second heavy metal elements are determined to have the corresponding competition relationship.
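The fitting step can be illustrated with a minimal sketch. It assumes the standard Lotka-Volterra competition form, for which setting the growth rate to zero at equilibrium gives the linear relation x1 = K1 - α12·x2. Ordinary least squares stands in here for the partial least squares fit named in the text, and a permutation test stands in for the significance test, whose type the text does not specify. The function names and the synthetic peak intensities are hypothetical.

```python
import random


def fit_competition(x1, x2, n_perm=2000, seed=0):
    """Fit x1 = K1 - alpha12 * x2 by ordinary least squares (assumed stand-in
    for the partial least squares fit described in the text).

    Returns (alpha12, K1, p_value), where p_value comes from a two-sided
    permutation test on the slope (an assumed choice of significance test).
    """
    n = len(x1)
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    sxx = sum((b - mean2) ** 2 for b in x2)

    def slope_of(y):
        # Least-squares slope of y regressed on x2; the mean of y does not
        # change under permutation, so mean1 can be reused.
        return sum((b - mean2) * (a - mean1) for a, b in zip(y, x2)) / sxx

    slope = slope_of(x1)
    alpha12 = -slope                 # competition coefficient of element 2 on element 1
    k1 = mean1 - slope * mean2       # intercept, read as the carrying capacity K1

    rng = random.Random(seed)
    shuffled = list(x1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope_of(shuffled)) >= abs(slope):
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return alpha12, k1, p_value


def competes(alpha, p_value, level=0.05):
    """Competition is accepted when the coefficient is positive and significant."""
    return alpha > 0 and p_value < level


# Synthetic XRF peak intensities (made up for illustration only).
peak_2 = [float(v) for v in range(1, 11)]
peak_1 = [10.0 - 0.5 * v for v in peak_2]
alpha12, k1, p = fit_competition(peak_1, peak_2)
print(alpha12, k1, competes(alpha12, p))
```

Swapping the roles of the two peak-intensity lists gives the coefficient in the other direction.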

[0007] In one embodiment, optionally, the step of grouping the multiple heavy metal elements according to the competition coefficient to determine the set of competing elements that compete with the target heavy metal element includes: The target heavy metal element is compared with each of the other heavy metal elements in pairs to determine their competitive relationship. All heavy metal elements that compete with the target heavy metal element are grouped together to form the competing element set.
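The grouping rule is simple enough to sketch directly. In this illustration, each other element carries the two fitted coefficients (one per direction) with their p-values, and an element joins the competing set when either coefficient is positive and significant. The function name, data layout, and numbers are hypothetical.

```python
def competing_set(fits, level=0.05):
    """Return the elements that compete with the target.

    `fits` maps each other element to a list of (alpha, p_value) tuples,
    one per direction of the fitted LV model against the target element.
    """
    return sorted(
        element
        for element, coefficients in fits.items()
        if any(alpha > 0 and p < level for alpha, p in coefficients)
    )


# Made-up coefficients for a target element, for illustration only.
example_fits = {
    "Cu": [(0.42, 0.01), (0.10, 0.20)],
    "La": [(0.30, 0.03), (0.25, 0.04)],
    "Cr": [(-0.20, 0.01), (0.05, 0.60)],
}
print(competing_set(example_fits))
```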

[0008] In one embodiment, optionally, the construction of a multi-objective optimization function aimed at maximizing the spectral response differences of the target heavy metal element and minimizing the spectral response differences of its competing elements includes: The characteristic peak intensities of X-ray fluorescence of each competing element in the set of competing elements are normalized and superimposed. The superposition result is then substituted into the LV model as an overall variable to obtain the comprehensive competition coefficient of the set of competing elements for the target heavy metal element. Based on the comprehensive competition coefficient, a weighted constraint is applied to the optimization objective of minimizing the spectral response differences of competing elements in the multi-objective optimization function. The comprehensive competition coefficient is positively correlated with the constraint strength for minimizing the difference in spectral response of the competing elements.
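As a rough illustration of the "normalize and superimpose" step, the sketch below scales each competing element's peak intensities to [0, 1] and sums them per sample, producing the single overall variable that is then fitted against the target element. Min-max scaling is an assumption, since the text does not name the normalization, and the function names are hypothetical.

```python
def min_max(values):
    """Scale a list of intensities to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def superimpose(peaks_by_element):
    """Sum the normalized peak intensities of all competing elements per sample."""
    normalized = [min_max(values) for values in peaks_by_element.values()]
    return [sum(sample) for sample in zip(*normalized)]


# Made-up peak intensities for three samples, for illustration only.
composite = superimpose({"Cu": [1, 2, 3], "La": [10, 30, 20]})
print(composite)
```

The resulting list plays the role of the second state variable when the LV equations are fitted against the target element's peak intensities.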

[0009] In one embodiment, optionally, the random forest band importance is obtained as follows: for each target heavy metal element and each element in its corresponding set of competing elements, a random forest model is constructed; based on the constructed random forest model, the importance score of each visible-near-infrared hyperspectral band for predicting the content of the corresponding heavy metal element is calculated and used as the band importance.

[0010] In one embodiment, optionally, the multi-objective optimization function is defined in terms of the following quantities.

[0011] M represents the target heavy metal element, {E1, ..., En} represents the set of competing elements, and n represents the number of competing elements. A candidate band pair consists of a band of high importance for the target heavy metal element and a band of low importance for the target heavy metal element, and RC represents the band importance calculated by random forest. The wavelength spacing constraint is defined in terms of the following quantities.

[0012] The wavelength spacing constraint is applied to each candidate band pair, using the wavelength value corresponding to each band and a constraint threshold g, which limits the maximum spacing between the two bands of a pair.
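The formulas themselves are not reproduced in this text. One form consistent with the definitions above is sketched below. The band symbols $b_h$ and $b_l$, the use of absolute differences, and the unweighted sum over competing elements are all assumptions made for illustration, not the formulas of the application.

```latex
% Assumed notation: X = (b_h, b_l) is a candidate band pair,
% RC_M(b) is the random forest importance of band b for element M,
% WV(b) is the wavelength of band b, and g is the spacing threshold.
\begin{aligned}
\max_{X}\; f_1(X) &= \left| RC_{M}(b_h) - RC_{M}(b_l) \right| \\
\min_{X}\; f_2(X) &= \sum_{i=1}^{n} \left| RC_{E_i}(b_h) - RC_{E_i}(b_l) \right| \\
\text{subject to}\quad \Phi(X) &:\ \left| WV(b_h) - WV(b_l) \right| \le g
\end{aligned}
```

Under the weighting described earlier, the second objective could additionally be scaled by the comprehensive competition coefficient, so that stronger competition tightens the penalty.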

[0013] In one embodiment, optionally, using the NSGA-II algorithm to screen the optimal band ratio feature set from the visible-near-infrared hyperspectral reflectance data includes: taking the entire visible-near-infrared hyperspectral range as the search space and traversing all candidate band pairs; using the multi-objective optimization function as the fitness function and optimizing iteratively under the wavelength spacing constraint; and outputting the band pairs that are optimal under non-dominated sorting, which form the optimal band ratio feature set.

[0014] In a second aspect, embodiments of this application provide a soil heavy metal content estimation device, the device comprising: a collection module, used to collect soil samples from the target area and determine the content of multiple heavy metal elements in the soil samples; an acquisition module, used to acquire visible-near-infrared hyperspectral reflectance data and X-ray fluorescence spectral data of the soil samples; a grouping module, used to calculate the competition coefficients between heavy metal elements from the X-ray fluorescence spectral data using the LV competition model, and to group the heavy metal elements according to the competition coefficients to determine the set of competing elements that compete with the target heavy metal element; a screening module, used to construct, based on the competition grouping results and on random forest band importance, a multi-objective optimization function that maximizes the spectral response difference of the target heavy metal element and minimizes the spectral response difference of its competing elements, and, combined with the wavelength spacing constraint, to screen the optimal band ratio feature set from the visible-near-infrared hyperspectral reflectance data using the NSGA-II algorithm; a training module, used to train a machine learning regression model based on the optimal band ratio feature set and the heavy metal element content; and an estimation module, used to estimate the heavy metal content of the soil in the area to be tested using the machine learning regression model.

[0015] In a third aspect, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the above-described method for estimating soil heavy metal content.

[0016] In a fourth aspect, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the steps of the above-described method for estimating the heavy metal content in soil.

[0017] In this invention, soil samples are collected from a target area, and the contents of multiple heavy metal elements in the soil samples are measured. Visible-near-infrared hyperspectral reflectance data and X-ray fluorescence spectral data of the soil samples are obtained. Based on the X-ray fluorescence spectral data, the competition coefficients between heavy metal elements are calculated using the LV competition model, and the heavy metal elements are grouped according to the competition coefficients to determine the set of competing elements that compete with the target heavy metal element. Based on the grouping results, a multi-objective optimization function is constructed using random forest band importance, aiming to maximize the spectral response difference of the target heavy metal element and minimize the spectral response difference of its competing elements. Combined with a wavelength spacing constraint, the NSGA-II algorithm is used to screen the optimal band ratio feature set from the visible-near-infrared hyperspectral reflectance data. A machine learning regression model is trained based on the optimal band ratio feature set and the heavy metal element contents, and is then used to estimate the heavy metal content in the soil of the area to be tested.

By collecting soil samples, determining their heavy metal content, and acquiring dual-source visible-near-infrared (Vis-NIR) hyperspectral reflectance and X-ray fluorescence (XRF) spectral data at the same time, this technical solution lays a precise data foundation for the subsequent heavy metal competition analysis and feature mining. Based on the XRF spectral data, the Lotka-Volterra (LV) competition model is used to quantify the competition coefficients between heavy metal elements, complete the competition grouping, and determine the set of competing elements for the target heavy metal. This characterizes the competition between elements when multiple metals coexist, filling the gap left by existing research that does not consider heavy metal interactions. Subsequently, based on random forest band importance and the competition grouping results, a multi-objective optimization function is constructed to maximize the spectral response difference of the target heavy metal and minimize the spectral response difference of competing elements. A wavelength spacing constraint is introduced at the same time, and the NSGA-II algorithm is used to select the optimal band ratio feature set from the Vis-NIR hyperspectral reflectance data, yielding features that accurately characterize the target heavy metal while significantly suppressing the spectral interference of competing elements. This solves the problem of low feature effectiveness that arises when traditional feature selection relies on statistical correlation or single-objective optimization. Finally, a machine learning regression model is trained based on the optimal band ratio feature set and the heavy metal content and is used to estimate the heavy metal content of soil in the area to be tested. The optimized feature set improves the model's ability to fit and predict heavy metal content, enabling high-precision and rapid estimation of heavy metal content in mining soil where multiple metals compete and coexist. Compared with existing estimation methods that model single metals or do not consider competition, the accuracy and stability of the estimation results are greatly improved, providing an efficient and reliable technical means for monitoring soil heavy metals in multi-metal contaminated areas such as ion-adsorption rare earth mining areas.

Attached Figure Description

[0018] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0019] Figure 1 shows a schematic flowchart of a method for estimating soil heavy metal content according to an embodiment of this application.

[0020] Figure 2 shows a correlation diagram of heavy metal elements according to an embodiment of this application.

[0021] Figure 3 shows correlation plots between XRF and heavy metals according to an embodiment of this application: (a) Cu; (b) Pb; (c) La; (d) Ce.

[0022] Figure 4 shows the band distribution from multi-objective screening according to an embodiment of this application: (a) Pb; (b) Cu.

[0023] Figure 5 shows scatter density plots of the band ratios according to an embodiment of this application: (a) Cu; (b) Pb.

[0024] Figure 6 shows scatter density plots of the sensitive bands according to an embodiment of this application: (a) Cu; (b) Pb.

[0025] Figure 7 shows a block diagram of a soil heavy metal content estimation device according to an embodiment of this application.

[0026] Figure 8 shows a schematic structural diagram of a computer device according to an embodiment of this application.

[0027] Figure 9 shows another schematic structural diagram of a computer device according to an embodiment of this application.

Detailed Implementation

[0028] To better understand the technical solution of this application, the embodiments of this application will be described in detail below with reference to the accompanying drawings.

[0029] It should be understood that the described embodiments are merely some, not all, of the embodiments in this application. All other embodiments obtained by those skilled in the art based on the embodiments in this application without inventive effort are within the scope of protection of this application.

[0030] The terminology used in the embodiments of this application is for the purpose of describing particular embodiments only and is not intended to limit this application. The singular forms "a," "an," and "the" used in the embodiments of this application and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise.

[0031] The following detailed description of some embodiments of this application is provided in conjunction with the accompanying drawings. Unless otherwise specified, the following embodiments and features can be combined with each other.

[0032] Refer to Figure 1, which shows a schematic flowchart of a method for estimating soil heavy metal content according to an embodiment of this application.

[0033] As shown in Figure 1, a method for estimating soil heavy metal content according to an embodiment of this application includes:

Step S101: Collect soil samples from the target area and determine the content of multiple heavy metal elements in the soil samples.

[0034] Representative sampling points are selected in the area to be tested, and surface soil samples are collected from a depth of 0-20 cm. The soil samples are then pretreated by drying, grinding, digestion, acid removal, dilution, and making up to volume, and the content of multiple heavy metals in the samples is measured with an ICP-MS instrument.

Step S102: Obtain visible-near-infrared hyperspectral reflectance data and X-ray fluorescence spectral data of the soil samples.

The soil samples are dried, ground, sieved (≤2 mm), and placed in black sample boxes. The Vis-NIR spectra of the samples are measured in a darkroom. A 50 W halogen lamp is the sole light source, illuminating the sample surface at a zenith angle of 45°. The spectrometer is fitted with a 4° field-of-view lens, and the sensor probe is positioned 15 cm vertically above the sample so that the sensor receives only the reflectance spectrum of the soil sample. White reference calibration is performed every 10 minutes during measurement. Each sample is measured twice, and the average is taken as its final spectrum. For XRF spectroscopy, the sample is placed in a dedicated sample cup and compacted to give a smooth surface, and the top of the cup is covered with a polypropylene film. Each scan lasts 60 seconds, each sample is scanned twice, and the average spectrum is used for analysis.

[0035] Furthermore, each soil spectral reflectance dataset was preprocessed with smoothing; the samples were randomly divided into a training set (75%) and a validation set (25%), and the soil heavy metal content was used as a label.
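The preprocessing and splitting step might look like the following minimal sketch. A centered moving average is assumed for the smoothing, since the text does not name the filter, and the window size, seed, and function names are hypothetical.

```python
import random


def moving_average(spectrum, window=5):
    """Smooth a reflectance spectrum with a centered moving average.

    The window shrinks at the two ends so the output keeps the input length.
    """
    half = window // 2
    smoothed = []
    for i in range(len(spectrum)):
        lo = max(0, i - half)
        hi = min(len(spectrum), i + half + 1)
        smoothed.append(sum(spectrum[lo:hi]) / (hi - lo))
    return smoothed


def split_samples(n_samples, train_fraction=0.75, seed=0):
    """Randomly split sample indices into training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


train_idx, val_idx = split_samples(153)
print(len(train_idx), len(val_idx))
print(moving_average([1, 2, 3, 4, 5], window=3))
```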

[0036] Step S103: Based on the X-ray fluorescence spectral data, calculate the competition coefficients between heavy metal elements using the LV competition model, and group the heavy metal elements according to the competition coefficients to determine the set of competing elements that compete with the target heavy metal element.

In one embodiment, optionally, calculating the competition coefficients between heavy metal elements from the X-ray fluorescence spectral data using the LV competition model includes: extracting, from the X-ray fluorescence spectral data, the first characteristic peak intensity corresponding to a first heavy metal element and the second characteristic peak intensity corresponding to a second heavy metal element, and defining them as the first and second system state variables, respectively; substituting the first and second system state variables into the equations of the LV competition model; assuming that the two elements are in dynamic equilibrium in the soil, fitting the equations of the LV competition model by partial least squares to obtain a first competition coefficient, of the second heavy metal element on the first, and a second competition coefficient, of the first heavy metal element on the second; and subjecting the fitted first and second competition coefficients to a significance test, wherein, when either competition coefficient is greater than zero and passes the test at the preset significance level, the first and second heavy metal elements are determined to have the corresponding competition relationship.

[0037] In one embodiment, optionally, the step of grouping the multiple heavy metal elements according to the competition coefficient to determine the set of competing elements that compete with the target heavy metal element includes: The target heavy metal element is compared with each of the other heavy metal elements in pairs to determine their competitive relationship. All heavy metal elements that compete with the target heavy metal element are grouped together to form the competing element set.

[0038] The Lotka-Volterra (LV) model is a classic ecological competition model, originally used to describe interspecific competition under limited resources; it quantitatively characterizes the interactions between multiple objects in a system. In soil environments where multiple metals coexist, heavy metal elements may compete, cooperate, or antagonize one another at adsorption sites or coordination channels. These interactions affect the migration and transformation of elements in the soil and may also reduce the reliability of variable selection during modeling. This study introduces the LV model into the analysis of heavy metal element relationships in soil to determine the intensity and direction of competition between typical elements. Assuming that the system is in dynamic equilibrium, the characteristic peak intensities of the elemental XRF spectra are taken as the system state variables, and a competition model between two elements is established, which in the standard two-component LV form reads:

$$
\frac{dx_1}{dt} = r_1 x_1 \left(1 - \frac{x_1 + \alpha_{12} x_2}{K_1}\right), \qquad
\frac{dx_2}{dt} = r_2 x_2 \left(1 - \frac{x_2 + \alpha_{21} x_1}{K_2}\right) \tag{1}
$$

[0039] where $x_1$ and $x_2$ represent the XRF spectral response values of the two elements; $r_1$ and $r_2$ are the growth rates; $K_1$ and $K_2$ are the environmental carrying capacities (i.e., the spectral response saturation thresholds); and $\alpha_{12}$ and $\alpha_{21}$ are the competition coefficients, where $\alpha_{12}$ is the competition intensity coefficient of element 2 on element 1 and $\alpha_{21}$ is that of element 1 on element 2. If α > 0 and it passes the significance test (p < 0.05), a competitive relationship is taken to exist. The model parameters $r_1$, $r_2$, $K_1$, $K_2$, $\alpha_{12}$, and $\alpha_{21}$ are estimated by partial least squares, with a regression model constructed using the XRF characteristic peak intensities as independent variables. Based on the calculated competition coefficients $\alpha_{12}$ and $\alpha_{21}$, the elements can be divided into two groups according to their competitive relationships. For example, when the target monitored element is Pb, the competition coefficients between Pb and each of Cu, La, and Ce are calculated separately, giving three pairs of $\alpha_{12}$ and $\alpha_{21}$. If all three pairs of coefficients are positive, Cu-La-Ce is placed in one group and Pb in another. On this basis, the XRF characteristic peak intensities of the elements in Cu-La-Ce are normalized and superimposed, and the result is substituted into the LV model and fitted against Pb to calculate the comprehensive competition coefficient of Cu-La-Ce on Pb, so that the competition effect between elements is better reflected in the spectral feature screening.

[0040] Step S104: Based on the grouping results of the competition relationship, and based on the band importance of the random forest, a multi-objective optimization function is constructed with the goal of maximizing the spectral response difference of the target heavy metal element and minimizing the spectral response difference of its competing elements. Combined with the wavelength spacing constraint, the NSGA-II algorithm is used to screen out the optimal band ratio feature set from the visible-near-infrared hyperspectral reflectance data. Furthermore, based on the grouping results of element competition relationships determined by the LV model, heavy metal elements are grouped, and feature screening is carried out using a multi-objective optimization algorithm.

[0041] Furthermore, given the large number of hyperspectral bands and the nonlinear relationships between variables, a random forest model is used to calculate the importance of bands in order to objectively quantify the importance of different bands to each element.

[0042] Furthermore, a characteristic band ratio is constructed from the Vis-NIR spectral reflectance for the target metal M: the reflectance R of a band that is sensitive to M is divided by the reflectance R of a band that is only weakly sensitive to M. The core idea of this model is to automatically select, through constraints and optimization strategies, the optimal band pairs for inverting the target heavy metal content from a large set of hyperspectral bands. It rests on two basic assumptions: (1) for the target heavy metal, the two bands of an ideal band pair should differ greatly in band importance, that is, one band of high importance is paired with one of low importance; (2) for the competing elements, the two bands of an ideal band pair should have similar band importance, that is, the difference in spectral importance between the two bands is small for the competing elements. The Non-dominated Sorting Genetic Algorithm II (NSGA-II) is used to solve the problem, with the objective function for the monitored target defined by Equation (2).

[0043] In Equation (2), the target element is M, the set of interfering elements is I = {E1, ..., En}, and n is the number of competing elements. This set is determined by the competition grouping results calculated by the LV model. For example, if the grouping places Cu, La, and Ce in one group and Pb in another, then there are three competing elements when Pb is monitored. The candidate band pair X consists of a band of high importance to the target element and a band of low importance to the target element, and RC denotes the band importance calculated by random forest. Equation (3) defines the constraint condition Φ(X).

[0044] Furthermore, because the reflectance of the soil background changes relatively gradually over adjacent wavelengths, a wavelength spacing constraint is introduced to reduce the interference of other substances with the spectral ratio; that is, the wavelength difference between the two selected bands should be as small as possible. The constraint condition Φ(X) is defined in Equation (3).

[0045] In Equation (3), WV is the wavelength value corresponding to a band, and g is the constraint threshold that limits the maximum spacing between the two bands of a pair, balancing model complexity and robustness.
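The screening logic of Step S104 can be illustrated with a simplified sketch. It is not NSGA-II itself: instead of evolving a population with non-dominated sorting and crowding distance, it enumerates every band pair that satisfies the spacing constraint and keeps only the first non-dominated front. The two objectives use plain importance differences, which is an assumed form, and all names and numbers are hypothetical.

```python
from itertools import permutations


def screen_band_pairs(wavelengths, rc_target, rc_competitors, max_gap):
    """Return the non-dominated (high, low) band pairs.

    wavelengths    : wavelength of each band (the WV values)
    rc_target      : random forest importance of each band for the target element
    rc_competitors : one importance list per competing element
    max_gap        : spacing threshold g
    Each result is ((high_index, low_index), gain, penalty).
    """
    candidates = []
    for high, low in permutations(range(len(wavelengths)), 2):
        if abs(wavelengths[high] - wavelengths[low]) > max_gap:
            continue  # violates the wavelength spacing constraint
        gain = rc_target[high] - rc_target[low]
        if gain <= 0:
            continue  # "high" must be the more important band for the target
        penalty = sum(abs(rc[high] - rc[low]) for rc in rc_competitors)
        candidates.append(((high, low), gain, penalty))

    def dominated(a):
        # b dominates a if it is no worse on both objectives and better on one.
        return any(
            b[1] >= a[1] and b[2] <= a[2] and (b[1] > a[1] or b[2] < a[2])
            for b in candidates
        )

    return [c for c in candidates if not dominated(c)]


# Made-up importances for four bands, for illustration only.
front = screen_band_pairs(
    wavelengths=[400, 410, 420, 900],
    rc_target=[0.9, 0.1, 0.5, 0.0],
    rc_competitors=[[0.3, 0.3, 0.6, 0.0]],
    max_gap=50,
)
print([pair for pair, _, _ in front])
```

Exhaustive enumeration is quadratic in the number of bands, which is one reason a genetic search such as NSGA-II is attractive for full hyperspectral ranges.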

[0046] Step S105: Train a machine learning regression model based on the optimal band ratio feature set and the heavy metal element content. A machine learning regression model relating the spectral index to the target metal content is established on the training samples, and its accuracy is validated on the validation samples. For accuracy validation, the coefficient of determination (R²) (Formula 4) and the root mean square error (RMSE) (Formula 5) are used as evaluation indicators.

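[0047]

$$
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{4}
$$

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{5}
$$

In the formulas, n is the number of samples; $\hat{y}_i$ is the predicted value; $y_i$ is the measured value; and $\bar{y}$ is the mean of the measured values.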
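The two evaluation indicators can be computed as in the following small sketch, which uses the standard definitions of the coefficient of determination and the root mean square error. The function names and sample values are hypothetical.

```python
import math


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean square error between measured and predicted contents."""
    n = len(measured)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / n)


# Made-up contents for illustration only.
print(r_squared([1, 2, 3], [2, 2, 2]), rmse([1, 2, 3], [2, 2, 2]))
```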

[0048] Step S106: Use the machine learning regression model to estimate the heavy metal content of the soil in the test area.

[0049] This invention utilizes visible-near-infrared (Vis-NIR) spectral data, XRF spectral data, and heavy metal content data of soil from mining areas. It introduces the Lotka-Volterra (LV) model to characterize the competitive relationships among multiple metals, constructing a multi-objective feature mining and modeling method that considers the spectral competitive mechanisms among heavy metals in the soil. The LV model is used to competitively group heavy metals, and a Non-Dominated Sorting Genetic Algorithm II (NSGA-II) is combined for band feature screening to obtain band indices that effectively distinguish target metals and suppress the influence of interfering metals. Finally, a machine learning regression model is used to achieve high-precision estimation of soil heavy metal content. Compared with existing spectral inversion methods based only on a single metal or without considering competitive relationships, this invention can more accurately reveal the spectral response mechanism under multi-metal coexistence conditions, significantly improving the inversion accuracy and stability of the model.

[0050] The technical solution of the present invention will be described in detail below with reference to a specific embodiment.

[0051] A total of 153 soil samples were collected in the field. Taking Cu and Pb as examples, their contents were used as labels. Soil reflectance data and XRF spectral data were obtained using the method described above and preprocessed. The contents of the rare earth elements La and Ce in the soil were also measured. The Pearson correlation between the Cu, Pb, La, and Ce contents and the soil reflectance spectra was calculated. As shown in Figure 2, Cu, La, and Ce exhibit similar spectral responses and are all positively correlated with soil spectral reflectance, so they can be grouped together. Pb, however, shows relatively independent spectral response characteristics and is negatively correlated with soil spectral reflectance. The opposite correlation trends of the two groups of elements in the key absorption bands indicate that when Cu-La-Ce binds more strongly to mineral components, the binding ability of Pb is relatively weakened, suggesting a possible competitive adsorption effect between the elements.

[0052] On this basis, to investigate the competitive relationships among elements in the soil, the correlation between each element and the XRF spectra at different energies was calculated, as shown in Figure 3(a) to Figure 3(d). When the inner-shell electrons of an element are excited and undergo transitions, X-rays of a specific energy are emitted and detected, and each element has unique atomic energy level differences. The XRF characteristic peak intensities of the four elements (Cu 8.05 keV, Pb 10.55 keV, La 4.65 keV, Ce 4.84 keV) were used as G1 and G2 in formula (1), and the partial least squares method was used for fitting. The results show a significant competitive effect between the two groups of elements (Group 1: Cu, La, Ce; Group 2: Pb) (p<0.01), with competition coefficients α12 = 0.13 (the effect of Pb on Cu-La-Ce) and α21 = 0.20 (the effect of Cu-La-Ce on Pb). The inhibitory effect of Cu-La-Ce on Pb is 54.8% stronger than the reverse effect, indicating an asymmetric competitive relationship.

[0053] On this basis, a multi-objective optimization model was constructed according to the element grouping results. Competitive constraints were introduced and, combined with the NSGA-II optimization strategy, a total of 46 optimal band ratios for Cu and 50 optimal band ratios for Pb were finally selected (Figure 4). As shown in Figure 4, the sensitive bands of Cu (Figure 4(a)) are relatively scattered, while the sensitive bands of Pb (Figure 4(b)) are mainly concentrated in the 800–1000 nm region. Overall, multi-objective feature screening significantly improved the correlation between the features and soil heavy metal content (Table 1): for Cu, the highest absolute correlation increased from 0.32 for the original bands to 0.70 for the feature band ratios, and for Pb, it increased from 0.33 for the original bands to 0.49. Figure 5 and Figure 6 show scatter density plots of heavy metal content against the band ratio features and the sensitive single bands, respectively. Figure 6(a) and Figure 6(b) show that the scatter of single-band values against metal content is relatively dispersed, especially in the low-content range where the points are widely spread, while Figure 5(a) and Figure 5(b) show that the band ratio features after multi-objective screening exhibit a clear linear trend with heavy metal content and a concentrated point cloud. The machine learning regression model built based on the feature band ratio improves the prediction accuracy of Cu. The accuracy of the Pb prediction was 0.77, with an RMSE of 6.35 mg/kg. The value was 0.75, and the RMSE was 38.47 mg/kg.
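The comparison between single bands and band ratios rests on two simple operations: forming the ratio of two reflectance bands and computing its Pearson correlation with the measured content. A minimal sketch follows; the helper names, the tiny reflectance matrix, and the contents are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np


def band_ratio(reflectance, i, j):
    """Ratio of band i to band j for every sample (rows are samples, columns are bands)."""
    r = np.asarray(reflectance, dtype=float)
    return r[:, i] / r[:, j]


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical reflectance of three samples in two bands, and hypothetical contents (mg/kg).
reflectance = [[0.2, 0.4],
               [0.3, 0.5],
               [0.4, 0.5]]
content = [5.0, 6.0, 8.0]

ratio = band_ratio(reflectance, 0, 1)  # 0.5, 0.6, 0.8
r_single = pearson_r(np.asarray(reflectance)[:, 1], content)
r_ratio = pearson_r(ratio, content)
print(r_single, r_ratio)
```

In this toy data the ratio tracks the content more closely than the single band does, which mirrors the kind of improvement the paragraph above reports, without reproducing its values.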

[0054] Table 1. Correlation between characteristics and soil heavy metal content

[0055] In the above technical solution, soil samples are collected and their heavy metal content is measured. At the same time, dual-source data consisting of visible-near-infrared (Vis-NIR) hyperspectral reflectance and X-ray fluorescence (XRF) spectra are acquired, providing the data foundation for the subsequent heavy metal competition analysis and feature mining. Then, based on the XRF spectral data, the Lotka-Volterra (LV) competition model is used to calculate the competition coefficients between heavy metal elements, group the competition relationships, and determine the set of competing elements for the target heavy metal. This characterizes the competition between elements under multi-metal coexistence and addresses the gap in existing research, which does not consider heavy metal interactions. Subsequently, taking the random forest band importance as the basis and combining the competition relationship grouping results, a multi-objective optimization function is constructed that maximizes the spectral response difference of the target heavy metal and minimizes the spectral response difference of the competing elements. Wavelength spacing constraints are introduced at the same time, and the NSGA-II algorithm is used to select the optimal band ratio feature set from the Vis-NIR hyperspectral reflectance data. The selected features characterize the target heavy metal while suppressing the spectral interference of competing elements, which solves the problem of low feature effectiveness caused by traditional feature selection relying on statistical correlation or single-objective optimization. Finally, a machine learning regression model is trained based on the optimal band ratio feature set and the heavy metal content and is used to estimate the heavy metal content of soil in the test area. The optimized feature set improves the model's ability to fit and predict heavy metal content, enabling high-precision and rapid estimation of heavy metal content in mining soil where multiple metals compete and coexist. Compared with existing estimation methods that model single metals or do not consider competition, the accuracy and stability of the estimation results are greatly improved, providing an efficient and reliable technical means for monitoring soil heavy metals in multi-metal contaminated areas such as ion-adsorption rare earth mining areas.

[0056] As shown in Figure 7, in a second aspect, embodiments of this application provide a soil heavy metal content estimation device 70, the device comprising: a collection module 71, used to collect soil samples from the target area and determine the content of multiple heavy metal elements in the soil samples; an acquisition module 72, used to acquire visible-near-infrared hyperspectral reflectance data and X-ray fluorescence spectral data of the soil samples; a grouping module 73, used to calculate the competition coefficients between heavy metal elements based on the X-ray fluorescence spectral data using the LV competition model, and to group the multiple heavy metal elements according to the competition coefficients to determine the set of competing elements that have a competitive relationship with the target heavy metal element; a screening module 74, used to construct, based on the competition relationship grouping results and taking the random forest band importance as the basis, a multi-objective optimization function with the goals of maximizing the spectral response difference of the target heavy metal element and minimizing the spectral response difference of its competing elements, and, in combination with the wavelength spacing constraint, to use the NSGA-II algorithm to select the optimal band ratio feature set from the visible-near-infrared hyperspectral reflectance data; a training module 75, used to train a machine learning regression model based on the optimal band ratio feature set and the heavy metal element content; and an estimation module 76, used to estimate the heavy metal content of the soil in the test area using the machine learning regression model.

[0057] In one embodiment, optionally, calculating the competition coefficients between heavy metal elements based on the X-ray fluorescence spectral data using the LV competition model includes the following. From the X-ray fluorescence spectral data, the first characteristic peak intensity and the second characteristic peak intensity, corresponding to the first heavy metal element and the second heavy metal element, are extracted and defined as the first system state variable and the second system state variable, respectively. The first system state variable and the second system state variable are substituted into the equations of the LV competition model. Assuming that the first heavy metal element and the second heavy metal element are in dynamic equilibrium in the soil, the partial least squares method is used to fit the equations of the LV competition model, yielding the first competition coefficient (the effect of the second heavy metal element on the first) and the second competition coefficient (the effect of the first heavy metal element on the second). The fitted first and second competition coefficients are subjected to a significance test. When either competition coefficient is greater than zero and passes the preset significance level test, the first heavy metal element and the second heavy metal element are determined to have the corresponding competition relationship.
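The fitting step can be illustrated with a small sketch. The equations of the LV competition model are not reproduced in this part of the text, so the sketch assumes the classical two-species form, in which the equilibrium condition reduces to the linear relation G1 = K1 − α12·G2. It also swaps in ordinary least squares (`numpy.polyfit`) for the partial least squares fit named above and omits the significance test. The function name and the synthetic intensities are hypothetical.

```python
import numpy as np


def fit_competition_coefficient(g_self, g_other):
    """Estimate (alpha, K) from the assumed equilibrium relation g_self = K - alpha * g_other.

    Assumption: classical two-species Lotka-Volterra competition at equilibrium.
    Ordinary least squares is used here in place of partial least squares.
    """
    g_self = np.asarray(g_self, dtype=float)
    g_other = np.asarray(g_other, dtype=float)
    slope, intercept = np.polyfit(g_other, g_self, deg=1)
    return float(-slope), float(intercept)


# Synthetic characteristic peak intensities generated with K1 = 10 and alpha_12 = 0.2.
g2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
g1 = 10.0 - 0.2 * g2

alpha_12, k1 = fit_competition_coefficient(g1, g2)
print(alpha_12, k1)
```

Swapping the arguments gives the coefficient in the other direction; comparing the two values is what reveals an asymmetric relationship.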

[0058] In one embodiment, optionally, the step of grouping the multiple heavy metal elements according to the competition coefficient to determine the set of competing elements that compete with the target heavy metal element includes: The target heavy metal element is compared with each of the other heavy metal elements in pairs to determine their competitive relationship. All heavy metal elements that compete with the target heavy metal element are grouped together to form the competing element set.
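The pairwise comparison can be sketched as a lookup over fitted coefficients. The dictionary layout, the function name, and every numeric value below are hypothetical. The decision rule follows the text: a pair competes when either coefficient is greater than zero and passes the significance test.

```python
def competing_set(target, coefficients, p_values, alpha_level=0.01):
    """Return the elements that compete with `target`.

    coefficients[(a, b)] is the competition coefficient of b on a, and
    p_values[(a, b)] is its significance. A pair competes when either
    direction is positive and significant at `alpha_level`.
    """
    elements = {e for pair in coefficients for e in pair}
    competitors = []
    for other in sorted(elements - {target}):
        for pair in ((target, other), (other, target)):
            coef = coefficients.get(pair)
            p = p_values.get(pair)
            if coef is not None and p is not None and coef > 0 and p < alpha_level:
                competitors.append(other)
                break  # one significant direction is enough
    return competitors


# Hypothetical fitted coefficients and p-values for three elements.
coefficients = {
    ("Pb", "Cu"): 0.20, ("Cu", "Pb"): 0.13,
    ("Pb", "La"): 0.15, ("La", "Pb"): 0.05,
    ("Cu", "La"): -0.02, ("La", "Cu"): 0.0,
}
p_values = {pair: 0.001 for pair in coefficients}
p_values[("La", "Pb")] = 0.2

print(competing_set("Pb", coefficients, p_values))
print(competing_set("Cu", coefficients, p_values))
```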

[0059] In one embodiment, optionally, the construction of a multi-objective optimization function aimed at maximizing the spectral response differences of the target heavy metal element and minimizing the spectral response differences of its competing elements includes: The characteristic peak intensities of X-ray fluorescence of each competing element in the set of competing elements are normalized and superimposed. The superposition result is then substituted into the LV model as an overall variable to obtain the comprehensive competition coefficient of the set of competing elements for the target heavy metal element. Based on the comprehensive competition coefficient, a weighted constraint is applied to the optimization objective of minimizing the spectral response differences of competing elements in the multi-objective optimization function. The comprehensive competition coefficient is positively correlated with the constraint strength for minimizing the difference in spectral response of the competing elements.
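The normalize-and-superimpose step can be sketched as below. The text does not state which normalization is used, so min-max scaling across samples is an assumption, as are the function name and the intensities. The resulting vector would then play the role of the single overall variable passed to the LV fit.

```python
import numpy as np


def superimposed_intensity(peaks):
    """Min-max normalize each element's peak intensities across samples, then sum them.

    peaks maps an element symbol to its characteristic peak intensity per sample.
    Assumption: min-max scaling; the source does not specify the normalization.
    """
    total = None
    for values in peaks.values():
        v = np.asarray(values, dtype=float)
        span = v.max() - v.min()
        norm = (v - v.min()) / span if span > 0 else np.zeros_like(v)
        total = norm if total is None else total + norm
    return total


# Hypothetical peak intensities of two competing elements in three samples.
peaks = {"Cu": [1.0, 2.0, 3.0], "La": [10.0, 30.0, 20.0]}
combined = superimposed_intensity(peaks)
print(combined)
```

Normalizing first keeps an element with intrinsically strong fluorescence from dominating the combined variable.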

[0060] In one embodiment, optionally, taking the random forest band importance as the basis includes: for each target heavy metal element and its corresponding set of competing elements, constructing a random forest model; and, based on the constructed random forest model, calculating the importance score of each visible-near-infrared hyperspectral band for predicting the content of the corresponding heavy metal element, which is used as the band importance.

[0061] In one embodiment, optionally, the multi-objective optimization function includes:

[0062] Where M represents the target heavy metal element, {E1, ..., En} represents the set of competing elements, and n represents the number of competing elements. X denotes a candidate band pair, consisting of a band of high importance for the target heavy metal element and a band of low importance for the target heavy metal element, and RC represents the band importance calculated based on random forest. The wavelength spacing constraint includes:

[0063] Where Φ(X) denotes the wavelength spacing constraint, WV denotes the wavelength value corresponding to the band, and g denotes the constraint threshold, which is used to limit the maximum spacing between band pairs.

[0064] In one embodiment, optionally, using the NSGA-II algorithm to select the optimal band ratio feature set from the visible-near-infrared hyperspectral reflectance data includes: using the entire visible-near-infrared hyperspectral range as the search space and traversing all candidate band pairs; using the multi-objective optimization function as the fitness function and performing iterative optimization in conjunction with the wavelength spacing constraint; and outputting the optimal combinations of band pairs that satisfy the non-dominated sorting, which form the optimal band ratio feature set.
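The non-dominated sorting at the heart of this step can be illustrated with a minimal Pareto filter. This is not a full NSGA-II implementation: it has no population, crossover, mutation, or crowding distance, and it extracts only the first non-dominated front. The two objective values per band pair (a target response difference to maximize and a competitor response difference to minimize) are hypothetical numbers, as are the band pairs themselves.

```python
def pareto_front(candidates):
    """Return the band pairs that no other candidate dominates.

    candidates is a list of (pair, f_target, f_compete). Candidate A dominates
    candidate B when A has f_target >= B's and f_compete <= B's, with at least
    one of the two comparisons strict.
    """
    front = []
    for i, (pair_i, t_i, c_i) in enumerate(candidates):
        dominated = any(
            (t_j >= t_i and c_j <= c_i) and (t_j > t_i or c_j < c_i)
            for j, (_, t_j, c_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append(pair_i)
    return front


# Hypothetical band pairs (nm) with hypothetical objective values.
candidates = [
    ((800, 830), 0.9, 0.3),
    ((900, 940), 0.7, 0.1),
    ((850, 880), 0.6, 0.4),  # dominated by (800, 830)
    ((950, 990), 0.9, 0.5),  # dominated by (800, 830)
]
front = pareto_front(candidates)
print(front)
```

The two surviving pairs represent different trade-offs: one responds more strongly to the target, the other suppresses competitors better, which is why a multi-objective search returns a set rather than a single winner.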

[0065] In a third aspect, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the above-described method for estimating soil heavy metal content.

[0066] In a fourth aspect, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the steps of the above-described method for estimating soil heavy metal content.

[0067] Specific limitations regarding the soil heavy metal content estimation device can be found in the limitations of the soil heavy metal content estimation method described above, and will not be repeated here. Each module in the aforementioned soil heavy metal content estimation device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in the processor of a computer device in hardware form or independently of the processor, or stored in the memory of a computer device in software form, so that the processor can call and execute the corresponding operations of each module.

[0068] The software tools or components not belonging to our company that appear in the embodiments of this application are merely examples and do not represent actual use.

[0069] In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 8. The computer device includes a processor, memory, network interface, and database connected via a system bus. The processor provides computational and control capabilities. The memory includes non-volatile and/or volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and database. The internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage media. The network interface is used to communicate with external clients via a network connection. When the computer program is executed by the processor, it implements the server-side functions or steps of the soil heavy metal content estimation method.

[0070] In one embodiment, a computer device is provided, which may be a client, and its internal structure may be as shown in Figure 9. The computer device includes a processor, memory, network interface, display screen, and input devices connected via a system bus. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage media. The network interface is used to communicate with an external server via a network connection. When the computer program is executed by the processor, it implements the client-side functions or steps of the soil heavy metal content estimation method.

[0071] It should be understood that the processor can be a Central Processing Unit (CPU), but it can also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. Among these, a general-purpose processor can be a microprocessor or any conventional processor.

[0072] In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the method described in any one of the first aspect embodiments.

[0073] It should be noted that the functions or steps that can be implemented by the computer-readable storage medium or electronic device described above can be referred to the relevant descriptions in the foregoing method embodiments. To avoid repetition, they will not be described one by one here.

[0074] It should be understood that the term "and / or" used in this article is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, and B existing alone. Additionally, the character " / " in this article generally indicates that the preceding and following related objects have an "or" relationship.

[0075] It should be understood that although the terms "first," "second," etc., may be used to describe the setting units in the embodiments of this application, these setting units should not be limited to these terms. These terms are only used to distinguish the setting units from each other. For example, without departing from the scope of the embodiments of this application, the first setting unit may also be referred to as the second setting unit, and similarly, the second setting unit may also be referred to as the first setting unit.

[0076] Depending on the context, the word "if" as used here can be interpreted as "when," "upon," "in response to determining," or "in response to detecting." Similarly, depending on the context, the phrase "if it is determined" or "if (the stated condition or event) is detected" can be interpreted as "when it is determined," "in response to determining," "when (the stated condition or event) is detected," or "in response to detecting (the stated condition or event)."

[0077] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.

[0078] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or in a combination of hardware and software functional units.

[0079] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

[0080] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.

Claims

1. A method for estimating the heavy metal content in soil, characterized in that the method includes: collecting soil samples from a target area and determining the content of multiple heavy metal elements in the soil samples; acquiring visible-near-infrared hyperspectral reflectance data and X-ray fluorescence spectral data of the soil samples; calculating, based on the X-ray fluorescence spectral data, the competition coefficients between heavy metal elements using the LV competition model, and grouping the multiple heavy metal elements according to the competition coefficients to determine the set of competing elements that have a competitive relationship with the target heavy metal element; constructing, based on the competition relationship grouping results and taking the random forest band importance as the basis, a multi-objective optimization function with the goals of maximizing the spectral response difference of the target heavy metal element and minimizing the spectral response difference of its competing elements, and, in combination with a wavelength spacing constraint, using the NSGA-II algorithm to select an optimal band ratio feature set from the visible-near-infrared hyperspectral reflectance data; training a machine learning regression model based on the optimal band ratio feature set and the heavy metal element content; and estimating the heavy metal content of the soil in a test area using the machine learning regression model.

2. The method according to claim 1, characterized in that calculating the competition coefficients between heavy metal elements based on the X-ray fluorescence spectral data using the LV competition model includes: extracting, from the X-ray fluorescence spectral data, the first characteristic peak intensity and the second characteristic peak intensity corresponding to the first heavy metal element and the second heavy metal element, and defining them as the first system state variable and the second system state variable, respectively; substituting the first system state variable and the second system state variable into the equations of the LV competition model; assuming that the first heavy metal element and the second heavy metal element are in dynamic equilibrium in the soil, fitting the equations of the LV competition model using the partial least squares method to obtain the first competition coefficient of the second heavy metal element on the first heavy metal element and the second competition coefficient of the first heavy metal element on the second heavy metal element; and subjecting the fitted first and second competition coefficients to a significance test, wherein, when either competition coefficient is greater than zero and passes the preset significance level test, the first heavy metal element and the second heavy metal element are determined to have the corresponding competition relationship.

3. The method according to claim 1, characterized in that, The step of grouping the various heavy metal elements according to the competition coefficient to determine the set of competing elements that compete with the target heavy metal element includes: The target heavy metal element is compared with each of the other heavy metal elements in pairs to determine their competitive relationship. All heavy metal elements that compete with the target heavy metal element are grouped together to form the competing element set.

4. The method according to claim 3, characterized in that, The construction of the multi-objective optimization function, which aims to maximize the spectral response difference of the target heavy metal element and minimize the spectral response difference of its competing elements, includes: The characteristic peak intensities of X-ray fluorescence of each competing element in the set of competing elements are normalized and superimposed. The superposition result is then substituted into the LV model as an overall variable to obtain the comprehensive competition coefficient of the set of competing elements for the target heavy metal element. Based on the comprehensive competition coefficient, a weighted constraint is applied to the optimization objective of minimizing the spectral response differences of competing elements in the multi-objective optimization function. The comprehensive competition coefficient is positively correlated with the constraint strength for minimizing the difference in spectral response of the competing elements.

5. The method according to claim 1, characterized in that taking the random forest band importance as the basis includes: for each target heavy metal element and its corresponding set of competing elements, constructing a random forest model; and, based on the constructed random forest model, calculating the importance score of each visible-near-infrared hyperspectral band for predicting the content of the corresponding heavy metal element, which is used as the band importance.

6. The method according to claim 1, characterized in that the multi-objective optimization function includes: Where M represents the target heavy metal element, {E1, ..., En} represents the set of competing elements, and n represents the number of competing elements. X denotes a candidate band pair, consisting of a band of high importance for the target heavy metal element and a band of low importance for the target heavy metal element, and RC represents the band importance calculated based on random forest. The wavelength spacing constraint includes: Where Φ(X) denotes the wavelength spacing constraint, WV denotes the wavelength value corresponding to the band, and g denotes the constraint threshold, which is used to limit the maximum spacing between band pairs.

7. The method according to claim 6, characterized in that using the NSGA-II algorithm to select the optimal band ratio feature set from the visible-near-infrared hyperspectral reflectance data includes: using the entire visible-near-infrared hyperspectral range as the search space and traversing all candidate band pairs; using the multi-objective optimization function as the fitness function and performing iterative optimization in conjunction with the wavelength spacing constraint; and outputting the optimal combinations of band pairs that satisfy the non-dominated sorting, which form the optimal band ratio feature set.

8. A soil heavy metal content estimation device, characterized in that the device includes: a collection module, used to collect soil samples from a target area and determine the content of multiple heavy metal elements in the soil samples; an acquisition module, used to acquire visible-near-infrared hyperspectral reflectance data and X-ray fluorescence spectral data of the soil samples; a grouping module, used to calculate the competition coefficients between heavy metal elements based on the X-ray fluorescence spectral data using the LV competition model, and to group the multiple heavy metal elements according to the competition coefficients to determine the set of competing elements that have a competitive relationship with the target heavy metal element; a screening module, used to construct, based on the competition relationship grouping results and taking the random forest band importance as the basis, a multi-objective optimization function with the goals of maximizing the spectral response difference of the target heavy metal element and minimizing the spectral response difference of its competing elements, and, in combination with a wavelength spacing constraint, to use the NSGA-II algorithm to select an optimal band ratio feature set from the visible-near-infrared hyperspectral reflectance data; a training module, used to train a machine learning regression model based on the optimal band ratio feature set and the heavy metal element content; and an estimation module, used to estimate the heavy metal content of the soil in a test area using the machine learning regression model.

9. A computer device, characterized in that it includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are configured to perform the method described in any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the storage medium stores computer-executable instructions for performing the method described in any one of claims 1 to 7.