AdaBoost ensemble learning method for predicting the emission wavelength of Pr3+-doped luminescent materials

By employing the AdaBoost ensemble learning method, the problems of inefficiency and low accuracy in predicting the emission wavelength of Pr3+-doped luminescent materials have been solved, achieving efficient and economical prediction and promoting the development and application of luminescent materials.

CN118863101B · Active · Publication Date: 2026-06-26 · GUILIN UNIV OF ELECTRONIC TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
GUILIN UNIV OF ELECTRONIC TECH
Filing Date
2024-07-22
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies for predicting the emission wavelength of Pr3+-doped luminescent materials suffer from problems such as long experimental time, high cost, cumbersome operation, and low prediction accuracy. Traditional methods lack flexibility and precision, hindering the development and application of materials.

Method used

The AdaBoost ensemble learning method is adopted. By collecting data from multiple sources, a dataset containing feature vectors is constructed. The AdaBoost algorithm is used to build a model, optimize the prediction model, limit the depth of the decision tree and set the node parameters to prevent overfitting, improve the model's generalization ability, and ensure the reproducibility of the results.

Benefits of technology

It significantly reduces experimental time and cost, improves the accuracy and reliability of predictions, and can more comprehensively and accurately capture the complex characteristics of Pr3+ doped luminescent materials, optimize material design, and promote the rapid development and application of luminescent materials.

✦ Generated by Eureka AI based on patent content.


Abstract

The application belongs to the technical field of luminescent material emission wavelength prediction and discloses an AdaBoost ensemble learning method for predicting the emission wavelength of Pr3+-doped luminescent materials. The method first collects descriptors of luminescent materials from multiple channels as input data for the model; the data are then screened and converted, separated into input variables and a target variable, and divided into a training set and a test set; finally, a model is established based on the AdaBoost algorithm, a combination of multiple decision tree regressors is learned iteratively, model performance is optimized by adjusting the parameters of the decision tree regressors, and the goodness of fit of the model is quantified by the coefficient of determination R². The method effectively reduces experimental time and cost and improves prediction accuracy and reliability.

Description

Technical Field

[0001] This invention relates to the technical field of emission wavelength prediction for luminescent materials, and specifically to an AdaBoost ensemble learning method for predicting the emission wavelength of Pr3+-doped luminescent materials.

Background Technology

[0002] The preparation of Pr3+-doped luminescent materials faces numerous challenges, including lengthy experimental trials, high experimental costs, and cumbersome procedures. These processes consume significant experimental time and equipment, limiting the efficiency and cost-effectiveness of materials research and development. Traditional methods for predicting the emission wavelength of luminescent materials typically rely on empirical formulas or simple regression models. For complex materials such as Pr3+-doped luminescent materials, these methods lack sufficient flexibility and accuracy, and the resulting low prediction accuracy hinders the development and application of Pr3+-doped luminescent materials.

Summary of the Invention

[0003] This invention aims to provide an AdaBoost ensemble learning method for predicting the emission wavelength of Pr3+-doped luminescent materials. A complete dataset containing various feature vectors is constructed by collecting literature and experimental data and by computing the relevant structural data with FullProf software. An AdaBoost ensemble learning model is then trained and optimized on this dataset, learning complex nonlinear relationships from a large amount of experimental data. The method effectively reduces experimental time and cost while improving the accuracy and reliability of predictions.

[0004] To achieve the above objectives, the present invention provides the following technical solution:

[0005] An AdaBoost ensemble learning method for predicting the emission wavelength of Pr3+-doped luminescent materials, comprising the following steps:

[0006] S1. Dataset Acquisition

[0007] Collect descriptors of Pr3+-doped luminescent materials from the literature; each material, together with its corresponding descriptors, forms a separate data entry;

[0008] S2. Dataset Preprocessing

[0009] If no CIF file exists for a material in the existing literature, the record is deleted. The crystal system of each material, recorded as an English name, is converted into a number that a computer can process; different numbers represent different crystal systems.

[0010] S3. Model Establishment

[0011] The first column of the dataset preprocessed in step S2 is used as the target variable, and the remaining columns are used as input variables, forming a sample dataset. The sample dataset is then divided into a training set and a test set in a set proportion. Finally, a model is built based on AdaBoost.

[0012] S4. Model Training

[0013] S5. Model Evaluation

[0014] Output the coefficient of determination R² for the training and test sets; the value of R² lies between 0 and 1, and the closer R² is to 1, the better the model fits the data.

[0015] Furthermore, in S1, the dataset is obtained from several sources: the literature, the Springer Materials and Materials Project databases, FullProf software, the pymatgen and periodictable libraries in Python, and ChatGPT. Specifically, the host matrix of the luminescent material, along with its emission wavelength, crystal structure, optimal concentration, and band gap, is extracted from the literature or from the Springer Materials and Materials Project databases. The Bond_str module in FullProf software calculates the coordination number, bond length, and average bond valence of the material from its CIF file. The pymatgen and periodictable libraries in Python automatically obtain the average atomic radius of the material. ChatGPT supplements missing band gap information based on the material's compound and crystal system.

[0016] Furthermore, the Python libraries pymatgen and periodictable obtain the average atomic radius of a material as follows: first, an Excel file whose first column contains chemical formulas is accepted as input; then, the Composition class is used to parse each chemical formula and obtain its elemental composition; finally, the atomic radius of each element is obtained and the average atomic radius is calculated.
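The averaging step described above can be sketched in plain Python. This is a minimal illustration, not the patent's actual script: the regular-expression parser stands in for pymatgen's Composition class and handles only flat formulas without parentheses, the radius table holds placeholder values for illustration only, and weighting the average by atom count is an assumption, since the text does not say whether the mean is taken per atom or per element. Reading the Excel file is omitted; formulas are passed in as strings.

```python
import re

# Placeholder radii in angstroms, for illustration only. A real pipeline
# would look these up through pymatgen or periodictable instead.
ILLUSTRATIVE_RADII = {"Ca": 1.80, "Ti": 1.40, "O": 0.60}


def parse_formula(formula):
    """Return {element: amount} for a flat formula such as 'CaTiO3'."""
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def average_atomic_radius(formula, radii):
    """Atom-count-weighted mean radius over all atoms in the formula (assumed weighting)."""
    counts = parse_formula(formula)
    total_atoms = sum(counts.values())
    return sum(radii[element] * n for element, n in counts.items()) / total_atoms


if __name__ == "__main__":
    print(parse_formula("CaTiO3"))
    print(average_atomic_radius("CaTiO3", ILLUSTRATIVE_RADII))
```

If an unweighted mean over distinct elements is intended instead, only the last line of `average_atomic_radius` would change.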

[0017] Furthermore, in S3, to evaluate the model's performance, R² is chosen as the statistical indicator for assessing the goodness of fit of the regression model. R² measures how well the model explains the variability of the dependent variable, i.e., how well the model fits the observed data. R² is calculated as:

[0018] $$R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}$$

[0019] In the formula, $SS_{\mathrm{res}}$ is the residual sum of squares, i.e., the sum of the squared differences between the model predictions and the actual observations; $SS_{\mathrm{tot}}$ is the total sum of squares, i.e., the sum of the squared differences between the dependent variable and its mean.
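The R² definition above translates directly into a few lines of Python. The sketch below is illustrative only; the function name and the wavelength values are hypothetical. It also shows that, on held-out data, this expression can fall below 0 when the predictions are worse than simply predicting the mean.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [600.0, 610.0, 620.0]                       # hypothetical wavelengths in nm
    print(r_squared(observed, [600.0, 610.0, 620.0]))      # perfect predictions
    print(r_squared(observed, [610.0, 610.0, 610.0]))      # always predicting the mean
    print(r_squared(observed, [620.0, 610.0, 600.0]))      # worse than the mean
```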

[0020] Furthermore, in S3 to S5, the steps for model building, training, and evaluation are as follows:

[0021] A1. Divide the dataset into a training set and a test set in a 9:1 ratio;

[0022] A2. Standardize the feature data to ensure a uniform scale during model training. Use the StandardScaler method from the sklearn library to standardize the data, eliminating differences in scale between features and ensuring that the weight of each feature proportionally affects the model's prediction results during training. The standardization formula is as follows:

[0023] z=(x-μ) / σ

[0024] In the formula, x is the training data value, μ is the mean of the training data, σ is the standard deviation of the training data, and z is the standardized data value;

[0025] A3. Create a decision tree regressor and set the following parameters to control the model's complexity and generalization ability:

[0026] max_depth=3: Limits the maximum depth of the decision tree to 3 to avoid the model overfitting the training data;

[0027] min_samples_split=10: Requires at least 10 samples before splitting a node to ensure that each split can effectively improve the model's predictive ability;

[0028] min_samples_leaf = 1: Each leaf node contains at least one sample to avoid making the model too complex during training and to improve generalization ability;

[0029] min_weight_fraction_leaf = 0.001: Sets the minimum weight of leaf node samples to 0.001 to control model complexity and prevent overfitting;

[0030] ccp_alpha = 0.5: Use a complexity pruning parameter of 0.5 to adjust the structure of the decision tree to ensure that the model can effectively generalize when dealing with complex data;

[0031] A4. Using the decision tree regressor described above as the base learner, create an AdaBoost regression model:

[0032] n_estimators = 65: The number of base learners;

[0033] random_state=10: Random seed, ensuring that the results can be repeated;

[0034] The AdaBoost algorithm learns a combination of multiple base learners iteratively, with each base learner focusing on the data points that were poorly predicted in the previous round, thereby continuously improving the overall model performance; a random seed of 10 is set to ensure the repeatability of model training, so that the same results are produced each time the model is run.

[0035] A5. Train the AdaBoost model;

[0036] A6. Output the coefficient of determination R² for the training and test sets to evaluate the model;

[0037] A7. Create a scatter plot showing the relationship between the true and predicted values; use differently colored points to represent the training and/or test sets and a dashed line to represent the perfect fit; and display the R² values of the training and test sets in the plot.
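Steps A1 to A6 can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the applicant's code: the data are synthetic stand-ins for the descriptor table, the seed used for the train/test split is an assumption (the text only fixes `random_state=10` for the AdaBoost model), and the base learner is passed positionally because its keyword name differs between scikit-learn versions. The hyperparameters are the ones listed in A3 and A4.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the dataset: 200 materials, 6 descriptors (assumed sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 600.0 + 20.0 * X[:, 0] - 10.0 * X[:, 1] + rng.normal(scale=2.0, size=200)

# A1: 9:1 split; the split seed is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=10
)

# A2: fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# A3: base decision tree regressor with the listed parameters.
tree = DecisionTreeRegressor(
    max_depth=3,
    min_samples_split=10,
    min_samples_leaf=1,
    min_weight_fraction_leaf=0.001,
    ccp_alpha=0.5,
)

# A4: AdaBoost regression model built on the base learner.
model = AdaBoostRegressor(tree, n_estimators=65, random_state=10)

# A5: train the model.
model.fit(X_train_s, y_train)

# A6: coefficient of determination on both sets.
r2_train = r2_score(y_train, model.predict(X_train_s))
r2_test = r2_score(y_test, model.predict(X_test_s))
print(f"R2 train = {r2_train:.3f}, R2 test = {r2_test:.3f}")
```

The scores printed here describe the synthetic data only and say nothing about the performance reported for the real dataset.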

[0038] The principles and beneficial effects of the technical solution are as follows:

[0039] The AdaBoost ensemble learning method provided by this invention for predicting the emission wavelength of Pr3+-doped luminescent materials first collects descriptors of luminescent materials from various sources (literature, databases, software, etc.) as input data for the model. Next, the data are filtered and transformed to ensure their integrity and consistency for subsequent model processing. The preprocessed data are then separated into input variables and a target variable and divided into a training set and a test set. Finally, a model is built based on the AdaBoost algorithm, iteratively learning a combination of multiple decision tree regressors, optimizing model performance by adjusting the parameters of the decision tree regressors, and calculating the coefficient of determination R² to quantify the model's fit.

[0040] By combining the prediction results of multiple base learners with the AdaBoost ensemble learning algorithm, the method provided by this invention can significantly improve the accuracy of predicting the emission wavelength of Pr3+-doped luminescent materials. By limiting the maximum depth of the decision tree and setting parameters such as the number of samples required for a node split, overfitting can be effectively prevented and the model's ability to generalize to new data can be enhanced. By setting a random seed, the repeatability of model training is ensured, so that the same results are obtained on every run. By creating visualizations such as scatter plots, the relationship between the model's predictions and the actual observations can be displayed intuitively, improving the interpretability of the model.

[0041] In summary, the AdaBoost ensemble learning method provided by this invention for predicting the emission wavelength of Pr3+-doped luminescent materials effectively reduces experimental time and cost and improves the accuracy and reliability of predictions. It captures the complex properties of Pr3+-doped luminescent materials more comprehensively and accurately, and allows material design to be tailored to the needs of different application scenarios. It thus provides an efficient, economical, and reliable solution for material design and development and promotes the rapid development and application of luminescent materials.

Description of the Drawings

[0042] Figure 1 is a flowchart of the ensemble learning method of this invention for predicting the emission wavelength of Pr3+-doped luminescent materials;

[0043] Figure 2 is a simulation plot of the predicted emission wavelength of Pr3+-doped luminescent materials based on XGBoost modeling;

[0044] Figure 3 is a simulation plot of the predicted emission wavelength of Pr3+-doped luminescent materials based on Random Forest modeling;

[0045] Figure 4 is a simulation plot of the predicted emission wavelength of Pr3+-doped luminescent materials based on AdaBoost modeling;

[0046] Figure 5 is a comparative analysis of the accuracy of the models built with XGBoost, Random Forest, and AdaBoost, respectively, for predicting the emission wavelength of Pr3+-doped luminescent materials.

Detailed Implementation

[0047] The present invention will now be described in further detail with reference to the accompanying drawings and embodiments:

[0048] As shown in Figure 1, the AdaBoost ensemble learning method for predicting the emission wavelength of Pr3+-doped luminescent materials includes the following steps:

[0049] S1. Dataset Acquisition

[0050] Collect descriptors of Pr3+-doped luminescent materials from the literature; each material, together with its corresponding descriptors, forms a separate data entry;

[0051] The dataset is obtained from several sources: the literature, the Springer Materials and Materials Project databases, FullProf software, the pymatgen and periodictable libraries in Python, and ChatGPT. The host matrix of the luminescent material, along with its emission wavelength, crystal structure, optimal concentration, and band gap, is extracted from the literature or from the Springer Materials and Materials Project databases. The Bond_str module in FullProf software calculates the coordination number, bond length, and average bond valence of the material from its CIF file. The pymatgen and periodictable libraries in Python automatically obtain the average atomic radius of the material. ChatGPT supplements missing band gap information based on the material's compound and crystal system.

[0052] The pymatgen and periodictable libraries in Python obtain the average atomic radius of a material as follows: first, an Excel file whose first column contains chemical formulas is accepted as input; then, the Composition class is used to parse each chemical formula and obtain its elemental composition; finally, the atomic radius of each element is obtained and the average atomic radius is calculated.

[0053] S2. Dataset Preprocessing

[0054] If no CIF file exists for a material in the existing literature, the record is deleted. The crystal system of each material, recorded as an English name, is converted into a number that a computer can process; different numbers represent different crystal systems.

[0055] S3. Model Establishment

[0056] The first column of the dataset preprocessed in step S2 is used as the target variable, and the remaining columns are used as input variables, forming a sample dataset. The sample dataset is then divided into a training set and a test set in a set proportion. Finally, a model is built based on AdaBoost.

[0057] To evaluate the performance of the model, R² is selected as the statistical indicator for evaluating the goodness of fit of the regression model. R² measures the degree to which the model explains the variability of the dependent variable, i.e., the goodness of fit of the model to the observed data. R² is calculated as:

[0058] $$R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}$$

[0059] In the formula, $SS_{\mathrm{res}}$ is the residual sum of squares, i.e., the sum of the squared differences between the model predictions and the actual observations; $SS_{\mathrm{tot}}$ is the total sum of squares, i.e., the sum of the squared differences between the dependent variable and its mean.

[0060] S4. Model Training

[0061] S5. Model Evaluation

[0062] Output the coefficient of determination R² for the training and test sets; the value of R² lies between 0 and 1, and the closer R² is to 1, the better the model fits the data.

[0063] The steps for model building, training, and evaluation are as follows:

[0064] A1. Divide the dataset into a training set and a test set in a 9:1 ratio;

[0065] A2. Standardize the feature data to ensure a uniform scale during model training. Use the StandardScaler method from the sklearn library to standardize the data, eliminating differences in scale between features and ensuring that the weight of each feature proportionally affects the model's prediction results during training. The standardization formula is as follows:

[0066] z=(x-μ) / σ

[0067] In the formula, x is the training data value, μ is the mean of the training data, σ is the standard deviation of the training data, and z is the standardized data value;

[0068] A3. Create a decision tree regressor and set the following parameters to control the model's complexity and generalization ability:

[0069] max_depth=3: Limits the maximum depth of the decision tree to 3 to avoid the model overfitting the training data;

[0070] min_samples_split=10: Requires at least 10 samples before splitting a node to ensure that each split can effectively improve the model's predictive ability;

[0071] min_samples_leaf = 1: Each leaf node contains at least one sample to avoid making the model too complex during training and to improve generalization ability;

[0072] min_weight_fraction_leaf = 0.001: Sets the minimum weight of leaf node samples to 0.001 to control model complexity and prevent overfitting;

[0073] ccp_alpha = 0.5: Use a complexity pruning parameter of 0.5 to adjust the structure of the decision tree to ensure that the model can effectively generalize when dealing with complex data;

[0074] A4. Using the decision tree regressor described above as the base learner, create an AdaBoost regression model:

[0075] n_estimators = 65: The number of base learners;

[0076] random_state=10: Random seed, ensuring that the result can be repeated;

[0077] The AdaBoost algorithm learns a combination of multiple base learners iteratively, with each base learner focusing on the data points that were poorly predicted in the previous round, thereby continuously improving the overall model performance; a random seed of 10 is set to ensure the repeatability of model training, so that the same results are produced each time the model is run.

[0078] A5. Train the AdaBoost model;

[0079] A6. Output the coefficient of determination R² for the training and test sets to evaluate the model;

[0080] A7. Create a scatter plot showing the relationship between the true and predicted values; use differently colored points to represent the training and/or test sets and a dashed line to represent the perfect fit; and display the R² values of the training and test sets in the plot.
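The standardization in step A2 is worth spelling out, because μ and σ are defined on the training data and are then reused for the test data. The sketch below uses only the standard library and hypothetical values; it uses the population standard deviation, which is also what scikit-learn's StandardScaler computes.

```python
from statistics import fmean, pstdev


def fit_scaler(train_column):
    """Return (mu, sigma) estimated from the training values of one feature."""
    return fmean(train_column), pstdev(train_column)


def standardize(values, mu, sigma):
    """Apply z = (x - mu) / sigma with parameters taken from the training set."""
    return [(x - mu) / sigma for x in values]


# Hypothetical values of a single descriptor.
train = [2.0, 4.0, 6.0]
test = [8.0]

mu, sigma = fit_scaler(train)
z_train = standardize(train, mu, sigma)
z_test = standardize(test, mu, sigma)  # test data reuse the training mu and sigma

if __name__ == "__main__":
    print(mu, sigma)
    print(z_train, z_test)
```

Reusing the training statistics keeps information about the test set out of the fitted model, which is why the test value here lands outside the range of the standardized training values.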

[0081] The specific implementation process is as follows:

[0082] (1) Dataset Acquisition

[0083] Collect descriptors of Pr3+-doped luminescent materials from the literature; each material, together with its corresponding descriptors, forms a separate data entry;

[0084] This involves extracting the matrix of the luminescent material, as well as the emission wavelength, crystal structure, optimal concentration, and band gap information of the matrix from the literature; if the information is not provided in the literature, it is supplemented by data from the Springer Materials and Materials Project databases, or by Findit software.

[0085] Using the Bond_str module in the FullProf software, the coordination number, bond length, and average bond valence of the material are calculated based on the material's CIF file.

[0086] The average atomic radius is obtained automatically using the pymatgen and periodictable libraries in Python, as follows:

[0087] Accepts Excel files, with the first column containing chemical formulas as input;

[0088] Use the Composition class to parse chemical formulas and obtain the composition of each element;

[0089] Obtain the atomic radius of each element and calculate the average atomic radius;

[0090] For band gap information not provided in the literature, ChatGPT was used to supplement the corresponding band gap information based on the compound and crystal system;

[0091] (2) Dataset preprocessing

[0092] If no CIF file exists for a material in the existing literature, the record is deleted. The crystal system of each material, recorded as an English name, is converted into a number that a computer can process, using the numbers 1 to 8; different numbers represent different crystal systems.
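The two preprocessing rules above can be sketched as follows. The record layout, the field names, and in particular the assignment of the codes 1 to 8 to crystal systems are hypothetical; the text states only that the numbers 1 to 8 are used, not which number stands for which system.

```python
# Hypothetical code assignment; the source does not specify the mapping.
CRYSTAL_SYSTEM_CODES = {
    "triclinic": 1,
    "monoclinic": 2,
    "orthorhombic": 3,
    "tetragonal": 4,
    "trigonal": 5,
    "hexagonal": 6,
    "cubic": 7,
    "rhombohedral": 8,
}


def preprocess(records):
    """Drop records without a CIF file and encode the crystal system as a number."""
    cleaned = []
    for record in records:
        if not record.get("cif_path"):
            continue  # no CIF file available: delete the record
        encoded = dict(record)  # copy so the input list is left untouched
        name = record["crystal_system"].strip().lower()
        encoded["crystal_system"] = CRYSTAL_SYSTEM_CODES[name]
        cleaned.append(encoded)
    return cleaned


# Placeholder entries, not taken from the patent's dataset.
records = [
    {"material": "material_1", "crystal_system": "Orthorhombic", "cif_path": "m1.cif"},
    {"material": "material_2", "crystal_system": "Cubic", "cif_path": None},
    {"material": "material_3", "crystal_system": "cubic", "cif_path": "m3.cif"},
]
cleaned = preprocess(records)

if __name__ == "__main__":
    for row in cleaned:
        print(row)
```

An unknown crystal system name raises a `KeyError` here, which makes gaps in the mapping visible instead of silently producing a missing code.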

[0093] (3) Model building

[0094] In the processed dataset, the first column is used as the target variable and the remaining columns are used as input variables to form a sample dataset; then these sample data are divided into training set and test set in a 9:1 ratio.

[0095] In the model selection and parameter tuning process, three commonly used machine learning algorithms (Random Forest, AdaBoost, and XGBoost) were employed, and the optimal parameter settings were sought for each model. To evaluate model performance, the statistical indicator R-squared (R²) is used. R² evaluates the goodness of fit of a regression model: it measures how well the model explains the variability of the dependent variable (target), i.e., the goodness of fit of the model to the observed data. The value of R² is between 0 and 1, with values closer to 1 indicating a better fit of the model to the data. R² is calculated as:

[0096] $$R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}$$

[0097] In the formula, $SS_{\mathrm{res}}$ is the residual sum of squares, i.e., the sum of the squared differences between the model's predicted values and the actual observed values; $SS_{\mathrm{tot}}$ is the total sum of squares, i.e., the sum of the squared differences between the dependent variable (target) and its mean.

[0098] Finally, the R² values (coefficients of determination) for the training and test sets are output and a fit plot is created, i.e., a scatter plot showing the relationship between the true and predicted values. Figures 2 to 4 show the simulation plots of the three machine learning algorithms XGBoost, Random Forest, and AdaBoost, respectively. In the plots, dark dots represent the training set, light dots represent the test set, and the dashed line represents the perfect fit (predicted values equal to true values); the R² values for the training and test sets are also displayed. Figure 5 shows the accuracy comparison of the models built with XGBoost, Random Forest, and AdaBoost. The model built using AdaBoost performs better than the models built using XGBoost and Random Forest. Based on this, the parameters of the AdaBoost model are explained in detail below:

[0099] a. Divide the data into a training set and a test set, with a ratio of 9:1 between the training set and the test set;

[0100] b. Standardize the feature data: to ensure that the feature data have a uniform scale during model training, the StandardScaler method from the sklearn library is used to standardize the data. This eliminates differences in the units of measurement between features and ensures that the weight of each feature proportionally affects the model's prediction results during training. The standardization formula is as follows:

[0101] z=(x-μ) / σ

[0102] Where: x is the training data value, μ is the mean of the training data, σ is the standard deviation of the training data, and z is the standardized data value;

[0103] c. Create a decision tree regressor and set the following parameters to control the model's complexity and generalization ability:

[0104] max_depth=3: Limits the maximum depth of the decision tree to 3 to avoid the model overfitting the training data;

[0105] min_samples_split=10: Requires at least 10 samples before splitting a node to ensure that each split can effectively improve the model's predictive ability;

[0106] min_samples_leaf = 1: Each leaf node contains at least one sample to avoid making the model too complex during training and to improve generalization ability;

[0107] min_weight_fraction_leaf = 0.001: Sets the minimum weight of leaf node samples to 0.001, which helps control the complexity of the model and prevent overfitting;

[0108] ccp_alpha = 0.5: Use a complexity pruning parameter of 0.5 to further adjust the structure of the decision tree and ensure that the model can effectively generalize when dealing with complex data;

[0109] e. Creating an AdaBoost regression model: Using the decision tree regressor described above as a base learner, create an AdaBoost regression model:

[0110] n_estimators = 65: The number of base learners.

[0111] random_state=10: Random seed, ensuring that the result can be repeated;

[0112] The AdaBoost algorithm iteratively learns a combination of multiple base learners, each focusing on the data points that were poorly predicted in the previous round, thereby continuously improving the overall model's performance. In this embodiment, 65 base learners are used, and their predictions are combined to improve the accuracy and stability of the overall model. Furthermore, setting the random seed to 10 ensures the reproducibility of model training, so that each run produces the same results.

[0113] f. Train the AdaBoost model;

[0114] g. Evaluate the model: output the R² values (coefficients of determination) for the training and test sets;

[0115] h. Fit plot: create a scatter plot showing the relationship between the true and predicted values; dark dots represent the training set, light dots represent the test set, and the dashed line represents the perfect fit (predicted values equal true values). The plot also displays the R² values for the training and test sets.
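The statement in step e that each base learner focuses on the samples handled badly in the previous round can be made concrete with one round of sample reweighting. The sketch below follows the AdaBoost.R2 scheme with a linear loss, which is the algorithm scikit-learn documents for its AdaBoostRegressor; that the patent's model uses exactly this variant and loss is an assumption, and the function name and numbers are hypothetical.

```python
def adaboost_r2_reweight(weights, y_true, y_pred):
    """One AdaBoost.R2 round with linear loss; returns (new_weights, beta)."""
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    max_err = max(abs_err)
    if max_err == 0:
        return list(weights), 0.0  # perfect learner: nothing to reweight
    loss = [e / max_err for e in abs_err]  # per-sample loss scaled to [0, 1]
    avg_loss = sum(w * l for w, l in zip(weights, loss))
    if avg_loss >= 0.5:
        raise ValueError("learner too weak: boosting would stop at this round")
    beta = avg_loss / (1.0 - avg_loss)  # smaller beta means a better learner
    # Well-predicted samples (loss near 0) are scaled down by about beta,
    # so badly predicted samples gain relative weight after normalization.
    raw = [w * beta ** (1.0 - l) for w, l in zip(weights, loss)]
    total = sum(raw)
    return [r / total for r in raw], beta


# Four samples with equal weight; only the last one is predicted badly.
new_weights, beta = adaboost_r2_reweight(
    [0.25, 0.25, 0.25, 0.25],
    [600.0, 610.0, 620.0, 630.0],
    [600.0, 610.0, 620.0, 640.0],
)

if __name__ == "__main__":
    print(beta)
    print(new_weights)
```

In this example the badly predicted sample ends up carrying half of the total weight, so the next tree is trained mostly to correct it.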

[0116] The above descriptions are merely embodiments of the present invention, and common knowledge regarding specific technical solutions or characteristics is not elaborated upon here. It should be noted that those skilled in the art can make various modifications and improvements without departing from the technical solutions of the present invention, and these should also be considered within the scope of protection of the present invention. These modifications and improvements will not affect the effectiveness of the implementation of the present invention or the practicality of the patent. The scope of protection claimed in this application should be determined by the content of its claims, and the specific embodiments described in the specification can be used to interpret the content of the claims.

Claims

1. An AdaBoost ensemble learning method for predicting the emission wavelength of Pr3+-doped luminescent materials, characterized in that it includes the following steps:
S1. Dataset Acquisition: collect descriptors of Pr3+-doped luminescent materials from the literature; each material, together with its corresponding descriptors, forms a separate data entry;
S2. Dataset Preprocessing: if no CIF file exists for a material in the existing literature, the record is deleted; the crystal system of each material, recorded as an English name, is converted into a number that a computer can process, and different numbers represent different crystal systems;
S3. Model Establishment: the first column of the dataset preprocessed in step S2 is used as the target variable, and the remaining columns are used as input variables, forming a sample dataset; the sample dataset is then divided into a training set and a test set in a set proportion; finally, a model is built based on AdaBoost;
S4. Model Training;
S5. Model Evaluation: output the coefficient of determination R² for the training and test sets, where R² ranges from 0 to 1, and the closer R² is to 1, the better the model fits the data;
In S3 to S5, the steps for model building, training, and evaluation are as follows:
A1. Divide the dataset into a training set and a test set in a 9:1 ratio;
A2. Standardize the feature data to ensure a uniform scale during model training. Use the StandardScaler method from the sklearn library to standardize the data, eliminating differences in scale between features and ensuring that the weight of each feature proportionally affects the model's prediction results during training. The standardization formula is as follows: z=(x-μ) / σ. In the formula, x is the training data value, μ is the mean of the training data, σ is the standard deviation of the training data, and z is the standardized data value;
A3. Create a decision tree regressor and set the following parameters to control the model's complexity and generalization ability:
max_depth=3: Limits the maximum depth of the decision tree to 3 to avoid the model overfitting the training data;
min_samples_split=10: Requires at least 10 samples before splitting a node to ensure that each split can effectively improve the model's predictive ability;
min_samples_leaf=1: Each leaf node contains at least one sample to avoid the model becoming too complex during training and to improve generalization ability;
min_weight_fraction_leaf=0.001: Sets the minimum weight of leaf node samples to 0.001 to control model complexity and prevent overfitting;
ccp_alpha=0.5: Use a complexity pruning parameter of 0.5 to adjust the structure of the decision tree to ensure that the model can effectively generalize when dealing with complex data;
A4. Using the decision tree regressor described above as the base learner, create an AdaBoost regression model:
n_estimators=65: The number of base learners;
random_state=10: Random seed, ensuring that the result can be repeated;
The AdaBoost algorithm learns a combination of multiple base learners iteratively, with each base learner focusing on the data points that were poorly predicted in the previous round, thereby continuously improving the overall model performance; a random seed of 10 is set to ensure the repeatability of model training, so that the same results are produced each time the model is run;
A5. Train the AdaBoost model;
A6. Output the coefficient of determination R² for the training and test sets to evaluate the model;
A7. Create a scatter plot showing the relationship between the true and predicted values; use differently colored points to represent the training set and/or test set and a dashed line to represent the perfect fit; and display the R² values of the training set and test set in the plot.

2. The AdaBoost ensemble learning method for predicting the emission wavelength of Pr3+-doped luminescent materials according to claim 1, characterized in that, in S1, the dataset is obtained from the literature, the Springer Materials and Materials Project databases, FullProf software, the pymatgen and periodictable libraries in Python, and ChatGPT. Specifically, the host matrix of the luminescent material, along with its emission wavelength, crystal structure, optimal concentration, and band gap, is extracted from the literature or from the Springer Materials and Materials Project databases. The Bond_str module in FullProf software calculates the coordination number, bond length, and average bond valence of the material from its CIF file. The pymatgen and periodictable libraries in Python automatically obtain the average atomic radius of the material. ChatGPT supplements missing band gap information based on the material's compound and crystal system.

3. The AdaBoost ensemble learning method for predicting the emission wavelength of Pr3+-doped luminescent materials according to claim 2, characterized in that the pymatgen and periodictable libraries in Python obtain the average atomic radius of a material as follows: first, an Excel file whose first column contains chemical formulas is accepted as input; then, the Composition class is used to parse each chemical formula and obtain its elemental composition; finally, the atomic radius of each element is obtained and the average atomic radius is calculated.

4. The AdaBoost ensemble learning method for predicting the emission wavelength of Pr3+-doped luminescent materials according to claim 3, characterized in that, in S3, R² is used as the statistical indicator for evaluating the goodness of fit of the regression model in order to assess the model's performance. R² measures the extent to which the model explains the variability of the dependent variable, i.e., the goodness of fit of the model to the observed data. The formula for calculating R² is: $R^2 = 1 - SS_{\mathrm{res}} / SS_{\mathrm{tot}}$. In the formula, $SS_{\mathrm{res}}$ is the residual sum of squares, i.e., the sum of the squared differences between the model predictions and the actual observations; $SS_{\mathrm{tot}}$ is the total sum of squares, i.e., the sum of the squared differences between the dependent variable and its mean.