Method and system for predicting hydrogen content in feedstock oil and heavy distillate oil

CN116029434B · Active · Publication Date: 2026-06-30 · EAST CHINA UNIV OF SCI & TECH


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: EAST CHINA UNIV OF SCI & TECH
Filing Date: 2023-01-04
Publication Date: 2026-06-30


IPC: G06Q10/04; G06Q50/04; G06N3/045; G06N3/048; G06N3/08



Abstract

This invention provides a method and system for predicting the hydrogen content in feedstock oil and heavy distillate oil, relying solely on the D86 distillation curve and density from routine physicochemical property test reports to calculate the hydrogen content of feedstock oil and heavy distillate oil. The system consists of four modules: (1) a feature selection module for selecting modeling characteristic variables; (2) a data standardization module for standardizing data to be directly used for modeling; (3) a model training module for establishing a hydrogen content prediction model based on a stacked autoencoder and neural network; and (4) a model evaluation module for optimizing model structure, parameters, and verifying the model.


Description

Technical Field

[0001] This invention relates to the field of petroleum processing technology, specifically to a method and system for predicting the hydrogen content in feedstock oil and heavy distillate oil.

Background Technology

[0002] Crude oil and heavy distillate oils are mixtures of various complex organic compounds, including hydrocarbons composed of carbon and hydrogen and non-hydrocarbons composed of carbon, hydrogen, and other elements. The structure and content of these hydrocarbons and non-hydrocarbons determine the properties of heavy distillate oils and their products. Petroleum processing is primarily a process of rebalancing carbon, hydrogen, and other elements in crude oil, including decarbonization and hydrogenation; the corresponding technical routes are the decarbonization technical route and the hydrogenation technical route.

[0003] Hydrogen balance calculations and analyses can evaluate the rationality of product distribution and the hydrogen utilization efficiency of petroleum processing units. In industrial production, component analysis of liquefied petroleum gas (LPG) and dry gas is widely conducted: refineries have the instruments and equipment for full composition analysis of gaseous hydrocarbons, and the hydrogen content of these streams can be calculated from their composition. However, most refineries lack dedicated equipment for determining the hydrogen content of liquid oil products, so samples must be sent to specialized institutions for testing, which results in poor timeliness. Furthermore, the empirical formulas that researchers have proposed for calculating the hydrogen content of liquid oil products require parameters that are not all among the routine analytical items measured by refineries. Using these formulas therefore requires additional measurement of specific physicochemical properties, which makes them inconvenient to use.

[0004] In summary, the research direction for those skilled in the art is to overcome the limitations of existing technologies and provide a convenient method for calculating the hydrogen content of feedstock oils and heavy distillate oils, which can be used to analyze and evaluate the product distribution and operational rationality of processing units.

Summary of the Invention

[0005] To address the shortcomings of existing technologies, this invention proposes a method for predicting the hydrogen content in feedstock oils and heavy distillate oils. Compared with existing techniques, this invention only requires conventional physicochemical property analysis data from refineries to achieve accurate calculation of hydrogen content, avoiding the drawbacks of long feedback cycles and high costs associated with chemical analysis. The method and system for predicting hydrogen content in feedstock oils and heavy distillate oils proposed in this invention can quickly calculate the hydrogen content of feedstock oils and heavy distillate oils, providing inspiration and guidance for production.

[0006] The specific technical solution of this invention is a method for predicting the hydrogen content in feedstock oil and heavy distillate oil, comprising the following steps:

[0007] Step 1: Selection of feature variables for the model based on Pearson correlation analysis. By performing correlation analysis on the hydrogen content and the conventional physicochemical properties of the sample data, the feature variables for establishing the machine learning model are determined. The conventional physicochemical properties include: density (x1), initial boiling point of the D86 distillation curve (x2), 10% distillation temperature (x3), 20% distillation temperature (x4), 30% distillation temperature (x5), 40% distillation temperature (x6), 50% distillation temperature (x7), 60% distillation temperature (x8), 70% distillation temperature (x9), 80% distillation temperature (x10), 90% distillation temperature (x11), and final boiling point (x12). Based on the calculated Pearson correlation coefficients and a set critical value λ (λ in the range [0, 1]), the conventional physicochemical properties required to build the model are determined.

[0008] The principle of Pearson feature selection is explained below:

[0009] The Pearson correlation coefficient measures the linear correlation between a feature variable and the target variable. The Pearson correlation coefficient between variable x_i (the i-th conventional physicochemical property) and variable y (the hydrogen content analysis value) is the covariance of x_i and y divided by the product of the standard deviations of x_i and y. It can be viewed as a standardized covariance that removes the influence of the two variables' dimensions (units). Covariance measures the degree to which the two variables deviate jointly from their means; a positive covariance indicates a positive correlation between the two variables, and a negative covariance indicates a negative correlation. The formula for calculating the Pearson correlation coefficient between x_i and y is as follows:

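[0010] $$r_i = \frac{\operatorname{cov}(x_i, y)}{\sigma_{x_i}\,\sigma_y} = \frac{\sum_{k=1}^{n}\left(x_{i,k}-\bar{x}_i\right)\left(y_k-\bar{y}\right)}{\sqrt{\sum_{k=1}^{n}\left(x_{i,k}-\bar{x}_i\right)^2}\,\sqrt{\sum_{k=1}^{n}\left(y_k-\bar{y}\right)^2}}$$

where r_i is the Pearson correlation coefficient between the i-th property and the hydrogen content, n is the number of samples, x_{i,k} and y_k are the values of the k-th sample, and a bar denotes the sample mean.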

[0011] The result ranges from [-1, 1], where -1 indicates a perfect negative correlation, 1 indicates a perfect positive correlation, and 0 indicates no linear correlation. The magnitude of the absolute value indicates the strength of the correlation.

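[0012] If the absolute value of the Pearson correlation coefficient between x_i and y is greater than the critical value λ, the i-th variable is selected as a model input variable.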

[0013] By using a feature selection module based on Pearson correlation analysis, redundant variables that are irrelevant to the target variable can be removed, which can effectively improve the prediction accuracy of the prediction model.
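The selection rule of Step 1 can be illustrated with a minimal sketch in plain Python. The function names `pearson` and `select_features`, the column names, and the sample values are assumptions made for illustration and are not taken from the patent; only the rule "keep a variable when the absolute correlation exceeds λ" comes from the text.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in xs))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (dev_x * dev_y)

def select_features(columns, y, lam):
    """Return the names of columns whose |r| with y exceeds the critical value lam."""
    return [name for name, xs in columns.items() if abs(pearson(xs, y)) > lam]

# Made-up toy data, for illustration only (not from the patent).
columns = {
    "density": [0.86, 0.88, 0.90, 0.92, 0.94],
    "initial_boiling_point": [210.0, 190.0, 230.0, 190.0, 210.0],
}
hydrogen = [13.2, 12.9, 12.5, 12.2, 11.8]

selected = select_features(columns, hydrogen, lam=0.38)
print(selected)
```

In this toy data the density column is strongly (negatively) correlated with hydrogen content and is kept, while the second column is nearly uncorrelated and is dropped.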

[0014] Step 2: Data standardization for constructing the hydrogen content prediction model of feedstock oil and heavy distillate oil. The sample data are standardized so that they are suitable for building machine learning models.

[0015] Data standardization uses the min-max standardization method. Data standardization is a fundamental task in data mining. In real life, a target variable can be considered to be influenced and controlled by multiple characteristic variables, and these characteristic variables may have different dimensions and numerical magnitudes. If not processed, it may affect the results of data analysis. The data processing measures taken to eliminate the differences in dimensions and value ranges between indicators are called data standardization.

[0016] Min-max standardization, also known as deviation standardization, is a linear transformation of the original data, mapping data values to the range [0,1]. The transformation formula is as follows:

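[0017] $$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where x is an original value of a variable, x_min and x_max are the minimum and maximum values of that variable in the data, and x' is the standardized value.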

[0018] Min-max standardization preserves the relationships that exist in the original data and is the simplest method of eliminating the influence of units and data range.
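As a small illustration of min-max standardization, the sketch below scales two columns of very different magnitude to the range [0, 1]. The function name `min_max_scale`, the handling of a constant column, and the sample values are assumptions for illustration only.

```python
def min_max_scale(values):
    """Linearly map a column of numbers onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column is mapped to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up values: density and a D86 distillation temperature.
density = [0.86, 0.90, 0.94]
t10 = [250.0, 300.0, 350.0]

scaled_density = min_max_scale(density)
scaled_t10 = min_max_scale(t10)
print(scaled_density, scaled_t10)
```

After scaling, both columns span the same range, so neither dominates the model simply because of its units.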

[0019] Step 3: Construction of the training and prediction module for the hydrogen content model of feedstock oil and heavy distillate oil. Using the conventional physicochemical property reports (i.e., density and D86 distillation curves) and the corresponding hydrogen content samples provided by petrochemical enterprises, a machine learning prediction model based on a stacked autoencoder is established.

[0020] The following section introduces the principle and model structure of a stacked autoencoder:

[0021] An autoencoder (AE) is an unsupervised method for compressing data dimensionality and representing data features. An autoencoder is a feedforward neural network containing three layers of neurons. After training, it can copy the input to the output. An autoencoder consists of an encoder and a decoder. The encoder projects the original input onto a feature space, which can be used for dimensionality reduction. The decoder reconstructs the original input from the feature space. The mathematical principle of an autoencoder is as follows:

[0022] For an m-dimensional input sample x, the expressions for the encoder part f1(·) and the decoder part f2(·) of the autoencoder are as follows:

[0023] f1(·):z=δ1(W1x+b1)

[0024] f2(·): x̂ = δ2(W2z + b2)

[0025] where z is the feature-space vector output by the encoder; x̂ is the final predicted (reconstructed) output; δ1(·) and δ2(·) are activation functions of the neural network, such as the Sigmoid function, the hyperbolic tangent (tanh) function, or the rectified linear unit (ReLU) function; W1 and W2 are the weights between network layers; and b1 and b2 are the bias values of the corresponding layers.

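[0026] The goal of the autoencoder is to make the reconstructed output, denoted x̂, as close as possible to the input x, i.e., x̂ ≈ x. Assuming the training set input x contains n samples, the objective function of the autoencoder is as follows:

[0027] $$\min_{W_1,\,b_1,\,W_2,\,b_2}\ \frac{1}{n}\sum_{k=1}^{n} L\left(x^{(k)}, \hat{x}^{(k)}\right)$$

[0028] where L(·,·) is the error function and x^{(k)} is the k-th sample. The error function is generally expressed using the mean squared error, i.e.,

[0029] $$L\left(x, \hat{x}\right) = \left\lVert x - \hat{x} \right\rVert_2^2$$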

[0030] Since a typical autoencoder has only one hidden layer for transformation, the encoding and decoding capabilities of the model can be enhanced by increasing the number of layers and neurons in the neural network. This led to the introduction of the concept of a stacked autoencoder (SAE). A stacked autoencoder stacks multiple autoencoders together, with the output of the encoding part of the previous autoencoder serving as the input to the next. The neural network structure of a stacked autoencoder is typically symmetrical about the intermediate hidden layers. Aside from the structural difference, the mathematical principles and objective function of a stacked autoencoder are the same as those of an autoencoder.

[0031] The proposed method and system for predicting hydrogen content in feedstock oil and heavy distillate oil are based on a stacked autoencoder. First, a stacked autoencoder is trained on the input space [x|y], composed of the selected physicochemical properties x of the oil and the corresponding hydrogen content y, so that its reconstructed output contains the predicted hydrogen content. Because hydrogen content is the target variable to be predicted, it is not available as an input at prediction time. Therefore, a neural network model is trained with the physicochemical properties x as its input and the output z of the intermediate hidden layer of the stacked autoencoder as its target variable. The structure of this neural network model is similar to that of the encoder part of the stacked autoencoder. Finally, the output of the neural network model is fed to the decoder part of the stacked autoencoder, so that hydrogen content can be predicted from the physicochemical properties of the oil alone. The structure of the training model is shown in Figure 2.
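The three-stage scheme described above (train an autoencoder on [x|y], train a network from x to the code z, then decode) can be sketched with NumPy as follows. This is a rough sketch under stated assumptions, not the patented implementation: the data are synthetic, the layer sizes (6-8-3-8-6 and 5-8-3) are arbitrary, tanh is swapped in for the activation, plain full-batch gradient descent replaces the solver, and the helper names `init`, `forward`, `train`, and `predict_hydrogen` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(sizes):
    """Random weights and zero biases for a fully connected network."""
    return [(rng.normal(0.0, 0.5, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, a, linear_out=True):
    """Return the activations of every layer (tanh on hidden layers)."""
    acts = [a]
    for i, (W, b) in enumerate(params):
        a = a @ W + b
        if i < len(params) - 1 or not linear_out:
            a = np.tanh(a)
        acts.append(a)
    return acts

def train(params, X, T, lr=0.1, epochs=2000):
    """Full-batch gradient descent on the mean squared error (linear output)."""
    for _ in range(epochs):
        acts = forward(params, X)
        delta = 2.0 * (acts[-1] - T) / T.size
        for i in reversed(range(len(params))):
            W, b = params[i]
            grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:  # back-propagate through the tanh of the previous layer
                delta = (delta @ W.T) * (1.0 - acts[i] ** 2)
            params[i] = (W - lr * grad_W, b - lr * grad_b)
    return params

# Synthetic stand-in data: 5 scaled properties x and a made-up target y.
X = rng.uniform(0.0, 1.0, (200, 5))
y = (0.6 * X[:, 0] - 0.3 * X[:, 1] + 0.2).reshape(-1, 1)
XY = np.hstack([X, y])

# 1) Train the stacked autoencoder on [x|y] to reconstruct [x|y].
sae = train(init([6, 8, 3, 8, 6]), XY, XY)
encoder, decoder = sae[:2], sae[2:]
Z = forward(encoder, XY, linear_out=False)[-1]  # intermediate hidden layer z

# 2) Train a network that maps x alone to z.
net = train(init([5, 8, 3]), X, Z)

# 3) Predict: x -> net -> z_hat -> decoder -> [x_hat | y_hat].
def predict_hydrogen(X_new):
    z_hat = forward(net, X_new)[-1]
    return forward(decoder, z_hat)[-1][:, -1]  # last column is hydrogen content

y_hat = predict_hydrogen(X)
```

The point of the design is that the decoder was trained with hydrogen content in its output, so once a separate network learns to produce the code z from x alone, the decoder can supply the hydrogen estimate without ever receiving y as an input.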

[0032] Step 4: Model parameter optimization and model evaluation. This step determines the network structures of the stacked autoencoder model and the neural network model, selects the corresponding hyperparameters, and evaluates the predictive ability of the model based on its performance on the validation set.

[0033] Firstly, the network structure of the stacked autoencoder is determined using a layer-by-layer training approach. This involves first training an autoencoder using the input space as the first layer of the stacked autoencoder to obtain the corresponding number of neurons. Then, the output of the intermediate hidden layer of the first autoencoder is used as the input to train the second autoencoder, which becomes the second layer of the stacked autoencoder. This process continues to obtain the network structure of the stacked autoencoder. The network structure of the neural network model is similar to the encoder part of the stacked autoencoder.

[0034] After determining the network structures of the stacked autoencoder model and the neural network model, it is also necessary to determine the hyperparameters of the network. The optimal hyperparameters of the model are determined by using cross-validation and grid search strategies.
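Cross-validation combined with grid search can be sketched generically in plain Python. The helpers `k_fold_indices`, `cross_val_mse`, and `grid_search` are hypothetical names, and the one-parameter ridge model with its `alpha` grid is only a toy stand-in for the networks discussed in the text; the patent does not specify the grid or the number of folds.

```python
from itertools import product

def k_fold_indices(n, k):
    """Split range(n) into k contiguous validation folds."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]

def cross_val_mse(fit, predict, X, y, params, k=5):
    """Average validation MSE of one hyperparameter setting over k folds."""
    scores = []
    for val_idx in k_fold_indices(len(X), k):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx], **params)
        errors = [(predict(model, X[i]) - y[i]) ** 2 for i in val_idx]
        scores.append(sum(errors) / len(errors))
    return sum(scores) / k

def grid_search(fit, predict, X, y, grid, k=5):
    """Try every combination in the grid; return (best_params, best_score)."""
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_val_mse(fit, predict, X, y, params, k)
        if best is None or score < best[1]:
            best = (params, score)
    return best

# Toy stand-in model: one-variable ridge regression through the origin.
def fit_ridge(X, y, alpha):
    return sum(a * b for a, b in zip(X, y)) / (sum(a * a for a in X) + alpha)

def predict_ridge(w, x):
    return w * x

X = [float(i) for i in range(1, 21)]
y = [2.0 * x for x in X]  # noise-free made-up data
best_params, best_score = grid_search(
    fit_ridge, predict_ridge, X, y, {"alpha": [0.0, 1.0, 10.0]}, k=5)
print(best_params, best_score)
```

Because the toy data are noise-free, the unregularized setting wins; with real data the same loop would be run over whichever hyperparameters of the networks are being tuned.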

[0035] To evaluate the predictive performance of the established hydrogen content prediction model, the following model evaluation metrics were mainly adopted:

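[0036] Root mean square error: $$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(y_k - \hat{y}_k\right)^2}$$

[0037] Mean absolute error: $$\mathrm{MAE} = \frac{1}{n}\sum_{k=1}^{n}\left|y_k - \hat{y}_k\right|$$

[0038] Coefficient of determination: $$R^2 = 1 - \frac{\sum_{k=1}^{n}\left(y_k - \hat{y}_k\right)^2}{\sum_{k=1}^{n}\left(y_k - \bar{y}\right)^2}$$ where $\bar{y} = \frac{1}{n}\sum_{k=1}^{n} y_k$, y_k and ŷ_k are the measured and predicted hydrogen content of the k-th sample, and n is the number of samples.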
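The three evaluation metrics named above (root mean square error, mean absolute error, and coefficient of determination) can be computed from their standard definitions as in the following sketch. The function names and the sample values are illustrative assumptions.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root mean square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up hydrogen contents (mass %) and predictions.
y_true = [12.0, 13.0, 14.0]
y_pred = [12.0, 13.0, 15.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

Lower RMSE and MAE and an R² closer to 1 indicate better agreement between predicted and measured hydrogen content.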

[0039] As can be seen from the above technical solution, the method and system for predicting the hydrogen content in feedstock oil and heavy distillate oil proposed in this invention have the following advantages: they can quickly predict the hydrogen content in feedstock oil and heavy distillate oil, which supports troubleshooting and production guidance in actual production; and they do not rely on complex elemental analysis or similar techniques, but only on easily measurable conventional physicochemical properties (density and the D86 distillation curve) to quickly calculate the hydrogen content of oil products, reducing the cost of sampling and analysis in actual production.

Description of the Drawings

[0040] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the following are the accompanying drawings used in the description of the embodiments or the prior art. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0041] Figure 1 is a flowchart of the method for predicting the hydrogen content in feedstock oil and heavy distillate oil according to the present invention.

[0042] Figure 2 is a diagram of the training model structure.

[0043] Figure 3 is a Pearson correlation coefficient analysis chart of the present invention.

[0044] Figure 4 is a schematic diagram of the modeling results.

Detailed Implementation

[0045] The following content is merely an embodiment of this application and is not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of the claims of this application.

[0046] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0047] Example: Implementation case based on a refinery of a large petrochemical enterprise

[0048] 1. Implementation of a feature selection module based on Pearson correlation analysis

[0049] In the implementation case, the manufacturer provided routine physicochemical property test reports for the products from each unit, along with corresponding hydrogen content (y) sample data. The routine physicochemical property reports included: density (x1), initial boiling point of the D86 distillation curve (x2), 10% distillation temperature (x3), 20% distillation temperature (x4), 30% distillation temperature (x5), 40% distillation temperature (x6), 50% distillation temperature (x7), 60% distillation temperature (x8), 70% distillation temperature (x9), 80% distillation temperature (x10), 90% distillation temperature (x11), and final boiling point (x12).

[0050] Pearson correlation analysis was used to calculate the Pearson correlation coefficients between each physicochemical property and hydrogen content. If the absolute value of the Pearson correlation coefficient between a variable and hydrogen content is greater than the critical value λ (λ = 0.38 in this example), that variable is selected as a modeling feature variable. A bar chart of the absolute values of the correlation coefficients is shown in Figure 3. The feature variables selected by the feature selection module in this study are x1, x2, x3, x4, and x9, namely density, initial boiling point, 10% distillation temperature, 20% distillation temperature, and 70% distillation temperature.

[0051] 2. Implementation of the data standardization module

[0052] Before modeling, data standardization is usually required to eliminate the influence of dimensions. The inputs for this modeling include the density of the oil and the distillation temperatures at various points on the D86 distillation curve, which differ significantly in magnitude. Without standardization, the model may over-weight variables with large values and under-train on variables with small values, which degrades model performance. The data standardization module of this invention uses the min-max standardization method, which uses the maximum and minimum values in each data column to map the values to the range [0,1]. Specifically, after the feature selection module has selected the feature variables and the data have been divided into training and test sets, min-max standardization is performed on the input training and test sets respectively to obtain the data used to build the machine learning model.

[0053] 3. Implementation of the training module for the hydrogen content prediction model based on a stacked autoencoder

[0054] The model in this embodiment includes a stacked autoencoder and a neural network model. The input and the output of the stacked autoencoder model are both the standardized physicochemical property variables together with the corresponding hydrogen content of the training samples. The input of the neural network model is the standardized physicochemical property variables, and its target output is the output of the encoder part of the stacked autoencoder model. The final output of the model is obtained by using the output of the neural network model as the input of the decoder part of the stacked autoencoder model. The initial network structure of the stacked autoencoder model is 10-20-3-20-10, and the initial settings of the other important parameters are as follows: random_state = 1, activation = 'relu', max_iter = 2000, solver = 'adam'. The initial network structure of the neural network model is 10-20, and the initial settings of the other important parameters are as follows: random_state = 1, activation = 'relu', max_iter = 2000, solver = 'adam'. This completes the training of the prediction model of the method and system for predicting hydrogen content in feedstock oil and heavy distillate oil.

[0055] 4. Establishment of the model structure parameter optimization and evaluation module

[0056] The network structure of the stacked autoencoder is determined using a layer-by-layer training strategy. First, the preprocessed physicochemical properties and the corresponding hydrogen content are used as input to establish an autoencoder model, with the mean squared error (MSE) as the objective function, which determines the number of neurons in the first layer of the stacked autoencoder. Then, the output of the encoder part of the first autoencoder is used as the input to train another autoencoder, which gives the number of neurons in the second layer of the stacked autoencoder. This process continues until the complete network structure of the stacked autoencoder is obtained. The network structure of the neural network model is the same as that of the encoder part of the stacked autoencoder. At this point, the structure of the training model for the hydrogen content prediction method and system for feedstock oil and heavy distillate oil proposed in this invention has been determined. Based on actual results, the adjusted network structure of the stacked autoencoder is 10-15-3-15-10, and the network structure of the neural network model is 10-15.

[0057] After determining the structure of the model network, the hyperparameters of the model were adjusted using cross-validation and grid search strategies. Based on the actual calculation results, the adjusted parameters of the stacked autoencoder model are as follows: random_state = 1, activation = 'relu', max_iter = 2000, solver = 'lbfgs'. The parameters of the neural network model are as follows: random_state = 1, activation = 'relu', max_iter = 2000, solver = 'lbfgs'.

[0058] The second part of the model structure parameter optimization and evaluation module in this embodiment evaluates the established prediction model for hydrogen content in feedstock oil and heavy distillate oil. The evaluation indicators selected are the mean squared error (MSE) and the coefficient of determination R². The thresholds for the evaluation indicators of the prediction model are set as MSE ≤ 0.007 and R² ≥ 0.9. Only models that meet both requirements are accepted. If a model fails to meet the evaluation criteria, the process returns to the previous step, reconfigures the network structure, and repeats parameter optimization until a model that meets the requirements is obtained. The final modeling result is shown in Figure 4.
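The acceptance rule of this embodiment (MSE ≤ 0.007 and R² ≥ 0.9) can be expressed as a simple check, sketched below. The function names and the sample values are illustrative assumptions, and the text does not state whether the thresholds apply to standardized or raw hydrogen content, so the data scale used here is also an assumption.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def is_acceptable(y_true, y_pred, mse_max=0.007, r2_min=0.9):
    """Accept a model only if it meets both thresholds from the embodiment."""
    return mse(y_true, y_pred) <= mse_max and r2(y_true, y_pred) >= r2_min

# Made-up validation values, for illustration only.
y_true = [11.8, 12.2, 12.5, 12.9, 13.2]
close_pred = [11.85, 12.15, 12.55, 12.85, 13.25]  # errors of 0.05
flat_pred = [12.5, 12.5, 12.5, 12.5, 12.5]        # ignores the inputs

accepted = is_acceptable(y_true, close_pred)
rejected = not is_acceptable(y_true, flat_pred)
print(accepted, rejected)
```

A rejected model would send the workflow back to restructuring the network and re-running the hyperparameter search, as the paragraph above describes.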

[0059] In summary, a method and system for predicting the hydrogen content in feedstock oil and heavy distillate oil, applied to this plant, have been established.

Claims

1. A system for predicting the hydrogen content in feedstock oil and heavy distillate oil, characterized in that it comprises the following four modules:

The first module is the feature selection module. This module performs correlation analysis on the hydrogen content and conventional physicochemical properties of the sample data to determine the feature variables for establishing a machine learning model, and accordingly determines the conventional physicochemical properties required for model establishment. The feature selection module selects the following conventional physicochemical properties of feedstock oil and heavy distillate oil: density, initial boiling point of the D86 distillation curve, 10% distillation temperature, 20% distillation temperature, 30% distillation temperature, 40% distillation temperature, 50% distillation temperature, 60% distillation temperature, 70% distillation temperature, 80% distillation temperature, 90% distillation temperature, and final boiling point. These conventional physicochemical properties and their corresponding hydrogen contents form a set of sample data. Based on the collected sample data, Pearson correlation analysis is used to calculate the Pearson correlation coefficient between each conventional physicochemical property and the hydrogen content. If the absolute value of the Pearson correlation coefficient between a conventional physicochemical property and the hydrogen content is greater than the critical value λ, where λ is in the range [0,1], that property is selected as a feature variable for modeling.

The second module is the data standardization module. This module eliminates the influence of different units among the input variables on the model results and yields data that can be directly used for machine learning modeling. The standardization method used is the min-max standardization method.

The third module is the model training module. The model is a combination of a stacked autoencoder model and a neural network model. Standardized feature variables are used as model input, and hydrogen content is used as model output, to establish a data-driven prediction model for hydrogen content in feedstock oil and heavy distillate oil. The model training module, combining the stacked autoencoder and the neural network model, works as follows: standardized sample data consisting of the feature variables and the corresponding hydrogen content are used as both input and output to train a stacked autoencoder. Then, the output of the intermediate hidden layer of the stacked autoencoder is used as the target variable, and the training sample data of the stacked autoencoder, with the hydrogen content removed, are used as input to train a neural network model. The structure of this neural network model is similar to the encoder part of the stacked autoencoder model. Finally, the output of the neural network model is used as the input to the decoder part of the stacked autoencoder model to obtain the predicted hydrogen content.

The fourth module is the model parameter optimization and model evaluation module. This module uses a layer-by-layer training strategy to determine the model network structure, uses cross-validation combined with a grid search strategy to optimize parameters, and selects root mean square error, mean absolute error, and coefficient of determination as evaluation metrics to evaluate the model's predictive ability.

2. The system for predicting hydrogen content in feedstock oil and heavy distillate oil according to claim 1, characterized in that the data standardization module uses the min-max standardization method to standardize the sample data composed of the selected modeling feature variables, making them data that can be directly used for machine learning modeling.

3. The system for predicting hydrogen content in feedstock oil and heavy distillate oil according to claim 1, characterized in that the model training module establishes a stacked autoencoder and a neural network model and combines the two to predict the corresponding hydrogen content based on the conventional physicochemical properties of the oil.

4. The system for predicting hydrogen content in feedstock oil and heavy distillate oil according to claim 1, characterized in that the prediction process of the model is as follows: the sample data composed of standardized modeling feature variables are input into the trained neural network model, and the output of the neural network model is then input into the decoder part of the trained stacked autoencoder for decoding, so that the hydrogen content of feedstock oil and heavy distillate oil can be predicted based solely on conventional physicochemical properties.

5. The system for predicting hydrogen content in feedstock oil and heavy distillate oil according to claim 1, characterized in that, using a layer-by-layer training strategy, the network structure of the stacked autoencoder model is obtained by training autoencoders layer by layer; the network structure of the neural network model is derived from the network structure of the encoder part of the stacked autoencoder; and the hyperparameters of the model are selected using a combination of cross-validation and a grid search strategy.

6. A method for predicting the hydrogen content in feedstock oil and heavy distillate oil, comprising the following steps:

Step 1: Selection of model feature variables based on Pearson correlation analysis. By conducting correlation analysis on the hydrogen content and conventional physicochemical properties of the sample data, the feature variables for establishing the machine learning model are determined. The conventional physicochemical properties include: density (x1), initial boiling point of the D86 distillation curve (x2), 10% distillation temperature (x3), 20% distillation temperature (x4), 30% distillation temperature (x5), 40% distillation temperature (x6), 50% distillation temperature (x7), 60% distillation temperature (x8), 70% distillation temperature (x9), 80% distillation temperature (x10), 90% distillation temperature (x11), and final boiling point (x12). Based on the calculated Pearson correlation coefficients, and by setting a critical value λ, where λ is in the range [0,1], the conventional physicochemical properties required to establish the model are determined. The Pearson correlation coefficient measures the linear correlation between a feature variable and the target variable. The Pearson correlation coefficient between variable x_i and variable y is the covariance of x_i and y divided by the product of the standard deviations of x_i and y; it is a standardized covariance that removes the influence of the two variables' dimensions. Covariance measures the degree to which the two variables deviate jointly from their means; a positive covariance indicates a positive correlation between the two, and a negative covariance indicates a negative correlation. The formula for calculating the Pearson correlation coefficient between x_i and y is as follows:

$$r_i = \frac{\operatorname{cov}(x_i, y)}{\sigma_{x_i}\,\sigma_y} = \frac{\sum_{k=1}^{n}\left(x_{i,k}-\bar{x}_i\right)\left(y_k-\bar{y}\right)}{\sqrt{\sum_{k=1}^{n}\left(x_{i,k}-\bar{x}_i\right)^2}\,\sqrt{\sum_{k=1}^{n}\left(y_k-\bar{y}\right)^2}}$$

where x_i represents the i-th conventional physicochemical property, y is the hydrogen content analysis value, n is the number of samples, and a bar denotes the sample mean. The result lies in the range [-1, 1], where -1 indicates a perfect negative correlation, 1 indicates a perfect positive correlation, and 0 indicates no linear correlation. The magnitude of the absolute value indicates the strength of the correlation. If the absolute value of the Pearson correlation coefficient between x_i and y is greater than the critical value λ, the i-th variable is selected as a model input variable.

Step 2: Data standardization for constructing the hydrogen content prediction model of feedstock oil and heavy distillate oil. The sample data are standardized to become data suitable for building machine learning models. Data standardization uses the min-max standardization method, which is a linear transformation of the original data that maps data values to the range [0,1]. The transformation formula is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

Min-max standardization preserves the relationships that exist in the original data.

Step 3: Construction of a training and prediction module for the hydrogen content model of feedstock oil and heavy distillate oil. Based on conventional physicochemical property reports and the corresponding hydrogen content samples, a machine learning prediction model based on a stacked autoencoder is established. The principle and model structure of the stacked autoencoder are as follows. For an m-dimensional input sample x, the expressions for the encoder part f1(·) and the decoder part f2(·) of the autoencoder are:

f1(·): z = δ1(W1x + b1)

f2(·): x̂ = δ2(W2z + b2)

where z is the feature-space vector output by the encoder; x̂ is the final predicted output; δ1(·) and δ2(·) are activation functions of the neural network, such as the Sigmoid function, the hyperbolic tangent (tanh) function, or the rectified linear unit (ReLU) function; W1 and W2 are the weights between network layers; and b1 and b2 are the bias values of the corresponding layers. The goal of the autoencoder is to make the reconstructed output as close as possible to the input, i.e., x̂ ≈ x. Assuming the training set input x contains n samples, the objective function of the autoencoder is as follows:

$$\min_{W_1,\,b_1,\,W_2,\,b_2}\ \frac{1}{n}\sum_{k=1}^{n} L\left(x^{(k)}, \hat{x}^{(k)}\right)$$

where L(·,·) is the error function, generally expressed using the mean squared error, i.e.,

$$L\left(x, \hat{x}\right) = \left\lVert x - \hat{x} \right\rVert_2^2$$

Using the input space [x|y] formed by the selected physicochemical properties of the oil and their corresponding hydrogen content, a stacked autoencoder is trained. The output of the stacked autoencoder contains the predicted value of the required hydrogen content. The output z of the intermediate hidden layer of the stacked autoencoder is used as the target variable, and the physicochemical properties x of the oil are used as the input space, to train a neural network model. The structure of the neural network model is similar to that of the encoder part of the stacked autoencoder model. Finally, the output of the neural network model is used as the input of the decoder part of the stacked autoencoder model, which allows the hydrogen content to be predicted from the physicochemical properties of the oil.

Step 4: Determining the network structures of the stacked autoencoder model and the neural network model, selecting the corresponding hyperparameters, and evaluating the model's predictive ability based on its performance on the validation set. The network structure of the stacked autoencoder is determined using a layer-by-layer training approach. First, an autoencoder is trained on the input space as the first layer of the stacked autoencoder to obtain the corresponding number of neurons. Then, the output of the intermediate hidden layer of the first autoencoder is used as the input to train a second autoencoder, which becomes the second layer of the stacked autoencoder. This process continues until the network structure of the stacked autoencoder is obtained. The network structure of the neural network model is similar to the encoder part of the stacked autoencoder. After the network structures of the stacked autoencoder model and the neural network model are determined, the hyperparameters of the networks are also determined; the optimal hyperparameters of the model are selected using cross-validation and grid search strategies. The following evaluation metrics are used to assess the predictive performance of the established hydrogen content prediction model:

Root mean square error: $$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(y_k - \hat{y}_k\right)^2}$$

Mean absolute error: $$\mathrm{MAE} = \frac{1}{n}\sum_{k=1}^{n}\left|y_k - \hat{y}_k\right|$$

Coefficient of determination: $$R^2 = 1 - \frac{\sum_{k=1}^{n}\left(y_k - \hat{y}_k\right)^2}{\sum_{k=1}^{n}\left(y_k - \bar{y}\right)^2}$$ where $\bar{y} = \frac{1}{n}\sum_{k=1}^{n} y_k$, and y_k and ŷ_k are the measured and predicted hydrogen content of the k-th sample.