Drilling rig real-time drilling speed prediction method based on Transformer framework
By simplifying the Encoder architecture and CLFN module of the Transformer model and combining it with a multi-head sparse self-attention mechanism, the problem of capturing nonlinear relationships in drilling speed prediction is solved, and efficient drilling speed prediction results are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA UNIV OF MINING & TECH
- Filing Date
- 2024-06-28
- Publication Date
- 2026-06-19
AI Technical Summary
Existing drilling speed prediction methods struggle to effectively capture the nonlinear relationships between various parameters, resulting in insufficient prediction accuracy and real-time performance. The encoder-decoder structure of the traditional Transformer model also affects the model's real-time performance and computational efficiency.
We adopt a simplified Transformer model architecture, using only the Encoder module, combined with the Channel-Linear Feed-Forward Network (CLFN) module and a multi-head sparse self-attention mechanism. By embedding independent channel data and extracting features, we abandon the Decoder module and focus on the drilling rate prediction task.
The model's real-time performance and computational efficiency have been improved, and its ability to process multivariate time series data has been enhanced. Experimental results show that the prediction accuracy is better than that of LSTM, Transformer and Informer models.
Description
Technical Field
[0001] This invention belongs to the field of drilling technology, and in particular relates to a method for real-time drilling speed prediction of drilling rigs based on the Transformer framework.
Background Technology
[0002] In drilling operations, the prediction of the rate of penetration (ROP) is crucial. Accurate ROP prediction not only reduces drilling risks but also allows drilling parameters to be optimized in advance, which is of great significance for improving the efficiency of deep well drilling. However, many parameters affect ROP, such as power head rotation speed, weight on bit, and standpipe pressure. The combined influence of these parameters causes ROP to vary nonlinearly, which makes prediction difficult. Traditional prediction methods often rely on manual extraction of data features and simple linear models, which cannot effectively capture complex nonlinear relationships.
[0003] In practical applications, while pursuing prediction model accuracy, real-time performance is often more valuable and widely applicable. The Transformer model, with its sequence-to-sequence structure and parallel output, avoids accumulated errors and ensures high real-time performance. With advancements in deep learning technology, it has been confirmed that Transformer performs better than LSTM for long sequence predictions. Furthermore, combining GRU with Informer, using the GRU module to extract time-series features as input to the Informer module, has achieved significant improvements in accuracy, leading to an increasing number of drilling rate prediction models adopting the Transformer architecture. However, the Transformer model employs a complex encoder-decoder structure, which impacts both real-time performance and computational efficiency.
Summary of the Invention
[0004] The technical problem this invention aims to solve is to simplify the structure of the Transformer model used for drilling speed prediction, making the model more focused on the prediction task. This invention proposes a drilling speed prediction method based on the Transformer framework, comprising the following steps:
[0005] Step 1, Build the dataset
[0006] This includes drilling speed and the corresponding feature data;
[0007] The drilling rate and the corresponding feature data are standardized.
[0008] Step 2, Construct a drilling rate prediction model
[0009] Step 3: Divide the data in the dataset into training set, validation set and test set, and train, validate and test the drilling speed prediction model to obtain the trained drilling speed prediction model.
[0010] Furthermore, the feature data corresponding to the drilling rate mentioned in step 1 refers to feature data with a large mutual information value with the drilling rate, such as measured depth MD, weight on bit WOB, standpipe pressure SP, torque TQ, rotational speed RS, mud flow rate MF, hook load HL, and vertical depth TVD.
[0011] Furthermore, the drilling rate prediction model in step 2 includes a Normalization module, a CLFN module, an Encoder module, a Projector module, and a De-Normalization module connected in sequence. The sliding window method is used to generate the input sequence $X = [x_1, x_2, \dots, x_S]^T \in \mathbb{R}^{S \times C}$ from the feature data, where S is the length of the input sequence and C is the number of features of the input data. The input sequence X is fed into the Normalization module, which translates and scales X to obtain the sequence $X_{in}$. Mask data $X_0$ is constructed to obscure future information and is input together with $X_{in}$ into the CLFN module for data embedding. The Encoder module performs feature extraction on the output of the CLFN module to obtain the output H. The Projector module maps H to the output $Y' = [y'_1, y'_2, \dots, y'_P]^T$, where P is the predicted sequence length and C is the number of input features; the De-Normalization module denormalizes the sequence Y′ to obtain the output.
[0012] Furthermore, in the Normalization module, translation and scaling are handled as follows:
[0013]
[0014] Where $\mu_x \in \mathbb{R}^{1 \times C}$ is the mean, $\sigma_x \in \mathbb{R}^{1 \times C}$ is the variance, and ⊙ denotes element-wise multiplication; $X_{in} = [x'_1, x'_2, \dots, x'_S]^T \in \mathbb{R}^{S \times C}$.
[0015] Furthermore, in the De-Normalization module, the denormalization process is as follows:
[0016]
[0017] Furthermore, the input to the CLFN module is $(X_{in} + X_0) \in \mathbb{R}^{S \times C'}$, where S is the sequence length and C′ is the number of features after $X_0$ is added to $X_{in}$ along the feature dimension;
[0018] The CLFN module includes two UpLinear layers, two Linear layers, one DownLinear layer, and one Dropout layer;
[0019] In the CLFN module, $X_{in} + X_0$ passes through an UpLinear layer and a Linear layer in sequence and is then processed by the GELU activation function to obtain feature f1;
[0020] In the CLFN module, $X_{in} + X_0$ also passes through another UpLinear layer and another Linear layer in sequence to obtain feature f2; feature f1 and feature f2 are multiplied together and then passed through a DownLinear layer and a Dropout layer to obtain the output of the CLFN module.
[0021] Furthermore, there are N Encoder modules; each Encoder module includes a multi-head sparse self-attention layer and a feedforward neural network; and residual connections and layer normalization are added after the multi-head sparse self-attention layer and the feedforward neural network.
[0022] Furthermore, the mean square error (MSE) and mean absolute error (MAE) are used to evaluate the drilling speed prediction model.
[0023]
[0024] Where m is the size of the test dataset, $y_i$ is the actual drilling speed, and $\hat{y}_i$ is the drilling speed predicted by the model.
[0025] Furthermore, the optimizer used is the Adam optimizer, and the loss function is MSELoss.
[0026] Beneficial Effects: This study proposes a Transformer-based drilling rate prediction model—CLformer. It utilizes only the Encoder module of the Transformer model, focusing on capturing adaptive correlations between multivariate sequences, while discarding the Decoder module. This significantly simplifies the model architecture and allows it to focus more on the prediction task. Furthermore, by constructing a Channel-Linear Feed-Forward Network (CLFN) module, independent channel embedding of the data is achieved, and a multi-head sparse self-attention mechanism accurately captures complex interactions between variables, enabling the CLformer model to deeply mine the potential information in time-series data. Experiments on real-world datasets demonstrate that the CLformer model outperforms LSTM, Transformer, and Informer network models in predicting drilling rate.
[0027] Therefore, the main contribution of this invention is:
[0028] 1. A drilling rate prediction model based on Transformer, CLformer, is proposed. It adopts a simplified encoder architecture, which improves the model's real-time performance and computational efficiency.
[0029] 2. A CLFN module was designed to enhance the model's ability to process multivariate time series data.
[0030] 3. Experimental results show that CLformer outperforms existing LSTM, Transformer and Informer models in terms of prediction accuracy and real-time performance on real datasets.
Attached Figure Description
[0031] Figure 1 is a graph showing the results of the mutual information analysis of the drilling features;
[0032] Figure 2 is a structural diagram of the CLformer model of the present invention;
[0033] Figure 3 is a structural diagram of the CLFN module of the present invention;
[0034] Figure 4 is a graph showing the effect of step size in the experiments.
Detailed Implementation
[0035] The present invention will now be further described with reference to the accompanying drawings.
[0036] The present invention provides a drilling rig speed prediction method based on the Transformer framework, comprising the following steps:
[0037] Step 1, Data Collection
[0038] In this specific embodiment of the invention, all data used comes from a directional well in a specific block on-site, and the logging data from the completed well is analyzed. Deep learning is very sensitive to missing values in incomplete data, which can cause the deep learning model to fail. Since the proportion of missing data is small compared to the total logging data, a data processing method that ignores incomplete data is adopted. The final data includes various data from well depths of 500 meters to over 3000 meters, such as measured depth (MD), weight on bit (WOB), standpipe pressure (SP), torque (TQ), rotational speed (RS), mud flow (MF), drill pipe diameter (D), hook load (HL), vertical depth (TVD), and rate of penetration (ROP), totaling over 99,000 data points. Some data are shown in Table 1.
[0039] Table 1 Partial Drilling Data
[0040]
[0041] Step 2, Feature Selection
[0042] If all logging data features are input into a deep learning model for training, it may result in an excessive number of model dimensions and poor predictive performance. Therefore, feature selection is crucial for drilling rate prediction models. Traditional feature correlation analyses include Pearson correlation coefficient, Spearman correlation coefficient, and Kendall correlation coefficient methods. However, these methods can only reflect linear correlations and simple nonlinear relationships between two features. When the relationship between two features exhibits complex nonlinear relationships, these methods perform poorly, hence the introduction of mutual information methods.
[0043] Mutual information is a method used to measure the correlation between two variables, typically employed in feature selection and feature correlation analysis. When a strong nonlinear dependency exists between two variables, the mutual information value will far exceed 1. Compared to traditional feature correlation analysis methods, mutual information is not limited by linear relationships, can discover nonlinear relationships and complex feature correlations, and is applicable to a wider range of data types and scenarios.
[0044] Suppose that the joint distribution of two random variables X and Y is p(x, y), and their marginal distributions are p(x) and p(y), respectively. Then the mutual information I(X, Y) is the relative entropy between the joint distribution p(x, y) and the product of the marginal distributions p(x)p(y), expressed as follows:
[0045]
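For discrete variables, the relative entropy described above takes the standard form I(X; Y) = Σ p(x, y) · log[p(x, y) / (p(x) p(y))], summed over all value pairs. The following is a minimal sketch of estimating it from paired samples. The equal-width binning helper `bin_values` and its default of 10 bins are assumptions made for illustration; the source does not state which estimator was used on the logging data.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X;Y) in nats from paired samples of two discrete variables."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def bin_values(values, bins=10):
    """Map continuous values to equal-width bin indices (hypothetical helper)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal
    return [min(int((v - lo) / width), bins - 1) for v in values]
```

A continuous feature such as WOB would first be passed through `bin_values` (as would ROP) before calling `mutual_information`; the estimate depends on the bin count, so values are only comparable under the same binning.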
[0046] Mutual information analysis was performed on all data, and the results are shown in Figure 1. As Figure 1 shows, the mutual information value between the target variable, rate of penetration (ROP), and the drill pipe diameter (D) is low, only 0.78, so this feature is discarded. Nine features are therefore selected as input variables for the drilling rig rate of penetration prediction model: measured depth (MD), weight on bit (WOB), standpipe pressure (SP), torque (TQ), rotational speed (RS), mud flow rate (MF), hook load (HL), vertical depth (TVD), and rate of penetration (ROP).
[0047] Step 3, Data Standardization Processing
[0048] The collected data exhibits significant variations in indicators due to differences in units of measurement and orders of magnitude among various features. Data standardization is employed to remove dimensions, transforming data of different orders of magnitude into a single order of magnitude to eliminate the influence of order of magnitude. The raw data for each variable z is preprocessed using the standard deviation method.
[0049]
[0050] Where $z'_i$ is the standardized data; $z_i$ is the data before standardization; $\bar{z}$ is the average value of the data before standardization; and n is the total number of data entries.
[0051] After standardization, the data is scaled so that the mean of all features is 0 and the standard deviation is 1, thus eliminating the influence of units.
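The standardization step can be sketched as below for a single feature column. This is a minimal illustration, assuming the population standard deviation (division by n) is used; the source does not say whether n or n − 1 appears in the denominator, and the function name `standardize` is hypothetical.

```python
import statistics


def standardize(values):
    """Z-score a single feature column: subtract the mean, divide by the std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std (assumption: divide by n)
    if std == 0:
        raise ValueError("a constant column cannot be standardized")
    return [(v - mean) / std for v in values]
```

Applied column by column, this puts features with very different units (for example depth in metres and torque) on a common scale with mean 0 and standard deviation 1.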
[0052] Step 4, Construction of Drilling Rate Prediction Model
[0053] The drilling rate prediction model constructed in this invention is also called the CLformer model, and its overall architecture is shown in Figure 2. The CLformer model consists of the Normalization module, CLFN module, Encoder module, Projector module, and De-Normalization module, which are connected in sequence. The input and output variables of each module in the CLformer model are listed in Table 2.
[0054] Table 2 Input and output variables of each module in the CLformer model
[0055]
[0056] (1) Normalization module and De-Normalization module
[0057] The stationarity of time series data is crucial for prediction, but real-world data is often non-stationary, with distributions that shift: statistical properties such as mean and variance change over time. Classical statistical methods such as ARIMA use differencing to stationarize the data. For deep learning models, Series Stationarization is widely used to handle non-stationarity. This study employs this method to process non-stationary input data, using a Normalization module and a De-Normalization module. The time series model uses a sliding window method to generate the input sequence and performs Z-score normalization on each sequence segment within the window, without learnable parameters. Only normalization of the input and de-normalization of the output are therefore needed, which greatly simplifies the model. For the input sequence $X = [x_1, x_2, \dots, x_S]^T \in \mathbb{R}^{S \times C}$, the Normalization module translates and scales X to obtain $X_{in} = [x'_1, x'_2, \dots, x'_S]^T \in \mathbb{R}^{S \times C}$, where S is the length of the input sequence and C is the number of features in the input data. The data translation and scaling are handled as follows:
[0058]
[0059] Where $\mu_x \in \mathbb{R}^{1 \times C}$ is the mean, $\sigma_x \in \mathbb{R}^{1 \times C}$ is the variance, and ⊙ denotes element-wise multiplication. The Normalization module stabilizes the distribution of the model's input data, reducing the non-stationarity of the sequences.
[0060] The De-Normalization module denormalizes the predicted time series $Y' = [y'_1, y'_2, \dots, y'_P]^T \in \mathbb{R}^{P \times C}$ output by the Projector module, where P is the predicted sequence length and C is the number of input features. It receives $\mu_x$ and $\sigma_x$ from the Normalization module and obtains the output using the following formula.
[0061]
[0062] The De-Normalization module restores the sequence. In this way, the model can better handle non-stationary time series data, improving the accuracy and stability of predictions.
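The pairing of the two modules can be illustrated with a small sketch: per-window, per-feature statistics are computed on the input, and the same statistics are reused to restore the scale of the prediction. This assumes the usual Z-score form (subtract the per-feature mean, divide by the per-feature standard deviation); the small constant `eps` and the function names are assumptions added for numerical safety and illustration, not taken from the source.

```python
import math


def normalize_window(window, eps=1e-5):
    """Normalize one window of S rows by C features; return X_in, mu, sigma."""
    s, c = len(window), len(window[0])
    mu = [sum(row[j] for row in window) / s for j in range(c)]
    sigma = [
        math.sqrt(sum((row[j] - mu[j]) ** 2 for row in window) / s + eps)
        for j in range(c)
    ]
    x_in = [[(row[j] - mu[j]) / sigma[j] for j in range(c)] for row in window]
    return x_in, mu, sigma


def denormalize(pred, mu, sigma):
    """Restore predictions (P rows by C features) to the original scale."""
    return [[row[j] * sigma[j] + mu[j] for j in range(len(mu))] for row in pred]
```

Because the statistics come from the input window itself, no parameters are learned, and applying `denormalize` to the output of `normalize_window` recovers the original window.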
[0063] (2) CLFN module
[0064] In the data embedding stage, unlike the traditional Transformer model, which forms a token from timestamps, position information, and data, this study employs a channel-independent data embedding method. The data to be embedded consists of the data $X_{in}$ processed by the Normalization module and the mask data $X_0$ constructed to obscure future information, with each variable using a separate channel. This independent channel approach allows the model to preserve the physical meaning and multivariate correlations of each feature while maintaining the integrity of the time series data.
[0065] The original Transformer uses only a single convolutional layer for data embedding, and its feature extraction capability needs improvement, especially after the input data X has been stationarized by the Normalization module. Therefore, a module, the Channel-Linear Feed-Forward Network (CLFN), was designed for data embedding. Unlike the GDFN (Gated-Dconv Feed-Forward Network) module proposed in the Restormer model, the CLFN module uses linear layers instead of convolutional layers, making it suitable for embedding independent channels of time series data. Each channel better preserves the features and meaning of its data, and the model is simplified while computational complexity is reduced.
[0066] The structure of the CLFN module is shown in Figure 3. The module's input is $(X_{in} + X_0) \in \mathbb{R}^{S \times C'}$, where S is the sequence length and C′ is the number of features obtained by concatenating $X_{in}$ and $X_0$ along the feature dimension (greater than C, the number of features of $X_{in}$); D denotes the embedding dimension of the module's output. The UpLinear layer is a linear layer responsible for mapping the input data to a higher dimension (typically 4D) for more complex feature extraction. The Linear layer, as an intermediate hidden layer, increases the expressiveness of the data. When choosing the activation function, it was found that data in the negative domain has a significant impact on the prediction results, so its distribution characteristics need to be well preserved. Therefore, the GELU activation function was chosen to introduce the nonlinearity needed to enhance data representation. The DownLinear layer is responsible for reducing the data dimension to D. Finally, a Dropout layer is added, which prevents overfitting by randomly discarding the activations of some neurons. The CLFN module is represented by the following formula:
[0067]
[0068] Where g denotes the GELU activation function, $[X_{in}, X_0]$ denotes the concatenation of the two tensors along the feature dimension, and W and b are parameter matrices.
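One way to write the two-branch structure described for the CLFN module is sketched below. This is a reconstruction from the layer description only (UpLinear then Linear on each branch, GELU on the first branch, multiplication of the branches, then DownLinear and Dropout). The numbering of the weights $W_1, \dots, W_5$ and biases $b_1, \dots, b_5$, the reading of "multiplied" as element-wise, and the schematic orientation of the matrix products are all assumptions.

```latex
\begin{aligned}
Z &= [X_{in}, X_0], \\
f_1 &= g\big(W_2\,(W_1 Z + b_1) + b_2\big), \\
f_2 &= W_4\,(W_3 Z + b_3) + b_4, \\
\mathrm{CLFN}(Z) &= \mathrm{Dropout}\big(W_5\,(f_1 \odot f_2) + b_5\big).
\end{aligned}
```

Here $W_1$ and $W_3$ play the role of the two UpLinear layers, $W_2$ and $W_4$ the two Linear layers, and $W_5$ the DownLinear layer that reduces the dimension to D.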
[0069] After passing through the CLFN module, each time series driven by the underlying complex dynamics is transformed into tokens by the data embedding layer to characterize the properties of each variable. The data is mapped to the required dimensions, providing rich feature representations for the Encoder module. Overall, the CLFN module not only improves the quality of data embedding but also optimizes the overall computational burden of the model.
[0070] (3) Encoder module
[0071] The output of the CLFN module serves as the input to the Encoder module, and the output of the Encoder module is denoted $H \in \mathbb{R}^{C \times D}$:
[0072] $H_{i+1} = \mathrm{Encoder}(H_i), \quad i = 0, \dots, N-1$ (22)
[0073] Where N is the number of Encoder modules.
[0074] The structure of the Encoder module is shown on the left of Figure 2. The Encoder module uses a multi-head ProbSparse self-attention mechanism to capture long-term dependencies and extract important information from the sequence. Unlike traditional self-attention, multi-head ProbSparse self-attention reduces computational complexity by sparsifying the attention weight matrix and uses a probabilistic method to determine the attention weights for each head. First, three tensors, query Q, key K, and value V, are generated from the module input; the algorithm steps in the Encoder module are as follows:
[0075] Step 1: Initialization. Set the tensors Q, K, and V, where h is the number of heads in the attention mechanism, and set the hyperparameter c, with $u = c \ln L_Q$ and $U = L_Q \ln L_K$;
[0076] Step 2: Random sampling. Randomly select U dot-product pairs from K as the sampled keys;
[0077] Step 3: Calculate sample scores. Compute the sample score matrix from Q and the sampled keys;
[0078] Step 4: Calculate the measurement. Compute the measurement M row by row from the sample score matrix;
[0079] Step 5: Set the top queries. Select the u queries in Q with the largest measurement M to form the top query matrix;
[0080] Step 6: Calculate the attention weights. Compute the attention weight matrix S1 for the top queries;
[0081] Step 7: Set the default score. Set S0 = mean(V);
[0082] Step 8: Merge the scores. Combine S = {S1, S0} into the attention output matrix.
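The eight steps above can be sketched for a single attention head in plain Python. This is an illustrative sketch, not the patented implementation: the max-minus-mean form of the measurement M, the scaling by the square root of the dimension, the rounding of u and of the per-query sample count with `ceil`, and the function names are assumptions borrowed from the commonly described ProbSparse formulation, since the source formulas are not reproduced.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def probsparse_attention(Q, K, V, c=1, seed=0):
    """Single-head sketch. Q is L_Q x d; K and V are L_K x d (lists of lists)."""
    rng = random.Random(seed)
    L_Q, L_K, d = len(Q), len(K), len(Q[0])
    d_v = len(V[0])
    # Step 1: u top queries; ln(L_K) sampled keys per query (U = L_Q * ln L_K pairs).
    u = min(L_Q, max(1, math.ceil(c * math.log(L_Q))))
    n_sample = min(L_K, max(1, math.ceil(math.log(L_K))))
    # Steps 2-4: sample keys, score them, and measure each query row.
    M = []
    for q in Q:
        idx = rng.sample(range(L_K), n_sample)
        s = [dot(q, K[j]) / math.sqrt(d) for j in idx]
        M.append(max(s) - sum(s) / len(s))  # assumed measurement: max minus mean
    # Step 5: indices of the u queries with the largest measurement.
    top = sorted(range(L_Q), key=lambda i: M[i], reverse=True)[:u]
    # Step 7: default output S0 = mean(V) for every query.
    mean_v = [sum(v[j] for v in V) / L_K for j in range(d_v)]
    out = [list(mean_v) for _ in range(L_Q)]
    # Steps 6 and 8: full softmax attention for the top queries, merged into the output.
    for i in top:
        w = softmax([dot(Q[i], k) / math.sqrt(d) for k in K])
        out[i] = [sum(w[t] * V[t][j] for t in range(L_K)) for j in range(d_v)]
    return out, top
```

Only the u selected queries pay the full cost of attending over all keys; every other row falls back to the mean of V, which is where the reduction in computation comes from.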
[0083] The Encoder module incorporates residual connections and layer normalization (Add & LayerNorm) to help prevent the vanishing gradient problem during training and accelerate convergence. Layer normalization was originally proposed to improve the convergence and training stability of deep networks. In typical Transformer-based predictors, this module normalizes the representations of feature variables at the same timestamp, gradually making each variable indistinguishable from the others. However, in this model, normalization is applied to the sequence dimension of each feature variable, as shown below:
[0084]
[0085] Where $H = [h_1, \dots, h_{C'}] \in \mathbb{R}^{C' \times D}$. Since all series, as (variate) tokens, are normalized to a normal distribution, discrepancies caused by inconsistent measurements are reduced. The output is then passed through a feed-forward neural network at the end of the module.
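A form of the normalization consistent with this description, in which each variate token $h_n$ is normalized over its own D-dimensional representation, would be the following. It is a sketch based on the variate-token layer normalization used in inverted Transformer designs and is an assumption; the learnable scale and shift of standard layer normalization are omitted.

```latex
\mathrm{LayerNorm}(H) =
\left\{ \frac{h_n - \mathrm{Mean}(h_n)}{\sqrt{\mathrm{Var}(h_n)}}
\;\middle|\; n = 1, \dots, C' \right\}
```

Each of the C′ rows of H is centred and scaled independently, so features measured in different units end up on a comparable scale before the feed-forward network.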
[0086] (4) Projector module
[0087] The Projector module consists of linear layers that map the output H of the Encoder module to the output $Y' = [y'_1, y'_2, \dots, y'_P]^T$, where P is the predicted sequence length and C is the number of input features.
[0088] The model of this invention was verified by experiments:
[0089] (1) Prepare the dataset
[0090] The standardized data were used to construct a multivariate dataset with nine input feature variables: measured depth (MD), weight on bit (WOB), standpipe pressure (SP), torque (TQ), rotational speed (RS), mud flow rate (MF), hook load (HL), vertical depth (TVD), and rate of penetration (ROP). The dataset was split into training, validation, and test sets in a 7:2:1 ratio, as shown in Table 3.
[0091] Table 3 Dataset Details
[0092]
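A 7:2:1 split can be sketched as follows. The sketch assumes the records are kept in their original (depth or time) order and split contiguously, which is common for time series but is not stated in the source; the function name `split_7_2_1` is hypothetical.

```python
def split_7_2_1(records):
    """Split an ordered sequence into train/validation/test parts (7:2:1)."""
    n = len(records)
    n_train = n * 7 // 10  # integer arithmetic avoids float rounding
    n_val = n * 2 // 10
    train = records[:n_train]
    val = records[n_train:n_train + n_val]
    test = records[n_train + n_val:]  # remainder, roughly one tenth
    return train, val, test
```

Keeping the split contiguous means the test set covers the deepest part of the well, so the model is evaluated on data that comes after everything it was trained on.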
[0093] (2) Evaluation indicators
[0094] To effectively evaluate the model proposed in this study, two indicators are used for assessment: mean squared error (MSE) and mean absolute error (MAE), which are defined as follows:
[0095]
[0096] Where m is the size of the test dataset, $y_i$ is the true value, and $\hat{y}_i$ is the model's predicted value. By definition, the smaller these values, the better the model's predictive performance.
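In their standard form, the two metrics are MSE = (1/m) Σ (y_i − ŷ_i)² and MAE = (1/m) Σ |y_i − ŷ_i|. A minimal sketch of computing them over a test set follows; the function names are chosen for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error over m paired values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error over m paired values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
```

MSE penalizes large errors more heavily because of the square, while MAE weights all errors linearly, so reporting both gives a fuller picture of prediction quality.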
[0097] (3) Experimental Environment
[0098] The experimental environment consisted of a 12th Gen Intel(R) Core(TM) i7-12650 processor, 32GB of RAM, and an NVIDIA GeForce RTX 4060 GPU. The model was built using the PyTorch deep learning framework. Based on empirical values, the batch size was set to 64, a decaying learning rate was used, and early stopping was implemented during training to prevent overfitting. The Adam optimizer was used, and the loss function was MSELoss. All constructed models were multivariate input, univariate output prediction models, with drilling speed as the predicted feature variable.
[0099] (4) Experimental analysis of lookback step size and prediction step size
[0100] This study explores the combined impact of lookback step size and prediction step size on model predictive performance. Lookback step size refers to the number of historical time points considered by the model when making predictions, while prediction step size refers to the number of future time points the model predicts. Theoretically, an appropriate lookback step size can provide sufficient historical information for accurate predictions, while a suitable prediction step size ensures the model can effectively predict future trends.
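The relationship between the two step sizes can be made concrete with a sliding-window sketch: each sample pairs `lookback` historical points with the `horizon` points that follow them. The parameter names and the `stride` argument are assumptions for illustration.

```python
def sliding_windows(series, lookback, horizon, stride=1):
    """Build (history, future) pairs from an ordered series.

    lookback: number of historical points fed to the model.
    horizon: number of future points the model must predict.
    """
    samples = []
    last_start = len(series) - lookback - horizon
    for start in range(0, last_start + 1, stride):
        history = series[start:start + lookback]
        future = series[start + lookback:start + lookback + horizon]
        samples.append((history, future))
    return samples
```

For a series of length 10 with `lookback=4` and `horizon=2`, this yields five samples, the first pairing points 0 to 3 with points 4 and 5. A series shorter than `lookback + horizon` yields no samples.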
[0101] To comprehensively evaluate the impact of lookback step size and prediction step size, a series of experiments were designed in this specific implementation. Specifically, the prediction step size T∈{32,64,96,128} was set, and multiple different lookback step sizes “Timestep” were tested under each prediction step size. The mean squared error (MSE) of the model was recorded as a performance metric for each set of experiments.
[0102] The experimental results are shown in Figure 4. With a fixed prediction step size, the mean squared error (MSE) of the model increases slowly as the lookback step size increases, but the change is small, and overall the MSE remains roughly stable.
[0103] The experiment further revealed that, under the condition of a fixed lookback step size, the model's MSE increases with the increase of the prediction step size, with an increase of approximately 3% for every 32 prediction steps. The optimal lookback step size is often equal to the prediction step size.
[0104] In all of the above experiments, all parameters except the lookback step size and prediction step size remained constant. Other hyperparameters were then adjusted, specifically the embedding dimension D, the number of Encoder modules, and the hyperparameter c, while the lookback step size was fixed at 192 and the prediction step size at 64. Table 4 shows the impact of these parameter changes on model performance. As Table 4 shows, increasing these parameters did not significantly change the MSE, indicating that the model is insensitive to them. These hyperparameters can therefore be ruled out as the cause of the trends observed for the lookback and prediction step sizes.
[0105] Table 4. Hyperparameter Comparison Experiment
[0106]
[0107] Through the above exploration, this invention, using an independent channel data embedding method, enables the model to more effectively handle the physical meaning and multivariate correlations of each feature, rather than simply relying on the amount of historical data. This method allows the model to maintain stable performance under different lookback step sizes, demonstrating low sensitivity to the length of the input data. The model exhibits robustness and generalization ability when dealing with complex and non-stationary data, as it can maintain consistent performance under varying conditions, rather than only adapting to specific lookback step size settings. This flexibility and stability are of significant value for time series forecasting models in practical applications.
[0108] (5) Model Comparison Experiment Analysis
[0109] In this embodiment, CLformer is compared with three classic models in the field of time series forecasting: LSTM, Transformer, and Informer.
[0110] LSTM: LSTM is a special type of recurrent neural network (RNN) that can learn long-term dependency information. The key to LSTM lies in its internal gating mechanism, which allows it to effectively filter and store important information in sequential data and prevents gradient explosion or vanishing problems common in RNNs during training. It is one of the commonly used time series prediction models.
[0111] Transformer: The Transformer is a model based on a self-attention mechanism. It abandons traditional recurrent and convolutional structures, improving efficiency by processing input sequences in parallel. The core advantage of the Transformer lies in its ability to capture long-range dependencies in a sequence, which is especially important when processing complex input sequences. Its structure consists of an encoder and a decoder, each composed of multiple identical layers. Each layer contains a multi-head self-attention mechanism and a simple, position-wise fully connected feed-forward network.
[0112] Informer: Informer solves the time complexity and memory usage problems of traditional Transformer through a multi-head sparse self-attention mechanism, achieving O(L ln L) time complexity and memory usage. A key feature of Informer is that its decoder adopts a generative style, which can predict the entire long sequence at once, rather than predicting step by step, which greatly improves the inference speed of long sequence prediction.
[0113] (5.1) Comparison Experiment of Prediction Results
[0114] Table 5 compares the performance of different models under various lookback and prediction step sizes, using mean squared error (MSE) and mean absolute error (MAE) as evaluation metrics, with lower values indicating better model performance. In most tests, the CLformer model performed best, followed by Transformer, then Informer, while LSTM performed worst. This likely reflects the limitations of LSTM in handling complex time series data, typically requiring data embedding layers to enhance its temporal feature processing capabilities. The Transformer model performed comparably to CLformer in some tests, but its performance declined in long-term prediction tasks. The MSE of both CLformer and Transformer increased slightly with increasing lookback step size, but the increase was limited, indicating that these two models can effectively handle longer historical information to some extent. The Informer model, however, performed increasingly better with increasing prediction step size, demonstrating its potential for long-sequence prediction.
[0115] Overall, the CLformer model maintains good drilling rate prediction performance across various combinations of lookback and prediction step sizes, likely due to its independent channel data embedding method and efficient attention mechanism. Transformer and Informer models approach CLformer performance in some cases, but their performance degrades when handling longer prediction step sizes. These results highlight the importance of considering model architecture and data processing methods when selecting a time series forecasting model.
[0116] Table 5 Comparison of Model Predictions in Experiments
[0117]
[0118] (5.2) Real-time comparison experiment
[0119] To evaluate the real-time performance of different models, a series of comparative experiments were conducted. In the experiments, the prediction step size and lookback step size for all models were set to 64. The results, shown in Table 6, were obtained by comparing the model weights and inference time. The LSTM model performed best in real-time performance, but its prediction accuracy was inferior to other models. In contrast, the CLformer model constructed in this invention maintained a low inference time while possessing moderate model weights and the best prediction metrics. The Transformer and Informer models performed similarly. Furthermore, the inference time showed a certain degree of correlation with the model weight size.
[0120] Table 6 Real-time performance comparison experiment
[0121]
[0122]
[0123] (6) Ablation Experiment Analysis
[0124] To further illustrate the contributions of each component in the CLformer model, this invention conducted a series of ablation experiments. Specifically, the impact of the following variants on model performance was examined:
[0125] Model A: The original Channel-Linear Feed-Forward Network (CLFN) module was replaced with the CNN layer used by the Transformer to evaluate the contribution of CLFN to data embedding.
[0126] Model B: Change the channel-independent data embedding method to the traditional data embedding method in the Transformer model to verify the effectiveness of the independent channel method.
[0127] Model C: The Normalization and De-Normalization modules were removed from the model, and the original data was used directly for training and prediction to verify the impact of these modules on model performance.
[0128] Table 7 Ablation Experiment Results
[0129]
[0130] The experimental results are shown in Table 7, where the CLformer model constructed in this invention performs well. The results for Model A show that replacing the CLFN module with a CNN layer has little impact on MSE and MAE, indicating that the CLFN module fulfils the data embedding role as well as the CNN layer does. The CLFN module's advantage over the CNN layer lies in speed: it can quickly extract features and map them to higher dimensions. The results for Model B show that performance decreases after the data embedding method is changed, and the increase in MSE and MAE is more pronounced at longer lookback step sizes. This demonstrates that the independent channel data embedding method keeps the model's performance stable and its accuracy high. The results for Model C show the largest change: removing the Normalization and De-Normalization modules substantially increases both MSE and MAE, which highlights the important role of these modules in data preprocessing and in improving model performance.
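The role of the Normalization and De-Normalization modules examined with Model C can be illustrated with a small sketch of per-channel normalization and its inverse. It assumes a z-score computed over each input window with a small stabilizing constant `EPS`; the exact translation and scaling used by the Normalization module may differ, and all names here are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # S rows (time steps) x C columns (channels)
EPS = 1e-5  # assumed constant that avoids division by zero


def normalize(x: Matrix) -> Tuple[Matrix, List[float], List[float]]:
    """Shift and scale each channel to zero mean and (near) unit variance."""
    steps, channels = len(x), len(x[0])
    mean = [sum(row[c] for row in x) / steps for c in range(channels)]
    var = [sum((row[c] - mean[c]) ** 2 for row in x) / steps
           for c in range(channels)]
    std = [math.sqrt(v + EPS) for v in var]
    z = [[(row[c] - mean[c]) / std[c] for c in range(channels)] for row in x]
    return z, mean, std


def denormalize(z: Matrix, mean: List[float], std: List[float]) -> Matrix:
    """Invert `normalize` using the stored per-channel statistics."""
    return [[row[c] * std[c] + mean[c] for c in range(len(mean))] for row in z]


# Hypothetical window: 3 time steps, 2 channels on very different scales.
window = [[10.0, 200.0], [12.0, 220.0], [14.0, 240.0]]
z, mean, std = normalize(window)
restored = denormalize(z, mean, std)
print(mean)
print(restored)
```

In a model of this kind, the statistics computed on the input window would be reused to map the predicted sequence back to physical units.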
[0131] Drilling is a crucial part of the petroleum industry, and accurate data prediction can help improve operational efficiency and safety. This invention proposes CLformer, a deep learning-based multivariate time series prediction model that significantly improves the modeling of drilling data and the prediction accuracy through independent channel data embedding and attention mechanisms. In particular, the CLFN module proposed in this invention plays a key role in data embedding. The independent channel data embedding method performs well on multivariate time series data and helps capture the correlations between features and their physical meaning more accurately, which improves the model's robustness and generalization ability. Experimental results show that the CLformer model performs well in most cases, with low mean squared error (MSE) and mean absolute error (MAE). With a fixed lookback step size of 64, the average MSE is 6.76% and the RMSE is 14.48% across experiments with different prediction step sizes, both much lower than those of the other models. These results demonstrate the significant improvement in prediction accuracy achieved by the CLformer model. The model also has advantages in real-time performance, with an inference time of 3.56 ms and a moderate model weight size. In summary, the CLformer model proposed in this invention achieves high prediction accuracy and real-time performance in drilling data prediction, which is of great significance for improving the efficiency and safety of the drilling process. Future work can explore the application of this model in other fields such as rescue drilling, and investigate additional data embedding methods and attention mechanisms to further enhance the model's performance and generalization ability.
Claims
1. A drilling rig speed prediction method based on the Transformer framework, characterized in that it includes the following steps:
Step 1: Build a dataset that includes the drilling speed and the corresponding feature data, and standardize the drilling speed and the corresponding feature data.
Step 2: Construct a drilling speed prediction model. The drilling speed prediction model includes a Normalization module, a CLFN module, an Encoder module, a Projector module, and a De-Normalization module connected in sequence.
The sliding window method is used to generate an input sequence in $\mathbb{R}^{S \times C}$ from the feature data, and this sequence is fed into the Normalization module, where $S$ is the length of the input sequence and $C$ is the number of features of the input data.
The Normalization module translates and scales the input sequence to obtain the sequence $X_{in} \in \mathbb{R}^{S \times C}$. The translation and scaling use the mean in $\mathbb{R}^{1 \times C}$ and the variance in $\mathbb{R}^{1 \times C}$, combined by element-wise multiplication.
Mask data $X_0$ is constructed to obscure future information and is input, together with the sequence $X_{in}$, into the CLFN module for data embedding. Specifically, the input to the CLFN module is $(X_{in} + X_0) \in \mathbb{R}^{S \times C'}$, where $S$ is the sequence length and $C'$ is the number of features after $X_0$ is added to $X_{in}$ along the feature dimension.
The CLFN module includes two UpLinear layers, two Linear layers, one DownLinear layer, and one Dropout layer. In the CLFN module, $X_{in} + X_0$ passes through one UpLinear layer and one Linear layer in sequence and is then processed by the GELU activation function to obtain a first feature; $X_{in} + X_0$ also passes through the other UpLinear layer and the other Linear layer in sequence to obtain a second feature. The two features are multiplied and then passed through the DownLinear layer and the Dropout layer to obtain the output of the CLFN module.
The Encoder module performs feature extraction on the output of the CLFN module and outputs $H$.
The Projector module maps $H$ to an output in $\mathbb{R}^{P \times C}$, where $P$ is the prediction sequence length and $C$ is the number of input features.
The De-Normalization module denormalizes this sequence to obtain the final output.
Step 3: Divide the data in the dataset into a training set, a validation set, and a test set, and train, validate, and test the drilling speed prediction model to obtain the trained drilling speed prediction model.
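The CLFN data flow recited in claim 1 (two UpLinear and Linear branches, GELU on one branch, an element-wise product, then DownLinear and Dropout) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the layer widths, the random initialization, the omission of bias terms, and the class name `CLFNSketch` are all hypothetical, and Dropout is treated as the identity, as it would be at inference time.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gelu(v: float) -> float:
    """Exact GELU: v * Phi(v), with Phi the standard normal CDF."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def init(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; bias terms are omitted for brevity (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class CLFNSketch:
    """Hypothetical gated feed-forward embedding with assumed layer widths."""

    def __init__(self, c_in: int, hidden: int, d_model: int, seed: int = 0):
        rng = random.Random(seed)
        self.up_a, self.lin_a = init(c_in, hidden, rng), init(hidden, hidden, rng)
        self.up_b, self.lin_b = init(c_in, hidden, rng), init(hidden, hidden, rng)
        self.down = init(hidden, d_model, rng)

    def forward(self, x: Matrix) -> Matrix:
        # Branch A: UpLinear -> Linear -> GELU
        a = [[gelu(v) for v in row]
             for row in matmul(matmul(x, self.up_a), self.lin_a)]
        # Branch B: UpLinear -> Linear, no activation
        b = matmul(matmul(x, self.up_b), self.lin_b)
        # Element-wise product of the two branches, then DownLinear.
        gated = [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]
        # Dropout is the identity at inference time, so it is skipped here.
        return matmul(gated, self.down)


# Toy input standing in for the CLFN input: S = 4 steps, C' = 9 features.
x = [[0.1 * (s + c) for c in range(9)] for s in range(4)]
out = CLFNSketch(c_in=9, hidden=16, d_model=8).forward(x)
print(len(out), len(out[0]))   # sequence length and embedding width
```

The multiplication of an activated branch by a linear branch acts as a gate, so the block maps each time step to a higher-dimensional feature using only matrix products.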
2. The drilling rig speed prediction method based on the Transformer framework according to claim 1, characterized in that the feature data corresponding to the drilling speed in Step 1 are the feature data that have a large mutual information value with the drilling speed.
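Selecting features by mutual information with the drilling speed can be sketched with a simple histogram estimate. The estimator, the bin count, and the toy series below are assumptions made for illustration; the claim does not state how the mutual information is computed or what threshold counts as large.

```python
import math
from collections import Counter
from typing import List, Sequence


def bin_index(values: Sequence[float], bins: int) -> List[int]:
    """Assign each value to one of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant series: everything in bin 0
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x: Sequence[float], y: Sequence[float],
                       bins: int = 8) -> float:
    """Histogram estimate of I(X; Y) in nats."""
    bx, by = bin_index(x, bins), bin_index(y, bins)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # sum over occupied cells of p(i,j) * log(p(i,j) / (p(i) * p(j)))
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


# Hypothetical data: `wob` depends on the speed, `noise` is unrelated to it.
speed = [float(i % 16) for i in range(160)]
wob = [2.0 * v + 1.0 for v in speed]
noise = [float((i * 7) % 5) for i in range(160)]

mi_wob = mutual_information(wob, speed)
mi_noise = mutual_information(noise, speed)
print(mi_wob, mi_noise)
```

A feature would be kept when its score is high relative to the other candidates, as `wob` is relative to `noise` in this toy case.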
3. The drilling rig speed prediction method based on the Transformer framework according to claim 1, characterized in that the feature data corresponding to the drilling speed in Step 1 include the measured well depth MD, drilling pressure WOB, tubing pressure SP, torque TQ, rotational speed RS, mud flow rate MF, hook load HL, and vertical well depth TVD.
4. The drilling rig speed prediction method based on the Transformer framework according to claim 1, characterized in that there are N Encoder modules; each Encoder module includes a multi-head sparse self-attention layer and a feedforward neural network, and a residual connection and layer normalization are added after each of the multi-head sparse self-attention layer and the feedforward neural network.
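5. The drilling rig speed prediction method based on the Transformer framework according to claim 1, characterized in that the mean square error (MSE) and the mean absolute error (MAE) are used to evaluate the drilling speed prediction model:
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|$$
where $m$ is the test data set size, $y_i$ is the true value of the drilling speed, and $\hat{y}_i$ is the model's predicted value of the drilling speed.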
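The two metrics named in this claim can be computed directly from paired true and predicted values. The sketch below uses the standard definitions of MSE and MAE; the function names and the sample values are illustrative only.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of squared differences over the m test samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of absolute differences over the m test samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical drilling speed values, for illustration only.
y_true = [10.0, 12.0, 14.0]
y_pred = [11.0, 12.0, 12.0]
print(mse(y_true, y_pred), mae(y_true, y_pred))
```

Because MSE squares each error, it penalizes the single 2.0 miss more heavily than MAE does.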
6. The drilling rig speed prediction method based on the Transformer framework according to claim 1, characterized in that the optimizer is the Adam optimizer and the loss function is MSELoss.
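To make the pairing of Adam with a mean-squared-error loss concrete, the sketch below applies the standard Adam update rule to a two-parameter linear fit. It is a toy illustration, not the patent's training code: the model, the data, the hyperparameters, and the function names are assumptions. The name MSELoss matches the PyTorch class `torch.nn.MSELoss`, which would normally be paired with `torch.optim.Adam`, although the framework is not stated in this section.

```python
import math
from typing import List, Sequence, Tuple


def mse_loss(w: float, b: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared error of the line y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_linear_adam(xs: Sequence[float], ys: Sequence[float], steps: int = 2000,
                    lr: float = 0.05, beta1: float = 0.9, beta2: float = 0.999,
                    eps: float = 1e-8) -> Tuple[float, float]:
    """Minimize the MSE of y = w * x + b with the standard Adam update."""
    params: List[float] = [0.0, 0.0]   # w, b
    m = [0.0, 0.0]                     # first-moment estimates
    v = [0.0, 0.0]                     # second-moment estimates
    n = len(xs)
    for t in range(1, steps + 1):
        errors = [params[0] * x + params[1] - y for x, y in zip(xs, ys)]
        grads = [2.0 * sum(e * x for e, x in zip(errors, xs)) / n,  # dL/dw
                 2.0 * sum(errors) / n]                             # dL/db
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g
            v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
            m_hat = m[i] / (1.0 - beta1 ** t)   # bias correction
            v_hat = v[i] / (1.0 - beta2 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params[0], params[1]


# Hypothetical data generated from y = 3x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 1.0 for x in xs]
w, b = fit_linear_adam(xs, ys)
print(w, b, mse_loss(w, b, xs, ys))
```

Adam rescales each parameter's step by a running estimate of its gradient magnitude, which is why a single learning rate works for both `w` and `b` here despite their different gradient scales.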