A method and system for autonomous hover control of a plunger based on expansion mechanism in a natural gas well
Real-time monitoring with a multi-point pressure sensor array and edge computing nodes is combined with adaptive filtering and machine learning models, a deep Q-network intelligent decision-making mechanism under a reinforcement learning framework, and piezoelectric material actuators. Together these address the limited adaptability and control precision of plunger autonomous hovering control systems when the downhole environment changes, achieving efficient and safe hovering operation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CNPC BOHAI DRILLING ENG
- Filing Date
- 2024-12-27
- Publication Date
- 2026-06-30
AI Technical Summary
Existing plunger autonomous hovering control systems in natural gas wells are unable to adapt quickly to changes in the downhole environment, cannot achieve precise deformation control at the micron level, and are difficult to flexibly adjust operating strategies when encountering unforeseen situations, leading to increased operational risks.
Real-time monitoring is achieved using a multi-point pressure sensor array and edge computing nodes. The expansion degree is predicted by combining adaptive filtering and machine learning models. The optimal hovering strategy is determined by combining fluid dynamic parameters and high-resolution environmental information through a deep Q-network intelligent decision-making mechanism under a reinforcement learning framework. Piezoelectric material actuators are used for micron-level precise deformation control. A Bayesian inference-driven online monitoring system is deployed to ensure stable hovering.
It achieves micron-level precise control of plunger position and state, reduces reaction time, improves system flexibility and adaptability, reduces potential risks, and ensures operational safety and efficiency.
Figure CN122304673A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of plunger hovering control technology, and in particular to a plunger autonomous hovering control method and system based on an expansion mechanism in a natural gas well.
Background Technology
[0002] In natural gas wells, autonomous hovering control of the plunger is crucial for improving gas production efficiency and safety. The plunger needs to precisely adjust its position and state in the complex downhole environment to ensure efficient fluid transfer and prevent equipment damage. To this end, modern natural gas production systems are typically equipped with multi-point pressure sensor arrays to monitor pressure distribution changes on the wellbore in real time. The data collected by these sensors not only reflects the dynamic environment within the well but also provides critical information for optimizing plunger operation. By transmitting the monitoring data to edge computing nodes and processing it using adaptive filtering algorithms and machine learning models, noise can be removed and the optimal expansion degree can be predicted, thereby generating an expansion diameter adjustment scheme. This process requires highly accurate data processing capabilities and intelligent decision support to ensure the plunger can operate stably in the complex and ever-changing downhole environment.
[0003] While existing solutions have achieved some degree of automation in plunger operation, several limitations remain. For instance, due to reliance on pre-set parameters and limited feedback mechanisms, current systems struggle to adapt quickly to sudden changes in the downhole environment, potentially missing optimal operating opportunities. Traditional methods also exhibit limitations in handling complex nonlinear problems, particularly in predicting plunger position and stress conditions, failing to achieve micrometer-level precision in deformation control. Furthermore, existing systems have limited adaptability to unforeseen situations or new challenges, making it difficult to flexibly adjust operating strategies to maintain plunger safety and stability. Additionally, once the plunger deviates from the predetermined hovering position, the process of recalculating the optimal hovering position or executing emergency avoidance maneuvers is often delayed, increasing operational risks.
Summary of the Invention
[0004] This invention provides a method and system for autonomous plunger hovering control based on an expansion mechanism in a natural gas well, in order to solve some of the technical problems in the background art.
[0005] In a first aspect, embodiments of the present invention provide a method for autonomous plunger hovering control based on an expansion mechanism in a natural gas well, comprising:
[0006] A multi-point pressure sensor array mounted on the plunger is used to monitor the pressure distribution changes on the wellbore in real time, obtain monitoring data, and transmit the monitoring data to an edge computing node. Based on an adaptive filtering algorithm and a machine learning model, the monitoring data is noise-removed and the optimal expansion degree is predicted to obtain an expansion diameter adjustment scheme.
[0007] The expansion mechanism of the plunger is adjusted accordingly based on the expansion diameter adjustment scheme. At the same time, a three-dimensional flow field simulation environment is constructed by combining the hydrodynamic parameters of the plunger's location and the high-resolution environmental information provided by the downhole distributed fiber optic sensor network. The current motion state of the plunger in the three-dimensional flow field simulation environment is modeled and predicted using the particle swarm optimization algorithm and Kalman filter, and the plunger position prediction result and the resistance situation of the plunger are generated.
[0008] Based on the plunger position prediction results and resistance conditions, combined with the complex distance and path relationships between multiple pre-set hovering target points in the well, a deep Q-network intelligent decision-making mechanism under the reinforcement learning framework is adopted in conjunction with a risk assessment module to determine the optimal hovering strategy, which minimizes energy consumption and time cost while ensuring safety and reliability. The hovering strategy specifically defines the sequence of actions required when approaching each hovering point.
[0009] When the plunger approaches the selected hovering point according to the above hovering strategy, a fine adjustment mode based on feedback control theory is activated. Using the piezoelectric material actuator integrated in the expansion mechanism, the expansion mechanism is deformed with micrometer-level precision according to the action sequence specified in the hovering strategy, thereby adjusting the pressure difference across the plunger. A sliding mode variable structure control algorithm is applied to stabilize the plunger in hovering.
[0010] After the plunger successfully hovers, an online monitoring system driven by Bayesian inference is deployed. Based on the continuous analysis of the changing trends of surrounding environmental parameters by the system, if factors that may cause the plunger to deviate from the hovering position are detected, the preset emergency response plan is immediately activated to recalculate the optimal hovering position or perform an emergency avoidance maneuver to maintain the stability of the plunger at the target hovering point.
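The Bayesian online monitoring described above can be pictured as a sequential posterior update. The following is only a minimal sketch under stated assumptions: the two hypotheses ("stable" and "drifting"), the Gaussian likelihoods, the parameter values, and the names `update_drift_probability` and `monitor` are hypothetical and do not come from the application, which does not specify the inference model.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def update_drift_probability(prior, observation, drift_mean=1.0, sigma=0.5):
    """Bayes' rule for two hypotheses: residual centred on 0 (stable)
    or centred on drift_mean (drifting). All values are illustrative."""
    like_drift = gaussian_pdf(observation, drift_mean, sigma)
    like_stable = gaussian_pdf(observation, 0.0, sigma)
    evidence = prior * like_drift + (1.0 - prior) * like_stable
    return prior * like_drift / evidence


def monitor(observations, prior=0.05, alarm_threshold=0.9):
    """Update the drift probability per observation; stop at the first alarm."""
    p = prior
    for obs in observations:
        p = update_drift_probability(p, obs)
        if p > alarm_threshold:
            return p, True  # would trigger the preset emergency response plan
    return p, False


p_drift, alarm = monitor([0.9, 1.1, 1.0])     # residuals near the drift mean
p_quiet, quiet_alarm = monitor([0.0, 0.1, -0.1])  # residuals near zero
```

In this sketch the alarm only fires once several consistent observations have accumulated, which is one way to avoid reacting to a single noisy reading.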
[0011] Optionally, the step of determining the optimal hovering strategy, in which a deep Q-network intelligent decision-making mechanism under the reinforcement learning framework is combined with a risk assessment module, based on the plunger position prediction results, the resistance conditions, and the complex distance and path relationships between multiple pre-set hovering target points in the well, so as to minimize energy consumption and time cost while ensuring safety and reliability, and in which the hovering strategy specifically defines the sequence of actions required when approaching each hovering point, includes:
[0012] Based on the plunger position prediction results and the resistance encountered by the plunger, combined with the complex distance and path relationship between multiple pre-set hovering target points in the well, high-resolution environmental information provided by the downhole distributed fiber optic sensor network is integrated to form a comprehensive decision dataset.
[0013] Using the comprehensive decision dataset, the deep Q-network intelligent decision-making mechanism is trained with a reward function built around minimizing energy consumption and time cost, and a risk assessment module is introduced to quantify the safety and reliability of different paths.
[0014] Through continuous simulation and optimization, the intelligent decision-making model is trained to obtain an optimized intelligent decision-making model that selects the optimal action in a given state. Based on this optimized intelligent decision-making model, each state is analyzed and processed to generate specific operation instructions, which constitute the action sequence of the corresponding state.
[0015] Each action sequence is designed to guide the plunger to the next hovering point in the most efficient and safe way. The operating instructions include adjusting the diameter of the expansion mechanism, changing the plunger's speed and direction.
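As an illustration of how an action sequence could be read out of a trained model, the sketch below rolls out a greedy policy over a stand-in Q-function. The action names (`ascend`, `hold`, `descend`), the integer position grid, and the hand-written `q_value` are hypothetical placeholders; in the described method the Q-values would come from the trained deep Q-network, and the instructions would also cover expansion diameter and speed.

```python
# Hypothetical discrete actions and their effect on a 1-D position index.
ACTIONS = {"ascend": 1, "hold": 0, "descend": -1}


def q_value(position, action, target):
    """Stand-in for a trained Q-network: prefers actions ending nearer the target."""
    return -abs(position + ACTIONS[action] - target)


def greedy_action_sequence(start, target, max_steps=10):
    """Repeatedly pick the highest-valued action until the hover point is reached."""
    position, sequence = start, []
    for _ in range(max_steps):
        if position == target:
            break
        action = max(ACTIONS, key=lambda a: q_value(position, a, target))
        sequence.append(action)
        position += ACTIONS[action]
    return sequence


upward = greedy_action_sequence(start=0, target=3)
downward = greedy_action_sequence(start=2, target=0)
```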
[0016] Optionally, a comprehensive decision dataset is used to train the deep Q-network intelligent decision-making mechanism, minimizing energy consumption and time cost as part of the reward function, and a risk assessment module is introduced to quantify the safety and reliability of different paths, including:
[0017] Based on the plunger position prediction results and resistance conditions, combined with the complex distance and path relationships between multiple pre-set hovering target points in the well, and the high-resolution environmental information provided by the downhole distributed fiber optic sensor network, a comprehensive decision dataset is formed. A reward function is constructed based on this comprehensive decision dataset, in which energy consumption and time cost minimization are taken as the core optimization objectives. At the same time, a risk assessment module is introduced to quantify the safety and reliability of different paths.
[0018] Using a comprehensive decision dataset, the deep Q-network model is initialized. The input layer is defined to receive the plunger state, the output layer generates the predicted action value, and the hidden layer structure is set to capture complex nonlinear relationships. The parameter weights are initialized to provide a starting point for the learning process.
[0019] Based on the initialized deep Q-network model, the model is trained in a simulated environment using a comprehensive decision dataset. In each iteration, an action is selected based on the current state, and the immediate reward and the next state brought about by the action are observed. The Q-value table is updated or the experience replay buffer is used to store past experiences, thus obtaining the pre-trained intelligent decision model.
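The training loop in this paragraph relies on two standard ingredients: an experience replay buffer and a rule for choosing actions. A minimal sketch of both follows. The class and function names are illustrative, and epsilon-greedy exploration is an assumption, since the application only says that an action is selected based on the current state.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=0):
        self._items = deque(maxlen=capacity)  # oldest experiences are dropped first
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self._items.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniform random mini-batch without replacement."""
        return self._rng.sample(list(self._items), batch_size)

    def __len__(self):
        return len(self._items)


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(state=step, action=0, reward=-1.0, next_state=step + 1, done=False)
batch = buffer.sample(2)
chosen = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=random.Random(0))
```

Sampling past experiences at random breaks the correlation between consecutive transitions, which is the usual reason for pairing a replay buffer with a deep Q-network.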
[0020] The intelligent decision-making model after initial training is optimized by applying reinforcement learning algorithms. The Q-value is updated based on the actual rewards obtained, and the model parameters are adjusted to determine the action sequence that maximizes the cumulative reward. This generates an optimized intelligent decision-making model. During training, if a certain path is considered to have a high risk, the expected reward of the action corresponding to that path is reduced to encourage the model to choose a safer path and optimize the intelligent decision-making model. When the model converges, the optimized intelligent decision-making model is obtained. This model can select the optimal action in a given state, that is, the action sequence required to approach each hovering point.
[0021] Based on the optimized intelligent decision-making model, each preset predicted state is analyzed and processed to generate specific operation instructions. These operation instructions constitute the action sequence for the corresponding state, which guides the plunger to reach the next hovering point in the most efficient and safe manner.
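The reward construction described in the steps above, with energy and time as the core objectives and a penalty for risky paths, could take a form like the following. This is a sketch only: the weights, the risk threshold, the linear penalty, and the function name `hover_reward` are assumptions introduced for illustration and are not specified in the application.

```python
def hover_reward(energy_joules, elapsed_seconds, risk_score,
                 w_energy=1.0, w_time=0.5, w_risk=10.0, risk_threshold=0.7):
    """Negative cost: lower energy and time give a higher reward.

    risk_score is assumed to be a value in [0, 1] supplied by the risk
    assessment module; paths above risk_threshold receive a reduced reward.
    """
    reward = -(w_energy * energy_joules + w_time * elapsed_seconds)
    if risk_score > risk_threshold:
        reward -= w_risk * (risk_score - risk_threshold)
    return reward


safe_reward = hover_reward(energy_joules=2.0, elapsed_seconds=4.0, risk_score=0.2)
risky_reward = hover_reward(energy_joules=2.0, elapsed_seconds=4.0, risk_score=0.9)
```

With identical energy and time costs, the high-risk path scores lower, so a learner maximizing cumulative reward is nudged toward the safer path.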
[0022] Optionally, reinforcement learning algorithms are applied to optimize the initially trained intelligent decision-making model. The Q-value is updated based on the actual rewards obtained, and the model parameters are adjusted to determine the action sequence that maximizes the cumulative reward, generating an optimized intelligent decision-making model. During training, if a path is considered to have a high risk, the expected reward for the action corresponding to that path is reduced to encourage the model to choose safer paths, thus optimizing the intelligent decision-making model. When the model converges, the optimized intelligent decision-making model is obtained. This model can select the optimal action in a given state, i.e., the action sequence required to approach each hovering point, including:
[0023] Starting from the initialized deep Q-network model, the pre-trained intelligent decision-making model is obtained by training on the comprehensive decision dataset in a simulated environment. Reinforcement learning algorithms are then applied to further optimize this pre-trained model and obtain an optimized intelligent decision-making model. The optimization process uses a target network to stabilize learning and Double DQN to reduce the overestimation of Q-values.
[0024] Based on the actual rewards obtained, the Q-value of the optimized intelligent decision-making model is updated and the parameters are adjusted to generate an optimized intelligent decision-making model. Based on the actual immediate rewards obtained after the plunger performs the action, the Q-value of the corresponding state-action pair is updated using the Bellman equation. At the same time, the weight parameters of the neural network are adjusted through the backpropagation algorithm, so that the intelligent decision-making model can gradually learn to select action sequences that can maximize the cumulative rewards in the long run, thereby generating an optimized intelligent decision-making model.
[0025] Based on the optimized intelligent decision-making model, the action sequence that maximizes the cumulative reward is determined. During the training process, the target action sequence is learned based on the optimized intelligent decision-making model. The target action can determine the maximum cumulative reward in different states, that is, the optimal action sequence to be taken when approaching each hovering point.
[0026] The risk assessment module is invoked in real time to quantitatively evaluate the safety and reliability of the selected path. If a path is deemed to have a high risk, the expected reward for the action corresponding to that path is reduced, prompting the model to favor choosing a safer path. This feedback mechanism is directly integrated into the reward function, ensuring that the final intelligent decision-making model is not only efficient but also safe and reliable.
[0027] When the training process reaches model convergence, the final optimized intelligent decision-making model is obtained.
[0028] Based on the final optimized intelligent decision-making model, each preset predicted state is analyzed and processed to generate specific operation instructions. These operation instructions constitute the action sequence for the corresponding state, which guides the plunger to reach the next hovering point in the most efficient and safe manner.
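The target-network and Double DQN refinements named in the steps above can be sketched in a few lines. In Double DQN the online network chooses the next action while the target network evaluates it. The function names and the soft (Polyak) update are illustrative assumptions; the application does not say how the target network is refreshed.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Target value: online network selects the action, target network scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def soft_update(target_weights, online_weights, tau=0.01):
    """Move target weights a small step toward the online weights (one common variant)."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_weights, online_weights)]


# The online network prefers action 1; the target network values it at 0.3.
y = double_dqn_target(1.0, next_q_online=[0.2, 0.8], next_q_target=[0.5, 0.3], gamma=0.5)
terminal = double_dqn_target(1.0, [0.2, 0.8], [0.5, 0.3], done=True)
blended = soft_update([0.0], [1.0], tau=0.5)
```

A plain DQN target would have used the maximum of the target network's own estimates (0.5 here), which is where the overestimation bias comes from.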
[0029] Optionally, based on the actual rewards obtained, the intelligent decision-making model under optimization is updated with Q-values and its parameters are adjusted to generate an optimized intelligent decision-making model. The Bellman equation is used to update the Q-values of the corresponding state-action pairs based on the immediate rewards obtained after the plunger performs its action. Simultaneously, the weight parameters of the neural network are adjusted using the backpropagation algorithm, enabling the intelligent decision-making model to gradually learn to select action sequences that maximize cumulative rewards in the long run, thereby generating an optimized intelligent decision-making model, including:
[0030] Based on the immediate reward actually obtained after the plunger performs the action, update the Q value of the corresponding state-action pair using the Bellman equation;
[0031] Based on the updated Q value, the weight parameters of the neural network are adjusted through the backpropagation algorithm to obtain the intelligent decision model with adjusted parameters.
[0032] By updating the Q-value and adjusting the parameters, the intelligent decision-making model determines the maximum cumulative reward of the target action sequence under different states, that is, the optimal action sequence to be taken when approaching each hovering point. Finally, when the training reaches model convergence, the final optimized intelligent decision-making model is obtained.
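The two steps above, a Bellman-based target followed by a gradient adjustment of the weights, can be illustrated with a linear Q-function in place of a neural network. This substitution is an assumption made to keep the sketch self-contained: for a linear model the gradient with respect to the weights is simply the feature vector, so no backpropagation library is needed. The feature vectors and hyperparameters are hypothetical.

```python
def q_value(weights, features):
    """Linear approximation Q(s, a) = w . phi(s, a)."""
    return sum(w * f for w, f in zip(weights, features))


def td_update(weights, features, reward, next_features_by_action,
              gamma=0.9, lr=0.1, done=False):
    """One semi-gradient temporal-difference step toward the Bellman target."""
    if done:
        target = reward
    else:
        target = reward + gamma * max(
            q_value(weights, f) for f in next_features_by_action
        )
    td_error = target - q_value(weights, features)
    # Gradient of a linear Q with respect to its weights is the feature vector.
    new_weights = [w + lr * td_error * f for w, f in zip(weights, features)]
    return new_weights, td_error


weights, error = td_update(
    weights=[0.0, 0.0],
    features=[1.0, 0.0],
    reward=1.0,
    next_features_by_action=[[0.0, 1.0], [1.0, 1.0]],
)
```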
[0033] Optionally, based on the immediate reward actually obtained after the plunger performs the action, the Q value of the corresponding state-action pair is updated using the Bellman equation, including:
[0034] By utilizing the immediate reward actually obtained after the plunger performs the action, combined with future reward prediction and environmental uncertainty estimation, the Q value of the corresponding state-action pair is updated using the improved Bellman equation to obtain the updated Q value. The improved Bellman equation includes the introduction of a mixing factor, uncertainty estimation, and comprehensive consideration of immediate reward, future reward prediction, and environmental uncertainty.
[0035] A double Q-learning mechanism and multi-step rewards are introduced to further optimize the Q-value update process.
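The application does not give the form of the improved Bellman equation, so the following is one hypothetical reading rather than the claimed formula. It assumes that the mixing factor `beta` weights the multi-step immediate rewards against the bootstrapped future value, and that an uncertainty estimate `sigma` is subtracted with weight `kappa`. All names and values are illustrative.

```python
def n_step_return(rewards, gamma):
    """Discounted sum of the next n observed rewards."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def improved_target(rewards, gamma, bootstrap_q, beta, kappa, sigma):
    """Hypothetical mixed target: weighted immediate and future parts minus
    an uncertainty penalty. Not the application's actual equation."""
    n = len(rewards)
    immediate = n_step_return(rewards, gamma)
    future = (gamma ** n) * bootstrap_q
    return beta * immediate + (1.0 - beta) * future - kappa * sigma


three_step = n_step_return([1.0, 1.0, 1.0], gamma=0.5)
target = improved_target([1.0, 1.0, 1.0], gamma=0.5, bootstrap_q=4.0,
                         beta=0.5, kappa=0.1, sigma=0.5)
```

In a double Q-learning setting, `bootstrap_q` would be the target network's value for the action chosen by the online network.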
[0036] In a second aspect, embodiments of the present invention provide a plunger autonomous hovering control system based on an expansion mechanism in a natural gas well, comprising:
[0037] The monitoring module is used to monitor the pressure distribution changes on the well wall in real time using a multi-point pressure sensor array set on the plunger, obtain monitoring data, and transmit the monitoring data to the edge computing node. Based on the adaptive filtering algorithm and machine learning model, the monitoring data is noise-removed and the optimal expansion degree is predicted to obtain the expansion diameter adjustment scheme.
[0038] The adjustment module is used to adjust the expansion mechanism of the plunger according to the expansion diameter adjustment scheme. At the same time, it combines the hydrodynamic parameters of the plunger's location and the high-resolution environmental information provided by the downhole distributed fiber optic sensor network to construct a three-dimensional flow field simulation environment. The particle swarm optimization algorithm and Kalman filter are used to model and predict the current motion state of the plunger in the three-dimensional flow field simulation environment, and generate the plunger position prediction result and the resistance situation of the plunger.
[0039] The determination module is used to determine the optimal hovering strategy based on the plunger position prediction results and resistance conditions, combined with the complex distance and path relationships between multiple pre-set hovering target points in the well. It adopts a deep Q-network intelligent decision-making mechanism under the reinforcement learning framework, in conjunction with the risk assessment module, so that the strategy minimizes energy consumption and time cost while ensuring safety and reliability. The hovering strategy specifically defines the sequence of actions to be taken when approaching each hovering point.
[0040] The control module is used to activate a fine adjustment mode based on feedback control theory when the plunger approaches the selected hovering point according to the above hovering strategy. Using the piezoelectric material actuator integrated in the expansion mechanism, the expansion mechanism is deformed with micron-level precision according to the action sequence specified in the hovering strategy, thereby adjusting the pressure difference across the plunger. A sliding mode variable structure control algorithm is applied to stabilize the plunger in hovering.
[0041] The deployment module is used to deploy an online monitoring system driven by Bayesian inference after the plunger successfully hovers. Based on the continuous analysis of the changing trends of surrounding environmental parameters, if factors that may cause the plunger to deviate from the hovering position are detected, the preset emergency response plan is immediately activated to recalculate the optimal hovering position or perform an emergency avoidance maneuver to maintain the stability of the plunger at the target hovering point.
[0042] In a third aspect, embodiments of the present invention provide a computing device, including a processing component and a storage component; the storage component stores one or more computer instructions; the one or more computer instructions are invoked and executed by the processing component to implement the plunger autonomous hovering control method based on an expansion mechanism in a natural gas well as described in the first aspect above.
[0043] In a fourth aspect, embodiments of the present invention provide a computer storage medium storing a computer program which, when executed by a computer, implements the plunger autonomous hovering control method based on an expansion mechanism in a natural gas well as described in the first aspect.
[0044] This invention utilizes a multi-point pressure sensor array mounted on the plunger to monitor real-time pressure distribution changes on the wellbore. Noise is removed and the optimal expansion degree is predicted using an adaptive filtering algorithm and machine learning model in the edge computing node, generating an expansion diameter adjustment scheme. The plunger's expansion mechanism is adjusted according to this scheme, and a three-dimensional flow field simulation environment is constructed by combining fluid dynamics parameters and high-resolution environmental information. Particle swarm optimization and Kalman filters are used for modeling and prediction to determine the plunger's position and resistance. Furthermore, a deep Q-network intelligent decision-making mechanism under a reinforcement learning framework, combined with a risk assessment module, is employed to formulate an optimal hovering strategy that minimizes energy consumption and ensures safety and reliability, defining the action sequence when approaching each hovering point. When the plunger approaches a selected hovering point, a fine-tuning mode based on feedback control theory is activated, using piezoelectric actuators to achieve micron-level precise deformation control, and a sliding mode variable structure control algorithm is applied to ensure stable hovering. After successful hovering, a Bayesian inference-driven online monitoring system is deployed to continuously analyze environmental parameter trends and activate emergency response plans to maintain stability.
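To make the fine-adjustment stage more concrete, here is a minimal sketch of sliding mode (variable structure) control holding a position against a bounded disturbance. Everything in it is an assumption made for illustration: the plunger is reduced to a unit-mass double integrator, the disturbance is a sine wave, units are arbitrary, and a saturation boundary layer replaces the pure sign function to limit chattering. The application gives no plant model or gains.

```python
import math


def saturate(value):
    """Clamp to [-1, 1]; a smooth stand-in for sign() inside the boundary layer."""
    return max(-1.0, min(1.0, value))


def simulate_hover(target=0.0, start=1.0, lam=5.0, gain=2.0, layer=0.05,
                   dt=0.001, steps=10000):
    """Simulate x'' = u + d with sliding surface s = v + lam * e."""
    position, velocity = start, 0.0
    for k in range(steps):
        error = position - target
        surface = velocity + lam * error
        # Equivalent control cancels lam * v; switching term drives s to zero.
        control = -lam * velocity - gain * saturate(surface / layer)
        disturbance = 0.5 * math.sin(2.0 * k * dt)  # bounded, |d| < gain
        velocity += (control + disturbance) * dt
        position += velocity * dt
    return position - target


final_error = simulate_hover()
```

The switching gain must exceed the disturbance bound for the surface to be reached; the boundary layer width then sets the residual error.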
[0045] This invention achieves micron-level precise control of the plunger's position and state through a multi-point pressure sensor array and advanced algorithms. It introduces a rapid feedback control system, enabling the plunger to quickly adapt to changes in the downhole environment, reducing reaction time. Utilizing a deep Q-network intelligent decision-making mechanism within a reinforcement learning framework, it minimizes energy consumption and time costs, improving economic efficiency. Combined with a risk assessment module and a Bayesian inference-driven online monitoring system, it ensures the safety and reliability of plunger operation, reduces potential risks, and possesses the ability to handle complex nonlinear problems. It can flexibly adjust operating strategies in changing downhole environments to maintain stable plunger hovering.
[0046] Furthermore, this embodiment of the invention describes how to update the Q-value and adjust the parameters of the intelligent decision-making model under optimization based on the actual reward obtained, in order to generate an optimized intelligent decision-making model. The specific steps include: First, updating the Q-value of the corresponding state-action pair using the Bellman equation based on the immediate reward obtained after the plunger performs the action. This step ensures that the model can continuously learn the optimal action selection strategy and considers the balance between short-term and long-term gains. Second, based on the updated Q-value, adjusting the weight parameters of the neural network through the backpropagation algorithm, obtaining a parameter-adjusted intelligent decision-making model that minimizes the error between the predicted Q-value and the actual reward. As training progresses, the intelligent decision-making model gradually learns to select action sequences that maximize cumulative rewards in the long run, i.e., the optimal action sequence to be taken when approaching each hovering point. Finally, when the model converges, the final optimized intelligent decision-making model is obtained, which can select the optimal action in a given state, ensuring that the operation is both efficient and safe.
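For reference, the standard deep Q-network form of the two steps described here can be written as follows. This is the textbook formulation, given as a sketch; the symbols ($\gamma$ for the discount factor, $\alpha$ for the learning rate, $\theta$ and $\theta^{-}$ for the online and target network weights) are notation introduced here and do not appear in the application.

```latex
% Bellman target built from the immediate reward and the best next-state value
y_t = r_t + \gamma \max_{a'} Q\left(s_{t+1}, a'; \theta^{-}\right)

% Squared error between the target and the predicted Q-value
L(\theta) = \mathbb{E}\left[\left(y_t - Q(s_t, a_t; \theta)\right)^{2}\right]

% Weight adjustment by gradient descent (computed via backpropagation)
\theta \leftarrow \theta - \alpha \, \nabla_{\theta} L(\theta)
```

Note that the loss compares the predicted Q-value with the bootstrapped target $y_t$, which contains the immediate reward plus the discounted estimate of future value.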
[0047] By updating the Q-value using the Bellman equation and adjusting the weight parameters using the backpropagation algorithm, the intelligent decision-making model can quickly and accurately learn the optimal action selection strategy, shortening the training cycle. The introduction of a weighted average estimate of immediate and future rewards allows the model to consider not only short-term gains but also long-term benefits, enhancing its adaptability to complex and changing environments. Utilizing the improved Q-value update mechanism, the intelligent decision-making model can more accurately evaluate the value of different actions, thereby determining the optimal action sequence and improving the quality and reliability of decision-making. The model learns to choose the safest path, reducing potential risks and ensuring the safety and stability of the plunger operation. When selecting actions, the intelligent decision-making model fully considers the goal of minimizing energy consumption and time costs, achieving efficient resource utilization and improving the overall system's economic benefits. The model can dynamically adjust parameters based on the latest environmental information to maintain optimal performance, quickly adapting even to unknown or changing environments and maintaining a highly efficient and stable operating state.
[0048] These or other aspects of this application will become more apparent in the following description of the embodiments.
Attached Figure Description
[0049] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0050] Figure 1 is a flowchart of a plunger autonomous hovering control method based on an expansion mechanism in a natural gas well, as provided in this application.
[0051] Figure 2 is a schematic diagram of the structure of a plunger autonomous hovering control system based on an expansion mechanism in a natural gas well, as provided in this application.
[0052] Figure 3 is a schematic diagram of the structure of a computing device provided in this application.
Detailed Implementation
[0053] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings.
[0054] In some of the processes described in the specification, claims, and accompanying drawings of this application, multiple operations appearing in a specific order are included. However, it should be clearly understood that these operations may not be executed in the order they appear herein, or may be executed in parallel. The operation numbers, such as 101, 102, etc., are merely used to distinguish different operations and do not themselves represent any execution order. Furthermore, these processes may include more or fewer operations, and these operations may be executed sequentially or in parallel. It should be noted that the descriptions such as "first," "second," etc., in this document are used to distinguish different messages, devices, modules, etc., and do not represent a chronological order, nor do they limit "first" and "second" to different types.
[0055] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0056] Figure 1 is a flowchart of a plunger autonomous hovering control method based on an expansion mechanism in a natural gas well, provided as an embodiment of the present invention. As shown in Figure 1, the method includes:
[0057] Step S1: Using a multi-point pressure sensor array installed on the plunger, the pressure distribution change of the well wall is monitored in real time to obtain monitoring data. The monitoring data is then transmitted to an edge computing node. Based on an adaptive filtering algorithm and a machine learning model, the monitoring data is noise-removed and the optimal expansion degree is predicted to obtain an expansion diameter adjustment scheme.
[0058] In this step, a multi-point pressure sensor array mounted on the plunger is used to monitor pressure distribution changes in the wellbore in real time. These sensors are distributed at different locations on the plunger, accurately capturing pressure changes in different areas of the wellbore and transmitting the monitoring data to edge computing nodes. These edge computing nodes are located downhole, close to the sensors, to reduce data transmission latency and alleviate the burden on the central server. Based on adaptive filtering algorithms and machine learning models, the monitoring data is noise-removed and the optimal expansion diameter is predicted. Adaptive filtering removes noise interference, improving data quality; machine learning models (such as random forests and support vector machines) analyze the processed data to predict the optimal expansion diameter adjustment scheme.
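As a rough illustration of the adaptive filtering stage, the sketch below applies a normalized least-mean-squares (NLMS) one-step predictor to a noisy pressure reading. NLMS is chosen here only as a familiar example of an adaptive filter; the application does not name the algorithm, and the signal level, noise level, tap count, and step size are all hypothetical.

```python
import random


def nlms_denoise(samples, taps=8, mu=0.05, eps=1e-9):
    """Predict each sample from the previous `taps` samples and adapt the weights.

    The prediction is used as the denoised estimate of the current sample.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent sample first
    estimates = []
    for x in samples:
        y = sum(w * h for w, h in zip(weights, history))
        estimates.append(y)
        error = x - y
        norm = eps + sum(h * h for h in history)
        # Normalized LMS weight update.
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        history = [x] + history[:-1]
    return estimates


rng = random.Random(0)
true_pressure = 10.0  # illustrative constant level, arbitrary units
raw = [true_pressure + rng.gauss(0.0, 0.5) for _ in range(1000)]
filtered = nlms_denoise(raw)

# Compare errors after the filter has had time to adapt.
tail = slice(500, None)
mse_raw = sum((r - true_pressure) ** 2 for r in raw[tail]) / 500
mse_filtered = sum((f - true_pressure) ** 2 for f in filtered[tail]) / 500
```

The filtered series would then be passed to the prediction model (for example a random forest or support vector machine, as mentioned above) to estimate the expansion diameter.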
[0059] The embodiments of the present invention improve the accuracy and reliability of data, providing a solid foundation for subsequent decision-making. The application of adaptive filtering and machine learning models enhances the intelligence level of the system and improves prediction accuracy.
[0060] Step S2: Adjust the expansion mechanism of the plunger according to the expansion diameter adjustment scheme. At the same time, combine the hydrodynamic parameters of the plunger's location and the high-resolution environmental information provided by the downhole distributed fiber optic sensor network to construct a three-dimensional flow field simulation environment. Use the particle swarm optimization algorithm and Kalman filter to model and predict the current motion state of the plunger in the three-dimensional flow field simulation environment, and generate the plunger position prediction result and the resistance situation of the plunger.
[0061] In this step, the plunger's expansion mechanism is adjusted according to the expansion diameter adjustment scheme. The plunger's diameter is adjusted via a hydraulic or electric drive system to match the predicted optimal expansion degree. A three-dimensional flow field simulation environment is constructed by combining the hydrodynamic parameters of the plunger's location with high-resolution environmental information provided by a downhole distributed fiber optic sensor network. This environment comprehensively considers parameters such as temperature, pressure, and flow velocity, providing a comprehensive downhole dynamic view. The current motion state of the plunger in this three-dimensional flow field simulation environment is modeled and predicted using a particle swarm optimization algorithm and a Kalman filter, generating predicted plunger position results and the resistance experienced by the plunger.
[0062] Beneficial effects: Accurate simulation and prediction of plunger behavior in complex downhole environments improves operational safety and efficiency. The introduction of high-resolution environmental information makes the simulation more realistic and reliable, enhancing prediction accuracy.
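As a non-limiting illustration of the Kalman filtering used in step S2, the following minimal sketch estimates a scalar plunger position from noisy position measurements. The embodiment does not specify the state model, so the static (random-walk) model, the function name `kalman_1d`, and the noise variances `q` and `r` are assumptions; a practical implementation would likely track velocity and resistance as well.

```python
def kalman_1d(measurements, q=1e-3, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk state model (assumed)."""
    x, p = x0, p0  # state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates
```

For a constant true position the estimate converges toward the measured value, and the gain settles at a level determined by the ratio of `q` to `r`.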
[0063] Step S3: Based on the plunger position prediction results and resistance conditions, and combined with the complex distance and path relationships between multiple pre-set hovering target points in the well, a deep Q-network intelligent decision-making mechanism under the reinforcement learning framework is adopted in conjunction with a risk assessment module to determine the optimal hovering strategy, which minimizes energy consumption and time cost while ensuring safety and reliability. This hovering strategy specifically defines the sequence of actions required when approaching each hovering point.
[0064] In this step, based on the plunger position prediction results and resistance conditions, and combined with the complex distance and path relationships between multiple pre-set hovering target points in the well, a deep Q-network intelligent decision-making mechanism under the reinforcement learning framework is adopted in conjunction with a risk assessment module to determine the optimal hovering strategy, which minimizes energy consumption and time cost while ensuring safety and reliability. The hovering strategy specifically defines the sequence of actions required when approaching each hovering point, including but not limited to adjusting the diameter of the expansion mechanism and changing the plunger movement speed or direction.
[0065] Beneficial effects: The application of reinforcement learning framework and deep Q-network enables intelligent decision-making, ensuring the efficiency, economy and safety of operations. The introduction of risk assessment module further ensures the safety of operations and reduces potential risks.
[0066] Step S4: When the plunger approaches the selected hovering point according to the above hovering strategy, the fine adjustment mode based on feedback control theory is activated. Using the piezoelectric material actuator integrated in the expansion mechanism, micron-level precise deformation control is applied to regulate the pressure difference across the plunger according to the action sequence specified in the hovering strategy. The sliding mode variable structure control algorithm is applied to stabilize the plunger.
[0067] In this step, when the plunger approaches the selected hovering point according to the aforementioned hovering strategy, a fine-tuning mode based on feedback control theory is activated. This mode utilizes a piezoelectric actuator integrated into the expansion mechanism to apply micron-level precise deformation control that regulates the pressure difference across the plunger according to the action sequence specified in the hovering strategy. A sliding mode variable structure control algorithm is applied to stabilize the plunger's hovering. The sliding mode variable structure control algorithm can respond and adjust rapidly in unstable environments, ensuring stable plunger hovering.
[0068] Beneficial effects: Micron-level precise deformation control greatly improves the accuracy of plunger hovering and reduces deviation. The application of the sliding mode variable structure control algorithm enhances the system's response speed and stability, ensuring the continuity and reliability of operation.
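As a non-limiting illustration of the sliding mode variable structure control in step S4, the following minimal sketch stabilizes a plunger modeled as a unit mass subject to a bounded disturbance. The plant model, the disturbance, the boundary-layer saturation used to limit chattering, and all names and gains (`simulate_smc`, `lam`, `k`, `phi`) are assumptions for illustration; the embodiment does not specify them, and the mapping from control effort to piezoelectric deformation is not modeled.

```python
import math

def sat(x):
    """Saturation function used in place of sign() to limit chattering."""
    return max(-1.0, min(1.0, x))

def simulate_smc(target=1.0, lam=2.0, k=2.0, phi=0.05, dt=0.001, steps=10000):
    """Simulate sliding mode control of a unit mass; returns the final position."""
    pos, vel = 0.0, 0.0
    for i in range(steps):
        t = i * dt
        e = pos - target               # position error
        s = vel + lam * e              # sliding surface s = e_dot + lam * e
        # Equivalent control cancels lam * e_dot; switching term drives s to zero.
        u = -lam * vel - k * sat(s / phi)
        disturbance = 0.5 * math.sin(t)  # assumed bounded disturbance, |d| < k
        acc = u + disturbance
        vel += acc * dt
        pos += vel * dt
    return pos
```

With the switching gain `k` chosen larger than the disturbance bound, the state reaches the sliding surface in finite time and the position error then decays at a rate set by `lam`, which is the robustness property the embodiment relies on.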
[0069] Step S5: After the plunger successfully hovers, deploy an online monitoring system driven by Bayesian inference. Based on the continuous analysis of the changing trends of surrounding environmental parameters by the system, if factors that may cause the plunger to deviate from the hovering position are detected, immediately activate the preset emergency response plan, recalculate the optimal hovering position or perform an emergency avoidance maneuver to maintain the stability of the plunger at the target hovering point.
[0070] In this step, after the plunger successfully hovers, an online monitoring system driven by Bayesian inference is deployed. This system continuously analyzes the changing trends of surrounding environmental parameters, such as fluid characteristics and signs of geological activity. If it detects factors that may cause the plunger to deviate from its hovering position, it immediately activates a preset emergency response plan, recalculates the optimal hovering position, or executes an emergency avoidance maneuver to maintain the stability of the plunger at the target hovering point.
[0071] Beneficial effects: The Bayesian inference-driven online monitoring system provides real-time monitoring and early warning functions, enhances the system's self-protection capabilities, and ensures that the plunger remains stable even under abnormal conditions by instant activation of the emergency response plan, thereby improving the overall system's safety and reliability.
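As a non-limiting illustration of the Bayesian inference-driven monitoring in step S5, the following minimal sketch updates the posterior probability that the plunger is drifting from its hovering position, given a stream of sensor residuals. The two-hypothesis model, the Gaussian likelihoods, and the names and values `drift_posterior`, `prior`, `drift_mean`, and `sigma` are assumptions for illustration only.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Probability density of a normal distribution."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def drift_posterior(residuals, prior=0.01, drift_mean=1.0, sigma=0.5):
    """Sequential Bayes update of P(drift) from residual observations."""
    p = prior
    history = []
    for r in residuals:
        like_drift = gaussian_pdf(r, drift_mean, sigma)   # residual under drift
        like_stable = gaussian_pdf(r, 0.0, sigma)         # residual when stable
        # Bayes' rule: posterior is proportional to likelihood times prior.
        p = like_drift * p / (like_drift * p + like_stable * (1.0 - p))
        history.append(p)
    return history
```

In such a scheme, the emergency response plan could be triggered once the posterior exceeds a chosen threshold; the threshold value itself is a design parameter not given in the embodiment.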
[0072] In summary, this invention provides a highly efficient method for autonomous plunger hovering control based on an expansion mechanism in natural gas wells. This method not only improves the accuracy and response speed of plunger operation but also enhances the system's flexibility and adaptability, thereby significantly improving the safety and efficiency of natural gas extraction.
[0073] In another embodiment of the present invention, based on the plunger position prediction results and resistance conditions, and combined with the complex distance and path relationships between multiple pre-set hovering target points in the well, a deep Q-network intelligent decision-making mechanism under a reinforcement learning framework, combined with a risk assessment module, is used to determine the optimal hovering strategy, which minimizes energy consumption and time cost while ensuring safety and reliability. This hovering strategy specifically defines the sequence of actions required when approaching each hovering point, including:
[0074] Based on the plunger position prediction results and the resistance encountered by the plunger, combined with the complex distance and path relationship between multiple pre-set hovering target points in the well, high-resolution environmental information provided by the downhole distributed fiber optic sensor network is integrated to form a comprehensive decision dataset.
[0075] In this step, based on the plunger position prediction results and the resistance encountered by the plunger, combined with the complex distances and path relationships between multiple pre-set hovering target points in the well, high-resolution environmental information provided by the downhole distributed fiber optic sensor network is integrated to form a comprehensive decision dataset. This dataset contains various situations that the plunger may encounter under different conditions, such as pressure distribution at different locations, hydrodynamic parameters, and geological structural characteristics, providing a rich information foundation for subsequent intelligent decision-making.
[0076] Using the comprehensive decision dataset, the deep Q-network intelligent decision-making mechanism is trained with a reward function that rewards minimizing energy consumption and time cost, and a risk assessment module is introduced to quantify the safety and reliability of different paths.
[0077] In this step, a Deep Q-Network (DQN) intelligent decision-making mechanism is trained using a comprehensive decision dataset. Through extensive simulation experiments, the model learns how to select the optimal action based on the current state, minimizing energy consumption and time cost as part of the reward function. This ensures the model prioritizes efficiency and economy in the decision-making process. A risk assessment module is introduced to quantify the safety and reliability of different paths. This module evaluates the risk level of each potential path and feeds it back to the reward function, allowing the model to consider safety when selecting actions.
[0078] Through continuous simulation and optimization, the intelligent decision-making model is trained to obtain an optimized intelligent decision-making model that selects the optimal action in a given state. Based on this optimized intelligent decision-making model, each state is analyzed and processed to generate specific operation instructions, which constitute the action sequence of the corresponding state.
[0079] In this step, the intelligent decision-making model is trained and optimized through continuous simulation, gradually improving its performance until it can select the optimal action in a given state; the result is the optimized intelligent decision-making model. This fully trained model can make decisions quickly and accurately. Based on this optimized intelligent decision-making model, each state is analyzed and processed to generate specific operational instructions. These instructions constitute the action sequence for the corresponding state, guiding the plunger to complete the transition from one hovering point to the next.
[0080] Each action sequence is designed to guide the plunger to the next hovering point in the most efficient and safe way. The operating instructions include adjusting the diameter of the expansion mechanism and changing the plunger's speed and direction.
[0081] In this step, each action sequence is designed to guide the plunger to the next hovering point in the most efficient and safe manner. Specific operational instructions include, but are not limited to, adjusting the diameter of the expansion mechanism and changing the plunger's speed and direction. These operational instructions are dynamically generated by the optimized intelligent decision-making model, responding in real time to changes in the downhole environment so that the plunger remains in the optimal operating condition.
[0082] In summary, this invention provides a method for determining the optimal hovering strategy based on a deep Q-network intelligent decision-making mechanism within a reinforcement learning framework, combined with a risk assessment module. This method not only improves the safety and efficiency of plunger operation but also ensures efficient energy utilization and minimizes time costs. Furthermore, through continuous simulation and optimization, the intelligent decision-making model gradually converges to the optimal solution, providing intelligent autonomous hovering guidance for the plunger and significantly enhancing the safety and economic benefits of natural gas extraction.
[0083] In another embodiment of the present invention, a comprehensive decision dataset is used to train the deep Q-network intelligent decision-making mechanism, minimizing energy consumption and time cost as part of the reward function, and a risk assessment module is introduced to quantify the safety and reliability of different paths, including:
[0084] Based on the plunger position prediction results and resistance conditions, combined with the complex distance and path relationships between multiple pre-set hovering target points in the well, and the high-resolution environmental information provided by the downhole distributed fiber optic sensor network, a comprehensive decision dataset is formed. A reward function is constructed based on this comprehensive decision dataset, in which energy consumption and time cost minimization are taken as the core optimization objectives. At the same time, a risk assessment module is introduced to quantify the safety and reliability of different paths.
[0085] In this step, based on the plunger position prediction results and resistance conditions, combined with the complex distances and path relationships between multiple pre-set hovering target points within the well, and high-resolution environmental information provided by the downhole distributed fiber optic sensor network, a comprehensive decision dataset is formed. A reward function is then constructed based on this comprehensive decision dataset. The reward function uses minimizing energy consumption and time cost as the core optimization objective, while a risk assessment module is introduced to quantify the safety and reliability of different paths. This approach ensures that the model considers not only efficiency and economy but also operational safety when selecting actions.
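As a non-limiting illustration of such a reward function, the following minimal sketch combines energy consumption, time cost, and a risk score into a single scalar reward. The linear form, the function name `hover_reward`, and the weights `w_energy`, `w_time`, and `w_risk` are assumptions for illustration; the embodiment does not give the exact expression.

```python
def hover_reward(energy_used, time_elapsed, risk_score,
                 w_energy=1.0, w_time=0.5, w_risk=5.0):
    """Negative weighted cost: lower energy, time, and risk give a higher reward.

    risk_score is assumed to be the risk assessment module's output in [0, 1].
    """
    cost = w_energy * energy_used + w_time * time_elapsed + w_risk * risk_score
    return -cost
```

For example, under these assumed weights a slower but low-risk path, `hover_reward(2.0, 4.0, 0.1)`, scores higher than a faster high-risk path, `hover_reward(1.5, 3.0, 0.8)`, which is how the risk penalty steers the model toward safer paths.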
[0086] Using a comprehensive decision dataset, the deep Q-network model is initialized. The input layer is defined to receive the plunger state, the output layer generates the predicted action value, and the hidden layer structure is set to capture complex nonlinear relationships. The parameter weights are initialized to provide a starting point for the learning process.
[0087] In this step, a comprehensive decision dataset is used to initialize the deep Q-network model. The input layer is defined to receive the plunger state (such as position, velocity, resistance, etc.), the output layer generates predicted action values, and the hidden layer structure is set to capture complex nonlinear relationships. The parameter weights are initialized to provide a starting point for the learning process.
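As a non-limiting illustration of this initialization, the following minimal sketch builds a small fully connected Q-network with one hidden layer. The layer sizes, the Glorot uniform initialization, the ReLU activation, and the names `init_q_network` and `q_values` are assumptions for illustration; the embodiment does not specify the architecture.

```python
import random

def init_q_network(state_dim=3, hidden_dim=8, n_actions=4, seed=0):
    """Create weights for a two-layer network: state -> hidden -> action values."""
    rng = random.Random(seed)

    def layer(n_in, n_out):
        limit = (6.0 / (n_in + n_out)) ** 0.5  # Glorot uniform bound
        return {
            "w": [[rng.uniform(-limit, limit) for _ in range(n_in)]
                  for _ in range(n_out)],
            "b": [0.0] * n_out,
        }

    return [layer(state_dim, hidden_dim), layer(hidden_dim, n_actions)]

def q_values(network, state):
    """Forward pass: ReLU hidden layer, linear output (one value per action)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, state)) + b)
              for row, b in zip(network[0]["w"], network[0]["b"])]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(network[1]["w"], network[1]["b"])]
```

Here the state vector is assumed to be (position, velocity, resistance), and each output corresponds to one candidate operation such as expanding, contracting, accelerating, or decelerating.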
[0088] Based on the initialized deep Q-network model, the model is trained in a simulated environment using a comprehensive decision dataset. In each iteration, an action is selected based on the current state, and the immediate reward and the next state brought about by the action are observed. The Q-value table is updated or the experience replay buffer is used to store past experiences, thus obtaining the pre-trained intelligent decision model.
[0089] In this step, the initialized deep Q-network model is trained in a simulated environment using a comprehensive decision dataset. In each iteration, an action is selected based on the current state, and the immediate reward and the next state resulting from that action are observed. The Q-value table is then updated, or past experiences are stored in an experience replay buffer, resulting in a preliminarily trained intelligent decision model.
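As a non-limiting illustration of the training iteration described above, the following minimal sketch shows an experience replay buffer and an epsilon-greedy action selector. The class and function names (`ReplayBuffer`, `epsilon_greedy`), the transition layout, and the use of epsilon-greedy exploration are assumptions for illustration; the embodiment states only that an action is selected and that past experiences are stored.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of past transitions; oldest entries are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks temporal correlation.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In each iteration, the selected action would be executed in the simulated environment, the resulting transition pushed into the buffer, and a sampled mini-batch used for the Q-value update.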
[0090] The intelligent decision-making model after initial training is optimized by applying reinforcement learning algorithms. The Q-value is updated based on the actual rewards obtained, and the model parameters are adjusted to determine the action sequence that maximizes the cumulative reward. This generates an optimized intelligent decision-making model. During training, if a certain path is considered to have a high risk, the expected reward of the action corresponding to that path is reduced to encourage the model to choose a safer path and optimize the intelligent decision-making model. When the model converges, the optimized intelligent decision-making model is obtained. This model can select the optimal action in a given state, that is, the action sequence required to approach each hovering point.
[0091] In this step, reinforcement learning algorithms are applied to optimize the initially trained intelligent decision-making model. The Q-value is updated based on the actual rewards obtained, and the model parameters are adjusted to determine the action sequence that maximizes the cumulative reward, generating an optimized intelligent decision-making model. During training, if a path is considered to have a high risk, the expected reward for the action corresponding to that path is reduced to encourage the model to choose safer paths. This feedback mechanism is directly integrated into the reward function, ensuring that the final intelligent decision-making model is not only efficient but also safe and reliable. When the model converges, the optimized intelligent decision-making model is obtained, which can select the optimal action in a given state, i.e., the action sequence required to approach each hovering point.
[0092] Based on the optimized intelligent decision-making model, each preset predicted state is analyzed and processed to generate specific operation instructions. These operation instructions constitute the action sequence for the corresponding state, which guides the plunger to reach the next hovering point in the most efficient and safe manner.
[0093] In this step, based on the optimized intelligent decision-making model, each preset predicted state is analyzed and processed to generate specific operation instructions. These operation instructions constitute the action sequence for the corresponding state, guiding the plunger to reach the next hovering point in the most efficient and safe manner. The operation instructions include, but are not limited to, adjusting the diameter of the expansion mechanism and changing the plunger's speed and direction.
[0094] This invention implements a method for training a deep Q-network intelligent decision-making mechanism using a comprehensive decision dataset. This method not only improves the safety and efficiency of plunger operation but also ensures efficient energy utilization and minimizes time costs. Simultaneously, through continuous simulation and optimization, the intelligent decision-making model gradually converges to the optimal solution, providing intelligent autonomous hovering guidance for the plunger, significantly improving the safety and economic benefits of natural gas extraction. Furthermore, the introduced risk assessment module further enhances the system's safety and reliability, ensuring operational safety.
[0095] In another embodiment of the present invention, a reinforcement learning algorithm is applied to optimize the initially trained intelligent decision-making model. The Q-value is updated based on the actual rewards obtained, and the model parameters are adjusted to determine the action sequence that maximizes the cumulative reward, generating an optimized intelligent decision-making model. During training, if a path is considered to have a high risk, the expected reward for the action corresponding to that path is reduced to encourage the model to choose a safer path, thus optimizing the intelligent decision-making model. When the model converges, the optimized intelligent decision-making model is obtained. This model can select the optimal action in a given state, i.e., the action sequence required to approach each hovering point, including:
[0096] Reinforcement learning algorithms are applied to optimize the pre-trained intelligent decision-making model, that is, the model obtained by training the initialized deep Q-network model on the comprehensive decision dataset in a simulated environment, yielding an intelligent decision-making model under optimization. The optimization process uses a target network to stabilize the learning process and double DQN to reduce the overestimation problem.
[0097] In this step, the pre-trained intelligent decision-making model is optimized using reinforcement learning algorithms. The pre-trained model is the one obtained by training the initialized Deep Q-Network (DQN) model in a simulated environment using the comprehensive decision dataset. A target network is employed to stabilize the learning process, and double DQN is used to reduce overestimation. These techniques help improve training efficiency and model stability, avoiding learning bias caused by overestimation.
[0098] Beneficial effects: The application of the target network enhances the stability of the learning process and reduces learning fluctuations. The double DQN technique effectively reduces overestimation problems and improves the accuracy of model predictions.
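As a non-limiting illustration of the target network and double DQN, the following minimal sketch computes the training target for one transition and shows a periodic hard synchronization of the target network. The function names `double_dqn_target` and `sync_target`, and the representation of network parameters as plain Python objects, are assumptions for illustration.

```python
import copy

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network selects, the target network evaluates."""
    if done:
        return reward  # no future value after a terminal transition
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]

def sync_target(online_params):
    """Hard update: return an independent copy of the online parameters."""
    return copy.deepcopy(online_params)
```

Because the action is chosen by one network and scored by the other, an action whose value is overestimated by only one of them does not inflate the target. For instance, with online values `[0.2, 0.8]` and target values `[0.5, 0.3]`, the target uses 0.3 instead of the larger 0.5 that a single-network maximum would use.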
[0099] Based on the actual rewards obtained, the Q-value of the optimized intelligent decision-making model is updated and the parameters are adjusted to generate an optimized intelligent decision-making model. Based on the actual immediate rewards obtained after the plunger performs the action, the Q-value of the corresponding state-action pair is updated using the Bellman equation. At the same time, the weight parameters of the neural network are adjusted through the backpropagation algorithm, so that the intelligent decision-making model can gradually learn to select action sequences that can maximize the cumulative rewards in the long run, thereby generating an optimized intelligent decision-making model.
[0100] In this step, the intelligent decision-making model undergoes Q-value updates and parameter adjustments based on the actual rewards received. After each action performed by the plunger, the Q-value of the corresponding state-action pair is updated using the Bellman equation based on the immediate reward, and the weight parameters of the neural network are adjusted through backpropagation. This step aims to minimize the error between the predicted Q-value and the Bellman target computed from the actual reward, allowing the model to gradually converge to the optimal solution.
[0101] Beneficial effects: The application of the Bellman equation ensures the accuracy and rationality of Q-value updates, and the backpropagation algorithm enables the model to continuously optimize its parameters, thereby improving decision quality.
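As a non-limiting illustration of the Q-value update and parameter adjustment, the following minimal sketch performs one temporal-difference step on a linear Q-function, where the gradient step on the squared error plays the role that backpropagation plays in a deep network. The linear approximator, the function name `td_step`, and the hyperparameter values are assumptions for illustration.

```python
def td_step(weights, features, action, reward, next_features,
            gamma=0.9, lr=0.1, done=False):
    """One Bellman update on a linear Q-function; returns the TD error.

    weights[a] is the weight vector for action a (assumed linear model).
    """
    def q(feats, a):
        return sum(w * f for w, f in zip(weights[a], feats))

    # Bellman target: immediate reward plus discounted best next value.
    if done:
        target = reward
    else:
        target = reward + gamma * max(q(next_features, a)
                                      for a in range(len(weights)))
    td_error = target - q(features, action)
    # Gradient step on 0.5 * td_error**2 with the target held fixed.
    weights[action] = [w + lr * td_error * f
                       for w, f in zip(weights[action], features)]
    return td_error
```

Repeating the step on the same transition shrinks the error between the predicted Q-value and the Bellman target, which is the convergence behavior described above.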
[0102] Based on the optimized intelligent decision-making model, the action sequence that maximizes the cumulative reward is determined. During the training process, the model learns the target action sequence, that is, the sequence that yields the maximum cumulative reward in each state and is therefore the optimal action sequence to be taken when approaching each hovering point.
[0103] In this step, an optimized intelligent decision-making model is used to determine the action sequence that maximizes cumulative reward. As training progresses, the model gradually learns which action sequences can bring the greatest cumulative reward in different states, i.e., the optimal action sequence to take when approaching each hovering point.
[0104] Beneficial effects: The goal of maximizing cumulative rewards enables the model to make optimal decisions in the long run, improving operational efficiency and economy.
[0105] The risk assessment module is invoked in real time to quantitatively evaluate the safety and reliability of the selected path. If a path is deemed to have a high risk, the expected reward for the action corresponding to that path is reduced, prompting the model to favor choosing a safer path. This feedback mechanism is directly integrated into the reward function, ensuring that the final intelligent decision-making model is not only efficient but also safe and reliable.
[0106] In this step, the risk assessment module is invoked in real time to quantitatively evaluate the safety and reliability of the selected path. If a path is deemed to have a high risk, the expected reward for the action corresponding to that path is reduced, prompting the model to favor choosing a safer path. This feedback mechanism is directly integrated into the reward function, ensuring that the final intelligent decision-making model is not only efficient but also safe and reliable.
[0107] Beneficial effects: The application of the risk assessment module enhances the system's security and reduces potential risks. The introduction of the feedback mechanism allows the model to prioritize security while ensuring efficiency.
[0108] When the training process reaches model convergence, the final optimized intelligent decision-making model is obtained.
[0109] Based on the final optimized intelligent decision-making model, each preset predicted state is analyzed and processed to generate specific operation instructions. These operation instructions constitute the action sequence for the corresponding state, which guides the plunger to reach the next hovering point in the most efficient and safe manner.
[0110] In this step, when the model converges during training, the final optimized intelligent decision-making model is obtained. At this point, the model has learned to select the optimal action in various states, achieving optimal performance. Based on the final optimized intelligent decision-making model, each preset predicted state is analyzed and processed to generate specific operation instructions. These operation instructions constitute the action sequence for the corresponding state, used to guide the plunger to reach the next hovering point in the most efficient and safe manner.
[0111] Beneficial effects: Model convergence signifies the successful completion of the training process, ensuring that the model has efficient decision-making capabilities and stable performance. The generation of specific operation instructions provides clear operating guidelines for the plunger, improving the accuracy and reliability of execution.
[0112] This invention implements a method for optimizing an intelligent decision-making model after initial training using reinforcement learning algorithms. This method not only improves the safety and efficiency of plunger operation but also ensures efficient energy utilization and minimizes time costs. Simultaneously, through continuous simulation and optimization, the intelligent decision-making model gradually converges to the optimal solution, providing intelligent autonomous hovering guidance for the plunger and significantly improving the safety and economic benefits of natural gas extraction. Furthermore, the introduced risk assessment module further enhances the system's safety and reliability, ensuring operational safety.
[0113] In another embodiment of the present invention, the intelligent decision-making model under optimization is updated with Q-values and its parameters are adjusted based on the actual rewards obtained, generating an optimized intelligent decision-making model. The Q-values of the corresponding state-action pairs are updated using the Bellman equation based on the immediate rewards obtained after the plunger performs its action. Simultaneously, the weight parameters of the neural network are adjusted using a backpropagation algorithm, enabling the intelligent decision-making model to gradually learn to select action sequences that maximize cumulative rewards in the long run, thereby generating an optimized intelligent decision-making model, including:
[0114] Based on the immediate reward actually obtained after the plunger performs the action, update the Q value of the corresponding state-action pair using the Bellman equation;
[0115] Based on the updated Q value, the weight parameters of the neural network are adjusted through the backpropagation algorithm to obtain the parameter-adjusted intelligent decision model;
[0116] In this step, the Q-value of the corresponding state-action pair is updated using the Bellman equation based on the immediate reward actually obtained by the plunger after performing the action. Whenever the plunger takes an action in a given state and observes the immediate reward brought about by that action, the Q-value of the current state-action pair is calculated and updated using the Bellman equation.
[0117] Beneficial effects: The application of the Bellman equation ensures the accuracy and rationality of Q-value updates, enabling the model to continuously learn the optimal action selection strategy. The immediate reward feedback mechanism enhances the model's learning efficiency, allowing it to quickly adapt to environmental changes.
[0118] By updating the Q-value and adjusting the parameters, the intelligent decision-making model determines the maximum cumulative reward of the target action sequence under different states, that is, the optimal action sequence to be taken when approaching each hovering point. Finally, when the training reaches model convergence, the final optimized intelligent decision-making model is obtained.
[0119] In this step, the weight parameters of the neural network are adjusted using the backpropagation algorithm based on the updated Q-value. After each Q-value update, the weight parameters in the Deep Q-Network (DQN) model are adjusted using the backpropagation algorithm to minimize the error between the predicted Q-value and the Bellman target computed from the actual reward. Through Q-value updates and parameter adjustments, the intelligent decision-making model determines the target action sequence. As training progresses, the model gradually learns which action sequences bring the maximum cumulative reward in different states, i.e., the optimal action sequence to take when approaching each hovering point.
[0120] Beneficial effects: The application of the backpropagation algorithm enables the model to continuously optimize its parameters and improve decision quality. The adjustment of weight parameters ensures that the model can gradually converge to the optimal solution, which improves the stability of the learning process. Model convergence marks the successful completion of the training process, ensuring that the model has efficient decision-making ability and stable operation performance. Finally, the optimized intelligent decision-making model provides intelligent autonomous hovering guidance for the plunger, which significantly improves the safety and economic benefits of natural gas extraction.
[0121] Furthermore, based on the immediate reward actually obtained after the plunger performs the action, the Q value of the corresponding state-action pair is updated using the Bellman equation, including:
[0122] By utilizing the immediate reward actually obtained after the plunger performs the action, combined with future reward prediction and environmental uncertainty estimation, the Q value of the corresponding state-action pair is updated using the improved Bellman equation to obtain the updated Q value. The improved Bellman equation includes the introduction of a mixing factor, uncertainty estimation, and comprehensive consideration of immediate reward, future reward prediction, and environmental uncertainty.
[0123] A double Q-learning mechanism and multi-step (n-step) returns are introduced to further optimize the Q-value update process.
[0124] In this embodiment of the invention, an improved Bellman equation is used to update the Q-value of the corresponding state-action pair based on the immediate reward actually obtained after the plunger performs the action. Specifically, this method updates the Q-value of the corresponding state-action pair using the improved Bellman equation, taking into account the immediate reward obtained after the plunger performs the action, future reward prediction, and environmental uncertainty estimation. A double Q-learning mechanism and n-step returns are introduced to further optimize the Q-value update process. This method allows the model to weigh short-term against long-term returns more comprehensively and to adapt to dynamically changing downhole environments.
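As a non-limiting illustration of the n-step return mentioned above, the following minimal sketch accumulates n observed rewards and then bootstraps from an estimated value of the state reached after the n-th step. The function name `n_step_return` and the parameter names are assumptions for illustration.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.9):
    """Discounted sum of n observed rewards plus a discounted bootstrap value.

    rewards: the n immediate rewards, in the order they were observed.
    bootstrap_value: estimated value of the state reached after the n-th step.
    """
    g = bootstrap_value
    # Work backwards so each reward is discounted by the right power of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, `n_step_return([1.0, 1.0, 1.0], 10.0, gamma=0.5)` evaluates to 1 + 0.5 + 0.25 + 0.125 × 10 = 3.0. Using several real rewards before bootstrapping reduces reliance on a possibly inaccurate value estimate.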
[0125] Here, the improved Bellman equation uses the actual immediate reward r obtained after the plunger performs the action, combined with future reward prediction and environmental uncertainty estimation, to update the Q-value of the corresponding state-action pair (s,a) and obtain the updated Q-value.
[0126] The specific formula is as follows:
[0127] $Q(s,a) \leftarrow Q(s,a) + \alpha\left[ r + \gamma\,\mathbb{E}_{a' \sim \pi}\left[Q(s',a')\right] - Q(s,a) \right]$
[0128] where,
[0129] r is the immediate reward;
[0130] α is the learning rate, which controls the degree of influence of new information;
[0131] γ is a discount factor used to reduce the importance of future rewards and make recent rewards more important;
[0132] s′ is the new state reached after taking action a;
[0133] $\mathbb{E}_{a' \sim \pi}\left[Q(s',a')\right]$ is the expected future reward obtained by choosing action a′ according to policy π starting from the new state;
[0134] A Bayesian inference framework is introduced to assess environmental uncertainty and adjust the weights of future rewards, making the model more adaptable to the dynamically changing downhole environment.
[0135] Beneficial effects: The improved Bellman equation not only considers immediate rewards but also introduces a weighted average estimate of future rewards and assesses environmental uncertainty through a Bayesian inference framework, which improves the accuracy and adaptability of Q-value updates. This approach enables the model to more comprehensively consider the balance between short-term and long-term returns and enhances its decision-making ability in complex and volatile environments.
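The expected-value update of paragraph [0127] can be illustrated with a minimal tabular sketch. It is not the patent's implementation: the list-of-lists Q table, the epsilon-greedy policy used for π, and the function names `epsilon_greedy_probs` and `expected_q_update` are assumptions made for illustration, and the Bayesian uncertainty assessment is left out.

```python
def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy (assumed form of pi)."""
    n = len(q_row)
    greedy = max(range(n), key=lambda i: q_row[i])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def expected_q_update(Q, s, a, r, s_next, policy_probs, alpha, gamma):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * E_pi[Q(s',a')] - Q(s,a))."""
    # Expectation of the next-state Q-values under the policy probabilities.
    expected_next = sum(p * q for p, q in zip(policy_probs, Q[s_next]))
    td_error = r + gamma * expected_next - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error


# Hypothetical 3-state, 2-action table; all numbers are made up.
Q = [[0.0, 0.0], [1.0, 3.0], [0.0, 0.0]]
probs = epsilon_greedy_probs(Q[1], epsilon=0.2)
td_error = expected_q_update(Q, s=0, a=1, r=0.5, s_next=1,
                             policy_probs=probs, alpha=0.5, gamma=0.9)
```

In a deep Q-network the table lookup would be replaced by a network forward pass and the in-place assignment by a gradient step on the squared temporal-difference error.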
[0136] In addition, a double Q-learning mechanism is introduced. When updating the Q-value, two independent Q-networks are used to estimate the Q-value separately, and the estimates are cross-referenced during the update process to reduce overestimation. Specifically, when updating one Q-network, the other Q-network is used to select the optimal action and evaluate its Q-value, thus avoiding overestimation caused by self-reinforcement of a single Q-network.
[0137] Beneficial effects: The dual Q-learning mechanism significantly reduces overestimation and improves the accuracy of Q-value updates. The mutual verification and cross-use of the two Q-networks enhance the stability and reliability of the model.
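A minimal tabular sketch of the double Q-learning idea follows. It uses the common arrangement in which the table being updated selects the greedy next action and the other table evaluates that action; the text above may intend a different split between the two networks. The function name `double_q_update` and the `update_first` flag are assumptions for illustration.

```python
def double_q_update(q_first, q_second, s, a, r, s_next, alpha, gamma, update_first):
    """Update one table; action selection and evaluation use different tables."""
    if update_first:
        updated, other = q_first, q_second
    else:
        updated, other = q_second, q_first
    # The table being updated selects the greedy next action ...
    row = updated[s_next]
    best_next = max(range(len(row)), key=lambda i: row[i])
    # ... and the other table supplies the value of that action.
    target = r + gamma * other[s_next][best_next]
    updated[s][a] += alpha * (target - updated[s][a])
    return target


# Hypothetical 2-state, 2-action tables; all numbers are made up.
q1 = [[0.0, 0.0], [2.0, 5.0]]
q2 = [[0.0, 0.0], [4.0, 1.0]]
target = double_q_update(q1, q2, s=0, a=0, r=1.0, s_next=1,
                         alpha=0.5, gamma=0.9, update_first=True)
```

In this example a single-table maximum would give a target of 1.0 + 0.9 × 5.0 = 5.5, whereas the cross-evaluated target is 1.9, which shows how decoupling selection from evaluation damps an optimistic estimate.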
[0138] Finally, multi-step returns can be applied, considering not only immediate rewards but also cumulative rewards over the next n steps, thus providing more accurate estimates of future rewards and improving learning efficiency and decision-making quality. The specific formula is as follows:
[0139] $G_t = r_{t+1} + \gamma r_{t+2} + \dots + \gamma^{n-1} r_{t+n} + \gamma^{n} \max_{a'} Q(s_{t+n}, a')$
[0140] where,
[0141] $G_t$ is the cumulative reward over n steps starting from time t;
[0142] $r_{t+i}$ is the immediate reward received at the i-th step after time t;
[0143] γ is the discount factor;
[0144] $\max_{a'} Q(s_{t+n}, a')$ is the estimate of the maximum future reward that can be obtained starting from the new state $s_{t+n}$.
[0145] Beneficial effects:
[0146] The application of multi-step rewards provides a longer-term reward perspective, enabling the model to make better decisions in the long run and improving overall operational efficiency. By comprehensively considering the rewards of multiple time steps, the model's learning process is more stable and efficient.
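The n-step return of paragraph [0139] can be computed as in the following sketch. The function name `n_step_return` and the example numbers are assumptions for illustration; `rewards` holds r_{t+1} … r_{t+n} and `bootstrap_q_values` holds the Q-values of all actions in state s_{t+n}.

```python
def n_step_return(rewards, gamma, bootstrap_q_values):
    """G_t = sum_{i=1..n} gamma^(i-1) * r_{t+i} + gamma^n * max_a' Q(s_{t+n}, a')."""
    n = len(rewards)
    # enumerate starts at 0, so reward r_{t+i} is weighted by gamma^(i-1).
    discounted = sum((gamma ** i) * r for i, r in enumerate(rewards))
    return discounted + (gamma ** n) * max(bootstrap_q_values)


# Hypothetical 3-step trajectory: 1.0 + 0.5*0.0 + 0.25*2.0 + 0.125*8.0
g = n_step_return([1.0, 0.0, 2.0], gamma=0.5, bootstrap_q_values=[4.0, 8.0])
```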
[0147] This invention implements a method for updating the Q-value of corresponding state-action pairs using an improved Bellman equation, based on the immediate reward actually obtained after the plunger performs an action, combined with future reward prediction and environmental uncertainty estimation. This method not only improves the accuracy and adaptability of Q-value updates but also further optimizes the Q-value update process by introducing a dual Q-learning mechanism and multi-step rewards. This ensures that the model can more comprehensively consider the balance between short-term and long-term benefits and adapt to dynamically changing downhole environments. Ultimately, these improvements significantly enhance the learning efficiency, stability, and decision-making quality of the intelligent decision-making model, providing intelligent autonomous hovering guidance for the plunger and significantly improving the safety and economic benefits of natural gas extraction.
[0148] Implementation details:
[0149] 1. The specific process of applying the improved Bellman equation
[0150] After each action of the plunger, the Q-value of the current state-action pair (s,a) is calculated and updated using the improved Bellman equation, based on the actual immediate reward r and the next state s′, combined with future reward prediction and environmental uncertainty estimation:
[0151] $Q(s,a) \leftarrow Q(s,a) + \alpha\left[ r + \gamma\,\mathbb{E}_{a' \sim \pi}\left[Q(s',a')\right] - Q(s,a) \right]$
[0152] where α is the learning rate, γ is the discount factor, and $\mathbb{E}_{a' \sim \pi}\left[Q(s',a')\right]$ is the expected future reward obtained by choosing action a′ according to policy π starting from the new state; a Bayesian inference framework is introduced to assess environmental uncertainty.
[0153] 2. The specific process of the dual Q learning mechanism:
[0154] When updating the Q-value, two independent Q-networks are used to estimate the Q-value separately. Each time one Q-network is updated, the other Q-network is used to select the optimal action and evaluate its Q-value, thus avoiding overestimation caused by self-reinforcement of a single Q-network.
[0155] 3. The specific process of multi-step returns:
[0156] Combining n-step returns not only considers immediate rewards but also the cumulative rewards over the next n steps, thus providing a more accurate estimate of future rewards, improving learning efficiency and decision-making quality. The specific formula is as follows:
[0157] $G_t = r_{t+1} + \gamma r_{t+2} + \dots + \gamma^{n-1} r_{t+n} + \gamma^{n} \max_{a'} Q(s_{t+n}, a')$
[0158] Through detailed step descriptions and technical implementation details, embodiments of the present invention demonstrate how to effectively apply the improved Bellman equation and advanced learning mechanisms to optimize the Q-value update process, ensuring that the plunger can operate efficiently and safely in complex and variable downhole environments.
[0159] Furthermore, in complex downhole environments, relying solely on maximum values may lead to overestimation, while introducing expected values can provide more stable long-term predictions. Downhole environments are complex and volatile, with many unknowns. By introducing an uncertainty estimate U(s,a) and assigning it an appropriate weight η, the model can better cope with uncertainty, reduce potential risks, and ensure operational safety. Meanwhile, a single-dimensional reward mechanism may be insufficient to guide the model to make optimal decisions. By comprehensively considering immediate rewards, future reward predictions, and environmental uncertainties, the model can more comprehensively evaluate the value of each action, thereby selecting the optimal action sequence. Based on this, the present invention also provides a specific embodiment as follows:
[0160] An improved Bellman equation for plunger autonomous hovering control based on expansion mechanisms in natural gas wells is proposed to better adapt to the dynamically changing downhole environment and improve decision-making accuracy.
[0161] Original formula:
[0162] $Q(s,a) \leftarrow Q(s,a) + \alpha\left[ r + \gamma\,\mathbb{E}_{a' \sim \pi}\left[Q(s',a')\right] - Q(s,a) \right]$
[0163] Improved formula:
[0164] $Q(s,a) \leftarrow Q(s,a) + \alpha\left[ r + \gamma\left( (1-\beta)\max_{a'} Q(s',a') + \beta\,\mathbb{E}_{a' \sim \pi}\left[Q(s',a')\right] \right) + \eta\,U(s,a) - Q(s,a) \right]$
[0165] where,
[0166] r is the immediate reward;
[0167] α is the learning rate, which controls the degree of influence of new information;
[0168] γ is a discount factor used to reduce the importance of future rewards;
[0169] β is a mixing factor (0≤β≤1) used to balance the contribution between the maximum value and the expected value;
[0170] η is the uncertainty weight applied to the uncertainty estimate U(s,a);
[0171] U(s,a) is an uncertainty estimate of the state-action pair (s,a), which is evaluated using a Bayesian inference framework;
[0172] $\mathbb{E}_{a' \sim \pi}\left[Q(s',a')\right]$ is the expected future reward obtained by choosing action a′ according to policy π starting from the new state.
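As a consistency check on the two formulas, the limiting cases of the mixing factor can be written out. The following is a worked substitution, not part of the original disclosure: with η = 0, setting β = 1 recovers the original expected-value formula, and setting β = 0 gives the familiar maximum-based Q-learning update.

```latex
\begin{aligned}
\beta = 1,\ \eta = 0:&\quad
Q(s,a) \leftarrow Q(s,a) + \alpha\left[ r + \gamma\,\mathbb{E}_{a' \sim \pi}\left[Q(s',a')\right] - Q(s,a) \right] \\
\beta = 0,\ \eta = 0:&\quad
Q(s,a) \leftarrow Q(s,a) + \alpha\left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
\end{aligned}
```

Intermediate values of β interpolate linearly between the two bootstrap targets.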
[0173] In this embodiment of the invention, the purpose of introducing a mixing factor β is to balance the contribution between the maximum value and the expected value, so that the model can find a better balance between exploration and exploitation. The purpose of introducing uncertainty estimation U(s,a) is to consider the uncertainty and risk of the environment, so that the model can more accurately assess the safety and reliability of different paths. The purpose of comprehensively considering immediate rewards, future reward predictions and environmental uncertainties is to improve the comprehensiveness and accuracy of Q-value updates, so that the model can make optimal decisions in the long run.
[0174] The embodiments of this invention improve the accuracy and adaptability of Q-value updates: by introducing a mixing factor β and an uncertainty estimate U(s,a), the model can more comprehensively consider the balance between short-term and long-term benefits, enhancing its decision-making ability in complex and volatile environments; it reduces overestimation problems, as the introduction of the mixing factor β allows the model to find a better balance between the maximum value and the expected value, reducing overestimation problems caused by excessive reliance on the maximum value; and it enhances safety, as the introduction of the uncertainty estimate U(s,a) allows the model to better assess the safety and reliability of different paths, reducing potential risks and ensuring operational safety. Through the above processing, the improved Bellman equation not only improves the accuracy and adaptability of Q-value updates but also enhances the model's decision-making ability and safety in complex and volatile downhole environments, providing more intelligent autonomous hovering guidance for the plunger, and significantly improving the safety and economic benefits of natural gas extraction.
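The improved update with mixing factor β and uncertainty term ηU(s,a) might be sketched in tabular form as below. Several points are assumptions rather than statements of the source: the function name `improved_q_update`, the list-of-lists Q table, the fixed policy probabilities, and the treatment of U(s,a) as a number supplied by some external Bayesian estimator. The source does not fix the sign of η; a negative value is assumed in the example so that uncertainty acts as a penalty.

```python
def improved_q_update(Q, s, a, r, s_next, policy_probs, uncertainty,
                      alpha, gamma, beta, eta):
    """Q <- Q + alpha*(r + gamma*((1-beta)*max + beta*E_pi) + eta*U - Q)."""
    next_row = Q[s_next]
    max_next = max(next_row)
    expected_next = sum(p * q for p, q in zip(policy_probs, next_row))
    # beta = 0 uses only the maximum; beta = 1 uses only the expectation.
    mixed_next = (1.0 - beta) * max_next + beta * expected_next
    # `uncertainty` stands in for U(s,a); how it is estimated is not shown here.
    target = r + gamma * mixed_next + eta * uncertainty
    Q[s][a] += alpha * (target - Q[s][a])
    return target


# Hypothetical 2-state, 2-action table; all numbers are made up.
Q = [[0.0, 0.0], [1.0, 3.0]]
target = improved_q_update(Q, s=0, a=0, r=1.0, s_next=1,
                           policy_probs=[0.5, 0.5], uncertainty=2.0,
                           alpha=0.5, gamma=0.8, beta=0.5, eta=-0.5)
```

Here the mixed bootstrap value is 0.5 × 3.0 + 0.5 × 2.0 = 2.5, so the target is 1.0 + 0.8 × 2.5 − 0.5 × 2.0 = 2.0 and Q(0,0) moves halfway from 0.0 to 2.0.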
[0175] In another embodiment of the present invention, a plunger autonomous hovering control system based on an expansion mechanism in a natural gas well is also provided. As shown in Figure 2, it includes:
[0176] Monitoring module 01 is used to monitor the pressure distribution changes on the well wall in real time using a multi-point pressure sensor array set on the plunger, obtain monitoring data, and transmit the monitoring data to the edge computing node. Based on the adaptive filtering algorithm and machine learning model, the monitoring data is noise-removed and the optimal expansion degree is predicted to obtain an expansion diameter adjustment scheme.
[0177] Module 02 is used to adjust the expansion mechanism of the plunger according to the expansion diameter adjustment scheme. At the same time, it combines the hydrodynamic parameters of the plunger's location and the high-resolution environmental information provided by the downhole distributed fiber optic sensor network to construct a three-dimensional flow field simulation environment. The particle swarm optimization algorithm and Kalman filter are used to model and predict the current motion state of the plunger in the three-dimensional flow field simulation environment, and generate the plunger position prediction result and the resistance situation of the plunger.
[0178] The determination module 03 is used to determine the optimal hovering strategy based on the plunger position prediction results and resistance conditions, combined with the complex distance and path relationship between multiple pre-set hovering target points in the well. It adopts a deep Q-network intelligent decision-making mechanism under the reinforcement learning framework, combined with the risk assessment module, to determine the optimal hovering strategy that minimizes energy consumption and time cost while ensuring safety and reliability. The hovering strategy specifically defines the sequence of actions required when approaching each hovering point.
[0179] Control module 04 is used to activate a fine adjustment mode based on feedback control theory when the plunger approaches the selected hovering point according to the above hovering strategy. The piezoelectric material actuator integrated in the expansion mechanism performs micron-level precise deformation control to adjust the pressure difference on both sides of the plunger according to the action sequence specified in the hovering strategy. The sliding mode variable structure control algorithm is applied to stabilize the plunger hovering.
[0180] Deployment module 05 is used to deploy an online monitoring system driven by Bayesian inference after the plunger successfully hovers. Based on the continuous analysis of the changing trends of surrounding environmental parameters, if factors that may cause the plunger to deviate from the hovering position are detected, the preset emergency response plan is immediately activated to recalculate the optimal hovering position or perform an emergency avoidance maneuver to maintain the stability of the plunger at the target hovering point.
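The Kalman-filter prediction mentioned for module 02 can be illustrated with a scalar sketch that tracks plunger depth from noisy position readings. The source does not specify the state model, so everything here is an assumption: a random-walk position model, the process and measurement noise variances `q` and `r`, the function name `kalman_step`, and the example readings. The particle swarm optimization and the three-dimensional flow field are not modelled.

```python
def kalman_step(x_est, p_est, z, q, r):
    """One predict/update cycle of a scalar Kalman filter (random-walk model)."""
    # Predict: the state is assumed unchanged, its variance grows by q.
    x_pred = x_est
    p_pred = p_est + q
    # Update: blend prediction and measurement z using the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


# Hypothetical depth readings in metres; all numbers are made up.
x_est, p_est = 100.0, 4.0
for z in [101.2, 100.8, 101.0]:
    x_est, p_est = kalman_step(x_est, p_est, z, q=0.01, r=0.25)
```

After a few readings the estimate variance `p_est` shrinks well below its initial value, which is the behaviour a downstream decision module would rely on when treating the position prediction as trustworthy.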
[0181] In one possible design, the plunger autonomous hovering control system based on an expansion mechanism in a natural gas well of the embodiment shown in Figure 2 can be implemented as a computing device. As shown in Figure 3, the computing device may include a storage component 31 and a processing component 32;
[0182] The storage component 31 stores one or more computer instructions, wherein the one or more computer instructions are invoked and executed by the processing component 32.
[0183] The processing component 32 is used to execute the plunger autonomous hovering control method based on an expansion mechanism in a natural gas well described in the embodiment shown in Figure 1.
[0184] The processing component 32 may include one or more processors to execute computer instructions to complete all or part of the steps in the above-described method. Alternatively, the processing component may be implemented as one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the above-described method.
[0185] Storage component 31 is configured to store various types of data to support operations at the terminal. The storage component can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.
[0186] Of course, computing devices may also include other components, such as input / output interfaces, display components, communication components, etc.
[0187] Input / output interfaces provide interfaces between processing components and peripheral interface modules, which can be output devices, input devices, etc.
[0188] The communication components are configured to facilitate wired or wireless communication between computing devices and other devices.
[0189] The computing device can be a physical device or an elastic computing host provided by a cloud computing platform. In this case, the computing device can refer to a cloud server, and the aforementioned processing components, storage components, etc., can be basic server resources rented or purchased from the cloud computing platform.
[0190] This invention also provides a computer storage medium storing a computer program which, when executed by a computer, can implement the plunger autonomous hovering control method based on an expansion mechanism in a natural gas well of the embodiment shown in Figure 1.
[0191] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.
[0192] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0193] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0194] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.
Claims
1. A method for autonomous hover control of an expansion mechanism based plunger in a natural gas well, characterized by, include: A multi-point pressure sensor array mounted on the plunger is used to monitor the pressure distribution changes on the wellbore in real time, obtain monitoring data, and transmit the monitoring data to an edge computing node. Based on an adaptive filtering algorithm and a machine learning model, the monitoring data is noise-removed and the optimal expansion degree is predicted to obtain an expansion diameter adjustment scheme. The expansion mechanism of the plunger is adjusted accordingly based on the expansion diameter adjustment scheme. At the same time, a three-dimensional flow field simulation environment is constructed by combining the hydrodynamic parameters of the plunger's location and the high-resolution environmental information provided by the downhole distributed fiber optic sensor network. The current motion state of the plunger in the three-dimensional flow field simulation environment is modeled and predicted using the particle swarm optimization algorithm and Kalman filter, and the plunger position prediction result and the resistance situation of the plunger are generated. Based on the plunger position prediction results and resistance conditions, combined with the complex distance and path relationship between multiple pre-set hovering target points in the well, a deep Q-network intelligent decision-making mechanism under the reinforcement learning framework is adopted in conjunction with a risk assessment module to determine the optimal hovering strategy that minimizes energy consumption, time cost, safety and reliability. The hovering strategy specifically defines the sequence of actions required when approaching each hovering point. When the plunger approaches the selected hovering point according to the above hovering strategy, a fine adjustment mode based on feedback control theory is activated. 
Using the piezoelectric material actuator integrated in the expansion mechanism, micrometer-level precise deformation control is performed to adjust the pressure difference on both sides of the plunger according to the action sequence specified in the hovering strategy. The sliding mode variable structure control algorithm is applied to stabilize the plunger in hovering. After the plunger successfully hovers, an online monitoring system driven by Bayesian inference is deployed. Based on the continuous analysis of the changing trends of surrounding environmental parameters by the system, if factors that may cause the plunger to deviate from the hovering position are detected, the preset emergency response plan is immediately activated to recalculate the optimal hovering position or perform an emergency avoidance maneuver to maintain the stability of the plunger at the target hovering point.
2. The method of claim 1, wherein, Based on the plunger position prediction results and resistance conditions, and considering the complex distances and path relationships between multiple pre-set hovering target points within the well, a deep Q-network intelligent decision-making mechanism within a reinforcement learning framework, combined with a risk assessment module, is employed to determine the optimal hovering strategy that minimizes energy consumption, time cost, and safety and reliability. This hovering strategy specifically defines the sequence of actions required when approaching each hovering point, including: Based on the plunger position prediction results and the resistance encountered by the plunger, combined with the complex distance and path relationship between multiple pre-set hovering target points in the well, high-resolution environmental information provided by the downhole distributed fiber optic sensor network is integrated to form a comprehensive decision dataset. Using a comprehensive decision dataset, the intelligent decision-making mechanism of the deep Q network is trained to minimize energy consumption and time cost as conditions for the reward function, and a risk assessment module is introduced to quantify the safety and reliability of different paths. Through continuous simulation and optimization, the intelligent decision-making model is trained to obtain an optimized intelligent decision-making model that selects the optimal action in a given state. Based on this optimized intelligent decision-making model, each state is analyzed and processed to generate specific operation instructions, which constitute the action sequence of the corresponding state. Each action sequence is designed to guide the plunger to the next hovering point in the most efficient and safe way. The operating instructions include adjusting the diameter of the expansion mechanism, changing the plunger's speed and direction.
3. The method of claim 1, wherein, Using a comprehensive decision dataset, a deep Q-network intelligent decision-making mechanism is trained to minimize energy consumption and time cost as part of the reward function. A risk assessment module is also introduced to quantify the safety and reliability of different paths, including: Based on the plunger position prediction results and resistance conditions, combined with the complex distance and path relationships between multiple pre-set hovering target points in the well, and the high-resolution environmental information provided by the downhole distributed fiber optic sensor network, a comprehensive decision dataset is formed. A reward function is constructed based on this comprehensive decision dataset, in which energy consumption and time cost minimization are taken as the core optimization objectives. At the same time, a risk assessment module is introduced to quantify the safety and reliability of different paths. Using a comprehensive decision dataset, the deep Q-network model is initialized. The input layer is defined to receive the plunger state, the output layer generates the predicted action value, and the hidden layer structure is set to capture complex nonlinear relationships. The parameter weights are initialized to provide a starting point for the learning process. Based on the initialized deep Q-network model, the model is trained in a simulated environment using a comprehensive decision dataset. In each iteration, an action is selected based on the current state, and the immediate reward and the next state brought about by the action are observed. The Q-value table is updated or the experience replay buffer is used to store past experiences, thus obtaining the pre-trained intelligent decision model. The intelligent decision-making model after initial training is optimized by applying reinforcement learning algorithms. 
The Q-value is updated based on the actual rewards obtained, and the model parameters are adjusted to determine the action sequence that maximizes the cumulative reward. This generates an optimized intelligent decision-making model. During training, if a certain path is considered to have a high risk, the expected reward of the action corresponding to that path is reduced to encourage the model to choose a safer path and optimize the intelligent decision-making model. When the model converges, the optimized intelligent decision-making model is obtained. This model can select the optimal action in a given state, that is, the action sequence required to approach each hovering point. Based on the optimized intelligent decision-making model, each preset predicted state is analyzed and processed to generate specific operation instructions. These operation instructions constitute the action sequence for the corresponding state, which guides the plunger to reach the next hovering point in the most efficient and safe manner.
4. The method of claim 3, wherein, The intelligent decision-making model after initial training is optimized using reinforcement learning algorithms. The Q-value is updated based on the actual rewards obtained, and model parameters are adjusted to determine the action sequence that maximizes the cumulative reward, generating an optimized intelligent decision-making model. During training, if a path is considered high-risk, the expected reward for the corresponding action is reduced to encourage the model to choose safer paths, thus optimizing the intelligent decision-making model. When the model converges, the optimized intelligent decision-making model is obtained. This model can select the optimal action in a given state, i.e., the action sequence required to approach each hovering point, including: Using the pre-trained intelligent decision-making model, reinforcement learning algorithms are applied for optimization to obtain a deep Q-network model. Based on the initialized deep Q-network model, the pre-trained intelligent decision-making model is obtained by training on a comprehensive decision dataset in a simulated environment. Reinforcement learning algorithms are applied to further optimize the model to obtain an optimized intelligent decision-making model. The optimization process uses a target network to stabilize the learning process, and dual DQN reduces the overestimation problem. Based on the actual rewards obtained, the Q-value of the optimized intelligent decision-making model is updated and the parameters are adjusted to generate an optimized intelligent decision-making model. Based on the actual immediate rewards obtained after the plunger performs the action, the Q-value of the corresponding state-action pair is updated using the Bellman equation. 
At the same time, the weight parameters of the neural network are adjusted through the backpropagation algorithm, so that the intelligent decision-making model can gradually learn to select action sequences that can maximize the cumulative rewards in the long run, thereby generating an optimized intelligent decision-making model. Based on the optimized intelligent decision-making model, the action sequence that maximizes the cumulative reward is determined. During the training process, the target action sequence is learned based on the optimized intelligent decision-making model. The target action can determine the maximum cumulative reward in different states, that is, the optimal action sequence to be taken when approaching each hovering point. The risk assessment module is invoked in real time to quantitatively evaluate the safety and reliability of the selected path. If a path is deemed to have a high risk, the expected reward for the action corresponding to that path is reduced, prompting the model to favor choosing a safer path. This feedback mechanism is directly integrated into the reward function, ensuring that the final intelligent decision-making model is not only efficient but also safe and reliable. When the training process reaches model convergence, the final optimized intelligent decision-making model is obtained. Based on the final optimized intelligent decision-making model, each preset predicted state is analyzed and processed to generate specific operation instructions. These operation instructions constitute the action sequence for the corresponding state, which guides the plunger to reach the next hovering point in the most efficient and safe manner.
5. The method of claim 4, wherein, Based on the actual rewards obtained, the intelligent decision-making model under optimization is updated with Q-values and its parameters are adjusted to generate an optimized intelligent decision-making model. The Bellman equation is used to update the Q-values of the corresponding state-action pairs based on the immediate rewards obtained after the plunger performs its action. Simultaneously, the weight parameters of the neural network are adjusted using the backpropagation algorithm, enabling the intelligent decision-making model to gradually learn to select action sequences that maximize cumulative rewards in the long run. This results in an optimized intelligent decision-making model, including: Based on the immediate reward actually obtained after the plunger performs the action, update the Q value of the corresponding state-action pair using the Bellman equation; Based on the updated Q value, the weight parameters of the neural network are adjusted through the backpropagation algorithm to obtain the intelligent decision model with adjusted parameters. By updating the Q-value and adjusting the parameters, the intelligent decision-making model determines the maximum cumulative reward of the target action sequence under different states, that is, the optimal action sequence to be taken when approaching each hovering point. Finally, when the training reaches model convergence, the final optimized intelligent decision-making model is obtained.
6. The method of claim 5, wherein, Based on the immediate reward actually obtained after the plunger performs the action, the Q value of the corresponding state-action pair is updated using the Bellman equation, including: By utilizing the immediate reward actually obtained after the plunger performs the action, combined with future reward prediction and environmental uncertainty estimation, the Q value of the corresponding state-action pair is updated using the improved Bellman equation to obtain the updated Q value. The improved Bellman equation includes the introduction of a mixing factor, uncertainty estimation, and comprehensive consideration of immediate reward, future reward prediction, and environmental uncertainty. A dual Q-learning mechanism and multi-step rewards are introduced to further optimize the Q-value update process.
7. An autonomous hovering control system for a plunger based on an expansion mechanism in a natural gas well, characterized in that it comprises: a monitoring module, configured to monitor pressure distribution changes on the well wall in real time using a multi-point pressure sensor array arranged on the plunger, to obtain monitoring data, to transmit the monitoring data to an edge computing node, and, based on an adaptive filtering algorithm and a machine learning model, to remove noise from the monitoring data and predict the optimal expansion degree, thereby obtaining an expansion diameter adjustment scheme; a module configured to adjust the expansion mechanism of the plunger according to the expansion diameter adjustment scheme, to construct a three-dimensional flow field simulation environment by combining the hydrodynamic parameters at the plunger's location with the high-resolution environmental information provided by a downhole distributed fiber optic sensor network, and to model and predict the current motion state of the plunger in the three-dimensional flow field simulation environment using a particle swarm optimization algorithm and a Kalman filter, thereby generating a plunger position prediction result and the resistance acting on the plunger; a determination module, configured to determine the optimal hovering strategy based on the plunger position prediction result and the resistance, combined with the distance and path relationships among multiple preset hovering target points in the well, by adopting a deep Q-network intelligent decision-making mechanism under a reinforcement learning framework together with a risk assessment module, the optimal hovering strategy minimizing energy consumption and time cost while ensuring safety and reliability, and the hovering strategy specifically defining the sequence of actions to be taken when approaching each hovering point;
a control module, configured to activate a fine adjustment mode based on feedback control theory when the plunger approaches the selected hovering point according to the hovering strategy, to precisely control, at the micron level, the pressure difference across the plunger by means of a piezoelectric material actuator integrated in the expansion mechanism and according to the action sequence specified in the hovering strategy, and to apply a sliding mode variable structure control algorithm to stabilize the hovering plunger; and a deployment module, configured to deploy an online monitoring system driven by Bayesian inference after the plunger has successfully hovered, and, based on continuous analysis of the trends of surrounding environmental parameters, upon detecting factors that may cause the plunger to deviate from the hovering position, to immediately activate a preset emergency response plan that recalculates the optimal hovering position or performs an emergency avoidance maneuver, so as to keep the plunger stable at the target hovering point.
8. A computing device, comprising a processing component and a storage component, wherein the storage component stores one or more computer instructions, and the one or more computer instructions are invoked and executed by the processing component to implement the plunger autonomous hovering control method based on an expansion mechanism in a natural gas well according to any one of claims 1 to 6.
9. A computer storage medium, characterized in that it stores a computer program that, when executed by a computer, implements the plunger autonomous hovering control method based on an expansion mechanism in a natural gas well according to any one of claims 1 to 6.
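
Illustrative note on claims 5 and 6: the claims name a Bellman-equation Q-value update extended with a double Q-learning mechanism and multi-step rewards, but do not give the formula. The Python sketch below shows only the textbook form of a multi-step double Q-learning target, as one possible reading. It is not the claimed "improved Bellman equation": the mixing factor and the uncertainty estimate are not defined in the claims and are deliberately left out. The function names, the discount factor gamma, and the step size alpha are all hypothetical.

```python
from typing import Sequence


def n_step_double_q_target(
    rewards: Sequence[float],
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Textbook n-step double Q-learning target (illustrative only).

    rewards        immediate rewards r_t ... r_{t+n-1} actually obtained
    q_online_next  online-network Q-values for every action in state s_{t+n}
    q_target_next  target-network Q-values for every action in state s_{t+n}
    """
    # Multi-step part: discounted sum of the n observed immediate rewards.
    n_step_return = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if done:
        # Episode ended inside the window, so there is nothing to bootstrap from.
        return n_step_return
    # Double Q part: the online network selects the action,
    # the target network evaluates it, which limits overestimation.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    bootstrap = q_target_next[best_action]
    return n_step_return + (gamma ** len(rewards)) * bootstrap


def move_toward(q_value: float, target: float, alpha: float = 0.1) -> float:
    """Tabular stand-in for the gradient step: move Q(s, a) toward the target."""
    return q_value + alpha * (target - q_value)


if __name__ == "__main__":
    # Hypothetical numbers: two observed rewards and three candidate actions.
    target = n_step_double_q_target(
        rewards=[1.0, 2.0],
        q_online_next=[0.2, 0.9, 0.1],
        q_target_next=[5.0, 4.0, 7.0],
        gamma=0.5,
    )
    print(target, move_toward(1.0, target, alpha=0.5))
```

In a deep Q-network the tabular step in move_toward would be replaced by backpropagating a loss between the network's Q(s, a) and this target, which corresponds to the weight adjustment recited in claim 5.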