A management method of a radio resource control layer in a 5G mobile communication network
By dynamically adjusting the timing threshold of the RRC inactive timer using a deep Q network model in 5G mobile communication networks, the flexibility problem of the RRC connection management mechanism in large-scale random access scenarios is solved, achieving a balance between energy efficiency and signaling overhead, and improving the flexibility and accuracy of terminal state transition management.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: SUN YAT SEN UNIV
- Filing Date: 2026-03-20
- Publication Date: 2026-06-19
AI Technical Summary
In existing 5G mobile communication networks, the RRC connection management mechanism has poor timing flexibility in large-scale random access scenarios, making it difficult to balance access network energy efficiency and signaling overhead, resulting in low utilization of radio resources and a surge in terminal energy consumption or a significant increase in signaling load.
The network data analysis network element collects terminal session state data and network energy efficiency data. The deep Q network model is used to select the optimal adjustment action of the RRC inactive timer, generate analysis information and send it to the access and mobility management network element, and dynamically adjust the timing threshold of the RRC inactive timer to manage the RRC connection state.
It improves the energy efficiency of the access network under different network load conditions, avoids air interface signaling overhead and service latency caused by frequent terminal reconnection, and enhances the flexibility and accuracy of terminal state migration management.
Figure CN122248568A_ABST
Description
Technical Field
[0001] This invention relates to the technical field of 5G wireless communication, and more specifically, to a management method for the radio resource control layer in a 5G mobile communication network.
Background Technology
[0002] In existing 5G mobile communication networks, the RRC layer is the core for managing the connection between the terminal and the radio access network. The 5G NR protocol defines three RRC states: idle, inactive, and connected. To balance terminal power consumption and network resource usage, 5G mobile communication networks typically employ an inactive timer mechanism to manage state transitions. That is, when a terminal in the connected state does not transmit data within the timer's set duration, the network releases the RRC connection, controlling the terminal to migrate to the inactive or idle state.
[0003] However, existing RRC connection management mechanisms primarily rely on static parameter configurations, meaning the duration of inactive timers is typically fixed and preset. In application scenarios such as the Internet of Things (IoT) with large-scale random access, this static configuration method reveals significant technical shortcomings. First, fixed timer parameters struggle to adapt to dynamically changing network environments and diverse service traffic patterns, lacking flexibility and adaptability. Second, inappropriate timer settings can lead to severe performance trade-offs: if the timer is set too long, the terminal will occupy connection resources ineffectively for extended periods, resulting in low wireless resource utilization and a surge in terminal power consumption; if it is set too short, it will trigger frequent state switching, causing a significant increase in signaling load, which not only increases network overhead but may also affect access performance. Therefore, existing technologies struggle to simultaneously meet the requirements of high energy efficiency and network stability in large-scale access scenarios.
Summary of the Invention
[0004] To address the issue that existing 5G RRC layer connection management methods suffer from poor timing flexibility and difficulty in balancing access network energy efficiency and signaling overhead in large-scale random access scenarios, this invention proposes a management method for the radio resource control layer in 5G mobile communication networks. This method improves the flexibility of timing, enhances the adaptability of 5G RRC layer connection management methods to dynamic network environments, and achieves a balance between access network energy efficiency and signaling overhead.
[0005] To achieve the above-mentioned technical effects, the technical solution of the present invention is as follows: In a first aspect, this application proposes a management method for the Radio Resource Control (RRC) layer in a 5G mobile communication network. The 5G mobile communication network includes a core network, a Radio Access Network (RAN), and transmission equipment. The core network is connected to the RAN through the transmission equipment. The core network includes a network data analysis network element (NWDAF), an access and mobility management network element (AMF), and a session management network element (SMF). The RAN manages terminal equipment via the RRC layer. The management method for the RRC layer includes the following steps:
S1. Use the network data analysis network element to collect terminal session state data and network energy efficiency data and preprocess them;
S2. Based on the preprocessed terminal session state data and network energy efficiency data, select the optimal adjustment action for the timing threshold of the RRC inactive timer using the deep Q network model that has been pre-trained and configured in the network data analysis network element;
S3. Based on the optimal adjustment action for the RRC inactive timer, generate analysis information containing the RRC inactive timer adjustment action and send it to the access and mobility management network element (AMF), which uses the analysis information to generate RRC auxiliary information and delivers it to the radio access network;
S4. Use the radio access network to receive the RRC auxiliary information, adjust the timing threshold of the RRC inactive timer based on the received RRC auxiliary information, and manage the RRC connection state based on the adjusted timing threshold.
[0006] In this technical solution, firstly, network data analysis network elements are used to collect terminal session state data and network energy efficiency data and perform preprocessing. Based on the preprocessed terminal session state data and network energy efficiency data, a deep Q network model is used to select the optimal adjustment action for the timing threshold of the RRC inactive timer, and analysis information containing this adjustment action is generated. This information is then converted into RRC auxiliary information and sent to the radio access network via access and mobility management network elements. This dynamically adjusts the timing threshold of the RRC inactive timer and manages the RRC connection state. This solution improves the energy efficiency of the access network under different network load conditions while avoiding air interface signaling overhead and service latency caused by frequent terminal reconnection, and enhances the flexibility and accuracy of terminal state migration management.
[0007] Preferably, the terminal session state data includes the number of terminals in the suspended state, the number of terminals in the transmission state, and the number of terminals queuing for transmission. The process by which the network data analysis network element collects the terminal session state data includes: the network data analysis network element sends a subscription request to the session management network element; the session management network element receives the subscription request and reports to the network data analysis network element the number of terminals in the suspended state, the number of terminals in the transmission state, and the number of terminals queuing for transmission.
[0008] Preferably, the network energy efficiency data includes the timing threshold of the RRC inactive timer in the current time slot and the energy efficiency of the current time slot. The process by which the network data analysis network element collects the network energy efficiency data includes: the network data analysis network element sends a subscription request to the access and mobility management network element; the access and mobility management network element receives the subscription request, obtains from the radio access network the timing threshold of the RRC inactive timer in the current time slot and the energy efficiency of the current time slot, and feeds them back to the network data analysis network element via notification messages.
[0009] Preferably, the preprocessing of the terminal session state data and the network energy efficiency data includes: mapping the collected terminal session state data and network energy efficiency data into a state vector. The expression is:
[0010] where the symbols denote, respectively, the number of terminals queuing for transmission and the state vector at the given time slot.
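As a hedged illustration only: the expression itself is not present in this text, so every symbol below is an assumed name rather than the patent's notation. Assuming the state vector simply concatenates the three session counts and the two energy efficiency quantities described in [0007] and [0008], one plausible form is:

```latex
s_t = \bigl( N_{\mathrm{sus}}(t),\; N_{\mathrm{tx}}(t),\; N_{\mathrm{q}}(t),\; T_{\mathrm{inact}}(t),\; \eta(t) \bigr)
```

Here $N_{\mathrm{sus}}(t)$, $N_{\mathrm{tx}}(t)$, and $N_{\mathrm{q}}(t)$ stand for the numbers of suspended, transmitting, and queued terminals in time slot $t$, $T_{\mathrm{inact}}(t)$ for the timing threshold of the RRC inactive timer, and $\eta(t)$ for the energy efficiency of the slot. The component order is likewise an assumption.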
[0011] Preferably, the deep Q-network model generates an optimal adjustment strategy for the timing threshold of the RRC inactive timer, including:
S21. Load the pre-trained target Q-network parameters from the deep Q-network model;
S22. For the current time slot, retrieve the state vector and, based on the pre-trained Q-network parameters, output the Q-values of all adjustment actions for the timing threshold of the RRC inactive timer. The adjustment actions include: increasing the timing threshold of the RRC inactive timer by 1 time slot, decreasing it by 1 time slot, keeping it unchanged, and decreasing it by 2 time slots;
S23. From the Q-values of all adjustment actions for the timing threshold of the RRC inactive timer, select the adjustment action with the largest Q-value as the optimal adjustment action.
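Steps S22 and S23 amount to an argmax over four Q-values. The following is a minimal sketch under stated assumptions: the ordering of the action tuple, the function names, and the lower bound on the threshold are all hypothetical and do not come from the patent text.

```python
from typing import Sequence

# Adjustment actions listed in S22, expressed as changes to the timer
# threshold in time slots. The ordering of this tuple is an assumption.
ACTIONS = (+1, -1, 0, -2)


def select_adjustment(q_values: Sequence[float]) -> int:
    """Return the threshold change (in slots) with the largest Q-value (S23)."""
    if len(q_values) != len(ACTIONS):
        raise ValueError("expected one Q-value per adjustment action")
    best_index = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    return ACTIONS[best_index]


def apply_adjustment(threshold_slots: int, delta: int, minimum: int = 1) -> int:
    """Apply the change; the lower bound `minimum` is an assumption."""
    return max(minimum, threshold_slots + delta)
```

For example, with Q-values `[0.1, 0.7, 0.3, 0.2]` the second action (decrease by 1 slot) is selected, so a threshold of 5 slots would become 4.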
[0012] Preferably, the process of training the deep Q-network model includes:
S211. Initialize the deep Q-network model: construct a current Q-network and a target Q-network with the same structure, and randomly initialize the first weight parameters of the current Q-network and the second weight parameters of the target Q-network;
S212. Define a time slot variable and execute S213;
S213. Obtain the state vector for the current value of the time slot variable and, based on the first weight parameters of the current Q-network, output the predicted Q-values of all adjustment actions for the timing threshold of the RRC inactive timer;
S214. Based on the predicted Q-values of all adjustment actions, select an adjustment action using an ε-greedy strategy whose exploration rate decays dynamically with the number of iterations, and execute it;
S215. Obtain the state vector of the next time slot after the adjustment action has been executed, and calculate the immediate reward value according to the preset reward function. The expression for the preset reward function is:
[0013] where two of the symbols denote preset constant parameters, and the remaining symbols denote, respectively, the current energy efficiency of the radio access network and the change in that energy efficiency before and after the adjustment action is executed;
S216. Store the state vector of the current time slot, the adjustment action, the state vector of the next time slot, and the reward value together as one interaction sample in a pre-defined experience replay buffer;
S217. Determine whether the number of interaction samples in the experience replay buffer has reached a preset threshold. If not, increment the time slot variable by 1 and return to S213. If so, randomly draw a batch of samples from the experience replay buffer, calculate the temporal-difference target Q-value using the target Q-network, construct a loss function from the target Q-value and the predicted Q-value, update the first weight parameters of the current Q-network by gradient descent, and execute S218;
S218. Determine whether the current time slot variable is an integer multiple of the preset target network update period. If so, synchronize the first weight parameters of the current Q-network to the second weight parameters of the target Q-network and execute S219; otherwise, increment the time slot variable by 1 and return to S213;
S219. Determine whether the time slot variable has reached its preset threshold. If so, training is complete; otherwise, increment the time slot variable by 1 and return to S213.
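The building blocks of steps S214 to S218 can be sketched in plain Python. This is an illustration under stated assumptions, not the patent's implementation: the geometric decay schedule, the discount factor `gamma`, and all function names are hypothetical, and the linear reward form is a guess because the reward expression is not present in this text. The neural network and its gradient update are deliberately left out; `target_q` stands in for the target Q-network as a callable.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

State = Tuple[float, ...]
# One interaction sample (S216): state, action index, reward, next state.
Sample = Tuple[State, int, float, State]


def decayed_epsilon(step: int, start: float = 1.0, end: float = 0.05,
                    decay: float = 0.995) -> float:
    """Exploration rate that shrinks with the iteration count (assumed schedule)."""
    return max(end, start * decay ** step)


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """S214: explore with probability epsilon, otherwise pick the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def reward(efficiency: float, efficiency_change: float,
           alpha: float = 1.0, beta: float = 1.0) -> float:
    """S215: ASSUMED linear combination of efficiency and its change."""
    return alpha * efficiency + beta * efficiency_change


def sample_batch(buffer: Deque[Sample], batch_size: int,
                 rng: random.Random) -> List[Sample]:
    """S217: draw a random batch from the experience replay buffer."""
    return rng.sample(list(buffer), batch_size)


def td_targets(batch: Sequence[Sample],
               target_q: Callable[[State], Sequence[float]],
               gamma: float = 0.9) -> List[float]:
    """S217: y = r + gamma * max over a' of Q_target(s', a') per sample."""
    return [r + gamma * max(target_q(s_next)) for (_, _, r, s_next) in batch]


def should_sync_target(step: int, period: int) -> bool:
    """S218: sync when the step is an integer multiple of the update period."""
    return step % period == 0
```

The loss in S217 would then compare each target from `td_targets` with the current network's predicted Q-value for the action actually taken, for instance as a mean squared error.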
[0014] Preferably, generating the analysis information containing the RRC inactive timer adjustment action based on the optimal adjustment action for the RRC inactive timer further includes: the network data analysis network element transmits the analysis information, including the RRC inactive timer adjustment action, to the access and mobility management network element through its service-oriented interface; after receiving the analysis information, the access and mobility management network element encapsulates it into RRC auxiliary information and sends it to the radio access network.
[0015] Preferably, the management of RRC connection status based on the timing threshold of the adjusted RRC inactive timer includes: Based on the updated RRC inactive timer threshold, the data packet transmission of the target terminal is monitored. If no data packet transmission of the target terminal is detected within the time limit specified by the updated threshold, the radio access network actively triggers the RRC Release procedure and sends a release signaling carrying the suspension configuration to the target terminal, instructing the target terminal to migrate from the RRC connected state to the RRC inactive state or the RRC idle state. Otherwise, reset the RRC inactive timer and keep the target terminal in RRC connected state.
[0016] In a second aspect, this application also proposes a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable by the processor. The processor executes the computer program to implement the management method of the radio resource control layer in the 5G mobile communication network.
[0017] In a third aspect, this application also proposes a computer storage medium storing a computer program, the computer program including program instructions which, when executed by a computer, cause the computer to perform the management method of the radio resource control layer in the 5G mobile communication network.
[0018] Compared with the prior art, the beneficial effects of the present invention are: This invention proposes a management method for the radio resource control layer in a 5G mobile communication network. First, network data analysis elements collect and preprocess terminal session state data and network energy efficiency data. Based on the preprocessed terminal session state data and network energy efficiency data, a deep Q-network model is used to select the optimal adjustment action for the timing threshold of the RRC inactive timer, and analysis information containing this adjustment action is generated. This information is then converted into RRC auxiliary information and sent to the radio access network via access and mobility management elements. This dynamically adjusts the timing threshold of the RRC inactive timer and manages the RRC connection state. This solution improves the energy efficiency of the access network under different network load conditions while avoiding air interface signaling overhead and service latency caused by frequent terminal reconnections, enhancing the flexibility and accuracy of terminal state migration management.
Attached Figure Description
[0019] Figure 1 is a flowchart of the management method of the radio resource control layer in a 5G mobile communication network proposed in Embodiment 1 of the present invention;
Figure 2 is a schematic diagram of the transitions between the RRC state and its sub-states of the terminal proposed in Embodiment 2 of the present invention;
Figure 3 is a schematic diagram of the overall communication architecture between the network data analysis network element and the 5G core network proposed in Embodiment 2 of the present invention;
Figure 4 is a schematic diagram of the process of training the deep Q-network model proposed in Embodiment 2 of the present invention;
Figure 5 is a schematic diagram of the overall communication architecture between the network data analysis network element and the 5G core network based on the OAI open-source 5G platform proposed in Embodiment 3 of the present invention;
Figure 6 is a schematic diagram of the structure of the computer device proposed in Embodiment 5 of the present invention.
Detailed Implementation
[0020] The accompanying drawings are for illustrative purposes only and should not be construed as limiting the scope of this patent. To better illustrate this embodiment, some parts of the accompanying drawings may be omitted, enlarged, or reduced, and do not represent the actual dimensions; It is understandable to those skilled in the art that some well-known details may be omitted from the accompanying drawings.
[0021] The technical solution of the present invention will be further described below with reference to the accompanying drawings and embodiments.
[0022] The positional relationships depicted in the accompanying drawings are for illustrative purposes only and should not be construed as limiting this patent.
Example 1
This embodiment proposes a management method for the Radio Resource Control (RRC) layer in a 5G mobile communication network. The 5G mobile communication network includes a core network, a Radio Access Network (RAN), and transmission equipment. The core network is connected to the RAN through the transmission equipment. The core network includes a network data analysis network element (NWDAF), an access and mobility management network element (AMF), and a session management network element (SMF). The RAN connects to and manages terminal equipment through the RRC layer. The flowchart of the management method for the RRC layer in the 5G mobile communication network is shown in Figure 1 and includes the following steps:
S1. Use the network data analysis network element to collect terminal session state data and network energy efficiency data and preprocess them;
S2. Based on the preprocessed terminal session state data and network energy efficiency data, select the optimal adjustment action for the timing threshold of the RRC inactive timer using the deep Q network model that has been pre-trained and configured in the network data analysis network element;
S3. Based on the optimal adjustment action for the RRC inactive timer, generate analysis information containing the RRC inactive timer adjustment action and send it to the access and mobility management network element (AMF), which uses the analysis information to generate RRC auxiliary information and delivers it to the radio access network;
S4. Use the radio access network to receive the RRC auxiliary information, adjust the timing threshold of the RRC inactive timer based on the received RRC auxiliary information, and manage the RRC connection state based on the adjusted timing threshold.
[0023] In this embodiment, firstly, network data analysis network elements are used to collect terminal session state data and network energy efficiency data and perform preprocessing. Based on the preprocessed terminal session state data and network energy efficiency data, the optimal adjustment action for the timing threshold of the RRC inactive timer is selected using a deep Q network model, and analysis information containing this adjustment action is generated. This information is then converted into RRC auxiliary information and sent to the radio access network via access and mobility management network elements. This dynamically adjusts the timing threshold of the RRC inactive timer and manages the RRC connection state. This solution improves the energy efficiency of the access network under different network load conditions while avoiding air interface signaling overhead and service latency caused by frequent terminal reconnection, thus enhancing the flexibility and accuracy of terminal state migration management.
[0024] Example 2
In this embodiment, the terminal session state data includes the number of terminals in the suspended state, the number of terminals in the transmission state, and the number of terminals queuing for transmission. The process by which the network data analysis network element collects the terminal session state data includes: the network data analysis network element sends a subscription request to the session management network element; the session management network element receives the subscription request and reports to the network data analysis network element the number of terminals in the suspended state, the number of terminals in the transmission state, and the number of terminals queuing for transmission.
[0025] Specifically, to achieve accurate collection of terminal session state data, this embodiment relies on the collaborative efforts of relevant network elements and service interfaces within the 5G service-oriented architecture. The NWDAF (Network Data Analytics Function) is a standard entity in the 5G core network responsible for collecting and analyzing network data and providing prediction results. In this solution, the NWDAF serves as the deployment carrier and control center for the DQN (Deep Q Network) model, actively collecting multi-dimensional network operation data and performing model inference. The SMF (Session Management Function) is a core network entity responsible for terminal PDU session management (including session establishment, modification, and release) and controlling the routing of user plane function protocol data units; it monitors and maintains the service data flow status of each terminal.
[0026] Specifically, the data collection process is as follows: The NWDAF sends a subscription request for a specific terminal session event to the SMF by calling the SMF's Nsmf service interface; the Nsmf service interface is the standard communication interface exposed by the SMF under the 5G service architecture. After receiving the subscription request, the SMF counts the active connections of the terminals at the session level and reports to the NWDAF the number of terminals in the suspended state, the number of terminals in the transmission state, and the number of terminals queuing for transmission.
[0027] The number of terminals in the suspended state refers to the total number of terminals that have transmitted no data for a period of time but whose RRC connections have not yet been released, and whose terminal context configuration information is still retained on both the access network and core network sides. These terminals can quickly resume data transmission with relatively low signaling overhead.
[0028] The number of terminals in the transmission state refers to the total number of terminals that are in the RRC connected state, are actually sending or receiving uplink or downlink data packets, and are substantially occupying radio air interface resources.
[0029] Specifically, the transitions between the terminal RRC state and its sub-states are shown in Figure 2. Referring to Figure 2, after the terminal UE enters the RRC CONNECTED state, it is initially in the Suspend sub-state. When the terminal UE initiates a data transmission or reception request, it transitions to the Transmission sub-state. Both the Suspend and Transmission sub-states belong to the RRC CONNECTED state in the RRC layer. When the data transmission or reception ends, the terminal UE returns to the Suspend sub-state. If the terminal UE has no data transmission or reception demand for the configured number of consecutive time slots, the inactive timer of the RRC layer times out and the terminal UE exits the RRC CONNECTED state.
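The sub-state transitions described for Figure 2 can be modeled as a small slot-driven state machine. The sketch below is illustrative only: the class and member names are hypothetical, the inactive and idle states are collapsed into one value, and resuming a released connection is left out.

```python
from dataclasses import dataclass
from enum import Enum


class RrcState(Enum):
    CONNECTED_SUSPEND = "connected/suspend"
    CONNECTED_TRANSMISSION = "connected/transmission"
    NOT_CONNECTED = "inactive-or-idle"


@dataclass
class Terminal:
    # A terminal entering RRC CONNECTED starts in the Suspend sub-state.
    state: RrcState = RrcState.CONNECTED_SUSPEND
    idle_slots: int = 0

    def step(self, has_traffic: bool, threshold_slots: int) -> RrcState:
        """Advance one time slot and return the resulting state."""
        if self.state is RrcState.NOT_CONNECTED:
            return self.state  # re-establishment is out of scope here
        if has_traffic:
            # Any traffic moves the terminal to Transmission and resets the timer.
            self.state = RrcState.CONNECTED_TRANSMISSION
            self.idle_slots = 0
        else:
            self.state = RrcState.CONNECTED_SUSPEND
            self.idle_slots += 1
            if self.idle_slots >= threshold_slots:
                # Inactive timer expired: leave RRC CONNECTED.
                self.state = RrcState.NOT_CONNECTED
        return self.state
```

With a threshold of 3 slots, a terminal that transmits once and then stays silent remains in the Suspend sub-state for two slots and leaves RRC CONNECTED on the third.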
[0030] The number of terminals queuing for transmission refers to the total number of terminals that are capable of sending and receiving service data but whose data packets are temporarily held in a buffer queue awaiting transmission. In this embodiment, since this parameter is counted and reported by the SMF on the core network side, it is specifically the number of terminals whose downlink data packets are queued for delivery at the user plane network element of the core network, either because the terminal is temporarily unreachable or because radio resources are limited.
[0031] Through the collaborative interaction and accurate statistics across network elements, NWDAF can perceive the load distribution and state migration trends of various service terminals in the network in real time, thereby providing a reliable data input basis for the intelligent decision-making of the subsequent DQN model.
[0032] In this embodiment, the network energy efficiency data includes the timing threshold of the RRC inactive timer in the current time slot and the energy efficiency of the current time slot. The process by which the network data analysis network element collects the network energy efficiency data includes: the network data analysis network element sends a subscription request to the access and mobility management network element; the access and mobility management network element receives the subscription request, obtains from the radio access network the timing threshold of the RRC inactive timer in the current time slot and the energy efficiency of the current time slot, and feeds them back to the network data analysis network element via notification messages.
[0033] Specifically, the overall communication architecture between the NWDAF and the 5G core network is shown in Figure 3. Referring to Figure 3, the NWDAF, in which the deep Q network model is deployed, accesses the system control plane bus through service-oriented interfaces such as Nnwdaf. When it needs to collect data from the underlying access network, the NWDAF calls, over the control plane bus shown in the figure, the Namf service-oriented interface exposed by the AMF (Access and Mobility Management Function) to send a data subscription request downwards. The AMF establishes a direct physical and logical connection with the RAN through the N2 interface. After receiving the subscription request from the NWDAF, the AMF initiates the process of obtaining hardware operating parameters from the underlying access network RAN through the N2 interface.
[0034] Specifically, after acquiring terminal session state data, in order to achieve cross-layer collection of network energy efficiency data, this embodiment further obtains underlying operating parameters through the signaling interaction mechanism between the 5G core network and the RAN. The process of NWDAF collecting network energy efficiency data includes: the NWDAF sending a subscription request to the AMF through the Namf service interface. Here, the AMF is an entity in the 5G core network responsible for terminal access control, mobility management, and security context management. As the control plane anchor point between the core network and the RAN, it is responsible for terminating control plane signaling from the RAN. The Namf service interface is the standard communication interface exposed by the AMF under the 5G service architecture. In this embodiment, the NWDAF initiates a subscription request for network energy efficiency and configuration parameters by calling the Namf service interface, thereby enabling the core network analysis entity to obtain underlying access network data on demand.
[0035] After receiving a subscription request, the AMF obtains the timing threshold of the inactive RRC timer for the current time slot and the energy efficiency of the current time slot from the RAN through the N2 interface, and feeds it back to the NWDAF through a notification message. In this process, the N2 interface serves as the control plane logical interface between the AMF and the RAN (Radio Access Network), used to issue configuration query requests and receive the underlying operating parameters reported by the RAN.
[0036] Specifically, the underlying operating parameters obtained by the AMF from the RAN include the following two key items of network energy efficiency data. First, the timing threshold of the RRC inactive timer. This threshold refers to the upper limit of a time parameter maintained by the RAN at the Radio Resource Control (RRC) or Media Access Control (MAC) layer to monitor the inactivity duration of terminals. When the RAN detects that a target terminal has not transmitted uplink or downlink data for a duration reaching this timing threshold, the RAN proactively triggers a state release or suspension procedure (such as sending an RRC Release signaling message), instructing the terminal to migrate from the RRC connected state to the inactive or idle state. Second, the energy efficiency of the current time slot. This parameter is the ratio between the amount of service data the RAN effectively transmits within the current time slot and the physical power it consumes, that is, the total amount of data successfully transmitted by the RAN within the current time slot divided by the total energy consumed by the RAN within that time slot. This energy efficiency parameter serves as the core feedback indicator for subsequently evaluating the performance of the DQN model in adjusting the RRC inactive timer.
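The energy efficiency definition above is a simple ratio and can be written out directly. In this sketch the units (bits and joules) and the function names are assumptions chosen for illustration; the text only specifies data transmitted divided by energy consumed.

```python
def slot_energy_efficiency(bits_delivered: float, energy_joules: float) -> float:
    """Energy efficiency of one time slot: delivered data / consumed energy."""
    if energy_joules <= 0:
        raise ValueError("energy consumed in the slot must be positive")
    return bits_delivered / energy_joules


def efficiency_change(before: float, after: float) -> float:
    """Change in energy efficiency across an adjustment action."""
    return after - before
```

For instance, a slot in which the RAN delivers 8,000,000 bits while consuming 4 J has an efficiency of 2,000,000 bits per joule.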
[0037] Through this cross-layer signaling interaction based on the AMF and the N2 interface, the NWDAF obtains the energy efficiency and configuration data of the underlying access network, providing a complete data foundation for the subsequent fusion and preprocessing of multi-source asynchronous data.
[0038] In this embodiment, the preprocessing of the terminal session state data and the network energy efficiency data includes: mapping the collected terminal session state data and network energy efficiency data into a state vector. The expression is:
[0039] where the symbols denote, respectively, the number of terminals queuing for transmission and the state vector at the given time slot.
[0040] Specifically, after acquiring the aforementioned multi-dimensional network operation data through the SMF and the AMF respectively, the NWDAF preprocesses the terminal session state data and the network energy efficiency data. Since data from different core network elements (SMF and AMF) may differ in reporting time and sampling frequency, the preprocessing first performs timestamp alignment and redundancy cleanup on the collected data with respect to the current time slot, ensuring that all data input to the model fall within the same observation window in the time dimension.
[0041] After completing timestamp alignment, NWDAF performs multi-source feature fusion of the collected terminal session state data and the network energy efficiency data, mapping them into a unified state vector for the reinforcement learning environment. The state vector comprehensively characterizes, for the current time slot, the real-time service load characteristics, signaling congestion status, and underlying physical energy consumption level of the access network. Through the aforementioned preprocessing and vector mapping, NWDAF transforms discrete communication network signaling data into a standardized, structured input format for the AI model. This state vector serves as the input features of the DQN model pre-trained and configured in NWDAF (i.e., the environmental state observed by the agent in reinforcement learning), so that the deep reinforcement learning model can perceive the dynamic changes of the network environment from a global perspective, providing comprehensive and structured data support for the subsequent output of the optimal adjustment action for the RRC inactive timer.
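As a rough illustration of the vector mapping, the sketch below packs the five collected quantities into a flat list of floats. The field names and the component order are assumptions made for this example; the embodiment only states which quantities are fused.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotObservation:
    """Quantities collected for one time slot (field names are hypothetical)."""
    suspended: int              # terminals in the suspended state (from SMF)
    transmitting: int           # terminals in the transmission state (from SMF)
    queued: int                 # terminals queuing for transmission (from SMF)
    timer_threshold_slots: int  # RRC inactive timer threshold (from AMF/RAN)
    energy_efficiency: float    # energy efficiency of the slot (from AMF/RAN)


def to_state_vector(obs: SlotObservation) -> list:
    """Map one aligned observation to the model input; the order is an assumption."""
    return [
        float(obs.suspended),
        float(obs.transmitting),
        float(obs.queued),
        float(obs.timer_threshold_slots),
        float(obs.energy_efficiency),
    ]


print(to_state_vector(SlotObservation(12, 5, 3, 10, 2.5)))
```

In practice the components would likely be normalized before being fed to the network, but the embodiment does not describe a normalization scheme, so none is shown.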
[0042] In this embodiment, generating the optimal adjustment strategy for the timing threshold of the RRC inactive timer with the deep Q-network model includes: S21. Load the pre-trained target Q-network parameters from the deep Q-network model; S22. For the current time slot, obtain the state vector of that slot and, based on the pre-trained Q-network parameters, output the Q-values of all adjustment actions for the timing threshold of the RRC inactive timer, where the adjustment actions include: increasing the timing threshold of the RRC inactive timer by 1 time slot, decreasing it by 1 time slot, keeping it unchanged, and decreasing it by 2 time slots; S23. From the Q-values of all adjustment actions for the timing threshold of the RRC inactive timer, select the adjustment action with the largest Q-value as the optimal adjustment action for the timing threshold of the RRC inactive timer.
[0043] Specifically, once the above state vector has been constructed, this embodiment enters the intelligent decision-making stage based on deep reinforcement learning. Based on the preprocessed terminal session state data and network energy efficiency data, NWDAF uses the pre-trained deep Q-network model configured in NWDAF to select the optimal adjustment action for the timing threshold of the RRC inactive timer. This inference and decision-making process mainly includes the following steps. First, during the execution phase of actual deployment (i.e., the inference phase), NWDAF instantiates the deep Q-network model in memory and loads the pre-trained target Q-network parameters from that model. At this point, the network parameters of the deep Q-network model are fixed, and backpropagation and gradient update operations are no longer performed.
[0044] Subsequently, for the current time slot, NWDAF obtains the constructed state vector and performs forward propagation based on the pre-trained Q-network parameters. The Q-network takes as input the state vector that fuses network energy efficiency and terminal state characteristics, and outputs the Q-values of all adjustment actions for the timing threshold of the RRC inactive timer. In this embodiment, to match the actual control granularity of the 5G access network, the adjustment actions are defined as a discrete set of control commands: increasing the timer threshold by one time slot, decreasing it by one time slot, keeping it unchanged, or decreasing it by two time slots. This asymmetric action space keeps adjustments gradual enough to avoid sudden signaling storms while still allowing the model to perform aggressive power-saving optimization when the network is extremely idle, balancing control smoothness against energy efficiency.
[0045] Finally, from the Q-values of all adjustment actions for the timing threshold of the RRC inactive timer, NWDAF selects the adjustment action with the largest Q-value as the optimal adjustment action. Because this is the inference phase, the model abandons the exploration mechanism used during training and adopts a fully greedy strategy, directly outputting the action with the maximum predicted Q-value. Mathematically, this maximum Q-value indicates that, under the current access network load state, taking this action is expected to bring the base station the maximum long-term energy efficiency return, so that the decision follows the energy-saving and connection management strategy the model has learned to be optimal.
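Steps S22 and S23 reduce to an argmax over four Q-values followed by a threshold update. The sketch below illustrates this; the ordering of actions in the tuple and the clamping bounds `min_slots` and `max_slots` are assumptions, as the embodiment does not state an index order or limits on the threshold.

```python
# Threshold change, in time slots, for each of the four adjustment actions.
# The index order is an assumption made for this example.
ACTION_DELTAS = (+1, -1, 0, -2)


def greedy_action(q_values):
    """Return the index of the action with the largest Q-value (fully greedy, no exploration)."""
    if len(q_values) != len(ACTION_DELTAS):
        raise ValueError("expected one Q-value per adjustment action")
    return max(range(len(q_values)), key=lambda i: q_values[i])


def apply_action(threshold_slots, action_index, min_slots=1, max_slots=100):
    """Apply the chosen adjustment; the clamping bounds are hypothetical safety limits."""
    new_threshold = threshold_slots + ACTION_DELTAS[action_index]
    return max(min_slots, min(max_slots, new_threshold))


# Hypothetical Q-values output by the network for the current state.
q = [0.12, 0.40, 0.31, 0.05]
best = greedy_action(q)
print(best, apply_action(10, best))  # 1 9
```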
[0046] Specifically, the overall communication architecture of NWDAF in the 5G core network is shown in Figure 3. A deep Q-network model, serving as the core decision engine, is deployed within NWDAF. Logically, this deep Q-network model is organized as a closed loop of agent-environment interaction. Internally, the agent employs a dual-network architecture: a current Q-network for real-time action value output and a target Q-network for stable training. An experience replay pool for storing historical interaction data and a loss function calculation module are also configured. The network data analysis element connects to the communication bus via the Nnwdaf interface, while the other elements connect to the communication bus through their corresponding interfaces to implement their functions. For example, the application element connects via the Naf interface, acting as an external application access gateway; the policy control element connects via the Npcf interface and is responsible for unified policy rule distribution; the network repository element connects via the Nnrf interface, ensuring service registration and communication between elements; the network exposure element connects via the Nnef interface and is responsible for securely exposing core network data capabilities; and the authentication server element connects via the Nausf interface and is responsible for ensuring legitimate and secure terminal access to the network.
[0047] Specifically, the access and mobility management network element is the core anchor point in the 5G core network responsible for terminal access control, mobility management, and control plane signaling interaction. It plays a pivotal role between the network data analysis network element and the RAN. Upward, it provides energy efficiency data subscription services to the network data analysis network element, and downward, it encapsulates the optimization decisions output by the network data analysis network element into standard RRC auxiliary information, which is then output to the terminal device through the N1 interface and to the RAN through the N2 interface.
[0048] Specifically, the session management network element is a dedicated entity in the 5G core network responsible for the lifecycle management of PDU session establishment, modification, and release. It grasps the session-level connection characteristics of terminals and is responsible for reporting the number of terminals in transmission state, suspended state, and queued waiting for transmission to the network data analysis network element in real time. It establishes a connection with the user plane network element through the N4 interface. The user plane performs the underlying forwarding actions of service data, and the buffering and queuing status of the data packets it sends provide important indicator support for the core network side to judge air interface congestion.
[0049] In this embodiment, a schematic diagram of the training process of the deep Q-network model is shown in Figure 4. The process of training the deep Q-network model includes: S211. Initialize the deep Q-network model, construct a current Q-network and a target Q-network with the same structure, and randomly initialize the first weight parameters of the current Q-network and the second weight parameters of the target Q-network; S212. Initialize the time slot variable and execute S213; S213. Obtain the state vector for the current value of the time slot variable and, based on the first weight parameters of the current Q-network, output the predicted Q-values of all adjustment actions for the timing threshold of the RRC inactive timer; S214. Based on the predicted Q-values of all adjustment actions, select an adjustment action using an ε-greedy strategy whose exploration rate decays dynamically with the number of iterations, and execute it; S215. Obtain the state vector of the next time slot after the adjustment action has been executed, and calculate the instant reward value according to the preset reward function.
[0050] The preset reward function is expressed in terms of two preset constant parameters, the current energy efficiency of the radio access network, and the change in the energy efficiency of the radio access network before and after the adjustment action is executed; S216. Store the state vector of the current time slot variable, the adjustment action, the state vector of the next time slot, and the reward value as one interaction sample in a pre-defined experience replay buffer; S217. Determine whether the number of interaction samples in the experience replay buffer has reached a preset threshold. If not, increment the time slot variable by 1 and return to S213. If so, randomly draw a batch of samples from the experience replay buffer, calculate the temporal-difference target Q-value using the target Q-network, construct a loss function from the target Q-value and the predicted Q-value, update the first weight parameters of the current Q-network by gradient descent, and execute S218; S218. Determine whether the current time slot variable is an integer multiple of the preset target network update cycle. If so, synchronize the first weight parameters of the current Q-network to the second weight parameters of the target Q-network and execute S219; otherwise, increment the time slot variable by 1 and return to S213; S219. Determine whether the current time slot variable has reached the preset threshold for the time slot variable. If so, training is complete; otherwise, increment the time slot variable by 1 and return to S213.
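The branching among S217, S218 and S219 is easier to follow as control flow. The sketch below walks the steps with counters only and performs no learning; the variable name `t`, its start value of 1, and the example thresholds are assumptions. One consequence of the step order as written is that the termination check in S219 is reached only on slots where the target network is synchronized.

```python
def training_schedule(max_slots, buffer_threshold, sync_period):
    """Walk steps S212 to S219 with counters only.

    Returns (gradient_updates, target_syncs). No network is trained here;
    the function only mirrors the order of the checks described in the text.
    """
    stored = updates = syncs = 0
    t = 1                                  # S212: initial value is an assumption
    while True:
        stored += 1                        # S213-S216: act, observe, store one sample
        if stored < buffer_threshold:      # S217: buffer not yet at threshold
            t += 1
            continue
        updates += 1                       # S217: sample a batch, update current Q-network
        if t % sync_period != 0:           # S218: not a target-network update slot
            t += 1
            continue
        syncs += 1                         # S218: copy current weights to the target network
        if t >= max_slots:                 # S219: preset slot threshold reached
            return updates, syncs
        t += 1


print(training_schedule(max_slots=20, buffer_threshold=5, sync_period=10))  # (16, 2)
```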
[0051] Specifically, the NWDAF initializes the current Q-network and the target Q-network with the same structure. In this embodiment, the neural network structure of the Q-network adopts a fully connected neural network with two hidden layers, each with 64 neurons to ensure a moderate dimension of feature extraction, and the activation function of each hidden layer adopts the ReLU function to avoid gradient vanishing.
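The network shape described above (two hidden layers of 64 ReLU units) can be illustrated with a dependency-free forward pass. This is a sketch for inspecting shapes, not the embodiment's PyTorch model: the input dimension of 5, the scaled Gaussian initialization and the fixed seed are assumptions, while the four outputs correspond to the four adjustment actions.

```python
import random


def init_layer(n_in, n_out, rng):
    """Randomly initialized weights (scaled Gaussian, an assumption) and zero biases."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu):
    """One fully connected layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def build_q_network(state_dim=5, hidden=64, n_actions=4, seed=0):
    """Two hidden layers of 64 units plus a linear output layer, one output per action."""
    rng = random.Random(seed)
    return [
        init_layer(state_dim, hidden, rng),
        init_layer(hidden, hidden, rng),
        init_layer(hidden, n_actions, rng),
    ]


def q_values(network, state):
    """Forward pass: ReLU on both hidden layers, no activation on the output."""
    h = dense(state, network[0], relu=True)
    h = dense(h, network[1], relu=True)
    return dense(h, network[2], relu=False)


net = build_q_network()
print(len(q_values(net, [12.0, 5.0, 3.0, 10.0, 2.5])))  # 4
```

Building the current and target Q-networks "with the same structure" would amount to calling `build_q_network` twice, possibly with different seeds.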
[0052] For agent interaction and action exploration, the current Q-network outputs the predicted Q-value of each action for the state vector of the current time slot. For action selection, this embodiment employs an ε-greedy strategy with dynamic decay: the initial exploration rate is set to 0.99 and is multiplied by a decay constant of 0.9998 at each training iteration, balancing extensive exploration in the early stages of training against exploitation of learned experience in the later stages. The base station executes the selected adjustment action; the environment then feeds back the state of the next time slot, and the instant reward value is calculated according to the reward function.
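The decaying exploration schedule can be sketched as follows, using the stated initial rate of 0.99 and decay constant of 0.9998. The function names are hypothetical, and no lower bound on the exploration rate is applied because the embodiment does not mention one.

```python
import random

EPS_START = 0.99    # initial exploration rate stated in the embodiment
EPS_DECAY = 0.9998  # multiplicative decay per training iteration


def epsilon_at(iteration):
    """Exploration rate after the given number of training iterations."""
    return EPS_START * EPS_DECAY ** iteration


def epsilon_greedy(q_values, iteration, rng):
    """With probability epsilon pick a random action, otherwise the argmax action."""
    if rng.random() < epsilon_at(iteration):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


rng = random.Random(0)
print(round(epsilon_at(10_000), 3))                        # roughly 0.134
print(epsilon_greedy([0.1, 0.9, 0.3, 0.2], 1_000_000, rng))  # 1 (exploration has effectively vanished)
```

Under this schedule the exploration rate falls to roughly 0.13 after 10,000 iterations, so most actions in later training are greedy.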
[0053] During experience replay and model parameter update, the samples generated by the above interactions are stored in a pre-defined experience replay buffer with a capacity of 10,000. Once the number of samples in the buffer reaches a threshold, a batch of 64 samples is randomly drawn for a gradient update. When calculating the target Q-value, a discount factor of 0.98 is introduced to balance the immediate energy efficiency reward of the access network against its long-term overall return. The first weight parameters of the current Q-network are then updated by gradient descent with a learning rate of 0.001. After a pre-defined number of steps, the parameters are synchronized to the target Q-network, and the process is iterated until the model converges and training is complete.
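The replay buffer and the temporal-difference target can be sketched with the stated capacity (10,000), batch size (64) and discount factor (0.98). The class and function names are hypothetical, and the squared-error loss is an assumption: the embodiment only says that a loss is constructed from the target and predicted Q-values. The stored samples carry no terminal flag, matching the four-element sample described in the training steps.

```python
import random
from collections import deque

BUFFER_CAPACITY = 10_000
BATCH_SIZE = 64
GAMMA = 0.98


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) samples."""

    def __init__(self, capacity=BUFFER_CAPACITY):
        self._items = deque(maxlen=capacity)  # oldest samples are evicted first

    def push(self, state, action, reward, next_state):
        self._items.append((state, action, reward, next_state))

    def __len__(self):
        return len(self._items)

    def sample(self, batch_size=BATCH_SIZE, rng=random):
        """Draw a batch uniformly at random without replacement."""
        return rng.sample(list(self._items), batch_size)


def td_target(reward, next_q_from_target_net, gamma=GAMMA):
    """Reward plus the discounted best Q-value the target network assigns to the next state."""
    return reward + gamma * max(next_q_from_target_net)


def squared_td_error(predicted_q, target_q):
    """Per-sample loss term; the squared-error form is an assumption."""
    return (target_q - predicted_q) ** 2


buf = ReplayBuffer()
for i in range(10_050):
    buf.push([float(i)], i % 4, 1.0, [float(i + 1)])
print(len(buf), len(buf.sample()))           # 10000 64
print(td_target(1.0, [0.5, 2.0, 1.0, 0.0]))  # approximately 2.96
```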
[0054] In this embodiment, the network data analysis element generates analysis information containing the RRC inactive timer adjustment action based on the optimal adjustment action for the RRC inactive timer, and further includes: The service-oriented interface of the network data analysis network element transmits analysis information, including RRC inactive timer adjustment actions, to the access and mobility management network element. After receiving the analysis information, the access and mobility management network element encapsulates the analysis information containing the RRC inactive timer adjustment action into RRC auxiliary information and sends it to the radio access network.
[0055] Specifically, after receiving the analysis information, the AMF parses its decision payload to extract the specific adjustment action or target threshold for the RRC inactive timer. The AMF then performs a protocol conversion, encapsulating the analysis information into RRC auxiliary information that conforms to the specifications of the underlying wireless network protocol stack. Specifically, the AMF maps the updated timer adjustment parameters to signaling information elements in Next Generation Application Protocol (NGAP) messages and distributes the RRC auxiliary information to the target RAN via the N2 interface (i.e., the standard control plane interface between the AMF and the RAN) using specific control messages (such as UE CONTEXT MODIFICATION REQUEST or dedicated network auxiliary data delivery messages). Through these steps, this solution converts the macro-level control policies generated by the AI algorithm into standard signaling recognizable by the underlying 5G network, providing accurate execution instructions for the final RRC connection state management.
[0056] In this embodiment, managing the RRC connection state based on the adjusted RRC inactive timer timing threshold includes: Based on the updated RRC inactive timer threshold, the data packet transmission of the target terminal is monitored. If no data packet transmission of the target terminal is detected within the time limit specified by the updated threshold, the radio access network actively triggers the RRC Release procedure and sends a release signaling carrying the suspension configuration to the target terminal, instructing the target terminal to migrate from the RRC connected state to the RRC inactive state or the RRC idle state. Otherwise, reset the RRC inactive timer and keep the target terminal in RRC connected state.
[0057] Specifically, the release signaling carrying the suspend configuration explicitly instructs the target terminal to migrate down from the high-power RRC connected state. Carrying the suspend configuration means that the RAN and core network will retain the terminal's UE context, instructing the terminal to enter the RRC inactive state so that it can be quickly woken up with very low signaling overhead in the event of subsequent data bursts. If the suspend configuration is not carried or due to network policy restrictions, the terminal is instructed to completely fall back to the RRC idle state. This significantly reduces the static maintenance power consumed by the RAN to maintain the physical connection of the terminal, significantly improving the overall energy efficiency of the system.
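The release decision described in the two paragraphs above can be sketched as a small state function. The enum, the function name and the slot-based inactivity counter are illustrative assumptions; in particular, the counter is assumed to be reset to zero elsewhere whenever a packet for the terminal is observed.

```python
from enum import Enum


class RrcState(Enum):
    CONNECTED = "connected"
    INACTIVE = "inactive"
    IDLE = "idle"


def next_rrc_state(slots_since_last_packet, threshold_slots, suspend_config_allowed):
    """Decide the terminal's RRC state from its inactivity duration.

    Below the threshold the terminal stays connected. Once inactivity reaches
    the threshold, the RAN releases the connection: with a suspend configuration
    the UE context is retained (inactive state), otherwise the terminal falls
    back to the idle state.
    """
    if slots_since_last_packet < threshold_slots:
        return RrcState.CONNECTED
    return RrcState.INACTIVE if suspend_config_allowed else RrcState.IDLE


print(next_rrc_state(3, 10, True))    # RrcState.CONNECTED
print(next_rrc_state(10, 10, True))   # RrcState.INACTIVE
print(next_rrc_state(12, 10, False))  # RrcState.IDLE
```

Lowering `threshold_slots` by one or two slots, as the adjustment actions do, moves terminals out of the connected state sooner at the cost of more frequent resume signaling.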
[0058] Example 3 This embodiment proposes an overall communication architecture between the network data analysis network element and the 5G core network based on the OAI open-source 5G platform, as shown in Figure 5. Under this architecture, the AI algorithm of this solution achieves deep software-level integration with the underlying 5G protocol stack.
[0059] Specifically, in this embodiment, lightweight containerization technology (such as Docker) is used to deploy network elements in a microservice manner within the OAI core network's operating environment. Specifically, a container named oai-nwdaf is independently configured and instantiated as a logical functional unit within a general-purpose x86 architecture server cluster.
[0060] The oai-nwdaf container is configured as the host of the core decision engine of this invention and internally runs the DQN intelligent decision-making algorithm trained in the aforementioned embodiments. At the network communication level, this container establishes real-time connections, based on protocols such as HTTP/2, with the oai-amf container responsible for access and mobility management and the oai-smf container responsible for session management in the existing network, through the standardized service-based interface (SBI) bus of the OAI platform. Through this SBI bus, oai-nwdaf can subscribe to and pull global state data such as terminal session status and network energy efficiency, and can publish and transmit threshold adjustment control commands for the RRC inactive timer.
[0061] In terms of system-level operation and security, the entire core network relies on the oai-nrf container for the registration and mutual discovery of the service-based microservice network elements. The mysql, oai-udr, oai-udm and oai-ausf containers work together to provide persistent storage of the underlying physical data, unified and centralized management of user subscription policies, and the 5G AKA security authentication procedure for legitimate terminal access. In the intelligent management and control closed loop, the oai-nwdaf container pulls running status data (such as the number of suspended terminal connections and the congestion status of the data plane cache queue) through the service bus of the OAI platform from the oai-smf container, which is responsible for PDU session lifecycle management, and from the oai-upf container, which is responsible for routing and forwarding the actual service data. After AI model inference, the optimal adjustment action for the RRC inactive timer threshold is passed to the oai-amf container, the core anchor point responsible for control plane signaling interaction, where it is converted into RRC auxiliary information and sent to the underlying base station as adjustment signaling. In addition, the oai-ext-dn container, which acts as a virtual Internet node, is used for end-to-end data traffic transmission, reception and testing. An AI-driven, software-defined closed loop for intelligent connection management is thus realized on a complete open-source protocol stack.
[0062] In its specific implementation, this embodiment adds a dynamic listening and configuration update interface to the RRC scheduling processing source code logic of the OAI base station. This interface enables the base station to receive control signaling encapsulated with the latest thresholds via the AMF and N2 interfaces in real time while the system is continuously running, and directly overwrites the parsed values into the currently running bwp_InactivityTimer variable in memory. Through the above-mentioned source code-level parameter mapping and dynamic reconfiguration mechanism, this invention completely establishes a closed-loop control system from "high-level AI intelligent analysis" to "low-level wireless connection disconnection," achieving precise software-defined control of the terminal's RRC connection release behavior at the physical protocol stack level.
[0063] Example 4 In this embodiment, in order to further verify the effectiveness and beneficial technical effects of the DQN-based RRC dynamic connection management method proposed in this invention in complex network environments, this embodiment constructs a detailed discrete event simulation environment for comparative testing and effect evaluation.
[0064] Specifically, the underlying logic of this embodiment is developed in Python 3.11.4, and the SimPy discrete event simulation library is used to reproduce the timing flow and signaling interaction of the 5G RRC connection states. In the AI decision-making layer, the DQN agent is built and deployed using the PyTorch 2.0.1 deep learning framework.
[0065] In terms of traffic model simulation, a Poisson process is used to simulate random access and burst data requests from a large number of Internet of Things (IoT) devices. The inter-arrival time of the underlying data packets is set to follow an exponential distribution with a given rate parameter, so as to realistically reproduce the service characteristics of typical 5G massive machine-type communication (mMTC) scenarios.
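A Poisson arrival process can be generated by summing exponentially distributed inter-arrival times. The sketch below uses only the Python standard library rather than SimPy, and the rate, horizon and seed are illustrative values, not parameters from this embodiment.

```python
import random


def arrival_times(rate_per_s, horizon_s, seed=0):
    """Packet arrival times of a Poisson process on [0, horizon_s].

    Inter-arrival gaps are drawn from an exponential distribution with the
    given rate, so the mean gap is 1 / rate_per_s seconds.
    """
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t > horizon_s:
            return times
        times.append(t)


times = arrival_times(rate_per_s=5.0, horizon_s=1000.0)
print(len(times))  # close to rate * horizon = 5000 on average
```

In a SimPy model, each gap would typically be passed to a timeout inside a generator process; that wiring is omitted here to keep the sketch dependency-free.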
[0066] In terms of channel model simulation, this embodiment follows the 5G NR standard to simulate a realistic wireless channel environment that includes path loss, shadow fading, and multipath effects, so that the underlying transmission power consumption is consistent with the behavior of real base stations.
[0067] To ensure the reproducibility and convergence stability of the technical solution of this invention in industrial applications, the core network structure and training hyperparameters of the DQN agent in the simulation environment are configured as follows: the neural network structure of the Q network adopts a fully connected neural network with two hidden layers, each hidden layer is configured with 64 neurons to ensure that the dimension of feature extraction is appropriate, and the activation function of each hidden layer adopts the ReLU function to avoid gradient vanishing.
[0068] For agent interaction and action exploration, the current Q-network outputs the predicted Q-value of each action for the state vector of the current time slot. For action selection, this embodiment employs an ε-greedy strategy with dynamic decay: the initial exploration rate is set to 0.99 and is multiplied by a decay constant of 0.9998 at each training iteration, balancing extensive exploration in the early stages of training against exploitation of learned experience in the later stages. The base station executes the selected adjustment action; the environment then feeds back the state of the next time slot, and the instant reward value is calculated according to the reward function.
[0069] During experience replay and model parameter update, the samples generated by the above interactions are stored in a pre-defined experience replay buffer with a capacity of 10,000. Once the number of samples in the buffer reaches a threshold, a batch of 64 samples is randomly drawn for a gradient update. When calculating the target Q-value, a discount factor of 0.98 is introduced to balance the immediate energy efficiency reward of the access network against its long-term overall return. The first weight parameters of the current Q-network are then updated by gradient descent with a learning rate of 0.001. After a pre-defined number of steps, the parameters are synchronized to the target Q-network, and the process is iterated until the model converges and training is complete.
[0070] Under the aforementioned environment and parameter configurations, the "5G RRC connection management method driven by the network data analysis function" proposed in this invention was compared side by side with the existing "static inactive timer configuration method" (e.g., a timer fixed at 10 seconds). Experimental and training data confirm the beneficial effects of this invention. The DQN agent dynamically senses changes in the current network packet arrival rate (i.e., the service load) and adaptively searches for the timer window that maximizes the energy efficiency of the current access network. This dynamic adjustment mechanism cuts off the ineffective "inactive timer tail" phase during which the terminal has no service transmission. Compared with the traditional static configuration scheme, this invention improves the overall average energy efficiency of the access network by 34% to 53%. The loss function and reward curves recorded during training show that the DQN agent reaches a stable converged state after a certain number of episodes of interaction with the environment, indicating that the AI algorithm of this invention is robust and adaptable in dynamic and complex communication network environments.
[0071] Example 5 In this embodiment, a computer device is proposed, comprising a memory 101, a processor 102, and a computer program stored in the memory 101 and executable by the processor 102. The processor 102 executes the computer program to implement the management method for the radio resource control layer in a 5G mobile communication network. A schematic diagram of the device is shown in Figure 6. In this embodiment, a computer storage medium is also proposed, on which a computer program is stored. The computer program includes program instructions which, when executed by a computer, cause the computer to perform the management method for the radio resource control layer in a 5G mobile communication network.
[0072] Obviously, the above embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the implementation of the present invention. Those skilled in the art can make other variations or modifications based on the above description. It is neither necessary nor possible to exhaustively describe all embodiments here. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the scope of protection of the claims of the present invention.
Claims
1. A management method for the radio resource control layer in a 5G mobile communication network, wherein the 5G mobile communication network includes a core network, a radio access network, and transmission equipment, the core network being connected to the radio access network through the transmission equipment, the core network including a network data analysis network element, an access and mobility management network element, and a session management network element, and the radio access network being connected to and managing terminal equipment through the radio resource control layer, characterized in that the management method of the radio resource control layer includes the following steps: S1. Use the network data analysis network element to collect terminal session state data and network energy efficiency data and perform preprocessing; S2. Based on the preprocessed terminal session state data and network energy efficiency data, select the optimal adjustment action for the timing threshold of the radio resource control layer inactive timer using the deep Q-network model that has been pre-trained and configured in the network data analysis network element; S3. Based on the optimal adjustment action for the radio resource control layer inactive timer, generate analysis information containing the radio resource control layer inactive timer adjustment action, and send the analysis information to the access and mobility management network element, which uses the analysis information to generate radio resource control layer auxiliary information and distributes it to the radio access network; S4. Use the radio access network to receive the radio resource control layer auxiliary information, adjust the timing threshold of the radio resource control layer inactive timer based on the received radio resource control layer auxiliary information, and manage the radio resource control layer connection state based on the adjusted timing threshold of the radio resource control layer inactive timer.
2. The management method of the radio resource control layer in a 5G mobile communication network according to claim 1, characterized in that the terminal session state data includes the number of terminals in the suspended state, the number of terminals in the transmission state, and the number of terminals queuing for transmission; and the process of the network data analysis network element collecting the terminal session state data includes: the network data analysis network element sends a subscription request to the session management network element; the session management network element receives the subscription request and reports to the network data analysis network element the number of terminals in the suspended state, the number of terminals in the transmission state, and the number of terminals queuing for transmission.
3. The management method of the radio resource control layer in a 5G mobile communication network according to claim 2, characterized in that the network energy efficiency data includes the timing threshold of the radio resource control layer inactive timer in the current time slot and the energy efficiency of the current time slot; and the process of the network data analysis network element collecting the network energy efficiency data includes: the network data analysis network element sends a subscription request to the access and mobility management network element; the access and mobility management network element receives the subscription request, obtains from the radio access network the timing threshold of the radio resource control layer inactive timer in the current time slot and the energy efficiency of the current time slot, and feeds them back to the network data analysis network element via notification messages.
4. The management method of the radio resource control layer in a 5G mobile communication network according to claim 3, characterized in that the preprocessing of the terminal session state data and the network energy efficiency data includes: mapping the collected terminal session state data and the collected network energy efficiency data into a state vector, with one state vector per time slot, wherein the state vector for a given time slot is formed from the quantities collected for that slot, including the number of terminals queuing for transmission.
5. The management method of the radio resource control layer in a 5G mobile communication network according to claim 4, characterized in that generating the optimal adjustment strategy for the timing threshold of the radio resource control layer inactive timer with the deep Q-network model includes: S21. Load the pre-trained target Q-network parameters from the deep Q-network model; S22. For the current time slot, obtain the state vector of that slot and, based on the trained Q-network parameters, output the Q-values of all adjustment actions for the timing threshold of the radio resource control layer inactive timer, wherein the adjustment actions include: increasing the timing threshold of the radio resource control layer inactive timer by 1 time slot, decreasing it by 1 time slot, keeping it unchanged, and decreasing it by 2 time slots; S23. From the Q-values of all adjustment actions for the timing threshold of the radio resource control layer inactive timer, select the adjustment action with the largest Q-value as the optimal adjustment action for the timing threshold of the radio resource control layer inactive timer.
6. The management method of the radio resource control layer in a 5G mobile communication network according to claim 5, characterized in that the process of training the deep Q-network model includes:
S211. Initialize the deep Q-network model: construct a current Q-network and a target Q-network with the same structure, and randomly initialize the first weight parameters of the current Q-network and the second weight parameters of the target Q-network;
S212. Define and initialize the time slot variable, then execute S213;
S213. Obtain the state vector for the current value of the time slot variable and, based on the first weight parameters of the current Q-network, output the predicted Q values of all adjustment actions for the timing threshold of the radio resource control layer inactive timer;
S214. Based on the predicted Q values of all adjustment actions, select an adjustment action using an ε-greedy strategy that decays dynamically with the number of iterations, and execute it;
S215. After executing the adjustment action, obtain the state vector of the next time slot and calculate the instant reward value according to a preset reward function, wherein the preset reward function is expressed in terms of two preset constant parameters, the current energy efficiency of the radio access network, and the change in the energy efficiency of the radio access network before and after the adjustment action is performed;
S216. Store the state vector of the current time slot, the adjustment action, the state vector of the next time slot, and the instant reward value as one interaction sample in a predefined experience replay buffer;
S217. Determine whether the number of interaction samples in the experience replay buffer has reached a preset threshold; if not, increment the time slot variable by 1 and return to S213; if so, randomly draw a batch of samples from the experience replay buffer, calculate the temporal-difference target Q value using the target Q-network, construct a loss function from the target Q value and the predicted Q value, update the first weight parameters of the current Q-network by gradient descent, and execute S218;
S218. Determine whether the current time slot variable is an integer multiple of the preset target network update cycle; if so, synchronize the first weight parameters of the current Q-network to the second weight parameters of the target Q-network and execute S219; otherwise, increment the time slot variable by 1 and return to S213;
S219. Determine whether the current time slot variable has reached the preset time slot threshold; if so, training is complete; otherwise, increment the time slot variable by 1 and return to S213.
7. The management method of the radio resource control layer in a 5G mobile communication network according to claim 1, characterized in that, after the network data analysis network element generates the analysis information containing the radio resource control layer inactive timer adjustment action based on the optimal adjustment action for the radio resource control layer inactive timer, the method further includes: the network data analysis network element transmits the analysis information containing the radio resource control layer inactive timer adjustment action to the access and mobility management network element via a service-oriented interface; after receiving the analysis information, the access and mobility management network element encapsulates it into radio resource control layer auxiliary information and sends it to the radio access network.
8. The management method of the radio resource control layer in a 5G mobile communication network according to claim 1, characterized in that managing the radio resource control layer connection state based on the adjusted timing threshold of the radio resource control layer inactive timer includes: monitoring the data packet transmission of the target terminal against the adjusted timing threshold of the radio resource control layer inactive timer; if no data packet transmission of the target terminal is detected within the time limit specified by the adjusted timing threshold, the radio access network actively sends release signaling carrying a suspend configuration to the target terminal, instructing the target terminal to transition from the RRC connected state to the RRC inactive state or the RRC idle state; otherwise, the radio resource control layer inactive timer is reset and the target terminal is kept in the RRC connected state.
9. A computer device, characterized in that the computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the method according to any one of claims 1 to 8.
10. A computer storage medium, characterized in that it stores a computer program comprising program instructions that, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 8.
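For illustration only, the action selection of claims 5 and 6 and the timer handling of claim 8 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the exponential decay schedule for the exploration rate, the discount factor, the lower bound on the threshold, and the string labels for terminal states are all hypothetical and do not appear in the claims. The Q values are passed in as plain lists, standing in for the outputs of the current or target Q-network.

```python
import random

# Adjustment actions listed in claim 5, expressed in time slots:
# +1, -1, unchanged, -2.
ACTIONS = (1, -1, 0, -2)


def greedy_action(q_values):
    """S23: return the adjustment action whose Q value is largest."""
    best_index = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    return ACTIONS[best_index]


def epsilon_greedy_action(q_values, step, rng,
                          eps_start=1.0, eps_min=0.05, decay=0.995):
    """S214: explore with a probability that decays with the iteration count.

    The exponential schedule and its constants are assumptions.
    """
    epsilon = max(eps_min, eps_start * decay ** step)
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return greedy_action(q_values)


def td_target(reward, next_q_values, gamma=0.9):
    """S217: temporal-difference target from the target network's Q values.

    The discount factor gamma is an assumed hyperparameter.
    """
    return reward + gamma * max(next_q_values)


def apply_adjustment(threshold, action, min_threshold=1):
    """Apply the chosen action to the timing threshold.

    Clamping at min_threshold is an assumption, not part of the claims.
    """
    return max(min_threshold, threshold + action)


def timer_step(idle_slots, packet_seen, threshold):
    """Claim 8: advance the inactive timer by one time slot.

    Returns the new idle-slot count and a hypothetical state label.
    """
    if packet_seen:
        # Data was transmitted: reset the timer and stay connected.
        return 0, "RRC_CONNECTED"
    idle_slots += 1
    if idle_slots >= threshold:
        # Timer expired: the RAN sends release signaling with suspend config.
        return idle_slots, "RELEASE_WITH_SUSPEND"
    return idle_slots, "RRC_CONNECTED"
```

As a usage example, `apply_adjustment(5, greedy_action([0.1, 0.5, 0.2, 0.4]))` selects the "decrease by 1 time slot" action and returns a threshold of 4 time slots, which `timer_step` would then use to decide when an idle terminal is released.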