A base station energy saving mode switching method, device and network equipment
By combining width learning and reinforcement learning models, the switching of base station energy-saving modes is optimized. This removes the dependence of energy-saving mode switching on human experience and achieves more accurate and efficient energy saving.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: CHINA MOBILE COMM LTD RES INST
- Filing Date: 2021-12-15
- Publication Date: 2026-06-12
AI Technical Summary
Existing methods for switching energy-saving modes in base stations rely on manual experience, making it difficult to accurately respond to traffic fluctuations and resulting in poor energy-saving effects.
A method combining a width learning model and a reinforcement learning model determines the optimal energy-saving mode from the base station's traffic load, and the neural network is optimized through the actor-critic algorithm to achieve intelligent energy-saving mode switching of the base station.
It improves the accuracy and speed of base station energy-saving mode switching, reduces energy consumption and system costs, and provides flexibility to adapt to traffic changes.
Figure CN116266933B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of communication technology, and in particular to a method, apparatus and network equipment for switching energy-saving modes of a base station.
Background Technology
[0002] To meet the demands of 5G communication, small base stations (SBSs) are densely deployed to offload traffic from macro base stations (BSs). Although this high-density deployment can handle peak traffic loads, it causes high energy consumption and wastes wireless resources during periods of low traffic. Energy consumption can be reduced by controlling base stations to operate in an energy-saving mode, for example under a time-based energy-saving strategy. Such strategies rely on expert experience to set the start and end times of energy saving; during these periods, energy-saving cells are shut down and energy-saving tasks are initiated, and the periods differ between weekdays, weekends, and holidays. However, base station traffic fluctuates, while these experience-based settings assume relatively stable traffic and user numbers and therefore cannot account for fluctuations comprehensively and in detail. This introduces errors, prevents accurate and reasonable configuration of base station energy saving, and ultimately fails to achieve the optimal energy-saving effect.
Summary of the Invention
[0003] The purpose of this invention is to provide a base station energy-saving mode switching method, apparatus, and network equipment to solve the problem of poor energy-saving effect of current base station energy-saving mode switching methods.
[0004] To achieve the above objectives, embodiments of the present invention provide a base station energy-saving mode switching method, comprising:
[0005] The first mapping strategy is obtained based on the base station's first traffic load;
[0006] Based on the first traffic load, the first energy-saving mode of the base station is determined through the first mapping strategy;
[0007] If the first mapping strategy is determined to be the optimal mapping strategy based on the first traffic load and the first energy-saving mode, then the base station is switched to the first energy-saving mode.
[0008] Optionally, obtaining the first mapping strategy based on the first traffic load of the base station includes:
[0009] Based on the first traffic load, the first mapping strategy is obtained through a width learning model.
[0010] Optionally, the method further includes:
[0011] If it is determined that the first mapping strategy is not the optimal mapping strategy based on the first traffic load and the first energy-saving mode, then the width learning model is updated.
[0012] Based on the first traffic load, a second mapping strategy is obtained through the updated width learning model;
[0013] Based on the first traffic load, the second energy-saving mode of the base station is determined through the second mapping strategy;
[0014] If the second mapping strategy is determined to be the optimal mapping strategy based on the first traffic load and the second energy-saving mode, then the base station is switched to the second energy-saving mode.
[0015] Optionally, before obtaining the first mapping strategy through a width learning model based on the first traffic load, the method further includes:
[0016] The base station's traffic load is used as input, and feature nodes are generated through linear transformation;
[0017] The feature nodes are subjected to a nonlinear transformation to generate enhancement nodes;
[0018] The width learning model is established based on the feature nodes and the enhancement nodes.
[0019] Optionally, the expression for the width learning model is:
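[0020] $$Y = [Z_1, \ldots, Z_n \mid H_1, \ldots, H_m]\, W^m = [Z^n \mid H^m]\, W^m, \qquad W^m = [Z^n \mid H^m]^{+}\, Y$$
[0021] where $Z_i = X W_{fi} + \beta_{fi}$ is the i-th feature node, i = 1, 2, ..., n; X is the traffic load of the base station; $W_{fi}$ is a weight, and $W_{fi}$ and $\beta_{fi}$ are randomly generated matrices; $H_j = \xi([Z_1, \ldots, Z_n] W_{hj} + \beta_{hj})$ is the j-th enhancement node, j = 1, 2, ..., m; $W_{hj}$ is a weight, and $W_{hj}$ and $\beta_{hj}$ are randomly generated matrices; $Z^n = [Z_1, \ldots, Z_n]$, $H^m = [H_1, \ldots, H_m]$, and $W^m$ is the output weight matrix; and $[Z^n \mid H^m]^{+}$ is the pseudo-inverse, which is used to update the width learning model.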
[0022] Optionally, determining the first energy-saving mode of the base station based on the first traffic load and the first mapping strategy includes:
[0023] Based on the reinforcement learning model, the first traffic load is used to obtain the first energy-saving mode through the first mapping strategy.
[0024] Optionally, before obtaining the first energy-saving mode from the first traffic load through the first mapping strategy based on the reinforcement learning model, the method further includes:
[0025] Obtain the system power consumption of the base station when it switches to the target energy-saving mode under the target traffic load;
[0026] Based on the system power consumption, a state value function of the reinforcement learning model is established; wherein, the state value function is used to represent the expected value when the base station switches to the target energy-saving mode.
[0027] Optionally, if the first mapping strategy is determined to be the optimal mapping strategy based on the first traffic load and the first energy-saving mode, then switching the base station to the first energy-saving mode includes:
[0028] Obtain the first system power consumption of the base station when it switches to the first energy-saving mode under the first traffic load;
[0029] Based on the first system power consumption, the expected value when the base station switches to the first energy-saving mode is obtained through the state value function;
[0030] If the expected value meets the preset conditions, the first mapping strategy is determined to be the optimal mapping strategy, and the base station is switched to the first energy-saving mode.
[0031] Optionally, the system power consumption of the base station when switching to the target energy-saving mode under the target traffic load is obtained, including:
[0032] The first power consumption and the second power consumption of the base station when it switches to the target energy-saving mode under the target traffic load are obtained; wherein the first power consumption is determined by the static power consumption, which is independent of the base station's traffic load, and the dynamic power consumption, which changes in proportion to that traffic load; and the second power consumption is the other power consumption related to the power consumption and performance of the base station.
[0033] The system power consumption is determined based on the first power consumption and the second power consumption.
[0034] Optionally, obtaining the first power consumption and the second power consumption of the base station when switching to the target energy-saving mode under the target traffic load includes:
[0035] The first power consumption is determined based on the first parameters of the base station in the on state; wherein the first parameters include the proportion of the base station's static power consumption, the traffic load density, and the base station's total power consumption.
[0036] The second power consumption is determined based on the second parameters of the base station in the on and off states; wherein the second parameters include the degree of excellence of KPI indicators related to system power consumption and performance and the total power consumption of the base station.
[0037] Optionally, the expression for system power consumption is:
[0038]
[0039] where C is the system power consumption; B' is the set of base stations in the active state; $q_i$ is the proportion of static power consumption of base station i in that set; $\rho_i$ is the traffic load density; $P_i$ is the total power consumption of base station i; γ is the degree of excellence of the KPI indicators related to base station power consumption and performance; and B is the set of base stations in all states.
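Because the expression in [0038] is not reproduced here, the following is only a minimal sketch of the cost described in [0032] to [0039], assuming NumPy. The first power consumption follows the text directly: each active station i contributes a static part $q_i P_i$ and a load-proportional part $(1 - q_i)\rho_i P_i$. The form of the second power consumption is an assumption made purely for illustration (γ times the total power of all stations in B), and the function names are hypothetical.

```python
import numpy as np


def first_power(active, q, rho, P):
    """Static plus load-proportional power, summed over active base stations.

    For active station i: q_i * P_i is static, (1 - q_i) * rho_i * P_i is dynamic.
    """
    active = np.asarray(active, dtype=bool)
    q, rho, P = (np.asarray(a, dtype=float) for a in (q, rho, P))
    per_station = q * P + (1.0 - q) * rho * P
    return float(per_station[active].sum())


def system_power(active, q, rho, P, gamma):
    """System power consumption C = first power + second power.

    ASSUMPTION: the second power consumption is modeled as gamma times the
    total power of all stations (on and off); the exact form is not shown here.
    """
    second = gamma * float(np.sum(P))
    return first_power(active, q, rho, P) + second
```

For example, with two 100 W stations where only the first is on (q = 0.5, ρ = 0.4), the first power consumption is 50 W static plus 20 W dynamic.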
[0040] Optionally, the expression for the state value function is:
[0041] $$V^{\pi}(s) = E_{\pi}\left[\sum_{t=0}^{\infty} \theta^{t} C_{t} \;\middle|\; s_{0} = s\right]$$
[0042] Where Eπ represents the expected value when the energy-saving mode π is adopted, θ is the discount factor, C is the system power consumption, and s is the traffic load of the base station.
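As an illustration of what the state value function estimates, the sketch below averages the discounted sum of system power costs over sampled trajectories that start from the same traffic load s (a Monte Carlo estimate). The function names and the episode format (a list of per-step costs) are hypothetical; the method itself learns this value with the critic rather than by sampling.

```python
def discounted_cost(costs, theta):
    """Discounted sum C_0 + theta*C_1 + theta**2*C_2 + ... for one trajectory."""
    total = 0.0
    for c in reversed(costs):  # backward accumulation: G_t = C_t + theta * G_{t+1}
        total = c + theta * total
    return total


def mc_state_value(episodes, theta):
    """Monte Carlo estimate of the state value: mean discounted cost of episodes from s."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(discounted_cost(ep, theta) for ep in episodes) / len(episodes)
```

For instance, a constant cost of 10 over three steps with θ = 0.5 gives 10 + 5 + 2.5 = 17.5.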
[0043] To achieve the above objectives, embodiments of the present invention provide a base station energy-saving mode switching device, comprising:
[0044] The first processing module is used to obtain the first mapping strategy based on the first traffic load of the base station;
[0045] The second processing module is used to determine the first energy-saving mode of the base station based on the first traffic load and through the first mapping strategy.
[0046] The first switching module is configured to switch the base station to the first energy-saving mode if the first mapping strategy is determined to be the optimal mapping strategy based on the first traffic load and the first energy-saving mode.
[0047] Optionally, the first processing module includes:
[0048] The first processing unit is configured to obtain the first mapping strategy based on the first traffic load using a width learning model.
[0049] Optionally, the device further includes:
[0050] An update module is used to update the width learning model if it is determined that the first mapping strategy is not the optimal mapping strategy based on the first traffic load and the first energy-saving mode.
[0051] The third processing module is used to obtain the second mapping strategy based on the first traffic load through the updated width learning model;
[0052] The fourth processing module is used to determine the second energy-saving mode of the base station based on the first traffic load and through the second mapping strategy.
[0053] The second switching module is used to switch the base station to the second energy-saving mode if the second mapping strategy is determined to be the optimal mapping strategy based on the first traffic load and the second energy-saving mode.
[0054] Optionally, the device further includes:
[0055] The first generation module is configured to take the traffic load of the base station as input and generate feature nodes through a linear transformation;
[0056] The second generation module is used to perform nonlinear transformations on the feature nodes to generate enhanced nodes;
[0057] The first establishment module is configured to establish the width learning model based on the feature nodes and the enhancement nodes.
[0058] Optionally, the expression for the width learning model is:
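[0059] $$Y = [Z_1, \ldots, Z_n \mid H_1, \ldots, H_m]\, W^m = [Z^n \mid H^m]\, W^m, \qquad W^m = [Z^n \mid H^m]^{+}\, Y$$
[0060] where $Z_i = X W_{fi} + \beta_{fi}$ is the i-th feature node, i = 1, 2, ..., n; X is the traffic load of the base station; $W_{fi}$ is a weight, and $W_{fi}$ and $\beta_{fi}$ are randomly generated matrices; $H_j = \xi([Z_1, \ldots, Z_n] W_{hj} + \beta_{hj})$ is the j-th enhancement node, j = 1, 2, ..., m; $W_{hj}$ is a weight, and $W_{hj}$ and $\beta_{hj}$ are randomly generated matrices; $Z^n = [Z_1, \ldots, Z_n]$, $H^m = [H_1, \ldots, H_m]$, and $W^m$ is the output weight matrix; and $[Z^n \mid H^m]^{+}$ is the pseudo-inverse, which is used to update the width learning model.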
[0061] Optionally, the second processing module includes:
[0062] The second processing unit is used to obtain the first energy-saving mode from the first traffic load through the first mapping strategy based on the reinforcement learning model.
[0063] Optionally, the device further includes:
[0064] The acquisition module is used to acquire the system power consumption of the base station when it switches to the target energy-saving mode under the target traffic load;
[0065] The second establishment module is used to establish the state value function of the reinforcement learning model based on the system power consumption; wherein the state value function is used to represent the expected value when the base station switches to the target energy-saving mode.
[0066] Optionally, the first switching module includes:
[0067] The first acquisition unit is used to acquire the first system power consumption when the base station switches to the first energy-saving mode under the first traffic load.
[0068] The third processing unit is configured to obtain, through the state value function and based on the first system power consumption, the expected value when the base station switches to the first energy-saving mode;
[0069] The switching unit is configured to determine the first mapping strategy as the optimal mapping strategy and switch the base station to the first energy-saving mode if the expected value meets the preset conditions.
[0070] Optionally, the acquisition module includes:
[0071] The second acquisition unit is configured to acquire the first power consumption and the second power consumption when the base station switches to the target energy-saving mode under the target traffic load; wherein the first power consumption is determined by the static power consumption, which is independent of the base station's traffic load, and the dynamic power consumption, which changes in proportion to that traffic load; and the second power consumption is the other power consumption related to the power consumption and performance of the base station.
[0072] A determining unit is configured to determine the system power consumption based on the first power consumption and the second power consumption.
[0073] Optionally, the second acquisition unit is further configured to:
[0074] The first power consumption is determined based on the first parameters of the base station in the on state; wherein the first parameters include the proportion of the base station's static power consumption, the traffic load density, and the base station's total power consumption.
[0075] The second power consumption is determined based on the second parameters of the base station in the on and off states; wherein the second parameters include the degree of excellence of KPI indicators related to system power consumption and performance and the total power consumption of the base station.
[0076] Optionally, the expression for the system power consumption is:
[0077]
[0078] where C is the system power consumption; B' is the set of base stations in the active state; $q_i$ is the proportion of static power consumption of base station i in that set; $\rho_i$ is the traffic load density; $P_i$ is the total power consumption of base station i; γ is the degree of excellence of the KPI indicators related to base station power consumption and performance; and B is the set of base stations in all states.
[0079] Optionally, the expression for the state value function is:
[0080] $$V^{\pi}(s) = E_{\pi}\left[\sum_{t=0}^{\infty} \theta^{t} C_{t} \;\middle|\; s_{0} = s\right]$$
[0081] Where Eπ represents the expected value when the energy-saving mode π is adopted, θ is the discount factor, C is the system power consumption, and s is the traffic load of the base station.
[0082] To achieve the above objectives, embodiments of the present invention provide a network device, including: a transceiver, a processor, a memory, and a program or instructions stored in the memory and executable on the processor; when the processor executes the program or instructions, it implements the steps in the base station power-saving mode switching method described above.
[0083] To achieve the above objectives, embodiments of the present invention provide a readable storage medium storing a program or instructions thereon, which, when executed by a processor, implement the steps in the base station power-saving mode switching method described above.
[0084] The beneficial effects of the above-described technical solution of the present invention are as follows:
[0085] In this embodiment of the invention, a first mapping strategy is obtained based on a first traffic load of the base station; a first energy-saving mode of the base station is determined based on the first traffic load and through the first mapping strategy; and if the first mapping strategy is determined to be the optimal mapping strategy based on the first traffic load and the first energy-saving mode, the base station is switched to the first energy-saving mode. This gives the base station's energy-saving mode switching a more reasonable and accurate configuration, which reduces the base station's energy consumption and system costs and achieves a good energy-saving effect.
Attached Figure Description
[0086] Figure 1 is a flowchart of a base station energy-saving mode switching method according to an embodiment of the present invention;
[0087] Figure 2 is a schematic diagram of base station distribution in a 5G network according to an embodiment of the present invention;
[0088] Figure 3 shows the algorithm architecture for base station energy-saving mode switching according to an embodiment of the present invention;
[0089] Figure 4 is a block diagram of a base station energy-saving mode switching device according to an embodiment of the present invention;
[0090] Figure 5 is a block diagram of a network device according to an embodiment of the present invention.
Detailed Implementation
[0091] To make the technical problems, technical solutions and advantages of the present invention clearer, a detailed description will be given below in conjunction with the accompanying drawings and specific embodiments.
[0092] It should be understood that the phrase "one embodiment" or "an embodiment" throughout the specification means that a specific feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the invention. Therefore, "in one embodiment" or "in an embodiment" appearing throughout the specification do not necessarily refer to the same embodiment. Furthermore, these specific features, structures, or characteristics can be combined in any suitable manner in one or more embodiments.
[0093] In various embodiments of the present invention, it should be understood that the sequence number of each process described below does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
[0094] In addition, the terms "system" and "network" are often used interchangeably in this article.
[0095] In the embodiments provided in this application, it should be understood that "B corresponding to A" means that B is associated with A, and B can be determined based on A. However, it should also be understood that determining B based on A does not mean determining B solely based on A; B can also be determined based on A and / or other information.
[0096] As shown in Figure 1, an embodiment of the present invention provides a base station energy-saving mode switching method, which may specifically include the following steps:
[0097] Step 11: Obtain the first mapping strategy based on the first traffic load of the base station.
[0098] Optionally, the first traffic load here can be understood as the traffic load of the base station at a certain stage / state / time point.
[0099] Optionally, the mapping strategy can be understood as a strategy to map the traffic load of the base station to the corresponding energy-saving mode. For example, the first mapping strategy is a strategy to map the first traffic load of the base station to the first energy-saving mode.
[0100] Optionally, the first mapping strategy can be obtained from the first traffic load through the policy network of the Actor-Critic (AC) algorithm, implemented for example as a width learning model. Specifically, step 11 may include: obtaining the first mapping strategy based on the first traffic load using a width learning model, for example by feeding the first traffic load into the width learning model as input and taking the first mapping strategy as its output.
[0101] Step 12: Based on the first traffic load, determine the first energy-saving mode of the base station through the first mapping strategy.
[0102] Optionally, the energy-saving modes of the base station may include, but are not limited to: sleep mode (e.g., the base station is switched off), active mode (e.g., the base station is powered on), and omnidirectional mode (also called low-power mode, e.g., the base station is powered on but operates at low power). The base station system may include multiple base stations whose energy-saving modes may be the same or different, and the first energy-saving mode of a base station may be any one of sleep mode, active mode, and omnidirectional mode.
[0103] Optionally, based on the reinforcement learning model, the base station's traffic load can be taken as input and mapped to an energy-saving mode through the mapping strategy obtained in step 11. Specifically, step 12 may include: based on the reinforcement learning model, mapping the first traffic load through the first mapping strategy to obtain the first energy-saving mode; for example, the first traffic load is used as input to the reinforcement learning model and is mapped through the first mapping strategy obtained in step 11 to obtain the first energy-saving mode.
[0104] Step 13: If the first mapping strategy is determined to be the optimal mapping strategy based on the first traffic load and the first energy-saving mode, then the base station is switched to the first energy-saving mode.
[0105] Optionally, whether a mapping strategy is optimal can be determined by the critic network of the actor-critic algorithm, which can be based on the state value function of the reinforcement learning model. Specifically, the actor-critic algorithm includes two neural networks: a policy network and a critic network. The policy network, called the "actor", generates the strategy: it takes the state (i.e., the base station's traffic load) as input and outputs the action (i.e., the mapping strategy). The state value function, called the "critic", evaluates the quality of the action taken (i.e., the quality of the mapping strategy generated by the policy network) and thereby determines the final action.
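The division of labor between actor and critic can be sketched with a one-step actor-critic. This is a simplified stand-in, assuming NumPy: lookup tables over discretized traffic-load states replace the width learning networks, and the class name, step sizes, and state discretization are hypothetical. Because the method minimizes system power consumption, the TD error is computed on cost, and a positive TD error (costlier than the critic expected) lowers the actor's preference for the chosen mode.

```python
import numpy as np

MODES = ("sleep", "omnidirectional", "active")


class TabularActorCritic:
    """One-step actor-critic that minimizes discounted system power cost.

    Actor: softmax preferences over energy-saving modes per discretized
    traffic-load state. Critic: table V(s) of expected discounted cost.
    """

    def __init__(self, n_states, n_modes=len(MODES), theta=0.9,
                 alpha_actor=0.1, alpha_critic=0.1, seed=0):
        self.prefs = np.zeros((n_states, n_modes))  # actor parameters
        self.V = np.zeros(n_states)                  # critic: expected cost per state
        self.theta = theta                           # discount factor
        self.alpha_actor = alpha_actor
        self.alpha_critic = alpha_critic
        self.rng = np.random.default_rng(seed)

    def policy(self, s):
        """Softmax probabilities over modes in state s (max-shifted for stability)."""
        z = self.prefs[s] - self.prefs[s].max()
        p = np.exp(z)
        return p / p.sum()

    def act(self, s):
        """Sample a mode index from the current policy."""
        return int(self.rng.choice(self.prefs.shape[1], p=self.policy(s)))

    def update(self, s, a, cost, s_next):
        """Critic computes the TD error on cost; actor moves away from costly modes."""
        td_error = cost + self.theta * self.V[s_next] - self.V[s]
        self.V[s] += self.alpha_critic * td_error
        grad_log_pi = -self.policy(s)  # gradient of log pi(a|s): onehot(a) - pi(s)
        grad_log_pi[a] += 1.0
        # Positive TD error (costlier than expected) lowers the preference for a.
        self.prefs[s] -= self.alpha_actor * td_error * grad_log_pi
        return td_error
```

In use, a controller would call `act(s)` for the current load level, apply the mode, observe the system power consumption and the next load level, and then call `update`.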
[0106] In the above scheme, a first mapping strategy is obtained based on the first traffic load of the base station; a first energy-saving mode of the base station is determined based on the first traffic load and through the first mapping strategy; and if the first mapping strategy is determined to be the optimal mapping strategy based on the first traffic load and the first energy-saving mode, the base station is switched to the first energy-saving mode. This gives the base station's energy-saving mode switching a more reasonable and accurate configuration, which reduces the base station's energy consumption and system costs and achieves a good energy-saving effect.
[0107] Specifically, users are evenly distributed within the area and are served by the base stations, as shown in Figure 2. With the above base station energy-saving mode switching method, the required energy-saving mode of each base station can be determined. For example, during low-traffic periods (e.g., when traffic is below a first threshold), the system is set to sleep mode, i.e., some base stations that are not fully utilizing their resources are shut down; during medium-traffic periods (e.g., when traffic is above the first threshold but below a second threshold), it switches to omnidirectional mode (i.e., the base station is on and in low-power mode) to minimize energy consumption; and during high-traffic periods (e.g., when traffic is above the second threshold), it is set to active mode (i.e., the base station is on). This significantly reduces energy consumption without affecting service quality and achieves an effective energy-saving solution. The first threshold is less than the second threshold.
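The threshold behavior described in [0107] can be written as a small rule. It only illustrates the target mapping: in the method, the mode comes from the learned mapping strategy rather than from fixed thresholds. The function name is hypothetical, and because the text does not say what happens when traffic equals a threshold, ties are assigned to the higher mode here as an assumption.

```python
def mode_for_traffic(traffic, first_threshold, second_threshold):
    """Map a traffic level to an energy-saving mode using two thresholds.

    ASSUMPTION: traffic exactly equal to a threshold goes to the higher mode.
    """
    if not first_threshold < second_threshold:
        raise ValueError("first_threshold must be less than second_threshold")
    if traffic < first_threshold:
        return "sleep"            # low traffic: shut down under-utilized stations
    if traffic < second_threshold:
        return "omnidirectional"  # medium traffic: station on, low-power mode
    return "active"               # high traffic: station fully on
```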
[0108] Furthermore, this scheme optimizes the traditional neural network method based on the width learning algorithm and combines it with the actor-critic reinforcement learning method to solve the problem. While optimizing the base station energy-saving mode switching problem, it improves the model training speed and accuracy. It is also applicable to situations where the training data is incorrect or missing, and can solve the problems of slow model training speed and inaccurate training data in the current base station energy-saving mode switching methods.
[0109] Specifically, Figure 3 shows an algorithm architecture for base station energy-saving mode switching. It mainly consists of constructing a reinforcement learning process, solving the resulting Markov decision process (MDP) with the actor-critic algorithm, and replacing the policy network of the actor-critic algorithm with a width learning system to optimize the neural network. In the policy network of the actor-critic algorithm, a mapping policy is obtained from the base station's traffic load using the width learning algorithm. The reinforcement learning algorithm then uses this mapping policy to map the base station's traffic load to the corresponding energy-saving mode. The critic network of the actor-critic algorithm evaluates whether the mapping policy is optimal based on the base station's traffic load and the energy-saving mode determined by the reinforcement learning algorithm. If it is the optimal mapping policy, the base station is switched to the corresponding energy-saving mode; if not, the neural network is optimized so that the energy-saving mode is configured more reasonably and accurately.
[0110] Optionally, in the base station power-saving mode switching method of this embodiment of the invention, the step of optimizing the neural network may include:
[0111] After determining the first energy-saving mode of the base station based on the first traffic load and the first mapping strategy, if it is determined that the first mapping strategy is not the optimal mapping strategy based on the first traffic load and the first energy-saving mode, then the width learning model is updated; a second mapping strategy is obtained based on the first traffic load using the updated width learning model; a second energy-saving mode of the base station is determined based on the first traffic load and the second mapping strategy; if it is determined that the second mapping strategy is the optimal mapping strategy based on the first traffic load and the second energy-saving mode, then the base station is switched to the second energy-saving mode.
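The retry procedure of [0111] can be sketched as a loop over the three steps. The callables stand in for the width learning model, the reinforcement learning mapping, the critic's optimality check, and the switching action; their names and the `max_rounds` cap are hypothetical, since the text does not bound the number of model updates.

```python
from typing import Callable, Optional


def switch_with_retries(
    load: float,
    get_strategy: Callable[[float], object],
    map_mode: Callable[[float, object], str],
    is_optimal: Callable[[float, str], bool],
    update_model: Callable[[float, str], None],
    switch: Callable[[str], None],
    max_rounds: int = 10,
) -> Optional[str]:
    """Repeat steps 11 to 13, updating the width learning model after each rejection."""
    for _ in range(max_rounds):
        strategy = get_strategy(load)    # actor (width learning model) proposes a strategy
        mode = map_mode(load, strategy)  # map the traffic load to an energy-saving mode
        if is_optimal(load, mode):       # critic: is this the optimal mapping strategy?
            switch(mode)                 # switch the base station to that mode
            return mode
        update_model(load, mode)         # not optimal: update the model and try again
    return None                          # no optimal strategy found within max_rounds
```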
[0112] In this embodiment, by optimizing the width learning model, the corresponding energy-saving mode is determined based on the first traffic load of the current base station using the optimal mapping strategy. Furthermore, the use of the width learning model can effectively improve the model training speed based on its fast calculation and powerful modeling capabilities, thereby improving the base station's response speed to energy-saving strategies in response to real-time traffic changes.
[0113] Optionally, the neural network is optimized by replacing the policy network in the actor-critic algorithm with a width learning system, including constructing a width learning model, i.e., before obtaining the first mapping policy through the width learning model based on the first traffic load, further comprising:
[0114] The base station's traffic load is used as input, and feature nodes are generated through linear transformation; nonlinear transformation is performed on the feature nodes to generate enhanced nodes; and the width learning model is established based on the feature nodes and the enhanced nodes.
[0115] In this embodiment, to improve model training speed, width learning replaces the policy network of the actor-critic algorithm. The width (broad) reinforcement learning (BRL) framework and algorithm retain the strong modeling ability, fast computation, and autonomous decision-making of the width (broad) learning system (BLS). This speeds up the base station's energy-saving response to real-time traffic changes, and BLS training can start as soon as the training samples are prepared.
[0116] Width learning is an incremental learning algorithm based on a Random Vector Functional Link Network (RVFL) structure. Unlike the traditional RVFL structure, the input weight matrix of the width learning system is not randomly generated, but rather encoded using a sparse autoencoder, with the optimal weights selected during the decoding process. The input samples in the width learning method undergo a linear transformation to map their feature representations onto the feature plane, forming feature nodes. These feature nodes are then subjected to a nonlinear transformation using an activation function to generate enhancement nodes. The feature nodes and enhancement nodes are connected together to serve as the actual input signal of the system, which is then linearly output via the connection matrix. Similar to RVFL, considering the high time cost and susceptibility to local optima in the classic backpropagation algorithm, the width learning method directly obtains the output connection matrix using a generalized inverse.
[0117] In this embodiment, the policy network of the actor-critic algorithm is replaced by a width learning system whose input X is the state of the base station (i.e., its traffic load) and whose output Y is the action taken by the base station (i.e., the mapping policy). The critic network of the actor-critic algorithm is likewise replaced by a width learning system whose input X is the state of the base station and whose output approximates the value function.
[0118] Specifically, the base station state (i.e., the base station's traffic load) X is transformed by a linear transformation to map its features onto a feature space, represented as follows:
[0119] $$Z_i = X W_{fi} + \beta_{fi}, \quad i = 1, 2, \ldots, n$$
[0120] where $Z_i$ is the i-th feature node, X is the traffic load of the base station, $W_{fi}$ is a weight, and $W_{fi}$ and $\beta_{fi}$ are randomly generated matrices.
[0121] The feature nodes are then subjected to a nonlinear transformation using an activation function to generate enhanced nodes, represented as follows:
[0122] $$H_j = \xi([Z_1, \ldots, Z_n] W_{hj} + \beta_{hj}), \quad j = 1, 2, \ldots, m$$
[0123] where $H_j$ is the j-th enhancement node, ξ is the activation function, $W_{hj}$ is a weight, and $W_{hj}$ and $\beta_{hj}$ are randomly generated matrices.
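Feature and enhancement node generation from [0118] to [0123] can be sketched with NumPy as below. Assumptions: ξ is tanh, the random weights are drawn from a standard normal distribution, and the sparse-autoencoder fine-tuning of the input weights mentioned in [0116] is omitted. The function and parameter names are hypothetical.

```python
import numpy as np


def bls_nodes(X, n_feature_groups, feature_dim, n_enhance_groups, enhance_dim,
              xi=np.tanh, seed=0):
    """Build feature nodes Z^n and enhancement nodes H^m from inputs X.

    X: (n_samples, n_inputs) traffic-load samples.
    Returns Z^n with shape (n_samples, n_feature_groups * feature_dim) and
    H^m with shape (n_samples, n_enhance_groups * enhance_dim).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Z = []
    for _ in range(n_feature_groups):
        W_f = rng.standard_normal((X.shape[1], feature_dim))
        beta_f = rng.standard_normal(feature_dim)
        Z.append(X @ W_f + beta_f)       # Z_i = X W_fi + beta_fi (linear map)
    Zn = np.hstack(Z)
    H = []
    for _ in range(n_enhance_groups):
        W_h = rng.standard_normal((Zn.shape[1], enhance_dim))
        beta_h = rng.standard_normal(enhance_dim)
        H.append(xi(Zn @ W_h + beta_h))  # H_j = xi(Z^n W_hj + beta_hj)
    Hm = np.hstack(H)
    return Zn, Hm
```

A deployed model would also keep the sampled weights so that new traffic-load samples pass through the same mappings; this sketch returns only the nodes.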
[0124] The feature nodes and enhancement nodes are concatenated and fed into the output layer to produce the output Y. The width learning model can be represented as:
[0125] $$Y = [Z_1, \ldots, Z_n \mid H_1, \ldots, H_m]\, W^m = [Z^n \mid H^m]\, W^m, \qquad W^m = [Z^n \mid H^m]^{+} Y$$
[0126] where Y is the output; $Z_i = X W_{f_i} + \beta_{f_i}$ is the i-th feature node, i = 1, 2, ..., n, with X the traffic load of the base station, $W_{f_i}$ the weight, and $W_{f_i}$ and $\beta_{f_i}$ randomly generated matrices; $H_j = \xi([Z_1, \ldots, Z_n] W_{h_j} + \beta_{h_j})$ is the j-th enhancement node, j = 1, 2, ..., m, with $W_{h_j}$ the weight and $W_{h_j}$ and $\beta_{h_j}$ randomly generated matrices; $W^m$ is the output connection matrix; and $(\cdot)^{+}$ denotes the pseudo-inverse.
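The width learning model of [0118] to [0126] can be illustrated with the following minimal NumPy sketch: random linear feature nodes, enhancement nodes produced by an activation, and output weights obtained as the pseudo-inverse solution for A = [Z | H]. The activation ξ = tanh, the node counts, the function names `bls_fit`/`bls_predict`, and the use of one random matrix for all enhancement nodes are assumptions for illustration, not the patent's implementation. The output weights are computed with `np.linalg.lstsq(..., rcond=None)`, whose minimum-norm least-squares result equals the pseudo-inverse solution A⁺Y while treating numerically negligible singular values as zero.

```python
import numpy as np


def bls_fit(X, Y, n_groups=4, nodes_per_group=5, n_enhance=20, seed=0):
    """Fit a minimal width (broad) learning model Y ~ [Z | H] W (a sketch)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Feature nodes Z_i = X W_fi + beta_fi with random W_fi, beta_fi.
    W_f = [rng.standard_normal((d, nodes_per_group)) for _ in range(n_groups)]
    b_f = [rng.standard_normal(nodes_per_group) for _ in range(n_groups)]
    Z = np.hstack([X @ W + b for W, b in zip(W_f, b_f)])
    # Enhancement nodes H = xi(Z W_h + beta_h); xi = tanh is an assumption.
    W_h = rng.standard_normal((Z.shape[1], n_enhance))
    b_h = rng.standard_normal(n_enhance)
    H = np.tanh(Z @ W_h + b_h)
    # Output connection matrix: minimum-norm least squares, i.e. A^+ Y.
    A = np.hstack([Z, H])
    W_out = np.linalg.lstsq(A, Y, rcond=None)[0]
    return W_f, b_f, W_h, b_h, W_out


def bls_predict(X, params):
    """Recompute [Z | H] for new inputs and apply the learned output weights."""
    W_f, b_f, W_h, b_h, W_out = params
    Z = np.hstack([X @ W + b for W, b in zip(W_f, b_f)])
    H = np.tanh(Z @ W_h + b_h)
    return np.hstack([Z, H]) @ W_out
```

In the actor-critic arrangement of [0117], one such model would map traffic loads to actions and a second, separately fitted one would approximate the value function.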
[0127] The width learning model is updated by updating the pseudo-inverse. Unlike a neural network, the width learning system does not adjust the kernel of a feature extractor through backpropagation; instead, it computes the weights of the feature nodes and enhancement nodes by finding the pseudo-inverse and updating it iteratively. The pseudo-inverse is calculated as follows:
[0128] $$A^{+} = \lim_{\lambda \to 0} \left(\lambda I + A^{T} A\right)^{-1} A^{T}$$
[0129] where $A = [Z^n \mid H^m]$ is the extended input matrix formed by combining all feature nodes with the enhancement nodes.
[0131] Once the pseudo-inverse $A^{+}$ has been obtained through iterative updates, training of the inputs and weights of the entire width learning network is complete. Compared with neural networks, this greatly improves computation speed and reduces parameter complexity.
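The iterative update of the pseudo-inverse can be illustrated with the block (Greville-type) update commonly used in broad learning when new enhancement nodes are appended: the old pseudo-inverse is corrected instead of recomputing the pseudo-inverse of the enlarged matrix from scratch. The patent does not state which update rule it uses, so this NumPy sketch is an assumption; it covers the two standard cases (new columns with full column rank outside the range of A, or new columns lying entirely inside it), and the function name `add_nodes_pinv` is hypothetical.

```python
import numpy as np


def add_nodes_pinv(A, A_pinv, H_new, tol=1e-10):
    """Append columns H_new to A and update its pseudo-inverse incrementally.

    Assumes the residual C either has full column rank or is (numerically) zero.
    """
    D = A_pinv @ H_new            # coordinates of H_new in terms of A's columns
    C = H_new - A @ D             # component of H_new outside range(A)
    if np.linalg.norm(C) > tol * max(1.0, np.linalg.norm(H_new)):
        B_t = np.linalg.pinv(C)   # case 1: C has full column rank
    else:
        k = D.shape[1]            # case 2: H_new lies inside range(A)
        B_t = np.linalg.solve(np.eye(k) + D.T @ D, D.T @ A_pinv)
    A_new = np.hstack([A, H_new])
    A_new_pinv = np.vstack([A_pinv - D @ B_t, B_t])
    return A_new, A_new_pinv
```

Only small matrices (D, C, and a k-by-k system) are formed per update, which is where the speed advantage over retraining comes from.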
[0132] Because this calculation introduces error, network training must keep the training error small while preserving model accuracy; only then do the resulting parameters generalize well (i.e., yield a small test error). The pseudo-inverse solution in this embodiment obtains the output weights by minimizing the training error, but minimizing the training error alone does not necessarily minimize the generalization error. The output weights can therefore also be obtained by solving the following optimization problem:
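[0133] $$\arg\min_{W} \; \|AW - Y\|_{v}^{\sigma_1} + \lambda \|W\|_{u}^{\sigma_2}$$
[0134] where v and u denote the norms applied to the fitting term and the regularization term (with exponents σ1, σ2 > 0), and λ is the regularization (constraint) coefficient. Taking u = v = σ1 = σ2 = 2 turns the problem into l2-norm regularization, which is convex and generalizes better, and yields: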
[0135] $$W = \left(\lambda I + A^{T} A\right)^{-1} A^{T} Y$$
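A minimal NumPy sketch of the l2-regularized output weights, assuming A stores one sample per row so that Y ≈ AW. The function name `ridge_output_weights` is hypothetical; it solves the linear system rather than forming an explicit inverse. As λ approaches 0 the result approaches the pseudo-inverse solution A⁺Y, and the same weights can equivalently be written as Aᵀ(λI + AAᵀ)⁻¹Y, which is cheaper when there are fewer samples than nodes.

```python
import numpy as np


def ridge_output_weights(A, Y, lam):
    """W = (lam*I + A^T A)^{-1} A^T Y, computed via a linear solve."""
    n_nodes = A.shape[1]
    return np.linalg.solve(lam * np.eye(n_nodes) + A.T @ A, A.T @ Y)
```

Larger λ shrinks the norm of W, trading a slightly higher training error for better generalization.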
[0136] Optionally, solving the MDP based on the actor-critic algorithm involves using the state-value function in the reinforcement learning algorithm to determine whether the mapping policy determined by the policy network is the optimal policy. That is, the process of constructing the reinforcement learning model also includes the process of establishing the state-value function. Specifically, before obtaining the first energy-saving mode by mapping the first traffic load through the first mapping policy based on the reinforcement learning model, the process further includes:
[0137] Obtain the system power consumption of the base station when it switches to the target energy-saving mode under the target traffic load; establish the state value function of the reinforcement learning model based on the system power consumption; wherein the state value function is used to represent the expected value when the base station switches to the target energy-saving mode.
[0138] In this embodiment, the state value function is established from the system power consumption, so that when the expected value associated with switching the base station to the target energy-saving mode meets the preset condition, the corresponding mapping strategy is determined to be the optimal mapping strategy. This ensures that the optimal mapping strategy is found while system power consumption is minimized.
[0139] Optionally, the MDP is solved using the actor-critic algorithm. That is, switching the base station to the first energy-saving mode if the first mapping strategy is determined to be the optimal mapping strategy based on the first traffic load and the first energy-saving mode includes:
[0140] Obtaining the first system power consumption of the base station when it switches to the first energy-saving mode under the first traffic load; obtaining, through the state value function and based on the first system power consumption, the expected value when the base station switches to the first energy-saving mode; and, if the expected value meets the preset condition, determining the first mapping strategy to be the optimal mapping strategy and switching the base station to the first energy-saving mode.
[0141] Correspondingly, if the expected value does not meet the preset conditions, it is determined that the first mapping strategy is not the optimal mapping strategy, and the above-mentioned step of updating the width learning model is executed.
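The decision flow of [0139] to [0141] can be sketched as a loop. All callables, the `threshold`, and the reading of the "preset condition" as "expected discounted cost at or below a threshold" are hypothetical, since the patent does not specify them; `max_rounds` is an assumed bound on the number of model updates.

```python
from typing import Callable, Optional

# Hypothetical alias: a mapping strategy turns a traffic load into a mode index.
Policy = Callable[[float], int]


def select_energy_saving_mode(
    load: float,
    get_policy: Callable[[float], Policy],          # width learning model -> mapping strategy
    update_model: Callable[[], None],               # hypothetical width-learning update step
    expected_value: Callable[[float, int], float],  # critic: expected discounted cost
    threshold: float,                               # assumed "preset condition"
    max_rounds: int = 5,
) -> Optional[int]:
    """Return the mode to switch to, or None if no strategy qualified."""
    for _ in range(max_rounds):
        policy = get_policy(load)       # first / second / ... mapping strategy
        mode = policy(load)             # energy-saving mode the load maps to
        if expected_value(load, mode) <= threshold:
            return mode                 # strategy deemed optimal: switch to it
        update_model()                  # not optimal: update the model, retry
    return None
```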
[0142] In this embodiment, the goal of reinforcement learning is to maximize the expected return, or equivalently to minimize the cost, i.e., the system power consumption. As learning progresses, the policy network in the actor-critic algorithm tends toward the optimal mapping policy, adopting in each state the action that minimizes base station power consumption. The critic network evaluates this policy by approximating the value function, so that together the two networks find the optimal mapping policy from states (i.e., base station traffic loads) to actions (i.e., energy-saving modes) and thereby minimize the system cost. Thus, if the actor-critic algorithm determines that the first mapping policy is optimal, the base station is switched to the first energy-saving mode given by that policy, which ensures accurate energy-saving mode switching and reduces system power consumption.
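To make the cost-minimizing actor-critic idea concrete, here is a small tabular sketch rather than the width-learning version: a softmax actor and a tabular critic over a few discrete load levels, with next states drawn uniformly as in the idealized transition model. The cost table, step sizes, and names are assumptions for illustration. Because the objective is a cost, the actor steps against the temporal-difference (TD) error instead of along it.

```python
import numpy as np


def actor_critic_costs(cost, n_steps=5000, theta=0.9, alpha_v=0.1, alpha_p=0.1, seed=0):
    """Tabular one-step actor-critic that minimizes expected discounted cost.

    cost[s, a] is the (assumed known) system cost of action a at load level s.
    The next state is drawn uniformly at random and does not depend on a.
    """
    rng = np.random.default_rng(seed)
    n_s, n_a = cost.shape
    V = np.zeros(n_s)               # critic: estimated discounted cost per state
    prefs = np.zeros((n_s, n_a))    # actor: softmax action preferences
    s = rng.integers(n_s)
    for _ in range(n_steps):
        p = np.exp(prefs[s] - prefs[s].max())
        p /= p.sum()
        a = rng.choice(n_a, p=p)
        s_next = rng.integers(n_s)
        # TD error on cost: positive means "worse than the critic expected".
        delta = cost[s, a] + theta * V[s_next] - V[s]
        V[s] += alpha_v * delta
        # Gradient of log softmax w.r.t. prefs[s]; step against delta.
        grad_log_pi = -p
        grad_log_pi[a] += 1.0
        prefs[s] -= alpha_p * delta * grad_log_pi
        s = s_next
    return V, prefs
```

After training, `prefs.argmax(axis=1)` gives the greedy mode for each load level.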
[0143] Optionally, obtaining the system power consumption of the base station when it switches to the target energy-saving mode under the target traffic load includes:
[0144] Acquiring the first power consumption and the second power consumption of the base station when it switches to the target energy-saving mode under the target traffic load, and determining the system power consumption based on the first power consumption and the second power consumption.
[0145] The first power consumption is determined by static power consumption that is independent of the traffic load of the base station and dynamic power consumption that varies proportionally with the traffic load of the base station, and the second power consumption is other power consumption related to the power consumption and performance of the base station.
[0146] The acquisition of the first power consumption and the second power consumption when the base station switches to the target energy-saving mode under the target traffic load includes:
[0147] The first power consumption is determined based on first parameters of the base stations in the active state, where the first parameters include the proportion of static power consumption of the base station, the traffic load density, and the total power consumption of the base station. For example, the first power consumption can be determined using the formula $\sum_{i \in B'} \left[(1 - q_i)\rho_i P_i + q_i P_i\right]$, where $(1 - q_i)\rho_i P_i$ is the dynamic power consumption of base station i, $q_i P_i$ is the static power consumption of base station i, B' is the set of base stations in the active state, $q_i$ is the proportion of static power consumption of base station i, $\rho_i$ is the traffic load density, and $P_i$ is the total power consumption of base station i.
[0148] The second power consumption is determined based on second parameters of the base stations in both the on and off states, where the second parameters include the degree of excellence of the KPIs related to system power consumption and performance, and the total power consumption of the base station. For example, the second power consumption can be determined using the formula $\sum_{i \in b} \gamma P_i$, where γ is the degree of excellence of the KPIs related to system power consumption and performance, and b is the set of base stations in all states (e.g., both the base stations that are on and those that are off).
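The first and second power consumption terms can be computed directly. A minimal NumPy sketch, assuming per-station arrays, a boolean mask for the active set B', and a single scalar γ (the patent does not say how the KPI "degree of excellence" is quantified); the function name `system_power` is hypothetical.

```python
import numpy as np


def system_power(P, q, rho, active, gamma):
    """C = sum_{i in B'} [(1-q_i) rho_i P_i + q_i P_i] + sum_{i in b} gamma P_i."""
    P, q, rho, active = map(np.asarray, (P, q, rho, active))
    per_station = (1 - q) * rho * P + q * P   # dynamic + static power
    first = np.sum(per_station[active])       # only stations in B' (active)
    second = np.sum(gamma * P)                # all stations in b (on and off)
    return first + second
```

For example, with P = (100, 50), q = (0.4, 0.5), ρ = (0.5, 0.2), only the first station active and γ = 0.1, the first term is 0.6·0.5·100 + 0.4·100 = 70 and the second is 0.1·150 = 15, so the total is 85.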
[0149] In this embodiment, by incorporating the power consumption of the base station and the quality of related performance indicators (KPIs) into the algorithm model as system cost, the model is better suited to solving energy-saving problems and effectively improves the accuracy of the model.
[0150] Optionally, the state value function can be determined by constructing a reinforcement learning process, which includes an agent and an environment. Each base station is regarded as an agent, and there is a continuous interaction between the base station and the environment. This interaction process is reflected by a Markov decision process.
[0151] Specifically, the Markov decision process is represented by the tuple $\langle S, A, P, C \rangle$, where S is the state space, A is the action space, P is the state transition probability, and C is the cost function.
[0152] For example, in stage k the state set is $S^{(k)} = \{s_1^{(k)}, s_2^{(k)}, \ldots\}$, where $s_i^{(k)}$ is the state (i.e., traffic load) of base station i in stage k, and the action set is $a^{(k)} = \{a_1^{(k)}, a_2^{(k)}, \ldots\}$. Base station i is turned off (sleep mode) when $a_i^{(k)} = 0$ and turned on (active mode) when $a_i^{(k)} = 1$. When a base station is on and its traffic load is below a set threshold, it switches to omnidirectional mode (i.e., low-power mode). In the base station energy-saving mode switching scenario, P is defined as an idealized state transition probability model that follows a uniform distribution. The cost function C is given by the system power consumption, which is determined by the first power consumption and the second power consumption (see the above embodiment for details). The base station power consumption (i.e., the first power consumption) consists of two parts: static power consumption that is independent of traffic load and dynamic power consumption that varies proportionally with traffic load. The second power consumption is other power consumption related to base station power consumption and performance.
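The state and action conventions above can be sketched as follows. This assumes traffic loads normalized to [0, 1) and models the idealized uniform transition as drawing each station's next load uniformly from that interval; the threshold value, mode labels, and function names are hypothetical.

```python
import numpy as np

SLEEP, ACTIVE, OMNI = "sleep", "active", "omnidirectional"


def resulting_mode(load, action, omni_threshold):
    """Map (traffic load, on/off action) to the base station's operating mode."""
    if action == 0:
        return SLEEP                      # a_i = 0: base station turned off
    # a_i = 1: on; below the threshold it drops to low-power omni mode.
    return OMNI if load < omni_threshold else ACTIVE


def next_loads(n_stations, rng):
    """Idealized uniform transition: each next load ~ Uniform[0, 1)."""
    return rng.uniform(0.0, 1.0, size=n_stations)


def step(loads, actions, omni_threshold, rng):
    """One MDP stage: resulting modes for this stage and the next state."""
    modes = [resulting_mode(l, a, omni_threshold) for l, a in zip(loads, actions)]
    return modes, next_loads(len(loads), rng)
```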
[0153] The expression for the system power consumption is:
[0154] $$C = \sum_{i \in B'} \left[(1 - q_i)\rho_i P_i + q_i P_i\right] + \sum_{i \in b} \gamma P_i$$
[0155] where C is the system power consumption; B' is the set of base stations in the active state; $q_i$ is the proportion of static power consumption of base station i; $\rho_i$ is the traffic load density; $P_i$ is the total power consumption of base station i; γ is the degree of excellence of the KPIs related to base station power consumption and performance; and b is the set of base stations in all states (e.g., both the base stations that are on and those that are off).
[0156] At each time step, the agent (i.e., the base station) implements a mapping from state (i.e., traffic load) to action (i.e., the base station's energy-saving mode), which is the agent's (i.e., the base station's) mapping strategy. Each action (i.e., the base station's energy-saving mode) has an associated reward (or system cost, which can be represented by system power consumption). The expected value of the discounted cost or reward is called the state-value function, which can be expressed as:
[0157] $$\mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \theta^{k}\, C\big(s_k, \pi(s_k)\big)\right]$$
[0158] where $\mathbb{E}_{\pi}$ denotes the expected value when the mapping policy π is adopted; $C(s_k, \pi(s_k))$ is the system power consumption in stage k, which depends on the state $s_k$ and the action $\pi(s_k)$ (i.e., the base station's traffic load and the corresponding energy-saving mode); and θ is a discount factor between 0 and 1, reflecting that an immediate reward (or cost) is worth more than a later one. The state value function is therefore:
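[0159] $$V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \theta^{k}\, C\big(s_k, \pi(s_k)\big) \,\middle|\, s_0 = s\right]$$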
[0160] where $\mathbb{E}_{\pi}$ denotes the expected value when the mapping policy π is adopted, θ is the discount factor, C is the system power consumption, and s is the traffic load of the base station.
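A minimal sketch of the discounted-cost state value: `discounted_cost` sums θ^k·C_k over a finite trajectory, and `mc_state_value` averages it over rollouts started from the same state. The rollout callable is a hypothetical stand-in for simulating a policy, and truncating the infinite sum at the trajectory length is an approximation.

```python
import numpy as np


def discounted_cost(costs, theta):
    """sum_k theta^k * C_k for one trajectory of per-stage system costs."""
    costs = np.asarray(costs, dtype=float)
    return float(np.sum(theta ** np.arange(len(costs)) * costs))


def mc_state_value(sample_trajectory, theta, n_rollouts=1000):
    """Monte Carlo estimate of V^pi(s): mean discounted cost over rollouts.

    sample_trajectory() is a hypothetical callable that rolls out policy pi
    from state s and returns the list of per-stage costs C(s_k, pi(s_k)).
    """
    return float(np.mean([discounted_cost(sample_trajectory(), theta)
                          for _ in range(n_rollouts)]))
```

For costs (10, 10, 10) and θ = 0.5 this gives 10 + 5 + 2.5 = 17.5; for a long constant-cost trajectory the value approaches C/(1 − θ).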
[0161] In this embodiment of the invention, the traffic load of the base station is taken as the state, and the energy-saving mode of the base station is taken as the action. When switching the energy-saving mode, it follows an idealized state transition probability model with uniform distribution. This model is more suitable for energy-saving problems that rely on the feedback of traffic load to switch modes.
[0162] As shown in Figure 4, an embodiment of the present invention provides a base station energy-saving mode switching device 400, comprising:
[0163] The first processing module 410 is used to obtain a first mapping strategy based on the first traffic load of the base station;
[0164] The second processing module 420 is used to determine the first energy-saving mode of the base station based on the first traffic load and through the first mapping strategy.
[0165] The first switching module 430 is used to switch the base station to the first energy-saving mode if the first mapping strategy is determined to be the optimal mapping strategy based on the first traffic load and the first energy-saving mode.
[0166] Optionally, the first processing module 410 includes:
[0167] The first processing unit is configured to obtain the first mapping strategy based on the first traffic load using a width learning model.
[0168] Optionally, the device 400 further includes:
[0169] An update module is used to update the width learning model if it is determined that the first mapping strategy is not the optimal mapping strategy based on the first traffic load and the first energy-saving mode.
[0170] The third processing module is used to obtain the second mapping strategy based on the first traffic load through the updated width learning model;
[0171] The fourth processing module is used to determine the second energy-saving mode of the base station based on the first traffic load and through the second mapping strategy.
[0172] The second switching module is used to switch the base station to the second energy-saving mode if the second mapping strategy is determined to be the optimal mapping strategy based on the first traffic load and the second energy-saving mode.
[0173] Optionally, the device 400 further includes:
[0174] The first generation module is used to generate feature nodes by taking the traffic load of the base station as input and generating them through linear transformation.
[0175] The second generation module is used to perform a nonlinear transformation on the feature nodes to generate enhancement nodes;
[0176] The first establishment module is used to establish the width learning model based on the feature nodes and the enhancement nodes.
[0177] Optionally, the expression for the width learning model is:
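[0178] $$Y = [Z_1, \ldots, Z_n \mid H_1, \ldots, H_m]\, W^m, \qquad W^m = [Z_1, \ldots, Z_n \mid H_1, \ldots, H_m]^{+} Y$$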
[0179] where $Z_i = X W_{f_i} + \beta_{f_i}$ is the i-th feature node, i = 1, 2, ..., n, with X the traffic load of the base station, $W_{f_i}$ the weight, and $W_{f_i}$ and $\beta_{f_i}$ randomly generated matrices; $H_j = \xi([Z_1, \ldots, Z_n] W_{h_j} + \beta_{h_j})$ is the j-th enhancement node, j = 1, 2, ..., m, with $W_{h_j}$ the weight and $W_{h_j}$ and $\beta_{h_j}$ randomly generated matrices; and $W^m$ is the output connection matrix obtained through the pseudo-inverse $(\cdot)^{+}$, which is used to update the width learning model.
[0180] Optionally, the second processing module 420 includes:
[0181] The second processing unit is used to obtain the first energy-saving mode from the first traffic load through the first mapping strategy based on the reinforcement learning model.
[0182] Optionally, the device 400 further includes:
[0183] The acquisition module is used to acquire the system power consumption of the base station when it switches to the target energy-saving mode under the target traffic load;
[0184] The second establishment module is used to establish the state value function of the reinforcement learning model based on the system power consumption; wherein the state value function is used to represent the expected value when the base station switches to the target energy-saving mode.
[0185] Optionally, the first switching module 430 includes:
[0186] The first acquisition unit is used to acquire the first system power consumption when the base station switches to the first energy-saving mode under the first traffic load.
[0187] The third processing unit is used to obtain, through the state value function and based on the first system power consumption, the expected value when the base station switches to the first energy-saving mode;
[0188] The switching unit is configured to determine the first mapping strategy as the optimal mapping strategy and switch the base station to the first energy-saving mode if the expected value meets the preset conditions.
[0189] Optionally, the acquisition module includes:
[0190] The second acquisition unit is used to acquire the first power consumption and the second power consumption when the base station switches to the target energy-saving mode under the target traffic load; wherein, the first power consumption is determined by the static power consumption that is independent of the traffic load of the base station and the dynamic power consumption that changes proportionally to the traffic load of the base station, and the second power consumption is other power consumption related to the power consumption and performance of the base station.
[0191] A determining unit is configured to determine the system power consumption based on the first power consumption and the second power consumption.
[0192] Optionally, the second acquisition unit is further configured to:
[0193] The first power consumption is determined based on the first parameters of the base station in the on state; wherein the first parameters include the proportion of the base station's static power consumption, the traffic load density, and the base station's total power consumption.
[0194] The second power consumption is determined based on the second parameters of the base station in the on and off states; wherein the second parameters include the degree of excellence of KPI indicators related to system power consumption and performance and the total power consumption of the base station.
[0195] Optionally, the expression for the system power consumption is:
[0196] $$C = \sum_{i \in B'} \left[(1 - q_i)\rho_i P_i + q_i P_i\right] + \sum_{i \in b} \gamma P_i$$
[0197] where C is the system power consumption; B' is the set of base stations in the active state; $q_i$ is the proportion of static power consumption of base station i; $\rho_i$ is the traffic load density; $P_i$ is the total power consumption of base station i; γ is the degree of excellence of the KPIs related to base station power consumption and performance; and b is the set of base stations in all states.
[0198] Optionally, the expression for the state value function is:
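[0199] $$V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \theta^{k}\, C\big(s_k, \pi(s_k)\big) \,\middle|\, s_0 = s\right]$$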
[0200] where $\mathbb{E}_{\pi}$ denotes the expected value when the mapping policy π is adopted, θ is the discount factor, C is the system power consumption, and s is the traffic load of the base station.
[0201] The apparatus described in the embodiments of the present invention can implement the various embodiments corresponding to the above methods and achieve the same technical effect. To avoid repetition, it will not be described again here.
[0202] This invention also provides a network device. As shown in Figure 5, it includes a transceiver 510, a processor 500, a memory 520, and a program or instructions stored in the memory 520 and executable on the processor 500; when the processor 500 executes the program or instructions, it implements the steps of the above base station power-saving mode switching method and achieves the same technical effect. To avoid repetition, it will not be described again here.
[0203] The transceiver 510 is used to receive and send data under the control of the processor 500.
[0204] In Figure 5, the bus architecture may include any number of interconnected buses and bridges, which link together various circuits, including one or more processors represented by processor 500 and memory represented by memory 520. The bus architecture may also link together various other circuits such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and therefore will not be described further herein. The bus interface provides an interface. The transceiver 510 may be multiple elements, including transmitters and receivers, providing a unit for communicating with various other devices over a transmission medium. The processor 500 is responsible for managing the bus architecture and general processing, and the memory 520 may store data used by the processor 500 during operation.
[0205] An embodiment of the present invention provides a readable storage medium storing a program or instructions. When the program or instructions are executed by a processor, they implement the steps in the base station power-saving mode switching method described above and achieve the same technical effect. To avoid repetition, the details will not be repeated here.
[0206] The processor mentioned above is the processor described in the above embodiments. The readable storage medium includes computer-readable storage media, such as computer read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0207] It should be further noted that the terminals described in this specification include, but are not limited to, smartphones, tablets, etc., and many of the functional components described are referred to as modules in order to emphasize the independence of their implementation.
[0208] In this embodiment of the invention, the module can be implemented in software so that it can be executed by various types of processors. For example, an identified executable code module may include one or more physical or logical blocks of computer instructions, which may be constructed as objects, procedures, or functions. Nevertheless, the executable code of the identified module does not need to be physically located together, but may include different instructions stored in different locations, which, when logically combined, constitute the module and achieve the module's intended purpose.
[0209] In practice, an executable code module can be a single instruction or many instructions, and can even be distributed across multiple different code segments, different programs, and across multiple memory devices. Similarly, operational data can be identified within the module and can be implemented in any suitable form and organized within any suitable type of data structure. This operational data can be collected as a single dataset or distributed across different locations (including different storage devices), and can exist, at least in part, solely as electronic signals within the system or network.
[0210] Where a module can be implemented in software, those skilled in the art can also, given the current level of hardware technology and setting cost aside, build corresponding hardware circuits that achieve the same function. These hardware circuits include conventional very-large-scale integration (VLSI) circuits or gate arrays, existing semiconductors such as logic chips and transistors, and other discrete components. Modules can also be implemented using programmable hardware devices, such as field-programmable gate arrays, programmable array logic, and programmable logic devices.
[0211] The exemplary embodiments described above are with reference to the accompanying drawings. Many different forms and embodiments are feasible without departing from the spirit and teachings of the invention. Therefore, the invention should not be construed as limited to the exemplary embodiments set forth herein. Rather, these exemplary embodiments are provided to make the invention complete and to convey its scope to those skilled in the art. In these drawings, component dimensions and relative dimensions may be exaggerated for clarity. The terminology used herein is for the purpose of describing particular exemplary embodiments only and is not intended to be limiting. As used herein, unless clearly indicated otherwise, the singular forms "a," "an," and "the" are intended to include the plural forms as well. It will be further understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of the stated features, integers, steps, operations, components, and/or elements, but do not exclude the presence or addition of one or more other features, integers, steps, operations, components, and/or groups thereof. Unless otherwise indicated, a stated range of values includes the upper and lower limits of the range and any subranges in between.
[0212] The above description represents the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.
Claims
1. A method for switching power-saving modes in a base station, characterized by comprising: obtaining a first mapping strategy based on a first traffic load of the base station; determining, based on the first traffic load, a first energy-saving mode of the base station through the first mapping strategy; and, if the first mapping strategy is determined to be the optimal mapping strategy based on the first traffic load and the first energy-saving mode, switching the base station to the first energy-saving mode; wherein obtaining the first mapping strategy based on the first traffic load of the base station includes: obtaining the first mapping strategy through a width learning model based on the first traffic load; determining the first energy-saving mode of the base station based on the first traffic load and through the first mapping strategy includes: obtaining, based on a reinforcement learning model, the first energy-saving mode from the first traffic load through the first mapping strategy; and the method further includes: if it is determined, based on the first traffic load and the first energy-saving mode, that the first mapping strategy is not the optimal mapping strategy, updating the width learning model; obtaining, based on the first traffic load, a second mapping strategy through the updated width learning model; determining, based on the first traffic load, a second energy-saving mode of the base station through the second mapping strategy; and, if the second mapping strategy is determined to be the optimal mapping strategy based on the first traffic load and the second energy-saving mode, switching the base station to the second energy-saving mode.
2. The method according to claim 1, characterized in that, before obtaining the first mapping strategy through a width learning model based on the first traffic load, the method further includes: using the traffic load of the base station as input and generating feature nodes through a linear transformation; performing a nonlinear transformation on the feature nodes to generate enhancement nodes; and establishing the width learning model based on the feature nodes and the enhancement nodes.
3. The method according to claim 1, characterized in that, before obtaining the first energy-saving mode from the first traffic load through the first mapping strategy based on the reinforcement learning model, the method further includes: obtaining the system power consumption of the base station when it switches to a target energy-saving mode under a target traffic load; and establishing, based on the system power consumption, a state value function of the reinforcement learning model, wherein the state value function is used to represent the expected value when the base station switches to the target energy-saving mode.
4. The method according to claim 3, characterized in that switching the base station to the first energy-saving mode if the first mapping strategy is determined to be the optimal mapping strategy based on the first traffic load and the first energy-saving mode includes: obtaining the first system power consumption of the base station when it switches to the first energy-saving mode under the first traffic load; obtaining, through the state value function and based on the first system power consumption, the expected value when the base station switches to the first energy-saving mode; and, if the expected value meets the preset condition, determining the first mapping strategy to be the optimal mapping strategy and switching the base station to the first energy-saving mode.
5. The method according to claim 3, characterized in that obtaining the system power consumption of the base station when it switches to the target power-saving mode under the target traffic load includes: obtaining the first power consumption and the second power consumption of the base station when it switches to the target power-saving mode under the target traffic load, wherein the first power consumption is determined by the static power consumption that is independent of the traffic load of the base station and the dynamic power consumption that changes proportionally to the traffic load of the base station, and the second power consumption is other power consumption related to the power consumption and performance of the base station; and determining the system power consumption based on the first power consumption and the second power consumption.
6. The method according to claim 5, characterized in that, The acquisition of the first power consumption and the second power consumption when the base station switches to the target energy-saving mode under the target traffic load includes: The first power consumption is determined based on the first parameters of the base station in the on state; wherein the first parameters include the proportion of the base station's static power consumption, the traffic load density, and the base station's total power consumption. The second power consumption is determined based on the second parameters of the base station in the on and off states; wherein the second parameters include the degree of excellence of KPI indicators related to system power consumption and performance and the total power consumption of the base station.
7. The method according to claim 6, characterized in that the expression for the system power consumption is: $C = \sum_{i \in B'} \left[(1 - q_i)\rho_i P_i + q_i P_i\right] + \sum_{i \in b} \gamma P_i$; where C is the system power consumption; B' is the set of base stations currently in the active state; $q_i$ is the proportion of static power consumption of base station i; $\rho_i$ is the traffic load density; $P_i$ is the total power consumption of base station i; γ represents the quality of the KPIs related to base station power consumption and performance; and b is the set of base stations in all states.
8. The method according to claim 3 or 7, characterized in that the expression for the state value function is: $V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \theta^{k}\, C\big(s_k, \pi(s_k)\big) \,\middle|\, s_0 = s\right]$; where $\mathbb{E}_{\pi}$ denotes the expected value when the mapping policy π is adopted, θ is the discount factor, C is the system power consumption, s is the traffic load of the base station, k denotes the k-th stage, and $s_0$ denotes the initial state.
9. A base station energy-saving mode switching device, characterized by comprising: a first processing module, used to obtain a first mapping strategy based on a first traffic load of the base station; a second processing module, used to determine a first energy-saving mode of the base station based on the first traffic load and through the first mapping strategy; and a first switching module, configured to switch the base station to the first energy-saving mode if the first mapping strategy is determined to be the optimal mapping strategy based on the first traffic load and the first energy-saving mode; wherein the first processing module includes a first processing unit configured to obtain the first mapping strategy through a width learning model based on the first traffic load; the second processing module includes a second processing unit used to obtain, based on a reinforcement learning model, the first energy-saving mode from the first traffic load through the first mapping strategy; and the device further includes: an update module, used to update the width learning model if it is determined, based on the first traffic load and the first energy-saving mode, that the first mapping strategy is not the optimal mapping strategy; a third processing module, used to obtain a second mapping strategy through the updated width learning model based on the first traffic load; a fourth processing module, used to determine a second energy-saving mode of the base station based on the first traffic load and through the second mapping strategy; and a second switching module, used to switch the base station to the second energy-saving mode if the second mapping strategy is determined to be the optimal mapping strategy based on the first traffic load and the second energy-saving mode.
10. A network device, comprising: A transceiver, a processor, a memory, and a program or instructions stored in the memory and executable on the processor; characterized in that, when the processor executes the program or instructions, it implements the steps in the base station power-saving mode switching method as described in any one of claims 1-8.
11. A readable storage medium having a program or instructions stored thereon, characterized in that, When the program or instructions are executed by the processor, they implement the steps in the base station power-saving mode switching method as described in any one of claims 1-8.