[0013] The invention takes the network environment shown in Figure 1 as its background. This heterogeneous network is fully covered by a network $N_1$ that has the largest coverage and the fewest available bandwidth resources. Its coverage area is a circle of radius $R_1$, and its available bandwidth resources number $B_1$. In addition, to meet service requests in hotspots, several networks $N_i$ with small coverage but rich bandwidth resources are overlaid within this area. The coverage area of each such network is a circle of radius $R_i$, and its available bandwidth resources number $B_i$, where $B_i \ge B_1$ ($i \ge 1$). Each network allocates bandwidth to users in basic bandwidth units (BWU), granting $2^b$ BWUs per allocation ($b = 0, 1, 2, \ldots$).
[0014] Following the 3GPP (3rd Generation Partnership Project) definition of service types for 3G systems, three services are selected as the service types of this scenario: voice, video, and data. Their quality of service (QoS) characteristics are analyzed as follows:
(1) Voice services require low delay and only a small channel bandwidth. A voice service should therefore prefer a network with large coverage and low delay, such as $N_1$. Its bandwidth requirement is defined as $B_{vc}$ BWUs; that is, a network that can provide $B_{vc}$ BWUs can serve a voice call. Allocating more than $B_{vc}$ BWUs to a voice service does not improve its QoS and only wastes valuable network bandwidth, so such an allocation is undesirable.
(2) Video services require low delay and sufficient channel bandwidth. In current communication networks, video services can use multiple levels of coding, which correspond to different levels of service quality and to different bandwidth requirements. The bandwidth requirement of a video service is therefore divided into levels. $B_{minvd}$ BWUs is the minimum bandwidth requirement: if less is allocated, the video service cannot be established. Because video communication is limited by its coding method, QoS cannot improve indefinitely as bandwidth increases, so $B_{maxvd}$ BWUs is the maximum bandwidth requirement: allocating more does not improve QoS and wastes spectrum resources, so such an allocation is also undesirable. Several service levels lie between $B_{minvd}$ and $B_{maxvd}$.
(3) Data services tolerate a certain delay and require high bandwidth. They are non-real-time, bandwidth-sensitive services: the more bandwidth a data service obtains, the sooner the communication completes. $B_{minda}$ BWUs is defined as the minimum bandwidth requirement for data services. As long as the network can provide a data service with at least this minimum bandwidth, the service can be served, and the larger the bandwidth, the better the QoS.
[0015] The call types involved in the present invention are initial call, horizontal handover, and vertical handover. A newly initiated session is an initial call; a user moving from one cell to an adjacent cell of the same network performs a horizontal handover; a user transferred from the current network to a network of a different type performs a vertical handover. From the user's perspective, dropping a handover request is less acceptable than blocking an initial call request, and this is especially apparent for real-time services. When handling initial calls and handovers of real-time services, the network should therefore give them different priorities and process handovers first. To this end, the present invention adopts a bandwidth reservation strategy: a certain amount of bandwidth is reserved for handover services. An initial call request is accepted only if the network's remaining bandwidth, beyond the reserved resources, still satisfies the minimum bandwidth requirement of the service.
[0016] A. Problem mapping:
[0017] (1) State space S
[0018] The network resource management in the present invention not only assigns different processing priorities to different call types, but also adopts different resource allocation schemes for different service types, so the definition of the state must reflect the difference between call types and service types. The present invention defines the state S as follows:
[0019] S={n, L, c, m} (1)
[0020] Here, n is the number of currently available networks; L is the current network load status; c is the call type, which is one of initial call, horizontal handover, or vertical handover; and m is the service type, which is one of voice, video, or data.
[0021] (2) Action space A
[0022] The present invention not only selects an access network for each communication request but also allocates appropriate bandwidth to it, so both the network and the bandwidth must be represented in the action space. Because bandwidth is allocated in quantities of $2^b$ BWUs ($b = 0, 1, 2, \ldots$), the value of b is used to define the action space A:
[0023] $A = \{0, 1, 2, \ldots, n \cdot (K+1) - 1\}$ (2)
[0024] Here, n is the number of visible networks from the state space, and K is the maximum value of b ($K = \max(b)$). If only one network currently provides coverage, then $A = \{0, 1, 2, \ldots, K\}$, whose elements represent the bandwidth levels that this network can allocate: $2^0$ BWU, $2^1$ BWUs, ..., $2^K$ BWUs. If two networks currently provide coverage, then $A = \{0, 1, 2, \ldots, K, K+1, \ldots, 2K+1\}$, whose elements represent the bandwidth levels that the two networks can allocate.
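The flat action index can be illustrated with a minimal Python sketch. It assumes, consistently with the one-network and two-network examples above, that actions 0..K belong to the first network, K+1..2K+1 to the second, and so on. The function names and the zero-based network index are hypothetical and do not appear in the original text.

```python
def decode_action(a, n, K):
    """Split a flat action index into (network index, exponent b).

    Assumed layout: network j owns the indices j*(K+1) .. j*(K+1)+K.
    """
    if not 0 <= a < n * (K + 1):
        raise ValueError("action outside A = {0, ..., n*(K+1)-1}")
    network, b = divmod(a, K + 1)
    return network, b


def encode_action(network, b, K):
    """Inverse of decode_action under the same assumed layout."""
    return network * (K + 1) + b


def allocated_bwu(a, n, K):
    """Number of BWUs granted by action a, i.e. 2**b."""
    _, b = decode_action(a, n, K)
    return 2 ** b
```

With n = 2 and K = 3, for instance, action 4 decodes to the second network with b = 0, and action 7 grants 2^3 = 8 BWUs on the second network.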
[0025] (3) Return function r
[0026] A voice service does not require high bandwidth: $B_{vc}$ BWUs meet its communication needs. Therefore, regardless of which network it accesses, the return is 0 if more than $B_{vc}$ BWUs are allocated to it, and the corresponding return is obtained if exactly $B_{vc}$ BWUs are allocated.
[0027] For video and data services, bandwidth requirements come into play. The present invention associates each allocation action for these services with a profit function $P = f(g, l)$, where P is the profit value of the allocation action, g is its bandwidth gain relative to the previous action, and l is its cost. Then:
[0028] $g = \Delta B = B_b - B_f$ (3)
[0029] where $B_b$ is the bandwidth after the action and $B_f$ is the bandwidth before the action.
[0030] $l = B_f \cdot \tau$ (4)
[0031] Where τ represents the handover delay.
[0032] $P = g - \sigma \cdot l = B_b - B_f - \sigma \cdot B_f \cdot \tau$ (5)
[0035] Here, σ is the delay sensitivity coefficient: the larger σ is, the greater the weight of the delay loss in the profit function. To reflect the different delay sensitivities of video and data services, the present invention sets σ = 0.7 for video services and σ = 0.2 for data services.
[0036] For an initial call, the service accesses the network for the first time, so $B_f = 0$ and $\tau = 0$. The profit value is then determined only by the bandwidth obtained: the larger the bandwidth, the greater the profit. For a handover (horizontal or vertical), the profit value depends on the bandwidth both after and before the action, because only the increase in bandwidth counts as a gain. A handover also always incurs a handover delay, so the final profit is the bandwidth gain minus the amount of data that could have been transmitted during that delay.
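The profit function of formulas (3) to (5) can be sketched in Python as follows. The values σ = 0.7 and σ = 0.2 come from the text; the function name, the dictionary keys, and the sample bandwidths and delay used afterwards are illustrative assumptions only.

```python
# Delay sensitivity coefficients stated in the text.
SIGMA = {"video": 0.7, "data": 0.2}


def profit(b_after, b_before, tau, service):
    """P = g - sigma * l, with g = B_b - B_f and l = B_f * tau."""
    sigma = SIGMA[service]
    gain = b_after - b_before      # formula (3)
    loss = b_before * tau          # formula (4)
    return gain - sigma * loss     # formula (5)


# Initial call: B_f = 0 and tau = 0, so profit equals the bandwidth obtained.
initial_video = profit(8, 0, 0.0, "video")

# Handover from 4 to 8 BWUs with a hypothetical delay value of 0.5.
handover_video = profit(8, 4, 0.5, "video")
handover_data = profit(8, 4, 0.5, "data")
```

For the same handover, the video service earns a lower profit than the data service because its larger σ penalizes the handover delay more heavily.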
[0037] By the definition of the profit function, executing every allocation action so as to maximize the profit value is good from the user's point of view, because the user obtains as much bandwidth as possible. From the perspective of the system blocking rate, however, blindly assigning the maximum bandwidth to users inevitably increases blocking. Under light load the increase is not obvious, but once the network load becomes heavy, a rising blocking rate is unavoidable.
[0038] To solve this problem, the present invention provides an adaptive bandwidth equalization factor G on this basis:
[0039] $G = (1 - \eta_i)^b$ (6)
[0040] Here, b indicates that $2^b$ BWUs are assigned to the user in this allocation; B represents the bandwidth resources already occupied by the selected network, so $\eta_i$ expresses the load of the selected network. Networks with different loads have different $\eta_i$ and therefore different G, which enables load balancing between networks. Within one network, different values of b also give different values of G, and the larger $\eta_i$ is, the greater the ratio between the values of G for different b, which enables adaptive bandwidth allocation within the same network.
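A short Python sketch can make the behavior of G concrete. It reads formula (6) as $G = (1 - \eta_i)^b$, with b as an exponent, which is an assumption about the damaged equation, and it takes the load $\eta_i$ directly as a number in [0, 1] because the text does not spell out how the load is computed. The function name is hypothetical.

```python
def equalization_factor(eta, b):
    """Adaptive bandwidth equalization factor, read as G = (1 - eta)**b."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("load eta is assumed to lie in [0, 1]")
    return (1.0 - eta) ** b


# G for b = 0..3 on a lightly loaded and a heavily loaded network.
light = [equalization_factor(0.1, b) for b in range(4)]
heavy = [equalization_factor(0.8, b) for b in range(4)]

# Ratio between consecutive bandwidth levels: 1 / (1 - eta).
ratio_light = light[1] / light[2]
ratio_heavy = heavy[1] / heavy[2]
```

Under this reading, G falls as b grows, and it falls faster on the heavily loaded network, which is the property the text relies on to discourage large allocations when load is high.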
[0041] The return function of the video service is defined as follows:
[0042] $r = \begin{cases} \beta \cdot P + (1 - \beta) \cdot G, & B_{minvd} \le 2^a \le B_{maxvd} \\ 0, & \text{else} \end{cases}$ (7)
[0043] The return function of the data service is defined as follows:
[0044] $r = \begin{cases} \beta \cdot P + (1 - \beta) \cdot G, & 2^a \ge B_{minda} \\ 0, & \text{else} \end{cases}$ (8)
[0045] Here, β is the weighting coefficient. By the definition of the video service return function, a return is obtained only when the bandwidth allocated by the network lies between the minimum and maximum bandwidth required by the video service; otherwise the return is 0. If the allocated bandwidth is below the minimum requirement, the communication service cannot be established, so the return is 0. If it exceeds the maximum requirement, the video coding method prevents any further improvement in QoS and spectrum resources are wasted, so the return is also 0.
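The two return functions can be sketched in Python as below. The sketch takes the profit value P and the factor G as already computed numbers and treats the allocated bandwidth (written $2^a$ in formulas (7) and (8)) as a plain count of BWUs. The function and parameter names, and the sample values in the test of the sketch, are assumptions.

```python
def video_return(P, G, bandwidth, b_min_vd, b_max_vd, beta):
    """Formula (7): weighted return inside [B_minvd, B_maxvd], else 0."""
    if b_min_vd <= bandwidth <= b_max_vd:
        return beta * P + (1.0 - beta) * G
    return 0.0


def data_return(P, G, bandwidth, b_min_da, beta):
    """Formula (8): weighted return at or above B_minda, else 0."""
    if bandwidth >= b_min_da:
        return beta * P + (1.0 - beta) * G
    return 0.0
```

Setting beta to 1 leaves only the profit term and setting it to 0 leaves only G, so intermediate values trade bandwidth against load.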
[0046] The weighting coefficient β directly determines the relative proportions of the profit function and the adaptive bandwidth equalization factor in the return. Consider the two extreme cases β = 1 and β = 0, in which the return is determined entirely by the profit function or entirely by the adaptive bandwidth equalization factor, respectively. The former is the non-adaptive bandwidth allocation referred to in the present invention: the larger the bandwidth, the larger the profit, so the return function drives the network to always allocate the maximum bandwidth. In the latter, the smaller the allocated bandwidth, the larger G, so the return function drives the network to always allocate the minimum bandwidth. Neither is desirable. Only when β takes a value between 0 and 1 does the adaptive bandwidth allocation proposed in the present invention take both bandwidth and load into account: when the load is light, it tries to meet the maximum bandwidth demand of each service; when the load is heavy, the influence of the load grows and the bandwidth allocated to each service is reduced. Integrating G into the return function thus achieves both load balancing between networks and adaptive bandwidth allocation within a network. Adjusting β changes the relative influence of bandwidth and load, and thereby the allocation strategy and the system performance.
B. Access conditions:
[0047] In theory, a communication request can be admitted as long as the bandwidth the network can provide meets the minimum bandwidth requirement of the service. However, to reflect the priority of real-time handover services, the present invention uses a bandwidth reservation mechanism that reserves $B_{re}$ BWUs for real-time handover services. Let $B_{pr}$ BWUs be the bandwidth the network can provide in the current state and $B_{min}$ the minimum number of BWUs required by the requesting service. A voice or video handover request is admitted as long as formula (9) is satisfied, whereas a voice or video initial call request is admitted only if formula (10) is satisfied:
[0048] $B_{pr} \ge B_{min}$ (9)
[0049] $B_{pr} \ge B_{min} + B_{re}$ (10)
[0050] Because data services have relatively low real-time requirements, their initial call and handover requests are not given different priorities but are treated equally. The access condition for both is therefore formula (1:2).
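The access conditions can be sketched as a small admission check in Python. Formulas (9) and (10) are applied to voice and video as described. Because the formula reference for data services is not legible in the text, the sketch leaves that choice to the caller through the hypothetical flag `data_needs_reserve` instead of guessing. All names and string labels are illustrative.

```python
REAL_TIME = {"voice", "video"}
HANDOVERS = {"horizontal_handover", "vertical_handover"}


def admit(b_pr, b_min, b_re, call_type, service, data_needs_reserve):
    """Return True if the request may be admitted.

    Real-time handovers use formula (9): B_pr >= B_min.
    Real-time initial calls use formula (10): B_pr >= B_min + B_re.
    Data requests use one rule for every call type; which rule is
    an assumption supplied by the caller.
    """
    if service in REAL_TIME:
        needs_reserve = call_type not in HANDOVERS
    else:
        needs_reserve = data_needs_reserve
    threshold = b_min + b_re if needs_reserve else b_min
    return b_pr >= threshold
```

With 5 BWUs available, a minimum requirement of 4, and 2 BWUs reserved, a video handover is admitted while a video initial call is rejected.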
[0051] Considering factors such as network load, service type, and call type, based on the Q learning method and the above mapping, a heterogeneous wireless network resource management algorithm is obtained. The specific steps are as follows:
[0052] Step 1: Initialize Q(s, a), for example to 0 or to randomly generated values, and set the discount factor γ, the initial learning rate α, and the initial exploration probability ε of the action selection algorithm;
[0053] Step 2: Obtain the current state s, including the current load status of each network, the call type, and the service type;
[0054] Step 3: Select the action a to be executed. Observe the current state and the action set and, according to the action values $Q_t(s, a)$ of this state, select and execute action a following a given policy π;
[0055] Step 4: Obtain the reward r and the next state s′. Calculate the current reward r from the result of the action according to formula (7) or (8), find the maximum of the action-value function in the next state, $\max_{a'} Q_t(s', a')$, and update $Q_t(s, a)$ according to equation (1);
[0056] Step 5: Update the parameters. After each iteration, the learning rate and the exploration probability must be updated. To satisfy the convergence requirements of Q-learning, they are set to decrease gradually toward 0 during learning according to a negative exponential law.
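The five steps can be sketched as a minimal tabular Q-learning loop in Python. Several parts are assumptions rather than the patent's own definitions: the update rule is the standard Q-learning update, the policy π is taken to be epsilon-greedy, the decay is `initial * exp(-rate * step)`, and `toy_env` is a hypothetical one-state environment standing in for the network model.

```python
import math
import random
from collections import defaultdict


def decayed(initial, step, rate):
    """Negative-exponential decay toward 0 (Step 5)."""
    return initial * math.exp(-rate * step)


def select_action(Q, s, actions, epsilon, rng):
    """Epsilon-greedy stand-in for the policy pi of Step 3."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])


def q_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Standard Q-learning update (assumed form of Step 4)."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def train(env_step, s0, actions, steps=500, gamma=0.9,
          alpha0=0.5, eps0=1.0, rate=0.01, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)              # Step 1: Q(s, a) starts at 0
    s = s0                              # Step 2: current state
    for t in range(steps):
        alpha = decayed(alpha0, t, rate)
        epsilon = decayed(eps0, t, rate)
        a = select_action(Q, s, actions, epsilon, rng)        # Step 3
        r, s_next = env_step(s, a)                            # Step 4
        q_update(Q, s, a, r, s_next, actions, alpha, gamma)
        s = s_next
    return Q


def toy_env(s, a):
    """Hypothetical environment: one state, only action 2 is rewarded."""
    return (1.0 if a == 2 else 0.0), s


Q = train(toy_env, "s", [0, 1, 2, 3])
best = max([0, 1, 2, 3], key=lambda a: Q[("s", a)])
```

In a fuller model, `env_step` would apply the access conditions, compute r from formula (7) or (8), and return the state observed at the next request.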