Transmission mode selection method and device based on online reinforcement learning
A transmission mode selection technology based on reinforcement learning, applied in the fields of transmission systems and wireless communication, which addresses problems such as dynamic programming algorithms being unable to perform the required calculations.
Examples
Embodiment 1
[0032] As shown in Figure 1, the narrowband IoT system includes a base station BS. A large number of nodes lie within the coverage of the base station BS, and they fall into two types. Nodes adjacent to the base station have good channel conditions and can communicate directly with the base station BS using OMA. Base station edge nodes have poor channel conditions, which leads to a high outage probability, so they cannot transmit information directly to the base station BS and require relay cooperative transmission. In that case, the edge node transmits to the relay using NOMA, and the relay transmits the information to the base station BS using OMA. The present invention takes a hybrid transmission model of uplink relay cooperative transmission and direct transmission with a large number of narrowband Internet of Things nodes as an example, and models the narrowband Inte...
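The two uplink paths described in this paragraph can be sketched as a small data model. This is only an illustration: the `Node` class, the `good_channel` flag, and the `uplink_route` function are hypothetical names that do not come from the patent, and the sketch does not say how channel quality is decided.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Access(Enum):
    OMA = "OMA"
    NOMA = "NOMA"


@dataclass
class Node:
    node_id: int
    good_channel: bool  # hypothetical flag: True for nodes adjacent to the BS


def uplink_route(node: Node) -> List[Tuple[str, str, Access]]:
    """Return the hops (source, destination, access scheme) for one node."""
    if node.good_channel:
        # Adjacent node: direct transmission to the base station over OMA.
        return [(f"node{node.node_id}", "BS", Access.OMA)]
    # Edge node: NOMA to the relay, then the relay forwards to the BS over OMA.
    return [
        (f"node{node.node_id}", "relay", Access.NOMA),
        ("relay", "BS", Access.OMA),
    ]
```

For an edge node, `uplink_route(Node(2, False))` yields two hops, the first over NOMA and the second over OMA.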
Embodiment 2
[0092] The embodiment of the present invention provides a transmission mode selection device based on online reinforcement learning, which is applied to information transmission between a narrowband Internet of Things system node and a base station. As shown in Figure 5, the device includes:
[0093] The first obtaining module 21 is configured to obtain the current time slot state information of the narrowband Internet of Things system node. For the specific implementation, see the description of step S11 in Embodiment 1, which is not repeated here.
[0094] The execution module 22 is configured to execute an action using an exploration-exploitation strategy according to the current state information. For the specific implementation, see the description of step S12 in Embodiment 1, which is not repeated here.
[0095] The calculation module 23 is configured to calculate the reward value after the NB-IoT system node performs the action; see the relevant description of step...
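The three modules listed so far (obtain state, act under an exploration-exploitation strategy, compute a reward) match the shape of a standard online reinforcement learning loop. The following is a minimal sketch under stated assumptions, not the patented method: it assumes an epsilon-greedy strategy and a tabular Q-learning update, neither of which is confirmed by the text shown here. The action labels, the class name `ModeSelector`, and the parameters `epsilon`, `alpha`, and `gamma` are all hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = ["direct_OMA", "relay_NOMA"]  # hypothetical action labels


class ModeSelector:
    """Tabular Q-learning with epsilon-greedy action selection (assumed)."""

    def __init__(self, epsilon=0.1, alpha=0.5, gamma=0.9, seed=0):
        self.epsilon = epsilon  # probability of exploring a random action
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.rng = random.Random(seed)
        self.q = defaultdict(float)  # (state, action) -> estimated value

    def select_action(self, state):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # One-step Q-learning update toward reward + discounted best next value.
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

In each time slot, a caller would read the node state, call `select_action`, observe the reward after the transmission, and then call `update`. How the state and reward are defined is left to steps S11 onward in Embodiment 1.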
Embodiment 3
[0112] The embodiment of the present invention also provides a computer device. As shown in Figure 6, the computer device may include a processor 31 and a memory 32, where the processor 31 and the memory 32 may be connected by a bus or by other means. Figure 6 takes the bus connection as an example.
[0113] The processor 31 may be a central processing unit (CPU). The processor 31 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination of these types of chips.
[0114] As a non-transitory computer-readable storage medium, the memory 32 can be used to store non-transitory software programs, non-transitory computer executable programs and modules, suc...


