Transmission mode selection method and device based on online reinforcement learning

A transmission mode selection technology based on reinforcement learning, applied in fields such as transmission systems, wireless communication, and advanced technology. It addresses the problem that ordinary dynamic programming algorithms cannot perform the required calculations when the network state changes in real time.

Active Publication Date: 2020-07-07
GLOBAL ENERGY INTERCONNECTION RES INST CO LTD +2
3 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

[0004] Therefore, the technical problem to be solved by the present invention is to overcome the defect in the prior art that, in an NB-IoT network environment, the network state changes in real time and ordinary dynamic programming algorithms cannot perform the calculation. To this end, the invention provides a transmission mode selection method and device based on online reinforcement learning.

Examples

Embodiment 1

[0032] As shown in Figure 1, the narrowband IoT system includes a base station BS. A large number of nodes lie within the coverage of the base station BS, and they fall into two types. Nodes adjacent to the base station have good channel conditions and can communicate with the base station BS directly using OMA. Nodes at the edge of the base station have poor channel conditions, which leads to a high probability of interruption, so they cannot transmit information to the base station BS directly and require relay cooperative transmission. In that case, the edge node transmits to the repeater using NOMA, and the repeater transmits the information to the base station BS using OMA. The present invention takes a mixed transmission model of uplink relay cooperative transmission and direct transmission with a large number of narrowband Internet of Things nodes as an example, and models the narrowband Inte...

Embodiment 2

[0092] This embodiment of the present invention provides a transmission mode selection device based on online reinforcement learning, which is applied to information transmission between a narrowband Internet of Things system node and a base station. As shown in Figure 5, the device includes:

[0093] The first obtaining module 21 is configured to obtain current time slot status information of the narrowband Internet of Things system node; for a specific implementation manner, see the related description of step S11 in Embodiment 1, which will not be repeated here.

[0094] The execution module 22 is configured to execute an action using the exploration-utilization strategy according to the current state information; for a specific implementation manner, see the related description of step S12 in Embodiment 1, which will not be repeated here.

[0095] The calculation module 23 is used to calculate the reward value after the NB-IoT system node performs the action; see the relevant description of step...

Embodiment 3

[0112] This embodiment of the present invention also provides a computer device. As shown in Figure 6, the computer device may include a processor 31 and a memory 32, where the processor 31 and the memory 32 may be connected by a bus or by other means. Figure 6 takes the bus connection as an example.

[0113] The processor 31 may be a central processing unit (Central Processing Unit, CPU). The processor 31 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA), or another chip such as a programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, or a combination of the above types of chips.

[0114] As a non-transitory computer-readable storage medium, the memory 32 can be used to store non-transitory software programs, non-transitory computer executable programs and modules, suc...

Abstract

The invention discloses a transmission mode selection method and a transmission mode selection device based on online reinforcement learning. The transmission mode selection method comprises the steps of: acquiring the current time slot state information of a narrowband Internet of Things system node; according to the current state information, executing an action by utilizing an exploration-utilization strategy; calculating a reward value after the narrowband Internet of Things system node finishes executing the action; acquiring next time slot state information of the narrowband Internet of Things system node; updating a preset Q function according to the reward value and the next time slot state information, and updating a preset action strategy value to obtain a first action strategy value; updating a preset estimation strategy value according to the first action strategy value to obtain a new estimation strategy value; and selecting a transmission mode according to the new estimation strategy value and the first action strategy value. By implementing the transmission mode selection method and the transmission mode selection device, the narrowband Internet of Things system node is continuously estimated and compared, so that the narrowband Internet of Things system node can select the transmission mode selection scheme with the maximum energy efficiency.
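The steps listed in the abstract can be illustrated with a small tabular sketch. This is not the patent's implementation: the abstract does not give the update formulas, so the code below assumes a standard Q-learning update, an epsilon-greedy reading of the exploration-utilization strategy, a policy-hill-climbing step for the action strategy value, and a running average for the estimation strategy value. The state labels (`good`, `poor`), the mode names, the reward table, and all hyperparameters are hypothetical and chosen only so the example runs.

```python
import random
from collections import defaultdict

# Hypothetical action set: the two transmission modes described in the source.
ACTIONS = ("direct_oma", "relay_noma")

# Hypothetical energy-efficiency rewards per (channel state, mode); not from the patent.
TOY_REWARD = {
    ("good", "direct_oma"): 1.0, ("good", "relay_noma"): 0.5,
    ("poor", "direct_oma"): 0.1, ("poor", "relay_noma"): 0.8,
}


class ModeSelector:
    """Tabular learner; all update rules here are assumptions, not the patent's."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, delta=0.05, seed=0):
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.delta = epsilon, delta
        self.rng = random.Random(seed)
        n = len(ACTIONS)
        self.q = defaultdict(lambda: [0.0] * n)               # preset Q function
        self.policy = defaultdict(lambda: [1.0 / n] * n)      # action strategy value
        self.avg_policy = defaultdict(lambda: [1.0 / n] * n)  # estimation strategy value
        self.visits = defaultdict(int)

    def act(self, state):
        # Explore uniformly with probability epsilon, otherwise sample the current strategy.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        return self.rng.choices(range(len(ACTIONS)), weights=self.policy[state])[0]

    def update(self, state, action, reward, next_state):
        # Q-learning update from the reward and the next time slot state.
        q = self.q[state]
        q[action] += self.alpha * (reward + self.gamma * max(self.q[next_state]) - q[action])
        # Move the action strategy a small step toward the greedy action (sum stays 1).
        best = max(range(len(ACTIONS)), key=q.__getitem__)
        pi = self.policy[state]
        for a in range(len(ACTIONS)):
            pi[a] += self.delta * ((1.0 if a == best else 0.0) - pi[a])
        # Update the estimation strategy as a running average of the action strategy.
        self.visits[state] += 1
        avg = self.avg_policy[state]
        for a in range(len(ACTIONS)):
            avg[a] += (pi[a] - avg[a]) / self.visits[state]

    def select_mode(self, state):
        # Compare both strategies by expected Q value and follow the better one.
        q = self.q[state]
        pi, avg = self.policy[state], self.avg_policy[state]
        def expected(strategy):
            return sum(p * v for p, v in zip(strategy, q))
        chosen = pi if expected(pi) >= expected(avg) else avg
        return ACTIONS[max(range(len(ACTIONS)), key=chosen.__getitem__)]


def train(steps=5000, seed=0):
    """Run the loop from the abstract against the toy reward table."""
    agent = ModeSelector(seed=seed)
    env_rng = random.Random(seed + 1)
    state = "good"
    for _ in range(steps):
        action = agent.act(state)                           # execute an action
        reward = TOY_REWARD[(state, ACTIONS[action])]       # calculate the reward value
        next_state = env_rng.choice(("good", "poor"))       # next time slot state
        agent.update(state, action, reward, next_state)
        state = next_state
    return agent
```

Calling `train()` and then `select_mode("good")` or `select_mode("poor")` on the returned agent gives the mode the toy learner prefers for each channel state. Keeping the action strategy and its running average as separate tables mirrors the abstract's distinction between the action strategy value and the estimation strategy value; how the patent actually combines them is not stated in this excerpt.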

Description

Technical field

[0001] The invention relates to the field of power narrowband Internet of Things, in particular to a transmission mode selection method and device based on online reinforcement learning.

Background technique

[0002] The Internet of Things in the power environment is a network system that realizes the identification, perception, interconnection and control of power grid infrastructure, personnel and the environment. Nodes at the edge of a Narrow Band Internet of Things (NB-IoT) base station may be insufficiently covered, resulting in a high probability of interruption and difficulty in meeting service requirements. In order to improve the coverage of the NB-IoT system in the power Internet of Things scenario, from the aspects of communication technology and resource allocation management, orthogonal multiple access technology (Orthogonal Multiple Access, OMA) and non-orthogonal multiple access are considered in related technologies. Access ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04W72/04, H04L29/08
CPC: H04W72/0446, H04L67/12, H04W72/53, Y02D30/70
Inventor: 王瑶, 梁云, 尹喜阳, 郭延凯, 岳顺民, 田文峰, 黄凤, 孙晓艳, 黄莉, 黄辉, 李春龙, 邓辉
Owner: GLOBAL ENERGY INTERCONNECTION RES INST CO LTD