The invention discloses an algorithm for solving power allocation in cognitive radio based on reinforcement learning, which comprises the following steps: S1, setting the initial parameter values of a deep learning algorithm; S2, setting a scene model for a CR-NOMA system, and setting an initial state set covering states and actions; S3, while the current calculation time $t$ is less than or equal to the maximum time limit $T_{\max}$, solving the state value at time $t$, calculating the corresponding reward function, and calculating the TD error $\delta_t$; S4, selecting the next action of the user based on the value function, and updating the value function as $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta_c \delta_t$ by using the learning rate and the TD error; obtaining the corresponding reward for the selected action, obtaining a policy function $\pi(g)$, and updating the policy function $\pi(g)$ as $\pi(s_t, a_t) \leftarrow \pi(s_t, a_t) - \eta_a \delta_t$; and S5, iterating the updates so that the TD error of step S3 reaches its minimum, and finally obtaining the maximum reward function value, at which point the allocation algorithm ends. The invention solves the prior-art problem that power cannot be allocated well when channel information is incomplete.
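The loop of steps S3 to S5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The abstract does not define the TD error, so the one-step form $\delta_t = r_t + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)$ with a discount factor `gamma` is assumed. The random reward table, the random state transitions, and the epsilon-greedy action choice are hypothetical stand-ins for the CR-NOMA scene model, and `td_error`, `update_step`, and `run` are illustrative names. The two update rules, including the minus sign in the policy update, follow the abstract literally.

```python
import numpy as np


def td_error(Q, s, a, r, s_next, a_next, gamma):
    """TD error delta_t. The one-step form used here is an assumption."""
    return r + gamma * Q[s_next, a_next] - Q[s, a]


def update_step(Q, pi, s, a, delta, eta_c, eta_a):
    """Apply the two update rules of step S4 in place."""
    Q[s, a] += eta_c * delta   # Q(s_t, a_t) <- Q(s_t, a_t) + eta_c * delta_t
    pi[s, a] -= eta_a * delta  # pi(s_t, a_t) <- pi(s_t, a_t) - eta_a * delta_t


def run(n_states=4, n_actions=3, t_max=200,
        eta_c=0.1, eta_a=0.05, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)

    # S1/S2: initial parameters, state set and action set (e.g. power levels).
    Q = np.zeros((n_states, n_actions))
    pi = np.zeros((n_states, n_actions))
    # Hypothetical reward table standing in for the CR-NOMA reward function.
    reward_table = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

    def choose(state):
        # Action selection based on the value function (epsilon-greedy is assumed).
        if rng.random() < eps:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[state]))

    s = 0
    a = choose(s)
    for t in range(1, t_max + 1):                 # S3: iterate while t <= T_max
        r = reward_table[s, a]                    # reward for the executed action
        s_next = int(rng.integers(n_states))      # hypothetical random transition
        a_next = choose(s_next)                   # S4: next action of the user
        delta = td_error(Q, s, a, r, s_next, a_next, gamma)
        update_step(Q, pi, s, a, delta, eta_c, eta_a)
        s, a = s_next, a_next                     # S5: keep iterating
    return Q, pi


if __name__ == "__main__":
    Q, pi = run()
    print("Greedy action per state:", np.argmax(Q, axis=1))
```

Because both tables start at zero and are driven by the same TD error, this literal reading keeps `pi` equal to `-(eta_a / eta_c) * Q` throughout. A practical design would normally decouple the two, for example by parameterizing the policy separately, which the abstract does not detail.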