The invention discloses a reinforcement-learning-based power allocation algorithm for cognitive radio, comprising: S1: setting the initial parameter values of the deep learning algorithm; S2: setting the scene model of the CR-NOMA system, and setting the initial state set and action set; S3: while the current time step t is less than or equal to the maximum time limit T_max, obtaining the state at time t, calculating the corresponding reward function, and calculating the TD error δ_t; S4: selecting the user's next action based on the value function, and using the learning rate and the TD error to update the value function as Q(s_t, a_t) ← Q(s_t, a_t) + η_c δ_t; then obtaining the corresponding reward for the selected action, obtaining the policy function π, and updating it as π(s_t, a_t) ← π(s_t, a_t) − η_a δ_t; S5: repeating step S3 so that the TD error is minimized through iterative updates until the maximum reward function value is obtained, at which point the allocation algorithm ends. The invention addresses the problem in the prior art that power cannot be allocated well when channel information is incomplete.
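
The two update rules in step S4 can be illustrated with a minimal tabular sketch. The abstract does not define the TD error, so the one-step form δ_t = r_t + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t) used here is an assumption, as are the discount factor `gamma`, the function names, and the state and action indices, all of which are hypothetical. The sign of the policy update follows the text as written; many textbook actor-critic formulations add the TD error to the action preference instead.

```python
from collections import defaultdict


def td_error(reward, q_next, q_current, gamma=0.9):
    """One-step TD error (assumed form): r + gamma * Q(s', a') - Q(s, a)."""
    return reward + gamma * q_next - q_current


def critic_update(Q, state, action, delta, eta_c):
    """Value update from step S4: Q(s_t, a_t) <- Q(s_t, a_t) + eta_c * delta_t."""
    Q[(state, action)] += eta_c * delta


def actor_update(pi, state, action, delta, eta_a):
    """Policy update from step S4, sign as written in the text:
    pi(s_t, a_t) <- pi(s_t, a_t) - eta_a * delta_t."""
    pi[(state, action)] -= eta_a * delta


# Tables default to 0.0 for unseen (state, action) pairs.
Q = defaultdict(float)
pi = defaultdict(float)

# Hypothetical single transition: indices and reward are illustrative only.
s, a = 0, 1
s_next, a_next = 1, 0
reward = 1.0

delta = td_error(reward, Q[(s_next, a_next)], Q[(s, a)])
critic_update(Q, s, a, delta, eta_c=0.5)
actor_update(pi, s, a, delta, eta_a=0.1)
```

In a full loop over t ≤ T_max, the state, reward, and next action would come from the CR-NOMA scene model of step S2, which this sketch does not model.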