The invention provides a dynamic spectrum access method based on Q-learning constrained by policy planning, which comprises the following steps: cognitive users partition the spectrum state space and select the reasonable and legal state space; the state space is divided hierarchically into modules; each module initializes its Q-table before Q-learning begins; each module independently executes the Q-learning algorithm; within each module, actions are selected according to the learning rule; the action finally adopted by the cognitive user is obtained by a strategic decision that jointly considers all the learning modules; whether access to the selected spectrum conflicts with the authorized users is determined; if so, the collision probability is computed; otherwise, the next step is executed; whether the environmental policy planning knowledge base has changed is determined; if so, the environmental policy planning
knowledge base is updated and the learned Q-values are adjusted; the above steps are repeated until learning converges. The method improves overall system performance, overcomes the learning blindness of the agent, improves learning efficiency, and accelerates convergence.
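
The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `QModule`, `build_modules`, `run` and `busy_prob`, the rewards of +1 and -1, the epsilon-greedy learning rule, the single-state form of the Q-learning update, and the rule that the final decision takes the proposal with the highest Q-value are all assumptions made for illustration. The policy knowledge base is modelled simply as a set of forbidden channel indices, and the collision probability is estimated as the fraction of accesses that hit a busy channel.

```python
import random


class QModule:
    """One learning module with its own Q-table over a subset of channels."""

    def __init__(self, channels, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {c: 0.0 for c in channels}  # Q-table initialised before learning

    def propose(self, rng):
        """Epsilon-greedy action selection (assumed learning rule)."""
        if rng.random() < self.epsilon:
            return rng.choice(sorted(self.q))
        return max(self.q, key=self.q.get)

    def update(self, channel, reward):
        """Single-state Q-learning update (a simplification of the full method)."""
        target = reward + self.gamma * max(self.q.values())
        self.q[channel] += self.alpha * (target - self.q[channel])

    def apply_policy(self, forbidden):
        """Adjust the Q-table after a policy change: drop forbidden channels."""
        for c in forbidden:
            self.q.pop(c, None)


def build_modules(num_channels, forbidden, module_size):
    """Keep only policy-permitted channels, then split them into modules."""
    legal = [c for c in range(num_channels) if c not in forbidden]
    return [QModule(legal[i:i + module_size])
            for i in range(0, len(legal), module_size)]


def run(busy_prob, forbidden=(), module_size=2, steps=1000,
        policy_change=None, seed=0):
    """Simulate spectrum access.

    busy_prob[c] is the assumed probability that an authorized user occupies
    channel c. policy_change is an optional (step, newly_forbidden) pair.
    Returns (modules, estimated collision probability, chosen channels).
    """
    rng = random.Random(seed)
    forbidden = set(forbidden)
    modules = build_modules(len(busy_prob), forbidden, module_size)
    collisions, chosen = 0, []
    for t in range(steps):
        # Policy knowledge base changed? Update it and adjust the Q-tables.
        if policy_change and t == policy_change[0]:
            forbidden |= set(policy_change[1])
            for m in modules:
                m.apply_policy(forbidden)
            modules = [m for m in modules if m.q]
        # Each module proposes an action; the final decision takes the
        # proposal with the highest Q-value (assumed combination rule).
        proposals = [(m, m.propose(rng)) for m in modules]
        channel = max(proposals, key=lambda p: p[0].q[p[1]])[1]
        # Sample which channels the authorized users occupy in this step.
        busy = [rng.random() < p for p in busy_prob]
        # Every module learns from the outcome on its own proposed channel.
        for m, c in proposals:
            m.update(c, -1.0 if busy[c] else 1.0)
        collisions += busy[channel]
        chosen.append(channel)
    return modules, collisions / steps, chosen


if __name__ == "__main__":
    mods, rate, picks = run([0.9, 0.1, 0.8, 0.2], forbidden={2},
                            policy_change=(500, [0]))
    print("estimated collision probability:", rate)
    print("Q-tables:", [m.q for m in mods])
```

Restricting each Q-table to policy-permitted channels is what keeps the agent from exploring blindly: forbidden channels are never proposed, so no learning steps are spent on them, and a policy change only requires pruning the affected entries rather than relearning from scratch.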