Data transmission method, system and device for cognitive non-orthogonal multiple access network

By using cognitive nonorthogonal multiple access network technology, the random access index set and resource block allocation probability distribution of active secondary users are obtained. The maximum throughput is determined by using evolutionary game strategy, which solves the problem of limited number of users in traditional orthogonal multiple access technology and achieves high spectrum efficiency and large-scale connection.

CN116634586B · Active · Publication Date: 2026-06-19 · YANGTZE UNIVERSITY


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: YANGTZE UNIVERSITY
Filing Date: 2023-04-17
Publication Date: 2026-06-19

Application Information


IPC: H04W74/0833; H04B17/382; H04W72/542; H04W72/541; H04W72/0453; H04W72/0446; H04W72/044; H04W16/14

AI Tagging

Application Domain

Transmission monitoring; Network planning


AI Technical Summary

Technical Problem

Traditional orthogonal multiple access technologies cannot support more user connections within limited resources, making it difficult to meet spectrum efficiency and large-scale connection demands.

Method used

By employing cognitive nonorthogonal multiple access network technology, the maximum throughput of active secondary users is determined by obtaining the random access index set and resource block allocation probability distribution of active secondary users under different power levels, and resource blocks are allocated for data transmission based on the maximum throughput using an evolutionary game strategy.

Benefits of technology

While meeting the demand for massive connections, it achieves high spectrum efficiency, improving the throughput and resource utilization of the wireless network.


Abstract

This invention discloses a data transmission method, system, and device for cognitive non-orthogonal multiple access networks. The method includes: acquiring a random access index set of active secondary users under different power levels and determining the resource block allocation probability distribution of active secondary users; determining the throughput of active secondary users based on the resource allocation probability distribution and the random access index set; determining whether the throughput of active secondary users is the maximum throughput; if not, determining the maximum throughput of active secondary users based on the throughput of active secondary users through an evolutionary game strategy; and allocating resource blocks for data transmission according to the maximum throughput of active secondary users. Compared to existing technologies that utilize orthogonal multiple access technology, this invention uses cognitive non-orthogonal multiple access network technology, determining the maximum throughput of active secondary users through an evolutionary game strategy, and then allocating resource blocks for data transmission, enabling future wireless networks to achieve high spectral efficiency while meeting the demand for massive connections.


Description

Technical Field

[0001] This invention relates to the field of wireless network technology, and in particular to a data transmission method, system and device for cognitive non-orthogonal multiple access networks.

Background Technology

[0002] The explosive growth of mobile devices and the rapid development of broadband services such as augmented reality and virtual reality have exacerbated the spectrum scarcity problem caused by fixed spectrum allocation policies. Traditional communication systems primarily use Orthogonal Multiple Access (OMA) technology. OMA can easily separate the information carried by different user signals with low complexity. However, a drawback of OMA is that the number of users it supports is limited by the number of available orthogonal resources, making it impossible to connect more users within limited resources and thus failing to achieve high spectral efficiency and massive connectivity requirements. Therefore, how to meet the demand for massive connectivity while achieving high spectral efficiency has become a pressing issue.

[0003] The above content is only used to help understand the technical solution of the present invention and does not represent an admission that the above content is prior art.

Summary of the Invention

[0004] The main objective of this invention is to provide a data transmission method, system, and device for cognitive non-orthogonal multiple access networks, aiming to solve the technical problem of achieving high spectral efficiency while meeting the demand for massive connections.

[0005] To achieve the above objective, the present invention provides a data transmission method for a cognitive non-orthogonal multiple access network, the method comprising:

[0006] Obtain the random access index set of active secondary users under different power levels, and determine the resource block allocation probability distribution of the active secondary users;

[0007] The throughput of the active secondary users is determined based on the resource allocation probability distribution and the random access index set.

[0008] Determine whether the throughput of the active secondary user is the maximum throughput;

[0009] If not, the maximum throughput of the active secondary user is determined by an evolutionary game strategy based on the throughput of the active secondary user.

[0010] Resource blocks are allocated for data transmission based on the maximum throughput of the active secondary users.

[0011] Furthermore, to achieve the above objective, the present invention also proposes a data transmission system for a cognitive non-orthogonal multiple access network, the system comprising:

[0012] The acquisition module is used to acquire the random access index set of active secondary users under different power levels, and determine the resource block allocation probability distribution of the active secondary users;

[0013] The determination module is used to determine the throughput of the active secondary users based on the resource allocation probability distribution and the random access index set;

[0014] The judgment module is used to determine whether the throughput of the active secondary user is the maximum throughput;

[0015] The processing module is used to determine, if the throughput is not the maximum throughput, the maximum throughput of the active secondary user through an evolutionary game strategy based on the throughput of the active secondary user;

[0016] The transmission module is used to allocate resource blocks for data transmission based on the maximum throughput of the active secondary users.

[0017] Furthermore, to achieve the above objectives, the present invention also proposes a data transmission device for a cognitive non-orthogonal multiple access network, the device comprising: a memory, a processor, and a data transmission program for a cognitive non-orthogonal multiple access network stored in the memory and executable on the processor, the data transmission program for a cognitive non-orthogonal multiple access network being configured to implement the steps of the data transmission method for a cognitive non-orthogonal multiple access network as described above.

[0018] Furthermore, to achieve the above objective, the present invention also proposes a storage medium storing a data transmission program for a cognitive non-orthogonal multiple access network. When the data transmission program is executed by a processor, it implements the steps of the data transmission method for a cognitive non-orthogonal multiple access network as described above.

[0019] This invention first obtains the random access index set of active secondary users under different power levels and determines the resource block allocation probability distribution of the active secondary users. Then, based on the resource allocation probability distribution and the random access index set, it determines the throughput of the active secondary users. Next, it determines whether that throughput is the maximum throughput. If not, it determines the maximum throughput of the active secondary users through an evolutionary game strategy based on their throughput. Finally, it allocates resource blocks for data transmission according to the maximum throughput of the active secondary users. Existing Orthogonal Multiple Access (OMA) technology can easily separate the information carried by different user signals with low complexity, but the number of users it supports is limited by the number of available orthogonal resources, so it cannot connect more users within limited resources and fails to meet the requirements of high spectral efficiency and large-scale connection. By contrast, this invention uses cognitive non-orthogonal multiple access network technology, determines the maximum throughput of active secondary users through an evolutionary game strategy, and then allocates resource blocks for data transmission, enabling future wireless networks to meet the demand for massive connections while achieving high spectral efficiency.

Attached Figure Description

[0020] Figure 1 is a schematic diagram of the structure of a data transmission device for a cognitive non-orthogonal multiple access network in the hardware operating environment of an embodiment of the present invention;

[0021] Figure 2 is a flowchart of the first embodiment of the data transmission method for a cognitive non-orthogonal multiple access network according to the present invention;

[0022] Figure 3 is a comparison diagram of orthogonal multiple access and non-orthogonal multiple access in the first embodiment of the data transmission method for a cognitive non-orthogonal multiple access network according to the present invention;

[0023] Figure 4 is a resource allocation diagram for primary and secondary users in a cognitive power-domain non-orthogonal multiple access network, in the first embodiment of the data transmission method according to the present invention;

[0024] Figure 5 is a diagram of the cognitive network resource allocation evolutionary game framework combined with single-step reinforcement learning, in the first embodiment of the data transmission method for a cognitive non-orthogonal multiple access network according to the present invention;

[0025] Figure 6 is a schematic diagram of the overall system structure of the first embodiment of the data transmission method for a cognitive non-orthogonal multiple access network according to the present invention;

[0026] Figure 7 is a structural block diagram of the first embodiment of the data transmission system for a cognitive non-orthogonal multiple access network according to the present invention.

[0027] The realization of the objective, functional features and advantages of the present invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0028] It should be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the scope of the invention.

[0029] Referring to Figure 1, Figure 1 is a schematic diagram of the structure of a data transmission device for a cognitive non-orthogonal multiple access network in the hardware operating environment of the embodiment of the present invention.

[0030] As shown in Figure 1, the data transmission device of this cognitive non-orthogonal multiple access network may include: a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to implement communication between these components. The user interface 1003 may include a display screen or an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface or a wireless interface. The network interface 1004 may optionally include a standard wired interface or a wireless interface (such as a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be high-speed random access memory (RAM) or stable non-volatile memory (NVM), such as a disk drive. The memory 1005 may also optionally be a storage system independent of the aforementioned processor 1001.

[0031] Those skilled in the art will understand that the structure shown in Figure 1 does not constitute a limitation on the data transmission device of a cognitive non-orthogonal multiple access network, which may include more or fewer components than shown, combine certain components, or have a different component arrangement.

[0032] As shown in Figure 1, the memory 1005, which serves as a storage medium, may include an operating system, a network communication module, a user interface module, and a data transmission program for cognitive non-orthogonal multiple access networks.

[0033] In the data transmission device of the cognitive non-orthogonal multiple access network shown in Figure 1, the network interface 1004 is mainly used for data communication with the network server, and the user interface 1003 is mainly used for data interaction with the user. The processor 1001 and the memory 1005 of the present invention can be arranged in the data transmission device of the cognitive non-orthogonal multiple access network. The device calls the data transmission program of the cognitive non-orthogonal multiple access network stored in the memory 1005 through the processor 1001 and executes the data transmission method of the cognitive non-orthogonal multiple access network provided in the embodiment of the present invention.

[0034] This invention provides a data transmission method for a cognitive non-orthogonal multiple access network. Referring to Figure 2, Figure 2 is a flowchart of the first embodiment of the data transmission method for a cognitive non-orthogonal multiple access network according to the present invention.

[0035] In this embodiment, the data transmission method of the cognitive non-orthogonal multiple access network includes the following steps:

[0036] Step S10: Obtain the random access index set of active secondary users under different power levels, and determine the resource block allocation probability distribution of the active secondary users.

[0037] It is easy to understand that the execution subject of this embodiment can be a data transmission device of a cognitive non-orthogonal multiple access network with functions such as data processing, network communication and program execution, or other computer devices with similar functions. This embodiment does not limit it.

[0038] This invention addresses the coexistence of primary and secondary users in multi-channel networks under power-domain non-orthogonal multiple access, where a base station and multiple devices share multiple channels (resource blocks). One group of devices consists of primary users with low duty cycles, while the other group consists of secondary users with greater cognitive capabilities. Secondary users are dynamically allocated resource blocks and transmit at higher power than primary users to avoid conflicts with them in the power domain. Primary users, which have poor radio frequency capabilities and flexibility, are allocated fixed resource blocks for communication with the base station; secondary users are more flexible and dynamically select resource blocks based on learned allocation probabilities. Primary and secondary users are allocated resource blocks at different power levels.

[0039] In the radio access mode, the baseband unit pool provides centralized storage and communication. Access points are connected to the baseband unit pool via fronthaul links. Data is distributed as edge caches in access points and user equipment. Not only are caches allocated to access points, but radio signal processing and resource management are also performed locally. Resource allocation orchestration consists of acquisition, control, decision-making, and distribution, and interacts with the cognitive network through an application programming interface (API).

[0040] It should also be noted that dividing resource blocks under different power levels and introducing the cognitive non-orthogonal multiple access acquisition and control mode works as follows. The system consists of a base station and multiple devices, which are divided into two groups: one group consists of primary users with low duty cycles or access probabilities, and the other group consists of secondary users with greater cognitive capabilities. Resource blocks are dynamically allocated to secondary users, and the secondary users' transmit power is higher than that of the primary users. As shown in Figure 3, which compares orthogonal multiple access (OMA) and non-orthogonal multiple access (NOMA) in the first embodiment of the data transmission method according to the present invention, NOMA differs from OMA in that multiple users share the same time or frequency resources but are assigned different codes or power levels, and are separated at the receiver using successive interference cancellation. Secondary users transmit at higher power than primary users so that power-domain NOMA avoids conflicts with the primary users. The method proposed in this invention achieves higher throughput, at the cost of higher transmit power for secondary users.

[0041] Primary users, which have poor radio frequency capabilities and limited flexibility, communicate with the base station over fixed resource blocks; secondary users are more flexible and can dynamically select resources based on learned resource allocation probabilities. In the resource allocation shown in Figure 3, resource blocks with different power levels are allocated to secondary users and fixed primary users. Fog radio access is used, and the control plane is not located in the baseband unit pool of the cloud radio access network. The baseband unit pool provides centralized storage and communication, and access points are connected to the baseband unit pool via fronthaul links. Unlike centralized data storage, data is distributed as edge caches across access points and devices. Not only are caches allocated to access points, but radio signal processing and resource management are also performed locally at the access points. Devices connect directly to access points to obtain content, without establishing complex transmission links with the core network.

[0042] The resource allocation orchestrator consists of acquisition, control, decision-making, and distribution modules, and interacts with the cognitive network through an application programming interface (API): ① the acquisition module collects network information such as interference and load; ② the control module interacts with the decision-making module to obtain decision results, including specific coordination schemes and base station information; ③ the distribution module transmits the coordination scheme and base station control messages to the cognitive network; ④ the cognitive network executes the orchestrator's commands. The radio resource control sublayer of the cognitive network executes the control messages received from the orchestrator and sends internal control messages to the physical sublayer to indicate which strategy to use.

[0043] Furthermore, the processing method for obtaining the random access index set of active secondary users under different power levels is as follows: determine the resource blocks under different power levels, obtain the active primary user signal and active secondary user signal of the resource block, determine the resource block receiving signal based on the active primary user signal and active secondary user signal, decode the resource block based on the resource block receiving signal, and when the resource block decoding fails, obtain the total number of secondary users and the access probability of active secondary users, determine the number of active secondary users based on the active secondary user index set of the resource block, determine the average number of active secondary users based on the number of active secondary users, the total number of secondary users and the access probability of active secondary users, and establish the random access index set of active secondary users based on the average number of active secondary users.

[0044] In the specific implementation, active primary and secondary users each have an index set and transmit signals. Under different power levels, the received signal on each channel (resource block) and the signal-to-interference-plus-noise ratio (SINR) of the secondary users are obtained. Secondary user signals are decoded per resource block. When there is at most one active primary user on a resource block, the primary user signal can also be decoded after successive interference cancellation. Packet collisions between multiple secondary users cause decoding of the primary user on the same resource block to fail; therefore, the probability of packet collisions among secondary users is set sufficiently low. The average number of active secondary users is lower than the number of resource blocks, and the number of active primary users is a binomial random variable. When the total number of secondary users is large and the access probability of each independent secondary user is low, the number of active secondary users is a Poisson random variable.

[0045] In this embodiment, Figure 4 shows the resource allocation for primary and secondary users supported by a cognitive power-domain non-orthogonal multiple access network in the first embodiment of the data transmission method according to the present invention. Under power-domain non-orthogonal multiple access, there are multiple resource blocks (channels). Primary users and secondary users transmit at separate power levels, and each resource block has an active primary user index set and an active secondary user index set. Given the signals of the active primary users and the active secondary users on a resource block, the signal received on that resource block (the resource block received signal) is:

[0046]

[0047] In the formula, the background noise is Gaussian distributed with a given variance, and the desired received signal levels are set by the power levels. The secondary user signal-to-interference-plus-noise ratio (SINR) is then:

[0048]

[0049] Here, the secondary user signal on a resource block is decoded successfully when its SINR reaches the decoding threshold. If there is at most one active primary user on the same resource block, the secondary user signal can be decoded, and the active primary user signal can then also be decoded through successive interference cancellation.
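
The decoding rule in this paragraph can be illustrated with a minimal Python sketch. The formulas themselves are not reproduced in this text, so the SINR expression, the unit channel gains, and all parameter names (`p_primary`, `p_secondary`, `noise_var`, `threshold`) are assumptions made for illustration, not the patent's own definitions.

```python
def decode_resource_block(n_primary, n_secondary, p_primary, p_secondary,
                          noise_var, threshold):
    """Return (secondary_decoded, primary_decoded) for one resource block.

    Assumed model (not taken from the patent text): unit channel gains,
    secondary users transmit at the higher power level, and a signal is
    decodable when its SINR meets `threshold`.
    """
    secondary_ok = False
    primary_ok = False

    if n_secondary == 1 and n_primary <= 1:
        # The primary signal acts as interference while decoding the secondary.
        sinr_secondary = p_secondary / (n_primary * p_primary + noise_var)
        secondary_ok = sinr_secondary >= threshold

    if n_primary == 1:
        if n_secondary == 0:
            # No secondary user present: decode the primary against noise only.
            primary_ok = p_primary / noise_var >= threshold
        elif secondary_ok:
            # Successive interference cancellation: the decoded secondary
            # signal is subtracted, leaving the primary against noise only.
            primary_ok = p_primary / noise_var >= threshold

    return secondary_ok, primary_ok
```

With one primary and one secondary user the sketch decodes both signals, while two colliding secondary users block the primary user as well, which matches the collision behavior described in the text.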

[0050] When data packets from multiple secondary users collide within a resource block, decoding of the primary user on the same resource block fails. The probability of packet collisions among secondary users is therefore set to be very low. The number of primary users accessing through a resource block (the primary user access quantity), out of the total number of primary users, is:

[0051]

[0052] In the formula, the two probabilities are the access probabilities of independent active primary users and active secondary users, i.e., the active primary user access probability and the active secondary user access probability. The primary user access quantity follows a binomial distribution with these parameters, and its probability is:

[0053]

[0054] Here, the total number of secondary users is finite, the average number of active secondary users defines the random access index set of active secondary users, and the number of active secondary users is:

[0055]

[0056] When the total number of secondary users is large and the access probability of each active secondary user is low, the number of active secondary users is approximated as a Poisson random variable. The Poisson distribution is invariant under random selection: if each active user independently selects a resource block, the number of users on each resource block is again Poisson distributed.
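
The Poisson approximation mentioned here can be checked numerically. The following sketch compares the binomial distribution of the number of active users with a Poisson distribution of the same mean; the function names and the example values for the user count and access probability are illustrative assumptions.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(K = k) when each of n users is active independently with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mean):
    """P(K = k) for a Poisson random variable with the given mean."""
    return exp(-mean) * mean**k / factorial(k)


def max_pmf_gap(n, p, k_max=20):
    """Largest pointwise gap between Binomial(n, p) and Poisson(n * p)."""
    mean = n * p
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, mean))
        for k in range(min(n, k_max) + 1)
    )
```

For the same mean of 2 active users, `max_pmf_gap(1000, 0.002)` is far smaller than `max_pmf_gap(10, 0.2)`, illustrating why the approximation is applied when the user population is large and the access probability is low.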

[0057] Furthermore, the processing method for determining the resource block allocation probability distribution of active secondary users is as follows: obtain the active primary user access probability, primary user access quantity, and total primary user quantity of the resource block; based on the secondary user signal-to-interference-plus-noise ratio, obtain the active secondary user successful transmission probability according to the active primary user access probability, primary user access quantity, active secondary user access probability, and active secondary user index set; determine the successful transmission probability of resource block allocation according to the active secondary user successful transmission probability and the number of active secondary users; and determine the active secondary user resource block allocation probability distribution according to the successful transmission probability of resource block allocation.

[0058] In the implementation, a fixed resource block is allocated to primary users, while secondary users choose from multiple resource blocks. Secondary users employ a random access strategy based on the probability distribution of their chosen resource block selection. System performance changes according to the secondary user's access strategy. The goal of allocating resource blocks to secondary users is to ensure that at most one primary user transmits while no other secondary users transmit. Fog radio access exhibits distributed characteristics and employs an optimal access strategy notified by the base station. The maximum throughput of secondary users is considered Pareto optimal under ideal resource allocation conditions.

[0059] Based on the secondary user signal-to-interference-plus-noise ratio (SINR), the probability of a secondary user successfully transmitting data packets depends on the number of active primary users in the resource allocation. The conditional probability of a secondary user successfully transmitting data packets after being allocated a resource block is obtained, and the resource allocation distribution is derived based on the probability of the resource block being selected.

[0060] In this embodiment, secondary users need to select from resource blocks when sending data packets. Each secondary user has an access policy, which represents the probability distribution of allocating a particular resource block. System performance changes based on the secondary user's access policy. When selecting a resource block, a secondary user aims for at most one primary user to transmit without other secondary users transmitting. Due to the distributed nature of fog radio access, secondary users are unaware of the number of primary and other secondary users on the resource block. The base station is aware of the environment (i.e., the number and activity of primary and secondary users on the resource block) and can find the optimal access policy. The secondary user then adopts the optimal access policy notified by the base station.

[0061] Under ideal resource allocation conditions, the maximum throughput of secondary users in power-domain non-orthogonal multiple access can be regarded as Pareto optimal. Based on the signal-to-interference-plus-noise ratio (SINR) of the secondary users, the probability that one and only one active secondary user transmits successfully, i.e., the active secondary user successful transmission probability, is:

[0062]

[0063] This probability also depends on the number of active primary users on the resource block. For each active secondary user allocated a resource block, the conditional probability of successful transmission (the successful transmission probability of resource block allocation) is:

[0064]

[0065] Here, each resource block has a probability of being selected, and the vector of these probabilities is the resource allocation probability distribution of active secondary users.
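
As a hedged illustration of how a selection distribution maps to a success probability, the sketch below assumes a model that the text only describes in words: the number of active secondary users is Poisson with mean `lam`, each picks resource block `l` with probability `q[l]`, and `m[l]` primary users fixed to block `l` are each active independently with probability `p_primary`. A tagged secondary user succeeds when no other secondary user picks its block and at most one primary user is active there. All names and the closed forms are assumptions, not the patent's formulas.

```python
from math import exp


def prob_at_most_one_primary(m_primary, p_primary):
    """P(at most one of m_primary users is active), binomial activity."""
    if m_primary == 0:
        return 1.0
    none_active = (1 - p_primary) ** m_primary
    one_active = m_primary * p_primary * (1 - p_primary) ** (m_primary - 1)
    return none_active + one_active


def conditional_success(q_l, lam, m_primary, p_primary):
    """Success probability given that the tagged user chose this block.

    Under the Poisson assumption the number of *other* secondary users on
    the block is Poisson with mean lam * q_l, so it is empty w.p. exp(-lam*q_l).
    """
    return exp(-lam * q_l) * prob_at_most_one_primary(m_primary, p_primary)


def overall_success(q, lam, m, p_primary):
    """Success probability averaged over the selection distribution q."""
    return sum(
        q_l * conditional_success(q_l, lam, m_l, p_primary)
        for q_l, m_l in zip(q, m)
    )
```

For example, `prob_at_most_one_primary(2, 0.5)` evaluates to 0.75, since only the case where both primary users are active (probability 0.25) is excluded.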

[0066] Step S20: Determine the throughput of the active secondary users based on the resource allocation probability distribution and the random access index set.

[0067] Furthermore, the processing method for determining the throughput of active secondary users based on the resource allocation probability distribution and the random access index set is as follows: determine the throughput of active secondary users based on the resource allocation probability distribution, the random access index set, the number of active secondary users, and the probability of successful transmission of active secondary users.

[0068] In practical implementation, to decode the active primary user signal, it is necessary to first decode the active secondary user signal and perform successive interference cancellation; this yields the secondary user throughput and the primary user throughput. Cognitive secondary users opportunistically access the channel without causing serious interference to the less capable primary users. The optimal solution to the problem is the throughput-optimal resource allocation distribution. The base station finds the optimal resource allocation distribution using complete environmental information and transmits it to the secondary users. The throughput-optimal selection distribution is the Pareto optimal solution.

[0069] In this embodiment, to decode the active primary user signals in a resource block, it is necessary to decode all active secondary user signals and perform successive interference cancellation. The secondary user throughput is:

[0070]

[0071] The primary user throughput is:

[0072]

[0073]

[0074] Under random access, as long as the probability of secondary user packet collisions is low enough, the primary user throughput does not depend on the secondary users' access strategy. This is the ideal result for secondary users opportunistically accessing the channel, ensuring they do not interfere with the less capable primary users. This leads to the following resource allocation optimization:

[0075]

[0076] The base station finds the optimal resource allocation distribution using complete environmental information and sends it to the secondary users. The secondary user throughput is a component of the system throughput, and the throughput-optimal distribution is the Pareto optimal solution.
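
To make the optimization concrete, here is a minimal sketch of a throughput calculation and a brute-force search over selection distributions for two resource blocks. It assumes, as an illustration only, that the number of active secondary users is Poisson with mean `lam` and that `c[l]` is the probability that at most one primary user is active on block `l`; the throughput expression and all names are hypothetical and are not taken from the patent's formulas.

```python
from math import exp


def secondary_throughput(q, lam, c):
    """Expected number of successful secondary transmissions per slot.

    Assumed model: the number of secondary users on block l is Poisson with
    mean lam * q[l]; a block succeeds when exactly one secondary user is on
    it (probability lam*q[l]*exp(-lam*q[l])) and the primary condition c[l] holds.
    """
    return sum(c_l * lam * q_l * exp(-lam * q_l) for q_l, c_l in zip(q, c))


def best_two_block_distribution(lam, c, steps=100):
    """Grid search over (x, 1 - x) for the throughput-maximizing distribution."""
    candidates = [(i / steps, 1 - i / steps) for i in range(steps + 1)]
    return max(candidates, key=lambda q: secondary_throughput(q, lam, c))
```

In this toy model, two identical blocks are best served by the uniform distribution, whereas a block with a worse primary user condition receives a smaller share, for example with `lam = 1.0` and `c = [1.0, 0.5]`. A base station with complete environmental information could run such a search centrally; a grid search is used here only for clarity.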

[0077] Step S30: Determine whether the throughput of the active secondary user is the maximum throughput.

[0078] Furthermore, it is determined whether the throughput of the active secondary user is the maximum throughput. If so, the throughput of the active secondary user is taken as the maximum throughput of the active secondary user, and resource blocks are allocated for data transmission based on the maximum throughput of the active secondary user.

[0079] Step S40: If not, determine the maximum throughput of the active secondary user through an evolutionary game strategy based on the throughput of the active secondary user.

[0080] Furthermore, the maximum throughput of the active secondary users is determined from their throughput through the evolutionary game strategy as follows. Based on the resource block allocation probability distribution, a mixed strategy is selected and the average reward of the secondary user's mixed strategy is determined. It is then determined whether this average reward satisfies the Nash equilibrium condition and the stability condition. If not, the maximum resource block selection reward is determined through the evolutionary game strategy from the throughput of the active secondary users and the average reward of the mixed strategy, and the maximum throughput of the active secondary users is determined from that maximum resource block selection reward. If so, the maximum throughput of the active secondary users is determined from the average reward of the mixed strategy, and resource blocks are allocated for data transmission based on that maximum throughput.

[0081] The maximum resource block selection reward is determined through the evolutionary game strategy, from the throughput of the active secondary users and the average reward of the secondary user's mixed strategy, as follows. Based on the evolutionary game strategy, the resource allocation evolutionary stable strategy and the resource allocation replicator dynamics are determined through the Nash equilibrium and stability conditions, using the average reward of the mixed strategy and the resource allocation reward of the active secondary users. Single-step reinforcement learning is then established on the basis of the evolutionary stable strategy and the replicator dynamics. Finally, using the single-step reinforcement learning and the resource allocation distribution strategy, the maximum resource block selection reward is determined from the throughput of the active secondary users.

[0082] In this embodiment, refer to Figure 5. Figure 5 is a diagram of the cognitive network resource allocation evolutionary game framework combined with single-step reinforcement learning, according to the first embodiment of the data transmission method for cognitive non-orthogonal multiple access networks of the present invention. In the evolutionary game, obtaining the population state and average reward of the secondary users involves the following. A non-cooperative game is played among the secondary users, in which each secondary user chooses the strategy that maximizes its reward or throughput; this applies in particular to secondary users with massive connection demands. Some secondary users may deviate from the rules and improve their own performance at the expense of other secondary users, which is selfish behavior. In a distributed fog radio access environment, a secondary user may want to maximize its own reward by not following the given rules if this selfish behavior gives it better performance than other secondary users. Therefore, non-cooperative game theory is needed to characterize performance.

[0083] Because the number of secondary users is large, evolutionary game theory has advantages over traditional non-cooperative game theory for secondary users competing for multi-channel resources: ① the evolutionary stable strategy is a refinement of the Nash equilibrium (a Nash equilibrium is not necessarily efficient, and a game may have multiple Nash equilibria or none); ② evolutionary game theory does not require strong rationality assumptions, because it models the secondary users' behavior directly; ③ evolutionary game theory is based on an evolutionary process and is inherently dynamic, capturing how secondary users change strategies and reach equilibrium over time.

[0084] The large population of players represents the massive number of active secondary users, who share a common set of strategies (actions). The number of resource blocks can be viewed as the number of actions (channels, resource blocks, arms). The population state is a vector whose entries are the proportions of the population selecting each action, subject to the constraint:

[0085]

[0086] The number of secondary users is finite, and each secondary user chooses a mixed strategy, whose entries are the probabilities with which the player selects each action. Given the reward (fitness) of a player that selects an action when the population is in a given state, the average reward of a player with a mixed strategy is:

[0087]

[0088] Here the mixed strategy lies in the simplex, and the expression gives the average reward of the secondary user's mixed strategy.

[0089] If there exists a population state for which the required inequality is satisfied against all other population states, that population state is an evolutionary stable state. The evolutionary stable state decomposes into two conditions on the reward vector: ① the Nash equilibrium condition and ② the stability condition. The Nash equilibrium condition is equivalent to a variational inequality. An evolutionary stable state satisfies both conditions for all population states.

[0090] Statically establishing the resource allocation evolutionary stable strategy, in combination with the "mutation" mechanism of evolutionary game theory, involves the following. Evolutionary game theory is the application of game theory to biological evolution. Its two major mechanisms are mutation and selection: (1) "mutation" modifies the characteristics of secondary users (analogous to individual genes, that is, player strategies) and introduces secondary users with new characteristics into the population; (2) "selection" retains secondary users with high fitness while eliminating those with low fitness. In evolutionary games, "mutation" is described by the evolutionary stable strategy from the perspective of a static system, and "selection" is described by the replicator dynamics from the perspective of a dynamic system. Secondary users, as players in the game, select resource blocks according to a certain strategy to maximize rewards. The resource allocation game is an evolutionary game, and the evolutionary stable state is taken as the optimal strategy (in place of the Nash equilibrium). In the evolutionary stable state, each player has a strategy and no single secondary user has a unilateral incentive to change it. Finding the evolutionary stable state depends on the reward function. Below, the unique evolutionary stable state of the resource allocation game is established and a method for finding it is designed.

[0091] When a player selects a resource block, the player's reward is the probability of successfully transmitting its data packet. Taking the expectation, the reward of an active secondary user that selects a given resource block is:

[0092]

[0093] In the formula, the reward is a strictly decreasing function. When a mixed strategy is adopted, the average reward is:

[0094]

[0095] The resource allocation game is a non-cooperative game, so the throughput obtained may be smaller than the maximum value of the optimization problem. The resource allocation distribution reached in the evolutionary game differs from the optimum of the optimization problem; the performance gap caused by the non-cooperative nature is the price of anarchy.

[0096] To find the evolutionary stable state for the resource block reward function, define the virtual objective function:

[0097]

[0098]

[0099] Meanwhile, consider the gradient vector of the virtual objective function. Since the reward is a decreasing function, its derivative is negative, so the Hessian matrix is a diagonal matrix with negative diagonal elements:

[0100]

[0101] The virtual objective function is concave and the simplex is a convex set; therefore, the optimization problem is a convex optimization problem with a unique solution. Furthermore, the Nash equilibrium condition is the necessary and sufficient condition for a population state to be the optimal solution, so the optimal solution is the only state that satisfies the Nash equilibrium condition. Because the reward is a strictly decreasing function, the following holds:

[0102]

[0103] Therefore, the stability condition holds.

[0104]

[0105] The solution to the above formula is a unique evolutionary stable state, meaning that the evolutionary stable state of the resource allocation game exists and is unique.

[0106] Dynamically establishing the resource allocation replicator dynamics, in combination with the "selection" mechanism of evolutionary game theory, involves the following. Evolutionary game theory is well suited to explaining competition and cooperation among many agents. The replicator dynamics of evolutionary game theory are given by differential equations. To find the evolutionary stable state of the resource allocation game, the virtual objective function is considered. Direct convex optimization is computationally expensive, so a low-complexity method needs to be designed.
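As a rough illustration of how replicator dynamics move a population state toward a rest point, the following minimal sketch integrates the textbook replicator equation (each proportion grows at a rate equal to its own reward minus the population-average reward) with Euler steps. The exponential reward function, the per-block primary-user activity values in `PRIMARY_ACTIVITY`, and the `LOAD` parameter are assumptions chosen only for illustration; they are not the reward function or the differential equation of this embodiment.

```python
import math

# Hypothetical per-block probability that the primary user is active.
PRIMARY_ACTIVITY = [0.1, 0.3, 0.5]
# Hypothetical secondary-user load; larger values mean more collisions.
LOAD = 3.0


def payoffs(x):
    """Assumed reward of each block: strictly decreasing in its own share x[l]."""
    return [(1.0 - q) * math.exp(-LOAD * xl) for q, xl in zip(PRIMARY_ACTIVITY, x)]


def replicator_step(x, dt=0.05):
    """One Euler step of dx_l/dt = x_l * (f_l(x) - average reward)."""
    f = payoffs(x)
    avg = sum(xl * fl for xl, fl in zip(x, f))
    x_new = [xl + dt * xl * (fl - avg) for xl, fl in zip(x, f)]
    total = sum(x_new)  # renormalize to guard against rounding drift
    return [xl / total for xl in x_new]


def run_replicator(x0, steps=5000):
    """Iterate the Euler step from an interior starting state."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x)
    return x


if __name__ == "__main__":
    state = run_replicator([0.2, 0.3, 0.5])
    print("population state:", [round(v, 4) for v in state])
    print("block rewards:   ", [round(v, 4) for v in payoffs(state)])
```

Under these assumptions the rewards of all blocks that keep a positive share become equal at the rest point, which is the property that the later paragraphs exploit to avoid integrating the differential equation.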

[0107] The stable state is found through the replicator dynamics, given by the differential equation:

[0108]

[0109] Lyapunov function:

[0110]

[0111] This is used to find the evolutionary stable state. The rest point of the dynamics is the evolutionary stable state, which follows from the stability condition. Lyapunov stability, one of the stability properties of solutions to differential equations, describes the stability of solutions near an equilibrium point. To avoid solving the differential equations of the replicator dynamics, the evolutionary stable state must be found directly.

[0112]

[0113] Over time ,make , return As a necessary condition for the evolutionary stable state.

[0114] To find all solutions to the above necessary conditions, let ,

[0115]

[0116]

[0117] Without loss of generality, ,because It is a decreasing function, and its inverse function It is also a decreasing function. For a given... ,have .make , Solve under constraints to find satisfy:

[0118]

[0119]

[0120]

[0121] The solution of the formula corresponds to the evolutionary stable state. The throughput-optimal resource allocation distribution maximizes secondary user throughput, but it is neither a Nash equilibrium nor an evolutionary stable state. Secondary users therefore seek to deviate from the throughput-optimal resource allocation distribution to maximize their own rewards, which leads to the tragedy of the commons. This is because the evolutionary stable state maximizes the virtual objective function but does not maximize the throughput of the secondary users.
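The idea of using the inverse of a decreasing reward function to locate the stable state without integrating a differential equation can be sketched as follows. This is a minimal sketch under assumptions: the exponential reward, the `PRIMARY_ACTIVITY` values and `LOAD` are hypothetical, and the bisection search over a common reward level is one simple way to enforce the simplex constraint, not necessarily the procedure of this embodiment.

```python
import math

# Hypothetical per-block probability that the primary user is active.
PRIMARY_ACTIVITY = [0.1, 0.3, 0.5]
# Hypothetical secondary-user load parameter.
LOAD = 3.0


def reward(block, share):
    """Assumed reward: strictly decreasing in the share of users on the block."""
    return (1.0 - PRIMARY_ACTIVITY[block]) * math.exp(-LOAD * share)


def inverse_reward(block, level):
    """Share that yields the given reward level, clipped at zero.

    The inverse of a decreasing function is also decreasing, so the total
    share over all blocks decreases as the common reward level rises.
    """
    share = math.log((1.0 - PRIMARY_ACTIVITY[block]) / level) / LOAD
    return max(0.0, share)


def equalizing_state(iterations=100):
    """Bisection on the common reward level until the shares sum to one."""
    blocks = range(len(PRIMARY_ACTIVITY))
    low, high = 1e-12, max(1.0 - q for q in PRIMARY_ACTIVITY)
    for _ in range(iterations):
        level = 0.5 * (low + high)
        total = sum(inverse_reward(b, level) for b in blocks)
        if total > 1.0:
            low = level   # shares too large: the reward level must rise
        else:
            high = level  # shares too small: the reward level must fall
    level = 0.5 * (low + high)
    return level, [inverse_reward(b, level) for b in blocks]


if __name__ == "__main__":
    level, shares = equalizing_state()
    print("common reward level:", round(level, 4))
    print("population state:   ", [round(s, 4) for s in shares])
```

With these illustrative numbers every block keeps a positive share, so all three rewards coincide at the returned level; a block whose best possible reward lies below that level would be clipped to a share of zero.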

[0122] Establishing dynamic single-step reinforcement learning that maximizes the resource allocation reward involves the following. The single-step reinforcement learning task corresponds to the multi-armed bandit problem, which can be solved using Thompson sampling (a Monte Carlo sampling method for finding the arm with the maximum reward). Thompson sampling first assumes that the reward of pulling each arm follows a specific probability distribution and then selects which arm to pull based on the reward expected for each arm. In the multi-armed bandit problem, a person faces multiple slot machines without knowing the actual payout of each machine beforehand, and must decide, based on the results of previous plays, which machine to play next or whether to stop, so as to maximize the overall reward. The name comes from the fact that a slot machine has a lever (arm) and that playing it often empties the player's pockets. The multi-armed bandit is used by secondary users to reduce collisions and improve system throughput through learning. Two different device types are considered in the design: on the one hand, primary users are inflexible in their use of radio frequencies, and each primary user must transmit through a predetermined channel; on the other hand, secondary users are flexible and can dynamically select channels. Secondary users learn the channel environment, including the activity level of primary users in each channel.

[0123] A single-step reinforcement learning task is used for each secondary user to select among resource blocks. Each secondary user has several arms (resource blocks). If the rewards for choosing an arm were independent and identically distributed across rounds, each user could learn the average reward of every arm and, after multiple rounds, select the optimal arm from the estimated average rewards. However, since multiple secondary users interact within the resource blocks, the rewards for successful transmissions are not independent and identically distributed. The single-step reinforcement learning task is therefore combined with Thompson sampling, which chooses the arm with the highest probability of being the optimal arm and ensures asymptotically optimal cumulative regret. A beta distribution describes the probability distribution of the reward of each action: every arm of every secondary user has a beta distribution with two shape parameters. At each time slot, if the secondary user is active, it selects the arm as follows:

[0124]

[0125]

[0126] The shape parameters are initialized under a uniform prior. After each round, the secondary user receives feedback indicating whether its transmission through the selected resource block was successful or unsuccessful, and the shape parameters are updated accordingly:

[0127]

[0128] If the secondary user is inactive, the shape parameters are not updated. In this way, each user selects the most suitable resource block without conflict.
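The Thompson sampling procedure with beta-distributed rewards described above can be sketched as follows. This is a minimal sketch under assumptions: the class name `ThompsonArmSelector`, the function `simulate`, the fixed per-block success probabilities and the activity probability are hypothetical, and the update rule shown (add one to the first shape parameter on success, one to the second on failure) is the standard Beta-Bernoulli update, which may differ in detail from the update equation of this embodiment. A stationary environment is assumed for simplicity, whereas the text notes that rewards are not independent and identically distributed when many secondary users interact.

```python
import random


class ThompsonArmSelector:
    """Beta-Bernoulli Thompson sampling over resource blocks (arms)."""

    def __init__(self, num_blocks, rng):
        # Shape parameters (1, 1) give a uniform prior on each arm.
        self.alpha = [1.0] * num_blocks
        self.beta = [1.0] * num_blocks
        self.rng = rng

    def select(self):
        """Sample one value per arm from its beta posterior; pick the largest."""
        samples = [self.rng.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, block, success):
        """Update the shape parameters from the success/failure feedback."""
        if success:
            self.alpha[block] += 1.0
        else:
            self.beta[block] += 1.0


def simulate(success_probs, rounds, activity_prob, seed):
    """Run one secondary user against assumed fixed success probabilities."""
    rng = random.Random(seed)
    selector = ThompsonArmSelector(len(success_probs), rng)
    pulls = [0] * len(success_probs)
    active_rounds = 0
    for _ in range(rounds):
        if rng.random() >= activity_prob:
            continue  # inactive user: shape parameters stay unchanged
        active_rounds += 1
        block = selector.select()
        pulls[block] += 1
        selector.update(block, rng.random() < success_probs[block])
    return selector, pulls, active_rounds


if __name__ == "__main__":
    _, pulls, active = simulate([0.1, 0.3, 0.9], rounds=3000,
                                activity_prob=0.5, seed=7)
    print("active rounds:", active, "pulls per block:", pulls)
```

Sampling from the posterior, rather than always picking the arm with the best current estimate, is what lets the user keep exploring blocks whose reward is still uncertain.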

[0129] Secondary users find a mixed strategy, using replicator dynamics to maximize rewards. The replicator dynamics use base station feedback at each time slot to update the mixed strategy. When a secondary user is active and selects a resource block, the base station provides feedback. In the replicator dynamics, the reward of the selected action serves as the instantaneous reward. Based on the feedback, the instantaneous reward of the secondary user for the selected resource block is obtained:

[0130]

[0131] This user A set of time-slice indexes that are currently active. Active secondary users. Resource allocation distribution

[0132]

[0133] Here the update uses a step size. In addition, an indicator function is used to normalize the instantaneous reward:

[0134]

[0135] This ensures that the normalized reward stays within the required interval.

[0136] The replicator dynamics are combined with the single-step reinforcement learning task: based on the resource allocation distribution strategy, an arm (resource block) is selected and the reward is updated in real time. In the distributed fog radio access environment, the procedure relies on base station feedback: ① for all arms (resource blocks), the resource allocation distribution of the secondary user is obtained; ② in the time-slot loop, if the secondary user is active, it selects a resource block according to its resource allocation distribution; ③ the active secondary user receives feedback from the base station at the end of the time slot; ④ the active secondary user updates the reward according to the feedback:

[0137]

[0138] The resource allocation distribution is then updated according to the reward:

[0139]

[0140]

[0141] The next transmission then proceeds: at the next time slot, the procedure jumps back to step ② and enters the next iteration of the loop. If the secondary user is inactive, its resource allocation distribution remains unchanged.
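The four-step loop above (obtain a distribution, select a block when active, receive feedback, update) can be sketched with a simple stochastic update whose expected motion follows replicator dynamics. This is a minimal sketch under assumptions: the update rule is Cross learning, a well-known reinforcement rule swapped in here for illustration because the update equations of this embodiment are not reproduced; the function names, the success probabilities, the step size and the 0/1 feedback are all hypothetical.

```python
import random


def select_block(dist, rng):
    """Step 2: draw a resource block according to the current distribution."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def update_distribution(dist, chosen, reward, step=0.05):
    """Step 4: move probability mass toward the chosen block, scaled by reward.

    The reward must lie in [0, 1]; with step * reward <= 1 the result is
    again a probability distribution.
    """
    return [p + step * reward * ((1.0 if i == chosen else 0.0) - p)
            for i, p in enumerate(dist)]


def run_user(success_probs, slots, activity_prob, step, seed):
    """Time-slot loop for one secondary user with assumed feedback statistics."""
    rng = random.Random(seed)
    num_blocks = len(success_probs)
    dist = [1.0 / num_blocks] * num_blocks        # step 1: initial distribution
    for _ in range(slots):
        if rng.random() >= activity_prob:
            continue                              # inactive: distribution unchanged
        block = select_block(dist, rng)           # step 2: select a block
        success = rng.random() < success_probs[block]
        reward = 1.0 if success else 0.0          # step 3: base station feedback
        dist = update_distribution(dist, block, reward, step)  # step 4
    return dist


if __name__ == "__main__":
    final = run_user([0.2, 0.5, 0.9], slots=4000, activity_prob=0.5,
                     step=0.05, seed=3)
    print("learned distribution:", [round(p, 3) for p in final])
```

Because the update only needs the user's own feedback in the slots where it transmitted, it fits the distributed setting described in the text, where no user sees the choices of the others.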

[0142] There is another reason to find the evolutionary stable state. The throughput produced when all secondary users cooperate and follow the same access rule provided by the base station is called the maximum throughput. Although the evolutionary stable state is also a Nash equilibrium and is optimal in the sense that no secondary user has an incentive to change its strategy, it is not necessarily Pareto optimal in terms of maximizing the overall achievable reward or throughput. For example, some secondary users may deviate from the given rules and sacrifice other secondary users to maximize their own reward. Therefore, the performance obtained at the evolutionary stable state not only serves as a benchmark for distributed fog radio access with competing secondary users but also reveals the "tragedy of the commons", that is, the gap between the maximum throughput and the throughput shared among competing secondary users. This tragedy occurs when individual secondary users pursue their own interests at the expense of overall spectrum allocation welfare, which leads to overconsumption and ultimately depletion of the shared spectrum resources, to the detriment of every secondary user.

[0143] Step S50: Allocate resource blocks for data transmission based on the maximum throughput of the active secondary users.

[0144] In this embodiment, refer to Figure 6. Figure 6 is a schematic diagram of the overall system structure of the first embodiment of the data transmission method for cognitive non-orthogonal multiple access networks of the present invention. The scheme is divided into seven steps: (1) dividing resource blocks under different power levels and introducing an artificially controllable non-orthogonal multiple access acquisition and control mode; (2) obtaining the random access index set of active secondary users under different power levels based on the signal-to-interference-plus-noise ratio; (3) determining the throughput based on the obtained resource allocation probability distribution of secondary users; (4) obtaining the population state and average reward of secondary users in the evolutionary game; (5) statically establishing the resource allocation evolutionary stable strategy in combination with the "mutation" mechanism of evolutionary game theory; (6) dynamically establishing the resource allocation replicator dynamics in combination with the "selection" mechanism of evolutionary game theory; (7) establishing single-step reinforcement learning on the replicator dynamics to maximize the resource allocation reward.

[0145] In step (1), the invention addresses the coexistence of primary and secondary users in a multi-channel network under power-domain non-orthogonal multiple access, where a base station and multiple devices share multiple channels (resource blocks). One group of devices consists of primary users with low duty cycles, while the other group consists of secondary users with greater cognitive capabilities. Secondary users select resource blocks dynamically and use a higher transmit power than primary users, so that they do not conflict with primary users in the power domain. Primary users, which have limited radio frequency capability and flexibility, are allocated fixed resource blocks for communication with the base station; secondary users are more flexible and select resource blocks dynamically based on the learned resource allocation probabilities. Primary and secondary users occupy resource blocks at different power levels.

[0146] In the radio access mode, the baseband unit (BBU) pool provides centralized storage and communication. Access points are connected to the BBU pool via fronthaul links. Data is distributed to edge caches in the access points and user equipment. Not only are caches placed at the access points, but radio signal processing and resource management are also performed locally. Resource allocation orchestration consists of acquisition, control, decision-making, and distribution, and interacts with the cognitive network through an application programming interface (API).

[0147] In step (2), active primary users and secondary users each have an index set and transmit signals. At the different power levels, the received signal of each channel (resource block) and the signal-to-interference-plus-noise ratio (SINR) of the secondary users are acquired. Secondary user signals are decoded in each resource block. There can be at most one active primary user in a resource block, and the primary user signal can be decoded after successive interference cancellation. Data packet collisions between multiple secondary users cause decoding of the primary user in the same resource block to fail; therefore, the probability of data packet collisions among secondary users is kept sufficiently low. The average number of active secondary users is lower than the number of resource blocks, and the number of active primary users is a binomial random variable. Because the total number of secondary users is large and each secondary user becomes active independently with low probability, the number of active secondary users is a Poisson random variable.
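The claim that the number of active secondary users is approximately Poisson when there are many users that each become active independently with small probability can be checked numerically. The sketch below compares the binomial and Poisson probability mass functions; the user counts and access probabilities are illustrative assumptions, not parameters from this embodiment.

```python
import math


def binomial_pmf(k, n, p):
    """Probability that exactly k of n independent users are active."""
    if k > n:
        return 0.0
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def poisson_pmf(k, mean):
    """Poisson probability of k active users with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def max_pmf_gap(n, p, k_max=20):
    """Largest pointwise gap between Binomial(n, p) and Poisson(n * p)."""
    mean = n * p
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mean))
               for k in range(k_max + 1))


if __name__ == "__main__":
    # Same mean of 3 active users, with few versus many potential users.
    print("n=10,   p=0.3  :", max_pmf_gap(10, 0.3))
    print("n=1000, p=0.003:", max_pmf_gap(1000, 0.003))
```

The gap shrinks as the population grows with the mean held fixed, which is why a Poisson model is a reasonable description of a massive population of rarely active secondary users.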

[0148] In step (3), a fixed resource block is allocated to the primary user, and secondary users select from multiple resource blocks. Each secondary user has a random access strategy, and the system performance changes according to the probability distribution of selecting a particular resource block. Resource blocks are allocated to secondary users with the expectation that at most one primary user will transmit without any other secondary users transmitting. Fog radio access has distributed characteristics and adopts the optimal access strategy notified by the base station. The maximum throughput of secondary users is considered to be Pareto optimal under ideal resource allocation conditions.

[0149] Based on the secondary user signal-to-interference-plus-noise ratio (SINR), the probability that a secondary user successfully transmits a data packet depends on the number of active primary users in the resource block. The conditional probability that a secondary user successfully transmits a data packet after selecting a resource block is obtained, and the resource allocation distribution is derived from the probability of each resource block being selected. To decode the active primary user signals, the active secondary user signals must first be decoded and removed by successive interference cancellation, which yields the secondary user throughput and the primary user throughput. Cognitive secondary users access the channel opportunistically without causing serious interference to the less capable primary users. The optimal solution to this problem is the throughput-optimal resource allocation distribution. The base station, which has complete environmental information, finds this distribution and transmits it to the secondary users. The throughput-optimal selection distribution is the Pareto optimal solution.

[0150] In step (4), in a distributed environment, a secondary user may want to maximize its own reward without following the given rules if selfish behavior gives it better performance than other secondary users. The existence of selfish secondary users makes it necessary to consider non-cooperative game theory. Each secondary user chooses the strategy that maximizes its reward or throughput, and an evolutionary game is formulated for secondary users competing for multi-channel resources. As a refinement of the Nash equilibrium, evolutionary game theory does not require strong rationality assumptions and models secondary user behavior directly. It captures how secondary users change strategies and reach equilibrium over time.

[0151] There are a large number of active secondary users, each with its own strategy. Resource allocation is described by the population state, in which the probability of a resource block being selected is the proportion of the population choosing that action. Each secondary user chooses a mixed strategy, given by the probabilities with which the player selects each action. The average reward of a player using a mixed strategy is obtained from the reward (fitness) of each action (channel, resource block, arm) that the player may select. The population state that is evolutionarily stable is found by decomposing the definition into the Nash equilibrium condition and the stability condition.

[0152] In step (5), the game process includes evolutionary stable strategies and replicator dynamics, corresponding to mutation and selection. Players maximize their rewards according to their strategies, and players who choose different mixed strategies obtain different average rewards. During the evolutionary process, the resource allocation game finds an evolutionary stable state as the optimal strategy, which also satisfies the Nash equilibrium. Although the evolutionary stable state is a Nash equilibrium and is optimal in the sense that no secondary user has an incentive to change its strategy, it is not necessarily Pareto optimal in terms of maximizing the overall achievable reward or throughput. Finding the evolutionary stable state depends on the reward function. When a player selects a resource block, the reward is the probability that the player successfully transmits its data packet. At a given moment, the reward of the resource block selected by an active secondary user and the average reward under the mixed strategy are obtained. The average reward is based on throughput, and the throughput obtained in the non-cooperative game may be lower than the maximum value of the optimization problem. The Nash equilibrium condition is a necessary and sufficient condition for the population state vector to be the optimal solution. The stability condition is then verified to establish uniqueness.

[0153] In step (6), the evolutionary stable state is found through replicator dynamics to reduce complexity. Lyapunov stability, one of the stability properties of solutions to differential equations, describes the stability of a solution near an equilibrium point. To avoid solving the differential equation, the evolutionary stable state is found directly. Although the throughput-optimal resource allocation distribution maximizes secondary user throughput, it is neither a Nash equilibrium nor an evolutionary stable state. Secondary users that deviate from the throughput-optimal resource allocation distribution to maximize their own rewards cause the tragedy of the commons. Although the evolutionary stable state maximizes the virtual objective function, it does not maximize the throughput of the secondary users.

[0154] In step (7), single-step reinforcement learning is used by secondary users to learn and reduce collisions, improving system throughput when selecting resource blocks. The number of cognitive network users is greater than the number of channels. Secondary users learn the channel environment, including the activity level of primary users in each channel. The multi-armed bandit is used for resource allocation among multiple secondary users. Each secondary user has multiple arms (resource blocks). A single-step operation includes two aspects: exploration (estimating the quality of the arms) and exploitation (selecting the currently optimal arm). When a secondary user is active and transmitting a data packet, it selects one of the arms and receives feedback from the base station. Over multiple rounds, the optimal arm is selected using Thompson sampling (a Monte Carlo method for finding the arm with the highest reward). The reward of each arm at a secondary user follows a beta distribution. If the secondary user is active, feedback is received and the shape parameters are updated; if the secondary user is inactive, the shape parameters are not updated.

[0155] Secondary users find a mixed strategy and use replicator dynamics to maximize rewards. The replicator dynamics update the mixed strategy in real time using feedback from the base station. When a secondary user is active, it selects a resource block and receives feedback from the base station, and the reward (fitness) of the selected action is used as the instantaneous reward. Active secondary users select an arm (resource block) according to the resource allocation distribution. The probability distribution is updated over time from the reward of the selected arm, using the feedback received from the base station, and the procedure runs in a distributed manner.

[0156] It should also be noted that the beneficial effects obtained by this embodiment based on the above scheme are as follows. First, for the coexistence of primary and secondary users in a multi-channel cognitive network, controllable power-domain non-orthogonal multiple access is introduced to establish a non-cooperative game in which secondary users compete for multi-channel resources. Each secondary user learns the resource allocation distribution and maximizes its fitness or reward based on base station feedback. Second, in the cognitive non-orthogonal multiple access network, "mutation" is treated from the perspective of a static system. Using the existence and uniqueness of the evolutionary stable state in the evolutionary game, a method for finding the evolutionary stable state (a refinement of the Nash equilibrium) is designed. Third, "selection" is treated from the perspective of a dynamic system. A single-step reinforcement learning task with replicator dynamics is established so that secondary users in the network can choose among multiple channels without conflict when primary users are present. The effectiveness of the method for competing secondary users is judged based on the performance at the evolutionary stable state.

[0157] This embodiment first obtains the random access index set of active secondary users under different power levels and determines the resource block allocation probability distribution of active secondary users. Then, based on the resource allocation probability distribution and the random access index set, it determines the throughput of active secondary users. Next, it determines whether the throughput of active secondary users is the maximum throughput. If not, it determines the maximum throughput of active secondary users through an evolutionary game strategy based on the throughput of active secondary users. Finally, it allocates resource blocks for data transmission according to the maximum throughput of active secondary users. Existing orthogonal multiple access (OMA) technologies can easily separate the information carried by different user signals with low complexity. However, a drawback of OMA is that the number of users it supports is limited by the number of available orthogonal resources, so more users cannot be admitted within limited resources and the requirements of high spectral efficiency and large-scale connectivity cannot be met. This embodiment, by contrast, uses cognitive non-orthogonal multiple access network technology. It determines the maximum throughput of active secondary users through an evolutionary game strategy and then allocates resource blocks for data transmission, enabling future wireless networks to meet the demand for massive connections while achieving high spectral efficiency.

[0158] Refer to Figure 7. Figure 7 is a structural block diagram of the first embodiment of the data transmission system for a cognitive non-orthogonal multiple access network according to the present invention.

[0159] As shown in Figure 7, the data transmission system for a cognitive non-orthogonal multiple access network proposed in this embodiment of the invention includes:

[0160] The acquisition module 7001 is used to acquire the random access index set of active secondary users under different power levels, and determine the resource block allocation probability distribution of the active secondary users;

[0161] The determination module 7002 is used to determine the throughput of the active secondary users based on the resource allocation probability distribution and the random access index set;

[0162] The judgment module 7003 is used to determine whether the throughput of the active secondary user is the maximum throughput;

[0163] The processing module 7004 is used to determine, if not, the maximum throughput of the active secondary user through an evolutionary game strategy based on the throughput of the active secondary user;

[0164] The transmission module 7005 is used to allocate resource blocks for data transmission based on the maximum throughput of the active secondary user.
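The way the five modules hand results to one another can be sketched as a plain function pipeline. This is only an illustrative skeleton under assumptions: the function name `run_transmission_flow` and its five callable parameters are hypothetical stand-ins for modules 7001 to 7005, and the stub implementations in the usage example carry made-up values rather than any computation from this embodiment.

```python
from typing import Callable, Sequence, Tuple


def run_transmission_flow(
    acquire: Callable[[], Tuple[Sequence[int], Sequence[float]]],
    determine_throughput: Callable[[Sequence[float], Sequence[int]], float],
    is_maximum: Callable[[float], bool],
    evolve: Callable[[float], float],
    transmit: Callable[[float], str],
) -> str:
    """Chain the five modules in the order given in the description."""
    # Acquisition module (7001): index set and allocation distribution.
    index_set, distribution = acquire()
    # Determination module (7002): throughput of active secondary users.
    throughput = determine_throughput(distribution, index_set)
    # Judgment module (7003): is this already the maximum throughput?
    if not is_maximum(throughput):
        # Processing module (7004): evolutionary game strategy.
        throughput = evolve(throughput)
    # Transmission module (7005): allocate resource blocks and transmit.
    return transmit(throughput)


if __name__ == "__main__":
    # Stub modules with made-up numbers, only to show the control flow.
    result = run_transmission_flow(
        acquire=lambda: ([0, 2, 5], [0.5, 0.3, 0.2]),
        determine_throughput=lambda dist, idx: 0.4,
        is_maximum=lambda t: t >= 0.6,
        evolve=lambda t: 0.6,
        transmit=lambda t: f"allocated for throughput {t}",
    )
    print(result)
```

Passing the modules in as callables keeps the control flow separate from how each module is realized, which matches the statement that the modules may be implemented in software or hardware.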

[0165] Other embodiments or specific implementations of the data transmission system for non-orthogonal multiple access networks of the present invention can be found in the above-described method embodiments, and will not be repeated here.

[0166] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or system that includes that element.

[0167] The sequence numbers of the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.

[0168] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as read-only memory / random access memory, magnetic disk, optical disk) and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods described in the various embodiments of the present invention.

[0169] The above are merely preferred embodiments of the present invention and do not limit the patent scope of the present invention. Any equivalent structural or procedural transformations made based on the content of the present invention's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of the present invention.

Claims

1. A data transmission method for a cognitive non-orthogonal multiple access network, characterized in that the data transmission method for the cognitive non-orthogonal multiple access network includes the following steps: obtaining the random access index set of active secondary users under different power levels, and determining the resource block allocation probability distribution of the active secondary users; determining the throughput of the active secondary users based on the resource block allocation probability distribution and the random access index set; determining whether the throughput of the active secondary users is the maximum throughput; if not, determining the maximum throughput of the active secondary users through an evolutionary game strategy based on the throughput of the active secondary users; and allocating resource blocks for data transmission based on the maximum throughput of the active secondary users.

2. The method as described in claim 1, characterized in that the step of obtaining the random access index set of active secondary users under different power levels includes: identifying resource blocks at different power levels; obtaining the active primary user signal and the active secondary user signal of the resource block; determining the received signal of the resource block based on the active primary user signal and the active secondary user signal; decoding the resource block based on the received signal of the resource block; when decoding of the resource block fails, obtaining the total number of secondary users and the access probability of active secondary users; determining the number of active secondary users based on the active secondary user index set of the resource block; determining the average number of active users based on the number of active secondary users, the total number of secondary users, and the access probability of active secondary users; and establishing the random access index set of active users based on the average number of active users.

3. The method as described in claim 2, characterized in that the step of determining the resource block allocation probability distribution of the active secondary users includes: obtaining the access probability of active primary users, the number of primary user accesses, and the total number of primary users for the resource block; obtaining, based on the signal-to-interference-plus-noise ratio of the secondary users, the successful transmission probability of the active secondary users according to the access probability of active primary users, the number of primary user accesses, the access probability of active secondary users, and the active secondary user index set; determining the successful transmission probability of the allocated resource block based on the successful transmission probability of the active secondary users and the number of active secondary users; and determining the resource block allocation probability distribution of the active secondary users based on the successful transmission probability of the allocated resource block.

4. The method as described in claim 3, characterized in that the step of determining the throughput of the active secondary users based on the resource block allocation probability distribution and the random access index set includes: determining the throughput of the active secondary users based on the resource block allocation probability distribution, the random access index set, the number of active secondary users, and the successful transmission probability of the active secondary users.

5. The method according to any one of claims 1-4, characterized in that, after the step of determining whether the throughput of the active secondary users is the maximum throughput, the method further includes: if so, taking the throughput of the active secondary users as the maximum throughput of the active secondary users; and allocating resource blocks for data transmission based on the maximum throughput of the active secondary users.

6. The method as described in claim 5, characterized in that the step of determining the maximum throughput of the active secondary users through an evolutionary game strategy based on the throughput of the active secondary users includes: determining the average return of the secondary user mixed strategy by selecting a mixed strategy based on the resource block allocation probability distribution; determining whether the average return of the secondary user mixed strategy satisfies the Nash equilibrium condition and the stability condition; if not, determining the maximum resource block selection reward according to the evolutionary game strategy, based on the throughput of the active secondary users and the average return of the secondary user mixed strategy; and determining the maximum throughput of the active secondary users based on the maximum resource block selection reward.

7. The method as described in claim 6, characterized in that, after the step of determining whether the average return of the secondary user mixed strategy satisfies the Nash equilibrium condition and the stability condition, the method further includes: if so, determining the maximum throughput of the active secondary users based on the average return of the secondary user mixed strategy; and allocating resource blocks for data transmission based on the maximum throughput of the active secondary users.

8. The method as described in claim 6, characterized in that the step of determining the maximum resource block selection reward according to the evolutionary game strategy, based on the throughput of the active secondary users and the average return of the secondary user mixed strategy, includes: determining, based on the evolutionary game strategy, the resource allocation evolutionarily stable strategy and the resource allocation replicator dynamics from the average return of the secondary user mixed strategy, the return of the active secondary user resource block allocation, the Nash equilibrium condition, and the stability condition; establishing single-step reinforcement learning based on the resource allocation evolutionarily stable strategy and the resource allocation replicator dynamics; and determining, based on the single-step reinforcement learning, the maximum resource block selection reward through a resource allocation distribution strategy according to the throughput of the active secondary users.

9. A data transmission system for a cognitive non-orthogonal multiple access network, characterized in that the data transmission system for the cognitive non-orthogonal multiple access network includes: an acquisition module, used to acquire the random access index set of active secondary users under different power levels, and determine the resource block allocation probability distribution of the active secondary users; a determination module, used to determine the throughput of the active secondary users based on the resource block allocation probability distribution and the random access index set; a judgment module, used to determine whether the throughput of the active secondary users is the maximum throughput; a processing module, used to determine, if the throughput is not the maximum throughput, the maximum throughput of the active secondary users through an evolutionary game strategy based on the throughput of the active secondary users; and a transmission module, used to allocate resource blocks for data transmission based on the maximum throughput of the active secondary users.

10. A data transmission device for a cognitive non-orthogonal multiple access network, characterized in that the device includes: a memory, a processor, and a data transmission program for a cognitive non-orthogonal multiple access network stored in the memory and executable on the processor, the data transmission program for the cognitive non-orthogonal multiple access network being configured to implement the steps of the data transmission method for a cognitive non-orthogonal multiple access network as described in any one of claims 1 to 8.
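
As an illustration of the quantities named in claim 2 (the total number of secondary users, the access probability of active secondary users, the average number of active users, and the random access index set), the following Python sketch draws an index set under one possible model. The model is an assumption and is not stated in the claims: each secondary user becomes active independently with the same access probability, so the number of active users is binomial and its average is the total number multiplied by the access probability. The function names are hypothetical.

```python
import random
from typing import List


def sample_active_index_set(total_users: int, access_prob: float,
                            rng: random.Random) -> List[int]:
    """Draw the indices of the secondary users that are active in one slot,
    assuming independent activation with probability access_prob."""
    return [i for i in range(total_users) if rng.random() < access_prob]


def expected_active_users(total_users: int, access_prob: float) -> float:
    """Mean of the binomial active-user count under the same assumption."""
    return total_users * access_prob


def empirical_mean_active(total_users: int, access_prob: float,
                          trials: int, rng: random.Random) -> float:
    """Average size of the sampled index set over many independent slots."""
    sizes = (len(sample_active_index_set(total_users, access_prob, rng))
             for _ in range(trials))
    return sum(sizes) / trials
```

Under this assumption, with 20 secondary users and an access probability of 0.25, `expected_active_users(20, 0.25)` is 5.0, and `empirical_mean_active` with a few thousand trials lands close to that value.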