The invention discloses a method for obtaining the optimal carbon-energy composite flow of a power grid based on swarm intelligence reinforcement learning. The method comprises the following steps: S1, establishing the objective function of a multi-objective optimal carbon-energy composite flow model; S2, setting a reward function according to the objective function; S3, updating the Q-value matrix of each agent according to an eligibility trace; S4, calculating the greedy action of each agent; S5, updating the action probability matrix of each agent; S6, randomly selecting a pre-judgment action for each agent in the current state; S7, coordinating the inputs of the multiple agents and solving for the optimal action of the swarm; S8, performing an update to obtain the corrected action values; S9, determining a control variable matrix and performing a load flow calculation; and S10, after the load flow calculation, judging whether the Q-value matrix has converged, taking the result of the last load flow calculation as the optimal carbon-energy composite flow of the power grid if it has converged, and returning to S2 if it has not. The method minimizes both the energy flow loss and the carbon emission flow loss in the power grid, and it significantly improves the convergence speed of the algorithm while ensuring good global optimization capability.
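
Steps S2 to S6 can be illustrated for a single agent with a minimal Python sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: the reward is taken as a negative weighted sum of the two losses, the Q-value update is a naive tabular Q(lambda) rule with accumulating traces, and the action probability update is a simple pursuit-style rule that shifts probability toward the greedy action. All function names, weights, and parameters (alpha, gamma, lam, beta) are hypothetical, and the multi-agent coordination of S7 to S9 and the load flow calculation are not modelled.

```python
import random

def reward_from_losses(energy_loss, carbon_loss, w_energy=0.5, w_carbon=0.5):
    # Assumed form for S2: smaller combined loss gives a larger reward.
    # The weights are illustrative, not taken from the source.
    return -(w_energy * energy_loss + w_carbon * carbon_loss)

def greedy_action(q_row):
    # S4: index of the largest Q value for one state (ties -> lowest index).
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_lambda_update(Q, E, s, a, reward, s_next, alpha=0.1, gamma=0.9, lam=0.5):
    # S3 (assumed naive Q(lambda)): compute the TD error, bump the trace of
    # the visited pair, then update every entry in proportion to its trace.
    delta = reward + gamma * max(Q[s_next]) - Q[s][a]
    E[s][a] += 1.0
    for i in range(len(Q)):
        for j in range(len(Q[i])):
            Q[i][j] += alpha * delta * E[i][j]
            E[i][j] *= gamma * lam  # decay all traces
    return delta

def update_action_probs(P, s, a_greedy, beta=0.2):
    # S5 (assumed pursuit-style rule): move probability mass toward the
    # greedy action; the row still sums to 1 after the update.
    for b in range(len(P[s])):
        if b == a_greedy:
            P[s][b] += beta * (1.0 - P[s][b])
        else:
            P[s][b] *= 1.0 - beta

def sample_action(P, s, rng=random):
    # S6: draw a pre-judgment action from the state's probability row.
    return rng.choices(range(len(P[s])), weights=P[s])[0]

# One illustrative iteration on a toy 2-state, 2-action table.
n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
E = [[0.0] * n_actions for _ in range(n_states)]
P = [[1.0 / n_actions] * n_actions for _ in range(n_states)]

r = reward_from_losses(energy_loss=0.2, carbon_loss=0.4)
q_lambda_update(Q, E, s=0, a=1, reward=r, s_next=1)
update_action_probs(P, 0, greedy_action(Q[0]))
next_action = sample_action(P, 0)
```

In a full implementation, each agent would hold its own Q, E, and P tables, and the loop would repeat from S2 until the Q-value matrices stop changing by more than some tolerance, which corresponds to the convergence check in S10.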