The invention discloses a many-to-many speaker conversion method based on STARGAN and the x-vector, comprising a training stage and a conversion stage. The speech conversion system is built by combining STARGAN with the x-vector, which greatly improves the speaker similarity (personality similarity) and the quality of the converted speech. In particular, for short utterances the x-vector provides better speaker characterization, so better speech conversion quality can be achieved. The method also overcomes the over-smoothing problem of C-VAE, yielding a high-quality speech conversion method. In addition, the method performs speech conversion under non-parallel text conditions, and the training process requires no alignment, which improves the universality and practicability of the speech conversion system. The method can further integrate conversion for multiple source-target speaker pairs into a single conversion model, i.e., it achieves many-to-many speaker conversion. The system therefore has good application prospects in fields such as cross-language speech conversion, film dubbing, and speech translation.