The invention discloses a multi-modal machine translation method based on variational inference and multi-task learning. To acquire translation capability, the method first uses variational inference theory to perform multi-task modeling on multi-modal information such as images and text, and then trains a variational multi-modal machine translation model on a sufficiently large training set. Finally, multiple candidate translations are predicted through beam search and maximum likelihood. The innovation of the invention lies in creating and using a model that integrates multi-modal information such as images into machine translation, namely variational multi-modal machine translation. Based on the variational model, a feature-extraction neural network framework that fuses image and text semantics is constructed; the modeling process and the self-learning update process are derived together, a detailed derivation algorithm is given, and guidance on applying the method is provided.
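The final decoding step, in which several candidate translations are predicted by beam search ("bundle search" in some renderings) under a maximum-likelihood criterion, can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in rather than part of the disclosed method: `beam_search`, `toy_model`, the three-word vocabulary, and the probabilities are invented for illustration, and a trained translation model conditioned on the source text and image features would take the place of `toy_model`.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens, cumulative log-probability)


def beam_search(
    step_log_probs: Callable[[List[str]], Dict[str, float]],
    beam_size: int = 2,
    max_len: int = 5,
    eos: str = "</s>",
) -> List[Hypothesis]:
    """Return up to `beam_size` hypotheses, highest log-likelihood first."""
    beams: List[Hypothesis] = [([], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every active prefix by every possible next token.
        candidates: List[Hypothesis] = []
        for prefix, score in beams:
            for token, logp in step_log_probs(prefix).items():
                candidates.append((prefix + [token], score + logp))
        # Keep only the best `beam_size` expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_size]


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical next-token distribution (assumption, not a real model)."""
    table = {
        (): {"a": 0.6, "the": 0.4},
        ("a",): {"dog": 0.5, "cat": 0.5},
        ("the",): {"dog": 0.9, "cat": 0.1},
    }
    probs = table.get(tuple(prefix), {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


if __name__ == "__main__":
    for tokens, score in beam_search(toy_model, beam_size=2):
        print(" ".join(tokens), round(math.exp(score), 2))
```

In this toy distribution, a greedy decoder would commit to "a" (probability 0.6) and end with a sequence of probability 0.30, whereas a beam of width 2 keeps "the" alive and recovers "the dog" at 0.36. That difference is the reason for returning several ranked hypotheses instead of a single greedy one.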