The invention discloses a target tracking method based on spatio-temporal feature fusion learning, relating to the technical field of computer vision and pattern recognition. First, a spatio-temporal feature fusion learning network is constructed. The spatio-temporal features comprise temporal features and spatial features: the temporal features are extracted by combining AlexNet with a recurrent neural network, and the spatial features are divided into spatial transformation features of the target object and spatial features of the background, extracted with YOLOv3 and AlexNet respectively. In the initial training stage, the network is trained on a training data set using stochastic gradient descent, after which it has an initial ability to locate the target object. The image sequence to be tracked is then input into the network for forward processing, and the network outputs the position and confidence of the target object's bounding box. The confidence determines whether the network performs online learning, and the bounding box position locates the target object, thereby achieving tracking of the target object.
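The tracking stage described above can be illustrated with a minimal Python sketch. It is only an outline under stated assumptions, not the claimed implementation: the names `track_sequence`, `forward`, `online_update`, and `should_update` are hypothetical, the trained network is stood in for by a plain callable, and the rule that online learning is triggered when the confidence falls below 0.5 is an assumption, since the abstract states only that the confidence decides whether online learning takes place.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Bounding box as (x, y, width, height); this layout is an assumption.
BBox = Tuple[float, float, float, float]


@dataclass
class TrackResult:
    bbox: BBox            # predicted position of the target object
    confidence: float     # confidence reported by the network
    updated_online: bool  # whether online learning ran on this frame


def track_sequence(
    frames: Iterable[object],
    forward: Callable[[object], Tuple[BBox, float]],
    online_update: Callable[[object, BBox], None],
    # Assumed trigger rule: update when confidence is below 0.5.
    should_update: Callable[[float], bool] = lambda c: c < 0.5,
) -> List[TrackResult]:
    """Run a forward pass per frame and let confidence gate online learning."""
    results: List[TrackResult] = []
    for frame in frames:
        # Forward processing: the network outputs box position and confidence.
        bbox, confidence = forward(frame)
        # The confidence decides whether an online learning step is performed.
        do_update = should_update(confidence)
        if do_update:
            online_update(frame, bbox)
        # The bounding box position is the tracking output for this frame.
        results.append(TrackResult(bbox, confidence, do_update))
    return results
```

In use, `forward` would wrap the trained spatio-temporal feature fusion network and `online_update` would perform one fine-tuning step on the current frame. Passing a different `should_update` predicate changes the gating rule without touching the loop.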