The invention discloses a method for video question answering using a hierarchical spatio-temporal attention encoder-decoder network. The method mainly comprises two steps: (1) a spatio-temporal attention encoding neural network is trained on a training set of videos, questions, and answers to learn a joint representation of videos and questions; and (2) a decoding neural network is trained on the joint video-question representation output by the trained encoding network, together with the corresponding answers, so that it outputs natural-language answers from the joint representation of a video and a question. Compared with ordinary video question answering solutions, the method uses a temporal attention mechanism to better exploit the sequential relationship between video frames, and a spatial attention mechanism to precisely locate the key regions within each frame. The characteristics of the videos and the questions are thus reflected more accurately, and answers that better meet requirements are generated. The method performs better on video question answering than traditional methods.
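
The hierarchical attention described above can be illustrated with a minimal NumPy sketch. It is not the disclosed implementation: the abstract does not specify the scoring function, feature dimensions, or network layers, so the dot-product scoring, the tensor shapes, and the names `spatial_attention`, `temporal_attention`, and `encode` are all assumptions made for illustration. Recurrent layers that would model frame order, the trained parameters, and the decoding network are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def spatial_attention(regions, question):
    """Attend over the regions of each frame, guided by the question.

    regions:  (T, R, D) array, T frames with R region features of size D
    question: (D,) question embedding
    Dot-product scoring is an assumption; the source does not name one.
    """
    scores = regions @ question                 # (T, R)
    weights = softmax(scores, axis=1)           # each frame's weights sum to 1
    frames = np.einsum("tr,trd->td", weights, regions)  # (T, D)
    return frames, weights


def temporal_attention(frames, question):
    """Attend over frame features, guided by the question."""
    scores = frames @ question                  # (T,)
    weights = softmax(scores, axis=0)           # weights over frames sum to 1
    video = weights @ frames                    # (D,)
    return video, weights


def encode(regions, question):
    """Hypothetical encoder: spatial attention, then temporal attention.

    Returns a joint video-question vector formed by concatenation
    (an assumed fusion choice) plus both sets of attention weights.
    """
    frames, spatial_w = spatial_attention(regions, question)
    video, temporal_w = temporal_attention(frames, question)
    joint = np.concatenate([video, question])   # (2 * D,)
    return joint, spatial_w, temporal_w


# Random stand-in features: 8 frames, 49 regions per frame, 16 dimensions.
rng = np.random.default_rng(0)
regions = rng.normal(size=(8, 49, 16))
question = rng.normal(size=16)
joint, spatial_w, temporal_w = encode(regions, question)
```

In this sketch, `spatial_w` indicates which regions within each frame are weighted most heavily for the question, and `temporal_w` indicates which frames are. A decoding network would take `joint` as input to generate the answer.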