The invention discloses a method for solving the multi-form sentence video grounding task by means of a spatio-temporal graph reasoning network, and belongs to the field of natural language visual grounding. In the method, a video is first parsed into a spatio-temporal region graph, which comprises implicit and explicit spatial sub-graphs for each frame as well as a cross-frame temporal dynamic sub-graph. Textual clues are then incorporated into the spatio-temporal region graph and multi-step cross-modal graph reasoning is performed, the multi-step process supporting multi-order relation modeling. Thereafter, a temporal locator determines the temporal boundary of the tube, a spatial locator with a dynamic selection method locates the object in each frame, and a smooth tube is generated. Because the video does not need to be trimmed before the natural language sentence is grounded, the cost of video grounding is reduced; both interrogative and declarative sentences can be handled effectively; and the method provides technical support for higher-level research combining natural language processing and computer vision (such as video question answering), and therefore has broad application prospects.
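By way of illustration only, the final step described above, in which one region is selected per frame within the determined temporal boundary and the selections are linked into a smooth box sequence, may be sketched as follows. The abstract does not specify the dynamic selection method, so this sketch substitutes a generic Viterbi-style dynamic program that trades off a per-frame matching score against box overlap (IoU) between adjacent frames. The names `iou`, `link_tube`, and `smooth_weight`, the box format, and the scores are all hypothetical and are not taken from the disclosure.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_tube(
    boxes: List[List[Box]],
    scores: List[List[float]],
    start: int,
    end: int,
    smooth_weight: float = 1.0,
) -> List[int]:
    """Pick one region index per frame in [start, end] (inclusive).

    Assumed objective (not from the source): maximise the sum of per-frame
    scores plus smooth_weight times the IoU between consecutive selections.
    """
    best = [list(scores[start])]  # best[i][j]: best path value ending at region j
    back: List[List[int]] = []    # back-pointers for recovering the path
    for t in range(start + 1, end + 1):
        cur_best, cur_back = [], []
        for j, box in enumerate(boxes[t]):
            cands = [
                best[-1][k] + smooth_weight * iou(prev, box)
                for k, prev in enumerate(boxes[t - 1])
            ]
            k_star = max(range(len(cands)), key=cands.__getitem__)
            cur_best.append(cands[k_star] + scores[t][j])
            cur_back.append(k_star)
        best.append(cur_best)
        back.append(cur_back)

    # Backtrack from the best final region to the first frame.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for cur_back in reversed(back):
        j = cur_back[j]
        path.append(j)
    path.reverse()
    return path


# Toy input: region 0 moves slightly between frames; region 1 is a distractor
# that scores highest in the middle frame only.
boxes = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 1, 11, 11), (50, 50, 60, 60)],
    [(2, 2, 12, 12), (50, 50, 60, 60)],
]
scores = [[0.9, 0.1], [0.4, 0.5], [0.9, 0.1]]

print(link_tube(boxes, scores, start=0, end=2))                     # [0, 0, 0]
print(link_tube(boxes, scores, start=0, end=2, smooth_weight=0.0))  # [0, 1, 0]
```

In this toy input, choosing the highest-scoring region in each frame independently would jump to the distractor in the middle frame, whereas the overlap term keeps the selection on the same object throughout. The `start` and `end` arguments stand in for the temporal boundary that a temporal locator would supply.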