The invention discloses a sign language recognition method and system based on a two-stream spatial-temporal graph convolutional neural network. The method comprises the following steps: first, a sign language video is segmented into video frames, the upper-body and hand skeleton points of the signer are extracted from each sign language video segment, and global and local graph data are constructed; global and local spatial-temporal features are then extracted by the two streams of the spatial-temporal graph convolutional network, respectively, and concatenated to obtain global-local features. Meanwhile, the text corresponding to each video is segmented into words and encoded as word vectors; the word vectors and the global-local video features are mapped into the same latent space through feature transformation, and the model is trained using a dynamic time warping algorithm. The global-local feature sequence is then modeled by a self-attention encoder-decoder network, the decoder output is passed to a softmax classifier to obtain the word corresponding to each video clip, and these words form the corresponding text sentence. The method can improve the accuracy of text sentence generation and has important application value in scenarios such as caption generation and human-computer interaction.
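To illustrate the alignment step, the following is a minimal sketch of classic dynamic time warping (DTW) between a sequence of per-clip video features and a sequence of word vectors. It assumes both sequences have already been mapped into the same latent space and that Euclidean distance serves as the local alignment cost. The abstract does not specify the distance measure, the feature dimensions, or how the DTW cost enters the training objective, and the names `dtw` and `euclidean` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Local cost between two vectors assumed to live in the same latent space.
    (Assumption: Euclidean distance; the source does not name a measure.)"""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(video_feats: List[Vector],
        word_vecs: List[Vector]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align a clip-feature sequence with a word-vector sequence.

    Returns the minimal cumulative alignment cost and the optimal warping
    path as (clip_index, word_index) pairs in temporal order.
    """
    n, m = len(video_feats), len(word_vecs)
    inf = float("inf")
    # cost[i][j]: best cumulative cost aligning the first i clips with the first j words
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(video_feats[i - 1], word_vecs[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # word j also covers the previous clip
                cost[i][j - 1],      # clip i also covers the previous word
                cost[i - 1][j - 1],  # advance clips and words together
            )
    # Backtrack from (n, m) to (0, 0) along the cheapest predecessors.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    # Toy 1-D "latent" features: four clips, two words.
    clips = [[0.0], [0.1], [1.0], [1.1]]
    words = [[0.0], [1.0]]
    total, path = dtw(clips, words)
    print(round(total, 6), path)
```

In the toy run, clips 0 and 1 align with word 0 and clips 2 and 3 align with word 1, giving the path `[(0, 0), (1, 0), (2, 1), (3, 1)]`. The recurrence lets one word span several consecutive clips, which is why DTW suits aligning a long clip sequence with a shorter word sequence without frame-level labels.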