The invention relates to a deep-learning-based method for real-time facial key point detection in video. The method uses a convolutional neural network to detect key points in each single frame and uses depthwise separable convolutions to increase the model's detection speed. A boundary heat map is added as an auxiliary subtask of the original network to strengthen its constraint on the global face structure, which improves the detection accuracy of the original network. A loss function that addresses the data imbalance of the heat map improves the model's generalization to large-pose samples when training samples are limited, and an optical flow loss function improves inter-frame smoothness. During detection, for a frame whose confidence falls below the key point confidence threshold because of an extremely large angle, a 3DMM (3D Morphable Model) is fitted to obtain dense key point coordinates; 68 points are then sampled from the dense key points by minimizing the inter-frame projection error, so that consistency with the previous frame is kept. The advantages of the method include real-time performance, the ability to use global inter-frame information, and high detection accuracy for faces in large poses.
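
As a rough illustration of why depthwise separable convolution can raise detection speed, the following sketch compares the parameter and multiply-accumulate counts of one standard convolution layer with its depthwise separable counterpart. The layer sizes (64 input channels, 128 output channels, a 3x3 kernel, a 56x56 output map) and the function names are assumptions chosen for illustration only; the abstract does not specify the network's layer configuration, and bias terms are ignored.

```python
def standard_conv_cost(c_in, c_out, k, h_out, w_out):
    """Parameters and multiply-accumulates (MACs) of a standard k x k convolution."""
    params = k * k * c_in * c_out
    # Every output position applies the full set of weights once.
    macs = params * h_out * w_out
    return params, macs


def depthwise_separable_cost(c_in, c_out, k, h_out, w_out):
    """Parameters and MACs of a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise_params = k * k * c_in   # one k x k filter per input channel
    pointwise_params = c_in * c_out   # 1 x 1 convolution that mixes channels
    params = depthwise_params + pointwise_params
    macs = params * h_out * w_out
    return params, macs


# Hypothetical layer sizes, not taken from the source.
std_params, std_macs = standard_conv_cost(64, 128, 3, 56, 56)
sep_params, sep_macs = depthwise_separable_cost(64, 128, 3, 56, 56)

# The cost ratio equals 1 / c_out + 1 / k**2 for any output size.
ratio = sep_macs / std_macs
print(f"standard:  {std_params} params, {std_macs} MACs")
print(f"separable: {sep_params} params, {sep_macs} MACs")
print(f"ratio:     {ratio:.4f}")
```

Under these assumed sizes the separable layer needs roughly 12% of the weights and arithmetic of the standard layer. The actual speed gain in a deployed model also depends on memory bandwidth and on how well the runtime implements depthwise kernels, so this ratio is an upper-level estimate only.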