The invention provides a Kinect-based multi-model fusion method for segmenting hands in video, which comprises the following steps: (1) capturing video information; (2) segmenting each image in the video separately with a depth model, a skin color model and a background model to obtain three segmentation results, each expressed as a binary image; (3) calculating the overlap rate of every pair of segmentation results as a feature for judging the segmentation quality of each result, and inputting the three overlap rates into a neural network; (4) having the neural network output three coefficients (namely confidence coefficients) indicating the respective reliability of the three models, and weighting the three segmentation results with the confidence coefficients; (5) linearly superposing the weighted segmentation results of the three models; (6) passing the superposed result through a threshold function to output a final binary image, thereby obtaining the segmented hand region in the video; and (7) updating the background model. Advantages of the method include low cost and good flexibility.
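
Steps (3) to (6) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the overlap rate is assumed to be intersection over union, the trained neural network is replaced by a hypothetical placeholder (`placeholder_confidences`) that scores each model by its mean overlap with the other two, and the threshold value 0.5 and all function names are illustrative.

```python
import numpy as np


def overlap_rate(a, b):
    """Assumed overlap measure: intersection over union of two binary masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # both masks empty: treat as no overlap
    return float(np.logical_and(a, b).sum()) / float(union)


def placeholder_confidences(rates):
    """Hypothetical stand-in for the trained neural network of step (4).

    rates = (depth/skin, depth/background, skin/background).
    Each model is scored by its mean overlap with the other two models,
    and the scores are normalized to sum to 1.
    """
    r_ds, r_db, r_sb = rates
    scores = np.array([r_ds + r_db, r_ds + r_sb, r_db + r_sb]) / 2.0
    total = scores.sum()
    if total == 0:
        return np.full(3, 1.0 / 3.0)  # no agreement at all: equal weights
    return scores / total


def fuse(depth_mask, skin_mask, background_mask,
         confidence_fn=placeholder_confidences, threshold=0.5):
    """Weight three binary masks, superpose them linearly, then threshold."""
    masks = [m.astype(np.float64)
             for m in (depth_mask, skin_mask, background_mask)]
    # Step (3): pairwise overlap rates.
    rates = (
        overlap_rate(depth_mask, skin_mask),
        overlap_rate(depth_mask, background_mask),
        overlap_rate(skin_mask, background_mask),
    )
    # Step (4): confidence coefficients (placeholder instead of a network).
    weights = confidence_fn(rates)
    # Step (5): linear superposition of the weighted masks.
    fused = sum(w * m for w, m in zip(weights, masks))
    # Step (6): threshold function producing the final binary image.
    return (fused >= threshold).astype(np.uint8)


if __name__ == "__main__":
    depth = np.array([[1, 1, 0], [0, 0, 0]], dtype=np.uint8)
    skin = np.array([[1, 1, 0], [0, 0, 0]], dtype=np.uint8)
    background = np.array([[0, 0, 0], [0, 1, 1]], dtype=np.uint8)
    print(fuse(depth, skin, background))
```

In this toy input the depth and skin masks agree and the background mask disagrees with both, so the placeholder gives the background result zero weight and the fused image keeps only the region the other two models share. A trained network would replace `placeholder_confidences` through the `confidence_fn` parameter.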