The invention discloses a head identification and tracking method based on three-dimensional depth images from a Kinect sensor. First, the original depth image output by the Kinect sensor is analyzed to establish the correspondence between distance and gray level. Second, the target is segmented: after the distance-to-gray-level correspondence is calibrated, the original depth image is inverted to obtain a gray image. A clustering algorithm then divides the heads and shoulders in the gray image into two classes, and a histogram is used to determine the gray level at which the gray image is split, yielding a binary image sequence. Third, heads are identified, tracked, and counted: using features such as the roughly elliptical shape and size of human heads and the relative spatial positions of heads and shoulders, the binary image sequence and the gray image are traversed to locate heads, establish their tracks, and count the people entering and leaving. The method addresses several problems in current passenger-flow counting, such as crowding and environmental influences. Simulation in VS2008 confirmed the feasibility and stability of the system, and its precision exceeds 93 percent.
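The segmentation and head-identification steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes a ceiling-mounted sensor and a linear distance-to-gray calibration between hypothetical limits `d_min` and `d_max`, where the method itself calibrates this relation from the sensor output. It also assumes a simple two-class k-means over the gray-level histogram as the clustering step, since the abstract does not name the algorithm. The "ellipse-like" test is approximated by a bounding-box aspect ratio and fill ratio. The function names and all thresholds are hypothetical. Tracking and in/out counting are not shown.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[int]]


def depth_to_gray(depth_mm: Grid, d_min: int = 500, d_max: int = 4000) -> Grid:
    """Map depth (mm) to an inverted 8-bit gray image: nearer = brighter.

    ASSUMPTION: a linear calibration between the hypothetical limits d_min
    and d_max. A depth of 0 is treated as "no reading" and mapped to 0.
    """
    span = d_max - d_min
    gray = []
    for row in depth_mm:
        out = []
        for d in row:
            if d <= 0:
                out.append(0)
            else:
                d = min(max(d, d_min), d_max)  # clip to the calibrated range
                out.append(round(255 * (d_max - d) / span))
        gray.append(out)
    return gray


def head_threshold(gray: Grid, iters: int = 100) -> float:
    """Split gray levels into two classes (heads vs. shoulders).

    ASSUMPTION: two-class k-means on the gray-level histogram; returns the
    midpoint between the two cluster centres. Zero pixels are ignored.
    """
    hist = [0] * 256
    for row in gray:
        for g in row:
            if g > 0:
                hist[g] += 1
    levels = [g for g in range(256) if hist[g]]
    if not levels:
        return 255.0  # nothing in the foreground: no pixel exceeds this
    lo, hi = float(levels[0]), float(levels[-1])
    for _ in range(iters):
        t = (lo + hi) / 2
        n_lo = s_lo = n_hi = s_hi = 0
        for g in levels:
            if g <= t:
                n_lo += hist[g]
                s_lo += g * hist[g]
            else:
                n_hi += hist[g]
                s_hi += g * hist[g]
        new_lo = s_lo / n_lo if n_lo else lo
        new_hi = s_hi / n_hi if n_hi else hi
        if new_lo == lo and new_hi == hi:
            break  # cluster centres have converged
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def head_candidates(gray: Grid, t: float, min_area: int = 4,
                    max_area: int = 400) -> List[Tuple[float, float]]:
    """Binarize at t, label 4-connected blobs, keep compact head-sized ones.

    ASSUMPTION: "ellipse-like" means aspect ratio in [0.5, 2] and fill ratio
    >= 0.6 (an ideal ellipse fills pi/4 of its bounding box).
    Returns blob centroids as (row, col).
    """
    h, w = len(gray), len(gray[0])
    binary = [[1 if g > t else 0 for g in row] for row in gray]
    seen = [[False] * w for _ in range(h)]
    heads = []
    for r0 in range(h):
        for c0 in range(w):
            if not binary[r0][c0] or seen[r0][c0]:
                continue
            queue = deque([(r0, c0)])  # breadth-first flood fill
            seen[r0][c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < h and 0 <= nc < w and binary[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            rows = [p[0] for p in pixels]
            cols = [p[1] for p in pixels]
            bh = max(rows) - min(rows) + 1
            bw = max(cols) - min(cols) + 1
            area = len(pixels)
            if (min_area <= area <= max_area and 0.5 <= bw / bh <= 2.0
                    and area / (bw * bh) >= 0.6):
                heads.append((sum(rows) / area, sum(cols) / area))
    return heads
```

For example, consider a synthetic 7×7 frame with the floor at 3000 mm, a ring of shoulder pixels at 1600 mm and a 3×3 head at 1300 mm, converted with `d_min=500, d_max=3000`. The shoulders and head map to gray levels 143 and 173, the two-class split lands at 158, and a single head candidate is returned at (3.0, 3.0). Because the sensor looks down, the head is the brightest region after inversion. That is why the sketch keeps only the upper gray-level class.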