The invention provides a hybrid learning-state analysis method for static multi-person scenes. Because students are typically highly attentive at the beginning of a class, and therefore mostly face forward, the method first detects faces with an algorithm that is fast and accurate on frontal faces and uses the detections to estimate the static position region of each student. It then evaluates the life value and the hit value of each static position and, based on these values, calls an algorithm with high side-face detection accuracy. This double-layer face detection greatly improves the accuracy of face detection in static multi-person scenes such as classrooms while preserving operating speed. From the recognized head postures and facial expressions, the concentration degree of each student is calculated by comparing that student's head posture with the head postures of the surrounding students, and the expression of each student is classified into one of a plurality of categories. The diversity of the expression classes, together with the concentration calculation, improves the reliability of the analysis results of the multi-modal feature analysis module.
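The double-layer detection and the neighbor-based concentration calculation described above can be illustrated with a minimal Python sketch. The abstract does not define how life values and hit values are updated or how head postures are compared, so everything here is an assumption made for illustration: the `Seat` class, the `update_seats` and `concentration` functions, the `side_detector` callable, the thresholds `min_hits`, `max_life` and `tolerance`, and the use of mean yaw and pitch of neighboring students as the reference posture are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]   # (x, y, width, height)
Pose = Tuple[float, float]                # (yaw, pitch) in degrees


@dataclass
class Seat:
    """A static position region estimated from early frontal-face detections."""
    box: Box
    life: int = 5   # assumed: frames the seat may still go without a face
    hits: int = 0   # assumed: frames in which a face was found in this seat


def center_inside(region: Box, face: Box) -> bool:
    """True if the center of `face` lies inside `region`."""
    rx, ry, rw, rh = region
    fx, fy, fw, fh = face
    cx, cy = fx + fw / 2, fy + fh / 2
    return rx <= cx <= rx + rw and ry <= cy <= ry + rh


def update_seats(seats: List[Seat], frontal_faces: List[Box],
                 side_detector: Callable[[Box], Optional[Box]],
                 min_hits: int = 3, max_life: int = 5) -> Dict[int, Box]:
    """Match first-layer (frontal) faces to seats; for an established, still
    alive seat with no frontal face, fall back to the second-layer detector."""
    found: Dict[int, Box] = {}
    for i, seat in enumerate(seats):
        face = next((f for f in frontal_faces
                     if center_inside(seat.box, f)), None)
        if face is None and seat.hits >= min_hits and seat.life > 0:
            # Second layer: run the slower side-face detector on this seat only.
            face = side_detector(seat.box)
        if face is not None:
            seat.hits += 1
            seat.life = max_life
            found[i] = face
        else:
            seat.life = max(0, seat.life - 1)
    return found


def concentration(pose: Pose, neighbors: List[Pose],
                  tolerance: float = 30.0) -> Optional[float]:
    """Score in [0, 1] that falls linearly as the student's head posture
    deviates from the mean posture of neighbors; None if no neighbors."""
    if not neighbors:
        return None
    mean_yaw = sum(n[0] for n in neighbors) / len(neighbors)
    mean_pitch = sum(n[1] for n in neighbors) / len(neighbors)
    deviation = sqrt((pose[0] - mean_yaw) ** 2 + (pose[1] - mean_pitch) ** 2)
    return max(0.0, 1.0 - deviation / tolerance)
```

In this sketch the side-face detector runs only on seats that have already accumulated enough hits and still have life remaining, which is one plausible way to keep the second layer from slowing down every frame. A real implementation would substitute actual detectors and the update rules defined in the full specification.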