Methods and apparatus are described for monocular 3D human pose estimation and tracking, which are able to recover poses of people in realistic street conditions captured using a monocular, potentially moving camera. Embodiments of the present invention provide a three-stage process involving estimating (10, 60, 110) a 3D pose of each of the multiple objects using an output of 2D tracking-by detection (50) and 2D viewpoint estimation (46). The present invention provides a sound Bayesian formulation to address the above problems. The present invention can provide articulated 3D tracking in realistic street conditions.
The present invention provides methods and apparatus for people detection and 2D pose estimation combined with a dynamic motion prior. The present invention provides not only 2D pose estimation for people in side views, it goes beyond this by estimating poses in 3D from multiple viewpoints. The estimation of poses is done in monocular images, and does not require stereo images. Also the present invention does not require detection of characteristic poses of people.