Disclosed is a method for human
face detection that detects faces independently of their particular poses and simultaneously estimates those poses. Our method exhibits
immunity to variations in
skin color, eyeglasses,
facial hair, lighting, scale, and facial expression. In operation, we
train a
convolutional neural network to map face images to points on a face manifold, and non-face images to points far away from that manifold, wherein that manifold is parameterized by facial
pose. Conceptually, we view a
pose parameter as a
latent variable, which may be inferred through an energy-minimization process. To
train systems based upon our inventive method, we derive a new type of discriminative
loss function that is tailored to such detection tasks. Our method enables a multi-view
detector that can detect faces in a variety of poses, for example, looking left or right (
yaw axis), up or down (
pitch axis), or tilting left or right (roll axis). Systems employing our method are highly reliable, run at near real time (5 frames per second on conventional hardware), and are robust against variations in
yaw (±90°), roll (±45°), and
pitch (±60°).
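The energy-minimization view described above can be sketched as follows. This is a minimal illustrative example, not the actual implementation: the network output G(x), the manifold embedding F(θ), the toy one-dimensional yaw manifold, and the threshold value are all assumptions introduced here for clarity. Detection reduces to minimizing an energy E(x, θ) = ||G(x) − F(θ)|| over the latent pose θ and comparing the minimum to a threshold.

```python
import numpy as np

def manifold_point(theta):
    """F(theta): embed a yaw angle (radians) on a toy 1-D face manifold
    in R^3, here a half-circle parameterization (illustrative only)."""
    return np.array([np.cos(theta), np.sin(theta), 0.0])

def network_output(image_features):
    """Stand-in for the convolutional network G(x); a real system would
    map a face image to a point near the manifold and a non-face image
    to a point far from it."""
    return np.asarray(image_features, dtype=float)

def energy(image_features, theta):
    # E(x, theta) = ||G(x) - F(theta)||
    return np.linalg.norm(network_output(image_features) - manifold_point(theta))

def detect_and_estimate_pose(image_features, threshold=0.5, n_steps=181):
    """Infer the latent pose by minimizing the energy over a grid of yaw
    values; declare a face if the minimum energy falls below the threshold."""
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_steps)
    energies = np.array([energy(image_features, t) for t in thetas])
    i = int(np.argmin(energies))
    return energies[i] < threshold, thetas[i], energies[i]
```

For a point lying on the manifold (a "face"), the minimizer recovers the pose; for a point far from the manifold (a "non-face"), the minimum energy stays above the threshold and no detection is reported.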