This invention provides a
system and method for determining the position of a viewed object in three dimensions by employing 2D
machine vision processes on each of a plurality of planar faces of the object, and thereby refining the location of the object. First, a rough
pose estimate of the object is derived. This rough
pose estimate can be based upon predetermined
pose data, or it can be derived by acquiring a plurality of planar face poses of the object (using, for example, multiple cameras) and correlating the corners of the trained
image pattern, which have known coordinates relative to the origin, with the acquired patterns. Once the rough pose is obtained, it is refined by defining the pose as a
quaternion (a, b, c, d) for rotation and three variables (x, y, z) for translation, and by employing an iterative, weighted
least-squares error calculation that minimizes the error between the edgelets of the trained
model image and the acquired runtime edgelets. The overall refined (optimized) pose estimate incorporates data from the images acquired by each of the cameras, and thereby minimizes the
total error between the edgelets of each camera's (view's) trained
model image and that camera's (view's) acquired runtime edgelets. A final transformation of the trained features relative to the runtime features is derived from the iterative error computation.
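The refinement step can be illustrated with a minimal Python sketch. It rests on assumptions the text does not specify: each trained edgelet has already been matched to a runtime edgelet and associated with a 3D point on a planar face; each view is a simple pinhole camera with a known pose; the error is the signed point-to-line distance along the runtime edgelet's normal; the weights are Huber-style; and the update is a damped Gauss-Newton step with a numerical Jacobian. All function names (`refine_pose`, `project`, and so on) and the `(focal_length, camera_pose)` camera format are hypothetical. The pose vector follows the (a, b, c, d, x, y, z) layout described above, and residuals from every view are stacked into one system so that a single pose minimizes the total error across cameras.

```python
import math

def quat_to_rot(a, b, c, d):
    # Rotation matrix for quaternion (a, b, c, d) with scalar part a.
    # Dividing by the squared norm keeps the result valid for unnormalized q.
    n = a * a + b * b + c * c + d * d
    return [
        [(a*a + b*b - c*c - d*d) / n, 2 * (b*c - a*d) / n, 2 * (b*d + a*c) / n],
        [2 * (b*c + a*d) / n, (a*a - b*b + c*c - d*d) / n, 2 * (c*d - a*b) / n],
        [2 * (b*d - a*c) / n, 2 * (c*d + a*b) / n, (a*a - b*b - c*c + d*d) / n],
    ]

def transform(pose, p):
    # pose = (a, b, c, d, x, y, z): rotate p by the quaternion, then translate.
    a, b, c, d, x, y, z = pose
    R = quat_to_rot(a, b, c, d)
    return [sum(R[i][j] * p[j] for j in range(3)) + t for i, t in enumerate((x, y, z))]

def project(cam, X):
    # Hypothetical pinhole camera: cam = (focal_length, camera_pose), where
    # camera_pose maps world coordinates into that camera's frame.
    f, cam_pose = cam
    Xc = transform(cam_pose, X)
    return (f * Xc[0] / Xc[2], f * Xc[1] / Xc[2])

def residuals(pose, views):
    # Signed point-to-line distance (pixels) of each projected trained edgelet
    # from its matched runtime edgelet (position e, unit normal n), all views.
    r = []
    for cam, edgelets in views:
        for P, e, n in edgelets:
            u, v = project(cam, transform(pose, P))
            r.append(n[0] * (u - e[0]) + n[1] * (v - e[1]))
    return r

def solve(A, g):
    # Gaussian elimination with partial pivoting for the small normal equations.
    m = len(g)
    M = [row[:] + [g[i]] for i, row in enumerate(A)]
    for k in range(m):
        piv = max(range(k, m), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, m):
            f = M[i][k] / M[k][k]
            for j in range(k, m + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * m
    for i in reversed(range(m)):
        x[i] = (M[i][m] - sum(M[i][j] * x[j] for j in range(i + 1, m))) / M[i][i]
    return x

def refine_pose(pose, views, iters=30, huber_c=1.0, damping=1e-3, h=1e-6):
    pose = list(pose)
    for _ in range(iters):
        r = residuals(pose, views)
        # Huber-style weights: full weight within huber_c pixels, c/|r| beyond.
        w = [1.0 if abs(ri) <= huber_c else huber_c / abs(ri) for ri in r]
        # Forward-difference Jacobian; cols[k] holds d r / d pose[k].
        cols = []
        for k in range(7):
            p2 = pose[:]
            p2[k] += h
            r2 = residuals(p2, views)
            cols.append([(r2i - ri) / h for r2i, ri in zip(r2, r)])
        # Weighted normal equations (J^T W J + damping I) delta = -J^T W r.
        # The small damping term handles the quaternion's free scale direction.
        A = [[sum(wi * ck * cl for wi, ck, cl in zip(w, cols[k], cols[l]))
              + (damping if k == l else 0.0) for l in range(7)] for k in range(7)]
        g = [-sum(wi * ck * ri for wi, ck, ri in zip(w, cols[k], r)) for k in range(7)]
        delta = solve(A, g)
        pose = [p + dp for p, dp in zip(pose, delta)]
        # Keep the quaternion at unit length after each update.
        nq = math.sqrt(sum(q * q for q in pose[:4]))
        pose[:4] = [q / nq for q in pose[:4]]
    return pose
```

A real system would typically re-match edgelets between iterations and use analytic derivatives; this sketch keeps the correspondences fixed for brevity. The Huber weights leave small residuals at full weight while reducing the influence of large ones, such as those from mismatched edgelets.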