This invention provides a 
system and method for determining the position of a viewed object in three dimensions by employing 2D 
machine vision processes on each of a plurality of planar faces of the object, and thereby refining the location of the object. First, a rough 
pose estimate of the object is derived. This rough 
pose estimate can be based upon predetermined 
pose data, or can be derived by acquiring a plurality of planar face poses of the object (using, for example, multiple cameras) and correlating the corners of the trained 
image pattern, which have known coordinates relative to the origin, to the acquired patterns. Once the rough pose is obtained, it is refined by defining the pose as a 
quaternion (a, b, c and d) for rotation and three variables (x, y, z) for translation, and employing an iterative, weighted 
least squares error calculation to minimize the error between the edgelets of the trained 
model image and the acquired runtime edgelets. The overall, refined / optimized pose estimate incorporates data from each of the cameras' acquired images. Thereby, the estimate minimizes the 
total error between the edgelets of each camera's / view's trained 
model image and the associated camera's / view's acquired runtime edgelets. A final transformation of trained features relative to the runtime features is derived from the iterative error computation.
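
The refinement described above can be illustrated with a minimal Python sketch. It is not the claimed implementation, and it rests on several simplifying assumptions that do not come from the text: edgelets are reduced to 3D points with known trained-to-runtime correspondences, the error is a weighted squared point-to-point distance, `a` is taken as the scalar part of the quaternion, and the iterative step is plain gradient descent with a numerical gradient and step halving, standing in for whatever weighted least squares solver is actually used. The function names (`quat_to_matrix`, `transform`, `total_error`, `refine`) are hypothetical.

```python
import math


def quat_to_matrix(a, b, c, d):
    """Rotation matrix for quaternion (a, b, c, d); a is assumed to be the scalar part."""
    n = math.sqrt(a * a + b * b + c * c + d * d)
    a, b, c, d = a / n, b / n, c / n, d / n  # normalize so the matrix is a pure rotation
    return [
        [a * a + b * b - c * c - d * d, 2 * (b * c - a * d), 2 * (b * d + a * c)],
        [2 * (b * c + a * d), a * a - b * b + c * c - d * d, 2 * (c * d - a * b)],
        [2 * (b * d - a * c), 2 * (c * d + a * b), a * a - b * b - c * c + d * d],
    ]


def transform(pose, points):
    """Apply pose (a, b, c, d, x, y, z) to a list of 3D points."""
    a, b, c, d, x, y, z = pose
    rot = quat_to_matrix(a, b, c, d)
    shift = (x, y, z)
    return [
        tuple(sum(rot[i][j] * p[j] for j in range(3)) + shift[i] for i in range(3))
        for p in points
    ]


def total_error(pose, views):
    """Weighted squared error summed over every camera/view.

    Each view is (trained_points, runtime_points, weights), an assumed layout.
    """
    total = 0.0
    for trained, runtime, weights in views:
        for p, q, w in zip(transform(pose, trained), runtime, weights):
            total += w * sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return total


def refine(pose, views, iters=200, h=1e-6):
    """Iteratively lower the total error, starting from a rough pose.

    Gradient descent with a forward-difference gradient; a step is accepted
    only if it reduces the error, so the error never increases.
    """
    pose = list(pose)
    err = total_error(pose, views)
    step = 0.1
    for _ in range(iters):
        grad = []
        for i in range(7):
            bumped = pose[:]
            bumped[i] += h
            grad.append((total_error(bumped, views) - err) / h)
        while step > 1e-12:
            trial = [p - step * g for p, g in zip(pose, grad)]
            trial_err = total_error(trial, views)
            if trial_err < err:
                pose, err = trial, trial_err
                step *= 2.0  # be more ambitious after a successful step
                break
            step *= 0.5  # otherwise back off and retry
        else:
            break  # no improving step found; treat as converged
    return pose, err


if __name__ == "__main__":
    trained = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    runtime = [(p[0] + 0.1, p[1] - 0.05, p[2] + 0.02) for p in trained]
    views = [(trained, runtime, [1.0] * 4)]  # one view; add more tuples for more cameras
    rough = (1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)  # identity rotation, zero translation
    refined, err = refine(rough, views)
    print(total_error(rough, views), err)
```

Because `total_error` sums over all views before the pose is updated, a single pose is fitted to the data from every camera at once rather than one pose per camera, which mirrors the combined estimate described in the text.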