To estimate the position and size of an object efficiently and accurately, an image is repeatedly convolved with a smoothing filter to generate a plurality of smoothed images L(x, y, σi) of different scales. A differential image G(x, y, σi) is then generated for each pair of smoothed images L(x, y, σi) whose scales are σi and σi×2. The differential images G(x, y, σi) are combined into a combined image AP, and a position estimating unit estimates the position of the object based on the combined image AP.
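
The processing flow described above can be illustrated with a minimal sketch. Several details are assumptions made only for illustration and are not stated in the abstract: the smoothing filter is taken to be a Gaussian, the differential image is computed as the finer-scale image minus the coarser-scale image, the combined image AP is formed by a plain pixel-wise sum, and the position is taken as the pixel where AP is largest. All function names are hypothetical. The sketch relies on the fact that repeating a Gaussian of scale σ0 n times yields an effective scale of about σ0·√n, so the images after 1, 4, 16, ... passes have scales σ0, 2σ0, 4σ0, ...

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights, truncated at 3 sigma (assumed filter type)."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_once(image, kernel):
    """One separable convolution pass (rows, then columns) with clamped borders."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    rows = [[sum(kernel[k + r] * image[y][min(max(x + k, 0), w - 1)]
                 for k in range(-r, r + 1)) for x in range(w)] for y in range(h)]
    return [[sum(kernel[k + r] * rows[min(max(y + k, 0), h - 1)][x]
                 for k in range(-r, r + 1)) for x in range(w)] for y in range(h)]

def smoothed_images(image, sigma0, levels):
    """Repeat the convolution and keep L(x, y, sigma_i) after 1, 4, 16, ... passes,
    so that consecutive kept images have scales sigma_i and sigma_i * 2."""
    kernel = gaussian_kernel(sigma0)
    keep = {4 ** i: sigma0 * 2 ** i for i in range(levels + 1)}
    kept, current = [], image
    for n in range(1, 4 ** levels + 1):
        current = smooth_once(current, kernel)
        if n in keep:
            kept.append((keep[n], current))
    return kept

def differential_images(smoothed):
    """G(x, y, sigma_i) = L(x, y, sigma_i) - L(x, y, sigma_i * 2) (sign is assumed)."""
    return [(sigma, [[a - b for a, b in zip(row_a, row_b)]
                     for row_a, row_b in zip(fine, coarse)])
            for (sigma, fine), (_, coarse) in zip(smoothed, smoothed[1:])]

def combine(diffs):
    """Combined image AP as a pixel-wise sum of the differential images (assumed rule)."""
    h, w = len(diffs[0][1]), len(diffs[0][1][0])
    return [[sum(g[y][x] for _, g in diffs) for x in range(w)] for y in range(h)]

def estimate_position(ap):
    """Return (x, y) of the largest value in AP (assumed estimation rule)."""
    return max(((x, y) for y in range(len(ap)) for x in range(len(ap[0]))),
               key=lambda p: ap[p[1]][p[0]])

def make_demo_image(width=21, height=21, cx=12, cy=8):
    """Dark image with one bright 3x3 object centered at (cx, cy)."""
    return [[1.0 if abs(x - cx) <= 1 and abs(y - cy) <= 1 else 0.0
             for x in range(width)] for y in range(height)]

# Usage: scales 1, 2 and 4 give two differential images, which are then combined.
smoothed = smoothed_images(make_demo_image(), sigma0=1.0, levels=2)
diffs = differential_images(smoothed)
AP = combine(diffs)
position = estimate_position(AP)
print("estimated position (x, y):", position)
```

The sketch covers position estimation only. Size estimation is not shown here, since the abstract does not describe how it is performed.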