Provided is a vehicle environment monitoring apparatus that, with a simple configuration having a single camera mounted on the vehicle, is capable of extracting an image of a monitored object in the environment around the vehicle by separating it from the background image. The apparatus includes: a first image portion extracting processing unit that extracts first image portions (A1, A2), each considered to be the head of a pedestrian, from a currently picked-up image and a previously picked-up image taken by an infrared camera; a mask area setting processing unit that sets mask areas (M1(0,0), M1(1,0), . . . , M1(5,8)) around the first image portion (A1) in the currently picked-up image; and an object extracting processing unit that carries out pattern matching on the previously picked-up image using a comparison pattern obtained through affine transformation of each mask area at the change rate (Rate) between the first image portions (A1, A2), and that sets, as the image area of the monitored object, an area (Ar1) including the first image portion (A2) and the second image portions (M2(1,3), M2(2,3), . . . , M2(3,6)) for which the displacement amount between the position (black point) corresponding to the centroid of the mask area and the matching position is smaller than a predetermined threshold value.
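The object extracting step described above may be illustrated by the following minimal Python sketch, which is not the claimed implementation. It rests on several assumptions not stated in the abstract: Rate is taken as the scale factor from the current image to the previous image, the affine transformation is reduced to pure scaling with nearest-neighbour resampling, matching uses a sum of absolute differences over a small search window, and images are plain nested lists of intensities. All function and parameter names (`scale_patch`, `best_match`, `select_second_portions`, `bounding_area`, `search`, `threshold`) are hypothetical.

```python
import math

def scale_patch(img, top, left, h, w, rate):
    """Resample one mask area by `rate` (pure scaling, nearest neighbour)."""
    out_h, out_w = max(1, round(h * rate)), max(1, round(w * rate))
    return [[img[top + min(h - 1, int(r / rate))][left + min(w - 1, int(c / rate))]
             for c in range(out_w)] for r in range(out_h)]

def best_match(img, patch, exp_top, exp_left, search):
    """Return (top, left) of the lowest-SAD placement near the expected position."""
    ph, pw = len(patch), len(patch[0])
    best = None
    for t in range(exp_top - search, exp_top + search + 1):
        for l in range(exp_left - search, exp_left + search + 1):
            if t < 0 or l < 0 or t + ph > len(img) or l + pw > len(img[0]):
                continue  # placement would leave the image
            score = sum(abs(img[t + r][l + c] - v)
                        for r, row in enumerate(patch) for c, v in enumerate(row))
            if best is None or score < best[0]:
                best = (score, t, l)
    return None if best is None else (best[1], best[2])

def select_second_portions(cur, prev, head_cur, head_prev, masks, rate,
                           search=3, threshold=2.0):
    """Keep matched areas whose displacement from the expected centroid is small.

    head_cur / head_prev: (row, col) centres of A1 / A2 (assumed inputs).
    masks: list of (top, left, height, width) mask areas in the current image.
    """
    accepted = []
    for top, left, h, w in masks:
        patch = scale_patch(cur, top, left, h, w, rate)
        ph, pw = len(patch), len(patch[0])
        # Expected centroid in the previous image: the offset from the head
        # centre is scaled by the same rate as the head itself.
        exp_cy = head_prev[0] + (top + h / 2 - head_cur[0]) * rate
        exp_cx = head_prev[1] + (left + w / 2 - head_cur[1]) * rate
        found = best_match(prev, patch, round(exp_cy - ph / 2),
                           round(exp_cx - pw / 2), search)
        if found is None:
            continue
        t, l = found
        displacement = math.hypot(t + ph / 2 - exp_cy, l + pw / 2 - exp_cx)
        if displacement < threshold:
            accepted.append((t, l, ph, pw))
    return accepted

def bounding_area(boxes):
    """Smallest (top, left, bottom, right) rectangle containing all boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[0] + b[2] for b in boxes), max(b[1] + b[3] for b in boxes))

def make_image(obj_top, obj_left):
    """Synthetic test frame: gradient background with a brighter 8x6 object."""
    img = [[10 * r + c for c in range(24)] for r in range(16)]
    for r in range(8):
        for c in range(6):
            img[obj_top + r][obj_left + c] = 500 + 10 * r + c
    return img

# Hypothetical scene: the object sits 3 columns further right in the previous
# frame, while the background is unchanged between the two frames.
cur, prev = make_image(4, 4), make_image(4, 7)
head_prev_box = (4, 8, 2, 4)                    # stands in for A2
masks = [(8, 5, 2, 2), (8, 16, 2, 2)]           # one on the body, one on background
accepted = select_second_portions(cur, prev, (5.0, 7.0), (5.0, 10.0), masks, rate=1.0)
area = bounding_area([head_prev_box] + accepted)  # stands in for Ar1
```

In this synthetic scene the mask placed on the body moves together with the head and is kept, while the mask placed on the background matches at its unchanged position, is displaced from the expected position, and is discarded. A practical system would more likely use an optimized matcher and a full affine warp rather than these hand-written loops.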