An image generating apparatus comprising: a multi-plane buffer which comprises a plurality of planes and which can record R, G, B, A and Z values as pixels; and a multi-plane buffer processor which stores in said multi-plane buffer, in sequence from the closest distance from a vantage point, a plurality of defocus data, said defocus data being data consisting of object model data that has been at least coordinate converted, hidden surface processed, and defocus processed, and which assigns R, G, B, A and Z values to each pixel, said object model data being derived from an object whose positional relationships are to be represented from the vantage point using depth of field, which is the effective focus range within which the object is in focus.
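The storage order recited above can be illustrated with a minimal sketch. It is not the claimed implementation: the names `Pixel`, `MultiPlaneBuffer`, `store` and `column` are hypothetical, a smaller Z value is assumed to mean a shorter distance from the vantage point, and discarding the farthest entry when all planes are occupied is an assumption made only to keep the example bounded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pixel:
    """One defocus-data sample: colour, alpha and depth (hypothetical type)."""
    r: float
    g: float
    b: float
    a: float
    z: float  # assumption: smaller z is closer to the vantage point


class MultiPlaneBuffer:
    """A stack of planes; each plane holds at most one Pixel per (x, y)."""

    def __init__(self, width: int, height: int, num_planes: int) -> None:
        self.planes: List[List[List[Optional[Pixel]]]] = [
            [[None] * width for _ in range(height)] for _ in range(num_planes)
        ]

    def store(self, x: int, y: int, pixel: Pixel) -> None:
        """Insert a pixel so that plane 0 keeps the closest sample,
        plane 1 the next closest, and so on."""
        incoming = pixel
        for plane in self.planes:
            current = plane[y][x]
            if current is None:
                plane[y][x] = incoming
                return
            if incoming.z < current.z:
                # The closer sample takes this plane; the displaced one moves on.
                plane[y][x], incoming = incoming, current
        # Assumption: when every plane is occupied, the farthest sample is dropped.

    def column(self, x: int, y: int) -> List[Pixel]:
        """Return the stored samples at (x, y), nearest first."""
        return [p[y][x] for p in self.planes if p[y][x] is not None]


if __name__ == "__main__":
    buf = MultiPlaneBuffer(width=4, height=4, num_planes=3)
    for depth in (5.0, 2.0, 9.0):
        buf.store(1, 1, Pixel(1.0, 1.0, 1.0, 0.5, depth))
    print([p.z for p in buf.column(1, 1)])  # prints [2.0, 5.0, 9.0]
```

Keeping each pixel position sorted at insertion time means the planes can later be read from plane 0 onward in near-to-far order without a separate sorting pass.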