The invention discloses a method for extracting an image region of interest (ROI) based on eye movement data and low-level features. On the one hand, an eye movement ROI, which reflects the real semantics perceived by humans, is extracted from the gaze-tracking experimental data recorded by an eye tracker. On the other hand, a feature ROI, that is, a region of interest in the general sense, is extracted from a weighted combination of low-level features. A similarity analysis between the feature ROI and the eye movement ROI identifies the weight combination with the highest similarity, namely the optimal weights. Regions of interest extracted with these weights from other images of the same type better match the semantic needs of users.
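
The weight search described above can be illustrated with a minimal Python sketch. The abstract does not specify how the feature ROI is derived from the weighted combination, which similarity measure is used, or how candidate weights are enumerated, so the choices here are assumptions made only for illustration: the feature ROI is obtained by thresholding the weighted sum of feature maps, similarity is intersection over union (IoU) between binary masks, and candidate weights are taken from a coarse grid of non-negative values summing to 1. All function names and parameters (`feature_roi`, `iou`, `weight_grid`, `find_optimal_weights`, `steps`, `threshold`) are hypothetical.

```python
import itertools
import numpy as np


def feature_roi(feature_maps, weights, threshold=0.5):
    """Binary feature ROI from a weighted sum of low-level feature maps.

    feature_maps: array of shape (n_features, H, W), values assumed in [0, 1].
    weights: array of shape (n_features,).
    Thresholding the weighted sum is an assumption, not the patented step.
    """
    saliency = np.tensordot(weights, feature_maps, axes=1)  # shape (H, W)
    return saliency >= threshold


def iou(mask_a, mask_b):
    """Intersection over union of two boolean masks (assumed similarity)."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 1.0  # two empty masks are treated as identical
    return np.logical_and(mask_a, mask_b).sum() / union


def weight_grid(n_features, steps=10):
    """Yield every weight vector on a grid of spacing 1/steps that sums to 1."""
    for combo in itertools.product(range(steps + 1), repeat=n_features):
        if sum(combo) == steps:
            yield np.array(combo, dtype=float) / steps


def find_optimal_weights(feature_maps, eye_roi, steps=10, threshold=0.5):
    """Exhaustively search the grid for the weights whose feature ROI
    is most similar to the eye movement ROI."""
    best_weights, best_score = None, -1.0
    for weights in weight_grid(feature_maps.shape[0], steps):
        score = iou(feature_roi(feature_maps, weights, threshold), eye_roi)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```

In this sketch, `eye_roi` would be a boolean mask derived from eye tracker data for a training image, and the returned weights would then be passed to `feature_roi` for other images of the same type. The exhaustive grid grows quickly with the number of features, so it is only practical for a handful of feature maps.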