The invention relates to the field of computer vision recognition and provides an image description method based on the construction of a hierarchical feature relation graph. The method comprises the following steps: constructing a training data set; inputting an image into a block detection module and outputting block visual information; inputting the image into a target detection module and outputting target visual information; inputting the image into a text detection module and outputting text visual information; inputting the three types of visual information separately into a description generator, constructing for each type a relation graph between that visual information and the coordinate information of the training image, and optimizing the three types of visual information; screening and fusing the three types of visual information to obtain multimodal features; and inputting the description words into a recurrent neural network to extract feature information and predicting the next description word, so as to generate a complete description sentence. By optimizing, screening, and fusing multiple types of visual information, the method can describe any input test image and effectively improve image description accuracy.
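
The graph construction, optimization, screening, and fusion steps can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not specify how edges are weighted, how features are optimized, or how they are screened, so every choice below is an assumption made for illustration. Edge weights decay with the distance between box centers, optimization is a single round of neighbor averaging, screening keeps the features with the largest L2 norm, and fusion is mean pooling followed by concatenation. All function names are hypothetical. The detection modules and the recurrent decoder are omitted.

```python
import math


def center(box):
    """Return the center (cx, cy) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def relation_graph(boxes, image_size):
    """Build a row-normalized adjacency matrix from box coordinates.

    Assumption: the edge weight is exp(-d / diagonal), where d is the distance
    between box centers and diagonal is the image diagonal. Self-loops are kept.
    """
    width, height = image_size
    diagonal = math.hypot(width, height)
    centers = [center(b) for b in boxes]
    graph = []
    for cx, cy in centers:
        row = [math.exp(-math.hypot(cx - ox, cy - oy) / diagonal)
               for ox, oy in centers]
        total = sum(row)
        graph.append([w / total for w in row])
    return graph


def refine(features, graph):
    """Assumed optimization step: one round of weighted neighbor averaging."""
    n, dim = len(features), len(features[0])
    return [[sum(graph[i][j] * features[j][k] for j in range(n))
             for k in range(dim)]
            for i in range(n)]


def screen(features, keep):
    """Assumed screening rule: keep the `keep` features with the largest L2 norm."""
    ranked = sorted(features,
                    key=lambda f: math.sqrt(sum(v * v for v in f)),
                    reverse=True)
    return ranked[:keep]


def fuse(*groups):
    """Assumed fusion rule: mean-pool each group, then concatenate the results."""
    fused = []
    for group in groups:
        dim = len(group[0])
        fused.extend(sum(f[k] for f in group) / len(group) for k in range(dim))
    return fused


# Toy example: two detections per type (block, target, text) in a 100x100 image.
boxes = [(0, 0, 50, 50), (50, 50, 100, 100)]
feats = [[1.0, 0.0], [0.0, 1.0]]
graph = relation_graph(boxes, (100, 100))
optimized = [refine(feats, graph) for _ in range(3)]  # one list per type
multimodal = fuse(*(screen(group, keep=1) for group in optimized))
print(len(multimodal))  # three types, two dimensions each
```

In a full system, the fused multimodal vector would condition the recurrent network that predicts each next description word. A learned graph layer and a learned screening gate would likely replace the fixed rules used here.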