The invention provides a visual question-answering method and system based on semantic alignment, and a storage medium, and relates to the technical field of visual question answering. According to an embodiment of the invention, the method comprises the following steps: first, obtaining and preprocessing a
data set; extracting original image features and target position features from an original image; generating an image description statement according to the target position features; obtaining image description words, question features and image description statement features; performing semantic alignment of the original image features with the image description words to obtain a first image feature; obtaining a second image feature according to the original image features and the image description statement features; obtaining a third image feature according to the original image features and the question features; fusing the three image features, the image description statement features and the question features to obtain a comprehensive feature; and predicting a final answer. The importance of the image information is thereby highlighted, the information involved in the feature fusion process is more complete, and the generated answer is more accurate.
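
The feature flow described above can be illustrated with a minimal sketch. It is not the claimed implementation: the abstract does not specify how alignment, fusion or prediction are computed, so this sketch assumes dot-product attention over image regions, fusion by concatenation, and a linear answer scorer. All names (`attend`, `fuse_and_predict`, `answer_weights`) are hypothetical, and feature extraction and caption generation are assumed to have been done already.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions: List[Vector], queries: List[Vector]) -> Vector:
    """Assumed alignment step: weight each image region by its summed
    dot-product similarity to the query vectors, then return the
    weighted average of the region features."""
    scores = [sum(dot(r, q) for q in queries) for r in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    return [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]


def fuse_and_predict(
    regions: List[Vector],        # original image features, one per region
    caption_words: List[Vector],  # image description word features
    caption_feat: Vector,         # image description statement feature
    question_feat: Vector,        # question feature
    answer_weights: List[Vector], # hypothetical linear scorer, one row per answer
) -> Tuple[Vector, int]:
    first = attend(regions, caption_words)     # image aligned with description words
    second = attend(regions, [caption_feat])   # image guided by the description statement
    third = attend(regions, [question_feat])   # image guided by the question
    # Assumed fusion: plain concatenation of all five features.
    fused = first + second + third + caption_feat + question_feat
    logits = [dot(w, fused) for w in answer_weights]
    answer = max(range(len(logits)), key=logits.__getitem__)
    return fused, answer
```

With two region vectors of dimension 2, the comprehensive feature has 10 components and the returned index is the answer whose row of `answer_weights` scores highest. A real system would replace the concatenation and linear scorer with learned layers.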