The invention discloses an image text description method based on a
visual attention model. Its main steps are image input, loss-function training,
stylization, image enhancement and image refinement. The process is as follows:
the input image is first resized to a 256×256 content image by a bilinear down-sampling layer and then stylized by a style subnet; the stylized result, which is the first output image, is up-sampled to 512×512 and enhanced by an enhancement subnet to obtain the second output image; the second output image is then resized to 1024×1024, and finally a
refinement subnet removes local pixelation artifacts and further refines the result to obtain a high-resolution output. The image style transfer method disclosed by the invention simulates the brushwork of the artwork more closely;
multiple models are combined into a single network so that it can process the increasingly large images captured by modern
digital cameras; and the combined model can be trained to transfer multiple artistic styles.
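The coarse-to-fine resolution schedule described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions: in the method the style, enhancement and refinement subnets are trained networks, but here `identity_subnet` is a hypothetical stand-in so that only the resizing path (256×256, then 512×512, then 1024×1024) is exercised. The names `bilinear_resize`, `identity_subnet` and `hierarchical_stylize` are illustrative and do not come from the invention, and the half-pixel-centre convention used for bilinear interpolation is one common choice, not necessarily the one the invention uses.

```python
import numpy as np


def bilinear_resize(img: np.ndarray, size: int) -> np.ndarray:
    """Resize an H x W x C float image to size x size by bilinear interpolation."""
    h, w = img.shape[:2]
    # Map each output pixel centre back to input coordinates (half-pixel convention).
    ys = np.clip((np.arange(size) + 0.5) * h / size - 0.5, 0, h - 1)
    xs = np.clip((np.arange(size) + 0.5) * w / size - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]  # vertical blend weights, shape (size, 1, 1)
    wx = (xs - x0)[None, :, None]  # horizontal blend weights, shape (1, size, 1)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def identity_subnet(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a trained style, enhancement or refinement subnet."""
    return x


def hierarchical_stylize(image, style_net, enhance_net, refine_net):
    """Run the 256 -> 512 -> 1024 coarse-to-fine schedule and return all three outputs."""
    content = bilinear_resize(image, 256)           # bilinear down-sampling layer
    out1 = style_net(content)                       # first output: stylized 256x256
    out2 = enhance_net(bilinear_resize(out1, 512))  # second output: enhanced 512x512
    out3 = refine_net(bilinear_resize(out2, 1024))  # final output: refined 1024x1024
    return out1, out2, out3


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    photo = rng.random((720, 960, 3))  # synthetic stand-in for a camera image
    o1, o2, o3 = hierarchical_stylize(photo, identity_subnet, identity_subnet, identity_subnet)
    print(o1.shape, o2.shape, o3.shape)  # (256, 256, 3) (512, 512, 3) (1024, 1024, 3)
```

Replacing `identity_subnet` with real networks leaves the control flow unchanged: each stage only has to accept and return an H×W×3 array at its own resolution, which is what lets the separately sized models be chained into one network.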