Image-to-language conversion method based on a fused gated recurrent network model
The method combines a network model with image technology and is applied in the field of image recognition. It achieves high prediction metrics while using fewer computing resources.
Examples
Embodiment 1
[0059] As shown in Figure 1, this embodiment provides an image-to-language conversion method based on the fused gated recurrent network model. The specific operations are as follows:
[0060] Step 1. Randomly divide the images in the image data set into a training set and a test set. Preprocess the image data in the training set to obtain images sized to fit the convolutional network and a set containing all word vectors. Then feed the preprocessed images into the VGGNet-16 convolutional neural network and perform convolution to obtain the image output vector.
[0061] In this embodiment, the image data set is the MSCOCO 2014 data set, which contains more than 80,000 training images and more than 40,000 validation images. The images in the data set are mostly color images with a size of 256×256, and each image corresponds to five English image descriptions of different lengths. First shuffle the images in the image data set, randomly select...
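The shuffle-and-split operation of Step 1 can be sketched as follows. This is a minimal illustration only: the function name `split_dataset`, the `train_fraction` value, and the fixed `seed` are assumptions, since the text above does not state the split ratio or how the random selection is seeded.

```python
import random


def split_dataset(image_ids, train_fraction=0.8, seed=0):
    """Shuffle image identifiers and split them into training and test lists.

    train_fraction and seed are illustrative assumptions, not values
    taken from the embodiment.
    """
    ids = list(image_ids)
    rng = random.Random(seed)      # local generator, so the split is reproducible
    rng.shuffle(ids)               # shuffle in place before splitting
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Toy usage with ten placeholder image identifiers.
train_ids, test_ids = split_dataset(range(10), train_fraction=0.8, seed=0)
```

Every identifier ends up in exactly one of the two lists, so no image is shared between training and testing.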
Embodiment 2
[0106] This embodiment compares the image-to-language conversion results of various network models. The first model is the fused gated recurrent network model of Embodiment 1, except that the number of iterations is 90,000. After the model training in Embodiment 1 finished, observation and comparison showed that the weights generated at 90,000 iterations performed better than those generated at 100,000 iterations, so the weights from 90,000 iterations are used for the experimental evaluation. During evaluation, if only the word with the highest score is selected at each step, as in the greedy search method, the final sentence description is often not optimal. Therefore, the beam search method is introduced: at each step it keeps the words with the highest current probability and continues recursively until the terminator is selected. In this way, better sentence descriptions can be obt...
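The difference between greedy search and beam search can be illustrated with a small sketch. Everything here is hypothetical: the `beam_search` function, the toy probability table `TABLE`, the tokens `<s>` and `</s>`, and the beam widths are assumptions chosen for illustration, and the scoring (sum of log-probabilities without length normalization) is one common variant rather than the procedure confirmed by this embodiment.

```python
import math

# Toy next-word distribution keyed by the last word (hypothetical values).
TABLE = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"cat": 0.4, "dog": 0.35, "</s>": 0.25},
    "the": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def step_fn(seq):
    """Return next-word probabilities given the partial sentence."""
    return TABLE[seq[-1]]


def beam_search(step_fn, start, end, beam_width=3, max_len=20):
    """Keep the beam_width best partial sentences until the terminator is chosen."""
    beams = [([start], 0.0)]       # (sequence, log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for word, prob in step_fn(seq).items():
                if prob > 0:
                    candidates.append((seq + [word], score + math.log(prob)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            # Sentences that reached the terminator leave the beam.
            (finished if seq[-1] == end else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)         # keep unfinished sentences if max_len was hit
    return max(finished, key=lambda c: c[1])


# beam_width=1 reduces to greedy search.
greedy_seq, greedy_score = beam_search(step_fn, "<s>", "</s>", beam_width=1)
beam_seq, beam_score = beam_search(step_fn, "<s>", "</s>", beam_width=2)
```

In this toy table, greedy search commits to "a" because it has the highest first-step probability and ends with a sentence of probability 0.24, while a beam of width 2 also keeps "the" alive and finds the sentence "the cat" with probability 0.36.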


