Image-to-language conversion method based on fusion gate loop network model

Applied in the field of image recognition, this network-model-based image technique addresses a task that is difficult for conventional algorithms, achieving high prediction scores while using fewer computer resources.

Pending Publication Date: 2021-06-01

UNIV OF SCI & TECH LIAONING

Cites: 7; Cited by: 0


AI Technical Summary

Problems solved by technology

Converting an image into a sentence that describes its content is hard for computer programs, because image understanding must consider many factors: how to use the feature information of the image, how to turn the understood knowledge into a text description, and how to express these processes as program logic. For traditional computer algorithms, this task is extremely difficult.

Method used

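Preprocessed images are passed through a VGGNet-16 convolutional neural network to obtain an image output vector. This vector, combined with the start token, is the input of the fusion gate recurrent network model at time step t0; each later time step combines the previous hidden-layer output with the next word vector until every word vector in the set has been used. The trained model then generates a language description for a new image.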


Examples


Embodiment 1

[0059] As shown in Figure 1, this embodiment provides an image-to-language method based on the fusion gate recurrent network model. The specific operations are as follows:

[0060] Step 1. Randomly divide the images in the image data set into a training set and a test set. Preprocess the image data in the training set to obtain images sized to fit the convolutional network and a set containing all word vectors, then input the preprocessed images into the VGGNet-16 convolutional neural network to obtain the image output vector.
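The random split and the "set containing all word vectors" in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's code: the 80/20 split ratio, the fixed seed, the special-token names and the helper names `split_dataset` and `build_vocabulary` are all assumptions. The vocabulary maps each word to an index; in a full model each index would then select a row of a learned word-vector (embedding) table.

```python
import random
from collections import Counter

# Special tokens are assumed names for illustration, not taken from the patent.
START, END, PAD, UNK = "<start>", "<end>", "<pad>", "<unk>"


def split_dataset(image_ids, train_fraction=0.8, seed=0):
    """Randomly split image IDs into a training set and a test set (assumed 80/20)."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def build_vocabulary(captions, min_count=1):
    """Map every word seen in the training captions to an integer index."""
    counts = Counter(word for caption in captions for word in caption.lower().split())
    vocab = {PAD: 0, START: 1, END: 2, UNK: 3}
    for word, count in sorted(counts.items()):  # sorted for deterministic indices
        if count >= min_count and word not in vocab:
            vocab[word] = len(vocab)
    return vocab
```

For example, `split_dataset(range(10))` returns eight training IDs and two test IDs, and `build_vocabulary(["A dog runs", "a cat sits"])` assigns indices 4 to 8 to `a`, `cat`, `dog`, `runs` and `sits`.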

[0061] In this embodiment, the image data set used is the MSCOCO 2014 data set, which contains more than 80,000 training images and more than 40,000 validation images. Most images in the data set are 256×256 color images, and each image has five English descriptions of different lengths. First shuffle the images in the image data set, randomly select...
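Because each MSCOCO image carries five captions of different lengths, the captions have to be brought to a common form before training. A minimal sketch, assuming whitespace tokenization, a fixed maximum length of 16 and a vocabulary dictionary that contains `<pad>`, `<start>`, `<end>` and `<unk>` entries (all assumptions; the function names are hypothetical):

```python
def encode_caption(caption, vocab, max_len=16):
    """Turn one caption into a fixed-length list of word indices.

    The sequence is <start> w1 ... wn <end>, truncated to max_len or
    padded with <pad> up to max_len (max_len is an assumed value).
    Words missing from the vocabulary map to <unk>.
    """
    tokens = ["<start>"] + caption.lower().split() + ["<end>"]
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


def encode_image_captions(captions, vocab, max_len=16):
    """Encode every caption attached to one image (five per image in MSCOCO)."""
    return [encode_caption(c, vocab, max_len) for c in captions]
```

With `vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3, "a": 4, "dog": 5}`, `encode_caption("A dog", vocab, max_len=6)` gives `[1, 4, 5, 2, 0, 0]`. Fixed-length sequences let captions of different lengths be batched together.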

Embodiment 2

[0106] This embodiment compares the image-to-language results of several network models. The first model is the fusion gate recurrent network model of Embodiment 1; it differs from Embodiment 1 only in that the number of iterations is 90,000. After training in Embodiment 1 finished, observation and comparison showed that the weights saved at 90,000 iterations performed better than those saved at 100,000 iterations, so the 90,000-iteration weights were used to evaluate the experimental results. During evaluation, choosing only the highest-scoring word at each step (greedy search) often yields a final sentence description that is not optimal. Beam search is therefore introduced: at each step the several words with the highest current probabilities are kept, and the process continues recursively until the terminator is selected. In this way, better sentence descriptions can be obtained...
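The difference between greedy search and beam search can be shown with a small, generic beam search. This is a sketch under stated assumptions, not the patent's evaluation code: hypotheses are scored by summed log-probabilities with no length normalization, the default beam width of 3 is arbitrary, and `next_word_probs` is a hypothetical callable standing in for the trained network's softmax output. A beam width of 1 reduces to greedy search.

```python
import math


def beam_search(next_word_probs, beam_width=3, max_len=20, start="<start>", end="<end>"):
    """Keep the beam_width highest-scoring partial sentences at every step.

    next_word_probs(prefix) returns {word: probability} for the next word.
    """
    beams = [(0.0, [start])]  # (sum of log-probabilities, words so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, words in beams:
            for word, p in next_word_probs(words).items():
                if p > 0:
                    candidates.append((score + math.log(p), words + [word]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, words in candidates[:beam_width]:
            # A hypothesis that has emitted the terminator is complete.
            (finished if words[-1] == end else beams).append((score, words))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses at max_len
    _, best_words = max(finished, key=lambda c: c[0])
    return [w for w in best_words if w not in (start, end)]


# Hypothetical toy language model: greedy search commits to "a" (0.6) and ends
# with probability 0.6 * 0.5 = 0.30, while "the dog" has 0.4 * 0.9 = 0.36.
TOY = {
    ("<start>",): {"a": 0.6, "the": 0.4},
    ("<start>", "a"): {"dog": 0.5, "cat": 0.5},
    ("<start>", "the"): {"dog": 0.9, "cat": 0.1},
}


def toy_model(prefix):
    return TOY.get(tuple(prefix), {"<end>": 1.0})


print(beam_search(toy_model, beam_width=1))  # greedy: ['a', 'dog']
print(beam_search(toy_model, beam_width=2))  # beam:   ['the', 'dog']
```

Greedy search locks in the locally best first word, while a beam of two keeps "the" alive long enough to find the sentence with the higher overall probability, which is the effect the embodiment describes.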



Abstract

The invention discloses an image-to-language method based on a fusion gate recurrent network model, comprising the following steps. Images in an image data set are randomly assigned to a training set, and the image data in the training set are preprocessed to obtain images sized to fit a convolutional network and a set containing all word vectors; the preprocessed images are convolved to obtain an image output vector. The image output vector is combined with the start token in the set and used as the input of the fusion gate recurrent network model, which produces a first hidden-layer output after time step t0. The first hidden-layer output and the first word vector in the set are combined as the input of time step t1 and passed through the model to obtain a second hidden-layer output, and this loop iterates until every word vector in the set has taken part, completing the training of the model. Finally, an image to be processed is input into the trained fusion gate recurrent network model to generate language information.
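The unrolled forward pass described in the abstract (image vector at t0, then one word vector per time step, each step also consuming the previous hidden-layer output) can be sketched in plain Python. The excerpt does not give the equations of the fusion gate cell, so a standard GRU cell is swapped in as a stand-in; the helper names (`make_params`, `gru_step`, `unroll`), the random initialization and the assumption that the image vector is already projected to word-vector size are all hypothetical. The output projection to the vocabulary, the loss and backpropagation are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def make_params(in_dim, hid_dim, seed=0):
    """Random input and recurrent matrices for the update (z), reset (r) and candidate (n) parts."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {g: (mat(hid_dim, in_dim), mat(hid_dim, hid_dim)) for g in "zrn"}


def gru_step(params, x, h):
    """One standard GRU step, used here as a stand-in for the fusion gate cell."""
    Wz, Uz = params["z"]
    Wr, Ur = params["r"]
    Wn, Un = params["n"]
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b) for a, b in zip(matvec(Wn, x), matvec(Un, reset_h))]
    # New hidden state blends the candidate with the previous hidden state.
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def unroll(params, image_vec, word_vecs, hid_dim):
    """t0 consumes the image vector; t1, t2, ... each consume one word vector."""
    h = [0.0] * hid_dim
    hidden_outputs = []
    for x in [image_vec] + word_vecs:
        h = gru_step(params, x, h)  # previous hidden output + current input
        hidden_outputs.append(h)
    return hidden_outputs
```

With one image vector and two word vectors, `unroll` returns three hidden-layer outputs, one per time step t0, t1, t2; during training each would be projected to word scores and compared with the next word of the caption.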

Description

Technical field

[0001] The invention relates to the technical field of image recognition, in particular to an image-to-language method based on a fusion gate recurrent network model.

Background

[0002] Image understanding is a core and active topic in computer vision research. Its central question is how to convert an image into a sentence that describes the content of the image. Achieving this with computer programs is hard, because image understanding must consider many factors: how to use the feature information of the image, how to turn the understood knowledge into a text description, and how to express these processes as program logic. For traditional computer algorithms, this task is extremely difficult.

Summary of the invention

[0003] In view of the above problems in the prior art, the present invention provides an image-to-language method based on a fusion gate recurrent...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06K9/62, G06F40/216, G06F40/284, G06F40/30, G06N3/04, G06N3/08

CPC: G06F40/216, G06F40/284, G06F40/30, G06N3/049, G06N3/08, G06N3/045, G06F18/214

Inventors: 周自维 (Zhou Ziwei), 王朝阳 (Wang Chaoyang), 徐亮 (Xu Liang)

Owner: UNIV OF SCI & TECH LIAONING
