Image description generation method and system based on unsupervised uniqueness optimization

An image description generation technology, applied in neural learning methods, biological neural network models, special data processing applications, etc. It addresses the lack of diversity and vividness in generated sentence descriptions, their large differences from human-written descriptions, and unstable training. It avoids training instability and loss-monitoring difficulties, yields good diversity, and improves the quality of descriptions.

Pending Publication Date: 2020-05-08

INSPUR ARTIFICIAL INTELLIGENCE RES INST CO LTD SHANDONG CHINA

Cites: 7; Cited by: 0

Problems solved by technology

Methods based on generative adversarial networks can generate diverse descriptions, but because of the complexity of such networks, these methods often suffer from unstable training.

[0004] To sum up, existing image description methods based on maximum likelihood estimation tend to generate sentences similar to those in the training set while ignoring specific image details, so the generated sentence descriptions lack diversity and vividness and differ considerably from human-generated descriptions.

Examples

Embodiment 1

[0062] With reference to Figure 1, this embodiment proposes an image description generation method based on unsupervised uniqueness optimization. The implementation process of the method includes:

[0063] S1. Obtain paired images and real sentence descriptions generated by humans, and store them in the training set;

[0064] S2. Use the paired data contained in the training set to train the image description retrieval model 10 in the SentEval tool;

[0065] S3. Construct the image description generation model 4 using the encoder-decoder framework.

[0066] In this embodiment, the encoder uses ResNet-101 pre-trained on ImageNet;

[0067] The decoder uses a two-layer LSTM with an attention mechanism: the first LSTM layer focuses on visual information, and the second LSTM layer focuses on language information.

[0068] S4. Acquire the images of the training set and input them into the image description generation model 4. The image description generation model 4 generates s...
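
The attention mechanism mentioned in [0067] can be illustrated with a minimal sketch. The text shown here does not say how attention scores are computed, so dot-product scoring is an assumption, and the names `attend`, `query`, and `regions` are hypothetical. The sketch only shows how a decoder state might weight image region features into a single context vector; it is not the patented decoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, regions):
    """Return (weights, context) for one attention step.

    query   -- decoder state vector (hypothetical stand-in for an LSTM hidden state)
    regions -- list of image region feature vectors (e.g. produced by a CNN encoder)

    Assumption: scores are plain dot products; the source does not specify this.
    """
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    # Context vector: attention-weighted sum of the region features.
    context = [
        sum(w * region[i] for w, region in zip(weights, regions))
        for i in range(dim)
    ]
    return weights, context
```

A query aligned with one region receives a larger weight for that region, so the context vector is dominated by the features the decoder is currently attending to.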

Embodiment 2

[0085] With reference to Figure 2, this embodiment proposes an image description generation system based on unsupervised uniqueness optimization, which includes:

[0086] The acquisition and storage module 1, which is used to obtain paired images and real sentence descriptions generated by humans, and store them in the training set;

[0087] The training module 2, which is used to train the image description retrieval model 10 of the SentEval tool using the paired data contained in the training set;

[0088] The construction module 3, which is used to build the image description generation model 4;

[0089] The split processing module 5 is used to obtain the images of the training set and divide them into multiple batches, and is also used to sequentially and circularly input the images contained in the multiple batches into the image description generation model 4;

[0090] The image description generation model 4 is used to obtain the images of the training set and generate sentence descriptions corresponding to th...
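
The behavior of the split processing module 5 in [0089] (divide the training images into batches, then feed the batches in sequence and cyclically) can be sketched as follows. The function names, the batch size, and the fixed step count are hypothetical; the source does not state how batches are sized or when cycling stops.

```python
from itertools import cycle, islice


def split_into_batches(items, batch_size):
    """Split a list into consecutive batches; the last batch may be smaller."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def cyclic_batches(items, batch_size, num_steps):
    """Return the first num_steps batches, wrapping around to the first
    batch once every batch has been used (assumed stopping rule)."""
    batches = split_into_batches(items, batch_size)
    return list(islice(cycle(batches), num_steps))
```

For example, `cyclic_batches(["a", "b", "c"], 2, 3)` yields the batch `["a", "b"]`, then `["c"]`, then `["a", "b"]` again.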

Abstract

The invention discloses an image description generation method and system based on unsupervised uniqueness optimization, and relates to the technical field of image description. The method comprises the following steps: S1, obtaining paired images and real sentence descriptions generated by humans, and storing them in a training set; S2, training an image description retrieval model using the paired data contained in the training set; S3, constructing an image description generation model; S4, acquiring the images of the training set and inputting them into the image description generation model, which generates a sentence description corresponding to each image, and storing each image and its generated sentence description as paired data in a comparison set; S5, estimating the similarity of the paired data in the comparison set using the image description retrieval model to obtain a loss and a gradient; and S6, adjusting the image description generation model according to the loss and the gradient, and then either returning to step S4 or outputting the final image description generation model. The method can generate high-quality sentence descriptions of images in an unsupervised manner, and the descriptions are diverse and unique.
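
Step S5 can be made concrete with a small sketch of one possible retrieval-based loss. The abstract does not give the loss formula, so the softmax cross-entropy over in-batch cosine similarities used here is an assumption, and `uniqueness_loss`, `image_vecs`, and `caption_vecs` are hypothetical names for embeddings that a retrieval model might produce. The sketch computes only the loss value, not the gradient used in S6.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def uniqueness_loss(image_vecs, caption_vecs):
    """Mean cross-entropy of retrieving image i from generated caption i.

    Assumed loss form: each caption is compared with every image in the
    batch, and the loss is low only when the caption is most similar to
    its own image, i.e. when it is distinctive for that image.
    """
    losses = []
    for i, cap in enumerate(caption_vecs):
        sims = [cosine(cap, img) for img in image_vecs]
        m = max(sims)
        # log-sum-exp for numerical stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims))
        losses.append(log_denom - sims[i])
    return sum(losses) / len(losses)
```

Under this assumed loss, a generic caption that matches every image equally well scores `log(batch size)`, while a caption that singles out its own image scores lower, which is one way to reward uniqueness without reference sentences.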

Description

Technical field

[0001] The invention relates to the technical field of image description, in particular to an image description generation method and system based on unsupervised uniqueness optimization.

Background

[0002] Image description is a task that requires models to acquire a multimodal understanding of the world and express this understanding in natural language text, making it relevant to various fields from human-computer interaction to data management. The practical goal is to automatically generate natural language descriptions related to images. Most of the latest neural network models are built on an encoder-decoder architecture, where a convolutional neural network (CNN) is used as an encoder for image features, which are fed to a recurrent neural network (RNN) that acts as a decoder and generates sentence descriptions. Decoders typically also include one or more attention layers to focus sentence descriptions on the most...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06K9/62, G06N3/04, G06N3/08, G06F16/532

CPC: G06N3/08, G06F16/532, G06N3/044, G06N3/045, G06F18/22, G06F18/214

Inventors: 吴烨, 李锐, 金长新

Owner: INSPUR ARTIFICIAL INTELLIGENCE RES INST CO LTD SHANDONG CHINA
