
Neural framework search method for general multi-modal learning

A search method in multi-modal technology, applied to neural learning methods, neural architectures, biological neural network models, etc., which addresses problems such as complex and varied image content.

Pending Publication Date: 2021-03-12
HANGZHOU DIANZI UNIV
Cites: 0 · Cited by: 0

AI Technical Summary

Problems solved by technology

At the same time, image carriers in natural scenes span diverse themes with complex and varied content, and object bounding boxes may be highly similar and redundant, which places heavy demands on the required architecture and poses a huge challenge to the architecture search method.

Method used


Image

  • Neural framework search method for general multi-modal learning

Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0111] The detailed parameters of the present invention are further described below.

[0112] As shown in Figures 1 and 2, the present invention provides MMNasNet, a neural framework search method oriented toward general multimodal learning. The present invention first performs data preprocessing on image and text data to extract features. Then a redundant encoder-decoder network and the corresponding architecture parameters are initialized. Next, substructures are sampled from the distribution defined by the architecture parameters, and their results are calculated. Then model search with warm start and alternating updates ensures the stability of the searched structure. Finally, model training is carried out: the optimal sub-network is retrained to obtain the optimal network model. The present invention proposes a neural framework search method for multi-modal modeling of image and text; in particular, a better sub-network is searched for each task, reducing the calculation am...
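The step of sampling substructures from the distribution defined by the architecture parameters can be sketched as follows. This is a minimal illustration under assumed details, not the patent's implementation: the search space (four layers, three candidate operations per layer) and the softmax sampling rule are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical search space: each of 4 decoder layers chooses one of 3
# candidate operations (e.g. self-attention, guided attention, FFN).
NUM_LAYERS, NUM_OPS = 4, 3

# Learnable architecture parameters (logits), one row per layer; the
# softmax of each row is that layer's distribution over operations.
alpha = rng.normal(size=(NUM_LAYERS, NUM_OPS))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sample_subnet(alpha):
    """Draw one operation index per layer from softmax(alpha)."""
    return [int(rng.choice(NUM_OPS, p=row)) for row in softmax(alpha)]

subnet = sample_subnet(alpha)  # one operation id per layer
```

In a real search loop, each sampled sub-network would be evaluated on a validation batch and the resulting loss used to update `alpha`, so that better-performing operations become more likely to be drawn.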


PUM

No PUM data available.

Abstract

The invention discloses a neural framework search method for general multi-modal learning. The method comprises the following steps: 1, performing data preprocessing on image and text data and extracting features; 2, initializing a redundant encoder-decoder network and the corresponding architecture parameters; 3, sampling a substructure from the architecture-parameter distribution and calculating a result; 4, ensuring the stability of the searched structure through model search, warm start, and alternating updates; 5, model training: retraining the searched optimal sub-network to obtain an optimal network model. The invention provides a neural framework search method for image-text multi-modal modeling. A better sub-network is searched for each task, so that the parameter and calculation amounts of the network are reduced, the deep features of each modality are fully utilized, the expressive capability of the expanded modality features is improved, and leading results are obtained on the three multi-modal tasks.
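The warm start and alternating update of step 4 can be illustrated with a toy loop: weights are trained alone during a warm-start phase, after which weight updates and architecture-parameter updates alternate. The quadratic objectives, learning rate, and epoch counts below are hypothetical stand-ins for the actual training and validation losses, chosen only to make the schedule concrete.

```python
# Toy stand-ins: w plays the role of the network weights (trained on the
# training loss), a plays the role of an architecture parameter
# (updated on the validation loss only after the warm-start phase).
def d_train_dw(w, a):  # gradient of (w - a)^2 with respect to w
    return 2.0 * (w - a)

def d_val_da(w, a):    # gradient of (a - 1)^2 + 0.1*(w - a)^2 w.r.t. a
    return 2.0 * (a - 1.0) + 0.2 * (a - w)

w, a, lr = 0.0, 0.0, 0.1
WARM_EPOCHS, SEARCH_EPOCHS = 20, 50

for epoch in range(WARM_EPOCHS + SEARCH_EPOCHS):
    w -= lr * d_train_dw(w, a)    # weight step, every epoch
    if epoch >= WARM_EPOCHS:      # warm start: architecture frozen early
        a -= lr * d_val_da(w, a)  # architecture step, alternating

# After the loop, both w and a have converged near 1.0.
```

Freezing the architecture parameters during the warm start lets the weights reach a reasonable operating point first, so that the subsequent architecture updates are driven by meaningful validation signals rather than by an untrained network.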

Description

technical field [0001] The invention proposes MMNasNet, a neural framework search method oriented toward general multimodal learning. Background technique [0002] Visual Question Answering is an emerging task in the multimodal domain which aims to answer a given question about a provided image. Specifically, the model takes an image and a question as input and outputs the answer to that question. For example, suppose the image shows a street with houses of various colors, different types of parked cars, and many pedestrians walking along it. Given a specific question such as "What color is the car to the left of the pedestrian in black walking on the zebra crossing?", the model needs to recognize the objects in the image and their different properties and then reason to obtain the answer. The Visual Grounding task aims to find the object in the image that corresponds to a provided object description. For example,...

Claims


Application Information

Patent Timeline
No application data available.
IPC(8): G06N3/04; G06K9/62; G06N3/08; G06T3/40; G06T9/00
CPC: G06N3/08; G06T3/4084; G06T9/002; G06N3/045; G06F18/214
Inventor: 余宙 (Zhou Yu), 俞俊 (Jun Yu), 崔雨豪 (Yuhao Cui)
Owner HANGZHOU DIANZI UNIV