
Visual language task processing system, training method and device, equipment and medium

A visual language task technology, applied in the field of visual language task processing systems, that addresses the low accuracy of pre-trained models on visual language tasks and improves the accuracy with which those tasks are processed.

Pending Publication Date: 2021-12-14
BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD +1
0 Cites, 13 Cited by

AI Technical Summary

Problems solved by technology

In the related art, a unified VL encoder-decoder model composed of a shared multi-layer Transformer network is pre-trained so that it can handle both visual language understanding tasks and visual language generation tasks. A model pre-trained in this way handles VL tasks with low accuracy.



Embodiment Construction

[0043] Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus repeated descriptions thereof will be omitted.

[0044] Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided in order to give a thorough understanding of embodiments of the present disclosure. However, those skilled in...


Abstract

The invention provides a visual language task processing system, a visual language task processing method and device, equipment and a storage medium, and relates to the technical field of artificial intelligence. The system comprises a target encoder, a text encoder and a text decoder, with the target encoder and the text encoder each connected to the text decoder. The target encoder receives a predetermined image, encodes it to obtain a target representation sequence, and outputs that sequence. The text encoder receives a text description, encodes it to obtain a word representation sequence, and outputs that sequence. The text decoder receives the target representation sequence and the word representation sequence, decodes them to obtain a multi-modal representation sequence, and outputs that sequence, which is used for processing the visual language task. The system can improve the accuracy of processing the visual language task to a certain extent.
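The data flow in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the internals of any component, so everything below is an assumption made for illustration: the function names, the toy dimension `DIM`, the character-code embedding, and the use of a single scaled dot-product attention step inside the decoder are all hypothetical stand-ins, not the patented implementation.

```python
import math

DIM = 4  # toy representation size (assumption, not from the patent)


def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def target_encoder(region_features):
    """Toy stand-in: image region features -> target representation sequence."""
    return [normalize(f) for f in region_features]


def embed_token(token):
    """Deterministic toy embedding built from character codes (hypothetical)."""
    vec = [0.0] * DIM
    for i, ch in enumerate(token):
        vec[i % DIM] += ord(ch) / 128.0
    return normalize(vec)


def text_encoder(tokens):
    """Toy stand-in: text description -> word representation sequence."""
    return [embed_token(t) for t in tokens]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def text_decoder(target_seq, word_seq):
    """Toy stand-in: fuse both sequences into a multi-modal representation
    sequence by letting each word vector attend over the target sequence."""
    multimodal = []
    for word in word_seq:
        weights = attention_weights(word, target_seq)
        context = [
            sum(w * target[d] for w, target in zip(weights, target_seq))
            for d in range(DIM)
        ]
        # Residual-style sum of the word vector and its visual context.
        multimodal.append([a + c for a, c in zip(word, context)])
    return multimodal


# Hypothetical inputs: three image regions and a three-word description.
regions = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 3.0, 3.0]]
tokens = ["a", "dog", "runs"]

target_seq = target_encoder(regions)
word_seq = text_encoder(tokens)
multimodal_seq = text_decoder(target_seq, word_seq)
print(len(multimodal_seq), len(multimodal_seq[0]))
```

The sketch keeps the two encoders independent and joins them only in the decoder, mirroring the connection pattern the abstract describes. A real system would presumably use learned networks in place of each toy function.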

Description

Technical field

[0001] The present disclosure relates to the technical field of artificial intelligence, and in particular to a visual language task processing system and to a training method, device, equipment and readable storage medium for the visual language task system.

Background

[0002] Vision and language are two basic abilities of artificial intelligence, and the interaction between them supports a range of abilities that simulate how the human brain processes information, such as visual language (Vision-Language, VL) understanding (e.g. visual question answering) and VL generation (e.g. image description). VL technology has good application prospects in robot vision and in helping the visually impaired.

[0003] Inspired by the development of natural language pre-training technology, pre-training VL models to improve their performance on VL tasks has become a development trend. Pre-training the VL model can transfer the multimodal knowle...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F16/35, G06F16/75, G06F16/78, G06K9/62
CPC: G06F16/35, G06F16/3344, G06F16/7867, G06F16/75, G06F18/241
Inventor: 潘滢炜, 李业豪, 姚霆, 梅涛
Owner: BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD