
Method and device of training a captioning model, computer equipment and storage medium

A captioning-model technology, applied in the field of training captioning models, that solves problems such as low training quality, high training difficulty, and heavy memory and data consumption, and achieves the effects of simplifying the training process, improving training quality, and saving memory and data.

Pending Publication Date: 2020-10-30
TENCENT AMERICA LLC

AI Technical Summary

Problems solved by technology

[0006] Embodiments of the present application provide a method and device for training captioning models, computer equipment, and storage media, aiming to solve the problems that existing captioning-model training methods consume both memory and data, that training is difficult, and that training quality is not high.


Embodiment Construction

[0017] Currently, great progress has been made in image and video captioning, much of it driven by advances in machine translation. For example, the encoder-decoder framework and the attention mechanism were first introduced in machine translation and later extended to captioning. Both image captioning and video captioning methods follow this pipeline and apply an attention mechanism during caption generation. Compared with image captioning, video captioning describes dynamic scenes rather than static ones.
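The attention mechanism mentioned above can be sketched in a few lines. This is a minimal dot-product attention over per-frame video features, not the patent's specific formulation; the names (`attend`, `encoder_feats`, `decoder_state`) are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(decoder_state, encoder_feats):
    """Dot-product attention: score each frame's feature vector
    against the current decoder state, normalize the scores into
    weights, and return the weighted sum as the context vector."""
    scores = encoder_feats @ decoder_state   # (T,) one score per frame
    weights = softmax(scores)                # (T,) sums to 1
    context = weights @ encoder_feats        # (D,) attended feature
    return context, weights

# toy example: 4 video frames with 3-dim features
feats = np.array([[1., 0., 0.],
                  [0., 1., 0.],
                  [0., 0., 1.],
                  [1., 1., 0.]])
state = np.array([1., 0., 0.])
context, weights = attend(state, feats)
```

At each decoding step the caption generator would recompute `weights`, letting it focus on different frames for different words.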

[0018] As can be seen in Figure 1, video captioning is much more difficult due to larger appearance variations. Some related techniques propose boundary-aware long short-term memory (LSTM) units to automatically detect temporal video segments. Some related techniques integrate natural language knowledge into their networks by training linguistic LSTM models on large external text datasets. Some related techniques extend the Gated Recur...



Abstract

A method of training a captioning model used to perform automatic video captioning of an input video, including initializing a plurality of long short-term memory (LSTM) units included in the captioning model using cross-entropy loss; training the LSTM units using reinforcement learning; training the LSTM units and a plurality of convolutional neural networks (CNNs) included in the captioning model using multitask training; and generating a video caption corresponding to the input video using the captioning model.
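The abstract describes three training stages: cross-entropy initialization, reinforcement learning, and multitask training. A minimal sketch of the losses behind the first two stages, assuming standard cross-entropy and REINFORCE formulations; the function names are illustrative, not from the patent:

```python
import numpy as np

def cross_entropy(probs, target_idx):
    """Stage 1: maximum-likelihood (cross-entropy) loss on the
    ground-truth token, used to initialize the LSTM decoder."""
    return -np.log(probs[target_idx])

def reinforce_loss(log_prob_sampled, reward, baseline):
    """Stage 2: policy-gradient surrogate loss; the advantage
    (reward - baseline) scales the sampled caption's log-probability,
    so captions scoring above the baseline are reinforced."""
    return -(reward - baseline) * log_prob_sampled

def training_schedule():
    """Stage order described in the abstract."""
    return ["initialize LSTMs with cross-entropy",
            "train LSTMs with reinforcement learning",
            "train LSTMs and CNNs with multitask training"]

# toy next-token distribution over a 3-word vocabulary
probs = np.array([0.1, 0.7, 0.2])
ce = cross_entropy(probs, 1)   # loss on ground-truth token index 1
rl = reinforce_loss(np.log(0.7), reward=0.8, baseline=0.5)
```

In stage 2 the reward is typically a caption-quality metric of the sampled caption, with a baseline (for example, the greedy caption's score) subtracted to reduce gradient variance; stage 3 then fine-tunes the CNN feature extractors jointly with the decoder.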

Description

[0001] Priority Information

[0002] This application claims priority to U.S. Application No. 16/396,924, entitled "End-to-End Video Captioning with Multi-Task Reinforcement Learning," filed April 29, 2019, the entire contents of which are incorporated by reference in this application.

Technical Field

[0003] This application relates to video captioning technology. Specifically, the present application relates to a method and device for training a captioning model, a computer device, and a storage medium.

Background

[0004] Video captions are crucial for many downstream applications such as video retrieval, indexing, and browsing. Existing video captioning methods are trained component by component, and the quality of the overall system is limited by the performance of each individual component.

[0005] End-to-end (E2E) training in related techniques is often hampered by hardware constraints (e.g., graphics processing unit (GPU) memory) and is prone to overfit...


Application Information

IPC(8): H04N21/488; H04N21/81; H04N5/278; G06N3/04; G06N3/08
CPC: H04N21/4884; H04N21/8133; H04N5/278; G06N3/08; G06N3/044; G06N3/045; G06N3/006; G06N3/088; G06V20/41; G06V10/82; G06V10/764; G06F18/2413; G06N20/00; G06V20/47; G06F18/217
Inventor 宫博庆
Owner TENCENT AMERICA LLC