Image text description method based on knowledge transfer multi-modal recurrent neural network
The method uses a multimodal recurrent neural network and is applied in character and pattern recognition, instruments, computer parts, and related fields. It addresses the problems of limited data sets, high data set cost, and irrelevant information, and produces descriptions with appropriate semantics, accurate grammatical structure, and good readability.
Examples
Embodiment 1
[0048] As shown in Figure 1, an image text description method based on a knowledge transfer multimodal recurrent neural network includes the following steps:
[0049] S1: Train an image semantic classifier in the server;
[0050] S2: Train the language model in the server;
[0051] S3: Pre-train the text description generation model in the server and generate description sentences.
[0052] The specific process of step S1 is as follows:
[0053] S11: Collect multiple image datasets: download ready-made datasets, including ImageNet and MSCOCO. Since MSCOCO is a dataset of matched pairs of images and text descriptions, take only the image part;
[0054] S12: Use the convolutional neural network to extract the corresponding image feature f_I for each picture in the collected data set;
[0055] S13: Make a label set: select the 1000 most common words, which cover 90% of the words used in the training set of matched image and text description pairs, and add objects that do not appe...
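The word-selection part of step S13 can be illustrated with a minimal sketch. It is not the patent's implementation: the function name `build_label_set`, the lowercase whitespace tokenization, and the toy captions are assumptions made for illustration, and the additional objects mentioned at the end of S13 are not covered because that part of the text is truncated.

```python
from collections import Counter


def build_label_set(captions, vocab_size=1000):
    """Pick the most frequent caption words and report their token coverage.

    Hypothetical helper: tokenization is a plain lowercase whitespace split,
    which is an assumption, not something the source specifies.
    """
    # Count every word occurrence across all training captions.
    counts = Counter(
        word for caption in captions for word in caption.lower().split()
    )
    total = sum(counts.values())

    # Keep the `vocab_size` most frequent words as candidate labels.
    most_common = counts.most_common(vocab_size)
    labels = [word for word, _ in most_common]

    # Fraction of all caption tokens covered by the selected labels.
    covered = sum(n for _, n in most_common)
    coverage = covered / total if total else 0.0
    return labels, coverage


# Toy captions standing in for the text half of an image-caption training set.
captions = [
    "a dog runs on the grass",
    "a cat sits on the mat",
    "a dog sits on the grass",
]
labels, coverage = build_label_set(captions, vocab_size=3)
print(labels, coverage)
```

On a real training set, `vocab_size=1000` would correspond to the label-set size given in S13, and the returned coverage could be compared against the 90% figure stated there.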