How to generate a scene graph from an input image

The iterative method enhances scene graph generation by leveraging multimodal models and external knowledge to produce a more detailed and accurate representation of image content, addressing the limitations of existing supervised methods.

JP2026105857A · Pending · Publication Date: 2026-06-26 · ROBERT BOSCH GMBH

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: ROBERT BOSCH GMBH
Filing Date: 2025-12-15
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing methods for generating scene graphs from images rely heavily on human-annotated data and require user input, limiting their applicability and accuracy, especially in unsupervised scenarios.

Method used

An iterative method that combines a multimodal base model with information extraction. Triplets are extracted from an initial text description of the image; targeted follow-up questions and external knowledge then supply further triplets, yielding a more comprehensive and accurate representation of the image content.
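To make the triplet idea concrete, here is a minimal sketch of how extracted triplets could be merged into a graph. The `Triplet` type, the `build_scene_graph` helper, and the example values are illustrative assumptions, not structures named in the patent.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    """One extracted fact: (source node, relation, target node)."""
    source: str
    relation: str
    target: str


def build_scene_graph(triplets):
    """Merge triplets into an adjacency map: node -> set of (relation, target)."""
    graph = defaultdict(set)
    for t in triplets:
        graph[t.source].add((t.relation, t.target))
        graph.setdefault(t.target, set())  # targets are nodes too, even without outgoing edges
    return dict(graph)


# Hypothetical triplets; the duplicate is collapsed because edges are stored in a set.
triplets = [
    Triplet("car", "parked on", "street"),
    Triplet("street", "has condition", "wet"),
    Triplet("car", "parked on", "street"),
]
scene_graph = build_scene_graph(triplets)
```

Storing edges in a set means that triplets repeated across several descriptions do not produce duplicate edges, which matters when the same fact is extracted in more than one iteration.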

Benefits of technology

Enables the creation of a detailed and robust scene graph through an unsupervised, dynamic approach, capturing broader contextual details like image quality, weather, and lighting, resulting in a richer and more informative output.

✦ Generated by Eureka AI based on patent content.

Abstract

The present invention relates to a computer-implemented method (1000) for generating a scene graph (41) from an image (11) by iteratively refining an initial text description (12).

[Solution] A machine learning system (1) generates an initial description (12) from the image (11) and an initial question (21). Triplets (23), each of the form (source node, relation, target node), are extracted from the description. Attributes of existing nodes, determined from a database (3), are then used iteratively to pose new questions (31) to the machine learning system (1), which yield further descriptions (12) and triplets (23). The final scene graph (41) is constructed from all extracted triplets (23).
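The loop described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions: `describe` is a stub with canned answers standing in for the machine learning system (1), `ATTRIBUTE_DB` stands in for the database (3), and the question template, the `source | relation | target` text format, and the `max_rounds` stopping rule are all hypothetical choices that the abstract does not specify.

```python
# Hypothetical stand-in for the database (3): node label -> attributes worth asking about.
ATTRIBUTE_DB = {"car": ["color"], "street": ["surface condition"]}

# Hypothetical canned answers standing in for the machine learning system (1).
CANNED_ANSWERS = {
    "What is shown in the image?": "car | parked on | street",
    "What is the color of the car?": "car | has color | red",
    "What is the surface condition of the street?": "street | has surface condition | wet",
}


def describe(image, question):
    """Stub for the ML system: returns a text description for an image and a question."""
    return CANNED_ANSWERS.get(question, "")


def extract_triplets(description):
    """Toy extractor: expects one 'source | relation | target' fact per line."""
    triplets = []
    for line in description.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triplets.append(tuple(parts))
    return triplets


def generate_scene_graph(image, initial_question, max_rounds=3):
    """Collect triplets from the initial description, then from follow-up questions."""
    triplets = set(extract_triplets(describe(image, initial_question)))
    asked = {initial_question}
    for _ in range(max_rounds):
        nodes = {n for src, rel, tgt in triplets for n in (src, tgt)}
        # Build follow-up questions from the attributes known for each existing node.
        questions = [
            f"What is the {attr} of the {node}?"
            for node in sorted(nodes)
            for attr in ATTRIBUTE_DB.get(node, [])
        ]
        new_questions = [q for q in questions if q not in asked]
        if not new_questions:
            break  # nothing left to ask, so the set of triplets is final
        for q in new_questions:
            asked.add(q)
            triplets.update(extract_triplets(describe(image, q)))
    return sorted(triplets)


# The image argument is unused by the stub, so None serves as a placeholder.
result = generate_scene_graph(None, "What is shown in the image?")
```

In a real system `describe` would call a multimodal model with the actual image, and the extractor would parse free-form text rather than a fixed delimiter format; the sketch only shows the control flow of asking, extracting, and asking again.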