Modulated video token compression via causal compression module with positional information injection

WO2026124754A1 · PCT designated stage · Publication Date: 2026-06-18 · HUAWEI TECH CO LTD +1

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
HUAWEI TECH CO LTD
Filing Date
2024-12-11
Publication Date
2026-06-18

Figures

  • Figure EP2024085629_18062026_PF_FP_ABST

Abstract

Described is a computer apparatus (900) configured to: obtain a plurality of latent frames (103) of a video; and compress (104) a plurality of tokens of the latent frames (103) to generate a plurality of compressed tokens (105) for input into a vision language model (VLM) (109), wherein the compression (104) comprises combining tokens along both a temporal dimension and a spatial dimension. In this way, the compression (104) may reduce the token count by more than combining along a single dimension would, which may reduce the number of tokens the VLM (109) must process and hence its workload.
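The abstract does not say how tokens are combined. The following is a minimal sketch under stated assumptions. It assumes the latent tokens form a T × H × W grid of embedding vectors. It also uses plain averaging over non-overlapping blocks as a stand-in for the compression module (104). The causal compression module and positional information injection named in the title are not modelled here. The names `compress_spatiotemporal`, `kt` (temporal merge factor) and `ks` (spatial merge factor) are hypothetical. The sketch only illustrates why merging along both dimensions reduces the token count more than merging along either one alone.

```python
from typing import List

# Hypothetical layout (assumption, not from the patent):
# latent[t][h][w] is one token embedding, stored as a list of floats.
Token = List[float]
Latent = List[List[List[Token]]]


def mean_tokens(tokens: List[Token]) -> Token:
    """Element-wise mean of equally sized token vectors."""
    n = len(tokens)
    dim = len(tokens[0])
    return [sum(tok[i] for tok in tokens) / n for i in range(dim)]


def compress_spatiotemporal(latent: Latent, kt: int, ks: int) -> Latent:
    """Merge each kt x ks x ks block of tokens into one token by averaging.

    Averaging is an illustrative stand-in for the patent's compression (104).
    Requires T, H and W to be divisible by kt, ks and ks respectively.
    """
    T, H, W = len(latent), len(latent[0]), len(latent[0][0])
    if T % kt or H % ks or W % ks:
        raise ValueError("latent shape must be divisible by the merge factors")
    out: Latent = []
    for t0 in range(0, T, kt):  # temporal dimension: groups of kt frames
        frame = []
        for h0 in range(0, H, ks):  # spatial dimension: ks x ks patches
            row = []
            for w0 in range(0, W, ks):
                block = [
                    latent[t][h][w]
                    for t in range(t0, t0 + kt)
                    for h in range(h0, h0 + ks)
                    for w in range(w0, w0 + ks)
                ]
                row.append(mean_tokens(block))
            frame.append(row)
        out.append(frame)
    return out


def count_tokens(latent: Latent) -> int:
    """Number of tokens that would be passed to the VLM."""
    return len(latent) * len(latent[0]) * len(latent[0][0])


if __name__ == "__main__":
    T, H, W, D = 8, 16, 16, 4
    # Toy latent: every token in frame t holds the value t.
    latent = [[[[float(t)] * D for _ in range(W)] for _ in range(H)] for t in range(T)]
    print(count_tokens(latent))  # 2048
    print(count_tokens(compress_spatiotemporal(latent, kt=1, ks=2)))  # spatial only: 512
    print(count_tokens(compress_spatiotemporal(latent, kt=2, ks=1)))  # temporal only: 1024
    print(count_tokens(compress_spatiotemporal(latent, kt=2, ks=2)))  # both: 256
```

With these toy sizes, merging only spatially (2 × 2) gives a 4× reduction and merging only temporally (2 frames) gives 2×. Combining both gives 8×, because the factors multiply (k_t · k_s²). This is the effect the abstract attributes to combining along two dimensions.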