Modulated video token compression via causal compression module with positional information injection
WO2026124754A1 · PCT designated stage · Publication Date: 2026-06-18 · HUAWEI TECH CO LTD +1
Patent Information
- Authority / Receiving Office: WO · WO
- Patent Type: Applications
- Current Assignee / Owner: HUAWEI TECH CO LTD
- Filing Date: 2024-12-11
- Publication Date: 2026-06-18

Figure EP2024085629_18062026_PF_FP_ABST
Abstract
Described is a computer apparatus (900) configured to: obtain a plurality of latent frames (103) of a video; and compress (104) a plurality of tokens of the latent frames (103) to generate a plurality of compressed tokens (105) for input into a vision language model (VLM) (109); wherein the compression (104) comprises combining tokens along a temporal dimension and along a spatial dimension. In this way, the compression (104) may be greater than if tokens were combined along a single dimension only, which may reduce the workload of the VLM (109).
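As an illustration only, the following minimal sketch shows one way tokens could be combined along both a temporal and a spatial dimension. The abstract does not say how tokens are combined, so the block averaging used here is an assumption, not the claimed method, and it does not model the causal compression module or positional information injection named in the title. The names `compress_tokens`, `mean_tokens`, `count_tokens`, `kt` and `ks` are hypothetical.

```python
from typing import List

Token = List[float]        # one token embedding
Frame = List[List[Token]]  # H x W grid of tokens for one latent frame
Video = List[Frame]        # T latent frames


def mean_tokens(tokens: List[Token]) -> Token:
    """Element-wise mean of a non-empty list of equal-length tokens."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def compress_tokens(video: Video, kt: int, ks: int) -> Video:
    """Merge each kt x ks x ks block of tokens into a single token.

    Assumption: tokens are combined by averaging. T must be divisible
    by kt, and H and W by ks (a simplification for this sketch).
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % kt or H % ks or W % ks:
        raise ValueError("T must be divisible by kt, and H and W by ks")
    out: Video = []
    for t0 in range(0, T, kt):            # temporal windows
        frame: Frame = []
        for h0 in range(0, H, ks):        # spatial windows (rows)
            row: List[Token] = []
            for w0 in range(0, W, ks):    # spatial windows (columns)
                block = [
                    video[t][h][w]
                    for t in range(t0, t0 + kt)
                    for h in range(h0, h0 + ks)
                    for w in range(w0, w0 + ks)
                ]
                row.append(mean_tokens(block))
            frame.append(row)
        out.append(frame)
    return out


def count_tokens(video: Video) -> int:
    """Total number of tokens across all frames."""
    return sum(len(row) for frame in video for row in frame)


if __name__ == "__main__":
    # 4 latent frames, each a 4 x 4 grid of 2-dimensional tokens.
    video = [[[[float(t), float(h * 4 + w)] for w in range(4)]
              for h in range(4)] for t in range(4)]
    compressed = compress_tokens(video, kt=2, ks=2)
    print(count_tokens(video), "->", count_tokens(compressed))  # 64 -> 8
```

With `kt=2` and `ks=2` the token count falls by a factor of kt × ks × ks = 8. Merging only temporally would give a factor of 2, and merging only spatially a factor of 4, which matches the abstract's point that combining along both dimensions may compress more than using a single dimension.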