Image processing method and apparatus

By transforming image features to the frequency domain and adaptively modulating them with time steps, a DiT model with a U-shaped transformer architecture addresses the poor image processing performance of existing technologies, achieving better image enhancement and higher efficiency.

WO2026066492A9 · PCT designated stage · Publication Date: 2026-06-25 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
HUAWEI TECH CO LTD
Filing Date
2025-07-04
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing diffusion models have poor output performance in image processing, especially in image super-resolution tasks, and the traditional DiT architecture lacks multi-scale feature extraction capabilities, resulting in unsatisfactory image enhancement effects.

Method used

Image features are converted to the frequency domain for modulation. The frequency components are adaptively modulated using time steps. The DiT model with a U-shaped transformer architecture is adopted. The conversion between the spatial and frequency domains is achieved through Fourier transform, and feature modulation is performed in the frequency domain.
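The frequency-domain modulation step described above can be illustrated with a minimal NumPy sketch. This is not the patent's implementation: the sinusoidal time-step embedding, the per-frequency gain, and the names `timestep_embedding`, `frequency_modulation`, and `w_gain` are assumptions introduced only for illustration. The sketch converts features to a spectrum with a Fourier transform, scales each frequency component by a gain derived from the time step, and converts the result back to the spatial domain.

```python
import numpy as np

def timestep_embedding(t, dim):
    """Sinusoidal embedding of a scalar time step (assumed; a common choice)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])  # shape (dim,)

def frequency_modulation(features, t, w_gain):
    """Modulate image features in the frequency domain, conditioned on t.

    features: real array of shape (C, H, W), the "first image features".
    t:        scalar diffusion time step.
    w_gain:   hypothetical learned weights of shape (H, W // 2 + 1, D) that
              map the time-step embedding to one gain per frequency bin.
    Returns a real array of shape (C, H, W), the "second image features".
    """
    _, height, width = features.shape
    # Spatial domain -> frequency domain (2-D real FFT over H and W).
    spectrum = np.fft.rfft2(features, axes=(-2, -1))
    # Time-step-dependent gain for every frequency component.
    emb = timestep_embedding(t, w_gain.shape[-1])
    gain = 1.0 + np.tensordot(w_gain, emb, axes=([2], [0]))  # (H, W//2+1)
    modulated = spectrum * gain[None, :, :]
    # Frequency domain -> spatial domain.
    return np.fft.irfft2(modulated, s=(height, width), axes=(-2, -1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 8, 8))
    w = 0.1 * rng.standard_normal((8, 5, 16))
    y = frequency_modulation(x, t=10, w_gain=w)
    print(y.shape)
```

With `w_gain` set to zeros, every gain equals 1 and the function reduces to a Fourier round trip that returns the input features. In a real model these weights would be learned, and the block would sit inside the transformer modules of the U-shaped network.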

Benefits of technology

It improves the image processing performance, especially in image super-resolution, dehazing, and deblurring tasks, achieving better image enhancement results and higher efficiency.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN2025107096_25062026_PF_FP_ABST
Patent Text Reader

Abstract

Embodiments of the present application provide an image processing method and apparatus in the field of computer vision, which convert image features into the frequency domain for modulation, so that different frequency components are adaptively modulated and an image with a better enhancement effect is obtained. The method comprises: acquiring an input image, which may be an image read from a storage space, an image collected online by an electronic device, etc.; and inputting the input image into a pre-trained image processing model to obtain an output image. The image processing model may be a diffusion model implemented on the basis of a transformer architecture and comprises a plurality of modules. The modules convert input first image features into a spectrogram, modulate the spectrogram by using time steps to obtain a modulated spectrogram, and convert the modulated spectrogram into second image features, the second image features being used for obtaining the output image.