Data processing method and related apparatus

By updating the configuration information of the diffusion model and introducing discriminator training, the method addresses the insufficient denoising capability of diffusion models in image super-resolution tasks, achieving more efficient image denoising and better model adaptability.

WO2026137834A1 · PCT designated stage · Publication Date: 2026-07-02 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
HUAWEI TECH CO LTD
Filing Date
2025-07-29
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

Existing diffusion models are insufficient in denoising capabilities for image super-resolution tasks, and struggle to effectively handle different types of noise and image quality degradation problems.

Method used

Noise information is acquired from the image, and the configuration information of the diffusion model is updated using the first correction matrix, so that the denoising strategy is adjusted dynamically. An approximate network is constructed using the Low-Rank Adaptation (LoRA) principle to enhance the flexibility and adaptability of the model. A discriminator is introduced for adversarial training to improve the denoising capability.

Benefits of technology

It improves the denoising capability of the diffusion model, enhances the model's flexibility and adaptability, and can better handle different types of noise and image quality degradation problems, while reducing inference time.

Abstract

Embodiments of the present application disclose a data processing method. The method comprises: acquiring a first image; obtaining a first correction matrix by means of a first model, wherein the first correction matrix is used for indicating noise of the first image; updating configuration information of a second model on the basis of the first correction matrix, wherein the configuration information is used for indicating a denoising strategy when the second model performs denoising processing; and performing denoising processing on the first image by means of the updated second model to obtain a second image. In the embodiments of the present application, the denoising strategy of the second model is adjusted by means of noise information of the input image, so that the second model samples noise more accurately during the denoising process, thereby improving the denoising capability of the model.

Description

A data processing method and related apparatus

[0001] This application claims priority to Chinese Patent Application No. CN202411981871.6, filed on December 27, 2024, entitled "A Data Processing Method and Related Apparatus", the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This application relates to the field of artificial intelligence (AI), and more particularly to a data processing method and related apparatus.

Background Technology

[0003] Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. In other words, AI is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, enabling them to possess the functions of perception, reasoning, and decision-making.

[0004] Super-resolution (SR) image reconstruction aims to reconstruct and restore high-resolution images from low-resolution (LR) images. This process requires meticulous detail restoration to fill in and reconstruct lost information. Currently, using diffusion models (DMs) to denoise low-quality images to obtain high-quality images has become a common solution in SR tasks. However, how to further improve the denoising capability of DMs remains a key problem that urgently needs to be solved.

Summary of the Invention

[0005] In a first aspect, embodiments of this application provide a data processing method, the method comprising: acquiring a first image; obtaining a first correction matrix through a first model, the first correction matrix being used to indicate noise in the first image; updating configuration information of a second model based on the first correction matrix, the configuration information being used to indicate a denoising strategy when the second model performs denoising processing; and performing denoising processing on the first image through the updated second model to obtain a second image.

[0006] The first image and the second image are a low-resolution image and a high-resolution image, respectively (or, in other words, a low-quality image and a high-quality image). The second model is a super-resolution generation model used to denoise the first image. The first correction matrix is used to update the configuration information of the second model before inference. The configuration information of the second model includes its configuration parameters, such as model behavior parameters (e.g., threshold ranges for various functions, decision rules, etc.), post-processing parameters (e.g., image enhancement, format conversion, etc.), and hardware resource parameters (e.g., GPU memory configuration, parallel processing, etc.). These parameters affect the denoising strategy adopted by the second model during inference, such as the granularity and intensity of denoising.

[0007] In this embodiment, by utilizing the noise information of the input image to dynamically adjust the configuration information of the denoising model, accurate sampling and removal of noise are achieved. This not only improves the denoising capability of the model but also enhances its flexibility and adaptability, enabling it to handle different types of noise and image quality degradation problems.

[0008] The above steps can be the model inference process or the forward propagation process of model training. Furthermore, a loss function can be constructed based on the generated second image and the corresponding high-quality image to update the second model.
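The four steps of the first aspect can be sketched as a small pipeline. This is only an illustration under strong simplifying assumptions, not the implementation described in this application: the image is a flat list of floats, the "correction" is reduced to a single scalar, and `SecondModel`, `estimate_correction`, and `process` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import List

Image = List[float]


@dataclass
class SecondModel:
    # Hypothetical stand-in: one scalar "strength" plays the role of the
    # configuration information that controls the denoising strategy.
    strength: float = 0.5

    def denoise(self, image: Image) -> Image:
        mean = sum(image) / len(image)
        # Pull every pixel toward the mean; a larger strength smooths more.
        return [p + self.strength * (mean - p) for p in image]


def estimate_correction(image: Image) -> float:
    """Stand-in for the first model: a noise indicator clipped to [0, 1]."""
    mean = sum(image) / len(image)
    variance = sum((p - mean) ** 2 for p in image) / len(image)
    return min(1.0, variance)


def process(image: Image, model: SecondModel) -> Image:
    correction = estimate_correction(image)        # obtain noise information
    updated = replace(model, strength=correction)  # update the configuration
    return updated.denoise(image)                  # denoise with updated model


noisy = [0.0, 2.0, 0.0, 2.0]
print(process(noisy, SecondModel()))  # → [1.0, 1.0, 1.0, 1.0]
```

In a training setting, the output of `process` would additionally be compared with the corresponding high-quality image to form a loss; that part is omitted here.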

[0009] In one possible implementation, the second model includes a first sub-network and a second sub-network, the first sub-network being used to indicate configuration information, the second sub-network being used for denoising, and the similarity between the network parameters of the first sub-network and the network parameters of the second sub-network being less than a first threshold.

[0010] Updating the configuration information of the second model based on the first correction matrix includes:

[0011] updating the first sub-network based on the first correction matrix to obtain the updated first sub-network.

[0012] In this application, the first sub-network is an approximate network constructed from the second sub-network using the Low-Rank Adaptation (LoRA) principle. The first sub-network includes matrix A and matrix B. The network parameters of the second sub-network are W, where W is approximated by AB, and the first correction matrix is C. When the first correction matrix C is multiplied with either matrix A or matrix B, the short side r is eliminated in the product, so the parameters of the first sub-network can be updated without affecting the overall network structure.

[0013] In one possible implementation, after updating the first sub-network based on the first correction matrix to obtain the updated first sub-network, the method further includes:

[0014] The updated network parameters of the first sub-network are merged with those of the second sub-network to obtain the updated second model.

[0015] Specifically, after the parameters in the first sub-network (matrix A and matrix B) are updated through the first correction matrix C (matrix multiplication), the updated network parameters of the first sub-network are merged with the network parameters of the second sub-network (matrix addition) to obtain the updated network parameters of the second model.
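The update-then-merge step can be illustrated with plain matrix arithmetic. The sketch below assumes shapes that the text does not state explicitly: A is d×r, B is r×k, the correction matrix C is r×r, and the merge is W + (A·C)·B. The helper names `matmul`, `add`, and `merge` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product of x (m x n) and y (n x p)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def add(x: Matrix, y: Matrix) -> Matrix:
    """Element-wise sum of two matrices with the same shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def merge(w: Matrix, a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Return W + (A @ C) @ B; the result has the same shape as W."""
    updated_a = matmul(a, c)      # (d x r) @ (r x r) -> d x r
    delta = matmul(updated_a, b)  # (d x r) @ (r x k) -> d x k, r is contracted
    return add(w, delta)


# Toy sizes: d = k = 2, r = 1.
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0], [2.0]]
b = [[3.0, 4.0]]
c = [[0.5]]
print(merge(w, a, b, c))  # → [[2.5, 2.0], [3.0, 5.0]]
```

Because the rank dimension r disappears in the product, the merged matrix has exactly the shape of W, which is why the update leaves the network structure unchanged.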

[0016] In one possible implementation, obtaining the first correction matrix through the first model includes:

[0017] performing degradation processing on the first image to obtain a first degradation vector, which is a vector representation of the noise in the first image; and

[0018] The first degradation vector and the target information of the second model are input into the first model to obtain the first correction matrix. The target information is used to indicate the functional modules of the second model and the network layers corresponding to the functional modules.

[0019] In one possible implementation, the second model includes N functional modules, each of which includes at least one network layer, and the first correction matrix includes M correction matrices, each of which corresponds to a network layer in one of the functional modules, where N is a positive integer greater than 1 and M is a positive integer greater than 1.

[0020] In this application, the first model is a multilayer perceptron (MLP), which includes multiple fully connected networks for learning complex mapping relationships between inputs and outputs. After acquiring the first image, a degradation perception network is used to obtain degradation information of the first image. This degradation information is represented as a degradation vector, indicating the noise and blur level of the first image. This degradation vector and the identifier indicating the functional module of the second model are input into the MLP to obtain the correction matrix corresponding to the network layer in each functional module.

[0021] In one possible implementation, the target information includes the identifier of each of the N functional modules and the type of the network layer corresponding to the functional module.

[0022] Specifically, the encoding network of the diffusion generative model consists of different functional modules (Blocks), each performing a different function. These Blocks work together to encode data and extract features. Each Block corresponds to a unique identifier (Block ID), which is generally assigned during model definition using some method (such as automatic numbering or manual naming). During model training, Block IDs can be used for debugging, visualization, parameter management, and other purposes.

[0023] Each block integrates multiple types of network layers, such as convolutional layers, fully connected layers, and attention layers, which can be combined into the same block to achieve different functions.

[0024] Each functional module is identified by its Block ID, and the kind of network layer is specified by its type. The Block ID and the type are converted into an embedding vector, which is then combined with the first degradation vector. An MLP then generates a separate correction matrix for each network layer of each functional module. This allows the updated second model to better adapt to super-resolution tasks that handle degradation information.
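The mapping from (degradation vector, Block ID, layer type) to one correction matrix per network layer might look like the following sketch. Everything concrete here is an assumption for illustration: the rank `R`, the embedding width `EMB`, the hidden size, the random untrained weights, the concatenation of embeddings, and the class name `CorrectionMLP` do not come from this application.

```python
import random
from typing import Dict, List, Tuple

R = 2        # assumed rank, so each correction matrix is R x R
EMB = 3      # assumed embedding width
HIDDEN = 8   # assumed hidden-layer width
LAYER_TYPES = ["conv", "linear", "attention"]

rng = random.Random(0)  # fixed seed keeps the sketch reproducible


def rand_matrix(rows: int, cols: int) -> List[List[float]]:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linear(x: List[float], w: List[List[float]]) -> List[float]:
    """Compute x @ w for a vector x (len n) and a matrix w (n x m)."""
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]


class CorrectionMLP:
    """Hypothetical first model: two fully connected layers with a ReLU."""

    def __init__(self, num_blocks: int, deg_dim: int) -> None:
        self.block_emb = rand_matrix(num_blocks, EMB)
        self.type_emb = rand_matrix(len(LAYER_TYPES), EMB)
        self.w1 = rand_matrix(deg_dim + 2 * EMB, HIDDEN)
        self.w2 = rand_matrix(HIDDEN, R * R)

    def __call__(self, degradation: List[float], block_id: int,
                 layer_type: str) -> List[List[float]]:
        # Combine the degradation vector with the two embeddings.
        x = (degradation + self.block_emb[block_id]
             + self.type_emb[LAYER_TYPES.index(layer_type)])
        hidden = [max(0.0, h) for h in linear(x, self.w1)]
        flat = linear(hidden, self.w2)
        return [flat[i * R:(i + 1) * R] for i in range(R)]


mlp = CorrectionMLP(num_blocks=2, deg_dim=4)
degradation = [0.1, 0.7, 0.0, 0.3]
targets: List[Tuple[int, str]] = [(0, "conv"), (0, "attention"), (1, "linear")]
corrections: Dict[Tuple[int, str], List[List[float]]] = {
    t: mlp(degradation, *t) for t in targets
}
for target, matrix in corrections.items():
    print(target, matrix)
```

The dictionary `corrections` holds one matrix per (functional module, network layer) pair, which matches the description of M correction matrices spread over N functional modules.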

[0025] In one possible implementation, the second model is a diffusion model.

[0026] In one possible implementation, the method also includes:

[0027] Acquire the third and fourth images, with the fourth image being a high-resolution version of the third image;

[0028] The second correction matrix is obtained through the first model and is used to indicate the noise in the third image;

[0029] The second model is trained using a discriminator based on the second correction matrix and the fourth image.

[0030] In this application, a discriminator is introduced for adversarial training, so that the diffusion model can have the ability of single-step generative super-resolution after fine-tuning, thereby reducing inference time.
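The text does not specify which adversarial objective is used. As one common choice, the sketch below computes the standard non-saturating GAN losses from discriminator logits; the function names and the choice of loss are assumptions, and the generator and discriminator networks themselves are omitted.

```python
import math
from typing import List


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_loss(real_logits: List[float], fake_logits: List[float]) -> float:
    """Mean of -log(sigmoid(real)) plus mean of -log(1 - sigmoid(fake))."""
    real = sum(softplus(-l) for l in real_logits) / len(real_logits)
    fake = sum(softplus(l) for l in fake_logits) / len(fake_logits)
    return real + fake


def generator_loss(fake_logits: List[float]) -> float:
    """Non-saturating generator loss: mean of -log(sigmoid(fake))."""
    return sum(softplus(-l) for l in fake_logits) / len(fake_logits)


# Logits the discriminator might assign to high-resolution ground truth
# (the fourth image) and to the model's single-step output.
real_logits = [2.0, 1.5]
fake_logits = [-1.0, 0.5]
print(discriminator_loss(real_logits, fake_logits))
print(generator_loss(fake_logits))
```

During fine-tuning, the discriminator would be updated to lower `discriminator_loss` while the diffusion model is updated to lower `generator_loss`, pushing its single-step outputs toward the distribution of high-resolution images.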

[0031] In a second aspect, this application provides a data processing apparatus, the apparatus comprising:

[0032] The acquisition module is used to acquire the first image;

[0033] The processing module is used to obtain a first correction matrix through a first model, the first correction matrix being used to indicate the noise in the first image;

[0034] The update module is used to update the configuration information of the second model based on the first correction matrix. The configuration information is used to indicate the denoising strategy when the second model performs denoising processing.

[0035] The denoising module is used to denoise the first image using the updated second model to obtain the second image.

[0036] In one possible implementation, the second model includes a first sub-network and a second sub-network, the first sub-network being used to indicate configuration information, the second sub-network being used for denoising, and the similarity between the network parameters of the first sub-network and the network parameters of the second sub-network being less than a first threshold.

[0037] The update module is specifically used for:

[0038] The first subnetwork is updated based on the first correction matrix to obtain the updated first subnetwork.

[0039] In one possible implementation, the update module is further configured to: merge the updated network parameters of the first sub-network with the network parameters of the second sub-network to obtain the updated second model.

[0040] In one possible implementation, the processing module is specifically used to: perform degradation processing on the first image to obtain a first degradation vector, wherein the first degradation vector is a vector representation of the noise in the first image;

[0041] The first degradation vector and the target information of the second model are input into the first model to obtain the first correction matrix. The target information is used to indicate the functional modules of the second model and the network layers corresponding to the functional modules.

[0042] In one possible implementation, the second model includes N functional modules, each functional module including at least one network layer, and the first correction matrix includes M correction matrices, each of the M correction matrices corresponding to a network layer in a functional module, where N is a positive integer greater than 1 and M is a positive integer greater than 1.

[0043] In one possible implementation, the target information includes the identifier of each of the N functional modules and the type of the network layer corresponding to the functional module.

[0044] In one possible implementation, the second model is a diffusion model.

[0045] In one possible implementation, the denoising module is also used for:

[0046] Acquire the third and fourth images, with the fourth image being a high-resolution version of the third image;

[0047] The second correction matrix is obtained through the first model and is used to indicate the noise in the third image;

[0048] The second model is trained using a discriminator based on the second correction matrix and the fourth image.

[0049] In a third aspect, this application provides a data processing apparatus, which may include a memory, a processor, and a bus system, wherein the memory is used to store a program, and the processor is used to execute the program in the memory to perform the method described in the first aspect above and any optional implementation thereof.

[0050] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the method described in the first aspect above and any optional implementation thereof.

[0051] In a fifth aspect, embodiments of this application provide a computer program that, when run on a computer, causes the computer to perform the method described in the first aspect above and any optional implementation thereof.

[0052] In a sixth aspect, this application provides a chip system including a processor for supporting the implementation of the functions involved in the foregoing aspects, such as transmitting or processing data or information involved in the foregoing methods. In one possible design, the chip system further includes a memory for storing program instructions and data necessary for the execution device or training device. This chip system may be composed of chips or may include chips and other discrete devices.

[0053] The technical effects of the second to sixth aspects of this application can be understood in conjunction with the technical effects of the first aspect and any implementation thereof.

Attached Figure Description

[0054] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0055] Figure 1 is a schematic diagram of a structural framework for artificial intelligence.

[0056] Figures 2 to 7 are schematic diagrams of an application architecture provided in an embodiment of this application;

[0057] Figure 8 is a schematic diagram of a data processing method provided in an embodiment of this application;

[0058] Figure 9 is a schematic diagram of low-rank fine-tuning guided by degradation provided in an embodiment of this application;

[0059] Figure 10 is a schematic diagram of the diffusion model based on degradation information update provided in the embodiments of this application;

[0060] Figure 11 is a schematic diagram of the degradation-guided single-step diffusion training process provided in an embodiment of this application;

[0061] Figure 12 is a schematic diagram of the structure of a data processing device provided in an embodiment of this application;

[0062] Figure 13 is a schematic diagram of a device provided in an embodiment of this application;

[0063] Figure 14 is a schematic diagram of a device provided in an embodiment of this application;

[0064] Figure 15 is a schematic diagram of a chip provided in an embodiment of this application.

Detailed Implementation

[0065] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0066] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that such data can be interchanged where appropriate so that embodiments of the application described herein can be implemented, for example, in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0067] First, the overall workflow of an artificial intelligence system is described, as shown in Figure 1. Figure 1 is a structural diagram of the main framework of artificial intelligence. The framework is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it could be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data undergoes a condensation process of "data-information-knowledge-wisdom." The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (the technologies that provide and process it) to the industrial ecosystem of the system.

[0068] (1) Infrastructure

[0069] Infrastructure provides computing power to support artificial intelligence systems, enabling communication with the external world and providing support through a basic platform. This communication occurs through sensors; computing power is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); and the basic platform includes distributed computing frameworks and related platform guarantees and support, which may include cloud storage and computing, interconnected networks, etc. For example, sensors communicate with the outside world to acquire data, and this data is provided to intelligent chips in the distributed computing system provided by the basic platform for computation.

[0070] (2) Data

[0071] The data at the layer above the infrastructure represents the data sources in the field of artificial intelligence. The data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data from existing systems and sensor data such as force, displacement, liquid level, temperature, and humidity.

[0072] (3) Data processing

[0073] Data processing typically includes methods such as data training, machine learning, deep learning, search, reasoning, and decision-making.

[0074] Among them, machine learning and deep learning can perform intelligent information modeling, extraction, preprocessing, and training of data by symbolizing and formalizing it.

[0075] Reasoning refers to the process in which, in a computer or intelligent system, the machine thinks and solves problems by simulating human intelligent reasoning, based on reasoning control strategies and using formalized information. Typical functions include search and matching.

[0076] Decision-making refers to the process of making decisions based on intelligent information after reasoning, and it typically provides functions such as classification, sorting, and prediction.

[0077] (4) General ability

[0078] After the data processing mentioned above, the results of the data processing can be used to form some general capabilities, such as algorithms or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.

[0079] (5) Smart Products and Industry Applications

[0080] Intelligent products and industry applications refer to products and applications of artificial intelligence systems in various fields. They are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. Their application areas mainly include: intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, etc.

[0081] This application can be applied to, but is not limited to, image super-resolution processing in the field of artificial intelligence. Specifically, it can be applied to neural network search and neural network inference in the field of super-resolution processing. The following introduces several application scenarios that have been implemented in products.

[0082] To better understand the solutions of the embodiments of this application, the possible application scenarios of the embodiments of this application will be briefly introduced below with reference to Figures 2 to 5.

[0083] The product form of this application embodiment can be an image processing application. Image processing applications can run on terminal devices or cloud-based servers.

[0084] In one possible implementation, referring to Figure 2, an image processing application can perform super-resolution tasks to obtain high-quality images.

[0085] For example, image super-resolution enhancement can be provided in image editing scenarios including mobile phone photo galleries and PC terminals. Taking mobile phone photo editing software as an example, as shown in Figure 2, the user selects a low-quality image that needs super-resolution enhancement, and the image is uploaded to a cloud server via the network. The cloud server uses the algorithm of this invention to enhance the image and returns a high-resolution, high-quality image to the user. The final processed image is then displayed on the UI.

[0086] In one possible implementation, a user can open an image processing application installed on a terminal device and input image data. The image processing application can process the input image data using a model trained by the method provided in the embodiments of this application, or by the method provided in the embodiments of this application, and present the processing result to the user (the presentation method may include, but is not limited to, displaying, playing, saving, uploading to the cloud, etc.).

[0087] In one possible implementation, a user can open an image processing application installed on a terminal device and input image data. The image processing application can then send the image data to a cloud-based server. The cloud-based server processes the input image data using a model trained by the method provided in this application embodiment and sends the processing result back to the terminal device. The terminal device can then present the processing result to the user (the presentation method may include, but is not limited to, displaying, playing, saving, or uploading to the cloud).

[0088] Please refer to Figure 3, which is a schematic diagram of the entity architecture of an image processing application running in an embodiment of this application. Figure 3 shows a schematic diagram of a system architecture. The system may include a terminal 100 and a server 200. The server 200 may include one or more servers (Figure 3 illustrates this with one server as an example), and the server 200 can provide image processing functions for one or more terminals.

[0089] The terminal 100 may have an image processing application installed or a webpage related to image processing functions open. The application and webpage can provide an interface. The terminal 100 can receive relevant parameters input by the user on the image processing function interface and send the parameters to the server 200. The server 200 can obtain the processing result based on the received parameters and return the processing result to the terminal 100.

[0090] It should be understood that in some optional implementations, the terminal 100 can also obtain the processing result based on the received parameters on its own, without the cooperation of the server. The embodiments of this application are not limited in this respect.
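The two deployment options (processing on the server 200, or locally on the terminal 100) can be sketched as a simple dispatch. This is a hypothetical illustration only: the classes `Server` and `Terminal`, the `Processor` type, and the stand-in models are not defined in this application, and the network transfer is replaced by a direct method call.

```python
from typing import Callable, List, Optional

Image = List[float]
Processor = Callable[[Image], Image]


class Server:
    """Stand-in for server 200: applies its model to a received image."""

    def __init__(self, model: Processor) -> None:
        self._model = model

    def handle(self, image: Image) -> Image:
        return self._model(image)


class Terminal:
    """Stand-in for terminal 100: processes locally if it holds a model."""

    def __init__(self, server: Server,
                 local_model: Optional[Processor] = None) -> None:
        self._server = server
        self._local_model = local_model

    def process(self, image: Image) -> Image:
        if self._local_model is not None:
            return self._local_model(image)   # on-device processing
        return self._server.handle(image)     # stands in for upload and reply


def double(img: Image) -> Image:
    """Placeholder for the server-side model."""
    return [2 * p for p in img]


def identity(img: Image) -> Image:
    """Placeholder for an on-device model."""
    return list(img)


server = Server(double)
print(Terminal(server).process([1.0, 2.0]))                        # → [2.0, 4.0]
print(Terminal(server, local_model=identity).process([1.0, 2.0]))  # → [1.0, 2.0]
```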

[0091] The product form of terminal 100 in Figure 3 is described below:

[0092] The terminal 100 in this application embodiment can be a mobile phone, tablet computer, wearable device, vehicle device, augmented reality (AR) / virtual reality (VR) device, laptop computer, ultra-mobile personal computer (UMPC), netbook, personal digital assistant (PDA), etc., and this application embodiment does not impose any restrictions on it.

[0093] Figure 4 shows a schematic diagram of an optional hardware structure for terminal 100.

[0094] Referring to Figure 4, terminal 100 may include components such as a radio frequency unit 110, a memory 120, an input unit 130, a display unit 140, a camera 150 (optional), an audio circuit 160 (optional), a speaker 161 (optional), a microphone 162 (optional), a processor 170, an external interface 180, and a power supply 190. Those skilled in the art will understand that Figure 4 is merely an example of a terminal or multi-functional device and does not constitute a limitation on the terminal or multi-functional device; it may include more or fewer components than illustrated, or combine certain components, or use different components.

[0095] The input unit 130 can be used to receive input numerical or character information, and to generate key signal inputs related to user settings and function control of the portable multi-functional device. Specifically, the input unit 130 may include a touchscreen 131 (optional) and / or other input devices 132. The touchscreen 131 can collect touch operations performed by the user on or near it (such as operations performed by the user using fingers, knuckles, styluses, or any suitable object on or near the touchscreen), and drive the corresponding connection devices according to a pre-set program. The touchscreen can detect the user's touch actions, convert the touch actions into touch signals and send them to the processor 170, and can receive and execute commands sent by the processor 170; the touch signal includes at least touch point coordinate information. The touchscreen 131 can provide an input interface and an output interface between the terminal 100 and the user. In addition, various types of touchscreens, such as resistive, capacitive, infrared, and surface acoustic wave, can be used to implement the touchscreen. Besides the touchscreen 131, the input unit 130 may also include other input devices. Specifically, other input devices 132 may include, but are not limited to, one or more of the following: physical keyboard, function keys (such as volume control buttons, power buttons, etc.), trackball, mouse, joystick, etc.

[0096] Other input devices 132 can receive input images.

[0097] The display unit 140 can be used to display information input by the user or information provided to the user, various menus of the terminal 100, interactive interfaces, file display, and / or playback of any multimedia file. In this embodiment, the display unit 140 can be used to display the interface of an image processing application, processing results, etc.

[0098] The memory 120 can be used to store instructions and data. The memory 120 may primarily include an instruction storage area and a data storage area. The data storage area can store various types of data, such as multimedia files and text. The instruction storage area can store software units such as operating systems, applications, and instructions required for at least one function, or subsets or extended sets thereof. It may also include non-volatile random access memory. It provides the processor 170 with hardware, software, and data resources for managing the computing device, supporting control software and applications. It is also used for storing multimedia files, as well as storing running programs and applications.

[0099] The processor 170 is the control center of the terminal 100. It connects various parts of the terminal 100 via various interfaces and lines. By running or executing instructions stored in the memory 120 and calling data stored in the memory 120, it performs various functions and processes data of the terminal 100, thereby controlling the terminal device as a whole. Optionally, the processor 170 may include one or more processing units; preferably, the processor 170 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and applications, and the modem processor mainly handles wireless communication. It is understood that the modem processor may not be integrated into the processor 170. In some embodiments, the processor and memory can be implemented on a single chip; in some embodiments, they can also be implemented separately on independent chips. The processor 170 can also be used to generate corresponding operation control signals, send them to the corresponding components of the computing processing device, read and process data in the software, especially read and process data and programs in the memory 120, so that the various functional modules therein perform corresponding functions, thereby controlling the corresponding components to act according to the instructions.

[0100] The memory 120 can be used to store software code related to the data processing method, and the processor 170 can execute the steps of the chip's data processing method, and can also schedule other units (such as the above-mentioned input unit 130 and display unit 140) to achieve the corresponding functions.

[0101] The radio frequency unit 110 (optional) can be used for receiving and transmitting signals during information transmission or calls. For example, it can receive downlink information from the base station and process it for the processor 170; additionally, it can transmit uplink data to the base station. Typically, the RF circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier (LNA), a duplexer, etc. Furthermore, the radio frequency unit 110 can also communicate wirelessly with network devices and other devices. This wireless communication can use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), etc.

[0102] In this embodiment of the application, the radio frequency unit 110 can send an image to the server 200 and receive the processing result sent by the server 200.

[0103] It should be understood that the radio frequency unit 110 is optional and can be replaced with other communication interfaces, such as a network port.

[0104] The terminal 100 also includes a power supply 190 (such as a battery) that supplies power to various components. Preferably, the power supply can be logically connected to the processor 170 through a power management system, thereby enabling functions such as charging, discharging, and power consumption management through the power management system.

[0105] Terminal 100 also includes an external interface 180, which can be a standard Micro USB interface or a multi-pin connector, which can be used to connect terminal 100 to other devices for communication or to connect a charger to charge terminal 100.

[0106] Although not shown, terminal 100 may also include a flash, a wireless fidelity (WiFi) module, a Bluetooth module, and sensors with various functions, which will not be described in detail here. Some or all of the methods described below can be applied to terminal 100 as shown in Figure 4.

[0107] The product form of server 200 in Figure 3 is described below.

[0108] Figure 5 provides a schematic diagram of the structure of a server 200. As shown in Figure 5, the server 200 includes a bus 201, a processor 202, a communication interface 203, and a memory 204. The processor 202, the memory 204, and the communication interface 203 communicate with each other via the bus 201.

[0109] Bus 201 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of representation, only one thick line is used in Figure 5, but this does not indicate that there is only one bus or one type of bus.

[0110] The processor 202 can be any one or more of the following processors: central processing unit (CPU), graphics processing unit (GPU), microprocessor (MP), or digital signal processor (DSP).

[0111] Memory 204 may include volatile memory, such as random access memory (RAM). Memory 204 may also include non-volatile memory, such as read-only memory (ROM), flash memory, hard disk drive (HDD), or solid state drive (SSD).

[0112] The memory 204 can be used to store software code related to the data processing method, and the processor 202 can execute the steps of the chip's data processing method, and can also schedule other units to achieve corresponding functions.

[0113] It should be understood that the aforementioned terminal 100 and server 200 can be centralized or distributed devices. The processors (e.g., processor 170 and processor 202) in the aforementioned terminal 100 and server 200 can be hardware circuits (such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), general-purpose processors, digital signal processors (DSPs), microprocessors or microcontrollers, etc.) or combinations of these hardware circuits. For example, the processor can be a hardware system with instruction execution capabilities, such as a CPU or DSP, or a hardware system without instruction execution capabilities, such as an ASIC or FPGA, or a combination of the aforementioned hardware systems without instruction execution capabilities and hardware systems with instruction execution capabilities.

[0114] It should be understood that the steps related to the model inference process in the embodiments of this application involve AI-related operations. When performing AI operations, the instruction execution architecture of the terminal device and the server is not limited to the processor-memory architecture described above. The system architecture provided in the embodiments of this application will be described in detail below with reference to Figure 6.

[0115] Figure 6 is a schematic diagram of the system architecture provided in an embodiment of this application. As shown in Figure 6, the system architecture 500 includes an execution device 510, a training device 520, a database 530, a client device 540, a data storage system 550, and a data acquisition device 560.

[0116] The execution device 510 includes a calculation module 511, an I / O interface 512, a preprocessing module 513, and a preprocessing module 514. The calculation module 511 may include a target model / rule 501, while the preprocessing modules 513 and 514 are optional.

[0117] The execution device 510 can be a terminal device or a server that runs the aforementioned image processing applications.

[0118] The data acquisition device 560 is used to collect training samples. Training samples can be images, etc. After collecting the training samples, the data acquisition device 560 stores these training samples in the database 530.

[0119] The training device 520 can train the neural network to be trained (e.g., the graph neural network in the embodiments of this application) based on the training samples maintained in the database 530, to obtain the target model / rule 501.

[0120] It should be understood that the training device 520 can perform a pre-training process on the neural network to be trained based on the training samples maintained in the database 530, or fine-tune the model based on the pre-training.

[0121] It should be noted that in practical applications, the training samples maintained in database 530 may not all come from the data acquisition device 560; they may also be received from other devices. Furthermore, it should be noted that training device 520 may not necessarily train the target model / rule 501 entirely based on the training samples maintained in database 530; it may also obtain training samples from the cloud or other sources for model training. The above description should not be construed as limiting the embodiments of this application.

[0122] The target model / rule 501 trained by the training device 520 can be applied to different systems or devices, such as the execution device 510 shown in Figure 6. The execution device 510 can be a terminal, such as a mobile terminal, tablet computer, laptop computer, augmented reality (AR) / virtual reality (VR) device, vehicle terminal, etc., or it can be a server, etc.

[0123] Specifically, the training device 520 can transfer the trained model to the execution device 510.

[0124] In Figure 6, the execution device 510 is configured with an input / output (I / O) interface 512 for data interaction with external devices. Users can input data (such as images in the embodiments of this application) into the I / O interface 512 through the client device 540.

[0125] Preprocessing modules 513 and 514 are used to preprocess the input data received from the I / O interface 512. It should be understood that preprocessing modules 513 and 514 may be absent, or only one preprocessing module may be used. When preprocessing modules 513 and 514 are absent, the calculation module 511 can be used directly to process the input data.

[0126] During the preprocessing of input data by the execution device 510, or during the calculation module 511 of the execution device 510 performing calculations and other related processes, the execution device 510 can call data, code, etc. in the data storage system 550 for corresponding processing, or store the data, instructions, etc. obtained from the corresponding processing into the data storage system 550.

[0127] Finally, the I / O interface 512 provides the processing result to the client device 540, thereby providing it to the user.
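The data flow described in paragraphs [0124] to [0127] can be sketched as follows. This is only an illustrative sketch: the class name `ExecutionDevice`, the helper functions `clip01` and `toy_model`, and the use of a list of floats as image data are hypothetical and are not defined in this application.

```python
from typing import Callable, List, Optional, Sequence

Image = List[float]  # toy stand-in for image data (assumption)


class ExecutionDevice:
    """Hypothetical sketch of execution device 510."""

    def __init__(self, model: Callable[[Image], Image],
                 preprocessors: Optional[Sequence[Callable[[Image], Image]]] = None):
        self.model = model                               # role of target model / rule 501
        self.preprocessors = list(preprocessors or [])   # modules 513 / 514 may be absent

    def handle(self, data: Image) -> Image:
        # Input arrives through the I/O interface; run each available preprocessing module.
        for pre in self.preprocessors:
            data = pre(data)
        # The calculation module processes the (possibly preprocessed) input.
        return self.model(data)


def clip01(img: Image) -> Image:
    """Toy preprocessing step: clamp values to [0, 1]."""
    return [min(1.0, max(0.0, v)) for v in img]


def toy_model(img: Image) -> Image:
    """Toy stand-in for the trained model."""
    return [round(v * 0.5, 3) for v in img]


with_pre = ExecutionDevice(toy_model, [clip01])
without_pre = ExecutionDevice(toy_model)          # preprocessing modules absent
result_with = with_pre.handle([1.5, 0.2, -0.3])
result_without = without_pre.handle([1.5, 0.2, -0.3])
```

When no preprocessing module is configured, `handle` passes the input straight to the model, which mirrors the case in which the calculation module 511 processes the input data directly.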

[0128] In the scenario shown in Figure 6, the user can manually provide input data, which can be done through the interface provided by I / O interface 512. Alternatively, the client device 540 can automatically send input data to I / O interface 512. If user authorization is required for the client device 540 to automatically send input data, the user can set the corresponding permissions in the client device 540. The user can view the output results of the execution device 510 on the client device 540, which can be presented in various forms such as display, sound, or animation. The client device 540 can also act as a data acquisition terminal, collecting the input data provided to the I / O interface 512 and the output results returned by the I / O interface 512, as shown in the figure, and storing them as new sample data in database 530. Alternatively, without going through the client device 540, the I / O interface 512 can directly store the input data it receives and the output results it returns, as shown in the figure, as new sample data in database 530.

[0129] It is worth noting that Figure 6 is merely a schematic diagram of a system architecture provided in an embodiment of this application. The positional relationships between the devices, components, modules, etc. shown in the figure do not constitute any limitation. For example, in Figure 6, the data storage system 550 is an external memory relative to the execution device 510. In other cases, the data storage system 550 can also be placed in the execution device 510. It should be understood that the aforementioned execution device 510 can be deployed in the client device 540.

[0130] The following section describes the more detailed architecture of the execution entity of the data processing method in the embodiments of this application.

[0131] The system architecture provided in this application embodiment will be described in detail below with reference to Figure 6. Figure 6 is a schematic diagram of the system architecture provided in this application embodiment. As shown in Figure 6, the system architecture 500 includes an execution device 510, a training device 520, a database 530, a client device 540, a data storage system 550, and a data acquisition device 560.

[0132] The execution device 510 includes a calculation module 511, an I / O interface 512, a preprocessing module 513, and a preprocessing module 514. The calculation module 511 may include a target model / rule 501, while the preprocessing modules 513 and 514 are optional.

[0133] The data acquisition device 560 is used to collect training samples. Training samples can be low-quality images, etc. In this embodiment, the training samples are the data used to train multiple candidate neural networks. After collecting the training samples, the data acquisition device 560 stores them in the database 530.

[0134] The training device 520 can construct multiple candidate neural networks based on the search space maintained in the database 530, and train the neural networks based on training samples to search for and obtain the target model / rule 501. In this embodiment, the target model / rule 501 can be the target neural network.

[0135] It should be noted that in practical applications, the training samples maintained in database 530 may not all come from the data acquisition device 560; they may also be received from other devices. Furthermore, it should be noted that training device 520 may not necessarily train the target model / rule 501 entirely based on the training samples maintained in database 530; it may also obtain training samples from the cloud or other sources for model training. The above description should not be construed as limiting the embodiments of this application.

[0136] The target model / rule 501 trained by the training device 520 can be applied to different systems or devices, such as the execution device 510 shown in Figure 6. The execution device 510 can be a terminal, such as a mobile terminal, tablet computer, laptop computer, augmented reality (AR) / virtual reality (VR) device, vehicle terminal, etc., or it can be a server or cloud, etc.

[0137] Specifically, the training device 520 can transmit the target neural network to the execution device 510.

[0138] In Figure 6, the execution device 510 is configured with an input / output (I / O) interface 512 for data interaction with external devices. Users can input data to the I / O interface 512 through the client device 540.

[0139] Preprocessing modules 513 and 514 are used to preprocess the input data received from the I / O interface 512. It should be understood that preprocessing modules 513 and 514 may be absent, or only one preprocessing module may be used. When preprocessing modules 513 and 514 are absent, the calculation module 511 can be used directly to process the input data.

[0140] During the preprocessing of input data by the execution device 510, or during the calculation module 511 of the execution device 510 performing calculations and other related processes, the execution device 510 can call data, code, etc. in the data storage system 550 for corresponding processing, or store the data, instructions, etc. obtained from the corresponding processing into the data storage system 550.

[0141] Finally, the I / O interface 512 presents the processing results (such as the image processing results in this embodiment) to the client device 540, thereby providing them to the user.

[0142] From the inference side of the model:

[0143] In this embodiment, the computing module 511 of the execution device 510 can obtain the code stored in the data storage system 550 to implement the data processing method in this embodiment.

[0144] In this embodiment of the application, the computing module 511 of the execution device 510 may include hardware circuits (such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), general-purpose processors, digital signal processors (DSPs), microprocessors or microcontrollers, etc.) or combinations of these hardware circuits. For example, the computing module 511 may be a hardware system with instruction execution capabilities, such as a CPU or DSP, or a hardware system without instruction execution capabilities, such as an ASIC or FPGA, or a combination of the aforementioned hardware systems without instruction execution capabilities and hardware systems with instruction execution capabilities.

[0145] Specifically, the computing module 511 of the execution device 510 can be a hardware system with the function of executing instructions. The data processing method provided in this application embodiment can be software code stored in the memory. The computing module 511 of the execution device 510 can obtain the software code from the memory and execute the obtained software code to implement the data processing method provided in this application embodiment.

[0146] It should be understood that the computing module 511 of the execution device 510 can be a combination of a hardware system without the function of executing instructions and a hardware system with the function of executing instructions. Some steps of the data processing method provided in the embodiments of this application can also be implemented by the hardware system without the function of executing instructions in the computing module 511 of the execution device 510, which is not limited here.

[0147] From the training side of the model:

[0148] In this embodiment, the training device 520 can obtain the code stored in the memory (not shown in Figure 6, which can be integrated into the training device 520 or deployed separately from the training device 520) to implement the data processing method in this embodiment.

[0149] In this embodiment of the application, the training device 520 may include hardware circuits (such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), general-purpose processors, digital signal processors (DSPs), microprocessors or microcontrollers, etc.) or combinations of these hardware circuits. For example, the training device 520 may be a hardware system with instruction execution capabilities, such as a CPU or DSP, or a hardware system without instruction execution capabilities, such as an ASIC or FPGA, or a combination of the aforementioned hardware systems without instruction execution capabilities and hardware systems with instruction execution capabilities.

[0150] Specifically, the training device 520 can be a hardware system with instruction execution capabilities. The data processing method provided in this application embodiment can be software code stored in a memory. The training device 520 can retrieve the software code from the memory and execute the retrieved software code to implement the data processing method provided in this application embodiment.

[0151] It should be understood that the training device 520 can be a combination of a hardware system without the function of executing instructions and a hardware system with the function of executing instructions. Some steps of the data processing method provided in the embodiments of this application can also be implemented by the hardware system in the training device 520 without the function of executing instructions, which is not limited here.

[0152] In one possible implementation, the server can provide image processing services to the client side through an application programming interface (API).

[0153] In this process, the terminal device can send relevant parameters (such as image data) to the server through the API provided by the cloud. The server can obtain the processing results based on the received parameters and return the processing results to the terminal.

[0154] The description of the terminal and server can be found in the above embodiments, and will not be repeated here.

[0155] Figure 7 illustrates the process of using a cloud service with image processing capabilities provided by a cloud platform.

[0156] 1. Activate and purchase image processing services.

[0157] 2. Users can download the software development kit (SDK) corresponding to the image processing service. Cloud platforms usually provide multiple development versions of the SDK for users to choose from according to their development environment needs, such as JAVA version SDK, Python version SDK, PHP version SDK, Android version SDK, etc.

[0158] 3. After downloading the corresponding version of the SDK to their local machine according to their needs, users can import the SDK project into their local development environment, configure and debug it in the local development environment, and develop other functions in the local development environment to form an application that integrates image processing capabilities.

[0159] 4. When an image processing application is used, it can trigger an API call for the image processing function when image processing is required. When the application triggers the image processing function, it sends an API request to the running instance of the image processing function service in the cloud environment. The API request carries the image, and the running instance in the cloud environment processes the input image data to obtain the processing result.

[0160] 5. The cloud environment returns the processing result to the application, thus completing one image processing function call.
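Steps 4 and 5 above can be sketched as a request/response exchange. The sketch is hypothetical: the JSON field names (`image`, `result`), the base64 encoding, and the in-process function `cloud_instance` that stands in for the running service instance are assumptions made for illustration only; this application does not specify the API format, and the byte reversal is merely a placeholder for the actual image processing.

```python
import base64
import json


def build_api_request(image_bytes: bytes) -> str:
    """Application side: carry the image in the API request (hypothetical JSON format)."""
    return json.dumps({"image": base64.b64encode(image_bytes).decode("ascii")})


def cloud_instance(request_body: str) -> str:
    """Stand-in for the running instance in the cloud environment."""
    image_bytes = base64.b64decode(json.loads(request_body)["image"])
    processed = image_bytes[::-1]  # placeholder for the real image processing
    return json.dumps({"result": base64.b64encode(processed).decode("ascii")})


def parse_api_response(response_body: str) -> bytes:
    """Application side: extract the processing result returned by the cloud."""
    return base64.b64decode(json.loads(response_body)["result"])


request = build_api_request(b"\x01\x02\x03")   # step 4: the request carries the image
response = cloud_instance(request)             # step 4: the instance processes the image
output = parse_api_response(response)          # step 5: the result is returned
```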

[0161] Since the embodiments of this application involve a large number of neural network applications, for ease of understanding, the relevant terms and concepts such as neural networks involved in the embodiments of this application will be introduced below.

[0162] (1) Neural Network

[0163] A neural network can be composed of neural units. A neural unit can be an operational unit that takes $x_s$ and an intercept of 1 as inputs, and whose output can be:

$$f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

[0164] where s = 1, 2, ..., n, n is a natural number greater than 1, $W_s$ is the weight of $x_s$, and b is the bias of the neural unit. f is the activation function of the neural unit, used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer, and the activation function can be the sigmoid function. A neural network is a network formed by connecting multiple of the above-mentioned individual neural units together, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field, which can be a region composed of several neural units.
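As a minimal sketch of a single neural unit, the following code computes the weighted sum of the inputs plus the bias and applies a sigmoid activation, i.e. f(W_1 x_1 + ... + W_n x_n + b). The function names and the example values are illustrative assumptions.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Sigmoid activation function f."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Output of one neural unit: f(sum_s W_s * x_s + b)."""
    z = sum(w_s * x_s for w_s, x_s in zip(w, x)) + b
    return sigmoid(z)


# Weighted sum is 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0, and sigmoid(0) = 0.5.
out = neural_unit(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0)
```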

[0165] (2) Loss Function

[0166] In training a deep neural network, to ensure the output closely approximates the desired predicted value, we compare the network's prediction with the target value. Based on the difference, we update the weight vector of each layer (usually pre-configuring parameters before the initial update). For example, if the prediction is too high, the weight vector is adjusted to predict a lower value. This adjustment continues until the deep neural network predicts the target value or a value very close to it. Therefore, we need to predefine "how to compare the difference between the predicted and target values," which is the loss function or objective function. These are important equations used to measure the difference between the predicted and target values. Taking the loss function as an example, a higher output value (loss) indicates a greater difference, and training the deep neural network becomes a process of minimizing this loss.

[0167] (3) Backpropagation algorithm

[0168] Convolutional neural networks can employ backpropagation (BP) to correct the parameters in the initial super-resolution model during training, thereby reducing the reconstruction error loss. Specifically, forward propagation of the input signal to the output generates an error loss; this error loss information is then propagated back to update the parameters in the initial super-resolution model, leading to convergence of the error loss. The backpropagation algorithm is an error-loss-driven backpropagation process aimed at obtaining the optimal parameters of the super-resolution model, such as the weight matrix.
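The interplay of loss function and backpropagation can be illustrated on a deliberately tiny model. The sketch below fits a single linear unit y = w*x + b by gradient descent on the mean squared error; it is not the super-resolution model of this application, and the learning rate, step count, and sample data are assumptions chosen for illustration.

```python
from typing import List, Tuple


def train_linear_unit(samples: List[Tuple[float, float]],
                      lr: float = 0.1, steps: int = 500):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    losses = []
    n = len(samples)
    for _ in range(steps):
        # Forward propagation: compute prediction errors and the loss.
        errors = [(w * x + b) - y for x, y in samples]
        losses.append(sum(e * e for e in errors) / n)
        # Backward propagation: gradients of the loss with respect to w and b.
        grad_w = sum(2 * e * x for e, (x, _) in zip(errors, samples)) / n
        grad_b = sum(2 * e for e in errors) / n
        # Update the parameters in the direction that reduces the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses


samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # generated by y = 2x + 1
w, b, losses = train_linear_unit(samples)
```

The recorded `losses` decrease as training proceeds, and `w` and `b` approach the values 2 and 1 that generated the samples.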

[0169] (4) Deep Neural Networks

[0170] Deep neural networks (DNNs), also known as multilayer neural networks, can be understood as neural networks with many hidden layers, though there is no specific threshold for "many." Based on their position, the layers of a DNN fall into three categories: input layer, hidden layers, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers. All layers are fully connected, meaning that any neuron in the i-th layer is connected to any neuron in the (i+1)-th layer. Although DNNs appear complex, the operation of each layer is actually quite simple and is given by the following linear relationship: $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function. Each layer simply applies this operation to the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because DNNs have many layers, the number of coefficients W and offset vectors $\vec{b}$ is also large. These parameters are defined in a DNN as follows, taking the coefficient W as an example. Assuming a three-layer DNN, the linear coefficient from the 4th neuron in the second layer to the 2nd neuron in the third layer is defined as $W^{3}_{24}$. The superscript 3 represents the layer in which coefficient W resides, while the subscript corresponds to the output third-layer index 2 and the input second-layer index 4. In summary, the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as $W^{L}_{jk}$. Note that the input layer has no W parameter. In deep neural networks, more hidden layers allow the network to better represent complex real-world situations. Theoretically, the more parameters a model has, the higher its complexity and "capacity," meaning it can perform more complex learning tasks. Training a deep neural network is essentially the process of learning the weight matrices, with the ultimate goal of obtaining the weight matrices of all layers in the trained deep neural network (the weight matrices formed by the vectors W of many layers).
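The per-layer operation and the coefficient indexing convention can be sketched as follows. The sketch assumes the usual fully connected form, output = activation(W x + b), and uses tanh as the activation function; the matrix values and the layer sizes (a 4-neuron second layer feeding a 2-neuron third layer) are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def layer_forward(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Compute activation(W x + b) for one fully connected layer (tanh assumed)."""
    return [math.tanh(sum(w_jk * x_k for w_jk, x_k in zip(row, x)) + b_j)
            for row, b_j in zip(W, b)]


# Weights of layer 3: row j is the output neuron, column k is the input neuron.
W3: Matrix = [[0.1, 0.2, 0.3, 0.4],
              [0.5, 0.6, 0.7, 0.8]]
b3: Vector = [0.0, 0.0]

# Coefficient from the 4th neuron of layer 2 to the 2nd neuron of layer 3:
# output index j = 2, input index k = 4 (1-based), i.e. W3[1][3] in 0-based Python.
w3_24 = W3[2 - 1][4 - 1]

y = layer_forward(W3, b3, [1.0, 0.0, 0.0, 0.0])
```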

[0171] (5) Diffusion Model (DM)

[0172] Diffusion models, also known as diffusion generative models, are deep learning generative models based on probabilistic statistics and nonequilibrium thermodynamics. They work by simulating a gradual diffusion process, which first transforms data into unstructured random noise, and then reverses this process to recover meaningful data from the noise. Diffusion generative models consist of two processes: a forward diffusion process and a reverse denoising process. In the forward process, the data is gradually "diffused" into noise. This process begins with the original data and then adds a certain amount of noise to the data at each time step until the data is completely Gaussian noise. The level of noise added at each time step is carefully designed to ensure that enough information is retained to recover the original data as it gradually becomes noise. In the reverse process, the model starts with pure Gaussian noise and gradually removes the noise to recover the original data. This process is achieved by predicting and removing noise at each step. By continuously removing noise, the model can eventually generate a high-resolution image that matches the distribution of the training data.
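The forward diffusion process can be sketched with the closed-form noising step commonly used in denoising diffusion probabilistic models, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where eps is standard Gaussian noise. This formulation, the linear noise schedule, and all parameter values below are assumptions taken from common practice; this application does not specify them.

```python
import math
import random
from typing import List


def alpha_bar_schedule(num_steps: int, beta_start: float = 1e-4,
                       beta_end: float = 0.02) -> List[float]:
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta_t = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta_t
        alpha_bars.append(prod)
    return alpha_bars


def forward_diffuse(x0: List[float], alpha_bar_t: float,
                    rng: random.Random) -> List[float]:
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


alpha_bars = alpha_bar_schedule(1000)
rng = random.Random(0)
x0 = [0.5] * 8                                       # toy "image" of 8 pixels
x_early = forward_diffuse(x0, alpha_bars[0], rng)    # first step: data almost unchanged
x_late = forward_diffuse(x0, alpha_bars[-1], rng)    # last step: almost pure Gaussian noise
```

The reverse process runs in the opposite direction: a trained network predicts the noise at each step so that it can be removed, which is not shown here.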

[0173] (6) Low-Rank Adaptation (LoRA)

[0174] LoRA is a technique for optimizing deep learning models by fine-tuning large-scale pre-trained models. Compared to full fine-tuning, LoRA trains only a small number of parameters while maintaining the performance achievable with full fine-tuning. Assuming the model's weight matrix has a redundant structure, it can be decomposed into a low-rank form, and weight updates are approximated by adding low-rank matrices. This keeps most of the model's weights frozen, with only the small low-rank matrices being updated, significantly reducing the number of trainable parameters and the computational cost of fine-tuning. During model inference, the fine-tuned low-rank matrices are merged with the original weight matrix, and the merged weights are then used to generate the final output. This approach preserves the capabilities of the pre-trained model while enhancing task-specific performance through fine-tuning.

[0175] For example, LoRA relies on the core concept of low-rank matrix factorization, approximating the weight update by introducing the product (AB) of two low-rank matrices (matrix A and matrix B). AB can be considered as a mapping from the low-dimensional correction matrix to the high-dimensional basis model parameter matrix W.
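A minimal sketch of the low-rank update follows, assuming the common LoRA form in which the frozen weight matrix W is combined with the product AB as W + AB. The helper names, the toy sizes, and the matrix values are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain matrix product of X (p x q) and Y (q x s)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def merge_lora(W: Matrix, A: Matrix, B: Matrix) -> Matrix:
    """Return W + A B; W stays frozen and only A and B would be trained."""
    AB = matmul(A, B)
    return [[w + u for w, u in zip(w_row, u_row)] for w_row, u_row in zip(W, AB)]


d, n, r = 4, 3, 1
W = [[1.0] * n for _ in range(d)]        # frozen pre-trained weights, d x n
A = [[0.1 * (i + 1)] for i in range(d)]  # low-rank factor, d x r
B = [[1.0, 0.0, -1.0]]                   # low-rank factor, r x n

W_merged = merge_lora(W, A, B)           # weights used at inference
full_params = d * n                      # parameters updated by full fine-tuning
lora_params = d * r + r * n              # parameters updated by LoRA
```

With these toy sizes the saving is small (7 trainable parameters instead of 12), but because the LoRA count grows as r(d + n) rather than dn, the gap widens quickly when r is much smaller than d and n.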

[0176] (7) Image noise

[0177] In this application, image noise includes the degree of image blurring and noise information. During image capture or transmission, images may encounter various random signal interferences, which cause unexpected random fluctuations in image information or pixel brightness, leading to problems such as decreased sharpness, blurred details, and color distortion.

[0178] (8) Degradation-aware network

[0179] Degradation-aware networks, also known as degradation prediction networks, primarily extract feature information from images (such as texture, edges, and colors) and learn the mapping relationship between image degradation and the original high-quality image to predict degradation of low-quality images, obtaining degradation information such as the degree of blur and noise.

[0180] In existing technologies, generative adversarial networks (GANs) or diffusion models are typically used to perform image super-resolution tasks. GANs can generate high-resolution images that reconstruct more details than non-generative methods and appear relatively realistic. However, the adversarial training mechanism of GANs can easily lead to unstable image reconstruction quality. Diffusion models, through their progressive generation process, can more effectively recover image details and generate more consistent images.

[0181] However, diffusion models lack awareness of degradation information (such as noise) in low-resolution (LR) images, which affects the quality of image generation. This technical solution optimizes the application of diffusion models in super-resolution tasks by introducing a degradation-aware mechanism, thereby improving the ability and effectiveness of image super-resolution.

[0182] To address the aforementioned problems, this application provides a data processing method. As shown in Figure 8, the data processing method provided in this application includes the following steps 801-804.

[0183] 801. Obtain the first image.

[0184] The first image is a low-resolution image, or a low-quality image. This first image is an image uploaded by the user that requires super-resolution processing.

[0185] For example, the image may appear blurry due to poor shooting conditions (such as insufficient light, uneven lighting, or strong reflections); it may also be that some areas of an originally clear image become blurry after being enlarged; or it may be that the image was improperly processed during transmission or compression, resulting in quality loss.

[0186] 802. Obtain a first correction matrix through the first model. The first correction matrix is used to indicate the noise in the first image.

[0187] In this embodiment of the application, the first model is a multilayer perceptron (MLP), which includes multiple fully connected networks for learning complex mapping relationships between inputs and outputs.

[0188] In one possible implementation, after the first image is acquired, a degradation-aware network is used to obtain the degradation information of the first image. The degradation information of the image is represented as a two-dimensional vector $d = \{d_n, d_b\} \in [0,1]^2$, whose components represent the degree of noise and the degree of blur, respectively. The estimated degradation vector d is then transformed through a Gaussian Fourier embedding layer into the embedding $d_e$:

[0189] where $W_e \in \mathbb{R}^{m}$ is a randomly initialized matrix.

[0190] Then, $d_e$ is input into the first model, i.e., a multi-layer fully connected network, to obtain the first correction matrix C.
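The path from the degradation vector to the first correction matrix can be sketched as follows. The embedding formula used here, sine and cosine features of 2*pi*W_e*d_i, is one common form of Gaussian Fourier embedding and is an assumption, since the exact expression is not reproduced in this text. The layer sizes, the ReLU activation, the random weights, and the reshaping of the network output into an r-by-r matrix are likewise hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def gaussian_fourier_embed(d: List[float], W_e: List[float]) -> List[float]:
    """Embed each entry of d with sin/cos features at random frequencies W_e (assumed form)."""
    feats: List[float] = []
    for d_i in d:
        feats += [math.sin(2 * math.pi * w * d_i) for w in W_e]
        feats += [math.cos(2 * math.pi * w * d_i) for w in W_e]
    return feats


def linear(x: List[float], W: Matrix, b: List[float]) -> List[float]:
    """One fully connected layer without activation."""
    return [sum(w * v for w, v in zip(row, x)) + b_j for row, b_j in zip(W, b)]


def mlp_to_correction(d_e: List[float], layers: List[Tuple[Matrix, List[float]]],
                      r: int) -> Matrix:
    """Run a small fully connected network and reshape its r*r outputs into C."""
    h = d_e
    for idx, (W, b) in enumerate(layers):
        h = linear(h, W, b)
        if idx < len(layers) - 1:
            h = [max(0.0, v) for v in h]  # ReLU between layers (assumed)
    return [h[i * r:(i + 1) * r] for i in range(r)]


def rand_layer(rng: random.Random, n_out: int, n_in: int) -> Tuple[Matrix, List[float]]:
    """Untrained layer with small random weights, for illustration only."""
    return ([[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)


rng = random.Random(0)
m, r, hidden = 4, 2, 8
W_e = [rng.gauss(0.0, 1.0) for _ in range(m)]   # randomly initialized frequencies
d = [0.3, 0.7]                                   # [d_n, d_b], each in [0, 1]
d_e = gaussian_fourier_embed(d, W_e)             # length 2 * 2 * m = 16
layers = [rand_layer(rng, hidden, len(d_e)), rand_layer(rng, r * r, hidden)]
C = mlp_to_correction(d_e, layers, r)            # r x r correction matrix
```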

[0191] Specifically, the first correction matrix is used to update the configuration information of the second model before inference. This configuration information includes configuration parameters during inference, such as model behavior parameters (e.g., threshold ranges and decision rules for various functions), post-processing parameters (e.g., image enhancement and format conversion), and hardware resource parameters (e.g., GPU memory configuration and parallel processing). This configuration information is recorded in the network parameters corresponding to each functional module of the second model.

[0192] Optionally, when generating the first correction matrix C, the target information of the second model is also incorporated to indicate the functional modules of the second model and the network layers of each functional module.

[0193] 803. Update the configuration information of the second model based on the first correction matrix. The configuration information is used to indicate the denoising strategy when the second model performs denoising processing.

[0194] In this application, the second model is a diffusion model. This application primarily involves fine-tuning the configuration information of the diffusion model to obtain prior knowledge of low-resolution image degradation, thereby guiding the generation of high-resolution images. The denoising strategies include, for example, adjusting the granularity and intensity of denoising, resulting in more precise noise sampling.

[0195] The second model includes a first sub-network and a second sub-network. The first sub-network is used to indicate the configuration information, and the second sub-network is used to denoise the image. The first sub-network is an approximation of the second sub-network, meaning that the similarity between the network parameters of the first sub-network and the network parameters of the second sub-network is less than a first threshold.

[0196] For example, the first sub-network is an approximate network constructed from the second sub-network using the low-rank adaptation (LoRA) principle. The first sub-network includes the matrices A and B of LoRA.

[0197] Specifically, updating the configuration information of the second model means updating the parameters in the first sub-network (matrix A and matrix B).

[0198] Referring to Figure 9, the network parameter matrix of the second sub-network is $W \in \mathbb{R}^{d \times n}$, matrix $A \in \mathbb{R}^{d \times r}$, matrix $B \in \mathbb{R}^{r \times n}$, and the first correction matrix $C \in \mathbb{R}^{r \times r}$, where $r \ll d$ and $r \ll n$.

[0199] It can be understood that when a matrix $A$ with $d$ rows and $r$ columns is multiplied by a matrix $B$ with $r$ rows and $n$ columns, their common dimension $r$ is eliminated by the multiplication (i.e., inner products are taken along it), yielding a new matrix $W' \in \mathbb{R}^{d \times n}$ with $d$ rows and $n$ columns.

[0200] Specifically, matrix $C$ is used to update the parameters of the first sub-network (matrices $A$ and $B$) by matrix multiplication, and the updated network parameters of the first sub-network are then merged with the network parameters of the second sub-network by matrix addition. That is: $W_{new} = W + A(CB)$

[0201] Here, $W_{new}$ represents the updated network parameters of the second model, and $W$ represents the original network parameters of the second sub-network. Similarly, when matrix $C$ is multiplied with either matrix $A$ or matrix $B$, the short side $r$ is eliminated.
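The merge described in paragraphs [0198] to [0201] can be checked numerically. The following is a minimal NumPy sketch; the function name, the concrete sizes and the random values are assumptions made only for illustration.

```python
import numpy as np

def merge_low_rank_update(W, A, B, C):
    """Return W_new = W + A (C B) for W (d x n), A (d x r), C (r x r), B (r x n)."""
    d, n = W.shape
    r = C.shape[0]
    assert A.shape == (d, r) and B.shape == (r, n) and C.shape == (r, r)
    return W + A @ (C @ B)

rng = np.random.default_rng(0)
d, n, r = 64, 48, 4                      # hypothetical sizes with r << d and r << n
W = rng.normal(size=(d, n))              # original parameters of the second sub-network
A = rng.normal(size=(d, r))
B = rng.normal(size=(r, n))
C = rng.normal(size=(r, r))              # first correction matrix

W_new = merge_low_rank_update(W, A, B, C)
delta = A @ (C @ B)                      # the update that is added to W
```

Two properties follow directly from the shapes: the update has rank at most r, and with C set to the identity matrix the expression reduces to the ordinary LoRA update W + AB. The correction matrix itself holds only r × r values per layer, which is small compared with the d × n entries of W.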

[0202] In one possible implementation, the diffusion model includes N functional modules, each functional module includes at least one network layer, and the first correction matrix includes M correction matrices, each of the M correction matrices corresponding to a network layer in one of the functional modules.

[0203] Specifically, the encoding network of the diffusion generative model consists of different functional modules (Blocks), each performing a different function. These Blocks work together to encode data and extract features. Each Block corresponds to a unique identifier (Block ID), which is generally assigned during model definition using some method (such as automatic numbering or manual naming). During model training, Block IDs can be used for debugging, visualization, parameter management, and other purposes.

[0204] Each block integrates multiple types of network layers, such as convolutional layers, fully connected layers, and attention layers, which can be combined into the same block to achieve different functions.

[0205] As shown in Figure 10, when the first correction matrix is generated in step 802, the block ID and the network layer type of each functional module in the encoding network of the diffusion model are converted into embedding vectors and fused with the degradation information of the low-resolution image, so as to generate a correction matrix specific to each network layer of each functional module. With $l_i$ denoting the module ID encoding and $u_j$ denoting the network layer type encoding, the correction matrix $C_{i,j}$ corresponding to the $i$-th module and the $j$-th network layer type is generated as: $C_{i,j} = \mathrm{MLP}(\mathrm{FC}(d_e), l_i, u_j)$

[0206] FC stands for Fully Connected Network.
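The per-layer generation of correction matrices from the embedded degradation vector, the module ID encoding and the layer type encoding can be sketched as below. This is an illustrative NumPy example only: the lookup tables, the fusion by concatenation, the two-layer MLP with ReLU, the list of layer types, and every size and name are assumptions rather than details taken from this application.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB, HIDDEN, R, D_E = 8, 32, 4, 16              # hypothetical sizes
LAYER_TYPES = ["conv", "linear", "attention"]   # hypothetical layer types
NUM_BLOCKS = 3

# Lookup tables for the encodings l_i and u_j (learned in practice, random here).
block_emb = rng.normal(size=(NUM_BLOCKS, EMB))
type_emb = rng.normal(size=(len(LAYER_TYPES), EMB))

W_fc = rng.normal(size=(EMB, D_E)) * 0.1        # FC applied to d_e
W1 = rng.normal(size=(HIDDEN, 3 * EMB)) * 0.1   # MLP layer 1
W2 = rng.normal(size=(R * R, HIDDEN)) * 0.1     # MLP layer 2

def correction_matrix(d_e, block_id, layer_type):
    """C_{i,j} = MLP(FC(d_e), l_i, u_j), with the three inputs concatenated."""
    l_i = block_emb[block_id]
    u_j = type_emb[LAYER_TYPES.index(layer_type)]
    features = np.concatenate([W_fc @ d_e, l_i, u_j])
    hidden = np.maximum(W1 @ features, 0.0)     # ReLU (assumption)
    return (W2 @ hidden).reshape(R, R)

d_e = rng.normal(size=D_E)                      # stand-in for the embedded degradation vector
matrices = {(i, t): correction_matrix(d_e, i, t)
            for i in range(NUM_BLOCKS) for t in LAYER_TYPES}
```

Because the same small network produces every matrix, each layer receives its own correction while the number of additional parameters stays independent of the number of layers. The sketch uses a single rank R for all matrices for simplicity.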

[0207] In one possible implementation, the number of parameters for each matrix C is different, i.e., the dimension r is different.

[0208] In one possible implementation, the diffusion model includes a VAE encoder and a UNet network, and it is specifically the functional modules within the VAE encoder and the UNet network that are updated.

[0209] 804. The first image is denoised using the updated second model to obtain the second image.

[0210] The updated second model is used to denoise the first image, while simultaneously performing super-resolution reconstruction to generate a high-resolution (or high-quality) second image. This process not only removes noise from the original image but also restores details and textures, making it clearer and more realistic.

[0211] Optionally, after obtaining a high-quality image, it can be sent back to the user and displayed on the user's device's UI.

[0212] In one possible implementation, the diffusion model is a single-step diffusion model that has undergone distillation. Please refer to Figure 11, which illustrates the degradation-guided single-step diffusion training process. The specific processing flow includes:

[0213] Degradation perception: Using a pre-trained degradation perception network, the blur and noise levels of the input low-resolution image are predicted to obtain a two-dimensional vector representing the degree of degradation.

[0214] Degradation-guided low-rank fine-tuning: The two-dimensional vector representing the degree of degradation is concatenated with the module ID encoding of the VAE encoder and the denoising UNet network on the feature dimension, respectively. The concatenated result is input into a multi-layer fully connected network to generate independent correction matrices C for each module. Each module of the VAE encoder and each module of the denoising UNet network has its own exclusive correction matrix C.

[0215] Single-step diffusion super-resolution model: Utilizing pairs of low-resolution and high-resolution images, a pre-trained diffusion model is used as a prior for the generative model. The single-step super-resolution model is trained through degradation-guided low-rank fine-tuning, and an adversarial training process is performed by incorporating a CLIP discriminator. This enables the fine-tuned diffusion model to possess single-step generative super-resolution capabilities.

[0216] Specifically, a reconstruction loss is adopted, which includes an L2 loss (i.e., mean squared error, MSE) and a perceptual loss (learned perceptual image patch similarity, LPIPS). In addition, a GAN loss is introduced to minimize the distribution difference between the generated image and the real high-resolution image. The complete learning objective is:

[0217] Here, $G_\theta$ denotes the generator, i.e., the super-resolution model, and $\lambda_{L2}$, $\lambda_{LPIPS}$ and $\lambda_{GAN}$ are weights used to balance the individual losses. The GAN loss is defined as:

[0218] Here, $D_\phi$ denotes the discriminator with parameters $\phi$. A pre-trained DINO model is used as the fixed backbone network of the discriminator, and multiple independent classifiers are introduced, each corresponding to a different level of features in the backbone model.
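How the three weighted terms might be combined for the generator can be illustrated as follows. This is a hedged sketch, not the objective of this application: the non-saturating form of the adversarial term, the averaging over classifier heads, the default weight values, and the placeholder used in place of a real LPIPS network are all assumptions.

```python
import numpy as np

def l2_loss(sr, hr):
    """Mean squared error between the generated and the reference image."""
    return float(np.mean((sr - hr) ** 2))

def generator_gan_loss(head_logits):
    """Non-saturating generator loss averaged over discriminator heads (assumed form)."""
    # softplus(-x) = log(1 + exp(-x)), computed stably with logaddexp
    return float(np.mean([np.mean(np.logaddexp(0.0, -z)) for z in head_logits]))

def total_loss(sr, hr, perceptual_fn, head_logits,
               lam_l2=1.0, lam_lpips=1.0, lam_gan=0.1):
    """Weighted sum of the reconstruction, perceptual and adversarial terms."""
    return (lam_l2 * l2_loss(sr, hr)
            + lam_lpips * perceptual_fn(sr, hr)
            + lam_gan * generator_gan_loss(head_logits))

# Toy usage. The perceptual term below is a placeholder, not a real LPIPS network.
rng = np.random.default_rng(0)
hr = rng.random((3, 32, 32))                            # reference high-resolution image
sr = hr + 0.05 * rng.normal(size=hr.shape)              # stand-in for the generator output
placeholder_perceptual = lambda a, b: float(np.mean(np.abs(a - b)))
logits = [rng.normal(size=16), rng.normal(size=16)]     # one array per classifier head
loss = total_loss(sr, hr, placeholder_perceptual, logits)
```

Each entry of `logits` stands for the output of one of the independent classifiers attached to a different feature level of the fixed backbone.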

[0219] In this application, a degradation-aware model is used to estimate the degradation of low-resolution images, and this estimation result is input into a multilayer perceptron to generate a correction matrix. This allows the diffusion model to sample noise from low-resolution images more accurately during denoising, thereby improving the model's denoising accuracy. Furthermore, considering that sharing a single correction matrix across all layers might not be fully adaptable to downstream fine-tuning tasks, this approach proposes using a combination of functional module identifiers and types to generate a corresponding correction matrix for each functional module via the multilayer perceptron. Simultaneously, a discriminator is introduced for adversarial training, enabling the diffusion model to achieve single-step generative super-resolution capabilities after fine-tuning, thus reducing inference time.

[0220] The methods provided in the embodiments of this application have been described in detail above. Next, the device for performing the above methods provided in the embodiments of this application will be described.

[0221] Please refer to Figure 12, which is a schematic diagram of the structure of a data processing device 1200 provided in an embodiment of this application. As shown in Figure 12, the device includes:

[0222] The acquisition module 1201 is used to acquire the first image;

[0223] Processing module 1202 is used to obtain a first correction matrix through a first model, wherein the first correction matrix is used to indicate the noise of the first image;

[0224] The update module 1203 is used to update the configuration information of the second model based on the first correction matrix. The configuration information is used to indicate the denoising strategy when the second model performs denoising processing.

[0225] The denoising module 1204 is used to denoise the first image using the updated second model to obtain the second image.
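The cooperation of the four modules can be pictured as a simple pipeline. The sketch below is only an illustration of the data flow between modules 1201 to 1204; the class name, the callable signatures and the toy stand-ins used in place of real models are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class DataProcessingDevice:
    """Four cooperating steps mirroring modules 1201 to 1204 (names hypothetical)."""
    acquire: Callable[[], np.ndarray]                         # acquisition module 1201
    estimate_correction: Callable[[np.ndarray], np.ndarray]   # processing module 1202
    update_model: Callable[[np.ndarray], Callable]            # update module 1203

    def run(self) -> np.ndarray:
        first_image = self.acquire()
        C = self.estimate_correction(first_image)   # first correction matrix
        denoiser = self.update_model(C)             # updated second model
        return denoiser(first_image)                # denoising module 1204

# Toy stand-ins so the flow can be executed; these are not real models.
device = DataProcessingDevice(
    acquire=lambda: np.full((4, 4), 2.0),
    estimate_correction=lambda img: np.eye(2) * img.mean(),
    update_model=lambda C: (lambda img: img / C[0, 0]),
)
second_image = device.run()
```

The point of the structure is that the second model is reconfigured from the correction matrix before it touches the image, so the denoising step itself is unchanged.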

[0226] In one possible implementation, the second model includes a first sub-network and a second sub-network, wherein the first sub-network is used to indicate the configuration information, the second sub-network is used to perform denoising processing, and the similarity between the network parameters of the first sub-network and the network parameters of the second sub-network is less than a first threshold.

[0227] The update module 1203 is specifically used for:

[0228] The first sub-network is updated based on the first correction matrix to obtain the updated first sub-network.

[0229] In one possible implementation, update module 1203 is also used for:

[0230] The updated network parameters of the first sub-network are merged with the network parameters of the second sub-network to obtain the updated second model.

[0231] In one possible implementation, the processing module 1202 is specifically used for:

[0232] The first image is degraded to obtain a first degradation vector, which is a vector representation of the noise in the first image;

[0233] The first degradation vector and the target information of the second model are input into the first model to obtain the first correction matrix. The target information is used to indicate the functional modules of the second model and the network layers corresponding to the functional modules.

[0234] In one possible implementation, the second model includes N functional modules, each functional module including at least one network layer, and the first correction matrix includes M correction matrices, each of the M correction matrices corresponding to a network layer in a functional module, where N is a positive integer greater than 1 and M is a positive integer greater than 1.

[0235] In one possible implementation, the target information includes the identifier of each of the N functional modules and the type of the network layer corresponding to the functional module.

[0236] In one possible implementation, the second model is a diffusion model.

[0237] In one possible implementation, the denoising module 1204 is also used for:

[0238] Acquire the third and fourth images, with the fourth image being a high-resolution version of the third image;

[0239] The second correction matrix is obtained through the first model and is used to indicate the noise in the third image;

[0240] The second model is trained using a discriminator based on the second correction matrix and the fourth image.

[0241] This application also relates to an execution device. Figure 13 is a structural schematic diagram of an execution device provided in this application embodiment. As shown in Figure 13, the execution device 1300 can specifically be a tablet, laptop, server, etc., and is not limited here. Specifically, the execution device 1300 includes: a receiver 1310, a transmitter 1320, a processor 1330, and a memory 1340 (wherein the execution device 1300 may have one or more processors 1330, and Figure 13 shows one processor as an example). The processor 1330 may include an application processor 1331 and a communication processor 1332. In some embodiments of this application, the receiver 1310, transmitter 1320, processor 1330, and memory 1340 may be connected via a bus or other means.

[0242] Memory 1340 may include read-only memory and random access memory, and provides instructions and data to processor 1330. A portion of memory 1340 may also include non-volatile random access memory (NVRAM). Memory 1340 stores processor and operation instructions, executable modules, or data structures, or subsets thereof, or extended sets thereof, wherein the operation instructions may include various operation instructions for implementing various operations.

[0243] The processor 1330 controls the operation of the execution device. In specific applications, the various components of the execution device are coupled together through a bus system, which may include not only the data bus, but also power buses, control buses, and status signal buses. However, for clarity, all buses in the diagram are referred to as the bus system.

[0244] The methods disclosed in the embodiments of this application can be applied to or implemented by the processor 1330. The processor 1330 can be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by the integrated logic circuitry in the hardware of the processor 1330 or by instructions in software form. The processor 1330 can be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. The processor 1330 can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be directly embodied in the execution of a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module can reside in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. This storage medium is located in memory 1340. Processor 1330 reads information from memory 1340 and, in conjunction with its hardware, completes the steps of the above method.

[0245] Receiver 1310 can be used to receive input digital or character information, and to generate signal inputs related to the settings and function control of the execution device. Transmitter 1320 can be used to output digital or character information through the first interface; transmitter 1320 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; transmitter 1320 may also include a display device such as a display screen. The execution device 1300 may be deployed with the machine allocation system described in the embodiment corresponding to Figure 7, used to implement the resource allocation method in the embodiment corresponding to Figure 3.

[0250] This application embodiment also provides a server. Please refer to Figure 14. Figure 14 is a schematic diagram of a server structure provided in this application embodiment. The server 1400 can be deployed with the device described in the embodiment corresponding to Figure 12. Specifically, the server 1400 is implemented by one or more servers. The server 1400 can vary significantly due to different configurations or performance. It can include one or more central processing units (CPUs) 1414 (e.g., one or more processors) and memory 1432, and one or more storage media 1430 (e.g., one or more mass storage devices) for storing application programs 1442 or data 1444. The memory 1432 and storage media 1430 can be temporary or persistent storage. The program stored in the storage media 1430 can include one or more modules (not shown in the figure), and each module can include a series of instruction operations on the server. Furthermore, the CPU 1414 can be configured to communicate with the storage media 1430 and execute a series of instruction operations in the storage media 1430 on the server 1400.

[0251] Server 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input/output interfaces 1458, and/or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

[0252] In this embodiment, the central processing unit 1414 is used to execute the method in the embodiment corresponding to Figure 8.

[0253] This application also provides a computer program product that, when run on a computer, causes the computer to perform the steps performed by the aforementioned image processing apparatus.

[0254] This application also provides a computer-readable storage medium storing a program for performing signal processing, which, when run on a computer, causes the computer to perform the steps performed by the aforementioned image processing apparatus.

[0255] The execution device, server, or terminal device provided in this application embodiment can specifically be a chip. The chip includes a processing unit and a communication unit. The processing unit can be, for example, a processor, and the communication unit can be, for example, an input/output interface, pins, or circuits. The processing unit can execute computer-executable instructions stored in a storage unit to cause the chip within the execution device to execute the data processing method described in the above embodiments, or to cause the chip within the server to execute the data processing method described in the above embodiments. Optionally, the storage unit can be a storage unit within the chip, such as a register or cache. Alternatively, the storage unit can be a storage unit located outside the chip within the wireless access device, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).

[0256] Specifically, please refer to Figure 15, which is a schematic diagram of a chip structure provided in an embodiment of this application. This chip can be represented as a neural network processor (NPU) 1500. The NPU 1500 is mounted as a coprocessor on the host CPU, and tasks are assigned by the host CPU. The core part of the NPU is the arithmetic circuit 1503, which is controlled by the controller 1504 to extract matrix data from the memory and perform multiplication operations.

[0257] In some implementations, the arithmetic circuit 1503 internally includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 1503 is a two-dimensional systolic array. The arithmetic circuit 1503 can also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1503 is a general-purpose matrix processor.

[0258] For example, suppose we have an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit retrieves the corresponding data of matrix B from the weight memory 1502 and caches it in each PE of the arithmetic circuit. The arithmetic circuit retrieves the data of matrix A from the input memory 1501 and performs matrix operations with matrix B. The partial result or the final result of the obtained matrix is stored in the accumulator 1508.
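The role of the accumulator can be illustrated in software by computing the product in slices and summing the partial results. The NumPy sketch below is only an analogy under stated assumptions: the slicing along the shared dimension, the tile size and the function name are hypothetical and do not describe the actual dataflow of the arithmetic circuit.

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Compute A @ B by accumulating partial products over slices of the shared dimension."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    acc = np.zeros((m, n))                            # plays the role of the accumulator
    for start in range(0, k, tile):
        stop = min(start + tile, k)
        acc += A[:, start:stop] @ B[start:stop, :]    # one partial result
    return acc

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 10))                          # input matrix
B = rng.normal(size=(10, 5))                          # weight matrix
C = tiled_matmul(A, B)                                # output matrix
```

After the last slice has been added, the accumulator holds the final result; before that it holds a partial result, matching the two cases named in the paragraph above.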

[0259] Unified memory 1506 is used to store input and output data. Weight data is directly transferred to weight memory 1502 via Direct Memory Access Controller (DMAC) 1505. Input data is also transferred to unified memory 1506 via DMAC.

[0260] BIU stands for Bus Interface Unit, which is used for interaction between the AXI bus and the DMAC and the Instruction Fetch Buffer (IFB) 1509.

[0261] The Bus Interface Unit (BIU) 1510 is used by the instruction fetch buffer 1509 to fetch instructions from external memory, and is also used by the direct memory access controller 1505 to fetch the original data of the input matrix A or the weight matrix B from external memory.

[0262] The DMAC is mainly used to move input data from the external memory (DDR) to the unified memory 1506, to move weight data to the weight memory 1502, or to move input data to the input memory 1501.

[0263] The vector computation unit 1507 includes multiple arithmetic processing units that further process the output of the arithmetic circuit as needed, for example by vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparisons. It is mainly used for computation in layers of the neural network other than convolutional and fully connected layers, such as batch normalization, pixel-level summation, and upsampling of feature planes.

[0264] In some implementations, the vector computation unit 1507 can store the processed output vector in the unified memory 1506. For example, the vector computation unit 1507 can apply a linear function, or a nonlinear function, to the output of the computation circuit 1503, such as linear interpolation of feature planes extracted by a convolutional layer, or, for example, a vector of accumulated values, to generate activation values. In some implementations, the vector computation unit 1507 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as activation input to the computation circuit 1503, for example, for use in subsequent layers of the neural network.

[0265] The instruction fetch buffer 1509 connected to the controller 1504 is used to store the instructions used by the controller 1504.

[0266] Unified memory 1506, input memory 1501, weight memory 1502, and instruction fetch buffer 1509 are all on-chip memories. External memory is proprietary to this NPU hardware architecture.

[0267] The processor mentioned above can be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the above program.

[0268] It should also be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. In addition, in the device embodiment drawings provided in this application, the connection relationship between modules indicates that they have a communication connection, which can be implemented as one or more communication buses or signal lines.

[0269] Through the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus necessary general-purpose hardware, or it can be implemented by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memory, special-purpose components, etc. Generally, any function performed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structure used to implement the same function can also be diverse, such as analog circuits, digital circuits, or special-purpose circuits. However, for this application, software program implementation is more often the preferred implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, mobile hard disk, ROM, RAM, magnetic disk, or optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments of this application.

[0270] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product.

[0271] The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk (SSD)).

Claims

1. A data processing method, characterized by, include: Get the first image; A first correction matrix is ​​obtained through a first model, and the first correction matrix is ​​used to indicate the noise in the first image; The configuration information of the second model is updated based on the first correction matrix. The configuration information is used to indicate the denoising strategy when the second model performs denoising processing. The first image is denoised using the updated second model to obtain the second image.

2. The method of claim 1, wherein, The second model includes a first sub-network and a second sub-network. The first sub-network is used to indicate the configuration information, and the second sub-network is used to perform denoising processing. The similarity between the network parameters of the first sub-network and the network parameters of the second sub-network is less than a first threshold. The step of updating the configuration information of the second model based on the first correction matrix includes: The first sub-network is updated based on the first correction matrix to obtain the updated first sub-network.

3. The method of claim 2, wherein, After updating the first sub-network based on the first correction matrix to obtain the updated first sub-network, the method further includes: The updated network parameters of the first sub-network are merged with the network parameters of the second sub-network to obtain the updated second model.

4. The method according to any one of claims 1 to 3, wherein the obtaining of the first correction matrix by means of the first model comprises: performing degradation processing on the first image to obtain a first degradation vector, wherein the first degradation vector is a vector representation of the noise of the first image; and inputting the first degradation vector and target information of the second model into the first model to obtain the first correction matrix, wherein the target information is used to indicate functional modules of the second model and network layers corresponding to the functional modules.

5. The method of claim 4, wherein the second model comprises N functional modules, each functional module comprises at least one network layer, the first correction matrix comprises M correction matrices, each correction matrix corresponds to a network layer in a functional module, N is a positive integer greater than 1, and M is a positive integer greater than 1.

6. The method of claim 5, wherein the target information comprises an identifier of each of the N functional modules and a type of the network layer corresponding to that functional module.

7. The method according to any one of claims 1 to 6, wherein the second model is a diffusion model.

8. The method according to any one of claims 1 to 7, wherein the method further comprises: acquiring a third image and a fourth image, wherein the fourth image is a high-resolution version of the third image; obtaining a second correction matrix by means of the first model, wherein the second correction matrix is used to indicate noise of the third image; and training the second model by using a discriminator based on the second correction matrix and the fourth image.

9. A data processing apparatus, comprising: an acquisition module, configured to acquire a first image; a processing module, configured to obtain a first correction matrix by means of a first model, wherein the first correction matrix is used to indicate noise of the first image; an update module, configured to update configuration information of a second model based on the first correction matrix, wherein the configuration information is used to indicate a denoising strategy used when the second model performs denoising processing; and a denoising module, configured to perform denoising processing on the first image by means of the updated second model to obtain a second image.

10. The apparatus of claim 9, wherein the second model comprises a first sub-network and a second sub-network, the first sub-network is used to indicate the configuration information, the second sub-network is used to perform denoising processing, and a similarity between network parameters of the first sub-network and network parameters of the second sub-network is less than a first threshold; and the update module is specifically configured to: update the first sub-network based on the first correction matrix to obtain an updated first sub-network.

11. The apparatus of claim 10, wherein the update module is further configured to: merge network parameters of the updated first sub-network with the network parameters of the second sub-network to obtain the updated second model.

12. The apparatus of any one of claims 9 to 11, wherein the processing module is specifically configured to: perform degradation processing on the first image to obtain a first degradation vector, wherein the first degradation vector is a vector representation of the noise of the first image; and input the first degradation vector and target information of the second model into the first model to obtain the first correction matrix, wherein the target information is used to indicate functional modules of the second model and network layers corresponding to the functional modules.

13. The apparatus of claim 12, wherein the second model comprises N functional modules, each functional module comprises at least one network layer, the first correction matrix comprises M correction matrices, each correction matrix corresponds to a network layer in a functional module, N is a positive integer greater than 1, and M is a positive integer greater than 1.

14. The apparatus of claim 13, wherein the target information comprises an identifier of each of the N functional modules and a type of the network layer corresponding to that functional module.

15. The apparatus of any one of claims 9 to 14, wherein the second model is a diffusion model.

16. The apparatus of any one of claims 9 to 15, wherein the denoising module is further configured to: acquire a third image and a fourth image, wherein the fourth image is a high-resolution version of the third image; obtain a second correction matrix by means of the first model, wherein the second correction matrix is used to indicate noise of the third image; and train the second model by using a discriminator based on the second correction matrix and the fourth image.

17. A computer storage medium, wherein the computer storage medium stores one or more instructions that, when executed by one or more computers, cause the one or more computers to perform the method according to any one of claims 1 to 8.

18. A computer program product, comprising computer-readable instructions that, when run on a computer device, cause the computer device to perform the method according to any one of claims 1 to 8.

19. A system, comprising at least one processor and at least one memory, wherein the at least one processor and the at least one memory are connected via a communication bus and communicate with each other; the at least one memory is configured to store code; and the at least one processor is configured to execute the code to perform the method according to any one of claims 1 to 8.

20. A chip, comprising at least one processing unit and an interface circuit, wherein the interface circuit is configured to provide program instructions or data to the at least one processing unit, and the at least one processing unit is configured to execute the program instructions to implement the method according to any one of claims 1 to 8.
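
The update-and-merge steps recited in claims 1 to 3 can be sketched for a single network layer as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: it assumes the first sub-network is a pair of low-rank factors (named `up` and `down` here), that the correction matrix is applied between those factors, and that merging is an element-wise addition to the weights of the second sub-network. The function names and all numeric values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product; assumes len(a[0]) == len(b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_correction(down: Matrix, up: Matrix, correction: Matrix) -> Matrix:
    """Assumed update of the first sub-network: delta = up @ correction @ down."""
    return matmul(matmul(up, correction), down)


def merge(base: Matrix, delta: Matrix) -> Matrix:
    """Assumed merge: add the updated first sub-network to the base weights."""
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(base, delta)]


# Toy 2x2 layer with rank-1 factors (values are made up for illustration).
base = [[1.0, 0.0], [0.0, 1.0]]   # weights of the second sub-network
up = [[1.0], [2.0]]               # 2x1 low-rank factor
down = [[0.5, 0.5]]               # 1x2 low-rank factor
correction = [[2.0]]              # 1x1 correction matrix output by the first model

delta = apply_correction(down, up, correction)
merged = merge(base, delta)       # weights of the updated second model
```

Under these assumptions, a different correction matrix for each input image changes only the small low-rank term, while the weights of the second sub-network stay fixed until the merge.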