Image processing method, apparatus and device

By combining a spatial state space model with a Hadamard rotation transform, the method addresses fogging and glare in under-display camera images, achieving high-quality image restoration and reducing inter-frame flicker.

CN122265091A · Pending · Publication Date: 2026-06-23 · VIVO MOBILE COMM HANGZHOU CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: VIVO MOBILE COMM HANGZHOU CO LTD
Filing Date: 2026-03-20
Publication Date: 2026-06-23

Abstract

The application discloses an image processing method, apparatus and device, and belongs to the technical field of electronic equipment. The image processing method comprises the following steps: acquiring a plurality of image frames collected by using an under-display image sensor; extracting features of a first image frame to obtain a first feature map, wherein the first image frame is any one of the plurality of image frames; scanning the first feature map by using a first scanning method to obtain a first feature vector; inputting the first feature vector into a spatial state space model, the spatial state space model outputting a second feature vector, wherein the spatial state space model is used for eliminating fogging and glare in the first image frame; scanning the second feature vector by using a second scanning method to obtain a second feature map, wherein the second scanning method is the reverse scanning method of the first scanning method; and mapping the second feature map to a red-green-blue color space to obtain a second image frame.
Description

Technical Field

[0001] This application belongs to the field of electronic equipment technology, and specifically relates to an image processing method, apparatus, and device.

Background Technology

[0002] With the rapid development of smart terminal screen display technology, full-screen displays have become a core competitive feature of high-end flagship mobile phones. To pursue the ultimate screen-to-body ratio and visual unity, under-display camera (UDC) technology has emerged. UDC typically places the camera beneath the organic light-emitting diode (OLED) screen panel. During imaging, external light must penetrate multiple layers of the screen and the gaps between pixels to reach the image sensor. Due to the physical obstruction of the screen's pixel array, the incident light undergoes strong diffraction and interference effects, resulting in significant fogging and glare in the image.

[0003] To eliminate fogging and glare in images, related technologies employ image restoration methods based on Convolutional Neural Networks (CNNs). However, because convolution operations have an inherently local receptive field, CNNs struggle to capture and remove the global diffraction glare that is common in such images. Large areas of fogging artifacts and glare therefore remain in the restored image, leading to poor image quality.

Summary of the Invention

[0004] The purpose of this application is to provide an image processing method, apparatus, and device that can solve the problem of poor image quality.

[0005] In a first aspect, embodiments of this application provide an image processing method, including:
acquiring multiple image frames captured using an under-display image sensor;
extracting features from a first image frame to obtain a first feature map, wherein the first image frame is any one of the multiple image frames;
scanning the first feature map using a first scanning method to obtain a first feature vector;
inputting the first feature vector into a spatial state space model, the spatial state space model outputting a second feature vector, wherein the spatial state space model is used to eliminate fogging and glare in the first image frame;
scanning the second feature vector using a second scanning method to obtain a second feature map, wherein the second scanning method is the reverse scanning method of the first scanning method; and
mapping the second feature map to the red-green-blue color space to obtain a second image frame.

[0006] In a second aspect, embodiments of this application provide an image processing apparatus, including:
an image acquisition module, used to acquire multiple image frames collected using an under-display image sensor;
a shallow feature extraction module, used to extract features from a first image frame to obtain a first feature map, wherein the first image frame is any one of the multiple image frames;
a spatiotemporal interleaving processing module, used to scan the first feature map using a first scanning method to obtain a first feature vector, input the first feature vector into a spatial state space model that outputs a second feature vector, and scan the second feature vector using a second scanning method to obtain a second feature map, wherein the spatial state space model is used to eliminate fogging and glare in the first image frame, and the second scanning method is the reverse scanning method of the first scanning method; and
an image reconstruction module, used to map the second feature map to the red-green-blue color space to obtain a second image frame.

[0007] In a third aspect, embodiments of this application provide an electronic device, which includes a preprocessing unit and a computing unit.
The preprocessing unit is used to acquire multiple image frames collected by an under-display image sensor, and to extract the features of a first image frame to obtain a first feature map, wherein the first image frame is any one of the multiple image frames.
The computing unit is used to scan the first feature map using a first scanning method to obtain a first feature vector; input the first feature vector into a spatial state space model that outputs a second feature vector, wherein the spatial state space model is used to eliminate fogging and glare in the first image frame; scan the second feature vector using a second scanning method to obtain a second feature map, wherein the second scanning method is the reverse scanning method of the first scanning method; and map the second feature map to a red-green-blue color space to obtain a second image frame.

[0008] In a fourth aspect, embodiments of this application provide an electronic device, which includes a processor and a memory. The memory stores programs or instructions that can run on the processor, and when the programs or instructions are executed by the processor, they implement the steps of the image processing method provided in embodiments of this application.

[0009] In a fifth aspect, embodiments of this application provide a readable storage medium on which a program or instructions are stored, which, when executed by a processor, implement the steps of the image processing method provided in embodiments of this application.

[0010] In a sixth aspect, embodiments of this application provide a chip, the chip including a processor and a communication interface, the communication interface being coupled to the processor, the processor being used to run programs or instructions to implement the steps of the image processing method provided in embodiments of this application.

[0011] In a seventh aspect, embodiments of this application provide a computer program product, which is stored in a storage medium and executed by at least one processor to implement the steps of the image processing method provided in embodiments of this application.

[0012] In this embodiment, multiple image frames acquired using an under-display image sensor are obtained; features of a first image frame are extracted to obtain a first feature map, wherein the first image frame is any one of the multiple image frames; the first feature map is scanned using a first scanning method to obtain a first feature vector; the first feature vector is input into a spatial state space model, and the spatial state space model outputs a second feature vector; the second feature vector is scanned using a second scanning method to obtain a second feature map, wherein the spatial state space model is used to eliminate fogging and glare in the first image frame, and the second scanning method is the reverse scanning method of the first scanning method; the second feature map is mapped to a red-green-blue color space to obtain a second image frame. Thus, by utilizing the global receptive field of the spatial state space model, long-distance optical degradation dependencies spanning the entire image can be captured, eliminating fogging and glare in the image and improving image quality.

Attached Figure Description

[0013] Figure 1 is a schematic flowchart of an image processing method provided in some embodiments of this application;
Figure 2 is another schematic flowchart of an image processing method provided in some embodiments of this application;
Figure 3 is a schematic diagram of the spatiotemporal interleaving processing module provided in some embodiments of this application;
Figure 4 is a schematic diagram of the structure of an image processing apparatus provided in some embodiments of this application;
Figure 5 is a schematic diagram of the structure of an electronic device provided in some embodiments of this application;
Figure 6 is another structural schematic diagram of an electronic device provided in some embodiments of this application;
Figure 7 is a schematic diagram of the hardware structure of an electronic device provided in some embodiments of this application.

Detailed Implementation

[0014] The technical solutions of the embodiments of this application will be clearly described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application are within the scope of protection of this application.

[0015] The terms "first," "second," etc., used in the specification and claims of this application are used to distinguish similar objects and not to describe a specific order or sequence. It should be understood that such terms can be used interchangeably where appropriate so that embodiments of this application can be implemented in orders other than those illustrated or described herein, and the objects distinguished by "first," "second," etc., are generally of the same class and the number of objects is not limited; for example, a first object can be one or more. Furthermore, in the specification and claims, "and / or" indicates at least one of the connected objects, and the character " / " generally indicates that the preceding and following objects are in an "or" relationship.

[0016] The terminology used in the implementation section of this application is only for explaining specific embodiments of this application and is not intended to limit this application. The terminology involved in the embodiments of this application is explained below.

[0017] A state-space model (SSM) is a time-series-based model that describes the dynamic relationship between the internal state of a system and its input and output variables.

[0018] The Spatial State Space Model (S-SSM) is a framework that extends state-space modeling methods to spatial or spatiotemporal data analysis, and is used to handle spatially dependent data.

[0019] The Temporal State Space Model (T-SSM) is a statistical modeling framework for analyzing and predicting time series data. It describes the system as a relationship between hidden states and observable variables.

[0020] The Hadamard Rotation Transform is a mathematical transformation method that combines the Hadamard Transform and rotation operations.

[0021] The global receptive field is the information that a unit in a neural network (e.g., a neuron, a feature map, or a token) can “see” or “feel” about its entire input space (e.g., all the pixels of an image, all the tokens of a sequence).

[0022] Hidden state variables are key internal variables in a dynamic system that cannot be directly observed but determine the system's behavior. They are widely used in state-space models, machine learning, and control theory.

[0023] The image processing methods, apparatus, and devices provided in this application will be described in detail below with reference to the accompanying drawings and through specific embodiments and application scenarios.

[0024] It should be noted that the image processing method provided in this application can be executed by electronic devices such as mobile phones, tablets, laptops, PDAs, and in-vehicle electronic devices. Some embodiments of this application use electronic devices as the executing entity to illustrate the image processing method provided in this application.

[0025] The image processing method provided in this application can be applied to scenarios where multiple frames of images are acquired using an under-display camera sensor. One specific application scenario is a video call scenario, and another specific application scenario is a video recording scenario.

[0026] It should be noted that the above application scenarios are only examples, and other application scenarios may be included in actual applications.

[0027] Figure 1 is a schematic flowchart of an image processing method provided in some embodiments of this application. The image processing method may include:

Step 101: Acquire multiple image frames using the under-display image sensor.

In some embodiments of this application, during video calls or recordings, an under-display image sensor is used to acquire images, resulting in multiple image frames acquired by the under-display image sensor.

[0028] In some embodiments of this application, any one of the multiple image frames in the embodiments of this application can be an RGB image or a raw (RAW) image.

[0029] Step 102: Extract the features of the first image frame to obtain the first feature map, wherein the first image frame is any one of the multiple image frames.

In some embodiments of this application, in step 102, the features of the first image frame can be extracted using a shallow feature extraction module.

[0030] In some embodiments of this application, the shallow feature extraction module may include a single M×M convolutional layer, an activation layer, and an output layer, where M is a positive integer, for example, M=3. The activation layer employs a Leaky ReLU activation function, and the shallow feature extraction module maps the input 3-channel RGB image or 1-channel RAW image into a multi-channel shallow feature map. The shallow feature extraction module is used to extract the basic texture and edge information of the image, providing a feature base for image restoration.
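As a rough illustration of the shallow feature extraction described above, the following pure-Python sketch applies a zero-padded 3×3 convolution followed by Leaky ReLU to a 1-channel image. The kernels, the slope value of 0.1, the zero padding, and all function names are assumptions made for illustration; in a trained module the kernels would be learned.

```python
def leaky_relu(v, slope=0.1):
    """Leaky ReLU: pass positives through, scale negatives by a small slope."""
    return v if v >= 0 else slope * v


def conv3x3(image, kernel):
    """Zero-padded 3x3 convolution (cross-correlation) of one 2-D channel."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += image[rr][cc] * kernel[dr + 1][dc + 1]
            out[r][c] = acc
    return out


def shallow_features(image, kernels, slope=0.1):
    """Map a 1-channel image to len(kernels) feature channels."""
    return [
        [[leaky_relu(v, slope) for v in row] for row in conv3x3(image, k)]
        for k in kernels
    ]


# Hypothetical hand-picked kernels; a trained module would learn these.
IDENTITY = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
HORIZONTAL_GRADIENT = [[0, 0, 0], [-1, 0, 1], [0, 0, 0]]

raw = [[1.0, 2.0, 4.0],
       [1.0, 2.0, 4.0],
       [1.0, 2.0, 4.0]]
features = shallow_features(raw, [IDENTITY, HORIZONTAL_GRADIENT])
```

Here `features` holds two channels: a copy of the input and a horizontal-gradient map in which negative responses are attenuated rather than zeroed.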

[0031] In some embodiments of this application, before step 102, the image processing method provided in this application may further include: orthogonally projecting the original features of the first image frame using a Hadamard rotation transformation to reduce the energy of a first pixel and obtain a third image frame, wherein the first pixel is any pixel in the glare region of the first image frame. Correspondingly, step 102 may include: extracting features of the third image frame to obtain the first feature map.

[0032] To address the high dynamic range light spots generated by strong diffraction in image frames acquired using under-display image sensors, Hadamard rotation transformation is used to orthogonally project the original features of the image frames. This evenly distributes the energy of high-amplitude outliers concentrated in local pixels to all channels, preventing numerical overflow and accuracy loss in subsequent image processing.

[0033] In some embodiments of this application, the transformation matrix used for the Hadamard rotation transformation can be a 4th-order Hadamard matrix. A 4th-order Hadamard matrix in the standard (Sylvester) form is:

$$H_4 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$

This transformation matrix can balance the effect of removing outliers with computational overhead.
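To make the energy-spreading effect concrete, the sketch below applies a 4th-order Hadamard matrix to a 4-channel pixel that contains a single large outlier. The Sylvester ordering of the matrix and the 1/2 normalization factor (which makes the transform orthonormal) are assumptions, as are the sample values and function names.

```python
# Sylvester-form 4th-order Hadamard matrix. The 1/2 factor applied below is
# an assumption; it makes the transform orthonormal, so energy is preserved.
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def hadamard_rotate(channels):
    """Apply the normalized 4x4 Hadamard transform to a 4-channel vector."""
    return [0.5 * sum(h * v for h, v in zip(row, channels)) for row in H4]


def energy(vec):
    """Sum of squared channel values."""
    return sum(v * v for v in vec)


# A hypothetical glare pixel: one channel carries a large outlier.
glare_pixel = [1000.0, 0.0, 0.0, 0.0]
rotated = hadamard_rotate(glare_pixel)   # outlier spread over all 4 channels
restored = hadamard_rotate(rotated)      # normalized H4 is its own inverse
```

The peak magnitude drops from 1000 to 500 while the total energy is unchanged, and applying the same transform again recovers the original values, so no information is lost.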

[0034] Step 103: Scan the first feature map using the first scanning method to obtain the first feature vector.

In some embodiments of this application, the first scanning method includes one of the following: multi-directional two-dimensional selective scanning, Hilbert curve scanning, snake scanning, or zigzag scanning.

[0035] In some embodiments of this application, the multiple directions in the embodiments of this application may include: from left to right, from right to left, from top to bottom, and from bottom to top; the multiple directions may also include: from left to right, from right to left, from top to bottom, from bottom to top, from upper left to lower right, from lower right to upper left, from upper right to lower left, and from lower left to upper right, etc.

[0036] The first feature map is serialized into a one-dimensional feature vector by scanning the first feature map using the first scanning method.

[0037] Step 104: Input the first feature vector into the spatial state space model, and the spatial state space model outputs the second feature vector, wherein the spatial state space model is used to eliminate fogging and glare in the first image frame.

In some embodiments of this application, after the first feature vector is input into the spatial state space model, the global receptive field of the spatial state space model is used to capture long-distance optical degradation dependencies across the entire image, eliminate fogging and glare in the image, and obtain the second feature vector.

[0038] Step 105: Scan the second feature vector using the second scanning method to obtain the second feature map, wherein the second scanning method is the reverse scanning method of the first scanning method.

In some embodiments of this application, the second feature vector is reverse-scanned using the reverse scanning method of the first scanning method to restore the second feature vector into a feature map.
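The relationship between a scanning method and its reverse can be sketched with snake scanning, one of the options listed for the first scanning method. The function names and the choice to reverse odd-numbered rows are assumptions; the point is only that the reverse scan restores every element to its original position.

```python
def snake_scan(feature_map):
    """Serialize a 2-D map row by row, reversing every other row."""
    sequence = []
    for r, row in enumerate(feature_map):
        sequence.extend(row if r % 2 == 0 else reversed(row))
    return sequence


def snake_unscan(sequence, height, width):
    """Inverse of snake_scan: rebuild the height x width map."""
    feature_map = []
    for r in range(height):
        row = sequence[r * width:(r + 1) * width]
        feature_map.append(row if r % 2 == 0 else row[::-1])
    return feature_map


fmap = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
vector = snake_scan(fmap)             # [1, 2, 3, 6, 5, 4, 7, 8, 9]
rebuilt = snake_unscan(vector, 3, 3)  # identical to fmap
```

Snake ordering keeps spatially adjacent pixels adjacent in the sequence at row boundaries, which a plain row-major scan does not.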

[0039] In some embodiments of this application, a multi-directional two-dimensional selective scan can be performed on the first feature map to serialize it into a one-dimensional vector. The one-dimensional vector is input into the spatial state space model, whose global receptive field is used to capture long-distance optical degradation dependencies across the entire image and eliminate fogging and glare in the image. The spatial state space model outputs a second feature vector, and the multi-directional outputs are then fused element by element through a reverse scan operation to restore the two-dimensional feature map. The multi-directional two-dimensional selective scan can be a four-directional cross scan, the four directions including from the upper left to the lower right, from the lower right to the upper left, and the two column directions. The spatial state space model is based on the discretized state space equation shown in formula (1):

$$h_k = \bar{A}\,h_{k-1} + \bar{B}\,x_k, \qquad y_k = C\,h_k \tag{1}$$

In formula (1), $h_k$ is the hidden state variable for the spatial state of image k; $h_{k-1}$ is the hidden state variable for the spatial state of the previous frame of image k; $\bar{A}$, $\bar{B}$ and $C$ are the spatial state transition parameters, which can be dynamically generated through an input-dependent selection mechanism; $x_k$ is the first feature vector obtained by serializing the feature map of image k, that is, the first feature vector in this embodiment of the application; and $y_k$ is the second feature vector.
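The four-directional cross scan and element-wise fusion can be sketched as follows. This is a toy version under several assumptions: a scalar linear recurrence of the common form h_k = a·h_{k-1} + b·x_k, y_k = c·h_k with fixed (not input-dependent) parameters, and fusion by summation. All names and values are hypothetical.

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Run h_k = a*h_{k-1} + b*x_k, y_k = c*h_k along one sequence."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def transpose(m):
    return [list(col) for col in zip(*m)]


def flatten(m):
    return [v for row in m for v in row]


def unflatten(seq, height, width):
    return [seq[r * width:(r + 1) * width] for r in range(height)]


def cross_scan_ssm(fmap):
    """Scan in four directions, run the SSM on each, reverse-scan and sum."""
    h, w = len(fmap), len(fmap[0])
    rows = flatten(fmap)              # row-major, upper left to lower right
    cols = flatten(transpose(fmap))   # column-major, top to bottom

    out_rows = unflatten(ssm_scan(rows), h, w)
    out_rows_rev = unflatten(ssm_scan(rows[::-1])[::-1], h, w)
    out_cols = transpose(unflatten(ssm_scan(cols), w, h))
    out_cols_rev = transpose(unflatten(ssm_scan(cols[::-1])[::-1], w, h))

    return [
        [out_rows[r][c] + out_rows_rev[r][c]
         + out_cols[r][c] + out_cols_rev[r][c]
         for c in range(w)]
        for r in range(h)
    ]


# A single bright pixel in the lower-right corner.
impulse = [[0.0, 0.0],
           [0.0, 8.0]]
fused = cross_scan_ssm(impulse)
```

After fusion every position in `fused` is nonzero, which illustrates how combining opposite scan directions lets each output position depend on the whole image rather than on a local neighborhood.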

[0040] Step 106: Map the second feature map to the red-green-blue color space to obtain the second image frame.

[0041] In some embodiments of this application, in step 106, the image reconstruction module can be used to map the second feature map to the red-green-blue color space to obtain the second image frame.

[0042] In some embodiments of this application, the image reconstruction module includes two N×N convolutional layers, an activation layer, and an output layer. The activation layer is located between the two N×N convolutional layers, where N is a positive integer, for example, N = 3. The activation layer employs the Leaky ReLU activation function. The image reconstruction module maps the second feature map to the red-green-blue color space.

[0043] In some embodiments of this application, a global long skip connection directly connects the shallow feature extraction module and the image reconstruction module. It passes the low-frequency background information of the input image directly to the output, so that the spatial state space model focuses on restoring the high-frequency texture details of the image, and the final output is a clear image with a high signal-to-noise ratio and no diffraction artifacts.

[0044] In some embodiments of this application, after obtaining the second image frame, the second image frame can be stored or displayed on the screen.

[0045] In this embodiment, multiple image frames acquired using an under-display image sensor are obtained; features of a first image frame are extracted to obtain a first feature map, wherein the first image frame is any one of the multiple image frames; the first feature map is scanned using a first scanning method to obtain a first feature vector; the first feature vector is input into a spatial state space model, and the spatial state space model outputs a second feature vector; the second feature vector is scanned using a second scanning method to obtain a second feature map, wherein the spatial state space model is used to eliminate fogging and glare in the first image frame, and the second scanning method is the reverse scanning method of the first scanning method; the second feature map is mapped to a red-green-blue color space to obtain a second image frame. Thus, by utilizing the global receptive field of the spatial state space model, long-distance optical degradation dependencies spanning the entire image can be captured, eliminating fogging and glare in the image and improving image quality.

[0046] In some embodiments of this application, the image processing method provided in this application may further include: obtaining a first hidden state variable corresponding to a fourth image frame, wherein the fourth image frame is the previous image frame of the first image frame; inputting a second feature map and the first hidden state variable into a temporal state space model, wherein the temporal state space model outputs a second hidden state variable corresponding to the first image frame, wherein the second hidden state variable is used to eliminate inter-frame flicker when the fourth image frame changes to the first image frame during the playback of multiple image frames; correspondingly, step 106 may include: mapping the second feature map and the second hidden state variable to a red-green-blue color space to obtain a second image frame.

[0047] In some embodiments of this application, the first feature map may be stored in a tile cache subunit in the cache unit of the electronic device, and the first hidden state variable and the second hidden state variable may be stored in a state cache subunit in the cache unit.

[0048] In some embodiments of this application, an on-chip SRAM cache unit can be configured to store temporary data during image processing to reduce access latency to off-chip video memory. The cache unit may include a tile cache subunit and a state cache subunit. The tile cache subunit stores the currently processed image frame, and the state cache subunit stores the hidden state variables output by the temporal state space model.

[0049] When determining the hidden state variables corresponding to the first image frame, the hidden state variables of the previous frame of the first image frame are obtained from the state cache subunit. The hidden state variables and the second feature vector of the previous frame of the first image frame are input into the temporal state space model. The temporal state space model performs state update calculation and outputs the hidden state variables corresponding to the first image frame. The hidden state variables corresponding to the first image frame are stored in the state cache subunit for use in the next image frame.
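The read, update, and write-back cycle described above might look like the following sketch. The `StateCache` class stands in for the state cache subunit, and the linear update h_t = a·h_{t-1} + b·x_t with fixed a = b = 0.5 is an assumption chosen for easy arithmetic; the per-frame feature values are made up.

```python
class StateCache:
    """Stand-in for the state cache subunit: holds the latest hidden state."""

    def __init__(self):
        self._hidden = None

    def read(self, size):
        # Before the first frame there is no history, so start from zeros.
        return self._hidden if self._hidden is not None else [0.0] * size

    def write(self, hidden):
        self._hidden = list(hidden)


def temporal_update(prev_hidden, features, a=0.5, b=0.5):
    """Assumed linear update h_t = a*h_{t-1} + b*x_t, element by element."""
    return [a * h + b * x for h, x in zip(prev_hidden, features)]


def process_frame(cache, features):
    prev_hidden = cache.read(len(features))          # 1. read previous state
    hidden = temporal_update(prev_hidden, features)  # 2. state update
    cache.write(hidden)                              # 3. store for next frame
    return hidden


cache = StateCache()
# Hypothetical per-frame features whose brightness flickers between 4 and 8.
frames = [[4.0], [8.0], [4.0], [8.0]]
states = [process_frame(cache, f) for f in frames]
```

In this toy run the input jumps by 4 between frames, while consecutive hidden states differ by at most 3, which hints at how carrying state across frames can damp flicker.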

[0050] In some embodiments of this application, when determining the hidden state variable corresponding to the first image frame, the hidden state variable at the time corresponding to the previous image frame can be read from the state cache subunit. The first feature vector of the first image frame is then fed to the temporal state space model, which performs the state update calculation and generates the hidden state variable at the time corresponding to the first image frame. The temporal state space model employs a state space equation similar to that of the spatial state space model, but adds a spatiotemporal reverse scanning strategy: the forward scan follows the frame order t=1, 2, ..., T, while the reverse scan follows the frame order t=T, T-1, ..., 1. Finally, the hidden state variable output by the temporal state space model is written into the state cache subunit for use with the next image frame.
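One way to picture the forward and reverse frame scans is the sketch below, which runs the same recurrence over the frame sequence in both orders and averages the two results. The averaging step, the scalar per-frame features, and the fixed parameters are all assumptions; the source does not state how the two scans are combined. A reverse scan also needs all T frames up front, so this form applies to recorded clips rather than a live stream.

```python
def scan(xs, a=0.5, b=0.5):
    """Recurrence h_t = a*h_{t-1} + b*x_t over a sequence of scalars."""
    h, out = 0.0, []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def bidirectional_scan(xs):
    """Combine a forward scan (t = 1..T) with a reverse scan (t = T..1)."""
    forward = scan(xs)
    backward = scan(xs[::-1])[::-1]   # re-align reverse results to frame order
    return [(f + r) / 2 for f, r in zip(forward, backward)]


# Hypothetical per-frame feature values for a 4-frame clip.
clip = [8.0, 0.0, 0.0, 8.0]
forward_only = scan(clip)
combined = bidirectional_scan(clip)
```

The forward-only result is asymmetric because early frames have no history, whereas the combined result treats both ends of the clip alike.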

[0051] The temporal state space model is based on the discretized state space equation shown in formula (2):

$$h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t \tag{2}$$

In formula (2), $h_t$ is the hidden state variable at time t corresponding to image k; $h_{t-1}$ is the hidden state variable at time t-1 corresponding to the previous frame of image k; $\bar{A}$ and $\bar{B}$ are the temporal state transition parameters, which can be dynamically generated through an input-dependent selection mechanism; and $x_t$ is the first feature vector corresponding to image k.

[0052] Figure 2 is another schematic flowchart of an image processing method provided in some embodiments of this application. The image processing method may include:
Step 201: Acquire multiple image frames using the under-display image sensor;
Step 202: Extract the features of the first image frame to obtain the first feature map;
Step 203: Scan the first feature map using the first scanning method to obtain the first feature vector;
Step 204: Input the first feature vector into the spatial state space model, and the spatial state space model outputs the second feature vector;
Step 205: Scan the second feature vector using the second scanning method to obtain the second feature map;
Step 206: Obtain the first hidden state variable corresponding to the fourth image frame;
Step 207: Input the first feature vector and the first hidden state variable into the temporal state space model, and the temporal state space model outputs the second hidden state variable corresponding to the first image frame;
Step 208: Map the second feature map and the second hidden state variable to the red-green-blue color space to obtain the second image frame.

[0053] For the specific implementation of each step of the image processing method shown in Figure 2, refer to the description in the above embodiments; the details are not repeated here.

[0054] In the embodiments of this application, an explicit memory spanning frames can be established through the temporal state space model, which suppresses inter-frame flicker.

[0055] In some embodiments of this application, the spatial state space model outputting a second feature vector may include: the spatial state space model outputting the second feature vector using a shift quantization method. The temporal state space model outputting a second hidden state variable corresponding to the first image frame may include: the temporal state space model outputting the second hidden state variable using a shift quantization method.

[0056] In some embodiments of this application, the state transition parameters are constrained to powers of 2 by shift quantization. This turns the power-hungry floating-point multiplications inside the state space model into bit shift operations, thereby reducing processor power consumption and improving data processing efficiency.
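A minimal sketch of the shift quantization idea follows: a positive parameter is rounded to the nearest power of two in the log domain, after which multiplying an integer by it reduces to a bit shift. The rounding rule, the restriction to positive parameters, and the function names are assumptions for illustration.

```python
import math


def quantize_to_shift(param):
    """Return exponent e so that 2**e is the power of two closest to a
    positive parameter, measured on a log scale."""
    if param <= 0:
        raise ValueError("this sketch only handles positive parameters")
    return round(math.log2(param))


def shift_multiply(value, exponent):
    """Multiply an integer by 2**exponent using only bit shifts."""
    return value << exponent if exponent >= 0 else value >> -exponent


state = 96                               # integer hidden-state value
exp_a = quantize_to_shift(0.23)          # 0.23 is quantized to 2**-2 = 0.25
decayed = shift_multiply(state, exp_a)   # 96 >> 2
```

With the parameter quantized to 0.25, the product 96 × 0.25 is computed as a right shift by two bits, with no floating-point multiply involved.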

[0057] In some embodiments of this application, a spatiotemporal interleaving processing module can be used to process spatial stream data and temporal stream data, which are executed in parallel or alternately. The spatial stream addresses global diffraction artifacts within a single image frame, while the temporal stream addresses the temporal consistency of a video sequence. The spatial stream data includes the aforementioned feature vectors and feature maps, and the temporal stream data includes the aforementioned hidden state variables. The spatiotemporal interleaving processing module can include an encoder and a decoder. The encoder includes two downsampling operations, each stage containing a basic processing unit. The number of feature channels increases exponentially with each downsampling stage to compress spatial dimensions and extract semantic features. The decoder includes two upsampling operations, each stage fusing features of the same scale from the encoder through lateral skip connections to recover lost spatial details. The basic processing unit contains parallel spatial and temporal branches. The spatial branch employs multi-directional scanning and a spatial state space model to ensure comprehensive capture of the radial diffraction texture of the under-display camera. The temporal branch employs forward recursive scanning and a temporal state space model, strictly adhering to the temporal causality of the video.

[0058] In some embodiments of this application, the spatiotemporal interleaving processing module may include an encoder and a decoder; the encoder and decoder include a spatial state space model and a temporal state space model; the encoder is used to downsample the image features input to the encoder; the decoder is used to upsample the image features output by the encoder.

[0059] In some embodiments of this application, the encoder may include a first basic processing unit and a second basic processing unit. The first basic processing unit is used to downsample the image features input to the first basic processing unit, and the second basic processing unit is used to downsample the image features output by the first basic processing unit. The decoder may include a third basic processing unit, a fourth basic processing unit, and a fifth basic processing unit. The third basic processing unit is used to upsample the image features output by the second basic processing unit, the fourth basic processing unit is used to upsample the image features output by the third basic processing unit, and the fifth basic processing unit is used to upsample the image features output by the fourth basic processing unit. The basic processing unit includes a spatial state space model and a temporal state space model.

[0060] In some embodiments of this application, the fourth basic processing unit is connected to the second basic processing unit, and the fifth basic processing unit is connected to the first basic processing unit.

[0061] Figure 3 is a schematic diagram of the spatiotemporal interleaving processing module provided in some embodiments of this application. The spatiotemporal interleaving processing module 300 includes an encoder 301 and a decoder 302. The encoder 301 includes an X-channel basic processing unit 3011 and a 2X-channel basic processing unit 3012. The decoder 302 includes a 4X-channel basic processing unit 3021, a 2X-channel basic processing unit 3022, and an X-channel basic processing unit 3023, where X is a positive integer, for example, X is 16. The 2X-channel basic processing unit 3012 of the encoder 301 downsamples the data output by the X-channel basic processing unit 3011. The 4X-channel basic processing unit 3021 of the decoder 302 downsamples the data output by the 2X-channel basic processing unit 3012 of the encoder 301. The 2X-channel basic processing unit 3022 of the decoder 302 upsamples the data output by the 4X-channel basic processing unit 3021. The X-channel basic processing unit 3023 of the decoder 302 upsamples the data output by the 2X-channel basic processing unit 3022. The X-channel basic processing unit 3011 of the encoder 301 and the X-channel basic processing unit 3023 of the decoder 302 are connected by a skip connection, as are the 2X-channel basic processing unit 3012 of the encoder 301 and the 2X-channel basic processing unit 3022 of the decoder 302.

[0062] Wherein, the X-channel basic processing unit 3011 is the first basic processing unit in this application embodiment; the 2X-channel basic processing unit 3012 is the second basic processing unit in this application embodiment; the 4X-channel basic processing unit 3021 is the third basic processing unit in this application embodiment; the 2X-channel basic processing unit 3022 is the fourth basic processing unit in this application embodiment; and the X-channel basic processing unit 3023 is the fifth basic processing unit in this application embodiment.
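The channel layout of Figure 3 can be written down as a small table and checked for consistency. The following is a minimal sketch only: the names `Unit`, `build_units`, `SKIPS`, and `skips_match` are hypothetical, and the sketch records only the channel widths and skip connections stated above, not the internal structure of a basic processing unit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    ref: str       # reference numeral used in Figure 3
    role: str      # "encoder" or "decoder"
    channels: int  # channel width of the unit


def build_units(x: int = 16) -> list[Unit]:
    """List the five basic processing units for a base channel count x."""
    return [
        Unit("3011", "encoder", x),
        Unit("3012", "encoder", 2 * x),
        Unit("3021", "decoder", 4 * x),
        Unit("3022", "decoder", 2 * x),
        Unit("3023", "decoder", x),
    ]


# Skip connections: first unit to fifth unit, second unit to fourth unit.
SKIPS = [("3011", "3023"), ("3012", "3022")]


def skips_match(units: list[Unit]) -> bool:
    """Check that each skip connection joins units of equal channel width."""
    by_ref = {u.ref: u for u in units}
    return all(by_ref[a].channels == by_ref[b].channels for a, b in SKIPS)
```

With X = 16 the channel widths are 16, 32, 64, 32, and 16, and both skip connections join units of equal width, which is what allows the encoder features to be merged into the decoder without a channel projection.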

[0063] In some embodiments of this application, the core matrix multiplication in the state space model uses Y-bit integer shift operations to maximize processor throughput, while the hidden state variables and their update process are kept in 2Y-bit half-precision floating point or 2Y-bit integer format to prevent vanishing gradients or accumulated precision errors during recursion, where Y is a positive integer, for example, 8.
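One possible reading of the mixed-precision scheme can be sketched as follows. This is an illustration under stated assumptions, not the quantization scheme of this application: it assumes each weight is rounded to a signed power of two so that a multiplication becomes a bit shift, that inputs are Y-bit integers, and that the hidden state is a saturating 2Y-bit integer. The names `quantize_to_shift`, `shift_mul`, `saturate`, and `update_state` are hypothetical.

```python
import math


def quantize_to_shift(w: float) -> tuple[int, int]:
    """Approximate w by sign * 2**(-shift); magnitudes above 1 clamp to shift 0."""
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    shift = max(0, round(-math.log2(abs(w))))
    return sign, shift


def saturate(v: int, bits: int) -> int:
    """Clamp v to the signed integer range of the given bit width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, v))


def shift_mul(x: int, sign: int, shift: int) -> int:
    """Multiply x by sign * 2**(-shift) using an arithmetic right shift."""
    return sign * (x >> shift)


def update_state(h: int, x: int, a: tuple[int, int], b: tuple[int, int],
                 y_bits: int = 8) -> int:
    """One recurrence step: Y-bit input, hidden state kept at 2Y bits."""
    x = saturate(x, y_bits)
    return saturate(shift_mul(h, *a) + shift_mul(x, *b), 2 * y_bits)
```

Keeping the state at twice the input width gives the recurrence headroom: repeated accumulation that would overflow or lose low-order bits in 8 bits still fits in 16 bits.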

[0064] The image processing method provided in this application can be executed by an image processing device. This application uses an image processing device executing the image processing method as an example to illustrate the image processing device provided in this application.

[0065] Figure 4 is a schematic diagram of the structure of an image processing apparatus provided in some embodiments of this application. The image processing apparatus 400 includes: an image acquisition module 401, used to acquire multiple image frames collected by the under-display image sensor; a shallow feature extraction module 402, used to extract features from the first image frame to obtain a first feature map, wherein the first image frame is any one of the multiple image frames; a spatiotemporal interleaving processing module 403, used to scan the first feature map using a first scanning method to obtain a first feature vector, input the first feature vector into the spatial state space model, which outputs a second feature vector, and scan the second feature vector using a second scanning method to obtain a second feature map, wherein the spatial state space model is used to eliminate fog and light spots in the first image frame, and the second scanning method is the reverse scanning method of the first scanning method; and an image reconstruction module 404, used to map the second feature map to the red-green-blue color space to obtain the second image frame.

[0066] In this embodiment, multiple image frames acquired using an under-display image sensor are obtained; features of the first image frame are extracted to obtain a first feature map, wherein the first image frame is any one of the multiple image frames; the first feature map is scanned using a first scanning method to obtain a first feature vector; the first feature vector is input into a spatial state space model, and the spatial state space model outputs a second feature vector; the second feature vector is scanned using a second scanning method to obtain a second feature map, wherein the spatial state space model is used to eliminate fogging and glare in the first image frame, and the second scanning method is the reverse scanning method of the first scanning method; and the second feature map is mapped to a red-green-blue color space to obtain a second image frame. Thus, by utilizing the global receptive field of the spatial state space model, long-range optical degradation dependencies spanning the entire screen can be captured, eliminating fogging and glare in the image and improving image quality.
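The data flow of scan, state space model, and reverse scan can be illustrated with a toy example. The sketch below assumes snake scanning as the first scanning method and replaces the learned spatial state space model with a fixed scalar recurrence; it does not remove fog or glare and only shows how one pixel can influence every later position in the scanned sequence. The names `snake_scan`, `inverse_snake_scan`, `linear_ssm`, and `process`, and the coefficients `a`, `b`, and `c`, are hypothetical.

```python
def snake_scan(grid):
    """Flatten a 2D grid row by row, reversing every other row."""
    seq = []
    for r, row in enumerate(grid):
        seq.extend(row if r % 2 == 0 else reversed(row))
    return seq


def inverse_snake_scan(seq, height, width):
    """Rebuild the 2D grid from a snake-scanned sequence."""
    grid = []
    for r in range(height):
        row = list(seq[r * width:(r + 1) * width])
        grid.append(row if r % 2 == 0 else row[::-1])
    return grid


def linear_ssm(seq, a=0.5, b=1.0, c=1.0):
    """Toy recurrence: h_k = a * h_{k-1} + b * x_k, y_k = c * h_k."""
    h = 0.0
    out = []
    for x in seq:
        h = a * h + b * x
        out.append(c * h)
    return out


def process(grid):
    """Scan the feature map, run the recurrence, and scan back to 2D."""
    height, width = len(grid), len(grid[0])
    seq = snake_scan(grid)
    return inverse_snake_scan(linear_ssm(seq), height, width)
```

For the input `[[1, 0], [0, 0]]` the sketch yields `[[1.0, 0.5], [0.125, 0.25]]`: the single nonzero pixel contributes to every position that follows it along the scan path, which is the sense in which the recurrence has a sequence-wide receptive field.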

[0067] In some embodiments of this application, the image processing apparatus 400 provided in this application further includes a preprocessing module, used to orthogonally project the original features of the first image frame using the Hadamard rotation transformation to reduce the energy of the first pixel and obtain the third image frame, wherein the first pixel is any pixel in the spot region of the first image frame. Accordingly, the shallow feature extraction module 402 is specifically used to extract features from the third image frame to obtain the first feature map.
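The effect of an orthogonal Hadamard rotation on a concentrated bright value can be illustrated on a short vector. The sketch below is an assumption-laden illustration: it applies an orthonormal fast Walsh-Hadamard transform to a one-dimensional list whose length is a power of two, whereas the block size and the features to which the transformation is applied in this application are not specified here. The name `hadamard_rotate` is hypothetical.

```python
import math


def hadamard_rotate(v):
    """Orthonormal fast Walsh-Hadamard transform of a power-of-two-length list."""
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(x) for x in v]
    h = 1
    while h < n:
        # Butterfly stage: combine elements that are h positions apart.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # scaling makes the transform orthonormal
    return [x * scale for x in out]
```

For a single spike such as `[8, 0, 0, 0]` the sketch returns `[4.0, 4.0, 4.0, 4.0]`: the peak value halves while the sum of squares stays at 64, and applying the same function again restores the original vector because the orthonormal Hadamard matrix is its own inverse.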

[0068] In some embodiments of this application, the spatiotemporal interleaving processing module 403 is further configured to: obtain the first hidden state variable corresponding to the fourth image frame, where the fourth image frame is the image frame preceding the first image frame; and input the first feature vector and the first hidden state variable into the temporal state space model, which outputs the second hidden state variable corresponding to the first image frame. The second hidden state variable is used to eliminate inter-frame flicker when the fourth image frame changes to the first image frame during playback of the multiple image frames. Accordingly, the image reconstruction module 404 is specifically used to map the second feature map and the second hidden state variable to the red-green-blue color space to obtain the second image frame.
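How a hidden state carried from the previous frame can damp inter-frame flicker can be sketched with a fixed per-element recurrence. This is a simplified stand-in for the learned temporal state space model: the names `temporal_step` and `run_frames` and the `decay` coefficient are hypothetical, and the previous state, the current features, and the returned state play the roles of the first hidden state variable, the first feature vector, and the second hidden state variable respectively.

```python
def temporal_step(prev_state, features, decay=0.5):
    """Blend the previous frame's hidden state with the current features."""
    if prev_state is None:
        # First frame: no earlier state exists, so start from the features.
        return [float(x) for x in features]
    return [decay * h + (1.0 - decay) * x
            for h, x in zip(prev_state, features)]


def run_frames(frames, decay=0.5):
    """Process frames in order, carrying the hidden state between them."""
    state = None
    states = []
    for features in frames:
        state = temporal_step(state, features, decay)
        states.append(state)
    return states
```

For a flickering one-element sequence `[[10], [14], [10], [14]]` the states are `[[10.0], [12.0], [11.0], [12.5]]`, so the frame-to-frame jump falls from 4 in the input to at most 2 in the carried state.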

[0069] In some embodiments of this application, the spatiotemporal interleaving processing module 403 is specifically used as follows: the spatial state space model outputs the second feature vector using a shift quantization method; and the temporal state space model outputs the second hidden state variable using a shift quantization method.

[0070] In some embodiments of this application, the spatiotemporal interleaving processing module 403 may include an encoder and a decoder; the encoder and the decoder each include a spatial state space model and a temporal state space model; the encoder is used to downsample the image features input to the encoder; and the decoder is used to upsample the image features output by the encoder.

[0071] In some embodiments of this application, the encoder may include a first basic processing unit and a second basic processing unit. The first basic processing unit is used to downsample the image features input to the first basic processing unit, and the second basic processing unit is used to downsample the image features output by the first basic processing unit. The decoder may include a third basic processing unit, a fourth basic processing unit, and a fifth basic processing unit. The third basic processing unit is used to upsample the image features output by the second basic processing unit, the fourth basic processing unit is used to upsample the image features output by the third basic processing unit, and the fifth basic processing unit is used to upsample the image features output by the fourth basic processing unit. Each basic processing unit includes a spatial state space model and a temporal state space model.

[0072] In some embodiments of this application, the fourth basic processing unit is connected to the second basic processing unit, and the fifth basic processing unit is connected to the first basic processing unit.

[0073] In some embodiments of this application, the first scanning method includes one of the following: multi-directional two-dimensional selective scanning, Hilbert curve scanning, snake scanning, or zigzag scanning.
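Of the listed scanning methods, Hilbert curve scanning has the property that consecutive elements of the scanned sequence are always neighboring pixels. The sketch below uses the widely known index-to-coordinate conversion for a Hilbert curve and assumes a square feature map whose side length is a power of two; the names `hilbert_d2xy`, `hilbert_scan`, and `inverse_hilbert_scan` are hypothetical and are not taken from this application.

```python
def hilbert_d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before rotating it.
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_scan(grid):
    """Flatten a square grid (side a power of two) along the Hilbert curve."""
    n = len(grid)
    coords = (hilbert_d2xy(n, d) for d in range(n * n))
    return [grid[y][x] for x, y in coords]


def inverse_hilbert_scan(seq, n):
    """Reverse scan: place each sequence element back at its grid position."""
    grid = [[None] * n for _ in range(n)]
    for d, value in enumerate(seq):
        x, y = hilbert_d2xy(n, d)
        grid[y][x] = value
    return grid
```

Because the reverse scan reuses the same index-to-coordinate mapping, applying `inverse_hilbert_scan` to the output of `hilbert_scan` returns the original grid, which mirrors the relationship between the first and second scanning methods.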

[0074] The image processing device in this application embodiment can be an electronic device or a component within an electronic device, such as an integrated circuit or a chip. The electronic device can be a terminal or a device other than a terminal. For example, the electronic device can be a mobile phone, tablet computer, laptop computer, palmtop computer, in-vehicle electronic device, mobile internet device (MID), augmented reality (AR) / virtual reality (VR) device, robot, wearable device, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA), etc. It can also be a server, network attached storage (NAS), personal computer (PC), television (TV), ATM, or self-service machine, etc. This application embodiment does not specifically limit the device.

[0075] The image processing device in this application embodiment can be a device with an operating system. This operating system can be Android, iOS, or other possible operating systems; this application embodiment does not specifically limit the specific operating system used.

[0076] The image processing apparatus provided in this application embodiment can implement the various processes implemented in the image processing method embodiments of Figures 1 to 3, which will not be described again here to avoid repetition.

[0077] Optionally, as shown in Figure 5, this application embodiment also provides an electronic device 500, including a preprocessing unit 501 and a computing unit 502. The preprocessing unit 501 is used to acquire multiple image frames collected by the under-display image sensor, and to extract features of the first image frame to obtain a first feature map, wherein the first image frame is any one of the multiple image frames. The computing unit 502 is used to scan the first feature map using a first scanning method to obtain a first feature vector; input the first feature vector into a spatial state space model, and the spatial state space model outputs a second feature vector, wherein the spatial state space model is used to eliminate fog and glare in the first image frame; scan the second feature vector using a second scanning method to obtain a second feature map, wherein the second scanning method is the reverse scanning method of the first scanning method; and map the second feature map to a red-green-blue color space to obtain a second image frame.

[0078] In some embodiments of this application, the preprocessing unit 501 is further configured to: The energy of the first pixel is reduced by orthogonally projecting the original features of the first image frame using the Hadamard rotation transformation, resulting in the third image frame. Here, the first pixel is any pixel in the spot region of the first image frame. The features of the third image frame are extracted to obtain the first feature map.

[0079] In some embodiments of this application, the computing unit 502 is further configured to: obtain the first hidden state variable corresponding to the fourth image frame, where the fourth image frame is the image frame preceding the first image frame; and input the first feature vector and the first hidden state variable into the temporal state space model, which outputs the second hidden state variable corresponding to the first image frame. The second hidden state variable is used to eliminate inter-frame flicker when the fourth image frame changes to the first image frame during playback of the multiple image frames. The second feature map is mapped to the red-green-blue color space to obtain the second image frame.

[0080] In some embodiments of this application, the computing unit 502 includes a shift arithmetic logic subunit 5021. The shift arithmetic logic subunit 5021 is used to output the second feature vector in the spatial state space model using shift quantization, and to output the second hidden state variable in the temporal state space model using shift quantization.

[0081] In some embodiments of this application, the electronic device 500 further includes a cache unit 503; the cache unit 503 includes a tile cache subunit 5031 and a state cache subunit 5032; the tile cache subunit 5031 is used to store the first feature map; and the state cache subunit 5032 is used to store the hidden state variables.

[0082] Optionally, as shown in Figure 6, this application embodiment also provides an electronic device 600, including a processor 601 and a memory 602. The memory 602 stores a program or instructions that can run on the processor 601. When the program or instructions are executed by the processor 601, they implement the various steps of the image processing method embodiment provided in this application embodiment and can achieve the same technical effect. To avoid repetition, they will not be described again here.

[0083] Figure 7 is a schematic diagram of the hardware structure of an electronic device according to some embodiments of this application.

[0084] The electronic device 700 includes, but is not limited to, components such as: radio frequency unit 701, network module 702, audio output unit 703, input unit 704, sensor 705, display unit 706, user input unit 707, interface unit 708, memory 709, and processor 710.

[0085] Those skilled in the art will understand that the electronic device 700 may also include a power supply (such as a battery) for supplying power to various components. The power supply may be logically connected to the processor 710 through a power management system, thereby enabling functions such as managing charging, discharging, and power consumption through the power management system. The electronic device structure shown in Figure 7 does not constitute a limitation on the electronic device. The electronic device may include more or fewer components than shown, or combine certain components, or have different component arrangements, which will not be elaborated here.

[0086] The processor 710 is configured to: acquire multiple image frames collected by an under-display image sensor; extract features from a first image frame to obtain a first feature map; wherein the first image frame is any one of the multiple image frames; scan the first feature map using a first scanning method to obtain a first feature vector; input the first feature vector into a spatial state space model, and the spatial state space model outputs a second feature vector; scan the second feature vector using a second scanning method to obtain a second feature map; wherein the spatial state space model is used to eliminate fogging and light spots in the first image frame, and the second scanning method is the inverse scanning method of the first scanning method; and map the second feature map to a red-green-blue color space to obtain a second image frame.

[0087] In this embodiment, multiple image frames acquired using an under-display image sensor are obtained; features of a first image frame are extracted to obtain a first feature map, wherein the first image frame is any one of the multiple image frames; the first feature map is scanned using a first scanning method to obtain a first feature vector; the first feature vector is input into a spatial state space model, and the spatial state space model outputs a second feature vector; the second feature vector is scanned using a second scanning method to obtain a second feature map, wherein the spatial state space model is used to eliminate fogging and glare in the first image frame, and the second scanning method is the inverse scanning method of the first scanning method; and the second feature map is mapped to a red-green-blue color space to obtain a second image frame. Thus, by utilizing the global receptive field of the spatial state space model, long-range optical degradation dependencies spanning the entire screen can be captured, eliminating fogging and glare in the image and improving image quality.

[0088] In some embodiments of this application, the processor 710 is also used to: reduce the energy of the first pixel by orthogonally projecting the original features of the first image frame using the Hadamard rotation transformation, resulting in the third image frame, wherein the first pixel is any pixel in the spot region of the first image frame; and extract the features of the third image frame to obtain the first feature map.

[0089] In some embodiments of this application, the processor 710 is also used to: obtain the first hidden state variable corresponding to the fourth image frame, where the fourth image frame is the image frame preceding the first image frame; and input the first feature vector and the first hidden state variable into the temporal state space model, which outputs the second hidden state variable corresponding to the first image frame. The second hidden state variable is used to eliminate inter-frame flicker when the fourth image frame changes to the first image frame during playback of the multiple image frames. The second feature map is mapped to the red-green-blue color space to obtain the second image frame.

[0090] In some embodiments of this application, the processor 710 is specifically used as follows: the spatial state space model outputs the second feature vector using a shift quantization method; and the temporal state space model outputs the second hidden state variable using a shift quantization method.

[0091] In some embodiments of this application, the first scanning method includes one of the following: multi-directional two-dimensional selective scanning, Hilbert curve scanning, snake scanning, or zigzag scanning.

[0092] It should be understood that, in this embodiment, the input unit 704 may include a graphics processing unit (GPU) 7041 and a microphone 7042. The GPU 7041 processes image data of still images or videos obtained by an image capture device (such as a camera) in video capture mode or image capture mode. The display unit 706 may include a display panel 7061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 707 includes at least one of a touch panel 7071 and other input devices 7072. The touch panel 7071 is also called a touch screen. The touch panel 7071 may include a touch detection device and a touch controller. Other input devices 7072 may include, but are not limited to, a physical keyboard, function keys (such as volume control buttons, power buttons, etc.), a trackball, a mouse, and a joystick, which will not be described in detail here.

[0093] The memory 709 can be used to store software programs and various data. The memory 709 may primarily include a first storage area for storing programs or instructions and a second storage area for storing data. The first storage area may store the operating system and the application programs or instructions required for at least one function (such as sound playback, image playback, etc.). Furthermore, the memory 709 may include volatile memory or non-volatile memory, or both. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), or direct Rambus random access memory (DRRAM). The memory 709 in the embodiments of this application includes, but is not limited to, these and any other suitable types of memory.

[0094] Processor 710 may include one or more processing units; optionally, processor 710 integrates an application processor and a modem processor, wherein the application processor mainly handles operations involving the operating system, user interface, and applications, and the modem processor mainly handles wireless communication signals, such as a baseband processor. It is understood that the aforementioned modem processor may also not be integrated into processor 710.

[0095] This application also provides a readable storage medium storing a program or instructions. When the program or instructions are executed by a processor, they implement the various processes of the image processing method provided in this application and achieve the same technical effect. To avoid repetition, they will not be described again here.

[0096] The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes computer-readable storage media, such as computer read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk.

[0097] This application also provides a chip, which includes a processor and a communication interface. The communication interface is coupled to the processor. The processor is used to run programs or instructions to implement the various processes of the image processing method provided in this application and achieve the same technical effect. To avoid repetition, it will not be described again here.

[0098] It should be understood that the chip mentioned in the embodiments of this application may also be referred to as a system-level chip, system chip, chip system, or system-on-a-chip, etc.

[0099] This application also provides a computer program product, which is stored in a storage medium and executed by at least one processor to implement the various processes of the image processing method embodiment provided in this application, and can achieve the same technical effect. To avoid repetition, it will not be described again here.

[0100] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element. Furthermore, it should be noted that the scope of the methods and apparatuses in the embodiments of this application is not limited to performing functions in the order shown or discussed, but may also include performing functions substantially simultaneously or in the reverse order, depending on the functions involved. For example, the described methods may be performed in a different order than described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.

[0101] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a computer software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal (which may be a mobile phone, computer, server, or network device, etc.) to execute the image processing methods provided in the various embodiments of this application.

[0102] The embodiments of this application have been described above with reference to the accompanying drawings. However, this application is not limited to the specific embodiments described above. The specific embodiments described above are merely illustrative and not restrictive. Those skilled in the art can make many other forms under the guidance of this application without departing from the spirit and scope of the claims, and all of these forms are within the protection scope of this application.

Claims

1. An image processing method, characterized in that the method includes: acquiring multiple image frames captured using an under-display image sensor; extracting features of a first image frame to obtain a first feature map, wherein the first image frame is any one of the multiple image frames; scanning the first feature map using a first scanning method to obtain a first feature vector; inputting the first feature vector into a spatial state space model, the spatial state space model outputting a second feature vector, wherein the spatial state space model is used to eliminate fog and glare in the first image frame; scanning the second feature vector using a second scanning method to obtain a second feature map, wherein the second scanning method is the reverse scanning method of the first scanning method; and mapping the second feature map to a red-green-blue color space to obtain a second image frame.

2. The method according to claim 1, characterized in that, Before extracting features from the first image frame to obtain the first feature map, the method further includes: The energy of the first pixel is reduced by orthogonally projecting the original features of the first image frame using the Hadamard rotation transformation, resulting in a third image frame, wherein the first pixel is any pixel in the spot region of the first image frame. The step of extracting features from the first image frame to obtain the first feature map includes: The features of the third image frame are extracted to obtain the first feature map.

3. The method according to claim 1, characterized in that the method further includes: obtaining a first hidden state variable corresponding to a fourth image frame, wherein the fourth image frame is the image frame preceding the first image frame; and inputting the first feature vector and the first hidden state variable into a temporal state space model, the temporal state space model outputting a second hidden state variable corresponding to the first image frame, wherein the second hidden state variable is used to eliminate inter-frame flicker when the fourth image frame changes to the first image frame during playback of the multiple image frames. The step of mapping the second feature map to the red-green-blue color space to obtain the second image frame includes: mapping the second feature map and the second hidden state variable to the red-green-blue color space to obtain the second image frame.

4. The method according to claim 3, characterized in that the spatial state space model outputting a second feature vector includes: the spatial state space model outputs the second feature vector using a shift quantization method; and the temporal state space model outputting a second hidden state variable corresponding to the first image frame includes: the temporal state space model outputs the second hidden state variable using a shift quantization method.

5. An image processing apparatus, characterized in that, The device includes: The image acquisition module is used to acquire multiple image frames collected using the under-display image sensor; A shallow feature extraction module is used to extract features from a first image frame to obtain a first feature map; wherein, the first image frame is any one of the plurality of image frames; The spatiotemporal interleaving processing module is used to scan the first feature map using a first scanning method to obtain a first feature vector; input the first feature vector into a spatial state space model, and the spatial state space model outputs a second feature vector; scan the second feature vector using a second scanning method to obtain a second feature map; wherein, the spatial state space model is used to eliminate fogging and light spots in the first image frame; the second scanning method is a reverse scanning method of the first scanning method; The image reconstruction module is used to map the second feature map to the red-green-blue color space to obtain the second image frame.

6. An electronic device, characterized in that, The electronic device includes: a preprocessing unit and a computing unit; The preprocessing unit is used to acquire multiple image frames collected by an under-display image sensor; extract features of the first image frame to obtain a first feature map; wherein the first image frame is any one of the multiple image frames; The computing unit is configured to scan the first feature map using a first scanning method to obtain a first feature vector; input the first feature vector into a spatial state space model, wherein the spatial state space model outputs a second feature vector, wherein the spatial state space model is used to eliminate fog and glare in the first image frame; scan the second feature vector using a second scanning method to obtain a second feature map; wherein the second scanning method is a reverse scanning method of the first scanning method; and map the second feature map to a red-green-blue color space to obtain a second image frame.

7. The electronic device according to claim 6, characterized in that, The preprocessing unit is also used for: The original features of the first image frame are orthogonally projected using the Hadamard rotation transformation to reduce the energy of the first pixel, resulting in a third image frame, wherein the first pixel is any pixel in the spot region of the first image frame; features of the third image frame are extracted to obtain the first feature map.

8. The electronic device according to claim 6, characterized in that the computing unit is also used to: obtain a first hidden state variable corresponding to a fourth image frame, wherein the fourth image frame is the image frame preceding the first image frame; input the first feature vector and the first hidden state variable into a temporal state space model, the temporal state space model outputting a second hidden state variable corresponding to the first image frame, wherein the second hidden state variable is used to eliminate inter-frame flicker when the fourth image frame changes to the first image frame during playback of the multiple image frames; and map the second feature map and the second hidden state variable to the red-green-blue color space to obtain the second image frame.

9. The electronic device according to claim 8, characterized in that the computing unit includes a shift arithmetic logic subunit; the shift arithmetic logic subunit is used to output the second feature vector using the shift quantization method in the spatial state space model, and to output the second hidden state variable using the shift quantization method in the temporal state space model.

10. The electronic device according to claim 8, characterized in that, The electronic device further includes: a cache unit; the cache unit includes a tile cache subunit and a state cache subunit; The tile cache subunit is used to store the first feature map; The state cache subunit is used to store the hidden state variables.