Image processing method, computing system, device, and readable storage medium
By combining inverse tone mapping neural networks and attention networks, the problems of image dynamic range and color gamut expansion are solved, achieving efficient image conversion and visual effect enhancement, especially the conversion of SDR images to HDR images.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BOE TECHNOLOGY GROUP CO LTD
- Filing Date
- 2022-03-24
- Publication Date
- 2026-06-16
AI Technical Summary
Existing technologies struggle to effectively extend the dynamic range and color gamut of images during image processing, resulting in poor image display quality.
An inverse tone mapping neural network is employed, combining a mapping network and an attention network. By generating correction coefficients, the parameters of the mapping network are corrected, expanding the dynamic range and color gamut of the image. Further image enhancement is then performed using an enhancement processing network.
It improves the visual effect of the image, especially by converting standard dynamic range images into high dynamic range images, which enhances the brightness, contrast and resolution of the image, and improves the details of light and shadow and color performance.

Figure CN117121049B_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of image processing technology, and more specifically, to an image processing method, computing system, device, and readable storage medium.
Background Technology
[0002] Artificial intelligence technology is widely used in image processing, which generally includes tasks such as image retouching and color correction, image enhancement, image denoising, and super-resolution conversion. For example, neural networks can be used to convert standard dynamic range (SDR) images into high dynamic range (HDR) images, to reduce noise, and to perform super-resolution conversion. Compared with the original image, the processed image can better represent the visual information of the real scene.
Summary of the Invention
[0003] Some embodiments of this disclosure provide an image processing method, computing system, device, and readable storage medium for improving image processing effects.
[0004] According to one aspect of this disclosure, an image processing method is provided. The method includes: processing a first image using an inverse tone mapping neural network, wherein the inverse tone mapping neural network is configured to expand the dynamic range and color gamut of the first image to obtain an expanded second image, wherein the inverse tone mapping neural network includes a mapping network for implementing the expansion, and further includes an attention network, wherein the input to both the mapping network and the attention network is the first image, and the attention network processes the image content of the first image to generate correction coefficients, the correction coefficients being used to correct the parameters of the mapping network.
[0005] According to some embodiments of this disclosure, the mapping network includes a first convolutional network, a self-residual network, and a second convolutional network. The mapping network implements the expansion by: processing the first image using the first convolutional network to obtain a first feature map; processing the first feature map using the self-residual network to obtain a second feature map; and processing the second feature map using the second convolutional network to obtain a third feature map, wherein the third feature map serves as the second image, and wherein the correction coefficients are used to correct the parameters of the self-residual network.
[0006] According to some embodiments of this disclosure, the self-residual network includes m self-residual modules connected in sequence, where m is an integer greater than 1. Processing the first feature map using the self-residual network includes: processing the received first feature map using a first processing path and a second processing path of the first self-residual module of the self-residual network, respectively, to obtain a first self-residual feature map; and processing the (i-1)-th self-residual feature map obtained from the (i-1)-th self-residual module using the first processing path and the second processing path of the i-th self-residual module, respectively, to obtain the i-th self-residual feature map, where i is an integer greater than 1 and less than or equal to m. The first processing path includes a self-residual convolutional layer, and the second processing path bypasses the self-residual convolutional layer.
[0007] According to some embodiments of this disclosure, each self-residual feature map has n feature layers, where n is a positive integer. The attention network processes the image content of the first image to obtain a coefficient feature map with n×m feature layers. The coefficient feature map serves as the correction coefficients and is multiplied with the self-residual feature maps to correct the parameters of the self-residual network.
[0008] According to some embodiments of this disclosure, a first convolutional network includes a first convolutional layer and an activation function, and a second convolutional network includes a second convolutional layer.
[0009] According to some embodiments of this disclosure, the first image is an image with a standard dynamic range, and the second image is an image with a high dynamic range.
[0010] According to some embodiments of this disclosure, the method further includes: inputting the second image into an enhancement processing network for processing to obtain an enhanced second image, wherein the enhancement processing network includes a noise reduction network and/or a color mapping network.
[0011] According to some embodiments of this disclosure, the first image is the k-th frame image in a video, and the second image is the expanded k-th frame image, where k is an integer greater than 1. The method further includes: processing the (k-1)-th frame image and the (k+1)-th frame image in the video using the inverse tone mapping neural network to obtain the expanded (k-1)-th and (k+1)-th frame images; and processing the expanded (k-1)-th, k-th, and (k+1)-th frame images using a super-resolution network to obtain a super-resolution k-th frame image, wherein the resolution of the super-resolution k-th frame image is higher than the resolution of the first image.
[0012] According to some embodiments of this disclosure, the method further includes: processing the (k-1)-th, k-th, and (k+1)-th frame images of a video using a super-resolution network to obtain a super-resolution k-th frame image, wherein the super-resolution k-th frame image is used as the first image, the resolution of the first image is higher than the resolution of the k-th frame image, and k is an integer greater than 1.
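The two frame orderings in [0011] and [0012] differ only in whether inverse tone mapping runs before or after super-resolution. The following Python sketch makes that data flow explicit. The function names (`window`, `itm_then_sr`, `sr_then_itm`, `itm_stub`, `sr_stub`) are hypothetical placeholders, and the string stand-ins do not model the actual networks.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def window(frames: Sequence[Frame], k: int) -> Tuple[Frame, Frame, Frame]:
    """Return frames k-1, k, k+1 using the 1-based frame numbering of the text.

    k > 1 comes from the text; k < len(frames) is implied by needing frame k+1.
    """
    if not 1 < k < len(frames):
        raise IndexError("k must satisfy 1 < k < number of frames")
    return frames[k - 2], frames[k - 1], frames[k]


def itm_then_sr(frames, k, itm: Callable, sr: Callable):
    """[0011]: expand frames k-1, k, k+1 with ITM, then super-resolve frame k."""
    prev, cur, nxt = (itm(f) for f in window(frames, k))
    return sr(prev, cur, nxt)


def sr_then_itm(frames, k, itm: Callable, sr: Callable):
    """[0012]: super-resolve frame k from frames k-1, k, k+1, then expand with ITM."""
    return itm(sr(*window(frames, k)))


def itm_stub(frame):
    """Hypothetical stand-in for the inverse tone mapping neural network."""
    return f"HDR({frame})"


def sr_stub(prev, cur, nxt):
    """Hypothetical stand-in for the super-resolution network."""
    return f"SR[{prev},{cur},{nxt}]"


video = ["f1", "f2", "f3", "f4"]
print(itm_then_sr(video, 2, itm_stub, sr_stub))  # SR[HDR(f1),HDR(f2),HDR(f3)]
print(sr_then_itm(video, 2, itm_stub, sr_stub))  # HDR(SR[f1,f2,f3])
```

As written, the first ordering calls the inverse tone mapping network three times per output frame, while the second calls it once on the already super-resolved frame. Caching expanded frames across neighboring windows would be one way to reduce the first cost. That caching is a suggestion, not something the text describes.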
[0013] According to some embodiments of this disclosure, the inverse tone mapping neural network is trained using a content loss function.
[0014] According to another aspect of this disclosure, a computing system for image processing is also provided. The computing system includes: one or more processors; and one or more non-transitory computer-readable media storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations including: processing a first image using an inverse tone mapping neural network, wherein the inverse tone mapping neural network is configured to expand the dynamic range and color gamut of the first image to obtain an expanded second image, wherein the inverse tone mapping neural network includes a mapping network for implementing the expansion, and further includes an attention network, wherein the input to both the mapping network and the attention network is the first image, and the attention network processes the image content of the first image to generate correction coefficients, the correction coefficients being used to correct the parameters of the mapping network.
[0015] According to some embodiments of this disclosure, the mapping network includes a first convolutional network, a self-residual network, and a second convolutional network. The mapping network implements the expansion by: processing the first image using the first convolutional network to obtain a first feature map; processing the first feature map using the self-residual network to obtain a second feature map; and processing the second feature map using the second convolutional network to obtain a third feature map, wherein the third feature map serves as the second image, and wherein the correction coefficients are used to correct the parameters of the self-residual network.
[0016] According to some embodiments of this disclosure, the self-residual network includes m self-residual modules connected in sequence, where m is an integer greater than 1. Processing the first feature map using the self-residual network includes: processing the received first feature map using a first processing path and a second processing path of the first self-residual module of the self-residual network, respectively, to obtain a first self-residual feature map; and processing the (i-1)-th self-residual feature map obtained from the (i-1)-th self-residual module using the first processing path and the second processing path of the i-th self-residual module, respectively, to obtain the i-th self-residual feature map, where i is an integer greater than 1 and less than or equal to m. The first processing path includes a self-residual convolutional layer, and the second processing path bypasses the self-residual convolutional layer.
[0017] According to some embodiments of this disclosure, each self-residual feature map has n feature layers, where n is a positive integer. The attention network processes the image content of the first image to obtain a coefficient feature map with n×m feature layers. The coefficient feature map serves as the correction coefficients and is multiplied with the self-residual feature maps to correct the parameters of the self-residual network.
[0018] According to some embodiments of this disclosure, the operations further include: inputting the second image into an enhancement processing network for processing to obtain an enhanced second image, wherein the enhancement processing network includes a noise reduction network and/or a color mapping network.
[0019] According to some embodiments of this disclosure, the first image is the k-th frame image in a video, and the second image is the expanded k-th frame image, where k is an integer greater than 1. The operations further include: processing the (k-1)-th frame image and the (k+1)-th frame image in the video respectively using the inverse tone mapping neural network to obtain the expanded (k-1)-th and (k+1)-th frame images; and processing the expanded (k-1)-th, k-th, and (k+1)-th frame images using a super-resolution network to obtain a super-resolution k-th frame image, wherein the resolution of the super-resolution k-th frame image is higher than the resolution of the first image.
[0020] According to some embodiments of this disclosure, the operations further include: processing the (k-1)-th, k-th, and (k+1)-th frame images of the video using a super-resolution network to obtain a super-resolution k-th frame image, the super-resolution k-th frame image being used as the first image, wherein the resolution of the first image is higher than the resolution of the k-th frame image, and k is an integer greater than 1.
[0021] According to some embodiments of this disclosure, the inverse tone mapping neural network is trained using a content loss function, and the first image is an image with a standard dynamic range, and the second image is an image with a high dynamic range.
[0022] According to another aspect of this disclosure, an image processing apparatus is also provided, comprising: a processor; and a memory, wherein the memory stores computer-readable code that, when executed by the processor, performs the image processing method as described above.
[0023] According to another aspect of this disclosure, a computer-readable storage medium is also provided, on which instructions are stored, which, when executed by a processor, cause the processor to perform the image processing method as described above.
[0024] With the image processing method, computing system, device, and readable storage medium according to some embodiments of this disclosure, an inverse tone mapping neural network can be used to process an input first image, expanding the dynamic range and color gamut of the first image to obtain an expanded second image. The inverse tone mapping neural network includes a mapping network for implementing the expansion and an attention network. The input to both the mapping network and the attention network is the first image. The attention network processes the image content of the first image to generate correction coefficients, which are used to correct the parameters of the mapping network. Because the attention network extracts content features from the input first image, the resulting correction coefficients are closely related to the image content. Using these coefficients to adjust the parameters of the mapping network improves the mapping network's ability to expand the color gamut and dynamic range of the image and enhances the visual effect of the converted image.
Attached Figure Description
[0025] To more clearly illustrate the technical solutions in the embodiments of this disclosure or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0026] Figure 1 shows a schematic flowchart of an image processing method according to an embodiment of the present disclosure;
[0027] Figure 2 shows a schematic structural diagram of an inverse tone mapping neural network according to an embodiment of the present disclosure;
[0028] Figure 3 shows another schematic structural diagram of an inverse tone mapping neural network according to an embodiment of the present disclosure;
[0029] Figure 4A shows a network structure diagram of a mapping network according to an embodiment of the present disclosure;
[0030] Figure 4B shows a network structure diagram of a self-residual module according to an embodiment of the present disclosure;
[0031] Figure 5 shows a network structure diagram of an attention network according to an embodiment of the present disclosure;
[0032] Figure 6 shows another schematic flowchart of an image processing method according to an embodiment of the present disclosure;
[0033] Figure 7A shows an application flowchart of the image processing method according to an embodiment of the present disclosure;
[0034] Figure 7B shows a flowchart of another application of the image processing method according to an embodiment of the present disclosure;
[0035] Figure 8A shows a network structure diagram of a noise reduction network according to an embodiment of the present disclosure;
[0036] Figure 8B shows a network structure diagram of the residual network ResNet in a noise reduction network according to an embodiment of the present disclosure;
[0037] Figure 9A shows a schematic diagram of a color mapping network according to an embodiment of the present disclosure;
[0038] Figure 9B shows a schematic diagram of the training process of a color mapping network;
[0039] Figure 10A shows a network structure diagram of a super-resolution network according to an embodiment of the present disclosure;
[0040] Figure 10B shows a network structure diagram of an alignment network in a super-resolution network according to an embodiment of the present disclosure;
[0041] Figure 11 shows a schematic diagram of a computing system according to an embodiment of the present disclosure;
[0042] Figure 12 shows a schematic block diagram of an image processing apparatus according to an embodiment of the present disclosure;
[0043] Figure 13 shows a schematic block diagram of a video processing apparatus according to an embodiment of the present disclosure;
[0044] Figure 14 shows a schematic diagram of the architecture of an exemplary computing device according to an embodiment of the present disclosure;
[0045] Figure 15 shows a schematic diagram of a computer storage medium according to an embodiment of the present disclosure.
Detailed Implementation
[0046] The technical solutions of the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of this disclosure without creative effort are within the scope of protection of this disclosure.
[0047] The terms “first,” “second,” and similar terms used in this disclosure do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Similarly, terms such as “including” or “comprising” mean that the element or object preceding the word covers the element or object listed after the word and its equivalents, without excluding other elements or objects. Terms such as “connected” or “linked” are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect.
[0048] This disclosure uses flowcharts to illustrate the steps of a method according to embodiments of this disclosure. It should be understood that the preceding or following steps are not necessarily performed in exact order. Instead, the steps can be processed in reverse order or simultaneously. Furthermore, other operations can be added to these processes.
[0049] It is understood that the technical terms and nouns used in this disclosure have the meanings known to those skilled in the art.
[0050] Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. In other words, AI is a comprehensive technology within computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, enabling them to possess the functions of perception, reasoning, and decision-making.
[0051] AI is a comprehensive discipline that encompasses a wide range of fields and includes both hardware and software technologies. AI software technologies primarily include computer vision, speech processing, natural language processing, and machine learning/deep learning. For example, a neural network trained on training samples can perform image processing that transforms how an image is displayed.
[0052] This disclosure provides an image processing method based on a neural network. The method uses an inverse tone mapping neural network to process an input image, expanding its dynamic range and color gamut to obtain an expanded image. Specifically, the inverse tone mapping neural network includes a mapping network for implementing the expansion and an attention network. Both the mapping network and the attention network receive the input image as input. The attention network processes the image content to generate correction coefficients, which are used to correct the parameters of the mapping network. In the image processing method of this disclosure, the attention network can extract content features from the input image, so the resulting correction coefficients are closely related to the image content. Using these coefficients to adjust the parameters of the mapping network improves the mapping network's ability to expand the color gamut and dynamic range of the image and enhances the visual effect of the converted image.
[0053] In embodiments according to this disclosure, an inverse tone mapping neural network is used to enhance the dynamic range of an image, for example, converting an image originally corresponding to Standard Dynamic Range (SDR) into an image corresponding to High Dynamic Range (HDR). Compared to SDR images, HDR images use more bits to represent luminance and chrominance, resulting in greater image information and richer light and shadow details.
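As a rough illustration of why more bits can carry more image information, the arithmetic below counts code values per channel. It assumes 8-bit SDR and 10-bit HDR. These are common choices (for example, HDR10 uses 10 bits per channel), but this disclosure does not specify bit depths.

```python
def code_levels(bits: int) -> int:
    """Number of distinct code values per color channel at a given bit depth."""
    return 2 ** bits


for bits in (8, 10):
    levels = code_levels(bits)
    print(f"{bits}-bit: {levels} levels per channel, {levels ** 3:,} RGB combinations")
# 8-bit: 256 levels per channel, 16,777,216 RGB combinations
# 10-bit: 1024 levels per channel, 1,073,741,824 RGB combinations
```

Going from 8 to 10 bits gives four times as many levels per channel. This allows finer luminance and chrominance steps, which is where the richer light and shadow detail can be represented.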
[0054] Figure 1 shows a schematic flowchart of an image processing method according to an embodiment of the present disclosure. First, in step S101, a first image is processed using an inverse tone mapping neural network. The inverse tone mapping neural network is configured to expand the dynamic range and color gamut of the first image to obtain an expanded second image. The inverse tone mapping neural network may include a mapping network used to implement the aforementioned expansion. Furthermore, according to an embodiment of the present disclosure, the inverse tone mapping neural network may also include an attention network. Both the mapping network and the attention network receive the first image as input. The attention network processes the image content of the first image to generate correction coefficients, which are used to correct the parameters of the mapping network.
[0055] In the embodiments of this disclosure, during the image processing using the mapping network, an attention network is also introduced, which is used to extract features from the original input image information and generate correction coefficients. These correction coefficients, which are related to the image content information, are used to correct the parameters of the mapping network, thereby improving the mapping network's ability to expand the dynamic range and color gamut of the image, and thus improving the image display effect after inverse tone mapping.
[0056] According to some embodiments of this disclosure, the first image may be a single photograph or one frame of a video or image sequence, without limitation. As an example, the first image may be an SDR image, and the processed second image may be an HDR image, without limitation.
[0057] According to some embodiments of this disclosure, the mapping network may include a first convolutional network, a self-residual network, and a second convolutional network. The mapping network implements the above-described expansion as follows: processing the first image using the first convolutional network to obtain a first feature map; processing the first feature map using the self-residual network to obtain a second feature map; and processing the second feature map using the second convolutional network to obtain a third feature map, wherein the third feature map serves as the second image. According to some embodiments of this disclosure, the correction coefficients generated by the attention network are used to correct the parameters of the self-residual network.
[0058] Figure 2 shows a schematic structural diagram of an inverse tone mapping neural network according to an embodiment of the present disclosure. As shown in Figure 2, both the first convolutional network of the mapping network and the attention network receive the original first image as input. The first convolutional network processes the first image to obtain a first feature map A1, and the self-residual network processes the first feature map A1 to obtain a second feature map A2. Meanwhile, the attention network processes the first image to obtain correction coefficients, which are used to adjust the parameters of the self-residual network. Next, the second convolutional network processes the second feature map A2 to obtain a third feature map A3, which is output as the aforementioned second image, for example, an image with a high dynamic range. How the attention network generates the correction coefficients, and how the parameters of the self-residual network are corrected based on them, is described in detail below.
[0059] According to some embodiments of this disclosure, a self-residual network may include m self-residual modules connected in sequence, where m is an integer greater than 1. Processing a first feature map using the self-residual network includes: processing the received first feature map separately using a first processing path and a second processing path in the first self-residual module of the self-residual network to obtain a first self-residual feature map; and processing the (i-1)th self-residual feature map obtained from the (i-1)th self-residual module separately using the first processing path and a second processing path in the i-th self-residual module of the self-residual network to obtain the i-th self-residual feature map, where i is an integer greater than 1 and less than or equal to m. The first processing path includes a self-residual convolutional layer, and the second processing path is used for processing across the self-residual convolutional layer. It is understood that in the self-residual network, the sequentially connected self-residual modules have the same network structure.
[0060] According to some embodiments of this disclosure, each self-residual feature map has n feature layers, where n is a positive integer. The attention network processes the image content of the first image to obtain a coefficient feature map with n×m feature layers. The coefficient feature map serves as the correction coefficients and is multiplied with the self-residual feature maps to correct the parameters of the self-residual network.
[0061] Figure 3 shows another schematic structural diagram of an inverse tone mapping neural network according to an embodiment of the present disclosure. As shown in Figure 3, the self-residual modules of the self-residual network, each denoted CAR, are connected in sequence. The self-residual feature map generated by each self-residual module may have, for example, n = 64 feature layers. The attention network is configured to generate, from the first image, a coefficient feature map with n × m feature layers, which is then used to correct the parameters of the m self-residual modules.
[0062] As an example, in the coefficient feature map with n×m feature layers, layers 1 to n (denoted B1) are used to correct the parameters of the first self-residual module, and layers n+1 to 2n (denoted B2) are used to correct the parameters of the second self-residual module. The pattern continues in the same way, with layers n×(m-1)+1 to n×m (denoted Bm) used to correct the parameters of the m-th self-residual module.
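The slicing rule in [0062] can be written as a short NumPy sketch. It assumes feature maps are stored as (channels, height, width) arrays. The helper name `split_coefficients` is hypothetical, and m = 4 is an arbitrary example value.

```python
import numpy as np


def split_coefficients(coeff_map: np.ndarray, n: int, m: int) -> list:
    """Split an (n*m, H, W) coefficient feature map into blocks B1..Bm.

    Block Bi holds feature layers (i-1)*n+1 .. i*n (1-based), i.e. the
    0-based slice [(i-1)*n, i*n), and corrects the i-th self-residual module.
    """
    if coeff_map.shape[0] != n * m:
        raise ValueError(f"expected {n * m} feature layers, got {coeff_map.shape[0]}")
    return [coeff_map[(i - 1) * n : i * n] for i in range(1, m + 1)]


n, m = 64, 4                                   # n = 64 as in the text; m = 4 is arbitrary
coeff_map = np.random.default_rng(0).random((n * m, 8, 8))
blocks = split_coefficients(coeff_map, n, m)
for i, block in enumerate(blocks, start=1):
    print(f"B{i}: layers {(i - 1) * n + 1}-{i * n}, shape {block.shape}")
```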
[0063] As one implementation, Figure 4A shows a network structure diagram of a mapping network according to an embodiment of the present disclosure, and Figure 4B shows a network structure diagram of a self-residual module according to an embodiment of the present disclosure. The network structure of the inverse tone mapping neural network according to embodiments of the present disclosure is described below with reference to Figure 4A and Figure 4B.
[0064] As shown in Figure 4A, according to some embodiments of this disclosure, the first convolutional network includes a first convolutional layer Conv and an activation function ReLU. The network parameters of the first convolutional layer Conv are denoted k1f64s1. Here k1 indicates a kernel size of 1, f64 indicates 64 output feature layers, and s1 indicates a stride of 1. The first convolutional layer Conv uses a 1×1 convolution to perform a pixel-level, point-to-point spatial mapping of the input first image, and the first feature map A1 is then obtained through the ReLU activation function. Because convolution is a linear operation, the non-linear function ReLU is used to activate neurons. In addition, ReLU helps mitigate the vanishing gradient problem and speeds up the training of the neural network.
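A 1×1 convolution with stride 1 is equivalent to multiplying every pixel's channel vector by the same weight matrix. The NumPy sketch below illustrates the first convolutional network under that reading. It assumes a 3-channel input stored as (channels, height, width) and uses random, untrained weights. The helper names are hypothetical.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """1x1, stride-1 convolution: weight (C_out, C_in) applied to every pixel of (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def first_conv_network(image, weight, bias):
    """Conv k1f64s1 followed by ReLU, producing the first feature map A1."""
    return np.maximum(conv1x1(image, weight, bias), 0.0)


rng = np.random.default_rng(0)
image = rng.random((3, 4, 5))             # stand-in RGB first image, 4 x 5 pixels
w1 = rng.standard_normal((64, 3)) * 0.1   # f64: 64 output feature layers
b1 = np.zeros(64)
a1 = first_conv_network(image, w1, b1)
print(a1.shape)           # (64, 4, 5)
print(w1.size + b1.size)  # 256 learnable parameters for a 3-channel input
```

Because the kernel is 1×1, each output pixel depends only on the input pixel at the same position. This is what "pixel-level, point-to-point spatial mapping" means in practice.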
[0065] Next, the self-residual network consists of m sequentially connected self-residual modules (CARs), each with the same network structure. As shown in Figure 4B, for example, the self-residual module (CAR) includes two processing paths. The first processing path includes a self-residual convolutional layer. Its network parameters can be, for example, k1f64s1, and it convolves the received information to extract image features. The second processing path bypasses the self-residual convolutional layer, so that the received information is added directly to the output of the first processing path, thereby implementing the self-residual connection. This structure provides residual learning without adding parameters and with negligible extra computation (a single element-wise addition), effectively improving the processing and training efficiency of the model.
[0066] Then, as shown in Figure 4B, the combined result of the first and second processing paths in the self-residual module is multiplied by a correction coefficient from the attention network (Attention Layer, AttLayer). Figure 4B shows the i-th self-residual module, whose result is multiplied by the correction coefficient Bi. The product is passed through the ReLU activation function, and the output is sent to the (i+1)-th self-residual module.
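The following NumPy sketch shows one self-residual module as described for Figure 4B. The convolution output and the bypassed input are summed, multiplied element-wise by the coefficient block Bi, and passed through ReLU. The sketch makes three assumptions: the self-residual convolution is 1×1 (k1f64s1), there is no activation inside the first path, and Bi has the same shape as the feature map. The weights are random placeholders.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """1x1, stride-1 convolution on a (C_in, H, W) map; weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def car_module(x, weight, bias, coeff):
    """One self-residual module (CAR), following the description of Figure 4B.

    First path: the 1x1 self-residual convolution.
    Second path: identity bypass, added to the first path's output.
    The sum is multiplied element-wise by the coefficient block B_i and then
    passed through ReLU; the result feeds the (i+1)-th module.
    """
    summed = conv1x1(x, weight, bias) + x
    return np.maximum(summed * coeff, 0.0)


rng = np.random.default_rng(1)
x = rng.standard_normal((64, 4, 4))        # (i-1)-th self-residual feature map
w = rng.standard_normal((64, 64)) * 0.05   # untrained stand-in weights
b = np.zeros(64)
b_i = rng.random((64, 4, 4))               # stand-in coefficients in [0, 1)
y = car_module(x, w, b, b_i)               # i-th self-residual feature map, (64, 4, 4)
```

With zero convolution weights and all-ones coefficients, the module reduces to ReLU applied to its input. This makes the bypass path easy to see.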
[0067] Combining Figure 4A and Figure 4B: for the mapping network of Figure 4A, which uses the network parameters k1f64s1, the attention network AttLayer generates a coefficient feature map with 64×m feature layers, which serves as the coefficients B1, B2, ..., Bm. Each coefficient is multiplied with the feature map in the corresponding self-residual module before ReLU processing, thereby finely adjusting the output of the mapping network. This is equivalent to multiplying the output of each CAR by a coefficient matrix that the attention network obtains by extracting image features from the input first image. Because this coefficient matrix is closely related to the content features of the input image, the feature information of the current image is used to finely adjust the mapping network and improve the display effect of the transformed image. Then, as shown in Figure 4A, the self-residual network processes the first feature map A1 to obtain the second feature map A2. The second convolutional network, which includes a second convolutional layer with network parameters k1f64s1, receives the second feature map A2 and produces the third feature map A3.
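Putting the pieces together, the sketch below runs the mapping network of Figure 4A end to end. It produces A1 from the first convolution and ReLU, passes it through m CAR modules each scaled by its coefficient block Bi, and produces A3 from the second convolution. This is a sketch under stated assumptions, not the patented implementation:

- All weights are random and untrained.
- The coefficient map is supplied directly rather than produced by the attention network.
- The second convolution is assumed to map the 64 feature layers back to 3 color channels, so that A3 can be viewed as an image. The text lists k1f64s1 for this layer, so `out_ch=3` is an assumption.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """1x1, stride-1 convolution on a (C_in, H, W) map; weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def relu(x):
    return np.maximum(x, 0.0)


def mapping_network(image, params, coeff_map):
    """Forward pass: first conv + ReLU -> m CAR modules -> second conv.

    image:     (3, H, W) first image
    params:    "w1"/"b1" first conv, "car" list of m (weight, bias) pairs, "w2"/"b2" second conv
    coeff_map: (n*m, H, W) coefficient feature map B1..Bm from the attention network
    """
    n, m = params["w1"].shape[0], len(params["car"])
    if coeff_map.shape[0] != n * m:
        raise ValueError("coefficient map must have n*m feature layers")
    a = relu(conv1x1(image, params["w1"], params["b1"]))      # first feature map A1
    for i, (w, b) in enumerate(params["car"]):
        coeff = coeff_map[i * n:(i + 1) * n]                    # block B_(i+1)
        a = relu((conv1x1(a, w, b) + a) * coeff)                # CAR module i+1
    return conv1x1(a, params["w2"], params["b2"])              # A2 -> third feature map A3


def random_params(n=64, m=3, in_ch=3, out_ch=3, seed=0):
    """Hypothetical untrained weights, only for checking shapes and data flow."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.standard_normal((n, in_ch)) * 0.1,
        "b1": np.zeros(n),
        "car": [(rng.standard_normal((n, n)) * 0.05, np.zeros(n)) for _ in range(m)],
        "w2": rng.standard_normal((out_ch, n)) * 0.1,
        "b2": np.zeros(out_ch),
    }


params = random_params()
image = np.random.default_rng(1).random((3, 6, 6))        # stand-in SDR image in [0, 1)
coeffs = np.random.default_rng(2).random((64 * 3, 6, 6))  # stand-in for B1..B3
hdr = mapping_network(image, params, coeffs)              # shape (3, 6, 6)
```

Every operation here is either a 1×1 convolution or element-wise, so each output pixel depends only on the input pixel and the coefficients at the same location. In this sketch, spatial context can enter only through the coefficient map produced by the attention network.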
[0068] Figure 5 shows a network structure diagram of the attention network AttLayer according to an embodiment of this disclosure. The network structure and network parameters of the attention network according to embodiments of the present disclosure are described below with reference to Figure 5.
[0069] As shown in Figure 5, the AttLayer attention network includes a convolutional layer (Conv) and an activation function (ReLU). The network parameters of this convolutional layer are k1f64s1, meaning it uses a 1×1 convolution to perform pixel-level, point-to-point spatial mapping on the input first image, with 64 feature layers and a stride of 1. The AttLayer attention network may then include several CRMI network modules, whose specific structure is shown in Figure 5. The number of CRMI modules can be set according to processing requirements, and every CRMI module has the same network structure. Specifically, a CRMI network module can include a convolutional layer (Conv) and a ReLU activation function, followed by a max pooling layer and an instance normalization (InsNorm) layer. Pooling layers mimic the dimensionality reduction performed by the human visual system, representing the image with higher-level features. They reduce information redundancy, improve the model's scale and rotation invariance to a degree, and help prevent overfitting; they are generally placed after convolutional layers.
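A CRMI module as described above (Conv, ReLU, max pooling, instance normalization) can be sketched as follows. The 1×1 convolution, the 2×2 pooling window with stride 2, and the absence of learned scale and shift parameters in the instance normalization are assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def conv1x1(x, weight, bias):
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def max_pool_2x2(x):
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    c, h, w = x.shape
    x = x[:, : h - h % 2, : w - w % 2]
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def instance_norm(x, eps=1e-5):
    """Normalize each channel over its own spatial positions (no learned affine)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def crmi(x, weight, bias):
    """Conv -> ReLU -> MaxPool -> InsNorm, in the order described for CRMI."""
    return instance_norm(max_pool_2x2(np.maximum(conv1x1(x, weight, bias), 0.0)))

rng = np.random.default_rng(2)
feat = rng.standard_normal((64, 32, 32))
out = crmi(feat, rng.standard_normal((64, 64)) * 0.1, np.zeros(64))
print(out.shape)  # (64, 16, 16)
```

Each CRMI halves the spatial size under these assumptions, which is the dimensionality reduction the paragraph attributes to the pooling layers.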
[0070] As shown in Figure 5, the CRMI network modules of the AttLayer attention network are followed by a bilinear processing layer and a self-residual network. One processing path of this self-residual network includes a convolutional layer with parameters k1f64s1. The output is then processed by a ReLU and a further convolutional layer whose network parameters are set to k1f(n×m)s1, so that the coefficient feature map it outputs has n×m feature layers. After processing by the Sigmoid function, these coefficients are multiplied point by point with the outputs of the respective self-residual modules (CARs) of the network in Figure 4A, thereby adjusting the features output by the self-residual network. Because the coefficients used for this adjustment are derived from attention-based feature extraction on the first input image, the adjusted feature map better reflects the overall features of the input image, resulting in a better image conversion. Here, the sigmoid function serves as the activation function of the attention network and maps its inputs to the range (0, 1).
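The tail of the attention network described in [0070] can be sketched as below: bilinear resizing, a self-residual block, a ReLU, a 1×1 convolution with n×m output feature layers, and a sigmoid. The assumptions are that the bilinear layer restores the spatial size reduced by pooling, that it samples on an align-corners grid, and that the n×m output channels are grouped into m coefficient maps of n channels each. All names are hypothetical.

```python
import numpy as np

def conv1x1(x, weight, bias):
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def bilinear_resize(x, out_h, out_w):
    """Bilinear resize of (C, H, W) using an align-corners sampling grid."""
    c, h, w = x.shape
    ys, xs = np.linspace(0, h - 1, out_h), np.linspace(0, w - 1, out_w)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[None, :, None], (xs - x0)[None, None, :]
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bottom = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_head(feat, p, n, m, out_h, out_w):
    up = bilinear_resize(feat, out_h, out_w)                # bilinear processing layer
    res = conv1x1(up, p["res_w"], p["res_b"]) + up          # self-residual, k1f64s1
    logits = conv1x1(np.maximum(res, 0.0), p["out_w"], p["out_b"])  # k1f(n*m)s1
    return sigmoid(logits).reshape(m, n, out_h, out_w)      # m maps, values in (0, 1)

rng = np.random.default_rng(2)
n, m = 64, 4
p = {"res_w": rng.standard_normal((64, 64)) * 0.01, "res_b": np.zeros(64),
     "out_w": rng.standard_normal((n * m, 64)) * 0.01, "out_b": np.zeros(n * m)}
pooled = rng.standard_normal((64, 4, 4))                  # e.g. after the CRMI modules
coeffs = attention_head(pooled, p, n, m, 16, 16)
print(coeffs.shape)  # (4, 64, 16, 16)
```

Because the sigmoid keeps every coefficient in (0, 1), the correction can only attenuate each CAR output, never flip its sign.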
[0071] According to some embodiments of this disclosure, the image processing method may further include: inputting the second image into an enhancement processing network for processing to obtain an enhanced second image, wherein the enhancement processing network includes a noise reduction network and/or a color mapping network.
[0072] Specifically, Figure 6 shows another flowchart of the image processing method. As shown in Figure 6, the image processing method may further include steps S102 and S103. In step S102, the second image is denoised using a denoising network to obtain a third image; in step S103, the third image is color-mapped using a color mapping network to obtain a fourth image. It is understood that the image processing method according to the embodiments of this disclosure may perform either or both of steps S102 and S103. When both are performed, their order is adjustable and is not limited here.
[0073] In addition, as shown in Figure 6, the image processing method may further include step S104, in which a super-resolution network processes three received frames to obtain a super-resolution image whose resolution is higher than that of the first image. In other words, the super-resolution network obtains a higher-resolution version of the current frame by processing the current frame together with its two adjacent frames.
[0074] After processing by the inverse tone mapping neural network, parameters of the image such as brightness and contrast are improved, which also makes noise in the image more obvious. Therefore, a denoising network can be used to denoise the image and reduce noise interference. The network structure of the denoising network is described below with reference to Figure 8A and Figure 8B.
[0075] The color mapping network is used to perform color mapping on images. For example, a person processing an image may want it to display warm, cool, or black-and-white tones that reflect a subjective color-grading intent. In such cases, this color mapping network can be used, with multiple 3D LUT templates employed to map the colors of the image. The implementation of this color mapping network is described below with reference to Figure 9A.
[0076] Furthermore, the super-resolution network is used to increase the resolution of images to meet the requirements of high-resolution displays. For example, an original input image with 2K resolution can be upscaled to 4K by the super-resolution network. The network structure of the super-resolution network is described below with reference to Figure 10A and Figure 10B.
[0077] Figure 7A shows a flowchart of an application of the image processing method according to an embodiment of the present disclosure. As an example, assuming the first input image is a 2K SDR image, it can first be processed by the inverse tone mapping neural network to expand the dynamic range and color gamut, yielding a second image that is a 2K HDR image. This image can then be processed by the noise reduction network and the color mapping network in turn. Finally, the super-resolution network increases the resolution to 4K, achieving super-resolution image display.
[0078] Figure 7A merely illustrates the processing flow of the image processing method according to some embodiments of the present disclosure. It is understood that in other embodiments, the denoising network and the color mapping network may be applied selectively, or their processing order may be changed. In some embodiments, denoising and color mapping may be omitted, and super-resolution processing may be performed directly after inverse tone mapping; for example, the resolution of the second image output by the inverse tone mapping neural network may be increased from 2K to 4K.
[0079] Furthermore, in some other embodiments of this disclosure, super-resolution processing may be performed first. Specifically, before inverse tone mapping is performed, the image processing method includes: processing the (k-1)-th, k-th, and (k+1)-th frames of a video using a super-resolution network to obtain a super-resolution k-th frame image, which serves as the first image, wherein the resolution of the first image is higher than that of the k-th frame image, and k is an integer greater than 1. That is, the resolution of the image is first improved using information from adjacent frames of the video, and inverse tone mapping is then performed on the improved image by the inverse tone mapping neural network to obtain an HDR image. It is understood that consecutive images in a video have the same resolution; for example, the (k-1)-th, k-th, and (k+1)-th frames have the same resolution. The current k-th frame and its adjacent frames (the (k-1)-th and (k+1)-th frames) are processed by the super-resolution network to obtain the improved k-th frame image, that is, a super-resolution image.
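The frame selection in [0079] pairs each frame k with its neighbours k-1 and k+1. A small helper such as the hypothetical `frame_triplets` below makes the indexing explicit, using the 1-based frame numbers of the text. The text does not say how the first and last frames, which lack one neighbour, are handled; this sketch simply skips them, and replicating the edge frame would be an equally plausible choice.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def frame_triplets(frames: Sequence[T]) -> List[Tuple[int, Tuple[T, T, T]]]:
    """Return (k, (frame k-1, frame k, frame k+1)) with 1-based k, for 1 < k < len(frames)."""
    triplets = []
    for k in range(2, len(frames)):
        # frames[k - 1] is the 1-based k-th frame in a 0-based Python sequence
        triplets.append((k, (frames[k - 2], frames[k - 1], frames[k])))
    return triplets

print(frame_triplets(["f1", "f2", "f3", "f4"]))
# [(2, ('f1', 'f2', 'f3')), (3, ('f2', 'f3', 'f4'))]
```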
[0080] It is understood that, depending on the actual application scenario, the order of steps S101-S104 shown in Figure 6 can be adjusted, or only some of the steps can be performed.
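Because steps S101 to S104 can be reordered or partly omitted, one way to organise an implementation is as a configurable sequence of stages. The sketch below is hypothetical: each stage is a placeholder that only records its step number, standing in for the inverse tone mapping, denoising, color mapping, and super-resolution networks.

```python
from typing import Callable, Dict, List, Sequence

Stage = Callable[[List[str]], List[str]]

def make_stage(name: str) -> Stage:
    # Placeholder for a real network: records that this stage ran.
    return lambda image: image + [name]

STAGES: Dict[str, Stage] = {
    "S101_inverse_tone_map": make_stage("S101"),
    "S102_denoise": make_stage("S102"),
    "S103_color_map": make_stage("S103"),
    "S104_super_resolve": make_stage("S104"),
}

def run_pipeline(image: List[str], order: Sequence[str]) -> List[str]:
    """Apply the selected stages in the requested order."""
    for key in order:
        image = STAGES[key](image)
    return image

# Figure 7A style: ITM, denoise, color mapping, then super-resolution last.
print(run_pipeline([], ["S101_inverse_tone_map", "S102_denoise",
                        "S103_color_map", "S104_super_resolve"]))
# ['S101', 'S102', 'S103', 'S104']
```

The [0079] variant, with super-resolution before inverse tone mapping, is simply `run_pipeline(image, ["S104_super_resolve", "S101_inverse_tone_map"])`.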
[0081] Figure 7B shows another application flowchart of the image processing method according to an embodiment of the present disclosure. In the example of Figure 7B, the first input image already has 4K resolution, so super-resolution network processing is not required.
[0082] According to some embodiments of this disclosure, the super-resolution network upscales the resolution by supplementing the image with pixels rather than changing its color content; for example, 2× super-resolution expands each pixel into 4 pixels. Therefore, to obtain a 4K HDR image, the image can first be upscaled to 4K, and a series of processing operations such as inverse tone mapping and noise reduction can then be performed on the 4K image. Alternatively, and preferably, inverse tone mapping and noise reduction can be performed on the 2K image first, with the resolution upscaled to 4K in the final step. Since a 4K image has four times as many pixels as a 2K image, this order saves computational cost.
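The saving described in [0082] can be checked with simple arithmetic. The sketch assumes that 2K means 1920×1080 and 4K means 3840×2160, and that the inverse tone mapping and denoising networks are fully convolutional, so their cost grows linearly with pixel count. The per-pixel operation count of 1000 is an arbitrary placeholder.

```python
PIXELS_2K = 1920 * 1080   # assumed 2K frame size
PIXELS_4K = 3840 * 2160   # assumed 4K frame size (2x in each dimension)

def enhancement_cost(ops_per_pixel: float, upscale_first: bool) -> float:
    """Cost of inverse tone mapping + denoising on one frame, linear in pixel count."""
    pixels = PIXELS_4K if upscale_first else PIXELS_2K
    return ops_per_pixel * pixels

print(PIXELS_4K / PIXELS_2K)                                          # 4.0
print(enhancement_cost(1000, True) / enhancement_cost(1000, False))  # 4.0
```

Under these assumptions, enhancing at 2K and upscaling last needs a quarter of the enhancement work per frame.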
[0083] Figure 8A shows a network structure diagram of a noise reduction network according to an embodiment of the present disclosure. As shown in Figure 8A, in one implementation, the denoising network according to some embodiments of this disclosure adopts a U-Net structure. The input first passes through a convolutional layer Conv and a ReLU activation function. The network parameters of this convolutional layer are k3f64s2, indicating a kernel size of k=3, f=64 feature layers, and a stride of s=2, that is, 2× downsampling. Several residual networks (ResNet) can then follow. Figure 8B shows the network structure of the ResNet residual network in the denoising network: its first processing path includes a convolutional layer k3f64s1, a ReLU activation function, and another convolutional layer k3f64s1, and its second processing path bypasses these two convolutional layers and the activation function to form the residual structure. The number of ResNet residual networks can likewise be set according to actual processing needs and is not limited here.
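The kNfMsS notation used throughout (for example k3f64s2) and the resulting 2× downsampling can be made concrete with a small helper. The parser and the padding of 1 for 3×3 kernels are assumptions for illustration; the output-size formula is the standard one for a strided convolution, floor((size + 2p - k) / s) + 1.

```python
import re
from typing import NamedTuple

class ConvSpec(NamedTuple):
    kernel: int    # k: kernel size
    features: int  # f: number of output feature layers
    stride: int    # s: stride

def parse_spec(spec: str) -> ConvSpec:
    """Parse notation such as 'k3f64s2'."""
    match = re.fullmatch(r"k(\d+)f(\d+)s(\d+)", spec)
    if match is None:
        raise ValueError(f"unrecognised layer spec: {spec!r}")
    return ConvSpec(*(int(g) for g in match.groups()))

def conv_output_size(size: int, spec: ConvSpec, padding: int) -> int:
    """Standard convolution output length along one spatial axis."""
    return (size + 2 * padding - spec.kernel) // spec.stride + 1

down = parse_spec("k3f64s2")
print(down)                                      # ConvSpec(kernel=3, features=64, stride=2)
print(conv_output_size(1080, down, padding=1))   # 540
```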
[0084] Then, as shown in Figure 8A, the ResNet blocks are followed by a U-Net connection structure, that is, processing paths that implement cross-layer connections: Path 1 and Path 2. The denoising network also includes a deconvolutional layer DConv with network parameters k3f64s2, indicating a kernel size of k=3, f=64 feature layers, and a stride of s=2, which achieves 2× upsampling. The specific network structure and parameters of the denoising network are shown in Figure 8A and are not described in detail here. As shown in Figure 8A, the network parameters of the last convolutional layer Conv are k3f256s1, meaning its output has 256 feature layers. The result then passes through PixelShuffle (2× upsampling) to produce the final image. PixelShuffle can be understood as transforming a low-resolution H×W image into a high-resolution rH×rW image through sub-pixel operations. Rather than interpolating, PixelShuffle obtains the high-resolution image by periodic shuffling, which moves values from the channel dimension into spatial positions. For example, if the convolutional layer before PixelShuffle has f=256 feature layers, the output of a 2× PixelShuffle has 256/2² = 64 feature layers, but twice the resolution in each dimension.
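The periodic shuffling can be written in a few lines of NumPy. This sketch is intended to follow the channel ordering of PyTorch's `torch.nn.PixelShuffle`; the function name and test shapes are illustrative.

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange (C*r*r, H, W) into (C, H*r, W*r) by periodic shuffling.

    Output pixel (c, h*r + i, w*r + j) is taken from input channel c*r*r + i*r + j
    at position (h, w); values are only moved, never interpolated.
    """
    c_in, h, w = x.shape
    if c_in % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    c = c_in // (r * r)
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)

feat = np.random.default_rng(3).standard_normal((256, 30, 40))
print(pixel_shuffle(feat, 2).shape)  # (64, 60, 80)
```

With 256 input feature layers and r = 2, the output has 64 feature layers at twice the height and width, matching the example in the text.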
[0085] It is understood that, besides the denoising network structure shown in Figure 8A and Figure 8B, other open-source denoising network models such as ArCNN, DnCNN, and MeshFlow can also be used, simply by updating the model parameters on the actual dataset.
[0086] Figure 9A shows a schematic diagram of a color mapping network according to an embodiment of the present disclosure. According to this embodiment, the color mapping network can use multiple 3D lookup tables (3D LUTs) as mapping templates to achieve color mapping of images, such as conversion to cool or warm tones. A 3D LUT can be viewed as a 3D matrix obtained by training a deep learning model. Each 3D LUT is a color mapping template, and the color mapping network can be designed with multiple templates, such as black-and-white, warm, and cool templates or templates of any other tone, for image processing. The 3D LUT looks up the output value corresponding to the RGB values of the input image. For example, for the input (In) color channel values (50, 50, 50), the corresponding output color value Out, such as (70, 70, 70), is found, thus achieving color conversion through the lookup table.
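A common way to apply a 3D LUT is trilinear interpolation between the eight grid points surrounding the input color. The sketch below assumes RGB values on a 0 to 255 scale and a table of shape (size, size, size, 3). The "brighter" template, which scales each channel by 1.4, is a hypothetical example chosen so that (50, 50, 50) maps to (70, 70, 70) as in the text; the disclosure does not specify the interpolation scheme.

```python
import numpy as np

def identity_lut(size: int) -> np.ndarray:
    """(size, size, size, 3) table mapping each grid colour to itself, 0..255 scale."""
    grid = np.linspace(0.0, 255.0, size)
    r, g, b = np.meshgrid(grid, grid, grid, indexing="ij")
    return np.stack([r, g, b], axis=-1)

def apply_lut(lut: np.ndarray, rgb) -> np.ndarray:
    """Map (..., 3) RGB values in [0, 255] through a 3D LUT with trilinear interpolation."""
    rgb = np.asarray(rgb, dtype=float)
    size = lut.shape[0]
    pos = np.clip(rgb, 0.0, 255.0) / 255.0 * (size - 1)   # continuous (i, j, k)
    lo = np.minimum(np.floor(pos).astype(int), size - 2)   # lower corner of the cell
    frac = pos - lo                                         # offset within the cell
    out = np.zeros(rgb.shape[:-1] + (3,))
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((frac[..., 0] if di else 1 - frac[..., 0])
                          * (frac[..., 1] if dj else 1 - frac[..., 1])
                          * (frac[..., 2] if dk else 1 - frac[..., 2]))
                corner = lut[lo[..., 0] + di, lo[..., 1] + dj, lo[..., 2] + dk]
                out += np.expand_dims(weight, -1) * corner
    return out

brighter = np.clip(identity_lut(17) * 1.4, 0.0, 255.0)   # hypothetical template
print(apply_lut(brighter, [50.0, 50.0, 50.0]))           # approximately [70, 70, 70]
```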
[0087] In a 1D lookup table, each channel's output is controlled independently by that channel's input alone. A 3D LUT, by contrast, outputs all three RGB channels as a function of the full input color, so the channels are correlated, which helps improve color processing results. Furthermore, 3D lookup tables have a large capacity; for example, a 64-order lookup table has 64³ = 262,144 entries, each storing an output color. This makes the output color and brightness more accurate and realistic. The large capacity of the lookup table can also store subjective preferences regarding brightness, color, and detail, which is beneficial for color mapping tasks.
[0088] According to some embodiments of this disclosure, a 3D LUT can be represented by the following formula:
[0089]
[0090] Here, (i, j, k) are the coordinates in the lookup table along the R, G, and B color channels, respectively. The mapping relationship of the three-dimensional lookup table can be expressed as follows: for an input pixel, the mapped output pixel values are:
[0091]
[0092] Figure 9B illustrates the training process of the color mapping network. First, the current 3D LUT performs color conversion on the input image to obtain an output image. Then, a loss value is calculated from a preset ground truth image and the output image. The parameters of the lookup table are adjusted based on this loss value, so that the display effect of the output image obtained from the lookup table moves closer to the target ground truth image. This completes the training process of the lookup table.
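The training loop of Figure 9B can be illustrated with a deliberately simplified sketch. It uses nearest-grid-point lookup and plain gradient descent on the MSE loss, with the gradient for each table entry written out by hand; learned LUTs in practice usually use trilinear interpolation and an automatic-differentiation framework. The table size, learning rate, step count, and the "ground truth" (a uniform 0.8× darkening) are all hypothetical.

```python
import numpy as np

def identity_lut(size):
    grid = np.linspace(0.0, 1.0, size)
    r, g, b = np.meshgrid(grid, grid, grid, indexing="ij")
    return np.stack([r, g, b], axis=-1)

def lookup_nearest(lut, rgb):
    """Nearest-grid-point lookup; rgb is (N, 3) in [0, 1]. Returns outputs and indices."""
    size = lut.shape[0]
    idx = np.clip(np.rint(rgb * (size - 1)).astype(int), 0, size - 1)
    return lut[idx[:, 0], idx[:, 1], idx[:, 2]], idx

def train_lut(inputs, targets, size=9, lr=100.0, steps=200):
    """Fit LUT entries to (input, ground-truth) pixel pairs by gradient descent on MSE."""
    lut = identity_lut(size)                       # start from the identity mapping
    losses = []
    for _ in range(steps):
        outputs, idx = lookup_nearest(lut, inputs)
        diff = outputs - targets
        losses.append(float(np.mean(diff ** 2)))   # content loss (MSE)
        grad = np.zeros_like(lut)
        # d(MSE)/d(entry): sum 2*diff/numel over every pixel that used the entry
        np.add.at(grad, (idx[:, 0], idx[:, 1], idx[:, 2]), 2.0 * diff / diff.size)
        lut -= lr * grad                           # adjust the table from the loss
    return lut, losses

rng = np.random.default_rng(4)
inputs = rng.uniform(0.0, 1.0, (2000, 3))
targets = 0.8 * inputs                             # hypothetical ground truth
lut, losses = train_lut(inputs, targets)
print(losses[0] > losses[-1])  # True
```

The loss does not reach zero here because nearest-neighbour lookup maps every pixel in a cell to one entry; interpolated lookup removes most of that residual.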
[0093] The loss function used in the training process described above can be a content loss function. For example, the mean squared error (MSE) loss function can be used, which computes the mean of the squared differences between the predicted and target values. Alternatively, the L1 norm loss function, also known as least absolute deviations (LAD), can be used, which minimizes the sum of the absolute differences between the target value (Yi) and the predicted value (f(xi)). Another example is the L2 norm loss function, also known as least squares error (LSE), which minimizes the sum of the squared differences between the target value (Yi) and the predicted value (f(xi)).
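The three loss functions can be written directly from these definitions. In this sketch the L1 and L2 losses are sums, as described above; many frameworks average them instead, which rescales the loss without changing its minimizer.

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error: average of squared differences."""
    return float(np.mean((np.asarray(target) - np.asarray(pred)) ** 2))

def l1_loss(pred, target):
    """L1 norm loss (least absolute deviations): sum of |Y_i - f(x_i)|."""
    return float(np.sum(np.abs(np.asarray(target) - np.asarray(pred))))

def l2_loss(pred, target):
    """L2 norm loss (least squares error): sum of (Y_i - f(x_i))^2."""
    return float(np.sum((np.asarray(target) - np.asarray(pred)) ** 2))

pred = [1.0, 2.0, 4.0]
target = [1.0, 3.0, 2.0]
print(mse_loss(pred, target), l1_loss(pred, target), l2_loss(pred, target))
# 1.6666666666666667 3.0 5.0
```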
[0094] Figure 10A shows a network structure diagram of a super-resolution network according to some embodiments of the present disclosure. As shown in Figure 10A, the super-resolution network according to some embodiments of this disclosure can receive three frames of video images as input.
[0095] In some applications, the image processing method provided according to embodiments of this disclosure can be applied not only to single images but also to video. In this case, the first image can be one frame of a video. For super-resolution processing, receiving three frames helps extract more detailed image information from the video sequence, so that the upscaled image displays better. Taking the currently processed first image as the k-th frame of the video as an example, the super-resolution network shown in Figure 10A can receive the (k-1)-th, k-th, and (k+1)-th frames, for example, frames that have already been processed by the inverse tone mapping neural network.
[0096] Specifically, as shown in Figure 10A, the three received frames are first processed by a network whose parameters are shared across the frames, namely the convolutional layer k3f64s1 and the ReLU activation function enclosed by the dashed line in Figure 10A. Next, the processing results of the (k-1)-th and k-th frames enter one AlignNet network, while the processing results of the (k+1)-th and k-th frames enter another AlignNet network. The AlignNet network is used to align the feature information of the multi-frame input images.
[0097] Figure 10B shows the network structure of the alignment network in the super-resolution network. As shown in Figure 10B, the alignment network receives two inputs and first fuses their features via a Concat (concatenation) structure, which combines the feature map information of the two inputs and increases the number of channels. For example, if the (k-1)-th and k-th frames both have 64 feature layers, the Concat structure produces a feature map with 128 feature layers. The specific network structures and parameters of the super-resolution network and the alignment network are shown in Figure 10A and Figure 10B and are not described in detail here.
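The shared front end of [0096] and the Concat fusion of [0097] can be sketched as follows. The same k3f64s1 weights (zero padding of 1 assumed) are applied to frames k-1, k and k+1, and each neighbour's features are concatenated with those of frame k along the channel axis, so two 64-layer maps become one 128-layer map. Random weights, the frame size, and the function names are illustrative; the AlignNet layers after the concatenation are not modeled.

```python
import numpy as np

def conv3x3(x, weight, bias):
    """3x3 convolution, stride 1, zero padding 1: weight is (C_out, C_in, 3, 3)."""
    c_in, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx],
                             padded[:, dy:dy + h, dx:dx + w])
    return out + bias[:, None, None]

def shared_features(frame, weight, bias):
    # Same parameters for every frame: k3f64s1 followed by ReLU.
    return np.maximum(conv3x3(frame, weight, bias), 0.0)

def concat_for_alignment(neighbour_feat, current_feat):
    # Concat structure: stack along the channel axis, e.g. 64 + 64 -> 128 layers.
    return np.concatenate([neighbour_feat, current_feat], axis=0)

rng = np.random.default_rng(5)
weight, bias = rng.standard_normal((64, 3, 3, 3)) * 0.1, np.zeros(64)  # one shared set
frames = [rng.uniform(0.0, 1.0, (3, 24, 32)) for _ in range(3)]        # k-1, k, k+1
f_prev, f_cur, f_next = (shared_features(f, weight, bias) for f in frames)
pair_prev = concat_for_alignment(f_prev, f_cur)   # input to the first AlignNet
pair_next = concat_for_alignment(f_next, f_cur)   # input to the second AlignNet
print(pair_prev.shape, pair_next.shape)  # (128, 24, 32) (128, 24, 32)
```

Sharing one weight set keeps the parameter count independent of the number of input frames and guarantees that all three frames are described in the same feature space before alignment.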
[0098] It is understood that, besides the super-resolution network structure shown in Figure 10A and Figure 10B, other open-source super-resolution network models, such as DUF, EDSR, and EDVR, can also be used; only the model parameters need to be updated on the actual dataset.
[0099] According to embodiments of this disclosure, the inverse tone mapping neural network is trained using a content loss function. Similarly, the denoising network and the super-resolution network can be trained separately and then combined for final use. The training process of these networks is similar to that described with reference to Figure 9B, and the loss function used can likewise be one of those mentioned above, such as L1, L2, or MSE, which will not be repeated here.
[0100] Using the image processing method according to some embodiments of the present disclosure, an inverse tone mapping neural network can be used to process an input first image, expanding the dynamic range and color gamut of the first image to obtain an expanded second image. The inverse tone mapping neural network includes a mapping network for implementing the expansion and an attention network, and the input to both the mapping network and the attention network is the first image. The attention network processes the image content of the first image to generate correction coefficients, which are used to correct the parameters of the mapping network. Because the attention network extracts content features from the input first image, the correction coefficients are closely related to the image content. Using them to adjust the parameters of the mapping network improves the mapping network's ability to expand the color gamut and dynamic range of the image and enhances the visual effect of the converted image.
[0101] Furthermore, depending on the image processing requirements, the image can be further processed using the noise reduction network, color mapping network, and super-resolution network provided in the embodiments of this disclosure to improve the image display effect.
[0102] This disclosure also provides a computing system for image processing. Specifically, Figure 11 shows a schematic block diagram of a computing system for image processing according to an embodiment of the present disclosure.
[0103] As shown in Figure 11, a computing system 1000 for image processing may include one or more processors 1010 and one or more non-transitory computer-readable media 1020 storing instructions.
[0104] According to some embodiments of this disclosure, instructions stored in a non-transitory computer-readable medium 1020, when executed by one or more processors 1010, cause one or more processors to perform operations including: processing a first image using an inverse tone mapping neural network, wherein the inverse tone mapping neural network is configured to extend the dynamic range and color gamut of the first image to obtain an extended second image, wherein the inverse tone mapping neural network includes a mapping network, wherein the mapping network is used to implement the extension, and the inverse tone mapping neural network also includes an attention network, wherein the input to both the mapping network and the attention network is the first image, and the attention network is used to process the image content of the first image to generate correction coefficients, the correction coefficients being used to correct the parameters of the mapping network.
[0105] According to some embodiments of this disclosure, the mapping network includes a first convolutional network, a self-residual network, and a second convolutional network. The mapping network implements the extension by: processing the first image using the first convolutional network to obtain a first feature map; processing the first feature map using the self-residual network to obtain a second feature map; and processing the second feature map using the second convolutional network to obtain a third feature map, wherein the third feature map serves as the second image, and the correction coefficients are used to correct the parameters of the self-residual network. According to some embodiments of this disclosure, the first convolutional network includes a first convolutional layer and an activation function, and the second convolutional network includes a second convolutional layer. For the specific network structure of the mapping network, refer to the description above with reference to Figure 4A, which is not repeated here.
[0106] According to some embodiments of this disclosure, the self-residual network includes m self-residual modules connected in sequence, where m is an integer greater than 1. Processing the first feature map using the self-residual network includes: processing the received first feature map separately using a first processing path and a second processing path in the first self-residual module of the self-residual network to obtain a first self-residual feature map; and processing the (i-1)-th self-residual feature map obtained from the (i-1)-th self-residual module separately using a first processing path and a second processing path in the i-th self-residual module to obtain the i-th self-residual feature map, where i is an integer greater than 1 and less than or equal to m. The first processing path includes a self-residual convolutional layer, and the second processing path bypasses the self-residual convolutional layer. For the specific network structure of the self-residual module, refer to the description above with reference to Figure 4B, which is not repeated here.
[0107] According to some embodiments of this disclosure, the self-residual feature map has n feature layers, where n is a positive integer. The attention network processes the image content of the first image to obtain a coefficient feature map with n×m feature layers. The coefficient feature map serves as the correction coefficients and is multiplied by the self-residual feature maps to correct the parameters of the self-residual network. For the specific network structure of the attention network, refer to the description above with reference to Figure 5, which is not repeated here.
[0108] According to some embodiments of this disclosure, the first image is an image with a standard dynamic range, and the second image is an image with a high dynamic range.
[0109] According to some embodiments of this disclosure, the above operations may further include inputting the second image into an enhancement processing network for processing to obtain an enhanced second image, wherein the enhancement processing network includes a noise reduction network and/or a color mapping network. For the specific network structure of the noise reduction network, refer to the description above with reference to Figure 8A and Figure 8B; for the color mapping network, refer to the description above with reference to Figure 9A. These descriptions are not repeated here. According to some embodiments of this disclosure, either or both of the noise reduction and color mapping processes, as well as the order in which they are performed, can be selected according to actual needs, without limitation.
[0110] According to some embodiments of this disclosure, the first image is the k-th frame image in a video, and the second image is the expanded k-th frame image, where k is an integer greater than 1.
[0111] According to some embodiments of this disclosure, the above operations may further include: processing the (k-1)-th frame image and the (k+1)-th frame image in the video using the inverse tone mapping neural network to obtain an expanded (k-1)-th frame image and an expanded (k+1)-th frame image; and processing the expanded (k-1)-th, k-th, and (k+1)-th frame images using a super-resolution network to obtain a super-resolution k-th frame image, wherein the resolution of the super-resolution k-th frame image is higher than that of the first image. For the specific network structure of the super-resolution network, refer to the description above with reference to Figure 10A and Figure 10B, which is not repeated here. In these embodiments, the images in the video first undergo inverse tone mapping to achieve SDR-to-HDR conversion, and the resulting HDR images then undergo super-resolution processing to obtain a super-resolution HDR image.
[0112] According to some embodiments of this disclosure, the above operations may further include: processing the (k-1)-th, k-th, and (k+1)-th frames of the video using a super-resolution network to obtain a super-resolution k-th frame image, which is then used as the first image. The resolution of the first image obtained after super-resolution is higher than that of the k-th frame image, where k is an integer greater than 1. In these embodiments, the super-resolution network first processes the three frames of the video to obtain a super-resolution first image, and the inverse tone mapping neural network then performs inverse tone mapping on the super-resolution first image to achieve SDR-to-HDR conversion.
[0113] According to some embodiments of this disclosure, the inverse tone mapping neural network is trained using a content loss function.
[0114] According to another aspect of this disclosure, an image processing apparatus is also provided. Figure 12 shows a schematic block diagram of an image processing apparatus according to an embodiment of the present disclosure.
[0115] As shown in Figure 12, the device 2000 may include a processor 2010 and a memory 2020. According to an embodiment of this disclosure, the memory 2020 stores computer-readable code that, when executed by the processor 2010, performs the image processing method described above.
[0116] The processor 2010 can perform various actions and processes according to the program stored in the memory 2020. Specifically, the processor 2010 can be an integrated circuit chip with signal processing capabilities. The processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this disclosure. The general-purpose processor can be a microprocessor or any conventional processor.
[0117] Memory 2020 stores computer-executable instruction code that, when executed by processor 2010, implements the image processing method according to embodiments of the present disclosure. Memory 2020 may be volatile memory or non-volatile memory, or may include both. Non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. Volatile memory may be random access memory (RAM) used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DR RAM). It should be noted that the memory of the methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
[0118] As an example, an image processing device can be implemented as a Central Processing Unit (CPU), which serves as the core of a computer system for computation and control, and is the final execution unit for information processing and program execution. Alternatively, an image processing device can also be implemented as a Graphics Processing Unit (GPU), a microprocessor specifically designed for performing image and graphics-related computations on personal computers, workstations, game consoles, and some mobile devices.
[0119] As an application scenario, the image processing method provided in this disclosure can be used for video image processing to achieve video feature conversion, such as converting 2K SDR video into 4K HDR video. A device integrating this image processing method can be called a video processing device and can be implemented, for example, as code running on a GPU or CPU. Figure 13 shows a schematic block diagram of a video processing apparatus according to an embodiment of the present disclosure. As shown in Figure 13, the hardware device can receive input video, perform image processing such as dynamic range expansion, noise reduction, color mapping, and resolution enhancement, and then output the processed video for display. Furthermore, the device can receive user input indicating which image processing operations to perform; for example, the user can choose whether to perform inverse tone mapping or select a color mapping template, without limitation. The video processing device shown in Figure 13 can thus achieve flexible video image processing.
[0120] The method or computing system according to the embodiments of this disclosure can also be implemented using the architecture of the computing device 3000 shown in Figure 14. As shown in Figure 14, the computing device 3000 may include a bus 3010, one or more CPUs 3020, a read-only memory (ROM) 3030, a random access memory (RAM) 3040, a communication port 3050 connected to a network, an input/output component 3060, a hard disk 3070, and so on. Storage devices in the computing device 3000, such as the ROM 3030 or the hard disk 3070, may store various data or files used for processing and/or communication in the image processing method provided in this disclosure, as well as program instructions executed by the CPU. The computing device 3000 may also include a user interface 3080. Of course, the architecture shown in Figure 14 is merely exemplary, and one or more components of the computing device shown in Figure 14 can be omitted as needed when implementing different devices.
[0121] According to another aspect of this disclosure, a computer-readable storage medium is also provided. Figure 15 shows a schematic diagram 4000 of a storage medium according to the present disclosure.
[0122] As shown in Figure 15, computer storage medium 4020 stores computer-readable instructions 4010. When the computer-readable instructions 4010 are executed by a processor, the image processing method described with reference to the above figures can be performed. The computer-readable storage medium includes, but is not limited to, volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, etc. For example, computer storage medium 4020 can be connected to a computing device such as a computer, and then, when the computing device executes the computer-readable instructions 4010 stored on computer storage medium 4020, the image processing method provided according to the embodiments of this disclosure as described above can be performed.
[0123] Those skilled in the art will understand that the contents disclosed herein can be varied and modified in many ways. For example, the various devices or components described above can be implemented in hardware, or in software, firmware, or a combination of some or all of the three.
[0124] Furthermore, although this disclosure makes various references to certain elements of systems according to its embodiments, any number of different elements may be used and run on clients and/or servers. These elements are merely illustrative, and different aspects of the system and method may use different elements.
[0125] Those skilled in the art will understand that all or part of the steps in the above methods can be implemented by a program instructing related hardware, and the program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk. Optionally, all or part of the steps in the above embodiments can also be implemented using one or more integrated circuits. Accordingly, each module / unit in the above embodiments can be implemented in hardware or as a software functional module. This disclosure is not limited to any particular combination of hardware and software.
[0126] Unless otherwise defined, all terms used herein (including technical and scientific terms) shall have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. It should also be understood that terms such as those defined in a common dictionary shall be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and not as having an idealized or highly formalized meaning, unless expressly defined herein.
[0127] The foregoing description is intended to illustrate the present disclosure and should not be construed as limiting it. While several exemplary embodiments of the present disclosure have been described, those skilled in the art will readily understand that many modifications may be made to the exemplary embodiments without departing from the novel teachings and advantages of the present disclosure. Therefore, all such modifications are intended to be included within the scope of the present disclosure as defined by the claims. It should be understood that the foregoing description is intended to illustrate the present disclosure and should not be construed as limiting it to the specific embodiments disclosed, and modifications to the disclosed embodiments and other embodiments are intended to be included within the scope of the appended claims. The present disclosure is defined by the claims and their equivalents.
Claims
1. An image processing method, comprising: processing a first image using an inverse tone mapping neural network, wherein the inverse tone mapping neural network is configured to expand the dynamic range and color gamut of the first image to obtain an expanded second image; the inverse tone mapping neural network includes a mapping network, wherein the mapping network is used to implement the expansion; the inverse tone mapping neural network further includes an attention network, wherein the input to both the mapping network and the attention network is the first image; the attention network is used to process the image content of the first image to generate correction coefficients, and the correction coefficients are used to correct the parameters of the mapping network; the attention network comprises multiple network modules, each of which includes a convolutional layer, an activation function, a pooling layer, and an instance normalization layer following the convolutional layer; and, in the attention network, the network modules are followed by a bilinearization processing layer and a self-residual network, wherein one processing path in the self-residual network includes the convolutional layer.
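The attention branch of claim 1 can be sketched in plain Python as below. This is a toy sketch resting on several assumptions that the claim does not state: every convolution is 1×1, each module runs convolution, instance normalization, activation, then pooling, the activation is ReLU, pooling is 2×2 averaging, the "bilinearization processing layer" is read as bilinear resizing back to the input resolution, and a final 1×1 convolution (hypothetical `head_w`/`head_b`) produces the coefficient layers. All parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]       # one channel: H rows x W columns
FeatureMap = List[Grid]        # C channels
Matrix = List[List[float]]     # 1x1 conv weights: [out_channels][in_channels]


def conv1x1(x: FeatureMap, weight: Matrix, bias: Sequence[float]) -> FeatureMap:
    """1x1 convolution: a per-pixel linear mix of the input channels."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(k * x[c][i][j] for c, k in enumerate(row)) + b for j in range(w)]
         for i in range(h)]
        for row, b in zip(weight, bias)
    ]


def instance_norm(x: FeatureMap, eps: float = 1e-5) -> FeatureMap:
    """Normalize each channel to zero mean and unit variance over its own pixels."""
    out = []
    for ch in x:
        vals = [v for row in ch for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        scale = 1.0 / math.sqrt(var + eps)
        out.append([[(v - mean) * scale for v in row] for row in ch])
    return out


def relu(x: FeatureMap) -> FeatureMap:
    return [[[max(0.0, v) for v in row] for row in ch] for ch in x]


def avg_pool2(x: FeatureMap) -> FeatureMap:
    """2x2 average pooling with stride 2; an odd last row or column is dropped."""
    return [
        [[(ch[2 * i][2 * j] + ch[2 * i][2 * j + 1]
           + ch[2 * i + 1][2 * j] + ch[2 * i + 1][2 * j + 1]) / 4.0
          for j in range(len(ch[0]) // 2)]
         for i in range(len(ch) // 2)]
        for ch in x
    ]


def bilinear_resize(ch: Grid, out_h: int, out_w: int) -> Grid:
    """Bilinear interpolation of one channel, with corner pixels aligned."""
    in_h, in_w = len(ch), len(ch[0])
    out = []
    for i in range(out_h):
        sy = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        dy = sy - y0
        row = []
        for j in range(out_w):
            sx = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            dx = sx - x0
            top = ch[y0][x0] * (1 - dx) + ch[y0][x1] * dx
            bottom = ch[y1][x0] * (1 - dx) + ch[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def attention_network(image: FeatureMap, modules: Sequence[Tuple[Matrix, Sequence[float]]],
                      res_w: Matrix, res_b: Sequence[float],
                      head_w: Matrix, head_b: Sequence[float]) -> FeatureMap:
    """Turn the first image into a coefficient feature map (all parameters hypothetical)."""
    h, w = len(image[0]), len(image[0][0])
    x = image
    for weight, bias in modules:
        # One network module: convolution -> instance normalization -> activation -> pooling.
        x = avg_pool2(relu(instance_norm(conv1x1(x, weight, bias))))
    x = [bilinear_resize(ch, h, w) for ch in x]        # back to the input resolution
    conv_path = conv1x1(x, res_w, res_b)                # self-residual: convolutional path
    x = [[[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(c1, c2)]
         for c1, c2 in zip(x, conv_path)]               # ... plus the skip path
    return conv1x1(x, head_w, head_b)                   # assumed head: n*m coefficient layers
```

Under these assumptions the input must be at least 2^M pixels on each side for M network modules, since every module halves the spatial size before the bilinear layer restores it; a trained implementation would typically use larger kernels and a tensor library rather than nested lists.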
2. The method according to claim 1, wherein the mapping network includes a first convolutional network, a self-residual network, and a second convolutional network, and the mapping network implements the expansion by: processing the first image using the first convolutional network to obtain a first feature map; processing the first feature map using the self-residual network to obtain a second feature map; and processing the second feature map using the second convolutional network to obtain a third feature map, which serves as the second image; wherein the correction coefficients are used to correct the parameters of the self-residual network.
3. The method according to claim 2, wherein the self-residual network comprises m self-residual modules connected in sequence, where m is an integer greater than 1, and processing the first feature map using the self-residual network includes: processing the received first feature map separately along a first processing path and a second processing path in the first self-residual module of the self-residual network to obtain a first residual feature map; and processing the (i-1)-th residual feature map obtained by the (i-1)-th self-residual module separately along the first processing path and the second processing path in the i-th self-residual module of the self-residual network to obtain an i-th residual feature map, where i is an integer greater than 1 and less than or equal to m; wherein the first processing path includes a residual convolutional layer, and the second processing path bypasses the residual convolutional layer.
4. The method according to claim 3, wherein each residual feature map has n feature layers, where n is a positive integer; the attention network processes the image content of the first image to obtain a coefficient feature map with n×m feature layers; and the coefficient feature map is used as the correction coefficients and is multiplied by the residual feature maps to correct the parameters of the self-residual network.
5. The method according to claim 2, wherein the first convolutional network includes a first convolutional layer and an activation function, and the second convolutional network includes a second convolutional layer.
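The data flow of claims 2 to 5, including the n×m coefficient layers of claim 4, can be sketched as follows. Assumptions not taken from the claims: feature maps are flattened to lists of pixel values, every convolution is 1×1, the activation is ReLU, and module i multiplies its residual feature map element-wise by the i-th group of n coefficient layers after the skip addition. Parameter names such as `w1`, `res_w`, and `w2` are hypothetical.

```python
from typing import Dict, List, Sequence

Channel = List[float]        # one feature layer, flattened to P pixel values
FeatureMap = List[Channel]
Matrix = List[List[float]]   # 1x1 conv weights: [out_channels][in_channels]


def conv1x1(x: FeatureMap, weight: Matrix, bias: Sequence[float]) -> FeatureMap:
    """1x1 convolution: a per-pixel linear mix of the input channels."""
    pixels = len(x[0])
    return [
        [sum(k * x[c][p] for c, k in enumerate(row)) + b for p in range(pixels)]
        for row, b in zip(weight, bias)
    ]


def relu(x: FeatureMap) -> FeatureMap:
    return [[max(0.0, v) for v in ch] for ch in x]


def self_residual_network(x: FeatureMap, weights: Sequence[Matrix],
                          biases: Sequence[Sequence[float]],
                          coeffs: FeatureMap) -> FeatureMap:
    """m self-residual modules; coeffs holds the n*m coefficient layers."""
    n, m = len(x), len(weights)
    if len(coeffs) != n * m:
        raise ValueError(f"expected {n * m} coefficient layers, got {len(coeffs)}")
    for i in range(m):
        # First path: residual conv layer (n x n weights, so the layer count stays n).
        conv_path = conv1x1(x, weights[i], biases[i])
        # Second path skips the conv layer; the two paths are summed.
        residual = [[a + b for a, b in zip(xc, cc)] for xc, cc in zip(x, conv_path)]
        # Correct the i-th residual feature map with its own n coefficient layers.
        group = coeffs[i * n:(i + 1) * n]
        x = [[r * g for r, g in zip(rc, gc)] for rc, gc in zip(residual, group)]
    return x


def mapping_network(image: FeatureMap, params: Dict, coeffs: FeatureMap) -> FeatureMap:
    """First conv network -> self-residual network -> second conv network."""
    first = relu(conv1x1(image, params["w1"], params["b1"]))   # conv layer + activation
    second = self_residual_network(first, params["res_w"], params["res_b"], coeffs)
    return conv1x1(second, params["w2"], params["b2"])          # third feature map = second image
```

Read this way, the attention network does not overwrite any learned weights; it rescales each module's output per layer and per pixel, which is one plausible interpretation of "correcting the parameters" in claim 4.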
6. The method according to claim 1, wherein the first image is a standard dynamic range image, and the second image is a high dynamic range image.
7. The method according to claim 1, further comprising: inputting the second image into an enhancement processing network to obtain an enhanced second image, wherein the enhancement processing network includes a noise reduction network and/or a color mapping network.
8. The method according to claim 1, wherein the first image is the k-th frame of a video, and the second image is the expanded k-th frame, where k is an integer greater than 1; the method further comprises: processing the (k-1)-th frame and the (k+1)-th frame of the video using the inverse tone mapping neural network to obtain an expanded (k-1)-th frame and an expanded (k+1)-th frame, respectively; and processing the expanded (k-1)-th, k-th, and (k+1)-th frames using a super-resolution network to obtain a super-resolution k-th frame image, wherein the resolution of the super-resolution k-th frame image is higher than that of the first image.
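A minimal sketch of the claim-8 ordering over a whole clip, assuming hypothetical callables `itm` (standing in for the inverse tone mapping neural network) and `sr` (standing in for the super-resolution network, taking the previous, current, and next frames in that order). Because each expanded frame appears in up to three neighbour windows, the sketch runs inverse tone mapping once per frame and reuses the results; skipping the first and last frames is also an assumption, since the claim only addresses a frame k that has both neighbours.

```python
from typing import Callable, List, Sequence, TypeVar

F = TypeVar("F")   # input (SDR) frame type, placeholder
G = TypeVar("G")   # expanded (HDR) frame type, placeholder
H = TypeVar("H")   # super-resolved frame type, placeholder


def itm_then_sr(frames: Sequence[F], itm: Callable[[F], G],
                sr: Callable[[G, G, G], H]) -> List[H]:
    """Expand every frame once, then super-resolve frame k from frames k-1, k, k+1.

    Frames are numbered from 1 as in the claim, so k runs from 2 to len(frames) - 1.
    """
    expanded = [itm(f) for f in frames]      # each result is reused by up to three windows
    results = []
    for k in range(2, len(frames)):          # 1-based frame numbers with both neighbours
        prev_f, cur_f, next_f = expanded[k - 2], expanded[k - 1], expanded[k]
        results.append(sr(prev_f, cur_f, next_f))
    return results
```

A real deployment might instead pad the clip by repeating its edge frames so that every frame gets an output; the claim leaves boundary handling open.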
9. The method according to claim 1, further comprising: processing the k-th, (k-1)-th, and (k+1)-th frames of a video using a super-resolution network to obtain a super-resolution k-th frame image, and using the super-resolution k-th frame image as the first image, wherein the resolution of the first image is higher than that of the k-th frame image, and k is an integer greater than 1.
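For contrast with claim 8, the claim-9 ordering for a single frame can be sketched as below, again with hypothetical `sr` and `itm` callables; here the super-resolved frame becomes the "first image" that the inverse tone mapping network expands.

```python
from typing import Callable, Sequence, TypeVar

F = TypeVar("F")   # low-resolution input frame type, placeholder
G = TypeVar("G")   # super-resolved frame type, placeholder
H = TypeVar("H")   # expanded (HDR) output type, placeholder


def sr_then_itm(frames: Sequence[F], k: int, sr: Callable[[F, F, F], G],
                itm: Callable[[G], H]) -> H:
    """Super-resolve frame k from frames k-1, k, k+1, then expand the result.

    k is a 1-based frame number, as in the claim, and must have both neighbours.
    """
    if not 2 <= k <= len(frames) - 1:
        raise ValueError(f"frame {k} needs both neighbours in a clip of {len(frames)} frames")
    high_res = sr(frames[k - 2], frames[k - 1], frames[k])   # becomes the first image
    return itm(high_res)
```

In this ordering the inverse tone mapping network runs at the higher output resolution, so for a convolutional network its per-frame cost grows with the number of output pixels; which ordering suits a given device is not settled by the claims.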
10. The method according to claim 1, wherein the inverse tone mapping neural network is trained using a content loss function.
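The claims do not define the content loss. One common choice, assumed here for illustration only, is a mean pixel-wise L1 distance between the network output and a reference HDR image; a mean squared error (L2) variant is included for comparison. Images are flattened to lists of pixel values for simplicity.

```python
from typing import Sequence


def content_loss(output: Sequence[float], reference: Sequence[float],
                 norm: str = "l1") -> float:
    """Mean pixel-wise distance between a predicted image and its reference (assumed definition)."""
    if len(output) != len(reference) or not output:
        raise ValueError("images must be non-empty and the same size")
    if norm == "l1":
        total = sum(abs(o - r) for o, r in zip(output, reference))
    elif norm == "l2":
        total = sum((o - r) ** 2 for o, r in zip(output, reference))
    else:
        raise ValueError(f"unknown norm: {norm}")
    return total / len(output)
```

The L1 form penalizes large errors, such as blown-out highlights, less steeply than the L2 form, which is one reason L1 is often preferred for image restoration losses; the disclosure itself does not state which form is used.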
11. A computing system for image processing, comprising: one or more processors; and one or more non-transitory computer-readable media storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations including: processing a first image using an inverse tone mapping neural network, wherein the inverse tone mapping neural network is configured to expand the dynamic range and color gamut of the first image to obtain an expanded second image; the inverse tone mapping neural network includes a mapping network, wherein the mapping network is used to implement the expansion; the inverse tone mapping neural network further includes an attention network, wherein the input to both the mapping network and the attention network is the first image; the attention network is used to process the image content of the first image to generate correction coefficients, and the correction coefficients are used to correct the parameters of the mapping network; the attention network comprises multiple network modules, each of which includes a convolutional layer, an activation function, a pooling layer, and an instance normalization layer following the convolutional layer; and, in the attention network, the network modules are followed by a bilinearization processing layer and a self-residual network, wherein one processing path in the self-residual network includes the convolutional layer.
12. The computing system according to claim 11, wherein the mapping network includes a first convolutional network, a self-residual network, and a second convolutional network, and the mapping network implements the expansion by: processing the first image using the first convolutional network to obtain a first feature map; processing the first feature map using the self-residual network to obtain a second feature map; and processing the second feature map using the second convolutional network to obtain a third feature map, which serves as the second image; wherein the correction coefficients are used to correct the parameters of the self-residual network.
13. The computing system according to claim 12, wherein the self-residual network comprises m self-residual modules connected in sequence, where m is an integer greater than 1, and processing the first feature map using the self-residual network includes: processing the received first feature map separately along a first processing path and a second processing path in the first self-residual module of the self-residual network to obtain a first residual feature map; and processing the (i-1)-th residual feature map obtained by the (i-1)-th self-residual module separately along the first processing path and the second processing path in the i-th self-residual module of the self-residual network to obtain an i-th residual feature map, where i is an integer greater than 1 and less than or equal to m; wherein the first processing path includes a residual convolutional layer, and the second processing path bypasses the residual convolutional layer.
14. The computing system according to claim 13, wherein each residual feature map has n feature layers, where n is a positive integer; the attention network processes the image content of the first image to obtain a coefficient feature map with n×m feature layers; and the coefficient feature map is used as the correction coefficients and is multiplied by the residual feature maps to correct the parameters of the self-residual network.
15. The computing system according to claim 11, wherein the operations further include: inputting the second image into an enhancement processing network to obtain an enhanced second image, wherein the enhancement processing network includes a noise reduction network and/or a color mapping network.
16. The computing system according to claim 11, wherein the first image is the k-th frame of a video, and the second image is the expanded k-th frame, where k is an integer greater than 1; the operations further include: processing the (k-1)-th frame and the (k+1)-th frame of the video using the inverse tone mapping neural network to obtain an expanded (k-1)-th frame and an expanded (k+1)-th frame, respectively; and processing the expanded (k-1)-th, k-th, and (k+1)-th frames using a super-resolution network to obtain a super-resolution k-th frame image, wherein the resolution of the super-resolution k-th frame image is higher than that of the first image.
17. The computing system according to claim 11, wherein the operations further include: processing the k-th, (k-1)-th, and (k+1)-th frames of a video using a super-resolution network to obtain a super-resolution k-th frame image, and using the super-resolution k-th frame image as the first image, wherein the resolution of the first image is higher than that of the k-th frame image, and k is an integer greater than 1.
18. The computing system according to claim 11, wherein the inverse tone mapping neural network is trained using a content loss function, and wherein the first image is a standard dynamic range image and the second image is a high dynamic range image.
19. An image processing apparatus, comprising: a processor; and a memory, wherein the memory stores computer-readable code that, when executed by the processor, performs the image processing method according to any one of claims 1-10.
20. A non-transitory computer-readable storage medium having instructions stored thereon that, when executed by a processor, cause the processor to perform the image processing method according to any one of claims 1-10.