Model training method, image processing method, device and electronic equipment

By training a network model to directly generate high dynamic range images using ambient brightness information, the ghosting problem caused by moving subjects is solved, thus improving image quality.

CN116797886B · Active · Publication Date: 2026-06-23 · VIVO MOBILE COMM CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
VIVO MOBILE COMM CO LTD
Filing Date
2023-08-11
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

When the subject is in motion, the ghosting phenomenon caused by existing image fusion methods affects image quality.

Method used

By acquiring raw image data, brightness information datasets, and high dynamic range image datasets, the first network model is trained to generate a second network model, which directly generates high dynamic range images using ambient brightness information, thus avoiding the image fusion process.

Benefits of technology

It effectively avoids ghosting caused by the movement of the subject, thus improving image quality.

✦ Generated by Eureka AI based on patent content.

Figure CN116797886B_ABST

Abstract

The application discloses a model training method, an image processing method, a device and an electronic device, and belongs to the technical field of image processing. The model training method comprises the following steps: acquiring an original image dataset, a luminance information dataset and a high dynamic range image dataset, wherein the luminance information dataset comprises environmental luminance information of a shooting scene corresponding to each original image in the original image dataset, and the high dynamic range image dataset comprises a high dynamic range image corresponding to each original image in the original image dataset; training a first network model based on the original image dataset, the luminance information dataset and the high dynamic range image dataset to obtain a second network model; wherein the input of the second network model comprises original image data and corresponding environmental luminance information, and the output of the second network model comprises high dynamic range image data corresponding to the original image data.

Description

Technical Field

[0001] This application belongs to the field of image processing technology, specifically relating to a model training method, an image processing method, an apparatus, and an electronic device.

Background Technology

[0002] Currently, backlighting and night scenes are very common shooting scenarios. Therefore, some high dynamic range (HDR) algorithms have been designed to enable the capture of HDR images in these scenarios, thereby improving shooting quality.

[0003] In existing technologies, a combination of long, medium, and short exposures is used to capture details at different brightness levels in a scene. Then, image fusion technology is used to combine low dynamic range (LDR) images with different exposure times to obtain the final HDR image. However, if the subject is in motion, the fused image may exhibit varying degrees of ghosting, affecting image quality.

Summary of the Invention

[0004] The purpose of this application is to provide a model training method that can solve the problem that when the subject is in motion, the fused image will have varying degrees of ghosting, affecting image quality.

[0005] In a first aspect, embodiments of this application provide a model training method, which includes: acquiring an original image dataset, a brightness information dataset, and a high dynamic range image dataset, wherein the brightness information dataset includes ambient brightness information of the shooting scene corresponding to each original image in the original image dataset, and the high dynamic range image dataset includes a high dynamic range image corresponding to each original image in the original image dataset; training a first network model based on the original image dataset, the brightness information dataset, and the high dynamic range image dataset to obtain a second network model; wherein the input of the second network model includes the original image data and the corresponding ambient brightness information, and the output of the second network model includes the high dynamic range image data corresponding to the original image data.

[0006] In a second aspect, embodiments of this application provide an image processing method, which includes: inputting an image to be processed into the second network model described in the first aspect to obtain a high dynamic range image corresponding to the image to be processed.

[0007] In a third aspect, embodiments of this application provide a model training apparatus, comprising: an acquisition module for acquiring an original image dataset, a brightness information dataset, and a high dynamic range image dataset, wherein the brightness information dataset includes ambient brightness information of the shooting scene corresponding to each original image in the original image dataset, and the high dynamic range image dataset includes a high dynamic range image corresponding to each original image in the original image dataset; and a training module for training a first network model based on the original image dataset, the brightness information dataset, and the high dynamic range image dataset to obtain a second network model; wherein the input of the second network model includes the original image data and the corresponding ambient brightness information, and the output of the second network model includes the high dynamic range image data corresponding to the original image data.

[0008] In a fourth aspect, embodiments of this application provide an image processing apparatus, which includes: a processing module for inputting an image to be processed into the second network model described in the first aspect to obtain a high dynamic range image corresponding to the image to be processed.

[0009] In a fifth aspect, embodiments of this application provide an electronic device including a processor and a memory, wherein the memory stores programs or instructions executable on the processor, and the programs or instructions, when executed by the processor, implement the steps of the method described in the first or second aspect.

[0010] In a sixth aspect, embodiments of this application provide a readable storage medium on which a program or instructions are stored, which, when executed by a processor, implement the steps of the method described in the first or second aspect.

[0011] In a seventh aspect, embodiments of this application provide a chip, the chip including a processor and a communication interface, the communication interface being coupled to the processor, the processor being used to run programs or instructions to implement the methods described in the first or second aspect.

[0012] In an eighth aspect, embodiments of this application provide a computer program product stored in a storage medium, which is executed by at least one processor to implement the method described in the first or second aspect.

[0013] Thus, in the embodiments of this application, during model training, the original image data obtained from a single exposure and the corresponding ambient brightness information of the shooting scene are used as a training sample. The first network model processes this sample to obtain a high dynamic range (HDR) image, and this HDR image is then used together with a reference HDR image to train the first network model. On this basis, the first network model is trained repeatedly using the original image dataset, the brightness information dataset, and the HDR image dataset to obtain the second network model. Consequently, inputting the original image data obtained from a single exposure and the corresponding ambient brightness information into the second network model directly yields HDR image data, from which an HDR image is generated. In other words, a single low dynamic range (LDR) image is used to generate the corresponding HDR image, which avoids image fusion and thus prevents the varying degrees of ghosting that a moving subject causes during image fusion.

Description of the Drawings

[0014] Figure 1 is the first flowchart of the model training method according to an embodiment of this application;

[0015] Figure 2 is a schematic diagram illustrating the model training method according to an embodiment of this application;

[0016] Figure 3 is the second flowchart of the model training method according to an embodiment of this application;

[0017] Figure 4 is the third flowchart of the model training method according to an embodiment of this application;

[0018] Figure 5 is a flowchart of an image processing method according to an embodiment of this application;

[0019] Figure 6 is a block diagram of a model training apparatus according to an embodiment of this application;

[0020] Figure 7 is a block diagram of an image processing apparatus according to an embodiment of this application;

[0021] Figure 8 is the first schematic diagram of the hardware structure of the electronic device according to an embodiment of this application;

[0022] Figure 9 is the second schematic diagram of the hardware structure of the electronic device according to an embodiment of this application.

Detailed Description

[0023] The technical solutions of the embodiments of this application will be clearly described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application are within the scope of protection of this application.

[0024] The terms "first," "second," etc., used in this specification distinguish similar objects and do not describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate, so that embodiments of this application can be implemented in orders other than those illustrated or described herein. The objects distinguished by "first," "second," etc., are generally of the same class, and the number of objects is not limited; for example, a first object can be one or more. Furthermore, in the specification and claims, "and/or" indicates at least one of the connected objects, and the character "/" generally indicates that the preceding and following objects are in an "or" relationship.

[0025] The model training method provided in this application embodiment can be executed by the model training device provided in this application embodiment, or by an electronic device integrating the model training device, wherein the model training device can be implemented in hardware or software.

[0026] The model training method provided in this application will be described in detail below with reference to the accompanying drawings, through specific embodiments and application scenarios.

[0027] Figure 1 shows a flowchart of a model training method according to an embodiment of this application, taking the application of the method to an electronic device as an example. The method includes:

[0028] Step 110: Obtain the original image dataset, the brightness information dataset, and the high dynamic range image dataset. The brightness information dataset includes the ambient brightness information of the shooting scene corresponding to each original image in the original image dataset, and the high dynamic range image dataset includes the high dynamic range image corresponding to each original image in the original image dataset.

[0029] In this embodiment, the training dataset consists of three parts: the original image dataset, the brightness information dataset, and the high dynamic range image dataset. The original image dataset includes the image data $I_{ev0}$ of multiple original images taken with normal exposure. The brightness information dataset includes the ambient brightness information $I_{lux}$ of the shooting scene corresponding to each original image. The high dynamic range image dataset includes the high dynamic range images $I_{hdr'}$ taken in the same shooting scenes as the original images.

[0030] The image data $I_{ev0}$ of one original image, the corresponding ambient brightness information $I_{lux}$, and the corresponding high dynamic range image $I_{hdr'}$ together form one training sample.
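The grouping of one original image, its ambient brightness information, and its reference high dynamic range image into a training sample can be sketched as a small data structure. This is only an illustrative sketch: the names `TrainingSample` and `build_samples`, and the representation of every item as a list of pixel rows, are assumptions and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical representation: a single-channel image as rows of pixel values.
Image = List[List[float]]


@dataclass(frozen=True)
class TrainingSample:
    """One training sample: original image, ambient brightness, reference HDR image."""
    original: Image       # image data captured with normal exposure
    ambient_lux: Image    # ambient brightness information of the same scene
    reference_hdr: Image  # reference HDR image of the same scene


def build_samples(originals: Sequence[Image],
                  lux_maps: Sequence[Image],
                  hdr_refs: Sequence[Image]) -> List[TrainingSample]:
    """Pair the three datasets index by index into training samples."""
    if not (len(originals) == len(lux_maps) == len(hdr_refs)):
        raise ValueError("each original image needs one brightness entry and one HDR reference")
    return [TrainingSample(o, l, h) for o, l, h in zip(originals, lux_maps, hdr_refs)]
```

The length check reflects the requirement that both the brightness dataset and the HDR dataset contain one entry for each original image.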

[0031] Step 120: Train the first network model based on the original image dataset, brightness information dataset, and high dynamic range image dataset to obtain the second network model.

[0032] The input to the second network model includes the original image data and the corresponding ambient brightness information, and the output of the second network model includes the high dynamic range image data corresponding to the original image data.

[0033] In this step, a second network model is obtained based on the training of the first network model. In practical applications, the original image data taken under normal exposure based on the shooting scene, as well as the ambient brightness information of the shooting scene, are input into the second network model. The second network model outputs high dynamic range image data, and a high dynamic range image can be obtained based on the high dynamic range image data.

[0034] Thus, in the embodiments of this application, during model training, the original image data obtained from a single exposure and the corresponding ambient brightness information of the shooting scene are used as a training sample. The first network model processes this sample to obtain a high dynamic range (HDR) image, and this HDR image is then used together with a reference HDR image to train the first network model. On this basis, the first network model is trained repeatedly using the original image dataset, the brightness information dataset, and the HDR image dataset to obtain the second network model. Consequently, inputting the original image data obtained from a single exposure and the corresponding ambient brightness information into the second network model directly yields HDR image data, from which an HDR image is generated. In other words, a single low dynamic range (LDR) image is used to generate the corresponding HDR image, which avoids image fusion and thus prevents the varying degrees of ghosting that a moving subject causes during image fusion.

[0035] In some embodiments of the model training method of this application, the first network model includes an encoder and a mapping network.

[0036] In the process of this embodiment, step 120 includes:

[0037] Sub-step A1: Input the original image data from the original image dataset and the corresponding ambient brightness information from the brightness information dataset into the encoder to generate the first feature data and the second feature data.

[0038] In this step, taking one training sample as an example, the original image data $I_{ev0}$ and the corresponding ambient brightness information $I_{lux}$ are input into the encoder (Enc).

[0039] In some embodiments, the encoder (Enc) uses a pre-trained Visual Geometry Group (VGG)-19 network, which consists of convolutional layers, pooling layers, and activation layers.

[0040] Furthermore, the encoder (Enc) outputs first feature data and second feature data.

[0041] Sub-step A2: Input the first feature data and the second feature data into the mapping network to obtain the third feature data, which indicates the feature information of the high dynamic range image.

[0042] In this step, the first and second feature data are input into the MapNet network, and the third feature data is output.

[0043] The third feature data indicates the feature information of the high dynamic range image and is denoted as $t_{hdr} = \mathrm{MapNet}(t_{ev0}, t_{lux})$. The feature information includes pixel values at various locations in the high dynamic range image.

[0044] In this embodiment, the original image data and ambient brightness information are input into the encoder of the first network model. The encoder outputs corresponding first and second feature data. The output feature data expresses the original image data and ambient brightness information in a specific form, which can be used to output third feature data after being input into the mapping network in the next step. The third feature data is used to indicate the feature information of the high dynamic range image, and the feature information of the high dynamic range image can be used to generate a high dynamic range image. It can be seen that the two steps of processing the original image data and ambient brightness information based on this embodiment are used for the subsequent autonomous generation of high dynamic range images by the first network model.

[0045] In the flow of the model training method in some embodiments of this application, step A2 includes:

[0046] Sub-step B1: Expand the first feature data based on the first bit value to obtain the first expanded feature data; expand the second feature data based on the second bit value to obtain the second expanded feature data; the first bit value is the value of the first position in the first feature data, and the second bit value is the value of the second feature data corresponding to the first position.

[0047] Sub-step B2: Perform a dot product of the first extended feature data and the second extended feature data to obtain the first mapping data, which indicates the mapping data of the first position.

[0048] Sub-step B3: Based on the first mapping data and the second mapping data of the third position around the first position, obtain the third feature data.

[0049] In this embodiment, the algorithm involved in the mapping network is proposed.

[0050] In the mapping network, Formula 1 is used:

[0051]

[0052] In Formula 1, padding is used to represent the expansion of a bit value, where edge bits are expanded using the emission expansion method.

[0053] The values corresponding to the same position (taking the first position as an example) in the first feature data and the second feature data are each expanded to obtain the first expanded feature data and the second expanded feature data, respectively. The first expanded feature data and the second expanded feature data are then dot-multiplied to obtain the first mapping data, which is the content represented by the first term in Formula 1.

[0054] The first mapping data is used to represent the mapping data of the first position.

[0055] Furthermore, to obtain the third feature data, the first mapping data is needed, and the second mapping data of the third position around the position (continuing to take the first position as an example) is fused through the weight parameter β.

[0056] For example, taking the first position as an example, after expansion, we can obtain nine positions in a "3×3" matrix. Therefore, when obtaining the feature information of the first position in the high dynamic range image, on the one hand, we need to obtain the mapping data of the first position, and on the other hand, we need to fuse the mapping data of the other eight positions through the weight parameter β.

[0057] In some embodiments, the weight parameter β is a 3×3 matrix satisfying $\beta_{2,2} = 1$, and the weight parameter β is the weight parameter corresponding to any third position after expansion.

[0058] In this embodiment, the same length (H), width (W), and number of feature channels (C) are used to represent the same position in the first feature data, the second feature data, and the third feature data.

[0059] Correspondingly, the latter term in Formula 1 represents the second mapping data of at least one third position fused through the weight parameter β. The number of third positions varies depending on the expansion method, and there may be at least one.

[0060] Figure 2 illustrates the computation process in the mapping network. The values corresponding to the same position (e.g., the first position) in the first feature data and the second feature data are each expanded and then dot-multiplied, and the results are fused through the weight parameter β to obtain the pixel value in the third feature data corresponding to the first position.

[0061] In this embodiment, the mapping network maps the value corresponding to each position in the first feature data to the value corresponding to each position in the second feature data one-to-one. At the same time, for each position it maps and merges the values of the surrounding positions, thereby avoiding image tearing caused by abrupt changes in the mapping at a certain position. Thus, in this embodiment, ambient brightness information is used to map the values of each position in the original image to the corresponding values in the high dynamic range image, thereby obtaining the high dynamic range image.
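Because Formula 1 is not reproduced in the text, the following is only one possible reading of the mapping computation, offered as a sketch. It assumes single-channel feature maps of equal size, assumes that the edge expansion behaves like reflection padding, and assumes that the fused value at a position is the β-weighted sum of the nine neighbourhood products, with the centre weight equal to 1. The function names `reflect_index` and `map_features` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def reflect_index(i: int, n: int) -> int:
    """Mirror an out-of-range index back into [0, n) (assumed reflection padding)."""
    if n == 1:
        return 0
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def map_features(t_ev0: Matrix, t_lux: Matrix, beta: Matrix) -> Matrix:
    """Fuse two same-sized feature maps into a third one.

    For each position, the 3x3 neighbourhoods of both maps are multiplied
    element-wise and the nine products are summed using the weights in beta.
    beta[1][1] is the centre weight (beta_{2,2} in 1-based notation).
    """
    height, width = len(t_ev0), len(t_ev0[0])
    t_hdr = [[0.0] * width for _ in range(height)]
    for h in range(height):
        for w in range(width):
            total = 0.0
            for dh in (-1, 0, 1):
                for dw in (-1, 0, 1):
                    hh = reflect_index(h + dh, height)
                    ww = reflect_index(w + dw, width)
                    total += beta[dh + 1][dw + 1] * t_ev0[hh][ww] * t_lux[hh][ww]
            t_hdr[h][w] = total
    return t_hdr
```

With a β whose only non-zero entry is the centre, the result reduces to the plain position-wise product of the two feature maps; non-zero surrounding weights blend in the neighbouring products, which is the smoothing effect the paragraph above attributes to the mapping network.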

[0062] In some embodiments of the model training method of this application, the first network model includes a decoder.

[0063] In the process of this embodiment, step 120 includes:

[0064] Sub-step C1: Input the third feature data into the decoder and output the first high dynamic range image.

[0065] In some embodiments, the decoder (Dec) employs an inverted VGG structure, consisting of convolutional layers, upsampling layers, and activation layers. With $t_{hdr}$ as input, the decoder outputs a first high dynamic range image, denoted as $I_{hdr} = \mathrm{Dec}(t_{hdr})$.

[0066] In this embodiment, based on the aforementioned third feature data and the processing of the decoder, a first high dynamic range image is output, which is subsequently used for discrimination against a reference high dynamic range image.

[0067] In the model training method flow of some embodiments of this application, step 120 further includes:

[0068] Sub-step D1: Calculate the content loss based on the first high dynamic range image and the corresponding original image data in the original image dataset; calculate the brightness information loss based on the first high dynamic range image and the corresponding ambient brightness information in the brightness information dataset.

[0069] During training, the first and second feature data are obtained from the encoder, and the outputs of the encoder's ReLU4_1 layer and ReLU1_1 layer are saved.

[0070] For reference, the content loss is defined as the following function:

[0071]

[0072] where the feature term denotes the output of the encoder's ReLU4_1 layer.

[0073] For reference, the brightness information loss is defined as the following function:

[0074]

[0075] where the feature term denotes the output of the encoder's ReLU1_1 layer.

[0076] Sub-step D2: Update the parameters of the encoder, mapping network, and decoder based on content loss and luminance information loss.

[0077] In this step, the content loss and the brightness information loss are calculated, and the parameters of the encoder (Enc), decoder (Dec), and mapping network (MapNet) are updated using the backpropagation algorithm.

[0078] In this embodiment, content loss is calculated between the image data of the first high dynamic range image and the corresponding original image data; and luminance information loss is calculated between the image data of the first high dynamic range image and the corresponding ambient luminance information. Furthermore, based on the content loss and luminance information loss, the parameters in the relevant encoder, mapping network, and decoder are updated to make the data processing by the encoder, mapping network, and decoder more accurate.
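The loss formulas themselves are not reproduced in the text, so the following sketch is an assumption rather than the patent's definition. It supposes that each loss is a mean squared difference between encoder feature maps: ReLU4_1 features for the content loss and ReLU1_1 features for the brightness information loss. The feature extractors are passed in as hypothetical callables, and feature maps are flattened to plain sequences for simplicity.

```python
from typing import Callable, Sequence

# Hypothetical representation: a flattened feature map.
Features = Sequence[float]


def feature_mse(a: Features, b: Features) -> float:
    """Mean squared difference between two equally sized feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def content_loss(relu4_1: Callable[[object], Features], i_hdr, i_ev0) -> float:
    """Assumed form: compare ReLU4_1 features of the generated and original images."""
    return feature_mse(relu4_1(i_hdr), relu4_1(i_ev0))


def brightness_loss(relu1_1: Callable[[object], Features], i_hdr, i_lux) -> float:
    """Assumed form: compare ReLU1_1 features of the generated image and the brightness input."""
    return feature_mse(relu1_1(i_hdr), relu1_1(i_lux))
```

Using a deeper layer for content and a shallower layer for brightness follows the layer choice stated in the text; the squared-error form is the part that is assumed here.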

[0079] In some embodiments of the model training method of this application, the first network model further includes a discriminator.

[0080] In the process of this embodiment, step 120 includes:

[0081] Sub-step E1: Input the first high dynamic range image and the corresponding second high dynamic range image from the high dynamic range image dataset into the discriminator for classification processing to obtain the classification result.

[0082] In some embodiments, the discriminator (D) is a binary classification network consisting of convolutional layers, pooling layers, and activation layers, with a fully connected layer at the end of the network to perform the classification function.

[0083] The generated first high dynamic range image and a reference second high dynamic range image are used as inputs, and the discriminator (D) classifies them to produce a classification result. The classification result indicates whether the first high dynamic range image is the second high dynamic range image, and can be expressed as $P = D(I_{hdr}, I_{hdr'})$, where $I_{hdr}$ denotes the first high dynamic range image and $I_{hdr'}$ denotes the second high dynamic range image. The larger the value of P, the greater the probability that the first high dynamic range image is the second high dynamic range image.

[0084] Sub-step E2: Calculate the generative adversarial network loss based on the classification result.

[0085] For reference, the generative adversarial network loss is defined as the following function:

[0086]

[0087] Sub-step E3: Update the parameters of the encoder, decoder, and discriminator based on the generative adversarial network loss.

[0088] In this step, the adversarial loss is calculated. The backpropagation algorithm is then used to update the parameters of the encoder (Enc), decoder (Dec), and discriminator (D).

[0089] Combined with the previous embodiment, the training of the entire model can be described as a minimization problem. Therefore, the loss function for model training is constructed as follows:

[0090]

[0091] where $\lambda_{gan}$, $\lambda_{c}$, and $\lambda_{lux}$ denote the weights of the generative adversarial network loss, the content loss, and the luminance information loss, respectively. During model training, the weights of the three loss functions can be set to $\lambda_{gan} = 0.5$, $\lambda_{c} = 1$, and $\lambda_{lux} = 0.6$, and can be adaptively adjusted during training based on model convergence and the quality of high dynamic range image generation.
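As an illustration of how three weighted loss terms are commonly combined, the sketch below assumes the overall loss is a plain weighted sum. The formula itself is not reproduced in the text, so the sum form, the function name `total_loss`, and the argument names are assumptions; only the default weight values 0.5, 1, and 0.6 are taken from the description.

```python
def total_loss(l_gan: float, l_c: float, l_lux: float,
               lambda_gan: float = 0.5,
               lambda_c: float = 1.0,
               lambda_lux: float = 0.6) -> float:
    """Weighted sum of the adversarial, content, and brightness losses (assumed form)."""
    return lambda_gan * l_gan + lambda_c * l_c + lambda_lux * l_lux
```

Keeping the weights as keyword arguments makes it straightforward to adjust them during training, as the description suggests.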

[0092] The model is trained using a large number of training samples and the loss function constructed above, until the model converges.

[0093] In this embodiment, for the generated first high dynamic range image and the second high dynamic range image in the training set, the discriminator performs classification processing to obtain the probability that the first high dynamic range image is classified as the second high dynamic range image. Based on the obtained result, the parameters in the encoder, mapping network, decoder and discriminator are updated to make the difference between the first high dynamic range image and the second high dynamic range image minimal, thereby ensuring that the trained network model can be used to output accurate high dynamic range images.

[0094] In the model training methods of some embodiments of this application, the training steps are as shown in Figure 3 and Figure 4.

[0095] Step 1: Extract a training sample from the training dataset, input the original image data $I_{ev0}$ and the ambient brightness information $I_{lux}$ into the encoder (Enc) to obtain the corresponding feature data $t_{ev0}$ and $t_{lux}$, and save the output of the encoder's ReLU1_1 layer.

[0096] Step 2: Use MapNet to fuse the feature data $t_{ev0}$ and $t_{lux}$ to obtain the third feature data.

[0097] Step 3: Generate the first high dynamic range image $I_{hdr}$ using the decoder (Dec).

[0098] Step 4: Calculate the content loss and the brightness information loss, and update the parameters of the encoder (Enc), decoder (Dec), and mapping network (MapNet) using the backpropagation algorithm.

[0099] Step 5: Use the discriminator (D) to classify the first high dynamic range image $I_{hdr}$ and the second high dynamic range image $I_{hdr'}$.

[0100] Step 6: Calculate the adversarial loss, and update the parameters of the encoder (Enc), decoder (Dec), and discriminator (D) using the backpropagation algorithm.

[0101] Step 7: Repeat steps 1 through 6 using multiple training samples until the loss function converges and stabilizes.
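The seven steps above can be sketched as a framework-agnostic loop. Everything here is a simplified assumption: the networks and the two update routines are passed in as hypothetical callables, each update routine is expected to apply its own parameter update and return a loss value, and convergence is approximated by the change in the summed loss between two passes over the samples falling below a tolerance.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical sample layout: (original image, ambient brightness, reference HDR image).
Sample = Tuple[object, object, object]


def train(samples: Iterable[Sample],
          encoder: Callable, mapnet: Callable, decoder: Callable,
          discriminator: Callable,
          reconstruction_step: Callable, adversarial_step: Callable,
          tolerance: float = 1e-3, max_epochs: int = 100) -> List[float]:
    """Run steps 1-6 for every sample and stop once the summed loss stabilises."""
    samples = list(samples)
    history: List[float] = []
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for i_ev0, i_lux, i_hdr_ref in samples:
            t_ev0, t_lux = encoder(i_ev0, i_lux)                    # step 1
            t_hdr = mapnet(t_ev0, t_lux)                            # step 2
            i_hdr = decoder(t_hdr)                                  # step 3
            epoch_loss += reconstruction_step(i_hdr, i_ev0, i_lux)  # step 4
            p = discriminator(i_hdr, i_hdr_ref)                     # step 5
            epoch_loss += adversarial_step(p)                       # step 6
        history.append(epoch_loss)
        # Step 7: stop when the loss changes by less than `tolerance`.
        if len(history) > 1 and abs(history[-2] - history[-1]) < tolerance:
            break
    return history
```

The returned history of summed losses makes it easy to inspect whether training stopped because the loss stabilised or because `max_epochs` was reached.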

[0102] In this application, network iteration makes the high dynamic range image generated by the first network model more detailed.

[0103] The image processing method provided in this application embodiment can be executed by the image processing device provided in this application embodiment, or by an electronic device integrating the image processing device, wherein the image processing device can be implemented in hardware or software.

[0104] The image processing method provided in this application will be described in detail below with reference to the accompanying drawings, through specific embodiments and application scenarios.

[0105] Figure 5 shows a flowchart of an image processing method according to an embodiment of this application, taking as an example the application of the method to an electronic device running the target model described in the foregoing embodiment. The method includes:

[0106] Step 130: Input the image to be processed into the second network model to obtain the high dynamic range image corresponding to the image to be processed.

[0107] In some embodiments, the sensor is exposed normally to acquire image data of the image to be processed, while an auxiliary photosensitive device is used to acquire ambient brightness information of the corresponding shooting scene.

[0108] The second network model includes the encoder, mapping network, and decoder from the first network model.

[0109] In this embodiment, the image data of the image to be processed and the ambient brightness information are input into the second network model, and a high dynamic range image is output.

[0110] The high dynamic range (HDR) image is represented as: $I_{hdr} = G(I_{ev0}, I_{lux})$.
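Since the second network model is described as containing the encoder, mapping network, and decoder, the generator G can be viewed as their composition. The sketch below illustrates that composition with hypothetical callables standing in for the trained networks; the name `make_generator` is an assumption.

```python
from typing import Callable


def make_generator(encoder: Callable, mapnet: Callable, decoder: Callable) -> Callable:
    """Compose encoder -> mapping network -> decoder into a single callable G."""
    def generator(i_ev0, i_lux):
        # Encode both inputs, fuse the two feature sets, then decode the HDR image.
        t_ev0, t_lux = encoder(i_ev0, i_lux)
        return decoder(mapnet(t_ev0, t_lux))
    return generator
```

The discriminator does not appear in this composition, which matches the statement that it is used only during training.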

[0111] Thus, in the embodiments of this application, during model training, the original image data obtained from a single exposure and the corresponding ambient brightness information of the shooting scene are used as a training sample. The first network model processes this sample to obtain a high dynamic range (HDR) image, and this HDR image is then used together with a reference HDR image to train the first network model. On this basis, the first network model is trained repeatedly using the original image dataset, the brightness information dataset, and the HDR image dataset to obtain the second network model. Consequently, inputting the original image data obtained from a single exposure and the corresponding ambient brightness information into the second network model directly yields HDR image data, from which an HDR image is generated. In other words, a single low dynamic range (LDR) image is used to generate the corresponding HDR image, which avoids image fusion and thus prevents the varying degrees of ghosting that a moving subject causes during image fusion.

[0112] In summary, the purpose of this application is to provide a neural-network-based method for generating a high dynamic range (HDR) image from a single frame. The method takes image data acquired by a sensor with normal exposure, together with ambient brightness information of the shooting scene acquired by a photosensitive device (or a black-and-white sensor), and generates the HDR image directly through a neural network. The neural network uses a generative adversarial network (GAN) as its overall framework and consists of a generator network and a discriminator network. The generator network includes an encoder, a mapping network, and a decoder, while the discriminator network includes a discriminator. Specifically, the generator network takes as input the image data $I_{ev0}$ obtained by the sensor with normal exposure and the ambient brightness information $I_{lux}$, and adjusts each pixel value based on the ambient brightness information of the corresponding pixel (where the pixel value is obtained through exposure of the sensor, and the ambient brightness information is obtained through the monochrome sensor), so that rich image detail is obtained in areas of different brightness, achieving the goal of generating an HDR image. The discriminator network is used only during training of the network model. Compared with traditional methods, this approach no longer fuses multiple low dynamic range images with different exposures, but directly uses a single low dynamic range image to generate the corresponding HDR image, which improves the performance and power efficiency of the electronic device and avoids the ghosting problem caused by image fusion.

[0113] The model training method provided in this application can be executed by a model training device. This application uses an example of a model training device executing the model training method to illustrate the model training device provided in this application.

[0114] Figure 6 shows a block diagram of a model training apparatus according to an embodiment of this application. The apparatus includes:

[0115] The acquisition module 10 is used to acquire the original image dataset, the brightness information dataset, and the high dynamic range image dataset. The brightness information dataset includes the ambient brightness information of the shooting scene corresponding to each original image in the original image dataset, and the high dynamic range image dataset includes the high dynamic range image corresponding to each original image in the original image dataset.

[0116] Training module 20 is used to train the first network model based on the original image dataset, brightness information dataset and high dynamic range image dataset to obtain the second network model;

[0117] The input to the second network model includes the original image data and the corresponding ambient brightness information, and the output of the second network model includes the high dynamic range image data corresponding to the original image data.
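The pairing that the acquisition module establishes between the three datasets can be illustrated with a minimal sketch. It assumes, for illustration only, that each image and each brightness map is a plain 2-D grid of numbers; the names TrainingSample and build_samples are hypothetical and do not appear in this application.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical representation: a 2-D grid of pixel values.
Grid = List[List[float]]


@dataclass(frozen=True)
class TrainingSample:
    original: Grid       # one normally exposed original image
    brightness: Grid     # ambient brightness of the same shooting scene
    reference_hdr: Grid  # HDR image corresponding to the original image


def build_samples(originals: List[Grid],
                  brightness_maps: List[Grid],
                  hdr_images: List[Grid]) -> List[TrainingSample]:
    """Pair the i-th entry of each dataset into one training sample."""
    # Each original image must have exactly one brightness map and one HDR image.
    if not (len(originals) == len(brightness_maps) == len(hdr_images)):
        raise ValueError("the three datasets must hold one entry per original image")
    return [TrainingSample(o, b, h)
            for o, b, h in zip(originals, brightness_maps, hdr_images)]
```

The length check reflects the requirement that the brightness information dataset and the HDR image dataset each contain an entry for every original image.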

[0118] Thus, in the embodiments of this application, during model training, the original image data obtained from a single exposure and the corresponding ambient brightness information of the shooting scene are used together as one training sample. The first network model processes this sample to produce a high dynamic range (HDR) image, which is then used together with a reference HDR image to train the first network model. Repeating this training over the original image dataset, the brightness information dataset, and the HDR image dataset yields the second network model. Inputting the original image data from a single exposure and the corresponding ambient brightness information into the second network model therefore produces HDR image data directly, from which an HDR image is generated. In other words, a single low dynamic range (LDR) image is used to generate the corresponding HDR image, which avoids image fusion and thus prevents the varying degrees of ghosting that a moving subject causes during image fusion.

[0119] In some implementations, the first network model includes an encoder and a mapping network; the training module 20 is also used for:

[0120] The original image data from the original image dataset and the corresponding ambient brightness information from the brightness information dataset are input into the encoder to generate the first feature data and the second feature data.

[0121] The first and second feature data are input into the mapping network to obtain the third feature data, which indicates the feature information of the high dynamic range image.

[0122] In some embodiments, the training module 20 is further configured to:

[0123] The first feature data is expanded based on the first bit value to obtain the first expanded feature data, and the second feature data is expanded based on the second bit value to obtain the second expanded feature data; the first bit value is the value of the first position in the first feature data, and the second bit value is the value of the second feature data corresponding to the first position.

[0124] A dot product is performed on the first expanded feature data and the second expanded feature data to obtain the first mapping data, which indicates the mapping data of the first position.

[0125] The third feature data is obtained based on the first mapping data and the second mapping data of the third position around the first position.
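The three steps above (expansion, dot product, combination with surrounding positions) can be sketched as follows. This passage does not specify the form of the expansion or how the surrounding mapping data is combined, so the sketch makes two loudly labeled assumptions: expansion repeats a value k times, and the combination is a mean over the 3x3 neighbourhood. All function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def expand(value: float, k: int) -> List[float]:
    # ASSUMPTION: expansion repeats the value k times; the actual form is not given here.
    return [value] * k


def mapping_data(first: Grid, second: Grid, k: int = 4) -> Grid:
    """Dot product of the expanded values at each corresponding position."""
    return [
        [sum(a * b for a, b in zip(expand(f, k), expand(s, k)))
         for f, s in zip(row_f, row_s)]
        for row_f, row_s in zip(first, second)
    ]


def third_feature(mapping: Grid) -> Grid:
    """Combine each position's mapping data with that of the surrounding positions."""
    height, width = len(mapping), len(mapping[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # ASSUMPTION: combination is the mean over the 3x3 window, clipped at borders.
            window = [mapping[j][i]
                      for j in range(max(0, y - 1), min(height, y + 2))
                      for i in range(max(0, x - 1), min(width, x + 2))]
            row.append(sum(window) / len(window))
        result.append(row)
    return result
```

Under these assumptions, third_feature(mapping_data(first, second)) yields a grid the same size as the inputs, in which each entry depends on the image feature and the brightness feature at that position and at its neighbours.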

[0126] In some embodiments, the first network model includes a decoder; the training module 20 is further configured to:

[0127] The third feature data is input into the decoder, and the first high dynamic range image is output.

[0128] In some embodiments, the training module 20 is further configured to:

[0129] Based on the first high dynamic range image and the corresponding original image data in the original image dataset, calculate the content loss; based on the first high dynamic range image and the corresponding ambient brightness information in the brightness information dataset, calculate the brightness information loss.

[0130] The parameters of the encoder, the mapping network, and the decoder are updated based on the content loss and the brightness information loss.
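As a rough illustration of how the two losses might be computed and combined, the sketch below assumes single-channel grids and uses a mean absolute difference for both terms. The loss formulas, the weighting parameter brightness_weight, and the function names are assumptions made for illustration; this passage does not define them.

```python
from typing import List

Grid = List[List[float]]


def mean_abs_diff(a: Grid, b: Grid) -> float:
    """Mean absolute difference between two equally sized grids."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def generator_loss(first_hdr: Grid, original: Grid, brightness: Grid,
                   brightness_weight: float = 1.0) -> float:
    # ASSUMPTION: content loss compares the generated HDR image with the original image.
    content_loss = mean_abs_diff(first_hdr, original)
    # ASSUMPTION: brightness information loss compares it with the ambient brightness map.
    brightness_loss = mean_abs_diff(first_hdr, brightness)
    # ASSUMPTION: the two terms are combined as a weighted sum.
    return content_loss + brightness_weight * brightness_loss
```

In a real training setup the resulting scalar would be back-propagated to update the encoder, mapping network, and decoder parameters.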

[0131] In some embodiments, the first network model further includes a discriminator; the training module 20 is also configured to:

[0132] Input the first high dynamic range image and the corresponding second high dynamic range image from the high dynamic range image dataset into the discriminator for classification processing to obtain the classification result.

[0133] The generative adversarial network loss is calculated based on the classification result.

[0134] The parameters of the encoder, decoder, and discriminator are updated based on the generative adversarial network loss.
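The loss computed from the discriminator's classification result is not given in closed form here. A minimal sketch, assuming the standard binary cross-entropy formulation commonly used for generative adversarial networks, is shown below; p_real and p_fake are hypothetical names for the discriminator's estimated probability that the second (reference) HDR image and the first (generated) HDR image, respectively, are real.

```python
import math


def _clamp(p: float, eps: float = 1e-7) -> float:
    # Keep probabilities away from 0 and 1 so the logarithms stay finite.
    return min(max(p, eps), 1.0 - eps)


def discriminator_loss(p_real: float, p_fake: float) -> float:
    """ASSUMED standard form: -[log D(real) + log(1 - D(fake))]."""
    return -(math.log(_clamp(p_real)) + math.log(1.0 - _clamp(p_fake)))


def generator_adversarial_loss(p_fake: float) -> float:
    """ASSUMED non-saturating form: -log D(fake)."""
    return -math.log(_clamp(p_fake))
```

The discriminator loss falls as the discriminator separates reference images from generated ones, while the generator's adversarial loss falls as generated images are classified as real.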

[0135] The image processing method provided in this application can be executed by an image processing device. This application uses an image processing device executing the image processing method as an example to illustrate the image processing device provided in this application.

[0136] Figure 7 shows a block diagram of an image processing apparatus according to an embodiment of this application. The apparatus includes:

[0137] The processing module 30 is used to input the image to be processed into the second network model in the aforementioned embodiment to obtain a high dynamic range image corresponding to the image to be processed.
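The inference path of the second network model can be sketched as a composition of the generator's three stages, with the discriminator omitted because it is used only during training. The function make_second_model and the toy stand-ins in the usage lines are hypothetical; real stages would be trained networks.

```python
from typing import Callable, Sequence, Tuple

Features = Sequence[float]


def make_second_model(
    encoder: Callable[[Features, Features], Tuple[Features, Features]],
    mapping_network: Callable[[Features, Features], Features],
    decoder: Callable[[Features], Features],
) -> Callable[[Features, Features], Features]:
    """Compose encoder, mapping network, and decoder into one callable."""
    def second_model(original: Features, brightness: Features) -> Features:
        # Encoder: image data and ambient brightness -> first and second feature data.
        first_feat, second_feat = encoder(original, brightness)
        # Mapping network: first and second feature data -> third feature data.
        third_feat = mapping_network(first_feat, second_feat)
        # Decoder: third feature data -> HDR image data.
        return decoder(third_feat)
    return second_model


# Usage with toy stand-ins (not trained networks):
model = make_second_model(
    encoder=lambda o, b: (o, b),
    mapping_network=lambda f, s: [x * y for x, y in zip(f, s)],
    decoder=lambda t: [2 * v for v in t],
)
hdr = model([1.0, 2.0], [3.0, 4.0])
```

Note that the model takes two inputs: the image to be processed and its corresponding ambient brightness information.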

[0138] Thus, in the embodiments of this application, during model training, the original image data obtained from a single exposure and the corresponding ambient brightness information of the shooting scene are used together as one training sample. The first network model processes this sample to produce a high dynamic range (HDR) image, which is then used together with a reference HDR image to train the first network model. Repeating this training over the original image dataset, the brightness information dataset, and the HDR image dataset yields the second network model. Inputting the original image data from a single exposure and the corresponding ambient brightness information into the second network model therefore produces HDR image data directly, from which an HDR image is generated. In other words, a single low dynamic range (LDR) image is used to generate the corresponding HDR image, which avoids image fusion and thus prevents the varying degrees of ghosting that a moving subject causes during image fusion.

[0139] The model training device or image processing device in this application embodiment can be an electronic device or a component in an electronic device, such as an integrated circuit or a chip. The electronic device can be a terminal or other devices besides a terminal. For example, the electronic device can be a mobile phone, tablet computer, laptop computer, handheld computer, in-vehicle electronic device, mobile internet device (MID), augmented reality (AR) / virtual reality (VR) device, robot, wearable device, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA), etc. It can also be a server, network attached storage (NAS), personal computer (PC), television (TV), ATM, or self-service machine, etc. The embodiments of this application do not specifically limit the scope.

[0140] The model training device or image processing device in the embodiments of this application can be a device with an operating system. The operating system can be an Android operating system, an iOS operating system, or another possible operating system, and the embodiments of this application do not specifically limit it.

[0141] The model training device or image processing device provided in this application embodiment can implement the various processes implemented in the above method embodiments, and will not be described again here to avoid repetition.

[0142] In some embodiments, as shown in Figure 8, this application embodiment also provides an electronic device 100, including a processor 101, a memory 102, and a program or instructions stored in the memory 102 and executable on the processor 101. When the program or instructions are executed by the processor 101, they implement the steps of any of the above-described model training method or image processing method embodiments and achieve the same technical effect. To avoid repetition, they will not be described again here.

[0143] It should be noted that the electronic devices in the embodiments of this application include the mobile electronic devices and non-mobile electronic devices described above.

[0144] Figure 9 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of this application.

[0145] The electronic device 1000 includes, but is not limited to, the following components: radio frequency unit 1001, network module 1002, audio output unit 1003, input unit 1004, sensor 1005, display unit 1006, user input unit 1007, interface unit 1008, memory 1009, processor 1010, camera 1011, etc.

[0146] Those skilled in the art will understand that the electronic device 1000 may also include a power supply (such as a battery) for supplying power to the various components. The power supply may be logically connected to the processor 1010 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The electronic device structure shown in Figure 9 does not constitute a limitation on the electronic device. The electronic device may include more or fewer components than shown, combine certain components, or arrange the components differently, which will not be elaborated here.

[0147] In one electronic device, a processor 1010 is configured to acquire an original image dataset, a brightness information dataset, and a high dynamic range (HDR) image dataset. The brightness information dataset includes ambient brightness information of the shooting scene corresponding to each original image in the original image dataset. The HDR image dataset includes HDR images corresponding to each original image in the original image dataset. A first network model is trained based on the original image dataset, the brightness information dataset, and the HDR image dataset to obtain a second network model. The input of the second network model includes the original image data and the corresponding ambient brightness information, and the output of the second network model includes the HDR image data corresponding to the original image data.

[0148] Thus, in the embodiments of this application, during model training, the original image data obtained from a single exposure and the corresponding ambient brightness information of the shooting scene are used together as one training sample. The first network model processes this sample to produce a high dynamic range (HDR) image, which is then used together with a reference HDR image to train the first network model. Repeating this training over the original image dataset, the brightness information dataset, and the HDR image dataset yields the second network model. Inputting the original image data from a single exposure and the corresponding ambient brightness information into the second network model therefore produces HDR image data directly, from which an HDR image is generated. In other words, a single low dynamic range (LDR) image is used to generate the corresponding HDR image, which avoids image fusion and thus prevents the varying degrees of ghosting that a moving subject causes during image fusion.

[0149] In some embodiments, the first network model includes an encoder and a mapping network; the processor 1010 is further configured to input the original image data in the original image dataset and the corresponding ambient brightness information in the brightness information dataset into the encoder to generate first feature data and second feature data; input the first feature data and the second feature data into the mapping network to obtain third feature data, wherein the third feature data indicates the feature information of the high dynamic range image.

[0150] In some embodiments, the processor 1010 is further configured to: expand the first feature data based on a first bit value to obtain first expanded feature data; expand the second feature data based on a second bit value to obtain second expanded feature data; wherein the first bit value is the value of a first position in the first feature data, and the second bit value is the value of the second feature data corresponding to the first position; perform a dot product on the first expanded feature data and the second expanded feature data to obtain first mapping data, wherein the first mapping data indicates the mapping data of the first position; and obtain the third feature data based on the first mapping data and the second mapping data of a third position surrounding the first position.

[0151] In some embodiments, the first network model includes a decoder; the processor 1010 is further configured to input the third feature data into the decoder and output a first high dynamic range image.

[0152] In some embodiments, the processor 1010 is further configured to calculate a content loss based on the first high dynamic range image and the corresponding original image data in the original image dataset; calculate a brightness information loss based on the first high dynamic range image and the corresponding ambient brightness information in the brightness information dataset; and update the parameters of the encoder, the mapping network, and the decoder based on the content loss and the brightness information loss.

[0153] In some embodiments, the first network model further includes a discriminator; the processor 1010 is further configured to input the first high dynamic range image and the corresponding second high dynamic range image in the high dynamic range image dataset into the discriminator for classification processing to obtain a classification result; calculate a generative adversarial network loss based on the classification result; and update the parameters of the encoder, the decoder, and the discriminator based on the generative adversarial network loss.

[0154] In another electronic device, a processor 1010 is used to input the image to be processed into the second network model to obtain a high dynamic range image corresponding to the image to be processed.

[0155] Thus, in the embodiments of this application, during model training, the original image data obtained from a single exposure and the corresponding ambient brightness information of the shooting scene are used together as one training sample. The first network model processes this sample to produce a high dynamic range (HDR) image, which is then used together with a reference HDR image to train the first network model. Repeating this training over the original image dataset, the brightness information dataset, and the HDR image dataset yields the second network model. Inputting the original image data from a single exposure and the corresponding ambient brightness information into the second network model therefore produces HDR image data directly, from which an HDR image is generated. In other words, a single low dynamic range (LDR) image is used to generate the corresponding HDR image, which avoids image fusion and thus prevents the varying degrees of ghosting that a moving subject causes during image fusion.

[0156] In summary, the purpose of this application is to provide a neural-network-based method for generating a high dynamic range (HDR) image from a single frame. The method takes image data obtained through normal exposure by a sensor, together with ambient brightness information of the shooting scene obtained from a photosensitive device (or a monochrome sensor), and generates the HDR image directly through a neural network. The neural network uses a generative adversarial network (GAN) as its overall framework, consisting of a generator network and a discriminator network. The generator network includes an encoder, a mapping network, and a decoder, while the discriminator network includes a discriminator. Specifically, the generator network takes as input the image data I_evo of the low dynamic range image obtained through normal exposure by the sensor and the ambient brightness information data I_lux. Each pixel intensity value is adjusted based on the ambient brightness information at the corresponding pixel (the pixel value at a given position is obtained through exposure by the sensor, and the ambient brightness information is obtained through the photosensitive device or monochrome sensor), so that rich image detail is retained in regions of different brightness, achieving the goal of generating an HDR image. The discriminator network is used only during training of the network model. Compared with traditional methods, this approach no longer fuses multiple low dynamic range images taken at different exposures; it generates the HDR image directly from a single low dynamic range image, which improves the performance and reduces the power consumption of the electronic device and avoids the ghosting problem caused by image fusion.

[0157] It should be understood that, in this embodiment, the input unit 1004 may include a graphics processing unit (GPU) 10041 and a microphone 10042. The GPU 10041 processes image data of still images or video obtained by an image capture device (such as a camera) in video capture mode or image capture mode. The display unit 1006 may include a display panel 10061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, etc. The user input unit 1007 includes at least one of a touch panel 10071 and other input devices 10072. The touch panel 10071 is also called a touch screen. The touch panel 10071 may include a touch detection device and a touch controller. Other input devices 10072 may include, but are not limited to, physical keyboards, function keys (such as volume control buttons, power buttons, etc.), trackballs, mice, and joysticks, which will not be described in detail here. The memory 1009 can be used to store software programs and various data, including but not limited to applications and operating systems. The processor 1010 may integrate an application processor and a modem processor. The application processor mainly handles the operating system, user interface, and applications, while the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 1010.

[0158] The memory 1009 can be used to store software programs and various data. The memory 1009 may primarily include a first storage area for storing programs or instructions and a second storage area for storing data. The first storage area may store the operating system and the application programs or instructions required for at least one function (such as sound playback, image playback, etc.). Furthermore, the memory 1009 may include volatile memory or non-volatile memory, or both. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), or direct Rambus random access memory (DRRAM). The memory 1009 in this embodiment includes, but is not limited to, these and any other suitable types of memory.

[0159] The processor 1010 may include one or more processing units; optionally, the processor 1010 integrates an application processor and a modem processor, wherein the application processor mainly handles operations involving the operating system, user interface, and applications, and the modem processor mainly handles wireless communication signals, such as a baseband processor. It is understood that the aforementioned modem processor may also not be integrated into the processor 1010.

[0160] This application also provides a readable storage medium storing a program or instructions. When the program or instructions are executed by a processor, they implement the various processes of the above-described model training method or image processing method embodiments and achieve the same technical effect. To avoid repetition, further details are omitted here.

[0161] The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes computer-readable storage media, such as computer read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk.

[0162] This application embodiment also provides a chip, which includes a processor and a communication interface. The communication interface and the processor are coupled. The processor is used to run programs or instructions to implement the various processes of the above-described model training method or image processing method embodiments, and can achieve the same technical effect. To avoid repetition, it will not be described again here.

[0163] It should be understood that the chip mentioned in the embodiments of this application may also be referred to as a system-level chip, system chip, chip system, or system-on-a-chip, etc.

[0164] This application provides a computer program product, which is stored in a storage medium and executed by at least one processor to implement the various processes of the above-described model training method or image processing method embodiments, and can achieve the same technical effect. To avoid repetition, further details are omitted here.

[0165] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element. Furthermore, it should be noted that the scope of the methods and apparatuses in the embodiments of this application is not limited to performing functions in the order shown or discussed, but may also include performing functions substantially simultaneously or in the reverse order, depending on the functions involved. For example, the described methods may be performed in a different order than described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.

[0166] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a computer software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods of the various embodiments of this application.

[0167] The embodiments of this application have been described above with reference to the accompanying drawings. However, this application is not limited to the specific embodiments described above. The specific embodiments described above are merely illustrative and not restrictive. Those skilled in the art can make many other forms under the guidance of this application without departing from the spirit and scope of the claims, and all of these forms are within the protection scope of this application.

Claims

1. A model training method, characterized in that the method includes: acquiring an original image dataset, a brightness information dataset, and a high dynamic range image dataset, wherein the brightness information dataset includes the ambient brightness information of the shooting scene corresponding to each original image in the original image dataset, the high dynamic range image dataset includes the high dynamic range image corresponding to each original image in the original image dataset, and the original image dataset includes image data of multiple original images taken with normal exposure; and training a first network model based on the original image dataset, the brightness information dataset, and the high dynamic range image dataset to obtain a second network model; wherein the input of the second network model includes original image data and the corresponding ambient brightness information, and the output of the second network model includes the high dynamic range image data corresponding to the original image data.

2. The method of claim 1, wherein, The first network model includes an encoder and a mapping network; training the first network model based on the original image dataset, the brightness information dataset, and the high dynamic range image dataset includes: The original image data from the original image dataset and the corresponding ambient brightness information from the brightness information dataset are input into the encoder to generate first feature data and second feature data. The first feature data and the second feature data are input into the mapping network to obtain the third feature data, which indicates the feature information of the high dynamic range image.

3. The method of claim 2, wherein the step of inputting the first feature data and the second feature data into the mapping network to obtain the third feature data includes: expanding the first feature data based on a first bit value to obtain first expanded feature data, and expanding the second feature data based on a second bit value to obtain second expanded feature data, wherein the first bit value is the value of a first position in the first feature data, and the second bit value is the value of the second feature data corresponding to the first position; performing a dot product on the first expanded feature data and the second expanded feature data to obtain first mapping data, which indicates the mapping data of the first position; and obtaining the third feature data based on the first mapping data and the second mapping data of a third position around the first position.

4. The method of claim 2, wherein, The first network model includes a decoder; training the first network model based on the original image dataset, the brightness information dataset, and the high dynamic range image dataset includes: The third feature data is input into the decoder, and a first high dynamic range image is output.

5. The method of claim 4, wherein, The training of the first network model based on the original image dataset, the brightness information dataset, and the high dynamic range image dataset further includes: Based on the first high dynamic range image and the corresponding original image data in the original image dataset, calculate the content loss; based on the first high dynamic range image and the corresponding ambient brightness information in the brightness information dataset, calculate the brightness information loss. The parameters of the encoder, the mapping network, and the decoder are updated based on the content loss and the luminance information loss.

6. The method of claim 4, wherein the first network model further includes a discriminator, and the step of training the first network model based on the original image dataset, the brightness information dataset, and the high dynamic range image dataset further includes: inputting the first high dynamic range image and the corresponding second high dynamic range image in the high dynamic range image dataset into the discriminator for classification processing to obtain a classification result; calculating a generative adversarial network loss based on the classification result; and updating the parameters of the encoder, the decoder, and the discriminator based on the generative adversarial network loss.

7. An image processing method, characterized in that the method includes: inputting an image to be processed into the second network model according to any one of claims 1 to 6 to obtain a high dynamic range image corresponding to the image to be processed.

8. A model training device, characterized in that, The device includes: The acquisition module is used to acquire the original image dataset, the brightness information dataset, and the high dynamic range image dataset. The brightness information dataset includes the ambient brightness information of the shooting scene corresponding to each original image in the original image dataset. The high dynamic range image dataset includes the high dynamic range image corresponding to each original image in the original image dataset. The original image dataset includes image data of multiple original images taken with normal exposure. The training module is used to train the first network model based on the original image dataset, the brightness information dataset, and the high dynamic range image dataset to obtain the second network model; The input to the second network model includes the original image data and the corresponding ambient brightness information, and the output of the second network model includes the high dynamic range image data corresponding to the original image data.

9. An image processing apparatus, characterized in that, The device includes: The processing module is used to input the image to be processed into the second network model according to any one of claims 1 to 6 to obtain a high dynamic range image corresponding to the image to be processed.

10. An electronic device, characterized in that, It includes a processor and a memory, the memory storing programs or instructions that can run on the processor, the programs or instructions being executed by the processor to implement the steps of the model training method as described in any one of claims 1 to 6 or the steps of the image processing method as described in claim 7.