Artificial intelligence-based image coloring method and device, electronic device
By acquiring and transforming color prior information and combining it with the modulation and sampling processing of a coloring network, the method addresses color bleeding and fading and achieves high-quality, diverse coloring results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date
- 2021-01-20
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies are prone to color bleeding and fading when generating colored images, which affects image quality and makes it difficult to achieve diverse coloring methods.
Color prior information of the image to be colored is acquired and transformed, and the image is downsampled. A coloring network then performs modulation and upsampling to generate a colored image aligned with the image to be colored. The network is trained with a combination of adversarial loss, perceptual loss, and domain alignment loss to optimize the coloring effect.
The method colors images precisely and generates well-aligned, diverse colored images, improving both the quality and the diversity of the results.
Figure CN113570678B_ABST
Abstract
Description
Technical Field
[0001] This application relates to image processing technology, and more particularly to an image coloring method, apparatus, electronic device, and computer-readable storage medium based on artificial intelligence.
Background Technology
[0002] Artificial Intelligence (AI) is a comprehensive technology within computer science that studies the design principles and implementation methods of various intelligent machines, enabling them to possess perception, reasoning, and decision-making capabilities. AI technology is a multidisciplinary field encompassing a wide range of areas, including natural language processing and machine learning / deep learning. With technological advancements, AI will be applied in more fields and play an increasingly important role.
[0003] Image processing is an important application of artificial intelligence; one typical use is generating colorized images from grayscale images. However, related technologies are prone to problems such as color bleeding and fading during the colorization process, which significantly affect the quality of the generated colorized images.
Summary of the Invention
[0004] This application provides an image coloring method, apparatus, electronic device, and computer-readable storage medium based on artificial intelligence, which can accurately colorize images to be colored.
[0005] The technical solution of this application embodiment is implemented as follows:
[0006] This application provides an image colorization method based on artificial intelligence, including:
[0007] Obtain the first color prior information of the image to be colored;
[0008] The first color prior information is transformed to obtain second color prior information that is aligned with the image to be colored;
[0009] The image to be colored is downsampled to obtain the first image features;
[0010] Based on the second color prior information, the first image features are modulated and colored to obtain the second image features;
[0011] Based on the second color prior information, the second image features are upsampled to obtain a first colorized image aligned with the image to be colorized.
[0012] This application provides an image coloring device based on artificial intelligence, including:
[0013] The acquisition module is used to acquire the first color prior information of the image to be colored;
[0014] A transformation module is used to transform the first color prior information to obtain second color prior information aligned with the image to be colored;
[0015] The processing module is configured to perform downsampling processing on the image to be colored to obtain a first image feature; and to perform modulation and coloring processing on the first image feature based on the second color prior information to obtain a second image feature; and to perform upsampling processing on the second image feature based on the second color prior information to obtain a first colored image aligned with the image to be colored.
[0016] In the above scheme, the acquisition module is further used for:
[0017] Obtain the encoding vector of the image to be colored;
[0018] The image to be colored is colored by performing an identity mapping on the encoding vector to obtain a second colored image that is not aligned with the image to be colored.
[0019] The multi-scale features obtained during the process of obtaining the second colored image through the identity mapping are used as the first color prior information.
[0020] In the above scheme, the transformation module is further used for:
[0021] Determine the similarity matrix between the image to be colored and the second colored image, wherein the second colored image is obtained by coloring the image to be colored and is not aligned with the image to be colored;
[0022] Based on the similarity matrix, an affine transformation is performed on the multi-scale features in the first color prior information to obtain multi-scale features aligned with the image to be colored.
[0023] The multi-scale features aligned with the image to be colored are used as the second color prior information.
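The transformation in [0022] can be illustrated with a minimal sketch. The source does not give the exact form of the transformation, so the sketch assumes that each aligned feature is a similarity-weighted average of the prior features, computed as a matrix product with a row-normalized similarity matrix. Only a single scale is shown; the function name `align_features` and the array layout are hypothetical.

```python
import numpy as np

def align_features(similarity, prior_features, target_hw):
    """Warp one scale of the first color prior onto the target pixel grid.

    similarity:     (N_target, N_source) matrix whose rows sum to 1 (assumed).
    prior_features: (C, H, W) feature map with H * W == N_source.
    target_hw:      (H_t, W_t) with H_t * W_t == N_target.
    """
    c, h, w = prior_features.shape
    flat = prior_features.reshape(c, h * w).T   # (N_source, C)
    aligned = similarity @ flat                 # (N_target, C), weighted average
    h_t, w_t = target_hw
    return aligned.T.reshape(c, h_t, w_t)

# Usage: a similarity matrix that maps every target pixel to the
# source pixel in the mirrored position.
prior = np.arange(8, dtype=float).reshape(2, 2, 2)
mirror = np.eye(4)[::-1]
aligned = align_features(mirror, prior, (2, 2))
```

With multi-scale features, the same step would presumably be repeated per scale with a similarity matrix resized to that scale; this is also an assumption.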
[0024] In the above scheme, the transformation module is further used for:
[0025] Obtain the first positional feature of the image to be colored and the second positional feature of the second colored image;
[0026] Wherein, the first positional feature includes the positional feature of each pixel in the image to be colored, and the second positional feature includes the positional feature of each pixel in the second colored image;
[0027] Based on the first positional feature and the second positional feature, a similarity matrix is determined between the image to be colored and the second colored image;
[0028] The similarity matrix includes the similarity between each pixel in the image to be colored and each pixel in the second colored image.
[0029] In the above scheme, the transformation module is further used for:
[0030] Non-local processing is performed on the first positional feature and the second positional feature to obtain a similarity matrix;
[0031] The similarity matrix is normalized, and the normalized similarity matrix is used as the similarity matrix between the image to be colored and the second colored image.
[0032] In the above scheme, the processing module is further used for:
[0033] Based on the multi-scale features aligned with the image to be colored in the second color prior information, the first modulation parameter is determined;
[0034] The first image feature is modulated and colored using the first modulation parameters to obtain the second image feature.
[0035] In the above scheme, the modulation colorization processing is implemented through a colorization network, which includes a residual module; the processing module is further configured to:
[0036] Among the multi-scale features aligned with the image to be colored, the first scale feature corresponding to the residual module in the coloring network is determined.
[0037] The first scale feature is convolved to obtain the first modulation parameter corresponding to the residual module.
[0038] In the above scheme, the processing module is further configured to:
[0039] The first image features are convolved, and the convolution result is linearly transformed using the first modulation parameter.
[0040] The result of the linear transformation is summed with the first image feature, and the summed result is used as the second image feature.
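The modulation described in [0036] to [0040] can be sketched as follows, as an illustration only. The sketch assumes that the "linear transformation" is an element-wise scale and shift, that the first modulation parameter therefore consists of a scale `gamma` and a shift `beta`, and that all convolutions are 1×1 (written as channel-mixing matrix products). The source specifies none of these choices, and all names are hypothetical.

```python
import numpy as np

def conv1x1(weight, x):
    """1x1 convolution: weight is (C_out, C_in), x is (C_in, H, W)."""
    return np.einsum('oc,chw->ohw', weight, x)

def modulation_params(scale_feature, w_gamma, w_beta):
    """Convolve the aligned prior feature to get scale and shift maps."""
    return conv1x1(w_gamma, scale_feature), conv1x1(w_beta, scale_feature)

def modulated_residual_block(x, w_conv, gamma, beta):
    """Convolve x, apply the scale and shift, then add the input back."""
    conv = conv1x1(w_conv, x)
    return x + (gamma * conv + beta)

# Usage with toy shapes: 2 feature channels, 1 prior channel, 3x3 grid.
x = np.ones((2, 3, 3))
prior = np.ones((1, 3, 3))
gamma, beta = modulation_params(prior, np.full((2, 1), 0.5), np.ones((2, 1)))
y = modulated_residual_block(x, 2.0 * np.eye(2), gamma, beta)
```

The residual sum in the last step keeps the first image feature in the output, which matches the summation described in [0040].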
[0041] In the above scheme, the processing module is further configured to:
[0042] Based on the multi-scale features aligned with the image to be colored in the second color prior information, the second modulation parameters are determined;
[0043] The second image features are deconvolved, and the deconvolution result is linearly transformed using the second modulation parameters. The linear transformation result is then activated to obtain a predicted color image aligned with the image to be colored.
[0044] The predicted color image is subjected to color mode conversion processing to obtain the first colored image.
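A minimal sketch of the upsampling step in [0043] follows. It assumes a transposed convolution with a 2×2 kernel and stride 2, an element-wise scale and shift as the linear transformation, and `tanh` as the activation; the source names none of these specifics. The color mode conversion of [0044] is omitted, and all names are hypothetical.

```python
import numpy as np

def modulated_upsample(x, w_deconv, gamma, beta):
    """Upsample by 2 with a stride-2 transposed convolution, then modulate.

    x:        (C_in, H, W) second image features.
    w_deconv: (C_in, C_out, 2, 2) transposed-convolution kernel.
    gamma, beta: broadcastable to (C_out, 2H, 2W).
    """
    c_in, h, w = x.shape
    c_out = w_deconv.shape[1]
    # Each input pixel writes a 2x2 patch; with stride 2 the patches do not overlap.
    y = np.einsum('chw,cdab->dhawb', x, w_deconv).reshape(c_out, 2 * h, 2 * w)
    return np.tanh(gamma * y + beta)

# Usage: one channel in, one channel out, all-ones kernel.
x = np.array([[[1.0, 2.0], [3.0, 4.0]]])
out = modulated_upsample(x, np.ones((1, 1, 2, 2)), 1.0, 0.0)
```

With an all-ones kernel, each input value is simply copied into a 2×2 block before the activation, which makes the doubling of resolution easy to inspect.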
[0045] In the above scheme, the modulation colorization processing is implemented through a colorization network, which includes an upsampling module; the processing module is further configured to:
[0046] Among the multi-scale features aligned with the image to be colored, determine the second-scale feature corresponding to the upsampling module in the coloring network;
[0047] The second scale feature is convolved to obtain the second modulation parameter corresponding to the upsampling module.
[0048] In the above scheme, the processing module is further configured to:
[0049] The encoding vector is transformed to obtain a transformed vector;
[0050] Based on the transformed vector, determine the third color prior information aligned with the image to be colored;
[0051] Based on the third color prior information, the image to be colored is modulated and colored to obtain a third colored image that is aligned with the image to be colored.
[0052] The third colorized image includes at least one of the following: an image after coloring the background of the image to be colorized, an image after coloring the foreground of the image to be colorized, and an image after adjusting the saturation of the image to be colorized.
[0053] In the above scheme, the downsampling processing, the modulation colorization processing, and the upsampling processing are implemented through a colorization network; the AI-based image colorization device further includes a training module for training the colorization network in the following manner:
[0054] The total loss function is determined based on the adversarial loss function, perceptual loss function, domain alignment loss function, and context loss function corresponding to the coloring network.
[0055] The coloring network is invoked to colorize the image sample to be colored, resulting in a first colored image aligned with the image sample to be colored, a second colored image not aligned with the image sample to be colored, and a predicted color image.
[0056] The first colorized image is obtained by converting the predicted color image;
[0057] An adversarial loss value is determined based on the error between the predicted color image and the corresponding first actual color image; a perceptual loss value is determined based on the error between the second colorized image and the corresponding second actual color image; a domain alignment loss value is determined based on the error between the image sample to be colorized and the second colorized image; and a context loss value is determined based on the error between the first colorized image and the second colorized image.
[0058] The second actual color image is obtained by converting the first actual color image;
[0059] The total loss value is obtained as a weighted sum of the adversarial loss value, the perceptual loss value, the domain alignment loss value, and the context loss value.
[0060] The total loss value is backpropagated in the coloring network based on the total loss function to update the parameters of the coloring network.
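The weighted sum in [0059] can be written out as a short sketch. The weight values below are placeholders chosen for illustration; the source does not state the weights, and the function name is hypothetical.

```python
def total_loss(adversarial, perceptual, domain_alignment, context,
               weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four loss values; the weights are hypothetical."""
    values = (adversarial, perceptual, domain_alignment, context)
    return sum(w * v for w, v in zip(weights, values))

# Usage with placeholder loss values and weights.
loss = total_loss(1.0, 2.0, 3.0, 4.0, weights=(1.0, 0.5, 2.0, 0.25))
```

In a training framework, the returned scalar would be the quantity that is backpropagated through the coloring network as described in [0060].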
[0061] This application provides an electronic device, including:
[0062] A memory, configured to store executable instructions;
[0063] A processor, configured to implement the AI-based image coloring method provided in the embodiments of this application when executing the executable instructions stored in the memory.
[0064] This application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the AI-based image coloring method provided in this application.
[0065] The embodiments of this application have the following beneficial effects:
[0066] A second color prior is determined that is aligned with the image to be colored. Based on the second color prior, modulation and upsampling processing are performed on the first image features corresponding to the image to be colored to obtain a first colored image. Because the second color prior is aligned with the image to be colored, the first colored image generated based on the second color prior is aligned with the image to be colored, thus achieving accurate coloring of the image to be colored.
Attached Figure Description
[0067] Figure 1 is a schematic diagram of the architecture of the artificial intelligence-based coloring system 10 provided in an embodiment of this application;
[0068] Figure 2 is a schematic diagram of the structure of the terminal 400 provided in the embodiments of this application;
[0069] Figure 3 is a schematic diagram of the composition structure of the coloring system 10 provided in the embodiments of this application;
[0070] Figure 4 is a schematic diagram of image coloring provided in an embodiment of this application;
[0071] Figure 5 is a flowchart illustrating the image colorization method based on artificial intelligence provided in an embodiment of this application;
[0072] Figure 6 is a flowchart illustrating the image colorization method based on artificial intelligence provided in an embodiment of this application;
[0073] Figure 7 is a schematic diagram of the coloring effect provided in the embodiments of this application;
[0074] Figure 8 is a schematic diagram of the coloring effect provided in the embodiments of this application.
Detailed Implementation
[0075] To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limitations on this application. All other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0076] In the following description, references are made to “some embodiments,” which describe a subset of all possible embodiments. However, it is understood that “some embodiments” may be the same subset or different subsets of all possible embodiments and may be combined with each other without conflict.
[0077] In the following description, the terms "first / second / third" are used merely to distinguish similar objects and do not represent a specific ordering of objects. It is understood that "first / second / third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of this application described herein can be implemented in an order other than that illustrated or described herein.
[0078] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application.
[0079] Before the embodiments of this application are described in further detail, the nouns and terms involved in the embodiments are explained; the following interpretations apply to them.
[0080] 1) Color Prior Information: Color-related experience and historical data that can be obtained before image processing, such as feature maps. For example, when a generative adversarial network (GAN) can generate a color-rich image, it is assumed that the GAN contains sufficient color prior information, which can be feature maps that include features from the intermediate layers of the GAN.
[0081] 2) Affine transformation: A linear transformation of two-dimensional vectors followed by a translation. An affine transformation can be composed from a series of atomic transformations, such as translation, scaling, flipping, rotation, and shearing.
[0082] 3) Generative Adversarial Networks (GANs): These are deep learning models that consist of a generator and a discriminator. The generator and discriminator learn from each other through a game, resulting in fairly good outputs. The discriminator performs classification predictions based on input variables, while the generator randomly generates observed data using some kind of implicit information.
[0083] 4) Foreground: People or objects located in front of the main subject in the shot, or close to the front of the shot.
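As a small illustration of the affine transformation defined in item 2) above, the following sketch composes a scaling, a rotation, and a translation and applies them to two-dimensional points. It shows the general definition only and is not the transformation used by the coloring system; the function names are hypothetical.

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix for an angle in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def affine_transform(points, matrix, translation):
    """Apply y = A x + t to each row of points, shape (N, 2)."""
    return points @ matrix.T + translation

# Compose atomic transformations: scale by 2, rotate by 90 degrees, then translate.
scale = np.diag([2.0, 2.0])
matrix = rotation(np.pi / 2) @ scale
moved = affine_transform(np.array([[1.0, 0.0]]), matrix, np.array([1.0, 1.0]))
```

The point (1, 0) is scaled to (2, 0), rotated to (0, 2), and translated to (1, 3).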
[0084] Image colorization, that is, coloring a grayscale image, typically relies on deep learning and can be divided into two types: fully automatic colorization and colorization based on a reference image. Fully automatic colorization is simple and convenient; it only requires designing a loss function for end-to-end training and testing. However, it is prone to generating flawed colorized images, such as those with color bleeding or fading. Colorization based on a reference image first requires a color reference image with content similar to the image to be colored. The colors of the reference image are then transferred to the image to be colored based on the matching between the two images. The coloring effect of this approach depends largely on the quality of the reference image: if the two images have similar content, the effect is good; if they are dissimilar, the effect is poor. This approach therefore requires considerable effort in selecting reference images. Furthermore, both approaches struggle to produce diverse coloring results.
[0085] To address the above technical issues, this application provides an image coloring method based on artificial intelligence, which can accurately color the image to be colored and achieve diverse coloring methods.
[0086] The following describes an exemplary application of the AI-based image coloring method provided in the embodiments of this application. This AI-based image coloring method can be implemented by various electronic devices. For example, it can be implemented by a terminal alone, or by a server and a terminal working together. For instance, a terminal can execute the AI-based image coloring method described below independently, or a terminal and a server can execute the AI-based image coloring method described below. For example, the terminal sends an image to be colored to the server, and the server executes the AI-based image coloring method based on the received image.
[0087] The electronic device for image colorization provided in this application can be various types of terminal devices or servers. The server can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal can be a smartphone, tablet, laptop, desktop computer, etc., but is not limited to these. The terminal and server can be directly or indirectly connected via wired or wireless communication, and this application does not impose any restrictions on this.
[0088] Taking servers as an example, such as server clusters deployed in the cloud, AI as a Service (AIaaS) is provided to users. The AIaaS platform breaks down several common AI services and provides them as independent or packaged services in the cloud. This service model is similar to an AI-themed marketplace. All users can access and use one or more artificial intelligence services provided by the AIaaS platform through application programming interfaces.
[0089] For example, one type of AI cloud service can be an image coloring service, whereby a cloud server encapsulates the image coloring program provided in this application embodiment. In response to an image coloring trigger operation, the terminal sends an image coloring request carrying the image to be colored to the cloud server. The cloud server then calls the encapsulated image coloring program, generates a first colored image based on the image to be colored, and returns the first colored image to the terminal so that the terminal can display the first colored image.
[0090] In some embodiments, an exemplary coloring system is described using the example of a server and a terminal collaboratively implementing the AI-based image coloring method provided in this application. See Figure 1, which is a schematic diagram of the architecture of the AI-based coloring system 10 provided in this application embodiment. The terminal 400 is connected to the server 200 through a network 300, which can be a wide area network, a local area network, or a combination of both.
[0091] Server 200 receives an image coloring request from terminal 400, the image coloring request carrying an image to be colored. In response to the image coloring request, server 200 obtains first color prior information of the image to be colored, transforms the first color prior information to obtain second color prior information aligned with the image to be colored, colors the image to be colored using the second color prior information, obtains a first colored image aligned with the image to be colored, and sends the first colored image to terminal 400 for display on terminal 400.
[0092] In some embodiments, taking the case where the electronic device provided in this application is a terminal as an example, the terminal implements the AI-based image coloring method provided in this application by running a computer program. The computer program can be a native program or software module in the operating system; it can be a native application (APP), that is, an AI-based image coloring program that needs to be installed in the operating system to run; or it can be a mini program, that is, an AI-based image coloring program that only needs to be downloaded into the browser environment of any client to run. In short, the computer program can be an application, module, or plugin of any form.
[0093] The following description uses the terminal 400 described above as an example of the electronic device provided in this application embodiment. See Figure 2, which is a schematic diagram of the structure of the terminal 400 provided in the embodiment of this application. The terminal 400 shown in Figure 2 includes at least one processor 410, a memory 450, at least one network interface 420, and a user interface 430. The various components in the terminal 400 are coupled together via a bus system 440. It is understood that the bus system 440 is used to implement communication between these components. In addition to a data bus, the bus system 440 also includes a power bus, a control bus, and a status signal bus. However, for clarity, all buses are labeled as bus system 440 in Figure 3.
[0094] The processor 410 can be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor can be a microprocessor or any conventional processor, etc.
[0095] User interface 430 includes one or more output devices 431 that enable the presentation of media content, including one or more speakers and / or one or more visual displays. User interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
[0096] The memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state storage, hard disk drives, optical disk drives, etc. The memory 450 may optionally include one or more storage devices physically located away from the processor 410.
[0097] The memory 450 may include volatile memory or non-volatile memory, or both. The non-volatile memory may be read-only memory (ROM), and the volatile memory may be random access memory (RAM). The memory 450 described in this application embodiment is intended to include any suitable type of memory.
[0098] In some embodiments, memory 450 is capable of storing data to support various operations, examples of which include programs, modules, and data structures or subsets or supersets thereof, as illustrated below.
[0099] Operating system 451 includes system programs for handling various basic system services and performing hardware-related tasks, such as the framework layer, core library layer, driver layer, etc., for implementing various basic business functions and handling hardware-based tasks;
[0100] The network communication module 452 is used to reach other computing devices via one or more (wired or wireless) network interfaces 420, exemplary network interfaces 420 including: Bluetooth, WiFi, and Universal Serial Bus (USB), etc.
[0101] Presentation module 453 is configured to enable the presentation of information (e.g., a user interface for operating peripheral devices and displaying content and information) via one or more output devices 431 (e.g., a display screen, a speaker, etc.) associated with user interface 430;
[0102] The input processing module 454 is used to detect and translate one or more user inputs or interactions from one or more input devices 432.
[0103] In some embodiments, the AI-based image coloring apparatus provided in this application can be implemented in software. Figure 2 shows an AI-based image coloring device 455 stored in memory 450. This device can be software in the form of programs and plugins, and includes the following software modules: an acquisition module 4551, a transformation module 4552, a processing module 4553, and a training module 4554. These modules are logical and can therefore be arbitrarily combined or further split according to the functions they implement. The functions of each module are described below.
[0104] See Figure 3, which is a schematic diagram of the composition structure of the coloring system 10 provided in this application embodiment. The coloring system 10 includes an encoder, a pre-trained GAN, a transform part, and a coloring network. The encoder is used to obtain the encoding vector of the image to be colored. The encoder can be a generator in a generative adversarial network, an encoder part in an autoencoder, or a convolutional neural network. The pre-trained GAN is a trained GAN used to generate a second colored image and first color prior information of the image to be colored. The transform part is used to transform the first color prior information based on the image to be colored and the second colored image to obtain the second color prior information. The coloring network is used to generate a first colored image based on the image to be colored and the second color prior information.
[0105] See Figure 4, which is a schematic diagram of image coloring provided in an embodiment of this application. As shown in Figure 4, the coloring network includes a downsampling module, a residual module, and an upsampling module. The downsampling module consists of multiple downsampling layers, used to downsample the image to be colored to obtain first image features; the residual module consists of multiple residual blocks, used to modulate and colorize the first image features based on second color prior information to obtain second image features; the upsampling module consists of multiple upsampling layers, used to upsample the second image features based on second color prior information to obtain a first colored image aligned with the image to be colored.
[0106] The following describes the image coloring method based on artificial intelligence provided in this application embodiment, in conjunction with the various components of the coloring system 10 described above. The execution subject of the following method can be a terminal, specifically, it can be implemented by the terminal running the various computer programs described above; of course, based on the understanding of the following text, it is not difficult to see that the image coloring method based on artificial intelligence provided in this application embodiment can also be implemented by the terminal and the server in collaboration.
[0107] See Figure 5, which is a flowchart illustrating the image colorization method based on artificial intelligence provided in this application embodiment. The steps shown in Figure 5 are explained below in conjunction with the parts of the coloring system shown in Figure 3 and with Figure 4.
[0108] In step 101, the first color prior information of the image to be colored is obtained.
[0109] In some embodiments, the image to be colored is a grayscale image in LAB color mode, meaning the grayscale image only has a luminance channel (L) and lacks color channels (A and B). If the image to be colored is in RGB color mode, it needs to be converted to LAB color mode first. The first color prior information is the color prior information related to the image to be colored, for example, the color prior information related to the image to be colored in a GAN, i.e., the intermediate layer features of the GAN.
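The conversion mentioned in [0109] can be illustrated for the luminance channel alone. The sketch below computes the L channel of LAB from an RGB image, assuming the input is sRGB with values in [0, 1] and a D65 white point; the source does not state which RGB space or white point is used, and the function name is hypothetical.

```python
import numpy as np

def rgb_to_lightness(rgb):
    """Return the LAB L channel (0 to 100) for sRGB values in [0, 1].

    rgb: array of shape (..., 3). sRGB and D65 are assumptions.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB transfer curve to get linear light.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Relative luminance Y (white maps to approximately 1).
    y = linear @ np.array([0.2126, 0.7152, 0.0722])
    # CIE lightness function with its linear segment near black.
    delta = 6.0 / 29.0
    f = np.where(y > delta ** 3, np.cbrt(y), y / (3.0 * delta ** 2) + 4.0 / 29.0)
    return 116.0 * f - 16.0

# Usage: a 1x2 image with one white and one black pixel.
lightness = rgb_to_lightness(np.array([[[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]]))
```

The A and B channels are not computed because, as stated above, the image to be colored carries only the L channel.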
[0110] In some embodiments, as shown in Figure 4, the encoding vector of the image to be colored can first be obtained through an encoder. The encoder can be replaced with other convolutional neural networks. Then, a pre-trained GAN is used to colorize the image, resulting in a second colored image. The pre-trained GAN can be a pre-trained BigGAN or a pre-trained StyleGAN. Taking BigGAN as an example, the generator of BigGAN includes multiple residual blocks. The encoding vector is linearly transformed and fed into the first residual block. Each residual block includes a batch normalization (BN) layer, an activation layer, and a convolutional layer. Each residual block is skip-connected through 1×1 convolutions, thus achieving an identity mapping of the encoding vector. The identity mapping directly passes the output of the previous layer (which is also the input of the next layer) to the output of the next layer, making the output of the next layer approximate its input, thus preventing a decrease in accuracy in later layers. Finally, BigGAN generates the second colored image. Alternatively, the residual blocks can all be skip-connected through non-1×1 convolutions.
[0111] During the generation of the second colorized image, the size of the feature map corresponding to the output features of each residual block is different, that is, the scale of the output features is different. The output features (multi-scale features) of different residual blocks are merged to obtain the first color prior information.
[0112] In step 102, the first color prior information is transformed to obtain second color prior information aligned with the image to be colored.
[0113] In some embodiments, alignment means that the same part (corresponding to one or more pixels) occupies the same position in different images. For example, the pixels constituting a rooster's tail are in the same position in different images. Because color prior information is expressed in the form of feature maps, alignment between the color prior information and the image to be colored means that the same object is in the same position in both. However, the background and foreground parts of the second colored image and of the image to be colored do not correspond one-to-one in position; that is, corresponding pixels in the two images are not aligned. In Figure 4, the location of the rooster's tail in the second colored image differs significantly from its location in the image to be colored. Correspondingly, the multi-scale features in the first color prior information do not correspond one-to-one with the image features of the image to be colored. Therefore, the first color prior information needs to be transformed to obtain second color prior information aligned with the image to be colored, i.e., color prior information aligned with the image features of the image to be colored. The colored image corresponding to the second color prior information is then aligned with the image to be colored.
[0114] In some embodiments, transforming the first color prior information to obtain second color prior information aligned with the image to be colored is achieved through a transformation portion in the coloring system 10, the implementation process of which is as follows: Figure 6 Steps 1021 to 1023 are shown.
[0115] In step 1021, a similarity matrix is determined between the image to be colored and the second colored image, which is obtained by coloring the image to be colored and is not aligned with the image to be colored.
[0116] As shown in Figure 4, a feature extractor can extract the first positional features of the image to be colored and the second positional features of the second colored image. The first positional features include the positional features of each pixel in the image to be colored, and the second positional features include the positional features of each pixel in the second colored image. Then, non-local processing is performed on the first and second positional features to obtain a similarity matrix between the image to be colored and the second colored image. This similarity matrix includes the similarity between each pixel in the image to be colored and each pixel in the second colored image. Non-local processing calculates the similarity between a pixel in the image to be colored and any pixel in the second colored image. Calculation methods include dot product, concatenation, and bilinear similarity measurement. When similarity is calculated by dot product, the similarity between two positions is the dot product of the position vectors (positional features) of those positions in the image to be colored and the second colored image. When similarity is calculated by concatenation, the position vectors of the two positions are concatenated and fed into a perceptron, which predicts the similarity between them. Finally, the similarity matrix can be normalized using the softmax function so that the elements in each row sum to 1. The resulting normalized similarity matrix is then used as the similarity matrix between the image to be colored and the second colored image.
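The dot-product variant with softmax normalization can be illustrated with a minimal NumPy sketch. The function name `similarity_matrix`, the flattened (pixels, channels) layout, and the random stand-in features are assumptions made for illustration; the feature extractor itself is not modeled.

```python
import numpy as np

def similarity_matrix(feat_gray, feat_color):
    """Dot-product similarity between every pair of pixels, softmax-normalized per row.

    feat_gray:  (N, C) positional features of the image to be colored
    feat_color: (N, C) positional features of the second colored image
    """
    scores = feat_gray @ feat_color.T                      # (N, N) raw dot products
    scores = scores - scores.max(axis=1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)    # each row sums to 1

# Hypothetical 4x4 feature maps with 8 channels, flattened to 16 positions.
rng = np.random.default_rng(0)
feat_gray = rng.normal(size=(16, 8))
feat_color = rng.normal(size=(16, 8))
M = similarity_matrix(feat_gray, feat_color)
```

Row i of `M` then describes how strongly pixel i of the image to be colored matches each pixel of the second colored image.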
[0117] In step 1022, an affine transformation is performed on the multi-scale features in the first color prior information based on the similarity matrix to obtain multi-scale features aligned with the image to be colored.
[0118] In some embodiments, an affine transformation is performed on the multi-scale features in the first color prior information, that is, by multiplying the similarity matrix with the multi-scale features in the first color prior information, a multi-scale feature aligned with the image to be colored can be obtained.
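As a minimal sketch of this multiplication, the NumPy example below applies a similarity matrix to one scale of the prior features. It assumes that the features are flattened to (positions, channels) and that this scale has the same number of positions as the similarity matrix; how other scales are matched to the matrix is not specified here and is left out. The name `align_features` is hypothetical.

```python
import numpy as np

def align_features(M, prior_feat):
    """Warp prior features onto the pixel grid of the image to be colored.

    M:          (N, N) row-normalized similarity matrix
    prior_feat: (N, C) one scale of the first color prior information, flattened
    """
    return M @ prior_feat   # row i is a similarity-weighted mix of prior features

# Toy case: a hard (one-hot) similarity matrix simply reorders the features.
perm = np.array([2, 0, 3, 1])
M = np.eye(4)[perm]                      # row i has a 1 in column perm[i]
prior_feat = np.arange(8, dtype=float).reshape(4, 2)
aligned = align_features(M, prior_feat)
```

With a softmax-normalized matrix instead of a one-hot matrix, each aligned feature becomes a weighted average of the prior features rather than a pure reordering.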
[0119] In step 1023, the multi-scale features aligned with the image to be colored are used as the second color prior information.
[0120] As can be seen, the similarity matrix between the image to be colored and the second colored image is obtained based on the similarity of the positional features at corresponding positions. By performing an affine transformation on the first color prior information through the similarity matrix, the second color prior information aligned with the image to be colored can be obtained, which provides a guarantee for the subsequent generation of the first colored image aligned with the image to be colored.
[0121] In step 103, the image to be colored is downsampled to obtain the first image features.
[0122] In some embodiments, the image to be colored is downsampled using a downsampling module in the coloring network. The downsampling module includes multiple downsampling layers. In each downsampling layer, the input features are convolved to obtain corresponding image features, which represent the positional and semantic information of the image to be colored. The obtained image features are then pooled to obtain the corresponding pooling result, which is used as the input feature for the next layer. The output of the last downsampling layer is used as the first image feature.
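The convolve-then-pool structure of the downsampling module can be sketched as follows. The 3x3 kernel size, 2x2 max pooling, channel counts, and random weights are all assumptions for illustration; the convolution is the cross-correlation form commonly used in neural networks.

```python
import numpy as np

def conv3x3(x, weight):
    """Same-padded 3x3 convolution. x: (C_in, H, W), weight: (C_out, C_in, 3, 3)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            patch = padded[:, dy:dy + h, dx:dx + w]          # shifted view of the input
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], patch)
    return out

def max_pool2x2(x):
    """2x2 max pooling with stride 2. x: (C, H, W) with even H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def downsample(image, weights):
    """Convolve and pool once per downsampling layer; return the last layer's output."""
    feat = image
    for weight in weights:
        feat = max_pool2x2(conv3x3(feat, weight))
    return feat

rng = np.random.default_rng(0)
gray = rng.random((1, 16, 16))                               # single-channel input
weights = [rng.normal(size=(4, 1, 3, 3)), rng.normal(size=(8, 4, 3, 3))]
first_image_features = downsample(gray, weights)             # shape (8, 4, 4)
```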
[0123] In step 104, the first image features are modulated and colored based on the second color prior information to obtain the second image features.
[0124] In some embodiments, to achieve multi-scale control, the residual module and upsampling module of the coloring network are controlled separately by multi-scale features aligned with the image to be colorized. Different scale features in the multi-scale features aligned with the image to be colorized correspond to different parts of the coloring network. For example, when the upsampling module of the coloring network includes two upsampling layers, there are a total of three scale features in the multi-scale features aligned with the image to be colorized, corresponding to the residual module, the first upsampling layer, and the second upsampling layer, respectively.
[0125] In some possible examples, firstly, based on the multi-scale features aligned with the image to be colored in the second color prior information, the first modulation parameters are determined. That is, among the multi-scale features aligned with the image to be colored, the first-scale features corresponding to the residual modules in the coloring network are determined. These first-scale features are then convolved to obtain the first modulation parameters corresponding to the residual modules. Since a residual module typically consists of at least two residual blocks, the first-scale features are subjected to multiple different convolutional processes in parallel to obtain the first modulation parameters (α and β, where α represents weight and β represents bias) corresponding to each residual block. The dimension of each first modulation parameter is consistent with the dimension of the feature f to be modulated in the corresponding residual block. Each residual block has multiple layers, each consisting of a convolutional layer, a spatially adaptive normalization (SPADE) layer, and an activation layer. The feature f to be modulated is the feature obtained by convolving the input features of the convolutional layer in each residual block. For example, when there are 6 residual blocks in the coloring network, the first scale features are processed in parallel 6 times by different convolutional neural networks to obtain 6 first modulation parameters corresponding to the 6 residual blocks: (α1, β1), (α2, β2), (α3, β3), (α4, β4), (α5, β5), (α6, β6).
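A minimal sketch of deriving one (α, β) pair per residual block from the first-scale features is given below. It assumes 1x1 convolutions and random weights purely for illustration; the kernel size and channel counts are not specified in the description, and the helper names are hypothetical.

```python
import numpy as np

def conv1x1(x, weight, bias):
    """1x1 convolution. x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def modulation_params(scale_feat, heads):
    """Run an independent pair of convolutions per residual block to get (alpha, beta)."""
    return [
        (conv1x1(scale_feat, *alpha_head), conv1x1(scale_feat, *beta_head))
        for alpha_head, beta_head in heads
    ]

rng = np.random.default_rng(0)
c_prior, c_feat, h, w = 5, 8, 4, 4            # hypothetical sizes
first_scale_feat = rng.normal(size=(c_prior, h, w))

def make_head():
    # One convolution = (weight, bias); values are random stand-ins for learned parameters.
    return rng.normal(size=(c_feat, c_prior)), rng.normal(size=c_feat)

heads = [(make_head(), make_head()) for _ in range(6)]    # six residual blocks
params = modulation_params(first_scale_feat, heads)       # [(alpha1, beta1), ..., (alpha6, beta6)]
```

Each α and β has shape (8, 4, 4) in this example, matching the assumed shape of the feature f to be modulated in the corresponding residual block.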
[0126] The SPADE layer, like the batch normalization (BN) layer, is used for normalization and applies learned modulation parameters. Unlike the BN layer, the SPADE layer is a conditional normalization layer, meaning its modulation parameters are derived from an external input; furthermore, the modulation parameters in the SPADE layer are tensors, not vectors as in the BN layer. Compared to common normalization layers, the SPADE layer can better preserve semantic information, enabling the colorization network to generate a first colorized image with realistic texture.
[0127] Then, the first image features are modulated and colored using the first modulation parameters to obtain the second image features. Specifically, the first image features are convolved by the convolutional layer in the residual block to obtain the corresponding convolution result. In the SPADE layer, the convolution result is linearly transformed using the first modulation parameters, as shown in formula (1):
[0128] f′=f*α+β (1)
[0129] Here, f′ is the feature obtained by modulating the feature to be modulated by the first modulation parameter, and it is also the result of linear transformation.
[0130] In the activation layer, the result of the linear transformation is mapped to a high-dimensional nonlinear region. Finally, the mapped linear transformation result is summed with the first image feature, and the summed result is used as the second image feature. Specifically, when the residual block is an identity mapping, the mapped linear transformation result is directly summed with the first image feature; when the residual block is a non-identity mapping, the first image feature is first scaled up or down and then added to the mapped linear transformation result. When multiple residual blocks exist, the summed result of the previous residual block is the input of the next residual block, and the summed result of the last residual block is used as the second image feature.
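The convolve, modulate, activate, and sum sequence can be sketched as below. This is a simplified illustration with one convolution per block, an identity skip connection, and ReLU as the activation; the activation function and the toy `conv` stand-in are assumptions, not details given in the description.

```python
import numpy as np

def modulated_residual_block(x, conv, alpha, beta):
    """One residual block with an identity skip connection."""
    f = conv(x)                           # feature to be modulated
    f_mod = f * alpha + beta              # formula (1): f' = f*alpha + beta
    activated = np.maximum(f_mod, 0.0)    # ReLU, assumed activation
    return activated + x                  # sum with the block input

def run_residual_module(x, convs, params):
    """Chain blocks: the summed result of one block is the input of the next."""
    for conv, (alpha, beta) in zip(convs, params):
        x = modulated_residual_block(x, conv, alpha, beta)
    return x

# Toy example: 'conv' is a hypothetical stand-in that doubles its input.
x = np.array([[[1.0, -2.0], [3.0, -4.0]]])
conv = lambda t: 2.0 * t
alpha = np.full_like(x, 0.5)
beta = np.ones_like(x)

one_block = modulated_residual_block(x, conv, alpha, beta)
two_blocks = run_residual_module(x, [conv, conv], [(alpha, beta), (alpha, beta)])
```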
[0131] In step 105, the second image features are upsampled based on the second color prior information to obtain a first colorized image aligned with the image to be colorized.
[0132] In some embodiments, firstly, a second modulation parameter is determined based on multi-scale features aligned with the image to be colored in the second color prior information. That is, among the multi-scale features aligned with the image to be colored, a second scale feature corresponding to the upsampling module in the coloring network is determined. For example, when the upsampling module includes two upsampling layers, there are second scale features in the multi-scale features corresponding to each of the two upsampling layers. The second scale features are then convolved using a convolutional neural network to obtain the second modulation parameter corresponding to the upsampling module.
[0133] Then, the second image features are deconvolved (i.e., upsampled). The deconvolution result is used as the feature to be modulated, and together with the second modulation parameter, it is substituted into the linear transformation formula (1) for linear transformation to obtain the linear transformation result (i.e., the modulated feature). The linear transformation result is activated to obtain the predicted color image of the LAB color mode corresponding to the image to be colored. The predicted color image contains not only the luminance channel of the image to be colored, but also the two color channels that were lost in the image to be colored. By performing color mode conversion on the predicted color image, the corresponding RGB color mode image can be obtained, which is the first colored image.
[0134] When there are multiple upsampling layers, the linear transformation result of the previous upsampling layer is the input of the next upsampling layer, and the linear transformation result of the last upsampling layer is the predicted color image.
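A minimal sketch of two chained upsampling layers follows. It assumes a stride-2 transposed convolution with a 2x2 kernel as the deconvolution, tanh as the final activation, and random stand-ins for the weights and for the second modulation parameters; none of these specifics come from the description.

```python
import numpy as np

def deconv2x2(x, weight):
    """Stride-2 transposed convolution with a 2x2 kernel.

    x: (C_in, H, W), weight: (C_in, C_out, 2, 2) -> (C_out, 2H, 2W)
    """
    _, h, w = x.shape
    out = np.einsum("chw,coab->ohawb", x, weight)   # each input pixel paints a 2x2 patch
    return out.reshape(weight.shape[1], 2 * h, 2 * w)

def upsample_layer(x, weight, alpha, beta):
    """Deconvolve, then modulate with formula (1)."""
    return deconv2x2(x, weight) * alpha + beta

rng = np.random.default_rng(0)
second_image_features = rng.normal(size=(8, 4, 4))
w1, w2 = rng.normal(size=(8, 4, 2, 2)), rng.normal(size=(4, 3, 2, 2))
a1, b1 = rng.normal(size=(4, 8, 8)), rng.normal(size=(4, 8, 8))
a2, b2 = rng.normal(size=(3, 16, 16)), rng.normal(size=(3, 16, 16))

hidden = upsample_layer(second_image_features, w1, a1, b1)   # input of the next layer
predicted = np.tanh(upsample_layer(hidden, w2, a2, b2))      # assumed final activation
```

The three output channels stand in for the LAB channels of the predicted color image; mapping the tanh range onto actual LAB value ranges is omitted.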
[0135] In some embodiments, to obtain a colorized image with diverse coloring effects, the encoding vector can be transformed to obtain a transformed vector. For example, the encoding vector can be modified by: adding a noise vector to the encoding vector; changing the input category during the training of the pre-trained GAN; or finding directions related to color changes through unsupervised learning and then changing the encoding vector along these directions. Then, based on the transformed vector, third color prior information aligned with the image to be colorized is determined. That is, using the transformed vector as the input vector of the pre-trained GAN, the third color prior information (i.e., the intermediate layer features of the pre-trained GAN) is obtained during the process of generating the corresponding colorized image. Finally, based on the third color prior information, the image to be colorized is modulated to obtain a third colorized image aligned with the image to be colorized. The modulation colorization process is similar to that described above and will not be repeated here. The third colorized image includes at least one of the following: an image after coloring the background of the image to be colorized, an image after coloring the foreground of the image to be colorized, and an image after adjusting the saturation of the image to be colorized.
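Two of the transformations mentioned above, adding a noise vector and moving along a given direction, can be sketched as follows. The function names, the noise scale, and the example direction are hypothetical; how color-related directions are discovered is not shown.

```python
import numpy as np

def add_noise(z, scale, rng):
    """Perturb the encoding vector with Gaussian noise of the given scale."""
    return z + scale * rng.normal(size=z.shape)

def move_along(z, direction, step):
    """Shift the encoding vector by `step` along a unit-normalized direction."""
    unit = direction / np.linalg.norm(direction)
    return z + step * unit

rng = np.random.default_rng(0)
z = np.zeros(3)                                    # toy encoding vector
noisy = add_noise(z, 0.1, rng)
shifted = move_along(z, np.array([3.0, 0.0, 4.0]), 5.0)
```

Each transformed vector would then be fed to the pre-trained GAN in place of the original encoding vector to obtain different color prior information.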
[0136] As can be seen, the embodiments of this application can not only automatically generate colorized images with vivid colors and high alignment with the original image, but also generate colorized images with different coloring effects by controlling and modifying the encoding vector, thereby achieving diversified coloring.
[0137] In some embodiments, the pre-trained GAN is pre-trained with fixed parameters. During encoder training, the error between the image features of the colorized image generated by the pre-trained GAN generator and the image features of the actual color image corresponding to the image to be colorized is determined, and the error is backpropagated in the encoder to update the encoder parameters.
[0138] After training the encoder, the coloring network is trained. First, the total loss function is determined based on the adversarial loss function, perceptual loss function, domain alignment loss function, and context loss function corresponding to the coloring network. Specifically, the adversarial loss function makes the first colorized image generated by the coloring network more realistic; the perceptual loss function makes the first colorized image feel more realistic and reasonable; the domain alignment loss function maps the image to be colorized and the second colorized image to the same feature space; and the context loss function measures the similarity between the two misaligned images (the first colorized image and the second colorized image).
[0139] Then, the colorization system 10 processes the image sample to be colored to obtain a first colored image aligned with the image sample, a second colored image not aligned with the image sample, and a predicted color image. In some possible examples, both the first and second colored images are RGB color mode images, and the predicted color image is a LAB color mode image. Converting the LAB color mode predicted color image yields an RGB color mode image. The first colored image is obtained by converting the color mode of the predicted color image.
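The color mode conversion is not detailed in the description. As one possible illustration, the sketch below uses the standard CIE L*a*b* to sRGB conversion with a D65 white point; the choice of white point, the sRGB target, and the function name `lab_to_rgb` are assumptions.

```python
import numpy as np

D65_WHITE = np.array([0.95047, 1.0, 1.08883])
XYZ_TO_LINEAR_RGB = np.array([
    [3.2406, -1.5372, -0.4986],
    [-0.9689, 1.8758, 0.0415],
    [0.0557, -0.2040, 1.0570],
])

def lab_to_rgb(lab):
    """Convert L*a*b* values (..., 3), with L in [0, 100], to sRGB in [0, 1]."""
    L, a, b = lab[..., 0], lab[..., 1], lab[..., 2]
    fy = (L + 16.0) / 116.0
    f = np.stack([fy + a / 500.0, fy, fy - b / 200.0], axis=-1)
    delta = 6.0 / 29.0
    # Inverse of the Lab companding function, then scale by the white point.
    xyz = np.where(f > delta, f ** 3, 3 * delta ** 2 * (f - 4.0 / 29.0)) * D65_WHITE
    linear = np.clip(xyz @ XYZ_TO_LINEAR_RGB.T, 0.0, 1.0)
    # sRGB gamma encoding.
    srgb = np.where(linear <= 0.0031308,
                    12.92 * linear,
                    1.055 * linear ** (1 / 2.4) - 0.055)
    return np.clip(srgb, 0.0, 1.0)

white = lab_to_rgb(np.array([100.0, 0.0, 0.0]))
black = lab_to_rgb(np.array([0.0, 0.0, 0.0]))
gray = lab_to_rgb(np.array([50.0, 0.0, 0.0]))
```

Out-of-gamut values are simply clipped here, which is a simplification.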
[0140] Subsequently, an adversarial loss value is determined based on the error between the predicted color image and the corresponding first actual color image, a perceptual loss value is determined based on the error between the second colorized image and the corresponding second actual color image, a domain alignment loss value is determined based on the error between the image sample to be colorized and the second colorized image, and a context loss value is determined based on the error between the first colorized image and the second colorized image.
[0141] In some possible examples, the first actual color image is the actual LAB color mode color image corresponding to the image to be colored, the predicted color image is the LAB color mode color image obtained by predicting the two missing color channels of the image to be colored, the second colored image is the predicted RGB color mode color image, the second actual color image is the actual RGB color mode color image corresponding to the image to be colored, and the second actual color image is obtained by converting the color mode of the first actual color image.
[0142] After determining each loss value, the adversarial loss value, perceptual loss value, domain alignment loss value, and context loss value are weighted and summed to obtain the total loss value. Finally, the total loss value is backpropagated in the coloring network based on the total loss function to update the parameters of the coloring network.
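The weighted summation can be written as a short sketch. The loss values and weights below are made-up numbers for illustration only; the description does not give specific weights.

```python
def total_loss(losses, weights):
    """Weighted sum of the individual loss values, keyed by loss name."""
    return sum(weights[name] * value for name, value in losses.items())

# Hypothetical loss values and weights.
losses = {"adversarial": 0.5, "perceptual": 2.0, "domain_alignment": 1.0, "context": 4.0}
weights = {"adversarial": 1.0, "perceptual": 0.5, "domain_alignment": 2.0, "context": 0.25}
total = total_loss(losses, weights)   # 0.5 + 1.0 + 2.0 + 1.0 = 4.5
```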
[0143] As can be seen, the embodiments of this application determine second color prior information aligned with the image to be colored, and perform modulation coloring and upsampling processing on the first image features corresponding to the image to be colored based on the second color prior information, thereby obtaining a first colored image. Because the second color prior information is aligned with the image to be colored, the first colored image generated based on the second color prior information is aligned with the image to be colored, thus achieving accurate coloring of the image to be colored.
[0144] The following will describe an exemplary application of the embodiments of this application in a real-world application scenario.
[0145] In video applications, in response to a user's colorization operation on a grayscale video file, the terminal sends a colorization request, carrying the grayscale video file, to the cloud server. Upon receiving the colorization request, the cloud server decodes the grayscale video file, obtaining multiple video frames, each of which is an image to be colorized. Then, the multiple video frames (images to be colorized) are colorized, resulting in multiple first colorized images. These first colorized images are then encoded to obtain a new colored video file, which is sent to the terminal for presentation.
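The server-side flow can be outlined as a small sketch. The `decode`, `colorize_frame`, and `encode` callables are hypothetical placeholders supplied by the caller; no codec or library is named in the description, so toy stand-ins are used to keep the example runnable.

```python
def colorize_video(video_bytes, decode, colorize_frame, encode):
    """Decode a grayscale video, colorize each frame, and re-encode the result."""
    frames = decode(video_bytes)                           # images to be colored
    colored = [colorize_frame(frame) for frame in frames]  # first colored images
    return encode(colored)                                 # new colored video file

# Toy stand-ins: each byte plays the role of a frame.
result = colorize_video(
    b"abc",
    decode=lambda data: list(data),
    colorize_frame=lambda frame: frame + 1,
    encode=lambda frames: bytes(frames),
)
```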
[0146] The following describes the process of colorizing video frames (images to be colored). As shown in Figure 4, the image to be colored x_l (a grayscale image) is first encoded by an encoder (such as a GAN encoder) to obtain an encoding vector z. Then, a pre-trained GAN receives z and generates a second colorized image together with the first color prior information related to x_l (i.e., the intermediate layer features F_prior). Because the first color prior information is related to x_l rather than fully aligned with it (e.g., in Figure 4 the position of the rooster's tail in the second colorized image is not consistent with its position in x_l), the positional correspondence between x_l and the second colorized image must be determined. A similarity matrix M between x_l and the second colorized image is determined to represent the positional similarity between their pixels, and M is used to align the first color prior information with x_l. After alignment, second color prior information is obtained. This second color prior information is used to control some parameters in the coloring network, so that color prior information guides the coloring. Finally, the coloring network outputs a first colorized image based on the image to be colorized.
[0147] The following is a detailed explanation of the coloring process described above.
[0148] (a) It is necessary to find the color prior information related to x_l in the pre-trained GAN. However, the problem of "retrieving" color prior information related to x_l from the pre-trained GAN cannot be directly defined or optimized, so an encoder is introduced. The encoder is a neural network that receives x_l and outputs z. After the encoder determines the z corresponding to x_l, the pre-trained GAN receives z and outputs a second colorized image whose content is as similar as possible to x_l. At this point, the multi-scale features F_prior, composed of features from multiple intermediate layers of the pre-trained GAN, are the first color prior information most relevant to x_l. To optimize the encoder, the actual color image x_rgb corresponding to x_l and the second colorized image are constrained so that their features in the discriminator of the pre-trained GAN are as similar as possible.
[0149] (b) The first color prior information F_prior is transformed to align it with x_l. Because F_prior and x_l are usually not spatially aligned, they must be aligned first so that F_prior can guide the coloring process more effectively. x_l and the second colorized image are passed through the same feature extractor to obtain their feature vectors (positional features) at all spatial locations. The dot product of the corresponding positional features yields the similarity matrix M between x_l and the second colorized image, where M(u, v) represents the similarity between position u in x_l and position v in the second colorized image (i.e., the similarity between the corresponding pixels). M is normalized so that ∑_j M(i, j) = 1. Next, based on M, an affine transformation is performed on F_prior to obtain the second color prior information aligned with x_l.
[0150] (c) The second color prior information aligned with x_l guides the coloring process. The coloring network consists of two downsampling layers, six residual blocks, and two upsampling layers stacked sequentially. The second color prior information is convolved to obtain parameters α and β with the same dimension as the feature f to be modulated. These parameters α and β are then used to modulate the feature f, with the modulation formula: f′=f*α+β. Here, the feature f to be modulated represents the image features obtained through convolution in the residual blocks and the upsampling layers of the coloring network, and f′ is the modulated feature. After modulation, the modulated feature is passed to the next layer for further processing. Finally, the coloring network generates a first colorized image aligned with the image to be colorized.
[0151] In some embodiments, the pre-trained GAN can be BigGAN (or StyleGAN), which is pre-trained on the ImageNet dataset. The entire training is divided into two phases: the first phase trains the encoder; the second phase trains the entire model (except for the pre-trained GAN and the encoder, since both are pre-trained and have fixed parameters in the second phase). The loss functions used in the second phase include adversarial loss, perceptual loss, domain alignment loss, and context loss.
[0152] In some embodiments, different color prior information can be used to guide coloring in order to achieve diverse coloring. The first color prior information can be changed by altering z, for example, by adding a noise vector to the encoding vector, or by changing the class of the input when training BigGAN (when the pre-trained GAN is BigGAN), or by finding directions related to color changes through unsupervised learning, and then changing z along these directions, so that the final colorized image can produce different coloring effects.
[0153] Figure 7 is a schematic diagram of the coloring effect provided in the embodiments of this application. In Figure 7, the first row shows the input images to be colored, and the second row shows the colored images (results) obtained by the artificial intelligence-based image coloring method proposed in the embodiments of this application. The third row shows the diverse results obtained when a grayscale image containing a bird is input and the bird category is changed, so that the grayscale image is colored with different colors.
[0154] Figure 8 is a schematic diagram of the coloring effect provided in the embodiments of this application. Figure 8 demonstrates how changing z along certain directions generates diverse coloring effects. The directions shown in Figure 8 include those related to the background color, those related to the foreground color (such as a vase or a truck), and those related to color saturation. In Figure 8, the first row (whose first image is the image to be colored) shows different images obtained after coloring the background of the image to be colored. The second and third rows (whose first image is the image to be colored) show different images obtained after coloring the foreground of the image to be colored. The fourth to sixth rows (whose first image is the image to be colored) show different images obtained after adjusting the saturation of the image to be colored.
[0155] As can be seen, the embodiments of this application guide coloring with color prior information, which can automatically and conveniently generate high-quality colorized images with vivid colors. Moreover, different coloring effects can be obtained by controlling and modifying the color prior information, thus achieving diversified coloring.
[0156] The following continues to describe the exemplary structure of the artificial intelligence-based image coloring device 455 provided in the embodiments of this application, implemented as software modules. In some embodiments, as shown in Figure 2, the software modules of the AI-based image coloring device 455 stored in the memory 450 may include: an acquisition module 4551, used to acquire first color prior information of the image to be colorized; a transformation module 4552, used to transform the first color prior information to obtain second color prior information aligned with the image to be colorized; and a processing module 4553, used to perform downsampling processing on the image to be colorized to obtain first image features, to perform modulation coloring processing on the first image features based on the second color prior information to obtain second image features, and to perform upsampling processing on the second image features based on the second color prior information to obtain a first colorized image aligned with the image to be colorized.
[0157] In some embodiments, the acquisition module 4551 is further configured to acquire the encoding vector of the image to be colored; color the image to be colored in the following manner: perform an identity mapping on the encoding vector to obtain a second colored image that is not aligned with the image to be colored; and use the multi-scale features obtained during the identity mapping that produces the second colored image as the first color prior information.
[0158] In some embodiments, the transformation module 4552 is further configured to determine a similarity matrix between the image to be colored and the second colored image, wherein the second colored image is obtained by coloring the image to be colored and is not aligned with the image to be colored; perform an affine transformation on the multi-scale features in the first color prior information based on the similarity matrix to obtain multi-scale features aligned with the image to be colored; and use the multi-scale features aligned with the image to be colored as the second color prior information.
[0159] In some embodiments, the transformation module 4552 is further configured to acquire a first positional feature of the image to be colored and a second positional feature of the second colored image; wherein the first positional feature includes the positional feature of each pixel in the image to be colored, and the second positional feature includes the positional feature of each pixel in the second colored image; and based on the first positional feature and the second positional feature, determine a similarity matrix between the image to be colored and the second colored image; wherein the similarity matrix includes the similarity between each pixel in the image to be colored and each pixel in the second colored image.
[0160] In some embodiments, the transformation module 4552 is further configured to perform nonlocal processing on the first position feature and the second position feature to obtain a similarity matrix; normalize the similarity matrix and use the obtained normalized similarity matrix as the similarity matrix between the image to be colored and the second colored image.
[0161] In some embodiments, the processing module 4553 is further configured to determine a first modulation parameter based on multi-scale features aligned with the image to be colored in the second color prior information; and to perform modulation and coloring processing on the first image features using the first modulation parameter to obtain the second image features.
[0162] In some embodiments, the modulation colorization process is implemented through a colorization network, which includes a residual module; the processing module 4553 is further configured to determine a first scale feature corresponding to the residual module in the colorization network from the multi-scale features aligned with the image to be colorized; and to perform convolution processing on the first scale feature to obtain a first modulation parameter corresponding to the residual module.
[0163] In some embodiments, the processing module 4553 is further configured to perform convolution processing on the first image features, perform linear transformation on the obtained convolution result through the first modulation parameter, sum the result of the linear transformation with the first image features, and use the obtained summation result as the second image features.
[0164] In some embodiments, the processing module 4553 is further configured to determine a second modulation parameter based on the multi-scale features aligned with the image to be colored in the second color prior information; perform deconvolution processing on the second image features, and perform linear transformation on the deconvolution processing result through the second modulation parameter, and perform activation processing on the linear transformation result to obtain a predicted color image aligned with the image to be colored; and perform color mode conversion processing on the predicted color image to obtain a first colored image.
[0165] In some embodiments, the modulation colorization process is implemented through a colorization network, which includes an upsampling module; the processing module 4553 is further configured to determine a second scale feature corresponding to the upsampling module in the colorization network from the multi-scale features aligned with the image to be colorized; and to perform convolution processing on the second scale feature to obtain a second modulation parameter corresponding to the upsampling module.
[0166] In some embodiments, the processing module 4553 is further configured to perform conversion processing on the encoded vector to obtain a conversion vector; determine third color prior information aligned with the image to be colored based on the conversion vector; perform modulation coloring processing on the image to be colored based on the third color prior information to obtain a third colored image aligned with the image to be colored; wherein the third colored image includes at least one of the following: an image after coloring the background in the image to be colored, an image after coloring the foreground in the image to be colored, and an image after adjusting the saturation of the image to be colored.
[0167] In some embodiments, downsampling, modulation colorization, and upsampling are implemented through a colorization network; the AI-based image colorization device further includes a training module 4554, used to train the colorization network in the following manner: determining a total loss function based on the adversarial loss function, perceptual loss function, domain alignment loss function, and context loss function corresponding to the colorization network; calling the colorization network to perform colorization processing on the image sample to be colorized, obtaining a first colorized image aligned with the image sample to be colorized, a second colorized image not aligned with the image sample to be colorized, and a predicted color image; wherein, the first colorized image is obtained by transforming the predicted color image; based on the predicted color... The adversarial loss value is determined based on the error between the image and the corresponding first actual color image; the perceptual loss value is determined based on the error between the second colorized image and the corresponding second actual color image; the domain alignment loss value is determined based on the error between the image sample to be colorized and the second colorized image; and the context loss value is determined based on the error between the first colorized image and the second colorized image. The second actual color image is obtained by transforming the first actual color image. The adversarial loss value, perceptual loss value, domain alignment loss value, and context loss value are weighted and summed to obtain the total loss value. The total loss value is backpropagated in the colorization network based on the total loss function to update the parameters of the colorization network.
[0168] This application provides a computer-readable storage medium storing executable instructions. When executed by a processor, these executable instructions cause the processor to perform the AI-based image coloring method provided in this application, for example, the AI-based image coloring method shown in Figure 5.
[0169] In some embodiments, the storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or it may be a variety of devices including one or any combination of the above-mentioned memories.
[0170] In some embodiments, executable instructions may take the form of a program, software, software module, script, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
[0171] As an example, executable instructions may, but do not necessarily, correspond to files in a file system. They may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple collaborating files (e.g., files that store one or more modules, subroutines, or code sections).
[0172] As an example, executable instructions can be deployed to execute on a single computing device, or on multiple computing devices located in one location, or on multiple computing devices distributed across multiple locations and interconnected via a communication network.
[0173] In summary, this embodiment of the application determines second color prior information aligned with the image to be colored, and performs modulation coloring processing and upsampling processing on the first image features corresponding to the image to be colored based on the second color prior information, thereby obtaining a first colored image. Because the second color prior information is aligned with the image to be colored, the first colored image generated from it is also aligned with the image to be colored, which achieves automatic and accurate coloring of the image to be colored. Furthermore, this embodiment of the application can generate colored images with different coloring effects by controlling and modifying the color prior information, achieving diversified coloring.
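The alignment of the color prior information summarized above can be sketched as follows, loosely following the claims: positional features of the two images are compared pairwise (the nonlocal processing), the resulting similarity matrix is normalized, and the prior features are transformed according to it. This is a minimal sketch under stated assumptions, not the implementation of this application: dot products are assumed as the similarity measure, softmax is assumed as the normalization, and the transformation based on the similarity matrix is interpreted as a similarity-weighted combination of prior features. All names and data are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def similarity_matrix(target_feats: Matrix, source_feats: Matrix) -> Matrix:
    """Row i holds the normalized similarity of target pixel i to every source pixel."""
    sims = []
    for t in target_feats:
        # Pairwise comparison: dot product between positional features (assumed measure).
        scores = [sum(a * b for a, b in zip(t, s)) for s in source_feats]
        peak = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(v - peak) for v in scores]
        norm = sum(exps)
        sims.append([e / norm for e in exps])  # softmax normalization (assumed)
    return sims


def align_prior(sim: Matrix, prior_feats: Matrix) -> Matrix:
    """Re-arrange prior features so that they line up with the target pixels."""
    channels = len(prior_feats[0])
    return [
        [sum(w * p[c] for w, p in zip(row, prior_feats)) for c in range(channels)]
        for row in sim
    ]


# Hypothetical toy data: two pixels whose content is swapped between the two images.
gray_pos = [[10.0, 0.0], [0.0, 10.0]]      # positional features, image to be colored
colored_pos = [[0.0, 10.0], [10.0, 0.0]]   # positional features, unaligned colored image
prior = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]] # one color prior feature per colored-image pixel

sim = similarity_matrix(gray_pos, colored_pos)
aligned = align_prior(sim, prior)
```

In this toy case each pixel of the image to be colored matches the opposite pixel of the unaligned colored image, so `aligned` is approximately the prior with its two rows swapped.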
[0174] The above description is merely an embodiment of this application and is not intended to limit the scope of protection of this application. Any modifications, equivalent substitutions, and improvements made within the spirit and scope of this application are included within the scope of protection of this application.
Claims
1. An image colorization method based on artificial intelligence, characterized in that the method includes: obtaining first color prior information of an image to be colored; transforming the first color prior information to obtain second color prior information aligned with the image to be colored; downsampling the image to be colored to obtain first image features; performing modulation coloring processing on the first image features based on the second color prior information to obtain second image features; and upsampling the second image features based on the second color prior information to obtain a first colored image aligned with the image to be colored.
2. The method according to claim 1, characterized in that, The acquisition of the first color prior information of the image to be colored includes: Obtain the encoding vector of the image to be colored; The image to be colored is colored by performing an identity mapping on the encoding vector to obtain a second colored image that is not aligned with the image to be colored. The multi-scale features obtained during the process of obtaining the second colored image through the identity mapping are used as the first color prior information.
3. The method according to claim 1, characterized in that, The transformation of the first color prior information to obtain second color prior information aligned with the image to be colored includes: Determine the similarity matrix between the image to be colored and the second colored image, wherein the second colored image is obtained by coloring the image to be colored and is not aligned with the image to be colored; Based on the similarity matrix, an affine transformation is performed on the multi-scale features in the first color prior information to obtain multi-scale features aligned with the image to be colored. The multi-scale features aligned with the image to be colored are used as the second color prior information.
4. The method according to claim 3, characterized in that determining the similarity matrix between the image to be colored and the second colored image includes: obtaining a first positional feature of the image to be colored and a second positional feature of the second colored image, wherein the first positional feature includes the positional feature of each pixel in the image to be colored, and the second positional feature includes the positional feature of each pixel in the second colored image; and determining, based on the first positional feature and the second positional feature, the similarity matrix between the image to be colored and the second colored image, wherein the similarity matrix includes the similarity between each pixel in the image to be colored and each pixel in the second colored image.
5. The method according to claim 4, characterized in that determining the similarity matrix between the image to be colored and the second colored image based on the first positional feature and the second positional feature includes: performing nonlocal processing on the first positional feature and the second positional feature to obtain a similarity matrix; and normalizing the similarity matrix, and using the normalized similarity matrix as the similarity matrix between the image to be colored and the second colored image.
6. The method according to claim 1, characterized in that, The process of modulating and colorizing the first image features based on the second color prior information to obtain the second image features includes: Based on the multi-scale features aligned with the image to be colored in the second color prior information, the first modulation parameter is determined; The first image feature is modulated and colored using the first modulation parameters to obtain the second image feature.
7. The method according to claim 6, characterized in that, The modulation coloring process is implemented through a coloring network, which includes a residual module. The determination of the first modulation parameters based on the multi-scale features aligned with the image to be colored in the second color prior information includes: Among the multi-scale features aligned with the image to be colored, the first scale feature corresponding to the residual module in the coloring network is determined. The first scale feature is convolved to obtain the first modulation parameter corresponding to the residual module.
8. The method according to claim 6, characterized in that, The step of modulating and colorizing the first image features using the first modulation parameters to obtain the second image features includes: The first image features are convolved, and the convolution result is linearly transformed using the first modulation parameters. The result of the linear transformation is summed with the first image feature, and the summed result is used as the second image feature.
9. The method according to claim 1, characterized in that upsampling the second image features based on the second color prior information to obtain a first colored image aligned with the image to be colored includes: determining second modulation parameters based on the multi-scale features aligned with the image to be colored in the second color prior information; performing deconvolution processing on the second image features, and linearly transforming the deconvolution result using the second modulation parameters; performing activation processing on the linear transformation result to obtain a predicted color image aligned with the image to be colored; and performing color mode conversion processing on the predicted color image to obtain the first colored image.
10. The method according to claim 9, characterized in that, The modulation colorization process is implemented through a colorization network, which includes an upsampling module. The determination of the second modulation parameters based on the multi-scale features aligned with the image to be colored in the second color prior information includes: Among the multi-scale features aligned with the image to be colored, determine the second-scale feature corresponding to the upsampling module in the coloring network; The second scale feature is convolved to obtain the second modulation parameter corresponding to the upsampling module.
11. The method according to claim 2, characterized in that, The method further includes: The encoded vector is transformed to obtain a transformed vector; Based on the transformation vector, determine the third color prior information aligned with the image to be colored; Based on the third color prior information, the image to be colored is modulated and colored to obtain a third colored image that is aligned with the image to be colored. The third colorized image includes at least one of the following: an image after coloring the background of the image to be colorized, an image after coloring the foreground of the image to be colorized, and an image after adjusting the saturation of the image to be colorized.
12. The method according to claim 1, characterized in that, The downsampling process, the modulation colorization process, and the upsampling process are implemented through a colorization network; Before obtaining the first color prior information of the image to be colored, the method further includes: The coloring network is trained in the following manner: The total loss function is determined based on the adversarial loss function, perceptual loss function, domain alignment loss function, and context loss function corresponding to the coloring network. The coloring network is invoked to colorize the image sample to be colored, resulting in a first colored image aligned with the image sample to be colored, a second colored image not aligned with the image sample to be colored, and a predicted color image. The first colorized image is obtained by converting the predicted color image; An adversarial loss value is determined based on the error between the predicted color image and the corresponding first actual color image; a perceptual loss value is determined based on the error between the second colorized image and the corresponding second actual color image; a domain alignment loss value is determined based on the error between the image sample to be colorized and the second colorized image; and a context loss value is determined based on the error between the first colorized image and the second colorized image. The second actual color image is obtained by converting the first actual color image; The total loss value is obtained by weighted summing of the adversarial loss value, the perceptual loss value, the domain alignment loss value, and the context loss value. The total loss value is backpropagated in the coloring network based on the total loss function to update the parameters of the coloring network.
13. An image coloring device based on artificial intelligence, characterized in that the device includes: an acquisition module, configured to acquire first color prior information of an image to be colored; a transformation module, configured to transform the first color prior information to obtain second color prior information aligned with the image to be colored; and a processing module, configured to downsample the image to be colored to obtain first image features, to perform modulation coloring processing on the first image features based on the second color prior information to obtain second image features, and to upsample the second image features based on the second color prior information to obtain a first colored image aligned with the image to be colored.
14. An electronic device, characterized in that the electronic device includes: a memory, configured to store executable instructions; and a processor, configured to implement the AI-based image coloring method according to any one of claims 1 to 12 when executing the executable instructions stored in the memory.
15. A computer-readable storage medium, characterized in that it stores executable instructions which, when executed by a processor, implement the AI-based image coloring method according to any one of claims 1 to 12.