Artificial intelligence painting display system and method based on image core extraction technology
By employing image preprocessing, lightweight neural networks, and deep style transfer techniques, the problems of dull colors and blurred subjects in digital painting have been solved, achieving efficient and accurate style transfer and enhanced artistic expression.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHENZHEN YIBOTANG DIGITAL CULTURE TECH GRP CO LTD
- Filing Date
- 2026-02-28
- Publication Date
- 2026-06-23
AI Technical Summary
Existing digital painting techniques lack preprocessing in color control, resulting in dull colors and blurred subject matter. Furthermore, traditional semantic segmentation models are computationally complex, and the style transfer process is not precise or efficient, affecting artistic expression and visual quality.
The image preprocessing module performs low-pass filtering and pixel-by-pixel difference operations, combined with an improved lightweight neural network architecture for core region recognition, and uses a deep style transfer network to accurately transfer artistic styles to the core regions, and finally performs adaptive normalization adjustments.
It improves the precision and efficiency of color control in digital painting, ensures that the main subject of the image stands out, avoids background interference, enhances artistic expression and visual quality, and supports real-time painting generation.
Smart Images

Figure CN122265451A_ABST
Description
Technical Field
[0001] This invention relates to the fields of computer vision and digital image technology, and more specifically to an artificial intelligence painting display system and method based on image core extraction technology.
Background Technology
[0002] With the continuous advancement of modern technology, computer painting techniques have developed rapidly. Digital painting, with its efficiency and convenience, has been widely adopted by designers, architects, and commercial illustrators, and applied in various fields such as animation concept design, picture book design, and fashion design. Digital painting differs fundamentally from traditional painting in style and creative concept, a difference particularly evident in color relationships. Therefore, it demands higher precision in color control, highlighting the increasing importance of color control systems. However, current digital painting technologies still have some shortcomings in practical use. For example, common color control systems often lack image preprocessing, resulting in less vibrant colors and affecting the accuracy of control. Simultaneously, existing neural network-based style transfer techniques often lack an understanding of the semantic content of the image, failing to effectively distinguish between the subject and background. When an artistic style is uniformly applied to the entire image, the details of the core areas that should be the visual focus may be over-stylized and distorted, leading to a lack of subject prominence and blurred visual focus. Furthermore, using the same level of stylization for both the background and core areas can sometimes introduce unnecessary textural interference, distracting the viewer from the subject matter and reducing the artistic expression and visual quality of the final work.
[0003] On the other hand, some studies have attempted to guide style transfer by introducing semantic segmentation maps, but their segmentation models are often computationally complex and difficult to implement in practice. Furthermore, the content loss function in the style transfer process is usually still calculated over the entire image, so it fails to achieve truly accurate and efficient style transfer dominated by the core region, and thus fails to meet the dual requirements of efficiency and effectiveness in digital painting creation.
Summary of the Invention
[0004] To overcome the aforementioned deficiencies of the prior art, embodiments of the present invention provide an artificial intelligence painting display system and method based on image core extraction technology. The system addresses the problem of blurred subject and reduced artistic expression caused by global processing as mentioned in the background technology through the following solutions.
[0005] To achieve the above objectives, the present invention provides the following technical solution: an artificial intelligence painting display system and method based on image core extraction technology, the system comprising:
Image preprocessing module: used to enhance the input original image to obtain optimized digital painting data;
Core painting region extraction module: used to receive the optimized digital painting data and adopt a semantic segmentation model based on an improved lightweight neural network architecture to automatically identify and segment the core region representing the painting theme from the optimized digital painting data, and generate structured core painting region data;
Style painting generation module: used to receive the core painting area data, transfer the specified art style features to the core area through a deep style transfer network optimized by an adversarial training process, and finally output specific stylized painting data;
Digital painting generation module: used to adaptively standardize and uniformly adjust the specific stylized painting data, and to synthesize and output the final digital painting work.
[0006] Preferably, the enhancement processing of the input original image specifically includes: The input original image is low-pass filtered to separate high-frequency defect information, and the filtered image is converted to the original image space and subjected to pixel-by-pixel difference operation with the original image to obtain optimized digital painting data with enhanced details and color contrast.
[0007] Preferably, the low-pass filtering employs a Gaussian filtering algorithm, which convolves the image using a two-dimensional Gaussian kernel function, with the kernel size adaptively adjusted according to the resolution of the input image.
[0008] Preferably, the improved lightweight neural network architecture incorporates channel attention and spatial attention mechanisms to adaptively weight important feature channels and spatial locations during semantic segmentation.
[0009] Preferably, the lightweight neural network adopts an encoder-decoder structure, wherein the encoder part uses depthwise separable convolution for feature extraction.
[0010] Preferably, the training process of the deep style transfer network relies on a specially optimized stylization loss function.
[0011] Preferably, the stylization loss function is a weighted combination of the content loss function, the style loss function, and the identity loss function; wherein, the calculation area of the content loss function is dynamically limited by the core painting area data, rather than the entire image area.
[0012] Preferably, the adaptive normalization adjustment performed by the digital painting generation module specifically includes the collaborative optimization of the global contrast, color saturation, and brightness parameters of the specific stylized painting data; The parameter set upon which the collaborative optimization process is based supports two configuration modes: one is pre-configuration based on the standard color characteristics of the target display platform, and the other is receiving real-time dynamic adjustment instructions through the user interaction interface integrated into the system.
[0013] Preferably, the present invention also includes an artificial intelligence painting display method based on image core extraction technology, the method comprising:
S1: Preprocess the input image to obtain optimized digital painting data;
S2: Receive the optimized digital painting data, and use a semantic segmentation model based on an improved lightweight neural network architecture to automatically identify and segment the core region representing the painting theme from the data, and generate core painting region data;
S3: Receive the core painting area data, and transfer the specified art style features to the core area through a deep style transfer network optimized by an adversarial training process to generate specific stylized painting data;
S4: Adaptively standardize and uniformly adjust the specific stylized painting data, and synthesize and output the final digital painting.
[0014] The technical effects and advantages of this invention are as follows:
1. This invention separates high-frequency defect information in the original image and enhances details and color contrast through low-pass filtering and a pixel-by-pixel difference operation in the image preprocessing module. At the same time, the size of the filter kernel is adaptively adjusted according to the resolution of the input image, providing high-quality optimized data for subsequent core area extraction and stylization processing. This effectively solves the problem of dull image colors and low control accuracy caused by the lack of preprocessing in common color control systems, and improves the accuracy and ease of use of color control in digital painting.
2. This invention, through an improved lightweight neural network architecture, adaptively weights important feature channels and spatial locations during semantic segmentation and accurately identifies and segments the core region of a painting, significantly reducing computational complexity and resource consumption. It overcomes the shortcomings of traditional semantic segmentation models, which are computationally complex and difficult to implement, laying the foundation for subsequent accurate style transfer while ensuring system processing efficiency and supporting real-time or near-real-time painting generation.
3. This invention uses a deep style transfer network optimized through adversarial training in the style painting generation module, combined with a specially optimized stylization loss function, to accurately transfer specified artistic style features to the core area. This avoids the problem of the core area being submerged and the background introducing texture interference when the style is applied uniformly to the entire picture. As a result, the generated digital painting works have prominent subjects and distinct layers, significantly enhancing artistic expression and visual quality.
4. This invention optimizes the global contrast, color saturation, and brightness parameters of specific stylized painting data through the adaptive standardization adjustment of the digital painting generation module. The parameter set supports a pre-configuration mode based on the standard color characteristics of the target display platform and a real-time dynamic adjustment mode through the user interaction interface. This ensures the visual consistency of the final work on different display platforms, simplifies the operation process, gives users fine control over the painting effect, and improves the user experience and system flexibility.
Attached Figure Description
[0015] Figure 1 is a schematic diagram of the overall structure of the present invention; Figure 2 is a schematic diagram of the core painting region extraction module of the present invention; Figure 3 is a schematic diagram of the style painting generation module of the present invention; Figure 4 is a schematic diagram of the digital painting generation module of the present invention.
Detailed Implementation
[0016] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0017] As shown in Figures 1-4, the artificial intelligence painting display system and method based on image core extraction technology include: Image preprocessing module: used to enhance the input original image to obtain optimized digital painting data.
[0018] It should be specifically noted that the enhancement processing of the input original image specifically includes: The input original image is low-pass filtered to separate high-frequency defect information, and the filtered image is converted to the original image space and subjected to pixel-by-pixel difference operation with the original image to obtain optimized digital painting data with enhanced details and color contrast. The low-pass filter employs a Gaussian filtering algorithm, which convolves the image using a two-dimensional Gaussian kernel function. The size of the filtering kernel is adaptively adjusted according to the resolution of the input image.
[0019] It should be further explained that the image preprocessing module enhances the input original image to eliminate noise and to enhance details and color contrast, yielding optimized digital painting data. The specific steps are as follows:
Input original image: Let the input original image be $I$, with a resolution of $W \times H$, where $W$ is the width and $H$ is the height, in the RGB color space. $I$ can be in any digital image format and is either uploaded through the system interface or obtained in real time.
Low-pass filtering: The original image $I$ is low-pass filtered using a Gaussian filtering algorithm to separate high-frequency defect information. The filtering convolves the image with a two-dimensional Gaussian kernel function whose size is adaptively adjusted according to the resolution of the input image. The Gaussian kernel function is defined as $G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$, where $(x, y)$ are the relative coordinates of a pixel within the kernel, with a range determined by the kernel size. The standard deviation $\sigma$ of the Gaussian distribution controls the filter strength; its value is calculated adaptively from the image resolution using a scaling factor $k$ with a value range of [0.001, 0.01]. The filter kernel size $S$ is an odd number and is likewise calculated adaptively from the image resolution, using a proportionality coefficient $a$ with a value range of [0.01, 0.05] and the floor operation $\lfloor \cdot \rfloor$ (rounding down to the nearest integer).
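[0020] The filtered image $I_{filtered}$ is obtained through the convolution operation $I_{filtered}(u, v) = \sum_{i} \sum_{j} I(u+i, v+j)\, G(i, j)$, where $(u, v)$ are the pixel coordinates of the image, $I(u+i, v+j)$ is the pixel value of the original image at position $(u+i, v+j)$, $G(i, j)$ is the weight of the Gaussian kernel at $(i, j)$, and the sums run over all offsets $(i, j)$ inside the kernel.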
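The Gaussian filtering step described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names `gaussian_kernel` and `gaussian_filter`, the edge-replicating border handling, and the normalization of the kernel to unit sum are all assumptions, and `size` and `sigma` are passed in directly rather than derived from the image resolution.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a size x size Gaussian kernel normalized to sum to 1 (size must be odd)."""
    if size % 2 == 0:
        raise ValueError("kernel size must be odd")
    half = size // 2
    ax = np.arange(-half, half + 1, dtype=np.float64)
    x, y = np.meshgrid(ax, ax)
    g = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    # Normalization (an assumption) keeps flat regions at their original brightness.
    return g / g.sum()


def gaussian_filter(image: np.ndarray, size: int, sigma: float) -> np.ndarray:
    """Filter an H x W or H x W x C image; borders are handled by edge replication."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    img = image.astype(np.float64)
    squeeze = img.ndim == 2
    if squeeze:
        img = img[:, :, None]
    padded = np.pad(img, ((half, half), (half, half), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    # Accumulate sum_{i,j} I(u+i, v+j) * G(i, j) one kernel offset at a time.
    for i in range(size):
        for j in range(size):
            out += kernel[i, j] * padded[i:i + h, j:j + w, :]
    return out[:, :, 0] if squeeze else out
```

Looping over kernel offsets instead of pixels keeps the sketch short and vectorized; a production system would more likely call an optimized library routine.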
[0021] Differential operation enhancement: The filtered image $I_{filtered}$ is converted to the original image space and then differenced pixel by pixel with the original image $I$ to enhance detail and color contrast. The difference image $I_{diff}$ is calculated as $I_{diff} = I - I_{filtered}$, where the subtraction is performed pixel by pixel and the result $I_{diff}$ contains the high-frequency information. The optimized digital painting data $I_{enhanced}$ is obtained through weighted fusion: $I_{enhanced} = I + \alpha \cdot I_{diff}$, where $\alpha$ is the enhancement factor with a value range of [0.5, 2.0].
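The difference-and-fusion step can be illustrated with a short NumPy sketch. It assumes that the weighted fusion adds the difference image back onto the original image scaled by the enhancement factor (the familiar unsharp-masking form), that pixel values lie in [0, 1], and that the result is clipped to that range; the function name `enhance` is hypothetical.

```python
import numpy as np


def enhance(original: np.ndarray, filtered: np.ndarray, alpha: float) -> np.ndarray:
    """Add alpha times the high-frequency difference back to the original image."""
    if not 0.5 <= alpha <= 2.0:
        raise ValueError("alpha is outside the range [0.5, 2.0] given in the text")
    original = original.astype(np.float64)
    diff = original - filtered.astype(np.float64)  # pixel-by-pixel difference
    # Clipping to [0, 1] is an assumption about the value range of the image.
    return np.clip(original + alpha * diff, 0.0, 1.0)
```

Where the filtered image equals the original (flat regions), the difference is zero and the pixel is left untouched; only edges and fine detail are amplified.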
[0022] The final output, the optimized digital painting data $I_{enhanced}$, is used in the subsequent steps. All parameters ($k$, $a$, $\alpha$) are optimized through offline experiments and stored in the system configuration file.
[0023] It should be further noted that for several common image resolutions, the proportionality coefficient $a$, the enhancement factor $\alpha$, and the Gaussian kernel size $S$ take fixed values:
720P: $a$ = 0.02, $\alpha$ = 1.2, $S$ = 25×25;
1080P: $a$ = 0.03, $\alpha$ = 1.0, $S$ = 38×38;
4K: $a$ = 0.04, $\alpha$ = 0.8, $S$ = 77×77.
[0024] The core painting region extraction module receives the optimized digital painting data and uses a semantic segmentation model based on an improved lightweight neural network architecture to automatically identify and segment the core regions representing the painting theme from the optimized digital painting data, generating structured core painting region data.
[0025] It should be further explained that the core area of the painting is defined as the area in the image that has semantic salience and can represent the theme of the painting. The determination criteria include: semantic category, visual salience, and user-preset rules.
[0026] It should be noted that the improved lightweight neural network architecture incorporates channel attention and spatial attention mechanisms to adaptively weight important feature channels and spatial locations during semantic segmentation.
[0027] It should be further explained that the neural network architecture is a lightweight neural network with an encoder-decoder structure that embeds channel attention and spatial attention mechanisms to adaptively weight important feature channels and spatial locations. The improved lightweight neural network is built on the MobileNetV3-Lite architecture. The encoder uses depthwise separable convolution: the number of convolution kernels is configured to be half the number of input channels, the stride is set to 2, and batch normalization and a ReLU6 activation function are added after each convolutional layer. The number of parameters is kept within 3M.
[0028] It should be further explained that the encoder part uses depthwise separable convolution for feature extraction to reduce computational cost. Depthwise separable convolution decomposes a standard convolution into a depthwise convolution and a pointwise convolution: the depthwise convolution applies a single convolution kernel to each input channel and outputs a feature map, and the pointwise convolution uses a 1×1 convolution kernel to fuse channel information and output a feature map. Let the size of the input feature map $X$ be $H \times W \times C_{in}$, where $H$ is the height, $W$ is the width, and $C_{in}$ is the number of input channels. With a kernel size of $K \times K$, the number of parameters in a depthwise separable convolution is $K \cdot K \cdot C_{in} + C_{in} \cdot C_{out}$, where $C_{out}$ is the number of output channels.
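To make the saving concrete, the parameter counts of a standard convolution and of a depthwise separable convolution can be compared directly. The sketch below ignores bias terms, and the function names and the example sizes (3×3 kernel, 32 input channels, 64 output channels) are chosen only for illustration.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard K x K convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise K x K weights plus pointwise 1 x 1 weights (bias terms ignored)."""
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    k, c_in, c_out = 3, 32, 64
    full = standard_conv_params(k, c_in, c_out)
    separable = depthwise_separable_params(k, c_in, c_out)
    print(full, separable, round(full / separable, 2))
```

For these example sizes the standard convolution has 18,432 weights and the separable version 2,336, roughly an eightfold reduction.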
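[0029] The channel attention mechanism employs an SE (Squeeze-and-Excitation) module, which calculates a weight for each channel to highlight important feature channels. Let the input features be $F \in \mathbb{R}^{H \times W \times C}$, where $\mathbb{R}$ is the set of real numbers. First, channel statistics are obtained through global average pooling: $z_c = \frac{1}{H \cdot W} \sum_{i=1}^{H} \sum_{j=1}^{W} F_c(i, j)$, where $c$ is the index of the channel currently being calculated. The attention weights are then calculated through two fully connected layers: $s = \sigma(W_2\, \delta(W_1 z))$, where $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ are learnable weight matrices, $r$ is the reduction ratio with a value range of [4, 16], $\delta$ is the ReLU activation function, and $\sigma$ is the Sigmoid function. The output features are $F_{cam} = s \odot F$, i.e. each channel $F_c$ is scaled by its weight $s_c$.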
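The squeeze-and-excitation computation can be sketched as a forward pass in NumPy. This is an illustrative sketch that assumes the standard SE formulation (global average pooling, a ReLU bottleneck, a Sigmoid gate, bias terms omitted); the function names and the weight shapes `w1` of size (C/r)×C and `w2` of size C×(C/r) are assumptions, and no training is shown.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def se_attention(features: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Channel attention on an H x W x C feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r).
    """
    z = features.mean(axis=(0, 1))      # squeeze: global average pooling, shape (C,)
    hidden = np.maximum(w1 @ z, 0.0)    # first fully connected layer + ReLU
    s = sigmoid(w2 @ hidden)            # second layer + Sigmoid, weights in (0, 1)
    return features * s                 # scale every channel by its weight


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, r = 8, 4
    feats = rng.normal(size=(5, 6, c))
    out = se_attention(feats, rng.normal(size=(c // r, c)), rng.normal(size=(c, c // r)))
    print(out.shape)
```

Because the gate is a per-channel scalar, the spatial layout of each feature map is preserved; only its overall magnitude changes.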
[0030] The spatial attention mechanism uses a 3×3 convolution kernel to highlight important regions by calculating a weight for each spatial location. From the input features $F_{cam}$, a spatial attention map $M_s$ is calculated as $M_s = \sigma\left(\sum_{c} W_S^{c} * F_{cam}^{c}\right)$, where $W_S$ is the learnable weight vector of the 3×3 convolution, initialized from a Xavier normal distribution and updated by gradient descent during training, $c$ is the index of the channel currently being calculated, $*$ denotes convolution, and $\sigma$ is the Sigmoid function. The final output features are $F_{out} = M_s \odot F_{cam}$.
[0031] In the semantic segmentation process, the optimized digital painting data $I_{enhanced}$ is input into the neural network, which outputs a binary mask $M \in \{0, 1\}^{H \times W}$ of the core region, where 1 represents a pixel in the core region and 0 represents the background. The model is pre-trained on a public dataset and fine-tuned for painting themes. The loss function is a combination of cross-entropy loss and Dice loss: $L_{seg} = L_{CE} + \lambda L_{Dice}$, with $L_{CE} = -\frac{1}{N} \sum_{p} \left[ y_p \log \hat{y}_p + (1 - y_p) \log(1 - \hat{y}_p) \right]$ and $L_{Dice} = 1 - \frac{2 \sum_{p} y_p \hat{y}_p}{\sum_{p} y_p + \sum_{p} \hat{y}_p}$, where $p$ is the pixel index, $N$ is the number of pixels, $y_p$ is the real label, $\hat{y}_p$ is the predicted probability, and $\lambda$ is the balancing weight with a value range of [0.5, 1.5]. Post-processing uses morphological operations to remove small noise regions and extracts the largest connected region as the core region. The generated core painting region data includes the mask $M$ and the bounding box coordinates. Finally, the structured core painting region data $D_{core} = \{M, B\}$ is output, where $B$ is the bounding box used to define the style transfer region.
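A combined cross-entropy and Dice loss of the kind described above might be computed as follows. The sketch assumes binary cross-entropy averaged over pixels, the common soft-Dice form, and a balancing weight applied to the Dice term; the function name `segmentation_loss` and the small constant `eps` used for numerical stability are illustrative assumptions.

```python
import numpy as np


def segmentation_loss(y_true: np.ndarray, y_prob: np.ndarray,
                      lam: float = 1.0, eps: float = 1e-7) -> float:
    """Binary cross-entropy plus lam times the soft Dice loss."""
    y = y_true.astype(np.float64).ravel()
    # Clip probabilities so that log() never receives exactly 0 or 1.
    p = np.clip(y_prob.astype(np.float64).ravel(), eps, 1.0 - eps)
    cross_entropy = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    dice = 1.0 - (2.0 * np.sum(y * p) + eps) / (np.sum(y) + np.sum(p) + eps)
    return float(cross_entropy + lam * dice)


if __name__ == "__main__":
    labels = np.array([[1, 0], [1, 0]])
    print(segmentation_loss(labels, np.array([[0.9, 0.1], [0.8, 0.3]])))
```

Cross-entropy scores each pixel independently, while the Dice term rewards overlap of the whole region, which helps when the core region covers only a small part of the image.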
[0032] Style painting generation module: It is used to receive the core painting area data, transfer the specified art style features to the core area through a deep style transfer network optimized by an adversarial training process, and finally output specific stylized painting data.
[0033] It should be noted that the training process of the deep style transfer network relies on a specially optimized stylization loss function.
[0034] It should be further explained that the stylization loss function is a weighted combination of the content loss function, the style loss function, and the identity loss function; wherein, the calculation area of the content loss function is dynamically limited by the core painting area data, rather than the entire image area.
[0035] It should be further explained that the deep style transfer network is based on a generative adversarial network architecture, including a generator $G$ and a discriminator $D$. The generator adopts a U-Net structure, receives a content image and a style image, and outputs a stylized image; the discriminator is used to distinguish between real style images and generated images. The generator input consists of a core region image $I_{core}$, extracted from $I_{enhanced}$ through the mask $M$, i.e. $I_{core} = I_{enhanced} \odot M$, where $\odot$ represents pixel-wise multiplication, and a style image $I_{style}$ chosen from a predefined style library. The generator outputs a stylized core region $\hat{I}_{core}$.
[0036] It should be further explained that the stylization loss function used in training is a weighted combination of the content loss $L_{content}$, the style loss $L_{style}$, and the identity loss $L_{identity}$. The content loss ensures that the stylized image retains the structural content of the core region; its calculation area is dynamically limited by the mask $M$, so it covers only pixels in the core region. Let $\phi_l(I)$ be the feature map extracted from image $I$ by the $l$-th layer of the pre-trained VGG network. The content loss is defined as the mean square error between the stylized image $\hat{I}_{core}$ and the content image $I_{core}$ in the feature space: $L_{content} = \frac{1}{C_l H_l W_l} \sum_{p} M(p) \left\| \phi_l(\hat{I}_{core})_p - \phi_l(I_{core})_p \right\|_2^2$, where $p$ is the spatial location index on the feature map, $M(p)$ is the value of the core region mask at location $p$, $\|\cdot\|_2$ represents the Euclidean norm, and $C_l$, $H_l$, $W_l$ represent the number of channels, height, and width of the feature map at layer $l$, respectively.
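The idea of restricting the content loss to the core region can be shown on plain arrays. In this sketch the two inputs stand in for VGG feature maps (no network is loaded), the mask is assumed to have already been resized to the feature-map resolution, and the normalization by the full feature-map size is an assumption; the function name `masked_content_loss` is hypothetical.

```python
import numpy as np


def masked_content_loss(feat_stylized: np.ndarray, feat_content: np.ndarray,
                        mask: np.ndarray) -> float:
    """Mean squared feature error counted only where mask == 1.

    feat_*: C x H x W feature maps; mask: H x W array with 1 for the core region.
    """
    c, h, w = feat_content.shape
    squared_error = (feat_stylized - feat_content) ** 2   # C x H x W
    per_location = squared_error.sum(axis=0)              # squared Euclidean norm over channels
    return float((per_location * mask).sum() / (c * h * w))


if __name__ == "__main__":
    content = np.zeros((2, 2, 2))
    stylized = np.ones((2, 2, 2))
    core = np.array([[1, 0], [0, 0]])
    print(masked_content_loss(stylized, content, core))
```

Feature differences at background locations contribute nothing, so the generator is not penalized for whatever happens outside the core region.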
[0037] Style loss measures the similarity between the generated image and the style image in stylistic features such as texture and brushstrokes, by comparing the Gram matrices of their feature maps. First, the Gram matrix $G_l$ of the feature map is calculated; it captures the correlation between different feature channels and thus characterizes the style of the image. The feature map $\phi_l(I)$ of the $l$-th layer is reshaped into a matrix $F_l \in \mathbb{R}^{C_l \times H_l W_l}$, and the Gram matrix of this layer is $G_l = F_l F_l^{\top}$, i.e. $G_l(i, j) = \sum_{k} F_l(i, k)\, F_l(j, k)$, where $i$ and $j$ are feature channel indices. The style loss is a weighted sum, over multiple feature layers, of the differences between the Gram matrices of the generated image $\hat{I}_{core}$ and the style image $I_{style}$: $L_{style} = \sum_{l=1}^{L_S} w_l \left\| G_l(\hat{I}_{core}) - G_l(I_{style}) \right\|_F^2$, where $\|\cdot\|_F^2$ represents the square of the Frobenius norm, $w_l$ is the loss weight of layer $l$, used to balance the importance of style features at different scales, and $L_S$ is the total number of layers used to calculate the style loss.
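The Gram-matrix comparison can be sketched in a few lines of NumPy. The arrays passed in stand in for per-layer VGG feature maps, the Gram matrix is left unnormalized, and the function names and per-layer weights are illustrative assumptions.

```python
import numpy as np


def gram_matrix(feat: np.ndarray) -> np.ndarray:
    """Channel-by-channel correlation of a C x H x W feature map."""
    c = feat.shape[0]
    flat = feat.reshape(c, -1)   # C x (H * W)
    return flat @ flat.T         # C x C


def style_loss(feats_generated, feats_style, weights) -> float:
    """Weighted sum over layers of the squared Frobenius norm of Gram differences."""
    total = 0.0
    for fg, fs, w in zip(feats_generated, feats_style, weights):
        diff = gram_matrix(fg) - gram_matrix(fs)
        total += w * float(np.sum(diff ** 2))
    return total


if __name__ == "__main__":
    layer = np.arange(8, dtype=np.float64).reshape(2, 2, 2)
    print(gram_matrix(layer))
```

Because the Gram matrix sums over all spatial positions, it discards where a texture occurs and keeps only which channels fire together, which is why it serves as a style descriptor.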
[0038] Identity loss is introduced to maintain color consistency and prevent excessive loss of identity information in the content image during style transfer. It directly compares the stylized image $\hat{I}_{core}$ with the original content image $I_{core}$ in pixel space or shallow feature space: $L_{identity} = \left\| \hat{I}_{core} - I_{core} \right\|_1$, where $\|\cdot\|_1$ represents the L1 norm.
[0039] Total loss function: The total training loss of the style transfer network generator $G$ is the weighted sum of the three losses mentioned above: $L_{total} = \lambda_c L_{content} + \lambda_s L_{style} + \lambda_i L_{identity}$, where $\lambda_c$, $\lambda_s$, and $\lambda_i$ are hyperparameters used to balance the weights among content preservation, style transfer, and identity preservation; their value range is [value range missing]. The specific values of $\lambda_c$, $\lambda_s$, and $\lambda_i$ are determined through grid search.
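How the three terms combine can be shown with a small sketch. It assumes an identity loss taken as the L1 norm in pixel space and treats the three balancing weights as plain function arguments; the names `identity_loss`, `total_loss`, `lam_c`, `lam_s`, and `lam_i` are hypothetical, and the example weights are placeholders rather than values from the source.

```python
import numpy as np


def identity_loss(stylized: np.ndarray, content: np.ndarray) -> float:
    """L1 norm of the pixel-space difference between stylized and content images."""
    return float(np.abs(stylized - content).sum())


def total_loss(l_content: float, l_style: float, l_identity: float,
               lam_c: float, lam_s: float, lam_i: float) -> float:
    """Weighted sum of the content, style, and identity losses."""
    return lam_c * l_content + lam_s * l_style + lam_i * l_identity


if __name__ == "__main__":
    l_id = identity_loss(np.array([[1.0, 2.0]]), np.array([[0.0, 4.0]]))
    # Placeholder weights; the source determines the real values by grid search.
    print(total_loss(1.0, 2.0, l_id, lam_c=1.0, lam_s=10.0, lam_i=0.5))
```

Raising the style weight relative to the other two pushes the generator toward stronger stylization at the cost of content and color fidelity.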
[0040] It should be further explained that, for adversarial training, the discriminator $D$ uses the PatchGAN architecture with a total of 5 convolutional layers. The first 4 layers are 3×3 convolutions and the last layer is a 1×1 convolution; the numbers of output channels are 64, 128, 256, 512, and 1, respectively. The discriminator $D$ is trained with a discriminator loss $L_D$, and the generator $G$ with an adversarial loss $L_{adv}$. The total training loss includes $L_{total}$ and $L_{adv}$, and training is completed by alternately optimizing $G$ and $D$. Finally, the stylized painting data $I_{styled}$ is output, in which the core area has been stylized while the background area remains unchanged.
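The statement that the core area is stylized while the background stays unchanged corresponds to a mask-based composite. The sketch below is one plausible way to assemble such an output, assuming the stylized result and the original image have the same H×W×3 shape and the mask is binary; the function name `composite` is hypothetical.

```python
import numpy as np


def composite(stylized: np.ndarray, original: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Take stylized pixels where mask == 1 and original pixels elsewhere."""
    m = mask.astype(np.float64)[:, :, None]   # H x W x 1, broadcast over color channels
    return m * stylized + (1.0 - m) * original


if __name__ == "__main__":
    original = np.zeros((2, 2, 3))
    stylized = np.ones((2, 2, 3))
    mask = np.array([[1, 0], [0, 1]])
    print(composite(stylized, original, mask)[:, :, 0])
```

With a soft (non-binary) mask the same expression blends the two images near the boundary, which can hide seams between the core region and the background.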
[0041] It should be further noted that the discriminator was trained for 10,000 iterations with an initial learning rate of 0.0002. The StepLR decay strategy was adopted, with the learning rate multiplied by 0.5 every 2,000 iterations. Each time the generator was trained, the discriminator was trained twice.
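The step decay described above (initial rate 0.0002, halved every 2,000 iterations over 10,000 iterations) can be written as a closed-form schedule. The function name `learning_rate` and its parameter names are illustrative; a deep learning framework's built-in step scheduler would normally be used instead.

```python
def learning_rate(iteration: int, base_lr: float = 2e-4,
                  step_size: int = 2000, gamma: float = 0.5) -> float:
    """Learning rate after `iteration` steps of step decay."""
    return base_lr * gamma ** (iteration // step_size)


if __name__ == "__main__":
    for it in (0, 1999, 2000, 4000, 9999):
        print(it, learning_rate(it))
```

Over the full 10,000 iterations the rate is halved four times, ending at one sixteenth of its initial value.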
[0042] Digital painting generation module: used to adaptively standardize and uniformly adjust the specific stylized painting data, synthesize and output the final digital painting work.
[0043] It should be specifically noted that the adaptive normalization and unified adjustment performed by the digital painting generation module specifically includes the collaborative optimization of the global contrast, color saturation and brightness parameters of the specific stylized painting data; The parameter set upon which the collaborative optimization process is based supports two configuration modes: one is pre-configuration based on the standard color characteristics of the target display platform, and the other is receiving real-time dynamic adjustment instructions through the user interaction interface integrated into the system.
[0044] It should be further explained that the digital painting generation module performs adaptive normalization and unified adjustment on the specific stylized painting data $I_{styled}$, then synthesizes and outputs the final digital painting. The adjustment covers the collaborative optimization of the global contrast, color saturation, and brightness parameters. The specific steps are as follows:
Adaptive normalization adjustment: $I_{styled}$ is enhanced globally to ensure visual consistency and display adaptability.
Global contrast adjustment: contrast is enhanced by histogram stretching. Let the brightness channel of $I_{styled}$ be $V$; the adjusted brightness $V'$ is calculated as $V' = \frac{V - V_{min}}{V_{max} - V_{min}} (V_{newmax} - V_{newmin}) + V_{newmin}$, where $V_{min}$ and $V_{max}$ are the minimum and maximum values of $V$, and $V_{newmin}$ and $V_{newmax}$ define the target range. The case $V_{max} = V_{min}$ is handled separately.
Color saturation adjustment: the saturation channel $S$ is adjusted to $S' = \beta_S \cdot S$, where $\beta_S$ is the saturation gain factor with a value range of [0.5, 2.0].
Brightness adjustment: the luminance channel $V'$ is further adjusted to $V'' = \beta_V \cdot V'$, where $\beta_V$ is the brightness gain factor with a value range of [0.7, 1.5].
Collaborative optimization: the parameter set supports two configuration modes, a pre-configuration mode and a user interaction mode.
Finally, the adjusted HSV image is converted back to the RGB color space to obtain the final digital painting $I_{final}$. The output is a common image file format, and the resolution is the same as the input.
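The contrast, saturation, and brightness adjustments can be sketched on an image that is already in HSV form. The sketch assumes S and V values in [0, 1], multiplicative gains with clipping, and a flat brightness channel being left unchanged when its minimum equals its maximum; the function name `adjust_hsv` and its parameter names are hypothetical, and the RGB/HSV conversions are omitted.

```python
import numpy as np


def adjust_hsv(hsv: np.ndarray, v_newmin: float = 0.0, v_newmax: float = 1.0,
               sat_gain: float = 1.0, val_gain: float = 1.0) -> np.ndarray:
    """Histogram-stretch V, then apply saturation and brightness gains.

    hsv: H x W x 3 float array with S and V in [0, 1]; hue is passed through.
    """
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    v_min, v_max = float(v.min()), float(v.max())
    if v_max > v_min:
        v = (v - v_min) / (v_max - v_min) * (v_newmax - v_newmin) + v_newmin
    # Otherwise the channel is flat and is left unchanged (an assumption).
    s = np.clip(s * sat_gain, 0.0, 1.0)
    v = np.clip(v * val_gain, 0.0, 1.0)
    return np.stack([h, s, v], axis=-1)


if __name__ == "__main__":
    pixels = np.array([[[0.1, 0.4, 0.2], [0.1, 0.4, 0.6]]])
    print(adjust_hsv(pixels, sat_gain=1.5, val_gain=0.8))
```

Applying the stretch before the gains means the brightness gain acts on an already normalized range, so a preset tuned for one display behaves the same way on dark and bright inputs.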
[0045] It should be further noted that the pre-configuration mode sets the parameter values based on the standard color characteristics of the target display platform, for example for web page display: [parameter values missing]. The user interaction mode receives real-time adjustment commands through the graphical user interface integrated into the system, and users can modify the parameters with slider controls.
[0046] It should be further noted that the above steps are executed in series by the system modules: the image preprocessing module outputs $I_{enhanced}$; the core region extraction module generates $D_{core}$, which is input to the style painting generation module to generate $I_{styled}$; finally, the digital painting generation module outputs $I_{final}$. The system is implemented using Python and a deep learning framework. All models are pre-trained and deployed on the server side, supporting real-time processing.
[0047] In addition, the accompanying drawings of the embodiments disclosed in this invention show only the structures involved in those embodiments; for other structures, reference may be made to general designs. Where no conflict arises, features of the same embodiment and of different embodiments of this invention can be combined with each other. In conclusion, the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. An artificial intelligence painting display system based on image core extraction technology, characterized in that it comprises:
Image preprocessing module: used to enhance the input original image to obtain optimized digital painting data;
Core painting region extraction module: used to receive the optimized digital painting data and adopt a semantic segmentation model based on an improved lightweight neural network architecture to automatically identify and segment the core region representing the painting theme from the optimized digital painting data, and generate structured core painting region data;
Style painting generation module: used to receive the core painting area data, transfer the specified art style features to the core area through a deep style transfer network optimized by an adversarial training process, and finally output specific stylized painting data;
Digital painting generation module: used to adaptively standardize and uniformly adjust the specific stylized painting data, and to synthesize and output the final digital painting work.
2. The artificial intelligence painting display system based on image core extraction technology according to claim 1, characterized in that: The enhancement processing of the input original image specifically includes: The input original image is low-pass filtered to separate high-frequency defect information, and the filtered image is converted to the original image space and subjected to pixel-by-pixel difference operation with the original image to obtain optimized digital painting data with enhanced details and color contrast.
3. The artificial intelligence painting display system based on image core extraction technology according to claim 2, characterized in that: The low-pass filter employs a Gaussian filtering algorithm, which convolves the image using a two-dimensional Gaussian kernel function. The size of the filter kernel is adaptively adjusted according to the resolution of the input image.
4. The artificial intelligence painting display system based on image core extraction technology according to claim 1, characterized in that: The improved lightweight neural network architecture incorporates channel attention and spatial attention mechanisms to adaptively weight important feature channels and spatial locations during semantic segmentation.
5. The artificial intelligence painting display system based on image core extraction technology according to claim 4, characterized in that: The lightweight neural network employs an encoder-decoder structure, where the encoder part uses depthwise separable convolutions for feature extraction.
6. The artificial intelligence painting display system based on image core extraction technology according to claim 1, characterized in that: The training process of the deep style transfer network relies on a specially optimized stylization loss function.
7. The artificial intelligence painting display system based on image core extraction technology according to claim 6, characterized in that: The stylization loss function is a weighted combination of the content loss function, the style loss function, and the identity loss function; wherein, the calculation area of the content loss function is dynamically limited by the core painting area data, rather than the entire image area.
8. The artificial intelligence painting display system based on image core extraction technology according to claim 1, characterized in that: The adaptive normalization and unified adjustment performed by the digital painting generation module specifically includes the collaborative optimization of the global contrast, color saturation and brightness parameters of the specific stylized painting data. The parameter set upon which the collaborative optimization process is based supports two configuration modes: one is pre-configuration based on the standard color characteristics of the target display platform, and the other is receiving real-time dynamic adjustment instructions through the user interaction interface integrated into the system.
9. An artificial intelligence painting display method based on image core extraction technology, used to implement the artificial intelligence painting display system based on image core extraction technology as described in any one of claims 1-8, characterized in that it comprises:
S1: Preprocess the input image to obtain optimized digital painting data;
S2: Receive the optimized digital painting data, and use a semantic segmentation model based on an improved lightweight neural network architecture to automatically identify and segment the core region representing the painting theme from the data, and generate core painting region data;
S3: Receive the core painting area data, and transfer the specified art style features to the core area through a deep style transfer network optimized by an adversarial training process to generate specific stylized painting data;
S4: Adaptively standardize and uniformly adjust the specific stylized painting data, and synthesize and output the final digital painting.