An interactive painting system, method and storage medium based on deep learning
By using a deep learning-based interactive drawing system and model, the challenge of completing lines in hand-drawn sketches has been solved, enabling educational applications of line drawing. It provides real-time line completion and tutoring functions, improving the completeness and accuracy of line drawings.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HANGZHOU MIAOJI TECH CO LTD
- Filing Date
- 2022-12-27
- Publication Date
- 2026-06-30
AI Technical Summary
There is a lack of interactive drawing solutions in the current technology that can complete the lines of hand-drawn sketches, especially in educational and tutoring applications. Furthermore, the lack of texture and contextual information in hand-drawn sketches makes line drawing completion very challenging.
An interactive painting system based on deep learning is adopted, including a painting area and a display area. It utilizes a predictive line drawing parsing model and an automatic line completion model to generate completed line drawings after training with a generator and a discriminator. The PoolFormer, AdaIN and Simplify models are combined for data processing and prediction. Geometric consistency loss, semantic reconstruction loss, L1 loss and adversarial loss are used to optimize the model.
It enables line completion of hand-drawn sketches, providing educational and tutoring functions. Through real-time line drawing completion algorithms, it guides users to draw line drawings correctly and completely, improving the integrity and accuracy of line drawings.
Figure CN115797728B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of interactive painting technology, specifically to an interactive painting system, method, and storage medium based on deep learning.
Background Technology
[0002] In recent years, the widespread adoption of touchscreen devices (such as smartphones and tablets) has made interactive drawing much easier than before, leading to the increasing popularity of sketch-oriented applications such as GauGAN and DeepFaceDrawing. However, these applications are mostly for entertainment purposes and lack educational value. An algorithm that can guide and instruct people to draw better line drawings through human-computer interaction is therefore important, including line drawing tutorials for figures, landscapes, and various objects. This leads to the technical field of line completion. Line completion aims to infer reasonable lines to fill in missing strokes in a sketch. Although there is a great deal of work on image completion, line drawing completion has received relatively little attention. Hand-drawn sketches lack texture and contextual information and are generally considered more ambiguous than natural images. Therefore, many image completion methods designed for color images cannot be directly applied to sketches. Furthermore, due to the nature of hand-drawn sketches, sketches of the same object may be drawn in different styles, making sketch completion very challenging.
Summary of the Invention
[0003] The purpose of this application is to provide an interactive drawing system, method, and storage medium based on deep learning, in order to solve the problem in the prior art of lacking an interactive drawing solution that can complete the lines of hand-drawn sketches and has educational and tutoring functions.
[0004] To achieve the above objectives, embodiments of this application provide an interactive painting system based on deep learning, comprising: a painting area and a display area, wherein,
[0005] The drawing area is used to acquire the initial line drawing drawn by the user, and the display area is used to display the completed line drawing and / or complete parsing information generated based on the initial line drawing;
[0006] The method for generating complete analytical information corresponding to the initial line drawing includes: using the constructed predictive line drawing analytical model, taking the initial line drawing as input, and obtaining the complete analytical information of the output, wherein the predictive line drawing analytical model is obtained by training the PoolFormer model;
[0007] The method for generating the completed line drawing includes: using a constructed automatic line completion model, taking the initial line drawing and the corresponding complete parsing information as input, to obtain the output completed line drawing. The automatic line completion model is obtained by training a generator and a discriminator. The generator includes an encoder and a decoder.
[0008] Optionally, the method for constructing the automatic line completion model includes:
[0009] An image dataset is obtained and processed using the AdaIN and Simplify models to obtain the first line drawing dataset.
[0010] The first line drawing dataset is preprocessed to obtain a second line drawing dataset containing incomplete line drawings. The preprocessing method includes line erasure or line extraction.
[0011] Using the constructed predictive line drawing analysis model, the second line drawing dataset is taken as input to obtain the output analysis information;
[0012] The generator is trained using the second line drawing dataset and its corresponding parsing information, along with the first line drawing dataset, to obtain the automatic line completion model.
[0013] Optionally, the method for constructing the predictive line drawing analytical model includes:
[0014] Obtain the parsing information corresponding to the image dataset;
[0015] The PoolFormer model is trained using the parsing information corresponding to the image dataset and the second line drawing dataset to obtain the predicted line drawing parsing model to output complete parsing information.
[0016] The method for constructing the parsing model for predicting incomplete line drawings, which is used for loss calculation, includes:
[0017] The PoolFormer model is trained using the second line drawing dataset;
[0018] The labels of the output parsing information corresponding to the erased parts in the line drawing data processed by the line erasure method in the second line drawing dataset are modified to clear the parsing information corresponding to the erased parts;
[0019] Only the parsing information corresponding to the portion of the line drawing data extracted from the second line drawing dataset after processing by the line extraction method is output, and finally the predicted incomplete line drawing parsing model is obtained.
[0020] Optionally, it also includes:
[0021] Based on the first line drawing dataset and the parsing information corresponding to the image dataset, a loss function is used to calculate the loss between the output of the automatic line completion model and the true value, so as to optimize the automatic line completion model. The loss function includes geometric consistency loss, semantic reconstruction loss, L1 loss and / or adversarial loss.
[0022] Optionally, it also includes:
[0023] The attribute area is used to provide a depth estimation map, analytical information and / or selectable attributes corresponding to the initial line drawing. Based on the acquired selection instructions, the attribute area completes the lines corresponding to the attributes on the basis of the initial line drawing in the drawing area.
[0024] The method for generating the analytical information corresponding to the initial line drawing includes: using the predicted line drawing analytical model, taking the initial line drawing as input, and obtaining the output analytical information;
[0025] The method for generating the depth estimation map corresponding to the initial line drawing includes: using the constructed line drawing depth estimation model, taking the initial line drawing as input, and obtaining the output depth estimation map, wherein the line drawing depth estimation model is obtained by training the Pix2PixHD model.
[0026] Optionally, the method for constructing the line drawing depth estimation model includes:
[0027] The depth map corresponding to the image dataset is obtained using the LeReS-based BoostingMonocularDepth model;
[0028] The InceptionV3 model is used to extract image features from the first line drawing dataset;
[0029] The Pix2PixHD model is trained using the image features and the depth map corresponding to the image dataset to obtain the line drawing depth estimation model.
[0030] Optionally, it also includes:
[0031] The drawing area overlaps with the display area, and the completed line drawing with transparency is displayed in the display area;
[0032] And / or, the drawing area overlaps with the display area, where full analytical information with transparency is displayed.
[0033] Optionally, it also includes:
[0034] Based on the obtained line proportions of the initial line drawing as a whole and / or its various parts, the completeness of the initial line drawing is evaluated and provided according to a preset standard, which includes the average line proportion in the first line drawing dataset.
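The completeness evaluation in [0034] can be illustrated with a minimal sketch. It assumes a grayscale sketch with dark lines on a white background, treats "line proportion" as the fraction of line pixels, and scores completeness as the ratio to a dataset average. The names `line_proportion`, `completeness`, the threshold of 128, and the reference value 0.2 are hypothetical and not taken from the patent.

```python
import numpy as np


def line_proportion(sketch, threshold=128):
    """Fraction of pixels darker than `threshold`, treated as line pixels."""
    sketch = np.asarray(sketch)
    return float((sketch < threshold).mean())


def completeness(sketch, reference_proportion, threshold=128):
    """Ratio of the sketch's line proportion to a dataset average, capped at 1.

    `reference_proportion` stands in for the average line proportion of the
    first line drawing dataset (an assumed, precomputed value).
    """
    if reference_proportion <= 0:
        raise ValueError("reference_proportion must be positive")
    return min(1.0, line_proportion(sketch, threshold) / reference_proportion)


# A white 10x10 canvas with one black horizontal stroke (10 of 100 pixels).
canvas = np.full((10, 10), 255, dtype=np.uint8)
canvas[2, :] = 0
score = completeness(canvas, reference_proportion=0.2)
```

The same function could be applied per region (for example, per facial part) by passing a cropped or masked sketch together with that region's average proportion.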
[0035] To achieve the above objectives, this application also provides an interactive painting method based on deep learning, comprising the following steps:
[0036] Obtain the initial line drawing drawn by the user, and generate and display the corresponding completed line drawing and / or complete parsing information based on the initial line drawing;
[0037] The method for generating complete analytical information corresponding to the initial line drawing includes: using the constructed predictive line drawing analytical model, taking the initial line drawing as input, and obtaining the complete analytical information of the output, wherein the predictive line drawing analytical model is obtained by training the PoolFormer model;
[0038] The method for generating the completed line drawing includes: using a constructed automatic line completion model, taking the initial line drawing and the corresponding complete parsing information as input, to obtain the output completed line drawing. The automatic line completion model is obtained by training a generator and a discriminator. The generator includes an encoder and a decoder.
[0039] The method for constructing the automatic line completion model includes:
[0040] An image dataset is obtained and processed using the AdaIN and Simplify models to obtain the first line drawing dataset.
[0041] The first line drawing dataset is preprocessed to obtain a second line drawing dataset containing incomplete line drawings. The preprocessing method includes line erasure or line extraction.
[0042] Using the constructed predictive line drawing analysis model, the second line drawing dataset is taken as input to obtain the output analysis information;
[0043] The generator is trained using the second line drawing dataset and its corresponding parsing information, along with the first line drawing dataset, to obtain the automatic line completion model.
[0044] The method for constructing the predictive line drawing analysis model includes:
[0045] Obtain the parsing information corresponding to the image dataset;
[0046] The PoolFormer model is trained using the parsing information corresponding to the image dataset and the second line drawing dataset to obtain the predicted line drawing parsing model to output complete parsing information.
[0047] To achieve the above objectives, this application also provides a computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a machine, implements the steps of the method described above.
[0048] The embodiments of this application have the following advantages:
[0049] This application provides an interactive drawing system based on deep learning, including a drawing area and a display area. The drawing area is used to acquire an initial line drawing drawn by a user, and the display area is used to display a completed line drawing and / or complete analytical information generated based on the initial line drawing. The method for generating the completed line drawing includes: using a constructed automatic line completion model, taking the initial line drawing and the corresponding complete analytical information as input, to obtain the output completed line drawing. The automatic line completion model is obtained by training a generator and a discriminator, and the generator includes an encoder and a decoder.
[0050] Through the aforementioned system, in the field of interactive painting, a real-time deep learning-based line drawing completion algorithm guides users to draw a line drawing correctly and completely, achieving educational and tutoring functions while completing the lines of hand-drawn sketches.
Attached Figure Description
[0051] To more clearly illustrate the embodiments of this application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are merely exemplary, and those skilled in the art can derive other embodiments based on the provided drawings without creative effort.
[0052] Figure 1 is an interface diagram of an interactive painting system based on deep learning provided in an embodiment of this application;
[0053] Figure 2 is a structural diagram of the automatic line completion model of an interactive painting system based on deep learning provided in an embodiment of this application;
[0054] Figure 3 is a schematic diagram of the preprocessing results of an interactive painting system based on deep learning provided in an embodiment of this application.
Detailed Implementation
[0055] The following specific embodiments illustrate the implementation of this application. Those skilled in the art can easily understand other advantages and effects of this application from the content disclosed in this specification. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0056] Furthermore, the technical features involved in the different embodiments of this application described below can be combined with each other as long as they do not conflict with each other.
[0057] One embodiment of this application provides an interactive painting system based on deep learning. Referring to Figure 1, which shows an interface diagram of the interactive painting system provided in this embodiment, it should be understood that the system may also include additional blocks not shown and / or may omit blocks that are shown, and the scope of this application is not limited in this respect.
[0058] The deep learning-based interactive painting system provided in this embodiment includes a painting area and a display area, wherein...
[0059] The drawing area is used to acquire the initial line drawing drawn by the user, and the display area is used to display the completed line drawing and / or complete parsing information generated based on the initial line drawing;
[0060] The method for generating complete analytical information corresponding to the initial line drawing includes: using the constructed predictive line drawing analytical model, taking the initial line drawing as input, and obtaining the complete analytical information of the output, wherein the predictive line drawing analytical model is obtained by training the PoolFormer model;
[0061] The method for generating the completed line drawing includes: using a constructed automatic line completion model, taking the initial line drawing and the corresponding complete parsing information as input, to obtain the output completed line drawing. The automatic line completion model is obtained by training a generator and a discriminator. The generator includes an encoder and a decoder.
[0062] Specifically, the drawing area allows users to draw line drawings, while the display area shows the results automatically completed by the system based on the user's line drawings, and can also display complete analysis information.
[0063] In some embodiments, the drawing area overlaps with the display area, where the completed line drawing with transparency is displayed;
[0064] And / or, the drawing area overlaps with the display area, where full analytical information with transparency is displayed.
[0065] Specifically, the drawing area and the display area can overlap. That is, the same area serves as both a drawing area for users to draw line drawings and a display area showing the automatically completed or fully analyzed results based on the user's line drawings. This automatically completed or fully analyzed result can overlap with the user's line drawings or be separated from them by a certain distance. It should be understood that this application does not limit the number of display areas; there can be multiple display areas, each displaying the automatically completed or fully analyzed results. One display area can overlap with the drawing area.
[0066] In some embodiments, a tools area is also included.
[0067] Specifically, the tools area provides users with several options for using the system. In this area, the system offers basic tools such as a "brush," "eraser," and "lasso tool." In addition, users can select their desired drawing theme, such as people, landscapes, fruits, and various objects. After selecting the theme, users can draw lines in the drawing area. If the user stops drawing for more than a few seconds or clicks the "Submit" button, the system automatically completes the drawing based on the lines currently drawn, and the completed result is displayed in the display area. Meanwhile, in some embodiments, to better guide the user to continue drawing, the completed line drawing is also displayed in the drawing area. That is, as described in the previous embodiments, the drawing area and the display area overlap; in this embodiment there are two display areas, one displaying the completed result outside the drawing area and the other overlapping with the drawing area, so that the displayed completed result overlaps with the line drawing drawn by the user. The completed lines displayed in the drawing area are light gray (with transparency), allowing the user to continue drawing in the drawing area by following the light gray markings. The user can also continue drawing freely according to their own ideas. After continuing to draw, the user can click the "Submit" button again to trigger a second round of completion.
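The light gray, semi-transparent hint described above can be sketched as a simple alpha blend. This is an illustrative sketch only: the function name `overlay_hint`, the gray level 200, the alpha of 0.5, and the threshold of 128 are assumptions, and both images are assumed to be grayscale with dark lines on a white background.

```python
import numpy as np


def overlay_hint(user_canvas, completed, gray=200, alpha=0.5, threshold=128):
    """Blend completed lines into the canvas in light gray, keeping user strokes."""
    user = np.asarray(user_canvas, dtype=np.float64)
    hint_mask = np.asarray(completed) < threshold  # pixels on completed lines
    user_mask = user < threshold                   # pixels the user has drawn
    blend = hint_mask & ~user_mask                 # never paint over user strokes
    out = user.copy()
    out[blend] = alpha * gray + (1.0 - alpha) * user[blend]
    return np.round(out).astype(np.uint8)
```

Because user strokes are excluded from the blend, the hint stays visually subordinate to whatever the user has already drawn.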
[0068] Taking the selection of face line drawing in the tools area as an example, the drawing area on the page initially displays a complete face parsing map with transparency to guide the user in positioning the facial regions while drawing. The user can also click "Switch Reference Image," at which point the reference image becomes a complete line drawing with transparency (see the drawing area in Figure 1).
[0069] The system of this application can be applied to the teaching and entertainment of line drawing of figures, landscapes, and various objects. In order to facilitate the explanation of the modeling method used in this system, some examples in the following embodiments are described and explained using face sketch line drawing as an example. It should be understood that the protection scope of the embodiments of this application is not limited to this, but can be extended to the application of line drawing of figures, landscapes, and various objects for teaching and entertainment based on the principle of this system.
[0070] Referring to Figure 2, in some embodiments, the method for constructing the automatic line completion model includes:
[0071] An image dataset is obtained and processed using the AdaIN and Simplify models to obtain the first line drawing dataset.
[0072] The first line drawing dataset is preprocessed to obtain a second line drawing dataset containing incomplete line drawings. The preprocessing method includes line erasure or line extraction.
[0073] Using the constructed predictive line drawing analysis model, the second line drawing dataset is taken as input to obtain the output analysis information;
[0074] The generator is trained using the second line drawing dataset and its corresponding parsing information, along with the first line drawing dataset, to obtain the automatic line completion model. (In this application, the face parsing information refers to the face mask.)
[0075] Specifically, line drawing dataset creation: To create a large dataset of face sketch line drawings, in some embodiments, initial line drawings of size … are first generated from the CelebaHQMask dataset (the image dataset) using the AdaIN model. Since the generated result is a grayscale image, this embodiment uses the Simplify model to simplify the grayscale image and obtain the final face sketch line drawing dataset (the first line drawing dataset), denoted CelebaHQMask-Sketch. Here, the photo is the input, AdaIN denotes the grayscale line drawing generated by the AdaIN model, and AdaIN+Simplify denotes the AdaIN grayscale image after simplification by the Simplify model, which is also the final form of the first line drawing dataset created in this embodiment.
[0076] Data Preprocessing: To train the line completion model, CelebaHQMask-Sketch must be preprocessed. The first method is line erasure, which has three variants: 1. erasing connected lines; 2. erasing at least one facial feature (left / right eyebrow, left / right eye, left / right ear, nose, and mouth); and 3. erasing lines inside the hair. During actual erasure, 1-2 of these variants are randomly selected. When the first variant is selected, 40%-50% of the connected lines are randomly erased. When the second variant is selected, the 19-channel face parsing information included in the CelebaHQMask dataset is used to erase the lines of the selected facial features. The number of facial features erased varies each time; only the left eye may be erased, or all facial features may be erased, as shown in Figure 3 (g)(h)(i). The second method, line extraction, is introduced to simulate the order of the drawing process as closely as possible. The order in which people draw figures varies from person to person, but most people first draw the outer contour of the hair, the outer contour of the face, or the facial features, as shown in Figure 3 (a)(b)(c). They then draw the face shape within the outer contour of the hair and the facial features within the face contour, as shown in Figure 3 (d)(e), finally obtaining the hair outline, facial features, and face shape, as shown in Figure 3 (f). The actual implementation of line extraction is described below, using the extraction of facial contour lines as an example.
[0077] From the face parsing information, the mask of the skin channel, $m_{skin}$, is obtained first; its pixel value is 1 at skin locations and 0 elsewhere. Dilation and erosion operations are then applied to $m_{skin}$ to obtain $m_{skin}^{d}$ and $m_{skin}^{e}$, respectively, and the extracted result is computed from the line drawing $y$ as:
[0078] $\mathrm{Result} = \left(y \cdot m_{skin}^{d} + 255 \cdot (1 - m_{skin}^{d})\right) \cdot (1 - m_{skin}^{e}) + 255 \cdot m_{skin}^{e}$
[0079] The resulting incomplete face sketch line drawing dataset (the second line drawing dataset) is used as the input to the model.
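The formula in [0078] keeps the line drawing only inside the band between the dilated and eroded skin masks and sets everything else to white (255). The following is a minimal NumPy sketch under stated assumptions: a 3x3 square structuring element, an iteration count of 2, and the helper names `dilate`, `erode`, and `extract_contour_lines` are all hypothetical, since the patent does not specify the morphological parameters.

```python
import numpy as np


def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 square structuring element (assumed)."""
    m = np.asarray(mask).astype(bool)
    h, w = m.shape
    for _ in range(iterations):
        padded = np.pad(m, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(m)
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        m = grown
    return m


def erode(mask, iterations=1):
    """Binary erosion, computed as the complement of dilating the complement."""
    return ~dilate(~np.asarray(mask).astype(bool), iterations)


def extract_contour_lines(y, m_skin, iterations=2):
    """Apply Result = (y*m_d + 255*(1-m_d))*(1-m_e) + 255*m_e."""
    y = np.asarray(y, dtype=np.float64)
    m_d = dilate(m_skin, iterations).astype(np.float64)  # dilated skin mask
    m_e = erode(m_skin, iterations).astype(np.float64)   # eroded skin mask
    result = (y * m_d + 255.0 * (1.0 - m_d)) * (1.0 - m_e) + 255.0 * m_e
    return result.astype(np.uint8)
```

A larger iteration count widens the band around the skin boundary, so more of the lines near the face contour survive the extraction.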
[0080] After the erased face sketch line drawing dataset (the second line drawing dataset) is obtained, the face sketch predictive line drawing parsing model is obtained through pre-training and used to predict face parsing information. The predicted face parsing information is concatenated with the erased face sketch line drawing and used as input to the automatic line completion model. While the CelebaHQMask dataset contains face parsing information with 19 channels, the training task of this invention merges the left and right eyebrows, the left and right eyes, and the left and right ears to obtain 16 channels of parsing information. This 16-channel face parsing information is used as the real face parsing information corresponding to the face sketch line drawing, i.e., the real face parsing information in Figure 2.
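Merging the three left / right pairs reduces 19 channels to 16, which can be sketched as follows. The channel order in `CLASSES_19` is an assumption made for illustration; the actual index of each class depends on how the parsing labels were exported, and the names `MERGES` and `merge_left_right` are hypothetical.

```python
import numpy as np

# Hypothetical channel order for the 19-channel parsing map.
CLASSES_19 = [
    "background", "skin", "nose", "eye_glasses", "l_eye", "r_eye",
    "l_brow", "r_brow", "l_ear", "r_ear", "mouth", "u_lip", "l_lip",
    "hair", "hat", "earring", "necklace", "neck", "cloth",
]
# Left/right pairs that are merged into a single channel.
MERGES = {
    "l_eye": "eye", "r_eye": "eye",
    "l_brow": "brow", "r_brow": "brow",
    "l_ear": "ear", "r_ear": "ear",
}


def merge_left_right(parsing_19):
    """Map a (19, H, W) one-hot parsing map to (16, H, W) by merging pairs."""
    parsing_19 = np.asarray(parsing_19)
    names_16 = []
    for name in CLASSES_19:
        merged = MERGES.get(name, name)
        if merged not in names_16:
            names_16.append(merged)
    out = np.zeros((len(names_16),) + parsing_19.shape[1:], dtype=parsing_19.dtype)
    for idx, name in enumerate(CLASSES_19):
        target = names_16.index(MERGES.get(name, name))
        out[target] = np.maximum(out[target], parsing_19[idx])  # union of the pair
    return out, names_16
```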
[0081] In some embodiments, the method for constructing the predictive line drawing analysis model includes:
[0082] Obtain the parsing information corresponding to the image dataset;
[0083] The PoolFormer model is trained using the parsing information corresponding to the image dataset and the second line drawing dataset to obtain the predicted line drawing parsing model to output complete parsing information.
[0084] The method for constructing the parsing model for predicting incomplete line drawings, which is used for loss calculation, includes:
[0085] The labels of the output parsing information corresponding to the erased parts in the line drawing data processed by the line erasure method in the second line drawing dataset are modified to clear the parsing information corresponding to the erased parts;
[0086] Only the parsing information corresponding to the portion of the line drawing data extracted from the second line drawing dataset after processing by the line extraction method is output, and finally the predicted incomplete line drawing parsing model is obtained.
[0087] Specifically, the construction steps of the line drawing face parsing and prediction model are as follows:
[0088] Training a model to predict complete face parsing information from incomplete line drawings: The lines drawn by the user form an incomplete face line drawing. If a model can predict complete face parsing information from the incomplete line drawing, that information can serve as valuable auxiliary information for training the line completion model. Since there is currently no model for generating face parsing information from line drawings, the PoolFormer model is used to address this issue. The training data consists of the erased face sketch line drawings as input and the real face parsing information as labels. The trained face predictive line drawing parsing model is face parsing network 1 in Figure 2.
[0089] Training a model to predict incomplete / complete face parsing information from incomplete / complete line drawings: In the optimization process of the automatic line completion algorithm, this invention uses a semantic reconstruction loss, in which face parsing information is recognized from the completed line drawings output by the network. The recognized face parsing information is then compared with the real face parsing information to apply a loss constraint. The face parsing recognition network used in this process therefore needs the following capabilities: from incomplete lines it predicts incomplete face parsing information, and from complete line drawings it predicts complete face parsing information.
[0090] The network used during training is the PoolFormer model. To enable the model to predict incomplete face parsing information from incomplete lines, this embodiment processes the first line drawing dataset in two ways: the first is the line erasure method described in the previous embodiment, and the second is the line extraction method. In the line erasure method, if the erased part is a facial feature, the label of that facial feature in the face parsing information is changed to the skin label, as shown in the lower right corner of Figure 3 (g)(h)(i). In the line extraction method, when a specific part is selected, only the corresponding part of the face parsing information is extracted, as shown in the lower right corner of Figure 3 (a)-(f). The trained face parsing model for predicting incomplete line drawings is face parsing network 2 in Figure 2 and is used for subsequent loss calculation.
[0091] The automatic line completion model mainly consists of a generator G and a discriminator D, and employs a supervised learning strategy for image generation. The training set consists of paired samples, in which the input is a face sketch line drawing with lines erased (from the second line drawing dataset) and the target is the corresponding complete line drawing. The goal is to train a generator G that can complete incomplete face line drawings. Specifically, the incomplete face line drawing and its corresponding face parsing information are input together into generator G to generate a complete face line drawing. At the same time, a discriminator D is trained to determine whether the input is a complete face line drawing.
[0092] Generator: The generator mainly consists of two parts, an encoder and a decoder, and follows the overall shape of "U-Net". Each encoding layer consists of a convolutional layer with a stride of 2 and a kernel size of 4, a ReLU layer, and an instance normalization layer. Except for the first layer, the network parameters of the remaining encoder layers are shared. Each decoding layer (except for the last two branches) consists of a transposed convolutional layer, a ReLU layer, and an improved adaptive normalization module, as shown in Figure 2.
[0093] Discriminator: The discriminator's model structure mainly consists of 5 convolutional layers. Each layer (except the last one) contains a Leaky ReLU layer and an instance normalization layer. The last convolutional layer predicts a true or false label matrix.
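As a small worked example of the encoder's downsampling, the standard convolution output-size formula can be applied to the stated kernel size of 4 and stride of 2. The padding of 1, the input resolution of 256, and the layer count of 4 are assumptions used only for illustration; the patent does not state them.

```python
def conv_out_size(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_sizes(input_size, num_layers):
    """Feature-map sizes after each stride-2 encoding layer."""
    sizes = [input_size]
    for _ in range(num_layers):
        sizes.append(conv_out_size(sizes[-1]))
    return sizes


sizes = encoder_sizes(256, 4)  # [256, 128, 64, 32, 16]
```

With a padding of 1, each encoding layer halves the spatial resolution, which is what lets the decoder's transposed convolutions mirror the encoder in a U-Net-shaped generator.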
[0094] In some embodiments, based on the parsing information corresponding to the first line drawing dataset and the image dataset, a loss function is used to calculate the loss between the output result and the true value of the automatic line completion model, so as to optimize the automatic line completion model. The loss function includes geometric consistency loss, semantic reconstruction loss, L1 loss and / or adversarial loss.
[0095] Specifically, in the task of completing missing face line drawings into complete face line drawings, the embodiments of this application use geometric consistency loss, semantic reconstruction loss, L1 loss, and adversarial loss, which will be described below.
[0096] Geometric consistency loss: To constrain the generated completed line drawing to remain consistent in structure and texture with the real line drawing, the two should have gradients of similar magnitude and orientation. A gradient divergence metric is therefore designed as the geometric consistency loss to guide the generator's learning. Specifically, the Prewitt operator is first used to extract the horizontal and vertical gradients of the generated and real line drawings, and the gradient vector at a given position is defined as follows:
[0097]
[0098] where the two components denote the horizontal and vertical gradients of the image at that position.
[0099] Then, the gradient divergence between the generated and real line drawings is calculated at each position. In this algorithm it is expressed through cosine similarity, with the following formula:
[0100]
[0101] where the result denotes the gradient divergence, in magnitude and direction, at that position. A smaller gradient divergence indicates better geometric consistency. The geometric consistency loss is finally calculated as follows:
[0102]
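Since the formulas for this loss did not survive in the text, the following is only an illustrative sketch of the described idea: Prewitt gradients, a per-position cosine similarity, and an average divergence. The exact form used here (one minus cosine similarity, averaged over positions where either image has a gradient) and the names `filter3x3` and `geometric_consistency_loss` are assumptions, not the patent's formula.

```python
import numpy as np

PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float64)
PREWITT_Y = PREWITT_X.T


def filter3x3(img, kernel):
    """Cross-correlate a 2-D image with a 3x3 kernel, using edge padding."""
    padded = np.pad(np.asarray(img, dtype=np.float64), 1, mode="edge")
    h, w = padded.shape[0] - 2, padded.shape[1] - 2
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def geometric_consistency_loss(generated, real, eps=1e-8):
    """Mean of (1 - cosine similarity) between Prewitt gradient vectors."""
    gx_g, gy_g = filter3x3(generated, PREWITT_X), filter3x3(generated, PREWITT_Y)
    gx_r, gy_r = filter3x3(real, PREWITT_X), filter3x3(real, PREWITT_Y)
    mag_g = np.sqrt(gx_g ** 2 + gy_g ** 2)
    mag_r = np.sqrt(gx_r ** 2 + gy_r ** 2)
    cos = (gx_g * gx_r + gy_g * gy_r) / (mag_g * mag_r + eps)
    # Score positions where at least one image has an edge, so that
    # missing or extra lines (cos == 0 there) are penalized as well.
    valid = (mag_g > eps) | (mag_r > eps)
    if not valid.any():
        return 0.0
    return float(np.mean(1.0 - cos[valid]))
```

Under this form the loss is 0 when the gradient directions agree everywhere and 2 when they are opposite everywhere, which matches the statement that a smaller divergence means better geometric consistency.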
[0103] Semantic Reconstruction Loss: Compared to the original Pix2Pix model, a multi-task module is added to the generator. Specifically, two deconvolutional branches are used in the last layer of the generator's decoder: one predicts the completed line drawing, and the other predicts the semantic labels (the face parsing information). The L2 distance between the real face parsing information and the predicted face parsing information is used as the semantic reconstruction loss.
[0104] So that correct face parsing information can be predicted from the generated line drawing, the L2 distance between the face parsing information predicted from the generated line drawing and the real face parsing information is also calculated.
[0105] The formula is as follows:
[0106]
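As a rough illustration of the two terms described above, the sketch below sums two L2 distances: one for the parsing map predicted by the decoder's semantic branch, and one for the parsing map predicted again from the generated line drawing. Interpreting "L2 distance" as a mean squared difference and adding the two terms with equal weight are assumptions, and all names are hypothetical.

```python
def l2_distance(pred, target):
    """Mean squared difference between two equally sized 2D maps."""
    n = sum(len(row) for row in target)
    sq = sum((p - t) ** 2 for prow, trow in zip(pred, target) for p, t in zip(prow, trow))
    return sq / n


def semantic_reconstruction_loss(branch_parsing, reparsed_parsing, real_parsing):
    """Sum of the two parsing errors described in the text.

    branch_parsing:   parsing map from the decoder's semantic branch
    reparsed_parsing: parsing map predicted from the generated line drawing
    real_parsing:     ground-truth face parsing map
    """
    return l2_distance(branch_parsing, real_parsing) + l2_distance(reparsed_parsing, real_parsing)
```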
[0107] L1 loss: This model adopts a supervised learning strategy and uses the L1 distance between the generated completed line drawing and the real complete line drawing as the pixel loss. Its formula is as follows:
[0108]
[0109] The model also constrains the distance between the depth map predicted by the depth model from the generated completed line drawing and the real depth map to be as small as possible.
[0110]
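The pixel loss and the depth constraint can be sketched as two applications of the same distance function. The text specifies an L1 distance for the pixel loss; using L1 for the depth term as well, and averaging over pixels, are assumptions made here for illustration.

```python
def l1_distance(pred, target):
    """Mean absolute difference between two equally sized 2D maps."""
    n = sum(len(row) for row in target)
    diff = sum(abs(p - t) for prow, trow in zip(pred, target) for p, t in zip(prow, trow))
    return diff / n


def pixel_and_depth_loss(generated, real, generated_depth, real_depth):
    """Return (pixel_loss, depth_loss).

    generated_depth is assumed to be the output of a depth model applied to the
    generated line drawing; that model is outside the scope of this sketch.
    """
    return l1_distance(generated, real), l1_distance(generated_depth, real_depth)
```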
[0111] Adversarial loss: The discriminator is based on the Pix2Pix discriminator, with its input adjusted. Specifically, the real face parsing information is concatenated with the real face line drawing along the channel dimension and input into the discriminator, which is expected to classify the pair as real. The generated face parsing information is concatenated with the generated completed face line drawing along the channel dimension and input into the discriminator, which is expected to classify the pair as fake. The adversarial loss formula is therefore:
[0112]
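A minimal sketch of how the discriminator inputs described above could be formed and scored. The discriminator `d` is a hypothetical callable that maps a channel-first list `[parsing, line_drawing]` to a probability of being real, and the binary cross-entropy form of the loss is an assumption in the style of the original Pix2Pix objective, since the formula itself is not reproduced in the text.

```python
import math


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for a single probability and a 0/1 label."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def discriminator_loss(d, real_parsing, real_lines, fake_parsing, fake_lines):
    """Real pair should score 1, generated pair should score 0."""
    real_input = [real_parsing, real_lines]   # concatenation along the channel axis
    fake_input = [fake_parsing, fake_lines]
    return bce(d(real_input), 1.0) + bce(d(fake_input), 0.0)


def generator_adversarial_loss(d, fake_parsing, fake_lines):
    """The generator is rewarded when the generated pair is classified as real."""
    return bce(d([fake_parsing, fake_lines]), 1.0)
```

In a real training loop, `d` would be a convolutional network and the inputs would be tensors; plain lists are used here only to keep the sketch self-contained.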
[0113] Finally, the geometric consistency loss, semantic reconstruction loss, L1 loss, and adversarial loss are combined to train the model. The model optimization problem is as follows:
[0114]
[0115] where the coefficients are the weights of the respective loss terms.
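The combined objective can be sketched as a weighted sum of the four loss terms. The term names and the weight values used below are placeholders chosen for illustration; the actual weights are not given in the text.

```python
def total_generator_loss(losses, weights):
    """Weighted sum of named loss terms; both arguments are dicts with the same keys."""
    mismatched = set(losses) ^ set(weights)
    if mismatched:
        raise KeyError(f"losses and weights disagree on: {sorted(mismatched)}")
    return sum(weights[name] * losses[name] for name in losses)


# Hypothetical values, for illustration only.
example_losses = {"geometric": 0.5, "semantic": 0.25, "l1": 0.125, "adversarial": 1.0}
example_weights = {"geometric": 2.0, "semantic": 4.0, "l1": 8.0, "adversarial": 1.0}
example_total = total_generator_loss(example_losses, example_weights)
```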
[0116] In some embodiments, the interactive drawing system further includes: an attribute area, which is used to provide a depth estimation map, parsing information and / or selectable attributes corresponding to the initial line drawing, and which, based on the acquired selection instructions, completes the lines corresponding to the selected attributes on the basis of the initial line drawing in the drawing area;
[0117] The method for generating the parsing information corresponding to the initial line drawing includes: using the predictive line drawing parsing model, taking the initial line drawing as input, and obtaining the output parsing information;
[0118] The method for generating the depth estimation map corresponding to the initial line drawing includes: using the constructed line drawing depth estimation model, taking the initial line drawing as input, and obtaining the output depth estimation map, wherein the line drawing depth estimation model is obtained by training the Pix2PixHD model.
[0119] Specifically, the attribute area displays the 3D model corresponding to the face line drawing. It also previews the 2D face parsing information predicted by the algorithm in real time. Users can modify this information to guide the system in generating a line completion result that matches the user-specified face parsing. The face parsing information is modified mainly by changing colors, since each facial feature corresponds to a different color: users can circle an area, fill it, and change its color. Because face drawing involves many complex attributes, the attribute area also provides selectable attributes such as gender, hairstyle (bangs, straight hair, curly hair), and accessories (hat, earrings, glasses). After the user selects an attribute, the system generates, in real time, a completed face line drawing with that attribute based on the lines in the drawing area. This operation better guides and teaches users how to draw face line drawings under different given conditions.
[0120] In some embodiments, the method for constructing the line drawing depth estimation model includes:
[0121] The depth map corresponding to the image dataset is obtained using the LeRas-based BoostingMonocularDepth model;
[0122] The InceptionV3 model is used to extract image features from the first line drawing dataset;
[0123] The Pix2PixHD model is trained using the image features and the depth map corresponding to the image dataset to obtain the line drawing depth estimation model.
[0124] Specifically, for the line drawing depth map generation model (i.e., the line drawing depth estimation model): training a model to generate depth maps from line drawings requires the real depth maps corresponding to those line drawings. However, the CelebaHQMask dataset does not contain depth maps for its face photos. Therefore, the LeRas-based BoostingMonocularDepth model is used to generate depth maps from the photos in the image dataset, and these depth maps are then used as the real depth maps for the line drawings. To enable a face depth map to be predicted from a face line drawing, this application follows the training method of the depth map prediction network in information-drawing. However, since the training data of that depth map prediction network does not consist of faces, this embodiment retrains the model to suit the face task of this invention.
[0125] First, the InceptionV3 model is used to extract image features from the face line drawing. The features are taken from the output of the Mixed 6b node of InceptionV3. The features are then input into the Pix2PixHD model to predict the depth map, and the loss function is calculated to optimize the model. During training, only the weights of Pix2PixHD are updated; the weights of InceptionV3 are kept frozen. Finally, the trained line drawing depth estimation model is the depth map prediction network shown in Figure 2.
[0126] In some embodiments, the method further includes: evaluating and providing the completeness of the initial line drawing based on the obtained line proportions of the whole and / or individual parts of the initial line drawing according to a preset standard, wherein the preset standard includes the average proportion of lines in the first line drawing dataset.
[0127] Specifically, to evaluate the completeness of a user's line drawing, the average proportion of line pixels (black pixels) in the whole image is calculated over the 30,000 line drawings of the CelebHQMask-Sketch dataset. This average proportion serves as the standard for evaluating the completeness of the user's line drawing. In addition to the overall line proportion, the average line proportion of each facial feature is calculated based on the face parsing information. Therefore, each time a user submits a drawing, the completeness of each part of the line drawing is calculated to help the user revise the drawing further.
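The completeness check described above can be sketched as a ratio of line-pixel proportions. The text states only that the dataset average serves as the standard; scoring completeness as the user's proportion divided by the reference proportion, capped at 1, is an assumption, as are the pixel convention (0 for a black line pixel, 255 for white) and the function names.

```python
def line_ratio(drawing, parsing=None, label=None, black=0):
    """Fraction of black (line) pixels, optionally restricted to one parsing label."""
    total = lines = 0
    for r, row in enumerate(drawing):
        for c, px in enumerate(row):
            if label is not None and parsing[r][c] != label:
                continue
            total += 1
            lines += px == black
    return lines / total if total else 0.0


def completeness(drawing, reference_ratio, parsing=None, label=None):
    """Score in [0, 1]: the user's line proportion relative to the reference average."""
    if reference_ratio <= 0:
        return 0.0
    return min(1.0, line_ratio(drawing, parsing, label) / reference_ratio)
```

For example, a drawing whose overall line proportion is 0.25 scored against a reference average of 0.5 would be reported as 50% complete; passing `parsing` and `label` gives the same score for a single facial feature.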
[0128] One embodiment of this application provides an interactive painting method based on deep learning. It should be understood that the method may also include steps not shown and / or the steps shown may be omitted, and the scope of this application is not limited in this respect.
[0129] The steps include: obtaining the initial line drawing drawn by the user, generating and displaying the corresponding completed line drawing and / or complete parsing information based on the initial line drawing;
[0130] The method for generating the complete parsing information corresponding to the initial line drawing includes: using the constructed predictive line drawing parsing model, taking the initial line drawing as input, and obtaining the output complete parsing information, wherein the predictive line drawing parsing model is obtained by training the PoolFormer model;
[0131] The method for generating the completed line drawing includes: using a constructed automatic line completion model, taking the initial line drawing and the corresponding complete parsing information as input, to obtain the output completed line drawing. The automatic line completion model is obtained by training a generator and a discriminator. The generator includes an encoder and a decoder.
[0132] The method for constructing the automatic line completion model includes:
[0133] An image dataset is obtained and processed using the AdaIN and Simplify models to obtain the first line drawing dataset.
[0134] The first line drawing dataset is preprocessed to obtain a second line drawing dataset containing incomplete line drawings. The preprocessing method includes line erasure or line extraction.
[0135] Using the constructed predictive line drawing parsing model, the second line drawing dataset is taken as input to obtain the output parsing information;
[0136] The generator is trained using the second line drawing dataset and its corresponding parsing information, along with the first line drawing dataset, to obtain the automatic line completion model.
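The preprocessing step that turns a complete line drawing into an incomplete one can be illustrated with a simple erasure routine. The text names line erasure and line extraction but this section does not specify their exact form, so erasing a random axis-aligned rectangle is an assumption, as are the pixel convention (255 for white) and the function names.

```python
import random


def erase_region(drawing, top, left, height, width, white=255):
    """Return a copy of the drawing with a rectangle painted white."""
    out = [row[:] for row in drawing]
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[0]))):
            out[r][c] = white
    return out


def random_erase(drawing, max_frac=0.5, rng=None):
    """Erase a random rectangle covering at most max_frac of each dimension.

    Returns the incomplete drawing and the erased box (top, left, height, width).
    """
    rng = rng or random.Random()
    h, w = len(drawing), len(drawing[0])
    eh = rng.randint(1, max(1, int(h * max_frac)))
    ew = rng.randint(1, max(1, int(w * max_frac)))
    top = rng.randint(0, h - eh)
    left = rng.randint(0, w - ew)
    return erase_region(drawing, top, left, eh, ew), (top, left, eh, ew)
```

Each complete drawing from the first dataset paired with its erased copy then forms one training example: the erased copy is the generator input and the complete drawing is the target.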
[0137] The method for constructing the predictive line drawing parsing model includes:
[0138] Obtain the parsing information corresponding to the image dataset;
[0139] The PoolFormer model is trained using the parsing information corresponding to the image dataset and the second line drawing dataset to obtain the predictive line drawing parsing model, which outputs complete parsing information.
[0140] In some embodiments, the method for constructing a parsing model that predicts parsing information for incomplete line drawings, which is used for loss calculation, includes:
[0141] For the line drawing data in the second line drawing dataset processed by the line erasure method, the labels of the output parsing information corresponding to the erased parts are modified so as to clear the parsing information of those parts;
[0142] For the line drawing data in the second line drawing dataset processed by the line extraction method, only the parsing information corresponding to the extracted parts is output, thereby finally obtaining the parsing model for predicting incomplete line drawings.
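The two label adjustments above can be sketched as operations on an integer parsing map. Representing the erased parts as a boolean mask, the extracted parts as a set of label ids, and "cleared" as a background label of 0 are assumptions made for illustration; the function names are hypothetical.

```python
def clear_erased_labels(parsing, erased_mask, background=0):
    """Line erasure case: reset the labels of erased pixels to the background label."""
    return [
        [background if erased else lab for lab, erased in zip(prow, mrow)]
        for prow, mrow in zip(parsing, erased_mask)
    ]


def keep_extracted_labels(parsing, kept_labels, background=0):
    """Line extraction case: keep only the labels of the extracted parts."""
    kept = set(kept_labels)
    return [[lab if lab in kept else background for lab in row] for row in parsing]
```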
[0143] In some embodiments, it also includes:
[0144] Based on the parsing information corresponding to the first line drawing dataset and to the image dataset respectively, the loss between the output of the automatic line completion model and the true value is calculated using a loss function, so as to optimize the automatic line completion model. The loss function includes geometric consistency loss, semantic reconstruction loss, L1 loss and / or adversarial loss.
[0145] In some embodiments, the method further includes: providing, in an attribute area, a depth estimation map, parsing information and / or selectable attributes corresponding to the initial line drawing, wherein the attribute area, based on the acquired selection instructions, completes the lines corresponding to the selected attributes on the basis of the initial line drawing in the drawing area;
[0146] The method for generating the parsing information corresponding to the initial line drawing includes: using the predictive line drawing parsing model, taking the initial line drawing as input, and obtaining the output parsing information;
[0147] The method for generating the depth estimation map corresponding to the initial line drawing includes: using the constructed line drawing depth estimation model, taking the initial line drawing as input, and obtaining the output depth estimation map, wherein the line drawing depth estimation model is obtained by training the Pix2PixHD model.
[0148] In some embodiments, the method for constructing the line drawing depth estimation model includes:
[0149] The depth map corresponding to the image dataset is obtained using the LeRas-based BoostingMonocularDepth model;
[0150] The InceptionV3 model is used to extract image features from the first line drawing dataset;
[0151] The Pix2PixHD model is trained using the image features and the depth map corresponding to the image dataset to obtain the line drawing depth estimation model.
[0152] In some embodiments, the method further includes: the drawing area overlaps with the display area, and the completed line drawing with transparency is displayed in the display area;
[0153] And / or, the drawing area overlaps with the display area, and the complete parsing information is displayed with transparency in the display area.
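Displaying a result "with transparency" over the drawing area amounts to alpha blending the two layers. The sketch below blends grayscale layers with values from 0 to 255; the function name and the single global opacity parameter `alpha` are assumptions made for illustration.

```python
def overlay(base, layer, alpha):
    """Blend `layer` over `base` with opacity alpha in [0, 1] (grayscale, 0..255)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return [
        [round((1.0 - alpha) * b + alpha * l) for b, l in zip(brow, lrow)]
        for brow, lrow in zip(base, layer)
    ]
```

With `alpha` set to 0 only the user's strokes are visible, and with `alpha` set to 1 only the completed line drawing (or parsing map) is visible; intermediate values show the suggestion as a faint guide under the user's strokes.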
[0154] In some embodiments, the method further includes: evaluating and providing the completeness of the initial line drawing based on the obtained line proportions of the whole and / or individual parts of the initial line drawing according to a preset standard, wherein the preset standard includes the average proportion of lines in the first line drawing dataset.
[0155] For specific implementation methods, please refer to the aforementioned system implementation examples, which will not be repeated here.
[0156] With the aforementioned system or method, a real-time deep learning-based line drawing completion algorithm guides users to draw a line drawing correctly and completely, providing educational and tutoring functions while completing the lines of a hand-drawn sketch. Aided by face parsing and depth map analysis of the line drawing, the algorithm completes the user's incomplete line drawing in real time, thereby helping the user draw a complete and accurate line drawing.
[0157] This application may be a method, apparatus, system, and / or computer program product. A computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for performing various aspects of this application.
[0158] A computer-readable storage medium can be a tangible device capable of holding and storing instructions for use by an instruction execution device. A computer-readable storage medium can be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium as used herein is not to be construed as being a transitory signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber optic cable), or electrical signals transmitted through a wire.
[0159] The computer-readable program instructions described herein can be downloaded from computer-readable storage media to various computing / processing devices, or downloaded via a network, such as the Internet, local area network, wide area network, and / or wireless network, to an external computer or external storage device. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and / or edge servers. A network adapter card or network interface in each computing / processing device receives the computer-readable program instructions from the network and forwards them to the computer-readable storage media in the respective computing / processing device.
[0160] The computer program instructions used to perform the operations of this application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, status setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, etc., and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer via any type of network—including a local area network (LAN) or a wide area network (WAN)—or may be connected to an external computer (e.g., via the Internet using an Internet service provider). In some embodiments, electronic circuits, such as programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), are personalized by utilizing the status information of the computer-readable program instructions. These electronic circuits can execute the computer-readable program instructions to implement various aspects of this application.
[0161] Various aspects of this application are described herein with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer-readable program instructions.
[0162] These computer-readable program instructions can be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processing unit of the computer or other programmable data processing apparatus, they create means for implementing the functions / actions specified in one or more blocks of the flowchart and / or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and / or other device to operate in a particular manner. Thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions / actions specified in one or more blocks of the flowchart and / or block diagram.
[0163] Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, thereby causing the instructions that execute on the computer, other programmable data processing apparatus, or other device to perform the functions / actions specified in one or more boxes of a flowchart and / or block diagram.
[0164] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction containing one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than those marked in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.
[0165] Note that, unless otherwise explicitly stated, all features disclosed in this specification (including any appended claims, abstract, and drawings) may be replaced by alternative features for achieving the same, equivalent, or similar purpose. Therefore, unless explicitly stated otherwise, each disclosed feature is merely one example of a set of equivalent or similar features. Where used, "further," "preferably," "even further," and "more preferably" are simple starting points for describing another embodiment based on the foregoing embodiments, the combination of which with the foregoing embodiments constitutes the complete configuration of another embodiment. Any combination of several "further," "preferably," "even further," or "more preferably" settings following the same embodiment constitutes yet another embodiment.
[0166] Although this application has been described in detail above with general descriptions and specific embodiments, some modifications or improvements can be made to it, which will be obvious to those skilled in the art. Therefore, all such modifications or improvements made without departing from the spirit of this application fall within the scope of protection claimed in this application.
Claims
1. An interactive painting system based on deep learning, characterized in that it includes a drawing area and a display area, wherein: the drawing area is used to acquire the initial line drawing drawn by the user, and the display area is used to display the completed line drawing and / or complete parsing information generated based on the initial line drawing; the method for generating the complete parsing information corresponding to the initial line drawing includes: using the constructed predictive line drawing parsing model, taking the initial line drawing as input, and obtaining the output complete parsing information, wherein the predictive line drawing parsing model is obtained by training the PoolFormer model; the method for generating the completed line drawing includes: using a constructed automatic line completion model, taking the initial line drawing and the corresponding complete parsing information as input, to obtain the output completed line drawing, wherein the automatic line completion model is obtained by training a generator and a discriminator, and the generator includes an encoder and a decoder; the method for constructing the predictive line drawing parsing model includes: obtaining the parsing information corresponding to the image dataset; and training the PoolFormer model using the parsing information corresponding to the image dataset and the second line drawing dataset to obtain the predictive line drawing parsing model, which outputs complete parsing information; the method for constructing a parsing model that predicts parsing information for incomplete line drawings, which is used for loss calculation, includes: for the line drawing data in the second line drawing dataset processed by the line erasure method, modifying the labels of the output parsing information corresponding to the erased parts so as to clear the parsing information of those parts; and for the line drawing data in the second line drawing dataset processed by the line extraction method, outputting only the parsing information corresponding to the extracted parts, thereby finally obtaining the parsing model for predicting incomplete line drawings.
2. The deep learning-based interactive painting system according to claim 1, characterized in that the method for constructing the automatic line completion model further includes: obtaining an image dataset and processing it using the AdaIN and Simplify models to obtain the first line drawing dataset; preprocessing the first line drawing dataset to obtain a second line drawing dataset containing incomplete line drawings, wherein the preprocessing method includes line erasure or line extraction; using the constructed predictive line drawing parsing model, taking the second line drawing dataset as input, to obtain the output parsing information; and training the generator using the second line drawing dataset and its corresponding parsing information, together with the first line drawing dataset, to obtain the automatic line completion model.
3. The deep learning-based interactive painting system according to claim 1, characterized in that it further includes: based on the parsing information corresponding to the first line drawing dataset and to the image dataset respectively, calculating the loss between the output of the automatic line completion model and the true value using a loss function, so as to optimize the automatic line completion model, wherein the loss function includes geometric consistency loss, semantic reconstruction loss, L1 loss and / or adversarial loss.
4. The deep learning-based interactive painting system according to claim 1, characterized in that it further includes: an attribute area, which is used to provide a depth estimation map, parsing information and / or selectable attributes corresponding to the initial line drawing, and which, based on the acquired selection instructions, completes the lines corresponding to the selected attributes on the basis of the initial line drawing in the drawing area; the method for generating the parsing information corresponding to the initial line drawing includes: using the predictive line drawing parsing model, taking the initial line drawing as input, and obtaining the output parsing information; the method for generating the depth estimation map corresponding to the initial line drawing includes: using the constructed line drawing depth estimation model, taking the initial line drawing as input, and obtaining the output depth estimation map, wherein the line drawing depth estimation model is obtained by training the Pix2PixHD model.
5. The deep learning-based interactive painting system according to claim 4, characterized in that the method for constructing the line drawing depth estimation model includes: obtaining the depth map corresponding to the image dataset using the LeRas-based BoostingMonocularDepth model; extracting image features from the first line drawing dataset using the InceptionV3 model; and training the Pix2PixHD model using the image features and the depth map corresponding to the image dataset to obtain the line drawing depth estimation model.
6. The deep learning-based interactive painting system according to claim 1, characterized in that: the drawing area overlaps with the display area, and the completed line drawing is displayed with transparency in the display area; and / or, the drawing area overlaps with the display area, and the complete parsing information is displayed with transparency in the display area.
7. The deep learning-based interactive painting system according to claim 1, characterized in that it further includes: based on the obtained line proportions of the initial line drawing as a whole and / or of its individual parts, evaluating and providing the completeness of the initial line drawing according to a preset standard, wherein the preset standard includes the average line proportion in the first line drawing dataset.
8. An interactive painting method based on deep learning, characterized in that it includes the following steps: obtaining the initial line drawing drawn by the user, and generating and displaying the corresponding completed line drawing and / or complete parsing information based on the initial line drawing; the method for generating the complete parsing information corresponding to the initial line drawing includes: using the constructed predictive line drawing parsing model, taking the initial line drawing as input, and obtaining the output complete parsing information, wherein the predictive line drawing parsing model is obtained by training the PoolFormer model; the method for generating the completed line drawing includes: using a constructed automatic line completion model, taking the initial line drawing and the corresponding complete parsing information as input, to obtain the output completed line drawing, wherein the automatic line completion model is obtained by training a generator and a discriminator, and the generator includes an encoder and a decoder; the method for constructing the automatic line completion model includes: obtaining the parsing information corresponding to the image dataset; and training the PoolFormer model using the parsing information corresponding to the image dataset and the second line drawing dataset to obtain the predictive line drawing parsing model, which outputs complete parsing information; the method for constructing a parsing model that predicts parsing information for incomplete line drawings, which is used for loss calculation, includes: for the line drawing data in the second line drawing dataset processed by the line erasure method, modifying the labels of the output parsing information corresponding to the erased parts so as to clear the parsing information of those parts; and for the line drawing data in the second line drawing dataset processed by the line extraction method, outputting only the parsing information corresponding to the extracted parts, thereby finally obtaining the parsing model for predicting incomplete line drawings.
9. The deep learning-based interactive painting method according to claim 8, characterized in that the method for constructing the automatic line completion model further includes: obtaining an image dataset and processing it using the AdaIN and Simplify models to obtain the first line drawing dataset; preprocessing the first line drawing dataset to obtain a second line drawing dataset containing incomplete line drawings, wherein the preprocessing method includes line erasure or line extraction; using the constructed predictive line drawing parsing model, taking the second line drawing dataset as input, to obtain the output parsing information; and training the generator using the second line drawing dataset and its corresponding parsing information, together with the first line drawing dataset, to obtain the automatic line completion model.
10. A computer storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a machine, it implements the steps of the method according to claim 8 or 9.